Brad Woods Digital Garden

    robots.txt

    Status: decay

    A Web Robot (also known as a crawler, spider or search engine bot) is a program that traverses the Web automatically. Search engines, for example, use them to download and index web content. A website's robots.txt file communicates with these robots using the Robots Exclusion Protocol. It tells a robot which areas of the site should and shouldn't be crawled. A robots.txt file needs to be located in the site's root directory.
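
Because the file has to sit in the root directory, a robot can work out its address from any page URL on the site. The snippet below is a minimal sketch of that step in Python. The function name `robots_url` and the page path in the usage line are made up for illustration and are not part of any protocol or library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; drop the path, query and fragment,
    # since robots.txt is expected at the root of the site.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page path, used only to show the call.
print(robots_url("https://garden.bradwoods.io/some/page"))
```

Whatever page the robot starts from, the result points at the same root-level file.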

    Example

    The robots.txt file below communicates that:

    • OpenAI, Google Bard and Common Crawl bots are blocked from crawling the site,
    • all other bots are allowed to crawl the entire site and
    • the site's sitemap file is located at https://garden.bradwoods.io/sitemap.xml.

    /robots.txt

    # OpenAI
    User-agent: GPTBot
    Disallow: /

    User-agent: ChatGPT-User
    Disallow: /

    # Google Bard
    User-agent: Google-Extended
    Disallow: /

    # Common Crawl
    User-agent: CCBot
    Disallow: /

    # All other bots
    User-agent: *
    Allow: /

    Sitemap: https://garden.bradwoods.io/sitemap.xml
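
One way to sanity-check a file like this is to feed it to a parser and ask what each robot may fetch. The sketch below uses Python's standard-library `urllib.robotparser` on the rules above. The page URL and the `SomeOtherBot` agent name are made-up values for illustration, and `site_maps()` assumes Python 3.8 or later. A real crawler may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above, inlined so nothing is fetched over the network.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: *
Allow: /
Sitemap: https://garden.bradwoods.io/sitemap.xml
"""

parser = RobotFileParser()
# parse() takes an iterable of lines rather than one string.
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical page on the site, used only as something to ask about.
page = "https://garden.bradwoods.io/some/page"

# "SomeOtherBot" is an invented name standing in for any bot not listed by name.
for agent in ("GPTBot", "ChatGPT-User", "Google-Extended", "CCBot", "SomeOtherBot"):
    verdict = "allowed" if parser.can_fetch(agent, page) else "blocked"
    print(f"{agent}: {verdict}")

# Sitemap URLs declared in the file (a list, or None if there are none).
print(parser.site_maps())
```

With this parser, the four named agents are reported as blocked and the invented agent falls through to the `*` group and is allowed.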
