robots.txt

Planted: Dec 2021

Tended: Nov 2023

Status: decay

A web robot (also known as a crawler, spider or search engine bot) is a program that traverses the Web automatically. One of their uses is by search engines, which download and index web content. A website's robots.txt file communicates with these robots using the Robots Exclusion Protocol. It can be used to tell a robot which areas of the site should and shouldn't be crawled. A robots.txt file needs to be located in the site's root directory.
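Because the file lives in the root directory, a crawler can work out where to look from any page URL on the site. A minimal Python sketch of that step, where the function name robots_url and the page path /notes/some-page are hypothetical and used only for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; robots.txt sits at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page path, used only to show the path being discarded.
print(robots_url("https://garden.bradwoods.io/notes/some-page"))
```

The page's path, query string and fragment are dropped; only the scheme and host decide which robots.txt applies.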

Example

The robots.txt file below communicates:

▪ the OpenAI, Google Bard and Common Crawl bots are blocked from crawling the site,
▪ all other bots are allowed to crawl the entire site and
▪ the site's sitemap file is located at https://garden.bradwoods.io/sitemap.xml.

/robots.txt

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://garden.bradwoods.io/sitemap.xml
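
One way to check how a well-behaved robot might read these rules is Python's standard-library urllib.robotparser. The sketch below parses the same rules from a string rather than fetching them over the network. The ROBOTS_TXT constant and the choice of Googlebot as an "other" bot are assumptions made for illustration, and real crawlers may match user agents differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the example above, inlined so no network request is needed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://garden.bradwoods.io/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://garden.bradwoods.io/"

# Googlebot stands in for "all other bots"; it is not named in the file.
for agent in ("GPTBot", "ChatGPT-User", "Google-Extended", "CCBot", "Googlebot"):
    print(agent, parser.can_fetch(agent, page))

# site_maps() requires Python 3.8 or newer.
print(parser.site_maps())
```

With this parser, the four named bots are reported as not allowed to fetch the page, while Googlebot falls through to the `User-agent: *` group and is allowed.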