Robots.txt

Robots.txt is often everyone’s first introduction to Technical SEO. It tells Google and other search engines what they are and are not allowed to crawl on your website. It’s also an important part of any Enterprise SEO audit, or any audit for that matter.

For example, if you do not want a section of your website to show up in Google’s results, you can configure robots.txt to stop crawlers from visiting that section.
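Here’s a minimal sketch of that idea, where the /private-section/ path is just a placeholder for whatever part of your site you want crawlers to skip:

# Apply to all crawlers
User-agent: *
# Ask them not to crawl this directory
Disallow: /private-section/

One nuance worth knowing: robots.txt controls crawling rather than indexing, so a blocked URL can still appear in results if other sites link to it. If a page must never show up at all, a noindex meta tag on the page itself is the more reliable option.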

Is there a robots.txt you can just use, without having to learn the syntax, that includes your sitemap?

Sure! Just use this one:

# Apply these rules to every crawler
User-agent: *
# Allow everything to be crawled
Allow: /
# Point crawlers at your XML sitemap
Sitemap: https://www.yourwebsite.com/sitemap.xml

This tells every crawler it may visit all of your pages and points it to your sitemap, helping Google find and index them.

Why would you want to Disallow something from being indexed?

It’s pretty common to have pages you do not want Google to crawl. For example, your crawl budget can easily get used up by pages you never wanted to rank in search engines in the first place.

Here are some use cases (a sample robots.txt follows the list):

  • If crawl bots spend too much time on unwanted pages, it can hurt the rankings and traffic of the pages you actually care about.
  • You are building out a section of your website that is not ready for viewers.
  • The checkout page on your eCommerce store.
  • If you have a paid online course, you might want to block that entire section from being indexed. After all, you do not want people viewing your paid content for free.
  • Any premium pages; if you have a paid online tool, you might want to block some pages.
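Here’s what a few of those use cases might look like in one file; a hedged sketch where all the paths are hypothetical placeholders for your own site’s structure:

User-agent: *
# Section still being built out
Disallow: /coming-soon/
# eCommerce checkout flow
Disallow: /checkout/
# Paid course and premium tool pages
Disallow: /course-members/
Disallow: /premium-tools/

One caveat on the paid-content use case: robots.txt only asks crawlers to stay away. It does not restrict access, so paid content still needs real authentication behind it.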

Is there a code example you can provide for Disallow?

Yup! Right here:

# Apply these rules to every crawler
User-agent: *
# Allow everything else to be crawled
Allow: /
# Keep crawlers out of this folder
Disallow: /block-this-folder/
# Point crawlers at your XML sitemap
Sitemap: https://www.yourwebsite.com/sitemap.xml

What are the basic commands you’d need to learn if you want to dig deeper?

  • User-agent: identifies which crawler the rules apply to.
  • Allow: a URL path that may be crawled.
  • Disallow: a URL path that may not be crawled.
  • Sitemap: the complete URL of a sitemap.
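Here is a sketch that puts all four directives together, including a separate group for a specific crawler (Googlebot is a real user-agent; the paths are placeholders):

# Rules for every crawler without its own group
User-agent: *
Disallow: /search/

# A separate group just for Google's main crawler
User-agent: Googlebot
Allow: /search/public/
Disallow: /search/

Sitemap: https://www.yourwebsite.com/sitemap.xml

A crawler obeys only the most specific group that matches its user-agent, and within a group Google follows the longest matching path, which is why the Allow line can carve an exception out of the Disallow.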

If you are super techie, we suggest having a look at Google’s documentation. Ahrefs also has an amazing guide.

Robots.txt Tool – Which Should You Use?

We like these two:

How Do You Create One for WordPress?

We recommend using Yoast SEO or a similar plugin; it creates the file automatically and also makes it very easy to edit.
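For reference, the virtual robots.txt that a default WordPress install serves at /robots.txt looks roughly like this (plugins like Yoast may add to it):

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

The Allow line exists because admin-ajax.php is used on the front end, so blocking all of /wp-admin/ without that exception could interfere with how Google renders some pages.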

Once it’s created, what should you do?

You should submit it to Google Search Console. We also have quite a lot more information on our SEO overview and training page.

Final Thoughts

The robots.txt is a very powerful tool if you know how to use it. When you first start out it might be a little confusing, and frankly it isn’t something you should focus on. If you are new, simply create a simple one and submit it to Google Search Console.

It really only comes into play for more advanced sites. 
