Next JS Robots.txt

When developing a site with Next.js, it’s important to consider how your robots.txt file will be generated. Just like the sitemap and the metadata tags on your site, the robots.txt file is an important factor in Next.js SEO, and Next.js offers some great tooling to facilitate the process of building one.



What is the robots.txt file for?

Before configuring your Next.js project to build a robots file, it’s important to understand what the robots file actually is and what it’s used for. In a nutshell, much like metadata tags, the robots.txt file is used to tell search engine crawlers which pages they can access on your site.

Many developers and SEO practitioners think this means you’re essentially telling the search engine it can’t index certain pages, but that’s not necessarily the case. To quote the Google documentation:

“[robots.txt] is not a mechanism for keeping a web page out of Google. To keep a web page out of Google, block indexing with noindex or password-protect the page.”

Essentially, Google can still see your page and index it; it just won’t crawl it and retrieve data such as the meta description. For this reason, robots.txt is used mainly to limit crawler traffic on your site, in order to avoid potentially overloading it with page requests.
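If your actual goal is to keep a page out of the index, the mechanism Google recommends is a noindex robots meta tag rather than a robots.txt rule. As a minimal sketch, Next.js lets you emit that tag per page via the App Router’s Metadata API (the page path here is just a hypothetical example):

```javascript
// app/private/page.js — a hypothetical page that asks search engines not to index it
export const metadata = {
  robots: {
    index: false, // renders a <meta name="robots"> tag containing "noindex"
    follow: true, // links on the page may still be followed
  },
};

export default function PrivatePage() {
  return <h1>This page is excluded from search indexes</h1>;
}
```

Note that for noindex to be seen at all, the page must *not* be blocked in robots.txt — the crawler has to fetch the page to read the tag.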

What does the robots.txt look like?

The robots.txt file uses a standard known as the robots exclusion protocol, and the syntax is fairly simple to follow. A basic robots.txt file will look something like this:

User-agent: Googlebot
Disallow: /nocrawl/

User-agent: *
Allow: /


The robots.txt file is broken up into rules. This particular example contains two rules:

  • Rule 1: Disallow Googlebot from crawling anything within the /nocrawl directory.
  • Rule 2: All other user agents can crawl the entire site.

It’s important to note that if rule 2 were omitted, the result would be the same: the robots exclusion standard defaults to allowing all user agents to crawl the site in question. The options for rules are fairly extensive; if you’re interested in taking a closer look, head on over to the Google docs for a more extensive guide.
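The rule-matching logic above can be sketched in a few lines of JavaScript. This is not a full robots.txt parser — just an illustration (with hypothetical names like `isAllowed`) of the core idea that rules are grouped by user agent and the longest matching path prefix wins, with "allow everything" as the default:

```javascript
// Minimal sketch of how a crawler decides whether it may fetch a path:
// pick the rule group for the agent (falling back to the "*" group),
// then let the longest matching path prefix win.
function isAllowed(rules, userAgent, path) {
  const group = rules[userAgent] || rules['*'] || [];
  let best = { length: -1, allow: true }; // default: crawling is allowed
  for (const { allow, prefix } of group) {
    if (path.startsWith(prefix) && prefix.length > best.length) {
      best = { length: prefix.length, allow };
    }
  }
  return best.allow;
}

// The example rules from above, expressed as data:
const rules = {
  Googlebot: [{ allow: false, prefix: '/nocrawl/' }],
  '*': [{ allow: true, prefix: '/' }],
};

console.log(isAllowed(rules, 'Googlebot', '/nocrawl/page')); // false
console.log(isAllowed(rules, 'Googlebot', '/blog/post'));    // true
console.log(isAllowed(rules, 'Bingbot', '/nocrawl/page'));   // true
```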

Where does robots.txt go in a Next.js project?

For any search engine to be able to access the robots file, it must be located at the root of the site. Next.js projects have a /public directory that is used to serve static files, and files placed there are served from the root URL of your site.

This means that you should always put the robots.txt file in the /public directory of your Next.js project. This will result in your robots file being served out in the proper location, at the root of your site (e.g. yourdomain.com/robots.txt).

It can be tedious to remember to include the robots file on each build, which is why many developers doing SEO with Next.js opt for the next-sitemap module to automate the process of generating and configuring the file.

Using next-sitemap for your robots.txt file

As the name implies, next-sitemap is a module that handles the creation and maintenance of sitemap files. However, this package can also generate robots.txt files.

Setting up next-sitemap is fairly simple. Once your Next.js project is up and running, run the following command to install it:

npm i next-sitemap

Now, set up your configuration file to include the generation of a robots.txt file. At the root of your project, create a file called next-sitemap.config.js and add the following code:

const config = {
    siteUrl: '', // set this to your site's URL
    generateRobotsTxt: true, // generate a robots.txt file alongside the sitemap
};

module.exports = config;

Next, you can simply build your project and run the next-sitemap script (defined in your package.json) like so:

npm run build


npm run next-sitemap

And that’s it! You should now have a generated robots.txt file in your public directory.
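For the `npm run next-sitemap` command to work, a script by that name has to exist in your package.json. A minimal scripts section might look like the sketch below (the script names here are just one reasonable setup); the next-sitemap docs also suggest naming the script `postbuild` instead, so that generation runs automatically after every `npm run build`:

```json
{
  "scripts": {
    "dev": "next dev",
    "build": "next build",
    "next-sitemap": "next-sitemap"
  }
}
```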

Configuring Rules in Your robots.txt File With next-sitemap

Creating the robots file was easy with next-sitemap, but that’s not very useful if we can’t add some rules along with it. To create rules within our robots file, we can modify our next-sitemap config file to include the robotsTxtOptions and policies properties.

A next-sitemap file that adds a rule for disallowing all user agents from crawling a specific directory looks like this:

const config = {
    siteUrl: '', // set this to your site's URL
    generateRobotsTxt: true,
    robotsTxtOptions: {
        policies: [
            { userAgent: "*", disallow: "/nocrawling" },
        ],
    },
};

module.exports = config;

Just remember: when using this method, each policy object within the array equates to an individual rule in the robots file. Pretty simple!
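With a policy like the one above, the generated public/robots.txt should look roughly like this (next-sitemap also appends Sitemap entries based on your configured siteUrl; the exact comments and ordering in the output may vary by version):

```
User-agent: *
Disallow: /nocrawling

Sitemap: https://yourdomain.com/sitemap.xml
```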

Next.js robots.txt | Overview and Conclusion

Creating a robots.txt file within your web app is as simple as adding the robots file to the /public directory, adding some rules, and building your project. The key here is understanding what the robots file is for, and how you should configure it for your particular needs.

Utilizing the next-sitemap module can help streamline the process of creating a robots file. The next-sitemap module accommodates the programmatic generation of the robots file, which allows developers the flexibility to automate the process of keeping this file up to date.

Overall, just like many aspects of development, Next.js makes this task simple and straightforward. Whether you opt for the manual or automated method, hopefully you can now feel confident when using Next.js to create your robots file.

