Does Google Index Your robots.txt?

This is a contribution by Organic and AI visibility consultant Moosa Hemani.

It has been edited by Tad – the owner of this blog.

What is robots.txt?
Does Google index your robots.txt file?
Does this make sense?
Do you want to remove it?
How to de-index robots.txt?

These are the questions answered in this post!

What is robots.txt and why do you need one?

robots.txt is a protocol that tells search engine crawlers which parts of a website they may access and which they may not.

It’s also a simple text file as the name already suggests.

So for example you may want to exclude things that are meant only for you as the admin.

According to Wikipedia:

“The Robot Exclusion Standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention to prevent cooperating web crawlers and other web robots from accessing all or part of a website which is otherwise publicly viewable.”

As an SEO, you must have tried this search operator in Google: [site:example.com].

This simply returns the pages from example.com that have been crawled and included in the Google index.

Googlebot does not crawl any pages that are ‘disallowed’ by the robots.txt file. Everything makes sense so far, doesn’t it?
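To make the ‘disallowed’ idea concrete, here is a minimal Python sketch that uses the standard library module urllib.robotparser. The robots.txt content, the /admin/ path and the example.com URLs are made up for illustration. The parser only approximates how a cooperating crawler reads the rules; it is not Googlebot's own logic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (not taken from a real site):
# every crawler is asked to stay out of /admin/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
# parse() takes the file content as a list of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a crawler calling itself "Googlebot" may fetch two sample URLs.
admin_allowed = parser.can_fetch("Googlebot", "https://example.com/admin/settings")
blog_allowed = parser.can_fetch("Googlebot", "https://example.com/blog/")

print("admin area allowed:", admin_allowed)
print("blog allowed:", blog_allowed)
```

The admin URL is reported as not allowed and the blog URL as allowed, which is the behavior a cooperating crawler is expected to follow.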

When robots.txt gets indexed itself

Now here comes the issue that inspired this article.

What if your robots.txt file itself started to appear in Google search results?

To be honest, I thought somebody was poking fun at me.

It doesn’t sound logical at all.

After reading a tweet by Peter Handley aka @ismepete I took it seriously though:

@mmhemani yeah I am serious… battling with a logic problem – can I block the robots.txt in the robots.txt file?
— Pete Handley (@ismepete) September 27, 2011

He is one of the brightest minds in the search industry!

Shocked, amazed, or I guess a mix of both, I acted!

I quickly jumped over to Google to see it for myself and guess what I found?

Peter is not the only one dealing with this. Websites like

Dailymail
Webmasterworld
Last.fm

and many others all have their robots.txt file indexed in Google.
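If you want to run the same kind of check for your own site, the search URL could be assembled as in this small Python sketch. The domain example.com and the combination of the site: operator with inurl: are assumptions for illustration; the post does not state which exact query was used.

```python
from urllib.parse import urlencode


def robots_search_url(domain: str) -> str:
    """Build a Google search URL that looks for indexed robots.txt URLs on a domain."""
    # Assumption: site: limits results to the domain, inurl: narrows them to robots.txt.
    query = f"site:{domain} inurl:robots.txt"
    return "https://www.google.com/search?" + urlencode({"q": query})


url = robots_search_url("example.com")
print(url)
```

Opening the printed URL in a browser shows whether Google currently lists the file for that domain.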

You see, it’s simply illogical to block ‘robots.txt’ in a robots.txt file.

This didn’t make any sense to me:

Why does Google actually index this file and how do you de-index it from the search engine?

Why does Google index the robots.txt?

There can be multiple reasons why Google indexes the robots.txt file.

Yet I have identified two that seem to be the most common.

So why do search engines index particular pages and later show them as results for a query?

Why does Google Search show your robots.txt even when you don’t want it to be indexed?

Links

Google follows links, you know it, right? From one link to another and the chain continues.

When links point to the robots.txt file, Google will probably index it.

The links can come from external sources (other websites pointing to your robots.txt file).

Internal links count as well (some page of your own website that points to the robots.txt file).
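One rough way to look for internal links to robots.txt is to scan a page's HTML for anchors that resolve to /robots.txt. The Python sketch below uses only the standard library; the class name RobotsLinkFinder, the helper find_robots_links, the sample HTML and the example.com URLs are all hypothetical. It inspects a single page that you pass in as a string and does not crawl a site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class RobotsLinkFinder(HTMLParser):
    """Collects links from <a> tags that resolve to a /robots.txt URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the URL of the page being scanned.
        absolute = urljoin(self.base_url, href)
        if urlparse(absolute).path == "/robots.txt":
            self.matches.append(absolute)


def find_robots_links(html, base_url):
    """Return every link in the HTML that points to /robots.txt."""
    finder = RobotsLinkFinder(base_url)
    finder.feed(html)
    return finder.matches


# Made-up page content for illustration.
SAMPLE_HTML = """
<p><a href="/about">About</a></p>
<p><a href="/robots.txt">Our robots.txt</a></p>
<p><a href="https://example.com/robots.txt">Same file, absolute link</a></p>
"""

links = find_robots_links(SAMPLE_HTML, "https://example.com/blog/post")
print(links)
```

Both the relative and the absolute link in the sample are reported, while the link to /about is ignored.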

Social signals

The fastest way I know to get Google’s attention for a page is to share it on social platforms.

Google checks the likes of X/Twitter and Facebook (it currently can’t see private Facebook sharing activity).

Did someone share your robots.txt on social sites?

This can be another common reason that makes Google index the file.

Consider Rishi Lakhani, who wrote a letter to Google in his website’s robots.txt file.

He shared his creative robots.txt on Twitter and it went viral.

According to Shared Count, Rishi Lakhani’s robots.txt file got:

Facebook Likes: 21
Facebook Comments: 8
Facebook Shares: 33
Twitter: 1232

Now you know why Google will probably index your robots.txt file, so let’s talk about action!

How to keep robots.txt out of Google?

There are two ways to keep robots.txt out of the Google search index.

One is not to get it indexed in the first place.

The other one is to de-index it when it already appears.

Don’t link, don’t share

Just don’t link to it or share it on social media (a share is usually a link too).

This is not always in your control!

That is especially true for popular websites like the “Webmasterworld” forum or Last.fm, where other people link to specific pages.

Theoretically though, if you don’t link to it and don’t share it on social platforms, Google will not show it.

URL removal request

Google allows you to remove certain files from the Google index in Search Console.

That’s the only idea I have found!

It’s a simple yet powerful and safe way to get your robots.txt file out of the Google index.

It’s great because site ownership is verified and the tool even shows the progress of each request.

These are the two ways I know to deal with the issue of an unintentionally indexed robots.txt file.

What else?

Do you have a better solution for the robots.txt indexing problem?

Please share it with the SEO community!

Ping us on social media!

Moosa is mostly active on LinkedIn these days.

Tad is @onreact almost everywhere on the Web.
