robots.txt

SEO

Also: Robots Exclusion Protocol · Robots Exclusion Standard

What it is: A file that tells crawlers what to skip

Watch for: Blocking pages you need indexed

Lives at: yoursite.com/robots.txt

Not a security tool: Polite instruction, not a lock

Quick definition

robots.txt is a plain text file that sits at the root of your website and tells search engine crawlers which pages or sections they should and should not crawl. It follows the Robots Exclusion Protocol, a widely adopted convention that well-behaved crawlers respect.

How it varies across Australia

Most Australian websites have a robots.txt file in place, but misconfigured ones are common. The most frequent issues are a staging or development environment being crawled by accident, and a legitimate section of the site being blocked during a migration and never unblocked.

What it actually means

robots.txt is a convention, not a technical barrier. When Googlebot arrives at your site, it checks the file at yoursite.com/robots.txt before it crawls anything. If the file says a section is off-limits, Googlebot skips it. Most well-behaved crawlers follow the same convention.

The file uses simple directives. User-agent specifies which crawler a rule applies to. Disallow specifies a path the crawler should skip. Allow explicitly permits a path within a disallowed section. A sitemap directive can point to your XML sitemap.
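The directives above can be sketched with Python's standard-library parser. This is a minimal illustration, not a recommended configuration: the robots.txt content, the paths and the `allowed` helper are all hypothetical. The Allow line is listed before the Disallow it carves an exception from because Python's parser applies the first matching rule, whereas Google applies the most specific one; with this ordering both readings agree.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for yoursite.com; every path here is illustrative.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/
Disallow: /search

Sitemap: https://yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may crawl `path` under the rules above."""
    return parser.can_fetch(agent, "https://yoursite.com" + path)


for path in ["/products/", "/admin/", "/admin/help/faq", "/search?q=shoes"]:
    print(path, "crawlable" if allowed(path) else "blocked")

# site_maps() needs Python 3.8 or later.
print(parser.site_maps())
```

Only /admin/ and /search?q=shoes come back as blocked. Python's parser does simple prefix matching and does not understand the `*` and `$` path wildcards that Google supports, so treat it as a rough check rather than a faithful model of Googlebot.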

Two things robots.txt cannot do: guarantee a page won't be indexed, and keep determined bad actors out. If another site links to a disallowed URL, Google can still index that URL based on the link alone; it just won't crawl the page content. For pages you genuinely need kept out of search results, a noindex meta tag on the page itself is the reliable tool, and the page must not be disallowed in robots.txt, because Google has to crawl the page to see the tag.

robots.txt is most useful for keeping crawl budget on pages that matter: stopping Googlebot from wasting time on internal search results, filter pages, admin paths and duplicate content.

A robots.txt file is a polite notice on the front door. It keeps honest crawlers out of rooms you don't want crawled. It does nothing to stop someone who ignores signs.

How it shows up

A well-configured robots.txt is invisible in your analytics because it's working quietly in the background. It shows up in Google Search Console under Settings, in the robots.txt report, where you can see the file Google last fetched and any errors it found; the URL Inspection tool tells you whether a specific URL is blocked. A misconfigured robots.txt shows up as sudden ranking drops, de-indexing warnings in Search Console, or a site that disappears from Google after a relaunch.

Where people get this wrong

Using robots.txt to hide sensitive content. robots.txt is public and listing a disallowed path tells anyone who reads it exactly what you're trying to hide. A noindex tag or authentication is the right tool for genuinely private pages.

Blocking CSS and JavaScript in robots.txt. Googlebot needs to render your pages to understand them. Blocking the scripts and stylesheets that make a page work prevents Google from seeing it the way a user does, which can tank your rankings.

Forgetting to update robots.txt after a site migration. Disallow rules added to block a staging environment or old URL pattern often carry over into the live site and quietly block pages that should be indexed.
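The last two mistakes lend themselves to an automated check after a launch. The sketch below is one possible approach, assuming you keep a short list of URLs that must stay crawlable; the `blocked_urls` helper, the URL list and the leftover rules are hypothetical examples. It uses Python 3.9+ type hints and the standard-library parser, which does not handle Google's `*` and `$` path wildcards.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs in `urls` that `agent` may not crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Hypothetical list of URLs that must stay crawlable after launch,
# including the stylesheet and script Googlebot needs for rendering.
MUST_BE_CRAWLABLE = [
    "https://yoursite.com/",
    "https://yoursite.com/services/",
    "https://yoursite.com/assets/site.css",
    "https://yoursite.com/assets/app.js",
]

# A staging rule that was never reversed blocks the whole site.
LEFTOVER_STAGING_RULES = "User-agent: *\nDisallow: /\n"

for url in blocked_urls(LEFTOVER_STAGING_RULES, MUST_BE_CRAWLABLE):
    print("BLOCKED:", url)
```

With the leftover staging rule in place, every URL in the list is reported as blocked. In practice you would pass in the text of the live file rather than a hard-coded string.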

Common questions

Does robots.txt stop Google from indexing a page?

Not reliably. Blocking crawling and blocking indexing are different things. Google can still index a disallowed URL if other sites link to it. To keep a page out of search results, put a noindex meta tag on the page itself and leave the page crawlable so Google can read the tag.

What happens if I don't have a robots.txt file?

Crawlers treat the absence of the file as permission to crawl everything. That's fine for most sites. Where it causes problems is when there are internal search pages, filter URLs or admin paths that add noise to your crawl without contributing to rankings.

How do I check what my robots.txt file says?

Visit yoursite.com/robots.txt directly in a browser. In Google Search Console, go to Settings and open the robots.txt report to see the version Google last fetched. To check whether an individual URL is blocked, run it through the URL Inspection tool.

Can I have different rules for different crawlers?

Yes. Each rule block starts with a User-agent line. You can write rules specific to Googlebot, Bingbot or any other named crawler, and a wildcard block using User-agent: * covers every crawler you haven't named explicitly. A crawler that matches a named block follows only that block, so repeat any shared rules there.
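As a rough illustration of per-crawler blocks, the sketch below parses a hypothetical file with Python's standard-library parser. The paths, the "ExampleBot" name and the `can_crawl` helper are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one block per named crawler plus a wildcard fallback.
GROUPED_RULES = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: Bingbot
Disallow: /drafts/
Disallow: /archive/

User-agent: *
Disallow: /private/
"""

group_parser = RobotFileParser()
group_parser.parse(GROUPED_RULES.splitlines())


def can_crawl(agent: str, path: str) -> bool:
    """Return True if `agent` may crawl `path` on the example site."""
    return group_parser.can_fetch(agent, "https://yoursite.com" + path)


for agent in ["Googlebot", "Bingbot", "ExampleBot"]:
    for path in ["/archive/2020/", "/private/"]:
        print(agent, path, "crawlable" if can_crawl(agent, path) else "blocked")
```

Note that Googlebot is allowed into /private/ here: its own block does not repeat that rule, and the wildcard block only applies to crawlers without a named block.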

About New Rebellion

New Rebellion is a marketing intelligence consultancy. We build tools, score Australian businesses on how their marketing actually performs, and publish Debrief every day. This dictionary is part of how we work in the open.
