robots.txt
SEO
Also: Robots Exclusion Protocol · Robots Exclusion Standard
Quick definition
robots.txt is a plain text file that sits at the root of your website and tells search engine crawlers which pages or sections they should and should not crawl. It follows the Robots Exclusion Protocol, a widely adopted convention that well-behaved crawlers respect.
How it varies across Australia
Most Australian websites have a robots.txt file in place, but misconfigured ones are common. The two most frequent issues are a staging or development environment being crawled by accident, and a legitimate section of the site being blocked during a migration and never unblocked afterwards.
What it actually means
robots.txt is a convention, not a technical barrier. When Googlebot arrives at your site, it checks the file at yoursite.com/robots.txt before it crawls anything. If the file says a section is off-limits, Googlebot skips it. Most well-behaved crawlers follow the same convention.
The file uses simple directives. User-agent specifies which crawler a group of rules applies to. Disallow specifies a path prefix the crawler should skip. Allow explicitly permits a path within a disallowed section. A Sitemap directive can point to your XML sitemap.
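The four directives above can be seen working together in a short sketch. It uses Python's standard-library urllib.robotparser; the domain www.example.com, the paths and the helper name allowed are assumptions made up for illustration. One caveat: Python's parser applies the first rule that matches, while Google applies the most specific (longest) matching rule, so the Allow line is placed before the Disallow line here to get the same answer under both.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may crawl `path` under the rules above."""
    return parser.can_fetch(agent, "https://www.example.com" + path)


print(allowed("/admin/settings"))  # False: matches Disallow: /admin/
print(allowed("/admin/help/faq"))  # True: matches Allow: /admin/help/
print(allowed("/products/"))       # True: no rule matches, so crawling is allowed
print(parser.site_maps())          # ['https://www.example.com/sitemap.xml'] (Python 3.8+)
```

This only tells you what a compliant crawler would do with the file. It says nothing about crawlers that ignore the convention.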
Two things robots.txt cannot do: guarantee a page won't be indexed, and keep determined bad actors out. If another site links to a disallowed URL, Google can still index that URL based on the link alone; it just won't crawl the page content. For pages you genuinely need kept out of search results, a noindex meta tag on the page itself is the reliable tool. The page must stay crawlable for this to work: if robots.txt blocks it, Googlebot never fetches the page and never sees the noindex.
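To check whether a page actually carries a noindex instruction, something like the following sketch can help. It uses Python's standard-library html.parser; the class name RobotsMetaParser and the function has_noindex are hypothetical, and it only looks at meta tags named "robots", not crawler-specific tags or the X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))  # True
```

A crawler can only act on this tag if it is allowed to fetch the page, so the URL should not also be disallowed in robots.txt.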
robots.txt is most useful for keeping crawl budget on pages that matter: stopping Googlebot from wasting time on internal search results, filter pages, admin paths and duplicate content.
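As a rough sketch of the idea, the snippet below builds a robots.txt that keeps every crawler out of a few low-value areas and then checks the result with Python's standard-library urllib.robotparser. The function build_robots_txt, the path list and the domain are all assumptions for illustration; the right paths depend entirely on how your own site is structured. Plain path prefixes are used because Python's parser does not understand the * and $ wildcards that Google supports.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed_prefixes, sitemap_url=None):
    """Build a robots.txt that blocks the given path prefixes for all crawlers."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {prefix}" for prefix in disallowed_prefixes]
    if sitemap_url:
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


# Hypothetical low-value areas: internal search, admin and cart pages.
LOW_VALUE = ["/search", "/admin/", "/cart/"]
robots_txt = build_robots_txt(LOW_VALUE, "https://www.example.com/sitemap.xml")
print(robots_txt)

# Sanity-check the generated file before deploying it.
checker = RobotFileParser()
checker.parse(robots_txt.splitlines())
for path in ["/search?q=shoes", "/admin/login", "/products/red-shoes"]:
    verdict = checker.can_fetch("Googlebot", "https://www.example.com" + path)
    print(path, "crawlable" if verdict else "blocked")
```

Checking a handful of URLs you care about against the generated file is a cheap way to catch a rule that blocks more than intended.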
A robots.txt file is a polite notice on the front door. It keeps honest crawlers out of rooms you don't want crawled. It does nothing to stop someone who ignores signs.
How it shows up
A well-configured robots.txt is invisible in your analytics because it's working quietly in the background. It shows up in Google Search Console under Settings, in the robots.txt report, where you can see the version of the file Google last fetched. To see whether a specific URL is blocked, use the URL Inspection tool. A misconfigured robots.txt shows up as sudden ranking drops, de-indexing warnings in Search Console, or a site that disappears from Google after a relaunch.
Common questions
Does robots.txt stop Google from indexing a page?
Not reliably. Blocking crawling and blocking indexing are different things. Google can still index a disallowed URL if other sites link to it. To keep a page out of search results, put a noindex meta tag on the page itself and leave the page crawlable so Google can read the tag.
What happens if I don't have a robots.txt file?
Crawlers treat the absence of the file as permission to crawl everything. That's fine for most sites. Where it causes problems is when there are internal search pages, filter URLs or admin paths that add noise to your crawl without contributing to rankings.
How do I check what my robots.txt file says?
Visit yoursite.com/robots.txt directly in a browser. In Google Search Console, go to Settings and open the robots.txt report to see the version Google last fetched. To check an individual URL, run it through the URL Inspection tool, which reports whether robots.txt blocks it.
Can I have different rules for different crawlers?
Yes. Each rule block starts with a User-agent line. You can write rules specific to Googlebot, Bingbot or any other named crawler, and a wildcard rule using User-agent: * covers every crawler you haven't named explicitly.
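Here is a minimal sketch of per-crawler rules, again checked with Python's standard-library urllib.robotparser. The rule set, the paths, the domain and the helper verdicts are hypothetical. Note that a named crawler follows only its own block and does not also inherit the wildcard block.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: two named crawlers get their own blocks,
# and every other crawler falls through to the wildcard block.
RULES = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: Bingbot
Disallow: /drafts/
Disallow: /archive/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def verdicts(path: str) -> dict:
    """Map each crawler name to whether it may fetch `path`."""
    url = "https://www.example.com" + path
    agents = ["Googlebot", "Bingbot", "SomeOtherBot"]
    return {agent: parser.can_fetch(agent, url) for agent in agents}


print(verdicts("/archive/2020"))
# {'Googlebot': True, 'Bingbot': False, 'SomeOtherBot': False}
print(verdicts("/drafts/post"))
# {'Googlebot': False, 'Bingbot': False, 'SomeOtherBot': False}
```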
About New Rebellion
New Rebellion is a marketing intelligence consultancy. We build tools, score Australian businesses on how their marketing actually performs, and publish Debrief every day. This dictionary is part of how we work in the open.