Internal Site Search Results: Noindex Block of Googlebot

by Mike Valentine on August 4, 2020

In 2007, Matt Cutts of Google clearly stated that they recommend not allowing Googlebot to crawl and index internal search results. He referenced a line in the Google Webmasters Quality Guidelines which says “Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don’t add much value for users coming from search engines.”

Current practice has evolved to use a “noindex, follow” robots meta instruction on all internal search result pages. This ensures that if Google finds external links to your search pages, the links exposed within those internal search results pages are still crawled, but the search result page itself will not get indexed.
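As a minimal sketch of that instruction, the helper below returns the robots meta tag for a search results page, plus the equivalent X-Robots-Tag response header. The function names and the boolean flag are hypothetical; how your templates know a page is a search result depends on your platform.

```python
# Hypothetical helpers: names and structure are illustrative only.
ROBOTS_DIRECTIVE = "noindex, follow"


def robots_meta_tag(is_search_results_page: bool) -> str:
    """Return the robots meta tag for the page head, or "" if none is needed."""
    if is_search_results_page:
        return f'<meta name="robots" content="{ROBOTS_DIRECTIVE}">'
    return ""


def robots_http_header(is_search_results_page: bool) -> dict[str, str]:
    """Return the same instruction as an HTTP response header instead."""
    if is_search_results_page:
        return {"X-Robots-Tag": ROBOTS_DIRECTIVE}
    return {}


print(robots_meta_tag(True))
```

Either form should be enough on its own; the header variant can be useful when you cannot easily edit the page template.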

How Do Your Search Results Pages Get Indexed?

If the URL pattern of your search pages generates a 200 OK header response, and that URL pattern or its query strings are not canonicalized, Google has likely indexed your search result pages. I’ve seen Google probe for a directory index.html or index.php page within the /search/ or /s/ directory. If the resulting page produces a 200 OK header response, they index that page.

Do a search on your own site using your own internal search box and record the URL pattern. It will likely include “/s?” or “/search/” or “/results?”. Without a matching query, the page at that URL usually shows little more than the top header and footer and a “No Results” or “Phrase not found” message. That is what gets indexed. This is a bad user experience, which is especially problematic when that useless page brings traffic from Google.
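Once you have recorded the pattern, a small check like the one below can flag which URLs in a crawl or log file are internal search pages. This is only a sketch: the path prefixes and query keys are assumptions based on the common patterns mentioned above, so swap in whatever your own search box produces.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed patterns; replace with the ones your own site search generates.
SEARCH_PATH_PREFIXES = ("/s/", "/search/", "/results/")
SEARCH_QUERY_KEYS = ("s", "q")


def is_internal_search_url(url: str) -> bool:
    """Return True if the URL looks like an internal search results page."""
    parts = urlsplit(url)
    # Treat "/s" and "/s/" alike by normalizing to a trailing slash.
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    if any(path.startswith(prefix) for prefix in SEARCH_PATH_PREFIXES):
        return True
    query = parse_qs(parts.query, keep_blank_values=True)
    return any(key in query for key in SEARCH_QUERY_KEYS)


for url in ("https://example.com/s?k=gray+green", "https://example.com/shoes/red"):
    print(url, is_internal_search_url(url))
```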

Search results can get indexed for several reasons: users who externally link to their own search results on your site, uninformed web production teams who use site search result links in site navigation, glitches in code that let unintended links return 200 OK header responses (aka “Soft 404”), or any number of other odd causes. Whether it happens by accident or by error, it is best to stop internal search results from getting indexed by using a “noindex, follow” robots meta instruction on all internal search result pages.

Go to Google and use the “site:” query operator to see if your internal search results pages are indexed. At the date of publication, Amazon had over 2 million search result pages indexed. They always seem to go their own way, and this doesn’t seem to be hurting their Google rankings, but you can easily see how ineffective those Amazon search result pages are by running a site:amazon.com/s? query at Google. You’ll see pages and pages of one-, two- and three-word terms with no useful information about the product displayed in the Google search results page snippets.
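If you check several sites or patterns regularly, the query can be built as a link. The sketch below only assembles the URL for a “site:” query; the function name is hypothetical, and it does not fetch or parse anything from Google.

```python
from urllib.parse import urlencode


def site_query_url(domain: str, search_path: str) -> str:
    """Build a Google search URL for a site: query on a search URL pattern."""
    query = f"site:{domain}{search_path}"
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_query_url("amazon.com", "/s?"))
```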

The Problem with Pagination of Internal Search Results

Another concern is pagination of internal search results pages, which can be extensive. If your team has used search results to link within your own site, or worse, in navigation links or in faceted navigation based on search, then you’ll find those search links get indexed by search engines very quickly. Again, Google may truncate search result URLs to the last directory and look for an index page, or truncate query strings (assuming, of course, that you’ve limited URL patterns in Google Search Console).
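To test this on your own site, you can generate the truncated versions of a search URL and check what each one returns. The sketch below is an assumption about which variants are worth checking (query string removed, then the parent directory); it is not a description of how Googlebot actually behaves.

```python
from urllib.parse import urlsplit, urlunsplit


def truncated_variants(url: str) -> list[str]:
    """List truncated forms of a search URL worth checking for a 200 OK."""
    parts = urlsplit(url)
    variants = []
    # Variant 1: the same path with the query string removed.
    if parts.query:
        variants.append(urlunsplit((parts.scheme, parts.netloc, parts.path, "", "")))
    # Variant 2: the parent directory of the path (the site root is skipped).
    trimmed = parts.path.rstrip("/")
    parent = trimmed[: trimmed.rfind("/") + 1]
    if parent and parent != "/" and parent != parts.path:
        variants.append(urlunsplit((parts.scheme, parts.netloc, parent, "", "")))
    return variants


print(truncated_variants("https://example.com/search/gray-green/?page=3"))
```

Request each variant and note the header response; any that return 200 OK with an empty results page are candidates for the “noindex, follow” instruction.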

You should also check Google Analytics to see if your site gets traffic from Google to search results pages. Look at visits from organic search in the “Acquisition” area of Analytics, then filter “Landing Page” by your search URL pattern, for example “/s?” or “/search/”.
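The same filter can be applied to an exported report. This is a sketch that assumes a CSV export with hypothetical column names landing_page and sessions; adjust both, and the markers, to match your own export.

```python
import csv
import io

# Assumed search URL markers, taken from the examples above.
SEARCH_MARKERS = ("/s?", "/search/")


def search_landing_sessions(csv_text: str) -> dict[str, int]:
    """Sum organic sessions per landing page that matches a search pattern."""
    totals: dict[str, int] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        page = row["landing_page"]  # hypothetical column name
        if any(marker in page for marker in SEARCH_MARKERS):
            totals[page] = totals.get(page, 0) + int(row["sessions"])
    return totals


sample = "landing_page,sessions\n/s?k=gray+green,12\n/rugs/wool,40\n/search/gray/,3\n"
print(search_landing_sessions(sample))
```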

Google tries to limit how many of your internal search results appear in its own search results, so once paginated pages are recognized, the algorithm already limits them and does not show them.

Problems with Faceted Navigation

Color or size variations in faceted navigation on category pages are often to blame for the indexing of large numbers of search pages. Internal search results pages are often programmed to print the query the user typed into the results page title tag, and sometimes into an on-page H1 tag as well. Imagine you search for a color combination like “gray green” (product name). The result very helpfully displays products tagged with those colors and shows “Gray Green” in the page title and H1 tag.

Depending on your URL patterns for category pages, Google’s results sometimes show that query under different categories: the query now displays in the breadcrumbs, but the title tag is exactly the same. All products with those color combos therefore compete against each other for the two-word phrase “Gray Green”, which nobody would intentionally search to find your product name. This produces potentially dozens of pages on a small site, or potentially millions of “Gray Green” results on a large enterprise-level e-commerce site, where they’ll be indexed by Google with many pages of matching title tags.
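Matching title tags are easy to spot once you have a list of URLs and their titles from a site crawl. A minimal sketch, assuming you already have that mapping (the function name and the sample URLs are hypothetical):

```python
from collections import defaultdict


def duplicate_titles(pages: dict[str, str]) -> dict[str, list[str]]:
    """Group URLs by normalized title and keep titles used more than once."""
    by_title: dict[str, list[str]] = defaultdict(list)
    for url, title in pages.items():
        # Normalize case and whitespace so "Gray Green" matches "gray green ".
        by_title[title.strip().lower()].append(url)
    return {t: sorted(urls) for t, urls in by_title.items() if len(urls) > 1}


crawl = {
    "/rugs/s?k=gray+green": "Gray Green",
    "/clothing/s?k=gray+green": "gray green ",
    "/rugs/wool": "Wool Area Rugs",
}
print(duplicate_titles(crawl))
```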

Self-competition with paginated results for a non-searched term “Gray Green” guarantees it will be ignored by Google and won’t capture meaningful visitors if it ever delivers any. Someone may search for “Gray Green (your product)” – but that internal search result page is never going to rank in Google results with just the colors displayed in your title tag.

If your intention is to capture search traffic on long-tail terms, this must be done in a way that Google won’t see as internal search results, as they have a clearly stated distaste for them. So if you insist on allowing these pages to get indexed, it can be done in a way that makes the pages worth indexing and may rank them well, but of course it is more complex to do so.

Effective Search Result Pages for Google Indexing and Worthwhile Rankings:

Title tags of internal search results pages should include the complete search term, such as “Gray Green (your product name)”; variants should include the named category in the title tag.
A rules-based meta description should be created for search results pages to distinguish them from one another. It can be as simple as “Search term – Product Name, Product Category”.
Never paginate result pages for the same term: keep to a single page of fewer than 50 results, and if paginated pages do exist, make them canonical to page one.
URL patterns should not be identifiable as search across all search terms (Google looks for /s/ or /search/ or /results/, or for query strings which include those patterns, such as domain.com/?s=).
Provide an option in the CMS or back end to add a descriptive block of text to successful pages.
Link internally: search result pages should be linked from product detail pages and some category pages. Google sees pages without internal links as orphaned and devalues them.
Add value by including related pages, images, articles, etc. that are relevant to the search query.
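The title, meta description and canonical rules in the list above can be sketched as simple template functions. Everything here is an assumption for illustration: the function names, the “page” query parameter, and the plain hyphen used as a separator are not taken from any particular platform.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

PAGE_PARAM = "page"  # hypothetical pagination parameter name


def title_tag(term: str, product: str, category: str) -> str:
    """Complete search term plus product name, with the category appended."""
    return f"{term} {product} - {category}"


def meta_description(term: str, product: str, category: str) -> str:
    """Rules-based description: search term, product name, product category."""
    return f"{term} - {product}, {category}"


def canonical_url(url: str) -> str:
    """Point paginated pages at page one by dropping the pagination parameter."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != PAGE_PARAM]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(title_tag("Gray Green", "Wool Rug", "Area Rugs"))
print(canonical_url("https://example.com/rugs/gray-green-wool/?page=3"))
```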

Being deceptive is not recommended. The preferred method is instead to create useful and relevant landing pages which are widely linked across relevant pages of your own site. Provide significant descriptive text and discuss important information that relates tightly to the page title, which presumably targets important terms that you want to rank for.

Not “Gray Green Wool” – one of two Amazon internal search results links found in the top ten results at Google. Amazon has made this a search for “Area Rugs” implying they are worth more than clothing or more popular than tapestries. Only two of the results of the 48 rugs reasonably match the colors included in the search term.

It’s Best to Devote Real Effort

Don’t be lazy and allow search results to refer dribbles of traffic to search result pages on your own site that have little value for visitors. It’s worth more to devote time to creating useful landing pages that convert visitors than to fill Google with meaningless search phrases made up of color combinations or style variations.

Mike Valentine is an SEO consultant working with Retail E-commerce, Publishers, and Business to Business clients nationally and internationally.
