Clean Up Junk Pages & Increase Domain Authority
Very often, those viewing a declining indexed-pages graph are horrified that something has gone terribly wrong. It seems counter-intuitive at first glance that a decline could be positive. But surprise! Reducing the number of pages that Google has indexed on your site can produce a gain in domain authority (see the graph for details).
This has been my initial focus for every enterprise client over the past 10 years. Let’s dive into the details to understand why an apparent loss is actually a gain in this case.
Too many pages in the Google index usually trace back to five causes
1. On-site Search Results pages – Google hates them and has penalized large sites with excessive search pages in the index. Block internal search pages with a robots.txt Disallow rule and/or robots meta instructions in the head section of each search results page. Worst case, your site exposes crawlable links to search pages; correct this by unlinking them via JavaScript (a technique Moz.com has described) or by noindexing them. Sometimes those search results links are created by content teams posting internal search links in social media, because it’s easier to link to a “search url” than to navigate to the proper landing page and copy that long URL.
2. URL parameters and query strings generated on category pages by faceted navigation filters (which change category page display or sort order) can easily result in 5X duplicated content and diffuse each category page’s authority. A good content management system uses parameters only rarely, but that CMS may be managed by a product team unaware of the SEO harm caused by query strings appended to every URL. Help the team understand the issue, then head to Google Search Console to tell Google how to handle those parameters and prevent them from being indexed.
3. Broken links on legacy pages left behind after site rebuilds or updates to URL structure inflate error counts and create a horrible user experience when visitors hit 404 errors clicking internal links on your own site. Legacy URLs tend to accumulate on any site that has been through significant overhauls or redesigns during its lifespan. If no SEO advice was available during those relaunches, you may have a significant catalog of old URLs to map on your SEO plate. Remediating legacy URLs with 301 Permanent redirects must be a priority for every SEO.
4. Temporary 302 redirects are sometimes set as the default when technical fixes are made by engineering teams not focused on SEO – switch them to 301 Permanent redirects to de-index the original pages and transfer authority to the new landing page. Run a crawl of the site and record all 302 redirects, then get them changed to 301s ASAP, then offer training to the engineering team to explain that 302 redirects are intended to be Temporary and 301 redirects Permanent. They get it and will make corrections once they are aware of the SEO ramifications.
5. Thin Content – Enterprise sites generate a lot of fluff when default settings produce pages with only titles and no content, or pages with less than a full paragraph of text. These need to be noindexed. Tag pages with one or two entries, empty entities, unnecessary map coordinates, unneeded store locations, sub-subcategories with only a product or two, etc. There are many reasons enterprise systems can generate near-empty pages. Stop generating them where you can; when they can’t be prevented, apply noindex robots meta instructions.
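To make the blocking in points 1 and 5 concrete, here is a minimal sketch of the two mechanisms. The `/search/` path is an assumption for illustration – substitute whatever path your internal search actually uses:

```
# robots.txt – keep crawlers out of internal search results (example path)
User-agent: *
Disallow: /search/

<!-- robots meta tag in the <head> of thin or search results pages -->
<meta name="robots" content="noindex, follow">
```

Note the difference: Disallow stops crawling but an already-indexed URL can linger, while the noindex meta tag requires the page to be crawlable so the bot can see the instruction – so avoid applying both to the same page.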
As the graph shows, eliminating search results pages, resolving broken links, properly redirecting legacy content and removing thin content pages causes total indexed pages to decline significantly – in this case by more than 50%, from nearly a million pages to 450,000.
- The result is improved ranking for pages that no longer have to compete for visibility with dozens of other pages on your own site targeting the same search terms.
- The result is a dramatic drop (recorded in Google Search Console) in the number of 404 errors and 302 redirects across the site.
- The result is that useless, nearly empty pages are removed from the index, and pages with strong, useful content gain visibility.
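The crawl-and-record step behind these results can be sketched in a few lines of Python. This is a minimal triage helper, not a crawler: it assumes you already have (URL, status code) pairs exported from your crawling tool of choice, and the sample URLs below are hypothetical.

```python
# Minimal sketch: group crawled URLs by the cleanup action described above.
# Assumes crawl_results is a list of (url, http_status_code) pairs exported
# from whatever crawler you use; the sample data is hypothetical.
from collections import defaultdict

def triage(crawl_results):
    """Bucket each URL by the recommended remediation."""
    actions = defaultdict(list)
    for url, status in crawl_results:
        if status == 302:
            actions["switch to 301"].append(url)        # temporary -> permanent
        elif status == 404:
            actions["301-redirect or 410"].append(url)  # legacy/broken URL
        elif 500 <= status < 600:
            actions["fix server error"].append(url)
        else:
            actions["ok"].append(url)
    return dict(actions)

sample = [
    ("https://example.com/old-page", 302),
    ("https://example.com/legacy", 404),
    ("https://example.com/home", 200),
]
print(triage(sample))
```

The output is a simple worklist you can hand to the engineering team: every URL under "switch to 301" or "301-redirect or 410" is a concrete ticket.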
Does This Tedious Cleanup Really Matter?
There is another issue that SEOs see fairly consistently – old pages that no longer exist but are still externally linked or still indexed in search engines. When you ask engineers to work on cleanup, they may worry that it has minimal effect on site performance.
While it is not the highest priority, it does matter for general search engine hygiene. The goal should always be to reduce 4xx (401, 403, 404) and 5xx errors to as few as possible.
301 redirecting is important to SEO because these pages are externally linked somewhere – otherwise they would not show up repeatedly as errors. Sometimes they are still in the Google index and will turn up in search results as duplicate content (bad for SEO) and produce broken links (bad user experience) until you redirect them and Googlebot revisits and sees the new 301 to a permanent URL.
Because they sometimes remain linked from external web sites, 301 redirects tell search engine bots and visitors where the alternate landing pages are. If 301s are too much work to map properly, there is another option to remove them from search engines: assign them all a 410 Gone header response, which also drops them from the index.
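On an Apache server, both options map to one-line mod_alias directives; the paths below are hypothetical examples, not URLs from any real site:

```
# Apache .htaccess sketch – example paths only
# 301: the legacy URL has a mapped replacement, so pass authority along
Redirect 301 /old-category/widget.html /products/widget/
# 410: the page is gone for good and should drop out of the index
Redirect gone /discontinued-line/page.html
```

The same pair of behaviors exists in any web server or CMS redirect plugin; what matters is choosing 301 when a replacement page exists and 410 when nothing does.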
404 errors will eventually drop pages from search engines, but it takes longer if they are externally linked. If they continue to show up in Google Search Console, it’s best to 301 or 410 those pages and sweep up the mess.
This is all about good SEO maintenance and housekeeping to offer only clean, unbroken content to search engines to catalog for us. Clean up the clutter and improve overall relevancy and domain authority.
Mike Valentine offers SEO Consulting to Startups and Enterprise sites internationally