Photo Copyright © Mike Valentine
If your shiny new content is not crawled by Google, then it won’t be indexed by Google and has zero chance of ranking well in Google. Let’s talk about how to be certain everything you publish on your website gets crawled and indexed quickly – so it will at least have a chance to get ranked well.
How often has someone run to the SEO complaining, “My article isn’t showing up in Google – even with an exact match search of the title in quotation marks!” First, let me offer a great tool: search for the content URL, not the title, using the “info:http://www.site.com/article-name.html” query operator (no space after the colon). You’ll see the article show up in search every single time – but only if Google knows about it.
Using that info: query assures people who incessantly search for their content that Google has included their article in the index. If it doesn’t surface in a Google search using the info: query operator, well, let’s talk about how to make sure it does.
All the News That’s Fit to Index
News web sites have an edge over everyone else because Google offers them a unique tool called a “news sitemap,” an XML file which publishers can update in real time as new content is pushed. News sitemaps are likely to be crawled dozens of times a day – so Google News can offer searchers the most current news.
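For anyone who hasn’t seen one, here is a minimal sketch of how a publishing system might write a news sitemap each time an article goes live. It uses the two public namespaces (the standard sitemap schema and Google’s news extension). The function name, the article dictionary keys, the publication name and the sample date are assumptions made up for illustration, not anything Google or a particular CMS prescribes.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_news_sitemap(articles, publication_name, language="en"):
    """Return news sitemap XML for a list of article dicts.

    Each article dict is assumed (hypothetically) to have the keys
    "url", "title" and "published" (a W3C-format date string).
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in articles:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = article["url"]

        # The news:news element carries the publication and article details.
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        publication = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(publication, f"{{{NEWS_NS}}}name").text = publication_name
        ET.SubElement(publication, f"{{{NEWS_NS}}}language").text = language
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = article["published"]
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = article["title"]

    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    sample = [{
        "url": "http://www.site.com/article-name.html",
        "title": "Article Name",
        "published": "2024-01-15T08:00:00+00:00",  # made-up date
    }]
    print(build_news_sitemap(sample, "Example Site"))
```

In practice you would regenerate this file, or at least append to it, at the moment of publication, since the whole point is that the sitemap changes as fast as the content does.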
If your site is not an approved Google News source, you probably don’t push every new article (or product detail page) instantly to your XML sitemap – nor does Google check standard XML sitemaps frequently. However, there are four great ways to get all content indexed quickly:
1. New Content Landing Page & Pushing “New” links to the home page
2. Related Content Internal Links
3. HTML Sitemap Listing All Content
4. Social Media participation & engagement around every new article published
New Content Landing Pages
Large publishers which are not news organizations often use “New!” pages which list all fresh content for a day or two. Googlebot tends to more frequently crawl pages that are updated with fresh content regularly. This assures that most of the links to your freshest content get crawled sooner, because this page is frequently refreshed.
This is also why blogs default to showing the latest content in reverse chronological order and also include the “Recent Posts” widget on the home page. If you post to your blog infrequently and don’t have a “Recent Posts” widget – go set that up right now. Small publishers must also include “Recent” content modules if they are interested in quick and complete indexing of their site by search engines.
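If your platform doesn’t ship with a “Recent Posts” widget, the logic behind one is tiny. The sketch below sorts posts newest first and renders the top few as links. The function names, the post dictionary keys and the sample posts are all hypothetical, and a real template would pull posts from your own database.

```python
import html
from datetime import date


def recent_posts(posts, limit=5):
    """Return the newest posts first (reverse chronological order)."""
    newest_first = sorted(posts, key=lambda post: post["published"], reverse=True)
    return newest_first[:limit]


def render_recent_links(posts, limit=5):
    """Render the newest posts as a plain HTML list of crawlable links."""
    items = []
    for post in recent_posts(posts, limit):
        href = html.escape(post["url"], quote=True)
        label = html.escape(post["title"])
        items.append(f'<li><a href="{href}">{label}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    # Made-up sample posts for illustration only.
    sample = [
        {"url": "/old-post.html", "title": "Old Post", "published": date(2023, 1, 5)},
        {"url": "/new-post.html", "title": "New Post", "published": date(2024, 3, 1)},
    ]
    print(render_recent_links(sample, limit=2))
```

The important part for indexing is that the output is ordinary HTML links on a page that changes often, not something loaded only by script after the fact.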
Related Content Modules
This is THE most effective, but least used, method to increase content relevancy, improve keyword density and gain link equity. You’ll definitely see “Related Products” links on retail sites – they know it works to increase page views, time on site and upsells, and it has a lovely side benefit – improved SEO.
But “Related Articles” is less often used by content-focused sites because they tend to default to “Most Popular” modules, which may improve page views and time on site, but they suck for relevancy and SEO value. As an SEO, I’d like to banish “Most Popular” modules entirely and require “Related” modules. “Related” modules are often based on simple algorithms which look for shared keywords in headlines and tags among content in related categories.
The unseen, but extremely valuable, aspect of “Related” modules is that they sprinkle links around the site, appearing only on “Relevant” content pages – which enhances overall relevancy and increases link equity. The very obvious SEO benefit was illustrated for everyone about 7 years ago or so, when Wikipedia began to dominate search results at Google. Wikipedia’s interlinking of relevant content from within body text was so effective that Google had to dial back the algorithm to favor Wikipedia less. It still dominates in search due to “related” links.
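To show how simple such an algorithm can be, here is a rough sketch that scores candidate articles by how many headline words and tags they share with the current article, looking only within the same category. This is one plausible approach under my own assumptions, not how any particular CMS does it. The function names, dictionary keys and stopword list are invented for the example.

```python
import re

# A deliberately tiny, hypothetical stopword list.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "how", "on", "with"}


def keywords(article):
    """Collect lowercase headline words (minus stopwords) plus tags."""
    words = re.findall(r"[a-z0-9]+", article["headline"].lower())
    headline_terms = {word for word in words if word not in STOPWORDS}
    tag_terms = {tag.lower() for tag in article["tags"]}
    return headline_terms | tag_terms


def related_articles(current, candidates, limit=3):
    """Return up to `limit` same-category articles, most shared terms first."""
    current_terms = keywords(current)
    scored = []
    for other in candidates:
        if other["url"] == current["url"]:
            continue  # never link an article to itself
        if other["category"] != current["category"]:
            continue  # keep links inside related categories
        overlap = len(current_terms & keywords(other))
        if overlap:
            scored.append((overlap, other))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article for _, article in scored[:limit]]
```

Because the links only appear where terms actually overlap, every “Related” link points from one relevant page to another, which is exactly what a “Most Popular” module can’t promise.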
HTML Sitemap Lists Every Page
Now the dull stuff – pages intended ONLY for search engines that list every page that you want indexed by search engines. My favorite example to show is always the New York Times HTML sitemap, aptly named “Spider Bites” at “http://spiderbites.nytimes.com”. Google, as of this writing, shows 18,400 NYTimes Spider Bites sitemap pages indexed, which include articles as far back as the mid 1800s – and list every single article. Much of this is, justifiably, behind a pay wall, but Google knows about all of it.
Realistically, this is the only way to get every single page of your web site crawled and indexed when you have extremely large archives or many hundreds of thousands of pages. There is no need to include all of this in standard XML sitemaps – but an HTML sitemap guarantees a crawler can find and include every page. It’s not just large sites that should use this method though.
Everyone with over 50 pages should include HTML sitemaps that are automatically updated as new content is added. This keeps your site fully indexed by making every page completely accessible to crawlers. Sometimes it’s a challenge and it may be tough to get engineering resources. Just do it. It works for full indexation.
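If engineering resources are the sticking point, it may help to show how little code an automatically updated HTML sitemap needs. The sketch below turns a list of pages into a plain list of links and can be rerun whenever something is published. The function name, the page dictionary keys and the alphabetical ordering are my assumptions. A very large archive would split the output across many pages, the way the Spider Bites example does.

```python
import html


def build_html_sitemap(pages, title="Site Map"):
    """Return an HTML fragment linking to every page, sorted by title.

    `pages` is assumed (hypothetically) to be a list of dicts with
    "url" and "title" keys, pulled from wherever your content lives.
    """
    ordered = sorted(pages, key=lambda page: page["title"].lower())
    links = []
    for page in ordered:
        href = html.escape(page["url"], quote=True)
        label = html.escape(page["title"])
        links.append(f'  <li><a href="{href}">{label}</a></li>')
    return f"<h1>{html.escape(title)}</h1>\n<ul>\n" + "\n".join(links) + "\n</ul>"


if __name__ == "__main__":
    # Made-up pages for illustration only.
    sample = [
        {"url": "/tips.html", "title": "Tips & Tricks"},
        {"url": "/about.html", "title": "About Us"},
    ]
    print(build_html_sitemap(sample))
```

Hook a function like this into your publish step and the sitemap never falls behind the content.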
Social Media Participation as Indexing Tool
I’ve counseled all clients, including those in the most boring industries, to frequently post links to new content widely in social media outlets – ALL social media outlets. Yes, that means tweeting and posting to Facebook about dull stuff sometimes – but it’s your task to make it interesting – or not.
Posting new content on frequently crawled sites gets your content indexed faster by feeding searchbots links to your fresh content in places that get much more search engine attention than your site. Even if all you do is post the links on Google Plus saying “New Article on …” it still gets the content crawled and indexed fast.
If you are creative and entertaining and innovative – a post or two may go viral. Then you’ll not only be fully indexed and crawled frequently, but you’ll rank well.
Mike Valentine consults on technical SEO for startups and Enterprise Clients Internationally