Why both XML and HTML Sitemaps?
With each new client, the list of recommended site improvements almost inevitably includes adding an HTML sitemap. People always point to their XML sitemap and say, “We’ve got a sitemap – but it’s XML – do we need an HTML one too?” Yes, you do.
Automated XML Retrieval
When Googlebot comes to your site, it first looks at your robots.txt file to see if you want to keep it away from any pages or crawlable links. That was the original reason for robots.txt files – to tell bots where they should NOT go. But back in April of 2007, the major search engines got together to announce support for an XML sitemap pointer in robots.txt – a way to ask them to please index all your important pages. SEOs also encourage clients to submit the XML sitemap through Webmaster Central (now “Search Console” at Google).
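For reference, here’s a minimal robots.txt sketch showing both jobs – the domain and disallowed paths are placeholders for illustration, not recommendations:

```
# Tell bots where NOT to go – the original purpose of robots.txt
User-agent: *
Disallow: /admin/
Disallow: /checkout/

# The 2007 addition: point every engine at your XML sitemap
Sitemap: https://www.example.com/sitemap.xml
```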
Why isn’t an XML sitemap enough?
Because crawlers gotta crawl. Search engines need a way to quickly discover all the pages on your website and understand site structure and page relevancy. Yet somehow an XML sitemap fails to get the site – and new pages – crawled rapidly. SEOs have observed over the years that XML sitemap submissions and XML sitemap pointers in robots.txt files get pages indexed, but only very slowly. So how do we get new pages indexed quickly?
Enter the HTML Sitemap
An HTML sitemap can represent site structure perfectly by placing pages within a hierarchy. You can also give it higher visibility and authority by linking to it sitewide from the footer. That HTML sitemap page then helps distribute site-level authority to internal pages. Of course, everyone wants to see one that perfectly represents their own industry. I’ve sent many clients to see how the New York Times links to every article published since 1851 in their Spiderbites HTML sitemap.
Show me an HTML Sitemap for my Industry
Others want to see one for a retailer, a software platform, a financial services company, etc. A little research turns up examples in every industry – usually with a tendency toward user-focused rather than bot-focused sitemaps: often missing the sitewide footer links, and usually loaded with images, search functionality, and other unnecessary code that makes them slow-loading and ineffective for crawlers to fully index. (See the list below for simple best practices for bot-friendly HTML sitemaps.)
Smaller Sites Can Be Fewer Than a Hundred Pages
Why would it be hard for a bot to crawl so few pages and pick up newly added links quickly? Because small sites are allocated a lower crawl budget and visited less frequently, and unless you ask Google to retrieve each new URL every time you post to your blog, bots may not visit that new link for weeks – unless it is quickly reachable through a footer link on every page of your site, pointing at an HTML sitemap that automatically includes newly added pages the moment they go live.
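As a sketch, that footer link is nothing more than a single lightweight anchor in your sitewide footer template – the URL and label here are just examples:

```html
<!-- Included on every page of the site -->
<footer>
  <a href="/sitemap">Sitemap</a>
</footer>
```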
Lost or Orphaned Topics
There is also the problem of orphaned pages: one-off pages that aren’t included in the site navigation, aren’t listed among top-level category pages, and don’t appear in the footer links, “what’s new” links, body text, or anywhere else for that matter. You’ve got to be seeing by now that there are at least a half-dozen reasons to do this, right? Honestly, it’s worth the extra effort.
Remember that an HTML sitemap is for bots, not humans. Here are recommendations for optimal HTML sitemaps for all sites:
- Page load speed is critical – make them lightweight:
  - Avoid graphics (except a small logo)
  - No JavaScript (inline or linked files)
  - No search feature
  - No top nav
  - No footer links
  - No social links
- List every public page on the site (dynamically updated)
- Link to the HTML sitemap from the footer sitewide (call it “Index” or “Sitemap”)
That’s it – simple stuff.
Now, on large sites they can get more complex, organized within a hierarchy of top-level pages, then sub-categories, then leaf pages within each sub-category, and so on. But it’s easy for small sites to just post a single page with 100 or 200 links nested by category, sub-category, and leaf page. And large sites can follow the example of the New York Times (linked above) and organize either by category, by topic, or by date – depending on what makes the most sense in your industry.
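To make that concrete, here’s a minimal sketch of a bot-friendly, hierarchical HTML sitemap page that follows the checklist above – every name and URL is a placeholder, and in practice the list would be generated dynamically from your published pages:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Sitemap</title>
  <!-- No JavaScript, no search box, no top nav, no social links -->
</head>
<body>
  <h1>Sitemap</h1>
  <ul>
    <li><a href="/widgets/">Widgets</a>
      <ul>
        <li><a href="/widgets/blue/">Blue Widgets</a>
          <ul>
            <li><a href="/widgets/blue/deluxe">Deluxe Blue Widget</a></li>
            <li><a href="/widgets/blue/standard">Standard Blue Widget</a></li>
          </ul>
        </li>
      </ul>
    </li>
    <li><a href="/blog/">Blog</a>
      <ul>
        <li><a href="/blog/why-html-sitemaps-matter">Why HTML Sitemaps Matter</a></li>
      </ul>
    </li>
  </ul>
</body>
</html>
```

Plain nested lists keep the page fast to load, and every public page is one ordinary link a crawler can follow – nothing hidden behind scripts or search forms.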
Here’s the Kicker
Few sites work HTML sitemaps into the development schedule. The project somehow gets de-prioritized and pushed further down the list until it falls off entirely. If you can make it happen in your organization, you’ll gain a leg up on your competition: fresh content crawled faster and ranked sooner thanks to this extra effort and improved internal linking. If you need just one small thing to give you a competitive advantage, this may be what pushes your important pages up that extra notch. Just do it.
We can do a thorough SEO audit to determine what needs improvement on your site so search engines can find and index all your pages – then rank them among the top search results. Contact Us.