With Google and Yahoo announcing this week that they have launched Sitemaps.org to promote an open standard for XML sitemaps, several clients are asking me about it. But since many of my clients are smaller businesses with web sites of fewer than 100 pages, I’m emphasizing to those smaller clients that there is another common standard, the plain text sitemap, which both Google and Yahoo already supported. What follows is an example of a communication with a client after the announcement on Wednesday.
To my understanding, Google offers a new feature for uploading your sitemap to their search engine. Is this correct? Should we be doing this too?
Yes, Google does offer a Sitemaps program (it’s different than you think), and yes, I do recommend that you use it. As a matter of fact, I was just about to start this discussion, as I do with all clients in the early stages of an optimization campaign.
Sitemaps is a program that lets you tell Google about all of your URLs in two different ways. One is for larger sites with thousands of pages and involves an XML document that lists each URL along with optional details such as when it last changed, how often it changes, and its relative priority. This document should be updated constantly as new pages are added and others are dropped, which makes it best for ecommerce sites with dynamic content that is always changing.
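To make the XML format concrete, here is a minimal Python sketch that writes a sitemap in the Sitemaps 0.9 format published at sitemaps.org. It assumes you already have your page list in hand; the `SitemapEntry` class, the `build_xml_sitemap` helper and the example.com URLs are hypothetical. In the protocol only `<loc>` is required for each URL, while `<lastmod>`, `<changefreq>` and `<priority>` are optional.

```python
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import escape

# Namespace used by version 0.9 of the Sitemaps protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class SitemapEntry:
    loc: str                          # fully qualified URL (required)
    lastmod: Optional[str] = None     # date of last change, e.g. "2006-11-17"
    changefreq: Optional[str] = None  # e.g. "daily", "weekly", "monthly"
    priority: Optional[float] = None  # 0.0 to 1.0, relative to your other pages


def build_xml_sitemap(entries):
    """Return an XML sitemap document as a string (hypothetical helper)."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<urlset xmlns="{SITEMAP_NS}">']
    for e in entries:
        lines.append("  <url>")
        # escape() turns &, < and > into entities, as XML requires.
        lines.append(f"    <loc>{escape(e.loc)}</loc>")
        if e.lastmod:
            lines.append(f"    <lastmod>{e.lastmod}</lastmod>")
        if e.changefreq:
            lines.append(f"    <changefreq>{e.changefreq}</changefreq>")
        if e.priority is not None:
            lines.append(f"    <priority>{e.priority:.1f}</priority>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_xml_sitemap([
        SitemapEntry("http://www.example.com/", "2006-11-17", "daily", 1.0),
        SitemapEntry("http://www.example.com/catalog?item=12&color=red",
                     changefreq="weekly"),
    ]))
```

For a site whose pages change daily, you would regenerate this file from your product database on a schedule rather than editing it by hand.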
The other method is for smaller sites with a few hundred static pages or fewer. It involves a plain text document that is simply a list of URLs.
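A plain text sitemap really is just that: one fully qualified URL (including the http://) per line, saved as a UTF-8 text file. The Python sketch below uses a hypothetical `build_text_sitemap` helper and example.com URLs to tidy a raw page list before you save it. Skipping duplicates is my own housekeeping assumption, not a stated rule from either engine.

```python
from urllib.parse import urlsplit


def build_text_sitemap(urls):
    """Return plain text sitemap contents: one fully qualified URL per line."""
    seen = set()
    lines = []
    for raw in urls:
        url = raw.strip()
        parts = urlsplit(url)
        # Skip blank lines and relative links; list absolute http(s) URLs only.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        # Drop duplicates but keep the original order.
        if url in seen:
            continue
        seen.add(url)
        lines.append(url)
    return ("\n".join(lines) + "\n") if lines else ""


if __name__ == "__main__":
    pages = [
        "http://www.example.com/",
        "http://www.example.com/about.html",
        "/contact.html",                      # relative: skipped
        "http://www.example.com/about.html",  # duplicate: skipped
    ]
    # Save this output as a UTF-8 text file on your server.
    print(build_text_sitemap(pages), end="")
```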
Google Webmaster Central offers other benefits as well, including stats on your top-ranking keywords and the pages with the best PageRank. You get these benefits even with a plain text sitemap, which I wrote about in August when Google announced the program at Search Engine Strategies in San Jose.
There is another benefit: Yahoo uses these plain text sitemap documents as well, so we can submit to both big boys using the same document posted on your server. Yahoo suggests the name “urllist.txt” for submission through Site Explorer, so I’ve used that as the document name for a few clients. But Yahoo has also accepted other names, so you can name the sitemap text file whatever you like, post it to your server, and then submit it to both.
Long story short, get started by putting a list of all the public URLs you want the search engines to know about into a plain text document. If you have more than one domain, create a separate list for each one: every site has its own sitemap document.
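If you manage more than one domain, a short Python sketch like the one below can split a combined page list into one sitemap per host. The helper name `sitemaps_by_host` and the example domains are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def sitemaps_by_host(urls):
    """Group URLs by host name and return one text sitemap per host."""
    groups = defaultdict(list)
    for raw in urls:
        url = raw.strip()
        # netloc is the host part, e.g. "www.example.com"; lowercase it so
        # "WWW.Example.com" and "www.example.com" land in the same file.
        host = urlsplit(url).netloc.lower()
        if host:
            groups[host].append(url)
    return {host: "\n".join(found) + "\n" for host, found in groups.items()}


if __name__ == "__main__":
    mixed = [
        "http://www.example.com/",
        "http://www.example.org/index.html",
        "http://www.example.com/services.html",
    ]
    for host, text in sitemaps_by_host(mixed).items():
        print(f"--- sitemap for {host} ---")
        print(text, end="")
```

Note that this treats www.example.com and example.com as different hosts, so decide which form of each domain you actually use before building the lists.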
But WAIT, There’s More!
The other aspect to this is that your site should have a robots.txt file. It follows the Robots Exclusion Protocol and tells compliant search engine spiders which pages you DON’T want them to crawl and index, so you can list the areas of the site you want the crawlers to stay out of.
In the announcement by Vanessa Fox of Google and Tim Mayer of Yahoo (hmmm, MSN missing again), captured in a video by Chris Richardson of WebProNews at PubCon Las Vegas, Mayer mentions that the engines would also like to begin working together on an open robots.txt protocol.
Typically you want them to stay out of your cgi-bin and images directories (simply because those files shouldn’t be publicly indexed) and any directories intended for internal use only. Again, here is a sample robots.txt to review for reference:
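A minimal robots.txt along those lines might look like the one embedded below, which keeps compliant crawlers out of hypothetical /cgi-bin/, /images/ and /internal/ directories. The Python sketch uses the standard library's `urllib.robotparser` to check which paths a crawler would skip; the `crawl_report` helper, the domain and the paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the directory names are examples, not a standard.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /internal/
"""


def crawl_report(robots_txt, base_url, paths, agent="*"):
    """Map each path to True (may crawl) or False (stay out) under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {p: parser.can_fetch(agent, base_url + p) for p in paths}


if __name__ == "__main__":
    report = crawl_report(
        ROBOTS_TXT,
        "http://www.example.com",
        ["/index.html", "/cgi-bin/form.pl", "/images/logo.gif", "/internal/plan.html"],
    )
    for path, allowed in report.items():
        print(f"{path:22} {'crawl' if allowed else 'stay out'}")
```

The `User-agent: *` line applies the rules to every spider; you can add separate blocks for specific crawlers if you ever need different rules for one of them.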
You can view this file on any site by simply appending /robots.txt to the domain name. If a site has one, anyone can view it, so it has some security implications. If you have a directory named “PASSWDS”, for example, you don’t want to list it in robots.txt unless it is extremely secure and you believe you can thwart every hacker who wants in.
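To see why, here is a small Python sketch with two hypothetical helpers: `robots_url` builds the address where any site’s robots.txt lives, and `exposed_paths` pulls out every Disallow entry, which is exactly what a curious visitor would read. It only looks at Disallow lines and is not a full robots.txt parser.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """robots.txt always sits at the root of the host: scheme://host/robots.txt."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def exposed_paths(robots_txt):
    """Return every path named on a Disallow line; anyone can read these."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    print(robots_url("http://www.example.com/products/widget.html"))
    sample = "User-agent: *\nDisallow: /cgi-bin/\nDisallow: /PASSWDS/\n"
    # Listing /PASSWDS/ here advertises it to anyone who opens the file.
    print(exposed_paths(sample))
```

To look at a live file, you could pass the result of `robots_url` to the standard library’s `urllib.request.urlopen`, which is no different from typing the address into a browser.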
So start with the robots.txt, and then put together the list of URLs for the sitemap.
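Doing the two steps in that order also lets you check one against the other: it makes little sense to list a page in the sitemap that robots.txt tells the spiders to skip. Here is a minimal Python sketch of that check using the standard library’s `urllib.robotparser`; the `sitemap_from_allowed` helper and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def sitemap_from_allowed(robots_txt, candidate_urls, agent="*"):
    """Build a text sitemap containing only URLs robots.txt lets crawlers fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    kept = [u for u in candidate_urls if parser.can_fetch(agent, u)]
    return ("\n".join(kept) + "\n") if kept else ""


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /cgi-bin/\nDisallow: /internal/\n"
    pages = [
        "http://www.example.com/",
        "http://www.example.com/internal/budget.html",  # excluded by robots.txt
        "http://www.example.com/services.html",
    ]
    print(sitemap_from_allowed(robots, pages), end="")
```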
Google, Yahoo and MSN all announced and commented on their support for this new open standard on their search blogs yesterday. Virtually ALL the SEO gurus are commenting on the open Sitemaps protocol announcement this week.