Dot-info (.info) TLDs appear to be the new domain of choice for search engine spammers, since there is an apparent lack of a Google aging delay before they are listed and ranked. They show up relatively quickly after the search engines’ first crawl and are ranking well for some competitive terms. The sleaze monsters among search engine sp*mmers are using software to automate four separate areas: content gathering, article creation, article distribution and blog posting. Some may be using all four techniques in concert in an effort to blanket hundreds of sites with article content, then slap up Google Adsense or Yahoo Publisher Network ads.
Various types of thieving go on in this seamy underbelly of automated search engine sp*m. Recently, a software developer has been selling “pre-loaded” content sites with articles built in, covering 150 topic areas for $100, or $10 per individual topic. These let buyers set up “Adsense ready” article sites containing keyword-focused content categories pulled from “free-to-use” article sites, against the authors’ clearly posted terms of use.
Those usage terms posted by authors and on article distribution sites universally prohibit use of those “free-to-use” articles in paid compilations, membership sites or any “for-profit” collections. Some authors are expanding their terms of use to exclude usage by specific networks. Previous slime merchants have avoided copyright lawsuits by giving away those articles with paid software purchases. I’d be surprised if authors didn’t find some way to band together to sue those who abuse their terms of use in this way.
Authors have worried over a “duplicate content penalty” when their articles are distributed for use on other web sites. It’s extremely unlikely that this type of use will lead to penalties for the author’s own site, which is linked from the resource box of each original article. Interestingly, the duplicate content penalty is far more likely to hit the clueless purchasers of “pre-loaded” sites, because every one of them runs precisely the same site structure, precisely the same articles and RSS feeds that never vary. Those sites are mirrors of one another, and mirrored sites have been filtered for years. Lazy buyers of “pre-loaded” article sites will be the only ones to receive penalties from the search engines.
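The post doesn’t spell out how that mirror filtering actually works, but one commonly described approach is shingle comparison: break each page into overlapping word n-grams and measure how much the two sets overlap. A minimal sketch in Python (the function names and the 0.9 threshold are mine, purely illustrative, not Google’s actual filter):

```python
# Sketch of shingle-based near-duplicate detection, one commonly described
# way an engine might flag mirrored "pre-loaded" article sites.

def shingles(text, size=4):
    """Return the set of overlapping word n-grams ("shingles") in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def resemblance(doc_a, doc_b, size=4):
    """Jaccard similarity of the two documents' shingle sets (0.0 to 1.0)."""
    a, b = shingles(doc_a, size), shingles(doc_b, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two sites reusing the same packaged article score near 1.0; genuinely
# distinct pages score close to 0.0.
page_a = "Ten tips for saving money on car insurance before you renew your policy."
page_b = "Ten tips for saving money on car insurance before you renew your policy today."
if resemblance(page_a, page_b) > 0.9:
    print("Likely duplicate/mirrored content")
```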
In another slimy aspect of this odd netherworld of search engine spam, the crawlers used by article-gathering sites spoof search engine IP addresses to hide themselves within routine traffic on the sites they crawl, trolling the web for articles to steal and use in splogs and pre-loaded web site kits. These crawlers hit pages slowly at first, seeking sitemaps or author index pages, grab URLs to return to later under different IPs, then pound away at 10 pages per second or more, grabbing articles from major sites against those sites’ posted terms of use. The crawlers usually belong to hosted services which then sell the scraped article collections as SQL databases. Some even offer subscribers feeds of this stolen content after running it through article-regurgitating software.
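Worth noting for site operators: an impostor claiming to be a search engine crawler can be caught with a reverse-plus-forward DNS check, since a spoofed address won’t resolve into the engine’s own hostnames. A rough sketch, assuming Googlebot-style naming under googlebot.com/google.com:

```python
import socket

def is_genuine_googlebot(ip):
    """Reverse DNS must point into googlebot.com/google.com, and the forward
    lookup of that hostname must resolve back to the same IP. A crawler that
    merely spoofs its user-agent or IP fails one of the two checks."""
    try:
        host = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except socket.gaierror:
        return False

print(is_genuine_googlebot("66.249.66.1"))   # address in a known Googlebot range
print(is_genuine_googlebot("203.0.113.9"))   # documentation/test address -> False
```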
That regurgitation software takes copyrighted articles already written by other authors, re-orders paragraphs, swaps out interchangeable verbs, rearranges sentences and spits out a fairly readable, sometimes passable article that the original author might not even recognize. These stolen, regurgitated articles are then submitted to article banks and distribution sites by splog creators, sometimes through automated submission software or hosted services, so the stolen material spreads across the web creating inbound links to the search engine sp*m sites.
Many of these .info domain owners are using sleazy sp*m blog software to create what have become known as “splogs”: automatically updated blogs, spread across multiple blogging platforms, with posts made at randomized intervals so the owners appear to be active bloggers. The software builds keyword-focused posts from RSS feeds of keyword-phrase news searches and then “pings” the blog search engines with each new automated post. Depending on the sophistication of the splog owner, you’ll often see footer links leading to other splogs they operate on separate topics.
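That “ping” is nothing exotic: it is the standard weblogUpdates.ping XML-RPC call that legitimate blog platforms send whenever a post is published; the splog software simply fires it for every autogenerated post. A minimal sketch, assuming Ping-O-Matic’s public relay as the target endpoint:

```python
import xmlrpc.client

# Standard weblogUpdates.ping call. The endpoint shown is Ping-O-Matic's
# public relay, which forwards the ping to the blog search engines.
server = xmlrpc.client.ServerProxy("http://rpc.pingomatic.com/")
result = server.weblogUpdates.ping("Example Blog", "http://example.com/blog/")
print(result)   # e.g. {'flerror': False, 'message': 'Thanks for the ping.'}
```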
Virtually all of the .info domains I’ve seen ranking in top results for competitive phrases are entirely Adsense or YPN sites, including splogs, full of autogenerated RSS news feeds and on-the-fly generated title tags and H1 tags based on the search phrase used to find the site. Even the copyright notice in the footer of some of these sites is generated on the fly to match the search query. While this technique is also being used by some search engine sp*mming .com sites (older than a year since creation, so past the aging delay), it shows up far more often in .info domains currently.
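For anyone wondering how a page can know the phrase that found it: search engine referrer URLs of that era carried the query in a parameter (q for Google, p for Yahoo), so the page only has to parse the Referer header and echo the phrase into its title and H1. A hypothetical sketch of the trick (parameter names are the historical ones; the function name is mine):

```python
from urllib.parse import urlparse, parse_qs

def query_from_referrer(referrer):
    """Pull the search phrase out of a search-engine referrer URL, if any."""
    params = parse_qs(urlparse(referrer).query)
    for key in ("q", "p"):   # Google used q=, Yahoo used p=
        if key in params:
            return params[key][0]
    return None

# A visitor arriving from a search result hands the page its own title:
ref = "http://www.google.com/search?q=cheap+car+insurance+quotes"
phrase = query_from_referrer(ref)
print(f"<title>{phrase}</title>")   # echoed straight into the title/H1 text
```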
If Google is truly ranking sites based on clickstream data, imagine the abuses these dynamic spam sites, full of nothing but RSS feeds or stolen, regurgitated content, could spawn! Soon they would rule the results pages because they reflect EXACTLY the search terms used by searchers, which leads to higher click-through ratios, which generates higher rankings. I see a serious hole for abuse here and hope that the PhDs at Google work out a filter for the technique fast.
This exact-match landing page idea is used widely in pay-per-click campaigns, as most savvy SEM specialists highly recommend landing pages that exactly match the user’s search, because doing so leads to higher conversion ratios. Perhaps a programmer who spends his days creating PPC landing page scripts is spending his nights creating .info domains with dynamic page titles and metadata for competitive search phrases to rule organic SEO?
Of course, whois ownership information is masked by many recent .info domain owners, since those domains were purchased specifically for se-sp*mming sites. When looking up the whois information on highly ranking .info domains to check creation (purchase) dates, you’ll see a preponderance of October through December 2005 creation dates, with a smattering of January 2006 dates for the well-ranked splogs. This must be about the time spammer forums started noticing and discussing the lack of an aging delay for .info domains.
Whois information for dot-com (.com) sites ranking well for competitive searches shows that ALL are over a year old and most are 3 to 5 years old.
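Checking those creation dates yourself takes nothing more than a raw WHOIS query on TCP port 43. A quick sketch against the .info registry server (whois.afilias.net, the Afilias whois host for .info at the time; adjust if it has moved):

```python
import socket

def whois_lookup(domain, server="whois.afilias.net"):
    """Raw WHOIS query (TCP port 43) against the given registry server."""
    with socket.create_connection((server, 43), timeout=10) as sock:
        sock.sendall((domain + "\r\n").encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

record = whois_lookup("example.info")
# Print only the creation/registration date lines to spot late-2005 purchases.
for line in record.splitlines():
    if "creat" in line.lower():
        print(line.strip())
```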
All of this suggests clear algorithmic aging filters for .com and an apparent lack of them for .info. My thought is that Google is using this lack of aging delay and lack of filtering as a honeypot for search engine sp*m: gather the bad boys all in one otherwise rarely used TLD, then do wide sweeps, tracing their tactics to further filter (forgive me for using the term) Black Hat techniques.