Sitemaps: Old or New?
With the major search engines supporting the Sitemap Protocol, many developers have switched to using these XML sitemaps to tell search engine crawlers about the pages on their sites. I recently found that relying on them exclusively was a big mistake. I had switched to XML sitemaps 100% on my sites and no longer linked to old-school sitemaps, the kind where we listed every page on the website. For many sites, every page can be found through standard link crawling. For example, Fido Finder has thousands of Lost and Found dog listings, but a crawler can reach all of them by going to the Lost Dogs page and following the “Next Page” link over and over. There are truly no hidden pages that I want Google to index. Tagomatic, being a search engine, is a different story. And there’s actually a story to go with this lesson.
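For readers who have not built one, here is a minimal sketch of what generating an XML sitemap might look like in Python. The function name `build_sitemap` and the example.com URLs are placeholders for illustration, not the code used on Fido Finder or Tagomatic; only the `urlset`/`url`/`loc` elements and the namespace come from the Sitemap Protocol itself.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap Protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a Sitemap Protocol XML document (as a string) listing the given URLs.

    Hypothetical helper: only the required <loc> element is emitted per URL.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serializing.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URLs, one "pretty" and one querystring style.
    print(build_sitemap([
        "https://example.com/domain.com/",
        "https://example.com/page.php?id=12345",
    ]))
```

The point of the sketch is that an XML sitemap is just a flat list of URLs handed to the crawler; nothing in it requires the pages to be reachable through links on the site.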
One day I noticed that Tagomatic’s traffic had dropped by about 75%. I was in a panic. Had Google suddenly decided that Tagomatic was no longer important? It took me days to realize that, for whatever reason, Google had de-indexed any rewritten (mod_rewrite) URLs. Three to four months earlier I had changed the website listings on Tagomatic to include the domain name (which is unique) in the URL instead of a standard querystring identifier (so /domain.com/ instead of page.php?id=12345). The only website listings still showing up in Google were those whose “domain name” (mostly older entries that contained sub-folders, and therefore forward slashes) contained characters that the rewrite code I had found could not handle. So ONLY standard querystring URLs were indexed in Google for the websites section of Tagomatic. This makes absolutely no sense, and technically Google shouldn’t “know” that these URLs are rewritten, but the only listings on Tagomatic that were still showing up were standard, non-rewritten querystring URLs. The new, pretty URLs had already been indexed in Google for months. If you entered an old URL you were correctly redirected to the new one, but for whatever reason Google decided the new URLs were not worth indexing any more.
So I rolled the code back, and after some time and a re-inclusion request submitted via the Google Sitemaps dashboard, the URLs slowly started to show up again. After 4 months only about 50% of the original URLs had returned. By 8 months about 70% of the previously indexed URLs had returned, but the growth had definitely stopped. In the Google Sitemaps dashboard I could see that one of my sitemaps had 25K of its 50K links indexed (the maximum is 50K per sitemap) and the other had 23K of 25K indexed. I wanted that ~25K back.
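To make the dashboard numbers concrete, here is a small sketch of how roughly 75K URLs end up in two sitemap files under the 50K-per-file limit, and what share of submitted links the dashboard figures represent. The helper names and the example.com URLs are made up for illustration; the counts are the rounded ones quoted above.

```python
# Maximum number of URLs allowed in one sitemap file under the Sitemap Protocol.
MAX_URLS_PER_SITEMAP = 50_000


def split_into_sitemaps(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `limit` entries."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


def indexed_share(indexed, submitted):
    """Fraction of submitted sitemap links that are reported as indexed."""
    return indexed / submitted


if __name__ == "__main__":
    # Placeholder URLs standing in for the ~75K listings.
    urls = [f"https://example.com/listing/{n}" for n in range(75_000)]
    chunks = split_into_sitemaps(urls)
    print([len(chunk) for chunk in chunks])  # [50000, 25000]

    # 25K of 50K in the first sitemap plus 23K of 25K in the second.
    print(indexed_share(25_000 + 23_000, 50_000 + 25_000))  # 0.64
```

In other words, the two sitemaps together had about 48K of 75K links indexed, which leaves roughly 27K still missing.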
As a test I created an old-school sitemap, where any search engine could crawl “next page” after “next page” and see all of the pages on Tagomatic. Within a week 7K more pages showed up as “indexed.” Hopefully we can get back to having 90% of the pages in our sitemaps indexed, but it won’t be thanks to the new Sitemap Protocol. For whatever reason, Google needed an old-school sitemap to get all of the pages into the index. Sometimes you just have to experiment, and you’ll find something that doesn’t make sense but works anyhow.
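An old-school sitemap of this kind can be sketched as a paginated HTML list with a “Next Page” link at the bottom of every page but the last. This is only an illustration under assumed names: `render_sitemap_page`, the `/sitemap?page=N` path, and the page size of 100 are hypothetical, not the actual Tagomatic implementation.

```python
from html import escape


def render_sitemap_page(urls, page, per_page=100, base_path="/sitemap"):
    """Render one page of a crawlable HTML sitemap.

    `page` is 1-based. A "Next Page" link is added whenever more URLs remain,
    so a crawler can walk the whole list one page at a time.
    """
    start = (page - 1) * per_page
    chunk = urls[start:start + per_page]

    lines = ["<ul>"]
    for url in chunk:
        safe = escape(url, quote=True)  # escape &, <, > and quotes for HTML
        lines.append(f'  <li><a href="{safe}">{safe}</a></li>')
    lines.append("</ul>")

    # Only link onward if there is at least one more URL after this page.
    if start + per_page < len(urls):
        lines.append(f'<a href="{base_path}?page={page + 1}">Next Page</a>')
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder URLs; a tiny page size keeps the output readable.
    demo_urls = [f"https://example.com/site{n}.com/" for n in range(1, 6)]
    print(render_sitemap_page(demo_urls, page=1, per_page=2))
```

Unlike the XML file, every URL here is discovered the same way a crawler finds any other page: by following ordinary links.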