In alphabetical order, Google, Microsoft and Yahoo have agreed to support a unified system for submitting web pages to their crawlers through feeds. The system is called Sitemaps, taking its name from the precursor that Google launched last year.
More about Sitemaps is to be provided through the new Sitemaps.org site. As part of the announcement, the existing sitemaps protocol from Google gets a version upgrade to Sitemaps 0.9. However, no actual changes to the system have taken place. The version number was changed simply to reflect the protocol moving from an exclusive Google system to one that all three search engines now support.
Anyone already using Google Sitemaps needn't do anything different. The only change is that those sitemaps will now be read by Microsoft and Yahoo, as well. More information will be posted at the Sitemaps.org site, and you can also check these sections from each of the search engines, which I expect to be updated soon:
- Google Webmaster Central or sitemaps info.
- Microsoft Live Search submit page or help info (choose the Live Search Site Owner option from the drop-down box. Sorry I can't link to the exact relevant pages. Microsoft has a help system that seems impossible to bookmark.)
- Yahoo Site Explorer or submit information.
Other search engines are also invited to use the system -- it has specifically been placed as open property through Creative Commons so that others can make use of it. FYI, Ask isn't part of this announcement because it wasn't invited by the other three to take part, which I find unfortunate. Then again, among all four, Ask is the only one that doesn't already accept submissions in some way.
How can others contribute to its development? That remains to be worked out. So far, there's a working committee involving the three major search engines named. They say they are open to participation from other search engines, as well as content owners, to see the system grow and develop. I expect we'll find more structure to this emerging soon. At the moment, the key work has been in getting all three to agree to support the existing standard.
How about unification around other search standards, such as improving the robots.txt system of blocking pages? Again, this is something the search engines (specifically Google and Yahoo, when I spoke to them) say they're interested in. So fingers crossed, we'll see more of this down the line.
Overall, I'm thrilled. It took nearly a decade for the search engines to go from unifying around standards for blocking spidering and making page descriptions to agreeing on the nofollow attribute for links in January 2005. A wait of nearly two years for the next unified move is a long time, but it's far less than 10, and the progress is very welcome. I applaud the three search engines for all coming together and look forward to more to come.
Below is more from the press release. Sorry I can't do a longer post about the system, but I'm also busy attending the PubCon conference, where the announcement has happened.
Las Vegas, November 16, 2006 - In the first joint and open initiative to improve the Web crawl process for search engines, Google, Yahoo! and Microsoft today announced support for Sitemaps 0.90 (www.sitemaps.org), a free and easy way for webmasters to notify search engines about their websites and be indexed more comprehensively and efficiently, resulting in better representation in search indices. For users, Sitemaps enables higher quality, fresher search results. An initiative initially driven by Yahoo! and Google, Sitemaps builds upon the pioneering Sitemaps 0.84, released by Google in June of 2005, which is now being adopted by Yahoo! and Microsoft to offer a single protocol to enhance Web crawling efforts.
Together, the sponsoring companies will continue to collaborate on the Sitemaps protocol and publish enhancements on a jointly maintained website www.sitemaps.org, which provides all of the details about the Sitemaps protocol.
How Sitemaps Work
A Sitemap is an XML file that can be made available on a website and acts as a marker for search engines to crawl certain pages. It is an easy way for webmasters to make their sites more search engine friendly. It does this by conveniently allowing webmasters to list all of their URLs along with optional metadata, such as the last time the page changed, to improve how search engines crawl and index their websites.
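To make the description above concrete, here is a minimal Python sketch of building such a file with the standard library. The `build_sitemap` helper and the example.com URLs are made up for illustration; the `urlset`, `url`, `loc` and `lastmod` element names and the namespace are the ones the Sitemaps protocol defines. Treat it as a rough illustration rather than an official tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by Sitemaps 0.9 documents.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a Sitemap XML string.

    `pages` is an iterable of (url, lastmod) pairs, where lastmod is a
    date string such as "2006-11-16" or None if unknown (hypothetical
    input format chosen for this sketch).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # required: the page URL
        if lastmod:
            # optional metadata: when the page last changed
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    ("http://www.example.com/", "2006-11-16"),
    ("http://www.example.com/about.html", None),
])
print(sitemap_xml)
```

The resulting string would typically be saved as a file on the site and its URL submitted to each participating search engine.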
Sitemaps enhance the current model of Web crawling by allowing webmasters to list all their Web pages to improve comprehensiveness, notify search engines of changes or new pages to help freshness, and identify unchanged pages to prevent unnecessary crawling and save bandwidth. Webmasters can now universally submit their content in a uniform manner. Any webmaster can submit their Sitemap to any search engine which has adopted the protocol.
The Sitemaps protocol used by Google has been widely adopted by many Web properties, including sites from the Wikimedia Foundation and the New York Times Company. Any company that manages dynamic content and a lot of web pages can benefit from Sitemaps. For example, if a company that utilizes a content management system (CMS) to deliver custom web content (i.e., pricing, availability and promotional offers) to thousands of URLs places a Sitemap file on its web servers, search engine crawlers will be able to discover what pages are present and which have recently changed and to crawl them accordingly. By using Sitemaps, new links can reach search engine users more rapidly by informing search engine “spiders” and helping them to crawl more pages and discover new content faster. This can also drive online traffic and make search engine marketing more effective by delivering better results to users.
For companies looking to improve user experience while keeping costs low, Sitemaps also helps make more efficient use of bandwidth. Sitemaps can help search engines find a company's newest content more efficiently and avoid the need to revisit unchanged pages. Sitemaps can list what is new on a site and quickly guide crawlers to that new content.
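As a rough sketch of how a crawler might use this to skip unchanged pages, the Python below reads a Sitemap and keeps only the URLs modified after the last crawl. The `urls_to_crawl` function, the sample data and the rule of always fetching pages with no `lastmod` are assumptions made for illustration; none of the search engines has described its actual crawl logic.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sample Sitemap for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc><lastmod>2006-11-16</lastmod></url>
  <url><loc>http://www.example.com/old.html</loc><lastmod>2005-06-01</lastmod></url>
  <url><loc>http://www.example.com/unknown.html</loc></url>
</urlset>"""


def urls_to_crawl(sitemap_xml, last_crawl):
    """Return URLs changed after `last_crawl` (a datetime.date).

    Pages without a lastmod value are included, since there is no way
    to tell whether they changed (an assumption of this sketch).
    """
    root = ET.fromstring(sitemap_xml)
    selected = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        # Simplification: compare only the date part of lastmod.
        if lastmod is None or date.fromisoformat(lastmod.strip()[:10]) > last_crawl:
            selected.append(loc)
    return selected


to_fetch = urls_to_crawl(SAMPLE, date(2006, 1, 1))
print(to_fetch)
```

Here the page last changed in 2005 is skipped, which is the bandwidth saving the release describes.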
“At industry conferences, webmasters have asked for open standards just like this,” said Danny Sullivan, editor-in-chief of Search Engine Watch. “This is a great development for the whole community and addresses a real need of webmasters in a very convenient fashion. I believe it will lead to greater collaboration in the industry for common standards, including those based around robots.txt, a file that gives Web crawlers direction when they visit a website.”
“Announcing industry supported Sitemaps is an important milestone for all of us because it will help webmasters and search engines get the most relevant information to users faster. Sitemaps address the challenges of a growing and dynamic Web by letting webmasters and search engines talk to each other, enabling a better web crawl and better results,” said Narayanan Shivakumar, Distinguished Entrepreneur with Google. “Our initial efforts have provided webmasters with useful information about their sites, and the information we've received in turn has improved the quality of Google's search.”
“The launch of Sitemaps is significant because it allows for a single, easy way for websites to provide content and metadata to search engines,” said Tim Mayer, senior director of product management, Yahoo Search. “Sitemaps helps webmasters surface content that is typically difficult for crawlers to discover, leading to a more comprehensive search experience for users.”
“The quality of your index is predicated by the quality of your sources and Windows Live Search is happy to be working with Google and Yahoo! on Sitemaps to not only help webmasters, but also help consumers by delivering more relevant search results so they can find what they're looking for faster,” said Ken Moss, General Manager of Windows Live Search at Microsoft.
The protocol will be available at sitemaps.org, and the companies plan to have Yahoo Small Business host the site. Any site owner can create and upload an XML Sitemap and submit the URL of the file to participating search engines.