Today, Google has unveiled a new Google Sitemaps program allowing webmasters and site owners to feed it pages they'd like to have included in Google's web index. Participation is free. Inclusion isn't guaranteed, but Google's hoping the new system will help it better gather pages than traditional crawling alone allows. Feeds also let site owners indicate how often pages change or should be revisited. Below, a Q&A on the new program with Shiva Shivakumar, engineering director and the technical lead for Google Sitemaps.
Can you give us a summary of how the new feed program will work?
Webmasters create XML files containing the URLs they want crawled, along with optional hints about the URLs, such as when the page last changed and its rate of change. They host the Sitemap on their server and tell us where it is. We provide an open-source tool called Sitemap Generator to assist in this process. Eventually, we are hoping webservers will natively support the protocol so there are no extra steps for webmasters. When a Sitemap changes, we support auto-notification so we can pick up the newest version.
Why are you doing this?
We want to index all publicly available information so we can offer better search results. However, web crawling is currently limited. Crawlers don't know all the pages at a website (e.g., dynamic pages), when those pages change, how often to recrawl pages, or how much load to put on a website. So they try to guess. We want to work collaboratively with webmasters to get a big picture of all the URLs we should be crawling, and how often they should be recrawled. Ultimately this benefits our users by increasing the coverage and freshness of our index.
What are the technical details? Just a list of URLs? An XML feed?
We defined a simple XML format that includes the URLs plus optional last modification date, change frequency, and relative priority. We do support a simple list of URLs as well, but using the XML format will help us crawl the sites better.
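As a rough illustration of the format described above, the following Python sketch builds a Sitemap with one entry that carries all three optional hints and one that carries only a URL. The element names (urlset, url, loc, lastmod, changefreq, priority) follow the published Sitemap protocol rather than anything stated in this interview, and the namespace shown is the one later standardized at sitemaps.org, which may differ from the value used at launch. The function name build_sitemap and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the later sitemaps.org namespace; the launch-era value may differ.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries, namespace=SITEMAP_NS):
    """Return Sitemap XML text for a list of dicts.

    Each dict needs a "loc" key (the URL). The keys "lastmod",
    "changefreq", and "priority" are optional hints and are only
    written when present.
    """
    urlset = ET.Element("urlset", {"xmlns": namespace})
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        for field in ("lastmod", "changefreq", "priority"):
            if field in entry:
                ET.SubElement(url, field).text = str(entry[field])
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        {
            "loc": "http://www.example.com/abc/",
            "lastmod": "2005-06-03",   # last modification date
            "changefreq": "daily",     # how often the page tends to change
            "priority": 0.8,           # relative to other URLs on the same site
        },
        {"loc": "http://www.example.com/abc/page.html"},  # hints omitted
    ]))
```

Leaving a hint out simply omits that element, which matches the interview's description of the hints as optional.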
Do you need for me to prove in some way that I'm associated with the site I'm submitting for?
We accept all the URLs under the directory where you post the Sitemap. For example, if you have posted a Sitemap at www.example.com/abc/sitemap.xml, we assume that you have permission to submit information about URLs that begin with www.example.com/abc/.
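The directory rule above can be sketched in a few lines of Python. This is only an illustration of the prefix check as described in the answer, not Google's actual validation logic; the function names sitemap_scope and in_scope are hypothetical, and the sketch assumes URLs are written with an explicit scheme.

```python
from urllib.parse import urlsplit


def sitemap_scope(sitemap_url):
    """Return the URL prefix covered by a Sitemap hosted at sitemap_url."""
    parts = urlsplit(sitemap_url)
    # Drop the file name and keep the directory, with a trailing slash.
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return f"{parts.scheme}://{parts.netloc}{directory}"


def in_scope(sitemap_url, page_url):
    """True if page_url begins with the directory that holds the Sitemap."""
    return page_url.startswith(sitemap_scope(sitemap_url))


if __name__ == "__main__":
    sitemap = "http://www.example.com/abc/sitemap.xml"
    print(sitemap_scope(sitemap))
    print(in_scope(sitemap, "http://www.example.com/abc/page.html"))
    print(in_scope(sitemap, "http://www.example.com/xyz/page.html"))
```

Under this reading, a Sitemap placed at the site root would cover every URL on that host, while one placed in a subdirectory covers only that subtree.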
Will all my URLs get in? Some? Any guarantee? And how quickly?
At this early stage, we cannot guarantee that we'll crawl or index all your URLs. But as we understand the data better, we hope to get more of the data into our crawl and indices.
How does someone sign up?
Go to Google Sitemaps and use your Google Account or create a new one to sign in. If you already use Gmail, Groups, My Search History, Alerts, or Froogle Shopping List, you already have a Google Account.
And this is all for free?
Absolutely. Also, this is an open protocol. We are hoping all webservers and search engines adopt this protocol and benefit from the increased collaboration.
Any chance you may provide a reporting tool down the line, so people can tell what searches are sending them clicks?
We are starting with some basic reporting, showing the last time you've submitted a Sitemap and when we last fetched it. We hope to enhance reporting over time, as we understand what the webmasters will benefit from. If you have ideas on more of what you would like to see, let us know at the new Google-Sitemaps area at Google Groups.
How will you prevent people from using this to spam the index in bulk?
We are always developing new techniques to manage index spam. All those techniques will continue to apply to Google Sitemaps.
If I don't use the program, you may still find pages through the regular way of crawling, correct?
Yes. This program is a complement to, not a replacement for, the regular crawl. However, we hope that the hints you offer in the Sitemap will help us do a better job than the regular crawl.
Still have more questions or comments? The Sitemaps FAQ goes into depth on many more details. The Google Sitemaps team will be taking questions and responding all day at our Search Engine Watch Forums thread, Google Sitemaps Now Accepting Web Page Feeds. Long-term, the team will also be monitoring the new Google-Sitemaps area at Google Groups.