New “Google Sitemaps” Web Page Feed Program

Today, Google has unveiled a new Google Sitemaps program allowing webmasters
and site owners to feed it pages they’d like to have
included in Google’s web index. Participation is free. Inclusion isn’t
guaranteed, but Google’s hoping the new system will help it better gather pages
than traditional crawling alone allows. Feeds also let site owners indicate how
often pages change or should be revisited. Below, a Q&A on the new program with
Shiva Shivakumar, engineering director and the technical lead for Google
Sitemaps.

Can you give us a summary of how the new feed program will work?

Webmasters create XML files containing the URLs they want crawled, along
with optional hints about the URLs, such as when the page last changed and
how often it changes. They host the Sitemap on their server and
tell us where it is. We provide an open-source tool called Sitemap Generator
to assist in this process. Eventually, we are hoping webservers will natively
support the protocol so there are no extra steps for webmasters. When a
Sitemap changes, we support auto-notifying us so we can pick up the newest
version.

Why are you doing this?

We want to index all publicly available information so we can offer better
search results. However, web crawling is currently limited. Crawlers don’t
know all the pages at a website (e.g., dynamic pages), when those pages
change, how often to recrawl pages, or how much load to put on a website. So they
try to guess. We want to work collaboratively with webmasters to get a big
picture of all the URLs we should be crawling, and how often they should be
recrawled. Ultimately this benefits our users by increasing the coverage and
freshness of our index.

What are the technical details? Just a list of URLs? An XML feed?

We defined a simple XML format that includes the URLs plus optional last
modification date, change frequency, and relative priority. We do support a
simple list of URLs as well, but using the XML format will help us crawl the
sites better.
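
To make the two formats concrete, here is a minimal Python sketch that writes the same set of pages both ways. The interview does not spell out the schema, so the element names (urlset, url, loc, lastmod, changefreq, priority) and the namespace are assumptions borrowed from the Sitemap protocol as it was later published; the helper names build_sitemap and build_url_list are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the later public Sitemap protocol,
# not something stated in the interview.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return Sitemap XML for a list of entry dicts.

    Each entry needs "loc" (the URL). The hints "lastmod",
    "changefreq" and "priority" are optional and only written
    when present.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        for hint in ("lastmod", "changefreq", "priority"):
            if hint in entry:
                ET.SubElement(url, hint).text = str(entry[hint])
    return ET.tostring(urlset, encoding="unicode")


def build_url_list(entries):
    """Return the plain-text alternative: one URL per line, no hints."""
    return "\n".join(entry["loc"] for entry in entries) + "\n"


pages = [
    {"loc": "http://www.example.com/", "lastmod": "2005-06-03",
     "changefreq": "daily", "priority": 0.8},
    {"loc": "http://www.example.com/about"},
]
print(build_sitemap(pages))
print(build_url_list(pages))
```

The plain list carries only the URLs, which is why the XML form gives the crawler more to work with: the hints have nowhere to go in a one-URL-per-line file.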

Do you need me to prove in some way that I’m associated with the site
I’m submitting for?

We accept all the URLs under the directory where you post the Sitemap. For
example, if you have posted a Sitemap at www.example.com/abc/sitemap.xml, we
assume that you have permission to submit information about URLs that begin
with www.example.com/abc/.
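
The directory rule in this answer amounts to a URL prefix check, which the following Python sketch illustrates. It covers only the rule as described here; how Google actually compares schemes, host names, or letter case is not stated in the interview, and the function names sitemap_scope and in_scope and the http:// scheme are assumptions.

```python
from urllib.parse import urlsplit


def sitemap_scope(sitemap_url):
    """Return the URL prefix (scheme, host, directory) a Sitemap covers."""
    parts = urlsplit(sitemap_url)
    # Drop the file name and keep the directory with a trailing slash.
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return f"{parts.scheme}://{parts.netloc}{directory}"


def in_scope(sitemap_url, page_url):
    """True if page_url begins with the directory holding the Sitemap."""
    return page_url.startswith(sitemap_scope(sitemap_url))


sitemap = "http://www.example.com/abc/sitemap.xml"
print(sitemap_scope(sitemap))
print(in_scope(sitemap, "http://www.example.com/abc/page.html"))
print(in_scope(sitemap, "http://www.example.com/xyz/page.html"))
```

Under this reading, a Sitemap posted at the site root covers every URL on that host, while one posted in a subdirectory covers only that subtree.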

Will all my URLs get in? Some? Any guarantee? And how quickly?

At this early stage, we cannot guarantee that we’ll crawl or index all your
URLs. But as we understand the data better, we hope to get more of the data
into our crawl and indices.

How does someone sign up?

Go to Google Sitemaps and use your Google Account or
create a new one to sign in. If you already use Gmail, Groups, My Search
History, Alerts, or Froogle Shopping List, you already have a Google Account.

And this is all for free?

Absolutely. Also, this is an open protocol. We are hoping all webservers
and search engines adopt this protocol and benefit from the increased
collaboration.

Any chance you may provide a reporting tool down the line, so people can
tell what searches are sending them clicks?

We are starting with some basic reporting, showing when you last
submitted a Sitemap and when we last fetched it. We hope to enhance reporting
over time, as we understand what webmasters will benefit from. If you have
ideas about what else you would like to see, let us know at the new
Google-Sitemaps area at Google Groups.

How will you prevent people from using this to spam the index in bulk?

We are always developing new techniques to manage index spam. All those
techniques will continue to apply to Google Sitemaps.

If I don’t use the program, you may still find pages through the regular
way of crawling, correct?

Yes. This program is a complement to, not a replacement for, the regular
crawl. However, we hope that the hints you offer in the Sitemap will help us
do a better job than the regular crawl.

Still have more questions or comments? The Sitemaps FAQ goes into depth on
many more details. The Google Sitemaps team will be taking questions and
responding all day at our Search Engine Watch Forums thread, Google Sitemaps
Now Accepting Web Page Feeds. Long-term, the team will also be monitoring the
new Google-Sitemaps area at Google Groups.
