In December, Google became the first crawler-based search engine to break the 1.5 billion web page mark. In addition, the service rolled out changes designed to improve the freshness of its results and make it easier for users to find news.
The Google index now contains more than 1.5 billion web pages that Google has actually visited, as well as an additional half-billion pages that it knows about only through links. There are also 330 million image files and 700 million Usenet posts, which stretch back to 1981.
The enlargement of Google's Usenet information makes it a fantastic resource for researching the early days of the Internet, and Search Engine Watch's associate editor Chris Sherman takes a closer look at the enhanced Google Groups, in his story below.
Sherman's story also provides more details about Google's improved news search results. Since the middle of 2000, Google has provided links to news stories at the top of its results page, in response to certain queries. The news content was pulled from major wire services.
With the latest changes, Google now pulls that content from hundreds of web sites that it says it has identified as having news content. Google also says that news links are three times more likely to appear in results than in the past. When it appears, news content shows up at the top of the standard Google results page, with the word "News" to the left of any links. Try a search for "euro" or "argentina," and you'll see examples of news links.
Unfortunately, the changes still leave Google weak in the news search arena. At competitors AltaVista and FAST, there are dedicated news search offerings. There are also a variety of good, new news search sites such as Daypop and RocketNews available, in addition to established ones such as Moreover. In any of these places, users who specifically want to find news content can be guaranteed to find it.
In contrast, there's no way to perform a news-only search at Google, in the way you can perform an image search or a newsgroup search. Instead, you have to hope that the Google search algorithm manages to float news results up in response to your query. To stay competitive, given the huge interest in news search, Google needs to finally make a dedicated news search option available.
Google did also roll out a "Headline News" service in December, but that's not the same thing. This service aggregates top headlines from more than 100 leading English language newspapers into a single page and groups them into six categories: World, US, Business, Entertainment, Technology and Sports.
Google is promising future changes, such as more news sources and interface enhancements. Hopefully, one of those enhancements will be the ability to do keyword searching against the Google news search index used to feed its main results page.
Google is also trying to improve the freshness of its web page index. Previously, Google updated its web page index on a roughly monthly basis. This meant that the copy of a page in the index could be around a month old if you used Google just before the latest refresh happened.
The monthly refresh is still continuing, but a new daily refresh now also runs. A few million pages identified as being time-sensitive are being spidered regularly, so that the latest information from them is available.
Google even highlights when a page has been refreshed recently, using a new "Fresh!" tag that appears next to the page's URL. The tag also shows when the page was last respidered.
For instance, search for "white house," and you'll see that the US White House site is noted as "Fresh!," having last been visited on January 6.
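To make the two refresh cycles concrete, here is a minimal sketch of how a crawler might decide which pages are due for a revisit and which deserve a "Fresh!" style label. It is an illustration only, not a description of how Google's system actually works. The 30-day and 1-day intervals are assumptions drawn from the "roughly monthly" and "daily" wording above, and the Page class, its time_sensitive flag and the function names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed intervals, based only on the "roughly monthly" and "daily" wording.
MONTHLY = timedelta(days=30)
DAILY = timedelta(days=1)


@dataclass
class Page:
    url: str
    last_crawled: datetime
    time_sensitive: bool = False  # hypothetical flag for the daily-refresh set


def refresh_interval(page: Page) -> timedelta:
    """Time-sensitive pages get the daily cycle; everything else is monthly."""
    return DAILY if page.time_sensitive else MONTHLY


def is_due(page: Page, now: datetime) -> bool:
    """True when the page has not been crawled within its refresh interval."""
    return now - page.last_crawled >= refresh_interval(page)


def fresh_label(page: Page, now: datetime) -> str:
    """Return a 'Fresh!' style label for pages crawled within the last day.

    The one-day threshold is an assumption made for this sketch.
    """
    if now - page.last_crawled <= DAILY:
        return f"Fresh! {page.last_crawled.date().isoformat()}"
    return ""


if __name__ == "__main__":
    now = datetime(2002, 1, 7)
    pages = [
        Page("http://example.com/news", datetime(2002, 1, 6, 12), time_sensitive=True),
        Page("http://example.com/about", datetime(2001, 12, 1)),
    ]
    for p in pages:
        print(p.url, "due:", is_due(p, now), "label:", repr(fresh_label(p, now)))
```

In this sketch, a page in the monthly set can go up to 30 days without a revisit, which matches the month-old worst case described earlier, while a page flagged as time-sensitive never waits much longer than a day.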
The Fresh notations are welcome, but it would be even better if Google showed dates for all the pages it lists, in the way AltaVista used to. Then it would be extremely easy to know exactly when a page was last visited by the Google spider.
By the way, that long-standing page date option was available at AltaVista until recently. It now appears to have been pulled, probably because it made it so easy to understand how fresh -- or stale -- AltaVista's index was.
Google Launches New Salvo in Search Engine Size Wars
SearchDay, Dec. 11, 2001
More details on Google getting bigger, enhancing its Google Groups area and making freshness changes.
Google Headline News
News Search Engines
Freshly-updated, a guide to major news search resources.
Newly updated page that's an entertaining read of Google's Cinderella story.