THE SEARCH ENGINE REPORT
August 2, 1999 - Number 33
About The Report
The Search Engine Report is a monthly newsletter that covers developments with search engines and changes to the Search Engine Watch web site, http://searchenginewatch.com/.
The report has 65,000 subscribers. You may pass this newsletter on to others, as long as either part is sent in its entirety.
If you enjoy this newsletter, consider showing your support by becoming a subscriber of the Search Engine Watch web site. It doesn't cost much and provides you with some extra benefits. Details can be found at: http://searchenginewatch.com/about/subscribe.html
Please note that long URLs may break into two lines in some mail readers. Cut and paste, should this occur.
In This Issue
Search Engine News
+ Search Engine Coverage Study Published
+ Go Beta Tests User-Assisted Directory
+ LookSmart Live Looks-Up Answers
+ Direct Hit Debuts at MSN Search, Lycos
+ Paid Links Dropped, Other Changes At AltaVista
+ FAST Announces Largest Search Engine
Search Engine Briefs
+ Search Within, Text-Only Features At HotBot
+ Northern Light Adds Clustering
+ Playboy Loses In Excite Banner Dispute
+ Snap Gains Image Search
+ Metasearch Service Caters To Visually Impaired
Search Engine Articles
The new links system that I mentioned last month is officially online. Within Search Engine Links, you'll find listings for all types of different search engines. Eventually, I'll also add links currently contained within the Search Engine Resources area to this new system.
If you actually submitted a site to the new system before July 30, please do a resubmission. An error on my end means that I didn't get a record of submissions before then. My apologies for this problem.
Also, by Tuesday or Wednesday, I'll have a new feature in the Search Spotlight that shows popular queries used at iTrack.com, a metasearch service designed for auction lovers. Watch for the link on What's New.
Search Engine Links
Search Engine News
Search Engine Coverage Study Published
In 1998, a landmark study published in Science magazine found that search engines fell well short of listing everything on the web. Last month, a follow-up to this study was published in Nature magazine. Once again, search engines were found to contain only a small percentage of the web pages available for indexing.
Both studies made for great headlines about how search engines seemingly fail users, but the number of pages a search engine has indexed has nothing to do with whether it is any good. Size alone doesn't matter.
Don't believe me? Go to your favorite search engine and look for a company, any company. Was the company's web site listed first? That's what many users want to have happen. This example is just one of many that illustrate the value of relevancy over sheer size. Improving relevancy means being a smarter search engine, not just a bigger one.
Even the study itself explains that bigger doesn't mean better: "There are diminishing returns to indexing all of the web, because most queries made to the search engines can be satisfied with a relatively small database," the study's authors write.
In fact, study coauthor Dr. Steve Lawrence isn't saying search engines are bad. Lawrence credits them for making more information available to the public than ever before, but he thinks they could do better.
"The amount of information that you can quickly and efficiently search for using search engines has been increasing, but what has been decreasing is the amount you can search for versus what you could potentially search for," Lawrence said.
So having provided some balance to the headlines you may have read recently, let's examine details from the study in more depth.
Conducted by scientists at the NEC Research Institute, the study found that there were 800 million indexable pages and 180 million images on the web, as of February 1999. "Indexable" means pages that weren't hidden behind password protection, excluded from indexing by robots.txt files, locked away in databases or basically inaccessible to search engines for other reasons.
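For readers unfamiliar with robots.txt exclusion, here is a minimal sketch of how a crawler might decide whether a page is off-limits. It uses Python's standard urllib.robotparser module. The sample robots.txt rules, the "ExampleBot" user agent and the example.com URLs are made up for illustration, and nothing here is taken from the study itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/public/a.html",
                "http://example.com/private/a.html"):
        print(url, is_crawlable(SAMPLE_ROBOTS_TXT, "ExampleBot", url))
```

A page that fails this check would fall outside the "indexable" web as the study defines it.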
The first study found that there were 320 million pages as of December 1997, so it sounds as if the web has more than doubled in size in just over a year. However, the two studies cannot be compared fairly because they used completely different methods of estimating the size of the web.
Previously, the size estimate was derived by comparing the overlap between results at different search engines to extrapolate an overall figure for the web. In the latest study, the researchers used a variety of techniques to determine how many web servers were publicly available (2.8 million) and the mean number of pages per server (289). Multiplying these two figures is where the 800 million web pages count comes from.
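The multiplication is easy to check. The sketch below uses only the two figures quoted above; the function name is my own, and the rounding to the nearest hundred million is an assumption about how the headline number was presented.

```python
def estimate_indexable_pages(servers: int, mean_pages_per_server: int) -> int:
    """Estimate web size as (public servers) x (mean pages per server)."""
    return servers * mean_pages_per_server


# Figures quoted from the study: 2.8 million servers, 289 pages per server.
estimate = estimate_indexable_pages(2_800_000, 289)

if __name__ == "__main__":
    print(f"{estimate:,} pages")                 # exact product
    print(f"{round(estimate, -8):,} (rounded)")  # nearest hundred million
```

The exact product is 809.2 million pages, which rounds to the 800 million figure.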
From here, the next step was to determine how much coverage of the web each search engine provided.* You need a known starting point, and Northern Light was selected. The researchers did a search that told them Northern Light had an index of 128 million web pages.** Then they divided Northern Light's index size by the entire web size to find that Northern Light covers 16 percent of the web.
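As a quick check of that division, here is a small sketch using the two figures quoted above. The function name is illustrative only.

```python
def coverage_percent(index_pages: int, web_pages: int) -> float:
    """Share of the estimated indexable web held in one index, in percent."""
    return 100.0 * index_pages / web_pages


# Northern Light's reported index size against the study's web estimate.
northern_light = coverage_percent(128_000_000, 800_000_000)

if __name__ == "__main__":
    print(f"Northern Light coverage: {northern_light:.0f}%")
```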
Unfortunately, you can't get a count of index size for some of the other crawler-based search engines, so the researchers instead ran a series of test queries -- 1,050 in all.*** They totaled how many web pages were returned for each search engine, and that told them proportionally how big or small each service was compared to Northern Light (which coincidentally led the pack in terms of coverage). Here's the complete rundown:
Northern Light: 16%
Inktomi (Snap): 15.5%
Inktomi (HotBot): 11.3%
Inktomi (MSN Search): 8.5%
Inktomi (Yahoo): 7.4%
Inktomi provided search results to several services when the survey was done, which is why it appears multiple times. It powered primary results at MSN Search and secondary results at HotBot, Snap and Yahoo. The variation in coverage reflects what I and others have written about in the past -- not all Inktomi partners tap completely into its 110 million page index.
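The proportional step can be sketched as follows. This is a simplified reading of the method, not the researchers' actual procedure. The result counts are hypothetical, chosen only so that the output lines up with the 15.5 percent reported for Snap.

```python
def scaled_coverage(benchmark_coverage: float,
                    benchmark_results: int,
                    engine_results: int) -> float:
    """Scale the benchmark's coverage by an engine's share of returned pages.

    benchmark_coverage: coverage (percent) of the engine with a known index size.
    benchmark_results:  pages the benchmark returned across the test queries.
    engine_results:     pages the other engine returned for the same queries.
    """
    return benchmark_coverage * engine_results / benchmark_results


# Hypothetical totals across the test queries (not figures from the study).
NORTHERN_LIGHT_RESULTS = 3200
OTHER_ENGINE_RESULTS = 3100

if __name__ == "__main__":
    estimate = scaled_coverage(16.0, NORTHERN_LIGHT_RESULTS, OTHER_ENGINE_RESULTS)
    print(f"Estimated coverage: {estimate}%")
```

An engine that returns fewer pages than the benchmark for the same queries gets a proportionally smaller coverage figure.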
Lawrence said the percentages for HotBot and Lycos might be slightly higher than shown, because those services will only display one page per web site in their results. That could have caused an undercount, he said.
The big winner was Northern Light, which saw its traffic triple just after the study was published. That had to be satisfying to CEO David Seuss, who said earlier this year that winning the title of biggest search engine would make the still relatively little-known service more popular. Of course, whether those users will stick with Northern Light remains to be seen, and challenger FAST has just ousted Northern Light from the number one spot with its new 200 million web page index. Excite is also about to weigh in later this month with a new index in the 250 million page range - expect more details in the next newsletter. Northern Light is in the 165 million page range, followed by AltaVista's 150 million pages. All sizes are unaudited, self-reported numbers.
Combined, the search engines are seen to be covering only 42 percent of the web. That means you'd reach more pages by using every search engine than by using any single service alone, but you'd still miss more than half of the web. In comparison, the last study found a combined coverage of 60 percent. So things are getting worse? Probably, but not conclusively. You can't accurately compare these figures because the methods of estimating the size of the web were so different for each study.
Lawrence agrees that direct comparisons won't be exact, but he feels the overall finding that coverage is dropping is basically correct. "It wouldn't change any of the conclusions, though it might have changed the magnitude," he said of the drop, assuming the same size method had been used in both studies.
The study also reported on the number of dead links at each service, which are an indication of how fresh each search engine's index is. The lower the percentage, the fresher the index. Overall results were:
Northern Light: 9.8%
Inktomi (Yahoo): 2.9%
Inktomi (Snap): 2.8%
Inktomi (MSN Search): 2.6%
Inktomi (HotBot): 2.2%
The poor showing by Lycos reflects another fact I've previously reported, that its index was woefully out of date earlier this year. The situation at Lycos has been greatly improved since then. Moreover, primary results at Lycos now come from the Open Directory, not from the spidered index reviewed in the study.
The study also found that it was more likely that pages from popular and well-known sites would be indexed. That's in part a function of how search engines work -- spiders are more likely to visit a site if they keep coming across links to it -- plus it is a conscious decision on the part of almost all the major search engines as a means to provide better listings and combat spam.
The study's authors worry that this trend means high-quality but "unpopular" pages may not list well. In my opinion, this is less of a concern. If anything, I feel the trend toward popularity measures means that general users will get more relevant and useful results in response to common queries, which is desperately needed. Meanwhile, those performing more refined and focused queries such as research professionals should still be able to locate much information of use.
The solution to better serving the second group probably won't be to create a single, super-comprehensive 800 million web page search engine. Instead, it's likely to be the creation of new specialty services that do in-depth coverage of sites by topic.
Long overdue is probably an academic search engine. I've talked to several people recently who all wish there was a way to do a search across university web sites, with the knowledge that the search engine would have done a deep and frequent crawl of the sites on its list. Whether such a service would make money for its owners is another issue, of course.
But even as specialty services arise, there are definitely advantages for general purpose search engines to enlarge their indexes and keep pace with the web. It means that they can provide more comprehensive coverage for those users that search for unique or obscure information, such as about a rare disease. And users themselves should consider search engine size when selecting among services. Just be wary about using size alone to decide which service is best.
Accessibility and Distribution of Information on the Web
This page is produced by the authors of the study. You can request that a copy be sent via email, and more details are also promised for the future.
Search Engine Sizes
You'll find comparison charts and links to past articles that deal with size issues, including the previous Science study, all on this page.
* The researchers actually did test queries first, then used Northern Light as a benchmark, but I've reversed the steps to more easily explain the process.
** The researchers did a Boolean NOT search for a term they knew was not in the Northern Light index. As a result, a listing of all pages indexed by Northern Light was displayed.
*** The search engines will provide self-reported size estimates, but with some of them there's no way to audit these figures via a search, as was done with Northern Light.
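To see why the Boolean NOT search in the second footnote lists every page, consider a toy inverted index. The sketch below is an assumption about how such a query could behave in general. It says nothing about Northern Light's actual implementation, and the documents and the nonsense term are invented.

```python
from typing import Dict, Set


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the set of document ids that contain it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(doc_id)
    return index


def search_not(index: Dict[str, Set[str]], all_ids: Set[str], term: str) -> Set[str]:
    """Return every document that does NOT contain the term."""
    return all_ids - index.get(term.lower(), set())


# Invented documents, for illustration only.
DOCS = {
    "page1": "search engine size study",
    "page2": "web directory guides",
    "page3": "image search on the web",
}

if __name__ == "__main__":
    index = build_index(DOCS)
    # A term that appears nowhere excludes nothing, so every page matches.
    print(len(search_not(index, set(DOCS), "zzzxqv")), "of", len(DOCS), "pages")
```

Because no document contains the nonsense term, the result count equals the size of the whole index.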
Go Beta Tests User-Assisted Directory
Taking a page from the Open Directory, Go is inviting the masses of the web to help manage its directory of web sites. Currently in beta testing, the "Go Guides" program lets anyone participate in the process of categorizing the web.
"To date, it has been really successful. We've received amazing topics, real nuggets of gold," said Jennifer Mullin, Go's Director of Search and Navigation. About 500 guides are currently involved in the program, which will go fully public later this month.
An innovative feature of the program is that guides who do a good job are rewarded with more power and authority, while those who do a poor job are penalized.
For instance, all guides begin at Level 1, which means that they can suggest sites and approve submissions by others, but these actions must be authorized by other guides before they take effect. As their actions are approved, the guides rise in rank and eventually earn the power to do things without needing approval.
Similarly, if a guide with sufficient rank adds a site or makes a change that causes disagreement by another guide, a complaint can be issued. If upheld, the guide loses points and drops in rank.
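The reward-and-penalty idea can be modeled in a few lines. Go has not published its scoring rules in anything cited here, so every number below (points per level, the level at which approval is no longer needed, the size of the reward and penalty) is a hypothetical placeholder, as are the class and method names.

```python
from dataclasses import dataclass

# Hypothetical values; the actual Go Guides thresholds are not stated.
POINTS_PER_LEVEL = 10
TRUSTED_LEVEL = 3
APPROVAL_REWARD = 1
COMPLAINT_PENALTY = 5


@dataclass
class Guide:
    name: str
    points: int = 0

    @property
    def level(self) -> int:
        # All guides begin at Level 1 and rise as they earn points.
        return 1 + self.points // POINTS_PER_LEVEL

    @property
    def needs_approval(self) -> bool:
        # Lower-ranked guides need other guides to authorize their actions.
        return self.level < TRUSTED_LEVEL

    def action_approved(self) -> None:
        """Another guide approved this guide's suggestion or review."""
        self.points += APPROVAL_REWARD

    def complaint_upheld(self) -> None:
        """A complaint was upheld: lose points, possibly dropping in rank."""
        self.points = max(0, self.points - COMPLAINT_PENALTY)


if __name__ == "__main__":
    guide = Guide("example-guide")
    for _ in range(20):
        guide.action_approved()
    print(guide.level, guide.needs_approval)
    guide.complaint_upheld()
    print(guide.level, guide.needs_approval)
```

The point of the design is that trust is earned through peer-approved work and can be lost again.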
These checks-and-balances are meant to avoid a problem that has occasionally cropped up with the Open Directory, where editors may sign up for categories and then do nothing but promote their own sites. It's a nice solution to letting the general public participate in the directory while simultaneously protecting its quality. It also appears to be working. Mullin said that the biggest problem has not been spam but instead educating Go Guides on how to write proper site descriptions.
One thing that I particularly like is that you can participate in Go Guides without having to commit to being an editor, as is the case at the Open Directory. It's perfectly valid to join, then suggest sites for whatever categories are of interest to you, rather than being locked into one particular area.
Of course, should you want to manage a category -- Go calls them Topics -- that's an option, too. New Go Guides can manage up to two topics at a time.
There are little bugs still to be worked out, such as improving the system so that sites rejected for small reasons can be easily resubmitted. But overall, I thought the program worked surprisingly well. It was also rather compelling. I planned to spend just a short time using it for this review, but three hours later, I was still at it, raiding my bookmarks to find new sites to add.
Be aware that there is a delay between when sites are added internally and when they appear live at Go -- it seems to be about a week, at the moment. And for those of you unfamiliar with the Go Directory, you can browse topics from the Go home page - just click on the "Topics" tab below the search box. Related topics also appear at the top of search results, in the "Matching topics" section.
Go Guides Beta
Ready to participate? Sign-up via this page.
Go Directory Help
More information about the directory, including how to submit if you are not a Go Guide. A form-based submission feature is coming.
LookSmart Live Looks-Up Answers
Looking for an answer? Look no further than LookSmart, which is providing custom research to frustrated searchers through its new LookSmart Live program.
The concept is simple. Using the LookSmart Live form, you tell LookSmart what you are looking for. The request gets passed on to one of 80 editors involved in the project, and within 24 hours, you get an email back with your answer.
Sound too good to be true? I tested it anonymously, and it really did work. I got a nice response listing several web sites relating to my question, along with categories and information located within LookSmart itself.
LookSmart says it launched the program to better serve users who feel overwhelmed when performing searches. "People are actually pretty frustrated with search. A significant minority of searchers on the Internet, regardless of which service they use, are not getting what they are looking for," said Damian Smith, Director of LookSmart Live.
In the coming weeks, LookSmart Live will transition into a more community-oriented format. The idea is that users will be able to share their knowledge with each other, via forums. Ask a question, and another forum member will provide an answer. There will also be an associated ratings system, so that you can tell at-a-glance the reputation of someone providing an answer. Furthermore, should you be dissatisfied with answers on the forum, LookSmart will then send its own editors out to do some research, Smith said.
So just like the Open Directory and Go Guides, another search service is looking to leverage users actively in organizing the web. Aside from benefiting the services themselves, these new outlets allow the general web public to easily be part of a community without having to publish web pages or join something like Yahoo Clubs.
"I think the industry has remarkably quickly locked on to web pages as the only way to publish information on the web," Smith said. "We think this type of community bulletin board model is an easier way of self-expression. Instead of setting up home pages, we think it will be equally valuable for you to share your expertise by participating in this community."
LookSmart is promoting the service in two places, via a small link on the home page and via links that appear within the search results themselves. It's this second placement that is really driving traffic to LookSmart Live, and Smith said that hundreds of questions per day are being answered. In fact, you can watch incoming questions appear in a special box on the LookSmart Live home page. You'll also find answers to popular and unusual questions archived on that page.
Direct Hit Debuts at MSN Search, Lycos
Both MSN Search and Lycos are now featuring Direct Hit results, and the company itself has just received $26 million in financing from a variety of venture firms.
At MSN Search, do a search, and if Direct Hit information is available, you'll see an image link immediately below the search box on the results page that says "Top 10 Most Popular Sites." Click on the link, and Direct Hit results will be displayed.
Lycos is now also using Direct Hit information by default, within the "Web Pages" section of search results. Direct Hit says the majority of pages listed here will be coming from its database.
The new funding was announced today, and Direct Hit says it plans to use it for boosting its brand name and image both in the search industry and among consumers.
Paid Links Dropped, Other Changes At AltaVista
There were a number of developments at AltaVista last month, and here's a rundown of the major ones:
+ AltaVista has eliminated its paid links program. "It just didn't work well enough either for the advertisers or the users," said Tracy Roberts, AltaVista's marketing director. AltaVista is planning a new advertiser program in the near future, Roberts said, but she wouldn't provide further specifics. Expect more when they can talk about details....
+ AltaVista is now featuring Open Directory information within its new "My AltaVista" channel. Near the bottom of the My AltaVista home page, you'll find a "Web Directory" area where you can browse listings that derive from the Open Directory. AltaVista maintains its partnership with LookSmart for directory listings in the main site, and there are no plans to change this any time soon, Roberts said.
+ My AltaVista itself is new, and it offers AltaVista users the ability to have personalized features such as those available from Yahoo, Go, Excite and Lycos. Users who register can have a customizable AltaVista start page featuring weather, sports scores, news headlines and other options.
+ AltaVista has launched a free Internet access service, AltaVista FreeAccess, for US users.
AltaVista Nixes Paid Search
Wired, July 23, 1999
More quotes from AltaVista on dropping the paid links program, along with background about it.
Many Changes At AltaVista
The Search Engine Report, July 6, 1999
You'll find links to background about the paid links program at AltaVista and information about the CMGI purchase.
FAST Announces Largest Search Engine
FAST Search is now claiming an index of 200 million web pages, which would make it the largest search engine on the web.
FAST Search launched in May of this year, previously under the name All The Web (that URL is still used). The service is backed by Norwegian search technology company Fast Search & Transfer and produced in partnership with Dell.
FAST says the FAST Search site is still intended to be a demonstration of its technology, not a major search portal. The company's real goal is to power other people's search engines, much as Inktomi and Direct Hit power results at HotBot.
Consequently, don't expect a lot of bells-and-whistles at FAST Search. But if you are looking for an alternative to the comprehensive search capabilities offered by Northern Light, AltaVista and Inktomi-powered services, then pay a visit to FAST.
FAST Aims For Largest Index
The Search Engine Report, July 5, 1999
More background information about FAST.
Search Engine Briefs
Search Within, Text-Only Features At HotBot
HotBot has added a "Search Within" feature to its site. After you do a search, look to the right of the search box on the results page. You'll see a "Search within these results" option. Select the option, then enter new terms, and HotBot will run your query against only the pages that were found from the original search.
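Conceptually, a "search within" feature runs the second query against the first query's result list rather than the full index. The sketch below illustrates that idea with simple all-terms matching. It is not a description of HotBot's implementation, and the pages and URLs are invented.

```python
from typing import Dict, List, Optional


def search(pages: Dict[str, str], query: str,
           within: Optional[List[str]] = None) -> List[str]:
    """Return URLs whose text contains every query term.

    If `within` is given, only those URLs are considered, which is the
    "search within these results" behavior.
    """
    terms = query.lower().split()
    candidates = list(pages) if within is None else within
    return [url for url in candidates
            if all(term in pages[url].lower().split() for term in terms)]


# Invented pages, for illustration only.
PAGES = {
    "a.example/1": "search engine size study",
    "b.example/2": "search engine relevancy",
    "c.example/3": "banner ads",
}

if __name__ == "__main__":
    first = search(PAGES, "search engine")
    refined = search(PAGES, "relevancy", within=first)
    print(first)
    print(refined)
```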
Another relatively new feature is a text-only mode. Hate ads and graphics? HotBot's text-only search is quick. The URL below takes you to it, or click on the "Text-only version" link you'll find at the bottom of HotBot's pages.
HotBot Text-Only Version
Northern Light Adds Clustering
Northern Light is now clustering its results, so that only one page per web site appears in the top listings. Any additional pages are accessible by clicking on the "More Results" link below the page listing.
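Site clustering of this kind can be sketched as a single pass over the ranked results, keeping the first page seen for each host and setting the rest aside. This is an illustration under my own assumptions (grouping by host name, preserving rank order), not Northern Light's actual algorithm, and the URLs are invented.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlparse


def cluster_by_site(ranked_urls: List[str]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Keep the best-ranked page per site; file the others under that site.

    Returns (top_listings, more_results), where more_results maps each host
    to the lower-ranked pages a "More Results" link would reveal.
    """
    top: List[str] = []
    more: Dict[str, List[str]] = {}
    for url in ranked_urls:
        host = urlparse(url).netloc
        if host not in more:
            more[host] = []          # first page from this site: show it
            top.append(url)
        else:
            more[host].append(url)   # later pages: tuck behind the link
    return top, more


if __name__ == "__main__":
    results = [
        "http://a.example/1",
        "http://b.example/x",
        "http://a.example/2",
    ]
    top, more = cluster_by_site(results)
    print(top)
    print(more)
```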
Northern Light has also reduced the number of results presented by default from 25 to 10, which apparently has upset some users. The service said this was done to cope with an increase in traffic caused by the Nature article on search engine sizes. Northern Light was found to be the most comprehensive search engine in that article, which attracted many new visitors.
Later this year, Northern Light says it will introduce customization so that users can select to see 10, 20, 50 or 100 results by default.
Playboy Loses In Excite Banner Dispute
In January, Playboy sued Excite for selling banner ads that appeared in response to searches for "playboy" and "playmate." Now it has lost its case, but this still doesn't mean that search engines have carte blanche to sell ads linked to trademarked terms. In the Playboy case, the US District Court apparently ruled that the disputed terms were too generic for Playboy to have a sole hold over them. In contrast, a similar case filed against Excite by Estee Lauder could succeed if a court decides the disputed terms in the case are entitled to protection. Of course, a court might also decide that the practice is perfectly acceptable, even if trademarked terms are involved.
Playboy loses trademark suit
News.com, July 22, 1999
Details on the recent ruling.
Excite, Netscape Sued Over Banner Ads
The Search Engine Report, March 1999
Background about the lawsuits.
Snap Gains Image Search
Snap is now featuring an image search capability, powered by Ditto.com. Previously known as ArribaVista, Ditto.com also offers image searching directly via its web site. The company is embarking on a new strategy of powering image search for other sites.
Snap Picture Finder
Metasearch Service Caters To Visually Impaired
A new metasearch service, SETI-search.com, is apparently designed to work well with devices for the visually impaired, such as text-to-speech technology and Braille displays.
Search Engine Articles
AltaVista, Northern Light to launch multimillion-dollar TV ad campaigns
Wall St. Journal, Aug. 2, 1999
Both AltaVista and Northern Light are planning television campaigns. All the search engines that have done this in the past have enjoyed real gains, so it's a wise move for both companies. In fact, AltaVista already enjoys such heavy grassroots traffic that a TV campaign could push it into seriously rivaling Excite, Go or Lycos for traffic. As for Northern Light, this will probably be what finally puts it on the radar screen before the general public.
Centraal forms Net naming policy review board
News.com, July 31, 1999
Centraal has created a board to help it handle disputes if two companies should want the same RealName.
Excite.com Goes to Illinois
Wired, July 14, 1999
Imagine waking up and finding you had complete control over the excite.com domain. That's exactly what happened to a person in Illinois, who was the victim of an apparent prank and hack of Network Solutions' database. Traffic to excite.com was never altered, but it could have been, through no fault of Excite@Home.
Excite@Home Goes Shopping
InternetNews.com, July 13, 1999
Excite has acquired one company, iMall, and cut a deal with another that sets it up to provide e-commerce solutions to online merchants.
Disney to buy Infoseek, form Go.com
News.com, July 12, 1999
Disney is to acquire the remaining shares of Infoseek that it does not own, and it will be folded into a new company called Go.com, which will oversee all of Disney's online properties.
Excite@Home, GO Network Launch Online Calendars
InternetNews.com, July 7, 1999
Excite launches an online calendar as part of its portal offerings, while Go says its calendar is coming shortly.
How do I unsubscribe?
+ Use the form at http://searchenginewatch.com/sereport/unsubscribe.html or follow the instructions at the very end of this email.
How do I see past issues?
+ Follow the links at http://searchenginewatch.com/sereport/
Is there an HTML version?
+ Yes, but not via email. View it online at
I didn't get Part 1 or 2. Can you resend it?
+ No, but you can view the entire issue online, via the link above.
How do I change my address?
+ Unsubscribe your old one, then subscribe the new one, using the links above.
I need human help with a list issue!
+ Write to [email protected]. DO NOT send messages regarding list management issues to Danny Sullivan. He does not deal with these.
I have feedback about an article!
+ I'd love to hear it. Use the form at http://searchenginewatch.com/about/contact.html.
This newsletter is Copyright (c) Internet.com LLC, 1999