THE SEARCH ENGINE UPDATE
August 2, 1999 - Number 58
About The Update
The Search Engine Update is a twice-monthly update of search engine news. It is available only to those people who have subscribed to Search Engine Watch, http://searchenginewatch.com/.
Please note that long URLs may break into two lines in some mail readers. Cut and paste, should this occur.
In This Issue
Search Engine News
+ Search Engine Coverage Study Published
+ Go Beta Tests User-Assisted Directory
+ LookSmart Live Looks-Up Answers
+ FAST Announces Largest Search Engine
Search Engine Briefs
+ Paid Links Dropped At AltaVista
+ Direct Hit Gets Funding
+ Northern Light Changes Number Of Results
+ Snap Gains Image Search
+ Metasearch Service Caters To Visually Impaired
Search Engine Articles
The new links system that I mentioned last month is officially online. Within Search Engine Links, you'll find listings for all types of different search engines. Eventually, I'll also add links currently contained within the Search Engine Resources area to this new system.
If you actually submitted a site to the new system before July 30, please do a resubmission. An error on my end means that I didn't get a record of submissions before then. My apologies for this problem.
Also, by Tuesday or Wednesday, I'll have a new feature in the Search Spotlight that shows popular queries used at iTrack.com, a metasearch service designed for auction lovers. Watch for the link on What's New.
Search Engine Links
Search Engine News
Search Engine Coverage Study Published
In 1998, a landmark study published in Science magazine found that search engines fell well short of listing everything on the web. Last month, a follow-up to this study was published in Nature magazine. Once again, search engines were found to contain only a small percentage of the web pages available for indexing.
Both studies made for great headlines about how search engines seemingly fail users, but the number of pages a search engine has indexed has nothing to do with whether it is any good. Size alone doesn't matter.
Don't believe me? Go to your favorite search engine and look for a company, any company. Was the company's web site listed first? That's what many users want to have happen. This example is just one of many that illustrate the value of relevancy over sheer size. Improving relevancy means being a smarter search engine, not just a bigger one.
Even the study itself explains that bigger doesn't mean better: "There are diminishing returns to indexing all of the web, because most queries made to the search engines can be satisfied with a relatively small database," the study's authors write.
In fact, study coauthor Dr. Steve Lawrence isn't saying search engines are bad. Lawrence credits them for making more information available to the public than ever before, but he thinks they could do better.
"The amount of information that you can quickly and efficiently search for using search engines has been increasing, but what has been decreasing is the amount you can search for versus what you could potentially search for," Lawrence said.
So having provided some balance to the headlines you may have read recently, let's examine details from the study in more depth.
Conducted by scientists at the NEC Research Institute, the study found that there were 800 million indexable pages and 180 million images on the web, as of February 1999. "Indexable" means pages that weren't hidden behind password protection, excluded from indexing by robots.txt files, locked away in databases or inaccessible to search engines for other reasons.
The first study found that there were 320 million pages as of December 1997, so it sounds as if the web has more than doubled in size in just over a year. However, the two studies cannot be compared fairly because they used completely different methods of estimating the size of the web.
Previously, the size estimate was derived by comparing the overlap between results at different search engines to extrapolate an overall figure for the web. In the latest study, the researchers used a variety of techniques to determine how many web servers were publicly available (2.8 million) and the mean number of pages per server (289). Multiplying these two figures gives about 809 million, which is where the 800 million web page count comes from.
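For those who like to check the math, here is a minimal Python sketch of that multiplication. The two inputs are the figures reported in the study; the function name is my own and is only for illustration.

```python
# Rough check of the study's size estimate: servers x mean pages per server.
# The function name is illustrative; the inputs are the figures quoted above.


def estimate_web_size(servers: int, mean_pages_per_server: int) -> int:
    """Return the estimated number of indexable pages."""
    return servers * mean_pages_per_server


servers = 2_800_000       # publicly available web servers
pages_per_server = 289    # mean number of pages per server

total = estimate_web_size(servers, pages_per_server)
print(f"{total:,} pages")  # 809,200,000 pages, rounded in the study to 800 million
```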
From here, the next step was to determine how much coverage of the web each search engine provided.* You need a known starting point, and Northern Light was selected. The researchers did a search that told them Northern Light had an index of 128 million web pages.** Then they divided Northern Light's index size by the entire web size to find that Northern Light covers 16 percent of the web.
Unfortunately, you can't get a count of index size for some of the other crawler-based search engines, so the researchers instead ran a series of test queries -- 1,050 in all.*** They totaled how many web pages were returned for each search engine, and that told them proportionally how big or small each service was compared to Northern Light (which coincidentally led the pack in terms of coverage). Here's the complete rundown:
Northern Light: 16%
Inktomi (Snap): 15.5%
Inktomi (HotBot): 11.3%
Inktomi (MSN Search): 8.5%
Inktomi (Yahoo): 7.4%
Inktomi provided search results to several services when the survey was done, which is why it appears multiple times. It powered primary results at MSN Search and secondary results at HotBot, Snap and Yahoo. The variation in coverage reflects what I and others have written about in the past -- not all Inktomi partners tap completely into its 110 million page index.
Lawrence said the percentages for HotBot and Lycos might be slightly higher than shown, because those services will only display one page per web site in their results. That could have caused an undercount, he said.
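The coverage arithmetic described above can be sketched in a few lines of Python. The web size and Northern Light index size are the figures from the study. The per-engine page totals are hypothetical numbers that I made up only to show the scaling step, and the function names are my own.

```python
# Sketch of the coverage arithmetic. The index and web sizes are the figures
# quoted above; the per-engine page totals further down are hypothetical,
# made up only to show how the scaling step works.

WEB_SIZE = 800_000_000               # estimated indexable pages
NORTHERN_LIGHT_INDEX = 128_000_000   # index size found by the researchers


def benchmark_coverage(index_size: int, web_size: int) -> float:
    """Fraction of the web covered by the benchmark engine."""
    return index_size / web_size


def scaled_coverage(engine_pages: int, benchmark_pages: int,
                    benchmark_cov: float) -> float:
    """Scale the benchmark's coverage by the ratio of pages each engine
    returned for the same set of test queries."""
    return benchmark_cov * engine_pages / benchmark_pages


nl_cov = benchmark_coverage(NORTHERN_LIGHT_INDEX, WEB_SIZE)
print(f"Benchmark coverage: {nl_cov:.1%}")

# Hypothetical totals across the test queries (not taken from the study).
benchmark_pages = 10_000
other_engine_pages = 5_000
other_cov = scaled_coverage(other_engine_pages, benchmark_pages, nl_cov)
print(f"Other engine coverage: {other_cov:.1%}")
```

In this made-up case, an engine returning half as many pages as the benchmark would be credited with half the benchmark's coverage.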
The big winner was Northern Light, which saw its traffic triple just after the study was published. That had to be satisfying to CEO David Seuss, who said earlier this year that winning the title of biggest search engine would make the still relatively little-known service more popular. Of course, whether those users will stick with Northern Light remains to be seen, and challenger FAST has just ousted Northern Light from the number one spot with its new 200 million web page index. Excite is also about to weigh in later this month with a new index in the 250 million page range - expect more details in the next newsletter. Northern Light is in the 165 million page range, followed by AltaVista's 150 million pages. All sizes are unaudited, self-reported numbers.
Combined, the search engines were found to cover only 42 percent of the web. In other words, you'd reach more pages by using all of the search engines than by using any single service alone, but you'd still miss more than half of the web. In comparison, the last study found a combined coverage of 60 percent. So things are getting worse? Probably, but not conclusively. You can't accurately compare these figures because the methods of estimating the size of the web were so different for each study.
Lawrence agrees that direct comparisons won't be exact, but he feels the overall finding that coverage is dropping is basically correct. "It wouldn't change any of the conclusions, though it might have changed the magnitude," he said of the drop, assuming the same size method had been used in both studies.
The study also reported on the number of dead links at each service, which are an indication of how fresh each search engine's index is. The lower the percentage, the fresher the index. Overall results were:
Northern Light: 9.8%
Inktomi (Yahoo): 2.9%
Inktomi (Snap): 2.8%
Inktomi (MSN Search): 2.6%
Inktomi (HotBot): 2.2%
The poor showing by Lycos reflects another fact I've previously reported, that its index was woefully out of date earlier this year. The situation at Lycos has been greatly improved since then. Moreover, primary results at Lycos now come from the Open Directory, not from the spidered index reviewed in the study.
Somewhat related, the survey estimated how long it takes the typical web page to appear at a search engine. The median age is 57 days -- thus, most documents take about two months to appear. Since direct submission will almost always speed up the listing process at the search engines, this median number can, in my opinion, be taken as a good indication of how long it takes a page that has never been submitted to appear (assuming, of course, that it gets listed at all). Complete figures were:
AltaVista: 33 days
Excite: 47 days
Inktomi (HotBot): 51 days
Inktomi (MSN Search): 57 days
Infoseek: 60 days
Inktomi (Yahoo): 76 days
Northern Light: 84 days
Inktomi (Snap): 91 days
Lycos: 174 days
By the way, to get these figures, the researchers downloaded every single web page for a variety of queries over time. If a page was never seen before in response to a query, it was considered "new" even if the page itself had been online for several months. Because all results for each query were downloaded, the researchers are fairly certain these "new" pages were appearing because they'd been finally indexed and not just because they'd been indexed but never ranked well before.
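The bookkeeping behind that "new page" test can be sketched as follows. This is only my own illustration in Python, not the researchers' tooling. The class name, query and URLs are all hypothetical, and note that the very first result set recorded for a query would count entirely as "new," so a real run would need a baseline pass first.

```python
# Sketch of the "new page" bookkeeping described above. The class name,
# query and URLs are hypothetical; the study's own tooling is not shown here.

from __future__ import annotations

from collections import defaultdict


class NewPageTracker:
    """Remember every URL ever returned for each query."""

    def __init__(self) -> None:
        self._seen: dict[str, set[str]] = defaultdict(set)

    def record(self, query: str, result_urls: list[str]) -> set[str]:
        """Store one complete result set and return the URLs that have
        never been seen before for this query."""
        current = set(result_urls)
        new_urls = current - self._seen[query]
        self._seen[query] |= current
        return new_urls


tracker = NewPageTracker()
# Baseline pass: everything is unseen the first time around.
tracker.record("rare disease", ["http://a.example/", "http://b.example/"])
# Later pass: only the URL that was not in any earlier result set is "new".
later = tracker.record("rare disease", ["http://a.example/", "http://c.example/"])
print(later)  # {'http://c.example/'}
```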
Another interesting statistic was on meta tag usage. The researchers found that 34.2 percent of web servers make use of either the meta keywords or description tag on their home pages: 31 percent had at least a meta description tag and 32 percent had at least a meta keywords tag. The authors concluded, and I'd agree, that such low usage means that uptake of proposed RDF/XML tagging standards is likely to be slow. By the way, Dublin Core tags were found on only 0.3 percent of home pages.
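If you want to check your own home page for these tags, here is a minimal sketch using Python's standard html.parser module. The sample HTML and the names MetaTagFinder and meta_usage are my own inventions for illustration. The study does not describe how the researchers did their own checking.

```python
# Sketch: check a home page's HTML for meta keywords and description tags.
# The sample HTML and the helper names are made up for illustration.

from html.parser import HTMLParser


class MetaTagFinder(HTMLParser):
    """Collect the name attributes of all <meta> tags in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        for key, value in attrs:
            if key == "name" and value:
                self.names.add(value.lower())


def meta_usage(html: str) -> dict:
    """Report whether the keywords and description tags are present."""
    finder = MetaTagFinder()
    finder.feed(html)
    return {
        "keywords": "keywords" in finder.names,
        "description": "description" in finder.names,
    }


sample = """<html><head>
<title>Example</title>
<META NAME="Description" CONTENT="A sample page">
</head><body></body></html>"""

print(meta_usage(sample))  # {'keywords': False, 'description': True}
```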
Even more stats: the researchers estimate that 83 percent of web servers are commercial in nature, 6 percent are academic or scientific, nearly 3 percent are health-related, just over 2 percent are personal sites, 1.5 percent are pornographic and just over 1 percent are government-related. Sites could be in more than one category, and remember, these are stats for web servers -- not web pages. The percentages could change significantly if individual pages were categorized.
The study also found that it was more likely that pages from popular and well-known sites would be indexed. That's in part a function of how search engines work -- spiders are more likely to visit a site if they keep coming across links to it -- plus it is a conscious decision on the part of almost all the major search engines as a means to provide better listings and combat spam.
The study's authors worry that this trend means high-quality but "unpopular" pages may not list well. In my opinion, this is less a concern. If anything, I feel the trend toward popularity measures means that general users will get more relevant and useful results in response to common queries, which is desperately needed. Meanwhile, those performing more refined and focused queries such as research professionals should still be able to locate much information of use.
The solution to better serving the second group probably won't be to create a single, super-comprehensive 800 million web page search engine. Instead, it's likely to be the creation of new specialty services that do in-depth coverage of sites by topic.
Long overdue is probably an academic search engine. I've talked to several people recently who all wish there was a way to do a search across university web sites, with the knowledge that the search engine would have done a deep and frequent crawl of the sites on its list. Whether such a service would make money for its owners is another issue, of course.
But even as specialty services arise, there are definitely advantages for general purpose search engines to enlarge their indexes and keep pace with the web. It means that they can provide more comprehensive coverage for those users that search for unique or obscure information, such as about a rare disease. And users themselves should consider search engine size when selecting among services. Just be wary about using size alone to decide which service is best.
Accessibility and Distribution of Information on the Web
This page is produced by the authors of the study. You can request a copy to be sent via email, and more details are also promised for the future.
Search Engine Sizes
You'll find comparison charts and links to past articles that deal with size issues, including the previous Science study, all on this page.
* The researchers actually did test queries first, then used Northern Light as a benchmark, but I've reversed the steps to more easily explain the process.
** The researchers did a Boolean NOT search for a term they knew was not in the Northern Light index. As a result, a listing of all pages indexed by Northern Light was displayed.
*** The search engines will provide self-reported size estimates, but with some of them there's no way to audit the figure via a search, as was done with Northern Light.
Go Beta Tests User-Assisted Directory
Taking a page from the Open Directory, Go is inviting the masses of the web to help manage its directory of web sites. Currently in beta testing, the "Go Guides" program lets anyone participate in the process of categorizing the web.
"To date, it has been really successful. We've received amazing topics, real nuggets of gold," said Jennifer Mullin, Go's Director of Search and Navigation. About 500 guides are currently involved in the program, which will go fully public later this month.
An innovative feature of the program is that guides who do a good job are rewarded with more power and authority, while those who do a poor job are penalized.
For instance, all guides begin at Level 1, which means that they can suggest sites and approve submissions by others, but these actions must be authorized by other guides before they take effect. As their actions are approved, the guides rise in rank and eventually earn the power to do things without needing approval.
Similarly, if a guide with sufficient rank adds a site or makes a change that causes disagreement by another guide, a complaint can be issued. If upheld, the guide loses points and drops in rank.
These checks and balances are meant to avoid a problem that has occasionally cropped up with the Open Directory, where editors may sign up for categories and then do nothing but promote their own sites. It's a nice solution to letting the general public participate in the directory while simultaneously protecting its quality. It also appears to be working. Mullin said that the biggest problem has not been spam but instead educating Go Guides on how to write proper site descriptions.
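To make the promotion and penalty rules concrete, here is a toy model in Python. Go hasn't told me the actual point values or thresholds, so every number and name below is an assumption of mine, chosen only to show how approved actions could raise a guide's rank and an upheld complaint could lower it.

```python
# Toy model of the Go Guides promotion and penalty rules. All point values,
# the threshold and the names are invented for illustration; they are not
# Go's actual rules.

from dataclasses import dataclass

TRUSTED_POINTS = 10     # hypothetical: points needed to act without approval
APPROVAL_REWARD = 1     # hypothetical: points gained per approved action
COMPLAINT_PENALTY = 5   # hypothetical: points lost per upheld complaint


@dataclass
class Guide:
    name: str
    points: int = 0

    @property
    def needs_approval(self) -> bool:
        """Low-ranked guides must have their actions authorized by others."""
        return self.points < TRUSTED_POINTS

    def action_approved(self) -> None:
        """Another guide authorized this guide's suggestion or approval."""
        self.points += APPROVAL_REWARD

    def complaint_upheld(self) -> None:
        """A complaint against this guide was upheld, so points are lost."""
        self.points = max(0, self.points - COMPLAINT_PENALTY)


guide = Guide("newcomer")
for _ in range(10):
    guide.action_approved()
print(guide.needs_approval)  # False

guide.complaint_upheld()
print(guide.needs_approval)  # True
```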
One thing that I particularly like is that you can participate in Go Guides without having to commit to being an editor, as is the case at the Open Directory. It's perfectly valid to join, then suggest sites for whatever categories are of interest to you, rather than being locked into one particular area.
Of course, should you want to manage a category -- Go calls them Topics -- that's an option, too. New Go Guides can manage up to two topics at a time.
There are little bugs still to be worked out, such as improving the system so that sites rejected for small reasons can be easily resubmitted. But overall, I thought the program worked surprisingly well. It was also rather compelling. I planned to spend just a short time using it for this review, but three hours later, I was still at it, raiding my bookmarks to find new sites to add.
Be aware that there is a delay between when sites are added internally and when they appear live at Go -- it seems to be about a week, at the moment. And for those of you unfamiliar with the Go Directory, you can browse topics from the Go home page - just click on the "Topics" tab below the search box. Related topics also appear at the top of search results, in the "Matching topics" section.
Also understand that there are some superguides working within the system. These are Go staffers who have been tasked with overseeing the directory. I find that they quickly review and approve suggestions made to even less-popular topics that have no designated guides, and my only wish is that they were clearly identified as Go employees. Go says this may happen in the future. Basically, if you see someone with what seems to be an amazing number of points, they are probably on the Go staff.
The chief webmaster question is probably, "Can I submit my own site?" Certainly. If your site is of high-quality and not currently listed, there's no reason why you shouldn't be able to suggest it. It doesn't mean you'll automatically get in, but it does mean that you'll probably get reviewed faster than if you used the external submit feature.
If you decide to submit your site, be smart/nice and submit some other good sites within the same topic or to different topics at the same time. Why? First, it becomes less obvious that you are submitting your own site when the submission is mixed among others. Second, by submitting other sites, you're actually helping to build the directory beyond just your own self-interest. That's the point of the program. So pick out some of your favorite sites and submit them along with your own.
Go Guides Beta
Ready to participate? Sign-up via this page.
Go Directory Help
More information about the directory, including how to submit if you are not a Go Guide. A form-based submission feature is coming.
How Infoseek/Go Works
This page within the Subscribers-Only Area has tips on doing a submission to the Infoseek/Go directory. If you've forgotten your password, retrieve it using the Password Looker-Upper at http://searchenginewatch.com/about/finder.html
LookSmart Live Looks-Up Answers
Looking for an answer? Look no further than LookSmart, which is providing custom research to frustrated searchers through its new LookSmart Live program.
The concept is simple. Using the LookSmart Live form, you tell LookSmart what you are looking for. The request gets passed on to one of 80 editors involved in the project, and within 24 hours, you get an email back with your answer.
Sound too good to be true? I tested it anonymously, and it really did work. I got a nice response listing several web sites relating to my question, along with categories and information located within LookSmart itself.
LookSmart says it launched the program to better serve users who feel overwhelmed when performing searches. "People are actually pretty frustrated with search. A significant minority of searchers on the Internet, regardless of which service they use, are not getting what they are looking for," said Damian Smith, Director of LookSmart Live.
In the coming weeks, LookSmart Live will transition into a more community-oriented format. The idea is that users will be able to share their knowledge with each other, via forums. Ask a question, and another forum member will provide an answer. There will also be an associated ratings system, so that you can tell at-a-glance the reputation of someone providing an answer. Furthermore, should you be dissatisfied with answers on the forum, LookSmart will then send its own editors out to do some research, Smith said.
So just like the Open Directory and Go Guides, another search service is looking to actively involve users in organizing the web. Aside from benefiting the services themselves, these new outlets allow the general web public to easily be part of a community without having to publish web pages or join something like Yahoo Clubs.
"I think the industry has remarkably quickly locked on to web pages as the only way to publish information on the web," Smith said. "We think this type of community bulletin board model is an easier way of self-expression. Instead of setting up home pages, we think it will be equally valuable for you to share your expertise by participating in this community."
LookSmart is promoting the service in two places, via a small link on the home page and via links that appear within the search results themselves. It's this second placement that is really driving traffic to LookSmart Live, and Smith said that hundreds of questions per day are being answered. In fact, you can watch incoming questions appear in a special box on the LookSmart Live home page. You'll also find answers to popular and unusual questions archived on that page.
FAST Announces Largest Search Engine
FAST Search is now claiming an index of 200 million web pages, which would make it the largest search engine on the web.
FAST Search launched in May of this year, previously under the name All The Web (that URL is still used). The service is backed by Norwegian search technology company Fast Search & Transfer and produced in partnership with Dell.
FAST says the FAST Search site is still intended to be a demonstration of its technology, not a major search portal. The company's real goal is to power other people's search engines, much as Inktomi and Direct Hit power results at HotBot.
Consequently, don't expect a lot of bells-and-whistles at FAST Search. But if you are looking for an alternative to the comprehensive search capabilities offered by Northern Light, AltaVista and Inktomi-powered services, then pay a visit to FAST.
Webmasters, sorry, there's still no Add URL ability at this time. I'll let you know when one becomes available.
FAST Aims For Largest Index
The Search Engine Report, July 5, 1999
More background information about FAST.
Search Engine Briefs
Paid Links Dropped At AltaVista
AltaVista has eliminated its paid links program. "It just didn't work well enough either for the advertisers or the users," said Tracy Roberts, AltaVista's marketing director. AltaVista is planning a new advertiser program in the near future, Roberts said, but she wouldn't provide further specifics. Expect more when they can talk about details....
In other updates since the last newsletter:
+ AltaVista says it plans to maintain its partnership with LookSmart for directory listings in the main site, despite using Open Directory information within its My AltaVista area.
+ AltaVista has launched a free Internet access service, AltaVista FreeAccess, for US users.
AltaVista Nixes Paid Search
Wired, July 23, 1999
More quotes from AltaVista on dropping the paid links program, along with background about it.
Many Changes At AltaVista
The Search Engine Report, July 6, 1999
You'll find links to background about the paid links program at AltaVista and information about the CMGI purchase.
One AltaVista advertiser posted here on July 28 that the links were more cost effective than banners. She paid about $5 to $10 per click with banners at AltaVista versus $0.25 to $2 per click with the paid links program. Posting is not yet online, but it should appear shortly. You can also subscribe to this marketing list via the link above.
Direct Hit Gets Funding
Direct Hit has just received $26 million in financing from a variety of venture firms, it was announced today. The company plans to use the money for boosting its brand name and image both in the search industry and among consumers.
Northern Light Changes Number Of Results
In the last newsletter, I noted that Northern Light was clustering its results so that only one page per web site appears in the top listings. The service has also reduced the number of results presented by default from 25 to 10, which apparently has upset some users. Northern Light said this was done to cope with an increase in traffic caused by the Nature article on search engine sizes. Northern Light was found to be the most comprehensive search engine in that article, which attracted many new visitors.
Later this year, Northern Light says it will introduce customization so that users can select to see 10, 20, 50 or 100 results by default.
Snap Gains Image Search
Snap is now featuring an image search capability, powered by Ditto.com. Previously known as ArribaVista, Ditto.com also offers image searching directly via its web site. The company is embarking on a new strategy of powering image search for other sites.
Snap Picture Finder
Metasearch Service Caters To Visually Impaired
A new metasearch service, SETI-search.com, is apparently designed to work well with devices for the visually impaired, such as text-to-speech technology and Braille displays.
Search Engine Articles
Take My Site, Please
Los Angeles Times, Aug. 2, 1999
Nice article about spamming search engines. Be aware articles at the LA Times stay up for only a week, so read it now, if you want to read it at all. The URL will also probably change by tomorrow, so be prepared to search for it.
AltaVista, Northern Light to launch multimillion-dollar TV ad campaigns
Wall St. Journal, Aug. 2, 1999
Both AltaVista and Northern Light are planning television campaigns. All the search engines that have done this in the past have enjoyed real gains, so it's a wise move for both companies. In fact, AltaVista already enjoys such heavy grassroots traffic that a TV campaign could push it into seriously rivaling Excite, Go or Lycos for traffic. As for Northern Light, this will probably be what finally puts it on the radar screen before the general public.
Centraal forms Net naming policy review board
News.com, July 31, 1999
Centraal has created a board to help it handle disputes if two companies should want the same RealName.
Excite.com Goes to Illinois
Wired, July 14, 1999
Imagine waking up and finding you had complete control over the excite.com domain. That's exactly what happened to a person in Illinois, who was the victim of an apparent prank and hack of Network Solutions' database. Traffic to excite.com was never altered, but it could have been, through no fault of Excite@Home.
How do I unsubscribe?
+ Follow the instructions at the very end of this email.
How do I subscribe?
+ The Search Engine Update is only available to paid subscribers of the Search Engine Watch web site. If you are not a subscriber and somehow are receiving a copy of the newsletter, learn how to subscribe at: http://searchenginewatch.com/about/subscribe.html
How do I see past issues?
+ Follow the links at:
Is there an HTML version?
+ Yes, but not via email. View it online at:
How do I change my address?
+ Send a message to email@example.com
I need human help with my subscription!
+ Send a message to firstname.lastname@example.org. DO NOT send messages regarding list management or site subscription issues to Danny Sullivan. He does not deal with these directly.
I have feedback about an article!
+ I'd love to hear it. Use the form at
This newsletter is Copyright (c) Internet.com LLC, 1999