More Than Just Music Search

From The Search Engine Report
June 2, 2000

Until recently, the MP3 revolution had pretty much passed me by. I'm based in the United Kingdom, where flat-rate Internet calls have only just become possible. Given this, I had no desire or incentive to pay by the minute to download large music files from the web. Consequently, I hadn't been that interested in the furor over Napster, which lets you locate MP3 files.

If you've been in a cave like me, the ZDNet article listed below is an excellent introduction to Napster and to why it has raised the ire of the music industry. It was only after reading it that I realized Napster wasn't music-playing software but a search tool. In fact, Napster puts into action the concept of "distributed search," which has been discussed for many years but never before implemented on a mass scale.

Currently, the major search engines operate under what could be considered a centralized system. The crawler-based services send out spiders, which bring information back to a central index that you then search. The human-powered services do the same thing, using editors to seek out sites from across the web, though they get ample assistance from webmasters who submit sites.

The problem with centralized search is that information can easily get out of date. It takes time to revisit every page in an index of 100 million pages or more, and by the time you complete the refresh, it may be time to start again. Moreover, the hardware needed to gather up information from all over the web is costly to maintain. Wouldn't it be better if web sites themselves transmitted the information they held?

That's the concept behind distributed search. Rather than operating from a centralized base, you distribute the load across many sites. In effect, each site crawls itself and reports back to a system that allows searching across the combined listings.

Napster was an ideal application for distributed search. Existing MP3 search engines have a tough time maintaining listings, because many of the sites posting MP3 files may not exist for long, especially if they post illegally copied songs. In contrast, Napster gets its listings from those running its software. If you are online and running Napster, your computer tells Napster what files you have (if you choose to share them). That allows someone else to search Napster's listings and be connected to the computer where the song can be downloaded.
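To make the Napster model concrete, here is a minimal sketch of a central directory fed by its users. It is not Napster's actual protocol; the class and method names (`CentralIndex`, `register`, `unregister`, `search`) and the peer labels are hypothetical. It shows the two properties described above: listings arrive from the people running the software, and they disappear when those people go offline, so the index stays fresh without any crawling.

```python
class CentralIndex:
    """Hypothetical Napster-style directory (an illustration, not Napster's code).

    Peers report the files they share; searches run against this single
    central index; the file itself would then be fetched from the peer.
    """

    def __init__(self):
        self._files_by_peer = {}  # peer label -> set of shared file names

    def register(self, peer, filenames):
        # Called when a user goes online and chooses to share a listing.
        self._files_by_peer[peer] = set(filenames)

    def unregister(self, peer):
        # When the user goes offline, their listings vanish with them,
        # so the index does not point at computers that have left.
        self._files_by_peer.pop(peer, None)

    def search(self, term):
        # Case-insensitive substring match over every current listing.
        term = term.lower()
        hits = []
        for peer, files in self._files_by_peer.items():
            for name in files:
                if term in name.lower():
                    hits.append((name, peer))
        return sorted(hits)


index = CentralIndex()
index.register("peer-a", ["Band - Song One.mp3", "Band - Song Two.mp3"])
index.register("peer-b", ["Other - Song One.mp3"])
print(index.search("song one"))
index.unregister("peer-a")
print(index.search("song one"))
```

Note that the search itself is still centralized: only the gathering of listings has been handed off to users.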

Gnutella is an open-source software package that, like Napster, allows you to locate MP3 files or other types of files across the web. It goes a step beyond Napster in that it distributes both the query and the index. In other words, Napster takes in information from various sources, but you still search its central index. With Gnutella, your query is sent out and passed from computer to computer across the network, and those computers report back to you about any finds.
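The difference can be sketched as query flooding: there is no central index, and the query itself travels from neighbor to neighbor. The sketch below is only loosely modeled on Gnutella; the function name, the dictionaries describing the network, and the default hop limit are hypothetical. It assumes each copy of the query carries a hop limit (a "time to live") and that a computer ignores a query it has already seen, which keeps the query from bouncing around forever. How the replies travel back is not modeled; the hits are simply collected for the searcher.

```python
from collections import deque


def flood_query(network, files, start, term, ttl=3):
    """Sketch of Gnutella-style query flooding (hypothetical, not the real protocol).

    network: dict mapping each peer to a list of neighboring peers
    files:   dict mapping each peer to the file names it shares
    Returns (peer, file) hits from every peer the query reached.
    """
    term = term.lower()
    seen = {start}                 # peers that have already received the query
    queue = deque([(start, ttl)])  # (peer, hops the query may still travel)
    hits = []
    while queue:
        peer, remaining = queue.popleft()
        if peer != start:
            # Each reached peer checks its own files: the index is distributed.
            for name in files.get(peer, []):
                if term in name.lower():
                    hits.append((peer, name))
        if remaining == 0:
            continue  # hop limit exhausted: stop forwarding from here
        for neighbor in network.get(peer, []):
            if neighbor not in seen:  # drop duplicate copies of the query
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits


network = {"me": ["a", "b"], "a": ["me", "c"], "b": ["me", "c"],
           "c": ["a", "b", "d"], "d": ["c"]}
files = {"a": ["Song One.mp3"], "c": ["song one (live).mp3"], "d": ["Song One.mp3"]}
print(flood_query(network, files, "me", "song one", ttl=2))
```

With a hop limit of 2, peer "d" is never reached, so its copy of the song is invisible to this search. That trade-off, fresh results from live machines in exchange for a limited search horizon, is inherent in this kind of design.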

The technology in Gnutella has now been incorporated into a new tool called InfraSearch, which promises fresher search results and suggests it will be able to provide better results than our traditional set of search engines.

These new tools are exciting and offer some significant benefits to search. But don't count out the traditional search engines yet. At their core, these new services depend on users for their information, and what every major search engine will tell you is that you can't trust users. If there is an advantage for someone to lie or mislead a search engine, they will do so.

For instance, Metallica, which has filed a lawsuit against Napster, could hinder the music search service by running 5,000 copies of the Napster software and flooding the company's database with fake listings for Metallica songs. If the listings were degraded, Napster's popularity as an efficient MP3 search tool would diminish. Similarly, if some Napster-like text search tool began accepting listings from across the web, it wouldn't take long for webmasters to start flooding it with spam.

Moreover, it is one thing to do distributed search in order to return the location of a song; no strong relevancy mechanism needs to be created. It is quite another to do distributed search and also try to determine which are the most popular documents on the web. It is harder still to do fast, distributed search against the full text of documents, rather than against a few lines that describe a file in the abstract.

So, while there's potential in distributed search, don't expect your favorite search engine to disappear overnight, nor even at all. Instead, it is likely we'll see more distributed search applications for particular queries where they make sense, MP3 being one of them. And those seeking MP3 files have every reason to examine Napster, Gnutella and the like, especially as the "traditional" MP3 search engines seem to be losing their value.

For example, I jumped into the MP3 world by purchasing an MP3 player last month, during a visit to the US. Naturally, I wanted songs to play right then and there, but all my CDs were back in the UK. Yes, MP3.com offered some surprisingly good selections, but I also wanted music I knew and loved. Thus, I went in search of illegal copies of music that I owned at home, justifying to myself that this was music that I could have converted legally, if I only had the CD with me.

(By the way, this is the key concept behind MP3.com's "My MP3.com" program, which has been attacked by the music industry. When at home, you put a CD into your computer, and MP3.com registers that you own that CD. Later, if you are away from your CD, you can listen to that music via MP3.com from anywhere, because you've already "proved" that you own it.)

I tried the two biggest MP3 search services that I've written about, AltaVista's and the one at Lycos. They failed me miserably. I generally couldn't find the songs I wanted, I got lots of false hits, and I wasted lots of time. Using some smaller, lesser-known MP3 search engines wasn't any better. In contrast, locating and downloading songs with Napster was much easier. As for Gnutella, I gave up on it. To use it, you first have to connect to another user's computer by its IP address. The site lists some addresses, but after six or seven attempts, I'd had enough.

Finally, a last, non-search comment on this whole MP3 mess, if you'll indulge me. Ultimately, the solution to the pirated MP3 problem would seem to be making music available online cheaply, not chasing some pie-in-the-sky dream of copy protection or filing expensive lawsuits. I would have loved to go to a music industry-backed site, knowing that I could search for and FIND legal copies of songs, which I might then pay 50 cents or $1 each for. I'm not alone in this. It would be convenient, acceptable and ultimately a possible money-maker for the music industry. After all, they wouldn't have to ship me a physical CD, with packaging, marketing costs and so on. Sure, I could share that song with others, but I could do that now. The only difference is that the music industry has never given me, or anyone else, an easy way to buy it from them first.

Napster
http://www.napster.com/

Gnutella
http://gnutella.wego.com/

InfraSearch
http://www.infrasearch.com/

Pointera
http://www.pointera.com/

Plans to launch its own file-sharing search engine on Monday. The company's technology is already demoed on SpinFrenzy (see below).

SpinFrenzy
http://www.spinfrenzy.com/

Like Napster, allows you to search for music located on other people's computers. Unlike Napster, you do your search at a web site, then use a small software applet for downloading.

The Noisy War Over Napster
Newsweek, June 5, 2000
http://newsweek.com/nw-srv/printed/us/st/a20415-2000may27.htm

Nice overview of the issues surrounding Napster and its software cousins.

Napster-like technology takes Web search to new level
News.com, May 31, 2000
http://news.cnet.com/news/0-1005-200-1983259.html

More about InfraSearch. The article is (and InfraSearch themselves are) incorrect in stating that traditional search engines are limited to static content. They can index dynamic content. They just tend to avoid doing so because of "spider traps," where a crawler might index the same page over and over because it appears under slightly different URLs.
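One common defense against this kind of spider trap is to reduce each URL to a canonical form before deciding whether a page has already been fetched. The sketch below uses Python's standard `urllib.parse` module; the normalization rules and the `IGNORED_PARAMS` list are assumptions for illustration, not how any particular search engine actually does it. Real crawlers cannot know in general which parameters are harmless, which is exactly why many simply steer clear of dynamic pages.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters assumed to change the URL but not the page.
IGNORED_PARAMS = {"sessionid", "sid"}


def canonical_url(url):
    """Reduce URL variants of the same page to one key (a simplified sketch)."""
    parts = urlsplit(url)
    # Drop ignored parameters and sort the rest so their order doesn't matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",
        urlencode(query),
        "",  # drop the #fragment, which never reaches the server
    ))


def crawl_order(urls):
    """Yield each distinct page once, skipping URL variants already seen."""
    seen = set()
    for url in urls:
        key = canonical_url(url)
        if key not in seen:
            seen.add(key)
            yield url


print(canonical_url("HTTP://Example.com/page?b=2&a=1&sid=123#top"))
```

Here the example URL reduces to `http://example.com/page?a=1&b=2`, so the same page served with a different session ID would be skipped rather than indexed again.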

Napster Wildfire
News.com, May 15, 2000
http://news.cnet.com/news/0-1005-201-1757865-0.html

This special report examines how technology like Napster may change the economics and distribution of entertainment and other content.

The Value of Gnutella and Freenet
WebReview, May 12, 2000
http://webreview.com/pub/2000/05/12/platform/

Examines how Napster-like software can be put to other, helpful uses.

Napster: Net market destabilizer?
ZDNet, May 5, 2000
http://www.zdnet.com/ecommerce/stories/main/0,10475,2562799-1,00.html

Great introduction to the concept of Napster and how it could go beyond just music search. One flaw comes up in the example involving auctions. You cannot have a Napster-like auction tool unless the auction sites themselves distribute information. Given eBay's recent action against auction search site Bidder's Edge, that can't be taken for granted. Centralized sites that control, and can protect, their own listings can't simply be co-opted into distributed search. Of course, people might choose to abandon centralized auction sites like eBay in favor of purely distributed tools. But one cannot discount the value that centralized players add by creating a friendly, organized environment for their users.

Gnutella News & Links
http://www.clip2.com/preview.jsp?i=307F7C588728D4118000080020B1F147

Links to information and articles about Gnutella and similar programs.

Streamlining the Search for Music
Wired, May 30, 2000
http://www.wired.com/news/culture/0,1284,36629,00.html

A look at some new music search engines that are launching.

Patent Wars!
The Search Engine Report, Oct. 6, 1997
http://searchenginewatch.com/sereport/97/10-patent.html

Infoseek holds a patent relating to distributed search, though it covers the technique of properly ranking results drawn from different sources, rather than distributed search in general.

STARTS
http://www-db.stanford.edu/~gravano/starts_home.html

One of the oldest proposals for distributed search, in relation to web-based text search engines, can be found here. It was developed with the cooperation of several major search engines back in 1996, but has never been implemented.