Each year, the International World Wide Web Conference provides a showcase for innovative web technologies. Here's a chronological list of significant papers over the past decade focusing on searching and search engines.
Perusing the proceedings of the International World Wide Web Conferences is deeply satisfying for both the inner geek and the inner historian. Hundreds of papers have been presented at these conferences over the past decade or so, and together they provide fascinating snapshots of the development of the web.
The papers below are my subjective list of favorites from the archives. In addition to papers, there are also many slide and poster presentations online. Use the link at the bottom of this list to access the full "table of contents" for each year's conference, except for the first -- which has apparently been lost.
Finding What People Want: Experiences with the WebCrawler (1994)
WebCrawler is widely regarded as one of the first, if not the first, crawler-based search engines. Creator Brian Pinkerton discusses how he built the engine, work that laid the foundation for modern-day crawler-based search engines.
The Distributed Link Service: A Tool for Publishers, Authors and Readers (1995)
The authors describe "link servers" -- foreshadowing both the link analysis technologies used by contemporary search engines and the "link farms" used to spam the engines.
Measuring the Web (1996)
XML co-creator Tim Bray takes on "questions without answer," including How big is the web? What is the "average page" like? How richly connected is it? What are the biggest and most visible sites? What data formats are being used? What does the web look like?
WebQuery: Searching and Visualizing the Web through Connectivity (1997)
This paper hints at what's to come with Google, Teoma and others: "We do this by examining links among the nodes returned in a keyword-based query. We then rank the nodes, giving the highest rank to the most highly connected nodes."
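The ranking idea in that quote can be sketched in a few lines of Python. This is only an illustration of "most highly connected nodes rank highest," not the WebQuery authors' implementation: the function name, the use of plain URL strings, and the list of (source, target) link pairs are all assumptions made for the example.

```python
def rank_by_connectivity(result_urls, links):
    """Rank query results by how many links connect them to other results.

    result_urls: pages returned by a keyword query (hypothetical input).
    links: iterable of (source, target) URL pairs (hypothetical input).
    """
    results = set(result_urls)
    degree = {url: 0 for url in results}
    seen = set()
    for src, dst in links:
        # Only count links where both ends are in the result set,
        # and ignore self-links and repeated links.
        if src in results and dst in results and src != dst and (src, dst) not in seen:
            seen.add((src, dst))
            degree[src] += 1
            degree[dst] += 1
    # Highest connectivity first; ties are broken alphabetically by URL.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    hits = ["a", "b", "c", "d"]
    link_pairs = [("a", "b"), ("c", "b"), ("b", "d"), ("a", "x"), ("a", "b")]
    for url, score in rank_by_connectivity(hits, link_pairs):
        print(url, score)
```

Here the link to "x" is ignored because "x" was not among the query results, which is the point of the approach: connectivity is measured only among the pages the keyword query returned.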
The Anatomy of a Large-Scale Hypertextual Web Search Engine (1998)
The classic paper by then-students Larry Page and Sergey Brin, describing their "prototype of a large-scale search engine" with the goofy name Google.
Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery (1999)
The goal of a focused crawler is to selectively seek out pages that are relevant to a pre-defined set of topics. The topics are specified not using keywords, but using exemplary "training" documents.
Trawling the Web for Emerging Cyber-Communities (1999)
An overview of how "social networks" can be used to identify hubs, authorities, and other high-quality web pages that form "communities" of information.
Graph Structure in the Web (2000)
The web is shown to have a structure like a "bow tie," with surprising and instructive implications for search engines and searchers alike.
Scaling Question Answering to the Web (2001)
Investigates the challenges of creating an "answer engine" and discusses MULDER, "the first general-purpose, fully-automated question-answering system available on the web."
All of these papers, as well as hundreds of others, can be found using this "table of contents" for the International World Wide Web Conferences:
International World Wide Web Conferences
This year's conference is coming up. For a preview of what will be presented, as well as access to presentations once they are posted to the web, use this link:
The Eleventh International World Wide Web Conference
Sheraton Waikiki Hotel, Honolulu, Hawaii, USA, 7-11 May 2002
NOTE: Article links often change. In case of a bad link, use the publication's search facility, which most have, and search for the headline.