Learning About Search Engines From Google Engineers

Want to learn how Google works? A new archive of publications by Google employees offers deep insights into many aspects of the search engine's operation.

The archive is organized by topic, covering the major functions required to run a search engine, such as information retrieval, search engine design and machine learning.

Many of these articles are heavy-duty, industrial-strength treatises suitable only for those with the technical background needed to follow the math and logic presented. But some are eminently readable for non-technical folks, such as Searching the World Wide Web, which originally appeared in Science magazine.

Most links on this page don't point directly to articles, but rather to abstracts and other useful information provided by CiteSeer from the NEC Research Institute.

CiteSeer is a very cool digital library of scientific literature created by a team including Dr. Steve Lawrence, who currently works at Google. The CiteSeer results for a particular document offer a ton of useful related information about that paper.

For example, the CiteSeer entry for The Anatomy of a Large-Scale Hypertextual Web Search Engine, the definitive source about Google written by its founders Larry Page and Sergey Brin, includes an abstract, links to articles that cite the paper, an active bibliography (related documents), similar documents based on text, and related documents from co-citation. In short, you get not only a snapshot of the article itself, but also numerous links to other directly related articles, all without needing to search for them.

There are also links to author home pages, to other articles found at the same source (in this case, Stanford University's database group technical reports, another useful collection in its own right), and so on.

The result page for each article provides links that let you read papers in a variety of formats (PDF, PostScript, DjVu, etc.). Tip: The PDF format generally works best, but you might also try pasting the title of an article (in quotes) directly into a Google search box. If it's in Google's index, you'll also often see a "view as HTML" link that lets you read the article directly in your browser.
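If you'd rather script the quoted-title tip than paste titles by hand, here is a minimal Python sketch. The function name quoted_title_search_url is made up for illustration, and the https://www.google.com/search?q= URL pattern is an assumption about how Google accepts queries, not something documented in the archive itself.

```python
from urllib.parse import urlencode


def quoted_title_search_url(title: str) -> str:
    """Build a search URL for an exact-phrase query on a paper title.

    Hypothetical helper: the base URL below is an assumed query endpoint.
    """
    # Wrapping the title in double quotes asks for an exact-phrase match,
    # which is the same thing as typing the title "in quotes" in the search box.
    query = f'"{title.strip()}"'
    # urlencode percent-encodes the quotes and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(quoted_title_search_url(
        "The Anatomy of a Large-Scale Hypertextual Web Search Engine"
    ))
```

Opening the printed URL in a browser should be equivalent to typing the quoted title into the search box yourself.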

Papers Written by Googlers
http://labs.google.com/papers.html


About the author

Chris Sherman is a frequent contributor to several information industry journals. He's written several books, including The McGraw-Hill CD ROM Handbook and The Invisible Web: Uncovering Information Sources Search Engines Can't See, co-authored with Gary Price. Chris has written about search and search engines since 1994, when he developed online searching tutorials for several clients. From 1998 to 2001, he was About.com's Web Search Guide.