Search Engines 101

Use these outstanding online resources to get the equivalent of an introductory semester of “search engines 101” without having to go back to school.

Here at Search Engine Watch, our goal is to keep you up to date with the world of search engines. Although we provide detailed information about how search engines work, if you really want to “go deep” you should check out some of the academic resources focused on the broader field of information retrieval.

Web search engines are an outgrowth of academic research into information retrieval. Work on information retrieval systems began in the 1950s and 1960s, decades before the web came into being in the early 1990s.

Reading the history of information retrieval systems and how they evolved over the decades is not only fascinating in its own right but can also help you improve your own searching skills.

For an excellent overview, check out Professor Osmar R. Zaïane’s paper “From Resource Discovery to Knowledge Discovery on the Internet.” It’s an engaging history of information retrieval that also discusses many of the basic principles underlying the operation of today’s search engines.

Despite being a technical report published by Simon Fraser University, the paper is eminently readable. Part one describes basic information retrieval concepts, including brute force document scanning, conventional document retrieval techniques, and the particular challenges of indexing and searching hypertext documents and multimedia content on the web.

Next, Dr. Zaïane offers a survey of “resource discovery on the Internet.” This is a technical glimpse of how search engines and directories are created and maintained. Although the paper was written in the pre-Google era, it still offers solid information to help you understand how your favorite search tools work.

The last section is dedicated to data mining, or knowledge discovery, on the Internet: what needs to be done to improve search results.

If you really want to dig deeply into search technology, wander over to the web site of the School of Information Management and Systems at the University of California at Berkeley. Most of the courses taught on campus have supporting web sites with lots of links to lecture notes, presentations, articles, and other course materials.

In particular, check out the pages for “SIMS 202: Information Organization and Retrieval.” This truly is “Search Engines 101,” starting from basic concepts like “What is Information” and ending up with “Web Search Architecture and Crawling.” Many of the course materials are available online.

You can find many similar courses at other university web sites. Use Google’s University Search with keywords such as “search engines” or “information retrieval” to find them.

Finally, if you really get hooked and want a “fire hose” of information about information retrieval and search engines, try CiteSeer. This specialized search engine, from the NEC Research Institute, is focused on computer science research papers.

CiteSeer can index and make available papers in PostScript format, a capability most other search engines lack (at this point). It also creates useful linkages to help you find information, such as citations, related documents, and links to author home pages, along with many other features.

From Resource Discovery to Knowledge Discovery on the Internet
The best way to access Dr. Zaïane’s paper is through its page on ResearchIndex. That page gives you a lot of information about the paper, including an abstract and links to view the paper in several formats. Look in the upper right corner of the page, next to the word “Cached”; unless you have specialized software, I’d recommend viewing the paper in PDF format. The page also offers useful pointers to other papers that have similar content or have cited the article.

Information Organization and Retrieval
This web site was created for an introductory-level course at the School of Information Management and Systems at the University of California at Berkeley. Lecture notes, readings, and even assignments are available.

Google University Search
Use Google’s specialized University Search to find colleges offering programs in information retrieval and search technology.

ResearchIndex
ResearchIndex is a search engine focusing on computer science research and articles. It’s a wonderful browsing tool for finding articles and research relating to searching the web.
