Search Engines 101
Use these outstanding online resources to get the equivalent of an introductory semester of 'search engines 101' -- without having to go back to school for your education.
Here at Search Engine Watch, our goal is to keep you up to date with the world of search engines. Although we provide detailed information about how search engines work, if you really want to “go deep” you should check out some of the academic resources focused on the broader field of information retrieval.
Web search engines are an outgrowth of academic research into information retrieval. Work on information retrieval systems began in the late 1960s, long before the web came into being in the 1990s.
Reading the history of information retrieval systems, and how they evolved over the decades, is a fascinating exercise in its own right. It can also help you improve your own searching skills.
For an excellent overview, check out Professor Osmar R. Zaïane’s paper “From Resource Discovery to Knowledge Discovery on the Internet.” It’s an engaging history of information retrieval, and discusses many of the basic principles that underlie the operations of today’s search engines.
Despite being a technical report published by Simon Fraser University, the paper is eminently readable. Part one describes basic information retrieval concepts, including brute force document scanning, conventional document retrieval techniques, and the particular challenges of indexing and searching hypertext documents and multimedia content on the web.
Next, Dr. Zaïane offers a survey of “resource discovery on the Internet.” This is a technical glimpse of how search engines and directories are created and maintained. Although the paper was written in the pre-Google era, it still offers solid information to help you understand how your favorite search tools work.
The last section is dedicated to data mining or knowledge discovery on the Internet — what needs to be done to help improve search results.
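To make the contrast from part one of the paper a little more concrete, here is a minimal Python sketch of brute force document scanning next to one common index-based approach, an inverted index. This is an illustration only, not code from Dr. Zaïane’s paper. The toy documents, the function names, and the choice of an inverted index to stand in for “conventional” retrieval are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy corpus; the ids and texts are invented for illustration.
DOCS = {
    1: "search engines index the web",
    2: "information retrieval predates the web",
    3: "libraries catalog books",
}


def brute_force_search(docs, term):
    """Scan every document at query time and return ids containing the term."""
    term = term.lower()
    return sorted(
        doc_id for doc_id, text in docs.items()
        if term in text.lower().split()
    )


def build_inverted_index(docs):
    """Build a map from each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def index_search(index, term):
    """Answer a query with a single lookup in the prebuilt index."""
    return sorted(index.get(term.lower(), set()))


if __name__ == "__main__":
    index = build_inverted_index(DOCS)
    print(brute_force_search(DOCS, "web"))
    print(index_search(index, "web"))
```

Both functions return the same document ids. The difference is where the work happens: the brute force version rereads every document for every query, while the indexed version pays that cost once up front and then answers each query with a lookup.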
If you really want to dig deeply into learning about search technology, wander over to the web site of the School of Information Management and Systems at the University of California at Berkeley. Most of the courses taught on campus are supported by web sites, and have lots of links to lecture notes, presentations, articles, and other course materials.
In particular, check out the pages for “SIMS 202: Information Organization and Retrieval.” This truly is “Search Engines 101,” starting from basic concepts like “What is Information” and ending up with “Web Search Architecture and Crawling.” Many of the course materials are available online.
You can find many similar courses on other university web sites. Use Google’s University Search with keywords such as “search engines” or “information retrieval” to find them.
Finally, if you really get hooked and want a “fire hose” of information about information retrieval and search engines, try CiteSeer. This specialized search engine, from the NEC Research Institute, is focused on computer science research papers.
CiteSeer can index and make available papers in PostScript format, a capability most other search engines currently lack. It also creates interesting linkages to help you find information, such as citations, related documents, and links to author home pages.
From Resource Discovery to Knowledge Discovery on the Internet
http://citeseer.nj.nec.com/117999.html
The best way to access Dr. Zaïane’s paper is via the CiteSeer (also known as ResearchIndex) link above. On this page you get a lot of information about the paper, including an abstract and links to view the paper in several formats. Look in the upper right corner of the page, next to the word “Cached.” Unless you have specialized software, I’d recommend viewing the paper in PDF format. This page also offers useful pointers to other papers that have similar content or have cited the article.
Information Organization and Retrieval
http://www.sims.berkeley.edu/academics/courses/is202/f02/Assignments.html
This web site supported an introductory-level course from the School of Information Management and Systems at the University of California at Berkeley. Lecture notes, readings, and even assignments are available.
Google University Search
http://www.google.com/options/universities.html
Use Google’s specialized University Search to find colleges offering programs in information retrieval and search technology.
CiteSeer
http://www.researchindex.com
CiteSeer, also known as ResearchIndex, is a search engine focusing on computer science research and articles. It’s a wonderful browsing tool for finding articles and research relating to searching the web.
NOTE: Article links often change. In case of a bad link, use the publication’s search facility, which most have, and search for the headline.