How to Build Your Own Search Engine

Date published: 8 July 2003

Author: Chris Sherman

Categories: Industry

Want a detailed glimpse into the black boxes we call search engines? Mining the Web is a textbook that discusses everything from building your own crawler to the future of information finding on the web.

Search engines are designed to be simple to use. Type a few words into a query box, and voilà, you’re presented with a set of results that probably match your information need.

This simplicity masks some heavy-duty complexity. Although we refer to a “search engine” in the singular, Google, Teoma, AlltheWeb and others are actually software systems made up of a number of components, each specialized and tuned to perform a specific function that contributes to the whole.

Mining the Web: Discovering Knowledge from Hypertext Data is one of the first books that actually describes, in detail, the parts of contemporary search engines and how they function. The author, Soumen Chakrabarti, is an assistant professor of computer science and engineering at the Indian Institute of Technology in Bombay, and the book offers a rare glimpse into the inner workings of our favorite search tools.

Most commercial search engines guard the details of their innermost operations closely, revealing casual hints here and offhand remarks there, but almost never offering complete information about the “secret sauce” underlying their operations.

That’s what makes this book so interesting. If you really want to understand how search engines work, this book provides an excellent and fairly detailed explanation of the processes they all use, to one degree or another.

The book’s not for the technically faint of heart, however. It assumes a good working knowledge of math, logic and computer science, and it is dense with formulae and graphs. But don’t let that scare you: Dr. Chakrabarti writes clearly, and the book is well organized, progressing logically from topic to topic.

Even if you find technical language challenging, skimming past the details will leave you with a good fundamental understanding of search engine technology.

The book begins with an introduction to search engine technology. Subsequent chapters deal with crawling the web, search and information retrieval, and basic relevance algorithms. The second part of the book is dedicated to machine learning: how search engines can be engineered to get “smarter” about processing queries and returning better results.
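To make the retrieval and relevance steps a little more concrete, here is a minimal sketch of an inverted index with a simple TF-IDF score. It is not taken from the book; the function names `build_index` and `search`, the three sample documents, and the choice of TF-IDF as the relevance measure are all assumptions made for illustration, and a real engine would add tokenization, stemming, and far more sophisticated ranking.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to a dict of {doc_id: number of occurrences}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index


def search(index, num_docs, query):
    """Rank documents by summed TF-IDF over the query terms (illustrative only)."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no document
        # Rarer terms get a higher weight.
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical three-document "web" for demonstration.
docs = {
    "a": "web crawlers fetch pages",
    "b": "search engines rank pages",
    "c": "crawlers feed search engines",
}
index = build_index(docs)
results = search(index, len(docs), "search pages")
```

In this toy collection, document `b` is the only one containing both query terms, so it comes first in `results`.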

Part three shifts gears, focusing on practical techniques and applications of search engine technology. Here’s where Dr. Chakrabarti really gives us a peek behind the curtain, discussing the differences between Google’s PageRank algorithm and some of the techniques other commercial search engines use to differentiate themselves from one another.
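For readers who have not seen PageRank spelled out, the following is a rough sketch of the commonly published, simplified formulation: each page repeatedly shares its score among the pages it links to, with a damping factor. It is not the book's treatment and certainly not Google's production system; the function name `pagerank`, the damping value of 0.85, the fixed iteration count, and the four-page link graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Assumes every link target also appears as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages point at "c", nothing points at "d".
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
```

Here page `c`, which collects links from three other pages, ends up with the highest score, while `d`, which nothing links to, ends up with the lowest.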

The last chapter takes a look at the future of web mining, offering tantalizing glimpses of what we can expect over the next few y
