Over at WebmasterWorld Tedster references an interesting short paper about creating your own search engine by Googler Anna Lynn Patterson. This document makes for a good read.
The paper was published in April of 2004 when she was a student at Stanford University. She is also the person whose name appears on the recent Google patent application titled Detecting spam documents in a phrase based information retrieval system.
Basically, she breaks it down into hard drive space, having lots of servers, and CPU power. Anna's document is a good initial primer, but there is another aspect of building a search engine that deserves some emphasis.
The search engine companies have built the largest networks of servers the world has ever known. When I think of Google's core technology assets, I don't think about search engine algorithms, I think about massively deployed server networks operating in close harmony.
Meet Your Favorite Search Engine Watch Contributors
Many of SEW's leading expert contributors will be at ClickZ Live, the new online and digital marketing event kicking off in New York (March 31-April 3). Hear from the likes of: Thom Craver, Josh Braaten, Lisa Barone, Simon Heseltine, Josh McCoy, Lisa Raehsler, Greg Jarboe, Dan Cristo, Joseph Kerschbaum, John Gagnon, Eric Enge and more!