Provides information about search engines and hundreds of other specialty “bots” or intelligent agents.
Papers Written By Googlers
This page lists research papers written by those working for Google.
This research project aims to produce a new web search engine for scientists. They want as many search requests as possible, so they can test out new ideas.
Research Index is a search engine focusing on computer science research and articles. It’s a wonderful browsing tool for finding articles and research relating to searching the web.
Search Engine Reviews
Page within Search Engine Watch that has reviews of search engines, including some relevancy testing stories.
Search Engines & Legal Issues: Patents
Page within Search Engine Watch that lists articles about patents held on search technology.
Web Robots Pages
Long-standing guide to web spiders.
Working Papers Concerning The Creation Of Google
Documents that explain how Google was created back when it was a Stanford University project. Includes the often cited “Anatomy of a Large-Scale Hypertextual Web Search Engine.”
Experiments from Google.
Experiments and research papers from Yahoo.
NOTE: Article links often change and are not continually updated. In case of a bad link, use the publication’s search facility, which most have, and search for the headline.
And yet another article — but a good, reader-friendly one — about IBM’s WebFountain project, which aims to mine trends on the web and sell the knowledge to companies. It’s especially nice that it makes reference to Clever, one of two big IBM search projects that never made it out of the garage. The other was Outride, to provide personalized results. Google ended up buying Outride about two years ago.
Microsoft’s Robert Scoble Discusses Search Engine Technology
SearchEngineGuide.com, Feb. 4, 2004
Nice Q&A by Andy Beal asking prominent blogger and Microsoft employee Robert Scoble where he thinks search may be going, with some interesting thoughts on how desktop searching might be made easier, as well as a look at web-wide ideas.
Nice short update on happenings with Jon Kleinberg, whose work on search technology is widely cited and whose ideas are considered by others. See also my article from 1998, Counting Clicks and Looking at Links, http://www.searchenginewatch.com/sereport/article.php/2166431, for back when he was with IBM working on Clever.
Couldn’t agree more with Tara Calishain that claims are nothing; action is everything. Dipsie is a new search engine that hasn’t even launched yet but is already getting hyped. You can expect more of this as everyone rolls out search services to catch the coattails of hype surrounding Google’s potential IPO.
Dipsie may be great — maybe not. When I looked at a demo back in November, it was locked to showing results for only one query, so it was hard to judge. Expect a closer look at this and other services as they warrant.
In the meantime, Gigablast continues to roll out new features and improve. It still probably won’t replace Google or your favorite major search engine, but Matt Wells is diligently delivering stuff anyone can actually use. See our past coverage on Gigablast.
On Search, the Series
SearchDay, Jan. 29, 2004
Few people who have a deep understanding of search have the ability to write eloquently about it. Search engine pioneer Tim Bray is one of those people, and he has written an absolutely fabulous series of essays that should be essential reading for anyone wanting a thorough understanding of the technology.
Google to set up in Zurich
swissinfo, Jan. 28, 2004
Google is opening a European research center in Switzerland. Low taxes were a main reason behind the choice.
Learning About Search Engines From Google Engineers
SearchDay, Jan. 26, 2004
Want to learn how Google works? A new archive of publications by Google employees offers deep insights into many aspects of the search engine’s operation.
Eurekster Launches Personalized Social Search
SearchDay, Jan. 21, 2004
Personalized search has long been promised as an important next step for increasing relevancy. Now it comes not from Google or Yahoo but instead from tiny Eurekster, which opens to the general public today.
Yahoo Lab to Cook Up New Search Tech
InternetNews.com, Jan. 20, 2004
Let the rebranding begin! In the middle of last year, Overture launched Overture Research. It was designed to show that Overture was just as serious about search research as competitor Google, which opened its own Google Labs back in 2002. Now that Yahoo owns Overture, it has borgified Overture Research to be its own flagship Yahoo Labs.
One interesting thing appears to be new in the change from Overture Research to Yahoo Labs. Previously, we’ve written about open source search engine Nutch. You still can’t search using Nutch at its own site, Nutch.org. But Yahoo, as a supporter of Nutch, is offering a way to test its own implementation here.
I tried some searches for “cars,” “travel” and “pocket pc” and came away unimpressed. The results for the first two seemed a mishmash and remnant of what you’d expect from a 1998-era search engine. The last one did better. Yahoo does warn not to expect the results to be good, as it’s a work in progress.
Searching for Dominance: What Will Microsoft Search Look Like?
SearchEngineGuide.com, Jan. 12, 2004
Gord Hotchkiss tries to guess what Microsoft’s search solution will be like, especially in terms of the operating system. He focuses on ways Microsoft may try to make searching your own computer easier. Implicit Query sounds pretty cool. But there’s a danger in assuming that what works for the desktop is somehow useful for searching the web.
In fact, time and again we’ve seen companies claiming a synergy between web search and enterprise search fail to do both well. Open Text got out of web search; AltaVista, Lycos and Inktomi all dumped enterprise search. The big enterprise search companies like Verity don’t try to do web search. I’m also dubious that the web will latch on to some Microsoft system of tagging files, and there are plenty of web servers that don’t use Microsoft products.
IBM has half a football field of computers churning away at analyzing the web for its WebFountain search engine. But don’t think it’s designed to help you navigate to particular web sites. Instead, it seems configured to help you see patterns based on the documents on the web. WebFountain is designed to analyze and tag pages, in order to understand how to relate words and concepts found in them to other pages. WebFountain is being readied as a pay-for-search service that IBM hopes businesses will pick up for data mining purposes.
The Future of Human Knowledge: The Semantic Web
TechNewsWorld, July 28, 2003
In this article, we learn that in a few years, the Semantic Web will help search engines know if two different sites have related content. Well, search engines have been able to do this already for years (and users don’t make use of the feature). Search engines have also been able to use natural language for ages — in fact, they generally do not want you to try Boolean searching.
I like the part that says the Semantic Web will allow someone to enter “I want to buy a first edition copy of Gone With The Wind at a store in Beverly Hills this afternoon.” Yeah, I’d like to see that — I assume every rare book shop in Beverly Hills will be online, right?
How about just, “I want to buy a first edition copy of Gone With The Wind.” I can enter that into Google right now, and the first ad for BookFinder.com led me to some promising results. In Google’s editorial results, a link also led to a really good article on how to identify a first edition (and first printing, if that’s also what you want) of the book. Maybe the non-Semantic Web is actually better than Semantic Web proponents think.
How to Build Your Own Search Engine
SearchDay, July 9, 2003
Want a detailed glimpse into the black boxes we call search engines? Mining the Web is a textbook that discusses everything from building your own crawler to the future of information finding on the web.
Big Changes for Search Engines
Wired, May 27, 2003
Search was the focus of over 20 papers at the International World Wide Web conference held last month in Budapest. The papers described techniques for sorting through product reviews, visual search applications, personalized search and a way to speed up how Google processes PageRank calculations.
Researchers Develop Techniques For Computing Google-style Web Rankings Up To Five Times Faster
National Science Foundation Press Release, May 14, 2003
Two Stanford University students will present a paper at the 12th Annual WWW Conference explaining ways to speed up the calculation of PageRank — NOT the ranking algorithm behind the Google search engine but rather one component of that algorithm. The “topic sensitive” calculations sound a lot like the system that Teoma uses. It’s also good to keep in mind that no one knows exactly how Google currently calculates PageRank. This research is based on a 1999 paper about how Google operated and is not necessarily indicative as to how things work today.
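For readers who want a feel for what is being sped up, here is a minimal sketch of the textbook power-iteration form of PageRank, in the spirit of the early published descriptions. It is not the accelerated method from the Stanford paper, and it is certainly not how Google computes anything today. The function name, the toy link graph and the 0.85 damping value are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Plain power iteration over a dict {page: [pages it links to]}.

    Illustrative only: damping=0.85 and the stopping rule are assumed values.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Every page passes an equal share of its damped rank to each page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical four-page web: "c" is linked to by every other page.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

The loop repeats until the scores stop changing, which is exactly the part that gets expensive on a web-sized graph and that research of this kind tries to shorten.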
How Search Engines Make Sense of the Web
SearchDay, May 5, 2003
Search engines are essentially massive full-text indexes of web pages. The quality of the indexes, and how the engines use the information they contain, is what makes — or breaks — the quality of search results.
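As a rough illustration of what a full-text index is, the sketch below builds a tiny inverted index that maps each word to the set of documents containing it, then answers a query by intersecting those sets. The sample documents and the function names are made up for the example. Real engines add ranking, word positions, compression and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical mini-collection, for illustration only.
docs = {
    1: "Google was a Stanford University project",
    2: "Teoma and Google both use link analysis",
    3: "Stanford students speed up link analysis",
}
index = build_index(docs)
print(search(index, "google stanford"))
print(search(index, "link analysis"))
```

Looking up a word is a single dictionary access no matter how many documents were indexed, which is why the index, rather than the pages themselves, is what gets searched.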
The Grammar of Sound
Technology Review, April 30, 2003
Need to search through audio recordings to find particular keyword references? New technology from Fast-Talk Communications aims to make this much easier. Of course, SpeechBot has allowed you to do this for ages for selected news content. The service either lets you search against written transcripts to match portions of audio recordings or has used speech recognition to create transcripts where none are available. However, Fast-Talk skips the transcription step used by SpeechBot and other audio-indexing companies. Instead, it makes a sound (rather than word) transcription of a document. This is supposed to speed the indexing process.
Microsoft Research seeks better search
News.com, April 17, 2003
A look at how Microsoft wants to ease the ability to search for information, primarily via desktop applications.
Help LookSmart Crawl the Web
SearchDay, Apr. 3, 2003
LookSmart is taking a new approach to discovering web content, offering a free downloadable screensaver program that also crawls the web when your computer is idle. Search Engine Watch members should follow through from the blue box to the members edition of this article.
Search Engines 101
SearchDay, April 2, 2003
Use these outstanding online resources to get the equivalent of an introductory semester of “search engines 101” — without having to go back to school for your education.
What makes a good search engine?
ComputerUser, April 1, 2003
Q&A with Teoma’s head of research and development Apostolos Gerasoulis and Steve Berkowitz, president of Teoma-owner Ask Jeeves. Focus is on Teoma’s “community-oriented” approach of determining relevancy.
Disney Would Sell Infoseek Search Tech, Switches Go To Google
SearchEngineWatch.com, March 18, 2003
The Walt Disney Internet Group says that it has been approached by companies looking to acquire the search technology and patents it has left over from the company’s purchase of Infoseek, the search engine later transformed into Go. Realistically, the search technology is worth nothing. Infoseek’s technology last operated to crawl the web back in January 2001 — and even then, the technology was dated. More about the story, as well as the fact that Go is now being powered by Google rather than Overture, can be found below.
The Second Eigenvalue of the Google Matrix
SearchDay, March 17, 2003
Want an analytical peek at some of the core components of Google’s famous PageRank algorithm? These two papers from Stanford offer some heavy-duty insights into Google’s operation.
SearchDay, Mar. 11, 2003
A company specializing in uncovering criminal connections and terrorist networks has released a visualization tool that reveals hidden relationships in Google search results.
Google: Net Hacker Tool du Jour
Wired, March 4, 2003
Another “let’s blame Google” story. Guess what? If you don’t configure your web server correctly, Google — and any other search engine — might index pages that allow people to hack into your server.
The Quest for Search Engine Relevancy
SearchDay, Feb. 4, 2003
Today’s search engines are experiencing déjà vu, it seems, focusing on developing better relevance in search results instead of trying to entertain users as “portals”.
Tech Predictions for the Decade
Wired, Jan. 20, 2003
In the future, semantic web agents will find exactly what we want, rather than us having to scour hundreds of off-topic pages on Google, says this prediction of future technology. Yawn. Will these be similar to the same agents that Autonomy and WiseWire touted in the late 1990s that failed to deliver? And is this the same Google that we’re endlessly told finds everything we need, now suddenly off topic?
Search is often an on-demand activity. People who want “books about Agatha Christie” — an example cited in the article — don’t want to wait until agents come back with answers. They want the answers immediately, so they’ll still continue to turn to a search engine.
That example is also supposed to indicate the failure of search engines, because if you do it, you get books written by Agatha Christie, rather than about her. Of course, if you simply perform that search with quotation marks, you’ll get pages that contain that exact phrase. And, surprise, when I tried this at Google, the first match indeed listed books about Agatha Christie. Guess I can go without my search agent for another year.
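The difference between the two kinds of search can be sketched in a few lines. This is a simplified illustration with made-up page text and function names, not a description of how Google or any other engine implements phrase matching.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all_words(text, query):
    """Unquoted search: every query word appears somewhere, in any order."""
    words = set(tokenize(text))
    return all(word in words for word in tokenize(query))


def matches_phrase(text, phrase):
    """Quoted search: the query words appear consecutively, in order."""
    tokens = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    return n > 0 and any(
        tokens[i:i + n] == target for i in range(len(tokens) - n + 1)
    )


# Hypothetical page snippets, for illustration only.
pages = {
    "by": "Books by Agatha Christie: novels about Hercule Poirot",
    "about": "A list of books about Agatha Christie and her life",
}
query = "books about Agatha Christie"
for name, text in pages.items():
    print(name, matches_all_words(text, query), matches_phrase(text, query))
```

Both pages contain all four words, so an unquoted search matches either one. Only the second page contains the words in that exact order, which is what the quotation marks ask for.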
Tim Mayer of FAST search & transfer
Enfin, January 2003
Long interview covering things that FAST considers to be spam, hardware and software used by the search engine company, the fact that the company now has 6 million paid inclusion URLs and other topics.
Web Search Trends
Searcher, Jan. 2003
Short summary of my keynote about search engine trends at the Internet Librarian conference held November 2002.
Making his mark on the Internet map
Ottawa Citizen, Dec. 9, 2002
Profile of one of the founding parents of web search, Tim Bray of Open Text fame, and how he’s currently looking into ways of mapping information.
Fun With Google’s APIs
SearchDay, Dec. 2, 2002
A free tutorial from IBM shows you how to develop your own applications that harness the behind the scenes capabilities of one of the web’s most powerful search engines.
In Search Of The Relevancy Figure
The Search Engine Report, Dec. 5, 2002
While relevancy is the most important “feature” a search engine can offer, there sadly remains no widely-accepted measure of how relevant the different search engines are. Turning relevancy into an easily digested figure is a huge challenge, but it’s a challenge the search engine industry needs to overcome, for its own good and that of consumers. A look at the challenges and issues involved on the quest to get an accepted relevancy figure.
AltaVista, Overture Speak Up About Perfect Page Test
The Search Engine Report, Dec. 5, 2002
Last month, Search Engine Watch published the results of our “Perfect Page Test” and promised to provide feedback from the search engines tested, if any was received. We only got significant feedback from two services, AltaVista and Overture. Not surprisingly, these were the two that received failing marks. And with apologies to Overture, we never meant to score it alongside the others. A look at what happened with Overture, as well as some detailed feedback on problems with testing relevancy from AltaVista.
A Helping Hand To Find The Invisible Web
Australia.internet.com, Nov. 27, 2002
Profile of YourAmigo, which has products to extract information that is typically “invisible” to web crawlers. The “Spider Linker” product is expressly designed for marketers and others who want to make their content visible in web-wide search engines.
Way back when
New Scientist, Nov. 2002
Q&A with Wayback Machine/Internet Archive founder Brewster Kahle.
Anatomy of a Search Engine: Inside FAST
SearchDay, Oct. 31, 2002
Here’s a behind the scenes glimpse at the inner workings of FAST, the search engine that powers Lycos, AlltheWeb.com and other regional portals.
Anatomy of a Search Engine: Inside Google
SearchDay, October 30, 2002
Search engines aren’t just black boxes — they are programs continually updated to improve indexing, search responsiveness and relevance ranking. Here’s an insider’s look at Google.
Google Fixes Security Flaws in Search Toolbar
IDG News, Aug. 8, 2002
Security holes were found in the popular Google Toolbar, but the company says patches have now automatically been transmitted to anyone using it.
The Web Intelligence Consortium
SearchDay, Aug. 5, 2002
A group of researchers has launched a new consortium focused on the impact of artificial intelligence and advanced information technology on the next generation of Web-empowered products, systems, services, and activities.
New Research at DARPA
SearchDay, July 29, 2002
The primary instigator of the Internet, DARPA, is funding research into future technologies — including many that have potential to dramatically improve search systems.
Dinner with the mind behind the mind of God
Red Herring, July 16, 2002
Red Herring takes Google cofounder Sergey Brin to dinner, tries to get him drunk and spill secrets. No dice, but he does answer what the perfect search engine would look like. “It would be the mind of God,” Brin said.
Google Announces Programming Contest Winner
SearchDay, June 3, 2002
Google has awarded a $10,000 prize to a programmer who created a program that lets users search for web pages within a specified geographic area.
Google’s Gaggle of New Goodies
SearchDay, May 22, 2002
Google has enhanced its already indispensable toolbar, and is offering an intriguing peek inside the kimono through Google Labs, a “technology playground” for ideas that aren’t quite ready for prime time.
Teaching a search engine
San Jose Mercury News, May 16, 2002
If only a search engine could learn what I like and understand what I like. Columnist David Plotnikoff thinks it would be a great idea, and it is — but one that has never gone anywhere because of privacy concerns or because search engine users were afraid personalized results would cause them to “miss” important information. See my last article on the subject for more: Google May Get Personal.
The Seventh Search Engine Conference
Infonortics, April 15-16, 2002
You’ll find links here to presentations from speakers at this regular conference about search engines. The emphasis is more on enterprise search than web-wide search.
The Search Is On
ComputerWorld, April 15, 2002
Looks at four different research projects and products aimed at improving data mining.
The Best of the International World Wide Web Conferences
SearchDay, Apr. 11, 2002
Each year, the International World Wide Web Conference provides a showcase for innovative web technologies. Here’s a chronological list of significant papers over the past decade focusing on searching and search engines.
Google Gets Down to Basis for Chinese
Boston.internet.com, April 10, 2002
Google has licensed technology from Basis to help it understand Chinese.
The search engine: a mix of partners
Europemedia, April 4, 2002
A look at future plans for leading search engines, for mobile searching, in the UK and Europe markets.
CIA-funded Indo-U.S. firm makes smarter Web search
Reuters, Feb. 25, 2002
The US Central Intelligence Agency has invested in a firm developing software to perform pattern searching similar to Autonomy.
Google’s “Search Recipe” Contest
SearchDay, Feb. 12, 2002
Google challenges programmers to write code that does “interesting” things, with a cash prize and a VIP trip to the Googleplex as a reward.
A Pre-Web Search Engine, Gopher Turns Ten
SearchDay #198, Feb. 6, 2002
Before the web became synonymous with cyberspace, Gopher was arguably the most popular Internet search engine, and despite rumors to the contrary, it’s alive and “digging.”
FAST winning the search engine race
Europemedia.net, Feb. 1, 2002
Q&A with FAST, on how it is positioning itself in the search space, with the focus on technology.
Three Minutes With Google’s Eric Schmidt
PC World, Jan. 30, 2002
Q&A with Google’s chairman and CEO, on various issues.
How the Wayback Machine Works
O’Reilly Network, Jan. 21, 2002
The headline says it all.
The geeks who saved Usenet
Salon, Jan. 7, 2002
Behind the scenes about how Google managed to restore 20 years of Usenet posts.
Search Engine Anti-Optimization
SearchDay, Nov. 28, 2001
A novel proposal would allow webmasters to *reduce* search engine rankings for specific keywords and phrases, deliberately making pages all but impossible to be found by search engines.
With links to songs, videos and pictures, search engines advance
Associated Press, Oct. 28, 2001
Primarily a focus on multimedia search company Friskit, hoping to connect music lovers with songs. Also mentions Singingfish, which has several important partnerships already.
AT&T Wireless adding Google to phones
News.com, Oct. 15, 2001
Google has a pretty cool feature that allows those using WAP browsers to use a special version of the search engine where the search results are formatted for small screens. In addition, when you visit any link, Google continues to convert HTML into a WAP format on the fly, making it an easy way to view the web while mobile. This technology makes it no wonder that the search engine is making gains among wireless providers, such as this latest deal with AT&T. It follows on earlier deals with Sprint PCS, Cingular Wireless, Handspring, Palm and Vodafone.
Google May Get Personal
The Search Engine Report, Oct. 2, 2001
With last month’s acquisition of Outride, Google may be poising itself to go forward into an area of search refinement that no major player has gone successfully before: personalized search results. With personalized results, a person would get back a list of results that takes into account some of their demographics.
Search start-ups seek Google’s throne
News.com, Aug. 28, 2001
Looks at how Teoma and Wisenut hope to challenge Google (not to mention Inktomi and FAST) in the search space. Both services are impressive, but keep some things in mind. Every new service to come along talks about how they can do it faster and more cheaply. Also, Inktomi and FAST’s paid inclusion programs do not give pages a ranking boost, so saying they offer “pay for prominence” isn’t really correct.
Search Engine Security Concerns
The Search Engine Report, Aug. 2, 2001
Security issues with Lycos and Google came up in July. They aren’t likely to impact many, if any, users, so don’t get panicked. Here’s a rundown on what happened.
A Master of Headline Grabbing
Time, July 9, 2001
Profile of news search service Moreover.
Illuminating the Web
Time, July 9, 2001
A look at the challenge of indexing the invisible web and enterprises.
Porn sneaks past search filters
News.com, July 2, 2001
Keeping porn images out of image search results is quite the challenge. While image search engines still can’t read images in the way they can text, some can at least identify colors, helping keep files dominant with flesh tones out of the results. If only it weren’t for all those darn baby photos messing everything up: “Babies tend to be showing a lot of skin,” said image search company LookThatUp.
Upstream: Video Searching
Technology Review, July/August 2001
Review of research about making video searching easier.
New IT company aims to increase speed of Internet, database searches
Pittsburgh Post-Gazette, June 28, 2001
Vivisimo is a great meta search engine, but the company actually aims to use its technology to help businesses manage information. Also how the company got its name (which I think looks and sounds great, but which I also always misspell).
Search the Web Like a Map
About.com Web Search Guide, June 18, 2001
Review of tools that let you see web search information in visual or graphical form.
Google Polishes its Image
SearchDay, June 26, 2001
Google has taken the wraps off its new specialized image search engine, allowing you to search and browse more than 150 million digital images.
Search Research From The WWW10 Conference
SearchEngineWatch.com, June 18, 2001
Summarizes major papers relating to search that were presented at the 10th International World Wide Web Conference.
Industry Standard, June 18, 2001
Focus on attempts to mine information located in the “invisible” or “deep” part of the web.
Visual Search Engines
WebDeveloper.com, June 7, 2001
Review of eVision Java-based toolkit, which is designed to improve visual searching.
Lasoo Makes Geosearching Visual
The Search Engine Report, June 4, 2001
We’ve had “geosearching” available through Northern Light for over a year, but it’s moving to a new level with the recent launch of Lasoo, which gives you geovisual results. Geovisual? Imagine if your search results were overlaid upon a map. It would be a useful view to have for geographically-related searches.
Ogle Not Google’s Top Scientist
Wired, April 27, 2001
I’ve often wanted to do a piece about women in search. It’s one industry where you’ll find women in many top positions. Case in point: Monika Henzinger, Google’s director of research, who was just named one of the top 25 “Women on the Web.” Other women of note in search? Arguably the most powerful, given Yahoo’s continued dominance as a search resource, is Srinija Srinivasan, vice president and editor in chief of Yahoo. LookSmart’s editor in chief is also a woman, Kate Wingerson, as is the service’s cofounder and president, Tracey Ellery. At Excite, Lynne Mariani Pogue heads up search, and she was preceded by Kris Carpenter.
Specialized Providers Zero In On Multimedia Content
Interactive Week, April 24, 2001
Profile primarily of streaming media search provider Singingfish.
Terra Lycos to launch mobile search
News.com, April 23, 2001
Lycos is teaming up with another company to offer live answers for a fee via telephone.
Napster Dances To A New Gigabeat
SiliconValley.internet.com, April 10, 2001
Filtering out illegally-copied song files hasn’t been easy for Napster, so the company has acquired Gigabeat.com, which provides music search technology.
MSN adds music and ‘sounds like’ search
InfoWorld, April 4, 2001
MSN apparently has a new music search service that lets you locate songs that sound like other ones.
Accelerating toward a better search engine
RedHerring.com, March 9
Focuses on some promising new players in the search space. Unfortunately, to set up these players, the story positions existing search engines as failures. It goes as far as to suggest that we’ve had no improvements in search over the past 11 years since Archie appeared. “Anyone who has used search engines knows they’re stupid,” the author states. And yet, millions use search engines each day and find what they are looking for. Search engines have problems, but they’ve made great strides forward. What’s stupid is calling these tools stupid, since they usually find something useful from millions of records within seconds.
Sun Snags InfraSearch in Move Towards P2P
InternetNews.com, March 6, 2001
Sun is to acquire InfraSearch, the company which hopes to produce P2P searching applications.
Being Search Boxed To Death
The Search Engine Report, March 5, 2001
Like a Swiss Army Knife, general purpose search engines often can do many different jobs. Nevertheless, your results might be better if you turn to a vertical tool. This may be the year that the general purpose search engines finally figure out a way to get the right vertical tools into the hands of their users.
Will P2P Search Replace Search Engines?
The Search Engine Report, March 5, 2001
Will Napster-style peer-to-peer searching mean an end to search engines? Two articles provide a pro and con.
Searching Inside Of Images
The Search Engine Report, Dec. 4, 2000
The Holy Grail of image searching is to actually “see” what images are about, rather than understanding them based only on the words appearing around them. A new generation of multimedia search tools aims to change this.
Search Us, Says Google
MIT Technology Review, Nov/Dec. 2000
Good Q&A with the founders of Google.
Stanford Launches Better Search Engine Project
siliconvalley.internet.com, Oct. 2, 2000
Yahoo, Excite and Google all came out of Stanford University. Will Global InfoBase be next?
GPulp Opens Up Web Searches
Wired, Sept. 18, 2000
Peer-to-peer Napster-style searching continues to attract attention, and a group of developers are working on an open source solution they hope will become a standard.
Next-Generation Web Search
IEEE Data Engineering Bulletin, Sept. 2000
In this special edition, six technical papers that deal with web searching are presented. They include topics such as link analysis, computing page reputations, creating topic specific search engines and the role a search engine’s interface plays in the success of a searcher. Papers are presented in PostScript. To read them, try the GhostScript viewer.
Information just wants to be Freenet
Salon, Aug. 28, 2000
Uprizer is a company that wants to build solutions off the Freenet platform, a distributed search or “peer-to-peer” search system similar to Napster. Lots of talk about putting it to non-music, non-controversial issues, but no real specifics. Will distributed search go mainstream? Everyone seems interested these days, but then push seemed like a good idea to some and ultimately went nowhere. My feeling is that we’ll still see distributed search as mainly a supplementary system for finding things on intranets.
Microsoft loads up for pirate raids
ZDNet, Aug. 2, 2000
Microsoft’s other search engine isn’t for web surfers. Instead, it helps the company track down pirated software.
Let a Hundred Search Engines Bloom
The Standard, July 17, 2000
Search is hot again, and all that heat is making new search products blossom like mad. A look at who is competing for your attention and general trends on how they hope to differentiate themselves.
The Search Engine as Cyborg
New York Times, June 29, 2000
A comprehensive look at how humans have entered the mix to help improve search results.
WWW9 Features Search Papers
The Search Engine Report, June 2, 2000
Summarizes major papers relating to search that were presented at the Ninth International World Wide Web Conference.
Quantum Leap in Searching
Wired, May 25, 2000
If someone builds a quantum computer, then there’s a search algorithm just waiting to run on it that might allow billions of documents to be processed quickly.
Souped-up search engines
Nature, May 2000
A look at the state of current web search technology.
Online Web Search Issue
Online, May 2000
Coverage of specialized search engines, the future of search engines, upcoming search technologies and a piece I wrote that is an overview of search engine submission basics.
Employee No. 5 at Yahoo
Newsweek, March 27, 2000
Interview with and background about Yahoo editor in chief Srinija Srinivasan.
Google Slices & Dices The Web Simply
Inter@ctive Week, March 23, 2000
Basic details on how Google works, plus a look at how the humans at Google are fed by a staff chef.
You said what?!
Salon, Feb. 25, 2000
A look at a live chat search service that’s in development, with a slant on privacy concerns it may raise.
FBI vows to combat Net terrorism
News.com, Feb. 9, 2000
Round up of denial of service hacks that hit Yahoo and other major sites.
Search sites brush up on people skills
USA Today, Jan. 24, 2000
A look at the growth in using humans to provide search results.
This Search Engine Sees What You Mean
PC World, Jan. 10, 2000
Researchers have developed a search engine that apparently lets you look for matching images regardless of file format, such as finding matches within both a video file and a JPEG file. However, it does not appear to be a solution where you can enter words to find what you want. Instead, you provide a visual example that can be pattern matched.
Web search results still have human touch
News.com, Dec. 27, 1999
Recap of major developments over the year in search.
Netscape Directory Making a Splash
Industry Standard, Dec. 14, 1999
A look at the growth of the Open Directory, with pros and cons of the service.
Internet Labels Lose Meaning in Rush for Popular Addresses
New York Times, Nov. 29, 1999
I can remember in early 1995 when I had to justify to Network Solutions why I needed two similar domain names ending in .com and still wasn’t granted the second one. Today, they’d swipe my card without a second thought. In fact, I was very surprised to discover earlier this year that in addition to .com, .net and .org were being pushed as well, even to non-network companies and for-profit organizations. So I was pleased to see this article addressing how the domains are being devalued. And what if we do ever get new domains like .firm? Only a fool would believe that corporations won’t grab every variation of their name for every domain, just to protect themselves. In short, those new names won’t solve anything.
The Perfect Search
Newsweek, Sept. 27, 1999
Nice review of current thinking in search.
Excite.com Goes to Illinois
Wired, July 14, 1999
Imagine waking up and finding you had complete control over the excite.com domain. That’s exactly what happened to a person in Illinois, who was the victim of an apparent prank and hack of Network Solutions’ database. Traffic to excite.com was never altered, but it could have been, through no fault of Excite@Home.
Hypersearching the Web
Scientific American, June 1999
Written by members of IBM’s Clever project, it’s a slightly technical look at how Clever is using link data to build a smarter search service.
Focused crawling: a new approach to topic-specific Web resource discovery
IBM Almaden Research Center, May 1999
Winner of the Best Paper Prize at the WWW8 conference, this technical document from the organization behind CLEVER covers the use of custom crawling and link analysis to create lists of relevant web sites.
On Caching Search Engine Results
University of Crete/ICS-FORTH, Jan. 13, 1999
All the major search engines cache results. In the simplest terms, this means that if you search for “travel,” the search engine will first check fast memory (RAM) to see if it has already served a set of results for that query. If so, it skips the more time-intensive work of checking against data stored on multiple hard drives, across multiple machines, to retrieve a fresh results list. Sound scary, like you might miss data? Don’t worry: with the number of queries that take place, it isn’t much of an issue. This paper does a good job of describing caching in more detail. It gets technical deeper into the paper, but the introduction will be accessible to many people. An analysis of about 1 million Excite queries from 1997 found that most queries are popular; that is, they are frequently repeated. The most popular term in the set was requested 2,219 times, while the 1,000th most popular term was requested 27 times.
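For readers who want to see the idea in miniature, here is a rough Python sketch of a query-result cache. It is only an illustration, not how any particular search engine or the paper implements caching. The class name QueryCache, the slow_index_lookup stand-in and the least-recently-used eviction rule are all assumptions made for the example.

```python
from collections import OrderedDict


class QueryCache:
    """Tiny in-memory cache mapping a query string to its results list.

    Hypothetical example: eviction is least-recently-used (LRU), which is
    an assumption for illustration, not a claim about any real engine.
    """

    def __init__(self, capacity, backend):
        self.capacity = capacity      # maximum number of cached queries
        self.backend = backend        # slow lookup: query -> list of results
        self.entries = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def search(self, query):
        # Normalize case and whitespace so "Travel " and "travel" share an entry.
        key = " ".join(query.lower().split())
        if key in self.entries:
            # Cache hit: serve from memory and mark the entry as recently used.
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        # Cache miss: do the expensive lookup, then remember the answer.
        self.misses += 1
        results = self.backend(key)
        self.entries[key] = results
        if len(self.entries) > self.capacity:
            # Drop the least recently used query to stay within capacity.
            self.entries.popitem(last=False)
        return results


def slow_index_lookup(query):
    # Stand-in for the disk-backed, multi-machine index described above.
    return [f"result {i} for {query}" for i in range(1, 4)]


if __name__ == "__main__":
    cache = QueryCache(capacity=1000, backend=slow_index_lookup)
    for q in ["travel", "Travel", "maps", "travel"]:
        cache.search(q)
    print(cache.hits, cache.misses)
```

Because popular queries repeat so often, even a small cache like this would answer a large share of requests without touching the slower index.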
Building Responsive User Communities
Internet World, Dec. 14, 1998
Details on how Excite provides its communities service.
Forbes, Nov. 30, 1998
Some nice technical details about how Inktomi works. There’s a quote from Yahoo explaining that it dropped AltaVista for Inktomi because “AltaVista couldn’t scale up as well.” It really had more to do with AltaVista competing with Yahoo. AltaVista was scaling just fine for Yahoo’s needs.
Tag, You’re It! XML Supercharges the Net
Industry Standard, Nov. 13, 1998
How XML is being used for some specialty search needs.
Inktomi searches for Net profits in Europe
BBC, July 10, 1998
Interview with Inktomi chief scientist Eric Brewer, focusing mostly on Inktomi’s search technology.
The Further Development of Meta Search Engine Technology
Internet Society, July 22, 1998
Examines the current state of meta search.
A Search Engine Retools for Speed and Dexterity
Internet World, June 29, 1998
Tech details on HotBot’s servers.
New Search Tool Speaks Your Language
Wired News, April 15, 1998
About the Electric Monk, a service that takes a natural language query such as “How do I make chicken soup” and sends it to AltaVista in a way meant to produce better results. Some may like it; some may think it’s no better than regular AltaVista.
Building the Network To Back Lycos’ Deals
Internet World, April 20, 1998
The hardware that keeps Lycos running.
How Digital Built a Huge Net Index
Internet World, March 23, 1998
The nuts-and-bolts about AltaVista’s hardware.
Getting There, or Not: Why Search Is So Ineffective
Internet World, Feb. 23, 1998
A short but interesting sidebar article that summarizes the many things that can cause searching to miss the mark.
Search Veteran Looks to Video
Internet World, Feb. 23, 1998
Excalibur, a company known for its intranet search products, is aiming toward video search technology.
Keeping 66 Sites Up, Holding Costs Down
Internet World, Feb. 2, 1998
Tech details on what it takes to keep Excite’s many servers up and running.
Casting an Information Net
Upside, Feb. 2, 1998
A look at the challenges of information retrieval, ranging from web-based services to information technology. Lots of sidebars, with a listing of main players in different fields.
Searchers on the Beachhead
Searcher, Feb. 1998
Internet Librarians gathered in Monterey, California last November for a conference on searching. One of the highlights was a panel of search engine representatives. An account for those who missed out on the fun.
AltaVista Ranking Of Query Results
This Nov. 1997 article by Dirk van Eylen examines the AltaVista ranking system, drawing on observations made using AltaVista’s personal search software and queries run on a Belgian-branded version of the AltaVista web service.
Roving Robot Will Unmask Online Music Pirates
WebWeek, Oct. 20, 1997
MusicBot aims to hunt down sites using music samples without permission.
Search engines: The next generation
Network World Fusion, Oct. 20, 1997
An interesting look at the emergence of specialty search engines, though it incorrectly calls metacrawlers an emerging tool. Metacrawlers have been around for ages — there are just more of them, now.
Alternatives to Hit Lists Include Ability to Fly Through Data
WebWeek, Sept. 15, 1997
A very enjoyable and interesting article on the challenges of guiding people toward what they want. Despite bells and whistles, there’s a tendency to gravitate toward the search box.
Challenges Amid Yahoo’s Hypergrowth
WebWeek, Sept. 15, 1997
Seventy servers. Lots of bandwidth. Tons of disk space. If you like hardware, read on to see what it takes to keep Yahoo running.
Lost in cyberspace
New Scientist, June 28, 1997
An excellent look at why some search engines are moving away from an “index everything” attitude and instead adopting an “index the best” or “sample the web” method. Does it make a difference to searchers if some pages aren’t included? Search engine execs explain why they believe a sample is good enough.
Survey of Information Retrieval
Scott Weiss, April 29, 1997
Guide to technical details about search engines, including a list of vocabulary terms.
Supercharge Your Web Searches
NetGuide, May 1997
Reviews and details of each of the major search engines, which I wrote for NetGuide.
The building and maintenance of robot based internet search services: A review of current indexing and data collection methods
Desire Project, Sept. 26, 1996 (last revision)
This is a monster report that covers the spectrum of Internet searching and information retrieval technology in mid-1996. Lots of details on how search engines were operating at that time. Things change fast, so many details are now out of date. But much of the report remains interesting reading.
Distributed Indexing/Searching Workshop
Cambridge, Massachusetts, USA, May 28-29, 1996
Notes and information about an important W3-sponsored meeting relating to how the web is indexed and searched.
Multi-Service Search and Comparison Using the MetaCrawler
4th WWW Conference, 1995
A paper describing how MetaCrawler operates, written a few months after its launch. Interesting details of the challenge of gathering results from various sources and providing a quality, combined list.