Excite Enlarging Index, Partnered With LookSmart

From The Search Engine Report
Sept. 3, 1999
(a longer version is available to site subscribers)

In August, Excite began the first phase of an ambitious plan to enlarge its search index to 250 million web pages and improve the relevancy of its search results. The search engine also debuted new LookSmart-powered directory listings.

Under its new indexing system, which has been in the works for the past year and a half, Excite plans to visit 500 million or more pages across the web on a regular basis. It will then retain only those pages that it determines are most popular, or which offer the best quality information, or which seem to satisfy the queries its users make.

This "visit many, keep some" approach is how Excite hopes to expand its index coverage without simultaneously overwhelming users with irrelevant or off-topic documents.
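To make the "visit many, keep some" idea concrete, here is a minimal sketch. It assumes each visited page has already been given a single quality score; the scores, URLs and the keep_some function are all made up for illustration, and Excite has not said how it actually scores pages.

```python
import heapq

def keep_some(visited, capacity):
    """Return the URLs of the `capacity` highest-scoring visited pages.

    `visited` maps URL -> quality score. The score is a hypothetical
    stand-in for popularity, content quality or query satisfaction.
    """
    return heapq.nlargest(capacity, visited, key=visited.get)

# Hypothetical crawl results: many pages visited, each with an assumed score.
visited = {
    "a.example/home": 0.9,
    "b.example/spam": 0.1,
    "c.example/guide": 0.7,
    "d.example/old": 0.3,
}

# Only the two best-scoring pages are retained in the index.
index = keep_some(visited, capacity=2)
print(index)
```

The point of the sketch is only that the crawl can be much larger than the index: everything gets visited, but only the top scorers are kept.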

"We don't think just adding more content will do the job for us," said Kris Carpenter, Excite's Director of Search Products. "We view that as our number one challenge, understanding what's out there and producing that top quality content in the first two pages of results."

Excite is using a number of "off-the-page" criteria to determine both which pages to retain in its index and how to rank those pages in response to queries. By off-the-page, I mean factors that are not tied to what's on the page itself.

For instance, search engines have traditionally ranked pages by criteria such as where and how often search terms appear in them. Since these factors are "on-the-page," webmasters can change their pages to try to increase their rankings.
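A rough sketch of on-the-page ranking might look like the following. The on_page_score function and its weights (3 for a title hit, 1 for a body hit) are assumptions chosen for illustration, not the formula of any real search engine.

```python
def on_page_score(query, title, body):
    """Score a page by how often query terms appear and where they appear."""
    terms = query.lower().split()
    title_words = title.lower().split()
    body_words = body.lower().split()
    score = 0.0
    for term in terms:
        score += 3.0 * title_words.count(term)  # assumed weight: title hits count more
        score += 1.0 * body_words.count(term)   # assumed weight: body hits count less
    return score

print(on_page_score("search engine", "Search Engine News",
                    "A search engine indexes pages"))
# A webmaster can raise this score simply by repeating the term on the page.
print(on_page_score("search", "Home", "search search search"))
```

Because every input to the score comes from the page itself, the page's author controls the result, which is exactly the weakness described above.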

In contrast, off-the-page criteria are those not directly in a webmaster's control. A good example is link popularity. It is very difficult for a webmaster to outwit a good system that uses link popularity as a ranking criterion. That's because such a system leverages information from across the web, which a single webmaster cannot control.
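In its simplest form, link popularity can be thought of as counting how many different sites link to a page. The sketch below assumes the crawler has produced a list of (source site, target site) pairs; the link_popularity function and the site names are hypothetical, and real systems are certainly more elaborate.

```python
from collections import Counter

def link_popularity(links):
    """Count inbound links per site from (source, target) pairs.

    Self-links and repeated links from the same source are ignored,
    so a site cannot inflate its own count.
    """
    unique = {(src, dst) for src, dst in links if src != dst}
    return Counter(dst for _, dst in unique)

# Hypothetical link data gathered from across the web.
links = [
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("c.example", "c.example"),  # self-link, ignored
    ("a.example", "c.example"),  # duplicate, ignored
    ("a.example", "b.example"),
]

popularity = link_popularity(links)
print(popularity.most_common())
```

Note that nothing c.example does on its own pages changes its count; only links from other sites matter.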

Excite has long made use of link popularity, and that criterion is now being given heavier weight in its new system. Some have also noticed that Excite has been measuring clickthrough from its results. Carpenter said that Excite has experimented with using this data to influence rankings, but that it is not currently being used as part of its relevancy system.

Excite is also using another set of off-the-page information that I can't disclose publicly. I can say that it is unique among the major search engines in using this type of information, and that it would seemingly offer yet another way of getting the best information to the top of search results lists. Of course, the proof will be if relevancy actually does improve in the long term.

Each of these off-the-page criteria is weighted differently, but term frequency and location still come into play. In general, the mixture should work to reward sites that have good content or that at least somehow distinguish themselves online.
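One common way to mix several criteria is a weighted sum. The sketch below assumes each signal has been normalized to a value between 0 and 1; the signal names and weights are invented for illustration, since Excite does not disclose its actual values or how it combines them.

```python
# Assumed weights; the real values used by Excite are not public.
WEIGHTS = {
    "link_popularity": 0.5,
    "term_frequency": 0.3,
    "term_location": 0.2,
}

def combined_score(signals, weights=WEIGHTS):
    """Weighted sum of per-page signals, each expected in the range 0..1.

    Signals missing from `signals` are treated as 0.
    """
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)

# A hypothetical page: well linked, moderate term frequency, poor placement.
page = {"link_popularity": 1.0, "term_frequency": 0.5, "term_location": 0.0}
print(combined_score(page))
```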

One big plus to the expanded Excite index will be that good pages should no longer suddenly disappear from the service for no apparent reason. This problem has plagued Excite over the past year. It would constantly drop pages out of its index to make room for new finds. As a result, webmasters with good representation in Excite might suddenly find all their pages gone. Similarly, this had an adverse impact on searchers, because pages that were satisfying their queries one week might no longer be present the next. With the new system, pages that are deemed popular or high quality in some way should be retained.

So when does all this happen? Excite says it has currently indexed about 113 million web pages, and that it will increase that volume at an average rate of over a million pages per day. It is also introducing a new system meant to revisit pages based on how often they change, in order to keep the entire index as fresh as possible.
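Those figures allow a quick back-of-the-envelope estimate of how long the expansion could take. The calculation below uses only the numbers quoted above and treats the daily rate as exactly one million, which is an assumption; since Excite says the rate will be over a million, the result is an upper bound.

```python
import math

current_pages = 113_000_000  # pages indexed now, per Excite
target_pages = 250_000_000   # planned index size
pages_per_day = 1_000_000    # assumed lower bound on the growth rate

def days_to_target(current, target, rate):
    """Days needed to grow from `current` to `target` at `rate` pages per day."""
    return math.ceil((target - current) / rate)

print(days_to_target(current_pages, target_pages, pages_per_day))  # 137
```

In other words, at the stated pace the index would reach 250 million pages in at most about four and a half months.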

In addition to crawling the web, Excite has also maintained a human-compiled directory of web sites. As at Yahoo, this is where sites have been reviewed by editors and organized into categories. A new deal struck in August means that this web directory will now be produced by LookSmart. In fact, LookSmart's information has already been integrated into Excite.

Just like at Yahoo, you can access the directory by selecting a main category from the Excite home page. You'll find them just under the search box. These links take you into one of Excite's "channels," which are filled with information beyond just web site listings.

On the left-hand side of each channel page, you'll see a box called "Directory" filled with topics related to that channel. For instance, in Excite's Lifestyle channel, the first topic in the directory box is "Beauty & Fashion." By selecting this topic, you'll then be shown a list of Beauty & Fashion web sites.

Only a few top sites will automatically be displayed for any topic. To see more, click on the "More Web Sites" link. You'll also see that as you drill down, even more topics will be revealed.

A faster way to get to relevant directory listings is just to do a search at Excite. If Excite finds any categories that match, it will display them in the search results under the heading of "Directory."

A couple of other Excite notes. A new Adult Content filter was introduced earlier this year. You'll find it on the advanced search page. It has to be enabled each time you do a search, unlike filtering options offered by AltaVista, Go and Lycos. A more permanent solution may appear later this year. Filtering is done by a combination of looking for the presence of certain words at the time a page is spidered and through the use of a site block list.
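The two-part filtering described above, a word check plus a site block list, can be sketched as follows. The word list, the blocked host and the is_filtered function are placeholders of my own; Excite has not published its lists or its exact matching rules.

```python
from urllib.parse import urlparse

# Placeholder lists for illustration only.
FLAGGED_WORDS = {"flaggedword1", "flaggedword2"}
BLOCKED_SITES = {"blocked.example"}

def is_filtered(url, page_text):
    """Return True if a page should be hidden when the filter is on."""
    host = urlparse(url).hostname or ""
    if host in BLOCKED_SITES:  # site block list check
        return True
    words = set(page_text.lower().split())
    return not FLAGGED_WORDS.isdisjoint(words)  # word check at spidering time

print(is_filtered("http://blocked.example/page", "harmless text"))
print(is_filtered("http://ok.example/", "contains flaggedword1 here"))
print(is_filtered("http://ok.example/", "harmless text"))
```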

Excite is also offering the ability to search by language. As with other services doing this, language determination is made by looking for the presence of certain words unique to a particular language. You'll find this option on the Advanced Search page.
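Language determination by marker words can be illustrated with a small sketch. The word lists here are tiny and hand-picked by me, and the guess_language function is hypothetical; a real service would use far larger lists and probably additional signals.

```python
# Small, hand-picked marker words per language (an assumption for illustration).
MARKER_WORDS = {
    "english": {"the", "and", "with"},
    "french": {"le", "les", "avec"},
    "german": {"der", "und", "nicht"},
}

def guess_language(text):
    """Return the language whose marker words appear most often, or None."""
    words = text.lower().split()
    counts = {
        lang: sum(word in markers for word in words)
        for lang, markers in MARKER_WORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None

print(guess_language("the cat and the dog"))
print(guess_language("der Hund und die Katze"))
print(guess_language("12345"))
```

A page with no marker words at all is left unclassified rather than guessed at.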

I also wanted to take a moment and briefly provide an update on Excite's two other search properties, WebCrawler and Magellan.

Magellan is now essentially a stripped-down version of Excite's directory listings and search index. Magellan's home page features the directory -- click on a topic, and you'll get web sites and only web sites -- no channel bells and whistles as you might get at Excite. Do a search, and your query goes against about two million pages from the Excite index, which are predominantly site home pages. Magellan also uses Excite's ranking algorithms, so for popular queries, you may get the same results as at Excite.

Magellan also used to feature the ability to view "green light" web sites; however, this kid-friendly feature is temporarily gone. A replacement should appear by end of the year, Excite says.

WebCrawler is similar to Magellan in being a lighter version of Excite. It also presents directory information, and web searching also goes against only two million pages from the entire Excite index. However, the service has much more personality than Magellan, plus it does have expanded channel content that Magellan lacks. Additionally, WebCrawler uses a much different ranking system than Excite, so expect to see differences if comparing the two.

In the future, both services may have their web search ability expanded to tap into between 3 and 5 million pages from the Excite index.


Excite Advanced Search

Click on the words "Advanced Search" on this page to get complete options, including the adult content filter.

Excite WebCrawler

Excite Magellan


Kids Search Engines

Listing of services offering kid-friendly searches