Excite Enlarging Index, Partnered With LookSmart

In August, Excite began the first phase of an ambitious plan to enlarge its search index to 250 million web pages and improve the relevancy of its search results. The search engine also debuted new LookSmart-powered directory listings.

Under its new indexing system, which has been in the works for the past year and a half, Excite plans to visit 500 million or more pages across the web on a regular basis. It will then retain only those pages that it determines are most popular, or which offer the best quality information, or which seem to satisfy the queries its users make.

This "visit many, keep some" approach is how Excite hopes to expand its index coverage without simultaneously overwhelming users with irrelevant or off-topic documents.
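To make the "visit many, keep some" idea concrete, here is a minimal sketch in Python. It is not Excite's system. The `Page` record, the single `quality` number and the `keep_best` helper are all assumptions of mine, standing in for whatever mix of popularity and quality signals Excite actually uses.

```python
import heapq
from typing import NamedTuple


class Page(NamedTuple):
    url: str
    quality: float  # hypothetical combined popularity/quality score


def keep_best(visited, capacity):
    """Return the `capacity` highest-scoring pages, best first."""
    return heapq.nlargest(capacity, visited, key=lambda page: page.quality)


# The crawler visits four pages, but the index only has room for two.
visited = [
    Page("a.example/", 0.9),
    Page("b.example/", 0.2),
    Page("c.example/", 0.6),
    Page("d.example/", 0.4),
]
index = keep_best(visited, capacity=2)
kept_urls = [page.url for page in index]
```

The point of the sketch is only that the crawl can be much larger than the index, as long as something decides which pages earn a place.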

"We don't think just adding more content will do the job for us," said Kris Carpenter, Excite's Director of Search Products. "We view that as our number one challenge, understanding what's out there and producing that top quality content in the first two pages of results."

Excite is using a number of "off-the-page" criteria to determine both which pages to retain in its index and how to rank those pages in response to queries. By off-the-page, I mean factors that are not tied to what's on the page itself.

For instance, search engines have traditionally ranked pages by criteria such as where and how often search terms appear in them. Since these factors are "on-the-page," webmasters can change their pages to try to increase their rankings.

In contrast, off-the-page criteria are those not directly in a webmaster's control. A good example is link popularity. It is very difficult for a webmaster to outwit a good system that uses link popularity as a ranking criterion. That's because such a system leverages information from across the web, which a single webmaster cannot control.
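A rough sketch shows why link popularity is hard to game. The function below is my own illustration, not any search engine's algorithm. It counts how many distinct outside hosts link to a site, so a site's links to itself count for nothing and a hundred links from one host count once. The `link_popularity` name and the example URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def link_popularity(links):
    """Map each target host to the number of distinct other hosts linking to it.

    `links` is an iterable of (source_url, target_url) pairs.
    """
    inbound = defaultdict(set)
    for source, target in links:
        source_host = urlsplit(source).hostname
        target_host = urlsplit(target).hostname
        # Ignore malformed URLs and links a site makes to itself.
        if source_host and target_host and source_host != target_host:
            inbound[target_host].add(source_host)
    return {host: len(sources) for host, sources in inbound.items()}


links = [
    ("http://a.example/1", "http://target.example/"),
    ("http://a.example/2", "http://target.example/page"),  # same host again
    ("http://b.example/", "http://target.example/"),
    ("http://target.example/self", "http://target.example/"),  # self link
    ("http://target.example/", "http://a.example/"),
]
popularity = link_popularity(links)
```

Here `target.example` ends up with a count of 2, from `a.example` and `b.example`, no matter how many pages it publishes about itself.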

Excite has long made use of link popularity, and that criterion is now being given heavier weight in its new system. Some have also noticed that Excite has been measuring clickthrough from its results. Carpenter said that Excite has experimented with using this data to influence rankings, but that it is not currently being used as part of its relevancy system.

Excite is also using another set of off-the-page information that I can't disclose publicly. I can say that it is unique among the major search engines in using this type of information, and that it would seemingly offer yet another way of getting the best information to the top of search results lists. Of course, the proof will be if relevancy actually does improve in the long term.

Each of these off-the-page criteria is weighted differently, and term frequency and location still come into play. In general, the mixture should work to reward sites that have good content or that at least somehow distinguish themselves online.
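As an illustration of how such a mixture might behave, here is a small weighted-sum sketch. Excite has not published its weights or its formula, so the signal names, the weights in `WEIGHTS` and the 0-to-1 signal values are all assumptions made for the example.

```python
# Assumed weights: off-the-page link popularity counts for more than
# either on-the-page signal. These numbers are invented.
WEIGHTS = {
    "term_frequency": 0.2,
    "term_location": 0.2,
    "link_popularity": 0.6,
}


def score(signals, weights=WEIGHTS):
    """Weighted sum of a page's signals, each assumed to lie in [0, 1]."""
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)


def rank(pages):
    """Return page names ordered from highest to lowest score."""
    return sorted(pages, key=lambda name: score(pages[name]), reverse=True)


pages = {
    # Perfect on-the-page signals, but almost nobody links to it.
    "stuffed": {"term_frequency": 1.0, "term_location": 1.0, "link_popularity": 0.1},
    # Ordinary on-the-page signals, but widely linked.
    "reputable": {"term_frequency": 0.5, "term_location": 0.5, "link_popularity": 0.9},
}
ordering = rank(pages)
```

With these made-up numbers the widely linked page scores about 0.74 against 0.46 for the keyword-heavy one, which is the kind of outcome the new mixture is meant to produce.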

This has been the overall trend with all the major search engines, and smart webmasters should be doing everything they can to build up the "reputation" of their sites in order to tap into this trend. Reputation? Yes -- just like people, sites can have reputations. Here are some key ways you can build up yours, in terms of what search engines want:

+ Loving Links: Search engines are making more use of link popularity, so getting people to link to your site is important. However, it's not just a numbers game. You want quality links from sites that are contextually related to you. In other words, getting links from 100 different sites may not be as important as getting links from 10 sites that are similar to you in content. So, get out there and find non-competitive sites that are related to you. Link to them, and ask them to link back.

+ Content Is King: People visit and link to sites that offer unique and substantial information. So, start developing more content if your site is lacking it. Build up FAQ pages and articles about topics related to the search terms that you want to be found for. This is especially important for those who have been devoting most of their energy to "doorway" pages. These are pages designed to rank well for particular search terms, but which typically offer no real content to visitors. Yes, they can be effective, and certainly don't abandon anything that works for you. But these pages do little to build your site's reputation, so depending on them too much leaves you unprepared to do well in the future.

+ Get Your Own Domain: Search engines are far more likely to favor your site if you have your own domain, rather than if you reside within free web space such as that offered by GeoCities or Tripod. I know these types of places host many quality web sites. However, if you are concerned about search engines, you should move to your own space.

One big plus to the expanded Excite index will be that good pages should no longer suddenly disappear from the service for no apparent reason. This problem has plagued Excite over the past year. It would constantly drop pages out of its index to make room for new finds. As a result, webmasters with good representation in Excite might suddenly find all their pages gone. Similarly, this had an adverse impact on searchers, because pages that were satisfying their queries one week might no longer be present the next.

With the new system, pages that are deemed popular or high quality in some way should be retained. Excite is also planning to upgrade its submission system to help ensure that new pages or those that its crawler may have missed will also have a presence in the index.

"We want to give every site a shot at being in there long enough to demonstrate that they should stay," said Carpenter.

In particular, pages submitted via the Excite Add URL form that are not already in the database and that are not identified as spam will be far more likely to appear in the index than is currently the case. These pages would then remain within the index for a period of time, the length of which is still being determined. After this period, they might be dropped unless Excite's new crawling and ranking system has somehow tagged them as important.

Excite is also introducing new spam detection systems that are especially aimed at removing duplicate content. This has become a real problem for the service. Over the past year, about the only page a site owner could expect to get listed and keep listed was the home page of a web site. Thus, many have set up multiple web sites, in hopes of increasing their representation at Excite. These "mirror" or "satellite" sites often have only one or two pages that in turn link back to the "real" web site.

As a result, it is not uncommon to do a search for a popular topic and find multiple sites listed that seem independent but which in reality link back to the same place. Excite says it intends to crack down on this practice, as well as the intentional creation of duplicate or near-duplicate pages. So far, I haven't seen a real impact, but the rollout is still continuing.
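Excite hasn't said how it spots duplicate or near-duplicate pages. One common, generic technique is to compare overlapping word sequences ("shingles") between two pages. The sketch below uses that technique purely as an illustration. The function names, the shingle size of three words and the 0.5 threshold are my assumptions.

```python
def shingles(text, size=3):
    """Return the set of overlapping `size`-word sequences in `text`."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of shingles two pages have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate(text_a, text_b, threshold=0.5):
    """Treat two pages as near-duplicates if enough shingles overlap."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold


page_a = "cheap flights to london book your cheap flights today"
page_b = "cheap flights to london book your cheap flights now"
page_c = "a history of the london underground and its stations"

similarity_ab = jaccard(shingles(page_a), shingles(page_b))  # 6 of 8 shared
```

Pages `page_a` and `page_b` differ by one word and share 6 of their 8 distinct shingles, so they would be flagged. `page_c` shares none with `page_a` and would not.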

So when does all this happen? Excite says it currently has about 113 million web pages indexed, and that it will increase that number, on average, by more than a million pages per day. It is also introducing a new system meant to revisit pages based on how often they change, in order to keep the entire index as fresh as possible.
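A quick back-of-the-envelope calculation gives a feel for the timetable. It assumes the growth is net (pages added minus pages dropped) and uses exactly one million pages per day, which Excite describes as a floor, so the result is an upper bound rather than a forecast.

```python
import math

CURRENT_PAGES = 113_000_000  # Excite's stated current index size
TARGET_PAGES = 250_000_000   # Excite's stated goal
PAGES_PER_DAY = 1_000_000    # assumed net growth; Excite says "over" this


def days_to_target(current, target, per_day):
    """Whole days needed to grow from `current` to `target`."""
    return math.ceil((target - current) / per_day)


days = days_to_target(CURRENT_PAGES, TARGET_PAGES, PAGES_PER_DAY)  # 137
```

That works out to 137 days, or roughly four and a half months, at the minimum stated rate.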

As for the Add URL system improvements, expect these to come around mid-September, though I suspect it may take longer than this.

In addition to crawling the web, Excite has also maintained a human-compiled directory of web sites. As at Yahoo, this is where sites have been reviewed by editors and organized into categories. A new deal struck in August means that this web directory will now be produced by LookSmart. In fact, LookSmart's information has already been integrated into Excite.

Just like at Yahoo, you can access the directory by selecting a main category from the Excite home page. You'll find them just under the search box. These links take you into one of Excite's "channels," which are filled with information beyond just web site listings.

On the left-hand side of each channel page, you'll see a box called "Directory" filled with topics related to that channel. For instance, in Excite's Lifestyle channel, the first topic in the directory box is "Beauty & Fashion." By selecting this topic, you'll then be shown a list of Beauty & Fashion web sites.

Only a few top sites will automatically be displayed for any topic. To see more, click on the "More Web Sites" link. You'll also see that as you drill down, even more topics will be revealed.

A faster way to get to relevant directory listings is just to do a search at Excite. If Excite finds any categories that match, it will display them in the search results under the heading of "Directory."

Many webmasters have been frustrated in the past by the inability to submit to the Excite directory. With the transition to LookSmart, those worries are lessened. Now if you submit to LookSmart and get accepted, you'll be included in the Excite directory -- along with the AltaVista directory and the new version of MSN Search.

Unfortunately, LookSmart's submission system can be rather sluggish. On the plus side, you can submit to multiple categories, as long as you're relevant for them. Also, plans to have an expanded "self-publication" index that I've reported on in the past have been dropped, LookSmart says.

A couple of other Excite notes. A new Adult Content filter was introduced earlier this year. You'll find it on the advanced search page. It has to be enabled each time you do a search, unlike filtering options offered by AltaVista, Go and Lycos. A more permanent solution may appear later this year. Filtering is done by a combination of looking for the presence of certain words at the time a page is spidered and through the use of a site block list.
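The two-part filtering described above, a word check when the page is spidered plus a site block list, can be sketched roughly as follows. The word list, the blocked host and the function names are placeholders of my own, not Excite's actual data or code.

```python
from urllib.parse import urlsplit

FLAGGED_WORDS = {"flaggedword"}      # placeholder word list
BLOCKED_SITES = {"blocked.example"}  # placeholder site block list


def flag_at_crawl(page_text, flagged_words=FLAGGED_WORDS):
    """Run once when the page is spidered; the result is stored with the page."""
    return any(word in flagged_words for word in page_text.lower().split())


def filter_results(results, blocked_sites=BLOCKED_SITES):
    """Drop results flagged at crawl time or hosted on a blocked site.

    `results` is a list of (url, flagged_at_crawl) pairs.
    """
    return [
        url
        for url, flagged in results
        if not flagged and urlsplit(url).hostname not in blocked_sites
    ]


results = [
    ("http://ok.example/", flag_at_crawl("recipes and gardening tips")),
    ("http://other.example/", flag_at_crawl("some flaggedword here")),
    ("http://blocked.example/page", flag_at_crawl("innocent looking text")),
]
visible = filter_results(results)
```

Only `http://ok.example/` survives: the second page is caught by the word check and the third by the block list, even though its text looks innocent.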

Excite is also offering the ability to search by language. As with other services doing this, language determination is made by looking for the presence of certain words unique to a particular language. You'll find this option on the Advanced Search page.
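Here is a toy version of that kind of language detection. The marker word lists are tiny samples I picked for the example and the `guess_language` function is hypothetical. A real service would use far larger lists and would handle punctuation, which this sketch does not.

```python
# Small sample lists of common words that do not appear in the other lists.
MARKERS = {
    "english": {"the", "and", "with"},
    "french": {"le", "les", "avec", "est"},
    "german": {"der", "und", "nicht", "ist"},
}


def guess_language(text, markers=MARKERS):
    """Return the language whose marker words appear most, or None."""
    words = set(text.lower().split())
    counts = {language: len(words & vocab) for language, vocab in markers.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


german_guess = guess_language("der Hund ist nicht hier")
french_guess = guess_language("le chat est avec les enfants")
unknown_guess = guess_language("12345")
```

The first sentence matches three German markers and the second matches four French ones. Text with no marker words at all gets no language assigned.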

I also wanted to take a moment and briefly provide an update on Excite's two other search properties, WebCrawler and Magellan.

Magellan is now essentially a stripped-down version of Excite's directory listings and search index. Magellan's home page features the directory -- click on a topic, and you'll get web sites and only web sites -- no channel bells and whistles as you might get at Excite. Do a search, and your query goes against about two million pages from the Excite index, which are predominantly site home pages. Magellan also uses Excite's ranking algorithms, so for popular queries, you may get the same results as at Excite.

Magellan also used to feature the ability to view "green light" web sites; however, this kid-friendly feature is temporarily gone. A replacement should appear by the end of the year, Excite says.

WebCrawler is similar to Magellan in being a lighter version of Excite. It also presents directory information, and web searching also goes against only two million pages from the entire Excite index. However, the service has much more personality than Magellan, plus it does have expanded channel content that Magellan lacks. Additionally, WebCrawler uses a much different ranking system than Excite, so expect to see differences if comparing the two.

In the future, both services may have their web search ability expanded to tap into about 3 and 5 million pages from the Excite index. And webmasters -- if you have submitted your web pages to Excite, there is absolutely no reason to also submit them to Magellan and WebCrawler. They use Excite's spiders and index.


Excite Advanced Search

Click on the words "Advanced Search" on this page to get complete options, including the adult content filter.

Excite WebCrawler

Excite Magellan


How LookSmart Works

Covers tips on submitting to LookSmart.

Kids Search Engines

Listing of services offering kid-friendly searches