
How AllTheWeb (FAST) Works


Recent Articles

The articles below appeared in the Search Engine Update newsletter and have important information not yet added to this page. Please review them to find out about any new developments. Further below, you will also find a list of other articles about this search engine that may be of interest.

  • This page is current. See below for past articles.

Overview

AllTheWeb is both a search site and the name of the crawler-based search engine that provides results to the AllTheWeb site and partner sites. This article explains how to show up at the AllTheWeb site and especially focuses on doing so through the AllTheWeb crawler. It also explains how other sites make use of AllTheWeb's results.


AllTheWeb Site & Paid Listings

Relatively few people come to the AllTheWeb site because, until early 2003, the site was meant to showcase the search technology of a Norwegian company called FAST. FAST operated AllTheWeb primarily to show potential partners how it might provide them with web search results. AllTheWeb was also sometimes called FAST Search, which is why you may find AllTheWeb called FAST in Search Engine Watch or elsewhere on the web.

Overture purchased AllTheWeb in April 2003, and the site continues to be primarily a showcase. Consequently, you shouldn't expect to see much traffic from the AllTheWeb site itself. Instead, it's AllTheWeb's partners, covered in the Partners section below, that might send you noticeable traffic.

Perform a search from the AllTheWeb home page, and the results that come back are a blend of paid listings and web results.

Paid listings come from Overture. Generally, these will be the first three listings on the page. They can be identified by the (sponsored) tag that will appear next to each title, like this:

Find Cars at carmax.com (sponsored)
Actual low, no-haggle prices, photos and
descriptions of over 15,000 vehicles.
http://www.carmax.com/

If you'd like to appear as a paid listing for a particular search term, you'll need to have an account with Overture and ensure that you're one of the top three bidders for that term. To learn more about this, see the How Overture Works page.
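The blend described above can be pictured with a minimal sketch. It assumes, hypothetically, that the sponsored slots simply go to the three highest bids on the exact search term and that web results follow them. The `Bid` type, the bid amounts and the function name are illustrative only, not Overture's actual auction logic.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    advertiser: str
    term: str
    amount: float  # hypothetical bid per click


def blend_results(query, bids, web_results, sponsored_slots=3):
    """Return a results page: top bidders tagged (sponsored), then web results."""
    # Keep only bids on this exact term, highest bid first.
    matching = sorted(
        (b for b in bids if b.term == query),
        key=lambda b: b.amount,
        reverse=True,
    )
    page = [f"{b.advertiser} (sponsored)" for b in matching[:sponsored_slots]]
    page.extend(web_results)
    return page


bids = [
    Bid("carmax.com", "used cars", 0.90),
    Bid("example-a.com", "used cars", 1.10),
    Bid("example-b.com", "used cars", 0.40),
    Bid("example-c.com", "used cars", 0.75),
]
page = blend_results("used cars", bids, ["web result 1", "web result 2"])
```

Here example-b.com, the fourth-highest bidder, gets no sponsored slot, which is why being among the top three bidders matters.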

Web results come from AllTheWeb's crawling of the web. The next sections focus on getting included and performing well in these results. Please note that you'll occasionally see the word "historically" used. This is because AllTheWeb is currently in a transition period, as it is being merged with AltaVista. More about this merger is covered in the AllTheWeb's Future section, below.

Getting Listed In Web Results: Link Crawling

The best way for pages from a web site to be listed with AllTheWeb is for the crawler to find the pages naturally, as it moves around the web following links. If your brand new site has even one link pointing at it, AllTheWeb's crawler may eventually find it and index pages within it. The more links pointing at you, the more likely the crawler is to come across your pages and add them for free.

Think of it like asking people for recommendations. If you ask 10 different people for a good doctor and several of them suggest the same person, you are more likely to contact that doctor for help. In the same way, having many links pointing at your pages (or a few links from important web sites) makes it more likely the AllTheWeb crawler will visit you.

To gain links, you should offer good content. If you do, people are more likely to link to it. Similarly, you should build links to your pages from good quality sites related to you in content (see the Link Analysis & Link Building page for more tips on this).

The Submitting & Encouraging Crawlers page also has tips on architectural changes you can make to your site to improve the odds that your pages will be picked up naturally by the AllTheWeb crawler, as well as by other crawler-based search engines.

AllTheWeb's crawler will index dynamic pages, to some degree. Historically, if it reads a static page that links to a dynamic page (defined as one with &, %, =, $ or ? symbols in its URL), the crawler will index that dynamic page but not follow any links from it.

AllTheWeb does this to keep it from following dynamic loops or following into spider traps. In the future, it may go further and follow links even if they are on dynamic web pages. Dynamic pages can also be submitted through the paid inclusion program described below, for guaranteed indexing. Also see the Dynamic Pages & Search Engines article for more about issues between crawlers and dynamic pages.
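The historical rule above can be sketched as a small breadth-first crawl: a URL counts as dynamic if it contains any of the listed symbols, and a dynamic page that is reached gets indexed but its outbound links are not followed. The in-memory `links` graph and the function names are hypothetical stand-ins for a real fetcher, not AllTheWeb's code.

```python
from collections import deque

DYNAMIC_CHARS = set("&%=$?")


def is_dynamic(url):
    """Historically, URLs containing any of these symbols were treated as dynamic."""
    return any(ch in DYNAMIC_CHARS for ch in url)


def crawl(seeds, links):
    """Breadth-first crawl over a link graph {url: [outbound urls]}.

    Every reached page is indexed, but links are only followed from static
    pages, so the crawler cannot wander through chains of dynamic URLs.
    """
    indexed = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        indexed.append(url)
        if is_dynamic(url):
            continue  # index the dynamic page, but do not follow its links
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return indexed


links = {
    "http://example.com/": ["http://example.com/catalog?id=1", "http://example.com/about.html"],
    "http://example.com/catalog?id=1": ["http://example.com/catalog?id=2"],
}
pages_indexed = crawl(["http://example.com/"], links)
```

In this example, catalog?id=1 is indexed because a static page links to it, while catalog?id=2 is never reached because it is only linked from a dynamic page.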

Getting Listed In Web Results: Paid Inclusion

If your page is not added naturally by link crawling, the next best option for being listed is to use AllTheWeb's paid inclusion program. The downside to the program is that you must pay for it and that it does NOT guarantee that your pages will rank well for particular terms. The upside is that you know absolutely that your pages will be included in the index and thus may appear in response to searches.

AllTheWeb does not sell paid inclusion directly but rather through resellers. One of these is PositionTech, which we'll use as an example. It sells what's called FAST PartnerSite (the company uses AllTheWeb's old name of FAST, but FAST PartnerSite is nonetheless a program that inserts you into AllTheWeb).

The FAST PartnerSite program is a basic submission program designed for those with smaller web sites. It uses a self-serve model, charging a flat fee per page. In other words, you pick the pages that you want indexed, then use a form to submit them along with your credit card details, paying a set amount per page. Once the pages are accepted, they'll appear in AllTheWeb within two to three days and then be revisited every two days after that, for up to a year.

AllTheWeb also has a bulk submit or "trusted feed" program, for those with large web sites and who wish to list 1,000 or more pages. Instead of a flat fee per page, as with the FAST PartnerSite program, the bulk submit program uses cost per click (CPC) pricing. This means that you only pay for pages that actually generate clicks to your web site. Bulk submission may be a more economical system for those with thousands of pages that they wish to include with AllTheWeb.
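To see why bulk submission can be cheaper for large sites, here is a back-of-the-envelope comparison. The fee, the cost per click and the click counts are invented for illustration; check the Crawler Submission Chart and the resellers for real rates. Only the shape of the comparison, a flat fee per page versus cost per click, comes from the text above.

```python
def flat_fee_cost(pages, fee_per_page):
    """FAST PartnerSite-style pricing: a fixed fee for every submitted page."""
    return pages * fee_per_page


def cpc_cost(clicks_per_page, cost_per_click):
    """Trusted-feed-style pricing: pay only for clicks the pages actually receive."""
    return sum(clicks_per_page) * cost_per_click


# Hypothetical site: 2,000 pages, most of which are rarely clicked.
pages = 2000
clicks = [30] * 100 + [1] * 1900  # 100 popular pages, 1,900 long-tail pages

flat = flat_fee_cost(pages, fee_per_page=10.00)  # hypothetical fee
cpc = cpc_cost(clicks, cost_per_click=0.25)      # hypothetical CPC
```

With these made-up numbers, the flat fee comes to 20,000 while cost per click comes to 1,225. The comparison can flip if every page draws heavy traffic, which is one more reason to shop around.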

Both basic and bulk submission are sold by three different AllTheWeb resellers. You'll need to shop around for the best combination of price and services (such as clickthrough reporting). Basic submission pricing can be found on the Crawler Submission Chart, which provides a rough range of what you might expect. The resellers are:

In addition to the resellers above, there are about 20 different resellers who handle only bulk submission. AllTheWeb does not currently have a comprehensive list of these that can be posted. If one becomes available, it will be added to this page.

One final note about AllTheWeb paid inclusion. As explained further below in the AllTheWeb's Future section, AllTheWeb and AltaVista are being merged together. This means any paid inclusion URLs you submit to either search engine will eventually wind up in the other one. For this reason, it might be wise to only purchase the less expensive inclusion with AllTheWeb, as this should effectively get you into AltaVista in the next few months or sooner.

Getting Listed In Web Results: Free Add URL

AllTheWeb maintains an Add URL page that allows you to directly submit key pages to its crawler for free. You need only submit your home page and one or two inside pages, just in case there is a problem reaching your home page. AllTheWeb will crawl links from these pages to find other pages in your web site.

Having said this, there's no guarantee that AllTheWeb will add any of your pages when you use this service. Nor will submitting every page of your site make a difference. AllTheWeb uses Add URL submissions only as a backup to its link crawling.

Assuming AllTheWeb does accept your submissions, you can expect to see new pages appear within two to four weeks.

Content Indexed & Refresh

Historically, AllTheWeb has indexed URL and body text. Meta keywords and description information have not been indexed, though meta description tags are sometimes used for display purposes. The Search Engine Display Chart explains this more.

AllTheWeb is supposed to refresh its entire index within 14 days, though it hasn't always kept to this schedule. Some pages may be refreshed more often if they are deemed important, either because they change often, because link analysis shows them to be popular, or because they often appear in AllTheWeb's top search results. Paid inclusion URLs are refreshed according to the paid inclusion schedule.
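One way to picture this policy: every page gets the 14-day default, pages that look important get a shorter interval, and paid inclusion URLs follow their own schedule (every two days for FAST PartnerSite). The signal names and the 7-day "important" interval below are hypothetical; AllTheWeb has not said how it weighs these factors.

```python
from dataclasses import dataclass

DEFAULT_REFRESH_DAYS = 14   # stated target for the whole index
PAID_REFRESH_DAYS = 2       # FAST PartnerSite revisit interval
IMPORTANT_REFRESH_DAYS = 7  # hypothetical value, for illustration only


@dataclass
class Page:
    url: str
    changes_often: bool = False
    link_popular: bool = False
    frequent_top_result: bool = False
    paid_inclusion: bool = False


def refresh_interval_days(page):
    """Return how many days may pass before the page is re-crawled."""
    if page.paid_inclusion:
        return PAID_REFRESH_DAYS
    if page.changes_often or page.link_popular or page.frequent_top_result:
        return IMPORTANT_REFRESH_DAYS
    return DEFAULT_REFRESH_DAYS
```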

In addition to HTML and text documents, AllTheWeb will also collect other textual documents for display in its search results, such as Word, Excel, PowerPoint and Adobe Acrobat/PDF files.

AllTheWeb will also index Flash files, to some degree. It will follow links in Flash files to other pages, and it will also index any relevant text that can be extracted from Flash. However, since most Flash files tend not to have much text, don't expect them to do well against regular HTML content. The articles below have more about how AllTheWeb deals with Flash:

Also see the Search Engine Optimization Articles page which contains additional articles about how search engines deal with Flash, along with other topics.

AllTheWeb has a useful URL Investigator tool that can show you all the pages it has indexed from your site, as well as links it knows to be pointing at a particular page.

AllTheWeb also makes additional information about how it crawls and refreshes its database available on its Webmasters' FAQ and Crawler FAQ pages.

Ranking Well

Like other crawler-based search engines, AllTheWeb uses on-the-page factors, such as the location and frequency of search terms on a web page, to help determine how well that page should rank. Thus, following the tips on the Search Engine Placement Tips page may help improve your ranking.

AllTheWeb also makes use of link analysis as part of its relevancy ranking system. It analyzes links from across the web to determine both the importance of a page and the terms it might be relevant for. The Link Analysis & Link Building page explains this concept in more depth, plus it has tips on gaining important links to build your reputation in link analysis systems.
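AllTheWeb has not published its link analysis formula. As a stand-in, the sketch below uses a simplified PageRank-style iteration, a well-known form of link analysis, to show how links from across the web can make one page more "important" than another. The example graph, damping factor and iteration count are assumptions.

```python
def link_scores(graph, damping=0.85, iterations=50):
    """Simplified PageRank over {page: [pages it links to]}.

    A page's score grows with the number and scores of the pages linking to it.
    Pages with no outbound links spread their score evenly over all pages.
    """
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = graph.get(page, [])
            if targets:
                share = damping * scores[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in pages:
                    new[t] += damping * scores[page] / n
        scores = new
    return scores


web = {
    "a.com": ["hub.com"],
    "b.com": ["hub.com"],
    "c.com": ["hub.com", "a.com"],
    "hub.com": ["a.com"],
}
scores = link_scores(web)
```

Here hub.com, which three sites link to, ends up with the highest score, while b.com and c.com, which nobody links to, end up lowest.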

The "level" of where your pages are within a web site can have an impact. AllTheWeb has said that, historically, pages in "upper" levels are more likely to rank well. For example, here are pages at various levels in the Search Engine Watch web site:

http://searchenginewatch.com/news.html
http://searchenginewatch.com/sereport/bydate.html
http://searchenginewatch.com/sereport/03/03-fast.html

The first page is at the top level of the web site, as the page name "news.html" comes after the first slash after the domain name "searchenginewatch.com." The second page is on the second level, and the third page is on the third level.
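The level described above is simply the number of path segments after the domain name. This small sketch uses Python's standard `urllib.parse` to reproduce the three examples. Treating a URL that ends in a slash as an implied index page one level down is an assumption.

```python
from urllib.parse import urlparse


def page_level(url):
    """Count path segments after the domain: /news.html is level 1."""
    path = urlparse(url).path or "/"
    segments = [s for s in path.split("/") if s]
    if path.endswith("/"):
        # Assumption: a URL ending in "/" serves an implied index page one level down.
        return len(segments) + 1
    return len(segments)
```

For example, `page_level("http://searchenginewatch.com/sereport/03/03-fast.html")` counts the segments "sereport", "03" and "03-fast.html", giving level 3.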

Because the third page is "buried" within the site, it might not rank as well, although other factors may outweigh this. Even so, it is best to build a "shallow" site rather than a "deep" one, if you can. The Submitting & Encouraging Crawlers page explains this concept more.

By default, AllTheWeb uses what it calls "query rewriting." This means that for some searches, it may automatically turn them into phrases even if this hadn't been done by the searcher originally. It's helpful to be aware of when query rewriting happens, so that you understand exactly what you ranked well for.

It's easy to see when this happens. AllTheWeb will tell you right near the top of the page, "Your query was rewritten into..." followed by what it turned the query into. For instance, a search for:

new york hotels

is turned into

"new york" hotels

with AllTheWeb's query rewriting.
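A minimal sketch of this kind of rewriting, assuming a hypothetical list of known multi-word phrases; AllTheWeb's actual phrase detection is not documented here. In this sketch, a query that already contains quotes is left alone, and the function also reports whether a rewrite happened, mirroring the "Your query was rewritten into..." notice.

```python
KNOWN_PHRASES = [("new", "york"), ("los", "angeles")]  # hypothetical phrase list


def rewrite_query(query):
    """Wrap known multi-word phrases in quotes; return (new_query, was_rewritten)."""
    if '"' in query:
        return query, False  # the searcher already chose phrases
    words = query.split()
    out = []
    i = 0
    while i < len(words):
        for phrase in KNOWN_PHRASES:
            n = len(phrase)
            if tuple(w.lower() for w in words[i:i + n]) == phrase:
                out.append('"' + " ".join(words[i:i + n]) + '"')
                i += n
                break
        else:  # no phrase starts at this word
            out.append(words[i])
            i += 1
    rewritten = " ".join(out)
    return rewritten, rewritten != query
```

With this phrase list, `rewrite_query("new york hotels")` returns `('"new york" hotels', True)`.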

Spam Detection

AllTheWeb tries to eliminate spam in various ways. Most important is watching for unusual linkages, sites that appear to be linking together for purposes of making themselves more popular.
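As a crude illustration of "unusual linkages", the sketch below flags groups of sites in which every site links to every other one. Looking for fully interlinked trios is a stand-in heuristic chosen for clarity, and the site-level link graph is hypothetical; AllTheWeb has not said how it actually spots such schemes.

```python
from itertools import combinations


def interlinked_groups(site_links, size=3):
    """Return groups of `size` sites in which every site links to every other one.

    site_links maps each site to the set of sites it links to.
    """
    sites = sorted(site_links)
    groups = []
    for group in combinations(sites, size):
        if all(b in site_links[a] for a in group for b in group if a != b):
            groups.append(group)
    return groups


links = {
    "a.com": {"b.com", "c.com"},
    "b.com": {"a.com", "c.com"},
    "c.com": {"a.com", "b.com"},
    "news.com": {"a.com"},
}
suspicious = interlinked_groups(links)
```

A single one-way link, such as news.com pointing at a.com, is not flagged; only the closed circle of a.com, b.com and c.com is.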

Historically, AllTheWeb has also examined the frequency of terms on pages and removed those that seem excessively abnormal, when compared to the normal distribution of a word in a particular language.

That can sound scary -- what if you accidentally create an abnormal page? This isn't likely to happen, AllTheWeb has said. It is checking to see if terms appear excessively in different locations of a document and in high frequency, which can be indicative of those creating "doorway" style pages that they hope will be highly targeted toward a particular term.

In other words, let's say that you wanted to be found for "movies," so you place that word in your title tag ten times, within an H1 header, within link text, within ALT text and use it repeatedly throughout your body copy. That would seem abnormal when compared to the more common collection of documents in the same language about movies, where the word might appear once or twice in the title tag and within the body tag, but not in an extremely high proportion.
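The "movies" example can be turned into a rough check. This sketch counts how often a term appears in each page location and flags the page when the term exceeds a typical count in several locations at once. The per-location limits are invented for illustration; a real system would derive them from the term's normal distribution in documents of the same language, and AllTheWeb does not publish its thresholds.

```python
import re

# Hypothetical "typical" ceilings per location, for illustration only.
TYPICAL_MAX = {"title": 2, "h1": 1, "link_text": 2, "alt": 1, "body": 10}


def count_term(text, term):
    """Case-insensitive whole-word count of `term` in `text`."""
    return len(re.findall(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE))


def looks_stuffed(page, term, min_locations=3):
    """Flag a page where `term` exceeds the typical count in several locations."""
    over = [
        loc for loc, limit in TYPICAL_MAX.items()
        if count_term(page.get(loc, ""), term) > limit
    ]
    return len(over) >= min_locations, over


stuffed = {
    "title": "movies " * 10,
    "h1": "Movies movies movies",
    "link_text": "movies movies movies",
    "alt": "movies movies",
    "body": "movies " * 40,
}
normal = {
    "title": "Movies showing this week",
    "h1": "Now showing",
    "body": "Find times for movies at theaters near you. New movies open Friday.",
}
```

The stuffed page exceeds the limits in all five locations, while the normal page, which mentions "movies" once in the title and twice in the body, exceeds none.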

AllTheWeb will not release hard-and-fast rules about where a term may appear and how often is too often. Naturally, doing so would defeat the purpose of its spam analysis. The main advice to take away is not to overengineer your pages. Use the terms you want to be found for in your body copy and in your title tags, but don't go overboard.

AllTheWeb also watches for "gibberish" pages, those where the text may make no sense to a human reader, despite having a sentence structure intended to make it appear normal and relevant to crawlers.

You can find some further definitions of spam with AllTheWeb, plus an address to report it, on the AllTheWeb Content Guidelines page.

Be aware that while AllTheWeb lists some specific things as "spam techniques," such as hidden text or cloaking, it may still allow some pages in even if they use these techniques. That's because AllTheWeb is more concerned about the overall intent and user experience than the technique itself.

For example, a splash page that includes some hidden text that simply summarizes what the page is about, to make up for having no HTML text, might be allowed. Similarly, a page that cloaks but which shows content similar to what an end user sees might also be allowed. Having said this, anyone using any technique explicitly listed on the AllTheWeb Content Guidelines page should be wary, regardless of their intent.

Other Search Indexes

AllTheWeb also lets people search for more than just web pages and textual documents. Using tabs at the top of the AllTheWeb search box, users can also look for:

  • News
  • Pictures
  • Video
  • Audio
  • FTP files

AllTheWeb will also insert several news headlines into regular web search results, when it deems this appropriate.

AllTheWeb's crawler is also used to maintain separate indexes for each of these specialized search types. In other words, just as it may find your web pages automatically, so too might it find images, video files and other types of information you publish. If you don't want your images indexed, the article below provides some tips on avoiding this:

Image Search Faces Renewed Legal Challenge
The Search Engine Update, August 22, 2001

Partners

As mentioned earlier, relatively few people come to the AllTheWeb web site. Instead, people are more likely to encounter AllTheWeb's results -- including any listings you have -- at partners that use AllTheWeb. Here are some to be aware of:

  • Lycos: Terra Lycos is a major partner. It runs the Lycos web site that makes heavy use of AllTheWeb's crawler-based results. More about this use can be found on the How Lycos Works page.

  • HotBot: AllTheWeb's results are also presented to those who choose the "Lycos" setting at HotBot, which is owned by Terra Lycos. More about this use can be found on the How HotBot Works page.

  • InfoSpace: InfoSpace operates a variety of meta search engines, and AllTheWeb results are used by its Excite, MetaCrawler and WebCrawler services. They are not used by the popular Dogpile service, which InfoSpace also owns.

  • Lycos Europe: Lycos Europe is another important AllTheWeb partner. It uses AllTheWeb results for its various sites in Europe (though not for the HotBot Europe sites that Lycos Europe also owns).

  • Freeserve: Freeserve is a major UK ISP that makes use of AllTheWeb listings.

When you are listed in AllTheWeb's web page index, your pages are made available to all the various partners. This means that you do not need to submit to each partner. However, be aware that different partners may implement the use of AllTheWeb's results in various ways. This is why you may see ranking differences, if you compare results between partners or to AllTheWeb itself.

For example, there may be slight differences between the results at Lycos and AllTheWeb, both in ranking and in counts reported. Why might this happen? Lycos has said that at its site, it boosts pages written in English, since the portal is targeted toward English speakers in the US. In contrast, AllTheWeb has a more international mix of traffic, so pages in English aren't boosted there. In addition, in some instances, Lycos will insert some listings from LookSmart before AllTheWeb results.
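A minimal sketch of how one shared AllTheWeb result set can come out differently at a partner: the partner boosts pages in a chosen language and may place directory listings ahead of the crawler results, as Lycos has described. The `Result` type, the score scale, the boost factor and the placeholder directory listing are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    score: float   # relevance score from the shared index (hypothetical scale)
    language: str


def partner_results(results, boost_language=None, boost=1.5, directory_listings=()):
    """Re-rank shared results for one partner site.

    Pages in `boost_language` have their score multiplied by `boost`; any
    directory listings are shown first, ahead of the crawler results.
    """
    def adjusted(r):
        return r.score * boost if r.language == boost_language else r.score

    ranked = sorted(results, key=adjusted, reverse=True)
    return list(directory_listings) + [r.url for r in ranked]


shared = [
    Result("http://example.de/hotels", 0.9, "de"),
    Result("http://example.com/hotels", 0.7, "en"),
]
alltheweb_view = partner_results(shared)
lycos_style_view = partner_results(
    shared, boost_language="en", directory_listings=["LookSmart listing"]
)
```

Both views draw on the same index, yet the English page moves from second place to first in the partner view, which is the kind of ranking difference described above.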

AllTheWeb's Future

As mentioned, Overture is now merging AllTheWeb's technology with that of AltaVista, which Overture also bought in April 2003. This technology blending should be complete by the end of 2003, if not sooner. When it happens, there will be one single "Overture" crawler that will provide results to the AllTheWeb and AltaVista web sites, as well as to any Overture partners that want these listings.

To complicate matters, Overture is in the process of being purchased by Yahoo. Yahoo has its own crawler technology, that of Inktomi, which Yahoo bought in March 2003. Yahoo expects to complete its purchase of Overture by the end of 2003. When this happens, Yahoo will work to combine Inktomi's technology with Overture's. The end result will be a Yahoo crawler that will provide results to AllTheWeb, AltaVista, Yahoo and any of Yahoo's partners that want these listings.

Confused? Here's a simple illustration of what's going to happen.

    Mid 2003        Late 2003      Early 2004
|---------------|---------------|---------------|
| AltaVista ----|->             |               |
| AllTheWeb ----|-> Overture ---|->             |
| Inktomi ------|---------------|-> Yahoo       |
|---------------|---------------|---------------|

Why care about this? All this engineering work means that exactly how AllTheWeb lists web sites may fluctuate through the end of 2003. In the meantime, the best you can do is focus on how things have worked historically with AllTheWeb until things stabilize again.

Past Articles

