
More SpiderSpotting

I no longer track spider names closely and have never tracked IP addresses. This information is mostly in demand by those who wish to employ cloaking programs, and any good cloaking program will provide this data for you. Please see the Page Cloaking article for more resources on this topic. Also see the bottom of the SpiderSpotting page for a list of alternative resources for finding current spider names.

I've kept the information below for historical reasons. I will probably update key portions and move them to the individual pages in the How Search Engines Work section.

IP Addresses

I don't currently track IP addresses, which some people are interested in. You can get these by monitoring your logs, as explained on the page, in the main site. You simply need to disable DNS lookups on your server, so that it logs IP addresses rather than host names.

Another good source is the Search Engine Spider IP Addresses list maintained by Search Engine World.
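As a rough illustration of the log-monitoring approach, here is a minimal Python sketch that counts the client addresses behind a given agent name. It assumes your server writes the common "combined" log format, with DNS lookups disabled so the first field is an IP address and the agent name as the last quoted field. The function name and the sample line in the usage note are made up for illustration.

```python
import re
from collections import Counter

# Assumption: "combined" log format, where the client address is the first
# field and the agent name is the last quoted field on the line.
LOG_LINE = re.compile(r'^(?P<client>\S+) .*"(?P<agent>[^"]*)"\s*$')


def addresses_for_agent(log_lines, agent_substring):
    """Count requests per client address for agents containing the substring."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and agent_substring in match.group("agent"):
            counts[match.group("client")] += 1
    return counts
```

For example, passing the lines of your access log and the string "Scooter" would return a tally of the addresses that sent requests under any Scooter agent name.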

This page supplements and enhances the SpiderSpotting chart in the public area of Search Engine Watch.

AltaVista - Agent Names

Scooter/2.0 G.R.A.B. X2.0

AltaVista's normal spider, which visits web sites it knows about on a routine basis.

Scooter/1.0 [email protected]

Another name for AltaVista's normal spider, when present with the normal spider host names, below.

Scooter/1.0

AltaVista's instant spider, when present with the instant spider host names, below. It only visits if a page has been submitted to AltaVista via its Add URL form, and it will only visit that specific page. It will not trigger a corresponding robots.txt request.

Scooter/2.0 G.R.A.B. V1.0.3

When associated with the brillo or soap host names, this is part of AltaVista's page removal and dead link verification service.

Scooter/1.1 (custom)

When associated with the vscooter host name, this is a visit from AltaVista's image spider, which populates its Photo Finder service.

Mercator-1.0

This is a research spider that is not part of the normal AltaVista spidering system.

Others

The agent names of AltaVista Intranet V1.0 or AltaVista Intranet V2.0 may also appear, but these are private uses of AltaVista spidering technology. These are not part of the normal AltaVista spidering system.

AltaVista - Host Names

scooter3.av.pa-x.dec.com (example)
scooter.pa-x.dec.com
204.123.9.* (example)

Represents a visit from AltaVista's normal spider, as described above.

add-url.altavista.digital.com

Represents a visit from AltaVista's instant spider, as described above. This host began taking over for the old instant spider host name in Feb. 1998.

ww2.altavista.digital.com

Previous host name for the AltaVista instant spider. It is quite possible that variations such as ww3.altavista.digital.com were also used, though I never spotted any. To be safe, search your logs for ww*.altavista.digital.com.
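If your log tools do not support wildcards directly, a search like ww*.altavista.digital.com can be approximated with a few lines of Python. This is only a sketch: the function name is hypothetical, and it assumes you have already pulled the host names out of your logs into a list.

```python
from fnmatch import fnmatchcase


def hosts_matching(hosts, pattern):
    """Return the host names that match a shell-style wildcard pattern."""
    pattern = pattern.lower()
    # Host names are case-insensitive, so compare in lower case.
    return [host for host in hosts if fnmatchcase(host.lower(), pattern)]
```

Note that the * wildcard matches any run of characters, so this pattern would also pick up a name such as www.altavista.digital.com if one appeared in your logs.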

brillo.av.pa-x.dec.com
soap.av.pa-x.dec.com

When associated with the Scooter/2.0 G.R.A.B. V1.0.3 agent name, this is part of AltaVista's page removal and dead link verification service.

vscooter.av.pa-x.dec.com

When associated with the Scooter/1.1 (custom) agent name, this is a visit from AltaVista's image spider, which populates its Photo Finder service.

charlotte.av.pa-x.dec.com
av-dev3.av.pa-x.dec.com
eec6.pa-x.dec.com
crawler2.crl.research.digital.com

These are host names for research or experimental spiders that are not part of the normal AltaVista spidering system.

Others

You may occasionally see host names that appear to be part of the AltaVista search engine. However, only those above are associated with adding pages to the index.

Host names such as relay*.pa-x.dec.com or www-relay.das-x.dec.com are from humans at AltaVista.

The host name tarantula.av.pa-x.dec.com is from Web21's custom AltaVista crawler.
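Because the same Scooter agent name can mean different things depending on the host, it helps to check the two together. The sketch below shows one way to do that in Python, using the agent and host names listed above. The function names and labels are my own, and the pattern scooter*.av.pa-x.dec.com is an assumption generalized from the single example host.

```python
from fnmatch import fnmatchcase

# Host patterns taken from the lists above. "scooter*.av.pa-x.dec.com" is an
# assumed generalization of the example host scooter3.av.pa-x.dec.com.
NORMAL_HOSTS = ["scooter*.av.pa-x.dec.com", "scooter.pa-x.dec.com", "204.123.9.*"]
INSTANT_HOSTS = ["add-url.altavista.digital.com", "ww*.altavista.digital.com"]
CLEANUP_HOSTS = ["brillo.av.pa-x.dec.com", "soap.av.pa-x.dec.com"]
IMAGE_HOSTS = ["vscooter.av.pa-x.dec.com"]


def host_in(host, patterns):
    """True if the host matches any of the wildcard patterns."""
    host = host.lower()
    return any(fnmatchcase(host, pattern) for pattern in patterns)


def classify_altavista(agent, host):
    """Label a visit using the agent name and host name together."""
    if agent.startswith("Scooter/2.0 G.R.A.B. V1.0.3") and host_in(host, CLEANUP_HOSTS):
        return "page removal and dead link verification"
    if agent.startswith("Scooter/1.1 (custom)") and host_in(host, IMAGE_HOSTS):
        return "image spider"
    if agent.startswith("Scooter/1.0") and host_in(host, INSTANT_HOSTS):
        return "instant spider"
    normal_agents = ("Scooter/2.0 G.R.A.B. X2.0", "Scooter/1.0")
    if agent.startswith(normal_agents) and host_in(host, NORMAL_HOSTS):
        return "normal spider"
    if agent.startswith("Mercator-1.0"):
        return "research spider"
    return "unrecognized"
```

Anything that falls through to "unrecognized", such as the relay hosts used by humans at AltaVista, is best reviewed by hand rather than assumed to be a spider.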

Euroseek - Agent and Host Names

Arachnoidea ([email protected])

Represents a visit from the Euroseek spider.

infra.euroseek.net
ultra.euroseek.net

Represents a visit from the Euroseek spider. In early to mid-1997, the IP numbers 195.84.82.5 or 195.84.82.6 were sometimes reported. Additionally, both names have appeared. Thus, a search for *euroseek.net is recommended.

Excite - Agent Name

ArchitextSpider

Represents a visit from either Excite's "mega" spider, which crawls all web sites it knows of every three weeks, or Excite's "fresh" spider, which crawls select sites once a week. You can determine which by looking at the host name, described below.

Excite is also running some temporary test spiders using the same ArchitextSpider agent name. Only those using the host names described below are used to include pages in the index.

Excite - Host Names

crawl3.atex.com (example)

Represents a visit from Excite's "mega" spider. Different host names are used, but all take the form of crawl*.atex.com, such as crawl2.atex.com or crawl3.atex.com.

crimpshrine.atex.com

Represents a visit from Excite's "fresh" spider.

ride.excite.com (example)
viola.excite.com (example)
snare.excite.com (example)

These are host names for test spiders. They are not part of the normal spidering process.
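Since all of Excite's spiders share the ArchitextSpider agent name, the host name is what separates them. Here is a small Python sketch of that check, tallying visits by type. It assumes you have already extracted (agent, host) pairs from your logs; the function names and the "other" label for test spiders are mine.

```python
from collections import Counter
from fnmatch import fnmatchcase


def excite_spider_type(agent, host):
    """Return 'mega', 'fresh' or 'other'; None if the agent is not Excite's."""
    if "ArchitextSpider" not in agent:
        return None
    host = host.lower()
    if host == "crimpshrine.atex.com":
        return "fresh"
    if fnmatchcase(host, "crawl*.atex.com"):
        return "mega"
    # Test spiders and anything else using the agent name end up here.
    return "other"


def tally_excite_visits(records):
    """Count visits per spider type from an iterable of (agent, host) pairs."""
    counts = Counter()
    for agent, host in records:
        kind = excite_spider_type(agent, host)
        if kind is not None:
            counts[kind] += 1
    return counts
```

Only the "mega" and "fresh" counts reflect visits that can lead to pages being included in the index.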

Inktomi - Agent and Host Names

Slurp/2.0 ([email protected]; http://www.inktomi.com/slurp.html)

Represents a visit from Inktomi, which powers HotBot, MSN Search and other services. Before March 1997, the agent name was the same but had a comma after [email protected], rather than a semicolon.

j2001.inktomi.com (example)

Represents a visit from Inktomi. A variety of domains may appear, but all will have a letter/number preface, such as j2001.inktomi.com or j10.inktomi.com. Because of the various host names, it is usually best to search by agent name.
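To illustrate why the agent name is the easier thing to search on, compare the two checks in this Python sketch. The host pattern is my own guess at what "letter/number preface" means (one or more letters followed by one or more digits), so treat it as an assumption rather than a rule Inktomi has published.

```python
import re

# Assumed shape of the host names: letters, then digits, then .inktomi.com,
# as in j2001.inktomi.com or j10.inktomi.com.
INKTOMI_HOST = re.compile(r"^[a-z]+[0-9]+\.inktomi\.com$", re.IGNORECASE)


def is_slurp(agent):
    """Simple check on the agent name, which stays stable across hosts."""
    return agent.startswith("Slurp/")


def looks_like_inktomi_host(host):
    """Looser check on the host name, based on the assumed pattern above."""
    return INKTOMI_HOST.match(host) is not None
```

The agent check needs no knowledge of how the host names are formed, which is why it holds up better as new hosts appear.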

Infoseek - Agent Names

InfoSeek Sidewinder/0.9

Represents a visit from Infoseek's regular crawler.

Mozilla/3.01 (Win95; I)

When coupled with the host names below, represents a visit from Infoseek's instant spider. It only visits if a page has been submitted to Infoseek via its Add URL form, and it will only visit that specific page.

Ultraseek

This is the default agent name of Infoseek's search engine technology that it sells to other companies. It is not part of the normal Infoseek crawling system, even if it appears coupled with an Infoseek host name.

Infoseek - Host Names

galore-bbn.infoseek.com -or-
cca2625a.infoseek.com (examples)

Represents a visit from Infoseek. A variety of domains may appear, but usually all take the form of *-bbn.infoseek.com, such as galore-bbn.infoseek.com, or a letter/number combination, such as cca2625a.infoseek.com.

204.162.96.90 (example)

Infoseek sometimes only reports an IP number, rather than a host name. These are usually in the range of 204.162.98.* or 204.162.96.*.
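When only an IP number is logged, you can test it against those two ranges. The Python sketch below does this with the standard ipaddress module, treating 204.162.96.* and 204.162.98.* as /24 networks. The function name is hypothetical, and since the ranges are only where Infoseek usually appears, a miss does not rule out an Infoseek visit.

```python
import ipaddress

# The two ranges mentioned above, written as /24 networks.
INFOSEEK_NETWORKS = [
    ipaddress.ip_network("204.162.96.0/24"),
    ipaddress.ip_network("204.162.98.0/24"),
]


def in_reported_infoseek_range(address):
    """True if the logged value is an IP address inside either range."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # The log entry held a host name or something else, not an IP.
        return False
    return any(ip in network for network in INFOSEEK_NETWORKS)
```

Entries that are host names rather than IP numbers return False here and should be checked against the host name patterns instead.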

Lycos - Agent Names

Lycos_Spider_(T-Rex)

Represents a visit from the regular Lycos spider, if associated with the first set of host names below. Represents a visit from the instant spider if the second set of host names appears. Previously, the agent name was Lycos_Spider_(T-Rex)/3.0, but this was dropped in late April 1998.

Lycos - Host Names

lycosidae.lycos.com -or-
spider3.srv.pgh.lycos.com (example)

Represents a visit from the regular Lycos spider. A variety of names may appear for the second example, such as spider3.srv.pgh.lycos.com or ocelot.eng.pgh.lycos.com. Because of this, it's usually best to search by agent name.

sjc-fe4-1.sjc.lycos.com (example)

Represents a visit from the Lycos instant spider. However, a visit does not mean the web page is actually added to the index, as with AltaVista and Infoseek. The spider only visits if a page has been submitted to Lycos via its Add URL form, and it will only visit that specific page. Previous host names have included www.lycos.com and a2z.lycos.com.

Northern Light - Agent and Host Names

Gulliver/1.2

Represents a visit from the Northern Light spider. The Gulliver/1.1 agent name was last used in mid-June 1997.

taz.northernlight.com

Represents a visit from the Northern Light spider. Until October 1997, other names were occasionally used, including gulliver.northernlight.com, tornado.northernlight.com and the IP address of 208.219.77.9.

WebCrawler

The WebCrawler index is now built by the same spiders that create the separate Excite index.
