I no longer track spider names closely and never have tracked IP addresses. This information is mostly in demand by those who wish to employ cloaking programs, and any good cloaking program will provide this data for you. Please see the Page Cloaking article for more resources on this topic. Also see the bottom of the SpiderSpotting page for a list of alternative resources for finding current spider names.
I've kept the information below for historical reasons. I will probably update key portions and move them to the individual pages in the How Search Engines Work section.
I don't currently track IP addresses, which some people are interested in. You can get these by monitoring your logs, as explained on the page, in the main site. You simply need to disable DNS lookups on your server, so that the logs record IP numbers rather than host names.
Another good source is the Search Engine Spider IP Addresses list maintained by Search Engine World.
This page supplements and enhances the SpiderSpotting chart in the public area of Search Engine Watch.
|AltaVista - Agent Names|
Scooter/2.0 G.R.A.B. X2.0
AltaVista's normal spider, which visits web sites it knows about on a routine basis.
Scooter/1.0 [email protected]
Another name for AltaVista's normal spider, when present with the normal spider host names, below.
AltaVista's instant spider, when present with the instant spider host names, below. It only visits if a page has been submitted to AltaVista via its Add URL form, and it will only visit that specific page. It will not trigger a corresponding robots.txt request.
Scooter/2.0 G.R.A.B. V1.0.3
When associated with the brillo or soap host names, this is part of AltaVista's page removal and dead link verification service.
When associated with the vscooter host name, this is a visit from AltaVista's image spider, which populates its Photo Finder service.
This is a research spider that is not part of the normal AltaVista spidering system.
The agent names of AltaVista Intranet V1.0 or AltaVista Intranet V2.0 may also appear, but these are private uses of AltaVista spidering technology. These are not part of the normal AltaVista spidering system.
|AltaVista - Host Names|
Represents a visit from AltaVista's normal spider, as described above.
Represents a visit from AltaVista's instant spider, as described above. This host began taking over for the old instant spider host name in Feb. 1998.
Previous host name for the AltaVista instant spider. It is quite possible that variations such as ww3.altavista.digital.com were also used, though I never spotted any. To be safe, search your logs for ww*.altavista.digital.com.
When associated with the Scooter/2.0 G.R.A.B. V1.0.3 agent name, this is part of AltaVista's page removal and dead link verification service.
When associated with the Scooter/1.1 (custom) agent name, this is a visit from AltaVista's image spider, which populates its Photo Finder service.
These are host names for research or experimental spiders that are not part of the normal AltaVista spidering system.
You may occasionally see host names that appear to be part of the AltaVista search engine. However, only those above are associated with adding pages to the index.
Host names such as relay*.pa-x.dec.com or www-relay.das-x.dec.com are from humans at AltaVista.
The host name tarantula.av.pa-x.dec.com is from Web21's custom AltaVista crawler.
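The wildcard host names mentioned in these notes can be checked mechanically. What follows is a minimal sketch in Python, assuming you have already pulled the host names out of your logs. The function name `classify_host` and the labels are my own shorthand, not anything AltaVista publishes.

```python
from fnmatch import fnmatchcase

# Wildcard patterns quoted in the notes above. The labels are
# hypothetical shorthand chosen for this sketch.
ALTAVISTA_PATTERNS = {
    "ww*.altavista.digital.com": "old instant spider host",
    "relay*.pa-x.dec.com": "human at AltaVista",
    "www-relay.das-x.dec.com": "human at AltaVista",
    "tarantula.av.pa-x.dec.com": "Web21 custom crawler",
}

def classify_host(host):
    """Return the label of the first pattern that matches host, or None."""
    name = host.strip().lower()  # host names are case-insensitive
    for pattern, label in ALTAVISTA_PATTERNS.items():
        if fnmatchcase(name, pattern):
            return label
    return None
```

Note that `*` in these shell-style patterns matches any run of characters, dots included, so `ww*.altavista.digital.com` will also catch `www.altavista.digital.com`. Treat a match as a reason to look closer rather than as proof.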
|Euroseek - Agent and Host Names|
Arachnoidea ([email protected])
Represents a visit from the Euroseek spider.
Represents a visit from the Euroseek spider. In early to mid-1997, the IP numbers 126.96.36.199 or 188.8.131.52 were sometimes reported. Additionally, both names have appeared. Thus, a search for *euroseek.net is recommended.
|Excite - Agent Name|
Represents a visit from either Excite's "mega" spider, which crawls all web sites it knows of every three weeks, or Excite's "fresh" spider, which crawls select sites once a week. You can determine which by looking at the host name, described below.
Excite is also running some temporary test spiders using the same ArchitextSpider agent name. Only those using the host names described below are used to include pages in the index.
|Excite - Host Names|
Represents a visit from Excite's "mega" spider. Different host names are used, but all take the form of crawl*.atext.com, such as crawl2.atext.com or crawl3.atext.com.
Represents a visit from Excite's "fresh" spider.
These are host names for test spiders. They are not part of the normal spidering process.
|Inktomi - Agent and Host Names|
Slurp/2.0 ([email protected]; http://www.inktomi.com/slurp.html)
Represents a visit from Inktomi, which powers HotBot, MSN Search and other services. Before March 1997, the agent name was the same but had a comma after [email protected], rather than a semicolon.
Represents a visit from Inktomi. A variety of domains may appear, but all will have a letter/number preface, such as j2001.inktomi.com or j10.inktomi.com. Because of the various host names, it is usually best to search by agent name.
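Searching by agent name can be sketched as follows. This assumes your server writes the NCSA "combined" log format, in which the agent name is the last double-quoted field on each line. That format is an assumption on my part, so adjust the pattern if your logs differ. The names `LINE_RE` and `find_agent_hits` are hypothetical.

```python
import re

# Assumption: NCSA "combined" log format, where the requesting host is
# the first field and the agent name is the final double-quoted field.
LINE_RE = re.compile(r'^(?P<host>\S+) .* "(?P<agent>[^"]*)"\s*$')

def find_agent_hits(lines, agent_prefix="Slurp/"):
    """Yield (host, agent) for log lines whose agent starts with agent_prefix."""
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("agent").startswith(agent_prefix):
            yield match.group("host"), match.group("agent")
```

Matching on the `Slurp/` prefix rather than the full string means the older comma variant of the agent name is caught as well. To scan a file, something like `list(find_agent_hits(open("access.log")))` would do, where `access.log` stands in for wherever your own logs live.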
|Infoseek - Agent Names|
Represents a visit from Infoseek's regular crawler.
Mozilla/3.01 (Win95; I)
When coupled with the host names below, represents a visit from Infoseek's instant spider. It only visits if a page has been submitted to Infoseek via its Add URL form, and it will only visit that specific page.
This is the default agent name of Infoseek's search engine technology that it sells to other companies. It is not part of the normal Infoseek crawling system, even if it appears coupled with an Infoseek host name.
|Infoseek - Host Names|
Represents a visit from Infoseek. A variety of domains may appear, but usually all take the form of *-bbn.infoseek.com, such as galore-bbn.infoseek.com, or a letter/number combination, such as cca2625a.infoseek.com.
Infoseek sometimes only reports an IP number, rather than a host name. These are usually in the range of 204.162.98.* or 204.162.96.*.
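Because Infoseek may show up as either a host name or a bare IP number, a check has to handle both. Here is a minimal sketch using Python's standard `ipaddress` module, with the two quoted ranges written as /24 networks. The function name `looks_like_infoseek` is hypothetical, and since the ranges above are only "usually" right, a match is a hint rather than a guarantee.

```python
import ipaddress
from fnmatch import fnmatchcase

# The two ranges quoted above (204.162.98.* and 204.162.96.*),
# expressed as /24 networks.
INFOSEEK_NETWORKS = [
    ipaddress.ip_network("204.162.98.0/24"),
    ipaddress.ip_network("204.162.96.0/24"),
]

def looks_like_infoseek(client):
    """True if client is an *.infoseek.com host or an IP in the quoted ranges."""
    try:
        address = ipaddress.ip_address(client)
    except ValueError:
        # Not an IP number, so treat it as a host name.
        return fnmatchcase(client.lower(), "*.infoseek.com")
    return any(address in network for network in INFOSEEK_NETWORKS)
```

The host name test is deliberately broad: it accepts anything ending in `.infoseek.com`, which covers both the `*-bbn` form and the letter/number form described above.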
|Lycos - Agent Names|
Represents a visit from the regular Lycos spider if associated with the first set of host names below, or from the instant spider if the second set of host names appears. Previously, the agent name was Lycos_Spider_(T-Rex)/3.0, but this was dropped in late April 1998.
|Lycos - Host Names|
Represents a visit from the regular Lycos spider. A variety of names may appear for the second example, such as spider3.srv.pgh.lycos.com or ocelot.eng.pgh.lycos.com. Because of this, it's usually best to search by agent name.
Represents a visit from the Lycos instant spider. Unlike with AltaVista and Infoseek, however, a visit does not mean the web page is actually added to the index. The spider only visits if a page has been submitted to Lycos via its Add URL form, and it will only visit that specific page. Previous host names have included www.lycos.com and a2z.lycos.com.
|Northern Light - Agent and Host Names|
Represents a visit from the Northern Light spider. The Gulliver/1.1 agent name was last used in mid-June 1997.
Represents a visit from the Northern Light spider. Until October 1997, other names were occasionally used, including gulliver.northernlight.com, tornado.northernlight.com and the IP address of 184.108.40.206.
The WebCrawler index is now built by the same spiders that create the separate Excite index.