About The Update
The Search Engine Update is a twice-monthly update of search engine news. It is available only to Search Engine Watch members. Please note that long URLs may break into two lines in some mail readers. Cut and paste, should this occur.
In This Issue
+ Site News
+ Search Engine Strategies Coming To Dallas
+ Google Reaching Out To Webmasters
+ AltaVista To Close Own Shopping Search
+ Excite Search Voyeur Closes
+ More Layoffs & Other Search Engine Financial News
+ SearchDay Articles
+ Search Engine Resources
+ Search Engine Articles
+ List Info (Subscribing/Unsubscribing)
Site News
The Jupiter Media Metrix Search Engine Ratings page has been updated. It shows how some portals made slight gains after the closures of Go and NBCi. It also has a detailed look at the growth of Google in contrast to AltaVista's decline.
Jupiter Media Metrix Search Engine Ratings
I've also updated the How AltaVista Works page. If you've been reading the regular newsletters, you'll be up-to-date already. The update mainly incorporates the formatting changes at AltaVista that happened after I did the last update.
How AltaVista Works
Search Engine Strategies Conference Coming To Dallas
Did you miss the Search Engine Strategies conferences held earlier this year in Boston and San Francisco? Don't worry -- you've got one more chance in 2001. On November 14 & 15, Search Engine Strategies will be coming to Dallas, Texas.
Once again, I've organized two days' worth of sessions packed with information about search engine marketing. If you are a beginner and know little, the conference will bring you up to speed. Advanced? There are plenty of more in-depth sessions to choose from. Been to Search Engine Strategies before? There are a number of new and improved panels.
In addition to creating the event program, I'll be speaking at the conference, along with other search engine marketing experts. There will also be speakers from the search engines themselves, including About.com, AltaVista, Ask Jeeves, FAST Search, Google, Inktomi, LookSmart and Overture (GoTo).
Those interested in sponsoring or exhibiting should contact Frank Fazio Jr, firstname.lastname@example.org, for more information. Those interested in attending can find a conference agenda and more information via the URL below. Be sure to see the "Conference at a Glance" page, if you've come before, for a rundown on what's new.
Search Engine Strategies
Google Reaching Out To Webmasters
As Google's popularity has swelled, so has webmasters' interest in getting listed in the service. To help with this interest, Google has been moving forward on a number of fronts. It has posted new information for site owners, opened an automated removal tool and even created an online forum for Google questions.
Last month, the "Google Information for Webmasters" area was unveiled. It provides answers to many questions webmasters have about how their pages are listed with Google, such as:
- Getting Listed
- Not Listed
- Incorrect Listing
- Rank Questions
- Dos and Don'ts
- Facts & Fiction
If you are a regular newsletter reader, much of the information will already be familiar. Nevertheless, it's well worth reviewing the information that Google has published as a refresher and to get policies and information directly from Google itself.
Removing pages, page descriptions or cached copies of pages has also been made easier for webmasters. In particular, an automatic removal tool went up in the summer that lets you remove web pages, images, dead links or newsgroup posts in about 24 hours.
One innovation of the tool is that when using it, the robots.txt file need not be in a web server's root directory. That's helpful to those who have web space within another person's domain. These people may not have the ability to install a robots.txt exclusion in the root directory of the web server. However, exclusions done via subdirectory robots.txt files must be renewed every 90 days. Otherwise, the pages will again appear within Google.
In addition to fast robots.txt file removals, pages marked for exclusion with a meta robots tag can also be removed in about 24 hours, when using the automated removal tool.
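For those who want to check an exclusion before submitting it, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt rules, the paths and the example.com address are all made up for illustration. The module applies the standard rule-matching logic, which may not match exactly how Google's removal tool reads a subdirectory robots.txt file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are invented for this example.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /drafts/
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules disallow the URL for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/private/page.html", "/index.html"):
        url = "http://www.example.com" + path
        print(path, "blocked:", is_blocked(ROBOTS_TXT, "Googlebot", url))
```

A page that reports as blocked here is one that a crawler honoring the standard would skip. It says nothing about how quickly any search engine will drop the page.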
In both the cases above, the automated removal tool should only be used if you want to REMOVE your pages quickly from Google. It offers absolutely no mechanism for ADDING your pages to the search engine.
The removal tool can also be used to quickly remove page descriptions, or "snippets," as Google calls them, as well as the cached copies of your web pages that Google makes available to its users. Both of these actions are conducted using options within the "Remove an outdated link" area of the removal tool -- which can also quickly kill off dead links to your site.
Removing snippets, either via the automatic removal tool or through regular crawling, depends on installing the NOSNIPPET meta tag. You place this on any page you don't want to have described, and it looks like this:
<META NAME="GOOGLEBOT" CONTENT="NOSNIPPET">
This tag only works with Google. It will not prevent descriptions from appearing in other search engines. In addition, it will not prevent a description appearing at Google for your web page if that web page is also described in the Open Directory.
Huh? You see, Google will display both its own snippet and an Open Directory description, if a page within Google's web page index is also listed in the Open Directory. For example, look at this listing for Microsoft:
Welcome to the Microsoft Corporate Web Site
... See how Microsoft Research is working to deliver
digital butlers and more. ...
Description: Official homepage of Microsoft Corporation
Category: Computers > Companies > ... > Consumer Software > Microsoft Corporation
The portion under the title is the Google snippet, formed by seeking the first text on the page that contains the search term. If you used the nosnippet meta tag, this portion would be removed. However, the line that begins with "Description" is the description of the site from the Open Directory. This description would NOT be removed. And, the category link at the end of the listing takes you to where the site is listed, within Google's version of the Open Directory.
Removing a page's snippet also causes any cached copy of the page to be removed. In turn, this could mean that your page might receive a ranking decrease, so you'll want to be careful about removing your snippets. To understand why, let's look more closely at cached pages and the specific page caching removal command at Google.
Google makes it possible to see what its spider saw, when it visited a web page. For example, when you do a search, you'll see a link called "Cached" appearing below each page that is listed. Clicking on this link brings up a copy of the page out of Google's web page index, not from the site itself.
For instance, when I searched for CNN on Monday, Oct. 15, the CNN site was listed. I clicked on the cached link and saw a copy of the CNN home page from the last time Google visited it -- which was September 13, based on the date that page showed.
This is one plus to the cached page option -- you can easily measure how fresh (or not) the Google index is. The option is also a great way to see copies of pages that no longer exist or which have changed recently. Finally, Google will also highlight your search terms on the cached pages, making it easy to spot the information you seek.
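The freshness check is simple date arithmetic. As a small sketch, here is the CNN example worked out in Python, assuming the cached copy's date of September 13 refers to 2001 (the function name is my own):

```python
from datetime import date


def index_age_days(searched_on: date, cached_on: date) -> int:
    """Days between the date shown on the cached copy and the day of the search."""
    return (searched_on - cached_on).days


if __name__ == "__main__":
    # Search done Monday, Oct. 15; cached copy dated September 13.
    print(index_age_days(date(2001, 10, 15), date(2001, 9, 13)))  # 32
```

By that measure, the copy of the CNN home page was a little over a month old.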
As you might imagine, not every site owner wants their page cached. Indeed, the legality of Google's page caching is unknown. No one has yet sued a web-wide, crawler-based search engine over the copies of pages they make to form their listings. However, when asked in the past about possible legal concerns, the search engines have generally taken the line that because they are not making entire pages available, what they are doing to make listings isn't copyright infringement.
Google can't make that argument, when it comes to page caching. One can indeed see an entire copy of a web page, at least the text of the page. Google does not cache images, though if those images are still online, the page will often be reconstructed with them.
To alleviate concerns (and probably to head off possible lawsuits), Google allows site owners to "opt-out" of page caching. They can place a special meta tag on each page they do not want to be cached:
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
When installed, the page will still be listed, along with a snippet, but users cannot see a cached copy of the page.
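To see which of these options a page is asking for, here is a minimal sketch using Python's standard html.parser module. It assumes the commonly documented tag forms, a meta tag named "robots" or "googlebot" whose content is "noarchive" or "nosnippet". The class and function names are my own, and this is only an illustration of the rules described above, not how Google itself processes pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from meta tags named "robots" or "googlebot"."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values keep their original case.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def listing_options(html: str) -> dict:
    """Report whether a snippet and a cached link would be shown for the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    no_snippet = "nosnippet" in parser.directives
    no_archive = "noarchive" in parser.directives
    return {
        "show_snippet": not no_snippet,
        # As described above, removing the snippet also removes the cached copy.
        "show_cached": not (no_snippet or no_archive),
    }


if __name__ == "__main__":
    page = '<html><head><META NAME="ROBOTS" CONTENT="NOARCHIVE"></head></html>'
    print(listing_options(page))
```

A page with only the noarchive tag keeps its snippet but loses the cached link, while a page with the nosnippet tag loses both.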
I think this is a great compromise, one that allows Google to offer the service of page caching to its users while giving site owners control, if they don't want to be cached. Indeed, I love page caching because I think it's a great way to discover people who may be infringing your copyright via IP cloaking.
With cloaking, it is possible for someone to take a copy of your page, present it to a search engine as their own and prevent you from easily knowing what they have done. Because of this, I've long said that I think every search engine should make it possible to see exactly what they have spidered, so that you can determine if copyright theft has occurred.
There are only two real arguments against having this type of mandatory page caching feature. First, there's the issue that the search engine may violate copyright by providing cached copies, as I've described as possibly being the case with Google. This is easily solved by saying that as part of the terms of being listed in a search engine, you allow a cached copy to be presented. If you don't agree, you don't get listed -- simple as that.
The second argument would be that mandatory page caching would wipe out the "advantage" that cloaking can offer, which is to show search engines pages with code optimized for their crawlers while simultaneously showing human visitors more attractive versions of those pages. However, page caching wouldn't harm this, because if a user clicks on a listing, they'll still see the human-optimized version.
Of course, one "advantage" that would be lost is the ability to prevent people from seeing highly-optimized web pages. Some pages are constructed in such a way that the optimizers don't really want what they consider to be their "secret recipe" to search engine success to be seen by others. Page cloaking protects their secrets, and making page caching mandatory would definitely remove this security. However, such heavily-engineered pages are also likely to appear as gibberish to the average human visitor, and such gibberish pages are already generally seen as spam by most search engines.
Mandatory page caching doesn't exist at any search engine, even Google, since you can use the opt-out option there and stay listed. However, Google does view pages that choose to opt out with great suspicion. Why? Because those opting out tend to be those who are cloaking -- which Google flat out does not allow -- or those who aren't necessarily cloaking but are trying to do other things that Google considers to be manipulative.
"I was pretty struck by how few people use the noarchive tag for the reason it was intended [to protect copyright]," said Matt Cutts, the software engineer at Google who often deals with spam and webmaster issues.
Because of this, site owners should be careful in using the noarchive tag, as doing so will probably subject the page to greater scrutiny and a stronger penalty, if it is found to be spam.
"We can use the fact that they say noarchive to single them out," Cutts said. "And, while we don't penalize pages for using it, if a spam page uses the noarchive tag, then the penalty for that page becomes more severe."
In recent months, Google has also taken a stronger stance against spam. It's a change for a search engine whose founders used to quip back in 1999 that they weren't worried about spam. However, in 2001, attempts to spam the engine and tap into Google's rising popularity have become a problem.
"It's a growing priority. We've seen hundreds making attempts," Cutts said.
Google's hit with all the usual suspects, such as mirror sites, low-quality doorway pages and pages with invisible text. Link farms are also a problem, where sites are creating artificial link structures in hopes of boosting their popularity in Google's link analysis system. And cloaking, which Google considers to be spam, nonetheless can still get through the company's filters.
Going forward, Google is planning to tighten its spam filters even more. It's also taking new steps to adjust scoring. For example, text in the no frames area of frame pages is weighted less than ordinary HTML text. Why? Because no frames text is more or less invisible to users and thus seen as less trustworthy, since it is not constantly being evaluated by human visitors. Similarly, text or links that are hidden in some way from easy viewing are also likely to be downgraded in importance.
Google doesn't see such changes as specifically going after spam, however. Instead, the company views this as part of its overall job in producing the best search results possible.
"I would not cast our efforts as a growing hard-line on spam. I'd say that we're strengthening our algorithms in many different ways to improve quality, and that naturally has effect on spam as well. It's certainly true that people try all sorts of tricks, but Google is still more resistant to spam than other engines, and I expect it to become even better at scoring pages over time," Cutts said.
Part of improving results also means helping and educating site owners.
"I think that the overarching trend lately is becoming more responsive to webmasters. The 'info for webmasters' page is a good start, as is the URL removal tool. The Google newsgroup is another way to reach out to webmasters. We have even had employees join forums in a friendly but unofficial capacity. All of these efforts are in an attempt to help webmasters," Cutts said.
That Google newsgroup is a public forum that opened in early September. It's meant to be for all things Google, not just webmaster issues. Nevertheless, site owners may find help with listing issues there.
Google says it doesn't monitor the group on a regular basis but rather now and then. It primarily relies on Google users to help each other, though it will provide assistance directly, if seen as necessary or useful. Google says it also may break out sub-groups for particular topics in the future, if appropriate.
The "unofficial" capacity is a reference to the appearance of "Google Guy" at the Webmaster World web site. A real Google employee, he's asked for webmaster feedback and offered to bust myths occasionally, to those in the area.
Google Information for Webmasters
Google hitting your server too fast? Want to know the best way to get listed? Can a competitor hurt your rankings? Answers to these and more can be found in this area at Google.
Remove Content from Google's Index
Detailed information on removing pages, newsgroup posts, dead links or snippets from Google. Also has links to the fast, automatic removal tool.
Google Public Support Group
This is the Usenet area where Google questions are discussed.
Webmaster World: Greetings From Google
Lots of questions to Google are being posted here, with some infrequent answers. In particular, don't expect Flash content to get indexed anytime soon because there's not really a lot of text to Flash content that can be indexed.
How Google Works
Information for Search Engine Watch members that covers key details on how Google operates.
How To Block Search Engines
Beginner's guide to the robots.txt file, to block pages from being spidered. Also links to information on the similar meta robots tag.
Search Engines and Legal Issues
You'll find articles here that discuss issues about the legality of robot crawling as well as issues involving pagejacking and cloaking.
AltaVista To Close Own Shopping Search
AltaVista will no longer gather its own shopping search information but instead plans to outsource for this feature, in the near future.
"We have been approached by several leading shopping sites who are interested in becoming a strategic partner for us in this space," said spokesperson Kristi Kaspar.
AltaVista gained its own shopping search ability via the Compaq acquisition of Shopping.com back in January 1999. Despite outsourcing shopping search, the company does plan to continue producing its own web-wide search index.
Meanwhile, the paid listings dance at AltaVista that has gone on this year continues. Paid listings from AltaVista or Overture now appear under the "Products and Services" heading at the top of the results page, and more links appear under that same heading at the bottom of the results page. At least, that's what was happening Monday. I saw the bottom listings called "Featured Sites" last week, and no doubt we'll see things change once again, going forward.
Just to confuse you more, you'll also find that in some cases, no paid listings from Overture appear. For example, a search for "cars" brings up only one paid listing, this being sold by AltaVista itself. Most likely, a deal with the advertiser prevents any Overture listings from appearing.
So, if you are an Overture advertiser, you can expect that being in positions 1 through 6 may get you on the AltaVista results page, if it hasn't sold any ads itself. However, if AltaVista has sold ads, you may find that you won't appear at all, or you may appear if you are in positions 1 through 5, if AltaVista has sold an ad without a competitive restriction. And you may find that all this changes next week, if AltaVista once again moves things around without warning or consistency.
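For those who find the rules easier to follow as logic, here is a rough sketch in Python of the placement pattern as I saw it this week. The function and parameter names are my own. This is not anything published by AltaVista or Overture, and it only captures which positions "may" appear, not a guarantee.

```python
def overture_positions_shown(altavista_sold_ad: bool,
                             competitive_restriction: bool = False) -> range:
    """Overture positions that may appear on an AltaVista results page."""
    if not altavista_sold_ad:
        # No AltaVista-sold ad: Overture positions 1 through 6 may appear.
        return range(1, 7)
    if competitive_restriction:
        # AltaVista's advertiser has an exclusive deal: no Overture listings.
        return range(0)
    # AltaVista sold an ad without a restriction: positions 1 through 5 may appear.
    return range(1, 6)


if __name__ == "__main__":
    print(list(overture_positions_shown(False)))
    print(list(overture_positions_shown(True, competitive_restriction=True)))
    print(list(overture_positions_shown(True)))
```

Treat this as a snapshot only, since the layout has been changing from week to week.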
AltaVista pulls down the blinds on e-tail unit
Silicon.com, Oct. 8, 2001
Short details on closure of AltaVista's shopping search. AltaVista is "entirely committed" to continuing to operate a consumer-oriented search site, however. Sorry about the big, long URL. That's just the way this site does it. If the URL doesn't work, then try searching for the story title, at this site.
Excite Search Voyeur Closes
Excite has closed its "Search Voyeur" live display feature. The service showed what people were searching on at the Excite service. It grew out of the WebCrawler Search Ticker and thus was the first and oldest of the live search displays offered.
What People Search For
You can find other live search displays here.
More Layoffs & Other Search Engine Financial News
More layoffs -- this time for Inktomi and Lycos Asia, while Yahoo says it is thinking about more for the near future. And will anyone buy Excite?
Yahoo considers more layoffs
News.com, Oct. 10, 2001
Yahoo met analyst expectations, with a net loss of $24.1 million, down from last year's pro forma profit of $81.1 million. The company is considering further layoffs and restructuring.
Lycos Asia axes 60% of staff
BBC, Oct. 11, 2001
Deep cuts at Lycos Asia. While the 11 current sites serving the Asian area will remain operating, the focus will be on Singapore, Hong Kong and China.
Who wants to buy Excite?
News.com, Oct. 8, 2001
Who wants to buy Excite@Home's portal business? Probably no one, it seems.
Inktomi holds to Q4 view, cuts jobs, sees charges
Reuters, Oct. 1, 2001
Inktomi has cut another 150 employees.
SearchDay Articles
Here are some recent articles that may be of interest, from Search Engine Watch's daily SearchDay newsletter:
Refuge For About.gone Guides
SearchDay, Oct. 10, 2001
A new grassroots effort is helping former About.com Guides re-establish their sites and providing a clearinghouse of information to help users find their favorite ex-Guides.
Deleted "Sensitive" Web Sites Still Available via Google
SearchDay, Oct. 9, 2001
Heightened security concerns have led a number of organizations to remove "sensitive" information from their web sites, yet much of this information is still available, even to people with relatively modest searching skills. Use the link from this page to reach the extended "members-only" version.
Bookmarks with Brawn
SearchDay, Oct. 4, 2001
Co-citer is a simple but powerful replacement for Internet Explorer's wimpy 'favorites' manager -- and best of all, it's free.
Search Engine Optimization and the Law
SearchDay, Oct. 3, 2001
Legal experts urge webmasters to think carefully before using tactics such as competitors' words in meta tags or buying trademarked keywords on search engines.
About Face at About.com
SearchDay, Oct. 2, 2001
About.com slashes staff and axes more than 300 topic 'Guide' sites from its service, with significant implications for webmasters and searchers alike. Use the link from this page to reach the extended "members-only" version, which includes information on submitting to About.
Searching With Latitude
SearchDay, Oct. 1, 2001
The Degree Confluence Project is an unusual but intriguing search engine, using latitude and longitude as search keywords.
On the archive page below, you'll find more articles like those above, and you can sign up for the free newsletter.
Search Engine Resources
Search Engine Watch associate editor Chris Sherman has coauthored a new book on the Invisible Web. As a companion to that book, this new site makes available hundreds of Invisible Web resources that are useful to searchers. Check out the site, to find resources, to learn more about what the Invisible Web is and to learn more about the book.
If you like the idea of seeing your web results visually, this meta search site shows the results with sites being interconnected by keywords. Normally, I find these types of attempts fail to be compelling, but this one was kind of fun. Thanks to reader G. Charriau for passing it on.
New service that offers background and links about famous people. Browse or search to find people of interest.
Search Engine Optimizer
I haven't had a chance to play with this page analyzer tool, but it is produced in part by Robin Nobles, a veteran of the SEO industry. It is designed to check pages for possible problems with search engines in over 60 different areas.
Search Engine Articles
On, Nov. 2001
Long profile of Google, both history and current directions.
AT&T Wireless adding Google to phones
News.com, Oct. 15, 2001
Google has a pretty cool feature that allows those using WAP browsers to use a special version of the search engine where the search results are formatted for small screens. In addition, when you visit any link, Google continues to convert HTML into a WAP format on the fly, making it an easy way to view the web while mobile. This technology makes it no wonder that the search engine is making gains among wireless providers, such as this latest deal with AT&T. It follows on earlier deals with Sprint PCS, Cingular Wireless, Handspring, Palm and Vodafone.
On the size of the World Wide Web
Pandia, Oct. 14, 2001
There are now over 8 million web sites according to researchers at the Online Computer Library Center, but the web's growth has slowed markedly when compared to previous years. The vast majority of web sites are written in English -- 73 percent, with German coming in at second place with 7 percent.
Yahoo, MSN Spar Over Traffic Figures
SiliconValley.internet.com, Oct. 12, 2001
We're the biggest, says Yahoo. No, we're the biggest, counters MSN. A look at the dueling audience figures. Yahoo claims its 210 million unique visitors worldwide in September -- as measured by Nielsen//NetRatings -- makes it the largest global web property. MSN says it had 270 million unique visitors according to Nielsen//NetRatings-rival Jupiter Media Metrix.
Web search error adds Muppet to Bin Laden cause
ZDNet UK, Oct. 11, 2001
Bert-Osama Site Taken Down
Reuters, Oct. 12, 2001
Some protestors in Pakistan demonstrating against the bombings in Afghanistan were holding posters with images of Osama bin Laden -- and one of those images showed him sitting alongside Sesame Street's Bert. What happened is that a joke photograph showing the two together is available on various sites across the web. Because of this, a search for "osama bin laden" on Google's image search service brings up the parody alongside other pictures of bin Laden. Presumably, protestors seeking pictures of bin Laden did a Google search and found the Bert-Osama picture. That's also not really an "error" on Google's part, as the headline of the first article above puts it, because the parody does include bin Laden. The second article explains that the original "Bert Is Evil" site has now closed. However, the image that Google lists was probably copied from that site to another site, so you'll still find the picture appearing in its search results.
Clicking Into History
DC.internet.com, Oct. 11, 2001
Television Archive: Sept. 11
A look at how selected sites across the web from Sept. 11 have been preserved by the US Library of Congress. The second URL is to a newly released television archive of coverage from that day.
Intelliseek's BullsEye Turns 3 With Grace
About.com Web Search Guide, Oct. 9, 2001
Review of the software-based search tool BullsEye, which has meta search and invisible web search capabilities.
The Fast search engine expands in Europe
Pandia, Oct. 4, 2001
As of November 1, FAST will be powering search results for T-Online, Germany's biggest ISP. It's just another of the company's big wins in the European search space. In some other news, the larger index that FAST plans is supposed to go live either toward the end of this year or early next year. Finally, FAST says its paid inclusion program should emerge from beta testing by December. Expect a longer look at the program when that happens.
Web Search Engines FAQS: Questions, Answers, and Issues
Information Today, Oct. 2001
In-depth information, resources, tips and advice on web searching from search expert Gary Price.
The Effects of September 11 on the Leading Search Engine
First Monday, Oct. 2001
Look at how Google reacted to the Sept. 11 terrorist attacks.
List Info (Subscribing/Unsubscribing)
How do I unsubscribe?
+ Follow the instructions at the very end of this email.
How do I subscribe?
+ The Search Engine Update is only available to paid members of the Search Engine Watch web site. If you are not a member and somehow are receiving a copy of the newsletter, learn how to become a member at: http://searchenginewatch.com/about/subscribe.html
How do I see past issues?
+ Follow the links at:
Is there an HTML version?
+ Yes, but not via email. View it online at:
How do I change my address?
+ Send a message to email@example.com
I need human help with my membership!
+ Send a message to firstname.lastname@example.org. DO NOT send messages regarding list management or membership issues to Danny Sullivan. He does not deal with these directly.
I have feedback about an article!
+ I'd love to hear it. Use the form at