January 5, 2009

Track Google Rankings With Google Analytics

Andre Scholten, a guest blogger at Yoast, has come up with what I think is the best filter ever created for Google Analytics. He has given us a way to track keyword rankings!

The filter is so good it even has options for how deep to go, plus a choice between reporting rank by position number or by page number. Click now - rush to this one - everyone has to read this.

Bravo mate - this goes down as the Best Damn Google Analytics Filter Ever. Mr De Valk - a truly great guest find.
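The post doesn't say how the filter works internally. One plausible approach, sketched here as an assumption rather than as the filter itself: Google's referrer URL is assumed to carry the search terms in `q`, the clicked result's position in `cd` and the results offset in `start`, so both "rank by number" and "rank by page" can be read straight from it. The example URL is made up.

```python
from urllib.parse import urlparse, parse_qs

RESULTS_PER_PAGE = 10  # assumption: Google's default of 10 results per page


def ranking_from_referrer(referrer):
    """Extract (query, position, page) from a Google search referrer URL.

    Assumes 'q' holds the query, 'cd' the 1-based position of the clicked
    result and 'start' the zero-based result offset. Returns None for
    non-Google referrers; values that can't be read come back as None.
    """
    parsed = urlparse(referrer)
    if "google." not in parsed.netloc:
        return None
    params = parse_qs(parsed.query)
    query = params.get("q", [None])[0]
    cd = params.get("cd", [""])[0]
    start = params.get("start", ["0"])[0]
    position = int(cd) if cd.isdigit() else None
    # Offset 0 -> page 1, offset 10 -> page 2, and so on.
    page = int(start) // RESULTS_PER_PAGE + 1 if start.isdigit() else None
    return query, position, page


print(ranking_from_referrer(
    "http://www.google.com/search?q=google+analytics+filter&start=10&cd=14"))
# ('google analytics filter', 14, 2)
```

Reporting the page number only needs `start`, so it keeps working for referrers that lack a position value.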

UPDATE: I was linking to the wrong page, but that's now corrected.

Posted by Frank Watson at 10:55 PM | Permalink | Comments (2)

December 29, 2008

Does Google Weight Algorithm In Their Own Favor?

I was looking for something about analytics today, so I went to Google and did a search. The number one result was Google Analytics, which is not surprising given that it is very popular and widely discussed. But a closer look showed it had three listings for the site and one for the blog - a bit of a home field advantage.

So I looked a little deeper.

A search for catalog and catalogue and, lo and behold, Google Catalogs tops the list. A little curious: they cannot possibly be the best-known online catalogs.

And on to email. We know Yahoo has the biggest email program, and yet it comes in second behind Google's. Does Yahoo just not know how to optimize?

Interestingly, a search for videos places Google Video above YouTube, which Google also owns and which is one of the most popular sites on the web.

So is there a self-interest factor in their algorithm or have human hands been involved?

Hey, they own it, so they have every right to present data any way they want - but a little transparency on this would be good.

Posted by Frank Watson at 10:41 PM | Permalink | Comments (7)

April 18, 2008

Case Sensitive Google Search Results Being Found

Chris Silver Smith over at NetConcepts found case sensitive results in Google SERPs. Let's hope this is just some error and not a change that will see many people scrambling to make wholesale changes to their SEO efforts.

Talk about ways to further pollute the results... rewritten copy saying the same thing but geared toward upper- and lower-case searches... does anyone win here?

Posted by Frank Watson at 2:28 PM | Permalink

January 25, 2008

Google Position 6 Smack-Down: Filter, Penalty, or Bunk?

Since late December, best-of-breed search marketers have been chattering about a supposed and creepy “Position 6” Google SERPs punishment pattern, in which pages that by all indications should dominate the organic SERPs somehow place at a lowly #6. Google's Matt Cutts has previously dismissed the notion that Position 6 is real. Yesterday the debate among search marketers flared into full-blown public jamming on major SEM blogs and in virulent comment threads. Aaron Wall, venerable blogger-purveyor of SEOBOOK, restarted the conversation with his post, “How I Got My Google Ranking #6 Filter Removed.” The post was bookmarked on Sphinn, and SEO scientists argued about Position 6 throughout the day, resulting in passionate posts (and even arguments) in trade publications.

Is Position 6 real?

Respected SEM technician Sebastian reflected the position of many SEM pros and noted a lack of studies that provide “proof instead of weird assumptions based on claims of webmasters jumping on today's popular band wagon that aren't plausible nor verifiable...such beasts don't exist.”

Danny Sullivan joined the fracas with an SEL post giving his impression that Position 6 is real. “Well, I've personally seen this weirdness. Pages that I absolutely thought ‘what on earth is that doing at six’ rather than at the top of the page. Not four, not seven -- six. It was freaking weird for several different searches. Nothing competitive, either.”

Is Position 6 an actual Google penalty or fodder for SEMs who are imagining patterns where none exist? Have you experienced Position 6? Real or imagined, it's certainly generating a lot of attention and links amongst search marketers and webmasters. Stay tuned and watch your Google organic SERPs.
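If you want to check your own data for the pattern Danny describes ("Not four, not seven -- six"), the question is whether a page lands at exactly #6 on every check rather than drifting around. A minimal sketch, assuming you already log observed positions per query from whatever rank checker you use; the queries and numbers are invented, and a match on its own proves nothing about a penalty.

```python
def stuck_at_position(history, position=6, min_checks=3):
    """Return queries whose observed rank was `position` on every check.

    history maps a query to its observed 1-based ranks over successive
    checks. Queries with fewer than `min_checks` observations are skipped
    as too noisy to say anything.
    """
    stuck = []
    for query, ranks in history.items():
        if len(ranks) >= min_checks and all(r == position for r in ranks):
            stuck.append(query)
    return sorted(stuck)


history = {
    "brand name": [6, 6, 6, 6],  # pinned at six on every check
    "widgets": [3, 6, 5],        # passes through six but moves around
    "blue widgets": [6, 6],      # too few checks to judge
}
print(stuck_at_position(history))  # ['brand name']
```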

Posted by Marty Weintraub at 7:10 AM | Permalink

November 30, 2006

Google Ordered By Another North Carolina Court To Remove Pages

Apparently, North Carolina is starting a trend of people getting court orders to make Google remove material it spidered after that material was left out in public view. This week, Google was ordered to remove material by a court in that state. It follows a similar court order in a different case earlier this year.

North Carolina County Gets Restraining Order Against Google from the Associated Press covers how Social Security numbers, cell phone numbers and other personal information were left online by Johnston County, which meant Google (and likely other search engines) spidered the material.

When the county realized this, it sought to have the material removed. However, officials were told removal might take up to five days, prompting the county to go the legal route:

Fearing the possibility of identity theft, Johnston County officials asked Google on Monday to remove the information. It was first posted on the county's Web site by accident six weeks ago and discovered Friday. Mountain View, Calif.-based Google responded that removal could take up to about five days, said county attorney Mark Payne.

"It surprised me that Google didn't immediately recognize that this was something that posed a real danger of real damage to our citizens," Payne said.

Hey, it surprised me that Johnston County didn't immediately recognize that the information shouldn't have been put on the public web in the first place. However, that appears to have happened because of a third-party contractor.

What about the automatic URL removal system? I seem to recall it getting pages out in 48 hours or less (but I might be remembering incorrectly). Checking today, the official timeframe is longer (unofficially, I hear it goes faster):

You may process your URL for removal from Google's search results. URLs will be removed after we've verified your request. Bear in mind that verification can take several days or longer and all pages submitted via the automatic URL removal system will be removed from the Google index temporarily for six months.

Google Blamed For Indexing Student Test Scores & Social Security Numbers and Follow-Up: School Couldn't Reach Google Until Injunction Filed cover how a school authority in North Carolina went to the courts to remove pages from Google in June.

Posted by Danny Sullivan at 1:06 PM | Permalink

November 27, 2006

Matt Cutts On Site Problems & Mistakes To Avoid

Ever wonder what goes through the head of a Googler like Matt Cutts while he sits on a site review panel? Matt posted his detailed notes from the site review panel he did at PubCon in Vegas. He explains some on-site problems and mistakes a webmaster should avoid. It is worth a read, because Matt's notes totally beat my own coverage of that session.

Posted by Barry Schwartz at 10:19 AM | Permalink

November 24, 2006

Extended Indented Google Results Bug?

Philipp Lenssen spotted listings with more than one indented result for a search on [get fuzzy]. The first two results were from comics.com, and the two other indented results were from Yahoo News. All of these results sat under the first listing, so three indented results showed under the top listing. Typically, there is only one indented result and no more. I cannot replicate this; it seems like a weird bug that may have been fixed. Philipp has a screen capture of it in action.

Posted by Barry Schwartz at 8:42 AM | Permalink

November 7, 2006

Google Recent FAQ List From WebmasterWorld

I reported this morning that WebmasterWorld has an excellent summary sticky thread with links to important and mostly recent topics that can be very useful to SEOs and Webmasters on Google optimization tips. Here is the rundown.

Posted by Barry Schwartz at 10:22 AM | Permalink

Google's Adam Lasnik Interviewed

Lee Odden interviewed Adam Lasnik of Google the other day. Adam Lasnik is one of the folks at Google responsible for being the voice of the webmaster. His day job is to help webmasters with ranking and indexing issues either through communication or other means. I find the read interesting and helpful - it really shows that Google cares.

As for what questions annoy me the most? There aren't any specific ones that I find particularly frustrating. Rather, I do occasionally grow weary with two types of questions:

1) Questions that are clearly answered in our much-improved Webmaster Central, via a quick search of our Webmaster Help group, or questions that would also be likely answered via use of our Webmaster Tools. There's no such thing as a stupid question, IMHO, but lazy questions… well, that's a different story. 2) Accusatory “questions.” I suppose I need to get some thicker skin, but it stings when people imply that we either don't care or — worse — that a relationship between Webmasters and Google must inherently be adversarial. Every time I've spoken with Larry Page and Marissa Mayer they've made it unequivocally clear that being mindful of Webmaster concerns is something resonating not just in Search Quality, but from the very top of Google.

Posted by Barry Schwartz at 8:26 AM | Permalink

October 30, 2006

Judge To Rule By End Of Year On Kinderstart Case

Reuters reports that Judge Jeremy Fogel said he will take until the end of this year to rule on the KinderStart case. The case centers on how KinderStart's ranking and PageRank fell; KinderStart sued Google on numerous counts over the ranking drop. The judge recently said, "Assuming Google is saying that KinderStart's Web site isn't worth seeing. Why can't they say that? That's my question." He will consider this and other questions in his ruling.

Posted by Barry Schwartz at 9:03 AM | Permalink

October 25, 2006

Google Removes Dynamic Parameter Clause From Webmaster Guidelines

The Google Blog notes that Google has updated its webmaster guidelines to reflect its current crawling and indexing technology. Since Google is now able to crawl and index URLs with parameters (i.e., dynamic URLs), it has removed the line that reads "Don't use '&id=' as a parameter in your URLs, as we don't include these pages in our index." Google still recommends keeping those parameters to a minimum and calls rewriting dynamic URLs into user-friendly versions "a good practice."
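One practical way to keep parameters to a minimum is to strip the ones that don't change page content before URLs are published or linked, and to put the rest in a stable order. A minimal sketch; the set of "noise" parameters below (session IDs, tracking tags) is a hypothetical example for an imaginary site, not a list from Google's guidelines.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

# Hypothetical: parameters assumed not to change page content on this site.
NOISE_PARAMS = {"sessionid", "sid", "ref", "utm_source", "utm_medium"}


def minimize_params(url):
    """Drop content-neutral parameters and sort the rest for one stable URL."""
    parts = urlparse(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in NOISE_PARAMS]
    kept.sort()  # same parameters in the same order -> same URL
    return urlunparse(parts._replace(query=urlencode(kept)))


print(minimize_params("http://www.example.com/product.php?sid=abc123&id=42&color=red"))
# http://www.example.com/product.php?color=red&id=42
```

Going further and rewriting to a path like `/products/42` needs server-side rewrite rules; this only trims what the crawler sees in links.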

Posted by Barry Schwartz at 12:42 PM | Permalink

October 17, 2006

Google Webmaster Tools Gain Crawl Charts, Enhanced Crawl Rate & Image Labeler Support

Learn more about Googlebot's crawl of your site and more! at the Official Google Webmaster Central Blog covers the new features Google has added: visual charts showing Googlebot's crawling activity, expanded crawl rate support, inclusion in the image search labeling program and the number of URLs submitted. I talked with the Google Webmaster Central team earlier this week, and here are a few more details on some of the features.

To see Googlebot activity reports, go to Google Webmaster Tools, choose one of the sites you've verified, then pick the "Crawl rate" option on the Diagnostics tab. You'll get a chart showing how many pages Google has crawled per day over the past three months. For example, here's what it looks like for the Search Engine Watch Blog:

It's interesting to see visually how Google has backed off the number of requests over time. I haven't done anything to cause this, but it may reflect Google getting smarter about the fact that it doesn't need to revisit every page on the site so often. It could also be due to our server being less responsive (see below).

You can also see kilobytes downloaded per day, as well as the time spent downloading a page in milliseconds. The chart on that for us is really revealing:

You can see that our response time nearly doubled at the end of July. That's exactly when we left our servers at Jupitermedia, our old publisher, and switched to new ones with Incisive, our current publisher. Despite the slower time, I haven't noticed any drop in traffic from Google, so the slower responsiveness -- while not good -- hasn't been damaging. But if you did see a plunge in traffic, a chart like this, coming directly from Google, might help you spot what's wrong.
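Google's charts show three things: pages crawled per day, kilobytes downloaded per day and time spent downloading a page. Rough equivalents can be pulled from your own server logs, which is handy for cross-checking a change like a hosting move. A sketch, assuming log lines have already been parsed into `(day, user_agent, bytes_sent, response_ms)` tuples; that record shape and the sample data are hypothetical, and matching on the user agent alone can be fooled by spoofers.

```python
from collections import defaultdict
from datetime import date


def googlebot_daily_stats(records):
    """Per-day Googlebot totals: (pages crawled, KB downloaded, avg ms per page).

    records: iterable of (day, user_agent, bytes_sent, response_ms) tuples,
    a hypothetical pre-parsed log format.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # pages, bytes, milliseconds
    for day, agent, size, ms in records:
        if "Googlebot" not in agent:  # user agent only; can be spoofed
            continue
        entry = totals[day]
        entry[0] += 1
        entry[1] += size
        entry[2] += ms
    return {
        day: (pages, round(size / 1024, 1), round(ms / pages, 1))
        for day, (pages, size, ms) in sorted(totals.items())
    }


log = [
    (date(2006, 7, 30), "Mozilla/5.0 (compatible; Googlebot/2.1)", 2048, 200),
    (date(2006, 7, 30), "Googlebot/2.1", 1024, 400),
    (date(2006, 7, 30), "Mozilla/5.0 Firefox/1.5", 5000, 50),
    (date(2006, 8, 1), "Googlebot/2.1", 512, 900),
]
print(googlebot_daily_stats(log))
```

A jump in the average-milliseconds column around a known date is the same signal the Webmaster Tools chart showed.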

At the bottom of the Crawl rate page is the ability to set how fast you want Google to crawl your site. This was introduced back in August, but now it's available to everyone using Google Webmaster Tools, not just some. In addition, Google has simplified the options from five to just three: Faster, Normal and Slower. Google said feedback suggested fewer options would be easier to understand.

Setting a crawl rate still doesn't guarantee that Google will hit your server faster or slower than normal, even if you request it. But Google said it is much more responsive to these requests now. In fact, it is so responsive that you need to renew your choice every 90 days. That's to prevent someone authorized on your account from telling Google to slam your server, then leaving and Google continuing to do that forevermore.

Also on the Diagnostics tab, you'll find an Enhanced Image Search option. What's that about? For now, it simply means that images from your site will be available to those using the Google Image Labeler system, which we wrote about last month: Google Images Labeler: Google's Challenge To Flickr?

Not all images from Google Images are currently added to Google Image Labeler. Google said it currently uses a subset of pictures that it feels site owners would be amenable to having labeled. This new feature lets you explicitly tell Google you'd like to have your pictures play in the new program. More on this is covered in the help page about enhanced image search.

Finally, if you submit a sitemap to Google, it will now tell you the number of pages submitted in that sitemap. Why care? Apparently, at least one person did and requested the feature. As Google explains in that blog post, this person generated a sitemap automatically and so had no idea how many URLs he was spitting out in it. Now he -- and others -- can know.
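If you generate sitemaps automatically, you can also count the URLs yourself before submitting. A minimal sketch using Python's standard XML parser, assuming the sitemaps.org 0.9 namespace that current sitemap files declare; older Google Sitemaps files may declare a different namespace, so it's a parameter. The sample sitemap is made up.

```python
import xml.etree.ElementTree as ET

# Namespace from the sitemaps.org protocol; pass another if your file differs.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def count_sitemap_urls(xml_text, namespace=SITEMAP_NS):
    """Count <url> entries that carry a <loc> in a sitemap document."""
    root = ET.fromstring(xml_text)
    ns = "{%s}" % namespace
    return sum(1 for url in root.findall(ns + "url")
               if url.find(ns + "loc") is not None)


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about.html</loc></url>
</urlset>"""
print(count_sitemap_urls(SAMPLE))  # 2
```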

Posted by Danny Sullivan at 7:04 PM | Permalink

October 16, 2006

Goodbye Google Sandbox, Hello Google Minus Thirty Penalty

The Google Sandbox concept used to be the idea that a new site couldn't rank well on Google until a certain time period had passed. Over the past two years, it's been debated, redefined and morphed, and I'd say it's now largely something people have moved on from. Naturally, we need something else in its place -- and the -30 penalty seems to be the likely candidate.

Drop 30 Points in Google? Meet The "Minus Thirty" Penalty? over at Search Engine Roundtable covers how some people are saying they always rank at number 31 on Google, with their impression being that Google must be placing some type of penalty to kick them past number 30. Google Says "No Ranking Soup for You" at Threadwatch has a similar roundup of forum discussions, along with more debate.
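To see whether your own rankings fit the pattern people describe, compare two rank snapshots and flag queries that used to sit on page one and now sit at exactly #31. A sketch on invented data; the "top 10 before, exactly 31 after" rule is just one reading of the forum reports, and a match is not proof of a penalty.

```python
def minus_thirty_suspects(before, after, top=10, landing=31):
    """Queries that ranked in the top `top` before and sit exactly at `landing` now.

    before/after map query -> 1-based rank; a query missing from `after`
    is treated as unranked rather than as a suspect.
    """
    return sorted(query for query, old in before.items()
                  if old <= top and after.get(query) == landing)


before = {"blue widgets": 2, "red widgets": 5, "widget repair": 40, "widgets": 1}
after = {"blue widgets": 31, "red widgets": 7, "widget repair": 31}
print(minus_thirty_suspects(before, after))  # ['blue widgets']
```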

Posted by Danny Sullivan at 11:19 AM | Permalink

October 11, 2006

Weather Reports: Yahoo Search Update & Google Status Report

We received two search "weather reports" last night. The first came from the Yahoo Search Blog, which announced that an "index update" began rolling out last night. The other came from Matt Cutts's blog, which informed us of Google's "update on search quality/infrastructure on Google going into the fall."

Yahoo told us to expect "some changes in ranking along with shuffling of the pages that are included in the index" but based on my tracking of the search forums, either there is not enough shuffling or Yahoo isn't sending enough traffic these days for SEOs to care about it. So keep an eye out for that.

Matt Cutts basically gave a summary of what happened since his last Google weather report and what to expect in the short-term future. He mentioned Big Daddy, their crawl caching proxy, the new supplemental index, the site: command update, and much more. He also posted about smaller issues later that night as a continuation of his weather report.

Posted by Barry Schwartz at 9:25 AM | Permalink

October 5, 2006

Google Blog Search Adds Ping Support

The Google Blog announced that Google Blog Search now accepts pings. So when you add a new blog entry and your blog offers an RSS/XML/Atom feed, you can send Google Blog Search a ping at its Blog Search Pinging Service. How do you do this? You can read more at About Google Blog Search Pinging Service and view the Pinging API yourself. I tested it on a different blog and got a 404, but it is very possible I pinged the wrong URL; I will test it again shortly.

Postscript From Danny:

Movable Type users want to put this: http://blogsearch.google.com/ping/RPC2 into the "Others" box in the "Notify the following sites upon weblog updates" section of "New Entry Default Settings."

WordPress users, go to Options, Writing Options, then paste that line into your Update Services box.
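Both blog platforms send these pings as XML-RPC calls. A sketch using Python's standard `xmlrpc.client`, assuming the service accepts the common `weblogUpdates.ping(blog_name, blog_url)` method; check the Pinging API page for what Google actually supports. The endpoint is the one from Danny's postscript and may no longer respond; the blog name and URL are placeholders.

```python
import xmlrpc.client

# Endpoint quoted in the postscript above; the service may no longer respond.
PING_ENDPOINT = "http://blogsearch.google.com/ping/RPC2"


def build_ping(blog_name, blog_url):
    """Serialize a weblogUpdates.ping(blog_name, blog_url) XML-RPC request body."""
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")


def send_ping(blog_name, blog_url, endpoint=PING_ENDPOINT):
    """Send the ping over the network and return the server's reply."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)


# Inspect the request without touching the network:
print(build_ping("Example Blog", "http://www.example.com/blog/"))
```

`build_ping` is there so you can see exactly what the blog software sends; `send_ping` makes the real call.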

Posted by Barry Schwartz at 12:45 PM | Permalink

October 4, 2006

How Google's Q&A One Box Results Work

The Google Operating System blog has slides from Peter Norvig's presentation at UC Berkeley on how the Google One Box Q&A results work. He says that "Google doesn't use predefined patterns, they find the patterns from examples, as this approach is more scalable." The slides show the algorithms that detect these patterns from examples. Is it perfect? No. Ben at the Search Engine Roundtable discovered Google OneBox Q&A Adult Spam last month, which Matt Cutts confirmed was an issue with the OneBox extractor code.
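A toy illustration of the general "find the patterns from examples" idea, not Google's code: take a few seed question/answer pairs, collect the text that sits between each term and its answer in some sentences, then reuse the most frequent "middle" to answer a new question. The seeds and sentences are made up.

```python
import re
from collections import Counter


def learn_patterns(seeds, sentences):
    """Count the text found between each seed term and its known answer."""
    patterns = Counter()
    for term, answer in seeds:
        for sentence in sentences:
            start = sentence.find(term)
            if start == -1:
                continue
            end = sentence.find(answer, start + len(term))
            if end != -1:
                patterns[sentence[start + len(term):end]] += 1
    return patterns


def answer_with(pattern, term, sentences):
    """Apply a learned 'middle' to a new term and return the next word, if any."""
    regex = re.compile(re.escape(term + pattern) + r"(\w+)")
    for sentence in sentences:
        match = regex.search(sentence)
        if match:
            return match.group(1)
    return None


corpus = [
    "Mozart was born in 1756 in Salzburg.",
    "Einstein was born in 1879 in Ulm.",
    "Gandhi (born 1869) led a movement for independence.",
    "Darwin was born in 1809 in Shrewsbury.",
]
seeds = [("Mozart", "1756"), ("Einstein", "1879"), ("Gandhi", "1869")]
best, _ = learn_patterns(seeds, corpus).most_common(1)[0]
print(repr(best), answer_with(best, "Darwin", corpus))  # ' was born in ' 1809
```

Scaling this up is where the trouble starts: a pattern learned from clean examples will happily extract from spammy pages too, which is the kind of weakness the OneBox spam case hints at.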

Posted by Barry Schwartz at 8:58 AM | Permalink

October 3, 2006

Google's Matt Cutts Answers Questions On PageRank

Ever since Google became popular, SEOs and webmasters have been trying to uncover the hidden secrets of Google's PageRank. Last night, Matt Cutts of Google answered some questions in his more info on PageRank blog post. What should you take away from the questions and answers? Matt said, "you won't see any search engine result page (SERP) changes as a result of this PageRank export." In short, Google's process is to take its "internal PageRanks, put them on a 0-10 scale, and export them so that they're visible to Google Toolbar users." By the time those numbers are pushed to the Google toolbar and visible to folks, they are already outdated.
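Matt doesn't say how internal scores become the 0-10 toolbar number. The sketch below computes textbook PageRank by power iteration on a made-up four-page graph (damping factor 0.85, as in the original PageRank paper), then squeezes the scores onto 0-10 with a logarithmic scale. The log base and the "top page gets 10" rule are assumptions for illustration only, not Google's mapping.

```python
import math


def pagerank(links, damping=0.85, iterations=50):
    """Classic power-iteration PageRank.

    links maps each page to the list of pages it links to. Scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its score evenly across all pages.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


def toolbar_scale(rank, base=6.0):
    """Hypothetical 0-10 export: log scale relative to the top-scoring page."""
    top = max(rank.values())
    return {p: max(0, min(10, round(10 + math.log(r / top, base))))
            for p, r in rank.items()}


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(graph)
print(toolbar_scale(scores))
```

Because the export is a coarse, periodic snapshot of scores like these, it makes sense that it can lag behind what's used for ranking.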

Posted by Barry Schwartz at 8:18 AM | Permalink

September 27, 2006

John Battelle Talks With Matt Cutts & Nofollow Attribute The Same As Meta Robots Nofollow?

John Battelle has a short interview with Google spam fighter Matt Cutts. The most interesting part I found was news that the W3C has added a meta nofollow tag to their page with paid links, which Matt seems to say is the same as the completely different nofollow attribute, and thus acceptable for those selling links who fear the wrath of Google.

Let's back up. You can put a meta robots tag on your pages with the value of "nofollow," as described here. This tag, about 10 years old now, long predates any concerns about link selling skewing search results or the nofollow attribute. It is supposed to tell a search engine not to follow any links on a page, for purposes of indexing those links.

In other words, you've got a page with 20 links leading to other pages in your web site. Put nofollow into a meta robots tag, and you're telling the search engine not to follow the links on that page to those other pages.

An important note: just using nofollow doesn't protect those other pages from being indexed. If there are any other links pointing at them from anywhere on the web, search engines will follow through to them that way. So if you don't want them indexed, you need to use a meta noindex tag or robots.txt to specifically block them.
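For the robots.txt route, Python's standard `urllib.robotparser` can confirm what a set of rules blocks before you deploy it. Keep in mind that robots.txt stops compliant crawlers from fetching a page, while the noindex meta tag tells an engine not to index a page it has fetched. The rules and URLs below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: keep every crawler out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)
for path in ("/index.html", "/private/report.html"):
    url = "http://www.example.com" + path
    print(path, "allowed" if rules.can_fetch("Googlebot", url) else "blocked")
```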

Now on to the nofollow attribute. Created in January 2005, it was a way to flag particular links to search engines as those a site owner doesn't explicitly approve of. It was never defined as a means of telling search engines not to actually "follow" the link. It was more a way to say that you don't endorse the link. In fact, to my knowledge, Yahoo and perhaps others will still "click on" or follow links even if they make use of the nofollow attribute.
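The difference is easy to see in markup: the meta robots tag is one page-level switch covering every link, while `rel="nofollow"` sits on individual links. A sketch using Python's standard `html.parser` that records both for a page; the sample HTML is invented, and how each search engine treats the two remains up to the engine.

```python
from html.parser import HTMLParser


class NofollowAudit(HTMLParser):
    """Record each link and whether it is nofollowed per-link or page-wide."""

    def __init__(self):
        super().__init__()
        self.page_nofollow = False  # from <meta name="robots" content="...nofollow...">
        self.links = []             # (href, has rel="nofollow")

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            directives = [d.strip().lower()
                          for d in (attrs.get("content") or "").split(",")]
            if "nofollow" in directives:
                self.page_nofollow = True
        elif tag == "a" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            self.links.append((attrs["href"], "nofollow" in rel))


SAMPLE_HTML = """<html><head>
<meta name="robots" content="index, nofollow">
</head><body>
<a href="/about.html">About us</a>
<a href="http://sponsor.example.com/" rel="nofollow">Our sponsor</a>
</body></html>"""

audit = NofollowAudit()
audit.feed(SAMPLE_HTML)
audit.close()
for href, rel_nofollow in audit.links:
    print(href, "| rel=nofollow:", rel_nofollow, "| meta nofollow:", audit.page_nofollow)
```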

Now to the W3C. W3C Selling PageRank Or Thanking Supporters? covers how some have felt the W3C has effectively been selling links without using the nofollow attribute, which Matt Cutts in particular has urged link sellers to use, lest they potentially be penalized by Google.

In Matt's interview, we read that using nofollow in the meta robots tag might be seen as the same thing as a nofollow attribute, at least in Google's eyes. That's a completely new thing to me. I've commented on Matt's blog post about the interview, to see if he'll clarify more.

Aside from nofollow, the interview also gets into some interesting discussion of whether Google should do more to use humans in refining results.

Posted by Danny Sullivan at 7:42 AM | Permalink

September 25, 2006

Google Displaying Really Long Site Descriptions?

Philipp Lenssen spotted a screen capture of Google displaying a really long, extended description within the search results page for a search on [blogspot.com autoregistration]. I do not see the nine-or-so-line description myself. Matt Cutts of Google commented that he was also not able to "recreate those snippets," so maybe it's a temporary Google bug, spyware, or a fake?

Posted by Barry Schwartz at 12:40 PM | Permalink

September 22, 2006

Google On How To Let Googlebot In, Keep Bad Bots Out

One of the things that came out of our Bot Obedience Course at SES San Jose last month was a wish that search engines somehow made it possible for site owners to know they were sending "trusted" or "certified" spiders. Now Google's suggested one way this can be done.

Those blocking rogue spiders through IP filtering run the risk that they might accidentally keep some of the "good" bots out. If you don't know all the Google IP addresses, there's a chance you might reject a Google spider accidentally. That might cause your pages to be dropped from Google.

How to verify Googlebot from Matt Cutts at the Official Google Webmaster Central Blog covers a suggested technique to avoid this. Basically, all Google spiders will report they are from the googlebot.com domain. So do a reverse DNS lookup on the IP address. If it comes back as googlebot.com, then you're halfway there. Halfway? Yes, that's because people can lie about domain names. To avoid spoofers, you then have to do a forward lookup on the domain name you found to see if it resolves back to the original IP address.
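The two-step check can be sketched with Python's standard `socket` functions. The resolver functions are passed in as parameters so the logic can be tested without network access; the domain suffix follows the post's googlebot.com and is a parameter in case Google's published guidance changes.

```python
import socket


def is_googlebot(ip, suffixes=(".googlebot.com",),
                 reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Verify a claimed Googlebot IP with a reverse, then forward, DNS lookup."""
    # Step 1: reverse lookup (IP -> host name) must land in an allowed domain.
    try:
        host = reverse(ip)[0].lower().rstrip(".")
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    if not host.endswith(suffixes):
        return False
    # Step 2: forward lookup (host name -> IPs) must include the original IP,
    # because whoever controls an IP's reverse DNS can claim any host name.
    try:
        addresses = forward(host)[2]
    except OSError:
        return False
    return ip in addresses
```

Called as `is_googlebot(client_ip)` with the defaults, it performs real DNS lookups, so in practice you'd cache the answer per IP rather than resolve on every request.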

The blog post explains more, and it's going to make the most sense to tech-savvy webmasters that are implementing some type of IP filtering or blocking already. Not doing that? Then don't worry about this -- it's not really for you.

Down the line, perhaps we'll see solutions that need less technical know-how, for those sites getting slammed by bad bots but without IP filtering. But this is a great start for now.

Matt's also mentioned this on his personal blog, where people are commenting on the technique.

Posted by Danny Sullivan at 5:25 AM | Permalink

September 21, 2006

KinderStart Issues An Amended Complaint Against Google

Eric Goldman wrote that KinderStart has issued a 63-page second amended complaint against Google. KinderStart lost its first case against Google back in July of this year - that case was, in my opinion, ridiculous. This new complaint is even worse. The 43(B)log summarizes the complaints, calling many of them "incomprehensible." Eric Goldman says "I expect Google will file a motion to dismiss, which the judge will grant, at least in part (at minimum, to eliminate the Violation of Free Speech claim). I expect Google to go on the counter-offensive and renew its anti-SLAPP motions."

Posted by Barry Schwartz at 10:51 AM | Permalink

September 20, 2006

See Google Results As If You Are In Another Country

This morning at the Search Engine Roundtable, I reported that you can now easily Check Your Google Results in Any Country. How? Go to oy-oy.eu/google/world/ and select the locations you want to compare side by side. Danny and I tested this out, and it seems to be working well. Danny is in the UK and I am in the US; we both searched on [liar] at Google.com without being signed in to Google. I then compared the results Danny saw on his screen and the results I saw on mine against the tool's side-by-side comparison of the US and UK locations for the www.google.com data center. Our results matched the results at the tool.

US Results I saw at Google.com were: (1) www.number-10.gov.uk/output/Page4.asp (2) emperors-clothes.com/indict/liar.htm (3) www.imdb.com/title/tt0119528 (4) www.liar.be (5) www.iep.utm.edu/p/par-liar.htm (6) www.lyingliar.com (7) www.queendom.com/tests/minitests/fx/liar.html (8) stylusmagazine.com/stypod/archives/513 (9) www.allmarketersareliars.com (10) www.bigfatliarmovie.com

UK Results Danny saw at Google.com were: (1) www.number-10.gov.uk/output/Page4.asp (2) www.liar.be (3) www.imdb.com/title/tt0119528 (4) www.emperors-clothes.com/indict/liar.htm (5) www.iep.utm.edu/p/par-liar.htm (6) www.lyingliar.com (7) www.queendom.com/tests/minitests/fx/liar.html (8) www.met.police.uk/about/blair.htm (9) en.wikipedia.org/wiki/Liar_paradox (10) www.allmarketersareliars.com/

Danny's number eight result is very specific to the UK, you see a UK police commission page (link bomb?).
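Eyeballing two top-10 lists works, but a few lines of code make the differences explicit. The sketch below uses the two result sets above; treating a leading "www." and a trailing "/" as insignificant is an assumption made so that the same page listed two ways still matches.

```python
US_RESULTS = [
    "www.number-10.gov.uk/output/Page4.asp",
    "emperors-clothes.com/indict/liar.htm",
    "www.imdb.com/title/tt0119528",
    "www.liar.be",
    "www.iep.utm.edu/p/par-liar.htm",
    "www.lyingliar.com",
    "www.queendom.com/tests/minitests/fx/liar.html",
    "stylusmagazine.com/stypod/archives/513",
    "www.allmarketersareliars.com",
    "www.bigfatliarmovie.com",
]
UK_RESULTS = [
    "www.number-10.gov.uk/output/Page4.asp",
    "www.liar.be",
    "www.imdb.com/title/tt0119528",
    "www.emperors-clothes.com/indict/liar.htm",
    "www.iep.utm.edu/p/par-liar.htm",
    "www.lyingliar.com",
    "www.queendom.com/tests/minitests/fx/liar.html",
    "www.met.police.uk/about/blair.htm",
    "en.wikipedia.org/wiki/Liar_paradox",
    "www.allmarketersareliars.com/",
]


def normalize(url):
    """Assumption: a leading 'www.' and a trailing '/' don't change the page."""
    url = url.rstrip("/")
    return url[4:] if url.startswith("www.") else url


def compare_serps(a, b):
    """Return (only in a, only in b, {url: (rank in a, rank in b)} for moved URLs)."""
    rank_a = {normalize(u): i for i, u in enumerate(a, start=1)}
    rank_b = {normalize(u): i for i, u in enumerate(b, start=1)}
    only_a = sorted(rank_a.keys() - rank_b.keys(), key=rank_a.get)
    only_b = sorted(rank_b.keys() - rank_a.keys(), key=rank_b.get)
    moved = {u: (rank_a[u], rank_b[u]) for u in rank_a
             if u in rank_b and rank_a[u] != rank_b[u]}
    return only_a, only_b, moved


only_us, only_uk, moved = compare_serps(US_RESULTS, UK_RESULTS)
print("Only US:", only_us)
print("Only UK:", only_uk)
print("Moved:", moved)
```

Run on these lists, it shows eight shared pages, two unique to each country (including the UK police page at #8) and three pages that swapped or slipped a position.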

So now we have a tool for checking Google's local organic result sets. We also have a way to preview AdWords ads by geo-specific criteria.

Posted by Barry Schwartz at 9:55 AM | Permalink

Google Webmaster Central's Vanessa Fox & Amanda Camp Interviewed

Seattle 24x7 has an excellent conversation with Vanessa Fox and Amanda Camp of Google about Google Webmaster Central and working at Google. Both began working at Google in Seattle in April 2005. They discuss the conception of Google Webmaster Central (also known as Google Sitemaps). The discussion also goes into Google's 20% time and recruiting women to Google. You can also see a picture of the "Seattle's Sisters of Search."

Posted by Barry Schwartz at 9:02 AM | Permalink

September 19, 2006

URL Vs. Navigational Queries Explained: AKA, Why Did URL Searches At Google Change?

Matt Cutts from Google has a great follow up on our reports that Google Modifies Navigational Search Results from about two-plus weeks ago. In his post, he explains that when you search on a URL (i.e. www.searchenginewatch.com), Google has stopped showing the information for the URL and now shows a standard search on the words in the URL itself. I learned two things from Matt's post.

(1) Entering the URL of a site into a search box is not labeled a "navigational search"; it is labeled a "URL search." A navigational search is a search on a company name (e.g., Search Engine Watch), whereas a URL search is a search on a company URL (e.g., searchenginewatch.com).

(2) In Matt's opinion, bringing back search results for a URL search is more useful to a normal user than bringing back information on that URL. SEOs and webmasters who want to pull that information can still do so with the info:www.domain.com command. For this blog, [info:blog.searchenginewatch.com] shows you the information for this URL.

Posted by Barry Schwartz at 8:51 AM | Permalink

September 14, 2006

YAT21L: Matt Cutts On SEO

The only problem with Matt Cutts blogging is that it gets harder and harder to remember all the great nuggets he gives out. I want a chart! Maybe I'll make one. In the meantime, 21 Great SEO Tips From Google's Matt Cutts from SEO Egghead is a nice reference list of Cuttsian proclamations and wisdom over time. Spotted via SEO Black Hat, which notes not everyone might believe what Matt says. True, but it's still nice to know what he said, even if you want to view it with a critical eye. Oh, that acronym? Tell you what: one good bit of link bait deserves another. Link to this post if you think you know what it means. The first person I spot with the right answer, well, I'll link to your explanation.

Postscript: And Razvan wins, first link with the right answer I've seen. Check out the explanation over there!

Posted by Danny Sullivan at 9:07 AM | Permalink

September 12, 2006

Search Bugs At Yahoo & Google

In the past twenty-four hours I have discovered and documented four different bugs or weird occurrences at Yahoo and Google. I will cover all four: adult ads displaying in Google's and Yahoo's contextual programs, Yahoo's contextual ads not displaying at all, Google's site operator not functioning properly and Google's AdWords statistics not showing the right data.

(1) The first I named Adult Ads Displayed Within Google's AdSense Program?, but it actually affects Yahoo as well. Basically, some people found Google AdSense ads that displayed adult-oriented content, something that should not happen on AdSense. What was amusing was that after I posted this article, people noticed that when Yahoo Publisher Network ads showed up, they were showing adult ads as well.

(2) If you were not able to load the ads in example one, then it may be because the Yahoo Publisher Network Ads Still Have Accessibility Issues, even after I reported it last Friday. Basically, some ISPs are not able to resolve the DNS information that hosts those Yahoo ads. This was first documented on August 31st, then acknowledged by a Yahoo representative on September 7th and is still an issue today.

(3) The next issue is that Google's Site Operator Shows Sites Off Domain. It does: I have seen screen captures myself showing someone searching with the site: command and Google returning results from sites off that domain. I have pictures and more details at the Search Engine Roundtable.

(4) The final bug I found today is AdWords Statistics Mixing the Search & Content Network. Say you are an advertiser and you set a campaign to run only on Google's content network; for some reason, your AdWords stats show impressions and clicks for that campaign in the search network. Google has confirmed this bug but described it as a tiny problem.

There you go, four bugs documented in the past twenty-four hours.
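If you want to check a site: result set for the off-domain leak described in bug (3), the test is whether each result's host is the domain itself or one of its subdomains. A minimal sketch; the result URLs are hypothetical and must include the scheme (http://) for the host to be parsed.

```python
from urllib.parse import urlparse


def off_domain_results(domain, result_urls):
    """Return result URLs whose host is neither `domain` nor a subdomain of it."""
    domain = domain.lower().strip(".")
    off = []
    for url in result_urls:
        host = urlparse(url).hostname or ""  # .hostname is already lowercased
        # Require an exact match or a dot boundary, so that
        # "notsearchenginewatch.com" doesn't pass as "searchenginewatch.com".
        if host != domain and not host.endswith("." + domain):
            off.append(url)
    return off


results = [
    "http://searchenginewatch.com/",
    "http://blog.searchenginewatch.com/blog/",
    "http://www.notsearchenginewatch.com/",
    "http://www.example.com/page.html",
]
print(off_domain_results("searchenginewatch.com", results))
```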

Posted by Barry Schwartz at 9:00 AM | Permalink

September 8, 2006

Google Sitelinks: New Name For Those Links Under The Top Listings

Last year, people started to notice that when a site ranked first in a Google search, Google began showing "subtopic" or "subcategory" links below its listing. Now, Google has finally confirmed the change as a permanent feature and given it a name: sitelinks.

Here's an example of sitelinks in action, which I tapped into doing a search for HP:

Notice how under the first result, there are a number of sub-listings, such as:

Software & Driver Downloads - http://welcome.hp.com/.../us/en/support.html
Contact HP - http://welcome.hp.com/country/us/en/contact_us.html
Jobs - http://www.jobs.hp.com/
Small & Medium Business - http://www.hp.com/sbso/

Those are sitelinks, now officially named in the Information about Sitelinks post at the Official Google Webmaster Central Blog. Do the search, and you'll see only the first result gets them. (You'll also notice that Google is failing to remove the second www.hp.com listing as it should; only one major listing per web site per results page should be showing. I think there's a bug at the moment.)

How do you get sitelinks? You have to be in the first position for a search. Beyond that, Google doesn't explain whether they'll show up or, if so, exactly which ones will. Here's the new help page on that.

Want some clues? Try looking at our Google Web Site Categories Explored post from earlier this year.

Want to comment or discuss? Visit our Getting New Sitelinks Under Your Top Listing At Google thread at the Search Engine Watch Forums.

Posted by Danny Sullivan at 7:02 AM | Permalink

September 5, 2006

Google Updates Terminology Of Last Visit Date In Cache Results

Vanessa Fox posted an update at the Google Webmaster Central Blog on what the date and time displayed on the Google Cache page really mean. Technically, the date shows the last time Google "retrieved" data from the page. If your page hasn't been updated, and Google visits it and sees that it hasn't changed, Google will not retrieve any new information from that page, and it won't update the date displayed on the cache page. Here is an example: look carefully at the date on the Google cache page for Search Engine Watch. Right now it reads, "This is G o o g l e's cache of http://searchenginewatch.com/ as retrieved on Sep 1, 2006 08:07:09 GMT." Then compare it to the date of the last article posted; the two should be within a few days of each other, since Google crawls this site frequently.
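To make the distinction concrete, here is a minimal Python sketch of a cache whose "retrieved on" date only moves when new data is actually fetched. It assumes the crawler uses an HTTP conditional request (If-Modified-Since, answered with 304 Not Modified when nothing has changed); Vanessa's post doesn't say how Google detects an unchanged page, and the `Page`, `CacheEntry`, `serve` and `crawl` names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class Page:
    body: str
    last_modified: datetime


@dataclass
class CacheEntry:
    body: str
    retrieved_on: datetime  # the date shown on the cache page


def serve(page: Page, if_modified_since: Optional[datetime]) -> Tuple[int, Optional[str]]:
    """Toy web server: answer 304 Not Modified if the page hasn't changed."""
    if if_modified_since is not None and page.last_modified <= if_modified_since:
        return 304, None
    return 200, page.body


def crawl(page: Page, cached: Optional[CacheEntry], now: datetime) -> CacheEntry:
    """Revisit a page (hypothetical model); only a fresh retrieval updates the cache date."""
    since = cached.retrieved_on if cached else None
    status, body = serve(page, since)
    if status == 304 and cached is not None:
        # Nothing new was retrieved, so the "retrieved on" date stays put.
        return cached
    return CacheEntry(body=body, retrieved_on=now)


if __name__ == "__main__":
    page = Page("v1", datetime(2006, 9, 1, 6, 0))
    first = crawl(page, None, datetime(2006, 9, 1, 8, 7, 9))
    second = crawl(page, first, datetime(2006, 9, 4))  # page unchanged since last visit
    print(second.retrieved_on)  # still 2006-09-01 08:07:09
```

In other words, a visit from Googlebot and a new cache date aren't the same thing: the date only changes when something new comes back.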

Posted by Barry Schwartz at 11:58 AM | Permalink

September 1, 2006

Google Modifies Navigational Search Results

I reported this morning that Google has changed the way it handles navigational searches. For example, if you search on a site's name (i.e., a navigational query), you now get a different type of result set than you did a week or so ago.

For example, a search on the popular buy.com will no longer show:

  • Show Google's cache of www.buy.com
  • Find web pages that are similar to www.buy.com
  • Find web pages that link to www.buy.com
  • Find web pages from the site www.buy.com
  • Find web pages that contain the term "www.buy.com"

Instead it will show you results that match the keyword phrase "buy.com." That includes links to possible competitors. I wonder if that will upset geico.com?

In any event, I have compared how Google, Yahoo, MSN and Ask.com handle these types of navigational queries at the Search Engine Roundtable.

Posted by Barry Schwartz at 9:27 AM | Permalink

How Google Handles Accented Characters

Last night WebmasterRadio.FM aired a show with Vanessa Fox and Matt Cutts of Google, where they covered a lot of good ground, including how Google handles accented characters (see archived MP3 here). Also last night, Vanessa posted a more detailed explanation: a search for Mexico will return results for both "Mexico" and "México," and likewise, if a searcher enters México, Google will return results for both "Mexico" and "México." The ranking order clearly differs between the two searches, but what drives that difference is, to me, not clear from the post. Vanessa also explains what triggers a different interface language and how to restrict search results with the plus sign operator.
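For illustration only, here is a rough Python sketch of accent-insensitive matching: decompose the text with Unicode NFD, drop the combining accent marks, and treat a leading plus sign as a request for the exact form. This is an assumption about the general technique, not Google's actual implementation, and `fold_accents` and `matches` are hypothetical names.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Lowercase and drop accent marks, so 'México' folds to 'mexico'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def matches(query_term: str, doc_word: str) -> bool:
    """Accent-insensitive match, unless the term starts with '+' (exact form only)."""
    if query_term.startswith("+"):
        return query_term[1:].lower() == doc_word.lower()
    return fold_accents(query_term) == fold_accents(doc_word)


if __name__ == "__main__":
    for term in ("Mexico", "México", "+México"):
        print(term, [w for w in ("Mexico", "México") if matches(term, w)])
```

Folding both sides explains why either spelling finds both forms; it says nothing about ranking order, which is the part the post leaves unexplained.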

Posted by Barry Schwartz at 8:55 AM | Permalink

August 31, 2006

Keywords In URL May Help Rankings, Google's Matt Cutts Says

The hotly debated SEO question of whether having keywords in your file names helps your rankings will probably start all over again. Matt Cutts of Google wrote at his blog, and I quote:

Most bloggy sites tend to have words from the title of a post in the url; having keywords from the post title in the url also can help search engines judge the quality of a page.

Did Matt just say that keywords repeated from the post title in the URL can help "search engines judge the quality of a page"? Honestly, I have been listening to Matt talk in person, on radio, on video and in forums for a long, long time now, and I have never seen him come out and say this. So what does that mean? Why would he come out and say this? I personally always felt that keywords in the URL help a bit, but how much was always my question, so I always tried to code client sites with the keywords from the title in the URL. But for Matt to point that out specifically, does that mean it is more important than I, or you, originally thought?
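If you want to put the title's keywords in your own URLs, here is a minimal Python sketch of turning a post title into a URL slug, the way many blog platforms do. The `slugify` name, the cleanup rules (drop apostrophes, strip accents, join lowercase words with hyphens) and the /2006/08/ path format are my own assumptions, not anything Matt prescribed.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a post title into a lowercase, hyphen-separated URL slug."""
    # Drop apostrophes so "Google's" becomes "googles" rather than "google-s".
    title = re.sub(r"['\u2019]", "", title)
    # Strip accents: "México" -> "Mexico".
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Keep runs of letters and digits; punctuation and spaces become separators.
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    return "-".join(words)


if __name__ == "__main__":
    title = "Keywords In URL May Help Rankings, Google's Matt Cutts Says"
    print(f"/2006/08/{slugify(title)}.html")
    # prints /2006/08/keywords-in-url-may-help-rankings-googles-matt-cutts-says.html
```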

It's up for debate, and where do debates work better than in a discussion forum? Discuss it at the Search Engine Watch Forums.

Posted by Barry Schwartz at 9:33 AM | Permalink

August 30, 2006

Matt Cutts & Vanessa Fox On WebmasterRadio.FM Thursday At 1PM (PST)

Vanessa Fox just posted at the Google Webmaster Central Blog that she and Matt Cutts of Google will be live on WebmasterRadio's GoodKarma show with GoodROI (Greg Niland). You'll be able to tune in tomorrow, Thursday, at 1pm (PST) to hear Matt and Vanessa talk shop. Tuning instructions are at WebmasterRadio.FM.

Posted by Barry Schwartz at 11:19 AM | Permalink

August 28, 2006

Google, Favored Sites, and Editorial Opinion

There's a lot of blog coverage of a new patent from Google, System and method for supporting editorial opinion in the ranking of search results, which was originally filed with the US Patent Office in December of 2000. I'm seeing a lot of questions related to the patent...

  • Is a reranking of sites going on in Google search results based upon a favored or non-favored status?
  • Do search engineers at Google manually decide that some sites are more appropriate for certain queries than others?
  • Is inclusion in the Yahoo! Directory or the DMOZ a way to become a favored site?
  • Does this patent describe Google Co-op?
  • Who are the "editors" that this patent talks about when stating that editors may determine query themes and a favored or non-favored status for a site?
  • Is Google using the process described in this patent?

Steve Bryant surmises over at Google Watch that the New Google Patent Hints at Direction of Social Search. Rand Fishkin notes that the patent may be an indication that Google is looking at the "quality of pages," and points out in Favored vs. Non-Favored Sources that the patent mentions ranking "sites" instead of "pages." I tried to break down the language of the patent into some easier-to-digest pieces at SEO by the Sea, in Google looks at Query Themes and Reranking Based upon Editorial Opinion.

A good number of white papers and patent applications published since the filing of this patent have looked at user queries and user behavior in fairly complex ways, such as chaining user queries together in sessions to identify user intent, and exploring how a searcher interacts with search results. It's imaginable that if Google has adopted something like what is described in this patent, that decisions regarding query themes and favored status are based on much more than a simple thumbs up or thumbs down.
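To picture what "reranking based upon editorial opinion" could look like in its simplest form, here is a toy Python sketch: detect a query theme, look up an editor-assigned favored or non-favored status for each site within that theme, and boost or demote its score accordingly. The theme table, keyword lookup, boost and penalty multipliers, and all site names are hypothetical; this is not the patent's actual scoring method.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical editor-maintained table: query theme -> {site: status}.
EDITORIAL_OPINIONS: Dict[str, Dict[str, str]] = {
    "health": {
        "trusted-clinic.example": "favored",
        "pill-spam.example": "non-favored",
    },
}

# Hypothetical keyword -> theme lookup used to classify queries.
THEME_KEYWORDS: Dict[str, str] = {"flu": "health", "diet": "health", "vitamins": "health"}

FAVORED_BOOST = 1.5        # hypothetical multiplier
NON_FAVORED_PENALTY = 0.5  # hypothetical multiplier


@dataclass
class Result:
    site: str
    score: float  # the original, pre-editorial ranking score


def query_theme(query: str) -> Optional[str]:
    """Return the first theme whose keyword appears in the query, if any."""
    for word in query.lower().split():
        if word in THEME_KEYWORDS:
            return THEME_KEYWORDS[word]
    return None


def rerank(query: str, results: List[Result]) -> List[Result]:
    """Reorder results using editorial status for the query's theme."""
    theme = query_theme(query)
    opinions = EDITORIAL_OPINIONS.get(theme, {}) if theme else {}

    def adjusted(result: Result) -> float:
        status = opinions.get(result.site)
        if status == "favored":
            return result.score * FAVORED_BOOST
        if status == "non-favored":
            return result.score * NON_FAVORED_PENALTY
        return result.score  # no editorial opinion: score unchanged

    return sorted(results, key=adjusted, reverse=True)
```

As the paragraph above suggests, a real system would likely derive themes and status from far richer signals than a keyword table and a thumbs up or down.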

Posted by Bill Slawski at 8:05 PM | Permalink

August 25, 2006

Google, Yahoo, & MSN Update Search Results

All the major search engines have reportedly been updating their indexes in some way. Google is updating backlinks at some of its data centers. Yahoo has reportedly updated its algorithm or index, although there is no official word from Yahoo on this as of yet. And MSN Search has confirmed a recent update to its index. While Google's update may not yet be reflected in the search results, Yahoo's and MSN's updates have reportedly changed the results. Whether for better or worse is in the eye of the beholder.

Posted by Barry Schwartz at 9:52 AM | Permalink

August 18, 2006

Google Data Refresh: More Supplemental Results?

Wednesday night and Thursday morning, forum threads started popping up about a Google "data refresh" taking place. A data refresh is like a small Google update, and many webmasters have noticed a change in the search results at Google. Google has not yet confirmed an update, and there hasn't been a ton of discussion on the topic as of yet. That is why I believe this is a "data refresh" and not a full-fledged algorithmic change. Part of the data refresh seems to have pushed many pages into the supplemental index, an index no webmaster wants their active pages in. As with any update, some feel the index and results have improved, while others feel they have suffered. I have more specific details at the Search Engine Roundtable for those who are interested.

Posted by Barry Schwartz at 8:02 AM | Permalink

August 8, 2006

More SEO Video 'Cutts' By Matt

Matt Cutts at Google has posted a few more videos with Google SEO tips for us. Here they are:

  • Session 11: Reinclusion requests
  • Session 12: Tips for Search Engine Strategies (SES) San Jose 2006
  • Session 13: Google Webmaster Tools

Posted by Barry Schwartz at 11:10 AM | Permalink

Google Supplemental Results Get Fresher

I reported this morning at the Search Engine Roundtable that GoogleGuy announced Google has updated those pesky supplemental results. Supplemental results are pages in a secondary index at Google. Pages in the supplemental results tend to be staler and rank worse than the normal documents in the main Google index. In any event, the supplemental results have been updated and should be somewhat fresher.

Posted by Barry Schwartz at 11:03 AM | Permalink

August 2, 2006

More Matt Google SEO Videos: Google Terminology & More

Matt Cutts from Google released two more videos, as part of his SEO questions on video collection. These two new videos are:

  • Does Webspam use Google Analytics? (5 minutes and 11 seconds)
      - Does Google Analytics play a part in SERPs (Search Engine Result Pages)?
      - When does Google detect duplicate content, and how wide is the range?
      - I want to mark my page as porn in SafeSearch. What do you recommend?
      - Is it okay to make hyperlinks in option elements?
  • Google Terminology (4 minutes and 40 seconds)
      - What's the difference between an index update, an algorithm update, and a data refresh?
      - I also discuss these definitions in terms of June/July 27th as much as I can.

Postscript From Danny: On Google Analytics, Matt says that this data is not used for web spam detection purposes and, to the best of his knowledge, not by others in Google either. However, I recently asked Google for clarification on this, in the wake of it NOT excluding the possibility that Google Checkout data might be used for a variety of purposes. I've yet to get a response back. At this point, my assumption is that Google is probably using things like Google Toolbar data and Google Analytics data in ways beyond just reporting information back to individual users.

Posted by Barry Schwartz at 10:34 AM | Permalink

August 1, 2006

Google's Matt Cutts Answers Questions On Google Video

Matt Cutts has released six video sessions at his blog over the past two days. They all answer questions sent to him on the topic of search engine optimization. Most of the videos are about five minutes long, and you don't necessarily have to watch Matt talk; you can just listen (there is not much going on in the background). Here is a breakdown of the video SEO sessions:

Session 1: Quality of a Good Site, 5 minutes and 40 seconds. "Matt Cutts answers Google questions:
  - Does Sitemaps depend on pageviews?
  - What are the top things to do in SEO?
  - Should I use bold or strong tags?"

Session 2: Myths, Large Site Launches, and Google Images, 4 minutes 10 seconds. "Matt Cutts answers Google questions:
  - Myths: 1) sites on the same server, 2) IP, or 3) including off-domain JavaScript
  - Launching sites with millions of pages: how should I do it best?
  - Google images: updates on the horizon, and current Google Images technology."

Session 3: Optimize for Search Engines or Users?, 4 minutes and 25 seconds. "Matt Cutts answers Google questions:
  - Which is more important: search engine optimization (SEO) or end user optimization?
  - What spam detection tools would you recommend?
  - Does cleanliness of code (W3C) help at all?"

Session 4: Static vs. Dynamic urls, 4 minutes and 30 seconds. "Matt Cutts answers Google questions:
  - Static vs. Dynamic urls: does PageRank flow the same to both? What pitfalls should I avoid with dynamic urls?
  - Can Sitemaps alert webmasters when their site has been hacked?
  - Can I do geotargeting within Google's Quality Guidelines?"

Session 5: How to structure a site?, 4 minutes and 46 seconds. "Matt Cutts answers Google questions:
  - Merging acquired domains with 301s?
  - How to create a site architecture with themes and keywords?
  - My urls have too many parameters--can I serve up static HTML to Googlebot instead?
  - How to do split A/B testing?"

Session 6: Supplemental Results, 4 minutes and 12 seconds. "Matt Cutts answers Google questions:
  - Supplemental Results: Should I worry about results estimates for 1) supplemental results, 2) using the site: operator, 3) with negated terms and 4) special syntax such as intitle:? Answer: No. That's pretty far off the beaten path.
  - Why do 301s take so long to be reflected in supplemental results? It's been months.
  - I started appearing in the supplemental results in May--should I be worried?"

Great job, Matt. This is really appreciated by the SEOs and SEMs here.

Posted by Barry Schwartz at 9:28 AM | Permalink

July 27, 2006

Changes Spotted In Google Search Results

I reported this morning at the Search Engine Roundtable that Google's search results are shifting again. What folks in the forums are finding is that some, but not all, of the pre-June 27th results are coming back to the way they were. They are also finding that the Google site: command search is working again on the datacenters that have the new results. There is a lot of commotion in the forums about these changes, which began this morning.

Posted by Barry Schwartz at 11:49 AM | Permalink

July 21, 2006

Site Diagnostics Tab Added to Google AdSense Console

Google has added a new tab, one it has been beta testing for a couple of months, named Site Diagnostics. This tool shows you which pages the AdSense crawler is having problems getting to. Why would the crawler have a problem getting to those pages? Possible reasons include a robots.txt file blocking them, password-protected pages, a server that is down or slow, and others explained in the AdSense help pages.
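If you suspect robots.txt is the culprit, Python's standard `urllib.robotparser` module can show whether a given file blocks the AdSense crawler, which identifies itself as Mediapartners-Google. This is a minimal sketch; the robots.txt contents, the `crawler_allowed` helper and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

ADSENSE_CRAWLER = "Mediapartners-Google"

# Blocks every crawler, including the AdSense one, from /members/.
BLOCKS_EVERYONE = """\
User-agent: *
Disallow: /members/
"""

# Same rule for everyone else, but an empty Disallow lets the AdSense crawler in.
LETS_ADSENSE_IN = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /members/
"""


def crawler_allowed(robots_txt: str, url: str, agent: str = ADSENSE_CRAWLER) -> bool:
    """Return True if robots_txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://example.com/members/page.html"
    print(crawler_allowed(BLOCKS_EVERYONE, url))  # False: AdSense can't see this page
    print(crawler_allowed(LETS_ADSENSE_IN, url))  # True: AdSense gets its own group
```

Giving the AdSense crawler its own User-agent group is one way to keep other bots out of a section while still letting AdSense read it to target ads.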

I have posted screen captures at the Search Engine Roundtable.

Posted by Barry Schwartz at 8:37 AM | Permalink

July 19, 2006

What Do Google's Leaked Spidering & Ranking Factors Mean?

Earlier this month we reported on Google's Spam Score Details, but no one knew exactly what they meant. Todd Malicoat took a peek into the Google algorithm a week ago with his best guesses on what each value represents. I am not going to go through all the details, but Todd offers smart, detailed explanations for each value listed in the spam score. Re-inspired by Rand.

Posted by Barry Schwartz at 8:57 AM | Permalink

July 14, 2006

KinderStart Becomes KinderStopped In Ranking Lawsuit Against Google

KinderStart has lost its case over lost rankings on Google, though the company will be allowed to amend defamation claims relating to its PageRank zero score. If it does so by September 29, I suspect that reattempt will go down in flames as well. But the entire case exposes vulnerabilities Google has created for itself with mixed messages over how keyword ranking and PageRank work.

Google Sued Over Site Penalty By KinderStart.com covers the filing of the case back in March and provides a link to the actual suit. It was heard in court earlier this month, and you can review the transcript and analysis of that hearing.

Judge dismisses suit over Google ranking from News.com covers yesterday's ruling, where the claims against Google were dismissed. The judge gave KinderStart leave to revise some claims, apparently in particular the claim that KinderStart was defamed by being dropped to a PageRank of zero as reported by the Google Toolbar.

KinderStart now apparently hopes it can enlist other PR0 sites to file a class action lawsuit against Google (info is supposed to be here, but site is currently down). The KinderStart attorney said:

"The decision suggests that, if properly alleged, Google may be defaming a whole class of Web sites sacked with a '0' PageRank," he wrote in a statement. "If plaintiffs show Google manually tampered with even a single Web site's PageRank, Google's entire claim of 'objectivity' of search results and rankings could collapse."

Sure. Fire away with that class action suit. Two class action suits over click fraud, where the plaintiffs had real monetary claims arising out of actual contracts with the major search engines, have netted around $60 million for advertisers for over four years' worth of advertising activity. Even assuming a somewhat nebulous defamation claim succeeded, I can't imagine the settlement would be for much.

Keep in mind that, to my knowledge, the PageRank meter is still not turned on by default. Toolbar users have to specifically enable it. I've never seen any stats or breakdowns on who uses the PR meter, but its users seem to be mainly site owners concerned about SEO, rather than typical web surfers.

Still, the case highlights a Google vulnerability. Google has argued in this case that ranking is subjective, an opinion that it offers about web sites. But go to its technology page, and you get this:

PageRank Technology: PageRank performs an objective measurement of the importance of web pages by solving an equation of more than 500 million variables and 2 billion terms. Instead of counting direct links, PageRank interprets a link from Page A to Page B as a vote for Page B by Page A. PageRank then assesses a page's importance by the number of votes it receives. PageRank also considers the importance of each page that casts a vote, as votes from some pages are considered to have greater value, thus giving the linked page greater value. Important pages receive a higher PageRank and appear at the top of the search results. Google's technology uses the collective intelligence of the web to determine a page's importance. There is no human involvement or manipulation of results, which is why users have come to trust Google as a source of objective information untainted by paid placement.

So which is it: objective, subjective, or whatever is most convenient to argue, as John Battelle raised earlier? To me, the answer gets muddied by Google's outdated information online plus confusion between PageRank and ranking.

Ranking, or keyword ranking, is where a site appears in response to a keyword search. It's supposed to be an objective decision made by a computer algorithm sorting through factors, though what isn't said is that some of those factors might involve subjective decisions.

PageRank is a numeric score that counts how important a page is based on analyzing the links pointing to it. It is one of many factors that Google uses to decide where a page should appear when you do a keyword search. In other words, PageRank is part of what determines keyword ranking, but it's not the only factor, nor is it the same as keyword ranking.
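For the curious, here is a minimal Python sketch of the textbook PageRank calculation (power iteration with the commonly cited 0.85 damping factor), where each page splits its score evenly among the pages it links to. It's a simplified illustration of the "votes" idea, not Google's production code, and the three-page link graph is hypothetical.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Textbook PageRank by power iteration; scores sum to 1."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random surfer" jump).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link is a vote; a page's votes are split across its outlinks,
                # so votes from high-scoring pages carry more weight.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page with no outlinks: spread its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    web = {"A": ["C"], "B": ["C"], "C": ["A"]}
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

Note that nothing in this calculation looks at the words on a page or in a query, which is exactly why PageRank alone can't be the same thing as keyword ranking.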

But doesn't Google say that pages with a higher PageRank appear at the top of the search results? Yes, and it says this incorrectly. That's right, Google's statement on this is flat out wrong. Wrong, wrong, wrong. Wrong. WRONG.

Am I clear enough? But how can I say Google's official information is wrong? First, I can demonstrate it, as I've done before. Try this tool. Here's a search for cars. Notice how the movie Cars is ranked second. The home page for that site is a PR5, yet it ranks above several pages below it that have higher PR scores. Got Firefox? Try Aaron Wall's new tool, which makes seeing this type of thing even easier. End Of Demonstration.
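To show how a PR5 page can outrank PR7 and PR8 pages, here is a toy Python ranking that blends a query-relevance score with the Toolbar PageRank value. The 70/30 weights, the relevance numbers and the URLs are all invented for illustration; Google's real factors and weights aren't public.

```python
from dataclasses import dataclass
from typing import List

RELEVANCE_WEIGHT = 0.7  # hypothetical
PAGERANK_WEIGHT = 0.3   # hypothetical


@dataclass
class Page:
    url: str
    relevance: float  # hypothetical 0-1 measure of how well the page matches the query
    toolbar_pr: int   # 0-10, as the Toolbar meter would display it


def keyword_rank(pages: List[Page]) -> List[Page]:
    """Order pages by a blend of relevance and PageRank, not PageRank alone."""
    def score(page: Page) -> float:
        return RELEVANCE_WEIGHT * page.relevance + PAGERANK_WEIGHT * page.toolbar_pr / 10
    return sorted(pages, key=score, reverse=True)


if __name__ == "__main__":
    pages = [
        Page("high-pr-a.example", relevance=0.5, toolbar_pr=7),
        Page("high-pr-b.example", relevance=0.4, toolbar_pr=8),
        Page("movie-site.example", relevance=0.95, toolbar_pr=5),
    ]
    for page in keyword_rank(pages):
        print(page.toolbar_pr, page.url)
```

Any scheme where PageRank is one input among several will produce orderings like this, which is the point: a higher PR score does not guarantee a higher position.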

Google has said plenty of things publicly that have become outdated like this, or that weren't explained properly by those charged with writing the copy. In particular, Google has allowed PageRank to become a synonym for how a site ranks. You can see how this makes life confusing from the first paragraph of the News.com story about the case:

KinderStart, a directory and search engine for information related to children, sued Google in March after it fell to a "zero" ranking in the Google index.

Actually, I believe that two different things happened. KinderStart:

  • No longer had good keyword rankings: it wasn't on the first page of results, though it was perhaps still buried further down, unless it was banned completely. And if it was banned completely, that's not a "zero" rank; that's simply a ban.
  • Probably had a penalty put on it manually that produced a zero score in the PageRank meter.

The judge does not seem to be saying Google defamed the site through a lower keyword ranking. But he does seem to suggest that the PageRank score in the Google Toolbar meter might have that issue. From Eric Goldman's nice write-up on the case (he has a copy of the ruling there, as well):

Google's statement as to whether a particular website is "worth your time" necessarily reflects its subjective judgment as to what factors make a website important. Viewed in this way, a PageRank reflects Google's opinion. However, it is possible a PageRank reasonably could be interpreted as a factual statement insofar as it purports to tell a user "how Google's algorithms assess the importance of the page you're viewing." This interpretation would be bolstered by evidence supporting Google's alleged representations that PageRank is "objective," and that a reasonable person thus might understand Google's display of a "0" PageRank for Kinderstart.com to be a statement that "0" is the (unmodified) output of Google's algorithm. If it could be shown, as Kinderstart alleges, that Google is changing that output by manual intervention, then such a statement might be provably false.

I'm actually surprised the judge doesn't seem to know that Google does indeed change that output by manual intervention. That's what the entire SearchKing case was about. First, some background on that:

The case involved another US District Court judge ruling that yes, Google had manipulated the PageRank score showing for SearchKing and that it had a constitutionally protected right to do so, to offer its opinion this way.

Of course, the ruling confuses PageRank and keyword ranking, which, as I've explained above, often happens:

PageRanks are opinions -- opinions of the significance of particular Web sites as they correspond to a search query.

Still, since the case was indeed focused on the PageRank meter, I suspect we're safe in concluding this was about PageRank scores getting protected status. And what the KinderStart case now tells us is that Google (and other search engines) also have the right to do keyword rankings however they like.

We'll see if the PageRank scores get challenged again. Certainly Google could short-circuit this by dropping the scores and the meter altogether (please do it). As explained, few people to my knowledge use them, and plenty of site owners are tired of newbie search marketers obsessing over them. PageRank was mainly a marketing tactic for Google that's long since been blowing up in its face.

If the meter doesn't go away, Google certainly needs to take a harder look at what it says about both the Google Toolbar and keyword rankings if it doesn't want to be vulnerable in future court cases (and simply to be consistent with the public).

For example, here's what a site owner is told about a PR0 score:

A page may be assigned a rank of zero if Google crawls very few sites that link to it. Additionally, pages recently added to the Google index may also show a PageRank score of zero because they haven't been crawled by Googlebot yet and haven't been ranked. A page's PageRank score may increase naturally with subsequent crawls, so this shouldn't be a cause for concern. To learn more about PageRank, please see http://www.google.com/technology/index.html

There's no mention of the fact that you might have a PR0 score because Google has manually intervened to reduce it. And as for what it tells the general public:

Wondering whether a new website is worth your time? Use the Toolbar's PageRank™ display to tell you how Google's algorithms assess the importance of the page you're viewing.

Again, it's more than just the algorithms being involved. Humans are making decisions that impact that score, as well.

In short, Google continues to tell the public that PageRank is objective, but in two court cases now, it has said the scores are subjective. One case has supported its right to make subjective judgments. The other has supported a plaintiff's right to challenge whether those subjective opinions are fair or defamatory. We'll see what happens next.

Finally, human intervention in PageRank scores revives the question of Google's long-standing claim that there's no human intervention in keyword ranking, as it used to say about censorship:

Google does not censor results for any search term. The order and content of our results are completely automated; we do not manipulate our search results by hand.

And similar to what they still say here:

Sites' positions in our search results are determined automatically based on a number of factors, which are explained in more detail at http://www.google.com/technology/index.html. We don't manually assign keywords to sites, nor do we manipulate the ranking of any site in our search results.

In general, webmasters can improve the rank of their sites by increasing the number of high-quality sites that link to their pages. For more information about improving your site's visibility in the Google search results, we recommend reviewing our webmaster guidelines. They outline core concepts for maintaining a Google-friendly website.

As I've written before, Google does indeed hand-manipulate results, but not in the sense of trying to reorder them. Instead, it manually intervenes by banning some sites or putting overall ranking penalties on them. There have even been recent attempts, through the Google Sitemaps program, to help site owners know when they've been banned.

Overall, Google's got plenty of mixed messages out there that don't help on the PR front and potentially leave it vulnerable on the legal front, as this case has shown.

Posted by Danny Sullivan at 8:48 AM | Permalink

July 13, 2006

Google Adds Support For NOODP Tag To Opt Out Of ODP Titles

Singing for joy! Google has now added support for the NOODP tag that MSN introduced on May 22nd of this year. Yes, Danny asked for this back in June, and now Google has granted our wish. If you have one of those pesky titles pulled from the ODP (dmoz.org) directory, don't fret; just add the NOODP tag.

How do you do it? Just add <META NAME="ROBOTS" CONTENT="NOODP"> to your page source. If you want only MSN to skip the ODP title, use <META NAME="msnbot" CONTENT="NOODP">; if you want only Google to skip it, use <META NAME="googlebot" CONTENT="NOODP">.
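To double-check that a page actually carries the directive, here is a minimal Python sketch that uses the standard library's `html.parser` to look for a NOODP meta tag aimed at either all robots or one specific crawler. The `NoOdpChecker` class and `opts_out_of_odp` helper are hypothetical, and the sketch assumes multiple directives in CONTENT are comma-separated.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class NoOdpChecker(HTMLParser):
    """Flags a page whose robots meta tags ask a crawler to skip ODP titles."""

    def __init__(self, crawler: str) -> None:
        super().__init__()
        # A tag named "robots" applies to every crawler; a bot-specific
        # name such as "googlebot" or "msnbot" applies only to that bot.
        self.names = {"robots", crawler.lower()}
        self.opted_out = False

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "meta":  # HTMLParser lowercases tag and attribute names
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() in self.names:
            directives = {d.strip().lower() for d in values.get("content", "").split(",")}
            if "noodp" in directives:
                self.opted_out = True


def opts_out_of_odp(html: str, crawler: str) -> bool:
    checker = NoOdpChecker(crawler)
    checker.feed(html)
    checker.close()
    return checker.opted_out


if __name__ == "__main__":
    page = '<head><META NAME="googlebot" CONTENT="NOODP"></head>'
    print(opts_out_of_odp(page, "googlebot"), opts_out_of_odp(page, "msnbot"))
```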

Keep in mind, it takes time for Google to spider your pages and then determine if you do not want to use the ODP title. So if you add the tag today, it may take several weeks to have an impact.

Webmasters, this can have a huge effect on your organic traffic. If you have a poor ODP title and Google uses it in the results, opting out so that Google shows your own title instead could dramatically increase your click-through rate from Google.

More details at Inside Google Sitemaps blog and the help section at Google.

Postscript: There is some confusion, with some believing this tag only tells Google not to display the ODP description in the search results. That is not correct: with this tag in place, Google will display neither the ODP description nor the ODP title in its search results.

Posted by Barry Schwartz at 1:19 PM | Permalink

July 11, 2006

Weird Results Counts On Google

I've written before about Google giving strange results counts and why maybe it's time for them to go. Yesterday, I came across the oddest ones ever, when doing some typical searches to gauge the size of the index.

Here's an example. Search for xxkjdiuenmnmd8i, which when I just did it came back with no results. Now search for -xxkjdiuenmnmd8i. In theory, that should show the size of the Google index, all the pages it has.
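In theory, the negation trick works like a set complement: if no document contains the nonsense term, excluding it should leave every document in the index. Here is a toy Python model of that theory, using a made-up three-page index and a hypothetical `search` function; as the rest of this post shows, Google's real counts don't behave this way.

```python
from typing import Dict, Set

# A made-up three-page "index": document -> the words it contains.
INDEX: Dict[str, Set[str]] = {
    "page1.example": {"the", "cars", "movies"},
    "page2.example": {"the", "and", "toothpaste"},
    "page3.example": {"search", "engines"},
}


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Single-term search; a leading '-' means 'pages WITHOUT this term'."""
    if query.startswith("-"):
        term = query[1:].lower()
        return {doc for doc, words in index.items() if term not in words}
    term = query.lower()
    return {doc for doc, words in index.items() if term in words}


if __name__ == "__main__":
    print(len(search(INDEX, "xxkjdiuenmnmd8i")))   # no page contains the nonsense term
    print(len(search(INDEX, "-xxkjdiuenmnmd8i")))  # so excluding it should return every page
```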

In reality, that type of search hasn't often worked. It was only last September that this type of index estimation technique gave any results at all. Even then, I didn't trust that the numbers were accurate. Still, they seemed better than what's coming up now. Look at the screenshot below:

Ten results? Only ten results, for a search technique that last month would have come up with more than 25 billion? Something funky is going on.

Finding it odd, I tried a search for "the", often useful as a fast way to get a sense of how big Google might be, at least in terms of the number of English-language pages it has. The query came back with 23 billion matches. So, just out of curiosity, I tried -the. Ten matches:

Ten? Ten?!!! And more strangeness. Searches for -and, -cars and -movies all did the same thing. The results were different in various ways, but the count was always only 10 matches, when it should be much more.

Note that the results all have additional information that makes them appear to come out of Google Base. It all suggests that Google has disabled counting for queries involving a single word, but that somehow, Google Base integration is still happening to throw things off. It might be that Google is still doing a call to Google Base, asking for the top 10 results that it has, in order to integrate those results into a regular web search listing. But because it also has disabled display of regular web search results for a single negative word query, it's only Google Base that shows.

Going back to my post from last month, Google, Kill The Web Search Counts!, I explained how Google had stated that the counts reported for a spam site that was removed were greatly inflated by a counting glitch. I talked with Google about this and some other issues last week, just before leaving for SES Latino in Miami, where I am now.

Some of what I talked about with Google's Matt Cutts and other engineers at Google has already been addressed in a recent blog post. The issue of counts came up, and I'll do a longer post on what Google said after I get back from this trip and clear what I can discuss. The short answer is that they are aware of the issues and are looking to correct things. These strange results counts might be part of that.

More later when I'm back from my current trip, or watch Matt's blog, in case he posts before me.

Posted by Danny Sullivan at 8:12 AM | Permalink

July 10, 2006

Matt Cutts Of Google Comments On Recent Listings Issues

Last week we reported that Google may have revealed its spam scores to the world. Well, Matt Cutts came back from vacation and confirmed the data "was real." He promised not to "comment on what any of it means," but at least we know Google is part of the borg. Just kidding. I doubt we will see a treasure like that again, but if we do, it would be interesting to see whether Google adds "extra settings for fun," such as --initial_time_travel_wormhole="Wednesday, December 31 1969 11:11 pm".

Posted by Barry Schwartz at 8:23 AM | Permalink

July 6, 2006

Google's Ranking Algorithm Too Dependent On Trust Factors?

Todd Malicoat went off on a bit of a rant, which he named The Trust Knob is WAY too High - Google Trustbox. Todd, like many SEOs, believes that Google places too much weight on "trust factors" when determining whether a page should rank well. Todd quotes some well-known SEOs saying that trust factors are weighted at 85%, whereas copy is given only 15%. Why does this upset SEOs like Todd? As Todd explains:

One of the extremely big problems with trust filters is that they don’t seem to be retroactive…meaning that sites that were around and trusted BEFORE a particular filter was established can basically get away with murder (and they do).

Todd explains the trust factors as follows: web site age, total number of backlinks, the overall age of those links, and the total "trustscore" of other backlinks (i.e., the number of .edu, .gov, and high PageRank links). OK, so those are some of the trust factors involved in the algorithm.
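Taking the 85/15 split Todd quotes at face value, here is a toy Python score showing why great copy alone can't rescue a new site. The weights come from the figures Todd cites (themselves SEO estimates, not anything Google has confirmed); the 0-to-1 trust and copy values and the site names are hypothetical.

```python
from dataclasses import dataclass

TRUST_WEIGHT = 0.85  # figure Todd quotes from other SEOs, not a confirmed Google value
COPY_WEIGHT = 0.15   # likewise


@dataclass
class Site:
    name: str
    trust: float  # hypothetical 0-1 blend of domain age, backlink count/age, .edu/.gov links
    copy: float   # hypothetical 0-1 rating of on-page content


def score(site: Site) -> float:
    return TRUST_WEIGHT * site.trust + COPY_WEIGHT * site.copy


if __name__ == "__main__":
    crusty = Site("old-brand.example", trust=0.9, copy=0.3)
    joes = Site("joes-toothpaste.example", trust=0.2, copy=1.0)  # perfect copy
    for site in sorted([joes, crusty], key=score, reverse=True):
        print(site.name, round(score(site), 3))
```

Under these weights, perfect copy is worth at most 0.15, so any site with a trust lead of more than about 0.18 wins no matter how good the newcomer's content is.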

In the past, SEOs were trained not to talk about a site ranking, but rather a page ranking. Each page of a site was independent and able to rank well on its own. The old optimization for Google was "Content + high PR links"; today it is "Crusty trusted domain + content." The word "crusty" implies the age of the domain, but it also shows how much Todd dislikes the "age" component. Pages aren't old; sites are old.

This is the same complaint as we had a year ago. New sites are not given a "fair shot."

Why not give Joe's ultra amazing toothpaste (the company with very little marketing budget because they spend their money making an amazing product) a chance to rank high for "toothpaste" for just a little bit longer instead of HELPING companies who've been spending millions of dollars on their "brand" instead of their product for the last decade or more?
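To see why an 85/15 split would frustrate someone like Joe, here is a back-of-the-envelope sketch. The weights are the ones Todd quotes; the 0-to-1 trust and copy scores are invented for illustration, and nobody outside Google knows whether signals are combined as a simple weighted sum, so treat this as a hypothetical model, not Google's algorithm.

```python
# Hypothetical model of the rumored 85/15 split between "trust" signals
# and on-page copy. Weights come from the SEOs Todd quotes; the scores
# passed in below are invented for the example.
TRUST_WEIGHT = 0.85
COPY_WEIGHT = 0.15


def blended_score(trust: float, copy: float) -> float:
    """Combine a trust score and a copy score, both on a 0-1 scale."""
    return TRUST_WEIGHT * trust + COPY_WEIGHT * copy


# Joe's new toothpaste site: excellent copy, little accumulated trust.
joe = blended_score(trust=0.2, copy=0.95)
# An established brand: mediocre copy on a "crusty" trusted domain.
brand = blended_score(trust=0.9, copy=0.5)

print(f"Joe's site: {joe:.4f}")
print(f"Big brand:  {brand:.4f}")
```

Under these weights copy can contribute at most 0.15 to the total, so even perfect copy (1.0) only lifts Joe's site to 0.32, far below the brand.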

Todd's post makes for a good read and may give some of you additional tips on how Google works today. One aspect I believe Todd left out is creating buzz for a new site. Yes, new sites depend on "trusted sources" linking to them, but it can happen. New sites, I believe, need to think in terms of generating real "reputation" and real "buzz" so that trusted sources provide some "crusty" trusted links.

Posted by Barry Schwartz at 8:37 AM | Permalink

July 5, 2006

Google's Spam Score Details Shown?

Peter Da Vanzo spotted a DigitalPoint thread that found clues as to how Google scores spam results behind the scenes. Honestly, I have no idea whether this is about spam or something else; the information posted in the forum just looks like a Google spam score report. How did it come about? The user was presented with this information after clicking on a cache URL in the Google results and was shocked to see the following revealed.

pacemaker-alarm-delay-in-ms-overall-sum 2341989 pacemaker-alarm-delay-in-ms-total-count 7776761 cpu-utilization 1.28 cpu-speed 2800000000 timedout-queries_total 14227 num-docinfo_total 10680907 avg-latency-ms_total 3545152552 num-docinfo_total 10680907 num-docinfo-disk_total 2200918 queries_total 1229799558 e_supplemental=150000 --pagerank_cutoff_decrease_per_round=100 --pagerank_cutoff_increase_per_round=500 --parents=12,13,14,15,16,17,18,19,20,21,22,23 --pass_country_to_leaves --phil_max_doc_activation=0.5 --port_base=32311 --production --rewrite_noncompositional_compounds --rpc_resolve_unreachable_servers --scale_prvec4_to_prvec --sections_to_retrieve=body+url+compactanchors --servlets=ascorer --supplemental_tier_section=body+url+compactanchors --threaded_logging --nouse_compressed_urls --use_domain_match --nouse_experimental_indyrank --use_experimental_spamscore --use_gwd --use_query_classifier --use_spamscore --using_borg

What does it mean? One can guess. And does Google want to assimilate us all? Note the "using_borg" flag. Threadwatch also has discussion here.
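Much of the dump reads like command-line flags in the "--name=value" style, where a "no" prefix conventionally switches a boolean flag off. A rough sketch that pulls those flags into a dictionary, assuming that convention; it accepts either a "--" prefix or a stray "?" in case the dashes got mangled in copying, skips the bare counters, and cannot really know which "no..." names are negated booleans. The function name is invented.

```python
def parse_flags(dump: str) -> dict:
    """Parse flag-like tokens ("--name=value", "--name", "--noname")
    from a whitespace-separated dump into a dict.

    Tokens without a "-" or "?" prefix (the counters and their values)
    are skipped. Assumes a leading "no" marks a negated boolean, so a
    flag genuinely named e.g. "--node_count" would be misread.
    """
    flags = {}
    for token in dump.split():
        if not token.startswith(("-", "?")):
            continue
        name = token.lstrip("-?")
        if "=" in name:
            key, value = name.split("=", 1)
            flags[key] = value          # values are kept as strings
        elif name.startswith("no"):
            flags[name[2:]] = False     # "--nouse_x" means use_x is off
        else:
            flags[name] = True          # bare "--use_x" means use_x is on
    return flags


dump = "queries_total 1229799558 ?port_base=32311 --nouse_compressed_urls --using_borg"
print(parse_flags(dump))
```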

Posted by Barry Schwartz at 9:57 AM | Permalink

July 3, 2006

BBC News Features Article On Google Search Spam

A BBC News front-page article titled Google to stay focused on search brings the issue of search spam to the public. The article explains that seventy percent of Google's focus is on Web search and then spends several paragraphs on how search spam is a huge issue. The article quotes Douglas Merrill, of Google engineering, saying, "Spam is an arms race," explaining that "spammers are highly motivated. There is a lot of money at stake."

Posted by Barry Schwartz at 9:38 AM | Permalink

June 22, 2006

When's Matt Cutts Back From Vacation Countdown Clock

Thomas Bindl does what I was hoping someone would do -- make a countdown clock for when Google's Matt Cutts is returning from his vacation, spotted via Threadwatch. I've seen a number of posts in various places suggesting that Google has been having its recent spam and indexing problems because Matt's finally taken a nice, long break. Bull. Matt's great, a huge resource to Google, but the problems going on seem far more fundamental than Matt being away. If they really are due to him being gone, then Google has even bigger issues to deal with. Still, plenty of us will be happy to see him return and jump back into the search conversation.

Posted by Danny Sullivan at 10:46 AM | Permalink

Google, Kill The Web Search Counts!

Number one on my 25 Things I Hate About Google list from March was "web search counts that make no sense." This week's fiasco with the "5 billion spam pages" in Google only underscores that those counts really are a big issue that can be noticed by more than a few tech heads. Fix them or get rid of them, I say.

Adam Lasnik from Google's search quality team has been running around to various public forums explaining that it really wasn't 5 billion pages that got indexed from one master domain but instead a counting glitch that makes the problem seem worse than it was. We noted Monday that he commented over at Threadwatch:

We have noticed that some site: queries are showing bizarre results and it's turned out to be tied to a bad data push. We're fixing it now....

I'm saying that the results counts are drastically off.

Adam's also been at Digg:

Our engineers recently noticed that our site: queries (number of results listed for a search) were showing bizarre results. This has turned out to be tied to a bad data push, and we're fixing this right now.

In the case being discussed above, the number in "about [x billion]" is currently incorrect. We haven't indexed anywhere close to as many pages of these sites as is currently suggested. It's a significant results estimation error, thankfully limited in scope but clearly pretty stark when it appears.

And over at John Battelle's blog:

Compounding the issue, our result count estimates in these contexts was MANY orders of magnitude off. For example, the one site that supposedly had 5.5 billion pages in the index actually had under 1/100,000th of that.
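It is worth spelling out what "MANY orders of magnitude" means here. Taking Adam's figures at face value, a quick calculation:

```python
import math

claimed_pages = 5_500_000_000   # count suggested by the site: query
divisor = 100_000               # Adam: actual was "under 1/100,000th of that"

upper_bound = claimed_pages // divisor   # actual page count was below this
orders_off = math.log10(divisor)         # size of the error, in powers of ten

print(f"Actual pages: under {upper_bound:,}")
print(f"Estimate was off by more than {orders_off:.0f} orders of magnitude")
```

In other words, a site reported at 5.5 billion pages had fewer than 55,000 indexed.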

John's post is probably the most important illustration of why those counts really do matter, given that he took them at face value -- and so many others will, as well.

When I saw the story on Monday, I doubted Google really had indexed so many pages, especially given the known problems with the site: command recently. While Google doesn't report the total number of pages it indexes any longer, it wasn't that long ago when 5 billion pages would have been over half the reported size, as John noted:

5 billion pages is the entire size of the Google index just a year or so ago. The last claim, before they stopped MAKING claims, was 8 billion...think about that.

Now sure, maybe Google really did index that many pages. Maybe they've expanded so much that there's plenty of room. But if they had, adding that massive number of pages should have caused a lot more good pages to go missing to make room for them. There would have been a ton of screaming *widely* across the web from site owners big and small.

I know, I know -- some believe Google's running out of space, and Eric Schmidt even commented on a "machine crisis," which the company later denied was an issue with web search. Certainly many webmasters have long been reporting missing pages in the wake of the shift to Google's BigDaddy crawling infrastructure. But many webmasters have not been having problems.

Maybe Google is so screwed up that it IS picking up billions of spam pages from a few sites and dumping good stuff. However, I think that's unlikely. I think lots of pages did get in from this site, though maybe in the millions rather than billions. And perhaps collectively, millions of pages of spam from a number of sites are pushing good stuff out. But that 5 billion figure for this particular site (and its subdomains)? I do think it was a counting error.

That counting error is a big problem in and of itself. As I said, many people take the counts at face value, even trying to use these meaningless figures in court cases, as Fox News once did or as the US Attorney General once did before the US Supreme Court.

Enough is enough. Make the figures accurate or stop reporting them at all. Last year, I lobbied for Google to drop the index count on its home page, something that eventually happened. Now they should strongly consider doing the same thing with results count.

Time For Results Counts / Number Of Matches To Go? from Gary Price last year talked about this perhaps being a good next move for Google and the other search engines to make. Certainly the time now seems right.

Google, like Yahoo, won't let you go past the first 1,000 matches anyway (Ask goes to 200; MSN to 250). So who cares about showing how many matches there are? Counts like these are remnants of the days when search engines first appeared, when showing that they had lots of matches perhaps helped make you think they must be good or comprehensive. But if the counts mean nothing, why keep using them?

Ah -- but it's only an issue with the counts if you do a site: command, you might say. Certainly we've known about a bug with that since May. We've been told some of it has been fixed, but clearly bugs are still being worked out.

But are regular search counts accurate? If I search for djkfdkjfdkjddfdfdd, I get told there are no matches. So if I shift to -djkfdkjfdkjddfdfdd, I should get a count of all pages in the index that don't contain that word -- and since we know there are no pages with it in the index -- I should get a count of ALL pages Google has indexed. And that count?

Results 1 - 10 of about 25,270,000,000 for -djkfdkjfdkjddfdfdd. (0.07 seconds) 

So there we have it -- Google has 25 billion pages indexed. Maybe. Or maybe not. This type of search has sometimes produced figures in the past that you knew couldn't be right. Plus, as I wrote before, Google's long had counting problems. I don't know whether to trust that count or not. And if I can't trust it, why offer it to me? Especially, why offer it if, after a glitch, you have to run around doing damage control to say the count is wildly inaccurate? Just get rid of it.
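The logic of the -djkfdkjfdkjddfdfdd test is plain set arithmetic: the pages matching "-word" are all indexed pages minus the pages containing the word. A toy sketch with an invented five-document index shows why a word with zero matches should return the full index size, at least when counts are exact rather than estimated the way search engine counts are.

```python
# Toy inverted index: each word maps to the set of document IDs that
# contain it. Documents and words are invented for illustration.
ALL_DOCS = {1, 2, 3, 4, 5}
INDEX = {
    "toothpaste": {1, 2},
    "forum": {2, 3, 4},
}


def count_matching(word: str) -> int:
    """Number of documents containing the word."""
    return len(INDEX.get(word, set()))


def count_excluding(word: str) -> int:
    """Number of documents matched by a "-word" query."""
    return len(ALL_DOCS - INDEX.get(word, set()))


nonsense = "djkfdkjfdkjddfdfdd"
print(count_matching(nonsense))   # 0: no document contains it
print(count_excluding(nonsense))  # 5: every document in the toy index
```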

Instead, this is what I want to see in the future:

Results 1 - 10 

OK? And how about giving an option to have a number show up next to a result, for those who want it. That would be nice if I want to refer to the exact position of a particular listing to someone else. But the total number of matches? It's meaningless. And the time it took to search? Chest thumping we don't need anymore.

One exception, however. Google Sitemaps has just added a bunch of expanded reporting. I want them to go further and let site owners get accurate index counts through that system.

Keep in mind that a site: command is incredibly processor intensive. It's not something most searchers do, so spending the time, energy and machine power to get hyper-accurate results for regular Google searches isn't a priority.

Instead, move site: searches to work within Google Sitemaps, and you take the burden off your main machines. It's also something you can perhaps have scheduled to run as a report, something generated en masse during slower periods for anyone who wants to get that type of data. If three people all want site:amazon.com data, you run that once and give all three the info on a scheduled basis.
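The run-once-and-share idea is essentially request deduplication. A minimal sketch, assuming a hypothetical generate_report(site) function that does the expensive site: work; the names run_batch and build_report_schedule are made up for illustration and are not part of any Google service.

```python
from collections import defaultdict


def build_report_schedule(requests):
    """Group (user, site) requests so each site is processed only once.

    Returns a dict mapping each site (lowercased) to the sorted list of
    users who should receive that site's report.
    """
    subscribers = defaultdict(set)
    for user, site in requests:
        subscribers[site.lower()].add(user)
    return {site: sorted(users) for site, users in subscribers.items()}


def run_batch(requests, generate_report):
    """Call generate_report once per distinct site, then fan the result
    out to every user who asked for it."""
    deliveries = {}
    for site, users in build_report_schedule(requests).items():
        report = generate_report(site)  # the expensive part, done once
        for user in users:
            deliveries.setdefault(user, []).append(report)
    return deliveries
```

Three people asking for site:amazon.com data then cost one run during a quiet period instead of three on demand.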

Yahoo rolled out a similar Yahoo Site Explorer tool last September. It was a good move. It would be a good move for Google to also make, along with dropping the general results counting on Google results pages.

Want to comment? Please join our Search Engine Watch Forums thread, Get Rid Of Results Counts On Google?

Posted by Danny Sullivan at 10:32 AM | Permalink

June 20, 2006

Google Sub Sub Domain Issues Clearly Visible

Threadwatch reveals more examples of issues Google is having. They note that a search on queer forum returns CraigsList 97 times in the top 100 results. That is not all: a search on wedding forum returns about 50 of the top 100 results from CraigsList's site; just scroll down to number 50 and you will see.

Is CraigsList spamming? No! Is Google suffering? :) Google is clearly having issues with sub sub domains. Continued coverage of Google's public index issues.
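One rough way to quantify a result page like this is to collapse each result's hostname to its base domain and count. A hypothetical sketch: the helper names and sample URLs are invented, and the last-two-labels rule is a deliberate simplification.

```python
from collections import Counter
from urllib.parse import urlsplit


def base_domain(url: str) -> str:
    """Naively collapse a hostname to its last two labels,
    e.g. "sfbay.forums.example.org" -> "example.org".

    This is wrong for suffixes like ".co.uk"; a real tool would
    consult the Public Suffix List instead.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def domain_share(result_urls):
    """Return (base domain, result count) pairs, most frequent first."""
    return Counter(base_domain(u) for u in result_urls).most_common()
```

Fed the top 100 URLs for a query, a single base domain with 97 entries would stand out immediately, however many sub-subdomains it is spread across.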

Postscript From Danny: Comments at Threadwatch also note Yahoo has the same issue. MSN does not have it as badly (but that could be the result of spidering fewer pages), and Ask looks very good.

Posted by Barry Schwartz at 8:19 AM | Permalink

June 19, 2006

Google Yanks Site's 5 Billion Pages After Spam Complaint

I covered a DigitalPoint thread that uncovered several domains that were able to rank billions of pages at the top of the Google results within a couple of weeks. The methods deployed to rank the pages seemed to include excessive use of subdomains, cloaking, content scraping, Alexa traffic boosting and blog comment spam. I listed the documented steps here. Some suspect that Google's new URL handling with the BigDaddy update allowed "old school" cloaking to begin working again.

A Threadwatch post shows screen captures of the spam and also has a comment from Google representative Adam Lasnik. Adam responds directly to the report that over 5 billion pages from this domain were indexed, saying:

We have noticed that some site: queries are showing bizarre results and it's turned out to be tied to a bad data push. We're fixing it now.

Yes, we are aware of the site command issues (Google's mentioned them itself). That may mean it is far less than 5 billion pages indexed in this case -- but still, plenty of pages got through.

Whether or not the site command is the issue, this is still indicative of other substantial problems plaguing Google that have been making the rounds on discussion boards and blogs lately.

Posted by Barry Schwartz at 9:09 AM | Permalink

June 15, 2006

MSN's Hand Crafted Results (Fake? - Shame On Me!)

"MSN Hiring People to Hand Code SERPS" at SEO Blackhat is a nice catch from the MSN Search jobs page talking about needing people to help hand-craft results. Philipp Lenssen at Google Blogoscoped reacts with "Oh my." I react with "Hallelujah."

Note: As Threadwatch spots in comments, this page looks like a joke that MSN is hosting. Shame on me for not reading more closely -- type 150 words per minute! The page IS on the real MSN Search domain, but it's not linked from the real jobs area [OK, Pip at Google Blogoscoped found it connected from the jobs page]. Anyway, I'll drop a note and get confirmation. And the points below -- still valid :)

Let's look at the job post first:

When all else fails, and the ranking algorithms do not pass the confidence threshold, we fall back to delivering handcrafted results. Working on a team of approximately 132 other handcrafters in 26 worldwide markets, you will receive a user query, use all the available search engines to quickly scour the web for results, pick the top 10 results for this query, and send it on to the user. Successful handcrafters can typically find top 10 results for a real-time user's query in less than 3.8 seconds. This is an opportunity to truly connect with customers, because the queries that get routed to you are precisely the ones that the engine cannot answer well. We will have adequate staffing to allow generous coffee and bathroom breaks. If you are an expert at using at least 3 different search engines, well versed with American English/colloquial usage, and can type at > 149 words/minute as measured by the Simia-Lico method -- come join us and delight users real-time!

I agree. Search engine algorithms are not perfect. I'm tired of seeing bad listings make it into the top results that any human reviewer would nix. The Google mantra has always been that they prefer to tailor their computer algorithms to figure out how a human would see and rate things and then get the algorithm to do the right thing. We've had that mantra for years. And yes, generally the algorithms do the right things. Still, stuff gets through. So kill off the bad stuff with a human and, sure, insert a good quality page you know you are missing.
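A human review layer of the kind described here can be as simple as filtering reviewer-blocked URLs out of the algorithmic ranking and slotting editor picks into fixed positions. This is a hypothetical sketch of that idea, not how MSN, Google or anyone else actually implements it; the function name and inputs are invented.

```python
def apply_editorial_overrides(ranked_urls, blocked, pinned, limit=10):
    """Return the top `limit` URLs after human review.

    ranked_urls: algorithmic ranking, best first
    blocked:     set of URLs a reviewer has nixed
    pinned:      list of (position, url) pairs a reviewer wants shown,
                 with 0-based positions
    """
    pinned_urls = {url for _, url in pinned}
    # Drop nixed URLs, and pinned ones too so they are not listed twice.
    results = [u for u in ranked_urls if u not in blocked and u not in pinned_urls]
    # Inserting in ascending position order leaves each pick where intended.
    for position, url in sorted(pinned):
        results.insert(min(position, len(results)), url)
    return results[:limit]
```

Note that the algorithm still does almost all of the work; the humans only remove obvious junk and fill known gaps, which is the "complement, not replace" case argued for below.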

As a reminder, MSN used to have human editors, as I've written before. That was actually one reason why years ago, they compared pretty well when we would do relevancy tests on popular queries. They had a very sophisticated editing suite that allowed a team of editors to constantly review -- AND FIX -- bad results.

Now I can buy into the "Oh My" idea if MSN is returning to hand crafted results because their automated technology is so bad they've got to fall back on humans. No, that's not good. But if it's to complement and better tune what the automation does? Bring it on. If you want more on the how and why this can help, see my past post, More On Query Refinement, The Human Scale Problem & Creating The Search Dialog.

I also have the "Oh My" reaction if handcrafting involves payment. This year, I've had one serious allegation that MSN has rigged one set of its results to favor a top advertiser. I just had another serious allegation like that levied against Yahoo. In the MSN case, the difficulty in pursuing the allegation is deciding whether it is true or an attempt to knock out a competitor that might be ranking well. In the Yahoo case, I'm awaiting more information from that tipster beyond the quick, eye-opening stuff I was shown at our recent London conference.

Yahoo, of course, does hand manipulate already, to my belief (I'm not saying for payment -- only that for whatever reason, they seem to hand craft some results). I wrote about this in 2004 but never got an answer about it from Yahoo, nor did I get an answer when I followed up at least one other time. It also came up on our forums last year and here at Search Engine Roundtable.

Google has long denied "hand jobs," as wizened search marketers call them. Setting aside censorship cases, I believe that. I've never seen any solid evidence of results being hand selected by Google (and the quality raters we've written about before have not been shown to be manipulating results).

In fact, Google used to trumpet that it had no hand manipulation. That was true in crafting results, but it wasn't true in terms of removing them. As I wrote in 2004:

Of course, Google does indeed intervene manually in search results. It removes material because it may be deemed illegal, as was the case in the infamous chester guide search. The company also removes material in response to DMCA complaints and also because for spamming reasons, as this article explains further.

Such interventions make some marketers confused (or even livid) when they read Google's oft-repeated claims of no hand manipulation of search results. To them, such removals as I've described above are hand manipulation. You can get a flavor of such confusion in this recent WebmasterWorld forum thread.

These interventions are not specifically rank related. When they happen, Google doesn't try to reorder the ranking of how a page appears. Instead, it simply pulls the page from the index entirely. And if you aren't in the index, you naturally no longer rank number one. But to save confusion, it might be better for Google to be clearer in saying that they don't choose by hand which sites rank well.

By the way, I asked Google previously about the reference in a Wired article about wanting to "attach" better sites to queries to ensure it had good information available. I remember being disturbed by this, just as some in the aforementioned thread were, as it indeed suggested that Google was doing hand-ranking in some cases.

I was told by Google that this was a misinterpretation on the part of Wired. The Google engineer apparently meant that the Google search algorithm would be tweaked to produce better results, not that the results would be reordered by hand.

Overall, I'm fine with hand-crafting, hand manipulation, hand jobs or whatever you want to call it as long as:

  • It improves search quality
  • It's not done to favor an advertiser by rigging the editorial results

Posted by Danny Sullivan at 10:16 AM | Permalink

June 14, 2006

Google Not Obeying NoIndex Meta Tag?

I reported at the Search Engine Roundtable that Google.com Displaying Pages in Index with NoIndex Meta Tags. The details come from a WebmasterWorld thread where two members I would trust claim Google is not obeying the noindex meta tag. Currently, I have no evidence, since examples are not allowed at WebmasterWorld. If you have examples of this in action, please let us know by starting a thread in our Google Web Search Forum at Search Engine Watch Forums.
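If you want to check your own pages before reporting a problem, it helps to first confirm the tag is really being served. A minimal sketch using Python's standard html.parser, assuming the directive sits in a <meta name="robots"> or <meta name="googlebot"> tag; it only inspects the HTML you feed it, not robots.txt, and the class and function names are invented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values can be None.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            for part in attributes.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """True if the page carries a noindex robots directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```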

Posted by Barry Schwartz at 9:59 AM | Permalink

June 13, 2006

Submitting Your News Site To Google News

Google News can drive a nice amount of traffic to a site. A few months ago, I had the privilege of having my site included in Google News. Since then, others have been asking: how can I get my news site included in Google News? This morning, I did my best to answer the question with a post named Getting Into Google News Revisited. I outlined the technical requirements, the editorial requirements and what you can do to encourage Google to accept you into Google News. If you are interested in Google News inclusion, check it out.

Posted by Barry Schwartz at 8:46 AM | Permalink

June 12, 2006

High Rankings In Google Image Search

Amit Agarwal has a nice write-up on how to increase your chances of listing your images high in Google Image Search. The tips include:

  • Use a descriptive and keyword-rich file name
  • Use seven to eight keywords in your title and alt text
  • Place a short one-line description of the image directly below the image
  • Wrap the content around the image
  • Try to put the image closer to the top of the page

Those are just a sampling of the tips to get listed high in Google Image Search.
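Several of these tips translate directly into markup. A hypothetical helper that derives a descriptive file name from a description, reuses the description for the alt and title text, and places a one-line caption right below the image; the function names are invented, and nothing here guarantees a higher ranking.

```python
import html
import re


def slugify(text: str) -> str:
    """Turn a description into a lowercase, hyphenated file-name stem."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def image_block(description: str, caption: str, ext: str = "jpg") -> str:
    """Build an <img> tag with a descriptive file name and matching
    alt/title text, followed by a one-line caption directly below it."""
    filename = f"{slugify(description)}.{ext}"
    text = html.escape(description, quote=True)
    return (
        f'<img src="{filename}" alt="{text}" title="{text}">\n'
        f"<p>{html.escape(caption)}</p>"
    )


print(image_block("Red Wedding Cake & Roses", "A three-tier wedding cake"))
```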

Posted by Barry Schwartz at 8:55 AM | Permalink

June 8, 2006

Alternative Ways Into Google & Yahoo

Search Engine Guide has an article named Alternative Ways to Get Into Google and About.com has an article named 8 Ways to Submit Your Site to Yahoo, so we thought it would be nice to make one summary of both.

There are many ways to get found in Google; some alternative ways include Google Video, Google Base, Google Local, Google Blog Search, and Google News. There are additional ways, of course, including Google Images, Google Finance (news/blog), Google AdWords & AdSense, and Google Co-op.

There are also many ways to get found in Yahoo. The article named above lists 8 methods, including: free site submission, free mobile site submission, free media content submission, Yahoo search index submission, sponsored search, product submission, travel submission, Yahoo directory submit, and Yahoo standard submission. Not listed there are Yahoo Blog Search and Yahoo News, and I am sure I am missing some basic ones.

Again, the point is, you need to think about ways to vertically creep into the search results. There are plenty of ways to drive search traffic to your site, outside of core Web search.

Posted by Barry Schwartz at 9:01 AM | Permalink

June 6, 2006

Googlebowling A Reality?

Googlebowling is a term used to describe the method of knocking out a page from the Google search results. Googlebowling is conducted by linking to a particular site from sites within bad neighborhoods. Rand over at SEOMoz.org posted recent information he learned about Googlebowling while at SES London a week ago.

To successfully deploy Googlebowling, Rand writes that you need to "use patterns that would show that the site has 'participated' in [a spammy linking] program."

Specifically, this means you would point spammy links at the places the site you are targeting links to. If this is implemented properly and the site you are targeting is not a super authority, the site may be penalized for a long time. Note that the advice here is given not to encourage Googlebowling but to help people understand how it might be possible to impact their own sites.

Rand continues to explain that if a site is Googlebowled, you most likely will want to start fresh and drop the site that was penalized completely. I have discussed Googlebowling a few times at the Search Engine Roundtable. Two entries I would like to point out are:

  • Google Bowling For Dollars by Chris Boggs
  • Google Bowling Supporters Thread by myself

So can other people hurt your rankings? Can other links hurt you? Some think they can, but some, such as Google itself, say they cannot.

Posted by Barry Schwartz at 10:15 AM | Permalink

Google Indexing Fewer Pages: Signs Of The Google Crawling Sandbox?

Aaron Wall over at SEOBook.com has an excellent write up on the recent indexing phenomenon at Google. Google has been indexing fewer and fewer pages and webmasters are trying to figure out how to get more of their pages indexed and found by searchers. Aaron posted a blog entry he named The Google Crawling Sandbox.

The title of the post comes from the concept of Google slowing down its crawl process, as it seemingly does when ranking new sites (the whole Google Sandbox concept). Aaron explores the rumors and theories, but when push comes to shove, he nails it down to unique content. The more "legitimate useful content" you have, the more likely people will want to read it, the more likely Google will crawl it, and the more likely Google will index and then rank it.

Postscript From Danny: One of those commenting on Aaron's blog highlights a key quote from him, one that well deserves that attention:

The less your site needs to rely on Google the more Google will be willing to rely on your site.

This goes right back to the "don't do things only for search engines" and "don't be totally dependent on search engines" messages that many have delivered over the past years, and it is worth repeating. No, you don't want to ignore them -- but if you've built everything to revolve around them, that's a good sign you aren't running a "natural" site of the sort search engines are trying to reward.

Posted by Barry Schwartz at 8:49 AM | Permalink

June 5, 2006

A Current List Of Google's Robots

What Bots Does Google Have These Days? from Ben Pfeiffer on my Search Engine Roundtable blog lists the names of the current spiders/robots/bots Google has roaming the web.

The list includes the classic web spider Googlebot, the AdSense spider MediaBot, Google's image spider ImageBot, the AdWords spider AdsBot, Google's RSS feed spider Feedfetcher-Google, and Googlebot-Mobile for the spiders that go mobile. It's a great short post by Ben while I was away.
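If you want to spot these crawlers in your server logs, you can match on the product tokens in the User-Agent string. A sketch, assuming the tokens below (Googlebot, Googlebot-Image, Googlebot-Mobile, Mediapartners-Google, AdsBot-Google, Feedfetcher-Google) are the ones these bots send. User-Agent strings are easy to fake, so a match is a hint, not proof; a reverse DNS lookup on the requesting IP is the usual way to confirm a real Google crawler.

```python
# User-Agent product tokens for Google's crawlers, with the names used
# in Ben's list. More specific tokens come first so that a
# "Googlebot-Image" request is not reported as plain "Googlebot".
GOOGLE_BOT_TOKENS = [
    ("Googlebot-Image", "image spider (ImageBot)"),
    ("Googlebot-Mobile", "mobile spider"),
    ("Mediapartners-Google", "AdSense spider (MediaBot)"),
    ("AdsBot-Google", "AdWords spider (AdsBot)"),
    ("Feedfetcher-Google", "RSS feed spider"),
    ("Googlebot", "web spider"),
]


def classify_google_bot(user_agent: str):
    """Return a label for a Google crawler User-Agent, or None."""
    ua = user_agent.lower()
    for token, label in GOOGLE_BOT_TOKENS:
        if token.lower() in ua:
            return label
    return None
```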

Posted by Barry Schwartz at 1:38 PM | Permalink

June 1, 2006

Google Base Absorbs Froogle Feeds; Other Submission Systems Remain Independent

When I was at Google last month, I got an update on Google Base for a forthcoming article. One of the things I was told was that Google Base was now the preferred way for merchants to submit content to Froogle. Really? Then why was the Froogle site still telling people to submit Froogle feeds? That oversight has now been corrected. As Garett Rogers notes, the feed submission mechanism formerly in the Google Merchant Center has now been replaced with Google Base submissions. Garett also highlights specific help pages about the change here.

The consolidation is good, as Google Base is meant to be a central submission point for all content for Google, as I've written before (and SEW members, see also this). However, that goal still remains far off. Google Co-op, Google Sitemaps, Google Book Search and Google Scholar all remain ways to submit content of various types independently of Google Base, as the links for those services explain. I'll come back to this issue in more depth in the future.

Posted by Danny Sullivan at 10:28 AM | Permalink

May 30, 2006

Google Sitemaps: Links To You Can't Hurt You

The Google Sitemaps team posted to their blog in response to a question at SearchEngineWatch Seattle. Interestingly, they note that links from bad neighborhoods do not harm a site's rankings, only links to bad neighborhoods. It has long been theorized that links from bad neighborhoods do cause ranking problems and this goes against conventional thinking.

Link networks often populate quality content sites with paid text links as part of their program. If at all possible, Google obviously wouldn't want to remove quality content from their search engine. One solution is to make outbound links from quality sites that sell links worth nothing towards building rankings for destination sites.

We've heard this from Matt Cutts before: "Link-selling sites can lose their ability to give reputation (e.g. PageRank and anchortext)." If a link from such a site loses its ability to transfer PageRank, it can make sense that it doesn't harm a site's PageRank either. But that is not a foregone conclusion. The information comes from the Sitemaps team, and not Matt Cutts' anti-spam force.

In the above entry by Matt, he recommends the use of the "nofollow" link attribute to safely purchase links purely for traffic purposes. This implies that links from bad neighborhoods can indeed harm a site's rankings in Google. Perhaps Matt implies this to deter link buying, but the advice is good insofar as links from bad neighborhoods also raise the profile of sites that would eventually come under scrutiny by Google. It can also be assumed that text links from bad neighborhoods can harm a site's rankings in major search engines other than Google.
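Matt's advice boils down to marking purchased links with rel="nofollow". A small sketch, using Python's standard html.parser, that lists each link on a page and whether it carries nofollow; handy for auditing pages where you bought or sold links. The class and function names are invented.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Record each <a href> and whether its rel attribute includes nofollow."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if "href" in attributes:
            # rel may hold several space-separated values, e.g. "external nofollow".
            rel_values = attributes.get("rel", "").lower().split()
            self.links.append((attributes["href"], "nofollow" in rel_values))


def audit_links(page_html: str):
    """Return a list of (href, is_nofollow) pairs in document order."""
    auditor = LinkAuditor()
    auditor.feed(page_html)
    return auditor.links
```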

Posted by Detlev Johnson at 8:22 AM | Permalink

May 25, 2006

Google Rankings Depend On Data Center, Geographic Location & Personalization

Aaron Wall has a nice write-up on the different ways one searcher can see one set of results while a different searcher sees a different set of results, all for the same search query. Aaron explains that three primary things may determine the result sets you see for any particular query: the search engine data center you hit, the location of your computer and whether you have personalization turned on.

Data Centers: Depending on the search engine, especially Google, you may hit a data center that has a different set of indexed pages or a slightly different algorithm. Both have an effect on the search results you see. Google has multiple data centers in order to help return you a quicker response and because it enables them to roll out different indexes and algorithms slowly and to select users. As you can imagine, it will affect the result sets you see.

Geographic Location: Sometimes Google tailors the results to your location. So if you are in London, Google may show you results that are more relevant to a person in London. How? They may show more results from .co.uk domains, from servers hosted in the area or from sites in the local language.

Personalization: With most search engines, you can now sign in and enable personalization. That means the search engines look at your search history and other preferences and tailor the results specifically for you. As you can imagine, this will have an impact on the results you see for a particular query.

I did not really read Aaron's post, but I suspect it says the same thing I said above. If not, you can blast me in our forums. Read Aaron's post here, it is a nice topic.

Posted by Barry Schwartz at 9:32 AM | Permalink

Google Updates Webmaster Quality Guidelines To Include Affiliates

Google added some lines to its "What are Google's quality guidelines?" page. At the top, there is one slight change to the wording, nothing material. But Google added two items at the bottom: a bullet in the "Quality guidelines - specific guidelines" section and a paragraph at the bottom of the page.

The additional content includes:

If your site participates in an affiliate program, make sure that your site adds value. Provide unique and relevant content that gives users a reason to visit your site first.

If a site doesn't meet our quality guidelines, it may be blocked from the index. If you determine that your site doesn't meet these guidelines, you can modify your site so that it does and request reinclusion.

So, Google only wants sites that add value through unique and relevant content. If your site does not meet the Google quality guidelines, Google may remove you from the index.

Posted by Barry Schwartz at 8:41 AM | Permalink

May 22, 2006

Google Finds Bug With Site Search Command

Vanessa Fox from Google Engineering posted at the Inside Google Sitemaps blog that Google found a bug with the site search command. The post explains that some of the indexing issues people are noticing at Google are caused by this bug. Two of the "few bugs that affected the site: operator" involve using the site command with a trailing slash (e.g. site:www.example.com/) or using it on a hyphenated domain name (e.g. site:www.example-site.com). Google says they will have it fixed within a few days, but until then, use the syntax site:www.example.com. I have the forum roundup on this bug at the Search Engine Roundtable.
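If you script your own site: checks, a small helper can apply the workaround by always emitting the query without a trailing slash. This is a minimal sketch: `site_query` is a hypothetical helper, not a Google tool, and no amount of query formatting gets around the hyphenated-domain bug, which had to be fixed on Google's side.

```python
from urllib.parse import urlsplit


def site_query(url_or_host):
    """Build a site: query with no trailing slash (hypothetical helper).

    Accepts either a full URL or a bare host such as "www.example.com/".
    """
    if "//" not in url_or_host:
        # Prefix "//" so urlsplit treats a bare host as the network location.
        url_or_host = "//" + url_or_host
    parts = urlsplit(url_or_host)
    # Keep any directory path, but drop the trailing slash that triggered the bug.
    return "site:" + parts.netloc.lower() + parts.path.rstrip("/")


print(site_query("http://www.example.com/"))  # site:www.example.com
```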

Posted by Barry Schwartz at 9:08 AM | Permalink

May 16, 2006

Google's Indexing Timeline

Matt Cutts of Google posted an entry named Indexing timeline. The entry explains how Google crawls and indexes Web sites these days. Most of it is in response to some of the recent issues reported about Google dropping pages from the index. Matt gives a general overview, digs deeper into the process and history and then provides some examples. I cannot summarize it for you now, but it is worth a read if you have time.

Posted by Barry Schwartz at 5:49 PM | Permalink

May 15, 2006

Google Hires SEM For "Better Conversations" With Webmasters

Matt Cutts, the bridge between Google & SEMs, has announced the hire of Adam Lasnik, an old time affiliate marketer and SEO, to help Matt with the communication between webmasters and Google. Adam said his role will be the "Webmaster Advocate" at Google, pushing for Webmaster needs and concerns. He calls himself the "MiniMatt," attending SEM conferences, replying to Google-related blog or forum posts, responding to some Google e-mails, and more. So now Matt can finally take a vacation.

Posted by Barry Schwartz at 9:29 AM | Permalink

May 10, 2006

Google Ban Checker Tool

This morning, I reported on a tool that allows you to check if you are banned in Google. The tool is a desktop application that searches Google using a site: command and also checks sites that link to you, to see if they are banned as well. You can check out the tool by clicking here. Keep in mind, Google also can notify you of some site penalties with Google Sitemaps.

Posted by Barry Schwartz at 9:38 AM | Permalink

Google Bug or Webmaster Bug? Google Responds To Shared Server Bug Issue

Matt Cutts responded to the Google anomaly we reported last week, where Google was displaying a different site's information from the same shared server. In short, two sites are hosted on the same server and same IP address. When conducting a search that should have brought up Site A, Site B was coming up in the SERPs.

The issue was technically not on Google's side, as Matt explained: whoever set up the server configured the virtual hosting incorrectly. So why wasn't it an issue on Yahoo, MSN or Ask.com? Matt explains that Google uses "persistent connections to a webserver via a Keep-Alive header," which let Google use a single connection for all the sites on one server, taking up fewer resources for both Google and the webmaster's server.
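Matt's explanation can be reproduced on a single machine. The sketch below, assuming Python 3.7+ and two made-up hostnames (`site-a.example`, `site-b.example`), runs a tiny virtual-hosting server and fetches both sites over one persistent HTTP/1.1 connection, the way a Keep-Alive crawler would. In "buggy" mode the server picks the site once per TCP connection instead of reading the Host header on every request; that is one hypothetical way a virtual-host setup could go wrong, not the exact misconfiguration in this case. A crawler that opens a fresh connection for each request would never trigger it, which would be consistent with the other engines not being affected.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Two hypothetical sites sharing one server and one IP address.
SITES = {"site-a.example": b"Site A home page", "site-b.example": b"Site B home page"}


class VirtualHostHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allow persistent (keep-alive) connections
    pin_host_per_connection = False  # True simulates the misconfiguration

    def do_GET(self):
        host = self.headers.get("Host", "").split(":")[0]
        if self.pin_host_per_connection:
            # Bug: the site is chosen on the first request of a TCP connection
            # and reused for every later request on that same connection.
            if not hasattr(self, "pinned_host"):
                self.pinned_host = host
            host = self.pinned_host
        body = SITES.get(host, b"unknown host")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))  # needed for keep-alive
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def crawl_both(buggy):
    """Fetch both sites' home pages over a single persistent connection."""
    handler = type("Handler", (VirtualHostHandler,), {"pin_host_per_connection": buggy})
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    try:
        pages = []
        for host in SITES:
            conn.request("GET", "/", headers={"Host": host})  # same socket each time
            pages.append(conn.getresponse().read())
        return pages
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
```

`crawl_both(buggy=False)` returns both home pages, while `crawl_both(buggy=True)` returns Site A's page twice: the second site's request gets the first site's content, the kind of mix-up described above.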

Posted by Barry Schwartz at 9:01 AM | Permalink

May 4, 2006

Google Results Suffering After "Big Daddy" Update?

The Register reports that Google has been "choking on web spam" ever since the rollout of the Big Daddy infrastructure. The article highlights a comment from Google CEO Eric Schmidt last month about Google having a storage "crisis." From that New York Times article:

Referring to the sheer volume of Web site information, video and e-mail that Google's servers hold, Schmidt said: "Those machines are full. We have a huge machine crisis."

This week, problems have gotten worse: webmasters all over the forums are reporting severe issues with pages dropping in and out of the index, pages not being crawled, old cached pages, dead (404) pages being returned by Google and outright irrelevant results. This morning I posted a roundup at the Search Engine Roundtable of forum threads that are discussing the most recent Google issues with indexing pages. We have been tracking Big Daddy issues for too long here; for our last report at SEW, see here.

Want to comment or discuss? Visit our thread named BigDaddy, Missing Pages In Google & Is The Big G Out Of Space at our SEW Forums.

Posted by Barry Schwartz at 8:34 AM | Permalink

April 26, 2006

Google Sitemaps Adds Spam Checking, New Webmaster Help Center & Other Features

I just came out of the Meet the Crawlers session, where Google announced new features and a new layout for Google Sitemaps. The Sitemaps blog just posted the details as well. One huge feature is that Google tells you if your site is in the index or not and if it is not, they won't tell you why.

Here is a breakdown of the new features:

  • New verification method
  • Indexing snapshot
  • Notification of violations of the webmaster guidelines
  • Reinclusion request form
  • Spam report
  • New webmaster help center
  • More about our new look
  • Adding a Sitemap
  • Navigating the tabs

The full feature list is at the Sitemaps blog.

Postscript: Matt Cutts just pinged me to let me know he has posted an entry named Notifying webmasters of penalties. That entry explains that the Google Web Search Team and the Google Sitemaps Team are working together to notify "some (but not all)" webmasters of Google site penalties.

Posted by Barry Schwartz at 1:59 PM | Permalink

April 24, 2006

Google Web Site Categories Explored

Social Patterns writes a detailed explanation of what he feels determines a site's Google Web categories. What I define as Web categories are the "quick links" (as Yahoo calls them) you find under results such as this one. The Web categories attempt to break down the site's main structure, with links to the main areas of the site under the main search result listing. The questions are: What determines which sites deserve these links? Which types of keywords trigger them? And how does Google determine what to add as a Web category link?

Social Patterns dug into this a bit more. He determines they are "likely determined by traffic patterns." His findings in short:

  • Google snippet links do not return links outside of the home domain.
  • Google snippet links do not have to come from a text link; they can be an image link or even a JavaScript link.
  • Google snippet link text can be determined from an image's alt text.
  • Google snippet links can be subdomains of the home domain.
  • Google snippet links are not determined by PageRank.
  • Google snippet links are displayed for the top result for a "brand" search or "domain" search (for example, "zappos" and "zappos shoes").

I tend to agree with this breakdown, however I believe Danny might not agree with all of the findings.

Posted by Barry Schwartz at 10:09 AM | Permalink

Matt Cutts Provides More Information On Google's Crawl Caching Proxy

In response to the new caching techniques Jennifer reported last week, Matt Cutts posted a more detailed explanation of what is called a "crawl caching proxy." In short, Google may use all of its spiders, including Googlebot (the Web search spider), the AdSense spider, the News spider, the blog search spider and so on, for caching purposes. When any of these spiders crawls your pages, the pages are stored in the "crawl caching proxy," which is then used for retrieving a page's cache. My understanding is that when you conduct a search at Google.com, you may see a cache retrieved by the AdSense bot from within the "crawl caching proxy."
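One way to picture the idea is a fetch cache shared by every crawler. In the sketch below, which is an illustration and not Google's design, whichever crawler asks first triggers the real fetch, and later requests from other crawlers within an assumed freshness window (`max_age`) get the stored copy. The class, its fields and the one-hour window are all hypothetical; the `now` argument is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class CrawlCache:
    """Hypothetical shared fetch cache that several crawlers consult before fetching."""

    fetch: Callable[[str], str]  # performs the real HTTP fetch (stubbed in the example)
    max_age: float  # seconds a stored copy counts as fresh (assumed policy)
    entries: Dict[str, Tuple[float, str, str]] = field(default_factory=dict)
    fetch_log: List[Tuple[str, str]] = field(default_factory=list)

    def get(self, url: str, crawler: str, now: float) -> str:
        cached = self.entries.get(url)
        if cached is not None and now - cached[0] < self.max_age:
            return cached[2]  # reuse the copy; the site sees no new request
        body = self.fetch(url)
        self.entries[url] = (now, crawler, body)  # remember who fetched it and when
        self.fetch_log.append((crawler, url))
        return body

    def cached_by(self, url: str) -> str:
        """Name of the crawler whose fetch produced the currently stored copy."""
        return self.entries[url][1]


cache = CrawlCache(fetch=lambda url: "<html>" + url + "</html>", max_age=3600)
cache.get("http://www.example.com/", "Mediapartners-Google", now=0)
cache.get("http://www.example.com/", "Googlebot", now=60)  # served from the cache
print(cache.cached_by("http://www.example.com/"))  # Mediapartners-Google
```

In this model, the copy Googlebot receives (and that a searcher might see as the cache) is the one the AdSense crawler fetched, which matches the behaviour described above.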

Posted by Barry Schwartz at 8:26 AM | Permalink

April 19, 2006

Google's Matt Cutts Confirms AdSense Bot Helping Googlebot With Indexing

Matt Cutts, who is speaking at this week's PubCon, confirmed that the AdSense mediapartners bot is doing double duty by not only targeting ads for AdSense but also indexing for the regular Google search database, in a bandwidth saving move.

Matt also noted that there is no advantage to being indexed by one bot or the other, however, those cloaking content and serving different pages to each bot could run into problems in the search index. More details on JenSense.

Postscript From Danny: Matt adds more about this on his blog, and it's a super-important clarification:

Pages with AdSense will not be indexed more frequently. It's literally just a crawl cache, so if e.g. our news crawl fetched a page and then Googlebot wanted the same page, we'd retrieve the page from the crawl cache. But there's no boost at all in rankings if you're in AdSense or Google News. You don't get any more pages crawled either.

In other words, there are two big issues with the AdSense crawler helping Googlebot:

  1. Since the AdSense crawler swoops in fast, it could be a way for people to effectively get fast inclusion of their pages. Just add AdSense, wait for the AdSense bot to fly in, and you're set.  
  2. Is having the AdSense crawler visit likely to get you a RANKING boost, in addition to getting INDEXED faster? I've capitalized both words to stress them, as a reminder that being in the index isn't the same as ranking well for a query.

Matt's saying no in both cases. There is no ranking boost. As for fast indexing, no to that as well. The AdSense bot is simply refreshing the cached copy of your page -- but the copy in the index, what people are searching on, won't be updated.

That brings up an entirely new point. It means that Google is now potentially presenting its results as fresher than they are.

What you search on is in the index, not the cache, as I've explained. So if a page changes and the index isn't updated -- only the cache -- then Google won't know about those changes to help with searching.


For searchers, the date on the cache is a useful way to know if Google's index is updated. Now, you can't tell. For site owners, the cache has been a useful way to know that Google has indeed updated your pages. Now it no longer serves that function.

Posted by Jennifer Slegg at 9:47 AM | Permalink

April 17, 2006

Google's Cache Being Helped By The AdSense Mediapartners Bot

Publishers running AdSense on their pages may find that the Mediapartners-Google bot - the special Google bot used by AdSense to determine ad targeting on a publisher page - is actually sharing the results of those crawls with the main Google search database.

Greg Boser spotted it when pages being served strictly to AdSense began showing up in the main search database. And cache dates and times are matching exactly with when the Mediapartners-Google bot visited the page for ad targeting purposes.

How significant is this? At this point, it is uncertain, although Google clearly states that being an AdSense publisher does not help with search engine rankings. And it seems that the Mediapartners-Google bot is not adding new pages to the search index, but rather updating pages currently in the index.

For a more detailed look, visit both WebGuerrilla and JenSense.

Postscript: Matt Cutts has confirmed that the AdSense mediapartners bot is indexing for the main search index.

Posted by Jennifer Slegg at 2:06 AM | Permalink

April 6, 2006

Google Confirms Midpage "See Results For" Results Out Of Testing -- What Should They Be Called?

I griped recently about Google not committing to some of its user interface experiments. Well, they have with one of them, those middle-of-the-page "See results for" suggestions that some have seen. They have been declared officially part of the Google web search results pages, not just an experiment that might go away. More about this below, plus a call for people to suggest a common name for them.

GoogleGuy confirms the feature is part of the regular Google UI in one of our SEW Forums threads, after someone, yet again, asked about them.

Our long-standing thread about this feature is here: New Middle Of The Page "More Results" Experiment On Google. It gives lots of examples of how the feature works. Currently, you should be able to see it in a search for relentless. You should get a section midway down that says:

See results for: relentless records

Relentless Relentless logo. New site coming soon... Recording artists: KT Tunstall · Joss Stone. A&R contact: Relentless 43 Brook Green ... www.relentless-records.net/

index.html Relentless "Metal" Records. Welcome to our domain... Enter. © RELENTLESS "METAL" RECORDS. Tax Attorney · Tax Attorney. www.relentless-records.us/

Relentless Records Home Resurrection was released in 1999 by Angel Witch Productions associated with Relentless Records, then put out by Crook'd Records, then bootlegged by Zoom ... www.angel-witch.net/

Some further background is here, along with them being spotted way back in early August 2005. It can be hit and miss about what exactly triggers it. However, if you do see it happen, others should also see it for the same query.

I was also talking with Google about this feature directly yesterday. I was told these were made part of the official UI a few weeks after initially being tested. They were deemed successful, so Google made them a regular feature.

As GoogleGuy notes, Google expects to have documentation up on Google's guide to its web search results pages shortly. But they don't have an official name for these yet.

So let's help! What do you think they should be called? I'll throw out one suggestion. Call that section the MidBox or MidBox Results. We already have OneBox results at the top of the page, so it kind of fits in with that but also communicates the position on the page.

What do you think of the feature? Have some suggested names? Please comment in our SEW Forums thread, Google Confirms Mid-Page "See Results For" Section No Longer A Test; Suggest A Name!

Posted by Danny Sullivan at 7:36 AM | Permalink

March 29, 2006

Matt Cutts Offers Q & A On Recent Google Questions

Matt Cutts posted a blog entry yesterday answering some of the questions he has received over and over again regarding the "Big Daddy" update, supplemental results issues, the RK value and some other topics. Here are some pulled from Matt's blog...

Q: "Is Bigdaddy fully deployed?" A: Yes, I believe every data center now has the Bigdaddy upgrade in software infrastructure, as of this weekend.

Q: "Any new word on sites that were showing more supplemental results?" A: An additional crawling change to show more sites from those sites was checked in late last week, but it may still take a little bit of time (another few days) for that to show up in the index. I'll keep an eye on sites that people have given as examples to see how those sites are showing up.

Q: "Is the RK parameter turned off, or should we expect to see it again?" A: I wouldn't expect to see the RK parameter have a non-zero value again.

Q: "Seriously, how do you plan on picking which of these questions to answer?" A: I'm tackling the ones that looked interesting, short, and general enough that more than one person would be interested.

Posted by Barry Schwartz at 9:07 AM | Permalink

March 23, 2006

Supplemental Index Issues "Gone"; Welcome Big Daddy Index

Over the past month or so, Google has been having some issues with its "Big Daddy" infrastructure. The main issue reported to Google was an overflow of "supplemental results" listed in the index. Matt Cutts said the issue "should now be gone," or at least should be within a week's time. I have been tracking the supplemental index issues from early on; you can find more detail in my entries on 03/06, 03/09, 03/14, and today.

What is important to note is that Matt Cutts gives us more insight into what exactly this mysterious Supplemental Index is. Google has an official help page on it, but it was never clear. Matt explains that under Big Daddy, if a site is not crawled as much, pages from that site may show up in the supplemental index.

With the supplemental index issues almost resolved, the Big Daddy index should be live within the next week or two, meaning live at all of Google's data centers.

Posted by Barry Schwartz at 9:18 AM | Permalink

February 8, 2006

Welcome Back To Google, BMW -- Missed You These Past Three Days

I said BMW would be back soon after they got banned on Saturday. Matt Cutts over at Google lets everyone know they are now back in. So, they got a three day slap on the wrist. It demonstrates once again how public spam reports can be so effective and how big major web sites really don't get the "death penalty," when it comes to spamming.

Spam always seems to get removed faster after a big dose of publicity. Back in 2003, I wrote Google Kills eBay Affiliate Spam Quickly, Others Survive for Search Engine Watch members that looked at how an eBay affiliate using doorway pages was quickly removed by Google after public exposure. In contrast, people still complain that nothing happens when they file spam reports with major search engines through official spam reporting feedback forms.

BMW's situation proves once again that the best spam antibiotic is a good topical application of publicity. So did you spot spam? Blog away. Get others to blog, and that will probably help get the spam removed.

Are you spamming? If you're not hiding your tracks well, be forewarned that the publicity monster might roll over you at some point. On the flipside, we'll eventually have so many public spam reports that not all of them will be dealt with.

For example, More European Automaker Sites Do Doorways & Should Search Engines Be Able To Enforce Spam Rules? on the blog from yesterday covered spamming spotted on the Porsche Denmark and Chevrolet Sweden sites, but those two automakers remain listed. I expect they probably will remain listed, too. If BMW took a ding for being banned, Google took some hits from those who feel spam removals ought to happen after a warning. Google's probably thinking about ramping up the spam notification program it was testing before wiping out any more big-time sites that might push back on no-warning wipeouts.

Meanwhile, a second spam truism gets proven. Big companies hardly face a "death penalty" on Google. They get back in and fast. Let's do some timings. In the Spam Olympics event of getting back in after being banned, we have....

  • WhenU: Banned in 2004, back in after 42 days  
  • WordPress: Banned in 2005, back in after 2 days or less  
  • BMW: Banned in 2006, back in after 3 days

What if you aren't a big company? Matt covered the timeline on getting back into Google in his prior Filing a reinclusion request post.

How long do you have to wait now? That depends on when Google reviews the request and on the type of spam penalty you have. In the days of monthly index updates it could take 6-8 weeks for a site to be reincluded after a site was approved, and the severest spam penalties can take that long to clear out after an approval. For less severe stuff like hidden text, it may only take 2-3 weeks, depending on when someone looks at the request and if the request is approved.

So while BMW was upset that Google didn't give them a heads-up about being banned, at least they didn't have to wait 2-3 weeks to get back in. Over at Matt's blog post, you can see some people commenting who aren't happy with such express service. Matt responds:

Our main goal has to be to give the most relevant results to our users; there is currently a trade-off between taking action to remove spam from our index vs. removing sites that lots of users look for with navigational queries.

That brings me back to the advice I've long given to those thinking of skirting search engine guidelines. How big do you think you are? If you really think you're running a crucial site, you can sin against Google and gang and probably be forgiven in short order. They do need you. Absolution will be provided. They may put you back in so that you don't rank well for generic searches, but you'll be back in and fine for navigational ones.

Running some small web site that no one's going to miss? Don't expect express treatment nor gamble you'll be reincluded.

Meanwhile, Barry points to a WebmasterWorld thread finding that the same thing that got BMW banned is still happening. Well, not quite. As Philipp at Google Blogoscoped points out, the pages are gone from the live site but Google is still retaining cached copies of them. Those cached pages should be dropped over time.

Want to comment or discuss? Please do! Visit our Search Engine Watch Forums threads Google Removes BMW Germany For Spamming or BMW debacle good for SEO?

Posted by Danny Sullivan at 10:40 AM | Permalink

February 6, 2006

Google Sitemaps Stats On Most Common Words In Your Anchor Text & Site Content

Along with the cool new robots.txt checker, Google Sitemaps has also released stats showing the most common words used on pages within your web site and the most common words in anchor text pointing at your site.

The common words in site content stats will be good fodder for those who believe Google somehow tries to figure out a word "theme" for your entire site. Google's never claimed to do this before -- and seeing sites like Amazon or Wikipedia rank for anything when they are about nothing in particular should demonstrate that you don't need to target all your pages around a particular term or theme.

Still, if Google's generating stats like this for a site, it'll probably tip some people back to worry more about this. I wouldn't - but do as you deem best.

The anchor text analysis is far more intriguing. Again, Google has generally said that each page is measured by the links pointing at that particular page. So if someone points at a deep page in your site, that helps that particular deep page, not the site as a whole. And if someone points at your home page, that helps the home page, not the entire site (Yahoo, in contrast, has said it does some sitewide link crediting).

Now Google's reporting anchor text terms for an entire site -- which suggests that any link to any page in your site might have an impact on other pages. Or not!
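To see why a site-wide report raises this question, here is a sketch of two ways anchor text could be tallied: per target page, which matches how Google has described link credit, and summed across the whole site, which is what the new report shows. Google hasn't said how it computes these stats; the parser, the word-splitting rule and the example URLs are assumptions.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AnchorCollector(HTMLParser):
    """Collects (absolute href, anchor text) pairs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)  # only text inside a link counts

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((urljoin(self.base_url, self._href), "".join(self._text)))
            self._href = None


def anchor_word_counts(pages, site_host):
    """pages: {url: html}. Returns (counts per target page, site-wide counts)."""
    per_page, site_wide = {}, Counter()
    for url, html in pages.items():
        collector = AnchorCollector(url)
        collector.feed(html)
        for target, text in collector.links:
            if urlparse(target).netloc != site_host:
                continue  # only links pointing at the site being measured
            words = re.findall(r"[a-z0-9]+", text.lower())
            per_page.setdefault(target, Counter()).update(words)
            site_wide.update(words)
    return per_page, site_wide


pages = {
    "http://blog.example.org/post": '<p>See <a href="http://www.mysite.com/shoes">cheap running shoes</a>'
    ' and <a href="http://www.mysite.com/">My Site</a>.</p>',
    "http://news.example.net/": '<a href="http://www.mysite.com/shoes">running shoes</a>'
    ' <a href="http://other.example/">elsewhere</a>',
}
per_page, site_wide = anchor_word_counts(pages, "www.mysite.com")
```

Here "shoes" is credited only to the /shoes page in the per-page view, but it also shows up in the site-wide totals alongside the words from links to the home page, which is exactly the ambiguity the new report creates.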

Questions, questions. I'll drop a word over to Google blogmeister Matt Cutts to see about getting some answers. I'll postscript here, but I'd also say to watch his blog as well.

Finally, while these stats are promised, I don't see them live for all of my sites in Sitemaps yet. If you don't either, there's probably a delay in getting them rolled out and live.

Posted by Danny Sullivan at 8:24 PM | Permalink

Google Launches Robots.txt File Checker; Now We Need Robots.txt Standardization

Very nice. Wondering how a search engine will process your robots.txt file? Google now provides a way to check on that through the Google Sitemaps program. The "More stats and analysis of robots.txt files" post on the official Inside Google Sitemaps blog explains more.

For Search Engine Watch members, the longer version of this article gives a real life example of how nice the checker is in action.

Overall, I'm thrilled with the new tool. I'd like to see the other search engines add similar ones. Even better, I'd like to see them all come together on creating an enhanced, more standardized robots.txt protocol. Consider:

Postscript: Matt Cutts from Google has some good comments over here, pointing out Google also has an allow command (I've updated my list above) and further in comments to the post, explaining why they don't support crawl-delay yet because of concerns it might be set too low by mistake by some webmasters.
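A made-up example shows why a standard matters: parsers can disagree on which rule wins when Allow and Disallow overlap. Python's standard-library `urllib.robotparser` (in current CPython releases) applies the first matching rule in file order, while Google's own robots.txt documentation gives precedence to the longest, most specific matching path, the rule later written into RFC 9309. `longest_match_allowed` is a hypothetical, simplified checker for comparison, not Google's code.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt where the two precedence rules disagree.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /private/public-page.html
Crawl-delay: 10
"""


def _stdlib_parser(robots_txt):
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def first_match_allowed(robots_txt, url):
    """Python's stdlib parser: the first matching rule in the group wins."""
    return _stdlib_parser(robots_txt).can_fetch("*", url)


def stdlib_crawl_delay(robots_txt):
    """Crawl-delay as read by the stdlib parser (None if absent)."""
    return _stdlib_parser(robots_txt).crawl_delay("*")


def longest_match_allowed(robots_txt, path):
    """Hypothetical checker: the longest matching rule path wins; ties go to Allow.

    Simplified: ignores User-agent groups and wildcard patterns.
    """
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field not in ("allow", "disallow") or not value or not path.startswith(value):
            continue
        if len(value) > best_len or (len(value) == best_len and field == "allow"):
            best_len, allowed = len(value), field == "allow"
    return allowed


url = "http://www.example.com/private/public-page.html"
print(first_match_allowed(ROBOTS_TXT, url))  # False: "Disallow: /private/" comes first
print(longest_match_allowed(ROBOTS_TXT, "/private/public-page.html"))  # True
```

The stdlib parser also reports a crawl delay of 10 for this file (`stdlib_crawl_delay`), a directive Matt says Google doesn't support yet, which is one more place where crawlers can behave differently on the same file.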

Posted by Danny Sullivan at 8:08 PM | Permalink

February 2, 2006

Q&A On Google Sitemaps

Google Sitemaps Team Interview has lots and lots of answers from Google about how Google Sitemaps operates, so check it out if you're getting into them. Don't forget there's also an Official Google Sitemaps Blog.

Posted by Danny Sullivan at 12:48 PM | Permalink

Google Bigdaddy Search Infrastructure To Rollout More Broadly

Bigdaddy progress update from Google's Matt Cutts provides an update on the new Bigdaddy search infrastructure that Google has been testing for several weeks. It's now rolling out to various Google data centers, but that's going to be a process that will take weeks.

Want to know more? Matt talked with Greg Boser and Todd Friesen about Bigdaddy as part of their SEO Rockstars show earlier this week (audio file here).

Along with jammin' and spammin' and Bigdaddy, Matt covered sandbox issues, link baiting, the v7ndotcom SEO contest, playing hockey at Google, bad plays and more.

Rand gives you a text summary of the show here. Threadwatch ponders Matt's use of the term orthogonal here. By the way, you should be listening to SEO Rockstars, since Matt himself does when driving to work.

Want more on Bigdaddy? See our New Google "Bigdaddy" Infrastructure Live, Data Center Open For Feedback thread at the Search Engine Watch Forums. Here's a summary post I did within that discussion:

Yep, Matt's been a madman today, but there's a method to his madness. It was all part of setting things up for taking feedback on the Bigdaddy data center, which will migrate to Google in the next month or two. So expect a Feb. 2006 or March 2006 Bigdaddy Update. Key posts, which I'd suggest reading in this order:

  • Bigdaddy on the move: Alert that one of the Bigdaddy data centers is back to showing regular results so fixes can be put into place. Want Bigdaddy, then go to http://66.249.93.103, where it's still live.  
  • Feedback on Bigdaddy data center where he covers how the data center got its name, how this is an entire new infrastructure for Google web search coming online, how it will go live on "regular" Google in the next month or two, how ranking changes you may see now on regular Google are unrelated, how to send feedback about changes you see and more.  
  • SEO advice: discussing 302 redirects on how and why Google handles permanent redirects on regular Google and new Bigdaddy-flavored Google.  
  • SEO advice: interpreting inurl on how to use the inurl operator at Google and why the results probably don't show a hijacking issue, in case you suspect that in regular Google or Bigdaddy.  
  • SEO advice: url canonicalization on my favorite word, how Google determines which domain to use for your listings when there are multiple options. Canonical issues are something Matt hopes Bigdaddy will improve.
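For site owners, one practical way to see the canonicalization problem is to normalize your own URL variants and check how many distinct forms are floating around. The sketch below is hypothetical: it is not how Google picks a canonical domain (Google hasn't published that), the preferred host is something you choose, and the list of default documents is an assumption about common server setups.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}
# Assumption: file names that servers commonly treat as a directory's default page.
DEFAULT_DOCUMENTS = ("index.html", "index.htm", "index.php", "default.asp")


def _strip_www(host):
    return host[4:] if host.startswith("www.") else host


def canonicalize(url, preferred_host):
    """Map common duplicate spellings of a URL onto one preferred form (hypothetical)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if _strip_www(host) == _strip_www(preferred_host):
        host = preferred_host  # example.com and www.example.com -> the owner's choice
    if parts.port and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"  # keep non-default ports; they may be different sites
    path = parts.path or "/"
    for doc in DEFAULT_DOCUMENTS:
        if path.endswith("/" + doc):
            path = path[: -len(doc)]  # /dir/index.html -> /dir/
            break
    return urlunsplit((scheme, host, path, parts.query, ""))  # fragment dropped


variants = [
    "http://example.com",
    "http://www.example.com/",
    "HTTP://WWW.Example.com:80/index.html",
    "http://example.com/index.html",
]
print({canonicalize(u, "www.example.com") for u in variants})  # {'http://www.example.com/'}
```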

By the way, for some additional background on two of the biggest problems that Bigdaddy aims to solve for Google -- hijacking and canonical issues, see these past pieces from Search Engine Watch:

So far, I haven't had a chance to play with Bigdaddy, but I already have a big positive feeling from the effort Matt's put into prepping people for it and helping them send feedback.

Posted by Danny Sullivan at 11:22 AM | Permalink

January 20, 2006

Second Issue of Google's Librarian Newsletter Released and More Interesting Reading on Web Search

The second issue of Google's Newsletter for Librarians is now available. It features an article by Karen Schneider, the director of the Librarians' Internet Index, the wonderful and important searchable directory of high quality web resources that I've mentioned on the blog and in SearchDay many times.

Schneider focuses on some of the critical information judgments needed in determining the trustworthiness of a site and the info that it contains. Those of us who attended library school are aware of many of these concepts. I hope Karen's article reaches more than information professionals, including students, to whom these ideas should be taught and reinforced from the earliest grades forward.

Next, Matt "Jagger" Cutts is back with a look at how Google determines which sites are "most trusted." His article talks about the hundreds of factors (including some traditional information retrieval metrics) that Google looks at in addition to PageRank.

For more of an in-depth discussion of this you might want to pick up a copy of Chris Sherman's (yes SearchDay's Chris Sherman) book, Google Power. You can preview the title via Amazon's Search Inside the Book. I was unable to find it using Google Book Search.

Bear in mind that Matt's article was written primarily for librarians and other information professionals. He explains that Google, like other engines, analyzes the actual content.

He points out that, "this [analysis] goes beyond scanning page-based text, which webmasters can easily manipulate through meta-tags."

While it's true that Google and other engines look to some degree at the meta description tag, he doesn't mention that although the meta keywords tag is still used by some, its value is not as great as it once was. Danny points this out in a 2002 article. You'll also find meta tags listed in this post from Barry.

Cutts goes on to write: We also look at factors like fonts and the placement of words on a page. And we examine the content of neighboring pages, which can provide more clues as to whether the page we're looking at is trusted and will be relevant to users.

It would have been useful, particularly to the readers of this article, had Matt explained that the factors listed above and many others can also be manipulated, or what others have termed "gamed."

As I've pointed out in many presentations to librarians, this is not a good or bad thing but simply the way large general-purpose web engines work. For the librarian, a knowledge and understanding of this is important and useful.

After reading both Karen's article and Matt's piece, we see somewhat of a disconnect between trustworthiness in terms of inclusion and good placement on a results page versus the trustworthiness concepts that a human might use to judge not only the quality of a web page itself but also the data it contains. Yes, I'll readily admit to being a bit prejudiced here, but I think Karen's article also illustrates the value of just one of the many skills well-trained librarians can offer.

Matt concludes with links to a few more excellent papers.

Btw, many of the same concepts (what Google calls and has patented as PageRank) are in place at just about every other major web engine. In other places, the concept is referred to as link analysis.

As a librarian, I would have loved it if Matt had given a "shout out" to Dr. Eugene Garfield, the father of citation analysis. It has been around since the 1950s, and librarians have been using it since day one. The relationship between citation analysis (something librarians understand) and link analysis (PageRank) is strong and is even noted in Brin and Page's seminal paper. One of the biggest differences is that web link analysis is much more open than traditional citation analysis and thereby easier to game, although gaming citation analysis is also possible to some degree.

Yes, the concepts used in citation analysis are really what drive link analysis.
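To make the comparison concrete, here is the textbook PageRank iteration from Brin and Page's paper: a page's score is built from the scores of the pages that link to it, with each linking page's vote split evenly across its outbound links, much like a citation weighted by the importance of the citing paper. This sketches the published formula with the paper's suggested damping factor of 0.85, not Google's production ranking, which uses many other signals; spreading the vote of a page with no outlinks evenly over all pages is a common convention assumed here, and the four-page graph is made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration. links: {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's vote evenly across the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outlinks): spread its vote over every page.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]})
print(sorted(ranks, key=ranks.get, reverse=True))  # ['C', 'A', 'B', 'D']
```

Page C wins because it is "cited" by three pages, and A comes second because its single citation comes from the highly ranked C; D, which nobody links to, keeps only the baseline score.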

If you want to learn more, this post has tons of links and interviews about citation analysis. It also includes a link to Garfield's paper, "Citation Indexes for Science: A New Dimension in Documentation through Association of Ideas."

Finally, although this Scientific American article was written in 1999, I still think it's one of the best, especially for non-geeks, about web link analysis. It was written by members of IBM's Clever team.

Clever was a web search engine developed by IBM but never publicly released. More about it here. Members of the Clever team read like a "who's who" of web search, including Jon Kleinberg, Soumen Chakrabarti, and Prabhakar Raghavan, who is now the head of Yahoo Research.

As you review the article, take special note of the section where Clever and Google are compared. While Clever never made a public appearance, many of the concepts it offers are what power the Teoma/Ask Jeeves search technology.

Postscript: Yahoo's Prabhakar Raghavan offers archived materials from his Stanford classes on text and information retrieval online. Must-have content for those interested in the subject.

Posted by Gary Price at 11:58 AM | Permalink

January 11, 2006

New Article Looks at Google, the Domain Registrar

Via Jim Boykin's blog, a guest article by domain name expert Nick Wilsdon looks at what Google knows about domain names. It's not only a great look at how the domain name industry works but also explains what information Google has access to as a domain name registrar. The company became a registrar in February 2005. A very interesting read.

Nick writes: However it seems that Google did not start signing as a Registrar in order to buy or sell domains; they did this to have greater access to domain information. Now to clear some of the FUD that was speculated about this. Google as, say a .com Registrar, does not have access to all the customer records of Verisign. They can only access the further details of domains which they personally sold. Tucows and GoDaddy are both Registrars with Verisign, do you think they have access to each other's entire customer database? Certainly that information would be well worth the $3500 fee. No. Unless the domain is within their own account they have exactly the same access as you or me using the public WHOIS records.

He goes on to offer his views, what he later calls speculation, on why Google became a registrar: I believe Google has built or is building a tool to analyse domain names. The API access they were given as a Registrar allows them to carry out the level of automated queries they needed for this. I would also go further and suggest this tool is building up a historical picture of each domain through regular scraping of their WHOIS records.

Postscript: Since we're on the subject of domain names, let me answer a question I frequently receive. When I share newly registered domain names on the blog, I get the information using a combination of my own time and the WHOIS.sc Mark Alert service. For $15/month, I receive daily updates of newly registered domains containing the word "Google." I track not only "Google" but numerous other names. The service tracks .com, .org, .net, .info, .us, .biz, and .web domains.

Then, I spend some time each week reviewing the lists and checking ownership. It's not only useful (I hope) but also interesting (often amazing) to see what people are registering. That same $15 fee includes access to the Whois History database, which provides historical ownership info back to about 2001. If you're looking for some basic domain name stats, WHOIS.sc offers them for free on this page.
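The filtering step behind a keyword alert like this is easy to picture. Here is a hypothetical Python sketch, assuming you already have a day's list of newly registered domains: it keeps the names that contain a watched term and end in one of the TLDs the service tracks. The watch list and sample domains are made up, and this is not how WHOIS.sc actually implements Mark Alert.

```python
# TLDs listed for the alert service above; the watch terms are hypothetical.
TRACKED_TLDS = {"com", "org", "net", "info", "us", "biz", "web"}
WATCH_TERMS = ["google"]  # add other names to track here


def matching_domains(new_domains, terms=WATCH_TERMS, tlds=TRACKED_TLDS):
    """Return registrations whose name contains a watched term and whose TLD is tracked."""
    hits = []
    for domain in new_domains:
        domain = domain.strip().lower()
        # Split "my-google-tips.info" into "my-google-tips" and "info".
        name, _, tld = domain.rpartition(".")
        if not name or tld not in tlds:
            continue
        if any(term in name for term in terms):
            hits.append(domain)
    return hits


# Hypothetical day's registrations.
todays_registrations = [
    "googlefanclub.com",
    "example.org",
    "my-google-tips.info",
    "googlenews.co.uk",   # skipped: .uk is not a tracked TLD
    "Froogle-Deals.NET",  # skipped: "froogle" does not contain "google"
]
print(matching_domains(todays_registrations))
```

Plain substring matching is deliberately literal, so each variant worth watching (Froogle, for example) needs its own entry in the watch list; the human review step described above is still what separates the interesting registrations from the noise.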

If you're wondering, the answer is yes, other services exist that can provide similar types of info more quickly and in an easier-to-use format. However, they are often very expensive. Examples include:

+ MarkMonitor

+ Thomson Compumark

+ Dialog File 225, which offers fielded searching (numerous options) of domain name info back to 1997. Caveat: it was last updated in September 2004.

Again, all of these services are fee-based.

Note: Dialog's Open Access service offers limited access to the Dialog 225 file. It's very limited in which fields you can search. So why mention it? It doesn't require a subscription to use, searching is free, and you can pay for records with a credit card.

Posted by Gary Price at 12:01 PM | Permalink

January 4, 2006

Google Wants Your Feedback About Their New "Bigdaddy" Data Center

Google's Matt Cutts offers a FAQ of sorts with information about the new "Bigdaddy" data center that Matt mentioned about a month ago in a response to a post on his blog. In this new blog post Matt says that Google is now ready to collect feedback about the new center.

Matt's post answers popular questions such as where the Bigdaddy name came from. Answer: a session at PubCon.

Bigdaddy is visible from two data centers:

+ 64.233.179.104

+ 66.249.93.104

Cutts guesses that Bigdaddy will become the default source of web results in one to two months.

Matt goes on to say that Google is looking for feedback on the quality of Bigdaddy's results, and adds that site: searches are now improved at all Google data centers.

He writes: We'd like to get general quality feedback. For example, this data center lays the groundwork for better canonicalization, although most of that will follow down the road. But some improvements are already visible with site: searches. The site: operator now returns more intuitive results (this is actually live at all data centers now).

Matt adds that our SEW Forums would be a good place to share your comments about Bigdaddy. Some discussion is already underway in this thread. Make sure to share your thoughts.

Posted by Gary Price at 5:00 PM | Permalink

December 22, 2005

Ending The Year With A Google Sandbox Reexamination

Since everyone else is blogging about our Google Sandbox discussion at the Search Engine Watch Forums, guess I'd better get on it as well. Getting Out Of Google Sandbox Using Subdomain & Redirection was the thread that started things, where moderator Dave Naylor shared with members the tip he gave out at SES Chicago on breaking out of a sandbox-like effect. That later brought out an entire reexamination of whether there really was a sandbox at all. Barry over at Search Engine Roundtable said all the "big dogs" were participating. Yep, a lot of our most highly rated members are in there spinning out examples and having a fine time discussing things. It got so good I decided to spin that portion off to a new thread, which you'll find here: 2005 Year End Revisit: Is There A Google Sandbox?

Posted by Danny Sullivan at 5:18 PM | Permalink

December 19, 2005

First Issue Of Google's Newsletter For Librarians Released; Cutts Writes On Inverted Index & Ranking

In October, Danny blogged about Google's "coming soon" quarterly newsletter for librarians. Today, the first issue went live. It's available here.

I've posted a bit more on ResourceShelf.

The highlight of this issue is the Matt Cutts authored article on how Google crawls content and ranks results, with a very nice explanation of how an inverted index works. For some librarians and many readers of this blog, it will be familiar material. Nevertheless, when Mr. Cutts writes, it's always a great read and in this case an excellent review.
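For readers who want to see the idea in code, here is a minimal Python sketch of an inverted index: a map from each term to the set of documents containing it, so a query only touches the entries for its own terms instead of scanning every page. The sample documents and the simple AND-style lookup are hypothetical; real search engines also store word positions and apply ranking, which this sketch leaves out.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from the first term's documents, then intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical mini-collection of "pages".
docs = {
    1: "Librarians use citation analysis",
    2: "Search engines use link analysis",
    3: "Link analysis grew out of citation analysis",
}
index = build_inverted_index(docs)
print(sorted(index["analysis"]))
print(sorted(search(index, "link analysis")))
```

The index is built once at crawl/indexing time; answering a query is then a handful of lookups and set intersections, which is why this structure scales to a whole web's worth of pages.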

Posted by Gary Price at 5:36 PM | Permalink

December 2, 2005

Google Sandbox, Sandbox-Like Filters & Escaping With Trusted Links

For well over a year, there's been massive debate and speculation that Google puts all new sites into a "sandbox," preventing them from ranking well for anything until a set period of time has passed. Now we get more confirmation from Google that if there's not a sandbox, there's at least part of its algorithm that may make it seem that way for some sites -- plus thoughts on how trusted links may help sites escape those filters. Google Sandbox, Sandbox-Like Filters & Escaping With Trusted Links for Search Engine Watch members looks at some of the discussions and articles that have fueled this over the past few weeks.

Posted by Danny Sullivan at 9:59 AM | Permalink

November 16, 2005

Google Sitemaps Expands To Give Query & Indexing Stats!

Google's just added some new and long desired tools as part of its Sitemaps system. You can now get query stats and see top keywords driving traffic to your web site (wow!). Crawl stats also show you how often you've been visited, along with any errors and messages explaining why, such as "You banned us with robots.txt, you idiot." OK, it doesn't say that, but it should. You don't have to submit to Sitemaps to play with the new tools -- you just need to have a free Sitemaps account. Go check it out. More details here on the Google Sitemaps blog. I'm off to play and will do a follow-up afterward. Want to discuss or comment? Visit our SEW Forums thread, New Stats On Queries & More With Google Sitemaps.

Postscript: Remember, to see the more detailed stats, you have to verify your site with Google first. Once it's verified, you'll have access to them.

Posted by Danny Sullivan at 3:52 PM | Permalink

How to Submit Your Content to Google

Just noticed this new page that lays out five ways to submit content to Google:

+ Add your URL to Google's index

+ Google Base - New!

+ Google Sitemaps

+ Google Print Publisher Program

+ Google Video Uploader Program

Interestingly, no mention of the Froogle feed program or creating content for inclusion via Blogger.

Posted by Gary Price at 1:20 PM | Permalink

November 10, 2005

Matt Cutts Banned On Google? And Oct. 2005 Jagger Update Winds Down

The Oct. 2005 Google Jagger update saga that has sucked the life out of so many (but not all; some are blissfully unimpacted by it) seems to be ending. Indeed, so says Google's Matt Cutts in his Jagger winding down post. But Matt, if the update is over and bugs worked out, why's your blog banned on Google?

The article I just posted for Search Engine Watch members (go on, support the site - become a member and get to read the full story) goes into detail about the situation, but here are the highlights for everyone.

  • He's not really banned, but less savvy site owners could easily get that impression.  
  • Dave Naylor presents the evidence in Jagger3 hahahaha, where he shows how a search for mattcutts.com brings up no matches. That's often a sign that a site has been banned.  
  • He's not banned, however, as a search for matt cutts shows. You'll see how he's ranking in the top results for that, which wouldn't happen if he was banned.  
  • But wait! Notice how he ranks twice, with listings for both www.mattcutts.com and mattcutts.com! That's a canonical error -- my favorite hated word and problem that I wrote about earlier. It's Google getting confused about which domain name to use for Matt's site (and other sites as well).  
  • Overall, another reason for what I said earlier -- it's overdue for search engines to let us register all our domain names directly with them and indicate the ones they should be using.

Also, "winding down" doesn't mean the changes are already live across all of Google. Matt's post said you'd find them in action if you went to the http://66.102.9.104/ data center. Over the coming days, the changes will migrate to all the Google data centers.

In some related notes, Barry at Search Engine Roundtable points to Update Saga. Part zillion over at WebmasterWorld, where people are wondering if the update has come to an end. It also notes that GoogleGuy has warned of a PageRank/backlink update to happen between now and the end of the year.

Thoughts on Jagger: Recips Got Hammered, Trust Trumps All from Andy Hagans at the Link Building Blog is a nice, short piece summing up what he thinks were the two major changes in the update.

First, reciprocal links don't seem to work as well (What are they? Want to discuss? Check out this SEW Forum thread: Reciprocal Linking - Dead or Alive?). Second, sites with authority/TrustRank seem to do better (What's that? Check out Yahoo My Web: An eBay For Knowledge).

Want to discuss or comment? Visit our SEW Forum thread, Oct. 2005 Google Update "Jagger". C'mon by Matt -- tell us what's going on :)

Posted by Danny Sullivan at 10:04 AM | Permalink

November 7, 2005

Google Oct. 2005 Jagger Update Continues Into November & Hating The Term Canonical

So I go away for vacation for two weeks, and discussion of Google's October 2005 "Jagger" update is STILL going on when I get back. Nice (or maybe not) to know nothing changes. Here's a fast rundown on things, including a look at canonical issues, what the heck that means and why after a decade of existence, maybe it would be nice if search engines gave us a better way to indicate the domains we own and which to use when listing our pages. Seems like that would help solve canonical/domain name problems.

  • Google: Phone Numbers in Results and Better Precision over at Threadwatch looks at how results for queries like [boeing 727] and [nokia 3650] seem cleaner. It also looks at Google showing phone number results for some queries. That's not new, but the links to maps next to listings do seem to be a recent change.  
  • Reciprocal Linking After Jagger? from Barry Schwartz over at Search Engine Roundtable wraps up some discussion in our own SEW Forums thread, Reciprocal Linking - Dead or Alive?, pondering whether reciprocal linking is being hit hard in the latest update.  
  • Jagger or Jäger? Google's Update Unraveled at Search Engine Lowdown has Jenny Halasz taking a swing at what seem to be some of the most widely discussed changes (hidden text, paid/reciprocal linking, too much internal link optimization) but also noting how few of her clients are seeing changes. FYI, at our SEW Forums Live event two weeks ago, which was heavily attended by in-house SEOs, the Jagger update was of relatively little concern to the audience. In other words, the update may have hit affiliates and others who are light on content harder than some others.  
  • Why I Try to Spend Less Time Analyzing Algorithm Updates from Todd over at Stuntdubl has sage advice I've heard other vets say before. Don't try to analyze too hard now. Wait for the dust to settle.  
  • Jagger Update at Google from Barry at SE Roundtable has him doing the hard work of slogging through the Update Jagger - Part 2 thread over at Webmaster World, which he's found to be the best forum discussion overall about various changes. He summarizes two key things that seem to be involved, duplicate content issues and reciprocal linking. Then, as with virtually any other major update you care to discuss, he covers how people are also reporting exact opposite findings of each other.  
  • Jagger 2 Update Info from Google's Matt Cutts has a rundown on changes that people should be seeing when searching Google now, along with lots and lots of comments.  
  • Jagger3 update is the latest weather report from Matt Cutts, saying that more changes and fixes are on the way, including correcting canonical problems. Canonical? Canonical! Oh, how I hate that term. First, I can never say it properly (I'm always saying caniconical!). Second, no one knows what it means, as you can see in comments below Matt's blog posts. Here's a definition from Answers.com:

The actual name of a resource. For example, a canonical name of a server is its true name rather than an alias.

To put that more in SEO terms, it means knowing which domain name a search engine should use for your site. Search Engine Watch, for example, can be found at:

+ searchenginewatch.com

+ www.searchenginewatch.com

+ sewatch.com

Those are just some of our domains. Usually, Google gets it right and lists our pages using our preferred domain name, searchenginewatch.com, which is the only one we actively promote. But sometimes, it will list our site as if it is two different sites, searchenginewatch.com and www.searchenginewatch.com. For example, look at this search. You'll see that the first page, How To Use HTML Meta Tags, uses the www.searchenginewatch.com domain. Then the third listing is the SEW home page, using the searchenginewatch.com domain.

That's a canonical problem. We're partially at fault. Somehow, we started doing a 302 temporary redirect rather than the 301 permanent redirect that's recommended, which I'm having fixed (we used to do a 301, and I don't know how that got messed up). Despite our bad, it's still a search engine canonical error that it can't figure out these are the same site despite the wrong redirect being used. Or perhaps a better term is a domain name error -- it can't get the domain name right, and that's easier to understand, much less pronounce.
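A minimal sketch of the fix, assuming a Python WSGI application: requests arriving on any alternate hostname get a 301 permanent redirect to the same path on the preferred domain, rather than a 302. The hostnames come from the list above, but the mapping, the sample path, and the simulate() helper are hypothetical illustrations, not Search Engine Watch's actual server setup.

```python
# Hypothetical configuration, using the SEW domains listed above.
PREFERRED_HOST = "searchenginewatch.com"
ALTERNATE_HOSTS = {"www.searchenginewatch.com", "sewatch.com"}


def canonical_redirect_app(environ, start_response):
    """WSGI app: 301-redirect alternate hostnames to the preferred one."""
    # Drop any port number and normalize case before comparing hostnames.
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()
    path = environ.get("PATH_INFO", "/") or "/"
    query = environ.get("QUERY_STRING", "")

    if host in ALTERNATE_HOSTS:
        location = f"http://{PREFERRED_HOST}{path}"
        if query:
            location += "?" + query
        # 301 says "moved permanently"; a 302 would only claim a temporary
        # move, leaving search engines to guess which hostname is the real one.
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]

    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Served from the preferred domain\n"]


def simulate(host, path="/", query=""):
    """Hypothetical helper: call the app directly, with no network needed."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {"HTTP_HOST": host, "PATH_INFO": path, "QUERY_STRING": query}
    list(canonical_redirect_app(environ, start_response))  # consume the body
    return captured["status"], captured["headers"].get("Location")


print(simulate("www.searchenginewatch.com", "/showPage.html"))
print(simulate("searchenginewatch.com", "/"))
```

To try it in a browser, the app could be served locally with the standard library's wsgiref.simple_server.make_server("", 8000, canonical_redirect_app) followed by serve_forever(). In practice it is also common to express the same rule in the web server's own configuration; what matters for search engines is the 301 status and the single preferred hostname.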

The entire mess also brings up the issue I've raised in the past, most recently with the MSN PageRank 2 case, about why ideally, site owners would simply be able to register the domain names they own with search engines in some trusted manner and indicate the preferred one that should be used. Then hijacking issues, canonical/domain name issues and other problems could more easily be solved.

LET'S GET ON WITH IT, SEARCH ENGINES! Who wants to continue with this type of madness?

Need to talk, discuss and commune about the update more? Part 3 of the Update Jagger over at WebmasterWorld is the latest multipart thread there. Oct. 2005 Google Update "Jagger" is the far more low-key discussion at our own Search Engine Watch Forums.

Posted by Danny Sullivan at 3:16 PM | Permalink

October 21, 2005

The MSN PageRank 2 Controversy & Search Engines Needing To Offer Domain Management Tools

As part of the current Google update underway, it's been noticed that MSN now has a PageRank score of 2. What's the deal, Google -- decide to pull a little love away from MSN? Not so, says Google's Matt Cutts -- they're actually a PR8. Then why do you see a PR2 score when you go to MSN? Let's break it down, as well as revisit the oft-desired need for search engines to allow site owners to tell them directly which domains should be treated as the same.

  • Visit MSN at http://www.msn.com with the Google Toolbar installed, and you'll see a PR2 score reported.  
  • MSN.com down to pagerank 2! at WebmasterWorld has some of the discussion this sparked, where the anonymous GoogleGuy from Google puts the blame on redirection that MSN is doing. Look at http://msn.com, and you'll see a PR8 score is reported, we're told.  
  • OK, but if you try that, you get redirected to http://www.msn.com, where it's still PR2.  
  • Answer? You need to get the Google PageRank score for msn.com in another way than trying to reach that page, since the redirection will send you to what's technically a different page, the home page of www.msn.com.  
  • How? Google's Matt Cutts, posting over at Threadwatch and sounding pretty in sync with GoogleGuy, explains that msn.com is a PR8 site and points to the Future PageRank checker at SEO Tools as a way to see this. (At this point, you're asking "Isn't Matt Cutts GoogleGuy?" For the record, Matt's never publicly laid claim to being GoogleGuy. But since Matt's more active on commenting with things these days, I think it's well time that GoogleGuy step forward with a real name, so that if they are one and the same, there isn't confusion that two different people are talking. Honestly, at some point we'll have someone citing GoogleGuy, then someone citing Matt against GoogleGuy, which is absurd if they are the same. I and many others do know the real identity of GoogleGuy. I think it's well time everyone knows and hope GoogleGuy will step forward).  
  • Run a test for msn.com using the checker, and you'll get a list of the PageRank scores reported by various Google datacenters, including the server that feeds the toolbar display. All of them are PR8.  
  • Again, you can't see these scores showing up in your browser when trying to go to msn.com, because you get redirected to www.msn.com, which has a different PR score.

All this brings us back to the issue of redirection. MSN is doing a 302 temporary redirect from msn.com to www.msn.com, which can leave search engines unsure whether the two should be treated as the same site. A 301 permanent redirect would be preferred.

But more preferred than that, life would be a lot easier if site owners could simply register all the various domains that may resolve to their "main" domain with Google and other search engines, rather than them having to guess.

People have been wanting this for ages. C'mon Google and Yahoo! You've both made moves to let us submit sitemaps and URLs to be crawled. Let's get with it so we can register domains with you and how they should be treated through some type of program. It's so long overdue. That's especially so given that after the last indexing summit, as I've written, the search engines were unable to unify on any common treatment of dealing with redirects.

Posted by Danny Sullivan at 9:21 AM | Permalink

October 20, 2005

Google Oct. 2005 Update "Jagger" & Contacting Google About It

Noticed changes to Google results? Many have, and now it's official. An update is on, and Google's Matt Cutts gives advice in his Update Jagger: Contacting Google post on how to contact Google with any concerns in terms of spam, bad relevancy, whether you think your site was penalized and so on.

But wait a minute! Didn't Matt downplay earlier that an update was going on in his More info on updates post? Yes, I'd say he did -- but he later added this:

Just to clarify, these days with lots of smaller and larger changes happening at different points in time, it's a little arbitrary to decide when to call something an update. That decision has usually fallen on Brett Tabke's shoulders over at WebmasterWorld (WMW also chooses what name they want to call it when Brett decides enough has changed to call it an update.) Given that there should be new PageRank/backlinks visible in a few days (assuming no issues at our end), I wouldn't be surprised if Brett slaps a name on it pre-emptively, even though there will still be some flux to come.

Absolutely, WebmasterWorld has been the traditional mechanism for naming updates. But whether something is an update or not isn't a unilateral WebmasterWorld decision, though Matt admittedly does qualify his statement to say this is "usually" rather than "always" the case.

Last month is a good example of how this is becoming a bigger issue. There was debate over whether a Google update should be declared. WebmasterWorld initially declared one was underway, dubbed it "Gilligan," then after Matt downplayed if it really was an update, a "false alarm" was declared by WebmasterWorld. I cover this more in my Google's Cutts Says Not An Update - I Say An Update, Just Not A Dance post from earlier, which also explains why even if Matt and WebmasterWorld didn't feel it was an update, I certainly did -- nor was I alone.

Now let's skip forward to this update.  We'd been having scattered posts on our forums about Google changes last week. On Monday, catching up on weekend posts, I saw a new flurry of them coming in. It was enough that I consolidated some threads into a single Oct. 2005 Google Update thread. As far as we were concerned at SEW Forums, an update was on. I went with the naming system we use, just the month and year. Certainly one was in the minds of several members, so having a thread just for them to discuss the issue helps.

Over at Threadwatch, it effectively declared the same in the Google Update drops SEO's in it post that Brian Turner started (he also posted early with us, kicking off our thread). Brian pointed over to a Digital Point forum discussion, Google seems to be dancing, where people were effectively declaring an update last week. He also pointed to much WebmasterWorld talk here (link currently down due to WebmasterWorld routing problems, see new live thread here), though this was before an "official" update was declared.

All this chatter was enough to get Matt to make his More info on updates post, which was a great weather report/confirmation that things officially were going on. However, he stopped short of saying it was an update, leaving that declaration to be WebmasterWorld's call.

It's just silly. Matt and Google know if they are doing an update, especially one that might generate a lot of forum chatter. I love that we're getting weather reports from Google, Yahoo and others now, but we need them issued ahead of time.

It's like standing outside in the rain then deciding to tune in your radio to a weather forecast to see if it's raining. Of course it's raining -- you can feel the drops. Since the search engines decide if rain will fall, give us the heads-up before -- and if you think it's going to be a significant downpour, then proactively declare an update underway. And err on the side of caution.

I'm totally cool with the idea of WebmasterWorld then continuing to assign a fun, memorable name to a declared update. That's been the tradition, and WMW deserves to continue to have that honor. I just want the search engines themselves to declare that the weather might be rough and give the official word that something is coming before it hits.

By the way, Changes and Paranoia - the sky is falling from Jim Boykin, out today, is worth a read for some fresh perspective on the insanity that can erupt in discussions during an update and on how to stay focused on the things that matter.

Want to comment or discuss? Visit our Oct. 2005 Google Update "Jagger" forum thread.

Posted by Danny Sullivan at 7:29 AM | Permalink

October 17, 2005

Matt Cutts Gives Tips On Moving Sites & Keeping Google Happy

Google's Matt Cutts has moved his blog to a new host, and he shares some advice on how to keep Google happy if you have to do a similar move here. The last step is most important. Once you are sure Googlebots are visiting the new site and not the old, it's safe to shut things down. He also touches on what to do if you need to change domains (short answer, 301 redirect from old to new). By the way, Retaining Traffic after a Web Site Redesign from SearchDay last year has some related tips you may find useful.

Posted by Danny Sullivan at 7:27 PM | Permalink

October 14, 2005

Fake PageRank Detection Tool

Using redirection and cloaking, a site can lay claim to the PageRank score of another site, with Dark SEO Team giving you a classic example here. Is it really a PR10 site, or is it Memorex? Actually, it's Google. Compare the backlinks, and you'll see link:www.pr10.darkseoteam.com = link:google.com! That's a way to detect fake PageRank. Or you can do it more easily with this handy tool from SEOlogs. Enter the domain you suspect of being a faker, push submit, and if the domain you were checking on remains the same, all systems go. If not, they're faking you out.

Posted by Danny Sullivan at 11:24 AM | Permalink

October 10, 2005

Talking With Google's Matt Cutts

Interview Of Matt Cutts from Aaron of SEO Book is a nice talk with Matt Cutts, Google's spam fighter, webmaster relations guru, quality assurance czar and hands-down winner of the needs a better title sweepstakes at Google. Matt, officially a Google software engineer, covers not all SEO being spam, no one being able to guarantee a top ranking free listing on Google, asking for SEO references when hiring a firm, LookSmart's distributed Grub spider being his favorite crawler, how he color codes his email, buzz marketing, Mick Jagger being the Matt Cutts of rock 'n' roll and lots, lots more.

Posted by Danny Sullivan at 3:05 PM | Permalink

September 23, 2005

Google Testing Remove Results Feature

Google's testing a new option letting a small percentage of people remove results they don't like from their own personalized search results. This will only happen if you're logged in and using Google Personalized Search. In other words, see a page you don't like? You can block that page from coming back. It only will impact the personalized results you see -- not the personalized results of others or general results that anyone sees.

Whether it will be released to everyone will depend on how the experiment goes, Google says. Whether the data might be rolled out to being used more broadly for regular, non-personalized results also remains to be seen, I'm told. It's just gone live about an hour ago.

See also the A Search Marketer's Look At Yahoo My Web 2.0 for Search Engine Watch members for a detailed look at how a similar feature operates on Yahoo. Yahoo provides more info on this (it's in My Web 1.0, not My Web 2.0, here).

Postscript: Google's Matt Cutts has posted more info about the feature over here.

Posted by Danny Sullivan at 1:20 PM | Permalink

September 20, 2005

Google's Matt Cutts Provides More Formal Info On Ban Notification Program

Earlier we noted that Google was testing a program to notify webmasters if they'd been banned on Google. Now Google's Matt Cutts has posted information more formally on his blog, in a Q&A format. More here: Alerting site owners to problems.

Posted by Danny Sullivan at 1:50 PM | Permalink

Confirmation From Google That The Sandbox Does Exist

In some catch-up news, Bloggers: Google 'sandboxing' sites at News.com from last month covers how Google engineers at the recent Search Engine Strategies conference finally confess that there is indeed a sandbox where new sites sit until they can rank well.

Google Sandbox Exists - So Says Google at Threadwatch is one source of the information, where member DougS said:

I listened for some time to one of the Google engineers expounding on all things search at Google. He said that internally they do not refer to the probationary period as the sandbox. They've been amused by the term, and have affectionately turned to calling the sand covered volley ball court in their quadrangle "the sandbox". He did, however, openly acknowledge that they place new sites, regardless of their merit, or lack thereof, in a sort of probationary category....

More of his comments are here, including how there are exceptions (as many have long assumed) that can spring a site out of the sandbox.

Oh Ye of Little Faith at SEOmoz is Rand Fishkin as the other key, earlier source. He writes:

We all shared the opinion that ranking new sites at Google was a pain since the inception of "sandbox" and Matt [Cutts, a Google software engineer] noted (this is a near word-for-word quote) - "OK, so it's really working. Even on you (guys)."

--and from another engineer--

He noted in words I cannot remember exactly that they felt it was having a remarkable effect on the quality of the index. We moved on to other subjects after this, but not before he was vehement in explaining to me specifically that they did not design it to affect "all new websites", but that a "filter must be tripped" for a site to be "boxed".

Postscript: Dave Naylor tells me a different story, that he felt the conversation Rand is talking about (Dave was also there) wasn't about sandboxing but instead new spam filters.

Posted by Danny Sullivan at 9:44 AM | Permalink

September 19, 2005

Checking If You're Banned On Google

Now that you know how to get reincluded in Google if banned, thanks to minty fresh advice from Google's Matt Cutts, how do you know if you've been banned at all? Some advice:

  • You could read all the information that Marcela De Vivo wrote up in the Coping With Search Engine Penalties article in SearchDay last month.  
  • You could hope (or pray that you don't!) that you get one of the new "you've been banned" emails that Google is currently testing.  
  • You could try a new tool I spotted via SEOmoz that tries to report whether you've been banned. It checks whether you have pages coming up via the site: command in Google. If not, that's sometimes a sign you've been banned. It also uses a second method, though exactly what this is isn't documented. There's a thread about the tool here, but it doesn't really cover the second method.

In the end, the very best way would be if Google provided a ban checking tool of its own. The current test of sending banned notification emails provides a glimmer of hope that this might come. Google's rejected such things in the past, but recent discussion at Threadwatch between Matt, me and others suggests perhaps it could happen.

Want to discuss? Visit our forum thread, Google Testing Ban Notification -- Could New Webmaster Tools Come?

Posted by Danny Sullivan at 10:38 AM | Permalink

Penalized in Google? Here's What to Do

If you've pushed the line with optimization techniques and have been dinged by Google, Matt Cutts offers advice on how to get back into the search engine's good graces. Matt says it boils down to two basic things:

Fundamentally, Google wants to know two things: 1) that any spam on the site is gone or fixed, and 2) that it’s not going to happen again. I’d recommend giving a short explanation of what happened from your perspective: what actions may have led to any penalties and any corrective action that you’ve taken to prevent any spam in the future.

Follow Matt's guidelines and with luck your site will be reinstated in Google search results in anywhere from 2-8 weeks.

Posted by Chris Sherman at 7:43 AM | Permalink

September 15, 2005

Google Testing Notification Of Banning To Webmasters

Nice scoop over at Threadwatch, coming off a thread at Search Engine Forums! Google Pilot New Webmaster Communications Initiative at Threadwatch covers a new Google program in which Google emails the owners of sites where it spots things it thinks might violate its guidelines. Google's Matt Cutts is going to blog more, but he comments over there:

Google is trying out a pilot program to alert site owners when we're removing their site for violating our guidelines. JavaScript redirects are the first trial, but we've also sent a few emails about hidden text, I believe. This is not targeted to sites like buy-my-cheap-viagra-here.com, but more for sites that have good content, but may not be as savvy about what their SEO was doing or what that "Make thousands of doorway pages for $39.95" software was doing. Personally, I think opening up a line of communication to let webmasters know when we're taking action is a really good thing--a site owner doesn't have to guess about what happened. But again, we're starting with a trial program. I'll blog about it more soon. [Note: Matt's blog is here]

Yeah, communication is great. It's just odd that after years of being told it was impossible or difficult to provide some type of "is my page OK" tool for webmasters to use, now Google's proactively doing it. Such a tool has been dismissed as potentially helping spammers.

The email sent says that a particular URL was removed and lists some reasons why, along with a note that it will be pulled for at least 30 days unless content is changed and a reinclusion request is done.

Posted by Danny Sullivan at 4:03 PM | Permalink

September 14, 2005

Google's Italian Webmaster Guidelines Need Better Translation

Enrico Altavilla writes to note that Google guidelines for Italian webmasters have taken a turn for the worse. After helping them eliminate some translation errors two years ago, he was shocked to find the material reverted in July. He writes:

Back in January 2003, Google.it's guidelines for webmasters (1) were full of translation errors and many Italian webmasters were puzzled by the misleading and meaningless information published on Google.it web site.

So, I sent a correct version of many phrases to the Google translation team, they thanked me with some merchandise and they published a corrected version of their italian guidelines, that you can currently see only in this Archive.org cached page.

The correct version has been on Google.it web site until last July (I can't be more precise), when I noticed that the guidelines reverted to the two years old errors-filled version. This change is producing (again) doubts and questions on webmasters and SEO Italian forums.

I published an article about this problem on my SE related news service and of course I contacted again the translation team to submit the issue. That was on July 17 2005 and since then they did nothing.

Is that a big deal? Judge for yourself. Here's how he says the guidelines translate, comparing the original English wording with what the Italian version now says:

Original: "Keep the links on a given page to a reasonable number (fewer than 100)."

Italian: "Keep the links to a given page to a reasonable number (fewer than 100)."

Original: "In particular, avoid links to web spammers"

Italian: "In particular, avoid links to sites that send unsolicited emails"

Original: "Don't employ cloaking or sneaky redirects."

Italian: "Don't employ cloaking or unaccepted redirection commands."

Original: "Avoid 'doorway' pages created just for search engines, or other "cookie cutter" approaches"

Italian: "Avoid 'doorway' pages created just for search engines, or other approaches to suppress [browser] cookies"

Original: "It's not safe to assume that just because a specific deceptive technique isn't included on this page, Google approves of it."

Italian: "It's not safe to hypothesize that Google approves a web page just because no deceptive techniques were adopted."

Posted by Danny Sullivan at 10:50 AM | Permalink

August 3, 2005

In the Search Engine Penalty Box?

Search engines have rules for what's acceptable—and not acceptable—for content, linking, and many other factors that are used to calculate relevance. Some guidelines are clear and public, but other policies are known only to the search engines themselves, and if you step over the line, your site may be dinged with a penalty that decreases your rankings or worse, eliminates your site from search results altogether.

Even though many of these policies are not public, observant search engine optimizers have recognized and described many tactics that can draw a penalty to a web site. In today's SearchDay article, Coping with Search Engine Penalties, guest writer Marcela De Vivo describes many of these gotchas, and offers advice and guidance for dealing with penalties.

Posted by Chris Sherman at 8:00 AM | Permalink

July 29, 2005

And Now, A Yahoo Sandbox?

Debate still continues on whether Google has a "sandbox," the idea that new sites simply can't rank well for anything other than perhaps their own names until a set period of time has passed. Now rumors and talk of a Yahoo sandbox have begun. Barry rounds up some forum discussions here, and Threadwatch has some talk here. We also have a forum thread here, Yahoo Sandbox?

Sadly, clarity in all this gets lost because "sandbox" has now become a synonym for "I don't rank well in Google." Say someone was ranking well, then an algorithm shift moved them from the first page to the second page or further back.

That happens all the time, and it happened long before we had a sandbox notion at Google or elsewhere. But I've seen people say, "Oh, that's the sandbox" when it clearly does not fit the traditional sandbox notion. This type of mistaken assumption pollutes real understanding of how any Google sandbox may be working.

Sandbox - IN or OUT? at our SEW Forums has lots of background info on the Google sandbox concept. See also New Google Patent May Give Sandbox & Inner Workings Info, posted recently on the blog.

Posted by Danny Sullivan at 9:06 AM | Permalink

July 12, 2005

Revisiting PageRank Lunacy

Over at ClickZ, Mike Grehan's What Price PageRank? column looks at how people continue to be obsessed by PageRank and how the idea that PR scores mean nothing may freak out some in the PageRank economy that revolves around the Google Toolbar's PR meter.

Just how obsessed people are was underscored at the end of May when, despite it being a holiday weekend in the US and UK, forums were hit with many posts about the PR meter temporarily going down.

Yeah, people obsess over that stupid meter. I try to wean newbies away from this when I do my Intro To Search Engine Marketing talk at our SES shows.

I have a section on link building where I discuss the Google Toolbar and PageRank. I start out by saying that the PageRank score is basically unimportant and then run through a number of reasons why...

  • Are all the links on the page getting the same share?
  • Are some links discounted?
  • What's the context of the link, the words in the anchor text?
  • Did you check for nofollow attributes on the links?
  • Are the links in the navigational elements or elsewhere?
  • Are they run of site links?

I speed up as I go, listing various things to consider, and end by reiterating that because you don't know the exact answers to many of these questions, judging a page purely on its PageRank score is a waste of time.
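To make the first question in that list concrete: in the original, published PageRank formula, a page's vote is split evenly across all of its outbound links. The minimal Python sketch below assumes that textbook formula and a damping factor of 0.85 (the value suggested in the original paper); it deliberately ignores whatever Google has changed since, such as discounted or nofollowed links. The function name and the 4.0 input are hypothetical, and 4.0 is a raw PageRank value, not a toolbar score.

```python
# A sketch of the per-link share in the ORIGINAL published PageRank formula:
#   PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
# where T1..Tn link to A and C(T) is T's number of outbound links.
# Google's later changes (discounted links, nofollow, etc.) are NOT modeled.

DAMPING = 0.85  # d; 0.85 is the value suggested in the original paper


def share_per_link(page_rank: float, outbound_links: int,
                   d: float = DAMPING) -> float:
    """PageRank passed through each outbound link of a page."""
    if outbound_links < 1:
        raise ValueError("a page needs at least one outbound link to pass PageRank")
    return d * page_rank / outbound_links


# Hypothetical pages with the same raw PageRank but different link counts.
for links in (10, 100):
    print(f"{links:>3} outbound links: {share_per_link(4.0, links):.3f} per link")
```

Both hypothetical pages have the same PageRank, yet each link from the 100-link page passes 0.034 against 0.340 from the 10-link page, a tenth as much. That is one reason a page's PR number alone says little about what a particular link on it is worth.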

In fact, I was stunned when I first started hearing, back around 2002, of people deciding whether they wanted to get a link based on Google PageRank values, as opposed to just getting good links, period. I covered this obsession in my Google Sued Over PageRank Decrease article from back then.

That article included The Golden Rules Of Link Building, which I've since broken out into their own page. I think they are still useful for the new person pondering what to do when it comes to links, buying links and freaking out over the PR meter.

See also Google PageRank, Meet Yahoo Web Rank!, which looks more at why PageRank doesn't win out over everything. And if all the talk about wanting "authority" links and worries over "bad neighborhoods" freak you out, forget those as well.

What's a link you want? The search engines will tell you. As I've written, virtually unchanged, since around 1997:

By building links, you can help improve how well your pages do in link analysis systems. The key is understanding that link analysis is not about "popularity." In other words, it's not an issue of getting lots of links from anywhere. Instead, you want links from good web pages that are related to the topics you want to be found for.

Here's the simple means to find those good links. Go to the major search engines. Search for your target keywords. Look at the pages that appear in the top results. Now visit those pages and ask the site owners if they will link to you. Not everyone will, especially sites that are extremely competitive with you. However, there will be non-competitive sites that will link to you -- especially if you offer to link back.

Why is this system good? By searching for your target keywords, you'll find the pages that the search engines themselves are telling you are good, as evidenced by the fact that they rank well. Hence, links from these pages are more important -- and important for the terms you are interested in -- than links from other pages. In addition, if these pages are top ranked, then they are likely to be receiving many visitors. Thus, if you can gain links from them, you might receive some visitors who initially go to those pages.

Need more take-me-by-the-hand advice? Link Analysis And Link Building for Search Engine Watch members provides that.

Posted by Danny Sullivan at 10:51 AM | Permalink

June 28, 2005

Reinclusion Tips If Banned From Google Or Yahoo

Yahoo ban info at our Search Engine Watch Forums covers some tips on getting back into Yahoo if you've been removed for some reason, while Why is this site banned from google? touches on getting back in Google's good graces -- or at least how to ask. In summary:

  • Yahoo: Send to ystfeedback @ yahoo.com (remove those spaces, obviously). Alternatively, you can try the online support form.  
  • Google: The old email request system no longer works -- I just double-checked that. Instead, use the Google online support form specifically for webmasters.

Before requesting any reinclusion, it's always best to ensure that you've sorted out any problems you already know that your site has. What types of problems might those be? Read the manuals.

Google Information for Webmasters is your guide from Google while Yahoo Search Help leads to plenty of similar information for webmasters.

Posted by Danny Sullivan at 6:37 AM | Permalink

June 17, 2005

PageRank Decoder Offers Flash-Based Guestimates On Linking Impact

Want to understand how PageRank will build between pages you link? Only Google actually knows how that works. There's been so much tinkering and tampering with what they do since the original PageRank formula was published years ago that using that equation to understand what happens today is like teaching science with a textbook that's hundreds of years old.

Nevertheless, that's all PageRank Decoder has to work with -- the old formula. Spotted via Search Engine Roundtable, this Flash-based application lets you link between actual pages to guestimate (strong on the guess) how things might change. Further comments on the tool from Search Engine Roundtable and

For a nice, healthy and recent debate on how much we can really know about PageRank calculations, I recommend reading our Revisiting whether PR is lost when adding pages to a site thread on the Search Engine Watch Forums. For a reminder that it's anchor text rather than PageRank to worry about, see the How Important Is Page Rank? thread. Other factors come into play, as well, as What Factors Other Than PR Determine Google Rank? covers.
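For reference, the "old formula" a tool like PageRank Decoder relies on is usually computed by repeating the update over a link graph until the scores settle. Here's a minimal Python sketch of that textbook calculation, assuming a damping factor of 0.85, a starting score of 1.0 per page and a fixed 100 iterations; the link graph and function name are hypothetical, and like PageRank Decoder it ignores everything Google has layered on since.

```python
# Iterative sketch of the ORIGINAL published PageRank formula:
#   PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over every page T linking to A
# This is the textbook version, not what Google runs today.


def pagerank(links: dict[str, list[str]], d: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """links maps each page to the pages it links out to."""
    # Deduplicate outbound links; C(T) is the number of distinct targets.
    out = {src: set(targets) for src, targets in links.items()}
    pages = set(out)
    for targets in out.values():
        pages |= targets
    pr = {page: 1.0 for page in pages}  # every page starts at 1.0
    for _ in range(iterations):
        # Build the new scores entirely from the previous round's scores.
        pr = {
            page: (1 - d) + d * sum(
                pr[src] / len(targets)
                for src, targets in out.items()
                if page in targets
            )
            for page in pages
        }
    return pr


# Hypothetical three-page site: A links to B and C, B links to C, C links to A.
scores = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
for page in sorted(scores):
    print(page, round(scores[page], 3))  # A 1.163, B 0.644, C 1.192
```

In this toy graph C ends up highest because both other pages link to it, while B ends up lowest because its only inbound link comes from A, which splits its vote two ways. Pages with no outbound links would leak PageRank in this simple version, one of many details the old formula glosses over.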

Posted by Danny Sullivan at 8:57 AM | Permalink

June 3, 2005

Google Adds New Content to "Google Information for Webmasters" FAQ; Explains Supplemental Index

Google has posted some new content to their "Google Information for Webmasters" FAQ.

A section titled, "Advanced Questions" includes answers to the following questions:

  • How often will Google crawl my site?
  • How can I migrate my site to a new IP address?
  • I'd like my site to return for pages from a specific country.

Also included is the official Google line about results labeled "supplemental."

From the FAQ: Supplemental sites are part of Google's auxiliary index. We're able to place fewer restraints on sites that we crawl for this supplemental index than we do on sites that are crawled for our main index. For example, the number of parameters in a URL might exclude a site from being crawled for inclusion in our main index; however, it could still be crawled and added to our supplemental index.

The index in which a site is included is completely automated; there's no way for you to select or change the index in which your site appears. Please be assured that the index in which a site is included does not affect its PageRank.
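The FAQ doesn't say how many URL parameters is too many. As a rough self-audit, a site owner could list which of their URLs carry the most parameters. The Python sketch below does that with the standard urllib.parse module; the MAX_PARAMS cutoff, the function names and the example URLs are all hypothetical, not anything Google has published.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical cutoff for illustration only: Google has not published how
# many parameters it takes to keep a URL out of the main index.
MAX_PARAMS = 2


def count_params(url: str) -> int:
    """Number of query-string parameters in a URL (blank values included)."""
    return len(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def parameter_heavy(urls: list[str], max_params: int = MAX_PARAMS) -> list[str]:
    """URLs whose parameter count exceeds max_params."""
    return [url for url in urls if count_params(url) > max_params]


site_urls = [
    "http://www.example.com/products.html",
    "http://www.example.com/list.php?cat=5",
    "http://www.example.com/list.php?cat=5&sort=price&page=3&sid=a1b2",
]
print(parameter_heavy(site_urls))
```

Only the last URL, with four parameters, gets flagged here. Which URLs end up in which index is still, per Google, entirely automated.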

Posted by Gary Price at 10:11 AM | Permalink

June 2, 2005

New "Google Sitemaps" Web Page Feed Program

Today, Google has unveiled a new Google Sitemaps program allowing webmasters and site owners to feed it pages they'd like to have included in Google's web index. Participation is free. Inclusion isn't guaranteed, but Google's hoping the new system will help it better gather pages than traditional crawling alone allows. Feeds also let site owners indicate how often pages change or should be revisited. Below, a Q&A on the new program with Shiva Shivakumar, engineering director and the technical lead for Google Sitemaps.

Can you give us a summary of how the new feed program will work?

Webmasters create XML files containing the URLs they want crawled, along with optional hints about the URLs such as things like when the page last changed, and the rate of change. They host the Sitemap on their server and tell us where it is. We provide an open-source tool called Sitemap Generator to assist in this process. Eventually, we are hoping webservers will natively support the protocol so there are no extra steps for webmasters. When a Sitemap changes, we support auto-notifying us so we can pick up the newest version.

Why are you doing this?

We want to index all publicly available information so we can offer better search results. However, currently web crawling is limited. Crawlers don't know all the pages at a website (e.g., dynamic pages), when those pages change, how often to recrawl pages, how much load to put on a website. So they try to guess. We want to work collaboratively with webmasters to get a big picture of all the URLs we should be crawling, and how often they should be recrawled. Ultimately this benefits our users by increasing the coverage and freshness of our index.

What are the technical details? Just a list of URLs? An XML feed?

We defined a simple XML format that includes the URLs plus optional last modification date, change frequency, and relative priority. We do support a simple list of URLs as well, but using the XML format will help us crawl the sites better.
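As a rough illustration of the format described above, here's a minimal Python sketch that writes a Sitemap with the URL plus the optional last modification date, change frequency and priority hints, using the standard xml.etree.ElementTree module. Assumptions to flag: it uses the sitemaps.org 0.9 namespace that the protocol later standardized on (the 2005 Google format used a Google-hosted namespace), and build_sitemap and the example URLs are hypothetical, not part of Google's Sitemap Generator.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace the protocol later
# standardized on; the 2005 Google Sitemaps format used a Google-hosted one.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")


def build_sitemap(entries: list[dict[str, str]]) -> str:
    """Serialize entries to Sitemap XML. Each entry needs 'loc';
    'lastmod', 'changefreq' and 'priority' are optional hints."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        if "loc" not in entry:
            raise ValueError("every entry needs a 'loc' URL")
        if entry.get("changefreq", "daily") not in CHANGEFREQ:
            raise ValueError(f"unknown changefreq: {entry['changefreq']}")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]  # '&' is escaped automatically
        for tag in OPTIONAL_TAGS:
            if tag in entry:
                ET.SubElement(url, tag).text = entry[tag]
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([
    {"loc": "http://www.example.com/abc/", "lastmod": "2005-06-02",
     "changefreq": "daily", "priority": "0.8"},
    {"loc": "http://www.example.com/abc/list.php?cat=5&sort=price"},
]))
```

The second entry shows why generating the file with an XML library beats pasting URLs into a template: the ampersand in the query string is written out as &amp; automatically. The plain list-of-URLs option Google mentions skips the hints entirely.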

Do you need for me to prove in some way that I'm associated with the site I'm submitting for?

We accept all the URLs under the directory where you post the Sitemap. For example, if you have posted a Sitemap at www.example.com/abc/sitemap.xml, we assume that you have permission to submit information about URLs that begin with www.example.com/abc/.
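The scoping rule in that answer amounts to a path-prefix check on the folder that holds the Sitemap. Here's a minimal Python sketch, assuming that "under the directory" means a plain prefix match on that folder and that scheme and host must match exactly (so example.com and www.example.com count as different sites). The function name is hypothetical; it is not part of any Google tool.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url: str, page_url: str) -> bool:
    """True if page_url sits at or below the directory holding sitemap_url."""
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    # Assumption: scheme and host must match exactly ("example.com" and
    # "www.example.com" count as different sites).
    same_site = (sitemap.scheme, sitemap.netloc.lower()) == (page.scheme, page.netloc.lower())
    # Directory of the Sitemap file: "/abc/sitemap.xml" -> "/abc/".
    base_dir = sitemap.path.rsplit("/", 1)[0] + "/"
    return same_site and page.path.startswith(base_dir)


sitemap = "http://www.example.com/abc/sitemap.xml"
for candidate in ("http://www.example.com/abc/page.html",
                  "http://www.example.com/abc/deeper/page.html",
                  "http://www.example.com/xyz/page.html",
                  "http://www.example.com/abcdef/page.html"):
    print(candidate, in_sitemap_scope(sitemap, candidate))
```

Comparing against "/abc/" with the trailing slash matters: without it, /abcdef/page.html would wrongly count as being under /abc. A Sitemap posted at the site root covers every path on that host.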

Will all my URLs get in? Some? Any guarantee? And how quickly?

At this early stage, we cannot guarantee that we'll crawl or index all your URLs. But as we understand the data better, we hope to get more of the data into our crawl and indices.

How does someone sign-up?

Go to Google Sitemaps and use your Google Account or create a new one to sign in. If you already use Gmail, Groups, My Search History, Alerts, or Froogle Shopping List, you already have a Google Account.

And this is all for free?

Absolutely. Also, this is an open protocol. We are hoping all webservers and search engines adopt this protocol and benefit from the increased collaboration.

Any chance you may provide a reporting tool down the line, so people can tell what searches are sending them clicks?

We are starting with some basic reporting, showing the last time you've submitted a Sitemap and when we last fetched it. We hope to enhance reporting over time, as we understand what the webmasters will benefit from. If you have ideas on more of what you would like to see, let us know at the new Google-Sitemaps area at Google Groups.

How will you prevent people from using this to spam the index in bulk?

We are always developing new techniques to manage index spam. All those techniques will continue to apply with the Google Sitemaps.

If I don't use the program, you may still find pages through the regular way of crawling, correct?

Yes. This program is a complement to, not a replacement of, the regular crawl. However, we hope that the hints you offer in the Sitemap will help us do a better job than the regular crawl.

Still have more questions or comments? The Sitemaps FAQ goes into depth on many more details. The Google Sitemaps team will be taking questions and responding all day at our Search Engine Watch Forums thread, Google Sitemaps Now Accepting Web Page Feeds. Long-term, the team will also be monitoring the new Google-Sitemaps area at Google Groups.

Posted by Danny Sullivan at 7:52 PM | Permalink

June 1, 2005

GoogleGuy Shares Advice About May 2005 "Bourbon" Update

Via Searchblog and Threadwatch comes some advice from GoogleGuy about the Google update named "Bourbon" that's currently underway.

GG writes: Here's the advice that I'd give now: take a break from checking ranks for several more days. Bourbon includes something like 3.5 improvements in search quality, and I believe that only a couple are out so far. The 0.5 will go out in a day or so, and the last major change should roll out over the next week or so. Then there will still be some minor changes after that as well. So my "weather report" along the lines of http://www.ysearchblog.com/archives/000095.html would be a recommendation that rankings may still change somewhat over the next several days.

Posted by Gary Price at 6:29 PM | Permalink

April 18, 2005

Overture Becomes Yahoo Search Marketing & Comparing Listing Products At Yahoo To Google

The rebranding promised in March has happened. Overture has officially become Yahoo Search Marketing, marked by the launch of a new Yahoo Search Marketing site that lists all of Yahoo's search-related listing products.

It's a good change that ought to help new advertisers. Rather than having to explain that they need to buy "Overture" to be on Yahoo, Yahoo can now direct them to a site that retains its branding.

But with rebranding can come confusion, so I thought it would be helpful to look at all the products listed at the new site and also compare them to Google products. In particular, an email I got from a reader prompted the idea:

I am trying to find the "comparable" Yahoo program to Google AdWords. Since their rebranding of Overture last week, I'm still looking unsuccessfully for something like Precision Match, but it looks as if the program has been axed?

We've been using Google AdWords since it launched and are very happy with the format and back office (most of all the results). Is Yahoo offering a similar program? Honestly, I've read about their "Sponsored Search" and it's simply not obvious.

Meanwhile at our Search Engine Watch forums, a thread on the rebranding shows similar confusion:

I thought Overture was being renamed to Yahoo Search Marketing, but this page boasts a range of products, including Shopping, Travel, Directory, PPI & Overture (sponsored search).

The chart below gives you a side-by-side look at all the products listed on the new Yahoo site, along with some other listings areas that I thought made sense to add. If you're a Search Engine Watch member, see this extended post that provides commentary and additional advice and information about each listing area.

  • Web Search Listings. Yahoo: Yahoo Submit Your Site. Google: Add Your URL To Google.
  • Web Search Paid Inclusion. Yahoo: Search Submit Express & Search Submit Pro. Google: n/a (but advertisers can get listing support).
  • Search Ads (Paid Placement). Yahoo: Sponsored Search. Google: AdWords (search targeted).
  • Contextual Ads. Yahoo: Content Match. Google: AdWords (content-targeted; AdSense is the name for the PUBLISHER program).
  • Shopping Listings. Yahoo: Product Submit. Google: Froogle Feed (free).
  • Travel Listings. Yahoo: Travel Submit. Google: n/a.
  • Directory Listings. Yahoo: Directory Submit. Google: ODP Submit.
  • Local Search Ads. Yahoo: Local Sponsored Search. Google: AdWords Regional & Local Targeting.
  • Local Search Listings. Yahoo: Local Enhanced Listings & Local Listings (free). Google: Google Local Business Center.
  • News Listings. Yahoo: Yahoo News Submissions. Google: Google News Source Suggestion.

Want to discuss the change from Overture to Yahoo? Visit our forum thread, Yahoo! Search Marketing is Released. Also check out Yahoo To Buy Overture for background on Yahoo buying Overture back in 2003; GoTo Makes Overture To New Name for the last rebranding Overture went through, when it lost its original name of GoTo back in 2001; and GoTo Sells Positions, about GoTo's launch in 1998.

Posted by Danny Sullivan at 9:48 AM | Permalink

February 11, 2005

Talking About Google's Feb. 2005 Update

For Search Engine Watch members, I've posted a Google's Feb. 2005 Update article that summarizes some things about the Google update that's recently happened. No major revelations, but pointers to various forum threads and a few thoughts.

Yes, there was an update -- Google has confirmed what everyone has already observed. In something new, the company has also created a special email address allowing you to send feedback of any type specifically about the change. To do so, email feb05feedback@googlegroups.com, but keep in mind:

  • This is not a way to get indexing support. If you need that type of support, visit Google's webmaster info section for answers to many questions.  
  • Google's engineers will see and review the messages, and make any changes they think may help the index overall.  
  • If you love something, tell Google the search query you used and the page or pages you think shined.  
  • If you hate something, again -- tell Google the query you used and the page or pages that were disappointing.

All the feedback will be folded into the review Google already does of changes.

"We do in-depth testing of the changes we make to ensure that we're improving our relevancy and results," said Google software engineer Matt Cutts.

Want to discuss or learn more? The What's Going On With Google: Feb. 2005 Update summarizes key threads on the topic in our forums.

Posted by Danny Sullivan at 1:52 PM | Permalink

February 9, 2005

Google The Registrar

We first blogged about Google becoming an authorized domain registrar about a week ago. Since then, there's been lots of speculation but nothing about Google's exact plans with its new status. Bob Tedeschi reviews what we do and don't know about "Google the Registrar," along with comments from people in the domain name business, in the article A New Direction at Google.

Posted by Gary Price at 10:57 AM | Permalink

February 4, 2005

Feeling Like Google Dance Time

The days of Google Dances, the monthly changes that used to shake up Google's index, have long gone. But that doesn't mean that the company doesn't keep tweaking and changing things that can have an impact on search results, sometimes in a big way. And one of those big ways seems to be happening, based on chatter on our forums. Here are a few threads you might wish to check out:

Also see:

Posted by Danny Sullivan at 8:47 AM | Permalink

January 4, 2005

Press Releases Google's Trusted Feed?

Over in our forums, the Positioning by Press Release thread is worth checking out. Though brief at the moment, it touches on how Google News has almost become like a trusted feed service into Google, for those who know how to play the press release game.

Posted by Danny Sullivan at 9:18 AM | Permalink

December 2, 2004

Google Toolbar PageRank Display Just For Entertainment

All too often, I've wished the Google Toolbar's PageRank meter never existed. Site owners obsess over what it says, fixating on PR scores as if they are the be-all and end-all. They aren't, as I've written before, such as in Google PageRank, Meet Yahoo Web Rank!

Now it's semi-official. The PageRank meter means nothing, according to Google's email support team. John Galt posted on our forums that he was recently sent this:

The PageRank that is displayed in the Google Toolbar is for entertainment purposes only. Due to repeated attempts by hackers to access this data, Google updates the PageRank data very infrequently because it is not secure. On average, the PR that is displayed in the Google Toolbar is several months old.

There's more in the forum thread itself: Google says: Toolbar PageRank is for entertainment purposes only. However, the quote above is enough for a fun compare-and-contrast.

PageRank is for fun and often out-of-date? But what's Google telling searchers who enable it on the toolbar? Let's check the help page:

PageRank display - Gives an indication of the PageRank for the page you're currently viewing. PageRank is the importance Google assigns to a page based on an automatic calculation that factors in the link structure of the web and many other variables.

Time to amend that to say it's the importance Google thought of a page several months ago and in fact might mean nothing at all :)

Posted by Danny Sullivan at 1:26 PM | Permalink

November 22, 2004

Tips On Getting Back Into Google

Pandia has a nice three part article on getting back into Google's good graces, if you've run into trouble: Help, my site has been banned by Google!

Posted by Danny Sullivan at 8:34 AM | Permalink

November 18, 2004

Google & Approved Cloaking

Last May, I wrote about how Google effectively approves of cloaking in the case of content from NPR. The new Google Scholar launch, while good for searchers, leaves the company open to even more charges of hypocrisy over its published policy on cloaking.

My article on Google Scholar touches on this to a limited degree. I've also posted a new article for our Search Engine Watch members that takes a longer look at the issues involved: Google & The Approved Cloaking Problem.

In summary, Google needs to change its cloaking definition to acknowledge that approved cloaking is allowed -- and it definitely needs to move forward with providing better support to ALL web site owners, rather than just some of them.

Posted by Danny Sullivan at 7:53 AM | Permalink

October 17, 2004

Search Engine Watch Forum's 101 Threads

Last week, one of our most energetic forum moderators, Nacho Hernandez, started a thread called Search Engine Marketing 101. In it, he leads off with a variety of resources useful for those getting started with search engine marketing. Comments and further contributions follow.

Nacho also kicked off a theme. Orion, one of our newest moderators, followed up with Block Analysis 101. That looks at the concept of search engines breaking up a page into "blocks," to better understand which particular content or links within that content should be given greater or less weight.

Member Nick W has now dived in to look at the often controversial issue of cloaking: Cloaking 101 - Questions and Answers. Some previous good threads and debates on this topic include The Great Doorway Debate and How Do I Spot Cloaked Sites? You might also look over an article I did last year, Ending The Debate Over Cloaking.

Returning to Nacho, he's compiled a great list of Google Sandbox 101-style resources in Sandbox - IN or OUT? The sandbox concept relates to the idea that new pages, new links or new sites might not be allowed to do well in Google until a certain period of time has passed. The Filthy Linking Rich thread touches on this as well.

Posted by Danny Sullivan at 11:24 AM | Permalink | Comments (0)

September 20, 2004

Redirection Problems With Google, Yahoo

A month after it was raised during a session of the Search Engine Strategies show, and even longer since it was raised on various search forums, a bug allowing people to hijack listings at Google continues. Pandia has a nice summary: Spammers hijack web site listings in Google. Discussion in our forums here: MIA in Google? Google bug allows 3rd party hijacking.

Meanwhile, another long-standing problem with redirections, this time on Yahoo's side, also continues. More in our forums: Yahoo 301 Redirect indexing problem

I'm planning a longer look at both issues, for the near future.

Posted by Danny Sullivan at 9:05 AM | Permalink | Comments (0)
