2007: A Search Odyssey? Keynote Conversation with Matt Cutts

At SES London, Google's most famous engineer shared his thoughts on search spam, personalization, and the future of Google.

“In this ecosystem,” wrote one analyst recently, “Matt Cutts is a brand new oxymoron: the celebrity engineer.” In addition to being an oxymoron, though, the same writer reassures us that Matt “is not the dark Sith overlord of the Googleplex,” but rather “just a really nice guy,” a guy you’d like to have a beer with. If you’re spamming Google, however, and going drink for drink with Matt, you’d be smarter to stick with his trademark Sprite. You don’t want him getting more secrets out of you than vice versa.

Looking for more biographical notes on Matt, you’ll find that he “works in the quality group at Google.” Evidently, Google isn’t too big on titles.

Day Two of Search Engine Strategies London kicked off with the staple keynote conversation, this time led by conference chair Chris Sherman, who interviewed Matt “Jagger” Cutts of Google. Only the most grizzled of grizzled search marketing veterans dared miss this session, and those tiny few skipped out on the misguided grounds that there’d be nothing new to learn.

In fact, Cutts had plenty of new information to share with the large audience, while also touching on some of the basics. As for audience size, this was certainly the largest SES London yet – you can always gauge this when the whole group comes together for the keynote. The large hall was nearly full, but with some empty chairs left and plenty of unused standing room, it could have accommodated another couple hundred. Anything the organizers tell you about record attendance: consider it roughly accurate.

Sherman’s questions, though sometimes controversial, were delivered in a manner analogous to wait service in a fine restaurant. So smooth you barely noticed.

Cutts led off by highlighting two of the key types of search engine index spam they’re fighting these days at Google. By and large, it seems that search quality and spam issues run on a continuum of relevancy and troublesome issues for users – from “most useful and relevant” to “least useful and relevant.” Though we’ve noticed in some high-level search engine research and patents a school of thought willing to label pages and sites as spam, period, most issues are not as clear-cut. This is reflected in Cutts’ focus on “off-topic spam” and “duplicate spam” as warm-button issues for Google. In many cases these aren’t spam pages on purpose, but they still pose a problem for relevance.

Indexing Explained

Cutts then turned to a quasi-technical discussion of matters such as indexing frequency. He reminisced about the bad old days of computing platforms and processes that would, only seven short years later, make a search engine CEO’s hair curl. Prior to 2000, Google’s index had no ability to “revert” to an earlier “state” if an update was proving troublesome, for example.

In 2000, the service went to a monthly update, which included “new binaries, new executables, as well as new algorithms.” Easy for you to say, Matt.

By 2003, the Google index was moving towards incremental updates that would mean fresher results on many queries. Under the old regime, an event like the iPhone announcement wouldn’t have been reflected in the index for up to a month. Obviously, things are much better today. (I also remember back to the days when you had to get out of your chair to change the channel on the television.)

Today, Google’s more robust systems allow for different update levels depending on the type of information, using what Cutts called a “data push.” A data push might happen daily, every 2-3 weeks, or every 3-4 months, depending on the type of content. For example, for popular Web sites, many factors change quickly, and users want freshness.

So if you searched for “Barry Schwartz’s site,” for example, Google might offer several popular internal site links as part of the search results, as has become common in some instances. Users likely prefer it if this information is very fresh, whereas other types of data aren’t feasible to refresh as often.

Google still does major algorithmic updates from time to time, on no set schedule. If a major signal of relevance or quality was altered in the algorithm, this would affect rankings on a large scale, and might take quite a while to incorporate into the whole index. These are not “freshness-type” changes that can be made every other day.

Debunking SEO Myths

Sherman highlighted Cutts’ role in debunking popular SEO myths through his blog (Matt Cutts: Gadgets, Google, and SEO – as if he needs the link). For example, does visiting an unindexed Web page with the Google toolbar installed in your browser lead to it being indexed? (Note: this isn’t exactly the way that experts in search talk – they might say: “debunking the toolbar thing, you know”… as if you know.)

Cutts seemed to encourage Web site owners to create their own little experiments to test such theories. At the very least, a theory like “toolbar visit leads to indexing” can be fairly easily debunked by performing experiments to see if it’s true. Here, it seems to me that Cutts is being a bit disingenuous. Depending on the theory being tested (let’s say it’s one that’s a bit more complex, such as clickstream data being included in the ranking algorithm, so webmasters begin clicking heavily on Page Two listings for long tail queries to test the effect), there’s nothing to stop Google from seeing mentions online of waves of webmasters conducting a certain type of outlandish experiment and seeming to get one answer; then shutting down the loophole that led to the results; then debunking the “myth” on the blog; and then quietly reintroducing said signal into the algorithm a month or a year later. That being said, some myths are in fact easily debunked, and it helps a lot when Cutts is the one doing the debunking.

In 2000, a poster on Webmaster World claimed he’d “just gotten off the phone with Google” and he’d been informed that “advertising on AdWords actually will help your organic rankings.” As someone who has the opportunity to get on and off the phone with Google (for real), I don’t feel susceptible to that type of misinformation. But what about the vast majority of people who are willing to pick up on rumors like this, because someone’s been cagey about the way they frame the rumor? In this regard, the debunking and clarifying role of Cutts (and increasingly other Google bloggers) has been invaluable to those seeking straight answers.

Sherman asked Cutts to offer his favorite “ridiculous spammer stories.” (Kind of like America’s Dumbest Criminals.) Cutts referred to the poster who bragged that he used “undetectable methods” to dupe Google – “something like super-duper cloaking.” In reality, Cutts and the Google team found it easy to detect the methodology, which included some pages with URLs ending in “/doorway_google.”

When it comes to the cold call e-mails from companies offering tricks to boost ranking in Google – such as the cheesier link farm companies – Cutts makes no bones about the fact that he and others at Google will promptly get in touch with them, posing as a customer. Cutts simply follows all the links and talks to them about what they offer until he gets the information he needs.

“They even e-mail Google with automated messages that say ‘we can increase the visibility of Google.com.’ Here I thought we were a pretty well known site,” smiles Cutts.

Not leaving the world of spam, but turning to academic study of it, Sherman asked Cutts about a group that he belongs to that studies “adversarial information retrieval” (essentially, how to run a search engine in an open environment characterized by the presence of pecuniary incentives for Web site publishers to cheat and spam). “It’s essentially a program committee that I belong to,” says Cutts. “I review several academic papers a year. There’s been a real progression of academic interest in spam. Papers have gotten more and more quantitative.”

Sherman asked about increased “coopetition” among search engines, alluding to their recent moves to adopt standards on certain elements of the indexing and webmaster relations process. “Are you getting more buddy-buddy?”

Buddy-buddy they are not. “It’s definitely the case that we compete very hard,” admits Cutts, but “regular webmasters shouldn’t have to deal with hassles. It took ten years from robots.txt to nofollow, and it wasn’t very long after that that we all agreed on the common Sitemaps protocol, Sitemaps.org.”

Cutts went on to explain that the common Sitemap format would ensure that sites could convey information about themselves to the search engines using the same XML schema, so they wouldn’t have to submit different information to different search engines. In essence, with Google as the catalyst, the engines together have now collectively embraced a contemporary version of the old third party “site submitters.” After all, you do want your site to appear on all the engines, don’t you?
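
To make the shared format concrete, here is a minimal sketch of a sitemaps.org-style file, generated with Python’s standard library; the URLs and dates are hypothetical placeholders, not anything discussed in the session.

```python
# Minimal sketch of the common sitemaps.org XML format (hypothetical URLs and dates).
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"  # schema shared by the major engines

urlset = ET.Element("urlset", xmlns=NS)
for loc, lastmod in [
    ("http://www.example.com/", "2007-02-15"),
    ("http://www.example.com/about.html", "2007-01-30"),
]:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = loc          # required: the page's URL
    ET.SubElement(url, "lastmod").text = lastmod  # optional: last modification date

# One file, one format, readable by every engine that supports the protocol.
ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```

The resulting sitemap.xml can then be submitted to (or discovered by) each participating engine without reformatting it for any of them.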

Following Nofollow

As for the rel=”nofollow” attribute, if you haven’t been keeping up, it’s for those who want to avoid passing any “link juice” (the ranking credit a link can confer, potentially boosting the other site’s search rank) to sites they link to. This might be useful for negative mentions of a company: you as a publisher want your audience to know about an evil site, but you don’t want that site to receive link juice from you in the form of improved organic search rankings.

The main reason for the development of nofollow was blog comment spam. Automated bots often post gibberish messages with self-promotional links in them. Sometimes, human commenters do the same. By setting all comment links in your blog to nofollow, you ensure your blog won’t facilitate some spammer’s search strategy. It’s also worth mentioning that nofollow is not yet widely adopted, so comment link spam might not be completely obsolete as a spam strategy.
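
As a rough illustration (not anything Cutts showed), here is what a blog platform effectively does when it puts comments under a nofollow regime: every link in untrusted comment text gets rel="nofollow" added, so it still works for readers but passes no ranking credit. The helper below is a hypothetical, simplified Python sketch.

```python
import re

def nofollow_comment_links(comment_html):
    """Add rel="nofollow" to anchor tags in untrusted comment HTML (crude sketch)."""
    def add_rel(match):
        tag = match.group(0)
        if "rel=" in tag:
            return tag  # leave any existing rel attribute alone in this sketch
        return tag[:-1] + ' rel="nofollow">'
    return re.sub(r"<a\b[^>]*>", add_rel, comment_html)

spam = '<p>Great post! Visit <a href="http://example.com/pills">my site</a>.</p>'
print(nofollow_comment_links(spam))
# <p>Great post! Visit <a href="http://example.com/pills" rel="nofollow">my site</a>.</p>
```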

(Further to nofollow: don’t bother looking it up on Wikipedia if you’re trying to understand the issue. The page on this has apparently become increasingly contentious and had to be shut down, because of the controversy around Wikipedia itself adding nofollow to links from Wikipedia. An informal vote of web developers and wiki enthusiasts showed 61% were in favor of removing nofollow from Wikipedia links.)

Personalization a Threat to SEO?

Sherman then asked about the trend towards search personalization. “I’ve noticed – and I actually enjoy it very much – that you now serve personalized results to anyone with a Google account who happens to be logged in. It’s a great thing. And also a threat to search engine optimization, because there will be no common rankings that apply to every query.”

Cutts argued that “the best SEOs don’t reverse-engineer particular algorithmic changes; they see what’s happening down the road, and try to get ready. Linkbaiting is essentially white hat SEO. The nice thing about personalization is you don’t see one monolithic set of results. In 2002, if you ranked on Page 1 for the search phrase [data recovery], you were happy. Now everyone can rank in the top ten for some niche, so there is no weird step function; it’s not winner-take-all anymore.”

“Realistically,” Cutts continued, “building your strategy around showing up #1 for your trophy phrase is not a good approach. If you’re going after that, it’s fantastic if you get it, but diversification is even better,” he added, evoking the concept of the Long Tail of search query frequency.

Injecting a local flavor, Cutts argued that “you should get different results for searching ‘football’ in the UK than you do in the US. We’re already doing different rankings for many different countries and languages around the world, of course.”

Sherman noted that Cutts’ mention of trying to shine in a niche made him think of the increasing importance of local search. Cutts responded, “Danny Sullivan has long been saying to business owners: ‘Don’t just go after Web search.’ Local searches will often give you a Onebox result leading to a map, an address, and phone number. It’s great for you if you show up there. Sure, you can make all this effort to go after Web search and post your articles in Digg, Furl, and so on, but that doesn’t look nearly as good on a mobile phone.”

Conspiracy Theories

Sherman then turned to China. “Google allows search results to be censored in China, but all you have to do is click to Google.com and you get unfiltered search. It’s a delicate dance. How do you balance this?”

Cutts broadened the issue to highlight the level of disclosure Google feels is appropriate globally. “In 2002, we developed a new process for Digital Millennium Copyright Act complaints. At first the violator would be notified of a pending complaint and content removal. We then added counter-notification and disclaimers on the Web site. So when copyright material is removed, we provide context so searchers know what’s happening.” Cutts is referring to the disclosure that talks about an item being censored or removed, with a link to a page providing more information.

Turning to Google China, he noted that they have a “strong team of Googlers in China who are native Chinese, with many being educated in the U.S.” As for the biggest challenge there, Cutts points to “organized crime spamming the index.” (We should be so lucky if Google could turn its attention to every line of business operated by organized crime.) On the whole, Cutts emphasized a “big focus on Chinese search quality this year.”

Sherman’s black-helicopter question was next. Consumer advocate Daniel Brandt has held Cutts’ past NSA top secret security clearance up as some sort of indication of nefarious conspiracies between Google and the US federal government. “True?” Sherman asked, concisely.

“There is no government link that I have anything to do with,” Cutts replied. During one term at university, Cutts thought that joining a federal security agency might be interesting, so he signed up for a program. Basically, Cutts seemed to say he tried it but didn’t inhale. More recent evidence points to Google’s defense of user privacy against government intrusion when other companies chose to go quietly. “When the DOJ went after us with a subpoena against 34 companies, Google was the only one that fought. The judge agreed. Users need to come first.”

In what might or might not have been a rehearsed (and humorous, more than a little Nixonian) statement, Cutts asserted: “I am not a spook.”

(For his part, Brandt’s theorizing about (and occasional participation in) conspiracy theories long predates Google; in one instance, he picks up on small mentions in Bill Clinton’s 1993 inaugural address, enlarging them into a general theory of historical puppetmaster organizations, cooptation of Vietnam anti-war protest, and beyond. Brandt is banned from editing on Wikipedia following exchanges with members and founder Jimmy Wales over biographical entries he considers to be “invasions of privacy.” He has added a site called Wikipedia-Watch to his stable of protest activities, the most famous of which is his Google-Watch.org site, which currently sports a ‘humorous cartoon’ showing rifle-toting Chinese communists hailing ‘Comrade Schmidt’ of Google.)

Cutts’ Favorite Tools

On his favorite online services, Cutts cited a number from Google as well as some from competitors. The one many attendees took note of was Google Browser Sync, which synchronizes your Firefox settings (including passwords) across different computers, no matter where you are. On this point among others, Cutts offered assurances of privacy protection. Other Cutts favorites include Google Reader, Gmail, Google Calendar, Yahoo Site Explorer, MSN’s advanced search, Bloglines, and Ask Smart Answers.

Sherman asked: “If you could boil down to one thing, what is the single best way to improve a site?”

Cutts enthused about Google’s new Webmaster Console, the latest iteration of the webmaster relations tool that began life as Sitemaps. “It will alert you to 404 errors, page load slowness, and some spam penalties, and it integrates the application for reinclusion, so it’s now a wonderful one-stop shop.” Gesturing towards the hallway, he saluted Vanessa Fox and her team for building and improving the tools; the crowd applauded appreciatively.

A key feature of the new console is a full backlinks list, something Google plans to develop and improve steadily.

Cutts also noted that the Google Webmaster Blog will enable comments. Presumably, this is to take some of the onus off Matt’s own blog as a repository for Google feedback.

Sherman identified a sea change in Google’s overall approach to communications. “Google used to be seen as monolithic; secretive. Now you’re more open.”

“This trend will continue,” replied Cutts. “We want to be as transparent as possible. After removing spammers, and so on, we want to share more information about how to get better indexed. We just needed more resources and time to be able to communicate better with the outside world. Now that we’ve built that capability it’s more feasible than it once was.”

The Future of Google

“Craig Silverstein [director of technology at Google] joked that he wanted Google to be like HAL from 2001: A Space Odyssey, but not killing people,” continued Sherman. “What’s the future of search going to look like?”

Cutts stressed that his was not the official company line, but he felt that personalization and localization would be key, and also that the effort to build Google Search and related services leads to a new capability: massive data storage and a kind of personalized memory.

“You could almost run a startup for free on Google services. It’s never been cheaper to start a business,” enthused Cutts. “And today you can search for all kinds of different data. Five-plus years ago, Google was only Web search. Now our mission statement is ‘to organize the world’s information and make it universally accessible’. I never realized before how much search matters to e-mail, for example. In the past, you had to save important stuff, put it in a folder, or save a file, or something. Now search allows you to find it later. It’s like a safety net.”

Cutts alluded to a range of other search types: image search, video, code search, and Google Desktop – which, like e-mail, Cutts believes is a safety net because it also remembers all the pages you’ve visited online and can bring up cached versions. Of course, one man’s safety net could be another man’s Exhibit A. “As you fast forward, there will be new types of data you can search. The list goes on. Patents, books, etc.”

“So, Ubiquitous Google,” ventured Sherman.

“I always want there to be competition,” concluded Cutts. Yep, even if only so there’s another huge brain that can shut off HAL when he starts running amok.

Andrew Goodman is the founder and principal of Page Zero Media and author of Winning Results with Google AdWords.

