August 12, 2008

SEW Experts: Proper SEO and the Robots.txt File

There are many good reasons, for SEO purposes, to stop the search engines from indexing certain directories on a Web site while allowing others. In today's organic SEO column, "Proper SEO and the Robots.txt File," Mark Jackson shows that by taking a good look at your Web site's robots.txt file and making sure that the syntax is set up correctly, you can avoid some search engine ranking problems.
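As a quick illustration of checking that syntax, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `ExampleBot` name and the `www.example.com` URLs are made up for the example and do not come from the column.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two directories, leave everything else open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""


def load_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
for path in ("/articles/seo.html", "/tmp/draft.html"):
    # can_fetch answers: may this user agent crawl this URL under the rules?
    allowed = rules.can_fetch("ExampleBot", "http://www.example.com" + path)
    print(path, "allowed" if allowed else "blocked")
```

Running a handful of representative URLs through a check like this before uploading a new robots.txt is one way to catch a rule that blocks more than intended.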

» Full story

Posted by Kevin Newcomb at 12:00 AM | Permalink | Comments (0)

February 29, 2008

Google Yahoo MSN Live Sitemaps: Cross-Hosting Grokked by SEOs for SEOs

With sitemaps cross-hosting (or cross-submission), Google, Yahoo, and Microsoft cracked open the door for corporations to outsource search engine optimization.

How big a deal is this?

Not enough to make Robert Scoble cry. Or join the circus.

When SEL broke the news at SMX (described in excellent summary by Vanessa Fox of vanessfoxnude fame), I was hoping for a revolutionary change. Then I read the blog posts at the Google Webmaster Central, Yahoo Search and Live Search Webmaster Center blogs so you don't have to. (I'm just kidding all you search engine PR gals … and guy.)

Robots.txt ruined my night. I felt like I was decepticonned - hoping for the breakthrough that would make outsourcing SEO much easier for major corporations. Or an announcement that might provide guidance for SEOs to improve rankings for their clients.

SEW Experts SEM Crossfire columnist Chris Boggs ended the robots nightmare: "I think it's a big step forward in making it easier for companies to outsource, but the caveat is having full access to the robots.txt. Some industries such as banking and pharma may still have issues."

Still, we don't want to beat up on the search engines (unnecessarily). In the past, search engines required companies with multiple Web sites to have "one set of servers to rule them all."

In short, search engines required sitemaps to be on the same host and path as the URLs they contained. That meant the same server needed to host both sitemaps and site content.

Google, Yahoo and Live Search put aside their fierce competition for a moment to make life a little easier for Webmasters and SEOs by standardizing sitemaps in November 2006, when the Big Three formed Sitemaps.org.

SEW Experts By The Numbers columnist, Eric Enge, CEO of Stone Temple Consulting, noted, "The announcement affects Web site owners who don't have the freedom to place a sitemaps file in the root directory of the domain. Historically, site owners without the ability to place a file in the root folder for their domain haven't been able to make use of sitemaps."

A cross-hosting sitemaps scenario or two? "There are many scenarios. Shared hosting environments and people in large corporations who may be running subdomains of a much larger site," said Enge. "This now allows them to place the sitemaps file in a different location, even on another server or domain. The sitemaps file then needs to be pointed to by the robots.txt file for the original domain. The site owner will still need the ability to make that change."

Search Engine Watch, for example, has several domains and subdomains. Our main domain, searchenginewatch.com, features a few subdomains: blog.searchenginewatch.com, forums.searchenginewatch.com and jobs.searchenginewatch.com, for example.

Now we can host all our sitemaps in one location or subdomain: such as "notreally-oursitemaps.searchenginewatch.com."
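To make that concrete, here is a rough sketch of how a crawler might read the `Sitemap:` line from a subdomain's robots.txt and notice that the sitemap lives on a different host. The robots.txt content and the `blog-sitemap.xml` file name are illustrative assumptions; only the hostnames come from the example above.

```python
from typing import List
from urllib.parse import urlsplit

# Illustrative robots.txt as it might be served by blog.searchenginewatch.com.
ROBOTS_TXT = """\
# robots.txt for the blog subdomain (hypothetical content)
User-agent: *
Disallow:
Sitemap: http://notreally-oursitemaps.searchenginewatch.com/blog-sitemap.xml
"""


def sitemap_urls(robots_txt: str) -> List[str]:
    """Return every URL named in a 'Sitemap:' line, in file order."""
    urls = []
    for raw in robots_txt.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def is_cross_hosted(site_host: str, sitemap_url: str) -> bool:
    """True when the sitemap is served from a different host than the site."""
    sitemap_host = (urlsplit(sitemap_url).hostname or "").lower()
    return sitemap_host != site_host.lower()


for url in sitemap_urls(ROBOTS_TXT):
    print(url, is_cross_hosted("blog.searchenginewatch.com", url))
```

The robots.txt reference is what proves ownership: whoever controls that file on the original host is vouching for the sitemap, wherever it is stored.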

So what does cross-hosting mean for the global SEO community? "Ultimately this opens up the site maps protocol to a large number of site owners who couldn't make use of it before," said Enge. "The SEO impact really relates to that fact. SEOs may not have been able to use sitemaps on a site previously, due to the limitations of the prior implementation. Now those SEOs have the capability available to them."

Cool.

"The impact of offsite hosting for sitemaps? It will make it easier for sitemap management by allowing site owners to manage multiple sitemaps in one location," explained Lee Odden of TopRank. "It will also make it easier for those with sites that use subdomains."

So bottom line: will SEOs be able to leverage cross-hosting to improve rankings for targeted keywords?

"As for impact on rankings, it's no different than the effect of making sitemap data available previously," said Odden. "Providing a list of URLs to search engines serves as a supplemental source of information to what their spiders would find in the wild."

Here's how it works:

"Search engines make no guarantee that providing URLs in a sitemap will increase the number of pages indexed - but they might," said Odden. "So in that regard, making it easier for sites that previously did not provide sitemaps, especially subdomains, may help them get more pages indexed, but I see no effect on actual rankings."

For the Google Guy's take on sitemaps, nofollow and other great tips, read the highest ranked Matt Cutts interview ever done (by Eric Enge).

Posted by Kevin Heisler at 1:01 AM | Permalink

May 30, 2007

SEW Experts: A Look at Latency In Search Engine Ranking

In today's By the Numbers column, A Look at Latency In Search Engine Ranking, Eric Enge presents a case study of a niche content site that reveals differences in latency involved in each search engine's index, and in how each search engine responds to removed pages.

Posted by Kevin Newcomb at 8:04 AM | Permalink

November 30, 2006

Google Ordered By Another North Carolina Court To Remove Pages

Apparently, North Carolina is going to start a trend of people who get court orders to remove material Google has spidered when left out in public view. This week, Google was ordered to remove material by a court in that state. It follows a similar court order in a different case earlier this year.

North Carolina County Gets Restraining Order Against Google from the Associated Press covers how social security numbers, cell phone numbers and other personal information were left online by Johnston County, which means Google (and likely other search engines) spidered the material.

When the county realized this, they sought to have it removed. However, they were told it might take up to five days to remove, prompting the county to go the legal route:

Fearing the possibility of identity theft, Johnston County officials asked Google on Monday to remove the information. It was first posted on the county's Web site by accident six weeks ago and discovered Friday. Mountain View, Calif.-based Google responded that removal could take up to about five days, said county attorney Mark Payne.

"It surprised me that Google didn't immediately recognize that this was something that posed a real danger of real damage to our citizens," Payne said.

Hey, it surprised me that Johnston County didn't immediately recognize that the information shouldn't have been put on the public web in the first place. However, that appears to have happened because of a third party contractor.

What about the automatic URL removal system? I seem to recall that as getting pages out in 48 hours or less (but I might be remembering incorrectly). Checking today, officially it is longer (unofficially, I hear it goes faster):

You may process your URL for removal from Google's search results. URLs will be removed after we've verified your request. Bear in mind that verification can take several days or longer and all pages submitted via the automatic URL removal system will be removed from the Google index temporarily for six months.

Google Blamed For Indexing Student Test Scores & Social Security Numbers and Follow-Up: School Couldn't Reach Google Until Injunction Filed cover how a school authority in North Carolina went to the courts to remove pages from Google in June.

Posted by Danny Sullivan at 1:06 PM | Permalink

Microsoft On How To Let MSNBot In, Keep Bad Bots Out

The Live Search Blog described how you can verify whether the MSNBot you see crawling your site is truly the MSNBot from Microsoft or some rogue spider trying to steal your content. Microsoft has added a way to look up the reverse DNS information for the bot's IP address and described what you should see to confirm that it is the official MSNBot. If it is not, you may want to block it or report it to Microsoft. A step-by-step guide is at the Live Search Blog.
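The general technique can be sketched in a few lines of Python: do a reverse DNS lookup on the IP, check that the hostname ends in a domain the search engine controls, then do a forward lookup to confirm the hostname maps back to the same IP. This is only a sketch. The function name and the injectable lookup parameters are my own, and the hostname suffix you pass in is an assumption you should take from the search engine's own documentation.

```python
import socket
from typing import Callable, Iterable, List


def _reverse_lookup(ip: str) -> str:
    """PTR lookup: IP address to hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(host: str) -> List[str]:
    """A-record lookup: hostname to list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_genuine_crawler(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], str] = _reverse_lookup,
    forward: Callable[[str], List[str]] = _forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a crawler's IP address."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no reverse DNS record at all
    suffixes = [s.strip(".").lower() for s in allowed_suffixes]
    if not any(host.endswith("." + s) for s in suffixes):
        return False  # hostname is not under a domain the engine controls
    try:
        # Anyone can fake a PTR record, so confirm the name maps back to the IP.
        return ip in forward(host)
    except OSError:
        return False
```

A call might look like `is_genuine_crawler(visitor_ip, ["search.live.com"])`, where the suffix is my assumption rather than something stated in this post. The lookups are passed in as parameters so the logic can be tested without touching the network.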

What about Googlebot? We covered that here.

Posted by Barry Schwartz at 9:01 AM | Permalink

November 16, 2006

Search Engines Unite On Unified Sitemaps System

In alphabetical order, Google, Microsoft and Yahoo have agreed to all support a unified system of submitting web pages through feeds to their crawlers. Called Sitemaps, taking its name from the precursor system that Google launched last year, all three search engines will now support the method.

More about Sitemaps is to be provided through the new Sitemaps.org site. As part of the announcement, the existing sitemaps protocol from Google gets a version upgrade to Sitemaps 0.9. However, no actual changes to the system have taken place. The new version number was simply done to reflect the protocol moving from an exclusive Google system to one that all three search engines now support.

Anyone already using Google Sitemaps needn't do anything different. The only change is now those sitemaps will be read by Microsoft and Yahoo, as well. More information will either be posted at the Sitemaps.org site or see these sections from each of the search engines, which I expect to be updated soon:

Other search engines are also invited to use the system -- it has specifically been placed as open property through Creative Commons so that others can make use of it. FYI, Ask isn't part of this announcement because it wasn't invited by the other three to take part, which I find unfortunate. Then again, among all four, Ask is the only one that doesn't already accept submissions in some way.

How can others contribute to its development? That remains to be worked out. So far, there's a working committee involving the three major search engines named. They say they are open to participation from other search engines, as well as content owners, to see the system grow and develop. I expect we'll find more structure to this emerging soon. At the moment, the key work has been in getting all three to agree to support the existing standard.

How about unification around other search standards, such as improving the robots.txt system of blocking pages? Again, this is something the search engines (specifically Google and Yahoo when I spoke to them) say they're interested in. So fingers crossed, we'll see more of this down the line.

Overall, I'm thrilled. It took nearly a decade for the search engines to go from unifying around standards for blocking spidering and making page descriptions to agreeing on the nofollow attribute for links in January 2005. A wait of nearly two years for the next unified move is a long time, but it's far less than 10, and the progress is very welcome. I applaud the three search engines for all coming together and look forward to more to come.

(Postscript: Announcements are up now from Yahoo, Microsoft and Google)

Below is more from the press release. Sorry I can't do a longer post about the system, but I'm also busy attending the PubCon conference, where the announcement has happened.

Las Vegas, November 16, 2006 - In the first joint and open initiative to improve the Web crawl process for search engines, Google, Yahoo! and Microsoft today announced support for Sitemaps 0.90 (www.sitemaps.org), a free and easy way for webmasters to notify search engines about their websites and be indexed more comprehensively and efficiently, resulting in better representation in search indices. For users, Sitemaps enables higher quality, fresher search results. An initiative initially driven by Yahoo! and Google, Sitemaps builds upon the pioneering Sitemaps 0.84, released by Google in June of 2005, which is now being adopted by Yahoo! and Microsoft to offer a single protocol to enhance Web crawling efforts.

Together, the sponsoring companies will continue to collaborate on the Sitemaps protocol and publish enhancements on a jointly maintained website www.sitemaps.org, which provides all of the details about the Sitemaps protocol.

How Sitemaps Work

A Sitemap is an XML file that can be made available on a website and acts as a marker for search engines to crawl certain pages. It is an easy way for webmasters to make their sites more search engine friendly. It does this by conveniently allowing webmasters to list all of their URLs along with optional metadata, such as the last time the page changed, to improve how search engines crawl and index their websites.

Sitemaps enhance the current model of Web crawling by allowing webmasters to list all their Web pages to improve comprehensiveness, notify search engines of changes or new pages to help freshness, and identify unchanged pages to prevent unnecessary crawling and save bandwidth. Webmasters can now universally submit their content in a uniform manner. Any webmaster can submit their Sitemap to any search engine which has adopted the protocol.
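For readers who want to see what such a file looks like, here is a minimal sketch that builds one with Python's standard library. The `build_sitemap` helper and the example.com URLs are hypothetical. The `urlset`, `url`, `loc` and `lastmod` element names and the namespace follow the Sitemaps 0.9 protocol published at sitemaps.org.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: Iterable[Tuple[str, Optional[str]]]) -> str:
    """Build Sitemap XML from (url, last_modified) pairs; last_modified may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # required: the page URL
        if lastmod:
            # optional metadata: when the page last changed (YYYY-MM-DD)
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([
    ("http://www.example.com/", "2006-11-16"),
    ("http://www.example.com/about.html", None),
]))
```

The `lastmod` value is what lets a crawler skip pages that have not changed, which is where the bandwidth savings described above would come from.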

The Sitemaps protocol used by Google has been widely adopted by many Web properties, including sites from the Wikimedia Foundation and the New York Times Company. Any company that manages dynamic content and a lot of web pages can benefit from Sitemaps. For example, if a company that utilizes a content management system (CMS) to deliver custom web content (i.e., pricing, availability and promotional offers) to thousands of URLs places a Sitemap file on its web servers, search engine crawlers will be able to discover what pages are present and which have recently changed, and to crawl them accordingly. By using Sitemaps, new links can reach search engine users more rapidly by informing search engine “spiders” and helping them to crawl more pages and discover new content faster. This can also drive online traffic and make search engine marketing more effective by delivering better results to users.

For companies looking to improve user experience while keeping costs low, Sitemaps also helps make more efficient use of bandwidth. Sitemaps can help search engines find a company's newest content more efficiently and avoid the need to revisit unchanged pages. Sitemaps can list what is new on a site and quickly guide crawlers to that new content.

“At industry conferences, webmasters have asked for open standards just like this,” said Danny Sullivan, editor-in-chief of Search Engine Watch. “This is a great development for the whole community and addresses a real need of webmasters in a very convenient fashion. I believe it will lead to greater collaboration in the industry for common standards, including those based around robots.txt, a file that gives Web crawlers direction when they visit a website.”

"Announcing industry supported Sitemaps is an important milestone for all of us because it will help webmasters and search engines get the most relevant information to users faster. Sitemaps address the challenges of a growing and dynamic Web by letting webmasters and search engines talk to each other, enabling a better web crawl and better results," said Narayanan Shivakumar, Distinguished Entrepreneur with Google. "Our initial efforts have provided webmasters with useful information about their sites, and the information we've received in turn has improved the quality of Google's search.”

“The launch of Sitemaps is significant because it allows for a single, easy way for websites to provide content and metadata to search engines," said Tim Mayer, senior director of product management, Yahoo Search. "Sitemaps helps webmasters surface content that is typically difficult for crawlers to discover, leading to a more comprehensive search experience for users.”

“The quality of your index is predicated by the quality of your sources and Windows Live Search is happy to be working with Google and Yahoo! on Sitemaps to not only help webmasters, but also help consumers by delivering more relevant search results so they can find what they're looking for faster,” said Ken Moss, General Manager of Windows Live Search at Microsoft.

The protocol will be available at sitemaps.org, and the companies plan to have Yahoo Small Business host the site. Any site owner can create and upload an XML Sitemap and submit the URL of the file to participating search engines.

Posted by Danny Sullivan at 12:00 AM | Permalink

November 10, 2006

Hack Reveals How To Remove Sites From MSN Live Search?

Boogybonbon.com has revealed how you can potentially de-list a competitor's site from Microsoft's search engine. In short, most sites return a 200 status code when you request a page like domain.com/index.html?test=test or domain.com/index.html?test=test1234. Someone can play on that by convincing Microsoft that a particular site has hundreds or thousands of duplicate pages. At some point, Microsoft may hit the site with a duplicate content penalty and de-list the site and its home page. That is the short story. If you want the long write-up, visit Boogybonbon.com.
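If you want to know whether your own site behaves this way, a rough self-check might look like the sketch below. The helper names are hypothetical, and `fetch_status` is a placeholder for whatever HTTP client you use to get a status code; nothing here comes from the Boogybonbon.com write-up.

```python
from typing import Callable, List
from urllib.parse import urlencode, urlsplit, urlunsplit


def junk_query_variants(url: str, count: int) -> List[str]:
    """Return copies of `url` with meaningless query strings such as ?test=test0."""
    parts = urlsplit(url)
    return [
        urlunsplit((parts.scheme, parts.netloc, parts.path,
                    urlencode({"test": "test%d" % i}), ""))
        for i in range(count)
    ]


def answers_200_to_junk(url: str, fetch_status: Callable[[str], int],
                        count: int = 3) -> bool:
    """True if every junk-parameter variant of `url` comes back with status 200."""
    return all(fetch_status(u) == 200 for u in junk_query_variants(url, count))


print(junk_query_variants("http://www.example.com/index.html", 2))
```

A site that answers 200 to every variant is serving the same page under an unlimited number of URLs, which is the duplicate content opening described above.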

Postscript: Other coverage at Threadwatch and Search Engine Watch Forums.

Posted by Barry Schwartz at 9:32 AM | Permalink

October 17, 2006

Google Webmaster Tools Gain Crawl Charts, Enhanced Crawl Rate & Image Labeler Support

Learn more about Googlebot's crawl of your site and more! at the Official Google Webmaster Central Blog covers the new features Google has added: visual charts to show Googlebot's crawling activity, expanded crawl rate support, inclusion in the image search labeling program and the number of URLs submitted. I talked with the Google Webmaster Central team earlier this week, and here are a few more details on some of the features.

To see Googlebot activity reports, go to Google Webmaster Tools, choose one of the sites you've verified, then pick the "Crawl rate" option on the Diagnostics tab. You'll get a chart showing how many pages Google has crawled per day over the past three months. For example, here's what it looks like for the Search Engine Watch Blog:

It's interesting to see visually how Google has backed off the number of requests over time. There's nothing I've done to do this, but it may reflect Google getting smarter about the fact that it doesn't need to revisit every page on the site so often. It could also be due to our server being less responsive (see below).

You can also see kilobytes downloaded per day, as well as the time spent downloading a page in milliseconds. The chart on that for us is really revealing:

You can see that our response time nearly doubled at the end of July. That's exactly when we left our servers at Jupitermedia, our old publisher, and switched to new ones with Incisive, our current publisher. Despite the slower time, I haven't noticed any drop in traffic from Google, so the slower responsiveness -- while not good -- hasn't been damaging. But if you did see a plunge in traffic, a chart like this might help you visually realize what might be wrong directly from Google.

At the bottom of the Crawl rate page is the ability to set how fast you want Google to crawl your site. This was introduced back in August, but now it's available to everyone using Google Webmaster Tools, not just some. In addition, Google has simplified the options from five to just three, Faster, Normal and Slower. Google said feedback suggested fewer options would be easier to understand.

Crawl rate still doesn't guarantee that Google will hit your server faster or slower than normal, even if you request it. But Google said it is much more responsive to these requests now. In fact, it is so responsive that you need to renew your choice every 90 days. That's to prevent someone authorized on your account from telling Google to slam your server, then leaving and Google continuing to do that forevermore.

Also on the Diagnostics tab, you'll find an Enhanced Image Search option. What's that about? For now, it simply means that images from your site will be available to those using the Google Image Labeler system, which we wrote about last month: Google Images Labeler: Google's Challenge To Flickr?

Not all images from Google Images are currently added to Google Image Labeler. Google said it currently uses a subset of pictures that it feels site owners would be amenable to having labeled. This new feature lets you explicitly tell Google you'd like to have your pictures play in the new program. More on this is covered in the help page about enhanced image search.

Finally, if you submit a sitemap to Google, it will now tell you the number of pages submitted in that sitemap. Why care? Apparently, at least one person did and requested the feature. As Google explains in that blog post, this person generated a sitemap automatically and so had no idea how many URLs he was spitting out in it. Now he -- and others -- can know.

Posted by Danny Sullivan at 7:04 PM | Permalink

June 14, 2006

Google Not Obeying NoIndex Meta Tag?

I reported at the Search Engine Roundtable that Google.com Displaying Pages in Index with NoIndex Meta Tags. The details come from a WebmasterWorld thread where two members I would trust claim Google is not obeying the noindex meta tag. Currently, I have no evidence, since examples are not allowed at WebmasterWorld. If you have examples of this in action, please let us know by starting a thread in our Google Web Search Forum at Search Engine Watch Forums.
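Before reporting an example, it is worth confirming that the page really does carry the tag. Here is a minimal sketch using Python's built-in HTML parser. The class and function names are my own, and treating both `robots` and `googlebot` as relevant meta names is an assumption about which tags you care to check.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> style tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            for token in attributes.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def has_noindex(html: str) -> bool:
    """True if the page's robots meta tags include a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="NOINDEX, follow"></head></html>'
print(has_noindex(page))
```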

Posted by Barry Schwartz at 9:59 AM | Permalink

April 26, 2006

Google Sitemaps Adds Spam Checking, New Webmaster Help Center & Other Features

I just came out of the Meet the Crawlers session, where Google announced new features and a new layout for Google Sitemaps. The Sitemaps blog just posted the details as well. One huge feature is that Google tells you if your site is in the index or not and if it is not, they won't tell you why.

Here is a break down of the new features:

  • New verification method
  • Indexing snapshot
  • Notification of violations of the webmaster guidelines
  • Reinclusion request form
  • Spam report
  • New webmaster help center
  • More about our new look
  • Adding a Sitemap
  • Navigating the tabs

Full feature list at sitemaps blog.

Postscript: Matt Cutts just pinged me to let me know he has posted an entry named Notifying webmasters of penalties. That entry explains that the Google Web Search Team and Google Sitemap Team are working together to notify "some (but not all)" webmasters of Google site penalties.

Posted by Barry Schwartz at 1:59 PM | Permalink

November 28, 2005

WebmasterWorld's Brett Tabke Speaks On Rogue Spidering Woes, Plus The Need For Expanded Feeds

Brett Tabke from WebmasterWorld dropped me a note about a new thread where he's answering many questions about WebmasterWorld banning all spiders, while Barry over at Search Engine Roundtable also has an interview with him. In both places, you'll learn of spiders being an increasing burden to the site, though I still am very, very wary that others should follow the route that Brett's taken.

Attack of the Robots, Spiders, Crawlers.etc at WebmasterWorld picks up from the Lets try this for a month or three thread where Brett announced last week that WebmasterWorld was banning all spiders by excluding them via robots.txt and through other measures such as required logins.

WebmasterWorld Bans Spiders From Crawling and WebmasterWorld Out Of Google & MSN from the SEW Blog cover more about the move and its fallout, with WebmasterWorld no longer being visible in two major search engines.

In his latest posts, Brett explains:

  • The flat file nature of WebmasterWorld makes it apparently more vulnerable to spiders.  
  • Spider fighting has been taking a considerable and increasing amount of time.  
  • A ton of effort has gone into stopping spiders, but cookie-based login is still seen as necessary.
  • Major search engines other than Google (Ask Jeeves, MSN and Yahoo) were all banned for more than 60 days before this latest move.

Brett Tabke Interviewed on Bot Banning from Search Engine Roundtable takes the interview approach, where it is much easier to see what Brett's thinking and reacting to than wandering through the forum posts. Beyond the points above, he addresses not wanting to make use of non-standard extensions to robots.txt that Google, MSN and some other search engines have added precisely because they aren't standard.

Overall, I can appreciate much of what Brett's going through, but there still have to be better ways for this to be addressed. His solution is simply not one that the vast majority of sites will want to try, because it will simply wipe out the valuable search traffic they gain.

To be clear, I'm NOT saying that any site should be entirely dependent on search traffic. But neither should you cut yourself off from it. It's a matter of balance and moderation. To quote from what I posted in our forum thread on the WebmasterWorld situation:

People would often ask how much of their traffic they get from search engines. There is no right answer, but I'd often said that if you were looking at 60, 70, 80 percent or higher, you might have a search engine dependency problem. You want to have a variety of sources sending you traffic, so no one single thing wipes you out.

But to suggest that a site is so successful that it doesn't need search traffic at all? That's foolishness. I have absolutely no doubt that WMW will survive. It's a healthy community with plenty of alternative traffic. But people seeking answers to things it has answers to give are no longer going to be finding it.

Hmm, well, maybe those people aren't good members, just generate noise and so on. Yeah, maybe. But that also assumes that every single quality person must be there already. That's just not so. You always have good new people coming onto the web.

Search engines are a way you build up loyal users. People often discover you for the first time through search, then they keep coming back. It's not a dependency to have a small amount of your traffic bringing in new people this way. But it is, in my view, a marketing screw-up to cut yourself off from that potential audience.

Geez, it's like the basic rule of SEO/SEM. Ensure your site is accessible to search engines. If they can't get in, you stand no chance of getting traffic at all from them. And when people are paying by the click for search traffic, why don't you want that free publicity? Why wouldn't you seek other ways of retaining it but also restricting the bad bandwidth you don't want?

Overall, WMW obviously can and will do what it wants, and perhaps there's some magical master plan that down the line will make us all say "Genius!" Maybe. But this is a very, very bad model for any site to be considering, if they're having the same spidering problems that are the stated reason for why WMW is doing this. It's like saying you're getting too many phone calls to your business, so you're going to pull out the phone entirely!

So what is a site owner to do, if they are suffering from rogue spiders? I'll share a bit from our own experience, plus point at what maybe the search engines should be doing.

We've encountered rogue spiders. It was one reason why our own Search Engine Watch Forums were down briefly last month, coincidentally at the same time WebmasterWorld and Threadwatch went offline for different reasons. Rogue spiders aren't just something unique to Brett's set-up. They can and do indeed cause problems even for less "flat file" sites and URL structures. In fact, want to have some fun? Check this out. That shows you all the people on our SEW Forums at the moment you click on the link, up to 200 visitors. Scan the IP Address column, and you'll see how Yahoo's Slurp spider is in many, many different threads all at once. That's a burden on our server, though since we're getting well indexed as a result, it's a burden we live with.

Our own solution has been for our developers to throttle or ban spiders at the IP level that seem to be hitting us hard, in particular spiders that aren't identifying themselves as to their purpose. Good spiders often leave behind a URL string in your logs so you know they are from Google, Yahoo or whatever. For example, Yahoo points you here. Google points you here. No good identification? Then we don't worry that banning you is going to harm us seriously in some way.
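As a rough sketch of that kind of triage (not our developers' actual code), the Python below counts requests per IP in access-log lines and flags the heavy hitters whose user agent never includes an identifying URL. It assumes combined-log-format lines ending in a quoted user agent; the function name, the threshold and the sample log lines are all made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List

# Assumes combined log format: IP first, quoted user agent last on the line.
LOG_LINE = re.compile(r'^(?P<ip>\S+) .*"(?P<agent>[^"]*)"$')


def heavy_unidentified_ips(lines: Iterable[str], threshold: int) -> List[str]:
    """Return IPs with at least `threshold` hits and no URL in any user agent."""
    hits = Counter()
    identified = set()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not look like log entries
        ip = match["ip"]
        hits[ip] += 1
        agent = match["agent"].lower()
        if "http://" in agent or "https://" in agent:
            identified.add(ip)  # the spider tells us where to learn about it
    return sorted(ip for ip, count in hits.items()
                  if count >= threshold and ip not in identified)


SAMPLE_LOG = [
    '192.0.2.10 - - [28/Nov/2005:16:00:00 +0000] "GET /t1 HTTP/1.1" 200 512 "-" "ExampleBot/1.0 (+http://example.com/bot)"',
    '192.0.2.10 - - [28/Nov/2005:16:00:01 +0000] "GET /t2 HTTP/1.1" 200 512 "-" "ExampleBot/1.0 (+http://example.com/bot)"',
    '203.0.113.7 - - [28/Nov/2005:16:00:01 +0000] "GET /t1 HTTP/1.1" 200 512 "-" "grabber"',
    '203.0.113.7 - - [28/Nov/2005:16:00:02 +0000] "GET /t2 HTTP/1.1" 200 512 "-" "grabber"',
    '198.51.100.3 - - [28/Nov/2005:16:00:03 +0000] "GET /t1 HTTP/1.1" 200 512 "-" "Mozilla/4.0"',
]
print(heavy_unidentified_ips(SAMPLE_LOG, threshold=2))
```

A list like this is only a starting point for a human decision. Plenty of ordinary browsers send no URL either, so the request volume is what separates a candidate for throttling from a regular visitor.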

What about improving the robots.txt system? Unfortunately, that's not a solution for rogue spiders. Brett's right when he points out the real story is moving to required logins. Rogue spiders aren't paying attention to robots.txt. Put in a ban against them, and they'll ignore it. Robots.txt only works with "polite" spiders.

Because robots.txt isn't a solution, it also means that wishing that the major search engines would come together to endorse new improved "standards" for the protocol also isn't a solution. Since rogue spiders are ignoring robots.txt, it doesn't then matter for there to be some type of universal agreement to have a "crawl delay" feature or more wildcard support, for example.
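To see why, look at what honoring robots.txt involves on the spider's side. The sketch below uses Python's standard `urllib.robotparser`, which understands the non-standard `Crawl-delay` line. The rules, the `PoliteBot` name and the example.com URLs are hypothetical.

```python
from typing import List, Tuple
from urllib.robotparser import RobotFileParser

# Hypothetical rules using the non-standard Crawl-delay extension.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /search
"""


def polite_plan(robots_txt: str, agent: str, urls: List[str]) -> Tuple[int, List[str]]:
    """Return (seconds to wait between requests, URLs the agent may fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent) or 0  # None when no delay is specified
    allowed = [url for url in urls if parser.can_fetch(agent, url)]
    return delay, allowed


print(polite_plan(ROBOTS_TXT, "PoliteBot", [
    "http://www.example.com/forum/thread1",
    "http://www.example.com/search/seo",
]))
```

Every step here is voluntary. A polite spider runs a check like this before each request; a rogue one simply never does, and no amount of agreement on new directives changes that.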

Still, while improving robots.txt isn't a solution to rogue spiders, there are things it could do if improved, and I'm right with Brett in wishing that the major search engines wouldn't unilaterally make their own improvements, as I've written before (and here).

So if we can't depend on robots.txt, what is the solution? If more and more sites face heavy spidering, we'll likely have to see a shift toward feeding content to search engines.

Feeding content isn't a new idea. Yahoo's paid inclusion program is pretty well known as a way for site owners to feed not URLs into the search engine but actual page content. Yahoo also has partnerships with some sites to take in content on a non-paid basis. Google also takes in feeds of content through things like Google Scholar or Google's Froogle shopping feeds program.

To be absolutely clear, these types of program aren't situations where you feed URLs, as with Google Sitemaps or Yahoo's bulk submit. These are programs where you feed actual page content. The spider doesn't come to you and hunt and guess at what you've got. You tell the spider what you've got.

Expanding feed programs to everyone would be a much more efficient way of gathering content, with one exception. You can expect that some sites will abuse feeds to send misleading content. Heck, it's bad enough how ping servers are already abused being wide open this way, as I wrote about on Matt Mullenweg's blog last month, when the future of ping servers was raised:

Whether we have an "independent" ping service almost seems beyond the point when both Dave and Matt are talking about the ping spam problem they have experienced. I'm actually surprised any of the open ping servers are surviving. If they are open to anyone to ping, a small number of people will abusively ping for marketing gains.

We've had 10 years of history knowing this with web search. Web search engines could long ago have had instant add facilities. Indeed, Infoseek and AltaVista even did for a short period of time. They found that without barriers, a small number of people would flood them with garbage. That's why they don't take content in rapidly. It's not that they aren't smart enough to take pings or let website owners flow content in. Instead, it is that they've learned you can't leave a wide door open like that without being abused.

There's absolutely no reason for anyone to have assumed that RSS/blog/feed search services were going to be immune to the same problem. If the ping outlook is bleak, it's not because Verisign or Yahoo has purchased some service. It's because you simply can't leave doors open on the web like this for search, not for any search that's going to attract significant traffic. Blog search is gaining that traffic, and you can expect the spam problem will simply get worse and worse until some barriers are put into place. You also cannot expect that you'll simply come up with some algorithmic way to stop ping spam. Again, 10 years of web search engines diligently trying to stop spam has simply found it's a never ending arms race.

I don't know what the solution is. I suspect that for the major search players, the Googles & Yahoos, they'll eventually move to a combination of rapid crawling, trusted pings and open pings as a backup. Remember, they get news content very fast. If they have a set of trusted sites, they can spider and hammer those hard. They'll know to keep checking Boing Boing, Scripting and maybe 1,000 other major blogs that really, really matter -- and that when you check them, you quickly discover other links from blogs you may want to fetch quickly.

So throwing feeds wide open to everyone without vetting isn't the solution. But certainly we're overdue for feeds to be available to more people without requiring payment, through some type of trusted mechanism.

WebmasterWorld is a perfect poster child for this. People want the content there, and the search engines should want the content to be found via their sites as well. Allowing the site to feed its content gets around the barriers erected to stop rogue spiders very nicely.

But WebmasterWorld isn't the only candidate in this class. Many others, including myself, want the ability to feed actual content to the search engines. Let's see them move ahead with a way to make this more of a reality, to establish real "trusted feeds" that aren't based on payment or on whether your site falls within an area that the business development teams think needs more support. Google Base may become Google's means of doing this, but at the moment, that's not feeding into web search.

Want to comment or discuss? Visit our Search Engine Watch Forum thread, WebmasterWorld Off Of Google & Others Due To Banning Spiders.

Posted by Danny Sullivan at 4:16 PM | Permalink

August 25, 2005

Yahoo Bulk Submit Now Live

Gary wrote earlier that part of the new Yahoo Site Explorer (placeholder page for now) service to come is a new bulk submit option for Yahoo. While we still wait for Site Explorer, Barry Schwartz at Search Engine Roundtable reports that the bulk submit part is now live.

It's rudimentary compared to Google Sitemaps, in that you can't prioritize pages, ping the search engine when you have updates or anything like that. On the other hand, there's a big, big plus in the simplicity. Just make a text file with a list of your URLs, one URL per line. Then submit the location of that file via this page. Have fun!

By the way, Google Sitemaps will also accept a text file in the same manner. So if you've done one for Google, you're set for Yahoo. Doing one for Yahoo? Then you're OK for Google.
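For those who would rather script the file than type it by hand, here is a minimal Python sketch of the "one URL per line" list described above. The function names, the example.com addresses and the urllist.txt file name are all assumptions made for illustration, not anything Yahoo or Google specifies. Skipping duplicates and entries that aren't absolute http or https URLs is likewise just a tidy-up choice, not a documented requirement of either service.

```python
from urllib.parse import urlparse


def build_url_list(urls):
    """Return the text of a one-URL-per-line file.

    Blank entries, duplicates, and anything that is not an absolute
    http/https URL are skipped (an illustrative choice, not a rule
    from either search engine).
    """
    seen = set()
    lines = []
    for raw in urls:
        url = raw.strip()
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue  # not an absolute web URL
        if url in seen:
            continue  # already listed
        seen.add(url)
        lines.append(url)
    return ("\n".join(lines) + "\n") if lines else ""


def write_url_list(path, urls):
    """Write the URL list to a plain text file at the given path."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_url_list(urls))


if __name__ == "__main__":
    # Hypothetical pages and file name, for illustration only.
    pages = [
        "http://www.example.com/",
        "http://www.example.com/about.html",
        "http://www.example.com/",  # duplicate, dropped
    ]
    write_url_list("urllist.txt", pages)
```

Upload the resulting file to your site, then submit its location as described above.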

FYI, for the "what's old is new" set, this is exactly how Infoseek worked back in 1997, except for the instant inclusion. When you gave Infoseek your list, all the URLs got in. Yahoo's service, like Google's, is merely a way to suggest that pages get crawled and added. There's no guarantee they will.

Posted by Danny Sullivan at 7:43 AM | Permalink

April 18, 2005

Overture Becomes Yahoo Search Marketing & Comparing Listing Products At Yahoo To Google

The rebranding promised in March has happened. Overture has officially become Yahoo Search Marketing, marked by the launch of a new Yahoo Search Marketing site that lists all of Yahoo's search-related listing products.

It's a good change that ought to help new advertisers. Rather than having to explain that they need to buy "Overture" to be on Yahoo, Yahoo can now direct them to a site that retains its branding.

But with rebranding can come confusion, so I thought it would be helpful to look at all the products listed at the new site and also compare them to Google products. In particular, an email I got from a reader prompted the idea:

I am trying to find the "comparable" Yahoo program to Google AdWords. Since their rebranding of Overture last week, I'm still looking unsuccessfully for something like Precision Match, but it looks as if the program has been axed?

We've been using Google AdWords since it launched and are very happy with the format and back office (most of all the results). Is Yahoo offering a similar program? Honestly, I've read about their "Sponsored Search" and it's simply not obvious.

Meanwhile at our Search Engine Watch forums, a thread on the rebranding shows similar confusion:

I thought Overture was being renamed to Yahoo Search Marketing, but this page boasts a range of products, including Shopping, Travel, Directory, PPI & Overture (sponsored search).

The chart below gives you a side-by-side look at all the products listed on the new Yahoo site, along with some other listings areas that I thought made sense to add. If you're a Search Engine Watch member, see this extended post that provides commentary and additional advice and information about each listing area.

Listing Type | Yahoo | Google
Web Search Listings | Yahoo Submit Your Site | Add Your URL To Google
Web Search Paid Inclusion | Search Submit Express & Search Submit Pro | n/a (but advertisers can get listing support)
Search Ads (Paid Placement) | Sponsored Search | AdWords (search targeted)
Contextual Ads | Content Match | AdWords (content-targeted; AdSense is name for PUBLISHER program)
Shopping Listings | Product Submit | Froogle Feed (free)
Travel Listings | Travel Submit | n/a
Directory Listings | Directory Submit | ODP Submit
Local Search Ads | Local Sponsored Search | AdWords Regional & Local Targeting
Local Search Listings | Local Enhanced Listings & Local Listings (free) | Google Local Business Center
News Listings | Yahoo News Submissions | Google News Source Suggestion

Want to discuss the change from Overture to Yahoo? Visit our forum thread, Yahoo! Search Marketing is Released. For background, also check out Yahoo To Buy Overture, on Yahoo buying Overture back in 2003; GoTo Makes Overture To New Name, on the last rebranding Overture went through, when it lost its original name of GoTo back in 2001; and GoTo Sells Positions, on GoTo's launch in 1998.

Posted by Danny Sullivan at 9:48 AM | Permalink

