I was having what was proving to be a Pipe dream.
Surely, I thought, the fact that Yahoo Pipes AND a Links Tab within Google Webmaster Tools BOTH appeared out of nowhere during the same magical week in February must be some kind of gift from the SEO gods above. But it turned out to be a sort of cruel joke.
As Rudyard Kipling once wrote, "Oh, East is East, and West is West, and never the twain shall meet."
I've been trying to get these two to meet since that wonderful, glorious week they were both birthed in the same Web 2.0 ward. But ultimately, I had to give up on the dream of that meeting.
And the reason the two will not work together is very Web 1.0. For all the Ajaxy, mashable goodness of Google, they don't make the links data from Webmaster Tools available as an RSS feed. For that data to be fetched as a source in Yahoo Pipes and then played with properly, it really would need to arrive through RSS as the renewable source. And I don't think this was an oversight; I think it was intentional.
First, let me take you through why I think the two would be such a great mash. Take, for example, a client of ours showing 100,000 external links to their domain in the new Links Tab of Webmaster Tools. Considering how little we had been able to glean from Google about external links before that magical second week of February, this new information was staggeringly rich. Staggeringly is not a word to be used lightly.
But as with too much cake, every new byte consumed past a certain point of satiety tastes just like the one before it. We're not serving anyone's interest in SEO if we allow human limitations to roadblock analysis capacity. Capacity here is a quantitative AND qualitative term, simultaneously. Simultaneously is not a word – well... you get the idea.
Combining Yahoo Pipes with Google's Links Tab data would allow us to leap past this human limitation of data capacity. It would allow us to slice and dice, say, 100,000 inbound links, so that instead of trying to find that hidden duplicate mirror site out there among 98 pages of results, a Pipe operator could do the same in a fraction of a second.
Another Pipe operator placed ahead of this one, selecting only URLs that are new to the dataset on each refresh, would ensure we were alerted instantly to any new instance of this problem. Good for us and good for Google. But it requires RSS.
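For the curious, here is a minimal Python sketch of what those two operators might do if the links data could be reached by a program. It is not how Yahoo Pipes works internally, and every name in it (`new_since_last_refresh`, `mirror_candidates`, `min_shared_paths`) is made up for illustration. It assumes the only input is a plain list of linking URLs, and it guesses at "mirror" by looking for pairs of hosts that serve the same paths.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def new_since_last_refresh(previous_urls, current_urls):
    """Return URLs in the current list that were not in the previous one, in order."""
    seen = set(previous_urls)
    fresh = []
    for url in current_urls:
        if url not in seen:
            seen.add(url)  # also drops repeats within the current list
            fresh.append(url)
    return fresh


def mirror_candidates(urls, min_shared_paths=2):
    """Return sorted (host, host) pairs that serve at least `min_shared_paths` identical paths.

    Assumption: two hosts repeating the same paths hints at a mirror.
    """
    hosts_by_path = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        if parts.path in ("", "/"):
            continue  # nearly every host has a root page, so it says nothing
        hosts_by_path[parts.path].add(parts.netloc.lower())

    shared = defaultdict(int)
    for hosts in hosts_by_path.values():
        ordered = sorted(hosts)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                shared[(first, second)] += 1
    return sorted(pair for pair, count in shared.items() if count >= min_shared_paths)
```

Chained together, `mirror_candidates(new_since_last_refresh(last_week, this_week))` would look only at links that turned up since the previous refresh. A real run would want a smarter test than matching paths, but the point stands: this is work for a machine, not for a person paging through results.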
And I really shouldn't have to be arguing right now, in 2007, that computers can better handle massive amounts of data than humans, should I? Yet the most nimble access we get to the links data from Google is the ability to download to Excel.
Should we party like it's 1999?
I can't stress enough how much I hate Excel, so for the past couple of weeks, much of my life has been spent trying to subvert it and work around, tweak, or force an RSS solution onto Google's Links Tab data. I hope I can save you some time if you're considering Yahoo Piping Google's Links, because around the globe in the forums, people claim each of the solutions I tried below can create RSS where no RSS has gone before:
- Dapper
Supposedly, you can create a Dapp that will produce an RSS feed where none exists -- the Dappist's familiar refrain to any RSS (or really any other) question is a profoundly strong and confident "Dapp it!" Yet, in trying to secure a client's information, we kept getting looped through a JSON username/password pairing. We weren't willing to bet the feed, much less the farm, on it.
- Feed 43
Q: Will my feeds be public or private? A: Your feeds will be public by default, but you have an option to password-protect any of them, but https is not supported. Thanks for playing!
- Kapow
Sounds cool. Doesn't work.
There are a few other solutions out there like this, but they all seem to unwittingly turn you into a scraper. I don't want to be a scraper. I preach against scrapers almost on a daily basis, so utilizing these solutions would make me a hypocritical scraper.
So, what am I left with but begging Google not to reduce this staggering treasure of data to the dumb ascending and descending sort of Excel? I am reduced through such diminution to quickly scrolling back through Yahoo Site Explorer to see if I missed something there. The best I could find was an API for the results offered from Yahoo. How progressive.
"The starting result position to return (1-based). The finishing position (start + results - 1) cannot exceed 1000."
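To put that cap in perspective, here is a small Python sketch of the arithmetic. It makes no actual API call. The names `start` and `results` come from the quoted rule; the function names and the page size of 300 used in the note below are assumptions for illustration only.

```python
MAX_POSITION = 1000  # from the quoted rule: start + results - 1 cannot exceed 1000


def page_windows(results_per_page, max_position=MAX_POSITION):
    """Yield (start, results) pairs that never run past the position cap."""
    start = 1  # positions are 1-based
    while start <= max_position:
        results = min(results_per_page, max_position - start + 1)
        yield start, results
        start += results


def reachable_fraction(total_links, max_position=MAX_POSITION):
    """Return the share of a link set that can be paged through under the cap."""
    return min(total_links, max_position) / total_links
```

With a hypothetical page size of 300, `list(page_windows(300))` stops after four requests, the last one trimmed to 100 results. For the 100,000-link client mentioned earlier, `reachable_fraction(100_000)` comes out to 0.01, or one percent of the links.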
Oh well. At least that will leave us more than enough time:
As chairman and co-founder of SEO firm Intrapromote, John Lustina divides his time between developing the strategic, long-term course of the company, managing key campaigns, and forging and maintaining relationships with partners.
We report the top search marketing news daily at the Search Engine Watch Blog. You'll find more news from around the Web below.
- Fundamental Questions about Local, Screenwerk
- Is It Legal To Include Competitor's Brands in SEO Strategy?, SearchRank
- Researchers Track Down a Plague of Fake Web Pages, New York Times
- Google's Supplemental Index: Questions and Inconsistencies, SEO Speedwagon
- Yahoo Research Looks at Templates and Search Engine Indexing, SEO by the Sea
- Recovering from NoIndex/NoFollow on Google, Web Analytics World
- How Google Blog Search Ranks Results, Google Operating System
- The Secret to Ranking at the Search Engines (that's really no secret at all), SEOmoz
- Breaking Up with Bad Clients: It’s Not You...It’s Me., Stuntdubl
- AdWords Optimization Tips: Part 3 - Account Structure, Inside AdWords
- Excellent Analytics Tip #10: How Thick is Your Head and How Long is Your Tail?, Occam's Razor
- comScore adds sessions to their reporting, Web Analytics Demystified
- Click Fraud Lawsuit Survives Motion to Dismiss--Payday Advance v. FindWhat, Technology and Marketing Law Blog
- Servicing Search: A Better Ad Model?, ClickZ