I guess I get to be the underwhelmed one about Alexa announcing a new Alexa Web Search Platform that's available to anyone willing to pay a fee.
Pay a fee for what? You can create your own search engine by tapping into the 4 billion web pages Alexa has indexed over time. You can search against the entire index or just a selected set, in case you want to make your own vertical search engine.
It's hardly new territory. Back in 2000, a number of services offered to let you create your own vertical search engines, as I covered in my The Vortals Are Coming! The Vortals Are Coming! article from that period. These died off during the dotcom meltdown, when no one was willing to spend money to create verticals, especially since the search ad market had yet to mature.
Since then, search ads and verticals are both hot. But spending money to lease search services? That's a remnant from the days before search ads, when search engines wanted to be paid for storage and processor time. Search ads made the leasing services model go away.
The current fight between MSN and Google over AOL underscores this. AOL isn't being asked to pay either company for web search. It has an audience that MSN and Google want to reach, and they're falling over themselves to see who can offer the best deal.
Ah, but what if you're not as big as AOL? I suppose paying Alexa to get off the ground may help some people develop and prove their market, at which point Google and Yahoo -- among others -- will fight to offer these proven verticals similar services for free.
It also has to be said that the Alexa pitch would be a heck of a lot stronger if Alexa itself actually used its own web index. But it doesn't. Want to search the web with Alexa? Alexa depends on Google to give it a reach well beyond the 4 billion pages that Alexa has gathered.
How about more rain on the parade? Well, what could you use instead of Alexa? Let's see:
- Rollyo: Just out, Rollyo allows you to create a vertical search engine by giving it a list of sites. Under the hood, Rollyo is tapping into Yahoo and refining it.
- Gigablast: Get your own custom vertical search engine right now, for free, by using Custom Topic Search. It's been out for nearly a year. Want some type of hosted service or something special? If it's not listed here, I've no doubt Gigablast will step up to deliver.
- Vortaloptics: This specialty firm began offering in 2003 to create vertical search engines for anyone. I wrote about them at the time as perhaps signaling a return to the easy-to-make vertical search engines that looked likely before the dotcom downturn. They've been quiet, so perhaps no one's taking them up on it. Then again, perhaps the Alexa move will revitalize things.
- Google AdSense for Search: Want to search the entire web, just as Alexa offers? Out since last year, it shows Google is more than happy to give you access to its entire database, for free, along with ads ready to go right alongside the results. Nope, vertical search isn't as easy. You could try Google search, or the Google API might help. If not, it's fair to say Alexa's move will spur Google to offer more, probably for free if you're willing to carry ads.
- Yahoo Search Marketing Partner Solutions: Yahoo doesn't have a self-serve custom web search program similar to Google's, but that's only a matter of time. Until then, if you're big enough, they'll do custom solutions. Not big enough? There's the Yahoo Search API you can tap into.
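Several of the services above (Rollyo, Gigablast's Custom Topic Search, Alexa's "selected set") boil down to the same idea: take a general web index and restrict it to a chosen list of sites. The sketch below is a minimal illustration of that idea, assuming you already have ranked results from some general index; the function names and result format are hypothetical and do not model any of these services' actual interfaces.

```python
from urllib.parse import urlparse

# Hypothetical input: ranked results from some general web index.
# Nothing here models the real Rollyo, Gigablast, Alexa, Google or Yahoo APIs.


def host_matches(host: str, site: str) -> bool:
    """True if host is the site itself or one of its subdomains."""
    host, site = host.lower(), site.lower()
    # Require a "." before the site so "evila.com" does not match "a.com".
    return host == site or host.endswith("." + site)


def vertical_filter(results: list[dict], sites: list[str]) -> list[dict]:
    """Keep results whose URL host belongs to one of the chosen sites,
    preserving the general engine's ranking order."""
    kept = []
    for r in results:
        host = urlparse(r["url"]).hostname or ""
        if any(host_matches(host, s) for s in sites):
            kept.append(r)
    return kept


if __name__ == "__main__":
    general_results = [
        {"url": "http://www.example-recipes.com/pie", "title": "Apple pie"},
        {"url": "http://news.example.org/story", "title": "Unrelated news"},
        {"url": "http://blog.example-recipes.com/tart", "title": "Fruit tart"},
    ]
    for r in vertical_filter(general_results, ["example-recipes.com"]):
        print(r["title"])  # prints "Apple pie", then "Fruit tart"
```

This filters after retrieval, which is the simplest approach to show. A service that controls the index can instead restrict the query to those sites up front, so a narrow vertical doesn't run out of results when few of the general engine's top hits come from the chosen sites.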
I'm certain I'm missing some players on the list above. Gary will likely know them and will add a postscript covering any I've missed.
Back to Alexa: John Battelle has a rundown in his Alexa (Make that Amazon) Looks to Change the Game post and is pretty positive, though he notes he hasn't had a chance to really talk with developers.
Like John, I've not really talked with a ton of developers, and perhaps doing so would make my view more positive. At the moment, I definitely don't see it as a hugely groundbreaking move that will reshape web search forever, any more than Amazon's A9 Open Search has so far. If you want that groundbreaking move, you have to go back to when the Google API was first offered years ago. This is just an extension of that.
On the positive side, it's a welcome extension. More and more people have felt the various APIs are too limited. A paid model definitely helps those caught between wanting more from a search service and not yet being at the traffic or interest level where the search engine decides it makes sense to lift limits or partner more closely. Alexa jumping in should help spur the majors along, and that will be welcome.
For more, John points over to Amazon Revs Its Search Engine at the Wall Street Journal which stresses the Amazon view that this will cater to the vertical search market.
Alexa Search API Released from Google Blogoscoped looks a bit more at the move from a developer's angle, finding some positives but not exactly jumping up and down about it. Announcing the Alexa Web Search Platform Beta from the Alexa blog has more details from Alexa. You'll also find plenty of other commentary via Memeorandum.
Want to discuss, comment, tell me I'm clueless? Visit our forum thread, Alexa Web Search Platform.
Postscript from Gary: In addition to the services that Danny lists like Rollyo and Gigablast, here are a few other services you might want to know about:
+ From the Internet Archive: Archive-It. About a month ago, Brewster and crew introduced a new service called Archive-It "that allows users to create, manage and search their own web archives through a web interface." It's primarily aimed at institutions and libraries. How's that for ease of use? From the Archive-It FAQ: "Subscribers to the service can create distinct web archives, containing only the content they are interested in harvesting, at whatever frequency suits their needs. All collections are text searchable. The annual subscription cost is $10,000 per year. This allows an institution to collect, manage and search up to 10 million web documents. The pilot users to this point have been memory institutions, state archives, and libraries." More in the FAQ.
+ Do It Yourself from the Internet Archive: Heritrix. This free crawler, developed at the Internet Archive, is "an open-source, extensible, web-scale, archival-quality web crawler project." There's a good FAQ that includes a list of some of the organizations using Heritrix. It also says that Heritrix can be used to "crawl/archive a set of websites," in other words, a focused set of sites.
+ Nutch: Well-known open source web search software that includes a crawler, a link-graph database, parsers for HTML and other document formats, and more.
One final comment: the Alexa Web Search Platform offers an interesting and potentially useful demo. It allows you to search images taken with digital cameras and exploit the metadata they often provide. Alexa is calling it "Camera Image Search." Why do I find it interesting? Because it offers fielded searching of submitted data. Instead of entering random terms and hoping for the best, this structured searching lets you get precise results from the outset (assuming, of course, that the indexing is good). Camera Image Search could use better documentation (example searches would be a start), but I believe the idea is on target. I've called it "structured tagging" in the past, and it could make tagging much more powerful as an info retrieval tool, especially if tagging of both text and imagery (very different situations) becomes more popular (big assumption). Of course, there are a few important and MAJOR caveats.
1) In this Camera Image Search demo, the structure and data come directly from the camera, so it's easier to get accurate info, but I think it hints at what a "fielded" interface can do for general tagging.
2) In other situations it's different. Why?
3) Getting people to tag is a challenge in the first place; getting them to do more and add info to a specific field is another. However, tools could be developed to walk people through the process and offer suggestions.
4) Assuming that step 3 takes place, it's another issue to get the typical user to take advantage of the structure and use it to search. That said, structured fields on a tagging service, if used correctly, would make search terms like "2005" or "Chicago" much more useful than they are as free text. For example, 2005 might be entered in a "date the document was written" field, and Chicago might go in "where the document was authored" or "what the document is about" field(s). Finally, this might help, but in most situations it does not solve the many problems with authority control, synonyms, etc. While I think that in many cases it makes sense to let a text document "speak for itself" and let something like dynamic clustering assist in the organization, adding structure at the outset can give clustering even more "power." Example? Look at the many ways you can search and cluster the structured data of PubMed using ClusterMed.
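To make the free-text versus fielded distinction concrete, here is a minimal sketch using the "2005" and "Chicago" examples above. The field names (`written`, `authored_in`, `about`) and the sample records are hypothetical, chosen to mirror the fields described in item 4; this is not how Camera Image Search or any particular tagging service works internally.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    title: str
    # Hypothetical structured tags, e.g. {"written": "2005", "authored_in": "Chicago"}
    fields: dict[str, str] = field(default_factory=dict)

    def free_text(self) -> str:
        # Collapse title and every tag value into one bag of words,
        # the way plain (unstructured) tagging effectively does.
        return " ".join([self.title, *self.fields.values()]).lower()


def free_text_search(docs: list[Doc], term: str) -> list[Doc]:
    """Match the term anywhere, with no idea what it refers to."""
    term = term.lower()
    return [d for d in docs if term in d.free_text().split()]


def fielded_search(docs: list[Doc], **criteria: str) -> list[Doc]:
    """Match only when every named field equals the requested value."""
    return [
        d
        for d in docs
        if all(d.fields.get(k, "").lower() == v.lower() for k, v in criteria.items())
    ]


docs = [
    Doc("Looking back at 2005", {"written": "2006", "about": "Chicago"}),
    Doc("Lakefront photos", {"written": "2005", "authored_in": "Chicago"}),
    Doc("Budget report", {"written": "2005", "about": "Springfield"}),
]

if __name__ == "__main__":
    # Free text: "2005" hits all three, including a 2006 retrospective.
    print([d.title for d in free_text_search(docs, "2005")])
    # Fielded: only documents actually written in 2005 and in Chicago.
    print([d.title for d in fielded_search(docs, written="2005", authored_in="Chicago")])
```

Note that the exact-match comparison in `fielded_search` is exactly where the authority control and synonym problems mentioned above would bite: "Chicago, IL" or "Chi-town" in the `authored_in` field would not match.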
Postscript 2 from Gary: In his comments, Danny notes how easy Gigablast makes it to create a domain-specific search (it's true). Beyond the traditional steps, it was recently made even easier: you can now simply add the URLs of the sites you want searched to the Gigablast Advanced Search Page.