Google's API: For Fun, Not Profit (Yet)

The Google API is a fun way to use Google for things as varied as solving crossword puzzles and creating recipes, but it's not yet ready for prime-time applications.

A special report from the Search Engine Strategies 2003 Conference, August 18-21, San Jose, CA.

The Google API is a way for all sorts of programs to send queries to Google and get back results as data, rather than the web pages we’re familiar with when we do a Google search. Nelson Minar, the Google engineer who designed the API, and Rael Dornfest, co-author of O’Reilly’s “Google Hacks” book, talked a little about what it is, and what people can do with it in a panel called “Up Close with the Google API.”

An API is an “Application Programming Interface.” Simply put, it’s a way to have one program talk to another program. For example, most systems have ways for applications to tell a browser to open a specific web page. The browser has an API that defines what commands it will accept and in what format.

Most webwide search engines have developed APIs. Using APIs, portals can get search results or ads to display on their local pages. In fact, Google had an informal XML search results option before it officially released its API: programs could send a browser-like request in the form of a URL, and Google would send back XML results.

But that was not meant for large-scale use. Neither was the hack (workaround) called “screen scraping.” This is when a program sends a URL request, gets back HTML, and then programmatically takes apart the HTML to locate the pieces of information it contains. In Google or any other search engine, this would be the search result item information: title, URL, text snippet, date, size, etc. Search engines hate that: the queries are mechanical rather than human, so there’s never a click on any result and they’re essentially wasting the search engine’s resources.
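
To make the screen-scraping idea concrete, here is a minimal sketch in Python. The HTML fragment and its class names are invented for illustration and are not Google's actual markup; the point is only that the program has to dig result fields out of markup meant for human readers.

```python
from html.parser import HTMLParser

# Hypothetical results page: this markup is invented for the sketch.
SAMPLE_HTML = """
<div class="result"><a href="http://example.com/a">First title</a></div>
<div class="result"><a href="http://example.com/b">Second title</a></div>
"""


class ResultScraper(HTMLParser):
    """Collects (url, title) pairs from the anchors found in a page."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._href = None  # href of the anchor currently being read, if any
        self._text = []    # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.results.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape(html):
    """Return the list of (url, title) pairs found in the HTML text."""
    parser = ResultScraper()
    parser.feed(html)
    return parser.results


print(scrape(SAMPLE_HTML))
```

A scraper like this breaks whenever the page layout changes, which is one more reason a data-oriented API is the better route.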

Minar talked a little about the goals of the Google API project. When Google decided to let people have programmatic access to the search engine in a more controlled way, they did two special things. One was to use hot new technologies called “WSDL” (Web Services Description Language) and “SOAP”, which are designed for application-to-application communication over the Web. The other was to make it very easy for researchers, independent programmers and companies to use this API for creative and interesting projects.

The API is still in beta, non-commercial, free and limited. It provides access to just three of the Google services: search results, cached pages and spelling suggestions. And, to avoid being overwhelmed by requests, each developer is limited to 1,000 queries per day. Note that there is not yet API access to the Google Search Appliance (enterprise search engine).
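
A program using the API has to live within that daily cap. The following is a minimal client-side sketch of keeping count. The class name `DailyQuota` and its method are hypothetical and not part of the developer kit. Google enforces the real limit on its own side, and exactly when its daily count resets is not stated here, so the calendar-day reset below is an assumption.

```python
import datetime


class DailyQuota:
    """Counts queries per calendar day and refuses to go past the limit."""

    def __init__(self, limit=1000):
        self.limit = limit
        self.day = None   # the day the current count belongs to
        self.used = 0

    def try_acquire(self, today=None):
        """Return True and count one query if the day's budget allows it."""
        today = today or datetime.date.today()
        if today != self.day:
            # Assumed reset rule: a new calendar day starts a fresh count.
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True
```

A caller would check `quota.try_acquire()` before each request and skip or postpone the query when it returns `False`.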

It’s easy to get access to the developer kit, posted online at http://www.google.com/apis. The developer kit includes the WSDL file and a lot of documentation and sample code (mainly Java, .NET, and Perl), but people have written samples for pretty much every known computer language, from C++ to AppleScript. Each developer signs up and gets their own key, which must be sent with every API request.

Using the API is also simple. The program sets up a request by creating an XML message, sends it and gets the reply. In the spelling example, that’s just the phrase to check, and the result is a little package including the spelling suggestion. Similarly, a search request has some parameters (like whether to use safesearch). When you send it, you get a batch of search results, in a tidy format.
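
The sketch below builds a spelling request message of this kind with Python's standard library. The `urn:GoogleSearch` namespace, the `doSpellingSuggestion` operation name, and the `key` and `phrase` parameter names are assumptions recalled from the beta developer kit, so the WSDL file in the kit remains the authority. The key is a placeholder, nothing is sent over the network, and a real request may also need type annotations and encoding attributes that this sketch omits.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
GOOGLE_NS = "urn:GoogleSearch"  # assumed service namespace


def build_spelling_request(key, phrase):
    """Return a SOAP envelope (as text) asking for a spelling suggestion."""
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # Assumed operation name; the parameters are plain child elements.
    call = ET.SubElement(body, f"{{{GOOGLE_NS}}}doSpellingSuggestion")
    ET.SubElement(call, "key").text = key
    ET.SubElement(call, "phrase").text = phrase
    return ET.tostring(envelope, encoding="unicode")


# "YOUR-KEY-HERE" stands in for the developer key issued at sign-up.
print(build_spelling_request("YOUR-KEY-HERE", "recieve"))
```

The program would then POST this message to the service endpoint named in the WSDL file and read the suggestion out of the XML reply.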

For example, when you submit a search API request for Google Hacks author Rael Dornfest, you get some header information, and then a list with elements that look like this:

[
URL = "http://www.oreillynet.com/~rael/"
Title = "raelity bytes"
Snippet = " ... that's not actually me. "They say Vorilhon, who calls himself the prophet Rael and testified before Congress last year in a futuristic white jumpsuit ..."
Directory Category = {SE="", FVN=""}
Directory Title = ""
Summary = ""
Cached Size = "35k"
Related information present = true
Host Name = ""
],

The content should look familiar. It consists of the pieces of each Google search result item, in what’s known as a name-value pair format. Unfortunately, the output is not in XML format. The program that sent the request can now use those tidbits for all sorts of interesting purposes.
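
As one illustration of using those tidbits, the sketch below turns a printed result element like the one above into a Python dictionary. It assumes each field sits on its own line in `Name = "value"` form, as in the excerpt. The function name `parse_result_element` is hypothetical, and a program calling the API directly would normally receive structured fields rather than printed text.

```python
def parse_result_element(text):
    """Parse lines of the form  Name = "value"  into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if not sep:
            continue  # skip the enclosing brackets and blank lines
        value = value.strip()
        # Strip one pair of surrounding double quotes, if present.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        fields[name.strip()] = value
    return fields


# A shortened copy of the element shown in the article.
SAMPLE = '''[
URL = "http://www.oreillynet.com/~rael/"
Title = "raelity bytes"
Cached Size = "35k"
Related information present = true
]'''

element = parse_result_element(SAMPLE)
print(element["Title"], element["URL"])
```

Unquoted values such as `true` are kept as plain strings, so any conversion to booleans or numbers is left to the caller.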

Minar also gave some examples of interesting uses of this data, including Mockery Bird, connecting Amazon book reviews and web commentary, people doing data mining on complex topics such as the Web ecosystem, and finding solutions to crossword puzzle clues. Minar encourages anyone doing something cool and different with the API to send email via the API support address.

Author Dornfest talked about the fun he had working on Google Hacks, noting that the sheer size of the Google database is interesting to play with. He showed how using an undocumented date function could take you “back in time” displaying only page information that hasn’t changed in several years. The Weblog Bookwatch by Paul Marsh combs weblogs for mentions of Amazon URLs. AvaQuest’s GoogleMovies gets the overall “temperature” of posts about new movies.

These are all fun and non-profit uses. Google is still working through the process of making the API into a commercial service. Obviously, they’ve already talked to a lot of SEO and SEM companies, but have not come to any public decision on how to handle this issue. Minar encouraged people to contact Google if they are interested in business arrangements, and Dornfest encouraged people to develop additional innovative tools.

Avi Rappoport is an analyst and consultant on web site and enterprise search engines. She’s done enough programming to play with the API in Perl, and thinks it’s nifty.

Overture Search Leader Moves to MSN

Paul Ryan, the former Chief Technology Officer of Overture, has been hired as the General Manager of MSN Monetization.

According to MSN spokesperson Malina Bragg, “Paul joins the management team as we continue to invest and build our search and ad sales platforms. Further, he brings a tremendous amount of search industry experience to MSN, and will help our efforts to move MSN monetization into the future.”

Ryan was a key player at Overture, and his move to MSN is likely a significant loss to Yahoo, Overture’s new owner. We’ll have more coverage as the battle between MSN and Yahoo/Overture ramps up, inevitably, over the course of the coming year.

Search Headlines

NOTE: Article links often change. In case of a bad link, use the publication’s search facility, which most have, and search for the headline.

What’s Google Really Worth?
Business Week Oct 30 2003 3:13AM GMT
Spammers Steal E-Mail Addresses on Orbitz
SiliconValley.com Oct 30 2003 2:51AM GMT
LookSmart soars as third-quarter results beat views
San Jose Mercury News Oct 30 2003 1:03AM GMT
Google’s Popular Toolbar
New York Times Oct 30 2003 1:03AM GMT
E-Mail Marketing Works — Or Hurts
Internet News Oct 30 2003 0:51AM GMT
LookSmart: Determinedly Second-Tier, Exploring Options
Boston.Internet.com Oct 30 2003 0:40AM GMT
MSN splits in two
CNET Oct 29 2003 9:02PM GMT
Why Do I Need To Hire a SEO (Search Engine Optimization) Company?
Search Engine Guide Oct 29 2003 4:37PM GMT
LookSmart preparing for life after MSN
Netimperative Oct 29 2003 2:40PM GMT
Google eyes book search
CNET Oct 29 2003 1:53PM GMT
Google: Have money, will travel
Yahoo Oct 29 2003 11:16AM GMT
Web Group Backs Microsoft in Patent Suit
New York Times Oct 29 2003 4:47AM GMT
Yahoo acquires BT content sites
CNET Oct 29 2003 1:48AM GMT
powered by Moreover.com
