Certainly at least one company got a good chunk of data, as shown by the Orkut Personal Network Geomapper. It lets you look up anyone in the Orkut database at the time the information was mined, then see their connections.
For example, curious about Google cofounder Larry Page's network? Here are his personal connections, mapped pleasantly across the United States.
The service is definitely cool, helping you visualize your network of friends geographically. It's the type of feature you'd expect Google itself to offer. Perhaps those running Orkut will take note and offer something similar internally, using current data.
So far, I haven't found that any personal data is being revealed beyond the names of your connections. However, no doubt that personal data is part of the information that was mined, which is an embarrassment for Google.
Certainly, anyone within Orkut can find some of this data directly. But the amount of data Orkut provides to those outside someone's personal network can be limited. It's possible that no such limits are part of this database.
Bear in mind that the data is apparently fairly old. But the fact that Orkut was mined once raises fears that it could be mined again, or worse.
Source Of Data
Rolan Yang, creator of the Geomapper site, said he gained the information from an acquaintance.
"Roughly two months ago, an associate passed me a file which contained information believed to be [the” Orkut web site. I do not know how he obtained the information, and I am not sure I want to know. My best guess would be that the information was scraped by some sort of search engine spider since it only contained data which is visible from one's public Orkut profile (username, location if given, friends, etc). Since subscribers are continually signing on in real-time, capturing an entire mirror of the website is not likely to be possible," he said, via email.
Yang added that the site wasn't originally intended for use outside a small number of people he knows:
"The Orkut Personal Network GeoMapper was written out of personal scientific curiosity and its use was meant for only a small circle of friends. Unfortunately someone leaked the URL to a 'blog' network after which all hell broke loose!," he wrote.
When this story was first posted, Yang said he'd not been contacted by Google about his site.
"Why have Google's lawyers not contacted me yet? I can't say for sure, but would you think that a lawsuit filed in protest of 'spidering and caching of website information' might just be in conflict of interest with their primary business?," he wrote.
(NOTE: About a week later, Google issued a cease-and-desist letter. See this article for more about that.)
Orkut's terms do have provisions against "using any robot, spider, site search/retrieval application, or other device to retrieve or index any portion of the orkut.com service" and a litany of other things that it might consider unauthorized use. But Yang says he didn't do the actual data acquisition, and if he's not an Orkut member, he wouldn't appear to be bound by its terms.
Geomapper User Reaction
Yang said he's received mostly positive comments since the site went live in the past few weeks, along with a few negative ones relating to privacy. His response was that these people may not realize they already agreed that such data might be reused, at least by Google itself, in other forms, as per the Orkut terms of service.
Yang also said that he's using the meta robots tag to prevent search engines from indexing the maps created on his service. As noted, this hasn't stopped Google from getting nearly 700 of these pages.
When I looked at what was indexed, such as this example, I did see the tag on the live page. But it is missing from Google's cached version. The tag may have been added after the pages were originally crawled. If so, then the pages should get removed at Google and other search engines in the near future. The site itself, of course, will remain online.
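For readers who want to repeat that comparison between a live page and a cached copy, here is a minimal sketch in Python. It assumes the tag in question is a standard `<meta name="robots">` element carrying a `noindex` (or `none`) directive; the article does not say which directive Yang actually used. The class and function names are hypothetical, and the two HTML strings are made-up stand-ins, not the real Geomapper pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # A valueless attribute is reported as None, so normalize to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html):
    """Return the set of lowercase robots directives found in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def blocks_indexing(html):
    """True if the page asks search engines not to index it."""
    directives = robots_directives(html)
    return "noindex" in directives or "none" in directives


# Hypothetical stand-ins for a live page and an older cached copy.
live_page = (
    '<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW">'
    "<title>map</title></head><body></body></html>"
)
cached_page = "<html><head><title>map</title></head><body></body></html>"

print("live page blocks indexing:", blocks_indexing(live_page))
print("cached copy blocks indexing:", blocks_indexing(cached_page))
```

If the live page reports a blocking directive and the cached copy does not, that is consistent with the tag having been added after the crawl, though it does not prove it.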
NOTE: This story was originally published on May 6, updated with comments from Geomapper on May 7 and with Google's action on May 18.