There's a new paper out saying that business topics have ousted sex topics as top searches. Forget the findings, however. In the wake of the AOL search data uproar, I wanted to know where the 20 to 30 million search sessions studied came from.
The paper, Sexual and pornographic Web searching: Trends analysis, is at First Monday. A press release from Queensland University Of Technology about the paper is here. It notes that one of the paper's authors, Professor Amanda Spink, has used:
20 to 30 million search sessions from popular search engines including Alta Vista, AlltheWeb.com, Ask.com, Excite and Dogpile.
Wow -- major players like AltaVista-owner Yahoo and Ask.com handing over data? And this after Yahoo just said it doesn't release query data to researchers? As it turns out, Yahoo and Ask are in the clear.
The most recent data comes from Infospace-owned Dogpile, from 2005. Infospace has provided search data for years to Wordtracker, so it's not surprising that it has given it to researchers as well.
The key difference is that the research data mentions session information, rather than just query terms. To reconstruct a search session, you need to know that some particular IP address or cookied person was behind each query. And if you have that data, then potentially you can identify someone, as the AOL case showed.
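To make the session point concrete, here's a rough sketch of how session data could be built from a query log. This is not the method the paper or Dogpile used, which isn't described here. The record layout (user key, timestamp in seconds, query), the sample log and the 30-minute cutoff are all assumptions for illustration.

```python
from collections import defaultdict

# Assumed cutoff: a gap longer than this starts a new session.
# The value is illustrative, not taken from the paper.
SESSION_GAP_SECONDS = 30 * 60


def build_sessions(records, gap=SESSION_GAP_SECONDS):
    """Group (user_key, timestamp, query) records into per-user sessions.

    user_key stands in for an IP address or cookie ID. Without it,
    queries from different people cannot be told apart.
    """
    by_user = defaultdict(list)
    for user_key, timestamp, query in records:
        by_user[user_key].append((timestamp, query))

    sessions = []
    for user_key, events in by_user.items():
        events.sort()  # order each user's queries by time
        current = [events[0][1]]
        last_time = events[0][0]
        for timestamp, query in events[1:]:
            if timestamp - last_time > gap:
                # Long pause: close the current session, start another.
                sessions.append((user_key, current))
                current = []
            current.append(query)
            last_time = timestamp
        sessions.append((user_key, current))
    return sessions


# Made-up log entries, purely for demonstration.
sample_log = [
    ("cookie-a", 0, "cheap flights"),
    ("cookie-a", 120, "cheap flights sydney"),
    ("cookie-b", 60, "weather"),
    ("cookie-a", 7200, "pizza near me"),
]

sessions = build_sessions(sample_log)
```

Notice that every session stays attached to its user key. That is exactly the piece of information that makes session data more sensitive than a plain list of query terms.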
Yahoo now owns AltaVista and AllTheWeb, but the data from those services was released in 2002. That's before Yahoo gained ownership of them through the purchase of Overture in 2003.
As for Ask.com, the paper doesn't actually detail any information from that service. Excite is listed, and Excite is part of Ask's IAC Search & Media Network. However, the last Excite data, from 2001, predates Ask's involvement with Excite.
Overall, none of the major search engines have handed out data here. Looking forward, the AOL fiasco will make it even less likely anyone's going to provide further information. As I've written, that is a real loss. The types of studies that Spink and her colleagues do are important, looking at how we interact with and use important search tools. Figuring out a way to help that research -- yet still protect privacy -- is something I hope can happen.
As for the paper itself, it takes you back in summary fashion through nine studies over the past decade of how popular searches for porn are. Frankly, the topic is pretty boring at this point. The press release notes that:
In their mid-90s heyday, sex-related topics were the most commonly searched category, accounting for 17 per cent of web searches but that figure has now fallen to less than 4 per cent of web inquiries, information scientist Professor Amanda Spink said.
Now fallen? Hey, look at the studies. They fell back in 2002, but we keep getting releases playing up the "porn is dead" angle. I suppose it's nice to keep checking on this, but perhaps the fact that commerce-related queries are at an all-time high (30 percent) is more important? Does it have to be contrasted against the non-changing sex stats?
And is sexual and porn searching really in decline? When I last looked in 2005, the words sex and porn were the number 1 and 2 queries on Dogpile. Maybe the overall volume of porn-related queries is dropping, but it still seems to be a popular subject. Heck, here's a Google Trends chart for porn showing a rise since 2004.
Moreover, look at the paper itself. It ranks "sex" as tenth of the most popular terms on Dogpile. That's popular. It's even more popular when you eliminate these "popular" stop words above it: of, the, in, and, for, a, to. Do that, and this is how the top list looks:
Frankly, does anyone doubt that sex is still a popular query? The lists might be even more dramatic if they reflected actual queries as entered, rather than individual words. In other words, no one's searching for "of" in mass quantities. They are using that word alongside other ones -- and breaking apart the original queries causes skewing.
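The skew is easy to see with a toy example. The sketch below counts the same handful of queries three ways: by individual word, by individual word with the stop words listed above removed, and by whole query as entered. The sample queries are invented for illustration and are not from the Dogpile data. The function names are my own.

```python
from collections import Counter

# The stop words named above as outranking "sex" in the paper's term list.
STOP_WORDS = {"of", "the", "in", "and", "for", "a", "to"}


def term_counts(queries, stop_words=frozenset()):
    """Count individual words across all queries, skipping stop words."""
    counts = Counter()
    for query in queries:
        for term in query.lower().split():
            if term not in stop_words:
                counts[term] += 1
    return counts


def query_counts(queries):
    """Count whole queries as entered (case-folded, trimmed)."""
    return Counter(query.lower().strip() for query in queries)


# Invented sample queries, not real log data.
sample_queries = [
    "lord of the rings",
    "history of the internet",
    "map of the world",
    "sex",
    "sex",
    "hotels in sydney",
]

raw_terms = term_counts(sample_queries)
filtered_terms = term_counts(sample_queries, STOP_WORDS)
whole_queries = query_counts(sample_queries)
```

In this made-up sample, "of" and "the" each appear three times as terms and outrank "sex" at two, even though nobody searched for either word on its own. Drop the stop words, or count whole queries instead, and "sex" comes out on top.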
I've also got some issues about the fact that different search engines are used to compare data over time. For all we know, Excite users were more into porn than those of other search engines. Since Excite's data was used for the first three years, that could cause a skew. Perhaps not, but it's something to note.
Postscript: I asked Amanda Spink if she had any comments to add, and she sent across this:
What we have found in the data is that although sexual terms such as "sex" may be high frequency terms, overall sexual searching continues to decline as a proportion of Web searches. The language used in sexual searching is relatively constrained and limited in variety, hence the high frequency terms.
We hope that further data can be made available to the academic community to allow us to continue these studies that are of interest to the Web companies, academics and the general public.