There’s a new paper out saying that business topics have ousted sex topics as
top searches. Forget the findings, however. In the wake of the
AOL search data
uproar, I wanted to know where the 20 to 30 million search sessions studied
came from. The paper, Sexual and pornographic Web searching: Trends analysis, is at First Monday. A press
release from Queensland University Of Technology about the paper is
here. It notes that one of the paper’s authors, Professor Amanda Spink, has studied
20 to 30 million search sessions from popular search engines including
AltaVista, AlltheWeb.com, Ask.com, Excite and Dogpile.
Wow — major players like AltaVista-owner Yahoo and Ask.com handing over
data? And this after Yahoo just
said it doesn’t release query data to researchers? As it turns out, Yahoo
and Ask are in the clear.
The most recent data comes from Infospace-owned Dogpile, from 2005. Infospace
has provided search data for years to
Wordtracker, so it’s not surprising that it has given it to researchers as well.
The key difference is the research data makes mention of having session
information, rather than just query terms. To know a search session, you’ll need
to be able to know that some particular IP address or cookied person was
involved. And if you have that data, then potentially you can identify someone,
as the AOL case showed.
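To make the session point concrete, here is a rough Python sketch of how sessions are typically rebuilt from a raw query log. It is not the method the paper describes. The log records, the `build_sessions` name and the 30-minute cutoff are all assumptions made up for illustration. What it shows is that the grouping only works if every record carries a user key such as an IP address or cookie ID, which is exactly the piece that raises the privacy issue.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Made-up log records: (user key, timestamp, query). The user key stands in
# for an IP address or cookie ID. None of this is real search data.
LOG = [
    ("cookie-a", "2005-05-06 10:00:00", "cheap flights"),
    ("cookie-b", "2005-05-06 10:01:00", "car insurance"),
    ("cookie-a", "2005-05-06 10:04:00", "cheap flights sydney"),
    ("cookie-a", "2005-05-06 15:30:00", "pizza delivery"),
]

SESSION_GAP = timedelta(minutes=30)  # assumed cutoff, not taken from the paper


def build_sessions(log, gap=SESSION_GAP):
    """Group queries into sessions per user key, splitting on long pauses."""
    by_user = defaultdict(list)
    for user, stamp, query in log:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        by_user[user].append((when, query))

    sessions = []
    for user, events in by_user.items():
        events.sort()  # order each user's queries by time
        current = [events[0][1]]
        for (prev_time, _), (time, query) in zip(events, events[1:]):
            if time - prev_time > gap:
                # Long pause: close the current session and start a new one.
                sessions.append((user, current))
                current = []
            current.append(query)
        sessions.append((user, current))
    return sessions


if __name__ == "__main__":
    for user, queries in build_sessions(LOG):
        print(user, queries)
```

Strip the user key out of those records and there is nothing left to group on. You would have query terms only, which is closer to a plain list of search terms.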
Yahoo now owns AltaVista and AllTheWeb, but the data from those services was
released in 2002. That’s before Yahoo gained ownership of them through the
purchase of Overture in 2003.
As for Ask.com, the paper doesn’t actually detail any information from that.
Excite is listed, and Excite is part of Ask’s IAC Search & Media Network.
However, the last Excite data in 2001 came from before Ask’s involvement with it.
Overall, none of the major search engines have handed out data here. Looking
forward, the AOL fiasco will make it even less likely anyone’s going to provide
further information. As
that is a real loss. The types of studies that Spink and her colleagues do are
important, looking at how we interact with and use important search tools. Figuring
out a way to help that research — yet still protect privacy — is something I
hope can happen.
As for the paper itself, it takes you back in summary fashion through nine
studies over the past decade of how popular searches for porn are. Frankly, the
topic is pretty boring at this point. The press release notes that:
"In their mid-90s heyday, sex-related topics were the most commonly searched
category, accounting for 17 per cent of web searches but that figure has now
fallen to less than 4 per cent of web inquiries, information scientist
Professor Amanda Spink said."
Now fallen? Hey, look at the studies. They fell back in 2002, but we keep
playing up the "porn is dead" angle. I suppose it’s nice to keep checking on this,
but perhaps the fact that commerce-related queries are at an all-time high (30
percent) is more important? Does it have to be contrasted against the
non-changing sex stats?
And is sexual and porn searching really in decline? When I looked
in 2005, the words sex and porn were the number 1 and 2 queries on Dogpile. Maybe the
overall volume of porn-related queries is dropping, but it still seems to be a
popular subject. Heck, here’s a Google Trends chart for
porn showing a rise since
Moreover, look at the paper itself. It ranks "sex" as tenth of the most
popular terms on Dogpile. That’s popular. It’s even more popular when you
eliminate these "popular" stop words above it: of, the, in, and, for, a, to. Do
that, and this is how the top list looks:
Frankly, does anyone doubt that sex is still a popular query? The lists might be
even more dramatic if they reflected actual queries as entered, rather than
individual words. In other words, no one’s searching for "of" in mass quantities.
They are using that word alongside other ones — and breaking apart the original
queries causes skewing.
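A small Python sketch shows the skew. The queries below are invented for illustration and are not taken from the paper or from Dogpile. The stop words are the seven listed above, and the function names are my own. It counts the same tiny log three ways: by individual word, by individual word with stop words removed, and by whole query as entered.

```python
from collections import Counter

# The seven stop words named above.
STOP_WORDS = {"of", "the", "in", "and", "for", "a", "to"}

# Invented queries for illustration only.
QUERIES = [
    "lord of the rings",
    "history of rock",
    "sex",
    "map of the world",
    "sex",
    "free music",
]


def term_counts(queries, stop_words=frozenset()):
    """Count individual words across all queries, skipping any stop words."""
    counts = Counter()
    for query in queries:
        counts.update(w for w in query.lower().split() if w not in stop_words)
    return counts


def query_counts(queries):
    """Count whole queries exactly as entered (case-insensitive)."""
    return Counter(query.lower().strip() for query in queries)


if __name__ == "__main__":
    print(term_counts(QUERIES).most_common(3))
    print(term_counts(QUERIES, STOP_WORDS).most_common(3))
    print(query_counts(QUERIES).most_common(3))
```

On this toy log, "of" tops the raw word count even though nobody searched for it on its own, while "sex" leads both the stop-word-free list and the whole-query list.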
I’ve also got some issues about the fact that different search engines are
used to compare data over time. For all we know, Excite users were more into
porn than those of other search engines. Since Excite’s data was used for the
first three years, that could cause a skew. Perhaps not, but it’s something to consider.
Postscript: I asked Amanda Spink if she had any comments to add, and she sent across this:
What we have found in the data is that although sexual terms such as “sex” may be high frequency terms, overall sexual searching continues to decline as a proportion of Web searches. The language used in sexual searching is relatively constrained and limited in variety, hence the high frequency terms.
We hope that further data can be made available to the academic community to allow us to continue these studies that are of interest to the Web companies, academics and the general public.