Since being rolled out in 2008, Google Flu Trends has often been cited as an exceptional instance of the potential for big data to revolutionize the way we live. However, Science Magazine recently released an article that discusses the shortcomings of Google Flu Trends, causing many to lament the "failings of big data" and criticize Google's data reporting practices.
The Idea Behind Google Flu Trends
Google Flu Trends was an admittedly creative and well-meaning data reporting service that aimed to provide real-time flu statistics to people concerned about contracting the illness.
Google engineers noticed that some search queries spiked during flu season. Specifically, keywords like "cough" and "fever," which are related to flu symptoms, increased noticeably during flu outbreaks. This observation led Google's engineers to believe that they could potentially track these queries and report flu rates faster than traditional CDC methods.
However, it seems that Google Flu Trends has had a nasty habit of overreporting flu rates and that this, too, may be the result of big data.
Google's Algorithm and the Media May Have Influenced Flu Trends Data
According to Science Magazine, Google Flu Trends reported more than double the real flu rate in the U.S. for February 2013. What's more, when compared against the conventionally reported CDC flu data, Google Flu Trends estimates were high for 100 out of 108 weeks between August 2011 and September 2013. Why?
Google reportedly gets many more searches for flu-related symptoms when media coverage of influenza spikes. This suggests that media coverage of the flu influences search behavior and causes an influx of flu searches, which Google Flu Trends interprets as an increase in flu cases.
Even more interesting, though, is the possibility that Google's own algorithmic updates have been skewing the flu trend data. As Joseph Stromberg reports for Smithsonian Magazine, Google's 2011 and 2012 algorithmic updates seem to have had the most significant impact on Google Flu Trends' accuracy.
In 2011, Google began recommending suggested searches for users based on their current search queries. Google updated its algorithm again in 2012 to not only suggest searches, but also to try to diagnose health symptom search queries. Thus, if someone were to run a search for the keyword phrase "sore throat," Google would likely suggest "flu" or "cold" searches for that user.
If Google's suggested searches persuaded users to search for flu and cold data, then Google itself was causing an influx of flu searches, which its Google Flu Trends service then interpreted as a likely increase in flu cases.
Amending Big Data Reporting in the Future
Despite the hiccup in its Google Flu Trends data, Google appears determined to continue seeking a faster way to report disease trends. Stromberg's Smithsonian article points out that, in an attempt to correct Google Flu Trends' errors, Google engineers have found that combining the real-time qualities of Google Flu Trends with two-week-old flu data from the CDC yields a figure that is much more accurate.
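To make the idea of combining the two sources concrete, here is a minimal sketch in Python. It is not the method Google's engineers used, which the article does not describe. It simply assumes that the search-based estimate overshoots the true rate by a roughly constant factor, so the two-week-old official figure can anchor the level while the search signal supplies the change since then. The function name `nowcast`, the `bias` factor, and all of the numbers are hypothetical and chosen only for illustration.

```python
def nowcast(cdc_lagged, search_lagged, search_now):
    """Scale the lagged official rate by the relative change in the search signal.

    Assumption (ours, for illustration): the search-based estimate is off by
    about the same factor in both weeks, so that factor cancels in the ratio.
    """
    if search_lagged <= 0:
        raise ValueError("search_lagged must be positive")
    return cdc_lagged * (search_now / search_lagged)


# Made-up illustrative numbers, not real flu data.
true_rate_two_weeks_ago = 2.0  # what a two-week-old official report might say
true_rate_now = 3.0            # unknown in practice; used only to check the sketch
bias = 2.0                     # hypothetical: search estimate runs at double the true rate

search_two_weeks_ago = bias * true_rate_two_weeks_ago
search_now = bias * true_rate_now

combined = nowcast(true_rate_two_weeks_ago, search_two_weeks_ago, search_now)

print(search_now)  # raw search-based estimate: 6.0
print(combined)    # combined estimate: 3.0
```

In this toy case the raw search estimate is twice the true rate, while the combined figure recovers it exactly, because the overestimate was constructed to be constant. With real data the bias would drift from week to week, so a combined estimate of this kind would reduce the error rather than eliminate it.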
As far as the future of big data goes, perhaps the lesson of Google Flu Trends is that we need both big data and conventionally reported data to revolutionize our lives for the better.