Search engines have problems with calendar information. Bottom line: you may end up searching for dates in all the wrong places.
Searchers remark from time to time on the odd results of a date-restricted query. For example, several weeks ago a lawyer commented that he had performed "a [Google] search for anything containing [his] name with the restriction 'last year' and got at least one hit that was probably 5+ years old."
The next day, TVC Alert noted: "[W]hen restricting a query by date, you are searching the date the engine indexed (or re-indexed) the Web page, rather than the date of the Web page itself." But a closer look at date-restricted searching uncovers quirks that defy this simple explanation.
This article focuses on date searching at Google, but searchers should note that other engines - like AlltheWeb and AltaVista - provide similar advanced features. The issues we raise occur regardless of the engine used.
Google offers three date searching options via its advanced search page. It defines the date qualifier as one that allows you to "restrict your results to the past three, six, or twelve months."
In conventional online research systems, as well as in Google Groups (a separate service for posting and querying Usenet messages), date qualifiers retrieve material published or created on, before, or after a certain date, or within a date range. But what is a date on the Web?
HTML provides no code for identifying the creation or publication date of Web pages. While a copyright meta tag exists, few use it. Moreover, search engines generally ignore data contained within this meta tag.
A date in this environment may have other meanings: a revision date, an expiration date, or the date a search engine crawled or indexed the page.
Which of these, for example, causes Google to retrieve the Web page below when you enter the phrase query, "human cloning," and limit the results to the past three months?
Genetic Encores: The Ethics of Human Cloning... Genetic
Encores: The Ethics of Human Cloning. The successful
cloning of an adult sheep, announced in Scotland this
past February, is one of the most dramatic ...
www.puaf.umd.edu/IPPP/Fall97Report/cloning.htm - 35k
The document carries a copyright date of 1976 - 1999. We conducted the search on 7 May 2002.
Recently, Fagan Finder, a site devoted to helping searchers locate information, created an interface to Google that permits various types of date searching (e.g., a date range or a specific date). It uses a special syntax (daterange:) that requires Julian dates, which express a day as a running day count rather than as a calendar date.
Does the use of Julian dates improve the quality of the search results? Try a phrase search for "government documents" limited to 15 January 2002. Enter the query like this:
daterange:2452289-2452289 "government documents"
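Converting a calendar date into the number daterange: expects is simple arithmetic. The sketch below is hypothetical (the function names are ours, not Google's or Fagan Finder's) and assumes an offset chosen to reproduce the values quoted in this article. Those values match the Julian Date at 00:00 UT with the fraction dropped, which is one less than the astronomical Julian Day Number, because astronomical days begin at noon. Python's date.toordinal() counts days from 1 January of year 1, so a fixed offset does the conversion.

```python
from datetime import date

# Assumption: this offset reproduces the daterange: values quoted in the
# article (truncated Julian Date at 00:00 UT). Adding 1721425 instead would
# give the astronomical Julian Day Number, one higher.
ORDINAL_TO_DATERANGE = 1721424


def to_daterange_day(d: date) -> int:
    """Convert a calendar date to the integer used in a daterange: query."""
    return d.toordinal() + ORDINAL_TO_DATERANGE


def daterange_query(terms: str, start: date, end: date) -> str:
    """Build a query string of the form 'daterange:START-END terms'."""
    if end < start:
        raise ValueError("end date precedes start date")
    return f"daterange:{to_daterange_day(start)}-{to_daterange_day(end)} {terms}"


day = date(2002, 1, 15)
print(daterange_query('"government documents"', day, day))
# prints: daterange:2452289-2452289 "government documents"
```

The same helper yields 2452456 for 1 July 2002 and 2452487 for 1 August 2002, the endpoints of the future range used further on.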
On 7 May 2002, it retrieved this Web page, which carries a revision date of 26 April 2002.
American Library Association, Government Documents
Round Table ( ... GODORT Logo, . American Library
Association (ALA) Government Documents Round Table
(GODORT) GITCO Government Information Technology
Committee. ... www.library.ucsb.edu/ala/gitco/ - 6k
The query also finds the page below, which was last updated on 11 April 2000!
Government Information... Return to the KDLA Home Page.
For comments or suggestion concerning this page, please
contact Government Documents Librarian. Last Modified:
April 11, 2000. www.kdla.net/links/govinfo.html - 14k
Now conduct the same search using a future date range, 1 July to 1 August 2002. The query "government documents" daterange:2452456-2452487 yielded 2 results when we performed it on 17 May 2002. One of them carries a copyright date of 1998:
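Decoding the day numbers in a query confirms what range was actually requested. This hypothetical sketch inverts the conversion using the same assumed offset (the one that reproduces the article's values) and reports whether the range falls after the day the search was run. No page could have been created, revised, or crawled on a date that had not yet arrived.

```python
from datetime import date

# Assumption: same offset as the article's daterange: values
# (truncated Julian Date at 00:00 UT); fromordinal() inverts toordinal().
ORDINAL_TO_DATERANGE = 1721424


def from_daterange_day(n: int) -> date:
    """Convert a daterange: day number back to a calendar date."""
    return date.fromordinal(n - ORDINAL_TO_DATERANGE)


def describe_range(start_n: int, end_n: int, searched_on: date) -> str:
    """Decode a daterange: span and compare it with the search date."""
    start, end = from_daterange_day(start_n), from_daterange_day(end_n)
    if start > searched_on:
        status = "entirely after"
    elif end > searched_on:
        status = "partly after"
    else:
        status = "on or before"
    return f"{start} to {end} ({status} {searched_on})"


print(describe_range(2452456, 2452487, date(2002, 5, 17)))
# prints: 2002-07-01 to 2002-08-01 (entirely after 2002-05-17)
```

Under the same assumption, the law query's range, 2452426-2452456, decodes to 1 June through 1 July 2002.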
From the Jameson Raid to Bloemfontein: Debating the
Origins of ...... With access not only to newly
unsealed government documents in both Cape Town and
London, but also to privately owned Chamberlain papers,
Drus reconfirmed van ...
www.gtexts.com/college/papers/s1.html - 26k
Another future date range search, law daterange:2452426-2452456, yields 13 items. One displays the date August 2001 in the document's title. But a visit to the page finds a document dated October 2001.
EBF-Jeremiah's Cry, August 2001... and my family.
Losing my mother has been extremely difficult for me,
my brother and sister-in-law, and my father. Your
kindness, generosity ... www.ebf-church.org/pages/newsletter/2001/jc10-01.htm - 31k
What causes these false hits? We suspect that, in part, Google treats the date it indexes (or re-indexes) a Web page as the page's date. Google refreshes a small portion of its database (about 3 million Web pages) every 24 to 48 hours, and the rest every 4 to 6 weeks. Does each new crawl change the date assigned to a Web page? If so, how accurately does that date reflect the age of the content?
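A toy model shows why filtering on crawl dates would let old content through a "past three months" restriction. Everything here is hypothetical: the Page record, the example.com URLs, and their dates are invented for illustration, a month is approximated as 30 days, and nothing is known about how Google actually implements its date qualifier.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Page:
    url: str
    content_date: date  # hypothetical: when the author created or revised the content
    crawl_date: date    # hypothetical: when the engine last fetched the page


def restrict_to_past_months(pages, today, months, key):
    """Keep pages whose chosen date field falls within roughly the past N months.

    key is "crawl_date" or "content_date"; a month is approximated as 30 days.
    """
    cutoff = today - timedelta(days=30 * months)
    return [p for p in pages if getattr(p, key) >= cutoff]


pages = [
    Page("example.com/old-report.htm", date(1997, 10, 1), date(2002, 4, 20)),
    Page("example.com/new-note.htm", date(2002, 4, 26), date(2002, 4, 28)),
]
today = date(2002, 5, 7)

by_crawl = restrict_to_past_months(pages, today, 3, "crawl_date")
by_content = restrict_to_past_months(pages, today, 3, "content_date")
print([p.url for p in by_crawl])    # both pages: the 1997 report was recently re-crawled
print([p.url for p in by_content])  # only example.com/new-note.htm
```

Keyed on crawl date, a recent re-crawl is enough to make a years-old page look current. The model does not explain hits for future date ranges, which is why we say crawl dates account for these results only in part.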
Google spokesperson Nate Tyler contends that Google supports neither the date syntax (daterange:) nor the Fagan Finder interface to Google. Instead, Google "encourage[s] users to conduct date searches from the advanced search page."
While we acknowledge that the date syntax belongs to a new service (Google Web APIs) still in beta mode, we question the availability of a faulty date search option on Google's Advanced Search page. Moreover, we think Google should better define the date qualifier to avoid misleading searchers into believing it means creation or publication date.
Gary Price (Gary Price Library Internet Research Consulting) is the author of the essential weblog for searchers, The Virtual Acquisition Shelf & News Desk. Genie Tyburski is Web Manager of The Virtual Chase, a service of the law firm Ballard Spahr Andrews & Ingersoll. She writes frequently about Internet research issues.
Google Advanced Search
FaganFinder's interface for Google
Google Web APIs