New Search Patent Applications: June 5, 2006 – Taking Care of Web Decay, Dead Links, and Parked Domains

Yahoo provides an XML-based bid management tool, and a way to maintain a persistent link to dynamic information between a browser and specific web pages. Microsoft marries email and search to provide a way to store and track queries, and also introduces a method of calculating similarity between pages without the computational overhead of a Latent Semantic Indexing methodology. IBM aims to improve text search by preprocessing and maintaining relationship data between documents, delivers a means of spellchecking URLs, describes a process for personalizing web pages that include personalized search results, and introduces a method to rank pages while accounting for dead links and the decay of content on web pages.

ISEN, L.L.C., has developed an internet search environment number for internet databases to expand search to deep content, rarely indexed by web-based search engines. Zoom Information, Inc., has developed a method of collecting data about people from the web, to present in a structured database to make searching for information about people easier. Bing Swen, from Peking University, has created an improved process for search result clustering. Two Amazon employees have had a patent application published which may correct misspellings in queries in order to return results to searchers.


This first document may be the star of this batch of search filings, and if you decide to try to read through any of these, this may be a good choice. Two of the inventors listed in this patent application are now at Yahoo. Ironically, one of their main examples of the problem of web decay involves the Yahoo Directory.

There are a number of nice ideas here, which touch on why dead links and web decay aren’t good for search engines, how most dead links can be identified regardless of whether a server returns a 404 error message or not, and how some exceptions such as parked domains (including expired domains purchased to benefit a site by using promotional efforts of previous owners) are now being looked at carefully by search engineers.

Methods and apparatus for assessing web page decay
Inventors: Andrei Zary Broder, Ziv Bar-Yossef, Shanmugasundaram Ravikumar, Andrew Tomkins
US Patent Application 20060112089
Published May 25, 2006
Filed: November 22, 2004


Systems and methods are herein disclosed for assessing the staleness of a web page. In particular, in one method of the present invention, the staleness of a web page is assessed by examining internal date references within the web page. In another method of the present invention, the staleness of a web page is assessed by examining the meta-data associated with the web page. In a further method of the present invention, the staleness of a hyperlinked web page is determined by examining the link status of the hyperlinks. If the web page has a relatively large number of dead links, it is assessed as being a stale web page. In a still further method of the present invention, the link status of web pages in the neighborhood of the web page being assessed is likewise examined.
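The dead-link part of that abstract can be made concrete with a small sketch. This is only an illustration under my own assumptions, not the formula from the filing: the link graph is a plain dictionary, the set of dead pages is supplied by the caller, and the names `dead_link_fraction`, `decay_score`, and `neighbor_weight` are hypothetical.

```python
from typing import Dict, List, Set


def dead_link_fraction(page: str, links: Dict[str, List[str]], dead: Set[str]) -> float:
    """Fraction of a page's outgoing links that point at dead pages."""
    out = links.get(page, [])
    if not out:
        return 0.0
    return sum(1 for target in out if target in dead) / len(out)


def decay_score(page: str, links: Dict[str, List[str]], dead: Set[str],
                neighbor_weight: float = 0.5, depth: int = 1) -> float:
    """Blend a page's own dead-link fraction with that of its live neighbors.

    The blending rule and the depth limit are assumptions made for this
    sketch; the abstract only says the neighborhood is "likewise examined".
    """
    own = dead_link_fraction(page, links, dead)
    live = [t for t in links.get(page, []) if t not in dead]
    if depth == 0 or not live:
        return own
    neighborhood = sum(
        decay_score(t, links, dead, neighbor_weight, depth - 1) for t in live
    ) / len(live)
    return (1 - neighbor_weight) * own + neighbor_weight * neighborhood


# Toy graph: page "a" links to a live page "b" and a dead page "c";
# every link on "b" is dead.
links = {"a": ["b", "c"], "b": ["x", "y"]}
dead = {"c", "x", "y"}
print(decay_score("a", links, dead))
```

A page that looks only half stale on its own links scores worse once its live neighbor turns out to be full of dead links, which is the intuition behind looking at the neighborhood.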

Method and system for improving a text search
Inventors: Michael J. Dockter, Jochen F. Doerre, Ronald W. Lynn, Joseph A. Munoz, Randal J. Richardt, and Roland Seiffert
US Patent 7,054,882
Granted May 30, 2006
Filed: February 12, 2003


A method and system for improving text searching is disclosed. The method and system provides a network of document relationships and utilizes the network of document relationships to identify the region of documents that can be used to satisfy a user’s request. In a preferred embodiment, the text searching method in accordance with the present invention augments a conventional text search by using information on document relationships and metadata. The text searching method and system improves upon conventional text search techniques by incorporating relationship metadata to define regions to search within. In the present invention the definition of a region is not limited to just categories as it includes neighborhoods around individual documents and sets which have been user defined.

Spell checking URLs in a resource
Inventors: Mark Joseph Hamzy
US Patent Application 20060112066
Published May 25, 2006
Filed: November 22, 2004


Methods, systems, and computer program products are provided for spell checking URLs in a resource. Embodiments include identifying within a resource a URL, determining whether the URL is valid, and marking the URL as misspelled if the URL is invalid. In typical embodiments, determining whether the URL is valid is carried out by resolving a domain name contained in the URL. Typical embodiments also include suggesting an alternative spelling for the URL. In some embodiments, suggesting an alternative spelling for the URL is carried out by identifying a keyword in the resource, querying a search engine with the identified keyword, and selecting a URL in dependence upon search results returned by the search engine.
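A rough sketch of the validity check described in that abstract follows. It assumes a simple regular expression for spotting URLs and takes the resolver as a parameter so it can be swapped out; `find_misspelled_urls`, `dns_resolves`, and `URL_PATTERN` are hypothetical names, and the suggestion step (querying a search engine with a keyword) is left out.

```python
import re
import socket
from typing import Callable, List
from urllib.parse import urlparse

# Simplified URL matcher (an assumption; real resources need a stricter parser).
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def dns_resolves(host: str) -> bool:
    """Return True if the host name resolves; needs network access."""
    try:
        socket.gethostbyname(host)
        return True
    except (OSError, UnicodeError):
        return False


def find_misspelled_urls(text: str,
                         resolves: Callable[[str], bool] = dns_resolves) -> List[str]:
    """Mark a URL as misspelled when its domain name does not resolve."""
    misspelled = []
    for url in URL_PATTERN.findall(text):
        host = urlparse(url).hostname
        if host is None or not resolves(host):
            misspelled.append(url)
    return misspelled


# Offline usage with a stand-in resolver instead of a real DNS lookup.
known_hosts = {"example.com"}
sample = "See http://example.com/page and http://exmaple.com/page for details"
print(find_misspelled_urls(sample, resolves=lambda h: h in known_hosts))
```

Passing the resolver in keeps the check testable without a network connection; the default performs a real lookup.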

System and method for generating personalized web pages
Inventors: Alexander W. Holt and Michael E. Moran
US Patent Application 20060112079
Published May 25, 2006
Filed: November 23, 2004


A web page personalization system and method. The system comprises: a web application server for serving a web page that includes personalized search results for a user requesting the web page; a content repository for storing content for the web page; a profiling system for dynamically providing profile attributes of the user when the web page is requested; and a search engine for generating the personalized search results using a query that is based on a selected set of the provided profile attributes.


A bid management tool based upon XML.

Use of extensible markup language in a system and method for influencing a position on a search result list generated by a computer network search engine (Overture)
Inventors: Stephan Cunningham, Anthony Molinaro, Frank Maritato, Jr., Peng Zhao, and Nick Conrad
US Patent 7,054,857
Granted May 30, 2006
Filed: May 8, 2002


A database search apparatus and method for generating a search result list which responds to Extensible Markup Language (XML) requests from a client to a server of an on-line marketplace. A bid management tool is operable on a client computer to manage search listings and account information of one or more advertisers. The client application communicates with the server via an XML-based application program interface. The bid management tool provides functions for reporting account activity, modifying accounts and manual, timed or event-driven changes to search listings including listings of several advertisers.

The process described in the next patent application uses ActiveX controls to let users set up their browser interface so that dynamic information from different sites can be added to it and kept up to date, regardless of whether they remain on those sites or go elsewhere.

Method of controlling an Internet browser interface and a controllable browser interface
Inventors: Thomas J. Shafron
US Patent Application 20060112102
Published May 25, 2006
Filed: February 2, 2006

See also, patent application number: 20060112341


The present invention is directed to a method of dynamically controlling and displaying an Internet browser interface, and to a dynamically controllable Internet browser interface. In accordance with the present invention, a browser interface may be customized using a controlling software program that may be provided by an Internet content provider, an ISP, or that may reside on an Internet user’s computer. The controlling software program enables the Internet user, the content provider, or the ISP to customize and control the information and/or functionality of a user’s browser and browser interface.


A hybrid use of search and email enables users to search email, save queries with their email, and have automated queries performed and stored with their email on a periodic basis of their choice.

Storing searches in an e-mail folder
Inventors: Imran I. Qureshi
US Patent Application 20060112081
Published May 25, 2006
Filed: November 23, 2004


A method for saving search query information on a server coupled to the Internet as a search folder. The method may include the steps of: identifying a user communicating with the server; storing the search query associated with the user in a data store on the server responsive to a user instruction to store the search query; and submitting the query to an Internet search engine for execution based on a triggering event. A data structure for storing the search folder is also described.

Method and system for determining similarity of items based on similarity objects and their features
Inventors: Benyu Zhang, Hua-Jun Zeng, Wei-Ying Ma, Zheng Chen, Ning Liu, and Jun Yan
US Patent Application 20060112068
Published May 25, 2006
Filed: November 23, 2004


A method and system for determining similarity between items is provided. To calculate similarity scores for pairs of items, the similarity system initializes a similarity score for each pair of objects and each pair of features. The similarity system then iteratively calculates the similarity scores for each pair of objects based on the similarity scores of the pairs of features calculated during a previous iteration and calculates the similarity scores for each pair of features based on the similarity scores of the pairs of objects calculated during a previous iteration. The similarity system implements an algorithm that is based on a recursive definition of the similarities between objects and between features. The similarity system continues the iterations of recalculating the similarity scores until the similarity scores converge on a solution.
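The mutual recursion in that abstract is easier to follow in code. The sketch below is not the formula from the filing: it assumes each object is described by a set of features, averages the other side's scores over all pairs, and applies a `decay` factor so the iteration settles. The function name `co_similarity` and all parameters are hypothetical.

```python
from itertools import product
from typing import Dict, Set, Tuple

Scores = Dict[Tuple[str, str], float]


def co_similarity(features_of: Dict[str, Set[str]], decay: float = 0.8,
                  tol: float = 1e-6, max_iter: int = 100) -> Tuple[Scores, Scores]:
    """Iterate object and feature similarities until they stop changing."""
    objects = sorted(features_of)
    features = sorted({f for fs in features_of.values() for f in fs})
    objects_of = {f: {o for o in objects if f in features_of[o]} for f in features}

    # Initialization: every item is fully similar to itself and to nothing else.
    obj_sim = {(a, b): float(a == b) for a, b in product(objects, repeat=2)}
    feat_sim = {(f, g): float(f == g) for f, g in product(features, repeat=2)}

    def update(items, members, other_sim):
        # Score each pair from the other side's scores of the previous iteration.
        new = {}
        for a, b in product(items, repeat=2):
            if a == b:
                new[(a, b)] = 1.0
                continue
            pairs = [(x, y) for x in members[a] for y in members[b]]
            avg = sum(other_sim[p] for p in pairs) / len(pairs) if pairs else 0.0
            new[(a, b)] = decay * avg
        return new

    for _ in range(max_iter):
        new_obj = update(objects, features_of, feat_sim)
        new_feat = update(features, objects_of, obj_sim)
        change = max(
            max((abs(new_obj[k] - obj_sim[k]) for k in obj_sim), default=0.0),
            max((abs(new_feat[k] - feat_sim[k]) for k in feat_sim), default=0.0),
        )
        obj_sim, feat_sim = new_obj, new_feat
        if change < tol:
            break
    return obj_sim, feat_sim


docs = {"d1": {"apple", "fruit"}, "d2": {"fruit", "pear"}, "d3": {"car"}}
obj_sim, feat_sim = co_similarity(docs)
print(obj_sim[("d1", "d2")] > obj_sim[("d1", "d3")])
```

In this toy data "apple" and "pear" never appear in the same document, yet they pick up a nonzero similarity because the documents that contain them share the feature "fruit".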


Internet search environment number system
Inventors: Matthew S. Theobald and Paul Thompson
US Patent Application 20060116992
Published June 1, 2006
Filed: November 28, 2005


The present invention discloses an Internet search environment number (“ISEN”) system that provides researchers with a tool to locate and search relevant, evaluated online databases. The ISEN system is a portal that comprehensively catalogs the Internet’s databases thereby making information located on the Internet readily accessible from both visible and invisible database resources. The ISEN system facilitates access and adds value by creating more effective and efficient Internet search experiences. The ISEN system takes advantage of a persistent locator for database resources to guarantee users will always be able to locate desired resources no matter where they move or if the content changes.

Zoom Information, Inc.

Method for maintaining people and organization information
Inventors: Jonathan Stern, Jeremy W. Rothman-Shore, Kosmas Karadimitriou, and Michel Decary
US Patent 7,054,886
Granted May 30, 2006
Filed: July 27, 2001


A database is formed and maintained by computer-automated means extracting information from a global computer network. The database contains information about people and organizations. The present invention method provides continual updates to the information stored in the database by the people named in the database and by the automated means. Integrity of the automatically extracted information is maintained. A link from the invention database to a third party data system provides updates in the information in the database to be communicated to the third party data system for updating and maintaining data of the third party data system. The database may serve as an email communication clearinghouse where senders do not need to know the email address of a person named in the database but rather leave messages through that person’s record in the database. Targeted advertising to a named person is provided during his accessing the database. The targeted advertising is based on information about the named person as stored in the database. The database may be queried by any combination of person name, job title, organization name and field of business of the organization. The invention method provides a clipping service which monitors changes in the information stored in the database and notifies interested parties of detected changes.


Bing Swen teaches at Peking University, and has been involved in SIGHAN, a Special Interest Group of the Association for Computational Linguistics, and the last Asian Information Retrieval Symposium.

Method for search result clustering
Inventors: Bing Swen
US Patent Application 20060117002
Published June 1, 2006
Filed: November 1, 2005


Methods and systems are presented to predetermine and record the classes of each indexed document with respect to each of its index keywords, and to provide high quality and relevant classification of the document when it is searched with said keyword. Document classes, recorded in advance, are used as the clustering information of each document in the search results to realize efficient, large-scale and high quality search result clustering. One embodiment provides a method for search result clustering, which includes recording the classes of each indexed document when the document is searched with each of its index keywords. This method further includes grouping the search results according to the classes of each result document with respect to the keyword or keywords contained in the search query. By prerecording the classes of each document with respect to each index keyword, the classes of each document in the search results in response to a search query can be directly determined via the keywords included in the search query. Each result document is put into each of its classes associated with each of the search keywords, and the union of all the classes of the result documents is used to construct the final document clusters for the search results. The clusters are ranked according to the ranks of documents included in each cluster and the weights of the clustered documents in the corresponding cluster. The clustered search results are presented to the user in such a way that clusters with higher ranks, and documents with higher ranks in each cluster are preferentially presented. Each cluster can be displayed and navigated in an independent framed subarea of the output window.
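Here is a minimal sketch of the grouping step, assuming the per-keyword classes were recorded at index time in a dictionary keyed by (document, keyword). The cluster ranking uses a sum of reciprocal ranks, which is my own stand-in for the weighting the abstract mentions; `cluster_results` and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cluster_results(query_terms: List[str], ranked_docs: List[str],
                    classes: Dict[Tuple[str, str], List[str]]
                    ) -> List[Tuple[str, List[str]]]:
    """Group ranked results by classes recorded in advance for each keyword."""
    clusters = defaultdict(list)
    for doc in ranked_docs:  # iterating in rank order keeps each cluster ranked
        for term in query_terms:
            for label in classes.get((doc, term), []):
                if doc not in clusters[label]:
                    clusters[label].append(doc)

    # Assumed cluster score: sum of reciprocal ranks of the member documents.
    position = {doc: i + 1 for i, doc in enumerate(ranked_docs)}

    def score(label: str) -> float:
        return sum(1.0 / position[d] for d in clusters[label])

    return sorted(clusters.items(), key=lambda kv: (-score(kv[0]), kv[0]))


# Classes recorded per (document, index keyword) before any query arrives.
classes = {
    ("d1", "jaguar"): ["animal"],
    ("d2", "jaguar"): ["car"],
    ("d3", "jaguar"): ["animal"],
}
print(cluster_results(["jaguar"], ["d1", "d2", "d3"], classes))
```

Because the classes are looked up rather than computed at query time, the clustering cost is a dictionary lookup per result and keyword, which is the efficiency argument the abstract makes.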

The inventors listed in the following patent application are Amazon employees, though the document hasn’t been assigned to Amazon, and examples within the document include searches for books within an online bookstore. When a search query is misspelled, rather than returning no results to a searcher, the process here may try to find a suitable replacement. So, for instance, a search for “The De Vinci Code” shows results that begin with “The Da Vinci Code.”

Search query processing to identify related search terms and to correct misspellings of search terms
Inventors: Ruben Ernesto Ortega and Dwayne Edward Bowman
US Patent Application 20060117003
Published June 1, 2006
Filed: January 6, 2006


A search engine process predicts the correct spellings of search terms within multiple-term search queries. In one embodiment, when a user submits a multiple-term search query that includes a non-matching term and at least one matching term, a table is accessed to look up a set of terms that are “related” to the matching term or terms. A spelling comparison function is then used to determine whether any of these related terms is sufficiently similar in spelling to the non-matching term to be deemed a candidate correctly-spelled replacement. A candidate replacement term may automatically be substituted for the non-matching term, or may be suggested to the user as a replacement. The invention also includes a process for identifying terms that are related to each other based on the relatively high frequencies with which they co-occur in search queries of users, database records, and/or specific database fields.
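The two pieces of that abstract, a related-terms table built from co-occurrence and a spelling comparison against related terms, can be sketched as follows. The filing does not name a comparison function in its abstract, so this uses `difflib.SequenceMatcher` from the Python standard library as a stand-in; `build_related`, `correct_query`, `min_count`, and `threshold` are hypothetical names and values.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher
from itertools import permutations
from typing import Dict, List, Set


def build_related(query_log: List[List[str]], min_count: int = 2) -> Dict[str, Set[str]]:
    """Treat terms as related when they co-occur in at least min_count queries."""
    counts = Counter()
    for query in query_log:
        for a, b in permutations(set(query), 2):
            counts[(a, b)] += 1
    related = defaultdict(set)
    for (a, b), n in counts.items():
        if n >= min_count:
            related[a].add(b)
    return related


def correct_query(terms: List[str], known_terms: Set[str],
                  related: Dict[str, Set[str]], threshold: float = 0.7) -> List[str]:
    """Replace non-matching terms with a similarly spelled related term."""
    matching = [t for t in terms if t in known_terms]
    candidates = sorted({r for t in matching for r in related.get(t, ())})
    corrected = []
    for term in terms:
        if term in known_terms or not candidates:
            corrected.append(term)
            continue
        # Stand-in spelling comparison: similarity ratio between 0 and 1.
        best = max(candidates, key=lambda c: SequenceMatcher(None, term, c).ratio())
        ratio = SequenceMatcher(None, term, best).ratio()
        corrected.append(best if ratio >= threshold else term)
    return corrected


log = [["da", "vinci", "code"], ["da", "vinci"], ["vinci", "code"]]
related = build_related(log)
known = {t for q in log for t in q}
print(correct_query(["da", "vinchi", "code"], known, related))
```

Restricting candidates to terms related to the correctly spelled words keeps the comparison set small, which is why the approach needs at least one matching term in the query.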

My usual reminder about patents: Some of the processes and technology described in patents are created in house, and some are developed with the assistance of contractors and partners. A percentage are never developed in a tangible manner, but may serve as a way to attempt to exclude others from using the technology, or even to possibly mislead competitors into exploring an area that they might not otherwise have an interest in (sometimes skepticism is good).

There are times when a Google or Yahoo acquires a company to gain access to the intellectual property of that company, or the intellectual prowess and expertise of that company’s employees. And sometimes patents are just purchased.

Want to comment or discuss? Visit our Search Technology & Relevancy area of the Search Engine Watch Forums.
