New Search Patent Applications: June 27, 2006 – Searching Amongst Malicious Web Sites

Microsoft’s patent applications from the end of last week include ways for search engines to scan for malicious web sites, to cluster queries for more relevant searches, and to extract feature and formatting information from pages. IBM introduces a new query-dependent page ranking algorithm, and a way to preload the URLs of a site into your history file before you’ve ever visited. Xerox searches for more meaningful snippets, Alcatel takes the PC out of search and replaces it with the TV, and British Telecommunications describes a way to make user profiles more helpful in returning search results.


This patent filing looks at user logs of web queries, along with the user feedback associated with those queries (such as the documents users selected from the results), in an attempt to cluster the queries and serve more relevant results in response to them.

Clustering Web Queries
Invented by Ji-Rong Wen, Jian-Yun Nie, Ming-Jing Li, and Hong-Jiang Zhang
Assigned to Microsoft
US Patent Application 20060136455
Published on June 22, 2006
Filed on February 23, 2006


Systems and methods for clustering Web queries are described. In one aspect, one or more of a same document and a plurality of similar documents selected by a user in response to a plurality of queries is identified. Responsive to this identification, a query cluster is generated. The query cluster indicates that the queries are similar independent of whether individual ones of the queries comprise similar composition with respect to other ones of the queries.
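To make the idea concrete, here is a minimal Python sketch of click-based query clustering. It assumes the user log is available as (query, clicked document) pairs and covers only the "same document" case from the abstract: two queries join a cluster whenever users selected the same document for both, and clusters merge transitively through a union-find structure. The function name cluster_queries is hypothetical; the filing does not describe its data structures, and its use of "similar documents" is not modeled here.

```python
from collections import defaultdict


def cluster_queries(click_log):
    """Group queries that led users to the same clicked document.

    click_log: iterable of (query, clicked_doc_id) pairs (assumed format).
    Returns a list of sets of queries; links are followed transitively.
    """
    parent = {}

    def find(q):
        # Path-halving find for the union-find structure.
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_query_for_doc = {}
    for query, doc in click_log:
        parent.setdefault(query, query)
        if doc in first_query_for_doc:
            # Two queries that led to the same document join one cluster,
            # whether or not they share any words.
            union(first_query_for_doc[doc], query)
        else:
            first_query_for_doc[doc] = query

    clusters = defaultdict(set)
    for q in parent:
        clusters[find(q)].add(q)
    return list(clusters.values())
```

With a log in which "atomic bomb" and "manhattan project" led to one document while "manhattan project" and "nuclear weapon" led to another, all three queries end up in one cluster even though "atomic bomb" and "nuclear weapon" share no words.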

In this next filing, Microsoft looks at how data can be extracted from a web page and parsed into information about the page’s content, along with its formatting, frequency of appearance, associated metadata, titles, and more. Statistics can then be used to help gauge how relevant the extracted information is to a query.

Ranking search results using feature extraction
Invented by Dmitriy Meyerzon and Hang Li
Assigned to Microsoft
US Patent Application 20060136411
Published on June 22, 2006
Filed on December 21, 2004


Methods and computer-readable media are provided for ranking search results using feature extraction data. Each of the results of a search engine query is parsed to obtain data, such as text, formatting information, metadata, and the like. The text, the formatting information and the metadata are passed through a feature extraction application to extract data that may be used to improve a ranking of the search results based on relevance of the search results to the search engine query. The feature extraction application extracts features, such as titles, found in any of the text based on formatting information applied to or associated with the text. The extracted titles, the text, the formatting information and the metadata for any given search results item are processed according to a field weighting application for determining a ranking of the given search results item. Ranked search results items may then be displayed according to ranking.
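The two steps in that abstract, pulling a title out of formatting information and then weighting fields, can be sketched in Python. This is only an illustration under stated assumptions: the TextRun and ParsedResult types, the "largest font near the top is the title" rule, the saturating term-frequency formula, and the FIELD_WEIGHTS values are all hypothetical, since the filing does not publish its heuristics or weights.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical weights: the patent does not publish its field weights.
FIELD_WEIGHTS = {"title": 3.0, "metadata": 1.5, "body": 1.0}


@dataclass
class TextRun:
    """A run of text plus the formatting the parser attached to it."""
    text: str
    font_size: float
    bold: bool = False


@dataclass
class ParsedResult:
    url: str
    runs: List[TextRun]
    metadata: str = ""
    fields: Dict[str, str] = field(default_factory=dict)


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_title(runs, lookahead=5):
    """Feature extraction: guess the title from formatting alone.

    Picks the largest (then bold) non-empty run among the first few runs.
    """
    candidates = [r for r in runs[:lookahead] if r.text.strip()]
    if not candidates:
        return ""
    return max(candidates, key=lambda r: (r.font_size, r.bold)).text.strip()


def extract_fields(result):
    # The body keeps all text, so title words also count as body words.
    result.fields = {
        "title": extract_title(result.runs),
        "body": " ".join(r.text for r in result.runs),
        "metadata": result.metadata,
    }
    return result


def field_weighted_score(result, query, weights=FIELD_WEIGHTS, k=1.2):
    terms = tokenize(query)
    score = 0.0
    for name, text in result.fields.items():
        tokens = tokenize(text)
        weight = weights.get(name, 1.0)
        for term in terms:
            tf = tokens.count(term)
            # Saturating tf: repeating a word in one field has diminishing returns.
            score += weight * tf / (tf + k)
    return score


def rank(results, query):
    parsed = [extract_fields(r) for r in results]
    return sorted(parsed, key=lambda r: field_weighted_score(r, query), reverse=True)
```

A page whose large-font heading contains the query terms outranks a page that only repeats those terms in body text, which is the kind of effect field weighting is meant to produce.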

The following patent application looks at ways of detecting malicious content on pages, during a crawl of the web, and in real time as a query is performed. I was reminded of Scandoo and a recent paper from Ben Edelman and SiteAdvisor, The Safety of Internet Search Engines, when reading it.

System and method for utilizing a search engine to prevent contamination
Invented by Art Shelest and Eytan D. Seidman
Assigned to Microsoft
US Patent Application 20060136374
Published on June 22, 2006
Filed on December 17, 2004


A system and method are incorporated within a search engine for preventing proliferation of malicious searchable content. The system includes a detection mechanism for detecting malicious searchable content within searchable content traversed by a web crawler. The system additionally includes a presentation mechanism for handling the detected malicious searchable content upon determination that the malicious searchable content is included in search results provided by the search engine. The presentation mechanism handles the detected malicious searchable content in order to prevent proliferation of the malicious searchable content to a receiver of the search results.
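The split between a detection mechanism at crawl time and a presentation mechanism at query time can be sketched as below. The detector is a pluggable function because the abstract does not say how malicious content is recognized; the class name ContaminationFilter, the optional fetch hook for real-time checks of uncrawled pages, and the "warn" and "remove" modes are hypothetical choices for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional

# Hypothetical detector signature: (url, page_content) -> True if malicious.
Detector = Callable[[str, str], bool]


@dataclass(frozen=True)
class SearchResult:
    url: str
    title: str
    warning: str = ""


class ContaminationFilter:
    def __init__(self, detector: Detector,
                 fetch: Optional[Callable[[str], str]] = None):
        self.detector = detector
        self.fetch = fetch  # optional, for real-time checks at query time
        self.verdicts: Dict[str, bool] = {}

    def on_crawl(self, url: str, content: str) -> None:
        # Detection mechanism: evaluate each page as the crawler traverses it.
        self.verdicts[url] = self.detector(url, content)

    def is_malicious(self, url: str) -> bool:
        if url not in self.verdicts and self.fetch is not None:
            # Page was never crawled: check it in real time before showing it.
            self.verdicts[url] = self.detector(url, self.fetch(url))
        return self.verdicts.get(url, False)

    def present(self, results: List[SearchResult],
                mode: str = "warn") -> List[SearchResult]:
        # Presentation mechanism: handle flagged items before they reach the user.
        shown = []
        for r in results:
            if not self.is_malicious(r.url):
                shown.append(r)
            elif mode == "warn":
                shown.append(replace(r, warning="Flagged as potentially malicious"))
            # mode == "remove": the flagged result is dropped entirely
        return shown
```

Whether flagged results are labeled or removed is a policy choice; the abstract only says the presentation mechanism handles them so they do not spread to the person receiving the results.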


When you return to a site you have visited before, your browser’s address bar will often show a dropdown of pages on that site that you’ve already seen. This can help you get back to a page you were looking for. It might be helpful if this kind of feature were available on sites that you haven’t visited before. Imagine arriving at a site you haven’t seen previously and being able to download a list of the URLs on the site.

The method in this filing uses a plugin to help a visitor find pages on a site by offering autocompletion and a dropdown selection of URLs for pages on the site. URLs from the site could be added to the browser’s history file, its auto-complete file, and a site-map file.

Method and system for advanced downloading of URLS for WEB navigation
Invented by Derek Kwan
Assigned to IBM
US Patent Application 20060136453
Published on June 22, 2006
Filed on September 8, 2005


A method, computer program product, and system for providing advanced downloading of Uniform Resource Locators (URLs) for a WEB browser running on a computer. The system is capable of providing a WEB browser with Uniform Resource Locators (URLs). The system comprises a client computer and a server. The client computer includes the WEB browser for use by a user and includes a URL component. The server provides WEB data to the client computer. The server includes a URL downloader, which is responsive to the URL component for downloading the URLs to the client computer.
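A rough Python sketch of the two halves: a server-side step that lists a site’s URLs, and a client-side URL component that merges them into history and answers autocomplete lookups. It assumes the server reads a sitemaps.org-style XML file, which the filing does not specify; urls_from_sitemap and UrlHistory are hypothetical names, and a real browser plugin would write into the browser’s own history and auto-complete stores instead.

```python
import bisect
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Server side (assumed): list a site's URLs from a sitemaps.org-style file."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


class UrlHistory:
    """Client side: a sorted URL store that backs address-bar autocomplete."""

    def __init__(self):
        self._urls = []

    def add(self, url):
        i = bisect.bisect_left(self._urls, url)
        if i == len(self._urls) or self._urls[i] != url:
            self._urls.insert(i, url)

    def preload(self, site_urls):
        # Merge URLs for a site the user has not visited yet.
        for url in site_urls:
            self.add(url)

    def complete(self, prefix, limit=10):
        # In a sorted list, every URL sharing the prefix is contiguous.
        i = bisect.bisect_left(self._urls, prefix)
        matches = []
        while (i < len(self._urls) and self._urls[i].startswith(prefix)
               and len(matches) < limit):
            matches.append(self._urls[i])
            i += 1
        return matches
```

Keeping the list sorted means each autocomplete lookup is a binary search followed by a short scan, which stays fast even after a large site’s URL list has been preloaded.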

This filing describes a method for ranking pages in relation to a query; unlike PageRank, it is query-dependent and ranks pages associated with specific queries.

Dynamically ranking nodes and labels in a hyperlinked database
Invented by Krishna Prasad Chitrapura and Srinivas Raaghav Kashyap
Assigned to IBM
US Patent Application 20060136098
Published on June 22, 2006
Filed on December 17, 2004


The World Wide Web (WWW) can be modelled as a labelled directed graph G(V,E,L), in which V is the set of nodes, E is the set of edges, and L is a label function that maps edges to labels. This model, when applied to the WWW, indicates that V is a set of hypertext documents or objects, E is a set of hyperlinks connecting the documents in V, and the edge-label function represents the anchor-text corresponding to the hyperlinks. One can find a probabilistic ranking of the nodes for any given label, a ranking of the labels for any given node, and rankings of labels and pages using flow based models. Further, the flows can be computed using sparse matrix operations.
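One way to get a label-specific ranking of nodes from G(V,E,L) is a random walk that restarts at pages receiving links with that anchor text, computed by power iteration over a sparse adjacency list. This is a stand-in chosen for illustration, not the flow model from the filing: the restart rule, the damping value, and the function name label_rank are assumptions, and it covers only the first of the three rankings the abstract mentions.

```python
from collections import defaultdict


def label_rank(edges, label, damping=0.85, iters=50):
    """Probabilistic ranking of nodes for one label in G(V, E, L).

    edges: (source, target, label) triples; the label is the anchor text.
    Returns {node: probability}; empty if no edge carries the label.
    """
    nodes = {s for s, _, _ in edges} | {t for _, t, _ in edges}
    out_links = defaultdict(list)
    for s, t, _ in edges:
        out_links[s].append(t)

    # Assumption: the walk restarts at pages that receive a link with this label.
    seeds = [t for _, t, l in edges if l == label]
    if not seeds:
        return {}
    teleport = defaultdict(float)
    for t in seeds:
        teleport[t] += 1.0 / len(seeds)

    rank = dict(teleport)
    for _ in range(iters):
        nxt = defaultdict(float)
        dangling = 0.0
        # Sparse matrix-vector product: only nonzero entries are visited.
        for u, r in rank.items():
            targets = out_links.get(u)
            if targets:
                share = damping * r / len(targets)
                for v in targets:
                    nxt[v] += share
            else:
                dangling += damping * r
        # Undamped mass plus mass stuck at dangling nodes returns to the seeds.
        for v, p in teleport.items():
            nxt[v] += (1.0 - damping + dangling) * p
        rank = nxt
    total = sum(rank.values())
    return {v: rank.get(v, 0.0) / total for v in nodes}
```

Because the restart set depends on the label, the same link graph yields a different ranking for each anchor text, which is what makes the ranking query-dependent rather than global.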


Snippets shown in search results could be more reflective of the intent of a searcher, and help a searcher locate a document that best matches what they are looking for, instead of just displaying text that contains the keywords searched for. That’s the focus of the next patent filing.

Systems and methods for using and constructing user-interest sensitive indicators of search results
Invented by Daniel G. Bobrow and Ronald M. Kaplan
Assigned to Xerox
US Patent Application 20060136385
Published on June 22, 2006
Filed on December 21, 2004


Techniques are provided to construct and use user-interest sensitive indicators of search results. A set of documents is determined based on one or more search terms. Passages within each selected document are identified based on the search terms. Condensation transformations are applied to the passages to preferentially retain elements of the passage based on the search terms and user interest information. The resultant indicator provides a user-interest sensitive signal of the meaning of the passage.
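A simplified Python sketch of a user-interest sensitive snippet: sentences in a passage are scored by search-term hits plus a smaller bonus for terms from a user-interest profile, and the best ones are kept in document order. The filing’s condensation transformations may well work below the sentence level (dropping phrases rather than whole sentences); sentence selection, the interest_weight value, and the function name condense are assumptions made to keep the example short.

```python
import re


def condense(passage, search_terms, interest_terms, max_sentences=2,
             interest_weight=0.5):
    """Build a short, user-interest sensitive indicator from one passage.

    Each sentence is scored by how many search terms it contains, plus a
    smaller bonus for terms from the user's interest profile. The best
    sentences are kept and returned in their original order.
    """
    sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
    search = {t.lower() for t in search_terms}
    interest = {t.lower() for t in interest_terms}
    scored = []
    for position, sentence in enumerate(sentences):
        words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        score = len(words & search) + interest_weight * len(words & interest)
        if score > 0:
            scored.append((score, position, sentence))
    # Highest score first; earlier sentences win ties.
    best = sorted(scored, key=lambda item: (-item[0], item[1]))[:max_sentences]
    # Restore document order so the indicator reads naturally.
    best.sort(key=lambda item: item[1])
    return " ... ".join(sentence for _, _, sentence in best)
```

For a passage that mentions both the carmaker and the animal, a searcher for "jaguar" whose profile leans toward wildlife gets the sentence about the cat, while one interested in cars gets the sentence about the carmaker, even though both sentences contain the keyword.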


Alcatel describes how to search for web content to display on a television through a set-top box, without using a computer.

Method and system enabling Web content searching from a remote set-top control interface or device
Invented by Prasad Golla
Assigned to Alcatel
US Patent Application 20060136383
Published on June 22, 2006
Filed on December 20, 2004


A system for conducting a data search operation for content stored at nodes on a network includes a menu interface for enabling an interaction sequence of content category selection and definition-narrowing of those categories selected, a server application for interpreting the interaction sequence and for formulating a search query based on the interpretation, and a session application for submitting the query to a third party node, and for receiving and filtering results returned, the results forwarded to the menu interface for subsequent display and interaction. The network may combine wireless and land-based telephone, Internet, cable and satellite television.
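The interaction sequence in that abstract (pick a category, narrow it, let a server turn the selections into a query, then filter what comes back) can be sketched in Python. Everything concrete here is hypothetical: the MENU tree, the MenuSession class, the keyword-joining rule in build_query, and the set of content types a set-top box is assumed to render. The filing does not say how the query is formulated or which results are filtered out.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical menu tree: category -> facet -> allowed values.
MENU: Dict[str, Dict[str, List[str]]] = {
    "Movies": {"Genre": ["Comedy", "Drama"], "Decade": ["1980s", "1990s"]},
    "Weather": {"Region": ["North", "South"]},
}


@dataclass
class MenuSession:
    """Records the remote-control interaction sequence for one search."""
    category: str = ""
    refinements: List[str] = field(default_factory=list)

    def select_category(self, name: str) -> None:
        if name not in MENU:
            raise ValueError(f"unknown category: {name}")
        self.category = name
        self.refinements.clear()

    def narrow(self, facet: str, value: str) -> None:
        options = MENU.get(self.category, {}).get(facet, [])
        if value not in options:
            raise ValueError(f"{value!r} is not an option under {facet!r}")
        self.refinements.append(value)


def build_query(session: MenuSession) -> str:
    # Server application: interpret the sequence as a plain keyword query.
    if not session.category:
        raise ValueError("no category selected")
    terms = [session.category] + session.refinements
    return " ".join(t.lower() for t in terms)


def filter_for_tv(results, playable=("video", "image", "text")):
    # Session application: keep only results the set-top box can render.
    return [r for r in results if r.get("type") in playable]
```

Restricting the viewer to menu choices means every query the server builds is well formed, which suits a remote control with no keyboard.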

This last patent filing, from British Telecommunications, looks toward the use of user profiles to help make searches more relevant.

Searching apparatus and methods
Invented by Gary M. Ducatel and Behnam Azvine
Assigned to British Telecommunications
US Patent Application 20060136405
Published on June 22, 2006
Filed on January 23, 2004


An apparatus and method are provided for improving database searching, the method comprising the steps of: receiving a search query comprising one or more search keywords from a user; accessing a user profile means arranged to provide data indicative of relatedness criteria between keywords from a set of documents, and identifying from said user profile means, for the or each search keyword, potentially-related keywords according to predetermined criteria; providing said potentially-related keywords to the user; receiving information from the user confirming that any potentially-related keywords are considered to be related keywords; in the event that any potentially-related keywords are confirmed by the user to be related keywords, incorporating such potentially-related keywords as keywords in an improved search query; and submitting the improved search query to a search engine. Also provided are an apparatus and method for creating and maintaining user profiles for use in the above searching apparatus and method.
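The claim’s loop (look up potentially-related keywords in a profile, show them to the user, keep the ones confirmed, resubmit an improved query) can be sketched in Python. The relatedness criterion here is an assumption: the share of the keyword’s documents that also contain the other term, with a hypothetical threshold and stopword list. UserProfile and improve_query are illustrative names, and the confirm callback stands in for whatever interface actually asks the user.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


class UserProfile:
    """Keyword co-occurrence statistics built from documents the user cares about."""

    def __init__(self):
        self.doc_freq = Counter()
        self.pair_freq = Counter()

    def add_document(self, text):
        terms = {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}
        self.doc_freq.update(terms)
        self.pair_freq.update(frozenset(p) for p in combinations(sorted(terms), 2))

    def related(self, keyword, min_score=0.5, limit=5):
        # Assumed relatedness criterion: share of the keyword's documents
        # that also contain the other term.
        k = keyword.lower()
        if self.doc_freq[k] == 0:
            return []
        scores = {}
        for pair, n in self.pair_freq.items():
            if k in pair:
                (other,) = pair - {k}
                scores[other] = n / self.doc_freq[k]
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [term for term, s in ranked if s >= min_score][:limit]


def improve_query(query, profile, confirm):
    """Suggest related keywords, keep those the user confirms, rebuild the query.

    confirm(keyword, suggestions) stands in for the user interface and
    returns the suggestions the user accepted.
    """
    keywords = query.lower().split()
    added = []
    for kw in keywords:
        suggestions = profile.related(kw)
        for term in confirm(kw, suggestions):
            if term in suggestions and term not in keywords and term not in added:
                added.append(term)
    return " ".join(keywords + added)
```

The confirmation step matters: the profile only proposes terms, and nothing is added to the query unless the user accepts it, which matches the claim’s requirement that related keywords be confirmed before the improved query is submitted.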

My usual reminder about patents: Some of the processes and technology described in patents are created in house, and some are developed with the assistance of contractors and partners. A percentage are never developed in a tangible manner, but may serve as a way to try to exclude others from using the technology, or even to mislead competitors into exploring an area that they might not have an interest in (sometimes skepticism is good).

There are times when a Google or Yahoo acquires a company to gain access to the intellectual property of that company, or the intellectual prowess and expertise of that company’s employees. And sometimes patents are just purchased.

Want to comment or discuss? Visit our Search Technology & Relevancy area of the Search Engine Watch Forums.
