Calais offers a free web service called OpenCalais, which uses natural language processing (NLP) to analyze a document and automatically find entities within it. The first phase: This is normally marked by experimentation and discourse as...
Different users having different understandings of the relevance of a document (result); user document behavior (time on page/site, scrolling behavior). For Google, the search quality team has to balance performance (and speed of delivery) with the...
Our software has to reliably reprocess billions of web documents every day, ensuring that every document gets crawled, joined with the appropriate datasets and then indexed with the correct features. Large-scale batch processing using Dryad with a...
Your landing page optimization should be based on a formal written test plan document that defines the specific elements and values to be tested. Processing any new Web-based forms. Today, we'll look at the final three roles: the programmer, system...
If a financial services corporation values content in .pdf form more than content in a word-processing document, administrators can use Source Biasing to increase the weight of .pdfs in the search results.
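The effect described above, raising the weight of one file type in the result list, can be illustrated with a small re-scoring step. This is a minimal sketch under stated assumptions: it is not the Source Biasing configuration interface, the `TYPE_WEIGHTS` values are hypothetical, and the file type is guessed naively from the URL's extension.

```python
from dataclasses import dataclass

# Hypothetical per-type multipliers (an assumption for illustration only;
# the real Source Biasing feature is configured in the search product).
TYPE_WEIGHTS = {"pdf": 1.5, "doc": 1.0}


@dataclass
class Result:
    url: str
    score: float  # base relevance score from the search engine


def file_type(url: str) -> str:
    """Naively return the lowercase extension of the URL's last path segment."""
    name = url.rsplit("/", 1)[-1]
    return name.rsplit(".", 1)[-1].lower() if "." in name else ""


def bias_results(results, weights=TYPE_WEIGHTS, default=1.0):
    """Multiply each score by its file-type weight, then re-sort best first."""
    biased = [
        Result(r.url, r.score * weights.get(file_type(r.url), default))
        for r in results
    ]
    return sorted(biased, key=lambda r: r.score, reverse=True)
```

With these weights, a .pdf scoring 0.6 (biased to about 0.9) overtakes a .doc scoring 0.8.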
The content may be obtained in response to viewing, editing, printing, emailing or other accessing and/or processing of the document. This next document looks instead at making it easier for advertisers to find people willing to help them with...
Detecting so-called "navigation bars" (or "nav bars") in a (Web) document by determining whether or not nodes of a parse tree of the (Web) document are "anchor-heavy". The method comprises obtaining a plurality of documents, and determining a rank...
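One way to read "anchor-heavy" is as the share of a node's text that sits inside `<a>` elements. The sketch below assumes that definition, along with a hypothetical `Node` type and made-up `threshold` and `min_chars` values; the excerpt does not specify how the patent measures it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    text: str = ""  # text directly inside this node (not its children)
    children: list = field(default_factory=list)


def text_counts(node, inside_anchor=False):
    """Return (total_chars, anchor_chars) for the subtree rooted at node."""
    in_a = inside_anchor or node.tag == "a"
    total = len(node.text.strip())
    anchor = total if in_a else 0
    for child in node.children:
        t, a = text_counts(child, in_a)
        total += t
        anchor += a
    return total, anchor


def is_anchor_heavy(node, threshold=0.7, min_chars=10):
    """Assumed rule: enough text, and most of it is link text."""
    total, anchor = text_counts(node)
    return total >= min_chars and anchor / total >= threshold


def find_nav_bars(root, threshold=0.7):
    """Collect the outermost anchor-heavy nodes (descendants are not revisited)."""
    found = []

    def walk(n):
        if n.tag != "a" and is_anchor_heavy(n, threshold):
            found.append(n)
            return
        for c in n.children:
            walk(c)

    walk(root)
    return found
```

A `<ul>` made entirely of links gets flagged. A `<body>` that also holds a long paragraph does not, because its link-text ratio falls below the threshold.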
A method includes finding content-rich text in a document by identifying areas of narrative in the document. The detector detects linguistic parameters which characterize narrative text in an input document and the content-rich text indicator...
A method of integrating a new document into a corpus of documents is also provided. This next document looks at filtering search results based upon assigned categories for each of the results, comparisons of quality between them, and the use of...
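The excerpt is truncated, so the sketch below only shows one plausible combination of its first two ideas. It keeps results from allowed categories, then compares quality inside each category and drops anything well below that category's best. The tuple layout and the `min_ratio` cutoff are assumptions, not details from the document.

```python
from collections import defaultdict


def filter_results(results, allowed=None, min_ratio=0.5):
    """results: list of (url, category, quality) tuples in ranked order.

    Keeps results whose category is in `allowed` (if given) and whose
    quality is at least `min_ratio` times the best quality seen in the
    same category. Qualities are assumed to be non-negative.
    """
    if allowed is not None:
        results = [r for r in results if r[1] in allowed]
    best = defaultdict(float)
    for _, cat, q in results:
        best[cat] = max(best[cat], q)
    return [r for r in results if r[2] >= min_ratio * best[r[1]]]
```

The original ranking order is preserved. The filter only removes entries.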
It cites a function-based analysis like the one described in this patent, and points to a document that explains some of the concepts - Function-Based Object Model Towards Website Adaptation (pdf). While the FOM attempts to understand a website...
In one aspect, a server analyzes content of a document as a function of multiple block importance criteria. Each of the one or more customized documents is generated in a particular format of multiple formats to enhance user interaction with the...
Introduces sliding scales that let searchers choose which factors are most important to them in the results returned by a search engine, including things such as inbound links to a page, readability, differently prioritized keywords, age of document...
In a preferred embodiment, the text searching method in accordance with the present invention augments a conventional text search by using information on document relationships and metadata. The method and system provide a network of document...
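A simple model of such sliding scales is a weighted average of ranking factors, where the searcher sets the weights. The sketch assumes each factor is already normalized to [0, 1]. The factor names (`inbound_links`, `readability`, `freshness`) are hypothetical stand-ins for the factors listed above.

```python
def score(features, weights):
    """Weighted average of factors in [0, 1]; a missing factor counts as 0."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w * features.get(name, 0.0) for name, w in weights.items()) / total


def rank(pages, weights):
    """pages: dict url -> feature dict. Returns urls, best first."""
    return sorted(pages, key=lambda url: score(pages[url], weights), reverse=True)


pages = {
    "old-authority": {"inbound_links": 0.9, "readability": 0.3, "freshness": 0.1},
    "fresh-readable": {"inbound_links": 0.2, "readability": 0.9, "freshness": 0.9},
}
```

Moving the sliders changes only `weights`. A links-only setting puts `old-authority` first, while favoring readability and freshness puts `fresh-readable` first.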
Each returned document is used to develop a corresponding word probability distribution that is further used to rerank the returned documents according to the associated redundancy values. In another aspect thereof, the query component is coupled...
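The excerpt does not say how a redundancy value is computed from the word distributions. The sketch below makes an assumption: it uses cosine similarity between unigram distributions and a greedy rule similar to maximal marginal relevance. At each step it picks the remaining document least similar to any already chosen. This is a swapped-in technique for illustration, not the patent's method.

```python
import math
from collections import Counter


def word_distribution(text):
    """Unigram probability distribution over lowercase whitespace tokens."""
    words = text.lower().split()
    n = len(words)
    return {w: c / n for w, c in Counter(words).items()} if n else {}


def cosine(p, q):
    dot = sum(v * q.get(w, 0.0) for w, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def rerank_by_redundancy(docs):
    """docs: texts in original rank order. Greedily pick the least redundant
    remaining doc; min() keeps the earlier doc on ties."""
    dists = [word_distribution(d) for d in docs]
    remaining = list(range(len(docs)))
    order = []
    while remaining:
        # Redundancy = highest similarity to any document already picked.
        redundancy = {
            i: max((cosine(dists[i], dists[j]) for j in order), default=0.0)
            for i in remaining
        }
        best = min(remaining, key=redundancy.__getitem__)
        order.append(best)
        remaining.remove(best)
    return [docs[i] for i in order]
```

A near-duplicate of the top result drops below a dissimilar document, even if it originally ranked higher.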
The document index is partitioned into multiple indexes, including a primary index and a secondary index. Documents in the secondary index are ranked by document number, and don't include relevance attributes.
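The two-tier arrangement can be sketched as a lookup that ranks primary-index hits by relevance and falls back to secondary-index hits ordered only by document number. The dictionary layouts, the `limit` parameter, and the use of OR semantics across query terms are assumptions for illustration.

```python
def search(query_terms, primary, secondary, limit=10):
    """primary: term -> {doc_id: relevance}; secondary: term -> set of doc_ids.

    Documents matching any query term are returned (OR semantics).
    """
    scores = {}
    for t in query_terms:
        for doc, rel in primary.get(t, {}).items():
            scores[doc] = scores.get(doc, 0.0) + rel
    # Primary hits: highest summed relevance first, doc number breaks ties.
    hits = sorted(scores, key=lambda d: (-scores[d], d))
    if len(hits) < limit:
        # Secondary index carries no relevance attributes, so order by doc number.
        extra = set()
        for t in query_terms:
            extra |= secondary.get(t, set())
        hits += sorted(extra - set(hits))
    return hits[:limit]
```

The secondary index is consulted only when the primary index cannot fill the result page.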
A computer-implemented method for processing documents in a document database includes generating an initial ranking of retrieved documents using an information retrieval system and based upon a user search query, and processing vocabulary words...
Title: Method and system for calculating document importance using document classifications
Abstract: Evaluating an electronic document in connection with a search. The first confidence level indicates a likelihood that the electronic document is...
Organizing and categorizing hypertext document bookmarks by mutual affinity based on predetermined affinity criteria
Personalized indexing and searching for information in a distributed data processing system
Automatically initiating an internet-based search from within a displayed document. Assignee: IBM. Awarded: August 31, 2004
System and method for rapid completion of data processing tasks distributed on a network. Assignee: Overture Services. Awarded...
This new approach will be similar to Google's PageRank, but rather than using the link structure of the web to determine the importance of the document, searcher behavior will be paramount. Giles says he envisions Furl ultimately becoming a...
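To make the PageRank analogy concrete, the sketch runs the usual PageRank power iteration over a graph built from observed user moves between pages instead of hyperlinks. Edge weights are move counts. The `transitions` format, the damping factor, and the iteration count are assumptions. The excerpt does not describe how searcher behavior would actually be turned into a graph.

```python
def behavior_rank(transitions, damping=0.85, iterations=50):
    """transitions: dict page -> dict next_page -> count of observed user moves."""
    pages = set(transitions)
    for targets in transitions.values():
        pages |= set(targets)
    pages = sorted(pages)
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            out = transitions.get(p, {})
            total = sum(out.values())
            if total == 0:
                # Page with no observed outgoing moves: spread its rank evenly.
                for q in pages:
                    new[q] += damping * rank[p] / n
            else:
                # Pass rank along in proportion to how often users made each move.
                for q, c in out.items():
                    new[q] += damping * rank[p] * c / total
        rank = new
    return rank
```

The only change from link-based PageRank is the edge weighting. A move made by many users carries more rank than one made rarely.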