The oft-rumored G phone is becoming a reality, and given the popularity of the Google brand I wonder if the Apple crew are starting to worry. T-Mobile will soon offer the Android-driven cell phone, according to reports.
The phone is being manufactured by a small Taiwan-based company "that makes its own (relatively obscure) brand of handsets, plus house brands for carriers including Vodafone and Telecom New Zealand," the National Business Review reported.
Given Google's popularity, this foray into the mobile space could have implications for the entire mobile industry. As NBR asks, is the company "poised to take advantage of Google's recent foray into buying large amounts of wi-fi spectrum – creating a possible future scenario for mobile VoIP calls that cut traditional cellphone service providers out of the picture altogether"?
The one plus is that the phone is selling for the same price as the new iPhone; usually Google buys into an industry and gives away the services to dominate. If they start reducing pricing, the industry may have cause to start worrying.
The other notable part of the new phone is the Google branding given they are not the carrier or the manufacturer - a first in the space. Guess Google sells!
Guess the new Androids dream of world dominance of the mobile industry.
Posted by Frank Watson at 2:06 AM | Permalink | Comments (0)
Seems our nine-year-old is having severe growing pains lately. Come on, everyone reading this considers Google part of their lives, so why not a nine-year-old child? And fellow relatives, we are all being impacted by the growth spurts lately.
Here are just a few of the things that have been mentioned in the past week or so.
Seems the bulk upload for AdWords is throwing a 502 error - had not heard of a 502 before. Others are seeing a lot of timeouts when searching. I have had a lot of problems staying logged in to Gmail, especially the GChat part. Google Analytics is having timing problems also. Other countries are still dealing with pervasive spam in the search results. The #6, -50, 950 penalties - to name just a few - seem to be a little too aggressive. Long turnaround time for removing submitted URLs. Being overly generous to new sites.
And don't get me started on the WiFi efforts or Docs etc. They just seem to be pushing so hard at the envelope they have forgotten about the letter inside.
But we have to be patient; after all, Google is only nine years old. And if we are not able to deal now, think of what the teenage years are going to bring!
Posted by Frank Watson at 3:05 PM | Permalink
Last week a group of technology companies got together to announce the Climate Savers Computing Initiative. Led by Intel, the group also included Dell, Microsoft, HP, IBM, and the World Wildlife Fund.
While only tangentially related to search, I think it's great when companies in our space take a direct hand in environmental issues, as Yahoo! did when it announced its plans to become carbon neutral.
I spoke this past Thursday with Jon Weisblatt, who heads up the Power and Cooling Group initiative at Dell. He indicated that there were two major components of the initiative:
Dell is showing great leadership in these initiatives as well. Jon Weisblatt indicated to me that Michael Dell committed the company to becoming the "greenest tech company on the planet".
Posted by at 10:03 AM | Permalink
Okay, the Quality Score discussions have been busy lately. While we would all like to have a light turned on inside the 'black box', there has been some insight given recently if you have been watching carefully.
Yesterday, Barry Schwartz posted about Google's announcement of more small changes to the algorithm. He noted that while Google claimed not many advertisers would be impacted by the changes, the members at both DigitalPoint and WebMasterWorld seem to differ on that point.
We all love finding our accounts littered with inactivated terms without any prior notice.
Peter Hershberg posted an interesting piece of information inside his take on QS. In an email back from Google about the impact of recent QS changes he was told this nugget: keywords are not dynamically inserted into your ad text because their corresponding Quality Scores aren't high enough to qualify for keyword insertion.
Not only does it add information about the QS but answers a question people have been asking about problems with keyword inserts.
Here are a few more articles worth a read on this topic.
Amy Konefal, Susan Esparza, Geordie Carswell, Greg Meyers
Posted by Frank Watson at 12:44 PM | Permalink
Google held an in-company conference this week, gathering techies from their various departments. They assembled "engineers from Testing, Development, User Experience, and other groups to submit conference sessions: tool presentations, tutorials, workshops, panels, and experience reports", the Google testing blog reported.
Testapalooza was a big success, they reported in the blog.
"The idea for Testapalooza came out of discussions about how to build a vibrant testing community here at Google. Many diverse groups work daily on quality-related activities, but each group uses different tools and has different ideas for testing an application, so it can be difficult to find out what others are doing. So we decided to put on a conference!"
"All Testapalooza sessions were video recorded (many were videoconferenced to other offices). We want to publish as many of these videos as possible, and will review them over the coming weeks to publish sessions which did not contain any confidential information. Watch this space for more information on the videos."
Posted by Frank Watson at 9:11 PM | Permalink
At the annual American Association for the Advancement of Science conference, Larry Page commented that the human brain algorithm is a simple one.
Guess Google is really pushing forward with its studies into Artificial Intelligence - so now we have to wonder if the Google future is going to be an "AI" world or a "Terminator" world.
I think it will be time to run to the mountains if they launch the first robot and it looks like California Governor Arnold.....
Imagine a robot armed just with Google tools.... today it would be more than half way to the Terminator model.... we should be keeping an eye on their future company purchases.
Posted by Frank Watson at 12:22 PM | Permalink
Over at WebmasterWorld Tedster references an interesting short paper about creating your own search engine by Googler Anna Lynn Patterson. This document makes for a good read.
The paper was published in April of 2004 when she was a student at Stanford University. She is also the person whose name appears on the recent Google patent application titled Detecting spam documents in a phrase based information retrieval system.
Basically, she breaks it down into hard drive space, having lots of servers, and CPU power. Anna's document is a good initial primer, but there is another aspect of building a search engine that deserves some emphasis.
The search engine companies have built the largest networks of servers the world has ever known. When I think of Google's core technology assets, I don't think about search engine algorithms, I think about massively deployed server networks operating in close harmony.
Posted by at 10:46 AM | Permalink
As Kevin Newcomb mentioned yesterday, Danny Sullivan had an outstanding write-up yesterday about Google's enhancement of its "Link:" operator which allows researchers to discover many of the links that Google has indexed as pointing to a particular URL: Google Releases New Link Reporting Tools.
Google will allow users of its Webmaster Central tools to see more thorough reports of inbound links as measured to domains and even particular pages. More information is also available at the Webmaster Central blog.
This underscores the importance of working with Google by signing up with Webmaster Central. Not only will it help to get important pages of a Web site indexed, but it will also assist webmasters in conducting important competitor analysis. In the past, many researchers have almost completely ignored the Google "Link:" command, or operator, since it is known that Google does not display all of the links it knows about. Others have continued to use it, thinking that the ones that Google shows "must be more valuable" than others.
This has in fact been an often discussed topic at the Search Engine Watch Forums, where a sticky thread discusses the topic of the difference in inbound link reporting at various engines, and reveals that the current link discovery tool of consensus choice is the one found at Yahoo! Site Explorer. Although it is unlikely that those many converts will now abandon Yahoo! to use only the Google enhanced version, this news has made many webmasters and search engine optimization specialists happy.
(Begin editorial) Our engineers at Avenue A | Razorfish love to use the Webmaster Central tools, but, believe it or not, sometimes have problems with getting clients to approve the use, since it requires verification code to be placed on the Web site. Perhaps if Google would be more open about sharing its information without requiring this code, it would get a better reputation with some marketers that feel that they require too much "inside information." Google has done a great job in helping webmasters with their Web sites, but still needs to improve its relationship and willingness to work with agencies and other SEO companies, in the opinion of some. (/end editorial)
Posted by Chris Boggs at 11:13 AM | Permalink
The US Patent Office has issued Google two more patents.
One patent, for a similarity search engine - to check duplicate content - was first filed in 2001.
The other, Google's Digital Mapping Systems patent, describes 'various methods, systems and apparatus for implementing aspects of a digital mapping system'.
The similarity engine has had variations filed by IBM and Hitachi, according to a report by VNUNET.
Posted by Frank Watson at 12:54 PM | Permalink
This weekend The Register published an article named Google developing eavesdropping software. The article describes how Google uses existing PC microphones and sound-fingerprinting technology to show relevant ads that appeal more to you. The article goes on to explain how the sound fingerprinting works; it "breaks sound into five-second snippets to pick out audio from a TV, reducing the snippet to a digital 'fingerprint', which it matches on an internet server." Privacy folks are worried about the repercussions of such software.
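As a rough illustration of the snippet-and-match flow in that quote, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the sample rate, the function names, and especially the "fingerprint" itself, which is just a SHA-1 digest of coarsely quantized samples. That is a stand-in, not the technique from Google's paper; a real audio fingerprint has to survive background noise, and an exact hash does not.

```python
import hashlib

SAMPLE_RATE = 8000       # assumed samples per second
SNIPPET_SECONDS = 5      # snippet length mentioned in the article


def split_into_snippets(samples, rate=SAMPLE_RATE, seconds=SNIPPET_SECONDS):
    """Yield consecutive full-length snippets; a trailing partial one is dropped."""
    size = rate * seconds
    for start in range(0, len(samples) - size + 1, size):
        yield samples[start:start + size]


def fingerprint(snippet, levels=16):
    """Reduce a snippet to a short digest (stand-in for a real fingerprint)."""
    # Quantize each sample, assumed to lie in [-1.0, 1.0], to a few levels
    quantized = bytes(
        max(0, min(levels - 1, int((s + 1.0) / 2.0 * levels))) for s in snippet
    )
    return hashlib.sha1(quantized).hexdigest()[:16]


def match(samples, known):
    """Return labels of known broadcasts whose fingerprints occur in samples.

    `known` plays the role of the server-side lookup table:
    a dict mapping fingerprint -> label.
    """
    return [known[fp]
            for fp in map(fingerprint, split_into_snippets(samples))
            if fp in known]
```

In this sketch only the short digests would need to leave the PC, which is the property the article's description hinges on.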
Postscript from Barry: I should link to Google Paper Explains Listening To Your TV Can Help It Put Ads & Info On Your Computer, which we covered back on June 9, 2006.
Posted by Barry Schwartz at 10:50 AM | Permalink
A New York Times article has a detailed analysis of Google's infrastructure and discussion with Urs Hölzle, senior vice president for operations at Google. Here are some of the key points I pulled from that article.
+ Google tends to build from the ground up rather than buy.
+ Google's computing costs are half those of other large Internet companies and a tenth those of traditional corporate technology users.
+ Critics call Google's philosophy "unnecessary and inefficient."
+ "Google is reducing cost while maintaining performance by shifting the burden of reliability from hardware to software: individual hardware components can fail, but software automatically shifts the local task and the data to other machines."
+ Google is among Advanced Micro's five largest clients.
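The point about moving reliability from hardware into software can be pictured with a toy model. The sketch below is only an assumption about what such failover might look like in its simplest form; the `Cluster` class, the machine names, and the task name are all hypothetical, and it only reassigns a task to a machine that already holds a replica of its data. It does not re-replicate data the way a real system would have to.

```python
class Cluster:
    """Toy model: tasks move to another replica holder when a machine fails."""

    def __init__(self, machines):
        self.healthy = set(machines)
        self.assignments = {}   # task -> machine currently running it
        self.replicas = {}      # task -> machines holding a copy of its data

    def submit(self, task, replicas):
        """Register a task along with the machines that store its data."""
        self.replicas[task] = set(replicas)
        self.assignments[task] = self._pick(task)

    def _pick(self, task):
        # Choose a healthy machine that already holds the task's data
        candidates = sorted(self.replicas[task] & self.healthy)
        if not candidates:
            raise RuntimeError(f"no healthy replica left for {task!r}")
        return candidates[0]

    def fail(self, machine):
        """Mark a machine as failed and move its tasks elsewhere."""
        self.healthy.discard(machine)
        for task, owner in list(self.assignments.items()):
            if owner == machine:
                self.assignments[task] = self._pick(task)
```

For example, a task submitted with replicas on `m1` and `m2` starts on `m1` and moves to `m2` after `fail("m1")`, with no change to the hardware itself.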
Posted by Barry Schwartz at 9:51 AM | Permalink
Google patents the Google File System, Microsoft claims a Functional Object Model for mobile devices, and Yahoo! (Overture) describes an autonotification process to inform advertisers of when a certain condition has been met concerning one of their ads.
The authors of a paper on the Google File System (pdf) are listed as the inventors of this patent filing. Another similarity between the two documents is that both cite mostly the same reference documents. The patent and paper appear to cover much of the same ground. This looks like the patent for the Google File System.
Leasing scheme for data-modifying operations
Invented by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
Assigned to Google
US Patent 7,065,618
Granted on June 20, 2006
Filed on June 30, 2003
Abstract
A system may facilitate performance of a data-modifying operation in a file network that includes multiple servers that store replicas of data. One of the servers may serve as a primary replica for one of the replicas of data and at least one other one of the servers may serve as at least one secondary replica for the replica of data. The system may send data associated with the data-modifying operation to the primary replica and the at least one secondary replica based on a network topology and independently send a data-modifying control signal that requests execution of the data-modifying operation using the data associated with the data-modifying operation to the primary replica and the at least one secondary replica.
Microsoft
When presenting a web page on a mobile device, it's sometimes best not to display the whole page. But trying to decide which parts to show, and which not to display can be difficult. More information is sometimes needed about the web page.
Microsoft has been experimenting with ways to identify what different parts of a web page do, based upon the layout and functions of those parts. One Microsoft paper on this type of analysis that has seen some popularity recently is Block-level Link Analysis (pdf).
It wasn't a surprise to see Wei-Ying Ma's name on this patent application, as one of the authors of that paper, and an earlier paper on VIPS: a Vision-based Page Segmentation Algorithm.
Another Wei-Ying Ma paper on that topic is Efficient Browsing of Web Search Results on Mobile Devices Based on Block Importance Model (pdf). It cites a function based analysis like the one described in this patent, and points to a document that explains some of the concepts - Function-Based Object Model Towards Website Adaptation (pdf). The other inventor listed in this patent, Jin-Lin Chen, is one of the authors of that paper. Taking a look at those papers may make understanding this patent easier.
Segmenting and indexing web pages using function-based object models
Invented by Jin-Lin Chen and Wei-Ying Ma
Assigned to Microsoft
US Patent 7,065,707
Granted on June 20, 2006
Filed on June 24, 2002
Abstract
By understanding a website author's intention through an analysis of the function of a website, website content can be adapted for presentation or rendering in a manner that more closely appreciates and respects the function behind the website. A website's function is analyzed so that its content can be adapted to different client environments. A function-based object model (FOM) identifies objects associated with a website, and analyzes those objects in terms of their functions. Desktop oriented websites are adapted for mobile devices based on the FOM and on a mobile control intermediary language. While the FOM attempts to understand a website author's intention based on functional analysis of web content, the mobile control intermediary language enables the author to create web content that can be presented in various mobile devices by processing the objects, by extracting forms from the objects, and by generating a file in the mobile control intermediary language for each form.
Yahoo
This patent describes an autonotification system, enabling automated messages to be sent to an advertiser regarding their paid search listings when certain pre-defined conditions are met. Here are the areas those conditions listed in the patent encompass:
Automatic advertiser notification for a system for providing place and price protection in a search result list generated by a computer network search engine
Invented by Narinder Pal Singh, Scott W. Snell, Douglas T. Huffman, Darren J. Davis, Thomas A. Soulanille, and Dominic Dough-Ming Cheung
Assigned to Overture Services, Inc.
US Patent 7,065,500
Granted on June 20, 2006
Filed on September 26, 2001
Abstract
A notification method in a computer database system includes receiving a notification instruction from an owner associated with a search listing stored in the computer database system, monitoring conditions specified by the notification instruction for the search listing, and sending a notification to the owner upon detection of a changed condition of the search listing.
My usual reminder about patents: Some of the processes and technology described in patents are created in house, and some are developed with the assistance of contractors and partners. A percentage are never developed in a tangible manner, but may serve as a way to attempt to exclude others from using the technology, or even to possibly mislead competitors into exploring an area that they might not have an interest in (sometimes skepticism is good.)
There are times when a Google or Yahoo acquires a company to gain access to the intellectual property of that company, or the intellectual prowess and expertise of that company's employees. And sometimes patents are just purchased.
Want to comment or discuss? Visit our Search Technology & Relevancy area of the Search Engine Watch Forums.
Posted by Bill Slawski at 3:41 AM | Permalink
Four patent applications from Google describe fighting spam in emails, providing product review searches, moving large amounts of data, and autolinking. Yahoo matches, and raises, with five patent filings: one on watching deletions to choose better ads, another on serving dynamic information through an additional browser interface, and three more on multimedia and RSS.
Microsoft goes TV 2.0 with an electronic program guide, and describes a way of matching advertising content with certain search queries before those searches are made. IBM comes up with a unique way of presenting the results of a search from more than one search engine, and a way of reducing the amount of irrelevant results in a search by analyzing an initial set of results, identifying an appropriate additional query term from those results, and searching the original results again but with the additional query term included in the search.
Go Daddy describes a way of fighting spam in emails. Xerox employs collaborative filtering from previous users' searches to predict search results. Apostolos Gerasoulis, from Ask.com, with a couple of co-inventors, ranks and displays pages (objects) based upon linkage and textual data, and then defines a way to identify and assign topics to them.
Email Spam
Emails with links in them could be considered spam if the links point to pages that are in a conceptual category considered spammy. This patent application really doesn't describe the concept categorization part of the process. That's done in a related patent application mentioned within this document, and the related document lists Georges Harik as one inventor. Dr. Harik's name is on a very large percentage of the patent applications involving Gmail-type processes.
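To make the idea concrete, here is a minimal Python sketch of flagging a message by the categories of the pages it links to. Since the filing leaves concept categorization to a related application, that step is deliberately stubbed out: `categorize` is a caller-supplied function, and the category names, the threshold, and every function name here are hypothetical choices for illustration, not anything taken from the patent application.

```python
import re

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")

# Hypothetical concept categories treated as spammy
SPAMMY_CATEGORIES = {"pills", "casino"}


def extract_links(message_text):
    """Pull http(s) links out of an email body."""
    return URL_PATTERN.findall(message_text)


def looks_like_spam(message_text, categorize, threshold=0.5):
    """Flag a message when enough of its links land in spammy categories.

    `categorize` maps a URL to a category label. It stands in for fetching
    the linked page and running concept categorization on its content.
    """
    links = extract_links(message_text)
    if not links:
        return False
    spammy = sum(1 for link in links if categorize(link) in SPAMMY_CATEGORIES)
    return spammy / len(links) >= threshold
```

A message with no links is never flagged in this sketch, which matches the premise that the signal comes from the linked content rather than from the email text.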
Method and system to detect e-mail spam using concept categorization of linked content
Invented by Johnny Chen
US Patent Application 20060122957
Published June 8, 2006
Filed December 3, 2004
Abstract
A system and method for detecting undesired electronic messages (e.g., spam) using concept categorization of hyperlinks is disclosed. A server receives an electronic message and retrieves web pages that correspond to hyperlinks in the message. The server performs concept categorization on the retrieved web pages based on semantic relationships in the received information to determine whether the electronic message meets predefined criteria associated with undesired messages.
Searching and Aggregating Product Reviews
If Google wanted to get into the product or services review business, the next patent filing describes a blueprint for a process that might make an effective and innovative system.
Method and system for finding and aggregating reviews for a product
Invented by Jan Matthias Ruhl and Mayur D. Datar
US Patent Application 20060129446
Published June 15, 2006
Filed December 14, 2004
Abstract
The embodiments disclosed herein include new, more efficient ways to collect product reviews from the Internet, aggregate reviews for the same product, and provide an aggregated review to end users in a searchable format. One aspect of the invention is a graphical user interface on a computer that includes a plurality of portions of reviews for a product and a search input area for entering search terms to search for reviews of the product that contain the search terms.
Scaling and Distributing Data
Arvind Jain is the head of Research and Development in Google's Bangalore office, and has spoken at a number of conferences on infrastructure projects and issues involving such things as Google's crawl and indexing system, distributed file replication system, and compression techniques for large scale storage systems. He's listed as the inventor for this next Google filing.
System and method for scalable data distribution
Invented by Arvind Jain
US Patent Application 20060126201
Published June 15, 2006
Filed December 10, 2004
Abstract
A system having a resource manager, a plurality of masters, and a plurality of slaves, interconnected by a communications network. To distribute data, a master determined that a destination slave of the plurality slaves requires data. The master then generates a list of slaves from which to transfer the data to the destination slave. The master transmits the list to the resource manager. The resource manager is configured to select a source slave from the list based on available system resources. Once a source is selected by the resource manager, the master receives an instruction from the resource manager to initiate a transfer of the data from the source slave to the destination slave. The master then transmits an instruction to commence the transfer.
Autolinking
Google's Autolink raised a lot of eyebrows, and brought some negative reactions. A Search Engine Watch Blog post from Danny Sullivan, Google Toolbar's AutoLink & The Need For Opt-Out defined many of the issues around the toolbar feature. The following patent application explains how such a system might work from the search engine's perspective.
Providing useful information associated with an item in a document
Invented by Gueorgui Djabarov
US Patent Application 20060129910
Published June 15, 2006
Filed December 14, 2004
Abstract
A method includes recognizing an item within a first document based on a pattern associated with the item but not the exact content of the item. The method further includes identifying a link for the item and providing a second document that includes information associated with the item when the link for the item is selected.
Yahoo
Choosing Better Ads through User Behavior
Some queries involve the use of concepts and units, as described in at least five Yahoo patent filings (see previous patent posts in the Yahoo sections from Yahoo Units and Microsoft Redundancy Filters and More Yahoo Concepts and Google Predictive Searches.)
But sometimes a two-term query isn't a concept as much as it is a couple of keywords that someone may use to search for something. If that person performs a second search after deleting one of the words, then the record of that deletion and second search might help Yahoo calculate "deletion probability scores" for words being used in these kinds of two-term queries.
This can be helpful when there isn't a good keyword based advertising match for that query, but there might be a good match individually for each of the terms that make up the query. The "deletion probability scores" can help determine which of the two terms to show keyword-based advertising for in search results.
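One simplified reading of the scoring idea can be sketched in a few lines of Python. The log format, the function names, and the example queries below are assumptions for illustration. The sketch counts only follow-up searches that drop exactly one term, and scores each term as the share of those deletions that removed it. Treating the term that is deleted least often as the one to advertise against is also an assumption about how the scores would be used.

```python
from collections import Counter, defaultdict


def deletion_scores(query_log):
    """Estimate per-term deletion scores from (first_query, next_query) pairs.

    Only pairs where the follow-up is the first query minus exactly one
    term are counted. Returns {query_terms: {term: score}}.
    """
    deleted = defaultdict(Counter)
    for first, follow_up in query_log:
        first_terms = tuple(first.split())
        next_terms = follow_up.split()
        removed = [t for t in first_terms if t not in next_terms]
        one_term_dropped = (
            len(first_terms) >= 2
            and len(removed) == 1
            and len(next_terms) == len(first_terms) - 1
        )
        if one_term_dropped:
            deleted[first_terms][removed[0]] += 1

    scores = {}
    for terms, counts in deleted.items():
        total = sum(counts.values())
        scores[terms] = {t: counts[t] / total for t in terms}
    return scores


def term_to_advertise(scores, query):
    """Pick the term users delete least often (assumed to be more relevant)."""
    per_term = scores[tuple(query.split())]
    return min(per_term, key=per_term.get)
```

With a hypothetical log in which "cheap flights" is followed three times by "flights" and once by "cheap", the scores come out as 0.75 for "cheap" and 0.25 for "flights", so this sketch would show ads for "flights".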
System and methods for ranking the relative value of terms in a multi-term search query using deletion prediction
Invented by Rosemary Jones and Daniel C. Fain
US Patent Application 20060129534
Published June 15, 2006
Filed December 14, 2004
Abstract
The likely relevance of each term of a search-engine query of two or more terms is determined by their deletion probability scores. If the deletion probability scores are significantly different, the deletion probability score can be used to return targeted ads related to the more relevant term or terms along with the search results. Deletion probability scores are determined by first gathering historical records of search queries of two or more terms in which a subsequent query was submitted by the same user after one or more of the terms had been deleted. The deletion probability score for a particular term of a search query is calculated as the ratio of the number of times that particular term was itself deleted prior to a subsequent search by the same user divided by the number of times there were subsequent search queries by the same user in which any term or terms including that given term was deleted by the same user prior to the subsequent search. Terms are not limited to individual alphabetic words.
Browser Interface Helpers
This next document describes some ways to provide additional dynamic information to someone via a toolbar styled interface, while they are browsing pages on the web.
Method of controlling an Internet browser interface and a controllable browser interface
Invented by Thomas J. Shafron
Assigned to Yahoo
US Patent Application 20060129937
Published June 15, 2006
Filed February 2, 2006
Abstract
The present invention is directed to a method of dynamically controlling and displaying an Internet browser interface, and to a dynamically controllable Internet browser interface. In accordance with the present invention, a browser interface may be customized using a controlling software program that may be provided by an Internet content provider, an ISP, or that may reside on an Internet user's computer. The controlling software program enables the Internet user, the content provider, or the ISP to customize and control the information and/or functionality of a user's browser and browser interface.
RSS Enhancements
The following three Yahoo filings all list the same inventors, including John Thrall, who is the head of media search engineering for Yahoo Search. They cover different aspects of using RSS with multimedia files.
Syndicating multiple media objects with RSS
Invented by Andrew R. Volk, David D. Hall, and John J. Thrall
US Patent Application 20060129917
Published June 15, 2006
Filed December 1, 2005
Abstract
System and method for syndicating more than one media object in an element using Real Simple Syndication (RSS). In one embodiment, multiple media objects with at least one shared characteristic are syndicated under the same element. For example, a single media object can come in multiple formats and/or compression rates.
Syndicating multimedia information with RSS
Invented by Andrew R. Volk, David D. Hall, John J. Thrall
US Patent Application 20060129907
Published June 15, 2006
Filed December 1, 2005
Abstract
System and method for adding descriptive information to a Real Simple Syndication (RSS) document. The descriptive information describes the content of media objects syndicated through the document. The descriptive information can be used to provided additional information to a subscriber, and can be used in searching for syndicated media content.
RSS rendering via a media player
Invented by Andrew R. Volk, David D. Hall, John J. Thrall
US Patent Application 20060129916
Published June 15, 2006
Filed December 1, 2005
Abstract
System and method for syndicating media objects through a link to a media player using Real Simple Syndication (RSS). A content provider may not want to give direct access to a media object to a subscriber. Instead a content provider can give the subscriber a link to a media player that can access the media object.
Microsoft
Searching electronic program guide data
Invented by Pradhan S. Rao, David Hendler Sloo, Daniel Danker, and George K. Nyako
Assigned to Microsoft
US Patent Application 20060130098
Published June 15, 2006
Filed December 15, 2004
Abstract
Searching electronic program guide (EPG) data is described. The EPG data may be compartmentalized into channel metadata that describes characteristics of one or more channels and content metadata that describes characteristics of one or more content items. In a implementation, a method includes searching channel metadata and content metadata. A result of the searching is formed for output in conjunction with an electronic program guide (EPG).
System and method for indexing and prefiltering
Invented by Brian Burdick, Joshua J. Forman, Kevin P. Kornelson, Murali Vajjiravel, and Rajeev Prasad
Assigned to Microsoft
US Patent Application 20060129555
Published June 15, 2006
Filed December 9, 2004
Abstract
A method and system are provided for selecting advertisements for presentation to a user in response to a user search query. The system may include a keyword server for parsing the user search query and an index server for receiving the parsed search query. The index server may include an index of advertising phrases and pre-filtering components for comparing index entries to the parsed user search query in order to discard non-matching index entries and locate matching entries. The pre-filtering components may include either a phrase length pre-filtering component or a word hash pre-filtering component. The system may additionally include a listing server for sorting through the matching entries located by the index server and further filtering the matching entries for retrieval and presentation to the user.
IBM
Ring method, apparatus, and computer program product for managing federated search results in a heterogeneous environment
Invented by Wade Shelby Beavers and David Joseph Borrillo
Assigned to IBM
US Patent Application 20060129530
Published June 15, 2006
Filed December 9, 2004
Abstract
A method, apparatus and computer program product are provided for managing federated search results in a heterogeneous environment. A user enters a search term and the search term is submitted to multiple selected search engines. Search results are gathered from each selected search engine. A search ring is generated including a ring section to represent each of the selected search engines for enabling the user to view search results from one or more of the selected search engines.
Method and system for suggesting search engine keywords
Invented by Cary Lee Bates
Assigned to IBM
US Patent Application 20060129531
Published June 15, 2006
Filed December 9, 2004
Abstract
A search engine receives a search query having one or more keywords. The documents in the result set from that search query are analyzed to identify one or more additional keywords that further segment, or separate, the initial result set. These additional keywords are presented to the user who then selects whether to include or exclude documents matching the additional keywords. In this way, the number of documents in the initial result set is reduced in a relatively quick and effortless manner.
Go Daddy
Email filtering system and method
Invented by Brad Owen and Jason Steiner
US Patent Application 20060129644
Published June 15, 2006
Filed December 14, 2004
Abstract
Systems and methods of the present invention allow filtering out spam and phishing email messages based on the links embedded into the email messages. In a preferred embodiment, an Email Filter extracts links from the email message and obtains desirability values for the links. The Email Filter may route the email message based on desirability values. Such routing includes delivering the email message to a Recipient, delivering the message to a Quarantine Mailbox, or deleting the message.
Xerox
Personalized web search method Invented by Lisa S. Purvis Assigned to Xerox Corporation US Patent Application 20060129533 Published June 15, 2006 Filed December 15, 2004
Abstract
A method for contextualizing search results is disclosed. The method includes performing a traditional web query that returns a set of result pages, using collaborative filtering techniques to generate a set of predicted pages, comparing the set of predicted pages with the set of result pages, and ranking the set of result pages so that result pages that are also included in the set of predicted pages are ranked higher than those that are not. Methods herein also contemplate using the search history of the user or others to refine the results of searches.
Ask.com
Relevancy-based database retrieval and display techniques Invented by Tao Yang, Wei Wang, and Apostolos Gerasoulis US Patent Application 20060129552 Published June 15, 2006 Filed February 2, 2006
Abstract
Techniques to retrieve, rank and display data objects retrieved from a database are described. In particular, methods to assign a global ranking value to a data object based on a combination of that object's link-based (e.g., vector-space cluster analysis) and text-based (e.g., word frequency) ranks are described. Additional techniques to determine a set of concepts, topics or key words associated with each retrieved data object are described.
My usual reminder about patents: Some of the processes and technology described in patents are created in house, and some are developed with the assistance of contractors and partners. A percentage are never developed in a tangible manner, but may serve as a way to attempt to exclude others from using the technology, or even to possibly mislead competitors into exploring an area that they might not have an interest in (sometimes skepticism is good).
There are times when a Google or Yahoo acquires a company to gain access to the intellectual property of that company, or the intellectual prowess and expertise of that company's employees. And sometimes patents are just purchased.
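To make the Xerox abstract above a little more concrete, here is a minimal Python sketch of its final ranking step: results that also show up in a set of predicted pages get moved ahead of those that don't. This is only an illustration under my own assumptions. The function name `rerank`, the example URLs, and the idea that the predicted pages arrive as a simple list are all hypothetical, and the collaborative filtering that would produce the predictions is not shown.

```python
def rerank(result_pages, predicted_pages):
    """Put results that also appear in predicted_pages ahead of the rest.

    The engine's original order is preserved within each group.
    Both arguments are assumed (for illustration) to be lists of URLs.
    """
    predicted = set(predicted_pages)
    # sorted() is stable: False (page is predicted) sorts before True,
    # and ties keep their original relative order.
    return sorted(result_pages, key=lambda url: url not in predicted)


# Hypothetical data: the search engine's ranking and the predicted set.
results = ["a.example", "b.example", "c.example", "d.example"]
predicted = ["c.example", "x.example", "a.example"]

print(rerank(results, predicted))
# ['a.example', 'c.example', 'b.example', 'd.example']
```

A stable sort is used so that the search engine's own ordering still decides ties, which seems like the least surprising reading of "ranked higher than those that are not."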
Want to comment or discuss? Visit our Search Technology & Relevancy area of the Search Engine Watch Forums.
Posted by Bill Slawski at 8:42 PM | Permalink
There are many people discussing a recent patent Google was awarded for picking up on ambient audio from your TV and pairing those sounds to your computer to serve up ads based on what you are watching (or something like that). Google Research Scientists, Michele Covell & Shumeet Baluja, described the technology as:
We showed how to sample the ambient sound emitted from a TV and automatically determine what is being watched from a small signature of the sound -- all with complete privacy and minuscule effort. The system could keep up with users while they channel surf, presenting them with a real-time forum about a live political debate one minute and an ad-hoc chat room for a sporting event in the next. And, all of this would be done without users ever having to type or to even know the name of the program or channel being viewed. Taking this further, we could collect snippets from the web describing the actors appearing in a movie or present maps of locales within the movie as it takes place (no matter if users are watching it as a live broadcast or as a recorded broadcast).
There are two additional articles that have good coverage of this, that I am aware of. The first is at Small Biz Pipeline and the second is at TechCrunch. I particularly like how TechCrunch pulled out the four main points of the paper:
+ Personalized information layers Here's what Tom Cruise is wearing in the show you are watching and here's where you can buy the same clothes in your zip code.
+ Ad hoc social peer communities If you would like to chat about this show, ten of your college friends are watching it right now as well.
+ Real-time popularity ratings Nielsen requires hardware and the results aren't available in real-time. You might want to know if there is a spike in viewers watching the show on channel 9 right now. Advertisers might want to know that too.
+ TV-based bookmarks Click to save a show or clip into your video library and there will be more than just a few shows available for watching later.
Posted by Barry Schwartz at 8:43 AM | Permalink
The Register reports that Google is "choking on web spam" ever since the roll out of The Big Daddy Infrastructure. The article highlights a mention from Google CEO Eric Schmidt from last month talking about Google having a storage "crisis." From that New York Times article:
Referring to the sheer volume of Web site information, video and e-mail that Google's servers hold, Schmidt said: "Those machines are full. We have a huge machine crisis."
This week, problems have gotten worse. Webmasters all over the forums are reporting severe issues with pages dropping in and out of the index, pages not being crawled, old cached pages, dead (404) pages being returned by Google and outright irrelevant results. This morning I posted a roundup at the Search Engine Roundtable of forum threads that are discussing the most recent Google issues with indexing pages. We have been tracking Big Daddy issues for too long here; for our last report at SEW, see here.
Want to comment or discuss? Visit our thread named BigDaddy, Missing Pages In Google & Is The Big G Out Of Space at our SEW Forums.
Posted by Barry Schwartz at 8:34 AM | Permalink
Back in September, SEW Forums moderator Edel "Orion" Garcia posted a thread about a new search technology under development. It was coincidentally called the "Orion Search Engine" but not connected with our moderator. Instead, it was developed by a university student who now, according to news reports out this weekend, works for Google. Google's also acquired his search technology.
How great this search engine was is impossible to say. The press release that inventor Ori Allon put out last September was full of excitement, but so are plenty of releases trying to attract the attention of investors and the media. The search engine itself was never available for the public to use.
It sounds like Allon mainly developed an algorithm useful in pulling out better summaries of web pages. In other words, if you did a search, you'd be likely to get back extracted sections of pages most relevant to your query. From the release:
The results to the query are displayed immediately in the form of expanded text extracts, giving you the relevant information without having to go the website.
Such extraction could work well with moves by Google to expand direct answers that it offers, something all search engines are doing. Of course, the more Google and other search engines extract heavily from web pages without sending them actual traffic, the more likely they'll come under legal pressures of stepping over the fair use line.
Via Threadwatch, Google buys search algorithm invented by Israeli student from Haaretz has more details on Google getting the rights to the Orion algorithm and confirmation that Allon now works for Google. His university says that Yahoo and Microsoft were also in negotiations for the technology.
Google wins rights to Aussie algorithm from The Age reports that Allon's been with Google for about six weeks. However, Microsoft chairman Bill Gates never commented on the technology, to my knowledge. The Age just seems confused that Allon's press release mentioned public comments by Gates that there's room for improvement generally in search.
Google does deal for Aussie program from the Daily Telegraph pitches that the technology will revolutionize the way we search. Ho hum. Reality check, OK? When Google acquired the three people from Kaltix along with their search technology back in 2003, it hardly created a revolutionary change for us soon after.
By revolutionary, I mean a radical shake-up of how we search or a major leap-frogging past other players. That didn't happen post-Kaltix. We did indeed see better personalized search come from Google, what I find one of its most impressive features. But that's an evolutionary change. It works on top of other things Google has built. It doesn't overturn and throw out the base technology.
So my reality check alarm is mainly for anyone who thinks Google's going to suddenly change because Allon and this extraction algorithm are now at Google. He gives Google another good employee, and the technology will probably give Google another evolutionary change that may improve things over time, rather than instantly.
Want to comment or discuss? Visit our Search Engine Watch Forums thread, The Orion Search Engine.
Posted by Danny Sullivan at 7:56 AM | Permalink
The LA Times reports that Google will be switching to Advanced Micro Devices (AMD) processors from Intel processors. They will be using Opteron processors in all new servers, and Google buys lots of new servers often. Because of this report, shares of Advanced Micro rose $1.40 to $40.07 that day. I guess AMD caught the Google wave.
Posted by Barry Schwartz at 8:39 AM | Permalink
The second issue of Google's Newsletter for Librarians is now available. It features an article by Karen Schneider, the director of the Librarians' Internet Index, the wonderful and important searchable directory of high quality web resources that I've mentioned on the blog and in SearchDay many times.
Schneider focuses on some of the critical information judgments needed in determining the trustworthiness of a site and the info that it contains. Those of us who attended library school are aware of many of these concepts. I hope Karen's article reaches more than information professionals, including students, to whom these ideas should be taught and reinforced from the earliest grades forward.
Next, Matt "Jagger" Cutts is back with a look at how Google determines what sites are "most trusted." His article talks about the 100's of factors (including some traditional info retrieval metrics) that Google looks at in addition to PageRank.
For more of an in-depth discussion of this you might want to pick up a copy of Chris Sherman's (yes SearchDay's Chris Sherman) book, Google Power. You can preview the title via Amazon's Search Inside the Book. I was unable to find it using Google Book Search.
Remembering that Matt's article was written primarily for librarians and other information professionals, he explains that Google, like other engines, analyzes the actual content.
He points out that, "this [analysis] goes beyond scanning page-based text, which webmasters can easily manipulate through meta-tags."
While it's true that Google and other engines look to some degree at the meta-description tag, he doesn't mention that although the meta-keyword tag is still used by some, its value is not as great as it once was. Danny points this fact out in a 2002 article. You'll also find meta tags listed in this post from Barry.
Cutts goes on to write: We also look at factors like fonts and the placement of words on a page. And we examine the content of neighboring pages, which can provide more clues as to whether the page we're looking at is trusted and will be relevant to users.
It would have been useful, particularly to the readers of this article, if Matt had explained that the factors listed above and many others can also be manipulated or what others have termed "gamed."
As I've pointed out in many presentations to librarians, this is not a good or bad thing but simply the way large general-purpose web engines work. For the librarian, a knowledge and understanding of this is important and useful.
After reading both Karen's article and Matt's piece we see somewhat of a disconnect between trustworthiness in terms of inclusion and good placement on a results page versus the trustworthiness concepts that a human might use to judge not only the quality of a web page itself but the data it contains. Yes, I'll readily admit to being a bit prejudiced here but I think Karen's article also illustrates the value of just one of the many skills a well-trained librarian can offer.
Matt concludes with links to a few more excellent papers.
Btw, many of the same concepts (what Google calls and has patented as PageRank) are in place at just about every other major web engine. In other places, the concept is referred to as link analysis.
As a librarian I would have loved it if Matt had thrown a "shout out" to Dr. Eugene Garfield, the father of citation analysis. It has been around since the 1950's and librarians have been using it since day one. The relationship between citation analysis (something librarians understand) and link analysis (PageRank) is strong and is even noted in Brin and Page's seminal paper. One of the biggest differences is that web link analysis is much more open than traditional citation analysis and thereby harder to game (although to some degree it's also possible).
Yes, the concepts used in citation analysis are really what drive link analysis.
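For readers who like to see the idea in code, here is a minimal sketch of link analysis in the textbook PageRank style: a page's score is built from the scores of the pages that link to it, much as a paper's standing is built from the papers that cite it. This is a simplified illustration, not Google's actual implementation. The function name `pagerank`, the damping value of 0.85, the iteration count, and the tiny three-page example graph are all my assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: a and b both "cite" c, and c cites a.
example_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(example_links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, c comes out on top because two pages link to it, and b comes out last because nothing links to it, which is the citation analysis intuition in miniature.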
If you want to learn more, this post has tons of links and interviews about citation analysis. It also includes a link to Garfield's paper, "Citation Indexes for Science: A New Dimension in Documentation through Association of Ideas."
Finally, although this Scientific American article was written in 1999, I still think it's one of the best, especially for non-geeks, about web link analysis. It was written by members of IBM's Clever team.
Clever was a web search engine (never publicly released) by IBM. More about it here. Members of the Clever team read like a "who's who" of web search including Jon Kleinberg, Soumen Chakrabarti, and Prabhakar Raghavan, who is now the head of Yahoo Research.
As you review the article, take special note of the section where Clever and Google are compared. While Clever never made a public appearance, many of the concepts it offers are what power the Teoma/Ask Jeeves search technology.
Postscript: Yahoo's Prabhakar Raghavan offers archived materials from his Stanford classes on text and information retrieval online. Must-have content for those interested in the subject.
Posted by Gary Price at 11:58 AM | Permalink
Washington Post writer and author of The Google Story, David Vise, chatted on WashingtonPost.com today about Google's most recent acquisition of radio ad firm DMarc (good background on how the technology works) and many other topics, from where to send your ideas for new Google services to whether or not a Google Calendar is on its way. The transcript of Vise's chat is available here.
Posted by Gary Price at 5:38 PM | Permalink
It's patent application time! Search Engine Roundtable points to a just published patent application (not an awarded patent) from Google (congrats to Matt Cutts who is listed as a co-inventor) that's titled: Information retrieval based on historical data.
From the abstract: A system identifies a document and obtains one or more types of history data associated with the document. The system may generate a score for the document based, at least in part, on the one or more types of history data.
Barry (aka RustyBrick) also points out that the app includes a brief discussion and definition of "link churn."
Link churn is "computed as a function of an extent to which one or more links provided by the document changes over time."
The patent application also notes that Google may penalize the web page owner for link churn above a certain threshold. Note the exact wording in claims 60-63.
Of course, this is just a patent app that was filed in December 2003 and does not guarantee that Google is using, will use, or has used any of these techniques. Nevertheless, good discussion material.
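To give the discussion something to chew on, here is one way the "link churn" idea could be sketched in Python. The application only says churn is "a function of an extent to which" a document's links change over time; it does not give a formula. So everything specific here is my assumption: the function names, the choice to measure churn as the average fraction of links added or removed between consecutive snapshots, and the 0.5 threshold.

```python
def link_churn(snapshots):
    """Average fraction of links that changed between consecutive snapshots.

    snapshots is a list of collections of outbound link URLs, oldest first.
    Returns 0.0 (links never changed) up to 1.0 (links fully replaced
    every time). This measure is an assumption, not the patent's formula.
    """
    link_sets = [set(snapshot) for snapshot in snapshots]
    if len(link_sets) < 2:
        return 0.0

    changes = []
    for old, new in zip(link_sets, link_sets[1:]):
        union = old | new
        changed = old ^ new  # links added or removed between snapshots
        changes.append(len(changed) / len(union) if union else 0.0)
    return sum(changes) / len(changes)


CHURN_THRESHOLD = 0.5  # hypothetical cutoff, chosen only for illustration


def exceeds_threshold(snapshots, threshold=CHURN_THRESHOLD):
    """True if a page's link churn is above the (assumed) threshold."""
    return link_churn(snapshots) > threshold


# Hypothetical pages: one adds a single link, one swaps all links each time.
stable_page = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}]
churny_page = [{"a", "b"}, {"c", "d"}, {"e", "f"}]

print(round(link_churn(stable_page), 3), exceeds_threshold(stable_page))
print(round(link_churn(churny_page), 3), exceeds_threshold(churny_page))
```

The point is only that a page rotating its links wholesale would stand out sharply from one that adds a link now and then, whatever the exact function turns out to be.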
Posted by Gary Price at 11:41 AM | Permalink
What might this holiday season's most popular item and topic for grown-ups be? A satellite radio? A new car with GPS? A trip to St. Barts? Nope, it just might be the fear of Google and what to do about it. I guess a company that does "no evil" and fear of that same "no evil" company are not the same thing. (-;
Reuters (via News.com) has a lengthy look at how another group, in this case Madison Avenue advertisers, fear Google in the article: Madison Avenue faces Google fears.
We're reading articles like this on a very regular basis these days. Last week, we posted:
+ Who's Afraid of Google? Everyone from Wired.
+ News.com's: Google--what you get for $400, a share that offered a chart of who Google competes with in various areas.
+ About a month ago, we blogged: NYT On Google As Threat To Other Businesses
Today's Reuters article includes the following takeaways:
On Google Analytics: "There is an inherent conflict of interest there," said Brian McAndrews, chief executive of aQuantive, a company that is both a big buyer and reseller of Google advertising but also a rival supplier of ad measurement tools. "Am I going to use Google to measure my search results on Microsoft and Yahoo? Am I going to use Google to measure my advertising results on ESPN?" McAndrews asked rhetorically during the Reuters Media and Advertising Summit on Thursday.
Lauren Rich Fine from Merrill Lynch recently told clients: "However, Google is starting to attract negative publicity (tied to) its foray into other mediums but from a consumer perspective it's still 'all good.'"
Btw, Battelle recently had an excellent post (with lots of comments) on what he called Google's "tipping point."
In my most recent round of conference presentations to search consumers, I've started to notice more interest in what other search companies are doing and how to use these tools. That said, Google is still number one.
On Charging Marketing Firms [David] Verklin, [chief executive of Carat America] (owners of IProspect) complains Google has begun charging marketing firms like his own $50,000 a month to use Google's ad buying system. He adds: "We're going to try and convince (Google) we think that's a bad idea," Verklin said. "I don't want to have to use one tool to manage Google and my own tool to manage Yahoo and Ask Jeeves and everyone else," he said of conflicts between ad systems.
What does Google have to say? "There's this notion that Google has a grand master devious plan" to put ad agencies and publishers out of business, [Marc] Leibowitz [Google's director of strategic partnerships] said. "Nothing could be further from the truth. We see ourselves in a symbiotic relationship with them."
Posted by Gary Price at 5:55 PM | Permalink
Yes, it's time for another Google TV job posting!
This time for engineers. The title of the job is: Software Engineer, Television Technology - Mountain View and Google is looking for, "well-rounded software engineers with a proven track record in creating and deploying robust high-volume interactive TV applications and services."
In September, Danny blogged about a posting discovered by Adam Lasnik to be a product manager for Google TV. A couple of days later, I went looking for the posting and it was gone, never to be seen again.
Now, over to Yahoo. How's this for a job title: + Search Relevance and Monetization Researcher As a Search Relevance and Monetization Researcher, you will help to improve the relevance and revenue of our Web search and sponsored search products.
Posted by Gary Price at 5:25 PM | Permalink
Perhaps Google's most famous new hire, Vice President and Chief Internet Evangelist, Vint Cerf, has sat down with Juan Carlos Perez of the IDG News Service for a brief Q&A interview that's posted here. Here are a few selected passages from the interview.
+ On Google losing its focus as new services are added. Cerf says: Absolutely not. What's happening here is the aggregation of a remarkable collection of people, all of whom have a very visceral and strong appreciation for what is possible to do with software and information. And they are exploring a variety of ways in which to make these computer-driven tools more useful and also more cross functional. The focus isn't simply on search. The focus is on making information discoverable and useful, so all of these things you see happening at Google are side effects of expanding on the original paradigm, which was making search an effective tool. Now we're looking at how to make other information activities more effective and relevant.
+ On Mashups Cerf tells IDG: I can't tell you how excited I am about it. We know we don't have a corner on creativity. There are creative people all around the world, hundreds of millions of them, and they are going to think of things to do with our basic platform that we didn't think of. So the mashup stuff is a wonderful way of allowing people to find new ways of applying the basic infrastructures we're propagating. This will turn out to be a major source of ideas for applying Google-based technology to a variety of applications.
+ On Competition One way to get ahead is to stay ahead, and Google is working very hard to make sure it explores as many new ideas as it can. You won't find Google resting on any of its laurels and letting the grass grow.
+ On Google Book Search/Google Library On the Google [Book Search controversy], I don't think we explained as carefully as we should have how this was going to work and how we would protect the interest of the publishers. And the publishers have leapt to a conclusion which is not supported by what we're trying to do. Part of my job is to articulate that more carefully and I hope we can overcome the concerns that have been expressed.
Posted by Gary Price at 6:04 PM | Permalink
Google-Mart from PBS's Robert X. Cringely covers Google apparently having developed incredibly compact data centers that can fit into a shipping container, suitable for perhaps dropping at internet nexus points. Purpose? Cringely speculates that high speed data centers plus its own bandwidth means Google will effectively have its own internet, giving it a competitive advantage that others can't meet.
Posted by Danny Sullivan at 9:51 AM | Permalink
Like he does at many of the Search Engine Strategies conferences, RustyBrick (Barry Schwartz) is blogging from the WebmasterWorld PubCon currently underway in Las Vegas. This post: Coffee Talk with Senior Google Engineer: Matt Cutts, offers a great Q&A style review (not an official transcript) of today's hour long session. Kudos to Barry for making it available.
Here are two of the more interesting Q&A's from the audience:
Q: CSS positioning? How does it affect ranking? A: Good question, I don't know. If you're doing an include, it probably won't matter either way. In his mind, positioning text at top or bottom is overrated. But try it.
Q: Google Analytics, can you confirm that Google will be using that data in the search engine? A: He can't confirm, but he can deny it. :) Matt, as a Web spam team member, does not have access to this data. He won't even ask for it. If it becomes a concern, he will post it on his blog. People will always be concerned, so don't use it.
Postscript: Aaron Wall was also at the Matt Cutts session and he shares his overview here.
Posted by Gary Price at 4:52 PM | Permalink
News.com has a nice mention of long-time search watcher Stephen Arnold having compiled more than 120 patents he believes belong to Google on a CD. Want to get them in one go? Visit his site, pay your $50, and there you go. Gary, of course, regularly posts here about patents and links to where you can download them for free (use that Legal: Patents link below this post if you are an SEW member for a fast way to see his past posts). But if you want to save yourself some time and love reading patents, this looks like an easy way to go.
Posted by Danny Sullivan at 8:26 AM | Permalink
A news release from Google (they never slow down) announces that the folks in Mountain View are getting together with the two universities in Oregon to support open source technology development. Google is making a $350,000 contribution, "to a joint open source technology initiative of Oregon State University and Portland State University."
From the announcement: With the grant, the universities will collaborate to encourage open source software and hardware development, develop academic curricula and provide computing infrastructure to open source projects worldwide.
Posted by Gary Price at 2:53 PM | Permalink
Google's top exec team is hitting the road. At Web 2.0, Google's Omid Kordestani chatted with JB. Today, Google CEO Eric Schmidt spoke at the Association of National Advertisers Annual Conference in Phoenix. His presentation was titled, "Technology Is Making Marketing Accountable." News.com's Elinor Mills was there and reports in: Google ETA? 300 years to index the world's info.
If this 300 year until Google has it "all done" number sounds familiar, it is. Schmidt said the same thing, sort of, back in June when he spoke to Delaney and Barnes at The Wall Street Journal (see Battelle's summary here). News.com even posted on the 300 year comment back then.
I said "sort of" a moment ago because today's comment is slightly different from what Schmidt said in June, although I think he's talking about the same thing. In June, Schmidt was quoted saying it would take 300 years to "organize" all of the world's information. Today, he said it will take 300 years to "index" all of the world's info and make it searchable. In other words, that's when Google's mission will be reached. The 300 year number was obtained after Googlers did a math exercise.
Well, unless Google is working on some type of human longevity program (who knows, maybe they are, how about some sort of vitamin called "GoogleNim Chewables" featuring all of your favorite Google personalities), none of us will be around to see Google's mission accomplished. Nevertheless, this 300 year estimate makes for interesting talk and keeps Google's name in the press (what else is new). Maybe Bill Gates will soon say it will take MSN just 200 years to get it all done. Now, there is a feud for you. (-:
Indexing is Not Always The Same as Organizing
In my opinion, there is a big difference between indexing content (putting it into a database) and then organizing/providing access to it and allowing people to find what they're looking for without much effort. As the database grows larger the organization becomes even more important.
Once the content is identified, licensed (if needed), made digital (if needed), the actual indexing is relatively easy, especially when compared to the organizational part. Access and organization are two different things.
Even today, it's possible to be indexed by a large web engine but that doesn't mean your site(s) will be easily found (especially since most people only look at the first few results) and every site can't be on the first page. Of course, being able to judge the quality, currency, accuracy of the info (critical info skills) is more important than ever before but that's another story for another time.
What About 50 Years?
On a related note, in Chris "SearchDay" Sherman's new book Google Power, Sherman tells the story of a conversation he had with Craig Silverstein, Google's CTO, where Silverstein estimated it would take Google 50 years to completely crack the invisible/deep web problem of pulling useful data out of large/specialty databases.
Of course, 50 years is also a long time (I just turned 40) but I'll again say that just having data in a large, often uncontrolled database, doesn't mean people will find it. I think these are some of the reasons that so much money and effort is being spent on developing verticals (aka specialized databases) both from the large search companies as well as many smaller players. Smaller, focused databases, can also help contribute to a perfect search.
More From Mr. Schmidt
So what else did E.S. have to say today? Here are a few highlights.
+ When he [Schmidt] arrived at Google four years ago he was a skeptical consumer of text ads. "You've got to be kidding! People actually click on this stuff? And they do."
+ Technology and the interactivity it enables, such as the ability to measure an Internet ad's success rate by viewing how many people click on it, is shifting power in the advertising industry from executives at corporations to consumers, he said. "The power is moving from us to the end user; it's occurring by the power of the personal computer, by the power of the cell phone," he said. "Thirty years ago we would make the decision (about ads). Now, that person, that individual makes that decision."
+ Schmidt predicted there will always be ads on the Internet but that there may be an "ad-free subset" of the Internet that might offer a different way for people to pay for things, such as using micro-payments.
Note: Does "Google Wallet" and online payment systems sound familiar? Btw, I just blogged this week that the company is recruiting for Inside Sales people for Google Payment Systems.
+ On Google Wi-Fi in San Francisco Schmidt mentioned that the plan arose out of work that several engineers did on a system that would allow companies to make money offering such a service. "It's an interesting experiment," he said. "If it scales and if it is successful, we think it's going to be very good for the world."
+ On U.S. Copyright and Google Library Project A "fair use" provision under the law allows for excerpts of copyrighted material to be used and Google will only display snippets of copyrighted text, he said. "That model seems to be durable," he said. "We're very, very careful if copyright is owned..."
I do my best to explain the basic differences between Google Print for Publishers and Google Library in this post.
Much more in the Elinor Mills article: Google ETA? 300 years to index the world's info.
Postscript: If you're interested in learning about just how much info is out there, make sure to take a look at the How Much Info? 2003 research project from UC Berkeley.
Postscript 2: Credit to Google Blogoscoped's Philipp Lenssen for a comment he sent to me via email. Lenssen said that it's likely that Google will release their full product in 100 years but keep it in beta for 200 years. This comment has had me smiling and laughing all afternoon because PL (as always) is right on the money.
Posted by Gary Price at 7:39 PM | Permalink
Spotted on Searchblog and elsewhere, news that Google demonstrated a tool at Web 2.0 today that uses pattern recognition to determine sex in photos. You're right John, it sure sounds cool. I can't wait to see it (no pun intended) in action. Plenty of other companies and organizations are also doing work in related areas like finding visually similar imagery.
I've posted about just a few of them including Cydral and LTU Technologies in the past. More about LTU here including presentations from the Search Engine Meeting. Finally, Freenet.de from Germany (interface translated here via Yahoo) allows you to limit your search for imagery that must include the face of a human. Another option allows you to limit to terms that actually appear in the image itself. To give it a go, open another tab, and run your search using the German language interface (and some German words).
Posted by Gary Price at 9:55 PM | Permalink
Google, NASA sign `a very big deal' from the San Jose Mercury News gives the rundown on the aforementioned plans by Google to expand onto NASA's Ames facility in Mountain View. It's not just getting more space. It's also about collaborating with NASA scientists and getting NASA data, as well. So literally, the sky's not the limit, for Google.
In particular, Google will get access to scientists behind the supercomputing technology that NASA has developed, the article says. Google also gets more access to NASA space data and images, additional fodder for Google Maps and Google Earth, no doubt.
"We already have Google Earth....We'd like to have Google Mars and Google Moon," Google's Peter Norvig is quoted as saying.
NASA Takes Google on Journey Into Space is the official press release from Google with more details, and the NASA version is here.
Postscript from Gary: Google already owns many of the domains that they might need for outer space exploration. See this collection of domains that Google registered a few months ago including GoogleMoon, GoogleMars, and GoogleNeptune.Posted by Danny Sullivan at 7:14 AM | Permalink
Legendary computer programmer Peter Weinberger now works at Google as a software engineer. Unix geeks might already know that Weinberger is the "W" in the AWK programming language, named for its developers. In an interview (free access) with Laurianne McLaughlin from IEEE's Security & Privacy magazine, Weinberger talks Google, future search, and privacy.
From the interview: Security and Privacy: What are the biggest technology challenges for Google today?
Weinberger: Scale is the problem. Our business grows rapidly. That means every year, a lot of the technology decisions made a year ago don't look so good any more. Exponential growth is a very pleasant problem but requires a lot of work.
Posted by Gary Price at 3:39 PM | Permalink
A new compilation with direct links to eight research papers (the actual papers are not new) by Google engineers is now online. All of the papers deal directly with Google technology. A link to the page can be found at the bottom of the Google Labs home page. The original "Papers written by Googlers" compilation remains online and is available here.
Posted by Gary Price at 11:55 AM | Permalink
What's Google CEO Eric Schmidt's advice for getting new ideas flowing at Google? During a speech at the Gartner Symposium/ITxpo in San Francisco he said that letting the engineers "run rampant" is what works at the Googleplex.
"The most clever ideas don't come from the leaders, but rather from the leaders listening and encouraging and kind of creating a discussion," he said. "Wander around and try to find the new ideas."
He also told the audience that an open-door policy is best in the IT world.
"You want to see every conceivable demo, no matter how wacky it is," he told the audience. "People love that. They get a chance to present to someone important like yourselves. All of a sudden the whole (corporate culture) becomes about leadership and innovation."
Finally, Schmidt had a comment about the Google web browser speculation that arose after the company recently hired a couple of Mozilla engineers.
Schmidt downplayed this speculation Wednesday, saying that the Mozilla hires played into his company's strategy of supporting both Internet Explorer and Mozilla's Firefox browser. "We have decided to work on a browser-independent strategy," he said. "We don't want to be specialized on any particular one, so that's why these people are working at Google."
More in the article: Google CEO says 'set the engineers free'
Posted by Gary Price at 12:22 PM | Permalink
Still haven't found time to read through that pending Google patent that all the search forums are abuzz with? Never fear, others are breaking it down for you.
Google's Patent: Information Retrieval Based on Historical Data from Rand "Randfish" Fishkin at SEOmoz nicely summarizes the five most critical concepts he finds:
On that last part, I'd say actually this patent application seems to cover many things that people have long speculated any search engine might do. And as I wrote before, whether Google is actually doing any of this is uncertain. I have no doubt some ideas expressed are being used. Other ideas probably aren't.
Spotted via SEO Book, Google Patent Analysis from the Wolf-Howl blog is another good summary of key factors coming out of the application.
Over at the Search Science blog, New Google patent proves "sandbox" exists from Xan Porter is actually his take on why the patent doesn't seem to explain the sandbox operation some feel Google has been following.
For further resources, see the rundown on forum threads I covered in my earlier post: New Google Patent May Give Sandbox & Inner Workings Info.
Want to discuss? Join our forum thread, Does New Google Patent Validate Sandbox Theory?.
Posted by Danny Sullivan at 2:31 PM | Permalink
Yesterday, we linked to a News.com article about how Google's data centers operate. Want to learn more? Today, Philipp reminds us about an October 2004 presentation at the University of Washington by Google engineer Jeff Dean. We first linked to it in October. Dean's lecture is titled: Google: A Behind-the-Scenes Look. You can view a webcast of the presentation here.
More Interesting "Search" Lectures Available on the WWW: Most are archived webcasts.
+ Eric Schmidt at Stanford Business, 2004
+ a9's Udi Manber speaks at the University of Washington, 2004
+ Amit Singhal from Google discusses (slides only) the challenges in running a commercial search engine, 2004
+ Eric Schmidt at the Berkeley in Silicon Valley Symposium, 2003
+ Google's Urs Hölzle presentation about Google's Linux cluster at the University of Washington, 2002
+ Soumen Chakrabarti from the Indian Institute of Technology discusses web resource discovery at the University of Washington, 2000
Posted by Gary Price at 1:15 PM | Permalink
Those of you with an interest in how Google's data centers operate will want to take a look at the News.com article: Google's secret of success? Dealing with failure. It offers highlights from a presentation by Urs Hoelzle, a vice president of engineering and of operations at Google. Hoelzle spoke at EclipseCon on Wednesday.
For all its built-in redundancy in case of failure, the system doesn't address all problems, Hoelzle revealed. During the presentation, he showed a photo of six fire trucks responding to an emergency at a Google data center in an undisclosed location. He would not reveal any specific details on the mishap except to say that "it wasn't about one machine going down."
In a follow-up interview with CNET News.com, Hoelzle said the cost of power is another important factor in Google's data center designs.
In a 2004 presentation at Stanford, Eric Schmidt talked a bit about Google technology including "power" issues.
Posted by Gary Price at 11:47 AM | Permalink
In the past few weeks, a couple of members of Google's leadership have been sharing their thoughts and views in public forums.
First, Google's VP of Engineering, Adam Bosworth, spoke to The Gillmor Gang (you can listen online) about future search engine architecture, personalization, and RSS. Findory's Greg Linden responds to some of Bosworth's comments with his take on the value of personalization.
Second, Google Blogoscoped points us to a transcript of a presentation by Peter Norvig, Google's Director of Search Quality. Norvig discusses semantic web ontologies, automation, and other issues.
Posted by Gary Price at 1:43 PM | Permalink
A new Google Research Centre has just opened in Tokyo according to this Computer Weekly story.
If you're wondering: 1) What type of research will the centre focus on? 2) How many people does Google plan to hire in Tokyo?
The answer is the company doesn't know (or isn't saying).
The centre's role in Google's global R&D network is yet to be defined and will be shaped by the people who are hired to work there, said Howard Gobioff, the centre's engineering director and principal engineer...Google does not have a set number of engineers it is looking to hire for the Tokyo centre and this will depend largely on the quality of people who apply, Gobioff said. The company is looking for "really smart people" who have an interest in building things, are flexible and want to take on new problems in new domains, he said.
This is Google's fourth R&D centre. The others are in the US, Switzerland and India.
Posted by Gary Price at 8:22 AM | Permalink
The magic that makes Google tick from ZDNet has a look at technical details behind delivering Google searches. But, I've got a few quibbles:
OK, enough with the quibbles, which in fairness I could also raise with Google's competitors. See the rest of the article for some technical details on Google data centers, the fact that there's not been a complete system failure since February 2000, and more.
Posted by Danny Sullivan at 12:56 PM | Permalink
It's very likely that many of you have read The Anatomy of a Large-Scale Hypertextual Web Search Engine (1998) by Google co-founders Sergey Brin and Larry Page. The paper was written while they were students at Stanford.
Here's a short bibliography of other papers that the "Google Boys" wrote while members of the Stanford Database Group.
+ Extracting Patterns and Relations from the World Wide Web (1999) by Sergey Brin
+ Dynamic Data Mining: Exploring Large Rule Spaces by Sampling (1999) by Sergey Brin and Larry Page
+ The PageRank Citation Ranking: Bringing Order to the Web (1999) by Sergey Brin, Rajeev Motwani, Larry Page, and Terry Winograd
+ Efficient Crawling Through URL Ordering (1998) by J. Cho, Hector Garcia-Molina, and Larry Page
+ Copy Detection Mechanisms for Digital Documents (1995) by Sergey Brin and Hector Garcia-Molina
+ Near Neighbor Search in Large Metric Spaces (1995) by Sergey Brin
Posted by Gary Price at 10:51 AM | Permalink | Comments (0)
Greg Linden from Findory alerts us to a presentation by Jeff Dean from Google. It took place earlier this week at the University of Washington in Seattle.
The presentation is titled: Google: A Behind-the-scenes Look and an archived version is now viewable (Real or MS) on the web. Greg's review of the presentation is also online.
Here are a few other presentations that might be of interest.
+ Seminar Presentation: Challenges in Running a Commercial Search Engine (3.5 MB; PDF) From the IR perspective, interesting! A presentation by Amit Singhal, Senior Research Scientist at Google. It was the keynote address at IBM's Second Search and Collaboration Seminar 2004 in Haifa.
+ View a Presentation by Google CEO Eric Schmidt Eric Schmidt delivered this presentation (it runs about one hour, Windows Media) at UC Berkeley during the EECS Annual Research Symposium in February.
+ And one from a non-Googler. Udi Manber, top guy at a9, spoke at the University of Washington last November. You can watch an archived version of his lecture here (RealVideo). It's titled, "The World's Information at Everyone's Fingertips."
Posted by Gary Price at 6:18 PM | Permalink | Comments (0)
A presentation (PowerPoint slides) titled Challenges in Running a Commercial Search Engine (3.5 MB; PDF) might be of interest to some of you.
The slides come from a keynote presentation by Amit Singhal, a Senior Research Scientist at Google.
The presentation was given in Israel on February 16th at IBM's Second Search and Collaboration Seminar 2004.
Posted by Gary Price at 8:51 AM | Permalink | Comments (0)