As Kevin Newcomb mentioned, Danny Sullivan had an outstanding write-up yesterday about Google's enhancement of its "Link:" operator, which allows researchers to discover many of the links that Google has indexed as pointing to a particular URL: Google Releases New Link Reporting Tools.
Google will allow users of its Webmaster Central tools to see more thorough reports of inbound links as measured to domains and even particular pages. More information is also available at the Webmaster Central blog.
This underscores the importance of working with Google by signing up with Webmaster Central. Not only will it help to get important pages of a Web site indexed, but it will also assist webmasters in conducting important competitor analysis. In the past, many researchers have almost completely ignored the Google "Link:" command, or operator, since it is known that Google does not display all of the links it knows about. Others have continued to use it, thinking that the ones that Google shows "must be more valuable" than others.
This has in fact been an often discussed topic at the Search Engine Watch Forums, where a sticky thread discusses the topic of the difference in inbound link reporting at various engines, and reveals that the current link discovery tool of consensus choice is the one found at Yahoo! Site Explorer. Although it is unlikely that those many converts will now abandon Yahoo! to use only the Google enhanced version, this news has made many webmasters and search engine optimization specialists happy.
(Begin editorial) Our engineers at Avenue A | Razorfish love to use the Webmaster Central tools, but, believe it or not, sometimes have problems with getting clients to approve the use, since it requires verification code to be placed on the Web site. Perhaps if Google would be more open about sharing its information without requiring this code, it would get a better reputation with some marketers that feel that they require too much "inside information." Google has done a great job in helping webmasters with their Web sites, but still needs to improve its relationship and willingness to work with agencies and other SEO companies, in the opinion of some. (/end editorial)
Posted by Chris Boggs at 11:13 AM | Permalink
Rand over at SEOMoz has a great write up on Long List of Link Searches where he goes through a sample client and how he would approach the competitive intelligence aspect of the SEO research. This is a must read for any SEO because he goes through "the obvious," some "advanced operators," "alternative search sources," "directory search terms," "blog & forum searches," and "submit type searches." The best part is that this is a practical example that gives you actionable items to run through in your own SEO practices. Gotta love Rand for doing this, most would not.
Posted by Barry Schwartz at 9:03 AM | Permalink
Windows Live Search has added a new linkfromdomain command designed to show you all the sites that a particular domain links to. For example, want to see all the sites we link to off the Search Engine Watch Blog? Do this:
linkfromdomain:blog.searchenginewatch.com
Well, that's supposed to work. Right now, it doesn't. I suspect there's an issue with subdomains. Instead, I have to use the root domain of searchenginewatch.com, like this:
linkfromdomain:searchenginewatch.com
Windows Live Search has other link-related commands, and they explain more about these in the Search Macros: LinkfromDomain post on the Live Search blog today, complete with a color-coded chart. I'll run through them as well.
Want to know who links to a particular domain, such as the US White House site? Use the linkdomain command, like this:
As Windows Live explains, you can use the two commands together to see who links to each other. For example, want to see all the links from one site to another site, such as from Search Engine Watch to TechCrunch? Use the linkfromdomain command in combination with the site command, like this:
linkfromdomain:searchenginewatch.com site:techcrunch.com
That brings up 172 links. How about the reverse? Easy:
linkfromdomain:techcrunch.com site:searchenginewatch.com
That brings up 81 links. Unfortunately, I don't see a way to know exactly what specific page at one site links to that at another. Nor does it seem possible to see reciprocal links. In other words, the commands don't allow you to see which exact pages between two sites may link to each other. Hey, a boy can hope, can't he? Note that the page specific link: command might allow this -- I just haven't had a chance yet to find two pages I know link to each other to try it.
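The two cross-site queries above follow one pattern, so a small helper can generate both directions at once. This is only a minimal sketch in Python: the function name `cross_link_queries` is made up for illustration, and it merely composes the query strings shown in this post; it does not contact Windows Live Search.

```python
def cross_link_queries(site_a: str, site_b: str) -> tuple[str, str]:
    """Return the (a -> b, b -> a) query strings built from the
    linkfromdomain and site operators described in the post."""
    a_to_b = f"linkfromdomain:{site_a} site:{site_b}"
    b_to_a = f"linkfromdomain:{site_b} site:{site_a}"
    return a_to_b, b_to_a


forward, reverse = cross_link_queries("searchenginewatch.com", "techcrunch.com")
print(forward)
print(reverse)
```

Paste either string into the search box to run it. As noted above, the root domain is used because the subdomain form did not appear to work.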
Last year, Microsoft rolled out something related that was long on my wish list, the inanchor command. It allows you to find pages that have links to them containing certain words. For example:
Should be all the pages that have the word "miserable" in a link pointing at them. Unfortunately, you can't do multiple words. That means this won't work:
Or at least it shouldn't according to the instructions. Instead, that's supposedly processed like this:
inanchor:miserable failure
Which should mean show all the pages that have links pointing at them containing the word "miserable" which also themselves have the word failure on them. The problem is, several of the pages clearly do NOT say failure on them, so it doesn't seem to be working correctly.
My real wish is to be able to see all the pages using certain words in links to other pages. That would allow us to finally see, for example, all the pages linking to the US White House and saying "miserable failure" in the links, helpful for those trying to understand this and other link bombs.
Link domain sort of allows this:
linkdomain:whitehouse.gov inanchor:miserable
That shows you all the links saying the word "miserable" and linking to the whitehouse.gov domain. Or at least it should show. I doubt it's working correctly, because it reports only 189 links. There are many, many more links like this out there.
Similarly, this doesn't seem to work to give you a page-specific report:
link:whitehouse.gov/president/gwbbio.html inanchor:miserable
It reports the same 189 links, suggesting there's no difference between it and the linkdomain command, which I've already said isn't working.
Microsoft has many more commands you can play with as listed here. There's also a special form you can use to play more with the new linkfromdomain commands here.
Posted by Danny Sullivan at 9:41 AM | Permalink
An interesting article in Pandia covers an overlooked area of search: Searching Google for words with accents. The bottom line is that it's difficult and fiddly to do, and it depends on a number of different factors that the searcher cannot control. Let's take the example given by Pandia of Mexico and México. A search will probably return sites that contain either word, but to force the engine to return hits with the accented version, a search for +México will pretty much work (though there may be a few oddities caused by inbound links).
However, results will differ depending on IP address, language of the Google home page being used, and preferred language. Now, this is useful as far as the searcher is concerned, since it should give rather more accurate results, but it's going to be a concern for search engine marketers, since it makes the idea of being (say) #8 in Google a rather moveable feast: is it #8 for Google.com users, or for Spanish language users? Still, no one said that internet searching was supposed to be easy, did they?
Of course, it doesn't help either when we start to look into the results in a little more detail. Searching on Google.co.uk (searching the web, not just the UK), a search for Mexico returns 671,000,000 results, as does mexico (in lower case). However, searching for México gives me 665,000,000 results, but a search for méxico gives 6,000,000 fewer, with 659,000,000. When we get into the insanity of searching for méxico -mexico, with a result of 725,000,000, things certainly get a little more confused again. My one crumb of comfort is that searches on Ask.com tend to be rather more stable, but even then, not perfect. All goes to show: don't trust search engine results!
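To see why an engine might treat Mexico and México as the same word in one context and as different words in another, here is a minimal Python sketch of accent folding using the standard `unicodedata` module. The function names are hypothetical, and this is not a description of how Google actually handles accents; it only illustrates the general idea of comparing words before and after stripping diacritics.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks,
    so that 'é' becomes a plain 'e'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def same_when_folded(a: str, b: str) -> bool:
    """True if two words match once accents and case are ignored."""
    return fold_accents(a).casefold() == fold_accents(b).casefold()


print(fold_accents("México"))
print(same_when_folded("méxico", "Mexico"))
print("México" == "Mexico")
```

An engine that folds accents at index time would match both spellings. An engine that keeps the exact form as well can honor a request such as +México for the accented version only.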
Posted by Phil Bradley at 8:53 AM | Permalink
Matt Cutts from Google has a great follow up on our reports that Google Modifies Navigational Search Results from about two-plus weeks ago. In his post, he explains that when you search on a URL (i.e. www.searchenginewatch.com), Google has stopped showing the information for the URL and now shows a standard search on the words in the URL itself. I learned two things from Matt's post.
(1) Entering the URL of a site into a search box is not labeled a "navigational search"; it is labeled a "URL search." A navigational search is when you search on a company name, i.e. Search Engine Watch, whereas a URL search is when you search on a company URL, i.e. searchenginewatch.com.
(2) To a normal user, bringing back search results for a URL search is more useful than bringing back the information on that URL, in Matt's opinion. If SEOs and webmasters want to pull that information, we can still do this by using the info:www.domain.com command. It works like this for this blog, [info:blog.searchenginewatch.com], and it shows you information for this URL.
Posted by Barry Schwartz at 8:51 AM | Permalink
In the past twenty-four hours I have discovered and documented four different bugs or weird occurrences at both Yahoo and Google. I will cover the four bugs, including adult ads displaying in Google and Yahoo's contextual programs, Yahoo's contextual ads not displaying ads at all, Google's site operator not functioning properly and Google's AdWords statistics not showing the right data.
(1) The first I named Adult Ads Displayed Within Google's AdSense Program? but it actually affects Yahoo as well. Basically, some people found Google AdSense ads that displayed adult oriented content, something that should not happen on AdSense. But what was amusing, was that after I posted this article, people noticed that when the Yahoo Publisher Network ads showed up, they were showing adult ads as well.
(2) If you were not able to load the ads in example one, then it may be because the Yahoo Publisher Network Ads Still Have Accessibility Issues, even after I reported it last Friday. Basically, some ISPs are not able to resolve the DNS information that hosts those Yahoo ads. This was first documented on August 31st, then acknowledged by a Yahoo representative on September 7th and is still an issue today.
(3) The next issue is that Google's Site Operator Shows Sites Off Domain. It does, I have seen screen captures myself, showing someone searching using the site: command and Google returning results from sites off of that domain. I have pictures and more details at the Search Engine Roundtable.
(4) The final bug I found today was AdWords Statistics Mixing the Search & Content Network. So you are an advertiser, you set a campaign to only run on Google's content network, but for some reason, your stats in AdWords show impressions and clicks for that campaign in the search network. This bug is confirmed by Google but described as a small, tiny problem.
There you go, four bugs documented in the past twenty-four hours.
Posted by Barry Schwartz at 9:00 AM | Permalink
I reported this morning that Google has changed the way they handle navigational-like searches. For example, if you do a search on a site's name (i.e. navigational) you now get a different type of result set than you did a week or so ago.
For example, a search on the popular buy.com will no longer show:
Show Google's cache of www.buy.com
Find web pages that are similar to www.buy.com
Find web pages that link to www.buy.com
Find web pages from the site www.buy.com
Find web pages that contain the term "www.buy.com"
Instead it will show you results that match the keyword phrase "buy.com." That includes links to possible competitors. I wonder if that will upset geico.com?
In any event, I have compared how Google, Yahoo, MSN and Ask.com handle these types of navigational queries at the Search Engine Roundtable.
Posted by Barry Schwartz at 9:27 AM | Permalink
The Yahoo Search Blog defines which queries will be redirected from Yahoo Search to Yahoo Site Explorer. Remember on July 11th when we reported that Yahoo Tests Redirecting Some Searches To Site Explorer? So which queries exactly do this? Queries in the format of site:ysearchblog.com or link:http://www.ysearchblog.com/archives/000341.html or linkdomain:ysearchblog.com, but not ysearchblog.com or ysearchblog or site:ysearchblog.com webmasters (looking for ysearchblog posts mentioning webmasters) or link:http://www.ysearchblog.com/archives/000341.html Danny Sullivan (looking for links to the article mentioning Danny Sullivan) or linkdomain:ysearchblog.com site:yahoo.com (looking for links to ysearchblog from within yahoo.com). More details at the Yahoo Search Blog.
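Reading those examples together, the pattern seems to be that a query is redirected only when it consists of a single site:, link: or linkdomain: operator and nothing else. The Python sketch below encodes that reading. The rule is inferred from the examples above rather than taken from a Yahoo specification, and `would_redirect` is a hypothetical name.

```python
REDIRECT_OPERATORS = ("site:", "link:", "linkdomain:")


def would_redirect(query: str) -> bool:
    """Guess whether a query matches the redirect pattern: exactly one
    token, starting with one of the operators and carrying a value."""
    tokens = query.split()
    if len(tokens) != 1:
        return False  # extra words or a second operator: stays on web search
    token = tokens[0]
    return any(
        token.startswith(op) and len(token) > len(op)
        for op in REDIRECT_OPERATORS
    )


for q in (
    "site:ysearchblog.com",
    "linkdomain:ysearchblog.com",
    "ysearchblog.com",
    "site:ysearchblog.com webmasters",
    "linkdomain:ysearchblog.com site:yahoo.com",
):
    print(q, "->", would_redirect(q))
```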
Posted by Barry Schwartz at 11:43 AM | Permalink
During the Bot Obedience Course session at SES San Jose just a few minutes ago, Rajat at Yahoo announced a new upgrade to the Site Explorer tool they initially launched last year. The additions at Site Explorer include:
- More information about sites you own, including:
-- Last Crawled Date and Language for your Site URLs
-- Subdomains of your site
- Feed submissions are much smoother. You can submit RSS, Atom and URL lists, and manage all of them from one place. For authenticated sites, you can also track when they were submitted and processed.
- UpdateNotification Web Service to notify us of feed or site updates, part of the suite of Site Explorer APIs you already know and love. Since these return the same data as the tool, we recommend using them for automated applications.

The tool really reminds me of Google Sitemaps, now Google Webmaster Central.
I hope to take a deeper look after the SES conference.
Update: More details just posted at the Yahoo Search Blog.
Posted by Barry Schwartz at 5:39 PM | Permalink
Gary Price notes that Ask.com has added support for ISBN number searches. For example, conduct a search on 091096551X, which is the book Gary Price and Chris Sherman wrote, and you should notice an image of the book and a link to compare prices. This is a neat feature, but I should note that entering the same ISBN number at Yahoo Search gives you a Yahoo Shortcut that takes you to comparison shopping in the Yahoo Network. It was a nice feature for Ask.com to add.
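How Ask.com decides that a query is an ISBN isn't described, but one plausible ingredient is the standard ISBN-10 check digit, which lets software tell a real ISBN from a random ten-character string. The Python sketch below applies that check to the example above. The function name is hypothetical, and nothing here is claimed to be Ask.com's method.

```python
def is_valid_isbn10(candidate: str) -> bool:
    """Validate an ISBN-10: weight the ten characters 10, 9, ..., 1
    (a final 'X' counts as 10) and require the sum to be divisible by 11."""
    s = candidate.replace("-", "").upper()
    if len(s) != 10:
        return False
    total = 0
    for position, ch in enumerate(s):
        if ch == "X" and position == 9:
            value = 10  # 'X' is only legal as the check digit
        elif ch in "0123456789":
            value = int(ch)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


print(is_valid_isbn10("091096551X"))
```

For 091096551X the weighted sum is 220, which is divisible by 11, so the string passes the check.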
Posted by Barry Schwartz at 10:14 AM | Permalink
Yahoo is testing out redirecting some of those who conduct a link command or site command search at search.yahoo.com to the Yahoo Site Explorer tool. I reported this and just now received confirmation from Yahoo that they are testing out this solution with a "percentage of users" conducting these searches. Yahoo has always wanted to move these special searches off the main search.yahoo.com page and onto the Site Explorer front.
In other Yahoo news, Yahoo just announced a weather report stating, "we rolled out an index update last night. As usual, you may see some changes in ranking as well as some shuffling of the pages that are included in the index."
Want to discuss the Site Explorer change in our forums? Join the discussion named Yahoo operators re-directing to Site Explorer.
Posted by Barry Schwartz at 2:44 PM | Permalink
Rand has an excellent post on how to get your hands dirty by manually checking your links at the various search engines. He reviews Google's link command and how bad it is. He also reviews MSN's link command and explains how you can add modifiers to the link or linkdomain commands (i.e. exclude site A from the command). Rand then reviews the Yahoo link command, and explains that although Yahoo has Site Explorer, the "most accurate" result set still comes from search.yahoo.com. He recommends you use search.yahoo.com and then append &b=999 to the end of the URL manually. Like MSN, you can add modifiers to the Yahoo link commands. This is a great post for those who want a refresher on the link commands available to you, plus learn a few new tips on them.
Posted by Barry Schwartz at 8:52 AM | Permalink
I've written before about Google giving strange results counts and why maybe it's time for them to go. Yesterday, I came across the oddest ones ever, when doing some typical searches to gauge the size of the index.
Here's an example. Search for xxkjdiuenmnmd8i, which when I just did it came back with no results. Now search for -xxkjdiuenmnmd8i. In theory, that should show the size of the Google index, all the pages it has.
In reality, that type of search hasn't often worked. It was only last September that this type of index estimation technique gave any results at all. Even then, I didn't trust that the numbers were accurate. Still, they seemed better than what's coming up now. Look at the screenshot below:
Ten results? Only ten results, for a search technique that last month would have come up with more than 25 billion? Something funky is going on.
Finding it odd, I tried a search for the, often useful as a fast way to get a sense of how big Google might be, at least for the number of English language pages it has. The query came back with 23 billion matches. So how about -the, I tried, just out of curiosity. Ten matches:
Ten? Ten?!!! And more strangeness. A search for -and, -cars, -movies all did the same thing. The results were different in various ways, but the count was always only 10 matches, when it should be much more.
Note that the results all have additional information that make them appear to come out of Google Base. It all suggests that Google has disabled counting for queries involving a single word, but that somehow, Google Base integration is still happening to throw things off. It might be that Google is still doing a call to Google Base, asking for the top 10 results that it has, in order to integrate those results into a regular web search listing. But because it also has disabled display of regular web search results for a single negative word query, it's only Google Base that shows.
Going back to my post from last month, Google, Kill The Web Search Counts!, I explained how Google had stated that the counts reported for a spam site that was removed were much inflated by a counting glitch. I talked with Google about this and some other issues last week just before leaving for my trip to SES Latino in Miami, where I am now.
Some of what I talked about with Google's Matt Cutts and other engineers at Google has already been addressed in a recent blog post. The issue of counts came up, and I'll do a longer post on what Google said after I get back from this trip and clear what I can discuss. The short answer is that they are aware of the issues and are looking to correct things. These strange results counts might be part of that.
More later when I'm back from my current trip, or watch Matt's blog, in case he posts before me.
Posted by Danny Sullivan at 8:12 AM | Permalink
Vanessa Fox from Google Engineering posted at the Inside Google Sitemaps blog that Google found a bug with the site search command. The post explains that part of the reason people are noticing indexing issues at Google is this bug. Two of the "few bugs that affected the site: operator" involve using the site command with a trailing slash (i.e. site:www.example.com/) or trying it on a hyphenated domain name (i.e. site:www.example-site.com). Google says they will have it fixed within a few days, but until then, use the syntax site:www.example.com. I have the forum roundup on this bug at the Search Engine Roundtable.
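If you run site: checks from a script, the workaround can be applied before the query is sent. The Python sketch below strips the trailing slash and flags hyphenated hosts, which the post says are also affected and for which no workaround is given. Both function names are hypothetical, and the code only rewrites the query string.

```python
SITE_PREFIX = "site:"


def normalize_site_query(query: str) -> str:
    """Drop any trailing slash from a site: query, per the suggested
    workaround. Other queries are returned unchanged."""
    if not query.startswith(SITE_PREFIX):
        return query
    return SITE_PREFIX + query[len(SITE_PREFIX):].rstrip("/")


def has_hyphenated_host(query: str) -> bool:
    """True if the host part of a site: query contains a hyphen."""
    if not query.startswith(SITE_PREFIX):
        return False
    host = query[len(SITE_PREFIX):].split("/", 1)[0]
    return "-" in host


print(normalize_site_query("site:www.example.com/"))
print(has_hyphenated_host("site:www.example-site.com"))
```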
Posted by Barry Schwartz at 9:08 AM | Permalink
TMCnet.com has an interesting article explaining two new projects Microsoft is working on in relation to MSN Search. The first is something named "Wild Thing" that enables short hand searching. It was first designed for mobile Web searching, to allow users to type short hand, but now if you don't know how to spell "Schwarzenegger" you can type "ar* sc*w mo*" into the engine and it will try to figure it out. The second is something named "Nocturnal" and this shares bookmarks and Web browsing activity with your MSN Messenger buddies. Nocturnal can also be used to learn your Web behavior and tailor search results specifically for you.
Posted by Barry Schwartz at 3:16 PM | Permalink
Can't remember the special commands that help in doing special searches at Google (and often work elsewhere, as well)? The latest edition of Google's newsletter for librarians points to two posters you can print with the commands. Suitable for framing -- well, for tacking to a wall, you might find them handy. Need something more comprehensive? There's also the long-standing Google Cheat Sheet.
Posted by Danny Sullivan at 8:25 AM | Permalink
The MSN Search Blog last night announced that they have released a new search feature named "Search Macros." Search Macros are methods of building an advanced search query and saving that query for later in your search bar. The example the blog uses is searching for recipes. They provide a link to a pre-built macro named livesearch.recipes, which uses the following advanced query:
(-site:toeatgoodfood.com -linkdomain:googlesyndication.com intitle:recipe prefer:cup prefer:serve prefer:cook prefer:food prefer:menu prefer:cookbook prefer:site:www.epicurious.com prefer:site:www.recipesource.com prefer:site:allrecipes.com prefer:site:www.foodtv.com prefer:site:www.recipesource.com)
This combination of advanced query strings allows you to exclude certain sites, look for certain page characteristics and also give preference to other sites. You can then click install this macro and the Macro Search will be saved to your search bar. You can also create your own Macros. For more information, visit the MSN Search blog.
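To make the three roles in that macro easier to see, here is a minimal Python sketch that sorts its terms into exclusions, preferences and plain requirements. The grouping labels and the `classify_terms` name are my own, not MSN's terminology, and the query is copied as published, including the repeated recipesource.com preference.

```python
MACRO = (
    "(-site:toeatgoodfood.com -linkdomain:googlesyndication.com intitle:recipe "
    "prefer:cup prefer:serve prefer:cook prefer:food prefer:menu prefer:cookbook "
    "prefer:site:www.epicurious.com prefer:site:www.recipesource.com "
    "prefer:site:allrecipes.com prefer:site:www.foodtv.com "
    "prefer:site:www.recipesource.com)"
)


def classify_terms(query: str) -> dict[str, list[str]]:
    """Split a macro into excluded terms (leading '-'), preferred terms
    (leading 'prefer:') and everything else, which is treated as required."""
    groups: dict[str, list[str]] = {"exclude": [], "prefer": [], "require": []}
    for token in query.strip("()").split():
        if token.startswith("-"):
            groups["exclude"].append(token[1:])
        elif token.startswith("prefer:"):
            groups["prefer"].append(token[len("prefer:"):])
        else:
            groups["require"].append(token)
    return groups


for role, terms in classify_terms(MACRO).items():
    print(role, terms)
```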
Posted by Barry Schwartz at 8:44 AM | Permalink
Rand Fishkin explores the MSN Search link operator. With this advanced search command, you can find out who links to Matt Cutts, SEO Book, SEO Consultants, but not SEOmoz.org; (linkdomain:mattcutts.com linkdomain:seobook.com linkdomain:seoconsultants.com) (-linkdomain:seomoz.org). This is pretty powerful stuff, check out SEOMoz.org for more information on this.
Posted by Barry Schwartz at 9:47 AM | Permalink
Barry reports that MSN Search allows searchers an option to limit their search to a specific IP address. Example here. Actually, a bit of research points out that this IP search option went live around midyear 2005. Nevertheless, it's good that SER reminds us of it.
MSN Search isn't the only web engine to offer an IP search option. Gigablast also offers this feature. It's documented here with an example.
Web search historians remember that AllTheWeb used to offer an IP search limit. It disappeared in 2004 when ATW moved to the Yahoo search platform.
Posted by Gary Price at 10:56 AM | Permalink
The new Yahoo Open Shortcuts service lets anyone create their own custom search commands for use on Yahoo. Want to navigate to a particular site quickly or have Yahoo remember a particular search string? The new service lets you do this.
This help page provides full details on how the system works, and this page helps walk you through the creation process. Time Saving Search Shortcuts on the Yahoo Search Blog also has more info.
To add a few examples, let's say you want to reach the Search Engine Watch Blog quickly. Using the shortcuts creation page, you give the shortcut a name that you'll enter (let's say "sewb") and the URL (http://blog.searchenginewatch.com/blog). Once you've set this up, you can then enter this into the Yahoo search box:
!sewb
And that is supposed to take you to the Search Engine Watch Blog home page. You can make any number of commands to navigate wherever you like, and you can recall the entire list you've created through this command:
!list
How about saving searches? Sure. Say you always use Yahoo to search against our site to find stories about Yahoo. That would look like this:
site:blog.searchenginewatch.com yahoo
After doing that search, look in the address bar of your browser, and you'll see a URL similar to this:
http://search.yahoo.com/search?p=site%3Ablog.searchenginewatch.com%20yahoo
Now using the creation page, you'd enter a search command/shortcut you want to save the URL to be associated with (let's say sew-yahoo). Then anytime you did this:
!sew-yahoo
that long URL you copied and pasted would automatically be sent to Yahoo, causing it to rerun your search.
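If you'd rather build that long URL than copy it from the address bar, percent-encoding the query reproduces it. This is a small Python sketch using the standard library. The helper name is hypothetical, and the `p` parameter is simply the one visible in the URL above.

```python
from urllib.parse import quote


def yahoo_search_url(query: str) -> str:
    """Percent-encode a query and attach it to the search URL shown
    in the post (':' becomes %3A and a space becomes %20)."""
    return "http://search.yahoo.com/search?p=" + quote(query, safe="")


print(yahoo_search_url("site:blog.searchenginewatch.com yahoo"))
```

The printed URL matches the address-bar example and can be pasted into the shortcut creation page as the URL to save.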
I tested the service before it went public and also 15 minutes after that, and I found the shortcuts I'd made and saved weren't working. Even Yahoo's own shortcuts like !my weren't working. If you find the same, keep trying. I suspect they'll begin operating shortly.
Don't want to use the creation page? Power folks can make any shortcut on the fly right within the search box. The magic weapon is the !set command. Use that followed by the name of your shortcut and the URL to save, and you'll make a shortcut on the fly. For example:
!set sewf http://forums.searchenginewatch.com
would instantly create a shortcut for you called "sewf" that takes you to the Search Engine Watch Forums.
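The behavior of !set, !list and a named shortcut can be modeled in a few lines. The Python class below is a toy model of the commands as described in this post, not Yahoo's implementation. The class name, return strings and error messages are all invented for illustration.

```python
class ShortcutBox:
    """Toy search box that understands !set, !list and !<name>."""

    def __init__(self) -> None:
        self.shortcuts: dict[str, str] = {}

    def submit(self, text: str) -> str:
        text = text.strip()
        if text.startswith("!set "):
            parts = text.split(maxsplit=2)
            if len(parts) != 3:
                return "usage: !set <name> <url>"
            _, name, url = parts
            self.shortcuts[name] = url  # create or overwrite the shortcut
            return f"saved {name}"
        if text == "!list":
            return "\n".join(
                f"!{name} -> {url}" for name, url in sorted(self.shortcuts.items())
            )
        if text.startswith("!"):
            name = text[1:]
            return self.shortcuts.get(name, f"unknown shortcut: {name}")
        return f"search: {text}"  # anything else is an ordinary search


box = ShortcutBox()
print(box.submit("!set sewf http://forums.searchenginewatch.com"))
print(box.submit("!sewf"))
print(box.submit("!list"))
```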
Yahoo's created a number of shortcuts that anyone can use to reach or search popular sites, such as Amazon, My Yahoo and Flickr. Unfortunately, the shortcuts you create can't be shared with others.
In contrast, YubNub that I wrote about earlier this year lets you not only create any number of powerful shortcuts, but once created, everyone else can use them.
For more about that and a similar service called Ambedo, see these past posts:
Browsers Many browsers offer pre-built search shortcut features.
+ For example, some shortcuts are built into Opera. Entering a "g" in the address box and then your search terms will run a Google search, entering a "z" plus search terms runs an Amazon.com search, a "z" plus search terms runs an eBay search. Documentation about all of this is available here.
Worth mentioning that states that these options can't be customized BUT this page says they can. It will take a little hacking but it's easy enough for a non-geek to do it. The problem is I'm not sure it works. Details about what I'm talking about here in a knowledge base entry.
+ Other browsers also offer search shortcuts that can be created with little effort. One of many examples is Netcaptor. This browser calls search shortcuts "Quick Search" and they can be set up in seconds. The documentation is clear and is found in the Help section. The section on "Aliases" might also be worth taking a look at.
Toolbars Three toolbars allow you to customize and add search capabilities direct from any search box. I've mentioned all of them on the blog or in SearchDay during the past few years.
Btw, these services as well as many others allow advanced users to "hack the URLs" and bring very advanced searches directly to the toolbar. Trust me, it's easy and fun. The possibilities are endless.
Posted by Danny Sullivan at 12:15 AM | Permalink
Promised over a month ago, Yahoo Site Explorer is now reality. Yahoo gives the heads-up to everyone here on its Yahoo Search Blog, and how it will show you all pages within a domain, within a particular directory of a domain, all inbound links to a domain and the ability to bulk submit (which was already live earlier and explained more in our earlier post). You can also access through a new Site Explorer API or export data for further analysis. More details also on the help page.
If you're a Search Engine Watch member, I do a thorough exploration of Site Explorer in this article in the members area. Check it out! Or hey, help support the site and the blog by becoming an SEW member! Below, a summary of my wish list items and observations from that members' article:
Want to comment or discuss? Visit our Search Engine Watch Forums thread, Yahoo Site Explorer Now Live!
Posted by Danny Sullivan at 8:46 PM | Permalink
Further to my previous post on the Google index update/size increase, there appears to be a new way to count all the pages within Google. Find a term that doesn't exist, then search for minus that term, and you get a full count. Well, sort of.
This was the technique that we used to be able to use at Northern Light, to verify all the pages it had. AllTheWeb used to have a similar method it gave to Greg Notess, as he says here, that he used as part of his long time documentation of search sizes.
I was emailing with Google last week about wanting that type of command to exist at Google. It didn't work last week when I tried, nor had I seen it working before. But if we had it, then anyone could see exactly the total number of pages Google should have in its index.
That's important, because as I've written, the count on Google's home page doesn't change in line with the index growing. In addition, searching for a common word like "the" sometimes doesn't work well because of stopword issues. There are also plenty of non-English language pages that won't contain the word "the."
Today, I noticed the technique suddenly did work! To see it in action, I've provided an example below. This exact example won't work in the long term, because once this post gets indexed, the word will suddenly exist in Google's index. But you can easily do it with other words.
A search for djfdkjkfjkdjdfk comes back with "Your search - djfdkjkfjkdjdfk - did not match any documents." OK, then we know there are no documents in the Google index with this term.
Now I do a search for -djfdkjkfjkdjdfk. That means, "Show me all the pages you have that don't have this word on them." Since we know that NO pages have that word, asking for all pages without it should show us everything.
Count? About 9,560,000,000 pages. Count on the Google home page? "Searching 8,168,684,336 web pages." So at least, Google should have about 1.4 billion more pages in its index than it currently claims.
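The logic of the trick, and the arithmetic behind the gap, can be checked with a toy example. The Python sketch below uses a three-page, made-up index in which a negated term returns every page that lacks it. It only illustrates the reasoning and assumes nothing about how Google actually evaluates negative queries.

```python
def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return matching page ids. A leading '-' means 'pages that do
    NOT contain this word'. Otherwise, pages that do contain it."""
    if query.startswith("-"):
        term = query[1:]
        return {page for page, words in index.items() if term not in words}
    return {page for page, words in index.items() if query in words}


# Hypothetical three-page index: page id -> words on the page.
toy_index = {
    "page1": {"google", "index"},
    "page2": {"yahoo", "search"},
    "page3": {"the", "index"},
}

nonsense = "djfdkjkfjkdjdfk"
print(len(search(toy_index, nonsense)))        # no page contains the word
print(len(search(toy_index, "-" + nonsense)))  # so every page lacks it

reported = 9_560_000_000      # count for the negative query
home_page = 8_168_684_336     # count shown on the Google home page
print(reported - home_page)   # 1391315664
```

The difference works out to roughly 1.39 billion pages.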
I actually think that's much higher, as I'll explain in a future post. That's why I'm saying this may "sort of" work to show all the pages. Certainly PhilC on our SEW Forums has tried this technique and gotten 11.3 billion results. I can't get the same, but it's just another sign that the counts aren't adding up in the many ways you want to slice them.
By the way, I tried the negative technique at some other places. It won't work for Yahoo and Ask Jeeves. But at MSN, -djfdkjkfjkdjdfk came back with a count of 5,304,186,736, which is right in line with the self-reported figure of 5 billion MSN gave last year.
Of course, even if all the search engines make this technique work, it doesn't necessarily mean we've got apples-to-apples comparisons. What depths are the pages indexed to? How well are duplicates removed? Are these pages actually indexed or just links to pages you know about? Those are just some of the issues.
More important, as I've written before and will come back to again, having higher counts won't mean you're more comprehensive. For more on this, see my post from yesterday, Googlewhacks Show More Signs That Google's Increased Its Index; Time To Drop The Hamburger Count.
Want to comment or discuss? Visit our forum thread, Sept. 2005 Google Index Update & Size Increase Coming?
Postscript: Spotted via Inside Google, Google: Spot the mistake charts how queries on Google are now bringing back more results and estimates the index may now be at 21 billion.
Posted by Danny Sullivan at 9:45 AM | Permalink
The Google Blog reports that three new search shortcuts are now available for users of Google's mobile web search (XHTML) service.
Shortcuts
- Enter: movies [film title] or movie [location]. Get movie showtime info.
- Enter: weather [city or ZIP Code]. Access four-day forecast.
- Enter: Ticker Symbol. Access current stock price (delayed) for NASDAQ, AMEX, and NYSE listed companies.
Other shortcuts listed here.
If you don't have an XHTML-capable mobile browser or just want to check out the mobile service, the user interface is accessible here.
Yahoo Mobile and 4info.net also offer movie showtimes, weather, stock quotes, and other shortcuts.
Posted by Gary Price at 1:45 PM | Permalink
Google Brings Film Showtimes and Reviews Service to the UK
Google has just introduced their film showtime and review service on the Google UK site. It operates just like the U.S. film info service that went live in February.
Searchers can trigger the Google Film service on Google.co.uk, by entering the terms:
Results pages contain showtimes and a link to find reviews from various sources. Pages are designed to allow the searcher to quickly identify positive and negative reviews, to search "within" the reviews, and more. Google Films also shows an "average rating" that's based on all of the reviews in their database.
Film showtimes along with basic info (running times, rating, etc.) are also now available via SMS in the UK. Details and directions here.
If you're looking for info about older films, enter film: [title] or movie:[title] into a web search box to trigger the service.
The only thing that surprised me was that, after spot checking reviews for several popular films, I wasn't able to find many reviews published in the UK media. I would think that this content would also be relevant to Google UK users.
Posted by Gary Price at 12:43 PM | Permalink
MSN Search Gains New Feed Searching Commands
MSN Searches RSS Deeper from Robin Good looks at how MSN Search now has two new undocumented feed searching commands, including the ability to full-text search within feeds.
The first command is feed: that allows you to tell MSN Search that you want to search for material just within feeds it has indexed, rather than across web pages and other documents. It's similar to existing filetype commands that let you limit searching to things such as Word documents or PDF files.
How about an example? Try feed:"hurricane katrina", and you'll get a list of feeds that have that exact phrase within them. Look at the first page listed, a feed from the National Hurricane Center. In the cached copy, you'll see that phrase appearing in the feed.
Keep in mind that only text in a feed is indexed, and that may be different from the text in a post referenced in a feed. In other words, some blogs and other sources don't send out the full text of their posts in feeds. In other cases, some people will write custom descriptions so that what's in the feed will be different than what's on the page.
Overall, the feed: command isn't that wonderful, though it's still nice to have. Use it if you're trying to narrow down a mention that may have happened in the blogosphere, since many blogs have feeds, so this is a way to drill into that subset of content. But plenty of non-blog pages also have feeds, so it's not perfect. Plus, since you're only looking in feeds rather than the full text of posts, you might miss items you are interested in.
The second command is hasfeed: that's supposed to bring back any page that links to a feed that has those words in the feed content, from what I understand. That's not what I found, however. Running hasfeed:google brought back the Google home page first, and that page has no feed on it nor any links to feeds that I can see.
Robin's article has a few examples he was sent by MSN, and it seems like using hasfeed: with site: works better, as a way to see if a particular site offers any feeds. For example, hasfeed: site:searchenginewatch.com does bring back pages within the searchenginewatch.com domain and subdomains that link to feeds -- since we have a link to every feed on every page.
A better combination shown in the examples Robin was sent is something like feed: site:searchenginewatch.com. That brings back only feed content from within the searchenginewatch.com domain and subdomains -- and it did catch all the major ones we have. Our old feed URLs are being used, rather than the new redirected ones. But the redirect change just happened last week, so that's not too surprising.
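To make the combinations above easier to reuse, here is a minimal sketch in Python that assembles the query strings discussed in this post. The function name and parameters are made up for illustration, and since the commands are undocumented, the syntax is only what the examples above suggest, not anything official from MSN.

```python
def feed_query(terms="", site=None, has_feed=False):
    """Build a query string using the feed: or hasfeed: command.

    Hypothetical helper: it only mirrors the examples in this post.
    """
    command = "hasfeed:" if has_feed else "feed:"
    # Quote multi-word terms so they are searched as an exact phrase.
    if " " in terms:
        terms = f'"{terms}"'
    parts = [command + terms]
    # Optionally narrow the search to one domain and its subdomains.
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


print(feed_query("hurricane katrina"))
print(feed_query(site="searchenginewatch.com"))
print(feed_query(site="searchenginewatch.com", has_feed=True))
```

The three calls reproduce the phrase search, the feed: plus site: combination and the hasfeed: plus site: combination from the examples above.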
Posted by Danny Sullivan at 9:59 AM | Permalink
How to write queries from Google software engineer Matt Cutts is a nice look at how Googlers internally represent queries, so that everyone knows what was actually entered in the search box. It's a good approach and one I'll be adopting in general, for the most part.
Being clear about queries is crucial. Sometimes people do a phrase search, which brings back radically different results than a regular search. But the same people may not indicate this has happened, when writing about the search.
Matt says that Googlers surround queries as entered into the search box with the bracket symbols. So if you searched for red balls and wanted to say that, you'd write:
[red balls]
If you did a phrase search, one where quotes surround the phrase, you write:
["red balls"]
How about a more complicated search:
["red balls" -blue site:balls.com]
Personally, I think it also helps to think of the brackets as a visual representation of the search box. Picture them as both sides of the search box, with whatever you searched for being placed within that box. Easy!
As said, I'll be doing this going forward when it makes sense. In general, my preference for showing what I searched for is to turn the search into an exact link. So if I looked for red balls, I hyperlink the query so you can see exactly what I saw.
That's also helpful because the search box model alone doesn't tell you whether you saw a certain number of results, used a particular edition of a search engine and so on. But you can't always hyperlink, so it's nice to have the bracket idea.
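For anyone who writes up searches often, both habits are easy to automate. The following is a rough Python sketch, not anything Matt or Google published: the function names are invented, and the Google search URL pattern is an assumption based on the URLs commonly seen in the address bar.

```python
from urllib.parse import quote_plus


def bracket(query):
    """Show a query exactly as typed, wrapped in the search-box brackets."""
    return f"[{query}]"


def search_link(query, base="http://www.google.com/search?q="):
    """Turn the same query into a link (base URL is an assumption)."""
    # quote_plus encodes spaces as + and escapes quotes and colons.
    return base + quote_plus(query)


query = '"red balls" -blue site:balls.com'
print(bracket(query))
print(search_link(query))
```

The bracket form keeps the quotes and operators visible to the reader, while the link form lets them rerun the search.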
Posted by Danny Sullivan at 10:04 AM | Permalink
Word on the Google Blog that they've improved the underlying algorithm used with their wildcard (*) search operator.
This includes allowing softer pattern matching, if necessary, and promoting results in which the blank filler is relatively more frequent in the context of the query. I've been writing about the wildcard matching feature from Google since 2002. Earlier this year, Greg posted about problems with this feature. Hopefully, today's upgrade will not only fix the problem but make the service even more useful. We're still hoping that Google and other engines bring back a proximity operator that AltaVista once offered. Exalead is the only web engine that I know of that currently provides this feature, which, if used correctly, can help provide more precise results.
Thanks to Garrett and Philipp for the news tip.
Posted by Gary Price at 11:38 AM | Permalink
Like the idea of command line searching, as offered by YubNub, which I wrote about earlier? Ambedo writes to say it offers the same. The commands are called "tags," which will confuse anyone who has gotten used to tags meaning words assigned to categorize photos, bookmarks, etc. A list of commands is here. For more on YubNub and search commands on major search engines, see C:\> YubNub For "Command Line" Searching & Search Commands For the Majors and MSN Search Gets Neural Net/RankNet Technology & (Potentially) Awesome New Search Commands.
Posted by Danny Sullivan at 9:04 AM | Permalink
Local, Relevance, and Japan! from the MSN Search WebLog talks about MSN Search using a new relevancy ranking system based on "Neural Net" technology, along with new search commands -- such as anchor text searching -- now available.
The cynical part of me is expecting that soon we'll be hearing about "MSN Search With New Neural Net Technology" coming in the marketing. Google never did a "Google With PageRank Technology" push, but it did often offer up PageRank as something that made it special. Meanwhile, ads for Ask Jeeves over here in the UK keep going on about "Ask Jeeves With New Teoma Technology," which makes me laugh, given that Ask has owned Teoma's technology for several years. When did it become new again?
Anyway, the post has a really cool picture illustrating how the "correct" page for a search on pbs evolution videos wasn't ranking well in early May but then through the help of the new technology moved up to the top position by June. Of course, that other neural net technology -- the human brain inside of an editor -- could have made the change in a few minutes.
OK, in fairness, you do want automated systems to learn how to do this stuff better. You can't have human editors constantly meddling in search results. But the occasional intervention would be nice.
Neural Net & RankNet
So what about this Neural Net technology? Kudos to Greg Linden. In his MSN Search and Learning to Rank post, he dug up a paper about the topic from Microsoft Research: Learning to Rank using Gradient Descent (PDF format).
The paper talks about RankNet, a much better sounding name for the technology. I'd love to give you a one sentence summary of what it does, but so far, that escapes me despite reading the paper several times. There's sure to be discussion and analysis, which I'll point to. I'll also be following up with MSN Search directly on this.
The impression I have at the moment is that the system is trained in some way to recognize what is good (trained by algorithms, human choices, I don't know) and then uses that data to refine results. It sounds similar to TrustRank, which we've touched on and that I'll be exploring more in the future. But it could also be me misinterpreting the paper.
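For the technically inclined, the piece of the paper that seems easiest to pin down is its pairwise cost: the model scores two pages, the score difference is turned into a probability that the first should rank above the second, and that probability is compared against a target using cross entropy. The Python below is only a sketch of that one idea, with plain numbers standing in for the neural net's output scores. The function names are invented, and nothing here says how MSN Search actually applies it.

```python
import math


def pair_probability(score_i, score_j):
    """Modeled probability that page i should rank above page j."""
    # Logistic function of the score difference, as in the paper's cost.
    return 1.0 / (1.0 + math.exp(-(score_i - score_j)))


def pair_cost(score_i, score_j, target):
    """Cross entropy between the target and modeled probabilities.

    target is the desired probability that i ranks above j (1.0 = certain).
    """
    p = pair_probability(score_i, score_j)
    return -target * math.log(p) - (1.0 - target) * math.log(1.0 - p)


# The cost is low when the preferred page already scores higher,
# and high when the scores put the pair in the wrong order.
print(pair_cost(2.0, 0.0, target=1.0))
print(pair_cost(0.0, 2.0, target=1.0))
```

Training then means nudging the scoring function by gradient descent so that this cost shrinks across many labeled pairs.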
The Deneuralized Do OK
Yahoo, Ask Jeeves and Google have made no claims to having similar neural net technology, though as Greg noted, a coauthor of the RankNet paper works at Google. How do the others do for that PBS query?
Ask Jeeves, pbs evolution videos, same number one page as MSN Search
Google, pbs evolution videos, same number one page as MSN Search
Yahoo, pbs evolution videos, number two page at MSN Search is number one; number one page at MSN Search is number two. Both are from the same PBS site.
So either as good or practically as good -- and the practically part in Yahoo's case could be argued as good, depending on your particular viewpoint.
New Commands
Aside from new ranking technology, the blog post notes new search commands are now being offered by MSN Search. These are:
inanchor: The command I've been hoping for, pleading for, lost to the search world since dropped from AltaVista two or three years ago. This is supposed to let you search through anchor text -- in other words, the text of links. Why would you use it? Want to know all the pages that really are linking to the official George W. Bush biography with the words miserable failure in the links? This type of command should let you find them. However, I can't get it to work! inanchor:miserable failure link:http://www.whitehouse.gov/president/gwbbio.html doesn't work, nor does inanchor:miserable failure or even just inanchor:miserable or inanchor:http://www.whitehouse.gov/president/gwbbio.html miserable failure or various other attempts I've tried. I'm following up. In the meantime, the link: command can still be used to sort of do this, though as I've written before, it's not perfect: Wishing For Better Anchor Text Searching.
filetype: Lets you find pages of a particular filetype, such as HTML files or Word documents. Though said to be new, I'm fairly certain it's been around for at least a few weeks. It's listed on the advanced search operators page, while the other new commands are not. It wasn't offered when the beta service came out last November.
inurl: - Lets you find pages that have text within a URL, as opposed to the url: command, which lets you find specific URLs listed in the index.
intitle: - Lets you find pages that have text within the title tag of a document. For multiple words, it appears to work if you surround the words with quotes (example) or parentheses (example) but not with the words on their own (example and example). If you are after words appearing in the title in no particular order, it seems to work to use the command in front of each word (example).
linkdomain: - Lets you find all pages that link to anywhere within a particular domain, as opposed to the link: command, which lets you find all pages linking to a particular URL. For example, all links to the US White House (569,343) versus links just to the official George W. Bush biography page (26,414).
contains: - Supposed to let you find pages with links to documents of a particular filetype. For example, the MSN blog says contains:wma should bring up pages that have links to WMA files. But when I did that search, the pages that came up didn't necessarily seem to have such links, such as this example which ranked second.
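Since the multi-word behavior of intitle: and the difference between link: and linkdomain: are easy to mix up, here is a small Python sketch that builds those queries the way the descriptions above suggest. The helper names are hypothetical, and the rules encoded here are just my observations from testing, not documented MSN Search behavior.

```python
def intitle(words, any_order=False):
    """Build an intitle: query for one or more title words."""
    tokens = words.split()
    if len(tokens) == 1:
        return f"intitle:{tokens[0]}"
    if any_order:
        # Repeat the command in front of each word.
        return " ".join(f"intitle:{t}" for t in tokens)
    # Otherwise surround the words with quotes.
    return 'intitle:"' + " ".join(tokens) + '"'


def links_to(target, whole_domain=False):
    """linkdomain: covers a whole domain; link: covers a single URL."""
    command = "linkdomain:" if whole_domain else "link:"
    return command + target


print(intitle("search engine"))
print(intitle("search engine", any_order=True))
print(links_to("whitehouse.gov", whole_domain=True))
print(links_to("http://www.whitehouse.gov/president/gwbbio.html"))
```

The last two calls correspond to the White House domain versus biography page comparison above.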
When Gary gets in, I'm going to ask him to bang away on the new commands as he did at the beta launch. And if search commands seem cool (they are), see C:\> YubNub For "Command Line" Searching & Search Commands For the Majors, which guides you to commands that the other major search engines offer.
Want to discuss? Join our forum thread, June 2005 MSN Search Update & Neural Net Tech.
Posted by Danny Sullivan at 8:27 AM | Permalink
Honestly, I didn't miss leaving behind the DOS command line and getting all GUI. But there are times when I'll still dig out the command prompt window on Windows XP to do something quickly. New service YubNub harkens back to those days, giving you a command line interface for search. Come up with your own command (adding commands wasn't working when I tried), and then anyone can make use of that on YubNub. For example:
who google.com
That brings back whois data on Google.com. Use who before any domain you want to check within YubNub to get results. Get the Firefox plug-in, and then any commands you find useful can easily be run from within your search bar.
There are a ton of different commands, some of which work, some of which don't. A full list is here; YubNub Golden Eggs is a list of what's considered the best or most interesting. By the way, if a command isn't recognized, you get Google search results back.
When adding commands works again, I'm going to grab the idea from this example and make a command to search all of Search Engine Watch via Google. It's easy to do the same for your own site. Just use:
http://www.google.com/search?q=site:YOURDOMAINHERE+%s
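To show what happens to the %s placeholder, here is a minimal Python sketch of how a YubNub-style service might expand a typed command into a URL. It is a guess at the general mechanism, not YubNub's actual code: the sew command name is made up, and the fallback URL is an assumption based on unrecognized commands returning Google results.

```python
from urllib.parse import quote_plus

# Hypothetical command table; the template follows the pattern shown above.
COMMANDS = {
    "sew": "http://www.google.com/search?q=site:searchenginewatch.com+%s",
}

# Assumed fallback: send the whole line to Google as a regular search.
FALLBACK = "http://www.google.com/search?q=%s"


def expand(line):
    """Split off the command word and drop the rest into the %s slot."""
    line = line.strip()
    name, _, args = line.partition(" ")
    template = COMMANDS.get(name)
    if template is None:
        # Unrecognized command: search for everything that was typed.
        return FALLBACK.replace("%s", quote_plus(line))
    return template.replace("%s", quote_plus(args))


print(expand("sew link building"))
print(expand("red balls"))
```

The first call uses the site search template, while the second falls through to a plain Google search because red is not a known command.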
John Battelle has comments on the service here; Threadwatch here; and background from YubNub itself here.
Finally, remember that all search engines have command line-like interfaces of their own. Here are links with more info:
Posted by Danny Sullivan at 8:33 AM | Permalink