Promised over a month ago, Yahoo Site Explorer is now reality. Yahoo gives the heads-up on its Yahoo Search Blog, explaining how it will show you all pages within a domain, all pages within a particular directory of a domain and all inbound links to a domain, plus the ability to bulk submit (which was already live earlier and is explained more in our earlier post). You can also access the data through a new Site Explorer API or export it for further analysis. More details are also on the help page. Below is a further exploration of Site Explorer from me.
See All Pages From ALL Domains
It's important to understand that while this is "Site Explorer," a web site might have multiple domain names. In addition, a "root" domain search may bring back matches from completely different web sites that you operate. So in some ways, this is "Domain Explorer."
For example, if I enter http://searchenginewatch.com, I get back 158,895 pages. These pages come from these different web sites:
- searchenginewatch.com (10,673 pages, from the "main" Search Engine Watch site)
- www.searchenginewatch.com (10,013 pages, still from the "main" Search Engine Watch site, but technically a different site to Yahoo, because it can be reached with the www prefix)
- forums.searchenginewatch.com (124,274 pages from the Search Engine Watch Forums web site)
- blog.searchenginewatch.com (11,418 pages from the Search Engine Watch Blog web site)
See All Pages From ONE Domain
Don't want to match all of your domains and subdomains? Then get specific! A search for http://forums.searchenginewatch.com comes back with 124,274 pages from our forums.searchenginewatch.com site, for example. Just indicate the exact domain you are after.
But how do you indicate an exact domain when it may be the root domain you are interested in? In other words, what if I want to see all the pages for searchenginewatch.com but NOT match all the subdomains like forums.searchenginewatch.com and so on?
Do your search, then use the "Only this domain" link that comes up on the results page. Look how that brings back just 10,673 pages solely from searchenginewatch.com.
Oddly, I found that this option came up even for subdomains that have no further subdomains. For example, blog.searchenginewatch.com has no other domain under it. Nevertheless, I get an option to see matches from "Only this domain," even though blog.searchenginewatch.com really should only be matching that specific domain. But if I use that option, the count drops from 11,418 matches to 11,334 matches. I'm checking with Yahoo about this.
See All Pages From Within A PORTION/DIRECTORY Of One Domain
Want to see all the pages from within a particular section or directory of your web site? Just enter the directory location. For example, all of our SearchDay articles are within the http://searchenginewatch.com/searchday/ area of Search Engine Watch. So if I search for that, I'm shown just the pages within that section of the site.
Sadly, you can't pattern match. For example, on the SEW Blog, all stories from a particular day have the date as part of the file name, in yymmdd format. So if I want to see all stories written in 2005, I know they'll all have 05 as part of their file name. So ideally, I could enter:
And get back a list. Ideally! In reality, that doesn't work. I'm passing it along as a wish list item to Yahoo.
See Links To A Specific Page
Yes, you can see a single page. Why bother? Because then you can see all the links pointing to that page, if you make use of the Inlinks option.
For example, if I enter http://blog.searchenginewatch.com/blog/050118-204728, a page comes back listing that URL. Look below the search box, and you'll see an "Inlinks" link/option that appears. If I click on that, I get a new page showing me all the people linking to my page.
Unfortunately, Inlinks will also show any of your OWN links, as well as external links. In other words, here are the first five Inlinks I'm shown for that page:
See the first two in bold? Those are my own links. Most site owners tend not to want to see their own links. Hopefully, we'll see Yahoo add an option to exclude them, such as allowing you to use a minus command with a site name. For instance, I can do this search at MSN Search:
See the part in bold? That tells MSN Search to show me all links to the specific page I've listed but to remove any links from the searchenginewatch.com domain (and subdomains). Only MSN of the four major search engines (Google, Yahoo, MSN & Ask Jeeves) allows you to do this. It's another item I'm asking Yahoo about changing.
See Links To Specific Domain
Want to see all the links pointing at a particular site? Enter the site's domain, then use the Inlinks option, then use the Entire Site option that will show up under the search box when the page reloads.
For example, say I want all the pages linking to the blog.searchenginewatch.com site. I enter that, then I click on the Inlinks option below the search box that brings up a new page of results showing all links to that page (13,398), then I use the Entire Site option below the search box to see all links (198,510) to the blog.searchenginewatch.com domain.
This ought to be much easier, such as through search commands, which I'll touch on below.
See Links To ALL Domains
Earlier, I explained how searching for a domain matches all subdomains. So what happens if I search for all links to a domain that also has subdomains? I tried with http://searchenginewatch.com using the Entire Site option. Unfortunately, I can't tell if the matches are just for searchenginewatch.com or also include subdomains like forums.searchenginewatch.com. I'm checking with Yahoo on this.
Want to export your page or link lists? After you do a search, there's an "Export results to: TSV" option over near the top right-hand side of results. But don't get too excited:
- It only exports 50 items. So if you have 4,000 links you want to review, you only get 50 AND
- It only exports the FIRST 50 items, no matter what page you are looking at.
In other words, say I do a search for all pages at http://searchenginewatch.com, like this. I get 158,628 matches. I can only export 50 of those, the first 50.
You can test it yourself. Clicking the next button, I get matches 51-100. Exporting still shows me matches 1 through 50.
URL Hacking For Fast Jumps
FYI, don't want to "Next" your way through the results? Add this to the end of your search:
Replace ### with the result you want to jump to. These go up in increments of 50 and end in 1 (51, 101, 151 and so on). For example, want to see results 251 to 300? An initial search for http://searchenginewatch.com brings back this URL:
Now I append to that &b=251, since that's where I want to jump to:
And so you jump. Want to go to results 501 to 550? Then use &b=501. And technically, the numbers don't have to end in 1. For example, say I want to jump to result 645. I could just enter &b=645, like this:
However, each page of results will always begin with a listing that ends in 01 or 51. That is:
- Results 1 to 50
- Results 51 to 100
- Results 101 to 150
- Results 151 to 200
And so on. So even though I entered that I wanted to jump to result 645, the list will begin at 601, and I have to scroll down to find it.
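To make the jump arithmetic concrete, here is a minimal Python sketch. It assumes only what is described above: pages hold 50 results and the b parameter sets the starting result. The function names and the example.com address in the usage note are made up for illustration and are not anything Yahoo provides.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

PAGE_SIZE = 50  # results per page, as described in the text


def page_start(result_number: int) -> int:
    """Return the first listing on the page that holds result_number."""
    if result_number < 1:
        raise ValueError("result numbers start at 1")
    return ((result_number - 1) // PAGE_SIZE) * PAGE_SIZE + 1


def with_offset(search_url: str, result_number: int) -> str:
    """Return search_url with its b parameter set to result_number."""
    parts = urlsplit(search_url)
    # Drop any existing b value so the parameter is not duplicated.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "b"]
    query.append(("b", str(result_number)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Here page_start(645) gives 601, matching the behavior described above, and with_offset("http://example.com/search?p=x", 251) tacks b=251 onto a placeholder search URL.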
More URL Hacking
Looks like you don't really need the beginning character encoding bit or the ending bit, so you could shorten to this if you wanted:
Hmm. So what are the two key parts of this:
Not sure what the last does, but you can drop it and the results are the same, leaving you with:
So what's bwm controlling? Whether you see pages (p) or inlinks (i). Want inlinks? You change the p to an i:
Now it's links showing and by default, they'll be all links to the URL. Click to see Entire Site, and you get this:
See the part I've bolded? The letter s shows that you are seeing all links to the entire site. Change it to a U:
Now you get all links to the URL.
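If you'd rather script the pages/inlinks switch than edit URLs by hand, a rough Python sketch follows. It relies only on the bwm parameter and its p and i values as described above. The set_view name and the example.com address in the usage note are hypothetical, and the site-versus-URL switch is left out because the text here doesn't name that parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# bwm values taken from the text: p shows pages, i shows inlinks.
VIEWS = {"pages": "p", "inlinks": "i"}


def set_view(search_url: str, view: str) -> str:
    """Return search_url with bwm switched to the requested view."""
    if view not in VIEWS:
        raise ValueError("view must be 'pages' or 'inlinks'")
    parts = urlsplit(search_url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Replace bwm in place so the other parameters keep their order.
    updated = [(k, VIEWS[view]) if k == "bwm" else (k, v) for k, v in pairs]
    if all(k != "bwm" for k, _ in pairs):
        updated.append(("bwm", VIEWS[view]))
    return urlunsplit(parts._replace(query=urlencode(updated)))
```

For example, set_view("http://example.com/search?bwm=p&x=1", "inlinks") changes bwm=p to bwm=i and leaves everything else alone.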
Give Me Commands!
Hacking is fun but a painful way to get what you want. I want to have commands, especially commands we are used to using on Yahoo. For example, this:
Appears to be working again (it wasn't for several weeks). I'm checking on whether it really is showing links and, if so, if Site Explorer is showing more accurate information. The idea behind Site Explorer was to take these webmaster-oriented lookups off the main site and help stop polluting the query stream there, which I entirely agree with.
I don't care if the main site no longer will work to show link lookups. As long as Site Explorer exists, that's fine. But I want commands. I don't want to go through two pages of results to see all the links pointing to a particular site. I want to be able to do something like:
And see all the pages pointing at pages in the searchenginewatch.com domain specifically, without counting any links from that domain itself. Or:
And see all links pointing at any page in all searchenginewatch.com domains, including subdomains, and not counting my own links. Or:
linkalldomain:searchenginewatch.com -site:searchenginewatch.com export:all
And get ALL my links exported for download!
How About The API
I mentioned there's a new Site Explorer API you can use to tap into this data. I'm not a programmer, so that's especially why I want to have ways to access this type of data fully without having to resort to programming.
However, having said I'm not a programmer, from what I read about the APIs, you can't seem to get more than 1000 results. I want more than that. Heck, I'll pay for that service. Let's have it! Well, I'll at least ask Yahoo if we can have it and report back.
Feeds & Dates
Yahoo doesn't mention that you can do a Yahoo Site Explorer search and get your results back as a feed. If your feedreader supports autodiscovery, it will automatically find the feed URL. Don't know what autodiscovery is or need to find the URL manually? Yahoo Gains RSS Feeds For Web Search & Discovering Auto-Discovery explains what to do. Basically, view source and look for the line that says:
Then look further for the URL that begins:
That's the feed URL. It would be a lot easier if they'd just add a visible feed link, however. Yeah, I'm asking about this.
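For anyone who would rather not read page source by eye, here is a small Python sketch of feed autodiscovery. It assumes the page uses the common convention of a link tag with rel="alternate" and an RSS or Atom type. The exact markup Site Explorer emits isn't reproduced here, so treat the type values and the names below as assumptions.

```python
from html.parser import HTMLParser

# Feed MIME types commonly used for autodiscovery (an assumption here).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect href values from <link rel="alternate"> feed tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rels and is_feed and attr.get("href"):
            self.feeds.append(attr["href"])


def find_feed_urls(html: str) -> list:
    """Return every feed URL advertised in the given HTML source."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds
```

Pass in the saved source of a results page and find_feed_urls returns whatever feed URLs the page advertises, in the order they appear.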
Don't get too excited about feeds, however. From what I can tell, no matter what search you do, you'll still only get back the first 10 matching PAGES for a URL you enter. Even if you change to see links, you'll still get sent top pages. Newsgator also sees these as modified posts, even though nothing has really changed. That means I get the same stuff every 5 to 15 minutes or so.
I'm changing some things on my end, but even then, there's no ranking by freshness in Site Explorer. And for links, that would be great. Give us an option to see the newest links pointing at our pages, so we can have some of the same sense of how our site is doing that people in the blog world get about their posts.