Behind the Scenes at the Daypop Search Engine, Part Three

This concludes a three-part interview with Dan Chan, founder and sole proprietor of Daypop, a specialized search engine focusing on weblog and news content (part one, part two).

Q. Daypop crawls thousands of blogs. Do you think the current popularity of blogs is a fad? Have you ever studied how often certain types of blogs are updated?

I think blogs are here to stay. It’s a great format and it’s very good at disseminating information. As far as its current rise in popularity, I think blogging will plateau out at some point soon. There are only so many people out there with the personality to 1) write publicly, 2) keep at it. Case in point, I used to blog (outside of the Daypop Weblog) but stopped writing to my blog a while ago.

Q. If Google called you today and wanted you to tell them a few things they could do to improve the engine, what would you tell them?

I’ve noticed that PageRank seems to far outweigh Relevance when it comes to searches that I’m personally interested in. There are plenty of times when I feel like nearness of terms should outweigh the PageRank. These usually occur in the context of the type of searches that I do which are highly targeted technical questions. Many times an “important” page is ranked highly with really no relevance to my search. The ability to specify phrases in quotes is too restrictive when all I really want is nearness of terms.

Overall, I know it’s a balancing act, that black art of relevance weighting, and most probably Google has it tuned to respond to 99% of the searches that are being done out there.

One workaround is to make PageRank’s contribution to the final score inversely proportional to the number of search terms entered by a user. The idea is that a large multi-term search is more likely to be highly “targeted” than a simple search for, say, Britney Spears. At least this would potentially solve my specific problem (while most likely causing other problems).
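As a rough illustration of the workaround described above, the Python sketch below shrinks PageRank's weight as the number of query terms grows. The function name, the additive blend, and the `base_weight` parameter are assumptions made for this example; nothing here reflects how Google or Daypop actually combine scores.

```python
def combined_score(relevance: float, pagerank: float, num_terms: int,
                   base_weight: float = 1.0) -> float:
    """Blend text relevance with PageRank (hypothetical formula).

    PageRank's weight is base_weight / num_terms, so it is inversely
    proportional to the number of search terms the user entered.
    """
    if num_terms < 1:
        raise ValueError("a query needs at least one term")
    pagerank_weight = base_weight / num_terms
    return relevance + pagerank_weight * pagerank


# Same page, same relevance: PageRank matters less for the longer query.
one_term = combined_score(relevance=0.5, pagerank=0.8, num_terms=1)
four_terms = combined_score(relevance=0.5, pagerank=0.8, num_terms=4)
```

With these made-up inputs, PageRank contributes 0.8 to the one-term score but only 0.2 to the four-term score, so nearness of terms would dominate the ranking for longer, more targeted queries.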

There’s also the concept of Contextual Weighting of the search results: relevancy can be improved if a search term can be categorized in some fashion to determine the context of the word. Are you searching for Jaguar, the car, or Jaguar, the feline? Doing this automatically with the limited contextual information of a common two-term search might just be enough. Or, to go further, Search Personalization techniques could be used.

Of course, there’s also freshness of the index. The ideal search engine would allow near real-time search of the web. This would involve incremental updates to the index on the order of minutes or seconds instead of days (or months as it used to be) and very intelligent crawling schedules in addition to more spiders.

Q. Would you tell us about the Daypop relevancy algorithm? Aside from link analysis what other factors influence the order of results?

I think the most important factor in determining relevancy in Daypop is nearness of search terms — the closer your search terms appear on a page, the more relevant it is to your search.

This is a fundamental difference in the way Daypop operates compared to some other search engines. Many search engines only “know” whether or not a page contains a certain word and perhaps how many times that word appears on that page. Daypop goes a step further and stores, for each page, all instances of all words on that page along with position and contextual information.

The position information is used to determine that “search” and “engine” often appear next to each other. The contextual information is an approximation to how important those word instances are in the context of the page, by measuring font size or emphasis.
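The idea of storing every word instance with its position and an emphasis value can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(word, emphasis)` input format, the function names, and the scoring formula (emphasis product divided by distance) are invented for the example and are not Daypop's actual algorithm.

```python
from collections import defaultdict


def build_index(tokens):
    """Map each word to the (position, emphasis) pairs where it occurs.

    `tokens` is a list of (word, emphasis) pairs in page order; emphasis
    is a hypothetical stand-in for font size or bold/italic markup.
    """
    index = defaultdict(list)
    for position, (word, emphasis) in enumerate(tokens):
        index[word.lower()].append((position, emphasis))
    return index


def proximity_score(index, term_a, term_b):
    """Score a two-term query: nearer, more emphasized pairs score higher."""
    best = 0.0
    for pos_a, emph_a in index.get(term_a.lower(), []):
        for pos_b, emph_b in index.get(term_b.lower(), []):
            distance = abs(pos_a - pos_b)
            if distance == 0:
                continue  # same occurrence (only possible if both terms match)
            best = max(best, (emph_a * emph_b) / distance)
    return best


adjacent = build_index([("search", 1.0), ("engine", 1.0)])
far_apart = build_index([("search", 1.0), ("the", 1.0),
                         ("news", 1.0), ("engine", 1.0)])
```

Here the page where “search” and “engine” sit next to each other scores higher than the page where they are three words apart, which is the behavior the answer above describes.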

Q. Dan, before we conclude could you offer us ten Daypop search tips?

1. Daypop Weblog

Most “undocumented secrets” are documented here.

2. Citations

You can find citations to any page in Daypop’s index by searching for link:page_url. Here’s an example for pages linking to the Top 40.

For any search result that is a weblog, there is a Citations link that leads directly to a list of citations for that blog. So looking at the page above, you’ll see a list of blogs that mention the Daypop Top 40. Clicking on any of their Citations links will bring up the blogs that mention them along with Citations links. In this way, you can “hop” from blog to blog checking out the link structure of the blogosphere. Searching for Citations is a good way of determining the popularity of a page.
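Hopping from blog to blog through Citations links amounts to a breadth-first walk over a "who links to whom" map. The Python sketch below shows that walk on a small hand-made dictionary; the blog names, the `citations` structure, and the `hop_citations` function are all hypothetical and do not come from Daypop.

```python
from collections import deque


def hop_citations(citations, start, max_hops):
    """Return {page: hops} for pages reached by following citing pages.

    `citations` maps a page to the list of pages that link to it.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if hops[page] == max_hops:
            continue  # do not expand beyond the hop limit
        for citing in citations.get(page, []):
            if citing not in hops:
                hops[citing] = hops[page] + 1
                queue.append(citing)
    return hops


# Made-up data: blogA and blogB cite the Top 40, blogC cites blogA, and so on.
citations = {
    "top40": ["blogA", "blogB"],
    "blogA": ["blogC"],
    "blogC": ["blogD"],
}
reached = hop_citations(citations, "top40", max_hops=2)
```

Counting how many pages cite a given page (the length of its list) is the simple popularity measure mentioned above.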

3. My Blogstats

A relatively new feature that generates a list of similar and related blogs, as well as citing blogs for any blog in the index. Similar, in this case, is defined as similar in word content, while related means one degree of separation. This is a good way to find other blogs that may be of interest to you. You don’t need to plug in your own personal URL. You could check up on blogs that are similar to blogs that you enjoy to try and discover something new.
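One common way to measure "similar in word content" is cosine similarity over word counts. The interview does not say which measure Daypop uses, so the Python sketch below is an assumption chosen for illustration, with made-up sample text and a hypothetical function name.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


same_topic = cosine_similarity("search engine news and weblogs",
                               "weblogs and search engine tips")
unrelated = cosine_similarity("search engine news", "gardening in spring")
```

Blogs that share many words score close to 1, while blogs with no words in common score 0. A real system would likely also discount very common words, which this sketch does not attempt.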

4. Searching RSS

You can search RSS feeds using Daypop. Not many people know this, but using the pull-down menu you can search News Headline feeds. Also, a little known feature is Daypop’s ability to search Weblog Posts by setting t=p in the URL. This option has yet to be put into the pull-down menu. Here is an example.

5. More Headlines

When searching RSS, there’s a More Headlines link below each result. This link leads to a search for all headlines from that RSS feed. This is great for discovering new, interesting feeds from your searches.

6. Custom News Feeds

Daypop outputs RSS 0.91 on almost everything. Even search results. So you can create your own custom news feeds for your news aggregator.
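For readers building their own aggregator, the Python sketch below reads titles and links out of an RSS 0.91 document using the standard library. The sample feed is invented for the example and is not real Daypop output; only the general RSS 0.91 layout (`rss`, `channel`, `item`, `title`, `link`) is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed contents, shaped like an RSS 0.91 document.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example search feed</title>
    <link>http://example.com/</link>
    <description>Made-up results for illustration</description>
    <item>
      <title>First result</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second result</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


items = read_items(SAMPLE_FEED)
```

In practice the XML would be fetched from the feed URL of a saved search rather than read from a string.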

7. Little Icons

There are little icons next to each search result. A red “N” for a News Article and an orange “W” for Weblogs. A blue “H” for News Headlines, gathered from RSS feeds of online news sites, and a greenish “P” for Weblog Posts, indexed from weblog RSS feeds. These icons make it easy to categorize the results.

8. Advanced Search

You can restrict your search to a specific period of time using Advanced Search. You can also search pages in different languages and from different countries. Not many search engines (none that I know of) have the ability to limit to countries.

9. The Little Blue Box

On every search page, there’s a little blue box that offers extremely useful search modifiers. The box allows you to narrow your search to headlines, news, or weblogs. You can also sort by relevance or date. By default, Daypop sorts by relevance, but sometimes you just want the newest articles. You can also search just the page titles, which helps narrow your search to pages that specifically deal with your search terms. And you can easily narrow results to a single language with the pull-down menu. These features go a long way towards filtering your search results and helping you find what you’re looking for.

10. Narrow Your Search

You can be specific with Daypop when searching for current events. Since Daypop is a full-text search engine and not just a search that indexes article titles, being specific gets you results.

Dan, thank you.

Gary Price is the publisher of ResourceShelf, a weblog covering the online information industry.
