When Wikia Search was released last night, Jimmy Wales explained that they used a "placeholder index" for the search. While that may be appropriate for an alpha release, I'd like to ask Jimmy exactly how Wikia plans to crawl and index a significant portion of the web.
The Grub distributed crawler, which was acquired from LookSmart, appeared to provide most of the solution. At the O'Reilly Open Source Convention, Wales announced that he would immediately release the crawler to the open source community.
Through its downloadable client, Grub gives “the site owners the option of crawling their own data, with their own bandwidth. The client...is designed to connect to a central coordinating server, grab a batch of URLs, and then proceed to crawl them.” Grub claims a 20:1 bandwidth savings for both Wikia and the hosting website.
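The batch-oriented, pull-based design in that quote can be sketched in a few lines. To be clear, this is a hypothetical illustration and not Grub's actual protocol: the `Coordinator` class, `run_client` function, and example URLs are all invented for the sketch, and the network round-trips are simulated with plain function calls.

```python
from collections import deque

class Coordinator:
    """Hypothetical central coordinating server: hands out batches of URLs."""
    def __init__(self, urls, batch_size=3):
        self.pending = deque(urls)      # URLs not yet assigned to any client
        self.batch_size = batch_size
        self.results = {}               # url -> crawled content

    def get_batch(self):
        """Return up to batch_size URLs for a client to crawl."""
        batch = []
        while self.pending and len(batch) < self.batch_size:
            batch.append(self.pending.popleft())
        return batch

    def submit(self, url, content):
        """Accept a crawl result reported back by a client."""
        self.results[url] = content

def run_client(coordinator, fetch):
    """Client loop: grab a batch, crawl each URL locally, report back."""
    while True:
        batch = coordinator.get_batch()
        if not batch:
            break                       # coordinator has no more work
        for url in batch:
            coordinator.submit(url, fetch(url))

# Simulated run: seven URLs, a stand-in fetch function instead of real HTTP.
coordinator = Coordinator([f"http://example.com/page{i}" for i in range(7)])
run_client(coordinator, fetch=lambda url: f"<html>{url}</html>")
print(len(coordinator.results))  # → 7
```

The point of the design is visible even in this toy: the crawling (the `fetch` call) happens on the client's machine and bandwidth, while the coordinator only exchanges small URL lists and results, which is where the claimed bandwidth savings would come from.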
I'm not sure how much progress Wikia has made here since the summer. Grub's site stats show a "Wikia Search" team that has crawled around 918k URLs so far. That seems far too low.
Grub's member stats tell a more complete story: the top 100 members have crawled 350 million URLs so far. The remaining 293 members aren't shown, but if we assume an average of 250k URLs each, roughly 423 million URLs would have been crawled in total.
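That total is just back-of-the-envelope arithmetic; for the record, here is the calculation, with the 250k-per-member average flagged as the assumption it is:

```python
# Rough estimate of total URLs crawled by all Grub members.
top_100_total = 350_000_000   # reported in Grub's site stats for the top 100
remaining_members = 293       # members whose individual counts aren't shown
assumed_average = 250_000     # assumption: 250k URLs per remaining member

estimated_total = top_100_total + remaining_members * assumed_average
print(estimated_total)  # → 423250000, i.e. roughly 423 million
```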
There are other planning considerations too, regarding what belongs in the index. Will they be able to include the "right" domains and exclude the "wrong" ones? Will they be able to crawl some domains more or less frequently than others? Will video, images, or other media be included?
We would be interested in knowing the game plan for developing a substantial index over time. It's not just about raw numbers, although a billion or two URLs could help with a 2009 launch.