Facebook on Graph Search Posts Index: 700 TB of Data & 100+ Ranking Factors

Author: Miranda Miller

Date published: October 25, 2013

Categories: Social

What does it take to power a social search index with over a trillion total posts and hundreds of terabytes of data? Facebook search quality and ranking engineer Ashoat Tevosyan shared a peek under the Graph Search hood in a post on the Facebook Engineering page yesterday.

Tevosyan highlighted the challenges facing Facebook as they gave users the ability to search posts in Graph Search, a feature added last week:

“Facebook’s underlying data schema reflects the needs of a rapidly iterated web service. New features often demand changes to data schemas, and our culture aims to make these changes easy for engineers. However, these variations makes it difficult to sort posts by time, location, and tags as wall posts, photos, and check-ins all store this information differently.”
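To make the schema problem concrete, here is a minimal Python sketch of normalizing three post types into one shape before sorting by time. Every field name in it (`created`, `taken_at`, `ts`, `venue`, and so on) is a hypothetical stand-in; Facebook's actual schemas are not described in the post.

```python
from datetime import datetime, timezone

# Hypothetical raw records: each post type stores time, place, and tags differently.
RAW_POSTS = [
    {"type": "check_in", "id": 3, "ts": 1382572800, "venue": {"name": "Palo Alto"}},
    {"type": "wall_post", "id": 1, "created": 1382659200, "tagged": [9, 7]},
    {"type": "photo", "id": 2, "taken_at": "2013-10-24T18:30:00+00:00",
     "place": "Menlo Park", "people": [9]},
]


def normalize(raw):
    """Map one raw record onto a common {id, type, time, location, tags} shape."""
    kind = raw["type"]
    if kind == "wall_post":
        when = datetime.fromtimestamp(raw["created"], tz=timezone.utc)
        where, tags = None, raw.get("tagged", [])
    elif kind == "photo":
        when = datetime.fromisoformat(raw["taken_at"])
        where, tags = raw.get("place"), raw.get("people", [])
    elif kind == "check_in":
        when = datetime.fromtimestamp(raw["ts"], tz=timezone.utc)
        where, tags = raw["venue"]["name"], []
    else:
        raise ValueError(f"unknown post type: {kind}")
    return {"id": raw["id"], "type": kind, "time": when,
            "location": where, "tags": sorted(tags)}


# Once every record has the same shape, sorting by time is a one-liner.
normalized = sorted((normalize(p) for p in RAW_POSTS),
                    key=lambda p: p["time"], reverse=True)
newest_first_ids = [p["id"] for p in normalized]
```

Once each type has its own small adapter, the sort no longer needs to know how wall posts, photos, or check-ins store their timestamps.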

Facebook sorts and indexes more than 70 different types of data, all kept in a production SQL database. Its search engine, Unicorn, is an inverted index framework that handles both index building and data retrieval. To work within Unicorn, the raw data is converted and split into two parts. Document data contains the post information Facebook uses to rank results. The inverted index is closer to a traditional search index: building it means going through each post to determine which hypothetical search filters that post matches.
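The split between the two parts can be illustrated with a small Python sketch. It is not Unicorn's API: the term prefixes (`author:`, `tag:`, `word:`), the post fields, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical posts; the field names are illustrative only.
POSTS = [
    {"id": 1, "author": 7, "text": "Hiking at Yosemite", "likes": 12, "tags": [9]},
    {"id": 2, "author": 9, "text": "Yosemite photos", "likes": 3, "tags": []},
    {"id": 3, "author": 7, "text": "Coffee in Palo Alto", "likes": 5, "tags": [9]},
]


def build(posts):
    """Split raw posts into an inverted index and per-post document data."""
    inverted = defaultdict(set)  # search filter (term) -> ids of matching posts
    documents = {}               # post id -> data kept only for ranking
    for post in posts:
        pid = post["id"]
        # Enumerate every filter this post could match.
        terms = [f"author:{post['author']}"]
        terms += [f"tag:{t}" for t in post["tags"]]
        terms += [f"word:{w}" for w in post["text"].lower().split()]
        for term in terms:
            inverted[term].add(pid)
        documents[pid] = {"likes": post["likes"], "length": len(post["text"])}
    return inverted, documents


def search(inverted, *terms):
    """Return ids of posts matching all of the given terms."""
    sets = [inverted.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()


inverted, documents = build(POSTS)
yosemite_posts = search(inverted, "word:yosemite")
tagged_by_author = search(inverted, "author:7", "tag:9")
```

Retrieval touches only the inverted index. The document data is consulted afterwards, when the matching posts need to be scored.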

The Graph Search posts index is much larger than any other at Facebook, Tevosyan wrote. The team had to move from RAM (which worked well for the smaller indices) to solid-state flash memory to accommodate the more than 700 terabytes of data in the posts index.

For a bit of perspective, consider that Amazon, with more than 59 million active customers, has about 42 terabytes of data to deal with. YouTube, where over 100 million videos are watched daily, holds at least 45 terabytes of video in its database. Google is mum on the size of its database, though we know it answers over a billion queries daily. Each query is stored, so over the course of just a year Google packs away data for more than 365 billion queries. Even back in 2008, it was processing 20,000 terabytes of data daily.

The ability to search posts was born out of a company Hackathon, Tevosyan explained.

“My second day as a Facebook intern coincided with a company-wide Hackathon, and I spent the night aiming to implement a way for my friends and me to find old posts we had written. I quickly discovered that the project was much more challenging than I had first anticipated. However, the engineering culture at Facebook meant that I was supported and encouraged to continue working on it, despite the scope of the project. The majority of the work–infrastructure, ranking, and product–has been accomplished in the past year by a few dozen engineers on the Graph Search team,” Tevosyan wrote.

Tevosyan also noted that Facebook uses over 100 distinct ranking features in its post result scoring to serve up the most relevant Graph Search content to users. Before a query reaches that ranking system, though, it is rewritten, which “involves tacking on optional clauses to search queries that bias the posts we retrieve towards results that we think will be more valuable to the user.”
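A rough Python sketch of that two-stage flow follows: a rewrite step adds optional clauses that bias retrieval, then a scoring step combines ranking features. The optional clause used here (posts authored by the viewer's friends), the two features, and their weights are invented for illustration; the post does not list Facebook's actual clauses or its 100-plus features.

```python
# Hypothetical index and document data; all terms and features are illustrative.
INDEX = {
    "word:yosemite": {1, 2, 4},
    "author:7": {1},
    "author:9": {2},
    "author:5": {4},
}
DOCS = {
    1: {"likes": 12, "recency": 0.2},
    2: {"likes": 3, "recency": 0.9},
    4: {"likes": 50, "recency": 0.5},
}


def rewrite(required, viewer_friends):
    """Tack optional clauses onto the query (assumed here: prefer friends' posts)."""
    optional = [f"author:{f}" for f in sorted(viewer_friends)]
    return {"required": list(required), "optional": optional}


def retrieve(index, query, limit):
    """Require every required term; optional terms only bias which posts are kept."""
    sets = [index.get(t, set()) for t in query["required"]]
    candidates = set.intersection(*sets) if sets else set()

    def optional_hits(pid):
        return sum(pid in index.get(t, set()) for t in query["optional"])

    return sorted(candidates, key=lambda pid: (-optional_hits(pid), pid))[:limit]


def score(doc, weights):
    """Weighted sum of ranking features; a stand-in for a much larger feature set."""
    return sum(w * doc.get(name, 0.0) for name, w in weights.items())


query = rewrite(["word:yosemite"], viewer_friends={7, 9})
candidates = retrieve(INDEX, query, limit=2)
weights = {"likes": 0.01, "recency": 1.0}
ranked = sorted(candidates, key=lambda pid: score(DOCS[pid], weights), reverse=True)
```

In this sketch, post 4 matches the search term but is dropped at retrieval because it satisfies none of the optional clauses and the limit is two. The scoring step then orders the posts that remain.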
