An inventory of Facebook's data business

With Facebook going public, its big data model has attracted growing attention. Facebook's newly built data center in Prineville, Oregon, has been described as one of the most energy-efficient data centers in the world. What are the specific features of Facebook's data business? Here is a brief overview.

Data collection

Timeline

Timeline, released in December 2011, is mainly a redesign of the Profile. A Facebook Profile is a person's record on the site, or in plain terms, a personal homepage. The Timeline interface has more visual impact than earlier versions of the Profile. It organizes everything a person has posted on Facebook, such as status updates, photos, and videos, and displays it in a more structured way, like an autobiography on Facebook.

Like Button

This feature lets users mark pages they like so that those pages can be included in Facebook's search results, much as Google uses links between pages to determine search rankings. Facebook said: "As long as users click the 'Like' button, all websites that support the Open Graph protocol will be displayed in the search engine." Facebook will use the Open Graph protocol to extend the reach of its search index, which poses a threat to Google.

Data storage

Memcached

Memcached is a distributed in-memory caching system that Facebook uses as a cache layer between its web servers and MySQL servers, because database access is relatively slow. Over the years, Facebook has made many optimizations to Memcached and the software around it, such as the network stack. At any given time, Facebook has tens of terabytes of data cached on thousands of Memcached servers. It may be the largest Memcached cluster in the world.
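The role of a cache layer between web servers and the database can be illustrated with a minimal look-aside sketch. It is not Facebook's implementation: the `SlowDatabase` and `CacheAside` classes, the `user:<id>` key format, and the 60-second expiry are all assumptions made for illustration, and a plain dictionary stands in for a real Memcached cluster.

```python
import time


class SlowDatabase:
    """Hypothetical stand-in for a MySQL server; counts queries that reach it."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.queries = 0

    def select_user(self, user_id):
        self.queries += 1
        return self.rows.get(user_id)


class CacheAside:
    """Look-aside cache: check the cache first, fall back to the database."""

    def __init__(self, database, ttl_seconds=60.0, clock=time.monotonic):
        self.database = database
        self.ttl_seconds = ttl_seconds  # assumed expiry, not a Facebook setting
        self.clock = clock
        self.store = {}  # key -> (expires_at, value); stands in for Memcached

    def get_user(self, user_id):
        key = f"user:{user_id}"
        entry = self.store.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]  # cache hit: the database is not touched
        value = self.database.select_user(user_id)  # cache miss
        if value is not None:
            self.store[key] = (self.clock() + self.ttl_seconds, value)
        return value

    def invalidate_user(self, user_id):
        """Drop the cached entry, for example after the row is updated."""
        self.store.pop(f"user:{user_id}", None)


db = SlowDatabase({1: "alice", 2: "bob"})
cache = CacheAside(db)
first = cache.get_user(1)   # miss: goes to the database
second = cache.get_user(1)  # hit: answered from the cache
print(first, second, db.queries)
```

The second lookup never reaches the database, which is the whole point of the layer: repeated reads are served from memory, and the slower database only sees misses and writes.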

Haystack

Haystack is Facebook's high-performance photo storage system, although strictly speaking it is an object store that is not limited to photos. It has to manage more than 20 billion uploaded photos, and each photo is saved in four different resolutions, so more than 80 billion images are stored in total. Capacity is only part of the challenge; performance is also critical. Facebook serves about 1.2 million photos per second, not including those served by its CDN, which is a staggering number.

Cassandra

Cassandra is a distributed storage system with no single point of failure. It is one of the poster children of the NoSQL movement, has been open sourced, and has become an Apache project. Facebook uses it for inbox search, and other sites use it as well.

Data analysis

Hadoop Architecture

Hadoop is one of the most popular open source tools for distributed and parallel computing. It provides a distributed file system for storage as well as a framework for running computations across large clusters of machines, which together support the storage and processing of very large data sets. Facebook is a heavy user of Hadoop and a contributor to its source code. Facebook has also contributed two important related components, Hive and Thrift. Hive began as a Hadoop subproject at Apache, and Thrift is also an Apache project.

Hive

Hive originated at Facebook. It makes it possible to run SQL queries on Hadoop, so that non-programmers can use it easily. Hive is a data warehouse tool built on Hadoop. It maps structured data files onto database tables, provides an SQL-like query language (HiveQL), and converts the queries into MapReduce jobs.
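To make the SQL-to-MapReduce idea concrete, here is a small sketch of how a grouped count could be expressed as a map step, a shuffle, and a reduce step. It is a simplified illustration in plain Python, not Hive's actual query planner. The `users` table, its columns, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical query being modeled:
#   SELECT country, COUNT(*) FROM users GROUP BY country
ROWS = [
    {"name": "ann", "country": "US"},
    {"name": "bo", "country": "SE"},
    {"name": "cy", "country": "US"},
]


def map_phase(row):
    """Emit (group key, 1) for each input row."""
    yield row["country"], 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Aggregate all values for one key: COUNT(*) becomes a sum of ones."""
    return key, sum(values)


def run_group_by_count(rows):
    pairs = [pair for row in rows for pair in map_phase(row)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


result = run_group_by_count(ROWS)
print(result)
```

In a real cluster the map and reduce functions run on many machines at once and the shuffle moves data over the network. The sketch runs everything in one process to show only the shape of the translation.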

ZooKeeper, Thrift

Hadoop's subprojects also include ZooKeeper, a coordination service commonly used for distributed locking, which provides functionality similar to Google's Chubby. Thrift is a cross-language service framework, often used alongside Hadoop, that supports many languages, such as PHP and Ruby.

BigPipe

BigPipe is a dynamic web page serving system developed by Facebook. To achieve the best performance, Facebook uses it to break each web page into blocks called "pagelets". For example, the chat window and the news feed are generated and delivered as separate pagelets. Pagelets can be processed in parallel, which improves performance and also means that users can still use the rest of the page if one pagelet fails or is interrupted.
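The two properties described above, parallel processing and failure isolation, can be sketched with a thread pool. This is an illustration only, not BigPipe itself: the pagelet names, the render functions, and the fallback markup are hypothetical, and the sketch collects finished pagelets in a dictionary instead of streaming each one to the browser as it completes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def render_news_feed():
    return "<div id='feed'>news feed</div>"


def render_chat():
    # Simulate a failing backend for one pagelet.
    raise RuntimeError("chat backend unavailable")


def render_ads():
    return "<div id='ads'>ads</div>"


# Hypothetical pagelets: name -> function that produces its markup.
PAGELETS = {"feed": render_news_feed, "chat": render_chat, "ads": render_ads}


def render_page(pagelets):
    """Render pagelets in parallel; a failure only affects that pagelet."""
    rendered = {}
    with ThreadPoolExecutor(max_workers=len(pagelets)) as pool:
        futures = {pool.submit(fn): name for name, fn in pagelets.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                rendered[name] = future.result()
            except Exception:
                # Replace the broken pagelet with a placeholder and carry on.
                rendered[name] = f"<div id='{name}'>temporarily unavailable</div>"
    return rendered


page = render_page(PAGELETS)
for name in PAGELETS:
    print(page[name])
```

The failing chat pagelet is replaced by a placeholder while the feed and ads pagelets render normally, which mirrors the behavior the paragraph describes.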
