An inventory of Facebook's data business

With Facebook going public, its big data model has attracted growing attention. Its newly built data center in Prineville, Oregon, has been described as one of the world's most energy-efficient data centers. What specifically characterizes Facebook's data business? Here is a brief look.

Data collection

Timeline

Timeline, released in December 2011, is a redesign of the Profile. A Facebook Profile is a person's record of information about themselves, or in plain terms a personal homepage. The new Profile is more visually striking than earlier versions. The Timeline interface organizes what a person has posted on Facebook, such as status updates, photos, and videos, and presents it in a more structured way, like an autobiography on Facebook.

Like Button

This feature lets users mark pages they like so that those pages are included in Facebook's search results, much as Google uses links between pages to determine search rankings. Facebook said: "As long as users click the 'Like' button, all websites that support the Open Graph protocol will be displayed in the search engine." By using the Open Graph protocol to broaden what its search engine indexes, Facebook could pose a threat to Google.

Data storage

Memcached

Memcached is a distributed in-memory caching system. Facebook uses it as a cache layer between its web servers and its MySQL servers, because database access is relatively slow. Over the years, Facebook has made many optimizations to Memcached and the software around it, such as the network stack. At any given time, Facebook has tens of terabytes of data cached across thousands of Memcached servers, which may be the largest Memcached cluster in the world.
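The cache-layer idea described above can be illustrated with a minimal cache-aside sketch. This is not Facebook's implementation: the `CacheAside` class, the in-process dictionary standing in for Memcached, and the `_fake_db` dictionary standing in for MySQL are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Check the cache first; on a miss, load from the database and cache it."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self._cache: Dict[str, str] = {}   # stand-in for a Memcached cluster
        self._load_from_db = load_from_db  # stand-in for a (slow) MySQL query
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Cache miss: fall back to the database, then remember the result.
        self.misses += 1
        value = self._load_from_db(key)
        if value is not None:
            self._cache[key] = value
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after the database row changes, so stale data is not served."""
        self._cache.pop(key, None)


# Hypothetical backing store playing the role of MySQL.
_fake_db = {"user:1": "Alice", "user:2": "Bob"}
cache = CacheAside(_fake_db.get)

first = cache.get("user:1")   # miss: goes to the database
second = cache.get("user:1")  # hit: served from memory
```

In this sketch only the first read touches the database; repeated reads of the same key are answered from memory until the key is invalidated.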

Haystack

Haystack is Facebook's high-performance photo storage system, although strictly speaking it is not limited to storing photos. It manages more than 20 billion uploaded photos, and each photo is saved in four different resolutions, so more than 80 billion images are stored in total. Capacity is only part of the problem; performance is just as critical. Facebook serves about 1.2 million photos per second, not counting those delivered by its CDN, which is a staggering number.

Cassandra

Cassandra is a distributed storage system with no single point of failure. It is a poster child for the NoSQL movement, has been open sourced, and is now an Apache project. Facebook uses it for inbox search, and other sites use it as well.

Data analysis

Hadoop Architecture

Hadoop is the most popular open source framework for distributed and parallel computing today. It provides both a distributed file system (HDFS) for storage and a MapReduce engine for computation, so that clusters of many machines can store and process large-scale data sets. Facebook is a loyal Hadoop user and a contributor to its source code. Facebook has also contributed two important projects to the Apache ecosystem: Hive, which began as a Hadoop subproject, and Thrift, which is a separate Apache project.

Hive

Hive originated at Facebook. It makes it possible to run SQL-style queries on Hadoop, so that non-programmers can also use it easily. Hive is a data warehouse tool built on Hadoop: it maps structured data files to database tables, provides an SQL-like query language (HiveQL), and converts the statements into MapReduce tasks.
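To make the SQL-to-MapReduce idea concrete, the following sketch shows how a query such as `SELECT country, COUNT(*) FROM visits GROUP BY country` could be expressed as map, shuffle, and reduce steps. It runs in a single process and is only an illustration of the concept: the `visits` table, its columns, and the function names are assumptions, and Hive's real query planner is far more involved.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, str]


def map_phase(rows: Iterable[Row]) -> Iterator[Tuple[str, int]]:
    """Emit (group key, 1) for every row, like the mapper of a GROUP BY count."""
    for row in rows:
        yield row["country"], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values for each key, which yields COUNT(*) per group."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input table.
visits = [
    {"user": "a", "country": "US"},
    {"user": "b", "country": "DE"},
    {"user": "c", "country": "US"},
]

counts = reduce_phase(shuffle(map_phase(visits)))
```

On a real cluster the map and reduce steps would run on many machines over data stored in HDFS; the structure of the computation is the same.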

ZooKeeper, Thrift

Hadoop's subprojects also include ZooKeeper, a distributed coordination service that is commonly used to implement distributed locks and provides functions similar to Google's Chubby. Thrift is a cross-language RPC and serialization framework that supports multiple languages, such as PHP and Ruby.

BigPipe

BigPipe is a dynamic web page serving system developed by Facebook. To achieve the best performance, it breaks each web page into blocks called "pagelets" and processes them separately; the chat window and the news feed, for example, are each transmitted as their own block. Pagelets can be generated in parallel, which improves performance and also means that the rest of the page remains usable even if one pagelet fails or is interrupted.
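The pagelet idea can be sketched as follows. This is a simplified illustration, not BigPipe itself: the `render_page` function, the pagelet names, and the use of a thread pool are assumptions made for the example. Each pagelet is generated in parallel and sent as soon as it is ready, and a pagelet that fails is left empty instead of breaking the page.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterator


def render_page(pagelets: Dict[str, Callable[[], str]]) -> Iterator[str]:
    """Yield the page skeleton first, then each pagelet as it finishes."""
    yield "<html><body>"
    with ThreadPoolExecutor() as pool:
        # Start generating every pagelet at once.
        futures = {pool.submit(fn): name for name, fn in pagelets.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                html = future.result()
            except Exception:
                html = ""  # a failed pagelet stays empty; the others still render
            yield f'<div id="{name}">{html}</div>'
    yield "</body></html>"


# Hypothetical pagelet generators.
def chat() -> str:
    return "chat window"


def news_feed() -> str:
    return "news feed"


def broken() -> str:
    raise RuntimeError("backend unavailable")


chunks = list(render_page({"chat": chat, "feed": news_feed, "ads": broken}))
```

The order of the pagelet chunks depends on which one finishes first, so only the opening and closing chunks have a fixed position.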
