An inventory of Facebook's data business

With Facebook going public, its big data operations have attracted growing attention. Its newly built data center in Prineville, Oregon, has been called the world's most energy-efficient data center. What are the specific features of Facebook's data business? Here is a brief overview.

Data collection

Timeline

Timeline, released in December 2011, is mainly a redesign of the Profile. A Facebook Profile is a person's record and information on the site, or in plain terms, a personal homepage. The new Profile is more visually striking than earlier versions. The Timeline interface organizes what a user has posted on Facebook, such as status updates, pictures, and videos, and displays it in a more structured way, like an autobiography on Facebook.

Like Button

The Like button lets users mark pages they like so that those pages can appear in Facebook's search results, much as Google uses links between pages to determine search rankings. Facebook said: "As long as users click the 'Like' button, all websites that support the Open Graph protocol will be displayed in the search engine." By using the Open Graph protocol to widen the scope of its search index, Facebook poses a potential threat to Google.
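To make "websites that support the Open Graph protocol" more concrete: such pages describe themselves with `<meta property="og:..." content="...">` tags in their HTML head. The following is a minimal sketch of how an indexer might read those tags. The sample page, the class name `OpenGraphParser`, and the function `extract_open_graph` are assumptions made for illustration, not Facebook's actual crawler.

```python
from html.parser import HTMLParser
from typing import Dict


class OpenGraphParser(HTMLParser):
    """Collects og:* properties from <meta> tags (illustrative helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags can carry Open Graph properties.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property")
        content = attr_map.get("content")
        if prop and prop.startswith("og:") and content is not None:
            self.properties[prop] = content


# Hypothetical page used only for this example.
SAMPLE_PAGE = """
<html><head>
<meta property="og:title" content="Example Movie" />
<meta property="og:type" content="video.movie" />
<meta property="og:url" content="https://example.com/movie" />
<meta name="description" content="not an Open Graph tag" />
</head><body></body></html>
"""


def extract_open_graph(html: str) -> Dict[str, str]:
    """Return every og:* property found in the given HTML."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


og = extract_open_graph(SAMPLE_PAGE)
print(og)
```

Only the three `og:` tags are collected; the ordinary `description` tag is ignored because it has no `property` attribute.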

Data storage

Memcached

Memcached is a distributed in-memory caching system. Facebook uses it as a cache layer between its web servers and its MySQL servers, because database access is relatively slow. Over the years, Facebook has made many optimizations to Memcached and the software around it, including optimizations to the network stack. At any given time, Facebook keeps tens of terabytes of data cached across thousands of Memcached servers, which may make it the largest Memcached cluster in the world.
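The role of a cache layer between web servers and a database can be sketched with the common "cache-aside" read pattern: check the cache first, and go to the database only on a miss. This is a minimal illustration under stated assumptions. `TinyCache` is an in-process stand-in written for this example, not a real Memcached client, and `slow_db_lookup` is a hypothetical placeholder for a MySQL query.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TinyCache:
    """In-process stand-in for a cache server (hypothetical, not a Memcached API)."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired entries count as misses and are dropped.
            del self._items[key]
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._items[key] = (value, time.monotonic() + ttl_seconds)


def read_through(cache: TinyCache, key: str,
                 load_from_db: Callable[[str], str],
                 ttl_seconds: float = 60.0) -> str:
    """Return the cached value, loading and caching it on a miss."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(key)          # slow path: hit the database
    cache.set(key, value, ttl_seconds)
    return value


db_calls: List[str] = []


def slow_db_lookup(key: str) -> str:
    """Hypothetical database query; records each call so misses are visible."""
    db_calls.append(key)
    return f"row-for-{key}"


cache = TinyCache()
first = read_through(cache, "user:42", slow_db_lookup)
second = read_through(cache, "user:42", slow_db_lookup)
print(first, second, db_calls)
```

The second read is answered from the cache, so the database function runs only once for the same key.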

Haystack

Haystack is Facebook's high-performance photo storage system, although strictly speaking it is not limited to storing photos. It has to manage more than 20 billion uploaded photos, and each photo is saved in four different resolutions, so it holds more than 80 billion image files in total. Storing photos at this scale is only part of the job; performance is also critical. Facebook handles about 1.2 million photos per second, a staggering number that does not even include the photos served by the CDN.

Cassandra

Cassandra is a distributed storage system designed to avoid single points of failure. It is a poster child of the NoSQL movement, has been open sourced, and has since become an Apache project. Facebook uses it for inbox search, and other sites use it as well.

Data analysis

Hadoop Architecture

Hadoop is the most popular open source tool for distributed and parallel computing today. It provides a distributed file system for storage, and it can also tie large numbers of machines together into clusters for the distributed storage, processing, and archiving of large-scale data sets. Facebook is a loyal Hadoop user and a contributor to its source code. Facebook has also contributed two important components of the Hadoop ecosystem, Hive and Thrift. Hive began as a Hadoop subproject at Apache, while Thrift is maintained as a separate Apache project.

Hive

Hive originated at Facebook. It makes it possible to run SQL queries on Hadoop, so that non-programmers can use it easily. Hive is a data warehouse tool built on Hadoop: it maps structured data files onto database tables, provides a SQL-like query language, and converts the SQL statements into MapReduce tasks.
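The idea of turning a SQL statement into MapReduce tasks can be illustrated with a simple aggregate query. The sketch below hand-translates one GROUP BY query into a map phase, a shuffle, and a reduce phase in plain Python. The table name `page_views`, the sample rows, and the function names are hypothetical; real Hive compiles queries into Hadoop jobs rather than Python functions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Query being illustrated (SQL-like, hypothetical table):
#   SELECT country, COUNT(*) FROM page_views GROUP BY country;

ROWS: List[Dict[str, str]] = [
    {"user": "a", "country": "US"},
    {"user": "b", "country": "DE"},
    {"user": "c", "country": "US"},
]


def map_phase(rows: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, int]]:
    """Emit (group key, 1) for every row, like a mapper for COUNT(*)."""
    for row in rows:
        yield row["country"], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values for each key, producing one count per group."""
    return {key: sum(values) for key, values in grouped.items()}


counts = reduce_phase(shuffle(map_phase(ROWS)))
print(counts)
```

The GROUP BY column becomes the key emitted by the mapper, and the aggregate function becomes the reducer.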

ZooKeeper, Thrift

Hadoop's subprojects also include ZooKeeper, a coordination service that offers distributed locks and provides functions similar to Google's Chubby. Thrift is a cross-language service interface framework used alongside Hadoop; it supports many languages, such as PHP and Ruby.

BigPipe

BigPipe is a dynamic web page serving system developed by Facebook. To get the best performance, Facebook uses it to split each web page into blocks called "pagelets". The chat window, the news feed, and so on are each generated and delivered as separate pagelets. Pagelets are processed in parallel, which improves performance and also means that users can still use the page normally even if some pagelets fail or are interrupted.
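The two properties described above, parallel processing and isolation of failures, can be sketched with a thread pool. This is a simplified illustration, not BigPipe's actual implementation. The pagelet functions, the `render_page` helper, and the fallback markup are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def render_chat() -> str:
    return "<div id='chat'>chat window</div>"


def render_news_feed() -> str:
    return "<div id='feed'>news feed</div>"


def render_broken() -> str:
    # Simulates a pagelet whose backend is down.
    raise RuntimeError("backend unavailable")


def render_page(pagelets: Dict[str, Callable[[], str]]) -> List[str]:
    """Render pagelets in parallel and collect chunks as each one finishes."""
    chunks = ["<html><body>"]  # the page skeleton goes out first
    with ThreadPoolExecutor(max_workers=len(pagelets)) as pool:
        futures = {pool.submit(fn): name for name, fn in pagelets.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                chunks.append(future.result())
            except Exception:
                # A failed pagelet is replaced by a placeholder; the rest still render.
                chunks.append(f"<div id='{name}'>unavailable</div>")
    chunks.append("</body></html>")
    return chunks


page = render_page({
    "chat": render_chat,
    "feed": render_news_feed,
    "ads": render_broken,
})
print("\n".join(page))
```

The order of the middle chunks depends on which pagelet finishes first, which mirrors the idea that each block is sent as soon as it is ready.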
