
How Do Big MNCs Store, Manage and Manipulate Thousands of Terabytes of Data with High Speed and High Efficiency?

Have you ever wondered how big companies like Facebook, YouTube, Amazon, Instagram, LinkedIn, Google and TikTok store their data, how much data they store, and which technologies they use? Here is a small blog from me to explore this topic.
Nowadays, in this pandemic situation, platforms beyond Facebook, Instagram and WhatsApp are seeing heavy use: Zoom, TikTok, Netflix and others. On the Google Play platform alone, Zoom has been downloaded by 123 million users. How do big companies like Zoom, TikTok, Google and Facebook provide their services all over the world? It is a question most of us have asked, and a big challenge for every one of these companies.
A Day of Data
How much data is generated in a day?
Here are some key daily statistics highlighted:
- 500 Million tweets are sent.
- 294 billion emails are sent.
- 4 Petabytes of data are created on Facebook.
- 4 Terabytes of data are created from each connected car.
- 65 billion messages are sent on WhatsApp.
- 5 billion searches are made.
By 2025, it's estimated that 465 exabytes of data will be created each day globally. That's the equivalent of 212,765,957 DVDs per day.
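
As a rough sanity check on the DVD comparison, here is a minimal Python sketch of the unit conversion (my own illustration, assuming a standard 4.7 GB single-layer DVD):

```python
# Sanity check for the DVD comparison above (a sketch; assumes a
# standard single-layer DVD capacity of 4.7 GB).
EXABYTE = 10**18            # bytes in one exabyte
DVD_CAPACITY = 4.7 * 10**9  # bytes on one DVD (assumed)

print(f"{EXABYTE / DVD_CAPACITY:,.0f} DVDs per exabyte")
# -> 212,765,957, i.e. the quoted DVD count works out to one
#    exabyte's worth of DVDs.
```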
YouTube

Over 2 billion logged-in users visit YouTube each month, and every day people watch over a billion hours of video and generate billions of views.

Facebook

There are 1.49 billion daily active users. 47% of Facebook users access the platform only through mobile. 83% of parents on Facebook are friends with their children. Facebook adds 500,000 new users every day; that is 6 new profiles every second.
Netflix

23% of US adults stream Netflix on a daily basis. In comparison, this number was just 6% in 2011. To put this percentage in context, it equals over 57 million daily Netflix users in the US alone.
Netflix users spent a combined 140 million hours per day watching content in 2017.

Google

A data center normally holds petabytes to exabytes of data. Google processes over 20 petabytes of data per day through an average of 100,000 MapReduce jobs spread across its massive computing clusters (as per figures published in 2017).
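
To give a feel for what a MapReduce job actually does, here is a minimal single-machine Python sketch of the classic word-count pattern (purely illustrative; a real job distributes the map, shuffle and reduce phases across thousands of machines):

```python
# Minimal in-memory sketch of the MapReduce model: map each record to
# (key, value) pairs, group by key, then reduce the values per key.
from collections import defaultdict

def map_phase(record):
    # Emit (word, 1) for every word in the input line.
    for word in record.split():
        yield word.lower(), 1

def reduce_phase(key, values):
    # Sum the counts collected for one word.
    return key, sum(values)

records = ["big data is big", "data about data"]

# Shuffle step: group all mapped values by key.
grouped = defaultdict(list)
for record in records:
    for key, value in map_phase(record):
        grouped[key].append(value)

results = dict(reduce_phase(k, v) for k, v in grouped.items())
print(results)  # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```

The key point is that the map and reduce functions work independently per record and per key, which is exactly what lets a cluster run them in parallel on huge datasets.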
TikTok

TikTok has over 500 million daily active users and a total of over 800 million downloads. The average TikTok user spends 53 minutes per day in the app, and 90% of users open the app every single day.

LinkedIn

- 660 million LinkedIn users, spread over 200 countries, as of November 2019.
- Over 30 million companies have profiles on LinkedIn.
- More than two new members join LinkedIn every second.
The Big Data Concept

The term 'Big Data' sounds like simply a huge amount of data, and that is exactly what it is: data on the scale of exabytes and yottabytes. Companies need storage for this volume of data, but no single storage device is manufactured with a capacity of exabytes or yottabytes. Even if such a device were built, it would work, yet the actual storage problem would remain unsolved: reading and writing that much data on one device would take an enormous amount of time. Why would that be? To understand the real problem with Big Data, consider the simple example below.

Nowadays almost everyone has a laptop or a smartphone. Whether it is a mobile or a computer, every OS needs at least three resources to run: RAM, a CPU and a hard disk. Even transferring a few gigabytes of data from one drive to another takes more than a minute, and that is only gigabytes. Big companies have petabytes of data to move every day. Even if they had a yottabyte-sized storage device, writing petabytes of data to it would still take far too long, because the bottleneck is the I/O speed of a single disk, not its capacity. To solve this problem, most companies use distributed storage, arranged in what is called a Master-Slave Topology.
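
A back-of-the-envelope Python sketch makes the bottleneck concrete (the ~100 MB/s sequential write speed is an assumption for a typical commodity hard disk; real speeds vary):

```python
# Rough time to write 1 PB, assuming ~100 MB/s per commodity disk.
DISK_SPEED = 100 * 10**6   # bytes per second (assumed)
PETABYTE = 10**15          # bytes

seconds = PETABYTE / DISK_SPEED
print(f"1 PB on a single disk: ~{seconds / 86400:.0f} days")           # ~116 days

# The same petabyte split across 1,000 disks writing in parallel:
print(f"1 PB across 1,000 disks: ~{seconds / 1000 / 3600:.1f} hours")  # ~2.8 hours
```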
Master-Slave Topology

Assume the topmost laptop is the master and the three laptops below it are slaves. Each slave laptop has 100 GB of storage, and all three are connected to the master in parallel, so effectively we have divided 300 GB of storage across three laptops. If the master wants to store 300 GB of data, the data is split into three parts: 100 GB goes to each slave, and because the slaves are connected in parallel, all three writes happen at the same time. Storing the data therefore takes roughly one third of the time it would take to write 300 GB to a single hard disk, so the speed of storing and retrieving data improves dramatically, and the storage capacity problem is solved as well. This concept is called Master-Slave Topology, and most companies today use it to tackle the Big Data problem. In the real world, companies use hundreds of thousands of storage devices to build this topology.
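
Here is a toy Python simulation of that split (a sketch of my own; the directory names stand in for slave machines, and a real distributed file system would also replicate each chunk for fault tolerance):

```python
# Toy master-slave split: the master divides the data into equal chunks
# and writes one chunk to each "slave" in parallel.
import os
from concurrent.futures import ThreadPoolExecutor

SLAVES = ["slave1", "slave2", "slave3"]   # stand-ins for three machines

def write_chunk(slave, chunk):
    # Each slave stores its share of the data independently.
    os.makedirs(slave, exist_ok=True)
    with open(os.path.join(slave, "chunk.bin"), "wb") as f:
        f.write(chunk)
    return f"{slave}: stored {len(chunk)} bytes"

data = os.urandom(300)                    # pretend this is 300 GB
size = len(data) // len(SLAVES)
chunks = [data[i * size:(i + 1) * size] for i in range(len(SLAVES))]

# The master hands one chunk to every slave at the same time.
with ThreadPoolExecutor(max_workers=len(SLAVES)) as pool:
    for result in pool.map(write_chunk, SLAVES, chunks):
        print(result)
```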
What is Hadoop, and why does it matter in Big Data?
Apache Hadoop is an open-source Big Data framework used for storing and processing Big Data, and for developing data processing applications in a distributed computing environment. Hadoop-based applications run on large datasets spread across clusters of cheap commodity computers, so you get the computational power of an extensive cluster network at an economically feasible cost. Hadoop's distributed file system allows for concurrent processing and fault tolerance.
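
Hadoop can even run jobs written in languages other than Java through its Streaming interface, which pipes data through any executable that reads stdin and writes stdout. As a minimal sketch (the file names mapper.py and reducer.py are my own, and a production job would need more error handling), the classic word count looks like this:

```python
# mapper.py - reads raw text from stdin and emits one
# "word<TAB>1" pair per word.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word.lower()}\t1")
```

```python
# reducer.py - Hadoop sorts the mapper output by key, so identical
# words arrive on consecutive lines and can be summed in one pass.
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rsplit("\t", 1)
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, 0
    count += int(value)
if current_word is not None:
    print(f"{current_word}\t{count}")
```

Hadoop ships with a hadoop-streaming jar that wires scripts like these into a job, feeding each input split through the mapper and the sorted, grouped output through the reducer.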
Features of Hadoop
- It is best-suited for Big Data analysis
- It is scalable
- It is fault-tolerant
In conclusion
Hadoop is a technology of the future. Sure, it might not be an integral part of every curriculum yet, but it is and will be an integral part of how whole industries work; e-commerce, finance, insurance, IT and healthcare are just some of the starting points.
Thank you.