So in my last post, I was talking about Business Intelligence. One might think that now that I have a good handle on my data, I am all set to make wonderful business decisions and live happily ever after. If only the world were so fair!
Hmm, what do you mean? What happened??
Before we could munch the data (or byte the data), the volume of data increased.
How much? Gigabytes? Terabytes?
Data is pouring in from every conceivable direction: from operational and transactional systems, from scanning and facilities management systems, from inbound and outbound customer contact points, from mobile media and the Web. The following facts will help you understand what I am talking about:
- Wal-Mart handles more than a million customer transactions each day and imports those into databases estimated to contain more than 2.5 petabytes of data.
- Radio frequency identification (RFID) systems used by retailers and others can generate 100 to 1,000 times the data of conventional bar code systems.
- Facebook handles more than 250 million photo uploads and the interactions of 800 million active users with more than 900 million objects (pages, groups, etc.) – each day.
- More than 5 billion people are calling, texting, tweeting and browsing on mobile phones worldwide.
- The Large Hadron Collider at CERN, the European Organization for Nuclear Research, can generate 40 terabytes every second during experiments.
We have officially entered the Big Data era of computing. The hopeful vision of big data is that organizations will be able to harvest and harness every byte of relevant data and use it to make the best decisions. Big data technologies should support not only the ability to collect large amounts of data but, more importantly, the ability to understand it and take advantage of its full value.
Defining big data
Let’s define big data now. Big data is broadly defined as the capture, management, and analysis of data that goes beyond the typical structured data that relational database management systems can query. It often extends to unstructured files, digital video, images, sensor data, log files, and really any data not contained in records with distinct searchable fields. In some sense, the unstructured data is the interesting data, but it is difficult to synthesize into BI or draw conclusions from unless it can be correlated to structured data.
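To make that last point a bit more concrete, here is a tiny Python sketch of what "correlating unstructured data to structured data" could look like. Everything in it is made up for illustration: the `customers` table, the log lines, the `customer=<id>` pattern, and the `events_by_region` helper are all my own assumptions, not something from a real system. The idea is simply that a free-form log line only becomes useful for BI once we can pull out a key that links it to a structured record.

```python
import re

# Hypothetical structured data: rows as they might come from a relational table.
customers = {
    101: {"name": "Alice", "region": "East"},
    102: {"name": "Bob", "region": "West"},
}

# Hypothetical unstructured data: free-form log lines with no searchable fields.
log_lines = [
    "2012-03-01 10:15:02 customer=101 viewed product page /shoes/42",
    "2012-03-01 10:15:09 customer=102 searched for 'rain jacket'",
    "2012-03-01 10:16:30 customer=101 abandoned cart with 2 items",
    "2012-03-01 10:17:00 heartbeat ok",
]

# Assumed convention: a line that concerns a customer contains "customer=<id>".
CUSTOMER_ID = re.compile(r"customer=(\d+)")


def events_by_region(lines, customer_table):
    """Count log events per region by linking each line to a customer record."""
    counts = {}
    for line in lines:
        match = CUSTOMER_ID.search(line)
        if match is None:
            continue  # nothing to correlate this line with
        customer = customer_table.get(int(match.group(1)))
        if customer is None:
            continue  # the id is not in the structured table
        region = customer["region"]
        counts[region] = counts.get(region, 0) + 1
    return counts


region_counts = events_by_region(log_lines, customers)
print(region_counts)  # {'East': 2, 'West': 1}
```

Notice that the "heartbeat" line simply drops out: without a key that ties it to a structured record, there is nothing to draw a conclusion from. Real big data tooling does this at a vastly larger scale, but the correlation step is the same in spirit.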