In Part 1 of this series on Big Data, I received the following comment from Venkataramana:
‘Big’ data that is being buzzed all over seems to mostly refer to unstructured information like Tweets, Facebook updates, Google searches etc. Do you think even the transactional data is being considered ‘Big’? My personal perception is that transactional data, being structured in nature, can always be subject to better filtering and can always be limited to ‘practical’ sizes within the current computing limits.
I am happy that someone is actually reading these blogs and stirring further thought with insightful comments. In this post I want to share some more insights on the Volume aspect of Big Data in response to the comment above.
Let’s define Big Data first…
A reasonable definition of what people refer to as ‘Big Data’ is information that can’t be processed or analyzed using traditional processes or tools. This is due to three factors – Volume, Variety (the dimension touched on in the comment), and Velocity.
Now talking about Volume…
My perception is that the volume of even the so-called structured data (residing in nice, well-defined schemas in some database) is also growing beyond the limit. What puts it beyond the limit is not the storage capacity of the database but how poorly traditional tools are equipped to sift through it and deliver meaningful insights. The following examples will help you understand how the data being generated can increase exponentially:
Taking a smartphone out of its holster generates an event; when a commuter train’s doors open, an event is generated; check in for a flight, swipe a badge at work, buy a song on iTunes, change the TV channel, take an electronic toll route – every one of these generates data. Need more? The St. Anthony Falls Bridge in Minneapolis has more than 200 embedded sensors positioned at strategic points to provide a fully comprehensive monitoring system, where all sorts of detailed data are collected: even a shift in temperature, and the bridge’s concrete’s reaction to that change, is available for analysis.
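A quick back-of-the-envelope calculation shows how fast even a single instrumented structure adds up. The sampling rate and record size below are purely illustrative assumptions on my part (the actual figures for the St. Anthony Falls Bridge deployment are not public in this post), but the arithmetic makes the point:

```python
# Back-of-envelope estimate of daily sensor data volume.
# All figures below are illustrative assumptions, not measurements
# from the actual St. Anthony Falls Bridge deployment.

SENSORS = 200             # sensor count mentioned in the post
READINGS_PER_SECOND = 10  # assumed sampling rate per sensor
BYTES_PER_READING = 64    # assumed record: timestamp, sensor id, value, metadata
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_day = SENSORS * READINGS_PER_SECOND * BYTES_PER_READING * SECONDS_PER_DAY
gib_per_day = bytes_per_day / 2**30
print(f"~{gib_per_day:.1f} GiB per day from one bridge")
```

Even at these modest assumed rates, one bridge produces on the order of 10 GiB a day – and that is before you multiply by every phone, turnstile, and toll booth in the list above.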
As the amount of data available to the enterprise rises, the percentage of it that the enterprise can process, understand, and analyze declines, thereby creating a blind zone. What’s in the blind zone? You don’t know: it might be something great, or maybe nothing at all, but the “don’t know” is the problem (or the opportunity, depending on how you look at it).
To make the problem and the opportunity look “real”, I plan to discuss a business case in my next blog where a “Big Data” solution solved some “Volume” issues for a client. Stay tuned!