Just as the sheer volume and variety of the data we collect and store have changed, so, too, has the velocity at which it is generated and must be handled. A conventional understanding of velocity considers how quickly data arrives and is stored, along with its associated rates of retrieval. Managing all of that quickly matters, and the volumes of data we face are a consequence of how fast the data arrives, but there is more to the idea of velocity, which we will try to explore in this blog.
More and more of the data being produced today has a very short shelf-life, so organizations must be able to analyze it in near real time if they hope to extract insight from it. In traditional processing, you can think of running queries against relatively static data: for example, the query “Show me all people living in the New Jersey flood zone” would produce a single result set to be used as a warning list for an incoming weather pattern. But can we run something like a continuous query that identifies people who are currently “in the New Jersey flood zones” and get continuously updated results as location information from GPS data is refreshed in real time?
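The contrast between a one-shot query and a continuous one can be sketched in a few lines of Python. This is a minimal illustration, not any real streaming engine: the flood-zone bounds, the `in_flood_zone` helper, and the sample GPS updates are all hypothetical, and the warning list is simply re-emitted after every incoming fix.

```python
# Illustrative bounding box standing in for "the New Jersey flood zone".
NJ_FLOOD_ZONE = {"lat": (40.0, 40.5), "lon": (-74.5, -74.0)}

def in_flood_zone(lat, lon, zone=NJ_FLOOD_ZONE):
    """Hypothetical helper: is this GPS fix inside the zone?"""
    return (zone["lat"][0] <= lat <= zone["lat"][1]
            and zone["lon"][0] <= lon <= zone["lon"][1])

def continuous_query(gps_updates):
    """Yield the updated warning list after each GPS fix arrives,
    instead of returning one static result set."""
    current = set()
    for person, lat, lon in gps_updates:
        if in_flood_zone(lat, lon):
            current.add(person)
        else:
            current.discard(person)
        yield set(current)  # a fresh snapshot per event

# Simulated stream of (person, latitude, longitude) fixes.
updates = [
    ("alice", 40.2, -74.3),   # enters the zone
    ("bob",   41.0, -73.0),   # outside the zone
    ("alice", 39.0, -74.3),   # leaves the zone
]
for snapshot in continuous_query(updates):
    print(sorted(snapshot))
```

Because the query state is updated per event, the warning list stays current as people move in and out of the zone, rather than freezing at the moment the query was issued.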
Dealing effectively with Big Data may require that we perform analytics against the volume and variety of data while it is still in motion, not just after it is at rest. Consider examples ranging from tracking neonatal health to monitoring financial markets; every one of them requires handling the volume and variety of data in new ways.
The velocity of large data streams powers the ability to parse text, detect sentiment, and identify new patterns. Real-time offers in a world of engagement require fast matching and immediate feedback loops, so that promotions align with geolocation data, customer purchase history, and current sentiment. Key technologies that address velocity include stream processing and complex event processing.
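A toy sketch can show the matching step behind such real-time offers: an event fires an offer only when location, purchase history, and sentiment line up at the same moment. Everything here is assumed for illustration: the `near_store` helper, the in-memory `PURCHASES` and `SENTIMENT` lookups, and the sample event stream; a real complex event processing engine would evaluate such conditions over time windows at scale.

```python
# Hypothetical reference data: purchase history and a sentiment score
# (e.g. derived from recent social posts) per customer.
PURCHASES = {"alice": ["espresso", "latte"], "bob": []}
SENTIMENT = {"alice": 0.8, "bob": -0.2}

def near_store(event, store=(40.0, -74.0), radius=0.1):
    """Hypothetical geo check: is the event's fix within the store's radius?"""
    return (abs(event["lat"] - store[0]) <= radius
            and abs(event["lon"] - store[1]) <= radius)

def match_offer(event):
    """Return a promotion only when all three signals coincide:
    nearby location, prior purchases, and positive sentiment."""
    cust = event["customer"]
    if near_store(event) and PURCHASES.get(cust) and SENTIMENT.get(cust, 0) > 0.5:
        return f"10% off your next {PURCHASES[cust][-1]}, {cust}!"
    return None

# Simulated location-event stream.
stream = [
    {"customer": "alice", "lat": 40.05, "lon": -74.02},
    {"customer": "bob",   "lat": 40.05, "lon": -74.02},
]
offers = [o for o in (match_offer(e) for e in stream) if o]
print(offers)  # alice qualifies; bob is nearby but has no purchase history
```

The point of the sketch is the conjunction: any one signal alone (location, history, or sentiment) is a weak trigger, but the fast intersection of all three is what makes the offer timely and relevant.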
Sites worth visiting: