Stream Computing and need for it:
Stream Computing refers to Analysis of data in motion (before it is stored).
Stream computing becomes essential with certain kinds of data when there is no time to store it before acting on it. This data is often (but not always) generated by a sensor or other instrument. Speed matters for use cases such as fraud detection, emergent healthcare, and public safety where insight into data must occur in real time and decisions to act can be life-saving.
Streaming data might arrive linked to a master data identifier (a phone number, for example). In a neonatal ICU, sensor data is clearly matched to a particular infant. Financial transactions are unambiguously matched to a credit card or Social Security number. However, not every piece of data that streams in is valuable, which is why the data should be analyzed by a tool such as IBM InfoSphere Streams before being joined with the master data.
IBM InfoSphere Streams
IBM InfoSphere Streams analyzes these large data volumes with Micro-Latency. Rather than accumulating and storing data first, the software analyzes data as it flows in and identifies conditions that trigger alerts (such as outlier transactions that a bank flags as potentially fraudulent during the credit card authorization process). When this situation occurs, the data is passed out of the stream and matched with the master data for better business outcomes. InfoSphere Streams generates a summary of the insights that are derived from the stream analysis and matches it with trusted information, such as customer records, to augment the master data.
IBM InfoSphere Streams is based on nearly 6 years of effort by IBM Research to extend computing technology to rapidly do advanced analysis of high volumes of data, such as:
- Analysis of video camera data for specific faces in law enforcement applications
- Processing of 5 million stock market messages per second, including trade execution with an average latency of 150 microseconds
- Analysis of test results from chip manufacturing wafer testers to determine in real time if there are failing chips and if there are patterns to the failures
InfoSphere Streams provides a development platform and runtime environment where you can develop applications that ingest, filter, analyze, and correlate potentially massive volumes of continuous data streams that are based on defined, proven, and analytical rules that alert you to take appropriate action, all within an appropriate time frame for your organization.
The data streams that are consumable by InfoSphere Streams can originate from sensors, cameras, news feeds, stock tickers, or various other sources, including traditional databases. The streams of input sources are defined and can be numeric, text, or non-relational types of information, such as video, audio, sonar, or radar inputs. Analytic operators are specified to perform their actions on the streams.