I have written a couple of blog posts explaining ETL. In this post, I describe how an ETL tool (like DataStage) and predictive analytics software (like SPSS) can be integrated to deliver results to the customer with lower latency. With tools like SPSS, you can predict with confidence what will happen next, so that you can make smarter decisions, solve problems and improve outcomes. Without the ability to delve further into your data, can you truly know what your customers will do next? How their tastes have changed? Or when they’re looking to abandon you for a competitor?
A typical pattern for predictive analytics today is to extract data from data warehouses into a separate data mart and then apply predictive models to obtain valuable insights. The results of the analytics are then fed to decision makers or back into operational systems. A key characteristic of running analytics software this way is that it is a batch operation: the analytic model is built once and then applied to large amounts of data in batch. The main disadvantage of today’s approach is that data is read and transformed multiple times before the end application uses it: once while it is loaded into the data warehouse or data mart, and again later when it is extracted from the data mart for processing by the analytic models.
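To make the cost of the batch pattern concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: transform, score and batch_pattern are hypothetical stand-ins, not DataStage or SPSS APIs, and the "model" is a trivial threshold rule. The point is simply that every record is read twice before its score exists.

```python
# Hypothetical illustration: none of these names are DataStage or SPSS APIs.

def transform(record):
    """Stand-in for ETL cleansing: convert the spend field to a float."""
    return {"customer_id": record["customer_id"], "spend": float(record["spend"])}


def score(record):
    """Stand-in for a predictive model: flag low spenders as churn risks."""
    return {**record, "churn_risk": record["spend"] < 100.0}


def batch_pattern(source_rows):
    """Load first, score later, and count how often a record is read."""
    reads = 0

    # Pass 1: the ETL job reads the source and loads the data mart.
    mart = []
    for row in source_rows:
        reads += 1
        mart.append(transform(row))

    # Pass 2: a separate analytics job reads the mart again to score it.
    scored = []
    for row in mart:
        reads += 1
        scored.append(score(row))

    return scored, reads


rows = [{"customer_id": 1, "spend": "50"}, {"customer_id": 2, "spend": "250"}]
scored, reads = batch_pattern(rows)
print(reads)  # two records, each read once per pass
```

In a real deployment the second pass usually runs on a schedule, so the scores also lag behind the load by however long that schedule is.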
A more efficient way to perform this type of end-to-end operation is to run the analytic models as part of importing new data into the warehouse (or exporting it from the warehouse). For this to be possible, there must be a mechanism to start the SPSS model from within the context of an InfoSphere DataStage job. With such a mechanism, the analytic model can be applied to the data as it is ingested into the warehouse or mart, and the output can be stored directly in the resulting tables. Once the output of the statistical model is available in the data warehouse or data mart, business applications such as reporting tools and marketing campaigns can use it readily, without the need for a separate analytic step.
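The integrated approach can be sketched the same way. Again, this is a rough illustration and not how DataStage actually invokes SPSS: transform, score and load_with_scoring are hypothetical names, the model is a simple threshold, and an in-memory SQLite table named customer_scores stands in for the warehouse. What it shows is the shape of the idea: each record is cleansed, scored and written in a single pass.

```python
import sqlite3

# Hypothetical illustration: none of these names are DataStage or SPSS APIs.


def transform(record):
    """Stand-in for ETL cleansing: convert the spend field to a float."""
    return {"customer_id": record["customer_id"], "spend": float(record["spend"])}


def score(record):
    """Stand-in for a predictive model: 1 if the customer looks at risk."""
    return 1 if record["spend"] < 100.0 else 0


def load_with_scoring(source_rows, conn):
    """Cleanse, score and store each record in one pass over the source."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_scores ("
        "customer_id INTEGER PRIMARY KEY, spend REAL, churn_risk INTEGER)"
    )
    for row in source_rows:
        clean = transform(row)
        conn.execute(
            "INSERT INTO customer_scores VALUES (?, ?, ?)",
            (clean["customer_id"], clean["spend"], score(clean)),
        )
    conn.commit()


rows = [{"customer_id": 1, "spend": "50"}, {"customer_id": 2, "spend": "250"}]
conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
load_with_scoring(rows, conn)

# A reporting tool can query the scores directly, with no separate scoring job.
at_risk = [
    r[0]
    for r in conn.execute(
        "SELECT customer_id FROM customer_scores WHERE churn_risk = 1"
    )
]
print(at_risk)
```

Because the score is written alongside the cleansed record, downstream consumers only ever need a plain query against the target table.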