In my last blog post, I started with two questions.
1. What are some things to consider when deploying an ETL service on the cloud?
2. Are ETL services already available for enterprises, or is this just a theory?
My last post covered the first question; in this post, I want to dwell on the second.
Need for ETL Services on Cloud:
With this explosion of data, the opportunities available to enterprises are booming. But enterprises are also being flooded with ever more data, much of it unknown and unproven, and so they are taking on a whole new scope of risks and complexities. It’s not enough to just capture the data. There is a need to control it, clean it, and make refined data readily available to the people driving the business. So can we afford to create this refinery service from scratch? Let’s look at an analogy: how homebuilders get clean water.
Homebuilders don’t build water infrastructure, right? They build the home and the pipes underneath, then join it all together with existing pipework for immediate access to clean water. It’s the same for app developers. They don’t want to govern, clean, and monitor data. They just want to bring clean data directly into their applications. That’s the idea behind continuous data refinement: giving app developers the tools, and the pipes, to build their house.
IBM DataWorks™ Refining Your Data:
IBM DataWorks™ is a cloud-based data refinery that speeds application development by getting you the data you need, when you need it, and ensuring it is fit for purpose. It exposes a set of APIs that follow a standard REST model. These APIs let you interoperate with feature-rich refinery capabilities: identifying relevant data, transforming it to suit your needs, and loading it to a system for use. The IBM DataWorks engine is built for performance and scalability, which is meant to help your application run efficiently.
In IBM DataWorks, you begin by finding the data that you want to work with from data sources like SQL Database and dashDB™. You use metrics to better understand your data quality and identify areas to improve.
To improve the data quality, you work with a sample of the data and apply shaping actions such as sorting, filtering, and joining. Once the result looks right, you can apply the same actions to the full data set and load the data to destinations such as Cloudant® NoSQL DB.
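To make the workflow above concrete, here is a minimal sketch in plain Python of the same idea: measure quality, shape a sample, then apply the identical shaping actions to the full data set. This is not the IBM DataWorks API. The helper names (completeness, filter_rows, join_rows, sort_rows, shape) and the orders/customers data are all hypothetical, chosen only for illustration, and everything runs on in-memory lists rather than real sources like SQL Database or dashDB.

```python
from operator import itemgetter

# Hypothetical in-memory "sources"; a real refinery would read from a database.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 30.0},
    {"order_id": 2, "customer_id": "c2", "amount": None},
    {"order_id": 3, "customer_id": "c1", "amount": 12.5},
    {"order_id": 4, "customer_id": "c3", "amount": 8.0},
]
customers = [
    {"customer_id": "c1", "name": "Acme"},
    {"customer_id": "c2", "name": "Globex"},
]


def completeness(rows, column):
    """Simple quality metric: fraction of rows with a non-null value."""
    if not rows:
        return 0.0
    return sum(r[column] is not None for r in rows) / len(rows)


def filter_rows(rows, predicate):
    """Shaping action: keep only the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def join_rows(left, right, key):
    """Shaping action: inner join of two row lists on a shared key."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def sort_rows(rows, key):
    """Shaping action: sort rows in ascending order of one column."""
    return sorted(rows, key=itemgetter(key))


def shape(order_rows, customer_rows):
    """The recipe of shaping actions, defined once and reused."""
    valid = filter_rows(order_rows, lambda r: r["amount"] is not None)
    joined = join_rows(valid, customer_rows, "customer_id")
    return sort_rows(joined, "amount")


# 1. Understand the data quality before shaping.
amount_completeness = completeness(orders, "amount")

# 2. Try the recipe on a small sample first.
preview = shape(orders[:2], customers)

# 3. Apply the same recipe to the full data set; loading it to a
#    destination would be the next step and is left out here.
refined = shape(orders, customers)
```

The point of keeping the recipe in a single shape function is that the actions you tune on the sample are exactly the ones that run on the full data set, so the preview is a fair predictor of the final result.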
For more information, visit IBM DataWorks.
Note: IBM Bluemix Is Now IBM Cloud