In my previous blog, I described how Data Virtualization can be achieved through Federation, Consolidation, Replication and Caching. Consolidation is an ETL batch process that traditionally runs at the end of each business day, and I covered it in some detail in an earlier blog. In this blog, I want to spend more time on ‘Replication’ in general, and then focus on the specifics of InfoSphere CDC towards the end.
When do we need Replication, or incremental data delivery?
When a business needs up-to-the-minute or near real-time information, it opts for this method of data delivery, which covers both replication and change data capture (CDC). Replication moves data from database to database to support (a) continuous business availability, (b) live reporting, and (c) database or platform migrations. With change data capture, the target is not necessarily a database: in addition to the replication use cases, CDC can feed changes into an ETL process or deliver them to a downstream application through a message queue.
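To make the difference between the two kinds of target concrete, here is a minimal Python sketch in which one stream of change records is delivered both to a replica table (replication) and to a message queue feeding a downstream application (CDC with a non-database target). The `ChangeRecord` format and the target classes are hypothetical illustrations, not InfoSphere CDC's API; the replica is modelled as an in-memory dict and the queue as Python's `queue.Queue` to keep the focus on the delivery model.

```python
import queue
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Protocol


@dataclass(frozen=True)
class ChangeRecord:
    """One captured row change (hypothetical format for illustration)."""
    op: str               # "INSERT", "UPDATE" or "DELETE"
    key: int              # primary key of the changed row
    value: Optional[str]  # new column value; None for deletes


class ChangeTarget(Protocol):
    def deliver(self, change: ChangeRecord) -> None: ...


class ReplicaTarget:
    """Replication: keep a copy of the source table in sync (a dict stands in for the table)."""

    def __init__(self) -> None:
        self.rows: Dict[int, Optional[str]] = {}

    def deliver(self, change: ChangeRecord) -> None:
        if change.op == "DELETE":
            self.rows.pop(change.key, None)
        else:
            # INSERT and UPDATE both leave the row holding the new value.
            self.rows[change.key] = change.value


class QueueTarget:
    """CDC to a non-database target: publish each change for a downstream application."""

    def __init__(self) -> None:
        self.q: "queue.Queue[ChangeRecord]" = queue.Queue()

    def deliver(self, change: ChangeRecord) -> None:
        self.q.put(change)


def deliver_all(changes: Iterable[ChangeRecord], targets: List[ChangeTarget]) -> None:
    """Push every change, in capture order, to every subscribed target."""
    for change in changes:
        for target in targets:
            target.deliver(change)


changes = [
    ChangeRecord("INSERT", 1, "alice"),
    ChangeRecord("UPDATE", 1, "alice@example.com"),
    ChangeRecord("INSERT", 2, "bob"),
    ChangeRecord("DELETE", 2, None),
]
replica, feed = ReplicaTarget(), QueueTarget()
deliver_all(changes, [replica, feed])
print(replica.rows)    # {1: 'alice@example.com'}
print(feed.q.qsize())  # 4
```

The replica ends up holding only the current state of each row, while the queue receives every individual change, which is what a downstream application or ETL process typically wants. Because both targets expose the same `deliver` method, an ETL feed would simply be one more target class.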
Some examples of how Replication is used include the following:
- Providing feeds of changed data for Data Warehouse or Master Data Management (MDM) projects, enabling users to make operational and tactical business decisions using the latest information.
- Dynamically routing data to message queues based on its content, to be consumed by one or more applications, ensuring consistent, accurate, and reliable data across the enterprise.
- Populating real-time dashboards for on-demand analytics, continuous business monitoring, and business process management, integrating information between mission-critical applications and web applications so that customers and employees have access to real-time data.
- Consolidating financial data across systems in different regions, departments, and business units.
- Improving the operational performance of systems that are adversely affected by shrinking nightly batch windows or expensive queries and reporting functions.
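The content-based routing in the second example can be sketched as a list of rules, each pairing a predicate on the change record with a destination queue. A change is published to every queue whose rule matches, so a single change can reach several applications. The rule format, the `region` and `amount` fields and the queue names below are hypothetical, and in-process `queue.Queue` objects stand in for a real message broker.

```python
import queue
from typing import Any, Callable, Dict, List, Tuple

Change = Dict[str, Any]
Rule = Tuple[Callable[[Change], bool], str]  # (predicate, destination queue name)


def route(change: Change, rules: List[Rule],
          queues: Dict[str, "queue.Queue[Change]"]) -> List[str]:
    """Publish the change to every queue whose predicate matches; return the queue names."""
    matched = []
    for predicate, name in rules:
        if predicate(change):
            queues[name].put(change)
            matched.append(name)
    return matched


# Hypothetical routing rules keyed on the content of each change.
rules: List[Rule] = [
    (lambda c: c["region"] == "EMEA", "emea.orders"),
    (lambda c: c["region"] == "APAC", "apac.orders"),
    (lambda c: c["amount"] >= 10_000, "fraud.review"),
]
queues = {name: queue.Queue() for _, name in rules}

print(route({"id": 1, "region": "EMEA", "amount": 250}, rules, queues))
# ['emea.orders']
print(route({"id": 2, "region": "APAC", "amount": 50_000}, rules, queues))
# ['apac.orders', 'fraud.review']
```

Keeping the rules as data rather than hard-coded branches means new consumers can be added by appending a rule, without touching the capture side.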
So what is InfoSphere CDC?
InfoSphere CDC integrates data in near real time by detecting changes through monitoring, or scraping, the database logs. The capture engine (the log scraper) is a lightweight, low-impact process with a small footprint that runs on the source server, where the database changes are detected. When the log scraper finds newly changed data on the source, the source agent pushes that data to the target apply engine over a standard Internet Protocol (IP) network socket. In a typical continuous mirroring scenario, the apply engine applies the change data to the target database through standard SQL statements. Because CDC interacts only with the database logs, no additional load is put on the source database and no changes are required to the source application.
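The capture-and-apply flow described above can be illustrated with a toy Python sketch. It is not how InfoSphere CDC is implemented: the "database log" is a plain Python list of hypothetical change entries, the source agent sends them as JSON lines over a local `socket.socketpair()` standing in for the network connection, and the apply engine turns each entry into a SQL statement against an in-memory SQLite target. The bookmark (the position of the last log entry read) shows how a log scraper can resume where it left off without ever querying the source tables.

```python
import json
import socket
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical source "transaction log": an append-only list of change entries.
source_log: List[Dict] = [
    {"op": "INSERT", "id": 1, "name": "alice"},
    {"op": "INSERT", "id": 2, "name": "bob"},
    {"op": "UPDATE", "id": 1, "name": "alice smith"},
    {"op": "DELETE", "id": 2},
]


def scrape(log: List[Dict], bookmark: int) -> Tuple[List[Dict], int]:
    """Capture engine: read only entries after the bookmark; return them and the new bookmark."""
    return log[bookmark:], len(log)


def send(sock: socket.socket, entries: List[Dict]) -> None:
    """Source agent: push each change to the target as one JSON line."""
    for entry in entries:
        sock.sendall((json.dumps(entry) + "\n").encode("utf-8"))


def apply_change(conn: sqlite3.Connection, entry: Dict) -> None:
    """Apply engine: translate one change into a standard SQL statement."""
    if entry["op"] == "INSERT":
        conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (entry["id"], entry["name"]))
    elif entry["op"] == "UPDATE":
        conn.execute("UPDATE customers SET name = ? WHERE id = ?", (entry["name"], entry["id"]))
    elif entry["op"] == "DELETE":
        conn.execute("DELETE FROM customers WHERE id = ?", (entry["id"],))
    conn.commit()


target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
source_end, target_end = socket.socketpair()

entries, bookmark = scrape(source_log, bookmark=0)
send(source_end, entries)  # small demo payload fits in the socket buffer, so no thread is needed
source_end.close()         # closing the sender signals end of stream to the reader

with target_end.makefile("r", encoding="utf-8") as reader:
    for line in reader:
        apply_change(target, json.loads(line))  # changes are applied in log order
target_end.close()

print(target.execute("SELECT id, name FROM customers ORDER BY id").fetchall())
# [(1, 'alice smith')]

# Later, only entries appended after the bookmark are captured.
source_log.append({"op": "INSERT", "id": 3, "name": "carol"})
later, bookmark = scrape(source_log, bookmark)
print(later, bookmark)
# [{'op': 'INSERT', 'id': 3, 'name': 'carol'}] 5
```

Applying entries strictly in log order is what keeps the target consistent with the source. A production tool also has to deal with transaction boundaries, persisting the bookmark across restarts, and network failures, none of which this sketch attempts.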