In my last blog, we compared IBM’s Information Server and Informatica’s PowerCenter based on their scalability. Here is the summary: Big Data and enterprise-class data environments need unlimited data scalability to keep pace with data volume growth. Informatica’s PowerCenter is NOT designed to provide unlimited data scalability, which may lead to investment in expensive workarounds.
In this blog we will touch upon two other important aspects of ETL tools.
Data Governance and Metadata Management
- IBM provides a data governance solution (Information Governance Catalog) designed for business users.
- Information Governance Catalog offers deep REST API support. This makes Information Server more open and ensures compatibility with other enterprise systems. Users can create custom enhancements and loaders, and can build their own user interfaces for a consistent look and feel.
- Event-based notification takes advantage of open source Apache Kafka messaging. For example, an import of metadata is an “event” that can be monitored for workflow and approval purposes, or simply for notification.
- There is graphical reporting to illustrate relationships, data design origins, and data flow lineage to help answer “what does this mean” and “where did this data come from?”
- Advanced search and navigation give users a “shopping” experience for the data.
- Metadata Asset Manager controls what data goes into the repository. “Import Areas” govern what is imported into the repository (or not), and who is able to import. These imports are initiated via a browser interface. No local Windows installation is required for the metadata administrator.
- Informatica lacks these capabilities and provides a data governance solution designed for technical users. Its platform is less open, and you get locked into an “Informatica Only” architecture.
- IBM provides an integrated data integration platform with one processing engine, one user design experience for data integration and data quality, and one shared metadata repository. Information Server lets you write a DataStage job once and run it anywhere (transactional database, Hadoop, or eventually Spark).
- Informatica provides a collection of multiple, incompatible processing engines, user design experiences, and metadata repositories. Informatica Data Quality and Informatica PowerCenter are two different products with different user interfaces. In fact, PowerCenter needs two interfaces to design jobs and manage workflows. It also uses two engines. This means that Data Quality processes have to be ‘pushed’ or ‘exported’ to PowerCenter to run.
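The event-based notification mentioned in the list above can be pictured with a small sketch. It is only an illustration: the field names (`eventType`, `asset`), the `METADATA_IMPORT` type, and the `route_event` function are assumptions made for this example, not the actual event schema of Information Governance Catalog. The Kafka consumer that would deliver the message is left out so the snippet runs on its own.

```python
import json

# Hypothetical event types that should trigger an approval workflow.
# The real event names are not given in this post.
APPROVAL_EVENTS = {"METADATA_IMPORT"}


def route_event(raw_message: str) -> str:
    """Decide how to handle one event message delivered as JSON text."""
    event = json.loads(raw_message)
    event_type = event.get("eventType", "UNKNOWN")
    if event_type in APPROVAL_EVENTS:
        # Events in the approval set are sent to a review step.
        return f"approval requested for {event.get('asset', 'unknown asset')}"
    # Everything else is treated as a plain notification.
    return f"notification only: {event_type}"


if __name__ == "__main__":
    sample = json.dumps({"eventType": "METADATA_IMPORT", "asset": "sales_db"})
    print(route_event(sample))
```

In a real deployment the JSON text would come from a Kafka topic rather than a hard-coded sample, and the two return values would be replaced by calls into a workflow or messaging system.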
In summary, Information Server is the better choice when we want scalable workflows, an open architecture, and better productivity in designing and running those workflows. Information Server supports the power of 1.
- 1 Engine: The same engine runs stand-alone, in a grid, or natively in Hadoop/YARN. Jobs can remain unchanged regardless of deployment model.
- 1 Design Experience: Single design experience for Data Integration and Data Quality that increases productivity and reduces error.
- 1 Repository: A single active metadata repository across the entire portfolio, so design and execution metadata are instantly shared among team members.