Within Information Server, data profiling and analysis can be performed by Discovery, Information Analyzer, and the Investigate Stage. Many customers have asked us to explain the difference between Information Analyzer and the Investigate Stage (which comes with InfoSphere QualityStage). I have blogged about both of them before. Both look at the actual data and use the parallel framework that comes with InfoSphere Information Server (called PXEngine). In this blog, I will share my understanding of how the two differ. I welcome any thoughts and comments from readers.
Information Analyzer:
- Separate component (client and license) from DataStage/QualityStage
- Typically used by business analysts and subject matter experts (SMEs)
- Provides pre-built analysis functions and robust reporting capability
- Output written to the Information Server common metadata repository
- Summary analysis results can be made available to DataStage/QualityStage
Investigate Stage:
- Parallel stage for building data analysis into DataStage/QualityStage jobs
- Typically used by programmers and application developers
- Requires developing analysis jobs and offers limited reporting capability
- Output written to targets used by the DataStage/QualityStage jobs
- Allows for migrating investigation jobs from earlier QualityStage versions
- Performs the ‘free form’ analysis that neither Discovery nor Information Analyzer does