As promised in the last blog, I am starting a series on InfoSphere QualityStage. IBM® InfoSphere® QualityStage™ provides a methodology and development environment for cleansing and improving data quality in any domain. But first, we need to understand why data cleansing is needed at all.
Any organization’s data warehouse contains valuable information that the organization needs in order to conduct business, whether it is managing customers and products, managing operations, evaluating corporate performance, or providing business intelligence. But to run the business on that data, the organization must be able to trust it. Data is high quality when it is up-to-date, complete, accurate, and easy to use. InfoSphere QualityStage helps you deliver and maintain data quality so that your organization can rely on its corporate data investment.
To make this concrete, here are a few examples of common data problems:
1. Lack of Information Standard: Suppose we merge Car Insurance, Health Insurance, and Life Insurance data from systems that use different formats and structures, and we find the following three records. Since the formats differ, how do we determine whether they represent the same person?
2. Data Surprises in Individual Fields: Data may be stored in the wrong field (for example, a phone number column that contains a tax ID). A field may contain special characters. A field may also hold information that we want to capture separately but do not yet capture (for example, a name field containing "Dr. Rajesh Kabra", where the title is embedded in the name). Or the SSN for 40% of the records may be 999999999.
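To see how the surprises in item 2 might be detected, here is a minimal plain-Python profiling sketch. It is not how QualityStage itself works; the sample records, field names, title list, and regular expressions are all hypothetical, and the tax-ID pattern assumes the US EIN layout NN-NNNNNNN.

```python
import re
from collections import Counter

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"name": "Dr. Rajesh Kabra", "phone": "98-7654321", "ssn": "999999999"},
    {"name": "Rajesh Kabra", "phone": "555-123-4567", "ssn": "123456789"},
    {"name": "R. Kabra#", "phone": "555 123 4567", "ssn": "999999999"},
]

TITLES = {"DR", "MR", "MRS", "MS", "PROF"}          # assumed list of honorifics
TAX_ID = re.compile(r"^\d{2}-\d{7}$")               # assumed EIN-style tax ID
PHONE = re.compile(r"^\d{3}[- ]?\d{3}[- ]?\d{4}$")  # assumed 10-digit phone layout
SPECIAL = re.compile(r"[^A-Za-z .'-]")              # anything unusual in a name


def profile(rows):
    """Count a few common 'surprises' in individual fields."""
    issues = Counter()
    for r in rows:
        # A title such as "Dr." as the first token is extra, uncaptured information.
        first = r["name"].split()[0].rstrip(".").upper()
        if first in TITLES:
            issues["title_in_name"] += 1
        if SPECIAL.search(r["name"]):
            issues["special_chars_in_name"] += 1
        # A phone column holding a tax ID is misplaced data.
        if TAX_ID.match(r["phone"]):
            issues["tax_id_in_phone"] += 1
        elif not PHONE.match(r["phone"]):
            issues["malformed_phone"] += 1
    # One SSN value shared by a large share of rows is likely a placeholder.
    value, count = Counter(r["ssn"] for r in rows).most_common(1)[0]
    issues["top_ssn_share_pct"] = round(100 * count / len(rows))
    return issues, value


if __name__ == "__main__":
    print(profile(records))
```

The design choice here is to count issues rather than reject rows, since profiling is about measuring how widespread each problem is before deciding how to fix it.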
IBM InfoSphere QualityStage helps to identify and resolve these issues for any type of data. It can help you accomplish the following goals:
– Resolve conflicting and ambiguous meanings for data values
– Identify new or hidden attributes from free-form and loosely controlled source fields
– Standardize data to make it easier to find
– Identify duplication and relationships among business entities such as customers, prospects, vendors, suppliers, parts, locations, and events
– Create a single, unique view of each business entity
– Facilitate enrichment of re-engineered data, such as adding information from vendor sources or applying standard postal certification routines
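The standardize-then-match idea behind the insurance example in item 1 can be sketched in plain Python. This is a simplified illustration, not QualityStage's matching algorithm: the records, source names, and per-source date formats are hypothetical, and it groups on an exact key after standardization, whereas real matching tools typically use fuzzy or probabilistic comparisons.

```python
import re
from collections import defaultdict
from datetime import datetime

TITLES = {"DR", "MR", "MRS", "MS", "PROF"}  # assumed list of honorifics

# Hypothetical date format used by each source system.
DATE_FORMATS = {"car": "%d/%m/%Y", "health": "%Y-%m-%d", "life": "%m-%d-%Y"}

# Hypothetical records merged from three insurance systems.
records = [
    {"source": "car", "id": "C-17", "name": "KABRA, RAJESH", "dob": "14/03/1975"},
    {"source": "health", "id": "H-42", "name": "Dr. Rajesh  Kabra", "dob": "1975-03-14"},
    {"source": "life", "id": "L-08", "name": "Rajesh Kabra", "dob": "03-14-1975"},
    {"source": "life", "id": "L-09", "name": "Priya Shah", "dob": "07-01-1980"},
]


def standardize_name(raw):
    """Return (FIRST, LAST), handling 'LAST, FIRST' order and leading titles.

    Assumes the name has at least one non-title token.
    """
    text = re.sub(r"[^A-Za-z, ]", " ", raw).upper()
    if "," in text:
        last, _, first = text.partition(",")
        tokens = first.split() + last.split()
    else:
        tokens = text.split()
    tokens = [t for t in tokens if t not in TITLES]
    return tokens[0], tokens[-1]


def match_key(rec):
    """Build a standardized key: name parts plus an ISO-format birth date."""
    first, last = standardize_name(rec["name"])
    dob = datetime.strptime(rec["dob"], DATE_FORMATS[rec["source"]]).date()
    return (first, last, dob.isoformat())


def group_duplicates(rows):
    """Group record IDs that share the same standardized key."""
    groups = defaultdict(list)
    for rec in rows:
        groups[match_key(rec)].append(rec["id"])
    return dict(groups)


if __name__ == "__main__":
    for key, ids in group_duplicates(records).items():
        print(key, ids)
```

Once each record is standardized into the same shape, three differently formatted records from the car, health, and life systems collapse into one group, which is the starting point for a single view of that customer.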