InfoSphere Quality Stage – I (Need For Data Cleansing)

As promised in the last blog, I am starting a series on InfoSphere Quality Stage. IBM® InfoSphere® QualityStage™ provides a methodology and development environment for cleansing and improving data quality for any domain. But we need to understand the need for data cleansing first.

Any organization’s data warehouse contains valuable information that the organization needs in order to conduct business, whether that means managing customers and products, managing operations, evaluating corporate performance, or providing business intelligence. But to do business, there must be confidence in the data. Data is of high quality when it is up to date, complete, accurate, and easy to use. InfoSphere QualityStage helps you deliver and maintain data quality so that your organization can rely upon its corporate data investment.

To make things easier, I will show some examples to demonstrate some common data problems…

1. Lack of Information Standards: Suppose we merge Car Insurance, Health Insurance, and Life Insurance data from systems that have different formats and structures. Now we find three records that may belong to one customer. Since the formats differ, how do we tell whether they represent the same person?

2. Data Surprises in Individual Fields: Data may be misplaced in the database (like a phone number column holding a tax ID). There could be special characters in a data field. There may be additional information that we want to capture but have not captured yet (like the title in the name "Dr. Rajesh Kabra"). Or maybe the SSN for 40% of the records is 999999999.

3. The Redundancy Nightmare: We may have duplicate records that follow no common standard. How do we identify whether they all refer to one company?
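
To make the field-level surprises from problem 2 more concrete, here is a minimal sketch in plain Python. It is not QualityStage code and does not reflect how QualityStage works internally; the field names, the regular expressions, the sample records, and the helpers `profile_record` and `most_common_share` are all hypothetical and only illustrate the kind of checks a profiling step might run.

```python
import re
from collections import Counter

# Hypothetical patterns; real rules depend on your data and locale.
TITLE_RE = re.compile(r"^(dr|mr|mrs|ms|prof)\.?\s+", re.IGNORECASE)
TAX_ID_RE = re.compile(r"^\d{2}-\d{7}$")         # assumed tax-ID layout
SPECIAL_RE = re.compile(r"[^A-Za-z0-9\s.,'\-]")  # unusual characters in a name


def profile_record(record):
    """Return a list of issue labels for one record (a dict of strings)."""
    issues = []
    name = record.get("name", "")
    if TITLE_RE.match(name):
        issues.append("title_in_name")
    if SPECIAL_RE.search(name):
        issues.append("special_chars_in_name")
    if TAX_ID_RE.match(record.get("phone", "")):
        issues.append("phone_looks_like_tax_id")
    ssn = record.get("ssn", "")
    if ssn and len(set(ssn)) == 1:  # e.g. 999999999 or 000000000
        issues.append("placeholder_ssn")
    return issues


def most_common_share(records, field):
    """Return the most frequent value of a field and its share of all records."""
    if not records:
        return None, 0.0
    counts = Counter(r.get(field, "") for r in records)
    value, count = counts.most_common(1)[0]
    return value, count / len(records)


# Made-up sample data, for illustration only.
sample = [
    {"name": "Dr. Rajesh Kabra", "phone": "12-3456789", "ssn": "999999999"},
    {"name": "Anita R@o", "phone": "555-0100", "ssn": "123456789"},
    {"name": "John Smith", "phone": "555-0101", "ssn": "999999999"},
    {"name": "Mary Jones", "phone": "555-0102", "ssn": "234567890"},
    {"name": "Li Wei", "phone": "555-0103", "ssn": "345678901"},
]

for rec in sample:
    print(rec["name"], profile_record(rec))
print(most_common_share(sample, "ssn"))
```

A single value that accounts for a large share of a column that should be nearly unique, as the SSN does here, is usually a sign of a default or placeholder rather than real data.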

IBM InfoSphere QualityStage helps to identify and resolve all these issues for any type of data. It can help accomplish the following goals:

– Resolve conflicting and ambiguous meanings for data values
– Identify new or hidden attributes from free-form and loosely controlled source fields
– Standardize data to make it easier to find
– Identify duplication and relationships among such business entities as customers, prospects, vendors, suppliers, parts, locations, and events
– Create one unique view of the business entity
– Facilitate enrichment of re-engineered data, such as adding information from vendor sources or applying standard postal certification routines
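
The idea behind two of these goals, standardizing data and identifying duplication, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not QualityStage's matching method: the abbreviation table, the 0.85 threshold, the sample company names, and the helpers `standardize` and `group_duplicates` are hypothetical, and the similarity score comes from `difflib.SequenceMatcher` in the Python standard library.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real one would be far larger.
ABBREVIATIONS = {"corp": "corporation", "inc": "incorporated", "ltd": "limited"}


def standardize(name):
    """Lowercase, drop punctuation, and expand known abbreviations."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", name.lower())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in cleaned.split())


def group_duplicates(names, threshold=0.85):
    """Greedily group names whose standardized forms are similar enough.

    Each name is compared with the first member of every existing group
    and starts a new group if none is similar enough.
    """
    groups = []  # list of (standardized representative, [original names])
    for name in names:
        std = standardize(name)
        for rep, members in groups:
            if SequenceMatcher(None, rep, std).ratio() >= threshold:
                members.append(name)
                break
        else:
            groups.append((std, [name]))
    return [members for _, members in groups]


# Made-up sample data, for illustration only.
companies = ["IBM Corp.", "I.B.M. Corporation", "IBM Corporaton", "Acme Inc"]
for group in group_duplicates(companies):
    print(group)
```

Standardizing first matters: "IBM Corp." and "I.B.M. Corporation" look quite different as raw strings but become identical once punctuation is removed and the abbreviation is expanded, so the fuzzy comparison only has to absorb genuine typos.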

