InfoSphere QualityStage – VII (Matching)

In the last two blogs we went through the need for, and the process of, standardizing input data. Suppose the input data was “Mr. Brian Rumbaugh NY 27/12/1947”. With the help of the Standardize stage we correctly established the following:

First Name: Brian
Last Name: Rumbaugh
City: New York
Country: United States of America
Date Of Birth: 27/12/1947
And so on…

But how confident are we that there is just a single record of Mr. Rumbaugh in the source? It is imperative for the customer to understand this, as their business decisions are based on it. And what will give us that confidence? Is it the Driving License Number in the two records? Or is it the Date of Birth, the residential address, the SSN, or a combination of these? Answering that question is the job of Matching.
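
To make the "combination of these" idea concrete, here is a minimal Python sketch of scoring a pair of records by the fields they agree on. It is only an illustration, not how QualityStage computes its match scores: the field names, the weights, and the sample values (including the SSN) are all made up for the example.

```python
# Hypothetical weights: how much agreement on each field raises our confidence.
# A real project would tune these for its own data.
WEIGHTS = {"ssn": 5.0, "license": 4.0, "dob": 2.0, "address": 1.5}


def match_score(a, b):
    """Add the weight of every field that agrees, subtract it for every
    field that disagrees, and skip fields missing from either record."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        va, vb = a.get(field), b.get(field)
        if not va or not vb:
            continue  # a missing value is neither evidence for nor against
        score += weight if va == vb else -weight
    return score


# Made-up sample records.
rec1 = {"ssn": "123-45-6789", "dob": "27/12/1947", "address": "1 Main St"}
rec2 = {"ssn": "123-45-6789", "dob": "27/12/1947", "address": "5 Oak Ave"}
rec3 = {"dob": "27/12/1947", "address": "9 Elm Rd"}

score_12 = match_score(rec1, rec2)  # SSN and DOB agree, address differs
score_13 = match_score(rec1, rec3)  # only DOB agrees, address differs
```

The pair that shares an SSN and a date of birth scores well above the pair that shares only a date of birth, which is the intuition behind letting several fields vote instead of trusting any single one.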

Data matching finds records in a single data source or independent data sources that refer to the same entity (such as a person, organization, location, product, or material) even if there is no predetermined key.

To increase its usability and completeness, data can be consolidated or linked (matched) along any relationship, such as a common person, business, place, product, part, or event. You can also use matching to find duplicate entities that are caused by data entry variations or account-oriented business practices.

During the data matching stage, IBM InfoSphere QualityStage takes the following actions:

  • Identifies duplicate records (such as customers, suppliers, products, or parts) within one or more data sources.
  • Provides householding for individuals (such as a family or group of individuals at a location) and householding for commercial entities (multiple businesses in the same location or different locations).
  • Enables the creation of match groups across data sources that might or might not have a predetermined key.
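
Householding, from the list above, can be pictured as grouping people who share a location. The sketch below is a rough Python illustration and assumes a household means "same last name at the same address"; that rule, the field names, and the sample people are assumptions for the example, not QualityStage's definition.

```python
from collections import defaultdict


def household_key(rec):
    # Normalise case and whitespace so trivial differences do not split a household.
    return (rec["last_name"].strip().lower(), rec["address"].strip().lower())


def households(records):
    """Return a dict mapping each household key to the first names in it."""
    groups = defaultdict(list)
    for rec in records:
        groups[household_key(rec)].append(rec["first_name"])
    return dict(groups)


# Made-up sample data.
people = [
    {"first_name": "Brian", "last_name": "Rumbaugh", "address": "1 Main St"},
    {"first_name": "Ann", "last_name": "rumbaugh ", "address": "1 main st"},
    {"first_name": "Carl", "last_name": "Smith", "address": "1 Main St"},
]

result = households(people)
```

Brian and Ann land in one household despite the differences in case and spacing, while Carl, who shares only the address, forms his own.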

There are two types of match stage:

  1. One source Match locates and groups all similar records within a single input data source. This process identifies potential duplicate records, which might then be removed. An example is the need to eliminate duplicates from a consolidation of mailing lists purchased from multiple sources.
  2. Reference Match identifies relationships among records in two data sources. An example of many-to-one matching is matching the ZIP codes in a customer file with the list of valid ZIP codes. More than one record in the customer file can have the same ZIP code in it.
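
The difference between the two types can be sketched in a few lines of Python. This is a simplified illustration using exact comparison on a key, not the QualityStage stages themselves; the function names, the sample mailing list, and the ZIP codes are hypothetical.

```python
def one_source_match(records, key):
    """Group records from a single source by key and return only the
    groups with more than one member, i.e. the potential duplicates."""
    groups = {}
    for rec in records:
        groups.setdefault(key(rec), []).append(rec)
    return [group for group in groups.values() if len(group) > 1]


def reference_match(records, reference_zips):
    """Many-to-one match: check every record's ZIP code against a
    reference set and split the records into matched and unmatched."""
    matched = [r for r in records if r["zip"] in reference_zips]
    unmatched = [r for r in records if r["zip"] not in reference_zips]
    return matched, unmatched


# Made-up sample data.
mailing_list = [
    {"name": "brian rumbaugh", "zip": "10001"},
    {"name": "Brian Rumbaugh", "zip": "10001"},
    {"name": "Jane Doe", "zip": "99999"},
]
valid_zips = {"10001", "10002"}

duplicates = one_source_match(mailing_list, key=lambda r: r["name"].lower())
matched, unmatched = reference_match(mailing_list, valid_zips)
```

In the one source case the two Rumbaugh rows come back as a single group of potential duplicates. In the reference case both of them match the same reference ZIP code, which is what makes the relationship many-to-one.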

Here is a sample One source Match job…

[Figure: DeDuplication Match]
[Figure: Individual DeDuplication Match]
