InfoSphere QualityStage – X (Different Kinds of Matches)

In one of my previous blogs, I mentioned the two kinds of matching available in QualityStage: Unduplicate Match and Reference Match. There are three types of Unduplicate Match and four types of Reference Match that a user can select while finding duplicates in enterprise data. In this blog I want to walk through them. It takes a little time and patience to understand the differences between them, so I have tried to keep it simple. Do drop a comment if you would like further explanation.

QualityStage Unduplicate Match
Within each pass for all match types, records that are similar are added to a match set. Select one of the following types according to your preference for the behavior of subsequent passes:

  •     Dependent: The default and most common choice. After the first pass, duplicates are removed from match consideration in subsequent passes.
  •     Independent: Duplicates are included in the subsequent passes.
  •     Transitive: Records that are similar in each subsequent pass are added to the same match set as the previous pass until all similar records are within the same match set.
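
To make the difference between the three types concrete, here is a minimal Python sketch. It is not QualityStage code and not how the product is implemented. It is a toy model under stated assumptions: each pass simply groups records that share one exact field value, the first record in a group acts as the master, and the names `run_pass`, `dependent`, `independent`, `transitive`, `RECORDS`, and `PASSES` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: A and B share a name, B and C share a phone.
RECORDS = {
    "A": {"name": "smith", "phone": "111"},
    "B": {"name": "smith", "phone": "222"},
    "C": {"name": "smyth", "phone": "222"},
    "D": {"name": "jones", "phone": "333"},
}
PASSES = ["name", "phone"]  # pass 1 compares on name, pass 2 on phone


def run_pass(records, ids, key):
    """Toy pass: group the given ids by exact equality on one field."""
    buckets = defaultdict(list)
    for rid in ids:
        buckets[records[rid][key]].append(rid)
    return [grp for grp in buckets.values() if len(grp) > 1]


def dependent(records, keys):
    """Duplicates found in a pass are removed from later passes."""
    eligible = list(records)
    match_sets = {}
    for key in keys:
        for grp in run_pass(records, eligible, key):
            master, dups = grp[0], grp[1:]
            match_sets.setdefault(master, [master]).extend(dups)
            eligible = [rid for rid in eligible if rid not in dups]
    return [sorted(s) for s in match_sets.values()]


def independent(records, keys):
    """Every pass sees every record, including earlier duplicates."""
    return [sorted(grp) for key in keys
            for grp in run_pass(records, list(records), key)]


def transitive(records, keys):
    """Match sets that share a record across passes are merged (union-find)."""
    parent = {rid: rid for rid in records}

    def find(rid):
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    for key in keys:
        for grp in run_pass(records, list(records), key):
            for rid in grp[1:]:
                parent[find(rid)] = find(grp[0])
    merged = defaultdict(list)
    for rid in records:
        merged[find(rid)].append(rid)
    return [sorted(s) for s in merged.values() if len(s) > 1]


print(dependent(RECORDS, PASSES))    # [['A', 'B']]
print(independent(RECORDS, PASSES))  # [['A', 'B'], ['B', 'C']]
print(transitive(RECORDS, PASSES))   # [['A', 'B', 'C']]
```

In this toy data, Dependent never pairs B with C because B was already removed as a duplicate of A after the first pass. Independent finds both pairs but keeps them as separate sets, and Transitive chains them into a single match set.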

QualityStage Reference Match

  •     Many-to-one: Any reference source record can match many data source records.
  •     Many-to-one Multiple: Each reference source record that, when scored against the data record, has the same weight as the matched pair is flagged as a duplicate record.
  •     Many-to-one Duplicate: Like the Many-to-one Multiple option, except that additional reference source records that match to a level above the duplicate cutoff value are flagged as duplicates.
  •     One-to-one: Matches a record on the data source to only one record on the reference source.
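
The four Reference Match types can be sketched in the same spirit. Again, this is a toy illustration and not the QualityStage algorithm. It assumes the pair weights are already computed, it uses a simple greedy pick for One-to-one (the product may resolve assignments differently), and the function `reference_match`, the mode strings, the `WEIGHTS` table, and the cutoff values are all hypothetical.

```python
# Hypothetical precomputed weights: (data record, reference record) -> weight.
WEIGHTS = {
    ("d1", "r1"): 12.0,
    ("d1", "r2"): 12.0,
    ("d1", "r3"): 9.0,
    ("d2", "r1"): 11.0,
    ("d2", "r3"): 4.0,
}


def reference_match(weights, mode, match_cutoff, dup_cutoff=None):
    """Toy reference match.

    mode is one of "many-to-one", "many-to-one-multiple",
    "many-to-one-duplicate" (requires dup_cutoff) or "one-to-one".
    """
    # Keep pairs at or above the match cutoff, best weight first.
    pairs = sorted(
        ((w, d, r) for (d, r), w in weights.items() if w >= match_cutoff),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    result = {}
    used_refs = set()
    for w, d, r in pairs:
        if d in result:
            # The data record already has its matched pair; decide whether
            # this extra reference record is flagged as a duplicate.
            entry = result[d]
            same_weight = w == entry["weight"]
            if mode == "many-to-one-multiple" and same_weight:
                entry["duplicates"].append(r)
            elif mode == "many-to-one-duplicate" and (
                same_weight or w > dup_cutoff
            ):
                entry["duplicates"].append(r)
            continue
        if mode == "one-to-one" and r in used_refs:
            continue  # this reference record is already taken
        result[d] = {"match": r, "weight": w, "duplicates": []}
        used_refs.add(r)
    return result


for mode in ("many-to-one", "many-to-one-multiple",
             "many-to-one-duplicate", "one-to-one"):
    print(mode, reference_match(WEIGHTS, mode, match_cutoff=8.0, dup_cutoff=8.5))
```

With these sample weights, Many-to-one lets both d1 and d2 match r1. Many-to-one Multiple additionally flags r2 as a duplicate for d1 because it ties with the matched pair. Many-to-one Duplicate also flags r3 because its weight is above the duplicate cutoff. One-to-one gives r1 to d1 only, so d2 is left unmatched.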

