Deterministic vs. Probabilistic Match

As explained in an earlier blog post, data matching finds records in a single data source or in independent data sources that refer to the same entity (such as a person, organization, location, product, or material), even if there is no predetermined key. There are two common approaches to deciding whether two similar records match: deterministic matching and probabilistic matching.

[Figure: Deterministic vs. probabilistic matching]

Deterministic matching typically searches for a pool of candidate duplicates and then compares values found in specified attributes between all pairs of possible duplicates. It makes allowances for missing data. The results are given a score, and the scores are used to decide if the records should be considered the same or different. There is a gray area where the scores indicate uncertainty, and such duplicates are usually referred to a data steward for investigation and decision.

Probabilistic matching looks at specified attributes and checks how frequently their values occur in the dataset before assigning scores, so the scores are influenced by the frequencies of the existing values. A threshold can be set to decide whether a pair is a definite match or whether clerical intervention by a data steward is required to determine a match.
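To make the role of value frequency concrete, here is a minimal sketch in Python. It assumes a simple heuristic in which agreement on a value is worth log2(1 / relative frequency), so rare values count for more than common ones. The function name `frequency_weights` and the sample surnames are hypothetical, and this is not necessarily the formula any particular product uses.

```python
import math
from collections import Counter

def frequency_weights(values):
    """Map each distinct value to log2(total / count).

    Assumption for illustration: the rarer a value is in the dataset,
    the more evidence an agreement on that value provides.
    """
    counts = Counter(values)
    total = len(values)
    return {value: math.log2(total / count) for value, count in counts.items()}

# Hypothetical sample: one rare surname among many common ones.
surnames = ["Smith"] * 7 + ["Zabrinski"]
weights = frequency_weights(surnames)

for value, weight in sorted(weights.items()):
    print(f"{value}: {weight:.2f}")
```

Under this assumption, two records that agree on "Zabrinski" earn a much higher weight than two records that agree on "Smith", because a chance agreement on a common value is far more likely.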

In Summary
Deterministic decision tables:

  • Fields are compared
  • Letter grades are assigned
  • Combined letter grades are compared to a vendor-delivered file
  • Result: Match; Fail; Suspect
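The decision-table steps above can be sketched roughly as follows. The grading rules, the field names, and the contents of `DECISION_TABLE` are all assumptions made up for illustration; in a real product the table would come from the vendor-delivered file rather than from code.

```python
# Hypothetical fields compared for every pair of records.
FIELDS = ("name", "birth_date", "postcode")

# Hypothetical stand-in for the vendor-delivered file: it maps a combined
# letter-grade pattern to a result. Patterns not listed are treated as "Fail".
DECISION_TABLE = {
    "AAA": "Match",
    "AAB": "Match",
    "BAA": "Match",
    "AAC": "Suspect",
    "BAC": "Suspect",
    "ACA": "Suspect",
}

def grade_field(a, b):
    """Assign a letter grade to one field comparison (assumed grading scheme)."""
    if not a or not b:
        return "C"  # one or both values missing
    if a == b:
        return "A"  # exact agreement
    if a.strip().lower() == b.strip().lower():
        return "B"  # agreement ignoring case and surrounding whitespace
    return "D"      # disagreement

def decide(rec1, rec2):
    """Combine the per-field grades and look the pattern up in the table."""
    pattern = "".join(grade_field(rec1.get(f), rec2.get(f)) for f in FIELDS)
    return pattern, DECISION_TABLE.get(pattern, "Fail")

a = {"name": "Ann Lee", "birth_date": "1980-01-02", "postcode": "12345"}
b = {"name": "ann lee ", "birth_date": "1980-01-02", "postcode": None}
print(decide(a, b))
```

Because the outcome depends only on which pattern appears in the table, every rule has to be anticipated in advance, which is the main practical limit of this approach.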

Probabilistic record linkage:

  • Fields are evaluated for degree of match
  • A weight is assigned to each field that reflects the information content of its value.
  • Weights are summed to derive a total score.
  • Result: Statistical probability of a match
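As a rough illustration of the weighting steps above, the sketch below uses the classic Fellegi-Sunter style of record linkage: each field has an m-probability (the chance the field agrees when two records truly match) and a u-probability (the chance it agrees when they do not). The probabilities, field names, and cutoffs here are invented for the example and would normally be estimated from the data. The comparison is also simplified to exact agreement, whereas a real implementation would score degrees of match.

```python
import math

# Hypothetical (m, u) probabilities per field, chosen only for illustration.
M_U = {
    "surname": (0.95, 0.01),
    "birth_date": (0.97, 0.003),
    "postcode": (0.90, 0.05),
}

def field_weight(field, agrees):
    """Return the agreement or disagreement weight for one field."""
    m, u = M_U[field]
    if agrees:
        return math.log2(m / u)           # positive: evidence for a match
    return math.log2((1 - m) / (1 - u))   # negative: evidence against a match

def total_score(rec1, rec2):
    """Sum the field weights to derive a total score for a record pair."""
    return sum(field_weight(f, rec1.get(f) == rec2.get(f)) for f in M_U)

def classify(score, match_cutoff=10.0, clerical_cutoff=0.0):
    """Apply two assumed cutoffs: match, clerical review, or non-match."""
    if score >= match_cutoff:
        return "match"
    if score >= clerical_cutoff:
        return "clerical review"
    return "non-match"

r1 = {"surname": "Lee", "birth_date": "1980-01-02", "postcode": "12345"}
r2 = {"surname": "Leigh", "birth_date": "1980-01-02", "postcode": "54321"}
score = total_score(r1, r2)
print(round(score, 2), classify(score))
```

A pair that agrees only on a highly discriminating field, such as the birth date here, lands in the gray area between the two cutoffs and would be routed to a data steward.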

InfoSphere QualityStage can perform both deterministic matching and probabilistic record linkage, but uses probabilistic record linkage by default. The above example highlights the advantage of probabilistic matching.
