As mentioned in my first post in this series, the final step in the QualityStage process is called “Survivorship”.
The Survive stage consolidates duplicate records into a best-of-breed representation of the matched data, which companies can use to load a master data record, cross-populate all data sources, or both.
Data Survivorship performs the following activities:
- Replaces existing data with “better” data from other records based on user-specified rules
- Supplies missing values in one record with values from other records on the same entity
- Populates missing values in one record with values from corresponding records which have been identified as a group in the matching stage
- Enriches existing data with external data
The Survive stage constructs column values from groups of related or duplicate records and stores the column values in the survived record (the best result) from each group.
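To make this concrete, here is a minimal Python sketch of what survivorship does conceptually. It is not QualityStage code. The column names (`match_id`, `name`, `phone`, `email`), the `survive` helper, and the "longest non-empty value wins" rule are all assumptions chosen purely for illustration.

```python
from collections import defaultdict

def survive(records, group_key, columns):
    """Build one survived record per match group.

    Illustrative rule (an assumption, not a QualityStage default):
    for each column, keep the longest non-empty value in the group.
    """
    # Collect the records that the matching step placed in the same group.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    survived = []
    for key, members in groups.items():
        best = {group_key: key}
        for col in columns:
            values = [(m.get(col) or "").strip() for m in members]
            # max() returns the first longest value, so ties go to the earliest record.
            best[col] = max(values, key=len)
        survived.append(best)
    return survived

# Hypothetical matched input: records 1 and 2 were identified as duplicates.
records = [
    {"match_id": 1, "name": "J. Smith", "phone": "", "email": "jsmith@example.com"},
    {"match_id": 1, "name": "John Smith", "phone": "555-0100", "email": ""},
    {"match_id": 2, "name": "Ann Lee", "phone": "555-0199", "email": ""},
]

result = survive(records, "match_id", ["name", "phone", "email"])
```

In this sketch the survived record for group 1 takes the fuller name and the phone number from the second record and the email address from the first, so missing values in one record are supplied by the other.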
By storing survivorship records in the target data store for each match process, a common institutional representation of the data can be viewed across the organization. With the data stored this way, each line of business can efficiently access its own data or that of other lines of business.
Using the Survive stage, IBM QualityStage enables us to create rules at:
- The record level
- The logical domain level (e.g. Name, Address, etc.)
- The field level
- Any combination of the above.
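The difference between these levels can be sketched in Python. This is only an illustration under assumptions: the rule names (`pick_record`, `most_frequent`), the `updated` column, and the rules themselves are hypothetical and do not reflect QualityStage's own rule syntax. A record-level rule picks one whole record as the starting point, and field-level rules then override individual columns.

```python
from collections import Counter
from datetime import date

def pick_record(members):
    """Record-level rule (hypothetical): the most recently updated record wins."""
    return max(members, key=lambda m: m["updated"])

def most_frequent(values):
    """Field-level rule (hypothetical): the most common non-empty value wins."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return ""
    return Counter(non_empty).most_common(1)[0][0]

def survive_group(members, field_rules):
    # Start from the record chosen by the record-level rule...
    survivor = dict(pick_record(members))
    # ...then let each field-level rule override its own column.
    for col, rule in field_rules.items():
        survivor[col] = rule([m[col] for m in members])
    return survivor

# One hypothetical group of matched records.
group = [
    {"name": "Jon Smith", "city": "Boston", "updated": date(2020, 1, 5)},
    {"name": "John Smith", "city": "Boston", "updated": date(2019, 3, 1)},
    {"name": "John Smith", "city": "Cambridge", "updated": date(2021, 6, 9)},
]

survivor = survive_group(group, {"name": most_frequent})
```

Here the city and update date come from the newest record, while the name is decided separately by frequency. A domain-level rule would sit in between, taking a related set of columns (say, all address columns) together from the same record.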
The Survive job is the last job in the IBM QualityStage workflow and is usually run after the Unduplicate Match stage job. The output from the Unduplicate Match stage, and in some cases the Reference Match stage, becomes the source data that you use for the Survive stage.
An example of a typical survivorship run, with its input and output, is illustrated below: