IBM’s InfoSphere QualityStage can standardize Indian addresses. In this blog post I will highlight some of its capabilities through a few examples. It can do the following:
- Standardizes Indian addresses (urban, rural, military, etc.) and provides a high degree of consistency in token definitions, producing well-standardized output
- Can be invoked in real time by other applications in the enterprise, so that data standardization is handled in real time
Here is one example of an input address and the standardized output generated for it:
As the example shows, the Standardize stage identified the individual tokens (words) in the input and assigned each one to the appropriate output column.
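To make the idea of tokens and output columns concrete, here is a toy Python sketch of the same two steps: split an address into tokens, then assign each token to an output column. It is not how the India rule set works internally. The lookup tables, the column names (`HouseNumber`, `StreetName`, `AreaName`, `City`, `PinCode`) and the sample address are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical lookup tables for illustration only; the real India rule set
# ships its own classification tables and pattern-action rules.
STREET_TYPES = {"RD": "ROAD", "ROAD": "ROAD", "ST": "STREET", "STREET": "STREET", "MARG": "MARG"}
AREA_TYPES = {"NAGAR", "COLONY", "VIHAR", "LAYOUT"}

PIN_RE = re.compile(r"\d{6}")  # Indian PIN codes are six digits


def tokenize(address: str) -> list[str]:
    """Split on whitespace and commas, upper-case, and drop empty tokens."""
    return [t for t in re.split(r"[\s,]+", address.upper()) if t]


def standardize(address: str) -> dict[str, str]:
    """Assign each token to a (hypothetical) output column."""
    out = {"HouseNumber": "", "StreetName": "", "AreaName": "", "City": "", "PinCode": ""}
    pending: list[str] = []  # words not yet assigned to any column
    for tok in tokenize(address):
        if PIN_RE.fullmatch(tok):
            out["PinCode"] = tok
        elif any(ch.isdigit() for ch in tok) and not out["HouseNumber"]:
            # First non-PIN token containing a digit is taken as the house number
            out["HouseNumber"] = tok
        elif tok in STREET_TYPES:
            # Preceding words plus the normalized street type form the street name
            out["StreetName"] = " ".join(pending + [STREET_TYPES[tok]])
            pending = []
        elif tok in AREA_TYPES:
            # Preceding words plus the area type form the area name
            out["AreaName"] = " ".join(pending + [tok])
            pending = []
        else:
            pending.append(tok)
    # Whatever is left over is treated as the city name
    out["City"] = " ".join(pending)
    return out


print(standardize("12, M G Rd, Gandhi Nagar, Bangalore 560009"))
```

A production rule set covers many more token patterns through its own classification tables and pattern-action rules; this sketch only shows the token-to-column idea.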
We typically use the IndiaAddressSharedContainer shared container in a job that standardizes Indian address and area data. Given any Indian address, the shared container standardizes the input. The shared container is imported along with the Indian address rule sets. Here are some more samples of input addresses vs. standardized addresses:
As I understand it, cleansing Indian addresses with QualityStage has been validated by several Indian customers with a strong rural presence.
Important Links: Shared container for Indian address rule sets
If you want to know how the India rule set works in detail, please check out my developerWorks article.