It has been more than five years since I last wrote about Information Governance. Over that period, some areas of Information Governance have matured considerably, so I thought it was time to revisit the topic. As a simple analogy: what a library does for books, data governance does for data. It organizes data, makes it easy to access, provides the means to check data for validity and accuracy, and makes it understandable to everyone who needs it. With Information Governance in place, organizations can use data to generate insights and are also equipped to meet regulatory mandates (such as GDPR).
There are six sets of capabilities that make up the Information Management & Governance component:
1. Data Lifecycle Management is a discipline that applies not only to analytical data but also to operational, master and reference data within the enterprise. It involves defining and implementing policies on the creation, storage, transmission, usage and eventual disposal of data, to ensure that data is handled in a way that complies with business requirements and regulatory mandates.
2. Master and Entity Data acts as the ‘single source of truth’ for entities – customers, suppliers, employees, contracts etc. Such data is typically stored outside the analytics environment in a Master Data Management (MDM) system, and the analytics environment then accesses the MDM system when performing tasks such as data integration.
3. Reference Data is similar in concept to Master and Entity Data, but pertains to common data elements such as location codes, currency exchange rates etc., which are used by multiple groups or lines of business within the enterprise. Like Master and Entity Data, Reference Data is typically leveraged by operational as well as analytical systems. It is therefore typically stored outside the analytics environment and accessed when required for data integration or analysis.
4. Data Catalog is a repository that contains metadata relating to the data stored in the Analytical Data Lake Storage repositories. The catalog maintains the location, meaning and lineage of data elements, the relationships between them, and the policies and rules relating to their security and management. The catalog is critical for enabling effective information governance and for supporting self-service access to data for exploration and analysis.
5. Data Models provide a consistent representation of data elements and their relationships across the enterprise. An effective Enterprise Data Model facilitates consistent representation of entities and relationships, simplifying management of and access to data.
6. Data Quality Rules describe the quality requirements for each data set within the Analytical Data Lake Storage component and provide measures of data quality that potential consumers can use to determine whether a data set is suitable for a particular purpose. For example, data sets obtained from social media sources are often sparse and therefore ‘low quality’, but that does not necessarily disqualify them from being used. Provided users of the data know about its quality, they can use that knowledge to determine which kinds of algorithms are best applied to it.
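To make Data Lifecycle Management (item 1) concrete, here is a minimal sketch of a retention policy check. The policy names and retention periods are assumptions for illustration, not a standard:

```python
from datetime import date

# Hypothetical retention policies, in days, per data class
# (the class names and periods are assumptions, not a standard).
RETENTION_DAYS = {"transactional": 365 * 7, "clickstream": 90}

def is_expired(data_class: str, created: date, today: date) -> bool:
    """True if the data set has passed its retention period and
    should be flagged for disposal under the lifecycle policy."""
    return (today - created).days > RETENTION_DAYS[data_class]

# Clickstream data from January is past its 90-day window by June;
# transactional data with a 7-year retention period is not.
print(is_expired("clickstream", date(2024, 1, 1), date(2024, 6, 1)))    # True
print(is_expired("transactional", date(2024, 1, 1), date(2024, 6, 1)))  # False
```

A real implementation would evaluate such policies against catalog metadata rather than hard-coded dates, but the shape of the decision is the same.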
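The MDM lookup described in item 2 can be sketched as follows. The in-memory dictionary stands in for the external MDM system; in practice this would be a call to a separate service, and all names here are assumptions, not a specific MDM product API:

```python
# Minimal sketch: enriching a raw record with the MDM 'golden record'
# during data integration. MASTER_CUSTOMERS stands in for an external
# MDM system accessed over an API.
MASTER_CUSTOMERS = {
    "ACME-001": {"name": "Acme Corp", "country": "US"},
    "ACME-002": {"name": "Acme Corp GmbH", "country": "DE"},
}

def resolve_customer(record: dict, mdm=MASTER_CUSTOMERS) -> dict:
    """Look up the customer's golden record and merge it into the row."""
    master = mdm.get(record["customer_id"])
    if master is None:
        raise KeyError(f"Unknown customer id: {record['customer_id']}")
    return {**record,
            "customer_name": master["name"],
            "customer_country": master["country"]}

order = {"order_id": 42, "customer_id": "ACME-001", "amount": 99.5}
enriched = resolve_customer(order)
print(enriched["customer_name"])  # Acme Corp
```

The point is that the analytics pipeline never maintains its own copy of customer identity; it always resolves against the single source of truth.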
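A catalog entry like the one described in item 4 might carry fields along these lines. This is a toy in-memory sketch; the field names and example values are assumptions, not the schema of any particular catalog product:

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str
    location: str                                  # where the data set lives in the lake
    description: str                               # business meaning of the data
    lineage: list = field(default_factory=list)    # upstream sources it was derived from
    policies: list = field(default_factory=list)   # governance rules that apply to it

catalog: dict[str, CatalogEntry] = {}

def register(entry: CatalogEntry) -> None:
    catalog[entry.name] = entry

register(CatalogEntry(
    name="sales_daily",
    location="s3://lake/curated/sales_daily/",
    description="Daily sales totals aggregated by store",
    lineage=["raw_pos_transactions"],
    policies=["retain-7-years", "pii:none"],
))
```

With location, meaning, lineage and policies recorded together, an analyst can find and judge a data set without asking its producer, which is what makes self-service access workable.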
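Finally, the data quality measures of item 6 can be sketched as a simple completeness score per field. The sample records are invented to mirror the sparse social media example above:

```python
def completeness(records: list, fields: list) -> dict:
    """Fraction of records with a non-null value, per field."""
    n = len(records)
    return {f: sum(1 for r in records if r.get(f) is not None) / n
            for f in fields}

# Invented sample: social media posts where text and location are sparse.
social_posts = [
    {"user": "a", "text": "hello", "location": None},
    {"user": "b", "text": None,    "location": None},
    {"user": "c", "text": "hi",    "location": "NYC"},
]
scores = completeness(social_posts, ["user", "text", "location"])
# user is fully populated; location is only one-third populated. A consumer
# can then decide whether the sparse fields matter for their purpose,
# rather than discarding the data set outright.
```

Publishing scores like these alongside the catalog entry turns "low quality" from a verdict into a measurement that each consumer can interpret.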