Information Governance – Revisited

It has been more than five years since I wrote about Information Governance. Over those five years some areas of Information Governance have matured, so I thought of revisiting the topic. In a simple analogy, what a library does for books, data governance does for data. It organizes data, makes it simple to access, provides the means to check its validity and accuracy, and makes it understandable to all who need it. With Information Governance in place, organizations can use data to generate insights and are also equipped for regulatory mandates (like GDPR).

There are six sets of capabilities that make up the Information Management & Governance component:

1. Data Lifecycle Management is a discipline that applies not only to analytical data but also to operational, master and reference data within the enterprise.  It involves defining and implementing policies on the creation, storage, transmission, usage and eventual disposal of data, in order to ensure that it is handled in such a way as to comply with business requirements and regulatory mandates.

2. MDM: Master and Entity Data acts as the ‘single source of the truth’ for entities – customers, suppliers, employees, contracts etc.  Such data is typically stored outside the analytics environment in a Master Data Management (MDM) system, and the analytics environment then accesses the MDM system when performing tasks such as data integration.

3. Reference Data is similar in concept to Master and Entity Data, but pertains to common data elements such as location codes, currency exchange rates etc., which are used by multiple groups or lines of business within the enterprise.  Like Master and Entity Data, Reference data is typically leveraged by operational as well as analytical systems.  It is therefore typically stored outside the analytics environment and accessed when required for data integration or analysis.

4. Data Catalog is a repository that contains metadata relating to the data stored in the Analytical Data Lake Storage repositories.  The catalog maintains the location, meaning and lineage of data elements, the relationships between them and the policies and rules relating to their security and management.  The catalog is critical for enabling effective information governance and for supporting self-service access to data for exploration and analysis.

5. Data Models provide a consistent representation of data elements and their relationships across the enterprise.  An effective Enterprise Data Model facilitates consistent representation of entities and relationships, simplifying management of and access to data.

6. Data Quality Rules describe the quality requirements for each data set within the Analytical Data Lake Storage component, and provide measures of data quality that potential consumers of data can use to determine whether a data set is suitable for a particular purpose.  For example, data sets obtained from social media sources are often sparse and therefore ‘low quality’, but that does not necessarily disqualify a data set from being used.  Provided a user of the data knows about its quality, they can use that knowledge to determine what kinds of algorithms can best be applied to that data.
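
As a rough illustration of item 6, the sketch below computes one possible quality measure, completeness (the fraction of rows with a non-empty value), and compares it against per-column minimums. The function names, the sample records and the thresholds are all assumptions made for this example; they are not part of any particular product.

```python
from typing import Dict, Iterable, List, Mapping, Optional


def completeness(rows: List[Mapping[str, Optional[str]]],
                 columns: Iterable[str]) -> Dict[str, float]:
    """Return, per column, the fraction of rows with a non-empty value."""
    total = len(rows)
    scores: Dict[str, float] = {}
    for column in columns:
        filled = sum(1 for row in rows if row.get(column) not in (None, ""))
        scores[column] = filled / total if total else 0.0
    return scores


def check_rules(scores: Mapping[str, float],
                minimums: Mapping[str, float]) -> Dict[str, bool]:
    """Compare measured completeness against each column's minimum."""
    return {column: scores.get(column, 0.0) >= minimum
            for column, minimum in minimums.items()}


# Hypothetical, deliberately sparse social media records.
posts = [
    {"user": "a", "location": None},
    {"user": "b", "location": "Paris"},
    {"user": "c", "location": ""},
    {"user": "d", "location": None},
]

scores = completeness(posts, ["user", "location"])
# Hypothetical rule set: minimum acceptable completeness per column.
verdict = check_rules(scores, {"user": 0.9, "location": 0.5})
print(scores)
print(verdict)
```

A failed rule here does not reject the data set. It only tells a consumer that, for instance, location-based analysis on these records would be unreliable, while user-level analysis may still be fine.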



InfoSphere DataStage – XVI (Business Glossary)

Let’s now talk about why an enterprise would need a Business Glossary.

As I mentioned in my previous blog, a business glossary is a repository used to communicate and govern the enterprise’s business terms, along with their definitions and the relationships between those terms.
In summary:
  • Business Glossary brings understanding, consistency, and trust in information to any application or context.
  • This authoritative source of information promotes better communication among business and technical teams and aligns cross-team efforts.
  • The line of business uses this centralized information source as a gateway to all information assets to support data governance initiatives.
  • It can automatically associate key business concepts with a vast array of heterogeneous source systems, ETL processes, BI reports, data models, business rules, and more.

Now to IBM InfoSphere Business Glossary. IBM InfoSphere Business Glossary is an interactive, web-based tool that enables users to create, manage, and share controlled vocabulary and information governance controls in a repository called a business glossary. The vocabulary and governance controls define business semantics and enable business leaders and IT professionals to manage enterprise-wide information according to defined regulatory or operational business requirements. IBM InfoSphere Business Glossary Anywhere, its companion module, augments InfoSphere Business Glossary with more ease-of-use and extensibility features.

Business Glossary, Business Glossary browser, and Business Glossary Anywhere support complex enterprise development environments with the following set of capabilities:

Manage business terms and categories
Business Glossary provides a dedicated, web-based user interface for creating, managing, and sharing a controlled vocabulary, including batch editing capabilities. Terms represent the major information concepts in your enterprise, and categories are used to organize them into hierarchies.

Manage stewardship
Stewards are people or organizations with the responsibility for a given information asset. By using Business Glossary, administrators can import steward profiles from external sources, generate and edit profiles in the web interface, and create relationships of responsibility between stewards and business terms or any of the artifacts that are managed by Information Server.

Customize and extend
The needs around business metadata tend to differ from one enterprise to the next. For this reason, there is no “one-size-fits-all” meta-model. In addition to the ability to customize the entry page to the application, administrators can extend the application with custom attributes on business categories and business terms.

It is not enough to simply document business metadata. This information is active in the enterprise, with open access to all members of business and development teams. IBM InfoSphere Business Glossary provides a collaborative environment in which users can evolve this important information asset as the business changes and adapts to market conditions, shifting customer needs and competitive threats.

Contextual search and visibility of business term definitions
Business Glossary Anywhere is an application-independent search window that can be called from any application (such as Microsoft Excel, data modeling tools, reporting applications, and Microsoft Word) and that provides instant access to Business Glossary terms, taxonomies, and stewards.

Simply Browse
Business Glossary browser is an intuitive, read-only web-based interface that requires no training to use. Business users can search and explore the common controlled vocabulary and relationships, identify stewards that are responsible for assets and provide direct feedback.

Business Glossary – A use case

As I promised some time back, here is a blog that will explain the business value of Business Glossary through a simple use case.

Business analysts and subject matter experts can use InfoSphere™ Business Glossary to create and manage a controlled vocabulary and classification system. Such a system enables them to build a common language between business and information technology.

InfoSphere Business Glossary provides a web interface where you can manage the important business aspects of your assets from any computer. With the business glossary, you can create categories and terms, define custom attributes and values, search the glossary, and assign a steward to assets.

The IBM® InfoSphere Information Server metadata repository stores metadata about tools, processes, and data sources. Individual instances of metadata are called “information assets”, or just “assets”. Examples of assets are implemented data resources such as tables and columns, ETL jobs, profiling processes, routines, and functions. “External assets” are assets that are not stored in the metadata repository. External assets can include such items as business process models, Web services, or reports that are in an external asset management system.

The business glossary organizes your metadata into categories that contain terms. Terms can relate to the assets that are stored in the metadata repository or to external assets according to the standards and practices of your enterprise. You can also designate specific users or user groups as stewards who are responsible for particular assets.

For example, suppose you need to work with business analysts to provide information about the purchase patterns of customers in Europe. You use the business glossary to look up the category named European Sales, which contains a term named Customers. That term is related to assets such as the multiple database tables that are associated with the customers of the European sales operation. Then, business analysts and subject matter experts can:

  • browse the European Sales category
  • view its contained terms
  • browse the Customers term to see which database tables or other assets are related
  • see the steward who is responsible for that information
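
To make the relationships in this use case concrete, here is a minimal sketch of the category, term, asset and steward structure as plain Python data classes. It is only an illustration of the concepts. It is not the InfoSphere Business Glossary data model or API, and the class names, table names and steward name below are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    """A business term, its steward, and the assets it is related to."""
    name: str
    definition: str = ""
    steward: Optional[str] = None
    related_assets: List[str] = field(default_factory=list)


@dataclass
class Category:
    """A category that contains terms, keyed by term name."""
    name: str
    terms: Dict[str, Term] = field(default_factory=dict)

    def add_term(self, term: Term) -> None:
        self.terms[term.name] = term


@dataclass
class Glossary:
    """A glossary that organizes categories, keyed by category name."""
    categories: Dict[str, Category] = field(default_factory=dict)

    def add_category(self, category: Category) -> None:
        self.categories[category.name] = category

    def lookup(self, category_name: str, term_name: str) -> Term:
        return self.categories[category_name].terms[term_name]


# Hypothetical content mirroring the European Sales example.
customers = Term(
    name="Customers",
    definition="Parties that have purchased from the European sales operation.",
    steward="Jane Doe",  # hypothetical steward
    related_assets=["EU_SALES.CUSTOMER", "EU_SALES.CUSTOMER_ORDER"],  # hypothetical tables
)
european_sales = Category(name="European Sales")
european_sales.add_term(customers)

glossary = Glossary()
glossary.add_category(european_sales)

term = glossary.lookup("European Sales", "Customers")
print(sorted(european_sales.terms))  # terms contained in the category
print(term.related_assets)           # related database tables
print(term.steward)                  # who is responsible
```

The four steps in the list above map onto this sketch directly: open the category, list its terms, follow a term to its related assets, and read off the steward.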