What's new in IBM InfoSphere Information Server 11.7 – Part 3

In my last blog, we discussed the Information Governance Catalog (IGC). In this blog I wish to touch upon some new Information Governance features that were introduced, along with the new look and feel, in IBM InfoSphere Information Server version 11.7.

Enterprise Search

Social Collaboration
InfoSphere Information Server also brought social collaboration to the domain of Information Governance. When you browse your data, you sometimes want to know what other experts think about critical assets such as reports, source files, and more. Now you can: rate an asset on a scale of one to five stars, and leave a brief comment. This enables all members of your organization to collaborate and share their expertise right where it's needed. Also remember that the more popular an asset is, the higher its position on the search results list.

Searching for assets
With 11.7, searching for assets has become very easy. You don't need to know anything about the data in your enterprise to explore it. Let's assume that you want to find information about bank accounts: simply type 'bank account' in the search field in enterprise search, and that's it. The search engine looks for the information across all asset types. It takes into account factors like text match, related assets, ratings and comments, modification date, quality score, and usage. If you are already familiar with your organization and are looking for something more specific, just open the catalog with your data and select the asset types that you want to browse. To narrow down search results, apply advanced filters like creation and modification dates, stewards, labels, or custom attributes.
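The ranking behaviour described above can be sketched as a weighted scoring function. This is an illustrative model only – the weights and factor names are my assumptions, not IBM's actual ranking algorithm:

```python
from dataclasses import dataclass

@dataclass
class Asset:
    name: str
    text_score: float     # 0..1 keyword-match relevance
    avg_rating: float     # 1..5 stars from social collaboration
    usage_count: int
    quality_score: float  # 0..1 from data quality analysis

def rank(assets, weights=(0.5, 0.2, 0.1, 0.2)):
    """Blend the ranking factors into one score and sort descending."""
    w_text, w_rating, w_usage, w_quality = weights
    # Normalize usage against the most-used asset in this result set
    max_usage = max((a.usage_count for a in assets), default=0) or 1
    def score(a):
        return (w_text * a.text_score
                + w_rating * a.avg_rating / 5.0
                + w_usage * a.usage_count / max_usage
                + w_quality * a.quality_score)
    return sorted(assets, key=score, reverse=True)
```

With this model, a frequently used, well-rated asset outranks an equally relevant but unpopular one – mirroring the "more popular, higher position" behaviour.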

Unstructured data sources
The data in your enterprise consists of databases, tables, columns, and other sources of structured data. What about email messages, word-processing documents, audio or video files, collaboration software, or instant messages? They are also a very valuable source of information. To support a unified approach to enterprise information management, IBM StoredIQ can now be set up to synchronize data with IBM Information Governance Catalog. So now you can classify such information in IGC too.

Exploring Relationships
Data in large organizations can be very complex, and assets can be related to one another in multiple ways. To understand these complex relations better, explore them in graphical form by using the graph explorer. By default, this view displays all relationships of the asset that you select. But this is just the starting point: you can further expand the relationships of this asset's relationships in the same view. Having all this information in one place in a graphical format makes it a lot easier to dig into the structure of your data. Each relationship has a direction and a name. You'll be surprised when you discover how assets are connected!
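A minimal sketch of how such a view could be modelled – each edge carries a direction and a name, and expansion walks outward from the selected asset. The asset and relationship names here are invented for illustration:

```python
# Hypothetical relationship store: (source, relationship name, target)
edges = [
    ("CUSTOMER_TABLE", "implements", "Customer business term"),
    ("CUSTOMER_TABLE", "read by", "LoadCustomers job"),
    ("LoadCustomers job", "writes to", "CUST_DW table"),
]

def neighbors(asset):
    """All relationships of one selected asset (the default view),
    in both directions."""
    return ([(name, dst) for src, name, dst in edges if src == asset]
            + [(name, src) for src, name, dst in edges if dst == asset])

def expand(asset, depth=2):
    """Expand the relationships of the asset's relationships,
    as the graph explorer lets you do interactively."""
    seen, frontier = {asset}, [asset]
    for _ in range(depth):
        frontier = [dst for a in frontier
                    for _, dst in neighbors(a) if dst not in seen]
        seen.update(frontier)
    return seen
```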

To have a look at the new Information Governance Catalog, view this video.

 


What's new in IBM InfoSphere Information Server 11.7 – Part 2

DataStage Flow Designer

As promised in the last blog, here are a few more changes that came with InfoSphere Information Server 11.7. DataStage Flow Designer is the new web-based user interface for IBM's flagship data integration component, IBM DataStage. It can be used to create, edit, load, and run DataStage jobs. Unlike the current DataStage Designer, it does not require any installation on a Microsoft Windows client environment, and is therefore immediately available and easily accessible once DataStage is installed. Moreover, you do not need to migrate jobs to a new location in order to use the new web-based IBM DataStage Flow Designer user interface. Any existing DataStage job can be rendered in IBM DataStage Flow Designer, avoiding complex, error-prone migrations that could lead to costly outages.

DataStage Flow Designer

Here are a few of its features.

  • Search and Quick Tours: Quickly find any job using the built-in search feature. For example, you can search by job name, description, or timestamp to find what you are looking for very quickly. You can also familiarize yourself with the product by taking the built-in quick tour, or watch the “Create your first job” video on the welcome page.
  • Automatic metadata propagation: Making changes to a stage in a DataStage job can be time consuming, because you have to go to each subsequent stage and redo the change. DataStage Flow Designer automatically propagates the metadata to subsequent stages in the flow, increasing productivity.
  • Highlighting of all compilation errors: Today, the DataStage thick client identifies compilation errors one at a time. Compiling big jobs with upwards of 30 or 50 stages is tedious, because errors are highlighted one stage at a time. DataStage Flow Designer highlights all errors and lets you inspect each problem with a quick hover over the stage, so you can fix multiple problems at once before re-compiling.
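The last point – reporting every problem in one pass instead of stopping at the first – can be sketched roughly as follows. The stage fields and the checks are hypothetical, not DataStage's actual compiler logic:

```python
def validate_flow(stages):
    """Walk every stage and collect all problems, instead of
    stopping at the first compilation error."""
    errors = {}
    for stage in stages:
        problems = []
        if not stage.get("input_columns"):
            problems.append("no input columns defined")
        if stage.get("type") == "Transformer" and not stage.get("derivations"):
            problems.append("transformer has no derivations")
        if problems:
            errors[stage["name"]] = problems
    return errors
```

Returning a map of stage name to problem list lets a UI highlight every failing stage at once, which is the productivity win described above.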

In summary, the new browser-based DataStage® Flow Designer is geared toward data engineers, but is versatile and accessible to all business users. This cognitive designer features an intuitive, modern, and security-rich browser-based interface. Users can access DataStage Flow Designer and quickly address their data transformation or preparation needs, without having to rely on a Windows™ desktop environment. Do watch the following video on IBM DataStage Flow Designer.

To know more, please visit the IBM Knowledge Center.
There is a lot more in IBM InfoSphere Information Server 11.7. So stay tuned.

What's new in IBM InfoSphere Information Server 11.7 – Part 1

IBM® InfoSphere® Information Server V11.7 was released last week. In the next couple of blogs, I wish to share how 11.7 is a major milestone for governance functionality. First, let's look at the changes from a very high level before going into detail.

At a Glance
IBM® InfoSphere® Information Server V11.7 accelerates the delivery of trusted and meaningful information to your business with enhanced automation and new design and usage experiences:

Enterprise Search
  • New Enterprise smart search to discover and view enterprise information
  • Automated data discovery and classification powered by machine learning
  • Policy-and-business-classification-driven data quality evaluation
  • New browser-based cognitive design experience for data engineers
  • New and expanded Hadoop data lake and cloud capabilities and connectivity
  • Single and holistic catalog view of information across the information landscape, enabling users to derive insight through a knowledge graph

Unified Governance

Now let's get into some details. InfoSphere® Information Server V11.7 introduces the unified governance platform, a fabric that supports data governance objectives throughout the analytics lifecycle. Unified Governance focuses on the following themes and capabilities to construct a data foundation for the enterprise.

  • Auto Discovery and Classification: For data in traditional repositories or in modern Hadoop data lakes, the ability to catalog data accurately and quickly with minimal user intervention is a key requirement for all modern enterprises. Auto Discovery lets the user point to a data source and ingest metadata from it; Auto Classification is an optional feature, used after discovery, that conducts data profiling and quality analysis.
  • Auto Quality Management: Data quality is a key component of data governance. Automation rules provide a way to associate the evaluation of data quality with the business classification of data, and also a way to automate data quality evaluation. This helps lower the cost of quality evaluation significantly.
  • Enterprise Search: An enterprise wants to leverage its data, but a lot of data goes unused simply because there is no good way to find it. The Knowledge Graph is a self-service user experience that provides information with insight to the business user. It allows a CDO to improve the use of data in business decisions with a high level of confidence that the data is governed. Starting with a simple keyword search, a user can leverage the context of the data and use social collaboration to narrow down the data to be used for analytics or business decision making.
  • Customizable User Experience: This release introduces the ability for an enterprise to customize its users' experience based on roles, and allows each user to tailor the experience to their personal preference.
  • Metadata Integration from StoredIQ into IGC: enables organizations to govern all their information assets (structured and unstructured) in a centralized repository. This is critical to support customers' needs for GDPR.

Watch the following 2 minute video on Unified Governance:

This release introduces key technological innovations as well as open source technology. There has also been a tremendous change in the DataStage Designer, which I will share in the upcoming blog. So stay tuned.

Information Governance – Revisited

It has been more than 5 years since I wrote on Information Governance. Over the last 5 years, some areas of Information Governance have matured, and I thought of revisiting the topic. In a simple analogy, what a library does for books, data governance does for data. It organizes data, makes it simple to access, provides means to check the validity and accuracy of data, and makes it understandable to all who need it. With Information Governance in place, organizations can use data for generating insights, and they are also equipped for regulatory mandates (like GDPR).

There are six sets of capabilities that make up the Information Management & Governance component:

1. Data Lifecycle Management is a discipline that applies not only to analytical data but also to operational, master, and reference data within the enterprise. It involves defining and implementing policies on the creation, storage, transmission, usage, and eventual disposal of data, in order to ensure that it is handled in such a way as to comply with business requirements and regulatory mandates.

2. MDM: Master and Entity Data acts as the 'single source of truth' for entities – customers, suppliers, employees, contracts, etc. Such data is typically stored outside the analytics environment in a Master Data Management (MDM) system, and the analytics environment then accesses the MDM system when performing tasks such as data integration.

3. Reference Data is similar in concept to Master and Entity Data, but pertains to common data elements such as location codes, currency exchange rates, etc., which are used by multiple groups or lines of business within the enterprise. Like Master and Entity Data, Reference Data is typically leveraged by operational as well as analytical systems. It is therefore typically stored outside the analytics environment and accessed when required for data integration or analysis.

4. Data Catalog is a repository that contains metadata relating to the data stored in the Analytical Data Lake Storage repositories. The catalog maintains the location, meaning, and lineage of data elements, the relationships between them, and the policies and rules relating to their security and management. The catalog is critical for enabling effective information governance, and for supporting self-service access to data for exploration and analysis.

5. Data Models provide a consistent representation of data elements and their relationships across the enterprise. An effective Enterprise Data Model facilitates consistent representation of entities and relationships, simplifying management of and access to data.

6. Data Quality Rules describe the quality requirements for each data set within the Analytical Data Lake Storage component, and provide measures of data quality that potential consumers can use to determine whether a data set is suitable for a particular purpose. For example, data sets obtained from social media sources are often sparse and therefore 'low quality', but that does not necessarily disqualify them from being used. Provided users of the data know about its quality, they can use that knowledge to determine what kinds of algorithms can best be applied to that data.
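The idea of a data quality rule producing a measurable score can be sketched very simply – a rule is a predicate applied per record, and the measure is the fraction of records that pass. The records and predicates below are invented examples:

```python
def evaluate_rule(records, field, predicate):
    """Return the fraction of records whose value satisfies the rule."""
    checked = [predicate(r.get(field)) for r in records]
    return sum(checked) / len(checked) if checked else 0.0

rows = [{"age": 34}, {"age": -2}, {"age": None}, {"age": 51}]
# Two example rules: completeness (value present) and validity (value in range)
completeness = evaluate_rule(rows, "age", lambda v: v is not None)
validity = evaluate_rule(rows, "age", lambda v: v is not None and 0 <= v <= 120)
```

A consumer can then compare these scores against their own threshold to decide whether the data set is fit for a given purpose, as the sparse-social-media example above suggests.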

 

Match and Manage your Data on Cloud

We left the last blog with two questions.

A few weeks back I wrote on IBM Bluemix Data Connect. If you missed it, then watch this video on how you can put data to work with IBM Bluemix Data Connect.

Now, Business Analysts will be able to leverage entity matching technology using Data Connect. The Match and Manage (BETA) operation on Data Connect identifies possible matches and relationships (in a plethora of data sets, including master data and non-master data sets) to create a unified view of your data. It also provides a visualization of the relationships between entities in the unified data set.

For example, suppose you have two sets of data: one containing customer profile information and the other containing a list of prospects. A Business Analyst can now use an intuitive UI to run the Match and Manage operation on these two data sets and get insights into questions such as:

  • Are there duplicates in the prospect list?
  • How many of the prospects are already existing customers?
  • Are there non-obvious relationships among prospects and customers that can be explored?
  • Are there other sources of information within that could provide better insights if brought together?

The two data sets are matched using cognitive capabilities, which allow the MDM matching technology to be auto-configured and tuned to intelligently match across different data sets.
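As a rough illustration of what such matching does, here is a naive string-similarity sketch. It stands in for the real MDM matching technology, which is far more sophisticated (multi-attribute, auto-tuned); the names and threshold are invented:

```python
from difflib import SequenceMatcher

def match_records(customers, prospects, threshold=0.85):
    """Pair each prospect with its most similar customer name;
    pairs above the threshold are flagged as likely existing customers."""
    matches = []
    for p in prospects:
        def sim(c):
            return SequenceMatcher(None, c.lower(), p.lower()).ratio()
        best = max(customers, key=sim)
        score = sim(best)
        if score >= threshold:
            matches.append((p, best, round(score, 2)))
    return matches
```

Applied to the customer/prospect example above, this flags near-duplicate names ("Jon Smith" vs. "John Smith") while leaving genuinely new prospects unmatched.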


Business Analysts can understand the de-duplicated data sets by navigating through a relationship graph of the data to see how entities are related across the entire data set. They can now discover non-obvious relationships within the data that were previously undiscoverable. A generated canvas enables them to interactively explore relationships between entities.


The above example illustrates how clients can now easily understand the data they hold within their MDM repositories, and how they can match their MDM data with other data sources not included within the MDM system. This simplifies the analytical MDM experience, making MDM technologies accessible to everyone without the need to wait for data engineers to transform the data into a matchable format, or to rely on MDM ninjas to configure matching algorithms.

Summary:

IBM Bluemix Data Connect provides a seamless, integrated self-service experience for data preparation. With the addition of entity analytics capability, business users are empowered to gain insight from data that wasn't previously available to them. Organizations can now extract further value from their MDM data by ensuring it is used across the organization to provide accurate analytics. Entity analytics within Data Connect is now available in beta. Go ahead and experience the next evolution of MDM.

IA Thin Client – your entry point into the data lake

In one of my previous blogs, I mentioned that a data lake is a set of one or more data repositories created to support data discovery, analytics, ad hoc investigations, and reporting. Some enterprises have invested money and created a data lake, but are not sure how to begin utilizing their data. The IA Thin Client gives the business user or analyst a first grip on the data. Extending the capabilities of Information Analyzer to Hadoop and providing a user-friendly thin client, it helps enterprises get to know their data. Here are a few of its capabilities:
1. Customers can see a listing of all the data in their HDFS file system, preview it, and select a handful of interesting data sets.

2. They can group these interesting data sets into workspaces – say, Customer related, Employee related, Finance related, and so on.

3. The IA Thin Client gives them a dashboard where they can see the overall picture of the data in a particular workspace.

4. From a workspace, you can drill into the details of one of these interesting structured or semi-structured data sets and run data analysis to find out more about the data. This detailed analysis gives insight about the data in an easily understandable way: What is the quality of the data? What is its format? Can the data be classified into one of several known data classifications? Users can also see detailed information for each column of the data (format, any data quality problems observed, data type, min-max values, classification, frequent values, a sampling of actual values, and so on).

5. Using the tool, users can suggest changes to the metadata of the data. For example, after reviewing the results, they may feel that some data formats do not look correct, that the minimum value should have been something else, or that an identified data quality problem can be ignored. These edits also reflect on the overall data quality score.

6. The tool allows users to add a note to the data, or to link an interesting data set to the existing data governance catalog.

7. The tool allows customers to apply existing data rules to the data and see how the data performs against them.

8. Moreover, all this is done in a simple, intuitive, easy-to-use thin client, so that a non-technical person can easily navigate through the data.
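The kind of per-column analysis described in step 4 can be sketched as follows. This is a toy profiler under my own assumptions, not Information Analyzer's actual implementation:

```python
from collections import Counter

def profile_column(values):
    """Sketch of per-column analysis: type inference, min/max,
    completeness, and most frequent values."""
    non_null = [v for v in values if v is not None]
    inferred = ("numeric"
                if non_null and all(isinstance(v, (int, float)) for v in non_null)
                else "string")
    stats = {
        "data_type": inferred,
        "completeness": len(non_null) / len(values) if values else 0.0,
        "frequent": Counter(non_null).most_common(3),
    }
    if inferred == "numeric":
        stats["min"], stats["max"] = min(non_null), max(non_null)
    return stats
```

Running such a profile over every selected column is what lets the dashboard summarize format, quality, and value distributions at a glance.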

You can watch a 4-minute video to get first-hand experience of the tool.


Or see the InfoSphere Information Analyzer thin client presentation, which provides a comprehensive overview of the Information Analyzer thin client.