Need for Governance in Self-Service Analytics

Analytics Offering without Self-Service

Self-service analytics is a form of business intelligence in which line-of-business professionals or data scientists are enabled and encouraged to perform queries and generate reports on their own, with minimal IT support. This empowers everyone in the organization to discover new insights and to make informed decisions. Capitalizing on the data lake, or a modernized data warehouse, they can analyze full data sets (no more sampling), gain insight from non-relational data, and pursue exploratory analysis and discovery with a 360-degree view of the business. At this stage, the organization can be truly data-savvy and insight-driven, leading to better decisions, more effective actions, and improved outcomes. Insight is used to make risk-aware decisions, fight fraud and counter threats, optimize operations, and, most often, attract, grow, and retain customers.

Self-service analytics, regardless of persona, has to involve data governance. Here are three examples of how serious analytics work would be impossible without support for a proper data governance practice in the analytics technology:

  1. Fine-grained authorization controls: Most industries have data sets where access must be controlled so that sensitive data is protected. As data moves from one store to another and is transformed and aggregated, the authorization information needs to move with it. Without the transfer of authorization controls as data changes state or location, self-service analytics would not be permitted under the applicable regulatory policies.
  2. Data lineage information: As data moves between storage layers and changes state, its lineage must be captured and stored. This helps analysts understand what goes into their analytic results, and it is also a policy requirement in many regulatory frameworks. One example is the right to be forgotten, a legislative initiative we are seeing in some Western countries: any trace of information about a citizen would have to be tracked down and deleted from all of an organization’s data stores. Without a comprehensive data lineage framework, adherence to a right-to-be-forgotten policy would be impossible. (A minimal sketch of how such metadata can travel with data follows this list.)
  3. Business glossary: A current and complete business glossary acts as a roadmap for analysts to understand the nature of an organization’s data. Specifically, a business glossary maps the organization’s business concepts to its data schemas. One common problem with Hadoop data lakes is the lack of business glossary information, because Hadoop has no proper set of metadata and governance tooling.
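The idea that governance metadata must travel with data can be illustrated with a small, hypothetical sketch. The GovernedDataset class below is invented for this post (it is not part of any IBM offering); it simply shows authorization tags and lineage records being copied forward through every transformation, which is the behavior the first two points above require.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

@dataclass
class GovernedDataset:
    """A toy dataset that carries its governance metadata with it."""
    name: str
    rows: List[dict]
    allowed_roles: Set[str] = field(default_factory=set)  # fine-grained authorization
    lineage: List[str] = field(default_factory=list)      # ordered transformation history

    def transform(self, new_name: str, fn: Callable[[List[dict]], List[dict]]) -> "GovernedDataset":
        """Apply a transformation; authorization tags and lineage move with the data."""
        return GovernedDataset(
            name=new_name,
            rows=fn(self.rows),
            allowed_roles=set(self.allowed_roles),                  # controls follow the data
            lineage=self.lineage + [f"{self.name} -> {new_name}"],  # history is appended, never lost
        )

    def read(self, role: str) -> List[dict]:
        """Enforce authorization at access time."""
        if role not in self.allowed_roles:
            raise PermissionError(f"role '{role}' is not entitled to read '{self.name}'")
        return self.rows

# Usage: an aggregation keeps the same entitlements and records its lineage.
raw = GovernedDataset(
    name="transactions_raw",
    rows=[{"region": "EU", "amount": 120}, {"region": "EU", "amount": 80}],
    allowed_roles={"risk_analyst"},
    lineage=["loaded from source system"],
)
summary = raw.transform(
    "transactions_by_region",
    lambda rows: [{"region": "EU", "total": sum(r["amount"] for r in rows)}],
)
print(summary.read("risk_analyst"))  # permitted
print(summary.lineage)               # full history, useful for right-to-be-forgotten audits
```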

Summary:
A core design point of any self-service analytics offering (like IBM DataWorks) is that data governance capabilities should be baked in. This enables self-service data analysis where analysts see only the data they are entitled to see, where data movement and transformation are automatically tracked for a complete lineage story, and where business glossary information is used as users search for data.

The Best Data Science Platform

Data science platforms are engines for creating machine-learning solutions. Innovation in this market focuses on cloud, Apache Spark, automation, collaboration, and artificial-intelligence capabilities. When choosing a platform, organizations often rely on the Gartner Magic Quadrant, which aims to provide a qualitative analysis of a market, its direction, its maturity, and its participants. Gartner previously called these platforms “advanced analytics platforms,” but because they are used primarily by data scientists, this year the report has been renamed the Magic Quadrant for Data Science Platforms.

This Magic Quadrant evaluates vendors of data science platforms: products that organizations use to build machine-learning solutions themselves, as opposed to outsourcing their creation or buying ready-made solutions. Data scientists use these platforms for demand prediction, failure prediction, determination of customers’ propensity to buy or churn, and fraud detection.

The report ranks these platforms on ability to execute and completeness of vision. The Magic Quadrant is divided into four quadrants:

  • Niche Players
  • Challengers
  • Visionaries
  • Leaders

    Gartner Magic Quadrant for Data Science Platforms
    Source: Gartner (February 2017)

Adoption of open-source platforms and diversity of tools are important characteristics of this market. IBM’s mission is to make data simple and accessible to the world, and its commitment to open source and to the numerous open-source ecosystem providers makes it a highly attractive platform for data science. A data scientist needs the following to be more successful, all of which are provided by the IBM Data Science Experience (DSX):

  • Community: A data scientist needs to stay up to date with the latest news from the data science community. New open-source packages, libraries, techniques, and tutorials appear every day. A good data scientist follows the most important sources and shares opinions and experiments with the community. IBM brings this into the DSX user interface.
  • Open source: Many companies today rely on open source for data science. Open source has become so mature that it competes directly with commercial offerings. IBM provides the best of open source within DSX, such as RStudio and Jupyter.
  • IBM value add: DSX improves on open source by adding capabilities from IBM. Data shaping, for example, takes about 80% of a data scientist’s time, and IBM provides tools with a visual GUI to help users perform this task better. You can also execute Spark jobs on the managed Spark service in Bluemix from within DSX (a minimal sketch of that kind of notebook code follows this list).
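To give a flavor of this, here is a minimal PySpark sketch of the kind of code a data scientist might run from a Jupyter notebook in DSX against a Spark service. The file path and column names are hypothetical, and the snippet assumes a Spark environment is already available.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# In a hosted notebook a SparkSession is typically pre-configured; getOrCreate() reuses it.
spark = SparkSession.builder.appName("retail-exploration").getOrCreate()

# Hypothetical CSV of retail transactions made available to the Spark service.
sales = (spark.read
         .option("header", "true")
         .option("inferSchema", "true")
         .csv("transactions.csv"))

# A typical exploratory aggregation: revenue by region and quarter.
revenue = (sales
           .groupBy("region", "quarter")
           .agg(F.sum("amount").alias("revenue"))
           .orderBy("region", "quarter"))

revenue.show(20)
```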

Watson Analytics

Need for Watson Analytics
If an organization is good at analyzing data and extracting relevant insights from it, decision makers can make more informed and thus more optimal decisions. Yet decision makers are often forced to make decisions with incomplete information. The reason? Decision makers and citizen analysts are, for the most part, consumers of analytics, and they rely on more skilled resources in the organization (data engineers, data scientists, application developers) to provide data-driven answers to their questions. Moreover, the answer to one question is often just the start of another; think of a detective interrogating a suspect. The consumer/builder model is hardly conducive to this iterative nature of data analysis. As a result, the time it takes for answers to reach the decision makers is far from optimal, and many questions go unanswered every day.

Watson Analytics
So a logical solution is to provide an easier-to-use analytics offering. Watson Analytics provides that value-add, so that more people can leverage data to drive better decision making.

When we think of Watson, we think of cognitive computing. And when we think about analytics, we think about traditional analytics (querying, dashboarding) along with some more advanced analytic capabilities (data mining and social media analytics). So Watson Analytics is a cloud-based offering that can make analytics child’s play, even for a non-skilled user.

Watson Analytics helps users understand their data in a guided way, using a natural language interface to ask a series of business questions. For example, a user can ask “What is the trend of revenue over the years?” and get a visualization in response. Instead of having to choose a visualization first and work backwards to answer the business question, Watson Analytics lets you describe your intent in natural language and chooses the best visualization for you. Even better, Watson Analytics offers an initial set of questions that you can keep refining.
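As a toy illustration of the idea behind question-driven visualization (not how Watson Analytics is actually implemented), the sketch below maps a few keywords in a natural-language question to a chart type. The keyword rules and chart names are invented for this example.

```python
def suggest_chart(question: str) -> str:
    """Very rough, hypothetical mapping from a business question to a chart type."""
    q = question.lower()
    if "trend" in q or "over years" in q or "over time" in q:
        return "line chart"      # trends over time read best as lines
    if "compare" in q or " by " in q:
        return "bar chart"       # comparisons across categories
    if "share" in q or "proportion" in q:
        return "pie chart"       # parts of a whole
    if "relationship" in q or "correlat" in q:
        return "scatter plot"    # relationships between two measures
    return "table"               # fall back to raw numbers

print(suggest_chart("What is the trend of revenue over years?"))  # -> line chart
print(suggest_chart("Compare revenue by region"))                 # -> bar chart
```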

Watson Analytics for Social Media
Watson Analytics can work on social media data to take the pulse of an audience, spotting trends and identifying new insights and relationships across multiple social channels for greater visibility into a given topic or market. It combines structured and unstructured self-service analysis to enrich your social media analytics experience for exceptionally insightful discoveries. All on the cloud!

Summary of Steps:
Watson Analytics does the following to uncover the insights hidden in your big data:

  • Import data from a robust set of data sources (on cloud and on premises), with the option to prepare and cleanse it via IBM Bluemix Data Connect.
  • Answer the “what”: identify issues, detect problems early, find anomalies or exceptions, and challenge conventional wisdom or the status quo.
  • Understand or explain outcomes: why something happened.
  • Build dashboards to share results.

Match and Manage your Data on Cloud

We left the last blog with two questions.

A few weeks back I wrote about IBM Bluemix Data Connect. If you missed it, watch the video on how you can put data to work with IBM Bluemix Data Connect.

Now business analysts can leverage entity matching technology using Data Connect. The Match and Manage (beta) operation in Data Connect identifies possible matches and relationships across a plethora of data sets, including master data and non-master data sets, to create a unified view of your data. It also provides a visualization of the relationships between entities in the unified data set.

For example, suppose you have two sets of data: one containing customer profile information and the other containing a list of prospects. A business analyst can now use an intuitive UI to run the Match and Manage operation on these two data sets and get answers to questions such as:

  •  Are there duplicates in the prospect list?
  • How many of the prospects are already existing customers?
  • Are there non-obvious relationships among prospects and customers that can be explored?
  • Are there other sources of information within the organization that could provide better insights if brought together?

The two data sets are matched using cognitive capabilities that allow the MDM matching technology to be auto-configured and tuned to intelligently match across different data sets. A rough sketch of the kind of matching involved follows.

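The sketch below is a simplified, hypothetical illustration of fuzzy record matching between a customer list and a prospect list; it is not the algorithm Data Connect uses, just a way to see why names such as “John Smith” and “John Smyth” can resolve to the same entity. It relies only on Python’s standard difflib for string similarity.

```python
from difflib import SequenceMatcher

# Hypothetical records: an MDM customer extract and a prospect spreadsheet.
customers = [
    {"id": "C1", "name": "John Smith", "postcode": "SW1A 1AA"},
    {"id": "C2", "name": "Mary Smith", "postcode": "SW1A 1AA"},
]
prospects = [
    {"id": "P1", "name": "John Smyth", "postcode": "SW1A 1AA"},
    {"id": "P2", "name": "Alice Jones", "postcode": "M1 2AB"},
]

def similarity(a: str, b: str) -> float:
    """Normalized string similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_match(prospect, customers, threshold=0.85):
    """Return the best-matching customer when name similarity and postcode agree."""
    best = max(customers, key=lambda c: similarity(c["name"], prospect["name"]))
    score = similarity(best["name"], prospect["name"])
    if score >= threshold and best["postcode"] == prospect["postcode"]:
        return best, score
    return None, score

for p in prospects:
    hit, score = best_match(p, customers)
    if hit:
        print(f"{p['name']} ({p['id']}) is likely existing customer {hit['name']} ({hit['id']}), score {score:.2f}")
    else:
        print(f"{p['name']} ({p['id']}) looks like a genuinely new prospect")
```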

Business analysts can understand the de-duplicated data sets by navigating a relationship graph of the data to see how entities are related across the entire data set. They can now discover non-obvious relationships within the data that were previously undiscoverable. The generated canvas enables them to interactively explore the relationships between entities.


The example above illustrates how clients can now easily understand the data they hold within their MDM repositories and how they can match their MDM data with other data sources not included in the MDM system. This simplifies the analytical MDM experience: MDM technologies become accessible to everyone, without waiting for data engineers to transform the data into a format that can be matched or relying on MDM ninjas to configure matching algorithms.

Summary:

IBM Bluemix Data Connect provides a seamless, integrated self-service experience for data preparation. With the addition of entity analytics capabilities, business users are empowered to gain insight from data that wasn’t previously available to them. Organizations can now extract further value from their MDM data by ensuring it is used across the organization to provide accurate analytics. Entity analytics within Data Connect is now available in beta. Go ahead and experience the next evolution of MDM.

24th Year of Patent Leadership

IBM broke the U.S. patent record with 8,088 patents granted to its inventors in 2016, marking the 24th consecutive year of innovation leadership. IBM passed the milestone as the first organization to be granted more than 8,000 U.S. patents in a single year; do the math and that is more than 22 patents granted to IBM inventors per day in 2016. IBM’s 2016 patent output covers a diverse range of inventions in artificial intelligence and cognitive computing, cognitive health, cloud, cybersecurity, IoT, and other strategic growth areas for the company.

Innovation has been a focus at IBM since day one, and it is at the core of IBM’s values. IBM’s patent leadership demonstrates its strategic commitment to the fundamental R&D necessary to drive progress in business and society, and it is an important barometer of innovation. Inventions are a great source of value to IBM, its clients, its business partners, and society as a whole.

The Top Ten list of 2016 U.S. patent recipients* includes:

  1. IBM – 8,088
  2. Samsung Electronics – 5,518
  3. Canon – 3,665
  4. Qualcomm – 2,897
  5. Google – 2,835
  6. Intel – 2,784
  7. LG Electronics – 2,428
  8. Microsoft – 2,398
  9. Taiwan Semiconductor Manufacturing Co. – 2,288
  10. Sony – 2,181

*Data provided by IFI CLAIMS Patent Services

In the area of cognitive computing and artificial intelligence, IBM inventors patented more than 1,100 inventions that help machines learn, reason, and efficiently process diverse data types while interacting with people in natural and familiar ways. Here is a sample of some of the patents from 2016:

  • Machine learning to secure the best answers: Providing accurate answers to questions that are posed by users. (US Patent #9,384,450)
  • Planning the best route for a traveler’s cognitive state: IBM inventors have developed a method for planning a trip route based on the state of travelers that affects driving risk the most: their state-of-mind. Had a long day or easily overwhelmed? This system will help you navigate a less stressful route home. (US Patent #9,384,661)
  • Using images to better gauge heart health: IBM researchers have developed a method for categorizing human heart disease states by using cardiac images to characterize the shape and motion of the heart.  (US Patent #9,311,703)
  • Using drones to clean microbes in hospitals and agricultural fields: In this patent, surveying, testing, and measuring contamination is controlled by a cognitive facility that manages drones. The drones could enter a contaminated area, collect specimens, then confirm, map, and sterilize the contamination. (US Patent #9,447,448)
  • Measurement and Integrated Reporting of Public Cloud Usage in a Hybrid Cloud Environment:  This innovation enables enterprises to monitor and measure employee and application usage and reduce information technology costs. (US Patent #9,336,061)
  • Pre-emptively detecting and isolating cloud application network intrusions:  When network breaches are detected, networking between applications – or their subcomponents – can be locked down to minimize the impact of an attack. (US Patent #9,361,455)
  • Managing incoming communications to prevent phishing and the spread of malicious content: IBMers invented a system to create levels of permission and trust for inbound communications such as e-mails and text messages. This system determines a level of trustworthiness to assign to an inbound communication, and how much of that communication to forward on to a user. (US Patent #9,460,269)

 

3 Compelling Use cases for Entity Analytics

Entity analytics is used to detect non-obvious relationships, resolve entities, and find threats and vulnerabilities that are hiding in your disparate collections of data. Through three use cases, let’s look at how entity analytics can help organizations enhance their customer experience.

Scenario 1

Entity analytics can detect non-obvious relationships between entities. It can also analyze new data sources in context, leading to new insights and opportunities. In this scenario you have some data in an MDM system and another set of data in a spreadsheet file. Suppose you want to run a marketing campaign targeting high-net-worth clients to sell them a premium bank account. The information in the MDM system alone doesn’t give you what you need. You want to bring the two sources together and determine whether you can identify individuals to target for the new account.

In the MDM system, John Smith lives with Mary Smith. The spreadsheet file shows that John Smyth (spelled differently) is actually a high-net-worth client. Combining this information, we can say that John Smith is the same person across the data sets: he is a high-net-worth client, and he has a wife. With this information you can target Mary Smith with a premium bank account, because she lives with a high-net-worth individual. Entity analytics enables you to discover and act on this opportunity; a small sketch of the idea follows.
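To make the household insight concrete, here is a hedged sketch using the networkx library (not the actual MDM matching engine): resolved entities become nodes, discovered relationships become edges, and we look for people connected to a high-net-worth individual. All names, attributes, and edge labels are invented for this example.

```python
import networkx as nx

g = nx.Graph()

# Nodes are resolved entities; attributes would come from the unified data set (hypothetical here).
g.add_node("John Smith", kind="customer", high_net_worth=True)
g.add_node("Mary Smith", kind="customer", high_net_worth=False)
g.add_node("John Smyth", kind="prospect")

# Edges capture relationships discovered during matching.
g.add_edge("John Smith", "Mary Smith", relation="same household")
g.add_edge("John Smith", "John Smyth", relation="same entity")

# Non-obvious insight: who is connected to a high-net-worth individual?
for node in g.nodes:
    neighbours = g[node]
    links = [f"{n} ({neighbours[n]['relation']})"
             for n in neighbours if g.nodes[n].get("high_net_worth")]
    if links:
        print(f"{node} is connected to a high-net-worth customer via " + ", ".join(links))
```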


Scenario 2

Entity analytics can find where threats and vulnerabilities are hiding in big data and help you respond efficiently. In this scenario, for a risk assessor at an insurance firm, severe rainfall is predicted within a geographical area that includes a client’s residential location. By pulling up the client data from MDM together with the flood warnings issued by the environmental agency, we can match across the data sets and identify the properties at risk. The client can then be given an early warning to help mitigate risk, and the flood risk value on the client’s property renewal can be increased. And if an elderly customer is at severe risk, you can notify the emergency services to ensure a proactive response to any potential threat. A hedged sketch of this kind of cross-data-set check follows.
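The sketch below is a simplified, hypothetical version of this check: client records (with invented coordinates) are matched against flood-warning areas represented as bounding boxes. Real warning feeds and geocoding would be far richer; this only shows the shape of the cross-data-set match.

```python
# Hypothetical client records from MDM, with geocoded home locations.
clients = [
    {"name": "A. Patel", "age": 81, "lat": 51.751, "lon": -1.258},
    {"name": "B. Jones", "age": 45, "lat": 53.480, "lon": -2.242},
]

# Hypothetical flood warnings from an environmental agency, as bounding boxes.
flood_warnings = [
    {"area": "Upper Thames", "min_lat": 51.70, "max_lat": 51.80,
     "min_lon": -1.35, "max_lon": -1.20, "severity": "severe"},
]

def in_area(client, warning):
    """True when the client's home location falls inside the warned bounding box."""
    return (warning["min_lat"] <= client["lat"] <= warning["max_lat"]
            and warning["min_lon"] <= client["lon"] <= warning["max_lon"])

for c in clients:
    for w in flood_warnings:
        if in_area(c, w):
            action = ("notify emergency services"
                      if c["age"] >= 75 and w["severity"] == "severe"
                      else "send early flood warning")
            print(f"{c['name']} is in the {w['area']} warning area: {action}")
```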


Scenario 3

Let’s see how, using entity analytics, MoneyGram International Inc., a money transfer company, gets notified of questionable activities in real time for faster predictive and preventive decision making. This helped them save $200 million in just two years!

Summary

Entity analytics helps organizations launch more target-oriented campaigns and reduce the risk of fraud. With the help of entity analytics, organizations can predict and preempt suspicious activity faster and at lower cost. It also allows enterprises to detect entities that are the same, regardless of whether those entities are hidden or masked. Two questions remain:

  • Does this analytics capability require an MDM ninja, or can it be set up easily by a business user?
  • Is entity analytics available on the cloud for decisions that cannot wait?

Stay tuned for my next blog.

 

The 4 Personas for Data Analytics

With new modernization strategies, data analytics is architected from the top down, through the lens of the consumers of the data. In this blog, I will describe the four roles that are integral to the data lifecycle: the personas who interact with data while uncovering and deploying insights as they explore organizational data.

Citizen analysts/knowledge workers

A knowledge worker is primarily a subject-matter expert (SME) in a specific area of business: for example, a business analyst focused on risk or fraud, a marketing analyst building out new offers, or someone working to drive efficiencies into the supply chain. These users do not know where or how data is stored, or how to build an ETL flow or a machine learning algorithm. They simply want to access information on demand, drive analysis from their base of expertise, and create visualizations. They are the users of offerings like Watson Analytics.

Data scientists

Data scientists can perform more sophisticated analysis, find the root cause of a problem, and develop a solution based on the insights they discover. They can use SPSS, SAS, or open-source tools with built-in data shaping and point-and-click machine learning to manipulate large amounts of data.

Data engineers

Data engineers enable data integration, connections (the plumbing), and data quality. They do the underlying enablement that data scientists and citizen analysts depend on. They typically rely on solutions like DataWorks Forge to access multiple data sources and transform them within a fully managed service.

Application developers

Application developers are responsible for making analytics algorithms actionable within a business process, generally supported by a production system. Beginning with the analytics algorithms built by citizen analysts or data scientists, they work with the final data model representation created by data engineers and build an application that ties into the overall business process. They use something like the Bluemix development platform and its APIs for the individual data and analytics services.

Putting it all together

Imagine a scenario where a citizen analyst notices (from a dashboard) that retail sales are down for the quarter. She pulls up Watson Analytics and uses it to discover that the underlying problem is specific to a category of goods and services sold in stores in a specific region. But she needs more help to find the exact cause and a remedy.

She engages her data scientist and data engineer. They discuss the need to pull in more data than just the transactional data the business analyst already has access to: specifically, weather, social, and IoT data from the stores. The data engineer helps create the necessary access, and the data scientist can then form and test various hypotheses using different analytic models.

Once the data scientist determines the root cause, he shares the model with the developer, who can then leverage it to make the company’s mobile apps and websites more responsive in real time to address the issue. The citizen analyst also shares the insight with the marketing department so they can take corrective action.
