Why Blockchain?

There has been a lot of buzz around blockchain, taking it onto Gartner's Hype Cycle for Emerging Technologies, 2016. It has been envisioned that blockchain will do for transactions what the Internet did for information. So in this blog, let's discuss the need for blockchain.


Complex Transactions

If you’ve ever bought a house, you probably had to sign a huge stack of papers from a variety of different stakeholders to make that transaction happen. It is a complex transaction involving banks, attorneys, title companies, insurers, regulators, tax agencies and inspectors. They all maintain separate records, and it’s costly to verify and record each step. That’s why the average closing takes several days. The same holds true when registering a vehicle. In both examples, what you are doing is establishing ownership of an asset, and the problem is that the information resides in several ledgers (or databases), all of which have to hold the same version of the truth. So the problems are manifold:

  • Multiple ledgers must be updated to reflect business transactions as they occur.
  • This is EXPENSIVE due to duplication of effort and intermediaries adding margin for their services.
  • It is clearly INEFFICIENT, as the business conditions – the contract – are duplicated by every network participant, and we must rely on intermediaries throughout this paper-laden process.
  • It is also VULNERABLE because if a central system (e.g., a bank) is compromised by an incident, the whole business network is affected. Incidents can include fraud, a cyber attack or a simple mistake.

Solution:

What if there existed a common ledger (or a distributed database) that everyone had access to and everyone trusted? This is what blockchain brings to business! The sketch below illustrates the core idea.
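To make the idea concrete, here is a minimal sketch of a hash-linked ledger in Python. This is a toy illustration only, not any particular blockchain product: a real blockchain adds distribution and consensus across participants, both omitted here.

```python
import hashlib
import json
import time

def hash_block(block):
    """Deterministically hash a block's contents with SHA-256."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

class Ledger:
    """Toy append-only ledger: each block stores its predecessor's hash,
    so tampering with any past entry breaks the chain."""
    def __init__(self):
        self.chain = [{"index": 0, "transaction": "genesis",
                       "timestamp": 0, "prev_hash": "0" * 64}]

    def append(self, transaction):
        self.chain.append({"index": len(self.chain),
                           "transaction": transaction,
                           "timestamp": time.time(),
                           "prev_hash": hash_block(self.chain[-1])})

    def is_valid(self):
        # Anyone holding a copy can independently re-verify the history.
        return all(self.chain[i]["prev_hash"] == hash_block(self.chain[i - 1])
                   for i in range(1, len(self.chain)))

ledger = Ledger()
ledger.append({"asset": "house-42", "owner": "Alice"})
ledger.append({"asset": "house-42", "owner": "Bob"})   # ownership transfer
print(ledger.is_valid())                               # True
ledger.chain[1]["transaction"]["owner"] = "Mallory"    # tamper with history
print(ledger.is_valid())                               # False
```

Because each block embeds the hash of its predecessor, silently rewriting a past transaction is immediately detectable by every participant who holds a copy of the ledger.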

Why now?

There are three reasons why blockchain is starting to take hold now.
  • Industries are merging and interacting like never before. The growth of e-commerce, online banking and in-app purchases, and the increasing mobility of people around the world, have fueled the growth of transaction volumes. And transaction volumes will explode with the rise of the Internet of Things (IoT) – autonomous objects, such as refrigerators that buy groceries when supplies are running low and cars that deliver themselves to your door, stopping for fuel along the way. These partnerships require more trust and transparency to succeed.
  • Increasing regulation, cybercrime and fraud are inhibiting business growth. The last 10 years have seen the growth of global, cross-industry regulations, including HIPAA, the Sarbanes-Oxley Act, anti-money laundering rules and more. And to keep pace with regulatory changes, companies are rapidly increasing compliance staff and budgets.
  • Advancements in technologies like cloud (offering the compute power to track billions of transactions) and cryptography (securing both networks and transactions) are also enablers of blockchain.

In a future blog I will discuss how blockchain makes things better and how it works. So stay tuned.


Data Science vs. BI & Predictive Analytics

Business intelligence (BI) has been evolving for decades as data has become cheaper, easier to access, and easier to share. BI analysts take historical data, perform queries, and summarize findings in static reports that often include charts. The outputs of business intelligence are “known knowns” that are manifested in stand-alone reports examined by a single business analyst or shared among a few managers. For example: who are the probable high-net-worth clients to whom a premium bank account could be sold, based on criteria such as average account balance?

Predictive analytics has been unfolding on a parallel track to business intelligence. With predictive analytics, numerous tools allow analysts to gain insight into “known unknowns”. These tools track trends and make predictions, but are often limited to specialized programs. In the previous example, predictive analytics could reveal that a probable high-net-worth client is the spouse of an existing high-net-worth client.

Data science, on the other hand, is an interdisciplinary field that combines machine learning, statistics, advanced analysis, high-performance computing and visualization. It is a new form of art that draws out hidden insights and puts data to work in the cognitive era. The tools of data science originated in the scientific community, where researchers used them to test and verify hypotheses that include “unknown unknowns”. Here are some examples:

  • Uncover totally unanticipated relationships and changes in markets or other patterns – for example, the price of a house depending on its proximity to high-voltage power lines or on whether it has a brick exterior (see the sketch after this list).
  • Handle streams of data – in fact, some embedded intelligent services make decisions and carry out those decisions automatically in microseconds. For example, analyzing a user's click pattern to dynamically propose a product or promotion to attract the customer.
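As a minimal sketch of the first example (the data set and feature names below are invented for illustration, not drawn from any real study), a model fitted over housing data can surface a relationship that no pre-written report would think to query:

```python
# Hypothetical illustration: let a model surface an unanticipated driver
# of house prices instead of only answering pre-defined queries.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 1000
square_feet    = rng.uniform(800, 4000, n)
brick_exterior = rng.integers(0, 2, n)        # 1 = brick exterior
power_line_km  = rng.uniform(0.05, 5.0, n)    # distance to power lines, km

# Synthetic "truth": price quietly depends on proximity to power lines.
price = (150 * square_feet + 20000 * brick_exterior
         + 15000 * np.log(power_line_km) + rng.normal(0, 10000, n))

X = np.column_stack([square_feet, brick_exterior, power_line_km])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, price)

for name, imp in zip(["square_feet", "brick_exterior", "power_line_km"],
                     model.feature_importances_):
    print(f"{name:15s} importance = {imp:.2f}")
```

The point is not the specific model: feature importances let the data volunteer the power-line relationship, rather than answering only the questions an analyst already thought to ask.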

As discussed, data science differs from traditional business intelligence and predictive analytics in the following ways.

  • It brings in data that is orders of magnitude larger than what previous generations of data warehouses could store, and it even works on streaming data sources.
  • The analytical tools used in data science are also increasingly powerful, using artificial intelligence techniques to identify hidden patterns in data and pull new insights out of it.
  • The visualization tools used in data science leverage modern web technologies to deliver interactive browser-based applications. Not only are these applications visually stunning, they also provide rich context and relevance to their consumers.

Data science enriches the value of data, going beyond what the data says to what it means for your organization—in other words, it turns raw data into intelligence that empowers everyone in your organization to discover new innovations, increase sales, and become more cost-efficient. Data science is not just about the algorithm, but about deriving value.

 

Disclaimer: The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or opinions.

 

Need for Governance in Self-Service Analytics

[Figure: Analytics Offering without Self-Service]

Self-service analytics is a form of business intelligence in which line-of-business professionals or data scientists are enabled and encouraged to perform queries and generate reports on their own, with nominal IT support. This empowers everyone in the organization to discover new insights and enables informed decision-making. Capitalizing on the data lake, or a modernized data warehouse, they can do full data set analysis (no more sampling), gain insight from non-relational data, and support exploratory analysis and discovery with a 360° view of the business. At this stage, the organization can truly be data-savvy and insight-driven, leading to better decisions, more effective actions, and improved outcomes. Insight is used to make risk-aware decisions, fight fraud and counter threats, optimize operations, and, most often, to attract, grow and retain customers.

Any self-service analytics offering, regardless of persona, has to involve data governance. Here are three examples of how serious analytics work would be impossible without support for a proper data governance practice in the analytics technology (a small sketch of the first two follows the list):

  1. Fine-grained authorization controls: Most industries feature data sets where access needs to be controlled so that sensitive data is protected. As data moves from one store to another, gets transformed, and is aggregated, the authorization information needs to move with that data. Without the transfer of authorization controls as data changes state or location, self-service analytics would not be permitted under the applicable regulatory policies.
  2. Data lineage information: As data moves between different storage layers and changes state, it's important for the lineage of the data to be captured and stored. This helps analysts understand what goes into their analytic results, but it is also a policy requirement in many regulatory frameworks. An example of where this matters is the right to be forgotten, a legislative initiative we are seeing in some Western countries: any trace of information about a citizen would have to be tracked down and deleted from all of an organization's data stores. Without a comprehensive data lineage framework, adherence to a right-to-be-forgotten policy would be impossible.
  3. Business glossary: A current and complete business glossary acts as a roadmap for analysts to understand the nature of an organization's data. Specifically, a business glossary maps an organization's business concepts to its data schemas. One common problem with Hadoop data lakes is a lack of business glossary information, as Hadoop itself has no proper set of metadata and governance tooling.
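To make the first two requirements concrete, here is a minimal sketch (the class and field names are my own invention, not any product's API) of authorization controls and lineage records traveling with data through a transformation:

```python
# Sketch only: governance metadata rides along with the data it protects.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class GovernedDataset:
    name: str
    rows: list
    allowed_roles: set                        # fine-grained authorization
    lineage: list = field(default_factory=list)

    def read(self, role):
        # Analysts only ever see data they are entitled to see.
        if role not in self.allowed_roles:
            raise PermissionError(f"role '{role}' may not read '{self.name}'")
        return self.rows

    def transform(self, new_name, fn):
        # Derived data inherits its source's controls and extends its lineage.
        return GovernedDataset(
            name=new_name,
            rows=[fn(row) for row in self.rows],
            allowed_roles=set(self.allowed_roles),
            lineage=self.lineage + [{
                "from": self.name, "to": new_name, "operation": fn.__name__,
                "at": datetime.now(timezone.utc).isoformat()}])

def drop_member_id(row):
    return {"amount": row["amount"]}

claims = GovernedDataset("claims_raw", [{"member": "m1", "amount": 120.0}],
                         allowed_roles={"claims_analyst"})
masked = claims.transform("claims_masked", drop_member_id)
print(masked.lineage)                    # full path from source to derived set
try:
    masked.read("marketing_analyst")     # not entitled: raises PermissionError
except PermissionError as err:
    print(err)
```

A right-to-be-forgotten request can then walk the lineage records to find every derived data set that may still carry a citizen's information.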

Summary:
A core design point of any self-service analytics offering (like IBM DataWorks) is that data governance capabilities should be baked in. This enables self-service data analysis where analysts only see data they're entitled to see, where data movement and transformation are automatically tracked for a complete lineage story, and where business glossary information is used as users search for data.

Watson Analytics

Need for Watson Analytics
If an organization is good at analyzing data and extracting relevant insights from it, then decision makers can make more informed, and thus more optimal, decisions. But decision makers are often forced to decide with incomplete information. The reason? Decision makers and citizen analysts, for the most part, tend to be consumers of analytics, and they rely on more skilled resources (like data engineers, data scientists and application developers) to provide the data-driven answers to their questions. Moreover, the answer to one question is just the start of another – think of a detective interrogating a suspect. The consumer/builder model is hardly conducive to this iterative nature of data analysis. Therefore, the time it takes for answers to be delivered to the decision makers is far from optimal – and many questions go unanswered every day.

Watson Analytics
So a logical solution is to provide easier-to-use analytics offerings. Watson Analytics provides that value so that more people are able to leverage data to drive better decision-making.

When we think of Watson, we think of cognitive. And when we think about analytics, we think about traditional analytics (querying, dashboarding) along with some more advanced analytic capabilities (data mining and social media analytics). So Watson Analytics is a cloud-based offering that can make analytics child's play, even for a non-skilled user.

Watson Analytics helps users understand their data in a guided way, using a natural language interface to ask a series of business questions. For example, a user can ask “What is the trend of revenue over years?” and get a visualization in response. So, instead of having to first choose a visualization and work backwards to try to answer the business question, Watson Analytics allows you to describe your intent in natural language, and it chooses the best visualization for you. Even better, Watson Analytics gives you an initial set of questions that you can keep refining.
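As a toy illustration of that last idea (this is in no way Watson Analytics' actual implementation, just a keyword-based sketch of mapping intent to a chart type):

```python
# Toy sketch: infer a chart type from intent keywords in the question.
RULES = [
    ({"trend", "over", "years"}, "line chart"),
    ({"compare", "by", "breakdown"}, "bar chart"),
    ({"share", "proportion", "percentage"}, "pie chart"),
    ({"relationship", "correlation"}, "scatter plot"),
]

def choose_visualization(question):
    words = set(question.lower().replace("?", "").split())
    best_chart, best_hits = "table", 0     # fall back to a plain table
    for keywords, chart in RULES:
        hits = len(words & keywords)
        if hits > best_hits:
            best_chart, best_hits = chart, hits
    return best_chart

print(choose_visualization("What is the trend of revenue over years?"))
# -> line chart
```

A production system would use real natural language understanding rather than keyword counting, but the user-facing contract is the same: state the business question, receive the most suitable visualization.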

Watson Analytics for Social Media
Watson Analytics can work on social media data to take the pulse of an audience, spotting trends and identifying new insights and relationships across multiple social channels, allowing greater visibility into a given topic or market. It combines structured and unstructured self-service analysis to enrich your social media analytics experience for exceptionally insightful discoveries. All on the cloud!

Summary of Steps:
Watson Analytics does the following to provide the insights hidden in your big data:

  • Import data from a robust set of data sources (on cloud and on premises), with the option to prepare and cleanse it via IBM Bluemix Data Connect.
  • Answer the what: identify issues, detect problems early, find anomalies or exceptions, challenge conventional wisdom or the status quo.
  • Understand or explain outcomes – why something happened.
  • Build dashboards to share the results.

Match and Manage your Data on Cloud


A few weeks back I wrote about IBM Bluemix Data Connect. If you missed it, watch this video on how you can put data to work with IBM Bluemix Data Connect.

Now business analysts can leverage entity matching technology using Data Connect. The Match and Manage (BETA) operation on Data Connect identifies possible matches and relationships across a plethora of data sets, including master data and non-master data sets, to create a unified view of your data. It also provides a visualization of the relationships between entities in the unified data set.

For example, say you have two data sets: one containing customer profile information and the other containing a list of prospects. A business analyst can now use an intuitive UI to run the Match and Manage operation on these two data sets and get insights into questions such as:

  • Are there duplicates in the prospect list?
  • How many of the prospects are already existing customers?
  • Are there non-obvious relationships among prospects and customers that can be explored?
  • Are there other sources of information within the organization that could provide better insights if brought together?

The two data sets are matched using cognitive capabilities, which allow the MDM matching technology to be auto-configured and tuned to intelligently match across different data sets.

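For intuition, here is a heavily simplified sketch of record matching (the field names and threshold are invented for illustration; the cognitive matching in Data Connect is far more sophisticated and needs no such hand-tuning):

```python
# Toy record matching: score candidate pairs on fuzzy name similarity
# plus exact email match, and report likely duplicates.
from difflib import SequenceMatcher

customers = [
    {"id": "C1", "name": "Jonathan Smith", "email": "jon.smith@example.com"},
    {"id": "C2", "name": "Mary Jones",     "email": "mjones@example.com"},
]
prospects = [
    {"id": "P1", "name": "Jon Smith",   "email": "jon.smith@example.com"},
    {"id": "P2", "name": "Marie Jones", "email": "marie.j@example.com"},
]

def name_similarity(a, b):
    # Fuzzy string similarity in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(customer, prospect):
    score = name_similarity(customer["name"], prospect["name"])
    if customer["email"] == prospect["email"]:
        score += 0.5          # an exact email match is strong evidence
    return score

for prospect in prospects:
    best = max(customers, key=lambda c: match_score(c, prospect))
    score = match_score(best, prospect)
    if score > 0.8:           # hand-tuned threshold for this toy example
        print(f"{prospect['id']} ({prospect['name']}) likely matches "
              f"{best['id']} ({best['name']}), score = {score:.2f}")
```

Real matching engines combine many such weighted comparisons (name, address, phone, email) and learn the weights, rather than relying on hard-coded thresholds.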

Business analysts can then understand the de-duplicated data sets by navigating a relationship graph of the data to see how entities are related across the entire data set. Now they can discover non-obvious relationships within the data that were previously undiscoverable, and a generated canvas enables them to interactively explore the relationships between entities.


The above example illustrates how clients can now easily understand the data they hold within their MDM repositories, and how they can match their MDM data with other data sources not included within the MDM system. This simplifies the analytical MDM experience: MDM technologies become accessible to everyone, without the need to wait for data engineers to transform the data into a format that can be matched, or to rely on MDM ninjas to configure matching algorithms.

Summary:

IBM Bluemix Data Connect provides a seamless, integrated self-service experience for data preparation. With the addition of entity analytics capabilities, business users are empowered to gain insight from data that wasn't previously available to them. Now organizations can extract further value from their MDM data by ensuring it is used across the organization to provide accurate analytics. Entity analytics within Data Connect is now available in beta. Go ahead and experience the next evolution of MDM.

The 4 Personas for Data Analytics

With new modernization strategies, data analytics is architected from the top down – through the lens of the consumers of the data. In this blog, I will describe the four roles that are integral to the data lifecycle: the personas who interact with data while uncovering and deploying insights as they explore organizational data.

Citizen analysts/knowledge workers

A knowledge worker is primarily a subject-matter expert (SME) in a specific area of business – for example, a business analyst focused on risk or fraud, a marketing analyst aiming to build out new offers, or someone who works to drive efficiencies into the supply chain. These users do not know where or how data is stored, or how to build an ETL flow or a machine learning algorithm. They simply want to access information on demand, drive analysis from their base of expertise, and create visualizations. They are the users of offerings like Watson Analytics.

Data scientists

Data scientists can do more sophisticated analysis, find the root cause of a problem, and develop a solution based on the insights they discover. They can use SPSS, SAS or open-source tools with built-in data shaping and point-and-click machine learning to manipulate large amounts of data.

Data engineers

Data engineers enable data integration, connections (the plumbing) and data quality. They do the underlying enablement that data scientists and citizen analysts depend on, typically relying on solutions like DataWorks Forge to access multiple data sources and transform them within a fully managed service.

Application developers

Application developers are responsible for making analytics algorithms actionable within a business process, generally supported by a production system. Beginning with the analytics algorithms built by citizen analysts or data scientists, they work with the final data model representation created by data engineers, building an application that ties into the overall business process. They use something like the Bluemix development platform and APIs for the individual data and analytics services.

Putting it all together

Imagine a scenario where a citizen analyst notices (from a dashboard) that retail sales are down for the quarter. She pulls up Watson Analytics and uses it to discover that the underlying problem is specific to a category of goods and services in stores in a specific region. But she needs more help to find the exact cause and a remedy.

She engages her data scientist and data engineer. They discuss the need to pull in more data than just the transactional data the business analyst already has access to – specifically weather, social, and IoT data from the stores. The data engineer helps create the necessary access; the data scientist can then form and test various hypotheses using different analytic models.

Once the data scientist determines the root cause, he shares the model with the developer, who can leverage it to make the company's mobile apps and websites more responsive in real time to address the issue. The citizen analyst also shares the insight with the marketing department so they can take corrective action.


DataStage now available on Cloud

For data integration projects, DataStage has been the workhorse for many years. It is used by data engineers to extract data from many different sources, transform and combine the data, and then load it for applications and end users. DataStage has many distinct advantages over other popular ETL tools.

Until recently, these capabilities were only available with the on-premises offering. Now DataStage is available on the cloud as a hosted offering. Customers can take advantage of the full capabilities of DataStage without the burden and time consumed in standing up the infrastructure and installing the software themselves. Customers can quickly deploy a DataStage environment (from ordering to provisioning it on the cloud) and be up and running in a day or less. There is no up-front capital expenditure, as customers only pay a monthly subscription based on the capacity they purchase. Licensing is also greatly simplified.

Using DataStage on Cloud, existing DataStage customers can start new projects quickly. Since it is hosted in the IBM cloud, the machine and operating system are managed by IBM; the customer does not have to spend time expanding the current environment or creating a new one. In other words, cloud elasticity makes them ready to scale and handle any workload. DataStage ETL job developers can be productive immediately, and data integration activities can span both on-premises and cloud data if necessary, as DataStage jobs can be exported from the cloud and brought back to an on-premises DataStage environment.

As an example: a customer has data sources such as Teradata, DB2, etc. in their data center, as well as Salesforce, MongoDB and other data residing in the cloud. They need access to both their existing data sources and their cloud data sources for a new customer retention project. This project requires some sophisticated data integration to bring it all together, but they don't have the IT resources or budget to stand up a new data integration environment in their own data center for this project. So an instance of DataStage on the Cloud can be deployed for their use. The customer can access the DataStage client programs on the cloud to work with DataStage, either through the public Internet or a private connection via the SoftLayer VPN. DataStage ETL jobs running in the cloud can access the customer's on-premises data sources and targets using secured protocols and encryption methods. In addition, these DataStage jobs can also access cloud data sources like dashDB, as well as data sources on other cloud platforms, using the appropriate secured protocols.

So with DataStage hosted on the Cloud you can:

  1. Extend your ETL infrastructure: Expand your InfoSphere DataStage environment or begin transitioning into a private or public cloud with flexible deployment options and subscription pricing.
  2. Establish ad hoc environments: Extend your on-premises capacity to quickly create new environments for ad hoc development and testing or for limited duration projects.
  3. Start new projects in the cloud: Move straight to the cloud without establishing an on-premises environment. Realize faster time-to-value, reduce administration burden and use low-risk subscription pricing.

Go here for more information: https://developer.ibm.com/clouddataservices/docs/information-server/