Why Blockchain?

There has been a lot of buzz around blockchain, taking it to Gartner's Hype Cycle for Emerging Technologies, 2016. It has been envisioned that blockchain will do for transactions what the Internet did for information. So in this blog, let's discuss the need for blockchain.

Complex Transactions

If you’ve ever bought a house, you probably had to sign a huge stack of papers from a variety of different stakeholders to make that transaction happen. It is a complex transaction involving banks, attorneys, title companies, insurers, regulators, tax agencies and inspectors. They all maintain separate records, and it’s costly to verify and record each step. That’s why the average closing takes several days. The same holds true if you are registering a vehicle. In these two examples, what you are doing is ‘establishing ownership of an asset’, and the problem is that there are several ledgers (or databases) where the information resides, all of which must hold the same version of the truth. So the problems are manifold:

  • Multiple ledgers must be updated to represent business transactions as they occur.
  • This is EXPENSIVE due to duplication of effort and intermediaries adding margin for their services.
  • It is clearly INEFFICIENT, as the business conditions – the contract – are duplicated by every network participant, and we rely on intermediaries throughout this paper-laden process.
  • It is also VULNERABLE because if a central system (e.g., a bank) is compromised due to an incident, the whole business network is affected. Incidents can include fraud, cyberattack or a simple mistake.

Solution:

What if there existed a common ledger (or a distributed database) that everyone had access to and everyone trusted? This is what blockchain does for business!
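
To make the shared-ledger idea concrete, here is a minimal Python sketch of a toy append-only ledger (an illustration of the concept only, not any production blockchain): each entry embeds a hash of the previous one, so every participant can independently check that they hold the same version of the truth.

```python
import hashlib
import json
import time

def hash_block(block: dict) -> str:
    """Deterministically hash a block's contents with SHA-256."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

class Ledger:
    """A toy append-only ledger: each block embeds the previous block's
    hash, so any tampering with history breaks the chain."""

    def __init__(self):
        self.chain = [{"index": 0, "prev_hash": "0" * 64,
                       "timestamp": 0, "transaction": "genesis"}]

    def append(self, transaction: str) -> dict:
        block = {
            "index": len(self.chain),
            "prev_hash": hash_block(self.chain[-1]),
            "timestamp": time.time(),
            "transaction": transaction,
        }
        self.chain.append(block)
        return block

    def is_valid(self) -> bool:
        """Every participant can recompute the hashes and agree on
        one version of the truth."""
        return all(
            self.chain[i]["prev_hash"] == hash_block(self.chain[i - 1])
            for i in range(1, len(self.chain))
        )

ledger = Ledger()
ledger.append("Title of 12 Elm St transferred from A to B")
print(ledger.is_valid())  # True; flips to False if any past block is edited
```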

Why now?

There are three reasons why blockchain is starting to gain a foothold now.
  • Industries are merging and interacting like never before. The growth of ecommerce, online banking, and in-app purchases, and the increasing mobility of people around the world have fueled the growth of transaction volumes. And transaction volumes will explode with the rise of the Internet of Things (IoT) — autonomous objects, such as refrigerators that buy groceries when supplies are running low and cars that deliver themselves to your door, stopping for fuel along the way. These partnerships require more trust and transparency to succeed.
  • Increasing regulation, cybercrime and fraud are inhibiting business growth. The last 10 years have seen the growth of global, cross-industry regulations, including HIPAA, the Sarbanes-Oxley Act, anti-money laundering rules and more. To keep pace with regulatory changes, companies are rapidly increasing compliance staff and budgets.
  • Advancements in technologies like cloud (offering the compute power to track billions of transactions) and cryptography (securing both networks and transactions) are also enablers for blockchain.

In a future blog I will discuss how blockchain makes things better and how it works. So stay tuned.


Match and Manage your Data on Cloud

We left the last blog with two questions.

A few weeks back I wrote on IBM Bluemix Data Connect. If you missed it, then watch this video on how you can put data to work with IBM Bluemix Data Connect.

Now, Business Analysts can leverage Entity Matching technology using Data Connect. The Match and Manage (BETA) operation in Data Connect identifies possible matches and relationships in a plethora of data sets, including master data and non-master data sets, to create a unified view of your data. It also provides a visualization of the relationships between entities in the unified data set.

For example, say you have two sets of data: one containing customer profile information and the other containing a list of prospects. A Business Analyst can now use an intuitive UI to run the Match and Manage operation on these two data sets and get insights into questions such as:

  • Are there duplicates in the prospect list?
  • How many of the prospects are already existing customers?
  • Are there non-obvious relationships among prospects and customers that can be explored?
  • Are there other sources of information within the organization that could provide better insights if brought together?

The two data sets are matched using cognitive capabilities, which allow the MDM matching technology to be auto-configured and tuned to intelligently match across different data sets.
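
Conceptually, the matching step works along these lines. The sketch below is a deliberately crude illustration (Data Connect's cognitive matching is far more sophisticated and auto-tuned): normalize the records, score candidate pairs, and keep the likely matches.

```python
from difflib import SequenceMatcher

customers = [{"id": "C1", "name": "John Smith", "city": "Austin"}]
prospects = [{"id": "P1", "name": "Jon Smith", "city": "Austin"},
             {"id": "P2", "name": "Mary Jones", "city": "Boston"}]

def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]; real MDM engines use tuned
    probabilistic comparators instead."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Score every prospect against every customer and keep likely matches.
for p in prospects:
    for c in customers:
        score = similarity(p["name"], c["name"])
        if score > 0.8 and p["city"] == c["city"]:
            print(f"{p['id']} looks like existing customer {c['id']} "
                  f"(name similarity {score:.2f})")
```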


Business Analysts can understand the de-duplicated data sets by navigating through a relationship graph of the data to see how entities are related across the entire data set, and can discover non-obvious relationships within the data that were previously undiscoverable. The generated canvas enables them to interactively explore relationships between entities.
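
The relationship canvas can be pictured as a graph whose nodes are resolved entities and whose edges are discovered relationships. Here is a small illustrative sketch using the networkx library (not Data Connect's internal representation; node names and labels are made up):

```python
import networkx as nx

# Build a graph whose nodes are resolved entities and whose edges are
# relationships discovered during matching (illustrative data).
G = nx.Graph()
G.add_edge("cust:1001", "prospect:42", label="likely same person")
G.add_edge("cust:1001", "cust:1002", label="same household")
G.add_edge("cust:1002", "prospect:77", label="shared address")

# Traversing two hops from a prospect reveals non-obvious relationships.
start = "prospect:42"
for hop1 in G.neighbors(start):
    for hop2 in G.neighbors(hop1):
        if hop2 != start:
            print(f"{start} -> {hop1} -> {hop2}")
```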


The above example illustrates how clients can now easily understand the data they hold within their MDM repositories, and how they can match their MDM data with other data sources not included within the MDM system. This simplifies the analytical MDM experience: MDM technologies become accessible to everyone, without the need to wait for Data Engineers to transform the data into a format that can be matched or to rely on MDM ninjas to configure matching algorithms.

Summary:

IBM Bluemix Data Connect provides a seamless, integrated self-service experience for data preparation. With the addition of entity analytics capability, business users are empowered to gain insight from data that wasn’t previously available to them. Now organizations can extract further value from their MDM data by ensuring it is used across the organization to provide accurate analytics. Entity analytics within Data Connect is now available in beta. Go ahead and experience the next evolution of MDM.

3 Compelling Use Cases for Entity Analytics

Entity analytics is used to detect non-obvious relationships, resolve entities, and find threats and vulnerabilities that are hiding in your disparate collections of data. Through three use cases, let’s try to understand how entity analytics can help organizations enhance their customer experience.

Scenario 1

Entity analytics can detect non-obvious relationships between entities. It can also analyze new data sources in context, leading to new insights and opportunities. In this scenario you have some data in an MDM system and another set of data in a spreadsheet file. Suppose you want to run a marketing campaign targeting high-net-worth clients to sell them a premium bank account. The MDM system in isolation doesn’t give you the information you need. You want to bring the two sources together and determine whether you can identify individuals to target for the new account.

In the MDM system, John Smith lives with Mary Smith. The spreadsheet file shows that John Smyth (spelled differently) is actually a high-net-worth client. Combining this information, we can say that John Smith is the same person across the data sets: he’s a high-net-worth client, and he has a wife. With this information, you would target Mary Smith with the premium bank account because she lives with a high-net-worth individual. Entity analytics enables you to discover and understand this opportunity.
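
Under illustrative data, the reasoning looks like this sketch: once the two records resolve to one person, the high-net-worth flag propagates across the household link to surface Mary Smith as a campaign target. All record IDs and structures here are made-up assumptions.

```python
# Illustrative only: entity resolution output says MDM record M1
# ("John Smith") and spreadsheet row S1 ("John Smyth") are one person.
resolved = {"M1": "person-1", "S1": "person-1", "M2": "person-2"}

households = [("M1", "M2")]            # from MDM: John lives with Mary (M2)
high_net_worth = {"S1"}                # from the spreadsheet

# Propagate the high-net-worth flag across resolved identities ...
wealthy_persons = {resolved[r] for r in high_net_worth}

# ... then across household links to find campaign targets.
targets = {b for a, b in households if resolved[a] in wealthy_persons}
targets |= {a for a, b in households if resolved[b] in wealthy_persons}
print(targets)  # {'M2'} -> target Mary Smith with the premium account
```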


Scenario 2

Entity analytics can find threats and vulnerabilities hiding in big data so you can respond efficiently. In this scenario, for a risk assessor in an insurance firm, severe rainfall is predicted within a geographical area that includes a client’s residential location. By pulling up the client data from MDM and the flood warnings issued by the environment agency, we can match across the data sets to identify that a number of properties are at risk. The client can then be given an early warning to help mitigate risk, and the flood risk value on the client’s property renewal can be increased. Also, if an elderly customer is at severe risk, you can notify the emergency services to ensure a proactive response to any potential threat.
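
A simplified sketch of that cross-match (a real system would use proper geospatial joins and the agency's actual warning feed; the data below is made up): compare client locations from MDM against the warning area and flag at-risk and vulnerable clients.

```python
# Illustrative data: client records from MDM and a flood warning area,
# reduced here to a simple bounding box.
clients = [
    {"name": "A. Client", "lat": 51.45, "lon": -2.60, "age": 82},
    {"name": "B. Client", "lat": 53.80, "lon": -1.55, "age": 40},
]
warning_area = {"lat_min": 51.3, "lat_max": 51.6,
                "lon_min": -2.8, "lon_max": -2.4}

def in_area(c: dict, area: dict) -> bool:
    return (area["lat_min"] <= c["lat"] <= area["lat_max"]
            and area["lon_min"] <= c["lon"] <= area["lon_max"])

for c in clients:
    if in_area(c, warning_area):
        print(f"Early flood warning for {c['name']}")
        if c["age"] >= 75:
            print("  -> also notify emergency services (vulnerable client)")
```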


Scenario 3

Let’s see how, using entity analytics, MoneyGram International Inc., a money transfer company, gets notified of questionable activities in real time for faster predictive and preventive decision making. This helped them save $200 million in just two years!

Summary

Entity analytics helps organizations launch more target-oriented campaigns and reduce the risk of fraud. With the help of entity analytics, organizations can predict and preempt suspicious activity faster and at reduced cost. Entity analytics further helps enterprises detect entities that are the same, regardless of whether the entities are hidden or masked. So the following questions can be raised:

  • Does this analytics capability require an MDM ninja, or can it be set up easily by a business user?
  • Do we have entity analytics available on Cloud for decisions that cannot wait?

Stay tuned for my next blog.


DataStage now available on Cloud

For data integration projects, DataStage has been the workhorse for many years. Data Engineers use it to extract data from many different sources, transform and combine that data, and then load the results for applications and end users. DataStage has many distinct advantages over other popular ETL tools.

Until recently, these capabilities were only available as an on-premises offering. Now DataStage is available as a hosted cloud offering. Customers can take advantage of the full capabilities of DataStage without the burden and time of standing up the infrastructure and installing the software themselves. Customers can quickly deploy a DataStage environment (from ordering to provisioning it on the cloud) and be up and running in a day or less. There is no up-front capital expenditure, as customers only pay a monthly subscription based on the capacity they purchase. Licensing is also greatly simplified.

Using DataStage on Cloud, existing DataStage customers can start new projects quickly. Since it is hosted in the IBM cloud, the machine and operating system are managed by IBM. The customer does not have to spend time expanding the current environment or creating a new one; cloud elasticity makes the environment ready to scale and handle any workload. DataStage ETL job developers can be productive immediately, and data integration activities can span both on-premises and cloud data if necessary, as DataStage jobs can be exported from the cloud and brought back to an on-premises DataStage environment.

As an example, a customer has data sources such as Teradata, DB2, etc. in their data center as well as Salesforce, MongoDB and other data residing in the cloud. They need access to their existing data sources and their cloud data sources for a new customer retention project. This project requires some sophisticated data integration to bring it all together, but they don’t have the IT resources or budget to stand up a new data integration environment in their own data center. So, an instance of DataStage on the Cloud can be deployed for their use. The customer can access the DataStage client programs on the Cloud to work with DataStage, either through the public Internet or a private connection via the SoftLayer VPN. DataStage ETL jobs running in the Cloud can access the customer’s on-premises data sources and targets using secure protocols and encryption. In addition, these DataStage jobs can also access cloud data sources like dashDB, as well as data sources on other cloud platforms, using the appropriate secure protocols.

So with DataStage hosted on the Cloud you can:

  1. Extend your ETL infrastructure: Expand your InfoSphere DataStage environment or begin transitioning into a private or public cloud with flexible deployment options and subscription pricing.
  2. Establish ad hoc environments: Extend your on-premises capacity to quickly create new environments for ad hoc development and testing or for limited duration projects.
  3. Start new projects in the cloud: Move straight to the cloud without establishing an on-premises environment. Realize faster time-to-value, reduce administration burden and use low-risk subscription pricing.

Go here for more information: https://developer.ibm.com/clouddataservices/docs/information-server/

IBM Bluemix Data Connect

I have been tracking the development of IBM Bluemix Data Connect quite closely. One of the reasons is that I was a key developer on one of the first few services it launched almost two years back under the name DataWorks. Two weeks back I attended a session on Data Connect by its architect and saw a demo. I am impressed by the way it has evolved since then. Therefore I am planning to revisit DataWorks, now as IBM Bluemix Data Connect. In this blog I will examine the role that IBM Bluemix Data Connect plays in the era of cloud computing, big data and the Internet of Things.

Research from Forrester found that 68 percent of simple BI requests take weeks, months or longer for IT to fulfill due to a lack of technical resources. This means enterprises must find ways to transform line-of-business professionals into skilled data workers, taking some of the burden off of IT. Business users should be empowered to work with data from many sources — both on premises and in the cloud — without requiring the deep technical expertise of a database administrator or data scientist.

This is where cloud services like IBM Bluemix Data Connect come into the picture. Data Connect enables both technical and non-technical business users to derive useful insights from data with point-and-click access — whether it’s a few Excel sheets stored locally or a massive database hosted in the cloud.

Data Connect is a fully managed data preparation and movement service that enables users to put data to work through a simple yet powerful cloud-based interface. The design team has taken great pains to keep the solution simple, so that a novice user can quickly get started. Data Connect empowers the business analyst to discover, cleanse, standardize, transform and move data in support of application development and analytics use cases.

Through its integration with cloud data services like IBM Watson Analytics, Data Connect is a seamless tool for preparing and moving data from on premises and off premises to an analytics cloud ecosystem where it can be quickly analyzed and visualized. Furthermore, Data Connect is backed by continuous delivery, which adds robust new features and functionality on a regular basis. Its processing engine is built on Apache Spark, the leading open source analytics project, with a large and continuously growing development community. The result is a best-of-breed solution that can keep up with the rapid pace of innovation in big data and cloud computing.
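
Because the engine is built on Apache Spark, the kind of cleanse-and-shape work Data Connect performs through its UI can be pictured as a small PySpark pipeline. The sketch below is illustrative only: the file names are placeholders, and this is not Data Connect's internal code.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("prep-sketch").getOrCreate()

# Read a raw CSV, standardize a column, drop duplicates, and write out:
# the same discover -> cleanse -> shape -> move flow Data Connect offers
# through its UI. File paths are placeholders.
raw = spark.read.option("header", "true").csv("raw_customers.csv")
clean = (raw
         .withColumn("email", F.lower(F.trim(F.col("email"))))
         .dropDuplicates(["email"])
         .filter(F.col("country").isNotNull()))
clean.write.mode("overwrite").parquet("clean_customers.parquet")
```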

So here are highlights of IBM Bluemix Data Connect:

  • Allows technical and non-technical users to draw value from data quickly and easily.
  • Ensures data quality with simple data preparation and movement services in the cloud.
  • Integrates with leading cloud data services to create a seamless data management platform.
  • Delivers a continuous inflow of robust new features.
  • Is a best-of-breed ETL solution available on Bluemix, IBM’s next-generation cloud app development platform.

How Bluemix can help in a Natural Disaster

A few minutes back the news headline read: “A powerful earthquake has struck south Asia, with tremors felt in northern Pakistan, India and Afghanistan.” Natural disasters are becoming commonplace. Technology can help in predicting such natural disasters and can also help in relief efforts after a disaster. Based on my involvement in the Uttarakhand and Nepal disaster relief efforts, I want to share how technology can help in post-disaster relief.

Why Cloud?

A solution on Cloud is a natural fit for the following reasons:

  • Location – Cloud datacenters are physically distant from the area of the natural disaster, so applications can keep running even when local power and telecommunications are disrupted.
  • Autoscaling – applications designed for Cloud can automatically scale up to accommodate the sudden spike in application usage in the event of a disaster.
  • Support for distributed team development – you won’t be tied to inaccessible physical build and deployment servers if you hit a bug at exactly the wrong time.
  • On-demand pricing – using the infrastructure only when it is required reduces the cost of the solution. There is no need to keep infrastructure standing by, waiting for a disaster to strike.


Why Bluemix?

Bluemix offers many out-of-the-box services that can help in this effort, so one does not have to create applications afresh. A catalog of IBM, third-party, and open source services allows developers to stitch an application together quickly.

  • Lots of available Node.js libraries for implementing pop-up sites like wikis
  • Language translation with Watson can be helpful for displaced persons whose first language is not English
  • Twilio can integrate with SMS messaging and VOIP phone networks (see the sketch below)
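
For instance, sending an SMS alert through Twilio takes only a few lines. Here is a sketch using the Twilio Python helper library; the credentials and phone numbers are placeholders.

```python
from twilio.rest import Client

# Placeholder credentials; in Bluemix these would come from the bound
# Twilio service's environment variables.
client = Client("ACCOUNT_SID", "AUTH_TOKEN")

message = client.messages.create(
    body="Shelter open at Central School. Reply SAFE to check in.",
    from_="+15550001111",   # your Twilio number (placeholder)
    to="+15552223333",      # recipient (placeholder)
)
print(message.sid)
```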

How can an ETL tool like Information Server help?

We can use the following functionality of InfoSphere Information Server in disaster management (a simplified sketch follows the list):

  • Data Standardization: A lot of data about the location or the disaster victims is passed around. This data comes from various sources and can be dirty or unusable. The Data Standardization service can cleanse the data to remove noise and make it usable.
  • Data Matching: Victim information needs to be dynamically communicated between the disaster relief team and the friends and relatives of the victims. These two different sources need to find each other and exchange information. Probabilistic matching algorithms are indispensable for bringing the two together.
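
Here is a simplified sketch of both steps (InfoSphere Information Server's QualityStage rules and probabilistic matching engine are far richer than this): standardize incoming records, then score relief-list entries against inquiries from relatives. The field names and weights are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher

def standardize(record: dict) -> dict:
    """Cleanse free-text fields: trim, uppercase, collapse whitespace."""
    return {k: re.sub(r"\s+", " ", str(v)).strip().upper()
            for k, v in record.items()}

relief_list = [standardize({"name": "  ram   kumar", "village": "joshimath"})]
inquiries = [standardize({"name": "Ram Kumaar", "village": "Joshimath"})]

def match_score(a: dict, b: dict) -> float:
    """Blend name similarity with an exact place match (toy weights)."""
    name = SequenceMatcher(None, a["name"], b["name"]).ratio()
    place = 1.0 if a["village"] == b["village"] else 0.0
    return 0.7 * name + 0.3 * place

for inq in inquiries:
    for rec in relief_list:
        if match_score(inq, rec) > 0.85:
            print(f"Possible match for inquiry '{inq['name']}': {rec['name']}")
```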

These are some of my thoughts. Please share yours so that others can learn and benefit …

ETL Services now available on Cloud

In my last blog, I started with two questions.

1. What are some things to consider when deploying an ETL service on Cloud?
2. Are a few ETL services already available for enterprises, or is this just theory?

My last blog covered question 1, and in this blog I want to dwell more on question 2.

Need for ETL Services on Cloud:
With this explosion of data, the opportunities available to enterprises are booming. But as enterprises get flooded with ever more data – data that’s unknown and unproven – they take on a whole new scope of risks and complexities. So it’s not enough to just capture the data: there is a need to control it, clean it and make the refined data readily available to the people driving the business. Can we afford to create these refinery services from scratch? Let’s look at an analogy: getting clean water to homebuilders.

Homebuilders don’t build water infrastructure, right? They build the home and the pipes underneath, then join it all together with existing pipework for immediate access to clean water. It’s the same for app developers. They don’t want to govern, clean and monitor data. They just want to bring clean data directly into their applications. That’s the idea behind continuous data refinement – giving app developers the tools – and the pipes – to build their house.

IBM DataWorks™: refining your data
IBM DataWorks™ is a data refinery (on Cloud) that speeds application development by getting you the data you need, when you need it, and ensuring it is fit for purpose. It exposes a set of APIs that implement a standard REST model. These APIs allow you to interoperate with feature-rich refinery capabilities. The performance and scalability of the IBM DataWorks engine ensure that your application runs efficiently. IBM DataWorks includes APIs to identify relevant data, transform the data to suit your needs, and load it to a system for use.
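
Interacting with such a REST model generally looks like the sketch below, using the Python requests library. The endpoint path, payload shape and credentials here are hypothetical placeholders, not the documented DataWorks API; consult the actual API reference for the real resource names.

```python
import requests

BASE = "https://example.com/dataworks/v1"   # hypothetical endpoint
AUTH = ("username", "password")             # placeholder credentials

# Hypothetical payload: ask the service to run a source-to-target
# data load activity.
payload = {
    "source": {"connection": "sql-database", "table": "CUSTOMERS"},
    "target": {"connection": "dashdb", "table": "CUSTOMERS_REFINED"},
}
resp = requests.post(f"{BASE}/activities", json=payload, auth=AUTH)
resp.raise_for_status()
print(resp.json())  # e.g. an activity id to poll for status
```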

In IBM DataWorks, you begin by finding the data that you want to work with from data sources like SQL Database and dashDB™. You use metrics to better understand your data quality and identify areas to improve.

To improve the data quality, you work with a sample of the data and apply shaping actions such as sorting, filtering, and joining. You can apply the actions to the full data set and load the data to destinations such as Cloudant® NoSQL DB.
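
That sample-then-apply workflow can be pictured with a small pandas sketch (an analogy for the workflow, not the DataWorks implementation): design the shaping recipe against a sample, then run the identical recipe over the full data set. All column names and data are made up.

```python
import pandas as pd

regions = pd.DataFrame({"region_id": [1, 2], "region": ["East", "West"]})
full = pd.DataFrame({
    "customer_name": ["Zoe", "Ann", "Bob"],
    "status": ["active", "active", "closed"],
    "region_id": [1, 2, 1],
})

def shape(df: pd.DataFrame) -> pd.DataFrame:
    """The shaping recipe designed on a sample: filter, join, sort."""
    active = df[df["status"] == "active"]
    enriched = active.merge(regions, on="region_id", how="left")
    return enriched.sort_values("customer_name")

preview = shape(full.head(100))  # iterate quickly on a sample
result = shape(full)             # then apply to the complete data set
print(result)
```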

For more information, visit DataWorks.