Why Blockchain?

There has been a lot of buzz around blockchain, taking it onto Gartner's Hype Cycle for Emerging Technologies, 2016. It has been envisioned that blockchain will do for transactions what the Internet did for information. So in this blog, let's discuss the need for blockchain.


Complex Transactions

If you’ve ever bought a house, you probably had to sign a huge stack of papers from a variety of different stakeholders to make that transaction happen. It is a complex transaction involving banks, attorneys, title companies, insurers, regulators, tax agencies and inspectors. They all maintain separate records, and it’s costly to verify and record each step. That’s why the average closing takes several days. The same holds true if you are registering a vehicle. In both examples, what you are doing is establishing ownership of an asset, and the problem is that the information resides in several ledgers (or databases), all of which have to hold the same version of the truth. So the problems are manifold:

  • Multiple ledgers have to be updated to reflect business transactions as they occur.
  • This is EXPENSIVE, due to duplication of effort and intermediaries adding margin for their services.
  • It is clearly INEFFICIENT, as the business conditions – the contract – are duplicated by every network participant, and we have to rely on intermediaries throughout this paper-laden process.
  • It is also VULNERABLE, because if a central system (e.g. a bank) is compromised by an incident, the whole business network is affected. Incidents can include fraud, cyber attack or a simple mistake.


What if there existed a common ledger (or a distributed database) that everyone had access to and everyone trusted? This is what blockchain brings to business!
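As a rough intuition only (real blockchains add consensus protocols, digital signatures and peer-to-peer distribution), a tamper-evident shared ledger can be sketched as hash-linked blocks in a few lines of Python; all names and data here are made up:

```python
import hashlib
import json

def block_hash(block):
    """Deterministically hash a block's contents."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, transaction):
    """Append a transaction, linking it to the previous block's hash."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    block = {"transaction": transaction, "prev_hash": prev}
    block["hash"] = block_hash({"transaction": transaction, "prev_hash": prev})
    chain.append(block)
    return chain

ledger = []
append_block(ledger, {"asset": "house-42", "owner": "Alice"})
append_block(ledger, {"asset": "house-42", "owner": "Bob"})

# Tampering with an earlier block breaks every later link, so any
# participant holding a copy of the ledger can detect the change.
ledger[0]["transaction"]["owner"] = "Mallory"
valid = all(
    ledger[i]["prev_hash"] == block_hash(
        {"transaction": ledger[i - 1]["transaction"],
         "prev_hash": ledger[i - 1]["prev_hash"]}
    )
    for i in range(1, len(ledger))
)
print(valid)  # False after tampering
```

Because every participant can recompute the hashes independently, no single intermediary has to be trusted to vouch for the history.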

Why now?

There are three reasons why blockchain is starting to take a foothold now.
  • Industries are merging and interacting like never before. The growth of ecommerce, online banking, and in-app purchases, and the increasing mobility of people around the world, have fueled the growth of transaction volumes. And transaction volumes will explode with the rise of the Internet of Things (IoT) — autonomous objects, such as refrigerators that buy groceries when supplies are running low and cars that deliver themselves to your door, stopping for fuel along the way. These interactions require more trust and transparency to succeed.
  • Increasing regulation, cybercrime and fraud are inhibiting business growth. The last 10 years have seen the growth of global, cross-industry regulations, including HIPAA, the Sarbanes-Oxley Act, anti-money laundering rules and more. And to keep pace with regulatory changes, companies are rapidly increasing compliance staff and budgets.
  • Advancements in technologies like cloud (offering the compute power to track billions of transactions) and cryptography (securing both networks and transactions) are also enablers for blockchain.

In my future blog I will discuss how blockchain makes things better and how it works. So stay tuned.


3 Compelling Use Cases for Entity Analytics

Entity analytics is used to detect non-obvious relationships, resolve entities, and find threats and vulnerabilities that are hiding in your disparate collections of data. Through three use cases, let’s try to understand how Entity Analytics can help organizations enhance their customer experience.

Scenario 1

Entity Analytics can detect non-obvious relationships between entities. It can also analyze new data sources in context, leading to new insights and opportunities. In this scenario you have some data in an MDM system and another set of data in a spreadsheet file. Suppose you want to run a marketing campaign targeting high-net-worth clients to sell them a premium bank account. The MDM system in isolation doesn’t give you the information you need. You want to bring the two sources together and determine whether you can identify individuals to target for the new account.

In the MDM system, John Smith lives with Mary Smith. The spreadsheet file shows that John Smyth (spelled differently) is actually a high-net-worth client. Combining this information we can say that John Smith is actually the same person across the data sets. He’s a high-net-worth client, and he has a wife. With this information you want to target Mary Smith with a premium bank account because she lives with a high-net-worth individual. Entity analytics enables you to discover and understand this opportunity.
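A minimal sketch of the kind of matching involved, using Python's standard-library difflib for name similarity. The records, threshold and field names are illustrative only, not how any particular MDM or entity-analytics product actually works:

```python
from difflib import SequenceMatcher

# Toy stand-ins for the two sources; names and addresses are made up.
mdm_records = [
    {"name": "John Smith", "address": "12 Elm St", "household": ["Mary Smith"]},
]
spreadsheet = [
    {"name": "John Smyth", "address": "12 Elm St", "high_net_worth": True},
]

def similarity(a, b):
    """Rough string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def resolve(mdm, sheet, threshold=0.85):
    """Link records when the address matches and the names are near-identical."""
    matches = []
    for m in mdm:
        for s in sheet:
            if m["address"] == s["address"] and similarity(m["name"], s["name"]) >= threshold:
                matches.append((m, s))
    return matches

for m, s in resolve(mdm_records, spreadsheet):
    if s["high_net_worth"]:
        # Everyone in the matched household becomes a campaign prospect.
        print("Target household members:", m["household"])
```

Here "John Smith" and "John Smyth" score about 0.9 on name similarity and share an address, so they resolve to the same person, which surfaces Mary Smith as the prospect.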


Scenario 2

Entity Analytics can find where threats and vulnerabilities are hiding in big data and help respond efficiently. In this scenario, a risk assessor in an insurance firm learns that severe rainfall is predicted within a geographical area that includes a client’s residential location. By pulling up client data from MDM alongside the flood warnings issued by the environmental agency, we can match across the data sets and identify that a number of properties are at risk. The client can then be given an early warning to help mitigate the risk, and the flood-risk value on the client’s property renewal can be increased. Also, if an elderly customer is at severe risk, you can notify the emergency services to ensure a proactive response to any potential threat.
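The matching step above can be sketched as a simple join between hypothetical client records and a flood-warning feed; all data, field names and thresholds are invented for illustration:

```python
# Hypothetical feeds: client records from MDM and flood warnings from an
# environmental agency, matched here on postcode.
clients = [
    {"name": "A. Jones", "postcode": "GL1 2AB", "age": 82},
    {"name": "B. Patel", "postcode": "SW9 7XY", "age": 45},
]
flood_warnings = [{"postcode": "GL1 2AB", "severity": "severe"}]

# Join the two data sets to find clients inside a severe-warning area.
at_risk = [
    c for c in clients
    if any(w["postcode"] == c["postcode"] and w["severity"] == "severe"
           for w in flood_warnings)
]

for client in at_risk:
    print("Early warning:", client["name"])
    if client["age"] >= 75:
        # Elderly customers also get a proactive emergency-services referral.
        print("Notify emergency services for", client["name"])
```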


Scenario 3

Let’s see how, using Entity Analytics, MoneyGram International Inc., a money transfer company, gets notified of questionable activities in real time for faster predictive and preventive decision making. This helped them save $200 million in just two years!


Entity analytics helps organizations launch more targeted campaigns and reduce the risk of fraud. With its help, organizations can predict and preempt suspicious activity faster and at lower cost. It also allows enterprises to detect entities that are the same, regardless of whether those entities are hidden or masked. So the following questions can be raised:

  • Does this analytics require an MDM ninja, or can it be set up easily by a business user?
  • Is Entity Analytics available on the cloud for decisions that cannot waaaaiiiiittttt?

Stay tuned for my next blog.


Need for ETL tool – As explained to Undergraduate Students (Part 1)

Yesterday I went to a university campus to deliver a talk on ‘Data Cleansing’. One of the challenges I faced was explaining to these students the need for an ETL tool in a way they could relate to. So I created a hypothetical story to help them appreciate the need for such a tool, and I am narrating it again here to get feedback from readers.

In the world of analytics, data is a resource or asset for making business decisions. For example, the CEO of a bank wants to know: what are good locations for opening a new branch? One way to make this decision is to find the areas where the most customers are concentrated. Say near Bank Location A there are 25,000 customers residing, but the branch can typically serve only 10,000; that makes a case to expand the branch or open another one.

So the bank hires some college interns to get the count of customers per branch location. Note that an individual branch does not have this information, as it resides in a central database of the bank. So the interns need to get the list of customers, divide them by city and location, and count the customers in each location. Sounds like a day’s job for someone who knows how to key in proper SQL statements (SELECT, GROUP BY, COUNT, etc.). Correct? Let’s see…
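The naive version of the interns' task really is a single GROUP BY. Here is a sketch using Python's sqlite3 as a stand-in for the bank's central database; the table and data are invented for illustration:

```python
import sqlite3

# A toy single-database version of the task. In reality the data is
# spread across Oracle, DB2, Netezza and more, which is the real problem.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, location TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Asha", "Pune", "Location A"),
     ("Ravi", "Pune", "Location A"),
     ("Meera", "Mumbai", "Location B")],
)

# Count customers per city and branch location.
rows = conn.execute(
    "SELECT city, location, COUNT(*) AS customer_count "
    "FROM customers GROUP BY city, location "
    "ORDER BY customer_count DESC"
).fetchall()
print(rows)  # [('Pune', 'Location A', 2), ('Mumbai', 'Location B', 1)]
```

If all the data lived in one clean table like this, the job would indeed take an afternoon.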

On Day 1 of the job, they speak to the Database Administrator to get access to the database where they can run these queries. The Database Administrator has some concerns…

1. Which database do they want access to? The loan customers’ data resides in Oracle, the debit card customers’ data resides in DB2, the credit card customers’ data resides in Netezza, and so on…
2. Moreover, the bank recently made an acquisition, and the acquired bank’s data still resides in an Informix database.
3. To add to all this, the loan account, debit card account and credit card account are considered different accounts. If a single person holds all three, from the bank’s perspective they are three different individuals; in technical terms, the data contains duplicates. Not to forget that since a bank was recently acquired, some customers would have accounts in both banks, and they should also be counted as one person.
4. As if that were not enough, the Database Administrator adds that these complex queries cannot be run on live production databases. Customers are making transactions, and the bank would suffer huge losses in case of downtime. But if required, he can give read access to one of the mirror databases for queries at night, from 11:00 pm to 4:00 am.

So what looked like a day’s job turned out to be something that would take months to accomplish. And what if more complexities are added along the way (like more data sources, or a data-movement window that is no longer sufficient)? So what is the solution? Stay tuned…

Data Delivery using CDC

In my previous blog, I mentioned how Data Virtualization can be achieved through Federation, Consolidation, Replication and Caching. Consolidation is an ETL batch process, traditionally run at the end of each business day, which I have discussed in some detail in an earlier blog. In this blog, I wish to spend some more time on Replication in general, focusing on the specifics of InfoSphere CDC towards the end.

When would we require data to be replicated or have incremental data delivery?
When businesses require their data to provide up-to-the-minute or near real-time information, they opt for this method of data delivery. This includes both replication and change data capture. Replication moves data from database to database to provide solutions for (a) continuous business availability, (b) live reporting, and (c) database or platform migrations. With change data capture, the target is not necessarily a database: in addition to the solutions included in replication, this approach can also feed changes to an ETL process or deliver data changes to a downstream application via a message queue.

Some examples of how Replication is used include the following:

  • Providing feeds of changed data for Data Warehouse or Master Data Management (MDM) projects, enabling users to make operational and tactical business decisions using the latest information.
  •  Dynamically routing data based on content to message queues to be consumed by one or more applications, ensuring consistent, accurate, and reliable data across the enterprise.
  • Populating real-time dashboards for on-demand analytics, continuous business monitoring, and business process management to integrate information between mission-critical applications and web applications, ensuring access to real-time data to customers and employees.
  •  Consolidating financial data across systems in different regions, departments, and business units.
  •  Improving the operational performance of systems that are adversely affected by shrinking nightly batch windows or expensive queries and reporting functions.

So what is InfoSphere CDC?
Change data capture uses mature technology to integrate data in near real time. InfoSphere CDC detects changes by monitoring, or scraping, database logs. The capture engine (the log scraper) is a lightweight, small-footprint, low-impact process running on the source server where the database changes are detected. After the log scraper finds newly changed data on the source, that data is pushed from the source agent to the target apply engine through a standard Internet Protocol network socket. In a typical continuous-mirroring scenario, the change data is applied to the target database through standard SQL statements. Because CDC interacts only with the database logs, no additional load is put on the source database and no changes are required to the source application.
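A toy simulation of the capture-and-apply flow, with a plain Python list standing in for the scraped database log and sqlite3 standing in for the target. This illustrates the idea only; it is not InfoSphere CDC's actual interfaces or log format:

```python
import sqlite3

# Target database that the apply engine writes to.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")

# Stand-in for changes scraped from the source database's transaction log.
change_log = [
    {"op": "INSERT", "id": 1, "balance": 100.0},
    {"op": "INSERT", "id": 2, "balance": 250.0},
    {"op": "UPDATE", "id": 1, "balance": 75.0},
    {"op": "DELETE", "id": 2},
]

def apply_change(db, change):
    """Replay one captured change on the target as a standard SQL statement."""
    if change["op"] == "INSERT":
        db.execute("INSERT INTO accounts VALUES (?, ?)",
                   (change["id"], change["balance"]))
    elif change["op"] == "UPDATE":
        db.execute("UPDATE accounts SET balance = ? WHERE id = ?",
                   (change["balance"], change["id"]))
    elif change["op"] == "DELETE":
        db.execute("DELETE FROM accounts WHERE id = ?", (change["id"],))

for change in change_log:
    apply_change(target, change)

print(target.execute("SELECT id, balance FROM accounts").fetchall())  # [(1, 75.0)]
```

The key point the sketch illustrates: only the changes flow across the network, and the source is never queried directly, so it carries no extra load.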

Incremental data delivery is shown below:
[Figure: Incremental Data Delivery]

InfoSphere QualityStage – II (Various Parts)

Some time back I wrote an introduction to the need for Data Cleansing and the power of InfoSphere QualityStage. In this series, I plan to take a closer look at QualityStage’s various parts and capabilities.

IBM QualityStage allows us to use the following processes to analyze data, cleanse/standardize data, match data across sources, and create one unique view of an entity:

  • Investigation — Helps to understand the nature and scope of data anomalies
  • Standardization — Parses individual fields and makes them uniform according to business standards
  • Matching — Identifies duplicate records within and across data sources
  • Survivorship — Helps eliminate duplicate records and create the best-of-breed record of data

To take a closer look at each of the above steps, you can click through to the detailed blogs explaining them.
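The four steps can be sketched end-to-end in a few lines of Python. This is a toy illustration, not QualityStage's actual rules or APIs; the normalization, matching and survivorship rules below are all invented:

```python
import re
from difflib import SequenceMatcher

# Two raw records that describe the same person; fields are made up.
records = [
    {"name": "  DR. john SMITH ", "phone": "555-0101", "updated": 2},
    {"name": "John Smith", "phone": "", "updated": 1},
]

def standardize(rec):
    """Parse and normalize fields to a uniform business standard."""
    name = re.sub(r"\b(dr|mr|mrs|ms)\.?\s+", "", rec["name"].strip(), flags=re.I)
    return {**rec, "name": name.title()}

def is_match(a, b, threshold=0.9):
    """Flag likely duplicates by name similarity (Matching)."""
    return SequenceMatcher(None, a["name"], b["name"]).ratio() >= threshold

def survive(a, b):
    """Survivorship: the newer record wins, but any non-empty field the
    older record can contribute is kept."""
    newer, older = (a, b) if a["updated"] >= b["updated"] else (b, a)
    return {k: newer[k] or older[k] for k in newer}

std = [standardize(r) for r in records]          # Standardization
if is_match(std[0], std[1]):                     # Matching
    golden = survive(std[0], std[1])             # Survivorship
    print(golden)  # {'name': 'John Smith', 'phone': '555-0101', 'updated': 2}
```

The Investigation step is implicit here: it is what you would do first to discover that titles, casing and missing phone numbers are the anomalies these rules need to handle.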

What is Business Intelligence?

In my last set of posts, I described the journey of data to a Data Warehouse. It started with an ETL job that has the ability to extract data from different types of sources (z/OS, SAP or custom), transforming and cleansing it along the way to land in its final destination as “Trusted Information” (which by definition means accurate, complete, insightful and real-time). So why was all this effort made? One answer I already provided is that for compliance purposes this much effort is needed and more or less sufficient (at least from an IT perspective) to ensure that our records are proper. But beyond that, such data can often be used to provide valuable insights. This is where BI (pronounced “Bee Eye”), or Business Intelligence, comes into the picture.

What is BI?
Business intelligence (BI) is defined as the ability of an organization to take all its capabilities and convert them into knowledge: ultimately, getting the right information to the right people at the right time to make the right decisions. These decisions drive organizations. Making a good decision at a critical moment may lead to a more efficient operation, a more profitable enterprise, or perhaps a more satisfied customer. BI tools and processes working on trusted data provide a safer way to make decisions than a “gut feeling”.

Where does BI Apply?

  • BI can be used to segment customers and identify the best ones. We can analyze data to understand customer behaviors, predict their wants and needs, and offer fitting products and promotions. Finally, we can identify the customers at greatest risk of attrition, and intervene to try to keep them.
  • The human resources department can learn which individuals in an organization are high performers and then hire, train, and reward other employees to become similar high performers.
  • Inventory managers can segment their inventory items by cost and velocity, build key facilities in the best locations, and ensure that the right products are available in the right quantities.
  • Production can minimize its costs by setting up activity-based costing programs.
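As a toy illustration of the customer-segmentation bullet above, a few lines of Python with invented spend and recency thresholds; real BI tooling would of course work on far richer data:

```python
# Score customers on spend and recency, then flag attrition risk.
# All names, numbers and thresholds are illustrative.
customers = [
    {"name": "Asha", "annual_spend": 12000, "days_since_last_purchase": 12},
    {"name": "Ravi", "annual_spend": 900, "days_since_last_purchase": 200},
    {"name": "Meera", "annual_spend": 8000, "days_since_last_purchase": 150},
]

def segment(c):
    """Assign a customer to a segment and flag attrition risk."""
    if c["annual_spend"] >= 5000 and c["days_since_last_purchase"] > 90:
        return "best customer, attrition risk - intervene"
    if c["annual_spend"] >= 5000:
        return "best customer"
    return "standard"

for c in customers:
    print(c["name"], "->", segment(c))
```

Even this crude rule surfaces the key insight of the bullet: the high-value customer who has gone quiet is the one worth intervening with.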

Can BI make all my business decisions accurate?
BI only assists in making a proper decision; in some situations, intuition may still be required. What if we do not have sufficient time to run our tools and get a report before making a decision? What if we have no historical data to base a decision on, or that history is misleading?

So does BI just munch the trusted data and give you some gyan (a Sanskrit word for insight)? Not really. There are two additional things it should do: measure the results according to predetermined metrics, and feed the lessons from one decision into the next.

These are my tidbits gathered from reading about BI from various sources. I welcome readers to share their understanding or point to more interesting reads in this emerging area.

Data Governance – I (Basics)

Data governance is a set of processes that ensures that important data assets are formally managed throughout the enterprise. Data governance ensures that data can be trusted and that people can be held accountable for any adverse event that happens because of poor data quality. So data governance is about putting people in charge of fixing and preventing issues with data, so that the enterprise can become more efficient.

Data governance encompasses the people, policies, and technology required to create consistent and proper handling of an organization’s data across the business enterprise.

  • People – Effective enterprise data governance requires executive sponsorship as well as a firm commitment from both business and IT staff.
  • Policies – A data governance program must create – and enforce – a definition of “acceptable” data through business policies that guide the collection and management of data.
  • Technology – Beyond data quality and data integration functionality, an effective data governance program uses data synchronization technology, data models, collaboration tools and other components that help create a coherent enterprise view.
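A minimal sketch of what enforcing "acceptable" data policies might look like in code; the policies, rules and record fields are all invented for illustration:

```python
import re

# Each policy is a named rule that decides whether a record is acceptable.
POLICIES = {
    "email_format": lambda r: bool(
        re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+$", r.get("email", ""))
    ),
    "customer_id_present": lambda r: bool(r.get("customer_id")),
}

def audit(record):
    """Return the names of every policy this record violates."""
    return [name for name, rule in POLICIES.items() if not rule(record)]

print(audit({"customer_id": "C-001", "email": "a@example.com"}))  # []
print(audit({"email": "not-an-email"}))  # ['email_format', 'customer_id_present']
```

The point of the sketch is accountability: every violation is attributable to a named policy, so a data steward knows exactly what to fix and which rule it broke.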

The benefits of a holistic approach are obvious: better data drives more effective decisions across every level of the organization. With a more unified view of the enterprise, managers and executives can create strategies that make the company more profitable.