IBM Vs Informatica (2 of 2)

In my last blog, we compared IBM’s Information Server and Informatica’s PowerCenter based on their scalability. Here is the summary: Big Data and enterprise-class data environments need unlimited data scalability to keep pace with data volume growth. Informatica’s PowerCenter is NOT designed to provide unlimited data scalability, which may lead to investment in expensive workarounds.

In this blog, we will touch upon two other important aspects of ETL tools.

Data Governance and Metadata Management


  • IBM provides a data governance solution (Information Governance Catalog) designed for business users.
    • Information Governance Catalog has deep support for a REST API interface. This makes Information Server more open and ensures compatibility with other enterprise systems. Users can create custom enhancements and loaders, as well as unique user interfaces for a consistent look and feel.
    • There is superior event-based notification that takes advantage of open source Apache Kafka messaging. For example, an import of metadata is an “event” that can be monitored for workflow and approval purposes, or simply for notification.
    • There is graphical reporting to illustrate relationships, data design origins, and data flow lineage to help answer “what does this mean” and “where did this data come from?”
    • There is advanced search and navigation, giving a “shopping” experience for the data.
    • Metadata Asset Manager controls what data goes into the repository. “Import Areas” govern what is imported into the repository (or not), and who is able to import. These imports are initiated via a browser interface. No local Windows installation is required for the metadata administrator.
  • Informatica lacks these capabilities and provides a data governance solution designed for technical users. Its platform lacks openness, and you get locked into an “Informatica Only” architecture.

Data Quality

  • IBM provides an integrated data integration platform with one processing engine, one user design experience for data integration and data quality, and one shared metadata repository. Information Server gives the ability to write a DataStage job once and run it anywhere (transactional database, Hadoop or eventually Spark).
  • Informatica provides a collection of multiple and incompatible processing engines, user design experiences, and metadata repositories. Informatica Data Quality and Informatica PowerCenter are two different products that have different user interfaces. In fact, PowerCenter needs two interfaces to design jobs and manage workflows. It also uses two engines. This means that Data Quality processes have to be ‘pushed’ or ‘exported’ to PowerCenter to run.
In summary, Information Server is the better solution when we want scalable workflows, openness in architecture, and better productivity in designing and running the workflows. Information Server supports the power of 1.
  • 1 Engine: The same engine runs stand-alone, in a grid, or natively in Hadoop/YARN. Jobs can remain unchanged regardless of deployment model.
  • 1 Design Experience: Single design experience for Data Integration and Data Quality that increases productivity and reduces error.
  • 1 Repository: A single active metadata repository across the entire portfolio, so design and execution metadata are instantly shared among team members.
Disclaimer: The postings on this site are my own and don’t necessarily represent IBM‘s positions, strategies or opinions

A World with Watson

A year ago I wrote my first blog about Watson, and I have been closely following what’s happening with it since. Here are some facts on Watson and what users of Watson are saying about it.


Quick Facts About Watson:

  • By the end of this year, Watson will touch one billion people in some way
  • Watson can “see”: it is able to describe the contents of an image. For example, Watson can identify melanoma from skin lesion images with 95 percent accuracy, according to research with Memorial Sloan Kettering.
  • Watson can “hear,” understanding speech in languages including Japanese, Mandarin, Spanish and Portuguese, among others.
  • Watson can “read” 9 languages.
  • Watson can “feel” impulses from sensors in elevators, buildings, autos and even ball bearings.
  • Watson has been trained on 8 types of cancers, with plans to add 6 more this year.
  • Beyond oncology, Watson is in use by nearly half of the top 25 life sciences companies, major manufacturers for IoT applications, retail and financial services firms, and partners like GM, H&R Block and
  • At IBM, there are more than 1,000 researchers focused solely on artificial intelligence

But perhaps more important than what Watson can do is what people, businesses and institutions of all sizes are doing with Watson. See what some IBM Watson users are saying.
“What IBM and Watson has been at the leading edge of is providing enterprise grade, commercially ready cognitive services, fully integrated with a top notch cloud and many other services from analytics to support and sales & marketing.” – André M. König, Co-Founder @ Opentopic Inc. This quote was included in Mr. König’s article “Watson is a Joke?” featured on LinkedIn.

“All of us involved in training Watson… are absolutely convinced that this technology will become an indispensable part of a doctor’s armamentarium to care for patients.” – Mark G. Kris, MD, lead physician of the Memorial Sloan-Kettering-IBM Watson collaboration. Dr. Kris’s quote was featured in a June 25, 2017 article in the American Society of Clinical Oncology entitled “How Watson for Oncology is Advancing Personalized Patient Care.”

“But, the probably more exciting part about it is in 30 percent of patients Watson found something new. And so that’s 300-plus people where Watson identified a treatment that a well-meaning, hard-working group of physicians hadn’t found.” – Dr. Norman “Ned” Sharpless, director of the Lineberger Comprehensive Cancer Center at the University of North Carolina at Chapel Hill and recent presidential appointee as director of the National Cancer Institute.
Dr. Sharpless made these comments in a “60 Minutes” segment that aired in October 2016 and again on June 25, 2017. The segment can be viewed here.

“30 minutes is down to 8 minutes to screen a patient…That coordinator can now spend that valuable time gained … in educating the patient on why it’s important for her to be in that clinical trial, helping to break down other barriers.” – Dr. Tufia Haddad, MD, Breast Medical Oncologist, Mayo Clinic, made these comments during an AI in Healthcare panel during HIMSS 2017, reported here.

“We could have individually looked at the 1,500 proteins and genes but it would have taken us much longer to do so. IBM Watson for Drug Discovery, with its robust knowledge base, was able to rapidly give us new and novel information we would not otherwise have had.” – Robert Bowser, PhD, director of the Gregory W. Fulton ALS Research Center at Barrow Neurological Institute and one of the nation’s leading ALS researchers. Quote is from a press release announcing the recent Society for Neuroscience study findings.

“[With Watson], we’re seeing some really tremendous efficiencies gained in the drilling business – [including] an 80 percent reduction in the geoscience research time we need to actually design our wells. That means geoscience searchers are doing geoscience not looking out for more data.” – Peter Coleman, CEO and Managing Director for Woodside [source: Investor Briefing, March 7, 2017]

“[Watson services] was a wake-up call for us – that cognitive solutions are real and powerful. We felt that IBM had, by far, the largest lead in terms of where cognitive was going and that the Watson team would be in the best position to help our business users.” – Ryan Bartley, Head of Applied Innovation at Staples [source: IBM Watson blog, February 10, 2017]

“It’s not man versus machine – they very much work hand and hand. Our analysts continue to play a critical role in evaluating a cyber security incident, while Watson for Cyber Security enforces their decisions and validates what they are sharing with the customer at risk. It enables security analysts to deliver faster and more accurate details on a breach, so we may better protect our customers.” – Ronan Murphy, CEO, Smarttech [source: Press Release, May 11, 2017]

Seven Reasons Why Enterprises Trust IBM Software

Recently IBM announced that it would be backing Spark in its effort to embrace and promote Open Source. At this, Ben Horowitz, technology entrepreneur and co-founder of the venture capital firm Andreessen Horowitz, said, “It’s like Spark just got blessed by the enterprise rabbi.” This is the position IBM commands as a technology company that has supported its clients for over a century. In this blog I will share seven reasons why major corporations around the world rely heavily on IBM for critical services and solutions.


1. Innovation: IBM’s CEO once asked one of IBM’s top Indian telco customers to describe IBM in one word. They immediately said: Innovation. Innovation is in IBMers’ DNA. Watson, which demolished human competitors in a highly touted series of Jeopardy! games, is just one illustration of IBM’s innovative prowess. IBM has held the top position in number of inventions for more than two decades now. You can read about some notable inventions here. From eWeek: “IBM might be, at heart, an old school, enterprise-focused company, but it also keep coming up with innovative ideas, including artificial intelligence, supercomputing and the role of the mainframe in cloud computing. The company’s Watson invention is one of the most important it’s brought to the public in some time, and its work on capturing and analyzing big data to make it actionable in a corporate environment could have a positive effect on the world for decades to come.”

2. Understands Customers’ Needs: Management expert (and author of books such as Built to Last) Jim Collins says, “If you consider what IBM’s mission is, it’s not about computers or technology. It’s about allowing its individual employees to create ways for its customers to solve operational problems. Whether that’s a task best done with scales, typewriters or computers doesn’t matter; what matters is that customers’ needs are answered.” IBM understands the business of enterprises and so is a Leader in Gartner’s Magic Quadrants in almost every technology area.

3. Spread Across Geographies: IBM has offices in over 170 countries, making it easy to reach an executive for a demo or quick help. During my induction to IBM 13 years ago, I was told that it is one of the top three most popular brand names around the world!

4. Trust: Which company can an enterprise trust to last for the next decade? Will it be acquired by another company, leaving its fate unknown? IBM has managed organic growth and survived for 10 decades. Nobody will ever complete a leveraged buyout of IBM. When a company is looking for important solutions in key areas such as infrastructure software or security, the vendor’s reputation and trustworthiness are crucial considerations. There is an old saying in the industry: “Nobody ever got fired for buying IBM.”

5. Big Pockets: Why is IBM a Leader in most of Gartner’s Magic Quadrants? You guessed it. Either it innovates to be there or it acquires the company that is there. The mobile and cloud solutions markets are on the rise, and IBM is ready with a $4 billion investment in these areas. Hardware operations lost half a billion dollars in 2013 due to large shifts in the commodity hardware market. For most companies, that sort of loss would spell the end, but given IBM’s big pockets, the management team is simply transitioning the business through this change cycle.

6. Experience: IBM has survived several recessions, technological shifts and intense competition, and has demonstrated a strength shared by most 100-year-old companies: the ability to learn and change. For example, many enterprises are now joining the big data bandwagon, whereas IBM’s InfoSphere Information Server has over a decade of experience in big data movement and data governance. You may watch this video that captures IBM’s 100 years of experience that changed the world.

7. Stack Integration: The one advantage you get with IBM is that IBM does everything, from silicon to solutions (end-to-end). Morningstar analyst Peter Wahlstrom says, “IBM holds a defensible position in enterprise software, services and hardware. While each of these businesses is an industry leader in its own right, the combination of these products and services provides the firm with a unique solution creation perspective and delivery ability that is key to its wide economic moat.”

I hope this has been an interesting read, especially coming from an IBM developer who has been developing market-leading software for over a decade.


What is InfoSphere BigInsights?

I spent some time reading about IBM InfoSphere BigInsights. In this blog, I wish to share the summary of what I read.

Need for a solution like BigInsights
Imagine if you were able to:

  •  Build sophisticated predictive models from the combination of existing information and big data information flows, providing a level of depth that only analytics applied at a large scale can offer.
  •  Broadly and automatically perform consumer sentiment and brand perception analysis on data gathered from across the Internet, at a scale previously impossible using partially or fully manual methods.
  • Analyze system logs from a variety of disparate systems to lower operational risk.
  •  Leverage existing systems and customer knowledge in new ways that were previously ruled out as infeasible due to cost or scale.

Highlights of InfoSphere BigInsights

  • BigInsights allows organizations to cost-effectively analyze a wide variety and large volume of data to gain insights that were not previously possible.
  • BigInsights is focused on providing enterprises with the capabilities they need to meet critical business requirements while maintaining compatibility with the Hadoop project.
  • BigInsights includes a variety of IBM technologies that enhance and extend the value of open-source Hadoop software to facilitate faster time-to-value, including application accelerators, analytical facilities, development tools, platform improvements and enterprise software integration.
  • While BigInsights offers a wide range of capabilities that extend beyond the Hadoop functionality, IBM has taken an opt-in approach: you can use the IBM extensions to Hadoop based on your needs rather than being forced to use the extensions that come with InfoSphere BigInsights.
  • In addition to core capabilities for installation, configuration and management, InfoSphere BigInsights includes advanced analytics and user interfaces for the non-developer business analyst.
  • It is flexible enough to handle unstructured or semi-structured information; the solution does not require schema definitions or data preprocessing and allows for structure and associations to be added on the fly across information types.
  • The platform runs on commonly available, low-cost hardware in parallel, supporting linear scalability; as information grows, we simply add more commodity hardware.

InfoSphere BigInsights provides a unique set of capabilities that combine the innovation from the Apache Hadoop ecosystem with robust support for traditional skill sets and already installed tools. The ability to leverage existing skills and tools through open-source capabilities helps drive lower total cost of ownership and faster time-to-value.  Thus InfoSphere BigInsights enables new solutions for problems that were previously too large and complex to solve cost-effectively.


Java Integration Stage (1 of 3)

In an ETL tool, we may want to invoke external Java code for some intermediate processing of data. Information Server has a stage called “Java Integration Stage” that is meant to accomplish this.

Java Integration Stage:
The Java Integration stage provides the functionality to invoke Java code that interfaces with InfoSphere DataStage and QualityStage parallel jobs. Customers can use the stage to integrate their Java code into their job design.

Some Highlights of the Java Integration Stage API:

  • Provides functionality to
    • Produce (write) rows that are used within the job
    • Consume (read) rows that are supplied on input links
    • Process rows from an input link and generate rows on the output link
    • Query column and stage metadata
  • Supports Java Beans for simplicity, and to allow for a user’s existing Java code to be invoked from the Java Integration Stage.
  • Supports a ‘column-based’ mode for querying metadata dynamically at runtime, and for dynamic access of column data.
  • Provides a discovery interface that allows a user’s code to learn about the calling environment, and for the framework to learn about the user’s code capabilities.
  • Supports any number of inputs and outputs
  • Supports reject links, and the ability to transfer records from an input to an output
  • Improves design issues with the current Java Pack API (such as being able to get the links’ column metadata in initialize() without having to create an input row).
  • Supports sending end-of-wave markers to output links
  • Supports Runtime Column Propagation (RCP)
  • Supports Automatic Column Transfer
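
To make the "process rows from an input link and generate rows on the output link" idea concrete, here is a minimal, self-contained sketch in plain Java. It does NOT use the real Java Integration Stage API: the `Link` class, the `row` helper and the `process` method are hypothetical stand-ins invented for illustration, and the "name" and "id" columns are assumed. It only shows the general shape of consuming input rows, transferring their columns to an output row, and sending bad rows to a reject link.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class RowProcessorSketch {

    /** Hypothetical stand-in for an output or reject link: it just collects rows. */
    static final class Link {
        final List<Map<String, Object>> rows = new ArrayList<>();

        void write(Map<String, Object> row) {
            rows.add(row);
        }
    }

    /** Hypothetical helper that builds a row with two assumed columns, "id" and "name". */
    static Map<String, Object> row(int id, String name) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("id", id);
        r.put("name", name);
        return r;
    }

    /**
     * Consumes every input row. Rows with a non-empty "name" are copied
     * (all columns transferred), the name is upper-cased, and the result is
     * written to the output link. All other rows go to the reject link unchanged.
     */
    static void process(List<Map<String, Object>> input, Link output, Link reject) {
        for (Map<String, Object> in : input) {
            Object name = in.get("name");
            if (name instanceof String && !((String) name).isEmpty()) {
                Map<String, Object> out = new LinkedHashMap<>(in); // copy all input columns
                out.put("name", ((String) name).toUpperCase(Locale.ROOT));
                output.write(out);
            } else {
                reject.write(in);
            }
        }
    }

    public static void main(String[] args) {
        List<Map<String, Object>> input = new ArrayList<>();
        input.add(row(1, "alice"));
        input.add(row(2, "")); // empty name, so this row is rejected

        Link output = new Link();
        Link reject = new Link();
        process(input, output, reject);

        System.out.println("output: " + output.rows);
        System.out.println("reject: " + reject.rows);
    }
}
```

In a real job, the stage framework would supply the links and the column metadata; the sketch keeps everything in memory so that it can be run on its own.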

What is BlueMix? (Part 3 of 3)

I hope you have read Part 1 and Part 2 of this series. As I write this blog, I have an application up and running on BlueMix. So in this last blog in the series, I wish to share what makes development on BlueMix much easier.

[1] Choosing development tools that suit my needs 
With BlueMix, developers have the freedom to choose the development tools that work best for them.

Command line: The Cloud Foundry (CF) command line provides integration for developers who code without an IDE (integrated development environment). I used this command line to deploy my web app on BlueMix.
Eclipse: The Cloud Foundry integration can be installed from the Eclipse Marketplace. We are then ready to develop in Eclipse and deploy directly to BlueMix from there. I have not tried it yet.
Web IDE: Developers can work with the Web IDE directly in BlueMix. This allows modification of the application without any development environment installed on the developers’ laptops.

[2] Services marketplace where we can shop for service(s) required by our app or we can put our service(s) for consumption
Pre-built services make application assembly very easy. These services leverage APIs and software development kits (SDKs) that can quickly and easily be incorporated into BlueMix applications. IBM itself provides several application runtimes and services that we can use to get started with building our app. Examples of these are DataCache (which is WebSphere eXtreme Scale) and Elastic MQ (WebSphere MQ). Moreover, BlueMix offers an open and flexible ecosystem which allows other companies to provide services that can be integrated into applications. Companies can be both providers and users of services. “User Provided Services” can be added so that organizations can share services within their organization. This promotes more reuse and standardization of services within the company. “Managed Services” can be exposed to others.

[3] Source control integration makes BlueMix a great place to start development of a new project
BlueMix also comes with integration to several source control management (SCM) systems. These include Git, GitHub and Jazz SCM. These environments can be configured to deliver application changes continuously.

[4] Easy to Manage
 Users can start or stop applications and define how much memory is associated with each application very easily. BlueMix will automatically redeploy workloads to other virtual machines (VMs) if there is an outage. Moreover BlueMix can automatically scale a deployed application up or down based on application usage.

IBM timed the launch of its PaaS platform well. Now Oracle and HP are also following suit, as noted in the following articles.

Since a picture speaks a thousand words, here is a view from my Catalog Tab.
BlueMix Catalog

Note: IBM Bluemix Is Now IBM Cloud

IBM – 21 years of patent leadership

IBMers were granted a record 6,809 U.S. patents in 2013, the 21st consecutive year IBM has led in U.S. patent issuances, and the third year in a row with more than 6,000. This year’s total is more than the combined totals of Amazon, Google, EMC, HP, Intel, Oracle/Sun and Symantec. Many of these patents are in strategic areas, such as IBM’s Watson, cloud computing, Big Data analytics and the new cognitive computing era.

Top Ten 2013 U.S. Patent Leaders

Further Reading:
IBM sets a new Patenting record in 2012