In my previous posts I discussed the data lake. Imagine you have pooled all of your enterprise's data into a data lake. There will still be challenges. All this raw data will be overwhelming and unsafe to use, because no one is sure where the data came from, how reliable it is, or how it should be protected. Without proper management and governance, such a data lake can quickly become a data swamp. This data swamp can frustrate business users, application developers, IT and even customers.
- Business users: In a poorly managed data lake they face slow, difficult access to data, and the data they do get is often not relevant to their needs. They wait in long queues for IT to grant access. Once the data becomes available, questions remain about its quality, timeliness and reliability.
- Application developers: They will end up building new applications that don’t properly integrate and govern the data on which the applications rely. Why? Because no tools or services are available to give them easy, fast access to the right data.
- IT: Frustration will abound in IT, as the team will be inundated with demands for rapid access to good data from business users and application developers.
- Security Risks: Having all the data in one place creates a much higher risk of loss or misuse of information if the security of the system is compromised.
- Customer Satisfaction: Take a bank as an example. Its customers could be unhappy if they felt the bank was monitoring how they spent their money.
- Keeping data current: Customers expect real-time insights into spending patterns, including the transaction that happened a few seconds before. So data must be continuously consolidated and reconciled for real-time processing in the data lake. At the same time the data lake should be able to support batch processing.
- Data representation: Data for business users must be formatted to support simple visualization tools, labeled with relevant business terminology, and kept in step with the data in the systems of record. At the same time, data scientists need broad access to all types of raw data to develop advanced analytics algorithms. So the data lake should support different formats of data for different types of users, all based on the same data values.
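To make the last point a little more concrete, here is a minimal sketch of two views built on the same data values. Everything in it is an assumption made for illustration: the field names, the sample transactions, the category labels and the functions `raw_view` and `business_view` are hypothetical and do not come from any particular data lake product.

```python
from datetime import datetime, timezone

# Hypothetical raw records as they might land in the lake.
# Field names and values are illustrative assumptions.
RAW_TRANSACTIONS = [
    {"txn_id": "t1", "cust_id": "c42", "amt_cents": 1250, "mcc": "5411", "ts": 1700000000},
    {"txn_id": "t2", "cust_id": "c42", "amt_cents": 8999, "mcc": "5812", "ts": 1700003600},
]

# Hypothetical mapping from merchant category codes to business terms.
CATEGORY_LABELS = {"5411": "Groceries", "5812": "Restaurants"}


def raw_view(records):
    """View for data scientists: unmodified copies of the raw records."""
    return [dict(r) for r in records]


def business_view(records):
    """View for business users: same values, labeled in business terms."""
    rows = []
    for r in records:
        when = datetime.fromtimestamp(r["ts"], tz=timezone.utc)
        rows.append({
            "Transaction ID": r["txn_id"],
            "Customer": r["cust_id"],
            "Amount": f"{r['amt_cents'] / 100:.2f}",
            "Category": CATEGORY_LABELS.get(r["mcc"], "Other"),
            "Date": when.strftime("%Y-%m-%d"),
        })
    return rows


if __name__ == "__main__":
    for row in business_view(RAW_TRANSACTIONS):
        print(row)
```

Both views read from the same underlying records, so the friendly labels never drift away from the raw values. A real data lake would do this with its own catalog and transformation tools rather than with plain Python dictionaries.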
So there is a need for a facility that transforms raw data into information that is clean, timely, useful and relevant. Hence an enhanced data lake solution was built with management, affordability and governance at its core. This solution is known as a data reservoir. In one of the subsequent posts we will take a dip into the data reservoir! Stay tuned.