Data Lake Vs Data Warehouse

In my last blog, I wrote about the Data Lake. The first comment on that post asked about the difference between a Data Lake and a Data Warehouse. So in this blog, I will share my understanding of how they differ:

Schema: In a Data Warehouse (DW), the schema is defined before the data is stored. This is called “Schema on WRITE”: the required data is identified and modeled in advance. In a Data Lake, the schema is defined after the data is stored. This is called “Schema on READ”. As a result, the schema must be captured in code by each program that accesses the data.
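To make the distinction concrete, here is a minimal Python sketch. The `Order` record, the in-memory `warehouse_table` and `lake_files` lists, and the function names are hypothetical stand-ins for real storage; they only illustrate where the schema gets applied in each approach.

```python
import json
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical schema used by both approaches."""
    order_id: int
    amount: float


# Schema on WRITE: data is shaped to the schema before it is stored.
warehouse_table = []


def load_into_warehouse(raw):
    """Parse a JSON record against the predefined schema before storing it.

    A record missing a required field raises KeyError and is never stored.
    """
    record = json.loads(raw)
    warehouse_table.append(Order(int(record["order_id"]), float(record["amount"])))


# Schema on READ: raw text is stored as-is; each reader applies its own schema.
lake_files = []


def store_in_lake(raw):
    """Store the record untouched: no parsing, no validation."""
    lake_files.append(raw)


def read_orders_from_lake():
    """This reader's schema lives in its own code; records that don't fit are skipped."""
    orders = []
    for raw in lake_files:
        record = json.loads(raw)
        if "order_id" in record and "amount" in record:
            orders.append(Order(int(record["order_id"]), float(record["amount"])))
    return orders


if __name__ == "__main__":
    events = [
        '{"order_id": 1, "amount": "9.99"}',
        '{"user": "alice", "clicked": "banner"}',  # does not fit the Order schema
    ]
    for event in events:
        store_in_lake(event)  # the lake accepts both records
    load_into_warehouse(events[0])  # fits the schema, so it is stored
    try:
        load_into_warehouse(events[1])
    except KeyError:
        print("second event rejected at write time")
    print(read_orders_from_lake())
```

Both paths receive the same click event; only the warehouse path refuses it, while the lake defers that decision to whichever program reads the data later.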

Cost (Storage and Processing): A Data Lake provides cheaper storage for large volumes of data and has the potential to reduce processing costs by bringing analytics closer to the data.
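The idea of bringing analytics closer to the data can be simulated in plain Python. This is a sketch under stated assumptions: the `partitions` dictionary stands in for data spread across hypothetical storage nodes, and in a real system the per-partition step would run on those nodes (for example inside a distributed processing engine) rather than in a single process.

```python
# Hypothetical partitions of a sales dataset, each on a different storage node.
partitions = {
    "node-a": [120.0, 80.5, 42.0],
    "node-b": [300.0, 15.25],
    "node-c": [99.99],
}


def local_summary(values):
    """Runs next to the data: reduces one partition to a (sum, count) pair."""
    return sum(values), len(values)


def average_with_pushdown(parts):
    """Only the small per-node summaries travel to the coordinator."""
    summaries = [local_summary(values) for values in parts.values()]
    total = sum(s for s, _ in summaries)
    count = sum(c for _, c in summaries)
    return total / count


def average_by_shipping_everything(parts):
    """Baseline: copy every row to one place, then compute."""
    all_rows = [x for values in parts.values() for x in values]
    return sum(all_rows) / len(all_rows)
```

Both functions compute the same average, but the pushdown version moves one (sum, count) pair per node instead of every row. At data-lake scale, that reduction in data movement is one source of the potential processing savings.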

Data Access: A Data Lake gives business users immediate access to all the data. They don’t have to wait for the data warehousing (DW) team to model the data or grant them access. Instead, they shape the data however they need to meet local requirements. This speeds up delivery, which a dynamic market economy demands.

Flexibility: Data Lakes offer a high degree of flexibility, since nothing stands between business users and the data.

Data Quality: Data in a traditional Data Warehouse has been cleansed before it is loaded, whereas data in a Data Lake is typically stored raw.
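A simple way to see the difference is to apply a few typical cleansing rules to raw records. The sample records and the rules below (trim and lowercase emails, drop rows without an email, keep the first row per id) are illustrative assumptions, not a standard cleansing pipeline. A Data Lake would keep `RAW_RECORDS` as they arrived, while a warehouse load would store only the output of the hypothetical `cleanse` function.

```python
# Hypothetical raw customer records, stored as they arrived.
RAW_RECORDS = [
    {"id": "1", "email": " Alice@Example.com "},
    {"id": "1", "email": "alice@example.com"},  # duplicate of id 1
    {"id": "2", "email": ""},                   # missing email
    {"id": "3", "email": "bob@example.com"},
]


def cleanse(records):
    """Apply illustrative cleansing rules and return new, typed records."""
    seen_ids = set()
    clean = []
    for record in records:
        email = record.get("email", "").strip().lower()  # normalize formatting
        if not email:
            continue  # drop incomplete rows
        if record["id"] in seen_ids:
            continue  # keep only the first row per id
        seen_ids.add(record["id"])
        clean.append({"id": int(record["id"]), "email": email})
    return clean


if __name__ == "__main__":
    print(cleanse(RAW_RECORDS))
```

The raw list is left untouched, so other consumers of the lake can still apply different rules to the same records.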

Relevance in the Big Data world: The traditional approach of manually curated Data Warehouses provides only a limited view of the data, and these warehouses are designed to answer only the specific questions identified at design time. This may not be adequate for data discovery in today’s big data world. Moreover, traditional Data Warehouses are limited to structured data, whereas a Data Lake can hold any type of data: clickstream, machine-generated, social media and external data, and even audio, video, and text. For example, a Data Lake is an ideal way to manage the millions of patient records of a hospital. These records range from physicians’ notes to lab results. With a Data Lake, the hospital stores all of that disparate data in its original format, calls up specific types of records when needed, and converts the data into uniform structures only when the situation calls for it.
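The hospital example can be sketched as follows. The record layout, the comma-separated lab format, and the `LabResult` type are hypothetical; the point is that every record stays in its original format, and only the records a query needs are converted into a uniform structure at read time.

```python
from dataclasses import dataclass

# Hypothetical lake contents: each record keeps its original format.
hospital_lake = [
    {"type": "physician_note", "patient": "P1", "body": "Patient reports mild headache."},
    {"type": "lab_result", "patient": "P1", "body": "glucose,95,mg/dL"},
    {"type": "lab_result", "patient": "P2", "body": "glucose,130,mg/dL"},
]


@dataclass
class LabResult:
    """Uniform structure produced only when lab results are requested."""
    patient: str
    test: str
    value: float
    unit: str


def lab_results_for(patient):
    """Select one patient's lab records and parse them on demand; notes stay untouched."""
    results = []
    for record in hospital_lake:
        if record["type"] == "lab_result" and record["patient"] == patient:
            test, value, unit = record["body"].split(",")
            results.append(LabResult(patient, test, float(value), unit))
    return results


if __name__ == "__main__":
    print(lab_results_for("P1"))
```

A query about physicians’ notes would use its own parsing logic over the same lake, without any up-front modeling of the notes.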

Data Lakes do provide advantages to enterprises that need quick access to data. But Data Lakes bring their own set of challenges, which I will explore in subsequent blogs.
