In my last blog, I took a digression from speaking about InfoSphere Information Server and discussed what is ETL. Since the digression has already taken place, I think it is a good time to discuss some “somewhat similar” concepts called ELT & EII.
ELT (short for extract, load, and transform) is a slightly different approach to data integration than ETL. In the case of ELT, the data is first extracted from the source(s), loaded into the target database, and then transformed and integrated into the desired format. All the heavy data processing takes place inside the target database. The advantage of this approach is that in general, a database system is better suited for handling large workloads where hundreds of millions of records need to be integrated. Database systems are also usually optimized for I/O (throughput), which helps to process data faster, too.
There are pros and cons of what one may want to use (ETL/ELT). In order to benefit from an ELT approach, the ELT tool needs to know how to use the target database platform and the specific SQL dialect being used. This is the reason there aren’t a lot of ELT solutions on the market. Some detailed description on ETL Vs ELT could be found here.
Pushing heavy transformations down to the database creates significant costs and complexity. Simply joining heterogeneous data sources using SQL requires building costly database staging areas. SQL jobs to move data to target has one or more of these challenges:
- Relatively high elapsed processing times
- Very complex scripts, including multiple merge, joins, cursors and unions
- High impact on resource utilization, including CPU, memory and storage
- Unstable or error-prone code
Both ETL and ELT move or copy data physically to another data store, (Say from the OLTP to the data warehouse system). In some cases, however, there is no need to move or copy data. In fact, most users don’t even care whether there is an ETL process and a data warehouse complemented with data marts: They just want access to their data – quickly and easily.
So instead of physically integrating data, it is virtually integrated, making the data accessible in real time when it is needed. This is called enterprise information integration, or EII; other terms such as data federation and data virtualization are used as well and have the same meaning. The main advantage of this approach is, of course, the fact that data is always up-to-date. Another advantage is that there is no extra storage layer and no extra data duplication. Some data warehouse environments copy the same data three or four times: once in a staging area, the data warehouse itself, and finally the data marts. By using virtual data integration techniques, the data is accessible for an end user as if it were a data mart, but in reality the EII tool takes care of all the translations and transformations in the background.
Sounds like a win-win case? But it will be challenging to manage large volumes of cleansed, current data using a virtual approach. More over ETL tools like Information Server can be used for EII. In one of my previous blog on Dynamic Warehousing I gave some description on it.