I apologize if someone landed on this blog looking for job openings in ETL. The title was not meant as a publicity stunt. I blog because I like to write. And we do not have memories like our older generations did, so if we do not write, we tend to forget soon. Hmmm, interesting. According to some modern theories, we should be getting better than previous generations, so how come we have poorer memory, poorer strength and higher susceptibility to disease compared to them? Anyhow, let's get back to the DataStage jobs.
InfoSphere DataStage is the tool that I have been working with since 2006. In this blog I wish to share how we can easily integrate various data sources: extract the data from them, transform it on the way, and finally load it into our warehouse.
InfoSphere DataStage provides a designer tool that allows developers to visually create integration jobs. In InfoSphere DataStage, the term "job" describes an extract, transform and load (ETL) task. Jobs are composed from a rich palette of operators called stages. These stages include:
• Source and target access for databases, applications and files
• General processing stages such as filter, sort, join, union, lookup and aggregations
• Built-in and custom transformations
• Copy, move, FTP and other data movement stages
• Real-time, XML, SOA and message queue processing
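To make the idea of "a job composed from stages" a bit more concrete, here is a rough sketch in plain Python. This is not DataStage code and not how DataStage works internally; in DataStage you would draw this on the canvas. The function names (`extract`, `filter_stage`, `aggregate_stage`, `load`, `run_job`) and the sample rows are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a source stage (a file or a table).
SOURCE_ROWS = [
    {"region": "east", "amount": 120},
    {"region": "west", "amount": 80},
    {"region": "east", "amount": -5},   # bad row, dropped by the filter stage
    {"region": "west", "amount": 40},
]


def extract(rows):
    """Source stage: hand rows downstream one at a time."""
    for row in rows:
        yield dict(row)


def filter_stage(rows, predicate):
    """Processing stage: keep only rows that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def aggregate_stage(rows, key, value):
    """Processing stage: sum `value` per `key`, emitted in sorted key order."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    for k in sorted(totals):
        yield {key: k, "total": totals[k]}


def load(rows, target):
    """Target stage: append rows to a list standing in for the warehouse."""
    for row in rows:
        target.append(row)
    return target


def run_job(source_rows):
    """A 'job' is just the stages linked one after another."""
    warehouse = []
    rows = extract(source_rows)
    rows = filter_stage(rows, lambda r: r["amount"] > 0)
    rows = aggregate_stage(rows, "region", "amount")
    return load(rows, warehouse)


if __name__ == "__main__":
    print(run_job(SOURCE_ROWS))
```

Each function plays the role of one stage, and the links between stages are simply the rows passed from one function to the next.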
InfoSphere DataStage also allows pre- and post-conditions to be applied to all these stages. Multiple jobs can be controlled and linked by a sequencer, which provides the control logic that decides which data integration jobs run and in what order. On top of that, InfoSphere DataStage offers a rich administration capability for deploying, scheduling and monitoring jobs.
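The sequencer idea can be sketched the same way. Again, this is only an illustration in plain Python under my own assumptions, not the DataStage sequencer itself: `run_sequence`, the three job functions and the "required" flag are hypothetical names. The sketch runs jobs in order and stops as soon as a required job fails.

```python
def run_sequence(steps):
    """Run (name, job, required) steps in order.

    Records the outcome of each step and stops at the first
    required step that raises an exception.
    """
    results = {}
    for name, job, required in steps:
        try:
            job()
            results[name] = "ok"
        except Exception as exc:
            results[name] = f"failed: {exc}"
            if required:
                break  # later steps depend on this one, so do not run them
    return results


# Hypothetical jobs used only to exercise the control logic.
def load_customers():
    pass  # pretend this job finished cleanly


def load_orders():
    raise RuntimeError("source file missing")


def build_report():
    pass  # never reached in this example


if __name__ == "__main__":
    outcome = run_sequence([
        ("load_customers", load_customers, True),
        ("load_orders", load_orders, True),
        ("build_report", build_report, False),
    ])
    print(outcome)
```

Here `build_report` never runs because `load_orders` is marked as required and fails, which is the kind of decision a sequencer makes for you.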
A picture is worth a thousand words, so here is a sample DataStage job on the parallel canvas (more about the parallel canvas later).
This is just a brief overview. Hopefully someday I will describe these individual stages in detail. Now one may ask: will I get a good job if I can write these DataStage jobs? I guess the name suggests it 🙂