There are quite a few strengths to using DataStage. In this post, I will describe my top three (a personal choice), the ones I find really cool. First, we need not bother about the underlying structure of the system while designing a job (remember my last post?). Second, thanks to the functions available, many transformations can happen without a staging database. Third, there is the way it scales. Here is a little more detail on each.
- One of the great strengths of InfoSphere DataStage is that designing a job requires very little consideration of the underlying structure of the system, and the design does not typically need to change when that structure does. If the system is changed, upgraded, or improved, or if a job is developed on one platform and implemented on another, the job design does not necessarily have to change. InfoSphere DataStage learns about the shape and size of the system from its configuration file, and it organizes the resources needed for a job according to what is defined there. When a system changes, the file is changed, not the jobs. A configuration file defines one or more processing nodes on which the job will run. The processing nodes are logical rather than physical, so the number of processing nodes does not necessarily correspond to the number of cores in the system.
- Another great strength of InfoSphere DataStage is that it does not rely on the functions and processes of a database to perform transformations. While InfoSphere DataStage can generate complex SQL and leverage databases, it is designed from the ground up as a multipath data integration engine, equally at home with files, streams, databases, and internal caching in single-machine, cluster, and grid implementations. As a result, users in many circumstances find they do not need to invest in separate staging databases to support InfoSphere DataStage.
- The third strength is scalability. Linear scalability and very high data processing rates were obtained for a typical information integration benchmark using data volumes that scaled to one terabyte. Scalability and performance such as this are increasingly sought by customers investing in a solution that can grow with their data needs and deliver trusted information throughout their organization.
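
To make the first point a bit more concrete, here is a toy Python sketch of the idea that the job stays the same while the configuration decides how many logical nodes it runs on. This is not DataStage code and does not use the real configuration file format. The names `ONE_NODE`, `FOUR_NODES`, and `run_job` are hypothetical, and the "configuration" is just a dictionary listing logical node names.

```python
from collections import defaultdict

# Hypothetical stand-ins for a configuration file: a list of logical node names.
# These are NOT the real DataStage configuration format.
ONE_NODE = {"nodes": ["node1"]}
FOUR_NODES = {"nodes": ["node1", "node2", "node3", "node4"]}


def run_job(rows, config):
    """Toy 'job': total the amount per customer.

    The job logic never hard-codes how many nodes exist;
    it only reads that from the config it is given.
    """
    nodes = config["nodes"]

    # Partition rows across the logical nodes by key, so that all rows
    # for one customer land on the same node.
    partitions = defaultdict(list)
    for customer, amount in rows:
        # Simple deterministic hash: sum of the character codes.
        index = sum(ord(ch) for ch in customer) % len(nodes)
        partitions[nodes[index]].append((customer, amount))

    # Each logical node aggregates its own partition. Here this happens
    # in a plain loop; a real engine would run the nodes in parallel.
    totals = {}
    for node in nodes:
        for customer, amount in partitions[node]:
            totals[customer] = totals.get(customer, 0) + amount

    # Also report how many rows each logical node handled.
    rows_per_node = {node: len(partitions[node]) for node in nodes}
    return totals, rows_per_node


if __name__ == "__main__":
    rows = [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 3)]
    for config in (ONE_NODE, FOUR_NODES):
        totals, rows_per_node = run_job(rows, config)
        print(len(config["nodes"]), "node(s):", totals, rows_per_node)
```

Running the same `run_job` with either configuration gives the same totals; only the spread of rows across logical nodes differs. That is the spirit of "the file is changed, not the jobs".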