InfoSphere DataStage – XV (Balanced Optimization)

An IBM InfoSphere DataStage job consists of individual stages that are linked together to describe the flow of data from a data source to a data target. Balanced Optimization lets you maximize job performance and optimize resource usage by balancing the workload across source and target systems. This allows Information Server to support not only the Extract-Transform-Load (ETL) paradigm but also alternatives such as Extract-Load-Transform (ELT), where transformation tasks are performed on the target system, such as an IBM PureData System for Analytics data warehousing appliance.

Balanced Optimization helps to improve the performance of InfoSphere DataStage job designs that use connectors to read or write source data. You design your job, and then Balanced Optimization automatically redesigns it according to your stated preferences.

For example, you can maximize performance by minimizing the amount of input and output (I/O) that is used and by balancing the processing across source, intermediate, and target environments. You can then examine the new optimized job design and save it as a new job. Your original job design remains unchanged.

You can use the Balanced Optimization features of InfoSphere DataStage to push sets of data integration processing and related data I/O into database management systems (such as an IBM PureData System for Analytics warehousing appliance) or into a Hadoop cluster.
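Balanced Optimization rewrites job designs for you, so nothing below uses a DataStage API. As a rough, hypothetical illustration of the underlying pushdown idea, the following Python sketch uses an in-memory SQLite table as a stand-in for a source database (the `sales` table and all function names are invented for this example). It compares extracting every row and transforming it outside the database with pushing the filter and aggregation into SQL so that only the results are moved.

```python
import sqlite3

# Hypothetical example: an in-memory SQLite table stands in for a source
# database. Nothing here uses a DataStage API.


def build_source() -> sqlite3.Connection:
    """Create a small, hypothetical 'sales' table to read from."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10), ("west", 20), ("east", 5), ("north", 7), ("west", 3)],
    )
    return conn


def etl_style(conn: sqlite3.Connection) -> tuple[dict, int]:
    """Extract every row, then filter and aggregate outside the database."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals: dict[str, int] = {}
    for region, amount in rows:
        if region != "north":
            totals[region] = totals.get(region, 0) + amount
    return totals, len(rows)  # (result, rows moved out of the database)


def pushdown_style(conn: sqlite3.Connection) -> tuple[dict, int]:
    """Push the filter and aggregation into SQL so only results leave the database."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales "
        "WHERE region <> 'north' GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows), len(rows)  # (result, rows moved out of the database)


if __name__ == "__main__":
    conn = build_source()
    print(etl_style(conn))       # ({'east': 15, 'west': 23}, 5)
    print(pushdown_style(conn))  # ({'east': 15, 'west': 23}, 2)
```

Both versions produce the same totals, but the pushdown version moves two rows out of the database instead of five. With realistic data volumes, that reduction in data movement is the kind of I/O saving pushdown aims for, although the actual gain depends on the data and the job.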

4 thoughts on “InfoSphere DataStage – XV (Balanced Optimization)”

  1. Hi Namit,

    Thanks for sharing, and good explanation.
    I understand that Balanced Optimization helps improve the performance of DataStage jobs by pushing processing to the source or target.

    In this case, do you have any statistics or case studies showing the percentage performance improvement with Balanced Optimization compared to without it? Would you mind sharing them?

    Let's assume there is only one source/target to push to IBM PureData System, and all DataStage jobs are created using Balanced Optimization. If I/O and CPU usage are reduced (e.g., from 90% to 30%), I may be able to reduce the DataStage server hardware spec, for example from 6 CPU cores to 4, and save some cost.

    Can you share your thoughts or point of view on this?

    • Hi Jonny, thank you for your comment.
      It is difficult to comment on the percentage improvement with and without bal-op because it depends on the data and the job. In extreme cases (say, with a very small amount of data), performance may even go down after using bal-op. But we have seen a case where we got an 80% improvement in execution time for a job after moving parts of the job to the source and/or target.
