DataStage Best Practices – 2

Some considerations and recommendations when it comes to deploying Information Server.
1. For a deployment on an Intel-based server, plan for at least 4GB of RAM per core, and 8GB per core if you can afford it.
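
As a quick illustration of the per-core rule above, here is a minimal Python sketch that turns a core count into a RAM range. The function name `ram_range_gb` and the 16-core example are assumptions made for illustration only; the 4GB and 8GB figures are the ones given in the text.

```python
# Rule-of-thumb RAM sizing based on the per-core figures in the text.
MIN_GB_PER_CORE = 4        # stated minimum
PREFERRED_GB_PER_CORE = 8  # stated target when the budget allows


def ram_range_gb(cores: int) -> tuple[int, int]:
    """Return (minimum, preferred) total RAM in GB for a server with `cores` cores."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return cores * MIN_GB_PER_CORE, cores * PREFERRED_GB_PER_CORE


if __name__ == "__main__":
    low, high = ram_range_gb(16)  # 16 cores is an example value only
    print(f"16 cores: at least {low}GB, ideally {high}GB")
```

This only covers the per-core guideline; it does not account for the operating system or other software sharing the server.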

2. The key file systems that need good I/O are –

  • Source Files Folder
  • Target Files Folder
  • Scratch Disk (for deployments with 8GB per core, consider creating a RAM disk to use as the Scratch Disk)
  • Resource Disk

3. GRID-related considerations –
Consider a Network File System (NFS) for the program files that the conductor shares with all compute nodes. NFS or a clustered file system can be used for the Source/Target Files and the Resource Disk.

4. In 11.3, a few more repositories were added to store information, with the intention of managing the systems better. This implies higher capacity requirements for the repository, which can mean more licenses if you are not using DB2. The alternatives are to do more housekeeping or not to use those features.

5. Job Design and Operation Consideration –

  • Try to keep the Project within 500 objects.
  • Always clean the job log in Director (Auto Purge can be set in the Administrator Client). There are also new ways to store the logs in the Repository.
  • When possible, complete the process in a single job, since all the processing is then done in memory.
  • If multiple jobs are required to complete the process, the intermediate “files” should always be DataSets.
  • For a big DataSet, it is recommended to use Compress/Expand to process it. In some cases you will see a reduction of over 50% in processing time. The extra processing time to Compress/Expand the DataSet is compensated by the reduced storage I/O, since storage is typically the slowest component in a server. The compressed DataSet can be as small as 10% of the original size, a difference of 5GB versus 50GB for a large file.

6. It is better to use an enterprise scheduler rather than a Job Sequence when one is available. Leverage a good enterprise scheduler to handle the following –

  • Dependency Management (Cross Platform and Application)
  • Priority Management
  • Resource Management
  • Concurrency Management
  • DataStage Upgrade – it can be used to handle dependencies across multiple versions of DataStage during the transition
  • Active-Active Engine requirements
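
To show what dependency management means in practice, here is a small Python sketch that derives a valid run order from a set of job dependencies. It is not how any particular enterprise scheduler works; it only illustrates the idea using the standard-library `graphlib` module (Python 3.9+). The job names and the `run_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each key maps to the set of jobs it must wait for.
DEPENDENCIES = {
    "load_warehouse": {"transform_orders", "transform_customers"},
    "transform_orders": {"extract_orders"},
    "transform_customers": {"extract_customers"},
}


def run_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one order in which every job runs after the jobs it depends on.

    graphlib raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for job in run_order(DEPENDENCIES):
        print(job)
```

A real scheduler adds priorities, resource limits and concurrency control on top of this ordering, which is why the text recommends using one instead of building the logic into a Job Sequence.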

Here is another site providing valuable hints for DataStage.
