Exposing Existing Datastage Batch Jobs as Services

In a previous blog I wrote about ETL jobs, and in my last couple of posts I covered the role of an ETL tool. Now suppose you have created an ETL batch job and you want to let the user call it dynamically, passing the names of the input and output files at run time.

Here is the use case in detail.
1. The user has a set of input files on the server.
2. The user deploys the job as a service that takes the names of the input file and the output file as its inputs (just the names, since the files sit at predefined locations).
3. Based on the input file name, the corresponding file is selected and processed, and the processed output is written to the output file.

So, is it possible? I was not sure until this morning, and I assume some readers would not be either. But it is possible.

Deploying the Batch Job
You can deploy a batch job so that it starts on demand. Each service request starts one instance of the job, which runs to completion. This kind of job typically initiates a batch process from a real-time process that does not need direct feedback on the results. It is tailored for processing bulk data sets and can accept job parameters as input arguments.

These jobs are called Topology 1 jobs and have the following characteristics:

Start and stop times
The elapsed time for starting and stopping a batch job, also known as latency, is high. This factor contributes to a low throughput rate in communication with the service client.
Job instances
The Information Service Framework (ISF) agent starts job instances on demand to process service requests, up to a maximum that you configure. For load balancing, you can run the jobs on multiple InfoSphere DataStage servers.
Input and output
An information service that is based on a batch job can use job parameters as input arguments. This type of service returns no output. When you design the information service, you can set values for the job parameters. If the job ends abnormally, the service client receives an exception.

For example, the following job takes a set of addresses and validates/standardizes them:

[Figure: CASS address validation job design]

You can parameterize the names of the input and output files in the job, which tells the engine that these values will be supplied at run time. Then we deploy the job using Information Services Director (ISD). The job parameters appear as the input parameters of the deployed service, as if by magic 🙂. If you selected the REST binding, you can invoke the job by appending the parameters to the end of the URL.

For example, if the following is your REST URL: https://Server:port/wisd-rest2/USAC/Addr_Validation/newOperation
then the service can simply be invoked using https://Server:port/wisd-rest2/USAC/Addr_Validation/newOperation?input=CASSIN&output=CASSOUT
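Here is a minimal sketch of how a client might call that deployed service from Python. The URL, parameter names (input/output), and timeout follow the example above and are assumptions about your environment; ISD services are usually secured, so add whatever authentication your setup requires. Because a Topology 1 service returns no output, the client only checks the HTTP status:

    # Minimal sketch: invoke the deployed ISD REST service with job parameters.
    # The base URL and parameter values are placeholders for your own environment.
    import urllib.error
    import urllib.parse
    import urllib.request

    base_url = "https://Server:port/wisd-rest2/USAC/Addr_Validation/newOperation"
    params = {"input": "CASSIN", "output": "CASSOUT"}  # job parameters exposed by ISD

    url = base_url + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, method="GET")

    try:
        with urllib.request.urlopen(request, timeout=600) as response:
            # The batch job returns no payload; a success status means it ran to completion.
            print("Job finished, HTTP status:", response.status)
    except urllib.error.HTTPError as err:
        # If the job ends abnormally, the service client receives an exception.
        print("Job failed, HTTP status:", err.code)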

And now you have successfully turned your job into a REST service (or SOAP, or EJB). Feel free to comment if you would like some of the missing details.
