InfoSphere DataStage – I

The information available to business leaders continues to explode. Businesses need a comprehensive data integration platform that helps them derive more value from the complex, heterogeneous information spread across their systems. Such a platform should include a suite of components that provides end-to-end information integration, offering discovery, understanding, transformation and delivery capabilities.

IBM has laid out seven requirements that companies must consider when developing an information integration architecture that can grow with their needs:
1. A dataflow architecture that supports data pipelining
2. Dynamic data partitioning and in-flight repartitioning of data
3. Adaptability to scalable hardware without requiring modifications to the data flow design
4. Support for, and exploitation of, the parallelism available in leading databases
5. High performance and scalability for bulk and real-time data processing
6. Extensive tooling to support resource estimation, performance analysis and optimization
7. An extensible framework for incorporating in-house and third-party software
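Requirements 1 and 2 are easier to picture with a toy example. The following Python sketch is a generic illustration of the concepts, not DataStage code or any DataStage API: generator-based stages pass records downstream one at a time (pipelining), a hash partitioner routes records with the same key to the same partition, and a repartition step redistributes the data on a different key before a per-partition aggregation. The field names, sample data, and function names (`hash_partition`, `repartition`, `run_job`) are hypothetical. The degree of parallelism is only a runtime argument, so the same job definition runs unchanged with one partition or many.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, Iterable, Iterator, List
import zlib

Record = Dict[str, Any]  # hypothetical record shape: a plain dict


def extract(rows: Iterable[Record]) -> Iterator[Record]:
    # Source stage: emits records one at a time, so downstream stages
    # can start working before the whole input has been read.
    for row in rows:
        yield row


def transform(records: Iterable[Record]) -> Iterator[Record]:
    # Transform stage: consumes and emits records incrementally (pipelining).
    for r in records:
        yield {**r, "amount_usd": round(r["amount"] * r["rate"], 2)}


def hash_partition(records: Iterable[Record], key: str, n: int) -> List[List[Record]]:
    # Route each record by a stable hash of its key, so records with equal
    # keys always land in the same partition.
    parts: List[List[Record]] = [[] for _ in range(n)]
    for r in records:
        parts[zlib.crc32(str(r[key]).encode()) % n].append(r)
    return parts


def repartition(parts: List[List[Record]], key: str, n: int) -> List[List[Record]]:
    # Redistribute already-partitioned data on a different key, as needed
    # when the next stage groups by something else.
    return hash_partition((r for p in parts for r in p), key, n)


def sum_by(part: List[Record], key: str, value: str) -> Dict[Any, float]:
    # Per-partition aggregation; runs independently on each partition.
    totals: Dict[Any, float] = defaultdict(float)
    for r in part:
        totals[r[key]] += r[value]
    return dict(totals)


def run_job(rows: Iterable[Record], n_partitions: int) -> Dict[Any, float]:
    # The job is defined once; the degree of parallelism is a runtime argument.
    stream = transform(extract(rows))
    # Stands in for data partitioned by customer for some upstream per-customer stage.
    by_customer = hash_partition(stream, "customer", n_partitions)
    # The aggregation groups by region, so the data is repartitioned on that key.
    by_region = repartition(by_customer, "region", n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(lambda p: sum_by(p, "region", "amount_usd"), by_region))
    result: Dict[Any, float] = {}
    for partial in partials:
        result.update(partial)  # keys are disjoint: each region lives in one partition
    return {k: round(v, 2) for k, v in result.items()}


if __name__ == "__main__":
    sample = [
        {"customer": "c1", "region": "EU", "amount": 10.0, "rate": 1.1},
        {"customer": "c2", "region": "US", "amount": 5.0, "rate": 1.0},
        {"customer": "c1", "region": "EU", "amount": 20.0, "rate": 1.1},
        {"customer": "c3", "region": "US", "amount": 7.5, "rate": 1.0},
    ]
    for n in (1, 4):
        print(n, sorted(run_job(sample, n).items()))
```

Because every record for a given region lands in the same partition, each partition's totals are complete, and merging the partial results never has to combine two totals for the same key. Unlike a real parallel engine, this sketch collects each partition into a list before the next step, and the thread pool only stands in for parallel workers (in CPython, threads do not speed up CPU-bound work).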

The tool I work on, InfoSphere DataStage, is designed to support these seven requirements and to provide the infrastructure customers need to achieve high performance and scalability in their information integration environments.

InfoSphere Information Server integration functions

By using IBM InfoSphere Information Server for Data Integration, you can transform data in any style and deliver it to any system, helping you achieve faster time to value. Built-in transformation functions and a common metadata framework help you save time and expense. InfoSphere Information Server also supports several styles of data delivery: bulk (extract, transform, load), virtual (federated) and incremental (data replication). Its key capabilities include the following:

  • Build once and run anywhere without modification.
  • Reduce complexity with one unified parallelization mechanism.
  • Achieve a clean separation between developing a job and expressing its parallelism at runtime.
  • Eliminate the need to retune performance every time you change hardware architecture.
  • Add hardware with no upper limit on data scalability.
  • Use project blueprinting capabilities to map out your data integration project, improving visibility and reducing risk.
  • Use discovery capabilities to understand the relationships within and across data sources before moving forward.
  • Transform information from across your enterprise through an easy-to-use graphical interface.
  • Integrate data on demand across multiple sources and targets, satisfying the most complex requirements with the most scalable runtime available.
  • Use hundreds of built-in transformation functions, and promote collaboration through a common metadata framework.
  • Select from multiple data delivery options, whether bulk data delivery via extract, transform, and load (ETL) or incremental data delivery (Change Data Delivery).
  • Benefit from balanced optimization capabilities, and choose the deployment option that works best for you, such as ETL or extract, load, and transform (ELT).
  • Connect to database management systems (DBMS), big data sources, message queues, Enterprise Resource Planning (ERP) and other packaged applications, industry formats, and mainframe systems, all with unique native API connectivity and parallelism.

In upcoming posts, I will share more information about this tool.

Here are some good sites for more information on DataStage:
