Data Transformation Manager
In earlier articles I explained the PowerCenter 9 architecture, the Integration Service architecture, and the Integration Service process. In this article I will focus on the Data Transformation Manager (DTM) and its functions.
Just to recap: when a workflow is kicked off, the Integration Service picks up the request and creates an Integration Service process. The job of the Integration Service process is to interact with the Repository Service to retrieve metadata about the workflow and its various sources, targets, lookups, etc., and to establish connection pools. Once this is done, it starts the Data Transformation Manager process and passes the session and parameter file information to the DTM process.
Read the article Informatica PowerCenter Integration Service Architecture for details on the functioning of the Integration Service process.
There are various actions that the DTM process performs, such as reading data from the source, processing transformations, and writing to the target:
- The Integration Service process provides the DTM process with session instance information. The DTM process then reads metadata about the session and its associated mapping from the repository and validates them.
- If the session uses a parameter file, the Integration Service process passes the parameter file information to the DTM process. Using this information, the DTM resolves session-level, mapping-level, and workflow-level parameters and variables.
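As a rough illustration of what such a parameter file looks like, here is a hedged sketch in PowerCenter's section-based format; the folder, workflow, session, and parameter names are all hypothetical, so check your own repository objects before reusing them:

```ini
; [Global] values apply to all workflows and sessions
[Global]
$PMSuccessEmailUser=etl.support@example.com

; Workflow-level scope: folder name + workflow name (hypothetical)
[MyFolder.WF:wf_daily_load]
$$LoadDate=2012-01-31

; Session-level scope: narrows further to one session (hypothetical)
[MyFolder.WF:wf_daily_load.ST:s_m_load_customers]
$DBConnection_Source=Conn_Oracle_Src
$$CountryFilter=US
```

The DTM resolves a parameter by looking in the most specific matching section first, which is why the same `$$` mapping parameter can carry different values per session.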
- The DTM process also validates all the connection objects associated with sources, targets, and lookups, and verifies that the user who started the workflow has the appropriate execute permissions on them.
- Once connection object verification is done, the DTM executes any pre-session commands before processing the session, and any post-session commands after the session completes.
- One of the important functions of the DTM process is to create multiple threads to perform the various transformations specified in the mapping. The DTM process creates separate threads for readers, writers, transformations such as lookups, etc.
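This threading model is easier to picture with a small sketch. The following is not Informatica's internal code, just a minimal Python analogy of the reader/transformation/writer thread pattern, with queues standing in for the buffers that connect the stages:

```python
import queue
import threading

SENTINEL = object()  # marks the end of the data flow

def reader(src_rows, out_q):
    # Reader thread: pulls rows from the source and hands them downstream.
    for row in src_rows:
        out_q.put(row)
    out_q.put(SENTINEL)

def transformer(in_q, out_q):
    # Transformation thread: applies mapping logic (here, a trivial uppercase).
    while (row := in_q.get()) is not SENTINEL:
        out_q.put(row.upper())
    out_q.put(SENTINEL)

def writer(in_q, target):
    # Writer thread: loads transformed rows into the target.
    while (row := in_q.get()) is not SENTINEL:
        target.append(row)

source = ["alice", "bob"]
target = []
q1, q2 = queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=reader, args=(source, q1)),
    threading.Thread(target=transformer, args=(q1, q2)),
    threading.Thread(target=writer, args=(q2, target)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(target)  # ['ALICE', 'BOB']
```

Because each stage runs in its own thread, the reader can fetch the next row while the transformer and writer are still busy with earlier ones, which is the same reason the DTM's threaded design keeps data flowing through the pipeline.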
- The DTM process is also responsible for creating session logs. Informatica has a comprehensive logging mechanism in which all errors, transformation statuses, and the overall session processing status are logged. This makes Informatica powerful when it comes to debugging issues that arise while executing a session.
- The DTM process also checks whether any post-session success or failure emails are configured and sends them accordingly.
The above list covers the typical execution of a DTM process. There are further functions, such as validation of code pages and creation of partition groups or dynamic partitions, which I am not including in this article. I believe you can master the architecture if you understand the functions listed above.
Let us now understand DTM processing in detail. This section will help you understand how the DTM interprets the mapping structure and the various transformations used, and how it executes the mapping to move data from source to target. This topic is important because it will help you design Informatica mappings efficiently. I believe that until you understand how a tool functions, it is difficult to develop good code in that tool.
Target Load Order Group
Before getting into further details of the DTM, let us understand what a Target Load Order Group is. If you are new to Informatica and are reading this article with no experience in creating mappings, you might find it difficult to follow. I will do my best to keep the language simple and explain as much as I can.
Whenever a mapping is created in Informatica, it contains one or more target load order groups. A target load order group is a collection of sources, the transformations between them, and targets, linked together. Refer to the figure below: there are two target load order groups (depicted as Group A and Group B). Target load order groups are always processed sequentially; the order in which they are processed is specified in the Target Load Plan settings in PowerCenter Designer. In the example below, Group A loads first, followed by Group B.
There can be one or more pipelines in a single target load order group. In our example, Pipelines A and B belong to the same Target Load Order Group A, while Pipeline C belongs to Target Load Order Group B. Within a single target load order group, source data is pulled concurrently. The order of processing is as follows:
- Target Load Order Group A is processed
  - Sources 1 and 2 are read concurrently
  - Targets 4 and 5 are loaded concurrently
- Target Load Order Group B is processed
  - Source 3 is read
  - Target 6 is loaded
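The ordering above can be sketched as a small Python program. The group and pipeline names mirror the figure but the structure is purely illustrative, with threads standing in for the DTM's concurrent pipeline processing:

```python
from concurrent.futures import ThreadPoolExecutor

# Each group maps pipeline names to the (source, target) they move.
load_order_groups = [
    ("Group A", {"Pipeline A": ("Source 1", "Target 4"),
                 "Pipeline B": ("Source 2", "Target 5")}),
    ("Group B", {"Pipeline C": ("Source 3", "Target 6")}),
]

log = []

def run_pipeline(name, source, target):
    # One pipeline: read from its source, load its target.
    log.append(f"{name}: read {source}, loaded {target}")

for group_name, pipelines in load_order_groups:   # groups run sequentially
    with ThreadPoolExecutor() as pool:            # pipelines within a group run concurrently
        for name, (src, tgt) in pipelines.items():
            pool.submit(run_pipeline, name, src, tgt)
    log.append(f"{group_name} complete")

print("\n".join(log))
```

Note that Pipelines A and B may finish in either order (they are concurrent), but everything in Group A is guaranteed to complete before anything in Group B starts, which is exactly the sequential-groups, concurrent-pipelines behavior described above.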
This concludes the series of three articles on Informatica architecture.
The first article in the series provides an overview of the Informatica architecture.
The second article details the functioning of the Informatica Integration Service and the Integration Service process.
The third and final article provides an overview of the Data Transformation Manager (DTM).
I hope you enjoyed reading these articles; I will soon come up with advanced Informatica topics.