Joiner Transformation Overview
Use the Joiner transformation to join source data from two related heterogeneous sources residing in different locations or file systems. You can also join data from the same source. The Joiner transformation joins sources with at least one matching column. The Joiner transformation uses a condition that matches one or more pairs of columns between the two sources. The Joiner transformation is an active transformation.
The two input pipelines include a master pipeline and a detail pipeline or a master and a detail branch. The master pipeline ends at the Joiner transformation, while the detail pipeline continues to the target.
To join more than two sources in a mapping, join the output from the Joiner transformation with another source pipeline. Add Joiner transformations to the mapping until you have joined all the source pipelines.
To work with the Joiner transformation
Configure the Joiner transformation properties. Properties for the Joiner transformation identify the location of the cache directory, how the Integration Service processes the transformation, and how the Integration Service handles caching.
Configure the join condition. The join condition contains ports from both input sources that must match for the Integration Service to join two rows. Depending on the type of join selected, the Integration Service either adds the row to the result set or discards the row.
Configure the join type. A join is a relational operator that combines data from multiple tables in different databases or flat files into a single result set. You can configure the Joiner transformation to use a Normal, Master Outer, Detail Outer, or Full Outer join type.
Configure the session for sorted or unsorted input. You can improve session performance by configuring the Joiner transformation to use sorted input. To configure a mapping to use sorted data, you establish and maintain a sort order in the mapping so that the Integration Service can use the sorted data when it processes the Joiner transformation.
Configure the transaction scope. When the Integration Service processes a Joiner transformation, it can apply transformation logic to all data in a transaction, all incoming data, or one row of data at a time.
Defining the Join Type
The Joiner transformation is similar to an SQL join except that data can originate from different types of sources. The Joiner transformation supports the following types of joins: A normal or master outer join performs faster than a full outer or detail outer join.
Normal
Master Outer
Detail Outer
Full Outer
Normal Join: With a normal join, the Integration Service discards all rows of data from the master and detail source that do not match, based on the condition. (ie process only the matching rows from Master & Detail)
Master Outer Join: A master outer join keeps all rows of data from the detail source and the matching rows from the master source. It discards the unmatched rows from the master source.
Detail Outer Join: A detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.
Full Outer Join: A full outer join keeps all rows of data from both the master and detail sources.
Points to Remember
- The naming convention for Joiner transformations is JNR_DesciptionOfTransformation.
- The Designer configures the first set of source fields as Detail fields and second set of source fields as Master fields by default. You can edit this property later.
- The master and detail ports must have matching datatypes. The Joiner transformation only supports equivalent (=) joins.
Performance Tips for Joiner Transformation
- For an unsorted Joiner transformation, designate the source with fewer rows as the master source. During a session, the Joiner transformation compares each row of the master source against the detail source. The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds the join process.
- For a sorted Joiner transformation, designate the source with fewer duplicate key values as the master source. When the Integration Service processes a sorted Joiner transformation, it caches rows for one hundred keys at a time. If the master source contains many rows with the same key value, the Integration Service must cache more rows, and performance can be slowed.
- Improve session performance by configuring the Joiner transformation to use sorted input. You see the greatest performance improvement when you work with large data sets.