Aggregator Transformation Overview
The Aggregator transformation performs aggregate calculations, such as average, Sum, Max & Min etc. The Aggregator is an active transformation that changes the number of rows in the pipeline. The Integration Service performs the aggregate calculation on groups. The Integration Service then passes the last row received, along with the results of the aggregation.
The Designer allows aggregate expressions only in the Aggregator transformation. An aggregate expression can include conditional clauses and non-aggregate functions.
Use the following aggregate functions within an Aggregator transformation. You can nest one aggregate function within another aggregate function.
Use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.
For example, use the following expression to calculate the total commissions of employees who exceeded their quarterly quota:
SUM( COMMISSION, COMMISSION > QUOTA )
You can also use non-aggregate functions in the aggregate expression. The following expression returns the highest number of items sold for each item (grouped by item). If no items were sold, the expression returns 0.
IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0))
The Aggregator transformation has the following components and options:
Aggregate cache. The Integration Service stores data in the aggregate cache until it completes aggregate calculations. The Integration Service stores group values in an index cache and it stores row data in the data cache.
Aggregate expression. Enter an expression in an output port. The expression can include nonaggregate expressions and conditional clauses.
Group by port. Indicate how to create groups. You can configure an input, input/output, output, or variable port for the group. When grouping data, the Aggregator transformation outputs the last row of each group unless otherwise specified.
Sorted input. Select this option to improve session performance. To use sorted input, you must pass data to the Aggregator transformation sorted by group by port, in ascending or descending order.
Configuring Aggregator Transformation Properties
Cache Directory – Local directory where the Integration Service creates the index and data cache files. By default, the Integration Service uses the directory entered in the Workflow Manager for the process variable $PMCacheDir. If you enter a new directory, make sure the directory exists and contains enough disk space for the aggregate caches.
If you have enabled incremental aggregation, the Integration Service creates a backup of the files each time you run the session. The cache directory must contain enough disk space for two sets of the files.
Sorted Input – Indicates input data is presorted by groups. Select this option only if the mapping passes sorted data to the Aggregator transformation.
Aggregator Data Cache Size – Data cache size for the transformation. Default cache size is 2,000,000 bytes. If the total configured session cache size is 2 GB (2,147,483,648 bytes) or greater, you must run the session on a 64-bit Integration Service. You can use a numeric value for the cache, you can use a cache value from a parameter file or you can configure the Integration Service to set the cache size by using the Auto setting. If you configure the Integration Service to determine the cache size, you can also configure a maximum amount of memory for the Integration Service to allocate to the cache.
Aggregator Index Cache Size – Index cache size for the transformation. Default cache size is 1,000,000 bytes. If the total configured session cache size is 2 GB (2,147,483,648 bytes) or greater, you must run the session on a 64-bit Integration Service. You can use a numeric value for the cache, you can use a cache value from a parameter file or you can configure the Integration Service to set the cache size by using the Auto setting. If you configure the Integration Service to determine the cache size, you can also configure a maximum amount of memory for the Integration Service to allocate to the cache.
Transformation Scope – Specifies how the Integration Service applies the transformation logic to incoming data:-
Transaction – Applies the transformation logic to all rows in a transaction. Choose Transaction when a row of data depends on all rows in the same transaction, but does not depend on rows in other transactions.-
All Input – Applies the transformation logic on all incoming data. When you choose All Input, the PowerCenter drops incoming transaction boundaries. Choose All Input when a row of data depends on all rows in the source.
Performance Tips for Aggregator Transformations
Use sorted input to decrease the use of aggregate caches.
Sorted input reduces the amount of data cached during the session and improves session performance. Use this option with the Sorter transformation to pass sorted data to the Aggregator transformation.
Limit connected input/output or output ports.
Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
Filter the data before aggregating it.
If you use a Filter transformation in the mapping, place the transformation before the Aggregator transformation to reduce unnecessary aggregation.
We will practice all these Aggregate Expressions in the next sessions.