Aggregator Transformation Overview

Aggregator Transformation Overview 

The Aggregator transformation performs aggregate calculations, such as average, Sum, Max & Min etc. The Aggregator is an active transformation that changes the number of rows in the pipeline. The Integration Service performs the aggregate calculation on groups. The Integration Service then passes the last row received, along with the results of the aggregation.

Aggregate Expressions

The Designer allows aggregate expressions only in the Aggregator transformation. An aggregate expression can include conditional clauses and non-aggregate functions.

Aggregate Functions

Use the following aggregate functions within an Aggregator transformation. You can nest one aggregate function within another aggregate function.

SUM
AVG
COUNT
MAX
MIN
PERCENTILE
STDDEV
MEDIAN
VARIANCE
FIRST
LAST

Conditional Clauses

Use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.

For example, use the following expression to calculate the total commissions of employees who exceeded their quarterly quota:
SUM( COMMISSION, COMMISSION > QUOTA )

Non-Aggregate Functions

You can also use non-aggregate functions in the aggregate expression. The following expression returns the highest number of items sold for each item (grouped by item). If no items were sold, the expression returns 0.

IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0))

Aggregator Transformation Overview
A Mapping with an Aggregator Transformation

The Aggregator transformation has the following components and options:

Aggregate cache. The Integration Service stores data in the aggregate cache until it completes aggregate calculations. The Integration Service stores group values in an index cache and it stores row data in the data cache.

Aggregate expression. Enter an expression in an output port. The expression can include nonaggregate expressions and conditional clauses.

Group by port. Indicate how to create groups. You can configure an input, input/output, output, or variable port for the group. When grouping data, the Aggregator transformation outputs the last row of each group unless otherwise specified.

Sorted input. Select this option to improve session performance. To use sorted input, you must pass data to the Aggregator transformation sorted by group by port, in ascending or descending order.

Configuring Aggregator Transformation Properties

Cache Directory – Local directory where the Integration Service creates the index and data cache files. By default, the Integration Service uses the directory entered in the Workflow Manager for the process variable $PMCacheDir. If you enter a new directory, make sure the directory exists and contains enough disk space for the aggregate caches.

If you have enabled incremental aggregation, the Integration Service creates a backup of the files each time you run the session. The cache directory must contain enough disk space for two sets of the files.

Aggregator Transformation Overview
Aggregator Transformation Properties

Sorted Input – Indicates input data is presorted by groups. Select this option only if the mapping passes sorted data to the Aggregator transformation.

Aggregator Data Cache Size – Data cache size for the transformation. Default cache size is 2,000,000 bytes. If the total configured session cache size is 2 GB (2,147,483,648 bytes) or greater, you must run the session on a 64-bit Integration Service. You can use a numeric value for the cache, you can use a cache value from a parameter file or you can configure the Integration Service to set the cache size by using the Auto setting. If you configure the Integration Service to determine the cache size, you can also configure a maximum amount of memory for the Integration Service to allocate to the cache.

Aggregator Index Cache Size – Index cache size for the transformation. Default cache size is 1,000,000 bytes. If the total configured session cache size is 2 GB (2,147,483,648 bytes) or greater, you must run the session on a 64-bit Integration Service. You can use a numeric value for the cache, you can use a cache value from a parameter file or you can configure the Integration Service to set the cache size by using the Auto setting. If you configure the Integration Service to determine the cache size, you can also configure a maximum amount of memory for the Integration Service to allocate to the cache.

Transformation Scope – Specifies how the Integration Service applies the transformation logic to incoming data:-

Transaction – Applies the transformation logic to all rows in a transaction. Choose Transaction when a row of data depends on all rows in the same transaction, but does not depend on rows in other transactions.-

All Input – Applies the transformation logic on all incoming data. When you choose All Input, the PowerCenter drops incoming transaction boundaries. Choose All Input when a row of data depends on all rows in the source.

Performance Tips for Aggregator Transformations

Use sorted input to decrease the use of aggregate caches.
Sorted input reduces the amount of data cached during the session and improves session performance. Use this option with the Sorter transformation to pass sorted data to the Aggregator transformation.

Limit connected input/output or output ports.
Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.

Filter the data before aggregating it.
If you use a Filter transformation in the mapping, place the transformation before the Aggregator transformation to reduce unnecessary aggregation.

We will practice all these Aggregate Expressions in the next sessions.