How to remove duplicate records Using Aggregator

How to remove duplicate records Using Aggregator

There are couple of options available in informatica to remove duplicate records from the source. 

For Relational Tables

1. Source Qualifier > ‘SELECT DISTINCT’ option
2. Source Qualifier > SQL override (Write your own Query)

For Flat files or other sources

  1. Sorter > Aggregator
  2. Sorter > Expression > Filter
  3. Sorter > ‘DISTINCT’ option (ONLY to remove duplicate across all ports)

In this session we will cover How to remove duplicate records using Aggregator.

How to remove duplicate records using Aggregator Transformation

Let’s create a mapping for this, To improve Aggregator performance we are going to use Sorter transformation to sort the input records and pass it to Aggregator. Here our source is a Flat file having Job information’s. The file contains duplicate JOB_ID which i need to remove before loading to target table.

Source Flat File having duplicate JOB_ID

Source – Flat File
Target – Oracle Table
Key port – JOB_ID
Transformations

Sorter   –  To sort the source records (To improve Aggregator Performance) 
Aggregator – To remove the duplicate records by using ‘Group By’ on key port

Complete Mapping with Sorter & Aggregator
Have a close look on the transformations, next will see each transformation properties.
Sorter Transformation – Here we are sorting the record using key JOB_ID
Aggregator Transformation – To remove the duplicate records just Group By the port JOB_ID which will group all unique records together and pass it to target. Enable Sorted Input to improve the performance of Aggregator Transformation.
Workflow Log – 4 duplicate records got removed from the source and everything else got loaded in to target.
Target Table – See the results now, We have only unique JOB_ID in the target table.

In the next post we will see how to remove duplicate records using Expression Transformation.

Leave a comment

Your email address will not be published. Required fields are marked *