How to remove duplicate records using Sorter Transformation
In the last two post we have seen how to remove duplicate records (Only one column) using Aggregator and Expression. We have identified the duplicate JOB_ID and skipped those rows from loading to target.
The same logic CANNOT be done using Sorter Transformation. In sorter we cannot check duplicate against a single column. If you enable ‘DISTINCT’ option it will automatically treat all the columns as the key fields and then check for the duplicates.
The below two records are duplicate in terms of first column (JOB_ID). Sorter transformation couldn’t check for duplicate against only one column, we may need to use either Aggregator or expression as we did in the previous post to remove the duplicates.
SA_REP Sales-Representative 6000 12008
SA_REP Sales-Rep 4000 10000
The below two records are duplicate in terms of all the columns. Sorter transformation CAN be used here to remove these duplicate records by using ‘DISTINCT’ option.
SA_REP Sales-Representative 6000 12008
SA_REP Sales-Representative 6000 12008
Let’s see the mapping for this …
Source – Flat File
Target – Oracle Table
Transformations – Sorter (With DISTINCT)