Sunday, May 27, 2012

Remove Duplicates From Data


It is quite common to have duplicates in the data. From example, an application is submitted or a claim is filed  multiple times. With Oracle function row_number function, we can easily remove the duplicates.

The following SQL first calculates the rank of records by app_date for each app_id, then it only selects those records with ranks equals 1. The only earliest record for each app_id will be selected.

with tbl as
(
select a.*,
row_number() over(partition by app_id order by APP_DATE ) rnk  from CELL_PHONE_SERVICE_APPS a
)
select * from tbl where rnk=1;

If we want to keep the latest record for each app_id, we simply generate rank for each app_id by the descending order of app_date as shown below.


with tbl as
(
select a.*,
row_number() over(partition by app_id order by APP_DATE desc ) rnk  from CELL_PHONE_SERVICE_APPS a
)
select * from tbl where rnk=1;

No comments: