Thursday, July 30, 2015

Remove the Rows with Lowest or Highest Values

In a project, I need to remove rows with lowest or highest values within groups. I used the following table to illustrate the problem. For grp 10, rows with val 1 and 4 need to be removed and for grp 11, rows with val 5.5 and 9.6.

SQL> select * from tbl_x2 order by 1,2;

       GRP        VAL
---------- ----------
        10          1
        10          2
        10          3
        10          4
        11        5.5
        11        7.3
        11        9.6

7 rows selected.
I used the following SQL statement to do it.
delete from tbl_x2 a
where a.rowid in
(
select rid from
(
with tbl as (
select b.rowid rid,
row_number() over(partition by grp order by val) rnk,
row_number() over(partition by grp order by val desc) rnkr
from tbl_x2 b)
select rid from tbl where rnk=1 or rnkr=1
)
);
The following query shows that rows with lowest or highest values within groups are removed.
SQL> select * from tbl_x2 order by 1,2;

       GRP        VAL
---------- ----------
        10          2
        10          3
        11        7.3
The trick is to use row_number() function to generate ranks or reverse ranks based on val for each grp. For example, the following query show all rows in the original table and the ranks (rnk) and reversed ranks (rnkr). The lowest val has a rank of 1 and the highest val has a reversed rank of 1.
select b.*,
row_number() over(partition by grp order by val) rnk,
row_number() over(partition by grp order by val desc) rnkr
from tbl_x3 b

       GRP        VAL        RNK       RNKR
---------- ---------- ---------- ----------
        10          1          1          4
        10          2          2          3
        10          3          3          2
        10          4          4          1
        11        5.5          1          3
        11        7.3          2          2
        11        9.6          3          1