Saturday, November 02, 2013

Find the cutoff value for the top n records

It is a very common task to find the cutoff value to get the top N records . The following are some of the examples:
1. Find the cutoff salary for the top 10 employees in a company.
2. Find a cutoff score for a risk model that generates alerts for the top 100 riskiest transactions.

We use the following table that has 20 records as an example.

SQL> select id, num from TBL_20 order by id;

ID NUM
1 -.650222
2 -1.465297
3 -.689485
4 -1.547403
5 -1.791099
6 -1.270857
7 .988116
8 1.246141
9 .643606
10 -.515888
11 -.713859
12 -.587674
13 -1.634403
14 1.285847
15 -.08049
16 .231295
17 -.66065
18 .422664
19 -.134565
20 -1.773186

20 rows selected.


If we want to find out the cutoff value for the largest 5th column "num", we first use function row_number() to generate rank and then select the num that has a rank of 5.

SQL> with tbl as (select a.*, row_number() over(order by num desc) rnk from tbl_20 a) select num from tbl where rnk=5;

NUM
.422664

To verify that .422664 is indeed the cutoff value for the top 5 records, we run the following query.

SQL> select * from tbl_20 where num>=.422664 order by num desc;

NUM ID
1.285847 14
1.246141 8
.988116 7
.643606 9
.422664 18

It is a good practice to always verify our results using another query. That way, the chance of making mistakes is greatly reduced.

No comments: