In the earlier post
Find the most frequent items using SQL, we use "group by" query to calculate the frequency of items and then functions row_number() or dense_rank() to generate rank based on the frequency to pick the top N items. The difference between row_number() or dense_rank() is that row_number() generate unique rank for each row even if there are ties.
Now, we want to solve a more sophisticate problem. We want to select the top 2 most frequent items in table tbl_items_by_person (shown below) for each person, Emily and John (instead of global top 2 most frequent items).
Table tbl_items_by_person
The first step is to calculate the item frequency "group by" name and item.
SQL> select name, item, count(*) as num from TBL_ITEMS_BY_PERSON group by name, item order by name, count(*) desc;
NAME ITEM NUM
-------------------------------- -------------------------------- ----------
Emily Peach 8
Emily Apple 4
Emily Banana 3
Emily Pear 2
John Orange 5
John Apple 3
John Peach 3
John Banana 2
John Pear 2
The second step is to use row_number() or dense_rank() to generate rank for each name separately. This is done by using the clause over(partition by name order by num desc).
SQL> select name, item, num, row_number() over(partition by name order by num desc) as rnk from (select name, item, count(*) as num from TBL_ITEMS_BY_
PERSON group by name, item ) order by name, num desc;
NAME ITEM NUM RNK
-------------------------------- -------------------------------- ---------- ----------
Emily Peach 8 1
Emily Apple 4 2
Emily Banana 3 3
Emily Pear 2 4
John Orange 5 1
John Apple 3 2
John Peach 3 3
John Banana 2 4
John Pear 2 5
We can pick the top 2 frequent items for Emily and John using the following query.
SQL> with tbl as (select name, item, num, row_number() over(partition by name order by num desc) as rnk from (select name, item, count(*) as num from
TBL_ITEMS_BY_PERSON group by name, item )) select * from tbl where rnk<=2 order by name, num desc;
NAME ITEM NUM RNK
-------------------------------- -------------------------------- ---------- ----------
Emily Peach 8 1
Emily Apple 4 2
John Orange 5 1
John Peach 3 2
Or, we can use dense_rank() to generate rank which will assign the same rank for ties.
SQL> select name, item, num, dense_rank() over(partition by name order by num desc) as rnk from (select name, item, count(*) as num from TBL_ITEMS_BY_
PERSON group by name, item ) order by name, num desc;
NAME ITEM NUM RNK
-------------------------------- -------------------------------- ---------- ----------
Emily Peach 8 1
Emily Apple 4 2
Emily Banana 3 3
Emily Pear 2 4
John Orange 5 1
John Apple 3 2
John Peach 3 2
John Banana 2 3
John Pear 2 3
For John, there are 3 peaches and Apples. Both of them are ranked as 2. Thus they are both selected as shown below.
SQL> with tbl as (select name, item, num, dense_rank() over(partition by name order by num desc) as rnk from (select name, item, count(*) as num from
TBL_ITEMS_BY_PERSON group by name, item )) select * from tbl where rnk<=2 order by name, num desc;
NAME ITEM NUM RNK
-------------------------------- -------------------------------- ---------- ----------
Emily Peach 8 1
Emily Apple 4 2
John Orange 5 1
John Peach 3 2
John Apple 3 2