Wednesday, August 07, 2013

Find the most frequent items using SQL

It is a common task to find the most frequent items. For example, we want the top 2 frequently purchased items from the following table.
SQL> select * from TBL_ITEMS_PURCHASED;
ITEM
Apple
Banana
Banana
Apple
Apple
Pear
Pear
Peach
Peach
Peach
Orange
Orange
Orange
Orange
Orange

We can run a "group by" query to calculate the frequency of items as shown below.
SQL> select item, count(*) as num from tbl_items_purchased group by item order by num;
ITEMNUM
Orange5
Apple3
Peach3
Banana2
Pear2

Now the tricky part is to pick the top 2 most frequent ones. There are two situations.

1. Pick precisely two items and ignore the tie. We can use row_number() function to generate rank based on the frequency and pick the top 2. See the following two queries.
SQL> select a.*, row_number() over(order by num desc) rnk from (select item, count(*) as num from tbl_items_purchased group by item ) a order by rnk;
ITEMNUMRANK
Orange51
Apple32
Peach33
Banana24
Pear25

SQL> with tbl as (select a.*, row_number() over(order by num desc) rnk from (select item, count(*) as num from tbl_items_purchased group by item ) a ) select * from tbl where rnk <=2;
ITEMNUMRANK
Orange51
Apple32

2.Pick items with the top 2 frequencies, regardless how many different items. In this case, we use dense_rank() function to generate rank based on the frequency. See the following two queries.
SQL> select a.*, dense_rank() over(order by num desc) rnk from (select item, count(*) as num from tbl_items_purchased group by item ) a order by rnk;
ITEMNUMRANK
Orange51
Apple32
Peach32
Banana23
Pear23
SQL> with tbl as (select a.*, dense_rank() over(order by num desc) rnk from (select item, count(*) as num from tbl_items_purchased group by item ) a ) select * from tbl where rnk<=2;
ITEMNUMRANK
Orange51
Apple32
Peach32

No comments: