## Wednesday, August 07, 2013

### Find the most frequent items using SQL

It is a common task to find the most frequent items. For example, we want the top 2 frequently purchased items from the following table.
SQL> select * from TBL_ITEMS_PURCHASED;
 ITEM Apple Banana Banana Apple Apple Pear Pear Peach Peach Peach Orange Orange Orange Orange Orange

We can run a "group by" query to calculate the frequency of items as shown below.
SQL> select item, count(*) as num from tbl_items_purchased group by item order by num;
 ITEM NUM Orange 5 Apple 3 Peach 3 Banana 2 Pear 2

Now the tricky part is to pick the top 2 most frequent ones. There are two situations.

1. Pick precisely two items and ignore the tie. We can use row_number() function to generate rank based on the frequency and pick the top 2. See the following two queries.
SQL> select a.*, row_number() over(order by num desc) rnk from (select item, count(*) as num from tbl_items_purchased group by item ) a order by rnk;
 ITEM NUM RANK Orange 5 1 Apple 3 2 Peach 3 3 Banana 2 4 Pear 2 5

SQL> with tbl as (select a.*, row_number() over(order by num desc) rnk from (select item, count(*) as num from tbl_items_purchased group by item ) a ) select * from tbl where rnk <=2;
 ITEM NUM RANK Orange 5 1 Apple 3 2

2.Pick items with the top 2 frequencies, regardless how many different items. In this case, we use dense_rank() function to generate rank based on the frequency. See the following two queries.
SQL> select a.*, dense_rank() over(order by num desc) rnk from (select item, count(*) as num from tbl_items_purchased group by item ) a order by rnk;
 ITEM NUM RANK Orange 5 1 Apple 3 2 Peach 3 2 Banana 2 3 Pear 2 3
SQL> with tbl as (select a.*, dense_rank() over(order by num desc) rnk from (select item, count(*) as num from tbl_items_purchased group by item ) a ) select * from tbl where rnk<=2;
 ITEM NUM RANK Orange 5 1 Apple 3 2 Peach 3 2