Sunday, November 17, 2013

Find Records with Highest/Lowest Values by Category

Problem

We use the emp table as an example. How to we find the records for the highest paid employees by job titles?

SQL> select EMPNO, ENAME, JOB, SAL from emp    order by job, sal desc;

     EMPNO ENAME      JOB              SAL
---------- ---------- --------- ----------
      7902 FORD       ANALYST         3000
      7788 SCOTT      ANALYST         3000
      7934 MILLER     CLERK           1300
      7876 ADAMS      CLERK           1100
      7900 JAMES      CLERK            950
      7369 SMITH      CLERK            800
      7566 JONES      MANAGER         2975
      7698 BLAKE      MANAGER         2850
      7782 CLARK      MANAGER         2450
      7839 KING       PRESIDENT       5000
      7499 ALLEN      SALESMAN        1600
      7844 TURNER     SALESMAN        1500
      7521 WARD       SALESMAN        1250
      7654 MARTIN     SALESMAN        1250

14 rows selected.

Solution

We use Oracle analytic function row_number() which generates unique ranks for records. The rank will be generated independently by job ("partition by job") and will be based on the descending order of salary ("order by sal desc"). So the record with the highest salary in each job will receive a rank of 1 as shown below.

SQL> select EMPNO, ENAME, JOB, SAL, row_number() over(partition by job    order by sal desc) sal_rank from emp order by job, sal desc;

     EMPNO ENAME      JOB              SAL   SAL_RANK
---------- ---------- --------- ---------- ----------
      7902 FORD       ANALYST         3000          1
      7788 SCOTT      ANALYST         3000          2
      7934 MILLER     CLERK           1300          1
      7876 ADAMS      CLERK           1100          2
      7900 JAMES      CLERK            950          3
      7369 SMITH      CLERK            800          4
      7566 JONES      MANAGER         2975          1
      7698 BLAKE      MANAGER         2850          2
      7782 CLARK      MANAGER         2450          3
      7839 KING       PRESIDENT       5000          1
      7499 ALLEN      SALESMAN        1600          1
      7844 TURNER     SALESMAN        1500          2
      7521 WARD       SALESMAN        1250          3
      7654 MARTIN     SALESMAN        1250          4

14 rows selected.
To show only those records having highest salaries by jobs, we add a condition that only records with the rank of salary equal one will be selected.
SQL> select * from (select EMPNO, ENAME, JOB, SAL, row_number() over(      partition by job order by sal   desc) sal_rank from emp order by job, sal desc) where sal_rank=1;

     EMPNO ENAME      JOB              SAL   SAL_RANK
---------- ---------- --------- ---------- ----------
      7788 SCOTT      ANALYST         3000          1
      7934 MILLER     CLERK           1300          1
      7566 JONES      MANAGER         2975          1
      7839 KING       PRESIDENT       5000          1
      7499 ALLEN      SALESMAN        1600          1
The above method using row_number to generates rank will select one record for each category. If more then one records have the highest salaries in their group such as Analyst Scott and Ford, only one of them will be selected. In this case, it is Scott. In the post More on Finding Records with Highest/Lowest Values by Category, I will describe how to select all records with the highest value when there are ties.

No comments: