Monday, November 18, 2013

More on Calculating Z-Score

Problem

In the earlier post Calculate Z-Score using SQL in Oracle, we show that we can normalize a variable using z-score which is defined as (variable value-mean)/Standard Deviation. How do we calculate the Z-scores for employee's salaries for the following table?

SQL>  select EMPNO, ENAME, JOB, SAL from emp order by job, sal desc;

     EMPNO ENAME      JOB              SAL
---------- ---------- --------- ----------
      7902 FORD       ANALYST         3000
      7788 SCOTT      ANALYST         3000
      7934 MILLER     CLERK           1300
      7876 ADAMS      CLERK           1100
      7900 JAMES      CLERK            950
      7369 SMITH      CLERK            800
      7566 JONES      MANAGER         2975
      7698 BLAKE      MANAGER         2850
      7782 CLARK      MANAGER         2450
      7839 KING       PRESIDENT       5000
      7499 ALLEN      SALESMAN        1600
      7844 TURNER     SALESMAN        1500
      7521 WARD       SALESMAN        1250
      7654 MARTIN     SALESMAN        1250

14 rows selected.

Solution

We can use the following query to calculate Z-score.

SQL> with tbl_mean_std as
    (
    select avg(sal) m, stddev(sal) std from EMP
    )
    select EMPNO, ENAME, JOB, SAL, (sal-m)/std as z_score
    from EMP, tbl_mean_std order by job, sal desc;

     EMPNO ENAME      JOB              SAL    Z_SCORE
---------- ---------- --------- ---------- ----------
      7902 FORD       ANALYST         3000 .783748996
      7788 SCOTT      ANALYST         3000 .783748996
      7934 MILLER     CLERK           1300 -.65387922
      7876 ADAMS      CLERK           1100 -.82301195
      7900 JAMES      CLERK            950  -.9498615
      7369 SMITH      CLERK            800  -1.076711
      7566 JONES      MANAGER         2975 .762607405
      7698 BLAKE      MANAGER         2850 .656899448
      7782 CLARK      MANAGER         2450 .318633985
      7839 KING       PRESIDENT       5000 2.47507631
      7499 ALLEN      SALESMAN        1600 -.40018012
      7844 TURNER     SALESMAN        1500 -.48474649
      7521 WARD       SALESMAN        1250  -.6961624
      7654 MARTIN     SALESMAN        1250  -.6961624

14 rows selected.
Not surprisingly, the president's salary has a z-score of 2.47 which is almost two and half standard deviation above the mean. Mean always has a z-score of zero. If we want to save the mean and standard deviation derived from this data set, we may create a permanent table to store them. This table can be used to calculate the z-score for any future data points. The following queries create a permanent table containing mean and standard deviation. It is then used to create a view that calculates the z-score.
SQL> create table tbl_mean_std_perm as select avg(sal) m, stddev(sal) std from EMP;
SQL> create view v_emp_zscore as select EMPNO,  ENAME,  JOB,   SAL,        (sal-m)/std as z_score from EMP, tbl_mean_std_perm;
We can take a look at the actual values of mean and standard deviation.
SQL> select * from tbl_mean_std_perm;

         M        STD
---------- ----------
2073.21429 1182.50322

No comments: