Problem
In the earlier post Calculate Z-Score using SQL in Oracle, we show that we can normalize a variable using z-score which is defined as (variable value-mean)/Standard Deviation. How do we calculate the Z-scores for employee's salaries for the following table?
SQL> select EMPNO, ENAME, JOB, SAL from emp order by job, sal desc;
EMPNO ENAME JOB SAL
---------- ---------- --------- ----------
7902 FORD ANALYST 3000
7788 SCOTT ANALYST 3000
7934 MILLER CLERK 1300
7876 ADAMS CLERK 1100
7900 JAMES CLERK 950
7369 SMITH CLERK 800
7566 JONES MANAGER 2975
7698 BLAKE MANAGER 2850
7782 CLARK MANAGER 2450
7839 KING PRESIDENT 5000
7499 ALLEN SALESMAN 1600
7844 TURNER SALESMAN 1500
7521 WARD SALESMAN 1250
7654 MARTIN SALESMAN 1250
14 rows selected.
Solution
We can use the following query to calculate Z-score.
SQL> with tbl_mean_std as
(
select avg(sal) m, stddev(sal) std from EMP
)
select EMPNO, ENAME, JOB, SAL, (sal-m)/std as z_score
from EMP, tbl_mean_std order by job, sal desc;
EMPNO ENAME JOB SAL Z_SCORE
---------- ---------- --------- ---------- ----------
7902 FORD ANALYST 3000 .783748996
7788 SCOTT ANALYST 3000 .783748996
7934 MILLER CLERK 1300 -.65387922
7876 ADAMS CLERK 1100 -.82301195
7900 JAMES CLERK 950 -.9498615
7369 SMITH CLERK 800 -1.076711
7566 JONES MANAGER 2975 .762607405
7698 BLAKE MANAGER 2850 .656899448
7782 CLARK MANAGER 2450 .318633985
7839 KING PRESIDENT 5000 2.47507631
7499 ALLEN SALESMAN 1600 -.40018012
7844 TURNER SALESMAN 1500 -.48474649
7521 WARD SALESMAN 1250 -.6961624
7654 MARTIN SALESMAN 1250 -.6961624
14 rows selected.
Not surprisingly, the president's salary has a z-score of 2.47 which is almost two and half standard deviation above the mean. Mean always has a z-score of zero. If we want to save the mean and standard deviation derived from this data set, we may create a permanent table to store them. This table can be used to calculate the z-score for any future data points. The following queries create a permanent table containing mean and standard deviation. It is then used to create a view that calculates the z-score.
SQL> create table tbl_mean_std_perm as select avg(sal) m, stddev(sal) std from EMP; SQL> create view v_emp_zscore as select EMPNO, ENAME, JOB, SAL, (sal-m)/std as z_score from EMP, tbl_mean_std_perm;We can take a look at the actual values of mean and standard deviation.
SQL> select * from tbl_mean_std_perm;
M STD
---------- ----------
2073.21429 1182.50322
No comments:
Post a Comment