Sunday, August 26, 2012

Calculate correlation coefficient between two variables using SQL

Oracle function corr()  returns the correlation coefficient between two variables. The following scripts show two ways of calculating correlation coefficient.

1. Calculate correlation coefficient  using corr() function.
select corr(x,y) from TBL_XY;
2. Calculate correlation coefficient directly without using corr() function. The equation is based on  Pearson's coefficient ( The results are the same.
with tbl_mean as(
select avg(x) mean_x, avg(y) mean_y from tbl_xy
tbl_vec_m_corrected as
select x-mean_x mean_x_corrected, y-mean_y mean_y_corrected
from tbl_xy, tbl_mean
select sum(mean_x_corrected*mean_y_corrected)/(
sqrt(sum(mean_x_corrected*mean_x_corrected)) *sqrt(sum(mean_y_corrected*mean_y_corrected)))
 as corr_coef
from tbl_vec_m_corrected ;

No comments: