Sunday, August 26, 2012

Calculate correlation coefficient between two variables using SQL


Oracle function corr()  returns the correlation coefficient between two variables. The following scripts show two ways of calculating correlation coefficient.

1. Calculate correlation coefficient  using corr() function.
select corr(x,y) from TBL_XY;
.214418947
2. Calculate correlation coefficient directly without using corr() function. The equation is based on  Pearson's coefficient (http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient). The results are the same.
with tbl_mean as(
select avg(x) mean_x, avg(y) mean_y from tbl_xy
),
tbl_vec_m_corrected as
(
select x-mean_x mean_x_corrected, y-mean_y mean_y_corrected
from tbl_xy, tbl_mean
)
select sum(mean_x_corrected*mean_y_corrected)/(
sqrt(sum(mean_x_corrected*mean_x_corrected)) *sqrt(sum(mean_y_corrected*mean_y_corrected)))
 as corr_coef
from tbl_vec_m_corrected ;
.214418947

No comments: