I am compiling a list of leading data providers. This is the current list. If your favorite data providers are not included here and you feel they should be, please contact me at firstname.lastname@example.org. Thanks. Dr. Zhou.
Monday, August 27, 2012
Calculate correlation matrix using SQL
Any table can be converted into the a simple table with only three columns: rec_id (record id), var_id (column_id), val (value). For example,
Record_ID Variable_1 Variable_2 Variable3
1 1.2 2.2 3.3
2 2.0 1.5 4.0
The above table can be converted into the following simple table.
Rec_id var_id val
1 1 1.2
1 2 2.2
1 3 3.3
2 1 2.0
2 2 1.5
2 3 4.0
Once the data are stored in the simple, three column table, the top-right correlation matrix can be easily calculated using the following query.
select a.var_id var_id1, b.var_id var_id2, corr(a.val, b.val) correl
from tbl_simple a, tbl_simple b where a.rec_id=b.rec_id
and a.var_id<=b.var_id group by a.var_id, b.var_id ;
This method can handle unlimited number of records and variables. To improve the performance, we should create index on rec_id and var_id. Also see my yesterday's post about "Calculate correlation coefficient between two variables using SQL".