Friday, April 03, 2015

Chi-square Test in Oracle Database

When we want to test the association of two categorical variables, such as sex and certain disease, we may use Chi-square test. Fortunately, with Oracle statistical function stats_crosstab, this can be done using a SQL query. For example, we have a table containing two columns of discrete values, sex and target.

SQL> select sex, target, count(*) num from tbl_chiqs group by sex, target
 order by sex, target;

S     TARGET        NUM
- ---------- ----------
F          0        152
F          1         13
M          0        411
M          1         26
We want to find out if sex and target are associated. Since both are categorical variables, we perform Chi-square test using a SQL query.
select
stats_crosstab(sex, target, 'CHISQ_OBS') chi_squared,
stats_crosstab(sex, target, 'CHISQ_SIG') p_value
from
tbl;

CHI_SQUARED    P_VALUE
----------- ----------
 .735719671 .391035481   
Based on the p value of the Chi-square statistics, we can not reject the NULL hypothesis that sex and target are independent given the significant level of 0.05. As we see, it is very convenient to perform statistical tests such as Chi-square, within a database using SQL query.It is not necessary to move the data to another statistical software to perform those tests.

No comments: