Saturday, July 28, 2012
Perform Principal Component Analysis (PCA) Using R, SAS and SQL
Perform Principal Component Analysis (PCA) in R.
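R1. Using princomp(m1): by default, the covariance matrix is used. The data are centered but not scaled. princomp(m1)$scores can be precisely replicated by subtracting the column means (princomp(m1)$center) from m1 and multiplying the result by princomp(m1)$loadings.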
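A minimal sketch of the R1 replication. The matrix m1 below is made-up data for illustration only, and the names pc, centered and manual are not from the original post.

```r
# Illustrative data only: a made-up 6 x 3 numeric matrix.
set.seed(1)
m1 <- matrix(rnorm(18), nrow = 6, ncol = 3)

pc <- princomp(m1)                    # covariance-based PCA (the default)

# Subtract each column's mean, then project onto the loadings.
centered <- sweep(m1, 2, pc$center)   # subtracts center[j] from column j
manual   <- centered %*% pc$loadings

# Compare with the scores princomp() returns, ignoring dimnames.
stopifnot(isTRUE(all.equal(unname(manual), unname(pc$scores))))
```

sweep() is used here so that each column mean is subtracted from its own column, rather than relying on R's vector recycling.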
R2. Using princomp(m1, cor=TRUE): the correlation matrix is used. The data are centered and scaled by the standard deviation. However, the standard deviation is computed with divisor N, not N-1. princomp(m1, cor=TRUE)$scores can be precisely replicated by:
scale(m1, center=princomp(m1, cor=TRUE)$center, scale=princomp(m1, cor=TRUE)$scale)%*%princomp(m1, cor=TRUE)$loadings
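The divisor-N claim can be checked directly. The sketch below uses made-up data and illustrative names (pc_cor, sd_n, manual). It compares the scale component returned by princomp with a standard deviation rescaled from R's sd(), which uses divisor N-1, and then rebuilds the scores.

```r
# Illustrative data only: a made-up 6 x 3 numeric matrix.
set.seed(1)
m1 <- matrix(rnorm(18), nrow = 6, ncol = 3)
n  <- nrow(m1)

pc_cor <- princomp(m1, cor = TRUE)

# Standard deviation with divisor N, derived from sd() (divisor N-1).
sd_n <- apply(m1, 2, sd) * sqrt((n - 1) / n)

# The scale component should agree with the divisor-N standard deviation.
stopifnot(isTRUE(all.equal(unname(pc_cor$scale), sd_n)))

# Standardize column by column, then project onto the loadings.
manual <- scale(m1, center = pc_cor$center, scale = pc_cor$scale) %*% pc_cor$loadings
stopifnot(isTRUE(all.equal(unname(manual), unname(pc_cor$scores))))
```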
Perform PCA in SAS. By default, the correlation matrix is used.
SAS 1. proc princomp data=M1 cov out=m1_pca;
SAS 2. proc princomp data=M1 out=m1_pca_cor;
Scores from R1 match scores from SAS 1.
Scores from R2 roughly match scores from SAS 2. The difference arises because R's princomp computes the standard deviation, used as the scaling factor, with divisor N, whereas SAS uses divisor N-1. As a result, the two sets of scores differ by a constant factor of sqrt(N/(N-1)).
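The size of that difference can be illustrated without SAS. In the sketch below, prcomp(m1, scale. = TRUE) stands in for an N-1 based standardization; this is an assumption made for illustration, not a run of proc princomp. The data are made up.

```r
# Illustrative data only: a made-up 6 x 3 numeric matrix.
set.seed(1)
m1 <- matrix(rnorm(18), nrow = 6, ncol = 3)
n  <- nrow(m1)

scores_n  <- princomp(m1, cor = TRUE)$scores   # scaling with divisor N
scores_n1 <- prcomp(m1, scale. = TRUE)$x       # scaling with divisor N-1

# Rescale the divisor-N scores. Absolute values are compared because the
# sign of each component is arbitrary and may differ between functions.
rescaled <- scores_n * sqrt((n - 1) / n)
stopifnot(isTRUE(all.equal(abs(unname(rescaled)), abs(unname(scores_n1)))))
```

For large N the factor sqrt((N-1)/N) is close to 1, which is why the two sets of scores only roughly match on small data sets and become nearly identical on large ones.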
Calculating PCA inside a database using SQL is an interesting alternative: it lets us perform PCA on large data sets.
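One reason this can scale is that the covariance matrix depends only on a few aggregates (row count, column sums, and sums of cross products), which a database can compute with SUM and COUNT. The sketch below imitates those aggregates in R on made-up data. It is an assumption about how such an approach might be organized, not the SQL used in this post.

```r
# Illustrative data only: a made-up 6 x 3 numeric matrix.
set.seed(1)
m1 <- matrix(rnorm(18), nrow = 6, ncol = 3)
n  <- nrow(m1)               # like COUNT(*)

# Aggregates a database could return from a single pass over the table:
s  <- colSums(m1)            # like SUM(x_j) for each column
ss <- crossprod(m1)          # like SUM(x_j * x_k) for each pair of columns

# Covariance matrix with divisor N-1, built only from those aggregates.
cov_from_sums <- (ss - outer(s, s) / n) / (n - 1)
stopifnot(isTRUE(all.equal(cov_from_sums, cov(m1))))

# The small p x p eigenproblem then gives loadings and component variances.
e <- eigen(cov_from_sums, symmetric = TRUE)
stopifnot(isTRUE(all.equal(e$values, prcomp(m1)$sdev^2)))
```

Only the p x p matrix of aggregates has to leave the database, so the eigen decomposition stays small no matter how many rows the table has.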