If you’re interested in principal components (and you should be), take a good look at this. The linked post will take you by the hand and walk you through everything from scratch. If you’re not in the mood, the following R functions will help you.

An example.

# Generates sample matrix of five discrete clusters that have
# very different mean and standard deviation values.
z1 <- rnorm(10000, mean=1, sd=1)
z2 <- rnorm(10000, mean=3, sd=3)
z3 <- rnorm(10000, mean=5, sd=5)
z4 <- rnorm(10000, mean=7, sd=7)
z5 <- rnorm(10000, mean=9, sd=9)
mydata <- matrix(c(z1, z2, z3, z4, z5), 2500, 20, byrow=TRUE,
                 dimnames=list(paste("R", 1:2500, sep=""), paste("C", 1:20, sep="")))

# Performs principal component analysis after scaling the data.
# It returns a list with class "prcomp" that contains five components:
# (1) the standard deviations (sdev) of the principal components,
# (2) the matrix of eigenvectors (rotation),
# (3) the principal component data (x),
# (4) the centering (center) and
# (5) scaling (scale) used.
pca <- prcomp(mydata, scale=TRUE)

# Prints variance summary for all principal components.
summary(pca)

# Set plotting parameters.
x11(height=6, width=12, pointsize=12)
par(mfrow=c(1,2))

# Define plotting colors.
mycolors <- c("red", "green", "blue", "magenta", "black")

# Plots scatter plot for the first two principal components
# that are stored in pca$x[,1:2].
plot(pca$x, pch=20, col=mycolors[sort(rep(1:5, 500))])

# Same as above, but prints labels.
plot(pca$x, type="n")
text(pca$x, rownames(pca$x), cex=0.8, col=mycolors[sort(rep(1:5, 500))])

# Plots scatter plots for all combinations between the first four principal components.
pairs(pca$x[,1:4], pch=20, col=mycolors[sort(rep(1:5, 500))])

# Plots a scatter plot for the first two principal components
# plus the corresponding eigenvectors that are stored in pca$rotation.
biplot(pca)

# Loads library scatterplot3d.
library(scatterplot3d)

# Same as above, but plots the first three principal components in 3D scatter plot
scatterplot3d(pca$x[,1:3], pch=20, color=mycolors[sort(rep(1:5, 500))])

# Importance of components:
#                          PC1    PC2    PC3    PC4    PC5    PC6    PC7    PC8
# Standard deviation     2.157 0.9953 0.9831 0.9684 0.9601 0.9465 0.9340 0.9288
# Proportion of Variance 0.233 0.0495 0.0483 0.0469 0.0461 0.0448 0.0436 0.0431
# Cumulative Proportion  0.233 0.2822 0.3305 0.3774 0.4235 0.4683 0.5119 0.5550
#                           PC9   PC10   PC11   PC12   PC13   PC14   PC15   PC16
# Standard deviation     0.9030 0.8989 0.8930 0.8763 0.8703 0.8656 0.8573 0.8458
# Proportion of Variance 0.0408 0.0404 0.0399 0.0384 0.0379 0.0375 0.0367 0.0358
# Cumulative Proportion  0.5958 0.6362 0.6761 0.7145 0.7523 0.7898 0.8265 0.8623
#                          PC17   PC18   PC19   PC20
# Standard deviation     0.8415 0.8360 0.8302 0.8110
# Proportion of Variance 0.0354 0.0349 0.0345 0.0329
# Cumulative Proportion  0.8977 0.9326 0.9671 1.0000
# KernSmooth 2.23 loaded
# Copyright M. P. Wand 1997-2009
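
If you wonder where the numbers in the variance summary come from, here is a minimal sketch on a small made-up matrix (the names `toy`, `pca_toy`, `eig`, `prop_var` and `cum_var` are mine and only for illustration). It checks that, with scaling turned on, the squared standard deviations from `prcomp` match the eigenvalues of the correlation matrix, and that the proportion of variance is each squared standard deviation divided by their sum.

```r
# Small illustrative data set (hypothetical; not the mydata matrix from the post).
set.seed(1)
toy <- matrix(rnorm(200 * 4), nrow = 200, ncol = 4)
toy[, 2] <- toy[, 1] + 0.5 * toy[, 2]   # make two columns correlated

pca_toy <- prcomp(toy, scale. = TRUE)

# With scaling, the squared sdev values should equal the eigenvalues
# of the correlation matrix of the data (up to numerical tolerance).
eig <- eigen(cor(toy))
all.equal(pca_toy$sdev^2, eig$values)

# Proportion and cumulative proportion of variance, computed by hand.
prop_var <- pca_toy$sdev^2 / sum(pca_toy$sdev^2)
cum_var  <- cumsum(prop_var)
round(rbind(prop_var, cum_var), 4)
```

The last line should agree, up to rounding, with the "Proportion of Variance" and "Cumulative Proportion" rows that `summary(pca_toy)` prints.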


I have a question. What does ‘pca$x’ stand for? I know that pca is the name of the PCA you ran earlier, and that $ is used for selecting a specific column, but I can’t work out where the x comes from.
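
A short sketch that may help with this question, assuming a `prcomp` call like the one in the post. `prcomp` returns a list, and `$` pulls out a named element of that list (here it is a list element, not a column). `x` is simply the name `prcomp` gives to the matrix of principal component scores, that is, the centered and scaled data projected onto the eigenvectors in `rotation`. The small matrix `toy` and the names `pca_toy` and `scores` are made up for illustration.

```r
# Small illustrative data set (hypothetical; not the mydata matrix from the post).
set.seed(42)
toy <- matrix(rnorm(50 * 3), nrow = 50, ncol = 3,
              dimnames = list(paste("R", 1:50, sep = ""),
                              paste("C", 1:3, sep = "")))
pca_toy <- prcomp(toy, scale. = TRUE)

names(pca_toy)   # the named elements of the list, including "x"
dim(pca_toy$x)   # one row per observation, one column per component

# Rebuild the scores by hand: center and scale the data, then
# multiply by the matrix of eigenvectors.
scores <- scale(toy, center = pca_toy$center, scale = pca_toy$scale) %*% pca_toy$rotation
all.equal(scores, pca_toy$x, check.attributes = FALSE)
```

So `plot(pca$x)` in the post plots the observations in the coordinates of the first two principal components.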