Clustering and Classification methods for Biologists


Principal Components Analysis

Background

When multivariate data are collected, it is common to find that some variables are correlated. One implication of these correlations is that the information provided by the variables is partly redundant. In the extreme case of two perfectly correlated variables (x and y), one is redundant: knowing the value of x leaves y with no freedom, and vice versa. Principal Components Analysis (PCA) exploits the redundancy in multivariate data, enabling us to:

  1. pick out patterns (relationships) in the variables;
  2. reduce the dimensionality of a data set without a significant loss of information.
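
The redundancy argument above can be illustrated with a minimal sketch. It assumes NumPy and uses simulated data (the variables `x` and `y` and the relationship `y = 2x + 1` are invented for illustration, not taken from the course material). When y is an exact linear function of x, an eigen analysis of the covariance matrix should give one eigenvalue that is effectively zero, so a single component carries essentially all of the variance.

```python
import numpy as np

# Simulated data (an assumption for illustration only)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + 1.0          # y is perfectly correlated with x

data = np.column_stack([x, y])

# Covariance matrix of the two variables (columns are variables)
cov = np.cov(data, rowvar=False)

# eigh is intended for symmetric matrices; eigenvalues are returned in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Proportion of total variance carried by each component, largest first
explained = eigenvalues[::-1] / eigenvalues.sum()

print("correlation:", np.corrcoef(x, y)[0, 1])
print("eigenvalues:", eigenvalues)
print("proportion of variance:", explained)
```

With real data the correlations are rarely perfect, so the smaller eigenvalues are small rather than zero, and deciding how many components to keep becomes a judgement call.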

PCA is one of a family of related ordination or projection techniques that includes Factor Analysis and Principal Co-ordinates Analysis.

Note: it is Principal ('first in rank or importance', Concise Oxford Dictionary), not Principle ('a fundamental truth or law as the basis of reasoning or action', Concise Oxford Dictionary).


Ten Important Concepts

correlation
variance
covariance
variability
matrix
eigenvalue
eigenvector
ordination
standardization
linear combination


Description and examples

It is important that you work through the following sections in the order given, and that you thoroughly understand each one before moving on to the next.

A. Some background
B. Matrix methods (very brief!)
C. Eigen methods (must understand this)
D. A graphical explanation of PCA
E. Sample analyses
F. Self assessment exercise 1
G. Self assessment exercise 2


Summary

PCA and Factor Analysis (FA) are similar methods; indeed, under certain circumstances (no rotation, and the number of factors equal to the number of variables) they produce identical results, albeit with some rescaling of the eigenvectors.

Both methods are based on an eigen analysis of either a correlation matrix or a covariance matrix. Using a correlation matrix is equivalent to standardising the variables (zero mean, unit variance) before the analysis.
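
The link between the correlation matrix and standardised variables can be checked numerically. The following is a rough sketch, assuming NumPy and an invented data set of three variables on very different scales (none of these numbers come from the course material). It compares the correlation matrix of the raw data with the covariance matrix of the standardised data, then extracts the eigenvalues of each candidate matrix.

```python
import numpy as np

# Invented data: 100 samples, 3 variables on very different scales
rng = np.random.default_rng(1)
base = rng.normal(size=100)
data = np.column_stack([
    10.0 * base + rng.normal(size=100),             # large scale, related to base
    0.1 * base + rng.normal(scale=0.05, size=100),  # small scale, related to base
    100.0 * rng.normal(size=100),                   # very large scale, unrelated
])

# Standardise each variable: subtract its mean, divide by its sample standard deviation
standardised = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

corr = np.corrcoef(data, rowvar=False)                  # correlation matrix of raw data
cov_of_standardised = np.cov(standardised, rowvar=False)

# Eigenvalues (ascending) from the two possible starting matrices
eigvals_corr = np.linalg.eigvalsh(corr)
eigvals_cov = np.linalg.eigvalsh(np.cov(data, rowvar=False))

print("matrices agree:", np.allclose(corr, cov_of_standardised))
print("eigenvalues (correlation):", eigvals_corr)
print("eigenvalues (covariance):", eigvals_cov)
```

The eigenvalues of a correlation matrix sum to the number of variables, whereas those of a covariance matrix sum to the total variance and tend to be dominated by whichever variables have the largest scale. This is one reason the choice between the two matrices matters.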
