Multivariate statistics is the branch of statistics that examines
several variables simultaneously. This is an advantage when analysing
survey data, because many variables are usually collected at a large
number of sites. It is also more efficient and more meaningful to
analyse the variables together than to treat each one separately.
In addition, numerous factors influence the abundance and diversity of
plants and animals, so treating these factors simultaneously helps to
identify which are the most influential.
The increasing power of computers and the availability of user-friendly
software have made this branch of statistics much more accessible to
researchers, as much of this type of analysis is computationally very
demanding.
ANOVA and regression, described under univariate
statistics, can also be used to perform multivariate analyses where
there are two or more explanatory variables.
Please read
Reading 7
Gauch, H.G. (1981). Multivariate Analysis in Community
Ecology: Chapter 1. Cambridge University Press, Cambridge.
Classification
Multivariate classification is a basic technique used to define
communities. Several different algorithms are available, and the choice
depends on the size of the data set and the outcomes required.
The basic premise of these algorithms is to measure the 'ecological
distance' between pairs of sample points. This 'ecological distance'
is calculated using similarity or dissimilarity indices.
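As an illustration of an 'ecological distance', the sketch below computes the Bray-Curtis dissimilarity, one commonly used index, between pairs of sample points. The choice of index, the site names and the abundance values are assumptions made for this example only; they are not taken from the reading or from PATN.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    0 means identical composition, 1 means no species in common.
    """
    if len(a) != len(b):
        raise ValueError("samples must list the same species in the same order")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # convention used here: two empty samples are identical
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical abundances of four species at three sample points
sites = {
    "site_A": [10, 0, 3, 5],
    "site_B": [8, 1, 4, 5],
    "site_C": [0, 12, 0, 1],
}

names = sorted(sites)
for i, p in enumerate(names):
    for q in names[i + 1:]:
        print(p, q, round(bray_curtis(sites[p], sites[q]), 3))
```

With these made-up numbers, site_A and site_B come out close together (about 0.11), while site_C is far from both (about 0.94 and 0.87).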
For large data sets (over 100 sample points) a method known as
Cluster Analysis is suggested. For smaller data sets, Hierarchical
Classification (FUSE: Belbin 1991) is used. Results from this classification
can be presented as a dendrogram, which is a graphical representation
of the similarity between sample points. The statistical program
PATN (Belbin 1991) has been specifically designed to perform these
(and other multivariate) analyses, although some other packages
can also perform classifications.
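The idea behind a hierarchical classification can be sketched in a few lines. The example below uses plain average-linkage agglomerative clustering, which is an assumption made for illustration and is not claimed to be the FUSE algorithm in PATN. The dissimilarity matrix and the labels are hypothetical. Each merge, with the height at which it occurs, is the information a dendrogram displays.

```python
def average_linkage(dist, labels):
    """Agglomerative clustering with average linkage.

    dist is a symmetric matrix (list of lists) of dissimilarities.
    Returns the merges in order as (members_a, members_b, height).
    """
    clusters = {i: [i] for i in range(len(labels))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                # Average dissimilarity over all pairs of members
                pairs = [(p, q) for p in clusters[a] for q in clusters[b]]
                d = sum(dist[p][q] for p, q in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append(([labels[i] for i in clusters[a]],
                       [labels[i] for i in clusters[b]], d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


# Hypothetical dissimilarities between four sample points
labels = ["A", "B", "C", "D"]
dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.75],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.75, 0.2, 0.0],
]

merges = average_linkage(dist, labels)
for left, right, height in merges:
    print(left, "+", right, "at height", round(height, 3))
```

Here A joins B first, C joins D next, and the two pairs fuse last at a much greater height, which is how two distinct groups would appear on a dendrogram.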
Ordination
Ordination (for example, multidimensional scaling) can be used as an
exploratory or interpretative tool in data analysis. Based on the
'ecological distance' between pairs of sample points, ordination
arranges the sample points in a small number of dimensions, so that
similar points lie close together. Further analysis, correlating
environmental attributes with the positions of the sample points in
the ordination, can indicate which attributes have the most influence
on the distribution of sample points.
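The correlation step can be sketched as follows, assuming the ordination has already produced a score for each sample point on its first axis. The axis scores, attribute names and values are all hypothetical, and a simple Pearson correlation is used here as one possible measure; it is not necessarily the procedure used in any particular package.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of six sample points on ordination axis 1
axis1 = [-1.2, -0.8, -0.1, 0.4, 0.9, 1.3]

# Hypothetical environmental attributes measured at the same points
environment = {
    "soil_moisture": [12, 15, 22, 27, 31, 36],
    "slope": [5, 9, 4, 8, 6, 7],
}

# Rank attributes by the strength of their correlation with the axis
ranked = sorted(
    ((abs(pearson(axis1, values)), name) for name, values in environment.items()),
    reverse=True,
)
for strength, name in ranked:
    print(name, round(strength, 2))
```

In this made-up example soil moisture is strongly correlated with the first axis and slope is not, suggesting that moisture would be the attribute to examine further.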
Generalised Linear Models (GLMs)
GLMs can be used in a wide variety of circumstances in the analysis
of survey data. The two most widely used types of model are logistic
and linear regression. Logistic regression can use presence/absence
data to calculate the probability of a species and/or community
occurring in a particular habitat (Crawley 1993, Nichols 1991).
Linear regression uses continuous data to examine whether there is
a linear relationship between response and explanatory variables.
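To show how a logistic regression turns presence/absence data into a probability of occurrence, here is a minimal sketch with a single explanatory variable. The fitting method (simple gradient ascent), the variable names and the data are assumptions for illustration; a statistical package would normally do the fitting and would also report standard errors.

```python
from math import exp


def sigmoid(z):
    """Logistic function: maps any real number to a probability."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(x, y, rate=0.5, steps=5000):
    """Fit p = sigmoid(a + b*x) by gradient ascent on the mean log-likelihood."""
    a = b = 0.0
    n = len(x)
    for _ in range(steps):
        errors = [yi - sigmoid(a + b * xi) for xi, yi in zip(x, y)]
        a += rate * sum(errors) / n
        b += rate * sum(e * xi for e, xi in zip(errors, x)) / n
    return a, b


# Hypothetical survey: canopy cover (proportion) and species presence (1) or absence (0)
cover = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
present = [0, 0, 1, 0, 0, 1, 1, 1, 1]

a, b = fit_logistic(cover, present)
for c in (0.2, 0.5, 0.8):
    print("cover", c, "-> probability of occurrence", round(sigmoid(a + b * c), 2))
```

The fitted slope b is positive for these made-up data, so the predicted probability of occurrence rises with canopy cover.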
In addition to linear regression, GLMs can test for non-linear relationships
between variables. They can also incorporate categorical explanatory
variables, and can examine response variable data which follow a
non-normal distribution (e.g. Poisson).
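A GLM for count data that follow a Poisson distribution can be sketched in the same spirit. The example below fits log(mu) = a + b*x by Newton's method, where mu is the expected count. The log link is the usual choice for Poisson models, but the fitting code, the variable names and the data are illustrative assumptions only.

```python
from math import exp, log


def fit_poisson(x, y, steps=25):
    """Fit log(mu) = a + b*x to counts y by Newton's method."""
    # Start from the model with no slope: mu equal to the mean count
    a, b = log(sum(y) / len(y)), 0.0
    for _ in range(steps):
        mu = [exp(a + b * xi) for xi in x]
        # Gradient of the log-likelihood
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), a 2 x 2 symmetric matrix
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton update: solve the 2 x 2 system explicitly
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Hypothetical data: a habitat score and the number of individuals counted per quadrat
habitat = [0, 1, 2, 3, 4, 5, 6, 7]
counts = [1, 2, 2, 4, 5, 9, 12, 20]

a, b = fit_poisson(habitat, counts)
fitted = [exp(a + b * h) for h in habitat]
print("intercept", round(a, 3), "slope", round(b, 3))
```

Because the model is fitted on the log scale, the slope b describes a multiplicative change: each unit increase in the habitat score multiplies the expected count by exp(b).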