Redundancy Analysis (RDA) is a statistical technique for investigating the relationship between two sets of variables: a matrix of dependent (response) variables and a matrix of explanatory variables. It measures how much of the variation in the responses is explained by (is "redundant" with) the explanatory variables, and it is especially useful when there are several dependent variables as well as several explanatory variables. RDA is the multivariate extension of multiple regression and, like multiple regression, it assumes linear relationships between the dependent and explanatory variables. A third set of variables (covariates) can be controlled for with partial RDA. This guide provides an overview of the RDA technique and how to conduct it in R.
What is Redundancy Analysis (RDA)?
Redundancy Analysis (RDA) is a form of multivariate analysis that combines multiple regression with principal components analysis (PCA). Each dependent variable (a “response” variable) is regressed on the independent variables (the “explanatory variables”), and a PCA is then performed on the fitted values of those regressions. The resulting canonical axes are linear combinations of the explanatory variables that summarize as much of the variation in the responses as possible. The overall goal of the analysis is to determine how much of the variation in the dependent variables is explained by the independent variables, and which of them contribute to that explanation and to what degree.
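The classical RDA computation can be written compactly. The notation below is an assumption introduced for illustration and does not come from the original text: Y is the centered n-by-p matrix of response variables, X is the centered n-by-m matrix of explanatory variables (assumed to have full column rank), and lambda_k are the eigenvalues of the covariance matrix of the fitted values.

```latex
% Multivariate regression: fitted values and residuals
\hat{Y} = X (X^{\top} X)^{-1} X^{\top} Y, \qquad Y_{\mathrm{res}} = Y - \hat{Y}

% PCA of the fitted values: eigen-decomposition of their covariance matrix
S_{\hat{Y}} = \frac{1}{n-1} \hat{Y}^{\top} \hat{Y}, \qquad S_{\hat{Y}} u_k = \lambda_k u_k

% Proportion of the total variance of Y explained by X
R^2 = \frac{\sum_k \lambda_k}{\operatorname{tr}(S_Y)}, \qquad S_Y = \frac{1}{n-1} Y^{\top} Y
```

The eigenvectors u_k define the canonical axes, and at most min(p, m, n - 1) of the eigenvalues are non-zero.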
Overview of Steps in RDA
The following steps describe the basic procedure for conducting a Redundancy Analysis:
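- Collect the data: a matrix of dependent (response) variables and a matrix of independent (explanatory) variables measured on the same observations.
- Center the dependent variables (and standardize them if they are measured in different units), then regress each one on the independent variables to obtain fitted values and residuals.
- Perform a principal components analysis (PCA) on the fitted values to obtain the canonical (constrained) axes. A PCA on the residuals gives the unconstrained axes.
- Compute the overall model fit statistics: the proportion of variance explained (R-squared and adjusted R-squared) and, typically, a permutation test of its significance.
- Graphically examine the relationships between the observations, the dependent variables, and the independent variables, usually with a triplot.
- Determine which of the explanatory variables contribute most to the explained variation, and to what degree, for example with permutation tests on individual terms. Partial Least Squares (PLS) regression is a separate, related technique that is sometimes used alongside RDA when the explanatory variables are strongly collinear.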
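As a rough illustration of the core computation, the following base R sketch runs a classical RDA by hand: a multivariate regression of the responses on the explanatory variables, followed by a PCA of the fitted values. The data are simulated, and every name in it (X, Y, B, r2, and so on) is hypothetical and chosen only for this example. It is a sketch under those assumptions, not a replacement for a dedicated implementation.

```r
# Minimal RDA sketch in base R on simulated data (all names are illustrative).
set.seed(42)
n <- 50
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))             # two explanatory variables
B <- matrix(c(1, 0.5, 0, -1, 0.8, 0.2), nrow = 2)    # hypothetical coefficients (2 x 3)
Y <- X %*% B + matrix(rnorm(n * 3, sd = 0.5), n, 3)  # three response variables
colnames(Y) <- c("y1", "y2", "y3")

# Center both matrices (responses are left unstandardized here)
Yc <- scale(Y, center = TRUE, scale = FALSE)
Xc <- scale(X, center = TRUE, scale = FALSE)

# Multivariate linear regression of every response on the explanatory variables
fit   <- lm(Yc ~ Xc)
Y_fit <- fitted(fit)      # n x 3 matrix of fitted values
Y_res <- residuals(fit)   # n x 3 matrix of residuals

# PCA of the fitted values gives the constrained (canonical) axes;
# PCA of the residuals gives the unconstrained axes
pca_fit <- prcomp(Y_fit, center = FALSE)
pca_res <- prcomp(Y_res, center = FALSE)

# Variance partition and fit statistics
total_var         <- sum(apply(Yc, 2, var))
constrained_var   <- sum(pca_fit$sdev^2)
unconstrained_var <- sum(pca_res$sdev^2)
m      <- ncol(X)
r2     <- constrained_var / total_var
r2_adj <- 1 - (1 - r2) * (n - 1) / (n - m - 1)       # Ezekiel's adjustment

cat(sprintf("R2 = %.3f, adjusted R2 = %.3f\n", r2, r2_adj))
```

Because the fitted values and residuals are orthogonal, the constrained and unconstrained variances add up to the total variance of the responses. With two explanatory variables, only two canonical axes carry non-zero variance.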
Resources
- Tutorial on RDA: a Stack Exchange question on RDA.
FAQ
What is the purpose of redundancy analysis?
The purpose of redundancy analysis is to analyse the relationship between two sets of variables: a set of dependent (response) variables and a set of explanatory variables. It measures how much of the variation in the dependent variables is explained by the explanatory variables, and it is especially useful when there are several dependent variables as well as several explanatory variables.
What is the difference between RDA and PCA?
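The main difference between RDA and PCA is that RDA is a constrained technique while PCA is unconstrained. PCA reduces a single set of variables by combining correlated variables into a smaller set of uncorrelated components. RDA relates two sets of variables: its axes are restricted to linear combinations of the explanatory variables, so they summarize only the part of the variation in the dependent variables that the explanatory variables can explain.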
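The contrast can be seen directly in R. The sketch below assumes the vegan package is installed and uses its bundled varespec and varechem example data sets; the choice of N, P, and K as explanatory variables and the object names are illustrative assumptions. In vegan, calling rda() without explanatory variables performs a PCA, while adding explanatory variables gives an RDA.

```r
# Assumes the vegan package is installed: install.packages("vegan")
library(vegan)

data(varespec)   # species cover values at 24 sites
data(varechem)   # soil chemistry at the same 24 sites

# Hellinger transformation, a common choice for abundance data
spe_hel <- decostand(varespec, method = "hellinger")

# Unconstrained ordination: rda() with no explanatory variables is a PCA
pca_mod <- rda(spe_hel)

# Constrained ordination: RDA with three soil variables as constraints
rda_mod <- rda(spe_hel ~ N + P + K, data = varechem)

# Both analyses partition the same total variance ...
pca_mod$tot.chi
rda_mod$tot.chi

# ... but only the RDA has a constrained component
rda_mod$CCA$tot.chi / rda_mod$tot.chi   # proportion explained by N, P and K
```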
What is the relationship between RDA and multiple regression?
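RDA is similar to multiple regression in that it involves regressing dependent variables against one or more explanatory variables. The difference is that RDA handles several dependent variables at once and then summarizes the fitted values with a PCA. Like multiple regression, RDA assumes linear relationships between the dependent and explanatory variables. With a single dependent variable, RDA reduces to ordinary multiple regression.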
How can I perform redundancy analysis in R?
You can perform redundancy analysis in R using the rda() function in the vegan package. The steps include collecting the data for the independent and dependent variables, fitting the model so that the dependent variables are regressed on the independent variables and a principal components analysis is performed on the fitted values, computing the overall model fit statistics, and then graphically examining the relationships between the independent and dependent variables in a triplot.
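A typical workflow might look like the sketch below. It assumes the vegan package is installed and uses its bundled varespec and varechem example data sets. The selection of N, P, and K as explanatory variables, the Hellinger transformation, and the number of permutations are illustrative choices, not recommendations from the original text.

```r
# Assumes the vegan package is installed: install.packages("vegan")
library(vegan)

data(varespec)   # dependent variables: species cover values at 24 sites
data(varechem)   # explanatory variables: soil chemistry at the same sites

# Transform the species data (Hellinger is a common choice for abundances)
spe_hel <- decostand(varespec, method = "hellinger")

# Fit the RDA: species composition constrained by three soil variables
mod <- rda(spe_hel ~ N + P + K, data = varechem)

# Overall fit: variance partition, R-squared and adjusted R-squared
summary(mod)
fit_stats <- RsquareAdj(mod)
fit_stats$r.squared
fit_stats$adj.r.squared

# Permutation tests: whole model, then each term added sequentially
anova(mod, permutations = 999)
anova(mod, by = "terms", permutations = 999)

# Triplot of sites, species and explanatory variables
plot(mod, scaling = 2)
```

The by = "terms" test is sequential, so its results depend on the order of the variables in the formula.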
What is Partial Least Squares (PLS) regression?
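Partial Least Squares (PLS) regression is a type of regression analysis that projects the explanatory and dependent variables onto a small number of latent components chosen to maximize the covariance between the two sets. It is often used when there are many explanatory variables or when they are strongly collinear. PLS is a separate technique rather than a step of RDA itself, but it can be used alongside a redundancy analysis to examine which of the explanatory variables are most strongly related to the dependent variables, and to what degree.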