Appendix
Appendix A. Statistics in R Cheat Sheet
The following provides guidance on which statistical test to perform depending on the nature of the variables of interest. R functions are also listed for each test; some come from add-on packages, which are noted.
For quantitative variables
- If you want to test whether a mean differs from a known value, perform a one-sample t-test (a worked sketch of several of these calls follows this list).
- e.g., t.test(x, mu = 999)
- If you want to compare the means of two independent samples, perform a two-sample t-test.
- e.g., t.test(x, y)
- If you want to test for a difference between paired (non-independent) measurements, perform a paired t-test.
- e.g., t.test(x, y, paired = TRUE)
- If you want to determine the sample size needed for a desired power and effect size, perform a power analysis.
- e.g., power.t.test()
- If you want to test whether the variances of two populations differ, perform an F-test for equality of variances.
- e.g., var.test(x, y)
- If you want to measure how strongly two quantitative variables are correlated, calculate the Pearson correlation coefficient.
- e.g., cor(x, y) or cor.test(x, y)
- If you want to predict a quantitative variable using information from a different quantitative variable (the independent variable), perform a simple linear regression.
- e.g., lm()
- If you want to predict a quantitative variable using information from multiple quantitative variables (the independent variables), perform a multiple linear regression.
- e.g., lm()
- From the leaps package: regsubsets()
- From the GGally package: ggpairs()
- From the car package: vif()
- If you want to compare a quantitative variable across the levels of one or more treatment factors, perform an analysis of variance (ANOVA).
- e.g., lm(); pairwise.t.test()
- From the agricolae package: LSD.test()
- If you want to compare a quantitative variable across the levels of one or more treatment factors while adjusting for a quantitative covariate, perform an analysis of covariance (ANCOVA).
- e.g., lm(); pairwise.t.test()
- From the agricolae package: LSD.test()
- If you want to predict a quantitative variable using information from one or more independent variables with both fixed and random effects, perform a linear mixed-model regression.
- From the lme4 package: lmer()
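A minimal sketch of several of the calls above on simulated data; the objects x, y, and group are illustrative placeholders, not variables from the text.
# Simulated data for illustration only
set.seed(1)
x <- rnorm(30, mean = 10, sd = 2)
y <- rnorm(30, mean = 11, sd = 2)
group <- factor(rep(c("a", "b", "c"), each = 10))
t.test(x, mu = 10)                            # one-sample t-test against a known mean
t.test(x, y)                                  # two-sample (Welch) t-test
t.test(x, y, paired = TRUE)                   # paired t-test
power.t.test(delta = 1, sd = 2, power = 0.8)  # sample size for a two-sample t-test
var.test(x, y)                                # F-test comparing two variances
cor.test(x, y)                                # Pearson correlation with a significance test
summary(lm(y ~ x))                            # simple linear regression
anova(lm(x ~ group))                          # one-way analysis of variance
pairwise.t.test(x, group)                     # pairwise comparisons among treatment levels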
For proportions
- If you want to test whether a sample proportion differs from a known proportion, perform a one-sample test of proportion.
- e.g., binom.test() or prop.test()
- If you want to compare one sample proportion to another, perform a two-sample test of proportion (see the sketch after this list).
- e.g., prop.test()
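A minimal sketch, assuming made-up counts (40 successes out of 100 trials in one sample and 55 out of 100 in another):
binom.test(40, 100, p = 0.5)          # exact one-sample test of a proportion
prop.test(40, 100, p = 0.5)           # approximate one-sample test of a proportion
prop.test(c(40, 55), c(100, 100))     # two-sample comparison of proportions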
For categorical variables
- If you want to test whether two categorical variables are related, i.e., whether the categories are independent, perform a chi-square test (see the sketch after this list).
- e.g., chisq.test()
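A minimal sketch with a made-up 2 x 2 table of counts; the row and column labels are placeholders.
# Illustrative contingency table: two groups by a yes/no response
tab <- matrix(c(30, 10, 20, 40), nrow = 2,
              dimnames = list(group = c("a", "b"), response = c("yes", "no")))
chisq.test(tab)                       # test of independence between rows and columns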
For binary variables
- If you want to predict a binary variable (e.g., yes/no) using information from one or more quantitative/categorical variables (the independent variables), perform a logistic regression (see the sketch after this list).
- e.g., glm(family = "binomial")
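A minimal sketch on simulated data; the names yes_no, x1, and x2 are placeholders, not variables from the text.
# Simulated binary outcome and two predictors
set.seed(1)
x1 <- rnorm(100)
x2 <- factor(sample(c("low", "high"), 100, replace = TRUE))
yes_no <- rbinom(100, size = 1, prob = plogis(0.5 * x1))
fit <- glm(yes_no ~ x1 + x2, family = "binomial")   # logistic regression
summary(fit)
exp(coef(fit))                                      # coefficients as odds ratios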
For multinomial variables
- If you want to predict an unordered multinomial variable (e.g., three or more unordered responses) using information from one or more quantitative/categorical variables (the independent variables), perform a multinomial logistic regression (see the sketch after this list).
- From the nnet package: multinom()
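A minimal sketch using the built-in iris data, treating Species (three unordered levels) as the outcome; this is an illustration, not an example from the text.
library(nnet)
fit <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris)   # multinomial logit
summary(fit)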
For ordinal variables
- If you want to predict an ordered multinomial variable (e.g., three or more ordered responses) using information from one or more quantitative/categorical variables (the independent variables), perform an ordinal regression (see the sketch after this list).
- From the MASS package: polr()
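A minimal sketch on simulated data; rating is a placeholder ordered outcome created only for illustration.
library(MASS)
set.seed(1)
x <- rnorm(100)
rating <- cut(x + rnorm(100), breaks = 3,
              labels = c("low", "medium", "high"), ordered_result = TRUE)
fit <- polr(rating ~ x, Hess = TRUE)   # proportional-odds ordinal regression
summary(fit)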
For integers
- If you have non-negative integer counts (e.g., 0, 1, 2, 3, …) and you want to predict the count using one or more quantitative/categorical variables (the independent variables), perform a count regression (e.g., Poisson or negative binomial).
- e.g., glm(family = "poisson") or, from the MASS package, glm.nb()
- If you have non-negative integer counts with many zeros, and you want to predict the count using one or more quantitative/categorical variables (the independent variables), perform a zero-inflated count regression (e.g., zero-inflated Poisson or zero-inflated negative binomial); see the sketch after this list.
- From the pscl package: zeroinfl()
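A minimal sketch on simulated counts; y_count and x are placeholders, and the extra zeros are added only to illustrate zero inflation.
library(MASS)   # provides glm.nb()
library(pscl)   # provides zeroinfl()
set.seed(1)
x <- rnorm(200)
y_count <- rpois(200, lambda = exp(0.5 + 0.3 * x))   # simulated counts
y_count[sample(200, 60)] <- 0                        # add excess zeros
glm(y_count ~ x, family = "poisson")                 # Poisson regression
glm.nb(y_count ~ x)                                  # negative binomial regression
zeroinfl(y_count ~ x, dist = "poisson")              # zero-inflated Poisson regression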