Appendix

Appendix A. Statistics in R Cheat Sheet

The following guide suggests which statistical test or model to use, depending on the type of variable of interest. Example R functions are listed for each; functions that are not part of base R are labeled with the add-on package that provides them.

For quantitative variables

  • If you want to test whether a sample mean differs from a known (hypothesized) value, perform a one-sample t-test.
    • e.g., t.test(x, mu = 999)
  • If you want to test whether the means of two independent groups differ, perform a two-sample t-test.
    • e.g., t.test(x, y) (Welch's test by default; add var.equal = TRUE for the pooled-variance test)
  • If you have paired observations that are not independent (e.g., before/after measurements on the same subjects) and want to test whether their mean difference is zero, perform a paired t-test.
    • e.g., t.test(x, y, paired = TRUE)
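  • If you want to find the sample size needed to detect a given effect (or the power achieved by a given sample size), perform a power analysis.
    • e.g., power.t.test(delta = 1, sd = 1, power = 0.8) returns the required n per group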
  • If you want to test whether the variances of two populations differ, perform an F-test for equality of variances.
    • e.g., var.test(x, y)
  • If you want to measure how strongly two quantitative variables are linearly correlated, calculate the Pearson correlation coefficient.
    • e.g., cor(x, y), or cor.test(x, y) to also test whether the correlation is zero
  • If you want to predict a quantitative variable using information from a different quantitative variable (the independent variable), perform a simple linear regression.
    • e.g., lm()
  • If you want to predict a quantitative variable using information from multiple quantitative variables (the independent variables), perform a multiple linear regression.
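    • e.g., lm(y ~ x1 + x2)
    • From the leaps package: regsubsets() (best-subset variable selection)
    • From the GGally package: ggpairs() (scatterplot matrix of the variables)
    • From the car package: vif() (variance inflation factors, to check multicollinearity)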
  • If you want to compare mean values of a quantitative variable across two or more treatment levels, perform an analysis of variance (ANOVA).
    • e.g., lm() with anova(), or aov(); pairwise.t.test() for post hoc comparisons
    • From the agricolae package: LSD.test()
  • If you want to compare mean values of a quantitative variable across two or more treatment levels while adjusting for a quantitative covariate, perform an analysis of covariance (ANCOVA).
    • e.g., lm(); pairwise.t.test()
    • From the agricolae package: LSD.test()
  • If you want to predict a quantitative variable from one or more independent variables when the data include both fixed and random effects (e.g., repeated measures or observations grouped by subject or site), fit a linear mixed model.
    • From the lme4 package: lmer()
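The sketch below runs several of the quantitative tests above on datasets that ship with base R (sleep, mtcars, PlantGrowth). The datasets, variables, and the hypothesized mean mu = 20 are illustrative assumptions, not part of the original guide; the add-on functions (regsubsets(), ggpairs(), vif(), LSD.test(), lmer()) are left out so it runs without extra installs.

```r
# Illustrative only: built-in datasets sleep, mtcars, PlantGrowth.
# sleep: extra hours of sleep for 10 subjects under two drugs (group "1", "2").
g1 <- sleep$extra[sleep$group == "1"]
g2 <- sleep$extra[sleep$group == "2"]

# One-sample t-test: does mean mpg differ from a hypothesized 20?
one_sample <- t.test(mtcars$mpg, mu = 20)

# Two-sample t-test (Welch by default; var.equal = TRUE gives the pooled test).
two_sample <- t.test(g1, g2)

# Paired t-test: the same 10 subjects were measured under both drugs.
paired <- t.test(g1, g2, paired = TRUE)

# F-test for equality of two variances.
f_test <- var.test(g1, g2)

# Pearson correlation, alone and with a significance test.
r <- cor(mtcars$mpg, mtcars$wt)
r_test <- cor.test(mtcars$mpg, mtcars$wt)

# Power analysis: n per group to detect delta = 1 when sd = 1,
# with 80% power at alpha = 0.05 (n is omitted, so it is solved for).
pw <- power.t.test(delta = 1, sd = 1, power = 0.8, sig.level = 0.05)

# Simple and multiple linear regression.
simple_fit <- lm(mpg ~ wt, data = mtcars)
multiple_fit <- lm(mpg ~ wt + hp, data = mtcars)

# One-way ANOVA across three treatment groups, then post hoc pairwise tests.
anova_fit <- lm(weight ~ group, data = PlantGrowth)
anova_table <- anova(anova_fit)
posthoc <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                           p.adjust.method = "holm")

# ANCOVA: a treatment factor (cylinders) plus a quantitative covariate (weight).
ancova_fit <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
ancova_table <- anova(ancova_fit)
```

Printing any result (e.g., print(paired) or summary(multiple_fit)) shows the test statistic, p-value, and confidence interval or coefficient table.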

For proportions

  • If you want to test whether a sample proportion differs from a known proportion, perform a one-sample test for a proportion.
    • e.g., binom.test() (exact) or prop.test() (normal approximation)
  • If you want to compare two sample proportions, perform a two-sample test for proportions.
    • e.g., prop.test()
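A minimal sketch of both proportion tests. The counts (42 of 100, and 55 of 100) and the hypothesized proportion 0.5 are hypothetical values chosen only for illustration.

```r
# Hypothetical counts, for illustration only.
successes <- 42
trials <- 100

# Exact binomial test against a hypothesized proportion of 0.5.
exact <- binom.test(successes, trials, p = 0.5)

# Normal-approximation test (continuity correction is applied by default).
approx <- prop.test(successes, trials, p = 0.5)

# Two-sample test: 42/100 in group A versus 55/100 in group B.
two_prop <- prop.test(x = c(42, 55), n = c(100, 100))
```

binom.test() is usually preferred for small samples, since prop.test() relies on a large-sample approximation.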

For categorical variables

  • If you want to test whether two categorical variables are related (or, equivalently, whether they are independent), perform a chi-square test of independence.
    • e.g., chisq.test()
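A minimal sketch using the built-in mtcars data, where the choice of variables (transmission type am and engine shape vs) is an illustrative assumption. The second call shows chisq.test() used as a goodness-of-fit test against equal proportions, which it does when given a single one-way table.

```r
# Contingency table of two categorical variables from mtcars.
tab <- table(transmission = mtcars$am, engine = mtcars$vs)

# Test of independence; for 2x2 tables R applies Yates' continuity correction by default.
indep <- chisq.test(tab)

# Goodness of fit: are the cylinder counts consistent with equal proportions?
gof <- chisq.test(table(mtcars$cyl))
```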

For binary variables

  • If you want to predict a binary variable (e.g., yes/no) using information from one or more quantitative/categorical variables (the independent variables), perform a logistic regression.
    • e.g., glm(family = "binomial")
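A minimal logistic regression sketch on the built-in mtcars data, predicting transmission type (am: 0 = automatic, 1 = manual) from car weight (wt, in 1000 lb). The variables are an illustrative choice, not from the original.

```r
# Logistic regression: log-odds of a manual transmission as a function of weight.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Predicted probability of a manual transmission for a 3000 lb car.
p_manual <- predict(fit, newdata = data.frame(wt = 3), type = "response")
```

Coefficients are on the log-odds scale; exp(coef(fit)) converts them to odds ratios.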

For multinomial variables

  • If you want to predict an unordered multinomial variable (i.e., three or more unordered categories) using information from one or more quantitative/categorical variables (the independent variables), perform multinomial logistic regression.
    • From the nnet package: multinom()
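A minimal sketch with nnet::multinom() on the built-in iris data, predicting species (three unordered categories) from sepal length. The dataset and predictor are illustrative assumptions; nnet is one of the recommended packages distributed with R.

```r
library(nnet)

# Multinomial logistic regression; trace = FALSE silences the fitting log.
fit <- multinom(Species ~ Sepal.Length, data = iris, trace = FALSE)

# Predicted class probabilities for one flower of each species.
probs <- predict(fit, newdata = iris[c(1, 51, 101), ], type = "probs")
```

With three categories, the model estimates one set of coefficients for each non-reference category (versicolor and virginica, each compared with setosa).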

For ordinal variables

  • If you want to predict an ordered multinomial variable (i.e., three or more ordered categories, such as low/medium/high) using information from one or more quantitative/categorical variables (the independent variables), perform an ordinal regression.
    • From the MASS package: polr()
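A minimal proportional-odds sketch with MASS::polr(), using the housing dataset that ships with MASS (satisfaction Sat is an ordered factor Low < Medium < High, with frequency weights Freq). Using this dataset here is an illustrative choice.

```r
library(MASS)

# Ordinal (proportional-odds) logistic regression; Hess = TRUE keeps the
# Hessian so that summary() can report standard errors.
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing, Hess = TRUE)

# Predicted probabilities of each satisfaction level for the first three rows.
probs <- predict(fit, newdata = housing[1:3, ], type = "probs")
```

The fitted model has one intercept ("cut point") between each pair of adjacent categories, stored in fit$zeta.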

For counts (non-negative integers)

  • If you have non-negative integers (e.g., 0, 1, 2, 3, …) and you want to predict an integer using one or more quantitative/categorical variables (the independent variables), perform count regression (e.g., Poisson, negative binomial).
    • e.g., glm(family = "poisson")
    • From the MASS package: glm.nb()
  • If you have non-negative integers with many zeros, and you want to predict an integer using one or more quantitative/categorical variables (the independent variables), perform zero-inflated count regression (e.g., zero-inflated Poisson or zero-inflated negative binomial).
    • From the pscl package: zeroinfl()
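A minimal count-regression sketch on the built-in warpbreaks data (number of yarn breaks by wool type and tension), which is an illustrative choice. It fits a Poisson model, checks for overdispersion with the residual deviance to residual degrees of freedom ratio (values well above 1 suggest the negative binomial is more appropriate), and then fits MASS::glm.nb(). The zero-inflated models from pscl are not shown, since that package is not distributed with R.

```r
library(MASS)

# Poisson regression (note the lowercase family name).
pois_fit <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Rough overdispersion check: residual deviance / residual df (about 1 if Poisson fits).
dispersion_ratio <- deviance(pois_fit) / df.residual(pois_fit)

# Negative binomial regression, which estimates an extra dispersion parameter theta.
nb_fit <- glm.nb(breaks ~ wool + tension, data = warpbreaks)
```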