Compute Generalized Estimating Equations (GEEs) on longitudinal microbiome data

This function takes an animalcules-formatted MultiAssayExperiment and runs an independent GEE model for each taxon. The model predicts taxon log CPM abundance as a product of fixed-effects covariates conditional on a grouping ID variable, usually the unit on which repeated measurements were taken. This modeling approach works best with small datasets that multiple samples across many (>40) clusters/units. Note, the "broom", "ggeffects", "broom.mixed", "geepack", "emmeans" packages are required to use this function; all can be installed via CRAN.

Usage

run_gee_model(
  dat,
  taxon_level = "genus",
  unit_var,
  fixed_cov,
  corstr = "ar1",
  p_adj_method = "fdr",
  plot_out = FALSE,
  plotsave_loc = ".",
  plot_terms = NULL,
  ...
)

Arguments

dat: A MultiAssayExperiment object specially formatted as an animalcules output.
taxon_level: Character string, default is "genus".
unit_var: Character string giving the variable containing the identifiers for the unit on which multiple measurements were conducted, e.g. subjects. Default is NULL; must be supplied if paired = FALSE.
fixed_cov: A character vector naming covariates to be tested.
corstr: A character string specifying the correlation structure. The following are permitted: '"independence"', '"exchangeable"', '"ar1"', '"unstructured"'.
p_adj_method: A character string specifying the correction method. Can be abbreviated. See details. Default is "fdr".
plot_out: Logical indicating whether plots should be output alongside the model results. Default is FALSE.
plotsave_loc: A character string giving the folder path to save plot outputs. This defaults to the current working directory.
plot_terms: Character vector. Which terms should be examined in the plot output? Can overlap with the fixed_cov inputs.
...: Further arguments passed to ggsave for plot creation.

Value

A data.frame of modeling results.

Details

P-values are adjusted for the model coefficients within each taxon. The following methods are permitted: c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none")

Examples

in_dat <- system.file("extdata/MAE_small.RDS", package = "LegATo") |>
              readRDS()
out <- run_gee_model(in_dat, taxon_level = "genus", unit_var = "Subject",
                     fixed_cov = c("HairLength", "Age", "Group", "Sex"),
                     corstr = "ar1")
head(out)
#>   Coefficient Coefficient Estimate Standard Error  Statistic Unadj p-value
#> 1 (Intercept)            4704.2733     9242.28654  0.2590754   0.610756059
#> 2 (Intercept)           27036.2185    13974.99000  3.7427333   0.053037617
#> 3 (Intercept)            1139.1468      393.29821  8.3891008   0.003774776
#> 4 (Intercept)             271.1042       84.54737 10.2818739   0.001343434
#> 5 (Intercept)             210.2923       72.45671  8.4234430   0.003704137
#> 6 (Intercept)             893.8138      386.92571  5.3362838   0.020885952
#>   Lower 95% CI Upper 95% CI              Taxon Adj p-value
#> 1 -13410.27548   22818.8220      Acinetobacter  0.63983968
#> 2   -354.25853   54426.6956    Corynebacterium  0.09723563
#> 3    368.29643    1909.9971     Staphylococcus  0.02768169
#> 4    105.39440     436.8140 Noviherbaspirillum  0.02768169
#> 5     68.27979     352.3049            Kocuria  0.02768169
#> 6    135.45333    1652.1742        Pseudomonas  0.04594909