Compute Generalized Estimating Equations (GEEs) on longitudinal microbiome data
Source:R/run_gee_model.R
run_gee_model.Rd
This function takes an animalcules-formatted MultiAssayExperiment
and
runs an independent GEE model for each taxon. The model predicts taxon log
CPM abundance as a product of fixed-effects covariates conditional on a
grouping ID variable, usually the unit on which repeated measurements were
taken. This modeling approach works best with small datasets that multiple
samples across many (>40) clusters/units. Note, the "broom", "ggeffects",
"broom.mixed", "geepack", "emmeans" packages are required to use this
function; all can be installed via CRAN.
Usage
run_gee_model(
dat,
taxon_level = "genus",
unit_var,
fixed_cov,
corstr = "ar1",
p_adj_method = "fdr",
plot_out = FALSE,
plotsave_loc = ".",
plot_terms = NULL,
...
)
Arguments
- dat
A MultiAssayExperiment object specially formatted as an animalcules output.
- taxon_level
Character string, default is
"genus"
.- unit_var
Character string giving the variable containing the identifiers for the unit on which multiple measurements were conducted, e.g. subjects. Default is
NULL
; must be supplied ifpaired = FALSE
.- fixed_cov
A character vector naming covariates to be tested.
- corstr
A character string specifying the correlation structure. The following are permitted: '"independence"', '"exchangeable"', '"ar1"', '"unstructured"'.
- p_adj_method
A character string specifying the correction method. Can be abbreviated. See details. Default is
"fdr"
.- plot_out
Logical indicating whether plots should be output alongside the model results. Default is
FALSE
.- plotsave_loc
A character string giving the folder path to save plot outputs. This defaults to the current working directory.
- plot_terms
Character vector. Which terms should be examined in the plot output? Can overlap with the
fixed_cov
inputs.- ...
Further arguments passed to
ggsave
for plot creation.
Details
P-values are adjusted for the model coefficients within each taxon. The
following methods are permitted: c("holm", "hochberg", "hommel",
"bonferroni", "BH", "BY", "fdr", "none")
Examples
in_dat <- system.file("extdata/MAE_small.RDS", package = "LegATo") |>
readRDS()
out <- run_gee_model(in_dat, taxon_level = "genus", unit_var = "Subject",
fixed_cov = c("HairLength", "Age", "Group", "Sex"),
corstr = "ar1")
head(out)
#> Coefficient Coefficient Estimate Standard Error Statistic Unadj p-value
#> 1 (Intercept) 4704.2733 9242.28654 0.2590754 0.610756059
#> 2 (Intercept) 27036.2185 13974.99000 3.7427333 0.053037617
#> 3 (Intercept) 1139.1468 393.29821 8.3891008 0.003774776
#> 4 (Intercept) 271.1042 84.54737 10.2818739 0.001343434
#> 5 (Intercept) 210.2923 72.45671 8.4234430 0.003704137
#> 6 (Intercept) 893.8138 386.92571 5.3362838 0.020885952
#> Lower 95% CI Upper 95% CI Taxon Adj p-value
#> 1 -13410.27548 22818.8220 Acinetobacter 0.63983968
#> 2 -354.25853 54426.6956 Corynebacterium 0.09723563
#> 3 368.29643 1909.9971 Staphylococcus 0.02768169
#> 4 105.39440 436.8140 Noviherbaspirillum 0.02768169
#> 5 68.27979 352.3049 Kocuria 0.02768169
#> 6 135.45333 1652.1742 Pseudomonas 0.04594909