Conduct a multivariate Hotelling's T-squared test — test_hotelling_t2

This function takes an animalcules-formatted MultiAssayExperiment object and runs a multivariate Hotelling's T-squared test. The test compares two distinct groups on the abundances of the top microbes at a given taxon level. Both paired and unpaired tests are supported. Each tests the null hypothesis that the population mean vectors are equal against the alternative that they are unequal.

Usage

test_hotelling_t2(
  dat,
  test_index = NULL,
  taxon_level = "genus",
  num_taxa,
  grouping_var,
  paired = FALSE,
  pairing_var = NULL,
  unit_var = NULL,
  save_table_loc = "."
)

Arguments

dat: A MultiAssayExperiment object specially formatted as an animalcules output.
test_index: Any argument used for subsetting the input dat; can be a character, logical, integer, list or List vector. Default is NULL.
taxon_level: Character string, default is "genus".
num_taxa: The number of most abundant taxa to test. If unpaired, this should be no larger than the total number of subjects in both groups minus 2, i.e. n1 + n2 - 2. If paired, this should be no larger than the total number of pairs minus 1, i.e. n - 1. Required.
grouping_var: Character string, the name of a DICHOTOMOUS grouping variable in the metadata of dat.
paired: Logical indicating whether a paired test should be conducted. Default is FALSE for an unpaired test.
pairing_var: Character string giving the variable containing pairing information. The variable should be in integer form. Must be supplied if paired = TRUE, otherwise the default is NULL.
unit_var: Character string giving the variable containing the identifiers for the unit on which multiple measurements were conducted, e.g. subjects. Default is NULL; must be supplied if paired = FALSE.
save_table_loc: A character string giving the folder path in which to save t-test results. Note that these t-tests are only conducted if the p-value of the Hotelling's T-squared test is < 0.05. Defaults to the current working directory.
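
The bounds on num_taxa described above can be written as a small helper. This is only a sketch: max_num_taxa is a hypothetical name used for illustration and is not part of LegATo.

```r
# Hypothetical helper (not part of LegATo): largest num_taxa allowed
# by the bounds stated for the num_taxa argument.
max_num_taxa <- function(n1, n2 = NULL, paired = FALSE) {
  if (paired) {
    # Paired test: n1 is the number of pairs, bound is n - 1
    return(n1 - 1)
  }
  # Unpaired test: n1 and n2 are the group sizes, bound is n1 + n2 - 2
  stopifnot(!is.null(n2))
  n1 + n2 - 2
}

max_num_taxa(10, paired = TRUE)
#> [1] 9
max_num_taxa(12, 8)
#> [1] 18
```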

Value

A list of the elements "df1", "df2", "crit_F", "F_stat" and "pvalue" giving the results of the test.
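
As a rough guide to reading the returned list, the sketch below relates its elements using base R's F distribution functions. It assumes (this is not stated in the documentation) that crit_F is the 0.95 quantile of an F distribution with df1 and df2 degrees of freedom and that pvalue is the upper-tail probability of F_stat. The numbers are copied from the paired example in the Examples section.

```r
# Values copied from the paired example output on this page
res <- list(df1 = 9, df2 = 1, crit_F = 240.5433,
            F_stat = 0.3869031, pvalue = 0.8576324)

# Assumption: crit_F is the critical value at alpha = 0.05
crit_check <- qf(0.95, df1 = res$df1, df2 = res$df2)

# Assumption: pvalue is P(F > F_stat) under the null hypothesis
p_check <- pf(res$F_stat, df1 = res$df1, df2 = res$df2, lower.tail = FALSE)

# The null hypothesis is rejected at the 0.05 level when F_stat > crit_F
reject <- res$F_stat > res$crit_F

c(crit_check = crit_check, p_check = p_check)
reject
```

Comparing crit_check and p_check against res$crit_F and res$pvalue is a quick way to check these assumptions for a given result.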

Details

Hotelling's T-squared statistic (T2) is a generalization of Student's t-statistic that is used in multivariate hypothesis testing to compare the mean vectors of different populations.
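
For reference, the standard textbook forms of the statistic are given here. That the function computes exactly these is an assumption, although the degrees of freedom agree with the output in the Examples section. Here p is the number of taxa (num_taxa), S_p is the pooled sample covariance matrix of the two groups, and y-bar and S_Y are the sample mean vector and covariance matrix of the within-pair differences across n pairs. The F distributions hold under the null hypothesis.

```latex
% Unpaired test: two independent groups of sizes n_1 and n_2
T^2 = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^\top
      \left[ \mathbf{S}_p \left( \frac{1}{n_1} + \frac{1}{n_2} \right) \right]^{-1}
      (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2),
\qquad
F = \frac{n_1 + n_2 - p - 1}{p\,(n_1 + n_2 - 2)}\; T^2 \sim F_{p,\; n_1 + n_2 - p - 1}

% Paired test: n pairs, differences \mathbf{y}_i
T^2 = n\, \bar{\mathbf{y}}^\top \mathbf{S}_Y^{-1}\, \bar{\mathbf{y}},
\qquad
F = \frac{n - p}{p\,(n - 1)}\; T^2 \sim F_{p,\; n - p}
```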

Note that any entries or pairs with missing values are excluded.
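
The unpaired computation, including the removal of entries with missing values, can be sketched in base R as follows. This is an illustration on simulated data using the standard two-sample formula, not the package's internal code. The object names (x1, x2, S_p and so on) are made up for the example.

```r
set.seed(1)
p <- 3                                   # number of taxa tested
x1 <- matrix(rnorm(8 * p), ncol = p)     # group 1: 8 subjects (simulated)
x2 <- matrix(rnorm(7 * p), ncol = p)     # group 2: 7 subjects (simulated)
x1[2, 1] <- NA                           # introduce one missing value

# Exclude any entries (rows) with missing values
x1 <- x1[complete.cases(x1), , drop = FALSE]
x2 <- x2[complete.cases(x2), , drop = FALSE]
n1 <- nrow(x1)
n2 <- nrow(x2)

# Difference in mean vectors and pooled covariance matrix
d <- colMeans(x1) - colMeans(x2)
S_p <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)

# Hotelling's T-squared and its F transformation
T2 <- drop(t(d) %*% solve(S_p * (1 / n1 + 1 / n2)) %*% d)
df1 <- p
df2 <- n1 + n2 - p - 1
F_stat <- T2 * df2 / (p * (n1 + n2 - 2))
pvalue <- pf(F_stat, df1, df2, lower.tail = FALSE)

list(df1 = df1, df2 = df2, F_stat = F_stat, pvalue = pvalue)
```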

Articles referenced in the implementation of the tests:

https://online.stat.psu.edu/stat505/lesson/7/7.1/7.1.14

https://online.stat.psu.edu/stat505/lesson/7/7.1/7.1.15

https://online.stat.psu.edu/stat505/lesson/7/7.1/7.1.4

https://online.stat.psu.edu/stat505/lesson/7/7.1/7.1.9

Examples

dat <- system.file("extdata", "MAE.RDS", package = "LegATo") |>
  readRDS()
dat_0.05 <- filter_MAE(dat, 0.001, 10, "species")
#> The overall range of relative abundance counts between samples is (13218, 3016276) 
#> Number of OTUs that exhibit a relative abundance >0.001% in at least 10% of the total samples: 93/1690
out1 <- test_hotelling_t2(dat = dat_0.05,
                          test_index = which(dat_0.05$MothChild == "Infant" &
                                               dat_0.05$timepoint == 0),
                          taxon_level = "genus",
                          # Total number of pairs - 1
                          num_taxa = 9,
                          paired = TRUE,
                          grouping_var = "HIVStatus",
                          pairing_var = "pairing")
out1
#> $df1
#> [1] 9
#> 
#> $df2
#> [1] 1
#> 
#> $crit_F
#> [1] 240.5433
#> 
#> $F_stat
#> [1] 0.3869031
#> 
#> $pvalue
#> [1] 0.8576324
#> 

out <- test_hotelling_t2(dat = dat_0.05,
                         test_index = which(dat_0.05$MothChild == "Mother" &
                                              dat_0.05$timepoint == 6),
                         taxon_level = "genus",
                         # Max is total number of subjects - 2
                         # Here we use a much smaller number
                         num_taxa = 6,
                         grouping_var = "HIVStatus",
                         unit_var = "Subject",
                         paired = FALSE)
out
#> $df1
#> [1] 6
#> 
#> $df2
#> [1] 11
#> 
#> $crit_F
#> [1] 3.094613
#> 
#> $F_stat
#> [1] 1.591778
#> 
#> $pvalue
#> [1] 0.2384123
#>