Filter a MultiAssayExperiment object to keep a top percentage of taxa

This function takes an animalcules-formatted MultiAssayExperiment (MAE) object and identifies all taxa at the chosen OTU level whose relative abundance is greater than or equal to the percentage threshold relabu_threshold in at least occur_pct_cutoff% of all samples. After filtration, taxa at the specified OTU level and all downstream levels that do not meet this criterion are consolidated into the category "Other".
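The retention rule described above can be illustrated with a minimal base R sketch on a toy count matrix. This is not the package's implementation: the matrix values, the taxon and sample names, and the intermediate variable names are all hypothetical, and the comparison is taken as "greater than or equal to" as stated in the description.

```r
# Toy count matrix: rows are taxa, columns are samples (hypothetical values).
counts <- matrix(
  c(90, 80, 97, 85,
     8, 15,  2, 14,
     2,  5,  1,  1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GenusA", "GenusB", "GenusC"), paste0("S", 1:4))
)

relabu_threshold <- 3   # percent relative abundance
occur_pct_cutoff <- 50  # percent of samples

# Convert counts to per-sample relative abundance, in percent.
relabu <- sweep(counts, 2, colSums(counts), "/") * 100

# Percentage of samples in which each taxon meets the abundance threshold.
pct_samples_passing <- rowMeans(relabu >= relabu_threshold) * 100

# A taxon is kept if it passes in enough samples.
keep <- pct_samples_passing >= occur_pct_cutoff

# Collapse the taxa that fail the criterion into a single "Other" row.
filtered <- rbind(
  counts[keep, , drop = FALSE],
  Other = colSums(counts[!keep, , drop = FALSE])
)
filtered
```

Here GenusC reaches 3% in only one of four samples (25%), so it falls below the 50% occurrence cutoff and its counts move to "Other", while the per-sample totals are unchanged.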

Usage

filter_MAE(
  dat,
  relabu_threshold = 3,
  occur_pct_cutoff = 5,
  taxon_level = "genus"
)

Arguments

dat: A MultiAssayExperiment object specially formatted as an animalcules output.
relabu_threshold: A double (percentage) between 0 and 100 representing the relative abundance that an OTU must reach in a sample to count toward retention. The smaller the threshold, the more OTUs will be retained. Default is 3%.
occur_pct_cutoff: A double (percentage) between 0 and 100 representing the minimum percentage of samples in which an OTU must meet the relabu_threshold to be retained. It is wise to keep the number of samples in mind when setting this parameter, because in a small dataset a low percentage can correspond to a single sample. Default is 5%.
taxon_level: Character string indicating the taxonomic level at which to aggregate the count data. Must be the name of a column in MultiAssayExperiment::rowData(dat). Default is "genus".
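To make the advice about sample size concrete, the following base R sketch converts occur_pct_cutoff into the smallest number of samples that satisfies "at least occur_pct_cutoff% of the total samples". It assumes an inclusive comparison on the percentage, so the function's own behavior exactly at the boundary may differ. The sample sizes and the variable names n_samples and min_samples are hypothetical.

```r
occur_pct_cutoff <- 5  # percent of samples (the documented default)

# Hypothetical dataset sizes to compare.
n_samples <- c(10, 20, 50, 200)

# Smallest whole number of samples that is at least occur_pct_cutoff% of the total.
min_samples <- ceiling(n_samples * occur_pct_cutoff / 100)

data.frame(n_samples, min_samples)
```

Under this assumption, a 5% cutoff requires only one sample in a 10- or 20-sample dataset, three in a 50-sample dataset, and ten in a 200-sample dataset.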

Value

An animalcules-formatted MultiAssayExperiment object with major OTUs retained.

Examples

  in_dat <- system.file("extdata/MAE_small.RDS", package = "LegATo") |>
    readRDS()
  filter_MAE(in_dat, relabu_threshold = 3, occur_pct_cutoff = 5,
             taxon_level = "genus")
#> The overall range of relative abundance counts between samples is (590, 238823) 
#> Number of OTUs that exhibit a relative abundance >3% in at least 5% of the total samples: 54/100
#> A MultiAssayExperiment object of 1 listed
#>  experiment with a user-defined name and respective class.
#>  Containing an ExperimentList class object of length 1:
#>  [1] MicrobeGenetics: SummarizedExperiment with 54 rows and 50 columns
#> Functionality:
#>  experiments() - obtain the ExperimentList instance
#>  colData() - the primary/phenotype DataFrame
#>  sampleMap() - the sample coordination DataFrame
#>  `$`, `[`, `[[` - extract colData columns, subset, or experiment
#>  *Format() - convert into a long or wide DataFrame
#>  assays() - convert ExperimentList to a SimpleList of matrices
#>  exportClass() - save data to flat files