A function for reference batch correction and imputation. — ref_combat_impute

A function used to perform reference batch correction and imputation on the testing data for gene signatures that require retraining of the model. We use k-nearest neighbors to impute the expression values of missing genes; the imputation is carried out with impute.knn. Because the imputation step can be computationally expensive when a large number of genes are missing, we added a constraint to keep it bounded: the evaluation will not run if more than geneMax*100% of the genes in the corresponding gene signature are not found in the input study. By default geneMax = 0.8, so the evaluation will not run if more than 80% of the genes are missing when matching the input study to the reference data.
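The geneMax rule described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper name check_gene_max and its return structure are hypothetical and are not part of the package, whose internal check may differ in detail.

```r
# Hypothetical helper (not part of the package): illustrates the geneMax rule.
# gene_set    : character vector of gene symbols in the signature
# input_genes : character vector of gene symbols available in the input study
# geneMax     : maximum proportion of signature genes allowed to be missing
check_gene_max <- function(gene_set, input_genes, geneMax = 0.8) {
  missing_genes <- setdiff(gene_set, input_genes)
  prop_missing <- length(missing_genes) / length(gene_set)
  list(
    missing = missing_genes,
    prop_missing = prop_missing,
    # TRUE means the evaluation would be skipped for this signature
    skip = prop_missing > geneMax
  )
}

signature <- paste0("GENE", 1:10)

# Only 1 of 10 signature genes is present: 90% missing, above the 80% limit
mostly_missing <- check_gene_max(signature, input_genes = "GENE1")

# 8 of 10 signature genes are present: 20% missing, imputation can proceed
mostly_present <- check_gene_max(signature, input_genes = paste0("GENE", 1:8))
```

In this sketch a signature is skipped only when the missing proportion is strictly greater than geneMax, matching the "more than" wording above.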

Usage

ref_combat_impute(
  theObject_train,
  useAssay,
  gene_set,
  input,
  SigName,
  adj,
  geneMax = 0.8
)

Arguments

theObject_train: A SummarizedExperiment object that has been pre-stored in the data file: OriginalTrainingData.
useAssay: A character string or an integer specifying the assay in the input. Used for the test SummarizedExperiment object. Default is 1, indicating the first assay in the test SummarizedExperiment object.
gene_set: A character vector of gene symbols for the selected gene signature.
input: A SummarizedExperiment object with gene symbols as the assay row names.
SigName: Optional. A character string that indicates the name for gene_set. SigName is used in the information reported when signature genes are missing from the test data.
adj: A small real number used in ComBat to handle genes with 0 counts in rare cases. Not required in most cases.
geneMax: A real number between 0 and 1 giving the maximum proportion of missing genes allowed in the evaluated signature. See impute.knn for details. The default value is 0.8.
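
For orientation, a minimal sketch of k-nearest-neighbor imputation with impute.knn on a toy matrix is shown here. The matrix, the gene and sample names, and the choice k = 3 are made up for illustration; how ref_combat_impute builds its matrix and which settings it passes to impute.knn are not shown and may differ. The call is guarded because the impute package (Bioconductor) may not be installed.

```r
# Toy expression matrix: genes in rows, samples in columns (made-up data)
set.seed(1)
expr <- matrix(
  rnorm(60),
  nrow = 10,
  dimnames = list(paste0("gene", 1:10), paste0("s", 1:6))
)

# Pretend gene3 was not measured in two of the samples
expr["gene3", c("s5", "s6")] <- NA

if (requireNamespace("impute", quietly = TRUE)) {
  # impute.knn returns a list; the imputed matrix is in the `data` element.
  # k = 3 is an arbitrary choice for this small example.
  imputed <- impute::impute.knn(expr, k = 3)$data
} else {
  message("Package 'impute' is not installed; skipping the imputation sketch.")
}
```

Values that were observed in the toy matrix are left unchanged; only the NA entries are filled in from the nearest neighboring genes.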

Value

Gene set subset