A function for reference batch correction and imputation.
Source: R/OriginalModel.R
ref_combat_impute.Rd
A function used to perform reference batch correction and imputation in the
testing data for gene signatures that require retraining of the model.
We use k-nearest neighbors to impute the expression values of missing genes.
The imputation is performed with impute.knn.
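To illustrate the general idea of row-wise k-nearest-neighbor imputation, here is a minimal base-R sketch. It is not impute.knn and does not reproduce its algorithm or defaults. The helper `knn_impute_rows` and the toy matrix are hypothetical and exist only for illustration, assuming genes in rows and samples in columns.

```r
# Hypothetical helper for illustration only; NOT impute::impute.knn.
knn_impute_rows <- function(x, k = 2) {
  out <- x
  # Genes with no missing values serve as candidate neighbors
  complete <- which(rowSums(is.na(x)) == 0)
  for (i in which(rowSums(is.na(x)) > 0)) {
    obs <- !is.na(x[i, ])  # samples in which gene i is observed
    # Euclidean distance to each complete gene over the observed samples
    d <- apply(x[complete, obs, drop = FALSE], 1,
               function(r) sqrt(sum((r - x[i, obs])^2)))
    nn <- complete[order(d)[seq_len(min(k, length(complete)))]]
    # Fill each missing entry with the neighbors' mean in that sample
    out[i, !obs] <- colMeans(x[nn, !obs, drop = FALSE])
  }
  out
}

# Toy expression matrix: g4 is missing in the last sample
x <- rbind(
  g1 = c(1, 2, 3, 4),
  g2 = c(1, 2, 3, 6),
  g3 = c(9, 9, 9, 0),
  g4 = c(1, 2, 3, NA)
)
imputed <- knn_impute_rows(x, k = 2)
```

In this toy example the two nearest neighbors of g4 are g1 and g2, so the missing entry is filled with the mean of their values in that sample.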
Because the imputation step can be computationally expensive when a large
number of genes are missing, we impose a constraint to keep the imputation
from becoming excessive. The evaluation will not run if more than geneMax * 100%
of the genes in the corresponding gene signature are not found in the input study.
By default geneMax = 0.8, so the evaluation will not run if more than 80%
of the genes are missing when matching the input study to the reference data.
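The threshold rule above can be sketched in base R as follows. The helpers `missing_gene_fraction` and `can_evaluate` and the placeholder gene names are hypothetical and are not part of the package; only the geneMax comparison comes from the description.

```r
# Hypothetical helpers for illustration; not part of the package API.
missing_gene_fraction <- function(gene_set, input_genes) {
  # Fraction of signature genes absent from the input study's row names
  mean(!(gene_set %in% input_genes))
}

can_evaluate <- function(gene_set, input_genes, geneMax = 0.8) {
  # Evaluation is skipped when more than geneMax * 100% of genes are missing
  missing_gene_fraction(gene_set, input_genes) <= geneMax
}

# Placeholder gene names, not a real signature
gene_set <- c("geneA", "geneB", "geneC", "geneD", "geneE")
input_genes <- c("geneA", "geneB", "geneX", "geneY")

frac <- missing_gene_fraction(gene_set, input_genes)  # 3 of 5 genes missing
ok_default <- can_evaluate(gene_set, input_genes)     # default geneMax = 0.8
ok_strict <- can_evaluate(gene_set, input_genes, geneMax = 0.5)
```

Here 60% of the signature genes are missing, so the check passes with the default geneMax but fails with a stricter value of 0.5.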
Arguments
- theObject_train
A SummarizedExperiment object that has been pre-stored in the data file: OriginalTrainingData.
- useAssay
A character string or an integer specifying the assay in input, the test SummarizedExperiment object. Default is 1, indicating the first assay in the test SummarizedExperiment object.
- gene_set
A character vector that includes gene symbols for selected gene signature.
- input
A SummarizedExperiment object with gene symbols as the assay row names.
- SigName
Optional. A character string that gives the name of gene_set. SigName is used in the information reported when gene signatures are missing in the test data.
- adj
A small real number used in ComBat to handle genes with 0 counts in rare cases. Not required in most cases.
- geneMax
A real number between 0 and 1 that sets the maximum fraction of missing genes allowed in the evaluated signatures. See impute.knn for details. The default value is 0.8.