Evaluating Signatures Using Original Models
Xutao Wang
Department of Biostatistics, Boston University, Boston, MA
xutaow@bu.edu
Source: vignettes/rmd/OriginalModelTutorial.Rmd
Introduction
Some of the gene signatures included in the TBSignatureProfiler (TBSP) were originally trained using a machine learning or statistical model. For completeness, the package includes these original models so that users can run them and compare the results with the methods that serve as the TBSP's main mechanism for scoring gene signatures.
This vignette provides examples that allow users to evaluate the performance of certain signatures using these original models. Currently, the package incorporates the original methods for the gene signatures listed in the code chunk below. The specific genes within each signature can be found by looking up that signature by name in the TBsignatures data object.
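As a minimal sketch of that lookup, the snippet below prints the genes of one signature and counts the genes in a small selection. It assumes TBsignatures is a named list with one character vector of gene symbols per signature; the names sigNames, env, and sigList are introduced here for illustration only, and the call is guarded so that it does nothing when the package is not installed. The full list of signatures with original models follows this sketch.

```r
# Example signature names (both appear in the list of original models)
sigNames <- c("Anderson_42", "Zak_RISK_16")

# Guarded so the snippet is a no-op when the package is not installed
if (requireNamespace("TBSignatureProfiler", quietly = TRUE)) {
  # Load the TBsignatures data object into a private environment
  env <- new.env()
  utils::data("TBsignatures", package = "TBSignatureProfiler", envir = env)
  sigList <- env$TBsignatures

  # Gene symbols that make up a single signature
  print(sigList[["Zak_RISK_16"]])

  # Number of genes in each selected signature that is present in the list
  print(lengths(sigList[intersect(sigNames, names(sigList))]))
}
```

Using intersect() keeps the lookup from failing if a requested name is absent from the list.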
library(TBSignatureProfiler)
signatureOriginalModel <- c("Anderson_42", "Anderson_OD_51", "Kaforou_27",
"Kaforou_OD_44", "Kaforou_OD_53", "Sweeney_OD_3",
"Maertzdorf_4", "Maertzdorf_15", "LauxdaCosta_OD_3",
"Verhagen_10", "Jacobsen_3", "Sambarey_HIV_10",
"Leong_24", "Berry_OD_86", "Berry_393",
"Bloom_OD_144", "Suliman_RISK_4", "Zak_RISK_16",
"Leong_RISK_29", "Zhao_NANO_6")
Evaluation
In this tutorial, we work with HIV and tuberculosis (TB) gene expression data stored in a SummarizedExperiment object. First, we evaluate the performance of all available TB gene signatures whose original models are included in the package by setting geneSignaturesName = "".
# HIV/TB gene expression data, included in the package
hivtb_data <- TB_hiv
out <- evaluateOriginalModel(input = hivtb_data, geneSignaturesName = "",
useAssay = "counts")
out$Zak_RISK_16_OriginalModel
Users can also evaluate only a chosen subset of gene signatures by passing their names to geneSignaturesName.
outSub <- evaluateOriginalModel(input = hivtb_data,
geneSignaturesName = c("Anderson_42", "Sweeney_OD_3",
"Verhagen_10", "Zak_RISK_16"),
useAssay = "counts")
# The predicted score from each signature can be viewed by calling:
selectedSignatures <- c("Anderson_42", "Sweeney_OD_3", "Verhagen_10", "Zak_RISK_16")
SummarizedExperiment::colData(outSub)[, paste0(selectedSignatures, "_OriginalModel")]
The returned object is also a SummarizedExperiment. The scores are returned as part of its colData, with column names formatted as "Name_Of_Signature_OriginalModel". The structure of the returned object is the same as that returned by runTBsigProfiler. At this point, users can follow the guidance in the main package vignette for downstream analysis.
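The naming convention described above can be wrapped in a small helper before moving on to downstream analysis. The sketch below is not part of the package: originalModelColumns and extractOriginalModelScores are hypothetical helper names, and toyColData is a made-up stand-in for the sample annotations whose numeric values are placeholders, not real scores.

```r
# Build the colData column names under which the scores are stored
originalModelColumns <- function(signatures) {
  paste0(signatures, "_OriginalModel")
}

# Pull the score columns out of a data frame of sample annotations,
# stopping with an informative error if an expected column is absent
extractOriginalModelScores <- function(sampleData, signatures) {
  cols <- originalModelColumns(signatures)
  missingCols <- setdiff(cols, colnames(sampleData))
  if (length(missingCols) > 0) {
    stop("Missing score columns: ", paste(missingCols, collapse = ", "))
  }
  sampleData[, cols, drop = FALSE]
}

# Toy stand-in for the sample annotations; all values are placeholders
toyColData <- data.frame(
  Group = c("a", "b", "a"),
  Anderson_42_OriginalModel = c(0.1, 0.2, 0.3),
  Zak_RISK_16_OriginalModel = c(0.9, 0.8, 0.7)
)

scores <- extractOriginalModelScores(toyColData,
                                     c("Anderson_42", "Zak_RISK_16"))
print(colnames(scores))
```

With real output, the same helper could be applied to as.data.frame(SummarizedExperiment::colData(outSub)). Failing early on a missing column makes it obvious when a signature was not included in the evaluateOriginalModel call.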