Evaluating Signatures Using Original Models

Introduction

Some of the gene signatures included in the TBSignatureProfiler were originally trained using a machine learning or statistical model. In order to provide an element of completeness to our package, we have included these models for users to run and compare to the methods that serve as the main mechanism of scoring gene signatures in the TBSP.

This vignette provides some examples to allow users to evaluate certain signatures’ performance using these original models. Currently, the package has incorporated the original methods for the gene signatures listed in the code chunk below. The specific genes within each biomarker can be found by calling that gene within the TBsignatures data object.

library(TBSignatureProfiler)
signatureOriginalModel <- c("Anderson_42", "Anderson_OD_51", "Kaforou_27",
                            "Kaforou_OD_44", "Kaforou_OD_53", "Sweeney_OD_3",
                            "Maertzdorf_4", "Maertzdorf_15", "LauxdaCosta_OD_3",
                            "Verhagen_10", "Jacobsen_3", "Sambarey_HIV_10",
                            "Leong_24", "Berry_OD_86", "Berry_393",
                            "Bloom_OD_144", "Suliman_RISK_4", "Zak_RISK_16",
                            "Leong_RISK_29", "Zhao_NANO_6")

Evaluation

In this tutorial, we will work with HIV and Tuberculosis (TB) gene expression data in a SummarizedExperiment format. First, we evaluate the performance of all available TB gene signatures whose original models have been included in the package by setting geneSignaturesName = "".

# HIV/TB gene expression data, included in the package
hivtb_data <- TB_hiv
out <- evaluateOriginalModel(input = hivtb_data, geneSignaturesName = "",
                             useAssay = "counts")
out$Zak_RISK_16_OriginalModel

Users can also evaluate selected gene signatures based on their preference.

outSub <- evaluateOriginalModel(input = hivtb_data,
                                geneSignaturesName = c("Anderson_42", "Sweeney_OD_3",
                                                       "Verhagen_10", "Zak_RISK_16"),
                                useAssay = "counts")
# The predicted score from each signature can be viewed by calling: 
colData(outSub)[, paste0(c("Anderson_42", "Sweeney_OD_3", "Verhagen_10", "Zak_RISK_16"), "_OriginalModel")]

The returned object is also of the SummarizedExperiment. The scores will be returned as a part of the colData with column names formatted as “Name_Of_Signature_OriginalModel”. The structure of the returned object is the same as the one given by runTBsigProfiler. At this point, users may now follow the guidance to using the package given in the main package vignette for downstream analysis.

Xutao Wang

Introduction

Evaluation