Run TB gene signature profiling. — runTBsigProfiler • TBSignatureProfiler

Using some subset of the signatures listed in TBsignatures and specified scoring algorithms, this function runs gene signature profiling on an input gene expression dataset. It allows for scores to be computed for these signatures which can be compared using various visualization tools also provided in the TBSignatureProfiler package.

Usage

runTBsigProfiler(
  input,
  useAssay = NULL,
  signatures = NULL,
  algorithm = c("GSVA", "ssGSEA", "ASSIGN", "PLAGE", "Zscore", "singscore"),
  combineSigAndAlgorithm = FALSE,
  assignDir = NULL,
  outputFormat = NULL,
  parallel.sz = 0,
  ASSIGNiter = 1e+05,
  ASSIGNburnin = 50000,
  ssgsea_norm = TRUE,
  update_genes = TRUE
)

Source

Profiling for the Z-Score, PLAGE, GSVA, ssGSEA algorithms are all conducted with the Bioconductor GSVA package. Profiling for the singscore algorithm is conducted with the Bioconductor singscore package.

Arguments

input: an input data object of the class SummarizedExperiment, data.frame, or matrix containing gene expression data. Required.
useAssay: a character string specifying the assay to use for signature profiling when input is a SummarizedExperiment. Required only for input data of the class SummarizedExperiment. If null, the assay used will be "counts". The default is NULL.
signatures: a list of signatures to run with their associated genes. This list should be in the same format as TBsignatures, included in the TBSignatureProfiler package. If signatures = NULL, the default set of signatures TBsignatures list is used. For details, run ?TBsignatures. If <2 genes in a signature are present in the sample, that signature will not be evaluated and will not be present in the resulting SE object. The default is NULL.
algorithm: a vector of algorithms to run, or character string if only one is desired. The default is c("GSVA", "ssGSEA", "ASSIGN", "PLAGE", "Zscore", "singscore"). NOTE: ASSIGN takes a long time to run and is not recommended for efficient use.
combineSigAndAlgorithm: logical, if TRUE, output row names will be of the form _. It must be set to TRUE if the ouputFormat will be a SummarizedExperiment and length(algorithm) > 1. It will always be FALSE if only one algorithm is selected. If FALSE, there will be a column named algorithm' that lists which algorithm is used, and a column named 'pathway' that lists the signature profiled. If NULL, and one algorithm was used, the algorithm will not be listed. The default is FALSE.
assignDir: a character string naming a directory to save intermediate ASSIGN results if algorithm specifies "ASSIGN". The default is NULL, in which case intermediate results will not be saved.
outputFormat: a character string specifying the output data format. Possible values are "SummarizedExperiment", "matrix", or "data.frame". The default is to return the same type as the input object.
parallel.sz: an integer identifying the number of processors to use when running the calculations in parallel for the GSVA and ssGSEA algorithms. If parallel.sz = 0, all cores are used. The default is 0.
ASSIGNiter: an integer indicating the number of iterations to use in the MCMC for the ASSIGN algorithm. The default is 100,000.
ASSIGNburnin: an integer indicating the number of burn-in iterations to use in the MCMC for the ASSIGN algorithm. These iterations are discarded when computing the posterior means of the model parameters. The default is 50,000.
ssgsea_norm: logical, passed to GSVA::gsva(). When parameter algorithm = "ssgsea",the profiler runs the SSGSEA method from Barbie et al. (2009) normalizing the scores by the absolute difference between the minimum and the maximum, as described in their paper. When ssgsea.norm = FALSE, this last normalization step is skipped. The default is TRUE.
update_genes: logical, denotes whether gene names from signatures and the rownames of input should be checked for accuracy using HGNChelper::checkGeneSymbols(). The mapping assumes genes are from humans and will keep unmapped genes as the original input gene name. Default is TRUE.

Value

A SummarizedExperiment object, data.frame, or matrix of signature profiling results. The returned object will be of the format specified in outputFormat. If input is a SummarizedExperiment and outputFormat = "SummarizedExperiment", then the output will retain any input information stored in the input colData. In general, if outputFormat = "SummarizedExperiment" then columns in the colData will include the scores for each desired signature with samples on the rows. If input is a data.frame or matrix, then the returned object will have signatures on the rows and samples on the columns.

References

Barbie, D.A., Tamayo, P., Boehm, J.S., Kim, S.Y., Moody, S.E., Dunn, I.F., Schinzel, A.C., Sandy, P., Meylan, E., Scholl, C., et al. (2009). Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature 462, 108-112. doi: 10.1038/nature08460.

Foroutan, M. et al. (2018). Single sample scoring of molecular phenotypes. BMC Bioinformatics, 19. doi: 10.1186/s12859-018-2435-4.

Lee, E. et al. (2008). Inferring pathway activity toward precise disease classification. PLoS Comp Biol, 4(11):e1000217. doi: 10.1371/journal.pcbi.1000217

Shen, Y. et al. (2015). ASSIGN: context-specific genomic profiling of multiple heterogeneous biological pathways. Bioinformatics, 31, 1745-1753. doi: 10.1093/bioinformatics/btv031.

Subramanian, A. et al. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS, 102, 15545-15550. doi: 10.1073/pnas.0506580102.

Tomfohr, J. et al. (2005). Pathway level analysis of gene expression using singular value decomposition. BMC Bioinformatics, 6:225. doi: 10.1186/1471-2105-6-225

Examples

## Using a data.frame input/output
 # Create some toy data to test Zak_RISK_16 signature, using 5 samples with low
 # expression & five samples with high expression of the signatures genes.
df_testdata <- as.data.frame(rbind(matrix(c(rnorm(80), rnorm(80) + 5), 16, 10,
                             dimnames = list(TBsignatures$Zak_RISK_16,
                             paste0("sample", seq_len(10)))),
                      matrix(rnorm(1000), 100, 10,
                             dimnames = list(paste0("gene", seq_len(100)),
                             paste0("sample", seq_len(10))))))
res <- runTBsigProfiler(input = df_testdata,
                        signatures = TBsignatures["Zak_RISK_16"],
                        algorithm = c("GSVA", "ssGSEA"),
                        combineSigAndAlgorithm = FALSE,
                        parallel.sz = 1)
#> Parameter update_genes is TRUE. Gene names will be updated.
#> Running GSVA
#> Running ssGSEA
subset(res, res$pathway == "Zak_RISK_16")
#>       pathway algorithm            sample1            sample2
#> 1 Zak_RISK_16      GSVA -0.534666666666667 -0.600632318501171
#> 2 Zak_RISK_16    ssGSEA  0.192383825351229  0.069521160142643
#>              sample3            sample4            sample5           sample6
#> 1 -0.736618705035971 -0.631807228915663 -0.527538940809969 0.496305732484077
#> 2 -0.166684761621847 0.0714328935283431  0.114638189730784 0.833315238378153
#>             sample7           sample8           sample9          sample10
#> 1 0.681304347826087 0.499178082191781 0.586710239651416 0.540224719101124
#> 2 0.833315238378153 0.832308838099057 0.833315238378153 0.833315238378153

## Using a SummarizedExperiment input/output
 # The TB_indian SummarizedExperiment data is included in the package.
GSVA_res <- runTBsigProfiler(input = TB_indian,
                             useAssay = "logcounts",
                             signatures = TBsignatures["Zak_RISK_16"],
                             algorithm = c("GSVA"),
                             combineSigAndAlgorithm = FALSE,
                             parallel.sz = 1)
#> Parameter update_genes is TRUE. Gene names will be updated.
#> Running GSVA
#> ! 1 genes with constant values throughout the samples
#> ! Genes with constant values are discarded
GSVA_res$Zak_RISK_16
#>  [1]  0.6255317  0.6709665  0.4502341  0.7240338  0.5423896  0.5076658
#>  [7]  0.6268138 -0.3785315  0.5837650  0.3315927  0.5688302  0.5667150
#> [13]  0.6943893 -0.6998614  0.3244909  0.6782273 -0.6715210 -0.6758541
#> [19] -0.5738776 -0.7008174 -0.5409373 -0.5289045 -0.5812383 -0.6514118
#> [25] -0.6726977 -0.2904185 -0.5783098 -0.6327992 -0.6770838 -0.4487168
#> [31] -0.6444862 -0.1045107  0.6871138  0.1003890 -0.4337305 -0.5566492
#> [37]  0.3517130 -0.3596061  0.6574704 -0.3224089  0.2424746  0.2679882
#> [43]  0.1508951  0.3123478