
Introduction

This tutorial walks step by step through adding signatures to the TBSignatureProfiler: updating the package's data objects, adding the publication information to the signature list documentation, and submitting the updated package to the official wejlab repository.

There are four steps to adding a TB gene signature to the TBSignatureProfiler:

  1. Collecting information from the publication source of the signature

  2. Updating the data objects with the new signature

  3. Adding the signature to the appropriate documentation

  4. Submitting a pull request to the BUMC Division of Computational Biomedicine’s (wejlab) GitHub repository where the package is located.

To illustrate the process, we will use a simple 3-transcript signature published in 2015 by Laux da Costa et al.

Adding multiple signatures

If you need to add more than one signature, repeat the steps of this vignette for each one. Before updating the data objects for each subsequent signature, run devtools::load_all() so that the objects you just saved, including the previously added signature, are reloaded.

Setup

Please be sure to download the latest version of the TBSignatureProfiler from GitHub onto your local machine, then set the package folder as your working directory. We will use devtools to load the package.

library(knitr)

if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")

devtools::load_all()
##  Loading TBSignatureProfiler

Gathering Information

The first step in this process is gathering the necessary information about the signature(s) that you would like to add. For each signature, you will need to access the signature’s associated publication, and make a note (a table will be helpful) of these crucial pieces of information:

What to collect

The gene transcripts composing the signature

Currently, all signatures in the package are stored as gene symbols (not Entrez or Ensembl IDs, although these may be supported in the future). The gene symbols composing a signature are usually listed in the body of the article, printed in a figure, or provided as a table in the supplementary materials. If the symbols are not readily available, you can try contacting the authors to obtain them.
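Symbols copied from a PDF or a supplementary table often carry stray spaces, mixed separators, or inconsistent capitalization. The sketch below is a hypothetical helper, clean_symbols(), which is not part of the TBSignatureProfiler; it assumes human gene symbols (hence the upper-casing) pasted as a single comma-, semicolon-, or space-separated string.

```r
# Hypothetical helper (not part of TBSignatureProfiler): tidy a string of
# gene symbols copied from an article or supplementary table.
clean_symbols <- function(x) {
  # Split on commas, semicolons, or any whitespace
  parts <- unlist(strsplit(x, "[,;[:space:]]+"))
  # Drop empty strings produced by leading separators
  parts <- parts[nzchar(parts)]
  # Assumption: human gene symbols, which are conventionally upper case
  parts <- toupper(parts)
  # Report duplicates, then keep the first occurrence of each symbol
  if (anyDuplicated(parts)) {
    warning("Duplicate symbols removed: ",
            paste(unique(parts[duplicated(parts)]), collapse = ", "))
  }
  unique(parts)
}

clean_symbols(" GBP5, CD64;  GZMA ")  # "GBP5" "CD64" "GZMA"
```

Always compare the cleaned vector against the publication itself; the helper only normalizes formatting and cannot tell whether a symbol is correct.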

The name of the first author listed

Take note of the last name of the first author listed - this will be used in naming the signature according to package naming standards.
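The surname is later passed to addTBsignature() as the authname argument with spaces omitted. A minimal sketch using a hypothetical helper, make_authname(); dropping hyphens and apostrophes as well is an assumption, not a documented package rule.

```r
# Hypothetical helper: build the authname argument from a first author's
# surname by removing spaces (and, as an assumption, hyphens/apostrophes).
make_authname <- function(surname, suffix = "") {
  paste0(gsub("[[:space:]'-]", "", surname), suffix)
}

make_authname("Laux da Costa")       # "LauxdaCosta"
make_authname("Laux da Costa", "2")  # "LauxdaCosta2", the test name used in this vignette
```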

Disease context

We need to know what the signature is intended to discriminate. The signature type indicates whether the signature was developed to distinguish TB from LTBI (“Disease”), TB from some combination of other diseases and possibly LTBI (“Disease/Other Diseases”), TB from human immunodeficiency virus (HIV) (“Disease/HIV”), or TB from pneumonia (“Disease/Pneumonia”); or whether it was developed to identify risk of progression to TB (“risk”), identify risk of TB treatment failure (“failure”), or classify treatment responses, i.e., distinguish failures from cures (“response”). Whatever the designation, record it exactly as written in parentheses above (e.g., “risk” or “Disease/Other Diseases”). These designations are case sensitive when stored in the signature annotation object in the package; more details are given later in the tutorial.
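Because these designations are case sensitive, it can help to check a value before calling addTBsignature(). The sketch below uses a hypothetical helper, check_sigtype(), with the allowed values taken from the list above; whether addTBsignature() performs an equivalent check itself is not covered here.

```r
# Allowed signature types, spelled exactly as listed above (case sensitive)
valid_sigtypes <- c("Disease", "Disease/Other Diseases", "Disease/HIV",
                    "Disease/Pneumonia", "risk", "failure", "response")

# Hypothetical check: stop early on a typo or wrong capitalization
check_sigtype <- function(sigtype) {
  if (!sigtype %in% valid_sigtypes) {
    # Suggest the intended value when only the capitalization is wrong
    hit <- valid_sigtypes[tolower(valid_sigtypes) == tolower(sigtype)]
    stop("Unrecognized sigtype '", sigtype, "'",
         if (length(hit) > 0) paste0("; did you mean '", hit, "'?") else "",
         call. = FALSE)
  }
  invisible(sigtype)
}

check_sigtype("Disease/Other Diseases")  # passes silently
try(check_sigtype("Risk"))               # error suggests 'risk'
```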

Tissue type

The tissue type variable denotes whether the signature was developed using samples of either whole blood/paxgene or peripheral blood mononuclear cells (PBMCs). Due to the manipulation of cells inherently required to obtain PBMCs, many scientists prefer to use only whole blood samples for analysis. Also, note that “peripheral blood” without the peripheral blood mononuclear cell designation usually refers to whole blood, so be sure to record it correctly. If you are unsure what the tissue type is, get a second opinion to ensure correctness. Record this variable as “whole blood”, “PBMC”, or “mixed”.
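As a rough aid, the hypothetical helper below maps a paper's sample description onto the recorded tissue types using the rules in this section: any mention of PBMCs or mononuclear cells means "PBMC", while whole blood, PAXgene, or plain "peripheral blood" means "whole blood". It does not attempt to detect "mixed" and returns NA when nothing matches, so a person still makes the final call.

```r
# Hypothetical helper: map a free-text sample description to a tissue type.
# PBMC is tested first because "peripheral blood mononuclear cells" also
# contains the phrase "peripheral blood".
normalize_tissue <- function(desc) {
  d <- tolower(desc)
  if (grepl("pbmc|mononuclear", d)) return("PBMC")
  if (grepl("whole blood|paxgene|peripheral blood", d)) return("whole blood")
  NA_character_  # unknown or "mixed": decide by reading the methods section
}

normalize_tissue("Peripheral blood mononuclear cells")  # "PBMC"
normalize_tissue("PAXgene tubes")                       # "whole blood"
```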

Reference information

Please copy down the citation and the DOI number of the article, to be used in the documentation. Provide the reference in AMA format, if possible.
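Journals often display the DOI as a link, while the mkTBreference() function used later expects only the DOI number. A minimal sketch of a hypothetical clean_doi() helper that strips common resolver prefixes and checks the general 10.xxxx/suffix shape (a loose check, not full DOI validation):

```r
# Hypothetical helper: reduce a DOI link to the bare DOI number
clean_doi <- function(doi) {
  doi <- trimws(doi)
  # Strip resolver prefixes such as https://doi.org/ or http://dx.doi.org/
  doi <- sub("^(https?://)?(dx\\.)?doi\\.org/", "", doi, ignore.case = TRUE)
  # Strip a leading "doi:" label
  doi <- sub("^doi:\\s*", "", doi, ignore.case = TRUE)
  # A DOI starts with a "10." registrant prefix, then "/" and a suffix
  if (!grepl("^10\\.[0-9]{4,9}/\\S+$", doi)) stop("Does not look like a DOI: ", doi)
  doi
}

clean_doi("https://doi.org/10.1016/j.tube.2015.04.008")  # "10.1016/j.tube.2015.04.008"
```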

Assigned publication signature name

Occasionally, authors name their signature in the originating publication (or a peer names it in a later one). If so, note this common name; for convenience, the package uses it alongside a name assigned under our own nomenclature system. Examples of alternative names include RISK6, PREDICT29, and ACS_COR.

If any details cannot be clearly determined

At this point, you should have recorded all of the necessary information. If you are missing any of the pieces of information listed above, then you will not be able to add the signature at this time. If you have any concerns about recording the correct information, you can reach out to the package maintainer, Aubrey Odom, at .

Information table for Laux da Costa signature

As mentioned before, a table will be useful for keeping track, especially if you plan to use this vignette to update the profiler with multiple signatures. Below is a table of information collected for the Laux da Costa signature.

| Item Needed | Signature Information |
|---|---|
| Gene Transcripts | GBP5, CD64, GZMA |
| Author Name | Laux da Costa |
| Disease Context | Disease/Other Diseases |
| Tissue Type | whole blood |
| Alternate Name | N/A |
| Reference | Laux da Costa L, Delcroix M, Dalla Costa ER, et al. A real-time PCR signature to discriminate between tuberculosis and other pulmonary diseases. Tuberculosis (Edinb). 2015;95(4):421-425. |
| DOI Number | 10.1016/j.tube.2015.04.008 |
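The same table can also be kept as an R list whose element names match the addTBsignature() arguments used later, which makes it easy to confirm that nothing required is missing before going further. The object name laux and the completeness check are illustrative only, not part of the package.

```r
# Information for the Laux da Costa signature, named after the
# addTBsignature() arguments (the object name "laux" is illustrative)
laux <- list(
  sigsymbols     = c("GBP5", "CD64", "GZMA"),
  authname       = "LauxdaCosta2",  # spaces removed; "2" marks a test copy
  sigtype        = "Disease/Other Diseases",
  tissuetype     = "whole blood",
  signame_common = NULL             # no alternative name was published
)

# Every required field must be non-empty and free of NA values
required <- c("sigsymbols", "authname", "sigtype", "tissuetype")
is_filled <- vapply(required, function(field) {
  value <- laux[[field]]
  length(value) > 0 && !anyNA(value) && all(nzchar(value))
}, logical(1))
missing_fields <- required[!is_filled]
if (length(missing_fields) > 0) {
  stop("Missing information: ", paste(missing_fields, collapse = ", "))
}

# With the package loaded via devtools::load_all(), a test run could be:
# do.call(addTBsignature, c(laux, list(saveobjs = FALSE, views = TRUE)))
```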

Updating the data objects

To streamline the process of adding signatures, the package includes the addTBsignature() function. Because it is not exported, it must be called as TBSignatureProfiler:::addTBsignature(); after devtools::load_all(), which makes unexported functions available by default, it can also be called directly as addTBsignature(), as in the examples below. The function takes the pieces of information listed previously, plus a few other parameters, and updates the package's TBsignatures and TBcommon signature lists as well as the sigAnnotData and common_sigAnnotData annotation data tables. After that, we only need to update the documentation for the signature objects and submit the changes to the repository to complete the addition. But first, we need to ensure that the data objects can be updated correctly.

The arguments in the code below map directly onto the table created above; if anything is unclear, run ?TBSignatureProfiler:::addTBsignature for details on each parameter.

One parameter deserves special attention: saveobjs. This logical argument determines whether the call is a test run (FALSE) or the real thing (TRUE, saving all the objects with the new signature). We will first run the function with saveobjs = FALSE. This does not save or overwrite any object files in the data directory; it only prints error and progress messages and, if views = TRUE, opens each updated object in a data viewer with View() so we can see what the final objects will look like. The function never returns the updated objects to the working environment, regardless of the value of saveobjs, which is why the data views are the way to inspect the result of a test run.

Test run

We will run the function using the information gathered from the Laux da Costa publication (note that spaces should be omitted from the author name):

# Append a 2 since the signature is already in the package
# This is merely for testing
addTBsignature(sigsymbols = c("GBP5", "CD64", "GZMA"),
               authname = "LauxdaCosta2",
               sigtype = "Disease/Other Diseases",
               tissuetype = "whole blood",
               signame_common = NULL,
               saveobjs = FALSE,
               views = FALSE)
## The assigned signature name is LauxdaCosta2_OD_3
## No alternative signature name was provided
## TBsignatures object updated
## TBcommon object updated
## sigAnnotData object updated
## common_sigAnnotData updated

Because this is a vignette, we set views = FALSE, but in practice it is best to leave views = TRUE to confirm that everything went as planned. The viewers let you check that the signature name looks right, that the signatures are sorted alphabetically by their package-assigned names (the TBsignatures ordering is carried over to the alternative-name objects), and that the correct information was entered. Note that sigAnnotData uses some abbreviations, so it will not contain exactly the same text that you supplied.
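The ordering described above can also be checked programmatically. A minimal sketch with a hypothetical is_alphabetical() helper and a toy list; note that sort() follows the current locale's collation rules, so the result could differ between systems for names with unusual characters (an assumption worth keeping in mind).

```r
# Hypothetical check: are the list's names already in sorted order?
is_alphabetical <- function(sig_list) {
  nms <- names(sig_list)
  identical(nms, sort(nms))
}

# Toy list for illustration; after saving and reloading with
# devtools::load_all(), the same call could be run on TBsignatures
toy <- list(Alpha_5 = "G1", LauxdaCosta2_OD_3 = c("GBP5", "CD64", "GZMA"),
            Zeta_10 = "G2")
is_alphabetical(toy)       # TRUE
is_alphabetical(rev(toy))  # FALSE
```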

Overwriting the RDS files

Assuming there are no glaring errors and everything looks correct, run the function again with saveobjs = TRUE to save the new objects, overwriting the existing data files. We do not run this code here because this is a vignette, but it prints the same messages as the test run, plus a message confirming that the objects were written to the data folder.

addTBsignature(sigsymbols = c("GBP5", "CD64", "GZMA"),
               authname = "LauxdaCosta2",
               signame_common = NULL,
               sigtype = "Disease/Other Diseases",
               tissuetype = "whole blood",
               saveobjs = TRUE,
               views = FALSE)

With that, the data objects themselves should be updated. To inspect them, run devtools::load_all() to reload your local copy of the package, then load the data objects with, for example, data("TBsignatures") or data("common_sigAnnotData").

If you overwrote the data files by accident and need to fix them, download the original files again from the GitHub repository and try again. Be sure to rerun devtools::load_all() afterwards.

Documentation

The documentation is the only place in the package where the source of a signature is recorded, so it must be updated every time a signature is added. The only documentation that needs new entries is that of the list objects TBsignatures and TBcommon. The table looks messy when viewed raw, but the rendered version is much easier to read (open it with ?TBsignatures or ?TBcommon).

Each entry must follow a specific format, but you can easily create it with the function provided below. The function only requires the name of the signature reported by addTBsignature(), the reference (without the DOI, which is a separate parameter), and the DOI number (not the link).

An example is shown below for the LauxdaCosta2_OD_3 test signature. The documentation itself is located in the file R/data.R inside the package file structure.

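# Print an Rd \item entry for the TBsignatures/TBcommon documentation tables.
# TBsigname: signature name; ref: reference without the DOI;
# DOInum: bare DOI number, used both as the link target and the link text.
mkTBreference <- function(TBsigname, ref, DOInum) {
  cat("\\item{\\strong{", TBsigname, "}}{: ", ref,
      " \\href{http://dx.doi.org/", DOInum, "}{", DOInum, "}}", sep = "")
}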

mkTBreference(TBsigname = "LauxdaCosta2_OD_3", # The name output in a message from addTBsignature()
              ref = "Laux da Costa L, Delcroix M, Dalla Costa ER, et al. A real-time PCR signature to discriminate between tuberculosis and other pulmonary diseases. Tuberculosis (Edinb). 2015;95(4):421-425.",
              DOInum = "10.1016/j.tube.2015.04.008")
## \item{\strong{LauxdaCosta2_OD_3}}{: Laux da Costa L, Delcroix M, Dalla Costa ER, et al. A real-time PCR signature to discriminate between tuberculosis and other pulmonary diseases. Tuberculosis (Edinb). 2015;95(4):421-425. \href{http://dx.doi.org/10.1016/j.tube.2015.04.008}{10.1016/j.tube.2015.04.008}}

If the signature has an alternative publication name, rerun mkTBreference() with that name as the TBsigname argument to create the entry for the TBcommon documentation table. Otherwise, the package-assigned signature name can be used for both the TBcommon and TBsignatures entries.

Once you have the required entries, open R/data.R. First, find the table for TBsignatures and insert the entry (prefixed with #' like all other entries) at the appropriate position: each signature should occupy the same position in the table as its name does in names(TBsignatures). Once this is complete, do the same for the TBcommon table, which is a little further down in the file.
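Two small chores here can be scripted: producing the entry with its #' prefix, and finding where it belongs. The sketch below defines a hypothetical mkTBreference_line(), a variant of mkTBreference() that returns the entry as a string instead of printing it, and a hypothetical entry_position() that reports the new name's position in sorted order. Both are illustrations, not package functions, and sort() is locale dependent.

```r
# Hypothetical variant of mkTBreference(): return the roxygen line
# (with the "#' " prefix used in R/data.R) instead of printing it
mkTBreference_line <- function(TBsigname, ref, DOInum) {
  paste0("#' \\item{\\strong{", TBsigname, "}}{: ", ref,
         " \\href{http://dx.doi.org/", DOInum, "}{", DOInum, "}}")
}

# Hypothetical helper: 1-based position of new_name among the sorted
# existing names, e.g. existing_names = names(TBsignatures)
entry_position <- function(existing_names, new_name) {
  match(new_name, sort(c(existing_names, new_name)))
}

entry_position(c("Anderson_42", "Zak_RISK_16"), "LauxdaCosta2_OD_3")  # 2
```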

Please make sure to compile the documentation by running devtools::document() upon completion of your documentation edits.

Pull in your changes to the repository

The very last step of this process is to submit a pull request to the GitHub repository. If you have never submitted a pull request to another repo before, instructions can be found here. When submitting your request, please list the signature(s) you added by name and the publications they come from, so that we can check that the update was completed successfully. We will try to approve pull requests as soon as possible. After the request is approved, your signature will be part of the package. Thank you for your contribution!

Session Information

## R version 4.4.0 (2024-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: AlmaLinux 8.9 (Midnight Oncilla)
## 
## Matrix products: default
## BLAS/LAPACK: FlexiBLAS NETLIB;  LAPACK version 3.11.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] TBSignatureProfiler_1.17.1 testthat_3.2.1.1          
## [3] knitr_1.46                 BiocStyle_2.32.0          
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3          rstudioapi_0.16.0          
##   [3] jsonlite_1.8.8              shape_1.4.6.1              
##   [5] magrittr_2.0.3              magick_2.8.3               
##   [7] rmarkdown_2.26              GlobalOptions_0.1.2        
##   [9] fs_1.6.4                    zlibbioc_1.50.0            
##  [11] ragg_1.3.0                  vctrs_0.6.5                
##  [13] memoise_2.0.1               htmltools_0.5.8.1          
##  [15] S4Arrays_1.4.0              usethis_2.2.3              
##  [17] Rhdf5lib_1.26.0             rhdf5_2.48.0               
##  [19] SparseArray_1.4.0           sass_0.4.9                 
##  [21] bslib_0.7.0                 htmlwidgets_1.6.4          
##  [23] desc_1.4.3                  plyr_1.8.9                 
##  [25] cachem_1.0.8                mime_0.12                  
##  [27] lifecycle_1.0.4             iterators_1.0.14           
##  [29] pkgconfig_2.0.3             rsvd_1.0.5                 
##  [31] Matrix_1.7-0                R6_2.5.1                   
##  [33] fastmap_1.1.1               GenomeInfoDbData_1.2.12    
##  [35] MatrixGenerics_1.16.0       shiny_1.8.1.1              
##  [37] clue_0.3-65                 digest_0.6.35              
##  [39] colorspace_2.1-0            singscore_1.24.0           
##  [41] AnnotationDbi_1.66.0        S4Vectors_0.42.0           
##  [43] DESeq2_1.44.0               rprojroot_2.0.4            
##  [45] irlba_2.3.5.1               pkgload_1.3.4              
##  [47] textshaping_0.3.7           GenomicRanges_1.56.0       
##  [49] RSQLite_2.3.6               beachmat_2.20.0            
##  [51] fansi_1.0.6                 gdata_3.0.0                
##  [53] httr_1.4.7                  abind_1.4-5                
##  [55] compiler_4.4.0              remotes_2.5.0              
##  [57] bit64_4.0.5                 withr_3.0.0                
##  [59] doParallel_1.0.17           ROCit_2.1.1                
##  [61] BiocParallel_1.38.0         DBI_1.2.2                  
##  [63] pkgbuild_1.4.4              HDF5Array_1.32.0           
##  [65] DelayedArray_0.30.0         sessioninfo_1.2.2          
##  [67] rjson_0.2.21                ASSIGN_1.40.0              
##  [69] gtools_3.9.5                tools_4.4.0                
##  [71] httpuv_1.6.15               glue_1.7.0                 
##  [73] rhdf5filters_1.16.0         promises_1.3.0             
##  [75] grid_4.4.0                  reshape2_1.4.4             
##  [77] cluster_2.1.6               generics_0.1.3             
##  [79] gtable_0.3.5                tidyr_1.3.1                
##  [81] BiocSingular_1.20.0         ScaledMatrix_1.12.0        
##  [83] utf8_1.2.4                  XVector_0.44.0             
##  [85] BiocGenerics_0.50.0         foreach_1.5.2              
##  [87] pillar_1.9.0                stringr_1.5.1              
##  [89] GSVA_1.52.0                 limma_3.60.0               
##  [91] later_1.3.2                 circlize_0.4.16            
##  [93] dplyr_1.1.4                 lattice_0.22-6             
##  [95] bit_4.0.5                   annotate_1.82.0            
##  [97] tidyselect_1.2.1            SingleCellExperiment_1.26.0
##  [99] ComplexHeatmap_2.20.0       locfit_1.5-9.9             
## [101] Biostrings_2.72.0           miniUI_0.1.1.1             
## [103] bookdown_0.39               IRanges_2.38.0             
## [105] edgeR_4.2.0                 SummarizedExperiment_1.34.0
## [107] stats4_4.4.0                xfun_0.43                  
## [109] Biobase_2.64.0              statmod_1.5.0              
## [111] brio_1.1.5                  devtools_2.4.5             
## [113] matrixStats_1.3.0           DT_0.33                    
## [115] stringi_1.8.3               UCSC.utils_1.0.0           
## [117] yaml_2.3.8                  evaluate_0.23              
## [119] codetools_0.2-20            tibble_3.2.1               
## [121] graph_1.82.0                BiocManager_1.30.22        
## [123] cli_3.6.2                   xtable_1.8-4               
## [125] systemfonts_1.0.6           munsell_0.5.1              
## [127] jquerylib_0.1.4             Rcpp_1.0.12                
## [129] GenomeInfoDb_1.40.0         png_0.1-8                  
## [131] XML_3.99-0.16.1             parallel_4.4.0             
## [133] ellipsis_0.3.2              blob_1.2.4                 
## [135] pkgdown_2.0.9               ggplot2_3.5.1              
## [137] profvis_0.3.8               urlchecker_1.0.1           
## [139] sparseMatrixStats_1.16.0    SpatialExperiment_1.14.0   
## [141] GSEABase_1.66.0             scales_1.3.0               
## [143] purrr_1.0.2                 crayon_1.5.2               
## [145] GetoptLong_1.0.5            rlang_1.1.3                
## [147] KEGGREST_1.44.0