Submitting Signatures to the TBSP Package
Aubrey Odom
Program in Bioinformatics, Boston University, Boston, MA; aodom@bu.edu
Source: vignettes/rmd/SignatureAddition.Rmd
Introduction
This tutorial is a step-by-step walkthrough for adding signatures: performing the necessary data object updates, updating the publication information in the signature list documentation, and then submitting the updated package for inclusion in the official version in the wejlab repository.
There are four steps to adding a TB gene signature to the TBSignatureProfiler:
Collecting information from the publication source of the signature
Updating the data objects with the new signature
Adding the signature to the appropriate documentation
Submitting a pull request to the BUMC Division of Computational Biomedicine’s (wejlab) GitHub repository where the package is located.
To illustrate the process of adding signatures, we will use a simple 3-transcript signature published in 2015 by Laux da Costa et al. as an example.
Adding multiple signatures
If you need to add more than one signature, note that you will need to repeat the steps of this vignette for each signature. Be sure to run `devtools::load_all()` before updating the data objects for each subsequent signature.
Setup
Please be sure to download the latest version of the TBSignatureProfiler from GitHub onto your local machine. Navigate to that folder and set it as your working directory. We will use devtools to load the package.
library(knitr)
if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")
devtools::load_all()
## ℹ Loading TBSignatureProfiler
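Because `devtools::load_all()` loads whichever package sits in the working directory, it can help to confirm that the working directory really is the package root before going further. The following is a minimal sketch using base R only; `is_package_root()` is a hypothetical helper written for this tutorial, not a function provided by TBSignatureProfiler or devtools.

```r
# Hypothetical helper (not part of any package): returns TRUE when `path`
# contains a DESCRIPTION file whose Package field equals `pkg`.
is_package_root <- function(path = ".", pkg = "TBSignatureProfiler") {
  desc <- file.path(path, "DESCRIPTION")
  if (!file.exists(desc)) {
    return(FALSE)
  }
  # read.dcf() returns a character matrix; a missing field comes back as NA
  pkg_name <- unname(read.dcf(desc, fields = "Package")[1, 1])
  identical(pkg_name, pkg)
}

# Check the current working directory before calling devtools::load_all()
is_package_root(".")
```

If this returns `FALSE`, set the working directory to the folder containing the package's DESCRIPTION file and try again.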
Gathering Information
The first step in this process is gathering the necessary information about the signature(s) that you would like to add. For each signature, you will need to access the signature’s associated publication, and make a note (a table will be helpful) of these crucial pieces of information:
What to collect
The gene transcripts composing the signature
Currently, all signatures in the package are stored using gene symbols (not Entrez or Ensembl IDs, although these may be implemented in the future). The gene symbols composing a signature are usually either listed in the body of the article, printed in a figure, or available as a table in the supplementary materials. Sometimes the symbols aren’t readily available - in which case, you can try to contact the author to obtain them.
The name of the first author listed
Take note of the last name of the first author listed - this will be used in naming the signature according to package naming standards.
Disease context
We need to know what the signature is intended to discriminate. The signature type indicates whether the signature was developed to distinguish TB from LTBI (“Disease”), TB from some combination of other diseases and possibly LTBI (“Disease/Other Diseases”), TB from Human Immunodeficiency Virus (“Disease/HIV”), or TB from pneumonia (“Disease/Pneumonia”), or to identify risk of progression to TB (“risk”), identify risk of TB treatment failure (“failure”), or classify treatment responses (i.e., failures from cures, “response”). Whatever the designation, please record it exactly as listed in parentheses above (e.g., “risk”, “Disease/Other Diseases”). Note that these designations are case sensitive when we store them in the signature annotation object in the package. More details on this are given later in the tutorial.
Tissue type
The tissue type variable denotes whether the signature was developed using samples of either whole blood/paxgene or peripheral blood mononuclear cells (PBMCs). Due to the manipulation of cells inherently required to obtain PBMCs, many scientists prefer to use only whole blood samples for analysis. Also, note that “peripheral blood” without the peripheral blood mononuclear cell designation usually refers to whole blood, so be sure to record it correctly. If you are unsure what the tissue type is, get a second opinion to ensure correctness. Record this variable as “whole blood”, “PBMC”, or “mixed”.
Reference information
Please copy down the citation and the DOI number of the article, to be used in the documentation. Provide the reference in AMA format, if possible.
Assigned publication signature name
Sometimes, but not typically, authors name their signatures in the originating publication (or that of a peer). If this is the case, take note of this common name. We will use this name in the package alongside a name that we develop according to our own nomenclature system for convenience. Examples of alternative names include RISK6, PREDICT29, and ACS_COR.
If any details cannot be clearly determined
At this point, you should have recorded all of the necessary information. If you are missing any of the pieces of information listed above, then you will not be able to add the signature at this time. If you have any concerns about recording the correct information, you can reach out to the package maintainer, Aubrey Odom, at aodom@bu.edu.
Information table for Laux da Costa signature
As mentioned before, a table will be useful for keeping track, especially if you plan to use this vignette to update the profiler with multiple signatures. Below is a table of information collected for the Laux da Costa signature.
Item Needed | Signature Information |
---|---|
Gene Transcripts | GBP5, CD64, GZMA |
Author Name | Laux da Costa |
Disease Context | Disease/Other Diseases |
Tissue Type | whole blood |
Alternate Name | N/A |
Reference | Laux da Costa L, Delcroix M, Dalla Costa ER, et al. A real-time PCR signature to discriminate between tuberculosis and other pulmonary diseases. Tuberculosis (Edinb). 2015;95(4):421-425. |
DOI Number | 10.1016/j.tube.2015.04.008 |
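The same table can also be kept as an R list and checked against the designations described above before anything is passed to the package. This is only a sketch: `check_sig_info()` and the element names `reference` and `doi` are hypothetical and chosen for this tutorial, while the allowed values are copied from the "Disease context" and "Tissue type" descriptions above.

```r
# Information collected for the Laux da Costa signature
sig_info <- list(
  sigsymbols     = c("GBP5", "CD64", "GZMA"),
  authname       = "LauxdaCosta",            # spaces omitted from the author name
  sigtype        = "Disease/Other Diseases",
  tissuetype     = "whole blood",
  signame_common = NULL,                      # no alternative name was published
  reference      = "Laux da Costa L, Delcroix M, Dalla Costa ER, et al. A real-time PCR signature to discriminate between tuberculosis and other pulmonary diseases. Tuberculosis (Edinb). 2015;95(4):421-425.",
  doi            = "10.1016/j.tube.2015.04.008"
)

# Hypothetical helper: returns a character vector of problems (empty if none)
check_sig_info <- function(info) {
  valid_sigtypes <- c("Disease", "Disease/Other Diseases", "Disease/HIV",
                      "Disease/Pneumonia", "risk", "failure", "response")
  valid_tissues <- c("whole blood", "PBMC", "mixed")
  problems <- character(0)
  if (length(info$sigsymbols) == 0 || anyNA(info$sigsymbols)) {
    problems <- c(problems, "gene symbols are missing")
  }
  if (!is.character(info$authname) || length(info$authname) != 1 ||
      grepl("\\s", info$authname)) {
    problems <- c(problems, "author name must be a single string without spaces")
  }
  # The designations are case sensitive, so compare them exactly
  if (!isTRUE(info$sigtype %in% valid_sigtypes)) {
    problems <- c(problems, "unrecognized signature type")
  }
  if (!isTRUE(info$tissuetype %in% valid_tissues)) {
    problems <- c(problems, "unrecognized tissue type")
  }
  problems
}

check_sig_info(sig_info)
```

An empty result means that every item needed later in the tutorial has been recorded in an accepted form.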
Updating the data objects
To streamline the process of adding signatures, we have introduced the `addTBsignature()` function to the package. The function is not exported, so it must be called as `TBSignatureProfiler:::addTBsignature()` (after `devtools::load_all()`, it is also available directly as `addTBsignature()`). It takes as input the various pieces of information listed previously, as well as a few other parameters, and updates the package’s `TBsignatures` and `TBcommon` signature lists, as well as the `sigAnnotData` and `common_sigAnnotData` annotation data tables. After doing so, we will just need to update the documentation for the signature objects and submit the updates to the repository, and the addition of the signature will be complete. But first, we need to ensure that the data objects can be updated correctly.
The code below shows what needs to be supplied, based on the table created above. If anything is unclear, please run `?TBSignatureProfiler:::addTBsignature` for more details on the parameters.
One parameter deserves special attention: `saveobjs`. It takes a logical value that denotes whether this is a ‘test run’ or the real thing (i.e., saving all the objects with the new signature). Here, we will first run the function with `saveobjs = FALSE`. Doing so will not save or overwrite any object files in the data directory; it will only produce error messages and other progress messages, and (if `views = TRUE`) invoke a data viewer with `View()` on each of the objects so we can see what the final objects will look like. Note that the function never returns anything to be saved to the working environment, regardless of what `saveobjs` is set to; this is why the data views are provided when nothing is actually being saved to the RDS files in the data directory.
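The test-run pattern described here can be illustrated with a toy function. This is a sketch of the general idea only and is not the package's `addTBsignature()` implementation: `add_signature_sketch()`, its `outfile` argument, and the placeholder signature `AuthorA_OD_4` are all hypothetical, and the file is written to a temporary directory rather than to the package's data directory.

```r
# Toy stand-in for the dry-run pattern; NOT the package's addTBsignature().
add_signature_sketch <- function(siglist, signame, sigsymbols,
                                 saveobjs = FALSE,
                                 outfile = file.path(tempdir(), "toy_signatures.rds")) {
  # Build the updated object in memory and keep the names in order
  siglist[[signame]] <- sigsymbols
  siglist <- siglist[order(names(siglist))]
  message("Toy signature list updated with ", signame)
  # Only touch the file system when the caller asks for it
  if (saveobjs) {
    saveRDS(siglist, outfile)
    message("Toy signature list written to ", outfile)
  }
  # Nothing is returned to the working environment in either mode
  invisible(NULL)
}

toy_signatures <- list(AuthorA_OD_4 = c("G1", "G2", "G3", "G4"))

# Test run: progress messages only, no file is written
add_signature_sketch(toy_signatures, "LauxdaCosta2_OD_3",
                     c("GBP5", "CD64", "GZMA"), saveobjs = FALSE)
```

Running the same call with `saveobjs = TRUE` is the only way the toy list reaches disk, which mirrors why a test run is safe to repeat as often as needed.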
Test run
We will run the function using the information gathered from the Laux da Costa publication (note that spaces should be omitted from the author name):
# Append a 2 since the signature is already in the package
# This is merely for testing
addTBsignature(sigsymbols = c("GBP5", "CD64", "GZMA"),
               authname = "LauxdaCosta2",
               sigtype = "Disease/Other Diseases",
               tissuetype = "whole blood",
               signame_common = NULL,
               saveobjs = FALSE,
               views = FALSE)
## The assigned signature name is LauxdaCosta2_OD_3
## No alternative signature name was provided
## TBsignatures object updated
## TBcommon object updated
## sigAnnotData object updated
## common_sigAnnotData updated
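Judging from the first message, the assigned name appears to join the author name, an abbreviation for the signature type, and the number of genes with underscores. The sketch below reproduces that pattern for this one example; `sketch_signame()` is a hypothetical helper, the abbreviation is passed in by hand, and only the "OD" abbreviation for "Disease/Other Diseases" is confirmed by the output above.

```r
# Hypothetical helper, not part of TBSignatureProfiler: rebuilds the naming
# pattern seen in the message above (author, type abbreviation, gene count).
sketch_signame <- function(authname, type_abbrev, sigsymbols) {
  paste(authname, type_abbrev, length(sigsymbols), sep = "_")
}

sketch_signame("LauxdaCosta2", "OD", c("GBP5", "CD64", "GZMA"))
# [1] "LauxdaCosta2_OD_3"
```

The name reported by `addTBsignature()` is the one to use in the documentation step; this sketch is only meant to make the pattern easier to recognize.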
Since this is a vignette, we set `views = FALSE`, but in practice it is best to keep it set to `TRUE` to confirm that everything goes as planned. This will allow you to peruse the tables/lists and make sure that the signature name looks right, that the signatures are all ordered alphabetically by package-assigned name (the `TBsignatures` ordering is carried over to the alternative name objects), and that the correct information was entered. Note that `sigAnnotData` uses some abbreviations and won’t contain exactly the same information that you put in.
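The visual checks described above can also be expressed as a few programmatic checks on a named list. The sketch below runs on a toy list with placeholder names (`AuthorA_OD_4`, `AuthorZ_RISK_2`) rather than on the package's `TBsignatures` object, and `check_signature_list()` is a hypothetical helper. It assumes a plain character sort of the names; the exact ordering rule used by the package is not stated in this tutorial.

```r
# Hypothetical helper: checks that a signature is present, that its gene
# symbols match what was intended, and that the list names are sorted.
check_signature_list <- function(siglist, signame, sigsymbols) {
  nm <- names(siglist)
  present <- signame %in% nm
  c(
    present       = present,
    symbols_match = present && setequal(siglist[[signame]], sigsymbols),
    # method = "radix" sorts in the C locale, independent of the session locale
    alphabetical  = identical(nm, sort(nm, method = "radix"))
  )
}

# Toy list standing in for the updated signature list
toy_signatures <- list(
  AuthorA_OD_4      = c("G1", "G2", "G3", "G4"),
  AuthorZ_RISK_2    = c("G5", "G6"),
  LauxdaCosta2_OD_3 = c("GBP5", "CD64", "GZMA")
)

check_signature_list(toy_signatures, "LauxdaCosta2_OD_3",
                     c("GBP5", "CD64", "GZMA"))
```

All three elements of the result should be `TRUE`; a `FALSE` points to the item worth inspecting in the data viewer.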
Overwriting the RDS files
Assuming there are no glaring errors and everything looks correct, we can run the function again to save and overwrite the existing data objects by setting `saveobjs = TRUE`. Here, we will refrain from running the code as this is merely a vignette, but the same messages will be output, along with a message confirming that the objects were written to the data folder.
addTBsignature(sigsymbols = c("GBP5", "CD64", "GZMA"),
               authname = "LauxdaCosta2",
               signame_common = NULL,
               sigtype = "Disease/Other Diseases",
               tissuetype = "whole blood",
               saveobjs = TRUE,
               views = FALSE)
With that, the data objects themselves should be successfully updated. You can take a look at them by running `devtools::load_all()` to reload the package from your local repository and then loading the data objects with `data("TBsignatures")` or `data("common_sigAnnotData")`.
If you wrote to the RDS files by accident and need to fix them, redownload the appropriate files from the GitHub repository and try again. Be sure to rerun the `devtools::load_all()` command.
Documentation
The documentation is the only place where the source of the signature will be mentioned in the package; therefore, it is crucial that it be updated every time a signature is added. Upon adding a signature, the only documentation that needs to be updated is that of the list objects, `TBsignatures` and `TBcommon`. The table looks a bit messy when viewed raw, but the finished product looks much nicer (you can pull it up with `?TBsignatures` or `?TBcommon`).
There is a specific format that needs to be followed for adding an entry, but you can easily create the table entry by using the function provided below. The function only requires that you supply the name of the signature given by `addTBsignature()`, the reference (minus the DOI, which is a separate parameter), and the DOI number (not the link!).
An example is illustrated below using the Laux da Costa signature (named LauxdaCosta2_OD_3 in the test run above). The documentation lives in the file R/data.R inside the package file structure.
mkTBreference <- function(TBsigname, ref, DOInum) {
  cat("\\item{\\strong{", TBsigname, "}}{: ", ref,
      " \\href{http://dx.doi.org/", DOInum, "}{", DOInum, "}}", sep = "")
}
# TBsigname is the name output in a message from addTBsignature()
mkTBreference(TBsigname = "LauxdaCosta2_OD_3",
              ref = "Laux da Costa L, Delcroix M, Dalla Costa ER, et al. A real-time PCR signature to discriminate between tuberculosis and other pulmonary diseases. Tuberculosis (Edinb). 2015;95(4):421-425.",
              DOInum = "10.1016/j.tube.2015.04.008")
## \item{\strong{LauxdaCosta2_OD_3}}{: Laux da Costa L, Delcroix M, Dalla Costa ER, et al. A real-time PCR signature to discriminate between tuberculosis and other pulmonary diseases. Tuberculosis (Edinb). 2015;95(4):421-425. \href{http://dx.doi.org/10.1016/j.tube.2015.04.008}{10.1016/j.tube.2015.04.008}}
If an alternative publication signature name exists, `mkTBreference` will need to be rerun with the alternative name as the `TBsigname` argument, for use with the `TBcommon` documentation table. Otherwise, the package-assigned signature name can be used for both the `TBcommon` and `TBsignatures` entries.
Once you have the requisite entries, open up R/data.R. First, find the table for `TBsignatures`, and insert the entry (with the `#'` at the beginning like all other entries) at the appropriate position. All signatures should have the same position in the table as they do in the ordering of the names of the TBsignatures object (`names(TBsignatures)`). Once this is complete, do the same for the `TBcommon` table, which should be a little further down in the file.
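Since each entry needs the `#'` prefix when pasted into the documentation file, one option is a variant of `mkTBreference()` that returns the finished line as a string instead of printing it. This is a sketch only: `mk_roxygen_entry()` is a hypothetical helper, and the single space after `#'` is an assumption that should be matched against the existing entries in the file.

```r
# Hypothetical variant of mkTBreference(): returns the entry as a string with
# the roxygen comment prefix, ready to paste into the documentation table.
mk_roxygen_entry <- function(TBsigname, ref, DOInum) {
  paste0("#' \\item{\\strong{", TBsigname, "}}{: ", ref,
         " \\href{http://dx.doi.org/", DOInum, "}{", DOInum, "}}")
}

entry <- mk_roxygen_entry(
  TBsigname = "LauxdaCosta2_OD_3",
  ref = "Laux da Costa L, Delcroix M, Dalla Costa ER, et al. A real-time PCR signature to discriminate between tuberculosis and other pulmonary diseases. Tuberculosis (Edinb). 2015;95(4):421-425.",
  DOInum = "10.1016/j.tube.2015.04.008"
)

# Print the line so it can be copied into the table
cat(entry, "\n")
```

Returning a string rather than calling `cat()` inside the function also makes it easy to build entries for several signatures with `vapply()` or a loop.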
Please make sure to compile the documentation by running `devtools::document()` upon completion of your documentation edits.
Pull in your changes to the repository
The very last step of this process will be to submit a pull request to the GitHub repository. If you have never submitted a pull request to another repo before, instructions can be found here. When you are submitting your request, please note which signature(s) you added by name and what publications they can be found in, so that we can check that the update was completed successfully. We will try to approve pull requests as soon as possible. After the request is approved, your signature will be part of the package. Thank you for your contribution!
Session Information
## R version 4.4.0 (2024-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: AlmaLinux 8.9 (Midnight Oncilla)
##
## Matrix products: default
## BLAS/LAPACK: FlexiBLAS NETLIB; LAPACK version 3.11.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] TBSignatureProfiler_1.17.1 testthat_3.2.1.1
## [3] knitr_1.46 BiocStyle_2.32.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 rstudioapi_0.16.0
## [3] jsonlite_1.8.8 shape_1.4.6.1
## [5] magrittr_2.0.3 magick_2.8.3
## [7] rmarkdown_2.26 GlobalOptions_0.1.2
## [9] fs_1.6.4 zlibbioc_1.50.0
## [11] ragg_1.3.0 vctrs_0.6.5
## [13] memoise_2.0.1 htmltools_0.5.8.1
## [15] S4Arrays_1.4.0 usethis_2.2.3
## [17] Rhdf5lib_1.26.0 rhdf5_2.48.0
## [19] SparseArray_1.4.0 sass_0.4.9
## [21] bslib_0.7.0 htmlwidgets_1.6.4
## [23] desc_1.4.3 plyr_1.8.9
## [25] cachem_1.0.8 mime_0.12
## [27] lifecycle_1.0.4 iterators_1.0.14
## [29] pkgconfig_2.0.3 rsvd_1.0.5
## [31] Matrix_1.7-0 R6_2.5.1
## [33] fastmap_1.1.1 GenomeInfoDbData_1.2.12
## [35] MatrixGenerics_1.16.0 shiny_1.8.1.1
## [37] clue_0.3-65 digest_0.6.35
## [39] colorspace_2.1-0 singscore_1.24.0
## [41] AnnotationDbi_1.66.0 S4Vectors_0.42.0
## [43] DESeq2_1.44.0 rprojroot_2.0.4
## [45] irlba_2.3.5.1 pkgload_1.3.4
## [47] textshaping_0.3.7 GenomicRanges_1.56.0
## [49] RSQLite_2.3.6 beachmat_2.20.0
## [51] fansi_1.0.6 gdata_3.0.0
## [53] httr_1.4.7 abind_1.4-5
## [55] compiler_4.4.0 remotes_2.5.0
## [57] bit64_4.0.5 withr_3.0.0
## [59] doParallel_1.0.17 ROCit_2.1.1
## [61] BiocParallel_1.38.0 DBI_1.2.2
## [63] pkgbuild_1.4.4 HDF5Array_1.32.0
## [65] DelayedArray_0.30.0 sessioninfo_1.2.2
## [67] rjson_0.2.21 ASSIGN_1.40.0
## [69] gtools_3.9.5 tools_4.4.0
## [71] httpuv_1.6.15 glue_1.7.0
## [73] rhdf5filters_1.16.0 promises_1.3.0
## [75] grid_4.4.0 reshape2_1.4.4
## [77] cluster_2.1.6 generics_0.1.3
## [79] gtable_0.3.5 tidyr_1.3.1
## [81] BiocSingular_1.20.0 ScaledMatrix_1.12.0
## [83] utf8_1.2.4 XVector_0.44.0
## [85] BiocGenerics_0.50.0 foreach_1.5.2
## [87] pillar_1.9.0 stringr_1.5.1
## [89] GSVA_1.52.0 limma_3.60.0
## [91] later_1.3.2 circlize_0.4.16
## [93] dplyr_1.1.4 lattice_0.22-6
## [95] bit_4.0.5 annotate_1.82.0
## [97] tidyselect_1.2.1 SingleCellExperiment_1.26.0
## [99] ComplexHeatmap_2.20.0 locfit_1.5-9.9
## [101] Biostrings_2.72.0 miniUI_0.1.1.1
## [103] bookdown_0.39 IRanges_2.38.0
## [105] edgeR_4.2.0 SummarizedExperiment_1.34.0
## [107] stats4_4.4.0 xfun_0.43
## [109] Biobase_2.64.0 statmod_1.5.0
## [111] brio_1.1.5 devtools_2.4.5
## [113] matrixStats_1.3.0 DT_0.33
## [115] stringi_1.8.3 UCSC.utils_1.0.0
## [117] yaml_2.3.8 evaluate_0.23
## [119] codetools_0.2-20 tibble_3.2.1
## [121] graph_1.82.0 BiocManager_1.30.22
## [123] cli_3.6.2 xtable_1.8-4
## [125] systemfonts_1.0.6 munsell_0.5.1
## [127] jquerylib_0.1.4 Rcpp_1.0.12
## [129] GenomeInfoDb_1.40.0 png_0.1-8
## [131] XML_3.99-0.16.1 parallel_4.4.0
## [133] ellipsis_0.3.2 blob_1.2.4
## [135] pkgdown_2.0.9 ggplot2_3.5.1
## [137] profvis_0.3.8 urlchecker_1.0.1
## [139] sparseMatrixStats_1.16.0 SpatialExperiment_1.14.0
## [141] GSEABase_1.66.0 scales_1.3.0
## [143] purrr_1.0.2 crayon_1.5.2
## [145] GetoptLong_1.0.5 rlang_1.1.3
## [147] KEGGREST_1.44.0