Welcome to the Johnson Lab Software Page
Check out the Johnson Lab Homepage: https://www.wejlab.org
GitHub Links
Our lab’s software projects can be found at the following GitHub repositories:
Some of our most popular projects:
Combining Multiple Batches of -omic Data (ComBat, ComBat-Seq, BatchQC): Our tools for batch correction are highly cited, widely used, and commonly downloaded (107,477 package downloads in 2020; top 3% of all packages in R/Bioconductor).
- ComBat is software for reducing batch effects when combining genomic data from different labs, experiments, hybridization batches, or technology platforms. It uses an empirical Bayesian linear modeling approach to robustly account for technical variability across multiple high-throughput studies. ComBat is available within the sva package. Bioconductor sva package / Publication / GitHub sva
- ComBat-Seq uses a Negative Binomial error model to extend ComBat to sequencing-based experiments and is available in the sva package. Bioconductor sva package / Publication / GitHub sva
- BatchQC is a user interface for interactive evaluation of batch effects in -omic data. Bioconductor / Publication / GitHub
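To give a rough feel for what "reducing batch effects" means, here is a minimal sketch of a plain location-and-scale adjustment for a single feature. This is not ComBat: it leaves out the empirical Bayes shrinkage and the covariate modeling described above, and the function name `location_scale_adjust` is hypothetical. It is written in Python purely for illustration; for real analyses, use the sva package.

```python
from math import sqrt
from statistics import mean, pstdev


def location_scale_adjust(values, batches):
    """Shift and rescale one feature so every batch shares a common mean and SD.

    Simplified illustration only: no empirical Bayes shrinkage, no covariates.
    """
    grand_mean = mean(values)

    # Group sample indices by batch label.
    groups = {}
    for i, b in enumerate(batches):
        groups.setdefault(b, []).append(i)

    batch_mean = {b: mean([values[i] for i in idx]) for b, idx in groups.items()}
    batch_sd = {b: pstdev([values[i] for i in idx]) for b, idx in groups.items()}

    # Pooled within-batch SD: spread left over after removing each batch mean.
    residuals_sq = [
        (values[i] - batch_mean[b]) ** 2 for b, idx in groups.items() for i in idx
    ]
    pooled_sd = sqrt(mean(residuals_sq))

    adjusted = [0.0] * len(values)
    for b, idx in groups.items():
        # Leave the scale untouched if a batch has no spread at all.
        scale = pooled_sd / batch_sd[b] if batch_sd[b] > 0 else 1.0
        for i in idx:
            adjusted[i] = grand_mean + (values[i] - batch_mean[b]) * scale
    return adjusted


# Toy data: batch "b" is shifted upward and more spread out than batch "a".
values = [1.0, 2.0, 3.0, 11.0, 14.0, 17.0]
batches = ["a", "a", "a", "b", "b", "b"]
adjusted = location_scale_adjust(values, batches)
```

After the adjustment, both toy batches are centered on the overall mean and have the same spread. Note that this toy version would also erase any real biological difference between batches, which is one reason model-based tools that account for known covariates are preferred in practice.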
Metagenomic profiling and multi-sample analysis (PathoScope, MetaScope, animalcules, LegATo):
- PathoScope 2.0 is a complete bioinformatics framework for the metagenomic analysis of data from clinical or environmental sequencing samples. PathoScope includes modules for reference genome library extraction and indexing, read quality control and alignment, strain identification, and annotation of results. PathoScope 2.0 Wiki / Publication / GitHub
- animalcules is an R package for processing multi-sample metagenomic data and is specially designed for integration with PathoScope and MetaScope outputs with flexibility for other 16S and meta-omics pipeline outputs. The package provides an easy-to-use interactive microbiome analysis framework as a standalone software package or an interactive R Shiny application. animalcules docs / Bioconductor / Publication / GitHub
- MetaScope is an R-based 16S, metagenomic, and metatranscriptomic profiling package that can accurately identify the composition of microbes within a sample at strain-level resolution. MetaScope can be considered a substantially updated and expanded R translation of PathoScope 2.0. MetaScope docs / Bioconductor / GitHub
- LegATo is an R package that provides a suite of open-source tools for longitudinal microbiome analysis. It integrates visualization, modeling, and testing procedures that extend to several different study designs, with an emphasis on ease of use for researchers. LegATo docs / GitHub
Signature scoring, curation, and validation in TB research (TBSignatureProfiler, curatedTBData, ASSIGN):
- The TBSignatureProfiler R package enables analysis of RNA-seq data using 70+ included gene signatures for tuberculosis disease presence, risk, progression, treatment failure, and other states. In-package signature profiling is available using common gene set enrichment tools that include GSVA, singscore, and ssGSEA. TBSignatureProfiler docs / Bioconductor / Publication / GitHub
- The curatedTBData R package is an effort to compile and harmonize data from more than 49 datasets with more than 4,000 samples. The curatedTBData package can be combined with the TBSignatureProfiler to generate or validate new signatures, evaluate existing signatures on subsets, or provide data for other projects. Bioconductor / Publication / GitHub
- ASSIGN uses a Bayesian factor regression model to identify genomic biomarkers for applications in pathway profiling, drug responsiveness, environmental exposure, and infectious disease diagnosis. Bioconductor / Publication / GitHub
Tools and Workflows for Single Cell RNA-seq Analysis (singleCellTK):
- The singleCellTK is an NCI-funded project to construct a comprehensive and interactive R software framework for complete data processing and analysis of single cell RNA-sequencing data from heterogeneous tumor samples. We have developed the singleCellTK with an R/Shiny user interface that enables interactive analysis and visualization of the data.
Single and Multi-Channel Array Normalization and Barcoding (SCAN-UPC, MAT, MA2C):
- SCAN is a microarray normalization method that removes background noise using only data from within each array individually, thereby facilitating applications in precision medicine.
- UPC utilizes a similar modeling approach to produce barcodes that estimate gene activity in data from microarray and RNA-sequencing platforms.
- MAT is designed for the analysis of data from Affymetrix tiling microarrays. The MA2C software is a similar approach but designed to analyze data from two-color tiling arrays.
Genomic Next-generation Universal Mapper (GNUMAP):
- GNUMAP is a software suite for aligning next-generation sequencing data from DNA-seq, BS-seq, and RNA-seq (including small RNAs, RNA editing) experiments. It uses a highly accurate probabilistic alignment approach that incorporates base uncertainty into the alignment algorithm.