This function allows the user to check a subset of identified reads against NCBI BLAST and the nucleotide database to confirm or contradict results provided by MetaScope. It aligns the top `metascope_id()` results to NCBI BLAST database. It REQUIRES that command-line BLAST and a separate nucleotide database have already been installed on the host machine. It returns a csv file updated with BLAST result metrics.
Usage
metascope_blast(
metascope_id_path,
bam_file_path = list.files(tmp_dir, ".updated.bam$", full.names = TRUE)[1],
tmp_dir,
out_dir,
sample_name,
fasta_dir = NULL,
num_results = 10,
num_reads = 100,
hit_list = 10,
num_threads = 1,
db_path,
quiet = FALSE,
NCBI_key = NULL,
db = NULL,
accessions_path = NULL
)
Arguments
- metascope_id_path
Character string; path to a csv file output by `metascope_id()`.
- bam_file_path
Character string; full path to bam file for the same sample processed by `metascope_id`. Note that the `filter_bam` function must have output a bam file, and not a .csv.gz file. See `?filter_bam_bowtie` for more details. Defaults to
list.files(file_temp, ".updated.bam$")[1]
.- tmp_dir
Character string, a temporary directory in which to host files.
- out_dir
Character string, path to output directory.
- sample_name
Character string, sample name for output files.
- fasta_dir
Directory where fasta files for blast will be stored.
- num_results
Integer, the maximum number of taxa from the metascope_id output to check reads. Takes the top n taxa, i.e. those with largest abundance. Default is 10.
- num_reads
Integer, the maximum number of reads to blast per microbe. If the true number of reads assigned to a given taxon is smaller, then the smaller number will be chosen. Default is 100. Too many reads will involve more processing time.
- hit_list
Integer, number of blast hit results to keep. Default is 10.
- num_threads
Integer, number of threads if running in parallel (recommended). Default is 1.
- db_path
Character string. The database file to be searched (including basename, but without file extension). For example, if the nt database is in the nt folder, this would be /filepath/nt/nt assuming that the database files have the nt basename. Check this path if you get an error message stating "No alias or index file found".
- quiet
Logical, whether to print out more informative messages. Default is FALSE.
- NCBI_key
(character) NCBI Entrez API key. optional. See taxize::use_entrez(). Due to the high number of requests made to NCBI, this function will be less prone to errors if you obtain an NCBI key.
- db
Currently accepts one of
c("ncbi", "silva", "other")
Default is"ncbi"
, appropriate for samples aligned against indices compiled from NCBI whole genome databases. Alternatively, usage of an alternate database (like Greengenes2) should be specified with"other"
.- accessions_path
Directory where accession files for blast are stored.
Details
This function assumes that you used the NCBI nucleotide database to process samples, with a download date of 2021 or later. This is necessary for compatibility with the bam file headers.
This is highly computationally intensive and should be ran with multiple cores, submitted as a multi-threaded computing job if possible.
Note, if best_hit_strain is FALSE, then no strain was observed more often among the BLAST results.
Examples
if (FALSE) {
### Create temporary directory
file_temp <- tempfile()
dir.create(file_temp)
bamPath <- system.file("extdata", "bowtie_target.bam",
package = "MetaScope")
file.copy(bamPath, file_temp)
metascope_id(file.path(file_temp, "bowtie_target.bam"), aligner = "bowtie2",
input_type = "bam", out_dir = file_temp, num_species_plot = 0,
update_bam = TRUE)
## Run metascope blast
### Get export name and metascope id results
out_base <- bamPath %>% base::basename() %>% strsplit(split = "\\.") %>%
magrittr::extract2(1) %>% magrittr::extract(1)
metascope_id_path <- file.path(file_temp, paste0(out_base, ".metascope_id.csv"))
# NOTE: change db_path to the location where your BLAST database is stored!
db <- "/restricted/projectnb/pathoscope/data/blastdb/nt/nt"
Sys.setenv(ENTREZ_KEY = "<your id here>")
metascope_blast(metascope_id_path,
bam_file_path = file.path(file_temp, "bowtie_target.bam"),
#bam_file_path = file.path(file_temp, "bowtie_target.updated.bam")
tmp_dir = file_temp,
out_dir = file_temp,
sample_name = out_base,
db_path = db_path,
num_results = 10,
num_reads = 5,
hit_list = 10,
num_threads = 3,
db = "ncbi",
quiet = FALSE,
NCBI_key = NULL,
fasta_dir = NULL ,
accessions_path = NULL)
## Remove temporary directory
unlink(file_temp, recursive = TRUE)
}