After a sample is aligned to a target library with align_target_bowtie(), filter_host_bowtie() can be used to remove unwanted host contamination using filter reference libraries. This function takes as input the name of the .bam file produced by align_target_bowtie() and produces a sorted .bam file or a .csv.gz file with any reads that match the filter libraries removed. The resulting file may be used downstream for further analysis. This function uses Rbowtie2. For the Rsubread equivalent of this function, see filter_host.

Usage

filter_host_bowtie(
  reads_bam,
  lib_dir,
  libs,
  make_bam = FALSE,
  output = paste(tools::file_path_sans_ext(reads_bam), "filtered", sep = "."),
  bowtie2_options = NULL,
  YS = 1e+05,
  threads = 1,
  overwrite = FALSE,
  quiet = TRUE
)

Arguments

reads_bam

The name of a merged, sorted .bam file that has previously been aligned to a reference library. Likely, the output from running an instance of align_target_bowtie().

lib_dir

Path to the directory that contains the filter Bowtie2 index files.

libs

The basename of the filter libraries (without .bt2 or .bt2l extension).

make_bam

Logical, whether to also output a .bam file with host reads filtered out. If FALSE, a .csv.gz file is created instead. Creating a .bam file is more resource-intensive than creating a compressed .csv file containing only the relevant information, so the default is FALSE.

output

The desired name of the output .bam or .csv.gz file. The extension is set automatically according to the value of make_bam. Default is reads_bam with its extension removed, followed by .filtered and the new extension.
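The default naming rule can be reproduced in base R. The sketch below reuses the expression shown in Usage for the stem and assumes the extension is appended with a dot. `default_output_path()` is a hypothetical helper written for illustration, not a MetaScope function.

```r
# Hypothetical helper (not part of MetaScope): builds the default `output`
# stem exactly as in Usage, then appends the extension implied by `make_bam`.
default_output_path <- function(reads_bam, make_bam = FALSE) {
  stem <- paste(tools::file_path_sans_ext(reads_bam), "filtered", sep = ".")
  ext <- if (make_bam) "bam" else "csv.gz"  # assumed extension rule
  paste(stem, ext, sep = ".")
}

default_output_path("sample/bowtie_target.bam")
#> [1] "sample/bowtie_target.filtered.csv.gz"
default_output_path("sample/bowtie_target.bam", make_bam = TRUE)
#> [1] "sample/bowtie_target.filtered.bam"
```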

bowtie2_options

Optional: additional Bowtie2 parameters to pass through filter_host_bowtie(). To see all available parameters, use Rbowtie2::bowtie2_usage(). See Details for the default parameters. NOTE: Pass all parameters as a single string. If any optional parameters are given, the user is responsible for entering all the parameters to be used by Bowtie2. The only parameter that should NOT be specified here is the number of threads.
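As a hedged illustration of the single-string rule, the sketch below changes only -k and therefore restates every other option from the defaults listed in Details. The file paths in the commented call are placeholders, and the value 50 is an arbitrary example.

```r
# Defaults quoted in Details (the PathoScope 2.0 settings).
default_opts <- "--very-sensitive-local -k 100 --score-min L,20,1.0"

# To change only -k, restate the full option set as one string, since the
# user is responsible for every Bowtie2 parameter once any is supplied.
custom_opts <- paste("--very-sensitive-local", "-k 50", "--score-min L,20,1.0")

# The thread count goes through `threads`, never inside the option string.
# Placeholder paths; uncomment once a BAM file and filter index exist.
# filter_host_bowtie(
#   reads_bam = "path/to/target.bam",
#   lib_dir = "path/to/index_dir",
#   libs = "filter",
#   bowtie2_options = custom_opts,
#   threads = 4
# )
```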

YS

yieldSize, an integer. The number of alignments to be read in from the bam file at once for chunked functions. Default is 100000.
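To show what a yield size means in practice, the sketch below reads a BAM file in fixed-size chunks with Rsamtools. It assumes Rsamtools is installed and uses the ex1.bam file that ships with that package. It illustrates chunked reading in general and is not a reproduction of MetaScope's internal code.

```r
# Sketch only: count alignments chunk by chunk instead of loading them all.
n_total <- 0L
if (requireNamespace("Rsamtools", quietly = TRUE)) {
  bam_path <- system.file("extdata", "ex1.bam", package = "Rsamtools")
  # yieldSize caps how many records each scanBam() call returns.
  bf <- Rsamtools::BamFile(bam_path, yieldSize = 1000L)
  open(bf)  # the file must be open for successive calls to advance
  param <- Rsamtools::ScanBamParam(what = "qname")
  repeat {
    chunk <- Rsamtools::scanBam(bf, param = param)[[1]]
    n <- length(chunk$qname)
    if (n == 0L) break  # an empty chunk signals the end of the file
    n_total <- n_total + n
  }
  close(bf)
}
n_total
```

A smaller yield size lowers peak memory use at the cost of more read calls.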

threads

The number of threads available to the function. Default is 1.

overwrite

Whether existing files should be overwritten. Default is FALSE.

quiet

Turns off most messages. Default is TRUE.

Value

The name of a filtered, sorted .bam file written to the user's current working directory. Or, if make_bam = FALSE, a .csv.gz file containing a data frame with only the information required to run metascope_id().

Details

A compressed .csv can be created to produce a smaller output file that is created more efficiently and is still compatible with metascope_id().
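A gzip-compressed .csv can be inspected directly in base R. The round trip below is a minimal sketch using a toy table; its column names are placeholders and do not represent the columns MetaScope actually writes.

```r
# Toy table; the column names are placeholders, not MetaScope's real schema.
toy <- data.frame(read_id = c("r1", "r2"), value = c(1, 2))

# Write through a gzfile() connection to get a compressed .csv.gz file.
csv_gz <- tempfile(fileext = ".csv.gz")
con <- gzfile(csv_gz, open = "wt")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() decompresses gzip files transparently, so no extra step is needed.
back <- read.csv(csv_gz)
unlink(csv_gz)
```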

The default parameters are the same as those used by PathoScope 2.0: "--very-sensitive-local -k 100 --score-min L,20,1.0".

Examples

#### Filter reads from bam file that align to any of the filter libraries

## Assuming a bam file has already been created with align_target_bowtie()
## Create temporary directory for the filter reference genome
filter_ref_temp <- tempfile()
dir.create(filter_ref_temp)

## Download reference genome
MetaScope::download_refseq("Orthoebolavirus zairense",
                           reference = FALSE,
                           representative = FALSE,
                           compress = TRUE,
                           out_dir = filter_ref_temp,
                           caching = TRUE)
#> No ENTREZ API key provided
#>  Get one via taxize::use_entrez()
#> See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
#> [1] "/scratch/5213039.1.cbm.q/RtmppsQl1S/file3a8250397b307/Orthoebolavirus_zairense.fasta.gz"

## Create temp directory to store the indices
index_temp <- tempfile()
dir.create(index_temp)

## Create filter index
MetaScope::mk_bowtie_index(
  ref_dir = filter_ref_temp,
  lib_dir = index_temp,
  lib_name = "filter",
  overwrite = TRUE
)
#> arguments 'show.output.on.console', 'minimized' and 'invisible' are for Windows only
#> Index building complete
#> [1] "/scratch/5213039.1.cbm.q/RtmppsQl1S/file3a8250396f9aa6"

## Create temporary folder to hold final output file
output_temp <- tempfile()
dir.create(output_temp)

## Get path to example bam
bamPath <- system.file("extdata", "bowtie_target.bam",
                       package = "MetaScope")
target_copied <- file.path(output_temp, "bowtie_target.bam")
file.copy(bamPath, target_copied)
#> [1] TRUE

## Align and filter reads
filter_out <-
  filter_host_bowtie(
    reads_bam = target_copied,
    lib_dir = index_temp,
    libs = "filter",
    threads = 1
  )
#> [1] "Samtools found on system. Using samtools to create bam file"
#> arguments 'show.output.on.console', 'minimized' and 'invisible' are for Windows only

## Remove temporary directories
unlink(filter_ref_temp, recursive = TRUE)
unlink(index_temp, recursive = TRUE)
unlink(output_temp, recursive = TRUE)