Skip to contents

This is the main MetaScope target library mapping function, using Rsubread and multiple libraries. Aligns to each library separately, filters unmapped reads from each file, and then merges and sorts the .bam files from each library into one output file. If desired, output can be passed to `filter_host()` to remove reads that also map to filter library genomes.

Usage

align_target(
  read1,
  read2 = NULL,
  lib_dir = NULL,
  libs,
  threads = 1,
  align_file = tools::file_path_sans_ext(read1),
  subread_options = align_details,
  quiet = TRUE
)

Arguments

read1

Path to the .fastq file to align.

read2

Optional: Location of the mate pair .fastq file to align.

lib_dir

Path to the index files for all libraries.

libs

A vector of character strings giving the basenames of the Subread index files for alignment. If ALL indices to be used are located in the current working directory, set lib_dir = NULL. Default is lib_dir = NULL.

threads

The number of threads that can be utilized by the function. Default is 1 thread.

align_file

The basename of the output alignment file (without trailing .bam extension).

subread_options

A named list specifying alignment parameters for the Rsubread::align() function, which is called inside align_target(). Elements should include type, nthreads, maxMismatches, nsubreads, phredOffset, unique, and nBestLocations. Descriptions of these parameters are available under ?Rsubread::align. Defaults to the global align_details object.

quiet

Turns off most messages. Default is TRUE.

Value

This function writes a merged and sorted .bam file after aligning to all reference libraries given, along with a summary report file, to the user's working directory. The function also outputs the new .bam filename.

Examples

#### Align example reads to an example reference library using Rsubread
# \donttest{
## Create temporary directory
target_ref_temp <- tempfile()
dir.create(target_ref_temp)

## Download genome
tax <- "Ovine atadenovirus D"
all_ref <- MetaScope::download_refseq(tax,
                                      reference = FALSE,
                                      representative = FALSE,
                                      compress = TRUE,
                                      out_dir = target_ref_temp,
                                      caching = TRUE)
#> No ENTREZ API key provided
#>  Get one via taxize::use_entrez()
#> See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
#> No ENTREZ API key provided
#>  Get one via taxize::use_entrez()
#> See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/

## Create subread index
ind_out <- mk_subread_index(all_ref)
#> 
#>         ==========     _____ _    _ ____  _____  ______          _____  
#>         =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
#>           =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
#>             ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
#>               ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
#>         ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
#>        Rsubread 2.14.2
#> 
#> //================================= setting ==================================\\
#> ||                                                                            ||
#> ||                Index name : Ovine_atadenovirus_D                           ||
#> ||               Index space : base space                                     ||
#> ||               Index split : no-split                                       ||
#> ||          Repeat threshold : 100 repeats                                    ||
#> ||              Gapped index : no                                             ||
#> ||                                                                            ||
#> ||       Free / total memory : 239.7GB / 251.3GB                              ||
#> ||                                                                            ||
#> ||               Input files : 1 file in total                                ||
#> ||                             o Ovine_atadenovirus_D.fasta.gz                ||
#> ||                                                                            ||
#> \\============================================================================//
#> 
#> //================================= Running ==================================\\
#> ||                                                                            ||
#> || Check the integrity of provided reference sequences ...                    ||
#> || No format issues were found                                                ||
#> || Scan uninformative subreads in reference sequences ...                     ||
#> || Estimate the index size...                                                 ||
#> ||    8%,   0 mins elapsed, rate=36.2k bps/s                                  ||
#> ||   16%,   0 mins elapsed, rate=72.0k bps/s                                  ||
#> ||   24%,   0 mins elapsed, rate=107.5k bps/s                                 ||
#> ||   33%,   0 mins elapsed, rate=142.5k bps/s                                 ||
#> ||   41%,   0 mins elapsed, rate=177.2k bps/s                                 ||
#> ||   49%,   0 mins elapsed, rate=211.6k bps/s                                 ||
#> ||   58%,   0 mins elapsed, rate=245.6k bps/s                                 ||
#> ||   66%,   0 mins elapsed, rate=279.4k bps/s                                 ||
#> ||   74%,   0 mins elapsed, rate=312.7k bps/s                                 ||
#> ||   83%,   0 mins elapsed, rate=345.7k bps/s                                 ||
#> || 3.0 GB of memory is needed for index building.                             ||
#> || Build the index...                                                         ||
#> ||    8%,   0 mins elapsed, rate=2.5k bps/s                                   ||
#> ||   16%,   0 mins elapsed, rate=5.1k bps/s                                   ||
#> ||   24%,   0 mins elapsed, rate=7.6k bps/s                                   ||
#> ||   33%,   0 mins elapsed, rate=10.1k bps/s                                  ||
#> ||   41%,   0 mins elapsed, rate=12.6k bps/s                                  ||
#> ||   49%,   0 mins elapsed, rate=15.2k bps/s                                  ||
#> ||   58%,   0 mins elapsed, rate=17.7k bps/s                                  ||
#> ||   66%,   0 mins elapsed, rate=20.2k bps/s                                  ||
#> ||   74%,   0 mins elapsed, rate=22.7k bps/s                                  ||
#> ||   83%,   0 mins elapsed, rate=25.2k bps/s                                  ||
#> || Save current index block...                                                ||
#> ||  [ 0.0% finished ]                                                         ||
#> ||  [ 10.0% finished ]                                                        ||
#> ||  [ 20.0% finished ]                                                        ||
#> ||  [ 30.0% finished ]                                                        ||
#> ||  [ 40.0% finished ]                                                        ||
#> ||  [ 50.0% finished ]                                                        ||
#> ||  [ 60.0% finished ]                                                        ||
#> ||  [ 70.0% finished ]                                                        ||
#> ||  [ 80.0% finished ]                                                        ||
#> ||  [ 90.0% finished ]                                                        ||
#> ||  [ 100.0% finished ]                                                       ||
#> ||                                                                            ||
#> ||                      Total running time: 0.1 minutes.                      ||
#> ||Index /scratch/5213039.1.cbm.q/RtmppsQl1S/file3a82502befd492/Ovine_ata ... ||
#> ||                                                                            ||
#> \\============================================================================//
#> 

## Get path to example reads
readPath <- system.file("extdata", "reads.fastq",
                        package = "MetaScope")
## Copy the example reads to the temp directory
refPath <- file.path(target_ref_temp, "reads.fastq")
file.copy(from = readPath, to = refPath)
#> [1] TRUE

## Modify alignment parameters object
data("align_details")
align_details[["type"]] <- "rna"
align_details[["phredOffset"]] <- 50
# Just to get it to align - toy example!
align_details[["maxMismatches"]] <- 100

## Run alignment
target_map <- align_target(refPath,
                           libs = stringr::str_replace_all(tax, " ", "_"),
                           lib_dir = target_ref_temp,
                           subread_options = align_details)
#> 
#>         ==========     _____ _    _ ____  _____  ______          _____  
#>         =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
#>           =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
#>             ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
#>               ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
#>         ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
#>        Rsubread 2.14.2
#> 
#> //================================= setting ==================================\\
#> ||                                                                            ||
#> || Function      : Read alignment (RNA-Seq)                                   ||
#> || Input file    : reads.fastq                                                ||
#> || Output file   : reads.Ovine_atadenovirus_D.bam (BAM)                       ||
#> || Index name    : Ovine_atadenovirus_D                                       ||
#> ||                                                                            ||
#> ||                    ------------------------------------                    ||
#> ||                                                                            ||
#> ||                               Threads : 1                                  ||
#> ||                          Phred offset : 64                                 ||
#> ||                             Min votes : 3 / 10                             ||
#> ||                        Max mismatches : 100                                ||
#> ||                      Max indel length : 5                                  ||
#> ||            Report multi-mapping reads : yes                                ||
#> || Max alignments per multi-mapping read : 16                                 ||
#> ||                                                                            ||
#> \\============================================================================//
#> 
#> //=============== Running (01-Mar-2024 22:24:07, pid=3834448) ================\\
#> ||                                                                            ||
#> || Check the input reads.                                                     ||
#> || The input file contains base space reads.                                  ||
#> || Initialise the memory objects.                                             ||
#> || Estimate the mean read length.                                             ||
#> || The range of Phred scores observed in the data is [-14,-14]                ||
#> || Create the output BAM file.                                                ||
#> || Check the index.                                                           ||
#> || Init the voting space.                                                     ||
#> || Global environment is initialised.                                         ||
#> || Load the 1-th index block...                                               ||
#> || The index block has been loaded.                                           ||
#> || Start read mapping in chunk.                                               ||
#> ||                                                                            ||
#> ||                           Completed successfully.                          ||
#> ||                                                                            ||
#> \\====================================    ====================================//
#> 
#> //================================   Summary =================================\\
#> ||                                                                            ||
#> ||                 Total reads : 1,500                                        ||
#> ||                      Mapped : 0 (0.0%)                                     ||
#> ||             Uniquely mapped : 0                                            ||
#> ||               Multi-mapping : 0                                            ||
#> ||                                                                            ||
#> ||                    Unmapped : 1,500                                        ||
#> ||                                                                            ||
#> ||                      Indels : 0                                            ||
#> ||                                                                            ||
#> ||                Running time : 1.0 minutes                                  ||
#> ||                                                                            ||
#> \\============================================================================//
#> 

## Remove temporary folder
unlink(target_ref_temp, recursive = TRUE)
# }