Demultiplexes sequencing reads arranged in a common format provided by sequencers (such as Illumina), generally for 16S data. The function takes a matrix of sample names and barcodes, a .fastq file of barcodes indexed by sequence header, and a .fastq file of the reads corresponding to those barcodes. For each barcode given, it extracts all reads whose index matches that barcode and writes them to a separate .fastq file.
Usage
meta_demultiplex(
  barcodeFile,
  indexFile,
  readFile,
  rcBarcodes = TRUE,
  location = NULL,
  threads = 1,
  hammingDist = 0,
  quiet = TRUE
)
Arguments
- barcodeFile
Path to a file containing a .tsv matrix with a header row, and then sample names (column 1) and barcodes (column 2).
- indexFile
Path to a .fastq file that contains the barcodes for each read. The headers should be the same (and in the same order) as readFile, and the sequence in the indexFile should be the corresponding barcode for each read. Quality scores are not considered.
- readFile
Path to the sequencing read .fastq file that corresponds to the indexFile.
- rcBarcodes
Should the barcode indexes in the barcodeFile be reverse complemented to match the sequences in the indexFile? Defaults to TRUE.
- location
A directory location to store the demultiplexed read files. By default, a new temporary directory is generated.
- threads
The number of threads to use for parallelization (BiocParallel). This function will parallelize over the barcodes and extract reads for each barcode separately and write them to separate demultiplexed files.
- hammingDist
Uses a Hamming Distance or number of base differences to allow for inexact matches for the barcodes/indexes. Defaults to 0. Warning: if the Hamming Distance is >= 1 and this leads to inexact index matches to more than one barcode, that read will be written to more than one demultiplexed read file.
- quiet
Turns off most messages. Default is TRUE.
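The effect of rcBarcodes and hammingDist can be illustrated with a minimal base R sketch. The helpers rev_comp, hamming, and matching_samples and the two barcodes are hypothetical, written only for this illustration; they are not MetaScope's internal code, and the package may implement matching differently.

```r
# Hypothetical helpers for illustration; not MetaScope's internal code.

# Reverse complement of a DNA string (A<->T, C<->G), the transformation
# that rcBarcodes = TRUE describes.
rev_comp <- function(seq) {
  bases <- rev(strsplit(seq, "", fixed = TRUE)[[1]])
  chartr("ACGT", "TGCA", paste(bases, collapse = ""))
}

# Hamming distance: number of positions at which two equal-length strings differ.
hamming <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "", fixed = TRUE)[[1]] != strsplit(b, "", fixed = TRUE)[[1]])
}

# Names of all barcodes within max_dist mismatches of an index sequence.
matching_samples <- function(index, barcodes, max_dist = 0) {
  dists <- vapply(barcodes, hamming, integer(1), b = index)
  names(barcodes)[dists <= max_dist]
}

barcodes <- c(SampleA = "ACGT", SampleB = "ACGA")  # made-up barcodes

rev_comp("GGAATT")                                # "AATTCC"
matching_samples("ACGT", barcodes, max_dist = 0)  # "SampleA"
matching_samples("ACGT", barcodes, max_dist = 1)  # "SampleA" "SampleB"
```

With a distance of 1, the index matches both made-up barcodes, which is the situation the hammingDist warning describes: the read would be written to both samples' files.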
Value
Writes multiple .fastq files that contain all reads whose index matches the barcodes given. These files are written to the location directory and are named after the given sample names and barcodes, e.g. './demultiplex_fastq/SampleName1_GGAATTATCGGT.fastq.gz'. In the example below, the returned object is a table of sample names, barcodes, and the number of reads per barcode.
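To anticipate where a given sample's reads will land, the naming pattern can be sketched as below. The pattern is inferred from the single example path above, and output_path is a hypothetical helper, not a MetaScope function.

```r
# Assumed naming pattern, inferred from the one example path in this section:
# <location>/<sample name>_<barcode>.fastq.gz
output_path <- function(location, sample_name, barcode) {
  file.path(location, paste0(sample_name, "_", barcode, ".fastq.gz"))
}

output_path("./demultiplex_fastq", "SampleName1", "GGAATTATCGGT")
# "./demultiplex_fastq/SampleName1_GGAATTATCGGT.fastq.gz"
```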
Examples
## Get barcode, index, and read data locations
barcodePath <- system.file("extdata", "barcodes.txt", package = "MetaScope")
indexPath <- system.file("extdata", "virus_example_index.fastq",
                         package = "MetaScope")
readPath <- system.file("extdata", "virus_example.fastq",
                        package = "MetaScope")
## Demultiplex
demult <- meta_demultiplex(barcodePath, indexPath, readPath, rcBarcodes = FALSE,
                           hammingDist = 2)
#> Warning: metadata columns on input DNAStringSet object were dropped
demult
#>   SampleName Barcode NumberOfReads
#> 1        CDV TCCACGT            25
#> 2   LaCrosse ACAGGCT            25
#> 3        RSV ATCGTGC            25
#> 4       EboV ACTACAG            25
#> 5    Measles AAGTCGC            25
#> 6        VSV TCTCAGG            25
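For your own data, a barcode file in the layout described under Arguments (tab-separated, header row, sample names in column 1, barcodes in column 2) could be written with base R as sketched here. The column names are borrowed from the example output above and are an assumption; the documentation only states that a header row is required.

```r
# Sample names and barcodes copied from the example output above.
# The header names are an assumption, not a documented requirement.
barcode_table <- data.frame(
  SampleName = c("CDV", "LaCrosse", "RSV"),
  Barcode = c("TCCACGT", "ACAGGCT", "ATCGTGC")
)

# Tab-separated, with a header row, no quoting and no row names.
barcode_file <- tempfile(fileext = ".tsv")
write.table(barcode_table, barcode_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back to confirm the layout before passing its path
# as the barcodeFile argument.
check <- read.delim(barcode_file, stringsAsFactors = FALSE)
```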