Count the number of base lengths in a CIGAR string for a given operation
Source: R/metascope_id.R
count_matches.Rd
The 'CIGAR' (Compact Idiosyncratic Gapped Alignment Report) string is how the
SAM/BAM format represents the alignment of a read to a reference, including
gaps and splices. This function accepts a CIGAR string for a single read and a
single character indicating the operation to be counted. An operation is a
type of column that appears in the alignment, e.g. a match or a gap. The
integer preceding each operator gives the number of consecutive columns of
that type. The count_matches() function identifies all occurrences of the
operator in the input string, sums the integers attached to them, and returns
the total length of that operation for the read summarized by the input
CIGAR string.
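The summation described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's own implementation: the helper name count_op_sketch() and its regular expression are hypothetical, and the operator set is the one defined in the SAM specification.

```r
# Hypothetical helper, NOT the package's own code: sums the lengths attached
# to one operator in a single CIGAR string.
count_op_sketch <- function(x, char = "M") {
  stopifnot(is.character(x), length(x) == 1L,
            is.character(char), length(char) == 1L, nchar(char) == 1L)
  # Extract every "<digits><operator>" token, e.g. "3M", "1I", "5M".
  tokens <- regmatches(x, gregexpr("[0-9]+[MIDNSHP=X]", x))[[1]]
  # Split each token into its operator (last character) and its length.
  ops <- substr(tokens, nchar(tokens), nchar(tokens))
  lens <- as.integer(substr(tokens, 1L, nchar(tokens) - 1L))
  # Sum the lengths of the tokens that carry the requested operator;
  # this is 0 when the operator does not occur in the string.
  sum(lens[ops == char])
}

count_op_sketch("3M1I3M1D5M", char = "M")  # 3 + 3 + 5
count_op_sketch("4M1I2P9M", char = "P")
count_op_sketch("76M13M5I", char = "D")    # operator absent
```

In this sketch an operator that never appears yields 0 rather than an error; whether count_matches() behaves the same way is not stated here.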
Arguments
- x
Character. A CIGAR string for a read to be parsed. Examples of possible operators include "M", "D", "I", "S", "H", "=", "P", and "X".
- char
A single character representing the operation to total for the given string.
Value
An integer number representing the total length of the requested alignment operation for the read that was summarized by the input CIGAR string.
Details
This function is best used on a vector of CIGAR strings using an apply function (see examples).
Examples
# A single cigar string: 3M + 3M + 5M
cigar1 <- "3M1I3M1D5M"
count_matches(cigar1, char = "M")
#> [1] 11
# Parse with operator "P": 2P
cigar2 <- "4M1I2P9M"
count_matches(cigar2, char = "P")
#> [1] 2
# Apply to multiple strings: 1I + 1I + 5I
cigar3 <- c("3M1I3M1D5M", "4M1I1P9M", "76M13M5I")
vapply(cigar3, count_matches, char = "I",
       FUN.VALUE = numeric(1))
#> 3M1I3M1D5M   4M1I1P9M   76M13M5I
#>          1          1          5