This function is a wrapper for the Rsubread::buildindex function. It generates one or more Subread indexes from a .fasta file. If the library is larger than the split threshold (4 GB by default), it is automatically split into multiple indexes, with _1, _2, etc. appended to the ref_lib basename.

Usage

mk_subread_index(ref_lib, split = 4, mem = 8000, quiet = TRUE)

Arguments

ref_lib

The name/location of the reference library file, in (uncompressed) .fasta format.

split

The maximum allowed size of the genome file (in GB). If the ref_lib file is larger than this, the function splits the library into multiple parts. Default is 4.

mem

The maximum amount of memory (in MB) that can be used by the index generation process (used by the Rsubread::buildindex function).

quiet

Whether to turn off most messages. Default is TRUE.
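
As a rough illustration of the split threshold, the base R sketch below estimates how many parts a reference file of a given size would need so that no part exceeds split GB. The helper name n_index_parts is hypothetical and not part of the package, and treating 1 GB as 1e9 bytes is an assumption; the function's internal splitting logic may differ.

```r
# Hypothetical helper (not part of the package): estimate how many parts a
# reference file would need so that no part exceeds `split` GB.
# Assumption: 1 GB is taken to be 1e9 bytes.
n_index_parts <- function(size_bytes, split = 4) {
  stopifnot(is.numeric(size_bytes), size_bytes >= 0, split > 0)
  max(1, ceiling(size_bytes / (split * 1e9)))
}

# A small temporary file stands in for a .fasta reference library
tmp_fasta <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "ACGTACGT"), tmp_fasta)

n_small <- n_index_parts(file.size(tmp_fasta))  # far below the 4 GB default
n_large <- n_index_parts(10e9, split = 4)       # a 10 GB library

unlink(tmp_fasta)
```

Under these assumptions a small viral genome stays in a single part, while a 10 GB library with split = 4 would need three.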

Value

Creates one or more Subread indexes for the supplied reference .fasta file. If multiple indexes are created, the libraries will be named the ref_lib basename + "_1", "_2", etc. The function returns the names of the folders holding these files.
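
The naming convention described above can be sketched in base R. The helper index_names is hypothetical; it assumes the basename is obtained by dropping a trailing .fasta or .fasta.gz extension, which is consistent with the example output on this page (where a single index is also returned with a _1 suffix) but is not confirmed as the function's actual implementation.

```r
# Hypothetical helper (not part of the package): build the expected index
# names for a reference library split into `n_parts` parts.
# Assumption: the basename is the path with ".fasta" or ".fasta.gz" removed.
index_names <- function(ref_lib, n_parts = 1) {
  stopifnot(is.character(ref_lib), length(ref_lib) == 1, n_parts >= 1)
  base <- sub("\\.fasta(\\.gz)?$", "", ref_lib)
  paste0(base, "_", seq_len(n_parts))
}

# Illustrative path only; no file is read
split_names <- index_names("refs/Orthoebolavirus_zairense.fasta.gz", n_parts = 2)
print(split_names)
```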

Examples

#### Create a subread index from the example reference library

## Create a temporary directory to store the reference library
ref_temp <- tempfile()
dir.create(ref_temp)

## Download reference genome
out_fasta <- download_refseq('Orthoebolavirus zairense', reference = FALSE,
                             representative = FALSE, out_dir = ref_temp,
                             compress = TRUE, patho_out = FALSE,
                             caching = TRUE)
#> No ENTREZ API key provided
#>  Get one via taxize::use_entrez()
#> See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
#> No ENTREZ API key provided
#>  Get one via taxize::use_entrez()
#> See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/

## Make subread index of reference library
mk_subread_index(out_fasta)
#> 
#>         ==========     _____ _    _ ____  _____  ______          _____  
#>         =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
#>           =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
#>             ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
#>               ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
#>         ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
#>        Rsubread 2.18.0
#> 
#> //================================= setting ==================================\\
#> ||                                                                            ||
#> ||                Index name : Orthoebolavirus_zairense                       ||
#> ||               Index space : base space                                     ||
#> ||               Index split : no-split                                       ||
#> ||          Repeat threshold : 100 repeats                                    ||
#> ||              Gapped index : no                                             ||
#> ||                                                                            ||
#> ||       Free / total memory : 197.8GB / 251.3GB                              ||
#> ||                                                                            ||
#> ||               Input files : 1 file in total                                ||
#> ||                             o Orthoebolavirus_zairense.fasta.gz            ||
#> ||                                                                            ||
#> \\============================================================================//
#> 
#> //================================= Running ==================================\\
#> ||                                                                            ||
#> || Check the integrity of provided reference sequences ...                    ||
#> || No format issues were found                                                ||
#> || Scan uninformative subreads in reference sequences ...                     ||
#> || Estimate the index size...                                                 ||
#> ||    8%,   0 mins elapsed, rate=22.9k bps/s                                  ||
#> ||   16%,   0 mins elapsed, rate=45.7k bps/s                                  ||
#> ||   24%,   0 mins elapsed, rate=68.2k bps/s                                  ||
#> ||   33%,   0 mins elapsed, rate=90.7k bps/s                                  ||
#> ||   41%,   0 mins elapsed, rate=113.0k bps/s                                 ||
#> ||   49%,   0 mins elapsed, rate=135.0k bps/s                                 ||
#> ||   58%,   0 mins elapsed, rate=157.0k bps/s                                 ||
#> ||   66%,   0 mins elapsed, rate=178.7k bps/s                                 ||
#> ||   74%,   0 mins elapsed, rate=200.4k bps/s                                 ||
#> || 3.0 GB of memory is needed for index building.                             ||
#> || Build the index...                                                         ||
#> ||    8%,   0 mins elapsed, rate=1.8k bps/s                                   ||
#> ||   16%,   0 mins elapsed, rate=3.5k bps/s                                   ||
#> ||   24%,   0 mins elapsed, rate=5.3k bps/s                                   ||
#> ||   33%,   0 mins elapsed, rate=7.0k bps/s                                   ||
#> ||   41%,   0 mins elapsed, rate=8.8k bps/s                                   ||
#> ||   49%,   0 mins elapsed, rate=10.5k bps/s                                  ||
#> ||   58%,   0 mins elapsed, rate=12.3k bps/s                                  ||
#> ||   66%,   0 mins elapsed, rate=14.0k bps/s                                  ||
#> ||   74%,   0 mins elapsed, rate=15.8k bps/s                                  ||
#> || Save current index block...                                                ||
#> ||  [ 0.0% finished ]                                                         ||
#> ||  [ 10.0% finished ]                                                        ||
#> ||  [ 20.0% finished ]                                                        ||
#> ||  [ 30.0% finished ]                                                        ||
#> ||  [ 40.0% finished ]                                                        ||
#> ||  [ 50.0% finished ]                                                        ||
#> ||  [ 60.0% finished ]                                                        ||
#> ||  [ 70.0% finished ]                                                        ||
#> ||  [ 80.0% finished ]                                                        ||
#> ||  [ 90.0% finished ]                                                        ||
#> ||  [ 100.0% finished ]                                                       ||
#> ||                                                                            ||
#> ||                      Total running time: 0.1 minutes.                      ||
#> ||Index /scratch/261666.1.ood/Rtmp4IgmVN/file3f4b0e4df919da/Orthoebolavi ... ||
#> ||                                                                            ||
#> \\============================================================================//
#> 
#> [1] "/scratch/261666.1.ood/Rtmp4IgmVN/file3f4b0e4df919da/Orthoebolavirus_zairense_1"
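
## Remove the temporary directory (recursive = TRUE is required to delete a directory)
unlink(ref_temp, recursive = TRUE)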