This function automatically downloads RefSeq genome libraries in FASTA
format for the specified taxon. It first downloads the summary report at
ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/**kingdom**/assembly_summary.txt,
then uses this file to download the genome(s) and combine them into a
single compressed or uncompressed .fasta file.
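The summary-then-download workflow above can be sketched in base R. This is a minimal offline sketch, not MetaScope's implementation. The two-row data frame is a hypothetical stand-in for assembly_summary.txt (only the column names assembly_accession and ftp_path follow NCBI's layout), and the genome_urls() helper is hypothetical. The sketch assumes NCBI's convention that each assembly directory listed in ftp_path contains a genome file named <directory name>_genomic.fna.gz.

```r
# Hypothetical stand-in for a few rows of assembly_summary.txt.
# Accessions and paths are made up; only the column names follow NCBI.
summary_df <- data.frame(
  assembly_accession = c("GCF_900000001.1", "GCF_900000002.1"),
  ftp_path = c(
    "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/900/000/001/GCF_900000001.1_ToyA",
    "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/900/000/002/GCF_900000002.1_ToyB"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build the genome FASTA URL for each assembly,
# assuming the <dir>/<dir>_genomic.fna.gz naming convention.
genome_urls <- function(ftp_path) {
  paste0(ftp_path, "/", basename(ftp_path), "_genomic.fna.gz")
}

urls <- genome_urls(summary_df$ftp_path)
print(urls)
```

Each URL could then be fetched (for example with utils::download.file()) and the results concatenated into one library.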
Usage
download_refseq(
taxon,
reference = TRUE,
representative = FALSE,
compress = TRUE,
patho_out = FALSE,
out_dir = NULL,
caching = FALSE,
quiet = TRUE
)
Arguments
- taxon
Name of a single taxon to download. The taxon name should be a recognized NCBI scientific or common name, spelled and capitalized exactly as NCBI lists it. All available taxonomies can be viewed by accessing the MetaScope:::taxonomy_table object included in the package.
- reference
Download only RefSeq reference genomes? Defaults to TRUE. Automatically set to TRUE if representative = TRUE.
- representative
Download RefSeq representative and reference genomes? Defaults to FALSE. If TRUE, reference is automatically set to TRUE.
- compress
Compress the output .fasta file? Defaults to TRUE.
- patho_out
Create duplicate output files compatible with PathoScope? Defaults to FALSE.
- out_dir
Character string giving the name of the directory to which libraries should be output. Defaults to creation of a new temporary directory.
- caching
Whether to use BiocFileCache when downloading genomes. Defaults to FALSE.
- quiet
Turns off most messages. Defaults to TRUE.
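The interplay between reference and representative can be pictured as a filter on the refseq_category column of assembly_summary.txt. The sketch below is an assumption, not MetaScope's actual code: the helpers categories_to_keep() and filter_summary() are hypothetical, the category labels "reference genome" and "representative genome" are assumed to match NCBI's file, and reference = FALSE is assumed to mean that no category filter is applied (all genomes are kept).

```r
# Hypothetical helper: which refseq_category values to keep for given flags.
categories_to_keep <- function(reference = TRUE, representative = FALSE) {
  # Documented behaviour: representative = TRUE forces reference = TRUE.
  if (representative) reference <- TRUE
  # Assumption: reference = FALSE means no filtering, signalled by NULL.
  if (!reference) return(NULL)
  if (representative) {
    c("reference genome", "representative genome")
  } else {
    "reference genome"
  }
}

# Hypothetical helper: subset a summary table according to the flags.
filter_summary <- function(summary_df, reference = TRUE, representative = FALSE) {
  keep <- categories_to_keep(reference, representative)
  if (is.null(keep)) return(summary_df)
  summary_df[summary_df$refseq_category %in% keep, , drop = FALSE]
}

# Toy table with one row per category (made-up accessions).
toy <- data.frame(
  assembly_accession = c("A", "B", "C"),
  refseq_category = c("reference genome", "representative genome", "na"),
  stringsAsFactors = FALSE
)
nrow(filter_summary(toy))                         # 1
nrow(filter_summary(toy, representative = TRUE))  # 2
nrow(filter_summary(toy, reference = FALSE))      # 3
```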
Value
Writes a .fasta or .fasta.gz file of the desired RefSeq genomes and
returns its path. The file is named after the selected taxon and saved
to out_dir (e.g. 'bacteria.fasta.gz'). If patho_out = TRUE, the function
also writes a copy formatted for PathoScope
(e.g. 'bacteria.pathoscope.fasta.gz').
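Combining the downloaded genomes into one compressed or uncompressed library can be sketched with base R connections. This is a minimal illustration under stated assumptions, not MetaScope's code: combine_fasta() is a hypothetical helper, and the output naming (<taxon>.fasta or <taxon>.fasta.gz in out_dir) mirrors the description above. It relies on gzfile() being able to read both gzip-compressed and plain files.

```r
# Hypothetical helper (not MetaScope's code): concatenate per-genome FASTA
# files into one library named after the taxon, optionally gzip-compressed.
combine_fasta <- function(inputs, taxon, out_dir = tempdir(), compress = TRUE) {
  ext <- if (compress) ".fasta.gz" else ".fasta"
  out_path <- file.path(out_dir, paste0(taxon, ext))
  out_con <- if (compress) gzfile(out_path, "w") else file(out_path, "w")
  on.exit(close(out_con))
  for (f in inputs) {
    # gzfile() also reads uncompressed files, so inputs may be .fna or .fna.gz.
    in_con <- gzfile(f, "r")
    writeLines(readLines(in_con), out_con)
    close(in_con)
  }
  out_path
}

# Usage with two tiny toy FASTA files in a temporary directory.
tmp <- tempfile("toy_fasta_")
dir.create(tmp)
a <- file.path(tmp, "a.fna")
b <- file.path(tmp, "b.fna")
writeLines(c(">seq1", "ACGT"), a)
writeLines(c(">seq2", "GGCC"), b)
out <- combine_fasta(c(a, b), "Toyvirus", out_dir = tmp, compress = TRUE)
basename(out)           # "Toyvirus.fasta.gz"
readLines(gzfile(out))  # ">seq1" "ACGT" ">seq2" "GGCC"
```

Streaming through connections keeps only one input file in memory at a time, rather than loading every genome before writing.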
Details
When selecting the taxon to download, if you receive the error
"Your input is not a valid taxon", check the taxonomy_table object,
which can be accessed with the command MetaScope:::taxonomy_table.
Only taxa spelled exactly as they appear at some level of the table
are accepted.
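The exact-match rule can be illustrated with a small base R check. The toy table and the helpers is_valid_taxon() and find_taxon() are hypothetical stand-ins; the real MetaScope:::taxonomy_table may be laid out differently. The case-insensitive grep() search shows one way to recover the correct spelling after a failed match.

```r
# Hypothetical stand-in for MetaScope:::taxonomy_table: one column per rank.
toy_taxonomy <- data.frame(
  superkingdom = c("Viruses", "Bacteria"),
  genus = c("Bovismacovirus", "Escherichia"),
  stringsAsFactors = FALSE
)

# Exact, case-sensitive match against every level of the table.
is_valid_taxon <- function(taxon, taxonomy) {
  taxon %in% unlist(taxonomy, use.names = FALSE)
}

# Case-insensitive search to find the exact spelling to pass as taxon.
find_taxon <- function(pattern, taxonomy) {
  vals <- unique(unlist(taxonomy, use.names = FALSE))
  grep(pattern, vals, ignore.case = TRUE, value = TRUE)
}

is_valid_taxon("Bovismacovirus", toy_taxonomy)  # TRUE
is_valid_taxon("bovismacovirus", toy_taxonomy)  # FALSE: capitalization differs
find_taxon("bovisma", toy_taxonomy)             # "Bovismacovirus"
```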
Examples
#### Download RefSeq genomes
## Download all RefSeq Bovismacovirus genus genomes
download_refseq('Bovismacovirus', reference = FALSE, representative = FALSE,
out_dir = NULL, compress = TRUE, patho_out = FALSE,
caching = TRUE)
#> No ENTREZ API key provided
#> Get one via taxize::use_entrez()
#> See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
#> No ENTREZ API key provided
#> Get one via taxize::use_entrez()
#> See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
#> [1] "/scratch/290807.1.ood/RtmpsBpExV/file3b921a3fae8bb8/Bovismacovirus.fasta.gz"