Assign species using BLAST — blast_assign

Use this alongside a hierarchical classifier such as IDTAXA or RDP to assign additional species-level matches. It is designed to be a more flexible version of dada2's assignSpecies function.

blast_assign_species(
  query,
  db,
  type = "blastn",
  identity = 97,
  coverage = 95,
  evalue = 1e+06,
  max_target_seqs = 5,
  max_hsp = 5,
  ranks = c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"),
  delim = ";",
  args = NULL,
  quiet = FALSE,
  remove_db_gaps = TRUE
)

Arguments

query: (Required) Query sequence. Accepts a DNAbin object, DNAStringSet object, character string, or filepath.
db: (Required) Reference sequences to search against. Accepts a DNAbin object, DNAStringSet object, character string, or filepath. If a DNAbin, DNAStringSet, or character string is provided, a temporary fasta file is used to construct the BLAST database.
type: (Required) Type of search to conduct. Default is 'blastn'.
identity: (Required) Minimum percent identity cutoff. Note that this is calculated using all alignments for each query-subject match.
coverage: (Required) Minimum percent query coverage cutoff. Note that this is calculated using all alignments for each query-subject match.
evalue: (Required) Maximum expect value (E) threshold for saving hits. Hits with an E-value above this threshold are not saved.
max_target_seqs: (Required) Number of aligned sequences to keep. Even if you only want the single top hit, keep this higher so that the calculations work properly.
max_hsp: (Required) Maximum number of HSPs (alignments) to keep for any single query-subject pair.
ranks: (Required) The taxonomic ranks contained in the fasta headers
delim: (Required) The delimiter between taxonomic ranks in fasta headers
args: (Optional) Extra arguments passed to BLAST
quiet: (Optional) Whether progress should be printed to console, default is FALSE
remove_db_gaps: (Optional) Whether gaps should be removed from the fasta file used for the database. Default is TRUE. Note that makeblastdb can fail if there are too many gaps in the sequence.
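
The ranks and delim arguments together describe how taxonomy is encoded in the reference fasta headers. The base R sketch below shows that relationship only; it is not the function's internal code. It assumes a hypothetical header made up solely of rank values joined by delim, in the same order as ranks. Real reference databases may carry extra fields (such as accession numbers) that this sketch does not handle.

```r
# Illustrative only: the header layout below is an assumption, not taken from
# the package. It holds one value per rank, joined by the delimiter.
ranks <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
delim <- ";"
header <- "Metazoa;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;Drosophila melanogaster"

# Split on the literal delimiter (fixed = TRUE avoids regex interpretation)
parts <- strsplit(header, delim, fixed = TRUE)[[1]]

# The number of fields must match the number of ranks supplied
stopifnot(length(parts) == length(ranks))

# Label each field with its rank so it can be looked up by name
lineage <- parts
names(lineage) <- ranks

print(lineage[["Genus"]])
print(lineage[["Species"]])
```

If the number of fields in a header does not match the length of ranks, the stopifnot check fails. In practice, this is a sign that ranks or delim does not describe the database headers.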