The maxaccepts and maxrejects options
The termination options -maxaccepts and -maxrejects are supported by most search and clustering commands. These options cause the search for a given query sequence to stop if a given number of accepts (target sequences that meet the accept criteria) or rejects (target sequences that were processed but failed to meet those criteria) have occurred. Early search termination can give dramatic improvements in speed, often with minimal or no cost in sensitivity. See USEARCH algorithm for discussion of why "U-sorting" with termination is an effective speed optimization.
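The stopping rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not USEARCH's implementation: the function name `search_with_termination`, the `is_accept` callback and the (name, identity) tuples are assumptions made for the example. A limit of 0 disables that condition, as in the defaults table on this page.

```python
def search_with_termination(targets, is_accept, maxaccepts=1, maxrejects=32):
    """Scan targets in the given order and stop at the first limit reached.

    Hypothetical helper for illustration; a limit of 0 disables that condition.
    """
    hits = []
    rejects = 0
    for target in targets:
        if is_accept(target):
            hits.append(target)
            if maxaccepts and len(hits) >= maxaccepts:
                break  # enough accepts: stop searching
        else:
            rejects += 1
            if maxrejects and rejects >= maxrejects:
                break  # too many rejects: give up on this query
    return hits


# Toy targets as (name, fractional identity), already in search order.
targets = [("a", 0.80), ("b", 0.97), ("c", 0.99)]


def accept(target):
    # Assumed accept criterion for the example: identity of at least 0.95.
    return target[1] >= 0.95


first_hit = search_with_termination(targets, accept, maxaccepts=1, maxrejects=32)
all_hits = search_with_termination(targets, accept, maxaccepts=0, maxrejects=0)
```

With maxaccepts=1 the scan stops at the first accepted target, so a better target further down the list ("c" here) is never examined.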
Other termination options
-termid: terminate the search when a target's identity drops below the given value, specified as a fractional identity in the range 0.0 to 1.0.
-termidd: terminate the search when the difference (maxid - minid) exceeds the given value, where maxid and minid are the maximum and minimum identities found so far.
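As a rough sketch of how these two conditions could be evaluated, the hypothetical Python function below checks them against the identities seen so far. The function name, the use of `None` to mean "option not set", and the assumption that -termid is tested against the most recently processed target are all illustrative choices, not details taken from USEARCH.

```python
def should_terminate(identities, termid=None, termidd=None):
    """Return True if either identity-based termination condition holds.

    identities: fractional identities (0.0 to 1.0) of the targets processed
    so far, in search order. None means the option is not set (assumption).
    """
    if not identities:
        return False
    # -termid: the latest target identity dropped below the threshold.
    if termid is not None and identities[-1] < termid:
        return True
    # -termidd: spread between best and worst identity exceeds the threshold.
    if termidd is not None and max(identities) - min(identities) > termidd:
        return True
    return False
```

For example, `should_terminate([0.98, 0.85], termid=0.9)` is true because the second identity is below 0.9, while `should_terminate([0.98, 0.95], termidd=0.05)` is false because the spread is only 0.03.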
Comprehensive search
Roughly speaking, a search of the complete database is specified by disabling the maxaccepts and maxrejects termination options, which is done by setting -maxaccepts 0 -maxrejects 0. This is the default for the ublast command, but not for clustering and search commands based on the USEARCH algorithm; see the table below for the default values for each command. However, disabling termination does not strictly guarantee a complete search: with commands based on the USEARCH and UBLAST algorithms, a database sequence will not be aligned if it has no words (or seeds) in common with the query sequence. For a truly comprehensive search, use search_global or search_local.
Discussion
Termination conditions are combined with OR, so the first one to be satisfied causes the search to stop (unlike accept criteria, which are combined with AND).
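The difference between the two combination rules can be illustrated with Python's `any` and `all`. The criteria below (an identity threshold, a query coverage threshold, and accept/reject limits of 1 and 32) are example values chosen for the sketch, and the dictionary keys are hypothetical names.

```python
def is_accepted(hit, accept_criteria):
    # Accept criteria are combined with AND: every criterion must hold.
    return all(criterion(hit) for criterion in accept_criteria)


def should_stop(counts, termination_conditions):
    # Termination conditions are combined with OR: any one is enough.
    return any(condition(counts) for condition in termination_conditions)


# Example criteria with assumed thresholds and hypothetical field names.
accept_criteria = [
    lambda hit: hit["identity"] >= 0.97,
    lambda hit: hit["query_cov"] >= 0.90,
]
termination_conditions = [
    lambda counts: counts["accepts"] >= 1,
    lambda counts: counts["rejects"] >= 32,
]
```

A hit with high identity but low coverage fails `is_accepted`, whereas reaching the reject limit alone is enough for `should_stop` to return true.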
By default, termination options are enabled only for clustering and search commands based on the USEARCH algorithm. This is because USEARCH tests database sequences (targets) in order of decreasing number of words in common between the query and target sequence. This order correlates well with sequence similarity, so the best hit(s) are likely to be found quickly.
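The ordering idea can be sketched as follows. This is a simplified illustration that counts distinct shared words of length k; the function names, the word length of 3 and the counting scheme are assumptions for the example and do not reproduce USEARCH's actual word counting or indexing.

```python
def words(seq, k=3):
    """Return the set of distinct words (k-mers) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def u_sort(query, targets, k=3):
    """Order target names by decreasing number of words shared with the query.

    targets: dict mapping name -> sequence. Targets that share no words with
    the query are dropped, so they would never be aligned.
    """
    query_words = words(query, k)
    counts = {name: len(query_words & words(seq, k)) for name, seq in targets.items()}
    candidates = [name for name in counts if counts[name] > 0]
    return sorted(candidates, key=lambda name: -counts[name])


targets = {"t1": "TTTTTTTT", "t2": "ACGTACGA", "t3": "ACGTTTTT"}
order = u_sort("ACGTACGT", targets)
```

Here t2 shares the most words with the query and is tested first, and t1 shares none and is skipped, which is why a search can stop early with a good chance of having already seen the best hit.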
With ublast, search_local and search_global, targets are compared to the query in an order that does not correlate with sequence similarity or E-value. With these commands, the first accepted hit is not expected to be close to the best possible hit. However, termination options can still be useful; see weak hits for discussion and examples.
If maxaccepts is set to a value > 1, then more than one hit may be reported per query. In this case, it is usually recommended to increase maxrejects also, because it will often be necessary to search further into the list of candidate target sequences to find more than one hit.
The maxaccepts and maxrejects options can be used to tune speed against sensitivity. Smaller values of both parameters tend to improve speed by reducing the number of alignments that must be computed per query. For example, with cluster_fast, the default value of maxrejects is reduced from 32 to 8 in order to achieve higher speed. Increasing either value tends to result in slower execution because more alignments must be computed. Increasing maxrejects tends to improve sensitivity by reducing the number of false negatives, i.e. target sequences that would be accepted but are not tested because they are too far down the list in word-count order.
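A toy simulation makes the trade-off concrete. The sketch below counts how many alignments are computed for one query under different limits; the identity list, the 0.95 threshold and the function name `count_alignments` are invented for the example and are not measurements from USEARCH.

```python
def count_alignments(identities, min_id, maxaccepts, maxrejects):
    """Return (accepts, alignments computed) for one query.

    identities: target identities in word-count order (toy data).
    A limit of 0 disables that termination condition.
    """
    accepts = rejects = aligned = 0
    for identity in identities:
        aligned += 1  # each tested target costs one alignment
        if identity >= min_id:
            accepts += 1
        else:
            rejects += 1
        if (maxaccepts and accepts >= maxaccepts) or (maxrejects and rejects >= maxrejects):
            break
    return accepts, aligned


# The only acceptable target is 11th in the list, behind ten rejects.
identities = [0.80] * 10 + [0.97] + [0.80] * 50

fast = count_alignments(identities, 0.95, maxaccepts=1, maxrejects=8)
default = count_alignments(identities, 0.95, maxaccepts=1, maxrejects=32)
exhaustive = count_alignments(identities, 0.95, maxaccepts=0, maxrejects=0)
```

With maxrejects=8 the search stops after 8 alignments and misses the hit (a false negative), with maxrejects=32 it finds the hit after 11 alignments, and with both limits disabled it computes all 61 alignments.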
With translated searches, termination conditions apply to each ORF separately. This is because the nucleotide query sequence might span more than one gene.
Command | Uses USEARCH? | maxaccepts default (0 = disabled) | maxrejects default (0 = disabled) |
usearch_global | Yes | 1 | 32 |
usearch_local | Yes | 1 | 32 |
cluster_smallmem | Yes | 1 | 32 |
cluster_fast | Yes | 1 | 8 |
ublast | No | 0 | 0 |
search_local | No | 0 | 0 |
search_global | No | 0 | 0 |
otutab, closed_ref | Yes | 4 (v10), 8 (v11) | 64 (v10), 256 (v11) |