See also
Defining and interpreting OTUs
UPARSE OTUs with radius different from 3% (i.e., different from 97% identity)
In previous versions of USEARCH, the cluster_otus command had an otu_radius_pct option for specifying a radius different from the default of 3%. However, please note that using non-default values is not recommended.
The main reason is that chimera detection degrades. Each input sequence is run through UPARSE-REF using the current set of OTUs as a reference database, and the sequence is discarded if the optimal model is chimeric. With an OTU radius > 3%, chimera detection becomes more difficult because more true biological sequences are also discarded without creating new OTUs. The set of OTU sequences becomes sparser, so the correct parents of a chimera are more often missing from the OTU database. Chimeras can still be detected when there are OTUs sufficiently close to their parents, but the false negative rate tends to increase.
Chimera detection also gets more difficult when the OTU radius is < 3%. This is because you get many more false positives due to "fake models", where a correct biological sequence can be exactly reconstructed from segments of two other valid sequences. This surprising result is explained in detail in the UCHIME2 paper.
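To make the "fake model" idea concrete, here is a minimal toy sketch in POSIX shell. It is not UCHIME2 or UPARSE-REF; the function name find_crossover and the example sequences are hypothetical, and it assumes three equal-length, already aligned sequences with exact matching only. It reports the first crossover position at which the query equals a prefix of one sequence joined to a suffix of another.

```shell
# Toy illustration only (hypothetical helper, not part of USEARCH).
# Usage: find_crossover QUERY PARENT_A PARENT_B
# Prints the first position i such that QUERY equals the first i letters
# of PARENT_A followed by the remaining letters of PARENT_B.
# Returns non-zero if no such two-segment model exists.
find_crossover() {
    q=$1; a=$2; b=$3
    n=${#q}
    # This sketch assumes all three sequences have the same length.
    [ "${#a}" -eq "$n" ] && [ "${#b}" -eq "$n" ] || return 1
    i=1
    while [ "$i" -lt "$n" ]; do
        left_q=$(printf '%s' "$q" | cut -c1-"$i")
        left_a=$(printf '%s' "$a" | cut -c1-"$i")
        right_q=$(printf '%s' "$q" | cut -c"$((i + 1))"-)
        right_b=$(printf '%s' "$b" | cut -c"$((i + 1))"-)
        if [ "$left_q" = "$left_a" ] && [ "$right_q" = "$right_b" ]; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# Three made-up sequences that differ only at two positions.
find_crossover ACGTACTT ACGTACGT ACCTACTT
```

If all three sequences are genuine, the query has a perfect two-parent model even though it is not a chimera. Closely related sequences that differ at only a few positions make such coincidences more likely, which is the situation that arises when OTUs are packed closer together than 3%.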
Recommended: make OTUs with 100% clustering identity
My current recommendation is to use the UNOISE error-correction (denoising) algorithm to reconstruct the set of correct biological sequences in the reads. These sequences are valid OTUs which I call "ZOTUs" (zero-radius OTUs). This is better than traditional 97% clustering because it gives better phenotype resolution: it allows you to distinguish species and strains which would be lumped together at 97%. See the unoise2 command for details.
Recommended procedure for OTUs with clustering identity <100%
To make OTUs at identities different from 97%, the best method is to use UNOISE followed by UCLUST, e.g. the unoise3 command followed by cluster_smallmem. For example, to make OTUs at 100%, 99%, 97%, 95% and 90% identity:
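# Denoise the unique sequences into ZOTUs (OTUs at 100% identity)
usearch -unoise3 uniques.fa -zotus otus100.fa
# Cluster the ZOTUs at each lower identity threshold
for id in 99 97 95 90
do
    usearch -cluster_smallmem otus100.fa -id 0.$id -centroids otus$id.fa
done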
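As a quick sanity check on the output of the loop above, you can count how many OTUs were produced at each identity. The sketch below is a hypothetical helper (count_otus is not a USEARCH command) and assumes only that the output files are ordinary FASTA, where every record begins with a '>' header line.

```shell
# Hypothetical helper, not part of USEARCH.
# Usage: count_otus FILE...
# Prints one line per FASTA file: the file name, a tab, and the number
# of sequences (lines starting with '>').
count_otus() {
    for f in "$@"; do
        printf '%s\t%s\n' "$f" "$(grep -c '^>' "$f")"
    done
}
```

For example, `count_otus otus100.fa otus99.fa otus97.fa otus95.fa otus90.fa` would list the number of OTUs in each file. The count would normally be expected to shrink as the identity threshold decreases, since each centroid then absorbs more ZOTUs.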