UCHIME2 is an algorithm for detecting chimeric sequences. It is an update of the UCHIME algorithm with some new features. It is implemented in the uchime2_ref command. See the UCHIME2 paper for details.
The uchime2_denovo command is obsolete. It is replaced by the uchime3_denovo command, which implements the chimera filtering step in unoise3. This is the same algorithm described in the UCHIME2 paper, except that parameters have been adjusted to reduce the number of false positives.
I do not recommend using uchime2_ref in a 16S or ITS analysis pipeline. The problem is that you will get high rates of both false positives (FPs) and false negatives (FNs) unless you have denoised sequences, in which case chimera removal is a special case which is built into the denoising code (see the UCHIME2 paper for details).
It is better to use unoise3 or cluster_otus for chimera filtering. I believe that unoise is a better approach because it has better resolution: it reconstructs the biological sequences in the reads without 97% clustering. This enables you to resolve species and strains which are >97% similar to each other.
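To make the resolution point concrete, here is a minimal Python sketch. It is not part of usearch and does not reproduce how unoise3 or cluster_otus work internally. The sequences `strain_a` and `strain_b` and the helper `percent_identity` are hypothetical, and the simple position-by-position comparison assumes the two sequences are already aligned and of equal length. It shows only that two sequences at 98% identity would pass a 97% identity threshold, while comparing exact sequences keeps them distinct.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Two hypothetical 100-nt sequences that differ at exactly two positions
# (the first and the last base).
strain_a = "ACGT" * 25
strain_b = "TCGT" + "ACGT" * 23 + "ACGA"

pct_id = percent_identity(strain_a, strain_b)

# Simplified view of a 97% threshold: the pair is similar enough to be grouped.
merged_at_97 = pct_id >= 97.0

# Exact-sequence comparison still tells the two apart.
distinct_sequences = strain_a != strain_b

print(f"identity: {pct_id:.1f}%")
print(f"grouped together at 97%: {merged_at_97}")
print(f"distinct as exact sequences: {distinct_sequences}")
```

Real OTU clustering is greedy and depends on which sequences become centroids, so the threshold test above is a simplification of the idea rather than a prediction of what any particular run will do.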
I recommend you use the largest available reference database for uchime2_ref, e.g. SILVA for 16S or UNITE for ITS. My previous advice to use a small, high-quality database was misguided (wrong!). You need a large database to get decent sensitivity; a small database like "gold" will probably be missing many parents in practice.