USEARCH v12

Closed-reference OTU assignment algorithm

See also
closed_ref command
Problems with closed- and open-reference OTU assignment

Closed-reference OTU assignment (Rideout et al. 2014) assigns query sequences to OTUs by searching a pre-defined database of full-length sequences that have been clustered at 97% identity. In QIIME, it is implemented by the pick_closed_reference_otus.py script, which uses a default database obtained by clustering Greengenes. In QIIME v1.9, the database search is performed by uclust, an older software package that was the predecessor of usearch.

In usearch, a similar algorithm is implemented in the closed_ref command. The USEARCH algorithm is used to search the database. Compared to the usearch_global command, different parameters are used to improve sensitivity and to report cases where two or more database sequences are tied for the highest identity. Ties are broken systematically by reporting the first hit in database file order.
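
The tie-breaking rule described above can be illustrated with a minimal Python sketch. This is not the usearch implementation: the function names (fractional_identity, assign_closed_ref) are hypothetical, and the identity measure is a toy position-by-position comparison with no alignment, standing in for the real USEARCH search. The sketch assumes the database is held as a list of (label, sequence) pairs in file order and that a query is assigned only if its best hit reaches the identity threshold.

```python
from typing import List, Optional, Tuple


def fractional_identity(a: str, b: str) -> float:
    """Toy identity (assumption): matching positions divided by the longer
    length. A real search would align the sequences first."""
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def assign_closed_ref(
    query: str,
    database: List[Tuple[str, str]],
    min_identity: float = 0.97,
) -> Tuple[Optional[str], List[str]]:
    """Return (assigned_label, tied_labels) for one query.

    database is scanned in file order, so tied_labels is in file order and
    the assigned label is simply its first element. If no database sequence
    reaches min_identity, the query is unassigned: (None, []).
    """
    best_identity = 0.0
    tied: List[str] = []
    for label, seq in database:
        identity = fractional_identity(query, seq)
        if identity < min_identity:
            continue  # below threshold, not a hit
        if identity > best_identity:
            best_identity = identity
            tied = [label]  # new best hit, discard earlier ties
        elif identity == best_identity:
            tied.append(label)  # tied for the highest identity
    if not tied:
        return None, []
    return tied[0], tied


# Hypothetical three-sequence reference database, in file order.
db = [
    ("otu1", "ACGTACGTAC"),
    ("otu2", "ACGTACGTAA"),
    ("otu3", "ACGTACGTAG"),
]

# The query differs from every reference at the last position only, so all
# three are tied; the threshold is lowered for this short toy example.
assigned, ties = assign_closed_ref("ACGTACGTAT", db, min_identity=0.9)
print(assigned, ties)
```

Because the assigned OTU depends on file order whenever there is a tie, reordering the database in this sketch can change the assignment even though the reported identity stays the same.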


Reference (please cite)
R.C. Edgar (2017), Accuracy of microbial community diversity estimated by closed- and open-reference OTUs, PeerJ 5:e3889
• QIIME closed- and open-reference clustering generates huge numbers of spurious OTUs

• Closed-reference OTU assignment splits strains and species even when there are no sequence errors

• Closed-reference fails to assign different hyper-variable regions to the same OTU

• Closed-reference discards many well-known species that are present in Greengenes