USEARCH v12

Taxonomy confidence measures

See also
SINTAX algorithm
Naive Bayesian Classifier algorithm
sintax command
nbc_tax command

The sintax and nbc_tax commands generate taxonomy predictions with confidence estimates calculated by bootstrapping.
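As a rough illustration of what a bootstrap confidence means, the sketch below assumes the confidence is the fraction of bootstrap replicates that agree on the most frequent name. This is a simplification, not the exact procedure of either command (see the SINTAX and Naive Bayesian Classifier algorithm pages for those). The function name bootstrap_confidence and the genus names and counts are hypothetical.

```python
from collections import Counter

def bootstrap_confidence(replicate_names):
    """Return (most frequent name, fraction of replicates giving that name).

    replicate_names: one predicted name per bootstrap replicate.
    Assumption: confidence = agreement frequency among replicates.
    """
    counts = Counter(replicate_names)
    name, n = counts.most_common(1)[0]
    return name, n / len(replicate_names)

# Hypothetical outcome of 100 bootstrap replicates for one query at genus rank.
replicates = ["Bacillus"] * 90 + ["Paenibacillus"] * 10
name, conf = bootstrap_confidence(replicates)
print(name, conf)  # Bacillus 0.9
```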

The definition and interpretation of a taxonomy prediction confidence estimate are not as simple as they might appear. Ideally, the error rate of predictions with confidence 0.9 should be approximately 10%, but in practice the error rate depends on the query dataset and on unknown characteristics of the reference dataset. A single query sequence illustrates this: its prediction is either right or wrong, so the measured accuracy will be 1.0 or 0.0, regardless of the confidence reported by the classifier.
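The gap between reported confidence and observed error rate can be checked on a query set with known taxonomy. The sketch below is a minimal illustration, not part of USEARCH. The function name observed_error_rate and all data values are hypothetical, and it assumes each prediction has been reduced to a (confidence, is_correct) pair.

```python
def observed_error_rate(predictions, lo, hi):
    """Fraction of wrong predictions among those with lo <= confidence < hi.

    predictions: list of (confidence, is_correct) pairs from a query set
    whose true taxonomy is known.
    Returns None if no prediction falls in the range.
    """
    selected = [ok for conf, ok in predictions if lo <= conf < hi]
    if not selected:
        return None
    return sum(1 for ok in selected if not ok) / len(selected)

# Hypothetical query set where confidence 0.9 matches a 10% error rate.
well_matched = [(0.9, True)] * 9 + [(0.9, False)]
print(observed_error_rate(well_matched, 0.85, 0.95))  # 0.1

# Hypothetical second query set: same reported confidence, higher error rate.
harder_queries = [(0.9, True)] * 6 + [(0.9, False)] * 4
print(observed_error_rate(harder_queries, 0.85, 0.95))  # 0.4

# A single query: the error rate can only be 0.0 or 1.0.
print(observed_error_rate([(0.9, True)], 0.85, 0.95))  # 0.0
```

The same reported confidence of 0.9 corresponds to different observed error rates depending on which queries are evaluated, which is why a confidence value cannot be read directly as an error probability.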


References (please cite)
R.C. Edgar (2016), SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS sequences, https://doi.org/10.1101/074161
• SINTAX taxonomy prediction algorithm
• Fast and simple method, accuracy comparable to RDP Classifier

R.C. Edgar (2018), Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences, PeerJ 6:e4652
• Cross-validation by identity, novel benchmark strategy enabling realistic accuracy estimates
• Genus accuracy of best methods is 50% on V4 sequences
• Recent algorithms do not improve on RDP Classifier or SINTAX

R.C. Edgar (2018), Taxonomy annotation and guide tree errors in 16S rRNA databases, PeerJ 6:e5030
• Approx. one in five SILVA and Greengenes taxonomy annotations are wrong
• SILVA and Greengenes trees have pervasive conflicts with type strain taxonomies