USEARCH v12

Publications

R.C. Edgar (2018), Taxonomy annotation and guide tree errors in 16S rRNA databases , PeerJ 6:e5030
• Approx. one in five SILVA and Greengenes taxonomy annotations are wrong

• SILVA and Greengenes trees have pervasive conflicts with type strain taxonomies


R.C. Edgar (2018), Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences , PeerJ 6:e4652
• Cross-validation by identity, novel benchmark strategy enabling realistic accuracy estimates

• Genus accuracy of best methods is 50% on V4 sequences

• Recent algorithms do not improve on RDP Classifier or SINTAX


R.C. Edgar and H. Flyvbjerg (2018), Octave plots for visualizing diversity of microbial OTUs , https://doi.org/10.1101/389833
• Octave plots visualize alpha diversity as a histogram

• Plots show shape and completeness of distribution
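
As a rough illustration of the histogram described above, here is a minimal sketch. It assumes that octave k collects the OTUs whose abundance lies between 2^k and 2^(k+1) - 1 reads. The function name `octave_histogram` and the input format (a plain list of per-OTU read counts) are hypothetical and are not taken from USEARCH.

```python
from collections import Counter


def octave_histogram(abundances):
    """Count OTUs per octave, assuming octave k spans 2**k .. 2**(k+1) - 1 reads."""
    counts = Counter()
    for n in abundances:
        if n > 0:
            # For a positive integer n, bit_length() - 1 equals floor(log2(n)).
            counts[n.bit_length() - 1] += 1
    top = max(counts) if counts else -1
    # Return a dense list so that empty octaves appear as zeros.
    return [counts.get(k, 0) for k in range(top + 1)]


if __name__ == "__main__":
    # Hypothetical OTU abundances for one sample.
    print(octave_histogram([1, 1, 2, 3, 4, 7, 8, 100]))
```

Plotting the returned list as a bar chart gives the shape of the abundance distribution; a histogram that is still rising toward the lowest octaves suggests that many low-abundance OTUs have not been observed.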


R.C. Edgar (2018), UNCROSS2: identification of cross-talk in 16S rRNA OTU tables , https://doi.org/10.1101/400762
• Cross-talk rate is approx. 1% in many Illumina datasets

• Cross-talk can cause false positive core microbiome

• UNCROSS2 algorithm for filtering cross-talk


R.C. Edgar (2017), Accuracy of microbial community diversity estimated by closed- and open-reference OTUs , PeerJ 5:e3889
• QIIME closed- and open-reference clustering generates huge numbers of spurious OTUs

• Closed-reference OTU assignment splits strains and species even when there are no sequence errors

• Closed-reference fails to assign different hyper-variable regions to the same OTU

• Closed-reference discards many well-known species that are present in Greengenes


R.C. Edgar (2017), SEARCH_16S: A new algorithm for identifying 16S ribosomal RNA genes in contigs and chromosomes , https://doi.org/10.1101/124131

R.C. Edgar (2017), SINAPS: Prediction of microbial traits from marker gene sequences , https://doi.org/10.1101/124156

R.C. Edgar (2017), UNBIAS: An attempt to correct abundance bias in 16S sequencing, with limited success , https://doi.org/10.1101/124149
• Read abundance has very low correlation with species abundance

• Bias caused by gene copy count variation and primer mismatches

• Gene copy count and primer mismatches cannot be accurately predicted

• Impossible to correct abundance bias


R.C. Edgar (2018), Updating the 97% identity threshold for 16S ribosomal RNA OTUs , Bioinformatics 34(14) 2371-2375
• Standard 97% OTU identity threshold is too low

• Optimal OTU threshold is 99% for full-length 16S, 100% for V4


R.C. Edgar (2016), UNCROSS: Filtering of high-frequency cross-talk in 16S amplicon reads , https://doi.org/10.1101/088666
• Cross-talk is common: many reads are assigned to the wrong sample

• UNCROSS algorithm for filtering cross-talk


R.C. Edgar (2016), UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing , https://doi.org/10.1101/081257
• UNOISE2 algorithm, improved denoiser

• Reduces false-positive chimeras compared to UNOISE and DADA2


R.C. Edgar (2016), UCHIME2: improved chimera prediction for amplicon sequencing , https://doi.org/10.1101/074252
• UCHIME2 algorithm, improved chimera detection

• "Fake" chimeras are common, valid biological sequences matching two-parent model

• Perfect chimera filtering impossible even with complete and correct reference

• Realistic chimera benchmark


R.C. Edgar (2016), SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS sequences , https://doi.org/10.1101/074161
• SINTAX taxonomy prediction algorithm

• Fast and simple method, accuracy comparable to RDP Classifier


R.C. Edgar and H. Flyvbjerg (2015), Error filtering, pair assembly and error correction for next-generation sequencing reads , Bioinformatics 31(21) 3476-3482
• Quality filtering by expected errors

• Bayesian paired read assembler

• Most paired read assemblers calculate incorrect Q scores

• UNOISE algorithm, first denoiser for Illumina reads
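
The expected-error idea in the first bullet can be sketched as follows. The expected number of errors in a read is the sum of the per-base error probabilities implied by its Phred scores, P = 10^(-Q/10). This is a minimal sketch and not the USEARCH implementation: the function names are hypothetical, the quality string is assumed to use the common ASCII offset of 33, and the threshold of 1.0 is only an illustrative choice.

```python
def expected_errors(quals, offset=33):
    """Sum of per-base error probabilities, P = 10 ** (-Q / 10).

    `quals` is a FASTQ quality string; offset 33 is an assumption.
    """
    return sum(10 ** (-(ord(c) - offset) / 10) for c in quals)


def passes_filter(quals, max_ee=1.0):
    """Keep a read if its expected error count does not exceed max_ee."""
    return expected_errors(quals) <= max_ee


if __name__ == "__main__":
    high_quality = "I" * 100  # Q40 at every base
    low_quality = "+" * 100   # Q10 at every base
    print(expected_errors(high_quality), passes_filter(high_quality))
    print(expected_errors(low_quality), passes_filter(low_quality))
```

Unlike a filter based on average Q, this sum is dominated by the lowest-quality bases, so a few very poor positions are enough to reject a read.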


R.C. Edgar et al. (2011), UCHIME improves sensitivity and speed of chimera detection , Bioinformatics 27(16) 2194-2200
• Shows UCHIME faster and more accurate than ChimeraSlayer

• This paper reports misleading benchmark tests; see the critique in the UCHIME2 paper


R.C. Edgar (2013), UPARSE: highly accurate OTU sequences from microbial amplicon reads , Nat. Meth. 10, 996-998
• Describes UPARSE algorithm for 97% OTU clustering

• Stringent error filtering and discarding singletons necessary

• Highly accurate OTUs from paired reads without full overlap


R.C. Edgar (2010), Search and clustering orders of magnitude faster than BLAST , Bioinformatics 26(19) 2460-2461
• USEARCH algorithm

• Default citation for USEARCH software