Publications
R.C. Edgar (2018), Taxonomy annotation and guide tree errors in 16S rRNA databases, PeerJ 6:e5030
• Approx. one in five SILVA and Greengenes taxonomy annotations are wrong
• SILVA and Greengenes trees have pervasive conflicts with type strain taxonomies
R.C. Edgar (2018), Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences, PeerJ 6:e4652
• Cross-validation by identity, novel benchmark strategy enabling realistic accuracy estimates
• Genus accuracy of best methods is 50% on V4 sequences
• Recent algorithms do not improve on RDP Classifier or SINTAX
R.C. Edgar and H. Flyvbjerg (2018), Octave plots for visualizing diversity of microbial OTUs, https://doi.org/10.1101/389833
• Octave plots visualize alpha diversity as a histogram
• Plots show shape and completeness of distribution
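As a rough illustration of the histogram idea, the sketch below bins OTU abundances into octaves, assuming octave k holds abundances from 2^k to 2^(k+1) - 1 (1, 2-3, 4-7, 8-15, ...). The function name `octave_counts` and the exact binning convention are assumptions made for this example, not taken from the paper.

```python
from collections import Counter


def octave_counts(abundances):
    """Count OTUs per octave, where octave k covers abundances 2**k .. 2**(k+1) - 1.

    Hypothetical helper for illustration; zero or negative abundances are skipped.
    """
    counts = Counter()
    for a in abundances:
        if a > 0:
            # For a positive integer, bit_length() - 1 equals floor(log2(a)).
            counts[a.bit_length() - 1] += 1
    if not counts:
        return []
    # Return a dense list so empty octaves show up as zero-height bars.
    return [counts.get(k, 0) for k in range(max(counts) + 1)]


if __name__ == "__main__":
    for k, n in enumerate(octave_counts([1, 1, 2, 3, 4, 7, 8, 100])):
        print(f"octave {k} (abundance >= {2 ** k}): {'#' * n}")
```

A dense list is returned so that gaps in the distribution stay visible when the bars are drawn.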
R.C. Edgar (2018), UNCROSS2: identification of cross-talk in 16S rRNA OTU tables, https://doi.org/10.1101/400762
• Cross-talk rate is approx. 1% in many Illumina datasets
• Cross-talk can cause false positive core microbiome
• UNCROSS2 algorithm for filtering cross-talk
R.C. Edgar (2017), Accuracy of microbial community diversity estimated by closed- and open-reference OTUs, PeerJ 5:e3889
• QIIME closed- and open-reference clustering generates huge numbers of spurious OTUs
• Closed-reference OTU assignment splits strains and species even when there are no sequence errors
• Closed-reference fails to assign different hyper-variable regions to the same OTU
• Closed-reference discards many well-known species that are present in Greengenes
R.C. Edgar (2017), SEARCH_16S: A new algorithm for identifying 16S ribosomal RNA genes in contigs and chromosomes, https://doi.org/10.1101/124131
R.C. Edgar (2017), SINAPS: Prediction of microbial traits from marker gene sequences, https://doi.org/10.1101/124156
R.C. Edgar (2017), UNBIAS: An attempt to correct abundance bias in 16S sequencing, with limited success, https://doi.org/10.1101/124149
• Read abundance has very low correlation with species abundance
• Bias caused by gene copy count variation and primer mismatches
• Gene copy count and primer mismatches cannot be accurately predicted
• Impossible to correct abundance bias
R.C. Edgar (2018), Updating the 97% identity threshold for 16S ribosomal RNA OTUs, Bioinformatics 34(14) 2371-2375
• Standard 97% OTU identity threshold is too low
• Optimal OTU threshold is 99% for full-length 16S, 100% for V4
R.C. Edgar (2016), UNCROSS: Filtering of high-frequency cross-talk in 16S amplicon reads, https://doi.org/10.1101/088666
• Cross-talk, where reads are assigned to the wrong sample, is common
• UNCROSS algorithm for filtering cross-talk
R.C. Edgar (2016), UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing, https://doi.org/10.1101/081257
• UNOISE2 algorithm, improved denoiser
• Reduces false-positive chimeras compared to UNOISE and DADA2
R.C. Edgar (2016), UCHIME2: improved chimera prediction for amplicon sequencing, https://doi.org/10.1101/074252
• UCHIME2 algorithm, improved chimera detection
• "Fake" chimeras are common, valid biological sequences matching two-parent model
• Perfect chimera filtering impossible even with complete and correct reference
• Realistic chimera benchmark
R.C. Edgar (2016), SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS sequences, https://doi.org/10.1101/074161
• SINTAX taxonomy prediction algorithm
• Fast and simple method, accuracy comparable to RDP Classifier
R.C. Edgar and H. Flyvbjerg (2015), Error filtering, pair assembly and error correction for next-generation sequencing reads, Bioinformatics 31(21) 3476-3482
• Quality filtering by expected errors
• Bayesian paired read assembler
• Most paired read assemblers calculate incorrect Q scores
• UNOISE algorithm, first denoiser for Illumina reads
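For readers unfamiliar with expected-error filtering, here is a minimal sketch. It assumes Phred-scaled quality scores, where the error probability of a base is 10^(-Q/10), and takes the expected number of errors in a read to be the sum of those probabilities. The function names, the ASCII offset of 33, and the threshold of 1.0 are illustrative assumptions, not the paper's implementation.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (assumes offset 33)."""
    return [ord(c) - offset for c in quality_string]


def expected_errors(quals):
    """Sum of per-base error probabilities, P = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_filter(quals, max_ee=1.0):
    """Keep a read only if its expected error count does not exceed max_ee.

    max_ee=1.0 is an illustrative threshold chosen for this sketch.
    """
    return expected_errors(quals) <= max_ee


if __name__ == "__main__":
    quals = phred_scores("IIIIIIIIII5555+++++")
    print(round(expected_errors(quals), 4), passes_filter(quals))
```

Summing probabilities rather than averaging Q scores means a few very low-quality bases count heavily against the read.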
R.C. Edgar et al. (2011), UCHIME improves sensitivity and speed of chimera detection, Bioinformatics 27(16) 2194-2200
• Shows UCHIME faster and more accurate than ChimeraSlayer
• This paper reports misleading benchmark tests; see the critique in the UCHIME2 paper
R.C. Edgar (2013), UPARSE: highly accurate OTU sequences from microbial amplicon reads, Nat. Meth. 10, 996-998
• Describes UPARSE algorithm for 97% OTU clustering
• Stringent error filtering and discarding singletons necessary
• Highly accurate OTUs from paired reads without full overlap
R.C. Edgar (2010), Search and clustering orders of magnitude faster than BLAST, Bioinformatics 26(19) 2460-2461
• USEARCH algorithm
• Default citation for USEARCH software