USEARCH v12

SINTAX downloads

See also
Microbial taxonomy
Which taxonomy database should I use?

FASTA files reformatted with SINTAX-compatible taxonomy annotations.
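This page does not describe the annotation format itself. As a minimal sketch, assuming the files follow the usual USEARCH convention of a `tax=` field in the FASTA label (semicolon-delimited fields, comma-separated `rank:name` pairs with one-letter rank codes), the taxonomy of each reference sequence could be read as shown below. The function names `parse_tax` and `read_labels`, the rank-code table, and the example label are illustrative assumptions, not part of USEARCH.

```python
import io

# Assumed one-letter rank codes used in tax= annotations.
RANKS = {"d": "domain", "k": "kingdom", "p": "phylum", "c": "class",
         "o": "order", "f": "family", "g": "genus", "s": "species"}


def parse_tax(label):
    """Return (sequence id, {rank name: taxon name}) for one FASTA label line."""
    fields = label.lstrip(">").strip().split(";")
    seq_id = fields[0]
    taxonomy = {}
    for field in fields[1:]:
        if field.startswith("tax="):
            # Split "d:Bacteria,p:Firmicutes,..." into rank/name pairs.
            for item in field[len("tax="):].split(","):
                code, _, name = item.partition(":")
                if name:
                    taxonomy[RANKS.get(code, code)] = name
    return seq_id, taxonomy


def read_labels(handle):
    """Yield parsed labels from an open FASTA text handle, skipping sequence lines."""
    for line in handle:
        if line.startswith(">"):
            yield parse_tax(line)


# Illustrative label only; real files may differ in detail.
example = (">AB008314;tax=d:Bacteria,p:Firmicutes,c:Bacilli,"
           "o:Lactobacillales,f:Streptococcaceae,g:Streptococcus;\n"
           "ACGTACGT\n")
records = list(read_labels(io.StringIO(example)))
```

To scan one of the compressed downloads directly, the same generator could be given a handle opened with `gzip.open("rdp_16s_v16.fa.gz", "rt")`. The sketch does not handle quoted names that contain commas.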

16S
rdp_16s_v16.fa.gz RDP training set v16 (13k seqs.).
RDP license terms.
rdp_16s_v16_sp.fa.gz RDP training set with species names (not recommended; see "Can species be predicted?").
gg_16s_13.5.fa.gz Greengenes v13.5 (1.2M seqs.) (not recommended).
Greengenes license terms.
silva_16s_v123.fa.gz SILVA v123 (1.6M seqs.) (not recommended).
SILVA license terms.
ltp_16s_v123.fa.gz SILVA v123 LTP named isolate subset (12k seqs.).
SILVA license terms.

ITS
UNITE (current "utax" version at unite.ut.ee) (53k sequences in v7.1).
UNITE license terms.
rdp_its_v2.fa.gz RDP Warcup training set v2 (18k sequences).
RDP license terms.

18S
silva_18s_v123.fa.gz SILVA v123 eukaryotic 18S subset (140k seqs.).
SILVA license terms.

References (please cite)
R.C. Edgar (2016), SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS sequences, https://doi.org/10.1101/074161
• SINTAX taxonomy prediction algorithm
• Fast and simple method, accuracy comparable to RDP Classifier

R.C. Edgar (2018), Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences, PeerJ 6:e4652
• Cross-validation by identity, a novel benchmark strategy enabling realistic accuracy estimates
• Genus accuracy of best methods is 50% on V4 sequences
• Recent algorithms do not improve on RDP Classifier or SINTAX

R.C. Edgar (2018), Taxonomy annotation and guide tree errors in 16S rRNA databases, PeerJ 6:e5030
• Approximately one in five SILVA and Greengenes taxonomy annotations are wrong
• SILVA and Greengenes trees have pervasive conflicts with type strain taxonomies