See also
OTU / denoising analysis pipeline
UNOISE paper
Should I use UPARSE or UNOISE?
Uses the UNOISE algorithm to perform denoising (error-correction) of amplicon reads.
Errors are corrected as follows:
- Reads with sequencing errors are identified and corrected.
- Chimeras are removed.
Input is a set of quality-filtered unique read sequences with size=nnn; abundance annotations. See OTU / denoising pipeline for details of how reads should be pre-processed and how other types of errors and artifacts can be removed.
The input file must be sorted by decreasing abundance, i.e. by decreasing value of the size=nnn annotation. This can be done using the sortbysize command.
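A quick way to check the sort order before denoising is to scan the size= annotations in the FASTA headers. The sketch below is not part of usearch; is_sorted_by_size is a hypothetical helper name, and it assumes every header line carries a size=nnn annotation.

```shell
# Hypothetical helper, not a usearch command.
# Exit status 0 if the FASTA file is sorted by decreasing size=,
# 1 if a larger size= follows a smaller one,
# 2 if a header has no size= annotation.
is_sorted_by_size() {
  awk '
    /^>/ {
      if (!match($0, /size=[0-9]+/)) exit 2
      # Skip the 5 characters of "size=" to get the number.
      size = substr($0, RSTART + 5, RLENGTH - 5) + 0
      if (seen && size > prev) exit 1
      prev = size
      seen = 1
    }
  ' "$1"
}

# Usage (uniques.fa is the input file from the Example section):
#   is_sorted_by_size uniques.fa && echo "sorted by decreasing size"
```

If the check fails, re-sort the file with the sortbysize command before running unoise3.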
The algorithm is designed for Illumina reads; it does not work as well on 454, Ion Torrent or PacBio reads.
Predicted correct biological sequences are written to the -zotus file in FASTA format. Labels are formatted as Zotunnn, where nnn is 1, 2, 3...
Predicted correct amplicon sequences are written to the -ampout file in FASTA format. These include chimeras, so this output file is not generally needed in a production pipeline. Labels are formatted as Ampnnn;uniq=Uniqlabel;uniqsize=u;size=s; where nnn is 1, 2, 3..., Uniqlabel is the label in the input file, truncated at the first semi-colon, u is the size= annotation from the input file and s is the total size of reads derived from this amplicon.
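If the -ampout labels need to be inspected, the fields can be pulled apart by splitting on semi-colons. This is a minimal sketch that assumes the label layout described above with no embedded spaces; amp_labels_to_tsv is a hypothetical helper name, not a usearch command.

```shell
# Hypothetical helper, not a usearch command.
# Prints one tab-separated line per amplicon:
#   Amp label, input (unique) label, uniqsize, size
amp_labels_to_tsv() {
  awk '
    /^>/ {
      # Drop the leading ">" and split the label on ";".
      n = split(substr($0, 2), f, ";")
      uniq = ""; usize = ""; size = ""
      for (i = 2; i <= n; i++) {
        if (f[i] ~ /^uniq=/)          uniq  = substr(f[i], 6)
        else if (f[i] ~ /^uniqsize=/) usize = substr(f[i], 10)
        else if (f[i] ~ /^size=/)     size  = substr(f[i], 6)
      }
      print f[1] "\t" uniq "\t" usize "\t" size
    }
  ' "$1"
}

# Usage (amplicons.fa stands for whatever file was given to -ampout):
#   amp_labels_to_tsv amplicons.fa > amplicons.tsv
```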
An OTU table can be generated using the otutab command. See OTU / denoising pipeline.
The -minsize option specifies the minimum abundance (size= annotation). Default is 8. Input sequences with lower abundances are discarded. Most of the low-abundance sequences are usually noisy and will be mapped to a ZOTU by the otutab command. For higher sensitivity, reducing minsize to 4 is reasonable, especially if samples are denoised individually rather than pooled together, as I would usually recommend. With smaller minsize, there tend to be more errors in the predicted low-abundance biological sequences.
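To see how many input sequences a given threshold would keep, the size= annotations can be counted directly. The sketch below only mirrors the abundance cutoff described above (keep size >= minsize); it is not how usearch implements the filter, and count_at_minsize is a hypothetical helper name.

```shell
# Hypothetical helper, not a usearch command.
# Prints the number of sequences whose size= annotation is at least
# the given threshold. Headers without size= are ignored.
count_at_minsize() {
  awk -v min="$2" '
    /^>/ && match($0, /size=[0-9]+/) {
      if (substr($0, RSTART + 5, RLENGTH - 5) + 0 >= min + 0) n++
    }
    END { print n + 0 }
  ' "$1"
}

# Usage: compare the default threshold with the more sensitive setting.
#   count_at_minsize uniques.fa 8
#   count_at_minsize uniques.fa 4
```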
The -tabbedout option specifies a tabbed text filename which reports the processing done for each sequence, e.g. if it is classified as noisy or chimeric.
The -unoise_alpha option specifies the alpha parameter (see the UNOISE2 paper for its definition). Default is 2.0.
Example
usearch -unoise3 uniques.fa -zotus zotus.fa -tabbedout unoise3.txt