Clusters sequences in a FASTA or FASTQ file using a variant of the UCLUST algorithm designed to maximize speed.
An identity threshold must be specified using the -id option.
Sequences are processed in the order specified by the -sort option, which may be other (the default), length or size. See UCLUST sort order for discussion. If -sort length is specified, then sequences are processed in order of decreasing length. This is most appropriate when fragments are present together with full-length sequences. If -sort size is specified, then sequences are processed in order of decreasing size annotation. This can be useful for clustering of amplicon reads such as 16S or ITS tags, though cluster_otus is usually recommended for this task. If -sort other is used (the default), then the input sequences are processed in the order they appear in the input file.
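The greedy, order-dependent strategy described above can be sketched in Python as follows. This is a simplified illustration, not USEARCH's implementation: it compares each input against every existing centroid in creation order and accepts the first one at or above the identity threshold, whereas UCLUST uses word-based heuristics to decide which centroids to try. The identity measure is a crude stand-in based on edit distance, and the names greedy_cluster, identity and edit_distance are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Crude stand-in for alignment identity (NOT USEARCH's definition)."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def greedy_cluster(seqs, min_id, sort="other", sizes=None):
    """Return (centroid indexes, {sequence index: cluster number})."""
    order = list(range(len(seqs)))
    if sort == "length":
        order.sort(key=lambda i: -len(seqs[i]))  # stable sort: ties keep input order
    elif sort == "size":
        if sizes is None:
            raise ValueError("sort='size' needs a size for every sequence")
        order.sort(key=lambda i: -sizes[i])
    elif sort != "other":
        raise ValueError(f"unknown sort order: {sort}")

    centroids = []   # indexes of sequences that became centroids
    assignment = {}  # sequence index -> cluster number
    for i in order:
        for cluster_num, c in enumerate(centroids):
            if identity(seqs[i], seqs[c]) >= min_id:
                assignment[i] = cluster_num  # joins the first matching cluster
                break
        else:
            assignment[i] = len(centroids)   # no match: start a new cluster
            centroids.append(i)
    return centroids, assignment
```

With inputs ["ACGT", "ACGTACGTAC", "ACGTACGTAA"], length order makes the 10-letter sequence the first centroid, while input order makes "ACGT" a centroid first, which is why the sort order can change which sequences end up as representatives.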
Reverse-complemented matching for nucleotide sequences can be specified by using -strand both.
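A minimal sketch of what -strand both adds: the query is compared to the target as given and also as its reverse complement, and the better of the two is kept. The toy_identity function is a hypothetical placeholder that only handles equal-length, ungapped sequences; real matching uses alignments.

```python
# Complement table for DNA/RNA letters (U pairs with A); N stays N.
COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def revcomp(seq: str) -> str:
    """Reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def toy_identity(a: str, b: str) -> float:
    """Fraction of matching positions; toy stand-in assuming equal lengths."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_strand(query: str, target: str, strand: str = "plus"):
    """Return (identity, strand); '-' means the query matched as its reverse complement."""
    best = (toy_identity(query, target), "+")
    if strand == "both":
        minus = (toy_identity(revcomp(query), target), "-")
        if minus[0] > best[0]:
            best = minus
    return best
```

For example, "AACG" does not match "CGTT" on the plus strand, but its reverse complement is exactly "CGTT", so the match is found only when both strands are searched.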
Size annotations may be generated and/or propagated by using the -sizein and/or -sizeout options.
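Size annotations are carried in sequence labels as ;size=N;. The sketch below shows one way to read and rewrite them, assuming that with -sizein a cluster's size is the sum of its members' sizes and that without it each sequence counts as 1. The helper names are hypothetical, and this is a reading of the options rather than USEARCH's code.

```python
import re

# Matches a ";size=N" annotation, with or without the trailing semicolon.
SIZE_RE = re.compile(r";size=(\d+);?")


def get_size(label: str) -> int:
    """Size from a label such as 'Uniq1;size=10;'; unannotated labels count as 1."""
    m = SIZE_RE.search(label)
    return int(m.group(1)) if m else 1


def strip_size(label: str) -> str:
    """Remove any size annotation, keeping other ;key=value; fields."""
    return SIZE_RE.sub(";", label).rstrip(";")


def cluster_size(member_labels, sizein: bool = True) -> int:
    """Sum of member sizes (sizein), otherwise the number of members."""
    if sizein:
        return sum(get_size(label) for label in member_labels)
    return len(member_labels)


def set_size(label: str, n: int) -> str:
    """Write a fresh size annotation onto a label (sizeout-style)."""
    return f"{strip_size(label)};size={n};"
```

Under these assumptions, a cluster with members "Uniq1;size=10;", "Uniq7;size=3;" and "Uniq9" has size 14 with size input enabled and 3 without it, and its centroid would be relabeled "Uniq1;size=14;".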
Output files
Standard output files are supported. Cluster centroids (representative sequences) are written to a FASTA file specified by the -centroids option. Consensus sequences are written to a FASTA file specified by -consout, and multiple alignments are written to filenames derived from the -msaout option. Note that using -consout and -msaout may add significantly to the compute time and memory required for clustering. You can specify a directory to contain one FASTA file per cluster using the -clusters option.
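The difference between a centroid and a consensus sequence can be illustrated with a simple column-wise majority vote over a cluster's multiple alignment. This is an assumption-laden sketch: USEARCH's own consensus rules (tie-breaking, gap handling) may differ, and majority_consensus is a hypothetical name. Here, columns whose most common symbol is a gap are dropped, and ties go to the symbol seen first in the column.

```python
from collections import Counter


def majority_consensus(aligned):
    """Column-wise majority vote over equal-length aligned rows ('-' = gap)."""
    if not aligned:
        return ""
    width = len(aligned[0])
    if any(len(row) != width for row in aligned):
        raise ValueError("aligned rows must all have the same length")
    out = []
    for col in range(width):
        counts = Counter(row[col] for row in aligned)
        symbol, _ = counts.most_common(1)[0]  # ties: first symbol encountered wins
        if symbol != "-":                     # drop gap-majority columns
            out.append(symbol)
    return "".join(out)
```

For the rows "ACGT-A", "ACCT-A" and "ACGTTA" this yields "ACGTA". If "ACCT-A" happened to be the centroid, the representative written by -centroids would carry the minority C in the third column, while the consensus carries the majority G.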
Supported options
Accept options
Termination options
Indexing options
Masking options
Multithreading
Alignment parameters
Alignment heuristics
Example
usearch -cluster_fast query.fasta -id 0.9 -centroids nr.fasta -uc clusters.uc
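The -uc file written by this example is tab-separated. The sketch below groups labels by cluster, assuming the usual 10-field UC layout in which field 1 is the record type (S for a new centroid, H for a hit, C for a cluster summary), field 2 is the cluster number and field 9 is the query label. Check these field positions against the UC format documentation before relying on it; read_uc_clusters is a hypothetical helper.

```python
from collections import defaultdict


def read_uc_clusters(lines):
    """Map cluster number -> [centroid label, member labels...] from UC records."""
    clusters = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        rec_type = fields[0]
        if rec_type not in ("S", "H"):
            continue  # C summary records repeat what S/H already gave us
        cluster_num, query_label = int(fields[1]), fields[8]
        if rec_type == "S":
            clusters[cluster_num].insert(0, query_label)  # centroid listed first
        else:
            clusters[cluster_num].append(query_label)     # member assigned to cluster
    return dict(clusters)
```

Usage: `with open("clusters.uc") as f: clusters = read_uc_clusters(f)`, after which each value lists the centroid label followed by the labels of the sequences assigned to it.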