USEARCH v12

cluster_fast command


See also
cluster_fast
cluster_smallmem
cluster_otus

Clusters sequences in a FASTA or FASTQ file using a variant of the UCLUST algorithm designed to maximize speed by multi-threading. It can be much faster than cluster_fast, depending on the input data.

An identity threshold must be specified using the -id option.

Sequences are processed in the order they appear in the input file.
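The greedy, input-order procedure described above can be illustrated with a much-simplified Python sketch. It is not USEARCH's implementation. Identity is assumed here to be 1 minus the edit distance divided by the longer sequence length, and each sequence is compared against existing centroids in the order they were created, whereas UCLUST uses word (k-mer) indexing to choose which centroids to align against. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using one rolling row of the DP matrix."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Assumed identity measure: 1 - edit distance / longer length.
    This is a simplification, not USEARCH's exact definition."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - edit_distance(a, b) / longest


def greedy_cluster(seqs: List[Tuple[str, str]], min_id: float):
    """Process (label, seq) pairs in input order. A sequence joins the first
    existing centroid (in creation order) with identity >= min_id;
    otherwise it becomes a new centroid."""
    centroids: List[Tuple[str, str]] = []
    assignment: Dict[str, str] = {}
    for label, seq in seqs:
        for c_label, c_seq in centroids:
            if identity(seq, c_seq) >= min_id:
                assignment[label] = c_label
                break
        else:
            # No centroid was close enough: this sequence starts a new cluster.
            centroids.append((label, seq))
            assignment[label] = label
    return centroids, assignment
```

Because sequences are processed in input order, reordering the input (for example, sorting by abundance first) can change which sequences become centroids.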

Reverse-complemented matching for nucleotide sequences using -strand both is not supported.

Size annotations may be generated and/or propagated by using the -sizein and/or -sizeout options.
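The following Python sketch illustrates how size annotations might be handled. It assumes labels of the form Uniq1;size=42; and assumes that, with -sizein, a cluster's size is the sum of its members' sizes, while without it each member counts once. The helper names are hypothetical, and the exact label format written by USEARCH may differ.

```python
import re
from typing import List, Optional

# Assumption: USEARCH-style size annotation, e.g. "Uniq1;size=42;"
# (the trailing ";" is treated as optional).
SIZE_RE = re.compile(r";size=(\d+);?")


def parse_size(label: str) -> Optional[int]:
    """Return the size annotation in a label, or None if there is none."""
    m = SIZE_RE.search(label)
    return int(m.group(1)) if m else None


def strip_size(label: str) -> str:
    """Remove an existing size annotation so a new one can be appended."""
    return SIZE_RE.sub(";", label).rstrip(";")


def size_or_one(label: str) -> int:
    # Assumption: an unannotated sequence counts as abundance 1.
    s = parse_size(label)
    return 1 if s is None else s


def centroid_label(centroid: str, members: List[str], sizein: bool) -> str:
    """Label a centroid as with -sizeout. `members` includes the centroid.
    With sizein=True, member sizes are summed; otherwise members are counted."""
    total = sum(size_or_one(m) for m in members) if sizein else len(members)
    return f"{strip_size(centroid)};size={total};"
```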

Output files
Standard output files are supported. Cluster centroids (representative sequences) are written to the FASTA file specified by the -centroids option. Consensus sequences are written to the FASTA file specified by -consout, and multiple alignments are written to filenames derived from the -msaout option. Note that using -consout and -msaout may add significantly to the compute time and memory required for clustering. You can use the -clusters option to specify a directory that will contain one FASTA file per cluster.

Supported options
Accept options
Termination options
Indexing options
Masking options
Multithreading
Alignment parameters
Alignment heuristics


Example

usearch -cluster_fast query.fasta -id 0.9 -centroids nr.fasta -uc clusters.uc
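The -uc file written by this example is tab-separated. A minimal Python sketch for grouping members by centroid is shown below. It assumes the usual 10-field UC layout (record type in field 1, query label in field 9, target or centroid label in field 10), where S records start a new cluster, H records assign a sequence to an existing centroid, and C and N records can be skipped. The function name is hypothetical; check the field layout against your own output.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def read_uc_clusters(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each centroid label to its member labels, centroid first.
    Assumes the 10-field tab-separated UC layout described above."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        rec_type, query, target = fields[0], fields[8], fields[9]
        if rec_type == "S":
            # Seed record: the query is a new centroid; keep it first.
            clusters[query].insert(0, query)
        elif rec_type == "H":
            # Hit record: the query joined the cluster of `target`.
            clusters[target].append(query)
        # "C" (cluster summary) and "N" (no hit) records are ignored here.
    return dict(clusters)
```

Passing an open file, for example `read_uc_clusters(open("clusters.uc"))`, works because file objects iterate line by line.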