USEARCH v12

cluster_smallmem command

See also
cluster_fast
cluster_otus
cluster_agg
cluster_aggd

Clusters sequences in a FASTA or FASTQ file using a variant of the UCLUST algorithm designed to minimize memory use.

It is the user's responsibility to sort the input sequences in an appropriate order before running cluster_smallmem; see UCLUST sort order for discussion. By default, input sequences are expected to be sorted by decreasing length. If some other sort order is used, the -sortedby option should be specified. Valid values are length (default), size and other. If -sortedby other is specified, then USEARCH does not assume or check for any particular order. See also sortbysize and sortbylength.
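As a rough illustration of the default expectation, the following Python sketch checks whether the records of a FASTA file appear in order of decreasing length. It is not part of USEARCH; the function names are hypothetical, and it assumes that records of equal length may appear in any order.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (label, sequence) pairs from FASTA-formatted lines."""
    label = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if label is not None:
                yield label, "".join(chunks)
            label = line[1:]
            chunks = []
        elif label is not None:
            chunks.append(line)  # sequence may span several lines
    if label is not None:
        yield label, "".join(chunks)


def is_sorted_by_decreasing_length(lines: Iterable[str]) -> bool:
    """Return True if no sequence is longer than the one before it."""
    previous = None
    for _label, seq in read_fasta(lines):
        if previous is not None and len(seq) > previous:
            return False
        previous = len(seq)
    return True
```

An open file handle can be passed directly, for example `is_sorted_by_decreasing_length(open("query.fasta"))`.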

An identity threshold must be specified using the -id option.

Multithreading is not supported as this would require significant memory overhead.

By default, nucleotide matching is done on the forward strand only. For matching on both strands, use -strand both.
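To make the strand distinction concrete: a sequence read from the reverse strand is the reverse complement of the forward sequence. The Python sketch below is for illustration only; it does not describe how USEARCH is implemented, and the function name is hypothetical.

```python
# Complement table for A, C, G, T, U and N in both cases.
# Any other character is left unchanged by str.translate.
_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]
```

A sequence that resembles a centroid only after this transformation would be found only when both strands are considered.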

See also
Standard output file options
Accept options
Indexing options
Termination options
Masking options
Alignment parameters
Alignment heuristics

Cluster sizes
Memory requirements

Example

usearch -cluster_smallmem query.fasta -id 0.9 -centroids nr.fasta -uc clusters.uc
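The clusters.uc file written by this example can be summarized with a short script. The Python sketch below assumes the tab-separated UC layout in which the first field is the record type and the second is the cluster number, with S records marking centroids and H records marking cluster members; check the UC format documentation for your version before relying on it. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def cluster_sizes(uc_lines: Iterable[str]) -> Dict[int, int]:
    """Count input sequences per cluster number from UC-format lines."""
    sizes = Counter()
    for line in uc_lines:
        fields = line.rstrip("\n").split("\t")
        # Assumption: only S (centroid) and H (hit) records represent
        # input sequences; other record types are skipped.
        if len(fields) < 2 or fields[0] not in ("S", "H"):
            continue
        sizes[int(fields[1])] += 1
    return dict(sizes)
```

This counts one per input sequence and ignores any size= annotations in the labels.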