USEARCH v12

Alignment parameters

See also
Alignment heuristics

These parameters determine the score of an alignment. They include substitution scores and gap penalties. They are distinct from heuristic parameters, which control fast but approximate methods for finding the alignment with the highest score. Ideally, changing heuristic parameters would not change the reported alignment (because the best alignment would always be found). By contrast, changing alignment scoring parameters will tend to change the alignment; for example, increasing gap penalties will reduce the number of gaps. All scoring parameters are floating-point values and may be specified as integers or real numbers.

If local alignment parameters are changed, then the Karlin-Altschul K and Lambda parameters must also be changed in order to get correct E-values.

L = Local, G = Global; P = Protein, N = Nucleotide.

Option            Local/Global  Protein/Nucleotide  Default                     Description
-lopen            L             PN                  10.0                        Local gap open
-lext             L             PN                  1.0                         Local gap extend
-match            LG            N                   +1.0                        Match score
-mismatch         LG            N                   -2.0                        Mismatch score
-matrix filename  LG            PN                  BLOSUM62 (aa), +1/-2 (nt)   Substitution matrix in NCBI BLAST format. See BLOSUM62 for an example.
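
As a rough illustration of how these parameters combine into a score, the sketch below scores an already-aligned pair of nucleotide rows using the default match, mismatch and local gap values listed above. The function name score_alignment and the '-' gap symbol are illustrative choices, not part of USEARCH. The sketch also assumes that a gap of length n costs open + (n-1) x extend; the exact convention is not stated in this section.

```python
def score_alignment(row_q, row_t, match=1.0, mismatch=-2.0,
                    gap_open=10.0, gap_ext=1.0):
    """Score two aligned rows of equal length; '-' marks a gap column.

    Assumption: the first column of a gap costs gap_open and each
    further column of the same gap costs gap_ext.
    """
    if len(row_q) != len(row_t):
        raise ValueError("aligned rows must have the same length")
    score = 0.0
    prev_gap = None  # which row ('q' or 't') was gapped in the previous column
    for q, t in zip(row_q, row_t):
        if q == "-" and t == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if q == "-" or t == "-":
            gap_row = "q" if q == "-" else "t"
            # Continue an open gap in the same row, otherwise open a new one.
            score -= gap_ext if gap_row == prev_gap else gap_open
            prev_gap = gap_row
        else:
            score += match if q == t else mismatch
            prev_gap = None
    return score


print(score_alignment("ACGT", "AGGT"))          # 1.0  (3 matches, 1 mismatch)
print(score_alignment("ACGTACGT", "AC--ACGT"))  # -5.0 (6 matches, gap of length 2)
```

With the default penalties a two-column gap outweighs six matches, which shows why raising gap penalties tends to produce alignments with fewer gaps.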

Gap penalties for global alignments
With global alignments, gap penalties are specified using the -gapopen and -gapext options. Up to 12 separate penalties can be specified: all combinations of query / target, left / interior / terminal, and open / extend can be assigned different penalties.

Default penalties are shown in the following table.

Penalty              Default
Interior gap open    10.0 (nucleotides), 17.0 (proteins)
End gap open         1.0
Interior gap extend  1.0
End gap extend       0.5

The nucleotide defaults would be set using these options:

-gapopen 10.0I/1.0E -gapext 1.0I/0.5E

A numerical value for a penalty is optionally followed by one or more letters that specify particular types of gap. Here, "10.0I" means "Interior gap=10.0", and "1.0E" means "End gap=1.0". If no letters follow the numerical value, the penalty applies to all gaps. More than one letter can be given, so for example "0.5IE" means "Interior and End gap=0.5", which is the same as all gaps. The valid letters are: I=Interior, E=End, L=Left, R=Right, Q=Query and T=Target.

If more than one numerical value is specified, the values must be separated by a slash character '/'. White space is not allowed. A sign (plus or minus) is not allowed in the numerical value, which can be an integer or a floating-point number (in which case a period '.' must be used for the decimal point).

If a star (*) is used as the numerical value, then the gap is forbidden. Using * in an open penalty means that the gap will never be allowed; using * in an extension penalty means that gaps longer than one will be forbidden. So, for example, *LQ in -gapopen means "left end-gaps in the query are not allowed".

The -gapopen and -gapext options are interpreted by first setting the defaults, then scanning the string left to right. Later values override earlier values.
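
The rules above can be sketched as a small parser. This is only an illustration of the syntax as described here, not USEARCH's own code: the names parse_gap_spec and default_penalties are hypothetical, forbidden gaps are represented by None, and the sketch assumes that position letters (I, E, L, R) and sequence letters (Q, T) are combined independently, with a missing group meaning "all".

```python
import re

POSITIONS = ("left", "interior", "right")
SEQUENCES = ("query", "target")

# One token: an unsigned number or '*', followed by optional type letters.
TOKEN = re.compile(r"(\*|\d+(?:\.\d+)?)([IELRQT]*)")


def default_penalties(kind, protein=False):
    """Defaults from the table above; kind is 'open' or 'ext'."""
    if kind == "open":
        interior, end = (17.0 if protein else 10.0), 1.0
    else:
        interior, end = 1.0, 0.5
    return {(s, p): (interior if p == "interior" else end)
            for s in SEQUENCES for p in POSITIONS}


def parse_gap_spec(spec, kind, protein=False):
    """Parse a -gapopen / -gapext style string into six penalties.

    Returns a dict keyed by (sequence, position); None means forbidden.
    """
    penalties = default_penalties(kind, protein)  # start from the defaults
    for token in spec.split("/"):                 # scan left to right
        m = TOKEN.fullmatch(token)
        if m is None:  # rejects signs, white space and unknown letters
            raise ValueError(f"bad gap penalty token: {token!r}")
        value = None if m.group(1) == "*" else float(m.group(1))
        letters = m.group(2)

        positions = set()
        if "I" in letters:
            positions.add("interior")
        if "E" in letters:
            positions.update(("left", "right"))
        if "L" in letters:
            positions.add("left")
        if "R" in letters:
            positions.add("right")
        if not positions:          # assumption: no position letter = all
            positions = set(POSITIONS)

        seqs = set()
        if "Q" in letters:
            seqs.add("query")
        if "T" in letters:
            seqs.add("target")
        if not seqs:               # assumption: no sequence letter = both
            seqs = set(SEQUENCES)

        for s in seqs:
            for p in positions:
                penalties[(s, p)] = value  # later tokens override earlier ones
    return penalties


nt_open = parse_gap_spec("10.0I/1.0E", "open")
print(nt_open[("query", "interior")])                     # 10.0
print(parse_gap_spec("*LQ", "open")[("query", "left")])   # None
```

Under these assumptions, "10.0I/1.0E" reproduces the nucleotide defaults for -gapopen, and "*LQ" changes only the query's left end gap.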