Setting alignment parameters

Multiple sequence alignments are performed in two stages: an initial pairwise alignment and a final multiple sequence alignment.  The Alignment editor allows you to set parameters that control how each stage of the alignment is performed.

To set DNA or protein sequence alignment parameters:

  • Click on the Align sequences menu.
  • Select the CLUSTAL W 1.83 menu item.
  • Select DNA alignment parameters or Protein alignment parameters.
  • Set the desired options and press OK.  The options you selected will be applied the next time you perform an alignment.

Note:
All parameters can be reset to their default values by pressing Restore default values.

Parameter notes:
1. Quick pairwise alignments parameters (DNA and Protein):

Alignment parameters

  • The Quick pairwise alignments tab includes parameters that control the speed and sensitivity of initial alignments when a Quick pairwise alignment is performed.
  • The Gap penalty is a penalty assigned for each gap opened.
  • The K-tuple size is the length of the exactly matching fragments (K-tuples) that are used to find similar regions.  Increase for speed; decrease for sensitivity.
  • Top diagonals specifies how many of the best diagonals (those with the most K-tuple matches) are used.  Decrease for speed; increase for sensitivity.
  • Window size specifies the number of diagonals around each of the "best" diagonals that will be used.  Decrease for speed; increase for sensitivity.
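
The interplay of K-tuple size, Top diagonals and Window size can be illustrated with a minimal Python sketch.  This is not the CLUSTAL W implementation; the function names are hypothetical, and the sketch only counts exact K-tuple matches on each diagonal and then keeps the best diagonals together with their neighbours.

```python
from collections import Counter, defaultdict


def ktuple_diagonal_counts(seq_a, seq_b, k):
    """Count exact K-tuple matches on each diagonal of seq_a versus seq_b.

    A diagonal is identified by the offset i - j, where i and j are the
    start positions of a matching K-tuple in seq_a and seq_b.
    """
    # Index every K-tuple of seq_b by its start position.
    positions = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        positions[seq_b[j:j + k]].append(j)

    counts = Counter()
    for i in range(len(seq_a) - k + 1):
        for j in positions.get(seq_a[i:i + k], ()):
            counts[i - j] += 1
    return counts


def select_diagonals(counts, top_diagonals, window):
    """Return the best diagonals and the full set of diagonals to search.

    The best diagonals are those with the most K-tuple matches; `window`
    neighbouring diagonals on each side of every best diagonal are added.
    """
    best = [diag for diag, _ in counts.most_common(top_diagonals)]
    kept = set()
    for diag in best:
        kept.update(range(diag - window, diag + window + 1))
    return best, kept
```

For example, comparing "ACGTACGT" with itself using a K-tuple size of 2 gives 7 matches on the main diagonal (offset 0).  A larger K-tuple size produces fewer matches to process, and fewer top diagonals or a smaller window leave fewer diagonals to search, which is why those settings trade sensitivity for speed.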

2. Full pairwise alignments parameters (DNA and Protein):

Alignment parameters

  • Full pairwise alignments are more accurate than Quick pairwise alignments, but can be very slow, especially if there are more than 20 sequences or the sequences are longer than 1000 residues.
  • Gap opening is a penalty applied for opening a gap in the alignment.
  • Gap extension is a penalty applied for extending a gap by one residue.
  • The DNA weight matrix is a table of scores assigned to alignment matches and mismatches.  You can use the built-in IUB or CLUSTAL W matrices, or you may enter a path to your own weight matrix file.
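
How the gap penalties and the weight matrix combine into an alignment score can be sketched in Python as follows.  The sketch assumes that a gap of n residues costs gap opening + n × gap extension, and it uses a toy weight table with hypothetical match and mismatch values rather than the IUB or CLUSTAL W matrices; the function names are illustrative only.

```python
def toy_weights(alphabet, match, mismatch):
    """Build a toy weight table: `match` on the diagonal, `mismatch` elsewhere."""
    return {(a, b): (match if a == b else mismatch)
            for a in alphabet for b in alphabet}


def score_alignment(row_a, row_b, weights, gap_open, gap_extend):
    """Score two aligned rows of equal length, where '-' marks a gap.

    Assumption: each gap of n residues costs gap_open + n * gap_extend.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")

    score = 0.0
    prev_gap_row = None  # row that held the gap in the previous column
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            gap_row = "a" if a == "-" else "b"
            if gap_row != prev_gap_row:
                score -= gap_open      # a new gap starts here
            score -= gap_extend        # every gapped column pays extension
            prev_gap_row = gap_row
        else:
            score += weights[(a, b)]
            prev_gap_row = None
    return score
```

With these assumptions, one gap of two residues is cheaper than two separate gaps of one residue each, because the opening penalty is paid only once.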

3. Multiple Alignments (DNA and Protein):

Alignment parameters

  • The third tab controls the final multiple alignment, and applies to both Quick and Full pairwise alignments.
  • Gap opening is the penalty for opening a new gap in the alignment.  Increasing the gap opening penalty will make gaps less frequent.
  • Gap extension is the penalty for extending a gap by one residue.  Increasing the gap extension penalty will make gaps shorter.
  • Delay divergent sequences delays the alignment of the most distantly related sequences until after the most closely related sequences have been aligned.
  • Transition weight gives transitions (A/G or C/T substitutions) a weight between 0 and 1; a weight of zero means that transitions are scored as mismatches, while a weight of 1 gives transitions the match score.  For distantly related sequences, the weight should be near zero; for closely related sequences it can be useful to assign a higher weight.
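
The effect of Transition weight on the score of a single DNA position can be sketched in Python.  The two end points (a weight of 0 gives the mismatch score, a weight of 1 gives the match score) come from the description above; the linear scaling in between, the function names and the score values are assumptions made for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(x, y):
    """True for a purine/purine or pyrimidine/pyrimidine substitution."""
    return x != y and ({x, y} <= PURINES or {x, y} <= PYRIMIDINES)


def pair_score(x, y, match_score, mismatch_score, transition_weight):
    """Score one aligned pair of bases.

    Assumption: transitions are scored on a straight line between the
    mismatch score (weight 0) and the match score (weight 1).
    """
    if not 0.0 <= transition_weight <= 1.0:
        raise ValueError("transition_weight must be between 0 and 1")
    if x == y:
        return match_score
    if is_transition(x, y):
        return mismatch_score + transition_weight * (match_score - mismatch_score)
    return mismatch_score  # transversions always get the mismatch score
```

Under these assumptions, an A/G pair scores the same as an A/C pair when the weight is 0, and the same as an A/A pair when the weight is 1.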

4. Protein gap parameters (Proteins only):

Alignment parameters

  • Residue specific penalties: Amino acid specific gap penalties that reduce or increase the gap opening penalties at each position in the alignment or sequence.
  • End gap separation: Treats end gaps just like internal gaps for the purposes of avoiding gaps that are too close (set by Gap separation distance below).  If you turn this off, end gaps will be ignored for this purpose.  This is useful when you wish to align fragments where the end gaps are not biologically meaningful.
  • Hydrophilic penalties: Used to increase the chances of a gap within a run (5 or more residues) of hydrophilic amino acids; these are likely to be loop or random coil regions where gaps are more common.  The residues considered hydrophilic are entered in the Hydrophilic residues box.
  • Hydrophilic residues: Single letter codes for the amino acids to be considered hydrophilic.  Determines which residues Hydrophilic penalties are applied to.
  • Gap separation distance: Tries to decrease the chances of gaps being too close to each other.  Gaps that are less than this distance apart are penalized more than other gaps.  This does not prevent closely spaced gaps; it makes them less frequent, promoting a block-like appearance of the alignment.
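
Two of the ideas above, hydrophilic runs and gap separation, can be sketched in Python.  This is an illustration under stated assumptions rather than the CLUSTAL W code: the function names are hypothetical, the residue string passed in is whatever you enter in the Hydrophilic residues box, and the sketch only identifies where the penalties would be adjusted, not by how much.

```python
def hydrophilic_runs(sequence, hydrophilic_residues, min_run=5):
    """Return (start, end) pairs, end exclusive, for hydrophilic runs.

    A run is a stretch of at least `min_run` consecutive residues that
    all appear in `hydrophilic_residues`.
    """
    runs = []
    start = None
    for i, residue in enumerate(sequence):
        if residue in hydrophilic_residues:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    # A run may extend to the end of the sequence.
    if start is not None and len(sequence) - start >= min_run:
        runs.append((start, len(sequence)))
    return runs


def near_existing_gap(position, gap_positions, gap_separation_distance):
    """True if `position` is closer than the separation distance to a gap."""
    return any(abs(position - gap) < gap_separation_distance
               for gap in gap_positions)
```

Positions inside a run returned by hydrophilic_runs are where a gap would be made more likely, while positions for which near_existing_gap is true are where a new gap would be penalized more heavily.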