Multiple sequence alignment methods

Richard Durbin; Sean R. Eddy; Anders Krogh; Graeme Mitchison

doi:10.1017/CBO9780511790492.007

6 - Multiple sequence alignment methods

Published online by Cambridge University Press: 05 September 2012

Affiliations:
Richard Durbin: Sanger Centre, Cambridge
Sean R. Eddy: Washington University, Missouri
Anders Krogh: Technical University of Denmark, Lyngby


Summary

In Chapter 5, we assumed that a reasonable multiple sequence alignment was already known and provided the starting point for constructing a profile HMM. We now look at what a ‘reasonable’ multiple alignment is, and at ways to construct one automatically from unaligned sequences.

Multiple alignments must usually be inferred from primary sequences alone. Biologists produce high-quality multiple sequence alignments by hand using expert knowledge of protein sequence evolution, knowledge that comes from experience. Important factors include: specific sorts of columns in alignments, such as highly conserved residues or buried hydrophobic residues; the influence of secondary and tertiary structure, such as the alternation of hydrophobic and hydrophilic columns in exposed beta sheets; and expected patterns of insertions and deletions, which tend to alternate with blocks of conserved sequence. Furthermore, the phylogenetic relationships between sequences constrain the changes that occur in columns and the patterns of gaps. RNA alignments draw on similar knowledge, but they are additionally often strongly constrained by a secondary structure model that, in many cases, has itself been inferred from primary sequence data (Chapter 10).

Manual multiple alignment is tedious. Automatic multiple sequence alignment methods are a topic of extensive research in computational biology. In general, an automatic method must have a way to assign a score so that better multiple alignments get better scores. We should carefully distinguish the problem of scoring a multiple alignment from the problem of searching over possible multiple alignments to find the best one.
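To make the distinction between scoring and searching concrete, here is a minimal Python sketch of one widely used way to score a fixed multiple alignment: the sum-of-pairs score, which adds up pairwise scores over every pair of rows in every column. The toy values `MATCH`, `MISMATCH` and `GAP` are hypothetical assumptions, not values from the text, and a linear gap penalty with gap-gap pairs scored as zero is likewise an assumed convention. The function only scores an alignment it is handed; finding the best-scoring alignment is the separate search problem.

```python
from itertools import combinations

# Hypothetical toy scoring values (assumptions for illustration only).
MATCH = 1
MISMATCH = -1
GAP = -2  # linear penalty for a residue paired with a gap


def pair_score(a: str, b: str) -> int:
    """Score one pair of symbols taken from the same alignment column."""
    if a == "-" and b == "-":
        return 0  # assumed convention: gap-gap pairs contribute nothing
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sum_of_pairs(alignment: list[str]) -> int:
    """Sum pair scores over every column and every unordered pair of rows."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for col in range(length):
        column = [row[col] for row in alignment]
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total


if __name__ == "__main__":
    # Two different alignments of the same sequences: GATCA, GATTCA, GATCA.
    good = ["GAT-CA", "GATTCA", "GAT-CA"]
    shifted = ["GATCA-", "GATTCA", "GA-TCA"]
    print(sum_of_pairs(good))     # 11
    print(sum_of_pairs(shifted))  # -2
```

Placing the single extra T in one gapped column scores higher than the shifted alignment, so under this (assumed) scheme the better alignment does get the better score.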

Type: Chapter
Information: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, pp. 135–160

DOI: https://doi.org/10.1017/CBO9780511790492.007

Publisher: Cambridge University Press

Print publication year: 1998
