Book contents
- Frontmatter
- Contents
- Preface
- 1 Introduction
- 2 Pairwise alignment
- 3 Markov chains and hidden Markov models
- 4 Pairwise alignment using HMMs
- 5 Profile HMMs for sequence families
- 6 Multiple sequence alignment methods
- 7 Building phylogenetic trees
- 8 Probabilistic approaches to phylogeny
- 9 Transformational grammars
- 10 RNA structure analysis
- 11 Background on probability
- Bibliography
- Author index
- Subject index
6 - Multiple sequence alignment methods
Published online by Cambridge University Press: 05 September 2012
Summary
In Chapter 5, we assumed that a reasonable multiple sequence alignment was already known and provided the starting point for constructing a profile HMM. We now look at what a ‘reasonable’ multiple alignment is, and at ways to construct one automatically from unaligned sequences.
Multiple alignments must usually be inferred from primary sequences alone. Biologists produce high-quality multiple sequence alignments by hand using expert knowledge of protein sequence evolution. This knowledge comes from experience. Important factors include: specific sorts of columns in alignments, such as highly conserved residues or buried hydrophobic residues; the influence of secondary and tertiary structure, such as the alternation of hydrophobic and hydrophilic columns in exposed beta sheets; and expected patterns of insertions and deletions, which tend to alternate with blocks of conserved sequence. Furthermore, the phylogenetic relationships between sequences constrain both the changes that occur in columns and the patterns of gaps. RNA alignments draw on similar knowledge, but they are additionally often strongly constrained by a secondary structure model that in many cases has itself been inferred from primary sequence data (Chapter 10).
Manual multiple alignment is tedious. Automatic multiple sequence alignment methods are a topic of extensive research in computational biology. In general, an automatic method must have a way to assign a score so that better multiple alignments get better scores. We should carefully distinguish the problem of scoring a multiple alignment from the problem of searching over possible multiple alignments to find the best one.
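The distinction between scoring and searching can be sketched in a few lines of Python. This is only a minimal illustration, not a method taken from the chapter: it assumes a sum-of-pairs score with toy match, mismatch, and gap values, and the names `pair_score`, `sum_of_pairs`, `candidate_a`, and `candidate_b` are all hypothetical. The "search" here is just a comparison of two hand-written candidates, standing in for a real search procedure.

```python
from itertools import combinations

# Hypothetical toy parameters, chosen only for illustration.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a, b):
    """Score one pair of symbols from a column ('-' denotes a gap)."""
    if a == "-" and b == "-":
        return 0  # a gap aligned to a gap contributes nothing
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sum_of_pairs(alignment):
    """Scoring problem: sum pair scores over all columns and all row pairs."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    return sum(
        pair_score(a, b)
        for column in zip(*alignment)
        for a, b in combinations(column, 2)
    )


# Search problem (trivialised): pick the better of two candidate alignments
# of the same three sequences. The scoring function is unchanged whatever
# search strategy proposes the candidates.
candidate_a = ["ACGT", "A-GT", "ACGT"]
candidate_b = ["ACGT", "AGT-", "ACGT"]
best = max([candidate_a, candidate_b], key=sum_of_pairs)
```

Because `sum_of_pairs` knows nothing about how candidates are generated, the scoring scheme could be swapped for a different one without touching the search, and vice versa.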
- Type: Chapter
- Information: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, pp. 135-160
- Publisher: Cambridge University Press
- Print publication year: 1998