Search results for Computational Biology and Bioinformatics

Part III - Genome-Scale Index Structures
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp 143-144
- Chapter

14 - Haplotype analysis
from Part V - Applications
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp 333-343
- Chapter

Summary

With diploid organisms, one is interested not only in discovering variants but also in discovering to which of the two haplotypes each variant belongs. One would thus like to identify the variants that are co-located on the same haplotype, a process called haplotype phasing. Assume we have managed to do haplotype phasing for several individuals. It is then of interest to do haplotype matching, that is, to locate long contiguous blocks shared by multiple individuals. The chapter covers algorithms and complexity analysis of these key haplotype analysis tasks. A close connection between classical indexes and a tailored data structure called the positional BWT index is established.
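As a rough illustration of the idea behind a positional index, the sketch below keeps phased haplotypes sorted by their reversed prefixes, column by column, so that haplotypes sharing a long match ending at a given column end up adjacent. It assumes biallelic sites encoded as 0/1 and equal-length haplotypes; the function name `positional_prefix_arrays` is hypothetical, and this is not necessarily the formulation used in the chapter.

```python
def positional_prefix_arrays(haplotypes):
    """For each column k, return the haplotype indices sorted by the
    reversed prefix haplotype[:k] (assumes 0/1 alleles, equal lengths)."""
    order = list(range(len(haplotypes)))
    arrays = [order]
    n_sites = len(haplotypes[0]) if haplotypes else 0
    for k in range(n_sites):
        # Stable partition: haplotypes with allele 0 at column k come first,
        # and the previous relative order is preserved within each group.
        zeros = [i for i in order if haplotypes[i][k] == 0]
        ones = [i for i in order if haplotypes[i][k] == 1]
        order = zeros + ones
        arrays.append(order)
    return arrays


panel = [[0, 1], [1, 0], [0, 0]]
arrays = positional_prefix_arrays(panel)
```

Each update is a stable two-way partition, so processing M haplotypes over N sites takes O(MN) time in this sketch.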

7 - Hidden Markov models
from Part II - Fundamentals of Biological Sequence Analysis
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp 129-142
- Chapter

Summary

Analysing the content of a biological sequence can often be modeled as a segmentation problem. For example, one may wish to segment a genome into coding and non-coding regions, where only the former are translated to proteins. Statistical features of what genes usually look like can be used to derive an optimization framework. This process can be formalized through hidden Markov models, and the underlying segmentation problem can be solved using dynamic programming. This chapter introduces the key methods related to such optimization.
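The segmentation-by-dynamic-programming idea can be sketched with the Viterbi algorithm on a two-state model. Everything concrete here is an assumption made for illustration: the state names `C` (coding) and `N` (non-coding), the toy probabilities, and the function name `viterbi`; none of it is taken from the chapter.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path, computed in log space."""
    # score[s]: best log-probability of any path that ends in state s.
    score = {
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }
    back = []  # one dict per step: best predecessor of each state
    for symbol in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: score[r] + math.log(trans_p[r][s]))
            new_score[s] = (
                score[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][symbol])
            )
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy parameters (assumed): coding regions are GC-rich, states are "sticky".
STATES = ["C", "N"]
START = {"C": 0.5, "N": 0.5}
TRANS = {"C": {"C": 0.9, "N": 0.1}, "N": {"C": 0.1, "N": 0.9}}
EMIT = {
    "C": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
    "N": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
}
segmentation = "".join(viterbi("AAAAGGGG", STATES, START, TRANS, EMIT))
```

With these toy parameters the AT-rich half is labelled non-coding and the GC-rich half coding; all probabilities must be strictly positive for the logarithms to be defined.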

9 - Burrows–Wheeler indexes
from Part III - Genome-Scale Index Structures
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp 174-216
- Chapter

Summary

Classical index structures like suffix trees are powerful, but they occupy much more space than the data they are built on. Many space-efficient alternatives exist that occupy space close to the input data. This chapter covers such data structures based on the Burrows–Wheeler transform (BWT). A special emphasis is given to the bidirectional BWT index, which can be used for solving basic genome analysis tasks by simulating suffix tree exploration without any sacrifice in run time. Space-efficient representations of de Bruijn graphs are also covered.
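For readers unfamiliar with the transform itself, here is a minimal sketch of the BWT and its inversion via the LF-mapping. It sorts all rotations explicitly, which is quadratic in space and only suitable for toy inputs; it assumes a sentinel `$` that does not occur in the text and sorts before every other character. The function names are illustrative.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform by sorting all rotations (toy inputs only)."""
    text += sentinel
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last):
    """Invert the BWT with the LF-mapping; assumes the sentinel sorts first."""
    # A stable sort of positions by character gives the LF-mapping:
    # the j-th occurrence of c in the last column is the j-th c in the first.
    order = sorted(range(len(last)), key=lambda i: last[i])
    lf = [0] * len(last)
    for row, i in enumerate(order):
        lf[i] = row
    out = []
    i = 0  # row 0 is the rotation that starts with the sentinel
    for _ in range(len(last) - 1):
        out.append(last[i])  # character preceding the current suffix
        i = lf[i]
    return "".join(reversed(out))


transformed = bwt("banana")
```

The inversion reads the text backwards, one LF step per character, which is the same navigation primitive that BWT-based indexes rely on.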

4 - Graphs
from Part I - Preliminaries
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp 42-52
- Chapter

Summary

Graphs are a fundamental model for representing various relations among data. The aim of this chapter is to present some basic problems and techniques relating to graphs, mainly for finding particular paths in directed and undirected graphs. In the following chapters, we deal with various problems in biological sequence analysis that can be reduced to one of these basic ones.
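One of the simplest path-finding problems of this kind is a shortest path by number of edges, which breadth-first search solves. The sketch below assumes the graph is given as a dictionary of adjacency lists; the name `shortest_path` and this representation are choices made for illustration.

```python
from collections import deque


def shortest_path(adj, source, target):
    """Breadth-first search; returns a fewest-edges path or None."""
    parent = {source: None}  # doubles as the visited set
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            # Follow parent pointers back to the source.
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj.get(u, []):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
route = shortest_path(graph, "a", "e")
```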

16 - Transcriptomics
from Part V - Applications
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp 367-393
- Chapter

Summary

In this chapter we assume that we have a collection of reads from all the different (copies of the) transcripts of a gene. We start by showing how to extend read alignment techniques to short RNA reads, and later we show how to exploit the output of genome analysis techniques to obtain an aligner for long reads of RNA transcripts. Our final goal is to assemble the reads into the different RNA transcripts and to estimate the expression level of each transcript. The main difficulty of this problem, which we call multi-assembly, arises from the fact that the transcripts share identical substrings. We illustrate different scenarios, and corresponding multi-assembly formulations, which we then solve using network flow techniques.

Frontmatter
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp i-iv
- Chapter

Part II - Fundamentals of Biological Sequence Analysis
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp 81-82
- Chapter

8 - Classical indexes
from Part III - Genome-Scale Index Structures
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp 145-173
- Chapter

Summary

A full-text index for a string T is a data structure that is built once and that is kept in memory for answering an arbitrarily large number of queries on the position and frequency of substrings of T. Such queries can be used for speeding up dynamic programming algorithms tailored for mapping reads to a reference genome – a fundamental task in the analysis of high-throughput sequencing data. This chapter covers the classical full-text indexes and the like, including k-mer indexes, suffix arrays, and suffix trees. Linear-time algorithms for suffix sorting and for basic genome analysis tasks, such as finding maximal exact matches, are also presented.
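To make the "build once, query many times" pattern concrete, the following sketch builds a suffix array and answers substring-position queries by binary search. The construction here sorts suffixes naively and is far from linear time; it stands in for the linear-time suffix sorting mentioned above. Function names are illustrative.

```python
def suffix_array(text):
    """Naive suffix array: start positions of suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """All start positions of pattern in text, via two binary searches."""
    m = len(pattern)
    # Leftmost suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Leftmost suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


genome = "banana"
index = suffix_array(genome)
hits = find_occurrences(genome, index, "ana")
```

The number of occurrences is the width of the interval found, so frequency queries need only the two binary searches and no scan of the text.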

1 - Molecular biology and high-throughput sequencing
from Part I - Preliminaries
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp 3-9
- Chapter

Summary

This chapter gives a minimalistic, combinatorial introduction to molecular biology, omitting the description of most biochemical processes and focusing on inputs and outputs, abstracted as mathematical objects.

Part V - Applications
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp 331-332
- Chapter

10 - Alignment-based genome analysis
from Part IV - Genome-Scale Algorithms
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp 219-239
- Chapter

Summary

This chapter connects the alignment techniques and space-efficient data structures covered in earlier chapters. It shows how to use BWT indexes for aligning sequencing reads to a reference genome. This powerful read mapping procedure enables variant calling and genotyping of new individuals from a species whose reference genome has already been assembled.
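The core query behind BWT-based read mapping is backward search, sketched below for exact matching only. This is a simplification: real read aligners also handle mismatches and indels, and they replace the naive rank computation used here (`str.count` on a prefix) with succinct rank structures. The function name `fm_count` and the `$` sentinel are assumptions for illustration.

```python
from collections import Counter


def fm_count(text, pattern):
    """Count exact occurrences of pattern in text using backward search."""
    text += "$"  # assumed sentinel, smaller than every other character
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    last = "".join(text[i - 1] for i in sa)  # BWT: char before each suffix
    # first_row[c]: number of characters in text strictly smaller than c.
    first_row, total = {}, 0
    for c, n in sorted(Counter(text).items()):
        first_row[c] = total
        total += n
    lo, hi = 0, len(text)  # half-open interval of matching rows
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        # Naive rank queries; an index would answer these in constant time.
        lo = first_row[c] + last[:lo].count(c)
        hi = first_row[c] + last[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


matches = fm_count("banana", "ana")
```

The pattern is consumed from right to left, and each step narrows the interval of suffix-array rows whose suffixes start with the part of the pattern read so far.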

6 - Alignments
from Part II - Fundamentals of Biological Sequence Analysis
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp 83-128
- Chapter

Summary

An alignment of two sequences aims to highlight how much the two sequences have in common. In computational biology, an alignment is a prediction of the evolutionary steps between the two sequences. Different costs for such steps can be assigned, and then one can seek an optimal alignment. This chapter gives a comprehensive introduction to the dynamic programming algorithms developed for various alignment formulations.
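The simplest such formulation is unit-cost edit distance, where each substitution, insertion, or deletion costs 1. The sketch below is the textbook dynamic programming recurrence, kept to two rows of the table; the function name is illustrative, and other cost schemes would change only the three candidate costs.

```python
def edit_distance(a, b):
    """Unit-cost edit distance between strings a and b (two-row DP)."""
    previous = list(range(len(b) + 1))  # distance from "" to b[:j]
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitute = previous[j - 1] + (char_a != char_b)
            delete = previous[j] + 1
            insert = current[j - 1] + 1
            current.append(min(substitute, delete, insert))
        previous = current
    return previous[-1]


distance = edit_distance("kitten", "sitting")
```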

11 - Alignment-free genome analysis and comparison
from Part IV - Genome-Scale Algorithms
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp 240-283
- Chapter

Summary

This chapter shows how to perform analysis and comparison of genomes without assuming a reference genome to be available. The bidirectional BWT index turns out to be essential here, and the chapter covers a comprehensive set of techniques to manipulate this data structure. The algorithms covered include computing maximal exact/unique matches, substring kernels, matching statistics, and Jaccard similarity.
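As a point of reference for the last of these measures, Jaccard similarity between two sequences can be computed directly from their k-mer sets. The sketch below uses explicit Python sets rather than the bidirectional BWT index the chapter relies on, so it illustrates the definition, not the space-efficient method. Function names are illustrative.

```python
def kmers(sequence, k):
    """Set of all length-k substrings of sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a, b, k):
    """Jaccard similarity of the k-mer sets of a and b."""
    set_a, set_b = kmers(a, k), kmers(b, k)
    union = set_a | set_b
    if not union:
        return 1.0  # convention chosen here for two empty k-mer sets
    return len(set_a & set_b) / len(union)


similarity = jaccard("ACGT", "CGTA", 2)
```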

15 - Pangenomics
from Part V - Applications
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp 344-366
- Chapter

Summary

Several large-scale studies aim to build comprehensive catalogs of all the variants in a population, for example all the frequent variants in a species or all the variants in a group of individuals with a specific trait or disease. Such catalogs are the substrate for subsequent genome-wide association studies that aim to correlate variants to traits, and ultimately to personalized treatments. Such catalogs can also be leveraged to perform basic analysis tasks, such as read alignment, against not just one reference genome but a pangenome data structure representing all genomes in the catalog. The chapter gives an overview of different pangenome data structures and their applications. Selected data structures are covered in more depth, including the r-index.

Part IV - Genome-Scale Algorithms
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp 217-218
- Chapter

17 - Metagenomics
from Part V - Applications
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp 394-413
- Chapter

Summary

Assume that a drop of seawater contains cells from many distinct species. Sequencing such a mixed sample and figuring out the relative abundance of every species is a key problem in metagenomics. This chapter explores techniques for metagenomics analysis in different settings, for example with and without assuming that reference sequences are available. To solve these problems, we use techniques including tailored k-mer-based analyses, bidirectional BWT indexing, and network flows.

3 - Data structures
from Part I - Preliminaries
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp 21-41
- Chapter

Summary

This chapter presents the minimal setup of data structures required to follow the rest of the book in a self-contained manner. Balanced binary trees are enhanced to solve dynamic range minimum queries. Bitvector rank and select data structures and their extensions to larger alphabets with wavelet trees are covered. Then a special structure for solving static range minimum queries is derived. The chapter ends with a concise description of hashing primitives, such as perfect hashing, Bloom filters, minimizers, and the Rabin–Karp rolling hash.
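Of the hashing primitives listed, the Rabin–Karp rolling hash is the easiest to show in a few lines. The sketch below uses it for exact pattern search; the base and modulus are arbitrary choices made here, and every hash hit is verified by a direct string comparison so that collisions cannot produce false matches.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Start positions of pattern in text, found with a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    pattern_hash = window_hash = 0
    for i in range(m):
        pattern_hash = (pattern_hash * base + ord(pattern[i])) % mod
        window_hash = (window_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # Compare strings only when the hashes agree.
        if window_hash == pattern_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i + m < n:
            # Slide the window: drop text[i], append text[i + m].
            window_hash = (
                (window_hash - ord(text[i]) * high) * base + ord(text[i + m])
            ) % mod
    return hits


positions = rabin_karp("GATTACAGATTACA", "TTA")
```

Sliding the window costs constant time per position, which is what makes the hash "rolling".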

Index
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp 439-444
- Chapter

13 - Fragment assembly
from Part IV - Genome-Scale Algorithms
Veli Mäkinen, University of Helsinki, Djamal Belazzougui, Centre de Recherche sur l’Information Scientifique et Technique (CERIST), Algiers, Fabio Cunial, Broad Institute, Massachusetts, Alexandru I. Tomescu, University of Helsinki
Book:

Genome-Scale Algorithm Design

Published online:

28 September 2023

Print publication:

12 October 2023, pp 308-330
- Chapter

Summary

Throughout the book we mostly assume the genome sequence under study to be known. In this chapter we look at strategies for how to assemble fragments of DNA into longer contiguous blocks, and eventually into chromosomes. This chapter is partitioned into sections roughly following the workflow of a de novo assembly project, namely, error correction, contig assembly, scaffolding, and gap filling. Algorithms working with de Bruijn graphs and overlap graphs are studied.
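A node-centric de Bruijn graph can be sketched in a few lines: nodes are (k-1)-mers and each distinct k-mer in the reads contributes one edge. The helper `spell_path` is a deliberate simplification that walks forward while the current node has exactly one outgoing edge; it ignores in-degree, reverse complements, and sequencing errors, all of which a real assembler must handle. Both function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to its successors, one edge per distinct k-mer."""
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in seen:
                seen.add(kmer)
                graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def spell_path(graph, start):
    """Spell the sequence along a walk that follows out-degree-1 nodes."""
    contig, node = start, start
    visited = {start}
    while len(graph.get(node, [])) == 1 and graph[node][0] not in visited:
        node = graph[node][0]
        visited.add(node)
        contig += node[-1]  # each step extends the contig by one character
    return contig


reads = ["ACGT", "CGTA"]
graph = de_bruijn(reads, 3)
contig = spell_path(graph, "AC")
```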

1070 results in Computational Biology and Bioinformatics
