
Statistical Bioinformatics: For Biomedical and Life Science Researchers. Ed. J. K. Lee. Wiley-Blackwell. 2010. 370 pages. ISBN 9780471692720. Price £66.95 (paperback).

Published online by Cambridge University Press:  31 March 2011

ERIC STONE
Affiliation:
Bioinformatics Research Center, North Carolina State University, North Carolina, USA


Type
Book Review
Copyright
Copyright © Cambridge University Press 2011

Statistical Bioinformatics, edited by Dr Jae Lee, is envisioned as a textbook for a one- or two-semester course. The book aims to provide a comprehensive introduction to statistical concepts and techniques relevant to the bioinformatics training of life scientists. The textbook is organized into four three-chapter blocks that cover statistical foundations (Chapters 2–4), high-dimensional analysis (Chapters 5–7), advanced topics (Chapters 8–10) and multigene systems analysis (Chapters 11–13). As befits this breadth of topics, chapters were contributed by a diverse set of authors whose expertise spans the field of bioinformatics.

Dedicated to statistical foundations, the first section of the book is arguably its most valuable. Chapters 2–4 touch on a wide variety of topics including probability concepts, hypothesis testing and an introduction to statistics and their sampling distributions. Also included are issues of special relevance to bioinformatics such as quality control, data normalization and the testing of multiple hypotheses. The nature of Statistical Bioinformatics is such that none of these topics are treated in depth, but the coverage is appropriate, the content is valuable and students will benefit greatly from the aggregation of this foundational material into one text. The development of statistical inference is worth special mention: rather than relying on traditional concepts such as parametric estimation and confidence intervals, the focus is on sampling and resampling. This seems very much like the right choice even as it leaves a small gap between the foundations in Chapters 2–4 and the advanced topics in Chapters 8–10.

Chapters 5–13 build upon the statistical foundation established in the early part of the book. The second section of the book, spanning Chapters 5–7, focuses on the analysis of high-dimensional data. Chapters 5 and 6 are heavy on machine-learning techniques and cover unsupervised and supervised learning, respectively. After a useful introduction to similarity and dissimilarity metrics, Chapter 5 explores the problem of clustering in great detail. The discussion is logical and nicely complemented by figures that illustrate various clustering methods and their distinctions. Chapter 6 is more challenging than Chapter 5, perhaps because supervised learning is less intuitive than clustering, and perhaps because more background knowledge is required. The material is important but students may find it difficult in the absence of supplemental figures and examples. By contrast, the visualization of high-dimensional data as discussed in Chapter 7 is a more accessible topic.

Chapters 8–10 of Statistical Bioinformatics are reserved for advanced analysis topics that do not depend on the material in Chapters 5–7. These topics include statistical modelling (Chapter 8), experimental design (Chapter 9) and statistical resampling methods (Chapter 10). The statistical modelling chapter picks up where Chapter 2 left off and contains a concise survey of models and methods. The following chapter focuses on high-throughput biological experiments and discusses principles of design including replication and statistical power. While the emphasis of Chapter 9 is on microarray experiments, the principles discussed are quite general. Chapter 10 discusses resampling techniques for prediction, classification and model selection, again in the context of large biological data sets. Although Chapters 8–10 are meant to be independent of Chapters 5–7, there is synergy between Chapters 10 and 6, and while reading one, students should be cognizant of the other.

Chapters 11 and 12 of Statistical Bioinformatics address topics of relevance to systems biology. Chapter 11 is technically advanced and covers methods for statistical network analysis, whereas Chapter 12 presents a thorough discussion of trends and challenges in genome-wide association studies (GWAS). Both topics are highly relevant, and the GWAS chapter is a must-read for students today. The final chapter of Statistical Bioinformatics highlights the book's greatest strength: the consistent inclusion of statistical programming examples. The statistical programming language R is used throughout the text, and Chapter 13 discusses at length the use of R and the Bioconductor project's packages for bioinformatics applications.

Statistical Bioinformatics is noteworthy for its breadth of coverage and use of descriptive examples. It will find use both as a graduate textbook and as a guide to self-study. Statistics is a large field, and the foundational chapters will help guide students to what they need to know. The remaining chapters introduce topics of relevance to bioinformatics students in such a way that more specialized knowledge can easily be sought.