We use cookies to distinguish you from other users and to provide you with a better experience on our websites. Close this message to accept cookies or find out how to manage your cookie settings.
To save content items to your account,
please confirm that you agree to abide by our usage policies.
If this is the first time you use this feature, you will be asked to authorise Cambridge Core to connect with your account.
Find out more about saving content to .
To save content items to your Kindle, first ensure [email protected]
is added to your Approved Personal Document E-mail List under your Personal Document Settings
on the Manage Your Content and Devices page of your Amazon account. Then enter the ‘name’ part
of your Kindle email address below.
Find out more about saving to your Kindle.
Note you can select to save to either the @free.kindle.com or @kindle.com variations.
‘@free.kindle.com’ emails are free but can only be saved to your device when it is connected to wi-fi.
‘@kindle.com’ emails can be delivered even when you are not connected to wi-fi, but note that service fees apply.
In this chapter, we introduce the notion of graph algorithms, which are basically algorithms working on graphs. There are many such algorithms, aimed at solving a wide range of problems. We will focus on one such problem – the shortest paths problem. This problem has several algorithms, under different constraints, that solve it. We will present the well-known breadth first search (BFS), algorithm that solves a simple version of that problem. This algorithm will be explained in detail and implemented in Python. We will conclude the chapter by mentioning additional common problems in graph theory.
Much of the initial work in bioinformatics was centered around biological sequences of medium lengths, such as genes and protein sequences. Typical problems were finding common genomic motifs, e.g., conserved sequence motifs in the promoters’ region of a gene family, or looking for sequence similarity among proteins of different species. These tasks, and many others, have significant biological consequences. Yet the key to their solution lies in the so-called area of stringology, which is the computer science jargon for the study of string algorithms and properties.
Computers allow us to store an enormous amount of data. Data come in several types, such as numbers, sound, and images. But how are these types of data represented in the computer’s memory?
Modern biology has been undergoing a dramatic revolution in recent decades. Enormous amounts of data are produced at an unprecedented rate. These data come in various forms, such as gene and protein expression level data, DNA, RNA, and protein sequencing, high quality biological and medical images. One consequence of this “data explosion” is that computational methods are increasingly being used in life science research. Computational methods in this context are not the mere use of tools, but the integration of computational and algorithmic thinking to lab-experiment design; to data generation, integration, and analyses; and to modeling and simulation. It is becoming widely recognized that such thinking skills should be incorporated into the standard training of life scientists in the 21st century.
In this chapter, we will define Turing’s Halting Problem. We will then describe Turing’s proof that the Halting Problem cannot be solved by any computer, ever. This was the first problem proven to be, fundamentally, beyond the capabilities of any computer. This result subsequently served as a springboard to prove that a whole, infinite class of other problems are unsolvable as well.
This chapter begins with a definition of what a graph is. While the definition is rather abstract, it is general enough to make graph theory relevant and applicable to a wide range of systems, in a variety of contexts. We will also see how to represent graphs in Python (or any other programming language). Such a representation allows a computerized inspection of many graph properties, as well as implementing various algorithms on graphs. By the end of this chapter, you will be familiar with many basic notions in this field. The last two sections of this chapter introduce the notions of clusters and clustering, and of hierarchical clustering in the context of phylogenetic trees. We note that the treatment of these two topics here hardly touches their surface – both are deep subjects in their own right, covered by vast and diverse literature.
In the previous chapter, we studied well-defined computational problems that cannot be solved by any computer program. Consider now a setting where we are given a well-defined computational problem that is solvable by some computer program. Yet any such program takes a very long time to complete. Long enough that by the time the execution terminates, the solution is no longer relevant.
In this chapter, you will encounter one of the most fundamental notions in CS – complexity of computations. We take a rather non-formal approach in presenting this topic, with an emphasis on intuition. Complexity of algorithms has two flavors – time complexity, which is related to running time, and memory, or space complexity, which reflects the memory allocation requirements of an algorithm. In this chapter, and in most other chapters as well, we will focus primarily on time complexity (except for some cases, in which memory poses a real limitation). You are not expected to become an expert in analyzing the complexity of algorithms. However, you should be able, by the end of this chapter, to understand how running time depends on an algorithm’s input size and to estimate empirically an algorithm’s actual running time. Later in the book, in chapter 11 (Mission Infeasible), we will get acquainted with the notion of complexity from another perspective – computational problems that can be solved in principle, but for which no efficient solutions are known.
Many biological systems can be explored by computer simulation: an imitation of the behavior of the system over time, done by running a computer program. Computer simulations are also termed in-silico experiments, paraphrasing the terms in-vitro and in-vivo. Computer simulation in biology can be used to replace or complement some tedious and costly lab experiments. It enables conducting numerous “experiments” under various conditions, at a scale that is infeasible experimentally.
The term image processing refers to the manipulation or analysis of digital images. While manipulation alters the image in some way to make it more meaningful or informative, analysis merely extracts information from the image without changing its pixels. This chapter describes some very basic notions in image processing, namely segmentation, morphological operators such as erosion and dilation and noise reduction, and labeling. Image processing is really a whole field of expertise, involving rather sophisticated mathematical methods. This chapter aims to familiarize the reader with the field, setting the ground for further exploration of this fascinating topic.
Images are used extensively nowadays. With the digital revolution, we easily generate, send, and observe images, all at a very low cost. Images are also used extensively in biological research and in the medical clinic. Biological images are studied mainly in basic science research, mostly at the cellular and molecular level. Medical images are used for clinical purposes, and focus on the tissue and organ level. Both kinds of images are often complicated and heterogeneous, and analyzing them requires sophisticated computational techniques. The goal of these techniques is to extract meaningful knowledge about the image content. For example, the automated identification of objects in the image (such as cells, intracellular components, or a cancer tumor), tracking cells, organelles, or cancer cells in consecutive video frames, or phenotype identification by object properties (size, light intensity, shape, etc.). Images also require large volumes of storage, raising the need for efficient compression algorithms, in order to store them.
In this part of the book, we will introduce the basics of the programming language Python (chapter 1), and learn how to use it efficiently, taking into consideration a program’s running time and memory allocation requirements (chapter 2). Later on, Python programs will accompany every topic presented throughout this book.
In many contexts, we are required to handle a large collection of objects in a way that supports inserting a new object, finding if an object is present, and possibly deleting an object. These operations typically appear in a series of arbitrary length. We want all these operations to be done as efficiently as possible. Consider a search engine (and its underlying infrastructure) like Google or Bing. One makes a query (e.g., “who is the King of Asteria”) and gets a response in about 945,000 results (0.44 seconds). One of the basic techniques behind such efficient implementations of search is called hashing.
In this chapter, we study another common string-related problem – pattern matching. Suppose we want to find a given sequence motif, or pattern, in a genome or protein, where the pattern is not unique. In other words, the pattern has more than a single possible matching sequence. To that end, we will introduce the fundamental notion of regular expressions, and their use to solve this problem. In addition, we will discuss the closely related notion of finite state machines (FSM), another basic concept in computer science.
Computational thinking is increasingly gaining importance in modern biology, due to the unprecedented scale at which data is nowadays produced. Bridging the cultural gap between the biological and computational sciences, this book serves as an accessible introduction to computational concepts for students in the life sciences. It focuses on teaching algorithmic and logical thinking, rather than just the use of existing bioinformatics tools or programming. Topics are presented from a biological point of view, to demonstrate how computational approaches can be used to solve problems in biology such as biological image processing, regulatory networks, and sequence analysis. The book contains a range of pedagogical features to aid understanding, including real-world examples, in-text exercises, end-of-chapter problems, colour-coded Python code, and 'code explained' boxes. User-friendly throughout, Computational Thinking for Life Scientists promotes the thinking skills and self-efficacy required for any modern biologist to adopt computational approaches in their research with confidence.