Recall from Chapter 1 that, through transcription and alternative splicing, a gene can produce different RNA transcripts. Depending on various factors, such as the tissue in which the cell resides, the presence of disease, or the response to some stimulus, both the set of RNA transcripts of a gene and the number of copies produced of each (its expression level) can differ.
In this chapter we assume that we have a collection of reads from all the different (copies of the) transcripts of a gene. We also assume that these reads have been aligned to the reference genome, using, for example, techniques from Section 10.6; in addition, Section 15.4 shows how to exploit the output of the genome analysis techniques from Chapter 11 to obtain an aligner for long reads of RNA transcripts. Our final goal is to assemble the reads into the different RNA transcripts and to estimate the expression level of each transcript. The main difficulty of this problem, which we call multi-assembly, arises from the fact that the transcripts share identical substrings, so a read falling entirely within a shared region could have originated from any of the transcripts containing it.
We consider several scenarios and their corresponding multi-assembly formulations, each stated and solved for an individual gene. In Section 15.1 we illustrate the simplest one, in which the gene's transcripts are known in advance and we need only find their expression levels from the read alignments. In Section 15.2 we illustrate the problem of assembling the RNA reads into the different unknown transcripts, without estimating their expression levels. In Section 15.3 we present a problem formulation for simultaneous assembly and expression estimation.
As just mentioned, in this chapter we assume that we have a reference genome, and thus that we are in a so-called genome-guided setting. De novo multi-assembly is in general a much harder task. We therefore restrict ourselves here to genome-guided multi-assembly, which admits clean problem formulations and already illustrates the main algorithmic concepts. Nevertheless, in Insight 15.1 we briefly discuss how the least-squares method from Section 15.1 could be applied in a de novo setting.
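To make the simplest scenario (known transcripts, unknown expression levels) concrete, here is a minimal sketch of a least-squares estimate. It assumes that each transcript is modeled as a list of exons and that the read alignments have already been summarized into an average coverage per exon; the helper `estimate_expression` and the toy data are hypothetical, and the exact objective developed in Section 15.1 may differ. The sketch chooses levels minimizing the sum, over exons, of the squared difference between observed coverage and the total level of the transcripts containing that exon. It does not enforce non-negativity of the levels.

```python
from typing import Dict, List


def estimate_expression(
    exons: List[str],
    transcripts: Dict[str, List[str]],
    coverage: Dict[str, float],
) -> Dict[str, float]:
    """Hypothetical helper: least-squares expression levels of known transcripts.

    Minimizes sum_v (coverage[v] - sum of levels of transcripts using exon v)^2
    by solving the normal equations. Non-negativity is NOT enforced here.
    """
    names = list(transcripts)
    k = len(names)
    n = len(exons)
    # Incidence matrix M: one row per exon, one column per transcript.
    m = [[1.0 if v in transcripts[t] else 0.0 for t in names] for v in exons]
    c = [float(coverage[v]) for v in exons]
    # Normal equations: (M^T M) e = M^T c.
    a = [[sum(m[r][i] * m[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    b = [sum(m[r][i] * c[r] for r in range(n)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting (assumes M has full column rank).
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("transcripts cannot be distinguished from exon coverage alone")
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                for j in range(col, k):
                    a[r][j] -= f * a[col][j]
                b[r] -= f * b[col]
    # The matrix is now diagonal, so each level is read off directly.
    return {names[i]: b[i] / a[i][i] for i in range(k)}


# Toy gene: T1 uses exons A, B, C; T2 skips exon B. Exons A and C are shared,
# so their coverage cannot be attributed to either transcript in isolation.
levels = estimate_expression(
    ["A", "B", "C"],
    {"T1": ["A", "B", "C"], "T2": ["A", "C"]},
    {"A": 11, "B": 4, "C": 9},
)
print({t: round(v, 6) for t, v in levels.items()})  # {'T1': 4.0, 'T2': 6.0}
```

Here the coverage of the unshared exon B pins down T1, and the shared exons then determine T2; the leftover mismatch (11 versus 10 on A, 9 versus 10 on C) is what the squared-error objective absorbs.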