Repeat DNA sequences are inherently unstable and prone to undergo expansions, deletions, and chromosome rearrangements
Trinucleotide and higher order nucleotide repeat genomic domains are characterised by relative instability compared to more random sequence domains. This instability frequently is associated with uncontrolled DNA expansion or contraction events, as well as with chromosome rearrangements. These genomic events often correlate with debilitating phenotypic disease states (Ashley and Warren, Reference Ashley and Warren1995; Orr and Zoghbi, Reference Orr and Zoghbi2007). Mirkin and Khristich published a very interesting and insightful review of possible mechanisms and consequences of repeat DNA instability, entitled ‘On the wrong track: molecular mechanisms of repeat mediated genome instability’ (Khristich and Mirkin, Reference Khristich and Mirkin2020). In their scholarly review, the authors critically discuss various models and conceptual frameworks that have been proposed to explain the propensity for repeats to expand or contract.
Interruptions in pre-expanded repeat domains protect repeat DNAs against expansion while being lost in expanded repeat domains
It has been a relatively short time since DNA repeat expansion was proposed as a genotypic cause of phenotypic diseases induced by a threshold level of triplet repeat DNA expansion (Sutherland and Richards, Reference Sutherland and Richards1995). Since then, some progress has been made towards understanding the causal relationship(s) between repeat domain instability and DNA expansion. Nonetheless, crucial features of the mechanisms of how repeat DNA instability leads to DNA expansion still remain unknown. In a subchapter within their review subtitled, ‘Role of repeat interruptions in repeat stability’, Khristich and Mirkin underscore the inability of current models to account for the reduced levels of repeat expansion caused by one or more ‘interruptions’ within the repeat sequence domain. Employing the example of CGG repeats containing AGG interruptions, the authors also note the puzzling apparent loss of the pre-existing AGG interruption in those repeat DNA domains that have become expanded; fascinating yet perplexing observations for which meaningful explanations currently are lacking.
CAA interruptions near the 3′ end of CAG repeats within the Huntingtin gene delay the onset of disease in pre-mutation length Huntington’s patients
The deficiency in defining a mechanistic explanation for the origins of such interruption-induced delay in repeat expansion represents a barrier to the rational design of target-based therapeutic interventions. The need for such a mechanistic understanding is reinforced by a recent review of deep sequencing data from pre-mutation length CAG repeats in Huntington’s disease patients (Wright et al., Reference Wright, Black, Collins, Gall-Duncan, Caron, Pearson and Hayden2020). In this review, Pearson and co-workers note that one or more CAA interruptions of the CAG repeats near the 3′ end of the repeat domain dramatically delay the age of onset of the Huntington’s disease phenotype relative to carriers of the same length composed of uninterrupted CAG repeats. Since the CAA triplet codes for the same amino acid, glutamine, as the CAG triplet it replaces, these authors concluded that the delayed onset of Huntington’s disease must be due to changes in the properties of the repeat sequence itself and not due to the polyglutamine tract in the mutated Huntingtin protein.
Rationalising interruption-induced alterations in repeat DNA properties
Here we propose plausible explanations/mechanisms that can rationalise the impact of repeat interruptions on repeat expansion events, as well as on the puzzling apparent loss of the interrupting repeats after expansion. Our mechanistic proposals, as elaborated on below, are based on insights we have garnered from our published studies on strategically designed, CAG repeat-containing oligonucleotide systems (Völker et al., Reference Völker, Klump and Breslauer2007, Reference Völker, Plum, Gindikin, Klump and Breslauer2014). Such constructs create a dynamic energy landscape shaped by nearly isoenergetic, interchanging, positional isomers that we have dubbed as ‘rollamers’ (Völker et al., Reference Völker, Klump and Breslauer2008, Reference Völker, Gindikin, Klump, Plum and Breslauer2012, Reference Völker, Plum, Gindikin and Breslauer2019; Li et al., Reference Li, Völker, Breslauer and Wilson2014; Völker and Breslauer, Reference Völker and Breslauer2022) (Fig. 1).
Disruption of DNA secondary structure as a potential cause for the repeat interruption-induced increase in repeat stability. For quite some time it has been proposed that aberrant secondary structures, stabilised by repeat-specific intrastrand base pairing interactions, might serve as critical intermediates in the processes that induce repeat instability leading to expansion (Gacy et al., Reference Gacy, Goellner, Juranic, Macura and McMurray1995; Mitas, Reference Mitas1997; Pearson and Sinden, Reference Pearson and Sinden1998; McMurray, Reference McMurray1999; Sinden et al., Reference Sinden, Potaman, Oussatcheva, Pearson, Lyubchenko and Shlyakhtenko2002; Lenzmeier and Freudenreich, Reference Lenzmeier and Freudenreich2003). This recognition also suggests a potential basis for the empirically observed reduction in repeat domain instability in the presence of one or more ‘wrong’ triplets. In this regard, it has been suggested that destabilisation of the repeat secondary structure by the ‘wrong’ triplet results in an unstable secondary structure, one that would not provide the same challenges to the replication and repair machinery as would be created by a ‘correct’ repeat secondary structure, thereby preventing the expansion process from occurring. However, as pointed out by Khristich and Mirkin in their comprehensive review (Khristich and Mirkin, Reference Khristich and Mirkin2020), it is hard to conceive how a single base change, or even a small number of repeat interruptions within very large repeat domains, could manifest such an effect, and there are scant data in support of this hypothesis.
Delaney and coworkers have perhaps presented the most relevant dataset in partial support of this perspective by showing that AGG interruptions within a (CGG)n repeat oligonucleotide alter its secondary structure (as defined by susceptibility to structure-specific chemical probes), and the thermodynamic stability of the freely folding ensemble of structures adopted by (CGG)n repeat oligonucleotides (Jarem et al., Reference Jarem, Huckaby and Delaney2010). In this regard it should be noted that oligonucleotides composed only of repeat sequences do not fold into one unique native structure, but rather present an ensemble of interrelated folding forms, a feature that makes unambiguous interpretation of the data difficult. Given the unknown ensemble distributions in their samples, other interpretations of Delaney’s results are possible. Of interest here are the interruption-induced changes in the ensemble, a feature that also plays a role, albeit in a different context, in the proposed mechanism put forward below.
Abasic sites (and mismatches) can be accommodated within repeat DNA secondary structures without loss of secondary structure stability
In our published studies, we used abasic site lesions, inserted site specifically in place of guanine in select CAG repeats that are trapped within a repeat bulge loop conformation by conventional Watson and Crick base-paired domains upstream and downstream of the CAG domain (Völker et al., Reference Völker, Plum, Klump and Breslauer2009). This arrangement significantly restricts the confounding repeat ensemble distribution (Völker et al., Reference Völker, Klump and Breslauer2008), thereby simplifying the analysis. Our measurements revealed the impact of the abasic site lesion within the repeat bulge loop domain to be essentially energetically neutral. This is an unanticipated outcome since we previously have shown the abasic lesion to be one of, if not the most destabilising lesion in duplex DNA; an expectation consistent with abasic sites, in a formal sense, involving the removal of the guanine base, producing a pairing and stacking ‘hole’ in the DNA helix (Vesnaver et al., Reference Vesnaver, Chang, Eisenberg, Grollman and Breslauer1989; Gelfand et al., Reference Gelfand, Plum, Grollman, Johnson and Breslauer1996, Reference Gelfand, Plum, Grollman, Johnson and Breslauer1998; Minetti et al., Reference Minetti, Sun, Jacobs, Kang, Remeta and Breslauer2018). However, contrary to this impact on duplex DNA, we have demonstrated that the repeat self-structure is able to adjust in a compensatory manner that entirely masks the energetic cost of the loss of one putative base pair and the 5′ and 3′ associated base stacking interactions (Völker et al., Reference Völker, Plum, Klump and Breslauer2009). 
By contrast, introducing an ‘inappropriate’ base (i.e., replacing a C or G with an A to form an AGG or CAA triplet) within the repeat loop is likely to be less destabilising than an abasic site, given the plethora of non-standard, non-Watson and Crick base pairing interactions that have been shown to be only marginally less stable than the canonical Watson and Crick base pair (Nelson et al., Reference Nelson, Martin and Tinoco1981; Aboul-ela et al., Reference Aboul-ela, Koh, Tinoco and Martin1985; Wu et al., Reference Wu, McDowell and Turner1995; SantaLucia, Reference SantaLucia1998; SantaLucia and Hicks, Reference SantaLucia and Hicks2004). Focusing on the sequence examples mentioned earlier, replacement of a CGG repeat by AGG, or CAG by CAA would result in CˑA or AˑC base pairs, if sequence alignment of the kind frequently proposed for repeat slip-outs is maintained. Stable DNA structures composed of AˑC base pairs have been reported (Hunter et al., Reference Hunter, Brown, Anand and Kennard1986; Boulard et al., Reference Boulard, Cognet, Gabarro-Arpa, Le Bret, Sowers and Fazakerley1992; Allawi and SantaLucia, Reference Allawi and SantaLucia1998). Based on this analysis, it is unlikely that a single or even a few interruptions of the repeat sequence would suffice to disrupt repeat DNA self-structure to such an extent that uncontrolled expansion is delayed or absent.
Consequently, based on the body of data available, it is reasonable to conclude that one or a few interruptions to the repeat sequences are insufficient to fatally disrupt repeat secondary structure, and thereby prevent expansion. As a result, one needs to look elsewhere for possible explanations.
Repeat DNA secondary structures are dynamic ensembles, with the ensemble distributions dictated by differential energy levels
We propose that the answer to this conundrum lies in the dynamic nature of repeat DNA slip outs within larger repeat domains. As we have shown, short repeat slip outs within larger repeat domains result in dynamic repeat bulge loops, which we dubbed ‘rollamers.’ Rollamers are dynamic, interconverting, positional isomers that can form in multiple energetically equivalent positions within the larger repeat sequence (Völker et al., Reference Völker, Gindikin, Klump, Plum and Breslauer2012). Such an arrangement of loop distributions is favoured by a Boltzmann entropy gain. Relatively facile interconversions between different repeat loop positions over the entire repeat sequence domain result, at equilibrium, in an ensemble distribution rather than a single repeat loop isomer, with each possible loop position being approximately equally populated. Indeed, we have postulated that the fleeting nature of such bulge loops due to relatively facile loop migration may be one of the contributing factors that cause the DNA replication and repair machineries to erroneously expand (or contract) repeats when they encounter rollameric substrates. In fact, consistent with this expectation, we have shown that loop migration can cause abasic site lesions to escape processing by the critical repair enzyme APE1 (Völker and Breslauer, Reference Völker and Breslauer2022). The presence of an abasic site lesion in place of one of the guanines in a CAG repeat, however, alters the ensemble distributions of the rollamers, since in this construct different loop positions are no longer energetically equivalent (Völker et al., Reference Völker, Plum, Gindikin and Breslauer2019). Under such a circumstance, the system adjusts by altering the relative populations of loop isomer states to minimise the energetic penalty caused by the lesion.
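The contrast between an isoenergetic rollamer ensemble and one perturbed by a lesion can be illustrated numerically. The following is a minimal sketch only: it assumes a simple two-state-per-position Boltzmann model, and the function name `loop_populations`, the number of loop positions (eight), and the 2 kcal/mol penalty are hypothetical values chosen for illustration, not measurements from the cited studies.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def loop_populations(free_energies_kcal, temperature_k=310.15):
    """Return equilibrium Boltzmann populations for a set of loop positions.

    free_energies_kcal: relative free energy of each loop position (kcal/mol).
    Only differences between positions matter, so the minimum is subtracted.
    """
    rt = R_KCAL * temperature_k
    g_min = min(free_energies_kcal)
    weights = [math.exp(-(g - g_min) / rt) for g in free_energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical uninterrupted repeat: eight isoenergetic loop positions,
# so every position is equally populated.
uniform = loop_populations([0.0] * 8)

# Hypothetical perturbed repeat: two positions carry an assumed
# 2 kcal/mol penalty and are therefore depleted at equilibrium.
perturbed = loop_populations([0.0] * 6 + [2.0, 2.0])

print([round(p, 3) for p in uniform])
print([round(p, 3) for p in perturbed])
```

In this toy model the penalised positions remain populated, only far less than the others, which matches the qualitative picture of a redistribution of loop isomers rather than their strict elimination.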
How repeat domain interruptions alter the repeat DNA secondary structure ensemble distributions
Following similar reasoning, we postulate that the presence of an interruption of the repeat sequence also will cause a change in repeat bulge loop rollamer distribution within such interrupted repeat sequences. As schematically shown in Fig. 2 and elaborated on in the figure legend, the altered triplet can either form conventional base pairs within upstream or downstream duplex domains (i.e., shown by Isomers I and II in Fig. 2), or result in the altered triplet partitioning into the repeat bulge loop domain while simultaneously causing formation of a mismatch in the duplex domain (Isomers III, IV and V in Fig. 2). Isomer III represents a special case due to partitioning of the mismatch at the 5′ loop junction, most likely resulting in an enlarged loop domain. In the former case, the energetic impact of the repeat interruption to the repeat bulge loop is indistinguishable from equivalent-sized repeat bulge loop rollamers in uninterrupted repeats. In the latter case, the energetic impact of the interruption is defined by the impact of the base mismatch and potential contributions from the loop modification. As a consequence, the different loop isomers are no longer energetically equivalent, and significant changes in the populations of different loop isomer positions can be expected. For example, the presence of an abasic site within a repeat almost completely inhibits population of those loop isomers where the abasic site partitions into the duplex domain (Völker et al., Reference Völker, Plum, Gindikin and Breslauer2019). A similar, but perhaps less profound, effect would be expected for the energetically less costly mismatches.
Based on our abasic site data, as discussed above (Völker et al., Reference Völker, Plum, Gindikin and Breslauer2019), we posit that loop isomer positions resulting in a mismatch and altered repeat loop sequence are unlikely to be populated to any significant extent due to the energetic damage such modifications cause. In other words, the interruption of the repeat sequence acts like a (possibly leaky) barrier to rollameric loop distribution. By thermodynamically discouraging population of some potential rollamer positions, the interruption of the repeat sequence causes it to behave as if equivalent to a shorter length repeat domain, particularly as far as the propensity to expand is concerned, which is exactly what has been observed empirically.
The mechanism we propose by which repeat interruptions increase repeat DNA stability represents an example of the importance of the thermodynamic impact of the final state in addition to the commonly considered initial state when one tries to assess biological outcomes.
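The idea that an interrupted repeat behaves like a shorter repeat can be expressed as an "effective number of loop positions". The sketch below uses the exponential of the Shannon entropy of the Boltzmann distribution as that measure; this metric, the function name `effective_positions`, and the example penalties are our illustrative assumptions and are not quantities defined in the cited work.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def effective_positions(n_allowed, n_penalised, penalty_kcal, temperature_k=310.15):
    """Effective number of populated loop positions, exp(Shannon entropy).

    n_allowed:    positions with no energetic penalty (assumed isoenergetic)
    n_penalised:  positions that would create a mismatch (assumed equal penalty)
    penalty_kcal: hypothetical free-energy cost of each penalised position
    """
    rt = R_KCAL * temperature_k
    w = math.exp(-penalty_kcal / rt)          # Boltzmann weight of a penalised position
    z = n_allowed + n_penalised * w           # partition function
    probs = [1.0 / z] * n_allowed + [w / z] * n_penalised
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return math.exp(entropy)


# With no penalty, all ten positions count; as the assumed penalty grows,
# the ensemble approaches that of a five-position (shorter) repeat.
for penalty in (0.0, 1.0, 3.0, 5.0):
    print(penalty, round(effective_positions(5, 5, penalty), 2))
```

A finite penalty never reduces the count exactly to the number of allowed positions, which is one way to picture the "possibly leaky" character of the barrier.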
Low probability, repeat bulge loop isomer states present the possibility for mismatch repair processes to result in the apparent loss of the repeat interruption
Finally, some comments regarding the apparent loss of interrupting triplets such as AGG in an expanded domain alluded to by Khristich and Mirkin in their review (and references therein) (Khristich and Mirkin, Reference Khristich and Mirkin2020). Two possible explanations exist that can be tested by inspection of the expanded sequence. If expansion happens only as a consequence of the restricted/preferred sequence space available to the ‘correct’ repeat bulge loop rollamer, then the expanded domain would only reflect the ‘correct’ repeats, while the altered triplet should still be present near the 3′ or 5′ end of the repeat domain. Alternatively, the thermodynamic argument for reduced repeat expansion propensity, due to single base/triplet interruptions of the repeat sequences, does not exclude the possibility that the repeat bulge loops adopt loop positions where their positional partition results in the upstream duplex domain containing a mismatch, primarily corresponding to isomers IV and V in Fig. 2. Although thermodynamic considerations make population of such loop states low probability events, they would be highly consequential by resulting in repair of the interrupting repeat, hence our use of the term ‘possibly leaky’ barrier. This reasoning is essentially a classic thermodynamic/dynamic argument; namely, that high energy states that are sparsely populated, but do exist, can lead to additional biologically consequential processing pathways if the lifetime of the sparsely populated state is sufficiently long so as to be recognised and processed by the relevant repair enzyme. Such successful repair would remove the interrupting repeat, an outcome consistent with observation.
Given that mismatch repair has been implicated in facilitating repeat expansion events (McMurray, Reference McMurray2008; Iyer et al., Reference Iyer, Pluciennik, Napierala and Wells2015; Schmidt and Pearson, Reference Schmidt and Pearson2016; Iyer and Pluciennik, Reference Iyer and Pluciennik2021), this cascade of events may also trigger expansion of the now interruption-free repeat domain, also consistent with the observed outcomes from the data. The empirical observations of delayed repeat expansions in interrupted repeat sequences are a strong indicator that rollamer isomers for which loop partition results in the formation of mismatches in the repeat duplex domain are low probability events, as otherwise the mismatch repair process could be expected to enhance rates of repeat expansion events.
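The ‘leaky barrier’ argument can be stated compactly. As a sketch only, assume that $N_{0}$ loop positions are isoenergetic and that $N_{\mathrm{m}}$ positions each carry the same free-energy penalty $\Delta G_{\mathrm{m}} > 0$ because they place a mismatch in the duplex domain (these symbols are introduced here for illustration and do not appear in the cited studies). The equilibrium fraction of the ensemble occupying mismatch-containing positions would then be:

```latex
p_{\mathrm{m}}
  = \frac{N_{\mathrm{m}}\, e^{-\Delta G_{\mathrm{m}}/RT}}
         {N_{0} + N_{\mathrm{m}}\, e^{-\Delta G_{\mathrm{m}}/RT}}
```

This fraction becomes small when $\Delta G_{\mathrm{m}} \gg RT$ but never reaches zero. Equilibrium populations carry no information about lifetimes, so whether such a state persists long enough to be recognised by the mismatch repair system is a separate, kinetic question.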
Going Forward: Potential Experimental Assessments of the proposed mechanisms for interruption-induced increase in repeat DNA stability
In closing, we wish to point out that the proposed mechanism, elaborated here, for increased stability of repeat DNA sequences with one or more interruptions, provides testable predictions that allow one to confirm or refute the proposed hypothesis. Notably, determining the thermodynamic consequences of a single interruption of the repeat sequence in various loop positions within a static repeat bulge loop will assess if the consequence of a single base disruption on repeat self-structure stability is indeed negligible, as we suggest above. Furthermore, monitoring loop distribution of a dynamic rollamer system similar to that outlined in Fig. 2 should allow one to determine if the presence of the repeat disruption indeed influences rollamer loop distribution, as hypothesised here, thereby reducing the effective repeat length. Single-molecule studies may prove useful in this regard (Hu et al., Reference Hu, Morten and Magennis2021; Bianco et al., Reference Bianco, Hu, Henrich and Magennis2022). Finally, it may be possible to test the extent to which the mismatch repair system is able to repair the repeat disruption by using a site specifically located mismatch at the repeat/nonrepeat DNA sequence junction in a rollameric system.
Concluding remarks
To summarise, we have presented a novel energetic-based explanation for the puzzling observations remarked upon by Khristich and Mirkin in their review (Khristich and Mirkin, Reference Khristich and Mirkin2020) that interruptions in repeat sequences lead to increased stability and delayed expansion of these domains. Our proposed mechanism also can explain the equally puzzling observed loss of the interrupting repeat when expansion eventually does happen. Specifically, we describe for the first time how interruption of a repeat domain restricts the ensemble space available to dynamic, slip out, repeat bulge loops by introducing energetic barriers to loop migration. We present the novel proposal that these barriers arise because some possible loop isomers result in energetically costly mismatches in the duplex portion of the repeat domain. We further propose for the first time that the reduced ensemble space is the causative feature for the observed delay in repeat DNA expansion. In addition, we propose for the first time that the observed loss of the interrupting repeat in some expanded DNAs may be due to transient occupation of loop isomer positions that result in a mismatch in the duplex stem due to leakiness in the energy barrier. We propose the novel hypothesis that if the lifetime of such a low probability event allows for recognition by the mismatch repair system, then ‘repair’ of the repeat interruption can occur, thereby rationalising the absence of the interruption in the final expanded DNA ‘product.’
We are pleased that Khristich and Mirkin produced such a comprehensive and provocative critical review. In the best of circumstances, such quality reviews serve as intellectual launching pads that motivate the scientific community to focus on explanations for counterintuitive observations. Their presentation of some puzzling biomedical correlative outcomes stimulated us to refocus on our biophysical studies of such systems. Our resulting proposed mechanistic pathways provide novel insights into a biomedically important set of coupled genotypic phenomena that map the linkage between DNA origami thermodynamics and phenotypic disease states.
Open peer review
To view the open peer review materials for this article, please visit http://doi.org/10.1017/qrd.2023.6.
Acknowledgements
The authors would like to thank Drs Craig A. Gelfand and G. Eric Plum for helpful discussions.
Author contribution
J.V. and K.J.B. contributed equally to this work.
Competing interest
The authors declare none.
Financial disclosure
Supported by grants from the NIH GM23509, GM34469, and CA47995 (all to K.J.B.).
Comments
Comments to Author: This is an interesting manuscript on an important subject, and after minor revision is certainly appropriate for publication in QRB Discovery. I would point out, however, that I found the ms as written to be very hard to read, even though I am somewhat familiar with the field of triplet expansion mechanisms, and figuring out what specifically the authors were proposing required a careful pre-reading on my part of the excellent Khristich & Mirkin review (which is heavily cited within the present ms) before I could get any clear understanding of the specific new ideas the authors are putting forward, and how they go beyond those described in the K&M review. This may be alright, since QRB Discovery is intended as a place to publish new and developing mechanistic ideas on significant biophysical problems, and thus reviewing this complex field in detail is clearly beyond the scope of a typical Discovery paper. On the other hand, the authors could make it easier for the general reader by some reorganization of their presentation and perhaps the inclusion of a more accessible molecular mechanisms ‘cartoon’ to introduce their “rollamer” schematic (Scheme 1). Some specific suggestions follow.
1. The Abstract and the two paragraphs that follow introduce the ideas to be considered in general terms, and the section headings are helpful, but then the next paragraph on Huntington Disease seems to be dropped into the ms without any clear rationalization for what points it is supposed to make and without background. I would suggest that this paragraph be moved further back in the ms after some of the ideas it presents have been discussed in more general terms, or perhaps placed near the end to show how the authors' ideas can be applied to specific triplet expansion disease problems.
2. The injection of Scheme 1 into the ms is similarly abrupt, and is probably incomprehensible to the general reader without some preliminary introduction, possibly by drawing some sort of more familiar stick-figure cartoons used to describe triplet-expansion ideas comparable to those used by M&K in their review, to make it clearer how the new mechanistic ideas developed by the authors using their ‘rollamer’ concept fit into prior ideas of triplet expansion mechanisms.
3. It would similarly be helpful if the authors could summarize more specifically in their “Concluding Remarks” what general new concepts or ideas they have introduced into the triplet expansion field and how these ideas go beyond what is presented in the M&K review, and what particular ‘open questions’ defined by M&K their approaches help to solve. Perhaps the Huntington Disease section might fit in here to better illustrate the possibilities for using specific rollamer models approaches to provide ideas for disease therapies.
After the authors have considered the above issues and revised the ms along lines related to the suggestions made above I think this work should be quite suitable for publication in QRB Discovery, and will comprise a significant contribution.