1. Introduction
Miniature inverted-repeat transposable elements (MITEs) are short, non-autonomous repeats that are mobilized within the host genome even though they cannot encode the protein responsible for their mobilization (i.e. the transposase). MITEs are generally derived from ancient, related autonomous elements and can originate through internal deletions in these elements, leaving only the terminal inverted repeats (TIRs) and, sometimes, portions of the sequence lying between the TIRs and the transposase coding region. This origin supports the proposal that MITEs are mobilized in trans by a transposase encoded by a full-length element (Feschotte & Pritham, 2007). Autonomous transposons use the host cell machinery to synthesize the proteins needed for their mobilization, whereas MITEs use the machinery encoded by autonomous transposons. Orgel & Crick (1980) described ‘selfish DNA’ as the ‘ultimate parasite’ because of the parasitic relationship between autonomous elements and the cellular machinery. More recently, González & Petrov (2009) extended this idea to MITEs because of their dependence on autonomous elements for mobilization.
MITE-like elements have been widely described in plants (Moreno-Vazquez et al., 2005; Lin et al., 2006; Guermonprez et al., 2008), including grapevine (Benjak et al., 2009), maize (Bureau & Wessler, 1992; Zerjal et al., 2009), cereal grasses (Bureau & Wessler, 1994), Arabidopsis (Feschotte & Mouches, 2000), rice (Feschotte et al., 2003; Jiang et al., 2003; Nakazaki et al., 2003; Shan et al., 2005), Medicago (Grzebelus et al., 2007, 2009), apple (Han & Korban, 2007), beet (Menzel et al., 2006), barley (Lyons et al., 2008; Petersen & Seberg, 2009), grasses (Park et al., 2003), pearl millet (Remigereau et al., 2006) and pome fruit trees (Wakasa et al., 2003).
MITEs have also been described in other organisms, such as bacteria (Chen et al., 2008), cyanobacteria (Zhou et al., 2008), fungi (Xu et al., 2010), silkworms (Han et al., 2010), fish (de Boer et al., 2007) and amphibians (Hikosaka et al., 2011), but few occurrences have been reported in the genus Drosophila (Tudor et al., 1992; Miller et al., 2000; Ortiz et al., 2010). Although numerous MITEs have been identified, the autonomous elements they are associated with often remain unknown. Here, we describe a MITE-like element found in the genome of Drosophila sechellia that is associated with the Bari transposon described in Drosophila melanogaster. The high similarity of both its TIRs and its internal regions to Bari_DM suggests a close relationship with this autonomous element.
2. Materials and methods
Searches for Bari_DM elements in Drosophila genomes (unpublished data) identified a ~90 bp sequence with TIRs and no coding sequence in the D. sechellia genome. Following this observation, the TIR sequence of the Bari_DM element of D. melanogaster (X67681) was used to search the D. sechellia genome (release 1.3, June 2009) (Drosophila 12 Genomes Consortium, 2007) with BLASTn (Altschul et al., 1990). To identify target site duplications (TSDs) and to estimate gene density near the MITEs, the 10 kb 5′ and 3′ flanking regions of each insertion were extracted. The potential to form secondary structures was assessed with Mfold (Zuker, 2003) (available at http://mfold.rna.albany.edu/).
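The flanking-region extraction and TSD check described above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the function names (`flanks`, `tsd`, `is_at_only`) are hypothetical, and the sketch assumes the chromosome is held as a plain string with forward-strand, 0-based, half-open coordinates (BLAST reports 1-based coordinates, so hits would need converting first).

```python
from __future__ import annotations

FLANK_SIZE = 10_000  # 10 kb on each side, as in the text


def flanks(chrom: str, start: int, end: int, size: int = FLANK_SIZE) -> tuple[str, str]:
    """Return up to `size` bp immediately 5' and 3' of the insertion chrom[start:end].

    Coordinates are 0-based and half-open (Python slicing); forward strand only.
    """
    left = chrom[max(0, start - size):start]
    right = chrom[end:end + size]
    return left, right


def tsd(chrom: str, start: int, end: int, length: int = 2) -> str | None:
    """Return the putative target site duplication, or None if the two flanks differ."""
    if start < length:
        return None
    left = chrom[start - length:start].upper()
    right = chrom[end:end + length].upper()
    return left if len(right) == length and left == right else None


def is_at_only(dinucleotide: str) -> bool:
    """True if the TSD consists only of A and T (e.g. 'TA', 'AT')."""
    return bool(dinucleotide) and set(dinucleotide.upper()) <= {"A", "T"}


if __name__ == "__main__":
    toy = "CCCCTA" + "ACGT" + "TAGGGG"  # toy sequence; element occupies toy[6:10]
    print(tsd(toy, 6, 10))              # TA
    print(flanks(toy, 6, 10, size=3))   # ('CTA', 'TAG')
```

Counting genes in the extracted flanks would additionally require a genome annotation, which this sketch does not attempt.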
To confirm that these MITEs were not a sequencing artefact, we searched for them in a D. sechellia strain maintained in our laboratory. Genomic DNA was extracted from 50 individuals following a previously described protocol (Jowett, 1986). PCR amplification was performed with specific primers designed from the consensus sequence of the MITE identified in the D. sechellia genome (Forward, 5′-MYRGTCATGGTCAAAATTATTTTCACAA-3′ and Reverse, 5′-ACAGAGGTGGTCAAAAGTATTTTCACWW-3′). Each 25 μl reaction contained 0·3125 U of Taq polymerase (Invitrogen), 200 ng of genomic DNA, 1 mM MgCl2, 1× buffer, 0·08 mM dNTPs and 0·4 mM primers. The cycling conditions were as follows: initial denaturation (94°C, 120 s), followed by 30 cycles of denaturation (94°C, 15 s), annealing (59°C, 10 s) and extension (72°C, 20 s). The PCR products were purified (DNA GFX DNA & Gel Band, GE) and cloned (TOPO TA Cloning kit, Invitrogen) according to the manufacturers' instructions. Plasmid DNA was extracted from eight clones with a phenol/chloroform protocol and sequenced using the universal primers M13F and M13R, yielding four good-quality sequences.
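Both primers contain IUPAC degenerate bases (M = A/C, Y = C/T, R = A/G, W = A/T), so each is in practice a small pool of oligonucleotides. The sketch below enumerates the variants, assuming the standard IUPAC nucleotide codes; the `expand` helper is hypothetical.

```python
from __future__ import annotations

from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

FWD = "MYRGTCATGGTCAAAATTATTTTCACAA"  # forward primer from the text
REV = "ACAGAGGTGGTCAAAAGTATTTTCACWW"  # reverse primer from the text


def expand(primer: str) -> list[str]:
    """List every concrete oligonucleotide encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for name, primer in (("Forward", FWD), ("Reverse", REV)):
        variants = expand(primer)
        print(name, len(primer), "nt,", len(variants), "variants")
```

The forward primer expands to eight variants (2 × 2 × 2) and the reverse primer to four (2 × 2); both are 28 nt long, the same length as the TIRs reported in the Results.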
The evolutionary relationships between the sequences were reconstructed with the software Network using the median-joining algorithm (Bandelt et al., 1999) with default parameters, based on the nucleotide sequences extracted from the D. sechellia genome. The age of these insertions was estimated with the molecular clock equation r = k/2T, where r is the neutral synonymous substitution rate for the genus Drosophila (r = 0·011/site/Myr) (Tamura et al., 2004), k is the nucleotide divergence (Kimura 2-parameter distance) (Kimura, 1980) and T is the time since divergence, so that T = k/2r. The consensus sequence was reconstructed with DAMBE (Xia & Xie, 2001), and distances were calculated with MEGA version 5 (Tamura et al., 2011).
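Rearranging the clock equation gives T = k/2r. The sketch below is a hedged illustration, not the MEGA implementation: it computes the Kimura 2-parameter distance, d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of transitional and transversional differences, and converts a distance into an age with r = 0·011/site/Myr. Function names are hypothetical, and sites with gaps or ambiguous bases are simply skipped (an assumption; MEGA offers several gap-handling options).

```python
from __future__ import annotations

import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
R_SUBST = 0.011  # substitutions/site/Myr (Tamura et al., 2004), as used in the text


def k2p(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    valid = PURINES | PYRIMIDINES
    n = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue  # skip gaps and ambiguous bases
        n += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / n, transversions / n
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def age_years(k: float, r: float = R_SUBST) -> float:
    """Clock r = k / 2T rearranged to T = k / 2r, converted from Myr to years."""
    return k / (2 * r) * 1e6
```

As a consistency check, `age_years(0.00341)` gives 155 000 years and `age_years(0.00279)` about 126 800 years, matching the ages reported for msechBari1 and msechBari2 in the Results (the latter rounded to 127 000).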
3. Results and discussion
In general, MITEs are shorter than 600 bp, have conserved TIRs and a target site preference, lack coding potential and are AT-rich (Feschotte et al., 2002). We found 49 MITE-like sequences in the sequenced genome of D. sechellia (see Supplementary Table S1 available at http://journals.cambridge.org/GRH), ranging from 65 to 89 bp in length, with 28 bp TIRs and an AT content of approximately 66%. Approximately 63% of these sequences are flanked by AT dinucleotides, which are typical TSDs of the Stowaway MITE family (Feschotte et al., 2002). The consensus sequences of the two subfamilies described below both showed the potential to form secondary structures (see Supplementary Figure S1 available at http://journals.cambridge.org/GRH), a property characteristic of MITEs. Additionally, like other MITEs (Zerjal et al., 2009; Han et al., 2010), these sequences are preferentially associated with gene regions: 62% of the insertions were located within genes or had genes within their 10 kb flanking regions.
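The AT content and TIR checks used to characterize these sequences can be sketched as follows. This is an illustration only: `at_content` and `tir_identity` are hypothetical helpers, the 28 bp TIR length is taken from the text, and TIR conservation is measured simply as the fraction of 5′ TIR positions that match the reverse complement of the 3′ end, without alignment.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def at_content(seq: str) -> float:
    """Fraction of A+T among unambiguous bases (A, C, G, T)."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    return (s.count("A") + s.count("T")) / acgt if acgt else 0.0


def tir_identity(seq: str, tir_len: int = 28) -> float:
    """Fraction of 5' TIR positions matching the reverse complement of the 3' end."""
    left = seq[:tir_len].upper()
    right = revcomp(seq[-tir_len:]).upper()
    return sum(a == b for a, b in zip(left, right)) / tir_len
```

A perfect inverted repeat scores 1·0; each mismatched position lowers the score by 1/28 for a 28 bp TIR.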
The MITE-like sequences described here (Fig. 1 and Supplementary Table S1) are highly similar to the Bari_DM transposon described in D. melanogaster, but are much shorter (65–89 bp) than this autonomous element (1728 bp). Two types of sequence were found, whose TIRs are 100% and 89% similar to those of Bari_DM; both types share three internal regions that are 100% identical to each other and to Bari_DM. We therefore conclude that the sequences described in the D. sechellia genome are derived from the Bari element, and hereafter term them msechBari elements.
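Percent identity over an aligned TIR is simply the number of matching sites divided by the number of compared sites. A minimal sketch, assuming pre-aligned sequences of equal length with '-' for gaps; `percent_identity` is a hypothetical helper. For a 28 bp TIR, three mismatches give 25/28 ≈ 89%, consistent with the 89% figure and with the three-nucleotide TIR difference between the two types noted in the next paragraph.

```python
from __future__ import annotations


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical sites between two aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns are excluded from the comparison
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100 * matches / compared


if __name__ == "__main__":
    tir_a = "A" * 28               # toy 28 bp TIR, not a real sequence
    tir_b = "TTT" + "A" * 25       # same TIR with three substitutions
    print(round(percent_identity(tir_a, tir_b), 1))  # 89.3
```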
These two types of msechBari, which differ essentially by three nucleotides in their TIRs, formed two well-defined clusters in the network (Fig. 2) and can therefore be considered two MITE subfamilies. The network suggests the existence of master sequences that gave rise to each of the two groups. In graphical reconstructions of evolutionary relationships, evolution under the master gene model produces a star topology, in which a central sequence gives rise to the derived sequences (Cordaux et al., 2004). Branch length reflects the time elapsed since the origin of each sequence: short branches suggest a recent origin, and long branches an older one.
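Under the master gene model, the central node of the star is expected to correspond to the most frequent haplotype, which in turn should match the majority-rule consensus. The sketch below computes both as a crude proxy; it is not an implementation of the median-joining algorithm used in Network, nor of the DAMBE consensus procedure, and the helper names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def majority_consensus(aligned: list[str]) -> str:
    """Column-wise majority-rule consensus of equal-length aligned sequences.

    Ties are broken in favour of the base seen first in that column.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def haplotype_counts(aligned: list[str]) -> list[tuple[str, int]]:
    """Distinct sequences (haplotypes) with their counts, most frequent first."""
    return Counter(aligned).most_common()


if __name__ == "__main__":
    toy = ["ACGT", "ACGT", "ACGA", "TCGT"]  # toy alignment, not real data
    print(majority_consensus(toy))          # ACGT
    print(haplotype_counts(toy)[0])         # ('ACGT', 2)
```

In a star-like network, the most frequent haplotype coinciding with the consensus is what one would expect if the derived copies each differ from a shared master by a few private mutations.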
The two subfamilies derived from the two master sequences, msechBari1 and msechBari2, show short evolutionary distances within each group (0·00341±0·00036 and 0·00279±0·0004, respectively), whereas the distance between the subfamilies is larger (0·05020±0·00029). The short distances between sequences within a subfamily, the short branches and the absence of reticulation in the network suggest a recent burst of transposition of these elements in the genome of the sequenced strain. Consistent with this, the large circles in the network represent groups of identical sequences, which are therefore very recent and have not had sufficient time to diverge. Similar events have been reported for other transposable elements (Yang et al., 2006; de Boer et al., 2007; Marzo et al., 2008; Konovalov et al., 2010; Lerat, 2010) and for MITE-like sequences in different organisms (Jiang et al., 2003; Chen et al., 2008; Zhou et al., 2008; Han et al., 2010; Hikosaka et al., 2011). This recent origin is also supported by the average age estimated for the insertions of each subfamily: 155 000 years (msechBari1) and 127 000 years (msechBari2). We confirmed the presence of msechBari in a laboratory strain (Fig. 1). The sequences obtained were similar to those in the sequenced D. sechellia genome: their internal region was conserved, but 2 bp in the 5′ TIRs were variable (see Supplementary Figure S2 available at http://journals.cambridge.org/GRH).
This variation, if real, could indicate that these MITEs are inactive. However, as we obtained only four sequences, the two pairs of variable bases may be sequencing artefacts.
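The within- and between-subfamily comparisons reported above amount to averaging pairwise distances over the relevant sequence pairs. A sketch, assuming aligned sequences; for brevity it uses the uncorrected p-distance rather than the Kimura 2-parameter distance computed in MEGA, a substitution that changes values little at divergences this low. The function names are hypothetical, and a group needs at least two sequences for a within-group mean.

```python
from __future__ import annotations

from itertools import combinations, product
from statistics import mean


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping gaps and ambiguous bases."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_within(group: list[str]) -> float:
    """Mean distance over all unordered pairs inside one subfamily."""
    return mean(p_distance(a, b) for a, b in combinations(group, 2))


def mean_between(group1: list[str], group2: list[str]) -> float:
    """Mean distance over all pairs with one member from each subfamily."""
    return mean(p_distance(a, b) for a, b in product(group1, group2))
```

A small within-group mean combined with a clearly larger between-group mean is the pattern that distinguishes two recently expanded subfamilies from a single, older population of copies.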
To mobilize a transposon such as Bari, the transposase recognizes and binds specific sites in the TIRs to promote transposition. For some plant MITEs, it has been suggested that these non-coding elements are mobilized in trans by transposases from distantly related elements. For example, up to approximately 20 000 insertions of Stowaway MITE-like elements, whose TIRs resemble those of mariner-like elements, have been reported in rice. However, these elements are not homologous to any autonomous element described in rice, so it has been proposed that they are mobilized by a transposase encoded by other, distantly related autonomous elements (Feschotte, 2008). By analogy, the recent transposition of msechBari could have been driven by a transposase from an active Bari_DM transposon in D. sechellia or from other Bari-like elements in the D. sechellia genome that can recognize its TIRs. Only two full-length Bari copies with both TIRs intact were found in the D. sechellia genome, but both contain many stop codons in their transposase coding sequences (see Supplementary Figure S3 available at http://journals.cambridge.org/GRH), indicating that Bari is inactive in D. sechellia. However, both copies show low divergence from the Bari_DM consensus sequence, suggesting that this inactivation is recent. This autonomous element may therefore have been responsible for msechBari mobilization in the recent past, although mobilization by another, distantly related autonomous element cannot be ruled out.
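Detecting premature stop codons in a transposase coding sequence, as done for the two full-length Bari copies, can be sketched as a scan of in-frame codons. This assumes the coding sequence and reading frame are already known (e.g. from alignment to the Bari_DM transposase); the `premature_stops` helper is hypothetical and considers only the standard TAA, TAG and TGA stop codons.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def premature_stops(cds: str, frame: int = 0) -> list[int]:
    """0-based positions of in-frame stop codons, excluding the final codon.

    A stop in the last complete codon is the expected terminator; any stop
    before it truncates the encoded protein.
    """
    seq = cds.upper()
    codon_starts = range(frame, len(seq) - 2, 3)
    if not codon_starts:
        return []
    last = codon_starts[-1]
    return [i for i in codon_starts if seq[i:i + 3] in STOP_CODONS and i != last]


if __name__ == "__main__":
    toy_cds = "ATGTAAGGGTGA"          # toy sequence: ATG TAA GGG TGA
    print(premature_stops(toy_cds))  # [3]
```

Many premature stops scattered along an otherwise well-conserved coding sequence, as reported for the D. sechellia copies, is the pattern expected of a recently inactivated element.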
Funding for this project was provided by the Brazilian agencies, FAPESP – Fundação de Amparo à Pesquisa do Estado de São Paulo (Grant 2010/10731-4 to C. M. A. C. and fellowship 2008/07629-3 to E. S. D.), CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico) (Grant 304880/2009-4 to C. M. A. C.) and FUNDUNESP (Fundação para o Desenvolvimento da UNESP) (Grant 670/10). We thank Jean David, PhD, for providing the strain used in this study.
4. Supplementary material
The online data are available at http://journals.cambridge.org/GRH