Introduction
Couroupita guianensis Aubl. is a large deciduous tropical tree that belongs to the Lecythidaceae family (Shekhawat and Manokari, 2016). Peculiar features of its flower and fruit make it a distinctive tree, and it is widely planted as an ornamental in botanical gardens around the globe. In Indian traditional culture, the shape of its flower is interpreted as a snake hood-like stamen structure guarding a stigma in the shape of a Shiva lingam (an Indian holy symbol) at the flower's centre. This feature has given rise to many Indian common names, such as ‘Kailashpati’ in Hindi, ‘Mallikarjuna’ in Telugu and ‘Nagalingapushpam’ or ‘Nagpushpa’ in Tamil. The tree is considered sacred in India and Sri Lanka, where its flowers are offered in holy ceremonies (Lim, 2012; Shekhawat and Manokari, 2016). The tree is also commonly referred to as the ‘Cannonball tree’ in English due to the shape and size of its fruit. The fruits are globular, brown and woody, about the size of a human head or a cannonball, and are used for feeding animals (Lim, 2012; Shekhawat and Manokari, 2016).
This sacred tree has been used in traditional medicine in India as well as worldwide. Flowers, leaves and bark of C. guianensis are used to treat hypertension, tumours, pain, inflammatory processes, malaria and many other health issues (Sanz-Biset et al., 2009). Juice made from the leaves is used to cure skin diseases. The fruit has been used for disinfecting wounds, and young leaves are used to relieve toothache (Al-Dhabi et al., 2012). Further, the plant has been shown to possess pharmacologically relevant biological properties, such as anti-bacterial, anti-biofilm, anti-oxidant, ovicidal, larvicidal, anti-ulcer, anti-arthritic, anti-platelet, anti-diarrhoeal, analgesic, anti-inflammatory, anti-fertility, anti-cancer, neuropharmacological, anxiolytic, anti-plasmodial, anti-depressant, anti-nociceptive, immunomodulatory, anti-quorum sensing, anti-malarial and wound healing activities (Sanz-Biset et al., 2009; Al-Dhabi et al., 2012; Shekhawat and Manokari, 2016; Kaneria et al., 2017).
Few reports are available in the literature on metabolite profiling of C. guianensis; these indicate the presence of eugenol, linalool, nerol, tryptanthrine, indigo, indirubin, isatin, linoleic acid, carotenoids, sterols and (E,E)-farnesol (Khan et al., 2014; Kaneria et al., 2017).
Despite multiple available reports on metabolite profiling and pharmacological activities, we realized that C. guianensis is still a rather uncharacterized plant in terms of transcriptomics and tissue-specific metabolite analysis. With the advent of high-throughput mass spectral analytical techniques and nucleotide sequencing in the past decade, significant efforts have been made by researchers to carry out metabolomics and transcriptomics approaches on unexplored medicinal plants for their detailed characterization (Guo et al., 2021; Alami et al., 2022). Techniques such as RNA-Seq and mass spectrometry act as modern lenses through which we can characterize traditional medicinal plants in detail at the molecular level. These efforts are specifically focused on identifying genes and metabolites of secondary metabolism because it is primarily these molecules that directly or indirectly give rise to unique characteristics such as fragrance, flavour, colour, pharmacological activity, plant defence against abiotic and biotic stresses, disease resistance, etc., in medicinal plants (A. Kumar et al., 2023; S. Kumar et al., 2023). Such knowledge offers the possibility of further biotechnological interventions, such as plant breeding or genetic manipulation for trait improvement, optimization of plant cultivation and, more recently, heterologous gene expression for the production of desirable secondary metabolites in bacteria and yeast (Rai et al., 2017; Navale et al., 2019; Guo et al., 2021; A. Kumar et al., 2023; S. Kumar et al., 2023).
Considering these lacunae, we ventured into metabolite and transcriptome profiling of C. guianensis to identify the range of its secondary metabolites and decipher relevant secondary metabolite pathway genes. Such molecular data may allow us to assess whether there is some scientific reasoning underlying the age-old traditional wisdom that the flower should be used for sacred offerings. The genes elucidated in this study may further be used in numerous biotechnological applications in the future.
For the study, initially, we carried out metabolic profiling of the whole flower, petals, stamen, stem and leaf of C. guianensis and also screened these tissues for their antimicrobial activity. Flower tissue stood out among all other plant parts for having a diverse and large terpenoid repertoire and potent antibacterial action. These results led us to concentrate our efforts on a thorough examination of floral tissue and construct a flower transcriptome. A cDNA library generated from the RNA of flower tissue was sequenced, and transcriptomic analysis was carried out. We successfully screened out terpenoid pathway transcripts from flower tissue and correlated them with terpenoid biosynthesis. Then, using three full-length terpene synthase gene sequences, we performed structural investigations to predict gene architecture and 3D protein structure for protein function prediction. This work is the first study of the secondary metabolite biosynthesis pathway of the hitherto underexplored plant C. guianensis.
Materials and methods
Plant materials
Couroupita guianensis tissue, i.e. the whole flower, flower petal, stamen, stem and leaf, was collected in liquid nitrogen from a tree at the NCL commercial complex near the National Chemical Laboratory in Pune. The plant used in this study was confirmed as C. guianensis and authenticated by a botanist at Agharkar Research Institute, Pune (the herbarium accession number allotted to the plant is AHMA: 32430).
Phytochemical extraction and GC-MS analysis
Metabolite analysis was carried out in five tissues of C. guianensis: the whole flower, petal, stamen, leaf and stem. Tissues were crushed to powder under liquid nitrogen and extracted three times with tert-butyl methyl ether (TBME; 10 ml × 3), each time with continuous stirring for 3 h. The pooled TBME layer was passed through anhydrous sodium sulphate, concentrated under reduced pressure to obtain crude triterpenoid extract, and reconstituted to 500 μl in TBME. For analysis of the TBME extract, GC-MS was performed on an Agilent 7890A GC coupled with a 5975C mass detector under the following conditions: a Restek Rtx-5 ms (30 m × 0.25 mm × 0.25 μm) capillary column was used, with helium as the carrier gas at a flow rate of 1.0 ml/min. The column was initially maintained at 150 °C for 2 min, then the temperature was raised from 150 to 250 °C at a 5 °C/min rate with a hold of 11 min, and finally the temperature was maintained at 270 °C for 15 min. Injector and detector temperatures were 230 and 280 °C, respectively. Then 1 μl of plant extract was injected into the column. Compounds were identified by comparison with the NIST MS mass spectral reference library and, wherever possible, by retention time matches with reference standards (eugenol and linalool). The data were processed by MSD ChemStation Data Analysis (Agilent Technologies, USA).
RNA extraction from the flower tissue of C. guianensis
Total RNA was isolated from flower tissue (pre-treated with an acetone wash) using a Spectrum kit (Total RNA Isolation Kit, Sigma-Aldrich, USA) and treated with DNase to remove DNA contamination. Residual contamination and the integrity of total RNA were checked by electrophoresis on a 1% agarose gel prepared in TAE buffer containing 0.1% DEPC. Further, RNA concentration and salt and protein impurities were analysed on a Nanodrop (Thermo Fisher, USA). High-quality RNA isolated from flower tissue was sent for sequencing.
Transcriptome de novo assembly and functional annotation
RNA isolated from C. guianensis flower tissue was sent to Genotypic Technology in Bengaluru, India. A NextSeq500 (Illumina, USA) was used for the sequencing of RNA to generate processed reads. These reads were assembled into unigenes using Trinity software with a k-mer size of 25 base pairs. To assign molecular function, biological processes and cellular components to the transcripts, functional annotation of unigenes was performed using KEGG-KAAS analysis, Pfam domain analysis and MEGA blast search against the NCBI database, SwissProt/Uniprot database and Protein Data Bank (PDB) with an E-value ⩽ 10⁻⁵.
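The E-value cut-off used for annotation can be illustrated with a minimal sketch. It assumes that the BLAST hits are available in the standard 12-column tabular format (`-outfmt 6`), in which the E-value is the 11th column; the function name, the sample rows and all identifiers below are hypothetical and are not taken from the study.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # threshold stated in the text (E-value <= 10^-5)


def filter_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return (query, subject, evalue) tuples for hits with E-value <= cutoff.

    Assumes BLAST tabular output (-outfmt 6): 12 tab-separated columns,
    with the E-value in column 11 (index 10).
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12:  # skip blank or malformed lines
            continue
        evalue = float(row[10])
        if evalue <= cutoff:
            hits.append((row[0], row[1], evalue))
    return hits


# Hypothetical example rows; identifiers and scores are invented.
SAMPLE = (
    "unigene_1\tsp|P0001|TPS\t78.5\t200\t43\t0\t1\t600\t1\t200\t2e-90\t310\n"
    "unigene_2\tsp|P0002|XYZ\t31.0\t50\t34\t1\t10\t160\t5\t54\t0.003\t35.1\n"
    "unigene_3\tsp|P0003|ABC\t45.2\t120\t66\t0\t1\t360\t1\t120\t1e-05\t88.0\n"
)

if __name__ == "__main__":
    for query, subject, evalue in filter_hits(SAMPLE):
        print(query, subject, evalue)
```

In this sketch a hit exactly at the threshold is retained, matching the "less than or equal to" wording of the cut-off.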
cDNA synthesis and semi-quantitative PCR of selected terpene synthases
The RNA isolated from the floral tissue of C. guianensis was used to produce cDNA according to the SuperScript® III First-Strand Synthesis System (Invitrogen, USA) kit instructions. Semi-quantitative RT-PCR was used to verify the expression of the cloned genes (details can be found in SI Material and Methods) obtained from the study. The total RNA of the flower was extracted as mentioned earlier. After reverse transcription, semi-quantitative RT-PCR was carried out with GAPDH as the internal control reference gene. The semi-quantitative RT-PCR experiment on tissue was performed with three repetitions. The final PCR programme used for amplification of all three transcripts was: 1 cycle of 95 °C (5 min); 30 cycles of 95 °C (30 s), 56 °C (30 s) and 72 °C (2.5 min); and a final extension at 72 °C (5 min). The amplified fragments were resolved on a 1% agarose gel, visualized by staining with Gel Red dye (Sigma-Aldrich, USA), and imaged digitally. ImageJ was used for densitometry analysis of amplified PCR products in gel images. The band intensity of each cloned gene was normalized against that of the internal control, and the expression ratio of the three cloned genes in flower tissue was represented in arbitrary units along with the standard deviation.
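The normalization step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: it assumes that each replicate yields one band intensity for the target gene and one for the internal control, that the ratio is taken per replicate, and that the sample standard deviation is reported. The intensity values are invented.

```python
from statistics import mean, stdev


def normalized_expression(target_intensities, control_intensities):
    """Normalize target band intensities against the internal control.

    Each replicate's target intensity is divided by the control intensity
    from the same replicate (an assumption; the source does not state
    whether ratios were taken per replicate). Returns (mean, sample SD)
    of the ratios in arbitrary units.
    """
    if len(target_intensities) != len(control_intensities):
        raise ValueError("each target band needs a matching control band")
    ratios = [t / c for t, c in zip(target_intensities, control_intensities)]
    return mean(ratios), stdev(ratios)


if __name__ == "__main__":
    # Hypothetical densitometry readings for three replicates.
    target = [1200.0, 1100.0, 1300.0]
    control = [2000.0, 2000.0, 2000.0]
    avg, sd = normalized_expression(target, control)
    print(f"relative expression = {avg:.2f} +/- {sd:.2f} (arbitrary units)")
```

Taking the ratio within each replicate, rather than dividing the mean intensities, keeps gel-to-gel differences in loading or exposure from inflating the reported spread.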
Physicochemical characterization and phylogenetic analysis
Expasy's ProtParam server was used for the primary structure analysis of the three sequenced genes. Biophysical and biochemical properties such as isoelectric point (pI), molecular weight, aliphatic index, extinction coefficient and GRAVY were computed using this programme (Gasteiger et al., 2005). The nucleotide sequences of the three full-length ORFs of putative terpene synthases were subjected to blastx analysis. Conserved domain searches were performed using the Clustal Omega tool and the Conserved Domain Database (CDD) available at NCBI for the identification of conserved motifs. Further, matching reviewed terpene synthases were screened out from the UniProt database. All these sequences were subjected to phylogenetic analysis with NGPhylogeny.fr in a one-click workflow (Lemoine et al., 2019).
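Two of the ProtParam descriptors listed above have simple closed forms and can be sketched directly. The code below is an independent illustration, not the ProtParam implementation: GRAVY is taken as the mean Kyte-Doolittle hydropathy of all residues, and the aliphatic index as X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. The function names and the test peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = sequence.upper()

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    peptide = "MKVLAAGIVDE"  # hypothetical peptide, not from the study
    print(f"GRAVY = {gravy(peptide):.3f}")
    print(f"Aliphatic index = {aliphatic_index(peptide):.1f}")
```

A negative GRAVY indicates a protein that is hydrophilic on average, and a higher aliphatic index is generally read as a sign of greater thermostability.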
Protein 3D-structure and gene function prediction analysis
Homology protein modelling was carried out using SWISS-MODEL in automated mode, and validation of the protein models was carried out using PDBsum by evaluating the Ramachandran plot (Schwede et al., 2003; Laskowski et al., 2005). The PROCHECK programme was used to check the stereochemical quality and the overall structural geometry of the homology models at both 2D and 3D levels (Laskowski et al., 2018). The ProFunc web server tool was used to predict the biochemical function of homology-modelled C. guianensis terpene synthase proteins at the 3D structure level (Laskowski et al., 2005). All the protein structures were visualized using UCSF Chimera software (Pettersen et al., 2004).
Results
Metabolite profiling
The stem, leaf and flower tissues of C. guianensis showed inherent metabolite variety when metabolite profiling was done using GC-MS (Fig. 1 and Fig. S1). The metabolites in these plant tissues ranged from volatiles such as phenylpropanoids, benzenoids and terpenoids, through straight chain hydrocarbons, to non-volatiles such as high molecular weight terpenoids, steroids and straight chain hydrocarbons. Phenylpropanoid/benzenoid and terpenoid volatiles were found to be highly concentrated in flower tissue, accounting for 83 and 16%, respectively (Fig. 1). Although high in percentage, phenylpropanoid and benzenoid diversity was low, and only nine metabolites of the group were found in flower tissue (Table S1). Of the nine metabolites, eugenol (29.54%), isatin (21.96%) and phenylethyl alcohol (0.79%) were the main metabolites of the group in flower tissue.
Although low in percentage, terpene diversity was high, and 28 different terpenes were found in flower tissue (Table S1). Among the 28 terpenes, beta-linalool (a monoterpene) (2.22%), geranic acid (a monoterpene) (3.20%) and alpha-farnesene (a sesquiterpene) (0.49%) were major metabolites detected in the flower tissue. Further, monoterpenes and sesquiterpenes were the dominant types of terpenoids in flower tissue, with an overall monoterpene to sesquiterpene ratio of 4:1. A heat map comparing terpenoid content variation in flower, petal and stamen is also shown in Fig. 1. We found that petals had a high terpene content compared to stamen (Fig. 1, Table S1). These terpenoids and phenylpropanoids with benzenoids may contribute to the flower's scent as well as various biological activities.
The flower, petal, stamen, stem and leaf tissue of C. guianensis were screened for antimicrobial activity (as described in SI Material and Methods). Among them, flower tissue and its subparts, petal and stamen, showed the most potent anti-microbial activity against the bacterial cultures used in this study (Fig. S1). Flower, petal and stamen tissue extracts inhibited bacterial growth for bacterial strains Klebsiella pneumoniae, Salmonella typhi and Staphylococcus aureus at MIC ⩽ 0.0039 mg/ml and Pseudomonas aeruginosa at MIC 0.5 mg/ml, respectively. Evidently, the secondary metabolite profile containing highly diverse terpenes and high antibacterial activity of flower tissue made it stand out among other tissues. We focused on performing transcriptomics on diverse terpenoid-containing flower tissue to better understand the terpene production pathway.
Transcriptome generation and analysis
The transcriptome of any tissue reflects its biosynthetic machinery and could therefore reveal key genes involved in secondary metabolite production. For this purpose, good-quality RNA was isolated from flowers and used for transcriptome sequencing (Fig. S2). Sequencing generated 32.94 million high-quality reads, which were assembled de novo using Trinity software. Clustering of the assembled sequences resulted in 55,995 unique putative transcripts with an average length of 1208 bp and an N50 of 1808 bp (Table 1). Further, around 23,474 transcripts had lengths ranging from 1000 to 5000 bp, indicating the presence of functional proteins (Table 1).
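For readers unfamiliar with the N50 statistic reported above, the following sketch shows one common way to compute it from a list of transcript lengths: N50 is the length of the shortest transcript in the smallest set of longest transcripts that together cover at least half of the total assembled bases. The function name and the example lengths are hypothetical; this is not the assembler's own reporting code.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the summed length; the
    length at which this happens is the N50.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("N50 is undefined for an empty set of lengths")


if __name__ == "__main__":
    toy_lengths = [100, 200, 300, 400]  # hypothetical transcript lengths (bp)
    print("N50 =", n50(toy_lengths), "bp")
```

Because N50 weights long sequences more heavily than the arithmetic mean does, an assembly's N50 is typically larger than its average transcript length, as is the case for the values in Table 1.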
Using BLAST version 2.5.0, a homology search was conducted against the Viridiplantae dataset from the Uniprot database, which contains 4,269,328 protein sequences, to annotate transcripts. At least 64.73% of the transcripts were functionally annotated with high confidence (E-value ⩽ 10⁻⁵). Couroupita unigenes were functionally classified into different Gene Ontology (GO) terms (Fig. 2). Classification showed that 14.38% of the annotated genes were involved in biological processes, 42.54% in cellular components and 43.08% in molecular function (Fig. 2). Within the biological process category, regulation of transcription (20.79%) and transcription (18.6%) were the two dominant GO terms, followed by terms such as metabolic processes, defence responses, translation, protein folding and transmembrane transport. The presence of defence responses (4.6%) among the GO terms suggests that Couroupita flowers are probably an active tissue for secondary metabolism. Under the category of cellular components, most annotated genes fell into the GO terms integral components of the membrane, nucleus and cytoplasm (53.54, 17.51 and 7.9%, respectively). In the category of molecular functions, ATP binding, zinc binding, nucleic acid DNA binding and metal ion binding were the principal GO terms, comprising 26.69, 15.2, 11.51 and 10.5% of annotated genes, respectively.
Pathway analysis was done using the KAAS server. Different plants, namely, Arabidopsis thaliana (thale cress), Arabidopsis lyrata (lyrate rockcress), Brassica napus (rapeseed), Brassica rapa (field mustard), Capsella rubella, Eutrema salsugineum, Fragaria vesca (woodland strawberry), Theobroma cacao (cacao) and Vitis vinifera (wine grape), were taken as reference organisms. Then, KO_ID assignment of transcripts using KEGG pathway analysis was carried out to identify genes of different secondary metabolite pathways (Figs. S4–S6). We focused on screening genes involved in the terpenoid biosynthetic pathway. During the process, 45 KO_IDs related to the terpenoid pathway were assigned to a total of 67 transcripts. For the terpenoid pathway, KEGG pathway analysis indicated the presence of several terpene synthases: monoterpene pathway-related transcripts, namely, terpineol synthase, linalool synthase, ocimene synthase and myrcene synthase; sesquiterpene pathway-related genes, namely, germacrene D synthase and farnesene synthase; and diterpenoid pathway genes, namely, geranyllinalool synthase and ent-kaurene synthase. In addition, phenylpropanoid pathway transcripts were also mined by KEGG pathway analysis. Relevant information can be found in the Supplementary material.
Virtual Ribosome, a web-based server, was also used for finding the Open Reading Frames (ORFs) of transcripts. It was used to convert a total of 55,995 clustered transcript sequences into 48,320 peptide sequences, which were then subjected to Pfam analysis (Table S2). Of these, 43,483 peptides had lengths between 100 and 500 amino acids, which is the ideal range for proteins actively involved in cellular processes. The 48,320 submitted peptides yielded 40,745 predicted proteins belonging to different protein domains and families. Proteins belonging to the terpene synthase family were screened out by searching for two essential domains: PF01397 (the N-terminal domain) and PF03936 (the C-terminal or metal binding domain). A total of 24 transcripts contained these conserved domains. Among these, eight transcripts had both domains but were missing a few bases towards the N- and C-terminal ends; 13 transcripts were missing the N-terminal end; and three transcripts were missing the C-terminal end. Further, blastx studies of these 24 transcript sequences were carried out for homology-based annotation of putative gene function. Monoterpene pathway-related genes, namely alpha-terpineol synthase, linalool synthase, beta-ocimene synthase and myrcene synthase, were identified. Sesquiterpene pathway-related genes, i.e. germacrene D synthase and farnesene synthase, were identified. Diterpenoid pathway genes, namely geranyl linalool synthase and ent-kaurene synthase, were also identified. These results are in agreement with the terpenoid metabolite profiling of flower tissues. Among the transcripts, we were able to clone and sequence three candidate full-length terpene synthase genes (Table 2).
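The domain-based screening described above can be sketched as a simple grouping step. The example assumes that the Pfam results have already been reduced to (transcript identifier, Pfam accession) pairs; that input layout, the function name and the transcript identifiers are hypothetical, whereas the two accessions are those named in the text.

```python
from collections import defaultdict

N_TERMINAL_DOMAIN = "PF01397"  # terpene synthase N-terminal domain
C_TERMINAL_DOMAIN = "PF03936"  # terpene synthase C-terminal (metal binding) domain


def classify_tps_candidates(domain_hits):
    """Group transcripts by which terpene synthase domains they carry.

    domain_hits: iterable of (transcript_id, pfam_accession) pairs.
    Returns a dict with keys 'both', 'n_only' and 'c_only', each holding
    a sorted list of transcript identifiers. Transcripts with neither
    domain are ignored.
    """
    domains = defaultdict(set)
    for transcript_id, accession in domain_hits:
        # Pfam accessions may carry a version suffix such as "PF01397.24".
        domains[transcript_id].add(accession.split(".")[0])

    groups = {"both": [], "n_only": [], "c_only": []}
    for transcript_id in sorted(domains):
        has_n = N_TERMINAL_DOMAIN in domains[transcript_id]
        has_c = C_TERMINAL_DOMAIN in domains[transcript_id]
        if has_n and has_c:
            groups["both"].append(transcript_id)
        elif has_n:
            groups["n_only"].append(transcript_id)
        elif has_c:
            groups["c_only"].append(transcript_id)
    return groups


if __name__ == "__main__":
    # Hypothetical Pfam hits; identifiers are invented for illustration.
    hits = [
        ("tx_001", "PF01397.24"), ("tx_001", "PF03936.19"),
        ("tx_002", "PF03936"),
        ("tx_003", "PF01397"),
        ("tx_004", "PF00067"),
    ]
    print(classify_tps_candidates(hits))
```

Candidates in the 'both' group are the natural starting point for full-length cloning, while the single-domain groups correspond to transcripts truncated at one end.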
Primary structure analysis of full-length ORF of three terpene synthases
Three candidate terpene synthase genes with full-length ORFs, A_c43359_g3_i1, A_c38347_g2_i1 and A_c45679_g1_i3, were each successfully cloned in pET-28a as well as pET-32a expression vectors and validated through sequencing and restriction digestion studies (Fig. S2). Further, the full-length ORFs of the three sequenced genes were subjected to blastx analysis. The results indicated that A_c38347_g2_i1 is a putative α-farnesene synthase (sesquiterpene synthase), henceforth termed Cg_Fs; A_c43359_g3_i1 is a putative β-ocimene synthase (monoterpene synthase), henceforth termed Cg_Os; and A_c45679_g1_i3 is a putative ent-kaurene synthase (diterpene synthase), henceforth termed Cg_Ks.
Physicochemical properties play an important role in determining protein functions. Expasy's ProtParam tool was used for computing the physicochemical properties of the proteins encoded by all three genes. The molecular weights of the terpene synthase proteins encoded by the cloned genes fell in the range of 63.0–90.0 kDa, and their pI values fell in the range of 5.4–6.4. The three terpene synthase protein sequences showed high values of the aliphatic index (86.2–96.0) and low values of Grand Average Hydropathy (GRAVY) (−0.186 to −0.284). The computed instability index for the three protein sequences was >40, a value that ProtParam interprets as indicating a potentially unstable protein. The estimated half-lives of these three proteins in different cell systems were predicted to be 30 h (mammalian reticulocytes, in vitro), >20 h (yeast, in vivo) and >10 h (E. coli, in vivo).
Multiple sequence alignment of the amino acid sequences of the cloned terpene synthases Cg_Fs, Cg_Os and Cg_Ks with those of functionally characterized terpene synthases from the UniProt database was carried out using CLUSTALW. The results revealed the presence of two highly conserved motifs, DDxxD and NSE/DTE, in all three candidate genes (Fig. S3). Further, two more motifs, i.e. SAYDTAW and QxxDGSW, were also found in Cg_Ks, the putative ent-kaurene synthase of C. guianensis (Fig. S3).
A phylogenetic tree of these three genes and similar proteins from different plant species was constructed using NGPhylogeny.fr to investigate their evolutionary relationships. Cg_Fs grouped with other farnesene synthase genes, and Cg_Os grouped with ocimene synthases. Both sequences belonged to the TPS-b family group and shared a common ancestor. Cg_Ks grouped with other ent-kaurene synthase sequences in the TPS-e/f family group. TPS-b and TPS-e/f share a common evolutionary origin (Fig. 3).
Gene expression analysis in C. guianensis tissue
The three terpene synthase genes, i.e. Cg_Os (putative monoterpene tricyclene/β-ocimene synthase), Cg_Fs (putative sesquiterpene α-farnesene synthase) and Cg_Ks (putative diterpene ent-kaurene synthase), were analysed for their expression in flower tissue. Semi-quantitative RT-PCR analysis verified that all three cloned terpene synthases, which have the biosynthetic potential to produce terpenes, are expressed in flower tissue. Among the three genes, Cg_Os showed the highest expression. In comparison, Cg_Fs had an expression level half that of Cg_Os, and Cg_Ks showed the lowest expression, around 18-fold less than that of Cg_Os (Fig. 4).
Protein structure-based function prediction
After cloning the terpene synthase genes, we generated 3D protein structures to predict protein function and conduct in silico structural studies. Homology-based protein models of the three terpene synthases were constructed and validated (Fig. 4). Based on the best fit, the crystal structure of limonene synthase from Citrus sinensis (PDB ID: 5uv0.1.A) was used as a template for modelling the Cg_Fs and Cg_Os terpene synthases. The crystal structure of abietadiene synthase from Abies grandis (PDB ID: 3s9v.1.A) was used as a template for modelling the Cg_Ks terpene synthase. The models generated were validated using PDBsum, which generated a Ramachandran plot and evaluated all its constraints (Fig. 4). For Cg_Fs, Cg_Os and Cg_Ks, respectively, 92.8, 93.3 and 90.4% of residues were observed in the favoured regions, whereas 6.6, 6.1 and 8.9% of residues were observed in the allowed regions. Protein models are deemed to be of good quality when 90% or more of the residues fall in the favoured regions of the Ramachandran plot, a criterion that all three models met. The G-factor was within the optimal range for high-quality protein models, i.e. between −1.0 and 0.1.
In the homology models of the putative Cg_Os and Cg_Fs proteins constructed in the study, αβ domains with a DDxxD motif in the α domain can be seen, which is a characteristic feature of proteins of the type I terpene synthase (TPS) gene family. In the homology model of Cg_Ks, all three domains (α, β and γ) can be seen. The DDxxD motif was found in the α domain, whereas the DxDD motif was absent.
The validated homology models of the terpene synthases were further analysed by the ProFunc web server for protein function prediction from the 3D structure. The results of the homology search are summarized in Table S2. The GO terms predicted by ProFunc for the three terpene synthases indicate that all three proteins have metal-binding capacity and take part in cellular metabolic processes. An ‘enzyme active site template’-based homology search of the 3D structure by ProFunc revealed that the Cg_Fs and Cg_Os terpene synthases had high similarity with a sesquiterpene synthase, namely, 5-epi-aristolochene synthase from Nicotiana tabacum. Further, as shown in Table S2, a reverse template 3D structure-based search by ProFunc indicated that Cg_Fs and Cg_Os had high similarity with a hemiterpene synthase. An ‘enzyme active site template’-based homology search by ProFunc for Cg_Ks predicted it to have pentalene synthase activity. A reverse template 3D structure-based ProFunc search revealed Cg_Ks to have high similarity with ent-copalyl diphosphate synthase (diterpene synthase) from A. thaliana.
Discussion
Metabolite profiling
Metabolite profiling of different plant parts showed that flower tissue contained the highest terpene and phenolic content. The terpene volatiles linalool, ocimene and farnesene, as well as the phenylpropanoid/benzenoid volatiles eugenol, isatin and phenylethyl alcohol, were the most abundant metabolites in flower tissue. These volatiles could contribute to the fragrance as well as the different biological activities of the flower. Terpenoids and phenylpropanoid/benzenoid compounds have been found as major constituents of the flower tissue of many different plants (Knudsen et al., 2006; Dhandapani et al., 2021). Further, many reports have also substantiated the terpene linalool as a ubiquitous floral volatile. It is implicated in diverse functions, from a toxin involved in plant defence to long distance pollinator attraction (Raguso, 2016). In agreement, the C. guianensis flower showed the presence of phenylpropanoid and terpenoid groups as major volatile constituents.
It is specifically the flower of the plant that is used in sacred ceremonies in Indian and Asian cultures, and not other parts such as the stem and leaf. To comprehend this traditional wisdom, we decided to assess the anti-microbial activity of flower, petal, stamen, stem and leaf tissues of C. guianensis in order to acquire a sense of their comparative bioactive potential (Mann and Markham, 1998; Wiegand et al., 2008). In our study, the flower showed higher bioactive potential compared to the stem and leaf. Previously, many studies have reported the antimicrobial activity of C. guianensis plant extracts against many Gram-positive and Gram-negative bacteria. In one such study, methanol extracts of leaves, flowers, fruit, stem and roots of the plant inhibited the growth of the microorganisms (Khan et al., 2003). In another study, chloroform extracts from flowers also showed antimicrobial activity (Al-Dhabi et al., 2012). However, no metabolite profiling was reported for any of these tissues. Our analysis of metabolite profiles identified the main volatile components in the C. guianensis flower. These metabolites have previously been found to have potent antimicrobial properties (Pauli and Kubeczka, 2010; Chouhan et al., 2017; Caulier et al., 2019; Khameneh et al., 2019).
Accordingly, these metabolites could be connected to the bioactive potential of the flower, which could result in a variety of documented pharmacological effects (Sanz-Biset et al., 2009; Al-Dhabi et al., 2012; Shekhawat and Manokari, 2016; Kaneria et al., 2017).
Transcriptomic analysis
A great diversity of volatile terpenes was identified in flower tissue, whereas phenylpropanoids/benzenoids were shown to be less diverse. Thus, we focused on the transcriptome profiling of the terpenoid pathway in flower tissue. The transcriptome of the C. guianensis flower revealed the occurrence of many terpene pathway-related genes, which strongly correlated with the terpenoid profile of C. guianensis flower tissues in the study. In plants, the biosynthesis of terpenoids arises from the methylerythritol 4-phosphate (MEP) pathway in plastids and/or the mevalonate (MVA) pathway in the cytosol. The first committed step is the condensation of isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP) into geranyl diphosphate (GPP, C10), farnesyl diphosphate (FPP, C15) and geranylgeranyl diphosphate (GGPP, C20), which are precursors for the production of mono-, sesqui- and diterpenes, respectively. The final cyclization and oxidation steps are then carried out by the terpene synthases (TPS) and cytochrome P450s (CYP450) to generate diverse terpene structures (Srivastava et al., 2015).
The transcripts of terpenoid pathway enzymes were found to be expressed in the C. guianensis flower. These transcripts were monoterpene pathway-related transcripts, namely terpineol synthase, linalool synthase, ocimene synthase and myrcene synthase; sesquiterpene pathway-related genes, namely germacrene D synthase and farnesene synthase; and diterpenoid pathway-related genes namely geranyl linalool synthase and ent-kaurene synthase. These transcripts may be involved in the production of terpenoids confirmed in the flower tissue.
Thus, we created a transcriptomic resource for C. guianensis in this study, enabling us to mine numerous nucleotide and protein sequences implicated in the biosynthesis of terpenoids in flower tissue.
Primary structure analysis of full-length ORF of three terpene synthases
After screening for potential terpene synthase gene candidates that may be part of the terpene biosynthetic pathway, we carried out a detailed analysis of three candidates: Cg_Os (putative monoterpene tricyclene/beta-ocimene synthase), Cg_Fs (putative sesquiterpene alpha-farnesene synthase) and Cg_Ks (putative diterpene ent-kaurene synthase). These serve as entry point enzymes for several terpenoid biosynthesis routes.
Multiple sequence alignments of the Cg_Fs, Cg_Os and Cg_Ks genes with known terpene synthase genes revealed two highly conserved terpene synthase motifs, DDxxD and NSE/DTE, in all three sequences. The DDxxD motif is involved in coordinating divalent metal ions (Mg2+) for substrate binding. The NSE/DTE motif, reported with the consensus sequence (L, V) (V, L, A) (N, D) D (L, I, V) x (S, T) x x x (E), forms a second divalent cation (Mg2+) binding site in terpenoid synthases (Bohlmann et al., Reference Bohlmann J, Meyer-Gauen and Croteau1998; Gao et al., Reference Gao, Honzatko and Peters2012). Further, two more motifs, SAYDTAW and QxxDGSW, were found in the putative ent-kaurene synthase of C. guianensis. These motifs are highly conserved among ent-kaurene synthase proteins (Kim et al., Reference Kim, Han, Lim and Choi2009; Alquézar et al., Reference Alquézar, Rodríguez, de la Pena and Pena2017). Their functional role in ent-kaurene synthases remains elusive, although QxxDGSW motifs in a bacterial squalene–hopene cyclase are involved in stabilizing the whole protein (Wendt et al., Reference Wendt, Poralla and Schulz1997). Another important motif is DxDD, which mediates the initial protonation of the substrate in coordination with divalent cations (Mg2+) (Zhou and Pichersky, Reference Zhou and Pichersky2020). Multiple sequence alignments revealed that the Cg_Ks transcript does not possess a conserved DxDD motif; thus, it can be annotated as a monofunctional ent-kaurene synthase.
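As a rough illustration of how such motifs can be located in a protein sequence, the sketch below scans a sequence with regular expressions derived from the motif definitions given above. It is not the alignment-based procedure used in the study. The function name and the toy sequence are hypothetical, and a real analysis would still rely on alignment context, because short patterns such as DxDD can occur by chance.

```python
import re

# Motif patterns written as regular expressions; "x" (any residue) becomes ".".
# The NSE/DTE pattern follows the consensus quoted in the text:
# (L,V)(V,L,A)(N,D)D(L,I,V)x(S,T)xxxE
MOTIFS = {
    "DDxxD": r"DD..D",
    "NSE/DTE": r"[LV][VLA][ND]D[LIV].[ST]...E",
    "DxDD": r"D.DD",
    "SAYDTAW": r"SAYDTAW",
    "QxxDGSW": r"Q..DGSW",
}


def find_motifs(seq: str) -> dict:
    """Return {motif name: [(1-based start, matched residues), ...]}."""
    seq = seq.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead with a capture group reports overlapping matches too.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in regex.finditer(seq)]
    return hits


# Hypothetical toy sequence, NOT a real Cg_Os/Cg_Fs/Cg_Ks sequence.
toy = "MSAYDTAWKKQLLDGSWAADDIYDGGLVNDLMSHKREKK"
for motif, found in find_motifs(toy).items():
    print(motif, found if found else "not found")
```

A sequence in which `DDxxD` is found but `DxDD` is not would be consistent with the monofunctional annotation discussed above.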
Physicochemical properties play an important role in determining protein function. Expasy's ProtParam tool was used to predict the molecular weight, pI, aliphatic index and GRAVY score of the three cloned sequences. General terpene synthase structure and function have been reviewed (Tholl, Reference Tholl2006; Rafiqi et al., Reference Rafiqi, Gul, Saifi, Nasrullah, Ahmad, Dash and Abdin2019). In general, terpene synthase cDNAs encode proteins of 550–850 aa, corresponding to molecular masses of 50–100 kDa. The pI is the pH at which a protein carries zero net charge; terpene synthases carry zero net charge at pH 5–6. The aliphatic index is an indicator of protein thermostability, and the values for the three terpene synthases suggested thermostability over a wide temperature range. Our calculated parameters for the cloned genes encoding terpene synthase proteins are in agreement with consensus values for terpene synthases (Tholl, Reference Tholl2006). The GRAVY value of a protein is a measure of its interaction with water. The low GRAVY values of these three terpene synthases indicate the possibility of better interaction with water. The instability index evaluates the stability of a protein in vitro. Our three terpene synthase sequences were predicted to be highly unstable in vitro. The estimated half-lives of the three proteins in different cell systems indicated that their expression was stable for many hours. Such information on the physicochemical properties of predicted proteins is useful when utilising and characterising proteins in bioinformatics, biochemistry and biotechnology.
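To make two of these descriptors concrete, the sketch below computes GRAVY and the aliphatic index from a sequence using their commonly cited definitions: GRAVY is the mean Kyte-Doolittle hydropathy per residue, and the aliphatic index is X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)], with X in mole percent. It is a minimal re-implementation for illustration and is not the ProtParam code itself. The function names and example peptide are assumptions.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Hypothetical peptide used only to exercise the functions.
peptide = "MSAYDTAWKKQLLDGSW"
print(f"GRAVY = {gravy(peptide):.3f}")
print(f"Aliphatic index = {aliphatic_index(peptide):.2f}")
```

A negative GRAVY value indicates an overall hydrophilic protein, which is the sense in which low values are read above as favouring interaction with water.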
Phylogenetic analysis helps represent how molecular function evolved for a particular set of proteins and is thus often used for function prediction supported by evolutionary principles (Eisen, Reference Eisen1998). Phylogenetic tree analysis gave insight into the evolutionary history of the three cloned terpene synthases relative to previously known terpene synthases. Cg_Fs grouped with other farnesene synthase genes, and Cg_Os grouped with ocimene synthases. Both sequences belonged to the TPS-b family group and shared a common ancestor. Cg_Ks grouped with other ent-kaurene synthase sequences in the TPS-e/f family group. TPS-b and TPS-e/f share a common evolutionary origin. The phylogenetic results were in agreement with the Blastx and multiple sequence alignment results.
Gene expression analysis in C. guianensis tissue
The three terpene synthase genes, i.e. Cg_Os (putative monoterpene tricyclene/beta-ocimene synthase), Cg_Fs (putative sesquiterpene alpha-farnesene synthase) and Cg_Ks (putative diterpene ent-kaurene synthase), were examined for their expression in flower tissue using semi-quantitative PCR. Among the three terpene synthase genes, Cg_Os showed the highest expression in flower tissues. In comparison, Cg_Fs had an expression level half that of Cg_Os in flower tissues. The expression of Cg_Ks was the lowest in flower tissues, almost 18-fold lower than that of Cg_Os.
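The fold differences quoted above are simple ratios of normalised signal. The sketch below shows one way such ratios could be derived from gel band intensities. The intensity values, the reference-gene normalisation and all names are hypothetical, chosen only so that the ratios mirror those reported; they are not data or methods taken from this study.

```python
def relative_expression(target_intensity: float, reference_intensity: float) -> float:
    """Normalise a target band intensity to a reference (housekeeping) band."""
    return target_intensity / reference_intensity


# Hypothetical densitometry readings (arbitrary units).
band_intensity = {"Cg_Os": 1800.0, "Cg_Fs": 900.0, "Cg_Ks": 100.0}
reference_intensity = 1000.0  # hypothetical reference-gene band

normalised = {
    gene: relative_expression(value, reference_intensity)
    for gene, value in band_intensity.items()
}

# Fold difference of each gene relative to the most highly expressed gene.
fold_lower_than_os = {
    gene: normalised["Cg_Os"] / value for gene, value in normalised.items()
}

for gene, fold in fold_lower_than_os.items():
    print(f"{gene}: {fold:.1f}-fold lower than Cg_Os")
```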
Metabolite profiling showed that ocimene and α-farnesene were present in flower tissue at 0.16% and 0.4%, respectively. It did not confirm the presence of the kaurene metabolite. Many studies have suggested that, for some metabolites, high gene expression does not necessarily translate to high metabolite content (Iijima et al., Reference Iijima, Davidovich-Rikanati, Fridman, Gang, Bar, Lewinsohn and Pichersky2004; Redestig and Costa, Reference Redestig and Costa2011). This could partly be due to transcriptional or post-translational regulatory factors limiting enzyme activity and, therefore, metabolite biosynthesis at the levels determined. Finally, semi-quantitative RT-PCR analysis verified the expression of the cloned terpene synthases Cg_Os, Cg_Fs and Cg_Ks in terpene-producing flowers of C. guianensis.
Protein structure-based function prediction
Proteins are linear chains of amino acids that fold into exceedingly complex three-dimensional structures, depending on the sequence and the physical interactions within the chain. The structure, in turn, determines the biological function of a protein as well as its interactions. Homology-based protein models of the three terpene synthases were constructed and validated. The Ramachandran plot scores suggested that the refined models were of good quality (Greener et al., Reference Greener, Filippis and Sternberg2017). The G-factor provides a measure of how ‘normal’ or ‘unusual’ a given stereochemical property, e.g. bond geometry, is in a protein structure. Many residues with low G-factors indicate a less stereochemically valid structure. Ideally, G values should be above −0.5 (Rising et al., Reference Rising, Crenshaw, Koo, Subramanian, Chehade, Starks, Allen, res, Spielmann and Noel2020). For the three predicted models, the G-factor values indicated satisfactory geometry.
Generally, plant terpene synthases (TPS) have one of two domain architectures, αβ or αβγ (Zhou and Pichersky, Reference Zhou and Pichersky2020). Read from the N-terminus to the C-terminus, the domains occur in the order γ, β and α. Type I TPSs have the conserved DDxxD motif in the α domain, while type II TPSs have the conserved DxDD motif in the β domain. A recent review by Zhou and Pichersky (Reference Zhou and Pichersky2020) provides a detailed account of the 3D structure of proteins in the terpene synthase gene family. Based on the homology models of the putative Cg_Os, Cg_Fs and Cg_Ks proteins, it can be predicted that they are type I terpene synthases.
The validated homology models of the terpene synthases were further analysed with ProFunc, a web server that predicts the likely function of a protein from its 3D structure, here the predicted homology models. ProFunc makes use of protein sequence alignment, conserved motif features, and comparisons of enzyme active sites and ligand-binding sites against the 3D structures of known proteins, among other evidence, to functionally characterize proteins. All three predicted protein structures, Cg_Os, Cg_Fs and Cg_Ks, were predicted to have metal-binding capacity and to take part in cellular metabolic processes. The ‘Protein 3D structure enzyme active site template’-based homology search compares the model against manually curated residues in the PDB that are known from the literature to be catalytic. This search gave a strong prediction of sesquiterpene synthase capability for both Cg_Fs and Cg_Os. The reverse 3D structure template-based search uses hundreds of small residue reverse templates generated by breaking down the target structure; these are then scanned against a representative set of structures in the PDB. The approach tends to match functionally important sites. This search predicted hemiterpene synthase activity for both Cg_Fs and Cg_Os.
Recently, many studies have highlighted biochemical reaction similarities between isoprene synthase (hemiterpene synthase) and farnesene synthase (sesquiterpene synthase) (Koksal et al., Reference Koksal, Zimmer, Schnitzler and Christianson2010). A study involving poplar isoprene synthase expression revealed that the chemistry of the elimination step yielding isoprene is identical to that yielding farnesene from farnesyl diphosphate (Pazouki and Niinemets, Reference Pazouki and Niinemets2016). An earlier study reported that isoprene synthase and β-ocimene synthase formed a monophyletic group within the TPS-b clade of terpene synthases (Sharkey et al., Reference Sharkey, Gray, Pell, Breneman and Topper2013). In agreement, we also found Cg_Os and several isoprene synthases in the TPS-b group in our phylogenetic analysis. The chemistry of isoprene synthase and ocimene synthase is reported to be similar and likely affects the phylogenetic relationships among TPS-b enzymes (Koksal et al., Reference Koksal, Zimmer, Schnitzler and Christianson2010; Faraldos et al., Reference Faraldos, Gonzalez, Li, Yu, Koksal, Christianson and Allemann2012).
The ‘Protein 3D structure enzyme active site template’-based homology search for Cg_Ks predicted pentalene synthase activity. The enzyme pentalene synthase catalyses the cyclization of farnesyl diphosphate into pentalene, a tricyclic sesquiterpene that is the hydrocarbon precursor of the pentalenolactone family of antibiotics (Irmisch et al., Reference Irmisch, Muller, Schmidt, Gunther, Gershenzon and Kullner2015). A study combining detailed bioinformatics with analysis of the crystallized 3D structure of the bacterial Bradyrhizobium japonicum kaurene synthase found that the protein structure had high homology with epi-aristolochene synthase from the plant N. tabacum, 1,8-cineole synthase from the plant Salvia fruticosa and pentalene synthase from the bacterium Streptomyces (Liu et al., Reference Liu, Feng, Zheng, Huang, Nakano, Hoshino, Bogue, Ko, Chen and Cui2015). The homology analysis revealed the DDxxD motif and the ND(x)6(D/E) sequence to be conserved in the active sites of all of them (Liu et al., Reference Liu, Feng, Zheng, Huang, Nakano, Hoshino, Bogue, Ko, Chen and Cui2015). The crystal structure of this pentalene synthase revealed that the active site lies within an α-barrel, which is proposed as a minimal terpenoid synthase fold preserved in the α domain of a majority of terpenoid synthases (Lesburg et al., Reference Lesburg, Zhai, Cane and Christianson1997). A reverse template-based ProFunc search revealed that Cg_Ks has high similarity with ent-copalyl diphosphate synthase (a diterpene synthase) from A. thaliana. The biosynthesis of diterpenoids starts with the conversion of GGPP into ent-copalyl diphosphate, catalysed by a type II enzyme, ent-copalyl diphosphate synthase. Subsequently, a type I enzyme, ent-kaurene synthase, converts ent-copalyl diphosphate to ent-kaurene (Zhou et al., Reference Zhou, Xu, Tiernan, Xie, Toyomasu, Sugawara, Oku, Usui, Mitsuhashi and Chono2012). Type II terpene synthase enzymes are characterized by a highly conserved DxDD motif.
Type I diterpene synthases possess characteristic DDxxD and NSE/DTE motifs (Cho et al., Reference Cho, Okada, Kenmoku, Otomo, Toyomasu, Mitsuhashi, Sassa, Yajima, Yabuta and Mori2004). Bifunctional copalyl diphosphate and kaurene synthases, containing both DxDD and DDxxD motifs, also occur in nature. Multiple sequence alignment confirmed that Cg_Ks does not possess the DxDD motif; thus, Cg_Ks cannot be an ent-copalyl diphosphate synthase and is most likely a monofunctional ent-kaurene synthase of the type I terpene synthase class.
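The decision rule used in this argument (DDxxD only: type I; DxDD only: type II; both: bifunctional) can be written out as a small sketch. It is a deliberate simplification that looks only at motif presence anywhere in the sequence, whereas the annotation in the text rests on aligned, conserved positions. The function name and test sequences are hypothetical.

```python
import re


def classify_by_motifs(seq: str) -> str:
    """Classify a terpene synthase sequence by presence of DDxxD and DxDD.

    Simplified rule of thumb: motif position and conservation in an
    alignment are ignored here, so chance matches are possible.
    """
    seq = seq.upper()
    has_ddxxd = re.search(r"DD..D", seq) is not None  # type I signature
    has_dxdd = re.search(r"D.DD", seq) is not None    # type II signature

    if has_ddxxd and has_dxdd:
        return "bifunctional (type I and type II motifs)"
    if has_ddxxd:
        return "type I"
    if has_dxdd:
        return "type II"
    return "unclassified"


# Hypothetical toy fragments, not real enzyme sequences.
examples = {
    "DDxxD only": "AAADDIYDAAA",
    "DxDD only": "AAADIDDAAA",
    "both motifs": "AAADIDDAAADDIYDAAA",
    "neither": "AAAAGGGG",
}
for label, fragment in examples.items():
    print(f"{label}: {classify_by_motifs(fragment)}")
```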
Taking into account the results of both the Blastx analysis and the ProFunc analysis, the putative C. guianensis Cg_Os, Cg_Fs and Cg_Ks terpene synthases may exhibit a diverse range of catalytic properties. Earlier studies on several plant TPS genes revealed remarkable plasticity in terpenoid biosynthesis in higher plants (Yang et al., Reference Yang, Wang, Kimani, Li, Bao, Ning, Li, Liu, Wang and Gao2022). There is growing evidence that many TPSs are multi-substrate enzymes capable of producing terpenes of different chain lengths depending on substrate availability, i.e. TPSs can form monoterpenes with GPP as the substrate and sesquiterpenes with FPP as the substrate (Gao et al., Reference Gao, Honzatko and Peters2012). Therefore, accurately predicting the enzymatic products of terpene synthases solely from protein similarity is often difficult. However, structural studies do offer insight into the possible range of catalytic activities that may exist in terpene synthases. In our case, we can predict Cg_Fs to have isoprene or sesquiterpene (farnesene) synthase-like catalytic activity, Cg_Os to have isoprene, monoterpene (ocimene) or sesquiterpene synthase-like catalytic activity, and Cg_Ks to have pentalene or ent-kaurene synthase-like activity.
To better understand the traditional wisdom behind the revered status of C. guianensis flowers, the entire flower, petals, stamen, stem and leaf of C. guianensis were first metabolically profiled, and these tissues were also tested for antibacterial activity. The findings made it evident that flower tissue stood out from other tissues, such as stem and leaf, owing to its diverse terpenoid repertoire and strong antibacterial activity. Encouraged by these findings, we concentrated on a thorough examination of floral tissue and constructed a flower transcriptome by RNA sequencing to reveal terpene metabolite pathway genes in the flower.
Finally, three full-length terpene synthase gene candidates representing a putative monoterpene synthase, a putative diterpene synthase and a putative sesquiterpene synthase were cloned and sequenced. These candidates are entry point enzymes for several terpenoid biosynthesis routes. The transcript expression of three cloned terpene synthase genes was also verified in flower tissue. Furthermore, we used a variety of fast and accessible bioinformatics methods for rapid terpene synthase gene function prediction. With the use of these three gene sequences, we were able to predict protein function at the level of the 3D structure and conduct in silico structural investigations to better understand the range of terpene synthesising catalytic capabilities.
To the best of our knowledge, C. guianensis is an underexplored medicinal plant in terms of transcriptomics for secondary metabolite biosynthetic pathway studies. Our study was carried out with an exploratory perspective to characterize the previously unstudied C. guianensis in detail at the metabolite and transcriptome levels. We have generated a transcriptomic resource for the plant to uncover gene sequences involved in terpene production in flower tissue. The study can pave the way for translational work in protein engineering and metabolic engineering, where potential terpene synthase genes can be functionally validated and heterologous production of C. guianensis terpenes can be attempted in industrially friendly host systems in the future.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1479262123000953.
Availability of data
Raw reads generated from RNA sequencing of C. guianensis Aubl flower tissue were deposited at NCBI's SRA database with accession number PRJNA715623.
Acknowledgements
S. J. K. would like to thank ICMR for a fellowship. P. S. would like to thank CSIR for a Research Associate Fellowship (31/11(953)/2017-EMRI). We would also like to thank CSIR-National Chemical Laboratory for funding the research work through projects CSC0130 and CSC0106. We are grateful to the Director, CSIR-National Chemical Laboratory, India, for infrastructure and research facilities. The authors declare no conflict of interest.
Author contributions
S. J. K. and P. S. performed all major experiments, data analysis and manuscript writing. S. J. K. handled transcriptomics, gene cloning and gene expression analysis. P. S. carried out RNA isolation, transcriptomics, metabolite profiling and protein structural bioinformatics. A. K. and A. P. helped with transcriptomics along with S. J. K. S. R. helped with RNA isolation and antimicrobial activity assays along with P. S. The work was planned, supervised and critically analysed by H. V. T.