
Insights from genomes and genetic epidemiology of SARS-CoV-2 isolates from the state of Andhra Pradesh

Published online by Cambridge University Press:  03 August 2021

Pallavali Roja Rani
Affiliation:
Kurnool Medical College, Kurnool, Andhra Pradesh 518002, India
Mohamed Imran
Affiliation:
CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi 110025, India Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
J. Vijaya Lakshmi
Affiliation:
Kurnool Medical College, Kurnool, Andhra Pradesh 518002, India
Bani Jolly
Affiliation:
CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi 110025, India Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
S. Afsar
Affiliation:
Kurnool Medical College, Kurnool, Andhra Pradesh 518002, India
Abhinav Jain
Affiliation:
CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi 110025, India Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
Mohit Kumar Divakar
Affiliation:
CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi 110025, India Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
Panyam Suresh
Affiliation:
Kurnool Medical College, Kurnool, Andhra Pradesh 518002, India
Disha Sharma
Affiliation:
CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi 110025, India
Nambi Rajesh
Affiliation:
Kurnool Medical College, Kurnool, Andhra Pradesh 518002, India
Rahul C. Bhoyar
Affiliation:
CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi 110025, India
Dasari Ankaiah
Affiliation:
Kurnool Medical College, Kurnool, Andhra Pradesh 518002, India
Sanaga Shanthi Kumari
Affiliation:
Kurnool Medical College, Kurnool, Andhra Pradesh 518002, India
Gyan Ranjan
Affiliation:
CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi 110025, India Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
Valluri Anitha Lavanya
Affiliation:
Kurnool Medical College, Kurnool, Andhra Pradesh 518002, India
Mercy Rophina
Affiliation:
CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi 110025, India Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
S. Umadevi
Affiliation:
Kurnool Medical College, Kurnool, Andhra Pradesh 518002, India
Paras Sehgal
Affiliation:
CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi 110025, India Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
Avula Renuka Devi
Affiliation:
Kurnool Medical College, Kurnool, Andhra Pradesh 518002, India
A. Surekha
Affiliation:
Kurnool Medical College, Kurnool, Andhra Pradesh 518002, India
Pulala Chandra Sekhar
Affiliation:
Kurnool Medical College, Kurnool, Andhra Pradesh 518002, India
Rajamadugu Hymavathy
Affiliation:
Kurnool Medical College, Kurnool, Andhra Pradesh 518002, India
P.R. Vanaja
Affiliation:
Kurnool Medical College, Kurnool, Andhra Pradesh 518002, India
Vinod Scaria*
Affiliation:
CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi 110025, India Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
Sridhar Sivasubbu*
Affiliation:
CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi 110025, India Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
*
Authors for correspondence: Sridhar Sivasubbu, E-mail: [email protected]; Vinod Scaria, E-mail: [email protected]

Abstract

Coronavirus disease 2019 (COVID-19) emerged from a city in China and has now spread as a global pandemic affecting millions of individuals. The causative agent, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is being extensively studied in terms of its genetic epidemiology using genomic approaches. Andhra Pradesh is one of the major states of India with the third-largest number of COVID-19 cases with a limited understanding of its genetic epidemiology. In this study, we have sequenced 293 SARS-CoV-2 genome isolates from Andhra Pradesh with a mean coverage of 13324X. We identified 564 high-quality SARS-CoV-2 variants. A total of 18 variants mapped to reverse transcription polymerase chain reaction primer/probe sites, and four variants are known to be associated with an increase in infectivity. Phylogenetic analysis of the genomes revealed the circulating SARS-CoV-2 in Andhra Pradesh majorly clustered under the clade A2a (20A, 20B and 20C) (94%), whereas 6% fall under the I/A3i clade, a clade previously defined to be present in large numbers in India. To the best of our knowledge, this is the most comprehensive genetic epidemiological analysis performed for the state of Andhra Pradesh.

Type
Short Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
Copyright © The Author(s), 2021. Published by Cambridge University Press

The emergence of coronavirus disease 2019 (COVID-19) as a global pandemic has necessitated approaches to understand the evolution and transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Genome sequencing has emerged as one of the widely used approaches to understand the genetic epidemiology of SARS-CoV-2 [1]. The availability of the complete genome of the pathogen early in the epidemic and subsequent application of genomics on a global and unprecedented scale has provided an immense opportunity to trace the introduction, spread and genetic evolution of SARS-CoV-2 across the globe [2].

India is now one of the countries most affected by COVID-19, with over 10 million people affected since the initial introduction of SARS-CoV-2 into the country in 2020 and subsequent introductions through travellers across major cities. The affected regions include states with large populations and many air travellers, such as Andhra Pradesh, which has a population of 49 million and an estimated 1.5–2 million people who are part of the diaspora spread across the world. Although a number of genomes have been sequenced from different states in India [3], there is a paucity of genomic data and genetic epidemiology of SARS-CoV-2 isolates from the state of Andhra Pradesh, which motivated us to study the genomes from this state in detail. In the current study, we report a total of 293 SARS-CoV-2 genomes from the state of Andhra Pradesh. To the best of our knowledge, this is the first comprehensive report of the genetic epidemiology and evolution of SARS-CoV-2 from the state of Andhra Pradesh.

The study is in compliance with relevant laws and institutional guidelines and in accordance with the ethical standards of the Declaration of Helsinki and approved by Institutional Human Ethics Committee (RC. No. 03/IHC/kmcknl/2020, dated 03/08/2020). The patient consent has been waived by the ethics committee. RNA samples isolated from nasopharyngeal/oropharyngeal swabs of patients from a tertiary care teaching hospital (Kurnool Medical College) were used in the study. RNA isolation was performed using GenoSens SARS-CoV-2 PCR Viral RNA extraction reagents and the Truprep system (Molbio Diagnostics) for COVID-19. All samples were confirmed by multiplex reverse transcription polymerase chain reaction (RT-PCR).

In total, 143 726 samples were tested between 21 April and 5 August and 10 073 samples were identified as COVID-19 positive. In total, 293 samples collected between 27 June 2020 and 3 August 2020 were considered for viral genome sequencing. Library preparation and sequencing were performed as per the COVIDSeq protocol (Illumina, USA) as described in a previous study [4]. The samples were sequenced in technical replicates.

We followed a previously published protocol for data analysis [5]. Briefly, raw FASTQ files underwent quality control and adapter trimming using Trimmomatic (version 0.39) [6]. The Wuhan-Hu-1 (NC_045512.2) genome was used as the reference. Replicate files were independently aligned and merged. Genomes with ≥99% coverage and ≤5% unassigned nucleotides were processed for variant calls. Genetic variants were annotated by ANNOVAR [7] using custom database tables for annotating the SARS-CoV-2 genome. Filtered variants were systematically compared with other viral genomes deposited in the Global Initiative on Sharing All Influenza Data (GISAID). Genomes from GISAID were aligned with the reference genome using EMBOSS [8] and variants were called using SNP-Sites [9]. Only genomes with an alignment percentage of ≥99% and degenerate bases ≤5% were used for comparative analysis. This accounts for a total of 45 830 high-quality genome sequences submitted until 26 September 2020 (Supplementary Table S1).
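The genome-level inclusion thresholds described above (≥99% coverage, ≤5% unassigned nucleotides) can be illustrated with a minimal Python sketch. The function name `passes_qc` is hypothetical, and the definitions are assumptions rather than the published protocol: coverage is taken here as consensus length relative to the 29 903-nt NC_045512.2 reference, and unassigned nucleotides as the fraction of `N` characters in the consensus.

```python
# Minimal sketch of the genome-level QC thresholds (assumed definitions,
# not the authors' pipeline).
REF_LENGTH = 29903  # length of the Wuhan-Hu-1 reference NC_045512.2


def passes_qc(consensus: str,
              ref_length: int = REF_LENGTH,
              min_coverage: float = 0.99,
              max_n_fraction: float = 0.05) -> bool:
    """Return True if a consensus sequence meets both thresholds."""
    # Drop alignment gaps so only emitted bases are counted.
    seq = consensus.upper().replace("-", "")
    if not seq:
        return False
    # Assumed definition: share of the reference length that is covered.
    coverage = min(len(seq) / ref_length, 1.0)
    # Assumed definition: share of bases left unassigned (N).
    n_fraction = seq.count("N") / len(seq)
    return coverage >= min_coverage and n_fraction <= max_n_fraction
```

A full-length consensus with about 3% `N` bases would pass under these assumptions, whereas one with more than 5% `N` bases or one covering less than 99% of the reference would be excluded.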

Phylogenetic analysis was performed following the Nextstrain protocol for analysis of SARS-CoV-2 genomes using the genomes sequenced in this study and the dataset of 3058 genomes from India deposited in GISAID (Supplementary Table S2) [10–12]. Lineages were assigned to the genomes using the Phylogenetic Assignment of Named Global Outbreak LINeages (PANGOLIN; pangoLEARN version 2020-07-20) package [13]. Genetic variants in the sequenced genomes were mapped against RT-PCR primer/probe sites used in the molecular detection of SARS-CoV-2 [14] and compared with variants of known functional consequence in the viral genome, compiled from published studies and pre-prints.
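Mapping variants against primer/probe sites amounts to an interval-overlap check, sketched below in Python. This is an illustration only: the `Site` type, the `variants_in_sites` helper, and the site names and coordinates in `example_sites` are all hypothetical and do not correspond to any real assay or to the curated list used in the study.

```python
from typing import Dict, Iterable, List, NamedTuple


class Site(NamedTuple):
    """A primer or probe binding site in 1-based, inclusive genome coordinates."""
    name: str
    start: int
    end: int


def variants_in_sites(positions: Iterable[int],
                      sites: Iterable[Site]) -> Dict[int, List[str]]:
    """Map each variant position to the names of the sites it falls within."""
    site_list = list(sites)
    hits: Dict[int, List[str]] = {}
    for pos in positions:
        matched = [s.name for s in site_list if s.start <= pos <= s.end]
        if matched:
            hits[pos] = matched
    return hits


# Hypothetical coordinates, for illustration only.
example_sites = [Site("assay_A_probe", 100, 120), Site("assay_B_fwd", 115, 140)]
example_hits = variants_in_sites([99, 100, 118, 141], example_sites)
```

With these made-up intervals, position 118 falls within both sites and position 100 within one, while positions 99 and 141 fall outside every site.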

A total of 200 μl of VTM (Himedia, Mumbai) containing throat swab samples was used to extract viral RNA from subjects with symptoms of COVID-19 infection. A total of 10 μl of viral RNA was used for RT-PCR detection. Target genes, including the ORF1ab gene, RNA-dependent RNA polymerase (RdRP gene), nucleocapsid protein (N gene) and envelope protein (E gene), were simultaneously amplified and tested during the real-time PCR assay. Confirmed SARS-CoV-2 RNA extracts with Ct values 22–28 were further processed for whole-genome sequencing.

In total, 293 SARS-CoV-2 genomes were sequenced with a mapping percentage of 97.27% and 13324X coverage (Supplementary Table S3). In total, 276 samples having genome coverage ≥99% and ≤5% unassigned nucleotides were further processed for variant calling and consensus sequence generation.

The reference-based assembly generated a total of 615 unique genetic variants. Out of these, 564 variants having read frequencies ≥50% were considered for the comparative analysis. The distribution of the variants in the genomes and their annotations are summarised in Supplementary Figure S1. Of the total variants considered for comparative analysis, 72 were found to span the spike protein region. We identified four genetic variants in the S gene which have been previously reported to be involved in increased infectivity through experimental validation [15–22]. These comprise 23403A>G (D614G) and three co-occurring combinations, 23403A>G + 21575C>T (D614G + L5F), 23403A>G + 24368G>T (D614G + D936Y) and 23403A>G + 24378C>T (D614G + S939F), with frequencies of 94.20%, 0.725%, 5.435% and 0.362%, respectively, in the 276 genomes analysed in this study. The sequence variant N440K, located in the receptor-binding domain of the SARS-CoV-2 spike protein, was found in 92 of the 276 genomes analysed in this study. N440K is one of the hotspot residues involved in viral immune escape mechanisms and has been found to be resistant to a range of monoclonal antibodies including C135 and REGN10987 [23–25]. This variant has also been reported in cases of SARS-CoV-2 reinfection, including one report from within the state of Andhra Pradesh, and has recently been reported to have higher infective fitness compared to the prevalent A2a clade [26–28]. A total of 145 variants were annotated as deleterious by SIFT [29], whereas 18 genetic variants mapped to diagnostic RT-PCR primer/probe sites (Supplementary Table S4). A total of 42 and 421 variants were predicted to map to potential B and T cell epitopes, respectively (Supplementary Table S5).
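The correspondence between the nucleotide positions and amino acid residues quoted above can be checked with a short Python sketch. It assumes the NC_045512.2 annotation, in which the S coding sequence starts at position 21563 and the N coding sequence at 28274 (1-based); the dictionary `CDS_START` and function `codon_number` are hypothetical names introduced for illustration, not part of the study's pipeline.

```python
# 1-based start coordinates of two coding sequences in NC_045512.2
# (assumed from the RefSeq annotation).
CDS_START = {"S": 21563, "N": 28274}


def codon_number(gene: str, genome_pos: int) -> int:
    """Return the 1-based codon (residue) number containing a genome position."""
    offset = genome_pos - CDS_START[gene]
    if offset < 0:
        raise ValueError(f"position {genome_pos} lies upstream of the {gene} CDS")
    # Integer division by three gives the zero-based codon index.
    return offset // 3 + 1


# Spike positions reported in this study and their expected residues.
spike_positions = {23403: 614, 21575: 5, 24368: 936, 24378: 939}
spike_check = {pos: codon_number("S", pos) for pos in spike_positions}
```

Under this annotation the four spike positions map to residues 614, 5, 936 and 939, in agreement with D614G, L5F, D936Y and S939F. The same calculation places nucleocapsid position 28854 in codon 194, matching the S194L change reported for this dataset.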

Phylogenetic analysis was conducted for the dataset of 3033 SARS-CoV-2 genomes from India, including 276 genomes from this study, with the genome Wuhan/WH01 (EPI_ISL_406798) as the root. Out of 276 genomes, 260 genomes (94%) clustered under the clade A2a (20A, 20B and 20C) whereas 16 were under clade I/A3i (6%), a distinct cluster of genomes previously reported from India [30] (Fig. 1a and 1b). The dominant lineages for the 276 genomes, as assigned by PANGOLIN, were B.1.113 (n = 129) and B.1 (n = 95). In comparison, B.1.1.32 and B.6 were dominant among other Indian genomes, whereas B.1 and B.1.1 were dominant in the global dataset. Five genomes were assigned to lineage B.1.112 and one to lineage B.1.104; neither lineage has been previously reported for genomes from India (Fig. 1c).
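The clade and lineage proportions follow directly from the counts given in the text. The Python sketch below reproduces the arithmetic; the helper `percent` and the variable names are introduced here for illustration only, and the counts are those stated above.

```python
def percent(count: int, total: int, digits: int = 1) -> float:
    """Return count as a percentage of total, rounded to `digits` decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, digits)


# Clade counts reported for the 276 genomes analysed in this study.
clade_counts = {"A2a": 260, "I/A3i": 16}
total_genomes = sum(clade_counts.values())
clade_shares = {clade: percent(n, total_genomes)
                for clade, n in clade_counts.items()}

# Dominant PANGOLIN lineages reported for the same 276 genomes.
lineage_shares = {"B.1.113": percent(129, total_genomes),
                  "B.1": percent(95, total_genomes)}
```

To one decimal place this gives 94.2% for A2a and 5.8% for I/A3i, consistent with the rounded 94% and 6% quoted above. The two dominant lineages account for roughly 46.7% and 34.4% of the genomes, respectively.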

Fig. 1. (a) Time-resolved phylogenetic reconstruction of 3033 SARS-CoV-2 genomes from India. In total, 276 genomes from this study are highlighted. (b) Proportion of clades in the 276 genomes from Andhra Pradesh and other genomes from India. (c) Distribution of PANGOLIN lineages in the genomes in this study in comparison with other genomes from India and across the world.

The two earlier genomes from Andhra Pradesh (EPI_ISL_436440 and EPI_ISL_435089) [31], sampled in the months of March and April, respectively, cluster under clade I/A3i. Genomes from the neighbouring state of Telangana also show a dominant prevalence of clade I/A3i (86% of all sequenced genomes) in the initial days of the pandemic (March–April 2020) and an increased representation of clade A2a in later months, with only 26.6% of all sequenced genomes belonging to clade I/A3i [30]. The genomes sequenced in this study show a prevalence of clade A2a in Andhra Pradesh, which suggests a shift in clade dominance for this region. Of the 260 genomes that cluster under clade A2a, a majority (n = 171) were observed to fall under a distinct sub-cluster that has been previously reported for genomes from Gujarat [32]. The cluster is characterised by an S194L (C28854T) mutation in the nucleocapsid protein of the virus, a mutation that was found to be significantly associated with disease mortality in Gujarat [32]. One genome from this study (CS0804) also forms a polytomy with other samples from the neighbouring state of Telangana in the phylogenetic tree of Indian genomes, which could be suggestive of multiple, simultaneous divergence events, although further data and analysis would be needed to confirm this hypothesis reliably.

Put together, our analysis using the COVIDSeq approach and downstream data analysis has provided detailed insights into the genetic epidemiology and evolution of SARS-CoV-2 isolates in the state of Andhra Pradesh. A total of 564 high-quality unique genetic variants were identified, out of which 15 variants are novel. Extensive analysis of the functional consequences of the filtered variants has provided insights into the impact of these genetic variants in current diagnostic practices.

Phylogenetic analysis of the genomes highlights the potential shift in clade dominance from clade I/A3i to A2a in Andhra Pradesh, a trend also observed in the neighbouring state of Telangana. Lineages B.1.112 and B.1.104 were also reported for the first time from Indian genomes.

In conclusion, our study highlights the utility of whole-genome sequencing to study the genetic landscape and evolution of SARS-CoV-2 isolates in major states such as Andhra Pradesh and emphasises the use of such scalable technologies to gain better and timely insights into epidemics.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/S0950268821001424

Acknowledgements

The authors acknowledge funding from CSIR India (MLP2005). AJ, BJ, MD and PS acknowledge research fellowships from CSIR-India. The funders had no role in the analysis of data, preparation of the manuscript or decision to publish.

Conflict of interest

The authors declare no conflict of interest.

Data availability statement

The data that support the findings of this study are available at NCBI Short Read Archive with Project ID PRJNA662193 with accession IDs from SAMN16707355 to SAMN16707555. Dataset of the remaining samples is available at NCBI Short Read Archive with Project ID PRJNA655577.

References

1. Wu, F et al. (2020) A new coronavirus associated with human respiratory disease in China. Nature 580, E7.
2. Shu, Y and McCauley, J (2017) GISAID: global initiative on sharing all influenza data – from vision to reality. EuroSurveillance 22, 30494.
3. Radhakrishnan, C et al. (2021) Initial insights into the genetic epidemiology of SARS-CoV-2 isolates from Kerala suggest local spread from limited introductions. Frontiers in Genetics 12, 630542.
4. Bhoyar, RC et al. (2021) High throughput detection and genetic epidemiology of SARS-CoV-2 using COVIDSeq next-generation sequencing. PLoS One 16, e0247115.
5. Poojary, M et al. (2020) Computational protocol for assembly and analysis of SARS-nCoV-2 genomes. Research Reports 4, e1–e14. doi: 10.9777/rr.2020.10001.
6. Bolger, AM et al. (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics (Oxford, England) 30, 2114–2120.
7. Wang, K et al. (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Research 38, e164.
8. Rice, P et al. (2000) EMBOSS: the European molecular biology open software suite. Trends in Genetics 16, 276–277.
9. Page, AJ et al. (2016) SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments. Microbial Genomics 2, e000056.
10. Jolly, B and Scaria, V (2021) Computational analysis and phylogenetic clustering of SARS-CoV-2 genomes. Bio-Protocol 11, e3999.
11. Hadfield, J et al. (2018) Nextstrain: real-time tracking of pathogen evolution. Bioinformatics (Oxford, England) 34, 4121–4123.
12. Sagulenko, P et al. (2018) TreeTime: maximum-likelihood phylodynamic analysis. Virus Evolution 4, vex042.
13. Rambaut, A et al. (2020) A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nature Microbiology 5, 1403–1407.
14. Jain, A et al. (2021) Analysis of the potential impact of genomic variants in global SARS-CoV-2 genomes on molecular diagnostic assays. International Journal of Infectious Diseases 102, 460–462.
15. Zhang, L et al. (2020) SARS-CoV-2 spike-protein D614G mutation increases virion spike density and infectivity. Nature Communications 11, 6013.
16. Becerra-Flores, M et al. (2020) SARS-CoV-2 viral spike G614 mutation exhibits higher case fatality rate. International Journal of Clinical Practice 74, e13525.
17. Raghav, S et al. (2020) Analysis of Indian SARS-CoV-2 genomes reveals prevalence of D614G mutation in spike protein predicting an increase in interaction with TMPRSS2 and virus infectivity. Frontiers in Microbiology 11, 594928.
18. Korber, B et al. (2020) Tracking changes in SARS-CoV-2 spike: evidence that D614G increases infectivity of the COVID-19 virus. Cell 182, 812–827.e19.
19. Eaaswarkhanth, M et al. (2020) Could the D614G substitution in the SARS-CoV-2 spike (S) protein be associated with higher COVID-19 mortality? International Journal of Infectious Diseases 96, 459–460.
20. Maitra, A et al. (2020) Mutations in SARS-CoV-2 viral RNA identified in Eastern India: possible implications for the ongoing outbreak in India and impact on viral structure and host susceptibility. Journal of Biosciences 45, 76.
21. Li, Q et al. (2020) The impact of mutations in SARS-CoV-2 spike on viral infectivity and antigenicity. Cell 182, 1284–1294.e9.
22. Oliva, R et al. (2021) D936Y and other mutations in the fusion core of the SARS-CoV-2 spike protein heptad repeat 1: frequency, geographical distribution, and structural effect. Molecules 26, 2622.
23. Weisblum, Y et al. (2020) Escape from neutralizing antibodies by SARS-CoV-2 spike protein variants. Elife 9, e61312.
24. Starr, TN et al. (2020) Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding. Cell 182, 1295–1310.e20.
25. Robbiani, DF et al. (2020) Convergent antibody responses to SARS-CoV-2 in convalescent individuals. Nature 584, 437–442.
26. Gupta, V et al. (2020) Asymptomatic reinfection in two healthcare workers from India with genetically distinct SARS-CoV-2. Clinical Infectious Diseases 1451, ciaa1451.
27. Rani, PR et al. (2021) Symptomatic reinfection of SARS-CoV-2 with spike protein variant N440K associated with immune escape. Journal of Medical Virology 93, 4163–4165.
28. Tandel, D et al. (2021) N440K variant of SARS-CoV-2 has higher infectious fitness. bioRxiv.
29. Vaser, R et al. (2016) SIFT Missense predictions for genomes. Nature Protocols 11, 1–9.
30. Banu, S et al. (2020) A distinct phylogenetic cluster of Indian severe acute respiratory syndrome coronavirus 2 isolates. Open Forum Infectious Diseases 7, ofaa434.
31. Kumar, P et al. (2020) Integrated genomic view of SARS-CoV-2 in India. Wellcome Open Research 5, 184.
32. Joshi, M et al. (2021) Genomic variations in SARS-CoV-2 genomes from Gujarat: underlying role of variants in disease epidemiology. Frontiers in Genetics 12, 586569.