13, e1006698 (2017). The relatively fast evolutionary rate means that it is most appropriate to estimate shallow nodes in the sarbecovirus evolutionary history. Global epidemiology of bat coronaviruses. Note that breakpoints can be shared between sequences if they are descendants of the same recombination events. Holmes, E. C. The Evolution and Emergence of RNA Viruses (Oxford Univ. July 26, 2021. However, inconsistency in the nomenclature limits uniformity in its epidemiological understanding. 53), this is inferred to have occurred before the divergence of RaTG13 and SARS-CoV-2 and thus should not influence our inferences. The pangolin coronaviruses show lower similarity to SARS-CoV-2 than bat coronavirus RaTG13 across the whole genome, but higher similarity in the spike receptor binding domain, although the similarity at either scale remains too low to implicate . The latter was reconstructed using IQTREE (ref. 66) v.2.0 under a general time-reversible (GTR) model with a discrete gamma distribution to model inter-site rate variation. Five example sequences with incongruent phylogenetic positions in the two trees are indicated by dashed lines. 1 Phylogenetic relationships in the C-terminal domain (CTD). Humans' selfish, speciesist treatment of these animals could be the very reason why the novel coronavirus exists. 4. and P.L.) Posterior means (horizontal bars) of patristic distances between SARS-CoV-2 and its closest bat and pangolin sequences, for the spike protein's variable loop region and CTD region excluding the variable loop. "This is an extremely interesting . pangoLEARN: store of the trained model for pangolin to access. The construction of NRR1 is the most conservative as it is least likely to contain any remaining recombination signals. A pneumonia outbreak associated with a new coronavirus of probable bat origin.
To employ phylogenetic dating methods, recombinant regions of a 68-genome sarbecovirus alignment were removed with three independent methods. M.F.B., P.L. Here, we analyse the evolutionary history of SARS-CoV-2 using available genomic data on sarbecoviruses. Region B showed no PI signals within the region, except one including sequence SC2018 (Sichuan), and thus this sequence was also removed from the set. Sibling lineages to RaTG13/SARS-CoV-2 include a pangolin sequence sampled in Guangdong Province in March 2019 and a clade of pangolin sequences from Guangxi Province sampled in 2017. Collectively, our analyses point to bats being the primary reservoir for the SARS-CoV-2 lineage. Wan, Y., Shang, J., Graham, R., Baric, R. & Li, F. Receptor recognition by the novel coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS coronavirus. Split diversity in constrained conservation prioritization using integer linear programming. 26 March 2020. This study provides an integration of existing classifications and describes evolutionary trends of the SARS-CoV . 95% credible interval bars are shown for all internal node ages. Divergence time estimates based on the three regions/alignments where the effects of recombination have been removed. A., Filip, I., AlQuraishi, M. & Rabadan, R. Recombination and lineage-specific mutations led to the emergence of SARS-CoV-2. Early detection via genomics was not possible during Southeast Asia's initial outbreaks of avian influenza H5N1 (1997 and 2003–2004) or the first SARS outbreak (2002–2003). We demonstrate that the sarbecoviruses circulating in horseshoe bats have complex recombination histories, as reported by others (refs. 15,20–26). We thank A. Chan and A.
Irving for helpful comments on the manuscript. We used an uncorrelated relaxed clock model with log-normal distribution for all datasets, except for the low-diversity SARS data for which we specified a strict molecular clock model. Evolutionary rate estimation can be profoundly affected by the presence of recombination (ref. 50). Wu, F. et al. performed recombination and phylogenetic analysis and annotated virus names with geographical and sampling dates. In the present study, we analyzed the diversity of SARS-CoV-2 viral genomes in India to characterize the evolutionary patterns of the virus in the country through their pangolin lineage and GISAID clade. It compares the new genome against the large, diverse population of sequenced strains using a We showed that severe acute respiratory syndrome coronavirus 2 is probably a novel recombinant virus. Except for specifying that sequences are linear, all settings were kept to their defaults. 874850). The sizes of the black internal node circles are proportional to the posterior node support. These are in general agreement with estimates using NRR2 and NRA3, which result in divergence times of 1982 (1948–2009) and 1948 (1879–1999), respectively, for SARS-CoV-2, and estimates of 1952 (1906–1989) and 1970 (1932–1996), respectively, for the divergence time of SARS-CoV from its closest known bat relative. Preprint at https://doi.org/10.1101/2020.05.28.122366 (2020). and T.A.C. In the case of the DRAGEN COVID Lineage tool, the minimum accepted alignment score was set to 22 and results with scores <22 were discarded. The origins we present in Fig. P.L. 5). 5 Comparisons of GC content across taxa. All sequence data analysed in this manuscript are available at https://github.com/plemey/SARSCoV2origins.
Zhang, Y.-Z. performed recombination analysis for non-recombining regions 1 and 2, breakpoint analysis and phylogenetic inference on recombinant segments. The estimated divergence times for the pangolin virus most closely related to the SARS-CoV-2/RaTG13 lineage range from 1851 (1730–1958) to 1877 (1746–1986), indicating that these pangolin . Grey tips correspond to bat viruses, green to pangolin, blue to SARS-CoV and red to SARS-CoV-2. Dudas, G., Carvalho, L. M., Rambaut, A. Subsequently, a bat sarbecovirus (RaTG13, sampled from a Rhinolophus affinis horseshoe bat in 2013 in Yunnan Province) was reported that clusters with SARS-CoV-2 in almost all genomic regions with approximately 96% genome sequence identity (ref. 2). Nature 558, 180–182 (2018). Kosakovsky Pond, S. L., Posada, D., Gravenor, M. B., Woelk, C. H. & Frost, S. D. W. Automated phylogenetic detection of recombination using a genetic algorithm. As informative rate priors for the analysis of the sarbecovirus datasets, we used two different normal prior distributions: one with a mean of 0.00078 and s.d. A.R. Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. The PANGOLIN lineage database (15, 16) was used to analyze the frequency of lineages among countries. 6, eabb9153 (2020). MC_UU_1201412). Lam, T. T. et al. He, B. et al. Duchene, S., Holmes, E. C. & Ho, S. Y. W. Analyses of evolutionary dynamics in viruses are hindered by a time-dependent bias in rate estimates. This underscores the need for a global network of real-time human disease surveillance systems, such as that which identified the unusual cluster of pneumonia in Wuhan in December 2019, with the capacity to rapidly deploy genomic tools and functional studies for pathogen identification and characterization. Genetics 176, 1035–1047 (2007). This leaves the insertion of polybasic.
is funded by The National Natural Science Foundation of China Excellent Young Scientists Fund (Hong Kong and Macau; no. A single 3SEQ run on the genome alignment resulted in 67 out of 68 sequences supporting some recombination in the past, with multiple candidate breakpoint ranges listed for each putative recombinant. T.L. performed codon usage analysis. 3) clusters with viruses from provinces in the centre, east and northeast of China. The red and blue boxplots represent the divergence time estimates for SARS-CoV-2 (red) and the 2002–2003 SARS-CoV (blue) from their most closely related bat virus, with the light- and dark-colored versions based on the HCoV-OC43 and MERS-CoV centered priors, respectively. Software package for assigning SARS-CoV-2 genome sequences to global lineages. Extended Data Fig. Virological.org http://virological.org/t/ncov-2019-codon-usage-and-reservoir-not-snakes-v2/339 (2020). We thank T. Bedford for providing M.F.B. Background & objectives: Several phylogenetic classification systems have been devised to trace the viral lineages of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It is available as a command line tool and a web application. This provides compelling support for the SARS-CoV-2 lineage being the consequence of a direct or nearly-direct zoonotic jump from bats, because the key ACE2-binding residues were present in viruses circulating in bats. We use three bioinformatic approaches to remove the effects of recombination, and we combine these approaches to identify putative non-recombinant regions that can be used for reliable phylogenetic reconstruction and dating. c, Maximum likelihood phylogenetic trees rooted on a 2007 virus sampled in Kenya (BtKy72; root truncated from images), shown for five BFRs of the sarbecovirus alignment. The genetic distances between SARS-CoV-2 and RaTG13 (bottom) demonstrate that their relationship is consistent across all regions except for the variable loop.
Without better sampling, however, it is impossible to estimate whether or how many of these additional lineages exist. 90, 7184–7195 (2016). Virology 507, 1–10 (2017). Using these breakpoints, the longest putative non-recombining segment (nt 1,885–21,753) is 9.9 kb long, and we call this region NRR2. The Artic Network receives funding from the Wellcome Trust through project no. Wong, A. C. P., Li, X., Lau, S. K. P. & Woo, P. C. Y. When viewing the last 7 kb of the genome, a clade of viruses from northern China appears to cluster with sequences from southern Chinese provinces but, when inspecting trees from different parts of ORF1ab, the N. China clade is phylogenetically separated from the S. China clade. GitHub repository cov-lineages/pangolin: software package for assigning SARS-CoV-2 genome sequences to global lineages. Suchard, M. A. et al. While there is evidence of positive selection in the sarbecovirus lineage leading to RaTG13/SARS-CoV-2 (ref. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. Pangolin was developed to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the Pango nomenclature. To begin characterizing any ancestral relationships for SARS-CoV-2, NRRs of the genome must be identified so that reliable phylogenetic reconstruction and dating can be performed. BFRs were concatenated if no phylogenetic incongruence signal could be identified between them. We call this approach breakpoint-conservative, but note that this has the opposite effect to the construction of NRR1 in that this approach is the most likely to allow breakpoints to remain inside putative non-recombining regions. Discovery and genetic analysis of novel coronaviruses in least horseshoe bats in southwestern China. Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
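The step from inferred breakpoints to breakpoint-free regions (BFRs) and a longest putative non-recombining segment can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the authors' pipeline: the function names, the coordinate convention (1-based, inclusive, with each breakpoint marking the first position of a new segment) and the breakpoint positions are all hypothetical.

```python
from typing import Iterable, List, Tuple

Region = Tuple[int, int]  # (start, end), 1-based and inclusive


def breakpoint_free_regions(breakpoints: Iterable[int], genome_length: int) -> List[Region]:
    """Split an alignment of `genome_length` columns at the given breakpoints.

    Assumption: a breakpoint at position b means a new segment starts at b.
    """
    cuts = sorted({b for b in breakpoints if 1 < b <= genome_length})
    starts = [1] + cuts
    ends = [c - 1 for c in cuts] + [genome_length]
    return list(zip(starts, ends))


def regions_longer_than(regions: Iterable[Region], min_length: int) -> List[Region]:
    """Keep regions strictly longer than `min_length`, longest first."""
    kept = [(s, e) for s, e in regions if e - s + 1 > min_length]
    return sorted(kept, key=lambda r: r[1] - r[0] + 1, reverse=True)


# Hypothetical breakpoints on a hypothetical 30,000-column alignment.
toy_breakpoints = [2000, 9000, 12000, 25000]
toy_regions = breakpoint_free_regions(toy_breakpoints, genome_length=30000)
toy_long_regions = regions_longer_than(toy_regions, min_length=5000)
```

Deciding which adjacent BFRs may be concatenated still requires a phylogenetic incongruence test between them, which this sketch does not attempt.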
Instead, similarity in codon usage metrics between SARS-CoV-2 and the eukaryotes analyzed was correlated with coding sequence GC content of the eukaryote, with more similar codon usage being identified in eukaryotes with low GC content similar to that of the coronavirus (b). Because 3SEQ is the most statistically powerful of the mosaic methods (ref. 61), we used it to identify the best-supported breakpoint history for each potential child (recombinant) sequence in the dataset. The research leading to these results received funding (to A.R. Because these subclades had different phylogenetic relationships in region D (Supplementary Fig. As a proxy, it would be possible to model the long-term purifying selection dynamics as a major source of time-dependent rates (refs. 43,44,52), but this is beyond the scope of the current study. Liu, P. et al. One study suggests that over a century ago, one lineage of coronavirus circulating in bats gave rise to SARS-CoV-2, RaTG13 and a pangolin coronavirus known as Pangolin-2019, Live Science . 382, 1199–1207 (2020). The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. Sorting these breakpoint-free regions (BFRs) by length results in two segments >5 kb: an ORF1a subregion spanning nucleotides (nt) 3,625–9,150 and the first half of ORF1b spanning nt 13,291–19,628 (sequence numbering given in Source Data, https://github.com/plemey/SARSCoV2origins). To evaluate the performance of the procedure, we confirmed that the recombination masking resulted in (1) a markedly different outcome of the PHI test (ref. 64), (2) removal of well-supported (bootstrap value >95%) incompatible splits in Neighbor-Net (ref. 65) and (3) a near-complete reduction of mosaic signal as identified by 3SEQ. Originally, PANGOLIN used a maximum-likelihood-based assignment algorithm to assign query SARS-CoV-2 sequences the most likely lineage. 725422-ReservoirDOCS).
Since the release of Version 2.0 in July 2020, however, it has used the 'pangoLEARN' machine-learning-based assignment algorithm to assign lineages to new SARS-CoV-2 genomes. Ji, W., Wang, W., Zhao, X., Zai, J. We find that the sarbecoviruses (the viral subgenus containing SARS-CoV and SARS-CoV-2) undergo frequent recombination and exhibit spatially structured genetic diversity on a regional scale in China. 88, 7070–7082 (2014). It allows a user to assign the most likely lineage (Pango lineage) to SARS-CoV-2 query sequences. This produced non-recombining alignment NRA3, which included 63 of the 68 genomes. Extended Data Fig. S. China corresponds to Guangxi, Yunnan, Guizhou and Guangdong provinces. Lin, X. et al. Bioinformatics 30, 1312–1313 (2014). a–c, Root-to-tip (RtT) divergence as a function of sampling time for the three coronavirus evolutionary histories unfolding over different timescales (HCoV-OC43 (n = 37; a), MERS (n = 35; b) and SARS (n = 69; c)). In other words, a true breakpoint is less likely to be called as such (this is breakpoint-conservative), and thus the construction of a non-recombining region may contain true recombination breakpoints (with insufficient evidence to call them as such). Gray inset shows majority rule consensus trees with mean posterior branch lengths for the two regions, with posterior probabilities on the key nodes showing the relationships among SARS-CoV-2, RaTG13, and Pangolin 2019. Boni, M. F., de Jong, M. D., van Doorn, H. R. & Holmes, E. C. Guidelines for identifying homologous recombination events in influenza A virus. Complete genome sequence data were downloaded from GenBank and ViPR; accession numbers of all 68 sequences are available in Supplementary Table 4.
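A root-to-tip (RtT) plot of the kind described in the caption above is commonly summarized by a simple linear regression of divergence on sampling time: the slope is a crude rate estimate, the x-intercept a crude root date, and R² an informal measure of temporal signal. The sketch below is a generic illustration of that idea and is not taken from the authors' methods; the function name and the toy data are hypothetical.

```python
from typing import Sequence, Tuple


def root_to_tip_regression(
    dates: Sequence[float], divergences: Sequence[float]
) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of root-to-tip divergence on sampling date.

    Returns (rate, root_date, r_squared):
      rate      -- slope, in substitutions per site per unit of time
      root_date -- x-intercept, the date at which fitted divergence is zero
      r_squared -- squared correlation between date and divergence
    """
    n = len(dates)
    if n < 3 or n != len(divergences):
        raise ValueError("need at least three (date, divergence) pairs")
    mean_t = sum(dates) / n
    mean_d = sum(divergences) / n
    s_tt = sum((t - mean_t) ** 2 for t in dates)
    s_dd = sum((d - mean_d) ** 2 for d in divergences)
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, divergences))
    if s_tt == 0 or s_dd == 0:
        raise ValueError("dates and divergences must both vary")
    rate = s_td / s_tt
    intercept = mean_d - rate * mean_t
    r_squared = s_td ** 2 / (s_tt * s_dd)
    root_date = -intercept / rate if rate != 0 else float("nan")
    return rate, root_date, r_squared


# Toy data lying on a straight line with slope 0.001 and x-intercept 1990.
toy_dates = [2000.0, 2005.0, 2010.0, 2015.0]
toy_divergences = [0.010, 0.015, 0.020, 0.025]
rate, root_date, r_squared = root_to_tip_regression(toy_dates, toy_divergences)
```

A flat or negative slope, or an R² near zero, indicates little temporal signal; in that situation the rate has to be informed from elsewhere, for example through a prior distribution.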
Our approach resulted in similar posterior rates using two different prior means, implying that the sarbecovirus data do inform the rate estimate even though a root-to-tip temporal signal was not apparent. In the absence of a strong temporal signal, we sought to identify a suitable prior rate distribution to calibrate the time-measured trees by examining several coronaviruses sampled over time, including HCoV-OC43, MERS-CoV, and SARS-CoV virus genomes. 3) to examine the sensitivity of date estimates to this prior specification. Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus. Genetics 172, 2665–2681 (2006). 36) gives a putative recombination-free alignment that we call non-recombinant alignment 3 (NRA3) (see Methods). Li, Q. et al. Because there is no single accepted method of inferring breakpoints and identifying clean subregions with high certainty, we implemented several approaches to identifying three classic statistical signals of recombination: mosaicism, phylogenetic incongruence and excessive homoplasy (ref. 51). We infer time-measured evolutionary histories using a Bayesian phylogenetic approach while incorporating rate priors based on mean MERS-CoV and HCoV-OC43 rates and with standard deviations that allow for more uncertainty than the empirical estimates for both viruses (see Methods). Membrebe, J. V., Suchard, M. A., Rambaut, A., Baele, G. & Lemey, P. Bayesian inference of evolutionary histories under time-dependent substitution rates. b, Similarity plot between SARS-CoV-2 and several selected sequences including RaTG13 (black), SARS-CoV (pink) and two pangolin sequences (orange). 4 TMRCAs for SARS-CoV and SARS-CoV-2. Calibration of priors can be performed using other coronaviruses (SARS-CoV, MERS-CoV and HCoV-OC43), but estimated rates vary with the timescale of sample collection. Nat Microbiol 5, 1408–1417 (2020).
with an alignment on which an initial recombination analysis was done. Extended Data Fig. Sequencing from Malayan pangolins collected during anti-smuggling operations in southern China detected coronavirus lineages related to SARS-CoV-2. A deep dive into the genetics of the novel coronavirus shows it seems to have spent some time infecting both bats and pangolins before it jumped into humans, researchers said . The command line tool is open source software available under the GNU General Public License v3.0. The web application was developed by the Centre for Genomic Pathogen Surveillance. This long divergence period suggests there are unsampled virus lineages circulating in horseshoe bats that have zoonotic potential due to the ancestral position of the human-adapted contact residues in the SARS-CoV-2 RBD. All authors contributed to analyses and interpretations. 4). The first available sequence data (ref. 6) placed this novel human pathogen in the Sarbecovirus subgenus of Coronaviridae (ref. 7), the same subgenus as the SARS virus that caused a global outbreak of >8,000 cases in 2002–2003. 1. Influenza viruses reassort (ref. 17) but they do not undergo homologous recombination within RNA segments (refs. 18,19), meaning that origins questions for influenza outbreaks can always be reduced to origins questions for each of influenza's eight RNA segments. Aside from RaTG13, Pangolin-CoV is the most closely related CoV to SARS-CoV-2.