Mutational signatures and transmissibility of SARS-CoV-2 Gamma and Lambda variants

08/23/2021 ∙ by Karen Y. Oróstica, et al. ∙ Max Planck Society

The emergence of SARS-CoV-2 variants of concern endangers the long-term control of COVID-19, especially in countries with limited genomic surveillance. In this work, we explored genomic drivers of contagion in Chile. We sequenced 3443 SARS-CoV-2 genomes collected between January and July 2021, where the Gamma (P.1), Lambda (C.37), Alpha (B.1.1.7), B.1.1.348, and B.1.1 lineages were predominant. Using a Bayesian model tailored for limited genomic surveillance, we found that the reproduction numbers of the Lambda and Gamma variants were about 5% and 16% larger than Alpha's, respectively. We also observed an enrichment of mutations in the Spike gene, strongly correlated with the variants' transmissibility. Furthermore, the variants' mutational signatures featured a breakpoint concurrent with the beginning of vaccination (mostly CoronaVac, an inactivated virus vaccine), indicating an additional putative selective pressure. Thus, our work provides a reliable method for quantifying novel variants' transmissibility under subsampling (such as the newly reported Delta, B.1.617.2) and highlights the importance of continuous genomic surveillance.

Introduction

Despite widespread efforts on vaccination against COVID-19, the early lifting of restrictions and emerging variants of SARS-CoV-2 endanger a smooth transition from epidemicity to endemicity [1, 2, 3, 4, 5]. Countries deploying only partially protective vaccines and having limited resources for sustaining non-pharmaceutical interventions (NPIs) face additional challenges: larger fractions of the population remain susceptible, which leads to higher levels of morbidity and mortality, and case under-reporting obscures the real extent of the spread [6, 7, 2].

Higher COVID-19 incidence increases the risk of breeding new SARS-CoV-2 variants [8]. Such variants could escape the partial immunity prevailing in parts of the population or have evolutionary advantages that facilitate their spread [9, 10]. Genomic surveillance programs worldwide have reported more than 2.3 million SARS-CoV-2 genomes to the GISAID database [11], where they are collected and shared. However, the capability to perform genomic surveillance effectively varies across countries, depending on public policies and, in low-to-middle-income countries, primarily on the resources available to fund it [12, 13, 14, 15]. For example, in Chile, despite governmental and private investments in genomic surveillance, the sequencing rate is around 400 samples per week. In these settings (hereafter referred to as subsampling), selecting which samples should be sequenced is fundamental for avoiding biases and misleading results. Thus, the presence of an entity coordinating the sampling and sequencing efforts (such as the Chilean Public Health Institute) is essential.

The spread of COVID-19 in Chile has been remarkably heterogeneous, not only because of its geography and sparse urbanization but also because of the pronounced social inequalities [16, 17, 18, 19, 20, 21]. The Chilean government has deployed an ambitious vaccination program [22, 23], so that, to date, almost 60% of the total population has been fully vaccinated [24]. However, despite its success in vaccination, the spread has not been completely controlled because of unsteady NPIs due to economic pressures [25], reporting delays [26], inefficient contact tracing [27], and the comparatively low protection against infection granted by the predominant vaccine. According to official sources, more than half of the administered doses correspond to CoronaVac [28], an inactivated SARS-CoV-2 vaccine with protection against infection [29]. Furthermore, the partial isolation of certain regions of Chile and the fast connections to Santiago, the capital city, further favor the spread of locally generated variants [30] or the introduction of new lineages into zones where there were no cases. The above highlights the importance of optimizing available genomic surveillance resources to alert policymakers in a timely manner about emerging threats, such as the Lambda lineage [31, 32]. Recently, the Delta (B.1.617.2) variant has also been reported in Chile [33, 34], and we are actively working on collecting sequencing data to incorporate it in revised versions of the manuscript.

Here, we quantified the contribution of different variants of SARS-CoV-2 to the spread of COVID-19 in Chile in 2021 and analyzed the genetic drivers of the observable differences among lineages. We observed temporal variations of the genome (namely, total mutational load and individual mutations) in the samples collected, hinting at a selective pressure that prompts differentiation from the reference lineage. Growing post-infection and vaccine-induced immunity levels and changing NPIs induce a varying susceptibility landscape, where certain variants might have comparative advantages. In that way, variants do not compete directly with each other but with the environment. Those that emerge manage to break the barrier imposed by the reduced susceptibility in the population; others either adapt or die out. Remarkably, our work includes the recently emerged but little-researched Lambda variant of interest. Quantifying the spreading potential of new variants through genomic surveillance enables the implementation of preventive measures. Thus, our work offers a framework for assessing the potential future impact of variants in the early stages of their spread using genomic data and Bayesian modeling.

Methods overview

We sequenced whole SARS-CoV-2 genomes of samples from different Chilean regions using a MiSeq (Illumina) platform with a 300-cycle (total) reagent kit. We assessed sequencing quality with the FastQC program, v0.11.8, and used the IRMA (v0.9.3) and MAFFT (v7.458) software to respectively assemble and align the genomes [35, 36]. To determine the lineage of each genome obtained, we used Pangolin v3.1.5 [37].

To assess the relative transmissibility of the different variants in Chile, we proposed a Bayesian model, which simulates the spread of each variant separately using a discrete renewal process [38, 39, 40]. The disease spreads with an inferred time-dependent effective reproduction number [41], with the addition that the reproduction number of each variant is modulated by a time-invariant, variant-specific factor. We set the Alpha variant as the reference, as it is well studied [42], defining its factor to be 1. This factor, therefore, accounts for the relative transmissibility of the variants, i.e., their relative reproduction number compared to Alpha. As data, we used the weekly relative share of the variants in all sequenced cases (i.e., the fraction a given variant represents of the total samples), assuming that these observations follow a multinomial distribution, and used the daily number of (largely non-sequenced) observed new cases to infer the absolute prevalence of the variants in time. Our model also included a small random influx of variants from abroad, which was essential to explain the sudden emergence of new variants among sequenced samples. Our method differs from the phylodynamic inference of population growth rates as implemented in BEAST 2 [43, 44] in that it does not build phylogenetic trees but only groups the different variants together, which significantly simplifies the inference. We thereby obtained an overall description of the spreading dynamics of the different variants over seven months.
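The structure described above can be illustrated with a minimal, deterministic sketch. It is not the model fitted in this work: the function names, the constant influx term, and the example generation-interval weights are assumptions made for illustration, and a full Bayesian treatment would place priors on these quantities and sample them.

```python
import numpy as np

def simulate_variants(r_eff, factors, gen_interval, seed_cases, influx=0.0):
    """Run one discrete renewal process per variant (illustrative sketch).

    r_eff        : (T,) time-dependent reproduction number of the reference variant
    factors      : (V,) time-invariant relative factors (reference variant = 1)
    gen_interval : (G,) generation-interval weights for lags 1..G, summing to 1
    seed_cases   : (G, V) infections on the G days before the simulation starts
    influx       : constant daily import per variant (a simplifying assumption)
    """
    r_eff = np.asarray(r_eff, dtype=float)
    factors = np.asarray(factors, dtype=float)
    gen_interval = np.asarray(gen_interval, dtype=float)
    seed_cases = np.asarray(seed_cases, dtype=float)
    n_lags, n_variants = seed_cases.shape
    n_days = len(r_eff)
    cases = np.zeros((n_lags + n_days, n_variants))
    cases[:n_lags] = seed_cases
    for t in range(n_lags, n_lags + n_days):
        past = cases[t - n_lags:t][::-1]   # past[0] holds the infections at lag 1
        pressure = gen_interval @ past      # infectious pressure per variant
        cases[t] = r_eff[t - n_lags] * factors * pressure + influx
    return cases[n_lags:]

def variant_shares(cases):
    """Fraction of daily infections attributed to each variant."""
    return cases / cases.sum(axis=1, keepdims=True)

def multinomial_loglik(counts, shares):
    """Multinomial log-likelihood of sequenced counts, up to an additive constant."""
    shares = np.clip(np.asarray(shares, dtype=float), 1e-12, None)
    return float(np.sum(np.asarray(counts, dtype=float) * np.log(shares)))
```

With a reference reproduction number of one, a variant whose factor exceeds one steadily gains share, which is the signal the weekly sequencing counts constrain through the multinomial term.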

In addition, we sought to understand the relationship between mutational patterns and transmissibility of the predominant variants, integrating sequencing data and variant-level spreading parameters. We analyzed the relationship between the accumulation of both specific and total mutations and the spread of the virus to detect patterns of co-occurrence of mutations over time.

Samples analyzed in this work were collected from hospitals belonging to the influenza surveillance network, strategically distributed across the country. All samples had to test positive in an RT-PCR SARS-CoV-2 test with a Ct value lower than 25 and were sent to ISP in Santiago for sequencing under a strict cold transportation chain. Although samples were selected proportionally to regional COVID-19 incidence, contingencies and other factors related to sample transportation may have caused samples to be discarded. Altogether, the above implies deviations from ideal sampling (binomial distribution). Consequently, we incorporated a factor in our Bayesian framework that penalizes non-ideal measurements by allowing for larger errors than expected under binomial sampling (see Methods).

Results

Quantification of the transmissibility of most predominant variants in Chile

Since January 2021, we successfully sequenced 3443 SARS-CoV-2 genomes at the Chilean Public Health Institute (ISP), identifying 86 different lineages, of which only some have persisted over time. We filtered our dataset to analyze only those lineages representing at least 20% of the total samples during one weekly observation period. Finally, we identified the Gamma (labeled as Variant of Concern, VOC), Lambda (labeled as Variant of Interest, VOI), Alpha (VOC), B.1.1.348, and B.1.1 lineages as predominant in the time frame analyzed (see Fig. 1a).
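The filtering step can be sketched as follows. This is a minimal illustration, assuming that each sequenced genome is summarized as a (week, lineage) pair and that a lineage is retained if it reaches the threshold in at least one week; both the input layout and that reading of the criterion are assumptions.

```python
from collections import Counter, defaultdict

def predominant_lineages(samples, threshold=0.20):
    """Return lineages whose weekly share reaches `threshold` in at least one week.

    samples: iterable of (week, lineage) pairs, one per sequenced genome
             (a hypothetical input layout chosen for this sketch).
    """
    weekly = defaultdict(Counter)
    for week, lineage in samples:
        weekly[week][lineage] += 1
    keep = set()
    for counts in weekly.values():
        total = sum(counts.values())
        keep.update(name for name, n in counts.items() if n / total >= threshold)
    return keep
```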

As of August 2021, the Gamma VOC, first reported in November 2020 in Manaus, Brazil [45], was the dominant variant in Chile, with 1614 samples. It was followed by the Lambda VOI, with 790 samples identified since January. On the other hand, the Alpha VOC, reported to date in 154 countries around the world [46], has been detected only 122 times in Chile. In addition to the VOCs and VOI mentioned above, we identified 253 samples classified as B.1.1.348 and 55 as B.1.1. The following sections further characterize the epidemiological and genomic features of the variants circulating in Chile.

Figure 1: Bayesian inference enables individual assessment of the contribution of different SARS-CoV-2 variants to the spread of COVID-19. a. Throughout 2021, five SARS-CoV-2 variants were identified as predominant in Chile: two considered Variants of Concern (VOC) by the WHO (Alpha and Gamma), one Variant of Interest (Lambda), and two other unflagged lineages (B.1.1 and B.1.1.348). Assuming that the contribution of each variant to the spreading dynamics (a–c) is proportional to its share (i.e., the fraction it represents of the total samples, d–h), we quantified their transmissibility compared to the Alpha variant (i–m). The Lambda and Gamma variants showed a 1.05 (95% CI [1.01,1.14]) and 1.16 (95% CI [1.11,1.21]) fold higher reproduction number than the Alpha variant, respectively. Other variants had a comparatively lower influence on the spread. Shaded areas in panels b–h account for the 95% credible intervals of the model fit. Complementary parameters and variables are summarized in Supplementary Figure S1.

The Bayesian model fitted the daily number of cases well (Fig. 1b) by adapting the effective reproduction number (Fig. 1c) and also modeling the share of the different variants over time (Fig. 1d–h). The emergence and sudden increase in the predominance of the Lambda variant around week 12 (cf. Fig. 1g) is unlikely to be due solely to community transmission. As Lambda cases were zero or extremely low, this increase can be explained by an abrupt influx of cases (Supplementary Fig. S2d), which acted as a seed for community transmission.

We found that the inferred relative reproduction number was the lowest for the non-VOC variants B.1.1 and B.1.1.348 (Fig. 1i,j). Among the variants of concern and interest, our reference variant Alpha had the lowest reproduction number, followed by Lambda, and Gamma had the highest (Fig. 1k–m). In principle, knowing the base reproduction number of Alpha (around 4.5 [42]) enables the estimation of other variants’ base reproduction numbers by multiplying it by the corresponding relative factor.
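As a worked example of this multiplication, the snippet below combines the value of about 4.5 quoted in this work for Alpha [42] with the point estimates of the relative factors reported in Figure 1. It ignores the uncertainty of both inputs, so the outputs are rough orders of magnitude rather than estimates.

```python
R0_ALPHA = 4.5  # base reproduction number quoted for Alpha [42]

# Point estimates of the relative reproduction number (Figure 1); Alpha is the reference.
relative_factor = {"Alpha": 1.00, "Lambda": 1.05, "Gamma": 1.16}

# Base reproduction number of each variant = Alpha's value times its relative factor.
base_r = {name: R0_ALPHA * f for name, f in relative_factor.items()}

for name, value in base_r.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions, Lambda and Gamma would have base reproduction numbers of roughly 4.7 and 5.2, respectively.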

Mutational load of the Spike gene correlates with variant transmissibility

Next, we sought to understand the genomic drivers of the differences in spreading properties among variants through studying their mutations. These mutations could be insertions, deletions, or substitutions, and were typically missense, i.e., they caused an observable change in the generated amino acid sequence, thus likely having a functional effect in the translated protein [47]. We then calculated the normalized Total Mutational Load (TML), i.e., the total number of mutations observed in the sequence compared to its reference, divided by the reference length. We calculated the normalized TML for both the whole genome and solely for the Spike gene, for the most predominant circulating variants in Chile (cf. Fig. 2a).
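A minimal sketch of the normalized TML calculation is shown below. It assumes that sample and reference are already aligned, counts every differing alignment column as one mutation (so a multi-base deletion counts several times, which may differ from the counting used in this work), and uses the lengths of the NC_045512.2 genome (29,903 bp) and of its Spike gene (3,822 bp) as reference lengths.

```python
GENOME_LENGTH_BP = 29903  # NC_045512.2, whole genome
SPIKE_LENGTH_BP = 3822    # Spike (S) gene of the same reference

def count_differences(ref_aln, sample_aln, ignore="Nn"):
    """Count alignment columns where the sample differs from the reference.

    Columns with an ambiguous base (N) in either sequence are skipped.
    """
    if len(ref_aln) != len(sample_aln):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for r, s in zip(ref_aln, sample_aln)
        if r != s and r not in ignore and s not in ignore
    )

def normalized_tml(n_mutations, region_length_bp, per_bp=1000):
    """Mutations per `per_bp` bases of the reference region (per kbp by default)."""
    return per_bp * n_mutations / region_length_bp
```

With these lengths, a single additional mutation shifts the normalized TML by about 0.26 per kbp in the Spike gene but only by about 0.03 per kbp in the whole genome, which is why Spike values appear discrete.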

We observed a statistically significant enrichment of mutations in the Spike gene: the differences in the median normalized TML between the whole genome and the Spike gene were significant for the most predominant lineages in Chile. Among these variants, Gamma had the highest number of mutations in the Spike gene, followed by Alpha, Lambda, B.1.1.348, and finally B.1.1 with the lowest TML (Fig. 2a). The Spike gene showed a marked dispersion in the normalized TML in all samples compared to the whole genome. As the main differences between variants were in the degree of mutation enrichment in the Spike gene, we explored whether there is a correlation between the TML in the Spike gene and the relative transmissibility of the different variants.

The normalized TML in the Spike gene shows a marked linear correlation with the relative transmissibility of the most predominant lineages (Fig. 2c). The Gamma variant had the highest total mutational load in the Spike gene compared to its whole genome (cf. Fig. 2c). Furthermore, its relative reproduction number is markedly larger than those of the other variants, and its share among collected samples shows a marked increasing trend (cf. Fig. 1h). On the other hand, the Lambda variant was found to have a lower TML in the Spike gene than the Alpha variant while having a larger relative reproduction number. However, its relative prevalence in the population shows a decreasing trend. This might suggest that the spread of a given variant is related not only to the number of mutations but also to the composition of the accumulated mutation pattern, reflecting synergistic or epistatic interactions between mutations.
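The kind of linear relationship discussed here can be quantified with an ordinary least-squares fit and its coefficient of determination. The helper below is a generic sketch (the function name is hypothetical, and it does not propagate the uncertainty of either variable); it would take the per-lineage Spike TML as `x` and the relative reproduction number as `y`.

```python
import numpy as np

def linear_fit_r2(x, y):
    """Least-squares line y = slope * x + intercept and its R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = float(np.sum(residuals ** 2))       # unexplained variation
    ss_tot = float(np.sum((y - y.mean()) ** 2))  # total variation
    return float(slope), float(intercept), 1.0 - ss_res / ss_tot
```

With only five lineages, such a fit has few degrees of freedom, so the resulting R^2 should be read together with the error bars of the individual points.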

Figure 2: Predominant variants are enriched with mutations in the Spike gene. a. When analyzing their normalized total mutational load (TML), namely, the number of mutations divided by the size of the gene and multiplied by 1000 (i.e., mutations per kbp), we observe that the number of mutations in the Spike gene (yellow) is above the average (red) for all variants analyzed herein. Furthermore, variants with higher normalized TML are consistent with those contributing more strongly to the transmission. The discreteness of the Spike gene mutational load is due to the shorter gene length. The white points denote the median, black boxes denote the interquartile ranges, and whiskers (thin black lines) extend to at most 1.5 times the length of the interquartile range. b. The most predominant variants do not show a considerable drift in their average TML over time. However, the TML cannot account for the replacement of mutations or other genetic dynamics, suggesting that it should not be used as a stand-alone measure of variability. Dotted lines mark weeks where the variants were not observed. c. The normalized TML in the Spike gene correlates positively with the relative contribution to the spread of the analyzed lineages. Furthermore, the discrepancy between the observed TML and the linear regression is quite low. Error bars denote 95% intervals. Vertical error bars are those reported in Figure 1, and horizontal error bars were estimated through bootstrapping.
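A percentile bootstrap is one common way to obtain such horizontal intervals. The sketch below assumes that the statistic of interest is the mean normalized TML of a lineage and that samples are resampled with replacement; the statistic, the number of resamples, and the function name are assumptions, not details taken from this work.

```python
import numpy as np

def bootstrap_ci(values, stat=np.mean, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for `stat` of a one-dimensional sample."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Each row of idx is one resample (with replacement) of the original indices.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    stats = stat(values[idx], axis=1)
    lo, hi = np.quantile(stats, [(1 - level) / 2, (1 + level) / 2])
    return float(lo), float(hi)
```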

Local samples differ significantly from reference genomes

Chilean samples of different variants systematically exhibit mutation patterns that are not present in the reference lineages, i.e., they drift from the minimal list of defining mutations presented in Supplementary Table S1. We selected the mutations present in at least 1% of the samples of each lineage and sorted them according to their frequency of occurrence (Fig. 3a–e). In the five most predominant lineages, besides the lineage-defining mutations, we observe novel ones that are not restricted to the Spike gene.
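The selection and sorting of mutations can be sketched as below, assuming that each sample of a lineage is represented as a set of mutation labels. The `gene:change` label format is a hypothetical convention used only in this example.

```python
from collections import Counter

def frequent_mutations(sample_mutations, min_fraction=0.01):
    """Mutations present in at least `min_fraction` of samples, least frequent first.

    sample_mutations: list with one collection of mutation labels per sample.
    Returns a list of (label, fraction_of_samples) pairs.
    """
    n_samples = len(sample_mutations)
    # set() guards against a label being listed twice for the same sample.
    counts = Counter(m for muts in sample_mutations for m in set(muts))
    freq = {m: c / n_samples for m, c in counts.items() if c / n_samples >= min_fraction}
    return sorted(freq.items(), key=lambda item: item[1])
```

Sorting in ascending order mirrors the layout of Figure 3, where the observation frequency grows to the right.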

The B.1.1 lineage, first reported on March 1, 2020, has been detected in 47,500 sequences worldwide. In South America, Brazil alone has contributed 20,702 sequences. Signature mutations of this lineage are in the N gene (R203K, G204R), the Spike gene (D614G), and in the ORF1b gene (P314L) [48]. Among the samples we have sequenced in Chile, the most predominant non-defining mutations in the Spike protein were A262S and G1167A (Fig. 3e and Table S1).

The B.1.1.348 lineage is widely present in South America and in some European countries [11]. Our analysis found four mutations in the Spike protein: D614G, G1167A, R346K, and S373P. The R346K mutation, together with A348T and N354K, has been suggested to improve the transmissibility of SARS-CoV-2 due to its higher binding affinity to the ACE2 receptor [49]. In addition, the S373P mutation in the RBD domain has been reported to partially escape the immunity granted by mRNA vaccines and to decrease plasma therapy success [50]. Therefore, both the R346K and S373P mutations in B.1.1.348 are of particular interest, and their progression should be carefully monitored, since they have been reported to favor the transmission of the virus and simultaneously reduce the effectiveness of vaccination [50].

The Alpha VOC, first detected in the UK in mid-2020, has among its defining non-synonymous mutations the N501Y mutation in the RBD domain and the deletion at positions 69 and 70 of the Spike protein, associated with enhanced transmissibility and pathogenicity [51, 52]. In particular, its base reproduction number has been estimated to be around 4.5 [42]. To date, we have analyzed 122 samples classified as the Alpha variant and, in addition to its defining mutations, we found three new Spike mutations in about 20% of the samples: G1219V, L938F, and S493P. The S493P mutation is in close contact with the ACE2 binding region because it is located in the RBD domain. Furthermore, evolutionary analyses have found that S493P is under strong positive selection, altering the binding affinity to human ACE2 [53].

The Lambda VOI has eight defining mutations in the ORF1a gene (T1246I, P2287S, F2387V, L3201P, T3255I, G3278S, P314L, and Δ3675-3677) and seven mutations in the Spike gene (Δ246-252, G75V, T76I, L452Q, F490S, D614G, and T859N) [54, 55]. To date, 790 samples have been identified as Lambda since January 2021, making it the second most predominant variant in Chile. In addition to its defining mutations, in Chilean Lambda samples we found the R246_D253delinsN mutation as a characteristic deletion and insertion of the lineage and no deletions in the ORF1a gene (Fig. 3e). Other less predominant non-synonymous mutations are presented in Fig. 3e and summarized in Supplementary Table S1.

The Gamma VOC has 21 defining mutations, of which ten occur on the Spike gene (L18F, T20N, P26S, D138Y, R190S, K417T, E484K, N501Y, D614G, H655Y and T1027I) [56, 45], where the most relevant are K417T, E484K, and N501Y, located in the receptor-binding domain (RBD) of Spike [45] (Table S1). Complementary to its defining mutations, we discovered among the Chilean Gamma samples an additional mutation in the Spike, V1176F, which has been reported to produce a more severe course of the disease [57]. Additionally, we found mutations on non-structural proteins such as ORF3a, NSP3, and NSP12, which play a relevant role in the evolution and spread of the virus.

Figure 3: There exist different groups of co-occurring mutations under the same lineage category, indicating convergent evolution. a–e. We examined the sequences of all samples collected for the predominant variants and determined whether they present a given mutation (filtered to be present in at least 1% of the samples). The mutations are grouped and sorted by co-occurrence, and observation frequency grows to the right of the plots. Mutations highlighted in red have only been found in this study in the respective lineages. Supplementary Table S1 summarizes the defining mutations for each lineage.

Temporal drift of predominant variants suggests evolutionary pressure driven by vaccination

As variants drifting from their lineage reference genome defined marked groups (cf. Fig. 3), we explored whether there was a temporal structure in their evolution. We analyzed the frequency of non-defining mutations observed among the samples for each lineage, selecting only those trends that present the largest variance.
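This selection can be sketched as follows, assuming that each sample is stored as a (week, set of mutation labels) pair. The data layout, the function names, and the number of mutations retained are assumptions made for illustration.

```python
import numpy as np

def weekly_mutation_frequency(records, mutations, weeks):
    """Fraction of samples carrying each mutation in each week.

    records  : list of (week, set_of_mutation_labels), one entry per sample
    Returns an array of shape (len(mutations), len(weeks)); weeks without
    samples are filled with NaN.
    """
    freq = np.full((len(mutations), len(weeks)), np.nan)
    for j, week in enumerate(weeks):
        week_sets = [muts for wk, muts in records if wk == week]
        if not week_sets:
            continue
        for i, mutation in enumerate(mutations):
            freq[i, j] = sum(mutation in s for s in week_sets) / len(week_sets)
    return freq

def most_variable(freq, mutations, top=10):
    """Mutations whose weekly frequency has the largest variance over time."""
    variance = np.nan_to_num(np.nanvar(freq, axis=1), nan=0.0)
    order = np.argsort(variance)[::-1][:top]
    return [mutations[i] for i in order]
```

The resulting frequency matrix corresponds to the kind of heat-map shown in Figure 4, with one row per mutation and one column per week.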

Samples belonging to the B.1.1 and B.1.1.348 lineages (Fig. 4a,b) presented the greatest variability in their mutational profile. Although their profiles were highly dispersed, we did not observe a marked temporal variation in these trends (Fig. 2b, yellow and blue lines). Furthermore, these lineages occurred less frequently than the others (especially B.1.1), suggesting that subsampling-induced noise can explain part of the variability. Nonetheless, these two lineages seemed to be strongly affected by vaccination, as they completely disappeared by the end of the analyzed period. As vaccination progressed, the NSP3_F106F mutation for B.1.1 (Fig. 4a), and the NSP12_N733N and NSP13_L438L mutations for B.1.1.348 (among other mutations highlighted in bold in Fig. 4b) steadily became less frequent.

Figure 4: Signatures of the settlement, replacement, and selection of mutations in the different observed lineages of SARS-CoV-2. Throughout 2021, the set of mutations present in the analyzed samples of the predominant lineages has changed. This temporal evolution of the mutational footprint of the lineages can be quantified by the proportion of the analyzed samples that present a given mutation. We selected the mutations with the largest temporal variability for each lineage and present their evolution as a heat-map. a–e: Evolution of the fraction of the samples presenting a given mutation for the B.1.1 (a), B.1.1.348 (b), Alpha (c), Gamma (d), and Lambda (e) variants, respectively, with their number of observations. Triangle markers at the lower end of each heat-map indicate the progress of vaccination. As vaccination progressed, some mutations were no longer frequent in the analyzed samples. For example, the P411P (NSP12 gene), S36S (NSP2 gene), F106F and F1089F (NSP3 gene) mutations in Alpha (c) and the D156D (NSP1 gene), Y31Y (NSP9 gene), D139D (NSP12 gene), R203R (N gene) and D10D, P1200P, V1298V (NSP3 gene) mutations in Gamma (d) were not frequently observed after reaching the fully vaccinated population milestone. On the other hand, mutations in the B.1.1 (a) variant are highly variable, which could be a subsampling artifact, and the mutational profile of the Lambda variant does not change significantly over time or with the progress of vaccination (e).

Lineages with the highest relative reproduction number and TML, i.e., Gamma, Lambda, and Alpha (cf. Fig. 2), persisted throughout vaccine roll-out. For instance, several Alpha-specific mutations became less frequent as vaccination progressed (Fig. 4c): mutations in the NSP2, NSP3, and NSP12 genes became less frequent once vaccination coverage was noticeable, while mutations in the Spike gene started to appear by the end of the analyzed period. Similarly, the Gamma variant lost several mutations in the NSP1, NSP3, NSP12, NSP9, and N genes while keeping most of the mutations in the Spike domain (Fig. 4d). Remarkably, the Gamma lineage steadily increased its share among the circulating variants, as a growing fraction of samples was assigned to it.

Finally, the circulating Lambda variants showed a relatively stable set of predominant mutations, as variability was only noticeable when Lambda samples were less frequent (Fig. 4e, histogram). However, there was a slight tendency to incorporate more mutations in the Spike protein.

Discussion

Using a Bayesian model, we estimated the relative transmissibility of Chile’s most predominant SARS-CoV-2 variants and explored the genetic rationale behind the observed behaviors. Our results agree well with those reported in the literature and constitute, to our knowledge, the first approach to quantifying the spreading properties of the Lambda variant. We also update early estimates of the transmissibility of the Gamma variant, incorporating novel mutations identified in our samples. Finally, we analyzed our results in light of the genomic properties of the circulating lineages, correlating transmissibility with the number of mutations in the Spike gene and finding significant effects.

For inferring the transmissibility of the different variants, we made a set of assumptions aiming to overcome the challenges posed by subsampling in countries with limited genomic surveillance. Our methodology is general and easily adaptable to scenarios with more circulating variants and to other countries. Even though more data would help to narrow the confidence intervals for our inferred parameters, the results were statistically consistent and agree well with the evidence already reported [45, 58]. For example, the mean increase in reproduction number for the Gamma variant compared to Alpha found in [58] is close to our result (1.16, 95% CI [1.11, 1.21]). However, whether differences between variants are due to immune escape or enhanced transmissibility cannot be disentangled from the data available.

The estimated reproduction number depends not only on the estimated spreading rate but also on the choice of the generation interval of infections (namely, the time between consecutive infections). However, whether changes in the reproduction number are due to changes in the spreading rate or in the generation interval cannot be disentangled from the data available. Therefore, if different variants had different generation intervals, the estimates of their relative reproduction numbers would be affected. We performed our analysis during a period where the effective reproduction number was close to one, minimizing the effect of potentially differing generation intervals between variants. However, differing generation intervals among variants would lead to larger differences in the estimated relative reproduction numbers in waves featuring a high reproduction number.
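The argument can be made explicit under a strong simplifying assumption that is not part of the model used in this work: every infection is transmitted exactly one generation time after the previous one. The symbols r (exponential growth rate) and T (generation time) are introduced here only for illustration.

```latex
% Assumption: fixed generation time T, exponential growth rate r.
R = e^{rT}
\quad\Longrightarrow\quad
\frac{R_v}{R_{\mathrm{ref}}} = e^{\,r_v T_v - r_{\mathrm{ref}} T_{\mathrm{ref}}}

% If a common generation time T = T_{\mathrm{ref}} is assumed for all variants,
% while variant v actually has generation time T_v:
\frac{\text{estimated ratio}}{\text{true ratio}}
  = \frac{e^{(r_v - r_{\mathrm{ref}})\,T}}{e^{\,r_v T_v - r_{\mathrm{ref}} T}}
  = e^{\,r_v (T - T_v)}
  \approx 1 + r_v\,(T - T_v) \quad \text{for small } r_v
```

When the reproduction number is close to one, the growth rate is close to zero and the bias factor approaches one regardless of the mismatch in generation times; in fast-growing waves, the same mismatch produces a proportionally larger error.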

We assumed that the influx of infections (and thereby, of variants) was proportional to the COVID-19 incidence in neighboring countries and evenly distributed across all tracked variants. Currently, the influx corresponds to a tiny percentage of the total cases (cf. Fig. 1b and Fig. S2a–e). However, more exact modeling would be required when neighboring countries have considerably more cases than the country of study, as the influx can then considerably affect community spread.

Besides the modeling aspects, our results strongly rely on data and are thus affected by its quality. Samples analyzed in this work were collected from hospitals belonging to the influenza surveillance network, strategically distributed across the country, and proportionally to the local COVID-19 incidence at the time of collection. Despite these efforts, logistic challenges related to sample transportation might have caused deviations from optimal representative sampling. However, as described in Methods, our model can mitigate these effects by penalizing highly correlated samples. Sequencing capabilities also increased considerably in the period analyzed as more resources were allocated to genomic surveillance. Besides, as novel variants are officially defined a considerable time after their emergence, previous samples have to be reassigned (as was the case for the Lambda variant), posing a trade-off between analyzing newly sequenced samples and historical data. Overall, we assume that the data is representative and that our modeling framework mitigates deviations from representative sampling.

Mutations in the observed samples were typically missense, i.e., they caused changes at the protein level. Although all vaccines, and therefore vaccine-elicited antibodies, are targeted towards the SARS-CoV-2 Spike protein, mutational data suggest that evolutionary pressure was also exerted on other viral genes. This becomes evident after reaching the 20% vaccination milestone, i.e., when the eldest 20% of the population was vaccinated, in weeks 13–14. Many earlier mutations in non-Spike genes disappeared during this transition, while others increased in frequency (cf. Supplementary Table S5 and Fig. 4). However, with the data we have, we cannot infer a causal relationship between the vaccination process and the extinction or appearance of mutations. Receding lineages (B.1.1 and B.1.1.348) tended to develop new Spike mutations before disappearing, while thriving lineages (Gamma and Lambda) tended to conserve and fix pre-existing Spike mutations. In contrast, all lineages, except for Lambda, consistently developed non-Spike mutations during vaccine roll-out. The remaining variants were probably selected through epistatic fitness of a restricted protein subgroup, particularly the Spike (S) and nucleocapsid (N) proteins [59]. On the other hand, our genomic data show that both the Gamma and Lambda lineages seem to have evolved successful Spike mutations prior to vaccination campaigns, suggesting a better survival of these variants when confronted with vaccine-elicited antibodies.

Furthermore, our data suggests co-occurring mutations between the Spike and other viral proteins. This could also be putatively epistatic and driven by host adaptation. However, the evidence of extinction of non-synonymous mutations in non-Spike proteins suggests a selection mechanism that combines wiping-out of variants generated by genetic drift together with positive selection of fittest Spike and, possibly, nucleocapsid variants. This process seems to have eliminated virus variants that were unfit when confronted with vaccine-elicited antibodies and also carried inconsequential non-synonymous mutations, which resulted in the fixation and thriving of escape variants. However, whether there is causality behind this correlation should be separately studied.

Specific mutations in the Spike gene of the Gamma and Lambda variants were crucial for the survival of these variants during vaccine roll-out. For the Gamma variant, Spike mutations have been associated with enhanced transmissibility (N501Y) and with partial immune escape (K417T and E484K) [60]. For the Lambda variant, the Spike mutations L452Q and F490S and the deletion 246-252 conferred partial immune escape against neutralizing antibodies elicited by CoronaVac, and a higher infectiousness than the Gamma variant [61]. However, regarding its transmissibility, our results indicate that it is not higher than that of the Gamma variant (cf. Fig. 1l,m).

Even though some of the lineages we report have already been studied in other countries and settings, Chilean samples differ from the GISAID/Pangolin reference genomes. In the context of transmissibility, variants showing a considerable drift from the original lineage can have enhanced properties. These properties make them more transmissible (i.e., their reproduction number increases) or counteract the host population’s partial natural immunity to the lineage, thus enabling its community transmission. The above suggests the necessity of thoroughly characterizing both genomic and epidemiological properties of variants and highlights the importance of performing genomic surveillance both in community infections and at entry points to the country. In this sense, genomic surveillance may allow identifying hidden fast-spreading lineages before they become a threat. In turn, experts can propose timely preventive policies that are more accessible and entail fewer infection and death risks compared to later corrective policies. For instance, the use of vaccine-dependent green passports or mobility allowances could be instated or withheld following the advance or retreat of fast-spreading or vaccine-resistant lineages. Although variants that emerged abroad resulted from different selective pressures (host immune response, vaccination, NPIs), they thrived in the particular Chilean environment and thus must be controlled swiftly to avoid new pandemic waves [21].

In summary, the methodology proposed in this work, supported by sufficient active genomic surveillance, can promptly detect all the circulating variants and estimate their transmissibility. Quantifying their contribution to contagion early on, we can assess whether they will endanger containment should they become predominant, and thus enable early eradication if they are evaluated to pose a threat. Therefore, through genomic surveillance, we could detect situations in which early control and lockdown could save us months of restrictions and fatalities.

Methods

Nucleic acid extraction and amplification

Nasopharyngeal samples, previously confirmed as positive for SARS-CoV-2, were used for total nucleic acid extraction using the automated system Zybio EXM 6000. Reverse transcription for cDNA synthesis was performed with the SuperScript III One-Step RT-PCR System with Platinum Taq Kit and RNase OUT (Invitrogen), with 2 mM random primers and 4.5 M DTT at 55 °C for 60 min. cDNA was amplified based on the COVID-19 ARTIC Illumina Library Construction and Sequencing Protocol V.3 (Farr, 2020), generating two pools of 400 bp amplicons covering the whole viral genome.

Library preparation

DNA fragments from each pool were mixed and the library was prepared with the Illumina DNA PREP kit (Illumina, San Diego, CA, USA), purified using Agencourt AMPure XP beads (Beckman Coulter, Brea, CA, USA), and quantified with a Victor Nivo Fluorimeter (Perkin Elmer) using the Quant-it dsDNA HS Assay Kit (Invitrogen). DNA libraries were sequenced on a MiSeq (Illumina) using a 300-cycle kit. Around 0.3 GB of data was obtained for each sample.

Whole Genome Sequence analysis

Sequence quality was analyzed with FastQC software v0.11.8. Reads were filtered and trimmed with BBDuk software, requiring a minimum length of 36 bases and a quality above 20. Genome assembly was performed with IRMA software v0.9.3, using the NCBI entry NC_045512.2 as the reference sequence. Genomes were aligned with MAFFT v7.458, and the lineages for the assembled sequences were assigned with Pangolin v3.1.5 [37]. Final genomes with epidemiological metadata were submitted to https://www.gisaid.org/ for the final quality check and the corrected lineages. We analyzed 3956 SARS-CoV-2 sequencing samples at the Chilean Public Health Institute (ISP), obtained from January 2021 to date, of which 3443 yielded good results in terms of quality and genome coverage. We used Pangolin to assign the variant classification for samples with good quality measures.

Determination of Total Mutational load

From the mutational data, we built an $N \times M$ mutation count matrix $X$, considering all types of mutations and deletions. In the matrix, $N$ is the number of samples (2726, considering only those belonging to the five lineages studied herein), and $M$ is the number of genes (25). Therefore, the value in entry $x_{ij}$ indicates the number of mutations and deletions of gene $j$ in sample $i$. Later, we computed the Total Mutational Load (TML) for each sample, equivalent to the total number of mutations divided by the length of the reference sequence (the Spike gene or the whole genome) and expressed per 1 kb (kilobase):

$$\mathrm{TML}_i = \frac{1000}{L} \sum_{j \in G} x_{ij} \qquad (1)$$

where $G$ is the set of genes considered (the Spike gene alone or all $M$ genes) and $L$ accounts for the corresponding sequence length, 3821 and 29903 bp for the Spike gene and whole genome, respectively. We then studied whether there was a statistically significant enrichment of mutations in the Spike gene. For that, we first applied Levene’s test to evaluate whether, for a given lineage, the distributions of normalized TML for the whole genome and for the Spike gene only have equal or different variances. Then, as the test confirmed that variances were different for all lineages, we used a non-parametric Mann–Whitney U test to assess whether the medians of the two categories were significantly different for every variant. Results for both assessments are summarized in Supplementary Table S2.
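As a minimal sketch of the normalization described above, the per-sample TML can be computed as follows. The function name, the dictionary layout, and the example counts are illustrative assumptions, not taken from the original analysis; only the two reference lengths come from the text.

```python
SPIKE_LENGTH_BP = 3821      # reference Spike length quoted in the text
GENOME_LENGTH_BP = 29903    # reference genome length quoted in the text


def total_mutational_load(mutation_counts, genes, length_bp):
    """Mutations per kilobase for one sample.

    mutation_counts: dict mapping gene name -> number of mutations/deletions
    genes: iterable of gene names to include (only "S", or all genes)
    length_bp: length of the corresponding reference sequence in base pairs
    """
    total = sum(mutation_counts.get(gene, 0) for gene in genes)
    return 1000.0 * total / length_bp


# Hypothetical sample: 12 Spike mutations and 36 mutations genome-wide.
sample = {"S": 12, "ORF1a": 14, "ORF1b": 4, "N": 6}
tml_spike = total_mutational_load(sample, ["S"], SPIKE_LENGTH_BP)
tml_genome = total_mutational_load(sample, sample.keys(), GENOME_LENGTH_BP)
```

For this made-up sample the Spike TML exceeds the whole-genome TML, which is the kind of enrichment the variance and rank tests are then meant to assess across all samples of a lineage.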

Inferring the variant specific contribution to the spread

We built our model on top of our existing spreading dynamic model [41] to assess the relative transmissibility of the different variants in Chile. Given different data, this model can be easily adapted for other countries or time frames.

We simulated the spread of each variant independently, with a susceptible pool shared across the different variants. For each variant, we computed the number of newly exposed individuals iteratively, given prior distributions and the generation interval distribution, which carries a hyperprior. This follows the work of [38, 39, 40]. To account for non-pharmaceutical interventions or other measures against the spread, we introduced a time-dependent effective reproduction number, which is allowed to change every 14 days relative to the previous reproduction number.

For each variant, the effective reproduction number was modulated by a time-invariant, variant-specific factor, called the relative reproduction number in the text. Additionally, to account for cases induced by travel, we added a small random influx for each variant, scaled by the reported case numbers in the neighboring countries (we used Argentina, Peru, and Brazil). In discrete form, the spreading dynamics in our model read as:

(2)
(3)
(4)
(5)
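The structure of these dynamics can be illustrated with a minimal sketch of a discrete renewal process in which all variants draw from one susceptible pool. All names below are hypothetical, and details such as whether the influx is scaled by the susceptible fraction are assumptions made for illustration rather than the model's exact equations.

```python
def simulate_variants(r_eff, rel_factors, influx, generation_pmf, seed_cases, population):
    """Discrete renewal dynamics for several variants sharing one susceptible pool.

    r_eff: list of effective reproduction numbers, one per simulated day
    rel_factors: dict variant -> time-invariant relative reproduction number
    influx: dict variant -> list of daily imported cases
    generation_pmf: list, generation_pmf[k] = weight of a (k + 1)-day interval
    seed_cases: dict variant -> list of initial daily new cases (history)
    population: population size, used to initialize the susceptible pool
    """
    new_cases = {v: list(seed_cases[v]) for v in rel_factors}
    susceptible = float(population)
    for t in range(len(r_eff)):
        todays = {}
        for v, factor in rel_factors.items():
            history = new_cases[v]
            # Infectious pressure: past cases weighted by the generation interval.
            pressure = sum(
                history[-(k + 1)] * w
                for k, w in enumerate(generation_pmf)
                if k < len(history)
            )
            todays[v] = (susceptible / population) * factor * r_eff[t] * pressure
            todays[v] += influx[v][t]  # assumption: influx is added unscaled
        for v, value in todays.items():
            new_cases[v].append(value)
        # All variants deplete the same susceptible pool.
        susceptible = max(susceptible - sum(todays.values()), 0.0)
    # Drop the seeding history and return only the simulated days.
    return {v: cases[len(seed_cases[v]):] for v, cases in new_cases.items()}
```

With equal seeding and no influx, a variant with a larger relative factor overtakes the others, which is the signal the inference exploits when fitting the sequenced fractions.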

Here, the population size is that of the country under consideration (Chile). The susceptible pool is initialized with the population size. The prior distributions for the initial new cases, the influx, and the time-invariant contribution factor were set to

(6)
(7)
(8)
(9)
(10)

The external input was modeled on a weekly basis, with its own weekly index, to decrease the number of variables to be estimated. In addition to the five variants mentioned in the main text, we also include in our model the share of sequenced cases not categorized into these five variants (‘other variants’). In contrast to the five main variants, the relative reproduction number of these other variants is allowed to vary over time (described later).

Let $y_{v,t}$ be the measured number of samples successfully sequenced (from samples having a positive PCR test) corresponding to variant $v$ at time $t$. Let $n_t$ be the total number of sequenced samples and $\tau_{v,t}$ the inferred fraction of cases of variant $v$ at time $t$ relative to the total case numbers. If we model the number of samples corresponding to each variant as a multinomial random variable, and assume that samples collected for sequencing are independent, we can build the multinomial likelihood function for our model from our real-world data $y_{v,t}$ and $n_t$ and the fraction $\tau_{v,t}$ of each variant from the model:

$$(y_{1,t}, \dots, y_{V,t}) \sim \mathrm{Multinomial}\left(n_t;\ \tau_{1,t}, \dots, \tau_{V,t}\right) \qquad (11)$$

The fraction $\tau_{v,t}$ is obtained from the model as the ratio between the daily cases of a variant, $E_{v,t}$, and the total daily cases:

$$\tau_{v,t} = \frac{E_{v,t}}{\sum_{v'} E_{v',t}} \qquad (12)$$
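The variant fractions and the multinomial likelihood described above can be sketched in a few lines. The function names and the example numbers are hypothetical; the sketch only shows how modeled daily cases are turned into fractions and compared with sequenced counts.

```python
import math


def variant_fractions(daily_cases):
    """Fraction of each variant among the modeled daily cases."""
    total = sum(daily_cases.values())
    return {v: c / total for v, c in daily_cases.items()}


def multinomial_log_likelihood(counts, fractions):
    """Log-probability of the sequenced counts given the modeled fractions."""
    n = sum(counts.values())
    log_p = math.lgamma(n + 1)          # log(n!)
    for v, y in counts.items():
        log_p -= math.lgamma(y + 1)     # minus log(y_v!)
        if y > 0:
            log_p += y * math.log(fractions[v])
    return log_p


# Hypothetical day: the model predicts 30 Gamma and 10 Lambda cases,
# and one sample of each variant was sequenced.
fractions = variant_fractions({"Gamma": 30.0, "Lambda": 10.0})
log_like = multinomial_log_likelihood({"Gamma": 1, "Lambda": 1}, fractions)
```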

However, in our model we do not use this multinomial likelihood function but instead parameterize our model using its conjugate distribution, the Dirichlet distribution. In theory, this is equivalent to using the multinomial distribution. The advantage is that we can add a factor that accounts for a potentially non-optimal sampling strategy, for example, samples that are not perfectly randomized across the country but are correlated to some extent. Mathematically, this has the consequence that the measured fractions are all reduced by a common factor. Thus, the resulting likelihood function is given by:

(13)
(14)
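One way to picture this construction is the following sketch, which evaluates a Dirichlet density at the modeled fractions with concentrations built from the sequenced counts divided by a reduction factor. This parameterization (counts divided by the factor, plus one) is an assumption chosen so that a factor of 1 matches the multinomial likelihood up to a constant; the names are hypothetical and the model's exact form may differ.

```python
import math


def dirichlet_log_density(fractions, counts, reduction_factor=1.0):
    """Log-density of the model fractions under Dirichlet(counts / factor + 1).

    A reduction_factor above 1 shrinks the effective sample size, which
    widens the distribution and mimics correlated (non-random) sampling.
    """
    alphas = {v: counts[v] / reduction_factor + 1.0 for v in counts}
    log_norm = math.lgamma(sum(alphas.values()))
    log_norm -= sum(math.lgamma(a) for a in alphas.values())
    return log_norm + sum(
        (alphas[v] - 1.0) * math.log(fractions[v]) for v in counts
    )
```

With a larger reduction factor, the same counts discriminate less sharply between candidate fractions, so the sequencing data constrain the model more weakly.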

To infer the slowly changing reproduction number, we introduce sigmoidal change points relative to the previous reproduction number, with the priors for the date of occurrence of the change points spaced every 14 days. The priors on the transient length and on the date of each change point are relatively flat to express our uncertainty in these values.

(16)
(17)
(18)
(19)
(20)
(21)
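The idea of sigmoidal change points acting on the log-transformed reproduction number can be illustrated with a short sketch. The logistic form and the factor of 4 in the exponent are assumptions for illustration, and all names are hypothetical.

```python
import math


def log_reproduction_number(t, log_r0, change_points):
    """Log of the reproduction number at day t.

    change_points: list of (date, transient_length, delta_log_r) tuples, where
    delta_log_r is the change relative to the previous reproduction number.
    """
    log_r = log_r0
    for date, length, delta in change_points:
        # Logistic transition centered on `date`, spread over about `length` days.
        log_r += delta / (1.0 + math.exp(-4.0 * (t - date) / length))
    return log_r
```

Well before a change point the contribution vanishes, at the change point half of the step has been applied, and well after it the full step is in effect, so successive change points compose multiplicatively on the reproduction number itself.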

For the five variants that we focused on in the main text, the effective reproduction number is multiplied by a time-invariant relative reproduction number. For the spread of the ‘other variants’, which we modeled separately, we multiplied it by a time-dependent relative reproduction number, as the mixture of variants can slowly change over time. We assumed that this change is slower than that of the effective reproduction number:

(22)
(23)
(24)
(25)
(26)
(27)

In addition to the sequenced samples, we constrain our model using the publicly reported case numbers (in Chile) aggregated by the Johns Hopkins University [62]. We sum over the newly infected pools for all variants to obtain the total number of new infections. These are then delayed with a LogNormal kernel, with a given mean delay, to account for the reporting delay, and further modulated by a weekly absolute sine function parameterized by an amplitude and an offset.

(28)
(29)
(30)
(31)
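The delay and weekly modulation steps can be sketched as below. The kernel is parameterized by its median for simplicity, and the exact shape of the weekly factor is an assumption; both choices and all names are illustrative.

```python
import math


def lognormal_kernel(median_delay, sigma, n_days):
    """Discretized, normalized LogNormal delay kernel over delays 1..n_days."""
    weights = []
    for d in range(1, n_days + 1):
        z = (math.log(d) - math.log(median_delay)) / sigma
        weights.append(math.exp(-0.5 * z * z) / (d * sigma * math.sqrt(2.0 * math.pi)))
    total = sum(weights)
    return [w / total for w in weights]


def delay_cases(new_infections, kernel):
    """Convolve daily infections with the kernel (kernel[0] is a 1-day delay)."""
    delayed = []
    for t in range(len(new_infections)):
        delayed.append(sum(
            new_infections[t - d] * w
            for d, w in enumerate(kernel, start=1)
            if t - d >= 0
        ))
    return delayed


def weekly_modulation(t, amplitude, offset):
    """Multiplicative reporting factor with a 7-day period built from |sin|."""
    return 1.0 - amplitude * (1.0 - abs(math.sin(math.pi * t / 7.0 - 0.5 * offset)))
```

Reported cases on day t would then be approximated by the delayed infections multiplied by the weekly factor, which dips once per week to mimic reduced weekend reporting.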

The likelihood given the reported case numbers is then modeled by a Student's t distribution and quantifies the similarity between the model outcome and the available real-world time series. The scale factor heuristically incorporates the measurement noise.

(32)
(33)
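A Student's t likelihood of this kind can be sketched as follows. The choice of four degrees of freedom and of a scale growing with the square root of the modeled cases are assumptions for illustration, and the function names are hypothetical.

```python
import math


def student_t_logpdf(x, nu, mu, sigma):
    """Log-density of a location-scale Student's t distribution."""
    z = (x - mu) / sigma
    return (
        math.lgamma(0.5 * (nu + 1.0)) - math.lgamma(0.5 * nu)
        - 0.5 * math.log(nu * math.pi) - math.log(sigma)
        - 0.5 * (nu + 1.0) * math.log1p(z * z / nu)
    )


def case_log_likelihood(reported, modeled, scale_factor, nu=4.0):
    """Sum of Student's t log-densities over all days of the time series."""
    total = 0.0
    for obs, pred in zip(reported, modeled):
        # Assumed noise model: the scale grows with the square root of the cases.
        sigma = scale_factor * math.sqrt(pred + 1.0)
        total += student_t_logpdf(obs, nu, pred, sigma)
    return total
```

The heavy tails of the Student's t distribution make the fit less sensitive to occasional outliers in the reported numbers than a Gaussian likelihood would be.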

For a complete list of model parameters and priors see Table S3 and Table S4 respectively.

Author Contributions

Conceptualization: KYO, SC, SBM, JD, SB, ÁO-N, VP
Methodology: KYO, SBM, JD, VP
Software: KYO, SBM, JD
Validation: KYO, SC, JD, SBM, SB, ÁO-N, JF, VP
Formal analysis: KYO, SC, JD, SBM, DM-O
Investigation: KYO, SC, JD, SBM, AC, KM, PC, SU, AC, DM-O
Resources: KYO, JF, AC, KM, SO, PC
Data curation: KYO, JF, AC
Writing - Original Draft: KYO, SC, SB, SBM, JD, ENI, ÁO-N, AC, KM, PC, SU, RAV
Writing - Review & Editing: KYO, SC, JD, SBM, ENI, JF, VP, RAV
Visualization: KYO, SC, JD, SBM
Supervision: ÁO-N, JF, VP
Project administration: KYO, SC, ÁO-N, VP
Funding acquisition: ÁO-N, JF, VP.

Data availability

Some source code for data generation and analysis is available online on GitHub: https://github.com/Priesemann-Group/covid19_inference. Sequencing of the samples was performed at the Chilean Public Health Institute (ISP). All genomes sequenced by ISP are hosted in the GISAID Initiative [11]. Additionally, for the Bayesian inference we used the daily case reports for Chile, Brazil, Argentina, and Peru aggregated by the Johns Hopkins University [62].
Note: To have a look at the code for the Bayesian analysis before its public release, feel free to write a message to SBM at sebastian.mohr@ds.mpg.de. It will be publicly available at a later point in time.

Acknowledgments

We thank the Priesemann group for exciting discussions and for their valuable input, and the Molecular Genetics and Viral Diseases Sub Departments of the ISP for their valuable assistance. We thank Anamaria Sanchez Daza for carefully reading, commenting, and improving the manuscript. All authors with affiliation (2) received support from the Max-Planck-Society. SC, DM-O, and ÁO-N received funding from PIA-FB0001, ANID, Chile. JD and SBM received funding from the "Netzwerk Universitätsmedizin" (NUM) project egePan (01KX2021). This project is also supported by grant no. COVID0557 by ANID.

References

  • [1] Sebastian Contreras and Viola Priesemann. Risking further COVID-19 waves despite vaccination. The Lancet Infectious Diseases, 2021.
  • [2] Simon Bauer, Sebastian Contreras, Jonas Dehning, Matthias Linden, Emil Iftekhar, Sebastian B Mohr, Álvaro Olivera-Nappa, and Viola Priesemann. Relaxing restrictions at the pace of vaccination increases freedom and guards against further covid-19 waves. arXiv preprint arXiv:2103.06228, 2021.
  • [3] Joao Viana, Christiaan H van Dorp, Ana Nunes, Manuel C Gomes, Michiel van Boven, Mirjam E Kretzschmar, Marc Veldhoen, and Ganna Rozhnova. Controlling the pandemic during the SARS-CoV-2 vaccination rollout: a modeling study. Nature communications, 12(3674):1–15, 2021.
  • [4] Jennie S Lavine, Ottar N Bjornstad, and Rustom Antia. Immunological characteristics govern the transition of COVID-19 to endemicity. Science, 371(6530):741–745, 2021.
  • [5] Sarah Cobey, Daniel B Larremore, Yonatan H Grad, and Marc Lipsitch. Concerns about sars-cov-2 evolution should not hold back efforts to expand vaccination. Nature Reviews Immunology, 21(5):330–335, 2021.
  • [6] Sebastian Contreras, Jonas Dehning, Sebastian B Mohr, F Paul Spitzner, and Viola Priesemann. Low case numbers enable long-term stable pandemic control without lockdowns. medRxiv, 2020.
  • [7] Sebastian Contreras, Jonas Dehning, Matthias Loidolt, Johannes Zierenberg, F Paul Spitzner, Jorge H Urrea-Quintero, Sebastian B Mohr, Michael Wilczek, Michael Wibral, and Viola Priesemann. The challenges of containing SARS-CoV-2 via test-trace-and-isolate. Nature communications, 12(1):1–13, 2021.
  • [8] Robin N Thompson, Edward M. Hill, and Julia R. Gog. Sars-cov-2 incidence and vaccine escape. Lancet Infectious Diseases, 2021.
  • [9] Jessica A. Plante, Brooke M. Mitchell, Kenneth S. Plante, Kari Debbink, Scott C. Weaver, and Vineet D. Menachery. The variant gambit: COVID’s next move. Cell Host & Microbe, 2021.
  • [10] Debra Van Egeren, Alexander Novokhodko, Madison Stoddard, Uyen Tran, Bruce Zetter, Michael Rogers, Bradley L Pentelute, Jonathan M Carlson, Mark S Hixon, Diane Joseph-McCarthy, et al. Risk of evolutionary escape from neutralizing antibodies targeting SARS-CoV-2 spike protein. medRxiv, 2020.
  • [11] Shu, Yuelong and McCauley, John. Gisaid: Global initiative on sharing all influenza data – from vision to reality. Eurosurveillance, 22(13), 2017.
  • [12] David Cyranoski. Alarming COVID variants show vital role of genomic surveillance. Nature, 589(7842):337–338, January 2021.
  • [13] M Shaheen S Malick and Helen Fernandes. The genomic landscape of sars-cov-2: Surveillance of variants of concern. Advances in Molecular Pathology, 2021.
  • [14] Andrew W Bartlow, Earl A Middlebrook, Alicia T Romero, and Jeanne M Fair. How cooperative engagement programs strengthen sequencing capabilities for biosurveillance and outbreak response. Frontiers in Public Health, 9:163, 2021.
  • [15] Mohamed Helmy, Mohamed Awad, and Kareem A Mosa. Limited resources of genome sequencing in developing countries: challenges and solutions. Applied & translational genomics, 9:15–19, 2016.
  • [16] Gonzalo E. Mena, Pamela P. Martinez, Ayesha S. Mahmud, Pablo A. Marquet, Caroline O. Buckee, and Mauricio Santillana. Socioeconomic status determines covid-19 incidence and related mortality in santiago, chile. Science, 2021.
  • [17] Nicolò Gozzi, Michele Tizzoni, Matteo Chinazzi, Leo Ferres, Alessandro Vespignani, and Nicola Perra. Estimating the effect of social inequalities on the mitigation of covid-19 across communities in santiago de chile. Nature communications, 12(1):1–9, 2021.
  • [18] Magdalena Bennett. All things equal? heterogeneity in policy effectiveness against covid-19 spread in chile. World development, 137:105208, 2021.
  • [19] Danton Freire-Flores, Nyna Llanovarced-Kawles, Anamaria Sanchez-Daza, and Álvaro Olivera-Nappa. On the heterogeneous spread of covid-19 in chile. Chaos Solitons & Fractals, 2021.
  • [20] Sebastián Contreras, H Andrés Villavicencio, David Medina-Ortiz, Juan Pablo Biron-Lattes, and Álvaro Olivera-Nappa. A multi-group SEIRA model for the spread of COVID-19 among heterogeneous populations. Chaos, Solitons & Fractals, 136:109925, 2020.
  • [21] Andrés E Castillo, Bárbara Parra, Paz Tapia, Jaime Lagos, Loredana Arata, Alejandra Acevedo, Winston Andrade, Gabriel Leal, Carolina Tambley, Patricia Bustos, et al. Geographical distribution of genetic variants and lineages of sars-cov-2 in chile. Frontiers in public health, 8:525, 2020.
  • [22] Alison Shepherd. Covid-19: Chile joins top five countries in world vaccination league. BMJ, 2021.
  • [23] Ximena Aguilera, Adrian P. Mundt, Rafael Araos, and Thomas Weitzel. The story behind chile’s rapid rollout of covid-19 vaccination. Travel Medicine and Infectious Disease, 2021.
  • [24] Max Roser, Hannah Ritchie, Esteban Ortiz-Ospina, and Joe Hasell. Coronavirus pandemic (covid-19). Our World in Data, 2020. https://ourworldindata.org/coronavirus, (Europe, America, and Oceania and Asia).
  • [25] Kenzo Asahi, Eduardo A. Undurraga, Rodrigo Valdés, and Rodrigo Wagner. The effect of covid-19 on the economy: Evidence from an early adopter of localized lockdowns. Journal of Global Health, 2021.
  • [26] Sebastián Contreras, Juan Pablo Biron-Lattes, H Andrés Villavicencio, David Medina-Ortiz, Nyna Llanovarced-Kawles, and Álvaro Olivera-Nappa. Statistically-based methodology for revealing real contagion trends and correcting delay-induced errors in the assessment of COVID-19 pandemic. Chaos, Solitons & Fractals, 139:110087, 2020.
  • [27] Ministerio de Salud de Chile (MINSAL) Department of Epidemiology. Tech Report: National strategy for test-trace-and-isolate (COVID-19), 3–9, July, 2021 (Estrategia Nacional de Testeo, Trazabilidad y Aislamiento COVID-19, SEMANA DEL 3 - 9 DE JULIO, 2021). https://www.minsal.cl/wp-content/uploads/2021/07/Indicadores-de-Testeo-y-Trazabilidad-13072021.pdf.
  • [28] MINSAL. Vacunas contra sars-cov-2 utilizadas en chile mantienen altos niveles de efectividad para evitar hospitalización, ingreso a uci y muerte, Aug 2021.
  • [29] Alejandro Jara, Eduardo A Undurraga, Cecilia González, Fabio Paredes, Tomás Fontecilla, Gonzalo Jara, Alejandra Pizarro, Johanna Acevedo, Katherine Leo, Francisco Leon, et al. Effectiveness of an inactivated sars-cov-2 vaccine in chile. New England Journal of Medicine, 2021.
  • [30] Jorge González-Puelma, Jacqueline Aldridge, Marco Montes de Oca, Mónica Pinto, Roberto Uribe-Paredes, Jose Fernandez-Goycoolea, Diego Alvarez-Saravia, Hermy Álvarez, Gonzalo Encina, Thomas Weitzel, Rodrigo Muñoz, Álvaro Olivera-Nappa, Sergio Pantano, and Marcelo A. Navarrete. Mutation in a sars-cov-2 haplotype from sub-antarctic chile reveals new insights into the spike’s dynamics. Viruses, 2021.
  • [31] Mónica L Acevedo, Luis Alonso-Palomares, Andrés Bustamante, Aldo Gaggero, Fabio Paredes, Claudia P Cortés, Fernando Valiente-Echeverría, and Ricardo Soto-Rifo. Infectivity and immune escape of the new sars-cov-2 variant of interest lambda. medRxiv, 2021.
  • [32] Pedro E Romero, Alejandra Dávila-Barclay, Guillermo Salvatierra, Luis González, Diego Cuicapuza, Luis Solis, Pool Marcos-Carbajal, Janet Huancachoque, Lenin Maturrano, and Pablo Tsukayama. The emergence of sars-cov-2 variant lambda (c. 37) in south america. medRxiv, 2021.
  • [33] Eduardo Lopez Mora, Jorge Espinoza, Jeannette Dabanch, and Rodrigo Cruz. Emergencia de variante delta-b. 1.617. 2. su impacto potencial en la evolución de la pandemia por sars-cov-2. Boletín Micológico, 36(1), 2021.
  • [34] Pilar Vargas. Comunicado de sochinf sobre variante delta en chile, Jul 2021.
  • [35] Samuel S. Shepard, Sarah Meno, Justin Bahl, Malania M. Wilson, John Barnes, and Elizabeth Neuhaus. Viral deep sequencing needs an adaptive approach: IRMA, the iterative refinement meta-assembler. BMC Genomics, 17(1):708, September 2016.
  • [36] Kazutaka Katoh, Kazuharu Misawa, Kei‐ichi Kuma, and Takashi Miyata. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 30(14):3059–3066, July 2002.
  • [37] Andrew Rambaut, Edward C. Holmes, Áine O’Toole, Verity Hill, John T. McCrone, Christopher Ruis, Louis du Plessis, and Oliver G. Pybus. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nature Microbiology, 5(11):1403–1407, November 2020.
  • [38] Christophe Fraser. Estimating Individual and Household Reproduction Numbers in an Emerging Epidemic. PLoS ONE, 2(8), August 2007.
  • [39] Seth Flaxman, Swapnil Mishra, Axel Gandy, H Juliette T Unwin, Thomas A Mellan, Helen Coupland, Charles Whittaker, Harrison Zhu, Tresnia Berah, Jeffrey W Eaton, and Others. Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe. Nature, pages 1–8, 2020.
  • [40] Jan M. Brauner, Sören Mindermann, Mrinank Sharma, David Johnston, John Salvatier, Tomáš Gavenčiak, Anna B. Stephenson, Gavin Leech, George Altman, Vladimir Mikulik, Alexander John Norman, Joshua Teperowski Monrad, Tamay Besiroglu, Hong Ge, Meghan A. Hartwick, Yee Whye Teh, Leonid Chindelevitch, Yarin Gal, and Jan Kulveit. Inferring the effectiveness of government interventions against COVID-19. Science, 2020.
  • [41] Jonas Dehning, Johannes Zierenberg, F Paul Spitzner, Michael Wibral, Joao Pinheiro Neto, Michael Wilczek, and Viola Priesemann. Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions. Science, 2020.
  • [42] Nicholas G. Davies, Sam Abbott, Rosanna C. Barnard, Christopher I. Jarvis, Adam J. Kucharski, James D. Munday, Carl A. B. Pearson, Timothy W. Russell, Damien C. Tully, Alex D. Washburne, Tom Wenseleers, Amy Gimma, William Waites, Kerry L. M. Wong, Kevin van Zandvoort, Justin D. Silverman, CMMID COVID-19 Working Group, COVID-19 Genomics UK (COG-UK) Consortium, Karla Diaz-Ordaz, Ruth Keogh, Rosalind M. Eggo, Sebastian Funk, Mark Jit, Katherine E. Atkins, and W. John Edmunds. Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England. Science, March 2021.
  • [43] Erik M. Volz and Igor Siveroni. Bayesian phylodynamic inference with complex models. PLOS Computational Biology, 14(11):e1006546, November 2018.
  • [44] Remco Bouckaert, Timothy G. Vaughan, Joëlle Barido-Sottani, Sebastián Duchêne, Mathieu Fourment, Alexandra Gavryushkina, Joseph Heled, Graham Jones, Denise Kühnert, Nicola De Maio, Michael Matschiner, Fábio K. Mendes, Nicola F. Müller, Huw A. Ogilvie, Louis du Plessis, Alex Popinga, Andrew Rambaut, David Rasmussen, Igor Siveroni, Marc A. Suchard, Chieh-Hsi Wu, Dong Xie, Chi Zhang, Tanja Stadler, and Alexei J. Drummond. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLOS Computational Biology, 15(4):e1006650, April 2019.
  • [45] Nuno R. Faria, Thomas A. Mellan, Charles Whittaker, Ingra M. Claro, Darlan da S. Candido, Swapnil Mishra, Myuki A. E. Crispim, Flavia C. S. Sales, Iwona Hawryluk, John T. McCrone, Ruben J. G. Hulswit, Lucas A. M. Franco, Mariana S. Ramundo, Jaqueline G. de Jesus, Pamela S. Andrade, Thais M. Coletti, Giulia M. Ferreira, Camila A. M. Silva, Erika R. Manuli, Rafael H. M. Pereira, Pedro S. Peixoto, Moritz U. G. Kraemer, Nelson Gaburo, Cecilia da C. Camilo, Henrique Hoeltgebaum, William M. Souza, Esmenia C. Rocha, Leandro M. de Souza, Mariana C. de Pinho, Leonardo J. T. Araujo, Frederico S. V. Malta, Aline B. de Lima, Joice do P. Silva, Danielle A. G. Zauli, Alessandro C. de S. Ferreira, Ricardo P. Schnekenberg, Daniel J. Laydon, Patrick G. T. Walker, Hannah M. Schlüter, Ana L. P. dos Santos, Maria S. Vidal, Valentina S. Del Caro, Rosinaldo M. F. Filho, Helem M. dos Santos, Renato S. Aguiar, José L. Proença-Modena, Bruce Nelson, James A. Hay, Mélodie Monod, Xenia Miscouridou, Helen Coupland, Raphael Sonabend, Michaela Vollmer, Axel Gandy, Carlos A. Prete, Vitor H. Nascimento, Marc A. Suchard, Thomas A. Bowden, Sergei L. K. Pond, Chieh-Hsi Wu, Oliver Ratmann, Neil M. Ferguson, Christopher Dye, Nick J. Loman, Philippe Lemey, Andrew Rambaut, Nelson A. Fraiji, Maria do P. S. S. Carvalho, Oliver G. Pybus, Seth Flaxman, Samir Bhatt, and Ester C. Sabino. Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus, Brazil. Science, 372(6544):815–821, May 2021.
  • [46] Áine O’Toole, Verity Hill, and GISAID. COV-lineages: B.1.1.7. https://cov-lineages.org/global_report_B.1.1.7.
  • [47] Shaolei Teng, Adebiyi Sobitan, Raina Rhoades, Dongxiao Liu, and Qiyi Tang. Systemic effects of missense mutations on SARS-CoV-2 spike glycoprotein stability and receptor-binding affinity. Briefings in Bioinformatics, 22(2):1239–1253, October 2020.
  • [48] Lineage Mutation Tracker from Outbreak.info: B.1.1. https://outbreak.info/situation-reports?pango=B.1.1.
  • [49] Rui Wang, Jiahui Chen, Kaifu Gao, Yuta Hozumi, Changchuan Yin, and Guo-Wei Wei. Characterizing SARS-CoV-2 mutations in the United States. Research Square, pages rs.3.rs–49671, August 2020.
  • [50] Elmira Mohammadi, Fatemeh Shafiee, Kiana Shahzamani, Mohammad Mehdi Ranjbar, Abbas Alibakhshi, Shahrzad Ahangarzadeh, Leila Beikmohammadi, Laleh Shariati, Soodeh Hooshmandi, Behrooz Ataei, and Shaghayegh Haghjooy Javanmard. Novel and emerging mutations of SARS-CoV-2: Biomedical implications. Biomedicine & Pharmacotherapy, 139:111599, July 2021.
  • [51] Erik Volz, Swapnil Mishra, Meera Chand, Jeffrey C. Barrett, Robert Johnson, Lily Geidelberg, Wes R. Hinsley, Daniel J. Laydon, Gavin Dabrera, Áine O’Toole, Robert Amato, Manon Ragonnet-Cronin, Ian Harrison, Ben Jackson, Cristina V. Ariani, Olivia Boyd, Nicholas J. Loman, John T. McCrone, Sónia Gonçalves, David Jorgensen, Richard Myers, Verity Hill, David K. Jackson, Katy Gaythorpe, Natalie Groves, John Sillitoe, Dominic P. Kwiatkowski, Seth Flaxman, Oliver Ratmann, Samir Bhatt, Susan Hopkins, Axel Gandy, Andrew Rambaut, and Neil M. Ferguson. Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England. Nature, 593(7858):266–269, May 2021.
  • [52] Dan Frampton, Tommy Rampling, Aidan Cross, Heather Bailey, Judith Heaney, Matthew Byott, Rebecca Scott, Rebecca Sconza, Joseph Price, Marios Margaritis, Malin Bergstrom, Moira J. Spyer, Patricia B. Miralhes, Paul Grant, Stuart Kirk, Chris Valerio, Zaheer Mangera, Thaventhran Prabhahar, Jeronimo Moreno-Cuesta, Nish Arulkumaran, Mervyn Singer, Gee Yen Shin, Emilie Sanchez, Stavroula M. Paraskevopoulou, Deenan Pillay, Rachel A. McKendry, Mariyam Mirfenderesky, Catherine F. Houlihan, and Eleni Nastouli. Genomic characteristics and clinical effect of the emergent SARS-CoV-2 B.1.1.7 lineage in London, UK: a whole-genome sequencing and hospital-based cohort study. The Lancet Infectious Diseases, 0(0), April 2021.
  • [53] Sandipan Chakraborty. Evolutionary and structural analysis elucidates mutations on SARS-CoV2 spike protein with altered human ACE2 binding affinity. Biochemical and Biophysical Research Communications, 538:97–103, January 2021.
  • [54] Priscila Lamb Wink, Fabiana Caroline Zempulski Volpato, Francielle Liz Monteiro, Julia Biz Willig, Alexandre Prehn Zavascki, Afonso Luís Barth, and Andreza Francisco Martins. First identification of SARS-CoV-2 Lambda (C.37) variant in Southern Brazil. medRxiv, page 2021.06.21.21259241, June 2021.
  • [55] outbreak.info.
  • [56] Felipe Gomes Naveca, Valdinete Nascimento, Victor Costa de Souza, André de Lima Corado, Fernanda Nascimento, George Silva, Ágatha Costa, Débora Duarte, Karina Pessoa, Matilde Mejía, Maria Júlia Brandão, Michele Jesus, Luciana Gonçalves, Cristiano Fernandes da Costa, Vanderson Sampaio, Daniel Barros, Marineide Silva, Tirza Mattos, Gemilson Pontes, Ligia Abdalla, João Hugo Santos, Ighor Arantes, Filipe Zimmer Dezordi, Marilda Mendonça Siqueira, Gabriel Luz Wallau, Paola Cristina Resende, Edson Delatorre, Tiago Gräf, and Gonzalo Bello. COVID-19 in Amazonas, Brazil, was driven by the persistence of endemic lineages and P.1 emergence. Nature Medicine, pages 1–9, May 2021.
  • [57] Ádám Nagy, Sándor Pongor, and Balázs Győrffy. Different mutations in SARS-CoV-2 associate with severe and mild outcome. International Journal of Antimicrobial Agents, 57(2):106272, February 2021.
  • [58] Finlay Campbell, Brett Archer, Henry Laurenson-Schafer, Yuka Jinnai, Franck Konings, Neale Batra, Boris Pavlin, Katelijn Vandemaele, Maria D Van Kerkhove, Thibaut Jombart, Oliver Morgan, and Olivier le Polain de Waroux. Increased transmissibility and global spread of sars-cov-2 variants of concern as at june 2021. Eurosurveillance, 26(24), 2021.
  • [59] Nash D. Rochman, Yuri I. Wolf, Guilhem Faure, Pascal Mutz, Feng Zhang, and Eugene V. Koonin. Ongoing global and regional adaptive evolution of SARS-CoV-2. Proceedings of the National Academy of Sciences, 118(29), July 2021.
  • [60] William T. Harvey, Alessandro M. Carabelli, Ben Jackson, Ravindra K. Gupta, Emma C. Thomson, Ewan M. Harrison, Catherine Ludden, Richard Reeve, Andrew Rambaut, Sharon J. Peacock, and David L. Robertson. SARS-CoV-2 variants, spike mutations and immune escape. Nature Reviews Microbiology, 19(7):409–424, July 2021.
  • [61] Mónica L. Acevedo, Luis Alonso-Palomares, Andrés Bustamante, Aldo Gaggero, Fabio Paredes, Claudia P. Cortés, Fernando Valiente-Echeverría, and Ricardo Soto-Rifo. Infectivity and immune escape of the new SARS-CoV-2 variant of interest Lambda. medRxiv, page 2021.06.28.21259673, July 2021.
  • [62] E. Dong, H. Du, and L. Gardner. An interactive web-based dashboard to track covid-19 in real time. The Lancet. Infectious Diseases, 20:533 – 534, 2020.

Supplementary Information

Supplementary Figure S1: Progress of the vaccination program in Chile and the OxRCTT stringency index during vaccine rollout.
Supplementary Figure S2: Posterior distributions of further parameters of the Bayesian model. a-f: The external influx is low for most variants, following approximately our prior assumptions. An exception is Lambda, which eventually features a large influx when the measured fraction increased. Note, however, that the credible intervals of this influx are large, meaning that the model cannot decide whether the sudden increase in Lambda cases is due to a large influx or to a previous subsampling of Lambda cases (compare with Fig. 1 g). g-h: The relative reproduction number (compared to Alpha) of the modeled ‘other variants’ spreading in Chile. These are the variants that are not separately modeled, and therefore are allowed to change their relative reproduction number over time. i-k: Prior (gray) and posterior (green) distributions of some other parameters of the model.
Lineage | ORF1a | ORF1b | S | ORF3a | ORF8 | N
B.1.1 | - | P314L | D614G | - | - | R203K, G204R
B.1.1.348 | L1175F, V3718F | P314L | D614G, G1167A | G174D | - | S2Y, R203K, G204R
Alpha | T1001I, A1708D, I2230T, del3675/3677 | P314L | del69/70, del144/145, N501Y, A570D, D614G, P681H, T716I, S982A, D1118H | - | Q27*, R52I, Y73C | D3L, R203K, G204R, S235F
Lambda | T1246I, P2287S, F2387V, L3201P, T3255I, G3278S, del3675/3677 | P314L | G75V, T76I, del247/253, L452Q, F490S, D614G, T859N | - | - | P13L, R203K, G204R, G214C
Gamma | S1188L, K1795Q, del3675/3677 | P314L, E1264D | L18F, T20N, P26S, D138Y, R190S, K417T, E484K, N501Y, D614G, H655Y, T1027I, V1176F | S253P | E92K | P80R, R203K, G204R
Table S1: Characteristic mutations in prevalent lineages, by gene.
Lineage | Levene test | Kolmogorov-Smirnov test | t-test with different variances | U-Test
Gamma | 6.34e-19 | 0.0 | 0.0 | 0.0
Lambda | 0.0 | 0.0 | 0.0 | 0.0
Alpha | 5.11e-06 | 0.0 | 0.0 | 0.0
B.1.1 | 8.49e-05 | 7.87e-05 | 0.0001 | 0.0018
B.1.1.348 | 0.0 | 0.0 | 0.0 | 0.0
Values below the reporting threshold were considered as zero.
Table S2: Statistical assessment of mutation enrichment in the Spike gene.
Variable Parameter
Effective Reproduction number
New infectious
Susceptible pool
Generation interval
External influx
Population size (19276715)
Delay of case detection
Reported (summed) cases in neighboring countries
Length of change point
Transient length of change point
Log-transformed reproduction number of each change point
Measured number of samples sequenced
Total number of sequenced samples
Fraction of variants in circulation
Contribution of variant to spread
Amplitude of weekend corrections
Phase shift of weekend correction
Subscript Denotes a distinct variant
Subscript Denotes discretized time
Subscript Denotes a change point
Table S3: Overview of model parameters.
Variable Parameter
Table S4: List of priors.
Lineage | Synonymous mutations becoming extinct | Non-synonymous mutations becoming more predominant
B.1.1 | E_V14V, NSP3_F106F, NSP13_L438L, NSP4_N244N | NSP13_E341D, NSP3_A231V, NSP3_A579V, NSP3_P1469S, NSP4_L438P, NSP4_T492I, NSP5_G15S, NSP6_S106-F108del, NSP8_T141M, N_P80R, N_S202T, N_S235F, ORF3a_S253P, ORF8_E92K, ORF8_Q27*
B.1.1.348 | NSP12_N733N, NSP13_L438L, NSP3_F106F, NSP4_A416A, N_R203R, ORF3a_F43F | NSP12_I695T, NSP1_L27F, NSP3_A1215T, NSP3_K1386N, NSP3_T678I, NSP9_G38S
Gamma | NSP12_D139D, NSP1_D156D, NSP3_D10D, NSP3_F106F, NSP3_P1200P, NSP3_V1298V, NSP9_Y31Y, N_R203R | None
Alpha | NSP12_P411P, NSP2_S36S, NSP3_F106F, NSP3_F1089F | NSP3_F709L, NSP6_L260F, N_R203fs
Lambda | None | None
Table S5: Mutations becoming extinct and more predominant during vaccination roll-out in non-Spike proteins.