Relative Contagiousness of Emerging Virus Variants: An Analysis of SARS-CoV-2 Alpha and Delta Variants

10/01/2021 ∙ by Peter Reinhard Hansen, et al.

We propose a simple dynamic model for estimating the relative contagiousness of two virus variants. Maximum likelihood estimation and inference are conveniently invariant to variation in the total number of cases over the sample period and can be expressed as a logistic regression. Using weekly Danish data, we estimate that the Alpha variant of SARS-CoV-2 increases the reproduction number by a factor of 1.51 [CI 95%: …] relative to the ancestral variant. The Delta variant increases the reproduction number by a factor of 2.17 [CI 95%: …] relative to the Alpha variant and by a factor of 3.28 [CI 95%: …] relative to the ancestral variant. Forecasting the proportion of an emerging virus variant is straightforward, and we proceed to show how the effective reproduction number for the new variant can be estimated without contemporaneous sequencing results. This is useful for assessing the state of the pandemic in real time, as we illustrate empirically with the inferred effective reproduction number for the Alpha variant.




1 Introduction

During the fall of 2020, confirmed cases of COVID-19 grew rapidly in the UK with the emergence of the Alpha variant of SARS-CoV-2 (B.1.1.7), formerly known as the British variant, see Rambaut et al. (2020). The Alpha variant was shown to be more contagious than earlier lineages, see Volz et al. (2021) and Washington et al. (2021). Moreover, infection with the Alpha variant was found to increase the risk of hospitalization by about 42%, see Bager et al. (2021). India experienced a similar explosive growth in COVID-19 cases in April 2021 following the emergence of the Delta variant (B.1.617.2), formerly known as the Indian variant. This variant is estimated to increase the risk of hospitalization by about 85%, see Sheikh et al. (2021).

In this paper, we formulate a simple model for two virus variants of an infectious disease, where the object of interest is the relative contagiousness, denoted by $\theta$. The time series of new-variant cases relative to total cases can be modeled as binomially distributed variables with a time-varying parameter, $p_t$. The dynamic properties of $p_t$ are determined by the relative contagiousness, which can be estimated solely from changes in the relative proportion of the two variants. The analysis is therefore invariant to the reproduction number for the existing lineage and to time variation therein. This is convenient because the reproduction number varies substantially over time due to changes in behavior and preventive measures, along with other factors. The analysis is also invariant to testing intensity and time variation therein, so long as the variation in sampling does not “favor” one variant over another. Our starting point is maximum likelihood analysis of the sequenced tests, which leads to a simple logistic regression after some straightforward algebra. This greatly simplifies estimation, inference, and prediction.

The basic reproduction number, $R_0$, is an important characteristic of an infectious disease, and during a pandemic the effective reproduction number, $R_t$, indicates the direction in which case numbers are heading. The time it takes to determine the genome in positive cases presents an obstacle to assessing the reproduction number for a new virus variant, because such data are typically only available with some delay. We show that the highly predictable nature of the proportion of a new variant can be used to infer its reproduction number from the aggregate reproduction number and the most recently available estimate of the proportion, $p_t$. The effective reproduction number for all cases is simpler to compute, and the proportion of a new variant can be projected forward, typically with high accuracy. This makes it possible to compute the effective reproduction number of a new variant with a simple formula before contemporaneous sequencing results are available.

We apply the methodology to weekly Danish data using two sample periods. The first sample period, November 9, 2020 to March 14, 2021 (18 weeks), is the period in which the Alpha variant made its inroads in Denmark, and the second sample period, May 17 to July 25, 2021 (10 weeks), is the period in which the Delta variant grew to dominance in Denmark. The Danish data are excellent for studying the progression of a new variant of SARS-CoV-2 because the vast majority of confirmed COVID-19 cases are sequenced. Moreover, testing is extensive in Denmark: the number of PCR tests varied between nearly 500 thousand and over 1 million per week during the two sample periods, for a population of about 5.8 million individuals. The proportion of new-variant cases increased from <0.5% to over 90% for both variants during their respective sample periods, see Figure 1. The progression of the Alpha variant is shown in the left panel of Figure 1 and the progression of the Delta variant in the right panel, along with 95% confidence intervals for each week. The Delta variant progressed substantially faster than the Alpha variant. For instance, it took the Alpha variant about eight weeks to increase from <10% to >90%, while the Delta variant achieved the same growth in just four weeks.

Figure 1: Weekly proportions of the Alpha variant and Delta variant relative to all cases.

The paper is organized as follows: We present the statistical model in Section 2, where the time series of binomially distributed variables is expressed as a logistic regression. We present the data and the empirical results on relative contagiousness in Section 3. Section 4 presents the two auxiliary results on prediction and inferring the latent reproductive number for a new emerging variant. We illustrate how the proportion of new-variant cases can be predicted and derive the associated confidence bands. We apply this to the Alpha variant data and review how accurate the estimated model was at predicting the realized proportion of the Alpha variant out-of-sample. Then we derive the method for estimating the latent reproductive number for a new emerging variant and apply the method to the Alpha variant data. Section 5 has some concluding remarks, and we present some details about the maximum likelihood estimation and robust standard errors in the Appendix.

2 The Statistical Model

In this section we present the simple dynamic structure for the case numbers of two competing virus variants. The structure is not specific to competing virus variants, but could be used to analyze other competing objects.

2.1 Two Competing Virus Variants

Consider a virus with two variants, $X$ and $Y$, where the numbers of new cases in period $t$ are denoted $X_t$ and $Y_t$, respectively. We use $X$ to represent a new, emerging variant, whereas $Y$ represents the older variant. The rate of growth in case numbers for the old variant is denoted $g_t$, which depends on its contagiousness and the number of “opportunities” the virus has to jump from an infected individual to another person. The latter is heavily influenced by preventive measures and restrictions imposed by health authorities, seasonality, the percentage of susceptible people in the population, and individual behavior, along with many other things. The new virus variant is subject to the same level of “opportunities” but differs in terms of its contagiousness. Thus, its rate of growth is proportional to $g_t$, i.e.

$$Y_{t+1} = g_t Y_t, \qquad X_{t+1} = \theta g_t X_t,$$

where $\theta$ is the parameter of interest that captures the relative contagiousness of variant $X$ to variant $Y$.

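In period $t$, the genome is determined in $n_t$ of the new cases, where $x_t$ are variant $X$ and $n_t - x_t$ are variant $Y$. We assume $n_t$ is a representative random sample from the population of new cases such that $x_t \sim \mathrm{Bin}(n_t, p_t)$, with $p_t = X_t/(X_t + Y_t)$. It follows that

$$p_{t+1} = \frac{\theta g_t X_t}{\theta g_t X_t + g_t Y_t} = \frac{\theta p_t}{\theta p_t + 1 - p_t} \equiv f_\theta(p_t). \qquad (1)$$

Note that the dynamic equation for $p_t$ depends on the ratio of the two growth rates, $\theta$, but not on the actual values of $\theta g_t$ and $g_t$. This greatly simplifies the analysis, and estimation and inference about $\theta$ become invariant to a range of changes during the sample period that influence the rate of change in case numbers. Equation (1) defines the function $f_\theta$, which is strictly increasing in $p$ for $\theta > 0$. Figure 2 presents two examples of $f_\theta$: the case with $\theta$ equal to our weekly estimate for the Alpha variant is shown in the left panel and that for the Delta variant in the right panel, along with the progression of the weekly observed proportions of the two variants.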
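To see the invariance claim at work, the following minimal Python sketch simulates expected case counts under two hypothetical paths for the old-variant growth factor $g_t$ and compares the implied proportions with direct iteration of $f_\theta$ in (1). The function names, $\theta = 1.8$, and the initial counts are illustrative assumptions rather than estimates from the data.

```python
def f_theta(p: float, theta: float) -> float:
    """Equation (1): one-period update of the new-variant proportion."""
    return theta * p / (theta * p + 1.0 - p)


def simulate_proportions(x0: float, y0: float, theta: float, growth) -> list:
    """Expected counts with Y_{t+1} = g_t Y_t and X_{t+1} = theta g_t X_t.

    Returns the path of p_t = X_t / (X_t + Y_t).
    """
    x, y = x0, y0
    path = [x / (x + y)]
    for g in growth:
        x, y = theta * g * x, g * y
        path.append(x / (x + y))
    return path


theta = 1.8                                          # hypothetical relative contagiousness
flat = [1.0] * 8                                     # hypothetical: constant g_t
erratic = [0.6, 1.4, 0.9, 1.2, 0.7, 1.5, 1.0, 0.8]   # hypothetical: volatile g_t

p_flat = simulate_proportions(10.0, 9990.0, theta, flat)
p_erratic = simulate_proportions(10.0, 9990.0, theta, erratic)

# Iterating f_theta from p_0 = 0.001 needs no information about g_t at all.
p_direct = [0.001]
for _ in range(8):
    p_direct.append(f_theta(p_direct[-1], theta))

for a, b, c in zip(p_flat, p_erratic, p_direct):
    print(f"{a:.6f}  {b:.6f}  {c:.6f}")
```

All three columns coincide, which is the numerical counterpart of the statement that $p_t$ depends on $\theta$ but not on $g_t$.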

Figure 2: Scatterplot of the empirical proportion of the new variant this week against that of the previous week. The solid line is $f_\theta$ evaluated at the estimate of $\theta$ for the Alpha variant in the left panel and for the Delta variant in the right panel.

The progression of a far more contagious variant can seem deceptively slow in the early phase. If we take the case $\theta \approx 1.86$, which is our weekly estimate for the Alpha variant, then it takes about four weeks for the new variant to increase from 1 in 1,000 cases to 1 in 100 cases, and another four weeks to increase to 1 in 10 cases. After that, it picks up the pace: the new variant becomes the dominant variant ($p_t > 0.5$) four weeks later and reaches over 90% of all cases after another four weeks. So, it can take several months from the moment the first case of a new and more contagious variant is observed to the time when the new variant begins to have a noticeable impact on the total number of cases. After the new variant becomes dominant, it can radically change the momentum of the pandemic. A relatively stable period can suddenly become one where cases grow exponentially, because the new dominant variant has a larger reproduction number. This scenario played out for both the Alpha and Delta variants in many places, such as the August 2021 surge in total cases in Florida, Texas, and other US states.
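The four-week steps above follow from the odds form of (1): each period multiplies $p_t/(1-p_t)$ by $\theta$, so moving between two proportions takes $\log(\text{odds ratio})/\log\theta$ periods. A small Python sketch, assuming the weekly Alpha figure of about 1.86 quoted in Section 3 (the helper name `periods_between` is ours):

```python
import math


def periods_between(p_start: float, p_end: float, theta: float) -> float:
    """Periods for the new-variant proportion to move from p_start to p_end under (1).

    Each period multiplies the odds p / (1 - p) by theta, so the answer is
    log(odds_end / odds_start) / log(theta).
    """
    def odds(p: float) -> float:
        return p / (1.0 - p)

    return math.log(odds(p_end) / odds(p_start)) / math.log(theta)


theta_week = 1.86  # weekly Alpha figure ("about 86% more contagious per week")
milestones = [0.001, 0.01, 0.10, 0.50, 0.90]
for lo, hi in zip(milestones, milestones[1:]):
    print(f"{lo:.1%} -> {hi:.1%}: {periods_between(lo, hi, theta_week):.1f} weeks")
```

With $\theta = 1.86$ each of the four steps takes roughly three and a half to four weeks, in line with the description above.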

2.2 The Likelihood Analysis and Logistic Regression

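Let $n_t$ denote the total number of new cases in period $t$ for which the genome is identified, and let $x_t$ be the number of cases that are identified as the new emerging variant. The log-likelihood function for a sample $(x_t, n_t)$, $t = 1, \ldots, T$, is proportional to

$$\ell(p_0, \theta) = \sum_{t=1}^{T} \left[ x_t \log p_t + (n_t - x_t) \log(1 - p_t) \right],$$

where $p_t$ evolves according to (1). The two unknown parameters, the initial value $p_0$ and $\theta$, can be estimated by maximum likelihood, $(\hat p_0, \hat\theta)$, and confidence intervals for $p_0$ and $\theta$ can be obtained with conventional methods. The likelihood can conveniently be expressed as a logistic regression model. For this purpose, we introduce the odds ratio, $\lambda_t = p_t/(1 - p_t)$, and it is simple to show that (1) is equivalent to the simple dynamic equation $\lambda_{t+1} = \theta \lambda_t$. This implies that

$$\log \lambda_t = \alpha + \beta t,$$

where $\alpha = \log \lambda_0$ and $\beta = \log \theta$. Since $p_t = \lambda_t/(1 + \lambda_t)$, the structure of the logistic regression model emerges such that

$$p_t = \frac{\exp(\alpha + \beta t)}{1 + \exp(\alpha + \beta t)}. \qquad (2)$$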
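A quick numerical check, under illustrative values of $p_0$ and $\theta$ chosen by us, that iterating the recursion (1) and evaluating the logistic form (2) with $\alpha = \log\lambda_0$ and $\beta = \log\theta$ give the same path:

```python
import math


def p_logistic(t: int, alpha: float, beta: float) -> float:
    """Equation (2), written in the algebraically equivalent form 1 / (1 + exp(-(alpha + beta t)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * t)))


def p_recursive(t: int, p0: float, theta: float) -> float:
    """Apply the one-period update (1) t times, starting from p0."""
    p = p0
    for _ in range(t):
        p = theta * p / (theta * p + 1.0 - p)
    return p


p0, theta = 0.002, 1.86            # illustrative values
alpha = math.log(p0 / (1.0 - p0))  # alpha = log(lambda_0)
beta = math.log(theta)             # beta = log(theta)

for t in range(0, 19, 3):
    print(t, round(p_logistic(t, alpha, beta), 6), round(p_recursive(t, p0, theta), 6))
```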


This model is straightforward to estimate and analyze using standard software implementations, including the generalized linear model function, glm, available in R and Julia. In the empirical analysis, we estimate the model by maximum likelihood and compute robust standard errors from the score and Hessian of the log-likelihood function, see White (1980). The details are presented in the Appendix.¹

¹ Identical estimates were obtained with the GLM package in Julia, see Besançon et al. (2019) and Lin et al. (2021). The proper command for the GLM package in Julia is: glm(@formula(x / n ~ time_trend), [data], wts = n, Binomial()) and in R it is: glm(x/n ~ tt, weights=n, [data], family = binomial), see R Core Team (2018) for details. The latter was kindly provided by Peter Dalgaard. The glm function computes non-robust standard errors based on the Fisher information. These were smaller than the robust standard errors, in particular in our analysis of the Delta variant. Robust and non-robust confidence intervals are reported in the Appendix.
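For readers without R or Julia at hand, the following pure-Python sketch maximizes the binomial log-likelihood for the logistic trend $\log\lambda_t = \alpha + \beta t$ by Newton-Raphson, which for the logit link coincides with the iteratively reweighted least squares used by glm. It is an illustrative reimplementation (the name `fit_logistic_trend` is ours), applied to the Alpha counts in Table 1 with $t = 0, \ldots, 17$, and it returns only the non-robust (inverse-information) covariance.

```python
import math


def fit_logistic_trend(x, n, t, tol=1e-10, max_iter=100):
    """ML estimates of (alpha, beta) for x_t ~ Bin(n_t, p_t), logit(p_t) = alpha + beta * t.

    Returns alpha, beta and the inverse information matrix (non-robust covariance).
    """
    alpha = beta = 0.0
    for _ in range(max_iter):
        s0 = s1 = 0.0          # score vector
        i00 = i01 = i11 = 0.0  # information: sum of n p (1 - p) (1, t)(1, t)'
        for xi, ni, ti in zip(x, n, t):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * ti)))
            resid, w = xi - ni * p, ni * p * (1.0 - p)
            s0 += resid
            s1 += resid * ti
            i00 += w
            i01 += w * ti
            i11 += w * ti * ti
        det = i00 * i11 - i01 * i01
        cov = [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]
        step_a = cov[0][0] * s0 + cov[0][1] * s1  # Newton step = information^{-1} * score
        step_b = cov[1][0] * s0 + cov[1][1] * s1
        alpha, beta = alpha + step_a, beta + step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return alpha, beta, cov


# Alpha data from Table 1: sequenced cases n_t and Alpha cases x_t, weeks 46-53 and 1-10.
n = [1486, 1941, 2127, 2868, 4226, 4943, 3633, 3916, 4161,
     4230, 3688, 2660, 2235, 1974, 2416, 2683, 2699, 2874]
x = [4, 3, 7, 11, 16, 37, 64, 80, 157,
     298, 473, 519, 663, 929, 1590, 2042, 2299, 2657]
t = list(range(len(n)))

a_hat, b_hat, cov = fit_logistic_trend(x, n, t)
theta_week = math.exp(b_hat)
print(f"theta per week: {theta_week:.3f}, per 4.7-day generation: {theta_week ** (4.7 / 7):.3f}")
```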

3 Empirical Analysis of Alpha and Delta Variants

Weekly data for the sequenced COVID-19 tests were obtained from the Statens Serum Institut, Denmark, where the vast majority of positive COVID-19 tests have their genome identified. The weekly numbers of PCR tests and of positive PCR COVID-19 tests (cases), the number of tests with the genome identified, $n_t$, and the number of tests in which the new emerging variant was found, $x_t$, are presented in Table 1, along with the percentage of positive tests for which the genome was determined and the percentage of these tests that were the new variant.

Week  Tested (PCR)   Cases    Sequenced (n_t)   Alpha cases (x_t)   Alpha proportion
46    490,543        7,533    1,486 (19.7%)     4                   0.27%
47    502,852        8,456    1,941 (23.0%)     3                   0.15%
48    502,851        8,774    2,127 (24.2%)     7                   0.33%
49    544,578        12,816   2,868 (22.4%)     11                  0.38%
50    694,989        21,925   4,226 (19.3%)     16                  0.38%
51    883,253        24,579   4,943 (20.1%)     37                  0.75%
52    650,374        17,043   3,633 (21.3%)     64                  1.76%
53    536,958        14,560   3,916 (26.9%)     80                  2.04%
1     563,348        11,311   4,161 (36.8%)     157                 3.77%
2     596,048        7,008    4,230 (60.4%)     298                 7.04%
3     739,922        5,321    3,688 (69.3%)     473                 12.83%
4     768,925        3,616    2,660 (73.6%)     519                 19.51%
5     794,917        3,096    2,235 (72.2%)     663                 29.66%
6     809,028        2,716    1,974 (72.7%)     929                 47.06%
7     833,795        3,335    2,416 (72.4%)     1,590               65.81%
8     956,070        3,688    2,683 (72.7%)     2,042               76.11%
9     1,033,111      3,616    2,699 (74.6%)     2,299               85.18%
10    1,056,404      3,809    2,874 (75.5%)     2,657               92.45%

Week  Tested (PCR)   Cases    Sequenced (n_t)   Delta cases (x_t)   Delta proportion
20    1,167,981      6,867    5,366 (78.1%)     13                  0.24%
21    1,013,403      6,698    5,213 (77.8%)     15                  0.29%
22    911,764        5,662    4,565 (80.6%)     36                  0.79%
23    720,274        2,811    2,467 (87.8%)     66                  2.68%
24    575,207        1,649    1,364 (82.7%)     91                  6.67%
25    524,837        1,315    1,165 (88.6%)     345                 29.61%
26    608,540        2,674    2,418 (90.4%)     1,555               64.31%
27    624,414        4,614    3,322 (72.0%)     2,702               81.34%
28    583,932        6,818    6,253 (91.7%)     5,781               92.45%
29    473,843        5,289    4,800 (90.8%)     4,591               95.65%

Table 1: Weekly Danish data for positive SARS-CoV-2 tests: cases, sequenced cases, and Alpha cases (upper panel) or Delta cases (lower panel). Source: Status for udvikling af B.1.1.7 og andre mere smitsomme varianter i Danmark, SSI, April 7, 2021 and Status for udvikling af SARS-CoV-2 Varianter der overvåges i Danmark, SSI, August 27, 2021. Data available at: and

A preliminary probing of the data can be done by considering the empirical odds ratios of new-variant cases to old-variant cases, $\hat\lambda_t = x_t/(n_t - x_t)$. This ratio should be approximately proportional to $\theta^t$, such that the ratio of consecutive odds ratios satisfies

$$\frac{\hat\lambda_t}{\hat\lambda_{t-1}} = \frac{x_t/(n_t - x_t)}{x_{t-1}/(n_{t-1} - x_{t-1})} \approx \theta.$$

Thus we can use the ratio of consecutive odds ratios as a measurement of $\theta$ in week $t$. These empirical ratios and the corresponding confidence intervals are shown in Figure 3.² The crude measures tend to have large confidence intervals early in the sample because the number of new-variant cases is small. The width of the confidence intervals is also influenced by the number of tests that are sequenced, $n_t$. For instance, in week 25 this number was relatively small for the simple reason that there were few positive COVID-19 cases in Denmark that week: just 1,315 positive cases, of which 1,165 were successfully sequenced. The crude measures for the Alpha variant in the left panel of Figure 3 stabilize around their average value, 1.73. For the Delta variant, the crude measures are substantially larger and more dispersed. The progression of the Delta variant was particularly rapid in weeks 25 and 26.

² Weekly crude measures for NHS England STP areas were reported in Volz et al. (2021, figure 3), who used the median as an estimate of the reproductive advantage.
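The crude measures for the Alpha variant can be recomputed from the counts in Table 1. The interval in this sketch treats the four counts behind each ratio as independent and uses a delta-method (Woolf-type) standard error on the log scale; that construction is our assumption and is not necessarily the one behind Figure 3.

```python
import math

# Alpha data from Table 1 (weeks 46-53 of 2020 and weeks 1-10 of 2021).
n = [1486, 1941, 2127, 2868, 4226, 4943, 3633, 3916, 4161,
     4230, 3688, 2660, 2235, 1974, 2416, 2683, 2699, 2874]
x = [4, 3, 7, 11, 16, 37, 64, 80, 157,
     298, 473, 519, 663, 929, 1590, 2042, 2299, 2657]


def crude_ratios(x, n, z=1.96):
    """Consecutive odds ratios lambda_t / lambda_{t-1}, lambda_t = x_t / (n_t - x_t), with intervals."""
    out = []
    for t in range(1, len(x)):
        a, b = x[t], n[t] - x[t]              # new and old variant, week t
        c, d = x[t - 1], n[t - 1] - x[t - 1]  # new and old variant, week t - 1
        log_r = math.log(a * d / (b * c))
        se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        out.append((math.exp(log_r), math.exp(log_r - z * se), math.exp(log_r + z * se)))
    return out


ratios = crude_ratios(x, n)
mean_ratio = sum(r for r, _, _ in ratios) / len(ratios)
for step, (r, lo, hi) in enumerate(ratios, start=1):
    print(f"step {step:2d}: {r:5.2f}  [{lo:5.2f}, {hi:5.2f}]")
print(f"average crude measure: {mean_ratio:.2f}")
```

With the Alpha counts, the 17 ratios average about 1.73, the value of the dotted line in the left panel of Figure 3, and the intervals are visibly wider in the first weeks, when $x_t$ is in single or low double digits.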

Figure 3: The weekly growth in new-variant cases divided by the weekly growth in old-variant cases, $(x_t/x_{t-1})/[(n_t - x_t)/(n_{t-1} - x_{t-1})]$, is plotted with 95% confidence intervals. There is more uncertainty associated with the first observations for both the Alpha variant (left panel) and the Delta variant (right panel) because the number of cases of the new variant is quite small in the early phase. The horizontal dotted lines represent the average crude measures: 1.73 for Alpha and 3.19 for Delta.

The crude measurements of $\theta$ in Figure 3 do not fully exploit the information in the data, and the simple sample averages (the dotted lines in Figure 3) do not account for heteroskedasticity and autocorrelation in the measurement errors. To exploit the information in full, we turn to maximum likelihood estimation using the logistic regression parametrization. We compute robust standard errors using the Parzen kernel, see the Appendix. The results are not very sensitive to the choice of bandwidth, but the robust standard errors are somewhat larger than the non-robust standard errors, especially for the Delta variant, see Table A.1.

Table 2: Empirical estimates of $\theta$ (columns: Alpha vs Ancestral, Delta vs Alpha, Delta vs Ancestral; rows: per week and per generation of 4.7 days) with 95% confidence intervals computed with robust standard errors.

The maximum likelihood estimates along with 95% confidence intervals are presented in Table 2. The Alpha variant is estimated to be about 86% more contagious per week than the preceding variant, which we refer to as the ancestral variant.³ The Delta variant, which emerged after the Alpha variant had become completely dominant, is estimated to be 216% more contagious than the Alpha variant on a weekly basis. The reproduction number for SARS-CoV-2 is defined for a generation period (the typical time from when a person is infected until that person infects the next person). For SARS-CoV-2 this period is shorter than a week. The Statens Serum Institut in Denmark uses 4.7 days per generation, which we adopt in our calculations. We can convert $\theta$ to a period of $d$ days using $\theta^{d/7}$, and the estimates for $d = 4.7$ days are presented in the last row of Table 2. The estimates suggest that the Alpha variant has a reproduction number that is about 1.5 times larger than that of the ancestral variant. The Delta variant is estimated to increase the reproduction number by an additional factor of 2.17, which implies more than a threefold increase relative to the ancestral variant ($1.51 \times 2.17 \approx 3.28$). This is in line with other estimates, including those for the Alpha variant based on British data by Volz et al. (2021) and those for the Delta variant by Wenseleers (2021). The implication is that a larger proportion of the population, $1 - 1/R_0$, must be immune to reach herd immunity. Suppose that 70% immunity was needed for the ancestral variant, so that $R_0 = 1/0.3 \approx 3.3$. Our estimates of $\theta$ suggest that this number increased to about 80% for the Alpha variant and about 90% for the Delta variant.

³ The ancestral variant represents a group of variants without a WHO label, the most prevalent variant before Alpha being B.1.177, also known as 20E (EU1).
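The conversion $\theta^{d/7}$ and the herd-immunity arithmetic can be reproduced with a few lines of Python. This sketch assumes, as in the text, a 70% threshold for the ancestral variant and the per-generation factors 1.51 and 2.17; the helper names are ours.

```python
def per_generation(theta_week: float, generation_days: float = 4.7) -> float:
    """Convert a weekly multiplier to a d-day period via theta ** (d / 7)."""
    return theta_week ** (generation_days / 7.0)


def herd_immunity_threshold(r0: float) -> float:
    """Immune share needed for the reproduction number to fall to one: 1 - 1/R0."""
    return 1.0 - 1.0 / r0


r0_ancestral = 1.0 / (1.0 - 0.70)  # the 70% threshold assumed in the text
factors = {"ancestral": 1.0, "Alpha": 1.51, "Delta": 1.51 * 2.17}  # per-generation factors
for name, factor in factors.items():
    r0 = r0_ancestral * factor
    print(f"{name:9s} R0 = {r0:5.2f}, threshold = {herd_immunity_threshold(r0):.1%}")

print(f"Delta vs Alpha: 3.16 per week -> {per_generation(3.16):.2f} per generation")
```

The thresholds come out at roughly 70%, 80%, and 91%, and the weekly Delta-versus-Alpha factor of 3.16 converts to about 2.17 per 4.7-day generation.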

The estimated model and the observed odds ratios are shown in Figure 4. Overall, the model fit looks good, especially for the Alpha variant. There are some discrepancies between the data and the linear specification for the log odds ratios of the Delta variant. A possible explanation is that many of the COVID-19 cases that were detected in Denmark during the second sample period were contracted abroad. According to the Danish Patient Safety Authority, about 25% of COVID-19 cases were imported, primarily by people who had been vacationing in Spain in July. This could potentially influence the progression of the Delta variant, because imported cases could be acquired in areas with a higher or a lower Delta proportion than that in Denmark.

Figure 4: Observed log odds ratios, $\log \hat\lambda_t$, and the corresponding estimated model, $\hat\alpha + \hat\beta t$, for the Alpha variant in the left panel and the Delta variant in the right panel.

A second possible explanation is that vaccines were less effective at preventing Delta variant infections than Alpha variant infections. This would give the Delta variant an additional relative advantage over the Alpha variant as vaccine coverage increased over time. However, a study based on breakthrough infections in Denmark during the period from March 1, 2021 to August 3, 2021 did not find that vaccine effectiveness declined substantially against the Delta variant. The Pfizer vaccine was reported to be 81.0% (95% CI: 79.4; 82.4) effective at preventing Alpha variant infections and 78.8% (95% CI: 77.2; 80.4) effective at preventing Delta variant infections. The Pfizer vaccine (Comirnaty) is the most commonly used vaccine in Denmark and accounts for about 85% of all vaccinations. The study only reported vaccine effectiveness for fully vaccinated individuals, but many individuals had only received their first vaccine dose in this period. So, a discrepancy in effectiveness between the Alpha and Delta variants following a single vaccine dose could potentially have influenced our analysis.

A third possible explanation is that restrictions were largely abolished during the second sample period, which could result in noisier data for the Delta variant and possible misspecification of the model. During the Alpha sample period, restrictions were quite stringent. In contrast, during the Delta sample period most restrictions were abolished in Denmark, especially in relation to large gatherings. The relaxed restrictions may explain the larger degree of randomness in the progression of the Delta variant. For instance, the Euro 2020 games in Copenhagen may have contributed to the accelerated growth in the Delta variant in Week 25 (see the right panel of Figure 4), because spectators at two games accounted for a large fraction of the Delta variant cases. Following the Denmark-Belgium Euro 2020 game in Copenhagen on June 17, 2021, 41 attending spectators tested positive for COVID-19, of which 25 cases (61.0%) were the Delta variant. The following week, on Monday June 21, 2021, Denmark played Russia in Copenhagen at another Euro 2020 game, where 62 cases were subsequently detected among spectators, of which 28 (45.2%) were Delta variant cases. These are large numbers and percentages, because the total numbers of Delta variant cases in Week 24 and Week 25 were 91 and 345, respectively, and the Delta variant only accounted for 6.7% of cases in Week 24 and 29.6% in Week 25. The binomial model for Delta variant cases, $x_t \sim \mathrm{Bin}(n_t, p_t)$, assumes that the individual cases are generated by independent Bernoulli random variables. This independence assumption becomes questionable when a large proportion of cases can be linked to the same events. The right panel of Figure 4 is consistent with a single larger-than-expected jump in the proportion of Delta cases around the time of the Euro 2020 games, and two parallel lines (one fitting the data up until Week 24 and one fitting the data from Week 25 onwards) would appear to fit the data about as well as a single line fits the data in the left panel of Figure 4.

4 Confidence Intervals, Predictions, and Inferring the Reproduction Number

In this section, we detail two ancillary results. First, in Section 4.1, we show that the estimated model can be used to predict the proportion of an emerging virus variant and develop methods for quantifying the associated uncertainty. We illustrate these methods with the data for the Alpha variant. Then, in Section 4.2, we develop a simple formula for the reproduction number of the new variant, which does not require concurrent genome data. Instead, it projects the most recent estimate of the proportion forward and infers the effective reproduction number from the recent growth in total cases.

4.1 Confidence Sets and Out-of-Sample Analysis

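At time $T$ we can estimate $\alpha$ and $\beta$, as well as their variance-covariance matrix, $\Sigma$, where $\hat\alpha_T$, $\hat\beta_T$, and $\hat\Sigma_T$ denote the resulting estimates. Point forecasts for the proportion of the new virus variant, $p_t$, are given by (2). The $h$-period-ahead point forecast, made at time $T$, is simply

$$\hat p_{T+h|T} = \frac{\exp(\hat\alpha_T + \hat\beta_T (T+h))}{1 + \exp(\hat\alpha_T + \hat\beta_T (T+h))},$$

and the corresponding confidence bands can be deduced from the asymptotic distribution of $(\hat\alpha_T, \hat\beta_T)$. The confidence band based on $k$ units of standard deviations is given by

$$\left[ \frac{\exp(\hat\alpha_T + \hat\beta_T (T+h) - k\hat\sigma_{T+h})}{1 + \exp(\hat\alpha_T + \hat\beta_T (T+h) - k\hat\sigma_{T+h})},\ \frac{\exp(\hat\alpha_T + \hat\beta_T (T+h) + k\hat\sigma_{T+h})}{1 + \exp(\hat\alpha_T + \hat\beta_T (T+h) + k\hat\sigma_{T+h})} \right], \qquad (3)$$

where $k = 1.96$ would correspond to a 95% confidence band and

$$\hat\sigma_{T+h}^2 = (1, T+h)\, \hat\Sigma_T \, (1, T+h)'.$$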
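A minimal Python sketch of the forecast from (2) and the band (3), assuming the delta-method standard deviation of the linear predictor $\alpha + \beta(T+h)$; the estimates and covariance matrix are hypothetical numbers chosen for illustration, not the values behind Figure 5.

```python
import math


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forecast_band(alpha, beta, cov, T, h, k=1.96):
    """Point forecast of p_{T+h} and a band of +/- k standard deviations.

    sigma is the standard deviation of the linear predictor alpha + beta (T + h),
    sqrt(v' cov v) with v = (1, T + h); the band is mapped through the logistic
    function, which keeps it inside (0, 1).
    """
    s = T + h
    eta = alpha + beta * s
    sigma = math.sqrt(cov[0][0] + 2.0 * s * cov[0][1] + s * s * cov[1][1])
    return logistic(eta - k * sigma), logistic(eta), logistic(eta + k * sigma)


# Hypothetical estimates at T = 7 (not the paper's values), forecast 1-8 periods ahead.
alpha_hat, beta_hat = -6.0, 0.62
cov_hat = [[0.040, -0.003], [-0.003, 0.0003]]
for h in range(1, 9):
    lo, mid, hi = forecast_band(alpha_hat, beta_hat, cov_hat, T=7, h=h)
    print(f"h={h}: {mid:.3f} [{lo:.3f}, {hi:.3f}]")
```

Because the band is constructed on the logit scale and then transformed, it is asymmetric around the point forecast and never leaves the unit interval.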

The estimated and predicted progression of $p_t$ for the Alpha variant, along with confidence bands for two values of $k$, is presented in Figure 5. The saltires (x-crosses) in Figure 5 are the observed weekly empirical proportions of the Alpha variant.

In the upper-left panel of Figure 5, we have estimated the model by maximum likelihood using 4 weeks of data (Weeks 50-53), which leaves 10 weeks for out-of-sample forecasting. The point forecasts are reasonably close to the realized proportions, but with just four weeks of data for estimation, there is a great deal of uncertainty about the estimated parameters, causing $\hat\sigma_{T+h}$ to be large. With two additional weeks for estimation (six weeks total), the parameters are more precisely estimated, resulting in tighter confidence bands, as shown in the upper-right panel of Figure 5. With eight or ten weeks for estimation, the parameter estimates become even more accurate, resulting in even tighter confidence bands in the two lower panels.

Figure 5: The predicted path for $p_t$ (solid black line) is shown when the model is estimated with 4, 6, 8, and 10 weeks of data, which translates to out-of-sample periods of 10, 8, 6, and 4 weeks, respectively. The shaded areas are the confidence bands for two values of $k$, with the standard deviation as defined in (3). The observed proportions of Alpha are indicated with blue crosses.

The point forecasts are reasonably accurate at horizons up to four weeks but tend to be below the realized values, especially at longer horizons. This is because the four in-sample estimates of $\theta$ are all smaller than the full-sample estimate. This highlights that we should expect the out-of-sample forecasting errors to be positively autocorrelated and likely to have the same sign as $\theta - \hat\theta_T$. It should be noted that the confidence bands reflect the uncertainty about $p_t$, while the realized empirical proportions, $\hat p_t = x_t/n_t$, are themselves noisy estimates of $p_t$, see the confidence bands in Figure 1.

4.2 Inferring the Reproduction Number for the New Variant in Real Time

We can infer the effective reproduction number for an emerging variant from the effective reproduction number of all cases when combined with knowledge about $p$ and $\theta$. Let $Z$ be the number of all cases in this period, of which $pZ$ are new-variant cases and $(1-p)Z$ are old-variant cases. If the current reproduction number for all cases is $R$, then there were $Z/R$ cases one generation ago. Similarly, there were $pZ/R_X$ new-variant cases and $(1-p)Z/R_Y$ old-variant cases one generation earlier, where $R_Y$ and $R_X$ denote the current reproduction numbers for the old and new variant, respectively. The numbers of cases in the previous generation have to add up to the total number of cases. Hence, $Z/R = pZ/R_X + (1-p)Z/R_Y$, and since $R_X = \theta R_Y$, it follows that

$$R_X = \left( p + \theta (1 - p) \right) R. \qquad (4)$$

The value of $p$ to be used in this expression should be that for the current period, which is typically predicted from earlier periods, and the value of $\theta$ to be used in (4) should be the one that corresponds to the same generation period as used to compute $R$. We estimated $\theta = 1.51$ for the Alpha variant (relative to the ancestral variant) and $\theta = 2.17$ for Delta (relative to Alpha). Thus, based on the Danish data we approximately have

$$R_{\text{Alpha}} \approx (1.51 - 0.51\, p)\, R \qquad \text{and} \qquad R_{\text{Delta}} \approx (2.17 - 1.17\, p)\, R.$$

This formula makes it possible to assess the reproduction number for an emerging variant before concurrent sequencing data are available. The reproduction number, , for all cases can be inferred from the progression in the total number of COVID-19 cases and the proportion of the new variant, , can be obtained from the estimated model, by projecting forward the most recent knowledge about the proportion, see Figure 4.
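Equation (4) is a one-liner in code. The sketch below, with hypothetical inputs $R = 0.9$ and $p = 0.3$ together with the Alpha estimate $\theta = 1.51$, also recovers $R_Y = R_X/\theta$, which makes it easy to check the adding-up condition behind (4). The function names are ours.

```python
def r_new_variant(r_all: float, p: float, theta: float) -> float:
    """Equation (4): R_X = (p + theta * (1 - p)) * R for the emerging variant."""
    return (p + theta * (1.0 - p)) * r_all


def r_old_variant(r_all: float, p: float, theta: float) -> float:
    """R_Y = R_X / theta for the existing variant."""
    return r_new_variant(r_all, p, theta) / theta


# Hypothetical illustration: all cases shrinking (R = 0.9) while 30% of cases are Alpha.
R, p, theta = 0.9, 0.30, 1.51
print(round(r_new_variant(R, p, theta), 3), round(r_old_variant(R, p, theta), 3))
```

In this hypothetical case total cases are declining ($R < 1$) while Alpha cases are growing ($R_X \approx 1.22$).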

4.2.1 Empirical Illustration for the Alpha Variant

We can use (4) to characterize the combinations of $(p, R)$ that correspond to a particular reproduction number for the Alpha variant. A contour plot for $R_X$ based on the point estimate of $\theta$ that corresponds to a generation period, $\hat\theta = 1.51$, is presented in Figure 6. The region above the solid line, $R_X = 1$, contains the combinations of $p$ and $R$ for which case numbers for Alpha are increasing, and the region below the solid line is where Alpha cases are decreasing. The shaded region around the solid line represents the uncertainty about the threshold due to uncertainty about $\theta$. The shaded area is given by

$$\left\{ (p, R) : \frac{1}{p + \theta_U (1 - p)} \le R \le \frac{1}{p + \theta_L (1 - p)} \right\},$$

where $[\theta_L, \theta_U]$ is the 95% confidence interval for $\theta$ we obtained in Section 3. Note that the uncertainty interval shrinks to zero as $p \to 1$. The reason is that the limiting case, $p = 1$, represents the situation where Alpha cases make up all cases, so their rate of increase can be inferred directly from the rate of increase in all cases. More formally, the result follows from the fact that $p + \theta(1 - p) \to 1$ as $p \to 1$.
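The width of the shaded band can be tabulated directly from its two bounds. The interval $[\theta_L, \theta_U]$ used below is hypothetical, chosen only to show the shape; it is not the confidence interval from Section 3.

```python
def threshold_band(p: float, theta_lo: float, theta_hi: float) -> tuple[float, float]:
    """Range of total R for which R_X = 1 for some theta in [theta_lo, theta_hi], from (4)."""
    return 1.0 / (p + theta_hi * (1.0 - p)), 1.0 / (p + theta_lo * (1.0 - p))


theta_lo, theta_hi = 1.45, 1.57  # hypothetical interval, for illustration only
for p in [0.0, 0.25, 0.5, 0.75, 1.0]:
    lower, upper = threshold_band(p, theta_lo, theta_hi)
    print(f"p = {p:.2f}: R in [{lower:.3f}, {upper:.3f}], width {upper - lower:.3f}")
```

The width shrinks monotonically to zero as $p \to 1$, as stated above.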

Figure 6: Contour plot for the Alpha variant reproduction number as a function of the proportion of Alpha cases, $p$, and the reproduction number for all cases, $R$. Alpha cases are increasing above the solid line and decreasing below it. The solid line is based on the estimate of $\theta$, and the shaded area reflects the statistical uncertainty therein. Danish weekly statistics for $(\hat p_t, \hat R_t)$ are shown and labelled with the corresponding week number.

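A model-free proxy for $p_t$ is $\hat p_t = x_t/n_t$, and a crude estimate of $R$ in week $t$ is given by $\hat R_t = (\tilde Z_t/\tilde Z_{t-1})^{4.7/7}$, where $\tilde Z_t$ is the number of all cases in week $t$ after adjusting for the testing intensity. The adjustment is given by $\tilde Z_t = Z_t (\tau_0/\tau_t)^{\kappa}$, where $\tau_t$ is the number of tests in week $t$, $\tau_0$ is a baseline number of tests, and $\kappa$ is the adjustment exponent, see Statens Serum Institute (2020). The baseline number, $\tau_0$, does not influence the ratio because

$$\frac{\tilde Z_t}{\tilde Z_{t-1}} = \frac{Z_t}{Z_{t-1}} \left( \frac{\tau_{t-1}}{\tau_t} \right)^{\kappa},$$

and we use this ratio to compute $\hat R_t$.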
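A sketch of a crude weekly $\hat R_t$, assuming a power-form test adjustment $\tilde Z_t = Z_t(\tau_0/\tau_t)^{\kappa}$ and a conversion from weekly growth to one 4.7-day generation via the power $4.7/7$. The exponent `KAPPA = 0.7` is a hypothetical value for illustration, not necessarily the one used by Statens Serum Institut; the week 2 and week 3 inputs are taken from Table 1, and the function names are ours.

```python
def adjusted_cases(cases: float, tests: float, baseline_tests: float, kappa: float) -> float:
    """Test-adjusted cases, cases * (baseline_tests / tests) ** kappa (assumed functional form)."""
    return cases * (baseline_tests / tests) ** kappa


def crude_r(cases_now, tests_now, cases_prev, tests_prev, kappa,
            baseline_tests=1.0, generation_days=4.7):
    """Weekly growth in adjusted cases, rescaled to one generation via the power d / 7."""
    growth = (adjusted_cases(cases_now, tests_now, baseline_tests, kappa)
              / adjusted_cases(cases_prev, tests_prev, baseline_tests, kappa))
    return growth ** (generation_days / 7.0)


# Table 1, weeks 2 and 3 of 2021: cases 7,008 -> 5,321 and PCR tests 596,048 -> 739,922.
KAPPA = 0.7  # hypothetical exponent, for illustration only
r_a = crude_r(5321, 739922, 7008, 596048, KAPPA, baseline_tests=1.0)
r_b = crude_r(5321, 739922, 7008, 596048, KAPPA, baseline_tests=500000.0)
print(round(r_a, 3), round(r_b, 3))
```

The two printed values agree because the baseline cancels in the ratio; only the exponent and the week-to-week change in testing affect the estimate.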

The estimated reproduction number, $\hat R_t$, is plotted against the observed proportion of the Alpha variant, $\hat p_t$, in Figure 6, labelled with the corresponding week number. All pairs fall above the solid line, where the effective reproduction number for the Alpha variant is greater than one. This indicates that the number of Alpha cases (detected and undetected) was growing throughout the sample period, even though the total number of cases was declining in most weeks.

5 Discussion

We have shown how the relative contagiousness of a new virus variant can be estimated by maximum likelihood and how robust standard errors can be computed. The underlying structure is that of a logistic regression model. We applied the methodology to weekly Danish data from the periods in which the Alpha and Delta variants emerged to become the dominant variants. The methodology can also be applied to data at other frequencies, such as daily data, and to time series with missing data. The analysis can also be extended to situations with more than two competing virus variants, and it is not specific to competing virus variants but could be applied to other competing objects. We found that the Alpha variant increased the contagiousness by about 50% and the Delta variant increased it further by more than 100% per generation. To reach herd immunity, it was originally estimated that about 70% of the population needed to be immune, which corresponds to a basic reproduction number of $R_0 = 1/(1 - 0.7) \approx 3.3$. So, if the Delta variant increases $R_0$ by a factor of 3, it reduces the fraction of the population that can be without immunity to a third, in this case from 30% to 10%, so that 90% population immunity is needed for herd immunity.

Two new variants of SARS-CoV-2 have emerged to become dominant in short succession, which suggests that even more contagious variants may emerge in the time to come. Both variants were not only more contagious but were also found to substantially increase the risk of hospitalization. It is unclear when a more contagious variant will emerge, if at all. It is, however, disconcerting that there were just 18 weeks between the time the Alpha variant made up 90% of all cases and the time the Delta variant surpassed that same threshold. Fortunately, vaccinations have been shown to reduce transmission and greatly reduce the risk of severe disease. So, vaccination appears to be the most effective measure for slowing the emergence of new variants and preventing them from having harmful effects.


Appendix A The Case with Multiple Competing Virus Variants

In this Appendix we consider the extension of the model to the case where there may be more than two competing virus variants.

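For $j = 1, \ldots, K$, we let $X_{j,t}$ and $p_{j,t}$ denote the number of cases and the proportion of the $j$-th variant at time $t$. Here $K$ is the number of virus variants and $Z_t = \sum_{j=1}^{K} X_{j,t}$ is the total number of cases at time $t$, so that $p_{j,t} = X_{j,t}/Z_t$. We measure the contagiousness of all variants relative to the first variant ($j = 1$) and let $g_t$ represent the progression in this variant. So, the first variant is used as the numéraire, and the relative contagiousness of the other variants is represented by the parameters $\theta_j$, with $\theta_1 = 1$. Analogous to the case with two variants, we now have

$$X_{j,t+1} = \theta_j g_t X_{j,t}, \qquad j = 1, \ldots, K,$$

and the evolution of the variant proportions is given by

$$p_{j,t+1} = \frac{\theta_j p_{j,t}}{\sum_{k=1}^{K} \theta_k p_{k,t}}, \qquad j = 1, \ldots, K. \qquad (A.1)$$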

A.1 Estimation and Inference

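Let $y_{i,t}$ be the number of cases that are identified as the $i$-th variant at times $t = 1, \ldots, T$, and assume that $(y_{1,t}, \ldots, y_{N,t})$ is a representative sample of $(Z_{1,t}, \ldots, Z_{N,t})$. The log-likelihood is now given by

$$\ell(\theta, p_0) = \sum_{t=1}^{T} \sum_{i=1}^{N} y_{i,t} \log p_{i,t}, \qquad \text{(A.2)}$$

where $p_{i,t}$ evolves according to (A.1) and the two vectors of unknown parameters are $\theta = (\theta_2, \ldots, \theta_N)'$ and the initial value $p_0 = (p_{2,0}, \ldots, p_{N,0})'$, with $p_{1,0} = 1 - \sum_{i=2}^{N} p_{i,0}$. These can be estimated by maximum likelihood, $(\hat\theta, \hat p_0) = \arg\max \ell(\theta, p_0)$, and confidence intervals for $\theta$ and $p_0$ can be obtained with conventional methods. This model does not conform to the simple logistic regression model we used in the case with two variants, but it can be cast as a multinomial logistic regression model.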
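Writing $p_{i,t}$ in closed form, $p_{i,0}\theta_i^t/\sum_j p_{j,0}\theta_j^t$, gives $\log(p_{i,t}/p_{1,t}) = \log(p_{i,0}/p_{1,0}) + t\log\theta_i$, which is the multinomial logit structure. The sketch below evaluates the multinomial log-likelihood in plain Python, dropping the constant multinomial coefficient. The function names, the layout `counts[t - 1][i - 1]` $= y_{i,t}$, and the example counts are hypothetical; in practice the function would be handed to a numerical optimizer.

```python
from math import log


def proportions(p0, theta, t):
    """Closed form: p_{i,t} = p_{i,0} theta_i^t / sum_j p_{j,0} theta_j^t."""
    w = [pi * th ** t for pi, th in zip(p0, theta)]
    s = sum(w)
    return [x / s for x in w]


def log_likelihood(theta_rest, p0_rest, counts):
    """Multinomial log-likelihood without the constant multinomial coefficient.

    theta_rest: (theta_2, ..., theta_N); theta_1 = 1 for the numeraire variant.
    p0_rest:    (p_{2,0}, ..., p_{N,0}); p_{1,0} = 1 - sum(p0_rest).
    counts:     counts[t - 1][i - 1] = y_{i,t} for weeks t = 1, ..., T.
    """
    theta = [1.0, *theta_rest]
    p0 = [1.0 - sum(p0_rest), *p0_rest]
    ll = 0.0
    for t, y_t in enumerate(counts, start=1):
        p_t = proportions(p0, theta, t)
        # Zero counts contribute nothing (0 * log p = 0).
        ll += sum(y * log(p) for y, p in zip(y_t, p_t) if y > 0)
    return ll


# Hypothetical weekly counts for three variants (not data from the paper).
counts = [[95, 4, 1], [90, 7, 3], [80, 12, 8], [65, 18, 17]]
print(log_likelihood([1.5, 2.0], [0.03, 0.005], counts))
```

With $N = 2$ the function reduces to the binomial (logistic regression) log-likelihood of the two-variant model.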

A.2 Omitting Variants from the Analysis

For some variants, the number of observed cases may be too small to obtain reliable estimates of their relative contagiousness. An interesting question is whether omitting all but two variants from the analysis will induce bias or inconsistency in the estimated relative contagiousness. Fortunately, this is not the case. Suppose we drop variants $3, \ldots, N$ from the analysis and proceed to estimate $\theta_2$ solely from data on the first two variants. Define $\tilde p_t = p_{2,t}/(p_{1,t} + p_{2,t})$. Then from (A.1), where the common denominator cancels and $\theta_1 = 1$, it follows that

$$\tilde p_{t+1} = \frac{\theta_2\, p_{2,t}}{p_{1,t} + \theta_2\, p_{2,t}} = \frac{\theta_2\, \tilde p_t}{1 - \tilde p_t + \theta_2\, \tilde p_t},$$

which is identical to (1). This shows that estimation and inference about a single relative contagiousness parameter can be based entirely on data for the two variants whose relative contagiousness is the object of interest.
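The cancellation argument can be checked numerically: simulate the full $N$-variant proportion recursion and verify that the share of variant 2 among variants 1 and 2 alone follows the two-variant recursion $q_{t+1} = \theta_2 q_t/(1 - q_t + \theta_2 q_t)$. This is a minimal sketch; the function names and the values of $\theta$ and the initial proportions are hypothetical.

```python
from math import isclose


def step(p, theta):
    """One generation of the N-variant recursion: p_{i,t+1} = theta_i p_{i,t} / sum_j theta_j p_{j,t}."""
    w = [th * pi for th, pi in zip(theta, p)]
    s = sum(w)
    return [x / s for x in w]


def two_variant_step(q, theta):
    """Two-variant recursion: q_{t+1} = theta q_t / (1 - q_t + theta q_t)."""
    return theta * q / (1.0 - q + theta * q)


# Hypothetical values; variants 3 and 4 play the role of the omitted variants.
theta = [1.0, 1.5, 3.3, 0.8]
p = [0.90, 0.05, 0.01, 0.04]

for _ in range(12):
    q = p[1] / (p[0] + p[1])      # share of variant 2 among variants 1 and 2 only
    p = step(p, theta)            # full dynamics, including the omitted variants
    q_next = p[1] / (p[0] + p[1])
    # The pairwise share obeys the two-variant recursion with theta = theta_2.
    assert isclose(q_next, two_variant_step(q, theta[1]))

print("pairwise share follows the two-variant recursion")
```

The check holds even though variant 3, which is never used in the pairwise comparison, comes to dominate the total.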

Forecasting, as well as most hypothesis tests involving multiple $\theta$-parameters, would require knowledge about the dependence between the estimates. Such situations call for estimation of the full model defined by the log-likelihood in (A.2).

Appendix B Robust Standard Errors of Estimators

While the maximum likelihood estimates are identical to those obtained with standard logistic regression packages, the standard errors provided by most packages are based on the Fisher information matrix, $\hat{\mathcal I}$ (defined below), and these are reliable only if the model is correctly specified. We instead compute standard errors using the sandwich form of the variance-covariance matrix for the estimated parameters, which is detailed next.

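We parameterize the log-likelihood with the standard parameterization of the logistic regression, $\vartheta = (\alpha, \beta)'$, where $\alpha = \log[p_0/(1 - p_0)]$ and $\beta = \log\theta$, so that $p_t = \exp(\alpha + \beta t)/[1 + \exp(\alpha + \beta t)]$. The maximum likelihood estimates are obtained by maximizing $\ell(\vartheta) = \sum_{t=1}^{T} \ell_t(\vartheta)$, where $\ell_t(\vartheta) = y_t \log p_t + (n_t - y_t)\log(1 - p_t)$, with $y_t$ the number of sequenced cases identified as the new variant among the $n_t$ sequenced cases in week $t$. To compute robust standard errors we derive the score, $s_t(\vartheta) = \partial \ell_t(\vartheta)/\partial\vartheta$, $t = 1, \ldots, T$, and the Hessian, $H(\vartheta) = \partial^2 \ell(\vartheta)/\partial\vartheta\,\partial\vartheta'$. To this end we observe that the derivatives of $\log p_t$ are simply

$$\frac{\partial \log p_t}{\partial \vartheta} = (1 - p_t)\, z_t, \qquad z_t = (1, t)', \qquad \text{(A.3)}$$

and similarly $\partial \log(1 - p_t)/\partial\vartheta = -p_t\, z_t$, such that the score for the observations in the $t$-th week is given by

$$s_t(\vartheta) = y_t (1 - p_t)\, z_t - (n_t - y_t)\, p_t\, z_t = (y_t - n_t p_t)\, z_t.$$

Next, by combining the expression for $s_t(\vartheta)$ with (A.3), which implies $\partial p_t/\partial\vartheta = p_t(1 - p_t)\, z_t$, we obtain

$$H(\vartheta) = \sum_{t=1}^{T} \frac{\partial s_t(\vartheta)}{\partial \vartheta'} = -\sum_{t=1}^{T} n_t\, p_t (1 - p_t)\, z_t z_t'.$$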
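Because the score and Hessian are available in closed form, the maximum likelihood estimate can be computed by Newton-Raphson iterations, $\vartheta \leftarrow \vartheta - H(\vartheta)^{-1}\sum_t s_t(\vartheta)$. The following stdlib-only Python sketch assumes $y_t$ new-variant cases among $n_t$ sequenced cases in week $t$ and $z_t = (1, t)'$; the function names and the synthetic data are hypothetical. A logistic regression package would return the same point estimates.

```python
from math import exp, log


def logistic(x):
    return 1.0 / (1.0 + exp(-x))


def score_and_hessian(alpha, beta, y, n):
    """Weekly scores s_t = (y_t - n_t p_t) z_t and the 2x2 Hessian
    H = -sum_t n_t p_t (1 - p_t) z_t z_t', with z_t = (1, t)'."""
    scores = []
    h11 = h12 = h22 = 0.0
    for t, (y_t, n_t) in enumerate(zip(y, n), start=1):
        p_t = logistic(alpha + beta * t)
        r = y_t - n_t * p_t
        scores.append((r, r * t))
        w = n_t * p_t * (1.0 - p_t)
        h11 -= w
        h12 -= w * t
        h22 -= w * t * t
    return scores, ((h11, h12), (h12, h22))


def fit_newton(y, n, alpha=0.0, beta=0.0, tol=1e-12, max_iter=50):
    """Maximize the log-likelihood by Newton-Raphson with the analytical derivatives."""
    for _ in range(max_iter):
        scores, hess = score_and_hessian(alpha, beta, y, n)
        a, b = hess[0]
        d = hess[1][1]
        g1 = sum(s[0] for s in scores)
        g2 = sum(s[1] for s in scores)
        det = a * d - b * b
        # Newton step H^{-1} g via the explicit inverse of a symmetric 2x2 matrix.
        step_alpha = (d * g1 - b * g2) / det
        step_beta = (a * g2 - b * g1) / det
        alpha, beta = alpha - step_alpha, beta - step_beta
        if abs(step_alpha) + abs(step_beta) < tol:
            break
    return alpha, beta


# Synthetic check: expected counts generated from p_0 = 0.05 and theta = 1.5.
alpha_true, beta_true = log(0.05 / 0.95), log(1.5)
n = [500] * 10
y = [n_t * logistic(alpha_true + beta_true * t) for t, n_t in enumerate(n, start=1)]
alpha_hat, beta_hat = fit_newton(y, n)
print(f"theta_hat = {exp(beta_hat):.4f}")
```

Since the synthetic counts equal their expected values, the score vanishes at the true parameters and the iterations recover $\theta = 1.5$ up to rounding.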

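It is now straightforward to compute the information matrices $\hat{\mathcal I} = -H(\hat\vartheta)$ and $\hat{\mathcal J} = \sum_{t=1}^{T} s_t(\hat\vartheta)\, s_t(\hat\vartheta)'$, which yield the heteroskedasticity robust variance-covariance matrix for $\hat\vartheta$, $\hat\Sigma = \hat{\mathcal I}^{-1} \hat{\mathcal J}\, \hat{\mathcal I}^{-1}$. (The numerical derivatives computed by the Julia package ForwardDiff, see Revels et al. (2016), are identical to the analytical expressions for both the score, $s_t(\vartheta)$, and the Hessian, $H(\vartheta)$.) For heteroskedasticity and autocorrelation (HAC) robust standard errors we compute

$$\hat{\mathcal J}_{\rm HAC} = \sum_{h=-(T-1)}^{T-1} k\!\left(\frac{h}{b}\right) \hat\Gamma_h, \qquad \hat\Gamma_h = \sum_{t=h+1}^{T} s_t(\hat\vartheta)\, s_{t-h}(\hat\vartheta)', \quad h \geq 0,$$

where $k(\cdot)$ is a kernel function with $k(0) = 1$, $b > 0$ is a bandwidth, and $\hat\Gamma_{-h} = \hat\Gamma_h'$. The HAC robust variance-covariance matrix is then $\hat\Sigma_{\rm HAC} = \hat{\mathcal I}^{-1} \hat{\mathcal J}_{\rm HAC}\, \hat{\mathcal I}^{-1}$.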
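The sandwich and HAC formulas can be sketched in plain Python for the two-parameter case. This is an illustrative implementation under stated assumptions: the Parzen kernel is written out in its standard form, the bandwidth `b` is a free input (no particular value is implied), and the helper names are hypothetical. Here `scores` is a list of 2-vectors $s_t$ and `hessian` is the $2 \times 2$ matrix $H$; with `b=None` the function returns the heteroskedasticity robust $\hat\Sigma$.

```python
def parzen(x):
    """Parzen kernel: 1 - 6x^2 + 6|x|^3 for |x| <= 1/2, 2(1 - |x|)^3 for 1/2 < |x| <= 1, else 0."""
    ax = abs(x)
    if ax <= 0.5:
        return 1.0 - 6.0 * ax ** 2 + 6.0 * ax ** 3
    if ax <= 1.0:
        return 2.0 * (1.0 - ax) ** 3
    return 0.0


def gamma(scores, h):
    """Gamma_h = sum_{t > h} s_t s_{t-h}' for 2-vectors s_t (h >= 0)."""
    g = [[0.0, 0.0], [0.0, 0.0]]
    for t in range(h, len(scores)):
        for i in range(2):
            for j in range(2):
                g[i][j] += scores[t][i] * scores[t - h][j]
    return g


def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def sandwich(scores, hessian, b=None, kernel=parzen):
    """Sigma = I^{-1} J I^{-1} with I = -H.

    b=None: J = Gamma_0 (heteroskedasticity robust).
    b > 0:  J = Gamma_0 + sum_{h >= 1} k(h / b) (Gamma_h + Gamma_h')  (HAC).
    """
    j_mat = gamma(scores, 0)
    if b is not None:
        for h in range(1, len(scores)):
            w = kernel(h / b)
            g = gamma(scores, h)
            for i in range(2):
                for j in range(2):
                    j_mat[i][j] += w * (g[i][j] + g[j][i])
    info_inv = inv2([[-v for v in row] for row in hessian])
    return matmul2(matmul2(info_inv, j_mat), info_inv)


# Hypothetical scores and Hessian, for illustration only.
scores = [(1.0, 0.5), (-0.5, 1.0), (0.2, -0.3)]
hessian = [[-2.0, -0.5], [-0.5, -3.0]]
print(sandwich(scores, hessian))
print(sandwich(scores, hessian, b=2.0))
```

When the model is correctly specified, $\hat{\mathcal J} \approx \hat{\mathcal I}$ and the sandwich collapses to the usual inverse Fisher information; the robust forms only differ when that equality fails.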

Variance-Estimator Alpha Variant Delta Variant
[CI 95%: 1.5037, 1.5262] [CI 95%: 2.1319, 2.2033]
[CI 95%: 1.4994, 1.5306] [CI 95%: 2.0215, 2.3236]
[CI 95%: 1.4990, 1.5310] [CI 95%: 2.0119, 2.3347]
[CI 95%: 1.4986, 1.5314] [CI 95%: 2.0009, 2.3476]
[CI 95%: 1.4980, 1.5320] [CI 95%: 1.9949, 2.3546]
[CI 95%: 1.4971, 1.5329] [CI 95%: 1.9909, 2.3593]
[CI 95%: 1.4962, 1.5339] [CI 95%: 1.9888, 2.3618]
[CI 95%: 1.4952, 1.5349] [CI 95%: 1.9888, 2.3618]
Table A.1: Sensitivity of confidence intervals for the relative contagiousness, $\theta$, to the choice of variance-covariance estimator (computed with non-robust and various robust standard errors).

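Our empirical results are based on the HAC robust estimator, $\hat\Sigma_{\rm HAC}$, with the Parzen kernel function for $k(\cdot)$. Standard errors for $\hat\alpha$ and $\hat\beta$ are given by the square roots of the diagonal elements of $\hat\Sigma_{\rm HAC}$, which we denote by $\hat\sigma_\alpha$ and $\hat\sigma_\beta$, respectively. The reported 95% confidence intervals for $\alpha$ and $\beta$ are based on the point estimates $\pm 1.96$ times the corresponding standard error. Those for $\theta$ are given by $[\exp(\hat\beta - 1.96\,\hat\sigma_\beta),\ \exp(\hat\beta + 1.96\,\hat\sigma_\beta)]$.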
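Mapping the interval for $\beta = \log\theta$ to the $\theta$ scale is a one-line transformation. The sketch below assumes a point estimate `beta_hat` and its standard error `se_beta`; the input values are hypothetical and are not the paper's estimates.

```python
from math import exp, log

Z_95 = 1.96  # two-sided 95% critical value of the standard normal


def ci_theta(beta_hat, se_beta, z=Z_95):
    """Confidence interval for theta = exp(beta): [exp(beta_hat - z se), exp(beta_hat + z se)]."""
    return exp(beta_hat - z * se_beta), exp(beta_hat + z * se_beta)


# Hypothetical inputs: theta_hat = 1.5 and se(beta_hat) = 0.005.
lo, hi = ci_theta(log(1.5), 0.005)
print(f"theta in [{lo:.4f}, {hi:.4f}]")
```

The interval is symmetric on the log scale, so on the $\theta$ scale it is generally not symmetric around $\hat\theta$, and its endpoints always stay positive.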