COVID-19 infectivity profile correction

by Peter Ashcroft, et al.

The infectivity profile of an individual with COVID-19 is attributed to the paper Temporal dynamics in viral shedding and transmissibility of COVID-19 by He et al., published in Nature Medicine in April 2020. However, the analysis within this paper contains a mistake such that the published infectivity profile is incorrect and the conclusion that infectiousness begins 2.3 days before symptom onset is no longer supported. In this document we discuss the error and compute the correct infectivity profile. We also establish confidence intervals on this profile, quantify the difference between the published and the corrected profiles, and discuss an issue of normalisation when fitting serial interval data. This infectivity profile plays a central role in policy and decision making, thus it is crucial that this issue is corrected with the utmost urgency to prevent the propagation of this error into further studies and policies. We hope that this preprint will reach all researchers and policy makers who are using the incorrect infectivity profile to inform their work.




1 Introduction

While investigating the results of the paper Temporal dynamics in viral shedding and transmissibility of COVID-19 (He et al., 2020), we found an erroneous step in the likelihood calculation which is cause for concern. The consequence of this step is that two datapoints with negative serial interval are dropped from the calculation, without any explicit mention in the text of the manuscript. The inclusion of these datapoints results in an infectiousness profile that is substantially different from the one shown in Figure 1C of the original publication. As a result, infectiousness starts significantly earlier than the reported 2.3 days before the onset of symptoms. We still find, however, a presymptomatic infection fraction of approximately 44%, in agreement with the conclusion of He et al. (2020). Given that the estimate of 2.3 days of infectiousness before symptom onset is highly relevant to the implementation of contact tracing, we believe it is of very high importance to clarify this situation. Our reanalysis suggests that tracing contacts of infected index cases as far back as 2 or 3 days before symptom onset in the index case might not be sufficient to find all secondary cases. In addition, we remark on a less consequential issue with the normalisation of the likelihood, which awards higher weight to transmission pairs with more uncertain symptom onset times of the index case, but does not affect the results significantly. Due to the central position this study currently has in the field, there is a high probability that these errors propagate into future studies. Therefore, a fast response to this issue is crucial. We note that detecting this issue was only possible thanks to the availability and accessibility of the code and data that accompany the publication.

With this letter we address the following three points:

  1. The infectivity profile is computed without erroneously dropping datapoints and we compare this corrected profile with the published profile;

  2. Confidence intervals via likelihood profiling are provided for the infectivity profile;

  3. An issue relating to the normalisation of the likelihood over serial interval ranges is discussed.

2 Results

Figure 1: The infectivity profiles extracted from the serial interval data and the log-normally distributed incubation period, as performed in He et al. (2020). Here we use an adaptive grid search to reconstruct the likelihood landscape over the three parameters of the shifted gamma distribution. The confidence intervals are the range of the infectivity profiles which have a likelihood-ratio test statistic within the 95% quantile of a $\chi^2$ distribution with 3 degrees of freedom when compared to the maximum likelihood estimate. Panel A shows the probability density function and is analogous to Figure 1C of He et al. (2020). The blue dashed line is the maximum likelihood estimate using the method from He et al. (2020), while the solid orange line is the corrected maximum likelihood estimate. Panel B shows the corresponding cumulative distribution functions.

The infectivity profile, $p(t)$, describes the infectiousness of an individual at a time $t$ relative to the onset of their symptoms. When this is convolved with an incubation period distribution [from Li et al. (2020)], one recovers the serial interval distribution, describing the time between symptom onsets in a transmission pair. This approach was used in He et al. (2020), with a fixed incubation period distribution and empirical serial interval distribution, to infer the infectivity profile for COVID-19.

The optimisation procedure maximises the likelihood of observing the empirical serial interval distribution under a model, which is specified by the parameters of the infectivity profile. This profile is parametrised as a shifted gamma distribution. Full details of this procedure can be found in He et al. (2020).
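The convolution described above can be sketched numerically as follows. This is a minimal illustration in Python, not the R code of He et al. (2020): the function names, the midpoint integration grid, and the log-normal parameters `meanlog` and `sdlog` (illustrative values giving a mean incubation period of roughly 5.2 days) are assumptions made for this sketch. The shifted gamma parameters are the "He et al." column of Table 1.

```python
import math

def shifted_gamma_pdf(t, shape, rate, shift):
    """Infectivity profile: gamma density in x = t + shift, zero for t <= -shift."""
    x = t + shift
    if x <= 0.0:
        return 0.0
    log_pdf = (shape * math.log(rate) + (shape - 1.0) * math.log(x)
               - rate * x - math.lgamma(shape))
    return math.exp(log_pdf)

def lognormal_pdf(x, meanlog, sdlog):
    """Incubation period density, zero for non-positive durations."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - meanlog) / sdlog
    return math.exp(-0.5 * z * z) / (x * sdlog * math.sqrt(2.0 * math.pi))

def serial_interval_pdf(s, shape, rate, shift, meanlog, sdlog, step=0.05, upper=40.0):
    """Convolve infectivity profile and incubation period (midpoint rule over infection time t)."""
    n = int((upper + shift) / step)
    total = 0.0
    for i in range(n):
        t = -shift + (i + 0.5) * step          # infection time relative to index onset
        total += shifted_gamma_pdf(t, shape, rate, shift) * lognormal_pdf(s - t, meanlog, sdlog)
    return total * step

he_params = dict(shape=1.5625, rate=0.53125, shift=2.125)   # Table 1, "He et al." column
incubation = dict(meanlog=1.434, sdlog=0.661)               # assumed illustrative values

for s in (-3.0, -1.0, 5.0):
    print(s, serial_interval_pdf(s, **he_params, **incubation))
```

Under these assumptions, a serial interval more negative than the shift has exactly zero density, because infection cannot occur earlier than `shift` days before symptom onset of the index case. This is the situation in which a datapoint receives a probability of zero.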

In the fitting procedure used in the script Fig1c_Rscript.R, which accompanies the original publication, a condition in the return line of the likelihood function restricts the sum of log-probabilities to those terms that are finite.

This condition erroneously drops any datapoint that has a probability of zero (and hence a log-probability of $-\infty$) under the current model parameters. At the shift value with which the optimisation is initiated, two datapoints (54 and 68) have zero probability and are therefore dropped from the beginning of the fit procedure. This then leads to an erroneous maximum likelihood infectiousness profile, which is displayed in Figure 1C of the original manuscript (He et al., 2020). Initiating the fitting procedure at a larger shift leads to convergence to a very different optimum infectiousness profile.
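The effect of such a condition can be illustrated with a toy example. The sketch below is written in Python and is not the original R code; the function name and the per-pair probabilities are hypothetical. It compares the negative log-likelihood with and without discarding non-finite terms.

```python
import math

def negative_log_likelihood(probabilities, drop_nonfinite):
    """Sum of -log(p) over datapoints.

    With drop_nonfinite=True, datapoints with zero probability are silently
    skipped, mimicking the behaviour described in the text.
    """
    total = 0.0
    for p in probabilities:
        log_p = math.log(p) if p > 0.0 else -math.inf
        if drop_nonfinite and math.isinf(log_p):
            continue                      # the datapoint no longer constrains the fit
        total -= log_p
    return total

probs = [0.12, 0.08, 0.0, 0.2]            # hypothetical probabilities; the third is impossible under the model
filtered = negative_log_likelihood(probs, drop_nonfinite=True)
unfiltered = negative_log_likelihood(probs, drop_nonfinite=False)
print(filtered, unfiltered)
```

With the filter, parameter values that make a datapoint impossible still receive a finite score, so an optimiser has no incentive to move towards parameters that can explain that datapoint. Without the filter, the same parameter values are assigned an infinite negative log-likelihood.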

Here we use an adaptive grid search algorithm to scan the three-dimensional parameter space of the shifted gamma distribution which describes the infectiousness profile. We compute the log-likelihood with and without the return condition in the likelihood function at each point in parameter space to construct likelihood surfaces. The maximum likelihood parameter values that we find are enumerated in Table 1.
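The details of the adaptive grid search are not given here, so the following Python sketch shows only one simple variant of the general idea: evaluate the objective on a coarse grid, re-centre on the best point, halve the grid width, and repeat. The function name, the grid sizes, and the toy objective are assumptions for illustration and are not taken from the archived code.

```python
import itertools

def adaptive_grid_search(objective, centre, half_width, points_per_axis=5, rounds=6):
    """Maximise `objective` by evaluating a grid and repeatedly zooming in on the best point."""
    best_point = tuple(centre)
    best_value = objective(best_point)
    widths = list(half_width)
    for _ in range(rounds):
        axes = []
        for c, w in zip(best_point, widths):
            step = 2.0 * w / (points_per_axis - 1)
            axes.append([c - w + i * step for i in range(points_per_axis)])
        for point in itertools.product(*axes):
            value = objective(point)
            if value > best_value:
                best_point, best_value = point, value
        widths = [w / 2.0 for w in widths]    # refine the grid around the current best point
    return best_point, best_value

def toy_log_likelihood(params):
    """Hypothetical stand-in for a log-likelihood surface with its peak at (1.5, 0.5, 2.0)."""
    a, b, c = params
    return -((a - 1.5) ** 2 + (b - 0.5) ** 2 + (c - 2.0) ** 2)

best, value = adaptive_grid_search(toy_log_likelihood, centre=(1.0, 1.0, 1.0), half_width=(2.0, 2.0, 2.0))
print(best, value)
```

A scheme of this kind can miss the global optimum if it lies outside the shrinking search window, so in practice the initial window would need to be wide enough to cover all plausible parameter values.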

parameter       He et al. (2020)    corrected
shape           1.56250             97.18750
rate            0.53125             3.71875
shift (days)    2.12500             25.62500

Table 1: Maximum likelihood parameter estimates based on our adaptive grid search approach using the method of He et al. (2020) and the corrected computation.

We construct confidence intervals around the distribution via likelihood ratio tests, compared to the maximum likelihood estimate (also known as likelihood profiling). This leads to the optimum infectivity profiles and confidence intervals shown in Figure 1. We also find a presymptomatic infection fraction of 45.6% [23.8%,75.8%] using the He et al. (2020) method and 43.7% [26.4%,64.5%] using the corrected profile, where numbers in square brackets represent the 95% confidence interval.
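As a rough sketch of the likelihood ratio criterion, the Python code below computes the 95% quantile of a chi-squared distribution with 3 degrees of freedom (one per parameter of the shifted gamma distribution) and keeps all parameter sets whose test statistic falls below it. The function names and the toy log-likelihood values are hypothetical; the closed-form cumulative distribution function used here is specific to 3 degrees of freedom.

```python
import math

def chi2_cdf_3dof(x):
    """Cumulative distribution function of a chi-squared variable with 3 degrees of freedom."""
    if x <= 0.0:
        return 0.0
    return math.erf(math.sqrt(x / 2.0)) - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

def chi2_quantile_3dof(level, lo=0.0, hi=100.0, tol=1e-10):
    """Invert the cumulative distribution function by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf_3dof(mid) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def confidence_set(log_likelihoods, level=0.95):
    """Return the parameter tuples whose likelihood ratio statistic is within the threshold.

    log_likelihoods: dict mapping parameter tuples to log-likelihood values.
    """
    threshold = chi2_quantile_3dof(level)
    ll_max = max(log_likelihoods.values())
    return [params for params, ll in log_likelihoods.items()
            if 2.0 * (ll_max - ll) <= threshold]

grid = {(1.0,): -10.0, (2.0,): -12.0, (3.0,): -15.0}   # hypothetical log-likelihood values
print(chi2_quantile_3dof(0.95), confidence_set(grid))
```

The confidence band on the profile would then be obtained by evaluating the infectivity profile for every retained parameter set and taking the range of the resulting curves at each time point.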

The correct optimum fits in Figure 1 are smoother than the ones which drop the data points. Although there is some asymmetry in the profiles within the confidence interval, the correct optimum solution has a very large shape parameter and approaches a normal distribution. We can also use these fitted distributions to reconstruct the serial interval (Fig. 2), the distribution of which is broader when all datapoints are taken into account.

Figure 2: Using the maximum likelihood estimates of the infectivity profile from Fig. 1 we can reconstruct the serial interval. We sample infection times from the infectivity profile, and add these to samples from the log-normally distributed incubation period to generate samples of the serial interval. We then plot the probability density of these serial intervals (filled density profiles). We compare this to the serial interval data used in He et al. (2020), where we have added points for each day in the possible serial interval range.
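The sampling procedure described in the caption of Figure 2 can be sketched in Python as follows. The function name, the random seed, and the log-normal parameters `meanlog` and `sdlog` are assumptions for illustration; the shifted gamma parameters are the "corrected" column of Table 1.

```python
import random

def sample_serial_intervals(n, shape, rate, shift, meanlog, sdlog, seed=1):
    """Draw serial intervals as infection time (relative to index onset) plus incubation period."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # random.gammavariate takes a scale parameter, hence 1 / rate
        infection_time = rng.gammavariate(shape, 1.0 / rate) - shift
        incubation_period = rng.lognormvariate(meanlog, sdlog)
        samples.append(infection_time + incubation_period)
    return samples

samples = sample_serial_intervals(
    20000,
    shape=97.1875, rate=3.71875, shift=25.625,   # Table 1, "corrected" column
    meanlog=1.434, sdlog=0.661,                  # assumed illustrative incubation period values
)
negative_share = sum(1 for s in samples if s < 0.0) / len(samples)
print(sum(samples) / len(samples), negative_share)
```

A histogram or kernel density estimate of `samples` then gives the reconstructed serial interval distribution, including its negative part.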

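To further quantify the difference between the published and corrected infectivity profiles, we can use an example based on contact tracing. We use the infectivity profiles to answer the following question: What fraction of presymptomatic infections are traced if we look back $\Delta t$ days from symptom onset? Formally, for an infectivity profile $p(t)$ with time $t$ measured relative to symptom onset, this fraction is defined as

$$ F(\Delta t) = \frac{\int_{-\Delta t}^{0} p(t)\,\mathrm{d}t}{\int_{-\infty}^{0} p(t)\,\mathrm{d}t}. \tag{1} $$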
These values are enumerated in Table 2. We see that while the published infectivity profile suggests that 98% of presymptomatic infections occur within two days before symptom onset, the corrected distribution suggests that only 61% of presymptomatic infections will be traced. Thus the published profile overestimates the efficacy of contact tracing, while the corrected distribution tells us we need to look back at least 4 days to catch 90% of presymptomatic infections.

look-back time (days)    He et al. (2020)      corrected
1                        50% [37%,100%]        33% [19%,51%]
2                        98% [87%,100%]        61% [40%,83%]
3                        100% [100%,100%]      80% [57%,96%]
4                        100% [100%,100%]      91% [71%,99%]
5                        100% [100%,100%]      97% [82%,100%]

Table 2: The fraction of presymptomatic infections that are traced if we look back a given number of days from symptom onset, using the published and corrected infectivity profiles. The computed quantity is described in Eq. 1. Values in brackets represent 95% confidence intervals of this fraction when accounting for the uncertainty in the infectivity profiles.
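The point estimates in Table 2 can be approximated with a short numerical sketch. The Python code below evaluates the traced fraction for a shifted gamma infectivity profile using the maximum likelihood parameters of Table 1. The function names and the midpoint integration rule are assumptions for illustration, and the sketch does not reproduce the confidence intervals.

```python
import math

def shifted_gamma_pdf(t, shape, rate, shift):
    """Infectivity profile: gamma density in x = t + shift, zero for t <= -shift."""
    x = t + shift
    if x <= 0.0:
        return 0.0
    return math.exp(shape * math.log(rate) + (shape - 1.0) * math.log(x)
                    - rate * x - math.lgamma(shape))

def integrate(f, a, b, n=4000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def traced_fraction(look_back_days, shape, rate, shift):
    """Share of presymptomatic infectivity within `look_back_days` of symptom onset."""
    profile = lambda t: shifted_gamma_pdf(t, shape, rate, shift)
    start = max(-look_back_days, -shift)            # the profile is zero before -shift
    presymptomatic = integrate(profile, -shift, 0.0)
    return integrate(profile, start, 0.0) / presymptomatic

published = dict(shape=1.5625, rate=0.53125, shift=2.125)    # Table 1, "He et al." column
corrected = dict(shape=97.1875, rate=3.71875, shift=25.625)  # Table 1, "corrected" column

for days in range(1, 6):
    print(days, traced_fraction(days, **published), traced_fraction(days, **corrected))
```

Because the profile vanishes before the shift, the lower limit of the denominator integral can be replaced by the negative shift. For the published profile the fraction therefore reaches one as soon as the look-back time exceeds the shift of about 2.1 days.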

A second less-consequential problem in the methodology of He et al. (2020) is that a normalisation factor is missing in the likelihood function when considering transmission pairs with serial interval estimates specified by a range. Ignoring this normalisation awards higher weight to transmission pairs with more uncertain symptom onset times of the index case.

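Concretely, the probability under model $M$ to observe a window of symptom onset $[t_1^{\mathrm{L}}, t_1^{\mathrm{R}}]$ of the index case and symptom onset in the secondary case on day $t_2$ is defined in the original manuscript as

$$ P(t_1^{\mathrm{L}}, t_1^{\mathrm{R}}, t_2 \mid M) = \int_{t_1^{\mathrm{L}}}^{t_1^{\mathrm{R}}} \int_{-\infty}^{\infty} p(\tau)\, g(t_2 - t_1 - \tau)\,\mathrm{d}\tau\,\mathrm{d}t_1, \tag{2} $$

where $p$ is the infectivity profile, $g$ is the incubation period distribution, and $\tau$ is the time of infection relative to the symptom onset time $t_1$ of the index case. The outer integral over the symptom onset window of the index case should include an accompanying probability to observe the onset time $t_1$, $P(t_1)$. I.e.

$$ P(t_1^{\mathrm{L}}, t_1^{\mathrm{R}}, t_2 \mid M) = \int_{t_1^{\mathrm{L}}}^{t_1^{\mathrm{R}}} P(t_1) \int_{-\infty}^{\infty} p(\tau)\, g(t_2 - t_1 - \tau)\,\mathrm{d}\tau\,\mathrm{d}t_1. \tag{3} $$

Assuming a uniform distribution for $t_1$ over the window, $P(t_1) = 1/(t_1^{\mathrm{R}} - t_1^{\mathrm{L}})$, this simplifies to

$$ P(t_1^{\mathrm{L}}, t_1^{\mathrm{R}}, t_2 \mid M) = \frac{1}{t_1^{\mathrm{R}} - t_1^{\mathrm{L}}} \int_{t_1^{\mathrm{L}}}^{t_1^{\mathrm{R}}} \int_{-\infty}^{\infty} p(\tau)\, g(t_2 - t_1 - \tau)\,\mathrm{d}\tau\,\mathrm{d}t_1. \tag{4} $$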
We find that including this normalisation has little effect on the location of the optimum fit. However, these full details should have been included in the optimisation procedure.
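The weighting effect of the missing normalisation can be seen in a toy calculation. In the Python sketch below, the serial interval density is replaced by a hypothetical normal density (mean 5 days, standard deviation 4 days) purely for illustration; the function names and the onset windows are likewise assumptions.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def window_likelihood(serial_pdf, onset_lo, onset_hi, secondary_onset, normalise, n=2000):
    """Integrate the serial interval density over the index onset window (midpoint rule).

    With normalise=True the integral is divided by the window width, which
    corresponds to a uniform distribution of the index onset time over the window.
    """
    h = (onset_hi - onset_lo) / n
    integral = h * sum(serial_pdf(secondary_onset - (onset_lo + (i + 0.5) * h))
                       for i in range(n))
    return integral / (onset_hi - onset_lo) if normalise else integral

toy_pdf = lambda s: normal_pdf(s, mean=5.0, sd=4.0)   # hypothetical serial interval density

for lo, hi in ((0.0, 1.0), (0.0, 6.0)):               # narrow and wide index onset windows
    raw = window_likelihood(toy_pdf, lo, hi, secondary_onset=6.0, normalise=False)
    weighted = window_likelihood(toy_pdf, lo, hi, secondary_onset=6.0, normalise=True)
    print((lo, hi), raw, weighted)
```

Without normalisation, the contribution of the wide window is several times larger than that of the narrow window simply because more probability mass is integrated, which is how pairs with more uncertain index onset times gain weight. With normalisation, each pair contributes an average density over its window instead.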

3 Footnotes

The code used to generate these results is archived at


  • He et al. (2020) He X, Lau EHY, Wu P, Deng X, Wang J, Hao X, Lau YC, Wong JY, Guan Y, Tan X, Mo X, Chen Y, Liao B, Chen W, Hu F, Zhang Q, Zhong M, Wu Y, Zhao L, Zhang F, et al. Temporal Dynamics in Viral Shedding and Transmissibility of COVID-19. Nat Med. 2020 May; 26(5):672–675. doi: 10.1038/s41591-020-0869-5.
  • Li et al. (2020) Li Q, Guan X, Wu P, Wang X, Zhou L, Tong Y, Ren R, Leung KSM, Lau EHY, Wong JY, Xing X, Xiang N, Wu Y, Li C, Chen Q, Li D, Liu T, Zhao J, Liu M, Tu W, et al. Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus–Infected Pneumonia. New England Journal of Medicine. 2020; 382(13):1199–1207. doi: 10.1056/NEJMoa2001316.