COVID19-Replication-He-et-al-2020
The infectivity profile of an individual with COVID-19 is attributed to the paper Temporal dynamics in viral shedding and transmissibility of COVID-19 by He et al., published in Nature Medicine in April 2020. However, the analysis within this paper contains a mistake such that the published infectivity profile is incorrect and the conclusion that infectiousness begins 2.3 days before symptom onset is no longer supported. In this document we discuss the error and compute the correct infectivity profile. We also establish confidence intervals on this profile, quantify the difference between the published and the corrected profiles, and discuss an issue of normalisation when fitting serial interval data. This infectivity profile plays a central role in policy and decision making, thus it is crucial that this issue is corrected with the utmost urgency to prevent the propagation of this error into further studies and policies. We hope that this preprint will reach all researchers and policy makers who are using the incorrect infectivity profile to inform their work.
While investigating the results of the paper Temporal dynamics in viral shedding and transmissibility of COVID-19 (He et al., 2020), we found an erroneous step in the likelihood calculation which is cause for concern. The consequence of this step is that two datapoints with negative serial intervals are dropped from the calculation, without explicit mention in the text of the manuscript. Including these datapoints results in an infectiousness profile that is substantially different from the one shown in Figure 1C of the original publication. As a result, infectiousness starts significantly earlier than the reported 2.3 days before the onset of symptoms. We still find, however, a presymptomatic infection fraction of approximately 44%, in agreement with the conclusion of He et al. (2020). Given that the estimate of 2.3 days of infectiousness before symptom onset is highly relevant to the implementation of contact tracing, we believe it is of very high importance to clarify this situation. Our reanalysis suggests that tracing contacts of infected index cases as far back as 2 or 3 days before symptom onset in the index case might not be sufficient to find all secondary cases. In addition, we remark on a less consequential issue with the normalisation of the likelihood, which awards higher weight to transmission pairs with more uncertain symptom onset times of the index case, but does not affect the results significantly. Due to the central position this study currently has in the field, there is a high probability that these errors will propagate into future studies. Therefore, a fast response to this issue is crucial. We note that detecting this issue was only possible thanks to the availability and accessibility of the code and data that accompany the publication.
With this letter we address the following three points:
1. The infectivity profile is computed without erroneously dropping datapoints and we compare this corrected profile with the published profile;
2. Confidence intervals via likelihood profiling are provided for the infectivity profile;
3. An issue relating to the normalisation of the likelihood over serial interval ranges is discussed.
The infectivity profile, $\beta(t)$, describes the infectiousness of an individual at a time $t$ relative to the onset of their symptoms. When this is convolved with an incubation period distribution [from Li et al. (2020)], one recovers the serial interval distribution, which describes the time between symptom onsets in a transmission pair. This approach was used in He et al. (2020), with a fixed incubation period distribution and the empirical serial interval distribution, to infer the infectivity profile for COVID-19.
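The convolution described above can be sketched numerically. This is a minimal Python illustration, not the R code of the original analysis. It assumes the infectivity profile is a gamma density evaluated at time plus shift (the shifted gamma parametrisation), uses the parameter values listed in Table 1, and takes a lognormal incubation period whose parameters (`meanlog=1.434`, `sdlog=0.661`, mean of roughly 5.2 days) are illustrative assumptions rather than values confirmed by the source. All function names are hypothetical.

```python
import math

def gamma_pdf(x, shape, rate):
    """Gamma density; returns 0 for x <= 0 (adequate for shape > 1)."""
    if x <= 0.0:
        return 0.0
    log_pdf = (shape * math.log(rate) + (shape - 1.0) * math.log(x)
               - rate * x - math.lgamma(shape))
    return math.exp(log_pdf)

def lognormal_pdf(x, meanlog=1.434, sdlog=0.661):
    """Illustrative incubation period density (assumed parameter values)."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - meanlog) / sdlog
    return math.exp(-0.5 * z * z) / (x * sdlog * math.sqrt(2.0 * math.pi))

def serial_interval_pdf(s, shape, rate, shift, n=2000):
    """Convolve the shifted gamma infectivity profile with the incubation density.

    u is the infection time relative to symptom onset of the index case.
    The profile is zero for u <= -shift and the incubation period is positive,
    so only -shift < u < s contributes to the integral.
    """
    lower = -shift
    if s <= lower:
        return 0.0
    h = (s - lower) / n
    total = 0.0
    for i in range(n):
        u = lower + (i + 0.5) * h  # midpoint rule
        total += gamma_pdf(u + shift, shape, rate) * lognormal_pdf(s - u)
    return total * h

# (shape, rate, shift in days) as listed in Table 1
HE_ET_AL = (1.5625, 0.53125, 2.125)
CORRECTED = (97.1875, 3.71875, 25.625)

# Density of a serial interval of -3 days under each parameter set
print(serial_interval_pdf(-3.0, *HE_ET_AL))
print(serial_interval_pdf(-3.0, *CORRECTED))
```

Under these assumptions, a serial interval more negative than the shift has exactly zero density, which is why datapoints with strongly negative serial intervals are sensitive to the value of the shift parameter.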
The optimisation procedure maximises the likelihood of observing the empirical serial interval distribution under a model, which is specified by the parameters of the infectivity profile. This profile is parametrised as a shifted gamma distribution. Full details of this procedure can be found in He et al. (2020).
In the fitting procedure used in the script Fig1c_Rscript.R (available at https://github.com/ehylau/COVID-19), the following condition is used in the return line of the likelihood function:
return(-sum(lli[!is.infinite(lli)]))
This condition will erroneously drop any datapoint that has a probability of zero (and hence a log-probability of $-\infty$) under the current model parameters. Because the optimisation is initiated with a shift value under which two datapoints (54 and 68) have zero probability, these datapoints are dropped from the beginning of the fit procedure. This then leads to an erroneous maximum likelihood infectiousness profile, which is displayed in Figure 1C of the original manuscript (He et al., 2020). Initiating the fitting procedure at a shift large enough to give every datapoint a non-zero probability shows convergence to a very different optimum infectiousness profile.
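The effect of the filter can be reproduced in a few lines. The sketch below is a Python analogue of the R return condition, written for illustration only. The probabilities are made-up values and the function names are hypothetical.

```python
import math

def log_probs(probs):
    """Log of each probability, with log(0) represented as -inf."""
    return [math.log(p) if p > 0.0 else -math.inf for p in probs]

def neg_log_likelihood_filtered(probs):
    """Mimics -sum(lli[!is.infinite(lli)]): zero-probability points are dropped."""
    lli = log_probs(probs)
    return -sum(v for v in lli if not math.isinf(v))

def neg_log_likelihood(probs):
    """Keeps every point: one impossible observation makes the total infinite."""
    return -sum(log_probs(probs))

# Hypothetical per-datapoint probabilities; the third point is impossible
# under the current parameters.
probs = [0.2, 0.1, 0.0, 0.05]

print(neg_log_likelihood_filtered(probs))  # finite: the impossible point is ignored
print(neg_log_likelihood(probs))           # inf: the parameters are ruled out
```

With the filter in place, parameter values that make some observations impossible are not penalised for it, so an optimiser can remain in a region of parameter space that the full dataset excludes.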
Here we use an adaptive grid search algorithm to scan the three-dimensional parameter space of the shifted gamma distribution which describes the infectiousness profile. We compute the log-likelihood with and without the return condition in the likelihood function at each point in parameter space to construct likelihood surfaces. The maximum likelihood parameter values that we find are enumerated in Table 1.
Table 1: Maximum likelihood parameter values of the shifted gamma infectivity profile.

parameter | He et al. | corrected |
---|---|---|
shape | 1.56250 | 97.18750 |
rate | 0.53125 | 3.71875 |
shift (days) | 2.12500 | 25.62500 |
We construct confidence intervals around the distribution via likelihood ratio tests, compared to the maximum likelihood estimate (also known as likelihood profiling). This leads to the optimum infectivity profiles and confidence intervals shown in Figure 1. We also find a presymptomatic infection fraction of 45.6% [23.8%,75.8%] using the He et al. (2020) method and 43.7% [26.4%,64.5%] using the corrected profile, where numbers in square brackets represent the 95% confidence interval.
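The idea of a likelihood-ratio confidence interval on a parameter grid can be illustrated with a toy model. The sketch below uses a one-parameter exponential model and made-up data, not the three-parameter shifted gamma fit of this letter. The threshold of 3.841 is the 95% quantile of a chi-squared distribution with one degree of freedom, and whether the same threshold applies to the fit described here is not stated in the text, so treat it as an assumption.

```python
import math

def log_likelihood(rate, data):
    """Log-likelihood of an exponential model (toy stand-in for the real model)."""
    return len(data) * math.log(rate) - rate * sum(data)

def likelihood_ratio_interval(data, grid, threshold=3.841):
    """Return (mle, lower, upper) over the grid.

    A grid point is accepted if 2 * (max_ll - ll) <= threshold, i.e. if a
    likelihood ratio test against the maximum would not reject it.
    """
    values = [(log_likelihood(r, data), r) for r in grid]
    max_ll, mle = max(values)
    accepted = [r for ll, r in values if 2.0 * (max_ll - ll) <= threshold]
    return mle, min(accepted), max(accepted)

data = [1.2, 0.7, 3.1, 2.4, 0.9, 1.8]      # hypothetical observations
grid = [i / 100.0 for i in range(1, 301)]  # candidate rates 0.01 .. 3.00

mle, lower, upper = likelihood_ratio_interval(data, grid)
print(mle, lower, upper)
```

In a multi-parameter setting the same acceptance rule selects a region of parameter space, and the envelope of the profiles in that region gives the confidence band around the fitted curve.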
The correct optimum fits in Figure 1 are smoother than the ones which drop the data points. Although there is some asymmetry in the profiles within the confidence interval, the correct optimum solution has a very large shape parameter and approaches a normal distribution. We can also use these fitted distributions to reconstruct the serial interval (Fig. 2), the distribution of which is broader when all datapoints are taken into account.
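To further quantify the difference between the published and corrected infectivity profiles, we use an example based on contact tracing. We use the infectivity profiles to answer the following question: what fraction of presymptomatic infections are traced if we look back $\tau$ days from symptom onset? Formally, with $\beta(t)$ the infectivity profile at time $t$ relative to symptom onset, this fraction is defined as

$$f(\tau) = \frac{\int_{-\tau}^{0} \beta(t)\,\mathrm{d}t}{\int_{-\infty}^{0} \beta(t)\,\mathrm{d}t}. \tag{1}$$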
These values are enumerated in Table 2. While the published infectivity profile suggests that 98% of presymptomatic infections occur within two days before symptom onset, the corrected distribution suggests that only 61% of presymptomatic infections will be traced with this look-back window. Thus the published profile overestimates the efficacy of contact tracing, while the corrected distribution tells us we need to look back at least 4 days to catch 90% of presymptomatic infections.
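Table 2: Fraction of presymptomatic infections traced when looking back a given number of days from symptom onset (confidence intervals in square brackets).

look-back time (days) | He et al. | corrected |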
---|---|---|
1 | 50% [37%,100%] | 33% [19%,51%] |
2 | 98% [87%,100%] | 61% [40%,83%] |
3 | 100% [100%,100%] | 80% [57%,96%] |
4 | 100% [100%,100%] | 91% [71%,99%] |
5 | 100% [100%,100%] | 97% [82%,100%] |
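The point estimates in Table 2 can be approximated directly from the parameters in Table 1. The Python sketch below assumes that the infectivity profile is a gamma density evaluated at time plus shift, so that the presymptomatic fraction is the gamma cumulative distribution at the shift. The cumulative distribution is obtained by simple numerical integration, and the function names are hypothetical.

```python
import math

def gamma_pdf(x, shape, rate):
    """Gamma density; returns 0 for x <= 0 (adequate for shape > 1)."""
    if x <= 0.0:
        return 0.0
    log_pdf = (shape * math.log(rate) + (shape - 1.0) * math.log(x)
               - rate * x - math.lgamma(shape))
    return math.exp(log_pdf)

def gamma_cdf(x, shape, rate, n=4000):
    """Composite Simpson integration of the density on [0, x] (n must be even)."""
    if x <= 0.0:
        return 0.0
    h = x / n
    total = gamma_pdf(0.0, shape, rate) + gamma_pdf(x, shape, rate)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * gamma_pdf(i * h, shape, rate)
    return total * h / 3.0

def presymptomatic_fraction(shape, rate, shift):
    """Share of infectiousness that falls before symptom onset."""
    return gamma_cdf(shift, shape, rate)

def traced_fraction(lookback, shape, rate, shift):
    """Share of presymptomatic infectiousness within `lookback` days of onset."""
    total = gamma_cdf(shift, shape, rate)
    missed = gamma_cdf(shift - lookback, shape, rate)
    return (total - missed) / total

# (shape, rate, shift in days) as listed in Table 1
HE_ET_AL = (1.5625, 0.53125, 2.125)
CORRECTED = (97.1875, 3.71875, 25.625)

for days in range(1, 6):
    print(days,
          round(traced_fraction(days, *HE_ET_AL), 2),
          round(traced_fraction(days, *CORRECTED), 2))
```

This sketch reproduces point estimates only. The confidence intervals in Table 2 require the likelihood surface and are not computed here.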
A second less-consequential problem in the methodology of He et al. (2020) is that a normalisation factor is missing in the likelihood function when considering transmission pairs with serial interval estimates specified by a range. Ignoring this normalisation awards higher weight to transmission pairs with more uncertain symptom onset times of the index case.
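Concretely, the probability under a model $M$ to observe a window of symptom onset of the index case $[t_1, t_2]$ and symptom onset in the secondary case on day $t_s$ is defined in the original manuscript as

$$P(t_s, [t_1, t_2] \mid M) = \int_{t_1}^{t_2} \int_{-\infty}^{t_s} \beta(t - t_i)\, g(t_s - t)\,\mathrm{d}t\,\mathrm{d}t_i, \tag{2}$$

where $\beta$ is the infectivity profile, $g$ is the incubation period distribution, $t_i$ is the symptom onset time of the index case and $t$ is the infection time of the secondary case. The outer integral over the symptom onset window of the index case should include an accompanying probability to observe the onset time $t_i$, $P(t_i)$, i.e.

$$P(t_s, [t_1, t_2] \mid M) = \int_{t_1}^{t_2} P(t_i) \int_{-\infty}^{t_s} \beta(t - t_i)\, g(t_s - t)\,\mathrm{d}t\,\mathrm{d}t_i. \tag{3}$$

Assuming a uniform distribution for $t_i$ over the window, $P(t_i) = 1/(t_2 - t_1)$, this simplifies to

$$P(t_s, [t_1, t_2] \mid M) = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} \int_{-\infty}^{t_s} \beta(t - t_i)\, g(t_s - t)\,\mathrm{d}t\,\mathrm{d}t_i. \tag{4}$$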
We find that including this normalisation has little effect on the location of the optimum fit. However, these full details should have been included in the optimisation procedure.
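The weighting effect of the missing normalisation can be seen in a simplified setting. The Python sketch below collapses the inner integral into a serial interval density, which is assumed to be normal with a mean of 5 days and a standard deviation of 3 days purely for illustration. It then compares the likelihood of a narrow and a wide index onset window, with and without division by the window width. All names and numbers are hypothetical.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def window_likelihood(t1, t2, ts, mean=5.0, sd=3.0, normalise=False):
    """Likelihood of secondary onset on day ts given index onset in [t1, t2].

    The serial interval is ts - ti, so integrating over ti in [t1, t2]
    corresponds to serial intervals in [ts - t2, ts - t1].
    """
    integral = normal_cdf(ts - t1, mean, sd) - normal_cdf(ts - t2, mean, sd)
    return integral / (t2 - t1) if normalise else integral

ts = 5.0
narrow = (-0.5, 0.5)  # index onset known to within one day
wide = (-2.0, 2.0)    # index onset known to within four days

for label, (t1, t2) in (("narrow", narrow), ("wide", wide)):
    print(label,
          window_likelihood(t1, t2, ts),
          window_likelihood(t1, t2, ts, normalise=True))
```

Without the normalisation, the wider window yields the larger likelihood simply because it integrates over more probability mass. With the normalisation, the likelihood is the average density over the window, so the more uncertain pair no longer receives extra weight.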
The code used to generate these results is archived at https://zenodo.org/badge/latestdoi/278170144.