Simpson's paradox in Covid-19 case fatality rates: a mediation analysis of age-related causal effects

by   Julius von Kugelgen, et al.
Max Planck Society

We point out an example of Simpson's paradox in COVID-19 case fatality rates (CFRs): comparing data from >72,000 cases from China with data from Italy reported on March 9th, we find that CFRs are lower in Italy for each age group, but higher overall. This phenomenon can be explained by a stark difference in case demographic between the two countries. Using this as a motivating example, we review basic concepts from mediation analysis and show how these can be used to quantify different direct and indirect effects when assuming a coarse-grained causal graph involving country, age, and mortality. As a case study, we then investigate how total, direct, and indirect (age-mediated) causal effects between China and Italy evolve over two months until May 7th 2020.





The following analysis is preliminary and mainly intended for educational purposes. It is not meant to be predictive or prescriptive, and we kindly ask the reader not to cite it as such. We are neither epidemiologists nor experts on the novel coronavirus, and there are a number of more sophisticated models out there. The data used are outdated at the time of writing and may, in fact, not even be comparable in the first place due to a number of issues such as discrepancies in reporting and testing practices across countries. Nevertheless, the analysis below may still illustrate some fundamental and useful concepts in causal inference that facilitate reasoning about different causal hypotheses regarding the ongoing SARS-CoV-2 pandemic, in particular the attribution of mortality to different factors.

1 Introduction: the SARS-CoV-2 pandemic of 2019/20

The 2019–20 coronavirus pandemic originates from a virus, referred to as the 2019 novel coronavirus or as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes the infectious disease called Covid-19 (Coronaviridae Study Group of the International Committee on Taxonomy of Viruses, 2020). After an outbreak was identified in Wuhan, China, in December 2019, cases started being reported across multiple countries all over the world, ultimately leading the World Health Organization (WHO) to declare it a pandemic on 11 March 2020 (WHO, 2020). As of 12 May 2020, the pandemic had led to more than 287,800 confirmed deaths and more than 4.2 million confirmed cases, spreading across 187 countries.

One of the most cited indicators regarding the disease is the reported case fatality rate (CFR), which indicates the proportion of confirmed cases which end fatally. In this work, we illustrate how tools from causal inference, and in particular mediation analysis, can help interpret data related to the epidemic and better compare CFRs across different countries. Our analysis starts from the observation of a peculiar statistical paradox involving data from China and Italy, which we introduce in the next section.

2 Comparing Covid-19 case fatality rates across China and Italy

When comparing case fatality rates (CFRs) of Covid-19 for different age groups (i.e., the proportion of confirmed Covid-19 cases within a given age group which end fatally) reported by the Chinese Center for Disease Control and Prevention (CCDCP; Wu and McGoogan, 2020) with preliminary CFRs in Italy as reported on March 9 by the Italian National Institute of Health (ISS; Istituto Superiore di Sanità, 2020), a seemingly strange pattern can be observed:

  • for all age groups, CFRs in Italy are lower than those in China;

  • but the total CFR in Italy is higher than that in China.

Figure 1: (a) Snapshot of Covid-19 case fatality rates (CFRs) in Italy and China by age group and in aggregated form (“Total”, last pair of bars), i.e., including all confirmed cases up to the time of reporting (see legend). (b) Proportion of all confirmed cases included in (a) within each age group by country. Sources: Wu and McGoogan (2020) and Istituto Superiore di Sanità (2020), see Table 1 for exact numbers.

Age group | Italy               | China
0–9       | 0% (0/43)           | 0% (0/416)
10–19     | 0% (0/85) *         | 0.2% (1/549)
20–29     | 0% (0/296) *        | 0.2% (7/3,619)
30–39     | 0% (0/470) *        | 0.2% (18/7,600)
40–49     | 0.1% (1/891) *      | 0.4% (38/8,571)
50–59     | 0.2% (3/1,453) *    | 1.3% (130/10,008)
60–69     | 2.5% (37/1,471) *   | 3.6% (309/8,583)
70–79     | 6.4% (114/1,785) *  | 8% (312/3,918)
≥80       | 13.2% (202/1,532) * | 14.8% (208/1,408)
Total     | 4.3% (357/8,342)    | 2.3% (1,023/44,672) *
Table 1: Comparison of case fatality rates (CFRs) by age group for Italy and China (deaths/confirmed cases in brackets). The lower CFR in each row is marked with an asterisk. Sources: Wu and McGoogan (2020) and Istituto Superiore di Sanità (2020).

This pattern is illustrated in Figure 1(a) (see Table 1 for exact numbers). It constitutes a textbook example of a statistical phenomenon known as Simpson’s paradox (or Simpson’s reversal), which refers to the observation that aggregating data across subpopulations (here, age groups) may yield trends opposite to those found when considering the subpopulations separately, and may thus lead to reversed conclusions (Simpson, 1951).

But how can such a pattern be explained? The key to understanding the phenomenon lies in the fact that we are dealing with relative frequencies: the CFRs shown in percent in Figure 1(a) are ratios and correspond to the conditional probabilities of fatality given a case from a particular age group and country. However, such percentages conceal the absolute numbers of cases within each age group. Considering these absolute numbers (shown in brackets after the CFRs in Table 1) sheds light on how the phenomenon can arise: the distribution of cases across age groups differs significantly between the two countries, i.e., there is a statistical association between the country of reporting and the number of confirmed cases per age group. In particular, Italy recorded a much higher proportion of confirmed cases in older patients compared to China. This is illustrated in Figure 1(b) (see Table 2 for exact numbers).

Age group | Italy   | China
0–9       | 0.5%    | 0.9% *
10–19     | 1.0%    | 1.2% *
20–29     | 3.5%    | 8.1% *
30–39     | 5.6%    | 17.0% *
40–49     | 10.7%   | 19.2% *
50–59     | 17.4%   | 22.4% *
60–69     | 17.7%   | 19.2% *
70–79     | 21.4% * | 8.8%
≥80       | 18.4% * | 3.2%
Table 2: Proportion of confirmed cases from Table 1 by age group. The larger proportion in each row is marked with an asterisk.

Age group | Italy   | China
0–9       | 8.3%    | 11.9% *
10–19     | 9.5%    | 11.6% *
20–29     | 10.1%   | 12.9% *
30–39     | 11.6%   | 15.9% *
40–49     | 14.9%   | 15% *
50–59     | 15.8% * | 15.4%
60–69     | 12.4% * | 10.5%
70–79     | 10% *   | 5%
≥80       | 7.5% *  | 1.8%
Table 3: Age demographic (of the general population) for Italy and China. The larger proportion in each row is marked with an asterisk.

While most cases in China fell into the age range of 30–59, the majority of cases reported in Italy were in people aged 60 and over, who are thought to be at higher risk from Covid-19 in general, as supported by the increase in CFRs with age shown in Figure 1(a) for both countries. The observed difference may partly stem from the fact that the Italian population in general is older than the Chinese one, with median ages of 45.4 and 38.4 respectively (see Table 3 for full age demographics of both countries), but additional factors such as different testing strategies implemented in the two countries and different patterns in the social contacts among older and younger generations (e.g., Mossong et al., 2008) may also play a role.

In summary, the larger share of confirmed cases in elderly people in Italy shown in Figure 1(b), combined with the fact that the elderly are generally at higher risk when contracting Covid-19, explains the mismatch between total CFR and CFRs segregated by age group shown in Figure 1(a) and thus gives rise to Simpson’s paradox in the data.
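The reversal can be checked directly from the counts in Table 1. The following is a minimal sketch rather than part of the original analysis; the variable names are illustrative, and the 0–9 case count for China is taken as 416, inferred from the total of 44,672 minus the other age groups.

```python
# Deaths and confirmed cases per age group (0-9, 10-19, ..., 80+), from Table 1.
# China's 0-9 case count (416) is inferred as the total of 44,672 minus the
# other age groups; treat it as an assumption.
ITALY_DEATHS = [0, 0, 0, 0, 1, 3, 37, 114, 202]
ITALY_CASES = [43, 85, 296, 470, 891, 1453, 1471, 1785, 1532]
CHINA_DEATHS = [0, 1, 7, 18, 38, 130, 309, 312, 208]
CHINA_CASES = [416, 549, 3619, 7600, 8571, 10008, 8583, 3918, 1408]

italy_cfr = [d / n for d, n in zip(ITALY_DEATHS, ITALY_CASES)]
china_cfr = [d / n for d, n in zip(CHINA_DEATHS, CHINA_CASES)]

# Within every age group, Italy's CFR is never higher than China's ...
italy_never_higher = all(i <= c for i, c in zip(italy_cfr, china_cfr))

# ... yet the aggregated CFR ("Total" row/column of Table 1) is higher in Italy.
italy_total = 357 / 8342
china_total = 1023 / 44672
italy_higher_overall = italy_total > china_total

print(italy_never_higher, italy_higher_overall)
```

Both flags being true at the same time is exactly the pattern described in section 2.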

As a further remark, the observed phenomenon can indeed only be explained if there is some association between the country and the number of confirmed cases per age group: if we simply took a weighted average of the CFRs from Table 1 shown in Figure 1(a) using the same weights for both countries, Simpson’s paradox could not arise, since $a_i < b_i$ for all $i$ implies that $\sum_i w_i a_i < \sum_i w_i b_i$ for any set of non-negative weights $w_i$ that sum to one.
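To illustrate this remark, the sketch below applies the same weights to both countries' age-specific CFRs. The three weightings are arbitrary illustrative choices (they are not from the paper), and China's 0–9 case count of 416 is inferred from the table totals.

```python
# Deaths and confirmed cases per age group (0-9, ..., 80+), from Table 1.
# China's 0-9 case count (416) is inferred from the table totals (assumption).
ITALY_DEATHS = [0, 0, 0, 0, 1, 3, 37, 114, 202]
ITALY_CASES = [43, 85, 296, 470, 891, 1453, 1471, 1785, 1532]
CHINA_DEATHS = [0, 1, 7, 18, 38, 130, 309, 312, 208]
CHINA_CASES = [416, 549, 3619, 7600, 8571, 10008, 8583, 3918, 1408]

italy_cfr = [d / n for d, n in zip(ITALY_DEATHS, ITALY_CASES)]
china_cfr = [d / n for d, n in zip(CHINA_DEATHS, CHINA_CASES)]


def weighted_cfr(cfrs, weights):
    """Weighted average of per-age CFRs; weights are normalised to sum to one."""
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, cfrs)) / total


# Illustrative weightings, each applied to BOTH countries.
shared_weightings = {
    "uniform": [1.0] * len(ITALY_CASES),
    "China's case counts": CHINA_CASES,
    "Italy's case counts": ITALY_CASES,
}

# With shared weights, Italy's aggregate stays below China's: no reversal.
no_reversal = {
    name: weighted_cfr(italy_cfr, w) < weighted_cfr(china_cfr, w)
    for name, w in shared_weightings.items()
}
print(no_reversal)
```

The reversal in Table 1 therefore needs country-specific weights, i.e., a dependence between country and case demographic.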

3 A causal view

While the previous reasoning provides a perfectly consistent explanation in a statistical sense, the phenomenon may still seem puzzling as it can defy our causal intuition, similar to how an optical illusion defies our visual intuition. Humans appear to naturally read conditional probabilities as causal effects, which can lead to inconsistent conclusions and may leave one wondering: how can the disease in Italy be less fatal for the young, less fatal for the old, but more fatal for the population as a whole? It is because we ascribe causal meaning to probabilistic statements that the reversal of (conditional) probabilities in section 2 is perceived as, and referred to as, a “paradox”.

The aspiration to extrapolate causal conclusions from observational data is particularly strong in the context of a pandemic, during which many inherently causal questions are naturally asked. For example, politicians and citizens may want to evaluate and compare different strategies to fight the disease by asking interventional (“what if …?”) or counterfactual (“what would have happened if …?”) questions.

However, we should be very careful if we want to give a causal interpretation of the data at hand. Tables 1 and 2 show only correlational data, and thus additional considerations are required since it is a well-known scientific mantra that “correlation does not imply causation”. A link between these two modalities is given by the common cause principle.

Principle 1 (Common cause principle).

Any statistical dependence between two variables $X$ and $Y$ must have a causal explanation in that either (i) $X$ causes $Y$ (denoted $X \to Y$), (ii) $Y$ causes $X$ ($Y \to X$), or (iii) $X$ and $Y$ have a common cause $Z$, i.e., $X \leftarrow Z \to Y$ (Reichenbach, 1956). [Footnote 1: Note that (i) and (ii) can be seen as special cases of (iii) with $Z = X$ and $Z = Y$, respectively, hence justifying the name.]

The common cause principle indicates that different causal models can explain the same statistical dependence pattern equally well. Applying this idea, for example, to the observed correlation between country and case demographic shown in Figure 1(b), and more generally to the Simpson’s reversal example of Figure 1(a) described in section 2, it is clear that multiple different causal models could be employed to capture the phenomenon, while differing in their causal interpretations of the data. To resolve this ambiguity, we therefore need to make additional assumptions and specify an underlying causal structure as a basis for further analysis.

In other words, choosing a model is a problem that needs to be addressed before carrying out any causal analysis: “no causes in, no causes out” (Cartwright et al., 1994). However, once specified, the causal model dictates how to interpret the data, thus effectively resolving any apparent “paradoxes”.

3.1 Assumptions

We now state our assumptions about the causal relationships between the involved variables, which are most easily articulated in the form of causal diagrams, or causal graphs. For now, we consider the following three variables which appear in Table 1:

  • the country $C$ by which a confirmed case is reported, modelled as a categorical variable;

  • the age group $A$ of a positively-tested patient, an ordinal variable with 10-year intervals as values;

  • and the medical outcome, or mortality, $M$, a binary variable indicating whether a patient has died by the time of reporting ($M = 1$) or not ($M = 0$).

Let us stress that the age variable in the data reflects the case demographic, i.e., the age distribution among the positively-tested cases only, and not the general demographic of the country. This will be further discussed later.

Data generating process and causal graph

We will assume the causal graph shown in Figure 2, and motivate it by thinking of the following data-generating process:

  1. Choosing a country $C$ at random.

  2. Given the selected country, sampling a positively-tested patient with age group $A$.

  3. Conditional on the choice of country $C$ and age group $A$, sampling the medical outcome, or mortality, $M$.

This is clearly a very simple and coarse-grained view of what is known to be a complex underlying phenomenon. As a consequence, we abstract away various influences and mechanisms within the arrows in our causal graph. In particular, this view encompasses at least the following influences:

  • The arrow $C \to A$ encodes that the age distribution of cases depends on the country. This difference might partly be due to a general difference in age demographic between countries. Furthermore, the country influences the age group of a confirmed case not only through the demographic of the overall population, but also potentially via other mechanisms such as inter-generational mixing or age-targeted social distancing.

  • The arrow $A \to M$ reflects the uncontroversial notion that the disease is more dangerous for the elderly than for the young. Age therefore clearly influences mortality through the general health condition of a patient.

  • The arrow $C \to M$ represents and summarises country-specific influences on mortality other than age, e.g., approaches to testing, lockdown strategy and other non-pharmaceutical interventions, face mask policy and adoption by the population, and medical infrastructure such as availability of ventilators, intensive care units (ICUs), and personal protective equipment (PPE).

Figure 2: Assumed coarse-grained causal graph for the relationship between country $C$, age group $A$, and mortality, or medical outcome, $M$, with arrows $C \to A$, $A \to M$, and $C \to M$. Note how, within this view, age acts as a mediator of the effect of country on mortality.
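The three-step data-generating process behind this graph can be sketched as ancestral sampling along $C \to A \to M$. This is only an illustration under stated assumptions: the country is drawn uniformly, the conditional distributions are plugged in from the counts in Table 1, China's 0–9 case count of 416 is inferred from the table totals, and the names `TABLE_1` and `sample_case` are hypothetical.

```python
import random

AGE_GROUPS = ["0-9", "10-19", "20-29", "30-39", "40-49",
              "50-59", "60-69", "70-79", "80+"]

# country -> (deaths per age group, confirmed cases per age group), from Table 1.
# China's 0-9 case count (416) is inferred from the table totals (assumption).
TABLE_1 = {
    "Italy": ([0, 0, 0, 0, 1, 3, 37, 114, 202],
              [43, 85, 296, 470, 891, 1453, 1471, 1785, 1532]),
    "China": ([0, 1, 7, 18, 38, 130, 309, 312, 208],
              [416, 549, 3619, 7600, 8571, 10008, 8583, 3918, 1408]),
}


def sample_case(rng):
    """Draw one (country, age group, mortality) triple following C -> A -> M."""
    # Step 1: choose a country (uniformly here, which is an assumption).
    country = rng.choice(sorted(TABLE_1))
    deaths, cases = TABLE_1[country]
    # Step 2: sample the age group from the empirical P(A | C).
    idx = rng.choices(range(len(AGE_GROUPS)), weights=cases)[0]
    # Step 3: sample mortality from the empirical P(M = 1 | C, A), i.e. the CFR.
    died = rng.random() < deaths[idx] / cases[idx]
    return country, AGE_GROUPS[idx], int(died)


rng = random.Random(0)  # fixed seed so that repeated runs give the same draws
samples = [sample_case(rng) for _ in range(5)]
print(samples)
```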

Causal sufficiency

In addition to the above causal graph, we assume causal sufficiency, meaning that all common causes of $C$, $A$, and $M$ are observed. In other words, there are no hidden confounders. We assume causal sufficiency for now for the sake of the following analysis and discuss the role of unobserved variables further in section 6.

Observational sample

Further, we assume that the CFRs in Table 1 and the proportion of cases by age group in Table 2 are based on an observational sample and thus constitute estimates of $\mathbb{E}[M \mid A, C] = P(M = 1 \mid A, C)$ and $P(A \mid C)$, respectively. As with causal sufficiency, we expect this assumption to hold only approximately and discuss its limitations in section 6.

3.2 Total causal effect (TCE) of country on mortality

Having stated clearly our causal assumptions in the previous section, we are now in the position to compute causal effects and answer causal queries. Given the example in section 2, the first such query we are interested in is the overall causal effect of country on mortality, i.e., an answer to the following question.

$Q_{\mathrm{TCE}}$: “What would be the effect on mortality of changing country from China to Italy?”

The answer to this query is called the total causal effect (TCE). It is defined as follows.

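Definition 1 (TCE).

The TCE of a binary treatment $T$ on an outcome $Y$ is defined as the interventional contrast

$$\mathrm{TCE}_{0 \to 1} = \mathbb{E}[Y \mid \mathrm{do}(T = 1)] - \mathbb{E}[Y \mid \mathrm{do}(T = 0)],$$

where the expectations are taken over the interventional distributions $P(Y \mid \mathrm{do}(T = 1))$ and $P(Y \mid \mathrm{do}(T = 0))$.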

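Note that in our example (i.e., according to the causal graph in Figure 2), the country takes the role of a treatment that affects the outcome mortality (denoted by $T$ and $Y$, respectively, in Definition 1). To address $Q_{\mathrm{TCE}}$ (i.e., to quantify the expected change in mortality if country were changed), we thus need to compute

$$\mathrm{TCE}_{\text{China} \to \text{Italy}} = \mathbb{E}[M \mid \mathrm{do}(C = \text{Italy})] - \mathbb{E}[M \mid \mathrm{do}(C = \text{China})]. \tag{1}$$

From the assumed causal graph, causal sufficiency, and the rules of do-calculus (Pearl, 2009), it follows that for our setting $P(M \mid \mathrm{do}(C = \text{China})) = P(M \mid C = \text{China})$ and $P(M \mid \mathrm{do}(C = \text{Italy})) = P(M \mid C = \text{Italy})$. We can thus compute (1) as

$$\mathrm{TCE}_{\text{China} \to \text{Italy}} = \mathbb{E}[M \mid C = \text{Italy}] - \mathbb{E}[M \mid C = \text{China}] \approx 4.3\% - 2.3\% = 2.0\%.$$

Note that this corresponds to the difference of total CFRs reported in the last row ("Total") of Table 1. This means that the difference of total CFRs indeed constitutes a causal effect, and changing country from China to Italy would lead to an overall increase in CFR of about 2.0 percentage points (given the data in Table 1 and subject to our modelling assumptions).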
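As a small worked check of this step, the TCE can be computed from the totals in Table 1. The sketch relies on the modelling assumptions above (assumed graph, no hidden confounders); the names `TOTALS` and `total_cfr` are illustrative.

```python
# "Total" entries of Table 1: (deaths, confirmed cases) per country.
TOTALS = {"Italy": (357, 8342), "China": (1023, 44672)}


def total_cfr(country):
    """Observed overall CFR, an estimate of E[M | C = country]."""
    deaths, cases = TOTALS[country]
    return deaths / cases


# Under the assumed graph with no hidden confounders,
# E[M | do(C = c)] = E[M | C = c], so the TCE is the plain
# difference of the observed total CFRs.
tce_china_to_italy = total_cfr("Italy") - total_cfr("China")
print(f"TCE (China -> Italy): {100 * tce_china_to_italy:.1f} percentage points")
```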

3.3 Asking “why?”: beyond total effects

While computing the TCE is the principled and correct way to quantify the causal influence on mortality of changing the country $C$, it does not (necessarily) help us understand what drives the difference between the two countries, i.e., why it exists in the first place. In other words, we may also be interested in the mechanisms which give rise to the different CFRs observed across different countries.

Motivated by the fact that the age of patients was crucial for explaining the instance of Simpson’s paradox in section 2, we now seek to better understand the role of age $A$ as a mediator of the effect of country $C$ on mortality $M$. This seems particularly relevant from the perspective of a country which, being unable to influence the age distribution of its general population, has only very limited control over the age demographic across confirmed cases and may thus wish to factor out age-related effects. [Footnote 2: The demographic of confirmed cases can, of course, also be influenced, e.g., via measures such as targeted isolation of the elderly; see also the discussion of the arrow $C \to A$ in section 3.1.]

However, such considerations about the role of potential mediators are not reflected within the TCE, as evident from the absence of the age variable from (1). We therefore now turn to the field of mediation analysis.

4 Mediation analysis

We start with the obvious but important observation that the country causally influences mortality along two different paths:

  • a direct path $C \to M$, giving rise to a direct effect; [Footnote 3: Recall though that the direct effect of country on mortality is likely mediated by additional variables which are subsumed in $C \to M$ in the current view; see section 6 for further discussion.]

  • an indirect path $C \to A \to M$ mediated by $A$, giving rise to an indirect effect.

The TCE of $C$ on $M$ considered in section 3.2 thus comprises both direct and indirect effects. In mediation analysis, the aim is to quantify such direct and indirect effects and, ideally, decompose the TCE into a direct and an indirect contribution, though, as we shall see, the latter is not possible in general. The main challenge is that any changes to the country will propagate along both direct and indirect paths, making it difficult to isolate the different effects.

We start by reviewing the main tools and concepts of mediation analysis, mainly following the exposition of Pearl (2001). Using the running example of Covid-19 CFRs in China and Italy, we motivate each formula with a natural language query that the corresponding quantity addresses (similar to $Q_{\mathrm{TCE}}$ in section 3.2) and perform an example computation using the data from Table 1. First, we consider direct effects, and then turn to indirect ones. In both cases, the shared main idea is to let changes propagate only along one path while somehow controlling or fixing the other path.

4.1 Controlled direct effect (CDE)

The simplest way to measure a direct effect is by changing the treatment (country) while keeping the mediator fixed to a particular value, thereby blocking the flow of influence along the indirect path. For example, we may ask about the effect of switching country on mortality for a particular age group such as 50–59 year-olds:

$Q_{\mathrm{CDE}}$: “For 50–59 year-olds, is it safer to get the disease in China or in Italy?”

Because it involves actively controlling the value of the mediator, the answer to such a query is referred to as the average controlled direct effect (CDE). It is defined as follows.

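Definition 2 (CDE).

The CDE of a binary treatment $T$ on an outcome $Y$ with mediator $X$ fixed to $x$ is given by the interventional contrast

$$\mathrm{CDE}_{0 \to 1}(x) = \mathbb{E}[Y \mid \mathrm{do}(T = 1, X = x)] - \mathbb{E}[Y \mid \mathrm{do}(T = 0, X = x)], \tag{2}$$

where the expectations are taken over the corresponding interventional distributions $P(Y \mid \mathrm{do}(T = t, X = x))$.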

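To address $Q_{\mathrm{CDE}}$ in our example, we thus need to compute

$$\mathrm{CDE}_{\text{China} \to \text{Italy}}(\text{50–59}) = \mathbb{E}[M \mid \mathrm{do}(C = \text{Italy}, A = \text{50–59})] - \mathbb{E}[M \mid \mathrm{do}(C = \text{China}, A = \text{50–59})] \approx 0.2\% - 1.3\% = -1.1\%.$$

This corresponds to the difference between CFRs across the two countries within a particular age group, i.e., the difference of the two CFRs for that age group in Table 1. Hence, the answer to $Q_{\mathrm{CDE}}$ is that for this age group it is safer to get the disease in Italy, with a resulting change in CFR of about −1.1 percentage points.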
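The same computation can be repeated for every age group. The sketch below is illustrative only: it assumes causal sufficiency, so that the interventional quantities reduce to the observed CFRs in Table 1, and it takes China's 0–9 case count as 416 (inferred from the table totals).

```python
AGE_GROUPS = ["0-9", "10-19", "20-29", "30-39", "40-49",
              "50-59", "60-69", "70-79", "80+"]

# Deaths and confirmed cases per age group, from Table 1.
# China's 0-9 case count (416) is inferred from the table totals (assumption).
ITALY_DEATHS = [0, 0, 0, 0, 1, 3, 37, 114, 202]
ITALY_CASES = [43, 85, 296, 470, 891, 1453, 1471, 1785, 1532]
CHINA_DEATHS = [0, 1, 7, 18, 38, 130, 309, 312, 208]
CHINA_CASES = [416, 549, 3619, 7600, 8571, 10008, 8583, 3918, 1408]

# CDE(a) = E[M | C=Italy, A=a] - E[M | C=China, A=a]: one value per age group.
cde_china_to_italy = {
    age: i_d / i_n - c_d / c_n
    for age, i_d, i_n, c_d, c_n in zip(
        AGE_GROUPS, ITALY_DEATHS, ITALY_CASES, CHINA_DEATHS, CHINA_CASES
    )
}

for age, effect in cde_china_to_italy.items():
    print(f"{age:>5}: {100 * effect:+.2f} percentage points")
```

Having one number per age group is precisely the limitation discussed next: there is no single CDE for the population as a whole.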

A practical shortcoming of the CDE is that for real world scenarios it is often difficult or even impossible to control both the treatment and the mediator. In medical settings, for example, one generally cannot easily control individual down-stream effects of a drug within the body, such as fixing, e.g., blood glucose levels while changing treatments.

A second, more fundamental problem of measuring direct effects with the CDE is that there are many different CDEs, one for each value of the mediator. In our running example, there is a different CDE for each age group. However, we may instead want to measure a direct effect at the population level, which is not addressed by the CDE.

4.2 Natural direct effect (NDE)

Instead of fixing the mediator to a specific value (which often comes with practical problems, see above), we now consider the setting where it is allowed to depend on the treatment. We can then consider the hypothetical question of what would happen under a change in treatment if the mediator kept behaving as it would under the control treatment, i.e., if the change in treatment were only propagated along the direct path. In our running example, this corresponds to asking about the effect of switching country without affecting the age distribution across the confirmed cases.

$Q_{\mathrm{NDE}}$: “For the Chinese case demographic, would it have been better to take the Italian approach instead?”

Since it relies on the natural distribution of the mediator (age) under the control (China) to evaluate the treatment (switching to the Italian approach), the answer to $Q_{\mathrm{NDE}}$ is referred to as the average natural direct effect (NDE).

Definition 3 (NDE).

The NDE of a binary treatment $T$ on an outcome $Y$ with mediator $X$ is given by the counterfactual contrast

$$\mathrm{NDE}_{0 \to 1} = \mathbb{E}[Y_{X(0)} \mid \mathrm{do}(T = 1)] - \mathbb{E}[Y \mid \mathrm{do}(T = 0)],$$

where the subscript $X(0)$ refers to the counterfactual distribution of $X$ had $T$ been 0, and where the expectations are over both $Y$ and $X$ w.r.t. the corresponding interventional and counterfactual distributions.

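Applying our assumptions, in particular causal sufficiency, we can calculate the NDE to answer $Q_{\mathrm{NDE}}$ for our running example as follows,

$$\begin{aligned}
\mathrm{NDE}_{\text{China} \to \text{Italy}} &= \mathbb{E}[M_{A(\text{China})} \mid \mathrm{do}(C = \text{Italy})] - \mathbb{E}[M \mid \mathrm{do}(C = \text{China})] \\
&= \sum_a P(A = a \mid C = \text{China}) \big( \mathbb{E}[M \mid C = \text{Italy}, A = a] - \mathbb{E}[M \mid C = \text{China}, A = a] \big) \\
&= \mathbb{E}_{A \mid C = \text{China}} \big[ \mathrm{CDE}_{\text{China} \to \text{Italy}}(A) \big] \approx -0.8\%.
\end{aligned}$$

We thus find that when we only consider the Chinese case demographic, using the Italian approach (i.e., the CFRs for Italy from Table 1) would lead to a reduction in total CFR of about 0.8 percentage points, consistent with our observation from section 2 that CFRs were lower in Italy for each age group.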
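The NDE calculation amounts to averaging the age-specific CFR differences over the Chinese case demographic. A minimal sketch under the causal sufficiency assumption follows; the variable names are illustrative, and China's 0–9 case count of 416 is inferred from the table totals.

```python
# Deaths and confirmed cases per age group (0-9, ..., 80+), from Table 1.
# China's 0-9 case count (416) is inferred from the table totals (assumption).
ITALY_DEATHS = [0, 0, 0, 0, 1, 3, 37, 114, 202]
ITALY_CASES = [43, 85, 296, 470, 891, 1453, 1471, 1785, 1532]
CHINA_DEATHS = [0, 1, 7, 18, 38, 130, 309, 312, 208]
CHINA_CASES = [416, 549, 3619, 7600, 8571, 10008, 8583, 3918, 1408]

italy_cfr = [d / n for d, n in zip(ITALY_DEATHS, ITALY_CASES)]
china_cfr = [d / n for d, n in zip(CHINA_DEATHS, CHINA_CASES)]

# P(A = a | C = China): the Chinese case demographic.
china_share = [n / sum(CHINA_CASES) for n in CHINA_CASES]

# NDE = sum_a P(a | China) * (CFR_Italy(a) - CFR_China(a)),
# i.e. the expected CDE under the Chinese age distribution.
nde_china_to_italy = sum(
    p * (i - c) for p, i, c in zip(china_share, italy_cfr, china_cfr)
)
print(f"NDE (China -> Italy): {100 * nde_china_to_italy:+.2f} percentage points")
```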

Remark 1.

As is apparent from the last line of the above calculation, the NDE can be interpreted as an expected CDE w.r.t. a particular (counterfactual) distribution of the mediator. Here, due to our assumption of causal sufficiency, the expectation is taken w.r.t. the conditional distribution of the mediator $A$ in the control group (China).

Remark 2.

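Taking the previous remark about NDE as the expected CDE within the control group one step further, we can, of course, also consider expected CDEs w.r.t. other distributions describing a target population we want to reason about. For example, a third country, say Spain, may be considering whether to adopt the Chinese or Italian approach given its own case demographic. In this case, we would be interested in the following quantity.

$$\mathbb{E}_{A \mid C = \text{Spain}} \big[ \mathrm{CDE}_{\text{China} \to \text{Italy}}(A) \big] = \sum_a P(A = a \mid C = \text{Spain}) \big( \mathbb{E}[M \mid \mathrm{do}(C = \text{Italy}, A = a)] - \mathbb{E}[M \mid \mathrm{do}(C = \text{China}, A = a)] \big)$$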

Having discussed ways to quantify direct effects, we now turn to indirect effects.

4.3 Natural indirect effect (NIE)

When measuring direct effects we were able to change the treatment while keeping the influence along the indirect path constant by (i) externally fixing the mediator to a specific value (CDE) or (ii) letting it behave according to its natural distribution under the control (or baseline) setting (NDE). For measuring indirect effects, we run into the additional complication that it is (by the very nature of a direct path) not possible to keep the influence along the direct path constant under a change in treatment. To overcome this problem in quantifying indirect effects, we consider a hypothetical change in the mediator while keeping the treatment constant. Specifically, we consider that the distribution of the mediator changes as if the treatment were changed (but without actually changing it). In our Covid-19 setting, for example, we may ask:

$Q_{\mathrm{NIE}}$: “How would the overall CFR in China change if the case demographic had instead been that from Italy while keeping all else (i.e., the CFRs) the same?”

Since this considers a change of the mediator (age) to the natural distribution it would follow under a change of treatment (case demographic from Italy) while keeping the treatment the same (Chinese CFRs), the answer to this question is referred to as the average natural indirect effect (NIE). It is formally defined as:

Definition 4 (NIE).

The NIE of a binary treatment $T$ on an outcome $Y$ with mediator $X$ is given by the counterfactual contrast

$$\mathrm{NIE}_{0 \to 1} = \mathbb{E}[Y_{X(1)} \mid \mathrm{do}(T = 0)] - \mathbb{E}[Y \mid \mathrm{do}(T = 0)],$$

where the subscript $X(1)$ refers to the counterfactual distribution of $X$ had $T$ been 1, and where the expectations are over both $Y$ and $X$ w.r.t. the corresponding interventional and counterfactual distributions.

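Again, using causal sufficiency, we can calculate the NIE to answer $Q_{\mathrm{NIE}}$ for our running example as follows,

$$\begin{aligned}
\mathrm{NIE}_{\text{China} \to \text{Italy}} &= \mathbb{E}[M_{A(\text{Italy})} \mid \mathrm{do}(C = \text{China})] - \mathbb{E}[M \mid \mathrm{do}(C = \text{China})] \\
&= \sum_a \big( P(A = a \mid C = \text{Italy}) - P(A = a \mid C = \text{China}) \big) \, \mathbb{E}[M \mid C = \text{China}, A = a] \approx 3.3\%.
\end{aligned}$$

We thus find that changing only the case demographic to that from Italy would lead to a substantial increase in total CFR in China of about 3.3%. Notably, the NIE is opposite in sign to the NDE, suggesting that indirect and direct effects are counteracting in our example, as the attentive reader may have expected from section 2: despite the lower CFRs in each age group (leading to a negative NDE), the total CFR is larger in Italy due to the higher age of positively-tested patients (leading to a positive NIE).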
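The NIE can be reproduced by re-weighting the Chinese CFRs with the Italian case demographic. This is a sketch under stated assumptions: causal sufficiency holds, China's 0–9 case count is taken as 416 (inferred from the table totals), and Italy's age shares are normalised over the 8,026 cases that carry an age group in Table 1 (the age-group counts do not add up to the 8,342 total).

```python
# Deaths and confirmed cases per age group (0-9, ..., 80+), from Table 1.
# China's 0-9 case count (416) is inferred from the table totals (assumption).
ITALY_CASES = [43, 85, 296, 470, 891, 1453, 1471, 1785, 1532]
CHINA_DEATHS = [0, 1, 7, 18, 38, 130, 309, 312, 208]
CHINA_CASES = [416, 549, 3619, 7600, 8571, 10008, 8583, 3918, 1408]

china_cfr = [d / n for d, n in zip(CHINA_DEATHS, CHINA_CASES)]

# Case demographics P(A = a | C = c). Italy's shares are normalised over the
# cases with a recorded age group (an assumption about how to treat the rest).
italy_share = [n / sum(ITALY_CASES) for n in ITALY_CASES]
china_share = [n / sum(CHINA_CASES) for n in CHINA_CASES]

# NIE = sum_a (P(a | Italy) - P(a | China)) * CFR_China(a):
# swap the age distribution, keep the Chinese age-specific CFRs.
nie_china_to_italy = sum(
    (p_i - p_c) * cfr
    for p_i, p_c, cfr in zip(italy_share, china_share, china_cfr)
)
print(f"NIE (China -> Italy): {100 * nie_china_to_italy:+.2f} percentage points")
```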

4.4 Experimental (non-)identifiability of direct and indirect effects

Since the CDE in (2) only involves interventional quantities, it is in principle experimentally identifiable, meaning that it can be determined through an experimental study in which both the treatment and the mediator are randomised, thus providing valid estimates of the interventional distributions $P(Y \mid \mathrm{do}(T = t, X = x))$.

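In contrast, NDE and NIE are, in general (i.e., without further assumptions), not experimentally identifiable owing to their counterfactual nature. However, under certain conditions, such as non-confoundedness of mediator and outcome, experimental identifiability is obtained. [Footnote 4: For the interested reader, a more general criterion is the existence of a set of covariates $W$, non-descendants of both treatment $T$ and mediator $X$, which satisfy the graphical d-separation criterion $(Y \perp X \mid W)_{G_{\underline{TX}}}$, see Pearl (2001, Thms. 1&4) for details.] In this case, NDE and NIE are given by

$$\mathrm{NDE}_{0 \to 1} = \sum_x \big( \mathbb{E}[Y \mid \mathrm{do}(T = 1, X = x)] - \mathbb{E}[Y \mid \mathrm{do}(T = 0, X = x)] \big) \, P(X = x \mid \mathrm{do}(T = 0)), \tag{6}$$

$$\mathrm{NIE}_{0 \to 1} = \sum_x \mathbb{E}[Y \mid \mathrm{do}(T = 0, X = x)] \, \big( P(X = x \mid \mathrm{do}(T = 1)) - P(X = x \mid \mathrm{do}(T = 0)) \big). \tag{7}$$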
Note that even then, identifying natural effects requires combining results from two different experimental settings: one where both mediator and treatment are randomised, and a second in which treatment is randomised and the mediator observed. This again highlights the hypothetical nature of NDE and NIE and explains why they—unlike TCE and CDE—cannot simply be read off from a table like Table 1 even when causal sufficiency is assumed.

4.5 Mediation formulas: direct and indirect effects in causally sufficient (Markovian) systems

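For the special case of causally sufficient (or Markovian) systems, NDE and NIE are even identifiable from purely observational data: without hidden confounding, interventional distributions of each variable given its causal parents in (6) and (7) can be replaced by the corresponding observational (conditional) distributions. This leads to the following expressions for NDE and NIE, which are also known as mediation formulas:

$$\mathrm{NDE}_{0 \to 1} = \sum_x \big( \mathbb{E}[Y \mid T = 1, X = x] - \mathbb{E}[Y \mid T = 0, X = x] \big) \, P(X = x \mid T = 0), \tag{8}$$

$$\mathrm{NIE}_{0 \to 1} = \sum_x \mathbb{E}[Y \mid T = 0, X = x] \, \big( P(X = x \mid T = 1) - P(X = x \mid T = 0) \big). \tag{9}$$

Similarly, in this special setting TCE and CDEs simplify to the expressions

$$\mathrm{TCE}_{0 \to 1} = \mathbb{E}[Y \mid T = 1] - \mathbb{E}[Y \mid T = 0], \tag{10}$$

$$\mathrm{CDE}_{0 \to 1}(x) = \mathbb{E}[Y \mid T = 1, X = x] - \mathbb{E}[Y \mid T = 0, X = x]. \tag{11}$$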
These are precisely the expressions we use in our example calculations for comparing CFRs between China and Italy where, only having access to observational data, we rely on the (strong) assumption of causal sufficiency to compute total, direct and indirect effects via (8), (9), (10) and (11) using the data from Tables 1 and 2.

4.6 Relation between TCE, NDE and NIE: moderation and the subtractivity principle

At this point, the sceptical reader may wonder whether the causal machinery presented in the previous sections is really necessary. Can the total causal effect not simply be decomposed into a sum of direct and indirect contributions?

$$\text{total effect} = \text{direct effect} + \text{indirect effect}$$

While such an additive decomposition indeed exists for simple linear models, in which causal effects just correspond to path coefficients, it does not hold in general. I.e., for non-linear models, we generally have

$$\mathrm{TCE}_{0 \to 1} \neq \mathrm{NDE}_{0 \to 1} + \mathrm{NIE}_{0 \to 1}$$

due to possible interactions between treatment and mediator in non-linear models, also referred to as moderation. Pearl and Mackenzie (2018) give the illustrative example of a drug (treatment) that works by activating some proteins (mediator) inside the body before jointly attacking the disease. In the example, the drug is useless without the activated proteins (so the direct effect is zero) and the activated protein is useless without the chemical compound of the drug (so the indirect effect is also zero), but the total effect is non-zero, because of the interaction between the two.
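The drug and protein example can be mimicked with a deterministic toy model. The structural equations below (the protein is activated exactly when the drug is given, and recovery requires both) are illustrative assumptions chosen to match the verbal description; they are not taken from Pearl and Mackenzie (2018).

```python
def mediator(t):
    """Protein activation X as a function of treatment T (toy assumption: X = T)."""
    return t


def outcome(t, x):
    """Recovery Y requires both drug and activated protein (toy assumption: Y = T * X)."""
    return t * x


# Total effect: change the treatment and let the mediator follow.
tce = outcome(1, mediator(1)) - outcome(0, mediator(0))

# Natural direct effect: change the treatment, keep the mediator at its
# value under control, X(0).
nde = outcome(1, mediator(0)) - outcome(0, mediator(0))

# Natural indirect effect: keep the treatment at control, move the mediator
# to the value it would take under treatment, X(1).
nie = outcome(0, mediator(1)) - outcome(0, mediator(0))

print(tce, nde, nie)
```

In this toy model the total effect is one while both natural effects are zero, so no additive decomposition can hold.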

As a consequence of moderation, direct and indirect effects are not even uniquely defined in general (as they depend on the value of the mediator, see also the discussion in section 4.1). Counterfactual quantities such as NDE and NIE are therefore useful and much-needed tools to measure some average form of direct and indirect effect with a meaningful interpretation. Moreover, comparing TCE, NDE, and NIE can reveal interactions if present. E.g., note that in our running example we find that

$$\mathrm{TCE}_{\text{China} \to \text{Italy}} \approx 2.0\% \neq 2.5\% \approx -0.8\% + 3.3\% \approx \mathrm{NDE}_{\text{China} \to \text{Italy}} + \mathrm{NIE}_{\text{China} \to \text{Italy}},$$

indicating that some level of interaction is present.

Moreover, it is worth noting that there exists a general formula relating TCE, NDE, and NIE, known as the subtractivity principle, that follows directly from their definitions and holds without restrictions on the type of model:

$$\mathrm{TCE}_{0 \to 1} = \mathrm{NDE}_{0 \to 1} - \mathrm{NIE}_{1 \to 0}.$$
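The relation TCE(0→1) = NDE(0→1) − NIE(1→0) can be checked numerically on the data from Table 1. The sketch below is illustrative and rests on several assumptions: causal sufficiency, China's 0–9 case count taken as 416 (inferred from the table totals), and all quantities, including the TCE, computed from the age-stratified counts. Because Italy's age-group counts sum to 8,026 rather than 8,342, the TCE obtained this way differs slightly from the difference of the reported totals.

```python
# Deaths and confirmed cases per age group (0-9, ..., 80+), from Table 1.
# China's 0-9 case count (416) is inferred from the table totals (assumption).
ITALY_DEATHS = [0, 0, 0, 0, 1, 3, 37, 114, 202]
ITALY_CASES = [43, 85, 296, 470, 891, 1453, 1471, 1785, 1532]
CHINA_DEATHS = [0, 1, 7, 18, 38, 130, 309, 312, 208]
CHINA_CASES = [416, 549, 3619, 7600, 8571, 10008, 8583, 3918, 1408]

italy_cfr = [d / n for d, n in zip(ITALY_DEATHS, ITALY_CASES)]
china_cfr = [d / n for d, n in zip(CHINA_DEATHS, CHINA_CASES)]
italy_share = [n / sum(ITALY_CASES) for n in ITALY_CASES]
china_share = [n / sum(CHINA_CASES) for n in CHINA_CASES]


def expected_cfr(cfr, share):
    """Overall CFR when age-specific CFRs `cfr` meet case demographic `share`."""
    return sum(p * r for p, r in zip(share, cfr))


# Mediation formulas under causal sufficiency, with China as the baseline.
tce = expected_cfr(italy_cfr, italy_share) - expected_cfr(china_cfr, china_share)
nde = expected_cfr(italy_cfr, china_share) - expected_cfr(china_cfr, china_share)
nie = expected_cfr(china_cfr, italy_share) - expected_cfr(china_cfr, china_share)

# NIE for the reverse change (Italy -> China): Italian CFRs, swapped demographic.
nie_reverse = expected_cfr(italy_cfr, china_share) - expected_cfr(italy_cfr, italy_share)

print(f"TCE = {tce:.4f}, NDE + NIE = {nde + nie:.4f}, NDE - NIE_reverse = {nde - nie_reverse:.4f}")
```

The subtractive relation holds up to floating-point error, whereas the naive sum of NDE and NIE overshoots the TCE.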

5 Case study: direct and indirect effects on total CFR between China and Italy

Equipped with the tools from mediation analysis for quantifying path-specific effects discussed in the previous section, we now return to our running example. In particular, we will take a closer look at direct and indirect (age-mediated) effects of country on mortality and their evolution over time. Figure 3 shows the results of carrying out the example calculations of TCE, NDE, and NIE between China and Italy from section 4 using not only the data from Italy reported on 9 March, but also subsequent reports over a time period of roughly two months. The data from China meanwhile remain the same for all calculations, since the study of Wu and McGoogan (2020) already contains information on over 72,000 cases and only very few new cases have been reported from China since.

Figure 3: Evolution of TCE, NDE, and NIE of changing country (China $\to$ Italy) on total CFR over time. Calculations compare (static) data from China based on the large-scale study of Wu and McGoogan (2020) with different snapshots from Italy reported by Istituto Superiore di Sanità (2020) between 9 March and 7 May 2020. A similar plot for a change of country from Italy to China can be found in Appendix A.

Several observations can be made from Figure 3. Let us start with the evolution of the NDE (shown in blue), which captures what would happen to the total CFR from China if the case demographic were kept the same, while the CFRs per age group were changed to those from Italy. As can be seen (and noted already in section 4.2), the NDE is negative initially, meaning that the considered change of country would lead to a further decrease in CFR, consistent with the lower CFRs in each age group noted in section 2. However, there is a turning point between 12 and 19 March when the NDE flips sign and becomes positive: beyond this point, switching to the Italian CFRs would thus lead to an increase in total CFR on the Chinese case demographic. While we can only speculate about the precise factors that came together in producing this reversal in NDE, it seems worth pointing out that a number of articles reported an overwhelmed health care system “close to collapse” in (northern) Italy during the same period of early to mid-March (e.g., Armocida et al., 2020). From mid-March to mid-April, the NDE keeps rising before seemingly plateauing around 3.5%. Most notably, another relatively large jump in NDE of ca. 1.3% can be observed between 26 March and 2 April.

Next, let us consider the NIE (shown in orange), which measures what would happen to total CFR in China if the CFRs by age group were kept the same, while the case demographic were changed to that in Italy. As can be seen from the large NIE of over 3%, simply exchanging the case demographic of China for that of Italy would already lead to a substantial increase in total CFR, consistent with the larger share of confirmed cases amongst the elderly in Italy reported in section 2. Moreover, this effect exists from the very beginning and remains relatively constant over time (with some small fluctuations), indicating that the case demographic in Italy (while very different from that in China) does not change much over time.

Finally, let us consider the TCE (shown in green), which measures what would happen to total CFR if both CFRs by age group and case demographic were changed to those from Italy. The TCE is positive throughout (indicating higher total CFR in Italy) and gradually increases from 2.2% initially to about 10.8% over the two months considered. In particular, note that the TCE grows more quickly than the sum of NDE and NIE, indicating moderation.

In summary, while the NIE considerably contributes to the difference in total CFRs between China and Italy (especially initially), it appears to be mainly the NDE that drives changes in TCE over time.
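The three quantities discussed in this section can be sketched in a few lines of Python. The snippet below is a minimal illustration, not the code from the released notebook: it assumes the coarse-grained graph without hidden confounding, so that each effect reduces to a weighted sum of per-age CFRs over a case demographic. All numbers are hypothetical and chosen only to reproduce the qualitative pattern of Simpson's paradox; the function names are likewise our own.

```python
import numpy as np

def expected_cfr(cfr_by_age, case_demographic):
    """Total CFR obtained by weighting per-age CFRs with a case demographic."""
    return float(np.dot(cfr_by_age, case_demographic))

def mediation_effects(cfr0, dem0, cfr1, dem1):
    """Return (TCE, NDE, NIE) for a switch from country 0 to country 1."""
    baseline = expected_cfr(cfr0, dem0)
    tce = expected_cfr(cfr1, dem1) - baseline  # change CFRs and demographic
    nde = expected_cfr(cfr1, dem0) - baseline  # change CFRs, keep demographic
    nie = expected_cfr(cfr0, dem1) - baseline  # keep CFRs, change demographic
    return tce, nde, nie

# HYPOTHETICAL data for three age groups (young, middle, old); not real data.
cfr_a = np.array([0.002, 0.020, 0.150])  # country 0: CFR per age group
dem_a = np.array([0.50, 0.40, 0.10])     # country 0: share of cases per group
cfr_b = np.array([0.001, 0.015, 0.140])  # country 1: lower CFR in every group
dem_b = np.array([0.20, 0.40, 0.40])     # country 1: much older cases

tce, nde, nie = mediation_effects(cfr_a, dem_a, cfr_b, dem_b)
_, _, nie_reverse = mediation_effects(cfr_b, dem_b, cfr_a, dem_a)

print(f"TCE = {tce:+.4f}, NDE = {nde:+.4f}, NIE = {nie:+.4f}")
print(f"NDE + NIE         = {nde + nie:+.4f}")
print(f"NDE - NIE_reverse = {nde - nie_reverse:+.4f}")
```

With these hypothetical inputs the NDE is negative while the TCE is positive, the sum of NDE and NIE differs from the TCE, and the difference between the NDE and the reverse-direction NIE recovers the TCE. Repeating the call with one fixed baseline and a sequence of snapshots for the second country would give curves analogous to those in Figure 3.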

6 Discussion and future directions

Motivated by the initial observation of Simpson’s paradox in Covid-19 CFRs manifested in the data in Table 1 and described in section 2, in our subsequent analysis we considered the three variables $C$, $A$, and $M$ representing country, age group, and medical outcome of confirmed Covid-19 cases, respectively. We assume that the data are generated as described in section 3.1, with causal relationships between variables captured by the causal graph in Figure 2. In particular, this constitutes a coarse-grained view which subsumes many other potentially important factors and variables within the paths of the assumed causal graph, instead of including and modelling them explicitly.

A strength of this approach is that it allows for consistent reasoning about different causal effects in situations where the data does not support a more fine-grained analysis: even if we are not able to fully identify what precise factors the difference in CFR between Italy and China should be attributed to, we are still able to distinguish between age-mediated and other non-age-related effects. On the other hand, our conclusions only hold at this coarse-grained level and a more fine-grained interpretation will require further investigation. In the following, we critically discuss assumptions and limitations of our approach and propose further directions for future research.

Considering additional mediators

In our coarse-grained view, the direct arrow from country to mortality abstracts away any details on how exactly this influence is exerted. However, it seems safe to assume that the virus is ultimately agnostic to the notion of different “countries”, and that the influence of country on mortality is thus not actually a direct one, but instead mediated by additional variables, as illustrated in Figure 4(a). Potential candidates for such additional mediators to be incorporated into our model include non-pharmaceutical interventions, such as quarantine and lockdown strategies; the widespread availability and habit of wearing sanitary masks; and the number of ventilators, intensive care beds, and other critical healthcare infrastructure.

We believe that many questions that governments (or citizens) may have regarding the Covid-19 pandemic can be phrased as (path-specific) causal effects involving such mediators, e.g.: “What would be the effect on total CFR in one country if people were to wear sanitary masks as in another country (all else being equal)?”. Extending our model with these additional variables and modifying the data analysis accordingly is an interesting direction for future work. The interpretation of these variables as mediators indicates that mediation analysis would be the proper tool to reason about them, and we hope that this work can serve as a starting point for correct reasoning in such an extended model.


Figure 4: Illustration of potential extensions of our approach. (a) The direct effect of country on mortality is likely actually mediated by additional variables describing, e.g., different aspects of a country’s response strategy and medical infrastructure. (b) Testing strategy may introduce a selection bias, since the data we presented always implicitly conditions on having tested positive ($T=1$), represented by the shaded variable in the graph.

Testing strategy and selection bias

Throughout this work, we have considered data regarding only confirmed cases of Covid-19, i.e., patients who have tested positive for the virus. As a consequence, we have interpreted the variable $A$ as the age group of positively-tested patients and referred to $P(A \mid C)$ as the case demographic of country $C$.

However, we can also make the notion of testing more explicit by introducing a testing variable $T$, with $T=1$ meaning that an individual tested positive and $T=0$ that either no test was conducted or that the result was negative. In this view, the data we considered is always implicitly conditioned on $T=1$. In other words, the data in Tables 1 and 2 would correspond to $P(M \mid A, C, T=1)$ and $P(A \mid C, T=1)$, respectively. Moreover, $A$ in its unconditional form would change meaning to describe the age group of a general citizen, as opposed to that of a Covid-19 patient.

If tests were simply performed by randomly sampling from the population (and irrespective of the country), this implicit conditioning on testing status would not introduce any bias. However, tests are generally not performed at random! Older people are more likely to develop (severe) symptoms, and people with symptoms are more likely to get tested than healthy ones (e.g., through self-selection, external encouragement, or regulation). It thus seems that age has a causal influence on testing status, $A \to T$ (likely mediated, e.g., by the severity of symptoms). Moreover, different testing policies have clearly been adopted in different countries, $C \to T$, and even across different regions within the same country. Testing policy might also change across different phases of the pandemic depending on the evolution of mortality numbers, so that there may be complex interactions, potentially involving feedback, between $T$ and $M$ throughout the pandemic spread. These causal relationships are illustrated in the extended graph in Figure 4(b).

The fact that only positively tested cases are considered, while testing itself depends on multiple other factors, ultimately results in a problem of selection bias. The number and methodology of tests applied within a country greatly influence CFRs, which implies that the reported CFRs might have very different meanings across different countries. For example, two countries might have the same true mortality rate, but their CFRs can differ significantly if one of them only tests patients with severe symptoms (who are therefore more likely to die), see Rajgor et al. (2020). Similar problems have been addressed in the causal inference literature when discussing recoverability from selection bias, e.g., by Bareinboim and Tian (2015) and Correa et al. (2019), which could provide valuable insights to extend our work and account for this aspect of the problem.
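The example of two countries with identical true mortality but different testing policies can be made concrete with a small calculation. The following is a minimal sketch under strong simplifying assumptions: infections are either mild or severe, fatality depends only on severity, and the probability of being tested depends only on severity. All values and names are hypothetical.

```python
def case_fatality_rate(share_severe, p_test_mild, p_test_severe,
                       fatality_mild, fatality_severe):
    """CFR among confirmed cases when the chance of testing depends on severity."""
    confirmed_mild = (1 - share_severe) * p_test_mild
    confirmed_severe = share_severe * p_test_severe
    deaths = confirmed_mild * fatality_mild + confirmed_severe * fatality_severe
    return deaths / (confirmed_mild + confirmed_severe)

# HYPOTHETICAL parameters shared by both countries.
share_severe = 0.2                            # fraction of infections that are severe
fatality_mild, fatality_severe = 0.001, 0.05  # fatality by severity

# Mortality among all infected individuals, identical in both countries.
true_rate = (1 - share_severe) * fatality_mild + share_severe * fatality_severe

# Country X tests mild and severe infections equally often.
cfr_broad = case_fatality_rate(share_severe, 0.5, 0.5,
                               fatality_mild, fatality_severe)
# Country Y tests only severe infections.
cfr_severe_only = case_fatality_rate(share_severe, 0.0, 0.9,
                                     fatality_mild, fatality_severe)

print(f"true mortality among infected: {true_rate:.4f}")
print(f"CFR with broad testing:        {cfr_broad:.4f}")
print(f"CFR with severe-only testing:  {cfr_severe_only:.4f}")
```

In this sketch, testing that is independent of severity recovers the true rate, whereas testing only severe infections yields a CFR equal to the fatality rate of severe infections, several times higher, although nothing about the disease itself differs between the two countries.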

Finally, we remark that an additional piece of information, which we have not made use of in this work, but which could be useful to tackle the problem of selection bias, is given by the (unconditional) age demographic of the general population, which is available for most countries in the world, thus providing a straightforward way to estimate $P(A \mid C)$. Note, in particular, that this can differ significantly from the case demographic, as can be seen by comparing Tables 2 and 3.
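One way to see how the population demographic could be combined with the case demographic is through Bayes' rule. The expression below is only a sketch; it assumes the symbols $C$, $A$, and $T$ for country, age group, and testing status (with $T=1$ denoting a positive test), and it is not an analysis carried out in this work.

```latex
P(T=1 \mid A=a, C=c)
  \;=\; \frac{P(A=a \mid C=c, T=1)\; P(T=1 \mid C=c)}{P(A=a \mid C=c)}
```

For a fixed country, the ratio between the share of an age group among confirmed cases and its share in the general population is therefore proportional to the probability that a person of that age is a confirmed case, with the overall fraction of confirmed cases $P(T=1 \mid C=c)$ as the proportionality constant.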

Counterfactuals and causal sufficiency

Another important point is that mediation analysis (in particular, computing NDE and NIE) entails counterfactual reasoning and can thus only be performed under strong assumptions. One such assumption that was crucial for our analysis based on purely observational data is causal sufficiency, i.e., the absence of hidden confounding. However, causal sufficiency is not strictly necessary for identifying NDE and NIE, and, subject to the availability of experimental data, it can potentially be replaced by a weaker set of assumptions. Our analysis is furthermore simplified by the fact that both treatment and mediator are categorical variables, though identifiability can also be shown for certain more complicated settings; we refer to Pearl (2001) for further discussion.

7 Conclusions

Using the contemporary example of comparing Covid-19 CFRs between China and Italy, we have illustrated how methods from causal inference, in particular mediation analysis, can be used to resolve apparent statistical paradoxes and answer various causal questions from data regarding the current pandemic. As for any causal analysis, this required starting from a set of assumptions about the data generating process. Our modelling assumptions are admittedly an oversimplification of the actual underlying phenomena, and we leave the interpretation of our results to epidemiologists and experts from other related fields. Nevertheless, we hope that our exposition helps clarify how mediation analysis can be used to investigate direct and indirect effects along different causal paths. The purpose of our work is mainly educational, and we hope that it serves as a stepping stone for further in-depth analyses.

Interactive notebook and data

Together with this report, we publicly release an interactive Jupyter notebook that contains the data used in our analysis (and additional data), as well as Python code to compute different (direct and indirect) causal effects in Markovian systems. It can be used to reproduce all of our results and to serve as a basis for further exploration.

Request for additional data and feedback

The present report constitutes a preprint of ongoing work. We plan to extend our analysis to other countries and are actively looking for data sources consistent with the format of Table 1. If you know of any such data, or have any other feedback, please get in touch with us via email: {jvk, luigi.gresele, bs}


Acknowledgements

We are grateful to Elias Bareinboim for very detailed and insightful feedback on an earlier version of this manuscript.


References

  • B. Armocida, B. Formenti, S. Ussai, F. Palestra, and E. Missoni (2020) The Italian health system and the COVID-19 challenge. The Lancet Public Health.
  • E. Bareinboim and J. Tian (2015) Recovering causal effects from selection bias. In Twenty-Ninth AAAI Conference on Artificial Intelligence.
  • N. Cartwright et al. (1994) Nature’s capacities and their measurement. OUP Catalogue.
  • Coronaviridae Study Group of the International Committee on Taxonomy of Viruses (2020) The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. Nature Microbiology.
  • J. Correa, J. Tian, and E. Bareinboim (2019) Adjustment criteria for generalizing experimental findings. In International Conference on Machine Learning, pp. 1361–1369.
  • Istituto Superiore di Sanità (2020) Epidemia COVID-19: Aggiornamento nazionale, 09 marzo 2020 – ore 16:00.
  • J. Mossong, N. Hens, M. Jit, P. Beutels, K. Auranen, R. Mikolajczyk, M. Massari, S. Salmaso, G. S. Tomba, J. Wallinga, et al. (2008) Social contacts and mixing patterns relevant to the spread of infectious diseases. PLoS Medicine 5 (3).
  • J. Pearl and D. Mackenzie (2018) The book of why: the new science of cause and effect. Basic Books.
  • J. Pearl (2001) Direct and indirect effects. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pp. 411–420.
  • J. Pearl (2009) Causality. Cambridge University Press.
  • D. D. Rajgor, M. H. Lee, S. Archuleta, N. Bagdasarian, and S. C. Quek (2020) The many estimates of the COVID-19 case fatality rate. The Lancet Infectious Diseases.
  • H. Reichenbach (1956) The direction of time.
  • E. H. Simpson (1951) The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society: Series B (Methodological) 13 (2), pp. 238–241.
  • WHO (2020) Statement on the second meeting of the International Health Regulations (2005) Emergency Committee regarding the outbreak of novel coronavirus (2019-nCoV).
  • Z. Wu and J. M. McGoogan (2020) Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72 314 cases from the Chinese Center for Disease Control and Prevention. JAMA.

Appendix A Additional Figures

Figure 5: Evolution of TCE, NDE, and NIE of changing country (Italy → China) on total CFR over time. Calculations compare (static) data from China based on the large-scale study of Wu and McGoogan (2020) with different snapshots from Italy reported by Istituto Superiore di Sanità (2020) between 9 March and 7 May 2020.
Figure 6: Ratios between the proportion of the general population within each age group (red) and the proportion of confirmed cases by age group (green) between Italy and China.
Figure 7: Different snapshots of the case demographic in Italy over time.
Figure 8: Different snapshots of CFRs by age group in Italy over time.