A note on 'Collider bias undermines our understanding of COVID-19 disease risk and severity' and how causal Bayesian networks both expose and resolve the problem
An important recent preprint by Griffiths et al. highlights how 'collider bias' in studies of COVID-19 undermines our understanding of disease risk and severity. The bias typically arises because the data are restricted to people who have undergone COVID-19 testing, among whom healthcare workers are overrepresented. For example, collider bias caused by smokers being underrepresented in the dataset can explain empirical results which claim that smoking reduces the risk of COVID-19. We extend the work of Griffiths et al. by making more explicit use of graphical causal models to interpret observed data. We show that their smoking example can be clarified and improved using Bayesian network models with realistic data and assumptions. We also show that there is an even more fundamental problem for risk factors like 'stress' which, unlike smoking, are more rather than less prevalent among healthcare workers. In this case, because of a combination of collider bias from the biased dataset and the fact that 'healthcare worker' is a confounding variable, studies are likely to conclude wrongly that stress reduces rather than increases the risk of COVID-19. To avoid such erroneous conclusions, any analysis of observational data must take account of the underlying causal structure, including colliders and confounders. Analysts who fail to do this explicitly are likely to draw flawed conclusions about the effect of specific risk factors on COVID-19 when they rely only on data from people who have been tested.
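The smoking argument can be illustrated with a small discrete Bayesian network evaluated by exact enumeration. The following is a minimal sketch, not the model from the paper: the graph (healthcare worker H influences smoking S, COVID-19 C and testing T; C also influences T) and every probability in it are hypothetical values chosen for illustration. Smoking is given no causal effect on COVID-19, so any difference in risk between smokers and non-smokers is an artefact of how the data are selected and analysed.

```python
from itertools import product

# Hypothetical probabilities, chosen only for illustration (not from the paper).
P_H = 0.05                         # P(healthcare worker)
P_S_GIVEN_H = {1: 0.05, 0: 0.20}   # P(smoker | H): smoking rarer among healthcare workers
P_C_GIVEN_H = {1: 0.20, 0: 0.02}   # P(COVID-19 | H): smoking has NO effect by construction
P_T_GIVEN_HC = {                   # P(tested | H, C): healthcare workers tested routinely
    (1, 1): 0.90, (1, 0): 0.90,
    (0, 1): 0.10, (0, 0): 0.02,
}


def bernoulli(p, x):
    """Probability that a binary variable with P(1) = p takes the value x."""
    return p if x == 1 else 1.0 - p


def joint():
    """Yield every state of (h, s, c, t) together with its joint probability."""
    for h, s, c, t in product((0, 1), repeat=4):
        p = (bernoulli(P_H, h)
             * bernoulli(P_S_GIVEN_H[h], s)
             * bernoulli(P_C_GIVEN_H[h], c)
             * bernoulli(P_T_GIVEN_HC[(h, c)], t))
        yield {"h": h, "s": s, "c": c, "t": t}, p


def p_covid(**given):
    """P(C = 1 | given), computed by summing over the joint distribution."""
    num = den = 0.0
    for state, p in joint():
        if all(state[k] == v for k, v in given.items()):
            den += p
            num += p * state["c"]
    return num / den


def risk_difference(**given):
    """P(C = 1 | smoker, given) minus P(C = 1 | non-smoker, given)."""
    return p_covid(s=1, **given) - p_covid(s=0, **given)


print("tested only, naive:       ", risk_difference(t=1))
print("tested, healthcare worker:", risk_difference(t=1, h=1))
print("tested, everyone else:    ", risk_difference(t=1, h=0))
```

Under these assumed numbers, about 13% of tested smokers have COVID-19 against about 17% of tested non-smokers, so a naive analysis of the tested population would suggest that smoking is protective. Once the comparison is made separately for healthcare workers and for everyone else, the difference vanishes, which is the sense in which accounting for the causal structure resolves the problem. The size, and even the direction, of the naive difference depends on the assumed testing probabilities.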