Simulation Framework for Realistic Large-scale Individual-level Health Data Generation

by Santtu Tikka, et al.

We propose a general framework for realistic data generation and simulation of complex systems in the health domain. The main use cases of the framework are predicting the development of risk factors and disease occurrence, evaluating the impact of interventions and policy decisions, and developing statistical methods. We present the fundamentals of the framework using rigorous mathematical definitions. The framework supports calibration to a real population as well as various manipulations and data collection processes. The freely available open-source implementation in R uses efficient data structures, parallel computing, and fast random number generation, which ensure reproducibility and scalability. With the framework, it is possible to run daily-level simulations for populations of millions of individuals over decades of simulated time. An example on the occurrence of stroke, type 2 diabetes, and mortality illustrates the use of the framework in the Finnish context. In the example, we demonstrate the data-collection functionality by studying the impact of non-participation on the estimated risk models.


