Generative Synthesis of Insurance Datasets

by Kevin Kuo, et al.

One of the impediments to advancing actuarial research and developing open-source assets for insurance analytics is the lack of realistic, publicly available datasets. In this work, we develop a workflow for synthesizing insurance datasets that leverages state-of-the-art neural network techniques. We evaluate the predictive modeling efficacy of datasets synthesized from publicly available data in two domains: general insurance pricing and life insurance shock lapse modeling. The trained synthesizers are able to capture representative characteristics of the real datasets. The workflow is implemented via an R interface to promote adoption by researchers and data owners.

1 Introduction

2 Methodology

3 Application Examples

4 Workflow and Privacy

5 Conclusion