Foundations of Bayesian Learning from Synthetic Data

11/16/2020
by Harrison Wilde, et al.

There is significant and growing interest in the use of synthetic data as an enabler for machine learning in environments where the release of real data is restricted due to privacy or availability constraints. Despite a large number of methods for synthetic data generation, there are comparatively few results on the statistical properties of models learnt on synthetic data, and fewer still for situations where a researcher wishes to augment real data with another party's synthesised data. We use a Bayesian paradigm to characterise the updating of model parameters when learning in these settings, demonstrating that caution should be exercised when applying conventional learning algorithms without appropriate consideration of the synthetic data generating process and learning task. Recent results from general Bayesian updating support a novel and robust approach to Bayesian synthetic-learning founded on decision theory that outperforms standard approaches across repeated experiments on supervised learning and inference problems.

research
03/11/2015

Learning Classifiers from Synthetic Data Using a Multichannel Autoencoder

We propose a method for using synthetic data to help learning classifier...
research
05/26/2023

On Consistent Bayesian Inference from Synthetic Data

Generating synthetic data, with or without differential privacy, has att...
research
03/10/2022

Conditional Synthetic Data Generation for Personal Thermal Comfort Models

Personal thermal comfort models aim to predict an individual's thermal c...
research
11/13/2021

HydraGAN A Multi-head, Multi-objective Approach to Synthetic Data Generation

Synthetic data generation overcomes limitations of real-world machine le...
research
07/06/2018

A Bayesian Framework for Non-Collapsible Models

In this paper, we discuss the non-collapsibility concept and propose a n...
research
09/20/2019

BinarySDG: binary sensor data generation with R

The scarcity of Smart Home data is still a pretty big problem, and in a ...
research
09/17/2023

Fully Synthetic Data for Complex Surveys

When seeking to release public use files for confidential data, statisti...
