Sequential Models in the Synthetic Data Vault

07/28/2022
by Kevin Zhang, et al.

The goal of this paper is to describe a system for generating synthetic sequential data within the Synthetic Data Vault (SDV). To achieve this, we present the Sequential model currently in SDV, an end-to-end framework that builds a generative model for multi-sequence, real-world data. This includes a novel neural network-based machine learning model, the conditional probabilistic auto-regressive (CPAR) model. The overall system and the model are available in the open-source SDV library (https://github.com/sdv-dev/SDV), along with a variety of other models for different synthetic data needs. After building the Sequential SDV, we used it to generate synthetic data and compared its quality against that of CTGAN, an existing non-sequential model based on generative adversarial networks (GANs). To compare the sequential synthetic data against its real counterpart, we introduced a new metric called Multi-Sequence Aggregate Similarity (MSAS). Using it, we conclude that our Sequential SDV model learns higher-level patterns than non-sequential models without any trade-offs in synthetic data quality.
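The abstract names MSAS but does not define it. The sketch below assumes one plausible reading of the idea: reduce each sequence to a single aggregate statistic (here the mean of a numeric column), collect those statistics across all sequences for the real and the synthetic data, and compare the two distributions with the two-sample Kolmogorov-Smirnov statistic D, reporting 1 - D so that 1.0 means identical distributions. The function names `per_sequence_statistic`, `ks_statistic`, and `msas_sketch` and the row layout are hypothetical illustrations, not the paper's or the SDV library's API.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean
from typing import Callable, Iterable, List, Mapping, Sequence

# Hypothetical row layout: one dict per time step, with a sequence key column.
Row = Mapping[str, object]


def per_sequence_statistic(
    rows: Iterable[Row],
    key_col: str,
    value_col: str,
    statistic: Callable[[Sequence[float]], float] = mean,
) -> List[float]:
    """Group rows by sequence key and reduce each sequence to one number."""
    groups: dict = defaultdict(list)
    for row in rows:
        groups[row[key_col]].append(float(row[value_col]))
    return [statistic(values) for values in groups.values()]


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are right-continuous step functions that only change
    # at observed values, so checking every observed value finds the supremum.
    for x in a_sorted + b_sorted:
        f_a = bisect_right(a_sorted, x) / len(a_sorted)
        f_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(f_a - f_b))
    return d


def msas_sketch(
    real_rows: Iterable[Row],
    synthetic_rows: Iterable[Row],
    key_col: str,
    value_col: str,
    statistic: Callable[[Sequence[float]], float] = mean,
) -> float:
    """Hypothetical MSAS-style score in [0, 1]; 1.0 means identical distributions."""
    real_stats = per_sequence_statistic(real_rows, key_col, value_col, statistic)
    synth_stats = per_sequence_statistic(synthetic_rows, key_col, value_col, statistic)
    return 1.0 - ks_statistic(real_stats, synth_stats)


if __name__ == "__main__":
    real = [
        {"id": "a", "x": 1.0}, {"id": "a", "x": 3.0},      # sequence mean 2.0
        {"id": "b", "x": 10.0}, {"id": "b", "x": 12.0},    # sequence mean 11.0
    ]
    synthetic = [
        {"id": "s1", "x": 2.0},                            # sequence mean 2.0
        {"id": "s2", "x": 19.0}, {"id": "s2", "x": 21.0},  # sequence mean 20.0
    ]
    print(msas_sketch(real, synthetic, "id", "x"))  # 0.5
```

Because the score compares statistics computed per sequence rather than per row, it can penalize a generator whose rows look realistic individually but whose sequences do not. Under this assumed reading, the mean is only a default; passing `statistic=max` or a median function would probe other sequence-level properties.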
