Privately generating tabular data using language models

06/07/2023
by Alexandre Sablayrolles, et al.

Privately generating synthetic data from a table is an important building block of a privacy-first world. We propose and investigate a simple approach: treat each row in a table as a sentence and train a language model with differential privacy. We show that this approach obtains competitive results in modelling tabular data across multiple datasets, even at small scales that favor alternative methods based on marginal distributions.
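The core idea, serializing each table row as a sentence so a language model can be trained on it, can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the "column is value" template, the separator, and the example columns are all assumptions.

```python
# Minimal sketch of the row-as-sentence idea (illustrative, not the
# paper's implementation): serialize each tabular record into a text
# string suitable for language-model training, and parse generated
# strings back into rows.

def row_to_sentence(row: dict) -> str:
    """Serialize one table row as a sentence of 'column is value' clauses."""
    return ", ".join(f"{col} is {val}" for col, val in row.items())

def sentence_to_row(sentence: str) -> dict:
    """Parse a generated sentence back into a table row.

    Assumes values contain neither the ', ' separator nor ' is '.
    """
    row = {}
    for clause in sentence.split(", "):
        col, _, val = clause.partition(" is ")
        row[col] = val
    return row

# Hypothetical example table (column names chosen for illustration):
table = [
    {"age": "34", "occupation": "teacher", "income": ">50K"},
    {"age": "22", "occupation": "clerk", "income": "<=50K"},
]
sentences = [row_to_sentence(r) for r in table]
# sentences[0] == "age is 34, occupation is teacher, income is >50K"
```

In the approach the abstract describes, the language model would then be trained on such sentences with a differentially private optimizer (e.g. DP-SGD), and synthetic rows would be recovered by sampling sentences from the trained model and parsing them back.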


Related research

Differentially Private Language Models for Secure Data Sharing (10/25/2022)
To protect the privacy of individuals whose data is being shared, it is ...

Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe (10/25/2022)
Privacy concerns have attracted increasing attention in data-driven prod...

Table-to-Text: Describing Table Region with Natural Language (05/29/2018)
In this paper, we present a generative model to generate a natural langu...

Provably Confidential Language Modelling (05/04/2022)
Large language models are shown to memorize privacy information such as ...

Table Caption Generation in Scholarly Documents Leveraging Pre-trained Language Models (08/18/2021)
This paper addresses the problem of generating table captions for schola...

REaLTabFormer: Generating Realistic Relational and Tabular Data using Transformers (02/04/2023)
Tabular data is a common form of organizing data. Multiple models are av...

Knowledge Unlearning for Mitigating Privacy Risks in Language Models (10/04/2022)
Pretrained Language Models (LMs) memorize a vast amount of knowledge dur...
