vec2text with Round-Trip Translations

09/14/2022
by Geoffrey Cideron, et al.
We investigate models that can generate arbitrary natural language text (e.g., all English sentences) from a bounded, convex, and well-behaved control space. We call them universal vec2text models. Such models would allow semantic decisions to be made in the vector space (e.g., via reinforcement learning) while the vec2text model handles natural language generation. We propose four properties that such vec2text models should possess (universality, diversity, fluency, and semantic structure) and provide quantitative and qualitative methods to assess them. We implement a vec2text model by adding a bottleneck to a 250M-parameter Transformer model and training it with an auto-encoding objective on 400M sentences (10B tokens) extracted from a massive web corpus. We propose a simple data augmentation technique based on round-trip translations and show in extensive experiments that the resulting vec2text model, surprisingly, yields vector spaces that fulfill all four properties and strongly outperforms both standard and denoising auto-encoders.
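
To make the round-trip translation idea concrete, here is a minimal sketch of how augmented training pairs might be built. The abstract does not say how the round-trip translations enter training, so pairing each round-trip variant with the original sentence as the reconstruction target is an assumption, not the authors' confirmed procedure. The `translate` callable, the pivot languages, and `toy_translate` are hypothetical stand-ins for a real machine translation system.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical signature: translate(text, src_lang, tgt_lang) -> translated text.
Translate = Callable[[str, str, str], str]


def round_trip(sentence: str, translate: Translate, pivot: str, src: str = "en") -> str:
    """Translate src -> pivot -> src to obtain a paraphrase-like variant."""
    return translate(translate(sentence, src, pivot), pivot, src)


def make_training_pairs(
    sentences: Iterable[str],
    translate: Translate,
    pivots: Sequence[str] = ("de", "fr"),
) -> List[Tuple[str, str]]:
    """Build (input, target) pairs for an auto-encoder.

    Assumption: the round-trip variant is the input and the original
    sentence is the reconstruction target.
    """
    pairs: List[Tuple[str, str]] = []
    for sentence in sentences:
        pairs.append((sentence, sentence))  # plain auto-encoding pair
        for pivot in pivots:
            variant = round_trip(sentence, translate, pivot)
            if variant != sentence:  # keep only variants that actually differ
                pairs.append((variant, sentence))
    return pairs


def toy_translate(text: str, src: str, tgt: str) -> str:
    """Toy word-substitution 'translator' used only for illustration."""
    table = {
        ("en", "de"): {"big": "gross"},
        ("de", "en"): {"gross": "large"},
    }
    mapping = table.get((src, tgt), {})
    return " ".join(mapping.get(word, word) for word in text.split())


pairs = make_training_pairs(["a big house"], toy_translate)
```

With the toy translator, the German round trip turns "big" into "large", while the French round trip leaves the sentence unchanged and is therefore skipped. In a real setup, `translate` would wrap an actual translation model and the resulting pairs would feed the bottlenecked Transformer's auto-encoding objective.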


