CST5: Data Augmentation for Code-Switched Semantic Parsing

11/14/2022
by   Anmol Agarwal, et al.

Extending semantic parsers to code-switched input is challenging, primarily because supervised training data is scarce. In this work, we introduce CST5, a new data augmentation technique that fine-tunes a T5 model on a small seed set (≈100 utterances) to generate code-switched utterances from English utterances. We show that CST5 generates high-quality code-switched data, both intrinsically (per human evaluation) and extrinsically, by comparing baseline models trained without data augmentation to models trained with augmented data. Empirically, we observe that with CST5 one can achieve the same semantic parsing performance using up to 20x less labeled data. To aid further research in this area, we are also releasing (a) Hinglish-TOP, the largest human-annotated code-switched semantic parsing dataset to date, containing 10k human-annotated Hindi-English (Hinglish) code-switched utterances, and (b) over 170K CST5-generated code-switched utterances from the TOPv2 dataset. Human evaluation shows that both the human-annotated data and the CST5-generated data are of good quality.
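
The data flow described in the abstract (a small seed set of English/code-switched pairs used for fine-tuning, then generation over a larger pool of English utterances) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the task prefix, the function names, and the toy generator standing in for a fine-tuned T5 model are all hypothetical and do not come from the paper.

```python
from typing import Callable, List, Tuple

# Hypothetical task prefix; the abstract does not specify a prompt format.
PREFIX = "translate English to Hinglish: "

# Toy word map used only by the stand-in generator below (not real model output).
TOY_LEXICON = {"tomorrow": "kal"}


def build_seed_examples(seed_pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (english, code_switched) seed pairs into text-to-text
    (input, target) examples of the kind used to fine-tune a seq2seq model."""
    return [(PREFIX + english, code_switched) for english, code_switched in seed_pairs]


def augment(
    english_utterances: List[str],
    generate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Run a generator over English utterances and collect
    (english, generated) pairs, skipping empty generations."""
    pairs = []
    for english in english_utterances:
        generated = generate(PREFIX + english).strip()
        if generated:
            pairs.append((english, generated))
    return pairs


def toy_generator(prompt: str) -> str:
    """Stand-in for a fine-tuned model: swaps words found in TOY_LEXICON."""
    text = prompt[len(PREFIX):] if prompt.startswith(PREFIX) else prompt
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


if __name__ == "__main__":
    print(build_seed_examples([("hello", "namaste")]))
    print(augment(["set an alarm for tomorrow"], toy_generator))
```

In a real setup, `generate` would wrap a call to the fine-tuned model, and the generated utterances would be added to the semantic parser's training data.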


research
01/26/2021

El Volumen Louder Por Favor: Code-switching in Task-oriented Semantic Parsing

Being able to parse code-switched (CS) utterances, such as Spanish+Engli...
research
12/16/2021

Few-Shot Semantic Parsing with Language Models Trained On Code

Large language models, prompted with in-context examples, can perform se...
research
05/18/2022

Addressing Resource and Privacy Constraints in Semantic Parsing Through Data Augmentation

We introduce a novel setup for low-resource task-oriented semantic parsi...
research
06/20/2018

StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing

Semantic parsing is the task of transducing natural language (NL) uttera...
research
07/25/2023

Holistic Exploration on Universal Decompositional Semantic Parsing: Architecture, Data Augmentation, and LLM Paradigm

In this paper, we conduct a holistic exploration of the Universal Decomp...
research
06/29/2021

Exploring the Efficacy of Automatically Generated Counterfactuals for Sentiment Analysis

While state-of-the-art NLP models have been achieving the excellent perf...
research
05/09/2022

Few-shot Mining of Naturally Occurring Inputs and Outputs

Creating labeled natural language training data is expensive and require...
