CLASP: Few-Shot Cross-Lingual Data Augmentation for Semantic Parsing

10/13/2022
by Andy Rosenbaum, et al.

A bottleneck in developing Semantic Parsing (SP) models is the need for a large volume of human-labeled training data. Given the complexity and cost of human annotation for SP, labeled data is often scarce, particularly in multilingual settings. Large Language Models (LLMs) excel at SP given only a few examples; however, LLMs are unsuitable for runtime systems that require low latency. In this work, we propose CLASP, a simple method to improve low-resource SP for moderate-sized models: we generate synthetic data from AlexaTM 20B to augment the training set for a model 40x smaller (500M parameters). We evaluate on two datasets in low-resource settings: English PIZZA, containing either 348 or 16 real examples, and mTOP cross-lingual zero-shot, where training data is available only in English and the model must generalize to four new languages. On both datasets, we show significant improvements over strong baseline methods.
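
The general idea of few-shot data augmentation described above can be illustrated with a minimal sketch. This is not the CLASP implementation: the prompt layout, the `llm` callable, the helper names (`build_prompt`, `parse_completion`, `augment`), and the toy parse format are all assumptions made for illustration. The sketch formats a handful of real (utterance, parse) pairs as a prompt, asks a text generator to continue it with a new pair, and adds the de-duplicated results to the training set.

```python
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, str]  # (utterance, parse); the parse format here is made up


def build_prompt(seed: List[Example], n_shots: int = 3) -> str:
    """Format up to n_shots real examples, ending with an open slot to complete."""
    blocks = [f"Utterance: {u}\nParse: {p}" for u, p in seed[:n_shots]]
    blocks.append("Utterance:")
    return "\n\n".join(blocks)


def parse_completion(text: str) -> Optional[Example]:
    """Extract one (utterance, parse) pair from a completion, or None if malformed."""
    first, sep, rest = text.partition("\nParse:")
    if not sep:
        return None
    utterance = first.strip()
    rest_lines = rest.strip().splitlines()
    parse = rest_lines[0].strip() if rest_lines else ""
    if not utterance or not parse:
        return None
    return utterance, parse


def augment(real: List[Example], llm: Callable[[str], str], n_synthetic: int) -> List[Example]:
    """Return the real examples plus de-duplicated synthetic ones.

    `llm` is a hypothetical stand-in for any prompt -> completion function.
    """
    prompt = build_prompt(real)
    seen = set(real)
    synthetic: List[Example] = []
    for _ in range(n_synthetic):
        example = parse_completion(llm(prompt))
        if example is not None and example not in seen:
            seen.add(example)
            synthetic.append(example)
    return real + synthetic
```

In practice the augmented list would then be used to fine-tune the smaller parser; filtering malformed or duplicate generations, as done here, is one simple way to limit noise in the synthetic data.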
