Idiomatic Expression Paraphrasing without Strong Supervision

12/16/2021
by Jianing Zhou, et al.

Idiomatic expressions (IEs) play an essential role in natural language. In this paper, we study the task of idiomatic sentence paraphrasing (ISP), which aims to paraphrase a sentence containing an IE by replacing the IE with its literal paraphrase. The primary challenge for this task is the lack of large-scale corpora of idiomatic-literal parallel sentences, which we address with two separate solutions. First, we propose an unsupervised approach to ISP that leverages an IE's contextual information and definition and does not require a parallel sentence training set. Second, we propose a weakly supervised approach that uses back-translation to jointly perform paraphrasing and generation of sentences with IEs, thereby enlarging the small-scale parallel sentence training dataset. Other significant by-products of the study include a model that replaces a literal phrase in a sentence with an IE to generate an idiomatic sentence, and a large-scale parallel dataset of idiomatic/literal sentence pairs. Compared to competitive baselines, the proposed solutions show relative gains of over 5.16 points in BLEU, over 8.75 points in METEOR, and over 19.57 points in SARI when the generated sentences are empirically validated on a parallel dataset using automatic and manual evaluations. Finally, we demonstrate the practical utility of ISP as a preprocessing step in En-De machine translation.
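
To make the input and output of the ISP task concrete, here is a minimal sketch in Python. It illustrates only the task definition (swap an IE for a literal paraphrase) and is not the unsupervised or weakly supervised model described in the abstract. The lexicon `IDIOM_TO_LITERAL`, its entries, and the function `paraphrase_idioms` are hypothetical names chosen for illustration; a fixed lookup table like this ignores context, which is what the proposed approaches are meant to handle.

```python
import re
from typing import Mapping

# Hypothetical toy lexicon (idiom -> literal paraphrase); not taken from the paper.
IDIOM_TO_LITERAL = {
    "kicked the bucket": "died",
    "under the weather": "ill",
}


def paraphrase_idioms(sentence: str, lexicon: Mapping[str, str] = IDIOM_TO_LITERAL) -> str:
    """Replace every known idiom in `sentence` with its literal paraphrase."""
    for idiom, literal in lexicon.items():
        # Match the idiom as a whole phrase, ignoring case.
        pattern = re.compile(r"\b" + re.escape(idiom) + r"\b", flags=re.IGNORECASE)
        sentence = pattern.sub(literal, sentence)
    return sentence


print(paraphrase_idioms("She has been under the weather since Monday."))
# prints: She has been ill since Monday.
```

A sentence with no entry in the lexicon is returned unchanged, which shows why a purely dictionary-based baseline cannot cover unseen idioms or idioms that need a context-dependent paraphrase.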


