Translate Meanings, Not Just Words: IdiomKB's Role in Optimizing Idiomatic Translation with Language Models

08/26/2023
by Shuang Li, et al.

To translate well, machine translation (MT) systems and general-purpose language models (LMs) need a deep understanding of both the source and target languages and cultures. Idioms, being non-compositional, pose a particular challenge for Transformer-based systems, because literal translations often miss the intended meaning. Traditional methods, which replace idioms using existing knowledge bases (KBs), often lack scale and context awareness. To address these challenges, we introduce a multilingual idiom KB (IdiomKB) developed using large LMs. Our approach prioritizes context awareness and scalability: idioms are stored offline in a KB of manageable size, which allows efficient serving with smaller models and provides a more comprehensive understanding of idiomatic expressions. The KB helps smaller models, such as BLOOMZ (7.1B), Alpaca (7B), and InstructGPT (6.7B), translate better by retrieving idioms' figurative meanings. We present a novel, GPT-4-powered metric for human-aligned evaluation and use it to show that IdiomKB considerably boosts model performance. Human evaluations further validate our KB's quality.
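The retrieval idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy KB entries, the substring matching, the prompt wording, and the names `TOY_IDIOM_KB`, `find_idioms`, and `build_prompt` are all assumptions made for illustration. It only shows the general shape of looking up an idiom's figurative meaning offline and handing it to a smaller model as part of the translation prompt.

```python
from typing import Dict, List

# Hypothetical toy KB (not the real IdiomKB data):
# idiom -> {language code of the explanation -> figurative meaning}.
TOY_IDIOM_KB: Dict[str, Dict[str, str]] = {
    "break the ice": {"en": "to ease tension and get a conversation started"},
    "spill the beans": {"en": "to reveal a secret"},
}


def find_idioms(sentence: str, kb: Dict[str, Dict[str, str]]) -> List[str]:
    """Return KB idioms that occur in the sentence (naive substring match)."""
    lowered = sentence.lower()
    return [idiom for idiom in kb if idiom in lowered]


def build_prompt(
    sentence: str,
    target_language: str,
    kb: Dict[str, Dict[str, str]],
    meaning_lang: str = "en",
) -> str:
    """Prepend retrieved figurative meanings to a translation instruction."""
    lines = []
    for idiom in find_idioms(sentence, kb):
        meaning = kb[idiom].get(meaning_lang)
        if meaning:
            lines.append(f'In this sentence, "{idiom}" means "{meaning}".')
    # Assumed prompt wording; the paper's actual prompt may differ.
    lines.append(
        f"Translate the following sentence into {target_language}: {sentence}"
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("He told a joke to break the ice.", "Chinese", TOY_IDIOM_KB))
```

The resulting string would be passed to whichever smaller model is being served. Because the lookup is a plain dictionary access, the KB can be built once offline with a large LM and reused at inference time without calling that large LM again.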

research
05/26/2023

Do GPTs Produce Less Literal Translations?

Large Language Models (LLMs) such as GPT-3 have emerged as general-purpo...
research
08/02/2023

Do Multilingual Language Models Think Better in English?

Translate-test is a popular technique to improve the performance of mult...
research
11/16/2022

Prompting PaLM for Translation: Assessing Strategies and Performance

Large language models (LLMs) that have been trained on multilingual but ...
research
05/30/2022

Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation

Unlike literal expressions, idioms' meanings do not directly follow from...
research
01/26/2018

Context Models for OOV Word Translation in Low-Resource Languages

Out-of-vocabulary word translation is a major problem for the translatio...
research
02/25/2019

Lost in Machine Translation: A Method to Reduce Meaning Loss

A desideratum of high-quality translation systems is that they preserve ...
research
01/31/2018

Paraphrase-Supervised Models of Compositionality

Compositional vector space models of meaning promise new solutions to st...