Automating Behavioral Testing in Machine Translation

09/05/2023
by Javier Ferrando, et al.

Behavioral testing in NLP allows fine-grained evaluation of systems by examining their linguistic capabilities through the analysis of input-output behavior. Unfortunately, existing work on behavioral testing in Machine Translation (MT) is restricted to largely handcrafted tests covering a limited range of capabilities and languages. To address this limitation, we propose to use Large Language Models (LLMs) to generate a diverse set of source sentences tailored to test the behavior of MT models in a range of situations. We can then verify whether the MT model exhibits the expected behavior by matching its output against candidate sets that are also generated using LLMs. Our approach aims to make behavioral testing of MT systems practical while requiring only minimal human effort. In our experiments, we apply the proposed evaluation framework to assess multiple available MT systems, revealing that while pass rates generally follow the trends observable from traditional accuracy-based metrics, our method uncovers several important differences and potential bugs that go unnoticed when relying only on accuracy.
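
To make the idea of a pass rate over candidate sets concrete, here is a minimal sketch in Python. It is not the authors' implementation: the names `BehavioralTest`, `passes`, `pass_rate`, and `toy_translate` are hypothetical, the case-insensitive substring rule is only one assumed way to match a translation against a candidate set, and the source sentences and candidates are hard-coded here rather than generated by an LLM.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class BehavioralTest:
    """One test case: a source sentence plus acceptable target-side strings."""
    source: str
    candidates: frozenset[str]  # LLM-generated in the paper; hand-written here


def passes(translation: str, test: BehavioralTest) -> bool:
    # Assumed matching rule: the test passes if any candidate string
    # occurs in the translation, ignoring case.
    text = translation.casefold()
    return any(c.casefold() in text for c in test.candidates)


def pass_rate(translate: Callable[[str], str],
              tests: Iterable[BehavioralTest]) -> float:
    # Fraction of tests whose translation matches its candidate set.
    tests = list(tests)
    if not tests:
        raise ValueError("pass_rate needs at least one test")
    passed = sum(passes(translate(t.source), t) for t in tests)
    return passed / len(tests)


# Toy stand-in for an MT system (hypothetical outputs, English -> Spanish).
_TOY_OUTPUTS = {
    "The meeting is at 3:45 pm.": "La reunión es a las 15:45.",
    "She paid 1,250 dollars.": "Pagó 125 dólares.",  # number is corrupted
}


def toy_translate(source: str) -> str:
    return _TOY_OUTPUTS[source]


tests = [
    BehavioralTest("The meeting is at 3:45 pm.",
                   frozenset({"3:45", "15:45", "15.45"})),
    BehavioralTest("She paid 1,250 dollars.",
                   frozenset({"1250", "1.250", "1 250"})),
]

print(f"pass rate: {pass_rate(toy_translate, tests):.2f}")
```

In this toy run the time expression is preserved and the amount is not, so one of the two tests passes. A check of this kind targets a single capability per test, which is how a behavioral suite can surface errors that an aggregate accuracy score averages away.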

research
05/20/2022

SALTED: A Framework for SAlient Long-Tail Translation Error Detection

Traditional machine translation (MT) metrics provide an average measure ...
research
02/16/2023

Evaluating and Improving the Coreference Capabilities of Machine Translation Models

Machine translation (MT) requires a wide range of linguistic capabilitie...
research
11/16/2022

Prompting PaLM for Translation: Assessing Strategies and Performance

Large language models (LLMs) that have been trained on multilingual but ...
research
07/11/2023

Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features

A challenge towards developing NLP systems for the world's languages is ...
research
09/14/2023

ChatGPT MT: Competitive for High- (but not Low-) Resource Languages

Large language models (LLMs) implicitly learn to perform a range of lang...
research
08/07/2023

AI Text-to-Behavior: A Study In Steerability

The research explores the steerability of Large Language Models (LLMs), ...
research
03/24/2023

Machine Psychology: Investigating Emergent Capabilities and Behavior in Large Language Models Using Psychological Methods

Large language models (LLMs) are currently at the forefront of intertwin...
