On the Economics of Multilingual Few-shot Learning: Modeling the Cost-Performance Trade-offs of Machine Translated and Manual Data

05/12/2022
by Kabir Ahuja, et al.

Borrowing ideas from production functions in microeconomics, this paper introduces a framework to systematically evaluate the performance and cost trade-offs between machine-translated and manually created labelled data for task-specific fine-tuning of massively multilingual language models. We illustrate the effectiveness of the framework through a case study on the TyDIQA-GoldP dataset. One interesting conclusion of the study is that if the cost of machine translation is greater than zero, the optimal performance at least cost is always achieved using at least some manually created data, and possibly only manually created data. To our knowledge, this is the first attempt to extend the concept of production functions to the study of data collection strategies for training multilingual models, and the framework can serve as a valuable tool for other similar cost vs. data trade-offs in NLP.
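To make the production-function idea concrete, here is a minimal sketch of a least-cost search over mixes of translated and manual data. It is not the paper's fitted model: the additive log-shaped `performance` function, the weights `W_TRANSLATED` and `W_MANUAL`, and the unit costs are all hypothetical, chosen only to show how a performance target and per-example costs together determine a data mix.

```python
import math

# Hypothetical weights (assumptions, not values from the paper):
# how strongly each data source lifts task performance.
W_TRANSLATED = 2.0
W_MANUAL = 5.0


def performance(t, m):
    """Assumed production function: performance from t machine-translated
    and m manually created examples, with diminishing returns in both."""
    return W_TRANSLATED * math.log1p(t) + W_MANUAL * math.log1p(m)


def manual_needed(t, target):
    """Smallest number of manual examples that reaches `target`
    when t translated examples are already used."""
    remaining = target - W_TRANSLATED * math.log1p(t)
    if remaining <= 0:
        return 0
    m = math.ceil(math.expm1(remaining / W_MANUAL))
    # Guard against floating-point rounding just below the target.
    while performance(t, m) < target:
        m += 1
    return m


def least_cost_mix(target, cost_t, cost_m, max_t=2000, step=10):
    """Grid-search the translated-data amount and return the cheapest
    (t, m, cost) triple that reaches the target performance."""
    best = None
    for t in range(0, max_t + 1, step):
        m = manual_needed(t, target)
        cost = cost_t * t + cost_m * m
        if best is None or cost < best[2]:
            best = (t, m, cost)
    return best


if __name__ == "__main__":
    t, m, cost = least_cost_mix(target=30.0, cost_t=0.01, cost_m=1.0)
    print(f"translated={t}, manual={m}, total cost={cost:.2f}")
```

Because the grid includes `t = 0`, the returned mix is never more expensive than using manual data alone. Whether the cheapest mix contains manual data depends on the assumed functional form and costs, so this sketch illustrates the kind of question the framework asks and does not reproduce the study's conclusion.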


research
04/19/2017

Understanding Task Design Trade-offs in Crowdsourced Paraphrase Collection

Linguistically diverse datasets are critical for training and evaluating...
research
01/17/2023

Which Model Shall I Choose? Cost/Quality Trade-offs for Text Classification Tasks

Industry practitioners always face the problem of choosing the appropria...
research
06/08/2023

The economic trade-offs of large language models: A case study

Contacting customer service via chat is a common practice. Because emplo...
research
09/07/2021

Don't Go Far Off: An Empirical Study on Neural Poetry Translation

Despite constant improvements in machine translation quality, automatic ...
research
05/25/2023

Towards Higher Pareto Frontier in Multilingual Machine Translation

Multilingual neural machine translation has witnessed remarkable progres...
research
03/29/2023

Did You Mean...? Confidence-based Trade-offs in Semantic Parsing

We illustrate how a calibrated model can help balance common trade-offs ...
