Milimili. Collecting Parallel Data via Crowdsourcing

07/23/2023
by   Alexander Antonov, et al.

We present a methodology for gathering a parallel corpus through crowdsourcing, which is more cost-effective than hiring professional translators, albeit at the expense of quality. Additionally, we have made available experimental parallel data collected for Chechen-Russian and Fula-English language pairs.

research
02/15/2021

Crowdsourcing Parallel Corpus for English-Oromo Neural Machine Translation using Community Engagement Platform

Even though Afaan Oromo is the most widely spoken language in the Cushit...
research
10/25/2018

An Incremental Truth Inference Approach to Aggregate Crowdsourcing Contributions in Games with a Purpose

We introduce our approach for incremental truth inference over the contr...
research
06/17/2022

Crowdsourcing Relative Rankings of Multi-Word Expressions: Experts versus Non-Experts

In this study we investigate to which degree experts and non-experts agr...
research
07/06/2020

Announcing CzEng 2.0 Parallel Corpus with over 2 Gigawords

We present a new release of the Czech-English parallel corpus CzEng 2.0 ...
research
04/24/2020

TeleCrowd: A Crowdsourcing Approach to Create Informal to Formal Text Corpora

Crowdsourcing has been widely used recently as an alternative to traditi...
research
01/17/2022

PerPaDa: A Persian Paraphrase Dataset based on Implicit Crowdsourcing Data Collection

In this paper we introduce PerPaDa, a Persian paraphrase dataset that is...
research
02/12/2017

Octopus: A Framework for Cost-Quality-Time Optimization in Crowdsourcing

We present Octopus, an AI agent to jointly balance three conflicting tas...
