CJaFr-v3: A Freely Available Filtered Japanese-French Aligned Corpus

08/28/2022
by Raoul Blin et al.

We present a freely available Japanese-French parallel corpus. It contains 15 million aligned segments and was built by compiling and filtering several existing resources. In this paper, we describe these resources, their size and quality, the filtering we applied to improve the quality of the corpus, and the content of the resulting ready-to-use corpus. We also assess the usefulness of the corpus and the effectiveness of our filtering by training and evaluating several standard machine translation (MT) systems on it.
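The abstract does not spell out the individual filtering steps. Purely as a point of reference, the sketch below shows the kind of generic heuristics often applied to aligned corpora: dropping empty segments, capping segment length, bounding the source/target length ratio, removing untranslated copies, and deduplicating pairs. It is not the authors' pipeline. The function name `filter_pairs` and every threshold are hypothetical assumptions that would need tuning for Japanese-French data, where Japanese segments are usually much shorter in characters than their French counterparts.

```python
from typing import Iterable, Iterator, Tuple

Pair = Tuple[str, str]  # (Japanese segment, French segment)


def filter_pairs(
    pairs: Iterable[Pair],
    max_len: int = 200,       # hypothetical cap on characters per side
    min_ratio: float = 0.1,   # hypothetical lower bound on len(ja) / len(fr)
    max_ratio: float = 2.0,   # hypothetical upper bound on len(ja) / len(fr)
) -> Iterator[Pair]:
    """Yield aligned pairs that pass a few generic (assumed) cleanliness checks."""
    seen = set()
    for ja, fr in pairs:
        ja, fr = ja.strip(), fr.strip()
        # Drop pairs where either side is empty.
        if not ja or not fr:
            continue
        # Drop pairs where either side is excessively long.
        if len(ja) > max_len or len(fr) > max_len:
            continue
        # Drop pairs whose character-length ratio looks implausible.
        ratio = len(ja) / len(fr)
        if not (min_ratio <= ratio <= max_ratio):
            continue
        # Drop untranslated copies (identical source and target).
        if ja == fr:
            continue
        # Drop exact duplicates of a pair that was already kept.
        if (ja, fr) in seen:
            continue
        seen.add((ja, fr))
        yield ja, fr


# Hypothetical sample data, invented for illustration only.
sample = [
    ("猫が好きです。", "J'aime les chats."),
    ("猫が好きです。", "J'aime les chats."),  # duplicate
    ("", "Bonjour."),                          # empty Japanese side
    ("OK", "OK"),                              # untranslated copy
    ("こんにちは", "Bonjour tout le monde, comment allez-vous aujourd'hui ?"),  # ratio too low
]
kept = list(filter_pairs(sample))
```

On this toy input only the first pair survives: the second is a duplicate, the third has an empty side, the fourth is an untranslated copy, and the fifth has a length ratio of 5/55, below the assumed lower bound. Real pipelines typically combine such cheap rules with model-based scoring, but the thresholds here are illustrative, not values from the paper.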
