The ELITR ECA Corpus

09/15/2021
by Philip Williams, et al.

We present the ELITR ECA corpus, a multilingual corpus derived from publications of the European Court of Auditors. We use automatic translation together with Bleualign to identify parallel sentence pairs in all 506 translation directions. The result is a corpus comprising 264k document pairs and 41.9M sentence pairs.
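
The figure of 506 translation directions can be sanity-checked with a little arithmetic. The sketch below assumes that a "direction" is an ordered (source, target) pair of distinct languages, so n languages yield n(n - 1) directions, and 506 = 23 × 22 corresponds to 23 languages. The abstract does not list the languages, so the labels here are placeholders and the helper name `translation_directions` is purely illustrative.

```python
from itertools import permutations


def translation_directions(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


# Placeholder labels only: the abstract does not name the languages.
# The count of 23 is inferred from 506 = 23 * 22, not stated in the abstract.
languages = [f"lang{i:02d}" for i in range(1, 24)]

directions = translation_directions(languages)
print(len(directions))  # 506
```

Under the same assumption, counting unordered language pairs instead would give half as many, 253, since each pair contributes two directions.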

research
08/11/2021

Icelandic Parallel Abstracts Corpus

We present a new Icelandic-English parallel corpus, the Icelandic Parall...
research
05/31/2023

Sentence Simplification Using Paraphrase Corpus for Initialization

Neural sentence simplification method based on sequence-to-sequence fram...
research
04/11/2018

Generating Multilingual Parallel Corpus Using Subtitles

Neural Machine Translation with its significant results, still has a gre...
research
09/17/2018

Open Subtitles Paraphrase Corpus for Six Languages

This paper accompanies the release of Opusparcus, a new paraphrase corpu...
research
02/28/2016

Identification of Parallel Passages Across a Large Hebrew/Aramaic Corpus

We propose a method for efficiently finding all parallel passages in a l...
research
03/24/2021

Finnish Paraphrase Corpus

In this paper, we introduce the first fully manually annotated paraphras...
research
12/04/2019

Towards Constructing a Corpus for Studying the Effects of Treatments and Substances Reported in PubMed Abstracts

We present the construction of an annotated corpus of PubMed abstracts r...
