Icelandic Parallel Abstracts Corpus

08/11/2021
by Haukur Barri Símonarson, et al.

We present a new Icelandic-English parallel corpus, the Icelandic Parallel Abstracts Corpus (IPAC), composed of abstracts from student theses and dissertations. The texts were collected from the Skemman repository, which keeps records of all theses, dissertations and final projects from students at Icelandic universities. The abstracts were sentence-aligned with Bleualign, using sentence-level BLEU scores computed from NMT model output in both translation directions. The result is a corpus of 64k sentence pairs from over 6 thousand parallel abstracts.
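
The alignment idea in the abstract can be illustrated with a minimal sketch. This is not Bleualign itself and not the authors' pipeline: it assumes the machine translations are already available as lists of strings, uses a simplified add-one-smoothed BLEU over unigrams and bigrams, restricts itself to monotonic 1-to-1 links, and combines the two directions by intersection. The function names, the 0.3 threshold and the intersection step are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=2):
    """Simplified sentence-level BLEU with add-one smoothing (illustrative only)."""
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    if not hyp or not ref:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_counts & ref_counts).values())  # clipped matches
        if n == 1 and overlap == 0:
            return 0.0  # no shared words at all
        total = sum(hyp_counts.values())
        log_precision += math.log((overlap + 1) / (total + 1)) / max_n
    brevity = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return brevity * math.exp(log_precision)


def align(translated, target, threshold=0.3):
    """Monotonic 1-to-1 alignment maximising total BLEU over accepted links.

    `translated` holds machine translations of the source sentences into the
    target language; `target` holds the real target-side sentences.
    Returns (translated_index, target_index) pairs. The threshold is assumed.
    """
    n, m = len(translated), len(target)
    score = [[sentence_bleu(t, r) for r in target] for t in translated]
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = score[i - 1][j - 1]
            link = best[i - 1][j - 1] + (s if s >= threshold else 0.0)
            best[i][j] = max(best[i - 1][j], best[i][j - 1], link)
    # Walk back through the table to recover the chosen links.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        s = score[i - 1][j - 1]
        if s >= threshold and best[i][j] == best[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def align_both_directions(src, tgt, src_translated, tgt_translated, threshold=0.3):
    """Keep only links found in both directions (intersection is an assumption)."""
    forward = set(align(src_translated, tgt, threshold))
    backward = {(i, j) for j, i in align(tgt_translated, src, threshold)}
    return sorted(forward & backward)


# Toy data: single letters stand in for source-language words.
src = ["a b c d", "e f g h i"]
src_translated = ["the corpus contains abstracts", "we align sentences with bleu"]
tgt = ["the corpus contains abstracts", "an unrelated sentence here",
       "we align the sentences with bleu"]
tgt_translated = ["a b c d", "x y z", "e f q g h i"]
print(align_both_directions(src, tgt, src_translated, tgt_translated))
```

In this toy run the unrelated middle sentence on the target side is left unaligned, which is the kind of filtering a score threshold is meant to provide.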
