HELFI: a Hebrew-Greek-Finnish Parallel Bible Corpus with Cross-Lingual Morpheme Alignment

03/16/2020
by Anssi Yli-Jyrä, et al.

Twenty-five years ago, morphologically aligned Hebrew-Finnish and Greek-Finnish bitexts (texts accompanied by a translation) were constructed manually in order to create an analytical concordance (Luoto et al., 1997) for a Finnish Bible translation. The creators of the bitexts recently secured the publisher's permission to release their fine-grained alignment, but the alignment still depended on proprietary, third-party resources such as a copyrighted text edition and proprietary morphological analyses of the source texts. In this paper, we describe a nontrivial editorial process starting from the creation of the original single-purpose database and ending with its reconstruction using only freely available text editions and annotations. This process produced an openly available dataset that contains (i) the source texts and their translations, (ii) the morphological analyses, and (iii) the cross-lingual morpheme alignments.
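
To make the three components of the dataset concrete, the following is a minimal sketch of how a single morpheme-aligned verse might be represented in memory. It does not reproduce the actual HELFI file format: the class names, field names, tag labels, and the transliterated example segmentation are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Morpheme:
    """One morpheme with its analysis (hypothetical tag labels)."""
    surface: str
    tag: str


@dataclass(frozen=True)
class AlignedVerse:
    """A source verse, its translation, and morpheme-level links.

    Hypothetical structure, not the HELFI schema.
    """
    verse_id: str
    source: Tuple[Morpheme, ...]          # (i) source text, (ii) its analysis
    target: Tuple[Morpheme, ...]          # (i) translation, (ii) its analysis
    links: Tuple[Tuple[int, int], ...]    # (iii) (source index, target index)

    def aligned_pairs(self) -> List[Tuple[str, str]]:
        """Return the surface forms of each linked morpheme pair."""
        return [(self.source[i].surface, self.target[j].surface)
                for i, j in self.links]


# Illustrative example only: the first word of Genesis 1:1, with the Hebrew
# transliterated and segmented by hand rather than taken from the dataset.
verse = AlignedVerse(
    verse_id="Gen 1:1",
    source=(Morpheme("be", "PREP"), Morpheme("reshit", "NOUN")),
    target=(Morpheme("Alu", "NOUN"), Morpheme("ssa", "CASE:INE")),
    links=((0, 1), (1, 0)),
)

for src, tgt in verse.aligned_pairs():
    print(f"{src} -> {tgt}")
```

Storing the links as index pairs keeps the alignment separate from the texts and their analyses, so crossing correspondences can be recorded without reordering either side. Here the Hebrew preposition is linked to the Finnish case ending that follows the noun stem.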


