ARPA: Armenian Paraphrase Detection Corpus and Models

09/26/2020
by Arthur Malajyan, et al.

In this work, we employ a semi-automatic method based on back translation to generate a sentential paraphrase corpus for the Armenian language. The initial collection of sentences is translated from Armenian to English and back twice, resulting in pairs of lexically distant but semantically similar sentences. The generated paraphrases are then manually reviewed and annotated. Using this method, train and test datasets are created, containing 2360 paraphrases in total. In addition, the datasets are used to train and evaluate BERT-based models for detecting paraphrases in Armenian, achieving results comparable to the state of the art for other languages.
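
The back-translation step described above can be illustrated with a minimal sketch. It assumes a pluggable `translate(text, source, target)` callable; that signature, the language codes `hy` and `en`, the toy dictionary translator, and the token-overlap score are all illustrative assumptions, not the authors' implementation. Manual review and annotation of the resulting pairs is still required, as in the abstract.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: translate(text, source_lang, target_lang) -> str.
Translator = Callable[[str, str, str], str]


def back_translate(sentence: str, translate: Translator,
                   source: str = "hy", pivot: str = "en",
                   rounds: int = 2) -> str:
    """Send a sentence source -> pivot -> source, `rounds` times."""
    current = sentence
    for _ in range(rounds):
        current = translate(current, source, pivot)
        current = translate(current, pivot, source)
    return current


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercased whitespace tokens.

    A crude proxy for lexical distance (an assumption of this sketch):
    lower overlap means the two sentences share fewer words.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def generate_candidates(sentences: List[str],
                        translate: Translator) -> List[Tuple[str, str, float]]:
    """Return (original, candidate, overlap) triples for manual review.

    Sentences that come back unchanged are dropped, since they carry no
    paraphrase information.
    """
    candidates = []
    for original in sentences:
        candidate = back_translate(original, translate)
        if candidate != original:
            candidates.append(
                (original, candidate, token_overlap(original, candidate)))
    return candidates


# Toy stand-in for a real machine translation system (illustration only).
# English placeholder words are used so the example stays readable.
_TOY_RULES = {
    ("hy", "en"): {"big": "LARGE"},
    ("en", "hy"): {"LARGE": "huge"},
}


def toy_translate(text: str, source: str, target: str) -> str:
    """Replace tokens using a fixed per-direction dictionary."""
    rules = _TOY_RULES.get((source, target), {})
    return " ".join(rules.get(token, token) for token in text.split())


if __name__ == "__main__":
    for row in generate_candidates(["a big house", "no change"], toy_translate):
        print(row)
```

Passing the translator as a parameter keeps the sketch independent of any particular machine translation service; in practice `toy_translate` would be replaced by a call to a real system.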
