Finnish Paraphrase Corpus

03/24/2021
by Jenna Kanerva et al.

In this paper, we introduce the first fully manually annotated paraphrase corpus for Finnish, containing 53,572 paraphrase pairs harvested from alternative subtitles and news headlines. Of all paraphrase pairs in our corpus, 98% are manually classified as paraphrases at least in their given context, if not in all contexts. Additionally, we establish a manual candidate selection method and demonstrate its feasibility for high-quality paraphrase selection in terms of both cost and quality.

research
11/27/2019

SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization

This paper introduces the SAMSum Corpus, a new dataset with abstractive ...
research
06/25/2021

Manually Annotated Spelling Error Corpus for Amharic

This paper presents a manually annotated spelling error corpus for Amhar...
research
09/15/2021

The ELITR ECA Corpus

We present the ELITR ECA corpus, a multilingual corpus derived from publ...
research
09/18/2022

Evolution of a Web-Scale Near Duplicate Image Detection System

Detecting near duplicate images is fundamental to the content ecosystem ...
research
09/04/2019

Discovering Hypernymy in Text-Rich Heterogeneous Information Network by Exploiting Context Granularity

Text-rich heterogeneous information networks (text-rich HINs) are ubiqui...
research
08/02/2021

The RareDis corpus: a corpus annotated with rare diseases, their signs and symptoms

The RareDis corpus contains more than 5,000 rare diseases and almost 6,0...
research
02/28/2016

Identification of Parallel Passages Across a Large Hebrew/Aramaic Corpus

We propose a method for efficiently finding all parallel passages in a l...