Creating a contemporary corpus of similes in Serbian by using natural language processing

11/22/2018
by Nikola Milosevic, et al.

A simile is a figure of speech that compares two things by means of a connecting word, where the comparison is not intended to be taken literally. Similes are often used in everyday communication, but they are also part of linguistic cultural heritage. In this paper we present a methodology for the semi-automated collection of similes from the World Wide Web using text mining and machine learning techniques. We expanded an existing corpus, collected by Vuk Stefanovic Karadzic and containing 333 similes, by adding 442 similes collected from the internet. We also introduce crowdsourcing to the collection of figures of speech, which helped us build a corpus containing 787 unique similes.
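
As a rough illustration of the kind of pattern-based collection described above, the following Python sketch pulls simile candidates out of raw text and merges several collections into one deduplicated list. It is not the authors' pipeline. The connector "kao" (Serbian for "as/like"), the regular expression, and the function names `extract_candidates` and `merge_unique` are all assumptions made for this example. Candidates found this way would still need manual review or a classifier, since a surface pattern also matches literal comparisons.

```python
import re

# Assumption: candidates have the shape "<word> kao <word>", where "kao" is the
# Serbian comparison connector. This pattern is illustrative, not from the paper.
CANDIDATE = re.compile(r"\b(\w+)\s+kao\s+(\w+)", re.IGNORECASE)


def extract_candidates(text):
    """Return lowercased '<left> kao <right>' phrases found in the text."""
    return [
        f"{m.group(1).lower()} kao {m.group(2).lower()}"
        for m in CANDIDATE.finditer(text)
    ]


def merge_unique(*collections):
    """Merge simile collections, keeping first-seen order and dropping duplicates.

    Entries are normalized by lowercasing and collapsing whitespace, so
    trivially different spellings of the same simile count once.
    """
    seen = set()
    merged = []
    for collection in collections:
        for simile in collection:
            key = " ".join(simile.lower().split())
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


if __name__ == "__main__":
    web_text = "Bio je hladan kao led, a ona crvena kao paprika."
    existing = ["Hladan  kao led", "brz kao zec"]
    print(merge_unique(existing, extract_candidates(web_text)))
```

Normalizing before deduplication is what lets a web-collected candidate and an entry from an older collection be recognized as the same simile. A real system would likely need lemmatization as well, because Serbian adjectives and nouns inflect.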


