Leveraging a New Spanish Corpus for Multilingual and Crosslingual Metaphor Detection

10/19/2022
by Elisa Sanchez-Bayona, et al.

The lack of wide-coverage datasets annotated with everyday metaphorical expressions for languages other than English is striking. As a result, most research on supervised metaphor detection has been published only for English. To address this issue, this work presents the first Spanish corpus annotated with naturally occurring metaphors that is large enough to develop metaphor detection systems. The presented dataset, CoMeta, includes texts from various domains, namely news, political discourse, Wikipedia and reviews. To label CoMeta, we apply the MIPVU method, the most commonly used guidelines for systematically annotating metaphor in real data. We use our newly created dataset to provide competitive baselines by fine-tuning several multilingual and monolingual state-of-the-art large language models. Furthermore, by leveraging the existing VUAM English data in addition to CoMeta, we present what are, to the best of our knowledge, the first cross-lingual experiments on supervised metaphor detection. Finally, we perform a detailed error analysis that explores the seemingly high transfer of everyday metaphor across these two languages and datasets.
