
An Annotated Corpus of Emerging Anglicisms in Spanish Newspaper Headlines

by Elena Alvarez-Mellado, et al.
Brandeis University

The extraction of anglicisms (lexical borrowings from English) is relevant both for lexicographic purposes and for downstream NLP tasks. In this paper we present (1) a corpus of 21,570 newspaper headlines written in European Spanish and annotated with emerging anglicisms, and (2) a conditional random field (CRF) baseline model with handcrafted features for anglicism extraction. We describe the headline corpus together with its annotation tagset and guidelines, and introduce a CRF model that can serve as a baseline for the task of detecting anglicisms. This work is a first step towards an anglicism extractor for Spanish newswire.
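The abstract does not spell out the tagset or the handcrafted features. As a minimal sketch, assuming a BIO encoding with a hypothetical ENG label and a few illustrative orthographic cues (these are assumptions, not the paper's actual feature set), per-token feature dictionaries of the kind a linear-chain CRF toolkit consumes, and the conversion of predicted tags back into anglicism spans, might look like this:

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical cue: letter sequences that are rare in native Spanish spelling
# but common in English. Illustrative only, not the paper's feature set.
ENGLISH_LIKE = re.compile(r"(w|k|sh|th|ing$)")


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Handcrafted features for tokens[i], one dict per token."""
    tok = tokens[i]
    low = tok.lower()
    feats: Dict[str, object] = {
        "bias": 1.0,
        "lower": low,
        "prefix3": low[:3],
        "suffix3": low[-3:],
        "is_title": tok.istitle(),
        "is_upper": tok.isupper(),
        "english_like": bool(ENGLISH_LIKE.search(low)),
        # Spanish-specific characters suggest a native word.
        "has_accent": bool(re.search(r"[áéíóúü]", low)),
        "has_n_tilde": "ñ" in low,
    }
    # Context window of one token on each side.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token of one headline."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (label, text) spans from BIO tags such as B-ENG / I-ENG / O."""
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    current: List[str] = []
    for tok, tag in zip(tokens, tags):
        # A B- tag, or an I- tag that does not continue the open span, starts a new span.
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            if label is not None:
                spans.append((label, " ".join(current)))
            label, current = tag[2:], [tok]
        elif tag.startswith("I-"):
            current.append(tok)
        else:  # "O" closes any open span
            if label is not None:
                spans.append((label, " ".join(current)))
            label, current = None, []
    if label is not None:
        spans.append((label, " ".join(current)))
    return spans
```

For example, on the headline tokens `["El", "boom", "del", "fitness", "low", "cost"]` with tags `["O", "B-ENG", "O", "B-ENG", "B-ENG", "I-ENG"]`, `bio_to_spans` returns `[("ENG", "boom"), ("ENG", "fitness"), ("ENG", "low cost")]`. The lists of feature dicts produced by `sentence_features` are the input format expected by CRF wrappers such as sklearn-crfsuite, which would then learn the tag transitions that make multi-token spans like "low cost" come out as B followed by I.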



Code Repositories


Annotated corpus and CRF model for automatic extraction of anglicisms in Spanish newswire
