Detecting New Word Meanings: A Comparison of Word Embedding Models in Spanish

01/12/2020
by Andrés Torres-Rivera, et al.

Semantic neologisms (SN) are defined as words that acquire a new meaning while keeping their form. Given the nature of this kind of neologism, the task of identifying these new word meanings is currently performed manually by specialists at observatories of neology. To detect SN in a semi-automatic way, we developed a system that combines the following strategies: topic modeling, keyword extraction, and word sense disambiguation. The role of topic modeling is to detect the themes addressed in the input text. Themes within a text give clues about the particular meaning of the words that are used; for example, viral has one meaning in the context of computer science (CS) and another when talking about health. To extract keywords, we used TextRank with POS tag filtering. With this method, we can obtain relevant words that are already part of the Spanish lexicon. We use a deep learning model to determine whether a given keyword could have a new meaning. Embeddings that differ from all the known meanings (or topics) indicate that a word might be a valid SN candidate. In this study, we examine the following word embedding models: Word2Vec, Sense2Vec, and FastText. The models were trained with equivalent parameters using the Spanish Wikipedia as the corpus. We then used a list of words and their concordances (obtained from our database of neologisms) to show the different embeddings that each model yields. Finally, we compare these outcomes with the concordances of each word to show how we can determine whether a word could be a valid candidate for SN.
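
The candidate test described above, an embedding that differs from every known meaning, can be illustrated with a minimal sketch. The abstract does not say how the difference is measured, so the use of cosine similarity, the `threshold` value, the function names, and the toy three-dimensional vectors below are all assumptions made for illustration, not the authors' method. In practice the vectors would come from a trained Word2Vec, Sense2Vec, or FastText model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_sn_candidate(context_vec, known_sense_vecs, threshold=0.5):
    """Flag a word as a possible semantic neologism (hypothetical helper).

    Returns (flag, best), where best is the highest similarity between the
    observed context vector and any known sense vector. The word is flagged
    when even the closest known sense falls below the assumed threshold.
    With no known senses, best defaults to 0.0 and the word is flagged.
    """
    best = max((cosine(context_vec, s) for s in known_sense_vecs), default=0.0)
    return best < threshold, best


# Toy sense vectors for "viral" (made-up values, not real embeddings).
known = {
    "health": [1.0, 0.0, 0.0],
    "computer_science": [0.0, 1.0, 0.0],
}

# A usage close to the health sense: not flagged.
flag_known, sim_known = is_sn_candidate([0.9, 0.1, 0.0], known.values())

# A usage far from both known senses: flagged as a candidate.
flag_new, sim_new = is_sn_candidate([0.0, 0.1, 1.0], known.values())

print(flag_known, round(sim_known, 3))
print(flag_new, round(sim_new, 3))
```

A fixed threshold is the simplest possible decision rule; a real system would likely need to tune it per model, since similarity ranges differ across embedding methods.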


