Contextualized Word Vector-based Methods for Discovering Semantic Differences with No Training nor Word Alignment

05/19/2023
by Ryo Nagata, et al.

In this paper, we propose methods for discovering semantic differences between words appearing in two corpora, based on the norms of contextualized word vectors. The key idea is that the range of meanings a word covers is reflected in the norm of its mean contextualized word vector. Unlike previous methods, the proposed methods require no assumptions about the words and corpora being compared. All they require is computing, for each word type, the mean of its contextualized word vectors and the norm of that mean. Nevertheless, they are (i) robust to skew in corpus size; (ii) capable of detecting semantic differences in infrequent words; and (iii) effective in pinpointing word instances that carry a meaning missing from one of the two corpora being compared. We demonstrate these advantages on native and non-native English corpora and on historical corpora.
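The core computation described above, the mean of a word type's contextualized vectors and the norm of that mean, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the token vectors have already been extracted from some contextualized encoder (for example a BERT-style model, not shown) and grouped by word type. The cross-corpus score `norm_differences` is a hypothetical choice (norm in corpus A minus norm in corpus B); the paper's exact scoring may differ. The intuition is that when token vectors have comparable lengths but point in many directions (many senses), their mean is shorter than when they all point the same way (one sense).

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of one word type's contextualized token vectors."""
    if not vectors:
        raise ValueError("need at least one token vector")
    dim = len(vectors[0])
    sums = [0.0] * dim
    for v in vectors:
        if len(v) != dim:
            raise ValueError("all token vectors must share one dimension")
        for j, x in enumerate(v):
            sums[j] += x
    return [s / len(vectors) for s in sums]


def l2_norm(v: Vector) -> float:
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def mean_vector_norms(token_vectors: Dict[str, List[Vector]]) -> Dict[str, float]:
    """Map each word type to the norm of its mean contextualized vector.

    Assumption: token_vectors was built beforehand by running a
    contextualized encoder over the corpus and grouping vectors by word type.
    """
    return {w: l2_norm(mean_vector(vs)) for w, vs in token_vectors.items()}


def norm_differences(
    corpus_a: Dict[str, List[Vector]],
    corpus_b: Dict[str, List[Vector]],
) -> Dict[str, float]:
    """Hypothetical score: norm in corpus A minus norm in corpus B.

    Only word types present in both corpora are scored. A large positive
    value suggests the word covers fewer meanings in A than in B.
    """
    norms_a = mean_vector_norms(corpus_a)
    norms_b = mean_vector_norms(corpus_b)
    shared = norms_a.keys() & norms_b.keys()
    return {w: norms_a[w] - norms_b[w] for w in shared}


if __name__ == "__main__":
    # Toy 2-D unit vectors: "bank" has one sense in A and two senses in B.
    a = {"bank": [[1.0, 0.0], [1.0, 0.0]]}
    b = {"bank": [[1.0, 0.0], [0.0, 1.0]]}
    print(norm_differences(a, b))
```

In the toy example, the mean vector of "bank" in corpus A has norm 1, while in corpus B its two orthogonal senses average to a vector of norm about 0.71, so the difference is positive. Note that the sketch only needs a mean and a norm per word type, with no training and no alignment between the two vector spaces, because both corpora are encoded by the same model.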
