Learning language variations in news corpora through differential embeddings

11/13/2020
by Carlos Selmo, et al.

There is increasing interest in the NLP community in capturing variations in language usage, whether through time (semantic drift), across regions (dialects or variants), or across social contexts (e.g., professional or media technolects). Several successful dynamic embeddings have been proposed that track semantic change through time. Here we show that a model with a central word representation and a slice-dependent contribution can learn word embeddings from different corpora simultaneously. The model is based on a star-like representation of the slices. We apply it to The New York Times and The Guardian newspapers, and show that it captures both temporal dynamics in the yearly slices of each corpus and language variations between US and UK English in a curated multi-source corpus. We provide an extensive evaluation of this methodology.
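
The star-like structure described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the slice-dependent contribution is added to the central vector; the abstract does not state how the two are combined. The vocabulary, slice names, dimensionality, and random vectors are all hypothetical placeholders, not trained embeddings or results from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["football", "pavement", "president"]  # hypothetical vocabulary
slices = ["nyt", "guardian"]                   # hypothetical slice names
dim = 8                                        # hypothetical embedding size

# One central vector per word, shared by every slice (the hub of the star).
central = rng.normal(size=(len(vocab), dim))

# One small offset per (slice, word). Additive composition is an assumption.
offsets = {s: 0.1 * rng.normal(size=(len(vocab), dim)) for s in slices}


def slice_embedding(word: str, slice_name: str) -> np.ndarray:
    """Return the central vector plus the slice-specific contribution."""
    i = vocab.index(word)
    return central[i] + offsets[slice_name][i]


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Return 1 minus the cosine similarity of two vectors."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def variation(word: str, slice_a: str, slice_b: str) -> float:
    """Cosine distance between a word's embeddings in two slices."""
    return cosine_distance(
        slice_embedding(word, slice_a), slice_embedding(word, slice_b)
    )


for w in vocab:
    print(w, variation(w, "nyt", "guardian"))
```

Because every slice shares the same central vectors, slice embeddings live in one common space and can be compared directly. In this sketch the only source of difference between two slices is their offsets, so a word's distance to itself within a single slice is zero.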


