Balancing the composition of word embeddings across heterogeneous data sets

01/14/2020
by Stephanie Brandl, et al.

Word embeddings capture semantic relationships based on contextual information and are the basis for a wide variety of natural language processing applications. Notably, these relationships are learned solely from the data, so the composition of the data affects the semantics of the embeddings, which can arguably lead to biased word vectors. Given qualitatively different data subsets, we aim to balance the influence of individual subsets on the resulting word vectors while retaining their quality. To this end, we propose a criterion to measure the shift towards a single data subset and develop approaches to meet both objectives. We find that a weighted average of the two subset embeddings balances their influence, although word similarity performance decreases. We further propose a promising optimization approach that balances both the influence and the quality of the word embeddings.
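To make the weighted-average idea concrete, below is a minimal NumPy sketch, not the paper's implementation: it assumes the two subset embeddings E_a and E_b share a vocabulary and a common coordinate space, and the cosine-based shift measure is an illustrative stand-in for the paper's unspecified criterion. The weight alpha is a hypothetical parameter.

import numpy as np

def weighted_average_embeddings(E_a, E_b, alpha=0.5):
    # Combine two subset embeddings (same vocabulary order, same space)
    # by a convex combination; alpha controls subset A's influence.
    return alpha * E_a + (1.0 - alpha) * E_b

def mean_shift_toward(E_comb, E_sub):
    # Average per-word cosine similarity between the combined embedding
    # and one subset embedding; higher values indicate a stronger shift
    # of the combined vectors toward that subset.
    num = np.sum(E_comb * E_sub, axis=1)
    denom = np.linalg.norm(E_comb, axis=1) * np.linalg.norm(E_sub, axis=1)
    return float(np.mean(num / denom))

# Toy example: a 1000-word vocabulary with 50-dimensional vectors
# trained (here: randomly generated) on two data subsets.
rng = np.random.default_rng(0)
E_a = rng.normal(size=(1000, 50))
E_b = rng.normal(size=(1000, 50))

E = weighted_average_embeddings(E_a, E_b, alpha=0.5)
print(mean_shift_toward(E, E_a), mean_shift_toward(E, E_b))

With alpha = 0.5 the two shift scores come out roughly equal, which is the balance criterion this sketch is meant to illustrate; in practice the subset embeddings would first need to be aligned into a shared space, for example by joint training or an orthogonal mapping.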
