Causally Denoise Word Embeddings Using Half-Sibling Regression

11/24/2019
by Zekun Yang, et al.

Distributional representations of words, also known as word vectors, have become crucial for modern natural language processing tasks due to their wide applications. Recently, a growing body of word vector postprocessing algorithms has emerged, aiming to render off-the-shelf word vectors even stronger. In line with these investigations, we introduce a novel word vector postprocessing scheme under a causal inference framework. Concretely, the postprocessing pipeline is realized by Half-Sibling Regression (HSR), which allows us to identify and remove confounding noise contained in word vectors. Compared to previous work, our proposed method has the advantages of interpretability and transparency due to its causal inference grounding. Evaluated on a battery of standard lexical-level evaluation tasks and downstream sentiment analysis tasks, our method reaches state-of-the-art performance.
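To make the idea of Half-Sibling Regression concrete, here is a minimal sketch on synthetic data. It is not the paper's pipeline: the abstract does not say which vectors serve as predictors or which regressor is used, so the ridge regressor, the function name `hsr_denoise`, the parameter `alpha`, and the toy data are all assumptions made for illustration. The sketch only shows the general HSR principle: when a target and its "half-sibling" share a noise source but not the signal, the part of the target that can be predicted from the sibling is treated as noise and subtracted.

```python
import numpy as np


def hsr_denoise(target, sibling, alpha=1.0):
    """Remove from `target` the component predictable from `sibling`.

    Hypothetical helper for illustration only.
    target:  array of shape (n_samples, n_targets)
    sibling: array of shape (n_samples, n_predictors)
    alpha:   ridge penalty (an assumed choice of regularizer)
    """
    # Closed-form ridge regression: W = (X^T X + alpha * I)^{-1} X^T Y
    gram = sibling.T @ sibling + alpha * np.eye(sibling.shape[1])
    weights = np.linalg.solve(gram, sibling.T @ target)
    # HSR estimate of the clean signal: target minus its predicted part
    return target - sibling @ weights


# Synthetic toy data (assumed setup, not taken from the paper).
rng = np.random.default_rng(0)
n_samples = 500
noise = rng.normal(size=(n_samples, 3))    # shared confounding noise
signal = rng.normal(size=(n_samples, 4))   # signal we want to recover

# The target mixes signal and noise; the sibling sees only the noise.
target = signal + noise @ rng.normal(size=(3, 4))
sibling = noise @ rng.normal(size=(3, 5)) + 0.01 * rng.normal(size=(n_samples, 5))

denoised = hsr_denoise(target, sibling, alpha=1e-3)

# Mean squared distance to the true signal before and after denoising.
err_before = np.mean((target - signal) ** 2)
err_after = np.mean((denoised - signal) ** 2)
print(f"error before: {err_before:.4f}, after: {err_after:.4f}")
```

The ridge penalty is one way to keep the regression from fitting, and therefore removing, parts of the signal that correlate with the sibling only by chance; other regularized regressors could play the same role.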

research
11/17/2018

Unsupervised Post-processing of Word Vectors via Conceptor Negation

Word vectors are at the core of many natural language processing tasks. ...
research
12/09/2021

Word Embeddings via Causal Inference: Gender Bias Reducing and Semantic Information Preserving

With widening deployments of natural language processing (NLP) in daily ...
research
02/19/2018

Learning Word Vectors for 157 Languages

Distributed word representations, or word vectors, have recently been ap...
research
05/11/2020

Evaluating Sparse Interpretable Word Embeddings for Biomedical Domain

Word embeddings have found their way into a wide range of natural langua...
research
05/21/2018

Aff2Vec: Affect--Enriched Distributional Word Representations

Human communication includes information, opinions, and reactions. React...
research
10/01/2019

Bad Form: Comparing Context-Based and Form-Based Few-Shot Learning in Distributional Semantic Models

Word embeddings are an essential component in a wide range of natural la...
research
11/10/2017

Bayesian Paragraph Vectors

Word2vec (Mikolov et al., 2013) has proven to be successful in natural l...