Human-in-the-Loop Refinement of Word Embeddings

10/06/2021
by James Powell, et al.

Word embeddings are fixed, distributional representations of the contexts of words in a corpus, learned from word co-occurrences. Despite their proven utility in machine learning tasks, word embedding models may capture uneven semantic and syntactic representations and can inadvertently reflect various kinds of bias present in the corpora on which they were trained. It has been demonstrated that post-processing word embeddings with information from lexical dictionaries can improve their semantic associations and thus their quality. Building on this idea, we propose a system that incorporates an adaptation of word embedding post-processing, which we call "interactive refitting", to address some of the most daunting qualitative problems found in word embeddings. Our approach allows a human to identify and address potential quality issues with word embeddings interactively. This sidesteps the question of who decides what constitutes bias or which other quality issues may affect downstream tasks: each organization or entity can address its own concerns at a fine-grained level, in an iterative and interactive fashion. It also provides better insight into the effect that word embeddings, and refinements to them, have on machine learning pipelines.
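The lexicon-based post-processing the abstract alludes to is closely related to retrofitting (Faruqui et al., 2015), which nudges each word vector toward the vectors of its neighbors in a lexical resource while keeping it close to the original embedding. Below is a minimal sketch of that standard retrofitting update, assuming uniform neighbor weights; the function name, parameters, and toy lexicon are illustrative assumptions, and this is not the paper's own "interactive refitting" procedure.

```python
import numpy as np

def retrofit(embeddings, lexicon, iterations=10):
    """Pull each word vector toward its lexicon neighbors.

    embeddings: dict mapping word -> np.ndarray (original vectors, left unchanged)
    lexicon:    dict mapping word -> list of semantically related words
    Returns a new dict of refitted vectors.
    """
    refitted = {w: v.copy() for w, v in embeddings.items()}
    for _ in range(iterations):
        for word, neighbors in lexicon.items():
            neighbors = [n for n in neighbors if n in refitted]
            if word not in refitted or not neighbors:
                continue
            # Update rule: average of the original vector (weighted by the
            # neighbor count) and the current vectors of the neighbors.
            total = len(neighbors) * embeddings[word]
            total = total + sum(refitted[n] for n in neighbors)
            refitted[word] = total / (2 * len(neighbors))
    return refitted

# Toy usage: "happy" and "glad" are declared synonyms, so their vectors converge.
vecs = {"happy": np.array([1.0, 0.0]), "glad": np.array([0.0, 1.0])}
lex = {"happy": ["glad"], "glad": ["happy"]}
print(retrofit(vecs, lex)["happy"])
```

An interactive variant of this idea would let a human supply or veto the neighbor pairs between iterations rather than taking them wholesale from a dictionary.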


Related research

09/30/2020 · Interactive Re-Fitting as a Technique for Improving Word Embeddings
Word embeddings are a fixed, distributional representation of the contex...

05/18/2021 · Revisiting Additive Compositionality: AND, OR and NOT Operations with Word Embeddings
It is well-known that typical word embedding methods such as Word2Vec an...

12/14/2020 · Model Choices Influence Attributive Word Associations: A Semi-supervised Analysis of Static Word Embeddings
Static word embeddings encode word associations, extensively utilized in...

08/11/2022 · Word-Embeddings Distinguish Denominal and Root-Derived Verbs in Semitic
Proponents of the Distributed Morphology framework have posited the exis...

11/25/2017 · Experiential, Distributional and Dependency-based Word Embeddings have Complementary Roles in Decoding Brain Activity
We evaluate 8 different word embedding models on their usefulness for pr...

05/21/2020 · The Frankfurt Latin Lexicon: From Morphological Expansion and Word Embeddings to SemioGraphs
In this article we present the Frankfurt Latin Lexicon (FLL), a lexical ...

06/04/2019 · Tracing Antisemitic Language Through Diachronic Embedding Projections: France 1789-1914
We investigate some aspects of the history of antisemitism in France, on...
