Subspace-based Set Operations on a Pre-trained Word Embedding Space

10/24/2022
by   Yoichi Ishibashi, et al.

Word embedding is a fundamental technology in natural language processing. It is often exploited for tasks that operate on sets of words, although standard methods for representing word sets and set operations remain limited. If we can leverage the advantages of word embeddings for such set operations, we can calculate sentence similarity and find words that share a concept with a given word set in a straightforward way. In this study, we formulate representations of sets and set operations in a pre-trained word embedding space. Inspired by quantum logic, we propose a novel formulation of set operations using subspaces in that space. Based on our definitions, we propose two metrics: one based on the degree to which a word belongs to a set, and one based on the similarity between two sets. Our experiments with Text Concept Set Retrieval and Semantic Textual Similarity tasks demonstrated the effectiveness of our proposed method.
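To make the subspace idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a word set is represented by the linear span of its word vectors, and that the degree of membership of a word is the length of its projection onto that span divided by its own length. The function names `orthonormal_basis` and `membership`, and the toy 3-dimensional vectors, are hypothetical and chosen only for illustration.

```python
import numpy as np

def orthonormal_basis(vectors, tol=1e-10):
    """Return an orthonormal basis (as columns) for the span of the given row vectors."""
    mat = np.asarray(vectors, dtype=float).T          # shape: (dim, n_words)
    u, s, _ = np.linalg.svd(mat, full_matrices=False)
    return u[:, s > tol]                              # keep directions with non-zero singular value

def membership(word, basis):
    """Assumed soft membership: norm of the projection onto the subspace over the vector norm."""
    w = np.asarray(word, dtype=float)
    projection = basis @ (basis.T @ w)                # orthogonal projection onto the span
    return float(np.linalg.norm(projection) / np.linalg.norm(w))

# Toy example (hypothetical vectors): a "set" spanning the x-y plane.
word_set = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
basis = orthonormal_basis(word_set)

inside = membership([1.0, 1.0, 0.0], basis)    # lies in the plane
outside = membership([0.0, 0.0, 1.0], basis)   # orthogonal to the plane
partial = membership([1.0, 0.0, 1.0], basis)   # halfway between
```

Under these assumptions the score ranges from 0 (the word is orthogonal to the set's subspace) to 1 (the word lies entirely inside it), which is one way a graded notion of "belongs to a set" can be read off a pre-trained embedding space.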


research
12/12/2018

Word Embedding based on Low-Rank Doubly Stochastic Matrix Decomposition

Word embedding, which encodes words into vectors, is an important starti...
research
04/12/2021

Learning to Remove: Towards Isotropic Pre-trained BERT Embedding

Pre-trained language models such as BERT have become a more common choic...
research
08/20/2019

CatE: Category-Name Guided Word Embedding

Unsupervised word embedding has benefited a wide spectrum of NLP tasks d...
research
10/28/2019

Cross-Domain Ambiguity Detection using Linear Transformation of Word Embedding Spaces

The requirements engineering process is a crucial stage of the software ...
research
03/29/2021

Extending Multi-Sense Word Embedding to Phrases and Sentences for Unsupervised Semantic Applications

Most unsupervised NLP models represent each word with a single point or ...
research
06/24/2021

A comprehensive empirical analysis on cross-domain semantic enrichment for detection of depressive language

We analyze the process of creating word embedding feature representation...
research
06/07/2018

Characterizing Departures from Linearity in Word Translation

We investigate the behavior of maps learned by machine translation metho...
