Extending Multi-Sense Word Embedding to Phrases and Sentences for Unsupervised Semantic Applications

03/29/2021
by Haw-Shiuan Chang, et al.

Most unsupervised NLP models represent each word with a single point or single region in semantic space, while existing multi-sense word embeddings cannot represent longer word sequences like phrases or sentences. We propose a novel embedding method for a text sequence (a phrase or a sentence) where each sequence is represented by a distinct set of multi-mode codebook embeddings to capture different semantic facets of its meaning. The codebook embeddings can be viewed as the cluster centers which summarize the distribution of possibly co-occurring words in a pre-trained word embedding space. We introduce an end-to-end trainable neural model that directly predicts the set of cluster centers from the input text sequence at test time. Our experiments show that the per-sentence codebook embeddings significantly improve performance on unsupervised sentence similarity and extractive summarization benchmarks. In phrase similarity experiments, we find that the multi-facet embeddings provide an interpretable semantic representation but do not outperform the single-facet baseline.
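To make the "cluster centers" view concrete, the following is a minimal sketch, not the authors' model. It assumes toy 2-D vectors as hypothetical stand-ins for pre-trained embeddings of co-occurring words, and it runs plain k-means with a farthest-first initialization to obtain the centers. The proposed neural model instead predicts such centers directly from the input sequence. The names `kmeans_centers`, `facet_a`, `facet_b`, and `cooccurring` are illustrative only.

```python
import numpy as np


def kmeans_centers(vectors, k, n_iter=10):
    """Return k cluster centers of `vectors` (shape: n x d) via plain k-means."""
    # Farthest-first initialization (an assumption of this sketch):
    # start from the first vector, then repeatedly add the vector that is
    # farthest from every center chosen so far.
    centers = [vectors[0]]
    for _ in range(k - 1):
        dists = np.min(
            [((vectors - c) ** 2).sum(axis=1) for c in centers], axis=0
        )
        centers.append(vectors[dists.argmax()])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        # Squared distance from every vector to every center, shape (n, k).
        d = ((vectors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of the vectors assigned to it.
        for j in range(k):
            members = vectors[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers


# Toy data: co-occurring words drawn from two semantic facets (hypothetical).
rng = np.random.default_rng(1)
facet_a = rng.normal(loc=[5.0, 0.0], scale=0.1, size=(20, 2))
facet_b = rng.normal(loc=[-5.0, 0.0], scale=0.1, size=(20, 2))
cooccurring = np.vstack([facet_a, facet_b])

# The "codebook" for this toy sequence: one center per facet.
codebook = kmeans_centers(cooccurring, k=2)
print(codebook)
```

In this toy setting each row of `codebook` lands near the middle of one facet, which is the sense in which a small set of embeddings can summarize a multi-modal distribution of co-occurring words.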


