Generalised Spherical Text Embedding

11/30/2022
by   Souvik Banerjee, et al.

This paper proposes an unsupervised modelling approach that allows for a more flexible representation of text embeddings. It jointly encodes words and paragraphs as individual matrices of arbitrary column dimension with unit Frobenius norm. The representation is also linguistically motivated through the introduction of a novel similarity metric. Both the proposed model and the similarity metric exploit the matrix structure of the embeddings. We then show that the same matrices can be reshaped into vectors of unit norm, which turns training into an optimization problem over the spherical manifold. We use manifold optimization to train the matrix embeddings efficiently. We also quantitatively verify the quality of our text embeddings by showing improved results on document classification, document clustering, and semantic textual similarity benchmarks.
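
The reshaping step in the abstract works because the Frobenius norm of a matrix equals the Euclidean norm of its flattened entries, so a matrix of unit Frobenius norm is a point on a unit sphere. The NumPy sketch below illustrates that fact together with a generic gradient step on the sphere (tangent-space projection followed by renormalization). It is not the paper's training procedure: the 4x3 shape, the toy objective, the learning rate, and the helper names `normalize_frobenius` and `sphere_gradient_step` are all assumptions made for illustration.

```python
import numpy as np


def normalize_frobenius(M):
    """Scale a matrix so that its Frobenius norm equals 1."""
    # For 2-D arrays, np.linalg.norm defaults to the Frobenius norm.
    return M / np.linalg.norm(M)


def sphere_gradient_step(x, euclid_grad, lr):
    """One descent step on the unit sphere (hypothetical helper).

    The Euclidean gradient is projected onto the tangent space at x,
    and the updated point is mapped back to the sphere by normalization.
    """
    tangent = euclid_grad - np.dot(x, euclid_grad) * x
    y = x - lr * tangent
    return y / np.linalg.norm(y)


rng = np.random.default_rng(0)

# Hypothetical 4x3 matrix embedding with unit Frobenius norm.
W = normalize_frobenius(rng.standard_normal((4, 3)))

# Flattening preserves the norm: w is a unit vector of length 12.
w = W.reshape(-1)

# Toy objective (an assumption, not the paper's loss): maximize the
# inner product <w, target>, i.e. minimize -<w, target>.
target = normalize_frobenius(rng.standard_normal((4, 3))).reshape(-1)
before = float(w @ target)
for _ in range(50):
    w = sphere_gradient_step(w, -target, lr=0.1)
after = float(w @ target)

# Reshape back into a matrix; the unit Frobenius norm is preserved.
W_new = w.reshape(4, 3)
```

Because every update ends with a normalization, `W_new` stays on the unit sphere throughout, so no separate constraint handling is needed.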

research
11/04/2019

Spherical Text Embedding

Unsupervised text embedding has shown great power in a wide range of NLP...
research
10/30/2018

Word Mover's Embedding: From Word2Vec to Document Embedding

While the celebrated Word2Vec technique yields semantically rich represe...
research
11/01/2021

Domain-adaptation of spherical embeddings

Domain adaptation of embedding models, updating a generic embedding to t...
research
11/06/2018

Semantic Term "Blurring" and Stochastic "Barcoding" for Improved Unsupervised Text Classification

The abundance of text data being produced in the modern age makes it inc...
research
05/14/2018

A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

Motivations like domain adaptation, transfer learning, and feature learn...
research
10/13/2022

MTEB: Massive Text Embedding Benchmark

Text embeddings are commonly evaluated on a small set of datasets from a...
research
05/14/2021

Out-of-Manifold Regularization in Contextual Embedding Space for Text Classification

Recent studies on neural networks with pre-trained weights (i.e., BERT) ...