A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

05/14/2018
by Mikhail Khodak, et al.

Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features. This paper introduces a la carte embedding, a simple and general alternative to the usual word2vec-based approaches for building such representations, based on recent theoretical results for GloVe-like embeddings. Our method relies mainly on a linear transformation that can be learned efficiently from pretrained word vectors using linear regression. Once learned, the transform can be applied on the fly whenever a new text feature or rare word is encountered, even if only a single usage example is available. We introduce a new dataset showing that the a la carte method requires fewer examples of words in context to learn high-quality embeddings, and we obtain state-of-the-art results on a nonce task and some unsupervised document classification tasks.
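The abstract says only that the method learns a linear transformation from pretrained word vectors by linear regression and then applies it to new features. The sketch below is one way such a pipeline could look. It is not the authors' implementation. It assumes (the abstract does not say this) that each word is summarized by the average of the pretrained vectors of the words in its contexts, and that the transform is fit by ordinary least squares to map that average back to the word's own vector. The function names `average_context`, `learn_transform`, and `induce_embedding` are hypothetical.

```python
import numpy as np


def average_context(word_vectors, contexts):
    """Mean pretrained vector over all known tokens in a word's contexts.

    Assumption: averaging context vectors is the feature fed to the
    regression; tokens without a pretrained vector are skipped.
    """
    vecs = [word_vectors[t] for ctx in contexts for t in ctx if t in word_vectors]
    if not vecs:
        raise ValueError("no context token has a pretrained vector")
    return np.mean(vecs, axis=0)


def learn_transform(word_vectors, contexts_by_word):
    """Fit A by least squares so that word_vectors[w] ~= A @ average_context(w)."""
    words = [w for w in contexts_by_word if w in word_vectors]
    U = np.stack([average_context(word_vectors, contexts_by_word[w]) for w in words])
    V = np.stack([word_vectors[w] for w in words])
    # lstsq solves U @ X ~= V for X; the transform acting on column vectors is X.T.
    X, *_ = np.linalg.lstsq(U, V, rcond=None)
    return X.T


def induce_embedding(A, word_vectors, contexts):
    """Embed an unseen word or feature from one or more usage contexts."""
    return A @ average_context(word_vectors, contexts)


if __name__ == "__main__":
    # Toy data only: random vectors and random contexts, not real embeddings.
    rng = np.random.default_rng(0)
    vocab = [f"w{i}" for i in range(50)]
    word_vectors = {w: rng.normal(size=8) for w in vocab}
    contexts_by_word = {
        w: [[vocab[j] for j in rng.integers(0, len(vocab), size=6)] for _ in range(4)]
        for w in vocab
    }
    A = learn_transform(word_vectors, contexts_by_word)
    # A single usage example is enough to produce a vector for a new word.
    new_vec = induce_embedding(A, word_vectors, [["w3", "w17", "w42"]])
    print(new_vec.shape)
```

In this sketch the regression is fit once over the words that already have pretrained vectors, so embedding a new feature later costs one average and one matrix-vector product.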


