Unseen Word Representation by Aligning Heterogeneous Lexical Semantic Spaces

11/12/2018
by   Victor Prokhorov, et al.

Word embedding techniques rely heavily on the abundance of training data for individual words. Given the Zipfian distribution of words in natural language text, many words appear rarely, or not at all, in the training data. In this paper we put forward a technique that exploits the knowledge encoded in lexical resources, such as WordNet, to induce embeddings for unseen words. Our approach adapts graph embedding and cross-lingual vector space transformation techniques in order to merge lexical knowledge encoded in ontologies with that derived from corpus statistics. We show that the approach provides consistent performance improvements across multiple evaluation benchmarks: in vitro, on multiple rare-word similarity datasets, and in vivo, in two downstream text classification tasks.
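The core idea of borrowing cross-lingual transformation techniques can be sketched as follows: learn a linear map between a graph-derived semantic space (e.g. embeddings of a WordNet-like graph) and a corpus-derived space, using words present in both, then apply that map to induce corpus-space vectors for words seen only in the lexical resource. The sketch below uses an orthogonal Procrustes solution, a standard choice for such mappings; the data and dimensions are synthetic and purely illustrative, not the paper's actual setup.

```python
import numpy as np

def learn_alignment(src, tgt):
    """Orthogonal Procrustes: the orthogonal W minimizing ||src @ W - tgt||_F,
    computed from the SVD of src^T tgt (as in cross-lingual embedding mapping)."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

# Toy setup (synthetic, for illustration only):
rng = np.random.default_rng(0)
d = 8
graph_space = rng.normal(size=(20, d))   # graph embeddings of 20 shared words
true_w, _ = np.linalg.qr(rng.normal(size=(d, d)))
corpus_space = graph_space @ true_w      # their corpus embeddings (rotated copy)

# Fit the alignment on the words that appear in both spaces.
w = learn_alignment(graph_space, corpus_space)

# An "unseen" word has a graph embedding but no corpus vector;
# the learned map projects it into the corpus space.
unseen_graph_vec = rng.normal(size=(1, d))
induced_corpus_vec = unseen_graph_vec @ w
```

Because the toy corpus space is an exact rotation of the graph space, the learned map recovers it; on real embeddings the fit is only approximate, and the induced vectors inherit whatever semantic structure the graph embedding captured.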


