Embedding Words in Non-Vector Space with Unsupervised Graph Learning

10/06/2020
by   Max Ryabinin, et al.

It has become a de facto standard to represent words as elements of a vector space (word2vec, GloVe). While this approach is convenient, it is unnatural for language: words form a graph with a latent hierarchical structure, and this structure has to be revealed and encoded by word embeddings. We introduce GraphGlove: unsupervised graph word representations which are learned end-to-end. In our setting, each word is a node in a weighted graph, and the distance between words is the shortest-path distance between the corresponding nodes. We adopt a recent method that learns a representation of data in the form of a differentiable weighted graph and use it to modify the GloVe training algorithm. We show that our graph-based representations substantially outperform vector-based methods on word similarity and analogy tasks. Our analysis reveals that the structure of the learned graphs is hierarchical and similar to that of WordNet; the geometry is highly non-trivial and contains subgraphs with different local topology.
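To make the distance notion concrete: in this setting, similarity between two words is not a dot product of vectors but the shortest-path distance between their nodes in a weighted graph. Below is a minimal sketch of that distance computation using Dijkstra's algorithm over a tiny hand-made word graph. The graph, its words, and its edge weights are invented for illustration; in GraphGlove the weights would be learned end-to-end rather than fixed by hand.

```python
import heapq

# Hypothetical toy word graph: each word is a node, edges carry
# non-negative weights. These weights are made up for illustration;
# GraphGlove learns them from co-occurrence data.
graph = {
    "cat":    {"animal": 1.0, "dog": 1.5},
    "dog":    {"animal": 1.0, "cat": 1.5},
    "animal": {"cat": 1.0, "dog": 1.0, "entity": 2.0},
    "entity": {"animal": 2.0},
}

def word_distance(graph, source, target):
    """Shortest-path (Dijkstra) distance between two word nodes."""
    dist = {source: 0.0}
    heap = [(0.0, source)]       # (distance-so-far, node)
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:       # first pop of target is the shortest path
            return d
        for neighbor, w in graph[node].items():
            nd = d + w
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return float("inf")          # target unreachable from source

print(word_distance(graph, "cat", "entity"))  # cat -> animal -> entity = 3.0
```

Note how the hierarchy falls out of the graph structure: "cat" reaches the more abstract "entity" only through the intermediate "animal" node, which is the kind of latent WordNet-like hierarchy the paper reports in its learned graphs.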

Related research

10/15/2018
Poincaré GloVe: Hyperbolic Word Embeddings
Words are not created equal. In fact, they form an aristocratic graph wi...

09/07/2021
Learning grounded word meaning representations on similarity graphs
This paper introduces a novel approach to learn visually grounded meanin...

02/02/2018
Preserved Structure Across Vector Space Representations
Certain concepts, words, and images are intuitively more similar than ot...

10/08/2019
Beyond Vector Spaces: Compact Data Representation as Differentiable Weighted Graphs
Learning useful representations is a key ingredient to the success of mo...
11/02/2022
Hierarchies over Vector Space: Orienting Word and Graph Embeddings
Word and graph embeddings are widely used in deep learning applications....

04/27/2015
Document Classification by Inversion of Distributed Language Representations
There have been many recent advances in the structure and measurement of...
