Word2Box: Learning Word Representation Using Box Embeddings

06/28/2021
by Shib Sankar Dasgupta, et al.

Learning vector representations for words is one of the most fundamental topics in NLP, capable of capturing syntactic and semantic relationships useful in a variety of downstream NLP tasks. Vector representations can be limiting, however, in that typical scoring functions such as dot-product similarity intertwine the position and magnitude of a vector in space. Exciting innovations in representation learning have proposed alternative fundamental representations, such as distributions, hyperbolic vectors, or regions. Our model, Word2Box, takes a region-based approach to the problem of word representation, representing words as n-dimensional rectangles. These representations encode position and breadth independently and provide additional geometric operations, such as intersection and containment, which allow them to model co-occurrence patterns that vectors struggle with. We demonstrate improved performance on various word similarity tasks, particularly on less common words, and perform a qualitative analysis exploring the additional unique expressivity provided by Word2Box.
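To make the geometry concrete, the sketch below shows how an axis-aligned n-dimensional box can be stored as a pair of corners, how the intersection of two boxes is itself a box, and how overlap volume yields an asymmetric containment-style score. This is a minimal illustrative assumption in plain NumPy, not the authors' Word2Box implementation: the Box class, the overlap_score function, and the toy coordinates are all hypothetical.

```python
import numpy as np

class Box:
    """An axis-aligned n-dimensional rectangle: (min corner, max corner)."""

    def __init__(self, lower, upper):
        # Assumes lower[i] <= upper[i] for every dimension i.
        self.lower = np.asarray(lower, dtype=float)
        self.upper = np.asarray(upper, dtype=float)

    def volume(self):
        # Product of side lengths; clamped to zero for empty boxes.
        sides = np.clip(self.upper - self.lower, 0.0, None)
        return float(np.prod(sides))

    def intersection(self, other):
        # Intersecting two axis-aligned boxes gives another box
        # (possibly empty), so the operation is closed.
        return Box(np.maximum(self.lower, other.lower),
                   np.minimum(self.upper, other.upper))

def overlap_score(a, b):
    # Fraction of box b's volume contained in box a.
    # Unlike dot-product similarity, this score is asymmetric.
    vol_b = b.volume()
    return a.intersection(b).volume() / vol_b if vol_b > 0 else 0.0

# Two partially overlapping 2-d boxes (toy coordinates).
cat = Box([0.0, 0.0], [2.0, 2.0])
pet = Box([1.0, 1.0], [4.0, 4.0])
print(overlap_score(pet, cat))  # 0.25: a quarter of "cat" lies inside "pet"
print(overlap_score(cat, pet))  # ~0.11: much less of "pet" lies inside "cat"
```

The asymmetry of the overlap score is the point: it can express containment-like relations (most of one word's region sitting inside another's) that a symmetric dot product between vectors cannot.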

Related research

01/21/2020
Generating Sense Embeddings for Syntactic and Semantic Analogy for Portuguese
Word embeddings are numerical vectors which can represent words or conce...

09/10/2021
Box Embeddings: An open-source library for representation learning using geometric structures
A major factor contributing to the success of modern representation lear...

02/18/2017
Reproducing and learning new algebraic operations on word embeddings using genetic programming
Word-vector representations associate a high dimensional real-vector to ...

10/06/2020
Are "Undocumented Workers" the Same as "Illegal Aliens"? Disentangling Denotation and Connotation in Vector Spaces
In politics, neologisms are frequently invented for partisan objectives....

01/31/2019
Learning Taxonomies of Concepts and not Words using Contextualized Word Representations: A Position Paper
Taxonomies are semantic hierarchies of concepts. One limitation of curre...

11/08/2018
Confusion2Vec: Towards Enriching Vector Space Word Representations with Representational Ambiguities
Word vector representations are a crucial part of Natural Language Proce...

01/02/2023
Tsetlin Machine Embedding: Representing Words Using Logical Expressions
Embedding words in vector space is a fundamental first step in state-of-...
