Reproducing and learning new algebraic operations on word embeddings using genetic programming

02/18/2017
by Roberto Santana, et al.

Word-vector representations associate a high-dimensional real vector with every word in a corpus. Recently, neural-network-based methods have been proposed for learning this representation from large corpora. This type of word-to-vector embedding preserves, in the learned vector space, some of the syntactic and semantic relationships present in the original corpus. These relationships, in turn, can be exploited to address different types of language classification tasks by applying algebraic operations to the vectors. The general practice is to assume that the semantic relationships between words can be inferred by applying a priori specified algebraic operations. Our general goal in this paper is to show that it is possible to learn methods for word composition in semantic spaces. Instead of expressing the compositional method as an algebraic operation, we encode it as a program, which can be linear, nonlinear, or involve more intricate expressions. More remarkably, this program is evolved from a set of initial random programs by means of genetic programming (GP). We show that our method reproduces the behavior of human-designed algebraic operators. Using a word analogy task as a benchmark, we also show that GP-generated programs achieve higher accuracy than the commonly used human-designed rule for algebraic manipulation of word vectors. Finally, we show the robustness of our approach by executing the evolved programs on the word2vec GoogleNews vectors, learned over 3 billion running words, and assessing their accuracy on the same word analogy task.
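For context, the "commonly used human-designed rule" referenced above is the vector-offset method for analogies (a is to b as c is to ?): compute b - a + c and return the nearest vocabulary word by cosine similarity. Below is a minimal NumPy sketch of that rule; the toy 3-dimensional vectors are invented for illustration, while real word2vec embeddings are typically 300-dimensional.

```python
import numpy as np

# Toy 3-dimensional embeddings standing in for learned word2vec vectors.
embeddings = {
    "king":  np.array([0.8, 0.6, 0.1]),
    "queen": np.array([0.8, 0.1, 0.6]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
}

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c, vocab):
    """Solve 'a is to b as c is to ?' with the vector-offset rule b - a + c,
    excluding the three query words from the candidate set, as is standard
    in word analogy benchmarks."""
    target = vocab[b] - vocab[a] + vocab[c]
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

print(analogy("man", "king", "woman", embeddings))  # expected: "queen"
```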
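The paper's contribution is to treat this fixed rule as just one point in a space of programs searched by GP. The abstract does not specify the authors' function set or evolution loop, so the sketch below is only a schematic of what a GP individual over the analogy inputs a, b, c might look like: an expression tree built from assumed primitives (here vector addition and subtraction) that evaluates to a query vector.

```python
import random
import numpy as np

# Illustrative GP-style program encoding (not the authors' actual operator
# set): a program is a nested tuple over the analogy input vectors a, b, c.
TERMINALS = ["a", "b", "c"]
FUNCTIONS = ["add", "sub"]

def random_program(depth=2):
    """Grow a random expression tree over the inputs."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    op = random.choice(FUNCTIONS)
    return (op, random_program(depth - 1), random_program(depth - 1))

def evaluate(prog, env):
    """Recursively evaluate a program tree to a vector."""
    if isinstance(prog, str):
        return env[prog]
    op, left, right = prog
    u, v = evaluate(left, env), evaluate(right, env)
    return u + v if op == "add" else u - v

# The human-designed offset rule b - a + c is one program in this space;
# GP would evolve a population of such trees against analogy accuracy.
offset_rule = ("add", ("sub", "b", "a"), "c")
env = {"a": np.array([0.2, 0.9, 0.1]),   # "man"
       "b": np.array([0.8, 0.6, 0.1]),   # "king"
       "c": np.array([0.2, 0.1, 0.9])}   # "woman"
print(evaluate(offset_rule, env))        # [0.8, -0.2, 0.9]
```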


Related research

01/21/2020
Generating Sense Embeddings for Syntactic and Semantic Analogy for Portuguese
Word embeddings are numerical vectors which can represent words or conce...

06/28/2021
Word2Box: Learning Word Representation Using Box Embeddings
Learning vector representations for words is one of the most fundamental...

12/17/2019
Analyzing Structures in the Semantic Vector Space: A Framework for Decomposing Word Embeddings
Word embeddings are rich word representations, which in combination with...

02/24/2021
Abelian Neural Networks
We study the problem of modeling a binary operation that satisfies some ...

03/18/2018
Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces
With the rise of machine learning, there is a great deal of interest in ...

11/14/2017
Modeling Semantic Relatedness using Global Relation Vectors
Word embedding models such as GloVe rely on co-occurrence statistics fro...

05/11/2017
Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems
Solving algebraic word problems requires executing a series of arithmeti...
