Tsetlin Machine Embedding: Representing Words Using Logical Expressions

01/02/2023
by Bimal Bhattarai, et al.

Embedding words in vector space is a fundamental first step in state-of-the-art natural language processing (NLP). Typical NLP solutions employ pre-defined vector representations to improve generalization by co-locating similar words in vector space. For instance, Word2Vec is a self-supervised predictive model that captures the context of words using a neural network. Similarly, GloVe is a popular unsupervised model that incorporates corpus-wide word co-occurrence statistics. Such word embeddings have significantly boosted important NLP tasks, including sentiment analysis, document classification, and machine translation. However, the embeddings are dense floating-point vectors, making them expensive to compute and difficult to interpret. In this paper, we instead propose to represent the semantics of words with a few defining words that are related using propositional logic. To produce such logical embeddings, we introduce a Tsetlin Machine-based autoencoder that learns logical clauses in a self-supervised manner. The clauses consist of contextual words like "black," "cup," and "hot" that define other words like "coffee," making them human-understandable. We evaluate our embedding approach on several intrinsic and extrinsic benchmarks, outperforming GloVe on six classification tasks. Furthermore, we investigate the interpretability of our embedding using the logical representations acquired during training. We also visualize word clusters in vector space, demonstrating how our logical embedding co-locates similar words.
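To make the idea of a logical embedding concrete, the following is a minimal illustrative sketch, not the authors' Tsetlin Machine autoencoder: it only shows how a propositional clause (a conjunction of context-word literals) can vote for target words, so that words sharing clauses end up close together while the clauses stay readable. The vocabulary, clauses, and function names are invented for illustration.

```python
# Illustrative sketch of clause-based word representation (assumed setup,
# not the paper's implementation). A clause is a conjunction of literals
# over a Boolean bag-of-words context.

from typing import Dict, List, Set

# A clause requires some context words to be present and others to be absent.
Clause = Dict[str, List[str]]

clause_hot_drink: Clause = {"include": ["cup", "hot"], "exclude": ["cold"]}

def clause_matches(clause: Clause, context: Set[str]) -> bool:
    """Evaluate the conjunction of literals on a Boolean context."""
    return all(w in context for w in clause["include"]) and \
           all(w not in context for w in clause["exclude"])

# A word's logical embedding is the vector of clauses associated with it.
# "coffee" and "tea" share the hot-drink clause, so they co-locate in clause
# space while each clause remains a human-readable logical expression.
word_clauses = {
    "coffee": [clause_hot_drink, {"include": ["black", "bean"], "exclude": []}],
    "tea":    [clause_hot_drink, {"include": ["leaf"], "exclude": []}],
}

context = {"cup", "hot", "black"}
for word, clauses in word_clauses.items():
    votes = [clause_matches(c, context) for c in clauses]
    print(word, votes)  # coffee [True, False]; tea [True, False]
```

In the paper, such clauses are learned self-supervised by the Tsetlin Machine autoencoder rather than written by hand; the sketch only fixes intuition for how a few defining words, combined with propositional logic, can stand in for a dense floating-point vector.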

Related research

hauWE: Hausa Words Embedding for Natural Language Processing (11/25/2019)
Words embedding (distributed word vector representations) have become an...

Discrete Word Embedding for Logical Natural Language Understanding (08/26/2020)
In this paper, we propose an unsupervised neural model for learning a di...

Distributed Word Representation in Tsetlin Machine (04/14/2021)
Tsetlin Machine (TM) is an interpretable pattern recognition algorithm b...

Word2Box: Learning Word Representation Using Box Embeddings (06/28/2021)
Learning vector representations for words is one of the most fundamental...

Not wacky vs. definitely wacky: A study of scalar adverbs in pretrained language models (05/25/2023)
Vector space models of word meaning all share the assumption that words ...

Intrinsic Subspace Evaluation of Word Embedding Representations (06/25/2016)
We introduce a new methodology for intrinsic evaluation of word represen...

Consistent Alignment of Word Embedding Models (02/24/2017)
Word embedding models offer continuous vector representations that can c...
