comp-syn: Perceptually Grounded Word Embeddings with Color

Popular approaches to natural language processing create word embeddings based on textual co-occurrence patterns but often ignore embodied, sensory aspects of language. Here, we introduce the Python package comp-syn, which provides grounded word embeddings based on the perceptually uniform color distributions of Google Image search results. We demonstrate that comp-syn significantly enriches models of distributional semantics. In particular, we show two results. First, using low-dimensional word-color embeddings, comp-syn predicts human judgments of word concreteness more accurately and more interpretably than word2vec. Second, comp-syn performs comparably to word2vec on a metaphorical vs. literal word-pair classification task. comp-syn is open-source on PyPI and is compatible with mainstream machine-learning Python packages. Our package release includes word-color embeddings for over 40,000 English words, each associated with crowd-sourced word concreteness judgments.
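The sketch below illustrates the idea of a word embedding built from color distributions. It is a minimal, stdlib-only example and is not comp-syn's API or its exact method.

It rests on several assumptions:

- All function names (`srgb_to_lab`, `color_histogram`, `word_color_embedding`, `js_divergence`) are hypothetical.
- CIELAB with a D65 white point stands in for whatever perceptually uniform color space the package actually uses.
- Images arrive as lists of 8-bit sRGB pixel tuples rather than being fetched from Google Images.
- Using Jensen-Shannon divergence to compare words is an illustrative choice.

Each word's embedding is the average of its per-image color histograms.

```python
import math
from typing import Iterable, List, Sequence, Tuple

RGB = Tuple[int, int, int]

# Assumption: sRGB input and the CIELAB D65 reference white.
XN, YN, ZN = 0.95047, 1.00000, 1.08883
EPS = (6 / 29) ** 3


def _linearize(c: int) -> float:
    """Undo the sRGB gamma curve for one 0-255 channel value."""
    v = c / 255.0
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4


def _f(t: float) -> float:
    """CIELAB companding function (cube root with a linear toe)."""
    return t ** (1 / 3) if t > EPS else t / (3 * (6 / 29) ** 2) + 4 / 29


def srgb_to_lab(rgb: RGB) -> Tuple[float, float, float]:
    """Convert one sRGB pixel to CIELAB (L in [0, 100])."""
    r, g, b = (_linearize(c) for c in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = _f(x / XN), _f(y / YN), _f(z / ZN)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def _bin(value: float, lo: float, hi: float, n: int) -> int:
    """Map a value to one of n equal-width bins, clamping out-of-range values."""
    idx = int((value - lo) / (hi - lo) * n)
    return min(max(idx, 0), n - 1)


def color_histogram(pixels: Iterable[RGB], bins: int = 4) -> List[float]:
    """Normalized bins**3 histogram of an image's pixels in CIELAB space."""
    counts = [0] * bins ** 3
    total = 0
    for px in pixels:
        L, a, b = srgb_to_lab(px)
        i = _bin(L, 0, 100, bins)
        j = _bin(a, -128, 128, bins)
        k = _bin(b, -128, 128, bins)
        counts[(i * bins + j) * bins + k] += 1
        total += 1
    if total == 0:
        raise ValueError("image has no pixels")
    return [c / total for c in counts]


def word_color_embedding(images: Sequence[Iterable[RGB]], bins: int = 4) -> List[float]:
    """Average the per-image histograms of one word's image results."""
    if not images:
        raise ValueError("need at least one image")
    hists = [color_histogram(img, bins) for img in images]
    return [sum(col) / len(hists) for col in zip(*hists)]


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits, so 0 <= JS <= 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical toy "images" (lists of RGB pixels) standing in for search results.
sky_images = [[(100, 150, 230)] * 50 + [(255, 255, 255)] * 10]
grass_images = [[(40, 160, 60)] * 60]
e_sky = word_color_embedding(sky_images)
e_grass = word_color_embedding(grass_images)
print(js_divergence(e_sky, e_sky))        # 0.0
print(js_divergence(e_sky, e_grass) > 0)  # True
```

The histograms are probability distributions, so base-2 Jensen-Shannon divergence runs from 0 (identical color profiles) to 1 (no shared bins). This bounded range makes distances between words easy to read. The coarse `bins=4` grid, which gives 64 dimensions, is likewise only an illustrative setting.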

research
02/21/2022

Seeing the advantage: visually grounding word embeddings to better capture human semantic knowledge

Distributional semantic models capture word-level meaning that is useful...
research
05/04/2022

Word Tour: One-dimensional Word Embeddings via the Traveling Salesman Problem

Word embeddings are one of the most fundamental technologies used in nat...
research
04/18/2020

Effect of Text Color on Word Embeddings

In natural scenes and documents, we can find the correlation between a t...
research
03/30/2020

QRMine: A python package for triangulation in Grounded Theory

Grounded theory (GT) is a qualitative research method for building theor...
research
09/07/2018

Learning Embeddings of Directed Networks with Text-Associated Nodes---with Applications in Software Package Dependency Networks

A network embedding consists of a vector representation for each node in...
research
10/19/2021

Exploring the Sensory Spaces of English Perceptual Verbs in Natural Language Data

In this study, we explore how language captures the meaning of words, in...
research
07/11/2016

Mapping distributional to model-theoretic semantic spaces: a baseline

Word embeddings have been shown to be useful across state-of-the-art sys...