Context Matters: Recovering Human Semantic Structure from Machine Learning Analysis of Large-Scale Text Corpora

10/15/2019
by Marius Cătălin Iordan, et al.

Understanding how human semantic knowledge is organized and how people use it to judge fundamental relationships, such as similarity between concepts, has proven difficult. Theoretical models have consistently failed to provide accurate predictions of human judgments, as has the application of machine learning algorithms to large-scale, text-based corpora (embedding spaces). Based on the hypothesis that context plays a critical role in human cognition, we show that generating embedding spaces using contextually constrained text corpora greatly improves their ability to predict human judgments. Additionally, we introduce a novel context-based method for extracting interpretable feature information (e.g., size) from embedding spaces. Our findings suggest that contextually constraining large-scale text corpora, coupled with applying state-of-the-art machine learning algorithms, may improve the correspondence between representations derived using such methods and those underlying human semantic structure. This promises to provide novel insight into human similarity judgments and to inform the design of algorithms that can interact effectively with human semantic knowledge.
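
For readers unfamiliar with embedding spaces, the two operations the abstract refers to (predicting similarity between concepts, and reading a feature such as size out of an embedding) can be illustrated with a minimal sketch. This is not the authors' method, which the abstract does not specify: cosine similarity and projection onto an axis between two anchor words are common generic techniques, swapped in here for illustration. The function names, the anchor words, and the three-dimensional toy vectors are all hypothetical; they are not taken from the paper or from any trained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def feature_projection(vec, low_anchor, high_anchor):
    """Position of vec along the axis running from low_anchor to high_anchor.

    Returns 0.0 at low_anchor and 1.0 at high_anchor. This anchor-axis
    projection is an illustrative assumption, not the paper's procedure.
    """
    axis = [h - l for h, l in zip(high_anchor, low_anchor)]
    rel = [x - l for x, l in zip(vec, low_anchor)]
    return sum(a * b for a, b in zip(rel, axis)) / sum(a * a for a in axis)

# Invented toy vectors; a real embedding space would be learned from a corpus.
toy_embeddings = {
    "small": [0.0, 1.0, 0.0],
    "large": [1.0, 1.0, 0.0],
    "mouse": [0.1, 0.9, 0.3],
    "bear":  [0.9, 0.8, 0.4],
}

similarity = cosine_similarity(toy_embeddings["mouse"], toy_embeddings["bear"])
mouse_size = feature_projection(
    toy_embeddings["mouse"], toy_embeddings["small"], toy_embeddings["large"]
)
bear_size = feature_projection(
    toy_embeddings["bear"], toy_embeddings["small"], toy_embeddings["large"]
)
```

In this toy setup, `bear_size` comes out larger than `mouse_size` only because the vectors were hand-picked that way. Under the abstract's hypothesis, embeddings trained on a contextually constrained corpus would be the ones expected to bring such scores closer to human judgments.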


research
09/07/2018

Using Sparse Semantic Embeddings Learned from Multimodal Text and Image Data to Model Human Conceptual Knowledge

Distributional models provide a convenient way to model semantics using ...
research
06/19/2019

Considerations for the Interpretation of Bias Measures of Word Embeddings

Word embedding spaces are powerful tools for capturing latent semantic r...
research
09/21/2017

Retrofitting Concept Vector Representations of Medical Concepts to Improve Estimates of Semantic Similarity and Relatedness

Estimation of semantic similarity and relatedness between biomedical con...
research
12/23/2018

Improving Context-Aware Semantic Relationships in Sparse Mobile Datasets

Traditional semantic similarity models often fail to encapsulate the ext...
research
07/17/2023

Large-Scale Evaluation of Topic Models and Dimensionality Reduction Methods for 2D Text Spatialization

Topic models are a class of unsupervised learning algorithms for detecti...
research
06/22/2020

Understanding Object Affordances Through Verb Usage Patterns

In order to interact with objects in our environment, we rely on an unde...
research
04/02/2019

Effectiveness of Data-Driven Induction of Semantic Spaces and Traditional Classifiers for Sarcasm Detection

Irony and sarcasm are two complex linguistic phenomena that are widely u...
