Quantifying the visual concreteness of words and topics in multimodal datasets

04/18/2018
by Jack Hessel, et al.

Multimodal machine learning algorithms aim to learn visual-textual correspondences. Previous work suggests that concepts with concrete visual manifestations may be easier to learn than abstract ones. We give an algorithm for automatically computing the visual concreteness of words and topics within multimodal datasets. We apply the approach in four settings, ranging from image captions to images/text scraped from historical books. In addition to enabling explorations of concepts in multimodal datasets, our concreteness scores predict the capacity of machine learning algorithms to learn textual/visual relationships. We find that 1) concrete concepts are indeed easier to learn; 2) the many algorithms we consider share similar failure cases; and 3) the precise positive relationship between concreteness and performance varies between datasets. We conclude with recommendations for using concreteness scores to facilitate future multimodal research.
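The abstract's core idea, a score measuring whether images associated with a word cluster together in visual feature space, can be illustrated with a small sketch. This is not the paper's exact procedure; the function name `visual_concreteness`, the brute-force distance computation, and the chance-level normalization are assumptions made for illustration: a word is scored by how often the nearest visual neighbors of its images are also tagged with that word, relative to the word's base rate.

```python
import numpy as np

def visual_concreteness(features, word_to_images, k=5):
    """Toy nearest-neighbor concreteness score (illustrative sketch).

    features: (n, d) array of image feature vectors.
    word_to_images: dict mapping each word to the indices of images
        whose associated text contains that word.
    Returns a dict of word -> score; scores above 1.0 mean the word's
    images co-locate in feature space more than chance would predict.
    """
    n = len(features)
    # Brute-force pairwise distances; for real datasets an approximate
    # nearest-neighbor index would be used instead.
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # an image is not its own neighbor
    knn = np.argsort(d, axis=1)[:, :k]   # k nearest neighbors per image

    scores = {}
    for word, imgs in word_to_images.items():
        member = np.zeros(n, dtype=bool)
        member[list(imgs)] = True
        # Fraction of neighbors of this word's images that share the word.
        hit_rate = member[knn[list(imgs)]].mean()
        # Chance level: probability a random other image shares the word.
        base_rate = (len(imgs) - 1) / (n - 1)
        scores[word] = hit_rate / base_rate if base_rate > 0 else 0.0
    return scores
```

On a toy dataset where one word's images form a tight visual cluster and another word's images are scattered, the clustered ("concrete") word receives a markedly higher score, matching the paper's finding that concrete concepts are easier to associate with consistent visual evidence.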


