VASR: Visual Analogies of Situation Recognition

12/08/2022
by Yonatan Bitton, et al.

A core process in human cognition is analogical mapping: the ability to identify a similar relational structure between different situations. We introduce a novel task, Visual Analogies of Situation Recognition, adapting the classical word-analogy task into the visual domain. Given a triplet of images, the task is to select an image candidate B' that completes the analogy (A to A' is like B to what?). Unlike previous work on visual analogy that focused on simple image transformations, we tackle complex analogies requiring understanding of scenes. We leverage situation recognition annotations and the CLIP model to generate a large set of 500k candidate analogies. Crowdsourced annotations for a sample of the data indicate that humans agree with the dataset label ~80% of the time (chance level 25%). Furthermore, we use human annotations to create a gold-standard dataset of 3,820 validated analogies. Our experiments demonstrate that state-of-the-art models do well when distractors are chosen randomly (~86%), but struggle with carefully chosen distractors (~53%, compared to 90% human accuracy). We hope our dataset will encourage the development of new analogy-making models. Website: https://vasr-dataset.github.io/
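Each analogy is posed as a multiple-choice question, and a 25% chance level implies four candidates for B'. The sketch below is a minimal, hypothetical baseline. It assumes every image has already been encoded as an embedding vector (for example by an image encoder such as CLIP; the encoding step is omitted here). It then applies the vector-offset rule from the classic word-analogy literature: pick the candidate whose embedding is most cosine-similar to B + (A' - A). This is an illustrative baseline, not the authors' method. The function names `cosine`, `solve_analogy`, and `accuracy` are our own.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def solve_analogy(a: Vector, a_prime: Vector, b: Vector,
                  candidates: Sequence[Vector]) -> int:
    """Hypothetical vector-offset baseline for "A is to A' as B is to ?".

    Builds the target b + (a_prime - a) and returns the index of the
    candidate embedding with the highest cosine similarity to it.
    """
    target = [bi + (api - ai) for ai, api, bi in zip(a, a_prime, b)]
    scores = [cosine(target, c) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


def accuracy(predictions: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of analogies where the predicted index equals the gold index."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Toy 2-D "embeddings": the A -> A' change is +1 on the second axis.
    a, a_prime, b = [1.0, 0.0], [1.0, 1.0], [2.0, 0.0]
    candidates = [[0.0, 1.0], [2.0, 1.0], [1.0, -1.0], [-2.0, 0.0]]
    print(solve_analogy(a, a_prime, b, candidates))  # prints 1
    print(1 / len(candidates))  # chance level with four candidates: 0.25
```

With randomly chosen distractors, a candidate that merely resembles B is usually the right one, so even a crude rule like this can score well. Distractors chosen to look similar to the answer are what make the task hard, which matches the gap the abstract reports.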
