Ambiguous Images With Human Judgments for Robust Visual Event Classification

10/06/2022
by Kate Sanders, et al.

Contemporary vision benchmarks predominantly consider tasks on which humans can achieve near-perfect performance. However, humans are frequently presented with visual data that they cannot classify with 100% certainty, and models trained on standard vision benchmarks achieve low performance when evaluated on this data. To address this issue, we introduce a procedure for creating datasets of ambiguous images and use it to produce SQUID-E ("Squidy"), a collection of noisy images extracted from videos. All images are annotated with ground truth values, and a test set is annotated with human uncertainty judgments. We use this dataset to characterize human uncertainty in vision tasks and evaluate existing visual event classification models. Experimental results suggest that existing vision models are not sufficiently equipped to provide meaningful outputs for ambiguous images, and that datasets of this nature can be used to assess and improve such models through model training and direct evaluation of model calibration. These findings motivate large-scale ambiguous dataset creation and further research focusing on noisy visual data.
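To make "direct evaluation of model calibration" concrete, the following is a minimal sketch of expected calibration error (ECE), a commonly used calibration metric. It is an illustration only: the abstract does not state which calibration metrics the authors use, and the function name, the equal-width binning, and the toy inputs are assumptions made here.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE (not confirmed to be the paper's metric).

    confidences: model confidence in its predicted class, each in [0, 1].
    correct:     1/True if the prediction matched the ground truth, else 0/False.
    Returns the bin-size-weighted mean of |mean confidence - accuracy|.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to one of n_bins equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, bool(hit)))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, hit in members if hit) / len(members)
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece


# Toy example: a model that reports 0.9 confidence but is right 3 times out of 4.
toy_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
```

On ambiguous images, a well-calibrated model would be expected to report lower confidence where its accuracy is lower, which keeps this gap small. The same per-image confidences could also be compared against human uncertainty judgments, although how the authors do so is not described in the abstract.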


