Gaze Embeddings for Zero-Shot Image Classification

11/28/2016
by Nour Karessli, et al.

Zero-shot image classification using auxiliary information, such as attributes describing discriminative object properties, requires time-consuming annotation by domain experts. We instead propose a method that relies on human gaze as auxiliary information, exploiting the fact that even non-expert users have a natural ability to judge class membership. We present a data collection paradigm that involves a discrimination task to increase the information content obtained from gaze data. Our method extracts discriminative descriptors from the data and learns a compatibility function between image and gaze using three novel gaze embeddings: Gaze Histograms (GH), Gaze Features with Grid (GFG), and Gaze Features with Sequence (GFS). We introduce two new gaze-annotated datasets for fine-grained image classification and show that human gaze data is indeed class discriminative, provides a competitive alternative to expert-annotated attributes, and outperforms other baselines for zero-shot image classification.
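To make the abstract's core ideas concrete, the sketch below illustrates two of them under stated assumptions: a simple Gaze Histogram (GH) embedding, assumed here to bin normalized fixation points into a spatial grid and L1-normalize the counts, and a bilinear compatibility function between an image feature and a gaze embedding, a common form in zero-shot learning. The exact descriptors and compatibility model in the paper may differ; function names and parameters here are illustrative.

```python
import numpy as np

def gaze_histogram(fixations, grid_size=4):
    """Hypothetical GH embedding: bin normalized (x, y) fixation points
    (each in [0, 1]) into a grid_size x grid_size spatial histogram,
    then L1-normalize so the descriptor sums to 1."""
    hist = np.zeros((grid_size, grid_size))
    for x, y in fixations:
        col = min(int(x * grid_size), grid_size - 1)
        row = min(int(y * grid_size), grid_size - 1)
        hist[row, col] += 1
    hist = hist.flatten()
    total = hist.sum()
    return hist / total if total > 0 else hist

def compatibility(image_feat, gaze_embedding, W):
    """Bilinear compatibility F(x, y) = theta(x)^T W phi(y):
    a learned matrix W maps image features to the gaze-embedding space.
    Classification picks the class whose gaze embedding scores highest."""
    return float(image_feat @ W @ gaze_embedding)
```

At test time, an unseen class is assigned by scoring the image against each candidate class's gaze embedding and taking the argmax; W would be learned on seen classes (e.g., with a ranking loss), which is an assumption beyond what the abstract states.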


