Semantically Grounded Visual Embeddings for Zero-Shot Learning

01/03/2022
by   Shah Nawaz, et al.

Zero-shot learning methods rely on fixed visual and semantic embeddings extracted from independent vision and language models, each pre-trained on other large-scale tasks. This is a weakness of current zero-shot learning frameworks: such disjoint embeddings fail to adequately associate visual and textual information with their shared semantic content. We therefore propose to learn semantically grounded and enriched visual information by computing a joint image-and-text model with a two-stream network on a proxy task. To improve the alignment between image representations and the textual representations provided by attributes, we leverage ancillary captions as a source of grounded semantic information. Our method, dubbed Joint Embeddings for Zero-Shot Learning, is evaluated on several benchmark datasets, improving on existing state-of-the-art methods in both standard (+1.6% on aPY, +2.6% on FLO) and generalized (+2.1% on AWA2, +2.2% on CUB) zero-shot recognition.
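The abstract describes aligning the outputs of a two-stream image/text network so that matching pairs share a joint embedding space. The paper does not include code here, but the core alignment idea can be sketched with a standard symmetric contrastive (InfoNCE-style) objective; the function names and the choice of loss below are illustrative assumptions, not the authors' exact training objective.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Project embeddings onto the unit sphere so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def contrastive_alignment_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of N matched (image, text) pairs.

    Matching pairs sit on the diagonal of the N x N similarity matrix;
    the loss pulls them together and pushes mismatched pairs apart,
    in both the image-to-text and text-to-image directions.
    """
    img = l2_normalize(np.asarray(img_emb, dtype=float))
    txt = l2_normalize(np.asarray(txt_emb, dtype=float))
    logits = img @ txt.T / temperature          # (N, N) scaled cosine similarities
    labels = np.arange(len(img))                # ground-truth match = diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

In practice each stream would be a trainable encoder (e.g. a CNN for images and a text encoder for attributes or captions) and the loss would be minimized with gradient descent; the NumPy version above only evaluates the objective for a fixed batch of embeddings.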


Related research

03/20/2022 · VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning
Human-annotated attributes serve as powerful semantic embeddings in zero...

06/10/2023 · EventCLIP: Adapting CLIP for Event-based Object Recognition
Recent advances in 2D zero-shot and few-shot recognition often leverage ...

07/26/2021 · Language Models as Zero-shot Visual Semantic Learners
Visual Semantic Embedding (VSE) models, which map images into a rich sem...

07/15/2020 · Enhancing Generalized Zero-Shot Learning via Adversarial Visual-Semantic Interaction
The performance of generative zero-shot methods mainly depends on the qu...

02/27/2015 · Probabilistic Zero-shot Classification with Semantic Rankings
In this paper we propose a non-metric ranking-based representation of se...

06/28/2023 · Is ChatGPT a Biomedical Expert? – Exploring the Zero-Shot Performance of Current GPT Models in Biomedical Tasks
We assessed the performance of commercial Large Language Models (LLMs) G...

08/06/2020 · Webly Supervised Semantic Embeddings for Large Scale Zero-Shot Learning
Zero-shot learning (ZSL) makes object recognition in images possible in ...
