Detection and Captioning with Unseen Object Classes

08/13/2021
by   Berkan Demirel, et al.

Image caption generation is one of the most challenging problems at the intersection of visual recognition and natural language modeling. In this work, we propose and study a practically important variant of this problem in which test images may contain visual objects with no corresponding visual or textual training examples. For this problem, we propose a detection-driven approach that combines a generalized zero-shot detection model with a template-based sentence generation model. To improve the detection component, we jointly define a class representation based on class-to-class similarities and a practical score calibration mechanism. We also propose a novel evaluation metric that provides complementary insights into the captioning outputs by handling the visual and non-visual components of the captions separately. Our experiments show that the proposed zero-shot detection model obtains state-of-the-art performance on the MS-COCO dataset and that the zero-shot captioning approach yields promising results.
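The abstract names three ingredients without giving their formulas: a class-to-class similarity based class representation, a score calibration step, and template-based sentence generation. The sketch below is only an illustration of how such pieces could fit together, not the authors' model. Every function name, the similarity-weighted mixing rule, the multiplicative factor `gamma`, and the template string are assumptions introduced here.

```python
from typing import List, Sequence


def unseen_scores(seen_scores: Sequence[float],
                  similarity: Sequence[Sequence[float]]) -> List[float]:
    """Score each unseen class as a similarity-weighted mix of seen-class scores.

    seen_scores: detector scores for the seen classes of one box.
    similarity:  one row per unseen class, holding non-negative similarities
                 to each seen class (hypothetical values, assumed given).
    """
    scores = []
    for row in similarity:
        total = sum(row)  # normalise so the weights of each row sum to 1
        scores.append(sum(w / total * s for w, s in zip(row, seen_scores)))
    return scores


def calibrate(seen: Sequence[float], unseen: Sequence[float],
              gamma: float) -> List[float]:
    """Join seen and unseen scores, scaling unseen ones by gamma (assumed rule)."""
    return list(seen) + [gamma * u for u in unseen]


def caption(class_names: Sequence[str], scores: Sequence[float],
            template: str = "A photo of a {}.") -> str:
    """Fill a fixed template (hypothetical) with the top-scoring class name."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return template.format(class_names[best])


# One box, two seen classes ("dog", "cat") and one unseen class ("fox").
seen = [0.9, 0.1]
unseen = unseen_scores(seen, similarity=[[1.0, 3.0]])
print(caption(["dog", "cat", "fox"], calibrate(seen, unseen, gamma=4.0)))
```

In this toy setup the uncalibrated unseen score stays below the best seen score, so the scaling step is what allows the unseen class to be selected. That is the usual motivation for calibration in generalized zero-shot settings, where seen classes tend to dominate.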


research
07/31/2019

Image Captioning with Unseen Objects

Image caption generation is a long standing and challenging problem at t...
research
01/22/2022

Visual Information Guided Zero-Shot Paraphrase Generation

Zero-shot paraphrase generation has drawn much attention as the large-sc...
research
05/16/2018

Zero-Shot Object Detection by Hybrid Region Embedding

Object detection is considered as one of the most challenging problems i...
research
04/11/2018

Decoupled Novel Object Captioner

Image captioning is a challenging task where the machine automatically d...
research
10/07/2020

Zero-Shot Stance Detection: A Dataset and Model using Generalized Topic Representations

Stance detection is an important component of understanding hidden influ...
research
03/02/2023

X&Fuse: Fusing Visual Information in Text-to-Image Generation

We introduce X&Fuse, a general approach for conditioning on visual inf...
research
04/06/2023

DoUnseen: Zero-Shot Object Detection for Robotic Grasping

How can we segment varying numbers of objects where each specific object...
