Detect Only What You Specify : Object Detection with Linguistic Target

11/18/2022
by Moyuru Yamada, et al.

Object detection is the computer vision task of predicting a set of bounding boxes and category labels for the objects of interest in a given image. Each category corresponds to a linguistic symbol such as 'dog' or 'person', and these symbols should be related to one another. However, an object detector only learns to classify the categories and does not treat them as linguistic symbols. Multi-modal models often use a pre-trained object detector to extract object features from the image, but the models are separate from the detector, and the extracted visual features do not change with the linguistic input. We rethink object detection as a vision-and-language reasoning task. We then propose a targeted detection task, in which the detection targets are given in natural language and the goal is to detect all the target objects, and only those, in a given image. Nothing is detected if no target is given. Commonly used modern object detectors have many hand-designed components, such as anchors, and it is difficult to fuse textual inputs into such a complex pipeline. We thus propose the Language-Targeted Detector (LTD) for targeted detection, based on a recently proposed Transformer-based detector. LTD is an encoder-decoder architecture, and our conditional decoder allows the model to reason about the encoded image with the textual input as the linguistic context. We evaluate the detection performance of LTD on the COCO object detection dataset and also show that our model improves detection results when the textual input is grounded to the visual objects.
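
As a rough illustration of the targeted detection task defined above, the sketch below filters a list of annotated objects down to those named in the textual target. It only shows what the expected output of the task looks like; it is not the LTD model, and the function name `targeted_ground_truth`, the `category` and `bbox` keys, and the exact-match comparison of category names are all assumptions made for this example.

```python
from typing import Dict, List

Annotation = Dict[str, object]  # hypothetical record: {"category": str, "bbox": [x, y, w, h]}


def targeted_ground_truth(
    annotations: List[Annotation], targets: List[str]
) -> List[Annotation]:
    """Return only the annotations whose category is named in `targets`.

    Assumption: targets are plain category names matched case-insensitively.
    A real system would have to reason about free-form natural language.
    """
    wanted = {t.strip().lower() for t in targets}
    # With no targets, `wanted` is empty and nothing is returned,
    # mirroring "nothing is detected if no target is given".
    return [a for a in annotations if str(a["category"]).lower() in wanted]


# Toy annotations for one image (values are made up for illustration).
image_annotations = [
    {"category": "dog", "bbox": [10.0, 20.0, 50.0, 40.0]},
    {"category": "person", "bbox": [100.0, 30.0, 60.0, 150.0]},
    {"category": "dog", "bbox": [200.0, 80.0, 45.0, 35.0]},
]

dogs_only = targeted_ground_truth(image_annotations, ["dog"])
no_target = targeted_ground_truth(image_annotations, [])
```

Under these assumptions, asking for 'dog' keeps the two dog boxes and drops the person, while an empty target list yields no detections at all.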

research
03/02/2020

Plug Play Convolutional Regression Tracker for Video Object Detection

Video object detection targets to simultaneously localize the bounding b...
research
04/03/2023

Open-Vocabulary Point-Cloud Object Detection without 3D Annotation

The goal of open-vocabulary detection is to identify novel objects based...
research
05/30/2023

Multi-modal Queried Object Detection in the Wild

We introduce MQ-Det, an efficient architecture and pre-training strategy...
research
12/09/2020

Hateful Memes Detection via Complementary Visual and Linguistic Networks

Hateful memes are widespread in social media and convey negative informa...
research
05/30/2018

Visual Referring Expression Recognition: What Do Systems Actually Learn?

We present an empirical analysis of the state-of-the-art systems for ref...
research
03/22/2022

Open-Vocabulary DETR with Conditional Matching

Open-vocabulary object detection, which is concerned with the problem of...
research
06/26/2020

Expandable YOLO: 3D Object Detection from RGB-D Images

This paper aims at constructing a light-weight object detector that inpu...
