OvarNet: Towards Open-vocabulary Object Attribute Recognition

01/23/2023
by   Keyan Chen, et al.

In this paper, we consider the problem of simultaneously detecting objects and inferring their visual attributes in an image, even for those with no manual annotations provided at the training stage, resembling an open-vocabulary scenario. To achieve this goal, we make the following contributions: (i) we start with a naive two-stage approach for open-vocabulary object detection and attribute classification, termed CLIP-Attr, in which candidate objects are first proposed by an offline RPN and then classified for semantic category and attributes; (ii) we combine all available datasets and train with a federated strategy to finetune the CLIP model, aligning the visual representation with attributes; additionally, we investigate the efficacy of leveraging freely available online image-caption pairs under weakly supervised learning; (iii) in pursuit of efficiency, we train a Faster-RCNN type model end-to-end with knowledge distillation, which performs class-agnostic object proposal and classification on semantic categories and attributes with classifiers generated from a text encoder; finally, (iv) we conduct extensive experiments on VAW, MS-COCO, LSA, and OVAD datasets, and show that recognition of semantic category and attributes is complementary for visual scene understanding, i.e., jointly training object detection and attribute prediction largely outperforms existing approaches that treat the two tasks independently, demonstrating strong generalization ability to novel attributes and categories.
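The idea of "classifiers generated from a text encoder" can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the temperature, the threshold, and the three-dimensional toy embeddings are all assumptions made for illustration. In a real system the vectors would come from a vision backbone and a text encoder such as CLIP's. The sketch scores one region embedding against text embeddings by cosine similarity, picks a single category with a softmax, and treats attributes as independent multi-label decisions with a sigmoid.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_scores(region_embedding, text_embeddings):
    """Cosine similarity between one region embedding and each named text embedding."""
    region = l2_normalize(region_embedding)
    return {
        name: sum(a * b for a, b in zip(region, l2_normalize(text)))
        for name, text in text_embeddings.items()
    }


def softmax(scores, temperature):
    """Temperature-scaled softmax over a dict of scores (max-subtracted for stability)."""
    peak = max(scores.values())
    exps = {k: math.exp((v - peak) / temperature) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(region_embedding, category_text, attribute_text,
            temperature=0.1, attr_threshold=0.5):
    """Return (category, attributes) for one region.

    Categories are mutually exclusive (softmax); attributes are
    multi-label (independent sigmoid per attribute).
    """
    cat_probs = softmax(cosine_scores(region_embedding, category_text), temperature)
    category = max(cat_probs, key=cat_probs.get)
    attr_scores = cosine_scores(region_embedding, attribute_text)
    attributes = [
        name for name, score in attr_scores.items()
        if sigmoid(score / temperature) > attr_threshold
    ]
    return category, attributes


# Hypothetical toy embeddings standing in for text-encoder outputs.
category_text = {"dog": [1.0, 0.0, 0.0], "car": [0.0, 1.0, 0.0]}
attribute_text = {
    "red": [0.0, 0.0, 1.0],
    "furry": [1.0, 0.0, 0.2],
    "metallic": [0.0, 1.0, -0.5],
}
region = [0.9, 0.1, 0.4]  # hypothetical visual embedding of one proposed box

category, attributes = predict(region, category_text, attribute_text)
print(category, attributes)
```

Because the classifier weights are just text embeddings, adding a category or attribute unseen during training only requires adding its name to the corresponding dictionary, which is what makes the setting open-vocabulary.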


research
03/20/2022

Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation

Open-vocabulary object detection aims to detect novel object categories ...
research
08/25/2021

Improving Object Detection and Attribute Recognition by Feature Entanglement Reduction

We explore object detection with two attributes: color and material. The...
research
08/15/2021

Learning Open-World Object Proposals without Learning to Classify

Object proposals have become an integral preprocessing step of many vis...
research
08/08/2017

Weakly Supervised Image Annotation and Segmentation with Objects and Attributes

We propose to model complex visual scenes using a non-parametric Bayesia...
research
08/31/2023

Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation

Open-vocabulary semantic segmentation is a challenging task that require...
research
11/23/2022

Open-vocabulary Attribute Detection

Vision-language modeling has enabled open-vocabulary tasks where predict...
research
12/23/2022

Learning to Detect and Segment for Open Vocabulary Object Detection

Open vocabulary object detection has been greatly advanced by the recent...
