Understanding Dark Scenes by Contrasting Multi-Modal Observations

08/23/2023
by Xiaoyu Dong, et al.

Understanding dark scenes from multi-modal image data is challenging, as both the visible and auxiliary modalities provide only limited semantic information for the task. Previous methods focus on fusing the two modalities but neglect the correlations among semantic classes when minimizing losses to align pixels with labels, resulting in inaccurate class predictions. To address these issues, we introduce a supervised multi-modal contrastive learning approach that increases the semantic discriminability of the learned multi-modal feature spaces by jointly performing cross-modal and intra-modal contrast under the supervision of the class correlations. The cross-modal contrast pulls same-class embeddings from the two modalities closer together and pushes different-class ones apart. The intra-modal contrast likewise pulls same-class embeddings within each modality together and pushes different-class ones apart. We validate our approach on a variety of tasks covering diverse lighting conditions and image modalities. Experiments show that our approach effectively enhances dark scene understanding from multi-modal images with limited semantics by shaping semantic-discriminative feature spaces. Comparisons with previous methods demonstrate our state-of-the-art performance. Code and pretrained models are available at https://github.com/palmdong/SMMCL.
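At its core, the approach applies a class-supervised contrastive loss both across and within modalities. Below is a minimal PyTorch sketch of that idea; all names (supervised_contrast, f_vis, f_aux, tau, exclude_self) are illustrative assumptions rather than the authors' released implementation, which lives at the repository linked above.

```python
# Minimal sketch of class-supervised contrastive learning over two
# modalities, assuming per-pixel embeddings have already been sampled
# and L2-normalized. All names here are illustrative, not the paper's.
import torch
import torch.nn.functional as F

def supervised_contrast(anchors, candidates, y_anchor, y_cand,
                        tau=0.1, exclude_self=False):
    """Pull same-class pairs together, push different-class pairs apart.

    anchors:    (N, D) L2-normalized embeddings
    candidates: (M, D) L2-normalized embeddings
    y_anchor:   (N,) semantic class labels
    y_cand:     (M,) semantic class labels
    """
    logits = anchors @ candidates.t() / tau        # (N, M) scaled similarities
    pos = y_anchor[:, None] == y_cand[None, :]     # same-class (positive) mask
    if exclude_self:
        # Intra-modal case: drop trivial anchor-to-itself pairs.
        eye = torch.eye(len(anchors), dtype=torch.bool)
        logits = logits.masked_fill(eye, float('-inf'))
        pos = pos & ~eye
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # Mean log-likelihood over each anchor's positive (same-class) set.
    loss = -(pos * log_prob).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()

# Toy embeddings standing in for sampled per-pixel features.
torch.manual_seed(0)
f_vis = F.normalize(torch.randn(64, 128), dim=1)  # visible modality
f_aux = F.normalize(torch.randn(64, 128), dim=1)  # auxiliary modality (e.g. depth)
y_vis = torch.randint(0, 5, (64,))                # class label per embedding
y_aux = torch.randint(0, 5, (64,))

# Cross-modal contrast: anchors in one modality, candidates in the other.
loss_cross = (supervised_contrast(f_vis, f_aux, y_vis, y_aux) +
              supervised_contrast(f_aux, f_vis, y_aux, y_vis))

# Intra-modal contrast: anchors and candidates from the same modality.
loss_intra = (supervised_contrast(f_vis, f_vis, y_vis, y_vis, exclude_self=True) +
              supervised_contrast(f_aux, f_aux, y_aux, y_aux, exclude_self=True))

total = loss_cross + loss_intra
```

In practice such contrastive terms are typically added to the pixel-wise classification loss of the segmentation network; it is the class-label supervision of the contrast that shapes the semantic-discriminative feature spaces described in the abstract.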


Related research

Cross-modal Learning for Multi-modal Video Categorization (03/07/2020)
Multi-modal machine learning (ML) models can process data in multiple mo...

ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities (05/18/2023)
In this work, we explore a scalable way for building a general represent...

Semantically Multi-modal Image Synthesis (03/28/2020)
In this paper, we focus on semantically multi-modal image synthesis (SMI...
AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection (07/21/2022)
Point clouds and RGB images are two general perceptional sources in auto...

Towards Better Multi-modal Keyphrase Generation via Visual Entity Enhancement and Multi-granularity Image Noise Filtering (09/09/2023)
Multi-modal keyphrase generation aims to produce a set of keyphrases tha...

Tie Your Embeddings Down: Cross-Modal Latent Spaces for End-to-end Spoken Language Understanding (11/18/2020)
End-to-end (E2E) spoken language understanding (SLU) systems can infer t...
