Single-branch Network for Multimodal Training

03/10/2023
by Muhammad Saad Saeed, et al.

With the rapid growth of social media platforms, users are sharing billions of multimedia posts containing audio, images, and text. Researchers have focused on building autonomous systems capable of processing such multimedia data to solve challenging multimodal tasks, including cross-modal retrieval, matching, and verification. Existing works use separate networks to extract embeddings of each modality to bridge the gap between them. The modular structure of their branched networks is fundamental in creating numerous multimodal applications and has become a de facto standard for handling multiple modalities. In contrast, we propose a novel single-branch network capable of learning discriminative representations for unimodal as well as multimodal tasks without changing the network. An important feature of our single-branch network is that it can be trained using either a single modality or multiple modalities without sacrificing performance. We evaluated our proposed single-branch network on the challenging multimodal problem of face-voice association, for cross-modal verification and matching tasks with various loss formulations. Experimental results demonstrate the superiority of our proposed single-branch network over existing methods in a wide range of experiments. Code: https://github.com/msaadsaeed/SBNet
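To make the architectural contrast concrete, the sketch below shows the core idea of a single shared branch in minimal NumPy form: pre-extracted face and voice features pass through lightweight modality-specific projections into a common input space, and then through one shared set of weights that serves both modalities. All dimensions, weight shapes, and function names here are illustrative assumptions, not the authors' implementation (see the linked repository for the actual code).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes: 512-d face features, 192-d voice features.
FACE_DIM, VOICE_DIM, SHARED_DIM, EMBED_DIM = 512, 192, 256, 128

# Lightweight modality-specific projections into a shared input space.
W_face = rng.standard_normal((FACE_DIM, SHARED_DIM)) * 0.02
W_voice = rng.standard_normal((VOICE_DIM, SHARED_DIM)) * 0.02

# The single branch: one weight matrix reused for BOTH modalities,
# in contrast to two-branch designs with separate per-modality networks.
W_shared = rng.standard_normal((SHARED_DIM, EMBED_DIM)) * 0.02

def embed(x, modality):
    """Map a raw feature vector into the joint embedding space."""
    proj = W_face if modality == "face" else W_voice
    h = np.maximum(x @ proj, 0.0)       # ReLU after the modality projection
    z = h @ W_shared                    # shared branch: same weights for both
    return z / np.linalg.norm(z)        # L2-normalise for cosine comparison

face_feat = rng.standard_normal(FACE_DIM)
voice_feat = rng.standard_normal(VOICE_DIM)

z_face = embed(face_feat, "face")
z_voice = embed(voice_feat, "voice")
print(z_face.shape, z_voice.shape, float(z_face @ z_voice))
```

Because both modalities land in one normalized space, cross-modal verification and matching reduce to cosine similarity between embeddings, and the shared branch can be trained with either one modality or both present in a batch.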

