Semi-Supervised Multi-Modal Multi-Instance Multi-Label Deep Network with Optimal Transport

04/17/2021
by Yang Yang, et al.

Complex objects usually carry multiple labels and can be represented in multiple modalities; for example, a complex article contains text and image information as well as multiple annotations. Previous methods assume that homogeneous multi-modal data are consistent, whereas in real applications the raw data are disordered: an article, for instance, consists of a variable number of inconsistent text and image instances. Multi-modal Multi-instance Multi-label (M3) learning provides a framework for handling such tasks and has exhibited excellent performance. However, M3 learning faces two main challenges: 1) how to effectively utilize label correlation; 2) how to take advantage of multi-modal learning to process unlabeled instances. To solve these problems, we first propose a novel Multi-modal Multi-instance Multi-label Deep Network (M3DN), which performs M3 learning in an end-to-end multi-modal deep network and applies the consistency principle across the bag-level predictions of different modalities. Based on M3DN, we learn the latent ground label metric with optimal transport. Moreover, we introduce extrinsic unlabeled multi-modal multi-instance data and propose M3DNS, which uses an instance-level auto-encoder for each single modality and a modified bag-level optimal transport to strengthen the consistency among modalities. M3DNS can thereby predict labels better while exploiting label correlation. Experiments on benchmark datasets and the real-world WKG Game-Hub dataset validate the effectiveness of the proposed methods.
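
To make the role of optimal transport over labels more concrete, the following is a minimal sketch, not the authors' formulation. It computes an entropy-regularized transport cost (Sinkhorn iterations) between a predicted and a target label distribution under a ground metric over labels. The function name `sinkhorn_cost`, the fixed 3-label `cost` matrix, and all numeric values are assumptions made for illustration; the abstract states that the ground label metric is learned, which this sketch does not attempt.

```python
import numpy as np

def sinkhorn_cost(pred, target, cost, reg=0.5, n_iters=200):
    """Entropy-regularized OT cost between two label distributions.

    pred, target: positive vectors over labels (normalized here to sum to 1).
    cost: square ground-metric matrix; cost[i, j] is the price of moving
          probability mass from label i to label j.
    """
    pred = pred / pred.sum()
    target = target / target.sum()
    kernel = np.exp(-cost / reg)           # Gibbs kernel of the ground metric
    u = np.ones_like(pred)
    for _ in range(n_iters):
        v = target / (kernel.T @ u)        # rescale to match column marginals
        u = pred / (kernel @ v)            # rescale to match row marginals
    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum()), plan

# Hypothetical ground metric: labels 0 and 1 are related, label 2 is unrelated.
cost = np.array([[0.0, 0.2, 1.0],
                 [0.2, 0.0, 1.0],
                 [1.0, 1.0, 0.0]])

truth = np.array([0.8, 0.1, 0.1])      # most mass on label 0
near_miss = np.array([0.1, 0.8, 0.1])  # most mass on the related label 1
far_miss = np.array([0.1, 0.1, 0.8])   # most mass on the unrelated label 2

d_near, plan_near = sinkhorn_cost(near_miss, truth, cost)
d_far, plan_far = sinkhorn_cost(far_miss, truth, cost)
print(d_near < d_far)
```

Under this assumed metric, a prediction that confuses two related labels is penalized less than one that confuses unrelated labels, which is the sense in which a transport-based loss can reflect label correlation.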


research
07/27/2019

Many could be better than all: A novel instance-oriented algorithm for Multi-modal Multi-label problem

With the emergence of diverse data collection techniques, objects in rea...
research
01/15/2022

Tailor Versatile Multi-modal Learning for Multi-label Emotion Recognition

Multi-modal Multi-label Emotion Recognition (MMER) aims to identify vari...
research
04/21/2018

Multi-modal space structure: a new kind of latent correlation for multi-modal entity resolution

Multi-modal data is becoming more common than before because of big data...
research
06/14/2022

Codec at SemEval-2022 Task 5: Multi-Modal Multi-Transformer Misogynous Meme Classification Framework

In this paper we describe our work towards building a generic framework ...
research
05/13/2019

Multi-View Multi-Instance Multi-Label Learning based on Collaborative Matrix Factorization

Multi-view Multi-instance Multi-label Learning(M3L) deals with complex o...
research
06/30/2019

Multi-Label Product Categorization Using Multi-Modal Fusion Models

In this study, we investigated multi-modal approaches using images, desc...
research
06/24/2021

Label Disentanglement in Partition-based Extreme Multilabel Classification

Partition-based methods are increasingly-used in extreme multi-label cla...
