A Universal Model for Cross Modality Mapping by Relational Reasoning

02/26/2021
by   Zun Li, et al.

With the aim of matching a pair of instances from two different modalities, cross modality mapping has attracted growing attention in the computer vision community. Existing methods usually formulate the mapping function as a similarity measure between the pair of instance features, which are embedded into a common space. However, we observe that the relationships among instances within a single modality (intra relations) and those between pairs of heterogeneous instances (inter relations) are insufficiently explored in previous approaches. Motivated by this, we redefine the mapping function with relational reasoning via graph modeling, and further propose a GCN-based Relational Reasoning Network (RR-Net) in which inter and intra relations are efficiently computed to universally resolve the cross modality mapping problem. Concretely, we first construct two kinds of graphs, i.e., an Intra Graph and an Inter Graph, to respectively model intra relations and inter relations. RR-Net then updates all node features and edge features in an iterative manner to learn intra and inter relations simultaneously. Finally, RR-Net outputs probabilities over the edges linking pairs of heterogeneous instances to estimate the mapping results. Extensive experiments on three example tasks, i.e., image classification, social recommendation and sound recognition, clearly demonstrate the superiority and universality of our proposed model.
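To make the architecture concrete, below is a minimal PyTorch sketch of a relational-reasoning mapping network in the spirit described by the abstract: intra-modality message passing iteratively refines node features, and an edge scorer over all cross-modality pairs produces mapping probabilities. The class names (IntraGraphLayer, RRNetSketch), layer sizes, and update rules are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of relational reasoning for cross-modality mapping.
# All names, dimensions, and update rules are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class IntraGraphLayer(nn.Module):
    """One round of message passing over a dense intra-modality graph."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)
        self.upd = nn.GRUCell(dim, dim)

    def forward(self, x):                       # x: (N, dim)
        n = x.size(0)
        src = x.unsqueeze(1).expand(n, n, -1)   # sender features
        dst = x.unsqueeze(0).expand(n, n, -1)   # receiver features
        m = F.relu(self.msg(torch.cat([src, dst], dim=-1))).mean(dim=0)
        return self.upd(m, x)                   # GRU-style node update


class RRNetSketch(nn.Module):
    """Refines intra relations iteratively, then scores inter-graph edges."""
    def __init__(self, dim_a, dim_b, dim=128, steps=3):
        super().__init__()
        self.embed_a = nn.Linear(dim_a, dim)
        self.embed_b = nn.Linear(dim_b, dim)
        self.intra_a = IntraGraphLayer(dim)
        self.intra_b = IntraGraphLayer(dim)
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        self.steps = steps

    def forward(self, feats_a, feats_b):
        # Embed both modalities into a shared space.
        a = F.relu(self.embed_a(feats_a))       # (Na, dim)
        b = F.relu(self.embed_b(feats_b))       # (Nb, dim)
        for _ in range(self.steps):
            a = self.intra_a(a)                 # intra relations, modality A
            b = self.intra_b(b)                 # intra relations, modality B
        # Inter graph: score every cross-modality edge (a_i, b_j).
        pair = torch.cat([a.unsqueeze(1).expand(-1, b.size(0), -1),
                          b.unsqueeze(0).expand(a.size(0), -1, -1)], dim=-1)
        logits = self.edge_mlp(pair).squeeze(-1)    # (Na, Nb)
        return logits.softmax(dim=-1)               # mapping probabilities


if __name__ == "__main__":
    imgs, tags = torch.randn(5, 512), torch.randn(7, 300)
    print(RRNetSketch(512, 300)(imgs, tags).shape)  # torch.Size([5, 7])
```

The softmax over each row gives, for every instance of modality A, a distribution over candidate matches in modality B; a cross-entropy loss against ground-truth pairings would be one natural training objective under these assumptions.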

Related research:

09/14/2021 · Progressively Guide to Attend: An Iterative Alignment Framework for Temporal Sentence Grounding
A key solution to temporal sentence grounding (TSG) exists in how to lea...

05/24/2022 · VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification
Multimodal learning from document data has achieved great success lately...

11/26/2021 · Neural Collaborative Graph Machines for Table Structure Recognition
Recently, table structure recognition has achieved impressive progress w...

09/19/2019 · HyperLearn: A Distributed Approach for Representation Learning in Datasets With Many Modalities
Multimodal datasets contain an enormous amount of relational information...

04/22/2020 · Graph-based Kinship Reasoning Network
In this paper, we propose a graph-based kinship reasoning (GKR) network ...

08/30/2023 · Adaptive Multi-Modalities Fusion in Sequential Recommendation Systems
In sequential recommendation, multi-modal information (e.g., text or ima...

12/24/2022 · MURPHY: Relations Matter in Surgical Workflow Analysis
Autonomous robotic surgery has advanced significantly based on analysis ...