Learning Multi-Modal Nonlinear Embeddings: Performance Bounds and an Algorithm

06/03/2020
by   Semih Kaya, et al.
0

While many approaches exist in the literature to learn representations for data collections in multiple modalities, the generalizability of the learnt representations to previously unseen data is a largely overlooked subject. In this work, we first present a theoretical analysis of learning multi-modal nonlinear embeddings in a supervised setting. Our performance bounds indicate that for successful generalization in multi-modal classification and retrieval problems, the regularity of the interpolation functions extending the embedding to the whole data space is as important as the between-class separation and cross-modal alignment criteria. We then propose a multi-modal nonlinear representation learning algorithm that is motivated by these theoretical findings, where the embeddings of the training samples are optimized jointly with the Lipschitz regularity of the interpolators. Experimental comparison to recent multi-modal and single-modal learning algorithms suggests that the proposed method yields promising performance in multi-modal image classification and cross-modal image-text retrieval applications.

READ FULL TEXT
research
05/02/2023

On Uni-Modal Feature Learning in Supervised Multi-Modal Learning

We abstract the features (i.e. learned representations) of multi-modal d...
research
03/07/2020

Cross-modal Learning for Multi-modal Video Categorization

Multi-modal machine learning (ML) models can process data in multiple mo...
research
04/26/2021

Joint Representation Learning and Novel Category Discovery on Single- and Multi-modal Data

This paper studies the problem of novel category discovery on single- an...
research
10/19/2017

Nonlinear Supervised Dimensionality Reduction via Smooth Regular Embeddings

The recovery of the intrinsic geometric structures of data collections i...
research
07/23/2020

METEOR: Learning Memory and Time Efficient Representations from Multi-modal Data Streams

Many learning tasks involve multi-modal data streams, where continuous d...
research
03/08/2022

Multi-Modal Mixup for Robust Fine-tuning

Pre-trained large-scale models provide a transferable embedding, and the...
research
03/08/2021

Self-Augmented Multi-Modal Feature Embedding

Oftentimes, patterns can be represented through different modalities. Fo...

Please sign up or login with your details

Forgot password? Click here to reset