LANISTR: Multimodal Learning from Structured and Unstructured Data

05/26/2023
by   Sayna Ebrahimi, et al.

Multimodal large-scale pretraining has shown impressive performance gains for unstructured data including language, image, audio, and video. Yet the scenario most prominent in real-world applications is a combination of structured (including tabular and time-series) and unstructured data, and this has so far been understudied. To this end, we propose LANISTR, a novel attention-based framework to learn from LANguage, Image, and STRuctured data. We introduce a new multimodal fusion module with a similarity-based multimodal masking loss that enables LANISTR to learn cross-modal relations from large-scale multimodal data with missing modalities at training and test time. On two publicly available challenging datasets, MIMIC-IV and Amazon Product Review, LANISTR achieves absolute improvements of 6.47 over state-of-the-art multimodal models while showing superior generalization capabilities.
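The abstract does not spell out the masking loss, so the following is only a rough illustration of the general idea of a similarity-based objective that tolerates missing modalities. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `fuse`, `mask`, and `similarity_masking_loss` are hypothetical, averaging stands in for the attention-based fusion module, and a missing modality is represented as `None`.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fuse(embeddings):
    """Toy fusion (assumption): average the embeddings of the modalities that are present."""
    present = [e for e in embeddings.values() if e is not None]
    if not present:
        raise ValueError("at least one modality must be present")
    dim = len(present[0])
    return [sum(e[i] for e in present) / len(present) for i in range(dim)]


def mask(vec, ratio, rng):
    """Zero out each coordinate independently with probability `ratio`."""
    return [0.0 if rng.random() < ratio else x for x in vec]


def similarity_masking_loss(sample, ratio=0.5, seed=0):
    """1 - cosine similarity between fused embeddings of the masked and unmasked sample.

    Missing modalities (None) are skipped in both views, so the loss is
    defined even when a sample lacks some of its inputs.
    """
    rng = random.Random(seed)
    masked = {
        name: None if emb is None else mask(emb, ratio, rng)
        for name, emb in sample.items()
    }
    return 1.0 - cosine_similarity(fuse(sample), fuse(masked))


# Hypothetical sample with a missing structured (tabular) modality.
sample = {"text": [0.2, 0.7, 0.1], "image": [0.5, 0.1, 0.4], "tabular": None}
loss = similarity_masking_loss(sample, ratio=0.5)
```

In this toy version the loss is lowest when the fused representation of the masked input stays close to that of the unmasked input, which is the intuition behind training a model to recover cross-modal information from partial inputs.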

research
10/21/2021

Multimodal Learning using Optimal Transport for Sarcasm and Humor Detection

Multimodal learning is an emerging yet challenging research area. In thi...
research
07/01/2023

S-Omninet: Structured Data Enhanced Universal Multimodal Learning Architecture

Multimodal multitask learning has attracted an increasing interest in re...
research
11/07/2022

Adaptive Contrastive Learning on Multimodal Transformer for Review Helpfulness Predictions

Modern Review Helpfulness Prediction systems are dependent upon multiple...
research
07/15/2021

MultiBench: Multiscale Benchmarks for Multimodal Representation Learning

Learning multimodal representations involves integrating information fro...
research
05/12/2023

MMG-Ego4D: Multi-Modal Generalization in Egocentric Action Recognition

In this paper, we study a novel problem in egocentric action recognition...
research
09/12/2023

Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals

Leveraging multimodal information from biosignals is vital for building ...
research
06/01/2023

Cross Modal Data Discovery over Structured and Unstructured Data Lakes

Organizations are collecting increasingly large amounts of data for data...
