Survey on Self-Supervised Multimodal Representation Learning and Foundation Models

11/29/2022
by   Sushil Thapa, et al.
0

Deep learning has been the subject of growing interest in recent years. Specifically, a specific type called Multimodal learning has shown great promise for solving a wide range of problems in domains such as language, vision, audio, etc. One promising research direction to improve this further has been learning rich and robust low-dimensional data representation of the high-dimensional world with the help of large-scale datasets present on the internet. Because of its potential to avoid the cost of annotating large-scale datasets, self-supervised learning has been the de facto standard for this task in recent years. This paper summarizes some of the landmark research papers that are directly or indirectly responsible to build the foundation of multimodal self-supervised learning of representation today. The paper goes over the development of representation learning over the last few years for each modality and how they were combined to get a multimodal agent later.

READ FULL TEXT

page 4

page 6

research
10/20/2022

A survey on Self Supervised learning approaches for improving Multimodal representation learning

Recently self supervised learning has seen explosive growth and use in v...
research
06/18/2022

Self-Supervised Learning for Videos: A Survey

The remarkable success of deep learning in various domains relies on the...
research
11/07/2022

Astronomia ex machina: a history, primer, and outlook on neural networks in astronomy

In recent years, deep learning has infiltrated every field it has touche...
research
03/09/2023

Robust Holographic mmWave Beamforming by Self-Supervised Hybrid Deep Learning

Beamforming with large-scale antenna arrays has been widely used in rece...
research
08/25/2023

AtmoRep: A stochastic model of atmosphere dynamics using large scale representation learning

The atmosphere affects humans in a multitude of ways, from loss of life ...
research
11/27/2018

Learning State Representations in Complex Systems with Multimodal Data

Representation learning becomes especially important for complex systems...
research
07/28/2019

Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks

Contact-rich manipulation tasks in unstructured environments often requi...

Please sign up or login with your details

Forgot password? Click here to reset