Learning from Multiview Correlations in Open-Domain Videos

11/21/2018
by   Nils Holzenberger, et al.
0

An increasing number of datasets contain multiple views, such as video, sound and automatic captions. A basic challenge in representation learning is how to leverage multiple views to learn better representations. This is further complicated by the existence of a latent alignment between views, such as between speech and its transcription, and by the multitude of choices for the learning objective. We explore an advanced, correlation-based representation learning method on a 4-way parallel, multimodal dataset, and assess the quality of the learned representations on retrieval-based tasks. We show that the proposed approach produces rich representations that capture most of the information shared across views. Our best models for speech and textual modalities achieve retrieval rates from 70.7 user-generated instructional videos. This shows it is possible to learn reliable representations across disparate, unaligned and noisy modalities, and encourages using the proposed approach on larger datasets.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/08/2018

Dense Multimodal Fusion for Hierarchically Joint Representation

Multiple modalities can provide more valuable information than single on...
research
10/13/2015

Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning

Recently there has been a lot of interest in learning common representat...
research
02/07/2022

GMC – Geometric Multimodal Contrastive Representation Learning

Learning representations of multimodal data that are both informative an...
research
10/14/2020

Viewmaker Networks: Learning Views for Unsupervised Representation Learning

Many recent methods for unsupervised representation learning involve tra...
research
08/16/2016

Application of multiview techniques to NHANES dataset

Disease prediction or classification using health datasets involve using...
research
04/03/2019

Multimodal Representation Learning using Deep Multiset Canonical Correlation

We propose Deep Multiset Canonical Correlation Analysis (dMCCA) as an ex...
research
03/04/2016

Learning deep representation of multityped objects and tasks

We introduce a deep multitask architecture to integrate multityped repre...

Please sign up or login with your details

Forgot password? Click here to reset