Learning deep representation of multityped objects and tasks

03/04/2016
by   Truyen Tran, et al.
0

We introduce a deep multitask architecture to integrate multityped representations of multimodal objects. This multitype exposition is less abstract than the multimodal characterization, but more machine-friendly, and thus is more precise to model. For example, an image can be described by multiple visual views, which can be in the forms of bag-of-words (counts) or color/texture histograms (real-valued). At the same time, the image may have several social tags, which are best described using a sparse binary vector. Our deep model takes as input multiple type-specific features, narrows the cross-modality semantic gaps, learns cross-type correlation, and produces a high-level homogeneous representation. At the same time, the model supports heterogeneously typed tasks. We demonstrate the capacity of the model on two applications: social image retrieval and multiple concept prediction. The deep architecture produces more compact representation, naturally integrates multiviews and multimodalities, exploits better side information, and most importantly, performs competitively against baselines.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/21/2016

A Comprehensive Survey on Cross-modal Retrieval

In recent years, cross-modal retrieval has drawn much attention due to t...
research
11/19/2015

Multimodal sparse representation learning and applications

Unsupervised methods have proven effective for discriminative tasks in a...
research
01/10/2022

Cross-Modality Sub-Image Retrieval using Contrastive Multimodal Image Representations

In tissue characterization and cancer diagnostics, multimodal imaging ha...
research
02/04/2019

Embodied Multimodal Multitask Learning

Recent efforts on training visual navigation agents conditioned on langu...
research
03/26/2015

Generalized K-fan Multimodal Deep Model with Shared Representations

Multimodal learning with deep Boltzmann machines (DBMs) is an generative...
research
11/21/2018

Learning from Multiview Correlations in Open-Domain Videos

An increasing number of datasets contain multiple views, such as video, ...

Please sign up or login with your details

Forgot password? Click here to reset