A Deep and Autoregressive Approach for Topic Modeling of Multimodal Data

09/13/2014
by Yin Zheng, et al.

Topic modeling based on latent Dirichlet allocation (LDA) has been a framework of choice for multimodal data, such as in image annotation tasks. Another popular approach to modeling multimodal data is through deep neural networks, such as the deep Boltzmann machine (DBM). Recently, a new type of topic model called the Document Neural Autoregressive Distribution Estimator (DocNADE) was proposed and demonstrated state-of-the-art performance for text document modeling. In this work, we show how to apply and extend this model to multimodal data, such as simultaneous image classification and annotation. First, we propose SupDocNADE, a supervised extension of DocNADE that increases the discriminative power of the learned hidden topic features, and show how to employ it to learn a joint representation from image visual words, annotation words and class label information. We test our model on the LabelMe and UIUC-Sports data sets and show that it compares favorably to other topic models. Second, we propose a deep extension of our model and provide an efficient way of training it. Experimental results show that our deep model outperforms its shallow version and reaches state-of-the-art performance on the Multimedia Information Retrieval (MIR) Flickr data set.
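
As a rough illustration of what "autoregressive" means for a DocNADE-style model, the sketch below scores a document one word at a time: each word's probability is computed from hidden features that depend only on the words before it, and the per-word log-probabilities are summed. This is a minimal sketch under stated assumptions, not the authors' implementation. It uses a plain softmax over the whole vocabulary for readability, and the function name, the parameter names (`W`, `c`, `V`, `b`) and their layouts are hypothetical choices made for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def docnade_log_likelihood(words, W, c, V, b):
    """Return (log p(words), final hidden vector) for a list of word indices.

    Hypothetical parameter layout, chosen for this sketch:
      W[j][w]: weight from word w to hidden unit j
      c[j]:    bias of hidden unit j
      V[w][j]: weight from hidden unit j to output word w
      b[w]:    bias of output word w
    """
    n_hidden = len(c)
    vocab_size = len(b)
    acc = list(c)  # pre-activation: bias plus contributions of words seen so far
    log_p = 0.0
    for w in words:
        # Hidden features depend only on the words before the current position.
        h = [sigmoid(a) for a in acc]
        logits = [
            b[k] + sum(V[k][j] * h[j] for j in range(n_hidden))
            for k in range(vocab_size)
        ]
        # Numerically stable log-softmax over the vocabulary.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
        log_p += logits[w] - log_norm
        # Only now fold the current word into the running sum.
        for j in range(n_hidden):
            acc[j] += W[j][w]
    # Hidden vector computed from all words, usable as a document representation.
    return log_p, [sigmoid(a) for a in acc]


if __name__ == "__main__":
    n_hidden, vocab_size = 4, 3
    W = [[0.0] * vocab_size for _ in range(n_hidden)]
    c = [0.0] * n_hidden
    V = [[0.0] * n_hidden for _ in range(vocab_size)]
    b = [0.0] * vocab_size
    ll, h = docnade_log_likelihood([0, 2, 1], W, c, V, b)
    print(ll, h)
```

With all parameters set to zero, every word is equally likely at every position, so the log-likelihood of a three-word document over a three-word vocabulary is 3 · log(1/3). In the multimodal setting described above, the same kind of sequence could mix image visual words and annotation words drawn from a shared index space; that mapping is also an assumption of this sketch.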

research
05/23/2013

A Supervised Neural Autoregressive Topic Model for Simultaneous Image Classification and Annotation

Topic modeling based on latent Dirichlet allocation (LDA) has been a fra...
research
08/11/2018

Document Informed Neural Autoregressive Topic Models

Context information around words helps in determining their actual meani...
research
03/18/2016

Document Neural Autoregressive Distribution Estimation

We present an approach based on feed-forward neural networks for learnin...
research
09/15/2018

Document Informed Neural Autoregressive Topic Models with Distributional Prior

We address two challenges in topic models: (1) Context information aroun...
research
05/14/2017

Machine learning methods for multimedia information retrieval

In this thesis we examined several multimodal feature extraction and lea...
research
07/04/2012

Mining Associated Text and Images with Dual-Wing Harmoniums

We propose a multi-wing harmonium model for mining multimedia data that ...
research
03/28/2013

Scalable Text and Link Analysis with Mixed-Topic Link Models

Many data sets contain rich information about objects, as well as pairwi...