Multi-Object Classification and Unsupervised Scene Understanding Using Deep Learning Features and Latent Tree Probabilistic Models

05/02/2015
by   Tejaswi Nimmagadda, et al.
0

Deep learning has shown state-of-art classification performance on datasets such as ImageNet, which contain a single object in each image. However, multi-object classification is far more challenging. We present a unified framework which leverages the strengths of multiple machine learning methods, viz deep learning, probabilistic models and kernel methods to obtain state-of-art performance on Microsoft COCO, consisting of non-iconic images. We incorporate contextual information in natural images through a conditional latent tree probabilistic model (CLTM), where the object co-occurrences are conditioned on the extracted fc7 features from pre-trained Imagenet CNN as input. We learn the CLTM tree structure using conditional pairwise probabilities for object co-occurrences, estimated through kernel methods, and we learn its node and edge potentials by training a new 3-layer neural network, which takes fc7 features as input. Object classification is carried out via inference on the learnt conditional tree model, and we obtain significant gain in precision-recall and F-measures on MS-COCO, especially for difficult object categories. Moreover, the latent variables in the CLTM capture scene information: the images with top activations for a latent node have common themes such as being a grasslands or a food scene, and on on. In addition, we show that a simple k-means clustering of the inferred latent nodes alone significantly improves scene classification performance on the MIT-Indoor dataset, without the need for any retraining, and without using scene labels during training. Thus, we present a unified framework for multi-object classification and unsupervised scene understanding.

READ FULL TEXT
research
03/14/2022

UniVIP: A Unified Framework for Self-Supervised Visual Pre-training

Self-supervised learning (SSL) holds promise in leveraging large amounts...
research
03/22/2020

HDF: Hybrid Deep Features for Scene Image Representation

Nowadays it is prevalent to take features extracted from pre-trained dee...
research
07/23/2019

Not Only Look But Observe: Variational Observation Model of Scene-Level 3D Multi-Object Understanding for Probabilistic SLAM

We present NOLBO, a variational observation model estimation for 3D mult...
research
06/05/2020

Scene Image Representation by Foreground, Background and Hybrid Features

Previous methods for representing scene images based on deep learning pr...
research
05/27/2018

Generative Adversarial Image Synthesis with Decision Tree Latent Controller

This paper proposes the decision tree latent controller generative adver...
research
03/16/2018

Learning Sparse Deep Feedforward Networks via Tree Skeleton Expansion

Despite the popularity of deep learning, structure learning for deep mod...
research
05/26/2020

Learning Robust Feature Representations for Scene Text Detection

Scene text detection based on deep neural networks have progressed subst...

Please sign up or login with your details

Forgot password? Click here to reset