Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation

09/05/2017
by   Yulei Niu, et al.
0

Large-scale image annotation is a challenging task in image content analysis, which aims to annotate each image of a very large dataset with multiple class labels. In this paper, we focus on two main issues in large-scale image annotation: 1) how to learn stronger features for multifarious images; 2) how to annotate an image with an automatically-determined number of class labels. To address the first issue, we propose a multi-modal multi-scale deep learning model for extracting descriptive features from multifarious images. Specifically, the visual features extracted by a multi-scale deep learning subnetwork are refined with the textual features extracted from social tags along with images by a simple multi-layer perception subnetwork. Since we have extracted very powerful features by multi-modal multi-scale deep learning, we simplify the second issue and decompose large-scale image annotation into multi-class classification and label quantity prediction. Note that the label quantity prediction subproblem can be implicitly solved when a recurrent neural network (RNN) model is used for image annotation. However, in this paper, we choose to explicitly solve this subproblem directly using our deep learning model, resulting in that we can pay more attention to deep feature learning. Experimental results demonstrate the superior performance of our model as compared to the state-of-the-art (including RNN-based models).

READ FULL TEXT

page 1

page 4

page 6

page 9

research
05/05/2021

Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation

Recently, referring image segmentation has aroused widespread interest. ...
research
11/11/2016

Learning Multi-Scale Deep Features for High-Resolution Satellite Image Classification

In this paper, we propose a multi-scale deep feature learning method for...
research
05/30/2018

CuisineNet: Food Attributes Classification using Multi-scale Convolution Network

Diversity of food and its attributes represents the culinary habits of p...
research
08/10/2017

Beyond Bilinear: Generalized Multi-modal Factorized High-order Pooling for Visual Question Answering

Visual question answering (VQA) is challenging because it requires a sim...
research
09/09/2020

Exploiting Multi-Modal Features From Pre-trained Networks for Alzheimer's Dementia Recognition

Collecting and accessing a large amount of medical data is very time-con...
research
01/08/2023

Multi-scale multi-modal micro-expression recognition algorithm based on transformer

A micro-expression is a spontaneous unconscious facial muscle movement t...
research
09/06/2019

One Size Does Not Fit All: Multi-Scale, Cascaded RNNs for Radar Classification

Edge sensing with micro-power pulse-Doppler radars is an emergent domain...

Please sign up or login with your details

Forgot password? Click here to reset