DeepAI AI Chat
Log In Sign Up

Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation

by   Yulei Niu, et al.
Renmin University of China
Queen Mary University of London
Columbia University

Large-scale image annotation is a challenging task in image content analysis, which aims to annotate each image of a very large dataset with multiple class labels. In this paper, we focus on two main issues in large-scale image annotation: 1) how to learn stronger features for multifarious images; 2) how to annotate an image with an automatically-determined number of class labels. To address the first issue, we propose a multi-modal multi-scale deep learning model for extracting descriptive features from multifarious images. Specifically, the visual features extracted by a multi-scale deep learning subnetwork are refined with the textual features extracted from social tags along with images by a simple multi-layer perception subnetwork. Since we have extracted very powerful features by multi-modal multi-scale deep learning, we simplify the second issue and decompose large-scale image annotation into multi-class classification and label quantity prediction. Note that the label quantity prediction subproblem can be implicitly solved when a recurrent neural network (RNN) model is used for image annotation. However, in this paper, we choose to explicitly solve this subproblem directly using our deep learning model, resulting in that we can pay more attention to deep feature learning. Experimental results demonstrate the superior performance of our model as compared to the state-of-the-art (including RNN-based models).


page 1

page 4

page 6

page 9


Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation

Recently, referring image segmentation has aroused widespread interest. ...

Learning Multi-Scale Deep Features for High-Resolution Satellite Image Classification

In this paper, we propose a multi-scale deep feature learning method for...

CuisineNet: Food Attributes Classification using Multi-scale Convolution Network

Diversity of food and its attributes represents the culinary habits of p...

Exploiting Multi-Modal Features From Pre-trained Networks for Alzheimer's Dementia Recognition

Collecting and accessing a large amount of medical data is very time-con...

Multi-scale multi-modal micro-expression recognition algorithm based on transformer

A micro-expression is a spontaneous unconscious facial muscle movement t...

AntM^2C: A Large Scale Dataset For Multi-Scenario Multi-Modal CTR Prediction

Click-through rate (CTR) prediction is a crucial issue in recommendation...

One Size Does Not Fit All: Multi-Scale, Cascaded RNNs for Radar Classification

Edge sensing with micro-power pulse-Doppler radars is an emergent domain...