
Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation

09/05/2017
by Yulei Niu, et al.
Renmin University of China
Queen Mary University of London
Columbia University

Large-scale image annotation is a challenging task in image content analysis: the goal is to annotate each image in a very large dataset with multiple class labels. In this paper, we focus on two main issues in large-scale image annotation: 1) how to learn stronger features for multifarious images; 2) how to annotate an image with an automatically determined number of class labels. To address the first issue, we propose a multi-modal multi-scale deep learning model for extracting descriptive features from multifarious images. Specifically, the visual features extracted by a multi-scale deep learning subnetwork are refined with textual features that a simple multi-layer perceptron subnetwork extracts from the social tags accompanying the images. Because multi-modal multi-scale deep learning yields very powerful features, we simplify the second issue by decomposing large-scale image annotation into two subproblems: multi-class classification and label quantity prediction. Note that the label quantity prediction subproblem can be solved implicitly when a recurrent neural network (RNN) model is used for image annotation. In this paper, however, we choose to solve this subproblem explicitly with our deep learning model, which lets us pay more attention to deep feature learning. Experimental results demonstrate that our model outperforms state-of-the-art methods, including RNN-based models.
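
The decomposition described in the abstract can be illustrated with a minimal sketch: given per-label classification scores and a predicted label quantity, keep the top-scoring labels up to that quantity. This is not the authors' implementation. The function name `annotate`, the example scores, the rounding of the predicted quantity, and the choice to always return at least one label are all assumptions made for illustration.

```python
def annotate(label_scores, predicted_count):
    """Pick the top-k labels, where k comes from a label quantity predictor.

    label_scores: dict mapping label name -> classification score
                  (hypothetical output of a multi-class classifier).
    predicted_count: real-valued label quantity (hypothetical output of a
                     label quantity prediction head).
    """
    # Assumption: round to the nearest integer and clamp to [1, number of labels].
    k = max(1, min(int(round(predicted_count)), len(label_scores)))
    # Rank labels by score, highest first.
    ranked = sorted(label_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Made-up scores for a single image, for illustration only.
scores = {"sky": 0.91, "beach": 0.84, "dog": 0.12, "car": 0.05}
print(annotate(scores, predicted_count=2.2))
```

With a fixed cut-off, every image would receive the same number of labels; here the number of labels follows the per-image quantity prediction instead.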
