Image and Encoded Text Fusion for Multi-Modal Classification

10/03/2018
by   Ignazio Gallo, et al.
2

Multi-modal approaches employ data from multiple input streams such as textual and visual domains. Deep neural networks have been successfully employed for these approaches. In this paper, we present a novel multi-modal approach that fuses images and text descriptions to improve multi-modal classification performance in real-world scenarios. The proposed approach embeds an encoded text onto an image to obtain an information-enriched image. To learn feature representations of resulting images, standard Convolutional Neural Networks (CNNs) are employed for the classification task. We demonstrate how a CNN based pipeline can be used to learn representations of the novel fusion approach. We compare our approach with individual sources on two large-scale multi-modal classification datasets while obtaining encouraging results. Furthermore, we evaluate our approach against two famous multi-modal strategies namely early fusion and late fusion.

READ FULL TEXT

page 4

page 6

page 8

research
02/06/2018

Efficient Large-Scale Multi-Modal Classification

While the incipient internet was largely text-based, the modern digital ...
research
08/31/2018

Seeing Colors: Learning Semantic Text Encoding for Classification

The question we answer with this work is: can we convert a text document...
research
10/13/2020

A Multi-Modal Method for Satire Detection using Textual and Visual Cues

Satire is a form of humorous critique, but it is sometimes misinterprete...
research
10/26/2017

Deep Multi-Modal Classification of Intraductal Papillary Mucinous Neoplasms (IPMN) with Canonical Correlation Analysis

Pancreatic cancer has the poorest prognosis among all cancer types. Intr...
research
06/24/2022

Multi-modal Sensor Data Fusion for In-situ Classification of Animal Behavior Using Accelerometry and GNSS Data

We examine using data from multiple sensing modes, i.e., accelerometry a...
research
07/28/2019

Two-Stream CNN with Loose Pair Training for Multi-modal AMD Categorization

This paper studies automated categorization of age-related macular degen...
research
04/26/2020

When CNNs Meet Random RNNs: Towards Multi-Level Analysis for RGB-D Object and Scene Recognition

Recognizing objects and scenes are two challenging but essential tasks i...

Please sign up or login with your details

Forgot password? Click here to reset