DeepAI AI Chat
Log In Sign Up

Image and Encoded Text Fusion for Multi-Modal Classification

10/03/2018
by   Ignazio Gallo, et al.
Uninsubria
2

Multi-modal approaches employ data from multiple input streams such as textual and visual domains. Deep neural networks have been successfully employed for these approaches. In this paper, we present a novel multi-modal approach that fuses images and text descriptions to improve multi-modal classification performance in real-world scenarios. The proposed approach embeds an encoded text onto an image to obtain an information-enriched image. To learn feature representations of resulting images, standard Convolutional Neural Networks (CNNs) are employed for the classification task. We demonstrate how a CNN based pipeline can be used to learn representations of the novel fusion approach. We compare our approach with individual sources on two large-scale multi-modal classification datasets while obtaining encouraging results. Furthermore, we evaluate our approach against two famous multi-modal strategies namely early fusion and late fusion.

READ FULL TEXT

page 4

page 6

page 8

02/06/2018

Efficient Large-Scale Multi-Modal Classification

While the incipient internet was largely text-based, the modern digital ...
08/31/2018

Seeing Colors: Learning Semantic Text Encoding for Classification

The question we answer with this work is: can we convert a text document...
10/13/2020

A Multi-Modal Method for Satire Detection using Textual and Visual Cues

Satire is a form of humorous critique, but it is sometimes misinterprete...
05/17/2021

Multi-modal Visual Place Recognition in Dynamics-Invariant Perception Space

Visual place recognition is one of the essential and challenging problem...
06/24/2022

Multi-modal Sensor Data Fusion for In-situ Classification of Animal Behavior Using Accelerometry and GNSS Data

We examine using data from multiple sensing modes, i.e., accelerometry a...
07/28/2019

Two-Stream CNN with Loose Pair Training for Multi-modal AMD Categorization

This paper studies automated categorization of age-related macular degen...