Efficient Large-Scale Multi-Modal Classification

02/06/2018
by D. Kiela, et al.

While the incipient internet was largely text-based, the modern digital world is becoming increasingly multi-modal. Here, we examine multi-modal classification where one modality is discrete, e.g. text, and the other is continuous, e.g. visual representations transferred from a convolutional neural network. In particular, we focus on scenarios where we have to be able to classify large quantities of data quickly. We investigate various methods for performing multi-modal fusion and analyze their trade-offs in terms of classification accuracy and computational efficiency. Our findings indicate that the inclusion of continuous information improves performance over text-only on a range of multi-modal classification tasks, even with simple fusion methods. In addition, we experiment with discretizing the continuous features in order to speed up and simplify the fusion process even further. Our results show that fusion with discretized features outperforms text-only classification, at a fraction of the computational cost of full multi-modal fusion, with the additional benefit of improved interpretability.
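The two ideas named in the abstract, simple fusion of a text representation with continuous visual features and discretization of those features so they can be handled like text, can be illustrated with a minimal sketch. The abstract does not specify the fusion or discretization methods, so everything below is an assumption made for illustration: concatenation stands in for "simple fusion", uniform per-dimension binning stands in for the discretization step, and the function names, token format, and value range are hypothetical.

```python
import numpy as np


def concat_fusion(text_vec, image_vec):
    """Simplest possible fusion (assumed here): concatenate both modality vectors."""
    return np.concatenate([text_vec, image_vec])


def discretize(features, n_bins=4, lo=-1.0, hi=1.0):
    """Turn continuous features into discrete tokens.

    Uniform binning per dimension is an illustrative assumption,
    not the method used in the paper.
    """
    # Interior bin edges; np.digitize then yields ids in 0..n_bins-1.
    edges = np.linspace(lo, hi, n_bins + 1)[1:-1]
    bin_ids = np.digitize(features, edges)
    # One token per dimension, encoding both the dimension and its bin.
    return [f"img_d{d}_b{b}" for d, b in enumerate(bin_ids)]


# Hypothetical example: a short product description plus a 3-d image feature.
text_tokens = ["red", "leather", "sofa"]
image_features = np.array([-0.9, 0.1, 0.7])

# After discretization the image is just extra tokens, so any
# text-only classifier can consume the combined sequence.
fused_tokens = text_tokens + discretize(image_features)
print(fused_tokens)
```

Because the discretized image features become ordinary tokens, the classifier itself needs no separate continuous input pathway, which is one plausible reading of why the abstract reports lower computational cost and improved interpretability for this variant.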


research
10/03/2018

Image and Encoded Text Fusion for Multi-Modal Classification

Multi-modal approaches employ data from multiple input streams such as t...
research
11/29/2016

Is a picture worth a thousand words? A Deep Multi-Modal Fusion Architecture for Product Classification in e-commerce

Classifying products into categories precisely and efficiently is a majo...
research
06/24/2022

Multi-modal Sensor Data Fusion for In-situ Classification of Animal Behavior Using Accelerometry and GNSS Data

We examine using data from multiple sensing modes, i.e., accelerometry a...
research
04/17/2022

What Goes beyond Multi-modal Fusion in One-stage Referring Expression Comprehension: An Empirical Study

Most of the existing work in one-stage referring expression comprehensio...
research
10/26/2018

Investigating non-classical correlations between decision fused multi-modal documents

Correlation has been widely used to facilitate various information retri...
research
12/15/2020

QUARC: Quaternion Multi-Modal Fusion Architecture For Hate Speech Classification

Hate speech, quite common in the age of social media, at times harmless ...
