MemeFier: Dual-stage Modality Fusion for Image Meme Classification

04/06/2023
by Christos Koutlis, et al.

Hate speech is a societal problem that has grown significantly through the Internet. New forms of digital content such as image memes have given rise to the spread of hate through multimodal means, which is far more difficult to analyse and detect than the unimodal case. Accurate automatic processing, analysis and understanding of this kind of content will facilitate the effort to hinder hate speech proliferation through the digital world. To this end, we propose MemeFier, a deep learning-based architecture for fine-grained classification of Internet image memes, utilizing a dual-stage modality fusion module. The first fusion stage produces feature vectors containing modality alignment information that captures non-trivial connections between the text and image of a meme. The second fusion stage leverages the power of a Transformer encoder to learn inter-modality correlations at the token level and yield an informative representation. Additionally, we consider external knowledge as an additional input and background image caption supervision as a regularizing component. Extensive experiments on three widely adopted benchmarks, i.e., Facebook Hateful Memes, Memotion7k and MultiOFF, indicate that our approach competes with and in some cases surpasses the state of the art. Our code is available at https://github.com/ckoutlis/memefier.
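To make the dual-stage idea concrete, below is a minimal PyTorch sketch of such a fusion module, assuming pre-extracted image and text token embeddings of a shared dimension. The class name, projection layers, and the element-wise alignment scheme in stage one are illustrative assumptions, not the authors' exact design; the reference implementation is available at https://github.com/ckoutlis/memefier.

```python
# Minimal sketch of a dual-stage modality fusion module (assumed design,
# not the official MemeFier code).
import torch
import torch.nn as nn


class DualStageFusion(nn.Module):
    def __init__(self, dim=512, num_heads=8, num_layers=2):
        super().__init__()
        # Stage 1: per-modality projections used to derive alignment features.
        self.img_proj = nn.Linear(dim, dim)
        self.txt_proj = nn.Linear(dim, dim)
        # Stage 2: Transformer encoder over the concatenated token sequence.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, img_tokens, txt_tokens):
        # img_tokens: (B, Ni, dim), txt_tokens: (B, Nt, dim)
        img = self.img_proj(img_tokens)
        txt = self.txt_proj(txt_tokens)
        # Stage 1: a simple alignment feature from the interaction of
        # pooled modality representations (illustrative choice).
        align = img.mean(dim=1, keepdim=True) * txt.mean(dim=1, keepdim=True)
        # Stage 2: token-level fusion; the CLS output serves as the final
        # meme representation for downstream classification.
        cls = self.cls_token.expand(img.size(0), -1, -1)
        fused = self.encoder(torch.cat([cls, align, img, txt], dim=1))
        return fused[:, 0]


if __name__ == "__main__":
    model = DualStageFusion()
    img = torch.randn(2, 49, 512)   # e.g. patch embeddings from an image encoder
    txt = torch.randn(2, 32, 512)   # e.g. token embeddings from a text encoder
    print(model(img, txt).shape)    # torch.Size([2, 512])
```

In this sketch, stage one captures coarse cross-modal alignment, while stage two lets the Transformer attend across individual image and text tokens; a classification head on the returned CLS vector would produce the fine-grained labels.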

