Enhance Multimodal Transformer With External Label And In-Domain Pretrain: Hateful Meme Challenge Winning Solution

12/15/2020
by   Ron Zhu, et al.

Hateful meme detection is a newly introduced research area that requires both visual and linguistic understanding of the meme, as well as background knowledge, to perform well. This technical report summarises the first-place solution of the Hateful Memes Challenge 2020, which extends state-of-the-art visual-linguistic transformers to tackle the problem. At the end of the report, we also point out the shortcomings of the current methodology and possible directions for improvement.
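
The title names two enhancements: external labels and in-domain pretraining. As a rough illustration of the first idea (a minimal sketch, not the authors' released code), externally detected tags can be appended to the meme caption before tokenization, so the transformer attends to them jointly with the image features. The external tagger and the example tags below are assumptions for illustration:

```python
# Hypothetical sketch: inject externally detected labels into the text stream
# of a visual-linguistic transformer. The tags would come from an external
# detector (e.g. an off-the-shelf vision API); they are assumed here.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def build_text_input(meme_text: str, external_labels: list[str]) -> dict:
    """Append external tags (entities, demographics, ...) to the meme caption
    so the model can reason over caption and tags together."""
    augmented = meme_text + " [SEP] " + " ".join(external_labels)
    return tokenizer(augmented, truncation=True, max_length=128,
                     return_tensors="pt")

inputs = build_text_input(
    "look how many people love you",
    ["tombstone", "graveyard"],  # tags from an assumed external detector
)
# inputs["input_ids"] would feed the text side of a visual-linguistic
# transformer, alongside region features from an object detector.
```

The second idea, in-domain pretraining, would amount to continuing the transformer's self-supervised objectives (e.g. masked language modelling) on the meme corpus itself before fine-tuning for binary hateful/non-hateful classification.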

