HateMM: A Multi-Modal Dataset for Hate Video Classification

by Mithun Das, et al.

Hate speech has become one of the most significant issues in modern society, with implications in both the online and the offline world. As a result, hate speech research has recently gained a lot of traction. However, most of the work has focused on text, with relatively little work on images and even less on videos. Early-stage automated video moderation techniques are therefore needed to handle the videos being uploaded and keep platforms safe and healthy. With a view to detecting and removing hateful content from video-sharing platforms, our work focuses on hate video detection using multiple modalities. To this end, we curate approximately 43 hours of videos from BitChute and manually annotate them as hate or non-hate, along with the frame spans that could explain the labelling decision. To collect the relevant videos, we used search keywords drawn from hate lexicons. We observe various cues in the images and audio of hateful videos. Further, we build deep learning multi-modal models to classify the hate videos and observe that using all the modalities of the videos improves the overall hate speech detection performance (accuracy=0.798, macro F1-score=0.790) by approximately 5.7% over the best uni-modal model in terms of macro F1 score. In summary, our work takes the first step toward understanding and modeling hateful videos on video hosting platforms such as BitChute.
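For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how accuracy and macro F1 can be computed for a binary hate / non-hate task. It is not the authors' evaluation code: the function name `accuracy_and_macro_f1`, the label strings, and the toy predictions are all assumptions made for illustration.

```python
def accuracy_and_macro_f1(y_true, y_pred, labels=("hate", "non-hate")):
    """Return (accuracy, macro F1) for two equal-length label sequences.

    Hypothetical helper for illustration; label names are assumed.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    pairs = list(zip(y_true, y_pred))

    # Accuracy: fraction of videos whose predicted label matches the annotation.
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    # Macro F1: compute F1 separately for each class, then take the
    # unweighted mean, so both classes count equally regardless of size.
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)

    return accuracy, sum(f1_scores) / len(f1_scores)


# Toy data (invented, not from the dataset).
y_true = ["hate", "hate", "non-hate", "non-hate"]
y_pred = ["hate", "non-hate", "non-hate", "non-hate"]
acc, macro_f1 = accuracy_and_macro_f1(y_true, y_pred)
```

Because macro F1 averages per-class scores without weighting by class size, it penalizes a model that does well only on the majority class, which is one reason it is commonly reported alongside accuracy on imbalanced classification tasks.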




Offensive Language and Hate Speech Detection for Danish

The presence of offensive language on social media platforms and the imp...

InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition

Indoor scene recognition is a growing field with great potential for beh...

Misinformation Detection on YouTube Using Video Captions

Millions of people use platforms such as YouTube, Facebook, Twitter, and...

Understanding Video Content: Efficient Hero Detection and Recognition for the Game "Honor of Kings"

In order to understand content and automatically extract labels for vide...

Multi-Modal Video Forensic Platform for Investigating Post-Terrorist Attack Scenarios

The forensic investigation of a terrorist attack poses a significant cha...

Multi-Label Product Categorization Using Multi-Modal Fusion Models

In this study, we investigated multi-modal approaches using images, desc...

Efficient Search of Live-Coding Screencasts from Online Videos

Programming videos on the Internet are valuable resources for learning p...
