Exploiting Temporal Coherence for Multi-modal Video Categorization

02/07/2020
by Palash Goyal, et al.

Multimodal ML models can process data in multiple modalities (e.g., video, images, audio, text) and are useful for a variety of video content analysis problems (e.g., object detection, scene understanding). In this paper, we focus on the problem of video categorization using a multimodal approach. We have developed a novel temporal coherence-based regularization approach that applies to different types of models (e.g., RNN, NetVLAD, Transformer). We demonstrate through experiments that our proposed multimodal video categorization models with temporal coherence outperform strong state-of-the-art baseline models.
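The abstract does not spell out the form of the regularizer, so the following is only an illustrative sketch of one common way a temporal coherence penalty can be written, not the authors' formulation. It assumes each video is a sequence of per-frame embedding vectors and penalizes the mean squared distance between consecutive frames; the names `temporal_coherence_penalty`, `regularized_loss`, and the `weight` parameter are hypothetical.

```python
from typing import Sequence


def temporal_coherence_penalty(frames: Sequence[Sequence[float]]) -> float:
    """Mean squared L2 distance between consecutive frame embeddings.

    Assumption: this generic smoothness term stands in for the paper's
    regularizer, whose exact form is not given in the abstract.
    """
    if len(frames) < 2:
        return 0.0  # a single frame has no temporal neighbors
    total = 0.0
    for prev, curr in zip(frames, frames[1:]):
        total += sum((c - p) ** 2 for p, c in zip(prev, curr))
    return total / (len(frames) - 1)


def regularized_loss(task_loss: float,
                     frames: Sequence[Sequence[float]],
                     weight: float = 0.1) -> float:
    """Add the weighted coherence penalty to a classification loss.

    `weight` is a hypothetical hyperparameter chosen for illustration.
    """
    return task_loss + weight * temporal_coherence_penalty(frames)


# Toy embeddings: one sequence drifts slowly, the other jumps back and forth.
smooth = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0]]
jumpy = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]

smooth_penalty = temporal_coherence_penalty(smooth)
jumpy_penalty = temporal_coherence_penalty(jumpy)
```

Because the penalty depends only on the sequence of embeddings, a term of this kind could in principle be attached to any model that produces per-frame representations, which is consistent with the abstract's claim that the approach applies across RNN, NetVLAD, and Transformer models.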

research
03/07/2020

Cross-modal Learning for Multi-modal Video Categorization

Multi-modal machine learning (ML) models can process data in multiple mo...
research
09/12/2023

DF-TransFusion: Multimodal Deepfake Detection via Lip-Audio Cross-Attention and Facial Self-Attention

With the rise in manipulated media, deepfake detection has become an imp...
research
06/15/2023

Multi-modal Hate Speech Detection using Machine Learning

With the continuous growth of internet users and media content, it is ve...
research
10/26/2021

Leveraging Local Temporal Information for Multimodal Scene Classification

Robust video scene classification models should capture the spatial (pix...
research
07/28/2019

Two-Stream CNN with Loose Pair Training for Multi-modal AMD Categorization

This paper studies automated categorization of age-related macular degen...
research
03/28/2018

Topic Modeling Based Multi-modal Depression Detection

Major depressive disorder is a common mental disorder that affects almos...
research
06/07/2023

MultiSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos

Multimodal summarization with multimodal output (MSMO) has emerged as a ...
