Highlight Timestamp Detection Model for Comedy Videos via Multimodal Sentiment Analysis

05/28/2021
by Fan Huang, et al.

Videos are now pervasive on the Internet. Understanding them precisely and in depth is a difficult but valuable problem for both platforms and researchers. Existing video understanding models perform well on object recognition tasks but still cannot capture abstract, contextual features such as the highlight humor frames in comedy videos. Current industrial work also focuses mainly on basic category classification based on the appearance of objects. Feature detection methods for abstract categories remain unexplored. A data structure that combines information from video frames, audio spectra, and text provides a new direction to explore. Multimodal models have been proposed to make this in-depth video understanding possible. In this paper, we analyze the difficulties of abstract video understanding and propose a multimodal structure to obtain state-of-the-art performance in this field. We then select several benchmarks for multimodal video understanding and apply the most suitable model to find the best performance. Finally, we evaluate the overall strengths and drawbacks of the models and methods in this paper and point out possible directions for further improvement.
