Video Highlight Prediction Using Audience Chat Reactions

07/26/2017
by Cheng-Yang Fu, et al.

Sports channel video portals offer an exciting domain for research on multimodal, multilingual analysis. We present methods for automatic video highlight prediction based on joint visual features and textual analysis of real-world audience discourse containing complex slang, in both English and traditional Chinese. We introduce a novel dataset of League of Legends championship matches recorded from North American and Taiwanese Twitch.tv channels (to be released for further research), and demonstrate strong results on it using multimodal, character-level CNN-RNN model architectures.
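A character-level model sidesteps word segmentation, which is useful for mixed English and traditional Chinese chat full of slang. The paper does not publish its preprocessing code, but a minimal sketch of the character-level encoding step such a CNN-RNN would consume might look like the following (the `build_vocab`, `encode` helpers, vocabulary scheme, and `max_len` value are illustrative assumptions, not the authors' implementation):

```python
# Hypothetical character-level preprocessing for audience chat messages:
# each message becomes a fixed-length sequence of character indices,
# handling English and traditional Chinese uniformly, no word segmentation.

def build_vocab(messages):
    """Map each distinct character to an integer index; 0 is reserved for padding/unknown."""
    chars = sorted({ch for msg in messages for ch in msg})
    return {ch: i + 1 for i, ch in enumerate(chars)}

def encode(msg, vocab, max_len=64):
    """Encode a message as a fixed-length list of character indices, zero-padded."""
    ids = [vocab.get(ch, 0) for ch in msg[:max_len]]
    return ids + [0] * (max_len - len(ids))

# Mixed-language chat reactions, as found on Twitch.tv channels
chat = ["gg wp", "Kappa 666", "太神啦"]
vocab = build_vocab(chat)
encoded = [encode(m, vocab) for m in chat]
```

The resulting index sequences would feed a character-level CNN whose features are aggregated over time by an RNN, alongside the visual stream.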
