Themes Informed Audio-visual Correspondence Learning

09/14/2020
by   Runze Su, et al.

Applications of short-term user-generated video (UGV), such as Snapchat and YouTube short-form videos, have boomed recently, giving rise to many multimodal machine learning tasks. Among them, learning the correspondence between audio and visual information from videos is a challenging one. Most previous work on audio-visual correspondence (AVC) learning has investigated only constrained videos or simple settings, which may not fit the applications of UGV. In this paper, we propose new principles for AVC and introduce a new framework that takes videos' themes into account to facilitate AVC learning. We also release the KWAI-AD-AudVis corpus, which contains 85,432 short advertisement videos (around 913 hours) made by users. We evaluated our proposed approach on this corpus, and it was able to outperform the baseline by 23.15
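The abstract does not spell out the learning objective, but AVC is conventionally framed (as in "Look, Listen and Learn", listed below) as a binary task: decide whether an audio clip and a visual clip come from the same video. A minimal sketch of that generic formulation, with purely illustrative function names (`avc_logit`, `avc_loss`) and mismatched pairs built by shifting the batch; this is not the paper's theme-informed framework:

```python
import numpy as np

def avc_logit(audio_emb, visual_emb):
    # Correspondence score: dot product of the two modality embeddings.
    return float(np.dot(audio_emb, visual_emb))

def avc_loss(audio_batch, visual_batch):
    """Binary AVC objective: audio_batch[i] corresponds to visual_batch[i];
    mismatched (negative) pairs are built by rolling the visual batch by one."""
    n = len(audio_batch)
    losses = []
    for i in range(n):
        # Positive pair: audio and visuals from the same video.
        pos = avc_logit(audio_batch[i], visual_batch[i])
        # Negative pair: visuals borrowed from a different video.
        neg = avc_logit(audio_batch[i], visual_batch[(i + 1) % n])
        # Logistic loss pushes positive logits up and negative logits down.
        losses.append(np.log1p(np.exp(-pos)) + np.log1p(np.exp(neg)))
    return sum(losses) / n
```

In practice the embeddings would come from trainable audio and visual subnetworks, and the paper's contribution is to condition this correspondence decision on the video's theme rather than on raw clips alone.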

05/23/2017

Look, Listen and Learn

We consider the question: what can be learnt by looking at and listening...
10/23/2020

Short Video-based Advertisements Evaluation System: Self-Organizing Learning Approach

With the rising of short video apps, such as TikTok, Snapchat and Kwai, ...
04/29/2020

VGGSound: A Large-scale Audio-Visual Dataset

Our goal is to collect a large-scale audio-visual dataset with low label...
04/23/2021

The Influence of Audio on Video Memorability with an Audio Gestalt Regulated Video Memorability System

Memories are the tethering threads that tie us to the world, and memorab...
01/26/2021

Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning

Large-scale datasets are the cornerstone of self-supervised representati...
08/23/2022

CrossA11y: Identifying Video Accessibility Issues via Cross-modal Grounding

Authors make their videos visually accessible by adding audio descriptio...
07/03/2018

MediaEval 2018: Predicting Media Memorability Task

In this paper, we present the Predicting Media Memorability task, which ...