Audio-visual scene classification: analysis of DCASE 2021 Challenge submissions

05/28/2021
by Shanshan Wang, et al.

This paper presents the details of the Audio-Visual Scene Classification task in the DCASE 2021 Challenge (Task 1 Subtask B). The task concerns scene classification from audio and video modalities, using a dataset of synchronized recordings. Here we describe the datasets and baseline systems. After the challenge submission deadline, challenge results and analysis of the submissions will be added.

research
06/12/2021

Deep Learning Frameworks Applied For Audio-Visual Scene Classification

In this paper, we present deep learning frameworks for audio-visual scen...
research
07/28/2021

Squeeze-Excitation Convolutional Recurrent Neural Networks for Audio-Visual Scene Classification

The use of multiple and semantically correlated sources can provide comp...
research
08/02/2016

CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016

This paper presents the method that underlies our submission to the untr...
research
03/07/2022

A study on joint modeling and data augmentation of multi-modalities for audio-visual scene classification

In this paper, we propose two techniques, namely joint modeling and data...
research
08/20/2021

Video Ads Content Structuring by Combining Scene Confidence Prediction and Tagging

Video ads segmentation and tagging is a challenging task due to two main...
research
09/02/2021

Binaural Audio Generation via Multi-task Learning

We present a learning-based approach for generating binaural audio from ...
research
12/16/2021

An Audio-Visual Dataset and Deep Learning Frameworks for Crowded Scene Classification

This paper presents a task of audio-visual scene classification (SC) whe...
