AudioInceptionNeXt: TCL AI LAB Submission to EPIC-SOUND Audio-Based-Interaction-Recognition Challenge 2023

07/14/2023
by   Kin Wai Lau, et al.

This report presents the technical details of our submission to the 2023 EPIC-KITCHENS EPIC-SOUNDS Audio-Based Interaction Recognition Challenge. The task is to learn a mapping from audio samples to their corresponding action labels. To this end, we propose a simple yet effective single-stream CNN-based architecture, AudioInceptionNeXt, that operates on the time-frequency log-mel spectrogram of the audio samples. Motivated by the design of InceptionNeXt, the AudioInceptionNeXt block uses parallel multi-scale depthwise separable convolutional kernels, which enable the model to learn time and frequency information more effectively. The large-scale separable kernels capture long-duration activities and global frequency semantics, while the small-scale separable kernels capture short-duration activities and local frequency details. Our approach achieved 55.43% top-1 accuracy on the challenge test set, ranking 1st on the public leaderboard. Code is available at https://github.com/StevenLauHKHK/AudioInceptionNeXt.git.
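To make the idea of parallel multi-scale separable kernels concrete, the following is a minimal pure-Python sketch, not the submitted model. It assumes fixed averaging kernels in place of learned weights, illustrative kernel sizes of 3 and 5, and an InceptionNeXt-style split of channels across branches; the function names (`conv_time`, `conv_freq`, `separable_branch`, `multi_scale_block`) are hypothetical, and the block in the report may be organised differently.

```python
from typing import List, Sequence

# One channel of a spectrogram: rows are frequency bins, columns are time frames.
Matrix = List[List[float]]


def conv_time(x: Matrix, kernel: Sequence[float]) -> Matrix:
    """Zero-padded 'same' 1-D filtering along the time axis (each row)."""
    pad = len(kernel) // 2
    n_t = len(x[0])
    out = []
    for row in x:
        out_row = []
        for t in range(n_t):
            acc = 0.0
            for j, w in enumerate(kernel):
                s = t + j - pad
                if 0 <= s < n_t:  # samples outside the input count as zero
                    acc += w * row[s]
            out_row.append(acc)
        out.append(out_row)
    return out


def transpose(x: Matrix) -> Matrix:
    return [list(col) for col in zip(*x)]


def conv_freq(x: Matrix, kernel: Sequence[float]) -> Matrix:
    """Same filtering along the frequency axis (each column)."""
    return transpose(conv_time(transpose(x), kernel))


def separable_branch(x: Matrix, k: int) -> Matrix:
    """Apply a 1 x k (time) filter, then a k x 1 (frequency) filter, to one channel.

    Assumption: a fixed averaging kernel stands in for learned weights.
    """
    kernel = [1.0 / k] * k
    return conv_freq(conv_time(x, kernel), kernel)


def multi_scale_block(channels: List[Matrix],
                      kernel_sizes: Sequence[int] = (3, 5)) -> List[Matrix]:
    """Send equal-sized channel groups through branches of different kernel size.

    Assumption: channels are split across branches as in InceptionNeXt;
    leftover channels pass through unchanged (identity branch).
    """
    group = len(channels) // len(kernel_sizes)
    out = []
    for b, k in enumerate(kernel_sizes):
        for ch in channels[b * group:(b + 1) * group]:
            out.append(separable_branch(ch, k))
    out.extend(channels[group * len(kernel_sizes):])
    return out
```

Feeding a single-impulse spectrogram through `separable_branch` with a small and a large `k` shows the difference in receptive field: the larger kernel spreads the impulse over more time frames and frequency bins, which is the intuition behind pairing large kernels with long-duration activities and small kernels with local detail.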


