SemanticAC: Semantics-Assisted Framework for Audio Classification

02/12/2023
by Yicheng Xiao, et al.

In this paper, we propose SemanticAC, a semantics-assisted framework for audio classification that better leverages the semantic information carried by class labels. Unlike conventional audio classification methods, which treat class labels as discrete vectors, we employ a language model to extract rich semantics from the labels and optimize the semantic consistency between audio signals and their labels. We show that simple textual information derived from labels, combined with advanced pretrained models, provides richer semantic supervision and improves performance. Specifically, we design a text encoder that captures semantic information from a textual extension of each label. We then use an audio encoder and a similarity calculation module to map audio signals into alignment with the semantics of their corresponding class labels, thereby enforcing semantic consistency. Extensive experiments on two audio datasets, ESC-50 and US8K, demonstrate that the proposed method consistently outperforms the compared audio classification methods.
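The label-semantics idea above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the audio and text encoders have already produced fixed-size embeddings (the encoders are not shown), it stands in cosine similarity with a temperature-scaled softmax and cross-entropy for the paper's similarity calculation module and semantic-consistency objective, and it uses a hypothetical prompt template (`extend_label`) for the textual extension of labels. The temperature of 0.07 is likewise an assumed value.

```python
import math
from typing import List, Sequence


def extend_label(label: str) -> str:
    # Hypothetical text extension of a class label; the paper's actual
    # template is not given in the abstract.
    return f"This is a sound of {label}."


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def label_logits(audio_emb: Sequence[float],
                 label_embs: List[Sequence[float]],
                 temperature: float = 0.07) -> List[float]:
    # Similarity of one audio embedding to every label-text embedding,
    # scaled by an assumed temperature.
    return [cosine_similarity(audio_emb, t) / temperature for t in label_embs]


def softmax(logits: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_consistency_loss(audio_emb: Sequence[float],
                              label_embs: List[Sequence[float]],
                              target: int,
                              temperature: float = 0.07) -> float:
    # Cross-entropy over label similarities: low when the audio embedding
    # is most similar to the text embedding of its own label.
    probs = softmax(label_logits(audio_emb, label_embs, temperature))
    return -math.log(probs[target])


def predict(audio_emb: Sequence[float],
            label_embs: List[Sequence[float]]) -> int:
    # Classify by picking the label whose text embedding is most similar.
    logits = label_logits(audio_emb, label_embs)
    return max(range(len(logits)), key=logits.__getitem__)


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for encoder outputs.
    labels = ["dog bark", "rain"]
    label_embs = [[1.0, 0.0], [0.0, 1.0]]
    audio_emb = [0.9, 0.1]
    print([extend_label(l) for l in labels])
    print(labels[predict(audio_emb, label_embs)])
    print(semantic_consistency_loss(audio_emb, label_embs, target=0))
```

Because classification reduces to a similarity lookup against label-text embeddings, the same scoring function serves both training (through the loss) and inference (through `predict`), which is the sense in which label semantics act as supervision rather than as arbitrary class indices.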

research
05/06/2019

Zero-Shot Audio Classification Based on Class Label Embeddings

This paper proposes a zero-shot learning approach for audio classificati...
research
08/22/2023

Furnishing Sound Event Detection with Language Model Abilities

Recently, the ability of language models (LMs) has attracted increasing ...
research
08/21/2022

GRETEL: Graph Contrastive Topic Enhanced Language Model for Long Document Extractive Summarization

Recently, neural topic models (NTMs) have been incorporated into pre-tra...
research
06/16/2019

Floors are Flat: Leveraging Semantics for Real-Time Surface Normal Prediction

We propose 4 insights that help to significantly improve the performance...
research
05/22/2020

SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

Scene text recognition is a hot research topic in computer vision. Recen...
research
05/29/2023

Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation

Large diffusion models have been successful in text-to-audio (T2A) synth...
research
12/16/2020

R^2-Net: Relation of Relation Learning Network for Sentence Semantic Matching

Sentence semantic matching is one of the fundamental tasks in natural la...
