Audio Language Modeling using Perceptually-Guided Discrete Representations

11/02/2022
by Felix Kreuk, et al.

In this work, we study the task of Audio Language Modeling, in which we aim to learn probabilistic models for audio that can be used for generation and completion. We use a state-of-the-art perceptually-guided audio compression model to encode audio into discrete representations. Next, we train a transformer-based causal language model on these representations. At inference time, we perform audio auto-completion by encoding an audio prompt as a discrete sequence, feeding it to the audio language model, sampling from the model, and synthesizing the corresponding time-domain signal. We evaluate the quality of samples generated by our method on Audioset, the largest dataset for general audio to date, and show that it is superior to the evaluated baseline audio encoders. We additionally provide an extensive analysis to better understand the trade-off between audio quality and language-modeling capabilities.
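
The inference steps described in the abstract (encode a prompt, sample from a causal model, synthesize a waveform) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the uniform quantizer `encode`/`decode` stands in for the perceptually-guided compression model, the smoothed bigram table from `train_bigram` stands in for the transformer language model, and all names and the codebook size `N_LEVELS` are hypothetical.

```python
import math
import random

N_LEVELS = 16  # hypothetical codebook size, chosen only for this sketch


def encode(signal, n_levels=N_LEVELS):
    """Toy stand-in for the compression encoder: uniform quantization of samples in [-1, 1]."""
    tokens = []
    for x in signal:
        x = max(-1.0, min(1.0, x))              # clip to the valid range
        idx = int((x + 1.0) / 2.0 * n_levels)   # map [-1, 1] to a bin index
        tokens.append(min(idx, n_levels - 1))   # x == 1.0 falls into the last bin
    return tokens


def decode(tokens, n_levels=N_LEVELS):
    """Toy stand-in for the decoder: map each token to the centre of its bin."""
    return [(t + 0.5) / n_levels * 2.0 - 1.0 for t in tokens]


def train_bigram(tokens, n_levels=N_LEVELS):
    """Toy stand-in for the causal language model: bigram counts with add-one smoothing."""
    counts = [[1] * n_levels for _ in range(n_levels)]
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def sample_continuation(prompt_tokens, counts, n_new, rng):
    """Autoregressively sample n_new tokens, each conditioned on the previous token."""
    seq = list(prompt_tokens)  # the prompt must contain at least one token
    for _ in range(n_new):
        weights = counts[seq[-1]]
        seq.append(rng.choices(range(len(weights)), weights=weights, k=1)[0])
    return seq[len(prompt_tokens):]


def auto_complete(prompt_signal, counts, n_new, seed=0):
    """Encode the prompt, sample a continuation, and decode it back to samples."""
    rng = random.Random(seed)
    prompt_tokens = encode(prompt_signal)
    new_tokens = sample_continuation(prompt_tokens, counts, n_new, rng)
    return decode(new_tokens)


# Usage: fit the toy model on a sine wave, then continue a short prompt.
training_signal = [math.sin(2 * math.pi * i / 40) for i in range(400)]
bigram_counts = train_bigram(encode(training_signal))
continuation = auto_complete(training_signal[:20], bigram_counts, n_new=50)
```

A bigram table only conditions on the previous token, whereas a causal transformer conditions on the whole prefix; the sketch is meant to show the data flow between the three stages, not the modeling quality.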


research
11/06/2022

I Hear Your True Colors: Image Guided Audio Generation

We propose Im2Wav, an image guided open-domain audio generation system. ...
research
09/07/2022

AudioLM: a Language Modeling Approach to Audio Generation

We introduce AudioLM, a framework for high-quality audio generation with...
research
06/16/2022

GoodBye WaveNet – A Language Model for Raw Audio with Context of 1/2 Million Samples

Modeling long-term dependencies for audio signals is a particularly chal...
research
08/14/2023

AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes

We propose a method named AudioFormer, which learns audio feature represe...
research
10/12/2022

JukeDrummer: Conditional Beat-aware Audio-domain Drum Accompaniment Generation via Transformer VQ-VA

This paper proposes a model that generates a drum track in the audio dom...
research
03/28/2018

Meta-Learning a Dynamical Language Model

We consider the task of word-level language modeling and study the possi...
research
01/02/2021

What all do audio transformer models hear? Probing Acoustic Representations for Language Delivery and its Structure

In recent times, BERT based transformer models have become an inseparabl...
