Diet deep generative audio models with structured lottery

07/31/2020
by Philippe Esling, et al.

Deep learning models have provided extremely successful solutions in most audio application fields. However, the high accuracy of these models comes at the expense of a tremendous computation cost, an aspect that is almost always overlooked when evaluating the quality of proposed models. Yet models should not be evaluated without taking their complexity into account. This is especially critical in audio applications, which heavily rely on specialized embedded hardware with real-time constraints. In this paper, we build on recent observations that deep models are highly overparameterized by studying the lottery ticket hypothesis on deep generative audio models. This hypothesis states that extremely efficient small sub-networks exist in deep models and would provide higher accuracy than larger models if trained in isolation. However, lottery tickets are found by relying on unstructured masking, which means that the resulting models do not provide any gain in either disk size or inference time. Instead, we develop here a method aimed at performing structured trimming. We show that this requires relying on global selection, and we introduce a specific criterion based on mutual information. First, we confirm the surprising result that smaller models provide higher accuracy than their larger counterparts. We further show that we can remove up to 95% of the weights without significant degradation in accuracy. Hence, we can obtain very light models for generative audio across popular methods such as Wavenet, SING or DDSP, that are up to 100 times smaller with commensurate accuracy. We study the theoretical bounds for embedding these models on Raspberry Pi and Arduino, and show that we can obtain generative models on CPU with quality equivalent to that of large GPU models. Finally, we discuss the possibility of implementing deep generative audio models on embedded platforms.
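To make the idea of structured trimming with global selection more concrete, the sketch below prunes entire units (rows of linear layers, filters of 1-D convolutions) against a single global threshold shared across all layers. This is a minimal illustration, not the authors' implementation: it assumes a PyTorch model, and it uses a simple L1-norm saliency as a stand-in for the paper's mutual-information criterion.

```python
import torch
import torch.nn as nn


def global_structured_prune(model: nn.Module, keep_ratio: float = 0.05) -> nn.Module:
    """Zero out whole output units whose saliency falls below a global threshold."""
    saliencies, modules = [], []
    for module in model.modules():
        if isinstance(module, (nn.Linear, nn.Conv1d)):
            w = module.weight.detach()
            # One score per output unit: mean |w| over its incoming weights
            # (placeholder for the paper's mutual-information criterion).
            score = w.abs().flatten(1).mean(dim=1)
            saliencies.append(score)
            modules.append(module)

    # Global selection: a single threshold keeps the top `keep_ratio`
    # fraction of units across *all* layers, rather than per-layer quotas.
    all_scores = torch.cat(saliencies)
    k = max(1, int(keep_ratio * all_scores.numel()))
    threshold = torch.topk(all_scores, k).values.min()

    for module, score in zip(modules, saliencies):
        mask = (score >= threshold).float()            # 1 = keep, 0 = trim
        shape = [-1] + [1] * (module.weight.dim() - 1)
        module.weight.data.mul_(mask.view(shape))      # zero trimmed units
        if module.bias is not None:
            module.bias.data.mul_(mask)
    return model
```

Because entire rows or filters are zeroed rather than scattered individual weights, the trimmed units can subsequently be dropped from the layer shapes altogether, which is what yields actual gains in disk size and inference time compared with unstructured masks.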


