Speech-VGG: A deep feature extractor for speech processing

10/22/2019
by   Pierre Beckmann, et al.
0

A growing number of studies in the field of speech processing employ feature losses to train deep learning systems. While the application of this framework typically yields beneficial results, the question of what's the optimal setup for extracting transferable speech features to compute losses remains underexplored. In this study, we extend our previous work on speechVGG, a deep feature extractor for training speech processing frameworks. The extractor is based on the classic VGG-16 convolutional neural network re-trained to identify words from the log magnitude STFT features. To estimate the influence of different hyperparameters on the extractor's performance, we applied several configurations of speechVGG to train a system for informed speech inpainting, the context-based recovery of missing parts from time-frequency masked speech segments. We show that changing the size of the dictionary and the size of the dataset used to pre-train the speechVGG notably modulates task performance of the main framework.

READ FULL TEXT

page 3

page 4

research
10/20/2019

Deep speech inpainting of time-frequency masks

In particularly noisy environments, transient loud intrusions can comple...
research
03/13/2018

Deep CNN based feature extractor for text-prompted speaker recognition

Deep learning is still not a very common tool in speaker verification fi...
research
07/19/2022

GAFX: A General Audio Feature eXtractor

Most machine learning models for audio tasks are dealing with a handcraf...
research
12/01/2022

CHAPTER: Exploiting Convolutional Neural Network Adapters for Self-supervised Speech Models

Self-supervised learning (SSL) is a powerful technique for learning repr...
research
06/27/2018

Speech Denoising with Deep Feature Losses

We present an end-to-end deep learning approach to denoising speech sign...
research
11/22/2022

SkipConvGAN: Monaural Speech Dereverberation using Generative Adversarial Networks via Complex Time-Frequency Masking

With the advancements in deep learning approaches, the performance of sp...
research
08/25/2021

Temporal envelope and fine structure cues for dysarthric speech detection using CNNs

Deep learning-based techniques for automatic dysarthric speech detection...

Please sign up or login with your details

Forgot password? Click here to reset