
Speech-VGG: A deep feature extractor for speech processing

by Pierre Beckmann, et al.

A growing number of studies in the field of speech processing employ feature losses to train deep learning systems. While this framework typically yields beneficial results, the question of what the optimal setup is for extracting transferable speech features to compute losses remains underexplored. In this study, we extend our previous work on speechVGG, a deep feature extractor for training speech processing frameworks. The extractor is based on the classic VGG-16 convolutional neural network, re-trained to identify words from log-magnitude STFT features. To estimate the influence of different hyperparameters on the extractor's performance, we applied several configurations of speechVGG to train a system for informed speech inpainting, the context-based recovery of missing parts from time-frequency-masked speech segments. We show that varying the size of the dictionary and the size of the dataset used to pre-train speechVGG notably modulates the task performance of the main framework.
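To make the feature-loss idea concrete, the following is a minimal NumPy sketch of the pipeline the abstract describes: compute log-magnitude STFT features, pass them through a convolutional layer, and take the mean absolute distance between the hidden activations of two signals. The random kernels here are placeholders standing in for pretrained speechVGG weights, and all function names and parameter values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def log_mag_stft(x, n_fft=256, hop=128):
    # Frame the signal with a Hann window and take the
    # log magnitude of each frame's FFT.
    frames = [x[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(x) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(np.stack(frames), axis=1))
    return np.log(spec + 1e-6)

def conv2d_relu(feat, kernels):
    # Naive valid-mode 2-D convolution followed by ReLU,
    # producing one feature map per kernel.
    kh, kw = kernels.shape[1:]
    H, W = feat.shape
    out = np.zeros((len(kernels), H - kh + 1, W - kw + 1))
    for k, ker in enumerate(kernels):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = np.sum(feat[i:i + kh, j:j + kw] * ker)
    return np.maximum(out, 0.0)

def feature_loss(x, y, kernels):
    # Mean absolute distance between the hidden activations
    # of the two input signals (an L1 deep feature loss).
    fx = conv2d_relu(log_mag_stft(x), kernels)
    fy = conv2d_relu(log_mag_stft(y), kernels)
    return np.mean(np.abs(fx - fy))

rng = np.random.default_rng(0)
# Placeholder weights; in speechVGG these would come from a
# VGG-16 pretrained on word classification.
kernels = rng.standard_normal((4, 3, 3)) * 0.1
clean = np.sin(2 * np.pi * 440 * np.arange(4000) / 16000)
noisy = clean + 0.3 * rng.standard_normal(4000)
print(feature_loss(clean, clean, kernels))      # 0.0 for identical inputs
print(feature_loss(clean, noisy, kernels) > 0)  # degradation shows up in feature space
```

In a real training loop, this scalar would be backpropagated through the main network (e.g. the inpainting model) while the extractor's weights stay frozen; a deeper stack of such layers gives losses at multiple levels of abstraction.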



