Generalized Operating Procedure for Deep Learning: an Unconstrained Optimal Design Perspective

12/31/2020
by Shen Chen, et al.

Deep learning (DL) has brought about remarkable breakthroughs in processing images, video, and speech due to its efficacy in extracting highly abstract representations and learning very complex functions. However, operating procedures for applying it to real use cases are seldom reported. In this paper, we address this problem by presenting a generalized operating procedure for DL from the perspective of unconstrained optimal design, motivated by a simple intention to remove the barrier to using DL, especially for scientists and engineers who are new to it but eager to use it. Our proposed procedure contains seven steps: project/problem statement, data collection, architecture design, parameter initialization, loss function definition, computation of optimal parameters, and inference. Following this procedure, we build a multi-stream end-to-end speaker verification system in which the input speech utterance is processed by multiple parallel streams, each covering a different frequency range, so that the acoustic modeling is more robust owing to the diversity of features. Trained on the VoxCeleb dataset, our experiments verify the effectiveness of the proposed operating procedure and show that our multi-stream framework outperforms the single-stream baseline, with a 20% reduction in the minimum detection cost function (minDCF).
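As a rough illustration of how the seven steps line up with an unconstrained optimization problem, the following minimal sketch walks through them on a toy regression task. It is not the authors' speaker verification system: the task (fitting y = 2x + 1), the synthetic data, the single linear layer, the mean squared error loss, plain gradient descent, and the function name run_procedure are all assumptions chosen only to keep the example small.

```python
import numpy as np


def run_procedure(seed=0, steps=500, lr=0.1):
    """Toy walk-through of the seven steps (hypothetical example)."""
    # Step 1: project/problem statement.
    # Assumed toy task: recover the mapping y = 2x + 1 from noisy samples.
    rng = np.random.default_rng(seed)

    # Step 2: data collection (synthetic data stands in for a real dataset).
    x = rng.uniform(-1.0, 1.0, size=(200, 1))
    y = 2.0 * x + 1.0 + 0.01 * rng.standard_normal((200, 1))

    # Step 3: architecture design (a single linear layer).
    def model(w, b, inputs):
        return inputs @ w + b

    # Step 4: initialization of parameters.
    w = np.zeros((1, 1))
    b = np.zeros((1,))

    # Step 5: defining the loss function (mean squared error).
    def loss(w, b):
        return float(np.mean((model(w, b, x) - y) ** 2))

    # Step 6: computing optimal parameters. The parameters are free to take
    # any real value, so plain gradient descent on the loss is sufficient.
    for _ in range(steps):
        err = model(w, b, x) - y
        grad_w = 2.0 * x.T @ err / len(x)
        grad_b = 2.0 * err.mean(axis=0)
        w = w - lr * grad_w
        b = b - lr * grad_b

    # Step 7: inference on an unseen input.
    prediction = float(model(w, b, np.array([[0.5]]))[0, 0])
    return float(w[0, 0]), float(b[0]), loss(w, b), prediction


if __name__ == "__main__":
    w, b, final_loss, prediction = run_procedure()
    print(f"w={w:.3f} b={b:.3f} loss={final_loss:.5f} f(0.5)={prediction:.3f}")
```

In a real project, steps 3 to 6 would typically be handled by a DL framework with a deeper architecture and a stochastic optimizer; the sketch only shows where each step sits in the overall flow.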

research
12/21/2020

Multi-stream Convolutional Neural Network with Frequency Selection for Robust Speaker Verification

Speaker verification aims to verify whether an input speech corresponds ...
research
04/10/2018

DeepQoE: A unified Framework for Learning to Predict Video QoE

Motivated by the prowess of deep learning (DL) based techniques in predi...
research
11/10/2020

Generalized LSTM-based End-to-End Text-Independent Speaker Verification

The increasing amount of available data and more affordable hardware sol...
research
01/02/2020

Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation

Target speech separation refers to extracting the target speaker's speec...
research
07/05/2021

A Deep Learning-Based Particle-in-Cell Method for Plasma Simulations

We design and develop a new Particle-in-Cell (PIC) method for plasma sim...
research
02/08/2022

A Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning for Hearing-Assistive Technologies

Current deep learning (DL) based approaches to speech intelligibility en...
