TorchDIVA: An Extensible Computational Model of Speech Production built on an Open-Source Machine Learning Library

10/17/2022
by Sean Kinahan, et al.

The DIVA model is a computational model of speech motor control that combines a simulation of the brain regions responsible for speech production with a model of the human vocal tract. The model is currently implemented in Matlab Simulink; this is less than ideal, because most development in speech technology research is done in Python. As a result, the wealth of machine learning tools freely available in the Python ecosystem cannot be easily integrated with DIVA. We present TorchDIVA, a full rebuild of DIVA in Python using PyTorch tensors. The DIVA source code was translated directly from Matlab to Python, and built-in Simulink signal blocks were implemented from scratch. After implementation, the accuracy of each module was evaluated via systematic block-by-block validation. The TorchDIVA model is shown to produce outputs that closely match those of the original DIVA model, with a negligible difference between the two. We additionally present an example of the extensibility of TorchDIVA as a research platform. Speech quality enhancement in TorchDIVA is achieved through integration with DiffWave, an existing PyTorch generative vocoder. A modified DiffWave mel-spectrum upsampler was trained on human speech waveforms and conditioned on the speech produced by TorchDIVA. The results indicate improved speech quality metrics in the DiffWave-enhanced output compared to the baseline. This enhancement would have been difficult or impossible to accomplish in the original Matlab implementation. This proof of concept demonstrates the value TorchDIVA will bring to the research community. Researchers can download the new implementation at: https://github.com/skinahan/DIVA_PyTorch
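
The block-by-block validation mentioned in the abstract can be pictured with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the TorchDIVA repository: the function names (`discrete_integrator`, `max_abs_error`, `validate_block`), the choice of a forward-Euler integrator as the example block, the tolerance, and the reference values, which stand in for an output that would be exported from the original Simulink block. The sketch uses plain Python lists so that it runs without dependencies; in TorchDIVA the signals are PyTorch tensors.

```python
from typing import Callable, List, Sequence


def discrete_integrator(u: Sequence[float], dt: float, y0: float = 0.0) -> List[float]:
    """Hypothetical re-implementation of a signal block.

    Forward-Euler discrete-time integrator: y[0] = y0, y[n+1] = y[n] + dt * u[n].
    For a non-empty input, the output has the same length as the input.
    """
    y = [y0]
    for sample in u[:-1]:
        y.append(y[-1] + dt * sample)
    return y


def max_abs_error(candidate: Sequence[float], reference: Sequence[float]) -> float:
    """Largest absolute sample-wise difference between two equal-length signals."""
    if len(candidate) != len(reference):
        raise ValueError("signals must have the same length")
    return max(abs(c - r) for c, r in zip(candidate, reference))


def validate_block(
    block: Callable[[Sequence[float]], Sequence[float]],
    inputs: Sequence[float],
    reference: Sequence[float],
    tol: float = 1e-9,  # assumed tolerance, not a value from the paper
) -> bool:
    """Run one block on recorded inputs and compare it with the reference output."""
    return max_abs_error(block(inputs), reference) <= tol


# Illustrative data only: a constant input and the output that an ideal
# forward-Euler integrator with dt = 0.5 would produce for it.
inputs = [1.0, 1.0, 1.0, 1.0]
reference = [0.0, 0.5, 1.0, 1.5]

ok = validate_block(lambda u: discrete_integrator(u, dt=0.5), inputs, reference)
print("block matches reference:", ok)
```

Repeating this comparison for each block in turn localizes a mismatch to a single module, instead of leaving it to be found by comparing only the final model output.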

