Learning to Compute the Articulatory Representations of Speech with the MIRRORNET

10/29/2022
by Yashish M. Siriwardena, et al.

Most organisms, including humans, function by coordinating and integrating sensory signals with motor actions to survive and accomplish desired tasks. Learning these complex sensorimotor mappings proceeds simultaneously and often in an unsupervised or semi-supervised fashion. This work explores an autoencoder architecture (MirrorNet), inspired by this sensorimotor learning paradigm, that learns how to control an articulatory synthesizer. The synthesizer takes as input control signals consisting of six vocal tract variables (TVs) and source features (voicing indicators and pitch), and generates the corresponding auditory spectrograms. Because the synthesizer is non-linear, the control parameters that produce a target speech signal are neither readily computable nor always unique. Here we demonstrate how to initialize the MirrorNet learning so that it produces a meaningful range of articulatory values. Once trained, the MirrorNet successfully estimates the TVs and source features needed to synthesize an arbitrary speech utterance. This approach outperforms the best previously designed "speech inversion" systems on the Wisconsin X-ray microbeam (XRMB) dataset.
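The non-uniqueness and initialization points in the abstract can be illustrated with a toy sketch. This is not the MirrorNet or the articulatory synthesizer: `toy_synth` is a hypothetical stand-in that squares its inputs, and `invert` is plain gradient descent on the controls rather than a trained encoder. It only shows that a non-linear forward model can admit several control vectors for one target, and that the starting point decides which one is recovered.

```python
import numpy as np

def toy_synth(controls):
    # Hypothetical stand-in for a non-linear synthesizer (assumption, not the
    # paper's model): c and -c produce exactly the same output.
    return controls ** 2

def invert(target, init, lr=0.05, steps=1000):
    # Gradient descent on the controls to minimize sum((toy_synth(c) - target)**2).
    c = np.asarray(init, dtype=float).copy()
    for _ in range(steps):
        err = toy_synth(c) - target
        grad = 4.0 * err * c  # derivative of the squared error w.r.t. c
        c -= lr * grad
    return c

# Target output generated from a known control vector.
target = toy_synth(np.array([0.5, -0.8, 0.3]))

# Two different initializations converge to two different, equally valid inverses.
pos = invert(target, np.full(3, 0.1))
neg = invert(target, np.full(3, -0.1))

print("controls from positive init:", np.round(pos, 3))
print("controls from negative init:", np.round(neg, 3))
print("both reproduce the target:",
      np.allclose(toy_synth(pos), target, atol=1e-6),
      np.allclose(toy_synth(neg), target, atol=1e-6))
```

Both runs reproduce the target, yet neither recovers the original mixed-sign controls. This is the kind of ambiguity that a constrained initialization is meant to resolve.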

