Reconstructing the Dynamic Directivity of Unconstrained Speech

09/09/2022
by Camille Noufi, et al.

An accurate model of natural speech directivity is an important step toward achieving realistic vocal presence within a virtual communication setting. In this article, we propose a method to estimate and reconstruct the spatial energy distribution pattern of natural, unconstrained speech. We detail our method in two stages. First, using recordings of speech captured by a real, static microphone array, we create a virtual array that tracks the movement of the speaker over time. We use this egocentric virtual array to measure and encode the high-resolution directivity pattern of the speech signal as it evolves dynamically with natural speech and movement. Second, using this encoded directivity representation, we train a machine learning model to estimate the full, dynamic directivity pattern from a limited set of speech signals, as would be the case when speech is recorded using the microphones on a head-mounted display (HMD). We examine a variety of model architectures and training paradigms, and discuss the utility and practicality of each implementation. Our results demonstrate that neural networks can be used to regress from limited speech information to an accurate, dynamic estimate of the full directivity pattern.
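To make the idea of an egocentric virtual array concrete, the sketch below re-expresses the fixed microphone positions in the speaker's head frame at a single time instant. It is only an illustration under stated assumptions, not the authors' implementation: the function name `egocentric_directions`, the axis convention (x forward, y left, z up), and the availability of a tracked head position and rotation matrix are all hypothetical and do not come from the abstract.

```python
import numpy as np

def egocentric_directions(mic_xyz, head_xyz, head_rot):
    """Direction of each static microphone as seen from the speaker's head.

    Assumed (hypothetical) conventions:
      mic_xyz  : (M, 3) microphone positions in room coordinates
      head_xyz : (3,)   head position in room coordinates
      head_rot : (3, 3) rotation matrix whose columns are the head's
                 x (forward), y (left), z (up) axes in room coordinates
    Returns azimuth and elevation in radians, each of shape (M,).
    """
    rel = np.asarray(mic_xyz, dtype=float) - np.asarray(head_xyz, dtype=float)
    # Row-vector form of R^T @ rel: room coordinates -> head coordinates.
    local = rel @ np.asarray(head_rot, dtype=float)
    az = np.arctan2(local[:, 1], local[:, 0])
    el = np.arctan2(local[:, 2], np.hypot(local[:, 0], local[:, 1]))
    return az, el

def yaw_matrix(yaw):
    """Rotation about the vertical (z) axis by `yaw` radians."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# Two microphones one metre from a head at the origin; the speaker has
# turned 90 degrees to the left, so they now face the room's +y axis.
mics = np.array([[0.0, 1.0, 0.0],   # room +y: straight ahead after the turn
                 [1.0, 0.0, 0.0]])  # room +x: to the speaker's right
az, el = egocentric_directions(mics, np.zeros(3), yaw_matrix(np.pi / 2))
```

Repeating this for every tracked frame gives each physical microphone a time-varying direction relative to the mouth, which is one simple way a static array could sample different parts of the directivity pattern as the speaker moves.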


