Directional Source Separation for Robust Speech Recognition on Smart Glasses

09/20/2023
by   Tiantian Feng, et al.

Modern smart glasses leverage advanced audio sensing and machine learning technologies to offer real-time transcribing and captioning services, considerably enriching human experiences in daily communications. However, such systems frequently encounter challenges related to environmental noise, which degrades speech recognition and speaker change detection. To improve voice quality, this work investigates directional source separation using a multi-microphone array. We first explore multiple beamformers to assist source separation modeling by strengthening the directional properties of speech signals. In addition to relying on predetermined beamformers, we investigate neural beamforming in multi-channel source separation, demonstrating that automatically learning directional characteristics effectively improves separation quality. We further compare ASR performance on the separated outputs against that on the noisy inputs. Our results show that directional source separation benefits ASR for the wearer but not for the conversation partner. Lastly, we jointly train the directional source separation and ASR models, achieving the best overall ASR performance.
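
To make the idea of a predetermined beamformer concrete, here is a minimal delay-and-sum sketch in Python. It is an illustration only: the abstract does not say which beamformers the authors use, and the function names, integer sample delays, and simulated four-microphone array are all assumptions made for this example. Each channel is shifted so that the target arrives at the same sample on every microphone, and the aligned channels are then averaged, which keeps the target while averaging down uncorrelated noise.

```python
import numpy as np


def delay_and_sum(signals, delays):
    """Delay-and-sum beamformer (illustrative, not the paper's method).

    signals: array of shape (num_mics, num_samples).
    delays:  per-microphone arrival delay of the target, in whole samples.
    """
    signals = np.asarray(signals, dtype=float)
    num_mics, num_samples = signals.shape
    out = np.zeros(num_samples)
    for channel, delay in zip(signals, delays):
        # Advance each channel by its delay so the target lines up.
        # np.roll wraps around at the edges, which is acceptable for a toy
        # example but not for real streaming audio.
        out += np.roll(channel, -int(delay))
    return out / num_mics


def simulate_array(target, delays, noise_std, seed=0):
    """Hypothetical helper: delayed copies of `target` plus independent noise."""
    rng = np.random.default_rng(seed)
    mics = [
        np.roll(target, int(d)) + rng.normal(0.0, noise_std, size=target.shape)
        for d in delays
    ]
    return np.stack(mics)


# Toy scene: a 440 Hz tone reaching four microphones with different delays.
t = np.arange(1600) / 16000.0
target = np.sin(2 * np.pi * 440.0 * t)
delays = [0, 1, 2, 3]
mics = simulate_array(target, delays, noise_std=0.5)
enhanced = delay_and_sum(mics, delays)

print("single-mic MSE:", np.mean((mics[0] - target) ** 2))
print("beamformed MSE:", np.mean((enhanced - target) ** 2))
```

With independent noise on each channel, averaging N aligned channels reduces the noise power by roughly a factor of N. A neural beamformer of the kind the abstract mentions would instead learn its spatial filtering from data rather than rely on fixed, known delays.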


research
04/01/2022

End-to-End Multi-speaker ASR with Independent Vector Analysis

We develop an end-to-end system for multi-channel, multi-speaker automat...
research
10/30/2019

SMS-WSJ: Database, performance measures, and baseline recipe for multi-channel source separation and recognition

We present a multi-channel database of overlapping speech for training, ...
research
10/30/2020

Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization

This paper proposes a new paradigm for handling far-field multi-speaker ...
research
03/31/2022

Perceptive, non-linear Speech Processing and Spiking Neural Networks

Source separation and speech recognition are very difficult in the conte...
research
06/21/2018

Towards Automated Single Channel Source Separation using Neural Networks

Many applications of single channel source separation (SCSS) including a...
research
07/20/2022

Spatial Aware Multi-Task Learning Based Speech Separation

During the Covid, online meetings have become an indispensable part of o...
research
06/25/2021

Online Self-Attentive Gated RNNs for Real-Time Speaker Separation

Deep neural networks have recently shown great success in the task of bl...
