Sample Drop Detection for Distant-speech Recognition with Asynchronous Devices Distributed in Space

11/15/2019
by Tina Raissi, et al.

In many applications of multi-microphone multi-device processing, the synchronization among different input channels can be affected by the lack of a common clock and by isolated drops of samples. In this work, we address the issue of sample drop detection in the context of a conversational speech scenario recorded by a set of microphones distributed in space. The goal is to design a neural-based model that, given a short window in the time domain, detects whether one or more devices have been subjected to a sample drop event. The candidate time windows are selected from a set of large time intervals, possibly including a sample drop, by means of a preprocessing step based on the application of normalized cross-correlation between signals acquired by different devices. The architecture of the neural network relies on a CNN-LSTM encoder, followed by multi-head attention. The experiments are conducted using both artificial and real data. Our proposed approach obtained an F1 score of 88. Comparable performance was found in a larger set of experiments conducted on a set of multi-channel artificial scenes.
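To illustrate the idea behind the cross-correlation preprocessing, the following is a minimal sketch and not the authors' implementation. It assumes two channels carrying the same signal, uses seeded white noise as a stand-in for speech, and searches integer lags only. The names `normalized_xcorr`, `best_lag`, and `DROP` are hypothetical. If the lag that maximizes the normalized cross-correlation differs between a window before and a window after some point, samples were probably dropped in between.

```python
import math
import random


def normalized_xcorr(x, y, lag):
    """Normalized cross-correlation between x[n] and y[n + lag] on their overlap."""
    if lag >= 0:
        a, b = x[:len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:len(y) + lag]
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    den = math.sqrt(
        sum((p - mean_a) ** 2 for p in a) * sum((q - mean_b) ** 2 for q in b)
    )
    return num / den if den > 0 else 0.0


def best_lag(x, y, max_lag):
    """Integer lag in [-max_lag, max_lag] that maximizes the normalized correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: normalized_xcorr(x, y, lag))


# Synthetic example (assumption): the second device loses DROP samples at index 1000.
rng = random.Random(0)
DROP = 5
ref = [rng.gauss(0.0, 1.0) for _ in range(2000)]
dev = ref[:1000] + ref[1000 + DROP:]

lag_before = best_lag(dev[:800], ref[:800], max_lag=10)
lag_after = best_lag(dev[1100:1500], ref[1100:1500], max_lag=10)
print("lag before:", lag_before, "lag after:", lag_after)
print("estimated dropped samples:", lag_after - lag_before)
```

In this toy setting, the change in the best lag between the two windows equals the number of dropped samples. Real recordings would also involve clock drift, reverberation, and different source-to-microphone delays, which is presumably why the paper relies on a learned detector for the final decision.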

research
05/18/2020

Quaternion Neural Networks for Multi-channel Distant Speech Recognition

Despite the significant progress in automatic speech recognition (ASR), ...
research
06/14/2023

Research on an improved Conformer end-to-end Speech Recognition Model with R-Drop Structure

To address the issue of poor generalization ability in end-to-end speech...
research
03/24/2017

Batch-normalized joint training for DNN-based distant speech recognition

Improving distant speech recognition is a crucial step towards flexible ...
research
02/04/2019

Real-Time Steganalysis for Stream Media Based on Multi-channel Convolutional Sliding Windows

Previous VoIP steganalysis methods face great challenges in detecting sp...
research
11/26/2017

Realistic multi-microphone data simulation for distant speech recognition

The availability of realistic simulated corpora is of key importance for...
research
09/17/2021

Dual-Encoder Architecture with Encoder Selection for Joint Close-Talk and Far-Talk Speech Recognition

In this paper, we propose a dual-encoder ASR architecture for joint mode...
research
08/21/2023

LibriWASN: A Data Set for Meeting Separation, Diarization, and Recognition with Asynchronous Recording Devices

We present LibriWASN, a data set whose design follows closely the LibriC...
