End-Point Detection with State Transition Model based on Chunk-Wise Classification

12/22/2019
by   Juntae Kim, et al.

A state transition model (STM) based on chunk-wise classification is proposed for end-point detection (EPD). EPD is generally built from frame-wise voice activity detection (VAD) plus an STM, in which state transitions are driven by the VAD's frame-level decisions (speech or non-speech). However, VAD errors occur frequently in noisy environments, even with a state-of-the-art deep neural network based VAD, and these errors cause undesired state transitions in the STM. In this work, to build a robust STM, state transitions are driven by chunk-wise classification instead, since EPD does not need to operate at the frame level. A chunk consists of multiple frames, and it is classified as speech or non-speech by aggregating the VAD decisions for its frames, so that isolated VAD errors within a chunk are smoothed out by the other, correct decisions. Finally, the model is evaluated with both qualitative and quantitative measures, including phone error rate.
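The chunk-wise idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how frame decisions are aggregated or which states the STM uses, so the majority-style vote, the non-overlapping chunks, the two states, and every name and parameter here (`classify_chunks`, `detect_end_point`, `chunk_size`, `speech_ratio`, `min_silence_chunks`) are assumptions made for illustration.

```python
from typing import List, Optional


def classify_chunks(frame_decisions: List[int],
                    chunk_size: int = 5,
                    speech_ratio: float = 0.5) -> List[int]:
    """Aggregate frame-level VAD decisions (1 = speech, 0 = non-speech)
    into one label per non-overlapping chunk.

    Assumption: a chunk is speech when the fraction of speech frames
    exceeds `speech_ratio`. Trailing frames that do not fill a whole
    chunk are ignored.
    """
    labels = []
    for start in range(0, len(frame_decisions) - chunk_size + 1, chunk_size):
        chunk = frame_decisions[start:start + chunk_size]
        labels.append(1 if sum(chunk) / chunk_size > speech_ratio else 0)
    return labels


def detect_end_point(chunk_labels: List[int],
                     min_silence_chunks: int = 2) -> Optional[int]:
    """Run a hypothetical two-state STM over chunk labels.

    Returns the index of the chunk at which the end point is declared,
    or None if no end point is found.
    """
    state = "WAIT_SPEECH"
    silence_run = 0
    for i, label in enumerate(chunk_labels):
        if state == "WAIT_SPEECH":
            if label == 1:
                state = "IN_SPEECH"  # speech onset observed
        else:  # state == "IN_SPEECH"
            if label == 0:
                silence_run += 1
                if silence_run >= min_silence_chunks:
                    return i  # enough trailing non-speech chunks
            else:
                silence_run = 0  # speech resumed, reset the counter
    return None


# Frame-level VAD output with two isolated errors: a missed speech frame
# in the second chunk and a false alarm in the fourth chunk.
frames = [0, 0, 0, 0, 0,
          1, 1, 0, 1, 1,
          1, 1, 1, 1, 1,
          0, 0, 1, 0, 0,
          0, 0, 0, 0, 0]
chunk_labels = classify_chunks(frames)
end_chunk = detect_end_point(chunk_labels)
```

In this toy input, both isolated frame errors are absorbed by the vote inside their chunks, so the state machine sees a clean speech segment followed by non-speech. A frame-level STM would have reacted to each error individually.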


research
03/08/2021

Unsupervised Object-Based Transition Models for 3D Partially Observable Environments

We present a slot-wise, object-based transition model that decomposes a ...
research
03/04/2013

Denoising Deep Neural Networks Based Voice Activity Detection

Recently, the deep-belief-networks (DBN) based voice activity detection ...
research
10/19/2017

SLING: A framework for frame semantic parsing

We describe SLING, a framework for parsing natural language into semanti...
research
01/02/2017

Vid2speech: Speech Reconstruction from Silent Video

Speechreading is a notoriously difficult task for humans to perform. In ...
research
07/03/2019

End-to-End Speech Recognition with High-Frame-Rate Features Extraction

State-of-the-art end-to-end automatic speech recognition (ASR) extracts ...
research
06/25/2021

Voice Activity Detection for Transient Noisy Environment Based on Diffusion Nets

We address voice activity detection in acoustic environments of transien...
research
05/31/2020

Residual Excitation Skewness for Automatic Speech Polarity Detection

Detecting the correct speech polarity is a necessary step prior to sever...
