Dynamic Speech Endpoint Detection with Regression Targets

10/25/2022
by Dawei Liang, et al.

Interactive voice assistants are widely used as input interfaces in various scenarios, e.g. on smart home devices, wearables, and AR devices. Detecting the end of a speech query, i.e. speech end-pointing, is an important task for voice assistants to interact with users. Traditionally, speech end-pointing is based on pure classification methods along with arbitrary binary targets. In this paper, we propose a novel regression-based speech end-pointing model, which enables an end-pointer to adjust its detection behavior based on the context of user queries. Specifically, we present a pause modeling method and show its effectiveness for dynamic end-pointing. In our experiments with vendor-collected smartphone and wearables speech queries, our strategy shows a better trade-off between end-pointing latency and accuracy than the traditional classification-based method. We further discuss the benefits of this model and the generalization of the framework in the paper.
