E2E Spoken Entity Extraction for Virtual Agents

02/16/2023
by Karan Singla, et al.

This paper reimagines some aspects of speech processing using speech encoders, specifically the extraction of entities directly from speech with no intermediate textual representation. In human-computer conversations, extracting entities such as names, postal addresses, and email addresses from speech is a challenging task. We study the impact of fine-tuning pre-trained speech encoders to extract spoken entities in human-readable form directly from speech, without the need for text transcription. We show that this direct approach optimizes the encoder to transcribe only the entity-relevant portions of speech, ignoring superfluous portions such as carrier phrases and spellings of entities. On dialogs from an enterprise virtual agent, we demonstrate that the 1-step approach outperforms the typical 2-step cascade, which first generates lexical transcriptions and then applies text-based entity extraction to identify spoken entities.
