Investigating the Effects of Word Substitution Errors on Sentence Embeddings

11/16/2018
by   Rohit Voleti, et al.
0

A key initial step in several natural language processing (NLP) tasks involves embedding phrases of text to vectors of real numbers that preserve semantic meaning. To that end, several methods have been recently proposed with impressive results on semantic similarity tasks. However, all of these approaches assume that perfect transcripts are available when generating the embeddings. While this is a reasonable assumption for analysis of written text, it is limiting for analysis of transcribed text. In this paper we investigate the effects of word substitution errors, such as those coming from automatic speech recognition errors (ASR), on several state-of-the-art sentence embedding methods. To do this, we propose a new simulator that allows the experimenter to induce ASR-plausible word substitution errors in a corpus at a desired word error rate. We use this simulator to evaluate the robustness of several sentence embedding methods. Our results show that pre-trained encoders such as InferSent [1] are both robust to ASR errors and perform well on textual similarity tasks after errors are introduced. Meanwhile, unweighted averages perform well with perfect transcriptions, but their performance degrades rapidly on textual similarity tasks for text with word substitution errors.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/14/2022

RED-ACE: Robust Error Detection for ASR using Confidence Embeddings

ASR Error Detection (AED) models aim to post-process the output of Autom...
research
12/01/2020

Meta-Embeddings for Natural Language Inference and Semantic Similarity tasks

Word Representations form the core component for almost all advanced Nat...
research
11/09/2019

Sentence Meta-Embeddings for Unsupervised Semantic Textual Similarity

We address the task of unsupervised Semantic Textual Similarity (STS) by...
research
04/13/2020

Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?

Automatic Speech Recognition (ASR) systems introduce word errors, which ...
research
04/02/2022

Efficient comparison of sentence embeddings

The domain of natural language processing (NLP), which has greatly evolv...
research
10/28/2019

Sequence-to-sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding

In this paper, we investigate the benefit that off-the-shelf word embedd...
research
11/29/2022

Better Transcription of UK Supreme Court Hearings

Transcription of legal proceedings is very important to enable access to...

Please sign up or login with your details

Forgot password? Click here to reset