Improving Word Recognition in Speech Transcriptions by Decision-level Fusion of Stemming and Two-way Phoneme Pruning

by Sunakshi Mehra, et al.

We introduce an unsupervised approach for correcting highly imperfect speech transcriptions based on a decision-level fusion of stemming and two-way phoneme pruning. Transcripts are acquired from videos by extracting the audio with the FFmpeg framework and then converting the audio to a text transcript using the Google API. The benchmark LRW dataset has 500 word categories, with 50 videos per class in mp4 format. Each video consists of 29 frames (1.16 s long), and the word appears in the middle of the video. Our approach aims to improve on the baseline accuracy of 9.34% using stemming together with phoneme filtering and pruning. After applying the stemming algorithm to the text transcript and evaluating the results, we achieved a word recognition accuracy of 23.34%. To convert words to phonemes, we used the Carnegie Mellon University (CMU) Pronouncing Dictionary, which provides a phonetic mapping of English words to their pronunciations. We propose a two-way phoneme pruning that comprises two non-sequential steps: 1) filtering and pruning the phonemes containing vowels and plosives, and 2) filtering and pruning the phonemes containing vowels and fricatives. After obtaining the results of stemming and two-way phoneme pruning, we applied decision-level fusion, which improved the word recognition rate to as much as 32.96%.
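The pipeline described above (stemming, two phoneme-pruning passes, then fusion of the three decisions) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the tiny `PHONEMES` table is hand-typed in the style of the CMU dictionary (stress digits omitted), `crude_stem` is a stand-in for a real stemming algorithm, "pruning" is read as keeping only vowels plus one consonant class, and the fusion rule is a simple vote. The abstract does not specify any of these details, so this is not the authors' implementation.

```python
from collections import Counter

# ARPAbet phoneme classes as used by the CMU Pronouncing Dictionary.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}
PLOSIVES = {"P", "B", "T", "D", "K", "G"}
FRICATIVES = {"F", "V", "TH", "DH", "S", "Z", "SH", "ZH", "HH"}

# Hypothetical hand-typed excerpt; a real run would load the full dictionary.
PHONEMES = {
    "office": ["AO", "F", "AH", "S"],
    "price": ["P", "R", "AY", "S"],
    "prize": ["P", "R", "AY", "Z"],
    "think": ["TH", "IH", "NG", "K"],
    "thin": ["TH", "IH", "N"],
}
LABELS = ["office", "price", "think"]  # hypothetical target word classes


def crude_stem(word):
    """Strip one common suffix (stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def match_by_stem(token, labels):
    """Return the first label that shares a stem with the token, if any."""
    stem = crude_stem(token)
    return next((lab for lab in labels if crude_stem(lab) == stem), None)


def prune(phones, keep):
    """Keep only the phonemes that belong to the allowed classes."""
    return tuple(p for p in phones if p in keep)


def match_by_pruning(token, labels, keep):
    """Return the first label whose pruned phonemes equal the token's."""
    target = prune(PHONEMES.get(token, []), keep)
    if not target:  # unknown word or nothing left after pruning
        return None
    for lab in labels:
        if prune(PHONEMES.get(lab, []), keep) == target:
            return lab
    return None


def fuse(votes):
    """Decision-level fusion (assumed rule): most frequent non-empty vote."""
    votes = [v for v in votes if v is not None]
    return Counter(votes).most_common(1)[0][0] if votes else None


def recognize(token, labels=LABELS):
    """Combine the stemming decision with both pruning decisions."""
    return fuse([
        match_by_stem(token, labels),
        match_by_pruning(token, labels, VOWELS | PLOSIVES),
        match_by_pruning(token, labels, VOWELS | FRICATIVES),
    ])


for heard in ("offices", "prize", "thin", "banana"):
    print(heard, "->", recognize(heard))
```

In this toy setup each branch rescues a different kind of error: "offices" is recovered by stemming, "prize" matches "price" only after the vowel-plus-plosive pass, and "thin" matches "think" only after the vowel-plus-fricative pass, which is the intuition behind fusing the three decisions.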




Word-Level Coreference Resolution

Recent coreference resolution models rely heavily on span representation...

Audio-visual Recognition of Overlapped speech for the LRS2 dataset

Automatic recognition of overlapped speech remains a highly challenging ...

Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus

End-to-end (E2E) automatic speech recognition (ASR) systems lack the dis...

An Approach to Speed-up the Word Sense Disambiguation Procedure through Sense Filtering

In this paper, we are going to focus on speed up of the Word Sense Disam...

Lipi Gnani - A Versatile OCR for Documents in any Language Printed in Kannada Script

A Kannada OCR, named Lipi Gnani, has been designed and developed from sc...

Generation and Pruning of Pronunciation Variants to Improve ASR Accuracy

Speech recognition, especially name recognition, is widely used in phone...

Visual Features for Context-Aware Speech Recognition

Automatic transcriptions of consumer-generated multi-media content such ...