Scenario Aware Speech Recognition: Advancements for Apollo Fearless Steps & CHiME-4 Corpora

09/23/2021
by   Szu-Jui Chen, et al.

In this study, we investigate triplet loss as a means of learning an alternative feature representation for ASR. We consider a general non-semantic speech representation called TRILL, which is trained with a self-supervised criterion based on triplet loss, and use it in acoustic modeling to represent the acoustic characteristics of each audio sample. This strategy is then applied to the CHiME-4 corpus and the CRSS-UTDallas Fearless Steps Corpus, with emphasis on the 100-hour challenge corpus, which consists of 5 selected NASA Apollo-11 channels. An analysis of the extracted embeddings provides the foundation needed to characterize training utterances into distinct groups based on distinguishing acoustic properties. Moreover, we demonstrate that the triplet-loss based embedding performs better than i-Vector in acoustic modeling, confirming that the triplet loss is more effective than a speaker feature. With additional techniques such as pronunciation and silence probability modeling, plus multi-style training, we achieve a +5.42 for the development and evaluation sets of the Fearless Steps Corpus. To explore generalization, we further test the same technique on the 1 channel track of CHiME-4 and observe a +11.90 data.
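
For readers unfamiliar with the criterion named in the abstract, the following is a minimal sketch of a generic hinge-style triplet loss on embedding vectors. It is not the authors' implementation and not the exact TRILL training objective. The function names, the squared Euclidean distance, and the margin value are assumptions chosen for illustration. The idea is that an anchor embedding should be closer to a positive example than to a negative example by at least a margin.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # illustrative value, not taken from the paper
) -> float:
    """Hinge-style triplet loss for a single (anchor, positive, negative) triplet.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (in squared distance).
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


# Toy 2-D embeddings (hypothetical values for illustration only).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]        # squared distance 1 from the anchor
near_negative = [1.0, 0.0]   # squared distance 1 from the anchor
far_negative = [3.0, 0.0]    # squared distance 9 from the anchor

loss_near = triplet_loss(anchor, positive, near_negative)
loss_far = triplet_loss(anchor, positive, far_negative)
```

In this toy case the near negative violates the margin and yields a positive loss, while the far negative already satisfies it and contributes nothing.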


research
04/01/2022

Filter-based Discriminative Autoencoders for Children Speech Recognition

Children speech recognition is indispensable but challenging due to the ...
research
06/30/2022

FeaRLESS: Feature Refinement Loss for Ensembling Self-Supervised Learning Features in Robust End-to-end Speech Recognition

Self-supervised learning representations (SSLR) have resulted in robust ...
research
08/04/2020

"This is Houston. Say again, please". The Behavox system for the Apollo-11 Fearless Steps Challenge (phase II)

We describe the speech activity detection (SAD), speaker diarization (SD...
research
11/07/2018

Learning acoustic word embeddings with phonetically associated triplet network

Previous researches on acoustic word embeddings used in query-by-example...
research
10/24/2022

Investigating the effect of domain selection on automatic speech recognition performance: a case study on Bangladeshi Bangla

The performance of data-driven natural language processing systems is co...
research
08/15/2020

FEARLESS STEPS Challenge (FS-2): Supervised Learning with Massive Naturalistic Apollo Data

The Fearless Steps Initiative by UTDallas-CRSS led to the digitization, ...
research
09/04/2019

VoipLoc: Establishing VoIP call provenance using acoustic side-channels

We develop a novel technique to determine call provenance in anonymous V...
