Transsion TSUP's speech recognition system for ASRU 2023 MADASR Challenge

07/20/2023
by Xiaoxiao Li, et al.

This paper presents a speech recognition system developed by the Transsion Speech Understanding Processing Team (TSUP) for the ASRU 2023 MADASR Challenge. The system focuses on adapting ASR models to low-resource Indian languages and covers all four tracks of the challenge. For tracks 1 and 2, the acoustic model uses a Squeezeformer encoder and a bidirectional transformer decoder, trained with a joint CTC-attention loss. An external KenLM language model is also used during TLG beam search decoding. For tracks 3 and 4, pretrained IndicWhisper models are fine-tuned on both the challenge dataset and publicly available datasets. The Whisper beam search decoding is also modified to support an external KenLM language model, which makes better use of the additional text provided by the challenge. The proposed method achieved word error rates (WER) of 24.17 [...] for the Bengali language in the four tracks, and WER of 19.61 [...] 15.48 [...] effectiveness of the proposed method.
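
The two scoring ideas named in the abstract, a joint CTC-attention training loss and an external language model applied during beam search, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the weights `ctc_weight` and `lm_weight`, and the example hypotheses are all assumptions made for illustration, and the abstract does not report the values actually used. The fusion rule shown is plain log-linear shallow fusion, which is one common way to combine an ASR score with an n-gram LM score.

```python
from typing import List, Tuple


def joint_ctc_attention_loss(ctc_loss: float, attention_loss: float,
                             ctc_weight: float = 0.3) -> float:
    """Interpolate the CTC loss and the attention-decoder loss.

    ctc_weight is a hypothetical value; the abstract does not state it.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


def fused_score(asr_logp: float, lm_logp: float, lm_weight: float = 0.5) -> float:
    """Shallow fusion: add the weighted LM log-probability to the ASR score."""
    return asr_logp + lm_weight * lm_logp


def pick_best(hypotheses: List[Tuple[str, float, float]],
              lm_weight: float = 0.5) -> str:
    """Return the text of the hypothesis with the highest fused score.

    Each hypothesis is (text, asr_log_prob, lm_log_prob). In a real system
    the LM log-probability would come from an external model such as KenLM;
    here it is supplied directly so the sketch stays self-contained.
    """
    best = max(hypotheses, key=lambda h: fused_score(h[1], h[2], lm_weight))
    return best[0]


if __name__ == "__main__":
    # Made-up scores: the acoustically preferred hypothesis has a poor LM score.
    beam = [("hypothesis a", -1.0, -10.0), ("hypothesis b", -1.5, -2.0)]
    print(joint_ctc_attention_loss(2.0, 1.0, ctc_weight=0.5))
    print(pick_best(beam, lm_weight=0.5))
    print(pick_best(beam, lm_weight=0.0))
```

With `lm_weight=0.0` the ranking depends on the acoustic score alone, so comparing the two `pick_best` calls shows how the external language model can change which hypothesis is selected.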

research
02/22/2022

Korean Tokenization for Beam Search Rescoring in Speech Recognition

The performance of automatic speech recognition (ASR) models can be grea...
research
10/31/2022

Blank Collapse: Compressing CTC emission for the faster decoding

Connectionist Temporal Classification (CTC) model is a very efficient me...
research
07/24/2017

Exploring Neural Transducers for End-to-End Speech Recognition

In this work, we perform an empirical comparison among the CTC, RNN-Tran...
research
04/05/2021

SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network

We present SpeechStew, a speech recognition model that is trained on a c...
research
11/03/2022

The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results

This paper summarizes the outcomes from the ISCSLP 2022 Intelligent Cock...
research
03/21/2022

Enhancing Speech Recognition Decoding via Layer Aggregation

Recently proposed speech recognition systems are designed to predict usi...
research
05/08/2019

A Hardware-Oriented and Memory-Efficient Method for CTC Decoding

The Connectionist Temporal Classification (CTC) has achieved great succe...