Adapting an ASR Foundation Model for Spoken Language Assessment

07/13/2023
by Rao Ma, et al.

A crucial part of an accurate and reliable spoken language assessment system is the underlying ASR model. Recently, large-scale pre-trained ASR foundation models such as Whisper have been made available. Because the output of these models is designed to be human readable, punctuation is added, numbers are written as Arabic numerals, and abbreviations are used. These models also tend to omit disfluencies and hesitations from the output. Though useful for readability, these attributes are unhelpful for assessing a candidate's ability and providing feedback, where a precise transcription of what the candidate said is needed. In this paper, we give a detailed analysis of Whisper outputs and propose two solutions: fine-tuning and soft prompt tuning. Experiments are conducted on both public speech corpora and an English learner dataset. Results show that we can effectively alter the decoding behaviour of Whisper to generate the exact words spoken in the response.
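The mismatch described above can be made concrete with a small sketch. It compares a verbatim transcript with a readable-style one using a word-level edit distance. The two example strings, the function names, and the choice of word error rate as the measure are assumptions made for illustration; none of them are taken from the paper.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase the text and drop punctuation, keeping digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_errors(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance: substitutions + insertions + deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a hypothesis word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate of the hypothesis against the reference."""
    ref, hyp = words(reference), words(hypothesis)
    return word_errors(ref, hyp) / len(ref)


# Hypothetical example: what a learner might say vs. a readable-style transcript.
verbatim = "um i have i have two brothers and uh one sister"
readable = "I have 2 brothers and 1 sister."

print(f"errors = {word_errors(words(verbatim), words(readable))}")
print(f"WER    = {wer(verbatim, readable):.2f}")
```

In this sketch, the hesitations and the repetition count as deletions and the digits count as substitutions, so a transcript that reads well still scores poorly against what was actually said.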
