Knowledge distillation from language model to acoustic model: a hierarchical multi-task learning approach

10/20/2021
by   Mun-Hak Lee, et al.

The remarkable performance of pre-trained language models (LMs) trained with self-supervised learning has led to a major paradigm shift in natural language processing research. In line with this shift, leveraging massive deep-learning-based LMs to improve the performance of speech recognition systems has become a major topic of speech recognition research. Among the various ways of applying LMs to speech recognition systems, this paper focuses on a cross-modal knowledge distillation method that transfers knowledge between two deep neural networks of different modalities. We propose an acoustic model structure with multiple auxiliary output layers for cross-modal distillation and demonstrate that the proposed method effectively compensates for the shortcomings of the existing label-interpolation-based distillation method. In addition, we extend the proposed method to a hierarchical distillation method that uses LMs trained on different units (senones, monophones, and subwords), and show the effectiveness of this hierarchical approach through an ablation study.
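The abstract only outlines the approach at a high level. The sketch below is a rough illustration, not the authors' implementation: a shared acoustic encoder with a main senone output layer and auxiliary monophone and subword layers, where each head's loss interpolates hard-label cross-entropy with a KL term against soft targets produced by an LM at the corresponding unit level. The BLSTM encoder, all dimensions, the temperature, and the loss weights are assumptions made purely for illustration; the paper's exact architecture and distillation objective may differ.

```python
# Minimal sketch (not the authors' released code) of an acoustic model with
# auxiliary output layers distilled against LM soft targets at several unit
# levels (senone / monophone / subword). All sizes and weights are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadAcousticModel(nn.Module):
    """Shared acoustic encoder with one output layer per target unit."""

    def __init__(self, feat_dim=80, hidden_dim=512,
                 n_senones=4000, n_monophones=40, n_subwords=500):
        super().__init__()
        # A small BLSTM stands in for the real acoustic encoder.
        self.encoder = nn.LSTM(feat_dim, hidden_dim, num_layers=3,
                               batch_first=True, bidirectional=True)
        enc_dim = hidden_dim * 2
        # Main head (senones) plus auxiliary heads (monophones, subwords).
        self.heads = nn.ModuleDict({
            "senone": nn.Linear(enc_dim, n_senones),
            "monophone": nn.Linear(enc_dim, n_monophones),
            "subword": nn.Linear(enc_dim, n_subwords),
        })

    def forward(self, feats):                       # feats: (B, T, feat_dim)
        enc, _ = self.encoder(feats)                # (B, T, 2 * hidden_dim)
        return {name: head(enc) for name, head in self.heads.items()}


def head_loss(student_logits, teacher_probs, hard_labels,
              alpha=0.5, temperature=2.0):
    """Interpolate hard-label CE with KL divergence to the LM's soft targets."""
    ce = F.cross_entropy(student_logits.transpose(1, 2), hard_labels)
    log_q = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_q, teacher_probs, reduction="batchmean") * temperature ** 2
    return (1.0 - alpha) * ce + alpha * kd


if __name__ == "__main__":
    B, T = 4, 100
    model = MultiHeadAcousticModel()
    outputs = model(torch.randn(B, T, 80))

    sizes = {"senone": 4000, "monophone": 40, "subword": 500}
    weights = {"senone": 1.0, "monophone": 0.3, "subword": 0.3}   # assumed task weights
    total = torch.zeros(())
    for name, logits in outputs.items():
        hard = torch.randint(0, sizes[name], (B, T))               # dummy alignments
        soft = F.softmax(torch.randn(B, T, sizes[name]), dim=-1)   # dummy LM soft targets
        total = total + weights[name] * head_loss(logits, soft, hard)
    total.backward()
    print(f"total multi-task distillation loss: {total.item():.3f}")
```

In this sketch each head uses a simple label-interpolation-style loss; per the abstract, what distinguishes the proposed method from plain label interpolation is attaching such distillation signals to multiple auxiliary output layers at different unit granularities rather than to a single output layer.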
