Quantization of Acoustic Model Parameters in Automatic Speech Recognition Framework
Robust automatic speech recognition (ASR) systems exploit state-of-the-art deep neural network (DNN) based acoustic models (AM) trained with the Lattice-Free Maximum Mutual Information (LF-MMI) criterion and n-gram language models. These systems are quite large and require significant parameter reduction to operate on embedded devices. The impact of parameter quantization on overall word recognition performance is studied in this paper. The following three approaches are presented: (i) an AM trained in the Kaldi framework with the conventional factorized TDNN (TDNN-F) architecture; (ii) the TDNN built in Kaldi is loaded into the PyTorch toolkit using a C++ wrapper, after which the weights and activation parameters are quantized and inference is performed in PyTorch; (iii) post-quantization training for fine-tuning. Results obtained on the standard LibriSpeech setup provide an interesting overview of recognition accuracy with respect to the applied quantization scheme.
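As a minimal illustration of the quantized-inference step in approach (ii), the sketch below applies PyTorch's dynamic quantization to the linear layers of a toy feed-forward stand-in for a TDNN-F acoustic model. The model definition, layer sizes, and output dimension are placeholder assumptions; the paper itself loads the actual Kaldi-trained weights through a C++ wrapper rather than defining the network in Python.

    import torch
    import torch.nn as nn

    # Hypothetical stand-in for a TDNN-F acoustic model; the paper
    # instead imports Kaldi-trained weights via a C++ wrapper.
    class ToyTDNN(nn.Module):
        def __init__(self, feat_dim=40, hidden_dim=256, num_pdfs=512):
            super().__init__()
            self.layers = nn.Sequential(
                nn.Linear(feat_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, num_pdfs),
            )

        def forward(self, x):
            return self.layers(x)

    model = ToyTDNN().eval()

    # Quantize the weights of all Linear layers to int8; activations
    # are quantized dynamically at inference time.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    # Run inference on a dummy 40-dimensional feature frame.
    frame = torch.randn(1, 40)
    outputs = quantized(frame)
    print(outputs.shape)  # torch.Size([1, 512])

Dynamic quantization is only one of several schemes PyTorch offers; static (post-training) quantization and quantization-aware training, which corresponds more closely to the fine-tuning of approach (iii), require calibration or retraining passes not shown here.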