A multilingual approach to joint Speech and Accent Recognition with DNN-HMM framework
Humans can perform multi-task recognition from speech. For instance, a listener can recognize both the words being spoken and the speaker's accent at the same time. However, present state-of-the-art speech recognition systems can rarely do so. In this paper, we propose a multilingual approach to recognizing English speech together with the speaker's accent, using the DNN-HMM framework. Specifically, we treat different accents of English as different languages, merge them, and train a multilingual speech recognition system. During decoding, we conduct two sets of experiments. The first uses a monolingual Automatic Speech Recognition (ASR) system with the accent information embedded only at the phone level, realizing word-based accent recognition. The second uses a multilingual ASR system with the accent information embedded at both the word and phone levels, realizing an approximate utterance-based accent recognition.
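To make the two setups concrete, the following is a minimal sketch of how accent tags might be embedded in a pronunciation lexicon and read back after decoding. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `_` tag separator, the toy lexicons, and the majority vote used for the utterance-level decision are all hypothetical choices that the abstract does not specify.

```python
from collections import Counter

def tag_lexicon(lexicon, accent, tag_words):
    """Copy `lexicon` with every phone suffixed by the accent tag.

    If `tag_words` is True the word itself is tagged too (the assumed
    "multilingual" setup); otherwise only phones carry the accent (the
    assumed "monolingual" setup). The "_" separator is an assumption.
    """
    tagged = {}
    for word, phones in lexicon.items():
        key = f"{word}_{accent}" if tag_words else word
        tagged[key] = [f"{p}_{accent}" for p in phones]
    return tagged

def merge_lexicons(lexicons_by_accent, tag_words):
    """Merge per-accent lexicons into one; each entry maps to a list of pronunciations."""
    merged = {}
    for accent, lexicon in lexicons_by_accent.items():
        for word, phones in tag_lexicon(lexicon, accent, tag_words).items():
            merged.setdefault(word, []).append(phones)
    return merged

def utterance_accent(decoded_words):
    """Approximate the utterance accent by majority vote over word-level tags.

    Majority voting is an assumption made for this sketch.
    """
    votes = Counter(word.rsplit("_", 1)[1] for word in decoded_words)
    return votes.most_common(1)[0][0]

# Toy, made-up lexicons for two hypothetical accent labels.
lexicons = {
    "us": {"water": ["w", "ao", "t", "er"]},
    "uk": {"water": ["w", "ao", "t", "ah"]},
}

phone_level = merge_lexicons(lexicons, tag_words=False)  # one word, two tagged pronunciations
word_level = merge_lexicons(lexicons, tag_words=True)    # two accent-specific words
accent = utterance_accent(["water_uk", "is_uk", "cold_us"])
```

In the phone-level variant the decoder's word output stays accent-free, and the accent of each word would be read from the tags on its aligned phones. In the word-level variant the decoded words carry the tag directly, which is what allows a vote across the utterance.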