Multi-channel Acoustic Modeling using Mixed Bitrate OPUS Compression

by Aparna Khare, et al.

Recent literature has shown that a learned front end with multi-channel audio input can outperform traditional beamforming algorithms for automatic speech recognition (ASR). In this paper, we present our study on multi-channel acoustic modeling using OPUS compression with different bitrates for the different channels. We analyze the degradation in word error rate (WER) as a function of the audio encoding bitrate and show that the WER degrades by 12.6% relative at 16 kbps compared to uncompressed audio. We show that, given limited bandwidth, it is always preferable to have a multi-channel audio input over a single-channel audio input. Our results show that, for the best WER, when one of the two channels can be encoded with a bitrate higher than 32 kbps, it is optimal to encode the other channel with the highest bitrate possible. For bitrates lower than that, it is preferable to distribute the bitrate equally between the two channels. We further show that by training the acoustic model on mixed bitrate input, up to 50% of the degradation can be recovered using a single model.
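
The abstract reports WER degradation as a relative figure. As a minimal sketch of that arithmetic, assuming the usual definition of relative degradation (the change in WER divided by the baseline WER), the calculation could look like the following. The function name `relative_wer_degradation` and the two WER values are hypothetical, chosen only to illustrate the computation; they are not results from the paper.

```python
def relative_wer_degradation(baseline_wer: float, degraded_wer: float) -> float:
    """Return the relative WER degradation in percent.

    Assumes the common definition: (degraded - baseline) / baseline * 100.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (degraded_wer - baseline_wer) / baseline_wer


# Hypothetical WER values (in percent), used only to show the arithmetic.
baseline = 10.0      # assumed WER on uncompressed audio
compressed = 11.26   # assumed WER on compressed audio

print(f"{relative_wer_degradation(baseline, compressed):.1f}% relative")
```

A relative figure of this kind depends on the baseline: the same absolute increase in WER yields a larger relative degradation when the baseline WER is lower.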


Fully Learnable Front-End for Multi-Channel Acoustic Modeling using Semi-Supervised Learning

In this work, we investigated the teacher-student training paradigm to t...

Acoustic Features Fusion using Attentive Multi-channel Deep Architecture

In this paper, we present a novel deep fusion architecture for audio cla...

ChannelAugment: Improving generalization of multi-channel ASR by training with input channel randomization

End-to-end (E2E) multi-channel ASR systems show state-of-the-art perform...

Speech bandwidth extension with WaveNet

Large-scale mobile communication systems tend to contain legacy transmis...

Multi-Channel FFT Architectures Designed via Folding and Interleaving

Computing the FFT of a single channel is well understood in the literatu...

Bandwidth Embeddings for Mixed-bandwidth Speech Recognition

In this paper, we tackle the problem of handling narrowband and wideband...

Multi-View Networks For Multi-Channel Audio Classification

In this paper we introduce the idea of multi-view networks for sound cla...