Multi-channel Acoustic Modeling using Mixed Bitrate OPUS Compression

02/01/2020
by Aparna Khare, et al.

Recent literature has shown that a learned front end with multi-channel audio input can outperform traditional beamforming algorithms for automatic speech recognition (ASR). In this paper, we present our study on multi-channel acoustic modeling using OPUS compression with different bitrates for the different channels. We analyze the degradation in word error rate (WER) as a function of the audio encoding bitrate and show that the WER degrades by 12.6% relative at 16 kbps compared to uncompressed audio. We show that, given limited bandwidth, it is always preferable to have a multi-channel audio input over a single-channel audio input. Our results show that, for the best WER, when one of the two channels can be encoded with a bitrate higher than 32 kbps, it is optimal to encode the other channel with the highest bitrate possible. For bitrates lower than that, it is preferable to distribute the bitrate equally between the two channels. We further show that by training the acoustic model on mixed bitrate input, up to 50% of the degradation can be recovered using a single model.
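
The abstract reports degradation as a relative change in WER. As a minimal sketch of how such a figure is computed, the snippet below uses a hypothetical helper, `relative_wer_degradation`, and made-up WER values chosen only to illustrate the arithmetic; they are not results from the paper.

```python
def relative_wer_degradation(baseline_wer: float, degraded_wer: float) -> float:
    """Return the relative WER degradation in percent.

    baseline_wer: WER on the reference condition (e.g. uncompressed audio).
    degraded_wer: WER on the condition being compared (e.g. compressed audio).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (degraded_wer - baseline_wer) / baseline_wer * 100.0


# Illustrative (hypothetical) values: a baseline WER of 10.0% rising to 11.26%
# corresponds to a relative degradation of about 12.6%.
example = relative_wer_degradation(10.0, 11.26)
print(f"{example:.1f}% relative degradation")
```

A relative figure of this kind depends on the baseline: the same absolute WER increase yields a larger relative degradation when the baseline WER is lower.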

research
02/01/2020

Fully Learnable Front-End for Multi-Channel Acoustic Modeling using Semi-Supervised Learning

In this work, we investigated the teacher-student training paradigm to t...
research
11/02/2018

Acoustic Features Fusion using Attentive Multi-channel Deep Architecture

In this paper, we present a novel deep fusion architecture for audio cla...
research
09/23/2021

ChannelAugment: Improving generalization of multi-channel ASR by training with input channel randomization

End-to-end (E2E) multi-channel ASR systems show state-of-the-art perform...
research
02/19/2022

Multi-Channel FFT Architectures Designed via Folding and Interleaving

Computing the FFT of a single channel is well understood in the literatu...
research
09/05/2019

Bandwidth Embeddings for Mixed-bandwidth Speech Recognition

In this paper, we tackle the problem of handling narrowband and wideband...
research
11/03/2018

Multi-View Networks For Multi-Channel Audio Classification

In this paper we introduce the idea of multi-view networks for sound cla...
research
06/15/2021

Multi-channel Opus compression for far-field automatic speech recognition with a fixed bitrate budget

Automatic speech recognition (ASR) in the cloud allows the use of larger...
