
Channel-Aware Pretraining of Joint Encoder-Decoder Self-Supervised Model for Telephonic-Speech ASR

11/03/2022
by Vrunda N. Sukhadia, et al.

This paper proposes a novel technique for obtaining better downstream ASR performance from a joint encoder-decoder self-supervised model trained on speech pooled from two different channels (narrowband and wideband). The joint encoder-decoder self-supervised model extends the HuBERT model with a Transformer decoder. HuBERT clusters acoustic features and predicts the cluster ID (class) of every input frame. With simple pooling, which is our baseline, the model has no way to identify which channel a frame came from. To incorporate channel information, we propose assigning non-overlapping cluster IDs to speech from different channels. Our method gives a relative improvement of ~5% over the joint encoder-decoder self-supervised model built with simple pooling of data.
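The difference between simple pooling and channel-aware cluster IDs can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes k-means is run separately on each channel's frames with the same number of clusters K, and that narrowband IDs are shifted by K so the two label ranges cannot overlap. The channel names, function names, and toy 2-D features are all hypothetical; real frame features would be much higher-dimensional acoustic representations.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
CHANNELS = ("wideband", "narrowband")  # hypothetical channel names; order fixes the ID offset


def nearest(frame: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to `frame` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(frame, centroids[j])))


def kmeans(frames: Sequence[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means; returns k centroids (stand-in for any real clusterer)."""
    rng = random.Random(seed)
    centroids = [tuple(c) for c in rng.sample(list(frames), k)]
    for _ in range(iters):
        buckets: List[List[Vector]] = [[] for _ in range(k)]
        for f in frames:
            buckets[nearest(f, centroids)].append(f)
        for j, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster goes empty
                dim = len(members[0])
                centroids[j] = tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
    return centroids


def pooled_labels(frames_by_channel: Dict[str, List[Vector]], k: int, seed: int = 0) -> Dict[str, List[int]]:
    """Baseline: one clustering over pooled data, so both channels share IDs 0..k-1."""
    pooled = [f for ch in CHANNELS for f in frames_by_channel[ch]]
    centroids = kmeans(pooled, k, seed=seed)
    return {ch: [nearest(f, centroids) for f in frames_by_channel[ch]] for ch in CHANNELS}


def channel_aware_labels(frames_by_channel: Dict[str, List[Vector]], k: int, seed: int = 0) -> Dict[str, List[int]]:
    """Cluster each channel separately and offset IDs so the label ranges are disjoint."""
    labels: Dict[str, List[int]] = {}
    for index, ch in enumerate(CHANNELS):
        frames = frames_by_channel[ch]
        centroids = kmeans(frames, k, seed=seed)
        offset = index * k  # wideband -> 0..k-1, narrowband -> k..2k-1
        labels[ch] = [offset + nearest(f, centroids) for f in frames]
    return labels


if __name__ == "__main__":
    wide = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    narrow = [(0.0, 0.1), (0.0, 0.2), (4.9, 5.0), (5.0, 4.9)]
    data = {"wideband": wide, "narrowband": narrow}
    print("pooled:       ", pooled_labels(data, k=2))
    print("channel-aware:", channel_aware_labels(data, k=2))
```

Under pooling, a target ID says nothing about the channel; in the channel-aware variant, any ID of K or above can only come from narrowband speech, so the prediction targets themselves carry the channel information.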

06/09/2022

Joint Encoder-Decoder Self-Supervised Pre-training for ASR

Self-supervised learning (SSL) has shown tremendous success in various s...
11/18/2022

AVATAR submission to the Ego4D AV Transcription Challenge

In this report, we describe our submission to the Ego4D AudioVisual (AV)...
06/05/2020

Self-Supervised Encoder for Fault Prediction in Electrochemical Cells

Predicting faults before they occur helps to avoid potential safety haza...
06/01/2022

MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining

In this paper, we present a model pretraining technique, named MaskOCR, ...
02/02/2023

Energy-Inspired Self-Supervised Pretraining for Vision Models

Motivated by the fact that forward and backward passes of a deep network...
12/17/2021

Watermarking Images in Self-Supervised Latent Spaces

We revisit watermarking techniques based on pre-trained deep networks, i...
03/30/2022

MAE-AST: Masked Autoencoding Audio Spectrogram Transformer

In this paper, we propose a simple yet powerful improvement over the rec...