
-
Streaming end-to-end multi-talker speech recognition
End-to-end multi-talker speech recognition is an emerging research trend...
-
Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition
The external language models (LM) integration remains a challenging task...
-
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer
Hybrid Autoregressive Transducer (HAT) is a recently proposed end-to-end...
-
Microsoft Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2020
This paper describes the Microsoft speaker diarization system for monaur...
-
Speaker Separation Using Speaker Inventories and Estimated Speech
We propose speaker separation using speaker inventories and estimated sp...
-
Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability
Because of its streaming nature, recurrent neural network transducer (RN...
-
Exploring Transformers for Large-Scale Speech Recognition
While recurrent neural networks still largely define state-of-the-art sp...
-
Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition
Recently, the recurrent neural network transducer (RNN-T) architecture h...
-
L-Vector: Neural Label Embedding for Domain Adaptation
We propose a novel neural label embedding (NLE) scheme for the domain ad...
-
Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR
Recently, a few novel streaming attention-based sequence-to-sequence (S2...
-
High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model
While the community keeps promoting end-to-end models over conventional ...
-
A Privacy-Preserving DNN Pruning and Mobile Acceleration Framework
To facilitate the deployment of deep neural networks (DNNs) on resource-...
-
RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition
Recurrent neural networks (RNNs) based automatic speech recognition has ...
-
SS-Auto: A Single-Shot, Automatic Structured Weight Pruning Framework of DNNs with Ultra-High Efficiency
Structured weight pruning is a representative model compression techniqu...
-
BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method
Accelerating DNN execution on various resource-limited computing platfor...
-
Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition
Teacher-student (T/S) has shown to be effective for domain adaptation of...
-
Character-Aware Attention-Based End-to-End Speech Recognition
Predicting words and subword units (WSUs) as the output has shown to be ...
-
Advances in Online Audio-Visual Meeting Transcription
This paper describes a system that generates speaker-annotated transcrip...
-
Speaker Adaptation for Attention-Based End-to-End Speech Recognition
We propose three regularization-based speaker adaptation approaches to a...
-
Improving RNN Transducer Modeling for End-to-End Speech Recognition
In the last few years, an emerging trend in automatic speech recognition...
-
Self-Teaching Networks
We propose self-teaching networks to improve the generalization capacity...
-
PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch
We introduce PyKaldi2 speech recognition toolkit implemented based on Ka...
-
Encrypted Speech Recognition using Deep Polynomial Networks
The cloud-based speech recognition/API provides developers or enterprise...
-
Adversarial Speaker Adaptation
We propose a novel adversarial speaker adaptation (ASA) scheme, in which...
-
Adversarial Speaker Verification
The use of deep networks to extract embeddings for speaker recognition h...
-
Attentive Adversarial Learning for Domain-Invariant Training
Adversarial domain-invariant training (ADIT) proves to be effective in s...
-
Conditional Teacher-Student Learning
The teacher-student (T/S) learning has been shown to be effective for a ...
-
Speaker Adaptation for End-to-End CTC Models
We propose two approaches for speaker adaptation in end-to-end (E2E) aut...
-
Advancing Acoustic-to-Word CTC Model with Attention and Mixed-Units
The acoustic-to-word model based on the Connectionist Temporal Classific...
-
Cycle-Consistent Speech Enhancement
Feature mapping using deep neural networks is an effective approach for ...
-
Adversarial Feature-Mapping for Speech Enhancement
Feature-mapping with deep neural networks is commonly used for single-ch...
-
Layer Trajectory LSTM
It is popular to stack LSTM layers to get better modeling power, especia...
-
Developing Far-Field Speaker System Via Teacher-Student Learning
In this study, we develop the keyword spotting (KWS) and acoustic model ...
-
Speaker-Invariant Training via Adversarial Learning
We propose a novel adversarial multi-task learning scheme, aiming at act...
-
Adversarial Teacher-Student Learning for Unsupervised Domain Adaptation
The teacher-student (T/S) learning has been shown effective in unsupervi...
-
Cracking the cocktail party problem by multi-beam deep attractor network
While recent progresses in neural network approaches to single-channel s...
-
Advancing Acoustic-to-Word CTC Model
The acoustic-to-word model based on the connectionist temporal classific...
-
Advancing Connectionist Temporal Classification With Attention Modeling
In this study, we propose advancing all-neural speech recognition by dir...
-
Acoustic-To-Word Model Without OOV
Recently, the acoustic-to-word model based on the Connectionist Temporal...
-
Unsupervised Adaptation with Domain Separation Networks for Robust Speech Recognition
Unsupervised domain adaptation of speech signal aims at adapting a well-...
-
Large-Scale Domain Adaptation via Teacher-Student Learning
High accuracy speech recognition requires a large amount of transcribed ...
-
End-to-End Attention based Text-Dependent Speaker Verification
A new type of End-to-End system for text-dependent speaker verification ...