
-
Streaming end-to-end multi-talker speech recognition
End-to-end multi-talker speech recognition is an emerging research trend...
-
Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations
In this study, we try to address the problem of leveraging visual signal...
-
Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis
Multi-speaker speech recognition of unsegmented recordings has diverse a...
-
Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition
The external language models (LM) integration remains a challenging task...
-
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer
Hybrid Autoregressive Transducer (HAT) is a recently proposed end-to-end...
-
Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer
With its strong modeling capacity that comes from a multi-head and multi...
-
Microsoft Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2020
This paper describes the Microsoft speaker diarization system for monaur...
-
Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset
Recently, Transformer based end-to-end models have achieved great succes...
-
Speaker Separation Using Speaker Inventories and Estimated Speech
We propose speaker separation using speaker inventories and estimated sp...
-
An End-to-end Architecture of Online Multi-channel Speech Separation
Multi-speaker speech recognition has been one of the key challenges in co...
-
Adaptation Algorithms for Speech Recognition: An Overview
We present a structured overview of adaptation algorithms for neural net...
-
Continuous Speech Separation with Conformer
Continuous speech separation plays a vital role in complicated speech re...
-
Transfer Learning Approaches for Streaming End-to-End Speech Recognition System
Transfer learning (TL) is widely used in conventional hybrid automatic s...
-
Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability
Because of its streaming nature, recurrent neural network transducer (RN...
-
On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition
Recently, there has been a strong push to transition from hybrid models ...
-
Exploring Transformers for Large-Scale Speech Recognition
While recurrent neural networks still largely define state-of-the-art sp...
-
Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition
Recently, the recurrent neural network transducer (RNN-T) architecture h...
-
L-Vector: Neural Label Embedding for Domain Adaptation
We propose a novel neural label embedding (NLE) scheme for the domain ad...
-
Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR
Recently, a few novel streaming attention-based sequence-to-sequence (S2...
-
High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model
While the community keeps promoting end-to-end models over conventional ...
-
Continuous speech separation: dataset and analysis
This paper describes a dataset and protocols for evaluating continuous s...
-
Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition
Teacher-student (T/S) has shown to be effective for domain adaptation of...
-
Character-Aware Attention-Based End-to-End Speech Recognition
Predicting words and subword units (WSUs) as the output has shown to be ...
-
Semantic Mask for Transformer based End-to-End Speech Recognition
Attention-based encoder-decoder model has achieved impressive results fo...
-
Speaker Adaptation for Attention-Based End-to-End Speech Recognition
We propose three regularization-based speaker adaptation approaches to a...
-
Improving RNN Transducer Modeling for End-to-End Speech Recognition
In the last few years, an emerging trend in automatic speech recognition...
-
Adversarial Speaker Adaptation
We propose a novel adversarial speaker adaptation (ASA) scheme, in which...
-
Adversarial Speaker Verification
The use of deep networks to extract embeddings for speaker recognition h...
-
Attentive Adversarial Learning for Domain-Invariant Training
Adversarial domain-invariant training (ADIT) proves to be effective in s...
-
Conditional Teacher-Student Learning
The teacher-student (T/S) learning has been shown to be effective for a ...
-
Speaker Adaptation for End-to-End CTC Models
We propose two approaches for speaker adaptation in end-to-end (E2E) aut...
-
Advancing Acoustic-to-Word CTC Model with Attention and Mixed-Units
The acoustic-to-word model based on the Connectionist Temporal Classific...
-
Cycle-Consistent Speech Enhancement
Feature mapping using deep neural networks is an effective approach for ...
-
Adversarial Feature-Mapping for Speech Enhancement
Feature-mapping with deep neural networks is commonly used for single-ch...
-
Layer Trajectory LSTM
It is popular to stack LSTM layers to get better modeling power, especia...
-
Recent Progresses in Deep Learning based Acoustic Models (Updated)
In this paper, we summarize recent progresses made in deep learning base...
-
Developing Far-Field Speaker System Via Teacher-Student Learning
In this study, we develop the keyword spotting (KWS) and acoustic model ...
-
Speaker-Invariant Training via Adversarial Learning
We propose a novel adversarial multi-task learning scheme, aiming at act...
-
Adversarial Teacher-Student Learning for Unsupervised Domain Adaptation
The teacher-student (T/S) learning has been shown effective in unsupervi...
-
Cracking the cocktail party problem by multi-beam deep attractor network
While recent progresses in neural network approaches to single-channel s...
-
Advancing Acoustic-to-Word CTC Model
The acoustic-to-word model based on the connectionist temporal classific...
-
Advancing Connectionist Temporal Classification With Attention Modeling
In this study, we propose advancing all-neural speech recognition by dir...
-
Acoustic-To-Word Model Without OOV
Recently, the acoustic-to-word model based on the Connectionist Temporal...
-
Unsupervised Adaptation with Domain Separation Networks for Robust Speech Recognition
Unsupervised domain adaptation of speech signal aims at adapting a well-...
-
Improved training for online end-to-end speech recognition systems
Achieving high accuracy with end-to-end speech recognizers requires care...
-
Large-Scale Domain Adaptation via Teacher-Student Learning
High accuracy speech recognition requires a large amount of transcribed ...
-
Progressive Joint Modeling in Unsupervised Single-channel Overlapped Speech Recognition
Unsupervised single-channel overlapped speech recognition is one of the ...
-
End-to-End Attention based Text-Dependent Speaker Verification
A new type of End-to-End system for text-dependent speaker verification ...
-
Learning Hidden Unit Contributions for Unsupervised Acoustic Model Adaptation
This work presents a broad study on the adaptation of neural network aco...
-
Feature Learning in Deep Neural Networks - Studies on Speech Recognition Tasks
Recent studies have shown that deep neural networks (DNNs) perform signi...