Yifan Gong

research

∙ 09/14/2023

Hybrid Attention-based Encoder-decoder Model for Efficient Language Model Adaptation

Attention-based encoder-decoder (AED) speech recognition model has been ...

0 Shaoshi Ling, et al. ∙

research

∙ 04/30/2023

DualHSIC: HSIC-Bottleneck and Alignment for Continual Learning

Rehearsal-based approaches are a mainstay of continual learning (CL). Th...

0 Zifeng Wang, et al. ∙

research

∙ 03/13/2023

Can Adversarial Examples Be Parsed to Reveal Victim Model Information?

Numerous adversarial attack methods have been developed to generate impe...

8 Yuguang Yao, et al. ∙

research

∙ 03/01/2023

Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training

We propose gated language experts to improve multilingual transformer tr...

0 Eric Sun, et al. ∙

research

∙ 12/09/2022

All-in-One: A Highly Representative DNN Pruning Framework for Edge Devices with Dynamic Power Management

During the deployment of deep neural networks (DNNs) on edge devices, ma...

0 Yifan Gong, et al. ∙

research

∙ 11/22/2022

Self-Ensemble Protection: Training Checkpoints Are Good Data Protectors

As data become increasingly vital for deep learning, a company would be ...

0 Sizhe Chen, et al. ∙

research

∙ 11/07/2022

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition

Automatic Speech Recognition (ASR) systems typically yield output in lex...

0 Yashesh Gaur, et al. ∙

research

∙ 03/26/2022

Reverse Engineering of Imperceptible Adversarial Image Perturbations

It has been well recognized that neural network based image classifiers ...

4 Yifan Gong, et al. ∙

research

∙ 01/24/2022

Endpoint Detection for Streaming End-to-End Multi-talker ASR

Streaming end-to-end multi-talker speech recognition aims at transcribin...

0 Liang Lu, et al. ∙

research

∙ 10/10/2021

Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition

Hybrid and end-to-end (E2E) systems have their individual advantages, wi...

0 Guoli Ye, et al. ∙

research

∙ 10/06/2021

Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition

Text-only adaptation of an end-to-end (E2E) model remains a challenging ...

0 Zhong Meng, et al. ∙

research

∙ 09/23/2021

Joint speaker diarisation and tracking in switching state-space model

Speakers may move around while diarisation is being performed. When a mi...

0 Jeremy H. M. Wong, et al. ∙

research

∙ 09/22/2021

Diarisation using location tracking with agglomerative clustering

Previous works have shown that spatial location information can be compl...

0 Jeremy H. M. Wong, et al. ∙

research

∙ 06/04/2021

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition

Integrating external language models (LMs) into end-to-end (E2E) models ...

0 Zhong Meng, et al. ∙

research

∙ 04/27/2021

On Addressing Practical Challenges for RNN-Transducer

In this paper, several works are proposed to address practical challenge...

0 Rui Zhao, et al. ∙

research

∙ 04/05/2021

Streaming Multi-talker Speech Recognition with Joint Speaker Identification

In multi-talker scenarios such as meetings and conversations, speech pro...

0 Liang Lu, et al. ∙

research

∙ 02/02/2021

Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition

The efficacy of external language model (LM) integration with existing e...

0 Zhong Meng, et al. ∙

research

∙ 11/26/2020

Streaming end-to-end multi-talker speech recognition

End-to-end multi-talker speech recognition is an emerging research trend...

0 Liang Lu, et al. ∙

research

∙ 11/03/2020

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition

The external language models (LM) integration remains a challenging task...

0 Zhong Meng, et al. ∙

research

∙ 10/23/2020

On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer

Hybrid Autoregressive Transducer (HAT) is a recently proposed end-to-end...

0 Liang Lu, et al. ∙

research

∙ 10/22/2020

Microsoft Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2020

This paper describes the Microsoft speaker diarization system for monaur...

0 Xiong Xiao, et al. ∙

research

∙ 10/20/2020

Speaker Separation Using Speaker Inventories and Estimated Speech

We propose speaker separation using speaker inventories and estimated sp...

0 Peidong Wang, et al. ∙

research

∙ 07/30/2020

Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability

Because of its streaming nature, recurrent neural network transducer (RN...

0 Jinyu Li, et al. ∙

research

∙ 05/19/2020

Exploring Transformers for Large-Scale Speech Recognition

While recurrent neural networks still largely define state-of-the-art sp...

0 Liang Lu, et al. ∙

research

∙ 05/01/2020

Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition

Recently, the recurrent neural network transducer (RNN-T) architecture h...

0 Hu Hu, et al. ∙

research

∙ 04/25/2020

L-Vector: Neural Label Embedding for Domain Adaptation

We propose a novel neural label embedding (NLE) scheme for the domain ad...

0 Zhong Meng, et al. ∙

research

∙ 04/10/2020

Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR

Recently, a few novel streaming attention-based sequence-to-sequence (S2...

0 Hirofumi Inaguma, et al. ∙

research

∙ 03/17/2020

High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model

While the community keeps promoting end-to-end models over conventional ...

0 Jinyu Li, et al. ∙

research

∙ 03/13/2020

A Privacy-Preserving DNN Pruning and Mobile Acceleration Framework

To facilitate the deployment of deep neural networks (DNNs) on resource-...

0 Zheng Zhan, et al. ∙

research

∙ 02/19/2020

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition

Recurrent neural networks (RNNs) based automatic speech recognition has ...

0 Peiyan Dong, et al. ∙

research

∙ 01/23/2020

SS-Auto: A Single-Shot, Automatic Structured Weight Pruning Framework of DNNs with Ultra-High Efficiency

Structured weight pruning is a representative model compression techniqu...

0 Zhengang Li, et al. ∙

research

∙ 01/23/2020

BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method

Accelerating DNN execution on various resource-limited computing platfor...

7 Xiaolong Ma, et al. ∙

research

∙ 01/06/2020

Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition

Teacher-student (T/S) has shown to be effective for domain adaptation of...

2 Zhong Meng, et al. ∙

research

∙ 01/06/2020

Character-Aware Attention-Based End-to-End Speech Recognition

Predicting words and subword units (WSUs) as the output has shown to be ...

5 Zhong Meng, et al. ∙

research

∙ 12/10/2019

Advances in Online Audio-Visual Meeting Transcription

This paper describes a system that generates speaker-annotated transcrip...

15 Takuya Yoshioka, et al. ∙

research

∙ 11/09/2019

Speaker Adaptation for Attention-Based End-to-End Speech Recognition

We propose three regularization-based speaker adaptation approaches to a...

0 Zhong Meng, et al. ∙

research

∙ 09/26/2019

Improving RNN Transducer Modeling for End-to-End Speech Recognition

In the last few years, an emerging trend in automatic speech recognition...

0 Jinyu Li, et al. ∙

research

∙ 09/09/2019

Self-Teaching Networks

We propose self-teaching networks to improve the generalization capacity...

0 Liang Lu, et al. ∙

research

∙ 07/12/2019

Pykaldi2: Yet another speech toolkit based on Kaldi and Pytorch

We introduce PyKaldi2 speech recognition toolkit implemented based on Ka...

0 Liang Lu, et al. ∙

research

∙ 05/11/2019

Encrypted Speech Recognition using Deep Polynomial Networks

The cloud-based speech recognition/API provides developers or enterprise...

0 Shi-Xiong Zhang, et al. ∙

research

∙ 04/29/2019

Adversarial Speaker Adaptation

We propose a novel adversarial speaker adaptation (ASA) scheme, in which...

0 Zhong Meng, et al. ∙

research

∙ 04/29/2019

Adversarial Speaker Verification

The use of deep networks to extract embeddings for speaker recognition h...

0 Zhong Meng, et al. ∙

research

∙ 04/28/2019

Attentive Adversarial Learning for Domain-Invariant Training

Adversarial domain-invariant training (ADIT) proves to be effective in s...

0 Zhong Meng, et al. ∙

research

∙ 04/28/2019

Conditional Teacher-Student Learning

The teacher-student (T/S) learning has been shown to be effective for a ...

0 Zhong Meng, et al. ∙

research

∙ 01/04/2019

Speaker Adaptation for End-to-End CTC Models

We propose two approaches for speaker adaptation in end-to-end (E2E) aut...

0 Ke Li, et al. ∙

research

∙ 12/31/2018

Advancing Acoustic-to-Word CTC Model with Attention and Mixed-Units

The acoustic-to-word model based on the Connectionist Temporal Classific...

0 Amit Das, et al. ∙

research

∙ 09/06/2018

Cycle-Consistent Speech Enhancement

Feature mapping using deep neural networks is an effective approach for ...

0 Zhong Meng, et al. ∙

research

∙ 09/06/2018

Adversarial Feature-Mapping for Speech Enhancement

Feature-mapping with deep neural networks is commonly used for single-ch...

0 Zhong Meng, et al. ∙

research

∙ 08/28/2018

Layer Trajectory LSTM

It is popular to stack LSTM layers to get better modeling power, especia...

0 Jinyu Li, et al. ∙

research

∙ 04/14/2018

Developing Far-Field Speaker System Via Teacher-Student Learning

In this study, we develop the keyword spotting (KWS) and acoustic model ...

0 Jinyu Li, et al. ∙

