Jinyu Li

09/14/2023

DiariST: Streaming Speech Translation with Speaker Diarization

End-to-end speech translation (ST) for conversation recordings involves ...

Mu Yang, et al.

08/11/2023

Deepsea: A Meta-ocean Prototype for Undersea Exploration

Metaverse has attracted great attention from industry and academia in re...

Jinyu Li, et al.

07/07/2023

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

In real-world applications, users often require both translations and tr...

Sara Papi, et al.

06/28/2023

Accelerating Transducers through Adjacent Token Merging

Recent end-to-end automatic speech recognition (ASR) systems often utili...

Yuang Li, et al.

06/28/2023

Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition

The integration of Language Models (LMs) has proven to be an effective w...

Yuang Li, et al.

05/31/2023

Accurate and Structured Pruning for Efficient Automatic Speech Recognition

Automatic Speech Recognition (ASR) has seen remarkable advancements with...

Huiqiang Jiang, et al.

05/08/2023

PillarNeXt: Rethinking Network Designs for 3D Object Detection in LiDAR Point Clouds

In order to deal with the sparse and unstructured raw point clouds, LiDA...

Jinyu Li, et al.

03/01/2023

Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training

We propose gated language experts to improve multilingual transformer tr...

Eric Sun, et al.

02/22/2023

Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation

We previously proposed contextual spelling correction (CSC) to correct t...

Xiaoqiang Wang, et al.

01/05/2023

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

We introduce a language modeling approach for text to speech synthesis (...

Chengyi Wang, et al.

12/05/2022

Fast and accurate factorized neural transducer for text adaption of end-to-end speech recognition models

Neural transducer is now the most popular end-to-end model for speech re...

Rui Zhao, et al.

11/21/2022

VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

Although speech is a simple and effective way for humans to communicate ...

Qiushi Zhu, et al.

11/17/2022

LongFNT: Long-form Speech Recognition with Factorized Neural Transducer

Traditional automatic speech recognition (ASR) systems usually focus on ...

Xun Gong, et al.

11/09/2022

Speech separation with large-scale self-supervised learning

Self-supervised learning (SSL) methods such as WavLM have shown promisin...

Zhuo Chen, et al.

11/07/2022

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition

Automatic Speech Recognition (ASR) systems typically yield output in lex...

Yashesh Gaur, et al.

11/05/2022

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

End-to-end formulation of automatic speech recognition (ASR) and speech ...

Peidong Wang, et al.

11/04/2022

A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability

In this paper, we introduce our work of building a Streaming Multilingua...

Jian Xue, et al.

10/31/2022

Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation

Direct speech-to-speech translation (S2ST) is an attractive research top...

Kun Wei, et al.

10/27/2022

Simulating realistic speech overlaps improves multi-talker ASR

Multi-talker automatic speech recognition (ASR) has been studied to gene...

Muqiao Yang, et al.

10/16/2022

Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding

Masked language model (MLM) has been widely used for understanding tasks...

Ruchao Fan, et al.

10/07/2022

SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training

The rapid development of single-modal pre-training has prompted research...

Ziqiang Zhang, et al.

09/30/2022

SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

How to boost speech pre-training with textual data is an unsolved proble...

Ziqiang Zhang, et al.

09/12/2022

VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition

This paper presents a novel streaming automatic speech recognition (ASR)...

Naoyuki Kanda, et al.

06/12/2022

The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task

This paper describes the submission of our end-to-end YiTrans speech tra...

Ziqiang Zhang, et al.

04/27/2022

Ultra Fast Speech Separation Model with Teacher Student Learning

Transformer has been successfully applied to speech separation recently ...

Sanyuan Chen, et al.

04/27/2022

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

Recently, self-supervised learning (SSL) has demonstrated strong perform...

Sanyuan Chen, et al.

04/11/2022

Large-Scale Streaming End-to-End Speech Translation with Neural Transducers

Neural transducers have been widely used in automatic speech recognition...

Jian Xue, et al.

03/31/2022

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data

This paper studies a novel pre-training technique with unpaired speech d...

Junyi Ao, et al.

03/30/2022

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings

This paper presents a streaming speaker-attributed automatic speech reco...

Naoyuki Kanda, et al.

03/02/2022

Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems

Contextual biasing is an important and challenging task for end-to-end a...

Xiaoqiang Wang, et al.

02/02/2022

Streaming Multi-Talker ASR with Token-Level Serialized Output Training

This paper proposes a token-level serialized output training (t-SOT), a ...

Naoyuki Kanda, et al.

01/24/2022

Endpoint Detection for Streaming End-to-End Multi-talker ASR

Streaming end-to-end multi-talker speech recognition aims at transcribin...

Liang Lu, et al.

12/16/2021

Self-Supervised Learning for speech recognition with Intermediate layer supervision

Recently, pioneer work finds that speech pre-trained models can solve fu...

Chengyi Wang, et al.

12/10/2021

Sequence-level self-learning with multiple hypotheses

In this work, we develop new self-learning techniques with an attention-...

Kenichi Kumatani, et al.

11/02/2021

Recent Advances in End-to-End Automatic Speech Recognition

Recently, the speech community is seeing a significant trend of moving f...

Jinyu Li, et al.

10/28/2021

Continuous Speech Separation with Recurrent Selective Attention Network

While permutation invariant training (PIT) based continuous speech separ...

Yixuan Zhang, et al.

10/27/2021

Separating Long-Form Speech with Group-Wise Permutation Invariant Training

Multi-talker conversational speech processing has drawn many interests f...

Wangyou Zhang, et al.

10/26/2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

Self-supervised learning (SSL) achieves great success in speech recognit...

Sanyuan Chen, et al.

10/12/2021

UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training

Self-supervised learning (SSL) is a long-standing goal for speech proces...

Sanyuan Chen, et al.

10/10/2021

Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition

Hybrid and end-to-end (E2E) systems have their individual advantages, wi...

Guoli Ye, et al.

10/06/2021

Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition

Text-only adaptation of an end-to-end (E2E) model remains a challenging ...

Zhong Meng, et al.

09/27/2021

Factorized Neural Transducer for Efficient Language Model Adaptation

In recent years, end-to-end (E2E) based automatic speech recognition (AS...

Xie Chen, et al.

09/17/2021

Continuous Streaming Multi-Talker ASR with Dual-path Transducers

Streaming recognition of multi-talker conversations has so far been eval...

Desh Raj, et al.

08/17/2021

A Light-weight contextual spelling correction model for customizing transducer-based speech recognition systems

It's challenging to customize transducer-based automatic speech recognit...

Xiaoqiang Wang, et al.

07/13/2021

A Configurable Multilingual Model is All You Need to Recognize All Languages

Multilingual automatic speech recognition (ASR) models have shown great ...

Long Zhou, et al.

07/05/2021

Investigation of Practical Aspects of Single Channel Speech Separation for ASR

Speech separation has been successfully applied as a frontend processing...

Jian Wu, et al.

06/04/2021

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition

Integrating external language models (LMs) into end-to-end (E2E) models ...

Zhong Meng, et al.

04/27/2021

On Addressing Practical Challenges for RNN-Transducer

In this paper, several works are proposed to address practical challenge...

Rui Zhao, et al.

04/05/2021

Streaming Multi-talker Speech Recognition with Joint Speaker Identification

In multi-talker scenarios such as meetings and conversations, speech pro...

Liang Lu, et al.

02/02/2021

Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition

The efficacy of external language model (LM) integration with existing e...

Zhong Meng, et al.
