Zhijie Yan

research

∙ 05/18/2023

Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System

Estimating confidence scores for recognition results is a classic task i...

Xian Shi, et al.

research

∙ 03/24/2023

MUG: A General Meeting Understanding and Generation Benchmark

Listening to long video/audio recordings from video conferencing and onl...

Qinglin Zhang, et al.

research

∙ 03/24/2023

Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG)

ICASSP2023 General Meeting Understanding and Generation Challenge (MUG) ...

Qinglin Zhang, et al.

research

∙ 01/29/2023

Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model

Conventional ASR systems use frame-level phoneme posterior to conduct fo...

Xian Shi, et al.

research

∙ 11/29/2022

MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition

In this paper, we propose a novel multi-modal multi-task encoder-decoder...

Xiaohuan Zhou, et al.

research

∙ 11/18/2022

Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis

Recently, hybrid systems of clustering and neural diarization models hav...

Zhihao Du, et al.

research

∙ 06/16/2022

Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition

Transformers have recently dominated the ASR field. Although able to yie...

Zhifu Gao, et al.

research

∙ 03/18/2022

Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios

Overlapping speech diarization has been traditionally treated as a multi...

Zhihao Du, et al.

research

∙ 02/16/2022

ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech

Expressive text-to-speech (TTS) has become a hot research topic recently...

Yi Ren, et al.

research

∙ 02/08/2022

Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge

The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand Ch...

Fan Yu, et al.

research

∙ 10/14/2021

M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge

Recent development of speech signal processing, such as speech recogniti...

Fan Yu, et al.

research

∙ 09/09/2021

BeamTransformer: Microphone Array-based Overlapping Speech Detection

We propose BeamTransformer, an efficient architecture to leverage beamfo...

Siqi Zheng, et al.

research

∙ 07/20/2021

A Real-time Speaker Diarization System Based on Spatial Spectrum

In this paper we describe a speaker diarization system that enables loca...

Siqi Zheng, et al.

research

∙ 05/21/2020

Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition

Recently, streaming end-to-end automatic speech recognition (E2E-ASR) ha...

Shiliang Zhang, et al.

research

∙ 03/27/2019

Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition

Connectionist Temporal Classification (CTC) based end-to-end speech reco...

Shiliang Zhang, et al.

research

∙ 03/05/2018

Linear networks based speaker adaptation for speech synthesis

Speaker adaptation methods aim to create fair quality synthesis speech v...

Zhiying Huang, et al.

research

∙ 03/04/2018

Deep-FSMN for Large Vocabulary Continuous Speech Recognition

In this paper, we present an improved feedforward sequential memory netw...

Shiliang Zhang, et al.

research

∙ 02/26/2018

Deep Feed-forward Sequential Memory Networks for Speech Synthesis

The Bidirectional LSTM (BLSTM) RNN based speech synthesis system is amon...

Mengxiao Bi, et al.
