In this paper, we explore a continuous modeling approach for deep-learni...
We propose a novel neural speaker diarization system using memory-aware ...
Previous Multimodal Information based Speech Processing (MISP) challenge...
This technical report details our submission system to the CHiME-7 DASR ...
In recent research, a slight performance improvement has been observed from auto...
Recently, handwritten Chinese character error correction has been greatl...
The goal of this study is to implement diffusion models for speech enhan...
The problem of document structure reconstruction refers to converting di...
The Multi-modal Information based Speech Processing (MISP) challenge aim...
Table structure recognition is an indispensable element for enabling mac...
Table of contents (ToC) extraction aims to extract headings of different...
In this paper, we propose a deep learning based multi-speaker direction ...
Document intelligence, as a relatively new research topic, supports many b...
In this paper, we propose two techniques, namely joint modeling and data...
Audio-only-based wake word spotting (WWS) is challenging under noisy con...
We propose two improvements to target-speaker voice activity detection (...
Multimodal emotion recognition is a challenging task in emotion computin...
The task of table structure recognition is to recognize the internal str...
We propose a separation guided speaker diarization (SGSD) approach by fu...
We propose a novel neural model compression strategy combining data augm...
In this paper, we present AISHELL-4, a sizable real-recorded Mandarin sp...
This system description presents our submission system to the Third DIH...
In this paper, we propose a novel four-stage data augmentation approach ...
In this paper, we propose a novel deep learning architecture to improvin...
Audio-video based emotion recognition aims to classify a given video...
This paper introduces the third DIHARD challenge, the third in a series ...
One of the strengths of traditional convolutional neural networks (CNNs)...
Multi-speaker speech recognition of unsegmented recordings has diverse a...
To improve device robustness, a highly desirable key feature of a compet...
In this paper, we propose a visual embedding approach to improving embed...
In this paper, we exploit the properties of mean absolute error (MAE) as...
In this paper, we show that, in vector-to-vector regression utilizing de...
In this paper, we propose a sub-utterance unit selection framework to re...
In this technical report, we present a joint effort of four groups, name...
This paper introduces the third DIHARD challenge, the third in a series ...
In this paper, we propose a novel stroke constrained attention network (...
Batch normalization (BN) is an effective method to accelerate model trai...
The technique of distillation helps transform cumbersome neural network ...
This paper presents the problems and solutions addressed at the JSALT wo...
We propose a novel method for representing oriented objects in aerial im...
This paper introduces the second DIHARD challenge, the second in a serie...
In this paper, gating mechanisms are applied in deep neural network (DNN...
The x-vector based deep neural network (DNN) embedding systems have demo...
The x-vector based deep neural network (DNN) embedding systems have demo...
Automatic emotion recognition (AER) is a challenging task due to the abs...
Recently, the hybrid convolutional neural network hidden Markov model (C...
In this paper, we propose a novel scene text detection method named Text...
One challenging problem of robust automatic speech recognition (ASR) is ...
Recently, hidden Markov models (HMMs) have achieved promising results fo...
Recently, great success has been achieved in offline handwritten Chinese...