This paper focuses on term-status pair extraction from medical dialogues...
Employing additional multimodal information to improve automatic speech
...
Text-to-audio (TTA) generation is a recent popular problem that aims to
...
Adaptive human-agent and agent-agent cooperation are becoming more and m...
Large language models (LLMs) have demonstrated remarkable language abili...
Recent advances in personalized image generation allow a pre-trained
tex...
Medical Slot Filling (MSF) task aims to convert medical queries into
str...
Large-scale pre-trained language models (PLMs) with powerful language
mo...
In the past few years, the emergence of pre-training models has brought
...
Deep learning based models have significantly improved the performance o...
Video anomaly detection aims to identify abnormal events that occurred i...
Recently, large pretrained models (e.g., BERT, StyleGAN, CLIP) have show...
The deep learning based time-domain models, e.g. Conv-TasNet, have shown...
Self-supervised pretraining on speech data has achieved a lot of progres...
Learning from image-text data has demonstrated recent success for many
r...
Recently, language-guided global image editing draws increasing attentio...
This paper describes the recent development of ESPnet
(https://github.co...
Speech separation aims to separate individual voice from an audio mixtur...
We present ESPnet-SE, which is designed for the quick development of spe...
In this study, we present recent developments on ESPnet: End-to-End Spee...
As the performance of single-channel speech separation systems has impro...
Language-driven image editing can significantly save the laborious image...
The marriage of recurrent neural networks and neural ordinary differenti...
Neural sequence-to-sequence models are well established for applications...
Speech separation has been extensively explored to tackle the cocktail p...
Learning continuous-time stochastic dynamics from sparse or irregular
ob...
Speaker diarization is an essential step for processing multi-speaker au...
Expectation maximization (EM) algorithm is to find maximum likelihood
so...
Deep neural networks have shown superior performance in many regimes to
...
In this paper, we introduce a novel problem of audio-visual event
locali...
We discuss a common suspicion about reported financial data, in 10 indus...
Recently, end-to-end memory networks have shown promising results on Que...