Kai Yu

research

∙ 09/14/2023

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS

Self-supervised learning (SSL) proficiency in speech-related tasks has d...

0 Yifan Yang, et al. ∙

research

∙ 09/10/2023

VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching

Although diffusion models in text-to-speech have become a popular choice...

0 Yiwei Guo, et al. ∙

research

∙ 08/25/2023

SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research

Recently, there has been growing interest in using Large Language Models...

0 Liangtai Sun, et al. ∙

research

∙ 08/11/2023

Towards Instance-adaptive Inference for Federated Learning

Federated learning (FL) is a distributed learning paradigm that enables ...

0 Chun-Mei Feng, et al. ∙

research

∙ 08/11/2023

Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning

Benefiting from prompt tuning, recent years have witnessed the promising...

0 Chun-Mei Feng, et al. ∙

research

∙ 06/25/2023

DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech

Although high-fidelity speech can be obtained for intralingual speech sy...

0 Sen Liu, et al. ∙

research

∙ 06/16/2023

Improving Audio Caption Fluency with Automatic Error Correction

Automated audio captioning (AAC) is an important cross-modality translat...

0 Hanxue Zhang, et al. ∙

research

∙ 06/14/2023

Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation

Recently, end-to-end (E2E) automatic speech recognition (ASR) models hav...

0 Zheng Liang, et al. ∙

research

∙ 06/13/2023

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding

The utilization of discrete speech tokens, divided into semantic tokens ...

0 Chenpeng Du, et al. ∙

research

∙ 06/09/2023

Large Language Model Is Semi-Parametric Reinforcement Learning Agent

Inspired by the insights in cognitive science with respect to human memo...

0 Danyang Zhang, et al. ∙

research

∙ 06/02/2023

Enhance Temporal Relations in Audio Captioning with Sound Event Detection

Automated audio captioning aims at generating natural language descripti...

0 Zeyu Xie, et al. ∙

research

∙ 05/25/2023

CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset

The cross-domain text-to-SQL task aims to build a system that can parse ...

0 Hanchong Zhang, et al. ∙

research

∙ 05/19/2023

PointGPT: Auto-regressively Generative Pre-training from Point Clouds

Large language models (LLMs) based on the generative pre-training transf...

0 Guangyan Chen, et al. ∙

research

∙ 05/14/2023

Mobile-Env: A Universal Platform for Training and Evaluation of Mobile Interaction

The interaction platform plays a crucial role in the recent advancement ...

0 Danyang Zhang, et al. ∙

research

∙ 05/03/2023

Diverse and Vivid Sound Generation from Text Descriptions

Previous audio generation mainly focuses on specified sound classes such...

0 Guangwei Li, et al. ∙

research

∙ 04/25/2023

Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge

In this paper, we describe the systems developed by the SJTU X-LANCE tea...

0 Chenpeng Du, et al. ∙

research

∙ 04/23/2023

DiffVoice: Text-to-Speech with Latent Diffusion

In this work, we present DiffVoice, a novel text-to-speech model based o...

0 Zhijun Liu, et al. ∙

research

∙ 03/09/2023

Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation

Audio-driven talking face has attracted broad interest from academia and...

0 Qi Chen, et al. ∙

research

∙ 01/30/2023

TrFedDis: Trusted Federated Disentangling Network for Non-IID Domain Feature

Federated learning (FL), as an effective decentralized distributed learn...

8 Meng Wang, et al. ∙

research

∙ 01/12/2023

On the Structural Generalization in Text-to-SQL

Exploring the generalization of a text-to-SQL parser is essential for a ...

0 Jieyu Li, et al. ∙

research

∙ 12/01/2022

Reliable Joint Segmentation of Retinal Edema Lesions in OCT Images

Focusing on the complicated pathological features, such as blurred bound...

26 Meng Wang, et al. ∙

research

∙ 11/17/2022

EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance

Although current neural text-to-speech (TTS) models are able to generate...

0 Yiwei Guo, et al. ∙

research

∙ 11/08/2022

BER: Balanced Error Rate For Speaker Diarization

DER is the primary metric to evaluate diarization performance while faci...

5 Tao Liu, et al. ∙

research

∙ 09/10/2022

OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue

This paper presents an ontology-aware pretrained language model (OPAL) f...

0 Zhi Chen, et al. ∙

research

∙ 05/25/2022

DialogZoo: Large-Scale Dialog-Oriented Task Learning

Building unified conversational agents has been a long-standing goal of ...

0 Zhi Chen, et al. ∙

research

∙ 05/24/2022

D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat

In a depression-diagnosis-directed clinical session, doctors initiate a ...

0 Binwei Yao, et al. ∙

research

∙ 05/23/2022

META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI

Task-oriented dialogue (TOD) systems have been widely used by mobile pho...

0 Liangtai Sun, et al. ∙

research

∙ 05/13/2022

TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages

Recently, the structural reading comprehension (SRC) task on web pages h...

4 Zihan Zhao, et al. ∙

research

∙ 05/11/2022

A Comprehensive Survey of Automated Audio Captioning

Automated audio captioning, a task that mimics human perception as well ...

0 Xuenan Xu, et al. ∙

research

∙ 04/29/2022

Climate and Weather: Inspecting Depression Detection via Emotion Recognition

Automatic depression detection has attracted increasing amount of attent...

0 Wen Wu, et al. ∙

research

∙ 04/10/2022

UniDU: Towards A Unified Generative Dialogue Understanding Framework

With the development of pre-trained language models, remarkable success ...

0 Zhi Chen, et al. ∙

research

∙ 04/02/2022

VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature

The mainstream neural text-to-speech(TTS) pipeline is a cascade system, ...

0 Chenpeng Du, et al. ∙

research

∙ 03/25/2022

Audio-text Retrieval in Context

Audio-text retrieval based on natural language descriptions is a challen...

0 Siyu Lou, et al. ∙

research

∙ 02/15/2022

Unsupervised word-level prosody tagging for controllable speech synthesis

Although word-level prosody modeling in neural text-to-speech (TTS) has ...

0 Yiwei Guo, et al. ∙

research

∙ 12/09/2021

Few-Shot NLU with Vector Projection Distance and Abstract Triangular CRF

Data sparsity problem is a key challenge of Natural Language Understandi...

0 Su Zhu, et al. ∙

research

∙ 09/15/2021

Scalable Cell-Free Massive MIMO Systems with Finite Resolution ADCs/DACs over Spatially Correlated Rician Fading Channels

In this paper, an analytical framework for evaluating the performance of...

0 Xiangjun Ma, et al. ∙

research

∙ 06/04/2021

Decoupled Dialogue Modeling and Semantic Parsing for Multi-Turn Text-to-SQL

Recently, Text-to-SQL for multi-turn dialogue has attracted great intere...

0 Zhi Chen, et al. ∙

research

∙ 06/02/2021

LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations

This work aims to tackle the challenging heterogeneous graph encoding pr...

0 Ruisheng Cao, et al. ∙

research

∙ 05/27/2021

Diverse and Controllable Speech Synthesis with GMM-Based Phone-Level Prosody Modelling

Generating natural speech with diverse and smooth prosody pattern is a c...

0 Chenpeng Du, et al. ∙

research

∙ 05/10/2021

Voice activity detection in the wild: A data-driven approach using teacher-student training

Voice activity detection is an essential pre-processing component for sp...

0 Heinrich Dinkel, et al. ∙

research

∙ 04/10/2021

ShadowGNN: Graph Projection Neural Network for Text-to-SQL Parser

Given a database schema, Text-to-SQL aims to translate a natural languag...

0 Zhi Chen, et al. ∙

research

∙ 02/25/2021

LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching

Chinese short text matching is a fundamental task in natural language pr...

0 Boer Lyu, et al. ∙

research

∙ 02/23/2021

Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events

Automated Audio Captioning is a cross-modal task, generating natural lan...

0 Xuenan Xu, et al. ∙

research

∙ 02/23/2021

Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning

Automated audio captioning (AAC) aims at generating summarizing descript...

0 Xuenan Xu, et al. ∙

research

∙ 02/01/2021

Mixture Density Network for Phone-Level Prosody Modelling in Speech Synthesis

Recent researches on both utterance-level and phone-level prosody modell...

0 Chenpeng Du, et al. ∙

research

∙ 01/23/2021

WebSRC: A Dataset for Web-Based Structural Reading Comprehension

Web search is an essential way for human to obtain information, but it's...

0 Lu Chen, et al. ∙

research

∙ 01/19/2021

Towards duration robust weakly supervised sound event detection

Sound event detection (SED) is the task of tagging the absence or presen...

0 Heinrich Dinkel, et al. ∙

research

∙ 01/17/2021

A relic sketch extraction framework based on detail-aware hierarchical deep network

As the first step of the restoration process of painted relics, sketch e...

9 Jinye Peng, et al. ∙

research

∙ 10/14/2020

An Investigation on Different Underlying Quantization Schemes for Pre-trained Language Models

Recently, pre-trained language models like BERT have shown promising per...

3 Zihan Zhao, et al. ∙

research

∙ 09/22/2020

CREDIT: Coarse-to-Fine Sequence Generation for Dialogue State Tracking

In dialogue systems, a dialogue state tracker aims to accurately find a ...

0 Zhi Chen, et al. ∙
