
-
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario
In this paper, we present AISHELL-4, a sizable real-recorded Mandarin sp...
-
Darts-Conformer: Towards Efficient Gradient-Based Neural Architecture Search For End-to-End ASR
Neural architecture search (NAS) has been successfully applied to tasks ...
-
INTERSPEECH 2021 ConferencingSpeech Challenge: Towards Far-field Multi-Channel Speech Enhancement for Video Conferencing
The ConferencingSpeech 2021 challenge is proposed to stimulate research ...
-
Auto-KWS 2021 Challenge: Task, Datasets, and Baselines
Auto-KWS 2021 challenge calls for automated machine learning (AutoML) so...
-
An Asynchronous WFST-Based Decoder For Automatic Speech Recognition
We introduce asynchronous dynamic decoder, which adopts an efficient A* ...
-
The NPU System for the 2020 Personalized Voice Trigger Challenge
This paper describes the system developed by the NPU team for the 2020 p...
-
The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods
The variety of accents has posed a big challenge to speech recognition. ...
-
WeNet: Production First and Production Ready End-to-End Speech Recognition Toolkit
In this paper, we present a new open source, production first and produc...
-
CODE-AE: A Coherent De-confounding Autoencoder for Predicting Patient-Specific Drug Response From Cell Line Transcriptomics
Accurate and robust prediction of patient's response to drug treatments ...
-
Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition
In this paper, we present a novel two-pass approach to unify streaming a...
-
Context-aware RNNLM Rescoring for Conversational Speech Recognition
Conversational speech recognition is regarded as a challenging task due ...
-
Multi-Channel Automatic Speech Recognition Using Deep Complex Unet
The front-end module in multi-channel automatic speech recognition (ASR)...
-
Controllable Emotion Transfer For End-to-End Speech Synthesis
Emotion embedding space learned from references is a straightforward app...
-
Adversarial Training for Multi-domain Speaker Recognition
In real-life applications, the performance of speaker recognition system...
-
Optimizing voice conversion network with cycle consistency loss of speaker identity
We propose a novel training scheme to optimize voice conversion network ...
-
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis
This paper proposes a unified model to conduct emotion transfer, control...
-
Cascade RNN-Transducer: Syllable Based Streaming On-device Mandarin Speech Recognition with a Syllable-to-Character Converter
End-to-end models are favored in automatic speech recognition (ASR) beca...
-
Learn2Sing: Target Speaker Singing Voice Synthesis by learning from a Singing Teacher
Singing voice synthesis has been paid rising attention with the rapid de...
-
Molecular Mechanics-Driven Graph Neural Network with Multiplex Graph for Molecular Structures
The prediction of physicochemical properties from molecular structures i...
-
The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines
Automatic speech recognition (ASR) has been significantly advanced with ...
-
IEEE SLT 2021 Alpha-mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines
The IEEE Spoken Language Technology Workshop (SLT) 2021 Alpha-mini Speec...
-
DESNet: A Multi-channel Network for Simultaneous Speech Dereverberation, Enhancement and Separation
In this paper, we propose a multi-channel network for simultaneous speec...
-
AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification
The AutoSpeech challenge calls for automated machine learning (AutoML) s...
-
A Cross-Level Information Transmission Network for Predicting Phenotype from New Genotype: Application to Cancer Precision Medicine
An unsolved fundamental problem in biology and ecology is to predict obs...
-
An End-to-end Architecture of Online Multi-channel Speech Separation
Multi-speaker speech recognition has been one of the key challenges in co...
-
AIPerf: Automated machine learning as an AI-HPC benchmark
The plethora of complex artificial intelligence (AI) algorithms and avai...
-
Channel-wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music
This paper presents a new input format, channel-wise subband input (CWS)...
-
Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training
Data efficient voice cloning aims at synthesizing target speaker's voice...
-
NPU Speaker Verification System for INTERSPEECH 2020 Far-Field Speaker Verification Challenge
This paper describes the NPU system submitted to Interspeech 2020 Far-Fi...
-
Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis
Attention-based seq2seq text-to-speech systems, especially those use sel...
-
DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement
Speech enhancement has benefited from the success of deep learning in te...
-
The ASRU 2019 Mandarin-English Code-Switching Speech Recognition Challenge: Open Datasets, Tracks, Methods and Results
Code-switching (CS) is a common phenomenon and recognizing CS speech is ...
-
Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals
Neural sequence-to-sequence models are well established for applications...
-
Inaudible Adversarial Perturbations for Targeted Attack in Speaker Recognition
Speaker recognition is a popular topic in biometric authentication and m...
-
Simplified Self-Attention for Transformer-based End-to-End Speech Recognition
Transformer models have been introduced into end-to-end speech recogniti...
-
Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition
Recently, streaming end-to-end automatic speech recognition (E2E-ASR) ha...
-
Conversational End-to-End TTS for Voice Agent
End-to-end neural TTS has achieved superior performance on reading style...
-
Wake Word Detection with Alignment-Free Lattice-Free MMI
Always-on spoken language interfaces, e.g. personal digital assistants, ...
-
Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
In this paper, we propose multi-band MelGAN, a much faster waveform gene...
-
Adversarial Feature Learning and Unsupervised Clustering based Speech Synthesis for Found Data with Acoustic and Textual Noise
Attention-based sequence-to-sequence (seq2seq) speech synthesis has achi...
-
A novel cross-lingual voice cloning approach with a few text-free samples
In this paper, we present a cross-lingual voice cloning approach. BN fea...
-
LiveGraph: A Transactional Graph Storage System with Purely Sequential Adjacency List Scans
The specific characteristics of graph workloads make it hard to design a...
-
Espresso: A Fast End-to-end Neural Speech Recognition Toolkit
We present Espresso, an open-source, modular, extensible end-to-end neur...
-
Improving Attention Mechanism in Graph Neural Networks via Cardinality Preservation
Graph Neural Networks (GNNs) are powerful to learn the representation of...
-
Building a mixed-lingual neural TTS system with only monolingual data
When deploying a Chinese neural text-to-speech (TTS) synthesis system, o...
-
A New GAN-based End-to-End TTS Training Algorithm
End-to-end, autoregressive model-based TTS has shown significant perform...
-
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS
The end-to-end TTS, which can predict speech directly from a given seque...
-
Improved Speaker-Dependent Separation for CHiME-5 Challenge
This paper summarizes several follow-up contributions for improving our ...
-
Time Domain Audio Visual Speech Separation
Audio-visual multi-modal modeling has been demonstrated to be effective ...
-
Exploring RNN-Transducer for Chinese Speech Recognition
End-to-end approaches have drawn much attention recently for significant...