
-
LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching
Chinese short text matching is a fundamental task in natural language pr...
-
Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events
Automated Audio Captioning is a cross-modal task, generating natural lan...
-
Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning
Automated audio captioning (AAC) aims at generating summarizing descript...
-
Mixture Density Network for Phone-Level Prosody Modelling in Speech Synthesis
Recent research on both utterance-level and phone-level prosody modell...
-
WebSRC: A Dataset for Web-Based Structural Reading Comprehension
Web search is an essential way for humans to obtain information, but it's...
-
Towards duration robust weakly supervised sound event detection
Sound event detection (SED) is the task of tagging the absence or presen...
-
A relic sketch extraction framework based on detail-aware hierarchical deep network
As the first step of the restoration process of painted relics, sketch e...
-
An Investigation on Different Underlying Quantization Schemes for Pre-trained Language Models
Recently, pre-trained language models like BERT have shown promising per...
-
CREDIT: Coarse-to-Fine Sequence Generation for Dialogue State Tracking
In dialogue systems, a dialogue state tracker aims to accurately find a ...
-
Dual Learning for Dialogue State Tracking
In task-oriented multi-turn dialogue systems, dialogue state refers to a...
-
Structured Hierarchical Dialogue Policy with Graph Neural Networks
Dialogue policy training for composite tasks, such as restaurant reserva...
-
Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management
The task-oriented spoken dialogue system (SDS) aims to assist a human us...
-
Deep Reinforcement Learning for On-line Dialogue State Tracking
Dialogue state tracking (DST) is a crucial module in dialogue management...
-
End-to-End Speaker-Dependent Voice Activity Detection
Voice activity detection (VAD) is an essential pre-processing step for t...
-
Vector Projection Network for Few-shot Slot Tagging in Natural Language Understanding
Few-shot slot tagging becomes appealing for rapid domain transfer and ad...
-
Robust Spoken Language Understanding with RL-based Value Error Recovery
Spoken Language Understanding (SLU) aims to extract structured semantic ...
-
Modular End-to-end Automatic Speech Recognition Framework for Acoustic-to-word Model
End-to-end (E2E) systems have played an increasingly important role in a...
-
Future Vector Enhanced LSTM Language Model for LVCSR
Language models (LM) play an important role in large vocabulary continuo...
-
An Investigation on Deep Learning with Beta Stabilizer
Artificial neural networks (ANN) have been used in many applications suc...
-
End-to-end spoofing detection with raw waveform CLDNNs
Albeit recent progress in speaker verification generates powerful models...
-
Quantum Criticism: A Tagged News Corpus Analysed for Sentiment and Named Entities
In this research, we continuously collect data from the RSS feeds of tra...
-
Unsupervised Dual Paraphrasing for Two-stage Semantic Parsing
One daunting problem for semantic parsing is the scarcity of annotation....
-
Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding
Spoken Language Understanding (SLU) converts hypotheses from automatic s...
-
Semi-Supervised Text Simplification with Back-Translation and Asymmetric Denoising Autoencoders
Text simplification (TS) rephrases long sentences into simplified varian...
-
Dual Learning for Semi-Supervised Natural Language Understanding
Natural language understanding (NLU) converts sentences into structured ...
-
Efficient Context and Schema Fusion Networks for Multi-Domain Dialogue State Tracking
Dialogue state tracking (DST) aims at estimating the current dialogue st...
-
Voice activity detection in the wild via weakly supervised sound event detection
Traditional supervised voice activity detection (VAD) methods work well ...
-
GPVAD: Towards noise robust voice activity detection via weakly supervised sound event detection
Traditional voice activity detection (VAD) methods work well in clean an...
-
Prior Knowledge Driven Label Embedding for Slot Filling in Natural Language Understanding
Traditional slot filling in natural language understanding (NLU) predict...
-
Depa: Self-supervised audio embedding for depression detection
Depression detection research has increased over the last few decades as...
-
Data Augmentation with Atomic Templates for Spoken Language Understanding
Spoken Language Understanding (SLU) converts user utterances into struct...
-
Semantic Parsing with Dual Learning
Semantic parsing converts natural language queries into structured logic...
-
Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition
Recently, speaker embeddings extracted from a speaker discriminative dee...
-
What does a Car-ssette tape tell?
Captioning has attracted much attention in image and video understanding...
-
AgentGraph: Towards Universal Dialogue Management with Structured Deep Reinforcement Learning
Dialogue policy plays an important role in task-oriented spoken dialogue...
-
A Hierarchical Decoding Model For Spoken Language Understanding From Unaligned Data
Spoken language understanding (SLU) systems can be trained on two types ...
-
Duration robust sound event detection
Task 4 of the Dcase2018 challenge demonstrated that substantially more r...
-
Text-based Depression Detection: What Triggers An Alert
Recent advances in automatic depression detection mostly derive from mod...
-
Audio Caption: Listen and Tell
An increasing amount of research has shed light on machine perception of au...
-
End-to-End Monaural Multi-speaker ASR System without Pretraining
Recently, end-to-end models have become a popular approach as an alterna...
-
Towards Universal Dialogue State Tracking
Dialogue state tracking is the core part of a spoken dialogue system. It...
-
Sequence Discriminative Training for Deep Learning based Acoustic Keyword Spotting
Speech recognition is a sequence prediction problem. Besides employing v...
-
Deep Discriminant Analysis for i-vector Based Robust Speaker Recognition
Linear Discriminant Analysis (LDA) has been used as a standard post-proc...
-
On Modular Training of Neural Acoustics-to-Word Model for LVCSR
End-to-end (E2E) automatic speech recognition (ASR) systems directly map...
-
Concept Transfer Learning for Adaptive Language Understanding
Semantic transfer is an important problem of the language understanding ...
-
A Large-scale Distributed Video Parsing and Evaluation Platform
Visual surveillance systems have become one of the largest data sources ...
-
Weakly-supervised Learning of Mid-level Features for Pedestrian Attribute Recognition and Localization
State-of-the-art methods treat pedestrian attribute recognition as a mul...
-
Encoder-decoder with Focus-mechanism for Sequence Labelling Based Spoken Language Understanding
This paper investigates the framework of encoder-decoder with attention ...
-
Text Flow: A Unified Text Detection System in Natural Scene Images
The prevalent scene text detection approach follows four sequential step...
-
On Training Bi-directional Neural Network Language Model with Noise Contrastive Estimation
We propose to train a bi-directional neural network language model (NNLM) w...