
Multi-Task Self-Supervised Learning for Disfluency Detection

08/15/2019
by   Shaolei Wang, et al.
Beijing University of Posts and Telecommunications
University of Oxford
Harbin Institute of Technology
The Regents of the University of California

Most existing approaches to disfluency detection rely heavily on human-annotated data, which is expensive to obtain in practice. To tackle this training-data bottleneck, we investigate methods for combining multiple self-supervised tasks, i.e., supervised tasks where data can be collected without manual labeling. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled news data, and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words, and (ii) a sentence classification task to distinguish original sentences from grammatically incorrect ones. We then combine these two tasks to jointly train a network. The pre-trained network is subsequently fine-tuned using human-annotated disfluency detection training data. Experimental results on the commonly used English Switchboard test set show that our approach achieves performance competitive with previous systems (trained on the full dataset) while using less than 1% of the training data, and that our method trained on the full dataset significantly outperforms previous methods, reducing the error by 21%.
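The pseudo-data construction described above can be illustrated with a minimal sketch. The function below inserts noise words (repetitions of the previous token or random vocabulary words) into a fluent sentence and emits token-level labels for the tagging task; the function name, tag set ("D"/"O"), and insertion rate are illustrative assumptions, not the paper's exact procedure.

```python
import random

def make_pseudo_example(tokens, vocab, p_add=0.15, rng=None):
    """Build one pseudo training example by randomly inserting noise words.

    Returns (noisy_tokens, tags), where tag "D" marks an inserted word and
    "O" marks an original word; the tagging task is to recover these labels.
    (Illustrative sketch -- names and rates are assumptions.)
    """
    rng = rng or random.Random()
    noisy, tags = [], []
    for tok in tokens:
        if rng.random() < p_add:
            # Insert either a repetition of the previous word or a random
            # vocabulary word before the current token.
            if noisy and rng.random() < 0.5:
                noise = noisy[-1]
            else:
                noise = rng.choice(vocab)
            noisy.append(noise)
            tags.append("D")
        noisy.append(tok)
        tags.append("O")
    return noisy, tags

sent = "the cat sat on the mat".split()
filler_vocab = ["uh", "well", "like", "you", "know"]
noisy, tags = make_pseudo_example(sent, filler_vocab, rng=random.Random(0))
```

By construction, deleting every token tagged "D" recovers the original fluent sentence, which is what makes the tagging objective learnable without manual labels; the noisy/original sentence pairs can likewise serve as positive/negative examples for the sentence-classification task.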

