Improved POS tagging for spontaneous, clinical speech using data augmentation

07/11/2023
by   Seth Kulick, et al.
0

This paper addresses the problem of improving POS tagging of transcripts of speech from clinical populations. In contrast to prior work on parsing and POS tagging of transcribed speech, we do not make use of an in domain treebank for training. Instead, we train on an out of domain treebank of newswire using data augmentation techniques to make these structures resemble natural, spontaneous speech. We trained a parser with and without the augmented data and tested its performance using manually validated POS tags in clinical speech produced by patients with various types of neurodegenerative conditions.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/02/2021

Significance of Data Augmentation for Improving Cleft Lip and Palate Speech Recognition

The automatic recognition of pathological speech, particularly from chil...
research
01/02/2021

Substructure Substitution: Structured Data Augmentation for NLP

We study a family of data augmentation methods, substructure substitutio...
research
01/14/2022

Investigation of Data Augmentation Techniques for Disordered Speech Recognition

Disordered speech recognition is a highly challenging task. The underlyi...
research
02/24/2021

Augmenting Part-of-speech Tagging with Syntactic Information for Vietnamese and Chinese

Word segmentation and part-of-speech tagging are two critical preliminar...
research
08/12/2018

Sample Mixed-Based Data Augmentation for Domestic Audio Tagging

Audio tagging has attracted increasing attention since last decade and h...
research
07/20/2022

Improving Data Driven Inverse Text Normalization using Data Augmentation

Inverse text normalization (ITN) is used to convert the spoken form outp...
research
08/01/2017

Improving Part-of-Speech Tagging for NLP Pipelines

This paper outlines the results of sentence level linguistics based rule...

Please sign up or login with your details

Forgot password? Click here to reset