Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences

03/15/2023
by   Yuan Tseng, et al.

Past work on unsupervised parsing is constrained to written text. In this paper, we present the first study on unsupervised spoken constituency parsing given unlabeled spoken sentences and unpaired textual data. The goal is to determine the spoken sentences' hierarchical syntactic structure in the form of constituency parse trees, such that each node is a span of audio that corresponds to a constituent. We compare two approaches: (1) cascading an unsupervised automatic speech recognition (ASR) model and an unsupervised parser to obtain parse trees on ASR transcripts, and (2) directly training an unsupervised parser on continuous word-level speech representations. This is done by first splitting utterances into sequences of word-level segments, then aggregating self-supervised speech representations within each segment to obtain segment embeddings. We find that separately training a parser on the unpaired text and directly applying it to ASR transcripts at inference time produces better results for unsupervised parsing. Additionally, our results suggest that accurate segmentation alone may be sufficient to parse spoken sentences accurately. Finally, we show that the direct approach may learn head-directionality correctly for both head-initial and head-final languages without any explicit inductive bias.
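The abstract does not specify how segment embeddings are computed, so the sketch below only illustrates one plausible reading: mean-pooling frame-level self-supervised features (e.g., HuBERT-style vectors) within hypothesized word boundaries. The function name, pooling choice, and toy dimensions are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def segment_embeddings(frame_features, boundaries):
    """Aggregate frame-level speech representations into word-level
    segment embeddings by mean-pooling frames inside each segment.

    frame_features: (T, D) array of self-supervised features,
                    one vector per acoustic frame.
    boundaries:     list of (start_frame, end_frame) pairs, one per
                    hypothesized word-level segment.
    Returns a (num_segments, D) array, one embedding per segment,
    which can then be fed to a parser in place of word embeddings.
    """
    return np.stack([
        frame_features[start:end].mean(axis=0)
        for start, end in boundaries
    ])

# Toy example: 100 frames of 768-dim features split into three "words".
feats = np.random.randn(100, 768)
segs = [(0, 30), (30, 55), (55, 100)]
embeddings = segment_embeddings(feats, segs)  # shape (3, 768)
```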


