Sequence Model with Self-Adaptive Sliding Window for Efficient Spoken Document Segmentation

07/20/2021
by Qinglin Zhang, et al.

Transcripts generated by automatic speech recognition (ASR) systems for spoken documents lack structural annotations such as paragraphs, which significantly reduces their readability. Automatically predicting paragraph segmentation for spoken documents may improve both readability and performance on downstream NLP tasks such as summarization and machine reading comprehension. We propose a sequence model with a self-adaptive sliding window for accurate and efficient paragraph segmentation. We also propose an approach to exploit phonetic information, which significantly improves the robustness of spoken document segmentation to ASR errors. Evaluations are conducted on the English Wiki-727K document segmentation benchmark, a Chinese Wikipedia-based document segmentation dataset we created, and an in-house Chinese spoken document dataset. Our proposed model outperforms the state-of-the-art (SOTA) model built on the same BERT-Base, increasing segmentation F1 on the English benchmark by 4.2 points and on the Chinese datasets by 4.3-10.1 points, while reducing inference time to less than 1/6 of that of the current SOTA.
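The abstract names a self-adaptive sliding window but does not spell out the windowing rule. As a rough illustration only, the sketch below assumes one plausible reading: the model labels every sentence in a fixed-size window, and the next window starts right after the last boundary predicted in the current one, so each new window opens at a paragraph start. The function `segment_with_adaptive_window`, the `window_size` parameter, and the rule-based `toy_predict` stand-in for a BERT-based sentence labeler are all hypothetical names introduced here, not taken from the paper.

```python
from typing import Callable, List, Sequence


def segment_with_adaptive_window(
    sentences: Sequence[str],
    predict_window: Callable[[Sequence[str]], List[int]],
    window_size: int = 4,
) -> List[int]:
    """Return indices of sentences predicted to end a paragraph.

    Assumed rule (not confirmed by the abstract): after each window,
    slide to just past the last predicted boundary; if the window
    contains no boundary, slide by the full window length.
    """
    boundaries: List[int] = []
    start = 0
    while start < len(sentences):
        window = sentences[start:start + window_size]
        labels = predict_window(window)  # 1 = sentence ends a paragraph
        ends = [i for i, label in enumerate(labels) if label == 1]
        boundaries.extend(start + i for i in ends)
        if ends:
            # Adaptive step: restart right after the last boundary found.
            start += ends[-1] + 1
        else:
            # No boundary in this window: move past the whole window.
            start += len(window)
    return boundaries


def toy_predict(window: Sequence[str]) -> List[int]:
    """Hypothetical stand-in for a neural labeler: '##' marks a paragraph end."""
    return [1 if sentence.endswith("##") else 0 for sentence in window]


demo_sentences = ["a", "b ##", "c", "d", "e ##", "f"]
demo_boundaries = segment_with_adaptive_window(demo_sentences, toy_predict)
print(demo_boundaries)  # [1, 4]
```

In this toy run the second window starts at sentence index 2, immediately after the first predicted boundary, instead of at a fixed stride. A real system would replace `toy_predict` with a trained sentence-level classifier.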


