Topic-driven Distant Supervision Framework for Macro-level Discourse Parsing

05/23/2023
by Feng Jiang, et al.

Discourse parsing, the task of analyzing the internal rhetorical structure of texts, is a challenging problem in natural language processing. Despite recent advances in neural models, the lack of large-scale, high-quality training corpora remains a major obstacle. Recent studies have tried to overcome this limitation through distant supervision, which uses the outputs of other NLP tasks (e.g., sentiment polarity, attention matrices, and segmentation probabilities) to parse discourse trees. However, these methods ignore the differences between in-domain and out-of-domain tasks, which lowers their performance and prevents them from leveraging high-quality in-domain data for further improvement. To address these issues, we propose a distant supervision framework that exploits the relations between topic structure and rhetorical structure. Specifically, we propose two distantly supervised methods, one based on transfer learning and one based on a teacher-student model, that narrow the gap between in-domain and out-of-domain tasks through label mapping and oracle annotation. Experimental results on the MCDTB and RST-DT datasets show that our methods achieve the best performance in both distantly supervised and supervised settings.
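
The abstract notes that earlier distant-supervision work derives discourse trees from the outputs of other tasks, such as segmentation probabilities. As a minimal sketch of that general idea only (not the authors' framework, which additionally relies on label mapping, oracle annotation, transfer learning, and a teacher-student model), the hypothetical function `build_tree` below assumes one topic-boundary probability per gap between adjacent paragraphs and builds an unlabeled binary tree top-down, always splitting a span at its most likely boundary.

```python
from typing import Optional, Sequence, Tuple, Union

# A tree is either a leaf (a paragraph index) or a pair of subtrees.
Tree = Union[int, Tuple["Tree", "Tree"]]


def build_tree(boundary_probs: Sequence[float],
               start: int = 0,
               end: Optional[int] = None) -> Tree:
    """Greedily build a binary tree over paragraphs start..end (inclusive).

    Assumption (hypothetical input format): boundary_probs[i] is the
    probability of a topic boundary between paragraph i and paragraph i + 1,
    so a document with n paragraphs has n - 1 probabilities.
    """
    if end is None:
        end = len(boundary_probs)  # index of the last paragraph
    if start == end:
        return start  # a single paragraph is a leaf
    # Split the span at the gap with the highest boundary probability;
    # on ties, max() keeps the leftmost gap.
    split = max(range(start, end), key=lambda i: boundary_probs[i])
    left = build_tree(boundary_probs, start, split)
    right = build_tree(boundary_probs, split + 1, end)
    return (left, right)
```

For example, `build_tree([0.1, 0.9, 0.3])` over four paragraphs returns `((0, 1), (2, 3))`: the strongest boundary, between paragraphs 1 and 2, becomes the root split. A full macro-level parser must also predict nuclearity and rhetorical relations, which this sketch does not attempt.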
