Modeling Topical Coherence in Discourse without Supervision

09/02/2018
by Disha Shrivastava, et al.

Coherence of text is an important attribute to measure for both manually and automatically generated discourse, but well-defined quantitative metrics for it remain elusive. In this paper, we present a metric for scoring the topical coherence of an input paragraph on a real-valued scale by analyzing its underlying topical structure. We first extract all possible topics that the sentences of a paragraph are related to. The coherence of the text is then measured by computing: (a) the degree of uncertainty of the topics with respect to the paragraph, and (b) the relatedness between these topics. All components of our modular framework rely only on unlabeled data and WordNet, making it completely unsupervised, which is an important feature for general-purpose use of any metric. Experiments are conducted on two datasets: a publicly available dataset for essay grading (representing human discourse), and a synthetic dataset constructed by mixing content from multiple paragraphs covering diverse topics. Our evaluation shows that the measured coherence scores are positively correlated with the ground truth for both datasets. Further validation of our coherence scores is provided by human evaluation on the synthetic data, showing a significant agreement of 79.3
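To make the two ingredients named in the abstract more concrete, here is a minimal Python sketch of how topic uncertainty and topic relatedness might be turned into a single score. It is not the paper's formulation: the use of normalized Shannon entropy for uncertainty, the mean of pairwise relatedness values, the equal-weight combination, and every function name below are assumptions made for illustration. The relatedness function is passed in as a parameter; in practice it could be backed by a WordNet-based similarity, but the toy lookup table here is hypothetical.

```python
import math
from itertools import combinations


def topic_uncertainty(topic_weights):
    """Normalized Shannon entropy of a topic distribution, in [0, 1].

    Assumption: uncertainty is modeled as entropy; 0 means one dominant
    topic, 1 means the weight is spread evenly over all topics.
    """
    total = sum(w for w in topic_weights.values() if w > 0)
    probs = [w / total for w in topic_weights.values() if w > 0]
    if len(probs) <= 1:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(probs))


def mean_relatedness(topics, relatedness):
    """Average pairwise relatedness over all distinct topic pairs."""
    pairs = list(combinations(sorted(topics), 2))
    if not pairs:
        return 1.0  # a single topic is trivially related to itself
    return sum(relatedness(a, b) for a, b in pairs) / len(pairs)


def coherence_score(topic_weights, relatedness):
    """Hypothetical combination: equal-weight average of certainty
    (1 - uncertainty) and mean pairwise relatedness."""
    certainty = 1.0 - topic_uncertainty(topic_weights)
    related = mean_relatedness(topic_weights.keys(), relatedness)
    return 0.5 * (certainty + related)


# Toy relatedness table (hypothetical values, order-insensitive keys).
TOY_TABLE = {
    frozenset(("football", "soccer")): 0.9,
    frozenset(("cooking", "football")): 0.1,
}


def toy_relatedness(a, b):
    return TOY_TABLE.get(frozenset((a, b)), 0.0)


focused = coherence_score({"football": 0.9, "soccer": 0.1}, toy_relatedness)
mixed = coherence_score({"football": 0.5, "cooking": 0.5}, toy_relatedness)
print(focused > mixed)
```

Under these assumptions, a paragraph dominated by one topic with closely related secondary topics scores higher than one split evenly across unrelated topics, which matches the intuition behind the synthetic mixed-paragraph dataset described above.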
