Discourse in Multimedia: A Case Study in Information Extraction

11/13/2018
by Mrinmaya Sachan, et al.

To ensure readability, text is often written and presented with appropriate formatting. These formatting devices help the writer convey the narrative effectively and, at the same time, help readers pick up the structure of the discourse and comprehend the information conveyed. A number of linguistic theories address the discourse structure of text; however, these theories consider only unformatted text. Multimedia text contains rich formatting features that can be leveraged for various NLP tasks. In this paper, we study some of these discourse features in multimedia text and the communicative functions they fulfil in context. We examine how these multimedia discourse features can be used to improve an information extraction system, and we show that discourse and text layout features provide information complementary to the lexical semantic information commonly used for information extraction. As a case study, we use these features to harvest structured subject knowledge of geometry from textbooks. We show that the harvested structured knowledge can be used to improve an existing solver for geometry problems, making it both more accurate and more explainable.
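The claim that layout features complement lexical ones can be illustrated with a minimal sketch. Everything in it is hypothetical: the `TextSpan` class, the cue phrases, and the feature names are illustrative assumptions, not the features used in the paper. Two spans with identical wording differ only in whether they appear in a callout box, so the lexical features cannot tell them apart while a layout feature can.

```python
from dataclasses import dataclass


@dataclass
class TextSpan:
    """Hypothetical unit of textbook text together with its formatting attributes."""
    text: str
    bold: bool = False      # rendered in bold face
    boxed: bool = False     # enclosed in a callout box
    bulleted: bool = False  # part of a bulleted or numbered list


# Hypothetical cue phrases; a real system would learn lexical cues from data.
LEXICAL_CUES = ("is called", "if and only if", "theorem", "postulate")


def lexical_features(span: TextSpan) -> dict[str, int]:
    """Binary indicators for each cue phrase occurring in the span's text."""
    lowered = span.text.lower()
    return {f"lex:{cue}": int(cue in lowered) for cue in LEXICAL_CUES}


def layout_features(span: TextSpan) -> dict[str, int]:
    """Binary indicators for the span's formatting attributes."""
    return {
        "layout:bold": int(span.bold),
        "layout:boxed": int(span.boxed),
        "layout:bulleted": int(span.bulleted),
    }


def features(span: TextSpan) -> dict[str, int]:
    """Union of both feature groups, as a downstream classifier would see them."""
    return {**lexical_features(span), **layout_features(span)}


# Same wording, different presentation.
plain = TextSpan("The sum of the angles of a triangle is 180 degrees.")
boxed = TextSpan("The sum of the angles of a triangle is 180 degrees.", boxed=True)

print(lexical_features(plain) == lexical_features(boxed))  # True
print(features(plain) == features(boxed))                  # False
```

Keeping the two feature groups in separate functions makes it easy to train with lexical features alone and then with both, which is one simple way to check whether layout cues add information beyond the wording.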
