Generating Texts with Integer Linear Programming

10/31/2018
by Gerasimos Lampouras, et al.

Concept-to-text generation typically employs a pipeline architecture, which often leads to suboptimal texts. Content selection, for example, may greedily select the most important facts, which may, however, require too many words to express; this may be undesirable when space is limited or expensive. Selecting other facts, possibly only slightly less important, may allow the lexicalization stage to use far fewer words, or to report more facts in the same space. Decisions made during content selection and lexicalization may also lead to more or fewer sentence aggregation opportunities, affecting the length and readability of the resulting texts. Building upon a publicly available, state-of-the-art natural language generator for Semantic Web ontologies, this article presents an Integer Linear Programming model that, unlike pipeline architectures, jointly considers the choices available in content selection, lexicalization, and sentence aggregation, in order to avoid greedy local decisions and produce more compact texts, i.e., texts that report more facts per word. Compact texts are desirable, for example, when generating advertisements to be included in Web search results, or when summarizing structured information in limited space. An extended version of the proposed model also considers a limited form of referring expression generation and avoids redundant sentences. An approximation of the two models can be used when longer texts need to be generated. Experiments with three ontologies confirm that the proposed models lead to more compact texts than pipeline systems, with no deterioration, or even improvements, in the perceived quality of the generated texts.
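The contrast between greedy content selection and joint decisions can be illustrated with a toy example. This is a minimal sketch, not the article's ILP model: it ignores sentence aggregation and referring expressions, and instead of calling an ILP solver it exhaustively enumerates the same 0/1 choices (which facts to express, and which lexicalization to use for each) under an assumed word budget. The fact names, importance scores, word counts, budget, and function names are all hypothetical.

```python
from itertools import product

# Hypothetical toy input: fact -> (importance score, word counts of its
# alternative lexicalizations). None of these values come from the article.
FACTS = {
    "a": (1.0, [10]),
    "b": (0.9, [4, 6]),
    "c": (0.8, [3]),
    "d": (0.7, [3]),
}
BUDGET = 10  # assumed maximum number of words in the generated text


def greedy_pipeline(facts, budget):
    """Content selection first: take facts in order of importance, each with
    its shortest lexicalization, as long as it still fits in the budget."""
    chosen, words = {}, 0
    for name, (_, lengths) in sorted(facts.items(), key=lambda kv: -kv[1][0]):
        shortest = min(lengths)
        if words + shortest <= budget:
            chosen[name] = shortest
            words += shortest
    return chosen


def joint_search(facts, budget):
    """Jointly choose facts and lexicalizations by enumerating every 0/1
    assignment (exponential; a stand-in for an ILP solver on toy inputs)."""
    names = list(facts)
    # For each fact: None (not expressed) or one of its lexicalization lengths.
    options = [[None] + facts[n][1] for n in names]
    best, best_key = {}, (0.0, 0)
    for combo in product(*options):
        chosen = {n: w for n, w in zip(names, combo) if w is not None}
        words = sum(chosen.values())
        if words > budget:
            continue  # violates the word-budget constraint
        score = sum(facts[n][0] for n in chosen)
        key = (score, -words)  # prefer higher importance, then fewer words
        if key > best_key:
            best, best_key = chosen, key
    return best


print(greedy_pipeline(FACTS, BUDGET))  # {'a': 10}
print(joint_search(FACTS, BUDGET))     # {'b': 4, 'c': 3, 'd': 3}
```

Here greedy selection spends the whole budget on the single most important fact, while the joint search reports three slightly less important facts in the same ten words. Enumeration grows exponentially with the number of facts, which is why ILP formulations hand the equivalent 0/1 program to a solver instead.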
