CoSMo: A constructor specification language for Abstract Wikipedia's content selection process

08/01/2023
by Kutz Arrieta, et al.

Representing snippets of information abstractly is a task that needs to be performed for various purposes, such as database view specification and the first stage of the natural language generation pipeline for generative AI from structured input, i.e., the content selection stage, which determines what needs to be verbalised. For the Abstract Wikipedia project, requirements analysis revealed that such an abstract representation requires multilingual modelling, content selection covering declarative content and functions, and both classes and instances. There is no modelling language that meets any one of these three requirements, let alone a combination of them. Following a rigorous language design process that included broad stakeholder consultation, we created CoSMo, a novel Content Selection Modeling language that meets these and other requirements, so that it may be useful both in Abstract Wikipedia and in other contexts. We describe the design process, the rationale and choices, the specification, and a preliminary evaluation of the language.


