SPaR.txt, a cheap Shallow Parsing approach for Regulatory texts

10/04/2021
by   Ruben Kruiper, et al.

Automated Compliance Checking (ACC) systems aim to semantically parse building regulations into a set of rules. However, semantic parsing is known to be hard and requires large amounts of training data. The complexity of creating such training data has led to research that focuses on small sub-tasks, such as shallow parsing or the extraction of a limited subset of rules. This study introduces a shallow parsing task for which training data is relatively cheap to create, with the aim of learning a lexicon for ACC. We annotate a small domain-specific dataset of 200 sentences, SPaR.txt, and train a sequence tagger that achieves an F1-score of 79.93 on the test set. We then show through manual evaluation that the model identifies most (89.84%) defined terms in a set of building regulation documents, and that both contiguous and discontiguous Multi-Word Expressions (MWE) are discovered with reasonable accuracy (70.3%).
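As a rough illustration of how a span-level F1-score for a sequence tagger can be computed, the sketch below assumes a BIO tagging scheme and exact-match span scoring. Neither assumption is stated in the abstract; the label name `TERM` and the helper names `bio_to_spans` and `span_f1` are hypothetical, and plain BIO tags as used here only capture contiguous spans, not discontiguous MWEs.

```python
from typing import List, Set, Tuple

# (label, start index, end index exclusive); hypothetical representation
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect contiguous labelled spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    label, start = None, 0
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.add((label, start, i))
            label = None
        # "B-" always opens a span; a stray "I-" opens one if none is active.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return spans


def span_f1(gold_seqs: List[List[str]], pred_seqs: List[List[str]]) -> float:
    """Micro-averaged F1 over spans; a span counts only if label and bounds match."""
    true_pos = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold_seqs, pred_seqs):
        gold_spans = bio_to_spans(gold_tags)
        pred_spans = bio_to_spans(pred_tags)
        true_pos += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy example: the prediction finds one of the two gold spans.
    gold = [["B-TERM", "I-TERM", "O", "B-TERM"]]
    pred = [["B-TERM", "I-TERM", "O", "O"]]
    print(round(span_f1(gold, pred), 4))
```

In the toy example, precision is 1.0 and recall is 0.5, which gives an F1 of about 0.667. Scores such as the one reported in the abstract are conventionally stated on a 0 to 100 scale.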


