Extraction and Evaluation of Formulaic Expressions Used in Scholarly Papers

06/18/2020
by Kenichi Iwatsuki, et al.

Formulaic expressions, such as 'in this paper we propose', are helpful for authors of scholarly papers because they convey communicative functions; the example above conveys the function 'showing the aim of this paper'. A resource of formulaic expressions that can be looked up easily, such as a dictionary, would therefore be useful. However, formulaic expressions often vary widely in form. For example, 'in this paper we propose', 'in this study we propose' and 'in this paper we propose a new method to' are all regarded as formulaic expressions. This diversity of spans and forms causes problems in both the extraction and the evaluation of formulaic expressions. In this paper, we propose a new approach that is robust to variation in the spans and forms of formulaic expressions. Our approach regards a sentence as consisting of a formulaic part and a non-formulaic part. Instead of extracting formulaic expressions from a whole corpus, we extract them from each sentence, so that different forms can be dealt with at once. Based on this formulation, and to avoid the diversity problem, we propose evaluating extraction methods by how well the extracted expressions convey specific communicative functions rather than by comparing them to an existing lexicon. We also propose a new extraction method that utilises named entities and dependency structures to remove the non-formulaic part from a sentence. Experimental results show that the proposed extraction method performed best among the methods compared.
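The idea of removing the non-formulaic part of a sentence can be illustrated with a minimal sketch. The abstract does not state the actual removal rules, so the rule used here (drop every named-entity token together with all of its dependency descendants) is an assumption for illustration only, not the authors' method. The `Token` class, the `is_entity` flag, the hand-written dependency heads and the name 'FooNet' are all hypothetical; in practice these annotations would come from a parser and a named-entity recogniser.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    head: int        # index of the syntactic head; the root points to itself
    is_entity: bool  # hypothetical flag: token belongs to a named entity


def strip_non_formulaic(tokens):
    """Drop entity tokens and everything that depends on them (assumed rule)."""
    removed = {i for i, t in enumerate(tokens) if t.is_entity}
    changed = True
    while changed:
        # Repeat until no further token has a removed head, so that
        # indirect dependents of an entity are removed as well.
        changed = False
        for i, t in enumerate(tokens):
            if i not in removed and t.head in removed:
                removed.add(i)
                changed = True
    return [t.text for i, t in enumerate(tokens) if i not in removed]


# "In this paper we propose the novel FooNet", annotated by hand.
example = [
    Token("In", 2, False),
    Token("this", 2, False),
    Token("paper", 4, False),
    Token("we", 4, False),
    Token("propose", 4, False),  # root
    Token("the", 7, False),
    Token("novel", 7, False),
    Token("FooNet", 4, True),    # hypothetical named entity
]

formulaic_part = strip_non_formulaic(example)
print(" ".join(formulaic_part))
```

Under this assumed rule, the entity 'FooNet' and its modifiers 'the' and 'novel' are removed, leaving a span that resembles the formulaic expression quoted in the abstract.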


research
11/20/2019

Casting a Wide Net: Robust Extraction of Potentially Idiomatic Expressions

Idiomatic expressions like `out of the woods' and `up the ante' present ...
research
08/25/2022

Hiding canonicalisation in tensor computer algebra

Simplification of expressions in computer algebra systems often involves...
research
04/29/2020

AxCell: Automatic Extraction of Results from Machine Learning Papers

Tracking progress in machine learning has become increasingly difficult ...
research
03/13/2022

Informative Causality Extraction from Medical Literature via Dependency-tree based Patterns

Extracting cause-effect entities from medical literature is an important...
research
02/04/2023

FGSI: Distant Supervision for Relation Extraction method based on Fine-Grained Semantic Information

The main purpose of relation extraction is to extract the semantic relat...
research
04/30/2020

Unlocking the Power of Deep PICO Extraction: Step-wise Medical NER Identification

The PICO framework (Population, Intervention, Comparison, and Outcome) i...
research
12/01/2016

Multilingual Multiword Expressions

The project aims to provide a semi-supervised approach to identify Multi...
