Discovery and Recognition of Formula Concepts using Machine Learning

03/03/2023
by Philipp Scharpf, et al.

Citation-based Information Retrieval (IR) methods for scientific documents have proven effective for IR applications, such as Plagiarism Detection or Literature Recommender Systems, in academic disciplines that use many references. In science, technology, engineering, and mathematics, researchers often employ mathematical concepts through formula notation to refer to prior knowledge. Our long-term goal is to generalize citation-based IR methods and apply this generalized method to both classical references and mathematical concepts. In this paper, we suggest how mathematical formulas could be cited and define a Formula Concept Retrieval task with two subtasks: Formula Concept Discovery (FCD) and Formula Concept Recognition (FCR). While FCD aims at the definition and exploration of a 'Formula Concept' that names a bundle of equivalent representations of a formula, FCR is designed to match a given formula to a previously assigned unique mathematical concept identifier. We present machine learning-based approaches to address the FCD and FCR tasks. We then evaluate these approaches on a standardized test collection (NTCIR arXiv dataset). Our FCD approach yields a precision of 68% for retrieving equivalent representations of frequent formulas and a recall of 72% for extracting the formula name from the surrounding text. FCD and FCR enable the citation of formulas within mathematical documents and facilitate semantic search and question answering as well as document similarity assessments for plagiarism detection or recommender systems.
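To make the two task definitions more concrete, here is a minimal sketch, assuming a Formula Concept can be modeled as a unique identifier, a name, and a set of equivalent LaTeX representations (the bundle that FCD is meant to discover). FCR is reduced here to an exact lookup after a crude string normalization. This is a hypothetical baseline for illustration only, not the machine learning approach evaluated in the paper; the identifiers `FC-0001` and `FC-0002`, the `normalize` rules, and the `recognize` function are all invented for this example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


def normalize(tex: str) -> str:
    """Hypothetical, deliberately crude normalization of a LaTeX string."""
    # Drop \left and \right sizing commands, but not e.g. \rightarrow.
    tex = re.sub(r"\\(?:left|right)(?![A-Za-z])", "", tex)
    # Ignore all whitespace.
    return re.sub(r"\s+", "", tex)


@dataclass
class FormulaConcept:
    """A named bundle of equivalent formula representations (assumed model)."""
    identifier: str
    name: str
    representations: set[str] = field(default_factory=set)

    def add(self, tex: str) -> None:
        # Store representations in normalized form so lookups are consistent.
        self.representations.add(normalize(tex))


def recognize(tex: str, concepts: list[FormulaConcept]) -> str | None:
    """Toy FCR: return the identifier of the first concept containing the formula."""
    key = normalize(tex)
    for concept in concepts:
        if key in concept.representations:
            return concept.identifier
    return None  # unknown formula: no concept assigned yet


mass_energy = FormulaConcept("FC-0001", "mass-energy equivalence")
mass_energy.add("E = mc^2")
mass_energy.add("E = m c^{2}")
pythagoras = FormulaConcept("FC-0002", "Pythagorean theorem")
pythagoras.add("a^2 + b^2 = c^2")
concepts = [mass_energy, pythagoras]

print(recognize("E=m c^{2}", concepts))  # prints FC-0001
```

Exact matching only succeeds for representations that were already collected, which is why discovering the bundles (FCD) and learning to generalize beyond surface forms are the harder parts of the task.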

01/25/2018
Analyzing Similarity in Mathematical Content To Enhance the Detection of Academic Plagiarism
Despite the effort put into the detection of academic plagiarism, it con...

06/27/2019
Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations
Identifying academic plagiarism is a pressing task for educational and r...

04/11/2021
Fast Linking of Mathematical Wikidata Entities in Wikipedia Articles Using Annotation Recommendation
Mathematical information retrieval (MathIR) applications such as semanti...

12/04/2020
ARQMath Lab: An Incubator for Semantic Formula Search in zbMATH Open?
The zbMATH database contains more than 4 million bibliographic entries. ...

05/20/2019
Why Machines Cannot Learn Mathematics, Yet
Nowadays, Machine Learning (ML) is seen as the universal solution to imp...

04/13/2018
Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context
Mathematical formulae represent complex semantic information in a concis...

04/17/2023
What Makes a Good Dataset for Symbol Description Reading?
The usage of mathematical formulas as concise representations of a docum...