Towards Explaining STEM Document Classification using Mathematical Entity Linking

09/02/2021
by Philipp Scharpf, et al.

Document subject classification is essential for structuring (digital) libraries and allowing readers to search within a specific field. Currently, the classification is typically performed by human domain experts. Semi-supervised Machine Learning algorithms can support them by exploiting labeled data to predict subject classes for new, unclassified documents. However, whereas humans at least partly explain the reasons for their decisions, machines mostly do not. Recently, explainable AI research, which addresses the problem of Machine Learning decisions being a black box, has gained increasing interest. Explainer models have already been applied to the classification of natural language texts, such as legal or medical documents. Documents from Science, Technology, Engineering, and Mathematics (STEM) disciplines are more difficult to analyze, since they contain both textual and mathematical formula content. In this paper, we present first advances towards STEM document classification explainability using classical and mathematical Entity Linking. We examine relationships between textual and mathematical subject classes and entities by mining a collection of documents from the arXiv preprint repository (NTCIR and zbMATH datasets). The results indicate that mathematical entities have the potential to provide high explainability, as they are a crucial part of a STEM document.


