MuLMS-AZ: An Argumentative Zoning Dataset for the Materials Science Domain

07/05/2023
by   Timo Pierre Schrader, et al.

Scientific publications follow conventionalized rhetorical structures. Classifying the Argumentative Zone (AZ), e.g., identifying whether a sentence states a Motivation, a Result, or Background information, has been proposed to improve processing of scholarly documents. In this work, we adapt and extend this idea to the domain of materials science research. We present and release a new dataset of 50 manually annotated research articles. The dataset spans seven sub-topics and is annotated with a materials-science-focused multi-label annotation scheme for AZ. We detail corpus statistics and demonstrate high inter-annotator agreement. Our computational experiments show that using domain-specific pre-trained transformer-based text encoders is key to high classification performance. We also find that AZ categories from existing datasets in other domains are transferable to varying degrees.
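
To make the multi-label aspect concrete, the following is a minimal sketch of how per-label scores for one sentence could be decoded into a set of zones. It is not the authors' implementation: the label list contains only the three zone names mentioned in the abstract (the full MuLMS-AZ label set is not given here), and the `decode_labels` helper, the 0.5 threshold, and the example scores are all assumptions made for illustration.

```python
import math

# Illustrative subset: only these three zone names appear in the abstract.
# The complete annotation scheme is not listed here.
LABELS = ["Motivation", "Result", "Background"]


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_labels(logits, threshold=0.5):
    """Multi-label decoding: each label is accepted or rejected on its own,
    so a sentence may receive zero, one, or several zones.
    The threshold value is an assumption, not taken from the paper."""
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


# Hypothetical per-label scores for a single sentence (not real model output).
example_logits = [2.0, 1.2, -3.0]
print(decode_labels(example_logits))
```

In contrast to single-label classification, where the highest-scoring class wins, this kind of decoding lets a sentence be tagged with more than one zone at once, which is what a multi-label scheme requires.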


