The SOFC-Exp Corpus and Neural Approaches to Information Extraction in the Materials Science Domain

06/04/2020
by Annemarie Friedrich, et al.

This paper presents a new, challenging information extraction task in the domain of materials science. We develop an annotation scheme for marking information on experiments related to solid oxide fuel cells in scientific publications, such as involved materials and measurement conditions. With this paper, we publish our annotation guidelines, as well as our SOFC-Exp corpus consisting of 45 open-access scholarly articles annotated by domain experts. A corpus and an inter-annotator agreement study demonstrate the complexity of the suggested named entity recognition and slot filling tasks as well as high annotation quality. We also present strong neural network-based models for a variety of tasks that can be addressed on the basis of our new data set. On all tasks, using BERT embeddings leads to large performance gains, but with increasing task complexity, adding a recurrent neural network on top seems beneficial. Our models will serve as competitive baselines in future work, and analysis of their performance highlights difficult cases when modeling the data and suggests promising research directions.
