Multimodal Approach for Metadata Extraction from German Scientific Publications

11/10/2021
by Azeddine Bouabdallah, et al.

Nowadays, authors usually supply metadata themselves upon submission. However, a significant share of existing research papers has missing or incomplete metadata. German scientific papers come in a large variety of layouts, which makes metadata extraction a non-trivial task that requires a precise way to classify the metadata extracted from the documents. In this paper, we propose a multimodal deep learning approach for metadata extraction from scientific papers in the German language. We consider multiple types of input data by combining natural language processing and computer vision. The model aims to increase the overall accuracy of metadata extraction compared to other state-of-the-art approaches by exploiting both spatial and contextual features for a more reliable extraction. It was trained on a dataset of around 8800 documents and obtains an overall F1-score of 0.923.
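For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1-score over metadata classes can be computed. The abstract does not say how the overall score is averaged, so the macro average used here is an assumption, and the label names (`title`, `author`, `abstract`) and function names are hypothetical.

```python
from collections import Counter


def f1_per_class(gold, pred):
    """Return {label: F1} for two equal-length sequences of class labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # predicted class p where it does not belong
            fn[g] += 1   # missed an instance of class g
    scores = {}
    for label in set(gold) | set(pred):
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = f1_per_class(gold, pred)
    if not scores:
        return 0.0
    return sum(scores.values()) / len(scores)


# Hypothetical labels for four extracted text regions.
gold = ["title", "author", "author", "abstract"]
pred = ["title", "author", "abstract", "abstract"]
print(f1_per_class(gold, pred))
print(round(macro_f1(gold, pred), 3))
```

A macro average weights every metadata class equally, so rare classes count as much as frequent ones; a micro or support-weighted average would give a different number on the same predictions.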


research
06/04/2021

MexPub: Deep Transfer Learning for Metadata Extraction from German Publications

Extracting metadata from scientific papers can be considered a solved pr...
research
11/04/2022

SMAuC – The Scientific Multi-Authorship Corpus

The rapidly growing volume of scientific publications offers an interest...
research
10/27/2017

New Methods for Metadata Extraction from Scientific Literature

Within the past few decades we have witnessed digital revolution, which ...
research
07/13/2020

GGPONC: A Corpus of German Medical Text with Rich Metadata Based on Clinical Practice Guidelines

The lack of publicly available text corpora is a major obstacle for prog...
research
07/24/2023

Making Metadata More FAIR Using Large Language Models

With the global increase in experimental data artifacts, harnessing them...
research
07/13/2017

Classifying document types to enhance search and recommendations in digital libraries

In this paper, we address the problem of classifying documents available...
research
04/25/2011

Bayesian approach for near-duplicate image detection

In this paper we propose a bayesian approach for near-duplicate image de...
