Structured references from PDF articles: assessing the tools for bibliographic reference extraction and parsing

05/29/2022
by Alessia Cioffi, et al.

Many solutions have been proposed for extracting bibliographic references from PDF papers. Machine learning, rule-based and regular-expression approaches are among the methods most commonly adopted by tools addressing this task. This work aims to identify and evaluate all and only the tools which, given a full-text paper in PDF format, can recognise, extract and parse bibliographic references. We identified seven tools: Anystyle, Cermine, ExCite, Grobid, Pdfssa4met, Scholarcy and Science Parse. We compared and evaluated them against a corpus of 56 PDF articles published in 27 subject areas. Anystyle obtained the best overall score, followed by Cermine. However, in some subject areas, other tools had better results for specific tasks.
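
To make the idea of scoring a parsed reference concrete, here is a minimal Python sketch of a field-level comparison between a tool's output and a hand-annotated gold reference. The abstract does not say which metric the authors used; precision, recall and F1 are assumed here because they are a common choice for this kind of comparison. The function name `field_scores`, the field names and the sample values are all hypothetical.

```python
from collections import Counter


def field_scores(gold, predicted):
    """Return (precision, recall, F1) over the (field, value) pairs of one reference.

    Assumption: a predicted field counts as correct only if its value matches
    the gold value exactly after trimming whitespace and lowercasing.
    """
    gold_items = Counter((k, v.strip().lower()) for k, v in gold.items())
    pred_items = Counter((k, v.strip().lower()) for k, v in predicted.items())

    # Pairs present in both the gold annotation and the tool output.
    true_pos = sum((gold_items & pred_items).values())

    precision = true_pos / sum(pred_items.values()) if pred_items else 0.0
    recall = true_pos / sum(gold_items.values()) if gold_items else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold annotation and tool output for a single reference.
gold = {"author": "Doe, J.", "title": "An Example Title", "year": "2020"}
predicted = {"author": "Doe, J.", "title": "An Example", "year": "2020"}

precision, recall, f1 = field_scores(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching is deliberately strict: the truncated title above counts as a miss. A real evaluation might average such scores per tool and per subject area, or use a more lenient string comparison.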

research
05/26/2022

The way we cite: common metadata used across disciplines for defining bibliographic references

Current citation practices observed in articles are very noisy, confusin...
research
08/05/2019

Backronym

The field of Machine Learning research is divided into subject areas, wh...
research
09/11/2020

Citing and referencing habits in Medicine and Social Sciences journals in 2019

This article explores citing and referencing systems in Social Sciences ...
research
08/11/2020

Nature, Science, and PNAS – Disciplinary profiles and impact

Nature, Science, and PNAS are the three most prestigious general-science...
research
08/26/2020

MetaMetaZipf. What do analyses of city size distributions have in common?

In this article, I conduct a textual and contextual analysis of the empi...
research
02/04/2018

Evaluation and Comparison of Open Source Bibliographic Reference Parsers: A Business Use Case

Bibliographic reference parsing refers to extracting machine-readable me...
