ParsRec: A Novel Meta-Learning Approach to Recommending Bibliographic Reference Parsers

11/26/2018
by Dominika Tkaczyk, et al.

Bibliographic reference parsers extract machine-readable metadata such as author names, title, journal, and year from bibliographic reference strings. To extract the metadata, the parsers apply heuristics or machine learning. However, no reference parser, and no algorithm, consistently gives the best results in every scenario. For instance, one tool may be best at extracting titles in ACM citation style, but only third best when APA is used. Another tool may be best at extracting English author names, while yet another is best for noisy data (i.e. inconsistent citation styles). In this paper, which is an extended version of our recent RecSys poster, we address the problem of reference parsing from a recommender-systems and meta-learning perspective. We propose ParsRec, a meta-learning-based recommender system that recommends the potentially most effective parser for a given reference string. ParsRec recommends one of 10 open-source parsers: Anystyle-Parser, Biblio, CERMINE, Citation, Citation-Parser, GROBID, ParsCit, PDFSSA4MET, Reference Tagger, and Science Parse. We evaluate ParsRec on 105k references from chemistry. We propose two approaches to meta-learning recommendations. The first approach learns the best parser for an entire reference string. The second approach learns the best parser for each metadata type in a reference string. The second approach achieved a 2.6% increase in F1 over the best single parser (GROBID), reducing the false positive rate by 20.2% and the false negative rate by 18.9%.
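To make the difference between the two approaches concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single hypothetical feature (`style_bucket`), a hypothetical per-field quality score in [0, 1] for each parser, and a fallback to GROBID (the best single parser named above) when no training data matches. The class and method names are illustrative only.

```python
from collections import defaultdict


def style_bucket(reference: str) -> str:
    """Hypothetical one-feature description of a reference string."""
    has_parens = "(" in reference and ")" in reference
    return "year_in_parens" if has_parens else "other"


class ParserRecommender:
    """Toy meta-learner: remembers mean parser scores per (bucket, field)."""

    def __init__(self, default: str = "GROBID"):
        self.default = default  # assumed fallback when nothing was observed
        # (bucket, field, parser) -> [sum of scores, number of observations]
        self._stats = defaultdict(lambda: [0.0, 0])

    def observe(self, reference: str, field: str, parser: str, score: float) -> None:
        """Record how well `parser` extracted `field` from a training reference."""
        entry = self._stats[(style_bucket(reference), field, parser)]
        entry[0] += score
        entry[1] += 1

    def _best(self, totals: dict) -> str:
        """Return the parser with the highest mean score, or the default."""
        best, best_mean = self.default, -1.0
        for parser in sorted(totals):  # sorted for deterministic tie-breaking
            total, count = totals[parser]
            if total / count > best_mean:
                best, best_mean = parser, total / count
        return best

    def recommend_field(self, reference: str, field: str) -> str:
        """Second approach: one recommendation per metadata type."""
        bucket = style_bucket(reference)
        totals = {p: s for (b, f, p), s in self._stats.items()
                  if b == bucket and f == field}
        return self._best(totals)

    def recommend_whole(self, reference: str) -> str:
        """First approach: one recommendation for the entire reference string."""
        bucket = style_bucket(reference)
        totals = defaultdict(lambda: [0.0, 0])
        for (b, _field, parser), (total, count) in self._stats.items():
            if b == bucket:
                totals[parser][0] += total
                totals[parser][1] += count
        return self._best(dict(totals))
```

In this sketch, the per-field variant can pick different parsers for the title and the authors of the same string, whereas the whole-reference variant must settle on the parser with the best average across fields.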

research
08/27/2018

ParsRec: Meta-Learning Recommendations for Bibliographic Reference Parsing

Bibliographic reference parsers extract metadata (e.g. author names, tit...
research
02/04/2018

Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers

Bibliographic reference parsing refers to extracting machine-readable me...
research
02/04/2018

Evaluation and Comparison of Open Source Bibliographic Reference Parsers: A Business Use Case

Bibliographic reference parsing refers to extracting machine-readable me...
research
05/12/2018

Citation Data-set for Machine Learning Citation Styles and Entity Extraction from Citation Strings

Citation parsing is fundamental for search engines within academia and t...
research
12/18/2019

Meta-Learned Per-Instance Algorithm Selection in Scholarly Recommender Systems

The effectiveness of recommender system algorithms varies in different r...
research
06/20/2019

Cleaning Noisy and Heterogeneous Metadata for Record Linking Across Scholarly Big Datasets

Automatically extracted metadata from scholarly documents in PDF formats...
research
06/11/2019

EXmatcher: Combining Features Based on Reference Strings and Segments to Enhance Citation Matching

Citation matching is a challenging task due to different problems such a...
