Classifying document types to enhance search and recommendations in digital libraries

07/13/2017
by Aristotelis Charalampous, et al.

In this paper, we address the problem of classifying documents available from the global network of (open access) repositories according to their type. We show that the metadata that would enable us to distinguish research papers, theses and slides are missing from repository records in over 60% of cases. While metadata describing document types are useful in a variety of scenarios, ranging from research analytics to improving search and recommender (SR) systems, this problem has not yet been sufficiently addressed in the context of the repositories infrastructure. We have developed a new approach for classifying document types using supervised machine learning based exclusively on text-specific features. We achieve an F1-score of 0.96 with the random forest and AdaBoost classifiers, which are the best-performing models on our data. By analysing the SR system logs of the CORE [1] digital library aggregator, we show that users are an order of magnitude more likely to click on research papers and theses than on slides. This suggests that using document type as a feature for ranking or filtering SR results in digital libraries has the potential to improve user experience.
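
The abstract does not say which text-specific features were used or how the 0.96 F1-score was averaged. The following Python sketch is therefore illustrative only. It makes one hypothetical assumption: each document is available as extracted plain text split into pages. From that text it computes a few length-based features. It then scores predictions with per-class F1 = 2PR / (P + R) and with the unweighted (macro) mean across classes. The names `Document`, `text_features` and `f1_scores` and the chosen features are hypothetical and do not come from the paper. The sketch does not implement random forest or AdaBoost. In practice, the feature values would be converted to numeric vectors and passed to such a classifier, for example scikit-learn's `RandomForestClassifier` or `AdaBoostClassifier`.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical input format: extracted plain text, one string per page.
    pages: list[str]


def text_features(doc: Document) -> dict[str, float]:
    """Hypothetical length-based features computed from the text alone.

    Assumption (not from the paper): slides tend to have many short pages,
    theses many long pages, and research papers fall in between.
    """
    n_pages = len(doc.pages)
    words_per_page = [len(page.split()) for page in doc.pages]
    lines_per_page = [len(page.splitlines()) for page in doc.pages]
    total_words = sum(words_per_page)
    return {
        "n_pages": float(n_pages),
        "total_words": float(total_words),
        # Guard against empty documents to avoid division by zero.
        "avg_words_per_page": total_words / n_pages if n_pages else 0.0,
        "avg_lines_per_page": sum(lines_per_page) / n_pages if n_pages else 0.0,
    }


def f1_scores(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Per-class F1 = 2PR / (P + R), plus the unweighted (macro) mean."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1  # predicted this class, but wrongly
            fn[true_label] += 1  # missed the true class
    scores: dict[str, float] = {}
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        scores[label] = (2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    # Macro average: every document type counts equally, regardless of size.
    scores["macro"] = sum(scores[label] for label in labels) / len(labels)
    return scores
```

Consider a two-page document whose pages are "Title\nIntro text here" and "More words on page two". It yields 9 words in total, 4.5 words per page and 1.5 lines per page. Now take the gold labels `["paper", "paper", "thesis", "slides"]` against the predictions `["paper", "thesis", "thesis", "slides"]`. The per-class F1 is 2/3 for "paper", 2/3 for "thesis" and 1.0 for "slides", which gives a macro F1 of 7/9. Macro averaging is only one choice. A support-weighted average would give more influence to the most frequent document type.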

research
08/01/2020

Contextual Document Similarity for Content-based Literature Recommender Systems

To cope with the ever-growing information overload, an increasing number...
research
05/27/2019

Document Embeddings vs. Keyphrases vs. Terms: An Online Evaluation in Digital Library Recommender Systems

Many recommendation algorithms are available to digital library recommen...
research
03/22/2020

Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles

Many digital libraries recommend literature to their users considering t...
research
11/10/2021

Multimodal Approach for Metadata Extraction from German Scientific Publications

Nowadays, metadata information is often given by the authors themselves ...
research
10/31/2022

User Manual of Automatic Data Curation Tool(ADCT): A bulk data curator software in Library and Information Science

In library and information science, document storage and user-specific d...
research
01/14/2022

Towards Reducing Manual Workload in Technology-Assisted Reviews: Estimating Ranking Performance

Conducting a systematic review (SR) is comprised of multiple tasks: (i) ...
research
04/17/2018

Prioritizing and Scheduling Conferences for Metadata Harvesting in dblp

Maintaining literature databases and online bibliographies is a core res...