Content-based subject classification at article level in biomedical context

by Eric Jeangirard, et al.

Subject classification is an important task in the analysis of scholarly publications. Two kinds of approaches are mainly used: classification at the journal level and classification at the article level. We propose a mixed approach that leverages embedding techniques from NLP: classifiers are trained on article metadata (in particular title, abstract, and keywords) labelled with the journal-level FoR (Fields of Research) classification, and are then applied at the article level. We use this approach in the context of biomedical publications, using metadata from PubMed. FastText classifiers are trained with FoR codes and used to classify publications based on their available metadata. Results show that using a stratified sampling strategy for training helps reduce the bias due to the unbalanced field distribution. An implementation of the method is proposed on the repository
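
To make the training-data preparation more concrete, here is a minimal Python sketch of one plausible reading of the stratified sampling step: cap the number of training articles per field, then write each article in the `__label__<code> <text>` line format that fastText expects for supervised training. The record keys (`for_code`, `title`, `abstract`, `keywords`), the helper names, the placeholder codes `field_a` and `field_b`, and the cap of two articles per field are all assumptions made for illustration; they are not taken from the authors' implementation.

```python
import random
from collections import defaultdict


def stratified_sample(records, per_label, seed=0):
    """Keep at most `per_label` records for each journal-level label.

    Hypothetical helper: the abstract does not specify the exact strategy.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec["for_code"]].append(rec)

    sample = []
    for label in sorted(by_label):
        group = by_label[label]
        # Small fields keep everything; large fields are down-sampled.
        sample.extend(rng.sample(group, min(per_label, len(group))))
    return sample


def to_fasttext_line(rec):
    """Format one record as a fastText supervised training line."""
    text = " ".join([rec["title"], rec["abstract"], " ".join(rec["keywords"])])
    text = " ".join(text.lower().split())  # lowercase, collapse whitespace
    return f"__label__{rec['for_code']} {text}"


# Toy, made-up records: one over-represented field and one rare field.
records = [
    {"for_code": "field_a", "title": f"Clinical study {i}",
     "abstract": "Patients were followed.", "keywords": ["cohort"]}
    for i in range(8)
] + [
    {"for_code": "field_b", "title": f"Protein assay {i}",
     "abstract": "Enzyme kinetics.", "keywords": ["biochemistry"]}
    for i in range(2)
]

balanced = stratified_sample(records, per_label=2)
lines = [to_fasttext_line(rec) for rec in balanced]
```

The resulting `lines` could be written to a text file and passed to a fastText supervised trainer. Capping each field is only one way to balance the data; sampling proportionally within strata, or over-sampling rare fields, would be alternatives with different trade-offs.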




