Building a PubMed knowledge graph

by Jian Xu, et al.

PubMed is an essential resource for the medical domain, but useful concepts in it are either difficult to extract or ambiguous, which has significantly hindered knowledge discovery. To address this issue, we constructed a PubMed knowledge graph (PKG) by extracting bio-entities from 29 million PubMed abstracts, disambiguating author names, integrating funding data through the National Institutes of Health (NIH) ExPORTER, collecting the affiliation history and educational background of authors from ORCID, and identifying fine-grained affiliation data from MapAffil. By integrating these credible multi-source data, we created connections among bio-entities, authors, articles, affiliations, and funding. Data validation revealed that the BioBERT deep learning method of bio-entity extraction significantly outperformed the state-of-the-art models based on the F1 score (by 0.51%), with author name disambiguation (AND) achieving an F1 score of 98.09%. The PKG can trigger broader innovations, not only enabling us to measure scholarly impact, knowledge usage, and knowledge transfer, but also assisting us in profiling authors and organizations based on their connections with bio-entities. The PKG is freely available on Figshare (a simplified version that excludes the PubMed raw data) and on the TACC website (the full version).
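The integration step described above amounts to linking records from different sources through shared identifiers, so that an author can then be profiled by the bio-entities in their articles. The following is a minimal, hypothetical sketch in Python, not the PKG's actual schema or code: it assumes author names have already been disambiguated into stable IDs and bio-entities already extracted, and the names `Article`, `KnowledgeGraph`, `author_profile`, and all toy IDs are illustrative only.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

Node = tuple[str, str]  # (node type, identifier), e.g. ("author", "A1")


@dataclass
class Article:
    """Hypothetical merged record for one article from several sources."""
    pmid: str
    author_ids: list[str]          # assumed already disambiguated
    bio_entities: list[str]        # assumed already extracted (genes, diseases, ...)
    grant_ids: list[str] = field(default_factory=list)  # funding links, if any


class KnowledgeGraph:
    """Toy undirected graph connecting articles to authors, entities, and grants."""

    def __init__(self) -> None:
        self.edges: dict[Node, set[Node]] = defaultdict(set)

    def link(self, a: Node, b: Node) -> None:
        # Store each edge in both directions so traversal works either way.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def add_article(self, art: Article) -> None:
        node = ("article", art.pmid)
        for aid in art.author_ids:
            self.link(node, ("author", aid))
        for ent in art.bio_entities:
            self.link(node, ("entity", ent))
        for gid in art.grant_ids:
            self.link(node, ("grant", gid))

    def neighbors(self, node: Node, kind: str) -> set[Node]:
        # .get avoids inserting empty entries for unknown nodes.
        return {n for n in self.edges.get(node, set()) if n[0] == kind}

    def author_profile(self, author_id: str) -> Counter:
        """Count how many of an author's articles mention each bio-entity."""
        profile: Counter = Counter()
        for art in self.neighbors(("author", author_id), "article"):
            for _, ent in self.neighbors(art, "entity"):
                profile[ent] += 1
        return profile


kg = KnowledgeGraph()
kg.add_article(Article("p1", ["A1", "A2"], ["BRCA1", "breast cancer"], ["G1"]))
kg.add_article(Article("p2", ["A1"], ["BRCA1"]))
print(kg.author_profile("A1").most_common())  # [('BRCA1', 2), ('breast cancer', 1)]
```

Routing everything through article nodes keeps each source's contribution separate: funding, affiliation, or education records could be attached the same way with their own node types, and organization-level profiles would follow by aggregating over the authors linked to an affiliation node.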

EDUKG: a Heterogeneous Sustainable K-12 Educational Knowledge Graph

Web and artificial intelligence technologies, especially semantic web an...

Enriching BERT with Knowledge Graph Embeddings for Document Classification

In this paper, we focus on the classification of books using short descr...

Relationship extraction for knowledge graph creation from biomedical literature

Biomedical research is growing at such an exponential pace that scientis...

BigCilin: An Automatic Chinese Open-domain Knowledge Graph with Fine-grained Hypernym-Hyponym Relations

This paper presents BigCilin, the first Chinese open-domain knowledge gr...

End-to-End NLP Knowledge Graph Construction

This paper studies the end-to-end construction of an NLP Knowledge Graph...

Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph

Entity-aware image captioning aims to describe named entities and events...

A Knowledge Graph Embeddings based Approach for Author Name Disambiguation using Literals

Scholarly data is growing continuously containing information about the ...