Unsupervised Construction of Knowledge Graphs From Text and Code

08/25/2019
by Kun Cao, et al.

The scientific literature is a rich source of information for data mining with conceptual knowledge graphs, and the open science movement has enriched this literature with complementary source code that implements scientific models. To exploit this new resource, we construct a knowledge graph using unsupervised learning methods to identify conceptual entities. We associate source code entities with these natural language concepts using word embedding and clustering techniques. Practical naming conventions for methods and functions tend to reflect the concept(s) they implement. We take advantage of this specificity by presenting a novel process for jointly clustering text concepts and source code entities that combines word embeddings, nonlinear dimensionality reduction, and clustering techniques to assist in understanding, organizing, and comparing software in the open science ecosystem. With our pipeline, we aim to help scientists build on existing models in their discipline when developing models for new phenomena. By combining source code and conceptual information, our knowledge graph enhances corpus-wide understanding of the scientific literature.
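The idea that identifier names reflect the concepts they implement can be illustrated with a small sketch. This is not the authors' pipeline: it assumes hand-made toy vectors (`TOY_EMBEDDINGS`) in place of trained word embeddings, skips the nonlinear dimensionality reduction and clustering steps entirely, and links each identifier to its nearest text concept by cosine similarity as a simplified stand-in. All names here (`split_identifier`, `mean_vector`, `link_identifiers`) are hypothetical.

```python
import math
import re

# Hypothetical toy vectors standing in for trained word embeddings;
# real embeddings would be learned from the corpus and much larger.
TOY_EMBEDDINGS = {
    "growth": [0.9, 0.1, 0.0],
    "population": [0.8, 0.2, 0.1],
    "rate": [0.7, 0.3, 0.0],
    "heat": [0.1, 0.9, 0.2],
    "temperature": [0.0, 1.0, 0.1],
    "flux": [0.2, 0.8, 0.3],
    "compute": [0.3, 0.3, 0.3],
}


def split_identifier(name):
    """Split a snake_case or camelCase identifier into lowercase tokens."""
    # Matches capitalized/lowercase words, all-caps acronyms, and digit runs;
    # underscores and other separators are skipped because nothing matches them.
    return [t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)]


def mean_vector(tokens, embeddings):
    """Average the vectors of known tokens; return None if no token is known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_identifiers(identifiers, concepts, embeddings):
    """Map each code identifier to its most similar text concept (or None)."""
    concept_vecs = {}
    for concept in concepts:
        vec = mean_vector(concept.lower().split(), embeddings)
        if vec is not None:
            concept_vecs[concept] = vec
    links = {}
    for name in identifiers:
        vec = mean_vector(split_identifier(name), embeddings)
        if vec is None or not concept_vecs:
            links[name] = None  # no known tokens: leave the identifier unlinked
            continue
        links[name] = max(concept_vecs, key=lambda c: cosine(vec, concept_vecs[c]))
    return links


if __name__ == "__main__":
    links = link_identifiers(
        ["population_growth_rate", "computeHeatFlux", "parse_args"],
        ["population growth", "heat flux"],
        TOY_EMBEDDINGS,
    )
    print(links)
    # {'population_growth_rate': 'population growth', 'computeHeatFlux': 'heat flux', 'parse_args': None}
```

Averaging token vectors keeps the sketch short; a fuller version closer to the described process would embed concepts and identifiers into one space, reduce its dimensionality nonlinearly, and cluster the two sets jointly rather than assigning nearest neighbors.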
