AI- and HPC-enabled Lead Generation for SARS-CoV-2: Models and Processes to Extract Druglike Molecules Contained in Natural Language Text

by Zhi Hong, et al.

Researchers worldwide are seeking to repurpose existing drugs or discover new drugs to counter the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). A promising source of candidates for such studies is molecules that have been reported in the scientific literature to be drug-like in the context of coronavirus research. We report here on a project that leverages both human and artificial intelligence to detect references to drug-like molecules in free text. We engage non-expert humans to create a corpus of labeled text, use this labeled corpus to train a named entity recognition model, and employ the trained model to extract 10,912 drug-like molecules from the COVID-19 Open Research Dataset Challenge (CORD-19) corpus of 198,875 papers. Performance analyses show that our automated extraction model performs on par with non-expert humans.
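One way to make a comparison between a model and human labelers concrete is to score both against the same reference annotations. The sketch below is illustrative only: the abstract does not state which metric the authors used, so exact-match precision, recall, and F1 are an assumption here, and the `Span` representation, the `score_spans` helper, and the sample data are all hypothetical.

```python
from typing import Set, Tuple

# Hypothetical representation of one labeled mention:
# (document id, index of first token, index of last token).
Span = Tuple[str, int, int]


def score_spans(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Return exact-match precision, recall, and F1 for two sets of spans."""
    # A predicted span counts only if it matches a reference span exactly.
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up reference labels and model output, for illustration only.
gold = {("doc1", 3, 4), ("doc1", 10, 11), ("doc2", 0, 1)}
model = {("doc1", 3, 4), ("doc2", 0, 1), ("doc2", 5, 6)}

precision, recall, f1 = score_spans(model, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Running the same function on a human labeler's spans and on the model's spans would give two sets of scores that can be compared directly. Exact matching is the strictest choice; a partial-overlap criterion would be a reasonable alternative.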



Targeting SARS-CoV-2 with AI- and HPC-enabled Lead Generation: A First Data Release

Researchers across the globe are seeking to rapidly repurpose existing d...

Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models

The recent COVID-19 pandemic has highlighted the need for rapid therapeu...

An automated domain-independent text reading, interpreting and extracting approach for reviewing the scientific literature

It is presented here a machine learning-based (ML) natural language proc...

RetroGNN: Approximating Retrosynthesis by Graph Neural Networks for De Novo Drug Design

De novo molecule generation often results in chemically unfeasible molec...

Evolutionary Algorithm for Drug Discovery Interim Design Report

A software program which aims to provide an exploration capability over ...

IMG2SMI: Translating Molecular Structure Images to Simplified Molecular-input Line-entry System

Like many scientific fields, new chemistry literature has grown at a sta...

Genetic Constrained Graph Variational Autoencoder for COVID-19 Drug Discovery

In the past several months, COVID-19 has spread over the globe and cause...