AI- and HPC-enabled Lead Generation for SARS-CoV-2: Models and Processes to Extract Druglike Molecules Contained in Natural Language Text

01/12/2021
by Zhi Hong, et al.

Researchers worldwide are seeking to repurpose existing drugs or discover new drugs to counter the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). A promising source of candidates for such studies is molecules that have been reported in the scientific literature to be drug-like in the context of coronavirus research. We report here on a project that leverages both human and artificial intelligence to detect references to drug-like molecules in free text. We engage non-expert humans to create a corpus of labeled text, use this labeled corpus to train a named entity recognition model, and employ the trained model to extract 10,912 drug-like molecules from the COVID-19 Open Research Dataset Challenge (CORD-19) corpus of 198,875 papers. Performance analyses show that our automated extraction model performs on par with non-expert humans.
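As a rough illustration of the final extraction step, the sketch below collects entity strings from the per-token labels that a named entity recognition model might emit. It assumes a BIO tagging scheme with a hypothetical `DRUG` label; the abstract does not state which tagging scheme or label names the authors use, and the sample sentence and tags are made up for the example.

```python
from typing import List


def extract_entities(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect entity strings from parallel token and BIO-tag lists.

    Assumes hypothetical tags "B-DRUG" (entity start), "I-DRUG"
    (entity continuation), and anything else meaning "outside".
    """
    entities: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-DRUG":
            # A new entity starts; flush any entity still in progress.
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag == "I-DRUG" and current:
            # Continue the entity in progress.
            current.append(token)
        else:
            # Outside tag (or a stray I- tag with no open entity).
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


# Made-up example sentence with hand-written tags, not model output.
tokens = ["Remdesivir", "and", "lopinavir", "were", "tested", "."]
tags = ["B-DRUG", "O", "B-DRUG", "O", "O", "O"]
found = extract_entities(tokens, tags)
print(found)
```

In a full pipeline, the tags would come from the trained model rather than being written by hand, and the extracted strings would typically be deduplicated across papers before being counted.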

05/28/2020

Targeting SARS-CoV-2 with AI- and HPC-enabled Lead Generation: A First Data Release

Researchers across the globe are seeking to rapidly repurpose existing d...
04/02/2020

Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models

The recent COVID-19 pandemic has highlighted the need for rapid therapeu...
07/30/2021

An automated domain-independent text reading, interpreting and extracting approach for reviewing the scientific literature

It is presented here a machine learning-based (ML) natural language proc...
11/25/2020

RetroGNN: Approximating Retrosynthesis by Graph Neural Networks for De Novo Drug Design

De novo molecule generation often results in chemically unfeasible molec...
03/19/2014

Evolutionary Algorithm for Drug Discovery Interim Design Report

A software program which aims to provide an exploration capability over ...
09/03/2021

IMG2SMI: Translating Molecular Structure Images to Simplified Molecular-input Line-entry System

Like many scientific fields, new chemistry literature has grown at a sta...
04/23/2021

Genetic Constrained Graph Variational Autoencoder for COVID-19 Drug Discovery

In the past several months, COVID-19 has spread over the globe and cause...