Molecular Structure Extraction From Documents Using Deep Learning

02/14/2018
by Joshua Staker, et al.

Chemical structure extraction from documents remains a hard problem because of both false-positive identification of structures during segmentation and errors in the predicted structures. Current approaches rely on handcrafted rules and subroutines that perform reasonably well in general, but they still routinely encounter situations where recognition rates are unsatisfactory and systematic improvement is challenging. Factors that degrade the performance of current approaches include the diversity of visual styles used by different software to render structures, the frequent use of ad hoc annotations, and image-quality issues such as low resolution and noise. Here we present end-to-end deep learning solutions both for segmenting molecular structures from documents and for predicting chemical structures from the segmented images. This deep learning-based approach requires no handcrafted features, is learned directly from data, and is robust to variations in image quality and style. Using this approach, we show that it is possible to perform well on both segmentation and prediction for low-resolution images of moderately sized molecules found in journal articles and patents.


