An Investigation in Optimal Encoding of Protein Primary Sequence for Structure Prediction by Artificial Neural Networks

08/02/2020
by Aaron Hein, et al.

The use of machine learning, and of neural networks in particular, has grown rapidly over the past few years, primarily because of ever-increasing access to data and growth in computational power. It has become increasingly easy to harness machine learning for predictive tasks. Protein structure prediction is one area where neural networks are becoming increasingly popular and successful. Although artificial neural networks (ANNs) are very powerful, their use requires selecting the most appropriate input/output encoding, architecture, and class to produce optimal results. In this investigation we explored and evaluated the effect of several conventional and newly proposed input encodings and selected an optimal architecture. We considered 11 variations of input encoding, 11 alternative window sizes, and 7 different architectures. In total, we evaluated 2,541 permutations, training and testing on more than 10,000 protein structures over the course of 3 months. Our investigations concluded that one-hot encoding, the use of LSTMs, and window sizes of 9, 11, and 15 produce the optimal outcome. Through this optimization, we improved the quality of protein structure prediction, predicting the ϕ dihedrals to within 14°-16° and the ψ dihedrals to within 23°-25°. This is a notable improvement over previous similar investigations.
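To make the terms "one-hot encoding" and "window size" concrete, the following is a minimal sketch of how a primary sequence might be turned into windowed one-hot input. It is not the authors' implementation: the 20-letter alphabet ordering, the zero-vector padding at the chain ends, the zero vector for unrecognized residues, and the function name `one_hot_windows` are all assumptions made for illustration.

```python
import numpy as np

# Assumed alphabet: the 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_windows(sequence, window=9):
    """Return an array of shape (len(sequence), window, 20).

    Entry i holds the one-hot rows for the residues centered on position i.
    Positions that fall off either end of the chain, and residues outside
    the assumed alphabet, are left as all-zero rows (an assumption).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    if not sequence:
        raise ValueError("sequence must not be empty")

    half = window // 2
    n = len(sequence)

    # One-hot encode the whole chain once, with zero padding on both sides.
    padded = np.zeros((n + 2 * half, len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(sequence):
        col = INDEX.get(aa)
        if col is not None:
            padded[pos + half, col] = 1.0

    # Slice one window per residue; window i is centered on residue i.
    return np.stack([padded[i:i + window] for i in range(n)])


if __name__ == "__main__":
    features = one_hot_windows("ACDEFGHIK", window=9)
    print(features.shape)
```

Each window could then be fed to a model that predicts the ϕ and ψ dihedrals of the central residue; whether the window is flattened or kept as a sequence of rows (as an LSTM would expect) is a design choice this sketch leaves open.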

