Generative Pretrained Autoregressive Transformer Graph Neural Network applied to the Analysis and Discovery of Novel Proteins

05/07/2023
by Markus J. Buehler, et al.

We report a flexible language-model-based deep learning strategy, applied here to solve complex forward and inverse problems in protein modeling. The approach rests on an attention neural network that integrates transformer and graph convolutional architectures in a causal multi-headed graph mechanism to realize a generative pretrained model. The model is applied to predict secondary structure content (at the per-residue level and as overall content), protein solubility, and sequencing tasks. Further trained on inverse tasks, the model can design proteins with these properties as target features. The model is formulated as a general, fully prompt-based framework that can be adapted to a variety of downstream tasks. We find that adding tasks yields emergent synergies that the model exploits to improve overall performance beyond what training on each dataset alone would achieve. Case studies validate the method, yielding protein designs focused on structural proteins but also exploring applicability to the design of soluble, antimicrobial biomaterials. While our model is trained to perform 8 distinct tasks, it can be extended with available datasets to solve additional problems. In a broader sense, this work illustrates a form of multiscale modeling that relates a set of ultimate building blocks (here, byte-level UTF-8 characters) to complex output. This materiomic scheme captures emergent relationships between universal building blocks and resulting properties via a synergizing learning capacity that expresses a set of potentialities embedded in the knowledge used in training, through the interplay of universality and diversity.
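
The abstract describes the architecture only in prose. As a minimal, hypothetical sketch of what a "causal multi-headed graph mechanism" over byte-level UTF-8 tokens could look like, the PyTorch block below combines causal multi-head self-attention with a simple GCN-style aggregation over an adjacency matrix. All names, dimensions, and the placeholder identity adjacency are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class CausalGraphAttentionBlock(nn.Module):
    """Hypothetical block: causal self-attention plus GCN-style graph mixing."""

    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.graph_proj = nn.Linear(d_model, d_model)  # weights of the GCN-style branch
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x, adj):
        # x: (batch, seq, d_model); adj: (batch, seq, seq), row-normalized adjacency.
        n = x.size(1)
        # Causal mask: True entries are blocked, so position i attends only to j <= i.
        causal = torch.triu(
            torch.ones(n, n, dtype=torch.bool, device=x.device), diagonal=1
        )
        h, _ = self.attn(x, x, x, attn_mask=causal)
        # GCN-style neighborhood aggregation: (A X) W.
        g = self.graph_proj(torch.bmm(adj, x))
        x = self.norm1(x + h + g)
        return self.norm2(x + self.ff(x))

# Toy usage with a byte-level prompt, following the abstract's UTF-8 scheme.
embed = nn.Embedding(256, 256)  # one token per possible byte value
prompt = torch.tensor([list("TaskPrompt".encode("utf-8"))])  # shape (1, 10)
adj = torch.eye(prompt.size(1)).unsqueeze(0)  # placeholder identity adjacency
out = CausalGraphAttentionBlock()(embed(prompt), adj)  # (1, 10, 256)
```

Adding the attention and graph branches before the residual normalization is one plausible way to fuse the two architectures; the paper's actual combination may differ.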

Related research

Structure-aware Protein Self-supervised Learning (04/06/2022)
Protein representation learning methods have shown great potential to yi...

Reprogramming Pretrained Language Models for Protein Sequence Representation Learning (01/05/2023)
Machine Learning-guided solutions for protein learning tasks have made s...

AlphaFold Distillation for Improved Inverse Protein Folding (10/05/2022)
Inverse protein folding, i.e., designing sequences that fold into a give...

Pretrained Transformers as Universal Computation Engines (03/09/2021)
We investigate the capability of a transformer pretrained on natural lan...

PTransIPs: Identification of phosphorylation sites based on protein pretrained language model and Transformer (08/08/2023)
Phosphorylation is central to numerous fundamental cellular processes, i...

High Quality Prediction of Protein Q8 Secondary Structure by Diverse Neural Network Architectures (11/17/2018)
We tackle the problem of protein secondary structure prediction using a ...

Understanding and mitigating exploding inverses in invertible neural networks (06/16/2020)
Invertible neural networks (INNs) have been used to design generative mo...
