Multi-level Protein Representation Learning for Blind Mutational Effect Prediction

06/08/2023
by Yang Tan, et al.

Directed evolution plays an indispensable role in protein engineering: it revises existing protein sequences to attain new or enhanced functions. Accurately predicting the effects of protein variants requires an in-depth understanding of protein structure and function. Although large self-supervised language models have demonstrated remarkable zero-shot inference performance using only protein sequences, they do not inherently capture the spatial characteristics of protein structures, which are crucial for understanding folding stability and internal molecular interactions. This paper introduces a novel pre-training framework that cascades sequential and geometric analyzers for protein primary and tertiary structures. The framework guides mutational directions toward desired traits by simulating natural selection on wild-type proteins, and it evaluates the effects of variants based on their fitness to perform the target function. We assess the proposed approach on one public database and two new databases covering a variety of variant effect prediction tasks, which encompass a diverse set of proteins and assays from different taxa. The approach achieves state-of-the-art prediction performance among zero-shot learning methods for both single-site mutations and deep mutations.
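To make the idea of zero-shot variant scoring concrete, the sketch below shows one widely used convention: summing, over the mutated sites, the log-odds of the mutant residue against the wild-type residue under per-position amino acid probabilities. The abstract does not state the scoring rule this framework uses, so treating it as a log-odds sum is an assumption. The names `parse_mutation`, `score_variant`, and `toy_site_probs` are hypothetical, and the probabilities are made-up toy values rather than model output.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_mutation(mutation):
    """Split a mutation such as 'K2R' into (wild type, 0-based position, mutant)."""
    return mutation[0], int(mutation[1:-1]) - 1, mutation[-1]


def score_variant(wild_type_seq, mutations, site_probs):
    """Sum log p(mutant) - log p(wild type) over all mutated sites.

    site_probs[i] maps each amino acid to its predicted probability at
    position i of the wild-type protein. Assumed convention: a higher
    score means the variant is predicted to be fitter.
    """
    score = 0.0
    for mutation in mutations:
        wild_type, position, mutant = parse_mutation(mutation)
        if wild_type_seq[position] != wild_type:
            raise ValueError(f"{mutation} does not match the wild-type sequence")
        probs = site_probs[position]
        score += math.log(probs[mutant]) - math.log(probs[wild_type])
    return score


def toy_site_probs():
    """Made-up probabilities for the 3-residue toy sequence 'MKT'."""
    uniform = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    # Position 2 (index 1) favours K, tolerates R, and spreads the rest evenly.
    rest = 0.2 / (len(AMINO_ACIDS) - 2)
    biased = {aa: rest for aa in AMINO_ACIDS}
    biased["K"] = 0.5
    biased["R"] = 0.3
    return [dict(uniform), biased, dict(uniform)]


if __name__ == "__main__":
    probs = toy_site_probs()
    print(score_variant("MKT", ["K2R"], probs))         # single-site variant
    print(score_variant("MKT", ["K2R", "T3A"], probs))  # multi-site variant
```

Under this convention a multi-site variant is scored by adding per-site terms, which treats the sites as independent. That is a simplification of the sketch, not a claim about how the proposed framework handles deep mutations.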

