Variational auto-encoding of protein sequences

12/09/2017
by Sam Sinai, et al.

Proteins are responsible for the most diverse set of functions in biology. The ability to extract information from protein sequences and to predict the effects of mutations is extremely valuable in many domains of biology and medicine. However, the mapping between protein sequence and function is complex and poorly understood. Here we present an embedding of natural protein sequences using a Variational Auto-Encoder (VAE) and use it to predict how mutations affect protein function. We use this unsupervised approach to cluster natural variants and to learn interactions between sets of positions within a protein. This approach generally performs better than baseline methods that consider no interactions within sequences, and in some cases better than state-of-the-art approaches that use the inverse Potts model. This generative model can be used to computationally guide the exploration of protein sequence space and to better inform rational and automated protein design.
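
To make the comparison concrete, here is a minimal sketch of the kind of baseline that considers no interactions within sequences: a site-independent model that scores a mutant by its log-probability relative to the wild type. This is not the paper's code. The function names, the toy alignment, the alphabet, and the pseudocount are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Assumed alphabet: the 20 standard amino acids plus an alignment gap.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def site_log_probs(alignment, pseudocount=1.0):
    """Return one table per column mapping each symbol to its smoothed log-frequency."""
    length = len(alignment[0])
    total = len(alignment) + pseudocount * len(ALPHABET)
    tables = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment)
        tables.append(
            {a: math.log((counts[a] + pseudocount) / total) for a in ALPHABET}
        )
    return tables


def log_prob(seq, tables):
    """Log-probability of a sequence when every position is treated independently."""
    return sum(tables[i][a] for i, a in enumerate(seq))


def mutation_score(wild_type, mutant, tables):
    """Log-odds of mutant versus wild type; negative means the mutant is less likely."""
    return log_prob(mutant, tables) - log_prob(wild_type, tables)


# Hypothetical toy alignment of four aligned sequences (illustrative only).
toy_alignment = ["MKVL", "MKIL", "MRVL", "MKVL"]
tables = site_log_probs(toy_alignment)
score = mutation_score("MKVL", "MKWL", tables)
print(score)
```

Because each position is scored on its own, a model like this cannot capture dependencies between positions. A latent-variable model such as a VAE, or a pairwise model such as the inverse Potts model, is intended to capture them.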

