Generative Capacity of Probabilistic Protein Sequence Models

12/03/2020
by Francisco McGee, et al.

Variational autoencoders (VAEs) have recently gained popularity as generative protein sequence models (GPSMs) to explore fitness landscapes and predict the effect of mutations. Despite encouraging results, a quantitative characterization of the VAE-generated probability distribution is still lacking. In particular, it is currently unclear whether or not VAEs can faithfully reproduce the complex multi-residue mutation patterns observed in natural sequences arising due to epistasis. In other words, are frequently observed subsequences assigned a correspondingly large probability by the VAE? Using a set of sequence statistics we comparatively assess the accuracy, or "generative capacity", of three GPSMs: a pairwise Potts Hamiltonian, a vanilla VAE, and a site-independent model, using natural and synthetic datasets. We show that the vanilla VAE's generative capacity lies between the pairwise Potts and site-independent models. Importantly, our work measures GPSM generative capacity in terms of higher-order sequence covariation and provides a new framework for evaluating and interpreting GPSM accuracy that emphasizes the role of epistasis.
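The comparison described above rests on low- and higher-order sequence statistics, such as whether model-generated alignments reproduce the pairwise covariation seen in natural multiple sequence alignments (MSAs). As a hedged sketch of that idea (the toy alphabet, encoding, and statistic choice below are illustrative assumptions, not the paper's exact pipeline), one can compute single-site frequencies and connected pairwise correlations from an integer-encoded MSA and compare them between natural and generated data:

```python
import numpy as np

# Toy alphabet for illustration; real protein models use 21 states (20 amino acids + gap).
ALPHABET = "ACDE"

def site_frequencies(msa):
    """Single-site frequencies f_i(a) for an (N, L) integer-encoded MSA."""
    n, length = msa.shape
    q = len(ALPHABET)
    f = np.zeros((length, q))
    for i in range(length):
        f[i] = np.bincount(msa[:, i], minlength=q) / n
    return f

def pairwise_covariances(msa):
    """Connected correlations C_ij(a, b) = f_ij(a, b) - f_i(a) * f_j(b).

    Nonzero entries indicate covariation between columns i and j beyond
    what independent site frequencies would predict (the signal a
    site-independent model cannot capture).
    """
    n, length = msa.shape
    q = len(ALPHABET)
    f = site_frequencies(msa)
    c = np.zeros((length, length, q, q))
    for i in range(length):
        for j in range(length):
            fij = np.zeros((q, q))
            for a, b in zip(msa[:, i], msa[:, j]):
                fij[a, b] += 1.0
            fij /= n
            c[i, j] = fij - np.outer(f[i], f[j])
    return c

# Example: columns 0 and 1 always match, so they co-vary strongly.
msa = np.array([
    [0, 0, 1],
    [1, 1, 2],
    [0, 0, 3],
    [1, 1, 0],
])
C = pairwise_covariances(msa)
print(round(C[0, 1, 0, 0], 3))  # 0.25: f_01(0,0)=0.5 minus f_0(0)*f_1(0)=0.25
```

Comparing such statistics between a natural MSA and an equally sized model-generated MSA (e.g. scatter-plotting natural vs generated C_ij values) is one concrete way to operationalize "generative capacity."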


Related research

04/14/2022
Generative power of a protein language model trained on multiple sequence alignments
Computational models starting from large ensembles of evolutionarily rel...

12/29/2022
SESNet: sequence-structure feature-integrated deep learning method for data-efficient protein engineering
Deep learning has been widely used for protein engineering. However, it ...

12/18/2017
Deep generative models of genetic variation capture mutation effects
The functions of proteins and RNAs are determined by a myriad of interac...

02/17/2018
Interpretable VAEs for nonlinear group factor analysis
Deep generative models have recently yielded encouraging results in prod...

06/01/2022
Top-down inference in an early visual cortex inspired hierarchical Variational Autoencoder
Interpreting computations in the visual cortex as learning and inference...

02/27/2017
Improved Variational Autoencoders for Text Modeling using Dilated Convolutions
Recent work on generative modeling of text has found that variational au...

02/18/2019
Learning Compositional Representations of Interacting Systems with Restricted Boltzmann Machines: Comparative Study of Lattice Proteins
A Restricted Boltzmann Machine (RBM) is an unsupervised machine-learning...
