MutaGAN: A Seq2seq GAN Framework to Predict Mutations of Evolving Protein Populations

08/26/2020
by Daniel S. Berman, et al.

The ability to predict the evolution of a pathogen would significantly improve the ability to control, prevent, and treat disease. Despite significant progress in other problem spaces, deep learning has yet to contribute to the problem of predicting mutations in evolving populations. To address this gap, we developed a novel machine learning framework that uses generative adversarial networks (GANs) with recurrent neural networks (RNNs) to accurately predict genetic mutations and the evolution of future biological populations. Using a generalized time-reversible phylogenetic model of protein evolution with bootstrapped maximum likelihood tree estimation, we trained a sequence-to-sequence generator within an adversarial framework, named MutaGAN, to generate complete protein sequences augmented with possible mutations of future virus populations. Influenza virus was identified as an ideal test case for this deep learning framework because it is a significant human pathogen with new strains emerging annually, and because global surveillance efforts have generated a large amount of publicly available sequence data in the National Center for Biotechnology Information's (NCBI) Influenza Virus Resource (IVR). MutaGAN generated "child" sequences from a given "parent" protein sequence with a median Levenshtein distance of 2.00 amino acids. Additionally, the generator was able to augment the majority of parent proteins with at least one mutation identified within the global influenza virus population. These results demonstrate the power of the MutaGAN framework to aid in pathogen forecasting, with implications for broad utility in evolutionary prediction for any protein population.
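For readers unfamiliar with the metric, the Levenshtein distance between two protein sequences is the minimum number of single-residue insertions, deletions, or substitutions needed to turn one into the other. The following is a minimal sketch of how a median distance over sequence pairs could be computed. It is not the authors' evaluation code: the function name `levenshtein`, the toy sequences, and the pairing of sequences are all hypothetical, and the abstract does not state exactly which sequence pairs were compared.

```python
from statistics import median


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to limit row length.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, residue_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, residue_b in enumerate(b, start=1):
            substitution_cost = 0 if residue_a == residue_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1]


# Hypothetical toy sequence pairs (not real influenza data).
pairs = [
    ("MKTIIALSYI", "MKTIIALSYI"),  # identical
    ("MKTIIALSYI", "MKAIIALSYI"),  # one substitution
    ("MKTIIALSYI", "MKAIIALSHI"),  # two substitutions
    ("MKTIIALSYI", "MKTIIALSY"),   # one deletion
    ("MKTIIALSYI", "MRAIIVLSYI"),  # three substitutions
]

distances = [levenshtein(first, second) for first, second in pairs]
median_distance = median(distances)
print(distances, median_distance)
```

The two-row dynamic programming table keeps memory proportional to the shorter sequence, which is adequate for proteins of a few hundred residues.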

