Enhancing the Protein Tertiary Structure Prediction by Multiple Sequence Alignment Generation

06/02/2023
by Le Zhang, et al.

Deep learning methods have greatly advanced protein folding research, with AlphaFold2 (AF2) demonstrating exceptional performance and atomic-level precision. Because co-evolution is integral to protein structure prediction, AF2's accuracy depends strongly on the depth of the multiple sequence alignment (MSA), which is built by searching a large protein database for similar sequences. However, not all protein sequences have abundant homologous families, so AF2's performance can degrade on such queries, at times failing to produce meaningful results. To address this, we introduce a novel generative language model, MSA-Augmenter, which leverages protein-specific attention mechanisms and large-scale MSAs to generate useful, novel protein sequences not currently found in databases. These sequences supplement shallow MSAs, enhancing the accuracy of structural property predictions. Our experiments on CASP14 demonstrate that MSA-Augmenter can generate de novo sequences that retain co-evolutionary information from inferior MSAs, thereby improving protein structure prediction quality on top of a strong AF2 baseline.
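
The idea of supplementing a shallow MSA can be illustrated with a minimal Python sketch. It is not the authors' implementation: `augment_msa`, `toy_generator`, and the `min_depth` threshold are hypothetical names chosen for illustration, and the toy generator simply substitutes single residues in the query, standing in for a trained generative model such as MSA-Augmenter.

```python
from typing import Callable, Iterable, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def msa_depth(msa: List[str]) -> int:
    """Depth of an MSA: the number of aligned rows, query included."""
    return len(msa)


def toy_generator(msa: List[str], n: int) -> Iterable[str]:
    """Hypothetical stand-in for a generative model.

    Yields n variants of the query (first row), each with one residue
    replaced by another amino acid. A real model would condition on
    the whole MSA instead.
    """
    query = msa[0]
    for i in range(n):
        pos = i % len(query)
        shift = 1 + i // len(query)
        idx = (AMINO_ACIDS.find(query[pos]) + shift) % len(AMINO_ACIDS)
        yield query[:pos] + AMINO_ACIDS[idx] + query[pos + 1:]


def augment_msa(
    msa: List[str],
    generate: Callable[[List[str], int], Iterable[str]],
    min_depth: int = 32,  # assumed threshold, not taken from the paper
) -> List[str]:
    """Append generated rows to a shallow MSA, up to min_depth rows."""
    if not msa:
        raise ValueError("MSA must contain at least the query sequence")
    width = len(msa[0])
    augmented = list(msa)
    seen = set(msa)
    needed = max(0, min_depth - msa_depth(msa))
    if needed == 0:
        return augmented
    for candidate in generate(msa, needed):
        # Keep only rows that match the alignment width and are new.
        if len(candidate) == width and candidate not in seen:
            augmented.append(candidate)
            seen.add(candidate)
    return augmented


shallow = ["MKTAYI", "MKTAYL"]
deeper = augment_msa(shallow, toy_generator, min_depth=5)
```

The original rows are kept at the top and in order, so the query stays in the first position, which is where structure predictors conventionally expect it. The augmented alignment would then be passed to the predictor in place of the shallow one.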


research
06/09/2023

PoET: A generative model of protein families as sequences-of-sequences

Generative protein language models are a natural way to design new prote...
research
08/17/2021

Modeling Protein Using Large-scale Pretrain Language Model

Protein is linked to almost every life process. Therefore, analyzing the...
research
01/06/2023

Conditional Generation of Paired Antibody Chain Sequences through Encoder-Decoder Language Model

Protein language models (LMs) have been successful in sequence, structur...
research
03/17/2015

ProtVec: A Continuous Distributed Representation of Biological Sequences

We introduce a new representation and feature extraction method for biol...
research
05/15/2023

AF2-Mutation: Adversarial Sequence Mutations against AlphaFold2 on Protein Tertiary Structure Prediction

Deep learning-based approaches, such as AlphaFold2 (AF2), have significa...
research
02/22/2019

Learning protein sequence embeddings using information from structure

Inferring the structural properties of a protein from its amino acid seq...
research
08/20/2022

Few-Shot Learning of Accurate Folding Landscape for Protein Structure Prediction

Data-driven predictive methods which can efficiently and accurately tran...
