Conditional Generation of Paired Antibody Chain Sequences through Encoder-Decoder Language Model

01/06/2023
by Simon K. S. Chu, et al.

Protein language models (LMs) have been successful in sequence, structure, and function prediction. However, current protein LMs are limited to encoder-only or decoder-only architectures that model single sequences, whereas many biological contexts involve protein-protein interactions. Here, we introduce pAbT5, which models antibody chain pairing as forward and back translation using a T5-based architecture. We show that pAbT5 accurately reflects chain pairing through sequence generation. The model generates variable-length sequences, and its next-word prediction probabilities agree with the position-specific scoring matrix derived from sequence alignment. Like other protein LMs, pAbT5 achieves state-of-the-art unsupervised prediction of experimental measurements. To the best of our knowledge, pAbT5 is the first generative encoder-decoder protein LM for protein-protein interactions.
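For readers unfamiliar with the position-specific scoring matrix (PSSM) mentioned above, the following is a minimal sketch of how one can be built from a set of aligned sequences. It is not the authors' pipeline: the function names, the pseudocount of 1, the uniform background distribution, and the toy alignment are all assumptions made for illustration. A per-position probability table of this kind is one that a model's per-position output distribution could, in principle, be compared against.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment, pseudocount=1.0):
    """Per-column amino-acid frequencies for equal-length aligned sequences.

    Gap characters and any non-standard letters are ignored when counting.
    The pseudocount (an assumption, not taken from the paper) keeps unseen
    residues from receiving zero probability.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must share one length")

    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def log_odds(profile, background=None):
    """Convert frequencies to log2-odds scores (uniform background by default)."""
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    return [
        {aa: math.log2(col[aa] / background[aa]) for aa in AMINO_ACIDS}
        for col in profile
    ]


# Hypothetical toy alignment, not real antibody data.
toy_alignment = ["QVQLV", "EVQLV", "QVQLL", "QVKLV"]
profile = column_frequencies(toy_alignment)
pssm = log_odds(profile)
```

Each column of `profile` sums to 1, and residues that dominate a column (such as `V` at the second position of the toy alignment) receive positive log-odds scores in `pssm`.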
