Iterative Refinement Graph Neural Network for Antibody Sequence-Structure Co-design

by   Wengong Jin, et al.

Antibodies are versatile proteins that bind to pathogens like viruses and stimulate the adaptive immune system. The specificity of antibody binding is determined by complementarity-determining regions (CDRs) at the tips of these Y-shaped proteins. In this paper, we propose a generative model to automatically design the CDRs of antibodies with enhanced binding specificity or neutralization capabilities. Previous generative approaches formulate protein design as a structure-conditioned sequence generation task, assuming the desired 3D structure is given a priori. In contrast, we propose to co-design the sequence and 3D structure of CDRs as graphs. Our model unravels a sequence autoregressively while iteratively refining its predicted global structure. The inferred structure in turn guides subsequent residue choices. For efficiency, we model the conditional dependence between residues inside and outside of a CDR in a coarse-grained manner. Our method achieves superior log-likelihood on the test set and outperforms previous baselines in designing antibodies capable of neutralizing the SARS-CoV-2 virus.



There are no comments yet.


page 1

page 2

page 3

page 4


Benchmarking deep generative models for diverse antibody sequence design

Computational protein design, i.e. inferring novel and diverse protein s...

Improving Molecular Design by Stochastic Iterative Target Augmentation

Generative models in molecular design tend to be richly parameterized, d...

Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design

Designing novel protein sequences for a desired 3D topological fold is a...

E(n) Equivariant Normalizing Flows for Molecule Generation in 3D

This paper introduces a generative model equivariant to Euclidean symmet...

GraphRNN: A Deep Generative Model for Graphs

Modeling and generating graphs is fundamental for studying networks in b...

Knowledge-Enhanced Personalized Review Generation with Capsule Graph Neural Network

Personalized review generation (PRG) aims to automatically produce revie...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Monoclonal antibodies are increasingly adopted as therapeutics targeting a wide range of pathogens such as SARS-CoV-2 (Pinto et al., 2020). Since the binding specificity of these Y-shaped proteins is largely determined by their complementarity-determining regions (CDRs), the main goal of computational antibody design is to automate the creation of CDR subsequences with desired properties. This problem is particularly challenging due to the combinatorial search space of over possible CDR sequences and the small solution space which satisfies the desired constraints of binding affinity, stability, and synthesizability (Raybould et al., 2019).

There are three key modeling questions in CDR generation. The first is how to model the relation between a sequence and its underlying 3D structure. Generating sequences without the corresponding structure (Alley et al., 2019; Shin et al., 2021) can lead to sub-optimal performance (Ingraham et al., 2019), while generating from a predefined 3D structure (Ingraham et al., 2019) is not suitable for antibodies since the desired structure is rarely known a priori (Fischman and Ofran, 2018). Therefore, it is crucial to develop models that co-design the sequence and structure. The second question is how to model the conditional distribution of CDRs given the remainder of a sequence (context). Attention-based methods only model the conditional dependence at the sequence level, but the structural interaction between the CDR and its context is crucial for generation. The last question relates to the model’s ability to optimize for various properties. Traditional physics-based methods (Lapidoth et al., 2015; Adolf-Bryfogle et al., 2018) focus on binding energy minimization, but in practice, our objective can be much more involved than binding energies (Liu et al., 2020).

In this paper, we represent a sequence-structure pair as a graph and formulate the co-design task as a graph generation problem. The graph representation allows us to model the conditional dependence between a CDR and its context at both the sequence and structure levels. Antibody graph generation poses unique challenges because the global structure is expected to change when new nodes are inserted. Previous autoregressive models

(You et al., 2018; Gebauer et al., 2019) cannot modify a generated structure because they are trained under teacher forcing. Thus errors made in the previous steps can lead to a cascade of errors in subsequent generation steps. To address these problems, we propose a novel architecture which interleaves the generation of amino acid nodes with the prediction of 3D structures. The structure generation is based on an iterative refinement of a global graph rather than a sequential expansion of a partial graph with teacher forcing. Since the context sequence is long, we further introduce a coarsened graph representation by grouping nodes into blocks. We apply graph convolution at a coarser level to efficiently propagate the contextual information to the CDR residues. After pretraining our model on antibodies with known structures, we finetune it using a predefined property predictor to generate antibodies with specific properties.

We evaluate our method on three generation tasks, ranging from language modeling to SARS-CoV-2 neutralization optimization and antigen-binding antibody design. Our method is compared with a standard sequence model (Saka et al., 2021; Akbar et al., 2021) and a state-of-the-art graph generation method (You et al., 2018) tailored to antibodies. Our method not only achieves lower perplexity on test sequences but also outperforms previous baselines in property-guided antibody design tasks.

2 Related work

Antibody/protein design. Current methods for computational antibody design roughly fall into two categories. The first class is based on energy function optimization (Pantazes and Maranas, 2010; Li et al., 2014; Lapidoth et al., 2015; Adolf-Bryfogle et al., 2018), which use Monte Carlo simulation to iteratively modify a sequence and its structure until reaching a local energy minimum. Similar approaches are used in protein design (Leaver-Fay et al., 2011; Tischer et al., 2020). Nevertheless, these physics-based methods are computationally expensive (Ingraham et al., 2019) and our desired objective can be much more complicated than low binding energy (Liu et al., 2020).

The second class is based on generative models. For antibodies, they are mostly sequence-based (Alley et al., 2019; Shin et al., 2021; Saka et al., 2021; Akbar et al., 2021). For proteins, O’Connell et al. (2018); Ingraham et al. (2019); Strokach et al. (2020); Karimi et al. (2020); Cao et al. (2021) further developed models conditioned on a backbone structure or protein fold. Our model also seeks to incorporate 3D structure information for antibody generation. Since the best CDR structures are often unknown for new pathogens, we co-design sequences and structures for specific properties.

Generative models for graphs. Our work is related to autoregressive models for graph generation (You et al., 2018; Li et al., 2018; Liu et al., 2018; Liao et al., 2019; Jin et al., 2020a). In particular, Gebauer et al. (2019) developed G-SchNet for molecular graph and conformation co-design. Unlike our method, they generate edges sequentially and cannot modify a previously generated subgraph when new nodes arrive. While Graphite (Grover et al., 2019) also uses iterative refinement to predict the adjacency matrix of a graph, it assumes all the node labels are given and predicts edges only. In contrast, our work combines autoregressive models with iterative refinement to generate a full graph with node and edge labels, including node labels and coordinates.

3D structure prediction. Our approach is closely related to protein folding (Ingraham et al., 2018; Yang et al., 2020a; Baek et al., 2021; Jumper et al., 2021). Inputs to the state-of-the-art models like AlphaFold require a complete protein sequence, its multi-sequence alignment (MSA), and its template features. These models are not directly applicable because we need to predict the structure of an incomplete sequence and the MSA is not specified in advance.

Our iterative refinement model is also related to score matching methods for molecular conformation prediction (Shi et al., 2021) and diffusion-based methods for point clouds (Luo and Hu, 2021). These algorithms also iteratively refine a predicted 3D structure, but only for a complete molecule or point cloud. In contrast, our approach learns to predict the 3D structure for incomplete graphs and interleaves 3D structure refinement with graph generation.

3 Antibody Sequence and Structure Co-Design

Overview. The role of an antibody is to bind to an antigen (e.g. a virus), present it to the immune system, and stimulate an immune response. A subset of antibodies known as neutralizing antibodies not only bind to an antigen but can also suppress its activity. An antibody consists of a heavy chain and a light chain, each composed of one variable domain (VH/VL) and some constant domains. The variable domain is further divided into a framework region and three complementarity determining regions (CDRs). The three CDRs on the heavy chain are labeled as CDR-H1, CDR-H2, CDR-H3, each occupying a contiguous subsequence (Figure 1). As the most variable part of an antibody, CDRs are the main determinants of binding and neutralization (Abbas et al., 2014).

Following Shin et al. (2021); Akbar et al. (2021), we formulate antibody design as a CDR generation task, conditioned on the framework region. Specifically, we represent an antibody as a graph, which encodes both its sequence and 3D structure. We propose a new graph generation approach called RefineGNN and extend it to handle conditional generation given a fixed framework region. Lastly, we describe how to apply RefineGNN to property-guided optimization to design new antibodies with better neutralization properties. For simplicity, we focus on the generation of heavy chain CDRs, though our method can be easily extended to model light chains CDRs.

Notations. An antibody VH domain is represented as a sequence of amino acids . Each token in the sequence is called a residue, whose value can be either one of the 20 amino acids or a special token , meaning that its amino acid type is unknown and needs to be predicted. The VH sequence folds into a 3D structure and each residue is labeled with three backbone coordinates: for its alpha carbon atom, for its carbon atom, and for its nitrogen atom.

Figure 1: Schematic structure of an antibody (figure modified from Wikipedia).

3.1 Graph representation

We represent an antibody (VH) as a graph with node features and edge features . Each node feature encodes three dihedral angles related to three backbone coordinates of residue . For each residue , we compute an orientation matrix representing its local coordinate frame (Ingraham et al., 2019) (defined in the appendix). This allows us to compute edge features describing the spatial relationship between two residues and (i.e. distance and orientation) as follows


The edge feature contains four parts. The positional encoding encodes the relative distance between two residues in an antibody sequence. The second term is a distance encoding lifted into radial basis. The third term in is a direction encoding that corresponds to the relative direction of in the local frame of residue . The last term is the orientation encoding of the quaternion representation of the spatial rotation matrix . We only include edges in the -nearest neighbors graph of with . For notation convenience, we use as a shorthand for when there is no ambiguity.

3.2 Iterative Refinement Graph Neural Network (RefineGNN)

We propose to generate an antibody graph via an iterative refinement process. Let be the initial guess of the true antibody graph. Each residue is initialized as a special token and each edge is initialized to be of distance since the average distance between consecutive residues is around three. The direction and orientation features are set to zero. In each generation step , the model learns to revise a current antibody graph and predict the label of the next residue . Specifically, it first encodes with a message passing network (MPN) with parameter


where is a learned representation of residue under the current graph . Our MPN consists of message passing layers with the following architecture


where and .

is a two-layer feed-forward network (FFN) with ReLU activation function.

is a learned embedding of amino acid type . Based on the learned residue representations, we predict the amino acid type of the next residue (Figure 2A).


This prediction gives us a new graph with the same edges as but the node label of is changed (Figure 2B). Next, we need to update the structure to accommodate the new residue . To this end, we encode graph by another MPN with a different parameter and predict the coordinate of all residues.


The new coordinates define a new antibody graph for the next iteration (Figure 2C). We explicitly realize the coordinates of each residue because we need to calculate the spatial edge features for . The structure prediction (coordinates ) and sequence prediction (amino acid types ) are carried out by two different MPNs, namely the structure network and sequence network . This disentanglement allows the two networks to focus on two distinct tasks.

Training. During training, we only apply teacher forcing to the discrete amino acid type prediction. Specifically, in each generation step , residues to are set to their ground truth amino acid types , while all future residues

are set to a padding token. In contrast, the continuous structure prediction is carried out

without teacher forcing. In each iteration, the model refines the entire structure predicted in the previous step and constructs a new -nearest neighbors graph of all residues based on the predicted coordinates .

Loss function.

Our model remains rotation and translation invariant because the loss function is computed over pairwise distance and angles rather than coordinates. The loss function for antibody structure prediction consists of three parts.

  • [leftmargin=*,topsep=0pt,itemsep=0pt]

  • Distance loss: For each residue pair , we compute its pairwise distance between the predicted alpha carbons . We define the distance loss as the Huber loss between the predicted and true pairwise distances


    where distance is squared to avoid the square root operation which causes numerical instability.

  • Dihedral angle loss: For each residue, we calculate its dihedral angle based on the predicted atom coordinates and . We define the dihedral angle loss as the mean square error between the predicted and true dihedral angles

  • angle loss: We calculate angles

    between two vectors

    and as well as dihedral angles between two planes defined by .


In summary, the overall graph generation loss is defined as , where


The sequence prediction loss is the cross entropy between predicted and true residue types.

Figure 2: (A-C) One generation step of RefineGNN. Each circle represents a CDR residue and each square represents a residue block in a coarsened context sequence. (D) Sequence coarsening.

3.3 Conditional generation given the framework region

The model architecture described so far is designed for unconditional generation — it generates an entire antibody graph without any constraints. In practice, we usually fix the framework region of an antibody and design the CDR sequence only. Therefore, we need to extend the model architecture to learn the conditional distribution , where and are residues outside of the CDR .

Conditioning via attention.

A simple extension of RefineGNN is to encode the non-CDR sequence using a recurrent neural network and propagate information to the CDR through an attention layer. To be specific, we first concatenate

and into a context sequence , where means string concatenation and is repeated

times. We then encode this context sequence by a Gated Recurrent Unit (GRU)

(Cho et al., 2014) and modify the structure and sequence prediction step (Equation 4 and 6) as


Multi-resolution modeling. The attention-based approach alone is not sufficient because it does not model the structure of the context sequence, thus ignoring how its residues structurally interact with the CDR’s. While this information is not available for new antibodies at test time, we can learn to predict this interaction using antibodies in the training set with known structures.

A naive solution is to iteratively refine the entire antibody structure (more than 100 residues) while generating CDR residues. This approach is computationally expensive because we need to recompute the MPN encoding for all residues in each generation step. Importantly, we cannot predict the context residue coordinates at the outset and fix them because they need to be adjusted accordingly when the coordinates of CDR residues are updated in each generation step.

For computational efficiency, we propose a coarse-grained model that reduces the context sequence length by clustering it into residue blocks. Specifically, we construct a coarsened context sequence by clustering every context residues into a block (Figure 2D). The new sequence defines a coarsened graph over the residue blocks, whose edges are defined based on block coordinates. The coordinate of each block is defined as the mean coordinate of residues within the block. The embedding of each block is the mean of its residue embeddings.


Now we can apply RefineGNN to generate the CDR residues while iteratively refining the global graph by predicting the coordinates of all blocks. The only change is that the structure prediction loss is defined over block coordinates . Lastly, we combine both the attention mechanism and coarse-grained modeling to keep both fine-grained and coarse-grained information. The decoding process of this conditional RefineGNN is illustrated in Algorithm 1.

3.4 Property-guided sequence optimization

Our ultimate goal is to generate new antibodies with desired properties such as neutralizing a particular virus. This task can be formulated as an optimization problem. Let be a binary indicator variable for neutralization. Our goal is to learn a conditional generative model

that maximizes the probability of neutralization for a training set of antibodies

, i.e.


where is a predictor for . Assuming is given, this problem can be solved by iterative target augmentation (ITA) (Yang et al., 2020b). Before ITA optimization starts, we first pretrain our model on a set of real antibody structures to learn a prior distribution over CDR sequences and structures. In each ITA finetuning step, we first randomly sample a sequence from , a set of antibodies whose CDRs need to be redesigned. Next, we generate new sequences given its context . A generated sequence is added to our training set if it is predicted as neutralizing. Initially, the training set contains antibodies that are known to be neutralizing (). Lastly, we sample a batch of neutralizing antibodies from and update the model parameter by minimizing their sequence prediction loss (Eq.(10)). The structure prediction loss is excluded in ITA finetuning phase because the structure of a generated sequence is unknown.

0:  Context sequence
1:  Predict the CDR length
2:  Coarsen the context sequence into
3:  Construct the initial graph
4:  for  do
5:     Encode using the sequence MPN
6:     Predict distribution of the next residue
7:     Sample
8:     Encode with the structure MPN
9:     Predict all residue coordinates
10:     Update using the new coordinates
Algorithm 1 RefineGNN decoding
0:  A set of antibodies to be optimized
0:  A neutralization predictor .
0:  A set of neutralizing antibodies
1:  for each iteration do
2:     Sample an antibody from , remove its CDR and get a context sequence
3:     for  do
4:        Sample
5:        if  then
7:     Sample a batch of new antibodies from
8:     Update model parameter by minimizing the sequence prediction loss .
Algorithm 2 ITA-based sequence optimization

4 Experiments

Setup. We construct three evaluation setups to quantify the performance of our approach. Following standard practice in generative model evaluation, we first measure the perplexity of different models on new antibodies in a test set created based on sequence similarity split. We also measure structure prediction error by comparing generated and ground truth CDR structures recorded in the Structural Antibody Database (Dunbar et al., 2014). Results for this task are shown in section 4.1.

Second, we evaluate our method on an existing antibody design benchmark of 60 antibody-antigen complexes from Adolf-Bryfogle et al. (2018). The goal is to design the CDR-H3 of an antibody so that it binds to a given antigen. Results for this task are shown in section 4.2.

Lastly, we propose an antibody optimization task to test whether our method can design new antibodies with desired properties. Specifically, we seek to redesign CDR-H3 of existing antibodies in the Coronavirus Antibody Database (CoVAbDab) (Raybould et al., 2021) to improve their neutralization against SARS-CoV-2. Following works in molecular design (Jin et al., 2020b), we use a predictor to evaluate the neutralization of generated antibodies since we cannot experimentally test them in wet labs. Results for this task are reported in section 4.3.

Baselines. We consider three baselines for comparison (details in the appendix). The first baseline is a sequence-based LSTM model used in Saka et al. (2021); Akbar et al. (2021). This model does not utilize any 3D structure information. It consists of an encoder that learns to encode a context sequence , a decoder that decodes a CDR sequence, and an attention layer connecting the two.

The second baseline is an autoregressive graph generation model (AR-GNN) whose architecture is similar to You et al. (2018); Jin et al. (2020b) but tailored for antibodies. AR-GNN generates an antibody graph residue by residue. In each step , it first predicts the amino acid type of residue and then generates edges between and previous residues. Importantly, AR-GNN cannot modify a partially generated 3D structure of residues because it is trained by teacher forcing.

On the antigen-binding task, we include an additional physics-based baseline called RosettaAntibodyDesign (RAbD) (Adolf-Bryfogle et al., 2018). We apply their de novo design protocol composed of graft design followed by 250 iterations of sequence design and energy minimization. We cannot afford to run more iterations because it takes more than 10 hours per antibody. We also could not apply RAbD to the SARS-CoV-2 task because it requires 3D structures to be given. This information is unavailable for antibodies in CoVAbDab.


We performed hyperparameter tuning to find the best setting for each method. For RefineGNN, both its structure and sequence MPN have four message passing layers, with a hidden dimension of 256 and block size

. All models are trained by the Adam optimizer with a learning rate of 0.001. More details are provided in the appendix.

LSTM 6.79 - 7.21 - 9.70 -
AR-GNN 6.44 2.97 6.86 2.27 9.44 3.63
RefineGNN 6.09 1.18 6.58 0.87 8.38 2.50
Model AAR
RAbD 28.53%
LSTM 22.53%
AR-GNN 23.86%
RefineGNN 35.37%
Table 1: Left: Language modeling results. We report perplexity (PPL) and root mean square deviation (RMSD) for each CDR in the heavy chain. Right: Results on the antigen-binding antibody design task. We report the amino acid recovery (AAR) for all methods.

4.1 Language modeling and 3D structure prediction

Data. The Structural Antibody Database (SAbDab) consists of 4994 antibody structures renumbered according to the IMGT numbering scheme (Lefranc et al., 2003). To measure a model’s ability to generalize to novel CDR sequences, we divide the heavy chains into training, validation, and test sets based on CDR cluster split. We illustrate our cluster split process using CDR-H3 as an example. First, we use MMseqs2 (Steinegger and Söding, 2017) to cluster all the CDR-H3 sequences. The sequence identity is calculated under the BLOSUM62 substitution matrix (Henikoff and Henikoff, 1992). Two antibodies are put into the same cluster if their CDR-H3 sequence identity is above 40%. We then randomly split the clusters into training, validation, and test set with 8:1:1 ratio. We repeat the same procedure for creating CDR-H1 and CDR-H2 splits. In total, there are 1266, 1564, and 2325 clusters for CDR-H1, H2, and H3. The size of training, validation, and test sets for each CDR is shown in the appendix.

Metrics. For each method, we report the perplexity (PPL) of test sequences and the root mean square deviation (RMSD) between a predicted structure and its ground truth structure reported in SAbDab. RMSD is calculated by the Kabsch algorithm (Kabsch, 1976) based on coordinate of CDR residues. Since the mapping between sequences and structures is deterministic in RefineGNN, we can calculate perplexity in the same way as standard sequence models.

Results. Since the LSTM baseline does not involve structure prediction, we report RMSD for graph-based methods only. As shown in Table 1, RefineGNN significantly outperforms all baselines in both metrics. For CDR-H3, our model gives 13% PPL reduction (8.38 v.s. 9.70) over sequence only model and 10% PPL reduction over AR-GNN (8.38 v.s. 9.44). RefineGNN also predicts the structure more accurately, with 30% relative RMSD reduction over AR-GNN. In Figure 3, we provide examples of predicted 3D structures of CDR-H3 loops.

Ablation studies. We further conduct ablation experiments on the CDR-H3 generation task to study the importance of different modeling choices. First, when we remove the attention mechanism and context coarsening step in section 3.3, the PPL increases from 8.38 to 8.86 (Figure 3C, row 2) and 9.01 (Figure 3C, row 3) respectively. We also tried to remove both the attention and coarsening modules and trained the model without conditioning on the context sequence. The PPL of this unconditional variant is much worse than our conditional model (Figure 3C, row 4). In short, these results validate the advantage of our multi-resolution conditioning strategy.

Moreover, the model performance slightly degrades when we halve the number of refinement steps or increase block size to (Figure 3C, row 5-6) . Lastly, we train a structure-conditioned model by feeding the ground truth structure to RefineGNN at every generation step (Figure 3C, row 7). While this structure-conditioned model gives a lower PPL as expected (7.39 v.s. 8.38), it is not too far away from the sequence only model (PPL = 9.70). This suggests that RefineGNN is able to extract a decent amount of information from the partial structure co-evolving with the sequence.

4.2 Antigen-binding antibody design

Data. Adolf-Bryfogle et al. (2018) have selected a diverse set of 60 antibody-antigen complexes as a benchmark for computational antibody design. The goal is to design the CDR-H3 sequence of an antibody so that it binds to a corresponding antigen.

Metric. Following Adolf-Bryfogle et al. (2018)

, we use amino acid recovery (AAR) as the evaluation metric. For any generated sequence, we define its AAR as the percentage of residues having the same amino acid as the corresponding residue in the original antibody.

Results. For LSTM, AR-GNN, and RefineGNN, the training set in this setup is the entire SAbDab except antibodies in the same cluster as any of the test antibodies. At test time, we generate 10000 CDR-H3 sequences for each antibody and select the top 100 candidates with the lowest perplexity. For simplicity, all methods are configured to generate CDRs of the same length as the original CDR. As shown in Table 1, our model achieves the highest AAR score, with around 7% absolute improvement over the best baseline. In Figure 4A, we show an example of a generated CDR-H3 sequence and highlight residues that are different from the original antibody. We also found that sequences with lower perplexity tend to have a lower AA recovery error (Pearson R = 0.427, Figure 4B). This suggests that we can use perplexity as the ranking criterion for antibody design.

Figure 3: (A) CDR-H3 structure predicted by RefineGNN (PDB: 4bkl, RMSD = 0.57). The predicted structure (cyan) is aligned to the true structure (green) using the Kabsch algorithm. (B) CDR-H3 structure predicted by AR-GNN (PDB: 4bkl, RMSD = 2.16). (C) Ablation studies of different modeling choices in RefineGNN in the CDR-H3 perplexity evaluation task.
Figure 4: (A) Visualization of a generated CDR-H3 sequence and its structure in complex with an antigen (PDB: 4cmh). The predicted structure is aligned and grafted onto the original antibody using the Kabsch algorithm. Residues different from the original antibody are highlighted in red. (B) The correlation between the perplexity of a generated sequence and AA recovery error.

4.3 SARS-CoV-2 neutralization optimization

Data. Identifying neutralizing antibodies for COVID-19 is crucial to the development of antiviral drugs. The Coronavirus Antibody Database (CoVAbDab) contains 2411 antibodies, each associated with multiple binary labels indicating whether it neutralizes a coronavirus (SARS-CoV-1 or SARS-CoV-2) at a certain epitope. Similar to the previous experiment, we divide the antibodies into training, validation, and test sets based on CDR-H3 cluster split with 8:1:1 ratio.

Neutralization predictor. The predictor takes as input the VH sequence of an antibody and outputs a neutralization probability for the SARS-CoV-1 and SARS-CoV-2 viruses. Each residue is embedded into a 64 dimensional vector, which is fed to a SRU encoder (Lei, 2021) followed by average-pooling and a two-layer feed forward network. The final outputs are the probabilities and of neutralizing SARS-CoV-1 and SARS-CoV-2 and our scoring function is . The predictor achieved 0.81 test AUROC for SARS-CoV-2 neutralization prediction.

CDR sequence constraints. Therapeutic antibodies must be free from developability issues such as glycosylation and high charges (Raybould et al., 2019). Thus, we include four constraints on a CDR-H3 sequence : 1) Its net charge must be between -2.0 and 2.0 (Raybould et al., 2019). The definition of net charge is given in the appendix. 2) It must not contain the N-X-S/T motif which is prone to glycosylation. 3) Any amino acid should not repeat more than five times (e.g. SSSSS). 4) Perplexity of a generated sequence given by LSTM, AR-GNN, and RefineGNN should be all less than 10. The last two constraints force generated sequences to be realistic. We use all three models in the perplexity constraint to ensure a fair comparison for all methods.

Metric. For each antibody in the test set, we generate 100 new CDR-H3 sequences, concatenate them with its context sequence to form 100 full VH sequences, and feed them into the neutralization predictor . We report the average neutralization score of antibodies in the test set. Neutralization score of a generated sequence equals if it satisfies all the CDR sequence constraints. Otherwise the score is the same as the original sequence. In addition, we pretrain each model on the SAbDab CDR-H3 sequences and evaluate its PPL on the CoVAbDab CDR-H3 sequences.

Results. All methods are pretrained on SAbDab antibodies and finetuned on CoVAbDab using the ITA algorithm to generate neutralizing antibodies. Our model outperforms the best baseline by a 3% increase in terms of average neutralization score (Table 2). Our pretrained RefineGNN also achieves a much lower perplexity on CoVAbDab antibodies (7.86 v.s. 8.67). Examples of generated CDR-H3 sequences and their predicted neutralization scores are shown in the appendix.

Original LSTM AR-GNN RefineGNN
CoVAbDab PPL () - 9.40 8.67 7.86
Neutralization () 69.3% 72.0% 70.4% 75.2%
Table 2: SARS-CoV-2 neutralization optimization results. For each method, we report the PPL on CoVAbDab after pretraining on SAbDab and then report the average neutralization score after ITA finetuning. The average neutralization probability of original CoVAbDab antibodies is 69.3%.

5 Conclusion

In this paper, we developed a RefineGNN model for antibody sequence and structure co-design. The advantage of our model over previous graph generation methods is its ability to revise a generated subgraph to accommodate addition of new residues. Our approach significantly outperforms sequence-based and graph-based approaches on three antibody generation tasks.


We would like to thank Rachel Wu, Xiang Fu, Jason Yim, and Peter Mikhael for their valuable feedback on the manuscript. We also want to thank Nitan Shalon, Nicholas Webb, Jae Hyeon Lee, Qiu Yu, and Galit Alter for their suggestions on method development. We are grateful for the generous support of Mark and Lisa Schwartz, funding in a form of research grant from Sanofi, Eric and Wendy Schmidt Center at the Broad Institute, and Abdul Latif Jameel Clinic for Machine Learning in Health.


  • A. K. Abbas, A. H. Lichtman, and S. Pillai (2014) Cellular and molecular immunology e-book. Elsevier Health Sciences. Cited by: §3.
  • J. Adolf-Bryfogle, O. Kalyuzhniy, M. Kubitz, B. D. Weitzner, X. Hu, Y. Adachi, W. R. Schief, and R. L. Dunbrack Jr (2018) RosettaAntibodyDesign (rabd): a general framework for computational antibody design. PLoS computational biology 14 (4), pp. e1006112. Cited by: §1, §2, §4.2, §4.2, §4, §4.
  • R. Akbar, P. A. Robert, C. R. Weber, M. Widrich, R. Frank, M. Pavlović, L. Scheffer, M. Chernigovskaya, I. Snapkov, A. Slabodkin, et al. (2021) In silico proof of principle of machine learning-based antibody design at unconstrained scale. BioRxiv. Cited by: §1, §2, §3, §4.
  • E. C. Alley, G. Khimulya, S. Biswas, M. AlQuraishi, and G. M. Church (2019) Unified rational protein engineering with sequence-based deep representation learning. Nature methods 16 (12), pp. 1315–1322. Cited by: §1, §2.
  • M. Baek, F. DiMaio, I. Anishchenko, J. Dauparas, S. Ovchinnikov, G. R. Lee, J. Wang, Q. Cong, L. N. Kinch, R. D. Schaeffer, et al. (2021) Accurate prediction of protein structures and interactions using a three-track neural network. Science 373 (6557), pp. 871–876. Cited by: §2.
  • Y. Cao, P. Das, V. Chenthamarakshan, P. Chen, I. Melnyk, and Y. Shen (2021) Fold2Seq: a joint sequence (1d)-fold (3d) embedding-based generative model for protein design. In International Conference on Machine Learning, pp. 1261–1271. Cited by: §2.
  • K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio (2014) Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078. Cited by: §3.3.
  • J. Dunbar, K. Krawczyk, J. Leem, T. Baker, A. Fuchs, G. Georges, J. Shi, and C. M. Deane (2014) SAbDab: the structural antibody database. Nucleic acids research 42 (D1), pp. D1140–D1146. Cited by: §4.
  • S. Fischman and Y. Ofran (2018) Computational design of antibodies. Current opinion in structural biology 51, pp. 156–162. Cited by: §1.
  • N. W. Gebauer, M. Gastegger, and K. T. Schütt (2019) Symmetry-adapted generation of 3d point sets for the targeted discovery of molecules. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 7566–7578. Cited by: §1, §2.
  • A. Grover, A. Zweig, and S. Ermon (2019) Graphite: iterative generative modeling of graphs. In International conference on machine learning, pp. 2434–2444. Cited by: §2.
  • S. Henikoff and J. G. Henikoff (1992) Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences 89 (22), pp. 10915–10919. Cited by: §4.1.
  • J. Ingraham, V. K. Garg, R. Barzilay, and T. Jaakkola (2019) Generative models for graph-based protein design. Neural Information Processing Systems. Cited by: §A.1, §1, §2, §2, §3.1.
  • J. Ingraham, A. Riesselman, C. Sander, and D. Marks (2018) Learning protein structure with a differentiable simulator. In International Conference on Learning Representations, Cited by: §2.
  • W. Jin, R. Barzilay, and T. Jaakkola (2020a) Hierarchical generation of molecular graphs using structural motifs. In Proceedings of the 37th International Conference on Machine Learning, Vol. 119, pp. 4839–4848. Cited by: §2.
  • W. Jin, R. Barzilay, and T. Jaakkola (2020b) Multi-objective molecule generation using interpretable substructures. In Proceedings of the 37th International Conference on Machine Learning, Vol. 119, pp. 4849–4859. Cited by: §4, §4.
  • J. Jumper, R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko, et al. (2021) Highly accurate protein structure prediction with alphafold. Nature 596 (7873), pp. 583–589. Cited by: §2.
  • W. Kabsch (1976) A solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A: Crystal Physics, Diffraction, Theoretical and General Crystallography 32 (5), pp. 922–923. Cited by: §4.1.
  • M. Karimi, S. Zhu, Y. Cao, and Y. Shen (2020)

    De novo protein design for novel folds using guided conditional wasserstein generative adversarial networks

    Journal of Chemical Information and Modeling 60 (12), pp. 5667–5681. Cited by: §2.
  • G. D. Lapidoth, D. Baran, G. M. Pszolla, C. Norn, A. Alon, M. D. Tyka, and S. J. Fleishman (2015) AbDesign: a n algorithm for combinatorial backbone design guided by natural conformations and sequences. Proteins: Structure, Function, and Bioinformatics 83 (8), pp. 1385–1406. Cited by: §1, §2.
  • A. Leaver-Fay, M. Tyka, S. M. Lewis, O. F. Lange, J. Thompson, R. Jacak, K. W. Kaufman, P. D. Renfrew, C. A. Smith, W. Sheffler, et al. (2011) ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods in enzymology 487, pp. 545–574. Cited by: §2.
  • M. Lefranc, C. Pommié, M. Ruiz, V. Giudicelli, E. Foulquier, L. Truong, V. Thouvenin-Contet, and G. Lefranc (2003) IMGT unique numbering for immunoglobulin and t cell receptor variable domains and ig superfamily v-like domains. Developmental & Comparative Immunology 27 (1), pp. 55–77. Cited by: §4.1.
  • T. Lei (2021)

    When attention meets fast recurrence: training language models with reduced compute

    arXiv preprint arXiv:2102.12459. Cited by: §4.3.
  • T. Li, R. J. Pantazes, and C. D. Maranas (2014) OptMAVEn–a new framework for the de novo design of antibody variable region models targeting specific antigen epitopes. PloS one 9 (8), pp. e105954. Cited by: §2.
  • Y. Li, O. Vinyals, C. Dyer, R. Pascanu, and P. Battaglia (2018) Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324. Cited by: §2.
  • R. Liao, Y. Li, Y. Song, S. Wang, W. Hamilton, D. K. Duvenaud, R. Urtasun, and R. Zemel (2019) Efficient graph generation with graph recurrent attention networks. Advances in Neural Information Processing Systems 32, pp. 4255–4265. Cited by: §2.
  • G. Liu, H. Zeng, J. Mueller, B. Carter, Z. Wang, J. Schilz, G. Horny, M. E. Birnbaum, S. Ewert, and D. K. Gifford (2020) Antibody complementarity determining region design using high-capacity machine learning. Bioinformatics 36 (7), pp. 2126–2133. Cited by: §1, §2.
  • Q. Liu, M. Allamanis, M. Brockschmidt, and A. L. Gaunt (2018)

    Constrained graph variational autoencoders for molecule design

    Neural Information Processing Systems. Cited by: §2.
  • S. Luo and W. Hu (2021) Diffusion probabilistic models for 3d point cloud generation. In

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    pp. 2837–2845. Cited by: §2.
  • J. O’Connell, Z. Li, J. Hanson, R. Heffernan, J. Lyons, K. Paliwal, A. Dehzangi, Y. Yang, and Y. Zhou (2018) SPIN2: predicting sequence profiles from protein structures using deep neural networks. Proteins: Structure, Function, and Bioinformatics 86 (6), pp. 629–633. Cited by: §2.
  • R. Pantazes and C. D. Maranas (2010) OptCDR: a general computational method for the design of antibody complementarity determining regions for targeted epitope binding. Protein Engineering, Design & Selection 23 (11), pp. 849–858. Cited by: §2.
  • D. Pinto, Y. Park, M. Beltramello, A. C. Walls, M. A. Tortorici, S. Bianchi, S. Jaconi, K. Culap, F. Zatta, A. De Marco, et al. (2020) Cross-neutralization of sars-cov-2 by a human monoclonal sars-cov antibody. Nature 583 (7815), pp. 290–295. Cited by: §1.
  • M. I. Raybould, A. Kovaltsuk, C. Marks, and C. M. Deane (2021) CoV-abdab: the coronavirus antibody database. Bioinformatics 37 (5), pp. 734–735. Cited by: §4.
  • M. I. Raybould, C. Marks, K. Krawczyk, B. Taddese, J. Nowak, A. P. Lewis, A. Bujotzek, J. Shi, and C. M. Deane (2019) Five computational developability guidelines for therapeutic antibody profiling. Proceedings of the National Academy of Sciences 116 (10), pp. 4025–4030. Cited by: Appendix B, §1, §4.3.
  • K. Saka, T. Kakuzaki, S. Metsugi, D. Kashiwagi, K. Yoshida, M. Wada, H. Tsunoda, and R. Teramoto (2021) Antibody design using lstm based deep generative model from phage display library for affinity maturation. Scientific reports 11 (1), pp. 1–13. Cited by: §1, §2, §4.
  • C. Shi, S. Luo, M. Xu, and J. Tang (2021) Learning gradient fields for molecular conformation generation. International Conference on Machine Learning. Cited by: §2.
  • J. Shin, A. J. Riesselman, A. W. Kollasch, C. McMahon, E. Simon, C. Sander, A. Manglik, A. C. Kruse, and D. S. Marks (2021) Protein design and variant prediction using autoregressive generative models. Nature communications 12 (1), pp. 1–11. Cited by: §1, §2, §3.
  • M. Steinegger and J. Söding (2017) MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature biotechnology 35 (11), pp. 1026–1028. Cited by: §4.1.
  • A. Strokach, D. Becerra, C. Corbi-Verge, A. Perez-Riba, and P. M. Kim (2020) Fast and flexible design of novel proteins using graph neural networks. BioRxiv, pp. 868935. Cited by: §2.
  • D. Tischer, S. Lisanza, J. Wang, R. Dong, I. Anishchenko, L. F. Milles, S. Ovchinnikov, and D. Baker (2020)

    Design of proteins presenting discontinuous functional sites using deep learning

    bioRxiv. Cited by: §2.
  • J. Yang, I. Anishchenko, H. Park, Z. Peng, S. Ovchinnikov, and D. Baker (2020a) Improved protein structure prediction using predicted interresidue orientations. Proceedings of the National Academy of Sciences 117 (3), pp. 1496–1503. Cited by: §2.
  • K. Yang, W. Jin, K. Swanson, R. Barzilay, and T. Jaakkola (2020b) Improving molecular design by stochastic iterative target augmentation. In International Conference on Machine Learning, pp. 10716–10726. Cited by: §3.4.
  • J. You, R. Ying, X. Ren, W. L. Hamilton, and J. Leskovec (2018) GraphRNN: a deep generative model for graphs. International Conference on Machine Learning. Cited by: §1, §1, §2, §4.

Appendix A Model architecture details

a.1 RefineGNN

Node features. Each node feature encodes three dihedral angles as follows.


Edge features. The orientation matrix defines a local coordinate system for each residue (Ingraham et al., 2019), which is calculated as


Attention mechanism. The attention layer used in Eq.(13) is a standard bilinear attention:


a.2 Ar-Gnn

AR-GNN generates an antibody graph autoregressively. In each generation step , AR-GNN learns to encode the current subgraph induced from residues into a list of vectors


For fair comparison, we use the same MPN architecture for both RefineGNN and AR-GNN. In terms of structure prediction, AR-GNN first predicts the node feature of the next residue , namely the dihedral angle between its three atoms .


In addition, AR-GNN predicts the pairwise distance between and previous residues .


where is a feed-forward network with one hidden layer and is the positional encoding of , the gap between residue and in the sequence. Lastly, AR-GNN predicts the amino acid type of residue by


Note that AR-GNN also uses two separate MPNs for structure and sequence prediction. However, unlike RefineGNN, AR-GNN is trained under teacher forcing — we need to feed it the ground truth structure and sequence in each generation step. In particular, we find data augmentation to be crucial for AR-GNN performance. Data augmentation is essential because of the discrepancy between training and testing. The model is trained under teacher forcing, but it needs to decode a graph without teacher forcing at test time. We find mistakes made in previous steps have a great impact on subsequent predictions during decoding.

Specifically, for every antibody , we create a corrupted graph by adding independent random Gaussian noise to every coordinate: . In each generation step, we apply MPN over the corrupted graph instead.


The node and edge labels are still defined by the ground truth structure. Specifically, let and be the ground truth dihedral angle and pairwise distance calculated from the original, uncorrupted graph . AR-GNN loss function is defined as the following.


Similar to RefineGNN, AR-GNN also uses attention mechanism for conditional generation. Specifically, we concatenate the residue representations from MPN with context vectors learned from an attention layer.


Appendix B Experimental details

Hyperparameters. For AR-GNN and RefineGNN, we tried hidden dimension and number of message passing layers . We found worked the best for RefineGNN and worked the best for AR-GNN. For LSTM, we tried . We found worked the best. All models are trained by an Adam optimizer with a dropout of 0.1 and a learning rate of 0.001.

SAbDab data. The dataset statistics of SAbDab is the following (after deduplication). For CDR-H1, the train/validation/test size is 4050, 359, and 326. For CDR-H2, the train/validation/test size is 3876, 483, and 376. For CDR-H3, the train/validation/test size is 3896, 403, and 437.

RAbD configuration. We provided details of the de novo design setup of RosettaAntibodyDesign (RAbD) here. For each antibody in the test set, RAbD starts by randomly selecting a CDR from RAbD’s internal database of known CDR structures. The chosen CDR-H3 sequence is required to have same length as the original sequence, but it cannot be exactly the same as the original CDR-H3 sequence. After the initial CDR structure is chosen, RAbD grafts it onto the antibody and performs energy minimization to stabilize its structure. Next, RAbD runs 100 iterations of sequence design to modify the grafted CDR-H3 structure by randomly substituting amino acids. In each sequence design iteration, it performs energy minimization to adjust the structure according to the changed amino acid. Lastly, the model returns the generated CDR-H3 sequence with the lowest energy.

SARS-CoV-2 neutralization. Each generative model is pretrained on the SAbDab data to learn a prior distribution over CDR-H3 structures. Given a fixed predictor , we use the ITA algorithm to finetune our pretrained models to generate neutralizing antibodies. Each model is trained for 3000 ITA steps with . Generated CDR-H3 sequences from our model are visualized in Table 3.

Our neutralization predictor is trained on the CoVAbDab database. For simplicity, we only consider two viruses, SARS-CoV-1 and SARS-CoV-2 since other coronavirus have very little training data. For the same reason, we only consider the spike protein receptor binding domain as our target epitope. The predictor is trained in a multi-task fashion to predict both SARS-CoV-1 and SARS-CoV-2 neutralization labels. The SRU encoder has a hidden dimension of 256. The model was trained with a dropout of 0.2, a learning rate of 0.0005, and batch size of 16.

The charge of a residue is defined as (Raybould et al., 2019). The net charge of a sequence is defined as .

Antibody: S1D7 C694
Improvement 63% 73% 82% 91%
Table 3: SARS-CoV-2 neutralization optimization results. Here we show examples of new CDR-H3 sequences generated by our model and their predicted neutralization improvement over original antibodies S1D7 and C694 in the CoVAbDab database.