BERTology Meets Biology: Interpreting Attention in Protein Language Models

06/26/2020
by Jesse Vig, et al.

Transformer architectures have been shown to learn useful representations for protein classification and generation tasks. However, these representations are difficult to interpret. Through the lens of attention, we analyze the inner workings of the Transformer and explore how the model discerns structural and functional properties of proteins. We show that attention (1) captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence but spatially close in the three-dimensional structure; (2) targets binding sites, a key functional component of proteins; and (3) focuses on progressively more complex biophysical properties with increasing layer depth. We also present a three-dimensional visualization of the interaction between attention and protein structure. Our findings align with known biological processes and provide a tool to aid discovery in protein engineering and synthetic biology. The code for visualization and analysis is available at https://github.com/salesforce/provis.
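To make claims (1) and (2) more concrete, the sketch below shows one simple way to score a single attention head: take the attention weights above a confidence threshold and measure what fraction of them fall on residue pairs with a given property, either a 3D contact or a binding-site target. This is a minimal illustration under stated assumptions, not the authors' implementation: the threshold of 0.3, the 8 Å contact cutoff, the helper names `contact_map` and `attention_agreement`, and the toy coordinates, attention matrix and binding-site indices are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical helpers and toy data, for illustration only.
Coord = Tuple[float, float, float]


def contact_map(coords: Sequence[Coord], cutoff: float = 8.0,
                min_sep: int = 1) -> List[List[bool]]:
    """Mark pair (i, j) as a contact when the residues lie within `cutoff`
    angstroms in 3D and are at least `min_sep` positions apart in sequence.
    The 8 A cutoff is an assumed convention, not taken from the abstract."""
    n = len(coords)
    return [
        [abs(i - j) >= min_sep and math.dist(coords[i], coords[j]) < cutoff
         for j in range(n)]
        for i in range(n)
    ]


def attention_agreement(attn: Sequence[Sequence[float]],
                        prop: Callable[[int, int], bool],
                        theta: float = 0.3) -> float:
    """Fraction of attention weights above `theta` (an assumed threshold)
    whose residue pair (i, j) satisfies `prop`; 0.0 if none exceed theta."""
    hits = total = 0
    for i, row in enumerate(attn):
        for j, weight in enumerate(row):
            if weight > theta:
                total += 1
                if prop(i, j):
                    hits += 1
    return hits / total if total else 0.0


# Toy 4-residue protein: residue 3 sits close to residues 0 and 1 in 3D.
coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
# One head's attention; row i holds the weights residue i assigns to each j.
attn = [
    [0.1, 0.1, 0.1, 0.7],
    [0.2, 0.2, 0.2, 0.4],
    [0.5, 0.1, 0.2, 0.2],
    [0.6, 0.2, 0.1, 0.1],
]
binding_sites = {3}  # hypothetical binding-site residue indices

contacts = contact_map(coords)
print(attention_agreement(attn, lambda i, j: contacts[i][j]))      # 0.75
print(attention_agreement(attn, lambda i, j: j in binding_sites))  # 0.5
```

Here three of the four high-confidence weights connect residues that are in contact, and two of the four point at the binding site. In a real analysis the score would be aggregated over many proteins; computing it separately for each layer is one way to look at how attention changes with depth, as in claim (3).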
