PTransIPs: Identification of phosphorylation sites based on protein pretrained language model and Transformer

08/08/2023
by Ziyang Xu, et al.

Phosphorylation is central to numerous fundamental cellular processes and influences the onset and progression of a variety of diseases. Identifying phosphorylation sites is thus an important step toward understanding the molecular mechanisms of cells and viral infection, which may in turn lead to new therapeutic targets. In this study, we present PTransIPs, a novel deep learning model for the identification of phosphorylation sites. PTransIPs treats amino acids in protein sequences as words in natural language, extracting unique encodings based on the type and position of each amino acid in the sequence. It also incorporates embeddings from large pre-trained protein models as additional inputs. PTransIPs is then trained on a combined model: a convolutional neural network with residual connections and a Transformer equipped with multi-head attention. Finally, the model outputs classification results through a fully connected layer. Independent testing reveals that PTransIPs outperforms existing state-of-the-art methods, achieving AUROCs of 0.9232 and 0.9660 for identifying phosphorylated S/T and Y sites, respectively. In addition, ablation studies confirm that the pretrained model embeddings contribute to the performance of PTransIPs. Furthermore, PTransIPs offers interpretable amino acid preferences, a visible training process, and generalizability to other bioactivity classification tasks. To facilitate usage, our code and data are publicly accessible at <https://github.com/StatXzy7/PTransIPs>.
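The pipeline described in the abstract (token and position encodings plus pretrained protein-model embeddings, fed through a residual CNN block and a multi-head-attention Transformer into a fully connected classifier) can be sketched as follows. This is a minimal illustrative sketch in PyTorch, not the authors' implementation: the class name `PTransIPsSketch`, all layer sizes, and the mean-pooling step are assumptions; see the linked repository for the actual model.

```python
import torch
import torch.nn as nn

class PTransIPsSketch(nn.Module):
    """Hypothetical sketch of the architecture described above: token and
    position embeddings combined with pretrained protein-LM embeddings,
    a CNN block with a residual connection, a Transformer encoder with
    multi-head attention, and a fully connected classification head.
    All dimensions are illustrative, not the paper's."""

    def __init__(self, vocab_size=25, seq_len=33, d_model=128,
                 plm_dim=1024, n_heads=8, n_layers=2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)   # amino-acid type
        self.pos_emb = nn.Embedding(seq_len, d_model)      # position in sequence
        # project pretrained protein-model embeddings into the model dimension
        self.plm_proj = nn.Linear(plm_dim, d_model)
        # convolutional block, applied with a residual connection in forward()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            n_layers)
        self.classifier = nn.Linear(d_model, 2)  # phospho-site vs. not

    def forward(self, tokens, plm_emb):
        # tokens: (B, L) amino-acid indices; plm_emb: (B, L, plm_dim)
        pos = torch.arange(tokens.size(1), device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos) + self.plm_proj(plm_emb)
        # CNN with residual connection (Conv1d expects channels-first)
        x = x + self.conv(x.transpose(1, 2)).transpose(1, 2)
        x = self.encoder(x)                       # multi-head self-attention
        return self.classifier(x.mean(dim=1))     # pool, then FC classifier

model = PTransIPsSketch()
logits = model(torch.randint(0, 25, (4, 33)), torch.randn(4, 33, 1024))
print(logits.shape)  # torch.Size([4, 2])
```

The two-stream input (raw sequence encodings plus frozen pretrained embeddings) mirrors the ablation finding above: the pretrained embeddings act as an additional feature source rather than a replacement for the sequence encoding.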


Related research:

- 12/06/2017: Attention based convolutional neural network for predicting RNA-protein binding sites. "RNA-binding proteins (RBPs) play crucial roles in many biological proces..."
- 10/27/2021: MutFormer: A context-dependent transformer-based model to predict pathogenic missense mutations. "A missense mutation is a point mutation that results in a substitution o..."
- 12/05/2020: Pre-training Protein Language Models with Label-Agnostic Binding Pairs Enhances Performance in Downstream Tasks. "Less than 1% annotated. Natural Language Processing (NLP) community has r..."
- 06/26/2020: BERTology Meets Biology: Interpreting Attention in Protein Language Models. "Transformer architectures have proven to learn useful representations fo..."
- 05/18/2023: Vaxformer: Antigenicity-controlled Transformer for Vaccine Design Against SARS-CoV-2. "The SARS-CoV-2 pandemic has emphasised the importance of developing a un..."
- 05/07/2023: Generative Pretrained Autoregressive Transformer Graph Neural Network applied to the Analysis and Discovery of Novel Proteins. "We report a flexible language-model based deep learning strategy, applie..."
- 08/10/2021: A Brief Review of Machine Learning Techniques for Protein Phosphorylation Sites Prediction. "Reversible Post-Translational Modifications (PTMs) have vital roles in e..."
