ECRECer: Enzyme Commission Number Recommendation and Benchmarking based on Multiagent Dual-core Learning

02/08/2022
by   Zhenkun Shi, et al.

Enzyme Commission (EC) numbers, which associate a protein sequence with the biochemical reactions it catalyzes, are essential for an accurate understanding of enzyme functions and cellular metabolism. Many ab-initio computational approaches have been proposed to predict EC numbers directly from input sequences. However, the prediction performance (accuracy, recall, precision), usability, and efficiency of existing methods leave considerable room for improvement. Here, we report ECRECer, a cloud platform for accurately predicting EC numbers based on novel deep learning techniques. To build ECRECer, we evaluate different protein representation methods and adopt a protein language model for protein sequence embedding. After embedding, we propose a multi-agent, hierarchical, deep learning-based framework to learn the proposed tasks in a multi-task manner. Specifically, we use an extreme multi-label classifier to perform the EC prediction and employ a greedy strategy to integrate and fine-tune the final model. Comparative analyses against four representative methods demonstrate that ECRECer delivers the highest performance, improving accuracy by 70% and also raising the F1 score over the state of the art. With ECRECer, we can annotate numerous enzymes in the Swiss-Prot database that have incomplete EC numbers to their full fourth level. Taking the UniProt protein "A0A0U5GJ41" (annotated as 1.14.-.-) as an example, ECRECer annotated it with "1.14.11.38", which is supported by further protein structure analysis based on AlphaFold2. Finally, we established a webserver (https://ecrecer.biodesign.ac.cn) and provided an offline bundle to improve usability.
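
To make the "incomplete EC number" idea concrete, the following minimal Python sketch parses the four-level EC notation and checks whether a full prediction such as "1.14.11.38" is a consistent completion of a partial annotation such as "1.14.-.-". It is not part of ECRECer; the function names (`parse_ec`, `resolved_levels`, `is_consistent_completion`) are hypothetical, and the sketch assumes purely numeric levels with "-" marking an unassigned level.

```python
from typing import List, Optional


def parse_ec(ec: str) -> List[Optional[int]]:
    """Split an EC number into its four levels; '-' becomes None.

    Assumption: every assigned level is a plain integer. Preliminary
    serials such as 'n1' are not handled in this sketch.
    """
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels, got {ec!r}")
    return [None if p == "-" else int(p) for p in parts]


def resolved_levels(ec: str) -> int:
    """Count how many leading levels of the EC number are assigned."""
    count = 0
    for level in parse_ec(ec):
        if level is None:
            break
        count += 1
    return count


def is_consistent_completion(partial: str, full: str) -> bool:
    """Return True if `full` assigns all four levels and agrees with
    every level that `partial` already assigns."""
    p, f = parse_ec(partial), parse_ec(full)
    if any(level is None for level in f):
        return False
    return all(a is None or a == b for a, b in zip(p, f))


if __name__ == "__main__":
    print(resolved_levels("1.14.-.-"))
    print(is_consistent_completion("1.14.-.-", "1.14.11.38"))
```

A check of this kind only confirms that a prediction does not contradict the existing annotation; it says nothing about whether the predicted third and fourth levels are biologically correct.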
