HMD-AMP: Protein Language-Powered Hierarchical Multi-label Deep Forest for Annotating Antimicrobial Peptides

11/11/2021
by Qinze Yu, et al.

Identifying the targets of an antimicrobial peptide is a fundamental step in studying the innate immune response and combating antibiotic resistance, and more broadly in precision medicine and public health. Statistical and computational approaches have been studied extensively for identifying (i) whether a peptide is an antimicrobial peptide (AMP) or a non-AMP and (ii) which targets these sequences are effective against (Gram-positive, Gram-negative, etc.). Although deep learning methods exist for this problem, most of them cannot handle the small AMP classes (anti-insect, anti-parasite, etc.). More importantly, some AMPs have multiple targets, which previous methods fail to consider. In this study, we build a diverse and comprehensive multi-label protein sequence database by collecting and cleaning amino acid sequences from various AMP databases. To generate efficient representations and features for the small-class dataset, we take advantage of a protein language model trained on 250 million protein sequences. Based on that, we develop an end-to-end hierarchical multi-label deep forest framework, HMD-AMP, to annotate AMPs comprehensively. After identifying an AMP, it further predicts which targets the AMP can effectively kill from eleven available classes. Extensive experiments suggest that our framework outperforms state-of-the-art models in both the binary classification task and the multi-label classification task, especially on the minor classes. The model is robust against reduced features and small perturbations and produces promising results. We believe HMD-AMP contributes to future wet-lab investigations of the innate structural properties of different antimicrobial peptides and builds promising empirical underpinnings for precision medicine with antibiotics.
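
The two-stage decision described in the abstract (first AMP versus non-AMP, then a multi-label target prediction only for peptides judged to be AMPs) can be illustrated with a minimal sketch. This is not the authors' implementation: the function `hierarchical_predict`, the thresholds, and the toy scoring functions are all hypothetical stand-ins for the trained deep forest models, and the three-element "embedding" stands in for a protein language model representation.

```python
from typing import Callable, Dict, List, Sequence

Embedding = Sequence[float]


def hierarchical_predict(
    embedding: Embedding,
    is_amp_score: Callable[[Embedding], float],
    target_scores: Callable[[Embedding], Dict[str, float]],
    amp_threshold: float = 0.5,
    target_threshold: float = 0.5,
) -> List[str]:
    """Return predicted target labels, or an empty list for a non-AMP."""
    # Stage 1: binary AMP vs non-AMP decision.
    if is_amp_score(embedding) < amp_threshold:
        return []
    # Stage 2: each label is decided independently, so one peptide
    # can be assigned several targets at once (multi-label output).
    scores = target_scores(embedding)
    return sorted(label for label, s in scores.items() if s >= target_threshold)


# Toy stand-ins for trained models (hypothetical, for illustration only).
def toy_amp_score(e: Embedding) -> float:
    return sum(e) / len(e)


def toy_target_scores(e: Embedding) -> Dict[str, float]:
    return {"Gram-positive": e[0], "Gram-negative": e[1], "anti-parasite": e[2]}


print(hierarchical_predict([0.9, 0.8, 0.1], toy_amp_score, toy_target_scores))
print(hierarchical_predict([0.1, 0.2, 0.3], toy_amp_score, toy_target_scores))
```

The first call passes the stage 1 gate and returns two targets; the second is rejected at stage 1 and returns no labels. In the framework described above, the full target set has eleven classes rather than the three used here.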
