Classifying sequences by the optimized dissimilarity space embedding approach: a case study on the solubility analysis of the E. coli proteome

08/17/2014
by Lorenzo Livi, et al.

We evaluate a version of the recently proposed classification system named Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space of sequences of generic objects. The ODSE system was originally presented as a classification system for patterns represented as labeled graphs. However, since ODSE is founded on the dissimilarity space representation of the input data, the classifier can be easily adapted to any input domain where a meaningful dissimilarity measure can be defined. Here we demonstrate the effectiveness of the ODSE classifier for sequences by considering an application dealing with the recognition of the solubility degree of the Escherichia coli proteome. Solubility, or analogously aggregation propensity, is an important property of protein molecules that is intimately related to the mechanisms underlying the chemico-physical process of folding. Each protein in our dataset is associated with a solubility degree and is represented as a sequence of symbols denoting the 20 amino acid residues. The computational results obtained here, which we stress were achieved with no context-dependent tuning of the ODSE system, confirm the validity and generality of the ODSE-based approach for structured data classification.
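The dissimilarity space representation mentioned in the abstract can be illustrated with a minimal sketch. This is not the ODSE system itself: it omits the optimization stage entirely, and it assumes the Levenshtein edit distance as the dissimilarity measure, a hand-picked set of prototype sequences, and a 1-nearest-neighbor rule in the embedding space. The function names (`levenshtein`, `embed`, `nearest_label`), the sequences, and the labels are all hypothetical and made up for illustration; they are not taken from the paper's dataset.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def embed(seq: str, prototypes: Sequence[str]) -> List[float]:
    """Map a sequence to the vector of its dissimilarities to each prototype."""
    return [float(levenshtein(seq, p)) for p in prototypes]


def nearest_label(vec: Sequence[float],
                  train_vecs: Sequence[Sequence[float]],
                  train_labels: Sequence[str]) -> str:
    """1-nearest-neighbor rule (squared Euclidean) in the embedding space."""
    def sq_dist(k: int) -> float:
        return sum((x - y) ** 2 for x, y in zip(vec, train_vecs[k]))
    best = min(range(len(train_vecs)), key=sq_dist)
    return train_labels[best]


# Toy, made-up amino-acid strings and labels (assumption, not real data).
prototypes = ["MKKL", "GAVL"]
train_seqs = ["MKKLA", "GAVLG"]
train_labels = ["soluble", "insoluble"]

train_vecs = [embed(s, prototypes) for s in train_seqs]
query_vec = embed("MKKLV", prototypes)
predicted = nearest_label(query_vec, train_vecs, train_labels)
```

Once every sequence is a fixed-length vector of dissimilarities, any standard vector classifier can replace the nearest-neighbor rule used here. This is why the approach carries over to any domain with a meaningful dissimilarity measure.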

research
11/20/2012

A Brief Review of Data Mining Application Involving Protein Sequence Classification

Data mining techniques have been used by researchers for analyzing prote...
research
07/28/2014

Entropic one-class classifiers

The one-class classification problem is a well-known research endeavor i...
research
07/30/2014

Characterization of graphs for protein structure modeling and recognition of solubility

This paper deals with the relations among structural, topological, and c...
research
11/03/2021

Binary classification of proteins by a Machine Learning approach

In this work we present a system based on a Deep Learning approach, by u...
research
06/17/2018

MCP: a multi-component learning machine for prediction of protein secondary structure

Proteins biological function is tightly connected to its specific 3D str...
research
08/22/2014

Designing labeled graph classifiers by exploiting the Rényi entropy of the dissimilarity representation

Representing patterns as labeled graphs is becoming increasingly common ...
