DI2: prior-free and multi-item discretization of biomedical data and its applications

03/07/2021
by Leonardo Alexandre, et al.

Motivation: A considerable number of data mining approaches for biomedical data analysis, including state-of-the-art associative models, require a form of data discretization. Although diverse discretization approaches have been proposed, they generally work under a strict set of statistical assumptions that are arguably insufficient to handle the diversity and heterogeneity of clinical and molecular variables within a given dataset. In addition, although an increasing number of symbolic approaches in bioinformatics can assign multiple items to values occurring near discretization boundaries for superior robustness, there are no reference principles on how to perform multi-item discretizations.

Results: In this study, an unsupervised discretization method, DI2, is proposed for variables with arbitrarily skewed distributions. DI2 provides robust guarantees of generalization by applying data corrections using the Kolmogorov-Smirnov test before statistically fitting distribution candidates. DI2 further supports multi-item assignments. Results gathered from biomedical data show its relevance for improving on classic discretization choices.

Software: available at https://github.com/JupitersMight/DI2
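To make the ideas in the abstract concrete, the following is a minimal sketch of distribution-fitted discretization with multi-item assignment near boundaries. It is not the DI2 implementation and does not use the DI2 package. All names (`fit_candidates`, `ks_statistic`, `assign_items`, `discretize`), the choice of three candidate distributions, and the `border` parameter are assumptions made for illustration. The sketch ranks candidates by the Kolmogorov-Smirnov statistic and omits the data-correction step mentioned in the abstract. It assumes non-constant numeric input.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def fit_candidates(values):
    """Fit a few candidate distributions; return {name: (cdf, ppf)}.

    Hypothetical helper: the candidate set and the simple estimators
    used here are illustrative assumptions.
    """
    mean, std = fmean(values), pstdev(values)
    lo, hi = min(values), max(values)
    normal = NormalDist(mean, std)
    candidates = {
        "normal": (normal.cdf, normal.inv_cdf),
        "uniform": (
            lambda x: min(1.0, max(0.0, (x - lo) / (hi - lo))),
            lambda p: lo + p * (hi - lo),
        ),
    }
    if lo >= 0 and mean > 0:
        # The exponential candidate only makes sense for non-negative data.
        candidates["exponential"] = (
            lambda x: 1.0 - math.exp(-x / mean) if x > 0 else 0.0,
            lambda p: -mean * math.log(1.0 - p),
        )
    return candidates


def ks_statistic(values, cdf):
    """One-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDF and the candidate CDF."""
    xs = sorted(values)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(xs)
    )


def assign_items(x, cdf, n_bins, border=0.1):
    """Return the item for x, plus a neighbouring item when x lies close
    to a cut point.

    `border` (an assumption of this sketch) is the fraction of a bin's
    probability mass, next to each boundary, that counts as "close".
    """
    scaled = cdf(x) * n_bins
    item = min(int(scaled), n_bins - 1)
    frac = scaled - item
    items = [item]
    if frac < border and item > 0:
        items.append(item - 1)
    elif frac > 1 - border and item < n_bins - 1:
        items.append(item + 1)
    return items


def discretize(values, n_bins=4, border=0.1):
    """Pick the best-fitting candidate by KS statistic, derive
    equal-probability cut points, and assign one or two items per value."""
    candidates = fit_candidates(values)
    best = min(candidates, key=lambda name: ks_statistic(values, candidates[name][0]))
    cdf, ppf = candidates[best]
    cuts = [ppf(k / n_bins) for k in range(1, n_bins)]
    items = [assign_items(x, cdf, n_bins, border) for x in values]
    return best, cuts, items


# Usage on synthetic data (not biomedical data, purely illustrative).
random.seed(0)
sample = [random.gauss(10, 2) for _ in range(500)]
best, cuts, items = discretize(sample)
print(best, [round(c, 2) for c in cuts], items[:5])
```

Working in probability space (through the fitted CDF) keeps the border rule well defined even for the outermost bins, whose widths on the original scale may be unbounded. A width-based rule on the original scale would be an equally reasonable design.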
