Toward a generic representation of random variables for machine learning

06/02/2015
by Gautier Marti, et al.

This paper presents a pre-processing step and a distance which improve the performance of machine learning algorithms working on independent and identically distributed stochastic processes. We introduce a novel non-parametric approach to represent random variables which splits apart dependency and distribution without losing any information. We also propound an associated metric leveraging this representation, together with its statistical estimate. Besides experiments on synthetic datasets, the benefits of our contribution are illustrated through the example of clustering financial time series, for instance prices from the credit default swaps market. Results are available on the website www.datagrapple.com and an IPython Notebook tutorial is available at www.datagrapple.com/Tech for reproducible research.
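As a rough illustration of what splitting dependency from distribution can look like on a finite sample, the sketch below separates a one-dimensional sample into its normalized ranks (which carry the ordering used to measure dependence) and its sorted values (an empirical quantile function describing the marginal distribution). This is only a minimal sketch under our own assumptions: the function names `split_dependence_distribution` and `reconstruct` are hypothetical, and the abstract does not specify the estimator the authors actually use. The point it shows is that the original sample can be rebuilt exactly from the two parts, so no information is lost.

```python
import numpy as np


def split_dependence_distribution(x):
    """Split a 1-D sample into (normalized ranks, sorted values).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Stable sort so tied values still receive distinct ranks.
    order = np.argsort(x, kind="stable")
    ranks = np.empty(n, dtype=int)
    ranks[order] = np.arange(n)
    u = (ranks + 1) / n       # rank part: empirical CDF values in (0, 1]
    quantiles = x[order]      # distribution part: empirical quantile function
    return u, quantiles


def reconstruct(u, quantiles):
    """Rebuild the original sample from the two parts."""
    idx = np.rint(np.asarray(u) * len(quantiles)).astype(int) - 1
    return np.asarray(quantiles)[idx]


rng = np.random.default_rng(0)
sample = rng.normal(size=8)
u, q = split_dependence_distribution(sample)

# The pair (u, q) determines the sample exactly.
assert np.array_equal(reconstruct(u, q), sample)
```

Under this split, a strictly increasing transformation of the sample changes only the sorted values and leaves the ranks untouched, which is why the two parts can be compared separately.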

research
04/28/2016

Optimal Transport vs. Fisher-Rao distance between Copulas for Clustering Multivariate Time Series

We present a methodology for clustering N objects which are described by...
research
03/13/2016

Clustering Financial Time Series: How Long is Enough?

Researchers have used from 30 days to several years of daily returns as ...
research
09/17/2022

Some stochastic comparison results for frailty and resilience models

Frailty and resilience models provide a way to introduce random effects ...
research
03/20/2017

Independence clustering (without a matrix)

The independence clustering problem is considered in the following formu...
research
05/17/2023

Time Series Clustering With Random Convolutional Kernels

Time series can describe a wide range of natural and social phenomena. A...
research
06/22/2023

Robust Statistical Comparison of Random Variables with Locally Varying Scale of Measurement

Spaces with locally varying scale of measurement, like multidimensional ...
