Redistributor: Transforming Empirical Data Distributions

10/25/2022
by Pavol Harar, et al.

We present an algorithm and package, Redistributor, which forces a collection of scalar samples to follow a desired distribution. When given independent and identically distributed samples of some random variable S and the continuous cumulative distribution function of some desired target T, it provably produces a consistent estimator of the transformation R which satisfies R(S)=T in distribution. As the distribution of S or T may be unknown, we also include algorithms for efficiently estimating these distributions from samples. This allows for various interesting use cases in image processing, where Redistributor serves as a remarkably simple and easy-to-use tool that is capable of producing visually appealing results. The package is implemented in Python and is optimized to efficiently handle large data sets, making it also suitable as a preprocessing step in machine learning. The source code is available at https://gitlab.com/paloha/redistributor.
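One standard way to realize a map with R(S)=T in distribution, for continuous distributions, is the composition R = F_T^{-1} ∘ F_S. You estimate the CDF of S from its samples and then pass the resulting probabilities through the quantile function of T. The following is a minimal NumPy sketch under stated assumptions, not the Redistributor package's implementation or API.

* The function names `fit_redistributor` and `empirical_ppf` are hypothetical.
* F_S is approximated by linearly interpolating the empirical CDF, which assumes continuous source data without ties.
* The targets are illustrative choices only: a known Exp(1) distribution, and Uniform(-1, 1) known only through samples.

```python
import numpy as np


def fit_redistributor(source_samples, target_ppf):
    """Hypothetical sketch of R = F_T^{-1} o F_S, with F_S estimated from samples.

    Assumption: F_S is approximated by linearly interpolating the empirical CDF
    through the sorted source samples (continuous data, no ties). This is an
    illustration, not the Redistributor package's code.
    """
    xs = np.sort(np.asarray(source_samples, dtype=float))
    n = xs.size
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1), so the
    # target quantile function never sees 0 or 1 (which may map to +/- inf).
    ps = (np.arange(n) + 0.5) / n

    def transform(x):
        # Estimate F_S(x); np.interp clips inputs outside [xs[0], xs[-1]].
        u = np.interp(x, xs, ps)
        # Map the estimated probabilities through the target quantile function.
        return target_ppf(u)

    return transform


def empirical_ppf(target_samples):
    """Hypothetical helper: estimate F_T^{-1} from samples when F_T is unknown."""
    ts = np.sort(np.asarray(target_samples, dtype=float))
    ps = (np.arange(ts.size) + 0.5) / ts.size
    # Linear interpolation of the empirical quantile function; clips at the ends.
    return lambda u: np.interp(u, ps, ts)


rng = np.random.default_rng(0)
source = rng.normal(loc=5.0, scale=2.0, size=10_000)  # samples of S

# Known target: Exp(1), whose quantile function is -log(1 - u).
R = fit_redistributor(source, lambda u: -np.log1p(-u))
out = R(source)
print(out.mean(), np.median(out))  # approx. 1 and log(2) ~ 0.693

# Unknown target: only samples of T are available (here Uniform(-1, 1)).
target = rng.uniform(-1.0, 1.0, size=5_000)
R_emp = fit_redistributor(source, empirical_ppf(target))
out_emp = R_emp(source)
```

Both estimated functions are monotone non-decreasing, so the sketched map preserves the ordering of the samples. Applied to pixel intensities, it changes the histogram without reordering brightness levels.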
