Scalable Multivariate Histograms
We give a distributed variant of an adaptive histogram estimation procedure previously developed by the first author. The procedure is based on regular pavings and is known to have numerous appealing statistical and arithmetical properties. The distributed version makes it possible to process data sets significantly bigger than previously. We provide prototype implementation under a permissive license.
READ FULL TEXT