1 The construction on the cube
1.1 Setup
We are given data points on the binary hypercube, written as a binary matrix. Our goal is to decompose this matrix as a tree of subcubes and “subcube corrections”. A subcube of a given dimension is determined by a point of the hypercube (its base point), along with a set of restricted indices. The subcube consists of the points that agree with the base point on every restricted index.
The unrestricted indices can take on either value.
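The definition can be made concrete with a small sketch. The names `center`, `restricted`, `in_subcube`, `hamming_to_subcube` and `project_to_subcube` are illustrative assumptions, not notation from the text; points are stored as 0/1 NumPy vectors.

```python
import numpy as np

def in_subcube(x, center, restricted):
    """True if x agrees with center on every restricted index."""
    idx = np.asarray(restricted, dtype=int)
    return bool(np.all(np.asarray(x)[idx] == np.asarray(center)[idx]))

def hamming_to_subcube(x, center, restricted):
    """Number of restricted coordinates where x disagrees with center."""
    idx = np.asarray(restricted, dtype=int)
    return int(np.sum(np.asarray(x)[idx] != np.asarray(center)[idx]))

def project_to_subcube(x, center, restricted):
    """Closest point of the subcube in Hamming distance: copy the restricted
    entries from center and leave the unrestricted entries untouched."""
    y = np.array(x, copy=True)
    idx = np.asarray(restricted, dtype=int)
    y[idx] = np.asarray(center)[idx]
    return y

# A 2-dimensional subcube of the 5-dimensional cube: 3 restricted indices.
center = np.array([1, 0, 1, 1, 0])
restricted = [0, 2, 3]
x = np.array([1, 1, 1, 0, 1])

inside = in_subcube(x, center, restricted)
dist = hamming_to_subcube(x, center, restricted)
proj = project_to_subcube(x, center, restricted)
```

Here `x` disagrees with `center` only at index 3, so it lies at Hamming distance 1 from the subcube, and its projection differs from `x` in that single entry.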
1.2 The construction
Here I describe a simple version of the construction, in which every node in the tree corresponds to a subcube of the same dimension and a hard binary clustering is used at each stage. For a tree of fixed depth, the construction consists of

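A tree-structured clustering of the data into sets at each depth (scale), such that the sets at each depth together cover the data,

and cluster representatives (that is, subcubes of the fixed dimension), one for each set, such that the restricted sets are nested along the tree: if one node is an ancestor of another, the ancestor's restricted set is contained in the descendant's, and the two representatives agree on every index in the ancestor's restricted set.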
Here each representative is a vector in the binary cube; the complete set of representatives roughly corresponds to its counterpart from before. However, note that each representative has only a fixed number of entries that actually matter; moreover, because of the nested equalities, the leaf nodes carry all the information on the branch. This is not to say that the tree structure is unimportant or unused: it is used, as the leaf nodes have to share coordinates. However, once the full construction is specified, the leaf representatives are all that is necessary to code a data point.
1.3 Algorithms
We can build the partitions and representatives starting from the root and descending the tree as follows. First, find the best-fit subcube of the chosen dimension for the whole data set. Its base point is the coordinatewise mode, and its free coordinates are the ones with the largest average discrepancy from their modes. Remove the fixed coordinates from consideration. Cluster the reduced data using k-means with k = 2, and on each cluster find the best-fit cube. Continue to the leaves.
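The top-down pass can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: data points are the rows of a 0/1 array `X`, each node fixes a constant number `n_fix` of coordinates (one reading of "same dimension"), and the split is a plain Lloyd iteration with two centers started from two random data points. All function and parameter names are hypothetical.

```python
import numpy as np

def best_fit_fix(X, active, n_fix):
    """Among the active coordinates, fix the n_fix that deviate least from
    their mode. Returns (fixed indices, their mode values, free indices)."""
    means = X[:, active].mean(axis=0)
    mode = (means >= 0.5).astype(X.dtype)
    # Fraction of points disagreeing with the mode on each coordinate.
    discrepancy = np.minimum(means, 1.0 - means)
    order = np.argsort(discrepancy, kind="stable")
    fixed = active[order[:n_fix]]
    free = np.sort(active[order[n_fix:]])
    return fixed, mode[order[:n_fix]], free

def two_means(X, rng, n_iter=20):
    """Plain Lloyd iteration with two centers; returns a boolean label per row."""
    centers = X[rng.choice(len(X), size=2, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=bool)
    for _ in range(n_iter):
        d0 = ((X - centers[0]) ** 2).sum(axis=1)
        d1 = ((X - centers[1]) ** 2).sum(axis=1)
        new = d1 < d0
        if new.all() or not new.any():
            break  # degenerate split: keep the previous labels
        labels = new
        centers = np.stack([X[~labels].mean(axis=0), X[labels].mean(axis=0)])
    return labels

def build_tree(X, rows, active, n_fix, depth, rng):
    """Fit a cube on the current rows, drop its fixed coordinates, split in two."""
    fixed, values, free = best_fit_fix(X[rows], active, n_fix)
    node = {"rows": rows, "fixed": fixed, "values": values, "children": []}
    if depth > 0 and len(rows) >= 2 and len(free) > 0:
        labels = two_means(X[np.ix_(rows, free)], rng)
        if labels.any() and not labels.all():
            node["children"] = [
                build_tree(X, rows[~labels], free, n_fix, depth - 1, rng),
                build_tree(X, rows[labels], free, n_fix, depth - 1, rng),
            ]
    return node

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 16))  # synthetic data, for illustration only
root = build_tree(X, np.arange(200), np.arange(16), n_fix=4, depth=2, rng=rng)
```

Because each child only sees the coordinates its parent left free, the restricted sets are nested along every branch by construction.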
1.3.1 Refinement
The partition and the cluster representatives can be updated with a Lloyd-type alternation. With the partition fixed, loop through the representatives from the root of the tree, finding the best subcubes at each scale for the current partition. Then update the partition so that each data point is sent to its best-fit leaf cube.
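A simplified sketch of this alternation, restricted to the leaf cubes, is given below. It makes two assumptions that go beyond the text: the restricted sets (stored as boolean `masks`) are held fixed during refinement, and the coordinate sharing between leaves imposed by the tree is ignored. The names `cube_distances`, `refit_centers`, `refine` and `total_cost` are hypothetical.

```python
import numpy as np

def cube_distances(X, centers, masks):
    """Hamming distance from each row of X to each cube: the number of
    restricted coordinates (mask True) where the row disagrees with the center."""
    diff = X[:, None, :] != centers[None, :, :]      # shape (n, k, d)
    return (diff & masks[None, :, :]).sum(axis=2)    # shape (n, k)

def refit_centers(X, labels, centers):
    """Coordinatewise mode of each cluster; empty clusters keep their center."""
    new = centers.copy()
    for j in range(len(centers)):
        members = X[labels == j]
        if len(members):
            new[j] = (members.mean(axis=0) >= 0.5).astype(centers.dtype)
    return new

def total_cost(X, labels, centers, masks):
    """Sum over points of the distance to the assigned cube."""
    return int(cube_distances(X, centers, masks)[np.arange(len(X)), labels].sum())

def refine(X, centers, masks, n_iter=10):
    """Alternate between refitting the cubes and reassigning the points."""
    labels = cube_distances(X, centers, masks).argmin(axis=1)
    for _ in range(n_iter):
        centers = refit_centers(X, labels, centers)
        new_labels = cube_distances(X, centers, masks).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels, centers

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(60, 8))          # synthetic data
centers0 = rng.integers(0, 2, size=(3, 8))    # arbitrary starting cubes
masks = rng.random((3, 8)) < 0.6              # True marks a restricted index

labels0 = cube_distances(X, centers0, masks).argmin(axis=1)
cost_before = total_cost(X, labels0, centers0, masks)
labels, centers = refine(X, centers0, masks)
cost_after = total_cost(X, labels, centers, masks)
```

With the masks fixed, neither step can increase the total distance, so `cost_after` is never larger than `cost_before`. A version faithful to the tree would also re-select the restricted sets level by level from the root.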
1.3.2 Adaptive model parameters
In [1], one of the important points is that many of the model parameters, including the number of clusters, can be determined in a principled way. While some of that analysis may carry over to this setting, this has not yet been done. However, instead of fixing the subcube dimension, we can fix a percentage of the energy to be kept at each level and choose the number of free coordinates accordingly.
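One way to read this rule is sketched below. The interpretation is an assumption: "energy" is taken to be the total discrepancy of the coordinates from their modes, and the free coordinates are the smallest set of highest-discrepancy coordinates that accounts for at least a fraction `keep` of it. The function name and the parameter `keep` are hypothetical.

```python
import numpy as np

def choose_free_coordinates(X, keep=0.9):
    """Return the indices of the free coordinates, most variable first."""
    means = X.mean(axis=0)
    # Fraction of points disagreeing with the mode on each coordinate.
    discrepancy = np.minimum(means, 1.0 - means)
    order = np.argsort(-discrepancy, kind="stable")
    total = discrepancy.sum()
    if total == 0:
        return order[:0]  # all points identical: nothing needs to stay free
    cumulative = np.cumsum(discrepancy[order]) / total
    # First position where the kept fraction reaches the target.
    q = int(np.searchsorted(cumulative, keep)) + 1
    return order[:q]

# Columns 0 and 1 are constant, column 2 is half ones, column 3 a quarter ones.
X = np.array([
    [0, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
])
free_60 = choose_free_coordinates(X, keep=0.6)
free_90 = choose_free_coordinates(X, keep=0.9)
```

In this toy example column 2 alone carries two thirds of the discrepancy, so it suffices for `keep=0.6`, while `keep=0.9` also requires column 3.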
2 Experiments
We binarize the MNIST training data by thresholding to obtain a binary data matrix. We then replace a fraction of the entries with noise sampled uniformly from the two binary values, and train a tree-structured cube dictionary of fixed depth. The subdivision scheme used to generate the multiscale clustering is k-means initialized via randomized farthest insertion [2]. Because the initialization is randomized, we can cycle spin over the dictionaries [5] to get many different reconstructions to average over. In this experiment the reconstruction was performed 50 times for the noise realization. The results are visualized below.
References
[1] W. Allard, G. Chen, and M. Maggioni. Multiscale geometric methods for data sets II: Geometric multiresolution analysis. To appear in Applied and Computational Harmonic Analysis.
[2] David Arthur and Sergei Vassilvitskii. k-means++: the advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, SODA ’07, pages 1027–1035, Philadelphia, PA, USA, 2007. Society for Industrial and Applied Mathematics.
[3] Richard G. Baraniuk, Volkan Cevher, Marco F. Duarte, and Chinmay Hegde. Model-based compressive sensing. Dec 2009.
[4] Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? J. ACM, 58(3):11, 2011.
[5] R. R. Coifman and D. L. Donoho. Translation-invariant de-noising. Technical report, Department of Statistics, 1995.
[6] G. David and S. Semmes. Singular integrals and rectifiable sets in R^n: au-delà des graphes Lipschitziens. Astérisque, 193:1–145, 1991.
[7] Laurent Jacob, Guillaume Obozinski, and Jean-Philippe Vert. Group lasso with overlap and graph lasso. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, pages 433–440, New York, NY, USA, 2009. ACM.
[8] R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for sparse hierarchical dictionary learning. In International Conference on Machine Learning (ICML), 2010.
[9] P. W. Jones. Rectifiable sets and the traveling salesman problem. Invent. Math., 102(1):1–15, 1990.
[10] Seyoung Kim and Eric P. Xing. Tree-guided group lasso for multi-task regression with structured sparsity. In ICML, pages 543–550, 2010.
[11] Gilad Lerman. Quantifying curve-like structures of measures by using Jones quantities. Comm. Pure Appl. Math., 56(9):1294–1365, 2003.