What is the dimension of your binary data?

02/04/2019
by Nikolaj Tatti, et al.

Many 0/1 datasets have a very large number of variables; on the other hand, they are sparse, and the dependency structure of the variables is simpler than the number of variables would suggest. Defining the effective dimensionality of such a dataset is a nontrivial problem. We consider the problem of defining a robust measure of dimension for 0/1 datasets, and show that the basic idea of fractal dimension can be adapted for binary data. On its own, however, the fractal dimension is difficult to interpret, so we introduce the concept of normalized fractal dimension. For a dataset D, its normalized fractal dimension is the number of columns in a dataset D' that has independent columns and the same (unnormalized) fractal dimension as D. The normalized fractal dimension measures the degree of dependency structure of the data. We study the properties of the normalized fractal dimension and discuss its computation. We give empirical results on the normalized fractal dimension, comparing it against baseline measures such as PCA. We also study the relationship between the dimension of the whole dataset and the dimensions of subgroups formed by clustering. The results indicate interesting differences between and within datasets.
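To make the definition above concrete, here is a minimal Python sketch of one way it could be computed. It is an illustration under stated assumptions, not the authors' procedure. The abstract does not say which fractal dimension is used or how D' is constructed, so the sketch assumes a correlation-dimension-style estimate (the slope of log P(Z <= r) against log r, where Z is the Hamming distance between two randomly chosen rows), fits that slope between the quartiles of the sampled distances, and builds D' from independent columns that all share the overall density of D. The function names, the quartile range, and the simulation-based search over column counts are all hypothetical choices.

```python
import numpy as np


def correlation_dimension(data, rng, n_pairs=20000, q_low=0.25, q_high=0.75):
    """Slope of log P(Z <= r) vs. log r, with Z the Hamming distance of two random rows.

    Assumption: the slope is fitted over integer radii between two quantiles
    of the sampled distances. Returns NaN if fewer than two radii are available.
    """
    n = data.shape[0]
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    # Hamming distance between the sampled row pairs, sorted for the empirical CDF.
    dist = np.sort((data[i] != data[j]).sum(axis=1))
    lo = max(1, int(np.quantile(dist, q_low)))
    hi = int(np.quantile(dist, q_high))
    radii = np.arange(lo, hi + 1)
    if radii.size < 2:
        return float("nan")
    # Empirical P(Z <= r) for each radius.
    prob = np.searchsorted(dist, radii, side="right") / n_pairs
    slope, _intercept = np.polyfit(np.log(radii), np.log(prob), 1)
    return float(slope)


def normalized_dimension(data, rng, max_k=None):
    """Column count K of an independent dataset whose dimension is closest to that of `data`.

    Assumption: every column of the independent dataset D' is 1 with the
    overall density of `data`; K is found by simulating D' for each candidate.
    """
    n_rows, n_cols = data.shape
    max_k = n_cols if max_k is None else max_k
    target = correlation_dimension(data, rng)
    density = data.mean()
    best_k, best_gap = None, np.inf
    for k in range(2, max_k + 1):
        synthetic = rng.random((n_rows, k)) < density
        dim = correlation_dimension(synthetic, rng)
        if np.isnan(dim):
            continue  # too few distinct radii to fit a slope
        gap = abs(dim - target)
        if gap < best_gap:
            best_k, best_gap = k, gap
    return best_k
```

Under these assumptions, a dataset whose columns are exact copies of a few independent columns should receive a normalized dimension well below its raw column count, which is the kind of dependency structure the measure is meant to expose.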


