A Geometric Statistic for Quantifying Correlation Between Tree-Shaped Datasets
The magnitude of Pearson correlation between two scalar random variables can be visually judged from the two-dimensional scatter plot of an independent and identically distributed sample drawn from the joint distribution of the two variables: the closer the points lie to a straight slanting line, the greater the correlation. To the best of our knowledge, similar graphical representation or geometric quantification of tree correlation does not exist in the literature although tree-shaped datasets are frequently encountered in various fields, such as academic genealogy tree and embryonic development tree. In this paper, we introduce a geometric statistic to both represent tree correlation intuitively and quantify its magnitude precisely. The theoretical properties of the geometric statistic are provided. Large-scale simulations based on various data distributions demonstrate that the geometric statistic is precise in measuring the tree correlation. Its real application on mathematical genealogy trees also demonstrated its usefulness.
READ FULL TEXT