Computing Approximate Statistical Discrepancy
Consider a geometric range space (X, 𝒜) where each data point x ∈ X has two or more values (say r(x) and b(x)). Also consider a function Φ(A), defined for any range A ∈ (X, 𝒜), that depends on the sums of the values in that range, e.g., r_A = ∑_{x ∈ A} r(x) and b_A = ∑_{x ∈ A} b(x). The Φ-maximum range is A^* = argmax_{A ∈ (X, 𝒜)} Φ(A). Our goal is to find some Â such that |Φ(Â) - Φ(A^*)| ≤ ε. We develop algorithms for this problem for range spaces with bounded VC-dimension, as well as significant improvements for those defined by balls, halfspaces, and axis-aligned rectangles. This problem has applications in many areas, including discrepancy evaluation, classification, and spatial scan statistics.
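To make the objective concrete, the following is a minimal brute-force sketch for axis-aligned rectangles. It is not the approximation algorithm developed in the paper, only an exact, naive baseline. The choice Φ(A) = |r_A/R - b_A/B|, with R and B the totals of r and b over X, is an assumption made for illustration, as are the names phi and max_rectangle and the input layout (x, y, r, b).

```python
from itertools import combinations_with_replacement


def phi(r_frac, b_frac):
    # Assumed (hypothetical) choice of Phi: absolute difference of the
    # two normalized sums. The abstract does not fix a specific Phi.
    return abs(r_frac - b_frac)


def max_rectangle(points):
    """Exhaustively search axis-aligned rectangles for the Phi-maximum range.

    points: list of (x, y, r, b) tuples; the totals of r and b must be nonzero.
    Returns (best_phi, (x_lo, x_hi, y_lo, y_hi)).
    """
    R = sum(p[2] for p in points)
    B = sum(p[3] for p in points)
    # Only rectangles whose sides pass through data coordinates need checking.
    xs = sorted({p[0] for p in points})
    ys = sorted({p[1] for p in points})

    best = (-1.0, None)
    for x_lo, x_hi in combinations_with_replacement(xs, 2):
        for y_lo, y_hi in combinations_with_replacement(ys, 2):
            inside = [
                p for p in points
                if x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi
            ]
            r_A = sum(p[2] for p in inside)  # r_A = sum of r(x) over the range
            b_A = sum(p[3] for p in inside)  # b_A = sum of b(x) over the range
            val = phi(r_A / R, b_A / B)
            if val > best[0]:
                best = (val, (x_lo, x_hi, y_lo, y_hi))
    return best


if __name__ == "__main__":
    pts = [(0, 0, 1, 0), (1, 1, 1, 0), (5, 5, 0, 1), (6, 6, 0, 1)]
    print(max_rectangle(pts))
```

This exhaustive search examines on the order of n^4 candidate rectangles and rescans all points for each one, which is why approximate methods with an additive error ε are of interest for larger inputs.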