Star Discrepancy Subset Selection: Problem Formulation and Efficient Approaches for Low Dimensions

01/19/2021
by Carola Doerr, et al.

Motivated by applications in instance selection, we introduce the star discrepancy subset selection problem, which consists of finding a subset of m out of n points that minimizes the star discrepancy. We introduce two mixed-integer linear programming (MILP) formulations and a combinatorial branch-and-bound (BB) algorithm for this problem, and we evaluate our approaches against random subset selection and a greedy construction on different use cases in dimensions two and three. Our results show that, in dimension two and provided n is not too large, one of the MILPs is efficient for large m/n ratios and BB is efficient for small m/n ratios. However, the performance of both approaches decays strongly for larger dimensions and set sizes. As a side effect of our empirical comparisons, we obtain point sets whose discrepancy values are much smaller than those of common low-discrepancy sequences, random point sets, and Latin Hypercube Sampling. This suggests that subset selection could be an interesting approach for generating point sets of small discrepancy value.
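To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the MILP or branch-and-bound method of the paper. It assumes the points lie in [0,1)^d and uses the standard characterization of the L_∞ star discrepancy as a maximum over the grid spanned by the point coordinates and 1. The function names `star_discrepancy` and `best_subset` are hypothetical, chosen for illustration.

```python
from itertools import combinations, product


def star_discrepancy(points):
    """Exact L_inf star discrepancy of points in [0,1)^d (brute force).

    Evaluates the local discrepancy at every corner q of the grid built
    from the point coordinates plus 1.0, using both the open box [0,q)
    and the closed box [0,q]. Cost grows like n^(d+1), so this is only
    practical for small n and d.
    """
    n = len(points)
    d = len(points[0])
    # Candidate coordinates per dimension: all point coordinates and 1.0.
    grids = [sorted({p[j] for p in points} | {1.0}) for j in range(d)]
    worst = 0.0
    for q in product(*grids):
        vol = 1.0
        for c in q:
            vol *= c
        open_count = sum(all(p[j] < q[j] for j in range(d)) for p in points)
        closed_count = sum(all(p[j] <= q[j] for j in range(d)) for p in points)
        # Box too empty (open box) or too full (closed box).
        worst = max(worst, vol - open_count / n, closed_count / n - vol)
    return worst


def best_subset(points, m):
    """Enumerate all m-subsets and return one with minimal star discrepancy."""
    return min(combinations(points, m), key=star_discrepancy)


if __name__ == "__main__":
    pts = [(0.5, 0.5), (0.9, 0.9), (0.1, 0.1)]
    print(star_discrepancy([(0.5, 0.5)]))  # 0.75
    print(best_subset(pts, 1))
```

For the single point (0.5, 0.5), the worst box is the closed box ending at the point itself, which holds all of the points but has volume 0.25, giving a discrepancy of 0.75. Exhaustive enumeration over all subsets is only feasible for very small n, which is precisely why exact formulations and heuristics are of interest for this problem.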

research
06/27/2023

Heuristic Approaches to Obtain Low-Discrepancy Point Sets via Subset Selection

Building upon the exact methods presented in our earlier work [J. Comple...
research
04/07/2013

Constructing Low Star Discrepancy Point Sets with Genetic Algorithms

Geometric discrepancies are standard measures to quantify the irregulari...
research
06/29/2023

Computing Star Discrepancies with Numerical Black-Box Optimization Algorithms

The L_∞ star discrepancy is a measure for the regularity of a finite set...
research
01/16/2019

Set-Codes with Small Intersections and Small Discrepancies

We are concerned with the problem of designing large families of subsets...
research
04/30/2018

Computing Approximate Statistical Discrepancy

Consider a geometric range space (X, A) where each data point x ∈ X has ...
research
02/27/2023

Generator Matrices by Solving Integer Linear Programs

In quasi-Monte Carlo methods, generating high-dimensional low discrepanc...
research
12/06/2018

Comparative Document Summarisation via Classification

This paper considers extractive summarisation in a comparative setting: ...
