Learning-Theoretic Foundations of Algorithm Configuration for Combinatorial Partitioning Problems

11/14/2016
by Maria-Florina Balcan, et al.

Max-cut, clustering, and many other partitioning problems of significant importance to machine learning and other scientific fields are NP-hard, a reality that has motivated researchers to develop a wealth of approximation algorithms and heuristics. Although the best algorithm to use typically depends on the specific application domain, worst-case analysis is often used to compare algorithms. This can be misleading when worst-case instances occur infrequently, and thus there is a demand for optimization methods that return the algorithm configuration best suited to a given application's typical inputs. We address this problem for clustering, max-cut, and other partitioning problems, such as integer quadratic programming, by designing computationally efficient and sample-efficient learning algorithms that receive samples from an application-specific distribution over problem instances and learn a partitioning algorithm with high expected performance. Our algorithms learn over common integer quadratic programming and clustering algorithm families: SDP rounding algorithms and agglomerative clustering algorithms with dynamic programming. For our sample complexity analysis, we provide tight bounds on the pseudo-dimension of these algorithm classes, and show that, surprisingly, even for classes of algorithms parameterized by a single parameter, the pseudo-dimension is superconstant. In this way, our work both contributes to the foundations of algorithm configuration and pushes the boundaries of learning theory, since the algorithm classes we analyze consist of multi-stage optimization procedures and are significantly more complex than classes typically studied in learning theory.
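To make the setup concrete, here is a minimal sketch of the learning pipeline the abstract describes, specialized to one parameterized clustering family. The sketch assumes a one-parameter linkage rule that interpolates between single and complete linkage (a simplified stand-in for the agglomerative families analyzed in the paper), scores each candidate parameter by the resulting k-means cost on sampled instances, and returns the empirically best parameter from a grid. The function names (`linkage`, `agglomerate`, `learn_alpha`) and the grid search are illustrative choices, not the paper's actual algorithm.

```python
import itertools
import math
import random

def linkage(A, B, dist, alpha):
    """Interpolated linkage between clusters A and B (lists of point indices):
    alpha * single-linkage (min) + (1 - alpha) * complete-linkage (max)."""
    ds = [dist[a][b] for a in A for b in B]
    return alpha * min(ds) + (1 - alpha) * max(ds)

def agglomerate(points, k, alpha):
    """Greedy agglomerative clustering: repeatedly merge the pair of clusters
    with the smallest interpolated linkage until k clusters remain."""
    dist = [[math.dist(p, q) for q in points] for p in points]
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        i, j = min(itertools.combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]],
                                          dist, alpha))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def kmeans_cost(points, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    cost = 0.0
    dim = len(points[0])
    for c in clusters:
        centroid = [sum(points[i][d] for i in c) / len(c) for d in range(dim)]
        cost += sum(math.dist(points[i], centroid) ** 2 for i in c)
    return cost

def learn_alpha(instances, k, grid):
    """Empirical risk minimization over a parameter grid: pick the alpha whose
    induced clustering algorithm has the lowest total cost on the sample."""
    return min(grid, key=lambda a: sum(kmeans_cost(pts, agglomerate(pts, k, a))
                                       for pts in instances))

if __name__ == "__main__":
    random.seed(0)
    # Stand-in for draws from an application-specific instance distribution.
    instances = [[(random.random(), random.random()) for _ in range(12)]
                 for _ in range(3)]
    best = learn_alpha(instances, k=3, grid=[0.0, 0.25, 0.5, 0.75, 1.0])
    print("learned alpha:", best)
```

The paper's contribution is the sample complexity side of this picture: bounding how many sampled instances suffice so that the empirically best parameter generalizes to the whole distribution, via pseudo-dimension bounds on the (surprisingly complex) class of cost functions induced by the parameter.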


