Properly Learning Decision Trees with Queries Is NP-Hard

by Caleb Koch, et al.

We prove that it is NP-hard to properly PAC learn decision trees with queries, resolving a longstanding open problem in learning theory (Bshouty 1993; Guijarro-Lavin-Raghavan 1999; Mehta-Raghavan 2002; Feldman 2016). While a long line of work, dating back to (Pitt-Valiant 1988), has established the hardness of properly learning decision trees from random examples, the more challenging setting of query learners necessitates different techniques, and there were no previous lower bounds. En route to our main result, we simplify and strengthen the best known lower bounds for the different problem of Decision Tree Minimization (Zantema-Bodlaender 2000; Sieling 2003). On a technical level, we introduce the notion of hardness distillation, which we study for decision tree complexity but which can be considered for any complexity measure: for a function that requires large decision trees, we give a general method for identifying a small set of inputs that is responsible for its complexity. Our technique even rules out query learners that are allowed constant error. This contrasts with existing lower bounds for the setting of random examples, which hold only for inverse-polynomial error. Our result, taken together with a recent almost-polynomial-time query algorithm for properly learning decision trees under the uniform distribution (Blanc-Lange-Qiao-Tan 2022), demonstrates the dramatic impact of distributional assumptions on the problem.
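
To make the complexity measure behind Decision Tree Minimization concrete, the following is a minimal sketch that computes the fewest leaves of any decision tree for a tiny Boolean function by exhaustive search. It is an illustration only, not the paper's method: the function names (`min_tree_size`, `parity3`, `and3`) are hypothetical, size is assumed to mean the number of leaves, and the search takes exponential time, so it is usable only for very small n.

```python
from functools import lru_cache
from itertools import product


def min_tree_size(f, n):
    """Fewest leaves of any decision tree computing f on {0,1}^n (brute force)."""

    @lru_cache(maxsize=None)
    def solve(restriction):
        # restriction: tuple with entries 0, 1, or None (None = still free).
        free = [i for i, v in enumerate(restriction) if v is None]

        # Collect every value f takes on inputs consistent with the restriction.
        values = set()
        for bits in product((0, 1), repeat=len(free)):
            x = list(restriction)
            for i, b in zip(free, bits):
                x[i] = b
            values.add(f(tuple(x)))

        # A constant restricted function needs a single leaf.
        if len(values) == 1:
            return 1

        # Otherwise try each free variable at the root and keep the best split.
        best = float("inf")
        for i in free:
            lo = restriction[:i] + (0,) + restriction[i + 1:]
            hi = restriction[:i] + (1,) + restriction[i + 1:]
            best = min(best, solve(lo) + solve(hi))
        return best

    return solve((None,) * n)


# Hypothetical toy targets: 3-bit parity and 3-bit AND.
parity3 = lambda x: x[0] ^ x[1] ^ x[2]
and3 = lambda x: x[0] & x[1] & x[2]

print(min_tree_size(parity3, 3))
print(min_tree_size(and3, 3))
```

Parity is a standard example of a function that requires a large tree (every input bit must be read on every path), whereas AND admits a small one; the abstract's hardness distillation concerns identifying a small set of inputs responsible for such complexity.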



Properly learning decision trees in almost polynomial time

We give an n^O(loglog n)-time membership query algorithm for properly an...

Superpolynomial Lower Bounds for Decision Tree Learning and Testing

We establish new hardness results for decision tree optimization problem...

A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds

Decision trees are important both as interpretable models amenable to hi...

Query Minimization under Stochastic Uncertainty

We study problems with stochastic uncertainty information on intervals f...

Query strategies for priced information, revisited

We consider the problem of designing query strategies for priced informa...

Superpolynomial Lower Bounds for Learning Monotone Classes

Koch, Strassle, and Tan [SODA 2023] show that, under the randomized exp...

Fourier Growth of Parity Decision Trees

We prove that for every parity decision tree of depth d on n variables, ...
