Exact high-dimensional asymptotics for support vector machine
The support vector machine (SVM) is one of the most widely used classification methods. In this paper, we consider the soft-margin support vector machine applied to data points with independent features, where the sample size n and the feature dimension p grow to ∞ in a fixed ratio p/n→δ. We propose a set of equations that exactly characterizes the asymptotic behavior of the support vector machine. In particular, we give exact formulas for (1) the variability of the optimal coefficients, (2) the proportion of data points lying on the margin boundary (i.e., the number of support vectors), (3) the final objective function value, and (4) the expected misclassification error on new data points; the last of these in particular yields an exact formula for the optimal tuning parameter given a data-generating mechanism. The global null case is considered first, where the label y∈{+1,-1} is independent of the feature x. Then the signaled case is considered, where the label y∈{+1,-1} is allowed to have a general dependence on the feature x through a linear combination a_0^T x. These results for the non-smooth hinge loss serve as an analogue of the recent results of Sur and Candès (2018) for the smooth logistic loss. Our approach is based on heuristic leave-one-out calculations.
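The following is a minimal simulation sketch, not part of the paper, of the regime the abstract describes: a soft-margin SVM fit to n points with p independent Gaussian features and p/n→δ, tracking two of the quantities the paper characterizes, the proportion of support vectors and the misclassification error on fresh data. The aspect ratio delta, the tuning parameter C, the signal direction a0, and the label model are illustrative assumptions, not the paper's settings.

```python
# Illustrative sketch only: empirically probes the high-dimensional SVM
# regime (p/n -> delta) described in the abstract. All parameter choices
# (delta, C, signal strength) are assumptions for demonstration.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n, delta = 2000, 0.5           # sample size and aspect ratio p/n
p = int(delta * n)

a0 = rng.standard_normal(p)
a0 /= np.linalg.norm(a0)       # unit-norm signal direction (assumed)

def draw(n_samples, signal=True):
    """Features with i.i.d. N(0,1) entries; labels either depend on x
    through a0^T x (signaled case) or are independent of x (global null)."""
    X = rng.standard_normal((n_samples, p))
    if signal:
        y = np.sign(X @ a0 + rng.standard_normal(n_samples))
    else:
        y = rng.choice([-1.0, 1.0], size=n_samples)
    return X, y

# Soft-margin SVM: LinearSVC with hinge loss minimizes
# 0.5*||w||^2 + C * sum_i max(0, 1 - y_i w^T x_i).
X, y = draw(n, signal=True)
svm = LinearSVC(C=1.0, loss="hinge", fit_intercept=False,
                dual=True, max_iter=100_000)
svm.fit(X, y)
w = svm.coef_.ravel()

# Points with margin y_i * w^T x_i <= 1 are the support vectors.
margins = y * (X @ w)
prop_sv = np.mean(margins <= 1.0 + 1e-6)

# Misclassification error on fresh data from the same mechanism.
X_new, y_new = draw(10_000, signal=True)
err = np.mean(np.sign(X_new @ w) != y_new)

print(f"p/n = {p/n:.2f}, proportion of support vectors = {prop_sv:.3f}, "
      f"test error = {err:.3f}")
```

Sweeping C in such a simulation and comparing the empirical error curve against the paper's asymptotic formulas is one way its prediction of the optimal tuning parameter could be checked numerically.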