Learning Rates for Nonconvex Pairwise Learning
Pairwise learning is receiving increasing attention since it covers many important machine learning tasks, e.g., metric learning, AUC maximization, and ranking. Investigating the generalization behavior of pairwise learning is thus of significance. However, existing generalization analysis mainly focuses on convex objective functions, leaving nonconvex learning far less explored. Moreover, the learning rates currently derived for the generalization performance of pairwise learning are mostly of slow order. Motivated by these problems, we study the generalization performance of nonconvex pairwise learning and provide improved learning rates. Specifically, we develop several uniform convergence results for the gradients of pairwise objectives under different assumptions, based on which we analyze the empirical risk minimizer, gradient descent, and stochastic gradient descent for pairwise learning. We first establish learning rates for these algorithms in a general nonconvex setting, where the analysis sheds light on the trade-off between optimization and generalization and on the role of early stopping. We then investigate the generalization performance of nonconvex learning under a gradient dominance curvature condition. In this setting, we derive faster learning rates of order 𝒪(1/n), where n is the sample size. Provided that the optimal population risk is small, we further improve the learning rates to 𝒪(1/n^2), which, to the best of our knowledge, are the first 𝒪(1/n^2)-type rates for pairwise learning, whether convex or nonconvex. Overall, we provide a systematic analysis of the generalization performance of nonconvex pairwise learning.
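As a minimal sketch of the standard setup (the notation f, w, z_i, and μ below follows the usual pairwise-learning conventions and is an assumption, not taken from the paper), the population and empirical pairwise risks over a loss f(w; z, z') are

    % population risk: expectation over an independent pair of examples
    R(w) = \mathbb{E}_{z, z'}\big[ f(w; z, z') \big],
    \qquad
    % empirical risk: average over all ordered pairs i \neq j from the sample
    R_S(w) = \frac{1}{n(n-1)} \sum_{i \neq j} f(w; z_i, z_j),

and the gradient dominance (Polyak–Łojasiewicz) curvature condition with parameter \mu > 0, the setting in which the faster 𝒪(1/n) and 𝒪(1/n^2) rates are stated, reads

    R(w) - \inf_{w'} R(w') \le \frac{1}{2\mu} \, \big\| \nabla R(w) \big\|^2 .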