Analysis of nonsmooth stochastic approximation: the differential inclusion approach
In this paper we address the convergence of stochastic approximation when the functions to be minimized are nonconvex and nonsmooth. We show that the "mean-limit" approach to convergence, which for smooth problems leads to the ODE approach, can be adapted to the nonsmooth case. Under appropriate assumptions, the limiting dynamical system can be shown to be a differential inclusion. Our results extend earlier work in this direction by Benaim et al. (2005) and provide a general framework for proving convergence of unconstrained and constrained stochastic approximation problems, with either explicit or implicit updates. In particular, our results allow us to establish the convergence of the stochastic subgradient and proximal stochastic gradient descent algorithms that arise in a large class of deep learning problems and in high-dimensional statistical inference with sparsity-inducing penalties.
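As an illustration of the kind of iteration covered, the following is a minimal sketch of proximal stochastic gradient descent for a least-squares loss with an L1 (sparsity-inducing) penalty. It is not taken from the paper: the function names `soft_threshold` and `prox_sgd_lasso`, the step-size schedule 1/(n + 10), the penalty weight `lam`, and the synthetic data are all assumptions chosen for the example. The proximal step is the implicit part of the update, while the gradient step on the sampled loss is explicit.

```python
import random


def soft_threshold(v, tau):
    """Proximal operator of tau * |.| applied to a scalar v."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def prox_sgd_lasso(data, dim, lam=0.1, n_steps=5000, seed=0):
    """Proximal SGD for 0.5 * E[(a.x - b)^2] + lam * ||x||_1 (illustrative only)."""
    rng = random.Random(seed)
    x = [0.0] * dim
    for n in range(1, n_steps + 1):
        a, b = data[rng.randrange(len(data))]  # draw one sample at random
        gamma = 1.0 / (n + 10)                 # assumed decreasing step size
        residual = sum(ai * xi for ai, xi in zip(a, x)) - b
        # Explicit gradient step on the sampled smooth loss,
        # followed by the proximal (implicit) step on the L1 penalty.
        x = [soft_threshold(xi - gamma * residual * ai, gamma * lam)
             for xi, ai in zip(x, a)]
    return x


# Synthetic, noise-free data with a sparse ground truth (hypothetical example).
_rng = random.Random(1)
x_true = [2.0, 0.0, 0.0]
data = []
for _ in range(200):
    a = [_rng.gauss(0.0, 1.0) for _ in range(3)]
    b = sum(ai * xi for ai, xi in zip(a, x_true))
    data.append((a, b))

x_hat = prox_sgd_lasso(data, dim=3)
```

In this sketch the first coordinate of `x_hat` should settle near the true value, slightly shrunk by the penalty, while the other two stay close to zero; the exact values depend on the assumed step sizes and random seed.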