Optimization and Learning With Nonlocal Calculus
Nonlocal models have recently had a major impact in nonlinear continuum mechanics and are used to describe physical systems/processes which cannot be accurately described by classical, calculus based "local" approaches. In part, this is due to their multiscale nature that enables aggregation of micro-level behavior to obtain a macro-level description of singular/irregular phenomena such as peridynamics, crack propagation, anomalous diffusion and transport phenomena. At the core of these models are nonlocal differential operators, including nonlocal analogs of the gradient/Hessian. This paper initiates the use of such nonlocal operators in the context of optimization and learning. We define and analyze the convergence properties of nonlocal analogs of (stochastic) gradient descent and Newton's method on Euclidean spaces. Our results indicate that as the nonlocal interactions become less noticeable, the optima corresponding to nonlocal optimization converge to the "usual" optima. At the same time, we argue that nonlocal learning is possible in situations where standard calculus fails. As a stylized numerical example of this, we consider the problem of non-differentiable parameter estimation on a non-smooth translation manifold and show that our nonlocal gradient descent recovers the unknown translation parameter from a non-differentiable objective function.
READ FULL TEXT