A Discrete Variational Derivation of Accelerated Methods in Optimization
Many recent developments in machine learning are connected with gradient-based optimization methods. Recently, these methods have been studied from a variational perspective, which has opened up the possibility of applying variational and symplectic integration methods from geometric integration. In this paper, we introduce variational integrators that allow us to derive different optimization methods. Using both Hamilton's principle and the Lagrange-d'Alembert principle, we derive two families of optimization methods in one-to-one correspondence. These families generalize Polyak's heavy ball and the well-known Nesterov accelerated gradient method, and they mimic the behavior of the latter, which reduces the oscillations typical of momentum methods. However, since the systems considered are explicitly time-dependent, the preservation of symplecticity known from autonomous systems occurs here solely on the fibers. Several experiments illustrate the results.
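For orientation, the two classical methods that the abstract says are generalized can be sketched as follows. This is a minimal illustration of the textbook heavy-ball and Nesterov updates, not the variational integrators derived in the paper. The test function (a diagonal quadratic), the step size `h`, the momentum coefficient `mu`, and all function names are assumptions chosen for the example.

```python
# Textbook heavy-ball and Nesterov iterations on f(x) = 0.5 * sum(a_i * x_i^2).
# The quadratic, h, mu, and steps are illustrative assumptions, not values
# taken from the paper.

def grad_f(x, a=(1.0, 10.0)):
    """Gradient of the assumed quadratic test function."""
    return [ai * xi for ai, xi in zip(a, x)]

def heavy_ball(x0, h=0.05, mu=0.9, steps=500):
    """Polyak's heavy ball: x_{k+1} = x_k - h*grad f(x_k) + mu*(x_k - x_{k-1})."""
    x_prev, x = list(x0), list(x0)
    for _ in range(steps):
        g = grad_f(x)
        x_next = [xi - h * gi + mu * (xi - xp)
                  for xi, gi, xp in zip(x, g, x_prev)]
        x_prev, x = x, x_next
    return x

def nesterov(x0, h=0.05, mu=0.9, steps=500):
    """Nesterov: the gradient is evaluated at the look-ahead point y_k."""
    x_prev, x = list(x0), list(x0)
    for _ in range(steps):
        y = [xi + mu * (xi - xp) for xi, xp in zip(x, x_prev)]
        g = grad_f(y)
        x_next = [yi - h * gi for yi, gi in zip(y, g)]
        x_prev, x = x, x_next
    return x

def norm(x):
    return sum(xi * xi for xi in x) ** 0.5

x_hb = heavy_ball([1.0, 1.0])
x_nag = nesterov([1.0, 1.0])
print("heavy ball:", norm(x_hb))
print("nesterov:  ", norm(x_nag))
```

The only difference between the two loops is where the gradient is evaluated: at the current iterate for heavy ball, and at the extrapolated point for Nesterov. That look-ahead evaluation is the source of the damping of oscillations that the abstract refers to.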