The process of learning has recently been formulated within the framework of laws of nature derived from variational principles. While the paper addresses some fundamental issues concerning the links with mechanics, a major open problem remains the satisfaction of the boundary conditions of the Euler-Lagrange equations of learning.
This paper springs from recent studies, especially on the problem of learning visual features [3, 4, 2], and it is also stimulated by a nice analysis of the interpretation of the equations of Newtonian mechanics in the variational framework. It is pointed out that the formulation of learning as an Euler-Lagrange (EL) differential equation is remarkably different from the classic gradient flow. The difference originates mostly from the continuous nature of time: while the gradient flow has a truly algorithmic flavor, the EL equations of learning, which are the outcome of imposing a null variation of the action, can be interpreted as laws of nature.
The paper shows that learning is driven by fourth-order differential equations that collapse to second order under an intriguing interpretation connected with the aforementioned analysis of how Newtonian laws arise in the variational framework.
2 Euler-Lagrange equations
Consider an integral functional of the following form
where maps a point into the real number and is a map of . Consider a partition of the interval into subintervals of length . Given a function one can identify the point , and in general one can define the subset of
Now consider the and consider the following “approximation” of the functional integral :
where . The stationarity condition on is , thus we have
Using the fact that and we get
This means that the condition implies
where, consistently with our previous definition we are assuming that .
This last equation is indeed the discrete counterpart of the Euler-Lagrange equations in the continuum:
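The correspondence between the discrete stationarity condition and the continuous Euler-Lagrange equation can be checked numerically on a toy case. The sketch below is only an illustration under stated assumptions: the Lagrangian L(t, x, p) = p^2/2 + x^2/2, the fixed endpoints x(0) = 0 and x(T) = 1, and all function names are choices made here, not taken from the paper. For this Lagrangian the continuous equation is x'' = x, whose solution with these endpoints is sinh(t)/sinh(T).

```python
import math

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal linear system."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def discrete_action(x, eps):
    """Riemann-sum approximation of the integral of L = p^2/2 + x^2/2."""
    total = 0.0
    for k in range(len(x) - 1):
        p = (x[k + 1] - x[k]) / eps  # forward-difference velocity
        total += eps * (0.5 * p * p + 0.5 * x[k] * x[k])
    return total

def stationary_path(n_steps, horizon, x_start, x_end):
    """Solve d(action)/d(x_k) = 0 at the interior points, endpoints fixed.

    For the assumed Lagrangian the condition reads
    x_k - (x_{k+1} - 2 x_k + x_{k-1}) / eps^2 = 0.
    """
    eps = horizon / n_steps
    m = n_steps - 1  # number of interior unknowns
    off = -1.0 / eps**2
    lower = [off] * m
    upper = [off] * m
    diag = [2.0 / eps**2 + 1.0] * m
    rhs = [0.0] * m
    rhs[0] -= off * x_start  # known endpoints moved to the right-hand side
    rhs[-1] -= off * x_end
    interior = solve_tridiagonal(lower, diag, upper, rhs)
    return [x_start] + interior + [x_end], eps

def max_error_vs_continuum(n_steps=200, horizon=1.0):
    """Largest gap between the discrete path and sinh(t)/sinh(T)."""
    x, eps = stationary_path(n_steps, horizon, 0.0, 1.0)
    exact = [math.sinh(k * eps) / math.sinh(horizon) for k in range(n_steps + 1)]
    return max(abs(a - b) for a, b in zip(x, exact))
```

Calling `max_error_vs_continuum()` with finer partitions should give a shrinking gap, which is the numerical counterpart of the statement that the discrete condition converges to the Euler-Lagrange equation.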
The discovery of stationary points of the cognitive action defined by Eq. 1 is somewhat related to the gradient flow that one might activate to optimize , namely the classic updating rule
This flow is clearly different from Eq. 4 (see also its continuous counterpart, Eq. 5). Basically, while the Euler-Lagrange equations yield an updating computation model of , the gradient flow moves
3 A surprising link with mechanics
Let us consider the action
The Euler-Lagrange equations are
Since we have
If we make no assumption on the variation, then these equations must be supplemented with the boundary condition . Now suppose , with . Then Eq. 9 becomes
The Lagrangian , with and , and , is the one used in mechanics, which returns the Newtonian equations
of the damped oscillator. We notice in passing that this equation arises when choosing the classic action from mechanics, which does not seem adequate for machine learning, since the potential (analogous to the loss function) and the kinetic energy (analogous to the regularization term) come with opposite signs. It is also worth mentioning that the trivial choice yields a pure oscillation with no dissipation, whereas dissipation is a fundamental ingredient of learning.
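As a hedged illustration of how a time-weighted mechanical action produces a damped equation, the following sketch uses assumed notation: $q$ for the trajectory, $m$ for the mass, $V$ for the potential, $\zeta$ for the time weight, and $\theta$ for the dissipation rate. These symbols are chosen here for illustration and may differ from those of the paper.

```latex
\begin{align*}
A(q) &= \int_0^T \zeta(t)\Big(\tfrac{1}{2}\, m\, \dot q^2(t) - V(q(t))\Big)\, dt, \\
0 &= \frac{d}{dt}\big(\zeta\, m\, \dot q\big) + \zeta\, V'(q)
   = \zeta \Big( m\, \ddot q + m\, \frac{\dot\zeta}{\zeta}\, \dot q + V'(q) \Big), \\
\zeta(t) = e^{\theta t} \;&\Longrightarrow\; m\, \ddot q + \theta\, m\, \dot q + V'(q) = 0 .
\end{align*}
```

With a constant weight ($\theta = 0$) the friction term vanishes and only the undamped oscillation remains, which is consistent with the remark on the trivial choice.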
This Lagrangian, however, does not convey a reasonable interpretation for a learning theory, since one would very much like , so that could be nicely interpreted as a temporal regularization parameter. Before exploring a different interpretation, we notice in passing that large values of , which correspond to strong dissipation on small masses, yield the gradient flow
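The limit of strong dissipation on small masses can be explored numerically. The sketch below assumes that the damped equation has the standard form m x'' + m θ x' + V'(x) = 0 and compares it with the gradient flow γ x' = -V'(x), where γ = m θ is kept fixed while m shrinks. The quadratic potential V(x) = x^2/2, the parameter values, and the function names are illustrative assumptions, not taken from the paper.

```python
def rk4_step(f, y, dt):
    """One classical Runge-Kutta step for the autonomous system y' = f(y)."""
    k1 = f(y)
    k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def grad_V(x):
    """Gradient of the assumed potential V(x) = x^2 / 2."""
    return x

def damped_path(mass, theta, x0, horizon, dt):
    """Integrate mass*x'' + mass*theta*x' + V'(x) = 0, starting at rest."""
    def f(y):
        x, v = y
        return [v, -theta * v - grad_V(x) / mass]
    y = [x0, 0.0]
    xs = [x0]
    for _ in range(int(round(horizon / dt))):
        y = rk4_step(f, y, dt)
        xs.append(y[0])
    return xs

def gradient_flow_path(gamma, x0, horizon, dt):
    """Integrate the first-order flow gamma * x' = -V'(x)."""
    def f(y):
        return [-grad_V(y[0]) / gamma]
    y = [x0]
    xs = [x0]
    for _ in range(int(round(horizon / dt))):
        y = rk4_step(f, y, dt)
        xs.append(y[0])
    return xs

def max_gap(mass, gamma=1.0, x0=1.0, horizon=2.0, dt=1e-3):
    """Largest distance between the two trajectories for theta = gamma / mass."""
    theta = gamma / mass
    a = damped_path(mass, theta, x0, horizon, dt)
    b = gradient_flow_path(gamma, x0, horizon, dt)
    return max(abs(p - q) for p, q in zip(a, b))
```

Comparing, for instance, `max_gap(0.1)` with `max_gap(0.01)` gives a feel for how quickly the second-order dynamics approach the gradient flow as the mass decreases at fixed γ.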
4 Laws of learning and gradient flow
While the discussion in the previous section provides a somewhat surprising link with mechanics, the interpretation of learning as a least-action problem is not very satisfactory since, just like in mechanics, we only end up at stationary points of the action, which are typically saddle points.
We will see that an appropriate choice of the Lagrangian function yields true laws of nature, where the Euler-Lagrange equations turn out to minimize corresponding actions that are appropriate to capture learning tasks. We consider kinetic energies that also involve the acceleration and two different cases which depend on the choice of . The new action is
where . In the continuum setting, the corresponding Euler-Lagrange equations can be determined by considering the variation associated with , where is a variation and . We have
If we integrate by parts, we get
and, therefore, the variation becomes
Now, suppose we give the initial conditions on and . In that case we can promptly see that this is equivalent to setting and . Hence, we get the Euler-Lagrange equation when setting
Now if we choose as a constant we immediately get
while if we choose as an affine function, when considering the above condition we get
Finally, the stationary point of the action corresponds to the Euler-Lagrange equations
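For orientation, the structure of this derivation can be sketched for a generic Lagrangian that depends on the acceleration. The notation below ($q$ for the trajectory, $v$ for the variation, $L_q$, $L_{\dot q}$, $L_{\ddot q}$ for partial derivatives) is assumed for illustration and is not necessarily that of the paper; the sketch is the standard calculus-of-variations identity for second-order Lagrangians.

```latex
\begin{align*}
\delta A &= \int_0^T \big( L_q\, v + L_{\dot q}\, \dot v + L_{\ddot q}\, \ddot v \big)\, dt \\
         &= \int_0^T \Big( L_q - \frac{d}{dt} L_{\dot q} + \frac{d^2}{dt^2} L_{\ddot q} \Big)\, v \, dt
            + \Big[ \Big( L_{\dot q} - \frac{d}{dt} L_{\ddot q} \Big)\, v + L_{\ddot q}\, \dot v \Big]_0^T .
\end{align*}
```

If initial conditions are given, then $v(0) = \dot v(0) = 0$ and only the boundary terms at $t = T$ survive. Requiring them to vanish for arbitrary $v(T)$ and $\dot v(T)$ gives $L_{\ddot q} = 0$ and $L_{\dot q} - \frac{d}{dt} L_{\ddot q} = 0$ at $t = T$, while the integral term yields the fourth-order equation $L_q - \frac{d}{dt} L_{\dot q} + \frac{d^2}{dt^2} L_{\ddot q} = 0$.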
Now, let us consider the case in which . The Euler-Lagrange equations become
If we consider again the case we get
Now we consider the kinetic energy associated with the differential operator
Let us consider the following two different cases of . In both cases, they convey the unidirectional structure of time.
(19) (20) (21)
A possible satisfaction is . Notice that as the Euler-Lagrange Eq. 19 reduces to
and the corresponding boundary conditions are always verified.
Let us assume that in the kinetic energy 18 and . In particular we consider the action
In this case the Euler-Lagrange equations turn out to be
along with the boundary conditions
Interestingly, as the Euler-Lagrange equations become:
where the boundary conditions are always satisfied.
Notice that while we can choose the parameters in such a way that Eq. 19 is stable, the same does not hold for Eq. 24. Interestingly, stability can be gained for , which corresponds to a singular solution. Basically if we denote by the solution associated with , we have that does not approximate corresponding at in case in which we can choose arbitrarily large domains .
While machine learning is typically framed in the statistical setting, in this case time is exploited in such a way that one relies on a sort of underlying ergodic principle according to which statistical regularities can be captured in time. This paper shows that the continuous nature of time gives rise to computational models of learning that can be interpreted as laws of nature. Unlike traditional stochastic gradient methods, the theory suggests that, just like in mechanics, learning is driven by Euler-Lagrange equations that minimize a sort of functional risk. The collapse from fourth- to second-order differential equations opens the door to an in-depth theoretical and experimental investigation.
We thank Giovanni Bellettini for insightful discussions.
- [1] Alessandro Betti and Marco Gori. The principle of least cognitive action. Theor. Comput. Sci., 633:83–99, 2016.
- [2] Alessandro Betti and Marco Gori. Convolutional networks in visual environments. CoRR, abs/1801.07110, 2018.
- [3] Alessandro Betti, Marco Gori, and Stefano Melacci. Cognitive action laws: The case of visual features. CoRR, abs/1808.09162, 2018.
- [4] Alessandro Betti, Marco Gori, and Stefano Melacci. Motion invariance in visual environments. CoRR, abs/1807.06450, 2018.
- [5] Matthias Liero and Ulisse Stefanelli. A new minimum principle for Lagrangian mechanics. Journal of Nonlinear Science, 23:179–204, 2013.