# On the Role of Time in Learning

By and large, the process of learning concepts embedded in time is regarded as a mature research topic: hidden Markov models and recurrent neural networks are, amongst others, successful approaches to learning from temporal data. In this paper we claim that the dominant approach, which minimizes appropriate risk functions defined over time by classic stochastic gradient descent, may miss the deep interpretation of time given in other fields, such as physics. We show that a recent reformulation of learning according to the principle of Least Cognitive Action is better suited whenever time is involved in learning. The principle gives rise to a learning process driven by differential equations, which can describe the process within the same framework as other laws of nature.


## 1 Introduction

The process of learning has recently been formulated within the framework of laws of nature derived from a variational principle [1]. While that work addresses some fundamental issues on the links with mechanics, a major open problem remains the satisfaction of the boundary conditions of the Euler-Lagrange equations of learning.

This paper springs from recent studies on the problem of learning visual features [3, 4, 2] and is also stimulated by a nice analysis on the interpretation of the equations of Newtonian mechanics in the variational framework [5]. It is pointed out that the formulation of learning as an Euler-Lagrange (EL) differential equation is remarkably different from classic gradient flow. The difference mostly originates from the continuous nature of time: while gradient flow has a truly algorithmic flavor, the EL equations of learning, which are the outcome of imposing a null variation of the action, can be interpreted as laws of nature.

The paper shows that learning is driven by fourth-order differential equations that collapse to second-order equations under an intriguing interpretation connected with the mentioned result given in [5] concerning the emergence of Newtonian laws.

## 2 Euler-Lagrange equations

Consider an integral functional of the following form

$$A(q) := \int_{t_1}^{t_N} L(t, q(t), \dot{q}(t))\, dt \qquad (1)$$

where $L \colon [t_1, t_N] \times \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ maps a point $(t, q, p)$ into the real number $L(t, q, p)$, and $q$ is a map of $[t_1, t_N]$ into $\mathbb{R}$ belonging to a functional space $X$. Consider a partition of the interval $[t_1, t_N]$ into $N - 1$ subintervals of length $\varepsilon = (t_N - t_1)/(N - 1)$. Given a function $q \in X$ one can identify the point $(q(t_1), q(t_2), \dots, q(t_N)) \in \mathbb{R}^N$, and in general one can define the subset of $\mathbb{R}^N$

$$X_\varepsilon := \{ (q(t_1), q(t_2), \dots, q(t_N)) \in \mathbb{R}^N : q \in X \}.$$

Now take a point $x = (x_1, \dots, x_N) \in X_\varepsilon$ and consider the following “approximation” of the functional integral $A$:

$$A_\varepsilon(x_1, \dots, x_N) := \varepsilon \sum_{k=1}^{N-1} L(k, x_k, \Delta_\varepsilon x_k),$$

where $\Delta_\varepsilon x_k := (x_{k+1} - x_k)/\varepsilon$. The stationarity condition on $A_\varepsilon$ is $\nabla A_\varepsilon(x) = 0$; for an interior index $i$ we have

$$\nabla_i A_\varepsilon(x) = \varepsilon \nabla_i \bigl[ L(i-1, x_{i-1}, \Delta_\varepsilon x_{i-1}) + L(i, x_i, \Delta_\varepsilon x_i) \bigr].$$

Using the fact that $\partial (\Delta_\varepsilon x_{i-1}) / \partial x_i = \varepsilon^{-1}$ and $\partial (\Delta_\varepsilon x_i) / \partial x_i = -\varepsilon^{-1}$ we get

$$\begin{aligned} \nabla_i A_\varepsilon(x) &= \varepsilon \bigl[ L_p(i-1, x_{i-1}, \Delta_\varepsilon x_{i-1})\, \varepsilon^{-1} + L_q(i, x_i, \Delta_\varepsilon x_i) - L_p(i, x_i, \Delta_\varepsilon x_i)\, \varepsilon^{-1} \bigr] \qquad &(2) \\ &= \varepsilon L_q(i, x_i, \Delta_\varepsilon x_i) - \varepsilon\, \frac{L_p(i, x_i, \Delta_\varepsilon x_i) - L_p(i-1, x_{i-1}, \Delta_\varepsilon x_{i-1})}{\varepsilon}. \qquad &(3) \end{aligned}$$

This means that the condition $\nabla_i A_\varepsilon(x) = 0$ implies

$$L_q(i, x_i, \Delta_\varepsilon x_i) - \Delta_\varepsilon L_p(i-1, x_{i-1}, \Delta_\varepsilon x_{i-1}) = 0, \qquad i = 2, \dots, N-1, \qquad (4)$$

where, consistently with our previous definition, we are assuming that $\Delta_\varepsilon L_p(i-1, x_{i-1}, \Delta_\varepsilon x_{i-1}) := \bigl( L_p(i, x_i, \Delta_\varepsilon x_i) - L_p(i-1, x_{i-1}, \Delta_\varepsilon x_{i-1}) \bigr)/\varepsilon$.

This last equation is indeed the discrete counterpart of the Euler-Lagrange equations in the continuum:

$$L_q(t, u(t), \dot{u}(t)) - \frac{d}{dt} L_p(t, u(t), \dot{u}(t)) = 0, \qquad t \in [t_1, t_N]. \qquad (5)$$

The discovery of stationary points of the cognitive action defined by Eq. 1 is somewhat related to the gradient flow that one might activate to optimize $A_\varepsilon$, namely the classic updating rule

$$X_\varepsilon \leftarrow X_\varepsilon - \eta \nabla A_\varepsilon. \qquad (6)$$

This flow is clearly different from Eq. 4 (see also its continuous counterpart, Eq. 5). Basically, while the Euler-Lagrange equations yield a computational model that updates the trajectory as the time index advances, the gradient flow moves the whole discretized trajectory $X_\varepsilon$ at every step along the steepest-descent direction of $A_\varepsilon$.
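The contrast can be made concrete with a minimal numerical sketch. We pick the assumed quadratic Lagrangian $L(t, q, p) = p^2/2 + q^2/2$ (our choice, not from the paper; it makes $A_\varepsilon$ convex, so the gradient flow of Eq. 6 converges) and verify that descending $A_\varepsilon$ with the endpoints held fixed lands on a discrete trajectory satisfying the stationarity condition of Eq. 4:

```python
import numpy as np

# Assumed Lagrangian L(t, q, p) = p^2/2 + q^2/2, discretized as in A_eps
# with forward differences Delta_eps on a uniform grid.
N, eps, eta = 50, 0.1, 0.02
x = np.zeros(N)
x[0], x[-1] = 1.0, 2.0                    # fixed boundary values

def grad_A(x):
    """Gradient of A_eps at the interior points (endpoints held fixed)."""
    d = np.diff(x) / eps                  # forward differences Delta_eps x_k
    g = np.zeros_like(x)
    g[1:-1] = eps * x[1:-1] - (d[1:] - d[:-1])   # eps*L_q minus L_p difference
    return g

# Gradient flow of Eq. 6
for _ in range(20000):
    x -= eta * grad_A(x)

# Residual of the discrete Euler-Lagrange condition (Eq. 4) at interior points
d = np.diff(x) / eps
residual = np.max(np.abs(x[1:-1] - (d[1:] - d[:-1]) / eps))
print(f"max |discrete EL residual| = {residual:.2e}")
```

The limit point of the flow is a stationary point of $A_\varepsilon$, but the flow reaches it by repeatedly moving the whole vector $x$, not by unfolding the trajectory index by index.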

## 3 A surprising link with mechanics

Let us consider the action

$$A = \int_0^T dt\; h(t)\, \bar{L}(x(t), q(t), \dot{q}(t)). \qquad (7)$$

The Euler-Lagrange equations are

$$h \bar{L}_q - \dot{h} \bar{L}_{\dot{q}} - h \frac{d}{dt} \bar{L}_{\dot{q}} = 0. \qquad (8)$$

Since $h > 0$, we can divide by $h$ to get

$$\frac{d}{dt} \bar{L}_{\dot{q}} + \frac{\dot{h}}{h} \bar{L}_{\dot{q}} - \bar{L}_q = 0. \qquad (9)$$

In case we make no assumption on the variation at the right boundary $t = T$, these equations must be joined with the boundary condition $\bar{L}_{\dot{q}}\big|_{t=T} = 0$. Now suppose $\bar{L} = T + \gamma V$, with $T$ a kinetic term and $V$ a potential term. Then Eq. 9 becomes

$$\frac{d}{dt} T_{\dot{q}} + \frac{\dot{h}}{h} T_{\dot{q}} - \gamma V_q = 0. \qquad (10)$$

The Lagrangian $\bar{L} = T + \gamma V$, with $T = \frac{1}{2} m \dot{q}^2$ and $h(t) = e^{\theta t / m}$, and $\gamma = -1$, is the one used in mechanics, which returns the Newtonian equation

$$m \ddot{q} + \theta \dot{q} + V_q = 0$$

of the damped oscillator. We notice in passing that this equation arises when choosing the classic action from mechanics, which does not seem to be adequate for machine learning, since the potential (analogous to the loss function) and the kinetic energy (analogous to the regularization term) come with different signs. It is also worth mentioning that the trivial choice $h(t) \equiv 1$ yields a pure oscillation with no dissipation, whereas dissipation is, on the contrary, a fundamental ingredient of learning.
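This derivation can be checked symbolically. A minimal sketch with sympy, assuming the concrete potential $V(q) = k q^2 / 2$ (our illustrative choice), confirms that the Euler-Lagrange equation of the weighted action is, up to the positive factor $h$, exactly the damped oscillator:

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols('t')
m, theta, k = sp.symbols('m theta k', positive=True)
q = sp.Function('q')

# Weighted mechanical Lagrangian h * Lbar with h = exp(theta*t/m) and the
# illustrative potential V(q) = k*q^2/2 (our choice, not from the paper)
h = sp.exp(theta * t / m)
Lbar = m * q(t).diff(t)**2 / 2 - k * q(t)**2 / 2

# Euler-Lagrange equation of the weighted action
(eq,) = euler_equations(h * Lbar, [q(t)], t)

# Up to the positive factor h, it must coincide with m q'' + theta q' + k q = 0
damped = m * q(t).diff(t, 2) + theta * q(t).diff(t) + k * q(t)
assert sp.simplify(eq.lhs + h * damped) == 0
print("EL equation of the weighted action = damped oscillator")
```

The weight $e^{\theta t / m}$ is what turns the conservative oscillator into a dissipative one without adding any explicit friction term to the Lagrangian.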

This Lagrangian, however, does not convey a reasonable interpretation for a learning theory, since one would very much like $\gamma > 0$, so that the kinetic coefficient could be nicely interpreted as a temporal regularization parameter. Before exploring a different interpretation, we notice in passing that large values of $\theta$, which correspond to strong dissipation on small masses, yield the gradient flow

$$\dot{q} = -\frac{1}{\theta} V_q.$$
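This limit can be illustrated numerically. With parameter values of our own choosing (small mass, strong dissipation), the slow characteristic root of $m \ddot{q} + \theta \dot{q} + k q = 0$ (again assuming $V(q) = k q^2 / 2$) approaches the gradient-flow rate $-k/\theta$:

```python
import numpy as np

# Illustrative values (our own, not from the paper): small mass, large theta
m, theta, k = 1e-3, 50.0, 2.0

# Characteristic roots of m*l^2 + theta*l + k = 0; the slow (small-magnitude)
# root governs the long-time behaviour of the damped oscillator.
roots = np.roots([m, theta, k])
slow = float(np.real(roots[np.argmin(np.abs(roots))]))

print(f"slow root = {slow:.8f}")   # close to the gradient-flow rate -k/theta
```

In this regime the fast mode dies out almost instantly and the trajectory is indistinguishable from the gradient flow $\dot{q} = -(1/\theta) V_q$.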

## 4 Laws of learning and gradient flow

While the discussion in the previous section provides a somewhat surprising link with mechanics, the interpretation of learning as a problem of least action is not very satisfactory since, just like in mechanics, we only end up in stationary points of the action, which are typically saddle points.

We will see that an appropriate choice of the Lagrangian function yields truly laws of nature, where the Euler-Lagrange equations turn out to minimize corresponding actions that are appropriate to capture learning tasks. We consider kinetic energies that also involve the acceleration, and two different cases depending on the choice of $h$. The new action is

$$A_2 = \int_0^T dt\; L(t, q(t), \dot{q}(t), \ddot{q}(t)), \qquad (11)$$

where $L$ now depends also on the acceleration $\ddot{q}$, and $L_p$, $L_a$ denote its partial derivatives with respect to the $\dot{q}$ and $\ddot{q}$ slots. In the continuum setting, the corresponding Euler-Lagrange equations can be determined by considering the variation associated with $q + sv$, where $v$ is a variation and $s \in \mathbb{R}$. We have

$$\delta A_2 = s \int_0^T dt\, \bigl( L_q v + L_p \dot{v} + L_a \ddot{v} \bigr). \qquad (12)$$

If we integrate by parts, we get

$$\int_0^T dt\; L_p \dot{v} = -\int_0^T dt\; v \frac{d}{dt} L_p + \bigl[ v L_p \bigr]_0^T$$

$$\int_0^T dt\; L_a \ddot{v} = -\int_0^T dt\; \dot{v} \frac{d}{dt} L_a + \bigl[ \dot{v} L_a \bigr]_0^T = \int_0^T dt\; v \frac{d^2}{dt^2} L_a - \Bigl[ v \frac{d}{dt} L_a \Bigr]_0^T + \bigl[ \dot{v} L_a \bigr]_0^T,$$

and, therefore, the variation becomes

$$\delta A_2 = s \Bigl\{ \int_0^T dt\; v \Bigl( \frac{d^2}{dt^2} L_a - \frac{d}{dt} L_p + L_q \Bigr) + \Bigl[ v \Bigl( L_p - \frac{d}{dt} L_a \Bigr) \Bigr]_0^T + \bigl[ \dot{v} L_a \bigr]_0^T \Bigr\} = 0.$$

Now, suppose we give the initial conditions on $q(0)$ and $\dot{q}(0)$. In that case we can promptly see that this is equivalent to posing $v(0) = 0$ and $\dot{v}(0) = 0$. Hence, we get the Euler-Lagrange equations when posing

$$v(T) \Bigl( L_p \big|_{t=T} - \frac{d}{dt} L_a \Big|_{t=T} \Bigr) + \dot{v}(T)\, L_a \big|_{t=T} = 0.$$

Now, since $v(T)$ and $\dot{v}(T)$ can be chosen independently, if we choose a variation with $\dot{v}(T) = 0$ and $v(T) \neq 0$ we immediately get

$$L_p \big|_{t=T} - \frac{d}{dt} L_a \Big|_{t=T} = 0, \qquad (13)$$

while if we choose a variation with $v(T) = 0$ and $\dot{v}(T) \neq 0$, when considering the above condition we get

$$L_a \big|_{t=T} = 0. \qquad (14)$$

Finally, the stationary point of the action corresponds to the Euler-Lagrange equations

$$\frac{d^2}{dt^2} L_a - \frac{d}{dt} L_p + L_q = 0, \qquad (15)$$

which hold along with the Cauchy initial conditions on $q(0)$ and $\dot{q}(0)$ and the boundary conditions 13 and 14.
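Eq. 15 can be verified symbolically: sympy's `euler_equations` handles Lagrangians containing second derivatives. As a sketch, we use an illustrative convex Lagrangian of our own choosing, $L = \ddot{q}^2/2 + \dot{q}^2/2 + q^2/2$, for which Eq. 15 predicts $q^{(4)} - \ddot{q} + q = 0$:

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols('t')
q = sp.Function('q')

# Illustrative acceleration-dependent Lagrangian (our choice, not the paper's):
# L = q''^2/2 + q'^2/2 + q^2/2
L = q(t).diff(t, 2)**2 / 2 + q(t).diff(t)**2 / 2 + q(t)**2 / 2

(eq,) = euler_equations(L, [q(t)], t)

# Eq. 15:  d^2/dt^2 L_a - d/dt L_p + L_q  =  q'''' - q'' + q = 0
expected = q(t).diff(t, 4) - q(t).diff(t, 2) + q(t)
assert sp.simplify(eq.lhs - expected) == 0
print("euler_equations reproduces the fourth-order law of Eq. 15")
```

The alternating signs in Eq. 15 are exactly the $(-1)^n$ pattern of the higher-order Euler-Lagrange operator, one sign flip per integration by parts.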

Now, let us consider the case in which $L = h \bar{L}$. The Euler-Lagrange equations become

$$\frac{d^2}{dt^2} \bar{L}_a + 2 \frac{\dot{h}}{h} \frac{d}{dt} \bar{L}_a + \frac{\ddot{h}}{h} \bar{L}_a - \frac{\dot{h}}{h} \bar{L}_p - \frac{d}{dt} \bar{L}_p + \bar{L}_q = 0. \qquad (16)$$

If we consider again the case $\bar{L} = T + \gamma V$ we get

$$\frac{d^2}{dt^2} T_a + 2 \frac{\dot{h}}{h} \frac{d}{dt} T_a + \frac{\ddot{h}}{h} T_a - \frac{\dot{h}}{h} T_p - \frac{d}{dt} T_p + \gamma V_q = 0. \qquad (17)$$

Now we consider the kinetic energy associated with the differential operator $P = \alpha_1 \frac{d}{dt} + \alpha_2 \frac{d^2}{dt^2}$:

$$T = \frac{1}{2\theta^2} (Pq)^2 = \frac{1}{2\theta^2} \bigl( \alpha_1 \dot{q} + \alpha_2 \ddot{q} \bigr)^2 = \frac{\alpha_1^2}{2\theta^2} \dot{q}^2 + \frac{\alpha_1 \alpha_2}{\theta^2} \dot{q} \ddot{q} + \frac{\alpha_2^2}{2\theta^2} \ddot{q}^2 \qquad (18)$$

Let us consider the following two different choices of $h$. In both cases, $h$ conveys the unidirectional structure of time.

1. $h(t) = e^{\theta t}$, with $\theta > 0$. In this case, when plugging the kinetic energy of Eq. 18 into Eq. 17, we get

$$\frac{1}{\theta^2} q^{(4)} + \frac{2}{\theta} q^{(3)} + \frac{\alpha_1 \alpha_2 \theta + \alpha_2^2 \theta^2 - \alpha_1^2}{\alpha_2^2 \theta^2} \ddot{q} + \frac{\alpha_1 \alpha_2 \theta^2 - \alpha_1^2 \theta}{\alpha_2^2 \theta^2} \dot{q} + \frac{\gamma}{\alpha_2^2} V_q = 0. \qquad (19)$$

These equations hold along with the Cauchy conditions and the boundary conditions given by Eqs. 13 and 14, which turn out to be

$$\frac{\alpha_1^2}{\theta^2} \dot{q}(T) - \frac{\alpha_2^2}{\theta^2} q^{(3)}(T) = 0 \qquad (20)$$
$$\frac{\alpha_1 \alpha_2}{\theta^2} \dot{q}(T) + \frac{\alpha_2^2}{\theta^2} \ddot{q}(T) = 0. \qquad (21)$$

A possible way of satisfying them is $\dot{q}(T) = \ddot{q}(T) = q^{(3)}(T) = 0$. Notice that as $\theta \to \infty$ the Euler-Lagrange Eq. 19 reduces to

$$\ddot{q} + \frac{\alpha_1}{\alpha_2} \dot{q} + \frac{\gamma}{\alpha_2^2} V_q = 0, \qquad (22)$$

and the corresponding boundary conditions are always verified.

2. Let us assume that the cross term is dropped in the kinetic energy 18 and that $h(t) = e^{-t/\varepsilon}$. In particular we consider the action

$$A = \int_0^T dt\; e^{-t/\varepsilon} \Bigl( \frac{1}{2} \varepsilon^2 \rho\, \ddot{q}^2 + \frac{1}{2} \varepsilon \nu\, \dot{q}^2 + \gamma V(q, t) \Bigr). \qquad (23)$$

In this case the Euler-Lagrange equations turn out to be

$$\varepsilon^2 \rho\, q^{(4)} - 2 \varepsilon \rho\, q^{(3)} + (\rho - \varepsilon \nu)\, \ddot{q} + \nu \dot{q} + \gamma V_q = 0, \qquad (24)$$

along with the boundary conditions

$$\varepsilon^2 \rho\, \ddot{q}(T) = 0 \qquad (25)$$
$$\varepsilon \nu\, \dot{q}(T) - \varepsilon^2 \rho\, q^{(3)}(T) = 0. \qquad (26)$$

Interestingly, as $\varepsilon \to 0$ the Euler-Lagrange equations become

$$\rho \ddot{q} + \nu \dot{q} + \gamma V_q = 0, \qquad (27)$$

where the boundary conditions are always satisfied.
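As a sanity check of this case, sympy's `euler_equations` applied to the integrand of Eq. 23, with the illustrative choices $V(q) = k q^2/2$ and $\gamma = 1$ (ours, for concreteness), reproduces Eq. 24, and setting $\varepsilon = 0$ leaves the second-order law of Eq. 27:

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols('t')
eps, rho, nu, k = sp.symbols('eps rho nu k', positive=True)
q = sp.Function('q')

# Integrand of Eq. 23 with the illustrative potential V(q) = k*q^2/2, gamma = 1
h = sp.exp(-t / eps)
L = h * (eps**2 * rho * q(t).diff(t, 2)**2 / 2
         + eps * nu * q(t).diff(t)**2 / 2
         + k * q(t)**2 / 2)

(eq,) = euler_equations(L, [q(t)], t)

# Eq. 24 (with V_q = k q); the EL equation equals h times this expression
eq24 = (eps**2 * rho * q(t).diff(t, 4) - 2 * eps * rho * q(t).diff(t, 3)
        + (rho - eps * nu) * q(t).diff(t, 2) + nu * q(t).diff(t) + k * q(t))
assert sp.simplify(eq.lhs - h * eq24) == 0

# Singular limit eps -> 0: only the second-order terms of Eq. 27 survive
assert eq24.subs(eps, 0) == rho * q(t).diff(t, 2) + nu * q(t).diff(t) + k * q(t)
print("Eq. 23 yields Eq. 24, which collapses to Eq. 27 as eps -> 0")
```

Note how the negative damping term $-2\varepsilon\rho\, q^{(3)}$, the source of the instability discussed below, is produced entirely by the decaying weight $e^{-t/\varepsilon}$.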

Notice that while we can choose the parameters in such a way that Eq. 19 is stable, the same does not hold for Eq. 24. Interestingly, stability can be gained as $\varepsilon \to 0$, which corresponds with a singular solution. Basically, if we denote by $q_\varepsilon$ the solution associated with a given $\varepsilon > 0$, then $q_\varepsilon$ does not approximate the solution $q_0$ corresponding to $\varepsilon = 0$ when we choose arbitrarily large domains $[0, T]$.
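The collapse of Eq. 19 into Eq. 22 claimed in case 1 can likewise be verified symbolically on the coefficients of $\ddot{q}$ and $\dot{q}$, a small sketch using sympy limits:

```python
import sympy as sp

theta, a1, a2 = sp.symbols('theta alpha_1 alpha_2', positive=True)

# Coefficients of q'' and q' in Eq. 19, as functions of theta
c2 = (a1 * a2 * theta + a2**2 * theta**2 - a1**2) / (a2**2 * theta**2)
c1 = (a1 * a2 * theta**2 - a1**2 * theta) / (a2**2 * theta**2)

# As theta -> oo, the q^(4) and q^(3) terms (prefactors 1/theta^2 and 2/theta)
# vanish, and the surviving coefficients are exactly those of Eq. 22.
assert sp.limit(c2, theta, sp.oo) == 1
assert sp.limit(c1, theta, sp.oo) == a1 / a2
print("Eq. 19 -> q'' + (alpha_1/alpha_2) q' + (gamma/alpha_2^2) V_q = 0")
```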

## 5 Conclusions

While machine learning is typically framed in the statistical setting, in this case time is exploited in such a way that one relies on a sort of underlying ergodic principle, according to which statistical regularities can be captured over time. This paper shows that the continuous nature of time gives rise to computational models of learning that can be interpreted as laws of nature. Unlike traditional stochastic gradient descent, the theory suggests that, just like in mechanics, learning is driven by Euler-Lagrange equations that minimize a sort of functional risk. The collapse from fourth- to second-order differential equations opens the door to an in-depth theoretical and experimental investigation.

## Acknowledgments

We thank Giovanni Bellettini for insightful discussions.