Based on a class of associative algebras with zero-divisors, which we call real-like algebras, we introduce the concept of the graded automatic differentiation induced by a real-like algebra and present a new way of doing automatic differentiation that computes the first, second and third derivatives of a function exactly and simultaneously.


## 1 Real-like Algebras

Let $n$ be a non-negative integer. We say that a vector space $A$ over the real number field $\mathbb{R}$ is the direct sum of its non-zero subspaces $A_0, A_1, \ldots, A_n$, and we write $A = \bigoplus_{i=0}^{n} A_i$, if each $a$ in $A$ can be represented uniquely in the form $a = a_0 + a_1 + \cdots + a_n$ with $a_i \in A_i$ for $0 \le i \le n$. The subspaces $A_0, A_1, \ldots, A_n$ are called the homogeneous subspaces of $A$. The elements of $A_i$ are said to be homogeneous of degree $i$ for $0 \le i \le n$. When an element $a$ of $A$ is expressed as a sum of non-zero homogeneous elements of distinct degrees, these non-zero homogeneous elements are called the homogeneous components of $a$, and the homogeneous component of $a$ of least degree is called the initial component of $a$.

We now define real-like algebras, the class of real associative algebras we need in the study of automatic differentiation.

###### Definition 1.1

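A commutative associative algebra $A$ over the real number field $\mathbb{R}$ is called a real-like algebra if $A = \bigoplus_{i=0}^{n} A_i$ is the direct sum of its non-zero subspaces $A_0, A_1, \ldots, A_n$ satisfying $A_0 = \mathbb{R}$ and $A_iA_j \subseteq A_{i+j}$ for $0 \le i, j \le n$, where $A_k := 0$ for $k > n$.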

The real-like algebras used in this paper are the real-like $n$-algebras $\mathbb{R}(n)$, where

$$\mathbb{R}(n) := \mathbb{R}[X]/\langle \{X^k \mid k \ge n\} \rangle$$

is the quotient associative algebra of the polynomial ring $\mathbb{R}[X]$ with respect to the ideal generated by the subset $\{X^k \mid k \ge n\}$ of $\mathbb{R}[X]$. Clearly, $\mathbb{R}(n)$ is a real-like algebra. In fact, we have

$$\mathbb{R}(n) = \bigoplus_{i=0}^{n-1} \mathbb{R}\varepsilon^i, \qquad \mathbb{R}\varepsilon^0 = \mathbb{R}, \qquad (\mathbb{R}\varepsilon^i)(\mathbb{R}\varepsilon^j) = \begin{cases} \mathbb{R}\varepsilon^{i+j} & \text{if } i+j < n,\\ 0 & \text{if } i+j \ge n, \end{cases}$$

where $\varepsilon$ denotes the image of $X$ in $\mathbb{R}(n)$. The real-like $n$-algebra $\mathbb{R}(n)$ has been used in automatic differentiation for a long time (see Section 13.2 in […]). To the best of our knowledge, although $\mathbb{R}(n)$ is an $L_1$-normed algebra by […], an $L_2$-normed algebraic structure has not been introduced on $\mathbb{R}(n)$. Since the $L_2$-norm is generally preferred in neural networks and is computationally more efficient than the $L_1$-norm, it is advantageous to have an $L_2$-normed algebraic structure on the algebras appearing in the study of automatic differentiation. At the end of this section, we give many ways of introducing an $L_2$-norm on $\mathbb{R}(n)$. For convenience, we call the elements of $\mathbb{R}(n)$ $n$-numbers. Clearly, $2$-numbers are the dual numbers introduced by W. K. Clifford in […]. Based on our research on the applications of dual numbers, we strongly feel that if there exists a class of new numbers which can be used to extend the known mathematics based on real numbers in a satisfying way, then $n$-numbers should be the best candidate for this class of new numbers.
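Concretely, multiplication in $\mathbb{R}(n)$ is ordinary polynomial multiplication in $\varepsilon$ with every power $\varepsilon^k$, $k \ge n$, discarded. The following Python sketch illustrates this rule; the class name `RN` and the helper `eps_power` are hypothetical names chosen for illustration, and coefficients are stored lowest degree first.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RN:
    """Hypothetical container for x_0 + x_1*eps + ... + x_{n-1}*eps^(n-1) in R(n)."""

    coeffs: Tuple[float, ...]  # lowest degree first; len(coeffs) == n

    @property
    def n(self) -> int:
        return len(self.coeffs)

    def __add__(self, other: "RN") -> "RN":
        if self.n != other.n:
            raise ValueError("operands must belong to the same R(n)")
        return RN(tuple(a + b for a, b in zip(self.coeffs, other.coeffs)))

    def __mul__(self, other: "RN") -> "RN":
        if self.n != other.n:
            raise ValueError("operands must belong to the same R(n)")
        out = [0.0] * self.n
        for i, a in enumerate(self.coeffs):
            for j, b in enumerate(other.coeffs):
                if i + j < self.n:  # eps^(i+j) vanishes once i + j >= n
                    out[i + j] += a * b
        return RN(tuple(out))


def eps_power(k: int, n: int) -> RN:
    """Basis element eps^k of R(n); it is the zero element when k >= n."""
    return RN(tuple(1.0 if i == k else 0.0 for i in range(n)))


# Dual numbers are R(2): eps * eps = 0 and (1 + 2 eps)(3 + 4 eps) = 3 + 10 eps.
eps = eps_power(1, 2)
print((eps * eps).coeffs)                        # (0.0, 0.0)
print((RN((1.0, 2.0)) * RN((3.0, 4.0))).coeffs)  # (3.0, 10.0)
```

The same class works for any $n$; for instance, in $\mathbb{R}(4)$ the product $\varepsilon \cdot \varepsilon \cdot \varepsilon$ is $\varepsilon^3$, while one more factor of $\varepsilon$ gives $0$.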

The following proposition gives the basic properties of real-like algebras.

###### Proposition 1.1

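Let $A = \bigoplus_{i=0}^{n} A_i$ be a real-like algebra and let $a = \sum_{i=0}^{n} a_i$ be an element of $A$ with $a_i \in A_i$ for $0 \le i \le n$.

(i)

$a$ is a zero-divisor if and only if $a_0 = 0$.

(ii)

$a$ is invertible if and only if $a_0 \neq 0$, in which case the inverse of $a$ is given by $a^{-1} = \sum_{i=0}^{n} \frac{\det(M_i)}{\det(M)}$, where $M_i$ is the $(n+1)\times(n+1)$-matrix obtained by replacing the $(i+1)$-th column of the $(n+1)\times(n+1)$-matrix

$$M := \begin{bmatrix}
a_0 & 0 & 0 & \cdots & 0 & 0\\
a_1 & a_0 & 0 & \cdots & 0 & 0\\
a_2 & a_1 & a_0 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
a_{n-1} & a_{n-2} & a_{n-3} & \cdots & a_0 & 0\\
a_n & a_{n-1} & a_{n-2} & \cdots & a_1 & a_0
\end{bmatrix}$$

with the $(n+1)\times 1$-matrix $[1, 0, \ldots, 0]^{T}$, and $\det(M)$ is the determinant of the matrix $M$.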

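Proof (i) If $a$ is a zero-divisor, then $ab = 0$ for some non-zero $b \in A$. Let $p$ be the degree of the initial component of $b$. Then we have $b = \sum_{i=p}^{n} b_i$, where $b_i \in A_i$ for $p \le i \le n$ and $b_p \neq 0$. Assume that $a_0 \neq 0$. Since $a_0$ is in $A_0 = \mathbb{R}$, the inverse $a_0^{-1}$ of $a_0$ exists. It follows that

$$0 = a_0^{-1}ab = a_0^{-1}\Big(\sum_{i=0}^{n} a_i\Big)\Big(\sum_{i=p}^{n} b_i\Big) = b_p + \sum_{i=p+1}^{n} b_i + a_0^{-1}\Big(\sum_{i=1}^{n} a_i\Big)\Big(\sum_{i=p}^{n} b_i\Big). \tag{1}$$

Since every homogeneous component of $\sum_{i=p+1}^{n} b_i + a_0^{-1}\big(\sum_{i=1}^{n} a_i\big)\big(\sum_{i=p}^{n} b_i\big)$ has degree at least $p+1$, comparing the components of degree $p$ in (1) gives $b_p = 0$, which is impossible. This proves that $a_0$ has to be $0$.

Conversely, if $a_0 = 0$, then $a = \sum_{i=1}^{n} a_i$. After choosing a non-zero $b \in A_n$, we get $ab = \sum_{i=1}^{n} a_ib = 0$, because $a_ib \in A_{i+n} = 0$ for $i \ge 1$. This proves that $a$ is a zero-divisor.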
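The converse direction is constructive: when $a_0 = 0$, the top-degree basis element annihilates $a$. A small Python check in $\mathbb{R}(n)$, with elements stored as coefficient lists (lowest degree first), illustrates this; `mul` and `annihilating_witness` are illustrative names, not taken from the text.

```python
from typing import List, Optional


def mul(a: List[float], b: List[float]) -> List[float]:
    """Product in R(n): truncated convolution of coefficient lists of length n."""
    n = len(a)
    out = [0.0] * n
    for i in range(n):
        for j in range(n - i):  # keep only terms with i + j < n
            out[i + j] += a[i] * b[j]
    return out


def annihilating_witness(a: List[float]) -> Optional[List[float]]:
    """If a_0 == 0, return b = eps^(n-1) (non-zero, top degree), so that a*b = 0.

    Returns None when a_0 != 0, since a is then not a zero-divisor by part (i).
    """
    n = len(a)
    if a[0] != 0:
        return None
    return [0.0] * (n - 1) + [1.0]


a = [0.0, 2.0, -1.0, 5.0]  # a_0 = 0 in R(4)
b = annihilating_witness(a)
print(b, mul(a, b))        # [0.0, 0.0, 0.0, 1.0] [0.0, 0.0, 0.0, 0.0]
```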

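(ii) $a$ is invertible if and only if there exists $x = \sum_{i=0}^{n} x_i$ with $x_i \in A_i$ for $0 \le i \le n$ such that $ax = 1$, which is equivalent to

$$1 = a_0x_0 + (a_0x_1 + a_1x_0) + (a_2x_0 + a_1x_1 + a_0x_2) + \cdots + (a_nx_0 + a_{n-1}x_1 + \cdots + a_0x_n)$$

or, comparing the homogeneous components of each degree,

$$M\begin{bmatrix} x_0\\ x_1\\ x_2\\ \vdots\\ x_{n-1}\\ x_n \end{bmatrix} = \begin{bmatrix}
a_0 & 0 & 0 & \cdots & 0 & 0\\
a_1 & a_0 & 0 & \cdots & 0 & 0\\
a_2 & a_1 & a_0 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
a_{n-1} & a_{n-2} & a_{n-3} & \cdots & a_0 & 0\\
a_n & a_{n-1} & a_{n-2} & \cdots & a_1 & a_0
\end{bmatrix}\begin{bmatrix} x_0\\ x_1\\ x_2\\ \vdots\\ x_{n-1}\\ x_n \end{bmatrix} = \begin{bmatrix} 1\\ 0\\ 0\\ \vdots\\ 0\\ 0 \end{bmatrix}.$$

Since $M$ is lower triangular, $\det(M) = a_0^{n+1}$, so this system has a solution if and only if $a_0 \neq 0$, and Cramer's rule then gives $x_i = \det(M_i)/\det(M)$ for $0 \le i \le n$. Thus (ii) holds.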
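Because $M$ is lower triangular with diagonal entries $a_0$, the system $Mx = (1, 0, \ldots, 0)^T$ can also be solved by forward substitution, which is how one would compute $a^{-1}$ in practice. The Python sketch below works in $\mathbb{R}(n+1)$ (elements stored as $n+1$ exact rational coefficients), computes the inverse both by forward substitution and by the Cramer formula of Proposition 1.1 (ii), and checks that they agree; all function names are illustrative.

```python
from fractions import Fraction
from typing import List

Vec = List[Fraction]


def mul(a: Vec, b: Vec) -> Vec:
    """Product in R(n+1) (coefficients lowest degree first, eps^k = 0 for k > n)."""
    m = len(a)
    out = [Fraction(0)] * m
    for i in range(m):
        for j in range(m - i):
            out[i + j] += a[i] * b[j]
    return out


def inverse_forward(a: Vec) -> Vec:
    """Solve M x = (1, 0, ..., 0)^T row by row (forward substitution)."""
    if a[0] == 0:
        raise ZeroDivisionError("a_0 = 0: a is a zero-divisor, hence not invertible")
    x = [1 / a[0]]
    for k in range(1, len(a)):
        # Row k of M x: a_k x_0 + a_{k-1} x_1 + ... + a_0 x_k = 0.
        x.append(-sum(a[k - j] * x[j] for j in range(k)) / a[0])
    return x


def det(m: List[Vec]) -> Fraction:
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
        for c in range(len(m))
    )


def inverse_cramer(a: Vec) -> Vec:
    """x_i = det(M_i) / det(M), where M_i is M with column i replaced by e_0."""
    size = len(a)
    m = [[a[r - c] if r >= c else Fraction(0) for c in range(size)] for r in range(size)]
    d = det(m)  # M is lower triangular, so d == a_0 ** size
    e0 = [Fraction(1)] + [Fraction(0)] * (size - 1)
    return [
        det([row[:i] + [e0[r]] + row[i + 1:] for r, row in enumerate(m)]) / d
        for i in range(size)
    ]


a = [Fraction(2), Fraction(3), Fraction(-1), Fraction(5)]  # 2 + 3eps - eps^2 + 5eps^3 in R(4)
x = inverse_forward(a)
print(x == inverse_cramer(a), mul(a, x))
```

Forward substitution costs $O(n^2)$ operations, whereas evaluating $n+1$ determinants is far more expensive, so the Cramer form is best read as a closed-form statement rather than an algorithm.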

To study normed algebraic structures on real-like algebras, we introduce the concept of a homogeneous norm in the following definition.

###### Definition 1.2

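We say that a real-like algebra $A = \bigoplus_{i=0}^{n} A_i$ has a homogeneous norm $|\cdot|_*$ if $|\cdot|_*$ is a real-valued function on the set of homogeneous elements of $A$ such that for $r \in \mathbb{R}$, $x_i, y_i \in A_i$, and $y_j \in A_j$ with $0 \le i, j \le n$:

(i)

$|x_i|_* \ge 0$, and $|x_i|_* = 0$ if and only if $x_i = 0$,

(ii)

$|rx_i|_* = |r|\,|x_i|_*$,

(iii)

$|x_i + y_i|_* \le |x_i|_* + |y_i|_*$,

(iv)

$|x_iy_j|_* \le |x_i|_*\,|y_j|_*$.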

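Let $A = \bigoplus_{i=0}^{n} A_i$ be a real-like algebra which has a homogeneous norm $|\cdot|_*$. Mimicking the definitions of the ordinary $L_1$-norm and $L_2$-norm on $\mathbb{R}^{n+1}$, we obtain the following natural extensions $\|\cdot\|_1$ and $\|\cdot\|_{2,*}$ of the function $|\cdot|_*$:

$$\|a\|_1 := \sum_{i=0}^{n} |a_i|_* \quad\text{and}\quad \|a\|_{2,*} := \Big(\sum_{i=0}^{n} |a_i|_*^2\Big)^{\frac{1}{2}}, \tag{2}$$

where $a = \sum_{i=0}^{n} a_i$ with $a_i \in A_i$ for $0 \le i \le n$. The following proposition gives the basic properties of the two real-valued functions $\|\cdot\|_1$ and $\|\cdot\|_{2,*}$.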
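As a concrete instance of (2), take $A = \mathbb{R}(n)$ with the homogeneous norm $|x\varepsilon^i|_* = |x|$ (the example discussed after Proposition 1.2). The Python sketch below is an illustration, not part of the original argument: it implements $\|\cdot\|_1$ and $\|\cdot\|_{2,*}$ for this choice and searches a small integer grid (an arbitrary choice) for pairs violating the submultiplicative inequality $\|xy\| \le \|x\|\,\|y\|$.

```python
import itertools
import math
from typing import Callable, Optional, Sequence, Tuple


def mul(a: Sequence[float], b: Sequence[float]) -> list:
    """Product in R(n): truncated convolution of coefficient sequences of length n."""
    n = len(a)
    out = [0.0] * n
    for i in range(n):
        for j in range(n - i):
            out[i + j] += a[i] * b[j]
    return out


def norm_1(a: Sequence[float]) -> float:
    """||a||_1 = sum_i |a_i eps^i|_* with |x eps^i|_* = |x|."""
    return sum(abs(c) for c in a)


def norm_2(a: Sequence[float]) -> float:
    """||a||_{2,*} = (sum_i |a_i eps^i|_*^2)^(1/2) with |x eps^i|_* = |x|."""
    return math.sqrt(sum(c * c for c in a))


def first_violation(
    norm: Callable[[Sequence[float]], float], n: int, values=(-2, -1, 0, 1, 3)
) -> Optional[Tuple[tuple, tuple]]:
    """Return the first grid pair (x, y) with norm(xy) > norm(x) * norm(y), or None."""
    for x in itertools.product(values, repeat=n):
        for y in itertools.product(values, repeat=n):
            if norm(mul(x, y)) > norm(x) * norm(y) + 1e-12:
                return x, y
    return None


print(first_violation(norm_1, 3))  # None, consistent with Proposition 1.2 (i)
print(first_violation(norm_2, 3))  # some pair: ||.||_{2,*} is not submultiplicative
```

A finite grid can only exhibit a failure, never prove submultiplicativity; the `None` for $\|\cdot\|_1$ is merely consistent with part (i).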

###### Proposition 1.2

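Let $A = \bigoplus_{i=0}^{n} A_i$ be a real-like algebra which has a homogeneous norm $|\cdot|_*$, and let $\|\cdot\|_1$ and $\|\cdot\|_{2,*}$ be the real-valued functions defined by (2).

(i)

$A$ is a normed algebra with respect to the norm $\|\cdot\|_1$.

(ii)

If $n \ge 1$ and $|e|_* = 1$ for the identity $e$ of the algebra $A$, then $A$ cannot be made into a normed algebra via the real-valued function $\|\cdot\|_{2,*}$.

(iii)

$\|\cdot\|_{2,*}$ is a norm on $A$.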

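Proof Recall that a function $\|\cdot\|: A \to \mathbb{R}$ is called a norm on $A$ if for $r \in \mathbb{R}$ and $x, y \in A$, we have

$$\|rx\| = |r|\,\|x\|, \qquad \|x\| \ge 0, \qquad \text{and} \qquad \|x\| = 0 \text{ if and only if } x = 0 \tag{3}$$

and

$$\|x + y\| \le \|x\| + \|y\|. \tag{4}$$

Also, $A$ is called a normed algebra if there is a norm $\|\cdot\|$ on $A$ such that

$$\|xy\| \le \|x\|\,\|y\| \quad \text{for all } x, y \in A. \tag{5}$$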

The proof of Proposition 1.2 follows from direct computations. We now prove (ii) to illustrate the kind of computation involved. Since $n \ge 1$, we may choose $a_1 \in A_1$ with $|a_1|_* = 1$ (take any non-zero $u \in A_1$ and set $a_1 := |u|_*^{-1}u$, using Definition 1.2 (i) and (ii)). Then $(e + a_1)\cdot(e + a_1) = e + 2a_1 + a_1^2$ with $a_1^2 \in A_2$ (so $a_1^2 = 0$ if $n = 1$). It follows from (2), $|e|_* = 1$ and $|2a_1|_* = 2|a_1|_* = 2$ that

$$\|(e+a_1)\cdot(e+a_1)\|_{2,*}^2 = |e|_*^2 + |2a_1|_*^2 + |a_1^2|_*^2 = 5 + |a_1^2|_*^2 \ge 5 > 4 = \big(|e|_*^2 + |a_1|_*^2\big)^2 = \|e + a_1\|_{2,*}^4, \tag{6}$$

or

$$\|(e+a_1)\cdot(e+a_1)\|_{2,*} > \|e+a_1\|_{2,*}\cdot\|e+a_1\|_{2,*},$$

which proves that (5) fails for $\|\cdot\|_{2,*}$.

Obviously, the map $|\cdot|_*$ defined by

$$|x\varepsilon^i|_* := |x| \quad \text{for } x \in \mathbb{R} \text{ and } 0 \le i \le n-1$$

is a homogeneous norm on the real-like $n$-algebra $\mathbb{R}(n)$ which satisfies the assumption in Proposition 1.2 (ii), where $|x|$ is the absolute value of the real number $x$. Hence, by Proposition 1.2 (ii), the natural idea of extending the ordinary way of defining an $L_2$-norm on $\mathbb{R}^n$ cannot give an $L_2$-normed algebraic structure on $\mathbb{R}(n)$ for $n \ge 2$. This may be why we have not seen $\mathbb{R}(n)$ made into an $L_2$-normed algebra in the automatic differentiation community, even though it has an $L_2$-normed algebraic structure. We now give a one-parameter family of ways of introducing an $L_2$-normed algebraic structure on $\mathbb{R}(n)$.
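The failure in Proposition 1.2 (ii) can be seen on the explicit witness $e + \varepsilon$ used in its proof, since $|e|_* = |\varepsilon|_* = 1$ for this homogeneous norm. A minimal Python sketch (helper names are ours) computes both sides of (5) in $\mathbb{R}(n)$: the left side is $\sqrt{5}$ for $n = 2$ (where $\varepsilon^2 = 0$) and $\sqrt{6}$ for $n \ge 3$, while the right side is $2$.

```python
import math
from typing import List, Tuple


def mul(a: List[float], b: List[float]) -> List[float]:
    """Product in R(n) (coefficients lowest degree first)."""
    n = len(a)
    out = [0.0] * n
    for i in range(n):
        for j in range(n - i):
            out[i + j] += a[i] * b[j]
    return out


def norm_2_star(a: List[float]) -> float:
    """||a||_{2,*} for the homogeneous norm |x eps^i|_* = |x|."""
    return math.sqrt(sum(c * c for c in a))


def witness(n: int) -> Tuple[float, float]:
    """(||(e+eps)^2||_{2,*}, ||e+eps||_{2,*}^2) in R(n) for n >= 2."""
    x = [1.0, 1.0] + [0.0] * (n - 2)  # e + a_1 with a_1 = eps
    return norm_2_star(mul(x, x)), norm_2_star(x) ** 2


for n in range(2, 6):
    lhs, rhs = witness(n)
    print(n, round(lhs, 4), round(rhs, 4))  # lhs = sqrt(5) for n = 2, sqrt(6) for n >= 3; rhs = 2
```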

###### Proposition 1.3

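Let $\beta$ be a positive real constant. If $\|\cdot\|_\beta$ is the non-negative real-valued function defined by

$$\Big\|\sum_{i=0}^{n} x_i\varepsilon^i\Big\|_\beta := \sqrt{\sum_{i=0}^{n} (n+1-i)\beta^i x_i^2} \tag{7}$$

for $\sum_{i=0}^{n} x_i\varepsilon^i \in \mathbb{R}(n+1)$ with $x_i \in \mathbb{R}$ for $0 \le i \le n$, then $\|\cdot\|_\beta$ makes the real-like algebra $\mathbb{R}(n+1)$ into a normed algebra.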

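Proof For convenience, we set $\alpha_i := (n+1-i)\beta^i$ for $0 \le i \le n$. Let $|\cdot|_{\tilde{*}}$ be the map defined by

$$|x_i\varepsilon^i|_{\tilde{*}} := \sqrt{\alpha_i}\,|x_i| \quad \text{for } x_i \in \mathbb{R} \text{ and } 0 \le i \le n. \tag{8}$$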

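For $r \in \mathbb{R}$, $x_i, y_i \in \mathbb{R}$, and $0 \le i \le n$, we clearly have

$$|x_i\varepsilon^i|_{\tilde{*}} \ge 0, \quad\text{and}\quad |x_i\varepsilon^i|_{\tilde{*}} = 0 \text{ if and only if } x_i\varepsilon^i = 0, \tag{9}$$

$$|rx_i\varepsilon^i|_{\tilde{*}} = |r|\,|x_i\varepsilon^i|_{\tilde{*}} \tag{10}$$

and

$$|x_i\varepsilon^i + y_i\varepsilon^i|_{\tilde{*}} = \sqrt{\alpha_i}\,|x_i + y_i| \le \sqrt{\alpha_i}\,(|x_i| + |y_i|) = |x_i\varepsilon^i|_{\tilde{*}} + |y_i\varepsilon^i|_{\tilde{*}}. \tag{11}$$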

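We now prove that

$$|x_i\varepsilon^i \cdot y_j\varepsilon^j|_{\tilde{*}} \le |x_i\varepsilon^i|_{\tilde{*}} \cdot |y_j\varepsilon^j|_{\tilde{*}} \quad \text{for } 0 \le i, j \le n. \tag{12}$$

Since

$$|x_i\varepsilon^i \cdot y_j\varepsilon^j|_{\tilde{*}} = |(x_iy_j)\varepsilon^{i+j}|_{\tilde{*}} = \begin{cases} \sqrt{\alpha_{i+j}}\,|x_iy_j| & \text{if } i+j \le n,\\ 0 & \text{if } i+j > n, \end{cases} \tag{13}$$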

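(12) clearly holds if $i + j > n$. In the case where $i + j \le n$, we have

$$\begin{aligned}
\alpha_i\alpha_j - \alpha_{i+j} &= (n+1-i)\beta^i(n+1-j)\beta^j - (n+1-(i+j))\beta^{i+j}\\
&= \beta^{i+j}\big[(n+1)^2 - (n+1)(i+j) + ij - (n+1-(i+j))\big]\\
&= \beta^{i+j}\big[(n+1)(n+1-(i+j)) + ij - (n+1-(i+j))\big]\\
&= \beta^{i+j}\big[n(n+1-(i+j)) + ij\big] \ge 0 \quad \text{for } i+j \le n
\end{aligned}$$

or

$$\alpha_{i+j} \le \alpha_i\alpha_j \quad \text{for } 0 \le i, j \le n \text{ and } i+j \le n. \tag{14}$$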

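It follows from (13) and (14) that

$$|x_i\varepsilon^i \cdot y_j\varepsilon^j|_{\tilde{*}} = \sqrt{\alpha_{i+j}}\,|x_iy_j| \le \sqrt{\alpha_i}\sqrt{\alpha_j}\,|x_i|\,|y_j| = \sqrt{\alpha_i}\,|x_i| \cdot \sqrt{\alpha_j}\,|y_j|,$$

which proves that (12) is also true if $i + j \le n$.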
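The inequality (14) rests on the identity $\alpha_i\alpha_j - \alpha_{i+j} = \beta^{i+j}[n(n+1-(i+j)) + ij]$. A short exact-arithmetic check in Python confirms the identity and the sign for small $n$ and a few rational $\beta > 0$; `alpha` and `weights_ok` are hypothetical helper names, and the sample values of $n$ and $\beta$ are arbitrary.

```python
from fractions import Fraction


def alpha(i: int, n: int, beta: Fraction) -> Fraction:
    """alpha_i = (n + 1 - i) * beta**i, the weight used in (7) and (8)."""
    return (n + 1 - i) * beta ** i


def weights_ok(n: int, beta: Fraction) -> bool:
    """Check the closed form of alpha_i alpha_j - alpha_{i+j} and its sign for all i + j <= n."""
    for i in range(n + 1):
        for j in range(n + 1 - i):
            diff = alpha(i, n, beta) * alpha(j, n, beta) - alpha(i + j, n, beta)
            closed_form = beta ** (i + j) * (n * (n + 1 - (i + j)) + i * j)
            if diff != closed_form or diff < 0:
                return False
    return True


print(all(weights_ok(n, b) for n in range(8) for b in (Fraction(1, 3), Fraction(1), Fraction(5, 2))))  # True
```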

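By (9), (10), (11) and (12), $|\cdot|_{\tilde{*}}$ is a homogeneous norm on the real-like algebra $\mathbb{R}(n+1)$. By (7) and (8), we have

$$\Big\|\sum_{i=0}^{n} x_i\varepsilon^i\Big\|_\beta = \Big(\sum_{i=0}^{n} |x_i\varepsilon^i|_{\tilde{*}}^2\Big)^{\frac{1}{2}} = \Big\|\sum_{i=0}^{n} x_i\varepsilon^i\Big\|_{2,\tilde{*}}. \tag{15}$$

It follows from (15) and Proposition 1.2 (iii) that $\|\cdot\|_\beta$ is a norm on $\mathbb{R}(n+1)$.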

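In order to prove that $\mathbb{R}(n+1)$ is a normed algebra via the norm $\|\cdot\|_\beta$, we need only prove

$$\|xy\|_\beta \le \|x\|_\beta\,\|y\|_\beta \quad \text{for } x, y \in \mathbb{R}(n+1). \tag{16}$$

For $x = \sum_{i=0}^{n} x_i\varepsilon^i \in \mathbb{R}(n+1)$, we define

$$\phi(x) = \begin{bmatrix}
x_0 & 0 & 0 & \cdots & 0 & 0\\
x_1\beta^{1/2} & x_0 & 0 & \cdots & 0 & 0\\
x_2\beta & x_1\beta^{1/2} & x_0 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
x_{n-1}\beta^{(n-1)/2} & x_{n-2}\beta^{(n-2)/2} & x_{n-3}\beta^{(n-3)/2} & \cdots & x_0 & 0\\
x_n\beta^{n/2} & x_{n-1}\beta^{(n-1)/2} & x_{n-2}\beta^{(n-2)/2} & \cdots & x_1\beta^{1/2} & x_0
\end{bmatrix}. \tag{17}$$

Then the map $\phi: \mathbb{R}(n+1) \to M_{n+1}(\mathbb{R})$ defined by (17) is an injective algebra homomorphism. Since the entry $x_k\beta^{k/2}$ occurs exactly $n+1-k$ times in $\phi(x)$, (7) gives $\|\phi(x)\|_F = \|x\|_\beta$, where $\|\cdot\|_F$ is the Frobenius norm. Because $\|\cdot\|_F$ makes the matrix algebra $M_{n+1}(\mathbb{R})$ into a normed algebra, we get

$$\|xy\|_\beta = \|\phi(x)\phi(y)\|_F \le \|\phi(x)\|_F\,\|\phi(y)\|_F = \|x\|_\beta\,\|y\|_\beta,$$

which is (16).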
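The final step can be checked numerically. The Python sketch below assumes that the matrix norm is the Frobenius norm $\|\cdot\|_F$ and that the $k$-th subdiagonal of $\phi(x)$ carries $x_k\beta^{k/2}$, so that $\|\phi(x)\|_F^2 = \sum_i (n+1-i)\beta^i x_i^2 = \|x\|_\beta^2$. It checks this equality, the homomorphism property $\phi(xy) = \phi(x)\phi(y)$, and counts violations of (16) on a small grid; the helper names and sample values are illustrative.

```python
import itertools
import math
from typing import List

Vec = List[float]
Mat = List[List[float]]


def mul(a: Vec, b: Vec) -> Vec:
    """Product in R(n+1); a and b have n+1 coefficients and eps^(n+1) = 0."""
    m = len(a)
    out = [0.0] * m
    for i in range(m):
        for j in range(m - i):
            out[i + j] += a[i] * b[j]
    return out


def beta_norm(x: Vec, beta: float) -> float:
    """||x||_beta from (7): sqrt(sum_i (n + 1 - i) beta^i x_i^2)."""
    n = len(x) - 1
    return math.sqrt(sum((n + 1 - i) * beta ** i * x[i] ** 2 for i in range(n + 1)))


def phi(x: Vec, beta: float) -> Mat:
    """Lower-triangular Toeplitz matrix with x_k * beta^(k/2) on the k-th subdiagonal."""
    m = len(x)
    return [[x[r - c] * beta ** ((r - c) / 2) if r >= c else 0.0 for c in range(m)] for r in range(m)]


def matmul(p: Mat, q: Mat) -> Mat:
    m = len(p)
    return [[sum(p[r][k] * q[k][c] for k in range(m)) for c in range(m)] for r in range(m)]


def frobenius(p: Mat) -> float:
    return math.sqrt(sum(v * v for row in p for v in row))


def count_violations(beta: float, size: int = 3, values=(-1.0, 0.0, 1.0, 2.0)) -> int:
    """Count grid pairs with ||xy||_beta > ||x||_beta ||y||_beta (expected: 0)."""
    bad = 0
    for u in itertools.product(values, repeat=size):
        for v in itertools.product(values, repeat=size):
            lhs = beta_norm(mul(list(u), list(v)), beta)
            rhs = beta_norm(list(u), beta) * beta_norm(list(v), beta)
            if lhs > rhs + 1e-9:
                bad += 1
    return bad


x, y, beta = [1.0, -2.0, 0.5, 3.0], [0.5, 1.0, -1.0, 2.0], 0.7
print(math.isclose(beta_norm(x, beta), frobenius(phi(x, beta))))  # True
print(count_violations(0.7), count_violations(3.0))                # 0 0
```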

## 2 Automatic Differentiation induced by R(4)

The following definition is our way of conceptualizing automatic differentiation mathematically.

###### Definition 2.1

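Let $A = \bigoplus_{i=0}^{n} A_i$ be a real-like algebra, let $D^n(\mathbb{R},\mathbb{R})$ denote the algebra of $n$-times differentiable functions from $\mathbb{R}$ to $\mathbb{R}$, and let $F(A, A)$ denote the algebra of maps from $A$ to $A$. A $3$-tuple $(\Lambda, \Omega, \{\Gamma_i\}_{i=1}^{n})$ consisting of an algebra homomorphism $\Lambda: D^n(\mathbb{R},\mathbb{R}) \to F(A, A)$, a map $\Omega: \mathbb{R} \to A$ and a family of maps $\Gamma_i: A \to \mathbb{R}$ is called the graded automatic differentiation induced by $A$ on $D^n(\mathbb{R},\mathbb{R})$, or the graded $A$-automatic differentiation, if the following three conditions are satisfied:

(i)

$\Lambda$ extends each function in $D^n(\mathbb{R},\mathbb{R})$, i.e., $\Lambda(f)|_{\mathbb{R}} = f$ for all $f \in D^n(\mathbb{R},\mathbb{R})$;

(ii)

$\Omega$ preserves the invertible real numbers, i.e., $\Omega(c)$ is an invertible element of $A$ for each non-zero real number $c$;

(iii)

$\Lambda$ preserves the composition of two differentiable functions, i.e.,

$$\Lambda(f \circ g) = \Lambda(f) \circ \Lambda(g), \tag{18}$$

and the map $\Gamma_i: A \to \mathbb{R}$ for $1 \le i \le n$, which is called the $i$-th derivative map, has the following property:

$$\big(\Gamma_i \circ \Lambda(f) \circ \Omega\big)(c) = \frac{d^if}{dx^i}(c), \tag{19}$$

where $f, g \in D^n(\mathbb{R},\mathbb{R})$, $c \in \mathbb{R}$ and $1 \le i \le n$.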

Just as first-order automatic differentiation depends on one parameter (denoted by […] in Section 3.1.1 of […]), higher-order automatic differentiation depends on several parameters. Different choices of these parameters give different ways of doing higher-order automatic differentiation.

We now explain how to obtain the graded automatic differentiation induced by $\mathbb{R}(4)$ on $D^3(\mathbb{R},\mathbb{R})$.

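Let $\alpha$, $\beta$ and $\gamma$ be three real constants. For $f \in D^3(\mathbb{R},\mathbb{R})$, we define the maps $\Lambda(f): \mathbb{R}(4) \to \mathbb{R}(4)$ and $\Omega_{\alpha,\beta,\gamma}: \mathbb{R} \to \mathbb{R}(4)$ by

$$\begin{aligned}
\Lambda(f)(x + a_1\varepsilon + a_2\varepsilon^2 + a_3\varepsilon^3) :={}& f(x) + a_1f'(x)\varepsilon\\
&+ \Big(a_2f'(x) + \tfrac{1}{2}a_1^2f''(x)\Big)\varepsilon^2 + \Big(a_3f'(x) + a_1a_2f''(x) + \tfrac{1}{6}a_1^3f'''(x)\Big)\varepsilon^3
\end{aligned} \tag{20}$$

and

$$\Omega_{\alpha,\beta,\gamma}(x) := x + \alpha\varepsilon + \beta\varepsilon^2 + \gamma\varepsilon^3, \tag{21}$$

where $x, a_1, a_2, a_3 \in \mathbb{R}$.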
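Formula (20) is the truncated Taylor expansion of $f(x + h)$ with $h = a_1\varepsilon + a_2\varepsilon^2 + a_3\varepsilon^3$, because $h^2 = a_1^2\varepsilon^2 + 2a_1a_2\varepsilon^3$ and $h^3 = a_1^3\varepsilon^3$ in $\mathbb{R}(4)$. The Python sketch below implements $\Lambda$ and $\Omega_{\alpha,\beta,\gamma}$; passing a function as the tuple $(f, f', f'', f''')$ of callables is our representation choice, not the paper's. On polynomial examples it compares $\Lambda(fg)$ with $\Lambda(f)\Lambda(g)$, and $\Lambda(f \circ g)$ with $\Lambda(f) \circ \Lambda(g)$ as in (18).

```python
from typing import Callable, List, Sequence, Tuple

Jet = List[float]  # y_0 + y_1 eps + y_2 eps^2 + y_3 eps^3 in R(4)
Derivs = Tuple[Callable[[float], float], ...]  # (f, f', f'', f''')


def mul4(a: Sequence[float], b: Sequence[float]) -> Jet:
    """Product in R(4) (eps^4 = 0)."""
    out = [0.0] * 4
    for i in range(4):
        for j in range(4 - i):
            out[i + j] += a[i] * b[j]
    return out


def lam(f: Derivs, z: Sequence[float]) -> Jet:
    """Lambda(f)(x + a1 eps + a2 eps^2 + a3 eps^3) as defined in (20)."""
    x, a1, a2, a3 = z
    f0, f1, f2, f3 = (d(x) for d in f)
    return [
        f0,
        a1 * f1,
        a2 * f1 + 0.5 * a1 ** 2 * f2,
        a3 * f1 + a1 * a2 * f2 + a1 ** 3 * f3 / 6,
    ]


def omega(x: float, alpha: float, beta: float, gamma: float) -> Jet:
    """Omega_{alpha,beta,gamma}(x) = x + alpha eps + beta eps^2 + gamma eps^3, see (21)."""
    return [x, alpha, beta, gamma]


# Polynomial examples with hand-computed derivatives.
sq: Derivs = (lambda t: t ** 2, lambda t: 2 * t, lambda t: 2.0, lambda t: 0.0)
cube: Derivs = (lambda t: t ** 3, lambda t: 3 * t ** 2, lambda t: 6 * t, lambda t: 6.0)
fifth: Derivs = (lambda t: t ** 5, lambda t: 5 * t ** 4, lambda t: 20 * t ** 3, lambda t: 60 * t ** 2)  # sq * cube
sixth: Derivs = (lambda t: t ** 6, lambda t: 6 * t ** 5, lambda t: 30 * t ** 4, lambda t: 120 * t ** 3)  # sq o cube

z = omega(1.5, 0.7, -0.3, 2.0)
print(lam(fifth, z), mul4(lam(sq, z), lam(cube, z)))  # Lambda(fg) vs Lambda(f) * Lambda(g)
print(lam(sixth, z), lam(sq, lam(cube, z)))           # Lambda(f o g) vs Lambda(f) o Lambda(g)
```

The two printed lists in each line agree up to floating-point rounding; this is a spot check on examples, not a proof of the theorem that follows.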

The following result, the main theorem of this paper, presents a new technique of automatic differentiation that computes the first, second and third derivatives exactly and simultaneously.

###### Proposition 2.1

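(The Main Theorem) Let $\alpha$, $\beta$ and $\gamma$ be three real constants with $\alpha \neq 0$. If the maps $\Omega_{\alpha,\beta,\gamma}$ and $\Lambda$ are defined by (21) and (20), then the $3$-tuple $(\Lambda, \Omega_{\alpha,\beta,\gamma}, \{\Gamma_i\}_{i=1}^{3})$ is the graded $\mathbb{R}(4)$-automatic differentiation, where the $i$-th derivative map $\Gamma_i: \mathbb{R}(4) \to \mathbb{R}$ for each $1 \le i \le 3$ is defined by

$$\begin{cases}
\Gamma_1(y) := \dfrac{1}{\alpha}y_1,\\[2mm]
\Gamma_2(y) := \dfrac{2}{\alpha^2}y_2 - \dfrac{2\beta}{\alpha^3}y_1,\\[2mm]
\Gamma_3(y) := \dfrac{6}{\alpha^3}y_3 - \dfrac{12\beta}{\alpha^4}y_2 + \Big(\dfrac{12\beta^2}{\alpha^5} - \dfrac{6\gamma}{\alpha^4}\Big)y_1
\end{cases} \tag{22}$$

for $y = y_0 + y_1\varepsilon + y_2\varepsilon^2 + y_3\varepsilon^3 \in \mathbb{R}(4)$ with $y_0, y_1, y_2, y_3 \in \mathbb{R}$.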
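To see the main theorem at work, one can evaluate $y = \Lambda(f)(\Omega_{\alpha,\beta,\gamma}(c))$ with ordinary $\mathbb{R}(4)$ arithmetic and then apply (22). The Python sketch below does this for the example $f(x) = x\sin x$ (our choice, with $f' = \sin x + x\cos x$, $f'' = 2\cos x - x\sin x$, $f''' = -3\sin x - x\cos x$), building $\Lambda(f)$ as $\Lambda(\mathrm{id}) \cdot \Lambda(\sin)$ via the homomorphism property; helper names and sample parameters are hypothetical. For each sampled $(\alpha, \beta, \gamma)$ with $\alpha \neq 0$ it should recover the same three derivatives.

```python
import math
from typing import Callable, List, Sequence, Tuple


def mul4(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Product in R(4) (eps^4 = 0)."""
    out = [0.0] * 4
    for i in range(4):
        for j in range(4 - i):
            out[i + j] += a[i] * b[j]
    return out


def lam(f: Tuple[Callable[[float], float], ...], z: Sequence[float]) -> List[float]:
    """Lambda(f)(z) from (20), with f given as (f, f', f'', f''')."""
    x, a1, a2, a3 = z
    f0, f1, f2, f3 = (d(x) for d in f)
    return [f0, a1 * f1, a2 * f1 + 0.5 * a1 ** 2 * f2, a3 * f1 + a1 * a2 * f2 + a1 ** 3 * f3 / 6]


def gammas(y: Sequence[float], alpha: float, beta: float, gamma: float) -> Tuple[float, float, float]:
    """The derivative maps Gamma_1, Gamma_2, Gamma_3 from (22); requires alpha != 0."""
    _, y1, y2, y3 = y
    g1 = y1 / alpha
    g2 = 2 * y2 / alpha ** 2 - 2 * beta * y1 / alpha ** 3
    g3 = (6 * y3 / alpha ** 3 - 12 * beta * y2 / alpha ** 4
          + (12 * beta ** 2 / alpha ** 5 - 6 * gamma / alpha ** 4) * y1)
    return g1, g2, g3


SIN = (math.sin, math.cos, lambda t: -math.sin(t), lambda t: -math.cos(t))


def derivatives_x_sin_x(c: float, alpha: float, beta: float, gamma: float) -> Tuple[float, float, float]:
    z = [c, alpha, beta, gamma]  # Omega_{alpha,beta,gamma}(c), see (21)
    y = mul4(z, lam(SIN, z))     # Lambda(id)(z) = z, and Lambda(id * sin) = Lambda(id) * Lambda(sin)
    return gammas(y, alpha, beta, gamma)


c = 0.8
exact = (math.sin(c) + c * math.cos(c), 2 * math.cos(c) - c * math.sin(c), -3 * math.sin(c) - c * math.cos(c))
for params in [(1.0, 0.0, 0.0), (0.5, -2.0, 3.0), (2.0, 0.25, -1.0)]:
    print(params, derivatives_x_sin_x(c, *params), exact)
```

Choosing $\alpha = 1$, $\beta = \gamma = 0$ reduces (22) to $\Gamma_1(y) = y_1$, $\Gamma_2(y) = 2y_2$, $\Gamma_3(y) = 6y_3$, i.e. the usual Taylor-coefficient scaling; other choices only change which combination of $y_1, y_2, y_3$ is read off.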

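Proof First, let $1_{D^3(\mathbb{R},\mathbb{R})}$ and $1_{F(\mathbb{R}(4),\mathbb{R}(4))}$ be the identity of the algebra $D^3(\mathbb{R},\mathbb{R})$ and of the algebra $F(\mathbb{R}(4),\mathbb{R}(4))$, respectively. By (20), we have

$$\Lambda(1_{D^3(\mathbb{R},\mathbb{R})}) = 1_{F(\mathbb{R}(4),\mathbb{R}(4))}. \tag{23}$$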

Let $f, g \in D^3(\mathbb{R},\mathbb{R})$. Clearly, we have

$$\Lambda(f + g) = \Lambda(f) + \Lambda(g). \tag{24}$$

Note that

$$(fg)' = f'g + fg', \qquad (fg)'' = f''g + 2f'g' + fg'' \tag{25}$$

and

$$(fg)''' = f'''g + 3f''g' + 3f'g'' + fg'''. \tag{26}$$

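Let $x + a_1\varepsilon + a_2\varepsilon^2 + a_3\varepsilon^3 \in \mathbb{R}(4)$, where $x, a_1, a_2, a_3 \in \mathbb{R}$. By (20), (25) and (26), we have (with $f$, $g$ and their derivatives evaluated at $x$)

$$\begin{aligned}
\Lambda(fg)(x + a_1\varepsilon + a_2\varepsilon^2 + a_3\varepsilon^3) &= fg + a_1(fg)'\varepsilon + \Big[a_2(fg)' + \tfrac{1}{2}a_1^2(fg)''\Big]\varepsilon^2 + \Big[a_3(fg)' + a_1a_2(fg)'' + \tfrac{1}{6}a_1^3(fg)'''\Big]\varepsilon^3\\
&= fg + a_1(f'g + fg')\varepsilon + \Big[a_2(f'g + fg') + \tfrac{1}{2}a_1^2(f''g + 2f'g' + fg'')\Big]\varepsilon^2\\
&\quad + \Big[a_3(f'g + fg') + a_1a_2(f''g + 2f'g' + fg'') + \tfrac{1}{6}a_1^3(f'''g + 3f''g' + 3f'g'' + fg''')\Big]\varepsilon^3
\end{aligned} \tag{27}$$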

and

 (Λ(f)⋅Λ(g))(x+a1ε+a2ε2+a3ε3) = Λ(f)(x+a1ε+a2ε2+a3ε3)⋅Λ(g)(x+a1ε+a2ε2+a3ε3) = ⋅[g+a1g′ε+(a2g′+12a21g′′)ε2+(a3g′+a1a2g′′+16a31g′′′)ε3] = fg+(a1f′⋅g+f⋅a1g′)ε+ +[f⋅(a2g′1+12a21g′′2)+a1f′⋅a1g′2+(a2f′1+12a21f′′2)⋅g]ε2+ +[f⋅(a3g