# On Voronoi diagrams and dual Delaunay complexes on the information-geometric Cauchy manifolds

We study the Voronoi diagrams of a finite set of Cauchy distributions and their dual complexes from the viewpoint of information geometry by considering the Fisher-Rao distance, the Kullback-Leibler divergence, the chi-square divergence, and a flat divergence derived from Tsallis' quadratic entropy related to the conformal flattening of the Fisher-Rao curved geometry. We prove that the Voronoi diagrams of the Fisher-Rao distance, the chi-square divergence, and the Kullback-Leibler divergence all coincide with a hyperbolic Voronoi diagram on the corresponding Cauchy location-scale parameters, and that the dual Cauchy hyperbolic Delaunay complexes are Fisher orthogonal to the Cauchy hyperbolic Voronoi diagrams. The dual Voronoi diagrams with respect to the dual forward/reverse flat divergences amount to dual Bregman Voronoi diagrams, and their dual complexes are regular triangulations. The primal Bregman-Tsallis Voronoi diagram corresponds to the hyperbolic Voronoi diagram and the dual Bregman-Tsallis Voronoi diagram coincides with the ordinary Euclidean Voronoi diagram. Moreover, we prove that the square root of the Kullback-Leibler divergence between Cauchy distributions yields a metric distance which is Hilbertian for the Cauchy scale families.


## 1 Introduction

Let $\mathcal{P}=\{P_1,\ldots,P_n\}$ be a finite set of points in a space $\mathcal{X}$ equipped with a measure of dissimilarity $D$. The Voronoi diagram [49] of $\mathcal{P}$ partitions $\mathcal{X}$ into elementary Voronoi cells (also called Dirichlet cells [7]) such that

$$\mathrm{Vor}_D(P_i):=\{X\in\mathcal{X}:\ D(P_i,X)\leq D(P_j,X),\ \forall j\in\{1,\ldots,n\}\} \tag{1}$$

denotes the proximity cell of the point generator $P_i$, i.e., the locus of points closer to $P_i$ than to any other generator $P_j$.

When the dissimilarity is chosen as the Euclidean distance, we recover the ordinary Voronoi diagram [49]. Figure 1 (left) displays the Voronoi cells of an ordinary Voronoi diagram for a given set of generators.

The Voronoi diagram and its dual Delaunay complex [16] are fundamental data structures of computational geometry [11]. These geometric data structures find many applications in robotics, 3D reconstruction, geographic information systems (GISs), etc.; see the textbook [49] for some applications. The Delaunay simplicial complex is obtained by drawing a straight edge between two generators iff their Voronoi cells share an edge. In Euclidean geometry, the Delaunay simplicial complex triangulates the convex hull of the generators, and is called the Delaunay triangulation. Figure 1 (middle, right) depicts the dual Delaunay triangulations of ordinary Voronoi diagrams. In general, when considering an arbitrary dissimilarity $D$, the Delaunay simplicial complex may not triangulate the convex hull of the generators.

When the dissimilarity is oriented or asymmetric, i.e., $D(P,Q)\neq D(Q,P)$ in general, one can define the reverse or dual dissimilarity $D^*(P,Q):=D(Q,P)$. This duality is termed reference duality in [64], and is an involution:

$$(D^*)^*(P,Q)=D(P,Q). \tag{2}$$

The dissimilarity $D$ is called the forward dissimilarity.

In the remainder, we shall use the ‘:’ notational convention [4] between the arguments of the dissimilarity to emphasize that a dissimilarity is asymmetric: $D(P:Q)$. For an oriented dissimilarity $D$, we can define two types of dual Voronoi cells as follows:

$$\mathrm{Vor}_D(P_i):=\{X\in\mathcal{X}:\ D(P_i:X)\leq D(P_j:X),\ \forall j\in\{1,\ldots,n\}\}, \tag{3}$$
$$\mathrm{Vor}^*_D(P_i):=\{X\in\mathcal{X}:\ D(X:P_i)\leq D(X:P_j),\ \forall j\in\{1,\ldots,n\}\} \tag{4}$$
$$=\{X\in\mathcal{X}:\ D^*(P_i:X)\leq D^*(P_j:X),\ \forall j\in\{1,\ldots,n\}\}, \tag{5}$$

with the property that

$$\mathrm{Vor}^*_D(P_i)=\mathrm{Vor}_{D^*}(P_i). \tag{7}$$

That is, the dual Voronoi cell with respect to a dissimilarity $D$ is the primal Voronoi cell for the dual (reverse) dissimilarity $D^*$.

We can build a Voronoi diagram as a minimization diagram [10] by defining the functions $D_i(X):=D(P_i:X)$. Then $X\in\mathrm{Vor}_D(P_i)$ iff $D_i(X)\leq D_j(X)$ for all $j\in\{1,\ldots,n\}$. Thus by building the lower envelope [10] of the functions $D_i$, we get the Voronoi diagram.
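The minimization-diagram view translates directly into a brute-force procedure: evaluate every $D_i$ and take the argmin. The sketch below (with illustrative helper names, not from the paper) labels query points by their closest generator under an arbitrary, possibly asymmetric dissimilarity:

```python
import numpy as np

def voronoi_labels(generators, points, dissimilarity):
    """Assign each point to the generator minimizing D(P_i : X).

    Evaluates the minimization diagram: X lies in Vor_D(P_i) iff
    D(P_i : X) <= D(P_j : X) for all j (lower envelope of the D_i's).
    """
    costs = np.array([[dissimilarity(p, x) for x in points]
                      for p in generators])
    return np.argmin(costs, axis=0)

# Ordinary Voronoi diagram: D is the squared Euclidean distance.
def sq_euclidean(p, x):
    return float(np.sum((np.asarray(p) - np.asarray(x)) ** 2))

generators = [(0.0, 0.0), (1.0, 0.0)]
points = [(0.1, 0.2), (0.9, -0.1)]
labels = voronoi_labels(generators, points, sq_euclidean)
```

Since the generator is always passed as the first argument, swapping the arguments of `dissimilarity` yields the dual Voronoi cells of Eq. 4.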

An important class of smooth asymmetric dissimilarities are the Bregman divergences [12]. A Bregman divergence is defined for a strictly convex and differentiable generator $F$ by

$$B_F(\theta_1:\theta_2):=F(\theta_1)-F(\theta_2)-(\theta_1-\theta_2)^\top\nabla F(\theta_2), \tag{8}$$

where $\nabla F$ denotes the gradient of $F$. In information geometry [13, 4, 45], Bregman divergences are the canonical divergences of dually flat spaces [4]. Dually flat spaces generalize the (self-dual) Euclidean geometry obtained for the generator $F(\theta)=\frac{1}{2}\theta^\top\theta$. In information sciences, dually flat spaces can be obtained as the induced information geometry of the Kullback-Leibler divergence [19] of an exponential family manifold [24, 4] or a mixture manifold [39]. The dual Bregman Voronoi diagrams and their dual regular complexes have been studied in [9].
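As a sanity check of Eq. 8, here is a minimal sketch (function names are ours): for the generator $F(\theta)=\frac{1}{2}\theta^\top\theta$, the Bregman divergence reduces to half the squared Euclidean distance.

```python
import numpy as np

def bregman_divergence(F, grad_F, theta1, theta2):
    """B_F(theta1 : theta2) = F(theta1) - F(theta2) - <theta1 - theta2, grad F(theta2)>."""
    theta1 = np.asarray(theta1, dtype=float)
    theta2 = np.asarray(theta2, dtype=float)
    return F(theta1) - F(theta2) - np.dot(theta1 - theta2, grad_F(theta2))

# Self-dual Euclidean case: F(theta) = (1/2) theta^T theta.
F = lambda t: 0.5 * np.dot(t, t)
grad_F = lambda t: t

d = bregman_divergence(F, grad_F, [2.0, 0.0], [0.0, 1.0])
# Here B_F(t1 : t2) = (1/2) ||t1 - t2||^2 = 0.5 * (4 + 1) = 2.5
```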

In this paper, we study the Voronoi diagrams induced by the Fisher-Rao distance [52, 6, 51], the Kullback-Leibler (KL) divergence [20], and the chi-square divergence [42] for the family of Cauchy distributions. Cauchy distributions are also called Lorentzian distributions in the literature [34, 30].

The paper is organized with our main contributions as follows:

In Section 2, we concisely review the information geometry of the Cauchy family: We first describe the hyperbolic Fisher-Rao geometry in §2.1 and make a connection between the Fisher-Rao distance and the chi square divergence, then we point out the remarkable fact that any -geometry coincides with the Fisher-Rao geometry (§2.2), and we finally present the dually flat geometric structures on the Cauchy manifold related to Tsallis’ quadratic entropy [59] which amount to a conformal flattening of the Fisher-Rao geometry (§2.4). Section 3.3 proves that the square root of the KL divergence between any two Cauchy distributions yields a metric distance (Theorem 3), and that this metric distance can be isometrically embedded in a Hilbert space for the case of the Cauchy scale families (Theorem 4). Section 4 shows that the Cauchy Voronoi diagram induced either by the Fisher-Rao distance, the chi-square divergence, or the Kullback-Leibler divergence (and its square root metrization) all coincide with a hyperbolic Voronoi diagram calculated on the Cauchy location-scale parameters. This result yields a practical and efficient construction algorithm of hyperbolic Cauchy Voronoi diagrams [41] (Theorem 5) and their dual hyperbolic Cauchy Delaunay complexes. We prove that the hyperbolic Cauchy Voronoi diagrams are Fisher orthogonal to the dual Delaunay complexes (Theorem 6). Finally, we conclude this work in §5.

## 2 Information geometry of the Cauchy family

We start by reporting the Fisher-Rao geometry of the Cauchy manifold (§2.1), then show that all $\alpha$-geometries coincide with the Fisher-Rao geometry (§2.2). Then we recall that we can associate an information-geometric structure to any divergence (§2.3), and finally dually flatten this Fisher-Rao geometry using Tsallis' quadratic entropy [59] (§2.4) and a conformal Fisher metric.

### 2.1 Fisher-Rao geometry of the Cauchy manifold

Information geometry [13, 4, 45] investigates the geometry of families of probability measures. The 2D family of Cauchy distributions

$$\mathcal{C}:=\left\{p_\lambda(x):=\frac{s}{\pi\left(s^2+(x-l)^2\right)},\ \lambda:=(l,s)\in\mathbb{H}:=\mathbb{R}\times\mathbb{R}_+\right\} \tag{9}$$

is a location-scale family [33] (and also an elliptical distribution family [32]), where $l$ and $s$ denote the location parameter and the scale parameter, respectively:

$$p_{l,s}(x):=\frac{1}{s}\,p\!\left(\frac{x-l}{s}\right), \tag{10}$$

where

$$p(x):=\frac{1}{\pi(1+x^2)}=:p_{0,1}(x) \tag{11}$$

is the standard Cauchy distribution.

Let $l_\lambda(x):=\log p_\lambda(x)$ denote the log-density. The parameter space $\mathbb{H}$ of the Cauchy family is called the upper plane. The Fisher-Rao geometry [27, 52, 51] of $\mathcal{C}$ consists in modeling $\mathcal{C}$ as a Riemannian manifold by choosing the Fisher information metric [4]

$$g_{FR}(\lambda):=E_{p_\lambda}\!\left[\partial_i l_\lambda(x)\,\partial_j l_\lambda(x)\right] \tag{12}$$

as the Riemannian metric tensor, where $\partial_i:=\frac{\partial}{\partial\lambda^{(i)}}$ for $i\in\{1,2\}$ (i.e., $\partial_1=\frac{\partial}{\partial l}$ and $\partial_2=\frac{\partial}{\partial s}$).

The Fisher-Rao distance is then defined as the Riemannian geodesic length distance on the Cauchy manifold $\mathcal{C}$:

$$\rho_{FR}(p_{\lambda_1},p_{\lambda_2})=\min_{\substack{\lambda(t)\ \text{such that}\\ \lambda(0)=\lambda_1,\ \lambda(1)=\lambda_2}}\int_0^1\sqrt{\left(\frac{d\lambda(t)}{dt}\right)^\top g_{FR}(\lambda(t))\,\frac{d\lambda(t)}{dt}}\,dt. \tag{13}$$

The Fisher information metric tensor for the Cauchy family [32] is

$$g_{FR}(\lambda)=\frac{1}{2s^2}\begin{bmatrix}1&0\\0&1\end{bmatrix}, \tag{14}$$

where $\lambda=(l,s)$.

A generic formula for the Fisher-Rao distance between two univariate elliptical distributions is reported in [32]. This formula, when instantiated for the Cauchy distributions, yields the following closed form for the Fisher-Rao distance:

$$\rho_{FR}[p_{l_1,s_1},p_{l_2,s_2}]=\frac{1}{\sqrt{2}}\left|\log\frac{\tan\left(\frac{\psi_1}{2}\right)}{\tan\left(\frac{\psi_2}{2}\right)}\right|, \tag{15}$$

where

$$\psi_i=\arcsin\left(\frac{s_i}{A}\right),\quad i\in\{1,2\}, \tag{16}$$
$$A^2=s_1^2+\frac{\left((l_2-l_1)^2-(s_1^2-s_2^2)\right)^2}{4(l_2-l_1)^2}. \tag{17}$$

However, by noticing that the metric tensor for the Cauchy family of Eq. 14 is equal to the scaled metric tensor of the Poincaré (P) hyperbolic upper plane [5]:

$$g_P(x,y)=\frac{1}{y^2}\begin{bmatrix}1&0\\0&1\end{bmatrix}, \tag{18}$$

we get a relationship between the infinitesimal lengths (line elements) $ds_{FR}$ and $ds_P$ as follows:

$$ds_{FR}=\frac{1}{\sqrt{2}}\,ds_P. \tag{19}$$

It follows that the Fisher-Rao distance between two Cauchy distributions is simply obtained by rescaling the 2D hyperbolic distance expressed in the Poincaré upper plane [5]:

$$\rho_{FR}[p_{l_1,s_1},p_{l_2,s_2}]=\frac{1}{\sqrt{2}}\,\rho_P(l_1,s_1;l_2,s_2), \tag{20}$$

where

$$\rho_P(l_1,s_1;l_2,s_2):=\mathrm{arccosh}\left(1+\delta(l_1,s_1;l_2,s_2)\right), \tag{21}$$

with

$$\mathrm{arccosh}(x):=\log\left(x+\sqrt{x^2-1}\right),\quad x>1, \tag{22}$$

and

$$\delta(l_1,s_1;l_2,s_2):=\frac{(l_2-l_1)^2+(s_2-s_1)^2}{2s_1s_2}. \tag{23}$$

This latter term shall naturally appear in §2.4 when studying the dually flat space obtained by conformally flattening the Fisher-Rao geometry. The expression of Eq. 23 can be interpreted as a conformal divergence for the squared Euclidean distance [47].

We may also write the delta term using the 2D Cartesian coordinates as

$$\delta(\lambda_1,\lambda_2):=\frac{\left(\lambda_2^{(1)}-\lambda_1^{(1)}\right)^2+\left(\lambda_2^{(2)}-\lambda_1^{(2)}\right)^2}{2\lambda_1^{(2)}\lambda_2^{(2)}}, \tag{24}$$

where $\lambda_i=(\lambda_i^{(1)},\lambda_i^{(2)}):=(l_i,s_i)$.

In particular, when $l_1=l_2=l$, we get the simplified Fisher-Rao distance for the Cauchy scale family:

$$\rho_{FR}[p_{l,s_1},p_{l,s_2}]=\frac{1}{\sqrt{2}}\left|\log\left(\frac{s_1}{s_2}\right)\right|. \tag{25}$$
###### Proposition 1.

The Fisher-Rao distance between two Cauchy distributions is

$$\rho_{FR}[p_{l_1,s_1},p_{l_2,s_2}]=\begin{cases}\frac{1}{\sqrt{2}}\left|\log\frac{s_1}{s_2}\right|&\text{when }l_1=l_2,\\[6pt]\frac{1}{\sqrt{2}}\,\mathrm{arccosh}\left(1+\frac{(l_2-l_1)^2+(s_2-s_1)^2}{2s_1s_2}\right)&\text{when }l_1\neq l_2.\end{cases}$$

The Fisher-Rao manifold of Cauchy distributions has constant negative scalar curvature; see [32] for detailed calculations.
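Proposition 1 is straightforward to implement; the sketch below (function name is ours) computes the Fisher-Rao distance between two Cauchy distributions:

```python
import math

def fisher_rao_cauchy(l1, s1, l2, s2):
    """Fisher-Rao distance between Cauchy(l1, s1) and Cauchy(l2, s2):
    the Poincare upper-plane distance rescaled by 1/sqrt(2) (Proposition 1)."""
    if l1 == l2:
        return abs(math.log(s1 / s2)) / math.sqrt(2)
    delta = ((l2 - l1) ** 2 + (s2 - s1) ** 2) / (2 * s1 * s2)
    return math.acosh(1 + delta) / math.sqrt(2)

# The two branches agree when l1 = l2 since
# arccosh(1 + (s1 - s2)^2 / (2 s1 s2)) = |log(s1 / s2)|.
```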

###### Remark 1.

It is well known that the Fisher-Rao geometry of location-scale families amounts to a hyperbolic geometry [33]. For $d$-variate scale-isotropic Cauchy distributions with $\lambda=(l,s)\in\mathbb{R}^d\times\mathbb{R}_+$, the Fisher information metric is $g_{FR}(\lambda)=\frac{1}{2s^2}I$, where $I$ denotes the identity matrix. It follows that

$$\rho_{FR}[p_{l_1,s_1},p_{l_2,s_2}]=\frac{1}{\sqrt{2}}\,\mathrm{arccosh}\left(1+\Delta(l_1,s_1;l_2,s_2)\right), \tag{26}$$

where

$$\Delta(l_1,s_1;l_2,s_2):=\frac{\|l_2-l_1\|^2+(s_2-s_1)^2}{2s_1s_2}, \tag{27}$$

and $\|\cdot\|$ denotes the $d$-dimensional Euclidean norm. That is, $\rho_{FR}$ is the scaled $(d+1)$-dimensional real hyperbolic distance [5] expressed in the Poincaré upper-space model.

Recently, the Riemannian geometry of location-scale models has also been studied from the complementary viewpoint of warped metrics [14, 54].

### 2.2 The dualistic α-geometry of the statistical Cauchy manifold

A statistical manifold [29] is a triplet $(M,g,T)$ where $g$ is a Riemannian metric tensor and $T$ a cubic totally symmetric tensor (i.e., $T_{ijk}=T_{\sigma(i)\sigma(j)\sigma(k)}$ for any permutation $\sigma$ of the indices). For a parametric family of densities $p_\lambda(x)$, the cubic tensor $T$ is called the skewness tensor [4], and is defined by

$$T_{ijk}(\lambda):=E_{p_\lambda}\!\left[\partial_i l_\lambda(x)\,\partial_j l_\lambda(x)\,\partial_k l_\lambda(x)\right]. \tag{28}$$

A statistical manifold structure allows one to construct Amari's dualistic $\alpha$-geometry [4] for any $\alpha\in\mathbb{R}$: namely, a quadruplet $(M,g,\nabla^{-\alpha},\nabla^{\alpha})$ where $\nabla^{-\alpha}$ and $\nabla^{\alpha}$ are dual affine connections (i.e., $(\nabla^{\alpha})^*=\nabla^{-\alpha}$). We refer the reader to the textbook [4] and the overview [45] for further details concerning the dual torsion-free affine connections coupled with the metric tensor.

The Fisher-Rao geometry corresponds to the $0$-geometry, i.e., the self-dual geometry where $\nabla^0$ is the Levi-Civita metric connection [4]: $(\nabla^0)^*=\nabla^0$.

In information geometry, the invariance principle states that the geometry should be invariant under the transformation of a random variable $X$ to $Y=t(X)$ provided that $t$ is a sufficient statistic [4]. The $\alpha$-geometries are invariant geometries [4, 45].

A remarkable fact is that all the $\alpha$-geometries of the Cauchy family coincide with the Fisher-Rao geometry since the cubic skewness tensor vanishes everywhere [32], i.e., $T=0$. The non-zero coefficients of the Christoffel symbols of the $\alpha$-connections (including the Levi-Civita metric connection derived from the Fisher metric tensor) are:

$${}^{\alpha}\Gamma^{1}_{12}={}^{\alpha}\Gamma^{1}_{21}={}^{\alpha}\Gamma^{2}_{22}=-\frac{1}{s}, \tag{29}$$
$${}^{\alpha}\Gamma^{2}_{11}=\frac{1}{s}. \tag{30}$$

All $\alpha$-geometries coincide and have constant negative scalar curvature. In other words, we cannot choose a value of $\alpha$ that makes the Cauchy manifold dually flat [4]. To contrast with this result, Mitchell [32] reported values of $\alpha$ for which the $\alpha$-geometry is dually flat for some parametric location-scale families of distributions: for example, it is well known that the manifold of univariate Gaussian distributions is $\pm1$-flat [4]. The manifold of Student $t$-distributions with fixed degrees of freedom is proven dually flat for a suitable value of $\alpha$ depending on the degrees of freedom [32]. Dually flat manifolds are Hessian manifolds [56] with dual geodesics being straight lines in one of the two dual global affine coordinate systems. On a global Hessian manifold, the canonical divergences are Bregman divergences. Thus these dually flat Bregman manifolds are computationally friendly [9], as many techniques of computational geometry can be naturally extended to these spaces [40].

### 2.3 Dualistic structures induced by a divergence

A divergence or contrast function [24] is a smooth parametric dissimilarity $D$. Let $M$ denote the manifold of its parameter space. Eguchi [24] showed how to associate to any divergence $D$ a canonical information-geometric structure $(M,g^D,\nabla^D,\nabla^{D^*})$. Moreover, the construction allows one to prove that $(\nabla^D)^*=\nabla^{D^*}$ (see [4, 45] for details). That is, the dual affine connection associated to $D$ coincides with the primal connection associated to the dual divergence $D^*$. Conversely, Matsumoto [31] proved that given an information-geometric structure $(M,g,\nabla,\nabla^*)$, one can build a divergence $D$ from which that structure is derived. Thus when calculating the Voronoi diagram for an arbitrary divergence $D$, we may use the induced information-geometric structure to investigate some of its properties. For example, is the bisector $\nabla^D$-autoparallel? Or is the bisector of two generators orthogonal, with respect to the metric $g^D$, to their $\nabla^D$-geodesic? Section 4 will study these questions.

### 2.4 Dually flat geometry of the Cauchy manifold by conformal flattening

The Cauchy distributions are usually handled in information geometry using the wider scope of $q$-Gaussians [35, 30, 4] (deformed exponential families [62]), which also include the Student $t$-distributions. Cauchy distributions are $q$-Gaussians for $q=2$. These $q$-Gaussians [58] can be obtained as maximum entropy distributions with respect to Tsallis' entropy [59] (see Theorem 4.12 of [4]):

$$T_q(p):=\frac{1}{q-1}\left(1-\int_{-\infty}^{\infty}p^q(x)\,dx\right). \tag{31}$$

When $q=2$, we have the following Tsallis quadratic entropy:

$$T_2(p):=1-\int_{-\infty}^{\infty}p^2(x)\,dx. \tag{32}$$

That is, $q$-Gaussians are $q$-exponential families [34], generalizing the maximum entropy exponential families derived from Shannon entropy [3]. The integral $\int p^2(x)\,dx$ corresponds to Onicescu's informational energy [50, 46].

A dually flat structure construction for $q$-Gaussians is reported in [4] (Sec. 4.3, p. 84–89). We instantiate this construction for the Cauchy distributions ($q=2$):

Let

$$\exp_C(u):=\frac{1}{1-u},\quad u\neq 1, \tag{33}$$

denote the deformed exponential (for $q=2$), and

$$\log_C(u):=1-\frac{1}{u},\quad u\neq 0, \tag{34}$$

its compositional inverse, the deformed logarithm. The probability density of a $q$-Gaussian can be factorized as

$$p_\theta(x)=\exp_C\!\left(\theta^\top t(x)-F(\theta)\right), \tag{35}$$

where $\theta=(\theta_1,\theta_2)$ denotes the 2D natural parameter and $t(x):=(x,x^2)$ the sufficient statistics. We have

$$\log_C(p_\theta(x))=1-\frac{1}{p_\theta(x)}=1-\frac{\pi\left(s^2+(x-l)^2\right)}{s}=1-\pi\left(s+\frac{(x-l)^2}{s}\right) \tag{36}$$
$$=:\theta^\top t(x)-F(\theta), \tag{37}$$

with

$$\theta^\top t(x)=\left(\frac{2\pi l}{s}\right)x+\left(-\frac{\pi}{s}\right)x^2\quad\text{and}\quad F(\theta)=\pi s+\frac{\pi l^2}{s}-1. \tag{38}$$

Therefore the natural parameter is $\theta(\lambda)=\left(\frac{2\pi l}{s},-\frac{\pi}{s}\right)$ (for $t(x)=(x,x^2)$), and the deformed log-normalizer is

$$F(\theta(\lambda))=\pi s+\frac{\pi l^2}{s}-1=:F_\lambda(\lambda), \tag{39}$$
$$F(\theta)=-\frac{\pi^2}{\theta_2}-\frac{\theta_1^2}{4\theta_2}-1. \tag{40}$$

In general, we obtain a strictly convex and smooth function $F$, called the deformed free energy, for a $q$-Gaussian family. Here, we consider $q=2$ for the Cauchy family.

We convert back the natural parameter $\theta$ to the ordinary parameter $\lambda$ as follows:

$$\lambda(\theta)=(l,s)=\left(-\frac{\theta_1}{2\theta_2},-\frac{\pi}{\theta_2}\right). \tag{41}$$

The gradient of the deformed log-normalizer is

$$\eta:=\nabla F(\theta)=\left(-\frac{\theta_1}{2\theta_2},\ \frac{\pi^2}{\theta_2^2}+\frac{\theta_1^2}{4\theta_2^2}\right)=\left(l,\ l^2+s^2\right). \tag{42}$$

The gradient $\eta=\nabla F(\theta)$ defines the dual global affine coordinate system, where $H:=\{\nabla F(\theta)\ :\ \theta\in\Theta\}$ is the dual parameter space.

This construction yields the following divergence $D_{\mathrm{flat}}$ [4] between Cauchy densities, which is by construction equivalent to a Bregman divergence between their corresponding natural parameters:

$$D_{\mathrm{flat}}[p_{\lambda_1}:p_{\lambda_2}]:=\frac{1}{\int p^2_{\lambda_2}(x)\,dx}\left(\int\frac{p^2_{\lambda_2}(x)}{p_{\lambda_1}(x)}\,dx-1\right) \tag{43}$$
$$=2\pi s_2\left(\frac{s_1^2+s_2^2+(l_1-l_2)^2}{2s_1s_2}-1\right) \tag{44}$$
$$=2\pi s_2\,\frac{(s_1-s_2)^2+(l_1-l_2)^2}{2s_1s_2} \tag{45}$$
$$=2\pi s_2\,\delta(l_1,s_1;l_2,s_2) \tag{46}$$
$$=B_F(\theta_1:\theta_2), \tag{47}$$

where $\theta_i:=\theta(\lambda_i)$ and $F$ is the deformed log-normalizer of Eq. 40. We term $B_F$ the Bregman-Tsallis quadratic divergence (defined likewise for general $q$-Gaussians).

We used a computer algebra system (CAS, see Appendix A) to calculate the closed forms of the following definite integrals:

$$\int p^2_{\lambda_2}(x)\,dx=\frac{1}{2\pi s_2}, \tag{48}$$
$$\int\frac{p^2_{\lambda_2}(x)}{p_{\lambda_1}(x)}\,dx=\frac{s_1^2+s_2^2+(l_1-l_2)^2}{2s_1s_2}. \tag{49}$$
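These two closed forms are also easy to verify numerically; the sketch below (helper names are ours) integrates over the real line via the substitution $x=\tan(t)$, which compactifies the domain and tames the heavy Cauchy tails:

```python
import numpy as np

def cauchy_pdf(x, l, s):
    return s / (np.pi * (s ** 2 + (x - l) ** 2))

def integrate_real_line(f, n=200001):
    """Trapezoidal rule for int_R f(x) dx using the substitution x = tan(t)."""
    t = np.linspace(-np.pi / 2 + 1e-6, np.pi / 2 - 1e-6, n)
    x = np.tan(t)
    g = f(x) / np.cos(t) ** 2          # dx = sec^2(t) dt
    return float(np.sum((g[1:] + g[:-1]) / 2 * np.diff(t)))

l1, s1, l2, s2 = 0.5, 1.0, -1.0, 2.0
int_sq = integrate_real_line(lambda x: cauchy_pdf(x, l2, s2) ** 2)
int_ratio = integrate_real_line(
    lambda x: cauchy_pdf(x, l2, s2) ** 2 / cauchy_pdf(x, l1, s1))
# int_sq    should approach 1 / (2 pi s2)                          (Eq. 48)
# int_ratio should approach (s1^2 + s2^2 + (l1-l2)^2) / (2 s1 s2)  (Eq. 49)
```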

Here, observe that the equivalent Bregman divergence is not on a swapped parameter order as is the case for ordinary exponential families: $D_{\mathrm{KL}}[p_{\theta_1}:p_{\theta_2}]=B_F(\theta_2:\theta_1)$, where $F$ is the cumulant function of the exponential family; see [4, 45].

We term the divergence $D_{\mathrm{flat}}$ the flat divergence because its induced affine connection [24] has zero curvature (i.e., the Riemann-Christoffel curvature tensor induced by the connection vanishes; see [4], p. 134). We refer to [24] for the construction of the geometry from a divergence (also called a contrast function). Reciprocally, a statistical manifold has a contrast function [31].

Since $D_{\mathrm{flat}}[p_{\lambda_1}:p_{\lambda_2}]=\frac{\pi}{s_1}\|\lambda_1-\lambda_2\|^2$, the flat divergence is interpreted as a conformal squared Euclidean distance [47] with conformal factor $\frac{\pi}{s_1}$. The Fisher-Rao geometry of $q$-Gaussians has constant negative scalar curvature depending on $q$ [58]; instantiating it at $q=2$ recovers the scalar curvature of the Fisher-Rao Cauchy manifold.

###### Theorem 1.

The flat divergence between two Cauchy distributions is equivalent to a Bregman divergence on the corresponding natural parameters, with the following closed-form formula in the ordinary location-scale parameterization:

$$D_{\mathrm{flat}}[p_{\lambda_1}:p_{\lambda_2}]=2\pi s_2\,\delta(l_1,s_1;l_2,s_2)=\frac{\pi}{s_1}\left((s_1-s_2)^2+(l_1-l_2)^2\right)=\frac{\pi}{s_1}\|\lambda_1-\lambda_2\|^2. \tag{50}$$

In general, we call the Bregman divergence arising from the $q$-Gaussian flattening the $q$-Bregman-Tsallis divergence.
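Theorem 1 can be checked numerically by evaluating both sides: the Bregman divergence $B_F$ on the natural parameters of Eqs. 38–40, and the closed form $\frac{\pi}{s_1}\|\lambda_1-\lambda_2\|^2$ (helper names below are ours):

```python
import math

PI = math.pi

def theta_of(l, s):
    """Natural parameters theta(lambda) = (2*pi*l/s, -pi/s)."""
    return (2 * PI * l / s, -PI / s)

def F(theta):
    """Deformed log-normalizer F(theta) = -pi^2/theta2 - theta1^2/(4 theta2) - 1 (Eq. 40)."""
    t1, t2 = theta
    return -PI ** 2 / t2 - t1 ** 2 / (4 * t2) - 1

def grad_F(theta):
    """eta = grad F(theta) = (l, l^2 + s^2) in dual coordinates (Eq. 42)."""
    t1, t2 = theta
    return (-t1 / (2 * t2), PI ** 2 / t2 ** 2 + t1 ** 2 / (4 * t2 ** 2))

def bregman_F(th1, th2):
    g = grad_F(th2)
    gap = sum((a - b) * c for a, b, c in zip(th1, th2, g))
    return F(th1) - F(th2) - gap

def d_flat(l1, s1, l2, s2):
    """Closed form of Theorem 1: (pi/s1) * ||lambda1 - lambda2||^2."""
    return (PI / s1) * ((l1 - l2) ** 2 + (s1 - s2) ** 2)
```

Note that $D_{\mathrm{flat}}$ is asymmetric through its conformal factor $\pi/s_1$, even though $\|\lambda_1-\lambda_2\|^2$ is symmetric.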

The conversion of the $\eta$-coordinates to the $\theta$-coordinates is

$$\theta(\eta)=\begin{bmatrix}\frac{2\pi\eta_1}{\sqrt{\eta_2-\eta_1^2}}\\[6pt]-\frac{\pi}{\sqrt{\eta_2-\eta_1^2}}\end{bmatrix}=:\nabla F^*(\eta), \tag{51}$$

where

$$F^*(\eta):=\theta(\eta)^\top\eta-F(\theta(\eta)) \tag{52}$$

is the Legendre-Fenchel convex conjugate [4]:

$$F^*(\eta)=1-2\pi\sqrt{\eta_2-\eta_1^2}. \tag{53}$$

Since

$$\eta(\lambda)=\eta(\theta(\lambda))=(\lambda_1,\lambda_1^2+\lambda_2^2)=(l,l^2+s^2), \tag{54}$$

we have

$$F^*_\lambda(\lambda):=F^*(\eta(\lambda))=1-2\pi\sqrt{l^2+s^2-l^2}=1-2\pi s, \tag{55}$$

which is independent of the location parameter $l$. Moreover, we have [4]

$$F^*_\lambda(\lambda)=1-\frac{1}{\int p^2_\lambda(x)\,dx}=1-\frac{1}{\frac{1}{2\pi s}}=1-2\pi s. \tag{56}$$

We can convert the dual parameter $\eta$ to the ordinary parameter $\lambda$ as follows:

$$\lambda(\eta)=(l,s)=\left(\eta_1,\sqrt{\eta_2-\eta_1^2}\right). \tag{57}$$

It follows that we have the following equivalent expressions for the flat divergence:

$$D_{\mathrm{flat}}[p_{\lambda_1}:p_{\lambda_2}]=B_F(\theta_1:\theta_2)=B_{F^*}(\eta_2:\eta_1)=A_F(\theta_1:\eta_2)=A_{F^*}(\eta_2:\theta_1), \tag{58}$$

where

$$A_F(\theta_1:\eta_2):=F(\theta_1)+F^*(\eta_2)-\theta_1^\top\eta_2 \tag{59}$$

is the Legendre-Fenchel divergence measuring the inequality gap of the Fenchel-Young inequality:

$$F(\theta_1)+F^*(\eta_2)\geq\theta_1^\top\eta_2. \tag{60}$$

That is, $B_F(\theta_1:\theta_2)=A_F(\theta_1:\eta_2)$, where $\eta_i=\nabla F(\theta_i)$ and $\theta_i=\nabla F^*(\eta_i)$.
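The chain of identities can be verified numerically; the sketch below (helper names are ours) checks that the Fenchel-Young gap $A_F$ vanishes on matched pairs $(\theta(\lambda),\eta(\lambda))$ and otherwise equals the closed form of Theorem 1:

```python
import math

PI = math.pi

def F(t1, t2):            # deformed log-normalizer (Eq. 40)
    return -PI ** 2 / t2 - t1 ** 2 / (4 * t2) - 1

def F_star(e1, e2):       # Legendre-Fenchel conjugate (Eq. 53)
    return 1 - 2 * PI * math.sqrt(e2 - e1 ** 2)

def theta_of(l, s):       # natural coordinates
    return (2 * PI * l / s, -PI / s)

def eta_of(l, s):         # dual coordinates (Eq. 54)
    return (l, l ** 2 + s ** 2)

def A_F(theta, eta):
    """Fenchel-Young gap A_F(theta : eta) = F(theta) + F*(eta) - <theta, eta> (Eq. 59)."""
    (t1, t2), (e1, e2) = theta, eta
    return F(t1, t2) + F_star(e1, e2) - (t1 * e1 + t2 * e2)
```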

The Hessian metrics of the dual convex potential functions $F$ and $F^*$ are:

$$\nabla^2F(\theta)=\begin{bmatrix}-\frac{1}{2\theta_2}&\frac{\theta_1}{2\theta_2^2}\\[6pt]\frac{\theta_1}{2\theta_2^2}&-\frac{\theta_1^2+4\pi^2}{2\theta_2^3}\end{bmatrix}=:g_F(\theta), \tag{61}$$
$$\nabla^2F^*(\eta)=\frac{\pi}{\left(\eta_2-\eta_1^2\right)^{3/2}}\begin{bmatrix}2\eta_2&-\eta_1\\[2pt]-\eta_1&\frac{1}{2}\end{bmatrix}=:g_{F^*}(\eta). \tag{62}$$

We check the Crouzeix identity [21, 45]:

$$\nabla^2F(\theta)\,\nabla^2F^*(\eta(\theta))=\nabla^2F(\theta(\eta))\,\nabla^2F^*(\eta)=I, \tag{64}$$

where $I$ denotes the identity matrix.
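The Crouzeix identity can be probed numerically with the two Hessians, recomputed here from Eqs. 40 and 53 and evaluated at matched coordinates $\theta(\lambda)$ and $\eta(\lambda)$ (helper names are ours):

```python
import numpy as np

PI = np.pi

def hess_F(t1, t2):
    """Hessian of the deformed log-normalizer F(theta) (Eq. 61)."""
    return np.array([[-1 / (2 * t2), t1 / (2 * t2 ** 2)],
                     [t1 / (2 * t2 ** 2), -(t1 ** 2 + 4 * PI ** 2) / (2 * t2 ** 3)]])

def hess_F_star(e1, e2):
    """Hessian of the conjugate F*(eta) (Eq. 62)."""
    r3 = (e2 - e1 ** 2) ** 1.5
    return (PI / r3) * np.array([[2 * e2, -e1],
                                 [-e1, 0.5]])

l, s = 0.4, 1.5
theta = (2 * PI * l / s, -PI / s)      # natural coordinates theta(lambda)
eta = (l, l ** 2 + s ** 2)             # dual coordinates eta(lambda)
product = hess_F(*theta) @ hess_F_star(*eta)   # Crouzeix identity: should be I
```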

The Hessian metric $g_F$ is also called the $q$-Fisher metric [58] (here for $q=2$). Let $g^\theta_{FR}(\theta)$ and $g^\lambda_{FR}(\lambda)$ denote the Fisher information metric expressed in the $\theta$-coordinates and in the $\lambda$-coordinates, respectively. Then we have

$$g^\theta_{FR}(\theta)=\mathrm{Jac}^\top_\lambda(\theta)\times g^\lambda_{FR}(\lambda(\theta))\times\mathrm{Jac}_\lambda(\theta), \tag{65}$$

where $\mathrm{Jac}_\lambda(\theta)$ denotes the Jacobian matrix:

$$\mathrm{Jac}_\lambda(\theta):=\left[\frac{\partial\lambda_i}{\partial\theta_j}\right]_{ij}. \tag{66}$$

Similarly, we can express the Hessian metric using the $\lambda$-coordinate system:

$$g^\lambda_F(\lambda)=\mathrm{Jac}^\top_\theta(\lambda)\times g^\theta_F(\theta(\lambda))\times\mathrm{Jac}_\theta(\lambda). \tag{67}$$

We have the following Jacobian matrices:

$$\mathrm{Jac}_\theta(\lambda)=\begin{bmatrix}\frac{2\pi}{s}&-\frac{2\pi l}{s^2}\\[2pt]0&\frac{\pi}{s^2}\end{bmatrix}, \tag{68}$$

and

$$\mathrm{Jac}_\lambda(\theta)=\begin{bmatrix}-\frac{1}{2\theta_2}&\frac{\theta_1}{2\theta_2^2}\\[2pt]0&\frac{\pi}{\theta_2^2}\end{bmatrix}. \tag{69}$$

We check that we have

$$g^\lambda_F(\lambda)=e^{u(\lambda)}\,g^\lambda_{FR}(\lambda). \tag{70}$$

That is, the Riemannian metric tensors $g_F$ and $g_{FR}$ are conformally equivalent for a smooth function $u(\lambda)$.

This dually flat space construction

$$\left(\mathcal{C},\ g(\theta)=\nabla^2F(\theta),\ \nabla^{D_{\mathrm{flat}}},\ \left(\nabla^{D_{\mathrm{flat}}}\right)^*=\nabla^{D^*_{\mathrm{flat}}}\right)$$

can be interpreted as a conformal flattening of the curved $\alpha$-geometry [58, 4, 48]. The relationships between the curvature tensors of dual connections are studied in [65].

Notice that this dually flat geometry can be recovered from the divergence-based structure of §2.3 by considering the Bregman-Tsallis divergence. Figure 2 illustrates the relationships between the invariant $\alpha$-geometry and the dually flat geometry of the Cauchy manifold. The $q$-Gaussians can further be generalized by deformed families with corresponding deformed logarithm and exponential functions [4, 3]; such deformed families unify the dually flat exponential families with the dually flat mixture families [3]. A statistical dissimilarity between two parametric distributions $p_{\lambda_1}$ and $p_{\lambda_2}$ amounts to an equivalent dissimilarity between their parameters: $D[p_{\lambda_1}:p_{\lambda_2}]=D(\lambda_1:\lambda_2)$. When the parametric dissimilarity is smooth, one can construct the divergence-based geometry [2, 45]. Thus the dually flat space structure of the Cauchy manifold can also be obtained from the divergence-based geometry of the flat divergence (see Figure 2). It can be shown that this dually flat geometry is the unique geometry in the intersection of the conformal Fisher-Rao geometry with the deformed geometry (Theorem 13 of [3]) when the manifold is the positive orthant.

## 3 Invariant divergences: f-divergences and α-divergences

### 3.1 Invariant divergences in information geometry

The $f$-divergence [22, 42] between two densities $p$ and $q$ is defined for a convex function $f$, strictly convex at $1$ and satisfying $f(1)=0$, as

$$I_f[p:q]:=\int_{\mathcal{X}}p(x)\,f\!\left(\frac{q(x)}{p(x)}\right)dx. \tag{71}$$

The KL divergence is an $f$-divergence obtained for the generator $f(u)=-\log u$.

An invariant divergence is a divergence which satisfies the information monotonicity [4]: $D[p_{t(X)}:q_{t(X)}]\leq D[p_X:q_X]$, with equality iff $t$ is a sufficient statistic. The invariant divergences are the $f$-divergences for the simplex sample space [4]. Moreover, the standard $f$-divergences (with $f(1)=0$ and $f''(1)=1$) induce the Fisher information metric as their metric tensor; see [4].

### 3.2 α-Divergences between location-scale densities

Let $I_\alpha[p:q]$ denote the $\alpha$-divergence [4] between $p$ and $q$:

$$I_\alpha[p:q]:=\frac{1}{\alpha(1-\alpha)}\left(1-C_\alpha[p:q]\right),\quad\alpha\notin\{0,1\}, \tag{72}$$

where $C_\alpha[p:q]$ is the Chernoff $\alpha$-coefficient [17, 44]:

$$C_\alpha[p:q]:=\int p^\alpha(x)\,q^{1-\alpha}(x)\,dx \tag{73}$$
$$=\int q(x)\left(\frac{p(x)}{q(x)}\right)^\alpha dx. \tag{74}$$

We have $I_\alpha[p:q]=I_{1-\alpha}[q:p]$.

The $\alpha$-divergences include the chi-square divergence ($\alpha=2$, up to scaling), the squared Hellinger divergence ($\alpha=\frac{1}{2}$), and, in the limit cases, the KL divergence ($\alpha\to1$) and the reverse KL divergence ($\alpha\to0$). The $\alpha$-divergences are $f$-divergences for the generator

$$f_\alpha(u):=\frac{1-u^{1-\alpha}}{\alpha(1-\alpha)},\quad\alpha\notin\{0,1\}. \tag{75}$$

For location-scale families, let

$$C_\alpha(l_1,s_1;l_2,s_2):=C_\alpha[p_{l_1,s_1}:p_{l_2,s_2}]. \tag{76}$$

Using changes of variables in the integrals, one can show that

$$C_\alpha(l_1,s_1;l_2,s_2)=C_\alpha\!\left(0,1;\frac{l_2-l_1}{s_1},\frac{s_2}{s_1}\right) \tag{77}$$
$$=C_\alpha\!\left(\frac{l_1-l_2}{s_2},\frac{s_1}{s_2};0,1\right) \tag{78}$$
$$=C_{1-\alpha}\!\left(0,1;\frac{l_1-l_2}{s_2},\frac{s_1}{s_2}\right) \tag{79}$$
$$=C_{1-\alpha}(l_2,s_2;l_1,s_1). \tag{80}$$

For location-scale families, which include the normal family, the Cauchy family, and the Student $t$-families with fixed degrees of freedom $\nu$, the $\alpha$-divergences are not symmetric in general (e.g., the $\alpha$-divergences between two normal distributions). However, we have shown that the chi-square divergence and the KL divergence are symmetric when the densities belong to the Cauchy family. Thus it is of interest to prove that the $\alpha$-divergences between Cauchy densities are symmetric, and to report their closed-form formula for all $\alpha$.
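The reference-duality identity of Eq. 80, and the symmetry investigated here, can be probed numerically for any fixed $\alpha$; the sketch below (helper names are ours) computes the Chernoff $\alpha$-coefficient between two Cauchy densities by quadrature with the substitution $x=\tan(t)$:

```python
import numpy as np

def cauchy_pdf(x, l, s):
    return s / (np.pi * (s ** 2 + (x - l) ** 2))

def chernoff_coeff(alpha, l1, s1, l2, s2, n=200001):
    """C_alpha(l1,s1; l2,s2) = int p1^alpha p2^(1-alpha) dx  (Eq. 76),
    integrated over R with the change of variable x = tan(t)."""
    t = np.linspace(-np.pi / 2 + 1e-7, np.pi / 2 - 1e-7, n)
    x = np.tan(t)
    f = (cauchy_pdf(x, l1, s1) ** alpha
         * cauchy_pdf(x, l2, s2) ** (1 - alpha)) / np.cos(t) ** 2
    return float(np.sum((f[1:] + f[:-1]) / 2 * np.diff(t)))

# Reference duality (Eq. 80): C_alpha(l1,s1; l2,s2) = C_{1-alpha}(l2,s2; l1,s1).
c3 = chernoff_coeff(3.0, 0.0, 1.0, 1.0, 2.0)
c_dual = chernoff_coeff(-2.0, 1.0, 2.0, 0.0, 1.0)
# Symmetry specific to the Cauchy family (the property investigated here):
c3_swapped = chernoff_coeff(3.0, 1.0, 2.0, 0.0, 1.0)
```

Note that for Cauchy densities the integrand decays like $x^{-2}$ for any $\alpha$, so the Chernoff coefficient converges for all real $\alpha$.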

Using symbolic integration described in Appendix A, we found that

 C3(pλ1;pλ2)=3s42+(2s21+6l22−12l1l2+6l21)s22+3s41+(6l22−12l1l