
Axiomatic characterization of the χ^2 dissimilarity measure

We axiomatically characterize the χ^2 dissimilarity measure. To this end, we solve a new generalization of a functional equation discussed in Aczél (Lectures on Functional Equations and Their Applications, Academic Press, 1966).

1 Introduction

Let $C$ be a set of categories (with $\#C = k \geq 2$). The vector $x = (x_c)_{c \in C}$ represents the respective numbers of observations in each category and the total number of observations is denoted by $n(x) = \sum_{c \in C} x_c$. We want to measure the dissimilarity between the observed distribution $x$ and a reference distribution $q = (q_c)_{c \in C}$, with $\sum_{c \in C} q_c = 1$ and $q_c \in \mathbb{Q}_{>0}$ for all $c \in C$, where $\mathbb{Q}_{>0}$ is the set of positive rational numbers. We exclude reference distributions with null components because the dissimilarity measure is not defined when a component is zero. The set of all observed distributions is $\mathbb{N}^C$, i.e. the set of all mappings from $C$ to $\mathbb{N}$, where $\mathbb{N}$ is the set of non-negative integers. The set of all reference distributions is defined by $\mathcal{Q} = \{q \in \mathbb{Q}_{>0}^C : \sum_{c \in C} q_c = 1\}$.

A dissimilarity measure is a mapping $f$ from $\mathbb{N}^C \times \mathcal{Q}$ to $\mathbb{R}_{\geq 0}$ (the set of non-negative real numbers) satisfying $f(x, q) = 0$ iff $x = n(x)\, q$. It measures how far the observed distribution $x$ is from the reference $q$. In this paper, we axiomatically characterize the dissimilarity measure defined by

$$\chi^2_1(x, q) = \sum_{c \in C} \frac{(x_c - n q_c)^2}{n q_c}, \qquad \text{with } n = n(x),$$

and frequently used in statistics, as Pearson's statistic, for measuring goodness of fit.
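As a quick numerical illustration of this definition (the helper name chi2_stat and the example counts are ours, not part of the paper), the statistic can be computed directly from the counts and the reference distribution, here with exact rational arithmetic:

```python
from fractions import Fraction

def chi2_stat(x, q):
    """Pearson chi-squared dissimilarity between observed counts x
    and a reference distribution q (positive rationals summing to 1)."""
    n = sum(x)
    return sum((xc - n * qc) ** 2 / (n * qc) for xc, qc in zip(x, q))

# Observed counts over k = 3 categories against a uniform reference.
x = [4, 7, 1]
q = [Fraction(1, 3)] * 3
print(chi2_stat(x, q))            # the degree-1 (Pearson) statistic
print(chi2_stat(x, q) / sum(x))   # the size-free version discussed below
```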

The dissimilarity measure defined by $\chi^2_0(x, q) = \chi^2_1(x, q)/n = \sum_{c \in C} (x_c/n - q_c)^2 / q_c$ has been characterized in [7] and we will also provide a new characterization thereof. It is popular in ecology [6], sociology [8], economics [9], and so on.

While we consider in our paper that the number of categories $k$ is given and fixed, [7] considers that $k$ can vary. Depending on the context, one or the other assumption can be more relevant. For instance, when we use Pearson's $\chi^2$ test, we have a sample distributed over $k$ categories and the $p$-value is computed conditional on a theoretical probability distribution with the same number $k$ of categories. If we repeat the experiment and draw other samples, we obtain other $p$-values, always based on the same theoretical probability distribution with the same number $k$ of categories. It therefore makes sense to consider the number of categories as given.
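For concreteness, this is how such a test is typically run in practice; the sample counts below are hypothetical, and scipy's chisquare returns Pearson's statistic together with the $p$-value computed from the chi-squared density with $k - 1$ degrees of freedom:

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical sample of n = 90 observations over k = 3 categories,
# tested against the reference distribution q = (0.5, 0.3, 0.2).
q = np.array([0.5, 0.3, 0.2])
x = np.array([52, 23, 15])
stat, pval = chisquare(f_obs=x, f_exp=x.sum() * q)
print(stat, pval)   # Pearson's statistic and its p-value (chi2, k - 1 df)
```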

A common feature of [7] and our paper is that we use a framework in which $q$ can vary and such that comparisons of the dissimilarity measure across different reference distributions are relevant. Yet, unlike [7], we also consider the case in which the reference distribution is fixed (as in our Pearson's example).

For characterizations of other dissimilarity measures in the context of political science, see [3]. See [2] for a characterization of a wide class of dissimilarity measures. While we consider dissimilarity measures in this paper, it is also interesting to consider dissimilarity rankings, as in [4].

Section 2 presents our main conditions and results. Section 3 shows the independence of the conditions used in our results. Section 4 concludes with a discussion. All the proofs are gathered in Section 5.

2 Axioms and results

The dissimilarity measures $\chi^2_0$ and $\chi^2_1$ are homogeneous of degree 0 and 1, respectively (we write $\chi^2_\theta$, with $\theta \in \{0, 1\}$, when a statement covers both), where homogeneity is defined as follows.

A 1.

Homogeneity of degree $\theta$. For all positive integers $\lambda$ and all $(x, q) \in \mathbb{N}^C \times \mathcal{Q}$, $f(\lambda x, q) = \lambda^\theta f(x, q)$.

In statistics, it seems unanimously accepted that a dissimilarity measure (used as a goodness-of-fit statistic) should be homogeneous of degree 1, but in ecology, many researchers seem to favour homogeneity of degree 0. Indeed, when they measure the dissimilarity between the species distribution in an ecosystem and a reference distribution, they want the dissimilarity to be independent of the size of the ecosystem. It is easy to see that Homogeneity of degree 0 (resp. 1) is satisfied by $\chi^2_0$ (resp. $\chi^2_1$). Indeed, we have

$$\chi^2_0(\lambda x, q) = \sum_{c \in C} \frac{(\lambda x_c/(\lambda n) - q_c)^2}{q_c} = \sum_{c \in C} \frac{(x_c/n - q_c)^2}{q_c} = \chi^2_0(x, q)$$

and

$$\chi^2_1(\lambda x, q) = \sum_{c \in C} \frac{(\lambda x_c - \lambda n q_c)^2}{\lambda n q_c} = \lambda \sum_{c \in C} \frac{(x_c - n q_c)^2}{n q_c} = \lambda\, \chi^2_1(x, q).$$
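Both properties are easy to check numerically. In the sketch below, chi2_theta is our own helper implementing $\chi^2_\theta(x, q) = \sum_c (x_c - n q_c)^2 / (n^{2-\theta} q_c)$, which equals $\chi^2_1$ for $\theta = 1$ and $\chi^2_0 = \chi^2_1/n$ for $\theta = 0$; the data are arbitrary.

```python
def chi2_theta(x, q, theta):
    """chi^2_theta(x, q) = sum_c (x_c - n q_c)^2 / (n^(2 - theta) q_c)."""
    n = sum(x)
    return sum((xc - n * qc) ** 2 / (n ** (2 - theta) * qc)
               for xc, qc in zip(x, q))

x, q = [4, 7, 1], [0.25, 0.5, 0.25]
for lam in (2, 5):
    xl = [lam * xc for xc in x]
    # degree 0: invariant under replication of the sample
    assert abs(chi2_theta(xl, q, 0) - chi2_theta(x, q, 0)) < 1e-12
    # degree 1: scales linearly with the replication factor
    assert abs(chi2_theta(xl, q, 1) - lam * chi2_theta(x, q, 1)) < 1e-12
```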
Suppose the dissimilarity between a distribution $x$ and $q$ is zero. This implies $x = nq$ for some positive integer $n$. The next condition states that, when we modify $x$ by moving a single individual from category $i$ to category $j$, the dissimilarity measure is inversely proportional to the harmonic mean of $q_i$ and $q_j$. Let $1^{ij}$ be a vector such that $1^{ij}_i = -1$, $1^{ij}_j = 1$ and $1^{ij}_c = 0$ for all $c \neq i, j$.

A 2.

Inverse Effects. If $f(x, q) = 0$, then, for all $i, j, k, l \in C$, with $i \neq j$ and $k \neq l$,

$$f(x + 1^{ij}, q)\; \mathrm{hm}(q_i, q_j) = f(x + 1^{kl}, q)\; \mathrm{hm}(q_k, q_l),$$

where $\mathrm{hm}(a, b) = 2ab/(a + b)$ denotes the harmonic mean of $a$ and $b$.

In our first result, we will use a restricted variant of Inverse Effects; this weaker condition is named Restricted Inverse Effects. We now prove that Inverse Effects is satisfied by $\chi^2_1$: if $\chi^2_1(x, q) = 0$, i.e. $x = nq$, then, for all $i \neq j$,

$$\chi^2_1(x + 1^{ij}, q) = \frac{(n q_i - 1 - n q_i)^2}{n q_i} + \frac{(n q_j + 1 - n q_j)^2}{n q_j} = \frac{1}{n q_i} + \frac{1}{n q_j} = \frac{2}{n\, \mathrm{hm}(q_i, q_j)},$$

so that $\chi^2_1(x + 1^{ij}, q)\, \mathrm{hm}(q_i, q_j) = 2/n$ does not depend on the pair $(i, j)$. The proof for $\chi^2_0$ is similar.
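A small numerical check of this computation (the helper functions and the particular $q$ are ours, chosen for illustration): starting from a zero-dissimilarity distribution, the product of the dissimilarity and the harmonic mean is the same for every pair of categories.

```python
from fractions import Fraction

def chi2(x, q):
    n = sum(x)
    return sum((xc - n * qc) ** 2 / (n * qc) for xc, qc in zip(x, q))

def harmonic_mean(a, b):
    return 2 * a * b / (a + b)

q = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
n = 12                      # chosen so that n*q has integer components (6, 4, 2)
x = [n * qc for qc in q]    # x = n q, hence chi2(x, q) == 0

# Move one individual from category i to category j; the product
# chi2 * harmonic mean should not depend on the chosen pair (i, j).
products = set()
for i in range(3):
    for j in range(3):
        if i != j:
            y = list(x)
            y[i] -= 1
            y[j] += 1
            products.add(chi2(y, q) * harmonic_mean(q[i], q[j]))
print(products)  # a single value, namely 2/n = Fraction(1, 6)
```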

Let $x$ and $y$ be two observed distributions of the same size $n$. The deviation between $x$ and $nq$ is $x - nq$. The corresponding deviation for $y$ is $y - nq$. If we add these two vectors of deviations, we obtain $(x - nq) + (y - nq)$ and the corresponding observed distribution is $x + y - nq$ (provided all components are non-negative). Hence, $f(x + y - nq, q)$ represents the dissimilarity corresponding to the additive combination of two deviations: between $x$ (resp. $y$) and $nq$. Similarly, $f(x - y + nq, q)$ corresponds to the subtractive combination of the same two deviations. Finally, $f(x + y - nq, q) + f(x - y + nq, q)$ corresponds in some sense to four deviations (two $x$- and two $y$-deviations) combined once additively and once subtractively. Our next condition states that this must be equal to $2 f(x, q) + 2 f(y, q)$, which is another way to combine the same four deviations.

A 3.

Deviations Balancedness. For all $x, y \in \mathbb{N}^C$ with $n(x) = n(y) = n$, if $x + y - nq \in \mathbb{N}^C$ and $x - y + nq \in \mathbb{N}^C$, then

$$f(x + y - nq, q) + f(x - y + nq, q) = 2 f(x, q) + 2 f(y, q).$$

This condition is inspired by [5], in which the Euclidean distance in $\mathbb{R}^k$ is characterized. Let us prove that $\chi^2_1$ satisfies Deviations Balancedness. We have

$$\chi^2_1(x + y - nq, q) = \sum_{c \in C} \frac{\big((x_c - n q_c) + (y_c - n q_c)\big)^2}{n q_c}$$

and

$$\chi^2_1(x - y + nq, q) = \sum_{c \in C} \frac{\big((x_c - n q_c) - (y_c - n q_c)\big)^2}{n q_c}.$$

Hence, $\chi^2_1(x + y - nq, q) + \chi^2_1(x - y + nq, q)$ is equal to

$$\sum_{c \in C} \frac{2 (x_c - n q_c)^2 + 2 (y_c - n q_c)^2}{n q_c} = 2\, \chi^2_1(x, q) + 2\, \chi^2_1(y, q).$$
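This identity can also be verified numerically with exact rational arithmetic; the distributions below are arbitrary, chosen so that both combinations have non-negative components.

```python
from fractions import Fraction

def chi2(x, q):
    n = sum(x)
    return sum((xc - n * qc) ** 2 / (n * qc) for xc, qc in zip(x, q))

q = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
x = [6, 3, 1]                     # two observed distributions with the
y = [4, 4, 2]                     # same total n = 10
nq = [sum(x) * qc for qc in q]    # reference counts (5, 3, 2)

add = [xc + yc - ec for xc, yc, ec in zip(x, y, nq)]  # deviations added
sub = [xc - yc + ec for xc, yc, ec in zip(x, y, nq)]  # deviations subtracted
assert chi2(add, q) + chi2(sub, q) == 2 * chi2(x, q) + 2 * chi2(y, q)
print("Deviations Balancedness holds for this example")
```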

We are now ready to state our first result, in which we consider that $q$ is given and does not vary.

Theorem 2.1.

Assume $q \in \mathcal{Q}$ is given. For $\theta \in \{0, 1\}$, a dissimilarity measure $f$ satisfies Homogeneity of degree $\theta$, Deviations Balancedness and Restricted Inverse Effects iff $f = \alpha\, \chi^2_\theta$, for some positive real $\alpha$. Restricted Inverse Effects is not required when .

Notice that Theorem 2.1 does not hold when $q$ is not fixed: one can construct a dissimilarity measure that satisfies Homogeneity of degree 1, Deviations Balancedness and Restricted Inverse Effects but is not of the form $\alpha\, \chi^2_0$ or $\alpha\, \chi^2_1$. In order to characterize the dissimilarity measure when $q$ varies, we need the full power of Inverse Effects.

Theorem 2.2.

For $\theta \in \{0, 1\}$, a dissimilarity measure $f$ satisfies Homogeneity of degree $\theta$, Deviations Balancedness and Inverse Effects iff $f = \alpha\, \chi^2_\theta$, for some positive real $\alpha$.

3 Independence of the axioms

In order to prove the independence of the conditions characterizing $\chi^2_0$ with variable $q$, we provide three examples of dissimilarity measures, each violating only one of the three conditions in Theorem 2.2.

A first dissimilarity measure violates Homogeneity of degree 0 but satisfies Deviations Balancedness and Inverse Effects. A second dissimilarity measure violates Deviations Balancedness but satisfies Homogeneity of degree 0 and Inverse Effects. A third dissimilarity measure violates Inverse Effects but satisfies Homogeneity of degree 0 and Deviations Balancedness.

Our examples are easily adapted to prove the independence of the conditions characterizing $\chi^2_1$ with variable $q$. Finally, our examples can also be used for Theorem 2.1, since it involves the same conditions as Theorem 2.2 except for Restricted Inverse Effects, which is weaker than Inverse Effects.

4 Discussion

Theorems 2.1 and 2.2 characterize the dissimilarity measures $\chi^2_0$ and $\chi^2_1$ up to multiplication by a positive real number $\alpha$. We could easily add a condition characterizing exactly $\chi^2_0$ or $\chi^2_1$; an extra normalization condition is enough to force $\alpha = 1$ in both characterizations. Yet, unlike [7], we consider that such a normalization is not really interesting. Indeed, $\chi^2_1$ and $\alpha\, \chi^2_1$ (with $\alpha > 0$) convey exactly the same information, just like a distance measured in meters or in yards. In particular, if we want to perform a Pearson's $\chi^2$ test, we are free to use Pearson's statistic (i.e. $\chi^2_1$) and to compute the $p$-value using the $\chi^2$ density with $k - 1$ degrees of freedom, or to use $\alpha\, \chi^2_1$ (with an arbitrary $\alpha > 0$) and to compute the $p$-value using the correspondingly scaled density. The resulting $p$-value will of course be identical. The same holds for $\chi^2_0$ and $\alpha\, \chi^2_0$.
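The invariance of the $p$-value under rescaling by $\alpha$ is easy to confirm with scipy; the statistic value and $\alpha$ below are arbitrary. Scaling the statistic by $\alpha$ while scaling the chi-squared density by the same factor yields the same tail probability.

```python
from scipy.stats import chi2 as chi2_dist

k, stat, alpha = 4, 7.81, 2.5        # k categories -> k - 1 degrees of freedom
p_plain = chi2_dist.sf(stat, df=k - 1)
p_scaled = chi2_dist.sf(alpha * stat, df=k - 1, scale=alpha)
assert abs(p_plain - p_scaled) < 1e-12
print(p_plain, p_scaled)             # identical p-values
```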

5 Proofs


We need a few lemmas before proving Theorem 2.1.

Lemma 1.

Let $g$ be the mapping defined by $g(x, q) = f(x, q)/n(x)$ for all $x \neq 0$. Then $f$ satisfies Homogeneity of degree 1 iff $g$ satisfies Homogeneity of degree 0. And $f$ satisfies Deviations Balancedness (resp. Inverse Effects) iff $g$ satisfies Deviations Balancedness (resp. Inverse Effects).

Proof.

Since $f$ satisfies Homogeneity of degree 1, we have $f(\lambda x, q) = \lambda f(x, q)$ for all positive integers $\lambda$. We thus have $g(\lambda x, q) = f(\lambda x, q)/n(\lambda x) = \lambda f(x, q)/(\lambda n(x)) = f(x, q)/n(x)$. Hence $g(\lambda x, q) = g(x, q)$ and $g$ is homogeneous of degree 0. The proof of the reverse implication is similar. The rest of the proof is left to the reader. ∎

Lemma 2.

Suppose $q$ is fixed. If a dissimilarity measure $f$ satisfies Homogeneity of degree 0, then $f(x, q) = F(x/n(x))$ for all $x \neq 0$, for some mapping $F$.

Proof.

Since $q$ is fixed, we can define a mapping $f_q$ such that $f_q(x) = f(x, q)$. Define now the mapping $F$ as follows. For any $p$ with non-negative rational components summing to 1, $F(p) = f_q(x)$ if there is a positive integer $\lambda$ such that $x = \lambda p \in \mathbb{N}^C$. The mapping $F$ is defined everywhere because $p$ has rational components and, hence, there is always a positive integer $\lambda$ such that $\lambda p \in \mathbb{N}^C$. The mapping $F$ is well defined. Indeed, suppose now there are $x = \lambda p$ and $y = \mu p$ in $\mathbb{N}^C$. By Homogeneity of degree 0, $f_q(x) = f_q(\mu x) = f_q(\mu \lambda p) = f_q(\lambda y) = f_q(y)$. Therefore, $F(p)$ does not depend on the choice of $\lambda$. ∎

We say that a set of vectors with rational components is rational convex if, whenever $u$ and $v$ belong to it, then $\mu u + (1 - \mu) v$ belongs to it for all rational $\mu \in [0, 1]$.

Lemma 3.

Let be a rational convex subset of such that is full-dimensional. Let be a mapping such that the graph of is a parabola on any line segment . Then for some real .

Proof.

Since is full-dimensional, the interior of is not empty and we can suppose without loss of generality that . Let us consider the line defined by for some and all . The intersection of this line with defines a line segment passing through the origin. The graph of on is a parabola. We can express this by means of the following polynomial of degree 2 in :

(5.1)

where and are real numbers.

Let us now consider the line defined by for some and all . The intersection of this line with defines a line segment . We can express that the graph of on is a parabola by means of a polynomial of degree 2 in  :

(5.2)

where and are real numbers. Setting in (5.2) yields, . Since this must be true for all , we must have .

Equating (5.1) and (5.2) yields

(5.3)

Setting , and in (5.3) yields

The solution of this system is

Let us rewrite (5.1) :

Letting and , we find that is equal to

(5.4)

The graph of must be a parabola on the line segment corresponding to . That is,

must be a parabola in . This is possible only if . We have therefore reached the conclusion that (5.4) can be written as in the statement of the lemma. ∎
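As a numerical sanity check related to the hypothesis used in this proof (the weight vector and the quadratic form below are hypothetical, not taken from the paper), restricting a weighted quadratic form to any line segment indeed yields a degree-2 polynomial in the segment parameter, i.e. a parabola:

```python
import numpy as np

# A hypothetical weighted quadratic form F(z) = sum_c w_c * z_c^2.
w = np.array([2.0, 5.0, 3.0])
F = lambda z: float(np.sum(w * z ** 2))

# Restrict F to the segment t -> a + t*(b - a) for t in [0, 1]:
# the values follow a degree-2 polynomial in t (a parabola).
a = np.array([0.1, 0.4, 0.5])
b = np.array([0.3, 0.3, 0.4])
t = np.linspace(0.0, 1.0, 50)
values = np.array([F(a + ti * (b - a)) for ti in t])
coeffs = np.polyfit(t, values, deg=2)        # exact fit up to rounding
assert np.allclose(np.polyval(coeffs, t), values)
print(coeffs)
```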

Let and .

Lemma 4.

Let be a rational convex subset of such that is full-dimensional. Let be a mapping such that the graph of is a parabola on any line segment . Suppose the restriction of

to the hyperplane defined by

( for all such that the hyperplane intersects ) has the form for some real .

Then for some real .

Proof.

Since is full-dimensional, there is and we can suppose without loss of generality that . Let us consider the line defined by for some and all . The intersection of this line with defines a line segment passing through the origin. The graph of on is a parabola. We can express this by means of the following polynomial of degree 2 in :

(5.5)

where and are real numbers.

Let us now consider the hyperplane defined by for some and . As assumed in the statement of the lemma,

(5.6)

Setting in (5.6) yields, . Since this must be true for all , we must have , for all .

Equating (5.5) and (5.6) yields

(5.7)

Setting , and in (5.7) yields

The solution of this system is

Let us rewrite (5.5) :

Letting , we have and , and the previous equation becomes,

For any , the graph of must be a parabola on the line segment corresponding to . That is,

must be a parabola in . This is possible only if for all .

Similarly, for any with , the graph of must be a parabola on the line segment corresponding to , . That is,