In [MP18], [MP19a] we have initiated the study of tropical probability spaces and their diagrams: In [MP18] we endowed (commutative) diagrams of probability spaces with the intrinsic entropy distance and in [MP19a] we defined tropical diagrams as points in the asymptotic cone of the metric space. They are represented by certain sequences of diagrams of probability spaces.
We expect that tropical diagrams will be helpful in the study of information optimization problems, and we have indeed applied them to derive a dimension-reduction result for the shape of the entropic cone in [MP19b].
In the present article we introduce the notion of conditioning on a space in a tropical diagram and show that the operation is Lipschitz-continuous with respect to the asymptotic entropy distance.
It is a rather technical result, and we have therefore decided to treat it in this separate article, but it is an important ingredient in the theory, and in particular we need it for the dimension-reduction result mentioned before.
Given a tuple of finite-valued random variables $(X_1, \dots, X_n)$ and a random variable $Y$, one may “condition” the collection on $Y$. The result of this operation is a family of $n$-tuples of random variables, parameterized by those values of $Y$ that have positive probability. Each tuple of random variables in this family is defined on a separate probability space.
When passing to the tropical setting the situation is different: when we condition a tropical diagram on one of its spaces, the result is again a tropical diagram rather than a family. After recalling some preliminaries in Section 2, we describe the operation of conditioning in Section 3 and prove that the result depends in a Lipschitz way on the original diagram.
2. Preliminaries
Our main objects of study are commutative diagrams of probability spaces and their tropical counterparts. In this section we briefly recall the main definitions and results.
2.1. Probability spaces and their diagrams
2.1.1. Probability spaces
By a finite probability space we mean a set with a probability measure that has finite support. A reduction from one probability space to another is an equivalence class of measure-preserving maps. Two maps are equivalent if they coincide on a set of full measure. We call a point $x$ in a probability space $X$ an atom if it has positive weight, and we write $x \in X$ to mean that $x$ is an atom in $X$ (as opposed to $x \in \underline{X}$ for points in the underlying set). For a probability space $X$ we denote by $|X|$ the cardinality of the support of the probability measure.
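The notion of a reduction can be made concrete with a small sketch. It assumes that a finite probability space is stored as a dictionary from points to exact weights; the helper names `push_forward` and `is_reduction` and the die example are hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from fractions import Fraction


def push_forward(measure, f):
    """Push a finitely supported measure (dict: point -> weight) forward along f."""
    image = defaultdict(Fraction)
    for point, weight in measure.items():
        image[f(point)] += weight
    return dict(image)


def is_reduction(source, target, f):
    """Check that f is measure-preserving, comparing weights of atoms only."""
    pushed = {y: w for y, w in push_forward(source, f).items() if w > 0}
    atoms = {y: w for y, w in target.items() if w > 0}
    return pushed == atoms


# Hypothetical example: a fair four-sided die reduced to its parity.
die = {k: Fraction(1, 4) for k in (1, 2, 3, 4)}
parity = {"odd": Fraction(1, 2), "even": Fraction(1, 2)}
to_parity = lambda k: "even" if k % 2 == 0 else "odd"
```

Since only atoms are compared, two maps that differ on points of zero weight pass or fail the check together, in line with the equivalence relation on measure-preserving maps.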
2.1.2. Indexing categories
To record the combinatorial structure of commutative diagrams of probability spaces and reductions we use an object that we call an indexing category. By an indexing category we mean a finite category $\mathbf{G}$ such that for any pair of objects there is at most one morphism between them either way. In addition, we assume that it satisfies one more property, which we describe after introducing some terminology. For a pair of objects $i, j$ such that there is a morphism $i \to j$, the object $i$ will be called an ancestor of $j$ and the object $j$ will be called a descendant of $i$. The subcategory of all descendants of an object $i$ is called the ideal generated by $i$, while the subcategory consisting of all ancestors of $i$ together with all the morphisms between them is called the co-ideal generated by $i$. (The term filter is also used for co-ideal in the literature on lattices.)
The additional property that an indexing category has to satisfy is that for any pair of objects $i, j$ there exists a minimal common ancestor $k$, that is, $k$ is an ancestor of both $i$ and $j$, and any other common ancestor of $i$ and $j$ is also an ancestor of $k$.
An equivalent formulation of the property above is the following: the intersection of the co-ideals generated by two objects is also a co-ideal generated by some object.
Any indexing category is necessarily initial, which means that there exists an initial object, that is, an object that is an ancestor of every object in the category.
A fan in a category is a pair of morphisms with the same domain. A fan is called minimal if for any other fan included in a commutative diagram
the vertical arrow must be an isomorphism.
For any pair of objects in an indexing category there exists a unique minimal fan in the category with these two objects as its terminal objects.
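We denote by $\mathbf{Prob}$ the category of finite probability spaces and reductions. For an indexing category $\mathbf{G}$, a $\mathbf{G}$-diagram is a functor $\mathcal{X}: \mathbf{G} \to \mathbf{Prob}$. A reduction from one $\mathbf{G}$-diagram $\mathcal{X}$ to another $\mathbf{G}$-diagram $\mathcal{Y}$ is a natural transformation between the functors. It amounts to a collection of reductions $f_i: X_i \to Y_i$, one for each object $i$ of $\mathbf{G}$, such that the big diagram consisting of all spaces $X_i$, $Y_i$, all morphisms of $\mathcal{X}$ and $\mathcal{Y}$, and all $f_i$ is commutative. The category of $\mathbf{G}$-diagrams and reductions will be denoted $\mathbf{Prob}\langle\mathbf{G}\rangle$. The construction of diagrams can be iterated: we can consider $\mathbf{H}$-diagrams of $\mathbf{G}$-diagrams and denote the corresponding category $\mathbf{Prob}\langle\mathbf{G}\rangle\langle\mathbf{H}\rangle$. Every $\mathbf{H}$-diagram of $\mathbf{G}$-diagrams can also be considered as a $\mathbf{G}$-diagram of $\mathbf{H}$-diagrams, thus there is a natural equivalence of categories $\mathbf{Prob}\langle\mathbf{G}\rangle\langle\mathbf{H}\rangle \equiv \mathbf{Prob}\langle\mathbf{H}\rangle\langle\mathbf{G}\rangle$.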
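For an indexing category $\mathbf{G}$, a $\mathbf{G}$-diagram will be called minimal if it maps minimal fans in $\mathbf{G}$ to minimal fans in the target category. The subspace of all minimal $\mathbf{G}$-diagrams will be denoted $\mathbf{Prob}\langle\mathbf{G}\rangle_{\mathbf{m}}$. In [MP18] we have shown that for any fan in the category of probability spaces or in a category of diagrams its minimization exists and is unique up to isomorphism.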
2.1.4. Tensor product
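The tensor product of two probability spaces $X$ and $Y$ is their independent product, $X \otimes Y$. For two $\mathbf{G}$-diagrams $\mathcal{X}$ and $\mathcal{Y}$ we define their tensor product $\mathcal{X} \otimes \mathcal{Y}$ space-wise, so that the space indexed by an object $i$ is $X_i \otimes Y_i$.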
2.1.5. Constant diagrams
Given an indexing category $\mathbf{G}$ and a probability space $X$ we can form a constant diagram $X^{\mathbf{G}}$ that has all spaces equal to $X$ and all reductions equal to the identity isomorphism. Sometimes, when such a constant diagram is included in a diagram with other $\mathbf{G}$-diagrams (such as, for example, a reduction $\mathcal{Y} \to X^{\mathbf{G}}$), we will write simply $X$ in place of $X^{\mathbf{G}}$.
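Evaluating entropy on every space in a $\mathbf{G}$-diagram we obtain a tuple of non-negative numbers indexed by objects in $\mathbf{G}$; thus entropy gives a map
\[ \mathrm{Ent}_*: \mathbf{Prob}\langle\mathbf{G}\rangle \to \mathbb{R}^{\mathbf{G}}, \]
where the target space is the space of real-valued functions on the set of objects in $\mathbf{G}$ endowed with the $\ell^1$-norm. Entropy is a homomorphism in that it satisfies
\[ \mathrm{Ent}_*(\mathcal{X} \otimes \mathcal{Y}) = \mathrm{Ent}_*(\mathcal{X}) + \mathrm{Ent}_*(\mathcal{Y}). \]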
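The additivity of entropy under the tensor product can be checked numerically. The following is a minimal sketch, assuming that a diagram is represented simply by the list of its spaces (each a dictionary of weights) and that entropy is measured in nats; the two-space diagrams below are hypothetical.

```python
import math
from itertools import product


def entropy(p):
    """Shannon entropy (in nats) of a finitely supported distribution."""
    return -sum(w * math.log(w) for w in p.values() if w > 0)


def tensor(p, q):
    """Independent product of two finite probability spaces."""
    return {(x, y): wx * wy for (x, wx), (y, wy) in product(p.items(), q.items())}


# Two hypothetical two-space diagrams X -> Y, listed space by space.
X1 = {"a": 0.5, "b": 0.25, "c": 0.25}
Y1 = {"a": 0.5, "bc": 0.5}
X2 = {0: 0.9, 1: 0.1}
Y2 = {"*": 1.0}

ent_first = [entropy(X1), entropy(Y1)]
ent_second = [entropy(X2), entropy(Y2)]
# Entropies of the space-wise tensor product of the two diagrams.
ent_product = [entropy(tensor(X1, X2)), entropy(tensor(Y1, Y2))]
```

Up to floating-point error, `ent_product` agrees entry by entry with the sum of `ent_first` and `ent_second`.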
2.1.7. Entropy distance
Let be an indexing category and be a fan of -diagrams. We define the entropy distance
The intrinsic entropy distance between two -diagrams is defined to be the infimal entropy distance of all fans with terminal diagrams and
In [MP18] it is shown that the infimum is attained, that the optimal fan is minimal, that the intrinsic entropy distance is a pseudo-distance which vanishes if and only if the two diagrams are isomorphic, and that the entropy is a 1-Lipschitz linear functional with respect to it.
2.2. Diagrams of sets, distributions and empirical reductions
2.2.1. Distributions on sets
For a set $S$ we denote by $\Delta S$ the collection of all finitely supported probability distributions on $S$. For a pair of distributions $\pi_1, \pi_2 \in \Delta S$ we consider the total variation distance between them.
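For concreteness, the total variation distance between two finitely supported distributions can be computed as follows. This is a sketch under an assumed normalization (half of the $\ell^1$-distance between the weight vectors); the convention used in the text may differ by a factor of two.

```python
def total_variation(p, q):
    """Total variation distance between two finitely supported distributions.

    Assumed convention: half of the l1-distance between the weight vectors.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


# Hypothetical distributions on a three-point set.
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.25, "b": 0.25, "c": 0.5}
```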
For a map $f$ between two sets we denote by $f_*$ the induced affine map between the spaces of distributions on them (the map preserving convex combinations).
For define the empirical map by the assignment below. For and set
For a finite probability space the empirical distribution on is the push-forward . Thus
is a reduction of finite probability spaces. The construction of empirical reduction is functorial, that is for a reduction between two probability spaces the diagram of reductions
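The functoriality of the empirical construction can be illustrated on the level of sets: taking the type of a sequence commutes with applying a map coordinate-wise. The sketch below assumes that the empirical map sends a sequence to the distribution of relative frequencies of its entries; the map `f` and the sample sequence are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def empirical(sequence):
    """Empirical distribution (type) of a finite sequence of points."""
    n = len(sequence)
    return {x: Fraction(c, n) for x, c in Counter(sequence).items()}


def push_forward(dist, f):
    """Affine map on distributions induced by a map f between sets."""
    image = Counter()
    for x, w in dist.items():
        image[f(x)] += w
    return dict(image)


# Hypothetical map between sets and a sample sequence of length 6.
f = lambda x: x % 2
xs = (0, 1, 2, 2, 3, 0)

# Apply f coordinate-wise and then take the type ...
type_of_image = empirical(tuple(f(x) for x in xs))
# ... or take the type first and then push it forward along f.
image_of_type = push_forward(empirical(xs), f)
```

Both routes give the same distribution on the target set, which is the set-level content of the commutativity claimed for the diagram of reductions.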
2.2.2. Distributions on diagrams of sets
Let $\mathbf{Set}$ denote the category of sets and surjective maps. For an indexing category $\mathbf{G}$, we denote by $\mathbf{Set}\langle\mathbf{G}\rangle$ the category of $\mathbf{G}$-diagrams in $\mathbf{Set}$. That is, objects in $\mathbf{Set}\langle\mathbf{G}\rangle$ are commutative diagrams of sets indexed by $\mathbf{G}$; the spaces in such a diagram are sets and the arrows represent surjective maps, subject to commutativity relations.
For a diagram of sets we define the space of distributions on the diagram by
If is the initial set of , then there is an isomorphism
Given a $\mathbf{G}$-diagram of sets $\mathcal{S}$ and a distribution $p$ on it, we can construct a $\mathbf{G}$-diagram of probability spaces $(\mathcal{S}, p)$. Note that any diagram of probability spaces has this form.
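Consider a $\mathbf{G}$-diagram of probability spaces $\mathcal{X} = (\mathcal{S}, p)$, where $\mathcal{S}$ is a diagram of sets and $p$ is a distribution on it. Let $X_0$ be the initial space in $\mathcal{X}$ and $U$ be another space in $\mathcal{X}$. Since $X_0$ is initial, there is a map $\chi_0: X_0 \to U$. Fix an atom $u \in U$ and define the conditioned distribution $p_0(\,\cdot\,|u)$ on $X_0$ as the distribution supported in $\chi_0^{-1}(u)$ and for every $x \in \chi_0^{-1}(u)$ defined by
\[ p_0(x|u) := \frac{p_0(x)}{p_U(u)}, \]
where $p_0$ and $p_U$ denote the distributions on $X_0$ and $U$, respectively.
Let $p(\,\cdot\,|u)$ be the distribution on $\mathcal{S}$ corresponding to $p_0(\,\cdot\,|u)$ under the isomorphism in (2.1). We define the conditioned $\mathbf{G}$-diagram $\mathcal{X}|u := (\mathcal{S}, p(\,\cdot\,|u))$.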
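The conditioning construction can be sketched in a few lines. The code assumes that the diagram is determined by a distribution on its initial set together with maps to the other sets, and that conditioning on an atom restricts the distribution to the fiber over that atom and renormalizes; all names and the three-point example are hypothetical.

```python
from fractions import Fraction


def push_forward(p, f):
    """Distribution induced on the target of f by a distribution p."""
    out = {}
    for x, w in p.items():
        out[f(x)] = out.get(f(x), 0) + w
    return out


def condition(p0, to_u, u):
    """Restrict p0 to the fiber of to_u over the atom u and renormalize."""
    mass = sum(w for x, w in p0.items() if to_u(x) == u)
    if mass == 0:
        raise ValueError("u must be an atom, i.e. have positive weight")
    return {x: w / mass for x, w in p0.items() if to_u(x) == u}


# Hypothetical diagram of sets: initial set of pairs, projections to A and U.
p0 = {
    ("a1", "u1"): Fraction(1, 2),
    ("a2", "u1"): Fraction(1, 4),
    ("a2", "u2"): Fraction(1, 4),
}
to_a = lambda x: x[0]
to_u = lambda x: x[1]

# The result of classical conditioning is a family indexed by the atoms of U.
family = {u: condition(p0, to_u, u) for u in push_forward(p0, to_u)}
# Distribution induced on the space A by conditioning on the atom u1.
on_a_given_u1 = push_forward(family["u1"], to_a)
```

Pushing each member of `family` forward along the maps of the diagram yields the conditioned diagram for the corresponding atom.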
2.3.1. The Slicing Lemma
The so-called Slicing Lemma allows one to estimate the intrinsic entropy distance between two diagrams in terms of distances between conditioned diagrams. Among the corollaries of the Slicing Lemma is the following inequality.
Proposition (p:slicing). Let $(\mathcal{X} \leftarrow \hat{\mathcal{X}} \to U^{\mathbf{G}})$ be a fan of $\mathbf{G}$-diagrams of probability spaces and $\mathcal{Y}$ be another diagram. Then
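The fan in the assumption of the proposition above can often be constructed in the following manner. Suppose $\mathcal{X}$ is a $\mathbf{G}$-diagram and $U = X_\iota$ is a space in it for some object $\iota$. We can construct a fan $(\mathcal{X} \xleftarrow{f} \hat{\mathcal{X}} \xrightarrow{g} U^{\mathbf{G}})$ by assigning $\hat{X}_i$ to be the initial space of the (unique) minimal fan in $\mathcal{X}$ with terminal spaces $X_i$ and $U$, and $f_i$ and $g_i$ to be the left and right reductions in that fan, for any object $i$.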
2.4. Tropical Diagrams
A detailed discussion of the topics in this section can be found in [MP19a].
The asymptotic entropy distance between two diagrams of the same combinatorial type is defined by
A tropical $\mathbf{G}$-diagram is an equivalence class of certain sequences of $\mathbf{G}$-diagrams of probability spaces. Below we describe the type of sequences and the equivalence relation.
A function is called an admissible function if is non-decreasing and there is a constant such that for any
An example of an admissible function will be , for .
A sequence of diagrams of probability spaces will be called quasi-linear with defect bounded by an admissible function if it satisfies
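For example, for a diagram $\mathcal{X}$, the sequence of tensor powers $(\mathcal{X}^{\otimes n})_{n}$ is $\varphi$-quasi-linear for $\varphi \equiv 0$ (and hence for any admissible $\varphi$). Such sequences are called linear.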
The asymptotic entropic distance between two quasi-linear sequences and is defined to be
and sequences are called asymptotically equivalent if the asymptotic entropy distance between them vanishes. An equivalence class of a sequence $\bar{\mathcal{X}}$ will be denoted $[\mathcal{X}]$ and the totality of all the classes $\mathbf{Prob}[\mathbf{G}]$. The sum of two such equivalence classes is defined to be the equivalence class of the sequence obtained by tensor-multiplying representative sequences of the summands term-wise. In addition there is a doubly transitive action of $\mathbb{R}_{\geq 0}$ on $\mathbf{Prob}[\mathbf{G}]$. In [MP19a] the following theorem is proven.
Theorem (p:tropical-summary). Let $\mathbf{G}$ be an indexing category. Then
The space $\mathbf{Prob}[\mathbf{G}]$ does not depend on the choice of a positive admissible function up to isometry.
The space $\mathbf{Prob}[\mathbf{G}]$ is metrically complete.
The map sending a diagram to the linear sequence it generates is an isometric embedding with respect to the asymptotic entropy distance. The space of linear sequences, i.e. the image of the map above, is dense in $\mathbf{Prob}[\mathbf{G}]$.
There is a distance-preserving homomorphism from $\mathbf{Prob}[\mathbf{G}]$ into a Banach space, whose image is a closed convex cone in that space.
The entropy functional on $\mathbf{Prob}[\mathbf{G}]$ is a well-defined 1-Lipschitz linear map.
2.5. Asymptotic Equipartition Property for Diagrams
Among all $\mathbf{G}$-diagrams there is a special class of maximally symmetric ones. We call such diagrams homogeneous, see below for the definition. Homogeneous diagrams come in very handy in many considerations, because their structure is easier to describe than that of general diagrams. We show below that among tropical diagrams, those that have homogeneous representatives are dense. This means, in particular, that when considering continuous functionals on the space of diagrams, it suffices to look only at homogeneous diagrams.
2.5.1. Homogeneous diagrams
A $\mathbf{G}$-diagram $\mathcal{X}$ is called homogeneous if its automorphism group $\mathrm{Aut}(\mathcal{X})$ acts transitively on every space in $\mathcal{X}$, by which we mean that the action is transitive on the support of the probability measure. Homogeneous probability spaces are isomorphic to uniform spaces. For more complex indexing categories this simple description is not sufficient.
2.5.2. Tropical Homogeneous Diagrams
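The subcategory of all homogeneous $\mathbf{G}$-diagrams will be denoted $\mathbf{Prob}\langle\mathbf{G}\rangle_{\mathbf{h}}$ and we write $\mathbf{Prob}\langle\mathbf{G}\rangle_{\mathbf{h},\mathbf{m}}$ for the category of minimal homogeneous $\mathbf{G}$-diagrams. These spaces are invariant under the tensor product, thus they are metric Abelian monoids and the general “tropicalization” described in [MP19a] can be performed. Passing to the tropical limit we obtain spaces of tropical (minimal) homogeneous diagrams, which we denote by $\mathbf{Prob}[\mathbf{G}]_{\mathbf{h}}$ and $\mathbf{Prob}[\mathbf{G}]_{\mathbf{h},\mathbf{m}}$, respectively.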
2.5.3. Asymptotic Equipartition Property
In [MP18] the following theorem is proven.
Theorem (p:aep-complete). Suppose $\mathcal{X}$ is a $\mathbf{G}$-diagram of probability spaces for some fixed indexing category $\mathbf{G}$. Then there exists a sequence $(\mathcal{H}_n)_{n}$ of homogeneous $\mathbf{G}$-diagrams such that
where is a constant only depending on and .
The approximating sequence of homogeneous diagrams is evidently quasi-linear with the defect bounded by the admissible function
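Thus, Theorem (p:aep-complete) above states that the linear sequence generated by any diagram is asymptotically equivalent to a sequence of homogeneous diagrams. On the other hand we have shown in [MP19a] that the space of linear sequences is dense in the space of tropical diagrams. Combining the two statements we get the following theorem.
Theorem (p:aep-tropical). For any indexing category $\mathbf{G}$, the space of tropical homogeneous $\mathbf{G}$-diagrams is dense in the space of all tropical $\mathbf{G}$-diagrams. Similarly, the space of tropical minimal homogeneous $\mathbf{G}$-diagrams is dense in the space of tropical minimal $\mathbf{G}$-diagrams.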
3. Conditioning of Tropical Diagrams
Let $\mathcal{X}$ be a $\mathbf{G}$-diagram of probability spaces containing a probability space $U = X_\iota$ indexed by an object $\iota$.
Given an atom $u \in U$ we can define a conditioned diagram $\mathcal{X}|u$. If the diagram $\mathcal{X}$ is homogeneous, then the isomorphism class of $\mathcal{X}|u$ is independent of $u$, so that $(\mathcal{X}|u)_{u \in U}$ is a constant family. On the other hand we have shown that high tensor powers of any diagram can be approximated by homogeneous diagrams, which suggests that in the tropical setting $\mathcal{X}|U$ should be a well-defined tropical diagram, rather than a family. Below we give a definition of the tropical conditioning operation and prove its consistency.
3.2. Classical-tropical conditioning
Here we define the operation of conditioning of a classical diagram such that the result is a tropical diagram. Let $\mathcal{X}$ be a $\mathbf{G}$-diagram of probability spaces and $U$ be a space in $\mathcal{X}$. We define the conditioned tropical diagram $[\mathcal{X}|U]$
by conditioning on the atoms $u \in U$ and averaging the corresponding tropical diagrams:
\[ [\mathcal{X}|U] := \int_U [\mathcal{X}|u] \, \mathrm{d}p_U(u), \]
where $[\mathcal{X}|u]$ is the tropical diagram represented by the linear sequence generated by $\mathcal{X}|u$, see Section 2.4. Note that the integral on the right-hand side is just a finite convex combination of tropical diagrams. Expanding all the definitions we will get for the representative sequence
3.3.1. Conditioning of Homogeneous Diagrams
If the diagram is homogeneous, then for any atom with positive weight
Recall that earlier we have defined a quantity
Now that $[\mathcal{X}|U]$ is a tropical diagram, the expression $\mathrm{Ent}_*(\mathcal{X}|U)$ can be interpreted in two, a priori different, ways: by the formula above and as the entropy of the object introduced in the previous subsection. Fortunately, its numerical value does not depend on the interpretation, since the entropy is a linear functional on the space of tropical diagrams.
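The averaged reading of the conditional entropy can be illustrated for a single space reduced to $U$. The sketch assumes that the quantity recalled above is the average, over the atoms $u$ of $U$, of the entropies of the conditioned spaces; for a single space this average equals the difference between the entropy of the space and the entropy of $U$. The distribution and the map `to_u` are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a finitely supported distribution."""
    return -sum(w * math.log(w) for w in p.values() if w > 0)


def push_forward(p, f):
    """Distribution induced on the target of f by a distribution p."""
    out = {}
    for x, w in p.items():
        out[f(x)] = out.get(f(x), 0.0) + w
    return out


def condition(p, f, u):
    """Restrict p to the fiber of f over u and renormalize."""
    mass = sum(w for x, w in p.items() if f(x) == u)
    return {x: w / mass for x, w in p.items() if f(x) == u}


def averaged_conditional_entropy(p, f):
    """Average over atoms u of the entropy of the space conditioned on u."""
    p_u = push_forward(p, f)
    return sum(w * entropy(condition(p, f, u)) for u, w in p_u.items() if w > 0)


# Hypothetical space of pairs (a, u) with the reduction to the second entry.
p = {("a1", "u1"): 0.5, ("a2", "u1"): 0.25, ("a2", "u2"): 0.25}
to_u = lambda x: x[1]

averaged = averaged_conditional_entropy(p, to_u)
difference = entropy(p) - entropy(push_forward(p, to_u))
```

The two numbers agree up to floating-point error, which is the single-space instance of the linearity used in the text.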
If and are two -diagrams with , for some , then
It follows that for any diagram $\mathcal{X}$ with a space $U$ and any $n \in \mathbb{N}$ the following holds
3.4. Continuity and Lipschitz property
Proposition (p:cond-lip). Let $\mathbf{G}$ be a complete poset category, $\mathcal{X}$ and $\mathcal{Y}$ be two $\mathbf{G}$-diagrams, and $U$ and $V$ be two spaces in $\mathcal{X}$ and $\mathcal{Y}$, respectively, indexed by the same object $\iota$. Then
Using the homogeneity property of conditioning, Section 3.3.4, we can obtain the following stronger inequality.
Proposition (p:cond-lip-aikd). In the setting of Proposition (p:cond-lip) the following holds
Before we prove Proposition (p:cond-lip) we will need some preparatory lemmas.
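Lemma (p:dist-cond-types). Let $\mathcal{X}$ be a $\mathbf{G}$-diagram of probability spaces and $U$ be a space in it. Let $q: U^{n} \to \Delta U$ be the empirical reduction. Then for any $n \in \mathbb{N}$ and any $\bar{u}, \bar{u}' \in U^{n}$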
Proof: To prove the lemma we construct a coupling between and in the following manner. Note that there exists a permutation such that
Using that we can estimate
where denotes the isomorphism coupling of two naturally isomorphic diagrams, while denotes the “independence” coupling.
Lemma (p:int-dist-cond). Let $\mathcal{X}$ be a $\mathbf{G}$-diagram of probability spaces and $U$ be a space in $\mathcal{X}$. Then
Proof: First we apply Proposition (p:slicing), slicing the first argument
We will argue now that the double integral on the right-hand side grows sub-linearly with $n$. We estimate the double integral by applying Lemma (p:dist-cond-types) to the integrand
where the convergence to zero of the last double integral follows from Sanov’s theorem.
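The concentration of empirical distributions that underlies this step can be observed numerically in the simplest case. The sketch below is an illustration only and not part of the proof: for a two-point space with weights $p$ and $1-p$ it computes exactly the expected total variation distance (assumed convention: half of the $\ell^1$-distance) between the empirical distribution of $n$ independent samples and the true distribution. The function name and the choice $p = 0.3$ are hypothetical.

```python
from math import comb


def expected_tv_to_truth(p, n):
    """Exact expectation of |k/n - p| for k ~ Binomial(n, p).

    On a two-point space this is the total variation distance (half of the
    l1-distance) between the empirical and the true distribution.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * abs(k / n - p)
        for k in range(n + 1)
    )


# Expected distance for increasing sample sizes.
values = [expected_tv_to_truth(0.3, n) for n in (10, 100, 1000)]
```

The values decrease as $n$ grows; Sanov's theorem gives the much stronger statement that the probability of a deviation of fixed size decays exponentially in $n$.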
Corollary (p:dist-cond). Let $\mathcal{X}$ be a $\mathbf{G}$-diagram and $U$ a probability space included in $\mathcal{X}$. Then
Proof: Let . Then
where we used Lemma (p:int-dist-cond) and the fact that in the last line. We finish the proof by taking the limit .
Proof (of Proposition (p:cond-lip)): We start with a note on general terminology: a reduction $f: X \to Y$ of probability spaces can also be considered as a fan $(X \xleftarrow{\mathrm{id}} X \xrightarrow{f} Y)$. Then the entropy distance of $f$ is
If the reduction is a part of a bigger diagram containing also space , then the following inequality holds
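Let $(\mathcal{X} \leftarrow \mathcal{Z} \to \mathcal{Y})$ be an optimal coupling between $\mathcal{X}$ and $\mathcal{Y}$. It can also be viewed as a $\mathbf{G}$-diagram of two-fans, each of which is a minimal coupling between $X_i$ and $Y_i$. Among them is the minimal fan $(U \leftarrow W \to V)$.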
We use the triangle inequality to bound the distance by four summands as follows.
We will estimate each of the four summands separately. The bound for the first one is as follows.
An analogous calculation shows that
To bound the second summand we will use Corollary (p:dist-cond)
We will now use Corollary (p:dist-cond) with and to estimate the integrand. Then,
Combining the estimates we get
3.5. Tropical conditioning
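Let $[\mathcal{X}]$ be a tropical $\mathbf{G}$-diagram and $[U] = [X_\iota]$ for some object $\iota$ of $\mathbf{G}$. Choose a representative $(\mathcal{X}(n))_{n}$ and denote $U(n) := X_\iota(n)$. We now define a conditioned diagram $[\mathcal{X}|U]$ by the following limit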
Proposition (p:cond-lip-aikd) guarantees that the limit exists and is independent of the choice of representative. For a fixed object $\iota$ of the indexing category the conditioning is a linear Lipschitz map
- [KSŠ12] Mladen Kovačević, Ivan Stanojević, and Vojin Šenk. On the hardness of entropy minimization and related problems. In 2012 IEEE Information Theory Workshop, pages 512–516. IEEE, 2012.
- [MP18] Rostislav Matveev and Jacobus W. Portegies. Asymptotic dependency structure of multiple signals. Information Geometry, 1(2):237–285, 2018.
- [MP19a] Rostislav Matveev and Jacobus W. Portegies. Tropical diagrams of probability spaces. arXiv e-prints, page arXiv:1905.04375, May 2019.
- [MP19b] Rostislav Matveev and Jacobus W. Portegies. Tropical probability theory and an application to the entropic cone. arXiv e-prints, page arXiv:1905.05351, May 2019.
- [Vid12] Mathukumalli Vidyasagar. A metric between probability distributions on finite sets of different cardinalities and applications to order reduction. IEEE Transactions on Automatic Control, 57(10):2464–2477, 2012.