Distances between States and between Predicates

11/27/2017 ∙ by Bart Jacobs, et al. ∙ Radboud Universiteit

This paper gives a systematic account of various metrics on probability distributions (states) and on predicates. These metrics are described in a uniform manner using the validity relation between states and predicates. The standard adjunction between convex sets (of states) and effect modules (of predicates) is restricted to convex complete metric spaces and directed complete effect modules. This adjunction is used in two state-and-effect triangles, for classical (discrete) probability and for quantum probability.




1 Introduction

Metric structures have a long history in program semantics, see the overview book [2]. They occur naturally, for instance on sequences, of inputs, outputs, or states. In complete metric spaces solutions of recursive (suitably contractive) equations exist via Banach’s fixed point theorem. The Hausdorff distance on subsets is used to model non-deterministic (possibilistic) computation. In general, metrics can be used to measure to what extent computations can be approximated, or are similar.

This paper looks at metrics on probability distributions, often called states. Various such metrics exist for measuring the (dis)similarity in behaviour between probabilistic computations, see e.g. [6, 11, 4]. This paper does not develop new applications, but contributes to the theory behind distances: how they arise in a uniform way and how they can be related. The paper covers standard distance functions on classical discrete probability distributions and also on quantum distributions. For discrete probability we use the total variation distance, which is a special case of the Kantorovich distance, see e.g. [13, 5, 30, 29]. For quantum probability we use the trace distance for states (quantum distributions) on Hilbert spaces, and the operator norm distance for states on von Neumann algebras. One contribution of this paper is a uniform description of all these distances on states as ‘validity’ distances.

In each of these cases we shall describe a validity relation $\models$ between states $\omega$ and predicates $p$, so that the validity $\omega \models p$ is a number in the unit interval $[0,1]$. This validity relation plays a central role in the definition of various distances. What we call the ‘validity’ distance on states is given by the supremum (join) over predicates $p$ in:

$$d(\omega_1, \omega_2) \;=\; \bigvee_{p}\, \big|\, \omega_1 \models p \;-\; \omega_2 \models p \,\big| \qquad (1)$$

In general, states are closed under convex combinations. We shall thus study combinations of convex and complete metric spaces, organised in a suitable category.

We also study metrics on predicates. The algebraic structure of predicates will be described in terms of effect modules. Here we show that suitably order complete effect modules are Archimedean, and thus carry an induced metric, such that limits and joins of ascending Cauchy sequences coincide. In our main examples, we use fuzzy predicates on sets and effects of von Neumann algebras as predicates; their distance can also be formulated via validity $\models$, but now using a join over states $\omega$ in:

$$d(p_1, p_2) \;=\; \bigvee_{\omega}\, \big|\, \omega \models p_1 \;-\; \omega \models p_2 \,\big| \qquad (2)$$

The ‘duality’ between the distance formulas (1) for states and (2) for predicates is a new insight.

A basic ‘dual’ adjunction in a probabilistic setting is of the form $\mathbf{EMod}^{\mathrm{op}} \rightleftarrows \mathbf{Conv}$, between effect modules and convex sets. Effect modules are the probabilistic analogues of Boolean algebras, serving as ‘algebraic probabilistic logics’ (see below for details). Convex sets capture the algebraic structure of states. This adjunction thus expresses the essentials of a probabilistic duality between predicates and states. Since predicates are often called ‘effects’ in a quantum setting, one also speaks of a duality between states and effects.

This paper restricts this adjunction to an adjunction between directed complete effect modules and convex complete metric spaces. This restricted adjunction is used in two ‘state-and-effect’ triangles, of the form:

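[Diagram: two state-and-effect triangles. In each triangle the top row is an adjunction (marked ⊤) between the opposite of the category of directed complete effect modules, on the left, and the category of convex complete metric spaces, on the right. The bottom vertex is Kℓ_fin(D) in the left triangle and the opposite of the category of von Neumann algebras in the right triangle; from each bottom vertex one functor points to the top-left corner and one to the top-right corner.]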

Details will be provided in Section 4. Thus, the paper culminates in suitable order/metrically complete versions of the state-and-effect triangles that emerge in the effectus-theoretic [16, 9] description of state and predicate transformer semantics for probability (see also [18, 20]).

2 Distances between states

This section will describe distance functions (metrics) on various forms of probability distributions, which we collectively call ‘states’. In separate subsections it will introduce discrete probability distributions on sets and on metric spaces, and quantum distributions on Hilbert spaces and on von Neumann algebras. A unifying formulation will be identified, namely what we call a validity formulation of the metrics involved, where the distance between two states is expressed via a join over all predicates using the validities of these predicates in the two states, as in (1).

2.1 Discrete probability distributions on sets

A finite discrete probability distribution on a set $X$ is given by a ‘probability mass’ function $\omega\colon X \to [0,1]$ with finite support and $\sum_{x} \omega(x) = 1$. This support is the set $\mathrm{supp}(\omega) = \{x \in X \mid \omega(x) \neq 0\}$. We sometimes simply say ‘distribution’ instead of ‘finite discrete probability distribution’. Often such a distribution is called a ‘state’. The ‘ket’ notation $|\,x\,\rangle$ is useful to describe specific distributions. For instance, on a set we may write a distribution as . This corresponds to the probability mass function given by , and .

We write $\mathcal{D}(X)$ for the set of distributions on a set $X$. The mapping $X \mapsto \mathcal{D}(X)$ forms (part of) a well-known monad on the category $\mathbf{Sets}$ of sets, see e.g. [15, 17, 18] for additional information, using the same notation as used here. We write $\mathcal{K}\ell(\mathcal{D})$ for the associated Kleisli category, and $\mathcal{EM}(\mathcal{D})$ for the category of Eilenberg-Moore algebras. The latter may be identified with convex sets, that is, with sets in which formal convex sums can be interpreted as actual sums. Thus we often write $\mathbf{Conv} = \mathcal{EM}(\mathcal{D})$; morphisms in $\mathbf{Conv}$ are ‘affine’ functions, that preserve convex sums. Convex sets have a rich history, going back to [33], see [27, Remark 2.9] for an extensive description.

Definition 1

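Let $\omega_1, \omega_2 \in \mathcal{D}(X)$ be two distributions on the same set $X$. Their total variation distance $\mathrm{tvd}(\omega_1, \omega_2)$ is the non-negative real number defined as:

$$\mathrm{tvd}(\omega_1, \omega_2) \;=\; \tfrac{1}{2} \sum_{x \in X} \big|\, \omega_1(x) - \omega_2(x) \,\big| \qquad (3)$$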
The historical origin of this definition is not precisely clear. It is folklore that the total variation distance is a special case of the ‘Kantorovich distance’ (also known as ‘Wasserstein’ or ‘earth mover’s distance’) on distributions on metric spaces, when applied to discrete metric spaces (sets), see Subsection 2.2 below.

We leave it to the reader to verify that $\mathrm{tvd}$ is a metric on sets of distributions $\mathcal{D}(X)$, and that its values are in the unit interval $[0,1]$.
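The total variation distance is easy to compute directly. The following is a minimal sketch, assuming distributions are represented as Python dictionaries from elements to exact fractions; the function name `tvd` and the two sample distributions are illustrative choices, not taken from the paper.

```python
from fractions import Fraction

def tvd(omega1, omega2):
    """Total variation distance: half the sum of pointwise absolute differences."""
    # Elements outside a distribution's support implicitly have probability 0.
    points = set(omega1) | set(omega2)
    return sum(abs(omega1.get(x, 0) - omega2.get(x, 0)) for x in points) / 2

# Two hypothetical distributions on the set {a, b, c}.
omega = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
rho = {"a": Fraction(1, 4), "c": Fraction(3, 4)}

print(tvd(omega, rho))    # 3/4
print(tvd(omega, omega))  # 0
```

Exact fractions are used so that the metric properties (for instance symmetry, and distance zero only for equal distributions) can be checked without floating point noise.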

Example 2

Consider the sets and with ‘joint’ distribution given by . The first and second marginal of , written as and , are: and . We immediately see that is not the same as the product of its marginals, since . This means is ‘entwined’, see [24, 18]. One way to associate a number with this entwinedness is to take the distance between and the product of its marginals. It can be computed as:
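The concrete values of this example did not survive in the text above. As a hedged illustration of the same kind of computation, the sketch below uses an assumed joint state that puts probability 1/2 on each of two 'diagonal' pairs; the element names, the helper names `marginals`, `product` and `tvd`, and the joint state itself are assumptions made for illustration.

```python
from fractions import Fraction

def tvd(omega1, omega2):
    """Total variation distance between two finite distributions (dicts)."""
    points = set(omega1) | set(omega2)
    return sum(abs(omega1.get(x, 0) - omega2.get(x, 0)) for x in points) / 2

def marginals(joint):
    """First and second marginal of a joint distribution on pairs (x, y)."""
    first, second = {}, {}
    for (x, y), prob in joint.items():
        first[x] = first.get(x, 0) + prob
        second[y] = second.get(y, 0) + prob
    return first, second

def product(omega1, omega2):
    """Product distribution on pairs."""
    return {(x, y): p * q for x, p in omega1.items() for y, q in omega2.items()}

# Assumed joint state: 1/2|a,0> + 1/2|b,1> (hypothetical, for illustration).
half = Fraction(1, 2)
joint = {("a", 0): half, ("b", 1): half}

m1, m2 = marginals(joint)
print(joint == product(m1, m2))       # False: the joint state is entwined
print(tvd(joint, product(m1, m2)))    # 1/2
```

For this assumed state both marginals are uniform, so the product of the marginals is uniform on four pairs and the distance to the joint state is 1/2.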

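For a function $f\colon X \to \mathcal{D}(Y)$ there are two associated ‘transformation’ functions, namely state transformation (aka. Kleisli extension) $f_*\colon \mathcal{D}(X) \to \mathcal{D}(Y)$ and predicate transformation $f^*\colon [0,1]^Y \to [0,1]^X$. They are defined as:

$$f_*(\omega)(y) \;=\; \sum_{x} \omega(x)\cdot f(x)(y) \qquad\qquad f^*(q)(x) \;=\; \sum_{y} f(x)(y)\cdot q(y) \qquad (4)$$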
Maps $p\colon X \to [0,1]$ are called (fuzzy) predicates on $X$. In the special case where the outcomes are in the (discrete) subset $\{0,1\} \subseteq [0,1]$, the predicate is called sharp. These sharp predicates correspond to subsets $U \subseteq X$, via the indicator function $\mathbf{1}_U\colon X \to \{0,1\}$.

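For a state $\omega \in \mathcal{D}(X)$ and a predicate $p \in [0,1]^X$ we write $\omega \models p$ for the validity of predicate $p$ in state $\omega$, defined as the expected value $\sum_{x} \omega(x)\cdot p(x)$ in $[0,1]$. Thus, ; the latter sum is commonly written as . Further, the fundamental validity transformation equality holds: $f_*(\omega) \models q \;=\; \omega \models f^*(q)$.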
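The interplay between state transformation, predicate transformation and validity can be checked on a small example. This is a minimal sketch under the standard definitions (pushing a state forward along a Kleisli map, pulling a predicate back, and validity as expected value); the function names and the sample map, state and predicate are hypothetical.

```python
from fractions import Fraction

def validity(omega, p):
    """Expected value of predicate p (dict into [0,1]) in state omega."""
    return sum(prob * p[x] for x, prob in omega.items())

def state_transform(f, omega):
    """Push state omega on X forward along f : X -> D(Y)."""
    out = {}
    for x, prob in omega.items():
        for y, q in f[x].items():
            out[y] = out.get(y, 0) + prob * q
    return out

def predicate_transform(f, q):
    """Pull predicate q on Y back along f : X -> D(Y)."""
    return {x: sum(prob * q[y] for y, prob in dist.items()) for x, dist in f.items()}

# Hypothetical Kleisli map from X = {a, b} to distributions on Y = {0, 1}.
f = {
    "a": {0: Fraction(1, 3), 1: Fraction(2, 3)},
    "b": {0: Fraction(1)},
}
omega = {"a": Fraction(1, 4), "b": Fraction(3, 4)}   # state on X
q = {0: Fraction(1, 2), 1: Fraction(1, 5)}           # fuzzy predicate on Y

lhs = validity(state_transform(f, omega), q)
rhs = validity(omega, predicate_transform(f, q))
print(lhs, rhs)  # 9/20 9/20
```

Both sides evaluate the same double sum over pairs (x, y), only grouped differently, which is why they agree.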

We conclude this subsection with a standard redescription of the total variation distance, see e.g. [13, 34]. It uses validity $\models$, as described above. Such ‘validity’ based distances will form an important theme in this paper. The proof of the next result is standard but not trivial and is included in the appendix, for the convenience of the reader.

Proposition 3

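Let $X$ be an arbitrary set, with states $\omega_1, \omega_2 \in \mathcal{D}(X)$. Then:

$$\mathrm{tvd}(\omega_1, \omega_2) \;=\; \bigvee_{p \in [0,1]^X} \big|\, \omega_1 \models p - \omega_2 \models p \,\big| \;=\; \max_{U \subseteq X}\, \big( \omega_1 \models \mathbf{1}_U \;-\; \omega_2 \models \mathbf{1}_U \big)$$

We write maximum ‘$\max$’ instead of join $\bigvee$ to express that the supremum is actually reached by a subset (sharp predicate). Completeness of the Kantorovich metric is an extensive topic, but here we only need the following (standard) result. Since there is a short proof, it is included.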
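The claim that the supremum is reached by a sharp predicate can be tested by brute force on small examples. The sketch below, with hypothetical helper names and sample distributions, enumerates all subsets of the combined support and compares the best difference of validities with the total variation distance; it also computes the subset of points where the first distribution is strictly larger, which is one natural witness.

```python
from fractions import Fraction
from itertools import chain, combinations

def tvd(omega1, omega2):
    points = set(omega1) | set(omega2)
    return sum(abs(omega1.get(x, 0) - omega2.get(x, 0)) for x in points) / 2

def subset_validity(omega, subset):
    """Validity of the indicator predicate of `subset` in state omega."""
    return sum(omega.get(x, 0) for x in subset)

def max_over_subsets(omega1, omega2):
    """Maximum over all subsets U of (omega1 |= 1_U) - (omega2 |= 1_U)."""
    points = sorted(set(omega1) | set(omega2))
    subsets = chain.from_iterable(
        combinations(points, k) for k in range(len(points) + 1))
    return max(subset_validity(omega1, u) - subset_validity(omega2, u)
               for u in subsets)

omega = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
rho = {"a": Fraction(1, 4), "c": Fraction(3, 4)}

# Witness: the points where omega is strictly larger than rho.
witness = {x for x in set(omega) | set(rho) if omega.get(x, 0) > rho.get(x, 0)}

print(max_over_subsets(omega, rho) == tvd(omega, rho))  # True
print(sorted(witness))                                   # ['a', 'b']
```

The enumeration is exponential in the size of the support, so this is only a sanity check for tiny examples, not a way to compute the distance.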

Lemma 4

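If $X$ is a finite set, then $\mathcal{D}(X)$, with the total variation distance $\mathrm{tvd}$, is a complete metric space.

Proof. Let $X = \{x_1, \ldots, x_n\}$ and let $(\omega_i)_{i \in \mathbb{N}}$ be a Cauchy sequence in $\mathcal{D}(X)$. For each $j$ we have $|\omega_i(x_j) - \omega_k(x_j)| \le 2\cdot\mathrm{tvd}(\omega_i, \omega_k)$. Hence, the sequence $\omega_i(x_j) \in [0,1]$ is Cauchy too, say with limit $r_j \in [0,1]$. Take $\omega = \sum_j r_j |\,x_j\,\rangle$. This is the limit of the $\omega_i$.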

2.2 Discrete probability distributions on metric spaces

A metric $d$ on a set $X$ is called 1-bounded if it takes values in the unit interval $[0,1]$, that is, if it has type $d\colon X \times X \to [0,1]$. We write $\mathbf{Met}$ for the category with such 1-bounded metric spaces as objects, and with non-expansive functions $f$ between them, satisfying $d(f(x), f(x')) \le d(x, x')$. From now on we assume that all metric spaces in this paper are 1-bounded. For example, each set carries a discrete metric, where points have distance $0$ if they are equal, and $1$ otherwise.

For a metric space $X$ and two functions $f, g\colon A \to X$ from some set $A$ to $X$ there is the supremum distance given by:

$$d(f, g) \;=\; \bigvee_{a \in A} d_X\big(f(a), g(a)\big) \qquad (5)$$

A ‘metric predicate’ on a metric space $X$ is a non-expansive function $p\colon X \to [0,1]$. These predicates carry the above supremum distance $d$. We use them in the following definition of Kantorovich distance, which transfers the validity description of Proposition 3 to the metric setting.

Definition 5

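Let $\omega_1, \omega_2 \in \mathcal{D}(X)$ be two discrete distributions on (the underlying set of) a metric space $X$. The Kantorovich distance between them is defined as:

$$d(\omega_1, \omega_2) \;=\; \bigvee_{p \in \mathbf{Met}(X, [0,1])} \big|\, \omega_1 \models p \;-\; \omega_2 \models p \,\big| \qquad (6)$$

This makes $\mathcal{D}(X)$ a (1-bounded) metric space.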
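To see the Kantorovich distance of Definition 5 at work in the smallest non-trivial case, the following worked computation (a sketch, not taken from the paper) uses a hypothetical two-point metric space $X = \{x, y\}$ with $d(x,y) = \delta \in [0,1]$ and distributions $\omega_1 = r|x\rangle + (1-r)|y\rangle$ and $\omega_2 = s|x\rangle + (1-s)|y\rangle$. The names $\delta$, $r$, $s$ are assumptions made for illustration.

```latex
\begin{align*}
\omega_1 \models p \;-\; \omega_2 \models p
  &= r\,p(x) + (1-r)\,p(y) - s\,p(x) - (1-s)\,p(y) \\
  &= (r - s)\,\bigl(p(x) - p(y)\bigr), \\
d(\omega_1, \omega_2)
  &= \bigvee_{p \text{ non-expansive}} |r - s| \cdot |p(x) - p(y)|
   \;=\; |r - s| \cdot \delta .
\end{align*}
```

The join is reached by the non-expansive predicate with $p(x) = \delta$ and $p(y) = 0$. For $\delta = 1$, the discrete metric, the result is $|r - s|$, which agrees with the total variation distance of the same two distributions.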

The Kantorovich-Wasserstein duality theorem gives an equivalent description of this distance in terms of joint states and ‘couplings’, see [28, 34] for details. Here we concentrate on relating the Kantorovich distance to the monad structure of distributions. The next lemma collects some basic, folklore facts.

Lemma 6

Let be metric spaces.

  1. The unit function given by is non-expansive.

  2. For each non-expansive function the corresponding state transformer from (4) is non-expansive.

    As special cases, the multiplication map of the monad is non-expansive, and validity in its first argument as well.

  3. If and are non-expansive, then so is . Moreover, the function is itself non-expansive, wrt. the supremum distance (5).

    As a result, validity is non-expansive in its second argument too.

  4. Taking convex combinations of distributions satisfies: for ,

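  Proof. We do points (1) and (4) and leave the others to the reader. The crucial point used to show (1), that the unit map $\eta$ is non-expansive, is: $\eta(x) \models p = p(x)$. Hence we are done because the join in (6) is over non-expansive functions $p$ in:

    $$d\big(\eta(x), \eta(y)\big) \;=\; \bigvee_{p} \big|\, \eta(x) \models p - \eta(y) \models p \,\big| \;=\; \bigvee_{p} \big|\, p(x) - p(y) \,\big| \;\le\; d(x, y).$$
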
    For point (4) we first notice that for and ,

    where is used as (non-expansive) predicate on . Hence for with ,

Corollary 7

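The monad $\mathcal{D}$ on $\mathbf{Sets}$ lifts to a monad, also written as $\mathcal{D}$, on the category $\mathbf{Met}$, and commutes with forgetful functors, as in:

$$\begin{array}{ccc} \mathbf{Met} & \xrightarrow{\;\;\mathcal{D}\;\;} & \mathbf{Met} \\ \downarrow & & \downarrow \\ \mathbf{Sets} & \xrightarrow{\;\;\mathcal{D}\;\;} & \mathbf{Sets} \end{array} \qquad (7)$$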

We write for the category of Eilenberg-Moore algebras of this lifted monad, with ‘convex metric spaces’ as objects, see below.

The lifting (7) can be seen as a finite version of a similar lifting result for the ‘Kantorovich’ functor in [5]. This captures the tight Borel probability measures on a metric space . The above lifting (7) is a special case of the generic lifting of functors on sets to functors on metric spaces described in [3] (see esp. Example 3.3).

The category of the monad contains convex metric spaces, consisting of:

  1. a convex set , that is, a set with an Eilenberg-Moore algebra of the distribution monad on ;

  2. a metric on ;

  3. a connection between the convex and the metric structure, via the requirement that the algebra map is non-expansive: , for all distributions .

The maps in are both affine and non-expansive. We shall write for the full subcategory of convex complete metric spaces.

Example 8

The unit interval is a convex metric space, via its standard (Euclidean) metric, and its standard convex structure, given by the algebra map defined by the ‘expected value’ operation:

The identity map is a predicate on that satisfies:

This allows us to show that is non-expansive:

In fact, we can see this as a special case of non-expansiveness of multiplication maps from Lemma 6 (2): indeed, , for the two-element set , and the algebra corresponds to the multiplication .

2.3 Density matrices on Hilbert spaces

The analogue of a probability distribution in quantum theory is often simply called a state. We first consider states of Hilbert spaces (over $\mathbb{C}$), and consider the more general (and abstract) situation of states on von Neumann algebras in subsection 2.5.

A state of a Hilbert space is a density operator, that is, it is a positive linear map whose trace is one: . Recall that the trace of a positive operator is given by , where  is any orthonormal basis for ; this value does not depend on the choice of basis , but might equal  [1, Def. 2.51]. The same formula also works for when  is not necessarily positive, but bounded with — where and is the adjoint of  and where the square root is determined as the unique positive operator with . Such , which are aptly called trace-class operators, always have finite trace: , see [1, Def. 2.5{4,6}]. When  is finite dimensional, any operator  is trace-class, and when represented as a matrix, its trace can be computed as the sum of all elements on the diagonal. If  is a density operator, then the associated matrix is called a density matrix. We refer for more information to for instance [1], and to [31, 32, 35] for the finite-dimensional case.

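A linear map $A\colon \mathscr{H} \to \mathscr{H}$ is called self-adjoint if $A^\dagger = A$ and positive if it is of the form $A = B^\dagger B$. This yields a partial order, with $A \le B$ iff $B - A$ is positive. A predicate on $\mathscr{H}$ is a linear map $p\colon \mathscr{H} \to \mathscr{H}$ with $0 \le p \le \mathrm{id}$. It is called sharp (or a projection) if $p^2 = p$. Predicates are also called effects. We write $\mathcal{E}f(\mathscr{H})$ for the set of effects of $\mathscr{H}$. For a state $\rho$ of $\mathscr{H}$ the validity $\rho \models p$ is defined as the trace $\mathrm{tr}(\rho\, p)$. To make sense of this definition we should mention that the product $AB$ of bounded operators $A, B$ is trace-class when either $A$ or $B$ is trace-class [1, Def. 2.54], so $\rho\, p$ is trace-class because $\rho$ is.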

Definition 9

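Let $\rho_1, \rho_2$ be two quantum states of the same Hilbert space. The trace distance $\mathrm{trd}(\rho_1, \rho_2)$ between them is defined as:

$$\mathrm{trd}(\rho_1, \rho_2) \;=\; \tfrac{1}{2}\,\mathrm{tr}\big(|\rho_1 - \rho_2|\big) \;=\; \tfrac{1}{2}\,\mathrm{tr}\Big(\sqrt{(\rho_1 - \rho_2)^\dagger (\rho_1 - \rho_2)}\Big)$$

This definition involves the square root of a positive operator $T$. With the examples below in mind it is worth pointing out that in the finite-dimensional case, when $T$ is essentially a positive matrix, the square root of $T$ can be computed by first diagonalising the matrix $T = V D V^\dagger$, where $D$ is a diagonal matrix; then one forms the diagonal matrix $\sqrt{D}$ by taking the square roots of the elements on the diagonal in $D$; finally the square root of $T$ is $V \sqrt{D}\, V^\dagger$.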

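The trace distance $\mathrm{trd}$ is an extension of the total variation distance $\mathrm{tvd}$: given two discrete distributions $\omega_1, \omega_2$ on the same set, the union of their supports is a finite set, say with $n$ elements. We can represent $\omega_1, \omega_2$ via diagonal $n \times n$ matrices as density operators $\widehat{\omega_1}, \widehat{\omega_2}$ on $\mathbb{C}^n$. They are states, by construction. Then $\mathrm{trd}(\widehat{\omega_1}, \widehat{\omega_2}) = \mathrm{tvd}(\omega_1, \omega_2)$.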
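For a single qubit the trace distance has a closed form, which makes the relation to total variation distance easy to check. The sketch below assumes the usual definition (half the trace of the absolute value of the difference) and is restricted to 2×2 density matrices given as nested lists; the function names are hypothetical. The difference of two density matrices is Hermitian with trace zero, so its eigenvalues are plus and minus the same number.

```python
import math

def trace_distance_qubit(rho, sigma):
    """Trace distance between two 2x2 density matrices (nested lists).

    The difference has the form [[a, b], [conj(b), -a]] with real a, so its
    eigenvalues are +/- sqrt(a^2 + |b|^2); half the sum of their absolute
    values is sqrt(a^2 + |b|^2).
    """
    a = complex(rho[0][0] - sigma[0][0]).real
    b = complex(rho[0][1] - sigma[0][1])
    return math.sqrt(a * a + abs(b) ** 2)

def diagonal_state(r):
    """Embed the distribution r|0> + (1-r)|1> as a diagonal density matrix."""
    return [[r, 0.0], [0.0, 1.0 - r]]

# Diagonal case: agrees with total variation distance |r - s| on two points.
print(trace_distance_qubit(diagonal_state(0.5), diagonal_state(1.0)))  # 0.5

# Non-diagonal case: |0><0| versus |+><+|.
zero = [[1.0, 0.0], [0.0, 0.0]]
plus = [[0.5, 0.5], [0.5, 0.5]]
print(trace_distance_qubit(zero, plus))  # sqrt(1/2), about 0.7071
```

For larger matrices one would diagonalise the difference numerically instead of using this closed form.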

Example 10

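We describe the quantum analogue of Example 2, involving the ‘Bell’ state. As a vector in $\mathbb{C}^2 \otimes \mathbb{C}^2$ the Bell state is usually described as $|b\rangle = \tfrac{1}{\sqrt{2}}\big(|00\rangle + |11\rangle\big)$. The corresponding density matrix is the following $4 \times 4$ matrix.

$$\beta \;=\; |b\rangle\langle b| \;=\; \frac{1}{2}\begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}$$

Its two marginals (partial traces) are equal $2 \times 2$ matrices, namely:

$$\beta_1 \;=\; \beta_2 \;=\; \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

The product state $\beta_1 \otimes \beta_2 = \tfrac{1}{4} I_4$ is obtained as Kronecker product, see e.g. [31].

We can now ask the same question as in Example 2, namely what is the distance between the Bell state and the product of its marginals. We recall that the Bell state is ‘maximally entangled’ and that the quantum theory allows, informally stated, higher levels of entanglement than in classical probability theory. Hence we expect an outcome that is higher than the value $\tfrac{1}{2}$ obtained in Example 2 for the classical maximally entwined state.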

The key steps are:


In the earlier version of this paper [19] these distance computations are generalised to $n$-ary products, both for classical and for quantum states. Both distances then tend to $1$, as $n$ goes to infinity, but the classical distance is one step behind, via formulas $1 - \tfrac{1}{2^{n-1}}$ versus $1 - \tfrac{1}{2^{n}}$. Here we only consider $n = 2$.
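The distance between the Bell state and the product of its marginals can be worked out by hand. The derivation below is a sketch under stated assumptions: the Bell vector is taken to be the standard $\tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, both marginals are $\tfrac{1}{2} I_2$ so that their product is $\tfrac{1}{4} I_4$, and the trace distance is half the trace of the absolute value of the difference. The symbol $A$ for the difference is introduced here for illustration.

```latex
\begin{align*}
A &= |b\rangle\langle b| - \tfrac{1}{4} I_4,
  \qquad |b\rangle = \tfrac{1}{\sqrt{2}}\bigl(|00\rangle + |11\rangle\bigr) \\
A\,|b\rangle &= \bigl(1 - \tfrac{1}{4}\bigr)|b\rangle = \tfrac{3}{4}\,|b\rangle,
  \qquad
A\,|v\rangle = -\tfrac{1}{4}\,|v\rangle \ \text{ for each } |v\rangle \perp |b\rangle \\
\mathrm{tr}\,|A| &= \tfrac{3}{4} + 3 \cdot \tfrac{1}{4} = \tfrac{3}{2} \\
\tfrac{1}{2}\,\mathrm{tr}\,|A| &= \tfrac{3}{4}
\end{align*}
```

Under these assumptions the quantum distance is $\tfrac{3}{4}$. The analogous classical computation, for a joint state with probability $\tfrac{1}{2}$ on each of two diagonal pairs against the uniform distribution on four pairs, gives $\tfrac{1}{2}$.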

The following result is a quantum analogue of Proposition 3. Our formulation generalises the standard formulation of e.g. [31, §9.2] and its proof to arbitrary, not necessarily finite-dimensional Hilbert spaces. We’ll see an even more general version involving von Neumann algebras later on.

Proposition 11

For states on the same Hilbert space ,

As before, the maximum means the supremum is actually reached by a sharp effect. The proof of this result is in the appendix.

2.4 Preliminaries on von Neumann algebras

Our final example of a distance function requires a short introduction to von Neumann algebras. We do not however pretend to explain the basics of the theory of von Neumann algebras here; for this we refer to [26]. We just recall some elementary definitions and facts which are relevant here.

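To define von Neumann algebras we must speak about C*-algebras first.

Definition 12

A C*-algebra $\mathscr{A}$ is a complex vector space endowed with:

  1. an associative multiplication $\cdot\colon \mathscr{A} \times \mathscr{A} \to \mathscr{A}$ that is linear in both coordinates;

  2. an element $1 \in \mathscr{A}$, called unit, such that $1 \cdot a = a = a \cdot 1$ for all $a \in \mathscr{A}$;

  3. a unary operation $(-)^*\colon \mathscr{A} \to \mathscr{A}$, called involution, such that $(a^*)^* = a$, $(a \cdot b)^* = b^* \cdot a^*$, $(\lambda a)^* = \overline{\lambda}\, a^*$, and $(a + b)^* = a^* + b^*$ for all $a, b \in \mathscr{A}$ and $\lambda \in \mathbb{C}$;

  4. a complete norm, $\|-\|$, with $\|a \cdot b\| \le \|a\| \cdot \|b\|$ and $\|a^* \cdot a\| = \|a\|^2$ for all $a, b \in \mathscr{A}$.

N.B. In the literature the unit is usually not included as part of the definition of C*-algebra, and what we have defined above is called a unital C*-algebra instead.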

Two types of elements deserve special mention: an element $a$ of a C*-algebra $\mathscr{A}$ is called self-adjoint when $a^* = a$, and positive when $a = b^* b$ for some $b \in \mathscr{A}$. (Footnote 1: In [26] a different but in the end equivalent definition of “positive” is used, see Theorem 4.2.6 of [26].) Elementary matters relating to self-adjoint elements are usually easily established: the reader should have no trouble verifying, for example, that every element $a$ of a C*-algebra $\mathscr{A}$ can be written as $a = b + ic$ for unique self-adjoint $b, c \in \mathscr{A}$ (namely, $b = \tfrac{1}{2}(a + a^*)$ and $c = \tfrac{1}{2i}(a - a^*)$). On the other hand, the everyday properties of the positive elements are often remarkably difficult to prove from basic principles, such as the facts that the sum of positive elements is positive, that the set $\mathscr{A}_+$ of positive elements of $\mathscr{A}$ is norm closed (see parts (iii) and (i) of Theorem 4.2.2 of [26]), that every positive element $a$ has a unique positive square root $\sqrt{a}$ (see Theorem 4.2.6(ii) of [26]), and that every self-adjoint element $a$ of $\mathscr{A}$ may be written uniquely as $a = a_+ - a_-$ where $a_+, a_- \in \mathscr{A}_+$ with $a_+ a_- = 0$ (see Proposition 4.2.3(iii) of [26]).

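The elements of a C*-algebra $\mathscr{A}$ are ordered by $a \le b$ when $b - a$ is positive. We write $[0,1]_{\mathscr{A}}$ for the subset of effects $\{a \in \mathscr{A} \mid 0 \le a \le 1\}$; they will be used as quantum predicates. Such an effect $a$ is called sharp (or a projection) if $a^2 = a$.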

Definition 13

A C*-algebra $\mathscr{A}$ is a von Neumann algebra (aka. W*-algebra) if firstly the unit interval $[0,1]_{\mathscr{A}}$ is a directed complete partial order (dcpo), and secondly the positive linear functionals $\mathscr{A} \to \mathbb{C}$ that preserve these (directed) suprema separate the elements of $\mathscr{A}$. In the notation introduced below this means that follows if for all states .

There are several equivalent alternative definitions of the notion of ‘von Neumann algebra’, but this one, essentially due to Kadison (see [25]), is most convenient here.

We consider as morphisms between von Neumann algebras: linear maps which are unital (that is, ), positive ( implies ) and normal. The latter normality requirement means that the restriction preserves directed joins (i.e. is Scott continuous). This yields a category of von Neumann algebras. It occurs naturally in opposite form, as .

Each non-zero map in has operator norm equal to 1, i.e. , where . Below we apply the operator norm to a (pointwise) difference of parallel maps in . Using as distance, each homset of is a complete metric space. (Footnote 2: Here’s a proof that  is complete: We must show that a Cauchy sequence in converges. By Theorem 1.5.6 of [26] -converges to a bounded linear map . It’s clear that  will be unital, and positive (since the norm-limit of positive elements of  is again positive, see Theorem 4.2.2 of [26]), so it remains to be shown that  is normal. Given directed  we must show that . For this it suffices to show that for all positive normal linear functionals , which is the case when  is normal. But since  -converges to , this is indeed so (because the predual of  is complete, see the text under Definition 7.4.1 of [26]).)

2.5 States of von Neumann algebras

A state of a von Neumann algebra is a morphism in . We write for the set of states; it is easy to see that it is a convex set. For an effect we write for the value . When is the von Neumann algebra of bounded operators on a Hilbert space , then ‘effect’ has a consistent meaning, since . Moreover, density operators on are in one–one correspondence with states of , via ; in fact, this correspondence extends to a linear bipositive isometry between trace-class operators on  and normal — but not necessarily positive — functionals on  (see [1, Thm 2.68]).

For states of von Neumann algebras we use half of the operator norm as distance, since it coincides with the ‘validity’ distance whose formulation is by now familiar. The proof is again delegated to the appendix.

Proposition 14

Let be two states of a von Neumann algebra . Their validity distance , as defined on the left below, satisfies:

Via the last equation it is easy to see that is a complete metric.

Corollary 15

Let be a von Neumann algebra.

  1. For each predicate the ‘evaluate at ’ map is both affine and non-expansive.

  2. The convex map is non-expansive.

  3. The ‘states’ functor restricts to a functor .

    1. It is standard that the map is affine, so we concentrate on its non-expansiveness: for states we have:

    2. Suppose we have two formal convex combinations and in . The map is non-expansive since:

    3. We have to prove that for a positive unital map between von Neumann algebras the associated state transformer is affine and non-expansive. The former is standard, so we concentrate on non-expansiveness. Let be states of . Then:

3 Distances between effects (predicates)

There are several closely connected views on what are predicates in a probabilistic setting. Informally, one can consider fuzzy predicates $X \to [0,1]$ on a space $X$, or only the sharp ones $X \to \{0,1\}$. Instead of restricting oneself to truth values in $[0,1]$, one can use $\mathbb{R}$-valued predicates $X \to \mathbb{R}$, which are often called ‘observables’. Alternatively, one can restrict to the non-negative ones $X \to \mathbb{R}_{\ge 0}$. There are ways to translate between these views, by restriction, or by completion. The relevant underlying mathematical structures are: effect modules, order unit spaces, and ordered cones. Via suitable restrictions, see [22, Lem. 13, Thm. 14] for details, the categories of these structures are equivalent. Here we choose to use effect modules because they capture $[0,1]$-valued predicates, which we consider to be most natural. Moreover, there is a standard adjunction between effect modules and the convex sets that we have been using in the previous section. This adjunction will be explored in the next section.

In this section we recall some basic facts from the theory of effect modules (see [16, 9, 23]), and add a few new ones, especially related to $\omega$-joins and metric completeness, see Proposition 18. With these results in place, we observe that in our main examples (fuzzy predicates on a set and effects in a von Neumann algebra) the induced ‘Archimedean’ metric can also be expressed using validity $\models$, but now in dual form wrt. the previous section: for the distance between two predicates we now take a join over all states and use the validities of the two predicates in these states.

We briefly recall what an effect module is, and refer to [16] and its references for more details. This involves three steps.

  1. A partial commutative monoid (PCM) is given by a set $M$ with an element $0 \in M$ and a partial binary operation $\ovee$ on $M$ which is commutative and associative, in a suitably partial sense, and has $0$ as unit element. (Footnote 3: That is: $x \ovee y$ is defined iff $y \ovee x$ is defined, and they’re equal in that case; and $x \ovee (y \ovee z)$ is defined iff $(x \ovee y) \ovee z$ is defined, and they’re equal in that case.)

  2. An effect algebra is a PCM in which each element $x$ has a unique orthosupplement $x^\perp$ with $x \ovee x^\perp = 1$, where $1 = 0^\perp$. Moreover, if $x \ovee 1$ is defined, then $x = 0$. Each effect algebra carries a partial order given by: $x \le y$ iff $x \ovee z = y$ for some $z$. It satisfies $x \le y$ iff $y^\perp \le x^\perp$. For more information on effect algebras we refer to [12].

  3. An effect module is an effect algebra $E$ with a (total) scalar multiplication operation $[0,1] \times E \to E$ which acts as a bihomomorphism: it preserves in each coordinate separately scalar multiplications and partial sums $\ovee$, when defined, and maps the pair $(1, 1)$ to $1$.
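The three steps above can be made concrete for fuzzy predicates on a finite set, where the partial sum is pointwise addition, defined only when no value exceeds 1. The following is a minimal sketch with hypothetical function names (`osum`, `orthosupplement`, `scale`, `leq`) and predicates represented as dictionaries of exact fractions; an undefined sum is signalled by `None`, which is a design choice of this sketch.

```python
from fractions import Fraction

def osum(p, q):
    """Partial sum: defined only when p(x) + q(x) <= 1 everywhere, else None."""
    total = {x: p[x] + q[x] for x in p}
    return total if all(v <= 1 for v in total.values()) else None

def orthosupplement(p):
    """Pointwise complement x -> 1 - p(x)."""
    return {x: 1 - v for x, v in p.items()}

def scale(r, p):
    """Scalar multiplication with r in [0,1]."""
    return {x: r * v for x, v in p.items()}

def leq(p, q):
    """Pointwise order."""
    return all(p[x] <= q[x] for x in p)

p = {"a": Fraction(1, 4), "b": Fraction(1, 2)}
q = {"a": Fraction(1, 2), "b": Fraction(3, 4)}
one = {"a": Fraction(1), "b": Fraction(1)}

print(osum(p, orthosupplement(p)) == one)  # True: p plus its orthosupplement is 1
print(osum(p, q))                          # None: 1/2 + 3/4 exceeds 1 at b
print(leq(p, q), leq(orthosupplement(q), orthosupplement(p)))  # True True
```

The last line illustrates that the orthosupplement reverses the order, and the middle line shows why the sum has to be a partial operation.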

We write $\mathbf{EMod}$ for the category of effect modules. A map $f$ in $\mathbf{EMod}$ preserves $1$, sums $\ovee$, when they exist, and scalar multiplication; such an $f$ then also preserves orthosupplements. There are (non-full) subcategories of $\mathbf{EMod}$ of directed complete and $\omega$-complete effect modules, with joins of directed (or countable ascending) subsets, with respect to the existing order of effect algebras. The sum and scalar multiplication operations are required to preserve these joins in each argument separately. (Footnote 4: In fact, it can be shown that maps preserve joins automatically, see Lemma 17 (1i). Preservation by scalar multiplication can also be proved, but is outside the scope of this paper.) Since taking the orthosupplement is an order anti-isomorphism it sends joins to meets and vice-versa. In particular, $\omega$-/directed meets exist in $\omega$-/directed complete effect modules. Morphisms in these two subcategories are homomorphisms of effect modules that additionally preserve the relevant joins.

Below it is shown how this effect module structure arises naturally in our main examples. The predicate functors are special cases of constructions for ‘effectuses’, see [16].

Lemma 16
  1. For the distribution monad on there is a ‘predicate’ functor on its Kleisli category:

    This functor is faithful, and it is full (& faithful) if we restrict it to the subcategory with finite sets as objects.

  2. There is also a ‘predicate’ functor:

    This functor is full and faithful.

Writing on both sides in point (2) looks rather formal, but makes sense since the category of von Neumann algebras is naturally used in opposite form, see also the next section.

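    1. It is easy to see that the set $[0,1]^X$ of fuzzy predicates on a set $X$ is an effect module, in which a sum $p \ovee q$ exists if $p(x) + q(x) \le 1$ for all $x \in X$, and in that case $(p \ovee q)(x) = p(x) + q(x)$. Clearly, $p^\perp(x) = 1 - p(x)$ and $(r \cdot p)(x) = r \cdot p(x)$ for a scalar $r \in [0,1]$. The induced order on $[0,1]^X$ is the pointwise order, which is (directed) complete.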

      For a Kleisli map the predicate transformation map from (4) preserves the effect module structure. Moreover, it is Scott-continuous by the following argument. Let be a directed collection of predicates, and let . Write the support of as . Then:

      Assume for , and let , . Write for the singleton predicate that is on and zero everywhere else. Then . Hence , showing that is faithful.

      Now let be finite sets and a map in . Define . We claim that is a distribution on , say, and that . This works as follows.