Nawrotzki's Algorithm for the Countable Splitting Lemma, Constructively

We reprove the countable splitting lemma by adapting Nawrotzki's algorithm, which produces a sequence that converges to a solution. Our algorithm combines Nawrotzki's approach with taking finite cut-offs. It is constructive in the sense that each term of the iteratively built approximating sequence, as well as the error between the approximants and the solution, is computable with finitely many algebraic operations.


1 Explanation of what is going on …

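Given a measure $\mu$ on a product space $X_1\times X_2$, the $j$-th marginal $\mu_j$ of $\mu$ is the push-forward of $\mu$ under the $j$-th canonical projection $\pi_j\colon X_1\times X_2\to X_j$. Explicitly, this is

$$\mu_j(A):=\mu\big(\pi_j^{-1}(A)\big)$$

for all measurable sets $A\subseteq X_j$.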
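For a measure on a product of two countable discrete spaces, the push-forward under $\pi_1$ and $\pi_2$ amounts to taking the row and column sums of the mass matrix. The following minimal Python sketch assumes finitely many atoms and exact rational masses (`fractions.Fraction`). The function name `marginals` and the example matrix are ours and serve only as an illustration.

```python
from fractions import Fraction


def marginals(lam):
    """Marginals of a discrete measure on a finite product {0..N-1} x {0..M-1}.

    lam[n][m] is the mass of the point (n, m).  The first marginal is the
    push-forward under the projection (n, m) -> n, i.e. the row sums; the
    second marginal is the push-forward under (n, m) -> m, i.e. the column sums.
    """
    first = [sum(row, Fraction(0)) for row in lam]
    second = [sum(col, Fraction(0)) for col in zip(*lam)]
    return first, second


lam = [[Fraction(1, 4), Fraction(1, 4)],
       [Fraction(0), Fraction(1, 2)]]
mu_1, mu_2 = marginals(lam)
```

Here `mu_1` is $(\frac12,\frac12)$ and `mu_2` is $(\frac14,\frac34)$.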

In his fundamental paper [18], Strassen investigated the existence of measures on a product which have prescribed marginals and satisfy additional constraints of a certain form. The result stated below is known as Strassen’s theorem on stochastic domination. (The result is a corollary of [18, Theorem 11]. Curiously, it is not even explicitly stated in Strassen’s paper, but only mentioned in one sentence.) The stated variant is taken from [17, Corollary 7]; a different proof can be found in [13]. To formulate it, we need some notation.

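• Let $X$ be a Hausdorff space, and let $\preccurlyeq$ be a partial order on $X$ which is closed as a subset of $X\times X$. A subset $A\subseteq X$ is upward closed w.r.t. $\preccurlyeq$ if

 $$\forall x\in X,\,y\in A.\; y\preccurlyeq x\Rightarrow x\in A.$$
• For two positive Borel measures $\mu,\nu$ on $X$ we write $\mu\preccurlyeq\nu$ if $\mu(A)\le\nu(A)$ holds for all upward closed Borel sets $A\subseteq X$.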
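On a finite set, both notions can be checked by brute force. Here is a minimal Python sketch. It assumes a finite ground set with the discrete topology, so that every subset is Borel and every partial order is closed. The order is passed as a predicate `leq`, and masses are exact rationals. All names are ours. Enumerating all subsets costs $2^N$ steps, so the sketch is only meant for small examples.

```python
from fractions import Fraction
from itertools import combinations


def is_upward_closed(A, points, leq):
    """A is upward closed iff y in A and y <= x together imply x in A."""
    return all(x in A for y in A for x in points if leq(y, x))


def upward_closed_sets(points, leq):
    """Yield every upward closed subset of `points` (exponential brute force)."""
    for r in range(len(points) + 1):
        for combo in combinations(points, r):
            A = frozenset(combo)
            if is_upward_closed(A, points, leq):
                yield A


def mass(measure, A):
    """Total mass of the set A under a measure given as a list of point masses."""
    return sum((measure[x] for x in A), Fraction(0))


def dominated(mu, nu, points, leq):
    """True iff mu(A) <= nu(A) for every upward closed A."""
    return all(mass(mu, A) <= mass(nu, A) for A in upward_closed_sets(points, leq))


def leq(x, y):
    return x <= y                      # example order: the chain 0 <= 1 <= 2


points = [0, 1, 2]
mu = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]
nu = [Fraction(0), Fraction(1, 2), Fraction(1, 2)]
```

With these data `dominated(mu, nu, points, leq)` holds, while `dominated(nu, mu, points, leq)` fails at the upward closed set $\{2\}$.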

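Let $X$ be a Hausdorff space, let $\preccurlyeq$ be a closed partial order on $X$, and let $\mu$ and $\nu$ be two probability (Borel) measures on $X$. If $\mu\preccurlyeq\nu$, then there exists a probability (Borel) measure $\lambda$ on $X\times X$ which has the marginals $\mu$ and $\nu$, and whose support is contained in $\{(x,y)\in X\times X\mid x\preccurlyeq y\}$.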

An important particular case of this theorem is when the base space $X$ is finite or countable with the discrete topology.

Over the years, this result was established at different levels of generality; see, for example, [3], [2], [5], [6], [11], [14], [10]. Some predecessors of Strassen’s work are [12, 16].

Strassen’s theorem plays an important role in probability theory and has applications in various areas. For example, it prominently occurs in financial mathematics, e.g. [1, 4], and in computer science, e.g. [7, 8, 9].

In general, the proof of Strassen’s theorem relies on rather heavy analytic machinery, in particular on theorems exploiting compactness. If $X$ is finite, a solution can, naturally, be found by an algorithm which terminates after finitely many steps. This fact can be established in various ways, for example by elementary manipulations of inequalities, as in [12, §3], or by combinatorial results such as the max-flow min-cut theorem or the subforest lemma, as in [15] or [8, Theorem 4.10].
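The following minimal Python sketch illustrates the max-flow route to the finite case. It is our own illustration and not the construction of [15] or [8]. It assumes a finite ground set $\{0,\dots,N-1\}$, the order given as a predicate `leq`, and exact rational masses. The network is built as follows:

- mass flows from a source to a left copy of each point $n$, with capacity $\mu_n$;
- it flows from left $n$ to right $m$ whenever $n\preccurlyeq m$;
- it flows from right $m$ to a sink, with capacity $\nu_m$.

In the finite case, a flow of value $1$ exists exactly when the domination condition holds, and its middle layer is then a coupling. Augmenting paths are found by breadth-first search (the Edmonds-Karp rule), which terminates for arbitrary real capacities. The name `coupling_by_max_flow` is hypothetical.

```python
from collections import deque
from fractions import Fraction


def coupling_by_max_flow(mu, nu, leq):
    """Return lam with row sums mu, column sums nu and lam[n][m] != 0 only if
    leq(n, m); return None if the maximal flow does not carry all of the mass."""
    N = len(mu)
    size = 2 * N + 2
    s, t = 2 * N, 2 * N + 1            # left copies 0..N-1, right copies N..2N-1
    cap = [[Fraction(0)] * size for _ in range(size)]
    for n in range(N):
        cap[s][n] = mu[n]
        cap[N + n][t] = nu[n]
        for m in range(N):
            if leq(n, m):
                cap[n][N + m] = Fraction(1)   # total mass is 1, so this never binds
    flow = [[Fraction(0)] * size for _ in range(size)]
    total = Fraction(0)
    while True:
        # Breadth-first search for a shortest augmenting path in the residual graph.
        parent = [None] * size
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] is None:
            u = queue.popleft()
            for v in range(size):
                if parent[v] is None and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] is None:
            break                          # no augmenting path left: flow is maximal
        # Bottleneck residual capacity along the path.
        delta, v = None, t
        while v != s:
            u = parent[v]
            residual = cap[u][v] - flow[u][v]
            delta = residual if delta is None else min(delta, residual)
            v = u
        # Push delta along the path; skew symmetry keeps the residual graph correct.
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += delta
            flow[v][u] -= delta
            v = u
        total += delta
    if total != sum(mu):
        return None
    return [[flow[n][N + m] for m in range(N)] for n in range(N)]


def leq(x, y):
    return x <= y                      # example order: the chain 0 <= 1 <= 2


mu = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]
nu = [Fraction(0), Fraction(1, 2), Fraction(1, 2)]
lam = coupling_by_max_flow(mu, nu, leq)
```

For these data the sketch moves the mass at $0$ to $1$ and the mass at $1$ to $2$. With the roles of `mu` and `nu` swapped it returns `None`, since domination fails.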

In the present exposition we deal with the countable discrete case. Our aim is to give a recursive algorithm which produces a sequence $(\Lambda^{(k)})_{k\in\mathbb N}$ of (discrete) probability measures on $\mathbb N\times\mathbb N$ such that

1. each term of the sequence is computable from the initial data with a finite number of algebraic operations;

2. the sequence converges to a solution in the $\ell^1$-norm on $\mathbb N\times\mathbb N$; in particular, it converges pointwise;

3. the speed of pointwise convergence can be controlled in a computable way.

To explain our contribution, it is worthwhile to revisit the presently available proofs for the countable discrete case. First, specialising the general proof(s) of Strassen’s theorem obviously does not lead to an algorithm, since tools like the Banach-Alaoglu theorem are used. More interesting are the arguments given in the papers of Kellerer [12, §4] and Nawrotzki [16]. Both are non-constructive, but for different reasons.

• Kellerer’s approach is to reduce to the finite case. Given $\mu$ and $\nu$ on a countable set, he produces appropriately cut-off data $\mu^{(N)}$, $\nu^{(N)}$, and solves the problem for those. This gives a measure $\lambda^{(N)}$ on $\mathbb N\times\mathbb N$ which solves the problem up to the index $N$. Each measure $\lambda^{(N)}$ can be computed in finitely many steps. Sending the cut-off point $N$ to infinity leads to the existence of a solution for the full data $\mu,\nu$. The masses of the measures $\lambda^{(N)}$ may oscillate, and therefore the sequence $(\lambda^{(N)})_{N\in\mathbb N}$ need not converge. However, each accumulation point of this sequence is a solution.

What makes the method non-constructive is that accumulation points exist by compactness (in this case applied in the form of the Heine-Borel theorem).

• Nawrotzki’s approach is to produce a sequence $(\Lambda^{(k)})_{k\in\mathbb N}$ which does not necessarily solve the problem on any finite section, but still converges to a solution. His construction ensures that the masses of the measures $\Lambda^{(k)}$ are nonincreasing on points of the diagonal and nondecreasing off the diagonal. This ensures that passing to subsequences is not necessary.

What makes the method non-constructive is that defining the measures $\Lambda^{(k)}$ requires evaluating sums of infinite series and infima of infinite sets of real numbers.

Our idea for producing a sequence $(\Lambda^{(k)})_{k\in\mathbb N}$ with properties 1. to 3. above is to combine the two approaches. We apply Nawrotzki’s algorithm to appropriately truncated sequences to ensure computability. We then control the error made by passing to cut-offs to ensure convergence.

2 Nawrotzki’s algorithm

In [16], which precedes the work of Strassen, Nawrotzki proved a discrete version of Strassen’s theorem. In our present language, his result reads as follows.

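Let $(\mu_n)_{n\in\mathbb N}$ and $(\nu_n)_{n\in\mathbb N}$ be sequences of real numbers such that

$$\forall n\in\mathbb N.\ \mu_n\ge0\wedge\nu_n\ge0\quad\text{and}\quad\sum_{n\in\mathbb N}\mu_n=\sum_{n\in\mathbb N}\nu_n=1. \tag{1}$$

Moreover, let $\preccurlyeq$ be a partial order on $\mathbb N$.

If it holds that

$$\forall R\subseteq\mathbb N\text{ upward closed w.r.t. }\preccurlyeq.\ \sum_{n\in R}\mu_n\le\sum_{n\in R}\nu_n, \tag{2}$$

then there exists an infinite matrix $\Lambda=(\lambda_{n,m})_{n,m\in\mathbb N}$ of real numbers such that

$$\forall n,m\in\mathbb N.\ \lambda_{n,m}\ge0\quad\text{and}\quad\sum_{n,m\in\mathbb N}\lambda_{n,m}=1, \tag{3}$$
$$\forall n,m\in\mathbb N.\ \lambda_{n,m}\neq0\Rightarrow n\preccurlyeq m, \tag{4}$$
$$\forall n\in\mathbb N.\ \sum_{m\in\mathbb N}\lambda_{n,m}=\mu_n, \tag{5}$$
$$\forall m\in\mathbb N.\ \sum_{n\in\mathbb N}\lambda_{n,m}=\nu_m. \tag{6}$$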
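For finitely many points, conditions (3) to (6) can be verified directly. The following Python sketch assumes a finite index set $\{0,\dots,N-1\}$ standing in for $\mathbb N$, exact rational entries, and the order passed as a predicate `leq`. The function name `is_solution` and the example data are ours.

```python
from fractions import Fraction


def is_solution(lam, mu, nu, leq):
    """Check Equations (3)-(6) for a finite square matrix lam given as a list of rows."""
    idx = range(len(mu))
    eq3 = (all(lam[n][m] >= 0 for n in idx for m in idx)
           and sum(lam[n][m] for n in idx for m in idx) == 1)
    eq4 = all(lam[n][m] == 0 or leq(n, m) for n in idx for m in idx)
    eq5 = all(sum(lam[n][m] for m in idx) == mu[n] for n in idx)
    eq6 = all(sum(lam[n][m] for n in idx) == nu[m] for m in idx)
    return eq3 and eq4 and eq5 and eq6


def leq(x, y):
    return x <= y                      # example order: the chain 0 <= 1 <= 2


half, zero = Fraction(1, 2), Fraction(0)
mu = [half, half, zero]
nu = [zero, half, half]
shifted = [[zero, half, zero], [zero, zero, half], [zero, zero, zero]]
diagonal = [[half, zero, zero], [zero, half, zero], [zero, zero, zero]]
```

For the chain $0\preccurlyeq1\preccurlyeq2$, the shifted matrix is a solution. The diagonal matrix built from $\mu$ satisfies (3) to (5) but violates (6).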

In this section we present Nawrotzki’s argument in a structured way, including all details. This gives an in-depth understanding of his work, which we need in order to adapt the algorithm later on (in Section 3).

Before we dive into the formulas and proofs, which are a bit technical and lengthy, let us give an intuition for what is going to happen.

Assume we are given data $(\mu_n)_{n\in\mathbb N}$, $(\nu_n)_{n\in\mathbb N}$, and $\preccurlyeq$ satisfying Equations 1 and 2. Assume further that we have a (possibly bad) approximation $\Lambda$ of a solution that satisfies Equations 3 and 4, as well as Equation 5. Achieving correctness of one marginal, i.e. satisfying Equation 5, is very easy. For example, the diagonal matrix with the $\mu_n$'s on the diagonal already satisfies it.

Suppose the column sums do not give the correct values required by Equation 6. Since the total sum is always $1$, some of them must then be larger than the target value and some smaller. Now we want to modify the values $\lambda_{n,m}$ to improve the approximation, i.e., make the error in Equation 6 smaller while retaining all other properties. Most importantly, we have to ensure that the column sums inherit Equation 2, also known as stochastic dominance. In addition, we want to make the modification in such a way that:

1. At each position, entries change monotonically when the step of the algorithm is repeated. This is achieved by keeping diagonal entries nonincreasing and off-diagonal entries nondecreasing. It guarantees the existence of a limit.

2. The pattern of which column sums are too large and which are too small is inherited, except that some column sums may become correct. This guarantees that the algorithm can proceed appropriately.

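The algorithm proceeds in steps. In each step exactly two values of the matrix change, as pictured below:

- one on the diagonal, at position $(n,n)$;
- one in the same row, at position $(n,m)$, where $n\prec m$ and Equation 6 fails for both $n$ and $m$ (the $n$-th column sum is too large and the $m$-th one too small).

The new values are $\lambda_{n,n}-\alpha$ and $\lambda_{n,m}+\alpha$. Here $\alpha>0$ is chosen such that the $n$-th column sum stays at least $\nu_n$ and the $m$-th column sum stays at most $\nu_m$.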

In the picture, filled circles indicate the points where our approximation has nonzero entries, and circled dots mark the changes made by one step of the algorithm. The correction term $\alpha$ has an exact definition (given below) that is tailor-made so that the requirements explained above are met.

[Figure: the matrix $\Lambda$ with the $n$-th and the $m$-th column marked; one step of the algorithm changes the entries at positions $(n,n)$ and $(n,m)$.]

The next result is the first of two crucial ingredients of Nawrotzki’s algorithm (the second one follows further below). It will ensure that a solution is obtained in the limit. To formulate it, we need additional notation.

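Let $\preccurlyeq$ be a partial order on $\mathbb N$. For each $n,m\in\mathbb N$ with $n\prec m$, we denote

$$\mathcal R_{n,m}:=\{R\subseteq\mathbb N\mid R\text{ upward closed w.r.t. }\preccurlyeq,\ m\in R,\ n\notin R\}.$$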

Note that $\mathcal R_{n,m}$ is always nonempty. For example, we have

$$\{l\in\mathbb N\mid m\preccurlyeq l\}\in\mathcal R_{n,m}.$$
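When the index set is finite, the family $\mathcal R_{n,m}$ can be listed explicitly. The following minimal Python sketch makes several assumptions:

- the index set is finite;
- the order is a toy order on $\{0,1,2,3\}$ with $0\prec2$, $0\prec3$, $1\prec3$, made up for illustration;
- $\mathcal R_{n,m}$ is read as the family of upward closed sets containing $m$ but not $n$.

`R_sets` is a hypothetical helper name.

```python
from itertools import combinations

STRICT = {(0, 2), (0, 3), (1, 3)}      # toy order on {0, 1, 2, 3}: 0 < 2, 0 < 3, 1 < 3


def leq(x, y):
    return x == y or (x, y) in STRICT


def is_upward_closed(R, points, leq):
    return all(x in R for y in R for x in points if leq(y, x))


def R_sets(n, m, points, leq):
    """Upward closed R with m in R and n not in R (our reading of R_{n,m})."""
    for r in range(len(points) + 1):
        for combo in combinations(points, r):
            R = frozenset(combo)
            if m in R and n not in R and is_upward_closed(R, points, leq):
                yield R


points = range(4)
family = set(R_sets(0, 3, points, leq))
up_set_of_3 = frozenset(l for l in points if leq(3, l))
```

For $(n,m)=(0,3)$ the sketch finds the four sets $\{3\}$, $\{1,3\}$, $\{2,3\}$, $\{1,2,3\}$. Among them is the up-set $\{l\mid 3\preccurlyeq l\}=\{3\}$.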

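Assume that $(\mu_n)_{n\in\mathbb N}$, $(\nu_n)_{n\in\mathbb N}$, and $\preccurlyeq$ satisfy Equation 1 and Equation 2. If for each pair $(n,m)$ with $n\prec m$ at least one of

$$\mu_n\le\nu_n, \tag{7}$$
$$\mu_m\ge\nu_m, \tag{8}$$
$$\inf_{R\in\mathcal R_{n,m}}\sum_{l\in R}(\nu_l-\mu_l)=0 \tag{9}$$

holds, then $\mu=\nu$.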

Note here that all series in Equation 9 converge absolutely and that, by Equation 2, the infimum in Equation 9 is nonnegative. Moreover, consider an algorithm acting as explained above (and defined in precise mathematical terms below). For such an algorithm, using $\mathcal R_{n,m}$ instead of all upward closed sets suffices to retain Equation 2, because Equation 2 is trivially inherited for upward closed sets which are not in $\mathcal R_{n,m}$.

In the proof, we use the following simple fact.

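Assume that $(\mu_n)_{n\in\mathbb N}$, $(\nu_n)_{n\in\mathbb N}$, and $\preccurlyeq$ satisfy Equation 1 and Equation 2. Further, let $(R_k)_k$ be a (finite or infinite) sequence of upward closed (w.r.t. $\preccurlyeq$) subsets of $\mathbb N$, and set

$$R:=\bigcup_k R_k.$$

Then $R$ is upward closed, and

$$\sum_{l\in R}(\nu_l-\mu_l)\le\sum_k\sum_{l\in R_k}(\nu_l-\mu_l).$$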
Proof.

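By absolute convergence we may rearrange the sum on the left side without changing its value. Now write $R$ as the disjoint union

$$R=\dot{\bigcup}_k R'_k,$$

where

$$R'_k:=R_k\setminus\bigcup_{j<k}R_j.$$

Then

$$\sum_{l\in R}(\nu_l-\mu_l)=\sum_k\sum_{l\in R'_k}(\nu_l-\mu_l).$$

For each $k$ we have

$$\sum_{l\in R_k}(\nu_l-\mu_l)=\sum_{l\in R'_k}(\nu_l-\mu_l)+\sum_{l\in R_k\cap\bigcup_{j<k}R_j}(\nu_l-\mu_l).$$

The set $R_k\cap\bigcup_{j<k}R_j$ is upward closed, and hence, by Equation 2, the second summand on the right side is nonnegative. This shows that

$$\sum_{l\in R'_k}(\nu_l-\mu_l)\le\sum_{l\in R_k}(\nu_l-\mu_l)$$

for all $k$. ∎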

Proof of the criterion for $\mu=\nu$.

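It is enough to show that $\mu_n\le\nu_n$ for all $n\in\mathbb N$, since both sequences sum to $1$. Assume towards a contradiction that there exists $n\in\mathbb N$ with $\mu_n>\nu_n$, and fix one with this property. Moreover, choose $\epsilon>0$ small enough, say,

$$\epsilon:=\tfrac13(\mu_n-\nu_n).$$

By assumption, and since Equation 7 fails for $n$, at least one of the following must hold for each $m\in\mathbb N$ with $n\prec m$:

• $\mu_m\ge\nu_m$,

• $\inf_{R\in\mathcal R_{n,m}}\sum_{l\in R}(\nu_l-\mu_l)=0$.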

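Consider the set where the second case takes place:

$$H:=\Big\{m\in\mathbb N\ \Big|\ n\prec m,\ \inf_{R\in\mathcal R_{n,m}}\sum_{l\in R}(\nu_l-\mu_l)=0\Big\}.$$

If $H=\emptyset$, it is easy to reach a contradiction. Namely, then $\mu_m\ge\nu_m$ for all $m$ with $n\prec m$, and together with $\mu_n>\nu_n$ this gives

$$\sum_{m\succcurlyeq n}\mu_m>\sum_{m\succcurlyeq n}\nu_m.$$

This contradicts Equation 2, since $\{m\in\mathbb N\mid n\preccurlyeq m\}$ is upward closed.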

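If $H\neq\emptyset$, we argue as follows. For each $m\in H$ choose $R_m\in\mathcal R_{n,m}$ such that

$$\sum_{l\in R_m}(\nu_l-\mu_l)\le\frac{\epsilon}{2^m},$$

and set $R:=\bigcup_{m\in H}R_m$. Then $R$ is upward closed, $n\notin R$, and, by the lemma,

$$\sum_{l\in R}(\nu_l-\mu_l)\le\sum_{m\in H}\sum_{l\in R_m}(\nu_l-\mu_l)\le\sum_{m\in H}\frac{\epsilon}{2^m}\le2\epsilon.$$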

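Consider the upward closed set

$$R':=R\cup\{l\in\mathbb N\mid n\prec l\}.$$

If $l\in R'\setminus R$, then $n\prec l$ and $l\notin H$, since $l\in H$ would give $l\in R_l\subseteq R$. Thus we must have $\mu_l\ge\nu_l$. From this we see that

$$0\le\sum_{l\in R'}(\nu_l-\mu_l)=\sum_{l\in R}(\nu_l-\mu_l)+\sum_{l\in R'\setminus R}(\nu_l-\mu_l)\le\sum_{l\in R}(\nu_l-\mu_l)\le2\epsilon.$$

The set $R'\cup\{n\}$ is also upward closed. We use the above estimate and recall that $n\notin R'$ and $\epsilon=\frac13(\mu_n-\nu_n)$. This gives

$$0\le\sum_{l\in R'\cup\{n\}}(\nu_l-\mu_l)=\sum_{l\in R'}(\nu_l-\mu_l)+(\nu_n-\mu_n)\le2\epsilon+(\nu_n-\mu_n)=\tfrac13(\nu_n-\mu_n)<0,$$

a contradiction. ∎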

Nawrotzki’s algorithm for the proof of his theorem proceeds in three steps:

1. Start with the diagonal matrix with the $\mu_n$ on the diagonal.

2. Iteratively modify this matrix in such a way that the following set gets smaller in each step: the set of all points where all of Equations 7 to 9 fail (for certain modified sequences).

3. Pass to the limit, so as to reach a situation where the criterion above (one of Equations 7 to 9 at every pair implies $\mu=\nu$) applies.

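The single steps of the recursive process in 2. are realised by maps which act on $\ell^1(\mathbb N\times\mathbb N)$. To define those maps, we first introduce an abbreviation for row and column sums of a matrix. Given $\Lambda=(\lambda_{n,m})_{n,m\in\mathbb N}\in\ell^1(\mathbb N\times\mathbb N)$, we denote

$$\lambda_{*,m}:=\sum_{n\in\mathbb N}\lambda_{n,m},\qquad\lambda_{n,*}:=\sum_{m\in\mathbb N}\lambda_{n,m}.$$

Note that these series converge absolutely since $\Lambda\in\ell^1(\mathbb N\times\mathbb N)$.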

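Let $\nu\in\ell^1(\mathbb N)$ and $n,m\in\mathbb N$. We define maps

$$\alpha^\nu_{n,m}\colon\ell^1(\mathbb N\times\mathbb N)\to[0,\infty),\qquad\Phi^\nu_{n,m}\colon\ell^1(\mathbb N\times\mathbb N)\to\ell^1(\mathbb N\times\mathbb N).$$
• For $\Lambda\in\ell^1(\mathbb N\times\mathbb N)$ set

$$\alpha^\nu_{n,m}(\Lambda):=\min\Big\{\lambda_{*,n}-\nu_n,\ \nu_m-\lambda_{*,m},\ \inf_{R\in\mathcal R_{n,m}}\sum_{l\in R}(\nu_l-\lambda_{*,l})\Big\}$$

if $n\prec m$ and this minimum is positive, and set $\alpha^\nu_{n,m}(\Lambda):=0$ otherwise.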

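• For $\Lambda\in\ell^1(\mathbb N\times\mathbb N)$ let $\Phi^\nu_{n,m}(\Lambda)$ be the matrix with the entries

$$\big(\Phi^\nu_{n,m}(\Lambda)\big)_{i,j}:=\begin{cases}\lambda_{n,n}-\alpha^\nu_{n,m}(\Lambda)&\text{if }(i,j)=(n,n),\\ \lambda_{n,m}+\alpha^\nu_{n,m}(\Lambda)&\text{if }(i,j)=(n,m),\\ \lambda_{i,j}&\text{otherwise.}\end{cases}$$

Note that $\Phi^\nu_{n,m}(\Lambda)$ is well defined, for two reasons. First, $\alpha^\nu_{n,m}(\Lambda)>0$ implies $n\prec m$ and hence $(n,n)\neq(n,m)$. Second, it is obvious that $\Phi^\nu_{n,m}(\Lambda)$ is again summable.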
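On a finite index set, every infimum in the definition of $\alpha^\nu_{n,m}$ is a minimum over finitely many sets, so both maps can be evaluated exactly. The following Python sketch makes these assumptions:

- the index set is finite and all entries are exact rationals;
- the order is a toy order with $0\prec2$, $0\prec3$, $1\prec3$, chosen by us;
- $\mathcal R_{n,m}$ is read as the family of upward closed sets containing $m$ but not $n$.

The names `alpha`, `phi`, `R_sets`, and `col_sum` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

STRICT = {(0, 2), (0, 3), (1, 3)}      # toy order on {0, 1, 2, 3}: 0 < 2, 0 < 3, 1 < 3


def leq(x, y):
    return x == y or (x, y) in STRICT


def col_sum(lam, l):
    """lambda_{*,l}: the sum of the l-th column."""
    return sum((row[l] for row in lam), Fraction(0))


def R_sets(n, m, N, leq):
    """Upward closed R in {0..N-1} with m in R and n not in R (our reading of R_{n,m})."""
    pts = range(N)
    for r in range(N + 1):
        for combo in combinations(pts, r):
            R = set(combo)
            if m in R and n not in R and all(x in R for y in R for x in pts if leq(y, x)):
                yield R


def alpha(lam, nu, n, m, leq):
    """alpha^nu_{n,m}(Lambda); on a finite index set every infimum is a minimum."""
    if n == m or not leq(n, m):
        return Fraction(0)
    terms = [col_sum(lam, n) - nu[n], nu[m] - col_sum(lam, m)]
    terms += [sum((nu[l] - col_sum(lam, l) for l in R), Fraction(0))
              for R in R_sets(n, m, len(nu), leq)]
    a = min(terms)
    return a if a > 0 else Fraction(0)


def phi(lam, nu, n, m, leq):
    """Phi^nu_{n,m}(Lambda): move alpha from entry (n, n) to entry (n, m) of row n."""
    a = alpha(lam, nu, n, m, leq)
    new = [list(row) for row in lam]
    new[n][n] -= a
    new[n][m] += a
    return new


half, zero = Fraction(1, 2), Fraction(0)
mu = [half, half, zero, zero]
nu = [zero, zero, half, half]
lam0 = [[mu[n] if n == m else zero for m in range(4)] for n in range(4)]   # diagonal start
```

Start from the diagonal matrix with $\mu=(\frac12,\frac12,0,0)$ and $\nu=(0,0,\frac12,\frac12)$. The infimum term blocks the step at $(0,3)$, because the set $\{1,3\}$ gives the value $0$. The step at $(0,2)$, in contrast, moves the mass $\frac12$ from $(0,0)$ to $(0,2)$.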

Let us collect some more obvious properties of the transformations $\Phi^\nu_{n,m}$.

For each $\Lambda\in\ell^1(\mathbb N\times\mathbb N)$ and $n,m\in\mathbb N$, the following statements hold.

1. ,

2. ,

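Having $\alpha^\nu_{n,m}(\Lambda)=0$ for $n\prec m$ just means that at the point $(n,m)$ one of Equations 7 to 9 holds for the sequences $(\lambda_{*,l})_{l\in\mathbb N}$ and $(\nu_l)_{l\in\mathbb N}$. Moreover, in this case, $\Phi^\nu_{n,m}$ does not change $\Lambda$. We are interested in what happens if $\alpha^\nu_{n,m}(\Lambda)>0$.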

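Let $\Lambda\in\ell^1(\mathbb N\times\mathbb N)$ and $\nu\in\ell^1(\mathbb N)$. Then we set

$$S(\Lambda):=\{(n,m)\in\mathbb N\times\mathbb N\mid\alpha^\nu_{n,m}(\Lambda)>0\}.$$

Moreover, we denote by $\pi_1(S(\Lambda))$ and $\pi_2(S(\Lambda))$ the projections of $S(\Lambda)$ onto the first and the second component, respectively.

To avoid bulky notation, we do not explicitly indicate the dependence of $S(\Lambda)$ on $\nu$. Moreover, observe that $S(\Lambda)$ is contained in $\{(n,m)\in\mathbb N\times\mathbb N\mid n\prec m\}$ and does not intersect the diagonal. In fact, since $\lambda_{*,n}>\nu_n$ for $n\in\pi_1(S(\Lambda))$, whereas $\lambda_{*,m}<\nu_m$ for $m\in\pi_2(S(\Lambda))$, we have

$$\pi_1(S(\Lambda))\cap\pi_2(S(\Lambda))=\emptyset.$$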
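For finite data, the set $S(\Lambda)$ and the disjointness of its two projections can be checked mechanically. This minimal sketch makes the following assumptions:

- the index set is finite and all entries are exact rationals;
- the order is the toy order $0\prec2$, $0\prec3$, $1\prec3$ on $\{0,1,2,3\}$, made up for illustration;
- $\mathcal R_{n,m}$ is read as the upward closed sets containing $m$ but not $n$.

All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

STRICT = {(0, 2), (0, 3), (1, 3)}      # toy order on {0, 1, 2, 3}: 0 < 2, 0 < 3, 1 < 3


def leq(x, y):
    return x == y or (x, y) in STRICT


def col_sum(lam, l):
    return sum((row[l] for row in lam), Fraction(0))


def alpha(lam, nu, n, m, leq):
    """alpha^nu_{n,m}(Lambda) on a finite index set (brute force over R_{n,m})."""
    N = len(nu)
    if n == m or not leq(n, m):
        return Fraction(0)
    terms = [col_sum(lam, n) - nu[n], nu[m] - col_sum(lam, m)]
    for r in range(N + 1):
        for R in combinations(range(N), r):
            if (m in R and n not in R
                    and all(x in R for y in R for x in range(N) if leq(y, x))):
                terms.append(sum((nu[l] - col_sum(lam, l) for l in R), Fraction(0)))
    a = min(terms)
    return a if a > 0 else Fraction(0)


def S(lam, nu, leq):
    """All pairs (n, m) with alpha^nu_{n,m}(Lambda) > 0."""
    N = len(nu)
    return {(n, m) for n in range(N) for m in range(N) if alpha(lam, nu, n, m, leq) > 0}


half, zero = Fraction(1, 2), Fraction(0)
mu = [half, half, zero, zero]
nu = [zero, zero, half, half]
lam0 = [[mu[n] if n == m else zero for m in range(4)] for n in range(4)]
s0 = S(lam0, nu, leq)
first = {n for n, _ in s0}            # pi_1(S(Lambda))
second = {m for _, m in s0}           # pi_2(S(Lambda))
```

Here $S(\Lambda)=\{(0,2),(1,3)\}$, because the pair $(0,3)$ is excluded by the infimum term. The projections $\{0,1\}$ and $\{2,3\}$ are disjoint.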

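In the next proposition we show that $\Phi^\nu_{n,m}$ preserves several relevant properties and indeed shrinks the set $S(\Lambda)$.

Let $\Lambda\in\ell^1(\mathbb N\times\mathbb N)$ and $\nu\in\ell^1(\mathbb N)$, and assume that

$$\forall n,m\in\mathbb N.\ \lambda_{n,m}\ge0\quad\text{and}\quad\sum_{n,m\in\mathbb N}\lambda_{n,m}=1, \tag{10}$$
$$\forall n\in\pi_1(S(\Lambda)).\ \lambda_{*,n}=\lambda_{n,n}, \tag{11}$$
$$\forall R\subseteq\mathbb N\text{ upward closed w.r.t. }\preccurlyeq.\ \sum_{l\in R}\lambda_{*,l}\le\sum_{l\in R}\nu_l. \tag{12}$$

Further, let $n',m'\in\mathbb N$, and assume that $(n',m')\in S(\Lambda)$. Then

1. $\Phi^\nu_{n',m'}(\Lambda)$ satisfies Equation 10, Equation 11, and Equation 12,

2. $S(\Phi^\nu_{n',m'}(\Lambda))\subseteq S(\Lambda)\setminus\{(n',m')\}$.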

Proof.

To shorten notation, we write

$$\Lambda'=(\lambda'_{n,m})_{n,m\in\mathbb N}:=\Phi^\nu_{n',m'}(\Lambda).$$

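We start by showing that $\Lambda'$ satisfies Equation 10 and Equation 12. Let $(n,m)\neq(n',n')$. Then $\lambda'_{n,m}\ge\lambda_{n,m}$, and hence $\lambda'_{n,m}$ is nonnegative. For $(n,m)=(n',n')$ we use (11) to obtain

$$\lambda'_{n',n'}=\lambda_{n',n'}-\alpha^\nu_{n',m'}(\Lambda)=\lambda_{*,n'}-\alpha^\nu_{n',m'}(\Lambda)\ge\nu_{n'}\ge0.$$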

Obviously, applying $\Phi^\nu_{n',m'}$ does not change the total sum of the entries of a matrix. Thus

$$\sum_{n,m\in\mathbb N}\lambda'_{n,m}=\sum_{n,m\in\mathbb N}\lambda_{n,m}=1.$$

We see that Equation 10 holds.

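Let $R\subseteq\mathbb N$ be upward closed. Suppose first that $R\notin\mathcal R_{n',m'}$. Since $R$ is upward closed and $n'\prec m'$, either both $n',m'\in R$ or both $n',m'\notin R$, and hence

$$\sum_{l\in R}\lambda'_{*,l}\le\sum_{l\in R}\lambda_{*,l}\le\sum_{l\in R}\nu_l.$$

Next, for $R\in\mathcal R_{n',m'}$ we have

$$\sum_{l\in R}\lambda'_{*,l}=\sum_{l\in R}\lambda_{*,l}+\alpha^\nu_{n',m'}(\Lambda), \tag{13}$$

and from this we find

$$\sum_{l\in R}\lambda'_{*,l}=\sum_{l\in R}\lambda_{*,l}+\alpha^\nu_{n',m'}(\Lambda)\le\sum_{l\in R}\lambda_{*,l}+\sum_{l\in R}(\nu_l-\lambda_{*,l})=\sum_{l\in R}\nu_l.$$

Thus Equation 12 holds.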

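Now we come to the proof of 2., which is the major part of the argument.

In the first step we show that $(n',m')\notin S(\Lambda')$. We make a case distinction according to which term is the minimum in the definition of $\alpha^\nu_{n',m'}(\Lambda)$.

• Case $\alpha^\nu_{n',m'}(\Lambda)=\lambda_{*,n'}-\nu_{n'}$:
Then $\lambda'_{*,n'}=\lambda_{*,n'}-\alpha^\nu_{n',m'}(\Lambda)=\nu_{n'}$, and hence $\alpha^\nu_{n',m'}(\Lambda')=0$. In particular, $(n',m')\notin S(\Lambda')$.

• Case $\alpha^\nu_{n',m'}(\Lambda)=\nu_{m'}-\lambda_{*,m'}$:
Then $\lambda'_{*,m'}=\lambda_{*,m'}+\alpha^\nu_{n',m'}(\Lambda)=\nu_{m'}$, and hence $\alpha^\nu_{n',m'}(\Lambda')=0$. In particular, $(n',m')\notin S(\Lambda')$.

• Case $\alpha^\nu_{n',m'}(\Lambda)=\inf_{R\in\mathcal R_{n',m'}}\sum_{l\in R}(\nu_l-\lambda_{*,l})$:
Recalling Equation 13, we find

$$\inf_{R\in\mathcal R_{n',m'}}\sum_{l\in R}(\nu_l-\lambda'_{*,l})=\inf_{R\in\mathcal R_{n',m'}}\sum_{l\in R}(\nu_l-\lambda_{*,l})-\alpha^\nu_{n',m'}(\Lambda)=0.$$

Thus also in this case $(n',m')\notin S(\Lambda')$.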

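In the second step, we show that $S(\Lambda')\subseteq S(\Lambda)$. Assume towards a contradiction that $(n,m)\in S(\Lambda')\setminus S(\Lambda)$. Explicitly, this means that

$$n\prec m\ \wedge\ \lambda'_{*,n}>\nu_n\ \wedge\ \lambda'_{*,m}<\nu_m\ \wedge\ \inf_{R\in\mathcal R_{n,m}}\sum_{l\in R}(\nu_l-\lambda'_{*,l})>0\ \wedge\ \Big[\lambda_{*,n}\le\nu_n\ \vee\ \lambda_{*,m}\ge\nu_m\ \vee\ \inf_{R\in\mathcal R_{n,m}}\sum_{l\in R}(\nu_l-\lambda_{*,l})=0\Big].$$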

We distinguish cases according to the disjunction in the square bracket.

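• Case $\lambda_{*,n}\le\nu_n$:
The sum of the $n$-th column increases, and thus we must have $n=m'$. This implies

$$\lambda'_{*,n}=\lambda'_{*,m'}=\lambda_{*,m'}+\alpha^\nu_{n',m'}(\Lambda)\le\nu_{m'}=\nu_n,$$

which contradicts the second term in the conjunction.

• Case $\lambda_{*,m}\ge\nu_m$:
The sum of the $m$-th column decreases, and thus we must have $m=n'$. This implies

$$\lambda'_{*,m}=\lambda'_{*,n'}=\lambda_{*,n'}-\alpha^\nu_{n',m'}(\Lambda)\ge\nu_{n'}=\nu_m,$$

which contradicts the third term in the conjunction.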

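• Case $\inf_{R\in\mathcal R_{n,m}}\sum_{l\in R}(\nu_l-\lambda_{*,l})=0$:
Choose $R'\in\mathcal R_{n,m}$ such that

$$\sum_{l\in R'}(\nu_l-\lambda_{*,l})<\inf_{R\in\mathcal R_{n,m}}\sum_{l\in R}(\nu_l-\lambda'_{*,l}).$$

Then, in particular, the sum $\sum_{l\in R'}\lambda_{*,l}$ of the column sums over $R'$ decreases. Hence we must have $n'\in R'$ and $m'\notin R'$. Since $R'$ is upward closed and $n'\prec m'$, this is a contradiction.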

The proof of 2. is complete.

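It remains to deduce Equation 11. Let $n\in\pi_1(S(\Lambda'))$. Then also $n\in\pi_1(S(\Lambda))$. Recall that $m'\in\pi_2(S(\Lambda))$ and that $\pi_1(S(\Lambda))\cap\pi_2(S(\Lambda))=\emptyset$; therefore $n\neq m'$ and $\lambda_{*,n}=\lambda_{n,n}$. From the first property we obtain that the $n$-th column is modified at most at its diagonal entry, and now the second implies that $\lambda'_{*,n}=\lambda'_{n,n}$. ∎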

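Next, we investigate the iterative application of the maps $\Phi^\nu_{n,m}$. Start with $\Lambda^{(0)}\in\ell^1(\mathbb N\times\mathbb N)$, $\nu\in\ell^1(\mathbb N)$, and a sequence $((n_k,m_k))_{k\ge1}$ of points in $\mathbb N\times\mathbb N$. From these data, we build the sequence $(\Lambda^{(k)})_{k\in\mathbb N}$ where

$$\Lambda^{(k)}:=\big[\Phi^\nu_{n_k,m_k}\circ\dots\circ\Phi^\nu_{n_1,m_1}\big](\Lambda^{(0)}). \tag{14}$$

It turns out that, in the situation of the proposition above, sequences of this form converge. In fact, they do so for a very simple reason, namely monotonicity.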

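Let $(\Lambda^{(k)})_{k\in\mathbb N}$ be a sequence in $\ell^1(\mathbb N\times\mathbb N)$ such that

$$\forall k,n,m\in\mathbb N.\ \lambda^{(k)}_{n,m}\ge0\quad\text{and}\quad\sup_{k\in\mathbb N}\|\Lambda^{(k)}\|_1<\infty.$$

Assume further that there exists a partition $\mathbb N\times\mathbb N=A\,\dot\cup\,B$ such that $(\lambda^{(k)}_{n,m})_{k\in\mathbb N}$ is nondecreasing for all $(n,m)\in A$ and nonincreasing for all $(n,m)\in B$.

Then the limit $\lim_{k\to\infty}\Lambda^{(k)}$ exists in the $\ell^1$-norm.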

Proof.

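Each of the sequences $(\lambda^{(k)}_{n,m})_{k\in\mathbb N}$ is monotone and bounded, hence convergent. Denote $\lambda_{n,m}:=\lim_{k\to\infty}\lambda^{(k)}_{n,m}$. We have to show that the pointwise limit $\Lambda=(\lambda_{n,m})_{n,m\in\mathbb N}$ is actually attained in the $\ell^1$-norm. To this end we split the corresponding sum according to the given partition.

For each $(n,m)\in A$ the sequence $(\lambda^{(k)}_{n,m})_{k\in\mathbb N}$ is nondecreasing, and hence the monotone convergence theorem yields

$$\sum_{(n,m)\in A}\lambda_{n,m}=\lim_{k\to\infty}\sum_{(n,m)\in A}\lambda^{(k)}_{n,m}\le\sup_{k\in\mathbb N}\|\Lambda^{(k)}\|_1<\infty.$$

Since $0\le\lambda_{n,m}-\lambda^{(k)}_{n,m}\le\lambda_{n,m}$ for $(n,m)\in A$, we may now refer to the bounded convergence theorem to obtain that

$$\lim_{k\to\infty}\sum_{(n,m)\in A}\big|\lambda_{n,m}-\lambda^{(k)}_{n,m}\big|=0.$$

For each $(n,m)\in B$ and $k\in\mathbb N$ we have

$$\lambda^{(0)}_{n,m}\ge\lambda^{(k)}_{n,m}\ge\lambda^{(k)}_{n,m}-\lambda_{n,m}\ge0.$$

Since $\sum_{(n,m)\in B}\lambda^{(0)}_{n,m}\le\|\Lambda^{(0)}\|_1<\infty$, the bounded convergence theorem applies, and we find that

$$\lim_{k\to\infty}\sum_{(n,m)\in B}\big|\lambda^{(k)}_{n,m}-\lambda_{n,m}\big|=0.\qquad∎$$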

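Assume that $\Lambda^{(0)}$ satisfies Equation 10 and Equation 11, let $((n_k,m_k))_{k\ge1}$ be any sequence, and let $(\Lambda^{(k)})_{k\in\mathbb N}$ be defined by Equation 14. Then the limit

$$\Lambda:=\lim_{k\to\infty}\Lambda^{(k)}$$

exists w.r.t. the $\ell^1$-norm.

Proof.

Since $\alpha^\nu_{n,m}$ is always nonnegative, a partition of $\mathbb N\times\mathbb N$ as required to apply the lemma above is obtained by taking the diagonal as the set $B$ and its complement as the set $A$. (As in the proof of the proposition, the entries of each $\Lambda^{(k)}$ are nonnegative and sum to $1$.) ∎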

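Now we show that the set $S(\Lambda)$ can be controlled when passing to a limit.

Let $(\Lambda^{(k)})_{k\in\mathbb N}$ be a sequence in $\ell^1(\mathbb N\times\mathbb N)$ which converges in the $\ell^1$-norm, and denote $\Lambda:=\lim_{k\to\infty}\Lambda^{(k)}$. Then

$$S(\Lambda)\subseteq\bigcup_{N\in\mathbb N}\bigcap_{k\ge N}S(\Lambda^{(k)}).$$
Proof.

Let $(n,m)\in S(\Lambda)$, and set $\epsilon:=\frac12\alpha^\nu_{n,m}(\Lambda)>0$. Choose $N\in\mathbb N$ such that

$$\forall k\ge N.\ \|\Lambda^{(k)}-\Lambda\|_1\le\epsilon.$$

Then for all $k\ge N$

$$\lambda^{(k)}_{*,n}\ge\lambda_{*,n}-\epsilon\ge\nu_n+\epsilon>\nu_n,\qquad\lambda^{(k)}_{*,m}\le\lambda_{*,m}+\epsilon\le\nu_m-\epsilon<\nu_m,$$

and for all $R\in\mathcal R_{n,m}$

$$\sum_{l\in R}(\nu_l-\lambda^{(k)}_{*,l})\ge\sum_{l\in R}(\nu_l-\lambda_{*,l})-\epsilon\ge\epsilon>0.$$

Thus $(n,m)\in S(\Lambda^{(k)})$ for all $k\ge N$. ∎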

We have now collected all the tools needed for the proof of Nawrotzki’s theorem.

Proof of Nawrotzki’s theorem.

Let $(\mu_n)_{n\in\mathbb N}$, $(\nu_n)_{n\in\mathbb N}$, and $\preccurlyeq$ be given, and assume that Equation 1 and Equation 2 hold.

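Let $\Lambda^{(0)}$ be the diagonal matrix built from $(\mu_n)_{n\in\mathbb N}$, i.e.,

$$\lambda^{(0)}_{n,m}:=\begin{cases}\mu_n&\text{if }n=m,\\0&\text{otherwise.}\end{cases} \tag{15}$$

Choose a sequence $((n_k,m_k))_{k\ge1}$ of points in $\mathbb N\times\mathbb N$ which covers $S(\Lambda^{(0)})$. For example, every enumeration of $\mathbb N\times\mathbb N$ certainly has this property. Now define $(\Lambda^{(k)})_{k\in\mathbb N}$ by Equation 14 using this sequence.

By the proposition on the maps $\Phi^\nu_{n,m}$, each $\Lambda^{(k)}$ satisfies Equation 10, Equation 11, and Equation 12. Moreover,

$$S(\Lambda^{(k)})\subseteq S(\Lambda^{(0)})\setminus\{(n_1,m_1),\dots,(n_k,m_k)\}.$$

The limit

$$\Lambda=(\lambda_{n,m})_{n,m\in\mathbb N}:=\lim_{k\to\infty}\Lambda^{(k)}$$

exists in the $\ell^1$-norm by the corollary above. Since every point of $S(\Lambda^{(0)})$ is eventually removed, the preceding lemma gives $S(\Lambda)=\emptyset$.

Clearly, Equations 3 to 5 hold for $\Lambda$. Moreover, $S(\Lambda)=\emptyset$, and Equation 12 passes to the limit. We may therefore apply the criterion from the beginning of this section with the sequences $(\lambda_{*,l})_{l\in\mathbb N}$ and $(\nu_l)_{l\in\mathbb N}$, and obtain that Equation 6 also holds. ∎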

We refer to the procedure carried out in this proof as Nawrotzki’s algorithm performed along the sequence $((n_k,m_k))_{k\ge1}$.

For later use, we observe the following fact. Let $(\Lambda^{(k)})_{k\in\mathbb N}$ be a sequence produced by an application of Nawrotzki’s algorithm. Then each off-diagonal entry $\lambda^{(k)}_{n,m}$ changes its value at most once as $k$ runs through $\mathbb N$. Namely, it changes only when $(n_k,m_k)=(n,m)$ and, in addition, $\alpha^\nu_{n,m}(\Lambda^{(k-1)})>0$.
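Consider a finite index set. A single pass along any list of pairs that covers $S(\Lambda^{(0)})$ (for instance, an enumeration of all pairs) empties $S$. By the argument above, the algorithm then stops with an exact solution. The following Python sketch runs the steps along a chosen list of pairs and records all intermediate matrices, so that two things can be checked: the monotonicity, and the "at most once" observation. It makes these assumptions:

- all entries are exact rationals;
- the order is the toy order $0\prec2$, $0\prec3$, $1\prec3$ on $\{0,1,2,3\}$, chosen by us;
- $\mathcal R_{n,m}$ is read as the upward closed sets containing $m$ but not $n$.

`nawrotzki_finite` and its helpers are hypothetical names. The brute-force enumeration restricts the sketch to very small examples.

```python
from fractions import Fraction
from itertools import combinations

STRICT = {(0, 2), (0, 3), (1, 3)}      # toy order on {0, 1, 2, 3}: 0 < 2, 0 < 3, 1 < 3


def leq(x, y):
    return x == y or (x, y) in STRICT


def col_sum(lam, l):
    return sum((row[l] for row in lam), Fraction(0))


def alpha(lam, nu, n, m, leq):
    """alpha^nu_{n,m}(Lambda) on a finite index set (brute force over R_{n,m})."""
    N = len(nu)
    if n == m or not leq(n, m):
        return Fraction(0)
    terms = [col_sum(lam, n) - nu[n], nu[m] - col_sum(lam, m)]
    for r in range(N + 1):
        for R in combinations(range(N), r):
            if (m in R and n not in R
                    and all(x in R for y in R for x in range(N) if leq(y, x))):
                terms.append(sum((nu[l] - col_sum(lam, l) for l in R), Fraction(0)))
    a = min(terms)
    return a if a > 0 else Fraction(0)


def nawrotzki_finite(mu, nu, leq, pairs=None):
    """Apply Phi^nu_{n_k,m_k} along `pairs`, starting from the diagonal matrix (15).

    Returns the final matrix and the list of all intermediate matrices."""
    N = len(mu)
    if pairs is None:
        pairs = [(n, m) for n in range(N) for m in range(N)]   # one pass over N x N
    lam = [[mu[n] if n == m else Fraction(0) for m in range(N)] for n in range(N)]
    history = [lam]
    for n, m in pairs:
        a = alpha(lam, nu, n, m, leq)
        lam = [list(row) for row in lam]      # fresh copy, so history stays intact
        lam[n][n] -= a
        lam[n][m] += a
        history.append(lam)
    return lam, history


half, zero = Fraction(1, 2), Fraction(0)
mu = [half, half, zero, zero]
nu = [zero, zero, half, half]
lam, history = nawrotzki_finite(mu, nu, leq)
lam_b, _ = nawrotzki_finite(mu, nu, leq, pairs=[(0, 3), (0, 2), (1, 3)])
```

The second call visits $(0,3)$ first. There the infimum term forbids moving the mass at $0$ to $3$; such a move would leave no admissible target for the mass at $1$. If that term is dropped from `alpha`, this order of pairs fails to reach a solution.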

3 A constructive variant of the algorithm

Nawrotzki’s proof of his theorem is non-constructive for the following reason:

• The set $\mathcal R_{n,m}$ is in general infinite, and its elements are themselves in general infinite sets.

Because of this, computing the numbers $\alpha^\nu_{n,m}(\Lambda^{(k)})$ requires evaluating sums of infinite series and an infimum of an infinite set. The sequence $(\Lambda^{(k)})_{k\in\mathbb N}$ converges to a solution matrix $\Lambda$, but in general its terms cannot be computed with a finite number of algebraic operations.

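Our aim is to give a proof of Nawrotzki’s theorem which is more constructive in the following sense.

Let $(\mu_n)_{n\in\mathbb N}$, $(\nu_n)_{n\in\mathbb N}$, and $\preccurlyeq$ be given such that Equation 1 and Equation 2 hold. Then there exists a sequence $(\Lambda^{(k)})_{k\in\mathbb N}$ of matrices in $\ell^1(\mathbb N\times\mathbb N)$ with the following properties.

1. Each $\Lambda^{(k)}$ can be computed from the given data and $k$ by a finite number of algebraic operations.

2. The limit $\Lambda:=\lim_{k\to\infty}\Lambda^{(k)}$ exists in the $\ell^1$-norm and satisfies Equations 3 to 6.

As usual, we use the notation $\Lambda^{(k)}=(\delta^{(k)}_{n,m})_{n,m\in\mathbb N}$ and $\Lambda=(\delta_{n,m})_{n,m\in\mathbb N}$.

3. Fix $(n,m)\in\mathbb N\times\mathbb N$ with $n\prec m$, and let $\epsilon>0$. Then a number $k_0\in\mathbb N$ with the property that

$$\forall k\ge k_0.\ |\delta^{(k)}_{n,m}-\delta_{n,m}|\le\epsilon$$

can be computed from the given data and $\epsilon$ by a finite number of algebraic operations.

The assertion in item 3. controls the speed of pointwise convergence (even in a constructive way). In contrast, we have no control over the speed of $\ell^1$-convergence.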

The idea for proving this theorem is the simplest possible. We replace the data $(\mu_n)_{n\in\mathbb N}$, $(\nu_n)_{n\in\mathbb N}$ by cut-off versions, apply Nawrotzki’s algorithm to the truncated data, and then send the cut-off point to infinity. Realising this idea, however, requires some work.

We start by discussing convergence. The error made by using cut-offs instead of the full data can be controlled using the following general perturbation lemma.

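Let $\Lambda,\tilde\Lambda\in\ell^1(\mathbb N\times\mathbb N)$, $\nu,\tilde\nu\in\ell^1(\mathbb N)$, and $n,m\in\mathbb N$. Then

$$\big|\alpha^\nu_{n,m}(\Lambda)-\alpha^{\tilde\nu}_{n,m}(\tilde\Lambda)\big|\le\|\Lambda-\tilde\Lambda\|_1+\|\nu-\tilde\nu\|_1. \tag{16}$$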
Proof.

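We have

$$\big|(\lambda_{*,n}-\nu_n)-(\tilde\lambda_{*,n}-\tilde\nu_n)\big|\le\sum_{l\in\mathbb N}|\lambda_{l,n}-\tilde\lambda_{l,n}|+|\nu_n-\tilde\nu_n|\le\|\Lambda-\tilde\Lambda\|_1+\|\nu-\tilde\nu\|_1,$$

and in the same way

$$\big|(\lambda_{*,m}-\nu_m)-(\tilde\lambda_{*,m}-\tilde\nu_m)\big|\le\sum_{l\in\mathbb N}|\lambda_{l,m}-\tilde\lambda_{l,m}|+|\nu_m-\tilde\nu_m|\le\|\Lambda-\tilde\Lambda\|_1+\|\nu-\tilde\nu\|_1.$$

Next let $R\in\mathcal R_{n,m}$. Then

$$\Big|\sum_{l\in R}(\nu_l-\lambda_{*,l})-\sum_{l\in R}(\tilde\nu_l-\tilde\lambda_{*,l})\Big|\le\sum_{l\in R}\sum_{k\in\mathbb N}|\lambda_{k,l}-\tilde\lambda_{k,l}|+\sum_{l\in R}|\nu_l-\tilde\nu_l|\le\|\Lambda-\tilde\Lambda\|_1+\|\nu-\tilde\nu\|_1.$$

It follows that

$$\Big|\inf\Big(\{\lambda_{*,n}-\nu_n,\ \nu_m-\lambda_{*,m}\}\cup\Big\{\sum_{l\in R}(\nu_l-\lambda_{*,l})\ \Big|\ R\in\mathcal R_{n,m}\Big\}\Big)-\inf\Big(\{\tilde\lambda_{*,n}-\tilde\nu_n,\ \tilde\nu_m-\tilde\lambda_{*,m}\}\cup\Big\{\sum_{l\in R}(\tilde\nu_l-\tilde\lambda_{*,l})\ \Big|\ R\in\mathcal R_{n,m}\Big\}\Big)\Big|\le\|\Lambda-\tilde\Lambda\|_1+\|\nu-\tilde\nu\|_1.$$

This is Equation 16 if both $\alpha^\nu_{n,m}(\Lambda)$ and $\alpha^{\tilde\nu}_{n,m}(\tilde\Lambda)$ are positive. Otherwise there are two possibilities:

- $n\not\prec m$, in which case both values vanish;
- one of the two infima above, say the second, is nonpositive, so that $\alpha^{\tilde\nu}_{n,m}(\tilde\Lambda)=0$. Then $\alpha^\nu_{n,m}(\Lambda)$ is either $0$ or equal to the first infimum, and in both cases it is bounded by the absolute difference of the two infima.

Hence the required estimate holds in all cases. ∎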
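Estimate (16) can be sanity-checked numerically on random finite data. The following Python sketch is a randomised test, not a proof. It makes these assumptions:

- the index set is finite and all entries are exact rationals;
- the order is the toy order $0\prec2$, $0\prec3$, $1\prec3$ on $\{0,1,2,3\}$, made up for illustration;
- $\mathcal R_{n,m}$ is read as the upward closed sets containing $m$ but not $n$.

Entries of $\Lambda$ and $\nu$ are allowed to be negative, since the lemma only requires summability. The perturbed data are small random shifts of the original ones. The function names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import combinations

STRICT = {(0, 2), (0, 3), (1, 3)}      # toy order on {0, 1, 2, 3}: 0 < 2, 0 < 3, 1 < 3


def leq(x, y):
    return x == y or (x, y) in STRICT


def col_sum(lam, l):
    return sum((row[l] for row in lam), Fraction(0))


def alpha(lam, nu, n, m, leq):
    """alpha^nu_{n,m}(Lambda) on a finite index set (brute force over R_{n,m})."""
    N = len(nu)
    if n == m or not leq(n, m):
        return Fraction(0)
    terms = [col_sum(lam, n) - nu[n], nu[m] - col_sum(lam, m)]
    for r in range(N + 1):
        for R in combinations(range(N), r):
            if (m in R and n not in R
                    and all(x in R for y in R for x in range(N) if leq(y, x))):
                terms.append(sum((nu[l] - col_sum(lam, l) for l in R), Fraction(0)))
    a = min(terms)
    return a if a > 0 else Fraction(0)


def check_perturbation_bound(trials=30, seed=0, N=4):
    """True iff |alpha(Lam) - alpha(Lam~)| <= ||Lam - Lam~||_1 + ||nu - nu~||_1
    held for all pairs (n, m) in `trials` random instances."""
    rng = random.Random(seed)

    def frac():
        return Fraction(rng.randint(-5, 9), 10)

    def shift():
        return Fraction(rng.randint(-1, 1), 20)   # small perturbation

    for _ in range(trials):
        lam = [[frac() for _ in range(N)] for _ in range(N)]
        nu = [frac() for _ in range(N)]
        lam_t = [[x + shift() for x in row] for row in lam]
        nu_t = [x + shift() for x in nu]
        bound = (sum(abs(lam[i][j] - lam_t[i][j]) for i in range(N) for j in range(N))
                 + sum(abs(nu[i] - nu_t[i]) for i in range(N)))
        for n in range(N):
            for m in range(N):
                gap = abs(alpha(lam, nu, n, m, leq) - alpha(lam_t, nu_t, n, m, leq))
                if gap > bound:
                    return False
    return True
```

A `True` result only means that no counterexample was found among the sampled instances. The lemma itself is what guarantees the bound in general.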

Let