 # On Bayesian Consistency for Flows Observed Through a Passive Scalar

We consider the statistical inverse problem of estimating a background fluid flow field v from partial, noisy observations of the concentration θ of a substance passively advected by the fluid, so that θ is governed by the partial differential equation ∂θ/∂t(t,x) = −v(x)·∇θ(t,x) + κΔθ(t,x), θ(0,x) = θ_0(x), for t ∈ [0,T], T > 0 and x ∈ T² = [0,1]². The initial condition θ_0 and diffusion coefficient κ are assumed to be known, and the data consist of point observations of the scalar field θ corrupted by additive i.i.d. Gaussian noise. We adopt a Bayesian approach to this estimation problem and establish that the inference is consistent, i.e., that the posterior measure identifies the true background flow as the number of scalar observations grows large. Since the inverse map is ill-defined for some classes of problems even for perfect, infinite measurements of θ, multiple experiments (initial conditions) are required to resolve the true fluid flow. Under this assumption, suitable conditions on the observation points, and given support and tail conditions on the prior measure, we show that the posterior measure converges to a Dirac measure centered on the true flow as the number of observations goes to infinity.


## 1 Introduction

In this work we consider the inverse problem of estimating a background fluid flow from partial, noisy observations of a dye, pollutant, or other solute advecting and diffusing within the fluid. The physical model considered is the two-dimensional advection-diffusion equation on the periodic domain T² = [0,1]²:

$$\frac{\partial\theta}{\partial t}(t,x) = -v(x)\cdot\nabla\theta(t,x) + \kappa\Delta\theta(t,x)\,, \qquad \theta(0,x) = \theta_0(x). \tag{1.1}$$

Here

• θ = θ(t,x) is a passive scalar, typically the concentration of some solute of interest, which is spread by diffusion and by the motion of a (time-stationary) fluid flow v. This solute is “passive” in that it does not affect the motion of the underlying fluid.

• v = v(x) is an incompressible background flow, i.e., v is constant in time and satisfies ∇ · v = 0.

• κ > 0 is the diffusion coefficient, which models the rate at which local concentrations of the solute spread out within the solvent in the absence of advection.

We obtain finitely many observations Y subject to additive noise η, i.e.

$$Y = \mathcal{G}(v) + \eta\,, \qquad \eta \sim \gamma_0\,, \tag{1.2}$$

for some measure γ_0 related to the precision of the observations. Here, the forward map G associates the background flow v, sitting in a suitable function space H, with a finite collection of measurements (observables) of the resulting solution θ of (1.1). We consider spatio-temporal point observations:

$$\mathcal{G}_j(v) := \theta(t_j, x_j, v)\,, \quad \text{for } t_j \in [0,T] \text{ and } x_j \in [0,1]^2. \tag{1.3}$$

The goal of the inverse problem is then to estimate the flow v from the data Y. The initial condition θ_0 is assumed to be known, so the problem can be interpreted as a controlled experiment, in which the solute is added at known locations and then observed as the system evolves in order to investigate the structure of the underlying flow. This is a common experimental approach to investigating complex fluid flows; see, for example, [12, 13, 35, 29].
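To make the forward map concrete, the following is a minimal numerical sketch (not taken from the paper) of the parameter-to-observable map: a pseudo-spectral solver for (1.1) on the periodic unit square, followed by a point evaluation of the resulting scalar field. The grid size, flow, time step, and observation point are all illustrative choices.

```python
import numpy as np

def solve_advection_diffusion(theta0, v, kappa, dt, n_steps):
    """Pseudo-spectral time stepping for d(theta)/dt = -v.grad(theta) + kappa*Lap(theta)
    on the periodic unit square; diffusion is treated exactly via an integrating factor."""
    n = theta0.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / n)       # angular wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    decay = np.exp(-kappa * (kx**2 + ky**2) * dt)      # exact diffusion factor per step
    theta = theta0.copy()
    for _ in range(n_steps):
        th_hat = np.fft.fft2(theta)
        grad_x = np.real(np.fft.ifft2(1j * kx * th_hat))
        grad_y = np.real(np.fft.ifft2(1j * ky * th_hat))
        adv = v[0] * grad_x + v[1] * grad_y            # v . grad(theta)
        theta = np.real(np.fft.ifft2(np.fft.fft2(theta - dt * adv) * decay))
    return theta

n = 64
x = np.arange(n) / n
X, Y = np.meshgrid(x, x, indexing="ij")
theta0 = np.sin(2 * np.pi * X)                         # known initial condition
v = (np.cos(2 * np.pi * Y), np.zeros((n, n)))          # a divergence-free shear flow
theta_T = solve_advection_diffusion(theta0, v, kappa=0.01, dt=1e-3, n_steps=200)
obs = theta_T[n // 4, n // 2]                          # point observation theta(T, x_j)
```

With these illustrative choices, `obs` plays the role of a single observable G_j(v); repeating the evaluation over many points (t_j, x_j) assembles the data vector in (1.2).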

As we will illustrate, the inverse problem is ill-posed, i.e., the flow v is not uniquely determined by the scalar field θ; that the observations of θ are both finite-dimensional and polluted by noise exacerbates this problem. We therefore adopt a Bayesian approach to regularize the inverse problem, as described for this problem in our companion work (see also ) and in a more general setting in [11, 30, 4]. A key component of this approach is the selection of a prior probability measure μ_0 on the space of divergence-free flows H. It is then natural to ask to what extent the result of the inference depends on the choice of prior, and in particular whether the Bayesian approach to the inverse problem is consistent: that is, under what conditions does the posterior measure concentrate on the true fluid flow as the number of observations of θ grows large?

In this work, we establish conditions under which the Bayesian inference of v given data (1.2) is consistent for i.i.d. Gaussian observational noise. We then prove that the posterior measure converges weakly to a Dirac measure centered on the true background flow as the number of scalar observations grows large; see Section 3 for a full statement of the assumptions and the key result. It is a nontrivial task to determine suitable conditions on the structure of the observed data and on the prior measure under which consistency can be expected to hold. As such, as a crucial starting point for the analysis of consistency, one must address difficult experimental design questions.

In our problem, even under noiseless and complete measurement of θ, essential symmetries can prevent the recovery of v. For example, a poor choice of θ_0 in (1.1) makes it impossible to distinguish between (an infinite class of) laminar flows, so multiple experiments (initial conditions) are required to guarantee resolution of the true background flow. A second useful structural condition is that, by picking spatio-temporal observation points at random, we can ensure a sufficiently complete recovery of the solution as the number of observation points grows. Thirdly, it is worth emphasizing that we require special conditions on the prior measure. Crucially, we identify a tail condition ensuring that flows are sufficiently smooth; that is, the prior turns out to be critical to the result by restricting consideration to flows of limited roughness (up to a region of low probability).

An important outcome of this experimental design is that it allows us to use compactness to effectively constrain the space of possible divergence-free velocity fields. Indeed, compactness plays an important role in two components of the consistency proof. First, we use it to show the continuity of the inverse map from the scalar field θ back to the flow v (see Section 4). Second, we use it to develop a suitable uniform version of the law of large numbers in order to show that noisy observations can differentiate between the true and other scalar fields (Section 5).

Consistency of Bayesian estimators has been of interest since at least Laplace, with rigorous proofs of convergence for some problems appearing in the mid-twentieth century [6, 18]. The works [8, 28, 5] identified infinite-dimensional examples where Bayesian estimators are not consistent; that is, there are cases where the data can never guarantee recovery of the true parameter value. See, e.g., , , or  for a more detailed description of the history of consistency and the main ideas.

In recent years, there has been interest in extending these consistency results to infinite-dimensional inverse problems, and in particular those constrained by PDEs. Our result is one of the first on consistency in this context. Recent work in this area includes , which used an elliptic PDE as the guiding example, and , which establishes a Bernstein–von Mises theorem (consistency, but also contraction rates in the form of a Gaussian approximation) for Bayesian estimation of parameters of the time-independent Schrödinger equation.

It is worth noting that the related inverse problem of estimating the drift function b from partial observations of the Itô diffusion

$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t\,, \qquad t > 0\,, \tag{1.4}$$

has been studied extensively; see, e.g.,  or . Consistency has been established in various forms for this problem; see [32, 14, 24, 1]. However, while the equations (1.1) and (1.4) are related by the Kolmogorov equations (see, e.g., [25, Chapter 8]), the observed data are different: observations of an individual diffusion provide an approximate measurement of the drift, whereas observations of the concentration θ are less direct, as the movement of individual particles must be inferred. Our consistency proof, therefore, while retaining some similarities with other such arguments, requires an original approach with different assumptions.

The remainder of the paper is organized as follows. Section 2 describes the mathematical framework of the inverse problem and why it is ill-posed in the traditional sense. The main result and key assumptions are stated in Section 3. Continuity of the inverse map is shown in Section 4. Uniform convergence of the log-likelihood is shown in Section 5. Convergence of the posterior to the inverse image of the true scalar field is shown in Section 6. Finally, the proof of the main result is in Section 7. Energy estimates for the advection-diffusion problem used to show continuity of the forward and inverse maps are reserved for Appendix A.

## 2 Preliminaries

In this section, we describe the mathematical framework of the inverse problem (1.2). We begin by defining the functional analytic setting for the problem, including how we represent divergence-free background flows. We then define the inverse problem, key notation, and Bayes’ theorem for this application.

### 2.1 Representation of Divergence-Free Background Flows

The target of the inference is a divergence-free background flow v, so we start by describing the space of such flows that we will consider. For this purpose we begin by recalling the Sobolev spaces of (scalar valued) periodic functions on the domain T² = [0,1]²,

$$H^s(\mathbb{T}^2) = \Big\{ u \,:\, u = \sum_{k \in \mathbb{Z}^2 \setminus \{0\}} c_k e^{2\pi i k\cdot x},\ \overline{c}_k = c_{-k},\ \|u\|_{H^s} < \infty \Big\}\,, \quad \text{where } \|u\|_{H^s}^2 := \sum_{k \in \mathbb{Z}^2} \|k\|^{2s} |c_k|^2\,, \tag{2.1}$$

defined for any s ∈ ℝ; see e.g. [27, 31]. We will abuse notation and use the same symbol for spaces of periodic divergence-free background flows, obtained by replacing the coefficients c_k in (2.1) as

$$c_k = v_k \frac{k^\perp}{\|k\|^2}\,, \qquad \overline{v}_k = -v_{-k}\,, \tag{2.2}$$

where k⊥ := (−k_2, k_1); this choice of coefficients ensures that ∇ · v = 0. Throughout what follows we fix our parameter space as

###### Notation 2.1 (Parameter space, H).

We consider background flows v ∈ H, where H is the Sobolev space (see (2.1))

$$H = H^m(\mathbb{T}^2)\,, \quad \text{for some } m > 1\,, \tag{2.3}$$

with coefficients given by (2.2).
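The representation (2.2) can be sanity-checked numerically. The sketch below (an illustration, not from the paper) assembles a real velocity field from a few Fourier modes via c_k = v_k k⊥/‖k‖², and verifies spectrally that the resulting field is divergence-free.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 32
x = np.arange(n) / n
X, Y = np.meshgrid(x, x, indexing="ij")

# assemble v = sum_k c_k e^{2 pi i k.x} with c_k = v_k k^perp / |k|^2;
# pairing the k and -k terms (using bar(v_k) = -v_{-k}) gives twice the real part
modes = [(1, 0), (0, 1), (2, 1)]
v1 = np.zeros((n, n))
v2 = np.zeros((n, n))
for k1, k2 in modes:
    vk = rng.normal() + 1j * rng.normal()      # complex coefficient v_k
    ksq = k1**2 + k2**2
    phase = np.exp(2j * np.pi * (k1 * X + k2 * Y))
    v1 += 2 * np.real(vk * (-k2 / ksq) * phase)   # k^perp = (-k2, k1)
    v2 += 2 * np.real(vk * (k1 / ksq) * phase)

# spectral divergence of v: vanishes identically because c_k is parallel to k^perp
w = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / n)
KX, KY = np.meshgrid(w, w, indexing="ij")
div = np.real(np.fft.ifft2(1j * KX * np.fft.fft2(v1) + 1j * KY * np.fft.fft2(v2)))
```

The divergence is zero up to round-off, since each Fourier coefficient of v is orthogonal to its wavenumber k.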

Here the exponent m > 1 is chosen so that vector fields in H, as well as their corresponding solutions θ, exhibit continuity properties convenient for our analysis below (see Remark 2.4). We write L^p, 1 ≤ p ≤ ∞, for the usual Lebesgue spaces and denote the spaces of continuous and p-th power integrable, X-valued functions by C([0,T]; X) and L^p([0,T]; X), respectively, for a given Banach space X. All of these spaces are endowed with their standard topologies unless otherwise specified.

### 2.2 Mathematical Setting of the Advection-Diffusion Problem

In this section, we provide a precise definition of solutions of the advection-diffusion problem (1.1). Crucially, the setting we choose yields a map from v to θ, and then to observations of θ, that is continuous.

###### Proposition 2.2 (Well-Posedness and Continuity of the solution map for (1.1)).

• Fix any κ > 0 and s with 1 ≤ s ≤ m, and suppose that v ∈ H and θ_0 ∈ H^s(T²). Then there exists a unique θ such that

$$\theta \in L^2_{\mathrm{loc}}([0,\infty); H^{s+1}(\mathbb{T}^2)) \cap L^\infty([0,\infty); H^s(\mathbb{T}^2)) \quad \text{with} \quad \frac{\partial\theta}{\partial t} \in L^2_{\mathrm{loc}}([0,\infty); H^{s-1}(\mathbb{T}^2))\,,$$

so that in particular

$$\theta \in C([0,\infty); H^s(\mathbb{T}^2))\,,$$

solves (1.1) at least weakly, namely

$$\Big\langle \frac{\partial\theta}{\partial t}, \phi \Big\rangle_{H^{-1}(\mathbb{T}^2)\times H^1(\mathbb{T}^2)} + \big\langle v\cdot\nabla\theta,\, \phi \big\rangle_{L^2(\mathbb{T}^2)} + \kappa \big\langle \nabla\theta,\, \nabla\phi \big\rangle_{L^2(\mathbb{T}^2)} = 0 \tag{2.4}$$

for all φ ∈ H¹(T²) and almost every time t ≥ 0.

• For any T > 0, the map which associates v and θ_0 to the corresponding solution θ is continuous relative to the standard topologies on these spaces.

This result can be proven using energy methods; similar results can be found, for example, in [7, 19]. In the case of smooth solutions one may also establish Proposition 2.2 using particle methods, as in e.g. , by observing that (1.1) is the Kolmogorov equation corresponding to a stochastic differential equation with drift given by v; see  for details in our setting. For completeness, we provide the a priori estimates leading to Proposition 2.2 in Appendix A.

###### Definition 2.3 (Solution Operator S, Observation Operator O).

Fix N ∈ ℕ and a time T > 0, and consider the phase space H defined in (2.3). The forward map G in (1.2) is interpreted as the composition G = O ∘ S, where:

1. The solution operator S maps a given v ∈ H to the corresponding solution θ(·, v) of (1.1) (in the sense of Proposition 2.2).

2. The observation operator O collects the point observations, O_j(θ) := θ(t_j, x_j), for t_j ∈ [0,T] and x_j ∈ T², j = 1, …, N.

We now note assumptions under which these observations are well-defined and vary continuously with v.

###### Remark 2.4 (Continuity of θ).

Let v ∈ H with associated exponent m > 1 (see (2.3)) and let θ_0 ∈ H^s(T²), for s > 1. Recalling that H^s embeds continuously in the space of continuous functions for s > 1 in dimension two (see e.g. , Theorem A.1), we have that θ ∈ C([0,T] × T²), again with the embedding continuous. Thus, with Proposition 2.2, we have that

$$S : H \to C([0,T] \times \mathbb{T}^2)$$

continuously. In particular, this justifies that G is well defined and continuous in the case of point observations as in Definition 2.3.

### 2.3 Bayesian Setting of the Inverse Problem

In this subsection, we define the setting of the statistical inverse problem and note cases where the inverse map is ill-posed, which will inform the assumptions required for the consistency argument. We close with a definition of Bayes’ theorem for this problem. We begin by fixing some notation used in the remainder of the paper.

###### Definition 2.5 (v⋆, Y, G, η).

We frequently denote a fixed “true” background flow by v⋆ ∈ H. For the given v⋆, the observed data Y is given by

$$Y = \mathcal{G}(v^\star) + \eta\,,$$

where

• The forward map is G = (G_1, …, G_N), with G_j(v) = θ(t_j, x_j, v) for observation points (t_j, x_j) ∈ [0,T] × T².

• The observational noise η is distributed as N(0, σ_η² I_N).

We emphasize, however, that v⋆ is not necessarily the only flow that could produce such data, as we describe in the next remark.

###### Remark 2.6.

Since the background flow enters (1.1) through the v·∇θ term, the inverse problem of recovering v from θ can be ill-posed. One important class of examples illustrating this difficulty arises when v·∇θ is zero everywhere, in which case the fluid flow does not have any effect on θ. Two such examples are as follows:

• Ill-posedness: Laminar Flow: Let θ_0 be independent of x_1 and take v(x) = (w(x_2), 0) for any profile w. Then θ(·, v, θ_0) = θ(·, 0, θ_0) for any such v.

• Ill-posedness: Radial Symmetry: Set θ_0 radially symmetric about a point x_c and v(x) = w(‖x − x_c‖)(x − x_c)⊥. Then θ(·, v, θ_0) = θ(·, 0, θ_0) for any such w.

In these cases, even noiseless and complete spatial/temporal observations of θ have no way to discriminate between a range of background flows, making it impossible to uniquely identify a true background flow in general.
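The laminar mechanism can be checked directly: if θ_0 is independent of x_1 and v = (w(x_2), 0), the advection term v·∇θ_0 vanishes identically, and since diffusion preserves this independence, θ evolves exactly as it would with v = 0. A small numerical check (with illustrative choices of θ_0 and w):

```python
import numpy as np

n = 64
x = np.arange(n) / n
X, Y = np.meshgrid(x, x, indexing="ij")

theta0 = np.cos(2 * np.pi * Y)            # independent of x_1
w = 1.0 + 0.5 * np.sin(2 * np.pi * Y)     # arbitrary shear profile
v = (w, np.zeros((n, n)))                 # laminar flow v = (w(x_2), 0)

# advection term v . grad(theta0), with gradients computed spectrally
k = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / n)
KX, KY = np.meshgrid(k, k, indexing="ij")
th_hat = np.fft.fft2(theta0)
grad_x = np.real(np.fft.ifft2(1j * KX * th_hat))
grad_y = np.real(np.fft.ifft2(1j * KY * th_hat))
adv = v[0] * grad_x + v[1] * grad_y       # vanishes: the flow is invisible in theta
```

Since the advection term is identically zero, noiseless data cannot distinguish any member of this family of shear flows from the zero flow.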

We have the following adaptation of Bayes’ theorem to the advection-diffusion problem; see the derivation in [3, Appendix A] or  for additional information.

###### Theorem 2.7 (Bayes’ Theorem).

Fix a prior distribution μ_0 on H and let the forward map G, data Y, and associated observational noise η be as defined in Definition 2.5. Then the posterior measure μ^Y associated with the random variable v | Y is absolutely continuous with respect to μ_0 and is given by

$$\mu^Y(dv) = \frac{1}{Z^Y}\,\exp\Big[-\frac{1}{2\sigma_\eta^2}\sum_{j=1}^{N}\big(Y_j - \mathcal{G}_j(v)\big)^2\Big]\,\mu_0(dv)\,, \tag{2.5}$$

where Z^Y is the normalization

$$Z^Y = \int_H \exp\Big[-\frac{1}{2\sigma_\eta^2}\sum_{j=1}^{N}\big(Y_j - \mathcal{G}_j(v)\big)^2\Big]\,\mu_0(dv). \tag{2.6}$$
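For intuition, the formulas (2.5)–(2.6) can be implemented directly when the prior is supported on finitely many candidate parameters. The sketch below uses a cheap explicit function as a stand-in for the PDE-based forward map; the map `G`, the candidate grid, and the noise level are all hypothetical illustrations, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# stand-in for the PDE-based forward map G: parameter a and observation time t
# to a scalar observable (in the paper this would require solving (1.1))
def G(a, t):
    return np.exp(-t) * np.sin(a + t)

t_obs = rng.uniform(0.0, 1.0, size=50)            # observation points
a_true, sigma = 0.7, 0.05
Y = G(a_true, t_obs) + sigma * rng.normal(size=t_obs.size)

candidates = np.linspace(0.0, 1.5, 31)            # discrete prior support
prior = np.full(candidates.size, 1.0 / candidates.size)
log_lik = np.array([-0.5 * np.sum((Y - G(a, t_obs)) ** 2) / sigma**2
                    for a in candidates])
w = prior * np.exp(log_lik - log_lik.max())       # stabilized weights, cf. (2.5)
posterior = w / w.sum()                           # normalization Z^Y, cf. (2.6)
a_map = candidates[np.argmax(posterior)]          # posterior mode
```

Subtracting `log_lik.max()` before exponentiating leaves the normalized posterior unchanged but avoids underflow, a standard trick when the potentials grow with the number of observations.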

## 3 Statement of the Main Result

With the mathematical preliminaries in Section 2 in hand, we are now ready to provide a precise formulation of the main result of the paper. Referring back to Remark 2.6, we do not expect consistency to hold without delicate assumptions on the initial conditions in (1.1) and on the observation points in the forward map in (1.2). Moreover, our result relies on the selection of an appropriate prior μ_0. In particular, the prior should reflect the regularity of the ‘true’ background flow, for which we assume a greater degree of spatial smoothness than for generic elements in the ambient parameter space H. We therefore define an additional smaller space used throughout.

###### Definition 3.1 (Higher Regularity Space).

Define the space

$$V = H^{m^\star}(\mathbb{T}^2)\,, \qquad m^\star > m\,, \tag{3.1}$$

where m is the exponent associated with the parameter space H defined according to (2.3). We denote by ∥·∥_V the associated norm and take

$$B_r^X(v_0) := \{\, v \in X : \|v - v_0\|_X < r \,\}\,, \qquad X \in \{H, V\}\,, \ r > 0. \tag{3.2}$$

Our main result is as follows.

###### Theorem 3.2 (Convergence of Posterior to a Dirac).

Let {(t_j, x_j)}_{j≥1} be a sequence of observation points that we assume are i.i.d. uniform random variables in [0,T] × T². Fix any initial conditions θ_0^{(1)}, θ_0^{(2)} ∈ H^m(T²), with m determined from (2.3), such that

$$\big(\nabla\theta_0^{(1)}(x)\big)^\perp \cdot \nabla\theta_0^{(2)}(x) \neq 0\,, \quad \text{for almost all } x \in \mathbb{T}^2. \tag{3.3}$$

Define the parameter-to-observable (forward) maps, for v ∈ H and the initial conditions θ_0^{(1)}, θ_0^{(2)}, by

$$\mathcal{G}_{2j-1}(v) := \theta(t_j, x_j, v, \theta_0^{(1)})\,, \qquad \mathcal{G}_{2j}(v) := \theta(t_j, x_j, v, \theta_0^{(2)}) \tag{3.4}$$

for j ≥ 1. As in Definition 2.5, we fix any v⋆ ∈ V and draw data points Y_j, where

$$Y_j = \mathcal{G}_j(v^\star) + \eta_j \tag{3.5}$$

for i.i.d. observational noises η_j ∼ N(0, σ_η²) that are independent of the observation points (t_j, x_j).

Fix a prior distribution μ_0 on H and, for N observations, let μ_N^Y be the Bayesian posterior measure on H given by (cf. Theorem 2.7)

$$\mu_N^Y(dv) = \frac{1}{Z_N^Y}\,\exp\Big[-\frac{1}{2\sigma_\eta^2}\sum_{j=1}^{N}\big(Y_j - \mathcal{G}_j(v)\big)^2\Big]\,\mu_0(dv)\,, \tag{3.6}$$

where Z_N^Y is the normalization

$$Z_N^Y = \int_H \exp\Big[-\frac{1}{2\sigma_\eta^2}\sum_{j=1}^{N}\big(Y_j - \mathcal{G}_j(v)\big)^2\Big]\,\mu_0(dv).$$

Suppose that

$$\mu_0\big(B_r^V(v^\star)\big) > 0 \quad \text{for any } r > 0. \tag{3.7}$$

Additionally, assume that there exists an f : [0,∞) → [0,∞) that is monotone increasing, with f(r) → ∞ as r → ∞, and

$$\sup_{N} \int_H f(\|v\|_V)\,\mu_N^Y(dv) < \infty \quad a.s. \tag{3.8}$$

Then μ_N^Y converges weakly to δ_{v⋆} almost surely. In other words, on a set of full measure,

$$\int_H \phi(v)\,\mu_N^Y(dv) \to \phi(v^\star) \quad \text{as } N \to \infty\,, \text{ for any } \phi \in C_b(H). \tag{3.9}$$
###### Remark 3.3 (Sufficient conditions on the prior).

Suppose that

$$\mu_0\big(B_r^V(0)\big) = 1\,, \tag{3.10}$$

for some r > 0. Under this assumption we have

$$\int_H \|v\|_V\,\mu_N^Y(dv) = \frac{\displaystyle\int_{B_r^V(0)} \|v\|_V \exp\Big[-\frac{1}{2\sigma_\eta^2}\sum_{j=1}^{N}\big(Y_j - \mathcal{G}_j(v)\big)^2\Big]\,\mu_0(dv)}{\displaystyle\int_{B_r^V(0)} \exp\Big[-\frac{1}{2\sigma_\eta^2}\sum_{j=1}^{N}\big(Y_j - \mathcal{G}_j(v)\big)^2\Big]\,\mu_0(dv)} \le r\,,$$

so that (3.10) implies (3.8) (with f(r) = r). Thus we can guarantee the existence of a class of non-trivial priors for which Theorem 3.2 holds. On the other hand, the reverse implication is not expected to hold, and thus the general significance of (3.8) for the admissible classes of priors μ_0 is not immediately clear. In particular, having bounded support is a strong restriction, and indeed we conjecture that there is a class of Gaussian measures on V such that (3.8) still holds. We will investigate this question in future work.

###### Remark 3.4 (Poincaré inequality, support of μ0).

Since we are assuming that elements in H are mean-free (see (2.1)), we have the Poincaré-type inequality

$$\|v\|_H \le C\,\|v\|_V\,, \tag{3.11}$$

for a constant C independent of v. As such, B_r^V(v⋆) ⊂ B_{Cr}^H(v⋆) for any r > 0, where C is the constant appearing in (3.11). In particular, under the condition (3.7) in Theorem 3.2, v⋆ lies in the support of μ_0 viewed as a measure on H.

###### Remark 3.5 (Restrictions on the initial conditions).

It is unavoidable that we impose a condition as in (3.3) on the initial data in Theorem 3.2. In Remark 2.6 we provide two examples where the observations have no way to discriminate between a range of background flows. In these two examples, as well as for many other classes of initial conditions, the posterior fails to concentrate on v⋆ as the number of observations grows (except for very particular priors). It is an interesting question for future work to characterize the support of the limiting measure for the analogue of μ_N^Y as N → ∞ as a function of a single initial condition θ_0.
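As a concrete instance of (3.3), one may take θ_0^{(1)}(x) = sin(2πx_1) and θ_0^{(2)}(x) = sin(2πx_2) (an illustrative pair, not prescribed by the paper). Then (∇θ_0^{(1)})⊥ · ∇θ_0^{(2)} = 4π² cos(2πx_1) cos(2πx_2), which vanishes only on a measure-zero set of grid lines. A quick numerical check, evaluated on a grid offset from those lines:

```python
import numpy as np

n = 128
x = (np.arange(n) + 0.5) / n                 # offset grid avoids the measure-zero zero set
X, Y = np.meshgrid(x, x, indexing="ij")

# analytic gradients of theta1 = sin(2 pi x_1), theta2 = sin(2 pi x_2);
# the perp of a vector (a, b) is (-b, a)
g1 = (2 * np.pi * np.cos(2 * np.pi * X), np.zeros((n, n)))
g2 = (np.zeros((n, n)), 2 * np.pi * np.cos(2 * np.pi * Y))
cross = -g1[1] * g2[0] + g1[0] * g2[1]       # (grad theta1)^perp . grad theta2
frac_nonzero = np.mean(np.abs(cross) > 1e-12)
```

At every sampled point the two gradients span the plane, which is exactly the mechanism exploited in the injectivity proof of Section 4.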

Before turning to the technical details, let us provide an overview of the method of proof of Theorem 3.2. Our starting point consists of two basic observations. Firstly, according to the Portmanteau theorem, in order to establish (3.9) it is equivalent to show that

$$\liminf_{N\to\infty} \mu_N^Y\big(B_\epsilon^H(v^\star)\big) \ge 1 \tag{3.12}$$

for any ε > 0. See e.g.  for further details on such generalities concerning the weak convergence of probability measures.

Our second observation concerns using the law of large numbers to identify the approximate character of the potential terms in (3.6) for large N. Referring back to (3.5) and (3.6), and invoking the law of large numbers together with the assumed statistical properties of η_j and (t_j, x_j), we have

$$\frac{1}{N}\sum_{j=1}^{N}\big(Y_j - \mathcal{G}_j(v)\big)^2 \approx \sigma_\eta^2 + \frac{1}{2T}\sum_{l=1}^{2}\int_0^T\!\!\int_{\mathbb{T}^2}\big(\theta(t,x,v,\theta_0^{(l)}) - \theta(t,x,v^\star,\theta_0^{(l)})\big)^2\,dx\,dt \tag{3.13}$$

for all N sufficiently large. (Referring back to Section 2.1, recall that we are assuming that the torus T² = [0,1]² has unit side length, so no volume factor appears.) For δ > 0, take

$$X_\delta = \Big\{ v \in H : \sum_{l=1}^{2}\int_0^T\!\!\int_{\mathbb{T}^2}\big(\theta(t,x,v,\theta_0^{(l)}) - \theta(t,x,v^\star,\theta_0^{(l)})\big)^2\,dx\,dt < \delta^2 \Big\}. \tag{3.14}$$

Invoking (3.13), we observe that

$$\begin{aligned}
\mu_N^Y(X_\delta^c) &\approx \frac{\displaystyle\int_{X_\delta^c}\exp\Big[-\frac{N}{4\sigma_\eta^2 T}\sum_{l=1}^{2}\int_0^T\!\!\int_{\mathbb{T}^2}\big(\theta(t,x,v,\theta_0^{(l)}) - \theta(t,x,v^\star,\theta_0^{(l)})\big)^2\,dx\,dt\Big]\,\mu_0(dv)}{\displaystyle\int_H \exp\Big[-\frac{N}{4\sigma_\eta^2 T}\sum_{l=1}^{2}\int_0^T\!\!\int_{\mathbb{T}^2}\big(\theta(t,x,v,\theta_0^{(l)}) - \theta(t,x,v^\star,\theta_0^{(l)})\big)^2\,dx\,dt\Big]\,\mu_0(dv)} \\[2pt]
&\le \frac{\exp\Big(-\dfrac{N\delta^2}{4\sigma_\eta^2 T}\Big)\,\mu_0(X_\delta^c)}{\displaystyle\int_{X_{\delta/2}}\exp\Big[-\frac{N}{4\sigma_\eta^2 T}\sum_{l=1}^{2}\int_0^T\!\!\int_{\mathbb{T}^2}\big(\theta(t,x,v,\theta_0^{(l)}) - \theta(t,x,v^\star,\theta_0^{(l)})\big)^2\,dx\,dt\Big]\,\mu_0(dv)} \\[2pt]
&\le \frac{\exp\Big(-\dfrac{3N\delta^2}{16\sigma_\eta^2 T}\Big)}{\mu_0(X_{\delta/2})}. 
\end{aligned} \tag{3.15}$$

Here note that μ_0(X_{δ/2}) > 0 (cf. Remark 3.4), so that we are not dividing by zero in the final upper bound.
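For the reader's convenience, the constant 3/16 in (3.15) arises from the following bookkeeping: on X_{δ/2} the double integral in (3.14) is at most (δ/2)² = δ²/4, so restricting the denominator to X_{δ/2} and combining the exponentials gives

```latex
\frac{\exp\big(-\tfrac{N\delta^2}{4\sigma_\eta^2 T}\big)\,\mu_0(X_\delta^c)}
     {\exp\big(-\tfrac{N}{4\sigma_\eta^2 T}\cdot\tfrac{\delta^2}{4}\big)\,\mu_0(X_{\delta/2})}
\;\le\; \exp\Big(-\frac{N\delta^2}{4\sigma_\eta^2 T} + \frac{N\delta^2}{16\sigma_\eta^2 T}\Big)\,
\frac{1}{\mu_0(X_{\delta/2})}
\;=\; \frac{\exp\big(-\tfrac{3N\delta^2}{16\sigma_\eta^2 T}\big)}{\mu_0(X_{\delta/2})}\,,
```

using μ_0(X_δ^c) ≤ 1 in the numerator.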

One is thus tempted to combine (3.13) and (3.15) to obtain the desired weak convergence (3.9) and conclude. However, this naive argument runs up against two fundamental flaws:

• (i) Although, as we establish below in Lemma 4.3, the condition (3.3) ensures that the paired solution map is injective into L²([0,T] × T²)², it is not clear that this map has a continuous inverse.

• (ii) It is not obvious that we have sufficient uniformity over v in our invocation of the LLN in (3.13). In particular, without such uniformity the approximation in the first line of (3.15) would be unjustified.

We address both of these concerns by assuming a little extra regularity for the ‘true’ vector field, taking v⋆ ∈ V, and by making effective use of the prior to identify this regularity for v (see assumptions (3.7), (3.8)). With the Rellich–Kondrachov theorem we are thus able to use compactness to address both concerns. Indeed, although an injective, continuous map does not have a continuous inverse in general, this property does hold when the domain of the map is compact; see Lemma 4.4 below. Regarding the second concern (ii), we establish a uniform version of the LLN, Proposition 5.1 below (see also [21, 22]), but our proof makes essential use of the fact that the ‘parameter’ (which for us is v) lies in a compact set.

The precise proof of Theorem 3.2 is presented in a series of sections as follows. Firstly, in Section 4 we address the injectivity of the forward map under (3.3) as well as the continuity of the inverse map, resolving (i). In Section 5 we introduce a uniform version of the law of large numbers, Proposition 5.1, and use it to obtain a quantitative version of (3.13). Section 6 establishes that μ_N^Y concentrates on the ‘true value’ v⋆ as N → ∞. Finally, Section 7 uses the machinery now in place to complete the proof of Theorem 3.2.

## 4 Continuity of Inverse Map

In this section, we lay out conditions under which the inverse solution map is continuous. This requires some care. Indeed, it is not true in general that the forward map is injective, as illustrated in Remark 2.6. As such, counterexamples to Theorem 3.2 exist (cf. Remark 3.5) if we fail to impose a suitable assumption on the initial condition(s) for (1.1) à la (3.3).

With this in mind, we now define the solution map associated with the solution of (1.1) for multiple initial conditions.

###### Notation 4.1 (Paired solution map ~S).

Fix any θ_0^{(1)}, θ_0^{(2)} ∈ H^m(T²), for H^m as in (2.1), and let θ(·, v, θ_0^{(1)}), θ(·, v, θ_0^{(2)}) be the associated solutions of (1.1) corresponding to v ∈ H, defined according to Proposition 2.2. We denote

$$\tilde{S}(v) = \big(\theta(\cdot, v, \theta_0^{(1)})\,,\ \theta(\cdot, v, \theta_0^{(2)})\big)\,,$$

regarding ~S as a map from H into C([0,T] × T²)².

We now observe that the paired solution map ~S is continuous (Corollary 4.2) and that, under condition (3.3), ~S is injective (Lemma 4.3).

###### Corollary 4.2 (~S continuous).

The paired solution map ~S (see Notation 4.1) is continuous.

###### Proof.

For any θ_0 ∈ H^m(T²) (with m as in (2.3)), the associated solution map v ↦ θ(·, v, θ_0) is continuous by Remark 2.4, so that the paired map ~S is also continuous. ∎

###### Lemma 4.3 (~S injective).

Let ~S be the paired solution map given in Notation 4.1, with initial conditions satisfying (3.3). Suppose that v, ṽ ∈ H are such that

$$\big\|\tilde{S}(v) - \tilde{S}(\tilde{v})\big\|_{L^2([0,T]\times\mathbb{T}^2)^2} = 0. \tag{4.1}$$

Then v = ṽ; in other words, ~S is injective.

###### Proof.

Let v, ṽ ∈ H satisfy (4.1), i.e.,

$$\big\|\theta^{(i)}(\cdot, v) - \theta^{(i)}(\cdot, \tilde{v})\big\|_{L^2([0,T]\times\mathbb{T}^2)} = 0\,, \qquad i = 1, 2.$$

Then θ^{(i)}(t, x, v) = θ^{(i)}(t, x, ṽ) for almost all t ∈ [0,T] and x ∈ T². However, since both solutions are continuous (see Remark 2.4), this implies that θ^{(i)}(t, x, v) = θ^{(i)}(t, x, ṽ) for all t ∈ [0,T] and x ∈ T². Denote this common solution by θ^{(i)}. Then θ^{(i)} solves both

$$\frac{\partial}{\partial t}\theta^{(i)}(t,x) = -u(x)\cdot\nabla\theta^{(i)}(t,x) + \kappa\Delta\theta^{(i)}(t,x)\,, \qquad \theta^{(i)}(0,x) = \theta_0^{(i)}(x)$$

for u ∈ {v, ṽ}, all t ∈ [0,T] and x ∈ T². Subtracting the two equations leads to

$$0 = \big(\tilde{v}(x) - v(x)\big)\cdot\nabla\theta^{(i)}(t,x)$$

for i = 1, 2, all t ∈ [0,T] and x ∈ T². In particular, at t = 0,

$$0 = \big(\tilde{v}(x) - v(x)\big)\cdot\nabla\theta_0^{(i)}(x)$$

for i = 1, 2 and all x ∈ T². However, under (3.3), the gradients ∇θ_0^{(1)}(x), ∇θ_0^{(2)}(x) span ℝ² at almost all x ∈ T². Therefore ṽ(x) − v(x) = 0 for almost all x, and hence v = ṽ, completing the proof. ∎

Even under the conditions of Lemma 4.3 it remains unclear whether ~S has a continuous inverse. To remedy this, we recall the following elementary fact from real analysis, which suggests that we further restrict the domain of ~S.

###### Lemma 4.4.

Let X and Y be metric spaces and suppose that X is compact. Let F : X → Y be injective and continuous. Then F⁻¹ : F(X) → X is also continuous. (Here, we denote F(X) := {F(x) : x ∈ X}.)

###### Proof.

Let y_n ∈ F(X) be such that y_n → y ∈ F(X). Define x_n, x ∈ X according to F(x_n) = y_n and F(x) = y. We would like to show that x_n → x as n → ∞.

To this end, let {x_{n_k}} be any subsequence of {x_n}. Since X is compact, there exists a further subsequence {x_{n_{k_j}}} that converges in X; denote this limit by x̃. Then, since F is continuous, F(x_{n_{k_j}}) → F(x̃). But, by definition and the assumed convergence of y_n, we also have F(x_{n_{k_j}}) = y_{n_{k_j}} → y, so that F(x̃) = y = F(x). Since F is injective, x̃ = x, i.e., x_{n_{k_j}} → x. However, since the original subsequence was arbitrary, we have in fact that x_n → x, yielding the desired result. ∎
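The role of the compactness hypothesis in Lemma 4.4 can be seen in a familiar example (an illustration, not from the paper): F(t) = (cos t, sin t) is injective and continuous on the non-compact domain [0, 2π), yet F⁻¹ is discontinuous at (1, 0); restricted to the compact domain [0, π], the inverse (the arccosine of the first coordinate) is continuous.

```python
import numpy as np

def F(t):
    return np.array([np.cos(t), np.sin(t)])

# Non-compact domain [0, 2*pi): nearby image points with far-apart preimages,
# so F^{-1} cannot be continuous at the point (1, 0).
t_lo, t_hi = 0.01, 2 * np.pi - 0.01
img_gap = np.linalg.norm(F(t_lo) - F(t_hi))    # small gap between image points
pre_gap = abs(t_hi - t_lo)                     # preimages nearly 2*pi apart

# Compact domain [0, pi]: F^{-1}(y1, y2) = arccos(y1) recovers t continuously.
t = np.linspace(0.0, np.pi, 101)
recovered = np.arccos(np.cos(t))
```

In the paper's setting the compact set is a V-ball, compact in H by Rellich–Kondrachov, playing the role of [0, π] here.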

From Lemma 4.4 we draw the following two conclusions, which we use below.

###### Corollary 4.5 (~S−1 continuous).

Let ~S be the paired solution map given in Notation 4.1, with initial conditions meeting (3.3). Then, for any r > 0 and v_0 ∈ V, the restriction of ~S⁻¹ to ~S(B_r^V(v_0)) is continuous.

###### Proof.

We have that ~S is continuous by Corollary 4.2 and injective by Lemma 4.3. We also have that B_r^V(v_0) is compact in H by the Rellich–Kondrachov theorem; see, e.g., Corollary A.5 of . Therefore, ~S⁻¹ is continuous by Lemma 4.4. ∎

###### Corollary 4.6.

Let v_0 ∈ V and r > 0. For all ε > 0, there exists a δ > 0 such that

$$\Big\{ v \in H : \big\|\tilde{S}(v) - \tilde{S}(v_0)\big\|_{L^2([0,T]\times\mathbb{T}^2)^2} < \delta \Big\} \cap B_r^V(v_0) \subset B_\epsilon^H(v_0).$$

## 5 Concentration of Normalized Potentials, Uniform Law of Large Numbers

The next step in our analysis is to prove a rigorous and more quantitative version of (3.13), Proposition 5.2, which yields the asymptotics of the potential functions (log-likelihoods) appearing in the posterior measures defined as in (3.6). As a preliminary step we introduce a uniform version of the law of large numbers. See also [21, 22] for previous related results.
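Proposition 5.1 below can be illustrated numerically. In the sketch (an illustration with assumed choices, not from the paper), f(z, x) = sin(xz) with Z ~ N(0, 1), so that E f(Z, x) = 0 for every x and |f(z, x) − f(z, x̃)| ≤ |z||x − x̃|, i.e., d(z) = |z| satisfies the Lipschitz-type condition (5.2); the empirical deviation over the compact set B = [0, 3] is then small uniformly in x:

```python
import numpy as np

rng = np.random.default_rng(42)

# f(z, x) = sin(x * z) with Z ~ N(0, 1): E f(Z, x) = 0 by symmetry,
# and |f(z, x) - f(z, x')| <= |z| |x - x'|, so d(z) = |z| works in (5.2)
Z = rng.normal(size=50_000)
grid = np.linspace(0.0, 3.0, 301)             # compact parameter set B = [0, 3]

emp = np.array([np.mean(np.sin(x * Z)) for x in grid])
sup_err = np.abs(emp).max()                   # sup_x |empirical mean - E f(Z, x)|
```

The supremum over the whole grid shrinks at the usual Monte Carlo rate, which is the phenomenon the compactness-plus-equicontinuity argument below makes rigorous.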

###### Proposition 5.1 (Uniform Law of Large Numbers).

Let (B, ρ) be a metric space with B compact, and let f : ℝⁿ × B → ℝ be (Borel) measurable. Take {Z_j}_{j≥1} to be an i.i.d. sequence of ℝⁿ-valued random variables and let Z be any random variable with this distribution. Assume that

$$\mathbb{E}\, f(Z,x)^2 < \infty\,, \quad \text{for all } x \in B\,, \tag{5.1}$$

and that there exists a deterministic function d with E d(Z) < ∞ such that, for all ε > 0 and x ∈ B, there exists a δ > 0 such that

$$\rho(x, \tilde{x}) < \delta \implies |f(z, x) - f(z, \tilde{x})| \le d(z)\,\epsilon\,, \quad \text{for all } z \in \mathbb{R}^n. \tag{5.2}$$

Then

$$\lim_{N\to\infty}\, \sup_{x\in B}\, \Big|\frac{1}{N}\sum_{j=1}^{N} f(Z_j, x) - \mathbb{E}\, f(Z, x)\Big| = 0 \quad a.s. \tag{5.3}$$
###### Proof.

Note that, since d is non-negative, E d(Z) = 0 implies that

$$\tilde{\Omega} = \bigcap_{j=1}^{\infty}\big\{\omega \in \Omega : d(Z_j(\omega)) = 0\big\}$$

is a set of full measure, in which case the random functions f(Z_j, ·), j ≥ 1, are all constant on B, and the result (5.3) follows in this special case.

We turn to the nontrivial case where E d(Z) > 0. Define g(z, x) := f(z, x) − E f(Z, x). Then, by our assumptions on f, E g(Z, x) = 0 for every x ∈ B. Note also that, for any x, x̃ ∈ B,

$$|f(Z, x) - f(Z, \tilde{x})| \le d(Z)\,\epsilon \implies |g(Z, x) - g(Z, \tilde{x})| \le \big[d(Z) + \mathbb{E}\, d(Z)\big]\,\epsilon. \tag{5.4}$$

Fix any ε > 0. Then, by (5.2) and (5.4), for each x ∈ B there exists a δ_x > 0 such that ρ(x, x̃) < δ_x implies |g(z, x) − g(z, x̃)| ≤ [d(z) + E d(Z)] ε / (2 E d(Z)). Let B_δ(x) denote the ball of radius δ centered at x, and note that ⋃_{x ∈ B} B_{δ_x}(x) ⊃ B. Then, since B is compact, there exists a finite subcovering B_{δ_i}(x_i), i = 1, …, m, such that

$$\bigcup_{i=1}^{m} B_{\delta_i}(x_i) \supset B.$$

Let x ∈ B and let i be an index such that x ∈ B_{δ_i}(x_i). Then

$$\begin{aligned}
\Big|\frac{1}{N}\sum_{j=1}^{N} g(Z_j, x)\Big| &\le \Big|\frac{1}{N}\sum_{j=1}^{N} \big(g(Z_j, x) - g(Z_j, x_i)\big)\Big| + \Big|\frac{1}{N}\sum_{j=1}^{N} g(Z_j, x_i)\Big| \\
&\le \frac{\epsilon}{2\,\mathbb{E}\, d(Z)}\,\Big|\frac{1}{N}\sum_{j=1}^{N} d(Z_j) + \mathbb{E}\, d(Z)\Big| + \Big|\frac{1}{N}\sum_{j=1}^{N} g(Z_j, x_i)\Big|.
\end{aligned}$$

Taking the supremum over x ∈ B and using the subcovering yields

$$\sup_{x\in B}\Big|\frac{1}{N}\sum_{j=1}^{N} g(Z_j, x)\Big| \le \max_{i=1,\dots,m}\, \sup_{x \in B_{\delta_i}(x_i)} \Big[ \frac{\epsilon}{2\,\mathbb{E}\, d(Z)}\,\Big|\frac{1}{N}\sum_{j=1}^{N} d(Z_j) + \mathbb{E}\, d(Z)\Big| + \Big|\frac{1}{N}\sum_{j=1}^{N} g(Z_j, x_i)\Big| \Big]$$