Posterior Convergence of α-Stable Sheets

07/06/2019 · Neil K. Chada et al. · National University of Singapore · Lappeenranta University of Technology (LUT)

This paper is concerned with the theoretical understanding of $\alpha$-stable sheets $U$ on $\mathbb{R}^d$. Our motivation comes from Bayesian inverse problems, where we consider these processes as prior distributions, aiming to quantify information in the posterior. We derive convergence results for finite-dimensional approximations of the infinite-dimensional random variables. In doing so we use a number of variants which these sheets can take, such as a stochastic integral representation, but also random series expansions through Poisson processes. Our proofs rely on whether $U$ admits $L^p$-sample paths. To aid the convergence of the finite approximations we provide a natural discretization to represent the prior. Aside from convergence of these stable sheets, we address whether both well-posedness and well-definedness of the inverse problem can be attained.


1. Introduction

In numerous scientific disciplines, inverse problems arise which pose necessary and useful questions. They are concerned with the recovery of an underlying unknown from data, represented by noisy measurements of a model solution. One way to approach an inverse problem is to treat the unknown as a random variable and to characterize it through a probability distribution, referred to as the posterior; this is known as the statistical or Bayesian approach [17, 27, 30]. Common algorithms used in this field include Markov chain Monte Carlo (MCMC) methods and variational-type schemes. Since the development of inverse problems in the Bayesian setting, an important and fundamental question has been how to develop informative priors. A common and popular choice has traditionally been priors of Gaussian form [3, 19, 28]. This is due both to the attractive properties associated with that form and to its suitability for computational purposes, such as in the context of partial differential equation (PDE) based inverse problems. For many PDEs, physical properties of the input, such as a diffusion coefficient, admit a heterogeneous form, so modelling them through a Gaussian prior is a natural choice. Such PDEs arise in impedance tomography, Darcy flow, and the Navier-Stokes equations. However, there are scenarios in which the unknown of interest does not admit a smooth representation, but instead has certain discontinuities and edges. As a consequence a Gaussian prior can result in a poor approximation. Edge-preserving priors, on the other hand, offer a more realistic reconstruction, as these priors are suited to non-smooth features.

The purpose of this paper is to examine the theoretical understanding of non-Gaussian priors based on $\alpha$-stable random fields [7, 14, 16, 29]. An $\alpha$-stable random variable has the general form

$$X \sim S_\alpha(\sigma, \beta, \mu),$$

where $\alpha \in (0, 2]$ denotes the stability parameter, $\beta \in [-1, 1]$ denotes the skewness parameter, $\sigma \ge 0$ represents the scale parameter, and $\mu \in \mathbb{R}$ describes the location (shift). Priors constructed with the help of $\alpha$-stable random variables are motivated by the formulation and work first discussed in Markkanen et al. [20], used in the context of tomography. These processes are of particular interest as they incorporate both smooth and rough features through various distributions, such as the Gaussian and the Cauchy [6, 28, 31]. Both of these processes have been used for Bayesian inversion. Much of the work thus far has focused on developing computational techniques to test non-Gaussian priors. In contrast, we aim to understand the convergence analysis of $\alpha$-stable sheets. As these sheets can be delicate due to the heavy tails of their distributions, we initially consider fields on the hypercube $[0, 1]^d$. In order to achieve this we adopt a probabilistic measure-based setting which extends the work of Lasanen [21, 22], who demonstrated numerous convergence results for non-Gaussian priors. In particular, this was achieved in the setting of Suslin spaces, which was first used in the context of Bayesian inversion in [PP05]. Our work will be largely focused on understanding convergence through the various forms that $\alpha$-stable random fields can take. These include an integral representation but also one based on Poisson processes. In order to obtain convergence, our analysis will differ for different values of $\alpha$. Leading on from this, we aim to further establish convergence in a function-space setting. Aside from this, an important question is how one could implement these priors in a practical sense. We aim to answer this question by constructing discretizations of stable sheets through sums of i.i.d. random variables and taking their respective limits. We emphasize that in this work we do not conduct any numerical investigations.
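As an illustrative aside, and not part of the paper's analysis, symmetric $\alpha$-stable variables can be sampled with the classical Chambers-Mallows-Stuck transform; the sketch below (function names our own) makes the roles of $\alpha$, $\sigma$ and $\mu$ concrete.

```python
import numpy as np

def sample_sas(alpha, size, rng=None):
    """Draw symmetric alpha-stable S_alpha(1, 0, 0) samples via the
    Chambers-Mallows-Stuck transform (valid for 0 < alpha <= 2)."""
    rng = np.random.default_rng(rng)
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    W = rng.exponential(1.0, size)                 # unit-rate exponential
    if alpha == 1.0:
        return np.tan(V)                           # Cauchy case
    return (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
            * (np.cos((1 - alpha) * V) / W) ** ((1 - alpha) / alpha))

# alpha = 2 recovers a Gaussian (variance 2), alpha = 1 a Cauchy;
# scale sigma and shift mu enter as sigma * X + mu.
x = 1.5 * sample_sas(1.5, size=10_000) + 0.0
```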

A further and final consideration in this work is to present a well-posedness theorem for stable sheets. Well-posedness for Bayesian inverse problems was characterized in a function-space setting in the work of Stuart [30], and it has since been the main analytical result derived for non-Gaussian priors. We intend to extend this to stable sheets. Before continuing, we present a section reviewing non-Gaussian priors used in Bayesian inversion and provide an outline of this article.

1.1. Related work

One of the first edge-preserving priors used in the context of Bayesian inversion was the total variation (TV) prior, motivated by TV regularization in non-statistical inverse problems. However, a study by Lassas and Siltanen [24] showed that the TV prior is degenerate under mesh refinement. As a result, the prior is not viewed as a sensible choice within the Bayesian approach. Extending the idea of edge-preserving inversion in a Bayesian setting, another prior that was developed is the Besov prior. It was first analyzed by Lassas et al. [23] and is based on wavelet expansions with random coefficients. The Besov prior has seen substantial progress since its formulation, notably for non-linear inverse problems [8] and for the further enhancement of maximum a posteriori inversion [1]. The benefit of these priors over TV is that they remain discretization-invariant, which is an attractive property. Other priors which have been developed include priors with heavy tails. Notably, the work of Hosseini [12, 13] recently suggested the implementation of a Laplace prior, for which well-posedness of the inverse problem was proved. These developments in Bayesian edge-preserving inversion led to other proposed priors based on $\alpha$-stable processes [29].

The $\alpha$-stable priors were first discussed in the work of Markkanen et al. [20], which conducted a numerical study of Cauchy difference priors in various dimensions. More recently, they were applied to non-linear hierarchical Bayesian inversion, where further numerical investigations were provided [6]. As $\alpha$-stable processes include a variety of different processes, the Gaussian and Cauchy cases are of particular interest as they exhibit different properties and features. An alternative approach to Cauchy priors was discussed by Sullivan [31], in which the process was represented as a series expansion similar to a Karhunen-Loève expansion. These priors offer an alternative to Besov priors, as they are well suited to modelling unknowns of inverse problems with heavy tails. However, in terms of theoretical gains, essentially all that has been studied thus far for these priors is well-posedness.

1.1.1. Outline.

The layout of this work is as follows. In Section 2 we provide an overview of the preliminary material and notation required for the rest of the paper. This includes a discussion of Bayesian inverse problems and their well-posedness. This leads on to Section 3, where we introduce and discuss $\alpha$-stable sheets and the forms they take. We then provide our first results on the convergence analysis. We extend these results to the case of the posterior in Section 4, where we describe a discretization of the sheets. Finally, in Section 5 we conclude with some final remarks while mentioning further areas of research.

2. Background material

2.1. Notation and preliminaries

Let $(\Omega, \mathcal{F}, \mathbb{P})$ represent a complete probability space, with sample space $\Omega$, $\sigma$-algebra $\mathcal{F}$, and probability measure $\mathbb{P}$. The set of all real-valued random variables on $(\Omega, \mathcal{F}, \mathbb{P})$ is denoted by $L^0(\Omega)$. The conditional expectation of a function $f \in L^1(\Omega)$, given a $\sigma$-algebra $\mathcal{G} \subset \mathcal{F}$, is a $\mathcal{G}$-measurable function $\mathbb{E}[f \mid \mathcal{G}]$, where

$$\int_A \mathbb{E}[f \mid \mathcal{G}]\, d\mathbb{P} = \int_A f\, d\mathbb{P} \quad \text{for all } A \in \mathcal{G}.$$

Let $X$ be a separable Banach space equipped with its Borel $\sigma$-algebra $\mathcal{B}(X)$. We denote by $X^*$ the dual space of $X$, i.e. the space of all continuous linear forms $\ell\colon X \to \mathbb{R}$. We denote the action of $\ell \in X^*$ on $x \in X$ by the duality pairing $\langle \ell, x \rangle$ between $X^*$ and $X$, and equip the linear space $X^*$ with the strong topology.

We say that $U\colon \Omega \to X$ is an $X$-valued random variable if $U^{-1}(B) \in \mathcal{F}$ for all Borel sets $B \in \mathcal{B}(X)$.

Remark 2.1.

For a separable Banach space $X$, weakly measurable mappings are measurable, since the Borel $\sigma$-algebra is generated by cylinder sets of the form

$$\bigl\{ x \in X : \bigl( \langle \ell_1, x \rangle, \dots, \langle \ell_n, x \rangle \bigr) \in B \bigr\},$$

where $n \in \mathbb{N}$, $\ell_1, \dots, \ell_n \in X^*$, and $B \in \mathcal{B}(\mathbb{R}^n)$. For separable duals, we may choose the $\ell_k$ from any countable dense set of the dual space of $X$ (see Theorem 6.8.9 in [4]).

The distribution of $U$ on $X$ is denoted by $\mathbb{P}_U$.

A conditional distribution of $U$ given another Banach space-valued random variable $V$ is defined as $\mathbb{P}(U \in B \mid V = v)$ for all Borel sets $B \in \mathcal{B}(X)$, where $\mathcal{B}(X)$ is the Borel $\sigma$-algebra of the separable Banach space $X$. The notation emphasizes the fact that a conditional expectation given the $\sigma$-algebra generated by another random variable can be expressed as a function of the values $v$ of $V$. In separable Banach spaces, conditional distributions have a regular version in the sense that $v \mapsto \mathbb{P}(U \in B \mid V = v)$ is measurable for every $B \in \mathcal{B}(X)$ and $B \mapsto \mathbb{P}(U \in B \mid V = v)$ is a probability measure on $\mathcal{B}(X)$ for every $v$. We remark that only the joint distribution of $U$ and $V$ is needed to determine the conditional distributions up to a $\mathbb{P}_V$-null set (see Theorem 2.4 in [21]).

Two separable Banach space-valued random variables $U$ and $V$ are called independent if the events $\{U \in A\}$ and $\{V \in B\}$ are independent for any Borel sets $A$ and $B$. For independent Banach space-valued random variables $U$ and $V$ and a continuous function $f$, the conditional distribution of $f(U, V)$ given $V = v$ is the distribution of $f(U, v)$ (see Lemma 3.2 in [21]).

A characteristic function of a probability measure $\mu$ on $X$ is a mapping $\widehat{\mu}\colon X^* \to \mathbb{C}$ given by

$$\widehat{\mu}(\ell) = \int_X e^{i \langle \ell, x \rangle}\, \mu(dx).$$

For further details on measures and regular conditional distributions on Banach spaces, we refer to [3, 4].

An $\mathbb{R}$-valued random variable $X$ has a stable distribution if, for any positive constants $a$ and $b$, there exist a positive constant $c$ and a real constant $d$ such that

(2.1) $\quad a X_1 + b X_2 = c X + d$

in distribution, where $X_1$ and $X_2$ are independent copies of $X$. Real-valued stable random variables have characteristic functions

$$\mathbb{E}\bigl[ e^{itX} \bigr] = \begin{cases} \exp\Bigl( -\sigma^\alpha |t|^\alpha \bigl( 1 - i \beta\, \mathrm{sgn}(t) \tan\tfrac{\pi\alpha}{2} \bigr) + i\mu t \Bigr), & \alpha \ne 1, \\[4pt] \exp\Bigl( -\sigma |t| \bigl( 1 + i \beta \tfrac{2}{\pi}\, \mathrm{sgn}(t) \log|t| \bigr) + i\mu t \Bigr), & \alpha = 1, \end{cases}$$

where the parameters satisfy $\alpha \in (0, 2]$, $\sigma \ge 0$, $\beta \in [-1, 1]$, and $\mu \in \mathbb{R}$. The parameters $\alpha$ and $\beta$ are called the stability parameter and the skewness parameter, respectively. The distribution of a real-valued stable random variable is denoted by $S_\alpha(\sigma, \beta, \mu)$.

The distribution of a real-valued stable random variable is called symmetric if $\beta = \mu = 0$. When $U$ is an $X$-valued stable random variable, we call it symmetric if the composition of $U$ with any continuous linear form $\ell \in X^*$ is symmetric. In particular, we can identify the distribution of a symmetric $U$ through its characteristic function

$$\mathbb{E}\bigl[ e^{i \langle \ell, U \rangle} \bigr] = e^{-\sigma(\ell)^\alpha}, \quad \ell \in X^*,$$

for a suitable scale functional $\sigma(\ell) \ge 0$.
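For orientation, the two special cases referred to throughout drop out of the symmetric characteristic function directly; a short worked computation (ours, using standard facts):

```latex
% Symmetric case beta = mu = 0: E[e^{itX}] = exp(-sigma^alpha |t|^alpha).
\mathbb{E}\bigl[e^{itX}\bigr] = e^{-\sigma^{\alpha}|t|^{\alpha}}
\;\Longrightarrow\;
\begin{cases}
\alpha = 2: & e^{-\sigma^{2}t^{2}} \ \text{(Gaussian, } N(0,\,2\sigma^{2})\text{)},\\
\alpha = 1: & e^{-\sigma|t|} \ \text{(Cauchy with scale } \sigma\text{)}.
\end{cases}
```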

We will focus on a special type of symmetric stable random variable in Section 3, $\alpha$-stable sheets, which we will use as heavy-tailed priors in Bayesian inverse problems for $X$-valued unknowns.

2.2. Bayesian inverse problems

We will first recall the basics of Bayesian inverse problems in infinite-dimensional spaces. Let $X$, $Y$, and $Z$ denote separable Banach spaces equipped with their Borel $\sigma$-algebras $\mathcal{B}(X)$, $\mathcal{B}(Y)$, and $\mathcal{B}(Z)$, respectively. An inverse problem is concerned with the recovery of some quantity of interest $U$ from data $V$, where we will consider noisy models of the type

(2.2) $\quad V = F(U) + E,$

such that $E$ is $Y$-valued random noise and $F\colon X \to Y$ is a continuous mapping. We take the Bayesian approach and model $U$ as an $X$-valued random variable with prior distribution $\mu$ on $X$. A common setup is to take $F = \mathcal{O} \circ G$, where $G\colon X \to Z$ is a continuous mapping (for instance a PDE solution operator) and $\mathcal{O}\colon Z \to Y$ a continuous observation operator.

When the distributions of $U$ and $E$ have probability density functions (e.g. when $X$ and $Y$ are finite-dimensional), we write the familiar Bayes' formula for the posterior distribution of $U$ given $V = v$ as

(2.3) $\quad \pi(u \mid v) = \dfrac{\pi(v \mid u)\, \pi(u)}{\int_X \pi(v \mid u)\, \pi(u)\, du},$

where $\pi(v \mid u)$ is the likelihood and $\pi(u)$ is the prior probability density of $U$, which represents our initial beliefs about the unknown $u$. We will denote by $\mu$ the prior distribution of $U$ and by $\mu^v$ the posterior distribution of $U$ on $X$. The non-existence of Lebesgue measure in the infinite-dimensional setting prohibits us from using (2.3) in the infinite-dimensional case. Instead, one uses different measures in place of the Lebesgue measure. Let us recall the basics of infinite-dimensional Bayesian inference by considering some formal candidates for replacements of the Lebesgue measure in (2.3), which lead to the well-known representation of the posterior distributions (see [30]).

Avoiding the use of the posterior density in (2.3) is straightforward: just take integrals and consider the posterior distribution instead of the posterior density. Similarly, the prior density can be avoided by integrating with respect to the prior distribution $\mu(du)$ instead of $\pi(u)\, du$, where $du$ denotes the Lebesgue measure. From (2.3), we formally derive the posterior distribution

$$\mu^v(A) = \frac{\int_A \pi(v \mid u)\, \mu(du)}{\int_X \pi(v \mid u)\, \mu(du)}, \quad A \in \mathcal{B}(X).$$

Expressing the integral with respect to the prior measure and considering posterior distributions instead of posterior densities handles two out of three problematic densities.

The critical part of generalizing (2.3) to infinite dimensions is finding a generalization of the likelihood function $\pi(v \mid u)$, which has turned out to be nontrivial and sometimes even impossible (see Remarks 4 and 5, together with a simple Gaussian counterexample, Remark 9, in [21]).

If the conditional probability distribution of $V$ given $U = u$, which we denote by $\nu(\cdot \mid u)$, has Radon-Nikodym densities with respect to a common $\sigma$-finite measure $\lambda$ on $\mathcal{B}(Y)$ for ($\mu$-almost) all $u$, Bayes' formula continues to hold in the sense that the posterior distribution has Radon-Nikodym density (see [18, 21])

(2.4) $\quad \dfrac{d\mu^v}{d\mu}(u) = \dfrac{D(u, v)}{\int_X D(u, v)\, \mu(du)}$

with respect to the prior distribution $\mu$. In (2.4), the mapping $D$ is an extended real-valued mapping, which can be chosen to be jointly $\mathcal{B}(X) \otimes \mathcal{B}(Y)$-measurable on $X \times Y$ (see Theorem 2 in [18]).

Definition 2.2.

We say that the inverse problem (2.2) is dominated if there exists a $\sigma$-finite measure $\lambda$ so that the Radon-Nikodym densities

$$D(u, v) = \frac{d\nu(\cdot \mid u)}{d\lambda}(v)$$

define a jointly measurable mapping $D\colon X \times Y \to [0, \infty]$. In this work, we call $\Phi(u, v) := -\log D(u, v)$ the negative dominated loglikelihood (NDLL).

In this work, we will focus on the basic case, where $Y$ is finite-dimensional and the generalized likelihood $D$ is bounded. For example, when $U$ and $E$ are statistically independent and $E$ has a probability density $\pi_E$ with respect to the Lebesgue measure on $Y$, we may take $D(u, v) = \pi_E(v - F(u))$ for the observation $v$.
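To illustrate the density formula (2.4) in the dominated finite-dimensional case, the following sketch approximates posterior expectations by reweighting prior samples with $D(u, v) = \pi_E(v - F(u))$; the forward map, noise level and Cauchy prior here are placeholder assumptions of ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: scalar unknown, linear forward map, Gaussian noise.
F = lambda u: 2.0 * u                 # placeholder forward map
sigma_noise = 0.5
v_obs = 1.3                           # a fixed observation v

def D(u, v):
    """Dominated likelihood D(u, v): density of E at v - F(u),
    up to a multiplicative constant that cancels below."""
    r = v - F(u)
    return np.exp(-r**2 / (2 * sigma_noise**2))

# Draw prior samples (here: standard Cauchy, an alpha = 1 stable law).
u_prior = rng.standard_cauchy(100_000)

# Self-normalized weights realize d(mu^v)/d(mu) = D / Z, as in (2.4).
w = D(u_prior, v_obs)
w /= w.sum()

posterior_mean = np.sum(w * u_prior)  # approximates E[U | V = v]
```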

2.3. Well-posedness of Bayesian inverse problems

The well-posedness of the posterior distribution, established first by Stuart [30] in the Gaussian nonlinear case, means essentially that the posterior distribution depends continuously on the data $v$ with respect to a suitable topology on the space of probability distributions. We recall sufficient conditions for well-posedness of the posterior distribution of stable random sheets in dominated inverse problems with respect to the weak topology and the total variation metric. We follow the general scheme introduced in [30] and refined in [31].

Let the posterior distribution be of the form

$$\mu^v(du) = \frac{e^{-\Phi(u, v)}\, \mu(du)}{\int_X e^{-\Phi(u, v)}\, \mu(du)},$$

where $\Phi$ is an NDLL as in Definition 2.2 and the prior $\mu$ is the distribution of the stable random sheet $U$.

In the next definition, Conditions WD1 and WD2 are connected to well-definedness in the fully infinite-dimensional case. Conditions WP1 and WP2 are connected to well-posedness: Condition WP2 is connected to well-posedness in the weak topology, and Condition WP3 strengthens well-posedness so that it holds also in the total variation metric. Conditions WD1 and WP1 intentionally leave the finer sufficient properties of the NDLL undetailed, since our intention is to use a pure skeleton of high-impact assumptions. Condition PC1, which we will use later, is connected to posterior convergence of the Bayesian inverse problem.

Definition 2.3.

We define the following conditions for an NDLL $\Phi$ and a distribution $\mu$ on $X$; a worked example follows the list.

  1. WD1. There exists an open set $S \subset Y$ such that the function $u \mapsto e^{-\Phi(u, v)}$ is $\mu$-integrable for every $v \in S$.

  2. WD2. The NDLL $\Phi$ is bounded on bounded subsets of $X \times S$.

  3. WP1. There exists an open set $S \subset Y$ such that the normalizing constant $v \mapsto Z(v) = \int_X e^{-\Phi(u, v)}\, \mu(du)$ is continuous on $S$.

  4. WP2. The function $v \mapsto \Phi(u, v)$ is continuous on $S$ for every $u \in X$.

  5. WP3. The functions $v \mapsto \Phi(u, v)$ on $S$ are uniformly continuous, uniformly with respect to $u$ in any bounded subset of $X$.

  6. PC1. The functions $u \mapsto \Phi(u, v)$ on $X$ are continuous for any $v \in S$.
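For orientation, here is a hedged example of our own (not taken from the paper) showing how these conditions can be met: with finite-dimensional data and additive Gaussian noise, the NDLL takes the familiar least-squares form.

```latex
% Assumed example: Y = R^k, E ~ N(0, Gamma), F: X -> Y continuous and
% bounded on bounded sets; the dominating measure is Lebesgue. Then
\Phi(u, v) \;=\; \tfrac{1}{2}\,
  \bigl\lVert \Gamma^{-1/2}\bigl(v - F(u)\bigr) \bigr\rVert^{2},
% which is continuous in (u, v) and bounded on bounded subsets of
% X x Y, so WD2, WP2 and PC1 hold; since e^{-\Phi} <= 1, the
% integrability required by WD1 holds with S = Y = R^k.
```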

Definition 2.4.

We say that the posterior distribution is well-defined on the set $S$, if the normalizing constant

$$Z(v) = \int_X e^{-\Phi(u, v)}\, \mu(du)$$

is positive and finite for every $v \in S$.

Theorem 2.5.

Let an NDLL $\Phi$ and a prior distribution $\mu$ satisfy Conditions WD1 and WD2. Then the posterior distribution is well-defined on the set $S$.

Proof.

Since all probability measures on separable Banach spaces are Radon, there exists a compact set $K \subset X$ such that $\mu(K) > 0$. By Condition WD2, the NDLL is bounded on the bounded set $K \times \{v\}$, and there is a constant $C$ such that $\Phi(\cdot, v) \le C$ on $K$. Therefore, the normalizing constant has the lower bound

$$Z(v) \ge \int_K e^{-\Phi(u, v)}\, \mu(du) \ge e^{-C}\, \mu(K) > 0$$

for every $v \in S$. By Condition WD1, the normalizing constant is also finite for every $v \in S$. ∎

We will now turn to the well-posedness of posterior distributions.

Definition 2.6.

We say that the posterior distribution is well-posed on the set $S$ in the weak topology, if $\mu^v$ is well-defined on $S$ and, for every bounded continuous function $f\colon X \to \mathbb{R}$, the equation

$$\lim_{n \to \infty} \int_X f(u)\, \mu^{v_n}(du) = \int_X f(u)\, \mu^{v}(du)$$

holds whenever $v_n \to v$, where $v_n, v \in S$.

Theorem 2.7.

Let an NDLL $\Phi$ and a prior distribution $\mu$ satisfy Conditions WD1, WD2, WP1, and WP2. Then the posterior distribution is well-posed on the open set $S$ in the weak topology.

Proof.

By Conditions WP1 and WP2,

$$\lim_{n \to \infty} \frac{e^{-\Phi(u, v_n)}}{Z(v_n)} = \frac{e^{-\Phi(u, v)}}{Z(v)} \quad \text{for every } u \in X,$$

which we integrate over an open set $A \subset X$. By Fatou's lemma,

$$\liminf_{n \to \infty} \mu^{v_n}(A) \ge \mu^{v}(A).$$

Since the open set $A$ can be chosen freely, we arrive at a well-known equivalent criterion for weak convergence of distributions (the Portmanteau theorem). ∎

Definition 2.8.

We say that the posterior distribution is well-posed on the set $S$ in the total variation metric, if $\mu^v$ is well-defined on $S$ and

$$\lim_{n \to \infty}\, \sup_{B \in \mathcal{B}(X)} \bigl| \mu^{v_n}(B) - \mu^{v}(B) \bigr| = 0$$

whenever $v_n \to v$, where $v_n, v \in S$.

Remark 2.9.

For a Banach space $X$, the uniform tightness of a family of distributions $\{\mu_n\}$ on $X$ is equivalent to the condition that for every $\varepsilon > 0$ and every $\delta > 0$, there exists a finite number of open balls $B_1, \dots, B_N$ of radius $\delta$ such that

$$\mu_n\Bigl( \bigcup_{k=1}^{N} B_k \Bigr) \ge 1 - \varepsilon$$

for every $n$ (see Remark 2.3.1 in [5]).

Next, we study the well-posedness of the posterior distribution in the total variation metric.

Theorem 2.10.

Let an NDLL $\Phi$ and a prior distribution $\mu$ satisfy Conditions WD1, WD2, WP1, and WP3. Then the posterior distribution is well-posed on $S$ in the total variation metric.

Proof.

Let $v_n \to v$, where all $v_n, v \in S$. Then all $v_n$ belong to a bounded set $S_0 \subset S$.

We use the equivalent definition of uniform tightness in Remark 2.9. Let $\varepsilon > 0$ and $\delta > 0$. By tightness of $\mu^v$, there exists a finite number of balls $B_1, \dots, B_N$ of radius $\delta$ such that

$$\mu^{v}\Bigl( \bigcup_{k=1}^{N} B_k \Bigr) \ge 1 - \varepsilon.$$

By Condition WP3, the NDLL has an upper bound

(2.5) $\quad \Phi(u, v_n) \le \Phi(u, v) + \varepsilon'$

for all sufficiently large $n$ and all $u$ from a bounded subset of $X$, which we take to be the finite union $\bigcup_{k=1}^{N} B_k$. We estimate

$$\mu^{v_n}\Bigl( \bigcup_{k=1}^{N} B_k \Bigr) = \frac{1}{Z(v_n)} \int_{\bigcup_k B_k} e^{-\Phi(u, v_n)}\, \mu(du) \ge \frac{e^{-\varepsilon'} Z(v)}{Z(v_n)}\, \mu^{v}\Bigl( \bigcup_{k=1}^{N} B_k \Bigr).$$

We choose $\varepsilon'$ so that

$$e^{-\varepsilon'} (1 - \varepsilon) \ge 1 - 2\varepsilon,$$

that is,

$$\varepsilon' \le \log \frac{1 - \varepsilon}{1 - 2\varepsilon}.$$

Then

$$\mu^{v_n}\Bigl( \bigcup_{k=1}^{N} B_k \Bigr) \ge \frac{Z(v)}{Z(v_n)} (1 - 2\varepsilon).$$

Since the normalizing constants converge by Condition WP1, we may choose $n_0$ so that $\mu^{v_n}(\bigcup_k B_k) \ge 1 - 3\varepsilon$ when $n \ge n_0$.

For each of the finitely many $n < n_0$, there exists a finite number of open balls of radius $\delta$ whose union has $\mu^{v_n}$-measure at least $1 - \varepsilon$. This finite collection of balls, together with the finite number of balls $B_k$, where $k \le N$, fulfills the condition. Hence, the posterior distributions $\mu^{v_n}$ are uniformly tight. The normalizing constants of the uniformly tight family of measures are also bounded away from zero. Indeed, the converging sequence belongs to a bounded set and, by uniform tightness, there exists a compact set $K$ so that

$$Z(v_n) \ge \int_K e^{-\Phi(u, v_n)}\, \mu(du) \ge e^{-C}\, \mu(K) > 0,$$

where we apply Condition WD2.

Finally, we verify the convergence of $\sup_{B \in \mathcal{B}(X)} |\mu^{v_n}(B) - \mu^{v}(B)|$, which follows directly from tightness and Equation (2.5). Indeed,

$$\sup_{B \in \mathcal{B}(X)} \bigl| \mu^{v_n}(B) - \mu^{v}(B) \bigr| \le \int_X \left| \frac{e^{-\Phi(u, v_n)}}{Z(v_n)} - \frac{e^{-\Phi(u, v)}}{Z(v)} \right| \mu(du) \to 0,$$

where we apply Condition WP3. ∎

3. α-stable random measures and sheets

In this section we review $\alpha$-stable fields, which will later serve as priors $\mu$. We highlight certain properties and assumptions of $\alpha$-stable random fields that are required in order to analyze the convergence. As our discretization scheme for the unknown function is based on finite-difference approximations in certain function spaces, we need to verify that $\alpha$-stable random sheets have enough regularity to carry out the convergence analysis. We aim to understand the convergence both in terms of probability theory and functional analysis.

We will need the concept of $\alpha$-stable stochastic integrals, which we recall from [29]. Consider a measure space $(E, \mathcal{E}, m)$, where $\mathcal{E}_0$ denotes the subset of $\mathcal{E}$ that consists of sets of finite $m$-measure. In our inverse problem, the unknown will be a random field defined on $[0, 1]^d$.

Definition 3.1.

Let $0 < \alpha \le 2$. A random $\sigma$-additive set function

$$M\colon \mathcal{E}_0 \to L^0(\Omega)$$

is called an $\alpha$-stable random measure on $(E, \mathcal{E})$ with control measure $m$ and skewness parameter $\beta$, if it is independently scattered and, for every $A \in \mathcal{E}_0$,

$$M(A) \sim S_\alpha\bigl( m(A)^{1/\alpha}, \beta, 0 \bigr).$$

The random measure is called symmetric if $\beta = 0$.

By independently scattered we mean that if $A_1, \dots, A_n$ belong to $\mathcal{E}_0$ and are disjoint, then the random variables $M(A_1), \dots, M(A_n)$ are independent. Furthermore, $\sigma$-additivity means that if $A_1, A_2, \dots$, belonging to $\mathcal{E}_0$, are disjoint and $\bigcup_{j=1}^{\infty} A_j \in \mathcal{E}_0$, then

$$M\Bigl( \bigcup_{j=1}^{\infty} A_j \Bigr) = \sum_{j=1}^{\infty} M(A_j) \quad \text{almost surely.}$$

Stochastic integrals of deterministic functions $f$ with respect to an $\alpha$-stable random measure $M$ are defined similarly to the Gaussian case, through limits of simple functions $f_n$. However, the convergence holds in a weaker sense. Namely,

(3.1) $\quad \displaystyle \int_E f_n\, dM \to \int_E f\, dM$

in probability if (and only if) $f_n \to f$ in $L^\alpha(E, m)$ (Proposition 3.5.1 in [29]). The values of the random variable $\int_E f\, dM$ can be specified almost surely by e.g. choosing a subsequence that converges almost surely. Recall that the space $L^\alpha(E, m)$ is only a complete metric space, not a normed space, when $\alpha < 1$.

For symmetric $M$, the distribution of the stochastic integral is $S_\alpha(\sigma_f, 0, 0)$, where

$$\sigma_f = \Bigl( \int_E |f(s)|^\alpha\, m(ds) \Bigr)^{1/\alpha}.$$
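The following sketch (our own illustration, with scipy's levy_stable standing in for exact stable sampling) approximates $\int_E f\, dM$ on $E = [0, 1]$ by a simple-function sum: each cell of a partition receives an independent $S_\alpha(h^{1/\alpha}, 0, 0)$ mass, matching the control measure of the cell.

```python
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(1)
alpha, n = 1.5, 1000
h = 1.0 / n                            # cell width => control measure h
s = (np.arange(n) + 0.5) * h           # cell midpoints on E = [0, 1]

f = lambda x: np.sin(2 * np.pi * x)    # deterministic integrand (assumed)

# Independent cell masses M(cell_k) ~ S_alpha(h^{1/alpha}, 0, 0),
# using scipy's default stable parameterization:
dM = levy_stable.rvs(alpha, 0.0, scale=h ** (1 / alpha), size=n,
                     random_state=rng)

# Simple-function approximation of the stochastic integral:
I = np.sum(f(s) * dM)

# Its distribution is approximately S_alpha(sigma_f, 0, 0) with
sigma_f = (np.sum(np.abs(f(s)) ** alpha) * h) ** (1 / alpha)
```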

We consider modeling our unknown $U$ as an $\alpha$-stable sheet on the hypercube $[0, 1]^d$.

Definition 3.2.

A random field $U(t)$, $t \in [0, 1]^d$, is called a symmetric $\alpha$-stable random sheet if it can be expressed (up to a version) as a stochastic integral

(3.2) $\quad \displaystyle U(t) = \int_{[0, 1]^d} \mathbf{1}_{[0, t]}(s)\, M(ds),$

where

(3.3) $\quad \displaystyle \mathbf{1}_{[0, t]}(s) = \prod_{i=1}^{d} \mathbf{1}_{[0, t_i]}(s_i),$

and $M$ is a symmetric $\alpha$-stable random measure on $[0, 1]^d$ with the Lebesgue measure as the control measure.

The $\alpha$-stable random sheet has marginal distributions

$$U(t) \sim S_\alpha\bigl( \sigma(t), 0, 0 \bigr),$$

where $\sigma(t) = (t_1 \cdots t_d)^{1/\alpha}$. Moreover, the values $U(t)$ and $U(t')$ at distinct points are statistically dependent.
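A natural finite-dimensional realization of Definition 3.2, and of the kind of i.i.d.-sum discretization mentioned in the introduction, assigns each grid cell an independent stable increment and takes cumulative sums; this sketch of ours (grid sizes arbitrary) does so for $d = 2$.

```python
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(2)
alpha, n = 1.2, 128
h = 1.0 / n                                  # grid spacing; cell area h**2

# Independent cell increments M(cell) ~ S_alpha((h^2)^{1/alpha}, 0, 0):
dM = levy_stable.rvs(alpha, 0.0, scale=(h * h) ** (1 / alpha),
                     size=(n, n), random_state=rng)

# U(t_i, t_j) = M([0, t_i] x [0, t_j]): cumulative sums over both axes.
U = np.cumsum(np.cumsum(dM, axis=0), axis=1)

# Marginal check: U[-1, -1] is approximately S_alpha(1, 0, 0),
# since sigma(t) = (t_1 t_2)^{1/alpha} = 1 at t = (1, 1).
```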

3.1. Sample paths

Let us now concentrate on the nature of the mapping $t \mapsto U(t)(\omega)$ for fixed $\omega \in \Omega$, where each $U(t)$ is defined by (3.2). This is an important point, because we are interested in modelling our unknown function with $U$ and wish to specify a Banach space $X$ where $U$ lives. In other words, we wish to describe $U$ as an $X$-valued random variable for some Banach space $X$. At this point, we have defined $U$ only as a random field, i.e. through a family of real-valued random variables. We will heavily utilise another way of describing stable random fields, the so-called LePage series representation ([29], Theorem 3.9.1), which is often used in deriving sample path properties of stable random fields.

For the convenience of the reader, we provide the proofs below, and begin with two preparatory lemmas. We recall that the arrival times $\Gamma_j$ of a Poisson process with arrival rate 1 can be expressed as

$$\Gamma_j = \sum_{i=1}^{j} E_i,$$

where the $E_i$ are independent identically distributed random variables with common probability density $f(x) = e^{-x}$, $x \ge 0$.
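These arrival times are trivial to simulate, which makes the two lemmas below easy to sanity-check numerically; a small sketch of ours (sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
J = 100_000
E = rng.exponential(1.0, J)     # i.i.d. Exp(1) inter-arrival times
Gamma = np.cumsum(E)            # arrival times of a rate-1 Poisson process

# Law of large numbers: Gamma_j / j -> 1 almost surely.
print(Gamma[-1] / J)            # approximately 1

# For p > 1 the series sum_j Gamma_j^{-p} converges almost surely
# (cf. Lemma 3.3); its partial sums stabilize quickly.
p = 1.5
print(np.sum(Gamma ** (-p)))
```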

Lemma 3.3.

Let $\Gamma_j$ be the arrival times of a Poisson process with arrival rate 1. There exist $c > 0$ and $j_0 \in \mathbb{N}$ so that $\Gamma_j \ge c j$ for all $j \ge j_0$ and for $\mathbb{P}$-almost every $\omega \in \Omega$. Moreover, the series

$$\sum_{j=1}^{\infty} \Gamma_j^{-p}$$

converges almost surely for all $p > 1$.

Proof.

By the law of large numbers,

$$\lim_{j \to \infty} \frac{\Gamma_j}{j} = 1$$

almost surely. Therefore, $\Gamma_j \ge j/2$ for large $j$, and there exist $c > 0$ and an integer $j_0$ such that $\Gamma_j \ge c j$ for all $j \ge j_0$ almost surely. Inserting the lower bound into the series,

$$\sum_{j \ge j_0} \Gamma_j^{-p} \le \sum_{j \ge j_0} (c j)^{-p} < \infty, \quad p > 1,$$

shows that the series converges almost surely. ∎

Lemma 3.4.

Let $(V_j)$ and $(W_j)$ be mutually independent random sequences, and let $f$ be a separable Banach space-valued Borel measurable function. The series

$$\sum_{j=1}^{\infty} f(V_j, W_j)$$

converges almost surely in $X$ if and only if the series converges almost surely given almost every sample of $(W_j)$.

Proof.

If $A$ is any event, say

$$A = \Bigl\{ \omega : \sum_{j=1}^{\infty} f\bigl(V_j(\omega), W_j(\omega)\bigr) \ \text{converges} \Bigr\},$$

then $\mathbb{P}(A) = \mathbb{E}\bigl[ \mathbb{P}(A \mid (W_j)) \bigr] = 1$ if and only if $\mathbb{P}(A \mid (W_j)) = 1$ almost surely. Indeed, the conditional probability of $A$ is at most 1, and a simple proof by contradiction shows that it must equal 1 almost surely if $\mathbb{P}(A) = 1$. The other direction is trivial. ∎

The next theorem provides a series representation for the symmetric $\alpha$-stable random measure with Lebesgue control measure. In the theorem, we prove almost sure series representations of stochastic integrals for a restricted class of integrands. This suffices for our purposes, because the functions that we study can always be chosen from this class. The approach helps us to use almost sure equivalence of stochastic integrals instead of the more common concept of equivalence in distribution.

Theorem 3.5.

Let $0 < \alpha < 2$ and let $A \subset \mathbb{R}^d$ be a measurable set with $0 < |A| < \infty$, where $|A|$ denotes its Lebesgue measure. Let $\Gamma_j$ be the arrival times of a Poisson process with arrival rate 1. Let

$$(\xi_j, \gamma_j), \quad j = 1, 2, \dots,$$

form an i.i.d. sequence of random vectors independent of $(\Gamma_j)$ that consists of uniformly distributed $d$-dimensional random vectors $\xi_j$ on $A$, and $\mathbb{R}$-valued random variables $\gamma_j$ whose conditional distribution given $\xi_j$ is the symmetric two-point distribution on $\{-1, +1\}$. Let

$$c_{\alpha, A} = C_\alpha^{-1/\alpha}\, |A|^{1/\alpha},$$

where $C_\alpha$ is the standard stable normalizing constant (see [29]). Then

(3.4) $\quad \displaystyle M(B) = c_{\alpha, A} \sum_{j=1}^{\infty} \gamma_j\, \Gamma_j^{-1/\alpha}\, \mathbf{1}_B(\xi_j),$

where $B \subset A$ are Borel sets, defines a symmetric $\alpha$-stable random measure with Lebesgue control measure on $A$.

If $f \in L^{\bar{\alpha}}(A)$, where $\bar{\alpha} = 1$ when $\alpha < 1$ and $\bar{\alpha} = 2$ otherwise, then the series

(3.5) $\quad \displaystyle c_{\alpha, A} \sum_{j=1}^{\infty} \gamma_j\, \Gamma_j^{-1/\alpha}\, f(\xi_j)$

converges almost surely and coincides with the stochastic integral $\int_A f\, dM$.
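A truncated version of (3.4)-(3.5) is straightforward to simulate; the sketch below (our own, setting the normalizing constant $c_{\alpha, A}$ to 1 for illustration) builds the series from Poisson arrivals, uniform locations and symmetric signs.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, J = 1.5, 10_000                # stability index, truncation level

Gamma = np.cumsum(rng.exponential(1.0, J))   # Poisson arrival times
xi = rng.uniform(0.0, 1.0, J)                # uniform locations on A = [0, 1]
gamma = rng.choice([-1.0, 1.0], J)           # symmetric +-1 signs

f = lambda x: np.exp(-x)                     # integrand (assumed example)

# Truncated LePage series for the stochastic integral of f over A,
# up to the constant c_{alpha, A}:
I_J = np.sum(gamma * Gamma ** (-1.0 / alpha) * f(xi))
```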

Proof.

First, we will verify the convergence of (3.4). For $\alpha < 1$, a direct application of Lemma 3.3 shows the convergence. For $\alpha \ge 1$, we proceed as in [29] by using Kolmogorov's three-series theorem, but apply it under conditioning with respect to $(\Gamma_j)$ in the spirit of Lemma 3.4.

For fixed $B$,

(3.6) $\quad \displaystyle \sum_{j=1}^{\infty} \mathbb{P}\Bigl( \bigl| \gamma_j\, \Gamma_j^{-1/\alpha}\, \mathbf{1}_B(\xi_j) \bigr| > 1 \ \Big|\ (\Gamma_j) \Bigr) < \infty,$

since $\Gamma_j^{-1/\alpha} \le 1$ for large $j$ by Lemma 3.3. Moreover, the sum of the conditional expectations of the truncated summands, that is,

(3.7) $\quad \displaystyle \sum_{j=1}^{\infty} \mathbb{E}\Bigl[ \gamma_j\, \Gamma_j^{-1/\alpha}\, \mathbf{1}_B(\xi_j)\, \mathbf{1}_{\{\Gamma_j \ge 1\}} \ \Big|\ (\Gamma_j) \Bigr] = 0$

vanishes due to the symmetric distribution of $\gamma_j$, and the sum of conditional variances

(3.8) $\quad \displaystyle \sum_{j=1}^{\infty} \mathrm{Var}\Bigl( \gamma_j\, \Gamma_j^{-1/\alpha}\, \mathbf{1}_B(\xi_j) \ \Big|\ (\Gamma_j) \Bigr) \le \sum_{j=1}^{\infty} \Gamma_j^{-2/\alpha}$

is finite by Lemma 3.3 (applied with $p = 2/\alpha > 1$). Hence, $M(B)$ is a well-defined random variable.

Secondly, we show that $M(B)$ has the right distribution. We first remark that

(3.9) $\quad \displaystyle M(B) = \lim_{n \to \infty} c_{\alpha, A} \sum_{j=1}^{n} \gamma_j\, \Gamma_j^{-1/\alpha}\, \mathbf{1}_B(\xi_j),$

where $\Gamma_n / n \to 1$ by the law of large numbers. Additionally, the random vector with components $\Gamma_j / \Gamma_{n+1}$, $j = 1, \dots, n$, is distributed as the random vector whose components are independent uniformly distributed random variables on $[0, 1]$ ordered increasingly. In the finite sum (3.9), we may also reorder the random variables without changing the distribution of the sum. This leads to the identification of $M(B)$ as the limit of sums of independent random variables

$$c_{\alpha, A}\, n^{-1/\alpha} \sum_{j=1}^{n} \gamma_j\, U_j^{-1/\alpha}\, \mathbf{1}_B(\xi_j), \quad U_j \sim \mathrm{Unif}(0, 1) \ \text{i.i.d.},$$

which implies that $M(B)$ is necessarily a stable random variable. We still need to identify the index and parameters of the stable distribution. To do this, we identify the domain of attraction of the common distribution of the summands. Since the common distribution is symmetric, we study the tail behaviour

$$\mathbb{P}\Bigl( \bigl| \gamma\, U^{-1/\alpha}\, \mathbf{1}_B(\xi) \bigr| > x \Bigr) = \frac{|B|}{|A|}\, x^{-\alpha}, \quad x \ge 1,$$

which implies [10] that the distribution of $M(B)$ is $\alpha$-stable. Through the tail behaviour, we identify the distribution as $S_\alpha\bigl( |B|^{1/\alpha}, 0, 0 \bigr)$.

From the characteristic function of $M$ it is evident that $M$ is independently scattered. For disjoint sets $B_1, B_2, \dots$, the series defining $M(\bigcup_k B_k)$ clearly converges almost surely, and the sum of the independent random variables $M(B_k)$ converges almost surely by the Itô-Nisio theorem [15]; the limits coincide in probability. By selecting almost surely converging subsequences, the limits coincide almost surely, which shows that $M$ is countably additive.

When $\alpha < 1$, the right-hand side of (3.5) converges absolutely by Lemma 3.3, since its conditional expectation given the sequence $(\Gamma_j)$ is finite. By choosing almost surely converging subsequences from the simple-function approximations that converge in probability to $\int_A f\, dM$, we identify the stochastic integral with the almost surely converging series representation by taking the limit inside the sum by Lebesgue's dominated convergence theorem.

When $\alpha \ge 1$, we divide $f$ into its positive and negative parts $f^+$ and $f^-$, and consider, where necessary, their strictly increasing simple-function approximations. The right-hand side of (3.5) converges almost surely by Kolmogorov's three-series theorem similarly to (3.6) and (3.7), since

$$\sum_{j=1}^{\infty} \mathbb{P}\Bigl( \Gamma_j^{-1/\alpha}\, |f(\xi_j)| > 1 \ \Big|\ (\Gamma_j) \Bigr) \le \sum_{j=1}^{\infty} \Gamma_j^{-2/\alpha}\, \mathbb{E}\bigl[ |f(\xi_1)|^2 \bigr] < \infty$$

by Markov's inequality and the assumptions. The convergence of the sum of variances holds, since