1. Introduction
In numerous scientific disciplines, inverse problems arise which pose necessary and useful questions. They are concerned with the recovery of an underlying unknown from data, represented by noisy measurements of a model solution. One way to approximate solutions to an inverse problem is to treat the unknown as a probability distribution, referred to as the posterior; this is known as the statistical or Bayesian approach [17, 27, 30]. Common algorithms used in this field include Markov chain Monte Carlo (MCMC) methods and variational schemes. Since the development of inverse problems in the Bayesian setting, an important and fundamental question has been how to develop informative priors. Traditionally, a popular choice has been priors of a Gaussian form [3, 19, 28]. This is due both to the attractive properties associated with this form and to its applicability for computational purposes, such as in the context of inverse problems based on partial differential equations (PDEs). With many PDEs the physical properties of the input, such as a diffusion coefficient, admit a heterogeneous form, so modelling them through a Gaussian prior is a natural choice. Such PDE-based problems include impedance tomography, Darcy flow and the Navier-Stokes equations. However, there are scenarios in which the unknown of interest does not admit a smooth representation, but instead has discontinuities and edges. As a consequence, a Gaussian prior can result in a poor approximation. Edge-preserving priors, on the other hand, offer a more realistic reconstruction, as they are suited to non-smooth features.
The purpose of this paper is to examine the theoretical understanding of non-Gaussian priors based on $\alpha$-stable random fields [7, 14, 16, 29]. An $\alpha$-stable random variable has a distribution of the general form
$S_\alpha(\sigma, \beta, \mu)$,
where $\alpha \in (0, 2]$ denotes the stability parameter, $\beta \in [-1, 1]$ denotes the skewness parameter, $\mu \in \mathbb{R}$ represents the shift parameter, and $\sigma \geq 0$ describes the scale. Priors constructed with the help of $\alpha$-stable random variables are motivated by the formulation first discussed in Markkanen et al. [20], where they were used in the context of tomography. These processes are of particular interest as they incorporate both smooth and rough features through various distributions, such as Gaussian and Cauchy [6, 28, 31]. Both of these processes have been used for Bayesian inversion. Much of the work thus far has focused on developing computational techniques to test non-Gaussian priors. In contrast, we aim to understand the convergence analysis of $\alpha$-stable sheets. As these sheets can be delicate due to the heavy tails of their distributions, we initially consider fields on a hypercube. In order to achieve this we adopt a probabilistic measure-based setting which is an extension of the work by Lasanen [21, 22], who demonstrated numerous convergence results for non-Gaussian priors. In particular this was achieved through the setting of Suslin spaces, which was first used in the context of Bayesian inversion in [PP05]. Our work will be largely focused on understanding convergence through the various forms that $\alpha$-stable random fields can take. These include an integral representation and a representation through a Poisson process. In order to obtain convergence, our analysis will differ for different values of $\alpha$. Leading on from this, we aim to further establish convergence in a function space setting. Aside from this, an important question is how one could implement these priors in a practical sense. We aim to answer this question by constructing discretizations of $\alpha$-stable sheets through sums of i.i.d. random variables, and taking their respective limits. We emphasize that we do not conduct any numerical investigations in this work. A further and final consideration in this work is to present a well-posedness theorem for $\alpha$-stable sheets.
Well-posedness for Bayesian inverse problems was characterized in a function space setting in the work of Stuart [30], and has since been the main analytical result derived for non-Gaussian priors. We intend to extend this to $\alpha$-stable sheets. Before continuing, we review non-Gaussian priors used in Bayesian inversion and provide an outline for this article.
1.1. Related work
One of the first edge-preserving priors used in the context of Bayesian inversion was the total variation (TV) prior. The motivation was taken from TV regularization in non-statistical inverse problems. However, a study by Lassas and Siltanen [24] showed that the TV prior degenerates under mesh refinement. As a result, the prior was not viewed as a sensible choice within the Bayesian approach. Extending the idea of edge-preserving inversion in a Bayesian setting, another prior that was developed was the Besov prior. This prior, which is based on wavelet expansions with random coefficients, was first analyzed by Lassas et al. [23]. The Besov prior has seen substantial progress since its formulation, notably its extension to nonlinear inverse problems [8] and the further enhancement of maximum a posteriori inversion [1]. The benefit of these priors over TV is that they remain discretization-invariant, which is an attractive property to have. Other priors which have been developed include priors with heavy tails. Notably, the work of Hosseini [12, 13] recently suggested the implementation of a Laplace prior, where well-posedness was proved for the inverse problem. These developments in Bayesian edge-preserving inversion led to other proposed priors based on $\alpha$-stable processes [29].
The $\alpha$-stable priors were first discussed in the work of Markkanen et al. [20], which conducted a numerical study of Cauchy difference priors in various dimensions. More recently this was applied to nonlinear hierarchical Bayesian inversion, where further numerical investigations were provided [6]. As $\alpha$-stable processes include a variety of different processes, the Gaussian and Cauchy cases are of particular interest as they exhibit different properties and features. An alternative approach to Cauchy priors was discussed by Sullivan [31], where the process was represented as a series expansion similar to a Karhunen-Loève expansion. These priors offer an alternative to Besov priors as they are well suited to modelling unknowns of inverse problems with heavy tails. However, in terms of theoretical results, what has mainly been studied thus far for these priors is well-posedness.
1.1.1. Outline.
This work is organized as follows. In Section 2 we provide an overview of preliminary material and notation required for the rest of the paper. This includes a discussion of Bayesian inverse problems and their well-posedness. This leads on to Section 3, where we introduce and discuss $\alpha$-stable sheets and the forms they take. There we provide our first results on the convergence analysis. We extend these results to the case of the posterior in Section 4, where we describe a discretization of the sheets. Finally, in Section 5 we conclude with some final remarks and mention further areas of research.
2. Background material
2.1. Notation and preliminaries
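Let $(\Omega, \Sigma, \mathbb{P})$ represent a complete probability space, with sample space $\Omega$, $\sigma$-algebra $\Sigma$ and probability measure $\mathbb{P}$. The set of all real-valued random variables on $(\Omega, \Sigma, \mathbb{P})$ is denoted with $L^0(\Omega, \Sigma, \mathbb{P})$. The conditional expectation $\mathbb{E}[f \mid \Sigma_0]$ of a function $f \in L^1(\Omega, \Sigma, \mathbb{P})$, given a $\sigma$-algebra $\Sigma_0 \subset \Sigma$, is a $\Sigma_0$-measurable function, where
$$\int_A \mathbb{E}[f \mid \Sigma_0] \, d\mathbb{P} = \int_A f \, d\mathbb{P} \quad \text{for all } A \in \Sigma_0.$$
Let $F$ be a separable Banach space equipped with its Borel $\sigma$-algebra $\mathcal{F}$. We denote with $F'$ the dual space of $F$, i.e. the space of all continuous linear forms $\lambda: F \to \mathbb{R}$. We denote the action of $\lambda \in F'$ on $u \in F$ with the duality $\langle u, \lambda \rangle_{F, F'}$ between $F$ and $F'$, and equip the linear space $F'$ with the strong topology.
We say that $U: \Omega \to F$ is an $F$-valued random variable, if $U^{-1}(B) \in \Sigma$ for all Borel sets $B \in \mathcal{F}$.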
Remark 2.1.
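For a separable Banach space $F$ with dual space $F'$, weakly measurable mappings are measurable, since the Borel $\sigma$-algebra is generated by cylinder sets of the form
$$\{u \in F : (\langle u, \lambda_1 \rangle, \dots, \langle u, \lambda_n \rangle) \in B\},$$
where $n \in \mathbb{N}$, $\lambda_1, \dots, \lambda_n \in F'$, and $B \in \mathcal{B}(\mathbb{R}^n)$. For separable duals, we may choose $\{\lambda_i\}$ to be any countable dense set of the dual space of $F$ (see Theorem 6.8.9 in [4]).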
The distribution of on is denoted with .
A conditional distribution of given another Banach space-valued random variable , is defined as for all Borel sets of , where is the Borel $\sigma$-algebra of the separable Banach space . The notation emphasizes the fact that the conditional expectation given the $\sigma$-algebra generated by another random variable can be expressed as a function of . In separable Banach spaces, conditional distributions have a regular version in the sense that is measurable from to for every and is a probability measure on for every . We remark that only the joint distribution of and is needed to determine the distributions up to a null set (see Theorem 2.4 in [21]). Two separable Banach space-valued random variables and are called independent, if and are independent for any Borel sets and . For independent Banach space-valued random variables and a continuous function , the conditional distribution of given is the distribution of (see Lemma 3.2 in [21]).
For further details on measures and regular conditional distributions on Banach spaces, we refer to [3, 4].
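An $F$-valued random variable $X$ has an $\alpha$-stable distribution if, for any positive constants $a$ and $b$, there exist a positive constant $c$ and an element $d \in F$ such that
$$a X_1 + b X_2 = c X + d \qquad (2.1)$$
in distribution, where $X_1$ and $X_2$ are independent copies of $X$. Real-valued $\alpha$-stable random variables have characteristic functions
$$\mathbb{E} \exp(i \theta X) = \begin{cases} \exp\left(-\sigma^\alpha |\theta|^\alpha \left(1 - i \beta \, \mathrm{sign}(\theta) \tan\frac{\pi \alpha}{2}\right) + i \mu \theta\right), & \alpha \neq 1, \\ \exp\left(-\sigma |\theta| \left(1 + i \beta \frac{2}{\pi} \mathrm{sign}(\theta) \ln|\theta|\right) + i \mu \theta\right), & \alpha = 1, \end{cases}$$
where parameters $\alpha \in (0, 2]$, $\sigma \geq 0$, $\beta \in [-1, 1]$ and $\mu \in \mathbb{R}$. Parameters $\alpha$ and $\beta$ are called stability parameter and skewness parameter, respectively. The distribution of a real-valued $\alpha$-stable random variable is denoted with
$S_\alpha(\sigma, \beta, \mu)$.
The distribution of a real-valued $\alpha$-stable random variable is called symmetric if $\beta = 0$ and $\mu = 0$. When $X$ is an $F$-valued $\alpha$-stable random variable, we call it symmetric, if the composition of $X$ with any continuous linear form is symmetric. In particular, we can identify the distribution of a symmetric $X$ through its characteristic function
$$\mathbb{E} \exp(i \langle X, \lambda \rangle) = \exp(-\sigma_\lambda^\alpha), \qquad \lambda \in F',$$
where $\sigma_\lambda$ denotes the scale parameter of the real-valued random variable $\langle X, \lambda \rangle$.
We will focus on a special type of symmetric $\alpha$-stable random variables in Section 3, $\alpha$-stable sheets, which we will use as heavy-tailed priors in Bayesian inverse problems for $F$-valued unknowns.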
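To make the symmetric case concrete, the following is a minimal sketch of how samples with characteristic function $\exp(-|\theta|^\alpha)$, that is, from $S_\alpha(1, 0, 0)$ in the parameterization of [29], could be drawn. It uses the Chambers-Mallows-Stuck transformation, which is a standard simulation technique and is not taken from this paper; the function name `sample_symmetric_stable` is hypothetical.

```python
import numpy as np

def sample_symmetric_stable(alpha, size, rng):
    """Draw samples from S_alpha(1, 0, 0) with the Chambers-Mallows-Stuck method."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform angle
    w = rng.exponential(1.0, size)                # unit-rate exponential variable
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

rng = np.random.default_rng(0)
cauchy_draws = sample_symmetric_stable(1.0, 5, rng)    # alpha = 1: Cauchy
gaussian_draws = sample_symmetric_stable(2.0, 5, rng)  # alpha = 2: Gaussian with variance 2
```

For $\alpha = 1$ the formula reduces to $\tan(u)$, a standard Cauchy variable, and for $\alpha = 2$ it reduces to $2 \sin(u) \sqrt{w}$, a centred Gaussian with variance 2, in agreement with the characteristic function $\exp(-\theta^2)$.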
2.2. Bayesian inverse problems
We will first recall the basics of Bayesian inverse problems in infinite-dimensional spaces. Let $F$ and $G$ denote separable Banach spaces equipped with their Borel $\sigma$-algebras $\mathcal{F}$ and $\mathcal{G}$, respectively. An inverse problem is concerned with the recovery of some quantity of interest $U$ from data $Y$, where we will consider noisy models of the type
$$Y = L(U) + \eta, \qquad (2.2)$$
such that $\eta$ is $G$-valued random noise and $L: F \to G$ is a continuous mapping. We take the Bayesian approach and model $U$ as an $F$-valued random variable with prior distribution $\mu_0$ on $F$. A common setup is to take , where is a continuous mapping.
When the distributions of $U$ and $Y$ have probability density functions (e.g. when $F$ and $G$ are finite-dimensional), we write the familiar Bayes' formula for the posterior distribution of $U$ given $Y = y$ as
$$\pi(u \mid y) = \frac{\pi(y \mid u) \, \pi_0(u)}{\int \pi(y \mid u) \, \pi_0(u) \, du}, \qquad (2.3)$$
where $\pi(y \mid u)$ is the likelihood and $\pi_0$ is the prior probability density of $U$, which represents our initial beliefs about the unknown $U$. We will denote with $\mu_0$ the prior distribution of $U$ and with $\mu^y$ the posterior distribution of $U$ on $F$. The nonexistence of Lebesgue's measure in the infinite-dimensional setting prohibits us from using (2.3) in the infinite-dimensional case. Instead, one uses different measures in place of Lebesgue's measure. Let us recall the basics of infinite-dimensional Bayesian inference by considering some formal candidates for replacements of Lebesgue's measure in (2.3), which lead to the well-known representation of the posterior distributions (see [30]).
Avoiding the use of the posterior density in (2.3) is straightforward: just take integrals and consider the posterior distribution instead of the posterior density. Similarly, the prior density can be avoided by integrating with respect to the prior distribution $\mu_0(du)$ instead of $\pi_0(u) \, du$, where $du$ is Lebesgue's measure. From (2.3), we formally derive the posterior distribution
$$\mu^y(B) = \frac{\int_B \pi(y \mid u) \, \mu_0(du)}{\int_F \pi(y \mid u) \, \mu_0(du)}.$$
Expressing the integral with respect to prior measure and considering posterior distributions instead of posterior densities handles two out of three problematic densities.
The critical part of generalizing (2.3) to infinite dimensions is finding a generalization for the likelihood function, which has turned out to be nontrivial and sometimes even impossible (see Remarks 4 and 5, together with a simple Gaussian counterexample, Remark 9, in [21]).
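If the conditional probability distribution of $Y$ given $U = u$, which we denote with $\mu_{Y \mid u}$, has Radon-Nikodym densities $\exp(-\Phi(u, \cdot))$ with respect to a common $\sigma$-finite measure $\nu$ on $G$ for (almost) all $u$, then Bayes' formula continues to hold in the sense that the posterior distribution $\mu^y$ has Radon-Nikodym density (see [18, 21])
$$\frac{d\mu^y}{d\mu_0}(u) = \frac{\exp(-\Phi(u, y))}{\int_F \exp(-\Phi(u, y)) \, \mu_0(du)} \qquad (2.4)$$
with respect to the prior distribution $\mu_0$. In (2.4), the mapping $\Phi$ is an extended real-valued mapping, which can be chosen to be jointly $\mathcal{F} \otimes \mathcal{G}$-measurable on $F \times G$ (see Theorem 2 in [18]).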
Definition 2.2.
We say that inverse problem (2.2) is dominated, if there exists a $\sigma$-finite measure $\nu$ so that the Radon-Nikodym densities
$$\frac{d\mu_{Y \mid u}}{d\nu}(y) = \exp(-\Phi(u, y))$$
define a jointly measurable mapping $\Phi$ on $F \times G$. In this work, we call $\Phi$ the negative dominated log-likelihood (NDLL).
In this work, we will focus on the basic case, where $G$ is finite-dimensional and the generalized likelihood $\exp(-\Phi)$ is bounded. For example, when the and are statistically independent, we take for the observation .
2.3. Wellposedness of Bayesian inverse problems
The well-posedness of the posterior distribution, established first by Stuart [30] in the Gaussian nonlinear case, means essentially that the posterior distribution depends continuously on the data with respect to a suitable topology on the space of probability distributions. We recall sufficient conditions for well-posedness of the posterior distribution of $\alpha$-stable random sheets in dominated inverse problems with respect to the weak topology and the total variation metric. We follow the general scheme introduced in [30] and refined in [31].
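Let the posterior distribution $\mu^y$ be of the form
$$\mu^y(du) = \frac{1}{Z(y)} \exp(-\Phi(u, y)) \, \mu_0(du), \qquad Z(y) = \int_F \exp(-\Phi(u, y)) \, \mu_0(du),$$
where $\Phi$ is NDLL as in Definition 2.2 and the prior $\mu_0$ is the distribution of the $\alpha$-stable random sheet $U$.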
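A posterior of this type can be explored numerically by reweighting prior samples. The following is a minimal sketch, assuming that the posterior has density proportional to $\exp(-\Phi(u, y))$ with respect to the prior, that the unknown is scalar with a Cauchy prior (the case $\alpha = 1$), and that the noise is additive Gaussian, so that $\Phi(u, y) = |y - L(u)|^2 / (2 s^2)$. The names `ndll_gaussian` and `posterior_weights`, the forward map `forward` and the noise level `noise_std` are hypothetical and not taken from the text.

```python
import numpy as np

def ndll_gaussian(u, y, forward, noise_std):
    """Negative log-likelihood for additive Gaussian noise (assumed noise model)."""
    return 0.5 * ((y - forward(u)) / noise_std) ** 2

def posterior_weights(prior_samples, y, forward, noise_std):
    """Self-normalized weights proportional to exp(-Phi(u, y)) over prior samples."""
    log_w = -ndll_gaussian(prior_samples, y, forward, noise_std)
    log_w = log_w - log_w.max()  # shift for numerical stability before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

rng = np.random.default_rng(0)
prior_samples = rng.standard_cauchy(50_000)  # heavy-tailed prior draws
weights = posterior_weights(prior_samples, 1.0, lambda u: u, 0.5)
posterior_mean = float(np.sum(weights * prior_samples))
```

Because the generalized likelihood $\exp(-\Phi)$ is bounded by one in this setting, the weights stay bounded even though the prior has no finite mean.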
In the next definition, Conditions WD1 and WD2 are connected to well-definedness in the fully infinite-dimensional case. Conditions WP1 and WP2 are connected to well-posedness. Condition WP2 is connected to well-posedness in the weak topology, and Condition WP3 strengthens well-posedness so that it also holds in the total variation metric. Conditions WD1 and WP1 intentionally leave the finer sufficient properties of the NDLL undetailed, since our intention is to use a pure skeleton of high-impact assumptions. Condition PC1, which we will use later, is connected to posterior convergence of the Bayesian inverse problem.
Definition 2.3.
We define the following conditions for a NDLL and a distribution on .

WD1. There exists an open set such that the function is integrable for every .

WD2. The NDLL is bounded on bounded subsets of .

WP1. There exists an open set such that the function is continuous on .

WP2. The function is continuous on for every .

WP3. The functions on are uniformly continuous with respect to for any bounded subset of .

PC1. The functions on are continuous for any .
Definition 2.4.
We say that the posterior distribution is well-defined on the set , if the normalizing constant is positive and finite for every .
Theorem 2.5.
Let an NDLL and a prior distribution satisfy Conditions WD1 and WD2. Then the posterior distribution is well-defined on the set .
Proof.
Since all probability measures on separable Banach spaces are Radon, there exists a compact set such that . By Condition WD2, the NDLL is bounded on and there is a constant such that on . Therefore, the normalizing constant has a lower bound
for every . By Condition WD1, the normalizing constant is also bounded for every . ∎
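The chain of inequalities behind this proof can be written out as follows. This is a sketch in assumed notation, since the symbols are not fixed by the surrounding text: $\mu_0$ denotes the prior, $\Phi$ the NDLL, $Z(y)$ the normalizing constant of a posterior with density proportional to $\exp(-\Phi(u, y))$ with respect to $\mu_0$, $K$ a compact set with $\mu_0(K) > 0$, and $C$ a bound for $\Phi(\cdot, y)$ on $K$, which exists by Condition WD2 because $K \times \{y\}$ is bounded.

```latex
Z(y) = \int_F \exp(-\Phi(u, y)) \, \mu_0(du)
     \geq \int_K \exp(-\Phi(u, y)) \, \mu_0(du)
     \geq e^{-C} \mu_0(K) > 0 .
```

Under the additional reading that Condition WD1 makes $u \mapsto \exp(-\Phi(u, y))$ integrable with respect to $\mu_0$, the same integral is finite, so $0 < Z(y) < \infty$.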
We will now turn to the well-posedness of posterior distributions.
Definition 2.6.
We say that the posterior distribution is well-posed on the set in the weak topology, if is well-defined on and, for every bounded continuous function , the equation
holds whenever , where , .
Theorem 2.7.
Let an NDLL and a prior distribution satisfy Conditions WD1, WD2, WP1, and WP2. Then the posterior distribution is well-posed on the open set in the weak topology.
Proof.
By Conditions WP1 and WP2
which we integrate over an open set . By Fatou’s lemma
Since the open set can be chosen freely, we arrive at a well-known equivalent criterion for weak convergence of distributions. ∎
Definition 2.8.
We say that the posterior distribution is well-posed on the set in the total variation metric, if is well-defined on and
whenever , where , .
Remark 2.9.
For a Banach space , the uniform tightness of the family of distributions on is equivalent to the condition that for every and every , there exists a finite number of open balls of such that
for every (see Remark 2.3.1 in [5]).
Next, we study well-posedness of the posterior distribution in the total variation metric.
Theorem 2.10.
Let an NDLL and a prior distribution satisfy Conditions WD1, WD2, WP1, and WP3. Then the posterior distribution is well-posed on in the total variation metric.
Proof.
Let , where all . Then all belong to a bounded set .
We use the equivalent definition of uniform tightness in Remark 2.9. Let and . By tightness of , there exists a finite number of balls such that
By Condition WP3, the NDLL has an upper bound
(2.5) 
for all and from a bounded subset of , which we take to be the finite union . We estimate
We choose so that
that is,
Then
Since the normalizing constants converge by Condition WP1, we may choose so that when .
For , there exists a finite number of open balls such that
The finite collection of balls together with the finite number of balls , where , fulfills the condition. Hence, are uniformly tight. The uniformly tight family of measures is also bounded. Indeed, the converging sequence belongs to a bounded set and by uniform tightness, there exists a compact set so that
where we apply Condition WD2.
Finally, we verify the convergence of , which follows directly from tightness and Equation (2.5). Indeed,
where we apply Condition WP3. ∎
3. $\alpha$-stable random measures and sheets
In this section we review $\alpha$-stable fields, which will later serve as priors. We highlight certain properties and assumptions of $\alpha$-stable random fields that are required in order to analyze the convergence. As our discretization scheme for the unknown function is based on finite-difference approximations on certain function spaces, we need to verify that $\alpha$-stable random sheets have enough regularity to carry out the convergence analysis. We aim to understand the convergence both in terms of probability and functional analysis.
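We will need the concept of $\alpha$-stable stochastic integrals, which we recall from [29]. Consider a measure space $(E, \mathcal{E}, m)$, where $\mathcal{E}_0$ denotes the subset of $\mathcal{E}$ that consists of sets of finite measure. In our inverse problem, the unknown will be a random field defined on $E$.
Definition 3.1.
Let $\alpha$ denote the stability parameter. A random $\sigma$-additive set function
$$M: \mathcal{E}_0 \to L^0(\Omega)$$
is called an $\alpha$-stable random measure on $(E, \mathcal{E})$ with control measure $m$ and skewness parameter $\beta \in [-1, 1]$, if it is independently scattered and for every $A \in \mathcal{E}_0$,
$$M(A) \sim S_\alpha\left(m(A)^{1/\alpha}, \beta, 0\right).$$
The random measure $M$ is called symmetric, if $\beta = 0$.
By stating independently scattered we mean that if $A_1, \dots, A_k$ belong to $\mathcal{E}_0$ and are disjoint then the random variables $M(A_1), \dots, M(A_k)$ are independent. Furthermore, $\sigma$-additivity means that if $A_1, A_2, \dots$, that belong to $\mathcal{E}_0$, are disjoint and $\bigcup_{j} A_j \in \mathcal{E}_0$ then
$$M\Big(\bigcup_{j=1}^{\infty} A_j\Big) = \sum_{j=1}^{\infty} M(A_j) \quad \text{almost surely.}$$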
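Stochastic integrals $I(f) = \int_E f(x) \, M(dx)$ of deterministic functions $f$ with respect to an $\alpha$-stable random measure $M$ with control measure $m$ and skewness parameter $\beta$ are defined similarly to the Gaussian case, through limits of simple functions $f_n$. However, the convergence holds in a weaker sense. Namely,
$$\lim_{n \to \infty} I(f_n) = I(f) \qquad (3.1)$$
in probability if (and only if) $f_n \to f$ in $L^\alpha(E, m)$ (Proposition 3.5.1 in [29]). The values of the random variable $I(f)$ can be specified almost surely by e.g. choosing a subsequence that converges almost surely to $I(f)$. Recall that the space $L^\alpha(E, m)$ is only a complete metric space, not a normed space, when $\alpha < 1$.
The distribution of the stochastic integral $I(f)$ is $S_\alpha(\sigma_f, \beta_f, \mu_f)$, where
$$\sigma_f = \Big(\int_E |f|^\alpha \, dm\Big)^{1/\alpha}, \qquad \beta_f = \beta \, \frac{\int_E |f|^\alpha \, \mathrm{sign}(f) \, dm}{\int_E |f|^\alpha \, dm}, \qquad \mu_f = \begin{cases} 0, & \alpha \neq 1, \\ -\frac{2}{\pi} \beta \int_E f \ln|f| \, dm, & \alpha = 1. \end{cases}$$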
We consider modeling our unknown as an $\alpha$-stable sheet on a hypercube in $\mathbb{R}^d$.
Definition 3.2.
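A random field $U$ on the hypercube is called a symmetric $\alpha$-stable random sheet if it can be expressed (up to a version) as a stochastic integral
$$U(t) = \int f_t(s) \, M(ds), \qquad (3.2)$$
where
$$f_t(s) = 1_{[0, t_1] \times \cdots \times [0, t_d]}(s) \qquad (3.3)$$
and $M$ is a symmetric $\alpha$-stable random measure on the hypercube with Lebesgue's measure as the control measure.
The $\alpha$-stable random sheet has marginal distributions
$$U(t) \sim S_\alpha(\sigma_t, 0, 0),$$
where $\sigma_t = (t_1 t_2 \cdots t_d)^{1/\alpha}$. Moreover, the values $U(t)$ and $U(s)$ are statistically dependent.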
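The definition suggests a direct grid approximation: assign independent symmetric $\alpha$-stable increments to grid cells, with scale equal to the cell volume raised to the power $1/\alpha$, and accumulate them. The following is a minimal sketch under several assumptions that are not fixed by the text: the dimension is $d = 2$, the hypercube is the unit square, the sheet is $U(t) = M([0, t_1] \times [0, t_2])$, and the increments are drawn with the Chambers-Mallows-Stuck method. The names `sample_symmetric_stable` and `sample_stable_sheet` are hypothetical.

```python
import numpy as np

def sample_symmetric_stable(alpha, size, rng):
    """Draw samples from S_alpha(1, 0, 0) with the Chambers-Mallows-Stuck method."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def sample_stable_sheet(alpha, n, rng):
    """Approximate a symmetric alpha-stable sheet on an n x n grid of the unit square."""
    h = 1.0 / n
    # Random measure of each cell: independent, with scale (cell area)^(1/alpha).
    increments = (h * h) ** (1.0 / alpha) * sample_symmetric_stable(alpha, (n, n), rng)
    # U(t1, t2) sums the increments of all cells inside [0, t1] x [0, t2].
    sheet = np.cumsum(np.cumsum(increments, axis=0), axis=1)
    return sheet, increments

rng = np.random.default_rng(0)
sheet, increments = sample_stable_sheet(1.5, 64, rng)
```

Neighbouring values of the sheet share most of their increments, which is one way to see why the values at two different points are statistically dependent.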
3.1. Sample paths
Let us now concentrate on the nature of the mapping $t \mapsto U(t, \omega)$ for fixed $\omega \in \Omega$, where each $U(t)$ is defined by (3.2). This is an important point, because we are interested in modelling our unknown function with $U$ and wish to specify a Banach space where $U$ lives. In other words, we wish to describe $U$ as an $F$-valued random variable for some Banach space $F$. At this point, we have defined $U$ as a random field, through a family of random variables. We will heavily utilise another way of describing $\alpha$-stable random fields, the so-called LePage series representation ([29], Theorem 3.9.1), which is often used in deriving sample path properties of $\alpha$-stable random fields.
For the convenience of the reader, we provide the proofs below and begin with two preparatory lemmas. We recall that the arrival times of a Poisson process with arrival rate 1 can be expressed as
$$\Gamma_k = E_1 + E_2 + \cdots + E_k, \qquad k = 1, 2, \dots,$$
where $E_1, E_2, \dots$ are independent identically distributed random variables with common probability density $e^{-x}$, $x > 0$.
Lemma 3.3.
Let be arrival times of a Poisson process with arrival rate 1. There exists and so that for all and for  almost every . Moreover, the series
converges almost surely for all .
Proof.
By the law of large numbers, the arrival times $\Gamma_k$ satisfy
$$\lim_{k \to \infty} \frac{\Gamma_k}{k} = 1$$
almost surely. Therefore, for large and there exists and integers such that for all almost surely. Inserting the lower bound in the series
shows that the series converges almost surely. ∎
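The two facts used in this proof are easy to observe numerically. The following is a minimal sketch, assuming the arrival times are generated as cumulative sums of unit-rate exponential variables; the function name `poisson_arrival_times` and the exponent $p = 2$ are illustrative choices, not taken from the text. The exponent is chosen larger than one so that the comparison with $\sum_k k^{-p}$ applies.

```python
import numpy as np

def poisson_arrival_times(n, rng):
    """First n arrival times of a unit-rate Poisson process."""
    return np.cumsum(rng.exponential(1.0, n))

rng = np.random.default_rng(0)
gamma = poisson_arrival_times(100_000, rng)
k = np.arange(1, gamma.size + 1)

# Law of large numbers: the ratio Gamma_k / k approaches 1 for large k.
ratio = gamma / k

# Partial sums of Gamma_k^(-p) with p = 2; they level off because Gamma_k grows like k.
partial_sums = np.cumsum(gamma ** (-2.0))
```

The first few terms of the series can be large, since the first arrival time may be close to zero, but the tail behaves like $\sum_k k^{-2}$.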
Lemma 3.4.
Let be mutually independent random sequences, and let be a separable Banach space-valued Borel measurable function. The series
converges almost surely in if and only if the series
converges for almost every sample of .
Proof.
If is any event, say
then
if and only if almost surely. Indeed, conditional expectation of is at most 1. A simple proof by contradiction shows that the conditional expectation must equal 1 almost surely if . The other direction is trivial.
∎
The next theorem provides a series representation for the symmetric $\alpha$-stable random measure with Lebesgue's measure as the control measure. In the theorem, we prove series representations of stochastic integrals, when . This suffices for our purposes, because we can always choose for the functions that we study. The approach helps us to use almost sure equivalence of stochastic integrals instead of the more common concept of equivalence in distribution.
Theorem 3.5.
Let and be a measurable set with . Let be arrival times of a Poisson process with arrival rate 1. Let form an i.i.d. sequence of random vectors independent of that consist of uniformly distributed $d$-dimensional random vectors on , and valued random variables whose conditional distribution given is . Let
Then
(3.4)
where are Borel sets, defines a symmetric $\alpha$-stable random measure with Lebesgue's measure as the control measure on .
If , where when and otherwise, then the series
(3.5) 
converges almost surely and it coincides with the stochastic integral .
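A truncated version of such a series can be simulated directly. The following is a minimal sketch of the standard LePage-type construction from [29], under assumptions that are not confirmed by the text above: the domain is the unit square (so its Lebesgue measure is one), the symmetric real-valued factors are independent random signs, and the series is normalized by the constant $C_\alpha = \big(\int_0^\infty x^{-\alpha} \sin x \, dx\big)^{-1}$. The names `lepage_constant` and `lepage_measure` are hypothetical.

```python
import math
import numpy as np

def lepage_constant(alpha):
    """Normalizing constant C_alpha of the LePage series, for 0 < alpha < 2."""
    if not 0.0 < alpha < 2.0:
        raise ValueError("alpha must lie in (0, 2)")
    if alpha == 1.0:
        return 2.0 / math.pi
    return (1.0 - alpha) / (math.gamma(2.0 - alpha) * math.cos(math.pi * alpha / 2.0))

def lepage_measure(alpha, n_terms, rng):
    """Truncated series for a symmetric alpha-stable random measure on the unit square."""
    gamma = np.cumsum(rng.exponential(1.0, n_terms))  # Poisson arrival times
    points = rng.uniform(0.0, 1.0, (n_terms, 2))      # uniform locations in the domain
    signs = rng.choice([-1.0, 1.0], n_terms)          # symmetric factors (assumption)
    coeff = lepage_constant(alpha) ** (1.0 / alpha) * signs * gamma ** (-1.0 / alpha)

    def measure(indicator):
        # Sum the coefficients of the terms whose location falls in the set.
        return float(np.sum(coeff * indicator(points)))

    return measure

rng = np.random.default_rng(0)
M = lepage_measure(0.8, 10_000, rng)
left = M(lambda p: p[:, 0] < 0.5)
right = M(lambda p: p[:, 0] >= 0.5)
whole = M(lambda p: np.ones(len(p), dtype=bool))
```

Because every set is evaluated on the same random locations and coefficients, the simulated measure is additive over disjoint sets for each realization, here `left + right` equals `whole` up to rounding.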
Proof.
First, we will verify convergence of (3.4).
For , a direct application of Lemma 3.3 shows the convergence. For , we proceed as in [29] by using Kolmogorov’s three series theorem, but apply it under conditioning with respect to in the spirit of Lemma 3.4.
For fixed ,
(3.6) 
since for large by Lemma 3.3.
Moreover, the sum of expectations of
, that is,
(3.7) 
vanishes due to the distribution of
, and the sum of variances
(3.8) 
is finite by Lemma 3.3. Hence, is a well-defined random variable.
Secondly, we show that has the right distribution. We first remark, that
(3.9) 
by the law of large numbers. Additionally the random vector with components
, ,
is distributed as the random vector whose components are
independent uniformly distributed random variables on ordered increasingly.
In the finite sum (3.9), we may also reorder the random variables
without changing the distribution of the sum. This leads to identification of
as the limit of
sums of independent random variables
which implies that is necessarily a stable random variable. We still need to define the index and parameters of the stable distribution. To do this, we identify the domain of attraction of the common distribution of . Since the common distribution is symmetric, we study the tail behaviour
which implies [10] that distribution of is stable. Through tail behaviour, we identify the distribution as .
From the characteristic function of it is evident that is independently scattered. For disjoint sets , the sum converges clearly almost surely, the sum of independent random variables converges almost surely by the Itô-Nisio theorem [15], and their limits coincide in probability. By selecting almost surely converging subsequences, the limits coincide almost surely, which shows that is countably additive.
When , the right hand side of converges absolutely by Lemma 3.3, since its conditional expectation given the sequence is finite. By choosing almost surely converging subsequences from the simple function approximations that converge in probability to , we identify the stochastic integral
with the almost surely converging series representation by taking the limit inside the sum by Lebesgue’s dominated convergence theorem.
When , we divide into positive and negative parts and , and consider, where necessary, their strictly increasing simple function approximations . The right hand side of (3.5) converges almost surely by Kolmogorov’s three series theorem similarly to (3.6) and (3.7), since
by Markov's inequality and the assumptions. The convergence of the sum of variances holds, since