# Piecewise Deterministic Markov Processes and their invariant measure

Piecewise Deterministic Markov Processes (PDMPs) are studied in a general framework. First, different constructions are proven to be equivalent. Second, we introduce a coupling between two PDMPs following the same differential flow, which implies quantitative bounds on the total variation distance between the marginal distributions of the two processes. Finally, two results are established regarding the invariant measures of PDMPs: a practical condition ensuring that a probability measure is invariant for the associated PDMP semi-group, and a bound in V-norm between the invariant probability measures of two PDMPs following the same differential flow. This last result is then applied to study the asymptotic bias of some non-exact PDMP MCMC methods.


## 1. Introduction

Piecewise Deterministic Markov Processes (PDMPs), similarly to diffusion processes, form an important class of Markov processes, used to model random dynamical systems in numerous fields (see e.g. [18, 1]). Recently, interest has grown in their use to sample from a target distribution [4, 23, 5]; the resulting class of algorithms is referred to as PDMP Monte Carlo (PDMP-MC) methods. Natural questions then arise as to the stationarity of the target measure, the ergodicity of the corresponding process and the possible bias introduced by the method. In mathematical physics [6] and biology [7], the long-time behaviour of these processes has been the subject of several works. In this context, these studies are carried out through the Kolmogorov-Fokker-Planck operator of the PDMP of interest, given for any smooth density ρ by

 A⋆ρ=−⟨Ξ,∇ρ⟩+K(λρ)−λρ,

where Ξ is a smooth vector field, λ a jump rate and K a non-local collision operator.

The relevance of the present work emerged while writing the companion paper [12], concerned with the geometric ergodicity of the Bouncy Particle Sampler (BPS) [5], an MCMC algorithm which, given a target distribution, introduces a PDMP for which it is invariant. In order to make rigorous several arguments in [12], technical lemmas had to be established, in particular to cope with the fact that Markov semi-groups associated to PDMPs lack the regularity properties of (hypo-)elliptic diffusions, which entails additional difficulties and technicalities. These results, of interest in a more general framework, are gathered here in the hope of setting a framework in which, for example, verifying the invariance of a measure becomes a mere calculus via the generator (as for diffusions). The BPS is used as a recurrent example.

Let us present these different results, together with the organization of the paper. Section 2 contains the basic definitions of our framework and, in particular, presents the construction of a PDMP. Alternative constructions are shown in Sections 3 and 4 to give the same process (i.e. to give a random variable with the same law on the Skorokhod space). Conditions which ensure that PDMPs are non-explosive are presented in Section 5. The synchronous coupling of two PDMPs is defined in Section 6; it constructs simultaneously two different PDMPs, starting at the same initial state, in such a way that they have some probability to stay equal for some time. It yields estimates on the difference of the corresponding semi-groups in total variation norm. In Section 8, conditions are established under which the semi-group associated to a PDMP leaves invariant the space of compactly-supported smooth functions. Using this result, a practical criterion to ensure that a given probability measure is invariant for a PDMP is obtained. Indeed, it is classical that, denoting by the strong generator of the Markov semi-group associated to the PDMP, then is invariant if and only if for all in a core of . Nevertheless, due to the lack of regularization properties of the semi-group, it is generally impossible to determine such a core. We will prove that, under some simple assumptions, it is enough to consider compactly-supported smooth functions. Finally, in Section 10, we are interested in bounding the V-norm distance between two invariant probability measures of two PDMPs sharing the same differential flow but with different jump rates and Markov kernels, a question sometimes called perturbation theory in the literature (see for example the recent [24]). This question is here mainly motivated by the thinning method used to sample trajectories of PDMPs [17, 16]. Indeed, a PDMP can be exactly sampled (in the sense that no time discretization is needed) provided that the associated differential flow can be computed and a simple upper bound on the jump rate is known. When this is not the case, a PDMP with a truncated jump rate can be sampled instead, and our result gives a control on the ensuing error.

### Notations and conventions

For all , we denote , , .

stands for the identity matrix on

.

For all , the scalar product between and is denoted by and the Euclidean norm of by . For all , , we denote by the ball centered at with radius . The closed ball centered in with radius is denoted by . For any -dimensional matrix , define by the operator norm associated with .

Let be a smooth closed Riemannian sub-manifold of and the associated Borel -field. The distance induced by is denoted by . With a slight abuse of notations, the ball (respectively closed ball) centered at with radius is denoted by (respectively ).

For all functions and compact sets , denote , . Denote by the set of all measurable and bounded functions from to . The space is endowed with the topology associated with the uniform norm . Let stand for the set of continuous functions from to , the subset of consisting of continuous functions vanishing at infinity and, for all , let be the set of -times continuously differentiable functions from to . Denote for all , and the set of functions in with compact support and the set of bounded functions in respectively. For , we denote by , the differential of . For all functions , we denote by and the gradient and the Hessian of respectively, when they exist.

We denote by the set of probability measures on . For , is called a transference plan between and if for all , and . The set of transference plans between and is denoted . The random variables and on are a coupling between and if the distribution of belongs to . The total variation norm between and is defined by

 ∥μ−ν∥TV = 2 inf_{ξ ∈ Γ(μ,ν)} ∫_{M²} 1_{Δ_M}(x,y) dξ(x,y),

where Δ_M = {(x,y) ∈ M² : x ≠ y}. For all μ, define the support of μ by

 supp μ = cl{x ∈ M : for all open sets U ∋ x, μ(U) > 0},

where cl denotes the closure in M.

In the sequel, we take the convention that . All the random variables considered in this paper are defined on a fixed probability space .

## 2. A first definition of Piecewise Deterministic Markov Processes

### Definitions and further notations

Let be a smooth closed Riemannian sub-manifold of . A PDMP on is defined using a triple , , referred to as the local characteristics of a PDMP, where

• is a differential flow on : is a measurable function from to , such that for all , . Moreover, for all , is continuously differentiable from to and for all , is a -diffeomorphism of . The flow is (time)-homogeneous if for all , , in which case we denote .

• For all , is a measurable function referred to as a jump rate on which is locally bounded, in the sense that for all compact . The jump rate is (time)-homogeneous if it does not depend on .

• For all , is an inhomogeneous Markov kernel on : for all , is measurable, and for all , . The Markov kernel is (time)-homogeneous if it does not depend on .

If is a homogeneous differential flow and, for all , are homogeneous as well, the local characteristics are said to be homogeneous. A (homogeneous) jump mechanism on is a pair constituted of a (homogeneous) jump rate and a (homogeneous) Markov kernel on .
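In code, homogeneous local characteristics can be thought of as a triple of callables. The following Python sketch is ours, not the paper's: names and types are illustrative, the manifold is replaced by tuples of floats, and each Markov kernel is replaced by a deterministic representative.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

State = Tuple[float, ...]

# Hypothetical container for homogeneous local characteristics (flow, rates, kernels).
@dataclass(frozen=True)
class LocalCharacteristics:
    flow: Callable[[float, State], State]        # phi_t(z): deterministic motion
    rates: Sequence[Callable[[State], float]]    # jump rates lambda_i(z) >= 0
    kernels: Sequence[Callable[[State], State]]  # deterministic stand-ins for Q_i

# Example: free transport phi_t(x, y) = (x + t*y, y) on R x R, with one jump
# mechanism of constant unit rate whose kernel flips the velocity.
free_transport = LocalCharacteristics(
    flow=lambda t, z: (z[0] + t * z[1], z[1]),
    rates=(lambda z: 1.0,),
    kernels=(lambda z: (z[0], -z[1]),),
)
```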

### A first construction of a PDMP

For all , consider a representation of the Markov kernel , i.e. a measurable function from to such that for all , , where is a random variable uniformly distributed on [0,1]. By [3, Corollary 7.16.1], such a representation always exists.

Then, a PDMP based on the local characteristics and the initial distribution can be defined recursively through a Markov chain on . For all , will be the state of the process at time . Between two times and , will be a deterministic function of and . More precisely, consider the following construction.

###### Construction 1.

Let be a random variable with distribution and be an i.i.d. sequence, independent of , such that for all and , is uniformly distributed on and is an exponential random variable with parameter , independent of and from for . Let be a cemetery point.
Set , , and suppose that and have been defined for some , with and . For all , set

 (1) S_{j,k+1} = inf{ t ⩾ S_k : E_{j,k+1} < ∫_{S_k}^{t} λ_j(s, φ_{S_k,s}(X′_k)) ds },  S_{k+1} = min_{j ∈ ⟦1,ℓ⟧} S_{j,k+1}.
• If , set , , for all , and for all .

• If , set

 I_{k+1} = min{ j ∈ ⟦1,ℓ⟧ : S_{j,k+1} = S_{k+1} },  X′_{k+1} = G_{I_{k+1}}(S_{k+1}, φ_{S_k,S_{k+1}}(X′_k), U_{k+1}).

For , set and .

For , set .

Note that, when , , the probability of for two indices in is zero, but the definition of ensures that the process is defined not only almost everywhere on , but in fact on all of .
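Construction 1 can be sketched in code in the simple homogeneous case where the jump rates are *constant* along the flow, so that the clock condition in (1) inverts explicitly to S_{j,k+1} = S_k + E_{j,k+1}/λ_j. The function and argument names below are illustrative, not from the paper.

```python
import math
import random

def simulate_skeleton(x0, flow, rates, kernels, t_max, rng=None):
    """Return the embedded chain [(S_k, X_k')] of jump times and post-jump states."""
    rng = rng or random.Random(0)
    t, x = 0.0, x0
    skeleton = [(t, x)]
    while True:
        # one competing exponential clock per jump mechanism, as in (1)
        candidates = [t + rng.expovariate(lam) if lam > 0 else math.inf
                      for lam in rates]
        s_next = min(candidates)
        if s_next > t_max:
            break
        j = candidates.index(s_next)           # firing mechanism I_{k+1}
        x = kernels[j](flow(s_next - t, x))    # jump from the flowed state
        t = s_next
        skeleton.append((t, x))
    return skeleton

# Free transport on R x R with a single velocity-flip mechanism at rate 1.
path = simulate_skeleton(
    x0=(0.0, 1.0),
    flow=lambda dt, z: (z[0] + dt * z[1], z[1]),
    rates=[1.0],
    kernels=[lambda z: (z[0], -z[1])],
    t_max=5.0,
)
```

Between two consecutive skeleton points, the continuous-time process is recovered by applying the flow, which is the sense in which the embedded chain determines the process.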

Let be the filtration associated with . Then, taking a random variable on , is an inhomogeneous Markov chain since for all , , , ,

 (2) P(X′_{k+1} ∈ A, S_{k+1} ⩽ t, I_{k+1} = j | F_k) = 1_M(X′_k) ∫_{S_k}^{t} Q_j(s, φ_{S_k,s}(X′_k), A) λ_j(s, φ_{S_k,s}(X′_k)) exp{ −∑_{i=1}^{ℓ} ∫_{S_k}^{s} λ_i(u, φ_{S_k,u}(X′_k)) du } ds.

Note that the sequence is an inhomogeneous Markov chain as well, whose kernel can be straightforwardly deduced from (2).

Then, is a stochastic process on , i.e. it is a random variable from to the space of càdlàg functions from to , endowed with the Skorokhod topology, see [15, Chapter 6]. Moreover, is a Markov process [14, Theorem 7.3.1], from the class of piecewise deterministic Markov processes (PDMPs). We say that a stochastic process is a PDMP with local characteristics and initial distribution if it has the same distribution on as . We will denote by this distribution. In the sequel, we will see that a given PDMP can admit several local characteristics. Note that, as is a -diffeomorphism, is completely determined by the Markov chain , referred to as the embedded chain associated to the process. The sequence is said to be the jump times of the process .

A PDMP is said to be homogeneous if its local characteristics are (time) homogeneous.

For , we call the explosion time of the process . A process is said to be non-explosive if almost surely. PDMP characteristics are said to be non-explosive if, for every initial distribution, the associated PDMP is non-explosive.

Construction 1 associated with the characteristics defines a Markov semi-group for all , and by

 Ps,t(x,A) = P(¯Xs,xt−s∈A),

where is a PDMP started from with characteristics . Its left-action on and right-action on are then given by

for all , , , and . The Markov property of is equivalent to the semi-group property for all . If is non-explosive, then is a Markov kernel for all and we say that is non-explosive. Otherwise, it is only a sub-Markovian kernel. For a homogeneous process, we simply write for all .

For a PDMP with jump times , we say that a true jump occurred at time , , if . Otherwise, we say that a fantom jump occurred at time . Note that, in the definition of homogeneous PDMPs with characteristics given in [9, standard conditions p. 62], fantom jumps are impossible, since it is assumed that for all , . This is not the case with the definition we gave in Section 2, where the notion of jump times depends on the jump mechanisms used to define the process. We will see in Section 4 that, under our settings, based on characteristics which define a PDMP , we can always define some characteristics which define a PDMP with the same distribution as but no fantom jump.

The condition imposed by [9], implying that a PDMP has no fantom jump, can be very useful since it allows a one-to-one correspondence between the path of the continuous-time process and that of its embedded chain . With our construction, the continuous process is completely determined by its embedded chain, but not conversely.

On the other hand, adding fantom jumps sometimes turns out to be convenient. Here is an example: let be the characteristics of a PDMP , and suppose that there exists such that for all and . From Equation 9 below, has the same distribution as the PDMP obtained through Construction 1 from the characteristics with for all ,

 ~Q(t,x,A) = (λ(t,x)/λ∗) Q(t,x,A) + {1 − λ(t,x)/λ∗} δ_x(A).

The jump times of are then given by a Poisson process with intensity λ∗. The method of adding fantom jumps so that the distribution of the jump times becomes simpler (for sampling purposes, for instance) is called thinning (see [16] and references therein for more details).
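Thinning can be sketched as a rejection loop: candidate times come from a homogeneous Poisson process of intensity λ∗ and are accepted with probability λ(t)/λ∗; rejected candidates are exactly the fantom jumps of the modified kernel above, which leave the state unchanged. The Python below is an illustrative sketch (names are ours), with the rate evaluated along the flow collapsed into a function of time.

```python
import random

def next_true_jump(lam, lam_star, t0, rng=None):
    """First accepted (true) jump time after t0 for a rate lam(.) <= lam_star."""
    rng = rng or random.Random(0)
    t = t0
    while True:
        t += rng.expovariate(lam_star)          # candidate from dominating process
        if rng.random() * lam_star < lam(t):    # accept with prob. lam(t)/lam_star
            return t                            # rejected candidates = fantom jumps
```

No time discretization is involved: only the bound λ∗ and pointwise evaluations of λ are needed, which is the appeal of the method for exact PDMP sampling.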

Another use of fantom jumps is presented in [8]. The stability or ergodicity of a PDMP and of its embedded chain may differ, but this is no longer the case if fantom jumps are added at a constant rate, i.e. if we consider the PDMP with characteristics , where is given for all by

and its embedded chain. See [8] for more details.

There are other differences between the assumptions we made on the characteristics of a PDMP and those made in [9, standard conditions p. 62]. For simplicity, we consider that the flow cannot exit , contrary to [9]. In addition, to prevent the artificial problem of an infinity of fantom jumps in a finite time, we assume that is locally bounded, instead of the following weaker condition, which would be sufficient to define : for all , there exists such that . On the other hand, we do not assume a priori that PDMPs are non-explosive.

### Examples

Several examples of PDMPs can be found in [18] and references therein. In the present paper, special attention will be paid to the family of velocity jump PDMPs, described as follows. Let be a smooth complete Riemannian submanifold, and set . Then, is a smooth complete Riemannian submanifold of , endowed with the canonical Euclidean distance and tensor metric. We say that a PDMP on (where and for all ) with characteristics is a velocity jump PDMP if is homogeneous and given for any and by

 (3) φt(x,y)=(x+ty,y)

and if for all , all and all ,

 Q_i(t, (x,y), A×V) = δ_x(A).

Consider the PDMP associated with this choice of characteristics and the corresponding embedded chain. Note that by construction for all , , and . Therefore for all , and only can be discontinuous in time.

The class of velocity jump processes gathers the Zig-Zag process [4], the Bouncy Particle Sampler (BPS) [23] and many of their variants. The choices of jump rates and Markov kernels for these different (but similar) processes are mainly of one of the following types (here we only consider homogeneous mechanisms):

• refreshment mechanism: the rate only depends on , and the kernel is constant, i.e. there exists such that for all and all

 Q((x,y), A×A′) = δ_x(A) ν(A′).
• deterministic bounce mechanism: there exists a measurable function , locally bounded, such that for all , and , for a measurable function . A particular example in the case or , is where is given for all by

 (4) R(x,y) = y − 2∥g(x)∥⁻² ⟨g(x), y⟩ g(x) if g(x) ≠ 0, and R(x,y) = y otherwise.

Note that is simply the orthogonal reflection of with respect to if .

• randomized bounce mechanism: there exists a measurable function such that for all and , , and , where is a Markov kernel on .
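The deterministic bounce map (4) is simple enough to state in code. The sketch below (with the illustrative name `reflect`) works on vectors given as plain Python lists, taking g = g(x) as input.

```python
def reflect(g, y):
    """R(x, y) from (4), with g = g(x)."""
    sq = sum(gi * gi for gi in g)                 # ||g(x)||^2
    if sq == 0.0:
        return list(y)                            # g(x) = 0: velocity unchanged
    dot = sum(gi * yi for gi, yi in zip(g, y))    # <g(x), y>
    return [yi - 2.0 * dot / sq * gi for gi, yi in zip(g, y)]
```

As an orthogonal reflection, this map preserves the Euclidean norm of y and is an involution.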

For instance, [6] studies the velocity jump process associated with the linear Boltzmann equation, which gives an example of refreshment mechanism. The Zig-Zag (ZZ) process [4] and the Bouncy Particle Sampler (BPS) [23, 21, 10] are recently proposed PDMPs used to sample from a target density , where is a continuously differentiable function. The ZZ process is a velocity jump process with and deterministic bounce mechanisms given for all , , and by

 λi(x,y)=(yi∂U(x)/∂xi)+,Qi((x,y),{x}×{−y})=1.

Note that in this case, for all , where is the vector of the standard basis of . Additional refreshment mechanisms can be added to the process. In the rest of this paper, we will repeatedly use the BPS process as an illustration to our different results.
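The ZZ mechanisms can be sketched as follows for an assumed Gaussian potential U(x) = |x|²/2 (so that ∂U/∂x_i = x_i); the potential is an illustrative choice, and we take the i-th kernel to flip the i-th velocity component, in line with the standard-basis remark above.

```python
def zz_rate(i, x, y):
    """Rate of mechanism i: (y_i * dU/dx_i)_+ with dU/dx_i = x_i here."""
    return max(0.0, y[i] * x[i])

def zz_flip(i, y):
    """Kernel of mechanism i: flip only the i-th velocity component."""
    out = list(y)
    out[i] = -out[i]
    return out
```

Note that immediately after mechanism i fires, its own rate vanishes, since y_i has changed sign.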

###### Example (Bouncy Particle Sampler).

Let be a smooth closed sub-manifold of rotation invariant, i.e. for any rotation of , . Let and . The BPS process associated with the potential , refreshment rate and refreshment distribution is the PDMP on with characteristics where is given by (3) and for all , , , , and , where is given by (4) with . Note that is the pure bounce mechanism associated with , and is a refreshment mechanism.
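The two BPS mechanisms can be sketched in code for an assumed potential U(x) = |x|²/2 (so ∇U(x) = x) and standard normal refreshment; both concrete choices are illustrative, not prescribed by the example.

```python
import random

def bps_bounce_rate(x, y):
    """Bounce rate <y, grad U(x)>_+, with grad U(x) = x here."""
    return max(0.0, sum(xi * yi for xi, yi in zip(x, y)))

def bps_bounce(x, y):
    """Bounce kernel: reflection (4) of y with respect to g(x) = grad U(x)."""
    sq = sum(xi * xi for xi in x)
    if sq == 0.0:
        return list(y)
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return [yi - 2.0 * dot / sq * xi for xi, yi in zip(x, y)]

def bps_refresh(dim, rng=None):
    """Refreshment kernel: redraw the velocity from nu, taken N(0, I) here."""
    rng = rng or random.Random(0)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]
```

Directly after a bounce from a state with positive rate, the bounce rate vanishes, since the reflection changes the sign of ⟨y, ∇U(x)⟩.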

Variants of the BPS with randomized bounces have been recently introduced in [20, 27, 25].

## 3. Alternative constructions

Consider PDMP characteristics , an initial distribution and the associated process defined in Section 2. The goal of this Section is to construct another process on the same probability space with the same distribution on as .

###### Construction 2.

Let be a random variable  with distribution and be an i.i.d. family, independent of , such that for all and , is uniformly distributed on and is an exponential random variable  with parameter , independent of .
Set , , and for . Suppose that and have been defined for some , with and . For all , set

 ~S_{j,k+1} = inf{ t ⩾ ~S_k : ~H_{j,k+1} < ∫_{~S_k}^{t} λ_j(s, φ_{~S_k,s}(Y′_k)) ds },  ~S_{k+1} = min_{j ∈ ⟦1,ℓ⟧} ~S_{j,k+1}.

• If , set , , for all and for .

• If , set

 ~I_{k+1} = min{ j ∈ ⟦1,ℓ⟧ : ~S_{j,k+1} = ~S_{k+1} },  ~H_{~I_{k+1},k+2} = E_{~I_{k+1},~N_{k+1}},  Y′_{k+1} = G_{~I_{k+1}}(~S_{k+1}, φ_{~S_k,~S_{k+1}}(Y′_k), U_{~I_{k+1},~N_{k+1}}),  ~N_{~I_{k+1},k+2} = ~N_{~I_{k+1},k+1} + 1,

and for ,

 ~H_{j,k+2} = ~H_{j,k+1} − ∫_{~S_k}^{~S_{k+1}} λ_j(s, φ_{~S_k,s}(Y′_k)) ds.

Set for and .

For , set .
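One step of Construction 2 can be sketched in the homogeneous case with constant rates: each mechanism j carries a residual unit-exponential clock H_j, the next event is the smallest H_j/λ_j, only the used clock is redrawn, and the other clocks are decremented by the rate mass λ_j·dt integrated since the last event. Names below are illustrative.

```python
import math
import random

def step(t, x, clocks, rates, flow, kernel, rng=None):
    """One event of the residual-clock construction, for constant rates."""
    rng = rng or random.Random(0)
    waits = [h / lam if lam > 0 else math.inf
             for h, lam in zip(clocks, rates)]
    dt = min(waits)                              # next inter-jump time
    j = waits.index(dt)                          # firing mechanism
    x_new = kernel(j, flow(dt, x))               # jump from the flowed state
    clocks = [rng.expovariate(1.0) if i == j     # redraw only the used clock
              else clocks[i] - rates[i] * dt     # residual mass of the others
              for i in range(len(clocks))]
    return t + dt, x_new, clocks
```

The memoryless property of the exponential distribution is what makes the decremented clocks again exponential, which is the content of the lemma below.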

We show in the following result that the two constructions we consider define the same distribution on .

###### Proposition .

The two Markov chains and have the same distribution on . Therefore, and have the same distribution on .

We preface the proof with a lemma. Denote by and the filtrations associated with the sequences of random variables and .

###### Lemma .

For all , given , are i.i.d. exponential random variables with parameter , independent of . In addition, for all , given , are i.i.d. exponential random variables with parameter , independent of .

###### Proof.

Set for all and , . Note that the second statement is equivalent to for all and , are i.i.d. exponential random variables with parameter , independent of given since for all , , which is the result that we will show.

The proof is by induction on . For , the first statement holds by definition. The second part follows from the memoryless property of the exponential distribution and because, for all , , is independent of .

Assume now that the result holds for . Then, for all , using the induction hypothesis, the definition of , the memoryless property of the exponential distribution, and the fact that, given , is independent of for all , we have

 1_{R₊}(~S_k) P( ⋂_{j=1}^{ℓ} {~H_{j,k+1} ⩾ t_j} | ~F′_k ) = 1_{R₊}(~S_k) ∑_{i=1}^{ℓ} P( ⋂_{j=1, j≠i}^{ℓ} {~H_{j,k+1} ⩾ t_j} ∩ {~S_k = ~S_{i,k}} ∩ {¯E_{i,¯N_k+1} ⩾ t_i} | ~F′_k )

which shows the first part of the statement. Finally we show the second statement of the induction. Note that is an independent family of random variables, independent of and , for and therefore given , is independent of . Then, using the first statement and the memoryless property of the exponential distributions, we have for all ,

 1R+(~Sk+1)P⎛⎝ℓ⋂j=1,j≠i{~Hj,k+1−Bj,k+1⩾tj}∩{~Sk+1=~Si,k+1}∣∣ ∣∣σ(~F′k,~Si,k+1)⎞