# Discussion of "Unbiased Markov chain Monte Carlo with couplings" by Pierre E. Jacob, John O'Leary and Yves F. Atchadé

This is a contribution to the discussion of "Unbiased Markov chain Monte Carlo with couplings" by Pierre E. Jacob, John O'Leary and Yves F. Atchadé, to appear in the Journal of the Royal Statistical Society Series B.



## Appendix A Derivation of the Upper Bound

The aim in what follows is to reproduce the proof of Proposition 1 in [5] whilst explicitly tracking the terms that are $h$-dependent. To avoid reproducing large amounts of [5], we assume familiarity with the notation and quantities defined in that work.

The first part of the argument in [5] uses Assumptions 1 and 2 to deduce that $\mathbb{E}[\Delta_t^2] \leq \tilde{C}\tilde{\delta}^t$ for some $\tilde{C} < \infty$, $\tilde{\delta} \in (0,1)$ and all $t \geq 0$. Our first task is to explicitly compute the constant $\tilde{C}$ in terms of the quantities $D$ and $\eta$ in Assumption 1 and $C$ and $\delta$ in Assumption 2. To this end, we reproduce the argument alluded to in the paper:

$$
\begin{aligned}
\left(\mathbb{E}\left[|\Delta_t|^{2+\eta}\right]\right)^{\frac{1}{2+\eta}} &= \left(\mathbb{E}\left[|h(X_t)-h(Y_{t-1})|^{2+\eta}\right]\right)^{\frac{1}{2+\eta}} \\
&\leq \left(\mathbb{E}\left[|h(X_t)|^{2+\eta}\right]\right)^{\frac{1}{2+\eta}} + \left(\mathbb{E}\left[|h(Y_{t-1})|^{2+\eta}\right]\right)^{\frac{1}{2+\eta}} && \text{(Minkowski's inequality)} \\
&\leq D^{\frac{1}{2+\eta}} + D^{\frac{1}{2+\eta}} && \text{(Assumption 1)} \\
\implies \mathbb{E}\left[\Delta_t^2\right] = \mathbb{E}\left[\Delta_t^2 \mathbf{1}(\tau > t)\right] &\leq \mathbb{E}\left[|\Delta_t|^{2+\eta}\right]^{\frac{2}{2+\eta}} \, \mathbb{E}\left[\mathbf{1}(\tau > t)\right]^{\frac{\eta}{2+\eta}} && \text{(H\"{o}lder's inequality)} \\
&\leq \left(2 D^{\frac{1}{2+\eta}}\right)^2 \left(C \delta^t\right)^{\frac{\eta}{2+\eta}} && \text{(Assumption 2)} \\
&= 4 C^{\frac{\eta}{2+\eta}} D^{\frac{2}{2+\eta}} \, \tilde{\delta}^t = \tilde{C} \tilde{\delta}^t, \qquad \tilde{C} = 4 C^{\frac{\eta}{2+\eta}} D^{\frac{2}{2+\eta}}, \quad \tilde{\delta} = \delta^{\frac{\eta}{2+\eta}}.
\end{aligned}
$$
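The Hölder step above can be sanity-checked numerically, since Hölder's inequality holds exactly for an empirical distribution. The following sketch is our own illustration, not part of [5]: the Gaussian draws are arbitrary stand-ins for $\Delta_t$ and a Bernoulli indicator stands in for $\mathbf{1}(\tau > t)$.

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 1.0                                  # moment exponent from Assumption 1
n_samples = 100_000

delta = rng.normal(size=n_samples)         # stand-in for Delta_t
ind = rng.uniform(size=n_samples) < 0.3    # stand-in for 1(tau > t)

# Left-hand side: E[Delta_t^2 1(tau > t)]
lhs = np.mean(delta**2 * ind)
# Right-hand side: E[|Delta_t|^{2+eta}]^{2/(2+eta)} * E[1(tau > t)]^{eta/(2+eta)}
rhs = (np.mean(np.abs(delta)**(2 + eta))**(2 / (2 + eta))
       * np.mean(ind)**(eta / (2 + eta)))

assert lhs <= rhs  # Hölder's inequality, applied to the empirical measure
```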

It is then stated in the proof of Proposition 1 in [5] that $\mathbb{E}[(H_0^{n'}(X,Y) - H_0^{n}(X,Y))^2] \leq \bar{C}\tilde{\delta}^{n}$ for some $\bar{C} < \infty$ and all $n', n$ with $n' \geq n \geq 0$; we reproduce the implied argument to explicitly represent $\bar{C}$ in terms of $\tilde{C}$ and $\tilde{\delta}$ next:

$$
\begin{aligned}
\mathbb{E}\left[\left(H_0^{n'}(X,Y) - H_0^{n}(X,Y)\right)^2\right] &= \sum_{s=n+1}^{n'} \sum_{t=n+1}^{n'} \mathbb{E}[\Delta_s \Delta_t] \\
&\leq \sum_{s=n+1}^{n'} \sum_{t=n+1}^{n'} \mathbb{E}[\Delta_s^2]^{1/2} \, \mathbb{E}[\Delta_t^2]^{1/2} && \text{(Cauchy--Schwarz inequality)} \\
&\leq \sum_{s=n+1}^{n'} \sum_{t=n+1}^{n'} \left(\tilde{C}\tilde{\delta}^s\right)^{1/2} \left(\tilde{C}\tilde{\delta}^t\right)^{1/2} \\
&= \tilde{C} \sum_{s=n+1}^{n'} \left(\tilde{\delta}^{1/2}\right)^{s+n+1} \sum_{t=0}^{n'-n-1} \left(\tilde{\delta}^{1/2}\right)^{t} \\
&= \tilde{C} \sum_{s=n+1}^{n'} \left(\tilde{\delta}^{1/2}\right)^{s+n+1} \left(\frac{1 - \left(\tilde{\delta}^{1/2}\right)^{n'-n}}{1 - \tilde{\delta}^{1/2}}\right) \\
&\leq \tilde{C} \, \frac{1}{1 - \tilde{\delta}^{1/2}} \sum_{s=n+1}^{n'} \left(\tilde{\delta}^{1/2}\right)^{s+n+1} \\
&= \tilde{C} \, \frac{1}{1 - \tilde{\delta}^{1/2}} \left(\tilde{\delta}^{1/2}\right)^{2n+2} \sum_{s=0}^{n'-n-1} \left(\tilde{\delta}^{1/2}\right)^{s} \\
&= \tilde{C} \, \frac{1}{1 - \tilde{\delta}^{1/2}} \left(\tilde{\delta}^{1/2}\right)^{2n+2} \left(\frac{1 - \left(\tilde{\delta}^{1/2}\right)^{n'-n}}{1 - \tilde{\delta}^{1/2}}\right) \leq \frac{\tilde{C}\tilde{\delta}}{\left(1 - \tilde{\delta}^{1/2}\right)^2} \, \tilde{\delta}^{n}
\end{aligned}
$$
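The geometric-series manipulations above can be checked numerically: the double sum is evaluated directly and compared to the final bound, in which the common factor $\tilde{C}$ cancels. A minimal sketch, with arbitrary illustrative values of $\tilde{\delta}$, $n$ and $n'$:

```python
import numpy as np

delta_t = 0.6          # tilde-delta, any value in (0, 1)
n, n_prime = 3, 40     # truncation indices with n' > n
r = np.sqrt(delta_t)   # tilde-delta^{1/2}

# Direct evaluation of sum_{s=n+1}^{n'} sum_{t=n+1}^{n'} r^{s+t},
# which factors as (sum_{s=n+1}^{n'} r^s)^2
s = np.arange(n + 1, n_prime + 1)
double_sum = (r**s).sum()**2

# Closed-form bound: delta_t / (1 - sqrt(delta_t))^2 * delta_t^n
bound = delta_t / (1 - r)**2 * delta_t**n

assert double_sum <= bound
```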

so we may take

$$
\bar{C} = \frac{\tilde{\delta}}{\left(1-\tilde{\delta}^{1/2}\right)^2} \, \tilde{C} = \frac{\tilde{\delta}}{\left(1-\tilde{\delta}^{1/2}\right)^2} \times 4 C^{\frac{\eta}{2+\eta}} D^{\frac{2}{2+\eta}} = \gamma^2 D^{\frac{2}{2+\eta}}, \qquad \gamma^2 := \frac{4 C^{\frac{\eta}{2+\eta}} \delta^{\frac{\eta}{2+\eta}}}{\left(1 - \delta^{\frac{\eta}{4+2\eta}}\right)^2} \tag{2}
$$
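As a check on the algebra in (2), the following snippet confirms that the two expressions for $\bar{C}$ agree, using sample values of $C$, $\delta$, $\eta$ and $D$ chosen purely for illustration:

```python
# Arbitrary illustrative values for the constants in Assumptions 1 and 2
C, delta, eta, D = 2.5, 0.8, 1.0, 3.0

delta_t = delta**(eta / (2 + eta))                     # tilde-delta
C_t = 4 * C**(eta / (2 + eta)) * D**(2 / (2 + eta))    # tilde-C
C_bar = delta_t / (1 - delta_t**0.5)**2 * C_t          # bar-C, first form

# gamma^2 as defined in (2)
gamma_sq = (4 * C**(eta / (2 + eta)) * delta**(eta / (2 + eta))
            / (1 - delta**(eta / (4 + 2 * eta)))**2)

# bar-C = gamma^2 * D^{2/(2+eta)}, up to floating-point rounding
assert abs(C_bar - gamma_sq * D**(2 / (2 + eta))) < 1e-9 * C_bar
```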

where $\gamma$ is an $h$-independent constant that depends only on the law of the meeting time $\tau$ for the Markov chains. The constant $\gamma$ is finite since $\delta \in (0,1)$.

The stylised bound that we present is rooted in the concept of the maximum mean discrepancy $d_{\mathcal{H}}$ associated to the reproducing kernel Hilbert space $\mathcal{H}$, defined as

$$
d_{\mathcal{H}}(\pi,\pi') := \sup_{\|f\|_{\mathcal{H}} \leq 1} |\pi(f) - \pi'(f)|.
$$

If $|h|^{2+\eta} \in \mathcal{H}$ then we have from the definition of the maximum mean discrepancy that

$$
\left|\pi\left(|h|^{2+\eta}\right) - \pi'\left(|h|^{2+\eta}\right)\right| \leq \left\||h|^{2+\eta}\right\|_{\mathcal{H}} \, d_{\mathcal{H}}(\pi,\pi').
$$

Taking $\pi'$ to be the law $\pi_t$ of $X_t$ thus gives that

$$
\begin{aligned}
\left|\pi\left(|h|^{2+\eta}\right) - \mathbb{E}\left[|h(X_t)|^{2+\eta}\right]\right| &\leq \left\||h|^{2+\eta}\right\|_{\mathcal{H}} \, d_{\mathcal{H}}(\pi,\pi_t) \\
\implies \mathbb{E}\left[|h(X_t)|^{2+\eta}\right] &\leq \pi\left(|h|^{2+\eta}\right) + \left\||h|^{2+\eta}\right\|_{\mathcal{H}} \, d_{\mathcal{H}}(\pi,\pi_t) \\
\implies \sup_{t \geq 0} \mathbb{E}\left[|h(X_t)|^{2+\eta}\right] &\leq \pi\left(|h|^{2+\eta}\right) + \left\||h|^{2+\eta}\right\|_{\mathcal{H}} \sup_{t \geq 0} d_{\mathcal{H}}(\pi,\pi_t).
\end{aligned}
$$

Thus we may take the constant $D$ in Assumption 1 to be

$$
D = \pi\left(|h|^{2+\eta}\right) + \left\||h|^{2+\eta}\right\|_{\mathcal{H}} \sup_{t \geq 0} d_{\mathcal{H}}(\pi,\pi_t). \tag{3}
$$

In what follows we let $\lambda := \sup_{t \geq 0} d_{\mathcal{H}}(\pi,\pi_t)$ denote an $h$-independent constant that depends only on the law of the Markov chain used. It is necessary to check that $\lambda$ is finite. Let $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ be the inner product in $\mathcal{H}$. The assumption that $\mathcal{H}$ is a reproducing kernel Hilbert space with kernel $k$ means that $|h(x)| = |\langle h, k(\cdot,x) \rangle_{\mathcal{H}}| \leq \|h\|_{\mathcal{H}} \, k(x,x)^{1/2}$, from the reproducing property and the Cauchy--Schwarz inequality. Since the kernel was assumed to satisfy $\sup_x k(x,x) \leq 1$, it follows that $\|h\|_{\infty} \leq \|h\|_{\mathcal{H}}$. Thus

$$
d_{\mathcal{H}}(\pi,\pi') = \sup_{\|h\|_{\mathcal{H}} \leq 1} |\pi(h) - \pi'(h)| \leq \sup_{\|h\|_{\infty} \leq 1} |\pi(h) - \pi'(h)| = d_{\mathrm{TV}}(\pi,\pi').
$$
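The inequality $d_{\mathcal{H}} \leq d_{\mathrm{TV}}$ can be illustrated on a finite state space, where the maximum mean discrepancy has the closed form $\sqrt{u^{\top} K u}$ with $u = \pi - \pi'$ and $K$ the kernel Gram matrix. The sketch below is our own illustration and assumes a Gaussian kernel, which satisfies $k(x,x) = 1$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 8)                   # small finite state space
K = np.exp(-(x[:, None] - x[None, :])**2)      # Gaussian kernel: k(x,x) = 1

p = rng.dirichlet(np.ones(8))                  # distribution pi
q = rng.dirichlet(np.ones(8))                  # distribution pi'
u = p - q

mmd = np.sqrt(u @ K @ u)   # d_H(pi, pi') on a finite state space
tv = np.abs(u).sum()       # sup over ||h||_inf <= 1 of |pi(h) - pi'(h)|

assert mmd <= tv
```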

Thus $\lambda \leq \sup_{t \geq 0} d_{\mathrm{TV}}(\pi,\pi_t) < \infty$, as required.

To complete the argument we proceed as follows:

$$
\begin{aligned}
\mathbb{E}\left[\left(H_0^{n'}(X,Y) - H_0^{n}(X,Y)\right)^2\right] &\leq \bar{C}\tilde{\delta}^{n} \\
\implies \left|\mathbb{E}\left[\left(H_0^{n'}(X,Y)\right)^2\right]^{1/2} - \mathbb{E}\left[\left(H_0^{n}(X,Y)\right)^2\right]^{1/2}\right| &\leq \left(\bar{C}\tilde{\delta}^{n}\right)^{1/2} && \text{(reverse Minkowski inequality)} \\
\implies \mathbb{E}\left[H_0(X,Y)^2\right]^{1/2} &\leq \bar{C}^{1/2} + \mathbb{E}\left[h(X_0)^2\right]^{1/2} && \text{(taking $n=0$, $n'=\infty$)} \\
\implies \sigma(h) &\leq \bar{C}^{1/2} + \mathbb{E}\left[h(X_0)^2\right]^{1/2} && \text{(since $\mathbb{V}[Z] \leq \mathbb{E}[Z^2]$)} \\
&\leq \gamma D^{\frac{1}{2+\eta}} + \mathbb{E}\left[h(X_0)^2\right]^{1/2} && \text{(from (2))} \\
&\leq \gamma\left(\pi\left(|h|^{2+\eta}\right) + \lambda \left\||h|^{2+\eta}\right\|_{\mathcal{H}}\right)^{\frac{1}{2+\eta}} + \mathbb{E}\left[h(X_0)^2\right]^{1/2}
\end{aligned}
$$

where the final line follows from (3) and the fact that $x \mapsto x^{\frac{1}{2+\eta}}$ is non-decreasing.
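The reverse Minkowski step used above (the reverse triangle inequality in $L^2$) can likewise be checked on an empirical distribution; in this sketch of our own, the two arrays are arbitrary correlated stand-ins for $H_0^{n'}(X,Y)$ and $H_0^{n}(X,Y)$:

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.normal(size=50_000)               # stand-in for H_0^{n'}(X, Y)
b = a + 0.5 * rng.normal(size=50_000)     # stand-in for H_0^{n}(X, Y)

# | ||a||_2 - ||b||_2 | <= ||a - b||_2 under the empirical measure
lhs = abs(np.sqrt(np.mean(a**2)) - np.sqrt(np.mean(b**2)))
rhs = np.sqrt(np.mean((a - b)**2))

assert lhs <= rhs
```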