A note on concentration inequality for vector-valued martingales with weak exponential-type tails

09/06/2018 ∙ by Chris Junchi Li, et al.

We present novel martingale concentration inequalities for martingale differences with finite Orlicz-ψ_α norms. Such martingale differences with weak exponential-type tails arise in many statistical applications and can be heavier than sub-exponential distributions. In the one-dimensional case, we prove in general that for a sequence of scalar-valued supermartingale differences, the tail bound depends solely on the sum of squared Orlicz-ψ_α norms instead of the maximal Orlicz-ψ_α norm, generalizing the results of Lesigne & Volný (2001) and Fan et al. (2012). In the multidimensional case, using a dimension reduction lemma proposed by Kallenberg & Sztencel (1991), we show that essentially the same concentration tail bound holds for vector-valued martingale difference sequences.






1 Introduction

This note concerns the following problem: let be a vector-valued martingale difference sequence taking values in the -dimensional Euclidean space , where . Assume that satisfies the following weak exponential-type tail condition: for some and all we have, for some scalar ,


and hence by Markov’s inequality their tails satisfy, for each ,

then what can we conclude about the tail probability of the random variable ? Note that under condition (1.1), the moment generating functions are in general not available, hence the classical analysis via moment generating functions does not go through, and new analytical tools are in demand.
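To illustrate the Markov-inequality step just mentioned, here is a sketch under an assumed form of the weak exponential-moment condition (the symbols C and α are our own placeholder notation for the scalar and exponent left implicit above):

```latex
% Sketch: if E[exp(|\xi_i|^\alpha)] \le C for some \alpha \in (0,1],
% then Markov's inequality applied to the event
% {exp(|\xi_i|^\alpha) \ge exp(t^\alpha)} yields a weak exponential-type tail:
\[
  \mathbb{P}\bigl(|\xi_i| \ge t\bigr)
  = \mathbb{P}\bigl(e^{|\xi_i|^{\alpha}} \ge e^{t^{\alpha}}\bigr)
  \le e^{-t^{\alpha}}\,\mathbb{E}\bigl[e^{|\xi_i|^{\alpha}}\bigr]
  \le C\, e^{-t^{\alpha}}, \qquad t \ge 0.
\]
```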

Our result makes several contributions beyond previous works. First, we conclude that in the one-dimensional case, where one denotes , a one-sided maximal inequality holds: roughly,


where the factor depends solely on for any fixed and grows linearly in , and is a positive numerical constant. Here and in what follows, we allow the numerical constant to change from paragraph to paragraph. This generalizes the bounds of Lesigne & Volný (2001) and Fan et al. (2012), where the two groups of authors consider the independent case and the martingale difference sequence case, respectively. See also the more recent paper Fan et al. (2017) for a similar concentration result under a slightly weaker condition. In fact, we also know that inequality (1.2) is optimal in the sense that it cannot be further improved over a class of martingale difference sequences satisfying the exponential moment condition (1.1).

Second, for the general-dimensional case, applying (1.2) together with a dimension-reduction argument for vector-valued martingales (Kallenberg & Sztencel, 1991; Hayes, 2005; Lee et al., 2016) allows us to conclude a one-sided bound on the Euclidean norm: under (1.1) we have


where, analogously, the factor depends solely on for any fixed and grows linearly in , and is a positive numerical constant. To the best of our knowledge, this provides the first concentration result for vector-valued martingales with unbounded martingale differences under the weak exponential-type condition (1.1).

The concentration results (1.2) and (1.3) potentially see many applications in probability and statistics, including the rate of convergence of martingales, the consistency of nonparametric regression estimation with martingale-difference errors (see Laib (1999)), as well as online stochastic gradient algorithms for parameter estimation in linear models and PCA (Li et al., 2018).

2 Orlicz space and Orlicz norm

In this section, we briefly revisit the properties of the Orlicz space and its norm that are most relevant. Readers interested in an exposition of Orlicz spaces from a Banach space point of view are referred to Ledoux & Talagrand (2013).

Let be the set of nonnegative real numbers. Consider the Orlicz space of -valued random vectors living on the probability space such that for some . Let be a nondecreasing convex function with and , and equip the Orlicz space with the norm

One calls the Orlicz- norm. In particular, a random vector has an Orlicz- norm defined as the Orlicz- norm of as a scalar-valued random variable.

In this note, we are interested in the exponential-tailed distributions that correspond to the family of functions , , in which case the corresponding Orlicz space is the collection of random variables with exponential moments . [1] Rigorously speaking, is not convex in a neighborhood of 0 when . In this case, one can let the function be

for some large enough, so that the function satisfies the convexity condition. We choose not to adopt this definition of , simply for clarity of presentation.
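As a concrete illustration of the Orlicz norm definition, the following sketch estimates the Orlicz-ψ_α norm of a sample by bisection, assuming ψ_α(x) = exp(x^α) − 1 and replacing the expectation by a sample mean; the function names are our own, not the paper's:

```python
import numpy as np

def psi_alpha(x, alpha):
    """The Orlicz function psi_alpha(x) = exp(x^alpha) - 1."""
    return np.exp(x ** alpha) - 1.0

def empirical_orlicz_norm(samples, alpha, tol=1e-6):
    """Approximate ||X||_{psi_alpha} = inf{c > 0 : E[psi_alpha(|X|/c)] <= 1}
    by bisection, with the expectation replaced by a sample mean."""
    x = np.abs(np.asarray(samples, dtype=float))
    lo, hi = tol, 1.0
    # Grow hi until the sample mean drops below 1 (the mean decreases in c).
    while np.mean(psi_alpha(x / hi, alpha)) > 1.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.mean(psi_alpha(x / mid, alpha)) > 1.0:
            lo = mid
        else:
            hi = mid
    return hi

rng = np.random.default_rng(0)
# Exponential samples have a finite Orlicz-psi_1 norm.
norm_est = empirical_orlicz_norm(rng.exponential(size=10_000), alpha=1.0)
```

For i.i.d. Exponential(1) samples and α = 1, one has E[ψ_1(X/c)] = 1/(c − 1) for c > 1, so the population norm is exactly 2; the estimate above should lie near that value, up to sampling error.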

3 One dimensional result

We state our first main result, which establishes the right-tail bound (1.2) under the slightly more general condition that forms a supermartingale difference sequence.

Theorem 1.

Let be given. Assume that is a sequence of supermartingale differences with respect to , i.e. , and that for each . Then for arbitrary and ,



We make several remarks on Theorem 1, as follows.

  1. By replacing by a larger value in (3.1) of Theorem 1, one may essentially rediscover Theorem 2.1 in Fan et al. (2012), which includes bound (1.1) of Lesigne & Volný (2001) as a special case. [2] The work Fan et al. (2012) assumes a slightly more general condition . Nevertheless, our result loses no generality, since can be absorbed into the Orlicz- norm as a polylogarithmic factor. In summary, Theorem 2.1 of Fan et al. (2012) provides a bound that depends on the maximum of , while our new bound sharpens that of Fan et al. (2012) and depends only on the Orlicz- norms of the martingale differences through their squared sum. The sharpened bound turns out to be more useful for deriving upper bounds in many statistical applications.

  2. Theorem 2.1 in Fan et al. (2012) is optimal in the sense that there exists a counterexample whose tail matches the right-hand side of (3.1) as a lower bound (up to a constant factor in the exponent), ruling out a sharper bound over the class of martingale difference sequences. Since our result generalizes their Theorem 2.1, one may apply the same counterexample to conclude the optimality of our bound. See more in the next paragraph.

Optimality of our result

To claim optimality we note that (3.1) implies, for the special case and each ,


which is as for some . Meanwhile, Fan et al. (2012) generalize the counterexample in Lesigne & Volný (2001) where, in our terminology of -norm, Theorem 2.1 of Fan et al. (2012) provides for each an ergodic sequence of martingale differences and a sequence of positives such that for all sufficiently large,

Comparing the last equation with (3.2), we conclude the optimality of our result.

Comparison with conditional weak exponential-type conditions

If we impose the additional assumption that ’s satisfy (1.1) in the conditional sense, the martingale concentration inequality can be further improved. Taking the example where and , if one imposes the slightly stronger condition


i.e. the martingale differences are scalar-valued and conditionally subgaussian, then one may conclude from Hoeffding’s concentration inequality (Wainwright, 2015) that


A similar bound can be derived for sub-exponential variables. Observe that the power of the term in the exponent of (3.4) is 1, whereas our bound in (1.2) has an exponent of and is hence worse. Fortunately, to obtain an error probability , both inequalities give cut-off points that differ only by polylogarithmic factors of , and the two cut-off points are equivalent once these factors are ignored.
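For reference, the conditionally subgaussian version of the bound discussed above can be stated as follows (a standard formulation; the σ_i notation is ours):

```latex
% Azuma--Hoeffding under a conditional subgaussian assumption:
% if E[exp(\lambda \xi_i) \mid \mathcal{F}_{i-1}] \le exp(\lambda^2 \sigma_i^2 / 2)
% for all \lambda \in \mathbb{R}, then for every x \ge 0,
\[
  \mathbb{P}\left( \sum_{i=1}^{n} \xi_i \ge x \right)
  \le \exp\!\left( - \frac{x^{2}}{2 \sum_{i=1}^{n} \sigma_i^{2}} \right).
\]
```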

4 Proof of Theorem 1

To prove our main result for the one-dimensional case, Theorem 1, we will use a maximal version of the classical Azuma-Hoeffding inequality proposed by Laib (1999) for bounded martingale differences, and then apply an argument of Lesigne & Volný (2001) and Fan et al. (2012) to truncate the tail and analyze the bounded and unbounded pieces separately.
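Schematically, the truncation step splits each difference at a level M > 0 into a truncated piece and a tail piece, each recentered so as to remain a martingale difference; the notation below is our own, and the paper's exact definitions in (4.3)-(4.4) may differ slightly in the one-sided supermartingale case:

```latex
% Truncation at level M > 0, with \mathcal{F}_{i-1} the past sigma-algebra:
\[
  \eta_i = \xi_i \mathbf{1}\{\xi_i \le M\}
         - \mathbb{E}\bigl[\xi_i \mathbf{1}\{\xi_i \le M\} \,\big|\, \mathcal{F}_{i-1}\bigr],
  \qquad
  \theta_i = \xi_i \mathbf{1}\{\xi_i > M\}
           - \mathbb{E}\bigl[\xi_i \mathbf{1}\{\xi_i > M\} \,\big|\, \mathcal{F}_{i-1}\bigr].
\]
% When E[\xi_i \mid \mathcal{F}_{i-1}] = 0 the two conditional means cancel,
% so \xi_i = \eta_i + \theta_i, with each piece a martingale difference:
% \eta_i is the truncated piece, while \theta_i carries the heavy right tail.
```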

  1. First, for simplicity and with no loss of generality, throughout the following proof of Theorem 1 we shall impose the extra condition


    In other words, under the additional condition (4.1), proving (3.1) reduces to showing


    This can be made clearer by the following rescaling argument: one can put in the left-hand side of (4.2) in the place of , and in the place of ; the left-hand side of (3.1) is then just

    which, by (4.2), is upper-bounded by

    proving (3.1).

  2. We apply a truncation argument used in Lesigne & Volný (2001) and later in Fan et al. (2012). Let be arbitrary, and define


    Since is -measurable, and are two martingale difference sequences with respect to , and let be defined as


    Since are martingale differences we have that is predictable with , and hence for any ,


    In the following, we analyze the tail bounds for and separately (Lesigne & Volný, 2001; Fan et al., 2012).

  3. To obtain the first bound, we recap Laib’s inequality as follows:

    Lemma 1.

    (Laib, 1999) Let be a real-valued martingale difference sequence with respect to some filtration , i.e. , and the essential norm is finite. Then for an arbitrary and ,


    (4.7) generalizes the classical Azuma-Hoeffding inequality, where the latter can be concluded from

    The proof of Lemma 1 is given in Laib (1999). Recalling our extra condition (4.1) and, from the definition of in (4.3), that , the desired bound follows immediately from Laib’s inequality in Lemma 1 by setting :


    To obtain the tail bound of we only need to show




    from which, Doob’s martingale inequality (Durrett, 2010, §5) implies immediately that


    To prove (4.9), first recall from the definition of in (4.4) that

    Recall from the property of conditional expectation (Durrett, 2010) that for any random variable and a -algebra

    where the last equality is due to the second-moment formula for nonnegative random variables (Durrett, 2010). Plugging in and we have


    where the last inequality is due to Markov’s inequality that for all


    It can be shown by elementary calculus that the function is decreasing in and increasing in , where was defined earlier in (4.10) (Fan et al., 2012). If we have


    If , we have by setting as above


    Combining (4.12) with the two displays above, (4.14) and (4.15), we obtain

    completing the proof of (4.9) and hence (4.11).

  4. Putting the pieces together: combining (4.6), (4.8) and (4.11) we obtain for an arbitrary that


    We choose , by equating the exponents above, as

    Plugging this back into (4.16) we obtain


    where we plugged in the expression of from (4.10). The square-bracket prefactor in the last line of (4.17) can be tightly bounded by

    where we applied a few elementary algebraic inequalities, including and a variant of Jensen’s inequality: for one has for all (where equality holds for ). Thus, (4.2) is concluded by noticing the relation (4.5) and setting in the place of , which proves Theorem 1 via the rescaling argument in step 1 of our proof.
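Two elementary facts used in the proof are easy to check numerically: the maximal Azuma-Hoeffding bound that Lemma 1 generalizes, and the subadditivity inequality (a + b)^p ≤ a^p + b^p for p ∈ (0, 1] invoked in the last step. A small sketch (our own illustration, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(1)

# (i) Maximal Azuma-Hoeffding for Rademacher steps |xi_i| <= 1:
#     P(max_k S_k >= x) <= exp(-x^2 / (2n)).
n, trials, x = 200, 20_000, 30.0
steps = rng.choice([-1.0, 1.0], size=(trials, n))
max_partial_sum = steps.cumsum(axis=1).max(axis=1)
empirical_tail = np.mean(max_partial_sum >= x)
azuma_bound = np.exp(-x ** 2 / (2 * n))

# (ii) Subadditivity of t -> t^p on [0, inf) for 0 < p <= 1:
#     (a + b)^p <= a^p + b^p, with equality at p = 1.
a = rng.uniform(0.0, 10.0, size=1000)
b = rng.uniform(0.0, 10.0, size=1000)
subadditive = all(
    np.all((a + b) ** p <= a ** p + b ** p + 1e-12)
    for p in (0.25, 0.5, 0.75, 1.0)
)
```

The empirical tail of the running maximum should sit comfortably below the Azuma-Hoeffding bound, and the subadditivity check should pass for every tested exponent.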

5 General dimensions result

In many applications we are often interested in concentration tail inequalities for vector-valued martingales. To proceed, we need a so-called dimension reduction lemma for Hilbert spaces, inspired by its continuous-time version proved in Kallenberg & Sztencel (1991). We argue that it suffices to prove it for the case . Writing in terms of martingale differences, we have

Lemma 2 (Dimension reduction lemma for or Hilbert space).

Let be a -valued martingale difference sequence with respect to filtration , i.e. for each , . Then there exists a -valued martingale difference sequence with respect to the same filtration so that for each


For a proof of Lemma 2, see Lemma 2.3 of Lee et al. (2016), which proves the lemma on a generic Hilbert space.

Theorem 2.

Let be given. Assume that is a sequence of -valued martingale differences with respect to , i.e. , and that for each . Then for arbitrary and ,


Theorem 2 shows that the martingale inequality holds with a dimension-free property: the bound on the right-hand side of (5.2) is independent of the dimension and depends on the martingale differences only through .

Proof of Theorem 2.

From Lemma 2 we have a -valued martingale difference sequence such that (5.1) holds for each . It is straightforward to verify that for each . Therefore, to prove (5.2) we only need to show


Note by definition, for . Applying Theorem 1 to both and as supermartingale difference sequences, we have for

Thus (5.3) follows from the union bound. ∎
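The final union-bound step rests on the elementary fact that if a two-dimensional vector has Euclidean norm at least x, then at least one coordinate exceeds x/√2 in absolute value. A quick numerical illustration of this implication (our own, with arbitrary Gaussian inputs):

```python
import numpy as np

rng = np.random.default_rng(3)
S = 5.0 * rng.normal(size=(10_000, 2))  # arbitrary 2D random vectors
x = 4.0

norm_exceeds = np.linalg.norm(S, axis=1) >= x
coord_exceeds = np.abs(S).max(axis=1) >= x / np.sqrt(2)

# ||S||^2 = S_1^2 + S_2^2 >= x^2 implies max(S_1^2, S_2^2) >= x^2 / 2,
# so every sample exceeding the norm threshold also exceeds the coordinate one:
implication_holds = bool(np.all(coord_exceeds[norm_exceeds]))

# Consequently the tail of the norm is at most the union bound
# P(|S_1| >= x / sqrt(2)) + P(|S_2| >= x / sqrt(2)), empirically as well:
union_bound = (np.mean(np.abs(S[:, 0]) >= x / np.sqrt(2))
               + np.mean(np.abs(S[:, 1]) >= x / np.sqrt(2)))
```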

It remains an open question whether similar concentration inequalities hold for martingale differences with polynomial tails, i.e. where satisfies for . In the case where ’s are independent, Theorem 6.21 of Ledoux & Talagrand (2013) gives a bound on the sum of vectors that can be turned into a tail inequality, but to the best of our knowledge a general result for martingale differences (even just in one dimension) is not available and is left for future research.


The author thanks Xiequan Fan for valuable comments on an earlier version of this note.


  • Durrett (2010) Durrett, R. (2010). Probability: Theory and Examples (4th edition). Cambridge University Press.
  • Fan et al. (2012) Fan, X., Grama, I., & Liu, Q. (2012). Large deviation exponential inequalities for supermartingales. Electronic Communications in Probability, 17.
  • Fan et al. (2017) Fan, X., Grama, I., & Liu, Q. (2017). Deviation inequalities for martingales with applications. Journal of Mathematical Analysis and Applications, 448(1), 538–566.
  • Hayes (2005) Hayes, T. P. (2005). A large-deviation inequality for vector-valued martingales.
  • Kallenberg & Sztencel (1991) Kallenberg, O. & Sztencel, R. (1991). Some dimension-free features of vector-valued martingales. Probability Theory and Related Fields, 88(2), 215–247.
  • Laib (1999) Laib, N. (1999). Exponential-type inequalities for martingale difference sequences. Application to nonparametric regression estimation. Communications in Statistics-Theory and Methods, 28(7), 1565–1576.
  • Ledoux & Talagrand (2013) Ledoux, M. & Talagrand, M. (2013). Probability in Banach Spaces: isoperimetry and processes. Springer Science & Business Media.
  • Lee et al. (2016) Lee, J. R., Peres, Y., & Smart, C. K. (2016). A gaussian upper bound for martingale small-ball probabilities. The Annals of Probability, 44(6), 4184–4197.
  • Lesigne & Volný (2001) Lesigne, E. & Volný, D. (2001). Large deviations for martingales. Stochastic Processes and their Applications, 96(1), 143–159.
  • Li et al. (2018) Li, C. J., Wang, M., Liu, H., & Zhang, T. (2018). Near-optimal stochastic approximation for online principal component estimation. Mathematical Programming, 167(1), 75–97.
  • Wainwright (2015) Wainwright, M. (2015). Basic tail and concentration bounds. URL: https://www.stat.berkeley.edu/ mjwain/stat210b/Chap2_TailBounds_Jan22_2015.pdf.