This note concerns the following problem: let be a vector-martingale difference sequence taking values in the -dimensional Euclidean space , where . Assume that satisfies the following weak exponential-type tail condition: for some and all we have for some scalar
and hence by Markov’s inequality their tails satisfy for each
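In one standard formulation (our notation and constants, stated here only as an illustration of the mechanism), an exponential Orlicz-type moment bound converts to a tail bound via Markov's inequality as follows:

```latex
% Assume \mathbb{E}\exp\bigl((|\xi|/K)^{\alpha}\bigr) \le 2 for some K>0, \alpha\in(0,1].
% Applying Markov's inequality to the nonnegative variable e^{(|\xi|/K)^{\alpha}}:
\mathbb{P}\bigl(|\xi| \ge t\bigr)
  \;=\; \mathbb{P}\Bigl(e^{(|\xi|/K)^{\alpha}} \ge e^{(t/K)^{\alpha}}\Bigr)
  \;\le\; \mathbb{E}\Bigl[e^{(|\xi|/K)^{\alpha}}\Bigr]\, e^{-(t/K)^{\alpha}}
  \;\le\; 2\, e^{-(t/K)^{\alpha}}, \qquad t > 0.
```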
), the moment generating functions are in general not available; hence the classical analysis via moment generating functions does not go through, and new analytical tools are in demand.
Our result makes several contributions beyond previous works. First, in the one-dimensional case, where one denotes , we conclude a one-sided maximal inequality of the form, roughly,
where the factor depends only on for any fixed and grows linearly in , and is a positive numerical constant. Here and in what follows, we allow the numerical constant to change from paragraph to paragraph. This generalizes the bounds of Lesigne & Volný (2001) and Fan et al. (2012), where the two groups of authors consider only the case , in the independent and martingale difference settings, respectively. See also the more recent paper Fan et al. (2017) for similar concentration results under a slightly weaker condition. In fact, we also show that inequality (1.2) is optimal, in the sense that it cannot be further improved over a class of martingale difference sequences satisfying the exponential moment condition (1.1).
Second, for the general-dimension case, applying (1.2) together with a dimension-reduction argument for vector-valued martingales (Kallenberg & Sztencel, 1991; Hayes, 2005; Lee et al., 2016) allows us to conclude a one-sided bound on the Euclidean norm: under (1.1) we have
where, analogously, the factor depends only on for any fixed and grows linearly in , and is a positive numerical constant. To the best of our knowledge, this provides the first concentration result for vector-valued martingales with unbounded martingale differences under the weak exponential-type condition (1.1).
) potentially see many applications in probability and statistics, including the rate of convergence of martingales, the consistency of nonparametric regression estimators with martingale-difference errors (see Laib (1999)), as well as online stochastic gradient algorithms for parameter estimation in linear models and PCA (Li et al., 2018).
2 Orlicz space and Orlicz norm
In this subsection, we briefly revisit the properties of the Orlicz space and its -norm that are most relevant. Readers interested in an exposition of Orlicz spaces from a Banach space point of view are referred to Ledoux & Talagrand (2013).
Let be the set of nonnegative real numbers. Consider the Orlicz space of -valued random vectors living on the probability space such that for some . Let be a nondecreasing convex function with and , and equip the Orlicz space with the norm
One calls the Orlicz- norm. In particular, a random vector has an Orlicz- norm defined as the Orlicz- norm of as a scalar-valued random variable.
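For concreteness, the standard definition of the Orlicz norm (in common notation, which we take to match the display above) reads:

```latex
\|X\|_{\psi} \;:=\; \inf\Bigl\{\, t>0 \;:\; \mathbb{E}\,\psi\bigl(\|X\|_{2}/t\bigr) \le 1 \,\Bigr\}.
```

Convexity and monotonicity of the function, together with its vanishing at zero, are what make this quantity a genuine norm on the Orlicz space.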
In this note, we are interested in exponential-tailed distributions, which correspond to a family of functions , , in which case the corresponding Orlicz space is the collection of random variables with exponential moments .
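The standard family in question (stated here in common notation, which we assume matches the elided display) is:

```latex
\psi_{\alpha}(x) \;=\; e^{x^{\alpha}} - 1, \qquad x \ge 0,\ \alpha > 0,
```

where the case corresponds to sub-exponential and to subgaussian random variables; a finite Orlicz norm with respect to this family is equivalent to finiteness of an exponential moment of the corresponding order.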
Rigorously speaking, when , the function is not convex in a neighborhood of 0.
In this case, one can let the function be
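In case it helps the reader, here is a short computation (ours, for the family written above) locating where convexity fails:

```latex
% For \psi_{\alpha}(x) = e^{x^{\alpha}} - 1 with 0 < \alpha < 1:
\psi_{\alpha}''(x)
  \;=\; \alpha\, x^{\alpha-2}\, e^{x^{\alpha}} \bigl(\alpha x^{\alpha} - (1-\alpha)\bigr),
```

so the function is convex precisely on the interval where the last factor is nonnegative, i.e. for \(x \ge \bigl((1-\alpha)/\alpha\bigr)^{1/\alpha}\). Any monotone convex modification on the initial segment agrees with the original for large arguments and therefore changes the resulting Orlicz norm only up to a universal constant factor.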
3 One dimensional result
We state our first main result, which establishes the right-tail bound (1.2) under the slightly more general condition that forms a supermartingale difference sequence.
Let be given. Assume that is a sequence of supermartingale differences with respect to , i.e. , and it satisfies for each . Then for an arbitrary and ,
We make several remarks on Theorem 1, as follows.
By replacing with a larger value in (3.1) of Theorem 1, one essentially rediscovers Theorem 2.1 of Fan et al. (2012), which includes bound (1.1) of Lesigne & Volný (2001) as the special case . (The work Fan et al. (2012) assumes a slightly more general condition . Nevertheless, our result loses no generality, since can be absorbed into the Orlicz- norm as a polylogarithmic factor.) In summary, Theorem 2.1 of Fan et al. (2012) provides a bound that depends on the maximum of , while our new bound sharpens theirs and depends only on the Orlicz- norm of the squared sum of the martingale differences. The sharpened bound turns out to be more convenient for obtaining useful upper bounds in many statistical applications.
Theorem 2.1 of Fan et al. (2012) is optimal in the sense that there is a counterexample attaining the right-hand side of (3.1) as a lower bound (up to a constant factor in the exponent), which forbids the existence of a sharper bound over this class of martingale difference sequences. Since our result generalizes their Theorem 2.1, one may apply the same counterexample and conclude the optimality of our bound. See more in the next paragraph.
Optimality of our result
To claim optimality we note that (3.1) implies, for the special case and each ,
which is as for some . Meanwhile, Fan et al. (2012) generalize the counterexample in Lesigne & Volný (2001): in our terminology of the -norm, Theorem 2.1 of Fan et al. (2012) provides for each an ergodic sequence of martingale differences and a sequence of positive reals such that for all sufficiently large ,
Comparing the last equation with (3.2), we conclude the optimality of our result.
Comparison with conditional weak exponential-type conditions
If we impose the additional assumption that the ’s satisfy (1.1) in the conditional sense, the martingale concentration inequality can be further improved. Taking the example where and , if one imposes the slightly stronger condition
i.e. the martingale differences are scalar-valued and conditionally subgaussian, then one may conclude from Hoeffding’s concentration inequality (Wainwright, 2015)
A similar bound can be derived for sub-exponential variables. Observe that the power of the term in the exponent of (3.4) is 1, whereas our bound in (1.2) has an exponent of and is hence worse. Fortunately, to obtain an error probability , both inequalities give a cut-off point up to a different polylogarithmic factor of , and these two cut-off points are equivalent if these factors are ignored.
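As an illustrative sanity check (our own example, not part of the note's argument), the following sketch compares the empirical tail of a Rademacher-sum martingale, which is a bounded martingale difference sequence and hence conditionally subgaussian, against the Hoeffding-type bound of the form discussed above:

```python
import math
import random

def empirical_tail(n, t, trials, seed=0):
    """Monte Carlo estimate of P(S_n >= t), where S_n is a sum of n
    i.i.d. Rademacher (+/-1) signs -- a bounded martingale difference
    sequence with conditional subgaussian parameter 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(1 if rng.random() < 0.5 else -1 for _ in range(n))
        if s >= t:
            hits += 1
    return hits / trials

n, t = 400, 40
# Hoeffding / Azuma bound for +/-1 differences: P(S_n >= t) <= exp(-t^2 / (2n)).
bound = math.exp(-t ** 2 / (2 * n))
freq = empirical_tail(n, t, trials=20000)
print(f"empirical tail {freq:.4f} vs. Hoeffding bound {bound:.4f}")
```

The empirical frequency sits comfortably below the theoretical bound, as expected; the bound is loose by a constant factor here, consistent with the polylogarithmic slack mentioned above.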
4 Proof of Theorem 1
To prove our main result for the one-dimensional case, Theorem 1, we use a maximal version of the classical Azuma–Hoeffding inequality, due to Laib (1999), for bounded martingale differences, and then apply an argument of Lesigne & Volný (2001) and Fan et al. (2012) to truncate the tail and analyze the bounded and unbounded pieces separately.
First of all, for simplicity and without loss of generality, throughout the following proof of Theorem 1 we impose the extra condition
which, by (4.2), is upper-bounded by
Since is -measurable, and are two martingale difference sequences with respect to . Let be defined as
Since are martingale differences we have that is predictable with , and hence for any ,
To obtain the first bound, we recap Laib’s inequality as follows:
(Laib, 1999) Let be a real-valued martingale difference sequence with respect to some filtration , i.e. , and the essential norm is finite. Then for an arbitrary and ,
Inequality (4.7) generalizes the well-known Azuma–Hoeffding inequality; the latter can be recovered from
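For reference, the classical Azuma–Hoeffding inequality reads, in its standard form (our notation):

```latex
% For a martingale difference sequence (\xi_i) with |\xi_i| \le c_i a.s.:
\mathbb{P}\Bigl(\sum_{i=1}^{n} \xi_i \ge x\Bigr)
  \;\le\; \exp\Bigl(-\frac{x^{2}}{2\sum_{i=1}^{n} c_i^{2}}\Bigr),
  \qquad x > 0.
```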
The proof of Lemma 1 is given in Laib (1999). Recalling our extra condition (4.1) and, from the definition of in (4.3), that , the desired bound follows immediately from Laib’s inequality in Lemma 1 by setting :
To obtain the tail bound of we only need to show
from which, Doob’s martingale inequality (Durrett, 2010, §5) implies immediately that
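The version of Doob's inequality being invoked is, in its standard form (our notation):

```latex
% Doob's maximal inequality for a nonnegative submartingale (M_k):
\mathbb{P}\Bigl(\max_{1 \le k \le n} M_k \ge \lambda\Bigr)
  \;\le\; \frac{\mathbb{E}[M_n]}{\lambda}, \qquad \lambda > 0.
```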
Recall from the property of conditional expectation (Durrett, 2010) that for any random variable and a -algebra
where the last equality is due to the second-moment formula for nonnegative random variables (Durrett, 2010). Plugging in and , we have
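The second-moment (layer-cake) formula referred to here is the standard identity:

```latex
\mathbb{E}\bigl[Y^{2}\bigr] \;=\; \int_{0}^{\infty} 2t\,\mathbb{P}(Y > t)\,\mathrm{d}t
\qquad \text{for any nonnegative random variable } Y,
```

which follows from Fubini's theorem applied to \(Y^2 = \int_0^\infty 2t\,\mathbf{1}\{Y > t\}\,\mathrm{d}t\).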
where the last inequality is due to Markov’s inequality that for all
If , then setting as above we have
We choose by equating the exponents above:
Plugging this back into (4.16) we obtain
where we applied a few elementary algebraic inequalities, including and a variant of Jensen’s inequality: for , one has for all (with equality for ). Thus, (4.2) follows by noticing relation (4.5) and setting in place of , which proves Theorem 1 via the argument in part (i) of our proof.
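The variant of Jensen's inequality invoked above is, presumably, the standard subadditivity of concave powers:

```latex
(x + y)^{\alpha} \;\le\; x^{\alpha} + y^{\alpha},
\qquad x, y \ge 0,\ 0 < \alpha \le 1,
```

with equality when \(\alpha = 1\), which follows from concavity of \(u \mapsto u^{\alpha}\) and its vanishing at the origin.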
5 General dimensions result
In many applications we are more interested in a concentration tail inequality for vector-valued martingales. To proceed, we need a so-called dimension reduction lemma for Hilbert spaces, inspired by its continuous-time version proved in Kallenberg & Sztencel (1991). We argue that it suffices to prove the lemma for the case . Writing in terms of martingale differences, we have
Lemma 2 (Dimension reduction lemma for or Hilbert space).
Let be a -valued martingale difference sequence with respect to filtration , i.e. for each , . Then there exists a -valued martingale difference sequence with respect to the same filtration so that for each
Let be given. Assume that is a sequence of -valued martingale differences with respect to , i.e. , and it satisfies for each . Then for an arbitrary and ,
Theorem 2 shows that the martingale inequality holds with a dimension-free property: the bound on the right-hand side of (5.2) is independent of the dimension and depends on the martingale differences only through .
Proof of Theorem 2.
It remains an open question whether similar concentration inequalities hold for polynomial-tailed martingale differences, where satisfies for . In the case where the ’s are independent, Theorem 6.21 of Ledoux & Talagrand (2013) gives a bound on the sum of vectors that can be turned into a tail inequality; but to the best of our knowledge, a general result for martingale differences (even in one dimension) is not available and is left for future research.
The author thanks Xiequan Fan for valuable comments on an earlier version of this note.
- Durrett (2010) Durrett, R. (2010). Probability: Theory and Examples (4th edition). Cambridge University Press.
- Fan et al. (2012) Fan, X., Grama, I., & Liu, Q. (2012). Large deviation exponential inequalities for supermartingales. Electronic Communications in Probability, 17.
- Fan et al. (2017) Fan, X., Grama, I., & Liu, Q. (2017). Deviation inequalities for martingales with applications. Journal of Mathematical Analysis and Applications, 448(1), 538–566.
- Hayes (2005) Hayes, T. P. (2005). A large-deviation inequality for vector-valued martingales.
- Kallenberg & Sztencel (1991) Kallenberg, O. & Sztencel, R. (1991). Some dimension-free features of vector-valued martingales. Probability Theory and Related Fields, 88(2), 215–247.
- Laib (1999) Laib, N. (1999). Exponential-type inequalities for martingale difference sequences. Application to nonparametric regression estimation. Communications in Statistics - Theory and Methods, 28(7), 1565–1576.
- Ledoux & Talagrand (2013) Ledoux, M. & Talagrand, M. (2013). Probability in Banach Spaces: isoperimetry and processes. Springer Science & Business Media.
- Lee et al. (2016) Lee, J. R., Peres, Y., & Smart, C. K. (2016). A Gaussian upper bound for martingale small-ball probabilities. The Annals of Probability, 44(6), 4184–4197.
- Lesigne & Volný (2001) Lesigne, E. & Volný, D. (2001). Large deviations for martingales. Stochastic Processes and their Applications, 96(1), 143–159.
- Li et al. (2018) Li, C. J., Wang, M., Liu, H., & Zhang, T. (2018). Near-optimal stochastic approximation for online principal component estimation. Mathematical Programming, 167(1), 75–97.
- Wainwright (2015) Wainwright, M. (2015). Basic tail and concentration bounds. URL: https://www.stat.berkeley.edu/~mjwain/stat210b/Chap2_TailBounds_Jan22_2015.pdf.