1 Introduction
This note concerns the following problem: let be a vector martingale difference sequence taking values in the -dimensional Euclidean space , where . Assume that satisfies the following weak exponential-type tail condition: for some and all , we have, for some scalar ,
(1.1) 
and hence, by Markov’s inequality, their tails satisfy, for each ,
What, then, can we conclude about the tail probability of the random variable ? Note that under condition (1.1), the moment generating functions need not be finite, and hence the classical analysis via moment generating functions does not go through; new analytical tools are therefore in demand.
Our result makes several contributions beyond previous works. First, we conclude that in the one-dimensional case, where one denotes , a one-sided maximal inequality holds stating, roughly,
(1.2) 
where the factor depends solely on for any fixed and grows linearly in , and is a positive numerical constant. Here and in what follows, we allow the numerical constant to change from paragraph to paragraph. This generalizes the bounds of Lesigne & Volný (2001) and Fan et al. (2012), where the two groups of authors consider only the case , in the independent and martingale difference settings, respectively. See also the more recent paper Fan et al. (2017) for a similar concentration result under a slightly weaker condition. In fact, we also know that inequality (1.2) is optimal in the sense that it cannot be further improved for a class of martingale difference sequences satisfying the exponential moment condition (1.1).
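To make the setting concrete, consider a toy example (an illustration of ours, not from the paper): symmetric increments of the form ξ = ±E² with E ~ Exp(3). Then E[exp(|ξ|^(1/2))] = 3/2 < ∞, a weak exponential-type moment with exponent 1/2 in the spirit of (1.1), while E[exp(λξ)] = +∞ for every λ > 0, so moment-generating-function (Chernoff) arguments are indeed unavailable. A minimal Monte Carlo check:

```python
import math
import random

random.seed(0)

def sample_xi():
    """Symmetric heavy-tailed increment: xi = +/- E^2 with E ~ Exp(rate=3).
    E[exp(|xi|^(1/2))] = E[exp(E)] = 3/(3-1) = 1.5 is finite, but
    E[exp(lam*xi)] = +infinity for every lam > 0 (the +E^2 branch blows up)."""
    e = random.expovariate(3.0)
    return e * e if random.random() < 0.5 else -(e * e)

n = 200_000
m = sum(math.exp(abs(sample_xi()) ** 0.5) for _ in range(n)) / n
print(f"empirical E[exp(|xi|^(1/2))] = {m:.3f}   (theory: 1.5)")
```

Since the increments are i.i.d. and symmetric, they form a martingale difference sequence with unbounded differences, which is exactly the regime this note addresses.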
Second, for the general-dimensional case, applying (1.2) together with a dimension-reduction argument for vector martingales (Kallenberg & Sztencel, 1991; Hayes, 2005; Lee et al., 2016) allows us to conclude a one-sided bound on the Euclidean norm: under (1.1) we have
(1.3) 
where, analogously, the factor depends solely on for any fixed and grows linearly in , and is a positive numerical constant. To the best of our knowledge, this provides the first concentration result for vector-valued martingales with unbounded martingale differences under the weak exponential-type condition (1.1).
The concentration results (1.2) and (1.3) potentially admit many applications in probability and statistics, including the rate of convergence of martingales, the consistency of nonparametric regression estimation with martingale difference errors (see Laib (1999)), and online stochastic gradient algorithms for parameter estimation in linear models and PCA (Li et al., 2018).
2 Orlicz space and Orlicz norm
In this section, we briefly revisit the properties of the Orlicz space and its norm that are most relevant to our development. Readers interested in an exposition of Orlicz spaces from a Banach-space point of view are referred to Ledoux & Talagrand (2013).
Let denote the set of nonnegative real numbers. Consider the Orlicz space of valued random vectors living on the probability space such that for some . Let be a nondecreasing convex function with and , and equip the Orlicz space with the norm
One calls the Orlicz norm. In particular, a random vector has an Orlicz norm defined as the Orlicz norm of viewed as a scalar-valued random variable.
In this note, we are interested in exponential-tailed distributions, which correspond to the family of functions , , in which case the corresponding Orlicz space is the collection of random variables with exponential moments .¹
¹ Rigorously speaking, is not convex when lies in a neighborhood of 0. In this case, one can instead let the function be
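Concretely, the Orlicz norm inf{c > 0 : E[ψ(|X|/c)] ≤ 1} can be approximated from a finite sample by bisection. The sketch below is ours, assuming the standard choice ψ_α(x) = exp(x^α) − 1:

```python
import math

def orlicz_norm(samples, alpha=1.0, tol=1e-9):
    """Empirical psi_alpha Orlicz norm: inf{c > 0 : mean of psi(|x|/c) <= 1}
    over the sample, where psi(x) = exp(x**alpha) - 1, found by bisection."""
    def excess(c):
        # mean of psi(|x|/c) minus 1; positive means c is still too small
        total = 0.0
        for x in samples:
            e = (abs(x) / c) ** alpha
            if e > 700.0:          # exp would overflow, so c is far too small
                return float("inf")
            total += math.exp(e) - 1.0
        return total / len(samples) - 1.0

    lo, hi = 0.0, 1.0
    while excess(hi) > 0:          # grow the bracket until the mean drops below 1
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi

# Sanity check: for the constant variable X = 1 and alpha = 1,
# the norm solves exp(1/c) - 1 = 1, i.e. c = 1/ln 2 = 1.4427...
print(orlicz_norm([1.0] * 5, alpha=1.0))
```

This empirical version is only a finite-sample surrogate for the population norm, but it makes the defining infimum tangible.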
3 One-dimensional result
We state our first main result, which concludes the right-tailed bound (1.2) under the slightly more general condition that forms a supermartingale difference sequence.
Theorem 1.
Let be given. Assume that is a sequence of supermartingale differences with respect to , i.e. , and that it satisfies for each . Then for arbitrary and ,
(3.1) 
Remark
We make several remarks on Theorem 1, as follows.

By replacing with a larger value in (3.1) of Theorem 1, one essentially rediscovers Theorem 2.1 of Fan et al. (2012), which includes bound (1.1) of Lesigne & Volný (2001) as the special case .²
² The work Fan et al. (2012) assumes a slightly more general condition . Nevertheless, our result loses no generality, since can be absorbed into the Orlicz norm as a polylogarithmic factor.
In summary, Theorem 2.1 of Fan et al. (2012) provides a bound that depends on the maximum of , while our new bound sharpens that of Fan et al. (2012) and depends on the martingale differences only through the Orlicz norm of their squared sum. The sharpened bound turns out to be more useful for obtaining upper bounds in many statistical applications.

Theorem 2.1 of Fan et al. (2012) is optimal in the sense that there is a counterexample attaining the right-hand side of (3.1) as a lower bound (up to a constant factor in the exponent), which forbids the existence of a sharper bound for the class of martingale difference sequences. Since our result generalizes their Theorem 2.1, one may apply the same counterexample and conclude the optimality of our bound. See the next paragraph for more.
Optimality of our result
To claim optimality we note that (3.1) implies, for the special case and each ,
(3.2) 
which behaves as for some . Meanwhile, Fan et al. (2012) generalize the counterexample of Lesigne & Volný (2001): in our norm terminology, Theorem 2.1 of Fan et al. (2012) provides, for each , an ergodic sequence of martingale differences and a sequence of positive numbers such that for all sufficiently large ,
Comparing the last equation with (3.2), we conclude the optimality of our result.
Comparison with conditional weak exponentialtype conditions
If we impose the additional assumption that the ’s satisfy (1.1) in a conditional sense, the martingale concentration inequality can be further improved. Taking the example where and : if one imposes the slightly stronger condition
(3.3) 
i.e. the martingale differences are scalar-valued, conditionally subgaussian random variables, then one may conclude from Hoeffding’s concentration inequality (Wainwright, 2015) that
(3.4) 
A similar bound can be derived for subexponential variables. Observe that the power of the term in the exponent of (3.4) is 1, whereas our bound in (1.2) has an exponent of and is hence worse. Fortunately, to obtain an error probability , both inequalities give a cutoff point up to a different polylogarithmic factor of , and the two cutoff points are equivalent if these factors are ignored.
4 Proof of Theorem 1
To prove our main result for the one-dimensional case, Theorem 1, we will use a maximal version of the classical Azuma–Hoeffding inequality proposed by Laib (1999) for bounded martingale differences, and then apply an argument of Lesigne & Volný (2001) and Fan et al. (2012) to truncate the tail and analyze the bounded and unbounded pieces separately.

First of all, for the sake of simplicity and without loss of generality, throughout the following proof of Theorem 1 we impose the extra condition
(4.1) In other words, under the additional condition (4.1), proving (3.1) reduces to showing
(4.2) This is made clearer by the following rescaling argument: on the left of (4.2), one can put in the place of and in the place of ; the left-hand side of (3.1) is then just
which, by (4.2), is upperbounded by
proving (3.1).

We apply a truncation argument used in Lesigne & Volný (2001) and later in Fan et al. (2012). Let be arbitrary, and define
(4.3) (4.4) Since is measurable, and are two martingale difference sequences with respect to . Let be defined as
(4.5) Since are martingale differences, is predictable with , and hence for any ,
(4.6) In the following, we analyze the tail bounds for and separately (Lesigne & Volný, 2001; Fan et al., 2012).

To obtain the first bound, we recap Laib’s inequality as follows:
Lemma 1.
(Laib, 1999) Let be a real-valued martingale difference sequence with respect to some filtration , i.e. , and suppose the essential norm is finite. Then for arbitrary and ,
(4.7) Inequality (4.7) generalizes the folklore Azuma–Hoeffding inequality; the latter can be concluded from
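As a quick numerical sanity check of the bounded case (our illustration, not part of the proof): for Rademacher differences, which satisfy |ξ_i| ≤ 1, the maximal Azuma–Hoeffding bound P(max_{1≤k≤n} S_k ≥ x) ≤ exp(−x²/(2n)) can be compared against a Monte Carlo estimate:

```python
import math
import random

random.seed(1)

n, x, trials = 100, 30.0, 4000
hits = 0
for _ in range(trials):
    s, s_max = 0.0, 0.0
    for _ in range(n):
        s += 1.0 if random.random() < 0.5 else -1.0   # Rademacher step, |xi_i| <= 1
        s_max = max(s_max, s)
    hits += s_max >= x

empirical = hits / trials
bound = math.exp(-x * x / (2.0 * n))    # maximal Azuma-Hoeffding bound
print(f"empirical P(max_k S_k >= x) = {empirical:.4f} <= bound {bound:.4f}")
```

Here the empirical frequency sits well below the bound, as expected, since the Gaussian-type bound is not tight for the maximum of a simple random walk.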
The proof of Lemma 1 is given in Laib (1999). Recalling our extra condition (4.1) and, from the definition of in (4.3), the bound , the desired estimate follows immediately from Laib’s inequality in Lemma 1 by setting :
(4.8) To obtain the tail bound of we only need to show
(4.9) where
(4.10) from which, Doob’s martingale inequality (Durrett, 2010, §5) implies immediately that
(4.11) To prove (4.9), first recall from the definition of in (4.4) that
Recall from the properties of conditional expectation (Durrett, 2010) that for any random variable and a σ-algebra ,
where the last equality is due to the second-moment formula for nonnegative random variables (Durrett, 2010). Plugging in and , we have
(4.12) where the last inequality is due to Markov’s inequality: for all ,
(4.13) An elementary calculus argument shows that the function is decreasing in and increasing in , where was defined earlier in (4.10) (Fan et al., 2012). If , we have
(4.14) If , then by setting as above, we have
(4.15) 
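The use of Doob's martingale inequality above can likewise be illustrated numerically. A standalone sketch (ours, not part of the proof): for the nonnegative submartingale Z_k = |S_k| built from a simple random walk, Doob's maximal inequality gives P(max_{k≤n} Z_k ≥ λ) ≤ E[Z_n]/λ:

```python
import random

random.seed(2)

n, lam, trials = 50, 15.0, 4000
hits, z_final_sum = 0, 0.0
for _ in range(trials):
    s, z_max = 0, 0
    for _ in range(n):
        s += 1 if random.random() < 0.5 else -1
        z_max = max(z_max, abs(s))    # Z_k = |S_k| is a nonnegative submartingale
    hits += z_max >= lam
    z_final_sum += abs(s)

empirical = hits / trials
doob_bound = (z_final_sum / trials) / lam   # Monte Carlo estimate of E[Z_n]/lambda
print(f"empirical {empirical:.4f} <= Doob bound {doob_bound:.4f}")
```

In the proof, Doob's inequality is applied to the exponentiated process rather than to the walk itself; the sketch only illustrates the maximal-inequality mechanism.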
Putting the pieces together: combining (4.6), (4.8) and (4.11), we obtain, for arbitrary , that
(4.16) We choose , by equating the two exponents above, as
Plugging this back into (4.16) we obtain
(4.17) where we plugged in the expression for from (4.10). The square-bracket prefactor in the last line of (4.17) can be further simplified and tightly bounded by
where we applied a few elementary algebraic inequalities, including and a variant of Jensen’s inequality: for one has for all (with equality when ). Thus, (4.2) follows by noting the relation (4.5) and setting in place of , which proves Theorem 1 via the argument in (i) of our proof.
5 General dimensions result
In many applications we are often interested in a concentration tail inequality for vector-valued martingales. To proceed, we need a so-called dimension-reduction lemma for Hilbert spaces, inspired by its continuum version proved in Kallenberg & Sztencel (1991). We argue that it suffices to prove the result for the case . Writing in terms of martingale differences, we have
Lemma 2 (Dimension-reduction lemma for or Hilbert spaces).
Let be a valued martingale difference sequence with respect to filtration , i.e. for each , . Then there exists a valued martingale difference sequence with respect to the same filtration so that for each
(5.1) 
For a proof of Lemma 2, see Lemma 2.3 of Lee et al. (2016), which proves the lemma on a generic Hilbert space.
Theorem 2.
Let be given. Assume that is a sequence of valued martingale differences with respect to , i.e. , and that it satisfies for each . Then for arbitrary and ,
(5.2) 
Theorem 2 shows that the martingale inequality holds with a dimension-free property: the bound on the right-hand side of (5.2) is independent of the dimension and depends on the martingale differences only via .
Proof of Theorem 2.
It remains an open question whether similar concentration inequalities hold for polynomial-tailed martingale differences, where satisfies for . In the case where the ’s are independent, Theorem 6.21 of Ledoux & Talagrand (2013) gives a bound on the sum of vectors that can be turned into a tail inequality, but to the best of our knowledge a general result for martingale differences (even in one dimension) is not available and is left for future research.
Acknowledgement
The author thanks Xiequan Fan for valuable comments on an earlier version of this note.
References
 Durrett (2010) Durrett, R. (2010). Probability: Theory and Examples (4th edition). Cambridge University Press.
 Fan et al. (2012) Fan, X., Grama, I., & Liu, Q. (2012). Large deviation exponential inequalities for supermartingales. Electronic Communications in Probability, 17.
 Fan et al. (2017) Fan, X., Grama, I., & Liu, Q. (2017). Deviation inequalities for martingales with applications. Journal of Mathematical Analysis and Applications, 448(1), 538–566.
 Hayes (2005) Hayes, T. P. (2005). A large-deviation inequality for vector-valued martingales.
 Kallenberg & Sztencel (1991) Kallenberg, O. & Sztencel, R. (1991). Some dimensionfree features of vectorvalued martingales. Probability Theory and Related Fields, 88(2), 215–247.
 Laib (1999) Laib, N. (1999). Exponential-type inequalities for martingale difference sequences. Application to nonparametric regression estimation. Communications in Statistics - Theory and Methods, 28(7), 1565–1576.
 Ledoux & Talagrand (2013) Ledoux, M. & Talagrand, M. (2013). Probability in Banach Spaces: isoperimetry and processes. Springer Science & Business Media.
 Lee et al. (2016) Lee, J. R., Peres, Y., & Smart, C. K. (2016). A Gaussian upper bound for martingale small-ball probabilities. The Annals of Probability, 44(6), 4184–4197.
 Lesigne & Volný (2001) Lesigne, E. & Volný, D. (2001). Large deviations for martingales. Stochastic Processes and their Applications, 96(1), 143–159.
 Li et al. (2018) Li, C. J., Wang, M., Liu, H., & Zhang, T. (2018). Nearoptimal stochastic approximation for online principal component estimation. Mathematical Programming, 167(1), 75–97.
 Wainwright (2015) Wainwright, M. (2015). Basic tail and concentration bounds. URL: https://www.stat.berkeley.edu/ mjwain/stat210b/Chap2_TailBounds_Jan22_2015.pdf.