Unlike the real line, the real space $\mathbb{R}^d$, for $d\geq 2$, is not canonically ordered. As a consequence, such fundamental concepts as quantile and distribution functions, which are strongly related to the ordering of the observation space, and their empirical counterparts (ranks and empirical quantiles), which play a fundamental role in statistical inference in dimension $d=1$, do not canonically extend to dimension $d\geq 2$.
Of course, a classical concept of distribution function (the familiar one, based on marginal orderings) does exist. That concept, from a probabilistic point of view, does the job in the sense of characterizing the underlying distribution. However, the corresponding quantile function does not mean much (see, e.g., Genest and Rivest (2001)), and the corresponding empirical versions (related to their population counterparts via a Glivenko-Cantelli result) do not possess any of the properties that make them successful inferential tools in dimension one.
That observation about traditional multivariate distribution functions is not new: palliating the lack of a “natural” ordering of $\mathbb{R}^d$, hence defining statistically sound concepts of distribution and quantile functions, has remained an open problem for more than half a century, and has generated an abundant literature that includes, among others, the theory of copulas and the theory of statistical depth.
A number of most ingenious solutions have been proposed, each of them extending some chosen features of the well-understood univariate concepts, with which they coincide for $d=1$. Coinciding, for $d=1$, with the classical concepts obviously is important, but it is hardly sufficient for qualifying as a statistically pertinent multivariate extension. For statisticians, distribution and quantile functions are not just probabilistic notions: above all, their empirical versions (empirical quantiles and ranks) constitute fundamental tools for inference. A multivariate extension yielding quantiles and ranks that do not match, in dimension $d\geq 2$, the properties that make traditional ranks natural and successful tools for inference in dimension one is not a statistically sound extension.
The approach adopted here places those inferential concerns at the heart of the problem.
1.1.1 Ranks and rank-based inference
To facilitate the exposition, let us focus on ranks and their role in testing problems. Rank-based methods naturally enter the picture in the context of semiparametric statistical models or experiments under which the distribution of some observation (with real-valued ’s), besides the finite-dimensional parameter of interest , also depends on the unspecified density of some unobserved underlying residual univariate white noise, say. More precisely, assume that iff the -residuals are i.i.d. with density (although i.i.d.-ness can be relaxed into exchangeability, we will stick to i.i.d.-ness). In such models (call them i.i.d. noise models), testing (with unspecified ) reduces to the problem of testing that is i.i.d. white noise with unspecified density . Typical examples are linear models, with ( a -vector of covariates, ), or autoregressive models, with (where denotes time and ), etc.
Invariance arguments suggest tests based on the ranks of . Those tests are distribution-free under . That distribution-freeness property is often considered as the trademark and main virtue of (univariate) ranks; it guarantees the validity and similarity of rank-based procedures (for testing ), irrespective of the actual density .
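The invariance behind that distribution-freeness can be illustrated with a minimal sketch. The helper names `ranks` and `wilcoxon_rank_sum`, the sample sizes, and the particular transformation are our own illustrative assumptions, not taken from the references. A strictly increasing transformation changes the density of the residuals drastically but leaves every rank, hence any rank statistic such as the Wilcoxon rank sum, unchanged.

```python
import math
import random


def ranks(values):
    """Return the rank (1 = smallest) of each entry; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def wilcoxon_rank_sum(sample_x, sample_y):
    """Sum of the ranks of sample_x within the pooled sample."""
    pooled_ranks = ranks(list(sample_x) + list(sample_y))
    return sum(pooled_ranks[: len(sample_x)])


random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(5)]
y = [random.gauss(0.0, 1.0) for _ in range(7)]

# A strictly increasing map (sinh grows exponentially) turns Gaussian noise
# into heavy-tailed noise without changing any rank.
x_transformed = [math.sinh(3.0 * v) for v in x]
y_transformed = [math.sinh(3.0 * v) for v in y]

w_original = wilcoxon_rank_sum(x, y)
w_transformed = wilcoxon_rank_sum(x_transformed, y_transformed)
```

Since the statistic depends on the data through the ranks only, `w_original` and `w_transformed` coincide, whatever the continuous density generating the sample.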
Distribution-freeness alone is not sufficient, though, for explaining the success of rank tests, and efficiency is no less important: other distribution-free methods indeed can be constructed, such as sign or runs tests, that do not perform as well (in i.i.d. noise models) as the rank-based ones. When Wilcoxon’s two-sample location test (Wilcoxon 1945) was introduced on purely heuristic grounds, it was not expected to be particularly powerful. Its unexpectedly high efficiency (compared to the corresponding Student test) soon was noticed, though, and confirmed, if not explained, in Hodges and Lehmann’s famous “0.864 paper” (Hodges and Lehmann 1956). Further surprising results on the power of rank-based methods came with the celebrated Chernoff and Savage (1958) result that normal-score rank-based tests, in two-sample location or regression, are uniformly more powerful (in a local asymptotic sense) than their Student competitors. Similar results have been established later on (Hallin (1994) and Hallin and Tribel (2000) in a time series context; Paindaveine (2006) and Hallin and Paindaveine (2008) in an elliptical context), where rank-based methods are shown to outperform their traditional counterparts.
A general theoretical explanation for this unexpected efficiency of ranks was provided in Hallin and Werker (2003). In the semiparametric context of i.i.d. noise models involving some unspecified density , indeed, the best performance one can hope for, when performing inference on the parameter of interest , is semiparametric efficiency—as developed in the classical monograph by Bickel, Klaassen, Ritov and Wellner (1993). The traditional parametric information bounds (related to the Fisher information matrices) there are replaced with semiparametric efficiency bounds which in general are strictly less favorable—the unavoidable cost of not knowing the actual . The main result in Hallin and Werker (2003) shows that, in i.i.d. noise models, those semiparametric efficiency bounds still can be reached by means of rank-based methods. This is what we refer to as the semiparametric efficiency preservation property of ranks: intuitively (we refer to Hallin and Werker (2003) for a more rigorous and formal statement), this means that, in a local and asymptotic sense, all the information about the parameter of interest is contained in the residual ranks, while the corresponding order statistic of residuals only contains information on the nuisance .
Summing up, the theoretical reasons for the success of ranks for univariate statistical inference in semiparametric models are twofold:
(distribution-freeness, a validity-related exact, finite-sample property): the vector of (-residual) ranks is distribution-free over the (nonparametric) family , where stands for the family of nonvanishing densities over () (see Section 2 for a more precise definition), and
(semiparametric efficiency preservation, a local and asymptotic efficiency property): the semiparametric efficiency bound (at arbitrary ) can be reached, under , via rank-based procedures (tests that are measurable with respect to the ranks of -residuals ).
The key property behind (HW) is the more fundamental maximal invariance property (see Section 7.1 and Chapter 6 of Lehmann and Romano (2005) for definitions and details) of ranks:
(an exact, finite-sample property) the ranks of -residuals are maximal invariant with respect to a class of transformations of generating the fixed- submodel (that is, yielding a unique orbit in the family of fixed- model distributions).
In Hallin and Werker (2003), the generating class happens to be a group—something which (see Section 7) will not be the case in dimension . That group structure, however, plays no role in their proofs, which only require that, for any couple in , there exist a transformation in pushing forward to .
We refer to Section A.4 in Appendix I for more details on semiparametric efficiency preservation.
The (unessential) restriction, in (DF), to nonvanishing densities avoids trivial problems at the boundary of bounded supports, while (HW) (unlike (HW)) is tacitly restricted to the subset of densities satisfying the regularity conditions (uniform local asymptotic normality, etc.) required for semiparametric efficiency to make sense. Those conditions, however, depend on the model under study; in order to avoid specifying any , in this chapter, we focus on (HW).
Properties (DF) and (HW) are those a statistician would like to see satisfied, with and substituted for and , by the concept of ranks associated with the empirical counterpart of any sensible definition of a multivariate distribution function.
1.1.2 Multivariate ranks and the ordering of $\mathbb{R}^d$, $d\geq 2$
The problem of ordering $\mathbb{R}^d$ for $d\geq 2$, thus defining multivariate concepts of ranks, signs, empirical distribution functions and quantiles, is not new, and has a rather long history in statistics. Many concepts have been proposed in the literature, a complete list of which cannot be given here. Focusing again on ranks, four types of multivariate ranks, essentially, can be found:
(a) Componentwise ranks.
The idea of componentwise ranks goes back as far as Hodges (1955), Bickel (1965) or Puri and Sen (1966, 1967, 1969). It culminates in the monograph by Puri and Sen (1971), where inference procedures based on componentwise ranks are proposed, basically, for all classical problems of multivariate analysis; more recent references are Chaudhuri and Sengupta (1993), Nordhausen, Oja, and Tyler (2006), Segers, van den Akker, and Werker (2015), … to quote only a very few. Time-series testing methods based on the same ranks have been considered in Hallin, Ingenbleek, and Puri (1989). Componentwise ranks actually are intimately related to copula transforms, of which they constitute the empirical version: rather than solving the tricky problem of ordering, they bypass it by considering univariate marginal rankings. As a consequence, they crucially depend on the choice of a coordinate system. Unless the underlying distribution has independent components (see Nordhausen et al. (2009), Ilmonen and Paindaveine (2011), or Hallin and Mehta (2015)), componentwise ranks in general are not even asymptotically distribution-free. Nor are they invariant under any model-generating class of transformations; a transformation-retransformation approach has been proposed by Chakraborty and Chaudhuri (1996, 1998), which ensures affine-invariance—but the group of affine transformations is not a generating group in this context. As a consequence, neither (DF), (HW) nor (HW) are satisfied.
(b) Spatial ranks and signs. This class of multivariate ranks includes several very ingenious, elegant and appealing concepts, proposed by several authors (Möttönen and Oja (1995); Möttönen et al. (1997); Chaudhuri (1996); Koltchinskii (1997); Oja and Randles (2004); Oja (2010), and many others). Similar ideas have been developed by Choi and Marden (1997) and, more recently, in high dimension by Biswas, Mukhopadhyay and Ghosh (2014) and Chakraborty and Chaudhuri (2014, 2017). We refer to Marden (1999), Oja (1999) or the monograph by Oja (2010) for a systematic exposition and exhaustive list of references. All those concepts extend the traditional univariate ones. As a rule, however, they fail to achieve distribution-freeness (Biswas et al. (2014) is an exception, but fails on semiparametric efficiency). Their invariance properties at best extend to classes (actually, groups) of rotations, scale or affine transformations, which are not generating groups: neither (HW) nor (HW) are satisfied.
(c) Depth-based ranks. Those ranks have been considered in Liu (1992), Liu and Singh (1993), He and Wang (1997), Zuo and He (2006), Zuo and Serfling (2000), among others; see Serfling (2002, 2012) for a general introduction to statistical depth, Hallin et al. (2010) for the related concept of quantile, López-Pintado and Romo (2012) for functional extensions, Zuo (2018) for a state-of-the-art survey in a regression context. Depth-based ranks, in general, are distribution-free, hence satisfy (DF). At best (except for the Monge-Kantorovich depth recently proposed by Chernozhukov et al. (2017), to be considered below), they also are affine-invariant; affine transformations, however, fail to be a generating group: neither (HW) nor (HW) hold.
(d) Mahalanobis ranks and signs/interdirections. When considered jointly with interdirections (Randles (1989)), lift interdirections (Oja and Paindaveine (2005)), Tyler angles or Mahalanobis signs (see Hallin and Paindaveine (2002a, c)), Mahalanobis ranks do satisfy both (DF) and (HW), hence (HW), but in elliptical models only, that is, when is limited to the family of elliptical densities. There, they have been used, quite successfully, in a variety of multivariate models, including one-sample location (Hallin and Paindaveine 2002a), $k$-sample location (Um and Randles 1998), serial dependence (Hallin and Paindaveine 2002b), linear models with VARMA errors (Hallin and Paindaveine 2004a, 2005, 2006a), VAR order identification (Hallin and Paindaveine 2004b), shape (Hallin and Paindaveine 2006b; Hallin, Oja and Paindaveine 2006), homogeneity of scatter (Hallin and Paindaveine 2008), principal and common principal components (Hallin, Paindaveine and Verdebout 2010, 2013, 2014). Unfortunately, the tests developed in those references cease to be valid, and R-estimators no longer are root-$n$ consistent, under non-elliptical densities.
None of those multivariate rank concepts, thus, is enjoying properties (DF) and (HW)—except, but only over the class of elliptically symmetric distributions, the (pseudo)-Mahalanobis/elliptical ranks and signs. A few other concepts have been proposed as well, related to cone orderings (Belloni and Winkler 2011; Hamel and Kostner 2018), which require some subjective (or problem-specific) preliminary choices, and similarly fail to achieve (DF) and (HW).
The fact that, contrary to the real line $\mathbb{R}$, the real space $\mathbb{R}^d$ for $d\geq 2$ does not admit a canonical ordering places an essential difference between dimension $d=1$ and dimensions $d\geq 2$. Whereas the same “exogenous” left-to-right ordering of $\mathbb{R}$ applies both in population and in the sample, pertinent orderings of $\mathbb{R}^d$ are bound to be “endogenous”, that is, distribution-specific in populations, and data-driven (hence, random) in samples. This is the case for the concepts developed under (b)-(d) above; it also holds for the concept we are proposing in this chapter. Each distribution, each sample, thus is to produce its own ordering, inducing (related forms of) quantile and distribution functions, and classes of order-preserving transformations. As a result, datasets, at best, can be expected to produce, via adequate concepts of multivariate ranks and signs, consistent empirical versions of the unavailable underlying population ordering. That consistency typically takes the form of a Glivenko-Cantelli result (GC) connecting an empirical center-outward distribution function to its population version. It is essential for such a result to hold without any moment assumptions: moment assumptions (as in Chernozhukov et al. (2017), where consistency is established under compactly supported distributions, hence under the existence of finite moments of all orders), as a rule, are inappropriate in the intrinsically ordinal context of distribution and quantile functions.
No ordering of $\mathbb{R}^d$, moreover, can be expected to be of the one-sided “left-to-right” type, since “left” and “right” do not make sense anymore. A depth-type center-outward ordering is by far more sensible. All this calls for revisiting the traditional univariate concepts from a center-outward perspective, while disentangling the population concepts from their sample counterparts.
1.1.3 Outline of the chapter
In this chapter, we show that the so-called Monge-Kantorovich ranks and signs recently proposed by Chernozhukov et al. (2017), unlike the many concepts that have been considered so far, do enjoy distribution-freeness (DF) and the maximal invariance property (HW) which typically entails (HW). We do not go all the way, in this chapter, to prove the implication from (HW) to (HW), though: although following along the same lines, essentially, as in Hallin and Werker (2003), a formal proof indeed requires model-specific regularity assumptions, and asymptotic representation results, in the Hájek style, for the new linear rank statistics. Such results are beyond the scope and page limitations of this chapter, and are the subject of ongoing work.
Using nontechnical arguments, we also show how those multivariate ranks and signs very naturally and intuitively emerge from revisiting classical univariate concepts, to which they reduce for $d=1$. In particular, we propose a measure transportation-based concept of center-outward distribution function, for which we establish a Glivenko-Cantelli property in the absence of any moment assumptions. Refraining from moment assumptions calls for an approach which is entirely different from the Monge-Kantorovich optimization perspective adopted in Chernozhukov et al. (2017). The techniques considered there (and in most of the measure-transportation literature) indeed are deeply rooted in the analytical features of the Monge-Kantorovich problem, which focuses on minimizing an expected quadratic loss which, in the absence of finite second-order moments, no longer makes sense. The tools we are using here are of a more fundamental geometric nature, exploiting the concept of cyclical monotonicity and the approach initiated by McCann (1995) (see Section 2.1 for details). This fact is emphasized by a shift in the terminology: as our approach is no longer based on Monge-Kantorovich optimization techniques, we consistently adopt the terminology center-outward ranks and signs for the ranks and signs associated with empirical center-outward distribution functions, despite the fact that they coincide with the Monge-Kantorovich ranks and signs introduced in Chernozhukov et al. (2017).
Section 1.2 provides, for those who are not familiar with measure transportation, a very succinct and elementary account of some classical facts in the area.
In Section 1.3, we start with revisiting the traditional concepts of univariate distribution/quantile functions and their empirical counterparts. Those traditional concepts strongly depend on the left-to-right nature of the canonical ordering of $\mathbb{R}$. As this left-to-right feature cannot be expected to extend to higher dimension, rather than the classical distribution function , we adopt a center-outward form , the empirical version of which naturally leads to center-outward ranks and signs. We then establish (Section 1.3.4), still for $d=1$, a characterization of those center-outward distribution functions, ranks and signs in terms of measure transportation results. That characterization naturally extends to arbitrary dimension, and is exploited in Section 1.4 to define center-outward distribution functions, ranks and signs in $\mathbb{R}^d$.
Section 1.5 deals, for arbitrary $d$, with the Glivenko-Cantelli property of empirical center-outward distribution functions. Sections 1.6 and 1.7 study the distributional and invariance/equivariance properties of center-outward ranks and signs, establishing (DF), the independence between ranks, signs and the order statistics, and the maximal invariance property (HW) which, as explained, leads to (HW), hence indicating that center-outward ranks and signs fully qualify as statistically meaningful multivariate extensions of the traditional concepts, with which they coincide for $d=1$.
Throughout, stands for the family of nonvanishing Lebesgue densities over , —to be precise, the family of all densities such that, for all there exist in such that for ; let denote the corresponding family of distributions,
the joint distribution of i.i.d.-tuples with common distribution in
. The probability measures and distribution functions associated with densities, … are denoted by , …, and , …, respectively; , … stand for the distributions of i.i.d. -tuples with densities , , … The notation , is used for the (open) unit ball and the unit sphere in , respectively.
1.2 Measure transportation: Monge, Kantorovich, Brenier, McCann
Starting from a very practical problem—How should one best move given piles of sand to fill up given holes of the same total volume?—Gaspard Monge (1746-1818), with his 1781 Mémoire sur la Théorie des Déblais et des Remblais
In modern notation, the simplest and most intuitive (if not most general) formulation of Monge’s problem is (in probabilistic form) as follows. Let and belong to the family of probability measures over (for simplicity) , and let be a Borel-measurable loss function: represents the cost of transporting to . The objective is to find a measurable (transport) map solving the minimization problem
where ranges over the set of measurable maps from to , and is the so-called push forward of by (in statistics, a more classical but heavier notation for would be or , where is the transformation of induced by ; see Lehmann and Romano (2005)). For simplicity, and with a slight abuse of language, we will say that is mapping to . A map achieving the infimum in (1.2.1) is called an optimal transport map, in short, an optimal transport, of to . In the sequel, we shall restrict to the quadratic (or $L^2$) loss function .
The problem looks simple but it is not. Monge himself (who moreover was considering the more delicate loss ) did not solve it, and relatively little progress was made until the 1940s, when renewed interest in the topic was triggered by the contributions of Leonid Vitalievitch Kantorovich (1912-1986; Nobel Prize in Economics in 1975) and his groundbreaking duality approach. Among the most powerful ensuing results is the Polar Factorization Theorem by Brenier (1987, 1991; see Chapter 3 in Villani (2003)) which implies, among other things, that for $L^2$ loss, if and are absolutely continuous with finite second-order moments, the solution of Monge’s problem exists, is (a.e.) unique, and is the gradient of a convex (potential) function, a form of multivariate monotonicity. The subject ever since has been a very active domain of mathematical analysis, with applications in various fields, from fluid mechanics to economics (see Galichon (2016)), learning, and statistics (Carlier et al. (2016); Panaretos and Zemel (2016, 2018); Álvarez et al. (2018) and del Barrio et al. (2018)). It was popularized recently by the French Fields medalist Cédric Villani, with two monographs (Villani 2003, 2009), to which we refer for background reading, along with the two volumes by Rachev and Rüschendorf (1998), where the scope is somewhat closer to probabilistic and statistical concerns.
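In dimension one, the gradient of a convex function is simply a nondecreasing function, so that, for quadratic loss, optimality amounts to monotonicity. The following minimal sketch illustrates this on a discrete, equal-weight analogue of Monge’s problem (an assumption made for illustration only; the continuous problem is not solved here, and the function names and numerical values are hypothetical). A brute-force minimization of the quadratic cost over all one-to-one assignments is compared with the monotone matching that sends the $k$th smallest source point to the $k$th smallest target point.

```python
from itertools import permutations


def quadratic_cost(sources, targets, assignment):
    """Total squared distance when sources[i] is sent to targets[assignment[i]]."""
    return sum((sources[i] - targets[j]) ** 2 for i, j in enumerate(assignment))


def brute_force_transport(sources, targets):
    """Minimize the quadratic cost over all one-to-one assignments."""
    return min(
        permutations(range(len(targets))),
        key=lambda a: quadratic_cost(sources, targets, a),
    )


def monotone_transport(sources, targets):
    """Send the k-th smallest source to the k-th smallest target."""
    source_order = sorted(range(len(sources)), key=lambda i: sources[i])
    target_order = sorted(range(len(targets)), key=lambda j: targets[j])
    assignment = [0] * len(sources)
    for i, j in zip(source_order, target_order):
        assignment[i] = j
    return tuple(assignment)


# Illustrative values: five observations mapped to a regular grid.
sources = [0.9, -1.3, 0.2, 2.4, -0.5]
targets = [-0.8, -0.4, 0.0, 0.4, 0.8]

best = brute_force_transport(sources, targets)
monotone = monotone_transport(sources, targets)
```

With distinct source and target points, the rearrangement inequality guarantees that the monotone matching is the unique minimizer, so `best` and `monotone` coincide. The brute-force search is factorial in the number of points and is only meant as a check on very small examples.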
Whether described as in (1.2.1), or relaxed into the more general coupling form adopted by Kantorovich, the so-called Monge-Kantorovich problem remains an optimization problem, though, which only makes sense under densities for which expected costs are finite (under finite variances, thus, for quadratic loss). Such moment assumptions, in a general context of distribution functions, ranks and quantiles, are not appropriate. Brenier’s theorem relies on similar assumptions, but inspired a remarkable result by McCann (1995, page 310), hereafter the McCann Theorem. The nature of that theorem is geometric rather than analytical and, contrary to Monge, Kantorovich and Brenier, does not require any moment restrictions. McCann’s Theorem implies that, for any given absolutely continuous , there exists, in the class of gradients of convex functions, a -essentially unique element pushing forward to . Under the existence of finite moments of order two, that mapping moreover coincides with the $L^2$-optimal (in the Monge-Kantorovich sense) transport of to .
Those measure transportation results are the basis of Carlier et al. (2016)’s concept of vector quantile regression, and of Chernozhukov et al. (2017)’s concept of Monge-Kantorovich depth and related quantiles, ranks and signs; see also Ekeland et al. (2012) for precursory ideas. While Carlier et al. (2016) consider mappings to the unit cube, Chernozhukov et al. (2017) deal with mappings to general reference distributions, including the uniform over the unit ball. On the other hand, they emphasize the consistent estimation of Monge-Kantorovich depth/quantile contours, with techniques requiring compactly supported distributions (hence finite moments of all orders, which is quite regrettable when defining a quantile concept); their proofs strongly exploit Kantorovich’s duality approach.
In the present chapter, we privilege mappings to the uniform distribution over the unit ball, which enjoys better invariance/equivariance properties than the unit cube—the latter indeed is not unique, and possesses edges and vertices, which are “special points”—and naturally extends the elliptical case. Moreover, we are focusing on the inferential properties of quantiles, ranks and signs and, adopting McCann’s geometric point of view, we manage to waive moment assumptions which, as we already stressed, are inappropriate in the context. The focus, applicability and mathematical nature of our approach, thus, is quite different from that of Chernozhukov et al. (2017).
Yet another approach is taken in a recent paper by Faugeras and Rüschendorf (2018), who propose combining a copula transform with a mapping in the Chernozhukov et al. (2017) style. This takes care of the compact support/second-order moment restriction, but results in a concept that heavily depends on the original coordinate system, which compromises the maximal invariance property (HW) leading to (HW).
1.3 Distribution and quantile functions, ranks and signs in $\mathbb{R}$
The concept of empirical distribution function, hence the concepts of ranks, signs, order statistics, and quantiles, are well understood and abundantly studied in dimension one. Before introducing multivariate extensions, we therefore briefly revisit the traditional versions of those fundamental concepts and some of their main properties.
1.3.1 Traditional univariate concepts
Denote by an $n$-tuple of real-valued random variables, namely observations or residuals associated with some parameter of interest, which we emphasize, when needed, by writing . We throughout consider the case that the ’s are (under parameter value for the ’s) i.i.d. with density , distribution and distribution function .
In dimension one, the definition of ranks is based on the canonical left-to-right ordering of the real line: the rank of among is traditionally defined as , . Intimately related with the concept of ranks is the dual concept of order statistics, with the th order statistic , implicitly defined by . Under the assumptions made, the vector of order statistics is sufficient and complete, while the vector of ranks is uniform over the permutations of , hence distribution-free. Basu’s Theorem (Basu (1955)) thus implies that and are mutually independent.
For the empirical distribution function , the classical definition yields
the denominator (rather than ) is chosen so that takes values in the open interval . The restriction of to then is uniform over the permutations of the regular grid
hence distribution-free and independent of the order statistic.
The Glivenko-Cantelli Theorem tells us that
which, under the assumptions made (nonvanishing densities), is equivalent to the apparently weaker property (GC) that
Actually, is entirely determined by its restriction to , that is, by the couples , . All other values of constitute an arbitrary interpolation carrying no further information: any choice of a nondecreasing interpolation would be equally legitimate and does satisfy the same Glivenko-Cantelli property (1.3.2). From now on, we use the notation for that restriction (a data-driven mapping of the observations to the grid (1.3.1)); any monotone nondecreasing interpolation will be denoted by .
1.3.2 Center-outward distribution and quantile function in $\mathbb{R}$
For the purpose of multidimensional generalization, let us consider slightly modified concepts of distribution function, quantiles, ranks, and signs. Define the center-outward distribution function of as .
Clearly, being linear transformations of each other, and carry the same information about . Just as , the center-outward distribution function is a probability-integral transformation: denoting by the uniform distribution over the one-dimensional unit ball , iff . Boldface is used in order to emphasize the interpretation of as a vector-valued quantity: while is the -probability contents of the interval (the one-dimensional ball with radius ), the unit vector (a point on the unit sphere ; can be defined arbitrarily) is a direction or a sign: the sign of the deviation of from the median of . Those interpretations, as we shall see, will carry over to dimension $d\geq 2$.
A quantile function usually is defined as the inverse of a distribution function. Inverting (which for is strictly increasing) yields the center-outward quantile function Quantiles thus are indexed by the points of the unit ball ; is to be interpreted as a quantile level. The sets and the closed intervals where and are such that accordingly have the interpretation of quantile contours and quantile regions, at quantile level .
While traditional distribution and quantile functions are associated with nested half-lines of the form carrying probability , the center-outward ones are about nested intervals (containing ) with -probability contents , the geometry of which, unlike the traditional collection of half-lines (which is fixed), is adapted to the underlying distribution . The translation of the center-outward concept in terms of the traditional one is straightforward, though, as and , where .
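The translation between traditional and center-outward concepts can be illustrated with a minimal sketch. It assumes the usual form $F_\pm = 2F - 1$ of the center-outward distribution function, with inverse $u \mapsto F^{-1}((1+u)/2)$; the symbol $F_\pm$, the choice of a Gumbel distribution (asymmetric, with nonvanishing density over the real line) and all function names are our own illustrative assumptions.

```python
import math


def gumbel_cdf(z):
    """Traditional distribution function F of the standard Gumbel law."""
    return math.exp(-math.exp(-z))


def gumbel_quantile(p):
    """Traditional quantile function, the inverse of gumbel_cdf, for 0 < p < 1."""
    return -math.log(-math.log(p))


def center_outward_df(z):
    """Center-outward distribution function, assumed to be 2F - 1."""
    return 2.0 * gumbel_cdf(z) - 1.0


def center_outward_quantile(u):
    """Inverse of center_outward_df, for u in the open interval (-1, 1)."""
    return gumbel_quantile((1.0 + u) / 2.0)


def quantile_region(q):
    """Closed interval with probability content q around the median (0 <= q < 1)."""
    return center_outward_quantile(-q), center_outward_quantile(q)


median = center_outward_quantile(0.0)
lower, upper = quantile_region(0.5)
content = gumbel_cdf(upper) - gumbel_cdf(lower)
```

The region of level $q = 0.5$ here runs from the first to the third quartile, carries probability one half, and contains the median; since the Gumbel distribution is skewed, it extends further to the right of the median than to the left, which is the sense in which the nested intervals adapt to the underlying distribution.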
1.3.3 Center-outward ranks and signs in $\mathbb{R}$
Turning to a sample , define the center-outward rank of as and , respectively, according as $n$ is odd or even, its empirical sign as , and the value at of the empirical center-outward distribution function as
with values on the regular grids
($n$ odd), and ($n$ even). (1.3.5)
Those grids are the intersection between the two unit vectors and the circles with radii , and , centered at the origin—along with the origin when is odd.
If are i.i.d. with some density , the signs are uniform over the unit sphere , and independent of the ranks ; each rank is uniformly distributed over the integers ( odd), the integers ( even), while the -tuple is uniform over their permutations.
Formula (1.3.4) looks complicated, but it is not: the center-outward ranks, actually, result from ordering from left to right the observations sitting to the right of the median (sign $+1$), and ordering from right to left the observations sitting to the left of the median (sign $-1$); the regular grids (1.3.5) on $[-1,1]$ are replacing the traditional regular grid (1.3.1) of values over $[0,1]$. In view of (1.3.4), the Glivenko-Cantelli result (1.3.3) for straightforwardly extends to :
If is to be defined over the whole real line, any nondecreasing interpolation of the couples provides a solution. Clearly, infinitely many choices are possible, and all of them yield a Glivenko-Cantelli statement under sup form (similar to (1.3.2)). Some are continuously differentiable, some are simply continuous (e.g., a linear interpolation), some are discontinuous, some are strictly increasing, some are step functions. Among them is the continuous-from-the-left on the left-hand side of the (empirical) median, and continuous-from-the-right on the right-hand side of the median piecewise constant interpolation shown in Figure 1.1 (bottom left).
Clearly, the traditional ranks and the empirical center-outward values , , generate the same -field: all classical rank statistics therefore can be rewritten in terms of . Traditional and center-outward ranks, therefore, are equivalent statistics.
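That equivalence can be checked on a small example. The sketch below assumes the usual explicit definitions, with $R_i$ the traditional rank: center-outward rank $|R_i - (n+1)/2|$ for $n$ odd and $|R_i - (n+1)/2| + 1/2$ for $n$ even, sign equal to the sign of $R_i - (n+1)/2$, and empirical center-outward value equal to sign times rank divided by $\lfloor n/2\rfloor + 1$. These formulas, the function names and the sample values are assumptions made for illustration.

```python
from fractions import Fraction


def ranks(values):
    """Traditional ranks (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def center_outward(values):
    """Return (center-outward rank, sign, empirical value) for each observation."""
    n = len(values)
    summary = []
    for r in ranks(values):
        deviation = Fraction(2 * r - (n + 1), 2)  # R_i - (n + 1) / 2
        sign = (deviation > 0) - (deviation < 0)
        shift = 0 if n % 2 == 1 else Fraction(1, 2)
        co_rank = int(abs(deviation) + shift)
        summary.append((co_rank, sign, Fraction(sign * co_rank, n // 2 + 1)))
    return summary


def traditional_rank(co_rank, sign, n):
    """Recover the traditional rank from a center-outward rank and sign."""
    offset = co_rank if n % 2 == 1 else Fraction(2 * co_rank - 1, 2)
    return int(sign * offset + Fraction(n + 1, 2))


odd_sample = [0.4, -1.7, 2.2, 0.1, -0.6]
even_sample = [0.4, -1.7, 2.2, 0.1]
odd_summary = center_outward(odd_sample)
even_summary = center_outward(even_sample)
```

For the odd-sized sample the empirical values are $1/3, -2/3, 2/3, 0, -1/3$, the median being mapped to the origin; for the even-sized one they are $1/3, -2/3, 2/3, -1/3$. In both cases `traditional_rank` recovers the traditional ranks exactly, which is the finite-sample content of the statement that the two families of ranks generate the same $\sigma$-field.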
1.3.4 Relation to measure transportation
The probability-integral transformation from to the unit ball is mapping the distribution to the uniform distribution over . As a monotone increasing function, it is the gradient (here, the derivative) of a convex function (which is defined up to an additive constant). It follows from McCann’s Theorem that it is the (essentially) unique gradient of a convex function mapping to . Therefore, this characterization can be adopted as the definition of . The huge advantage of this measure transportation-based definition is that it does not involve the canonical ordering of , and therefore readily extends to , .
1.4 Distribution and quantile functions, ranks and signs in
We are now ready to propose our definition of distribution and quantile functions in , along with their empirical counterparts. To start with, observe that , which is the Lebesgue-uniform distribution over the unit ball , is also the product of the uniform measure over the unit sphere with a uniform measure over the unit interval of distances from the origin. We similarly define as the product of the uniform measure over the unit sphere with a uniform measure over the unit interval of distances to the origin; while we still call it uniform over the unit ball, no longer coincides, for , with the Lebesgue-uniform measure over .
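The difference between the two notions of "uniform over the unit ball" can be illustrated numerically. The sketch below is only an illustration: the function names are hypothetical, and the Lebesgue-uniform sampler relies on the standard fact that the radius of a Lebesgue-uniform point in the d-dimensional unit ball is distributed as the d-th root of a uniform variable.

```python
import math
import random


def random_direction(d, rng):
    """Direction uniform over the unit sphere, obtained by normalising a Gaussian vector."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in g))
        if norm > 0.0:
            return [x / norm for x in g]


def sample_spherical_uniform(d, rng):
    """Product of a uniform direction and an independent radius uniform over [0, 1)."""
    radius = rng.random()
    return [radius * u for u in random_direction(d, rng)]


def sample_lebesgue_uniform(d, rng):
    """Lebesgue-uniform point in the unit ball: the radius is U**(1/d), U uniform."""
    radius = rng.random() ** (1.0 / d)
    return [radius * u for u in random_direction(d, rng)]


def mean_radius(points):
    """Average distance to the origin of a list of points."""
    return sum(math.sqrt(sum(c * c for c in p)) for p in points) / len(points)
```

In dimension two, the average distance to the origin should be close to 1/2 under the product definition and close to 2/3 under the Lebesgue-uniform one, which makes the discrepancy between the two measures visible on simulated samples.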
1.4.1 Center-outward distribution and quantile functions in
Before turning to the definition of center-outward distribution and quantile functions in , we need the following property, which guarantees the existence, uniqueness and continuity of the concepts, and is borrowed, with some minor modifications, from Theorem 1.1 in Figalli (2018).
Let . Then, (i) the gradient of a convex function pushing forward to the uniform over the unit ball is unique; the set is compact and has Lebesgue measure zero;
(ii) the restriction of to is a homeomorphism from to , with inverse (defined on ) , where is the Legendre transform of ; for , however, consists of a single point, and is a homeomorphism from to ;
(iii) if has Lebesgue density , then
where the norming constant is the area of the unit sphere and the Hessian of .
The following definitions then coincide, for , with the univariate ones given in Section 1.3.2.
Let . The center-outward distribution function of is the unique gradient of a convex function mapping to the open unit ball and pushing forward to the uniform over . The corresponding (center-outward) quantile function is . Denoting by and the closed ball and the hypersphere with radius centered at the origin, the quantile function characterizes quantile regions and quantile contours , respectively, of order (i.e., with probability contents ). The elements of (a compact set with Lebesgue measure zero coinciding with and ) are called center-outward medians.
The following elementary properties of and readily follow from the definition, or are immediate consequences of Proposition 1.4.1; details are left to the reader.
Let have a density . Then, (i) is a probability integral transformation of , i.e., iff ;
(ii) for , and are homeomorphisms between and , respectively, and the center-outward median is uniquely defined; for , the restrictions of and to and are homeomorphisms between and , respectively, and the center-outward medians form a compact set of measure zero;
(iii) the quantile regions , with boundaries , are connected, compact, and nested as increases from to ; their probability content is .
The center-outward distribution and quantile functions and thus preserve the probability integral transformation nature of univariate distribution functions, and the interpretation of univariate quantile contours as the boundaries . The terminology quantile region and quantile contour of order is justified (for ) by (iii).
For any given distribution , induces a (partial) ordering of similar to the ordering induced on the unit ball by the system of polar coordinates, and actually coincides with the “vector rank transformation” considered in Chernozhukov et al. (2017); the compact support and Caffarelli assumptions made there are not needed here, though. The quantile contours also have the interpretation of depth contours associated with the Monge-Kantorovich depth concept considered in the same reference.
1.4.2 Center-outward ranks and signs in
Turning to the sample situation, let denote an -tuple of random vectors: observations, or residuals associated with some parameter of interest. Throughout, we consider the case in which the ’s are (possibly, under parameter value ) i.i.d. with density , distribution and center-outward distribution function .
For the empirical counterpart of , we propose the following extension of the univariate concept described in Section 1.3.3. Assuming , let factorize into
Next, consider a sequence of “regular grids” of points in the unit ball obtained as the intersection between
– a “regular” -tuple of unit vectors, and
– the hyperspheres centered at , with radii , ,
along with copies of the origin whenever . In theory, by a “regular” -tuple , we only mean that the sequence of uniform discrete distributions over converges weakly, as , to the uniform distribution over . In practice, each -tuple should be “as uniform as possible”. For , perfect regularity can be achieved by dividing the unit circle into arcs of equal length . Starting with , however, this typically is no longer possible. A random array of independent and uniformly distributed unit vectors does satisfy (almost surely) the weak convergence requirement. More regular deterministic arrays (with faster convergence) can be considered, though, such as the low-discrepancy sequences (see, e.g., Niederreiter (1992), Judd (1998), Dick and Pillichshammer (2014), or Santner et al. (2003)) that are current practice in numerical integration, Monte-Carlo methods, and the design of computer experiments.
The resulting grid of points then is such that the discrete distribution with probability masses at each gridpoint and probability mass at the origin converges weakly to the uniform over the ball —recall that, by uniform, we mean the product of a uniform over (the distribution of a multivariate sign) and a uniform over the unit radius (the distribution of a distance to the origin). That grid, along with the copies of the origin, is called the augmented grid ( points).
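In dimension two, where perfect regularity is achievable, the construction of the augmented grid can be sketched as follows. This is an illustration only: the function name and the argument names `n_R`, `n_S`, `n_0` are hypothetical labels for the number of radii, the number of directions, and the number of copies of the origin, and the radii r/(n_R + 1), r = 1, ..., n_R, are an assumption made for the sake of the example.

```python
import math


def augmented_grid_2d(n_R, n_S, n_0):
    """Return the n_R * n_S + n_0 points of a regular augmented grid in the unit disk.

    Assumptions of this sketch: directions are equispaced on the unit circle
    (arcs of length 2*pi/n_S) and radii are r / (n_R + 1), r = 1, ..., n_R.
    """
    # n_0 copies of the origin come first.
    grid = [(0.0, 0.0)] * n_0
    for r in range(1, n_R + 1):
        radius = r / (n_R + 1)
        for s in range(n_S):
            angle = 2.0 * math.pi * s / n_S
            grid.append((radius * math.cos(angle), radius * math.sin(angle)))
    return grid
```

For instance, `augmented_grid_2d(3, 4, 1)` returns 13 points: one copy of the origin and four equispaced points on each of three concentric circles strictly inside the unit disk.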
We then define , as the solution of an optimal coupling problem between the observations and the augmented grid. Let denote the set of all possible bijective mappings between and the points of the augmented grid just described. Under the assumption made, the ’s are all distinct with probability one, so that contains classes of indistinguishable couplings each (two couplings and are indistinguishable if for all ).
The empirical center-outward distribution function is the (random) mapping
where the set coincides with the points of the augmented grid and ranges over the possible permutations of .
is thus any of the ( with probability one) indistinguishable couplings between the observations and the points of the augmented grid that minimize, over the possible couplings, the sum (the mean) of within-pairs squared distances—a trivial and purely formal multiplicity that does not occur for or . Determining such a coupling is a standard optimal assignment problem, which clearly takes the form of a linear program for which efficient operations research algorithms are available.
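For very small samples, the optimal coupling can be found by plain enumeration of all permutations, which makes the definition concrete. The sketch below is not the algorithm one would use in practice: it has factorial cost, and its function names are hypothetical. For realistic sample sizes, a dedicated assignment solver (for example, SciPy's `scipy.optimize.linear_sum_assignment`) would be a natural substitute.

```python
from itertools import permutations


def squared_distance(x, y):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def optimal_coupling(sample, grid):
    """Brute-force optimal assignment between sample points and gridpoints.

    Returns (assignment, cost), where assignment[i] is the index of the
    gridpoint paired with sample[i] and cost is the minimal sum of
    within-pair squared distances. Only meant for tiny inputs.
    """
    if len(sample) != len(grid):
        raise ValueError("sample and grid must have the same number of points")
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(len(grid))):
        cost = sum(squared_distance(x, grid[j]) for x, j in zip(sample, perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm), best_cost


sample = [(2.0, 0.1), (-1.5, 0.3), (0.1, -2.0), (0.0, 1.8)]
grid = [(0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5)]
assignment, cost = optimal_coupling(sample, grid)
```

In this toy example, each observation is paired with the gridpoint lying in its own direction from the origin, so that `assignment` is `[0, 2, 3, 1]`.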
Reinterpreting (1.4.2)-(1.4.3) as a (conditional on the sample) expected transportation cost, the same optimal coupling(s) also constitute(s) the optimal L² transport mapping the sample empirical distribution to the uniform discrete distribution over the augmented grid (and, conversely, the two problems being entirely symmetric, the optimal L² transport mapping the uniform discrete distribution over the augmented grid to the sample empirical distribution). Classical results (see, again, McCann (1995)) then show that optimality is achieved (that is, (1.4.2)-(1.4.3) is satisfied) iff the so-called cyclical monotonicity property holds for the -tuple (1.4.4). Except for a set with Lebesgue measure zero in (those points for which the minimal distance, in (1.4.2)-(1.4.3), is the same for at least two permutations of the grid, that is, a finite collection of linear subspaces with dimension less than ), and apart from the trivial multiplicity just mentioned, the solution is unique.
A subset of is said to be cyclically monotone if, for any finite collection of points ,
The subdifferential of a convex function does enjoy cyclical monotonicity, which heuristically can be interpreted as a discrete version of the fact that a smooth convex function has a positive semi-definite second-order differential.
Note that a finite subset of is cyclically monotone iff (2.1.2) holds for ; equivalently, iff, among all pairings of and , maximizes (an empirical correlation), or minimizes (an empirical distance). In other words, a finite subset is cyclically monotone iff the couples are a solution of the optimal assignment problem with assignment cost . The L² transportation cost considered here is thus closely related to the concept of convexity and the geometric property of cyclical monotonicity; it does not play the statistical role of an estimation loss function: the L² distance between the empirical transport and its population counterpart (the expectation of which might be infinite) is never considered.
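The characterization of cyclical monotonicity for finite sets in terms of optimal assignment lends itself to a direct, if naive, numerical check. The following sketch (hypothetical function names, factorial cost, small inputs only) tests whether the given pairing minimizes the sum of squared distances among all re-pairings.

```python
from itertools import permutations


def pairing_cost(xs, ys, perm):
    """Sum of squared distances when xs[i] is paired with ys[perm[i]]."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(xs[i], ys[j]))
        for i, j in enumerate(perm)
    )


def is_cyclically_monotone(pairs, tol=1e-12):
    """Check cyclical monotonicity of a finite set of couples (x, y).

    Uses the equivalence with optimal assignment: the set is cyclically
    monotone iff no re-pairing of the x's with the y's has a smaller sum
    of squared distances than the pairing given in `pairs`.
    """
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(pairs)
    given_cost = pairing_cost(xs, ys, range(n))
    return all(
        given_cost <= pairing_cost(xs, ys, perm) + tol
        for perm in permutations(range(n))
    )
```

On the real line, a monotone increasing pairing such as `[((0.0,), (0.0,)), ((1.0,), (2.0,)), ((3.0,), (5.0,))]` passes the check, whereas pairing the smallest first coordinate with the largest second coordinate does not.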
Associated with our definition of an empirical center-outward distribution function are the following concepts of
– center-outward ranks ,
– center-outward signs , and
– center-outward quantile contours and center-outward quantile regions , where , , is an empirical probability contents, to be interpreted as a quantile order.
The contours and regions defined here are finite collections of observed points; the problem of turning them into continuous contours enclosing compact regions is treated in Chapter 2.
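Once the value of the empirical center-outward distribution function at an observation is known, the associated rank and sign are easily computed. The sketch below assumes, as a labeled assumption of this illustration, that gridpoints have norms r/(n_R + 1), so that the rank is (n_R + 1) times the norm of the gridpoint and the sign is the gridpoint rescaled to unit length, with a zero sign for observations mapped to the origin. The function and argument names are hypothetical.

```python
import math


def rank_and_sign(f_value, n_R):
    """Center-outward rank and sign derived from a gridpoint f_value.

    Assumption of this sketch: gridpoints have norm r / (n_R + 1) for an
    integer r in {0, 1, ..., n_R}; the rank is that integer r, and the sign
    is the unit vector f_value / ||f_value|| (zero vector at the origin).
    """
    norm = math.sqrt(sum(c * c for c in f_value))
    # round() absorbs floating-point error in the norm of the gridpoint.
    rank = round((n_R + 1) * norm)
    if norm > 0.0:
        sign = [c / norm for c in f_value]
    else:
        sign = [0.0] * len(f_value)
    return rank, sign
```

With `n_R = 3`, the gridpoint `(0.0, 0.5)` yields rank 2 and sign `[0.0, 1.0]`; an empirical quantile contour of a given order then consists of the observations sharing the corresponding rank.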
Up to this point, we have defined multivariate generalizations of the univariate concepts of center-outward distribution and quantile functions, center-outward ranks and signs, all reducing to their univariate analogues in case . However, it remains to show that those extensions are adequate in the sense that they enjoy in the properties that account for the inferential success of their univariate counterparts, namely,
– a Glivenko-Cantelli-type asymptotic relation between and ,
– finite- distribution-freeness (with respect to ), and
– the maximal invariance property leading to semiparametric efficiency preservation.
Let be i.i.d. with distribution . Then,
This proposition considerably reinforces, under more general assumptions (no second-order moments), an early strong consistency result by Cuesta-Albertos et al. (1997).
Section 1.4 so far only provides a definition of at the sample values . If is to be extended to , an interpolation , similar for instance to the one shown, for , in Figure 1.1, has to be constructed. Such interpolations should belong to the class of gradients of convex functions from to , so that the resulting contours have the nature of continuous quantile contours. Moreover, they should still enjoy (now under a form similar to (1.3.2)) the Glivenko-Cantelli property. Constructing such interpolations is considerably more delicate for than in the univariate case, and is the subject of Chapter 2, where we also refer to for numerical implementation and pictures. It should be stressed, though, that the form (1.5.1) of the Glivenko-Cantelli result is not really restrictive, as interpolations do not bring any additional information, and are mainly intended for a graphical depiction of contours (in dimension , thus).
Proposition 1.5.1 has an important corollary in the case of elliptical densities. Recall that a -dimensional random vector has elliptical distribution with location , positive definite symmetric scatter matrix and radial density iff has spherical distribution , which holds iff where , with density , is the distribution function of .
The mapping is thus a probability-integral transformation. Chernozhukov et al. (2017) show (Section 2.4) that it actually coincides with ’s center-outward distribution function . Letting , be i.i.d. with elliptical distribution , denote by and consistent estimators of and , respectively: the empirical version of , based on Mahalanobis ranks and signs (the ranks of the estimated residuals and the corresponding unit vectors ) is, for the th observation, .
Let , be i.i.d. with elliptical distribution , and assume that and are strongly consistent estimators of and , respectively. Then, and coincide, and