1 Introduction
This paper shows how methods from stochastic convex geometry can be successfully used in the foundations of data science. Before we discuss the geometric results, we discuss their implications:
Logistic regression is perhaps the most widely used nonlinear model in multivariate statistics and supervised learning [12]. Statistical inference for this model relies on the theory of maximum likelihood estimation. In the binary classification case, given $n$ independent observations $(x_i, y_i)$, $i = 1, \dots, n$, logistic regression links the response $y_i \in \{-1, 1\}$ to the covariates $x_i \in \mathbb{R}^d$ via the logistic model
$$P(y_i = 1 \mid x_i) = \sigma(x_i^T \beta), \qquad \sigma(t) = \frac{e^t}{1 + e^t};$$
here $\beta \in \mathbb{R}^d$ is the unknown vector of regression coefficients. In this model, the log-likelihood is given by
$$\ell(b) = -\sum_{i=1}^n \log\left(1 + \exp(-y_i x_i^T b)\right)$$
and, by definition, the maximum likelihood estimate (MLE) is any maximizer of this functional. The basic intuition behind this method is as follows: we seek coefficients $b$ so that $x_i^T b$ corresponds as closely as possible with the observations $y_i$. For example, if $x_i^T b$ and $y_i$ have different signs, there is a larger “penalty” expressed in the log-likelihood, since in that case $-\log(1 + \exp(-y_i x_i^T b)) < -\log 2$. See [10] for further discussion and examples.
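As a minimal sketch of the log-likelihood described above, the following assumes labels in $\{-1, +1\}$ and the standard form $\ell(b) = -\sum_i \log(1 + \exp(-y_i x_i^T b))$; the function name and the toy data are hypothetical and only serve as illustration.

```python
import math
from typing import Sequence


def log_likelihood(b: Sequence[float],
                   xs: Sequence[Sequence[float]],
                   ys: Sequence[int]) -> float:
    """Logistic log-likelihood for labels in {-1, +1} (assumed convention)."""
    total = 0.0
    for x, y in zip(xs, ys):
        margin = y * sum(bj * xj for bj, xj in zip(b, x))
        # log(1 + exp(-margin)), evaluated in a numerically stable way
        if margin >= 0:
            total -= math.log1p(math.exp(-margin))
        else:
            total -= -margin + math.log1p(math.exp(margin))
    return total


# Toy one-dimensional data that is separated by the origin (illustrative only).
xs = [[1.0], [2.0], [-1.0]]
ys = [1, 1, -1]
for scale in (0.0, 1.0, 10.0):
    print(scale, log_likelihood([scale], xs, ys))
```

On this separated toy data the log-likelihood keeps increasing toward zero as the coefficient is scaled up, which illustrates why no finite maximizer exists when the two classes can be separated.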
One difficulty arising in machine learning is that the MLE does not exist in all situations. In fact, given two data classes, say one of red points (where $y_i = 1$), and one of blue points (where $y_i = -1$), it is well known that an MLE exists if and only if the convex hull of the blue points intersects the convex hull of the red points [1, 18]. Although an appealing criterion for existence, this geometric characterization leads to another question: How much training data do we need, as a function of the dimension of the covariates of the data, before we expect an MLE to exist with high probability?
The seminal work of Cover [6] (adapting a technique originally due to Schläfli [17]) provides an answer in a special case. When applied to logistic regression, Cover’s main result states the following: assume that the $x_i$’s are drawn i.i.d. from a continuous probability distribution and that the class labels $y_i$ are independent from $x_i$, and have equal marginal probabilities; i.e., $P(y_i = 1) = P(y_i = -1) = 1/2$. Then Cover showed that as $n$ and $d$ grow large in such a way that $d/n \to \kappa$, the convex hulls of the data points asymptotically overlap, with probability tending to one, if $\kappa < 1/2$, whereas they are separated, also with probability tending to one, if $\kappa > 1/2$. When the class labels $y_i$ are not independent from the $x_i$, the problem is more difficult. In this case, Candès and Sur [3] proved that a similar phase transition occurs, and is parameterized by two scalars measuring the overall magnitude of the unknown sequence of regression coefficients.
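The phase transition in Cover's result can be explored numerically. The sketch below assumes Cover's classical counting formula, under which $2\sum_{k=0}^{d}\binom{n-1}{k}$ of the $2^n$ two-colorings of $n$ points in general position in $\mathbb{R}^d$ are separable by an affine hyperplane; the function name is hypothetical.

```python
from math import comb


def cover_separable_probability(n: int, d: int) -> float:
    """Probability that a uniformly random 2-coloring of n points in general
    position in R^d is separable by an affine hyperplane (Cover's count)."""
    if n <= 0:
        return 1.0
    separable_colorings = 2 * sum(comb(n - 1, k) for k in range(d + 1))
    return min(1.0, separable_colorings / 2 ** n)


# The probability of separation drops from 1 to 0 around n = 2(d + 1).
d = 20
for n in (d + 1, 2 * (d + 1), 4 * (d + 1)):
    print(n, cover_separable_probability(n, d))
```

At $n = 2(d+1)$ the value is exactly $1/2$, which matches the threshold $d/n \approx 1/2$ between the separated and overlapping regimes.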
Tukey introduced a notion of depth for a point $p$ relative to a data cloud $S$, as the smallest number of data points in a closed halfspace with boundary through $p$ (see [21, 16] and references therein). We say a point $p$ has halfspace depth $k$ in $S$ if every halfspace containing $p$ contains at least $k$ points in $S$. A centerpoint of an $n$-point data set $S \subset \mathbb{R}^d$ is a point $p$ such that every halfspace containing $p$ has at least $n/(d+1)$ points in $S$; thus it is a point of depth at least $n/(d+1)$. In a way, a centerpoint is a generalization of the notion of median for high-dimensional data. Centerpoints are useful in a variety of applications (see e.g., [7] for references). Unfortunately, obtaining a centerpoint is difficult, and the current best randomized algorithm constructs a centerpoint in time $O(n^{d-1})$ [4, 13]. Thus finding an approximate centerpoint of a set is of interest.
Consequences of our geometric results
The first contribution of our paper is to further develop the connection between geometric probability (Cover’s result), discrete geometry (Tverberg-type results), and the conditions for the existence of MLEs. Our paper discusses the generalization of Cover’s stochastic separation problem to more than two colors by studying so-called Tverberg partitions: partitions of a data set into classes so that the intersection of all the convex hulls of the classes is nonempty.
Each of our stochastic-geometric theorems has a nice implication. Table 1 summarizes our theorems (middle column) as well as their consequences for the existence of the maximum-likelihood estimator in terms of the size of the data set (right column).
Table 1
Deterministic version | Stochastic version | Likely MLE existence
Radon | Cover’s Theorem [6] | pair of data classes (mentioned above)
Tverberg | Thms 4, 5 | all pairs of data classes (Theorem 2 part 1.)
Radon with tolerance | Thm 7 | pair of data classes with outliers removed (Theorem 2 part 2.)
Tverberg with tolerance | Thms 2, 6, [19] | all pairs of data classes with outliers removed (Theorem 2 part 2.)
There are two common approaches to extend binary classification to multiclass classification: “one-vs-rest” and “one-vs-one”. Suppose the data has $m$ classes. In “one-vs-rest”, we train $m$ separate binary classification models. Each classifier $f_j$ for $j = 1, \dots, m$ is trained to determine whether or not an example is part of class $j$. To predict the class for a new sample $x$, we run all $m$ classifiers on $x$ and choose the class with the highest score: $\hat{y} = \arg\max_{j} f_j(x)$.
In “one-vs-one” regression, we train $\binom{m}{2}$ separate binary classification models, one for each possible pair of classes. To predict the class for a new sample $x$, we run all $\binom{m}{2}$ classifiers on $x$ and choose the class with the most votes.
To apply “one-vs-one” multinomial logistic regression, we would like to ensure that the MLE exists between the data corresponding to every pair of labels. The next theorem applies our stochastic Tverberg theorem to give a sufficient condition for all these MLEs to exist with high probability (a sequence of events $E_n$ occurs with high probability if $\lim_{n \to \infty} P(E_n) = 1$).
Theorem 1 (Stochastic Tverberg theorems applied to multinomial regression)
Fix .
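Assume that the $x_i$’s are drawn i.i.d. from a centrally symmetric continuous probability distribution on $\mathbb{R}^d$ and that the class labels $y_i$ are independent from $x_i$, and have equal marginal probabilities; i.e., $P(y_i = j) = 1/m$ for all labels $j \in \{1, \dots, m\}$, where $m$ is the number of labels.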
Then

Letting the number of data points grow as a function of the number of labels , the MLE exists between the data corresponding to every pair of labels with high probability as long as

Suppose the number of data points is , where we fix the number of labels, and is a function of  the number of outliers to be removed from the data set. Then the MLE exists between the data corresponding to every pair of labels with high probability if any points are removed, so long as .
The same bound applies to “one-vs-rest” logistic regression, since MLE existence in that case is a weaker condition. The various special cases of stochastic Tverberg theorems are thus useful in different kinds of classification problems, and these observations are summarized in Table 1.
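The two multiclass schemes discussed above differ only in how binary scores are combined at prediction time. The sketch below makes this concrete; the scorers are hypothetical stand-ins for fitted binary logistic models, and the tie-breaking rule (smallest class index) is an assumption made for the example.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

Sample = Sequence[float]
Scorer = Callable[[Sample], float]


def predict_one_vs_rest(scorers: Sequence[Scorer], x: Sample) -> int:
    """Return the class whose 'class j vs. rest' scorer is largest."""
    scores = [f(x) for f in scorers]
    return max(range(len(scores)), key=scores.__getitem__)


def predict_one_vs_one(pairwise: Dict[Tuple[int, int], Scorer], x: Sample) -> int:
    """The scorer for (i, j) votes for i if positive, else j; most votes wins."""
    votes: Counter = Counter()
    for (i, j), f in pairwise.items():
        votes[i if f(x) > 0 else j] += 1
    best = max(votes.values())
    return min(c for c, v in votes.items() if v == best)


# Hypothetical 1-D scorers built from class centers (illustration only).
centers = [-2.0, 0.0, 2.0]
ovr = [lambda x, c=c: -(x[0] - c) ** 2 for c in centers]
ovo = {
    (i, j): (lambda x, i=i, j=j: (x[0] - centers[j]) ** 2 - (x[0] - centers[i]) ** 2)
    for i, j in combinations(range(len(centers)), 2)
}
print(predict_one_vs_rest(ovr, [1.7]), predict_one_vs_one(ovo, [1.7]))
```

With $m$ classes the one-vs-one dictionary holds $\binom{m}{2}$ scorers, so each of those pairwise problems needs its own MLE to exist.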
The last two rows of Table 1 were motivated by the challenge of dealing with outlier data. In seeking robust classification of data, we rely on an additional parameter: tolerance. A tolerant partition will be defined formally later; informally, it is a notion of “robust” intersection, in the sense that the intersection of the convex hulls of the subsets remains nonempty even after any $t$ points are removed, where $t$ is the tolerance. See Figure 1 for an example of a tolerant partition.
The parameter of tolerance is also significant in studying MLE existence. A natural observation is that tolerant partitions correspond to robust MLE existence: any $t$ points, possibly corrupted or outlier data, can be removed and still the convex hulls of the data with each label intersect.
In fact, the parameter of tolerance is also similar to an important parameter used to guarantee the convergence speed of first-order methods for finding MLEs. Recently, when studying binomial logistic regression, Freund, Grigas and Mazumder [9] introduced the following notion to quantify the extent to which a dataset is nonseparable (where $[a]^-$ denotes the negative part of $a$):
$$\mathrm{DegNSEP}^* := \min_{\beta \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^n [y_i \beta^T x_i]^- \quad \text{s.t. } \|\beta\| = 1.$$
DegNSEP* is thus the smallest (over all normalized models $\beta$) average misclassification error of the model over the $n$ observations. They showed that the condition number DegNSEP* informs the computational properties and guarantees of the standard deterministic first-order steepest descent solution method for logistic regression. Let us now briefly discuss how the parameter of tolerance for Radon partitions (Tverberg 2-partitions) can be viewed as a discrete analogue of DegNSEP*.
Define PertSEP* as the smallest (or more precisely, the infimum thereof) perturbation of the feature data which will render the perturbed problem instance separable. Namely,
In Proposition 2.4 of [9] it is shown that DegNSEP* = PertSEP*.
In this paper we introduce a new parameter simply defined as the norm of the smallest perturbation of the feature data which will render the perturbed problem instance separable. In other words, it is the minimal number of data points we could move to make the data set separable, normalized by the total number of data points. Namely,
The following theorem shows that the tolerance of a Radon partition is given by :
Theorem 2
Suppose that , is a Radon partition with tolerance precisely equal to . Then viewing as a labeled dataset (with ), we have that
Theorem 2, combined with a result of Soberón, has a corollary, stated precisely in the next section, which roughly says that of a randomly bipartitioned point set asymptotically approaches . This is the highest possible value one could hope for since, by definition, of any two class data set is bounded above by . In fact, this result extends easily to the multiclass setting. In other words, for a large randomly partitioned data set, we expect of every pair of data classes to be close to  independent of both the dimension of the covariates, as well as the number of classes .
For further discussion of PertSEP* and for two-class data, including more probabilistic aspects of these condition numbers and many interesting implications for steepest descent algorithms, see [9].
We also discuss how our geometric probability results are related to the problem of computing approximations to centerpoints of datasets. Table 2 and the discussion that follows summarize our contributions. Tverberg’s theorem implies that every data set has a centerpoint, as the Tverberg intersection point of a Tverberg partition must be a point of halfspace depth one in each of the color classes. Hence an effective version is desirable as a method to obtain centerpoints. The proof of Radon’s lemma is constructive and, in fact, one of the most notable randomized algorithms for computing approximate centerpoints works by repeatedly replacing subsets of points by their Radon point. In contrast, no known polynomial time algorithm exists for computing exact Tverberg points. Thus, fast algorithms for approximate Tverberg points have been introduced in [5, 14, 15]. If one is interested in probabilistic algorithms for finding Tverberg partitions, the main results of our paper can be used to give expected performance of algorithms where we obtain Tverberg partitions by random choice, so long as the points come from a balanced distribution.
In particular, our Theorem 5 suggests a trivial algorithm for finding a Tverberg partition among a set of i.i.d. points drawn from a distribution which is balanced about a point . According to Theorem 5, a random equipartition of such points into less than sets should produce a Tverberg partition with high probability. This trivial nondeterministic algorithm was also suggested by Soberón, except using a random allocation rather than equipartition. Our asymptotic results improve the bounds on expected performance of Soberón’s proposed algorithm (random allocation) for points from a balanced distribution as well. We summarize the performance and time complexity of various algorithms for obtaining Tverberg partitions, including our own (last two rows), in Table 2.
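The trivial randomized procedure described above can be sketched in dimension one, where convex hulls are intervals and the Tverberg property is easy to check. This is only an illustration under stated assumptions: the points are standard normal (hence balanced about the origin), and all function names are hypothetical.

```python
import random
from typing import List, Sequence


def random_equipartition(points: Sequence[float], m: int,
                         rng: random.Random) -> List[List[float]]:
    """Shuffle the points and deal them into m color classes of equal size."""
    if len(points) % m != 0:
        raise ValueError("the number of points must be divisible by m")
    shuffled = list(points)
    rng.shuffle(shuffled)
    return [shuffled[i::m] for i in range(m)]


def is_tverberg_1d(classes: Sequence[Sequence[float]]) -> bool:
    """In 1-D each hull is [min, max]; the hulls share a point iff the
    largest left endpoint is at most the smallest right endpoint."""
    if any(len(c) == 0 for c in classes):
        return False
    return max(min(c) for c in classes) <= min(max(c) for c in classes)


def estimate_success_rate(m: int, n: int, trials: int, seed: int = 0) -> float:
    """Fraction of random equipartitions (m colors, n points each) that are Tverberg."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pts = [rng.gauss(0.0, 1.0) for _ in range(m * n)]
        hits += is_tverberg_1d(random_equipartition(pts, m, rng))
    return hits / trials


print(estimate_success_rate(m=4, n=3, trials=2000))
print(estimate_success_rate(m=4, n=12, trials=2000))
```

Increasing the number of points per color while keeping the number of colors fixed should push the empirical success rate toward one, which is the qualitative behavior the threshold theorem describes.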
2 Our geometric methods: Stochastic Tverberg-type theorems
We begin by recalling Tverberg’s celebrated theorem [22], which generalizes Radon’s lemma to $m$-partitions (see [2, 7] for references and the importance of this theorem):
Theorem (H. Tverberg 1966)
Every set $S$ with at least $(d+1)(m-1)+1$ points in Euclidean $d$-space has at least one Tverberg $m$-partition (with tolerance zero).
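The notion of “tolerant Tverberg theorems” was pioneered by Larman [11] and refined over the years, such as in the following result due to Soberón and Strausz [20]. Here is the precise definition:
Definition 1
Given a set $S \subset \mathbb{R}^d$, a Tverberg $m$-partition of $S$ with tolerance $t$ is a partition of $S$ into $m$ subsets $S_1, \dots, S_m$ with the property that all convex hulls of the $S_i$ intersect after any $t$ points are removed. In other words, for all $\{x_1, \dots, x_t\} \subset S$, we have
$$\bigcap_{i=1}^{m} \mathrm{conv}\left(S_i \setminus \{x_1, \dots, x_t\}\right) \neq \emptyset.$$
Theorem (Soberón, Strausz 2012)
Every set $S$ with at least $(t+1)(m-1)(d+1)+1$ points in $\mathbb{R}^d$ has at least one Tverberg $m$-partition with tolerance $t$. In other words, $S$ can be partitioned into $m$ parts $S_1, \dots, S_m$ so that for all $\{x_1, \dots, x_t\} \subset S$, we have
$$\bigcap_{i=1}^{m} \mathrm{conv}\left(S_i \setminus \{x_1, \dots, x_t\}\right) \neq \emptyset.$$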
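The definition of tolerance can be checked by brute force on small examples. The sketch below works in dimension one, where convex hulls are intervals, and tries every possible set of removed points; it is exponential in the number of points and is meant only to illustrate the definition. All names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def hulls_intersect_1d(classes: Sequence[Sequence[float]]) -> bool:
    """1-D hulls are intervals; an empty class has an empty hull."""
    if any(len(c) == 0 for c in classes):
        return False
    return max(min(c) for c in classes) <= min(max(c) for c in classes)


def tolerance_1d(classes: Sequence[Sequence[float]]) -> int:
    """Largest t such that the hulls still intersect after removing ANY t points.
    Returns -1 if the classes are not a Tverberg partition at all."""
    labeled = [(ci, pi) for ci, c in enumerate(classes) for pi in range(len(c))]
    for t in range(len(labeled) + 1):
        for removed in combinations(labeled, t):
            gone = set(removed)
            remaining = [
                [x for pi, x in enumerate(c) if (ci, pi) not in gone]
                for ci, c in enumerate(classes)
            ]
            if not hulls_intersect_1d(remaining):
                # Some set of t points destroys the intersection.
                return t - 1
    raise AssertionError("unreachable: removing every point empties all hulls")


# Interleaved red and blue points on a line (toy example).
red = [0.0, 1.0, 2.0, 3.0]
blue = [0.5, 1.5, 2.5]
print(tolerance_1d([red, blue]))
```

For two classes, the returned value plus one is the minimal number of points whose removal makes the data separable, which is the quantity the robust-separability discussion in the introduction is concerned with.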
More recently, P. Soberón proved the following bound [19]. Let denote the smallest positive integer such that a Tverberg partition with tolerance exists among any points in dimension . Then for fixed and . The proof of this result relies on the probabilistic method and, as Soberón remarked, can in fact be used to prove a Stochastic Tverberg-type result, which we will revisit later.
Prior Stochastic Tverberg theorems
Before stating our main results, we introduce two models for random partitioned data point sets. In both models we will use the term colors instead of subsets, for ease of notation. Hereafter, when we refer to a continuous distribution on $\mathbb{R}^d$, we mean continuous with respect to the Lebesgue measure on $\mathbb{R}^d$. We defer proofs of the new results stated until the next section.
Our first model is a so-called random equipartition model, i.e., we ensure that every color has the same number of points. More specifically, given integers and and a continuous probability distribution on , we let denote a random equipartitioned point set with points, consisting of colors, and points of each color, distributed independently according to .
Our second model is a random allocation model: Given integers and and a continuous probability distribution on , we let denote a random point set with points i.i.d. according to , which are randomly colored one of colors with uniform probability ( for each color).
For example, using these models we can state Cover’s result as follows:
Theorem (T. Cover 1965)
If is a continuous probability distribution on , then
In particular, we have
Furthermore, for any and any sequence of continuous probability distributions where each is a distribution on , we have
and
To the best of the authors’ knowledge, the first generalization of Cover’s 1965 result to more than two colors appeared only recently in Soberón’s paper [19]:
Theorem (P. Soberón 2018)
Let be positive integers and let be a real number. Given points in , a random allocation of them into parts is a Tverberg partition with tolerance with probability at least , as long as
This result is quite remarkable. For any fixed and , it shows that the probability of a random allocation of points in in colors having tolerance at least approaches one as goes to infinity. On the other hand, by the pigeonhole principle, any allocation of points into colors must have one color with at most points. Thus, for a fixed number of colors , the tolerance of a random partition is asymptotically as high as it could possibly be! By Theorem 2, this result yields the following corollary.
Corollary 1
For any sequence of partitioned point sets with a distribution on , and any , we have with high probability.
In fact, for fixed and , Corollary 1 can be extended to the multiclass setting. In other words, for a large randomly partitioned data set, we expect of every pair of data classes to be close to :
Theorem 3
Fix . For any distribution on and any sequence of partitioned point sets
we have
with high probability.
Our new stochastic geometric theorems
Our first theorem is a geometric probability result similar to Soberón’s and Cover’s. It yields a Stochastic Tverberg theorem for equipartitions (without tolerance).
Theorem 4 (Stochastic Tverberg theorem for equipartitions)
Suppose is a probability distribution on that is balanced about some point
, in the sense that every hyperplane through
partitions into two sets of equal measure. Then
In fact, the previous theorem is asymptotically tight in the number of colors . This is shown by our next theorem, which establishes an interesting threshold phenomenon for Tverberg partitions.
Theorem 5 (Tverberg Threshold Phenomena for equipartitions)
Let be a continuous probability distribution in balanced about some point . Consider the sequence of random equipartitioned point sets , where , and depends on . Then is Tverberg with high probability if , and is not Tverberg with high probability if .
Remark: It is also interesting to consider the same problem from the “box convexity” setting, where the convex hull of a set of points is defined to be the smallest box (with sides parallel to the coordinate axes) enclosing those points. Since checking convex hull membership is easier in the box convexity setting, this setup may be more relevant in certain applications. Our method of proof of Theorem 4 also works in the box convexity setting, and we obtain the same bounds.
We note that the number of points needed to reach the conclusion in Theorem 5 is independent of the dimension, as in the aforementioned result of Soberón [19].
The next two theorems adapt both Cover’s result and Theorem 4 to the setting of tolerance.
Theorem 6 (Stochastic Tverberg with tolerance for equipartition)
Suppose is a probability distribution on that is balanced about some point .
For the case of random bipartitions, we can adapt Cover’s result to obtain a Stochastic Radon theorem with Tolerance.
Theorem 7 (Stochastic Radon with tolerance for random allocation)
If is a continuous probability distribution on , then
In particular, we have
Remark: Theorem 7 yields a weaker expected tolerance than Soberón’s result, but the proof is shorter and more elementary.
For random allocations with more than two colors, we will use some developments on random allocation problems, including the following notation. If balls are thrown into urns uniformly and independently, let equal the number of throws necessary to obtain at least balls in each urn.
Corollary 2 (Stochastic Tverberg for random allocation)
Suppose is a probability distribution on that is balanced about some point .

Then

For the case of Tverberg without tolerance, we also have

Suppose , is a sequence of random partitioned point sets, where depends on .
Then is Tverberg with high probability if .
These results are improvements on Soberón’s bound when the number of colors is large relative to the desired tolerance.
3 Proofs of our stochastic results
Proof of Theorem 2
Let denote the minimal number of points perturbed among any perturbation that makes separable, and denote the minimal number of points needing to be removed from to make separable. Then is equal to , and the tolerance of , is equal to . It suffices to show that . To see that , note if in are moved so that the resulting set is separable, then is also separable. To see that , suppose that is separable by a hyperplane. Then moving to the appropriate sides of the hyperplane determined by , we can construct a separable dataset , obtained from moving points from .
Proof of Theorem 3
For fixed and , let denote the event that a random allocation of points in in colors has tolerance at least . By Soberón’s theorem above, asymptotically approaches one as goes to infinity. Now, for fixed and , let denote the event that a random allocation of points into colors has between and points of color , where
. By the law of large numbers,
approaches one as goes to infinity. As the events , where , all have probability approaching one, the probability of the intersection of all these events also approaches one. This can be seen by applying the union bound to their complements. Thus there exists such that the , where , simultaneously occur with probability . Therefore with probability , each pair of colors has at most points, and is a Radon partition of tolerance at least (the tolerance of each bipartition is a priori bounded below by the tolerance of the partition). By Theorem 2, of each pair is at least with probability . Since and were arbitrary, this completes the proof.
Proof of the lower bound in Theorem 4
After a possible translation, we can assume without loss of generality that the distribution is balanced about the origin. We will prove the lower bound by bounding from below the probability that the origin is a Tverberg point. We may assume without loss of generality that none of the randomly selected points are the origin. Furthermore we can radially project the points onto a sphere of radius smaller than the minimal norm of the sampled points, since that will not affect whether the origin is a Tverberg point. After this projection, we may assume the points are uniformly sampled on a small sphere centered at the origin. The origin is then a Tverberg point as long as the points from each color contain the origin in their convex hull. This is equivalent to showing no color has all of its points contained in one hemisphere. For a fixed color, the probability of the points of that color being contained in one hemisphere was computed by Wagner and Welzl [23] (generalizing the celebrated result of Wendel [24] addressing the case when the distribution is rotationally invariant about the origin) as
$$\frac{1}{2^{n-1}} \sum_{k=0}^{d-1} \binom{n-1}{k}, \qquad (1)$$
where $n$ is the number of points of that color and $d$ is the dimension.
Using this to compute the probability that none of the color classes is contained in one hemisphere we obtain the desired bound above.
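The computation in this proof can be sketched numerically. The code below assumes the Wendel-type formula $2^{-(n-1)}\sum_{k=0}^{d-1}\binom{n-1}{k}$ for the probability that $n$ points of one color lie in a common hemisphere, and multiplies over $m$ independent colors; the function names are hypothetical and the product is the probability that the origin is a Tverberg point, not necessarily the exact form in which the theorem states its bound.

```python
from math import comb


def hemisphere_probability(n: int, d: int) -> float:
    """P(n points from a distribution balanced about the origin in R^d
    all lie in one closed halfspace through the origin)."""
    if n <= 0:
        return 1.0
    return sum(comb(n - 1, k) for k in range(d)) / 2 ** (n - 1)


def origin_is_tverberg_point_probability(m: int, n: int, d: int) -> float:
    """P(every one of m independent colors, n points each, has the origin
    in its convex hull), using independence across colors."""
    return (1.0 - hemisphere_probability(n, d)) ** m


for n in (4, 8, 16, 32):
    print(n, origin_is_tverberg_point_probability(m=10, n=n, d=2))
```

For fixed dimension and number of colors the value tends to one as the number of points per color grows, since the hemisphere probability decays geometrically in $n$.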
Proof of the upper bound in Theorem 4
Again, we assume without loss of generality that is balanced about the origin. We will first treat the case , and then explain how to obtain the bound for arbitrary . To bound the probability of a Tverberg partition from above, we bound the probability of the complement below. We let denote the event that the convex hulls have empty intersection. In dimension one, is contained in the event that there is at least one color class with all points less than zero, and at least one color class with all points greater than zero. Since we assume that the origin equipartitions , we can rephrase this as the probability that among people each flipping fair coins, there is at least one person with all heads and at least one person with all tails. In other words, denoting by and the events that at least one person gets all heads or tails respectively, we have . We have
Since and , this yields
The probability of a Tverberg partition is thus bounded as follows
This proves the desired bound for dimension one. For higher dimensions, we note that if we let , denote the projection onto the th axis for , we have that the signs of
are independent Bernoulli random variables with probability
(as the hyperplane orthogonal to the th axis equipartitions by the assumption that is balanced about the origin). Thus to have a Tverberg partition, we must have that no pair of the color classes are separated by the origin after projecting onto the coordinate axes. Since these events are independent, the probability of this happening is bounded as follows.
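The one-dimensional coin-flipping event used in this proof has a closed form by inclusion-exclusion, which the sketch below compares against exhaustive enumeration. Here $m$ people each flip $n \ge 1$ fair coins; the function names are hypothetical, and the formula is elementary probability rather than a restatement of the bound in the theorem.

```python
from itertools import product


def prob_all_heads_and_all_tails(m: int, n: int) -> float:
    """P(at least one person flips all heads AND at least one flips all tails)."""
    nobody_all_heads = (1 - 0.5 ** n) ** m      # same value for all tails
    nobody_either = (1 - 2 * 0.5 ** n) ** m     # the two events are disjoint per person
    return 1 - 2 * nobody_all_heads + nobody_either


def brute_force(m: int, n: int) -> float:
    """Enumerate all 2^(m*n) outcomes; only feasible for tiny m and n."""
    count = 0
    for flips in product((0, 1), repeat=m * n):
        people = [flips[i * n:(i + 1) * n] for i in range(m)]
        has_all_heads = any(all(p) for p in people)
        has_all_tails = any(not any(p) for p in people)
        count += has_all_heads and has_all_tails
    return count / 2 ** (m * n)


print(prob_all_heads_and_all_tails(2, 2), brute_force(2, 2))
```

In dimension one this is the probability that some color lies entirely to the left of the origin while another lies entirely to the right, which forces the convex hulls to have empty intersection.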
Proof of Theorem 5
We will show that is Tverberg with high probability if . Fix an . We set and apply the lower bound in Theorem 4 to deduce that
Choosing a constant so that , we have
We will show that the limit as approaches infinity of the left hand side is bigger than for any . Fix . As , there exists an such that for all . Consequently for all . Thus
Since was arbitrary, we see that the probability of a Tverberg partition tends to 1.
Now we show that is not Tverberg with high probability if . As before, we fix an $\epsilon$ greater than zero and apply the upper bound in Theorem 4 with to obtain
For any , when is large, both terms inside the parentheses are smaller than . Since , the probability of a Tverberg partition converges to zero as approaches infinity.
Proof of Theorem 6
Again, we assume without loss of generality that is balanced about . Let denote the set of points of some fixed color. Then we assume that , and we can partition into subsets with for each . By Wagner and Welzl’s result (Equation (1) above), for each , contains the origin with probability at least . By independence, the probability that fewer than of the contain the origin is less than . On the other hand, if at least of the contain the origin, then by the pigeonhole principle contains the origin for any . Thus, with probability at least , we have that contains the origin. Since this probability is independent for each of the colors, the result follows.
Using a similar strategy combined with Cover’s result, we give the proof of Theorem 7 below.
Proof of Theorem 7
Given points in colored red and blue by random allocation, we arbitrarily partition them into groups of size at least . By Cover’s result, for each fixed group, the convex hulls (of the red and blue points) in that group intersect with probability at least . For each of the groups, we think of the event that the convex hulls in that group intersect as a “success”. Then the probability that at least groups have intersecting convex hulls is bounded below by the probability that a binomial process with trials and success probability has at least total successes. Computing this binomial probability yields the theorem. (If at least groups have intersecting convex hulls, then removing at most points leaves at least one group with intersecting convex hulls.)
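The binomial computation at the end of this proof can be sketched as follows. The number of groups, the per-group success probability, and the required number of successes are left as parameters with hypothetical names, since the specific values depend on the theorem statement.

```python
from math import comb


def binomial_tail(trials: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(trials, p)."""
    if k <= 0:
        return 1.0
    return sum(
        comb(trials, j) * p ** j * (1 - p) ** (trials - j)
        for j in range(k, trials + 1)
    )


# Illustrative values only: 10 groups, each succeeding with probability 0.9,
# and we ask for at least 4 successful groups.
print(binomial_tail(trials=10, p=0.9, k=4))
```

If at least $k$ groups have intersecting hulls, removing fewer than $k$ points cannot touch every such group, which is why a tail bound on the number of successful groups translates into a tolerance guarantee.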
Proof of Corollary 2
We split the proof according to the three respective parts of the statement.

The probability that a random allocation of points into colors is a Tverberg partition with tolerance is bounded below by the probability that a random allocation of points into colors has at least points per color, times the probability that an equipartition of points into colors is Tverberg with tolerance . The result for Tverberg with tolerance then follows from Theorem 6.

To show the asymptotic result, we use a result on urn models due to Erdős and Rényi [8] saying that
This implies that for any and sequence of points allocated into urns, we have at least points in each urn with high probability. Then we apply Theorem 5, which says that any equipartition of a point set into colors and points per color is Tverberg with high probability.
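The urn quantity used in this argument, the number of throws needed until every urn holds at least a prescribed number of balls, is easy to simulate. The sketch below is a plain Monte Carlo illustration with hypothetical names; it does not reproduce the asymptotic formula of Erdős and Rényi.

```python
import random


def throws_until_each_urn_has(m: int, t: int, rng: random.Random) -> int:
    """Throw balls uniformly into m urns until every urn has at least t balls;
    return the number of throws used."""
    counts = [0] * m
    urns_short = m if t > 0 else 0
    throws = 0
    while urns_short > 0:
        u = rng.randrange(m)
        counts[u] += 1
        throws += 1
        if counts[u] == t:
            # This urn has just reached the required number of balls.
            urns_short -= 1
    return throws


rng = random.Random(0)
samples = [throws_until_each_urn_has(m=50, t=3, rng=rng) for _ in range(200)]
print(sum(samples) / len(samples))
```

In the random allocation model the urns play the role of colors, so this count is the number of points one must allocate before every color class is large enough for the equipartition results to apply.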
4 Acknowledgments
This work was partially supported by NSF grants DMS-1522158 and DMS-1818169. We are grateful to David Rolnick and Pablo Soberón for their comments.
References
[1] A. Albert and J. A. Anderson, On the existence of maximum likelihood estimates in logistic regression models, Biometrika, 71 (1984), pp. 1–10, https://doi.org/10.1093/biomet/71.1.1.
[2] I. Bárány and P. Soberón, Tverberg’s theorem is 50 years old: A survey, Bull. Amer. Math. Soc., 55 (2018), pp. 459–492, https://doi.org/10.1090/bull/1634.
[3] E. J. Candès and P. Sur, The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression, 2019. To appear in Annals of Statistics.
[4] T. M. Chan, An optimal randomized algorithm for maximum Tukey depth, in Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, ACM, New York, 2004, pp. 430–436.
[5] K. L. Clarkson, D. Eppstein, G. Miller, C. Sturtivant, and S. Teng, Approximating center points with iterative Radon points, International Journal of Computational Geometry & Applications, 6 (1996), pp. 357–377.
[6] T. M. Cover, Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition, Electronic Computers, IEEE Transactions on, EC-14 (1965), pp. 326–334, https://doi.org/10.1109/PGEC.1965.264137.
[7] J. A. De Loera, X. Goaoc, F. Meunier, and N. H. Mustafa, The discrete yet ubiquitous theorems of Carathéodory, Helly, Sperner, Tucker, and Tverberg, Bull. Amer. Math. Soc. (N.S.), 56 (2019), pp. 415–511, https://doi.org/10.1090/bull/1653.
[8] P. Erdős and A. Rényi, On a classical problem of probability theory, Magyar Tud. Akad. Mat. Kutató Int. Közl., 6 (1961), pp. 215–220.
[9] R. M. Freund, P. Grigas, and R. Mazumder, Condition Number Analysis of Logistic Regression, and its Implications for Standard First-Order Solution Methods, arXiv e-prints, (2018), arXiv:1810.08727, https://arxiv.org/abs/1810.08727.
[10] G. James, D. Witten, T. Hastie, and R. Tibshirani, An introduction to statistical learning, vol. 103 of Springer Texts in Statistics, Springer, New York, 2013, https://doi.org/10.1007/978-1-4614-7138-7. With applications in R.
[11] D. G. Larman, On sets projectively equivalent to the vertices of a convex polytope, Bulletin of the London Mathematical Society, 4 (1972), pp. 6–12.
[12] P. McCullagh and J. A. Nelder, Generalized linear models, Monographs on Statistics and Applied Probability, Chapman & Hall, London, 1989, https://doi.org/10.1007/978-1-4899-3242-6. Second edition [of MR0727836].
[13] G. L. Miller and D. R. Sheehy, Approximate centerpoints with proofs, Comput. Geom., 43 (2010), pp. 647–654, https://doi.org/10.1016/j.comgeo.2010.04.006.
[14] W. Mulzer and D. Werner, Approximating Tverberg points in linear time for any fixed dimension, Discrete & Computational Geometry, 50 (2013), pp. 520–535.
[15] D. Rolnick and P. Soberón, Algorithmic aspects of Tverberg’s theorem, CoRR, abs/1601.03083 (2016), https://arxiv.org/abs/1601.03083.
[16] P. J. Rousseeuw and A. Struyf, Computing location depth and regression depth in higher dimensions, Statistics and Computing, 8 (1998), pp. 193–203, https://doi.org/10.1023/A:1008945009397.
[17] L. Schläfli, Gesammelte mathematische Abhandlungen. Band I, Verlag Birkhäuser, Basel, 1950.
[18] M. J. Silvapulle, On the existence of maximum likelihood estimators for the binomial response models, Journal of the Royal Statistical Society. Series B (Methodological), 43 (1981), pp. 310–313, http://www.jstor.org/stable/2984941.
[19] P. Soberón, Robust Tverberg and colourful Carathéodory results via random choice, Combinatorics, Probability and Computing, 27 (2018), pp. 427–440, https://doi.org/10.1017/S0963548317000591.
[20] P. Soberón and R. Strausz, A generalisation of Tverberg’s theorem, Discrete Comput. Geom., 47 (2012), pp. 455–460, https://doi.org/10.1007/s00454-011-9379-z.
[21] J. W. Tukey, Mathematics and the picturing of data, Proceedings of the International Congress of Mathematicians, Vancouver, 1975, 2 (1975), pp. 523–531, https://ci.nii.ac.jp/naid/10029477185/en/.
[22] H. Tverberg, A generalization of Radon’s theorem, J. London Math. Soc., 41 (1966), pp. 123–128.
[23] U. Wagner and E. Welzl, A continuous analogue of the upper bound theorem, Discrete Comput. Geom., 26 (2001), pp. 205–219, https://doi.org/10.1145/336154.336176. ACM Symposium on Computational Geometry (Hong Kong, 2000).
[24] J. G. Wendel, A problem in geometric probability, Mathematica Scandinavica, 11 (1962), pp. 109–112, https://doi.org/10.7146/math.scand.a-10655, https://www.mscand.dk/article/view/10655.