Exact Solutions in Log-Concave Maximum Likelihood Estimation

We study probability density functions that are log-concave. Despite the space of all such densities being infinite-dimensional, the maximum likelihood estimate is the exponential of a piecewise linear function determined by finitely many quantities, namely the function values, or heights, at the data points. We explore in what sense exact solutions to this problem are possible. First, we show that the heights given by the maximum likelihood estimate are generically transcendental. For a cell in one dimension, the maximum likelihood estimator is expressed in closed form using the generalized W-Lambert function. Even more, we show that finding the log-concave maximum likelihood estimate is equivalent to solving a collection of polynomial-exponential systems of a special form. Even in the case of two equations, very little is known about solutions to these systems. As an alternative, we use Smale's alpha-theory to refine approximate numerical solutions and to certify solutions to log-concave density estimation.


1 Introduction

Nonparametric methods in statistics emerged in the 1950s and 1960s [25, 47, 41, 3] and fall into two main streams: smoothing methods and shape constraints. Examples of smoothing methods include delta sequence methods, such as kernel, histogram and orthogonal series estimators [55], and penalized maximum likelihood estimators, e.g., spline methods [24]. Their defining feature is the need to choose smoothing or tuning parameters. This is a delicate process, because suitable smoothing parameters depend on the unknown probability density function. In contrast to smoothing methods, shape-constrained nonparametric density estimation is fully automatic and does not depend on the underlying probability distribution, though this comes at the expense of worse convergence rates for smooth densities [23]. Some previously studied classes of functions include non-increasing [26], convex [28], -monotone [6, 7] and -concave [19]. For general references on nonparametric statistics we refer the reader to [52, 50, 54, 27].

In this paper we focus on the class of log-concave densities, which is an important special case of -concave densities. The choice of log-concavity is attractive for several reasons. First of all, most common univariate parametric families are log-concave, including the normal, Gamma with shape parameter greater than one, Beta with both parameters greater than one, Weibull with shape parameter greater than one, and others. Furthermore, log-concavity is used in reliability theory, economics and political science [4]. In addition, log-concave densities have several desirable statistical properties. For example, log-concavity implies unimodality, but log-concave density estimation avoids the spiking phenomenon common in general unimodal estimation [20]. Moreover, this class is closed under convolutions and taking pointwise limits [14]. We refer the reader to [49] for an overview of the recent progress in the field.

Let be a point configuration in with weights such that and . The log-concave maximum likelihood estimation (MLE) problem aims to find a Lebesgue density that solves

(1.1)
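In standard notation (ours, stated here only for concreteness: sample points $x^{(1)},\dots,x^{(n)}$ in $\mathbb{R}^d$ and positive weights $w_1,\dots,w_n$ summing to one), the problem takes the form

\[
\max_{f}\;\sum_{i=1}^{n} w_i \log f\big(x^{(i)}\big)
\qquad \text{over all log-concave probability densities } f \text{ on } \mathbb{R}^d.
\]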

It has been shown that the solution exists with probability one and is unique, and that its logarithm is a tent function, i.e., a piecewise linear function whose regions of linearity induce a subdivision of the convex hull of the data points [56, 40, 15, 46]; see Figure 1 for an example. While the MLE is the most widely studied estimator in this setting, it is not the only one; for examples see [18, 16].

The maximum likelihood estimator is attractive because of its consistency under general assumptions [40, 20, 14, 22] and superior performance compared to kernel-based methods with respect to mean integrated squared error, as observed in simulations [15]. At the same time, the convergence rate is still an open question and only lower [32, 33] and upper [32, 10] bounds are known. Further theoretical properties have been studied for some special cases of log-concave densities, e.g., -affine densities [31] and totally positive densities [45]. Several algorithms have been developed to compute the log-concave MLE in one dimension [48] and in higher dimensions [15, 2, 43]. Software implementations include R packages such as logcondens [21] and cnmlcd [34] in one dimension, and LogConcDEAD [13] and fmlogcondens [42] in higher dimensions.

Example 1.1.

The starting point of this paper is the following problem. Consider the sample of points in with uniform weights:

How many cells does the subdivision induced by the logarithm of the optimal log-concave density have?

Using the R package LogConcDEAD with default parameters, one obtains that the logarithm of the maximum likelihood estimate is a piecewise linear function with seven unique linear pieces. However, when one investigates the optimal density more closely, it appears that several linear pieces are similar. For example, a visual inspection of the optimal density depicted in Figure 1 suggests that there are only four unique linear pieces. What is the true number of unique linear pieces of the optimal density? Is it four, seven or another value?

Theoretically, the algorithm used in LogConcDEAD finds the true optimal density; in practice, however, the answer is a numerical approximation. By changing the parameter sigmatol from its default value to , LogConcDEAD outputs four unique linear pieces, exactly as we observed in Figure 1. Although it might seem obvious that four is the correct number of linear pieces, in reality the situation is more complicated; see Example 4.16. How do we find the correct number of linear pieces?
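The following minimal sketch (synthetic points and heights, not the data of Example 1.1, and a plain Delaunay triangulation rather than the one computed by LogConcDEAD) illustrates why the reported number of linear pieces depends on a numerical tolerance: the gradient of a piecewise linear function is constant on each cell, and cells whose gradients agree up to the tolerance are merged.

```python
# Illustrative sketch with synthetic data: the number of "unique" linear pieces
# of a piecewise linear function depends on the tolerance used to decide when
# two cells have the same gradient.
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
pts = rng.random((14, 2))                    # stand-in for 14 sample points
heights = -np.sum((pts - 0.5) ** 2, axis=1)  # stand-in for fitted heights

tri = Delaunay(pts)

def cell_gradients(points, y, triangulation):
    """Gradient of the affine interpolant of y on each 2-D simplex."""
    grads = []
    for simplex in triangulation.simplices:
        p, v = points[simplex], y[simplex]
        # Solve a*x + b*y + c = v at the three vertices for (a, b, c).
        a, b, _ = np.linalg.solve(np.column_stack([p, np.ones(3)]), v)
        grads.append((a, b))
    return np.array(grads)

def count_unique_pieces(grads, tol):
    """Greedily cluster gradients that agree up to `tol`."""
    reps = []
    for g in grads:
        if not any(np.linalg.norm(g - r) < tol for r in reps):
            reps.append(g)
    return len(reps)

grads = cell_gradients(pts, heights, tri)
for tol in (1e-12, 1e-6, 1e-2):
    print(tol, count_unique_pieces(grads, tol))
```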

Figure 1: The optimal tent function for the sample of 14 points in Example 1.1.

The goal of this paper is to study exact solutions to log-concave maximum likelihood estimation. An exact solution will have three different meanings in this paper. First, one might hope that the solution is an algebraic number. This would enable exact symbolic computation, by storing a floating-point approximation of the number along with a polynomial that vanishes at it. Such computations are not possible for transcendental numbers. Thus, the first main result of our paper is Theorem 3.7, which states that the heights of the logarithm of the log-concave density estimate at the sample points are, in general, transcendental.

Second, in light of Theorem 3.7, we would like to express the maximum likelihood estimator in closed form using well-known mathematical operations and functions, although not necessarily elementary functions. In the simplest case of one cell in one dimension, we derive the log-concave density estimator in closed form using the generalized W-Lambert function, see Proposition 3.9. It is known that the generalized W-Lambert function is not an elementary function. More generally, computing the MLE can be restated as solving a collection of polynomial-exponential systems of equations, which have been studied in the literature. However, even in the case of two equations, only bounds on solutions are known [35]. This suggests that it might be difficult to express the log-concave maximum likelihood estimator in closed form. As an alternative, we turn to Smale's α-theory, which we describe briefly now.
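For orientation, the classical Lambert W function, which solves $w\,e^{w} = z$ and is the simplest member of this family of non-elementary functions, is available in standard numerical libraries. The sketch below only illustrates the kind of closed form involved; the generalized W-Lambert function of Proposition 3.9 is a different, more general object and is not provided by SciPy.

```python
# Classical Lambert W: W(z) * exp(W(z)) = z (principal branch).
import numpy as np
from scipy.special import lambertw

z = 2.5
w = lambertw(z).real
print(w, w * np.exp(w))                # the second value reproduces z
assert abs(w * np.exp(w) - z) < 1e-12  # x = W(z) solves x * exp(x) = z
```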

Third, given a sufficiently close floating-point solution to the MLE problem, one hopes that it can be refined to any desired precision using Newton iteration or other techniques. A natural question arises: when is an approximate solution good enough for these methods to succeed? A way to make this mathematically rigorous is Smale's α-theory [9, 53], which we discuss in Section 4. We obtain α-certified solutions to log-concave density estimation. This allows us to test and compare numerical solvers, as well as to rigorously decide the certified, correct subdivision for a given log-concave density estimation problem. Our methods are especially relevant when the precision of the log-concave density estimate is important. This opens new pathways to answering the motivating question: what is the correct number of cells?
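As a small illustration of the refinement step (not of the certification itself), the sketch below runs plain Newton iteration on a hypothetical two-equation system of the rational-exponential shape that appears later in (3.2); both the system and the starting point are ours and chosen only for illustration.

```python
# Newton refinement of an approximate zero of a square system F(y) = 0.
# The toy system mimics the shape exp(y_i) = (rational expression in y).
import numpy as np

def F(y):
    y1, y2 = y
    return np.array([np.exp(y1) - (2.0 + y1 * y2),
                     np.exp(y2) - (1.0 + y1 + y2)])

def J(y):                             # Jacobian of F
    y1, y2 = y
    return np.array([[np.exp(y1) - y2, -y1],
                     [-1.0, np.exp(y2) - 1.0]])

y = np.array([1.0, 0.5])              # rough floating-point guess
for _ in range(12):                   # y <- y - J(y)^{-1} F(y)
    y = y - np.linalg.solve(J(y), F(y))
print(y, F(y))                        # residual near machine precision
```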

The code for computations in this paper can be found at: https://github.com/agrosdos/Computing-the-Exact-LogConcave-MLE.

2 Geometry of log-concave maximum likelihood estimation

We start by reviewing the geometry of log-concave maximum likelihood estimation, mostly following [46].

Definition 2.1.

Let be the convex hull of a point configuration . For a fixed real vector , we define a function on , called the tent function, as the smallest concave function such that for . The tent function is piecewise linear on , with linear pieces being equal to upper facets of the convex hull of the points in . We define at all points outside . If for , then is called relevant.

It was shown by Cule, Samworth and Stewart for uniform weights [15] and by Robeva, Sturmfels and Uhler in general [46] that the constrained optimization problem (1.1) for finding the log-concave maximum likelihood estimate is equivalent to the unconstrained optimization problem

(2.1)

Moreover, the logarithm of the log-concave maximum likelihood estimate is a tent function with tent poles at some of the sample points . Therefore, finding the log-concave density that maximizes the likelihood of our data reduces to finding an optimal height vector .
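To make the reduction concrete, the following one-dimensional sketch numerically maximizes an objective of the form of (2.1), assumed here (up to an additive constant) to be $\sum_i w_i y_i - \int \exp(\bar h_y(x))\,dx$, where $\bar h_y$ denotes the tent function. The sample points are hypothetical, and a derivative-free optimizer is used only because the objective is non-smooth in $y$.

```python
# A minimal 1-D sketch, assuming the unconstrained objective behind (2.1) is
# (up to an additive constant)  sum_i w_i*y_i - integral of exp(tent_y).
# In 1-D the tent function is the upper convex hull of the points (x_i, y_i).
import numpy as np
from scipy.optimize import minimize

x = np.array([0.0, 0.3, 1.1, 2.0, 2.7])   # hypothetical sorted sample points
w = np.full(len(x), 1.0 / len(x))         # uniform weights

def upper_hull(xs, ys):
    """Vertices of the upper convex hull (least concave majorant) of (xs, ys)."""
    hull = []
    for p in zip(xs, ys):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # drop the middle point if it lies on or below the chord
            if (y2 - y1) * (p[0] - x1) <= (p[1] - y1) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def integral_exp_tent(xs, ys):
    """Integral of exp(tent function) over [xs[0], xs[-1]]."""
    hull = upper_hull(xs, ys)
    total = 0.0
    for (a, ya), (b, yb) in zip(hull[:-1], hull[1:]):
        if np.isclose(ya, yb):
            total += (b - a) * np.exp(ya)
        else:
            total += (b - a) * (np.exp(yb) - np.exp(ya)) / (yb - ya)
    return total

def neg_objective(y):
    return -(w @ y - integral_exp_tent(x, y))

res = minimize(neg_objective, x0=np.zeros(len(x)), method="Nelder-Mead",
               options={"xatol": 1e-10, "fatol": 1e-12, "maxiter": 20000})
print(res.x)   # approximate optimal heights; exp(tent) is the estimated density
```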

Definition 2.2.

We follow the definitions in [17]. Given a point configuration in , a subdivision of is a collection of -polytopes, denoted , such that the union of the polytopes in equals , the vertex set of the polytopes in is contained in , and the intersection of polytopes in can only happen along lower-dimensional faces. A subdivision is called a triangulation if all polytopes in are simplices. A triangulation of the point configuration is called maximal if every element of is a vertex of a simplex in . A subdivision is called regular if its full-dimensional cells are combinatorially equivalent to the regions of linearity of a tent function on for some height vector .

Corollary 2.3.

[46, Corollary 2.6] To find the optimal height vector in (2.1) is to maximize the following rational-exponential objective function over :

(2.2)

where is any regular triangulation that refines the subdivision induced by the tent function .
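For instance, in dimension one a cell of the triangulation is an interval $[a,b]$, and if the tent function takes the values $y_a$ and $y_b$ at its endpoints (notation ours), then

\[
\int_a^b e^{h(x)}\,dx \;=\; (b-a)\,\frac{e^{y_b}-e^{y_a}}{y_b-y_a}
\qquad \big(h \text{ linear on } [a,b],\ h(a)=y_a,\ h(b)=y_b\big),
\]

which already exhibits the rational-exponential shape of (2.2); when $y_a = y_b$ the quotient degenerates to $e^{y_a}$, which is exactly the kind of vanishing denominator encountered in Example 2.4 below.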

If induces a regular subdivision that is not a maximal regular triangulation, then we can consider any maximal regular triangulation that refines . Thus, if there are maximal regular triangulations of , then to find the optimal we must compare the optimal values obtained by solving the optimization problem (2.2) times, once for each maximal regular triangulation . However, it is not enough to consider the system of critical equations only for the maximal regular triangulations, since the optimization problem (2.2) is not smooth. We will see this phenomenon in Example 2.4.

Example 2.4.

Fix , and . The configuration has two triangulations and , which are both regular triangulations. Only is a maximal triangulation. Hence solving the optimization problem (2.1) is equivalent to maximizing the following objective function

(2.3)

If or , then a denominator on the right-hand side of (2.3) becomes zero. However, the objective function in the formulation (2.1) can still be simplified to

Let . The output from LogConcDEAD suggests that the optimal tent function is supported on one cell, with heights given by , and . However, the vector is neither a critical point of (2.3) nor of the function

which is the right-hand side of (2.2) for the triangulation . This can be seen by taking partial derivatives of these functions with respect to and substituting . In the case of , it is particularly easy to see that there are no solutions, since . In the case of , the system of critical equations fails to certify in the sense of Section 4.

The points being collinear is equivalent to , where , . Since , we have . Hence, substituting this expression into the objective function (2.3), we get

which for uniform weights becomes

We will verify in Example 4.13 that is a critical point of the function .

To visualize the situation, we consider the Samworth body

which was introduced in [46]. The unconstrained optimization problem (2.1) is equivalent to the constrained optimization problem of maximizing the linear function over the Samworth body. For different choices of the weight vector , we obtain different optimal height vectors on the surface of the Samworth body, and the height vector determines the triangulation. The Samworth body consists of two regions, which can be seen in Figure 2. The green region comes from the one-simplex triangulation , while the red region comes from the two-simplex triangulation . Moreover, one can see lines separating the green region into two pieces and the red region into three pieces (ignore the curve separating the green and the red regions for now). These lines correspond to the degenerate cases where , or , and hence the right-hand side of (2.2) is not defined. Those lines are therefore simply artifacts of the reformulation (2.2), since in the original unconstrained formulation (2.1) these points present no difficulty. The intersection of the three lines is the point .
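In the notation introduced above (with $P$ the convex hull of the sample and $\bar h_y$ the tent function for a height vector $y \in \mathbb{R}^n$), a description of the Samworth body consistent with the properties used here is

\[
\mathcal{S} \;=\; \Big\{\, y \in \mathbb{R}^n \;:\; \int_{P} e^{\bar h_y(x)}\,dx \,\le\, 1 \,\Big\},
\]

so that (2.1) becomes the maximization of the linear functional $\sum_i w_i y_i$ over $\mathcal{S}$; the flatness discussed below then reflects the fact that lowering a single height that already lies below the tent function does not change $\bar h_y$.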

Figure 2: The Samworth body for .

Consider the curve separating the green and red regions of the Samworth body. This curve corresponds to the height vectors that induce the subdivision and are collinear. To understand the green region, see the piecewise linear functions drawn in Figure 3. Since the lowest (dotted) function is not concave, it is invalid as a tent function. Therefore, if the height is too low, the optimal tent function will be the (solid-line) linear function. In effect, the optimal tent function ignores heights that are too low. This basic phenomenon is responsible for the green part of the Samworth body being flat in the direction, meaning that it is a pencil of half-lines parallel to the -axis.

Figure 3: The red tent function corresponds to a vector in the red region of the Samworth body. The solid green tent function corresponds to a vector on the curve separating the red and green regions of the Samworth body. The dotted green function is not concave. Its height vector belongs to the green region of the Samworth body, and both green sets of heights give the same tent function.

The transition from the red region to the green region is not smooth. For every on the curve between the green and red regions, there is a two-dimensional cone of weight vectors that give as an optimal solution. The generators of this cone are described in [46, Theorem 3.7]. The optimal height vector for lies on the curve between the red and green regions. It is not a critical point of the function (2.3), because is not a normal vector to the red region at the point .

We now return to the general situation. Let be a configuration of points . Fixing a maximal regular triangulation of our point configuration , we can find the optimal by solving the system of critical equations for in (2.2). These partial derivatives take the form (see [46, Proof of Lemma 3.4]):

(2.4)
Definition 2.5.

For a fixed maximal regular triangulation of , let be the matrix such that the system of critical equations (2.4) can be written in the form

(2.5)

where is a column vector of exponentials , and is a column vector of weights . The matrix is called the score equation matrix.

The entries of are in the field of rational functions in the variables . Diagonal entries of are

and off-diagonal entries of are

The matrix can be written as a sum of matrices over maximal simplices . This will be described explicitly in the proof of Theorem 3.1.
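As a concrete and deliberately minimal instance, anticipating Section 3.2, the following symbolic sketch builds the score equation matrix for a single cell $[a,b]$ in dimension one. It assumes that the integral term of the objective reduces on this cell to $(b-a)\,(e^{y_2}-e^{y_1})/(y_2-y_1)$, with heights $y_1, y_2$ at the endpoints; this normalization and notation are ours.

```python
# Symbolic sanity check in the simplest case: one cell [a, b] in dimension one.
# Assumed integral term of the objective: (b - a)*(exp(y2) - exp(y1))/(y2 - y1).
# The critical equations are  d(integral)/dy_i = w_i,  and each left-hand side
# is linear in exp(y1), exp(y2); we read off the 2x2 score equation matrix M
# with  M * (exp(y1), exp(y2))^T = (w1, w2)^T  and check that det(M) != 0.
import sympy as sp

y1, y2, a, b = sp.symbols("y1 y2 a b", real=True)
E1, E2 = sp.symbols("E1 E2")          # placeholders for exp(y1), exp(y2)

integral = (b - a) * (sp.exp(y2) - sp.exp(y1)) / (y2 - y1)

rows = []
for v in (y1, y2):
    g = sp.expand(sp.diff(integral, v)).subs({sp.exp(y1): E1, sp.exp(y2): E2})
    rows.append([g.coeff(E1), g.coeff(E2)])
M = sp.Matrix(rows)

# The determinant equals -(b - a)**2/(y2 - y1)**2 (possibly printed in an
# equivalent form): a rational function that is not identically zero,
# in line with Theorem 3.1.
print(sp.simplify(M.det()))
```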

As we observed in Example 2.4, to find the optimal density, it is not enough to solve the systems of critical equations only for maximal regular triangulations. Since the system of critical equations becomes much more difficult for general subdivisions, we do not write it out here.

3 Transcendentality and closed-form solutions

In this section we use notions from geometric combinatorics to study the structure of (2.5). In particular, we will prove that the matrix is invertible. This will be our main tool in proving the transcendentality of the log-concave MLE and in deriving closed-form solutions, via Lambert functions, in the one-dimensional, one-cell case.

3.1 Score equation matrix invertibility and transcendentality

Towards proving transcendentality, we first investigate the invertibility of the matrix .

Theorem 3.1.

Consider a point configuration in , and let be a maximal regular triangulation of . The score equation matrix from (2.5) is invertible.

Definition 3.2.

Given a triangulation , we define the neighborhood of a vertex in to be the set of vertices

Before giving the proof of Theorem 3.1, we illustrate the construction in the proof with a small example.

Example 3.3.

Let be a four point configuration in with , where and . Let be the score equation matrix for the entire regular triangulation . Let us denote the difference by . Then , where

We define matrix to be the matrix with its -th column multiplied by , for all from to . We obtain the following matrices

The product of the diagonal entries of is a polynomial of degree 12, whereas any term in the expansion of the determinant of involving off-diagonal entries has degree at most 10.

Proof of Theorem 3.1.

The score equation matrix associated to a maximal regular triangulation can be written as

where the entries of for are

The matrix is sparse: If or does not belong to then .

Let (resp. ) be the matrix that is obtained by multiplying the -th column of (resp. ) by for :

(3.1)

Fix . We describe separately the off-diagonal and diagonal entries of . For and we get

And for the diagonal entries

Given a polynomial , we can rewrite as a univariate polynomial in of degree , where is a constant with respect to . We then define the initial form of with respect to to be

We observe that for the off-diagonal entries , the initial form with respect to is

where is the number of vertices adjacent to in . For the diagonal entry , on the other hand, the initial form is

In both cases, the degree of the initial form is the degree of the polynomial. We sum the matrices for , to get and note that the coefficient of the monomial in is the number of simplices in containing vertex . Hence, using the Leibniz formula to compute the determinant of , we get that the product of the diagonal entries is a polynomial of degree . All off-diagonal entries in a given column of have degree one smaller, and any term of the Leibniz expansion other than the product of the diagonal entries involves at least two off-diagonal entries; thus any monomial in the expanded form of the determinant involving off-diagonal entries must have degree at least two smaller than the product of the diagonal entries. The following equality is a direct consequence of (3.1):

Since is not identically , is not identically zero, and hence is invertible over the field of rational functions. ∎

The proof of Theorem 3.1 inspires the following conjecture about the combinatorial properties of the determinant.

Conjecture 3.4.

The highest degree component of the numerator of is

Since is invertible, (2.5) can be rewritten as

where entries of are rational functions in .

Corollary 3.5.

Fix a maximal triangulation . Then the critical equations (2.4) can be written in the form

(3.2)

where . If , then .

We will explore rational-exponential systems of the form (3.2) further in Sections 3.2 and 3.3. The following is a result from transcendental number theory; for a textbook reference, see Theorem 1.4 of [5].

Theorem 3.6 (Lindemann-Weierstrass).

If $\alpha_1, \ldots, \alpha_m$ are distinct algebraic numbers, then the numbers $e^{\alpha_1}, \ldots, e^{\alpha_m}$ are linearly independent over the algebraic numbers.

A special case of the Lindemann-Weierstrass theorem is Lindemann's theorem, which states that $e^{\alpha}$ is transcendental for every nonzero algebraic number $\alpha$.

Theorem 3.7.

Assume that . If then at least one coordinate of the optimal height vector is transcendental. If , then all coordinates of are algebraic if and only if is in the cone over the secondary polytope .

Proof.

It follows from the proof of [46, Lemma 3.4] that any relevant such that is a density, is a critical point of for a maximal regular triangulation and some weight vector . We consider the rational-exponential system (3.2) for this choice of and . Then we have where is a rational function in . Assume that are algebraic. By Lindemann’s theorem is algebraic if and only if .

However, is always algebraic, since are algebraic and the algebraic numbers form a field. Hence . We can argue similarly that for all . The vector belongs to the boundary of the Samworth body if and only if the volume of the convex hull of is . In this case, is the optimal solution for any in the cone over the secondary polytope by [46, Corollary 3.9]. ∎

3.2 One cell in one dimension

In this section we apply the invertibility of the score equation matrix to give a closed-form solution for the log-concave maximum likelihood estimator in the case where the logarithm of the optimal density is a linear function on the real line. If , then

and

Hence the polynomial-exponential system (3.2) has the form

(3.3)
(3.4)

Dividing (3.3) by (3.4) and setting , gives

(3.5)
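Even without the generalized W-Lambert function, an equation of this one-variable rational-exponential shape can be solved numerically to high precision with a bracketing method. In the sketch below the weights are hypothetical, and the concrete equation assumes, as in the earlier one-cell sketch, that the ratio of (3.3) and (3.4) takes the form $(e^{t}-1-t)/(t e^{t}-e^{t}+1) = w_1/w_2$ with $t = y_2 - y_1$.

```python
# Numerical solution of a one-variable rational-exponential equation of the
# shape arising here.  The concrete instance below assumes the one-cell form
#     (exp(t) - 1 - t) / (t*exp(t) - exp(t) + 1) = w1 / w2,   t = y2 - y1,
# with hypothetical weights; a bracketing solver pins down the root that the
# generalized W-Lambert function expresses in closed form.
import numpy as np
from scipy.optimize import brentq

w1, w2 = 0.3, 0.7

def phi(t):
    return (np.exp(t) - 1.0 - t) - (w1 / w2) * (t * np.exp(t) - np.exp(t) + 1.0)

t_star = brentq(phi, 1e-3, 50.0)   # phi changes sign on this interval
print(t_star)
```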

In the rest of the section we will discuss how to solve Equation (3.5) using Lambert functions. The solutions for and can then be obtained from Equations (3.3) and (3.4) by solving for