I Introduction
Minimum Message Length (MML) is a general name for any member of the family of statistical inference methods based on the minimum message length principle, which, in turn, is closely related to the family of Minimum Description Length (MDL) estimators [1, 2, 3], but predates it. The minimum message length principle was first introduced in [4], and the estimator that follows the principle directly, which was first described in [5], is known as Strict MML (SMML).
Although purportedly returning ideal inferences, SMML is never used in practice because it is computationally unfeasible and analytically intractable in all but a select few cases [6, 7].
A computationally feasible approximation to SMML was introduced in [8]. This is known as the Wallace-Freeman approximation (WF-MML), and is perhaps the MML variant that is in widest use.
Although not as popular as Maximum Likelihood (ML) or Maximum A Posteriori (MAP), MML still enjoys a wide following, with over 70 papers published regarding it in 2016 alone, including [9, 10, 11]. MML proponents cite for it a wide variety of attributes that make the method and its inferences in some ways superior to other methods, and claim that these benefits outweigh the method’s computational requirements, which are heavy even when using the Wallace-Freeman approximation. (One paper [12] cites MML computation times that are over 400,000 times longer than ML computation times, despite working on a dataset of fewer than 1,000 items, using only 9 attributes and having only 3 classes. It may be the case that these times would have been reducible through optimisation, but no information regarding this is given in the paper.)
One of the many properties often attributed to MML is “consistency”. For example, [13] states:
SMML has been studied fairly thoroughly, and is known […] to be consistent and efficient.
Loosely speaking, an estimate is said to be consistent for a particular problem if given enough observations the estimate is guaranteed to converge to the correct parameter value. (See [14] for a formal definition.) Importantly, it is the property of an estimate, not of an estimator: it is given in the context of a specific estimation problem. However, in the MML literature, statements such as the one quoted above are often given without specifying a particular estimation problem. To determine how to interpret such statements, consider the following quote from [15]:
These results of inconsistency of ML and consistency of MML and marginalised maximum likelihood for an increasing number of distributions with shared parameter will remain valid for e.g. the von Mises circular distribution, a distribution where Maximum Likelihood shows even worse small sample bias than it does for the Gaussian distribution. We seek a problem of this Neyman-Scott nature (of many distributions sharing a common parameter value) for which the MML estimate remains consistent (as we know it will) but for which the marginalised Maximum Likelihood estimate is either inconsistent or not defined.
This quote explicitly states that MML is known to be consistent even on completely unspecified estimation problems (except for the fact that they involve many distributions sharing a common parameter value), and gives three specific examples: the von Mises estimation problem, the Neyman-Scott estimation problem, and problems of a “Neyman-Scott nature”.
The claim repeated in the MML literature is therefore that MML’s consistency property is universal, independent of the specific choice of estimation problem. (The title of this paper, “MML is not consistent for Neyman-Scott”, should be interpreted as the logical opposite, which is to say “It is not true that MML is consistent for a general member of the Neyman-Scott estimation problem family; it will be inconsistent for some Neyman-Scott cases.”) This claim is not merely stated but also argued. For example, [15] claims:
The fact that general MML codes are (by definition) optimal (i.e. have highest posterior probability) implicitly suggests that, given sufficient data, MML will converge as closely as possible to any underlying model.
and [16] adds:
[B]ecause the SMML estimator chooses the shortest encoding of the model and data, it must be both consistent and efficient.
(Cf. [17].)
MML approximations are not said to hold such universal consistency properties. However, [15] calculated the Wallace-Freeman MML estimate and [18] calculated another MML approximation known as “Ideal Group” (IG), both working on the Neyman-Scott estimation problem [19], which is a problem on which many other estimation methods do not produce consistent estimates.
The fact that this example can be computed using the Ideal Group approximation, which is often as intractable as SMML itself, has made the Neyman-Scott problem a touchstone in the MML literature and a consistently given example to showcase its superior performance.
Note, however, that Neyman-Scott is a frequentist problem, defined by its likelihoods. To put any Bayesian method, including any of the MML variants discussed, to use on an estimation problem, one must also define a prior distribution for the estimated parameters. Without a specified prior, the “Neyman-Scott problem” is, from a Bayesian viewpoint, an entire family of estimation problems. Both [15] and [18] analysed the problem under a specific prior, which we shall name the Wallace prior.
Importantly, neither in these two papers nor in papers citing this result (e.g., [20, 21, 22, 23]) is the result ever restricted by this choice of prior. For example, [24] writes:
[T]he Wallace-Freeman MML estimator […] and the Dowe-Wallace Ideal Group (IG) estimator have both been shown to be statistically consistent for the Neyman-Scott problem.
This paper analyses the Neyman-Scott problem under another prior which we name the scale free prior. We show that neither the statements about SMML nor about the MML approximations are true for this instance of the Neyman-Scott problem, thus giving a counterexample to these general claims. In fact, outside of a few simple, one-dimensional cases for which ML is also consistent, SMML has not been shown to be consistent anywhere, and there is no reason to assume SMML holds any consistency properties that are superior to those of ML.
The methods developed in this paper, allowing for the first time direct, non-approximated analysis of a high-dimensional SMML solution, are general, and applicable beyond just the Neyman-Scott problem.
Table I lists the consistency properties of MML, both previously known and new to this paper.
SMML  WFMML  IG
unknown  yes  yes
no  no  no
unknown  no  unknown
unknown  unknown  unknown

II Definitions
II-A Notation
This paper deals with the problem of statistical inference: from a set of observations, , taken from (the observation space) we wish to provide a point estimate, , to the value,
, of a random variable,
, drawn from (parameter space). When speaking about statistical inference in general, we use the symbols introduced above. For a specific problem, such as in discussing the Neyman-Scott problem, we use problem-specific names for the variables. However, in all cases Latin characters refer to observables, Greek to unobservables that are to be estimated, boldface characters to random variables, non-boldface characters to values of said random variables, and hat notation to estimates. Boldface is used for the observations, too, when considering the observations as random variables.
All point estimates discussed in this paper are defined using an or an . We take these as, in general, returning sets. Nevertheless, we use “”, as shorthand for “”.
To be consistent with the notation of [18], we use to indicate the prior and
(1) 
as the marginal. The integral of over may be (in which case it is a proper prior and the problem is a proper estimation problem) but it may also integrate to other positive values (in which case it is a scaled prior) or diverge to infinity (in which case it is an improper prior). Our analysis will reject a prior as pathological only if it does not allow computation of a marginal using (1).
When speaking of events that have positive probability, we will use the notation. However, in calculating over a scaled or improper prior some probabilities will be correspondingly scaled when computed as an integral over the prior or the marginal. For these we use the notation.
For reasons of mathematical convenience, we take both the observation space, , and the parameter space, , as complete metric spaces, and assume that priors, likelihoods, posterior probabilities and marginals are all continuous, differentiable, everywhere-positive functions. This allows us to take limits, derivatives, s, s, etc., freely, without having to prove at every step that these are well-defined and have a value.
II-B MML
Minimum Message Length (MML) is an inference method that attempts to codify the principle of Occam’s Razor in information-theoretic terms [18].
Define
(2) 
Given a piecewise-constant function , we define
and
The SMML estimator is usually defined as the minimiser of . However, because this minimiser may not be unique, we use the more rigorous
Functions minimising are known as SMML codebooks.
II-C The Ideal Point
We introduce the notion of an “ideal point” which will be central to our analysis. This is built on an approximation for SMML known in the MML literature as Ideal Group [18].
The Ideal Group estimator is defined in terms of its functional inverse, mapping values to (sets of) values. We refer to such functions as reverse estimators and denote them . The Ideal Group reverse estimator is defined as
(3) 
where is a threshold whose value is given in [18], and which is computed in a way that guarantees that the ideal group is a nonempty set for each .
Because the ideal group is always nonempty, it must include
We refer to this as the Ideal Point approximation (a notion and a name that, unlike the Ideal Group, are new to this paper).
We denote the inverse functions of reverse estimators, e.g.
by the same hat notation as estimators, but stress that these are only true estimators (albeit, perhaps, multivalued) if the reverse estimator is a surjection.
II-D The Neyman-Scott problem
Definition 1.
The Neyman-Scott problem [19] is the problem of jointly estimating the tuple $(\sigma^2, \mu_1, \ldots, \mu_N)$ after observing $(x_{nj} : n = 1, \ldots, N;\ j = 1, \ldots, J)$, each element of which is independently distributed $x_{nj} \sim \mathcal{N}(\mu_n, \sigma^2)$.
It is assumed that $J \ge 2$.
The interesting case for Neyman-Scott is to observe the behaviour of the estimate for $\sigma^2$ when this estimate is part of the larger joint estimation problem, while taking $N$ to infinity and fixing $J$.
This setup creates an inconsistent posterior [25], a situation where even with unlimited data, the uncertainty regarding the true value of each $\mu_n$ remains high, even though the value of $\sigma^2$ is known with high confidence. Because of this, many of the popular estimation methods fail to return a consistent estimate for $\sigma^2$ in this scenario. Maximum Likelihood, as a case in point, returns the estimate $s^2 = \frac{1}{NJ} \sum_{n=1}^{N} \sum_{j=1}^{J} (x_{nj} - m_n)^2$, where $m_n$ is the mean of the $n$'th group of observations, rather than the consistent $\frac{J}{J-1} s^2$.
MML’s success on the Neyman-Scott problem has made it an oft-cited showcase for the power of this method. In [18] alone, eight entire sections (4.2 – 4.9) are devoted to it. As another example, [24], using the Neyman-Scott problem as a key example, writes about the family of MML-based estimates that these are likely to be unique in being the only estimates that are both statistically invariant to representation and statistically consistent even for estimation problems where, as in the Neyman-Scott problem, the joint posterior does not fully converge.
It is in this context that our finding that MML is, in fact, no better than ML for Neyman-Scott becomes highly significant for MML at large.
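The inconsistency of maximum likelihood on this problem can be seen in a short simulation. The following is a minimal sketch, not part of the original analysis: it assumes the standard Gaussian form of the Neyman-Scott problem, and the function name `ml_variance_estimate`, the range from which the nuisance means are drawn, and the values of `N`, `J` and `sigma` are all illustrative choices. With `J` fixed and `N` large, the ML estimate of the variance settles near (J-1)/J times the true variance, while multiplying it by J/(J-1) recovers the true value.

```python
import random

def ml_variance_estimate(N, J, sigma, rng):
    """ML estimate of sigma^2 from one simulated Neyman-Scott data set.

    N groups, J observations per group, each group with its own mean.
    """
    total_sq = 0.0
    for _ in range(N):
        mu = rng.uniform(-10.0, 10.0)  # arbitrary nuisance mean (illustrative assumption)
        xs = [rng.gauss(mu, sigma) for _ in range(J)]
        m = sum(xs) / J                # ML estimate of this group's mean
        total_sq += sum((x - m) ** 2 for x in xs)
    return total_sq / (N * J)          # ML divides by N*J, not N*(J-1)

rng = random.Random(0)
N, J, sigma = 20000, 2, 1.5
s2 = ml_variance_estimate(N, J, sigma, rng)
corrected = s2 * J / (J - 1)

print(f"true sigma^2        : {sigma ** 2:.4f}")
print(f"ML estimate s^2     : {s2:.4f}")
print(f"corrected J/(J-1)s^2: {corrected:.4f}")
```

Increasing `N` does not move the ML estimate towards the true variance; only increasing `J` does.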
III Analysis of the Ideal Group approximation
Although [18] discusses the Neyman-Scott problem at great length, the actual analysis of the Ideal Group estimate for it (ibid., Section 4.3) is brief enough to be quoted here in full.
Given the Uniform prior on and the scale free prior on , we do not need to explore the details of an ideal group with estimate . It is sufficient to realise that the only quantity which can give scale to the dimensions of the group in
space is the Standard Deviation
. All dimensions of the data space, viz., are commensurate with .
Hence, for some and , the shape of the ideal group is independent of and , and its volume is independent of but varies with as . Since the marginal data density varies as , the coding probability , which is the integral of over the group, must vary as . The Ideal Group estimate for data obviously has , and the estimate of is found by maximizing as
Unfortunately, the argument of [18] is incorrect. For the shape of the ideal group to be independent of and it is not enough for one to be translation invariant and for the other to be scale invariant. The solution of (3) is only scale and translation independent if , as a single unit, is simultaneously both scale and translation invariant.
Definition 2.
An inference problem will be called scale free if for some parameterization of and , both of which are assumed to be vector spaces, it is true that for every , , and ,
(4) 
and
(5) 
where “” and “” refer to the set notation
Translation independence can be defined analogously.
Notably, in our problem, for (5) to hold, i.e. for the shape of the likelihood distribution not to change when switching scales, the scale change must be not only in but also in . The prior advocated in [18] (which we refer to as “the Wallace prior”) does not satisfy (4), because the change from integrating over to integrating over increases the area of integration by a factor of , where the exponent is simply the dimension of the parameter space: one dimension for and for the parameters.
The only prior which satisfies the claims of [18] regarding the shape of the ideal group is therefore . We will refer to it as the scale free prior, and call the Neyman-Scott problem under it the scale free Neyman-Scott problem.
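The effect of rescaling on the two priors can be checked directly. The sketch below rests on our reading of the description above: it takes the Wallace prior to have density proportional to 1/σ and the scale free prior to have density proportional to 1/σ^(N+1), both uniform in the N means. These exponents, the helper `prior_mass`, and the particular box are illustrative assumptions. The code computes the prior mass of a box in parameter space and of the same box scaled by a factor `a`.

```python
import math

def prior_mass(sigma_lo, sigma_hi, mu_width, N, exponent):
    """Mass of the density sigma**(-exponent), uniform in the N means,
    over sigma in [sigma_lo, sigma_hi] and a cube of side mu_width in R^N."""
    if exponent == 1:
        sigma_part = math.log(sigma_hi / sigma_lo)
    else:
        k = exponent - 1
        sigma_part = (sigma_lo ** -k - sigma_hi ** -k) / k
    return mu_width ** N * sigma_part

N, a = 3, 2.0
box = (1.0, 2.0, 0.5)                 # sigma_lo, sigma_hi, mu_width
scaled = tuple(a * v for v in box)    # the same box after scaling by a

# Assumed exponents: 1 for the Wallace prior, N + 1 for the scale free prior.
wallace_ratio = prior_mass(*scaled, N, 1) / prior_mass(*box, N, 1)
scale_free_ratio = prior_mass(*scaled, N, N + 1) / prior_mass(*box, N, N + 1)

print("Wallace prior, mass ratio after scaling   :", wallace_ratio)
print("scale free prior, mass ratio after scaling:", scale_free_ratio)
```

Under these assumptions the Wallace prior's mass grows by a factor of a^N under scaling, whereas the scale free prior's mass is unchanged.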
Both priors have an improper scale free distribution on
and both have an improper uniform distribution on
given , but in order to attain scale freedom, one must relinquish the idea that and are independent: in the scale free prior, the are individually scale free, whereas in the Wallace prior they are individually uniformly distributed.
The original proof of [18] is therefore incorrect. Its claim that the ideal group approximation is consistent for the prior is, however, true. We present here an alternative proof for this, which works in the native observation space and utilises the concept of the ideal point.
Theorem 1.
The Ideal Group MML reverse estimator is consistent for the Neyman-Scott problem under the Wallace prior.
Proof.
In the observation space, the probability density of a given set of observations, , given a particular choice of and , is
(6) 
Under the Wallace prior, this results in the marginal probability density of the observations being
(7) 
Note that is a sufficient statistic for this problem, because both and can be calculated based on it, where for we use the relation
For this reason (following [18]), we can present the equations above solely in terms of .
Recall that the ideal point is defined as the value minimising , and for this reason guarantees that the ideal group for necessarily includes it.
Differentiating according to and according to each we reach the desired
∎
Unfortunately, Theorem 1 does not hold for the scale free prior.
Theorem 2.
The Ideal Group MML reverse estimator is not consistent for the scale free Neyman-Scott problem. In particular, it contains for the point , which is the (inconsistent) maximum likelihood estimate, as the Ideal Point.
The proof is essentially the same as above, but substituting in the scale free prior instead of the Wallace prior.
Proof.
We begin by recalculating and given the new prior.
(8) 
Following the same argument as before, the Ideal Point estimator now becomes
which is identical to the maximum likelihood estimate, and well known to be inconsistent. ∎
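The contrast between Theorems 1 and 2 can be illustrated numerically. The following is a sketch under stated assumptions rather than a reproduction of the proofs: it assumes that the ideal point for a parameter value is the observation maximising the ratio of likelihood to marginal, that the group means at the ideal point equal the true means, and that the marginal is proportional to s^(-N(J-1)) under the Wallace prior and to s^(-NJ) under the scale free prior (obtained by integrating the Gaussian likelihood against priors proportional to 1/σ and 1/σ^(N+1) respectively). The names `log_ratio` and `ideal_point_s` are hypothetical.

```python
import math

def log_ratio(s, sigma, N, J, marginal_power):
    """log f(x | theta) - log r(x), up to an additive constant,
    evaluated where every group mean equals its true mean."""
    log_f = -N * J * math.log(sigma) - N * J * s ** 2 / (2 * sigma ** 2)
    log_r = -marginal_power * math.log(s)   # assumed marginal: r(x) ~ s**(-marginal_power)
    return log_f - log_r

def ideal_point_s(sigma, N, J, marginal_power, step=0.0005):
    """Grid search for the s that maximises the likelihood-to-marginal ratio."""
    grid = [step * i for i in range(1, int(3 * sigma / step))]
    return max(grid, key=lambda s: log_ratio(s, sigma, N, J, marginal_power))

N, J, sigma = 5, 2, 1.0
s_wallace = ideal_point_s(sigma, N, J, N * (J - 1))   # Wallace prior (assumed power)
s_scale_free = ideal_point_s(sigma, N, J, N * J)      # scale free prior (assumed power)

print("Wallace prior:    s^2 / sigma^2 at the ideal point =", s_wallace ** 2 / sigma ** 2)
print("scale free prior: s^2 / sigma^2 at the ideal point =", s_scale_free ** 2 / sigma ** 2)
```

Inverting the first relation gives an estimate of J/(J-1) times s² for the variance, which is consistent, whereas inverting the second gives s² itself, the ML estimate.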
We remark that much as we were able to switch the Ideal Group approximation from being consistent to being inconsistent by the change of prior, we can do the same while keeping the prior but making a slight change in the likelihoods: instead of using
as in the standard Neyman-Scott setup, one can make the Ideal Group approximation inconsistent by switching to
The reason for this is that the new problem is the same as the Neyman-Scott problem under the scale free prior, except for a change of parameters: what was before is now . All MML methods discussed are invariant to such reparameterization.
This demonstrates an important point: the prior is in no way pathological, nor can it be blamed for the inconsistency. The same inconsistency is equally reproducible with the Wallace prior.
IV SMML analysis
IV-A Some special types of inference problems
The Neyman-Scott problem satisfies many good properties that enable our analysis, but which are not unique to it. We enumerate them here.
We begin by defining transitivity, a property that generalises the notion of scale freedom.
Definition 3.
An automorphism for an estimation problem , with and , is a pair of continuous bijections, and , such that

For every ,
(9) and

For every and every ,
(10)
where .
For reasons of mathematical convenience, we assume that and are such that the Jacobians of these bijections, and , are defined everywhere, and their determinants, and , are positive everywhere. This allows us, for example, to restate condition (9) as
and condition (10) as
An estimation problem will be called observation transitive if for every there is an automorphism for which .
An estimation problem will be called parameter transitive if for every there is an automorphism for which .
An estimation problem will be called transitive if it is both observation transitive and parameter transitive.
Lemma 2.1.
The scale free Neyman-Scott problem with fixed and and with observable parameters is transitive.
Proof.
Transitivity of the Neyman-Scott problem stems from its scale- and translation-invariance: Consider and with .
It is straightforward to verify that is an automorphism. Furthermore, for any and it is straightforward to find parameters and that would map to , and similarly for and . ∎
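The likelihood half of this verification can be sketched numerically. The example below assumes the maps in question scale every observation and every parameter by a common factor `a` and translate each group, together with its mean, by its own offset `b_n`; this is our reading of the construction, and the function `log_likelihood` and all numeric values are illustrative. It checks the location-scale identity for the Gaussian likelihood: the density at the original point equals the density at the mapped point multiplied by the Jacobian determinant of the observation map, which here is `a` raised to the number of observations.

```python
import math

def log_likelihood(x, mu, sigma):
    """Gaussian log-likelihood; x is a list of N groups, mu a list of N means."""
    total = 0.0
    for group, mu_n in zip(x, mu):
        for x_nj in group:
            total += (-0.5 * math.log(2 * math.pi * sigma ** 2)
                      - (x_nj - mu_n) ** 2 / (2 * sigma ** 2))
    return total

x = [[0.3, 1.1], [-2.0, -1.4], [4.2, 5.0]]   # N = 3 groups, J = 2 observations each
mu, sigma = [0.5, -1.5, 4.0], 0.8
a, b = 2.5, [1.0, -3.0, 0.25]                # assumed scale factor and per-group offsets

x_new = [[a * v + b_n for v in group] for group, b_n in zip(x, b)]
mu_new = [a * m + b_n for m, b_n in zip(mu, b)]
sigma_new = a * sigma

n_obs = sum(len(group) for group in x)
lhs = log_likelihood(x, mu, sigma)
# log of (density at the mapped point times the Jacobian determinant a**n_obs)
rhs = log_likelihood(x_new, mu_new, sigma_new) + n_obs * math.log(a)

print("log f(x | theta)                 =", lhs)
print("log f(F(x) | G(theta)) + log|J_F| =", rhs)
```

The prior half of the condition is not checked here; it depends on the choice of prior.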
Transitivity implies other good properties. Define
Definition 4.
An estimation problem , with and , will be called homogeneous if the value of is a constant, , for all .
Lemma 2.2.
Every parameter-transitive estimation problem is homogeneous.
More generally, for any , if there exists an automorphism such that , then .
Proof.
Assume to the contrary that for some such , the inequality holds.
Let be an automorphism on such that , and let be a value such that attains its minimum at .
By definition,
contradicting the assumption.
The option also cannot hold, because is also an automorphism, this one mapping to . ∎
Similarly:
Definition 5.
An estimation problem , with and , will be called comprehensive if the value of
is a constant, , for all .
Lemma 2.3.
Every observation-transitive estimation problem is comprehensive.
More generally, for any for which there exists an automorphism such that , .
Proof.
The proof is identical to the proof of Lemma 2.2, except that instead of choosing such that we now choose an automorphism such that , and instead of choosing such that attains its minimum at , we choose such that attains its minimum over all at . ∎
Another good property, and one that one would expect of a typical, natural problem, is concentration.
Define for every ,
and
Definition 6.
An estimation problem will be called concentrated if for every there is an for which is a bounded set.
Lemma 2.4.
The scale free Neyman-Scott problem with fixed and and with observable parameters is concentrated.
Proof.
The general formula for in the Neyman-Scott problem has been given in (8), and can easily be shown to be a strictly convex function of , with a unique minimum, for any . As such, is bounded for any .
Consider, now, the Neyman-Scott problem under the parameterization and . In this reparameterization, it is easy to see that for any translation function, , is an automorphism. In particular, this means that for any ,
All such sets are translations of each other, having the same volume, shape and bounding box dimensions.
It follows regarding the inverse function, , that for any it maps to a set of the same volume and bounding box dimensions as each , albeit with an inverted shape.
In particular, it is bounded.
Being bounded under the new parameterization is tantamount to being bounded under the native problem parameterization. ∎
The last good property we wish to mention regarding the scale free Neyman-Scott problem is the following.
Definition 7.
An estimation problem will be called local if there exist values and such that for every there exist , such that for all outside a subset of of total scaled probability at most ,
(11) 
Lemma 2.5.
Every proper estimation problem is local.
Proof.
Consider any estimation problem over a normalised (unscaled) prior, and consequently also a normalised (unscaled) marginal.
The total probability over all is, by definition, , so choosing satisfies the conditions of locality. ∎
Lemma 2.6.
The scale free Neyman-Scott problem is local.
Definition 8.
An estimation problem is called regular if it is observation-transitive, homogeneous, concentrated and local.
IV-B Relating SMML to IP
We will now show that for regular problems, one can draw conclusions about the SMML solution from the IP solution.
Our first lemma proves for a family of estimation problems that the SMML solutions to these problems do not diverge entirely, in the sense of allocating arbitrarily high (scaled) probabilities to single values. Although a basic requirement for any good estimator, no such result was previously known for SMML.
For a codebook , let
be known as the region of in .
Lemma 2.7.
For every local estimation problem there is a such that no SMML codebook for the problem contains any whose region has scaled probability greater than in the marginal distribution of .
Proof.
Let and be as in Definition 7. Note that can always be increased without violating the conditions of the definition, so it can be assumed to be positive.
Assign for a constant to be computed later on, and assume for contradiction that contains a whose region, , has scaled probability greater than . By construction, contains a nonempty, positive scaled probability region wherein (11) is satisfied.
Let be the scaled probability of , and let be .
Also, define , noting that
(12) 
because, by assumption, and , so
We will design a codebook such that , proving by contradiction that is not optimal.
Our definition of is as follows. For all , . Otherwise, will be the value among for which the likelihood of is maximal.
Recall that
Because, by construction, the set , of scaled probability , satisfies that for any ,
we have
(13) 
On the other hand, the worst-case addition in (scaled) entropy caused by splitting the set into separate values is if each receives an equal probability. We can write this worst-case addition as
(14) 
This is in the case that . If , the expression is dropped from (14). This change makes no difference in the later analysis, so we will, for convenience, assume for now that .
To reach a contradiction, we want . If , equation (15) degenerates to for an immediate contradiction. Otherwise, contradiction is reached if
or equivalently if
(16) 
A small enough value can bring the left-hand side of (16) arbitrarily close to , and in particular to a value lower than for any .
Lemma 2.7 now allows us to draw a direct connection between SMML and .
Theorem 3.
In every local, homogeneous estimation problem , for every SMML codebook and for every there exists a for which the set
is a set of positive scaled probability in the marginal distribution of .
Proof.
Suppose to the contrary that for some , no element is mapped from a positive scaled probability of values from its respective .
Let be the set of values with positive scaled probability regions in , and let be the directed graph whose vertex set is and which contains an edge from to if the intersection
has positive scaled probability. By assumption, has no self-loops.
Let
We claim that for any ,
(17) 
an immediate consequence of which is that and therefore cannot have any cycles.
To prove (17), note first that because of our assumption that all likelihoods are continuous, , for every and any choice of , has positive measure in the space of , and because of our assumption that all likelihoods are positive, a positive measure in the space of translates to a positive scaled probability. This also has the side effect that all vertices in must have an outgoing edge (because this positive scaled probability must be allocated to some edge).
Next, consider how transferring a small subset of , of size , in from to changes . Given that can be made arbitrarily small, we can consider the rate of change, rather than the magnitude of change: for to be optimal, we must have a nonnegative rate of change, or else a small enough can be used to improve . Given that is the sum of over all , by transferring probability from to , the rate of change to is .
Consider now the rate of change to . By transferring probability from , where it is outside of (and therefore by definition assigned an value of at least ) to , where it is assigned into (and therefore by definition assigned an value that is smaller than ) the difference is a reduction rate greater than .
The condition that the rate of change of is nonnegative therefore translates simply to (17), thus proving the equation’s correctness.
However, if contains no self-loops and no cycles, and every one of its vertices has an outgoing edge, then it contains arbitrarily long paths starting from any vertex. Consider any such path starting at some of length greater than , where is as in Lemma 2.7. By (17), we have that the scaled probability assigned to the value ending the path is greater than , thereby reaching a contradiction. ∎
We can now present our main theorem, formalising the connection between the SMML estimator and the ideal point.
Theorem 4.
In any regular estimation problem, for every ,
(18) 
In particular, is a true estimator, in the sense that for every .
Proof.
Let be a value for which we want to prove (18).
From Theorem 3 we know that for all there exists a codebook and a for which is nonempty. Let be a value inside this intersection.
By observation transitivity, there is an automorphism such that . Let .
Let us define by . It is easy to verify that by the definition of automorphism , so is also an SMML codebook, and furthermore
so .
Consider now a sequence of such for . The set is a complete metric space, by construction the reside inside the nested sets , and by our assumption that the problem is concentrated, for a small enough , is bounded. We conclude, therefore, that the sequence has a converging subsequence. Let be the limit of one such converging subsequence.
We claim that is inside both and , thus proving that their intersection is nonempty.
To show this, consider first that we know because is a continuous function, and by construction .
Lastly, for every in the subsequence, , so follows from the closure of the SMML estimator (which is guaranteed by definition). ∎
Corollary 4.1.
For the scale free Neyman-Scott problem with fixed and , and .
In particular, this is true when approaches infinity, leading SMML to be inconsistent for this problem.
Proof.
The IP estimator for the Neyman-Scott problem was already established in Theorem 2 to be single-valued and equal to the ML estimator. The value of the SMML estimator therefore follows from Theorem 4.
As the consistent estimator for is and not , the SMML estimator is inconsistent. ∎
At first glance, this result may seem impossible, because, as established, an SMML codebook can only encode a countable number of values. Corollary 4.2 resolves this seeming paradox.
Corollary 4.2.
The scale free Neyman-Scott problem with fixed and admits uncountably many distinct SMML codebooks, and for every value there is a continuum of SMML estimates.
Proof.
Uncountably many distinct codebooks can be generated by arbitrarily scaling and translating any given codebook, which, as we have seen, does not alter .
To show that for every value there are uncountably many distinct SMML estimates, recall from our proof of Lemma 2.4 that if we consider the problem in observation space and parameter space, then both scaling and translation in the original parameter space are translations under the new representation. If any belongs to a region of volume in this space that is mapped to a particular by a particular , one can create a new codebook, , which is a translation of in both and , which would still be optimal.
As long as the translation in observation space is such that is still mapped into its original region, its associated will be the correspondingly translated . As such, the volume of values associated with a single is at least as large as the volume of the region of (and, by observation-transitivity of the problem, at least as large as the volume of the largest region in the codebook’s partition). ∎
SMML is therefore not a point estimator for this problem at all.
IV-C Relating IP to ML
Beyond the connections between the SMML solution and the Ideal Point approximation, there is also a direct link to the maximum likelihood estimate.
Theorem 5.
If is a homogeneous, comprehensive estimation problem, then .
Proof.
By definition,
By assumption, the estimation problem is homogeneous, so is a constant, , independent of . Substituting into the definition of and calculating the functional inverse, we get
For an arbitrary choice of , let be such that . The value of is , and there certainly is no for which (or this would contradict homogeneity), so, using the notation of Definition 5, .
Thus,
∎
Corollary 5.1.
In any regular estimation problem, for every ,
V Additional results
The following is a list of additional results that are immediate corollaries of the above. They are given with sketched proofs.

Contrary to the oft-cited claims of [24], the Wallace-Freeman approximation [8] is inconsistent for the scale free Neyman-Scott problem. In fact, every frequentist estimation problem for which ML is inconsistent (such as the von Mises problem) admits a prior for which the Wallace-Freeman approximation is inconsistent. This follows from the folkloric and immediate result (cf. [18], p. 412) that the Wallace-Freeman approximation coincides with ML for estimation problems whose prior is their Jeffreys prior [26, 27]. The scale free prior happens to also be a Jeffreys prior for the Neyman-Scott problem.

Contrary to the claims of [13] and others, SMML does not satisfy internal consistency in the sense of returning the same estimate whether it is estimated jointly with or alone. The problem of estimating only is also regular, for which reason IG, IP, SMML and ML all coincide for it. The ML estimate is in this case consistent, and therefore not equal to the former estimate. The same is true also for the Wallace-Freeman approximation, as the marginalised problem also has a Jeffreys prior.

Contrary to the claims of [28], it is not true that when SMML is applied in parallel to a large number of independent estimation problems, its predictions for each individual problem are distributed with the same mean and variance as the posterior for . Parallel estimation of multiple independent regular problems is, itself, a regular problem. Hence, the SMML estimate for each individual problem will coincide with that problem’s ML estimate, even when this is inconsistent.
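Returning to the first item in the list above, the statement that the scale free prior is a Jeffreys prior for the Neyman-Scott problem can be checked with a short calculation. The sketch below assumes the parameterization by the N means and the standard deviation σ, for which the Fisher information of the Gaussian model is diagonal, with entries J/σ² for each mean and 2NJ/σ² for σ. The Jeffreys density is the square root of the determinant of this matrix, and the code recovers the power of σ to which it is proportional. The function names are illustrative.

```python
import math

def log_jeffreys_density(sigma, N, J):
    """Log of sqrt(det Fisher information) for N group means and a shared sigma."""
    info_mu = J / sigma ** 2             # one diagonal entry per group mean
    info_sigma = 2 * N * J / sigma ** 2  # diagonal entry for sigma (all N*J observations)
    log_det = N * math.log(info_mu) + math.log(info_sigma)
    return 0.5 * log_det

def scaling_exponent(N, J, sigma=1.0, factor=2.0):
    """Power p such that the Jeffreys density is proportional to sigma**p."""
    delta = log_jeffreys_density(factor * sigma, N, J) - log_jeffreys_density(sigma, N, J)
    return delta / math.log(factor)

N, J = 5, 2
exponent = scaling_exponent(N, J)
print("Jeffreys density is proportional to sigma **", exponent)
```

Under these assumptions the exponent is -(N+1), one power of 1/σ for each dimension of the parameter space.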
[Proof that Neyman-Scott is local]
We prove Lemma 2.6, stating that the scale free Neyman-Scott problem is local.
Proof.
Set , for a value to be chosen later on. Importantly, , and all other constants introduced later on in this proof (e.g., , and ) depend solely on and and are not dependent on . As such, they are constants of the construction.
Let , and for let be the vector identical to except that its ’th element equals . Let be the vector identical to except that its ’th element equals .
For , we use all and all . Next, we pick , where is Euler’s constant.
This leaves a further values of to be assigned. To assign these, divide for each the range between and into equal-length segments, and let be the set containing the centres of these segments. We define our remaining values as
We will show that, for a constant to be chosen later on, outside a subset of of total scaled probability ,
Equivalently:
(19) 
Showing this is enough to prove the lemma, because for a sufficiently large ,
so by choosing the conditions of Definition 7 are satisfied. (Recall that is a constant of the construction, and therefore can depend on .)
To prove (19), let us divide the problem into cases. First, let us show that this holds true for any value for which, for any , . To show this, assume without loss of generality that for a particular the equation holds true.