# Variances of surface area estimators based on pixel configuration counts

The surface area of a set which is only observed as a binary pixel image is often estimated by a weighted sum of pixel configuration counts. In this paper we examine these estimators in a design-based setting, that is, we assume that the observed set is shifted uniformly at random. Bounds for the difference between the essential supremum and the essential infimum of such an estimator are derived, which imply that the variance is in $O(t^2)$ as the lattice distance $t$ tends to zero. In particular, it is asymptotically negligible compared to the bias. A simulation study shows that the theoretically derived convergence order is optimal in general, but further improvements are possible in special cases.


## 1 Introduction

There are several competing algorithms for computing the surface area of a set which is only observed through a pixel image, e.g. [1, 9, 10, 11, 17]; see [8, Sec. 12.2 and 12.5] for an overview. A computationally fast and easy-to-implement approach is taken by so-called local algorithms, [10, 11, 17] in the above list. The idea behind these algorithms is the following: In a $d$-dimensional image a pattern of side length $n$ is called an $n \times \dots \times n$-pixel configuration ($d$ factors). Mathematically it is modeled as a partition $(B, W)$ of $\{0,\dots,n-1\}^d$ into two disjoint subsets, where $B$ represents the set of black pixels and $W$ represents the set of white pixels. Since the set $\{0,\dots,n-1\}^d$ consists of $n^d$ points and each point is either colored black or colored white, there are $2^{(n^d)}$ pixel configurations. We enumerate them as $(B_j, W_j)$, $j = 1,\dots,2^{(n^d)}$. A pixel image of lattice distance $t > 0$ can be represented by the set $A \subseteq t\mathbb{Z}^d$ of its black pixels, where $tM := \{tx \mid x \in M\}$ for $M \subseteq \mathbb{R}^d$ and $t > 0$ is the homothetic image of $M$ with scaling factor $t$ and scaling center at the origin. While the observation window is usually bounded in applications, our results hold only if the observed set is completely contained in the observation window and thus we assume that the observation window is $\mathbb{R}^d$. Now the $j$-th pixel configuration count at lattice distance $t$ of the image $A$ is defined as

$$N_{t,j}(A) = \#\{v \in \mathbb{Z}^d \mid (A - tv) \cap \{0,\dots,(n-1)t\}^d = tB_j\}.$$

It represents the number of occurrences of the $j$-th pixel configuration in the image $A$, cf. Figure 1. A local algorithm now approximates the surface area of a set $K$ by a weighted sum

$$\sum_{j=1}^{2^{(n^d)}} t^{d-1} w_j N_{t,j}(A) \tag{1}$$

of pixel configuration counts, where $A$ is a pixel image of $K$ and $w_1,\dots,w_{2^{(n^d)}}$ are constants chosen in advance, called weights. The factor $t^{d-1}$ compensates for the fact that $N_{t,j}(A)$ increases of order $t^{-(d-1)}$ as $t \to 0$ for those pixel configurations which are “responsible” for the surface area (see [22, Corollary 3.2] for ways to make this precise). The two pixel configurations with indices $1$ and $2^{(n^d)}$, consisting only of white pixels resp. only of black pixels, typically lie outside resp. inside the image set and not on its boundary. Hence these counts do not provide information about the surface area. Thus one should put

$$w_1 = w_{2^{(n^d)}} = 0, \tag{2}$$

which will be assumed in this paper.
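To make the weighted sum (1) concrete, the following is a minimal sketch for the planar case $d = 2$. It assumes the image is given as a set of integer index pairs $z$ with $tz \in A$; the function names `config_counts` and `local_estimate` and the representation of a configuration as a `frozenset` of black offsets are illustrative choices, not taken from the paper.

```python
from itertools import product

def config_counts(black, n=2):
    """Count n x n pixel configurations in a planar binary image.

    `black` is a set of integer pairs (i, j) marking the black pixels;
    all other points of Z^2 are white. A configuration is identified by
    the frozenset of offsets in {0..n-1}^2 that are black. The all-white
    configuration occurs infinitely often on Z^2 and is not counted
    (its weight is zero by condition (2) anyway).
    """
    offsets = list(product(range(n), repeat=2))
    # Only windows whose corner v satisfies v + o in black for some
    # offset o can contain a black pixel.
    corners = {(i - a, j - b) for (i, j) in black for (a, b) in offsets}
    counts = {}
    for vi, vj in corners:
        cfg = frozenset((a, b) for (a, b) in offsets if (vi + a, vj + b) in black)
        counts[cfg] = counts.get(cfg, 0) + 1
    return counts

def local_estimate(black, weight, t, n=2):
    """Weighted sum of configuration counts as in (1), for d = 2."""
    full = frozenset(product(range(n), repeat=2))
    # Condition (2): the all-black configuration gets weight zero.
    total = sum(weight(cfg) * c
                for cfg, c in config_counts(black, n).items()
                if cfg != full)
    return t * total  # t^(d-1) with d = 2
```

For a $2 \times 2$ block of black pixels, `config_counts` finds nine windows that contain at least one black pixel: one all-black window, four with two black pixels and four with a single black pixel. Any concrete `weight` function passed in is the user's choice; the sketch does not encode the weights studied in [25].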

For the theoretical investigation of such algorithms we assume that the set $K$ is randomly shifted, i.e. the random set $K + tU$ for a random vector $U$ is considered. A natural choice for the distribution of $U$ is the uniform distribution on $[0,1)^d$, but the results of this paper will hold for any distribution of $U$. As discretization model the Gauss discretization is used, i.e. a pixel is colored black if it lies in the shifted set and is colored white otherwise. Thus the model for the image of $K$ is

$$A = (K + tU) \cap t\mathbb{Z}^d.$$
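As an illustration of the Gauss discretization, here is a small sketch for a disk in the plane. The choice of a disk centred at the origin and the function name `gauss_discretize_disk` are assumptions made only for this example; the function returns the integer indices $z$ with $tz \in K + tu$.

```python
import math

def gauss_discretize_disk(radius, t, u=(0.0, 0.0)):
    """Indices z in Z^2 with t*z in K + t*u, K a disk of the given radius
    centred at the origin (an example set chosen for illustration)."""
    # t*z lies in K + t*u exactly when t*(z - u) lies in K.
    # For u in [0,1)^2 the indices are bounded by radius/t + 1 in absolute value.
    m = math.ceil(radius / t) + 1
    black = set()
    for i in range(-m, m + 1):
        for j in range(-m, m + 1):
            x = t * (i - u[0])
            y = t * (j - u[1])
            if x * x + y * y <= radius * radius:
                black.add((i, j))
    return black
```

With `radius=1.0` and `t=0.5`, the unshifted disk contains 13 lattice points, while the shift `u=(0.5, 0.5)` yields 12, which already shows that the image depends on the shift.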

Under these assumptions no asymptotically unbiased estimator for the surface area exists. More specifically, it is shown in [25] that any local estimator of the surface area in attains relative asymptotic biases of up to for certain test sets, when $U$ is uniformly distributed on $[0,1)^d$, and explicit weights for which this lower bound is achieved are given. However, the bias is only one component of the error. There are a number of papers investigating the other component, namely the variance, for other estimators. Hahn and Sandau [5] as well as Janáček and Kubínová [4] investigate estimators which are suitable when the picture is analogue or when limited computational capacity requires an artificial coarsening of the image. Svane [24] investigates the variance of local algorithms for gray-scale images based on single pixels, i.e. $n = 1$. So far the variance has not been investigated for any estimator for binary digital images. The objective of the present paper is to examine the variance of local estimators for binary digital images.

While we use a similar setup as [25], we need slightly stricter regularity assumptions on the set $K$. In $\mathbb{R}^2$ we assume:

(R1) The boundary of $K$ is piecewise the graph of a convex or concave function with either of the two coordinate axes being the domain, i.e.

$$\operatorname{bd} K = \bigcup_{k=1}^{m} F_k,$$

where for each $k \in \{1,\dots,m\}$ we have either

$$F_k = \{(x, f_k(x)) \mid x \in D_k\} \quad\text{or}\quad F_k = \{(f_k(x), x) \mid x \in D_k\},$$

where $D_k$ is a compact interval and $f_k : D_k \to \mathbb{R}$ is a continuous function which is convex or concave, and where $F_k \cap F_l$ for $k \neq l$ contains only points that are both endpoints of $F_k$ and of $F_l$. At the intersection point of two sets $F_k$ and $F_l$ they form an angle of strictly positive width (while an angle of is allowed).

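An even stricter assumption is needed in higher dimensions. We require:

(R2) The set $K$ is of the form

$$K = \operatorname{cl}\Bigl(\bigcup_{k=1}^{m'} L_k \setminus \bigcup_{k=m'+1}^{m} L_k\Bigr),$$

where $\operatorname{cl}$ denotes the closure of a set and where the sets $L_1,\dots,L_m$ are either convex polytopes with interior points or compact convex sets with interior points for which $\operatorname{bd} L_k$ is a $C^2$-manifold with nowhere vanishing Gauss-Kronecker curvature. In intersection points of $\operatorname{bd} L_k$ and $\operatorname{bd} L_l$, $k \neq l$, the bodies $L_k$ and $L_l$ do not have a common exterior normal vector. Geometrically this means that $\operatorname{bd} L_k$ and $\operatorname{bd} L_l$ intersect nowhere under an angle of zero.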

Under the above assumptions we can show our main result.

###### Theorem 1.

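Let $K$ be a set fulfilling Assumption (R1) if $d = 2$ or Assumption (R2) if $d \geq 3$ and let $\hat{S}_t$ be an estimator of the form (1) fulfilling (2). Then there is a constant $s$ such that

$$\sup\{\hat{S}_t(K+tv) \mid v \in [0,1)^d\} - \inf\{\hat{S}_t(K+tv) \mid v \in [0,1)^d\} \leq s \cdot t, \quad t > 0. \tag{3}$$

Thus,

$$\operatorname{Var}(\hat{S}_t(K+tU)) \leq s^2 t^2/4, \quad t > 0,$$

where $U$ is a random vector.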
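The step from (3) to the variance bound is the standard estimate for the variance of a bounded random variable (Popoviciu's inequality). A short derivation, where $X := \hat{S}_t(K+tU)$ and $a$, $b$ are shorthand introduced here for the infimum and supremum in (3), so that $X$ takes values in $[a,b]$ and $b - a \leq s \cdot t$:

$$\operatorname{Var}(X) = \min_{c \in \mathbb{R}} \mathbb{E}(X - c)^2 \leq \mathbb{E}\Bigl(X - \frac{a+b}{2}\Bigr)^2 \leq \Bigl(\frac{b-a}{2}\Bigr)^2 \leq \frac{s^2 t^2}{4}.$$

The first equality holds because the mean minimizes the expected squared deviation, and the second inequality holds because every point of $[a,b]$ has distance at most $(b-a)/2$ from the midpoint.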

In the first step of the proof of Theorem 1 we show that Assumption (R1) or Assumption (R2) implies that the boundary of $K$ can be decomposed into certain sets $M_\kappa$ to be defined below such that the intersections $M_\kappa \cap M_\lambda$ are small for $\kappa \neq \lambda$ in a certain sense. In the second step we derive upper and lower bounds for certain sums of pixel configuration counts. Since it will be possible to reconstruct the individual pixel configuration counts from these sums, the bounds derived in the second step imply the assertion of Theorem 1. The details are given in Section 2.

In Section 3 we show by an example that the assertions of Theorem 1 need not hold for a set that is the union of two convex sets which intersect under an angle of zero. Moreover we show that an essential lemma (Lemma 3) of our proof fails to hold for general compact and convex sets $L_k$. Thus the method of our proof breaks down completely without the assumption that the sets $L_k$ from (R2) are either polytopes or sufficiently smooth. It is unclear whether the assertion of Theorem 1 still holds in this more general situation. A simulation study (Section 4) shows that the order derived in Theorem 1 is optimal for the cube and thus is optimal in general, whereas a better bound can be achieved for the ball. In the simulation part we will also examine the integral of mean curvature. In Section 5 we discuss our results, compare them to the results Svane [24] obtained for gray-scale images, and mention some open questions.
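The quantity bounded in Theorem 1 can be explored numerically. The sketch below is not the simulation study of Section 4; it is a small Monte Carlo experiment under assumptions chosen for this example: $d = 2$, $n = 2$, the unit square as test set, and weights that depend only on the number of black pixels in a $2 \times 2$ window. All function names are illustrative.

```python
import random
from itertools import product

OFFSETS = list(product(range(2), repeat=2))

def square_image(t, u):
    """Gauss discretization of [0,1]^2 + t*u as a set of integer indices."""
    m = int(1 / t) + 2  # indices outside -1..m cannot be black for u in [0,1)^2
    return {(i, j)
            for i in range(-1, m + 1) for j in range(-1, m + 1)
            if 0.0 <= t * (i - u[0]) <= 1.0 and 0.0 <= t * (j - u[1]) <= 1.0}

def estimate(black, t, weight):
    """Local estimate with 2x2 windows; weight[k] applies to windows with k black pixels."""
    corners = {(i - a, j - b) for (i, j) in black for (a, b) in OFFSETS}
    total = 0.0
    for vi, vj in corners:
        k = sum((vi + a, vj + b) in black for (a, b) in OFFSETS)
        total += weight[k]
    return t * total  # t^(d-1) with d = 2

def range_and_variance(t, weight, shifts=200, seed=0):
    """Empirical range and variance of the estimate over uniform random shifts."""
    rng = random.Random(seed)
    values = [estimate(square_image(t, (rng.random(), rng.random())), t, weight)
              for _ in range(shifts)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return max(values) - min(values), var
```

Calling `range_and_variance(t, [0.0, 0.5, 0.5, 0.5, 0.0])` for a decreasing sequence of `t` gives an empirical impression of how the range and the variance scale with the lattice distance. The weight vector here is a placeholder that satisfies (2); it is not a recommended choice.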

## 2 The proof

In this section we prove Theorem 1. We start by introducing some notation and in particular defining the sets $M_\kappa$ mentioned in the introduction. Then we show that in dimension $d = 2$ Assumption (R1) implies the existence of an appropriate boundary decomposition of $K$, followed by a proof that in dimension $d \geq 3$ such a decomposition is implied by (R2). After this, we prove that this boundary decomposition implies certain upper and lower bounds on the pixel configuration counts. Finally we show how these bounds imply Theorem 1.

### 2.1 Notation

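We assume $n$ and $d$ to be fixed and hence we will suppress dependence on $n$ and $d$ in the notation. We fix an enumeration $x_1,\dots,x_{n^d}$ of the points in $\{0,\dots,n-1\}^d$ and consider for every permutation $p$ of $\{1,\dots,n^d\}$ the set

$$\tilde{G}_p := \{u \in S^{d-1} \mid \langle x_{p(1)}, u\rangle < \langle x_{p(2)}, u\rangle < \dots < \langle x_{p(n^d)}, u\rangle\},$$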

where , and . Notice that

$$G_p := \{u \in S^{d-1} \mid \langle x_{p(1)}, u\rangle \leq \langle x_{p(2)}, u\rangle \leq \dots \leq \langle x_{p(n^d)}, u\rangle\},$$

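unless is empty. If $d = 2$ and $n = 2$ the non-empty sets $G_p$ are the eight arcs which essentially (up to permuting the indices or changing the sign of the entries) look like

$$G_1 = \{(u_1,u_2) \in S^1 \mid 0 \leq u_1,\ 0 \leq u_2,\ u_1 \leq u_2\}.$$

If $n > 2$ then there are more (and thus smaller) arcs.

If $d = 3$ and $n = 2$ there are 48 sets which are isometric to

$$G_1 = \{(u_1,u_2,u_3) \in S^2 \mid 0 \leq u_1,\ 0 \leq u_2,\ 0 \leq u_3,\ u_1 \leq u_2,\ u_1 + u_2 \leq u_3\},$$

and 48 sets that are isometric to

$$G_2 = \{(u_1,u_2,u_3) \in S^2 \mid 0 \leq u_1,\ 0 \leq u_2,\ 0 \leq u_3,\ u_1 \leq u_2 \leq u_3,\ u_3 \leq u_1 + u_2\}.$$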
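The number of non-empty sets can be checked numerically by sampling directions and recording which orderings of the points of $\{0,\dots,n-1\}^d$ occur. The following sketch uses random sampling rather than an exact enumeration, so it relies on every region being hit by at least one sample; the function name and the default sample size are assumptions of this example.

```python
import random
from itertools import product

def count_orderings(n, d, samples=20000, seed=0):
    """Count the distinct strict orderings of {0..n-1}^d induced by u -> <x, u>,
    using randomly sampled directions u."""
    rng = random.Random(seed)
    points = list(product(range(n), repeat=d))
    seen = set()
    for _ in range(samples):
        # A Gaussian vector has a uniformly distributed direction;
        # normalising it would not change the ordering.
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        keys = [sum(xi * ui for xi, ui in zip(x, u)) for x in points]
        order = tuple(sorted(range(len(points)), key=keys.__getitem__))
        seen.add(order)
    return len(seen)
```

With the default settings, `count_orderings(2, 2)` reproduces the eight arcs and `count_orderings(2, 3)` the $48 + 48 = 96$ sets mentioned above, while `count_orderings(3, 2)` gives a larger number of arcs than in the case $n = 2$.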

For each let denote the closure of the set of all boundary points of that have at least one exterior normal vector in . Taking the closure is necessary in order to ensure because there are boundary points of in which there is no exterior normal vector. An illustrative example is given in Figure 2.

Consider a further decomposition

$$\operatorname{bd} K = \bigcup_{\kappa=1}^{\mu} M^{+}_{\kappa},$$

where each set is an intersection if $K$ fulfills (R1) and an intersection if $K$ fulfills (R2). In order to ensure that these sets do not intersect “too much”, put , . Let be the element of with ; choose an arbitrary one if it is not unique.

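A cell is a set of the form $[l_1, l_1+n-1] \times \dots \times [l_d, l_d+n-1]$ for some point $(l_1,\dots,l_d) \in \mathbb{Z}^d$. For let denote the system of cells , such that and . Let $\mathcal{C}'_\kappa$ denote the system of all cells intersecting both $M_\kappa$ and another boundary component $M_\lambda$, $\lambda \neq \kappa$, i.e.

$$\mathcal{C}'_\kappa = \bigl\{ C = [l_1, l_1+n-1] \times \dots \times [l_d, l_d+n-1] \bigm| (l_1,\dots,l_d) \in \mathbb{Z}^d,\ C \cap M_\kappa \neq \emptyset,\ C \cap M_\lambda \neq \emptyset \text{ for some } \lambda \in \{1,\dots,\mu\} \setminus \{\kappa\} \bigr\}.$$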

Put , and (notice that is the number of cells intersected by more than one boundary component counted with multiplicity – a cell intersected by two components is counted twice, a cell intersected by three components is counted three times etc.).

### 2.2 The boundary decomposition

We will now show that the Assumptions (R1) resp. (R2) imply that the number of cells intersecting more than one boundary component is small in a certain way.

###### Lemma 2.

Let be a compact set satisfying (R1).

Then for and sufficiently large for a bound which may depend on , but not on or .

Proof: Fix . Since the functions of (R1) are assumed to be either convex or concave, the set is connected. Cells of intersect also another boundary part , . By the compactness of the sets there is such that for in such a situation always and intersect. They intersect usually in one point, in some exceptional cases in two points. We will assume that there is only one intersection point in the following, since in the case of two intersection points the notation is blown up, while the ideas of the proof remain the same.

If the angle between and at their intersection point is bigger than , then and the angle between two vectors from and from is at least . Hence any cell intersecting both and must contain a point which has distance at most from and therefore there can be at most such cells (it would not be difficult to obtain a far lower bound; however, it only matters that this bound is independent of , so we will not take the effort of improving it).

So assume from now on. Let and be the unit vectors such that are normal vectors of in and are normal vectors of in , oriented in such a way that both and point from to in a neighborhood of (by the assumptions made so far, the angle which and form at is strictly positive but not larger than , so this choice is properly defined; it is convenient to orient the vectors like this and ignore whether they are now outward or inward normal vectors). Let and and put and . There is some with and , where . For sufficiently large the sets and have distance more than and hence there can be no cell which intersects both sets. Let denote the angle between and and let denote a half-plane with and . Then a point of which lies in the same cell as a point from can have at most distance from . So there can be at most cells which intersect both and .

Altogether we have

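$$N'_\kappa(rK+v) \leq \sum_{\kappa' :\, \alpha_{\kappa,\kappa'} > \pi/2} 25n^2 + \sum_{\kappa' :\, \alpha_{\kappa,\kappa'} \leq \pi/2} \bigl((n-1)\,2\sqrt{2}/\sin(\tilde{\alpha}_{\kappa,\kappa'}) + n\bigr)^2,$$

where the sums are taken over all with (if consists of two points, then contributes two summands to the sum). Summing up we get

$$N'(rK+v) \leq \sum_{\kappa,\kappa' :\, \alpha_{\kappa,\kappa'} > \pi/2} 25n^2 + \sum_{\kappa,\kappa' :\, \alpha_{\kappa,\kappa'} \leq \pi/2} \bigl((n-1)\,2\sqrt{2}/\sin(\tilde{\alpha}_{\kappa,\kappa'}) + n\bigr)^2,$$

where the sums are taken over all ordered pairs with . ∎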

###### Lemma 3.

Let be a compact set fulfilling (R2).

Then for and large enough for a bound which may depend on , but not on or .

In the proof of this lemma we need the following lemma. Let $\tilde{\kappa}_j$ denote the Lebesgue measure of the $j$-dimensional unit ball.

###### Lemma 4.

Let $R$ be a rectangle of side-lengths $a_1,\dots,a_{d-1}$ within a hyperplane . Assume . Then $R$ intersects at most cells.

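Proof: Consider the parallel set

$$R_{\oplus\rho} := \{x \in \mathbb{R}^d \mid \|x - y\| \leq \rho \text{ for some } y \in R\}, \quad \rho \geq 0,$$

of $R$. By the Steiner formula its Lebesgue measure is given by

$$\lambda_d(R_{\oplus\rho}) = \sum_{j=0}^{d} \tilde{\kappa}_{d-j} \rho^{d-j} V_j(R),$$

where $V_j(R)$ is the $j$-th intrinsic volume of $R$; see e.g. [18, (4.1)]. The intrinsic volumes of the rectangle are given by

$$V_j(R) = \sum_{1 \leq i_1 < \dots < i_j \leq d-1} a_{i_1} \cdots a_{i_j} \leq \binom{d-1}{j} a_1 \cdot a_2 \cdots a_j, \quad j = 0,\dots,d-1, \quad\text{and}\quad V_d(R) = 0.$$

Hence

$$\lambda_d(R_{\oplus(n-1)\sqrt{d}}) \leq \sum_{j=0}^{d-1} \tilde{\kappa}_{d-j} \binom{d-1}{j} (n-1)^{d-j} d^{(d-j)/2} a_1 \cdots a_j.$$

A cell intersected by $R$ is completely covered by $R_{\oplus(n-1)\sqrt{d}}$. In particular, for a cell intersected by $R$, the subset is completely covered by $R_{\oplus(n-1)\sqrt{d}}$, and these subsets are disjoint for different cells. Hence the assertion follows. ∎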
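The volume bound used in this proof is easy to evaluate numerically. The sketch below computes only the right-hand side of the last displayed inequality, using the standard formula $\pi^{m/2}/\Gamma(m/2+1)$ for the volume of the $m$-dimensional unit ball. The function names are illustrative, and sorting the side lengths in decreasing order is an assumption made so that $a_1 \cdots a_j$ dominates every product of $j$ side lengths.

```python
import math

def unit_ball_volume(m):
    """Lebesgue measure of the m-dimensional unit ball."""
    return math.pi ** (m / 2) / math.gamma(m / 2 + 1)

def parallel_volume_bound(sides, n):
    """Upper bound for the volume of the parallel set of a (d-1)-dimensional
    rectangle in R^d at distance (n-1)*sqrt(d), following the Steiner formula.

    `sides` holds the d-1 side lengths of the rectangle."""
    a = sorted(sides, reverse=True)  # assumption: a_1 >= a_2 >= ... >= a_{d-1}
    d = len(a) + 1
    rho = (n - 1) * math.sqrt(d)     # rho^(d-j) = (n-1)^(d-j) * d^((d-j)/2)
    total = 0.0
    for j in range(d):               # j = 0, ..., d-1
        total += (unit_ball_volume(d - j) * math.comb(d - 1, j)
                  * rho ** (d - j) * math.prod(a[:j]))
    return total
```

For $d = 2$ the rectangle is a line segment and the bound is sharp: `parallel_volume_bound([3.0], 2)` equals $2\pi + 6\sqrt{2}$, the area of a segment of length 3 thickened by $\sqrt{2}$.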

Proof of Lemma 3: Fix . Each cell of intersects another boundary part , . By the compactness of the sets there is such that for in such a situation and always intersect. We have to distinguish several cases:

Case 1: $M_\kappa$ and $M_\lambda$ are parts of polytopes:
We may assume w.l.o.g. that $M_\kappa$ and $M_\lambda$ are contained in hyperplanes. Since the angle under which $M_\kappa$ and $M_\lambda$ meet is non-zero, their intersection is at most $(d-2)$-dimensional.

Let be the set of points in that lie in a cell which is also intersected by . A point in can at most have distance from and therefore it can at most have distance from the affine hull of , where is the angle under which and intersect. However, the metric projection of a point onto need not lie in and so we have to find an upper bound for the diameter of , where denotes the metric projection of onto . For consider the boundary of the parallel set of at distance within . Let be small enough that either lies in or has distance at least from for any vertex of . Then

$$\beta_\lambda := \min\{ d(M_\lambda \cap (x + E^\perp), x) \mid x \in I^{\lambda}_{+\rho} \}/\rho > 0,$$

where , does not depend on . Put . The diameter of is at most , where is the diameter of ; indeed, since a point can have distance at most from , the point can have distance at most from .

Altogether, is contained in a -dimensional rectangle with side lengths at most and the remaining side length being at most . Thus, by Lemma 4, the number of cells that are intersected both by and by is bounded by a polynomial of degree .

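Case 2: $M_\kappa$ and $M_\lambda$ belong to the same smooth body :
The support function of a non-empty compact set $\tilde{L}$ is defined as

$$h_{\tilde{L}}(u) := \max\{\langle x, u\rangle \mid x \in \tilde{L}\}, \quad u \in \mathbb{R}^d;$$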

see [18, Sec. 1.7.1]. From [18, p. 115] we get that is twice differentiable on , since is convex and is a -manifold with non-vanishing Gauss-Kronecker curvature. Let be its Hessian matrix in and put , where denotes the matrix norm induced by the Euclidean norm.

The sets and belong to two different sets and for . In any point there must be a normal vector of that lies in . By [18, Corollary 1.7.3] can have an exterior normal vector in only in points that lie in the image of under . The set is a subset of a -dimensional sphere. Hence it can be covered by sets isometric to . This set is the image of under the mapping which is Lipschitz continuous with Lipschitz constant . There are points with and thus , where . Hence there are points such that . Now for every point one of the points , has distance at most , since has Lipschitz constant .

Since the principal curvatures depend continuously on the point, a compactness argument ensures that the principal radii of curvature of are bounded from below by for sufficiently large . Then a point in a cell intersecting both and can have distance at most from the nearest point in and therefore it has distance less than from the nearest point . Thus there can be at most cells intersecting both and .

Case 3: $M_\kappa$ belongs to a polytope , while $M_\lambda$ belongs to a smooth convex body (or the other way round):
Let denote the affine hulls of the facets of which are intersected by . Then is the boundary of a convex body lying in for each . Put .

By the smoothness assumption on , the angle and form at the points is continuous as a function of . By the compactness of it attains a minimum . Unlike in the 1st case one cannot assume that two points and are always seen from an appropriate point under an angle of at least , since is curved. We now explain why this can be assumed with replaced by . Since is a -manifold, there is some critical radius such that the metric projection is defined uniquely for any point of distance less than to . Let for be the infimum over all for which there is such that one of the four points lies in , where is a unit normal vector of in within the linear subspace which is parallel to and where is the unit normal vector of . Now is lower semicontinuous, since if , and with for all converge to limits , and , then . Hence attains a minimum on . Let be such that for any . Let be large enough such that the distance of to is larger than . Then a cell intersecting both and must contain a point of distance at most from .

Now can be represented as union of the graphs of convex functions, which are Lipschitz continuous with Lipschitz constant . Hence there are points with , where is the diameter of . Now a cell intersecting both and must contain a point of distance at most from the nearest point . Hence there are less than

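$$\sum_{i=1}^{s} N^{(i)} \cdot \bigl(\lceil 2(n-1)\sqrt{d}/\sin(\alpha/2) + n + 2 \rceil\bigr)^{d} = \sum_{i=1}^{s} (2d-2) \cdot \bigl(\lceil \sqrt{d-2}/2 \cdot r \cdot \Lambda^{(i)} \rceil\bigr)^{d-2} \cdot \bigl(\lceil 2(n-1)\sqrt{d}/\sin(\alpha/2) + n + 2 \rceil\bigr)^{d}$$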

cells intersecting both and .

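Case 4: $M_\kappa$ and $M_\lambda$ belong to different smooth bodies $L_{k_1}$ and $L_{k_2}$:
Fix . Then there are a neighborhood $U$ of , a vector $\nu_1$, a neighborhood $W$ of in and two -functions $g_1, g_2$ such that

$$\operatorname{bd} L_{k_i} \cap U = \{w + g_i(w)\nu_1 + z_0 \mid w \in W\}, \quad i = 1,2.$$

By the implicit function theorem there is a unit vector $\nu_2$, a neighborhood $W'$ of in and a -function $h$ with

$$\Gamma := \operatorname{bd} L_{k_1} \cap \operatorname{bd} L_{k_2} \cap U = \{w + h(w)\nu_2 + g_1(w + h(w)\nu_2)\nu_1 + z_0 \mid w \in W'\}$$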

possibly after replacing by a smaller set.

Replacing by a subset if necessary we may assume that , and are Lipschitz continuous with Lipschitz constant . Moreover, choose such that is contained in a cube of side-length . Then there are points with