 # Uncertainty of visual measurement and efficient allocation of sensory resources

We review the reasoning underlying two approaches to the combination of sensory uncertainties. The first approach is noncommittal, making no assumptions about properties of uncertainty or parameters of stimulation. We then explain the relationship between this approach and the one commonly used in modeling "higher-level" aspects of sensory systems, such as visual cue integration, where assumptions are made about properties of stimulation. The two approaches follow similar logic, except that in one case maximal uncertainty is minimized, and in the other minimal certainty is maximized. Finally, we demonstrate how optimal solutions are found to the problem of resource allocation under uncertainty.


## 1. Combination of uncertainties

### 1.1. Noncommittal approach

Let the stimulus be an integrable function of one variable that depends on two aspects of stimulation:

• Stimulus location $x$ on dimension $X$, where $X$ can be space or time, the “location” indicating, respectively, where or when stimulation occurred.

• Stimulus content $f$ on dimension $F$, where $F$ can be spatial or temporal frequency of stimulus modulation.

We consider a sensory system equipped with many measuring devices, each able to estimate both stimulus location and content from the stimulus. We assume that the error of estimation is a random variable with probability density $p(x,f)$.

It is sometimes assumed that sensory systems know $p(x,f)$: a case we review in the next section. But in general we do not know $p(x,f)$; we only know (or guess) some of its properties, such as its mean value and variance. In particular, let

$$p_x(x)=\int p(x,f)\,df,\qquad p_f(f)=\int p(x,f)\,dx \tag{S1}$$

be the marginal densities of $p(x,f)$ on dimensions $X$ and $F$. Sensory systems can optimize their performance with this minimal knowledge, as follows.

To reduce the chances of making gross errors, we use the following strategy. We find the condition of minimal uncertainty against the profile of maximal uncertainty, i.e., using a minimax approach [von Neumann, 1928; Luce & Raiffa, 1957]. We do so in two steps. First we find the densities $p_x$ and $p_f$ for which measurement uncertainty is maximal. Then we find the condition at which the function of maximal uncertainty has the smallest value: the minimax point.

We evaluate maximal uncertainty using the well-established definition of entropy [Shannon, 1948]:

$$H(X,F)=-\iint p(x,f)\log p(x,f)\,dx\,df.$$

Recall that Shannon’s entropy is sub-additive:

$$H(X,F)\le H(X)+H(F)=H^*(X,F), \tag{S2}$$

where

$$H(X)=-\int p_x(x)\log p_x(x)\,dx,\qquad H(F)=-\int p_f(f)\log p_f(f)\,df.$$

Therefore, we can say that the uncertainty of measurement cannot exceed

$$H^*(X,F)=-\int p_x(x)\log p_x(x)\,dx-\int p_f(f)\log p_f(f)\,df. \tag{S3}$$

Eq. S3 is the “envelope” of maximal measurement uncertainty: a “worst-case” estimate.
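The envelope of Eq. S3 is easy to check numerically. The following sketch (Python with NumPy; the grid limits and the correlation value are arbitrary illustrative choices, not from the cited work) discretizes a correlated bivariate Gaussian, computes its marginals as in Eq. S1, and verifies that the joint entropy does not exceed the sum of the marginal entropies:

```python
import numpy as np

# Discretized joint density p(x, f): a correlated bivariate Gaussian on a
# grid. The grid and the correlation rho are arbitrary choices made only
# to illustrate sub-additivity (Eq. S2) and the envelope (Eq. S3).
n = 200
x = np.linspace(-5.0, 5.0, n)
f = np.linspace(-5.0, 5.0, n)
dx, df = x[1] - x[0], f[1] - f[0]
X, F = np.meshgrid(x, f, indexing="ij")
rho = 0.6
p = np.exp(-(X**2 - 2.0 * rho * X * F + F**2) / (2.0 * (1.0 - rho**2)))
p /= p.sum() * dx * df  # normalize to a proper density

# Marginal densities, as in Eq. S1
px = p.sum(axis=1) * df
pf = p.sum(axis=0) * dx

def entropy(q, dv):
    # Differential entropy of density values q via a Riemann sum.
    q = q[q > 0]
    return -np.sum(q * np.log(q)) * dv

H_joint = entropy(p.ravel(), dx * df)       # H(X, F)
H_env = entropy(px, dx) + entropy(pf, df)   # H*(X, F), Eq. S3
print(H_joint <= H_env)  # prints True: the envelope bounds the joint
```

The gap between the two quantities is the mutual information between location and content errors; it vanishes when the errors are independent, which is when the bound of Eq. S2 is attained.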

By the Boltzmann theorem on maximum-entropy probability distributions [Cover & Thomas, 2006], the maximal entropy of probability densities with fixed means and variances is attained when the densities are Gaussian, in which case the maximal entropy is a sum of their variances [Cover & Thomas, 2006]. We obtain

$$p_x(x)=\frac{1}{\sigma_x\sqrt{2\pi}}\,e^{-x^2/2\sigma_x^2},\qquad p_f(f)=\frac{1}{\sigma_f\sqrt{2\pi}}\,e^{-f^2/2\sigma_f^2},$$

where $\sigma_x$ and $\sigma_f$ are the standard deviations. And the maximal entropy is simply

$$H=\sigma_x^2+\sigma_f^2. \tag{S4}$$

That is, when the shapes of the densities are unknown, the maximal uncertainty of measurement is a sum of the variances of the measurement components.

This is the method used by Gepshtein, Tyukin, and Kubovy (2007) and Gepshtein, Tyukin, and Albright (2010) in derivations of joint uncertainty and composite uncertainty functions.¹ The authors then found the optimal conditions by looking for minimal values of the uncertainty functions.

¹ For simplicity, Gepshtein, Tyukin, and Albright (2010) use intervals of measurement, rather than interval variances, as estimates of component uncertainties.

### 1.2. Top-down approach

Now we assume the system enjoys some knowledge of stimulation, so we can use likelihood as a measure of uncertainty. Suppose we want to derive a combined estimate $z^*$ from two estimates $z_x$ and $z_f$ of some parameter $z$ of stimulation. We assume that the likelihood functions $P(z|x,f)$, $P_x(z|x)$, and $P_f(z|f)$ are continuous, differentiable, and known. Let us first assume that the likelihoods are separable:

$$P(z|x,f)=P_x(z|x)\,P_f(z|f). \tag{S5}$$

Then, the most likely value of $z$ is

$$z^*=\arg\max_z P(z|x,f)=\arg\max_z\left[\log P_x(z|x)+\log P_f(z|f)\right].$$

We can use the logarithmic transformation because it is a strictly monotone continuous function on $\mathbb{R}_{>0}$, and hence it does not change the maxima of continuous functions.

It is commonly assumed that $P_x$ and $P_f$ are Gaussian functions, or that they are well approximated by Gaussian functions. For example, Yuille and Bülthoff (1996) assumed that cubic and higher-order terms of the Taylor expansion of the log-likelihood can be neglected, which is equivalent to the assumption of Gaussianity. (We return to this assumption, and also to the assumption of separability, in a moment.) Then

$$P_x(z|x)=c_x\,e^{-(z-z_x)^2/2\sigma_x^2},\qquad P_f(z|f)=c_f\,e^{-(z-z_f)^2/2\sigma_f^2},\qquad c_x,c_f\in\mathbb{R}_{>0},$$

and

$$\log P_x(z|x)+\log P_f(z|f)=\log c_x+\log c_f-\frac{(z-z_x)^2}{2\sigma_x^2}-\frac{(z-z_f)^2}{2\sigma_f^2}.$$

The latter expression is maximized when its first derivative with respect to $z$ is zero. Hence

$$z^*=\left(\frac{1}{\sigma_x^2}+\frac{1}{\sigma_f^2}\right)^{-1}\left(\frac{z_x}{\sigma_x^2}+\frac{z_f}{\sigma_f^2}\right)=\frac{\sigma_f^2 z_x+\sigma_x^2 z_f}{\sigma_x^2+\sigma_f^2}, \tag{S6}$$

which is the familiar weighted-average rule of cue combination [Cochran, 1937; Maloney & Landy, 1989; Clark & Yuille, 1990; Landy, Maloney, Johnston, & Young, 1995; Yuille & Bülthoff, 1996]. In general, when the number of measurements is greater than two, the combination rule of Eq. S6 becomes

$$z^*=\frac{\sum_i z_i\prod_{j\neq i}\sigma_j^2}{\sum_i\prod_{j\neq i}\sigma_j^2}, \tag{S7}$$

where $z_i$ are the values at which the individual likelihood functions attain their maxima.
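The weighted-average rule reduces to a one-line computation. The sketch below (Python with NumPy; the numeric values are arbitrary, chosen only to exercise the rule) combines estimates by inverse-variance weighting and confirms that, for two cues, the result matches the closed form of Eq. S6:

```python
import numpy as np

# Inverse-variance weighting, Eqs. S6-S7: each estimate z_i is weighted
# by the reciprocal of its variance sigma_i^2.
def combine(z, sigma2):
    z, sigma2 = np.asarray(z, float), np.asarray(sigma2, float)
    w = 1.0 / sigma2  # reliability of each estimate
    return float(np.sum(w * z) / np.sum(w))

# Two cues: the less variable estimate gets the larger weight.
zx, zf, s2x, s2f = 1.0, 3.0, 0.5, 2.0
z_star = combine([zx, zf], [s2x, s2f])
# Closed form of Eq. S6: (sigma_f^2 z_x + sigma_x^2 z_f) / (sigma_x^2 + sigma_f^2)
assert np.isclose(z_star, (s2f * zx + s2x * zf) / (s2x + s2f))  # 1.4
```

Note that the combined estimate always lies between the component estimates, closer to the more reliable one.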

Why is it commonly assumed that likelihood functions have the simple form of Eq. S5, i.e., are separable and Gaussian? An answer follows from the argument we presented in the previous section. Suppose that one seeks to estimate the likelihood function $P(z|x,f)$ when its shape is unknown. We saw in the previous section that the least certain estimate is the likelihood function for which the entropy is maximal. Hence, by sub-additivity of entropy (Eq. S2), the least certain estimate of $P(z|x,f)$ is

$$P(z|x,f)=P_x(z|x)\,P_f(z|f),$$

as in Eq. S5. Moreover, if the mean values and variances of $P_x$ and $P_f$ are fixed, then the likelihood functions must be Gaussian, by the same argument. Indeed, separable Gaussian likelihood functions are the least certain estimates.

## 2. Resource allocation

In Gepshtein, Tyukin, and Albright (2010) we asked how sensory systems ought to allocate their resources in the face of uncertainties inherent in measurement and stimulation. We approached this problem in two steps. First, we combined all uncertainties into uncertainty functions: comprehensive descriptions of how the quality of measurement varies across conditions of measurement. Second, we proposed how limited resources are to be allocated given the uncertainty functions. Here we illustrate the second step in more detail, using the approach of constrained optimization.

A key requirement of allocation is to optimize the reliability (reduce the uncertainty) of measurement by many sensors. Satisfying this requirement alone would make the system place all sensors where the conditions of measurement are least uncertain, leaving the system unprepared for stimuli that are useful but whose uncertainty of measurement is high. To prevent such gaps of allocation, we propose that the minimal requirements should be twofold:

A. Reliability: Prefer low uncertainty.

B. Comprehensiveness: Measure all useful stimuli.

We formalize these requirements as follows. Let:

• $\Delta$ be the size of a measuring device (“receptive field”),

• $U(\Delta)$ be the uncertainty function associated with measuring devices of different sizes, and

• $r(\Delta)$ be the amount of resources allocated across $\Delta$ (e.g., the number of cells with receptive fields of size $\Delta$).

Encouraging reliability.   By requirement A, the system is penalized for allocating resources where uncertainty is high. This is achieved, for example, when the cost of placing resources at $\Delta$ is

$$k_1 U(\Delta)\,r(\Delta),$$

where $k_1$ is a positive constant. The higher the uncertainty at $\Delta$, or the larger the amount of resources allocated to $\Delta$, the higher the cost. Hence the total cost of allocation is

$$J_1=\int_a^b k_1 U(\Delta)\,r(\Delta)\,d\Delta. \tag{S8}$$

Functional $J_1$ is minimal when all the detectors are allocated to (i.e., have the size of) $\Delta$ at the lowest value of $U$.

Encouraging comprehensiveness.   By requirement B, the system is penalized for failing to measure particular stimuli. This is achieved, for example, when the allocation cost is

$$\frac{k_2}{r(\Delta)},$$

where $k_2$ is a positive constant. The total penalty of this type is

$$J_2=\int_a^b\frac{k_2}{r(\Delta)}\,d\Delta. \tag{S9}$$

Functional $J_2$ is large (infinite) when all resources are allocated to a small vicinity (one point); $J_2$ is small when $r(\Delta)$ is large for all $\Delta$.

Prescription of allocation.   The total penalty of requirements A and B is

$$J=\int_a^b\left(k_1 U(\Delta)\,r(\Delta)+\frac{k_2}{r(\Delta)}\right)d\Delta. \tag{S10}$$

Using standard tools of the calculus of variations [e.g., Elsgolc, 1961], we find the function $r(\Delta)$ that minimizes $J$. In particular, we consider the variation of $J$ with respect to changes of $r$:

$$\delta J=\int_a^b\frac{\partial}{\partial r(\Delta)}\left(k_1 U(\Delta)\,r(\Delta)+\frac{k_2}{r(\Delta)}\right)\delta r(\Delta)\,d\Delta=\int_a^b\left(k_1 U(\Delta)-\frac{k_2}{r^2(\Delta)}\right)\delta r(\Delta)\,d\Delta.$$

Because at the optimal $r$ the value of $\delta J$ is zero for all $\delta r$, we deduce that the condition of optimality is

$$U(\Delta)-\frac{k}{r^2(\Delta)}=0,\qquad k=\frac{k_2}{k_1}. \tag{S11}$$

In other words,

$$r(\Delta)=\sqrt{\frac{k}{U(\Delta)}}. \tag{S12}$$

This is the prescription of optimal allocation.
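The prescription can be illustrated in a few lines. In the sketch below (Python with NumPy), the parabolic uncertainty function and the constant k are hypothetical choices made only for illustration; they are not from the cited work:

```python
import numpy as np

# Prescription of Eq. S12, r = sqrt(k / U). The uncertainty function U
# (a parabola with its minimum at Delta = 1) and the constant k are
# hypothetical, for illustration only.
k = 4.0
delta = np.linspace(0.5, 2.0, 7)   # receptive-field sizes
U = 1.0 + (delta - 1.0) ** 2       # minimal uncertainty at delta = 1
r = np.sqrt(k / U)                 # optimal allocation

# Maximal allocation falls where uncertainty is minimal.
assert delta[np.argmax(r)] == delta[np.argmin(U)]
```

Because the square root is monotone, any monotone increase of U at some Δ lowers r there, so the shape of the allocation simply mirrors (and inverts) the shape of the uncertainty function.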

Amount of resources.   If the total amount of resources in the system is known and equals $C$:

$$\int_a^b r(\Delta)\,d\Delta=C, \tag{S13}$$

then we may modify the coefficients $k_1$ and $k_2$ in Eq. S10 to make Eq. S10 consistent with Eq. S13. Or, we may use the method of Lagrange multipliers, looking for conditions where the variation of the following functional vanishes:

$$\bar{J}=\int_a^b\left(k_1 U(\Delta)\,r(\Delta)+\frac{k_2}{r(\Delta)}\right)d\Delta+\lambda\left(\int_a^b r(\Delta)\,d\Delta-C\right). \tag{S14}$$

We find the Lagrange multiplier $\lambda$ at which Eq. S13 is satisfied. The solution (using a method similar to that used for solving Eq. S11) is

$$\left(k_1 U(\Delta)+\lambda\right)-\frac{k_2}{r^2(\Delta)}=0\quad\Rightarrow\quad r(\Delta)=\sqrt{\frac{k_2}{k_1 U(\Delta)+\lambda}}, \tag{S15}$$

provided that

$$\int_a^b\sqrt{\frac{k_2}{k_1 U(\Delta)+\lambda}}\,d\Delta=C.$$

The latter constraint is used to find $\lambda$ in Eq. S15. In either case, the shape of the optimal allocation function is determined by $U(\Delta)$, such that the allocation function is maximal where $U(\Delta)$ is minimal. The formulation in Eq. S14 has an advantage: it allows one to derive optimal prescriptions under changes in the amount of resources allocated to the task, such as in selective attention.
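In practice, because the constraint integral decreases monotonically as the multiplier grows, the multiplier can be found numerically, e.g., by bisection. A sketch (Python with NumPy; the uncertainty function, the constants, and the budget below are hypothetical values chosen for illustration):

```python
import numpy as np

# Find the Lagrange multiplier lam of Eq. S15 by bisection, so that the
# allocation integrates to the budget C (Eq. S13). U, k1, k2, and C are
# hypothetical values chosen for illustration.
k1, k2, C = 1.0, 1.0, 1.5
delta = np.linspace(0.5, 2.0, 400)
U = 1.0 + (delta - 1.0) ** 2

def budget(lam):
    # Integral of r(Delta) = sqrt(k2 / (k1 U + lam)), trapezoid rule.
    r = np.sqrt(k2 / (k1 * U + lam))
    return float(np.sum(0.5 * (r[1:] + r[:-1]) * np.diff(delta)))

# budget(lam) decreases monotonically in lam; bracket the root and bisect.
lo, hi = -0.99, 100.0  # keep k1 * U + lam positive over the grid
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if budget(mid) < C else (mid, hi)
lam = 0.5 * (lo + hi)
r_opt = np.sqrt(k2 / (k1 * U + lam))  # optimal allocation under the budget
```

Shrinking or enlarging C (as a model of, say, withdrawing or adding attentional resources) changes only the multiplier, not the inverse relation between allocation and uncertainty.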

Generalizations.   In a multidimensional case, when $\Delta$ represents several variables (e.g., spatial and temporal extents of receptive fields, $s$ and $t$), and $U$ is a function of many variables, the prescription is

$$r(s,t)=\sqrt{\frac{k}{U(s,t)}}.$$

Using the method of Lagrange multipliers, one can show that a similar result is obtained when the costs of reliability and comprehensiveness (Eqs. S8, S9) have more general formulations:

$$J_1=\int_a^b k_1 U(\Delta)\,r^p(\Delta)\,d\Delta,\qquad J_2=\int_a^b\frac{k_2}{r^q(\Delta)}\,d\Delta,\qquad p,q\ge 1.$$

The previously derived prescription holds: allocate maximal amount of resources to conditions of minimal uncertainty.
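Indeed, repeating the variational step for the generalized costs (a routine computation we spell out here for completeness) gives the stationarity condition

$$\frac{\partial}{\partial r}\left(k_1 U\,r^p+\frac{k_2}{r^q}\right)=p\,k_1 U\,r^{p-1}-\frac{q\,k_2}{r^{q+1}}=0\quad\Rightarrow\quad r(\Delta)=\left(\frac{q\,k_2}{p\,k_1\,U(\Delta)}\right)^{1/(p+q)},$$

which is again a decreasing function of $U(\Delta)$, and which reduces to Eq. S12 when $p=q=1$.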


## References

• Clark, J. J., & Yuille, A. L. (1990). Data fusion for sensory information processing systems. Norwell, MA: Kluwer Academic Publishers.
• Cochran, W. G. (1937). Problems arising in the analysis of a series of similar experiments. Journal of the Royal Statistical Society (Supplement), 4, 102–118.
• Cover, T. M., & Thomas, J. A. (2006). Elements of information theory. New York: John Wiley.
• Elsgolc, L. D. (2007). Calculus of variations. Dover Publications. (Original work published 1961.)
• Gepshtein, S., Tyukin, I., & Albright, T. (2010). The uncertainty principle of measurement in vision. (Manuscript in preparation.)
• Gepshtein, S., Tyukin, I., & Kubovy, M. (2007). The economics of motion perception and invariants of visual sensitivity. Journal of Vision, 7(8), 1–18. (doi: 10.1167/7.8.8)
• Landy, M., Maloney, L., Johnston, E., & Young, M. (1995). Measurement and modeling of depth cue combination: in defense of weak fusion. Vision Research, 35, 389–412.
• Luce, R. D., & Raiffa, H. (1957). Games and decisions. New York: John Wiley.
• Maloney, L. T., & Landy, M. S. (1989). A statistical framework for robust fusion of depth information. In W. A. Pearlman (Ed.), Visual Communications and Image Processing IV, Proc. SPIE Vol. 1199 (pp. 1154–1163).
• Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423, 623–656.
• Taub, A. H. (Ed.). (1963). John von Neumann: Collected works. Volume VI: Theory of games, astrophysics, hydrodynamics and meteorology. New York: Pergamon Press.
• von Neumann, J. (1928). Zur Theorie der Gesellschaftsspiele [On the theory of games of strategy]. Mathematische Annalen, 100, 295–320. (English translation in Taub, 1963.)
• Yuille, A. L., & Bülthoff, H. H. (1996). Bayesian decision theory and psychophysics. In D. C. Knill & W. Richards (Eds.), Perception as Bayesian inference (pp. 123–161). Cambridge, UK: Cambridge University Press.