A forgotten Theorem of Schoenberg on one-sided integral averages

by Stefan Steinerberger, et al.
Yale University

Let $f:\mathbb{R}\to\mathbb{R}$ be a function for which we want to take local averages. Assuming we cannot look into the future, the 'average' at time $t$ can only use $f(s)$ for $s \le t$. A natural way to do so is via a weight $\phi$ and $g(t) = \int_0^\infty f(t-s)\,\phi(s)\,ds$. We would like (1) constant functions, $f(t) \equiv \mathrm{const}$, to be mapped to themselves and (2) $\phi$ to be monotonically decreasing (the more recent past should weigh more heavily than the distant past). Moreover, we want (3) that if $f(t)$ crosses a certain threshold $n$ times, then $g(t)$ does not cross the same threshold more than $n$ times (if $f(t)$ is the outside wind speed and crosses the tornado threshold at two points in time, we would like the averaged wind speed to cross the tornado threshold at most twice). A theorem implicit in the work of Schoenberg is that these three conditions characterize a unique weight, given by the exponential distribution $\phi(s) = \lambda e^{-\lambda s}$ for some $\lambda > 0$.




1. Introduction and Result

The purpose of this paper is to discuss how one would go about averaging continuous functions. Let $f:\mathbb{R}\to\mathbb{R}$ be a continuous function and suppose that we are interested in, at a given time $t$, finding a local average of $f$ using only function values $f(s)$ for $s \le t$. This is the canonical setting for many applications where we cannot look into the future (one only needs to think of sports or finance where this is a constant problem). A natural way of constructing an average is via

$$g(t) = \int_0^\infty f(t-s)\,\phi(s)\,ds,$$

where $\phi$ is a (not necessarily continuous) weighting function. Many different weighting functions are conceivable; the one that is presumably used most often in practice is

$$\phi(s) = \frac{1}{T}\,\mathbf{1}_{[0,T]}(s),$$

the average taken over the last $T$ units of time. A natural question is whether there is a ’best’ weight and, as usual, this depends on how one defines things. We will proceed in an axiomatic fashion and state a list of desirable properties.
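To make the construction concrete, here is a minimal discrete sketch of the averaging operator in Python. The function names, the grid spacing `dt`, and the convention of extending the record into the past by its first sample are assumptions made for illustration; none of them is taken from the paper.

```python
import math
from typing import List, Sequence


def one_sided_average(f: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Discrete analogue of g(t) = int_0^inf f(t - s) phi(s) ds.

    f[k] is the sample at time k * dt and weights[j] stands in for
    phi(j * dt) * dt.  The weights are renormalised to sum to 1 so that
    constant inputs are returned unchanged.  Samples before the start of
    the record are taken equal to f[0]; that padding is an assumption of
    this sketch.
    """
    total = sum(weights)
    w = [x / total for x in weights]
    g = []
    for k in range(len(f)):
        # Only the present and the past enter: f[k], f[k-1], f[k-2], ...
        g.append(sum(wj * f[max(k - j, 0)] for j, wj in enumerate(w)))
    return g


def window_weights(n: int) -> List[float]:
    """Uniform weight over the last n samples (the moving average)."""
    return [1.0] * n


def exponential_weights(lam: float, dt: float, n: int) -> List[float]:
    """Samples of lam * exp(-lam * s) * dt at s = 0, dt, ..., (n - 1) * dt."""
    return [lam * math.exp(-lam * j * dt) * dt for j in range(n)]


step = [0.0, 0.0, 0.0, 3.0, 3.0, 3.0]
print(one_sided_average(step, window_weights(3)))
print(one_sided_average(step, exponential_weights(lam=1.0, dt=1.0, n=50)))
```

Both weights react to the jump in `step` only after it has happened. The window forgets the old level after three samples, while the exponential weight approaches the new level gradually.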

Property 1. Invariance of Constants. Averaging should leave constant functions, $f(t) \equiv \mathrm{const}$, invariant.

Property 2. Monotonicity. $\phi$ is (not necessarily strictly) monotonically decreasing.

Property 3. Variation-diminishing property. For any $c \in \mathbb{R}$, if the set $\{t : f(t) > c\}$ is a union of $n$ (not necessarily bounded) intervals, then the set $\{t : g(t) > c\}$ is the union of at most $n$ intervals. If $\{t : f(t) < c\}$ is the union of at most $n$ (not necessarily bounded) intervals, then so is $\{t : g(t) < c\}$.

The first condition is completely unambiguous: an average of constant values needs to return the same value in order to be meaningful. This translates easily into

$$\int_0^\infty \phi(s)\,ds = 1.$$

We observe that this condition also implies that any function $\phi$ satisfying Properties (1) and (2) is nonnegative: if it assumes negative values anywhere, then monotonicity would imply that it is not integrable, which violates Property (1). In particular, $\phi(s)\,ds$ is a probability distribution on $[0,\infty)$. As a consequence of that, we have that

$$\inf_{s \le t} f(s) \le g(t) \le \sup_{s \le t} f(s),$$

which is also exceedingly natural: the average value at any point cannot exceed the previously attained maximal value or be smaller than all previous values. The second condition, monotonicity, is natural insofar as we would like the recent past to be more representative than the distant past. Property (3) is a smoothing property: the averaged function should not venture into ’extreme’ territory more often than the function itself does. Requiring property (3) to be satisfied for all $c$ therefore corresponds to a uniform smoothing at all scales: extreme events can be represented in the average but they should not be over-represented. A simple example is as follows: suppose $f(t)$ is the Elo rating of a chess player measured at time $t$. This indicator is discontinuous and changes after each game; however, if a chess player has their Elo rating exceed 2800 for the entirety of the year 2016 and then once more, briefly, in 2018, then it would be desirable for the averaged function to exceed the value 2800 at most two times and not, say, three times. It would be perfectly reasonable, however, if the averaged function exceeds the value 2800 only once or never at all (for example if the value in 2016 hovers very close to 2800 all the time and was much lower before or, conversely, if in 2018 the value 2800 is only exceeded for a brief period of time).
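Property 3 can be probed numerically on sampled data. The sketch below is an illustration under stated assumptions, not the paper's construction: it counts maximal runs of samples above a threshold, uses a three-sample moving average as the window weight, and uses the usual recursive form of exponential smoothing with a hypothetical parameter `alpha`. Both averages treat the record as equal to its first sample before it starts.

```python
from typing import List, Sequence


def count_superlevel_intervals(values: Sequence[float], c: float) -> int:
    """Number of maximal runs of consecutive samples with value > c."""
    runs, inside = 0, False
    for v in values:
        if v > c and not inside:
            runs += 1
        inside = v > c
    return runs


def moving_average(f: Sequence[float], n: int) -> List[float]:
    """Average of the last n samples; the past is padded with f[0]."""
    return [sum(f[max(k - j, 0)] for j in range(n)) / n for k in range(len(f))]


def exponential_smoothing(f: Sequence[float], alpha: float) -> List[float]:
    """Recursive one-sided average g[k] = (1 - alpha) * g[k-1] + alpha * f[k]."""
    g, prev = [], f[0]
    for v in f:
        prev = (1.0 - alpha) * prev + alpha * v
        g.append(prev)
    return g


f = [0.0, 0.0, 1.0, -1.5, 1.0, 0.0, 0.0]
print(count_superlevel_intervals(f, 0.0))                              # 2
print(count_superlevel_intervals(moving_average(f, 3), 0.0))           # 3
print(count_superlevel_intervals(exponential_smoothing(f, 0.5), 0.0))  # 2
```

On this particular signal the moving average exceeds the threshold on three separate stretches although the signal itself does so only twice, whereas the exponentially smoothed signal does so twice. This is a single discrete example and not a proof, but it is consistent with Property 3 singling out the exponential weight.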

These three conditions uniquely characterize a weight (up to dilation symmetries).

Theorem (Schönberg).

If a function $\phi$ satisfies properties (1), (2) and (3), then

$$\phi(s) = \lambda e^{-\lambda s} \qquad \text{for some } \lambda > 0.$$

To the best of our knowledge, this Theorem has never been stated or proved. I. J. Schönberg mentions in passing in his 1948 paper [9] that, as a consequence of his classification theorem, ’All Polya frequency functions turn out to be continuous everywhere with the single exception of the truncated exponential’ and this is exactly what is needed to prove the Theorem which should be attributed to Schönberg. The use of exponential distributions to compute one-sided averages is completely classical in time series analysis (’exponential smoothing’) and usually ascribed to work of Brown [1] or Holt [4] in the 1950s but the fact that properties (1) – (3) uniquely characterize exponential smoothing does not seem to be known.
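For the exponential weight $\phi(s) = \lambda e^{-\lambda s}$ the average can be updated recursively, which is the form in which exponential smoothing is usually implemented. The following short derivation is a sketch: the step size $h$ and the smoothing parameter $\alpha$ are notation introduced here for illustration, and the last line assumes that $f$ is approximately constant on $[t, t+h]$.

```latex
\begin{align*}
g(t+h) &= \int_0^\infty f(t+h-s)\,\lambda e^{-\lambda s}\,ds \\
       &= \int_0^h f(t+h-s)\,\lambda e^{-\lambda s}\,ds
          + e^{-\lambda h}\int_0^\infty f(t-r)\,\lambda e^{-\lambda r}\,dr
          && (s = h + r) \\
       &= \int_0^h f(t+h-s)\,\lambda e^{-\lambda s}\,ds + e^{-\lambda h}\,g(t) \\
       &\approx \alpha\, f(t+h) + (1-\alpha)\, g(t),
          \qquad \alpha = 1 - e^{-\lambda h}.
\end{align*}
```

The third line is exact. Dividing $g(t+h) - g(t)$ by $h$ and letting $h \to 0$ gives, for continuous $f$, the differential form $g'(t) = \lambda\,(f(t) - g(t))$.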

We emphasize that, as one often encounters in axiomatic approaches, the result is only as good as one’s faith in the axioms. This is the second purpose of this paper: to perhaps motivate a study of axiomatic approaches towards integral averages. What properties should an integral averaging operator have and which types of averages possess these properties? We believe all three properties to be fairly natural (with (3) being a particularly subtle way of defining smoothing). As is customary in axiomatic approaches, there are presumably other axioms that might also be of interest and will generally lead to different results.

2. The Proof


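As discussed above, properties (1) and (2) imply that $\phi$ is a probability density function. This implies, by linearity, that the averaging commutes with adding constants. This allows us to replace the study of

$$\{t : f(t) > c\} \qquad \text{and} \qquad \{t : g(t) > c\}$$

with the study of when $f$ and $g$ become positive (by replacing $f$ with $f - c$, which leads to $g$ being replaced by $g - c$). Property (3) is then equivalent to asking that the number of sign changes of $g$ is at most the number of sign changes of $f$ or, equivalently, it asks that convolution with $\phi$ has the variation-diminishing property in the sense of Schönberg. Phrased differently, we learn that $\phi$ is a Polya frequency function [2, 3, 5, 6, 7]. Schönberg’s theory [8, 9, 10, 11] implies

$$\int_0^\infty e^{-sx}\,\phi(x)\,dx = \frac{1}{\Psi(s)}$$

for $s$ in some vertical strip containing the imaginary axis, where the function $\Psi$ is an entire function of the form

$$\Psi(s) = C\, e^{-\gamma s^2 + \delta s} \prod_{\nu=1}^{\infty} (1 + \delta_\nu s)\, e^{-\delta_\nu s},$$

where $C > 0$, $\gamma \ge 0$, $\delta$ and $\delta_\nu$ are real numbers and $0 < \gamma + \sum_{\nu} \delta_\nu^2 < \infty$. It remains to find all functions of that type satisfying all our properties. We analyze the behavior for purely imaginary $s = it$ where $t \in \mathbb{R}$. Since $\delta$ and $\delta_\nu$ are real,

$$|\Psi(it)| = C\, e^{\gamma t^2} \prod_{\nu=1}^{\infty} \left(1 + \delta_\nu^2 t^2\right)^{1/2}.$$

The product is well defined since, using $\log(1+x) \le x$ for $x \ge 0$,

$$\prod_{\nu=1}^{\infty} \left(1 + \delta_\nu^2 t^2\right) = \exp\left(\sum_{\nu=1}^{\infty} \log\left(1 + \delta_\nu^2 t^2\right)\right) \le \exp\left(t^2 \sum_{\nu=1}^{\infty} \delta_\nu^2\right) < \infty.$$

Moreover, we have $(1 + \delta_\nu^2 t^2)^{1/2} \ge 1$ for all $\nu$. We distinguish two cases: either all but one of the $\delta_\nu$ are zero or at least two are nonzero. In the second case, after relabeling so that $\delta_1, \delta_2 \ne 0$, we observe

$$\frac{1}{|\Psi(it)|} \le \frac{1}{C\,(1 + \delta_1^2 t^2)^{1/2} (1 + \delta_2^2 t^2)^{1/2}} \le \frac{c}{1 + t^2}$$

for some fixed $c > 0$. Applying the inverse Fourier transform shows that

$$\phi(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{itx}}{\Psi(it)}\,dt.$$

We note that $\phi(0) > 0$ by assumption (2) as well as $\phi(x) = 0$ for all $x < 0$ by assumption. Let $N$ be so large that

$$\frac{1}{\pi} \int_{|t| > N} \frac{dt}{|\Psi(it)|} \le \frac{\phi(0)}{4}.$$

We then observe that, for every $\varepsilon > 0$,

$$\phi(0) = |\phi(0) - \phi(-\varepsilon)| \le \frac{1}{\pi} \int_{|t| > N} \frac{dt}{|\Psi(it)|} + \frac{1}{2\pi} \int_{|t| \le N} \frac{|1 - e^{-it\varepsilon}|}{|\Psi(it)|}\,dt.$$

The first term can be made sufficiently small by further increasing $N$ and, for fixed $N$, the second term tends to $0$ as $\varepsilon \to 0$ because $|1 - e^{-it\varepsilon}| \le N\varepsilon$ for $|t| \le N$ and $|\Psi(it)| \ge C$. This contradicts $\phi(0) > 0$ and shows that all but exactly one $\delta_\nu$ have to be $0$. The same argument can be run if $\gamma > 0$, since then $1/|\Psi(it)| \le e^{-\gamma t^2}/C$ is again integrable; hence $\gamma = 0$. Thus

$$\Psi(s) = C\, e^{\delta s} (1 + \delta_1 s)\, e^{-\delta_1 s}.$$

This shows that

$$\int_0^\infty e^{-sx}\,\phi(x)\,dx = \frac{e^{(\delta_1 - \delta)s}}{C\,(1 + \delta_1 s)}.$$

Here $\delta_1 > 0$: for $\delta_1 < 0$ the computation below produces a function that is positive on a left half-line, which is incompatible with $\phi(x) = 0$ for $x < 0$. We define a function

$$h(x) = \begin{cases} \dfrac{1}{C\,\delta_1}\, e^{-x/\delta_1} & \text{if } x \ge 0, \\ 0 & \text{if } x < 0. \end{cases}$$

We note that

$$\int_0^\infty e^{-sx}\, h(x)\,dx = \frac{1}{C\,(1 + \delta_1 s)}.$$

An application of the inverse Fourier transform shows that

$$\phi(x) = h(x + \delta_1 - \delta).$$

Since $\phi$ vanishes on the negative half-line and $\phi(0) > 0$, the shift has to vanish, that is, $\delta = \delta_1$. The normalization

$$\int_0^\infty \phi(x)\,dx = 1$$

then implies $C = 1$ and therefore

$$\phi(x) = \lambda e^{-\lambda x} \qquad \text{with } \lambda = \frac{1}{\delta_1} > 0.$$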
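As a consistency check on the conclusion, one can compare Laplace transforms directly. This is a sketch that assumes Schönberg’s representation in the form $1/\Psi(s)$ with $\Psi(s) = C e^{-\gamma s^2 + \delta s} \prod_\nu (1 + \delta_\nu s) e^{-\delta_\nu s}$, and it writes $T$ for the length of the moving window (notation introduced here). The transform of the exponential weight is the reciprocal of a linear polynomial, whereas the transform of the uniform window vanishes at points on the imaginary axis and therefore cannot be the reciprocal of an entire function.

```latex
\begin{align*}
\int_0^\infty e^{-sx}\,\lambda e^{-\lambda x}\,dx
  &= \frac{\lambda}{\lambda + s} = \frac{1}{1 + s/\lambda}
  && (\operatorname{Re} s > -\lambda), \\
\int_0^\infty e^{-sx}\,\frac{1}{T}\,\mathbf{1}_{[0,T]}(x)\,dx
  &= \frac{1 - e^{-sT}}{sT}
  && \text{which vanishes at } s = \frac{2\pi i k}{T},\ k \in \mathbb{Z}\setminus\{0\}.
\end{align*}
```

The first line corresponds to the parameters $C = 1$, $\gamma = 0$, $\delta = \delta_1 = 1/\lambda$ with all other $\delta_\nu$ equal to zero. The second line indicates why the moving-window weight, although it satisfies Properties (1) and (2), is not a Polya frequency function.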


  • [1] R. Brown, Exponential Smoothing for Predicting Demand. Cambridge, Massachusetts: Arthur D. Little Inc. (1956)
  • [2] C. de Boor, A Practical Guide to Splines, Springer, 1978.
  • [3] J. M. Lane and R.F. Riesenfeld, A geometric proof for the variation diminishing property of B-spline approximation, Journal of Approximation Theory 37, p. 1-4 (1983).
  • [4] C. Holt, Forecasting Trends and Seasonal by Exponentially Weighted Averages. Office of Naval Research Memorandum. 52 (1957), reprinted in International Journal of Forecasting. 20 (1): 5–10 (2004).
  • [5] M. Marsden and I. Schönberg, On Variation Diminishing Spline Approximation Methods, in: I. J. Schoenberg Selected Papers, p. 247–268, Springer, 1988.
  • [6] G. Polya, Qualitatives über Wärmeausgleich, Z. angew. Math. u. Mech. 13, 125–128 (1933).
  • [7] G. Polya, Sur un theoreme de Laguerre, Compt. Rend. 156, 996–999 (1913).
  • [8] I. Schönberg, On Totally Positive Functions, Laplace Integrals and Entire Functions of the Laguerre-Polya-Schur Type, Proc. Natl. Acad. Sci. U.S.A 33, p. 11-17, 1947.
  • [9] I. Schönberg, On Variation-Diminishing Integral Operators of the Convolution Type, Proc. Natl. Acad. Sci. U.S.A 34, p. 164–169, 1948.
  • [10] I. Schönberg, On Polya frequency functions, Journal d’Analyse Mathematique 1, p. 331–374, 1951.
  • [11] I. Schönberg, On variation diminishing approximation methods, Proceedings of MRC Symposium, On numerical approximation, Madison, Wisconsin, 249–274 (1958).