1. Introduction and Result
The purpose of this paper is to discuss how one would go about averaging continuous functions. Let $f:\mathbb{R} \rightarrow \mathbb{R}$ be a continuous function and suppose that we are interested in finding, at a given time $t$, a local average of $f$ using only the function values $f(s)$ for $s \leq t$. This is the canonical setting for many applications where we cannot look into the future (one only needs to think of sports or finance, where this is a constant problem). A natural way of constructing an average is via
$$(Af)(t) = \int_0^{\infty} f(t-s)\, w(s)\, ds,$$
where $w:[0,\infty) \rightarrow \mathbb{R}$ is a (not necessarily continuous) weighting function. Many different weighting functions are conceivable; the one that is presumably used most often in practice is
$$w(s) = \frac{1}{a}\,\mathbf{1}_{[0,a]}(s), \qquad a > 0,$$
which corresponds to the average taken over the last $a$ units of time. A natural question is whether there is a 'best' weight and, as usual, this depends on how one defines things. We will proceed in an axiomatic fashion and state a list of desirable properties.
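As a minimal numerical sketch of the operator $(Af)(t)$, one can truncate the integral at a finite horizon and apply the midpoint rule. This assumes that the weight is zero (or negligible) beyond that horizon; the helper names `one_sided_average`, `box_weight` and `exponential_weight` are hypothetical and chosen for illustration only.

```python
import math
from typing import Callable

Weight = Callable[[float], float]


def one_sided_average(f: Callable[[float], float], w: Weight, t: float,
                      horizon: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of (Af)(t) = int_0^inf f(t - s) w(s) ds.

    Assumption: w is zero (or negligible) for s > horizon, so the integral is truncated there.
    Only past values f(t - s) with s >= 0 are ever evaluated.
    """
    h = horizon / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * h  # midpoint of the k-th subinterval of [0, horizon]
        total += f(t - s) * w(s)
    return total * h


def box_weight(a: float) -> Weight:
    # w(s) = 1/a on [0, a] and 0 afterwards: the average over the last a units of time.
    return lambda s: 1.0 / a if 0.0 <= s <= a else 0.0


def exponential_weight(lam: float) -> Weight:
    # w(s) = lam * exp(-lam * s), the weight singled out by the Theorem.
    return lambda s: lam * math.exp(-lam * s)


if __name__ == "__main__":
    ramp = lambda x: x  # f(t) = t
    print(one_sided_average(ramp, box_weight(2.0), t=5.0, horizon=2.0))           # approx 4.0 = t - a/2
    print(one_sided_average(ramp, exponential_weight(1.0), t=5.0, horizon=40.0))  # approx 4.0 = t - 1/lam
```

For the ramp $f(t) = t$ both weights lag behind the signal: the box weight gives $(Af)(t) = t - a/2$ and the exponential weight gives $(Af)(t) = t - 1/\lambda$, so with $a = 2$ and $\lambda = 1$ both calls return approximately 4.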
Property 1. Invariance of Constants. Averaging should leave constant functions, $f \equiv c$, invariant.
Property 2. Monotonicity. $w$ is (not necessarily strictly) monotonically decreasing.
Property 3. Variation-diminishing property. For any $c \in \mathbb{R}$, if the set $\{x \in \mathbb{R}: f(x) > c\}$ is the union of at most $n$ (not necessarily bounded) intervals, then the set $\{x \in \mathbb{R}: (Af)(x) > c\}$ is the union of at most $n$ intervals. If $\{x \in \mathbb{R}: f(x) < c\}$ is the union of at most $n$ (not necessarily bounded) intervals, then so is $\{x \in \mathbb{R}: (Af)(x) < c\}$.
The first condition is completely unambiguous: an average of constant values needs to return the same value in order to be meaningful. This translates easily into
$$\int_0^{\infty} w(s)\, ds = 1.$$
We observe that this condition also implies that any function $w$ satisfying Properties (1) and (2) is nonnegative: if $w(s_0) < 0$ for some $s_0 \geq 0$, then monotonicity gives $w(s) \leq w(s_0) < 0$ for all $s \geq s_0$, so $w$ would not be integrable, which violates Property (1). In particular,
$w(s)\,ds$ is a probability distribution on $[0,\infty)$. As a consequence, we have
$$\inf_{s \leq t} f(s) \leq (Af)(t) \leq \sup_{s \leq t} f(s),$$
which is also exceedingly natural: the average value at any point can neither exceed the previously attained maximal value nor be smaller than all previous values.
The second condition, monotonicity, is natural insofar as we would like the recent past to be more representative than the distant past.
Property (3) is a smoothing property: the averaged function should not venture into 'extreme' territory more often than the function itself does. Requiring Property (3) to be satisfied for all $c \in \mathbb{R}$ therefore corresponds to a uniform smoothing at all scales: extreme events can be represented in the average, but they should not be over-represented.
A simple example is as follows: suppose $f(t)$ is the Elo rating of a chess player measured at time $t$. This indicator is discontinuous and changes after each game. However, if a chess player's Elo rating exceeds 2800 for the entirety of the year 2016 and then once more, briefly, in 2018, then it would be desirable for the averaged function to exceed the value 2800 at most two times and not, say, three times. It would be perfectly reasonable, however, if the averaged function exceeds the value 2800 only once or never at all (for example, if the value in 2016 hovers very close to 2800 all the time and was much lower before or, conversely, if in 2018 the value is exceeded only for a brief period of time).
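The box weight satisfies Properties (1) and (2), so by the Theorem below it must violate Property (3) for some signal. The sketch here exhibits one such signal on a grid; it is an illustration under stated assumptions, not a proof. The signal is hypothetical: three short bumps with values $-9$, $10$, $-9$ on $[0, 0.05)$, $[0.5, 0.55)$ and $[0.9, 0.95)$, sampled with step $0.001$ (integer values keep the box sums exact). The set where it is positive is a single interval, and we count the intervals on which each average exceeds $c = 0$. The recursion $y_k = \alpha y_{k-1} + (1-\alpha) f_k$ with $\alpha = e^{-\lambda \Delta x}$ is used as a discrete stand-in for exponential smoothing, not as the continuous operator itself.

```python
import math
from typing import List, Sequence

DX = 0.001                   # grid spacing (hypothetical choice)
K_MIN, K_MAX = -500, 2500    # sample k represents the time x = k * DX


def three_bump_signal() -> List[int]:
    """Hypothetical signal: bumps -9, 10, -9 on [0, 0.05), [0.5, 0.55), [0.9, 0.95), zero elsewhere.

    Integer values keep the box-window sums exact, so no rounding noise creates spurious crossings.
    """
    f = []
    for k in range(K_MIN, K_MAX):
        if 0 <= k < 50 or 900 <= k < 950:
            f.append(-9)
        elif 500 <= k < 550:
            f.append(10)
        else:
            f.append(0)
    return f


def count_intervals(values: Sequence[float], c: float) -> int:
    """Number of maximal runs of consecutive samples with value > c."""
    count, inside = 0, False
    for v in values:
        above = v > c
        if above and not inside:
            count += 1
        inside = above
    return count


def box_average(f: Sequence[int], window: int) -> List[float]:
    """Mean of the last `window` samples; samples before the grid are treated as 0."""
    out, running = [], 0
    for i, v in enumerate(f):
        running += v
        if i >= window:
            running -= f[i - window]
        out.append(running / window)
    return out


def exponential_average(f: Sequence[int], lam: float, dx: float) -> List[float]:
    """Discrete exponential smoothing y_k = alpha * y_{k-1} + (1 - alpha) * f_k, alpha = exp(-lam * dx)."""
    alpha = math.exp(-lam * dx)
    out, y = [], 0.0
    for v in f:
        y = alpha * y + (1.0 - alpha) * v
        out.append(y)
    return out


if __name__ == "__main__":
    f = three_bump_signal()
    print(count_intervals(f, 0.0))                                # 1: {f > 0} is one interval
    print(count_intervals(box_average(f, 1000), 0.0))             # 2: box window of length 1
    print(count_intervals(exponential_average(f, 5.0, DX), 0.0))  # 1: exponential weight, lambda = 5
```

The box average is positive while the window holds the first two bumps but not the third, and again once the first bump has left but the second has not. This produces two intervals above 0 from one interval of the signal. The exponentially smoothed signal, by contrast, changes sign only while a bump is being absorbed.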
These three conditions uniquely characterize a weight (up to dilation symmetries).
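Theorem. If a function $w:[0,\infty) \rightarrow \mathbb{R}$ satisfies Properties (1), (2) and (3), then
$$w(x) = \lambda e^{-\lambda x} \qquad \text{for some } \lambda > 0.$$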
To the best of our knowledge, this Theorem has never been stated or proved. I. J. Schönberg mentions in passing in his 1948 paper [9] that, as a consequence of his classification theorem, 'All Polya frequency functions turn out to be continuous everywhere with the single exception of the truncated exponential', and this is exactly what is needed to prove the Theorem, which should therefore be attributed to Schönberg. The use of exponential distributions to compute one-sided averages is completely classical in time series analysis ('exponential smoothing') and is usually ascribed to the work of Brown [1] or Holt [4] in the 1950s, but the fact that Properties (1)–(3) uniquely characterize exponential smoothing does not seem to be known.
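Part of the practical appeal of the exponential weight $w(s) = \lambda e^{-\lambda s}$ is that the average can be updated online without storing the past. A short computation, using nothing but the definition of $(Af)(t)$ and the substitution $u = t - s$, gives for every $h > 0$
$$\begin{aligned}
(Af)(t+h) &= \int_{-\infty}^{t+h} \lambda e^{-\lambda(t+h-u)} f(u)\,du \\
&= e^{-\lambda h} \int_{-\infty}^{t} \lambda e^{-\lambda(t-u)} f(u)\,du + \int_{t}^{t+h} \lambda e^{-\lambda(t+h-u)} f(u)\,du \\
&= e^{-\lambda h}\,(Af)(t) + \int_{0}^{h} \lambda e^{-\lambda s} f(t+h-s)\,ds.
\end{aligned}$$
If $f$ is equal to a constant $y$ on $(t, t+h]$, the last integral equals $(1 - e^{-\lambda h})\,y$. One then recovers the familiar recursion $(Af)(t+h) = \alpha\, y + (1-\alpha)\,(Af)(t)$ of simple exponential smoothing, with smoothing constant $\alpha = 1 - e^{-\lambda h}$.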
We emphasize that, as one often encounters in axiomatic approaches, the result is only as good as one's faith in the axioms. This is the second purpose of this paper: perhaps to motivate a study of axiomatic approaches towards integral averages. What properties should an integral averaging operator have, and which types of averages possess these properties? We believe all three properties to be fairly natural (with (3) being a particularly subtle way of defining smoothing). As is customary in axiomatic approaches, there are presumably other axioms that might also be of interest and that will generally lead to different results.
2. The Proof
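As discussed above, Properties (1) and (2) imply that $w$ is a probability density function on $[0,\infty)$; below we extend it by $w(x) = 0$ for $x < 0$. This implies, by linearity, that the averaging operator commutes with adding constants: $A(f + c) = Af + c$ for every $c \in \mathbb{R}$. This allows us to replace the study of
$$\{x \in \mathbb{R}: (Af)(x) > c\}$$
with the study of where $f$ and $Af$ become positive (by replacing $f$ with $f - c$, which leads to $Af$ being replaced by $Af - c$). Property (3) is then equivalent to asking that the number of sign changes of $Af$ is at most the number of sign changes of $f$ or, equivalently, that convolution with $w$ has the variation-diminishing property in the sense of Schönberg. Phrased differently, we learn that $w$ is a Pólya frequency function [2, 3, 5, 6, 7]. Schönberg's theory [8, 9, 10, 11] implies
$$\frac{1}{\Psi(s)} = \int_{-\infty}^{\infty} e^{-sx}\, w(x)\, dx$$
for all $s$ in some vertical strip containing the imaginary axis, where the function $\Psi$ is an entire function of the form
$$\Psi(s) = C e^{-\gamma s^2 + \delta s} \prod_{\nu=1}^{\infty} \left(1 + \delta_\nu s\right) e^{-\delta_\nu s},$$
where $C > 0$, $\gamma \geq 0$, and $\delta$, $\delta_\nu$ are real numbers with $0 < \gamma + \sum_{\nu} \delta_\nu^2 < \infty$. It remains to find all functions of that type satisfying all our properties. We analyze the behavior for purely imaginary $s = it$, where $t \in \mathbb{R}$. Since $\gamma$, $\delta$ and $\delta_\nu$ are real,
$$|\Psi(it)| = C e^{\gamma t^2} \prod_{\nu=1}^{\infty} \left(1 + \delta_\nu^2 t^2\right)^{1/2}.$$
The product is well defined since, using $\log(1+y) \leq y$ for $y \geq 0$,
$$\prod_{\nu=1}^{\infty} \left(1 + \delta_\nu^2 t^2\right)^{1/2} = \exp\Big(\frac{1}{2}\sum_{\nu=1}^{\infty} \log\left(1 + \delta_\nu^2 t^2\right)\Big) \leq \exp\Big(\frac{t^2}{2}\sum_{\nu=1}^{\infty} \delta_\nu^2\Big) < \infty.$$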
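Moreover, since every factor is at least 1, we have $|\Psi(it)| \geq C$ for all $t \in \mathbb{R}$. We distinguish two cases: either at most one of the $\delta_\nu$ is nonzero, or at least two are nonzero. In the second case, say $\delta_1 \neq 0$ and $\delta_2 \neq 0$, we observe
$$|\Psi(it)| \geq C \left(1 + \delta_1^2 t^2\right)^{1/2} \left(1 + \delta_2^2 t^2\right)^{1/2} \geq c\, t^2$$
for some fixed $c > 0$ (for instance $c = C|\delta_1 \delta_2|$). Together with $|\Psi(it)| \geq C$, this shows that $t \mapsto 1/\Psi(it)$, which is the Fourier transform of $w$, is integrable. Applying the inverse Fourier transform shows that
$$w(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{ixt}}{\Psi(it)}\, dt \qquad \text{for almost every } x \in \mathbb{R}.$$
We note that, by Property (2) and $\int_0^\infty w(s)\,ds = 1$, there is an $x_0 > 0$ with $w(x) \geq w(x_0) > 0$ for all $0 < x \leq x_0$, as well as $w(x) = 0$ for all $x < 0$ by assumption. Let $N$ be so large that
$$\frac{1}{\pi} \int_{|t| > N} \frac{dt}{|\Psi(it)|} < \frac{w(x_0)}{2}.$$
We then observe that, for almost every $x \in (0, x_0]$,
$$w(x) = w(x) - w(-x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{ixt} - e^{-ixt}}{\Psi(it)}\, dt \leq \frac{1}{\pi} \int_{|t| \leq N} \frac{|\sin(xt)|}{|\Psi(it)|}\, dt + \frac{1}{\pi} \int_{|t| > N} \frac{dt}{|\Psi(it)|} < \frac{N^2 x}{\pi C} + \frac{w(x_0)}{2}.$$
The first term can be made smaller than $w(x_0)/2$ by choosing $x$ sufficiently small, which yields $w(x) < w(x_0)$ and contradicts $w(x) \geq w(x_0)$. This contradiction shows that at most one of the $\delta_\nu$ is nonzero. The same argument can be run for $\gamma$: if $\gamma > 0$, then $|\Psi(it)| \geq C e^{\gamma t^2}$, so $1/\Psi(it)$ is again integrable and we arrive at the same contradiction. Thus $\gamma = 0$, and since $\gamma + \sum_\nu \delta_\nu^2 > 0$, exactly one $\delta_\nu$, say $\delta_1$, is nonzero, so that
$$\Psi(s) = C e^{\delta s} \left(1 + \delta_1 s\right) e^{-\delta_1 s}.$$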
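To illustrate why two nonzero $\delta_\nu$ are incompatible with monotonicity, consider a hypothetical choice of parameters: $C = 1$, $\gamma = 0$, $\delta_1 = 1$, $\delta_2 = 2$, $\delta = 3$ and all other $\delta_\nu = 0$, so that $\Psi(s) = (1+s)(1+2s)$. Then $1/|\Psi(it)| = ((1+t^2)(1+4t^2))^{-1/2}$ is integrable. The corresponding Pólya frequency function is the convolution of the exponential densities with rates 1 and 1/2, namely $e^{-x/2} - e^{-x}$ for $x \geq 0$ and 0 for $x < 0$. It is continuous, vanishes at the origin and increases near it, so it violates Property (2), in line with the argument above. The sketch below checks these facts numerically; all function names are hypothetical.

```python
import math


def exp_density(rate: float, x: float) -> float:
    # Truncated exponential density: rate * exp(-rate * x) for x >= 0, and 0 for x < 0.
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def convolve_at(x: float, rate1: float, rate2: float, n: int = 20_000) -> float:
    # (p1 * p2)(x) = int_0^x p1(x - y) p2(y) dy, approximated by the midpoint rule.
    if x <= 0:
        return 0.0
    h = x / n
    return h * sum(
        exp_density(rate1, x - (k + 0.5) * h) * exp_density(rate2, (k + 0.5) * h)
        for k in range(n)
    )


def w_two_factors(x: float) -> float:
    # Closed form of the convolution for rates 1 and 1/2, i.e. Psi(s) = (1 + s)(1 + 2s).
    return math.exp(-x / 2) - math.exp(-x) if x >= 0 else 0.0


def inv_abs_psi(t: float) -> float:
    # 1 / |Psi(it)| for Psi(s) = (1 + s)(1 + 2s); decays like 1 / (2 t^2), hence integrable.
    return 1.0 / math.sqrt((1 + t * t) * (1 + 4 * t * t))


if __name__ == "__main__":
    for x in (0.1, 0.5, 1.0, 3.0):
        print(x, convolve_at(x, 1.0, 0.5), w_two_factors(x))  # numerical vs closed form
    print(w_two_factors(0.0), w_two_factors(0.1) < w_two_factors(1.0))  # 0.0 True
```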
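This shows that
$$\int_{-\infty}^{\infty} e^{-sx}\, w(x)\, dx = \frac{1}{\Psi(s)} = \frac{e^{(\delta_1 - \delta) s}}{C\left(1 + \delta_1 s\right)},$$
and evaluating at $s = 0$ gives $C = 1$ because $w$ integrates to 1. Set $a = \delta - \delta_1$. We define a function
$$g(x) = \frac{1}{\delta_1}\, e^{-(x - a)/\delta_1}\, \mathbf{1}_{[a, \infty)}(x) \qquad \text{if } \delta_1 > 0.$$
(If $\delta_1 < 0$, the analogous function $|\delta_1|^{-1} e^{-(x-a)/\delta_1}\, \mathbf{1}_{(-\infty, a]}(x)$ has the same transform for $\operatorname{Re} s < 1/|\delta_1|$; it is positive on all of $(-\infty, a]$, which is incompatible with $w$ vanishing on $(-\infty, 0)$, so this case cannot occur.) We note that, for $\operatorname{Re} s > -1/\delta_1$,
$$\int_{-\infty}^{\infty} e^{-sx}\, g(x)\, dx = \frac{e^{-as}}{1 + \delta_1 s} = \frac{1}{\Psi(s)}.$$
An application of the inverse Fourier transform shows that $w = g$ almost everywhere. Since $w$ vanishes on $(-\infty, 0)$ and, being decreasing with integral 1, is bounded below by a positive constant on some interval $(0, x_0]$, the shift must be $a = 0$, and therefore
$$w(x) = \lambda e^{-\lambda x}, \qquad \lambda = \frac{1}{\delta_1} > 0.$$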
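The transform identity for $g$ used in the final step can be sanity-checked numerically. The following is a minimal check, assuming the sample values $\delta_1 = 0.5$, $a = 0.3$ and real $s > -1/\delta_1$. It truncates the integral at $a + 60$, which is harmless because the integrand decays exponentially there. The helper names are hypothetical.

```python
import math


def g(x: float, delta1: float, a: float) -> float:
    # Shifted exponential density (1/delta1) * exp(-(x - a)/delta1) on [a, inf); assumes delta1 > 0.
    return math.exp(-(x - a) / delta1) / delta1 if x >= a else 0.0


def laplace_numeric(s: float, delta1: float, a: float,
                    length: float = 60.0, n: int = 200_000) -> float:
    # Midpoint rule for int_a^{a + length} exp(-s x) g(x) dx (truncated bilateral Laplace transform).
    h = length / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += math.exp(-s * x) * g(x, delta1, a)
    return total * h


def laplace_closed_form(s: float, delta1: float, a: float) -> float:
    # exp(-a s) / (1 + delta1 s), valid for real s > -1/delta1.
    return math.exp(-a * s) / (1.0 + delta1 * s)


if __name__ == "__main__":
    for s in (-1.0, 0.0, 1.0):
        print(s, laplace_numeric(s, 0.5, 0.3), laplace_closed_form(s, 0.5, 0.3))
```

At $s = 0$ both sides equal 1, which is the normalization $\int w = 1$ that forced $C = 1$ above.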
[1] R. Brown, Exponential Smoothing for Predicting Demand, Arthur D. Little Inc., Cambridge, Massachusetts (1956).
[2] C. de Boor, A Practical Guide to Splines, Springer (1978).
[3] J. M. Lane and R. F. Riesenfeld, A geometric proof for the variation diminishing property of B-spline approximation, Journal of Approximation Theory 37, 1–4 (1983).
[4] C. Holt, Forecasting Trends and Seasonals by Exponentially Weighted Averages, Office of Naval Research Memorandum 52 (1957); reprinted in International Journal of Forecasting 20(1), 5–10 (2004).
[5] M. Marsden and I. J. Schönberg, On Variation Diminishing Spline Approximation Methods, in: I. J. Schoenberg Selected Papers, 247–268, Springer (1988).
[6] G. Pólya, Qualitatives über Wärmeausgleich, Z. Angew. Math. Mech. 13, 125–128 (1933).
[7] G. Pólya, Sur un théorème de Laguerre, Compt. Rend. 156, 996–999 (1913).
[8] I. J. Schönberg, On Totally Positive Functions, Laplace Integrals and Entire Functions of the Laguerre–Pólya–Schur Type, Proc. Natl. Acad. Sci. U.S.A. 33, 11–17 (1947).
[9] I. J. Schönberg, On Variation-Diminishing Integral Operators of the Convolution Type, Proc. Natl. Acad. Sci. U.S.A. 34, 164–169 (1948).
[10] I. J. Schönberg, On Pólya Frequency Functions, Journal d'Analyse Mathématique 1, 331–374 (1951).
[11] I. J. Schönberg, On Variation Diminishing Approximation Methods, in: On Numerical Approximation, Proceedings of an MRC Symposium, Madison, Wisconsin, 249–274 (1958).