The aggregation of several input variables into a single representative output arises naturally as a problem in many practical applications and domains. The research effort has been disseminated throughout various fields including economics, computer science, mathematics and engineering, with the subsequent mathematical formulation of aggregation problems having coalesced into a significant body of knowledge concerning aggregation functions. A wide range of aggregation functions are presented in the literature, including the weighted quasi-arithmetic means, ordered weighted averages, triangular norms and co-norms, Choquet and Sugeno integrals and many more. Several recent books provide a comprehensive overview of this field of study (Beliakov et al. (2007); Grabisch et al. (2009); Torra (2007)).
Aggregation functions are commonly used within fuzzy logic, where logical connectives are typically modeled using triangular norms and triangular co-norms. Beyond this field, the averaging functions (more commonly known as means) are frequently applied in decision problems, statistical analysis and in image and signal processing. Means have been an important tool and topic of study for over two millennia, with examples such as the arithmetic, geometric and harmonic means known to the Greeks (Rubin (1968)). Each of these means shares a fundamental property with the broader class of aggregation functions: that of monotonicity with respect to all arguments (Beliakov et al. (2007); Grabisch et al. (2009); Torra (2007)). There are, though, many means of significant practical and theoretical importance that are non-monotonic and hence not classified as aggregation functions. For example, a non-monotonic average of pixel intensities within an image subset is used to perform tasks such as image reduction (Wilkin (2013)), filtering (van den Boomgaard and van de Weijer (2002); Sylvain et al. (2008)) or smoothing (Barash and Comaniciu (2004)). Within statistics, robust estimators of location are used to estimate the central tendency of a data set, and the mode, an average possibly known to the Greeks (Rubin (1971)), is a classic example of a non-monotonic average.
Monotonicity with respect to all arguments has an important interpretation in decision making problems: an increase in one criterion should not lead to the decrease of the overall score or utility. However, in image processing an increase in only one pixel value above its neighbours may be due to noise or corruption and should not necessarily increase the intensity value that represents that region. Accordingly, the averaging functions used in such applications do not fit within the established theories regarding aggregation functions and are typically dealt with only from the signal processing perspective.
There are also many non-monotonic means appearing in the literature, with the mode, Gini means, Lehmer means, Bajraktarevic means (Beliakov et al. (2007); Bullen (2003)) and mixture functions (Ribeiro and Marques Pereira (2003); Marques Pereira and Ribeiro (2003)) being particularly well known cases. Ideally we would like a formal framework for averaging functions that encompasses non-monotonic means and places them in context with existing monotonic aggregation functions, enabling us to better understand the relationships within this broad class of functions. In so doing we are then able to broaden our understanding of non-monotonic averaging as an aggregation problem.
We achieve this aim herein by relaxing the monotonicity requirement for averaging aggregation functions and proposing a new definition that encompasses many non-monotonic averaging functions. We justify this approach by the following interpretation of averaging: while we accept that an increase in one input, or coordinate, may lead to a decrease of the aggregate value, we argue that the same increase coincident in all inputs should only lead to an increase of the aggregate value. This is akin to the property of shift-invariance, which along with homogeneity is one of the basic requirements of the non-monotonic location estimators (Rousseeuw and Leroy (1987)). We do not impose shift-invariance though, as that would severely limit the range of averaging functions that fall under our definition (for instance, the only shift-invariant quasi-arithmetic means are weighted arithmetic means). Rather we consider the property of directional monotonicity in the direction of the vector $(1, 1, \ldots, 1)$, which is implied by shift-invariance as well as by the standard definition of monotonicity. We call this property weak monotonicity within the context of aggregation functions and we investigate it herein.
The remainder of this article is structured as follows. In Section 2 we provide the necessary mathematical foundations that underpin the subsequent material. Section 3 provides the main definitions and presents various properties of weakly monotone aggregation functions. Within Section 4 we examine several non-monotonic means and prove that they are, in fact, weakly monotonic. In Section 5 we draw our conclusions and discuss future research directions arising as a result of this investigation.
2.1 Aggregation functions
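In this article we make use of the following notations and assumptions. Without loss of generality we assume that the domain of interest is any closed, non-empty interval $\mathbb{I} = [a, b] \subseteq \bar{\mathbb{R}} = [-\infty, \infty]$ and that tuples in $\mathbb{I}^n$ are defined as $\mathbf{x} = (x_{i,n} \mid n \in \mathbb{N}, i \in \{1, \ldots, n\})$. We write $x_i$ as the shorthand for $x_{i,n}$ such that it is implicit that $i \in \{1, \ldots, n\}$. Furthermore, $\mathbb{I}^n$ is ordered such that for $\mathbf{x}, \mathbf{y} \in \mathbb{I}^n$, $\mathbf{x} \le \mathbf{y}$ implies that each component of $\mathbf{x}$ is no greater than the corresponding component of $\mathbf{y}$. Unless otherwise stated, a constant vector given as $\mathbf{a}$ is taken to mean $\mathbf{a} = a(1, 1, \ldots, 1) = a\mathbf{1}$, where $a \in \mathbb{R}$ is a constant and $n$ is implicit within the context of use.
The vector $\mathbf{x}_{\nearrow}$ denotes the result of permuting the vector $\mathbf{x}$ such that its components are in non-decreasing order, that is, $\mathbf{x}_{\nearrow} = \mathbf{x}_{\sigma}$, where $\sigma$ is the permutation such that $x_{\sigma(1)} \le x_{\sigma(2)} \le \cdots \le x_{\sigma(n)}$. Similarly, the vector $\mathbf{x}_{\searrow}$ denotes the result of permuting $\mathbf{x}$ such that $x_{\sigma(1)} \ge x_{\sigma(2)} \ge \cdots \ge x_{\sigma(n)}$. We will make use of the common shorthand notation for a sorted vector, being $(x_{(1)}, x_{(2)}, \ldots, x_{(n)})$. In such cases the ordering will be stated explicitly and then $x_{(k)}$ represents the $k$th largest or smallest element of $\mathbf{x}$ accordingly.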
Consider now the following definitions:
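A function $f : \mathbb{I}^n \to \bar{\mathbb{R}}$ is monotonic (non-decreasing) if and only if, for all $\mathbf{x}, \mathbf{y} \in \mathbb{I}^n$, $\mathbf{x} \le \mathbf{y}$ implies $f(\mathbf{x}) \le f(\mathbf{y})$.
A function $f : \mathbb{I}^n \to \mathbb{I}$ is an aggregation function in $\mathbb{I}^n$ if and only if $f$ is monotonic non-decreasing in $\mathbb{I}^n$ and $f(\mathbf{a}) = a$, $f(\mathbf{b}) = b$, with $\mathbb{I} = [a, b]$.
A function $f$ is called idempotent if for every input $\mathbf{x} = (t, t, \ldots, t)$, $t \in \mathbb{I}$, the output is $f(\mathbf{x}) = t$.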
The functions of most interest in this article are those that have averaging behaviour.
A function $f$ has averaging behaviour (or is averaging) if for every $\mathbf{x}$ it is bounded by
$$\min(\mathbf{x}) \le f(\mathbf{x}) \le \max(\mathbf{x}).$$
Aggregation functions that have averaging behaviour are idempotent, whereas idempotency and monotonicity imply averaging behaviour.
A function is called internal if its value coincides with one of the arguments.
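Of particular relevance is the notion of shift-invariance (Calvo et al. (2002); Lázaro et al. (2004)), which is also called difference scale invariance (Grabisch et al. (2009)). A constant change in every input should result in a corresponding change of the output.
A function $f$ is shift-invariant (stable for translations) if $f(\mathbf{x} + a\mathbf{1}) = f(\mathbf{x}) + a$ whenever $\mathbf{x}, \mathbf{x} + a\mathbf{1} \in \mathbb{I}^n$ and $f(\mathbf{x}) + a \in \mathbb{I}$.
A function $f$ is homogeneous (with degree one) if $f(a\mathbf{x}) = a f(\mathbf{x})$ for all $a > 0$ such that $a\mathbf{x} \in \mathbb{I}^n$ and $a f(\mathbf{x}) \in \mathbb{I}$.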
Aggregation functions that are shift-invariant and homogeneous are known as linear aggregation functions. The canonical example of a linear aggregation function is the arithmetic mean.
The term mean is used synonymously with averaging aggregation functions. Chisini’s definition of a mean as an average states that the mean of $n$ independent variables $x_1, \ldots, x_n$, with respect to a function $f$, is a value $M$ for which replacement of each value $x_i$ in the input by $M$ results in the output $f(x_1, \ldots, x_n)$ (Chisini (1929), stated in Grabisch et al. (2009)). I.e.,
$$f(M, M, \ldots, M) = f(x_1, x_2, \ldots, x_n).$$
As was noted by de Finetti (de Finetti (1931), stated in Grabisch et al. (2009)), Chisini’s definition does not necessarily satisfy Cauchy’s requirement that a mean be an internal value (Cauchy (1821)). However, by assuming that $f$ is a non-decreasing, idempotent function, existence, uniqueness and internality of $M$ are restored to Chisini’s definition. Gini (Gini (1958), p. 64) writes that an average of several quantities is a value obtained as a result of a certain procedure, which equals either one of the input quantities, or a new value that lies in between the smallest and the largest input. The requirement that $f$ be non-decreasing is too strict given the aims of this article and as such, following many authors (e.g., Gini (1958); Bullen (2003)), we take the definition of a mean to be any averaging (and hence idempotent) function.
A function $f$ is called a mean if and only if it is averaging.
The basic examples of (monotonic) means found within the literature include the weighted arithmetic mean, weighted quasi-arithmetic mean, ordered weighted average (OWA), order statistic $S_k$, and the median. Less well known examples include Choquet and Sugeno integrals and their special cases, the logarithmic means, Heronian means, Bonferroni means and others (Bullen (2003); Grabisch et al. (2009)).
In continuing, we wish to consider a broader class of means to include those that are not necessarily monotonic. A classic example is the mode, being the most frequent input, which is routinely used in statistics. (In general the mode is multivalued, so in order to make it a single-valued function, a convention is needed to select one of the multiple outputs, e.g. the smallest.) The mode is not monotonic as the following example shows. Taking the vectors , and , then and .
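The non-monotonicity of the mode is easy to reproduce numerically. The following is a minimal sketch only: the function name `mode` and the two vectors are illustrative choices made here (they are not the vectors of the example above), and ties are resolved by taking the smallest candidate, as suggested in the text.

```python
from collections import Counter

def mode(x):
    """Most frequent input; ties are resolved by taking the smallest value."""
    counts = Counter(x)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)

# Illustrative inputs (an assumption of this sketch): y >= x componentwise.
x = (1, 1, 3, 3, 3)
y = (1, 1, 3, 4, 5)

print(mode(x))  # 3
print(mode(y))  # 1: increasing two of the inputs lowered the output
print(mode(tuple(t + 2 for t in x)))  # 5: a uniform shift moves the mode with it
```

Increasing only some inputs breaks up the largest group of equal values, whereas adding the same constant to every input preserves all groups.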
An important class of means that are not always monotonic are those expressed by the Mean of Bajraktarevic, which is a generalisation of the weighted quasi-arithmetic means.
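Mean of Bajraktarevic. Let $\mathbf{w}(t) = (w_1(t), \ldots, w_n(t))$ be a vector of weight functions $w_i : \mathbb{I} \to [0, \infty)$, and let $g : \mathbb{I} \to \bar{\mathbb{R}}$ be a strictly monotonic function. The mean of Bajraktarevic is the function
$$M_{\mathbf{w}}^{g}(\mathbf{x}) = g^{-1}\left( \frac{\sum_{i=1}^{n} w_i(x_i)\, g(x_i)}{\sum_{i=1}^{n} w_i(x_i)} \right).$$
When $g(t) = t$, and all weight functions are the same, the Bajraktarevic mean is called a mixture function (or mixture operator) and is given by
$$M_{w}(\mathbf{x}) = \frac{\sum_{i=1}^{n} w(x_i)\, x_i}{\sum_{i=1}^{n} w(x_i)}. \qquad (2.2)$$
For the case where the weight functions $w_i$ are distinct, the operator $M_{\mathbf{w}}$ is a generalised mixture function. A particularly interesting sub-class of Bajraktarevic means are Gini means, obtained by setting $w_i(t) = w_i t^{q}$ and $g(t) = t^{p-q}$ when $p \ne q$, or $g(t) = \log(t)$ if $p = q$.
Gini means generalise the (weighted) power means (for $q = 0$) and hence include the minimum, maximum and the arithmetic mean as special cases. Another special case of the Gini mean is the Lehmer, or counter-harmonic mean, obtained when $p = q + 1$. The contra-harmonic mean is the Lehmer mean with $q = 1$. We will investigate the Lehmer mean and its properties further in Section 4.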
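To make these definitions concrete, the sketch below evaluates a Bajraktarevic mean directly from its formula and specialises it to mixture functions and Lehmer means. It is an illustration under stated assumptions: the function names are hypothetical, the Lehmer mean is taken in the parametrisation $w(t) = t^{q}$ (so that $q = 1$ is the contra-harmonic mean), and inputs are assumed positive.

```python
def bajraktarevic(x, weights, g, g_inv):
    """g^{-1}( sum_i w_i(x_i) g(x_i) / sum_i w_i(x_i) )."""
    num = sum(w(t) * g(t) for w, t in zip(weights, x))
    den = sum(w(t) for w, t in zip(weights, x))
    return g_inv(num / den)

def mixture(x, w):
    """Mixture function: g is the identity and all weight functions coincide."""
    identity = lambda t: t
    return bajraktarevic(x, [w] * len(x), identity, identity)

def lehmer(x, q):
    """Lehmer mean as a mixture function with w(t) = t**q (assumed convention)."""
    return mixture(x, lambda t: t ** q)

x = (1.0, 2.0, 4.0)
print(lehmer(x, 0))   # arithmetic mean of x
print(lehmer(x, -1))  # harmonic mean of x
print(lehmer(x, 1))   # contra-harmonic mean of x

# Non-monotonicity for q = 1: raising the second input lowers the output.
print(lehmer((1.0, 0.25), 1), lehmer((1.0, 0.5), 1))
```

The last line compares $(1, 0.25)$ with $(1, 0.5)$: the second vector dominates the first, yet its contra-harmonic mean is smaller.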
2.3 Penalty based functions
In Calvo and Beliakov (2010) it was demonstrated that averaging aggregation functions can be expressed as the solution of a minimisation problem of the form
$$f(\mathbf{x}) = \arg\min_{y} P(\mathbf{x}, y), \qquad (2.4)$$
where $P$ is a penalty function satisfying the following definition:
Penalty function. The function $P : \mathbb{I}^{n+1} \to \mathbb{R}$ is a penalty function if and only if it satisfies:
$P(\mathbf{x}, y) \ge c$ for all $\mathbf{x} \in \mathbb{I}^n$, $y \in \mathbb{I}$;
$P(\mathbf{x}, y) = c$ if and only if all $x_i = y$; and,
$P(\mathbf{x}, y)$ is quasi-convex in $y$ for any $\mathbf{x}$,
for some constant $c \in \mathbb{R}$ and any closed, non-empty interval $\mathbb{I}$.
A function is quasi-convex if all its sublevel sets are convex, that is $S_{\alpha} = \{x : f(x) \le \alpha\}$ are convex sets for all $\alpha$, see Rockafellar (1970). The first two conditions ensure that $P$ has a strict minimum and that a consensus of inputs ensures minimum penalty, providing idempotence of $f$. The third condition implies a unique minimum (but possibly many minimisers that form a convex set). Since multiplication by a positive constant, or addition of a constant to $P$, will not change the minimisation, $P$ may be shifted (if desired) so that $c = 0$. One can think of $P$ as describing the dissimilarity or disagreement between the inputs $\mathbf{x}$ and the value $y$. It follows that $f$ is a function that minimises the chosen dissimilarity. It is not necessary to explicitly state $f$, provided a suitable penalty function is given and the optimisation problem is solvable. Subsequently it is sufficient to solve (2.4) to obtain the aggregate $f(\mathbf{x})$.
Non-monotonic averaging functions can also be represented by a penalty function. For penalty-based functions we have the following results due to Calvo and Beliakov (2010).
Any idempotent function $f$ can be represented as a penalty based function such that $f(\mathbf{x}) = \arg\min_{y} P(\mathbf{x}, y)$.
Any averaging function can be expressed as a penalty based function.
As mentioned in Mesiar et al. (2008), mixture functions can be written as a penalty function with
$$P(\mathbf{x}, y) = \sum_{i=1}^{n} w(x_i)\,(x_i - y)^2.$$
Clearly the necessary condition of the minimum is
$$P_y(\mathbf{x}, y) = -2 \sum_{i=1}^{n} w(x_i)\,(x_i - y) = 0.$$
Hence $P$ defines a mixture function. A representation of a function as a penalty based function sometimes can simplify technical proofs, as we shall see later in the paper.
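The penalty representation can be checked numerically. The sketch below assumes the weighted quadratic penalty $P(\mathbf{x}, y) = \sum_i w(x_i)(x_i - y)^2$ with a non-negative weight function, minimises it over $[\min(\mathbf{x}), \max(\mathbf{x})]$ with a plain ternary search (an arbitrary choice of solver made here, valid because this penalty is convex in $y$), and compares the minimiser with the closed form of the mixture function. All names are hypothetical.

```python
def mixture(x, w):
    """Closed form: sum w(x_i) x_i / sum w(x_i)."""
    return sum(w(t) * t for t in x) / sum(w(t) for t in x)

def penalty(x, y, w):
    """Weighted quadratic penalty between the inputs x and the candidate y."""
    return sum(w(t) * (t - y) ** 2 for t in x)

def argmin_penalty(x, w, iters=200):
    """Ternary search for the minimiser of y -> penalty(x, y, w)."""
    lo, hi = min(x), max(x)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if penalty(x, m1, w) < penalty(x, m2, w):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

x = [1.0, 2.0, 4.0]
w = lambda t: t  # weight function of the contra-harmonic mean

print(mixture(x, w))         # closed form
print(argmin_penalty(x, w))  # numerical minimiser, equal up to solver accuracy
```

For penalties that are not convex in $y$, a search of this kind may return a local minimiser only, so a different solver would be needed.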
It is apparent given the examples presented that many means are non-monotonic and thus not aggregation functions according to Definition 2. In the next section we introduce weak monotonicity and consider some properties of weakly monotonic averaging functions. We subsequently investigate several important examples and show that they are indeed weakly monotonic functions, allowing us to place them in a new framework with existing averaging aggregation functions.
3 Weak monotonicity
3.1 Main definition
As mentioned in Section 1 we are motivated by two important issues. The first one is that there exist many means that are not generally monotonic and hence not aggregation functions, while the second one is that there are many practical applications in which non-monotonic means have been shown to provide good aggregate values commensurate with the objectives of the aggregation. To encapsulate these non-monotonic means within the framework of aggregation functions we aim to relax the monotonicity condition and present the class of weakly monotonic averaging functions. The definition of weak monotonicity provided herein is prompted by applications and intuition, which suggest that it is reasonable to expect that a representative value of the inputs does not decrease if all the inputs are increased by the same amount (or shifted uniformly), as the relative positions of the inputs are not changed. A formal definition that conveys this property is as follows.
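A function $f$ is called weakly monotonic non-decreasing (or directionally monotonic) if $f(\mathbf{x} + a\mathbf{1}) \ge f(\mathbf{x})$ for any $a > 0$, $\mathbf{1} = (1, 1, \ldots, 1)$, such that $\mathbf{x}, \mathbf{x} + a\mathbf{1} \in \mathbb{I}^n$.
If $f$ is directionally differentiable in its domain then weak monotonicity is equivalent to non-negativity of the directional derivative $D_{\mathbf{1}}(f)(\mathbf{x}) \ge 0$.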
Evidently monotonicity implies weak monotonicity, hence all aggregation functions are weakly monotonic. By Definition 6 all shift-invariant functions are also weakly monotonic. It is self evident that weakly monotonic non-decreasing functions form a cone in the linear vector space of weakly monotonic (increasing or decreasing) functions.
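Weak monotonicity can be probed numerically by comparing $f(\mathbf{x})$ with $f(\mathbf{x} + a\mathbf{1})$ over a finite set of points. The sketch below is only a falsification aid: finding no violation on a grid proves nothing, whereas a single violation disproves the property. The helper `find_violation` and the function `jumpy` are hypothetical constructions for illustration and do not appear in the literature cited here.

```python
from itertools import product

def find_violation(f, n, grid, shifts, tol=1e-12):
    """Return (x, a) with f(x + a*1) < f(x), or None if no grid point fails."""
    for x in product(grid, repeat=n):
        for a in shifts:
            shifted = tuple(t + a for t in x)
            if f(shifted) < f(x) - tol:
                return x, a
    return None

def arithmetic_mean(x):
    return sum(x) / len(x)

def jumpy(x):
    """Contrived internal mean: max(x) below a threshold, min(x) above it."""
    return max(x) if max(x) < 0.85 else min(x)

grid = [0.0, 0.2, 0.4, 0.6, 0.8]
shifts = [0.1, 0.2]

print(find_violation(arithmetic_mean, 2, grid, shifts))  # None
print(find_violation(jumpy, 2, grid, shifts))  # a pair (x, a) where the output drops
```

The second function always returns one of its arguments, so it is averaging, yet a uniform shift that pushes the largest input over the threshold makes its value fall.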
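Let us establish some useful properties of weakly monotonic averages. Consider the function $F : \mathbb{I}^n \to \mathbb{I}$ formed by the composition $F(\mathbf{x}) = A(B_1(\mathbf{x}), B_2(\mathbf{x}))$, where $A$ and $B_i$ are means.
If $A$ is monotonic and $B_1, B_2$ are weakly monotonic, then $F$ is weakly monotonic.
By weak monotonicity, $B_i(\mathbf{x} + a\mathbf{1}) \ge B_i(\mathbf{x})$, which implies that there exists $\delta_i \ge 0$ such that $B_i(\mathbf{x} + a\mathbf{1}) = B_i(\mathbf{x}) + \delta_i$, with $a > 0$. Thus $F(\mathbf{x} + a\mathbf{1}) = A(\mathbf{u} + \boldsymbol{\delta})$, where $\mathbf{u} = (B_1(\mathbf{x}), B_2(\mathbf{x}))$ and $\boldsymbol{\delta} = (\delta_1, \delta_2)$. The monotonicity of $A$ ensures that $A(\mathbf{u} + \boldsymbol{\delta}) \ge A(\mathbf{u})$ and hence $F(\mathbf{x} + a\mathbf{1}) \ge F(\mathbf{x})$ and $F$ is weakly monotonic. ∎
By trivial extension, since all monotonic functions are also weakly monotonic, if either of $B_1$ or $B_2$ is monotonic, then $F$ is again weakly monotonic.
If $A$ is weakly monotonic and $B_1, B_2$ are shift invariant, then $F$ is weakly monotonic.
Shift invariance implies that $B_i(\mathbf{x} + a\mathbf{1}) = B_i(\mathbf{x}) + a$, with $a > 0$. Thus $F(\mathbf{x} + a\mathbf{1}) = A(\mathbf{u} + a\mathbf{1})$, where $\mathbf{u} = (B_1(\mathbf{x}), B_2(\mathbf{x}))$. The weak monotonicity of $A$ ensures that $A(\mathbf{u} + a\mathbf{1}) \ge A(\mathbf{u})$ and hence $F(\mathbf{x} + a\mathbf{1}) \ge F(\mathbf{x})$ and $F$ is weakly monotonic. ∎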
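Consider functions of the form $f_{\varphi}(\mathbf{x}) = \varphi^{-1}\left( f(\varphi(x_1), \ldots, \varphi(x_n)) \right)$.
If $f$ is weakly monotonic and $\varphi$ is a linear function then the $\varphi$-transform $f_{\varphi}$ is weakly monotonic.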
and hence . Hence
by weak monotonicity of . Hence and is weakly monotonic. ∎
Note that unlike in the case of standard monotonicity, a nonlinear $\varphi$-transform does not always preserve weak monotonicity.
The dual of a weakly monotonic function is weakly monotonic under standard negation.
The following result is relevant to an application of weakly monotonic averages in image processing, discussed in Section 4.3.
Let be a shift invariant function, and be a function. Let be a penalty based averaging function with the penalty depending on the terms . Then is shift-invariant and hence weakly monotonic.
(by shift invariance)
Indeed we need not restrict ourselves to penalty functions with terms depending on . Functions that depend on the differences with the minimum will satisfy the above proof and satisfy the conditions on with regards to the existence of solutions to (2.4). In particular, Huber type functions used in robust regression can replace the squares of the differences.
For the $\varphi$-transform, if $\varphi$ is nonlinear then $f_{\varphi}$ may or may not be weakly monotonic for all $\mathbf{x}$, which can be observed by example.
Take and , then and . If is the shorth (we prove the weak monotonicity of the shorth in Section 4) then and . As clearly and is not weakly monotonic.
Internal means are not necessarily weakly monotonic, as illustrated by the following example.
which is internal with values in the set . Consider the points and , then and . It follows that , however . Hence this is not weakly monotonic for all .
4 Examples of weakly monotonic means
In this section we look at several examples of weakly monotonic, but not necessarily monotonic averaging functions. We begin by considering several of the robust estimators of location, then move on to mixture functions and some interesting cases of means from the literature. While some of the examples involve shift-invariant functions, many of their nonlinear $\varphi$-transforms yield proper weakly monotonic functions.
The functions presented below are defined through penalties that are not quasi-convex, therefore we need to drop the condition that $P$ is quasi-convex from Definition 10.
Quasi-penalty function. The function $P : \mathbb{I}^{n+1} \to \mathbb{R}$ is a quasi-penalty function if and only if it satisfies:
$P(\mathbf{x}, y) \ge c$ for all $\mathbf{x} \in \mathbb{I}^n$, $y \in \mathbb{I}$;
$P(\mathbf{x}, y) = c$ if and only if all $x_i = y$; and,
$P(\mathbf{x}, y)$ is lower semi-continuous in $y$ for any $\mathbf{x}$,
for some constant $c \in \mathbb{R}$ and any closed, non-empty interval $\mathbb{I}$.
Note that the third condition ensures the existence of the minimum and a non-empty set of minimisers. In the case where the set of minimisers of $P$ is not an interval, we need to adopt a reasonable rule for selecting the value of the penalty-based function $f$. We suggest stating in advance that in such cases we choose the infimum of the set of minimisers of $P$. From now on $P$ will refer to quasi-penalty functions.
4.1 Estimators of Location
Perhaps the most widely used estimator of location is the mode, being the most frequent input.
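Mode: The mode is the minimiser of the (quasi)penalty function
$$P(\mathbf{x}, y) = \sum_{i=1}^{n} p(x_i, y), \quad \text{where } p(x_i, y) = 0 \text{ if } x_i = y \text{ and } 1 \text{ otherwise}.$$
It follows that $P(\mathbf{x} + a\mathbf{1}, y) = \sum_{i=1}^{n} p(x_i + a, y)$, which is minimised for the value $y = f(\mathbf{x}) + a$. Hence, $f(\mathbf{x} + a\mathbf{1}) = f(\mathbf{x}) + a$ and thus the mode is shift invariant. By Definition 6 the mode is weakly monotonic.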
Note that the mode may not be uniquely defined, e.g., the mode of , in which case we use a suitable convention. The quasi-penalty associated with the mode is not quasi-convex, and as such it may have several minimisers. A convention is needed as to which minimiser is selected, e.g., the smallest or the largest. Other examples of non-monotonic means that follow also involve quasi-penalties, and the same convention as for the mode is adopted. Then also discrete scales can be considered, compare to, e.g., the paper of Kolesárová et al. (2007).
The Least Trimmed Squares estimator (Rousseeuw and Leroy (1987)) rejects up to $50\%$ of the data values as outliers and minimises the squared residual using the remaining data.
Least Trimmed Squares (LTS): The LTS uses the (quasi)penalty function
where is the th order statistic of , and . If is the order permutation of such that , then the minima of occur when , which implies that the minimum value is . Since is shift invariant then and thus
where . It follows that the value that minimises is , hence the LTS is shift invariant and thus weakly monotonic.
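A direct way to evaluate the LTS location estimate is sketched below. It relies on two assumptions that should be checked against the definition above: that $h = \lfloor n/2 \rfloor + 1$ values are retained, and the standard observation that the minimiser is the arithmetic mean of the contiguous block of $h$ sorted inputs with the smallest sum of squared deviations about its own mean. The function name `lts` is hypothetical, and ties are resolved in favour of the smallest candidate.

```python
def lts(x):
    """Least Trimmed Squares location estimate (sketch, h = n // 2 + 1 assumed)."""
    xs = sorted(x)
    n = len(xs)
    h = n // 2 + 1
    best_mean, best_ss = None, float("inf")
    for k in range(n - h + 1):
        window = xs[k:k + h]
        m = sum(window) / h
        ss = sum((t - m) ** 2 for t in window)
        if ss < best_ss:  # strict comparison keeps the smallest minimiser on ties
            best_mean, best_ss = m, ss
    return best_mean

x = [2.0, 2.1, 2.2, 9.0, 2.05]
print(lts(x))                              # close to 2.05, the outlier 9.0 is ignored
print(lts([t + 3.0 for t in x]))           # close to 5.05, shifted by the same amount
print(lts([2.0, 2.1, 2.2, 100.0, 2.05]))   # unchanged when the outlier grows
```

The second call illustrates shift invariance on this data set, and the third shows that the estimate does not respond to an increase in the rejected value.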
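The remaining estimators of location presented compute their value using the shortest contiguous sub-sample of $\mathbf{x}$ (sorted in non-decreasing order) containing at least half of the values. The candidate sub-samples are the sets $X_k = \{x_j : j \in \{k, k+1, \ldots, k + \lfloor n/2 \rfloor\}\}$, $k = 1, \ldots, \lfloor (n+1)/2 \rfloor$. The length of each set is taken as $\|X_k\| = |x_{k + \lfloor n/2 \rfloor} - x_k|$ and thus the index of the shortest sub-sample is
$$k^{*} = \arg\min_{i} \|X_i\|, \quad i = 1, \ldots, \lfloor (n+1)/2 \rfloor.$$
Under the translation $\bar{\mathbf{x}} = \mathbf{x} + a\mathbf{1}$ the length of each sub-sample is unaltered since $\|\bar{X}_k\| = |(x_{k + \lfloor n/2 \rfloor} + a) - (x_k + a)| = \|X_k\|$ and thus $k^{*}$ remains the same.
Consider now the Least Median of Squares estimator (Rousseeuw (1984)), which is the midpoint of $X_{k^{*}}$.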
Least Median of Squares (LMS): The LMS can be computed by minimisation of the (quasi)penalty function
The value minimises the penalty , given by
is clearly . Hence, and the LMS is shift invariant and weakly monotonic.
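The Shorth (Andrews et al. (1972)) is the arithmetic mean of $X_{k^{*}}$.
Shorth: The shorth is given by
$$f(\mathbf{x}) = \frac{1}{h} \sum_{x_i \in X_{k^{*}}} x_i, \quad h = \lfloor n/2 \rfloor + 1.$$
Since the index $k^{*}$ of the set $X_{k^{*}}$ is unaltered under translation and the arithmetic mean is shift invariant, the shorth is shift invariant and hence weakly monotonic.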
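Both the LMS and the shorth follow directly from the shortest half-sample, as the sketch below illustrates. It assumes that each candidate sub-sample holds $\lfloor n/2 \rfloor + 1$ consecutive sorted values and that ties between equally short sub-samples are resolved by taking the first one; the function names are hypothetical.

```python
def shortest_half(x):
    """Shortest block of n // 2 + 1 consecutive sorted values (first one on ties)."""
    xs = sorted(x)
    n = len(xs)
    h = n // 2 + 1
    k = min(range(n - h + 1), key=lambda i: xs[i + h - 1] - xs[i])
    return xs[k:k + h]

def lms(x):
    """Least Median of Squares location: midpoint of the shortest half."""
    half = shortest_half(x)
    return (half[0] + half[-1]) / 2

def shorth(x):
    """Shorth: arithmetic mean of the shortest half."""
    half = shortest_half(x)
    return sum(half) / len(half)

x = [1.0, 2.0, 2.2, 2.3, 7.0, 9.0, 2.9]
print(shortest_half(x))  # [2.0, 2.2, 2.3, 2.9]
print(lms(x))            # close to 2.45
print(shorth(x))         # close to 2.35

shifted = [t + 1.5 for t in x]
print(lms(shifted) - lms(x), shorth(shifted) - shorth(x))  # both close to 1.5
```

The final line shows, for this data set, that both estimates move by exactly the amount of the uniform shift, in line with the argument above.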
OWA Penalty Functions: Penalty functions having the form
define regression operators, (Yager and Beliakov (2010)). Consider the following results dependent on the weight vector .
generates Least Squares regression and is monotonic and hence weakly monotonic;
generates Chebyshev regression and is monotonic and hence weakly monotonic;
Since all the terms are constant under transformation (cf Theorem 20), the OWA regression operators are shift-invariant for any choice of the weight vector .
For then is the Least Median of Squares operator and hence shift invariant and weakly monotonic; and
For then is the Least Trimmed Squares operator and hence is shift invariant and weakly monotonic.
In the cases 3-5 the OWA regression operators are not monotonic.
Density based means: The density based means were introduced in Angelov and Yager (2013). Let denote the distance between inputs and . The density based mean is defined as
and where is Cauchy kernel given by
It may appear that the class of weakly monotonic averages consists mostly of shift-invariant functions, as the above examples illustrate. This impression is due to the fact that such examples came from robust regression, where the very definition of robust estimators of location involves shift-invariance (Rousseeuw and Leroy (1987)). However, the class of weakly monotonic functions is richer, as various (but not all) $\varphi$-transforms of shift-invariant functions (with non-linear $\varphi$) are weakly monotonic but not shift-invariant. Some results on the conditions on $\varphi$ which preserve weak monotonicity are presented in Wilkin et al. (2014). A few more examples are presented in the sequel.
4.2 Mixture Functions
The mixture functions were given by Eqn. (2.2), which we recall here for clarity
$$M_{w}(\mathbf{x}) = \frac{\sum_{i=1}^{n} w(x_i)\, x_i}{\sum_{i=1}^{n} w(x_i)}.$$
Mesiar et al. (2008) have shown that under the constraint that is non-decreasing and differentiable, if , then is an aggregation function and hence monotonic (and by extension, also weakly monotonic). Additionally, is invariant to scaling of the weight functions (i.e., ). In Mesiar and Spirkova (2006), it was shown that the dual, , of is generated by .
As mentioned in Section 2, a special case of the Gini means (with $p = q + 1$) are the Lehmer means, which are generally not monotonic. Lehmer means are mixture functions with weight function $w(t) = t^{q}$, which is neither increasing for all $q \in \mathbb{R}$ nor shift invariant. Note that for $q < 0$ the value of Lehmer means at $\mathbf{x}$ with at least one component $x_i = 0$ is defined as the limit when $x_i \to 0^{+}$, so that $L^{q}$ is continuous on $[0, \infty)^n$.
We begin by establishing some general properties of Lehmer means.
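The Lehmer mean $L^{q} : [0, \infty)^n \to [0, \infty)$, given by
$$L^{q}(\mathbf{x}) = \frac{\sum_{i=1}^{n} x_i^{q+1}}{\sum_{i=1}^{n} x_i^{q}},$$
is:
monotonic (and linear) along the rays emanating from the origin;
not generally monotonic in $\mathbf{x}$;
has neutral element $0$ for $q > 0$; and,
has absorbing element $0$ for $q < 0$.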
The proof is presented in the Appendix.
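Some of these properties, and the behaviour of the Lehmer mean under uniform shifts, can be observed numerically. The sketch below assumes the form $L^{q}(\mathbf{x}) = \sum_i x_i^{q+1} / \sum_i x_i^{q}$ and uses the contra-harmonic case $q = 1$ throughout; the function name and test vectors are illustrative choices made here.

```python
def lehmer(x, q):
    """Lehmer mean sum(x_i**(q+1)) / sum(x_i**q), assumed convention."""
    return sum(t ** (q + 1) for t in x) / sum(t ** q for t in x)

x = (1.0, 2.0, 4.0)

# Linear along rays from the origin: scaling the inputs scales the output.
print(lehmer(tuple(3 * t for t in x), 1), 3 * lehmer(x, 1))

# For q > 0, appending a zero input leaves the value unchanged.
print(lehmer(x + (0.0,), 1), lehmer(x, 1))

# Uniform shift by 0.1 with q = 1: the output rises for this pair of inputs ...
print(lehmer((1.0, 0.01), 1), lehmer((1.1, 0.11), 1))
# ... but falls for this triple, so the number of arguments matters.
print(lehmer((1.0, 0.01, 0.01), 1), lehmer((1.1, 0.11, 0.11), 1))
```

For $q = 1$ a direct calculation gives the sum of the partial derivatives as $(2 S_1^2 - n S_2) / S_1^2$, with $S_1 = \sum_i x_i$ and $S_2 = \sum_i x_i^2$. For $n = 2$ this reduces to $4 x_1 x_2 / (x_1 + x_2)^2 \ge 0$, while for $n = 3$ it is negative near $(1, 0, 0)$, which is what the last two lines show.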
We now establish a sufficient condition for weak monotonicity of Lehmer means, which depends on both and the number of arguments . We provide a relation between these two quantities.
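The Lehmer mean of $n$ arguments, $L^{q} : [0, \infty)^n \to [0, \infty)$, is weakly monotonic on $[0, \infty)^n$ if $n \le 1 + \left( \frac{q+1}{q-1} \right)^{q-1}$, $q \in \mathbb{R} \setminus (0, 1)$.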
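The Lehmer mean for $q \in [-1, 0]$ is known to be monotonic (Farnsworth and Orr (1986)) and hence weakly monotonic in that parameter range. In the range $q \in (0, 1)$ the Lehmer mean is not weakly monotonic, because its partial derivative with respect to $x_i$ tends to $-\infty$ as $x_i \to 0^{+}$ while the other inputs remain positive. Hence we focus on the cases $q \ge 1$ and $q < -1$. The proof is easier to present in penalty-based representation, as the partial derivatives have a more compact form. As stated in Section 2.3, $L^{q}$ can be written as a penalty-based function (2.4) with penalty $P(\mathbf{x}, y) = \sum_{i=1}^{n} x_i^{q} (x_i - y)^2$. Differentiation w.r.t. $y$ yields
$$P_y(\mathbf{x}, y) = -2 \sum_{i=1}^{n} x_i^{q} (x_i - y).$$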
At the minimum we have the implicit equation , with the necessary condition that yields . We remind that for any the Lehmer mean is defined in the limit as . The partial derivatives are given by the implicit derivative , with
By differentiation and thus the sign of the partial derivatives depends on the sign of , which is given by
These derivatives can be either positive or negative. To establish weak monotonicity we require that the directional derivative of in the direction be non-negative. We have that and thus the sign of the directional derivative is determined only by the sign of . We will henceforth work with the sorted inputs, such that is thus the largest input and the smallest.
Consider first the case: .
We examine the term and note that for any input since is averaging (condition 3 of Lemma 1). Then it follows that
For the remaining we compute the smallest possible value of by selecting the point of minimum value, which is attained for
At the optimum either or
At we have that (for ) and (for ), and at we have that
and since each then
This expression is non-negative and hence is weakly monotonic provided that
For we get .
Now consider the case: . We have that
and note that these derivatives are defined in the limit for the case where . I.e., . We now examine the term and note that since is averaging. Thus
Again we consider the remaining by seeking the minimum of , given by
This attains a minimum at and substitution into gives
The directional derivative of can be written as
We note that the sign of this derivative does not change in the limit as and is non-negative for