Probability density functions are popularly known as non-negative functions with the unit integral constraint. This clearly inhibits their direct processing using standard methods of functional data analysis  including any their approximation since unconstrained functions are assumed there. In addition, density functions are rather characterized by deeper geometrical properties that need to be taken into account for any reliable analysis [6, 21, 22]. Specifically, in contrast to functions in the standard space, densities obey the scale invariance and relative scale properties . Scale invariance means that not just the representation of densities with the unit integral constraint, but any its positive multiple conveys the same information about relative contributions of Borel sets on the whole probability mass. Relative scale can be explained directly with an example: the relative increase of a probability over a Borel set from 0.05 to 0.1 (2 multiple) differs from the increase 0.5 to 0.55 (1.1 multiple), although the absolute differences are the same in both cases. If we restrict to a bounded support that is mostly used in practical applications [3, 9, 14, 15], density functions can be represented with respect to Lebesgue reference measure using the Bayes space of functions with square-integrable logarithm [6, 22].
The Bayes space has structure of separable Hilbert space that enables to construct an isometric isomorphism between and , the space restricted to . Accordingly, analogies of summing two functions and multiplication of a function by a real scalar in the space together with an inner product between two densities are required. Given two absolutely integrable density functions and a real number we indicate with and the perturbation and powering operations, defined as
respectively. The resulting functions are readily seen to be probability density functions, though, note that the unit integral constraint representation was chosen just for the sake of convenient interpretation. In  it is proven that endowed with the operations
is a vector space. Note that the neutral elements of perturbation and powering are, with (i.e., the uniform density), and 1, respectively. The difference between two elements , denoted by , is obtained as perturbation of with the reciprocal of , i.e., . Finally, to complete the Hilbert space structure, the inner product is defined as
Form of the inner product clearly indicates that the relevant information in densities is contained in (log-)ratios between elements from the support .
Density functions can be considered as functional counterparts to compositional data, positive vectors carrying relative information [1, 17] that are driven by the Aitchison geometry . In order to enable their statistical processing using standard multivariate methods in real space , the preferred strategy is to express them either in centered log-ratio coefficients  with respect to a generating system, or in logratio coordinates, preferably with respect to an orthonormal basis . The latter coordinates (called also isometric log-ratio coordinates), as well as the clr coefficients, provide isometry between the Aitchison geometry and the real Euclidean space. A similar strategy is used also for densities in the Bayes space . An isometric isomorphism between and is represented by the centered log-ratio (clr) transformation [22, 14], defined for as
We remark that such an isometry allows to compute operations and inner products among the elements in in terms of their counterpart in among the clr-transforms, i.e.
However, the clr transformation induces an additional constraint,
that needs to be taken into account for computation and analysis on clr-transformed density functions. As the clr space is clearly a subspace of , hereafter it is denoted as . The inverse clr transformation is obtained as
again as before, the denominator is used just to achieve the unit integral constraint representation of the resulting density (without loss of relative information, carried by the density function).
According to  it is not necessary to restrict ourselves to the constrained clr space, because a basis in can be easily constructed. Specifically, let is a basis in and assume that is a constant function, then form a basis in . Of course, also here an orthonormal basis is preferable, but it is not always possible in applications. Nevertheless, if this would be so, then a function can be projected orthogonally to the space spanned, e.g., by the first functions . This is done through the respective coefficients in the basis expansion
Functional data analysis relies strongly on approximation of the input functions using splines . However, splines are mostly utilized purely as an approximation tool, without considering further methodological consequences. Because statistical processing of density functions requires a deeper geometrical background, provided by the Bayes spaces, this should be followed also by the respective spline representation, performed preferably in the clr space . In  a first attempt of constructing a spline representation that would honor the zero integral constraint (4) was performed. The problem is that -splines that form basis for the spline expansion in  come from , but not from . This paper presents an important step ahead – such -splines are constructed that form basis functions in the clr space . Consequently, the -splines can be expressed also directly in and the spline representation formulated in terms of the Bayes space which can be used for interpretation purposes; hereafter we refer to simplicial splines. Apart from methodological advantages, using simplicial splines simplifies construction and interpretation of spline coefficients that can be considered as coefficients of a (possibly orthonormal) basis in .
The paper is organized as follows. In the next section the construction of -splines basis in is presented together with a comparison to spline functions introduced in . Section 3 is devoted to smoothing splines in and Section 4 discusses orthogonalization of basis functions (that form, by construction, an oblique basis). Section 5 introduces a new class of splines that reflect the Bayes spaces methodology, simplicial splines. Section 6 provides an application to anthropometric data and the final Section 7 concludes.
2 Construction of -spline basis in
Because the clr transformation enables to process density functions in the standard space, just restricted according to zero integral constraint (4), it is natural that also construction of simplicial splines should start in . Nevertheless, before doing so, some basic facts about -spline representation of splines are recalled, see [2, 4] for details. Let the sequence of knots
be given. -spline of degree (order ) is defined as
Consequently, -spline of degree , , , (order ) is defined by
Now let the new functions for , , be defined
i.e., more precisely for
Noteworthy, functions have similar properties as -splines .
It is evident that for the function and its derivatives up to order are all continuous.
It is easy to check that for
and of course
From boundary conditions for -splines
From the perspective of a crucial point is that integral of equals to zero:
From these properties it is evident that functions are -splines with zero integral.
It is known that for the vector space of polynomial splines of degree , , defined on a finite interval with the sequence of knots , the dimension is
For the construction of all basis functions it is necessary to consider some additional knots. Without loss of generality we can add knots
Then every spline in has a unique representation
In ,  the splines with zero integral are studied. There is given the necessary and sufficient condition for -splines coefficients of these splines. However, typical -splines , thus ignoring the constraint (4) in for construction of the -spline basis, were used there.
Now, regarding the definition (7), we are able to use -splines which have zero integral on (denoted also as -splines in the sequel). In the following, denotes the vector space of polynomial splines of degree , defined on a finite interval with the sequence of knots and having zero integral on . With respect to the condition of the zero integral it is clear that
which are basis functions of the space . Then every spline has a unique representation
Reduction of dimension for splines in by one is a very natural consequence of clr transformation of density functions. Note that this feature is present also for clr coefficients of compositional data .
We consider knots . The task is to find a cubic spline with the given sequence of knots and which has zero integral on the interval . It is evident that , . We consider the additional knots
The basis functions of the space are plotted in Figure 6. Every spline can be written as
Thus, e.g., for the cubic spline with zero integral is plotted in Figure 7.
Now let us compare splines from the spaces and . For this purpose we recall the following theorem, which was proven in .
For a spline , , the condition is fulfilled if and only if
By considering the above thoughts, we are now able to prove the following statement.
Every spline , is an element of the space and fulfills the condition .
Every spline can be written in matrix notation as
where is the vector of basis functions of the space and is the vector of the corresponding -spline coefficients. Regarding Theorem 2.1 we know that
where and , .
With respect to (8) we are able to write the functions in matrix notation,
Then it is clear that
Therefore the spline , can be written in matrix notation as
where and . Furthermore, let us denote
Then and it is evident that . To fulfill the condition of the zero integral on interval it is necessary and sufficient that
In order to prove this equality we can write
it can be easily seen that
and the proof is complete. ∎
The formula (16) is very useful, because we can use the standard -spline basis for working with splines honoring the zero integral constraint, which is very convenient from computational point of view.
3 Smoothing spline in
In  the construction of smoothing splines in the space was studied, however using standard -spline basis functions . Now we are able to construct smoothing splines in this space with new basis functions . For this purpose, let data , , weights , , sequence of knots , and a parameter be given. For arbitrary our task is to find a spline , which minimizes the functional
Note that the choice of parameter and , where stands for th derivation, affects smoothness of the resulting spline. Let us denote , , and . Regarding the representation (12) and matrix notation (16) the functional can be written as a quadratic function
is positive definite, because , are basis functions. Upper triangular matrix has full row rank. is a diagonal matrix such that
Finally, stands for the collocation matrix, i.e.
Using the notation ,
it is possible to rewrite the quadratic function as
Our task is to find a spline which minimizes the functional , in other words, we want to find a minimum of the function (20). It is obvious that this function has just one minimum if and only if the matrix is positive definite (p.d.). From (19) it can be easily seen that
From Schoenberg-Whitney theorem and its generalization, see  and , it is known that matrix is of full column rank if and only if there exists with , , such that , . In this case from the necessary and sufficient condition for a unique minimum of quadratic function, i.e.
we get a system of linear equations and then the unique solution of this system is given by
Consequently, the resulting smoothing spline is obtained by formula
in matrix notation using standard -splines as
where the vector is given in (21).
4 Orthogonalization of basis functions
A further step is to orthogonalize the basis
of the space that is by construction obligue with respect to the space metric. For this purpose the idea presented in 
is used. We search for a linear transformationsuch that
forms an orthogonal set of basis functions of the space , i.e.
Regarding the lemma presented in  and notation used here we can formulate the following statement.
An invertible transformation orthogonalizes the basis functions if and only if it satisfies the condition that
where represents the positive definite matrix
With respect to the definition of basis functions the matrix can be expressed as
where . The linear transformation is not unique and can be computed for example by the Cholesky decomposition. The basis functions
To sum up, the spline with zero integral can be constructed as a linear combination of orthogonal -splines having zero integral in a form
On the other hand, the standard and well-known -splines can be used to represent in matrix form
This formulation seems to be very useful because it allows us to use existing -spline codes in software R or Matlab, for example, the collocation matrix or computation of integrals in (18).
5 Simplicial splines in the Bayes space
Construction of splines directly in has important practical consequences, however, it is crucial also from the theoretical perspective. Expressing -splines as functions in enables to back-transform them to the original Bayes space using (5). It results in simplicial -splines (-splines), obtained from (8) as
Accordingly, for instance -splines from Figures 1-5 can be now expressed directly in the Bayes space as -splines, see Figures 10-12. Note that -splines fulfill the unit integral constraint which is, however, not necessary for further considerations. As a consequence, it is immediate to define vector space of simplicial polynomial splines of degree , defined on a finite interval with the sequence of knots . From isomorphism between and it holds that
Moreover, from isometric properties of clr transformation it follows that every simplicial spline in can be uniquely represented as
The resulting simplicial splines (with either orthogonal, or non-orthogonal -spline basis) can be used for representation of densities directly in . This is an important step in construction of methods of functional data analysis involving density functions, like for ANOVA modeling  or for the SFPCA method introduced in  and demonstrated in the next section. In the latter case -splines were indeed already used for construction of the procedure, although at the respective level of development the authors were not aware of that. With -splines one has a guarantee that methods are developed consistently in the Bayes space. Moreover, the possibility of having an orthogonal basis enables to gain additional features resulting from orthogonality of finite dimensional projection in combination with approximate properties of spline functions.
As usual, simplicial splines can be tuned according to concrete problem, with the advantage of their direct formulation in the Bayes space sense.
To illustrate smoothing of concrete data with a simplicial spline, 1000 values from standard normal distribution were simulated and the support was determined by minimum and maximum simulated values,
To illustrate smoothing of concrete data with a simplicial spline, 1000 values from standard normal distribution were simulated and the support was determined by minimum and maximum simulated values,. Data were collected in a form of histogram, where the breakpoints are equidistantly spaced. The representative points of the histogram cells are denoted as asterisks in Figure 13 (left) and these points form the input data , for smoothing purpose. The -values stand for discretized relative contributions to the overall probability mass, therefore discrete clr transformation  is needed to obtain a real vector with zero sum constraint (Figure 13, right). These data points were smoothed using the procedure from Section 3 and back-transformed to the original space. In the concrete setting , , , , ,