## 1 Introduction

In various problems of applied sciences the data take values in a manifold. Examples are circle and sphere-valued data. They appear in interferometric SAR imaging [63], as wind directions [72], and as orientations of flow fields [3, 75]. Further, in color image processing they appear in connection with color spaces such as HSI, HCL, as well as in chromaticity based color spaces [27, 84, 59, 60]. Other examples are data with values in the special orthogonal group which may express camera positions or orientations of aircrafts [83], Euclidean motion group-valued data [70] representing, e.g., poses as well as shape-space data [64, 20]. Another prominent manifold is the space of positive (definite) matrices of dimension endowed with the Fisher-Rao metric [69]. For each is a Cartan-Hadamard manifold which has nice differential-geometric properties. On the application side,

is the data space in diffusion tensor imaging

[68] which allows the quantification of diffusional characteristics of a specimen non-invasively [13, 56] and which is therefore helpful in the context of neurodegenerative pathologies such as schizophrenia [41] and autism [6]. Another application of positive matrices are deformation tensors; cf. [83].There are various directions on processing manifold valued data. Manifold-valued partial differential equations have for instance been considered in

[82, 29]. In particular, finite element methods for manifold-valued data are the topic of [51, 71]. Work on statistics on Riemannian manifolds can be found in [66, 22, 40, 23, 67, 39]. Optimization problems for manifold-valued data are for example the topic of [2, 1, 48] and of [55] with a view towards learning in manifolds. We also mention related work on optimization in Hadamard spaces [10, 9]. Nonsmooth variational methods for segmentation and denoising using Potts and Mumford-Shah models for manifold-valued data were considered in [92, 78]. TV functionals for manifold-valued data were considered in [42, 43, 44]; algorithmic approaches for TV minimization were considered in [79, 61, 49, 91, 19, 77]. Examples of applications of TV regularization for medical imaging tasks can be found in [15, 74, 16]. Recently, TV type regularization with indirect data terms, which in particular may be used to realize deconvolution models in the manifold setting, has been considered in [76].Interpolatory wavelet transforms for linear space data have been investigated by D. Donoho in [34]. Their analogues for manifold-valued data have been introduced by Ur Rahman, Donoho and their coworkers in [83]. Such transforms have been analyzed and developed further in [50, 47, 89]

. Typically, the wavelet-type transforms employ an (interpolatory) subdivision scheme to predict the signal on a finer scale. The ‘difference’ between the prediction and the actual data on the finer scale is realized by vectors living in the tangent spaces of the predicted signal points which point to the actual signal values; i.e., they yield actual signal values after application of a retraction such as the exponential map. These tangent vectors then serve as detail coefficients. Subdivision schemes for manifold-valued data have been considered in

[85, 45, 93, 88, 86]. Interpolatory wavelet transforms and subdivision are discussed in more detail in Section 2.1 below which deals with preliminaries. All the aforementioned approaches in this paragraph consider explicit schemes, i.e., the measured data is processed in a forward way using the analogues of averaging rules and differences in the manifold setting. In contrast, we here consider an implicit approach based on a variational formulation.In this paper, we consider wavelet sparse regularization for manifold-valued data. Let us briefly describe the univariate situation here; details and the multivariate setup are considered in Section 2. Let be data living in the manifold defined on a discrete grid of length We consider the problem (which, for , is a variant of the LASSO [80, 26] for manifold-valued data)

(1) |

Here, denotes the argument to optimize for; it may be thought of as the underlying signal generating the response where is a (necessarily) nonlinear operator which models for instance a system’s response. In the simplest case of pure denoising, may be implemented as the identity on Further instances of are detailed in Section 2; examples are the manifold valued analogues of convolution operators. The nearness between and the data is measured by which denotes the th power of the distance in The parameter vector regulates the trade-off between the data fidelity, i.e., the distance to the data and the regularizing term Let us more precisely describe the term which is the central topic of the paper. We let

(2) |

Here the symbol denotes the norm induced by the Riemannian scalar product in the point which is the point where the detail is a tangent vector at. The parameter is a smoothness parameter and the parameter stems from a norm type term. The details at scale of the interpolatory wavelet transform for manifold valued data are given by

(3) |

Here and ( the finest level) denote the thinned out target at scale and respectively. The coarsest level is denoted by denotes the application of an interpolatory subdivision scheme for manifold-valued data to the coarse level data evaluated at the index which serves as prediction for Then, denotes the tangent vector sitting in which points to Here, we use the symbol to denote the inverse of the Riemannian exponential function We point out that we use the normalization of the coefficients employed in [34] in the linear case. Details are discussed in Section 2.1. The second term addresses the coarsest scale; it measures the th power of the distance between the target items on the coarsest scale. As already pointed out, the case in (1), corresponds to the manifold analogue of the LASSO or -sparse regularization which, in the linear case, is addressed by (iterative) soft thresholding [35]. This choice is particularly interesting since it favors solutions which are sparse w.r.t. the considered wavelet expansion. The manifold analogue of -sparse regularization is obtained by using the regularizer

(4) |

The operator is used to count the number of elements in the corresponding set. Note that so the number of non-zero detail coefficients of the wavelet expansion is penalized. Similar to the linear case [87, 35, 26], potential applications of the considered sparse regularization techniques are denoising and compression.

### 1.1 Contributions

The contributions of the paper are as follows: *(i)*
we introduce and study a model for wavelet sparse regularization in the manifold setup;
*(ii)* we provide algorithms for the proposed models;
*(iii)* we show the potential of the proposed algorithms by applying them to
concrete manifold-valued data.
Concerning *(i)*, we propose a variational scheme employing manifold valued interpolatory wavelets in the regularizing term. In particular, we consider a sparsity promoting type term
as well as an type term.
We obtain results on the existence of minimizers for the proposed models.
Concerning *(ii)* we provide the details for an algorithmic realization of the proposed variational
model.
In particular, we apply the concepts of a generalized forward backward-scheme with Gauss-Seidel type update and a trajectory method as well as the well-established concept of a cyclic proximal point algorithm.
To implement these schemes we derive expressions for
the (sub)gradients and proximal mappings of the atoms of the wavelet regularizing terms.
This includes the manifold analogues of and sparse wavelet regularization.
Concerning *(iii)*, we provide a detailed numerical study of the proposed scheme.
We provide experiments with data living on the unit circle, in the two-dimensional sphere as well as in the space of positive matrices.

### 1.2 Outline of the paper

The paper is organized as follows. The topic of Section 2 is to derive a model for wavelet sparse regularization for manifold valued data and show its well-posedness. In Section 3 we consider the algorithmic realization of the proposed models. In Section 4 we provide a numerical investigation of the proposed algorithms. Finally, we draw conclusions in Section 5.

## 2 Model

In Section 2.1 we provide basic information on subdivision schemes and interpolatory multiscale transforms needed to define a variational model for wavelet sparse regularization for manifold valued data. In Section 2.2 we give a detailed formulation of the manifold analogue of the variational problem for wavelet sparse regularization. In Section 2.3, we obtain well-posedness results for the variational problem, i.e., we show that all considered problems have a minimizer.

### 2.1 Preliminaries: Subdivision schemes, interpolatory multiscale transforms

We here provide information on subdivision schemes and interpolatory multiscale transforms we need later on.

#### Subdivision schemes.

We consider the multivariate setup. The purpose of a subdivision scheme is to refine a grid function. More precisely, given a function which lives on a coarser dimensional grid, a subdivision scheme assigns a function living on a finer dimensional grid. Here, we use the multiindex notation to specify the points in the domain of the grid functions. In case of dyadic refinement and a linear subdivision scheme, the assignment of to data is done via the rule

(5) |

Here, is called the mask of the subdivision scheme We always assume that which means that reproduces constants. For simplicity, we will use dyadic refinement in the following. We point out, that our approach in this paper is also amenable for refinement using general dilation matrices; references dealing with subdivision schemes for more general dilation matrices are [53, 54, 52, 90]. In the following we are interested in interpolatory subdivision schemes. To explain the notion of an interpolatory scheme we notice that the mask on an -variate domain actually encodes convolution operators A scheme is interpolatory if the convolution operator corresponding to the coarse level grid is the identity operator. In formulas, this means that for all were denotes the Kronecker symbol which equals one if and only if the indices are equal, and zero else. Examples of interpolatory schemes are the well-known four-point scheme [36] or the schemes of Delaurier and Dubuc [32] with the particular instances of the first order interpolatory scheme and the third order interpolatory Deslaurier-Dubuc scheme given by the masks

(6) |

respectively. These are univariate schemes which can be adapted such as to work in any multivariate domain of dimension by a tensor product construction: the mask of the multivariate scheme is given by where is the mask of the univariate scheme, and is a multiindex.

In recent years, there has been a lot of interest in nonlinear subdivision schemes and in particular in subdivision schemes dealing with geometric or manifold-valued data [85, 45, 93, 88, 86]. For geometric data, typically geometric analogues of linear subdivision schemes are considered. This means that the coefficients stored in the mask are used to define a geometric scheme employing a manifold construction replacing the averaging operation in affine space. Examples of such constructions include geodesic averaging [85], - schemes [83] and intrinsic means [88]. The latter have turned out to be particularly suitable [86, 46, 47]. For this reason, and due to their variational nature, we concentrate on geometric schemes based on intrinsic means. The intrinsic mean variant of the subdivision scheme is given by

(7) |

The intrinsic mean in the manifold is given by

(8) |

It replaces the affine average on the right hand side of (5) (which is the affine space variant.) The coefficients of the scheme are the coefficients of mask which are the same for a linear scheme and its geometric analogue. Although the subdivision operator is nonlinear in the general manifold setting we omit the brackets in the notation since this is particularly convenient in connection with index notation.

The first definition of the intrinsic mean can be traced back to Fréchet; a lot of work has been done by Karcher [57]. For details including an historic overview we refer to [4]; see also [58]. Due to its use as a means of averaging in a manifold, it is employed as a basic building in various works; e.g., [68, 39, 81] as well as many references in the paragraph above. Various work deals with the numerical computation of the center; see, for instance [5, 38, 7].

The mean is not unique for all input constellations. However, for data living in a small enough ball uniqueness is guaranteed. For a discussion, precise bounds and an extensive list of reference concerning this interesting line of research we refer to Afsari [4]. A reasonable way to deal with the nonuniqueness is to consider the whole set of minimizers in such a case. In this paper we do so where – in case of nonuniqueness – we add an additional constraint originating from our variational formulation described in Section 2.2.

#### Interpolatory wavelet transforms.

Interpolatory wavelet transforms for data living in linear spaces have been considered by D. Donoho in [34]. To explain the idea, we consider grid functions at the finer scale and the coarser scale which are related via for all multiindices and levels (which might originate from point samples of a function defined on with values in ) The interpolatory wavelet transforms for real valued data in [34] are then defined by mapping the signal to the details

(9) |

where is a multiindex, denotes the ordinary subtraction in a vector space, and denotes a linear interpolatory subdivision scheme. Please note the slight abuse of notation in the expression which represents the scheme applied to the data evaluated at the multiindex The transformation then maps

(10) |

where is a grid function at the finest scale used as input.

Interpolatory transforms for manifold-valued data have been introduced in [83]. In the manifold situation, we interpret (9) as follows: (i) the symbol denotes a geometric subdivision scheme , here concretely the intrinsic mean analog given by (8) of a linear interpolatory scheme, e.g., a multivariate tensor product Deslaurier-Dubuc scheme; (ii) the symbol is interpreted via the the inverse of the Riemannian exponential mapping concretely, We note that, in contrast to the real valued situation, where the tangent spaces at all points are naturally identified, the details now are tangent vectors with sitting in Concerning analytic results for these transforms we refer to [50, 47, 89].

### 2.2 Wavelet Sparse Regularization in the Manifold Setup – Variational Formulation

The previously discussed interpolatory wavelet transforms are explicit means of processing data in the sense that analogues of averaging rules and differences in the manifold setting are applied to the data directly. In contrast, we here consider an implicit approach based on a variational formulation where the interpolatory wavelet transform is used as a regularizing term.

In the following, we always consider a complete and connected Riemannian manifold (with its canonical metric connection, its Levi-Civita connection). More precisely, we consider a manifold with a Riemannian metric which is a smoothly varying inner product in the tangent space of each point . According to the Hopf-Rinow theorem, the complete manifold is geodesically complete in the sense that any geodesic can be prolongated arbitrarily. The Levi-Civita connection is the only connection which is symmetric and which is compatible with the metric. For an account on Riemannian geometry we refer to the books [73, 33].

Extending the discussion started in the introduction, we derive a model for wavelet sparse regularization for manifold-valued data in the multivariate situation. We consider data where denotes a multiindex consisting of components. We propose to consider the problem

(11) |

where the target variable is a multivariate function defined on a regular grid, and, in contrast to (1), denotes a multiindex consisting of components representing the -dimensional domain of For instance, for a classical image and for a volume Further, we use the usual multiindex notation to denote

(12) |

the th power of the distance in The parameter vector regulates the trade-off between the data fidelity, i.e., the distance to the data and the regularizing term Possible choices of the imaging operator are discussed below.

Our central topic is the precise definition of the wavelet sparse regularizing term in an -variate situation. We assume that the signal dimensions and the scale level are related via with the scalar and the multiindices Then, the coarsest scale has size with data points in the th component, . We define by

(13) |

Here, denotes a multiindex, which, in dependence on the level takes values in the range With the details where defined by (9) of the interpolatory wavelet transform, we have

(14) |

Here, denotes the norm induced by the Riemannian scalar product in the points and we recall that the symbol denotes the spatial dimension of the domain of Concerning the second term which addresses the coarsest level, we let , be the th unit vector. The term measures the th power of the distance between neighboring data items on the coarsest scale w.r.t. any of the coordinate directions. We note that often three parameters are considered in connection with the linear space analogue of the regularizer in (14), cf. [26]. In this respect, we here consider the particular case of As already pointed out we are particularly interested in the case in (14), which corresponds to the manifold analogue of -sparse regularization.

The manifold analogue of -sparse regularization is given by using the regularizer

(15) |

which incorporates the number of non-zero detail coefficients

#### Handling nonunique means.

For the definition of the details in (9) we use geometric subdivision schemes and we have defined geometric subdivision scheme using intrinsic means in (7). As explained above, intrinsic means are locally unique. (We note that it is often the case in differential geometry that the considered objects are only locally unique, e.g., geodesics.) To avoid complications, we add an additional constraint originating from our variational formulation. In case of non-uniqueness, we denote the set of all minimizers in (8) by and single out those which minimize the corresponding expression used to define More precisely, in such cases, we overload the previously given definition in (9) by

(16) |

This means that we choose the details of smallest size (in the sense of the Riemannian metric).

#### Instances of the imaging operator .

We here discuss various instance of possible imaging operators to give an impression to the reader. The discussion is by no means complete. At first, letting be the identity in the manifold corresponds to pure denoising. Inpainting situations can be modeled by removing the missing summands. Let us be more precise. To this end, let us consider a matrix with real-valued entries given as

(17) |

th positive row sums, i.e., for all (Note that we do not require the particular items to be nonnegative.) Here can be read as multiindices. Using the matrix as a kind of kernel, we may define the th component of by

(18) |

Then, the denoising situation corresponds to

being the identity and the inpainting situation corresponds to removing those rows of the identity matrix which correspond to missing data. But these are only special cases; actually, we can use any matrix

with positive row sums; see [76]. In particular, by this construction, we can realize the geometric analogue of any convolution operator which reproduces constants. Explicitly, in the bivariate situation (without multiindices), this reads(19) |

where denotes the bivariate convolution kernel. We further notice, that in the manifold case, where each data item can be more complex than a real number, also may denote an item/pixel-wise construction or reconstruction process. An example of such a pixel-wise construction process is the pixel-wise generation of diffusion tensor from DWI measurements in diffusion tensor imaging [16]. Another example in connection with shape spaces is [74].

#### Handling the boundary of the image or volume.

In the context of wavelets, classical means of dealing with data on a bounded domain consisting of a Cartesian product of intervals is to extend the data beyond the domain by either extending the data by

(zero-padding), by periodizing, or by extending the data by reflection. In the setup of interpolatory manifold valued data there is no distinguished zero-element such that the notion of zero padding does not generalize immediately. However, periodizing and extending the data by reflection are notions which directly carry over to the manifold and interpolatory setup. In the context of orthogonal and biorthogonal wavelets on the interval more sophisticated methods using tailored boundary rules have been developed. For details we refer to

[30] and the references therein. Inspired by these approaches for orthogonal and biorthogonal wavelets in the linear case, we use specially tailored boundary rules in experiments in this work. In particular, for a th order linear univariate Deslaurier-Dubuc scheme, a natural boundary treatment consists of fitting a th order polynomial to the leftmost or rightmost data points, and evaluating these polynomials at the corresponding half-integers.### 2.3 Existence of minimizers

We here derive results on the existence of minimizers for the variational problem (11) of wavelet sparse regularization using the regularizers of (14) and of (15). For the regularizer with we get the existence of minimizers without additional constraints on the measurement operator and the manifold . This result is formulated as Theorem 3. For the regularizer with we need some additional constraints on This result is formulated as Theorem 4. In particular, existence is guaranteed for the denoising situation with being the identity. Finally, we give a corresponding existence result for wavelet sparse regularization using type regularizing terms in Theorem 6. We note that, for compact manifolds – which include the spheres in euclidean space, the rotation groups, as well as the Grassmannian manifolds – we get the existence of minimizers of the wavelet regularizers regularizer with and of the wavelet sparse regularizers using type regularizing terms without additional constraints on the measurement operator

We start with some preparation.

###### Lemma 1.

The regularizing term of (14) is lower semicontinuous.

###### Proof.

In order to show that the sum in (14) is a lower semicontinuous function of it is enough to show that each member of the sum is a lower semicontinuous function of The members of the form are continuous by the continuity of the distance function . We show that the mappings are lower semicontinuous functions on To this end, we consider a sequence each such that in as Since the power function is increasing, it is sufficient to show that

(20) |

Here, in case of non-unique details, we choose a realization and a corresponding prediction according to (16). We note that, by (16), the value does not depend on the choice of

Since, by assumption, in we find a point and a positive number such that all members together with all and are contained in a common ball around with radius or in other words, all members of all sequences are contained in a common bounded set. In particular, cf. (13). As explained in Section 2.1 a subdivision scheme encodes convolution operators with kernels given by where denotes the mask of the scheme. Each convolution operator is defined via the intrinsic mean; cf. (7) and (8). Hence, for each convolution operator defined via we may apply [76, Lemma 2] to obtain a positive number such that all means are contained in a common ball Taking the maximum of this radii w.r.t. the averaging operators, where denotes the dimension of the domain, we have that all are contained in the common ball for all Hence, the form a bounded sequence. Further, the sequence is bounded as well as a convergent sequence. As a next step, we choose a subsequence indexed by such that

(21) |

and now invoke the boundedness of the sequences. By the Hopf-Rinow theorem, since is assumed to be geodesically complete, we may extract convergent subsequences (which we, abusing notation for better readability, also denote by the indexation ) such that the limits

(22) |

exist, and such that converges to since is assumed to converge to Since and the distance function is continuous, we have

(23) |

using (21) for the last identity. We next show that To see this, we assume towards a contradiction that is not in Then, there is an element such that, by the definition in (8), Since as by (22), there is an index such that which contradicts being a minimizer of the corresponding sum. Hence By the definition in (16) this implies for any as in (16). Hence, Together with (23), this implies

(24) |

which means that the mappings are lower semicontinuous and, in turn, yields the assertion of the lemma. ∎

###### Lemma 2.

The type regularizing term of (15) is lower semicontinuous.

###### Proof.

We show that the mappings

(25) |

are lower semicontinuous functions of This implies the assertion of the lemma since To see the lower semicontinuity of let in as If there is nothing to show, so we assume that Then, and since this implies for sufficiently large Hence, for suffiently large , and therefore, We next study the lower semicontinuity of and again let As above, if there is nothing to show. So we may assume that which means that by (16). We show that for sufficently large which then implies that Assume to the contrary, that there is a subsequence such that Then, (which exists by our assumption) is a mean for data But this means which is a contradiction. Hence, for sufficiently large and therefore In summary, and are lower semicontinuous which implies that is lower semicontinuous. ∎

###### Theorem 3.

###### Proof.

By Lemma 1, is lower semicontinuous. We consider a sequence of signals and use the notation to denote the diameter of . We show that as To this end we consider the sums in (14), an note that Further, where the constant is smaller than where denotes the multiindex containing the dimensions of on the coarsest scale; cf. the description near (13). Hence, with independent of Therefore, and so as Together with the lower semicontinuity of all requirements of [76, Theorem 1] are fulfilled and we may apply this theorem to conclude the assertion. ∎

We note that a statement analogous to Theorem 3 does not hold for the regularizer of (14) with since then coercivity cannot be ensured in general. To account for that, we give a variant of Theorem 3 which imposes some additional constraints to but then also applies to the situation

###### Theorem 4.

The variational problem (11) of wavelet regularization using the regularizers of (14) with has a minimizer provided is an operator such that there is a constant such that, for any signal , the coarsest level of

may be estimated by

(26) |

for all In particular, the problem (11) has a minimizer for being the identity. Furthermore, the problem always has a solution, when the manifold is compact.

For the proof of Theorem 4 we employ [76, Theorem 7] which we here state for the reader’s convenience.

###### Theorem 5.

Let be pairs of (a priori fixed) indices. We assume that is lower semicontinuous. We further assume that is a regularizing term such that, for any sequences of signals the conditions and for some and for all and all imply that If is an imaging operator such that there is a constant such that, for any signal , for all , then the variational problem

(27) |

has a minimizer.

###### Proof of Theorem 4.

We apply Theorem 5. The lower semicontinuity of the regularizer is shown in Lemma 1. Towards the other condition of Theorem 5, let be a sequence such that and such that for some all and all We have to show that Towards a contradiction, assume that there is a subsequence of and such that for all We show that there is a constant such that for all and Since and since

Comments

There are no comments yet.