# Constraint matrix factorization for space variant PSFs field restoration

Context: in large-scale spatial surveys, the Point Spread Function (PSF) varies across the instrument field of view (FOV). Local measurements of the PSFs are given by the isolated stars images. Yet, these estimates may not be directly usable for post-processings because of the observational noise and potentially the aliasing. Aims: given a set of aliased and noisy stars images from a telescope, we want to estimate well-resolved and noise-free PSFs at the observed stars positions, in particular, exploiting the spatial correlation of the PSFs across the FOV. Contributions: we introduce RCA (Resolved Components Analysis) which is a noise-robust dimension reduction and super-resolution method based on matrix factorization. We propose an original way of using the PSFs spatial correlation in the restoration process through sparsity. The introduced formalism can be applied to correlated data sets with respect to any euclidean parametric space. Results: we tested our method on simulated monochromatic PSFs of Euclid telescope (launch planned for 2020). The proposed method outperforms existing PSFs restoration and dimension reduction methods. We show that a coupled sparsity constraint on individual PSFs and their spatial distribution yields a significant improvement on both the restored PSFs shapes and the PSFs subspace identification, in presence of aliasing. Perspectives: RCA can be naturally extended to account for the wavelength dependency of the PSFs.

Comments

There are no comments yet.

## Authors

• 2 publications
• 5 publications
• 1 publication
• 1 publication
• 1 publication
• ### A Very Fast Algorithm for Matrix Factorization

We present a very fast algorithm for general matrix factorization of a d...
11/02/2010 ∙ by Vladimir Nikulin, et al. ∙ 0

read it

• ### Dynamic Partial Sufficient Dimension Reduction

Sufficient dimension reduction aims for reduction of dimensionality of a...
09/26/2019 ∙ by Lu Li, et al. ∙ 0

read it

• ### Super-resolution method using sparse regularization for point-spread function recovery

In large-scale spatial surveys, such as the forthcoming ESA Euclid missi...
10/16/2014 ∙ by Fred Maurice Ngolè Mboula, et al. ∙ 0

read it

• ### PSF field learning based on Optimal Transport Distances

Context: in astronomy, observing large fractions of the sky within a rea...
03/17/2017 ∙ by F. M. Ngolè Mboula, et al. ∙ 0

read it

• ### Simplex-Structured Matrix Factorization: Sparsity-based Identifiability and Provably Correct Algorithms

In this paper, we provide novel algorithms with identifiability guarante...
07/22/2020 ∙ by Maryam Abdolali, et al. ∙ 0

read it

• ### PIntMF: Penalized Integrative Matrix Factorization Method for Multi-Omics Data

It is more and more common to explore the genome at diverse levels and n...
03/03/2021 ∙ by Morgane Pierre-Jean, et al. ∙ 0

read it

• ### Spatial Spread Sampling Using Weakly Associated Vectors

Geographical data are generally autocorrelated. In this case, it is pref...
10/29/2019 ∙ by Raphaël Jauslin, et al. ∙ 0

read it

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

In many applications such as high precision astronomical imaging or biomedical imaging, the optical system introduces a blurring of the images that needs to be taken into account for scientific analyses, and the blurring function, also called Point Spread Function (PSF), is not always stationary on the observed field of view (FOV). A typical example is the case of the Euclid space mission [1], to be launched in 2020, where we need to measure with a very high accuracy the shapes of more than one billion of galaxies. An extremely important step to derive such measurements is to get an estimate of the PSF at any spatial position of the observed images. This makes the PSF modeling a critical task. In first approximation, the PSF can be modeled as a convolution kernel which is typically space and time-varying. Several works in image processing [2] and specifically in astronomy [3, 4], address the general problem of restoring images in presence of a space variant blur, assuming that the convolution kernel is locally known.

In astronomical imaging, unresolved objects such as stars, can provide PSF measurements at different locations in the FOV. Nevertheless, these images can be aliased given the CCD sensors sizes which makes a super-resolution (SR) step necessary. This is the case for instance for the Euclid mission.

The SR is a widely studied topic in general image processing literature[5]. In astronomy, softwares IMCOM [6] and PSFEx[7], which propose an SR option, are widely used. The IMCOM provides an oversampled output image from multiple under-sampled input images, assuming that the PSF is perfectly known. It does not deal with the PSF restoration itself. The PSFEx treats SR as an inverse problem, with a quadratic regularizer. In [8], a sparsity based super-resolution method was proposed, assuming that several low resolution (LR) measurements of the same PSF are available. In practice, we generally don’t have such multiple measurements.

In this paper, we consider the case where the PSF is both space variant and under-sampled, and we want to get an accurate modeling at high resolution of the PSF field, assuming we have under-sampled measurements of different PSFs in the observed field. We assume that the PSFs vary slowly across the field. Intuitively, this implies a compressibility of the PSFs field, which leads us to the question of what would be a concise and easily understandable representation of a spatially indexed set of PSFs.

## 2 Notations

We adopt the following notation conventions:

• bold low case letters are used for vectors;

• bold capital case letters are used for matrices;

• we treat vectors as column vectors unless explicitly mentioned otherwise.

For a matrix , we note the coefficient of the line, or its column and or its line, that we treat as a line vector. More generally for , we note the matrix obtained by extracting the columns of indexed from to ; for , is defined analogously with respect to ’s lines. For a vector , refers to its component. For a given integer , we note

the identity matrix of size

. Let be a euclidean space ( can be or if we consider spatial or spatio-temporal data respectively). We note a set of vectors in . In this paper, we only consider the case ; will be a set of positions in a plan.

## 3 The PSFs field

### 3.1 The observation model

We assume that we have an image , which contains unresolved objects such as stars, which can be used to estimate the PSFs field. Noting one of these objects at spatial position , is therefore a small patch of with pixels, around the spatial position . We will write as a 1D vector. The relation between the ”true” PSF and the noisy observation is

 yk=Mkxk+nk (1)

where is a linear operator and is a noise that we assume to be Gaussian and white. We will consider two kinds of operators in this paper: the first one is the simple case where and we have the number of pixels in is equal to , and the second one is a shift+downsampling degradation operator and , where is the downsampling factor in lines and columns, with .

Noting the matrix of lines and columns of all observed patches, the matrix of all unknown PSFs, we can rewrite Eq. 1 as

 Y=F(X)+N (2)

where .

This rewriting is useful because, as we discuss in the following, the different PSFs are not independent, which means that the problems of Eq. 1 should not be solved independently for each . In other terms, the vectors belong to a specific unknown manifold that needs to be learned by using the data globally.

### 3.2 The data model

Let be a dimensional subspace of embedding the PSFs field. We assume that there exists a continuous function , so that . The regularity of translates the correlation of the data in space (and time).

Let be a basis of . By definition, we can write each as a linear combination of the , , , or equivalently

 X=SA (3)

where and is a matrix containing the coefficients of the vectors () in the dictionary . Each column of the matrix , that we also refer to as an atom, can be seen as an eigen PSF, i.e. a given PSF’s feature distributed across the field.

### 3.3 The inverse problem

We need therefore to minimize , which is an ill posed problem due to both the noise and the operator , denoting the Frobenius norm of a matrix. There are several constraints that may be interesting to use in order to properly regularize this inverse problem:

• positivity constraint: the PSF should be positive;

• low rank constraint: as described above, we can assume that , which means that we can instead minimize

 minA,S∥Y−F(SA)∥2F; (4)

we assume that ; this dimension reduction has the advantage that there are much less unknown to find, leading to more robustness, but the problem is now that the cost function is not convex anymore;

• smoothness constraint: we can assume that the vectors are structured; the low rank constraint does not necessarily impose to be smooth or piece-wise smooth; adding an additional constraint on atoms, such as a sparsity constraint, allows to capture spatial correlations within the PSFs themselves; an additional dictionary can therefore be introduced which is assumed to give a sparse representation of the vectors ;

• proximity constraint: we can assume that a given at a position is very close to another PSF at position if the distance between and is small; this means that the field must be regular; this regularity can be forced by adding constraints on the lines of the matrix ; indeed, the values relative to a line correspond to the contribution of the th eigen PSF to locations relative to the spatial positions .

We show in section 5 how these four constraints can be jointly used to derive the solution. Let first review existing methods susceptible to solve this problem.

## 4 Related work

In all this section, refers to the observed data set . In the first part, the aforementioned degradation operator is simply the identity. Therefore we review some dimension reduction methods. In the second part is a shifting and downsampling operator; we present a PSF modeling software dealing with this more constraining setting.

### 4.1 Dimension reduction

The principal components analysis is certainly one of the most popular mathematical procedure in multivariate data analysis and especially, dimension reduction. In our case, we want to represent

’s elements using vectors, with . A PCA gives an orthonormal family of vectors in

so that the total variance of

along these vectors directions is maximized. By definition, the PCA looks for redundant features over the whole data set. Therefore, in general, the principal components neither capture localized features (in sense of ) nor have a simple physical interpretation.

In [9], a ”regularized” PCA is proposed to address this shortcoming for spatial data analysis in atmospheric and earth science. Indeed, as a PCA, the method solves the following problem,

 minA∥Y−YATA∥2F,s.t..t.AAT=Ir, (5)

for some chosen small . Moreover, it jointly imposes a sparsity constraint and a smoothing penalties with respect to the space , on the matrix lines. This way, with the right balance between those two penalties, one favors the extraction of localized spatial features, making the interpretation of the optimal easy. Yet, there is no obvious way of setting the sparsity and smoothness parameters, which are crucial; moreover, unless the data actually contain spatially localized and non-overlapping features, the coupled orthogonality and sparsity constraint is likely to yield a biased approximation of the data.

In the context of remote sensing and multi-channel imaging, two ways of integrating spatial information into PCA are proposed in [10]; the set is made of multi-channel pixels. In the first way, the author introduces a weighting matrix indicating the relative importance of each pixel. For instance, the weight of a given pixel can be related to its distance to some location of interest in . Then, the computation of the covariance matrix of image bands is slightly modified to integrate this weighting. This idea is close to the methodology proposed in [11]. As a consequence, one expects to recover spectral features spatially related to some location of interest within the most important ”eigen-pixels”. Yet, we do not have any specific location of interest in and we rather want to recover relevant features across the whole data set.

The second approach aims at taking into account the spatial associations and structural properties of the image. To do so, modified versions of the image bands covariance matrices are calculated, with increasing shifts between the bands, up to a predetermined maximum shifting amplitude. These covariance matrices, including the ”regular” one, are averaged and the principal components are finally derived. Intuitively, one expects the spectral features present in structured images regions to be strengthened and therefore captured into the principal components. However, we consider a general setting where the data are randomly distributed with respect to , which makes the shifted covariances matrices ill-defined.

A review of PCA applications and modifications for spatial data analysis can be found in [12].

In case the data lie on or are close to a manifold of dimension embedded in , one can consider using one of the numerous non-linear dimension reduction algorithms published in the manifold learning literature, such as GMRA [13], [14]. The idea is to partition the data in smaller subsets of sample close to each other in the sense of the manifold geometry. From this partionning, the manifold tangent spaces are estimated at subsets locations; estimates are then simply given by the best regressions of these subsets with dimensional affine subspaces. The method includes some multiresolution considerations that are not relevant to our problem. This procedure provides a dictionary in which each of the original samples need at most elements to be represented. Moreover, the local processing of the data, which is necessary in this setting because of the manifold curvature, makes this approach somehow compatible with the considered problem. Indeed, by hypothesis, the closer two samples will be in sense of , the closer they will be in , and the more likely they will fall into the same local cluster.

Another interesting alternative to the PCA can be found in [15]. This construction called ”Treelets” extracts features by uncovering correlated subsets of variables across the data samples. It is particularly useful when the sample size is by far smaller than the data dimensionality (), which does not hold in the application we consider in the following.

### 4.2 Super-resolution

In this subsection, takes the following form:

 F(X)=[M1x(c)1,…,Mpx(c)p], (6)

where is a warping and downsampling matrix. Since we consider a set of compact objects images, the only geometric transformation one has to deal with for registration is the images shifts with respect to the finest pixel grid, which can be estimated using the images centroids [8].

To the best of our knowledge, the only method dealing with this specific setting is the one used in the PSF modeling software PSFEx [7]. This method solves a problem of the form:

 minΔS12∥Y−F((ΔS+S0)A)∥2F+λ∥ΔS∥2F. (7)

is a rough first guess of the model components. Each line of the weight matrix is assumed to follow a monomial law of some given field’s parameters. The number of components is determined by the maximal degree of the monomials. For instance, let say that we want to model the PSFs variations as a function of their position in the field with monomials with degrees up to 3, then:

• one needs 6 components corresponding to the monomials ;

• assuming that the PSF in ’s columns order is located at then the column of is given by up to a scaling factor.

This method is used for comparisons in the Numerical experiments part.

## 5 Resolved Components Analysis

### 5.1 Matrix factorization

We have seen that we can describe the PSFs field as

 [f(u1),…,f(up)]=X=SA. (8)

The matrix is independent of the spatial location, and the line of gives the contribution of the vector to each of the samples. As discussed in section 3.3, the field’s regularity can be taken into account by introducing a structuring of the matrix . We can write:

 A[i,:]T=N∑l=1αilυl,i=1…r, (9)

where is a set of vectors spanning . Equivalently, we can write , where and is a matrix (see Fig. 1).

### Physical interpretation

An interesting way to well interpret is to consider the ideal case where the measurements are distributed following a regular grid of locations . In this case, we can expand the vector using the Discrete Cosine Transform (DCT), and vectors in Eq. 9 are regular cosine atoms, and the column index of the matrix is related the frequency. Hence, lines relative to high frequencies will be related to quicky varying PSF components in the field, while lines related to low frequencies will be related to PSFs stable components. In practice, the sampling is not regular and the DCT cannot be used, and has to be learned in a way to keep the harmonic interpretation valid. We want some lines to describe stable PSFs components on the FOV, and other to be more related to local behavior.

### 5.2 The proximity constraint on A

As previously mentioned, we want to account for the PSFs field’s regularity by constraining ’s lines. Specifically, we want some lines to determine the distribution of stable features across the PSFs field while we want other lines to be related to more localized features. In order to build this constraint, let first consider the simple case of a one dimensional field of regularly spaced PSFs.

#### 5.2.1 Regularly distributed observations

We first assume that .

We suppose that , for some integer and we consider the 1D vector defined as follows:

 ψi=ψp−i+1=−1/|ui−uk+1|e ifi≠k+1, (10) ψi=p∑j=1j≠k+1a/|uj−uk+1|e, otherwise (11)

for some positive reals and . We suppose that is normalized in norm. We refer to this family of signals, parametrized by and as ”notch filters”, in reason of their frequency responses shapes. Some examples can be found in Fig. 2. One can observe that is essentially a high pass filter. As increases, the notch structure clearly appears, with an increasing notch frequency. It is clear that, for a vector , minimizing the functional promotes vectors with spectra concentrated around the notch frequency corresponding to the chosen values of and . We can directly use this family of filters to constraint as follows: we define the functional

 Ψ:Mrp(R)↦R+,A→r∑i=1Ψei,a(A[i,:]), (12)

where is a set of reals verifying and . Because the notch frequency increases with , minimizing promotes varying level of smoothness of ’s lines, which is what we wanted to achieve. The filter and the functional definitions can be extended to higher dimensions of the space by involving a multidimensional convolution [16]. Therefore, if the PSFs are distributed over a regular grid with respect to , one can implement the proximity constraint by solving

 minA,S12∥Y−F(SA)∥2F+λΨ(A), (13)

for some positive . Yet, in practical applications, the observations are in general irregularly distributed. In the next section, we propose a slightly different penalty which is usable for arbitrary observations distributions.

#### 5.2.2 General setting

Let define the functional

 ˆΨe,a:Rp↦R+,v→p∑k=1(p∑i=1i≠kavk−vi∥uk−ui∥e2)2, (14)

where and are positive reals. Minimizing tends to enforce the similarity of close features, with respect to ; in other terms, the more is large, the less important is in and somehow determines the radius of similarity. For , because of the uniform spacing of the values and the decay of , for sufficiently high ; we give more details on this approximation in A in the 1D case. However unlike , the functional is still relevant without the uniform sampling hypothesis and we expect qualitatively the same behavior as with respect to the frequency domain if the data sampling is sufficiently dense. Therefore we use instead of in the functional of Eq.13. Besides, we use the term frequency even for randomly distributed samples.

#### 5.2.3 Flexible penalization: the redundant frequencies dictionary V

The efficiency of the regularization of the problem 13 relies on a good choice of the parameters and . Indeed, if the associated notch frequencies does not match with the data set frequency content, the regularization will more or less bias the PSF estimation depending on the Lagrange multiplier . Besides, setting this parameter might be tricky. We propose an alternate strategy for constraining , which leads to the factorization model introduced in Section 5.1 and still builds over the idea of notch filters.

For we can write

 ˆΨe,a(v)=∥Pe,av∥22, (15)

where is a matrix defined by

 Pe,a[i,j]=−1∥ui−uj∥e2 ifi≠j, (16) Pe,a[i,i]=p∑j=1j≠ia∥ui−uj∥e2, (17)

. Therefore,

 ˆΨe,a(v)=vTQe,av, (18)

where

and is symmetric and positive. We consider the singular values decomposition (SVD) of

: . The diagonal values of are sorted in decreasing order. We note the vector made of these diagonal values, so that . Considering the reduced form , it is clear that minimizing promotes vectors correlated with

last eigenvectors. In the case of regular sampling with respect to

, these eigenvectors are the harmonics close to the notch frequency of . We can rewrite the functional accordingly:

 Ψ(A)=r∑i=1p∑j=1dei,a[j]⟨v,Vei,a[:,j]⟩2. (19)

It is clear from this expression that minimizing

enforces the selection of the eigenvectors associated with the lowest eigenvalues in the set

for describing ’s lines. This can be seen as a sparsity constraint over ’s lines with respect to the atoms ; yet, the small subset of atoms which will carry most of the information is somehow predefined through the eigenvalues . This is unsuitable if the notch filters parameters are poorly selected; on the contrary, one would like to select in a flexible way the atoms which fit the best the data.

Let suppose that we have determined a set of parameters so that the filters notch frequencies would cover the range of significant frequencies (with respect to ) present in the data. As previously, we note the eigenvector’s matrices associated with the operators . We note . Considering the preceding remark, we introduce the following problem:

 minα,S12∥Y−F(SαVT)∥2Fs.t.∥α[l,:]∥0≤ηl,l=1…r (20)

Now . Each line of is a sparse linear combination of ’s lines, and the ”active” atoms are optimally selected according to the data. The choice of the parameters and is discussed in a forthcoming section.

#### 5.2.4 A connection with graphs theory

In case , the matrix is the laplacian of an undirected fully connected and weighted graph with nodes , such that the weight of the vertex connecting a node to a node is [17]. As proposed in spectral graph theory [18], this gives a natural interpretation of (and ) eigenvectors as harmonic atoms in the graph’s geometry. Each line of the matrix can be seen as a function defined on a family of graphs determined by the observations locations, so that we enforce the regularity of ’s lines according to the graphs geometry. Our approach is thereby close to the spectral graphs wavelets framework[19]. However, the graphs wavelets are built on a single graph and a scaling parameter allows one to derive wavelets atoms corresponding to spectral bands of different sizes. In our case, the scales diversity is accounted for by building a dictionary of harmonics corresponding to different graphs. Indeed, as increases, the weight associated to the most distant nodes (in the sense of ) becomes negligible, which implies that the corresponding graph laplacian is determined by nearby nodes, yielding ”higher” frequencies harmonics.

### 5.3 The smoothness constraint on S

As previously mentioned, each PSF is a structured image. We can account for this through a sparsity constraint. This has proven effective in multiple frame PSFs super-resolution [8].

Since we do not estimate individual PSFs directly, we instead constraint the eigen PSFs which are ’s columns. Specifically, we promote ’s columns sparsity with respect to a chosen dictionary . By definition, a typical imaging system’s PSF concentrates most of its power in few pixels. Therefore a straightforward choice for is . In other words, we will enforce the sparsity of ’s columns in the pixels domain.

On the other hand, we take as the second generation Starlet forward transform [20], without the coarse scale. The power of sparse prior in wavelet domain for inverse problems being well established, we shall online emphasize the fact that this particular choice of wavelet is particularly suitable for images with nearly isotropic features.

### 5.4 Algorithm

We define the sets and . The aforementioned constraints leads us to the following optimization problem:

 minα,S12∥Y−F(SαVT)∥2F+r∑i=1∥wi⊙Φssi∥1+ιΩ1(α)+ιΩ2(S,α). (21)

where denotes the Hadamard product and denotes the indicator function of a set (see B). The term promotes the sparsity of columns with respect to . The vectors weight the sparsity against the other constraints and allow some adaptivity of the penalty, with respect to the uncertainties propagated to each entry of [8].

The parametric aspects of this method are made clear in the subsequent sections.

The Problem 21 is globally non-convex because of the coupling between and and the constraint. In particular, the feasible set , with is non-convex.

Therefore, one can at most expect to find a local minimum. To do so, we consider the following alternating minimization scheme:

1. Initialization: , with ,

2. For k = 0 …:
(a) ,

(b) .

The problem (a) remains non-convex; yet there exists heuristic methods allowing one to approach a local minimum

[21, 22, 23]. The problem (b) is convex and can be solved efficiently.

One can note that there is no positivity constraint in the sub-problem (a). This choice is motivated by two facts:

• the feasible set of (b) is non-empty for any choice of ;

• allowing to be outside of the global problem feasible set (for fixed) brings some robustness regarding local degenerated solutions.

There is an important body of work in the literature on alternate minimization schemes convergence, and in particular in the non-convex and non-smooth setting (see [24] and the references therein). In the proposed scheme, the analysis is complicated by the asymmetry of the problems (a) and (b).

We define the function

 H(α,S)=12∥Y−F(SαVT)∥2F+r∑i=1∥wi⊙Φssi∥1 (22)

and the matrix . One immediate sufficient condition for the sequence to be decreasing (and thereby convergent) is

 H(αk+1,ˆSk)≤H(αk,Sk) (23)

which occurs if stays sufficiently close to . Although we do not prove this always holds true, we observe on examples that the matrix in general only has a few and small negative entries for . This follows from the adequacy of the dictionary for sparsely describing ’s lines.

The complete method is given in Algorithm 1. The resolution of the minimization sub-problems is detailed in appendices.

### 5.5 Parameters setting

#### 5.5.1 Components sparsity parameters

We consider the terms of the form , where is the alternate minimization index and is the re-weighted minimization index. We first suppose that . We decompose as:

 wk,j=κβk,j⊙λk (24)

Let consider the minimization problems in in Algorithm 1. Assuming that we simply minimize the quadratic term using the following steepest descent update rule,

 Sm+1=Sm+μF∗(Y−F(SmAk))ATk, (25)

for a well chosen step size ,

being the adjoint operator one can estimate the entry-wise standard deviations of the noise which propagates from the observations to the current solution

. For a given matrix in , we assume that takes the following general form . We define as:

 F2(X)=[(M1⊙M1)X[:,1],…,(Mp⊙Mp)X[:,p]] (26)

We note the observational noise (or model uncertainty) that we assume to gaussian, white and centered. The propagated noise entry-wise standard deviations are given by

 Σk=μ√F2∗(Var(B))(ATk⊙ATk), (27)

where returns entry-wise variances and is the adjoint operator of . Now one can proceed to a hypothesis testing on the signal presence in each entry of based on [25], and denoise accordingly. For instance, we define the noise-free version of as follows:

 (28)

where

controls the false detection probability; the noise being gaussian, we typically choose 3 or 4 for

.

The sequence converges to a solution of the problem

 argminS12∥Y−F(SαkUT)∥2F+r∑i=1κ∥λk[:,i]⊙S[:,i]∥1, (29)

for . One can find some material on minimization schemes in Appendice C. This choice yields a noise-free but biased solution because of the thresholding; this is a well-known drawback of norm based regularizations. The purpose of the vector is to mitigate this bias[26]. is a vector with ones at all entries. At the step 6 in Algo 1, is calculated as follows:

 βk,j=11+|Sk|κλk, (30)

where all the operations are entry-wise and is the vector made of element-wise absolute values of entries. Qualitatively, this removes the strongest features from the norm terms by giving them small weights, which makes the debiasing possible; conversely, the entries dominated by noise get weights close to 1, so that the penalty remains unchanged.

For we follow the same rational. To set the sparsity in the transform domain according to the noise induced uncertainty, we need to further propagate it (the noise) through the operator . Formally, we need to estimate the element-wise standard deviations of

. Let consider the intermediate random matrix

. Assuming that

 F(F∗(.))=λId(.), (31)

’s lines are statistically independent. Therefore, within a given column of , the entries are statistically independent from one another. We deduce that the element-wise standard deviations of are given by

 Σk=μ√(Φs⊙Φs)F2∗(Var(B))(ATk⊙ATk). (32)

Then is obtained as previously and is calculated as

 βk,j=11+|ΦsSk|κλk. (33)

The property 31 is approximately true in the case of super-resolution.

#### 5.5.2 Number of components

We do not propose a method to choose the number of components . Yet, we observe that because of the sparsity constraint, some lines of the matrix at the step 8 in Algorithm 1 are equal to the null vector, when the number of components is overestimated. The corresponding lines in and subsequently the corresponding columns in are simply discarded. This provides an intrinsic mean to select the number of components. Thus in practice, one can choose the initial r as the data set dimensionality from the embedding space point of view, which can be estimated based on a principal component analysis.

#### 5.5.3 Proximity constraint parameters

In this section, we consider the functionals and especially the choice of the parameters and . Let assume that we have determine a suitable range for the parameters: for .

For a particular we consider the matrix and its eigenvectors matrix introduced in section 5.2.3. As previously stated, we want the weights matrix lines to be sparse with respect to ’s eigenvectors. In order to choose the parameters and initialize the weights matrix, we use the following greedy procedure. We consider a sequence of matrices , with . For we define

 Je,a(Ri)=maxk∈⟦1,p⟧∥RiVe,a[:,k]∥2, (34)

and we note the optimal eigenvector. We choose the couple of parameters as:

 (ei,ai)=argmax(e,a)∈SJe,a(Ri). (35)

and .

Regarding the set , we choose the interval and . This range allows the notch structure, assuming that ; for , behaves as a low pass filter. For , we observe that becomes a notch filter, with a notch frequency close to the null frequency for . As previously stated, determines the influence of two samples on one another corresponding coefficients in the matrix in the algorithmic process. According to Section 5.2.2, we set . Let consider the graph introduced in section 5.2.4. The higher is , the lower is connexity. Considering that we are looking for global features (yet localized in the field frequency domain), the highest possible value of should guarantee that the graph is connected. This gives us a practical upper bound for . Once is determined, we discretize this set, with a logarithmic step, in such a way to have more samples close to which correspond to low notch frequencies. We solve approximately Problem 35 by taking the best couple of parameters in the discretized version of . This step is the most computationally demanding, especially for large data samples.

#### 5.5.4 Weights matrix sparsity parameters

The parameters are implicitly set by the minimization scheme used at step 8 in 1. This is detailed in C.

## 6 Numerical experiments

In this section, we present the data used to test the proposed method, the simulation realized and comparisons to other existing methods for both dimensionality reduction and super-resolution aspects.

### 6.1 Data

The data set consists of simulated optical Euclid PSFs as in [8], for a wavelength of . The PSFs distribution across the field is shown on Fig. 3.

These PSFs account for mirrors polishing imperfections, manufacturing and alignment errors and thermal stability of the telescope.

### 6.2 Simulation

We applied different dimension reduction algorithms to a set of 500 PSFs located in the blue box on Fig. 3. We applied the algorithms to different observations of the fields, with varying level of white gaussian noise. For a discrete signal of length corrupted with a white gaussian noise , we define the signal to noise ratio (SNR) as:

 SNR=∥s∥22Nσ2b. (36)

### 6.3 Quality assessment

In astronomical surveys, the estimated PSF’s shape is particularly important; precisely, one has to be able to capture the PSF anisotropy. We recall that for an image

, the central moments are defined as

 μp,q(X)=∑i∑j(i−ic)p(j−jc)qxij (37)

with , are the image centroid coordinates. The moments and quantifies the light intensity spreading relatively to the lines and respectively. Now we consider the moment . We introduce the centered and rotated pixels coordinates defined by the system of equations

 xi,θcos(θ)+yj,θsin(θ)=i−ic (38) −xi,θsin(θ)+yj,θcos(θ)=j−jc, (39)

for some . Then we have

 μ1,1=∑i∑j[sin(2θ)2(−x2i,θ+y2j,θ)+(2cos2(θ)−1)xi,θyj,θ]xij, (40)

and in particular, . It becomes clear that quantifies the light intensity spreading with respect to the pixels grid diagonals.

The ellipticity parameters are defined as,

 e1(X)=μ2,0(X)−μ0,2(X)μ2,0(X)+μ0,2(X) (41) e2(X)=2μ1,1(X)μ2,0(X)+μ0,2(X). (42)

We define the vector . This vector characterizes how much departs from an isotropic shape and indicates its main direction of orientation. It plays a central theoretical and practical role in weak lensing based dark matter characterization [27].

Another important geometric feature is the so-called PSF size. It has been shown that the size error is a major contributor to the systematics in weak gravitational lensing surveys [28]. We characterize the size of a PSF as follows:

 S(X)=(∑i∑j((i−ic)2+(j−jc)2)xij∑i∑jxij)1/2. (43)

Assuming that a given PSF is a 2D discrete probability distribution, this quantity measures how much this distribution is spread around its mean

. Let note the set of ”original” PSFs and the set of corresponding estimated PSFs with one of the compared methods, at a given SNR. The reconstruction quality is accessed through the following quantities:

• the average error on the ellipticity vector: ;

• noting , the dispersion of the errors on the ellipticity vector is measured through the nuclear norm ;

• the average absolute error on the size: in pixels;

• the dispersion of the errors on the size: , in pixels.

### 6.4 Results

#### 6.4.1 Dimension reduction

In this section, we compare RCA to PCA, GMRA and the software PSFEx. We ran a PCA with different number of principal components between 0 and 15. 10 was the value which provided the best results. GMRA input was the data set intrinsic dimension [29], two, since the PSFs only vary as a function of their position in the field; we provided the absolute squared quadratic error allowed with respect to the observed data based on the observation noise level. For PSFEx, we used 15 components. Finally, RCA used up to 15 components, and effectively, 2 and 4 components respectively for the lowest SNR fields realization. As previously mentioned, we assess the components sparsity’s constraint:

• on the one hand we consider which enforces the components sparsity in pixels domain; this is referred to as ”RCA” in the plots;

• on the other hand, we take as the second generation Starlet forward transform [20], without the coarse scale; this is referred to as ”RCA analysis” in the plots.

One can see on the left plot in Fig. 4 that the proposed method is at least 10 times more accurate on the ellipticity vector than the other considered methods. Moreover the right plot shows that the accuracy is way more stable. This is true for both choice of the dictionary .

Fig. 5 shows that the estimated size is very sensitive to the choice of the dictionary . The results are by far more accurate with a sparsity constraint on the components in wavelet domain than in direct domain.

For a given estimate of the PSF at a given location, the error on the size parameter is more sensitive to errors on the core of the PSF (main lobe and first rings) and less sensitive to errors on the outer part of the PSF than one would expect regarding the error on the ellipticity vector. The error on the outer part of the PSF is essentially related to the observational noise, whereas the error on core of the PSF - which has a high SNR - is more related to the method induced bias. This explains why the PCA performs quite well for this parameter. On the other hand, the bias induced by the sparsity is not only related to the dictionary choice, but also to the underlying data model with respect to the chosen dictionary.

As previously explained, the components sparsity term is set in such a way to penalize any feature which does not emerge from the propagated noise, which is a source of bias. By using wavelets, we might recover features which are dominated by noise in pixel domain as long as the wavelet filters profile at given scale and direction, matches those features spatial structure. Thus, we expect less error on the reconstructed PSF’s core by using wavelets.

We might also consider two distinct ways of using sparsity for the components:

• we can model each component as , with sparse, which is known in the sparse recovery literature as synthesis prior;

• we can alternately constraint to be sparse.

This priors are equivalent if the dictionary is unitary [30]. Therefore the pixel domain sparsity constraint can be considered as falling into both framework. However, the two priors are no longer equivalent and potentially yields quite different solutions for overcomplete dictionaries.

We observe in practice that unless the simulated PSFs are strictly sparse with respect to the chosen dictionary - this includes redundant wavelet dictionaries, the synthesis prior yields a bias on the reconstructed PSF size, since the estimated PSFs are sparse linear combinations of atoms which are in general sharper than a typical PSF profile. The analysis prior is somehow weaker and appears to be more suitable for approximately sparse data.

We do not observe a significant difference between these methods with respect to the mean squared error, except for GMRA which gave noisier reconstructions.

We applied the aforementioned methods to the PSFs field previously used, with additional 30 corners PSFs and 30 localized PSFs as shown on Fig. 3 at an SNR of 40. This assess the behavior of the algorithms with respect to spatial clustering and sparse data distribution. One can see in Fig. 6 examples of simulated observed PSFs from different areas in the FOV.

For each of these observed PSFs, the reconstructed PSFs for each method are shown in Fig. 7.

One can observe that the proposed method gives noiseless and rather accurate PSFs reconstruction for both the center, the corners and the localized area of the field (see Fig. 3). One can also see that we fail to capture accurately the rings pattern in the corners and the localized area. The dictionary considered are not specifically adapted to curve-like structures. The ring patterns varies across the FOV but are locally correlated. Therefore, they can only be recovered where the PSFs are sufficiently dense and numerous, which is the case at the FOV’s center.

PCA and PSFEx yield a significant increase of the SNR in their estimated PSFs at the center and in the localized area. Yet, they fail to do so in the corners because of the lack of correlation for the PCA and local smoothness for PSFEx.

Finally, the poor results obtained with GMRA can be explained by the fact that the underlying manifold sampling is not sufficiently dense for the tangent spaces to be estimated reliably.

#### 6.4.2 Super-resolution

In this section, the data are additionally downsampled to Euclid telescope resolution. PCA and GMRA does not handle the downsampling. Therefore we only consider PSFEx and RCA in this section. For each method, we estimate an upsampled version of each PSF, with a factor 2 in lines and columns; in case of Euclid, this is enough to have a Nyquist frequency greater than half the signal spatial bandwidth [31].

As previously, RCA Analysis refers to the proposed method, with the dictionary chosen as the second generation Starlet forward transform [20], without the coarse scale; RCA LSQ refers to the proposed method with the dictionary chosen as the identity matrix, and the weight matrix simply calculated as

 ˆA=argminA12∥Y−F(ˆSA)∥2F, (44)

being the current estimate of the components matrix. Among all the methods previously considered for comparison, PSFEx is the only one handling the undersampling.

As for the dimension reduction experiment, the proposed method with chosen as a wavelet dictionary is at least one order of magnitude more accurate over the shape parameters and the mean square error. Besides, Fig. 10 shows that the proximity constraint over the matrix allows one to select a significantly better optimum than a simple least square update of . Indeed, regularizing the weight matrix estimation reinforces the rejection of ’s null space.