## 1 Introduction

The quest to solve the decades-old phase retrieval problem has led to numerous algorithms and methodologies. This is no surprise given the many applications of phase retrieval in areas such as crystallography, optics, and imaging [millane, kim]. As the number of applications across various fields grows, the developed methodologies continue to increase in number, complexity, and efficiency. A large number of methods in the literature have their roots in the seminal works by Gerchberg, Saxton, and Fienup [GS1, GS2, fienup1, fienup2, fienup3]. However, the Gerchberg-Saxton algorithm's shortcomings in efficiently finding an optimal solution have spurred numerous new directions. One such direction views the problem through a non-convex lens. In this setting, methodologies such as Wirtinger flow (WF), truncated Wirtinger flow (TWF), reshaped Wirtinger flow (RWF), and incremental truncated Wirtinger flow (ITWF) have all shown promise in addressing the problem efficiently and accurately [candes, chen, TWF, RWF]. However, like most established phase retrieval algorithms, they struggle with classical parameter optimization, such as determining the optimal step size. On a relevant note, data-driven approaches, such as deep learning techniques, have become immensely useful in recent years for handling complex data sets. Nevertheless, a prevailing issue with purely data-driven approaches is that their greater expressive power comes at the cost of interpretability. Although deep learning has been applied to phase retrieval, prior work has primarily focused on designing neural networks for a limited set of algorithms, such as hybrid input-output (HIO) and Fienup's method [IntroDNN, Isil]. Such works are limited by their inability to handle multiple types of models, as well as the more recent, more sophisticated phase retrieval algorithms.
Additional work has been done using convolutional neural networks, such as prDeep [metzler], leading to a separate class of architectures that is not versatile enough to improve the existing algorithms.

The deep unfolding technique is a game-changing fusion of model-based and data-driven approaches. Specifically, it allows for designing model-aware deep architectures based on well-established iterative signal processing techniques. Deep unfolded networks have shown great promise in various signal processing applications [farsad2020data, bertocchi2019deep, khobahi2019deep, shlezinger2020deepsic, khobahi2019model, balatsoukas2019deep, shlezinger2019viterbinet, agarwal2020deep] and are a perfect example of hybrid models that can exploit immense amounts of data while utilizing domain knowledge of the underlying problem at hand. Moreover, they can take advantage of the expressive power of deep neural networks while simultaneously benefiting from the adaptability and reliability of model-based methods. This makes them an ideal candidate for problems such as phase retrieval, particularly in non-convex settings where classical methods struggle with parameter tuning and convergence guarantees.

In this paper, we propose model-aware deep architectures for the problem of phase retrieval. In particular, we focus on the different variants of the RWF algorithm that have recently shown immense promise in the context of phase retrieval.

## 2 System Model

The task of phase retrieval is concerned with recovering a complex or real-valued signal of interest, $\mathbf{x}$, from phase-less linear measurements of the form

$$y_i = |\langle \mathbf{a}_i, \mathbf{x} \rangle|, \quad i = 1, \dots, m, \tag{1}$$

where the set of sensing vectors $\{\mathbf{a}_i\}_{i=1}^{m}$ is assumed to be known *a priori*. Define the sensing matrix as $\mathbf{A} = [\mathbf{a}_1, \dots, \mathbf{a}_m]^T$, with the corresponding measurement vector given by $\mathbf{y} = [y_1, \dots, y_m]^T$. Then, by considering the least-squares criterion, the task of recovering the signal of interest, $\mathbf{x}$, from the measurement vector, $\mathbf{y}$, can be expressed as [RWF]:

$$\min_{\mathbf{z}} \; \ell(\mathbf{z}) := \frac{1}{2m} \sum_{i=1}^{m} \left( |\mathbf{a}_i^T \mathbf{z}| - y_i \right)^2. \tag{2}$$

Evidently, the problem of phase retrieval is non-convex, and, as described in the previous section, researchers have considered several methodologies to approach it. The most notable model-based approaches consider either a loss function of the form (2) or an equivalent representation that usually involves higher-order variables. Then, convex methodologies can be utilized to reformulate the problem as a semi-definite program, or one can resort to non-convex methods to tackle (2) directly. On the other hand, the existing data-driven approaches exploit the expressive power of deep neural networks: they consider a conventional fully connected neural network (with $\boldsymbol{\theta}$ denoting the network parameters) and train it so that the resulting network acts as an estimator of the true signal given the measurement vector $\mathbf{y}$. Usually, such data-driven approaches require a large amount of data for training purposes, and more importantly, once trained, they lack the inherent interpretability that comes with model-based approaches. Hence, in this paper we aim to bridge the gap between the model-based and data-driven approaches by proposing a novel model-aware deep architecture based on well-established first-order optimization algorithms specialized for tackling (2). The resulting network can be seen as a hybrid model-based and data-driven first-order method that enjoys the interpretability and versatility of model-based algorithms while offering the expressive power of deep neural networks. Moreover, due to the incorporation of domain knowledge in the deep architecture, it has significantly fewer trainable parameters and requires much less training data than conventionally 'bulky' deep neural networks. To this end, we consider the incremental reshaped Wirtinger flow (mini-batch IRWF) algorithm as a blueprint for designing a model-aware deep architecture. The iterations of the IRWF algorithm for finding the critical points of the non-convex problem in (2) can be stated simply as follows: starting from a proper initial point (more on this below), the algorithm generates a sequence of points according to the following update rule:

$$\mathbf{z}^{(t+1)} = \mathbf{z}^{(t)} - \mu \nabla \ell(\mathbf{z}^{(t)}), \tag{3}$$

where $\mu > 0$ is the step-size, and $\nabla \ell(\mathbf{z}^{(t)})$ denotes the gradient of the objective function in (2) evaluated at the point $\mathbf{z}^{(t)}$, which is given by:

$$\nabla \ell(\mathbf{z}) = \frac{1}{m} \mathbf{A}^T \left( \mathbf{A}\mathbf{z} - \mathbf{y} \odot \mathrm{phase}(\mathbf{A}\mathbf{z}) \right), \tag{4}$$

where the function $\mathrm{phase}(\cdot)$ is applied element-wise and captures the phase of its vector argument; e.g., for real-valued signals, $\mathrm{phase}(x) = \mathrm{sign}(x)$. Due to the non-convex nature of (2), the signal can only be recovered up to a global phase difference, and hence, a proper metric to quantify the quality of the reconstructed signal (obtained by performing iterations of the form (3)) can be defined as follows:

$$\mathrm{dist}(\mathbf{z}, \mathbf{x}) := \min_{\phi \in [0, 2\pi)} \left\| \mathbf{z} - \mathbf{x} e^{j\phi} \right\|_2, \tag{5}$$

where $\mathbf{x}$ denotes the true signal. Note that for a real-valued *one-bit phase* scenario, the above metric becomes $\mathrm{dist}(\mathbf{z}, \mathbf{x}) = \min\{\|\mathbf{z} - \mathbf{x}\|_2, \|\mathbf{z} + \mathbf{x}\|_2\}$. Inspired by [RWF], we will focus on the real-valued phase retrieval scenario for the rest of this paper. However, the proposed method can be easily extended to complex-valued signals via proper transformations, which will be considered in a future publication.
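To make the update rule and the metric concrete, the following is a minimal NumPy sketch of the real-valued gradient (4), one plain full-batch step of the form (3), and the sign-invariant distance; the function names are illustrative and not taken from any released implementation:

```python
import numpy as np

def rwf_gradient(z, A, y):
    """Gradient of the reshaped loss (1/2m) * sum_i (|a_i^T z| - y_i)^2.

    For real-valued signals, the phase(.) function reduces to sign(.).
    """
    m = A.shape[0]
    Az = A @ z
    return A.T @ (Az - y * np.sign(Az)) / m

def rwf_step(z, A, y, mu=1.0):
    """One (full-batch) update of the form z <- z - mu * grad."""
    return z - mu * rwf_gradient(z, A, y)

def dist_real(z, x):
    """Distance up to a global sign: the real-valued 'one-bit phase' metric."""
    return min(np.linalg.norm(z - x), np.linalg.norm(z + x))
```

Note that at the true signal the gradient vanishes, since $|a_i^T x| \, \mathrm{sign}(a_i^T x) = a_i^T x$; both $x$ and $-x$ are global minimizers, which is exactly why the sign-invariant metric is needed.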

Generally, the two critical aspects in tackling a non-convex optimization problem (aside from having a proper solver) are: *i*) the choice of the initial starting point for the iterative optimizer, and *ii*) a proper step-size design scheme that guarantees convergence of the sequence to a critical point and provides the ability to control and optimize the convergence factor of the underlying iterative solver. Note that the *rate of convergence* of a first-order method cannot be improved without resorting to higher-order information. However, the *convergence factor* of such methods can be enhanced by properly tuning the step-sizes, resulting in accelerated iterations. A popular choice for obtaining a good initial point for the problem of phase retrieval is known in the literature as the spectral method [candes]. In this paper, we adopt the alternative initialization proposed in [RWF], which benefits from a lower complexity than that of the spectral method. In particular, the starting point is initialized as $\mathbf{z}^{(0)} = \lambda_0 \tilde{\mathbf{z}}$, where $\lambda_0$ is a scaling factor that estimates the norm of the true signal from the mean of the measurements, and $\tilde{\mathbf{z}}$ is the leading eigenvector of the matrix $\mathbf{Y} = \frac{1}{m} \sum_{i=1}^{m} y_i \mathbf{a}_i \mathbf{a}_i^T$. In the following, we present our deep *U*nfolded *P*hase *R*etrieval (*UPR*) framework: a *model-aware* deep architecture specifically tailored to the problem of phase retrieval, based on the RWF algorithm.
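As a sketch, this initialization can be implemented as follows. The scaling constant $\sqrt{\pi/2}$ is an assumption motivated by the Gaussian identity $\mathbb{E}|\mathbf{a}^T\mathbf{x}| = \sqrt{2/\pi}\,\|\mathbf{x}\|_2$; the exact constant used in [RWF] may differ:

```python
import numpy as np

def rwf_init(A, y):
    """Reshaped-WF style initialization (a sketch, following [RWF]).

    Direction: leading eigenvector of Y = (1/m) sum_i y_i a_i a_i^T.
    Scale: lambda_0 estimates ||x|| from the mean of y; the sqrt(pi/2)
    constant is an assumption based on the Gaussian mean-modulus identity.
    """
    m, n = A.shape
    Y = (A.T * y) @ A / m            # (1/m) sum_i y_i a_i a_i^T
    eigvals, eigvecs = np.linalg.eigh(Y)
    direction = eigvecs[:, -1]       # eigenvector of the largest eigenvalue
    lam0 = np.sqrt(np.pi / 2) * y.mean()
    return lam0 * direction
```

For moderate oversampling (e.g., $m$ a few tens of times $n$), the returned point is strongly correlated with the true signal up to a global sign, which is all that iterations of the form (3) need to converge.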

## 3 UPR: The Proposed Framework

In this section, we present the proposed hybrid model-aware and data-driven deep architecture for the problem of phase retrieval. In particular, we consider iterations of the form (3) as a baseline and unfold them onto the layers of a deep neural network. Before presenting the proposed methodology, we first introduce some general concepts from the theory of first-order mathematical optimization.

Iterative optimization techniques are a popular choice for both convex and non-convex programming. In particular, first-order methods are among the most popular and well-established iterative optimization techniques due to their low per-iteration complexity and efficiency in complex scenarios. However, first-order methods generally suffer from a slow speed of convergence, and predicting the number of iterations required for convergence is generally a difficult task. As a result, they are not ordinarily suitable for real-time signal processing applications. Consequently, it is natural to fix the total number of iterations of such algorithms at some budget $K$ and to seek the iteration parameters that yield the best improvement in the underlying objective function while allowing only $K$ iterations. Thus, our goal is to improve the existing first-order methods by *meta-optimizing* the IRWF iterations when the total computational budget is fixed (i.e., allowing only $K$ iterations of the form (3)). To do so, we formulate the meta-optimization problem in a deep learning setting and interpret the resulting unrolled iterations as a neural network with $K$ layers, where each layer is designed to imitate one iteration of the original iterative optimization method. Such a deep neural network can then be trained using a small data-set, and the resulting network can be used as an enhanced first-order method for solving the underlying problem at hand.

Consider the minimization of an objective function $f(\mathbf{z})$ and let $\mathcal{T}_{\boldsymbol{\theta}}$ be a parameterized mapping operator defined as

$$\mathcal{T}_{\boldsymbol{\theta}} := \mathcal{I} - \mathbf{D} \nabla f, \qquad \text{i.e.,} \quad \mathcal{T}_{\boldsymbol{\theta}}(\mathbf{z}) = \mathbf{z} - \mathbf{D} \nabla f(\mathbf{z}), \tag{6}$$

where $\mathcal{I}$ denotes the identity operator, $\mathbf{D}$ is a positive definite matrix, and $\boldsymbol{\theta} = \{\mathbf{D}\}$ denotes the set of parameters of the operator $\mathcal{T}_{\boldsymbol{\theta}}$. Then, most first-order optimization methods can be represented in terms of the above mapping operator by considering an updating rule of the form:

$$\mathbf{z}^{(t+1)} = \mathcal{T}_{\boldsymbol{\theta}_t}(\mathbf{z}^{(t)}). \tag{7}$$

Specific to our problem, one can obtain iterations of the form (3) by setting $\mathbf{D}_t = \mu \mathbf{I}$ for all $t$, replacing the gradient in (6) with the gradient of the quadratic loss function given in (4), and following the above updating rule. Generally, for a given set of (pre-conditioning) positive definite matrices $\{\mathbf{D}_t\}_{t=0}^{K-1}$, iterations of the form (7) can be seen as a pre-conditioned gradient-descent method. In this paper, we focus our attention on learning a set of pre-conditioning matrices where each matrix has a diagonal structure with positive entries. Particularly, we consider $\mathbf{D}_t = \mathrm{Diag}(\mathbf{d}_t)$, where $\mathbf{d}_t > \mathbf{0}$, for $t = 0, \dots, K-1$. In such a setting, given an initial starting point $\mathbf{z}^{(0)}$, performing $K$ iterations of the form (7) corresponds to the following composite mapping:

$$\mathbf{z}^{(K)} = \left( \mathcal{T}_{\boldsymbol{\theta}_{K-1}} \circ \cdots \circ \mathcal{T}_{\boldsymbol{\theta}_1} \circ \mathcal{T}_{\boldsymbol{\theta}_0} \right)(\mathbf{z}^{(0)}), \tag{8}$$

where $\boldsymbol{\Theta} = \{\boldsymbol{\theta}_t\}_{t=0}^{K-1}$ represents the overall set of parameters of the mapping operator. From another perspective, the above composite mapping can be seen as a deep neural network with $K$ layers and $\mathbf{z}^{(0)}$ as its input. The output of such a deep architecture is the estimated point after performing $K$ iterations of the underlying iterative optimization algorithm. Thus, the training of such a model-aware deep architecture corresponds to learning the pre-conditioning matrices, resulting in an accelerated first-order method. For the training, we first fix the class of the underlying objective function, i.e., by fixing the measurement matrix $\mathbf{A}$ in our case. Then, we generate a set of observations $\{\mathbf{y}_j\}$ from known vectors $\{\mathbf{x}_j\}$ and seek to learn the parameters of the network by minimizing the distance between the output of the network and the optimal points of the fixed class of the objective function. Note that the points $\{\mathbf{x}_j\}$ are global minima of the underlying objective function.
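The operator (6) and the composite mapping (8) can be sketched generically; in the snippet below, `grad_f` and the diagonal pre-conditioners `ds` are placeholders to be supplied by the user:

```python
import numpy as np

def make_operator(grad_f, d):
    """T_theta(z) = z - Diag(d) grad_f(z): one preconditioned gradient
    step with a diagonal positive-definite preconditioner d > 0."""
    return lambda z: z - d * grad_f(z)

def unfolded_map(grad_f, ds, z0):
    """Composite mapping (8): apply T_{theta_{K-1}} o ... o T_{theta_0}
    to the starting point z0, one 'layer' per preconditioner in ds."""
    z = z0
    for d in ds:
        z = make_operator(grad_f, d)(z)
    return z
```

For instance, with the quadratic objective $f(\mathbf{z}) = \frac{1}{2}\|\mathbf{z}-\mathbf{b}\|_2^2$ and a single all-ones preconditioner, one application of the mapping lands exactly on the minimizer $\mathbf{b}$, which illustrates how well-chosen preconditioners shorten the iteration budget.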

In light of the above description, we now present the mathematical structure of the proposed UPR architecture. We define the computational dynamics of the $t$-th layer of the UPR architecture as follows:

$$\mathbf{g}^{(t)} = \frac{1}{m} \mathbf{A}^T \left( \mathbf{A}\mathbf{z}^{(t)} - \mathbf{y} \odot \tanh\!\left( \alpha \mathbf{A}\mathbf{z}^{(t)} \right) \right), \tag{9}$$

$$\mathbf{z}^{(t+1)} = \mathbf{z}^{(t)} - \mathrm{Diag}(\mathbf{d}_t)\, \mathbf{g}^{(t)}, \tag{10}$$

where $\tanh(\alpha\,\cdot)$, for some large $\alpha > 0$, represents a smooth approximation of the $\mathrm{sign}(\cdot)$ function to allow for back-propagating the gradients during training, $\mathbf{z}^{(t)}$ is the input to the $t$-th layer, and $\boldsymbol{\theta}_t = \{\mathbf{d}_t\}$. Hence, the dynamics of the overall network with $K$ layers will be the same as (8). We consider the training of the proposed architecture via the following optimization problem:

$$\min_{\boldsymbol{\Theta}} \; \frac{1}{N} \sum_{j=1}^{N} \mathrm{dist}^2\!\left( \mathbf{z}_j^{(K)}(\boldsymbol{\Theta}), \, \mathbf{x}_j \right), \tag{11}$$

where $N$ denotes the total number of training points, $\mathbf{z}_j^{(K)}(\boldsymbol{\Theta})$ is the network output for the $j$-th training sample, the training points $\{(\mathbf{y}_j, \mathbf{x}_j)\}_{j=1}^{N}$ are generated from the data-acquisition model in (1), and the initial points $\mathbf{z}_j^{(0)}$ are generated using the initialization method described in Section 2.
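As a hedged sketch (not the authors' released code), the unfolded layers (9)-(10) and the training problem (11) can be prototyped in PyTorch as follows. The layer count, the smoothing constant `alpha`, the softplus parameterization of the positive diagonals, and the noisy stand-in used in place of the Section 2 initialization are all illustrative assumptions:

```python
import torch

def sign_invariant_loss(z, x):
    """Real-valued instance of the training loss in (11): squared distance
    up to a global sign, averaged over the batch."""
    e_minus = ((z - x) ** 2).sum(dim=1)
    e_plus = ((z + x) ** 2).sum(dim=1)
    return torch.minimum(e_minus, e_plus).mean()

class UPR(torch.nn.Module):
    """K unfolded layers of the form (9)-(10): preconditioned RWF-style
    steps with tanh(alpha * .) as a smooth surrogate for sign(.)."""
    def __init__(self, n, K=5, alpha=50.0):
        super().__init__()
        self.alpha = alpha
        # one learnable diagonal preconditioner per layer, kept positive
        # via softplus; initialized near a constant step-size
        self.log_d = torch.nn.Parameter(torch.zeros(K, n))

    def forward(self, z0, A, y):
        m = A.shape[0]
        z = z0                                              # (batch, n)
        for log_d in self.log_d:
            Az = z @ A.T                                    # (batch, m)
            grad = (Az - y * torch.tanh(self.alpha * Az)) @ A / m
            z = z - torch.nn.functional.softplus(log_d) * grad
        return z

def train_upr(model, A, X, z0, epochs=100, lr=1e-2):
    """Minimize (11) over the preconditioners for a fixed sensing matrix A."""
    y = (X @ A.T).abs()                                     # model (1)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = sign_invariant_loss(model(z0, A, y), X)
        loss.backward()
        opt.step()
    return loss.item()
```

Because each layer is one IRWF-style step, the trainable parameters number only $K \cdot n$ diagonals, which is what keeps the data requirements small compared to a generic fully connected network.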

## 4 Numerical Results

In this section, we investigate the performance of the proposed UPR framework for the task of phase retrieval through various numerical experiments. We implemented the proposed UPR architecture using the PyTorch library [paszke2017automatic]. In addition, for training purposes, we utilized the Adam optimizer [kingma2014adam] with a fixed learning rate. We consider the Empirical Success Rate (ESR) and the average relative error for comparison purposes. Specifically, the ESR metric is defined as the fraction of successful trials, where a successful trial constitutes achieving a relative error below a fixed threshold, and we define the relative error as $\mathrm{dist}(\mathbf{z}^{(K)}, \mathbf{x}) / \|\mathbf{x}\|_2$. In all the simulations, we consider a UPR architecture with $K$ layers, and we compare our results with the state-of-the-art mini-batch IRWF algorithm. For a fair comparison, we use the same parameters for the mini-batch IRWF method as reported in [RWF] and perform the same number of iterations. Both methods are initialized using the initialization method described in Section 2. The UPR architecture was trained on a small training data-set, with the same signal dimension and number of measurements used for generating both the test and training data-sets. The simulations provided in this section are based on evaluating the network on a test data-set that was not shown to the network during training.

Fig. 1(a) shows the empirical rate of success for the mini-batch IRWF and the UPR architecture with respect to the number of measurements $m$. In addition, Fig. 1(b) demonstrates the convergence rate of the proposed method and the mini-batch IRWF algorithm, averaged over the successful trials for a fixed problem size. Finally, Fig. 1(c) is a plot of the average relative error with respect to the number of measurements $m$.

It is evident from Fig. 1(a) that the proposed method outperforms the state-of-the-art mini-batch IRWF in terms of ESR. Specifically, it can be seen that UPR quickly achieves a very high ESR as compared to its counterpart, even for a small number of measurements. It was shown in [RWF] that IRWF performs best among various well-performing phase retrieval algorithms and that the mini-batch IRWF outperforms them all numerically. Thus, the performance of the UPR architecture is presumably superior to those algorithms as well.

The average relative error versus the number of iterations (layers) for both algorithms is presented in Fig. 1(b). It can be observed that the proposed method significantly outperforms the mini-batch IRWF algorithm in terms of speed of convergence and quickly achieves a very low relative error. It was shown in [RWF] that the mini-batch IRWF algorithm converges with high probability and in fewer iterations than other methods. Hence, our methodology clearly improves upon the performance of the underlying algorithm by learning proper pre-conditioning matrices. It should be noted that the ability of the mini-batch IRWF algorithm to converge with high probability enables our deep unfolding method to converge even faster. The average relative error versus the number of measurements is illustrated in Fig. 1(c). Again, it can be seen that the UPR architecture outperforms the mini-batch IRWF in terms of accuracy and achieves a very low relative error even for a small number of measurements. It should also be mentioned that one of the many pitfalls of non-convex approaches is the difficulty of showing that a newly devised algorithm consistently converges. The purpose of this paper was to show the potential and value of model-based deep learning techniques for the phase retrieval problem. The resulting model-aware deep architecture, however, relies on the convergence guarantees of the underlying algorithm to function properly, which is the case for the mini-batch IRWF algorithm.

## 5 Conclusion

We considered the problem of phase retrieval, proposed a novel hybrid model-based and data-driven deep architecture, the *UPR* framework, and showed that it significantly outperforms the state-of-the-art model-based algorithms. The proposed deep architecture has a relatively small number of trainable parameters compared to other deep learning-based methods, and it benefits not only from the interpretability and versatility of model-based algorithms, but also from the expressive power of data-driven methods.
