Blaschke Product Neural Networks (BPNN): A Physics-Infused Neural Network for Phase Retrieval of Meromorphic Functions

11/26/2021
by   Juncheng Dong, et al.

Numerous physical systems are described by ordinary or partial differential equations whose solutions are given by holomorphic or meromorphic functions in the complex domain. In many cases, only the magnitude of these functions is observed at various points on the purely imaginary jω-axis, since coherent measurement of their phases is often expensive. However, it is desirable to retrieve the lost phases from the magnitudes when possible. To this end, we propose a physics-infused deep neural network based on Blaschke products for phase retrieval. Inspired by the Helson-Sarason Theorem, we recover the coefficients of a rational function of Blaschke products using a Blaschke Product Neural Network (BPNN), which takes the magnitude observations as input. The resulting rational function is then used for phase retrieval. We compare the BPNN to conventional deep neural networks (NNs) on several phase retrieval problems, comprising both synthetic and contemporary real-world problems (e.g., metamaterials, for which data collection requires substantial expertise and is time consuming). On each phase retrieval problem, we compare against a population of conventional NNs of varying size and hyper-parameter settings. Even without any hyper-parameter search, we find that BPNNs consistently outperform the population of optimized NNs in scarce-data scenarios, despite being much smaller models. The results can in turn be applied to calculate the refractive index of metamaterials, an important problem in emerging areas of material science.


1 Introduction

Numerous physical systems are described by ordinary or partial differential equations whose solutions are given by holomorphic or meromorphic functions $f(s) = |f(s)|\,e^{i\phi(s)}$ in the complex domain, where $|f|$ and $\phi$ are respectively the magnitude and phase of $f$. In many cases, only $|f(j\omega)|$, the magnitude of $f$, is observed on a finite set of points $\Omega$ of the purely imaginary $j\omega$-axis¹, since the coherent measurement of the phases may be expensive (see Ch. 13 of PhaseRetrieval2016) or require significant expertise (king2009hilbert). However, it is desirable to retrieve the lost phase $\phi(j\omega)$ from measurements of the magnitudes for $\omega \in \Omega$, since the phase often encodes important information. For example, in both spectroscopy and optical imaging, direct estimation of $\phi$ is often not possible. Yet phase information determines the optical constants in the spectroscopy of materials (lucarini2005kramers-kronig), and is more important than the magnitude in optical imaging (PhaseRetrieval2016). The phase retrieval problem is also important in a number of other disciplines, including holography (Fienup1982), x-ray crystallography (Nugent2007), astronomy (stark1987image), blind deconvolution (Shechtman2015), and coherent diffractive imaging (Latychevskaia2018).

¹For readers unfamiliar with phase retrieval problems who wonder why function values are only observed on the purely imaginary line, please refer to the explanation included in the Appendix.

Many techniques for phase retrieval have been developed in the literature. The Kramers–Kronig formula (lucarini2005kramers-kronig), also known as the Sokhotski–Plemelj formula, is noteworthy and is derived by applying the Hilbert Transform to $\ln|f(j\omega)|$. Unfortunately, for phase retrieval, this elegant formula requires knowledge of the magnitude along the entire $j\omega$-axis. However, as mentioned above, acquiring these values is often impossible, since $\Omega$ is typically a finite set with a small number of elements. This severely limits the applicability of the Kramers–Kronig formula in many settings. Another approach is based on compressed sensing, but it requires sparsity assumptions, as well as stringent assumptions on the design or projection matrix (e.g., the Restricted Isometry Property (RIP) (candes2008restricted), mutual incoherence (foucart2013invitation), etc.) that often do not hold for meromorphic functions encountered in practice.

To overcome the above limitations, phase retrieval can also be cast as a data-driven regression problem, where we wish to learn a mapping from the magnitude of a signal, $|f|$, to its phase, $\phi$. If the number of frequencies measured is $N = |\Omega|$, then the phase can be represented as a vector of complex numbers $\boldsymbol{\phi} = (e^{i\phi_1}, \ldots, e^{i\phi_N})$, where $\phi_i$ is the phase at the $i$-th frequency. Similarly, the signal's corresponding magnitude vector can be given by $\mathbf{m} = (|f(j\omega_1)|, \ldots, |f(j\omega_N)|)$, and the goal is to infer $\boldsymbol{\phi}$ based upon $\mathbf{m}$. We can then learn this mapping using a data-driven model (see Figure 1 below for an example).

Figure 1: Example of the deep neural network approach to the phase retrieval problem. Here FCNN refers to a fully-connected neural network, but it can be replaced by any machine learning model. The magnitude is encoded into an input of size $N$, while the output is an $N$-dimensional complex vector of corresponding phases.

With a sufficient amount of training data, it is likely that a universal function approximator (e.g., a neural network) can learn the underlying mapping to within a given tolerance. For instance, deep learning has shown great success on phase retrieval problems in imaging, where abundant training data is available (sinha2017lensless; metzler2018prdeep). But sufficient training data can be expensive, or sometimes even impossible, to acquire in many practical applications.

In this paper, we depart from the existing literature by exploring physics-infused neural networks for this setting. The infusion of physics reduces the number of required training examples to an amount that is practical for many measurement and design applications of interest. The main idea is as follows. Since any phase value has a magnitude of 1 (i.e., $|e^{i\phi(j\omega)}| = 1$ for all $\omega$), the conditions of a theorem of Helson and Sarason are satisfied, indicating that $e^{i\phi}$ can be approximated by a rational function whose numerator and denominator are both Blaschke products (Garcia). This motivates a neural network that computes the coefficients of these Blaschke products, which may then be used for phase retrieval.

In Section 2, we briefly review conventional phase retrieval methods and existing applications of neural networks to phase retrieval problems. In Section 3, we review Blaschke products and the Helson-Sarason Theorem, and propose our approach for baking these results into neural networks, resulting in Blaschke Product Neural Networks. In Section 4, we provide experimental results on several synthetic and metamaterial datasets, along with a comprehensive set of baselines, to demonstrate the performance improvements of our construction.

2 Related Work

We first briefly review existing phase retrieval methods applicable to the analytic and meromorphic functions of interest.

Perhaps the most important result in this area is the elegant calculation of Kramers and Kronig (also known as the Sokhotski–Plemelj formula), obtained by applying the Hilbert Transform to $\ln|f(j\omega)|$ and given by

$$\phi(j\omega_0) = -\frac{1}{\pi}\,\mathrm{P.V.}\!\int_{-\infty}^{\infty} \frac{\ln|f(j\omega)|}{\omega - \omega_0}\,d\omega,$$

where P.V. denotes Cauchy's principal value. We note that this can also be derived by invoking the Cauchy integral formula along a path containing the imaginary $j\omega$-axis.

However, there is a major obstacle to applying this elegant formula. In many important practical scenarios, the set of frequencies (points on the $j\omega$-axis) at which we have measurements has small cardinality. In such cases, approximating the Kramers-Kronig integral by a Riemann sum produces unsatisfactory results. Other approaches used to analytically extend or extrapolate the magnitude have also demonstrated limited success (Chalashkanov2012).

Another approach to phase retrieval is via sparse representation. This approach assumes it is a priori known that $f$ has a sparse representation in a known dictionary $D$ of functions, i.e., that it can be written as a linear combination of elements of $D$ with only a small number of non-zero coefficients. Under this assumption, many techniques have been developed to recover these coefficients and hence the phase of $f$. Again, this is an interesting proposal, but the underlying assumptions may not hold in many problems of interest: a suitable dictionary may not be known, and the classes of functions of interest may not be sparse in any fixed dictionary. Recently, techniques based on deep learning have been developed to learn the structure of the underlying signal whose phase we want to recover. That is, deep neural networks are used to encode prior knowledge about the structure of the signal instead of hard-coded assumptions such as sparsity (jagatap2019phase). However, these methods require a significant number of training samples of the magnitude of the underlying signal, which limits their applicability to material science applications.

Recently, deep learning has demonstrated its ability to solve the phase retrieval problem, especially in imaging: sinha2017lensless achieved lensless imaging of phase objects using a convolutional neural network (CNN). xue2019reliable constructed a Bayesian CNN to achieve phase imaging while quantifying the uncertainty of the network output. metzler2018prdeep combined the regularization-by-denoising framework with a CNN for noisy phase retrieval of images. goy2018low achieved low-photon-count phase retrieval with a CNN as well. To avoid the difficulty of obtaining a large set of paired phase and intensity measurements, zhang2021phasegan used an unpaired dataset for phase retrieval. Other works (tayal2020inverse; houhou2020deep) have also used deep learning for end-to-end phase recovery. None of these works use the underlying physics in their deep learning approach, and they require a large number of training samples; hence, they are not directly applicable to the physics applications we are interested in here. We take a different path, applying physical prior knowledge to our phase retrieval process.

One closely related line of work is solving partial differential equations with neural networks (dissanayake1994neural; berg2018unified; zhang2019bilinear; chen2018neural). However, these methods typically need large amounts of data and require careful selection of hyper-parameters; additionally, they do not directly address the phase retrieval problem. Our approach, on the other hand, needs no specific hyper-parameter search and achieves high-accuracy results with very limited training samples.

3 Mathematical Background

We briefly review the mathematical background required in the rest of this paper. In particular, we review Blaschke products and the Helson-Sarason Theorem. We note that the standard presentation of Blaschke products is motivated by inner functions in Hardy spaces over the unit disk, whose boundary is the unit circle (Rudin; Garcia). In our case, we are interested in the right half plane, whose boundary is the $j\omega$-axis. The two settings are equivalent via the Möbius transformation $z = \frac{s-1}{s+1}$, which maps the right half plane onto the unit disk. In this light, our exposition is closer to Akotuwicz.

Let $\{a_n\}_{n=1}^{\infty}$ be an infinite sequence of complex numbers for which $\mathrm{Re}(a_n) > 0$ for all $n$ and for which

$$\sum_{n=1}^{\infty} \frac{\mathrm{Re}(a_n)}{1 + |a_n|^2} < \infty;$$

then, for any integer $m \geq 0$, any complex function given by

$$B(s) = \left(\frac{s-1}{s+1}\right)^{m} \prod_{n=1}^{\infty} c_n\,\frac{s - a_n}{s + \overline{a_n}}, \qquad |c_n| = 1,$$

where the unimodular constants $c_n$ are chosen so that the product converges, is referred to as an infinite Blaschke product. If the set $\{a_n\}$ is finite, $B$ is referred to as a finite Blaschke product. For a finite Blaschke product, the convergence condition can clearly be dropped, since there are only a finite number of coefficients $a_n$. Note that roots with $\mathrm{Re}(a_n) = 0$ do not contribute to the product.
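As a quick numerical illustration, the following sketch (ours, not from the paper; it uses the half-plane factors above and drops the unimodular constants, which do not affect the modulus) evaluates a finite Blaschke product on the $j\omega$-axis and checks the unit-modulus property established next:

```python
import numpy as np

def blaschke(s, roots):
    """Finite Blaschke product built from half-plane factors (s - a)/(s + conj(a))."""
    out = np.ones_like(s, dtype=complex)
    for a in roots:
        out *= (s - a) / (s + np.conj(a))
    return out

# Roots with positive real parts (in the right half plane); values are hypothetical.
roots = [0.5 + 2.0j, 1.0 - 3.0j, 2.0 + 0.5j]
omega = np.linspace(-10.0, 10.0, 1001)
B = blaschke(1j * omega, roots)

# On the j*omega-axis the modulus is identically 1 (up to float rounding).
assert np.allclose(np.abs(B), 1.0)
```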

For both finite and infinite Blaschke products, one can verify that $|B(j\omega)| = 1$ for all points $j\omega$ on the purely imaginary axis.² In this paper, we are only interested in finite Blaschke products, because of the following approximation theorem of Helson and Sarason.

²Details proving $|B(j\omega)| = 1$ can be found in the appendix.

Theorem 1

Let $u$ be any continuous function on the $j\omega$-axis for which $|u(j\omega)| = 1$ for all $\omega$, and let $\varepsilon > 0$. Then there exist finite Blaschke products $B_1$ and $B_2$ such that

$$\left\lVert u - \frac{B_1}{B_2} \right\rVert_{\infty} < \varepsilon,$$

where $\lVert \cdot \rVert_{\infty}$ denotes the standard supremum norm computed over the $j\omega$-axis.

Let $u(j\omega) = e^{i\phi(j\omega)}$. We know that $|u(j\omega)| = 1$ because $\phi(j\omega)$ is real. Applying the Helson-Sarason Theorem, for analytic or meromorphic functions (with no poles on the imaginary axis), the phase term $e^{i\phi}$ can be approximated by a rational function of finite Blaschke products. The main theme of this paper is that these Blaschke products can in turn be computed using neural networks with a small amount of data.

From the expression for finite Blaschke products, we can see that $B(s) = c\,\frac{P(s)}{\widetilde{P}(-s)}$, where $c$ is a complex number with $|c| = 1$, $P$ is a complex polynomial in $s$ with roots $\{a_n\}$, and $\widetilde{P}$ is constructed from $P$ by replacing all its coefficients with their respective conjugates. Note that $|\widetilde{P}(-j\omega)| = |P(j\omega)|$. Let $P_1$ and $P_2$ represent the Blaschke products $B_1$ and $B_2$ in the above theorem through their complex polynomials; the approximating rational function can then be written as $\frac{B_1(s)}{B_2(s)} = c\,\frac{P_1(s)\,\widetilde{P}_2(-s)}{\widetilde{P}_1(-s)\,P_2(s)}$ with $|c| = 1$. Thus we have the estimate for the phase function:

$$e^{i\hat{\phi}(j\omega)} = c\,\frac{P_1(j\omega)\,\widetilde{P}_2(-j\omega)}{\widetilde{P}_1(-j\omega)\,P_2(j\omega)}, \qquad (1)$$

where the factors $P_1(j\omega)$ and $\widetilde{P}_2(-j\omega)$ have roots with positive and negative real parts, respectively.
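As a sanity check of equation (1), the following sketch (ours; the roots are hypothetical) verifies numerically that the polynomial form is unimodular on the $j\omega$-axis, as the Helson-Sarason approximation requires:

```python
import numpy as np

rng = np.random.default_rng(1)
omega = np.linspace(-5.0, 5.0, 101)
s = 1j * omega

# Hypothetical right-half-plane roots for P1 and P2.
r1 = rng.uniform(0.1, 2.0, 3) + 1j * rng.uniform(-3.0, 3.0, 3)
r2 = rng.uniform(0.1, 2.0, 3) + 1j * rng.uniform(-3.0, 3.0, 3)
P1, P2 = np.poly(r1), np.poly(r2)
P1c, P2c = np.conj(P1), np.conj(P2)       # coefficient-conjugated polynomials

# Equation (1) without the unimodular constant c.
phase = (np.polyval(P1, s) * np.polyval(P2c, -s)) / (
         np.polyval(P1c, -s) * np.polyval(P2, s))
assert np.allclose(np.abs(phase), 1.0)    # unimodular on the j*omega-axis
```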

3.1 Blaschke Product Neural Network

Inspired by the above, we model the phase function by equation (1) and propose the Blaschke Product Neural Network (BPNN). A BPNN is a neural network that takes the magnitude observations as input and produces the coefficients of the rational function of Blaschke products as output. In this paper, we only present BPNNs built on fully connected neural networks (FCNNs), although this restriction is not necessary. We next describe the operation of BPNNs by discussing the forward and backward pass mechanisms.

Forward Pass

The outputs of the BPNN are the Blaschke coefficients of the complex polynomials, which are in turn used to construct the predicted phase function according to equation (1). Without loss of generality, let $\Omega = \{\omega_1, \ldots, \omega_N\}$ be the set of frequencies at which the magnitude of the signal is measured and whose corresponding phases must be retrieved; it can be assumed that $\omega_1 < \omega_2 < \cdots < \omega_N$. The BPNN then produces estimates of the phase $e^{i\hat{\phi}(j\omega_i)}$. Combining these with the known magnitude at each $\omega_i$ gives the estimate $\hat{f}(j\omega_i) = |f(j\omega_i)|\,e^{i\hat{\phi}(j\omega_i)}$.

Backward Pass

After predictions $\hat{f}(j\omega_i)$ are generated by the forward pass, the prediction error is computed as

$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{f}(j\omega_i) - f(j\omega_i) \right|^2.$$

Training seeks to minimize $\mathcal{L}$ over all training samples in the standard manner, using stochastic gradient descent or its variants. Cross-validation and testing can also be done in the standard fashion.
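To make the forward and backward passes concrete, here is a minimal PyTorch sketch of the scheme (our illustration, not the paper's released implementation; the layer sizes, root counts, and names are assumptions, and the phase is formed directly as the quotient of two Blaschke products, which is equivalent to the polynomial form of equation (1)):

```python
import torch
import torch.nn as nn

class BPNN(nn.Module):
    """Sketch of a Blaschke Product Neural Network: an FCNN maps the magnitude
    vector to two sets of complex roots; the phase estimate is the quotient of
    the two finite Blaschke products evaluated on the j*omega-axis."""

    def __init__(self, n_freq, n_roots=4, hidden=64):
        super().__init__()
        self.n_roots = n_roots
        self.fcnn = nn.Sequential(
            nn.Linear(n_freq, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            # two root sets (numerator / denominator), 2 reals per complex root
            nn.Linear(hidden, 2 * 2 * n_roots),
        )

    @staticmethod
    def _blaschke(s, roots):
        # Product over roots of (s - a)/(s + conj(a)); unimodular for s = j*omega.
        num = s.unsqueeze(-1) - roots.unsqueeze(-2)
        den = s.unsqueeze(-1) + roots.conj().unsqueeze(-2)
        return (num / den).prod(dim=-1)

    def forward(self, magnitude, omega):
        # magnitude: (batch, n_freq) real; omega: (n_freq,) real
        p = self.fcnn(magnitude).view(-1, 2, self.n_roots, 2)
        roots = torch.complex(p[..., 0], p[..., 1])      # (batch, 2, n_roots)
        s = (1j * omega).unsqueeze(0)                    # (1, n_freq)
        phase = self._blaschke(s, roots[:, 0]) / self._blaschke(s, roots[:, 1])
        return magnitude * phase                         # estimate of f(j*omega)

# Backward pass sketch: minimize the MSE between prediction and ground truth.
# model = BPNN(n_freq=200)
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# loss = (model(mag, omega) - f_true).abs().pow(2).mean()
# loss.backward(); opt.step()
```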

An important hyper-parameter is the number of degrees of the complex polynomials $P_1$ and $P_2$. This hyper-parameter can either be cross-validated or chosen based on the practitioner's knowledge of the complexity of the specific phase retrieval problem. In our experiments, we found that complex phase retrieval problems may require higher-degree polynomials, and evaluating such high-degree polynomials over large frequency ranges (100 THz - 1500 THz and above) may lead to numerical overflow. To overcome this problem, we propose a piece-wise implementation of BPNN.

Piece-wise Implementation of BPNN

In this approach, given a large frequency range, the whole frequency band is partitioned into non-overlapping contiguous segments, and the Blaschke method is applied to each segment (see Figure 2(b)). The BPNN is trained to output the respective Blaschke coefficients for each segment. During training, the predicted phases on all segments are concatenated, and training then proceeds exactly as for the non-piecewise BPNN. The exact method of partitioning and the degrees of the corresponding Blaschke polynomials for each segment are hyper-parameters of choice. We include two example applications of piece-wise BPNNs to metamaterial datasets in our numerical experiments.
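A minimal sketch of the piece-wise evaluation, reusing the `_blaschke` helper from the sketch above and assuming equal-length contiguous segments (the naive split used in our experiments):

```python
import torch

def piecewise_phase(roots_per_segment, omega, n_segments):
    """roots_per_segment: (batch, n_segments, 2, n_roots) complex roots, all
    emitted in one pass by the FCNN from the full magnitude vector. Each
    contiguous chunk of omega gets its own Blaschke quotient; the per-segment
    phase estimates are then concatenated."""
    phases = []
    for k, w in enumerate(torch.chunk(omega, n_segments)):
        s = (1j * w).unsqueeze(0)
        r = roots_per_segment[:, k]            # (batch, 2, n_roots)
        phases.append(BPNN._blaschke(s, r[:, 0]) / BPNN._blaschke(s, r[:, 1]))
    return torch.cat(phases, dim=-1)           # (batch, n_freq)
```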

(a) Illustration of BPNNs
(b) Illustration of Piece-wise Implementation of BPNN
Figure 2: Structure of BPNN (and piece-wise BPNN). BPNN is an end-to-end model: predictions are constructed by combining the Blaschke coefficients output by the FCNN with the values of the input magnitude. In the piece-wise case, the FCNN generates sets of Blaschke coefficients for all segments (partitions) of the frequency range; predictions are made for each segment separately and the results are concatenated.

4 Experiments

We conduct simulations on five phase retrieval datasets: two synthetic and three metamaterial datasets. A detailed listing of the datasets is shown in Table 1. We endeavor to test the BPNN on diverse yet representative datasets, and thus choose problems that include both holomorphic and meromorphic complex functions. Additionally, the selected metamaterial datasets are comprehensive and represent the discipline well.

4.1 Datasets

PHASE RETRIEVAL PROBLEM              | TYPE         | # FREQUENCY
Polynomial ODE (4.1.1)               | Synthetic    | 200
Lorentzian response function (A.7)   | Synthetic    | 1000
Metamaterial absorber (4.1.3)        | Metamaterial | 1000
Huygens' metasurface (4.1.4)         | Metamaterial | 2001
Membrane metasurface (4.1.5)         | Metamaterial | 2001
Table 1: List of phase retrieval problems in the Experiments section. (# Frequency) gives the total number of frequency points considered for each problem. Please see the relevant sections for detailed descriptions.

4.1.1 Synthetic I: Polynomial ODE

We first select 200 points on the $j\omega$-axis and fix these values. The first synthetic dataset consists of solutions to an ordinary differential equation (ODE) driven by a complex polynomial $P$ of fixed degree whose roots lie on the unit circle. We uniformly sample the roots of $P$ on the unit circle, then solve the ODE along the $j\omega$-axis using the Euler method, with the real and imaginary parts of the initial condition sampled uniformly at random for each generated solution. The values of the solution at the selected points are then calculated, and their magnitudes and phases are used in this experiment, split into training and test sets.
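For illustration only, here is a sketch of such a data-generation pipeline; since the paper's exact ODE is not reproduced above, we assume a hypothetical stand-in first-order form $f'(s) = P(s)\,f(s)$ and a hypothetical initial-condition range:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in ODE f'(s) = P(s) f(s); the paper's exact ODE differs.
deg = 4
roots = np.exp(1j * rng.uniform(0, 2 * np.pi, deg))  # roots on the unit circle
P = np.poly(roots)                                   # polynomial coefficients

omega = np.linspace(0.0, 10.0, 200)                  # 200 fixed j*omega points
ds = 1j * (omega[1] - omega[0])                      # Euler step along the axis
f = np.empty(len(omega), dtype=complex)
f[0] = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))  # hypothetical init range
for k in range(len(omega) - 1):
    f[k + 1] = f[k] + ds * np.polyval(P, 1j * omega[k]) * f[k]  # forward Euler

magnitude, phase = np.abs(f), np.angle(f)            # training targets
```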

4.1.2 Synthetic II: Lorentzian Response Function

The second synthetic dataset is generated by considering a causal model of a non-magnetic material, with a frequency-dependent complex dielectric permittivity defined as a sum of Lorentzian oscillators. A simple Lorentzian model of the dielectric function is applicable to a wide range of resonant material systems with response functions that must obey the Kramers-Kronig relations. Using the transfer-matrix method (Markos2008), the complex transmission coefficient can be computed directly from the dielectric function via closed-form equations (please see the appendix for more details). We generate a dataset of complex transmission coefficients by uniformly sampling Lorentzian parameters over the frequency range 100-500 THz. We consider a fixed set of four Lorentzian oscillators, and generate 50 sample transmission spectra for training and 2000 spectra for the test dataset.

4.1.3 Metamaterial I: Metamaterial Absorber

Metamaterials are structured materials consisting of periodic arrays of resonant elements that derive their spectral properties from their unit cell geometry (smith2000composite). With effective material properties that can exhibit resonances in both the electric permittivity and the magnetic permeability, and hence more unknown material parameters, metamaterials often present a more complicated phase retrieval problem than many conventional systems.

The first metamaterial dataset considers a metal-based absorber geometry with a metallic ground plane backing a dielectric spacer layer, designed to operate in the millimeter-wave (50-200 GHz) frequency range. This metamaterial absorber (MMA) dataset has 55,000 pairs of MMA geometry and complex frequency-dependent reflection coefficient (the transmission is zero due to the metal backing). The data are obtained with commercial computational electromagnetics software (CST Microwave Studio).

4.1.4 Metamaterial II: Huygens’ Metasurface

Huygens’ metasurfaces made of dielectric materials represent another class of metamaterials decker2015high, with underlying properties that are different than their metallic counterparts, including for example the lack of Ohmic losses liu2017experimental. When individual designs are combined into supercells, all-dielectric metasurfaces can exhibit very rich spectral phenomena. The Huygens’ metasurface dataset consists of the complex reflection and transmission response derived from a supercell of elliptical resonators adopted from the work of deng2021neural. The phase information can be inferred directly from the complex reflection or transmission. Out of a total of 1000 samples, we randomly sample 80 for training and reserve the rest for testing.

4.1.5 Metamaterial III: Membrane Metasurface

The membrane metasurfaces are similar to the Huygens' metasurfaces. However, instead of four stand-alone elliptical resonators in the supercell, each consists of a thin slab of SiC with regions of empty holes. The SiC slab filling the supercell has a defined thickness and periodicity. In the slab, we place four elliptical holes where the elliptical SiC resonators were previously placed; the four holes have the same geometrical parameters (major-axis radius, minor-axis radius, and rotation angle). The fourteen geometrical parameters share the same ranges as in the elliptical-resonator dataset. Similarly, the input of this dataset consists of the fourteen geometry parameters, and the outputs are the complex reflection and transmission at 2001 frequency points ranging from 100 THz to 500 THz. Out of a total of 7331 samples, we randomly sample 80 for training and reserve the rest for testing.

4.2 Benchmarking Neural Networks

Comparisons are performed on each dataset with training sets of increasing size. For each phase retrieval problem and each training set, we compare the BPNN to a family of NNs of various sizes and structures. For each structure, we search over training hyper-parameters to guarantee that the NNs are close to optimal, record the test error at each training epoch, and report the best test error. We note that selecting on test error is uncommon, but it guarantees that the reported baseline performance is close to its limit.

We use NNs of various structures, where all hidden layers of a given NN have the same width. The hidden layer widths searched are [32, 64, 128, 256] and the numbers of hidden layers searched are [1, 2, 3, 4]. For training hyper-parameters, we search the dropout rate over [0., 0.05, 0.1, 0.15, 0.2] and the learning rate over [1e-4, 5e-4, 1e-3, 5e-3, 1e-2]. Each NN is trained for 6000 epochs, after which the training losses of all networks have converged. Training is performed with 3 random initializations, and we report the best test error. The Huygens' metasurface dataset is more complex than the others, so its searched hidden layer widths are increased to [64, 128, 256, 512] while the other search ranges are kept the same. Unlike the extensive search for the NNs, we do not search hyper-parameters for the BPNN and use only one BPNN per phase retrieval problem; we likewise train the BPNN with 3 random initializations on each dataset and report its best test error. We use ReLU as the activation function for both NNs and BPNNs, and Adam (kingma2017adam) is used for training all models.
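For concreteness, the baseline search grid described above can be enumerated as follows (a sketch; the variable names are ours):

```python
from itertools import product

# Search grid for the baseline fully-connected networks (Section 4.2).
widths         = [32, 64, 128, 256]        # [64, 128, 256, 512] for Huygens'
depths         = [1, 2, 3, 4]              # number of hidden layers
dropout_rates  = [0.0, 0.05, 0.1, 0.15, 0.2]
learning_rates = [1e-4, 5e-4, 1e-3, 5e-3, 1e-2]

configs = list(product(widths, depths, dropout_rates, learning_rates))
assert len(configs) == 4 * 4 * 5 * 5       # 400 configurations per dataset
```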

4.3 Results

For each dataset, we compare the family of regular fully-connected neural networks, each with its best searched hyper-parameters, to a BPNN without hyper-parameter search. For the three metamaterial datasets, we further include four baselines: (1) the Kramers-Kronig method (KK-method), a conventional phase retrieval method (lucarini2005kramers-kronig); (2) the AAA-network algorithm, an established rational approximation technique whose functional form, like the BPNN's, is used to approximate the phase function (AAA); (3) the MUSIC algorithm, a data-driven function approximation method that estimates functions as sums of exponentials (MUSIC); and (4) Linear+BP, which uses a linear regressor instead of a neural network inside the BPNN. (Please see the Appendix for more details on the KK-method, the AAA algorithm, and the MUSIC algorithm.) All y-axes are plotted on a dB scale ($10\log_{10}(\mathrm{MSE})$) for visual clarity. For the legends in Fig. 3, we use entries of the form (A, B), where A refers to the width (i.e., number of neurons) of each hidden layer of the neural network being evaluated, and B refers to its depth (i.e., number of hidden layers).

Performance of Baselines. We note that both the KK and MUSIC methods exhibit poor performance: the KK method requires a large number of magnitude measurements to achieve accurate estimates, while the MUSIC method expects the target function to be sparse in some basis, which cannot be assumed for our problems. The Linear+BP model also performs substantially worse than the BPNN, suggesting that the relationship between the magnitude and the Blaschke product coefficients is non-linear. Although representing the phase function with a Blaschke product decreases the complexity of the problem, mapping from the magnitude to the Blaschke coefficients still appears to be beyond the capability of a linear regressor. In the following paragraphs we describe the results on each dataset in detail.

Synthetic Datasets. We use a BPNN with 4 roots and a fully-connected neural network of 2 hidden layers, each of width 64, with no dropout. It is evident from Figure 3 that the BPNN outperforms the family of optimized neural networks by a large margin. This supports our theoretical justification for using BPNNs when the phase functions arise from precisely meromorphic functions. The strong performance of the BPNN on the Lorentzian dataset suggests its value for phase retrieval problems in metamaterials. To confirm this, we conduct further experiments on real metamaterial datasets.

Figure 3: Performance comparison between a number of neural networks of different structures, each with its best hyper-parameters, and the BPNN, on two synthetic phase retrieval problems with training sets of various sizes.
Figure 4: Performance comparison between a family of neural networks, each with its best hyper-parameters, and the BPNN (without hyper-parameter search) on the MMA dataset, along with four other baselines: the KK method, linear regressor + Blaschke product, the MUSIC algorithm, and the AAA method.

MMA, ADM, Membrane

For the MMA dataset, as for the synthetic datasets, we use a BPNN with 4 roots and a fully-connected neural network with 2 hidden layers of width 64. No dropout or weight decay was added. Even without any regularization, the BPNN still demonstrates performance superior to all the optimized neural networks. Since the ADM (Huygens') and Membrane datasets are much more complex than the synthetic and MMA datasets, we use a Piecewise BPNN with the frequencies partitioned into 20 equal-length segments, each assigned 3 roots. We note that this is the most naive split for a Piecewise BPNN, and a more careful split could improve performance further. We find that the best errors of the regular neural networks are close to the BPNN's best error. However, the median error of the BPNN is much smaller than the median error of the best neural network (the network with the smallest test MSE across all architectures). This indicates that the BPNN consistently makes good predictions at the cost of a small number of poor ones, a property that can be valuable when the prediction problem at hand is of the on-or-off type, owing to the BPNN's higher probability of making good predictions.





Figure 5: On the left is the performance comparison between the family of neural networks, each with its best hyper-parameters, and the BPNN, along with four other baseline methods. On the right is the comparison of the median MSE loss of the best neural network in the family with the median MSE loss of the BPNN. (Top: Huygens' metasurface dataset; bottom: membrane metasurface dataset.)

5 Conclusion

We presented a physics-infused neural network, referred to as the Blaschke Product Neural Network (BPNN), for the phase retrieval problem of holomorphic and meromorphic functions. Through extensive experiments on synthetic and metamaterial datasets, we found that when the underlying functions are holomorphic or meromorphic, significantly less training data is required than one might expect. By baking the physics into the neural network, even less training data is required, and performance is further improved.

6 Reproducibility Statement

We upload four datasets. Due to the 100MB limit on supplementary material, we include three datasets in the supplementary: the two synthetic datasets and the Huygens' metasurface dataset. The membrane metasurface dataset can be accessed at https://figshare.com/s/966f196d1847269a447c. Details about the hyper-parameter search space and the search process can be found in the experiments section. The supplementary also includes our PyTorch implementation of the BPNN and the baseline neural networks, with easily configurable hyper-parameters.

Acknowledgments

This work was supported in part by the Office of Naval Research (ONR) under grant number N00014-21-1-2590. YD, OK, SR, JM, and WJP acknowledge funding from the U.S. Department of Energy (DOE) (DESC0014372).

References

Appendix A Appendix

a.1 Observations of phase function values

Here we provide a short explanation of why the function values are only available on the imaginary line. In physics, electrical engineering, and other fields, solutions to many linear PDEs with constant coefficients, and many other measurements, are obtained in the frequency domain. In principle, this is achieved by multiplying a real signal $x(t)$ by the sinusoids $\cos(\omega t)$ and $\sin(\omega t)$ and integrating over time (i.e., calculating the cosine and sine transforms) for various values of $\omega$. If $x(t)$ is a real physical signal, these cosine and sine transforms are the real and imaginary parts of $X(j\omega)$, the complex Fourier transform of $x(t)$. Thus the measurement gives the value of $X$ over the imaginary axis.

a.2 Proof that $|B(j\omega)| = 1$

We provide the details proving that the modulus of the Blaschke product is 1 when $s = j\omega$. It suffices to show that each factor in the product has modulus one individually:

$$\left|\frac{j\omega - a_n}{j\omega + \overline{a_n}}\right|^2 = \frac{\mathrm{Re}(a_n)^2 + \big(\omega - \mathrm{Im}(a_n)\big)^2}{\mathrm{Re}(a_n)^2 + \big(\omega - \mathrm{Im}(a_n)\big)^2} = 1.$$

The leading factor $\left(\frac{s-1}{s+1}\right)^m$ is likewise unimodular on the axis, since $|j\omega - 1| = |j\omega + 1|$.

a.3 Hyper-parameters of Piecewise BPNN

In order to provide analysis and some insight into the hyper-parameters of the Piecewise BPNN, we conduct experiments on the two most complex datasets, the Huygens' metasurface dataset and the membrane metasurface dataset, to obtain empirical results on the effect of hyper-parameters (e.g., the number of segments and the number of roots per segment) on performance. We use a training dataset of size 50 for both datasets and fix the BPNN's neural network to have 2 hidden layers of width 64. From Figure 6 we can see that the Piecewise BPNN's performance is robust to its hyper-parameters: a wide range of settings generates satisfactory results as long as the total number of roots is above the necessary threshold. Similarly, the Piecewise BPNN is robust to overfitting caused by a high number of roots: while the optimal total number of roots appears to lie in the range of 60-80, overshooting the total to 160 or even 200 does not cause a sharp increase in test error. We believe this is a strong indication that the Piecewise BPNN learns the necessary complexity and zeros out the unnecessary degrees of freedom.

Figure 6: On the left is the heatmap of the minimum test MSE (mean squared error) of the Piecewise BPNN under different hyper-parameter configurations on the Huygens' metasurface dataset. On the right is the corresponding heatmap for the membrane metasurface dataset. Each minimum MSE is computed over 3 training runs from different initializations.

a.4 Baseline model 1: Kramers–Kronig

The Kramers–Kronig relations, implied by the causality of a stable physical system, are often used to convert between the imaginary and real parts of physical response functions such as dielectric functions, susceptibilities, and reflection coefficients. We adopt the notation of grosse1991analysis, since the metamaterials problems in the main text focus on phase retrieval for spectral data, including reflectance. In general, the Kramers–Kronig relations for the complex reflection coefficient $r(\omega) = r'(\omega) + i\,r''(\omega)$ are given by

$$r'(\omega_0) = \frac{1}{\pi}\,\mathrm{P.V.}\!\int_{-\infty}^{\infty} \frac{r''(\omega)}{\omega - \omega_0}\,d\omega, \qquad r''(\omega_0) = -\frac{1}{\pi}\,\mathrm{P.V.}\!\int_{-\infty}^{\infty} \frac{r'(\omega)}{\omega - \omega_0}\,d\omega,$$

where $r'$ and $r''$ are the real and imaginary parts of the reflection coefficient, respectively.

In practice, $r'$ and $r''$ are often unknown. The reflectance $R = |r|^2$ is the physical quantity directly accessible to intensity measurements, but the phase information is lost. Therefore, it is useful to derive the phase from the reflectance $R$. A general form of the complex reflection coefficient can be written as $r = |r|\,e^{i\theta}$, from which we can infer that

$$\ln r(\omega) = \tfrac{1}{2}\ln R(\omega) + i\,\theta(\omega).$$

If the Kramers-Kronig relations are valid for $\ln r$, we can first measure $R$ from zero frequency to the highest measurable frequency, and then apply the Kramers-Kronig relations with $\frac{1}{2}\ln R$ as the real part of $\ln r$ to infer the imaginary part $\theta$. In our implementation, we follow this procedure to derive $\theta$, and thereby obtain the Kramers-Kronig reconstructed complex reflection coefficient $\hat{r} = \sqrt{R}\,e^{i\theta}$. Since the loss metric in the main text is the averaged mean squared error of the real and imaginary parts of the reflection coefficient, the real and imaginary parts of $\hat{r}$ are compared with $r'$ and $r''$.

The above procedure is built on the premise of causality for $\ln r$, but $\ln r$ is not a physical response function proven to be governed by causality; thus, we cannot guarantee that the Kramers-Kronig relations are valid for $\ln r$. Furthermore, the reflection spectra in the metamaterials datasets cover only a finite frequency range, which degrades the performance of the Kramers-Kronig relations. Nonetheless, we provide the Kramers-Kronig relations as a phase retrieval baseline.
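A minimal numerical sketch of this procedure (ours; it uses the standard reflectance-phase form of the relation and a crude principal-value treatment that simply skips the singular grid point):

```python
import numpy as np

def kk_phase(omega, R):
    """Phase from reflectance via the Kramers-Kronig relation applied to
    ln r = 0.5*ln R + i*theta, i.e.
    theta(w0) = -(w0/pi) P.V. integral_0^inf ln R(w) / (w^2 - w0^2) dw."""
    lnR = np.log(R)
    theta = np.zeros_like(omega)
    for i, w0 in enumerate(omega):
        mask = np.arange(len(omega)) != i          # crude principal value
        integrand = lnR[mask] / (omega[mask] ** 2 - w0 ** 2)
        theta[i] = -(w0 / np.pi) * np.trapz(integrand, omega[mask])
    return theta

# Reconstructed reflection coefficient:
# r_hat = np.sqrt(R) * np.exp(1j * kk_phase(omega, R))
```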

a.5 Baseline model 2: AAA-network algorithm details

Here we supply the details of the Adaptive Antoulas-Anderson (AAA)-network baseline included in the main text. Following the notation of nakatsukasa2018aaa, it uses the barycentric representation of rational approximation, which approximates an arbitrary function as a rational function of type (m, m):

$$r(z) = \frac{\displaystyle\sum_{j=1}^{m} \frac{w_j f_j}{z - z_j}}{\displaystyle\sum_{j=1}^{m} \frac{w_j}{z - z_j}},$$

where $z_j$, $w_j$, and $f_j$ are the support points, weights, and value parameters of this approximation. AAA approximates a complex function over a set of complex support points using an iterative greedy selection, and uses linear least squares to solve for the weights.
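For reference, a small sketch of evaluating this barycentric form (ours; evaluation exactly at a support point would need special-casing, omitted here):

```python
import numpy as np

def bary_eval(z, support, values, weights):
    """Evaluate the type (m, m) barycentric rational r(z) defined by the
    support points z_j, values f_j, and weights w_j of an AAA fit."""
    C = 1.0 / (z[:, None] - support[None, :])   # Cauchy matrix 1/(z - z_j)
    return (C @ (weights * values)) / (C @ weights)
```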

Although AAA also utilizes rational approximation, there are a couple of fundamental differences between AAA and our Blaschke product approximation. (1) Functional form: AAA uses the barycentric representation which, although limited to type (m, m) rationals, covers a much larger function space than our Blaschke product form, which is specialized to approximate the phase function on the unit circle per the Helson-Sarason Theorem. (2) Fitting process: AAA greedily picks the support points first and then solves a least-squares problem for the weights and values. Our Blaschke product fits all parameters simultaneously and requires no internal iterative selection (although the actual fitting is done with gradient-based methods, which are iterative). This means AAA is difficult to build into a neural network so as to receive gradient feedback from the fit, unlike our BPNN approach.

With AAA, we can successfully approximate the phase function using the three sets of barycentric parameters (also known as AAA parameters). However, this process requires knowledge of the phase function, or at least the ability to sample points from it. Therefore, naive AAA alone cannot be applied to our phase retrieval setting, which requires not only approximating the phase function but also learning the mapping from the magnitude to the approximation parameters.

For the implementation, we use the existing package (Driscoll2014).³ Setting the number of support points to be comparable to our BP (so that they have similar degrees of freedom), we first run the AAA algorithm on all of our training sets (with sizes ranging from 1 to 50) to obtain the training AAA parameters. Then we use a neural network to model the relationship between the measured magnitudes and the AAA parameters on the training data. The network structure used was 2 hidden layers of 64 neurons with ReLU activations. After the network is properly trained (run to convergence for 300 epochs), we input the test spectra magnitudes and obtain the inferred AAA parameters for the test spectra. We then reconstruct the barycentric representation and compare the reconstructed phase function with the ground truth to compute the MSE.

³We used this public Python version: https://github.com/c-f-h/baryrat

As the graphs in the main text suggest, all of the AAA networks performed poorly compared to both the BP and the naive NN results. This is expected for the following reasons:

  1. The functional form of barycentric rational functions lacks physical meaning: encompassing all (m, m) rational functions, the family is too large and not restrictive enough for the phase function. The lack of physical structure in this rational approximation, unlike our Blaschke product (which takes into account the unit-modulus property of phase functions), is, we suspect, the most important reason for the poor performance.

  2. Two-stage process error: there are two stages in our AAA-network baseline, an AAA parameter-fitting stage and a network mapping-learning stage. The errors of the two stages compound and damage the overall performance.

  3. One-to-many issue: unlike the Blaschke product, where the parameters form an ordered set, the parameters of the AAA barycentric representation form an unordered set. Ordering matters for neural networks to learn a meaningful mapping; without it, the one-to-many problem damages training performance.

a.6 Baseline model 3: MUSIC algorithm details

The Multiple Signal Classification (MUSIC) algorithm (schmidt1986multiple) is a signal processing algorithm popular in frequency estimation and direction finding, where the signal is a linear combination of multiple sources at different frequencies and/or directions. MUSIC assumes that the signal vector consists of p complex exponentials whose frequencies $\omega$ are unknown.

Since MUSIC is not originally designed for phase retrieval problems and there is no single widely accepted way to apply it to one, we describe our MUSIC phase retrieval procedure in full here. First, we assume that our signal is the sum of p complex exponentials with unknown frequencies $\omega$ plus a noise term $n$:

$$x = A s + n,$$

where $A$ is the Vandermonde matrix of steering vectors and $s$ is the amplitude vector. We assume the actual number of sources is much lower than the number of observations, so p < M. We further assume that the signals come from the same distribution, namely that their autocorrelation matrix (M by M) is constant. We therefore estimate this autocorrelation by the average of the sampled correlation matrices:

$$\hat{R} = \frac{1}{K}\sum_{k=1}^{K} x_k x_k^{H}.$$

With the estimate of the autocorrelation, we estimate the frequencies using the eigenspace method, where the eigenvectors associated with small eigenvalues are treated as the noise subspace and used to recover the signal frequencies. Specifically, we decompose $\hat{R}$ into eigenvalues and eigenvectors, and take the smallest (M − p) eigenvalues and their corresponding eigenvectors to form the noise subspace $U_N$. Since the eigenvectors of a Hermitian matrix are orthogonal to each other, any pure signal from the signal subspace is orthogonal to all vectors spanning the noise subspace. Using a squared-norm distance, for a steering vector $e(\omega)$, the distance

$$d(\omega) = \lVert U_N^{H} e(\omega) \rVert^2$$

should be zero for all pure signals. Equivalently, the inverse $1/d(\omega)$ exhibits peaks as we sweep $\omega$ through the range of possible values. We therefore sweep a fine grid of $\omega$, take the peaks of $1/d(\omega)$ as our frequency estimates, and construct the $A$ matrix from the top-p peaks of the curve.
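A compact sketch of this frequency-finding step (ours; for simplicity it takes the top-p grid values of the pseudospectrum rather than true local peaks):

```python
import numpy as np

def music_freqs(X, p, omega_grid):
    """Estimate p exponential frequencies from snapshots X (rows are
    length-M observations) via the MUSIC pseudospectrum."""
    M = X.shape[1]
    R_hat = (X.conj().T @ X) / X.shape[0]      # averaged autocorrelation (M x M)
    _, eigvecs = np.linalg.eigh(R_hat)         # eigenvalues in ascending order
    U_noise = eigvecs[:, : M - p]              # noise subspace
    k = np.arange(M)
    d = np.array([np.linalg.norm(U_noise.conj().T @ np.exp(1j * w * k)) ** 2
                  for w in omega_grid])        # distance to the signal subspace
    return omega_grid[np.argsort(1.0 / d)[-p:]]  # top-p pseudospectrum values
```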

The above is the standard MUSIC algorithm for finding the signal frequencies (assuming the signal is a sum of exponentials). With the frequencies found, recovering the actual signal $x$ is equivalent to finding the corresponding amplitude vector $s$. We therefore cast phase retrieval as the optimization problem

$$\min_{s}\ \big\lVert\, |A s| - m \,\big\rVert^2,$$

where $m$ is the ground-truth magnitude from experimental measurements. For each test phase retrieval problem (each test magnitude spectrum), the optimization must be run again with the estimated $A$ matrix. We tried the Newton-Raphson method for this step, but it did not converge even with a larger tolerance. We therefore used the PyTorch framework (without any neural network) to exploit parallel, GPU-based gradient optimization: we constructed the computational graph for the above objective, randomly initialized 1024 initial guesses, and ran the Adam optimizer for 300 epochs until the loss converged. Afterwards, we took the best-performing single fit among the 1024 candidates as the final answer for that magnitude. Due to time constraints (300 epochs with 1024 initializations for each test case), we evaluated on 500 test cases instead of the full dataset used for the other methods.

a.7 Details on metamaterials dataset

Metamaterial absorber:

The input parameters include the four geometric dimensions of the top metal layer, the material properties of the top dielectric layer, the conductivity of the metal, and the thicknesses of all layers. The output is the complex reflection coefficient $r$, with 1000 frequency points for each of its real and imaginary parts, evaluated from 0 – 200 GHz.

Huygens' Metasurface:

The metasurface supercell geometry has a fixed periodicity and includes four elliptical SiC resonators of consistent height. Each elliptical resonator has a major-axis radius, a minor-axis radius, and a rotational angle. The fourteen geometry parameters define the inputs of the computational simulations, which yield the complex reflection and transmission responses. This ADM dataset is particularly concerned with the frequency-dependent reflection and transmission in the range 100 - 500 THz.

Synthetic II: Lorentzian Response Function

The second synthetic dataset is generated by considering a causal model of a non-magnetic material, with a frequency-dependent complex dielectric permittivity defined as a sum of Lorentzian oscillators:

$$\epsilon(\omega) = \epsilon_{\infty} + \sum_{p} \frac{\omega_{\mathrm{p},p}^{2}}{\omega_{0,p}^{2} - \omega^{2} - i\gamma_{p}\,\omega}.$$

A simple Lorentzian model of the dielectric function is applicable to a wide range of resonant material systems with response functions that must obey the Kramers-Kronig relations. Using the transfer-matrix method (Markos2008), the complex transmission coefficient can be computed directly from the dielectric function via a closed-form equation whose remaining parameters are pre-defined physical constants. We generate a dataset of complex transmission coefficients by uniformly sampling Lorentzian parameters over the frequency range 100-500 THz. We consider a fixed set of four Lorentzian oscillators, and generate 50 sample transmission spectra for training and 2000 spectra for the test dataset.
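A short sketch of this oscillator model (ours; the oscillator parameters below are hypothetical placeholders, and the transfer-matrix step to the transmission coefficient is omitted):

```python
import numpy as np

def lorentzian_eps(omega, eps_inf, w0, wp, gamma):
    """Dielectric function as a sum of Lorentz oscillators:
    eps(w) = eps_inf + sum_p wp_p^2 / (w0_p^2 - w^2 - 1j*gamma_p*w)."""
    eps = np.full_like(omega, eps_inf, dtype=complex)
    for w0_p, wp_p, g_p in zip(w0, wp, gamma):
        eps += wp_p ** 2 / (w0_p ** 2 - omega ** 2 - 1j * g_p * omega)
    return eps

# Four oscillators with placeholder parameters on a 100-500 THz grid.
omega = 2 * np.pi * np.linspace(100e12, 500e12, 1000)   # angular frequency
eps = lorentzian_eps(omega, 1.0,
                     w0=[1.5e15, 2.0e15, 2.5e15, 3.0e15],
                     wp=[5e14] * 4, gamma=[1e13] * 4)
```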