1 Introduction
Coordinate-based neural representations have emerged as a compelling alternative for modeling and processing signals (e.g., sound, images, video, and geometry). Instead of using discrete primitives such as pixels or vertices, coordinate-based multilayer perceptrons (MLPs) represent images llff ; huang2021textrm ; fathony2020multiplicative , shapes SAL ; IGR ; mescheder2018occupancy ; peng2020convolutional , or even radiance fields chen2021mvsnerf ; lin2021barf ; meng2021gnerf ; mildenhall2020nerf ; muller2022instant ; park2021nerfies ; zhang2020nerf++ in terms of continuous functions that are memory efficient and amenable to solving inverse reconstruction problems via training. Earlier coordinate-based MLPs tend to be biased toward low frequencies, whereas more recent implicit neural approaches have adopted sinusoidal representations to better recover high frequencies, by either transforming the input coordinates to the Fourier basis mildenhall2020nerf ; zhong2021cryodrgn ; yu2022anisotropic or encoding the sinusoidal nonlinearity via deeper MLP architectures sitzmann2019siren ; fathony2020multiplicative .
Although effective in countering high-frequency losses, such approaches are inefficient in training and inference, a cost inherent to the MLP-based optimization strategy: brute-force mappings from low-dimensional, low-frequency inputs onto high-frequency target functions do not sufficiently consider the underlying characteristics of the mapping function. As a result, these methods have largely resorted to large MLPs (either wide in hidden dimensions or deep in layers) to reliably learn the corresponding function mapping. The downside of large MLPs is long training time and slow inference.
To accelerate training, embedding-based MLPs yu2021plenoxels ; sun2021direct ; muller2022instant ; chen2022tensorf have adopted an inverse optimization strategy: they jointly search for an optimal mapping network and the optimal inputs (i.e., a high-dimensional embedding volume). By replacing the input field with a high-dimensional, high-frequency embedding volume (whose size is generally determined by the volume resolution and kernel size), they manage to bridge the gap between the low-dimensional, low-frequency coordinate inputs and the high-frequency outputs with an MLP that is much smaller in both width and depth. The success of these approaches is illustrated by their orders-of-magnitude accelerations in both training and rendering chen2022tensorf ; muller2022instant ; sun2021direct ; yu2021plenoxels ; yu2021plenoctrees .
However, as their embedding space is also represented in discrete form, state-of-the-art techniques rely on interpolation for querying high-dimensional embedded features. To maintain high efficiency, the most widely adopted scheme is still linear interpolation. Yet this presents several major limitations:

The highest recoverable frequency of the embedding field is determined by the resolution of the volume. To preserve high frequencies, it is critical to discretize the volume at a very fine level. Full frequency preservation hence demands extremely high memory consumption, prohibitive even on the most advanced GPUs.

From a signal processing perspective, linear interpolation within the embedding volume not only leads to aliasing but also causes higher-order derivatives to vanish, hindering backpropagation and overall convergence. Figure 1 shows a typical example.

Spatially discretized embedding, compared to its frequency dual, provides limited insight into the signal. In particular, despite a large variety of Fourier signal processing tools for both editing and stylization, few are directly applicable to spatial embedding.
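The vanishing-derivative issue in the second limitation can be reproduced in a few lines. Below, the same eight feature values are queried either by linear interpolation or through their sinusoidal (DFT) representation; only the latter retains a non-vanishing second derivative at an off-grid point. The grid values, query point, and step size are illustrative, not from the paper:

```python
import numpy as np

# Toy 1D "embedding": the same 8 feature values, queried two ways.
grid = np.cos(2 * np.pi * np.arange(8) / 8)
N = len(grid)

def linear_query(x):
    # Spatial embedding: linear interpolation between grid neighbours.
    i = int(np.floor(x))
    t = x - i
    return (1 - t) * grid[i] + t * grid[i + 1]

def fourier_query(x):
    # Frequency embedding: the same values via their DFT coefficients,
    # i.e. a finite sum of sinusoids that is smooth in x.
    P = np.fft.fft(grid)
    k = np.arange(N)
    return (P * np.exp(2j * np.pi * k * x / N)).sum().real / N

# Central-difference second derivative at an off-grid point x = 2.5:
# it vanishes for linear interpolation but not for the sinusoidal form.
h = 1e-3
x0 = 2.5
d2_lin = (linear_query(x0 + h) - 2 * linear_query(x0) + linear_query(x0 - h)) / h**2
d2_fft = (fourier_query(x0 + h) - 2 * fourier_query(x0) + fourier_query(x0 - h)) / h**2
```

Between grid knots the linear interpolant is exactly affine, so `d2_lin` is numerically zero, while `d2_fft` stays on the order of the sinusoids' curvature.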
To address these limitations, we present PREF, a novel Phasorial Embedding Field that represents the Fourier-transformed embedding with compact, complex-valued phasors. We derive a comprehensive framework using PREF to approximate the Fourier embedding space so that each spatial coordinate's feature can be represented in terms of the sinusoidal features within a discrete phasor volume. PREF presents several key advantages over previous spatial embedding techniques. First, the nonlinear transform nature of the phasor avoids the vanishing-derivative problem of linearly interpolated spatial embedding. Second, PREF is highly compact and does not require high-resolution volumes to preserve high frequencies; in fact, it facilitates easy queries of specific frequencies of the input embedding. Finally, based on Fourier transforms, PREF manages to exploit many existing Fourier signal processing techniques, such as the differentiation and convolution theorems and Parseval's theorem, to conduct traditionally expensive operations for the MLP. Overall, PREF provides a new embedding that benefits direct and inverse neural reconstruction on tasks ranging from 2D image completion to 3D point cloud processing and 5D neural radiance field reconstruction.
To summarize, the contributions of our work include:

A compact and physically interpretable embedding field, PREF, based on the phasor formulation. We conduct a comprehensive theoretical analysis and demonstrate the advantages of PREF over previous spatial embedding techniques.

We show that PREF provides a robust and compact solution for MLP-based signal processing: it is compact and at the same time effectively preserves high-frequency components and therefore details in signal reconstruction.

We develop a highly efficient frequency learning framework using an approximated inverse Fourier transform scheme along with a novel Parseval regularizer. Comprehensive experiments demonstrate that PREF outperforms state-of-the-art techniques on various signal reconstruction tasks in both accuracy and robustness.
2 Related Work
Our PREF framework is in line with the renewed interest in adopting implicit neural networks to represent continuous signals from low-dimensional inputs. In computer vision and graphics, these include 2D images with pixels Chen_2021_CVPR , 3D surfaces in the form of occupancy fields mescheder2018occupancy ; DVR ; peng2020convolutional or signed distance fields (SDFs) SAL ; IGR ; wang2021neus ; yariv2020multiview ; martel2021acorn ; NGLOD ; yariv2021volume ; wang2021spline , concrete 3D volumes with a density field ji2017surfacenet ; qi2016volumetric , 4D light fields levoy1996light ; wood2000surface , and 5D plenoptic functions for radiance fields mildenhall2020nerf ; zhang2020nerf++ ; oechsle2021unisurf ; liu2020neural ; park2021nerfies ; yariv2020multiview .

Encoding vs. Embedding.
To learn high frequencies, state-of-the-art implicit neural networks have adopted Fourier encoding schemes that transform the coordinates by periodic sine and cosine functions, or by complex exponentials under Euler's formula. Under Fourier encoding, feature optimization through MLPs can be mapped to optimizing complex-valued matrices with complex-valued inputs in the linear layer. For example, Positional Encoding (PE) and Fourier Feature Maps (FFM) both transform spatial coordinates to the Fourier basis at the early input layers martin2021nerf ; tancik2020fourfeat ; zhong2021cryodrgn , whereas SIREN sitzmann2019siren embeds the process in the deeper layers by using periodic activation functions.
Improving the training and inference efficiency of MLP-based networks has also been explored from the embedding perspective with smart data structures. Various schemes reiser2021kilonerf ; hedman2021baking ; yu2021plenoctrees ; yu2021plenoxels ; sun2021direct ; muller2022instant ; chen2022tensorf replace the deep MLP architecture with voxel-based representations, trading memory for speed. Early approaches bake an MLP into an octree along with thousands of sub-NeRFs or a 3D texture atlas for real-time rendering reiser2021kilonerf ; hedman2021baking . These approaches rely on a pre-trained or half-trained MLP as a prior and therefore still incur long per-scene training. Plenoxels yu2021plenoxels and DVGO sun2021direct
directly optimize the density values on discretized voxels and employ an auxiliary shading function, represented by either spherical harmonics (SH) or a shallow MLP, to account for view dependency. They achieve orders-of-magnitude acceleration over the original NeRF in training but incur a very large memory footprint by storing per-voxel features. Moreover, over-parameterization can easily lead to noisy density estimation and subsequently inaccurate surface estimation and rendering. The seminal work of Instant-NGP muller2022instant spatially groups features via embedding with a hash table to achieve unprecedentedly fast training. In a similar vein, TensoRF chen2020tensor employs highly efficient tensor decomposition via vertical projections.
It is worth noting that embedding-based techniques share many similarities with neural inversion inversion , which aims to jointly optimize the inputs and network weights. In a nutshell, different from traditional feed-forward neural network optimization that attempts to refine network weights, neural inversion seeks inputs, often non-unique, that produce the desired output response under a fixed set of weights. Classic examples include latent embedding optimization rusu2018meta in machine learning and GAN inversion karras2019style ; xia2021gan ; Karras2021 , which finds the optimal latent code to best match a target image for subsequent editing or stylization. In the context of embedding, the focus is to maintain a learnable high-order input embedding and then compute features via interpolation in the embedding space, via schemes as simple as nearest neighbor and no more complicated than linear interpolation. While efficient for feature querying, they share a common limitation of vanishing derivatives, that is, piecewise-constant first-order derivatives and zero second-order derivatives due to linear interpolation, leading to higher errors than brute-force pure-MLP implementations. We show that our PREF representation is both compact and effective in preserving high-frequency details by overcoming the hurdle of vanishing derivatives.

3 Background and Notations
Our goal is to fit a continuous function $f$ parameterized by low-dimensional inputs $\mathbf{x} \in \mathbb{R}^m$. Let $\Phi_\theta$ be an MLP with parameters $\theta$, $P$ be a complex-valued (phasor) volume of dimensionality $m$, and $\mathcal{T}$ represent the inverse Fourier transform. We use the MLP to approximate $f$ as $f(\mathbf{x}) \approx \Phi_\theta(\gamma(\mathbf{x}))$, where $\gamma(\mathbf{x})$ can be computed as $\gamma(\mathbf{x}) = \mathcal{T}(P)(\mathbf{x})$.
Our phasorial embedding field (PREF) resembles existing spatial embeddings. However, spatial embedding uses a real-valued volume rather than $P$ and employs local interpolation or hash functions in place of $\mathcal{T}$. Like spatial embedding, PREF can handle any continuous field through spatially-embedded MLPs, e.g., signed distance fields (SDFs) NGLOD ; wang2021neus ; yariv2020multiview or radiance fields (RFs) chen2020tensor ; sun2021direct ; yu2021plenoxels ; mildenhall2020nerf . In fact, as a frequency-based alternative, PREF explicitly associates frequencies with features: each volume entry represents the Fourier coefficient of the corresponding frequency.
Next, we show that under PREF, many neat properties of the Fourier transform translate to the complex phasor volume, facilitating much more efficient optimization and accessible manipulation. For simplicity, we carry out our derivations in 2D; higher-dimensional extensions can be similarly derived, as shown in various applications.
We first present the inverse Fourier transform and, along the way, the associated theorems we utilize to optimize our phasorial representation. Before proceeding, we explain our notation: $f$ denotes the continuous signal, $P$ its discrete Fourier coefficients, i.e., a phasor volume that $\mathcal{T}$ translates to the phasorial embedding field $\gamma$.
Inverse Fourier Transform. Let $f(x, y)$ be a 2D continuous band-limited signal. Its discrete inverse Fourier transform factorizes the signal into a Fourier series with corresponding coefficients:

$f(x, y) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} P[u, v] \, e^{i 2\pi (u x + v y)}$. (1)
Recall that $P$ corresponds to an equally-spaced matrix where each entry is a phasor carrying the real and imaginary parts of the corresponding frequency:

$P[u, v] = A_{u,v} \left( \cos \phi_{u,v} + i \sin \phi_{u,v} \right)$, (2)

where $A_{u,v}$ and $\phi_{u,v}$ denote the amplitude and phase at frequency $(u, v)$.
The resulting phasor volume can be viewed as a multi-channel version of the Fourier map. Therefore, it inherits several nice properties of the Fourier transform that make frequency-domain manipulations efficient.
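As a concrete toy sketch of Eq. (1), the snippet below builds a small single-channel phasor volume and evaluates the induced continuous field; at on-grid coordinates the sum reduces to a standard inverse DFT, while off-grid coordinates are equally valid (sizes and values are illustrative):

```python
import numpy as np

# A hypothetical 4x4 single-channel phasor volume P and the continuous
# field it induces via the inverse Fourier transform of Eq. (1).
rng = np.random.default_rng(0)
N = 4
P = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))

def field(x, y):
    # f(x, y) = sum_{u,v} P[u, v] * exp(i 2 pi (u x + v y))
    u = np.arange(N)[:, None]
    v = np.arange(N)[None, :]
    return (P * np.exp(2j * np.pi * (u * x + v * y))).sum()

# At grid coordinates (m/N, n/N) this matches the inverse DFT of P
# (up to the 1/N^2 normalization used by np.fft.ifft2) ...
on_grid = np.fft.ifft2(P) * N ** 2
print(np.allclose(field(1 / N, 2 / N), on_grid[1, 2]))  # True

# ... but the field is defined at any off-grid coordinate as well.
feature = field(0.31, 0.77)
```

This continuity (without any interpolation of spatial samples) is what later preserves high-order derivatives.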
Theorem 1.
Let $f$ be an absolutely continuous, differentiable function and $f = \mathcal{T}(P)$ be the inverse Fourier transform of the phasor volume $P$. Then

$\dfrac{\partial f}{\partial x}(x, y) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} i 2\pi u \, P[u, v] \, e^{i 2\pi (u x + v y)}$, (3)

i.e., differentiation along $x$ corresponds to multiplying the coefficients $P[u, v]$ by $i 2\pi u$.
Theorem 2.
Let $f$ be an absolutely continuous, differentiable function and $f = \mathcal{T}(P)$ be the inverse Fourier transform of $P$. Then

$\displaystyle\int_0^1 \!\! \int_0^1 |f(x, y)|^2 \, dx \, dy = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} |P[u, v]|^2$. (4)
Beyond these, modeling a signal in the Fourier domain provides various unique properties; for example, the circular convolution theorem states that the convolution of two sequences can be obtained as the inverse transform of the product of their individual transforms.
A naive solution is to represent signals with the standard inverse Fourier transform over a dense phasor volume grid of Fourier coefficients, then jointly update the volume grid and the MLP with gradient descent. However, such a solution is inefficient since both model size and computation grow as $O(N^m)$, where $N$ is the resolution of the Fourier series and $m$ the input dimension. We next introduce a novel modeling, the phasorial embedding field, to resolve the limitations of this naive solution.
4 Phasorial Embedding Fields
Our PREF is a de facto continuous feature field transformed from a multi-channel, multi-dimensional square Fourier (phasor) volume (e.g., $P \in \mathbb{C}^{c \times N \times N}$ for 2D tasks). In a nutshell, PREF employs the Fourier transform in Eq. 1 to map a spatial coordinate into a $c$-channel feature. The mapped result can be fed into an MLP to process task-specific fields such as SDFs or RFs; Figure 2 illustrates this. As each channel is independent, PREF can transform the spatial coordinates and update the phasor volume in parallel. Therefore, we simply omit the channel dimension in the rest of the derivations for clarity. We start by defining the phasor volume of PREF and subsequently discuss how to efficiently extract features from PREF and optimize the volume.
4.1 Phasor Volume Decomposition
Brute-force representation of PREF with full frequencies is clearly too expensive in both computation and memory. Because many natural signals are band-limited, we instead use a sparse set of frequencies to encode PREF and reduce complexity. The process is equivalent to selectively marking a large portion of the entries in the matrix $P$ as zero. In particular, we set out to factorize $P$ by logarithmic sampling along each dimension. This results in two thin matrices $P_u \in \mathbb{C}^{d \times N}$ and $P_v \in \mathbb{C}^{N \times d}$, with $d \ll N$ a small number; consequently, the transform simplifies to $\mathcal{T}(P) = \mathcal{T}(P_u) + \mathcal{T}(P_v)$. The same formulation easily extends to higher-dimensional signals with similar logarithmic sampling along individual dimensions.
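The parameter savings of the factorization can be sketched as follows. The two thin matrices, the log-spaced frequency set, and the summation into a mostly-zero dense volume are illustrative assumptions in this sketch, not necessarily the paper's exact configuration:

```python
import numpy as np

# Illustrative 2D factorization: keep coefficients only at d log-spaced
# frequencies per axis instead of a dense NxN grid.
N, d = 128, 8
freqs = np.concatenate(([0], 2 ** np.arange(d - 1)))   # 0, 1, 2, 4, ..., 64

rng = np.random.default_rng(0)
Pu = rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))  # thin along u
Pv = rng.standard_normal((N, d)) + 1j * rng.standard_normal((N, d))  # thin along v

def densify(Pu, Pv):
    # The factorization is equivalent to a dense NxN phasor volume whose
    # rows/columns off the sampled frequencies are all zero.
    P = np.zeros((N, N), dtype=complex)
    P[freqs, :] += Pu       # rows at the log-sampled u-frequencies
    P[:, freqs] += Pv       # columns at the log-sampled v-frequencies
    return P

dense_params = N * N        # 16384 complex entries for the full grid
thin_params = 2 * d * N     # 2048 complex entries for the two thin matrices
```

Here the factorized form stores 8x fewer coefficients at resolution 128, and the gap widens with $N$.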
4.2 IFT Approximation
Recall that full numerical integration of the Fourier transform (i.e., Eq. 1) requires computing an expensive inner product between the entire phasor volume and the frequency-encoded volume, for every single queried point, which is prohibitively expensive for any practical frequency learning scheme. To make the computation tractable, we observe that if all input coordinates are equally spaced, the transform simplifies to the Discrete Fourier Transform (DFT), which can be evaluated explicitly with fast Fourier transform (FFT) methods, e.g., the Cooley–Tukey algorithm. For an arbitrary (off-grid) input coordinate, one possible solution is to first compute a map at equally-spaced coordinates with a 2D FFT and then bilinearly interpolate from the on-grid map. However, such a scheme, i.e., computing an entire dense grid with many unused vertices, provides a poor trade-off between complexity and accuracy.
We instead employ both FFT and numerical integration (NI) to achieve high accuracy at relatively low complexity. Specifically, we first perform a 1D FFT along one of the axes to obtain an intermediate map $T$, with $T[u, n] = \sum_{v} P[u, v] \, e^{i 2\pi v n / N}$. Then

$\gamma(x, y) = \sum_{u} \psi(T, y)[u] \, e^{i 2\pi u x}$, (5)

where $\psi$ is a linear interpolation operation that samples the intermediate map $T$ along its spatial axis at $y$. Note that the length $d$ of the reduced dimension is extremely small (see Sec. 4.1), so per-sample numerical integration over the $u$ frequencies is very efficient, significantly reducing the training cost. Further, unlike interpolation in the spatial domain chen2022tensorf ; muller2022instant ; sun2021direct ; yu2021plenoxels , which results in vanishing high-order gradients, this frequency-domain interpolation scheme benefits from the periodic nature of the Fourier representation and preserves gradients (see Fig. 1).
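This FFT-plus-NI query can be sketched in a few lines. The thin $d \times N$ phasor matrix, the log-spaced frequency set, and all variable names are illustrative assumptions rather than the paper's exact implementation:

```python
import numpy as np

# Thin phasor matrix: d log-spaced frequencies along u, dense along v.
N, d = 64, 4
freqs_u = np.array([0, 1, 2, 4])
rng = np.random.default_rng(0)
P = rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))

# Step 1 (shared per batch): 1D FFT along the dense axis gives an
# intermediate map T[u, n] = sum_v P[u, v] exp(i 2 pi v n / N).
T = np.fft.ifft(P, axis=1) * N

def query(x, y):
    # Step 2 (per sample): linearly interpolate T along y, then run an
    # explicit d-term numerical integration along the reduced u axis.
    pos = y * N
    n0 = int(np.floor(pos)) % N
    t = pos - np.floor(pos)
    row = (1 - t) * T[:, n0] + t * T[:, (n0 + 1) % N]   # shape (d,)
    return (row * np.exp(2j * np.pi * freqs_u * x)).sum()
```

The FFT cost is amortized over the whole batch, and each sample only pays for a `d`-term sum plus one linear interpolation.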
4.3 Volume Regularization
Recall that many high-dimensional signal reconstruction problems, including NeRF, are ill-posed. Therefore, PREF alone, without additional priors, may still produce reconstruction artifacts on such problems. In traditional signal processing, regularization techniques such as Lasso regression (LR loss) and Total Variation (TV loss) are imposed on natural signals to restrict the complexity (parsimony) of the reconstruction. Yet both LR and TV losses are designed for signals in the spatial domain and are not directly applicable to PREF. We therefore propose a novel Parseval regularizer:

$\mathcal{L}_{\mathrm{Parseval}}(P) = \sqrt{\sum_{u,v} \| 2\pi u \, P[u,v] \|^2} + \sqrt{\sum_{u,v} \| 2\pi v \, P[u,v] \|^2}$. (6)
We show that the Parseval regularizer behaves like an anisotropic TV loss in the spatial domain.
Lemma 3.
Let $f$ be integrable and $P$ be its Fourier transform. The anisotropic TV loss of $f$ can be represented by $\sqrt{\sum_{u,v} \| 2\pi u \, P[u,v] \|^2} + \sqrt{\sum_{u,v} \| 2\pi v \, P[u,v] \|^2}$.
Proof: Recall that the TV loss can be computed as $\mathcal{L}_{\mathrm{TV}}(f) = \sqrt{\int |\partial f / \partial x|^2 \, dx \, dy} + \sqrt{\int |\partial f / \partial y|^2 \, dx \, dy}$. Since $f$ and $P$ are Fourier pairs, the Fourier transform preserves the energy of the original quantity by Parseval's theorem (Theorem 2), i.e.,

$\displaystyle\int |f(x, y)|^2 \, dx \, dy = \sum_{u,v} |P[u, v]|^2$. (7)

According to Theorem 1, $\partial f / \partial x$ and $i 2\pi u \, P[u, v]$ are also Fourier pairs. The integral of the squared derivative along the $x$ axis is therefore

$\displaystyle\int \left| \frac{\partial f}{\partial x} \right|^2 dx \, dy = \sum_{u,v} \| 2\pi u \, P[u, v] \|^2$. (8)

Taking the square root on both sides gives the first term of the lemma; the $y$ term follows by the same argument.
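The identity behind Lemma 3 can be checked numerically in 1D; the active frequencies and coefficients below are arbitrary illustrations:

```python
import numpy as np

# 1D numerical check of the differentiation + Parseval argument.
N = 32
rng = np.random.default_rng(0)
P = np.zeros(N, dtype=complex)
ks = np.array([1, 3, 5])                  # a few active frequencies
P[ks] = rng.standard_normal(3) + 1j * rng.standard_normal(3)

# Differentiation theorem (Theorem 1): the derivative's Fourier
# coefficients are (i 2 pi k) P[k].
dP = 2j * np.pi * np.arange(N) * P
df = np.fft.ifft(dP) * N                  # f'(n/N) on N samples

# Parseval (Theorem 2): the mean squared derivative over the samples
# equals the frequency-domain energy sum used by the regularizer.
lhs = (np.abs(df) ** 2).mean()
rhs = (np.abs(dP) ** 2).sum()
```

So penalizing the frequency-weighted coefficient energy directly penalizes the spatial gradient energy, without ever decoding the field.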
4.4 Complexity Analysis
Our PREF approximation scheme reduces memory complexity by representing the feature space with a sparse set of frequencies, i.e., $O(d N^2)$ rather than $O(N^3)$ parameters to represent a $c$-channel 3D feature volume. Similar to NGP, we can equip PREF with a shallow MLP module to achieve full spectral reconstruction.
Finally, to further improve efficiency, we adopt a two-step procedure for computing the PREF features for all input coordinates: (1) we conduct an FFT to compute an on-grid intermediate map once per batch of inputs, and (2) we perform per-sample numerical integration (NI) of the Fourier transform. This indicates that the larger the batch, the lower the average complexity per input sample. For a feature volume of size $N^3$ with $M$ samples to be transformed and queried, our scheme replaces naive 3D NI per sample with a 2D FFT shared across the batch plus a lightweight 1D NI per sample. Notice that $M$ is typically much larger than $N$ in a single training batch. Therefore, our implementation only incurs a modest increase in computational cost over spatially-embedded acceleration schemes, while PREF reduces memory consumption, improves modeling/rendering quality via full spectral reconstruction, and provides convenient frequency manipulation.
5 Experiments
We evaluate our PREF on a number of natural signal processing tasks, including 2D image completion, signed distance field regression, and radiance field reconstruction. For each task, we tailor a solution based on our PREF.
5.1 Baselines
We first analyze and compare our PREF with three standard coordinate-based MLP backbones, which likewise focus on boosting capability or training efficiency.
Positional Encoding with Fourier Features. Fourier features tancik2020fourfeat aim to learn high-frequency functions in low-dimensional problem domains. They transform low-dimensional coordinates $\mathbf{x}$ to $\gamma(\mathbf{x})$ before passing them into an MLP, where $\gamma$ is defined as $\gamma(x) = \left[ \sin(2^k \pi x), \cos(2^k \pi x) \right]$ for $k = 0, \ldots, L-1$. Such a mapping is deterministic and robust to the hyperparameter $L$. However, the exponentially spaced, on-axis frequency series is insufficient to cover the complete frequency domain; therefore, PE is biased toward axis-aligned signals tancik2020fourfeat .

Periodic Activation Functions. SIREN sitzmann2019siren proposes an alternative encoding that applies the periodic activation $\sin(W\mathbf{x} + b)$ at every layer. A major advantage of SIREN is that it better preserves high-order gradients and subsequently supports network modulation by controlling the amplitude and phase of the active layer. A major challenge is that the periodic activation induces a mass of local minima; it thus requires careful network initialization for stable training.
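For concreteness, a common NeRF-style variant of the positional encoding above can be sketched as follows (the exact frequency scaling in the cited works may differ; `L` and the $\pi$ factor here are illustrative):

```python
import numpy as np

def positional_encoding(x, L=4):
    # x: (..., dim) coordinates, typically in [0, 1].
    # Returns (..., dim * 2L): sin/cos at exponentially spaced bands.
    x = np.asarray(x, dtype=float)
    bands = 2.0 ** np.arange(L) * np.pi          # pi, 2*pi, 4*pi, ...
    ang = x[..., None] * bands                   # (..., dim, L)
    enc = np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)

# A 2D coordinate expands to 2 * 2 * 4 = 16 deterministic features.
feat = positional_encoding(np.array([[0.25, 0.5]]), L=4)
```

Note that every band lies on a coordinate axis, which is exactly the axis-alignment bias discussed above.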
Spatial Feature Grids. The latest acceleration schemes, including DVGO sun2021direct , iNGP muller2022instant , and TensoRF chen2020tensor , use a learnable spatial feature grid together with a local transform: TensoRF uses orthogonal projection, DVGO uses linear interpolation, and iNGP uses hashing to map spatial coordinates into a learnable embedding space. These schemes greatly reduce the dependency on large MLPs and are highly efficient in training (e.g., a PyTorch implementation reduces NeRF training time from hours to minutes, and a CUDA implementation further reduces the time cost to seconds). Yet, as local interpolants, they struggle to maintain continuity while avoiding diminishing high-order gradients.
PREF. Our PREF instead employs a learnable embedding grid in the frequency domain. It continuously (after the inverse Fourier transform) and globally encodes spatial coordinates into a sum of inner products. We advocate the "global" characteristic of PREF because each phasor impacts all spatial locations. Next, we demonstrate that PREF simultaneously achieves robustness via frequency decomposition and maintains high-order gradients via periodic modeling, while remaining highly efficient for embedding-based learning with shallow MLPs.
5.2 Applications
We evaluate our PREF on three neural reconstruction tasks: image regression, SDF regression, and radiance field reconstruction. The choice of phasor volume size and MLP varies with the task, mainly for fair comparison with the aforementioned baselines; we give details under each task.
Table 1: Image inpainting on the Natural and Text datasets.
Method  Natural  Text
Dense Grid sun2021direct  
PE tancik2020fourfeat  
SIREN sitzmann2019siren  
Ours 
2D Image Regression and Reconstruction.
2D image regression evaluates the capability of the representation when supervised with all image pixels, whereas image reconstruction (inpainting) is trained on partial pixels and predicts the missing ones. We quantitatively evaluate the PSNR between the outputs and the ground-truth images for both tasks.
We first conduct pilot experiments on frequency learning under the image regression task, as shown in Figure 1. The dense grid setting, corresponding to the 3rd and the 5th columns, adopts linear interpolation in a learnable feature grid, optionally followed by an MLP; our PREF uses the same volume resolution as the dense grids and performs the standard inverse Fourier transform to predict the target image. Our method significantly outperforms the dense grid setting in both convergence speed and per-pixel reconstruction quality. We attribute this improvement to the periodic nature of our Fourier decomposition, which regularizes the signal globally rather than locally as in dense grids. We also show that local parameterization with linear interpolation can hurt generalization, producing noisy results and losing high-order gradients.
We further evaluate our PREF on the inpainting task, where we train on a regularly-spaced grid of observed pixels, with the remaining pixels held out, from each image in the Natural and Text datasets. In this experiment, our PREF consists of a phasor volume with reduced dimension d = 8 and a two-layer MLP. Following fathony2020multiplicative , we report test error on the unobserved pixels in Tab. 1. Our approach also achieves qualitatively better reconstruction results.
Table 2: Comparison on neural radiance field reconstruction.
Method  BatchSize  Steps  Time  Size (MB)  PSNR  SSIM
SRN sitzmann2019scene      10h    22.26  0.846 
NeRF mildenhall2020nerf  4096  300k  35h  5.0  31.01  0.947 
SNeRG hedman2021baking  8192  250k  15h  1771.5  30.38  0.950 
NSVF liu2020neural  8192  150k  48h    31.75  0.950 
PlenOctrees yu2021plenoctrees  1024  200k  15h  1976.3  31.71  0.958 
Plenoxels yu2021plenoxels  5000  128k  11.4m  778.1  31.71  0.958 
DVGO sun2021direct  5000  30k  15.0m  612.1  31.95  0.957 
TensoRF chen2022tensorf  4096  30k  17.4m  71.8  33.14  0.963 
Ours  4096  30k  18.1m  34.4  32.08  0.952 
SDF regression and editing. Next, we explore the capability of PREF for geometric representation mescheder2018occupancy ; park2019deepsdf and editing. We evaluate 3D shape regression from given point clouds together with their SDF values. We adopt Armadillo and Gargoyle, two widely used models. For each model, we normalize the training mesh into a bounding box and sample training points on the surface, around the surface (by adding Gaussian noise to surface points), and uniformly within the bounding box. We report the IoU between the ground-truth mesh and the regressed signed distance field by discretizing both into volumes. We also report the Chamfer distance by sampling surface points from the predicted mesh extracted with marching cubes lorensen1987marching . We implement DVGO ourselves and adopt a public implementation of iNGP (https://github.com/ashawkey/torchngp) for comparisons. We include detailed parameter choices in the supplementary. The quantitative evaluation in Tab. 4 demonstrates that we achieve results competitive with existing state-of-the-art methods while remaining more compact.
Our trained SDF model also allows implicit surface editing such as Gaussian smoothing, where we apply a pointwise multiplication of a Gaussian with the trained phasorial embedding, as shown in Fig. 4. This may be useful for surface denoising or texture removal.
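A toy version of this frequency-domain smoothing can be sketched as follows (sizes, sigma, and the Gaussian form are illustrative). By the convolution theorem, the pointwise product attenuates high frequencies and is equivalent to convolving the decoded field with a Gaussian kernel:

```python
import numpy as np

# Frequency-domain Gaussian smoothing of a toy 32x32 phasor volume.
N, sigma = 32, 4.0
rng = np.random.default_rng(0)
P = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))

u = np.fft.fftfreq(N, d=1.0 / N)              # signed integer frequencies
gauss = np.exp(-(u[:, None] ** 2 + u[None, :] ** 2) / (2 * sigma ** 2))
P_smooth = P * gauss                          # pointwise product in frequency

field = np.fft.ifft2(P).real
field_smooth = np.fft.ifft2(P_smooth).real

def tv(f):
    # Total variation as a crude roughness measure of the decoded field.
    return np.abs(np.diff(f, axis=0)).sum() + np.abs(np.diff(f, axis=1)).sum()
```

No spatial convolution is ever run; the edit is a single elementwise multiply on the learned coefficients.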
Armadillo  Gargoyle
Memory (MB)  IoU  Chamfer  IoU  Chamfer
iNGP muller2022instant  46.7  99.34  5.54e-6  99.42  1.03e-5
DVGO sun2021direct  128.0  98.81  5.67e-6  97.99  1.19e-5
PE tancik2020fourfeat  6.0  96.65  1.21e-5  80.46  1.03e-4
Ours  36.0  99.02  5.57e-6  99.05  1.06e-5
Neural Radiance Field Reconstruction. Finally, we evaluate PREF on the popular radiance field reconstruction task. NeRF reconstruction attempts to recover scene geometry and appearance given a set of multi-view input images with posed cameras. In this task, we evaluate the reconstruction quality of novel views, the model size for compactness, and the training speed for efficiency. In practice, we model density and color individually, then jointly optimize them via a volume rendering scheme mildenhall2020nerf supervised only with image color. However, this sometimes overfits the training views and produces floaters in empty space due to the strong model capacity. We therefore use the Parseval term described in Sec. 4.3 to regularize the embedding field. We set the expected volume size to roughly 36 MB of parameters; please see the supplementary for details. During optimization, we apply a coarse-to-fine training scheme starting from the low-frequency series and progressively unlocking the remaining high-frequency series until reaching the expected frequencies. Tab. 2 shows the quantitative results; our model is on par with state-of-the-art radiance field reconstruction approaches while achieving compact modeling and fast training.
6 Conclusion
We have presented a novel neural approach for compact modeling that decomposes a natural signal into Fourier series, together with a new approximate inverse transform scheme for efficient reconstruction. PREF produces high-quality images, shapes, and radiance fields from limited data and outperforms recent works. Benefiting from our physically meaningful Fourier decomposition and fast transformation, PREF allows explicit manipulation of the learned embedding at different frequencies. One interesting future direction is to apply our representation to 3D-aware image generation, where PREF can potentially help resolve the challenge of high-resolution rendering.
References

[1] M. Atzmon and Y. Lipman. SAL: Sign agnostic learning of shapes from raw data. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
 [2] P. Beatty, D. Nishimura, and J. Pauly. Rapid gridding reconstruction with a minimal oversampling ratio. IEEE Transactions on Medical Imaging, 24(6):799–808, 2005.
 [3] A. Chen, Z. Xu, A. Geiger, J. Yu, and H. Su. Tensorf: Tensorial radiance fields. arXiv preprint arXiv:2203.09517, 2022.
 [4] A. Chen, Z. Xu, F. Zhao, X. Zhang, F. Xiang, J. Yu, and H. Su. Mvsnerf: Fast generalizable radiance field reconstruction from multi-view stereo. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14124–14133, 2021.
 [5] W. Chen, X. Zhu, R. Sun, J. He, R. Li, X. Shen, and B. Yu. Tensor low-rank reconstruction for semantic segmentation. In European Conference on Computer Vision, pages 52–69. Springer, 2020.
 [6] Y. Chen, S. Liu, and X. Wang. Learning continuous image representation with local implicit image function. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8628–8638, June 2021.
 [7] R. Fathony, A. K. Sahu, D. Willmott, and J. Z. Kolter. Multiplicative filter networks. In International Conference on Learning Representations, 2020.
 [8] J. Fessler and B. Sutton. Non-uniform fast Fourier transforms using min-max interpolation. IEEE Transactions on Signal Processing, 51(2):560–574, 2003.
 [9] A. Gropp, L. Yariv, N. Haim, M. Atzmon, and Y. Lipman. Implicit geometric regularization for learning shapes. In International Conference on Machine Learning, pages 3789–3799. PMLR, 2020.
 [10] P. Hedman, P. P. Srinivasan, B. Mildenhall, J. T. Barron, and P. Debevec. Baking neural radiance fields for real-time view synthesis. arXiv preprint arXiv:2103.14645, 2021.
 [11] Z. Huang, S. Bai, and J. Z. Kolter. (Implicit)²: Implicit layers for implicit representations. Advances in Neural Information Processing Systems, 34, 2021.
 [12] C. Jensen, R. Reed, R. Marks, M. El-Sharkawi, J.-B. Jung, R. Miyamoto, G. Anderson, and C. Eggen. Inversion of feedforward neural networks: algorithms and applications. Proceedings of the IEEE, 87(9):1536–1549, 1999.
 [13] M. Ji, J. Gall, H. Zheng, Y. Liu, and L. Fang. SurfaceNet: An end-to-end 3D neural network for multi-view stereopsis. In Proc. ICCV, 2017.

[14] T. Karras, M. Aittala, S. Laine, E. Härkönen, J. Hellsten, J. Lehtinen, and T. Aila. Alias-free generative adversarial networks. In Proc. NeurIPS, 2021.
 [15] T. Karras, S. Laine, and T. Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4401–4410, 2019.
 [16] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 [17] M. Levoy and P. Hanrahan. Light field rendering. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, pages 31–42. ACM, 1996.
 [18] C.-H. Lin, W.-C. Ma, A. Torralba, and S. Lucey. Barf: Bundle-adjusting neural radiance fields. In IEEE International Conference on Computer Vision (ICCV), 2021.
 [19] L. Liu, J. Gu, K. Z. Lin, T.-S. Chua, and C. Theobalt. Neural sparse voxel fields. arXiv preprint arXiv:2007.11571, 2020.
 [20] W. E. Lorensen and H. E. Cline. Marching cubes: A high resolution 3d surface construction algorithm. SIGGRAPH Computer Graphics, 21(4):163–169, 1987.
 [21] J. N. Martel, D. B. Lindell, C. Z. Lin, E. R. Chan, M. Monteiro, and G. Wetzstein. Acorn: Adaptive coordinate networks for neural scene representation. arXiv preprint arXiv:2105.02788, 2021.
 [22] R. Martin-Brualla, N. Radwan, M. S. Sajjadi, J. T. Barron, A. Dosovitskiy, and D. Duckworth. NeRF in the wild: Neural radiance fields for unconstrained photo collections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7210–7219, 2021.
 [23] Q. Meng, A. Chen, H. Luo, M. Wu, H. Su, L. Xu, X. He, and J. Yu. Gnerf: GAN-based neural radiance field without posed camera. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6351–6361, 2021.
 [24] L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger. Occupancy networks: Learning 3d reconstruction in function space. Proc. CVPR, 2019.
 [25] B. Mildenhall, P. P. Srinivasan, R. OrtizCayon, N. K. Kalantari, R. Ramamoorthi, R. Ng, and A. Kar. Local light field fusion: Practical view synthesis with prescriptive sampling guidelines. ACM Transactions on Graphics (TOG), 38(4):1–14, 2019.
 [26] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. In European conference on computer vision, pages 405–421. Springer, 2020.
 [27] M. J. Muckley, R. Stern, T. Murrell, and F. Knoll. TorchKbNufft: A high-level, hardware-agnostic non-uniform fast Fourier transform. In ISMRM Workshop on Data Sampling & Image Reconstruction, 2020. Source code available at https://github.com/mmuckley/torchkbnufft.
 [28] T. Müller, A. Evans, C. Schied, and A. Keller. Instant neural graphics primitives with a multiresolution hash encoding. arXiv preprint arXiv:2201.05989, 2022.
 [29] M. Niemeyer, L. Mescheder, M. Oechsle, and A. Geiger. Differentiable volumetric rendering: Learning implicit 3D representations without 3D supervision. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020.
 [30] M. Oechsle, S. Peng, and A. Geiger. UNISURF: Unifying neural implicit surfaces and radiance fields for multi-view reconstruction. arXiv preprint arXiv:2104.10078, 2021.
 [31] J. J. Park, P. Florence, J. Straub, R. Newcombe, and S. Lovegrove. DeepSDF: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 165–174, 2019.
 [32] K. Park, U. Sinha, J. T. Barron, S. Bouaziz, D. B. Goldman, S. M. Seitz, and R. Martin-Brualla. Nerfies: Deformable neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5865–5874, 2021.
 [33] S. Peng, M. Niemeyer, L. Mescheder, M. Pollefeys, and A. Geiger. Convolutional occupancy networks. In European Conference on Computer Vision, pages 523–540. Springer, 2020.
 [34] C. R. Qi, H. Su, M. Nießner, A. Dai, M. Yan, and L. J. Guibas. Volumetric and multi-view CNNs for object classification on 3D data. In Proc. CVPR, 2016.
 [35] C. Reiser, S. Peng, Y. Liao, and A. Geiger. KiloNeRF: Speeding up neural radiance fields with thousands of tiny MLPs. arXiv preprint arXiv:2103.13744, 2021.
 [36] A. A. Rusu, D. Rao, J. Sygnowski, O. Vinyals, R. Pascanu, S. Osindero, and R. Hadsell. Meta-learning with latent embedding optimization. arXiv preprint arXiv:1807.05960, 2018.
 [37] V. Sitzmann, J. N. Martel, A. W. Bergman, D. B. Lindell, and G. Wetzstein. Implicit neural representations with periodic activation functions. In arXiv, 2020.
 [38] V. Sitzmann, M. Zollhöfer, and G. Wetzstein. Scene representation networks: Continuous 3D-structure-aware neural scene representations. arXiv preprint arXiv:1906.01618, 2019.
 [39] C. Sun, M. Sun, and H.-T. Chen. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. arXiv preprint arXiv:2111.11215, 2021.
 [40] T. Takikawa, J. Litalien, K. Yin, K. Kreis, C. Loop, D. Nowrouzezahrai, A. Jacobson, M. McGuire, and S. Fidler. Neural geometric level of detail: Real-time rendering with implicit 3D shapes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11358–11367, 2021.
 [41] M. Tancik, P. P. Srinivasan, B. Mildenhall, S. FridovichKeil, N. Raghavan, U. Singhal, R. Ramamoorthi, J. T. Barron, and R. Ng. Fourier features let networks learn high frequency functions in low dimensional domains. NeurIPS, 2020.
 [42] P. Wang, L. Liu, Y. Liu, C. Theobalt, T. Komura, and W. Wang. NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. arXiv preprint arXiv:2106.10689, 2021.
 [43] P.-S. Wang, Y. Liu, Y.-Q. Yang, and X. Tong. Spline positional encoding for learning 3D implicit signed distance fields. arXiv preprint arXiv:2106.01553, 2021.
 [44] D. N. Wood, D. I. Azuma, K. Aldinger, B. Curless, T. Duchamp, D. H. Salesin, and W. Stuetzle. Surface light fields for 3D photography. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pages 287–296. ACM Press/Addison-Wesley Publishing Co., 2000.
 [45] W. Xia, Y. Zhang, Y. Yang, J.-H. Xue, B. Zhou, and M.-H. Yang. GAN inversion: A survey. arXiv preprint arXiv:2101.05278, 2021.
 [46] L. Yariv, J. Gu, Y. Kasten, and Y. Lipman. Volume rendering of neural implicit surfaces. In Thirty-Fifth Conference on Neural Information Processing Systems, 2021.
 [47] L. Yariv, Y. Kasten, D. Moran, M. Galun, M. Atzmon, B. Ronen, and Y. Lipman. Multiview neural surface reconstruction by disentangling geometry and appearance. In Proc. NeurIPS, 2020.
 [48] A. Yu, S. Fridovich-Keil, M. Tancik, Q. Chen, B. Recht, and A. Kanazawa. Plenoxels: Radiance fields without neural networks. arXiv preprint arXiv:2112.05131, 2021.
 [49] A. Yu, R. Li, M. Tancik, H. Li, R. Ng, and A. Kanazawa. PlenOctrees for real-time rendering of neural radiance fields. arXiv preprint arXiv:2103.14024, 2021.
 [50] H. Yu, A. Chen, X. Chen, L. Xu, Z. Shao, and J. Yu. Anisotropic Fourier features for neural image-based rendering and relighting. In Proceedings of the AAAI Conference on Artificial Intelligence, 2022.
 [51] K. Zhang, G. Riegler, N. Snavely, and V. Koltun. NeRF++: Analyzing and improving neural radiance fields. arXiv preprint arXiv:2010.07492, 2020.
 [52] E. D. Zhong, T. Bepler, B. Berger, and J. H. Davis. CryoDRGN: Reconstruction of heterogeneous cryo-EM structures using neural networks. Nature Methods, 18(2):176–185, 2021.
Appendix A Phasorial Embedding Fields Implementation Details
Phasor Volume Decomposition. Recall that PREF is a continuous embedding field corresponding to a multi-channel, multi-dimensional square Fourier volume. We elaborate on the implementation details below. Let P be a 3D phasor volume representing an embedding field f.
Note that P is Hermitian symmetric when f is a real-valued feature embedding, i.e., P(-k) equals the complex conjugate of P(k). Further, based on the observation that natural signals are generally band-limited, we model their corresponding fields with band-limited phasor volumes: we partially mask out entries and factor the volume along the respective dimensions, as shown in Fig 5. We thus factorize the full spectrum into three thin embeddings by exploiting the linearity of the Fourier transform.
IFT Implementation. Recall that for a 3D phasor volume, we approximate the inverse Fourier transform (IFT) via sub-procedures that apply 2D Fast Fourier Transforms (FFTs) and 1D numerical integration (NI) to achieve high efficiency. Given a batch of spatial coordinates, our PREF representation transforms them into a batch of feature embeddings in parallel, so PREF can serve as a plug-and-play module. Such a module can be applied to many existing implicit neural representations to conduct task-specific neural field reconstructions. We present a PyTorch pseudocode sketch in Algorithm 1.
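Since Algorithm 1 is referenced above but not reproduced here, the following NumPy sketch illustrates the idea in a simplified 2D setting (the function name, shapes, and the nearest-neighbor lookup are illustrative; the actual implementation operates on 3D volumes with bilinear interpolation and PyTorch tensors):

```python
import numpy as np

def pref_features_2d(P, coords):
    """Evaluate a 2D complex phasor grid P of shape (C, Kx, Ny) at
    continuous coords in [0, 1)^2: an FFT-based inverse transform along
    the y axis, then per-sample numerical integration (a matrix
    product) over the remaining x frequencies."""
    C, Kx, Ny = P.shape
    # 1) inverse transform along y -> hybrid volume:
    #    frequency in x, spatial samples in y
    Py = np.fft.ifft(P, axis=-1) * Ny                 # (C, Kx, Ny)
    # 2) spatial lookup along y (nearest neighbor for simplicity)
    yi = np.clip((coords[:, 1] * Ny).astype(int), 0, Ny - 1)
    Pxy = Py[:, :, yi]                                # (C, Kx, B)
    # 3) per-sample numerical integration over x frequencies,
    #    equivalent to a tiny linear layer applied per sample
    kx = np.arange(Kx)
    basis = np.exp(2j * np.pi * np.outer(coords[:, 0], kx))  # (B, Kx)
    return np.einsum('ckb,bk->bc', Pxy, basis).real
```

At coordinates lying on the y grid, this matches a direct evaluation of the truncated Fourier series, while the x coordinate may be arbitrary.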
Phasor Volume Initialization. Our PREF approach can alternatively be viewed as a frequency-space learning counterpart to existing spatial coordinate-based MLPs. In our experiments, we found that zero initialization works well for applications ranging from 2D image regression to 5D radiance field reconstruction, while certain applications require more tailored initialization, e.g., geometric initialization in SAL ; yariv2020multiview . This is because the phasor volume, defined over frequency coordinates, must be consistent with the desired initial field. We thus initialize the phasor volume as follows: let g be the desired spatial initialization of the field. We transform g via the Fourier transform (exploiting the duality between a field and its spectrum) and use the result as the initialization of the phasor volume, so that the synthesized field approximates g. We found that such a strategy enhances stability and efficiency.
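A minimal sketch of this initialization strategy, assuming a hypothetical desired spatial initialization g (the real pipeline would derive g from, e.g., a geometric initialization):

```python
import numpy as np

def init_phasor_from_spatial(g):
    """Initialize the phasor volume as the Fourier transform of a
    desired spatial initialization g, so the synthesized field starts
    out approximating g."""
    return np.fft.fftn(g)

# Roundtrip: synthesizing the field from the initialized phasor
# volume recovers the desired spatial initialization.
g = np.random.default_rng(0).normal(size=(8, 8, 8))
P = init_phasor_from_spatial(g)
field = np.fft.ifftn(P).real
```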
Computation Time. One of the key benefits of PREF is its efficiency. As discussed in Section 4.3, we conduct frequency-based neural field reconstruction by employing the IFT, which is computationally cheap yet effective. When the input batch is sufficiently large (as in radiance field reconstruction), the per-sample numerical evaluation dominates the computational cost. Since this per-sample evaluation can be efficiently implemented as a matrix product, it is essentially equivalent to adding a tiny linear layer. The overall implementation makes PREF nearly as fast as the state of the art, e.g., Instant-NGP (iNGP) for NeRF. For example, on the Lego scene, our PyTorch PREF produces the final result in 16 minutes on a single RTX 3090, considerably faster than the original NeRF and comparable to the PyTorch implementation of iNGP. We are in the process of implementing an analogous CUDA version of PREF, which we hope will achieve performance comparable to the CUDA version of iNGP.
Appendix B Application to Image Regression
B.1 Implementation & Reproducibility Details
Pilot experiments. Before proceeding to more sophisticated tasks such as generation and reconstruction, we first explore a toy example that uses PREF to continuously parametrize an image, e.g., a grayscale image. To provide better insight into our frequency-based learning framework, we use a complex-valued grid to regress (upsample) the input image via the inverse Fourier transform, comparing PREF against bilinear upsampling with the same MLP network. From a signal processing perspective, if the image is band-limited, a frequency-based scheme should perfectly reconstruct it. Bilinear interpolation, by contrast, exhibits aliasing due to the characteristics of its first- and second-order derivatives, as shown in the discussion and Fig. 1 in the paper. We then show the performance using an embedding, where we expand the grid into an embedding volume followed by a three-layer MLP with a hidden dimension of 256 that maps the embeddings to pixels. Previous studies sitzmann2019siren
have shown that improper activation functions can also lead to aliasing in high-order gradients, regardless of the choice of embedding technique. Therefore, for a comprehensive study, we further compare several common activation functions, including ReLU, Tanh, and the more recent Sine
sitzmann2019siren . Our experiments show that such a frequency-learning scheme consistently outperforms its spatial counterparts, potentially owing to its well-behaved derivatives and continuous nature, as shown in the line plot of Fig. 1.
Image completion. Next, we demonstrate PREF on image completion tasks. We use the commonly adopted setting huang2021textrm ; fathony2020multiplicative : given 25% of the pixels of an image, we set out to predict the remaining pixels. We evaluate PREF against the state of the art on two benchmark datasets, Nature and Text. Specifically, we compare PREF with a dense-grid counterpart and two state-of-the-art coordinate-based MLPs martin2021nerf ; sitzmann2019siren . The dense grid uses a fixed resolution, whereas PREF uses two grids that correspond to the highest frequency of the embedding. The two embedding techniques above use the same MLP with three linear layers, 256 hidden dimensions, and ReLU activation. We also use Positional Encoding (PE), which consists of a 5-layer MLP with frequency encoding. We adopt SIREN from sitzmann2019siren , which uses a 4-layer MLP and sine activation. Detailed comparisons are listed in Tab 1.
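The pilot experiment's core claim, that frequency-space upsampling exactly reconstructs a band-limited signal while (bi)linear interpolation does not, can be verified with a small 1D NumPy sketch (the signal and grid sizes here are illustrative, not those of the paper):

```python
import numpy as np

def fourier_upsample(x, factor):
    """Upsample a periodic, band-limited signal by zero-padding its
    spectrum (the Nyquist bin is ignored, which is fine for signals
    band-limited below it)."""
    N = len(x)
    X = np.fft.fft(x)
    Y = np.zeros(N * factor, dtype=complex)
    Y[:N // 2] = X[:N // 2]          # positive frequencies
    Y[-(N // 2):] = X[-(N // 2):]    # negative frequencies
    return np.fft.ifft(Y).real * factor

N, factor = 32, 4
t = np.arange(N) / N
x = np.sin(2 * np.pi * 3 * t) + np.cos(2 * np.pi * 5 * t)  # band-limited
tm = np.arange(N * factor) / (N * factor)
truth = np.sin(2 * np.pi * 3 * tm) + np.cos(2 * np.pi * 5 * tm)
freq_err = np.abs(fourier_upsample(x, factor) - truth).max()
lin_err = np.abs(np.interp(tm, np.arange(N + 1) / N,
                           np.append(x, x[0])) - truth).max()
```

The frequency-domain upsampling is exact up to floating-point error, while piecewise-linear interpolation incurs a visible error proportional to the signal's second derivative.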
Optimization details. All experiments use the same training configuration. Specifically, we adopt the Adam optimizer adam with default parameters and a fixed learning rate, and train with a regression loss for a fixed number of iterations to produce the final results.
Method | Memory (MB) | Armadillo IoU | Armadillo Chamfer | Gargoyle IoU | Gargoyle Chamfer
iNGP muller2022instant | 46.7 | 99.34 | 5.54e-6 | 99.42 | 1.03e-5
DVGO sun2021direct | 128.0 | 98.81 | 5.67e-6 | 97.99 | 1.19e-5
PE tancik2020fourfeat | 6.0 | 96.65 | 1.21e-5 | 80.46 | 1.03e-4
Ours | 36.0 | 99.02 | 5.57e-6 | 99.05 | 1.06e-5
Appendix C Application to Signed Distance Field Reconstruction
C.1 Task description
Next, we conduct the more challenging task of signed distance field (SDF) reconstruction. An SDF describes a shape in terms of a function as:
SDF(x) = d(x, S) for x in Ω+, and SDF(x) = -d(x, S) for x in Ω-,   (9)
where S is a closed surface, Ω+ and Ω- correspond to the regions outside and inside the surface respectively, and d(x, S) is the Euclidean distance from a point x to the surface. Our goal is to recover a continuous SDF given a set of discretized samples of its value, commonly taken from a mesh.
C.2 Implementation & Reproducibility Details
Data preparation.
We adopt two widely used models: Gargoyle (50k vertices) and Armadillo (49k vertices). For each training epoch, we scale the model to fit within a bounding box and sample points for training: some on the surface, some around the surface (obtained by adding Gaussian noise to surface points), and the remainder uniformly sampled within the bounding box.
Metric. We report the IoU between the ground-truth mesh and the regressed signed distance field by discretizing both into volumes. We also report the Chamfer distance by sampling 30k surface points from the mesh extracted with the marching cubes technique lorensen1987marching .
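The sampling scheme above can be sketched in NumPy, using a unit sphere (whose signed distance is known in closed form) as a stand-in for the mesh; the point counts and noise scale below are illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sdf_batch(n_surface, n_near, n_uniform, noise=0.01):
    """Sample SDF training points: on-surface, near-surface
    (Gaussian-perturbed), and uniform in the bounding box [-1, 1]^3."""
    def on_sphere(n):
        p = rng.normal(size=(n, 3))
        return p / np.linalg.norm(p, axis=1, keepdims=True)
    surface = on_sphere(n_surface)
    near = on_sphere(n_near) + rng.normal(scale=noise, size=(n_near, 3))
    uniform = rng.uniform(-1.0, 1.0, size=(n_uniform, 3))
    pts = np.concatenate([surface, near, uniform])
    sdf = np.linalg.norm(pts, axis=1) - 1.0  # signed distance to unit sphere
    return pts, sdf
```

On-surface points have zero signed distance and near-surface points stay close to it, giving dense supervision near the zero level set.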
Baseline implementation details. For the embedding-based baselines, we use our own implementation of the dense-volume technique sun2021direct : a volume of learnable parameters that maps input coordinates to feature embeddings via trilinear interpolation. We adopt the PyTorch implementation of iNGP muller2022instant from torch-ngp (https://github.com/ashawkey/torchngp), which maintains a multi-level hash function to transform spatial coordinates into feature embeddings. We use a hash encoding with 16 levels and per-level feature dimension 2, so the output feature embedding has length 32. Please refer to muller2022instant for more details on the multi-level hash. For our PREF, we use three complex-valued volumes to nonlinearly transform the spatial coordinates into feature embeddings. For fairness, all embedding-based baselines and our PREF adopt the same MLP structure: 3 layers that progressively map the input embedding to 64-dimensional features and then to a scalar, with ReLU as the intermediate activation. Another baseline we compare against is the positional encoding (PE) based NeRF, which uses a wider and deeper coordinate-based MLP mildenhall2020nerf : we encode the input coordinates with six frequencies in PE and use an MLP with 8 linear layers, 512 hidden dimensions, and ReLU activation. Tab 4 lists the model size and performance of each baseline vs. PREF. Our method manages to be on par with the state-of-the-art iNGP muller2022instant at a compact model size, and outperforms its spatial counterpart sun2021direct and frequency-based predecessors. We attribute the improvement to PREF's globally continuous nature, which helps preserve details.
Training details. We provide additional details on how we train the baselines. As mentioned, in each epoch we sample a batch of points and regress their SDF values. The MAPE loss is used for error backpropagation. To optimize the networks, we use the Adam optimizer with its default momentum parameters. We start from an initial learning rate and reduce it at the 10th epoch. We use the same batch size to optimize all baselines and our method.
Method | Chair | Drums | Ficus | Hotdog | Lego | Materials | Mic | Ship | Mean | Size (MB)
PlenOctrees yu2021plenoctrees | 34.66 | 25.37 | 30.79 | 36.79 | 32.95 | 29.76 | 33.97 | 29.62 | 31.71 | 1976.3
Plenoxels yu2021plenoxels | 33.98 | 25.35 | 31.83 | 36.43 | 34.10 | 29.14 | 33.26 | 29.62 | 31.71 | 778.1
DVGO sun2021direct | 34.09 | 25.44 | 32.78 | 36.74 | 34.46 | 29.57 | 33.20 | 29.12 | 31.95 | 612.1
Ours | 34.95 | 25.00 | 33.08 | 36.44 | 35.27 | 29.33 | 33.25 | 29.23 | 32.08 | 34.4
Highest Freq | Chair | Drums | Ficus | Hotdog | Lego | Materials | Mic | Ship | Mean | Size (MB)
256 | 34.95 | 25.00 | 33.08 | 36.44 | 35.27 | 29.33 | 33.25 | 29.23 | 32.08 | 34.40
128 | 33.29 | 24.64 | 32.70 | 36.04 | 33.77 | 29.37 | 31.87 | 27.75 | 31.18 | 9.84
64 | 31.54 | 23.82 | 30.43 | 35.25 | 30.56 | 28.82 | 31.22 | 27.08 | 29.83 | 2.28
32 | 30.11 | 22.59 | 27.77 | 34.07 | 27.39 | 27.65 | 30.48 | 25.79 | 28.23 | 0.76
Appendix D Radiance Fields Reconstruction
D.1 Task description
For radiance fields, we focus on rendering novel views from a set of images with known camera poses. The RGB value of each pixel corresponds to a ray cast from the image plane. We adopt the volume rendering model martin2021nerf :
C = Σ_i T_i (1 - exp(-σ_i δ_i)) c_i,  with  T_i = exp(-Σ_{j<i} σ_j δ_j),   (10)
where σ_i and c_i are the density and color at sample location i along the ray, and δ_i is the interval between adjacent samples. We then optimize the rendered color C against the ground-truth color C* with an L2 loss:
L = Σ_rays || C - C* ||²,   (11)
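The volume rendering model of Eq. 10 can be sketched as a per-ray compositing routine (a NumPy sketch taking per-sample densities, colors, and intervals as inputs):

```python
import numpy as np

def render_ray(sigma, color, delta):
    """Composite per-sample (density, color) pairs along one ray:
    alpha_i = 1 - exp(-sigma_i * delta_i), weighted by the accumulated
    transmittance T_i, as in Eq. 10."""
    alpha = 1.0 - np.exp(-sigma * delta)                      # per-sample opacity
    T = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])  # transmittance
    weights = T * alpha
    return (weights[:, None] * color).sum(axis=0)
```

An opaque sample fully occludes everything behind it; the composited color is then compared against the ground-truth pixel via the L2 loss of Eq. 11.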
D.2 Implementation & Reproducibility Details
PREF model setting. We describe how PREF models the density σ and the radiance c. We use three phasor volumes cascaded with a two-layer MLP (hidden dimension 64, output dimension 1) to compute the density (a scalar), followed by Softplus to map the raw output to a positive-valued density. For the view-dependent radiance branch, we use a relatively large volume followed by a linear layer that outputs a feature embedding. To render view-dependent radiance, we follow the TensoRF pipeline chen2020tensor : we concatenate the resulting features with the positionally encoded view directions and feed them into a 2-layer MLP and a linear layer that maps the features to color with Sigmoid activation. All linear layers except the output layer use ReLU activation.
Rendering. To compare with the SOTA yu2021plenoxels ; sun2021direct ; chen2020tensor , we train each scene for a fixed number of iterations with a fixed batch of rays. We adopt a progressive training scheme that gradually unlocks higher frequencies over the course of training. Accordingly, the number of samples per ray progressively increases from about 384 to about 1024. This allows us to achieve more stable optimization by first covering the lower frequencies and adding high-frequency details later. During training, we maintain an alpha mask to skip empty space and avoid unnecessary evaluations.
Optimization. As mentioned in the paper, our PREF uses the Parseval regularizer to avoid overfitting, adding a weighted regularization term to the objective. Without regularization, PREF may overfit specific frequencies, as shown in Fig 6. On the NeRF synthetic dataset, PREF converges within minutes on a single RTX 3090 (16 minutes on Lego), with an initial learning rate gradually decayed by a factor of 10 during training. The Adam optimizer uses its default momentum parameters.
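By Parseval's theorem, penalizing the energy of the phasor coefficients is equivalent (up to a constant) to penalizing the energy of the synthesized field, so such a regularizer can be evaluated without an IFT. A sketch of the unweighted identity (the paper's regularizer may additionally weight frequencies, which this sketch omits):

```python
import numpy as np

def parseval_energy(P):
    """Field energy computed directly from phasor coefficients: by
    Parseval, sum |f|^2 = (1/N) sum |P|^2 for an N-element volume
    under NumPy's unnormalized FFT convention."""
    return (np.abs(P) ** 2).sum() / P.size

g = np.random.default_rng(0).normal(size=(8, 8, 8))  # a sample field
P = np.fft.fftn(g)
```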
D.3 Additional results
We report the per-scene results of our PREF on the Synthetic-NeRF dataset in Tab 5. To further evaluate the effectiveness of our frequency encoding, we report in Tab 6 the performance under different model sizes (obtained by varying the phasor volume size). Notice that PREF produces reasonable results (a mean PSNR of 28.23) even when the model size is reduced to an ultra-small 0.76 MB, a potential benefit for downstream generative tasks that require training thousands of scenes.
Appendix E Application to Shape Editing
E.1 Implementation details
Recall that the continuous embedding field of PREF is synthesized from a phasor volume over various frequencies. Thanks to the Fourier transform, various tools such as convolution on the continuous embedding field can therefore be conveniently and efficiently implemented as multiplications. This is a unique advantage of PREF compared with its spatial embedding alternatives chen2020tensor ; sun2021direct ; yu2021plenoxels ; muller2022instant .
Let Φ and P be the optimized MLP and phasor volume, respectively, and let F⁻¹ denote the inverse Fourier transform. Recall that we obtain the reconstructed field as f = Φ(F⁻¹(P)). Modification of the original signal via convolution-based filtering can now be derived as:
f_k = Φ(F⁻¹(P ⊙ k)),   (12)
where ⊙ denotes element-wise multiplication and k is a filter defined in the frequency domain.
Now we explore how to manipulate the field via the optimized phasor volume P and kernel k. For simplicity, we only consider the Gaussian filter, though more sophisticated filters can be applied in the same way. Assume
k(ω) = exp(-||ω||² / (2σ²)),   (13)
where ω covers the complete frequency span of P; that is, we scale the magnitude of the phasor features frequency-wise. For example, by varying the Gaussian kernel size σ, PREF can denoise the neural representation of the signal at different scales, as shown in Fig 7.
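The Gaussian filtering of Eq. 13 can be sketched in NumPy by scaling each frequency bin of a phasor volume (integer frequency coordinates obtained via fftfreq; the helper name is illustrative):

```python
import numpy as np

def gaussian_filter_phasor(P, sigma):
    """Scale each frequency bin of phasor volume P by
    exp(-||w||^2 / (2 sigma^2)): a frequency-domain Gaussian low-pass.
    Smaller sigma removes more high-frequency detail."""
    freqs = np.meshgrid(*[np.fft.fftfreq(n, d=1.0 / n) for n in P.shape],
                        indexing='ij')
    w2 = sum(f ** 2 for f in freqs)
    return P * np.exp(-w2 / (2 * sigma ** 2))
```

The DC component is untouched (the kernel equals 1 at ω = 0), while high-frequency coefficients shrink, which is exactly the frequency-wise magnitude scaling described above.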
Appendix F Limitations
We have demonstrated that PREF enables fast reconstruction of neural signals in the phasor (frequency) space, with smaller model sizes, comparable and sometimes better performance, and more efficient filtering capabilities. Compared with existing spatial embedding techniques, however, PREF requires additional computation for conducting Fourier transforms and is therefore slightly slower than prior art such as iNGP (PyTorch). Our immediate next step is to implement a CUDA version of PREF. However, certain autograd libraries do not readily support optimizing complex-valued parameters, so additional effort is required to write customized CUDA modules for PREF.
Similar to PE martin2021nerf , PREF masks out certain frequencies in the phasor volume to achieve efficiency and compactness. This may lead to directional bias, as observed in prior art tancik2020fourfeat . However, since PREF uses more frequencies (3D sparse frequencies) than PE (axis-aligned 1D frequencies), it effectively reduces such artifacts, as shown in the experiments. For further improvement, one may adopt the Non-uniform Fast Fourier Transform (NuFFT) fessler03 ; beatyy05 ; muckley20 to tackle non-uniform frequency sampling. Overall, by providing a new frequency perspective on neural signal representation, PREF may stimulate significant future work. To that end, we intend to make our code and data available to the community on GitHub.