1 Introduction
Recent work by Lin et al. [12] – Bundle-Adjusting Neural Radiance Fields (BARF) – revealed that an architecturally modified Neural Radiance Field (NeRF) [18] could effectively solve the joint task of scene reconstruction and pose optimization. One crucial insight from this work is that error backpropagation to the pose parameters in traditional NeRF is hampered by large gradients due to the high-frequency components in the positional embedding. To ameliorate this effect, the authors proposed a coarse-to-fine scheduler that gradually enables the frequency support of the positional embedding layer throughout the joint optimisation. Although it achieves impressive results, this workaround requires careful tuning of the frequency scheduling process through a cumbersome multi-dimensional parameter sweep. In this paper we investigate whether this coarse-to-fine strategy can be bypassed through other means, simplifying the approach and potentially opening up new avenues for improvement.

NeRF is probably the most popular application of coordinate multi-layer perceptrons (MLPs). NeRF maps an input 5D coordinate (3D position and 2D viewing direction) to the scene properties (view-dependent emitted radiance and volume density) of the corresponding location. A crucial ingredient of most coordinate-MLPs is positional encoding. Traditional MLPs suffer from
spectral bias – i.e., they are biased towards learning low-frequency functions – when used for signal reconstruction. Thus, MLPs in their rudimentary form are not ideal for encoding natural signals with fine detail, which entails modeling large fluctuations [23]. To circumvent this issue, NeRF architecturally modifies the MLP by projecting the low-dimensional coordinate inputs to a higher-dimensional space using a positional embedding layer, which allows NeRF to learn high-frequency components of the target function rapidly [35, 18].

Of late, there has been increasing advocacy for self-contained coordinate networks. Of particular note in this regard is the work of Sitzmann et al. [30], who advocated that simply replacing conventional activation functions (e.g., ReLU) with sine removes the need for any type of positional embedding. Although showing promise, such sine-MLPs have been found experimentally to be sensitive to weight initialization [30, 24]. While Sitzmann et al. [30] proposed an initialization scheme that helps sine-MLPs achieve faster convergence when solving for signal reconstruction, their deployment within NeRF has been limited, with most of the community still opting for positional embedding with conventional activations.

Contributions: In this paper we draw inspiration from recent work [24] that has advocated for a broader class of effective activation functions – beyond sine – that can also circumvent the need for positional encoding. Of particular note in this regard are Gaussian activations. To our knowledge, their use in joint signal recovery and pose estimation has not been previously explored. We illustrate that these activations can preserve the first-order gradients of the target function better than conventional activations enhanced with positional embedding layers. When applied to the BARF setting – simultaneously solving for pose and radiance field reconstruction – sine-MLPs are quite susceptible to local minima (even with good initialization), but our proposed Gaussian Activated neural Radiance Fields (GARF) exhibit robust state-of-the-art performance.
In summary, we present the following contributions:

We present GARF, a self-contained approach for reconstructing neural radiance fields from imperfect camera poses without cumbersome hyperparameter tuning or careful model initialisation.

We establish theoretical insights into the effect of Gaussian activations on the joint optimisation of neural radiance fields and camera poses, supported by extensive empirical results.
We demonstrate that our proposed GARF can successfully recover scene representations from unknown camera poses, even on challenging scenes with low-textured regions, paving the way to unlocking NeRF for real-world applications.
2 Related Work
2.1 Neural Scene Representations.
Recent works have demonstrated the potential of multi-layer perceptrons (MLPs) as continuous and memory-efficient representations for 3D geometry, including shapes [5, 4], objects [16, 1, 20] and scenes [31, 8, 30]. Using 3D data such as point clouds as supervision, these approaches typically optimise signed distance functions [20, 8] or binary occupancy fields [16, 4]. To alleviate the dependency on 3D training data, several methods formulate differentiable rendering functions which enable the networks to be optimised using multi-view 2D images [31, 19, 18, 36]. Of particular interest is NeRF [18], which models the continuous radiance field of a scene using a coordinate-MLP in a volume rendering framework by minimising photometric errors. Due to its simplicity and unprecedented high-fidelity novel view synthesis, NeRF has attracted wide attention across the vision community [21, 2, 14, 37, 34, 44]. Numerous extensions have been made on many fronts, e.g., faster training and inference [2, 43, 27, 13], deformable fields [21], dynamic scene modeling [11, 40, 3], generalisation [38, 29] and pose estimation [12, 39, 42, 15, 7, 33, 32].
2.2 Positional Embedding for Pose Estimation.
Positional embedding is an integral component of MLPs [35, 25, 46] which enables them to learn high-frequency functions in low-dimensional domains. One of the earliest roots of this approach can be traced to the work of Rahimi et al. [23], who discovered that random Fourier features can be used to approximate an arbitrary stationary kernel function. Leveraging this insight, Mildenhall et al. [18, 35] recently demonstrated that encoding input coordinates with sinusoids allows MLPs to represent higher-frequency content, enabling high-fidelity neural scene reconstruction for novel view synthesis.
Despite the ability of positional embedding to enable MLPs to represent high-frequency components, choosing the right frequency scale is critical and often involves cumbersome parameter tuning: if the bandwidth of the signal is increased excessively, coordinate-MLPs tend to produce noisy signal interpolations [35, 26, 6].

More recently, there has been increasing interest in using coordinate-MLPs to tackle the joint problem of neural scene reconstruction and pose optimization [12, 39, 42, 15, 7, 33, 32, 47]. Remarkably, Lin et al. [12] demonstrated that coordinate-MLPs entail an unanticipated drawback in camera registration – i.e., large gradients due to high-frequency components in the positional encoding function can hamper error backpropagation to the pose parameters. Based on this observation, they proposed a workaround that anneals each component of the frequency function in a coarse-to-fine manner. By enabling a smoother trajectory for the optimisation problem, they show that such a strategy leads to better pose estimation compared to full positional encoding. Unlike BARF, we take a different stance – i.e., is there a self-contained architecture that can tackle the pose estimation problem optimally while attaining high-fidelity neural scene reconstruction without positional embedding?
2.3 Embedding-free Coordinate-networks.
Sitzmann et al. [30] alternatively proposed sinusoidal activation functions, which enable coordinate-MLPs to encode high-frequency functions without a positional embedding layer. Despite their potential, networks that employ sinusoidal activations are hypersensitive to the initialisation scheme [30, 24, 26]. Taking a step further, Ramasinghe et al. [24] recently broadened the understanding of the effect of different activations in MLPs. They proposed a class of novel non-periodic activations that enjoy more robust performance against random initialisation than sinusoids. Our work differs significantly from the above-mentioned works: while we also advocate for a simple and robust embedding-free coordinate network, we focus on the joint problem of high-fidelity neural scene reconstruction and pose estimation.
3 Method
In this section, we provide an exposition of our problem formulation and of different classes of coordinate networks, characterising the relative merits of each class for the joint optimisation of neural scene reconstruction and pose estimation.
3.1 Formulation
We first present the formulation of recovering the 3D neural radiance field of NeRF [18] jointly with the camera poses. We denote by $p$ the camera pose transformations and by $f$ the network in NeRF, respectively. NeRF encodes the volumetric field of a 3D scene using a coordinate-network $f$, which maps each input 3D coordinate $\mathbf{x} \in \mathbb{R}^3$ to its corresponding volume density $\sigma$ and directional emitted colour $\mathbf{c}$, i.e., $f: \mathbf{x} \mapsto (\mathbf{c}, \sigma)$, where $\Theta$ denotes the network weights.¹

¹ $f$ is also conditioned on the viewing direction to model view-dependent effects, which we omit in the derivation for simplicity.
Let $\mathbf{u} \in \mathbb{R}^2$ be the pixel coordinates and $\mathcal{I}$ be the imaging function. Given a set of $M$ images $\{\mathcal{I}_i\}_{i=1}^{M}$, we aim to solve for the volumetric radiance field of a 3D scene and the camera poses $\{p_i\}$ by minimizing the photometric loss as

$$\min_{p_1,\dots,p_M,\,\Theta} \sum_{i=1}^{M} \sum_{\mathbf{u}} \left\| \hat{\mathcal{I}}(\mathbf{u}; p_i, \Theta) - \mathcal{I}_i(\mathbf{u}) \right\|_2^2 \qquad (1)$$
First, we assume the rendering operation of NeRF is performed in the camera coordinate system. Expressing the pixel coordinate $\mathbf{u}$ in homogeneous coordinates as $\bar{\mathbf{u}} = [\mathbf{u}; 1]^\top$, we can define a 3D point along the camera ray sampled at depth $t$ as $\mathbf{x}(t) = t\,\bar{\mathbf{u}}$. The estimated RGB colour at pixel coordinate $\mathbf{u}$ is then computed by aggregating the predicted $\mathbf{c}$ and $\sigma$ along the ray as

$$\hat{\mathcal{I}}(\mathbf{u}) = \int_{t_n}^{t_f} T(\mathbf{u}, t)\, \sigma(\mathbf{x}(t))\, \mathbf{c}(\mathbf{x}(t))\, dt, \qquad (2)$$

where $T(\mathbf{u}, t) = \exp\!\big(-\int_{t_n}^{t} \sigma(\mathbf{x}(s))\, ds\big)$, and $t_n$ and $t_f$ are the bounds of the depth range of interest; see [10] for more details of the volume rendering operation. In practice, the integral is commonly approximated using quadrature [18], which evaluates the network at a discrete set of $N$ points obtained through stratified sampling [18] at depths $\{t_k\}_{k=1}^{N}$. This entails $N$ queries of the network $f$, whose outputs $\{(\mathbf{c}_k, \sigma_k)\}_{k=1}^{N}$ are composited through volume rendering. Denoting the ray compositing function as $g$, we can rewrite $\hat{\mathcal{I}}(\mathbf{u})$ as $g(\mathbf{c}_1, \dots, \mathbf{c}_N, \sigma_1, \dots, \sigma_N)$. Given a camera pose $p$, we can transform a 3D point $\mathbf{x}$ in the camera coordinate system to the world coordinate system through a 3D rigid transformation $W(\mathbf{x}; p)$ to obtain the synthesized image as

$$\hat{\mathcal{I}}(\mathbf{u}; p) = g\big(\mathbf{c}(W(\mathbf{x}_1; p)), \dots, \mathbf{c}(W(\mathbf{x}_N; p)),\ \sigma(W(\mathbf{x}_1; p)), \dots, \sigma(W(\mathbf{x}_N; p))\big). \qquad (3)$$
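The quadrature approximation of Eq. (2) can be sketched as follows; `composite` is a hypothetical stand-in for the compositing function $g$, with toy density and colour samples in place of the network outputs.

```python
import math

def composite(sigmas, colors, ts):
    """Quadrature approximation of the volume-rendering integral:
    C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    with T_i = exp(-sum_{j<i} sigma_j * delta_j) and delta_i = t_{i+1} - t_i."""
    C = [0.0, 0.0, 0.0]
    acc = 0.0  # accumulated optical depth sum_j sigma_j * delta_j
    for i in range(len(ts) - 1):
        delta = ts[i + 1] - ts[i]
        T = math.exp(-acc)                         # transmittance up to sample i
        alpha = 1.0 - math.exp(-sigmas[i] * delta)  # opacity of this segment
        w = T * alpha                               # compositing weight
        for k in range(3):
            C[k] += w * colors[i][k]
        acc += sigmas[i] * delta
    return C
```

An opaque sample near the camera dominates the composited colour, while zero density everywhere yields black, matching the behaviour of the continuous integral.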
We solve the optimization problem (1) using gradient descent. Next, we give a brief exposition of coordinate-networks and compare them.
3.2 Coordinate-networks
Coordinate-networks are a special class of MLPs that are used to encode signals as trainable weights. An MLP $\Phi$ with $n$ layers can be formulated as

$$\Phi(\mathbf{x}) = \mathbf{W}_n (\varphi_{n-1} \circ \cdots \circ \varphi_1)(\mathbf{x}) + \mathbf{b}_n, \qquad \varphi_i(\mathbf{h}) = \phi(\mathbf{W}_i \mathbf{h} + \mathbf{b}_i), \qquad (4)$$

where $\mathbf{W}_i$ are the trainable weights of the $i$-th layer, $\mathbf{b}_i$ is the bias, and $\phi$ is a non-linear activation function. With this definition in hand, we briefly discuss several types of coordinate-networks below.
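As a concrete (if minimal) illustration of Eq. (4), the sketch below runs a forward pass of such an MLP in plain Python, with the final layer kept linear; the helper name `mlp_forward` is ours, not from the paper.

```python
def mlp_forward(x, weights, biases, act):
    """Evaluate an MLP as in Eq. (4): h <- act(W_i h + b_i) per hidden
    layer, with the last layer linear (no activation)."""
    h = list(x)
    for i, (W, b) in enumerate(zip(weights, biases)):
        # matrix-vector product plus bias, one output per row of W
        z = [sum(w * v for w, v in zip(row, h)) + bj for row, bj in zip(W, b)]
        h = z if i == len(weights) - 1 else [act(v) for v in z]
    return h
```

Plugging in different `act` functions (ReLU, sine, Gaussian) yields the network classes discussed next.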
3.2.1 ReLU-MLPs:
employ the ReLU activation function $\phi(x) = \max(0, x)$. Despite being universal approximators in theory, ReLU-MLPs are biased towards learning low-frequency functions [41, 22], making them suboptimal candidates for encoding natural signals with high fidelity. To circumvent this issue, various methods have been proposed in the literature, which we discuss next.
3.2.2 PE-MLPs:
are the most widely adopted class of coordinate-networks, popularized by the seminal work of [18] through the use of positional embedding (PE). In PE-MLPs, the low-dimensional input coordinates $\mathbf{x}$ are projected to a higher-dimensional hypersphere via a positional embedding layer $\gamma$, which takes the form

$$\gamma(\mathbf{x}) = \left[ \sin(2^0 \pi \mathbf{x}), \cos(2^0 \pi \mathbf{x}), \dots, \sin(2^{L-1} \pi \mathbf{x}), \cos(2^{L-1} \pi \mathbf{x}) \right], \qquad (5)$$

where $L$ is a hyperparameter that controls the total number of frequency bands. After computing (5), the embedded input points $\gamma(\mathbf{x})$ are passed through a conventional ReLU-MLP to obtain $(\mathbf{c}, \sigma)$.
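A sketch of the embedding in Eq. (5), assuming the standard NeRF parameterisation with frequencies $2^k \pi$ applied elementwise to each coordinate:

```python
import math

def positional_embedding(x, L):
    """gamma(x): map each coordinate through sin/cos at frequencies
    2^0 * pi, ..., 2^(L-1) * pi, as in Eq. (5)."""
    out = []
    for k in range(L):
        freq = (2.0 ** k) * math.pi
        for xi in x:
            out.append(math.sin(freq * xi))
            out.append(math.cos(freq * xi))
    return out
```

A 3D input with L frequency bands thus becomes a 6L-dimensional embedding, which is then fed to the ReLU-MLP.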
3.2.3 Sine-MLPs:
are a coordinate-network type without a positional embedding, as proposed by [30]. In sine-MLPs, the activation function is a sinusoid of the form

$$\phi(x) = \sin(\omega_0 x), \qquad (6)$$

where $\omega_0$ is a hyperparameter. A larger $\omega_0$ increases the bandwidth of the network, allowing it to encode increasingly higher-frequency functions.

3.2.4 Gaussian-MLPs:
are a recent class of positional-embedding-less coordinate-networks [24], where the activation function is defined as

$$\phi(x) = \exp\!\left(-\frac{x^2}{2\sigma^2}\right).$$

Here, $\sigma$ is a hyperparameter that can be used to tune the bandwidth of the network: a larger $\sigma$ corresponds to a lower bandwidth, and vice versa.
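For reference, the two embedding-free activations can be written in a few lines. This is a sketch assuming the parameterisations $\sin(\omega_0 x)$ and $\exp(-x^2/2\sigma^2)$; the exact constants may differ from those used in [24] and [30].

```python
import math

def sine_act(x, omega0=30.0):
    """Sine activation; larger omega0 -> wider bandwidth."""
    return math.sin(omega0 * x)

def gaussian_act(x, sigma=0.1):
    """Gaussian activation exp(-x^2 / (2 sigma^2));
    larger sigma -> lower bandwidth."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))
```

Note that the Gaussian is smooth, non-periodic and symmetric about zero, and its spread is governed by the single scalar `sigma`.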
3.3 GARF for Reconstruction and Pose Estimation
In this paper, we advocate the use of Gaussian-MLPs for jointly solving pose estimation and scene reconstruction, and show substantial empirical evidence that they yield better accuracy and easier optimization than the other choices. We speculate the reason for this superior performance as follows. The pose parameters are optimized using the gradients flowing through the network. Hence, the ability to accurately represent the first-order derivatives of the encoded signal plays a key role in optimizing the pose parameters. However, Sitzmann et al. [30] showed that PE-MLPs are incapable of accurately modeling the first-order derivatives of the target signal, resulting in noisy artifacts. This impacts the Fourier spectrum of the network function, which is implicitly related to the derivatives. It was shown in [26] that the Fourier transform of a shallow Gaussian-MLP takes the form

(7)

where $\mathbf{k}$ is the frequency index, $\delta$ is the Dirac delta distribution which concentrates along the line spanned by the weight vectors $\mathbf{w}_j$ of the first layer. Note that Eq. (7) is a smooth distribution parameterized by $\sigma$ and the $\mathbf{w}_j$'s. In other words, for a suitably chosen $\sigma$, the bandwidth of the network can be increased in a continuous manner by appropriately learning the weights. Furthermore, as $\sigma$ is a continuous parameter, it provides the MLP with the ability to smoothly manipulate its spectrum.

In contrast, [45] demonstrated that the spectrum of a PE-MLP tends to consist of discrete spikes placed at the integer harmonics of the positional embedding frequencies. Approximating the ReLU function via a polynomial, they showed that the spectrum is concentrated on a discrete frequency set determined by integer combinations of the embedding frequencies.
Recall that in order to increase the frequency support of the positional embedding layer, one needs to increase $L$. It is evident that increasing $L$ even by one adds many harmonic spikes to the spectrum at the high-frequency end, irrespective of the network weights. Therefore, it is not possible to manipulate the spectrum of a PE-MLP continuously in a controlled manner. This can result in unnecessary high-frequency components that lead to unwanted artifacts.
On the other hand, sine-MLPs are able to construct rich spectra and represent first-order derivatives accurately [30]. A drawback, however, is that sine-MLPs are extremely sensitive to initialization. Sitzmann et al. [30] proposed an initialization scheme for sine-MLPs in signal reconstruction, under which they show strong convergence properties. However, we empirically demonstrate that when jointly optimizing the pose parameters and the scene reconstruction, this initialization yields subpar performance, making sine-MLPs highly likely to get trapped in local minima. We also show that, in comparison, Gaussian-MLPs exhibit far superior convergence properties, indicating that they entail a simpler loss landscape.
4 Experiments
This section validates and analyses the effectiveness of our proposed GARF against other coordinate networks. We first present an analysis on a 2D planar image alignment problem, and then demonstrate extensive results on learning NeRF from unknown camera poses.
4.1 2D Planar Image Alignment
To develop intuition, we first consider the 2D planar image alignment problem. More specifically, let $\mathbf{u} \in \mathbb{R}^2$ be the 2D pixel coordinates and $\mathcal{I}$ the image; we aim to optimize a neural image representation parameterised by the weights $\Theta$ of a coordinate network $f$ while also solving for the warp parameters $\{p_i\}$ as

$$\min_{p_1,\dots,p_M,\,\Theta} \sum_{i=1}^{M} \sum_{\mathbf{u}} \left\| f(\mathcal{W}(\mathbf{u}; p_i); \Theta) - \mathcal{I}_i(\mathbf{u}) \right\|_2^2, \qquad (8)$$

where $\mathcal{W}$ denotes the warp function parameterised by $p_i$. Given $M$ patches from the image generated with random homography perturbations, we aim to jointly estimate the unknown homography warp parameters $\{p_i\}_{i=1}^{M}$ and the network weights $\Theta$. We fix the gauge freedom by anchoring the warp of the first patch to the identity; see Fig. 1 for an example.
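For concreteness, applying a homography warp to a pixel coordinate looks as follows; the 3x3 matrix `H` is one possible realisation of the warp $\mathcal{W}$ (in practice the warp is driven by the low-dimensional parameters $p$ rather than a raw matrix).

```python
def warp_homography(u, H):
    """Warp a 2D point u = (x, y) by a 3x3 homography H using
    homogeneous coordinates, then de-homogenise."""
    x, y = u
    xp = H[0][0] * x + H[0][1] * y + H[0][2]
    yp = H[1][0] * x + H[1][1] * y + H[1][2]
    wp = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xp / wp, yp / wp)
```

With the identity homography the point is unchanged, which is exactly the gauge anchoring applied to the first patch.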
4.1.1 Experimental settings.
We compare our proposed GARF with the following networks: a PE-MLP with a coarse-to-fine embedding annealer (BARF) [12] and a sine-MLP (SIREN) [30]. We use an MLP with 256-dimensional hidden units for all networks. We use the Adam optimizer to optimize both the network weights $\Theta$ and the warp parameters $\{p_i\}$. For GARF and BARF, the learning rates for $\Theta$ and $\{p_i\}$ decay exponentially; for SIREN, we use a single exponentially decaying learning rate for both. For BARF, we linearly anneal the frequency bands of (5), using the same parameters as proposed in [12]. At each optimization step, we randomly sample a subset of the pixel coordinates for each patch.
4.1.2 Initialisation.
4.1.3 Results.
We show the quantitative and qualitative registration results in Fig. 2. As GARF correctly estimates the warp parameters of all patches, it can reconstruct the image with high fidelity. On the other hand, BARF and SIREN struggle with the image reconstruction due to misalignment. It is important to note that the Gaussian-MLP initialisation protocol puts the proposed method at a disadvantage, which further demonstrates the robustness of Gaussian-MLPs to initialisation.
4.1.4 Firstorder derivatives analysis.
For completeness, we first inspect the first-order derivatives of each coordinate network when solving a pure image reconstruction task, i.e., fitting $f(\mathbf{u}; \Theta)$ to $\mathcal{I}(\mathbf{u})$; note that we use the same notation as in Eq. (8). As discussed in Sec. 3.3, the ability to accurately represent the first-order derivatives of the encoded signal plays a crucial role in optimizing the pose parameters. Fig. 3 reinforces that the first-order derivative of the signal encoded by a PE-MLP contains many noise artifacts, resulting in poor error backpropagation to the pose parameters. While a properly initialised SIREN is capable of representing the derivatives of the signal when solving for signal reconstruction alone, the sine-activation initialisation strategy is suboptimal when jointly optimizing for neural image reconstruction and warp; as a result, the resulting function derivative is no longer well-behaved; see Fig. 4. In contrast, GARF exhibits far superior convergence properties, even though its model weights are initialised randomly.
4.1.5 Robustness of initialisation scheme.
Additionally, we run a simple experiment to investigate the sensitivity of SIREN and GARF to initialisation. We denote by $\Theta^{*}$ the optimal model weights, obtained by solving Eq. (8) for a neural image representation with the warp parameters fixed, and by $\Theta_0$ the randomly initialised model weights, i.e., weights initialised using the PyTorch default initialisation. Our goal is to solve the joint optimisation problem of Eq. (8) by initialising the model with weights blended between $\Theta^{*}$ and $\Theta_0$, i.e., by linearly adjusting a factor $\alpha$. As shown in Fig. 5, GARF (green curve) is only marginally affected by the initialisation, while SIREN (blue curve) fails drastically (starting from $\alpha = 0.3$). When SIREN is initialised carefully using the scheme proposed by Sitzmann et al. [30] (red curve), its performance decreases as $\alpha$ gradually increases, i.e., as the perturbation to the optimal model weights increases. Note that the variance in performance for GARF is much smaller than for SIREN.
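One way to realise the perturbation protocol above is linear blending between the optimal and the random weights; this parameterisation is our reading of the experiment, not necessarily the paper's exact scaling.

```python
def blend_weights(theta_opt, theta_rand, alpha):
    """theta(alpha) = (1 - alpha) * theta_opt + alpha * theta_rand:
    alpha = 0 recovers the optimal weights, alpha = 1 the random ones."""
    return [(1.0 - alpha) * o + alpha * r
            for o, r in zip(theta_opt, theta_rand)]
```

Sweeping `alpha` from 0 to 1 then traces out increasingly large perturbations of the optimal weights.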
4.1.6 Generalisation of coarse-to-fine scheduling.
We exhaustively search through the log-space for the optimal coarse-to-fine schedulers for BARF; see the supp. material for more details. The optimal coarse-to-fine hyperparameters for each image are data-dependent, i.e., the hyperparameters tuned for one image may not be optimal for another. In contrast to multi-dimensional coarse-to-fine schedulers, the Gaussian activation function involves only a one-dimensional search space, i.e., $\sigma$ in (7).
4.2 3D NeRF: Real World Scenes
This section investigates the task of jointly learning neural 3D representations with NeRF [18] on real-world scenes where the camera poses are unknown. We evaluate all methods on the standard LLFF benchmark [17], which consists of 8 real-world forward-facing scenes captured with handheld cameras.
4.2.1 Experimental Settings.
We compare our proposed GARF with BARF and a reference NeRF (refNeRF). As we empirically observe that a PE-MLP with a scheduler (BARF) achieves better performance than a plain PE-MLP [39] in the joint optimisation of the neural radiance field and camera poses, we opt not to include comparisons with the plain PE-MLP here; see [12] or the supp. material for those comparisons. We parameterise the camera poses with the $\mathfrak{se}(3)$ Lie algebra and initialise them as identity for GARF and BARF. We assume known intrinsics.
4.3 Implementation Details.
We implement our framework following the settings of [18, 12] with some modifications. For simplicity, we train a single 6-layer MLP with 256 hidden units in each layer and without hierarchical sampling. We downsize the images and randomly sample 2048 pixel rays every iteration, each ray sampled at a fixed number of depth coordinates. We use the Adam optimizer [9] and train all models for 200K iterations, with learning rates for the network weights and the poses that decay exponentially. We use the default coarse-to-fine scheduling for BARF [12]. We use the same network size and sampling strategy for all methods throughout our evaluation. Note that for BARF and refNeRF, we use the implementation from BARF; all hyperparameters are configured as proposed in the paper.
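The exponential learning-rate decay can be sketched as a geometric interpolation between a start and an end rate; the endpoint values in the example below are placeholders, not the paper's exact settings.

```python
def exp_decay_lr(lr_start, lr_end, step, total_steps):
    """Geometrically interpolate the learning rate so that
    lr(0) = lr_start and lr(total_steps) = lr_end."""
    return lr_start * (lr_end / lr_start) ** (step / total_steps)
```

Calling this once per iteration reproduces the usual exponentially decaying schedule used by NeRF-style training loops.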
4.3.1 Evaluation Details.
We evaluate the performance of each method in terms of pose accuracy for registration and view synthesis quality for scene reconstruction. Following [12, 39], we evaluate the pose error by aligning the optimized poses to the ground truth via Procrustes analysis, which computes the similarity transformation Sim(3) between them. Note that as the "ground-truth" camera poses provided for the LLFF real-world scenes are estimates from COLMAP [28], the pose accuracy is only an indicator of how well the estimations agree with the classical method. We report the mean rotation and translation errors for pose, as well as PSNR, SSIM and LPIPS [18] for view synthesis in Table 1.
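After Sim(3) alignment, the rotation error between an optimized pose and its ground truth can be measured as the geodesic angle between the two rotation matrices; a minimal sketch follows (the exact error convention in [12] may differ).

```python
import math

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle between two 3x3 rotation matrices:
    theta = arccos((trace(R_est^T R_gt) - 1) / 2), in degrees."""
    # trace(R_est^T R_gt) equals the elementwise dot product of the matrices
    t = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    c = max(-1.0, min(1.0, (t - 1.0) / 2.0))  # clamp against round-off
    return math.degrees(math.acos(c))
```

Identical rotations give 0 degrees, and a quarter-turn about any axis gives 90 degrees, as expected.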
Table 1: Pose accuracy and view synthesis quality on the LLFF forward-facing scenes.

Scene    | Rotation        | Translation     | PSNR                    | SSIM                    | LPIPS
         | [12]    Ours    | [12]    Ours    | [12]    Ours   refNeRF  | [12]   Ours   refNeRF   | [12]   Ours   refNeRF
flower   | 0.47    0.46    | 0.25    0.22    | 23.58   26.40  23.20    | 0.67   0.79   0.66      | 0.27   0.11   0.27
fern     | 0.16    0.47    | 0.20    0.25    | 23.53   24.51  23.10    | 0.69   0.74   0.71      | 0.34   0.29   0.29
leaves   | 1.00    0.13    | 0.30    0.23    | 18.15   19.72  14.42    | 0.48   0.61   0.24      | 0.40   0.27   0.58
horns    | 3.50    0.03    | 1.32    0.21    | 19.73   22.54  19.93    | 0.66   0.69   0.59      | 0.35   0.33   0.45
trex     | 0.42    0.66    | 0.36    0.48    | 22.63   22.86  21.42    | 0.75   0.80   0.69      | 0.24   0.19   0.32
orchids  | 0.71    0.43    | 0.42    0.41    | 19.14   19.37  16.54    | 0.55   0.57   0.46      | 0.33   0.26   0.37
fortress | 0.17    0.03    | 0.32    0.27    | 28.48   29.09  25.62    | 0.80   0.82   0.78      | 0.16   0.15   0.19
room     | 0.27    0.42    | 0.20    0.32    | 31.43   31.90  31.65    | 0.93   0.94   0.94      | 0.11   0.13   0.09
4.3.2 Results.
Table 1 quantitatively contrasts the performance of GARF, BARF and refNeRF. As is evident, Gaussian activations enable GARF to recover camera poses that match those obtained from off-the-shelf SfM methods. Moreover, even with a shallower network, Gaussian activations successfully recover the 3D scene representation with higher fidelity in the absence of positional embedding, compared to BARF and refNeRF; see the qualitative results in Fig. 6.
4.4 RealWorld Demo
To showcase the practicality of GARF, we take one step further and test it on images of a low-textured scene captured using an iPhone. Fig. 7 demonstrates the potential of GARF on a scene with many low-textured regions, while refNeRF exhibits artifacts in the novel views due to the existence of outliers in the front-end of the SfM pipeline, which results in unreliable camera pose estimates; see the supp. material for more results.
5 Conclusions
We present Gaussian Activated neural Radiance Fields (GARF), a new positional-embedding-free neural radiance field architecture that can reconstruct high-fidelity neural radiance fields from imperfect camera poses without cumbersome hyperparameter tuning or careful model initialisation. By establishing theoretical intuition, we demonstrate that the ability of the model to preserve the first-order gradients of the target function plays an imperative role in the joint problem of optimizing for pose and radiance field reconstruction. Experimental results reinforce our theoretical intuition and demonstrate the superiority of GARF, even on challenging scenes with low-textured regions.
References

[1] (2020) Deep local shapes: learning local SDF priors for detailed 3D reconstruction. In European Conference on Computer Vision, pp. 608–625.
[2] (2021) Depth-supervised NeRF: fewer views and faster training for free. arXiv preprint arXiv:2107.02791.
[3] (2021) Dynamic view synthesis from dynamic monocular video. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5712–5721.
[4] (2020) Local deep implicit functions for 3D shape. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4857–4866.
[5] (2019) Learning shape templates with structured implicit functions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7154–7164.
[6] (2021) SAPE: spatially-adaptive progressive encoding for neural optimization. Advances in Neural Information Processing Systems 34.
[7] (2021) Self-calibrating neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5846–5854.
[8] (2020) Local implicit grid representations for 3D scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6001–6010.
[9] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[10] (1990) Efficient ray tracing of volume data. ACM Transactions on Graphics (TOG) 9 (3), pp. 245–261.
[11] (2021) Neural scene flow fields for space-time view synthesis of dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6498–6508.
[12] (2021) BARF: bundle-adjusting neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5741–5751.
[13] (2021) AutoInt: automatic integration for fast neural volume rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14556–14565.
[14] (2021) NeRF in the wild: neural radiance fields for unconstrained photo collections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7210–7219.
[15] (2021) GNeRF: GAN-based neural radiance field without posed camera. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6351–6361.
[16] (2019) Occupancy networks: learning 3D reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4460–4470.
[17] (2019) Local light field fusion: practical view synthesis with prescriptive sampling guidelines. ACM Transactions on Graphics (TOG) 38 (4), pp. 1–14.
[18] (2020) NeRF: representing scenes as neural radiance fields for view synthesis. In European Conference on Computer Vision, pp. 405–421.
[19] (2020) Differentiable volumetric rendering: learning implicit 3D representations without 3D supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3504–3515.
[20] (2019) DeepSDF: learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 165–174.
[21] (2021) Nerfies: deformable neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5865–5874.
[22] (2019) On the spectral bias of neural networks. In International Conference on Machine Learning, pp. 5301–5310.
[23] (2007) Random features for large-scale kernel machines. Advances in Neural Information Processing Systems 20.
[24] (2021) Beyond periodicity: towards a unifying framework for activations in coordinate-MLPs. arXiv preprint arXiv:2111.15135.
[25] (2021) Learning positional embeddings for coordinate-MLPs. arXiv preprint arXiv:2112.11577.
[26] (2022) On regularizing coordinate-MLPs. arXiv preprint arXiv:2202.00790.
[27] (2021) KiloNeRF: speeding up neural radiance fields with thousands of tiny MLPs. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14335–14345.
[28] (2016) Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4104–4113.
[29] (2020) GRAF: generative radiance fields for 3D-aware image synthesis. Advances in Neural Information Processing Systems 33, pp. 20154–20166.
[30] (2020) Implicit neural representations with periodic activation functions. Advances in Neural Information Processing Systems 33, pp. 7462–7473.
[31] (2019) Scene representation networks: continuous 3D-structure-aware neural scene representations. Advances in Neural Information Processing Systems 32.
[32] (2021) A-NeRF: surface-free human 3D pose refinement via neural rendering. arXiv preprint arXiv:2102.06199.
[33] (2021) iMAP: implicit mapping and positioning in real-time. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6229–6238.
[34] (2022) Block-NeRF: scalable large scene neural view synthesis. arXiv preprint arXiv:2202.05263.
[35] (2020) Fourier features let networks learn high frequency functions in low dimensional domains. Advances in Neural Information Processing Systems 33, pp. 7537–7547.
[36] (2021) Advances in neural rendering. arXiv preprint arXiv:2111.05849.
[37] (2021) Mega-NeRF: scalable construction of large-scale NeRFs for virtual fly-throughs. arXiv preprint arXiv:2112.10703.
[38] (2021) IBRNet: learning multi-view image-based rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4690–4699.
[39] (2021) NeRF–: neural radiance fields without known camera parameters. arXiv preprint arXiv:2102.07064.
[40] (2021) Space-time neural irradiance fields for free-viewpoint video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9421–9431.
[41] (2019) Training behavior of deep neural network in frequency domain. In International Conference on Neural Information Processing, pp. 264–274.
[42] (2021) iNeRF: inverting neural radiance fields for pose estimation. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1323–1330.
[43] (2021) PlenOctrees for real-time rendering of neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5752–5761.
[44] (2021) pixelNeRF: neural radiance fields from one or few images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4578–4587.
[45] (2021) A structured dictionary perspective on implicit neural representations. arXiv preprint arXiv:2112.01917.
[46] (2021) Rethinking positional encoding. arXiv preprint arXiv:2107.02561.
[47] (2021) NICE-SLAM: neural implicit scalable encoding for SLAM. arXiv preprint arXiv:2112.12130.