1 Introduction
Representing and manipulating surfaces and 3D shapes is a problem of paramount importance in many diverse applications, ranging from mechanical and architectural design to computer animation, augmented/virtual reality, and physical simulations. It thus comes as no surprise that the representations devised over the years are as numerous and diverse as the applications, each with its respective advantages and disadvantages. Bézier patches, B-splines, and subdivision surfaces are only some of the choices, with the most ubiquitous being polygon meshes [hearn2004computer].
Although polygon meshes offer a useful and efficient representation, it is hard to model diverse topologies with them, as that would require the vertices or their connectivity to change. To overcome these limitations, researchers have incorporated different geometric representations, such as voxel grids, octrees, and implicit functions. Owing to the grid (or grid-like) structure of the former two, they have been used with convolutional networks [3dr2n2, hierarchical, ogn]. Nevertheless, voxel grids cannot achieve high resolution and, even though octrees address this, they, too, result in jagged models.
In recent years, given the ever-rising popularity of artificial neural networks, a new class of surface representations has been proposed, namely Implicit Neural Representations (INRs). In this approach, the surface, which is frequently required to be closed, is represented implicitly as a level set of a neural network with a single output. Several papers have presented very promising results using such representations [DeepSDF, neuralfield_survey]. In most of them, the network learns either the signed distance function or the occupancy function, and it can represent either a single surface or a class of surfaces by taking a shape code along with the spatial coordinates as input. In contrast to 3D meshes, voxel grids, and other common representations, INRs are not coupled to a spatial resolution: since they are continuous functions, they can be sampled at arbitrary resolutions. In this way, the memory required to accurately represent a 3D shape depends not on the spatial resolution but on its geometric complexity.
Despite the particularly promising results of implicit representations, there are still limitations to their usage. One of the most important is that the shapes cannot be easily edited. This is because these representations do not encode geometric structures locally: each weight of the network affects the geometry over an unbounded region of the output space. Consequently, to perform a localized modification of the 3D shape, generally all weights of the network need to be modified.
This editability problem of INRs is an open challenge that has attracted very limited attention in the literature. Existing works allow for some form of interactive editing, by either optimizing the shape code fed to the network [dif-net, dualsdf] or by training networks for articulated objects [articulated, a-sdf] and changing the joint parameters (which are also fed to the network). In either case, editing is limited to a learned shape space, so these methods do not support arbitrary modifications of the shape's 3D geometry.
To overcome the aforementioned limitations, this work introduces the first method that allows interactive editing of INRs, specifically neural Signed Distance Functions (SDFs). We approach the problem from a 3D sculpting perspective, aiming to equip INRs with the functionalities that 3D modeling software offers for standard mesh representations. Our method, which we call 3D Neural Sculpting (3DNS), edits the surface modeled by the zero-level set of a neural network in a brush-based manner. As mentioned above, using a feedforward neural network to represent an SDF creates a problem of locality. To address this, we propose using samples of the surface represented by the network to regulate the learning of the desired deformation. The source code is available at https://github.com/pettza/3DNS.
To recap, INRs are in their infancy. They have shown impressive results, but have not yet found many applications. We believe that the editability capabilities that this paper introduces will pave the way to a plethora of applications for INRs, ranging from computer graphics, where machine learning is becoming more and more popular, to robotics, where preliminary works have given evidence of the great benefits of these representations [grasping, simeonovdu2021ndf].
2 Related Work
2.1 Implicit Surface Representations
The idea of representing surfaces implicitly is by no means new. In fact, there have been continual attempts to use implicit representations in computer graphics and machine learning. In the shader art community, analytic implicit representations have been used to render everything from simple primitives to complex scenes and fractal objects [quilez]. In the machine learning community, earlier approaches relied on radial basis functions (RBFs) [rbf_sdf] and octrees [adaptive_signed_distance_fields] to express SDFs. The authors of [usingparticles] use points sampled on an implicit surface to control its shape.
Recently, the use of a neural network as the function expressing the surface was proposed by three concurrent works [chen2019learning, occupancy_net, DeepSDF], which ignited interest in these implicit neural representations. DeepSDF [DeepSDF] uses a network to represent the SDF of a shape (or a shape class, using additionally a shape code as input to the network), while the other two [chen2019learning, occupancy_net] express the occupancy function. Since then, many more works on INRs have ensued.
The works referenced above use a regression-type loss function for training. For SDFs, this requires computing ground-truth distances at points in space, which can be difficult. Various attempts have been made to reformulate the loss function for training a neural SDF. SAL [SAL] trains neural SDFs using a loss that disregards the sign of the distance, which requires careful initialization. SAL++ [sal++] extends this method to utilize information about the normal vectors of the surface. The authors of [controlling] incorporate samples of the level sets of the network function into the loss. Further progress was made by IGR [igr], which uses the fact that SDFs satisfy the eikonal equation to train the network as a differential-equation-solving network [DGM], thus requiring only uniform samples inside a bounding box of the surface and samples that lie on the surface, without computing ground-truth distances. Our loss derives from this work.
Other works have experimented with the architecture of the networks. SIREN [siren], which uses sines instead of the usual ReLUs, presented promising results in a variety of tasks, including surface reconstruction. Convolutional networks are used in [if_net, ndf] by discretizing the input point cloud. State-of-the-art results have been attained by coupling neural networks with data structures that retain localized spatial information: octrees storing learnable weights are used in [deepls, nglod], and a method called hash encoding is used by Instant-NGP [instant-ngp]. Besides highly detailed representations, NGLoD [nglod] and Instant-NGP [instant-ngp] achieve interactive framerates as well.
Besides training using raw geometric data like point clouds and meshes, there have been efforts to train neural SDFs directly from images [sdf_reparametrization, dvr, Oechsle2021ICCV, yariv2020multiview]. The authors of [SDDF] propose an SDF variant that takes direction into account (Signed Directional Distance Function). In contrast to a neural SDF which expresses the distance approximately, they prove that their network structure ensures it expresses the SDDF of some object.
It is worth mentioning that, while we focus on neural representations of shapes, implicit representations have found success in expressing quantities other than distance functions. For example, NeRF [mildenhall2020nerf] produces highly realistic images by expressing the radiance and density of a scene. A recent survey [neuralfield_survey] explores these representations, to which the authors refer as Neural Fields, in depth.
2.2 Neural SDF Editability
The research on editability is quite limited. DualSDF [dualsdf] proposes a two-level representation comprising a collection of primitives (spheres) and a neural SDF that share a common shape space. The user can manipulate the fine representation of the neural SDF by specifying changes to the primitives' parameters, which affect the shape code. A similar process is possible with DIF [dif-net], where a sparse set of points is used instead of primitives. The authors of [articulated] and of [a-sdf] both deal with articulated objects; the joint parameters are given as input to the network and can thus be used to manipulate the shape.
Since the above works deform the expressed shape by proxy of the network's inputs, the space of possible shapes is limited to the one learned during training. In contrast, our method allows the user to change the local 3D geometry more freely, in ways that do not necessarily lie within a learned shape space, similar to the functionalities that until now were offered by 3D software for meshes only.
3 Background
We begin by presenting the material upon which our method relies. In Section 3.1 we define SDFs; in Section 3.2 we describe SIREN [siren], the architecture that we build upon; in Section 3.3 we present weight normalization [weight_norm]; and finally, in Section 3.4, we describe the formulation of the adopted loss function.
3.1 Signed Distance Functions (SDFs)
Let $S$ be a surface in $\mathbb{R}^3$; then the (unsigned) distance of a point $x \in \mathbb{R}^3$ to the surface is:

$$d(x, S) = \min_{y \in S} d(x, y)$$

where $d$ is a metric on $\mathbb{R}^3$, typically (in our case as well) the Euclidean distance.

If $S$ is closed, the signed distance function is defined as follows:

$$\mathrm{sdf}_S(x) = \begin{cases} \phantom{-}d(x, S), & x \text{ outside } S \\ -d(x, S), & x \text{ inside } S \end{cases}$$
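For concreteness, the analytic SDFs of the two starting shapes used later in the paper (a sphere of radius 0.6 and a torus with radii 0.45 and 0.25) can be sketched as follows; the function names are illustrative, not from the paper's code:

```python
import numpy as np

def sdf_sphere(p, radius=0.6):
    # Signed distance to an origin-centered sphere:
    # negative inside, zero on the surface, positive outside.
    return np.linalg.norm(p, axis=-1) - radius

def sdf_torus(p, major=0.45, minor=0.25):
    # Torus around the z-axis: distance of (||xy|| - R, z)
    # to the origin, minus the tube radius r.
    q = np.stack([np.linalg.norm(p[..., :2], axis=-1) - major, p[..., 2]], axis=-1)
    return np.linalg.norm(q, axis=-1) - minor
```

Both functions accept batched points of shape (..., 3), which is convenient when sampling the surface densely.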
A neural SDF is a neural network that takes a spatial coordinate $x \in \mathbb{R}^3$ as input and approximates the SDF of a surface. As a consequence, neural SDFs use a whole trained network to parametrize a single shape. They effectively represent the shape as a continuous function and are therefore not tied to a specific spatial resolution.
3.2 SIREN Representation
We build upon SIREN [siren], a multilayer perceptron (MLP) that uses sines, instead of the usual ReLUs, for its non-linearities. Sines allow the network to represent fine details more accurately. Their use is related to Fourier features [benbarka2021seeing], for which this property has been theoretically explained [tancik2020fourier] using the NTK framework [ntk]. We chose SIREN as our architecture due to its simplicity and effectiveness. We note, however, that our method is model-independent and could therefore be applied to a variety of networks. The only alteration we experiment with is adding weight normalization to each layer, which we discuss next.
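A SIREN forward pass can be sketched in a few lines of numpy; the layer sizes below match the paper's experiments, and the initialization bounds follow the SIREN paper (this is a sketch, not the authors' implementation):

```python
import numpy as np

def siren_layer_init(fan_in, fan_out, is_first=False, omega0=30.0, rng=None):
    # SIREN initialization: U(-1/n, 1/n) for the first layer,
    # U(-sqrt(6/n)/omega0, sqrt(6/n)/omega0) for subsequent layers.
    rng = rng or np.random.default_rng(0)
    bound = 1.0 / fan_in if is_first else np.sqrt(6.0 / fan_in) / omega0
    W = rng.uniform(-bound, bound, size=(fan_in, fan_out))
    b = np.zeros(fan_out)
    return W, b

def siren_forward(x, layers, omega0=30.0):
    # Hidden layers apply sin(omega0 * (x @ W + b)); the last layer is linear.
    h = x
    for W, b in layers[:-1]:
        h = np.sin(omega0 * (h @ W + b))
    W, b = layers[-1]
    return h @ W + b

# A 3 -> 128 -> 128 -> 1 neural SDF, as in the paper's experiments.
layers = [siren_layer_init(3, 128, is_first=True),
          siren_layer_init(128, 128),
          siren_layer_init(128, 1)]
```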
3.3 Weight Normalization
We propose to extend SIREN by equipping it with weight normalization, a memory-efficient technique proposed by [weight_norm] for accelerating convergence. It works by reparametrizing the weights of a linear layer so that they are represented by a length parameter and a direction vector. This approximately 'whitens' the gradients during training, which makes the 3D shape represented by the neural SDF update across iterations in a more effective manner. As we show in Section 5.4, applying it to SIREN has a positive effect.
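The reparametrization itself is a one-liner, sketched here in numpy for a weight matrix whose rows feed individual output units:

```python
import numpy as np

def weight_norm(v, g):
    # Weight normalization reparametrizes each weight row as
    # w_i = g_i * v_i / ||v_i||, decoupling length (g) from direction (v).
    return g[:, None] * v / np.linalg.norm(v, axis=1, keepdims=True)
```

In PyTorch, the same effect is obtained by wrapping each linear layer with `torch.nn.utils.weight_norm`, which keeps `g` and `v` as the trainable parameters.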
3.4 Loss Function for Neural SDF
We adopt the loss function introduced by SIREN [siren] and briefly present it here for the sake of completeness. Many of the works on neural SDFs [deepls, ndf, instant-ngp, DeepSDF, nglod] use a regression-type loss to train the networks, evaluated at points sampled on the surface, near the surface, and in a bounding box around it. The ground-truth signed distance then needs to be computed for off-the-surface points, which can be difficult, especially for the non-mesh data that interest us (see Section 4). The authors of [igr] propose a loss function that comprises a regression loss evaluated only at points on the surface (for which the distance is 0), a term for the normal vectors at the same points, and a term that enforces the network to have unit gradients with respect to the input (commonly referred to as the eikonal term). The latter term is evaluated at points sampled uniformly inside a bounding box. The authors of [siren], besides some minor changes, extend the above loss with a term that penalizes small values of the neural SDF at off-surface points. In summary, the loss function that we minimize during training is defined as:
$$\mathcal{L}(\theta) = \lambda_s \mathcal{L}_{\mathrm{surf}}(\theta) + \lambda_e \mathcal{L}_{\mathrm{eik}}(\theta) + \lambda_o \mathcal{L}_{\mathrm{off}}(\theta)$$

with

$$\mathcal{L}_{\mathrm{surf}}(\theta) = \mathbb{E}_{x \sim p_S}\big[\, |f_\theta(x)| + d_{\cos}\big(\nabla_x f_\theta(x),\, n(x)\big) \big], \qquad
\mathcal{L}_{\mathrm{eik}}(\theta) = \mathbb{E}_{x \sim p_U}\big[\, \big| \lVert \nabla_x f_\theta(x) \rVert - 1 \big| \,\big], \qquad
\mathcal{L}_{\mathrm{off}}(\theta) = \mathbb{E}_{x \sim p_U}\big[ \exp\!\big(-\alpha\, |f_\theta(x)|\big) \big]$$

where $\lambda_s$, $\lambda_e$, and $\lambda_o$ are balancing weights, $S$ is the target surface, $p_S$ is a distribution on the surface, $p_U$ is the uniform distribution in a bounding box, $\theta$ is the parameter vector, $f_\theta$ is the network function, $d_{\cos}$ is the cosine distance, $n(x)$ is the normal vector at $x$, and $\alpha$ is a large positive number. $\mathcal{L}_{\mathrm{surf}}$ encompasses the regression and normal terms, $\mathcal{L}_{\mathrm{eik}}$ is the eikonal term, and $\mathcal{L}_{\mathrm{off}}$ is the term described last.
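The three terms can be sketched as follows. The network values and gradients are assumed precomputed (e.g. via autograd), and the weight values below are placeholders, not the ones used in the paper:

```python
import numpy as np

def cosine_distance(a, b):
    # 1 - cos(angle) between corresponding rows of a and b.
    dot = np.sum(a * b, axis=-1)
    return 1.0 - dot / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))

def sdf_loss(f_surf, grad_surf, normals, f_off, grad_off,
             l_surf=100.0, l_eik=1.0, l_off=50.0, alpha=100.0):
    # f_surf, grad_surf, normals: network values, input gradients, and
    # ground-truth normals at on-surface samples.
    # f_off, grad_off: values and gradients at uniform off-surface samples.
    surf = np.mean(np.abs(f_surf)) + np.mean(cosine_distance(grad_surf, normals))
    eik = np.mean(np.abs(np.linalg.norm(grad_off, axis=-1) - 1.0))
    off = np.mean(np.exp(-alpha * np.abs(f_off)))
    return l_surf * surf + l_eik * eik + l_off * off
```

For a perfect SDF (zero on the surface, unit gradients, gradients aligned with normals, clearly non-zero away from the surface) the loss is essentially zero, which is a useful sanity check.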
4 Proposed 3D Neural Sculpting
Our goal is to deform a surface (represented as a neural SDF) around a selected point on it, in a manner similar to what is possible for meshes with 3D sculpting software. We refer to the selected point as the interaction point, to the area around it as the interaction area, and to the process in general as an interaction or edit. The main problem in our case is how to enforce locality. Generally, a change to the parameters of a neural network is expected to affect its output over an unbounded region of the input space. Consequently, naively training the network only where the surface needs to change will distort the rest of the surface as well, which is what we confirm through experimentation in Section 5.6. To ameliorate this adverse effect as much as possible, we produce the point cloud used to evaluate the loss of equation 4 by including both samples from the surface that the network already represents (we call these model samples) and samples from the desired deformation. After discarding model samples that are close to the interaction point, we use the union with the latter set for training. In Section 4.1 we describe our surface sampling, in Section 4.2 we present our formulation of sculpting brushes, and in Section 4.3 we describe how we sample the interaction area.
4.1 Surface Sampling
Two prior works [controlling, ndf] have proposed similar algorithms that sample the zero-level set of a neural network function. They work by sampling points uniformly inside a bounding box and then projecting them on the level set with generalized Newton-Raphson iterations. For a true SDF, this would move a point to the surface point closest to it.
A naive way to produce samples for each training iteration would be to run this algorithm from scratch. However, this approach has two major drawbacks: it is time-inefficient, and the distribution of the resulting samples can be quite non-uniform. Inspired by [ndf], where the sample set is extended by perturbing existing samples with Gaussian noise, we produce the samples for the next iteration from the ones we already have. To that purpose, we add to each point a vector sampled uniformly from the tangent disk and then reproject the points onto the surface using the aforementioned procedure. We opt for tangential disks, instead of Gaussians, so that we explore the surface as much as possible without moving too far from it. The radius of the tangent disks is a hyperparameter, which we set to 0.04.
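The perturb-and-reproject step can be sketched as follows. Here it is exercised against an analytic sphere SDF; with a neural SDF, `sdf` and `grad` would be the network function and its gradient with respect to the input:

```python
import numpy as np

def project(x, sdf, grad, steps=5):
    # Generalized Newton-Raphson projection onto the zero-level set:
    # x <- x - f(x) grad f(x) / ||grad f(x)||^2.
    for _ in range(steps):
        g = grad(x)
        x = x - sdf(x)[..., None] * g / np.sum(g * g, axis=-1, keepdims=True)
    return x

def resample(x, sdf, grad, disk_radius=0.04, rng=None):
    # Perturb each on-surface sample uniformly inside its tangent disk,
    # then reproject onto the surface.
    rng = rng or np.random.default_rng(0)
    n = grad(x)
    n = n / np.linalg.norm(n, axis=-1, keepdims=True)
    v = rng.normal(size=x.shape)
    v -= np.sum(v * n, axis=-1, keepdims=True) * n       # tangent direction
    v /= np.linalg.norm(v, axis=-1, keepdims=True)
    r = disk_radius * np.sqrt(rng.uniform(size=x.shape[:-1] + (1,)))
    return project(x + r * v, sdf, grad)
```

The `sqrt` on the uniform variate makes the perturbation uniform over the disk's area rather than over its radius.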
This way of sampling forms a Markov chain [sheldon_prob]. Naturally, its stationary distribution is of interest. It is relatively easy to see that the requirements of Theorem 1 of [diaconis1997markov] (regarding the existence of the stationary distribution of a continuous-space Markov chain) are satisfied and, hence, the stationary distribution exists and has support over the surface. It is hard to reason theoretically about the shape of the distribution; however, we provide experimental results in Section 5.5 which demonstrate that our sampling process produces more uniformly distributed samples. Uniformity guarantees that every surface region is represented equally during training.
4.2 Sculpting Brushes
We define a brush template as a $C^1$ (or higher) positive 2D function $T$ defined over the unit disk centered at the origin, which reaches a maximum value of 1 and vanishes at the unit circle (ideally its gradient and higher derivatives vanish there as well). If $T$ is a brush template, the properties above are summarized as follows:

$$0 \le T(x) \le 1, \qquad \max_{\lVert x \rVert \le 1} T(x) = 1, \qquad T(x) = 0 \text{ and } \nabla T(x) = 0 \text{ for } \lVert x \rVert = 1$$
We can then define a brush family parametrized by radius $r$ and intensity $i$:

$$b_{r,i}(x) = i \, T\!\left(\frac{x}{r}\right), \qquad \lVert x \rVert \le r$$
We show the effect of different radii and intensities in Section 5.3. Notice that a positive value for the intensity creates a bump on the surface, while a negative value creates a dent. In our experiments, we use a radially symmetric brush template based on a quintic smoothstep; its profile is shown in Figure 2. For other types, please refer to the supplementary material.
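A minimal sketch of such a brush, assuming a quintic smoothstep profile (the exact template used in the paper may differ in detail):

```python
import numpy as np

def smootherstep(t):
    # Quintic smoothstep: 0 for t <= 0, 1 for t >= 1, with vanishing
    # first and second derivatives at both endpoints.
    t = np.clip(t, 0.0, 1.0)
    return t ** 3 * (t * (6.0 * t - 15.0) + 10.0)

def brush_template(xy):
    # Radially symmetric template over the unit disk: 1 at the center,
    # 0 with vanishing derivatives at the unit circle.
    return smootherstep(1.0 - np.linalg.norm(np.asarray(xy), axis=-1))

def brush(xy, radius, intensity):
    # Brush family b_{r,i}: scale the template to the given radius and
    # multiply by the intensity; positive -> bump, negative -> dent.
    return intensity * brush_template(np.asarray(xy) / radius)
```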
In order to apply the brush at a point on the surface, we consider it defined on the tangent plane at that point. By the implicit function theorem [vector_calc], we can express the zero-level set of the network function, in a region around that point, as the graph of a 2D function over the same plane. We can thus apply the brush by simply adding the brush function to the latter. As we will see, even though this is not needed for computing the samples' positions, we use it to compute the correct deformed normals.
4.3 Interaction Sampling
We will now describe how we produce samples on the area of the surface that is affected by the brush. We begin by placing uniform samples on a disk tangent to the interaction point whose radius is the same as the brush's radius. We can then project these samples onto the unaltered surface and move them perpendicular to the tangent plane (at the interaction point) a distance equal to the brush function's value. Theoretically, for what we discussed in the previous section to be applicable, this needs to be a parallel projection; however, we use the same procedure that we described in Section 4.1 so that we do not affect the surface over a greater area than intended. The above process is shown in Figure 4. Next, we compute the normals at the sampled positions.
Suppose that $h$ is a 2D function defined over a 3D plane (its gradient then lies on the plane), and $n$ is the unit normal vector to that plane (analogous to the z-axis); then the following is an unnormalized vector, perpendicular to the graph of the function:

$$\tilde{n} = n - \nabla h$$
[Table 1: Chamfer distance per shape for networks trained without and with weight normalization (WN).]
Accordingly, what we need in order to compute the normals of the samples is the gradient of the 2D function whose graph is the deformed surface. As we explained in the previous section, this function is the sum of the brush function and the function defined implicitly by the network. We can directly calculate the gradient of the brush function. By the implicit function theorem, the gradient of the implicit function is given by:

$$\nabla h_{\mathrm{impl}} = -\frac{(\nabla f_\theta)_\parallel}{(\nabla f_\theta)_\perp}$$

where $(\cdot)_\parallel$ denotes the component parallel to the tangent plane and $(\cdot)_\perp$ the component perpendicular to it.
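Putting the two pieces together, the deformed normal at a sample can be sketched as follows; all vectors are in world coordinates, with the brush gradient and the parallel component lying in the tangent plane (the function name and argument layout are illustrative):

```python
import numpy as np

def deformed_normal(plane_normal, grad_brush, grad_f_par, grad_f_perp):
    # Gradient of the graph function h = brush + implicit part, where the
    # implicit part's gradient follows from the implicit function theorem:
    # grad h_impl = -(grad f)_par / (grad f)_perp.
    grad_h = grad_brush - grad_f_par / grad_f_perp
    # n - grad h is an unnormalized vector perpendicular to the graph of h.
    v = plane_normal - grad_h
    return v / np.linalg.norm(v)
```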
We first sample the model and discard the samples that lie inside a sphere centered at the interaction point with radius equal to the brush's radius. Then, we sample the interaction. The number of interaction samples is the number of discarded samples multiplied by an integer factor greater than 1. This factor balances the contributions of the model and interaction samples: we want the influence of the interaction samples to be larger, since that is where the surface must change, while adapting to the size of the affected area. We use a factor of 10 in our experiments. Finally, the set of samples for training is the union of the remaining model samples and the interaction samples.
[Table 2: Mean Chamfer distance to the ground truth, over the whole surface and inside the interaction area, for Ours, Naive, and Simple Mesh.]
5 Experimental Results
In this section, we qualitatively and quantitatively evaluate our proposed 3DNS on various 3D objects and under several different surface edits. For additional results and visualizations, please refer to the Supplementary Material section below. Also, the source code and a demo video can be found at the paper's webpage (https://pettza.github.io/3DNS/).
5.1 Experimental Setup
For our experiments we have created a small dataset of six 3D shapes, see Fig. 3. For four of them, we start from a mesh representation: the Stanford Bunny, which comes from [turk_zippered], and the frog, pumpkin, and bust, which come from TurboSquid (https://www.turbosquid.com). The other two shapes are a sphere and a torus, which are usually provided by 3D modeling software as starting shapes; for these we start from an analytical representation. Specifically, we use a sphere with a radius of 0.6 and a torus with a major radius of 0.45 and a minor radius of 0.25. For our implementation, we use PyTorch [pytorch]. As a pre-processing step, we normalize the coordinates of the four meshes of our dataset by translating and scaling them uniformly so that they lie inside $[-1 + \beta, 1 - \beta]^3$, where $\beta$ is a positive number; this parameter leaves space around the models where we can edit them. We set $\beta$ to 0.15. We train networks to represent the shapes of the dataset by sampling them uniformly. The architecture of the networks is SIREN with 2 hidden layers of 128 neurons each, and weight normalization (except in Section 5.4). We use 120000 samples for the loss of equation 4 and another 120000 for the losses of equations 5 and 6, and train for a fixed number of iterations.
5.2 Performance Metric
For quantitative comparisons, we use the Chamfer distance, which is a common metric for geometric tasks. If $P$, $Q$ are two point clouds, their Chamfer distance is defined as follows:

$$d_{CD}(P, Q) = \frac{1}{|P|} \sum_{p \in P} \min_{q \in Q} \lVert p - q \rVert_2^2 + \frac{1}{|Q|} \sum_{q \in Q} \min_{p \in P} \lVert q - p \rVert_2^2$$
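A direct sketch of this metric (some formulations use unsquared distances or average the two sums; we use squared Euclidean distances here):

```python
import numpy as np

def chamfer_distance(P, Q):
    # Symmetric Chamfer distance: for each point, the squared distance
    # to its nearest neighbour in the other cloud, averaged both ways.
    d2 = np.sum((P[:, None, :] - Q[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

The all-pairs matrix makes this O(|P||Q|) in memory; for the large point clouds used in practice, a k-d tree nearest-neighbour query would be preferable.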
5.3 Brush Parameters
We demonstrate how the brush parameters allow the user to control the edit in Figure 5. Specifically, we use different radii and intensities on the same interaction point on the sphere. For each row, the intensity is the same and is shown on the left. Similarly, for each column, the radius is the same and is shown on the top.
5.4 Weight Normalization Ablation Study
To demonstrate the improvement provided by weight normalization, we train models for the six shapes presented above, once with weight normalization and once without. The Chamfer distance between a point cloud sampled from the ground-truth models and one sampled from the trained network by our sampling algorithm is given in Table 1. It can be seen that, even though not by a large margin, the models trained with weight normalization perform better in every case.
5.5 PDF Estimation
We want to study the uniformity of the stationary distribution of our sampling algorithm. We do this by estimating the pdf over the surface. Firstly, we create a triangle mesh of the surface using the Marching Cubes algorithm [marching_cubes]. We then generate samples with our algorithm over a number of iterations, with a fixed number of samples per iteration; in effect, we simulate as many Markov chains as there are samples per iteration. For each triangle of the mesh, we count the number of samples that are closest to it. The estimated mean value of the pdf over a triangle $t$ is then:

$$\hat{p}_t = \frac{n_t}{N A_t}$$

where $n_t$ is the number of samples assigned to the triangle, $N$ is the total number of samples, and $A_t$ is the area of the triangle.
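The per-triangle estimate can be sketched as:

```python
import numpy as np

def estimate_pdf(counts, areas):
    # Mean pdf over each triangle: the fraction of all samples assigned
    # to the triangle, divided by the triangle's area.
    counts = np.asarray(counts, dtype=float)
    return counts / (counts.sum() * np.asarray(areas))
```

For a perfectly uniform distribution, counts are proportional to areas and every entry equals the inverse of the total surface area, which is the reference value drawn in the histograms of Figure 6.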
We visualize the estimated pdf for the bunny and the sphere with the face colors in Figure 6. In the same figure, we show histograms of the pdf estimates as well. The value of the uniform pdf, which equals the inverse of the surface area, is shown in the histograms with a dashed vertical red line. We can see that the histograms are centered tightly around this value, indicating that the stationary distributions of our sampling process are quite uniform. For comparison, we give the corresponding results for the naive sampling outlined in Section 4.1 in the same figure. There, we notice bright areas that are sampled more frequently than the rest of the surface, as well as the longer tails of the histograms and the fact that they are off-centered.
5.6 Mesh Editing Comparison
We compare our proposed method with the editing of meshes, which is by far the most popular representation for 3D modeling and sculpting, and also with the naive approach of using only interaction samples. We edit a mesh by changing the positions of the vertices that lie inside the interaction area (the sphere that was used to discard samples). We follow the process described in Section 4.3, the only difference being that, instead of projecting samples from the tangent plane onto the surface, we compute a vertex's corresponding position on that plane as the intersection of a ray starting from the vertex with direction along its normal. In order to have a fair comparison, we use a mesh of approximately the same size as the network. We begin with a high-resolution mesh as the ground truth (this is the mesh the network was trained on for the four meshes, and one constructed via Marching Cubes [marching_cubes] for the sphere and torus) and, following [nglod], use quadric decimation [mesh_decimation] to get the smallest mesh with size larger than or equal to the network's. Afterward, we perform the same ten edits on these three representations. Each edit is performed on the unedited models. We compute the mean Chamfer distance of the network and the simplified mesh to the ground-truth mesh, over the whole surface, as well as only inside the interaction areas. We set the brush's radius to 0.08 and its intensity to 0.06 for all edits. The results are summarized in Table 2, where it is shown that our method outperforms the other approaches. We also provide an example in Figure 7.
6 Limitations and Future Work
Despite the successes of neural SDFs and the results we present above, there are also drawbacks. One problem with this way of editing is that the shape cannot easily be edited outside the bounding box where the eikonal loss has been applied, because there the network does not approximate an SDF. Also, since the neural network function is smooth, it is difficult to model hard edges and corners using a neural SDF. However, there is research on neural SDFs that use auxiliary data structures to represent the surface locally [deepls, instant-ngp, nglod], which can potentially be utilized to address these shortcomings; this is a direction we aim to pursue. Furthermore, our framework could in the future be extended to the editing of neural scene representations such as NeRFs [mildenhall2020nerf].
7 Conclusion
We have presented 3DNS, a method for editing neural SDF representations in an interactive fashion inspired by existing 3D software. We believe that research towards the editability of these representations can help render them viable for more applications, whether scientific or artistic in nature. Nevertheless, in order to compete with the existing tools and capabilities of meshes, more research is required.
Appendix A Other Brush Types
In the main paper (Section 4.2) we used a single brush template, which was based on a quintic polynomial. Here, we describe a more general approach.
A.1 Smoothstep Functions
In computer graphics, the need to transition smoothly from one real number to another arises very frequently. For this purpose, various functions are used which take the value 0 for $t \le 0$, the value 1 for $t \ge 1$, and go from 0 to 1 over the interval $[0, 1]$ in a continuously differentiable, increasing manner, with vanishing derivatives at 0 and 1. In graphics, such a function is usually called a smoothstep. Refer to Inigo Quilez's site [quilez_smoothstep] for a presentation of some smoothstep functions.
A.2 Radially Symmetric Brushes
If $g$ is a smoothstep function, then we can define a brush template that is radially symmetric, as follows:

$$T(x) = g(1 - \lVert x \rVert)$$

Any function from [quilez_smoothstep] can be used. The brush we use for our experiments is of this type. Besides smoothsteps, any function defined over $[0, 1]$ that has the value 0 at 0 and a maximum value of 1 can be used in the equation above.
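A couple of common smoothstep choices, sketched in pure Python (the specific functions listed on [quilez_smoothstep] may differ):

```python
def smoothstep3(t):
    # Cubic smoothstep 3t^2 - 2t^3: first derivative vanishes at 0 and 1.
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def smoothstep5(t):
    # Quintic smoothstep 6t^5 - 15t^4 + 10t^3: first and second
    # derivatives vanish at 0 and 1.
    t = min(max(t, 0.0), 1.0)
    return t ** 3 * (t * (6.0 * t - 15.0) + 10.0)

def radial_template(x, y, step=smoothstep5):
    # Radially symmetric brush template: feed 1 - ||(x, y)|| through
    # a smoothstep, as in the equation above.
    return step(1.0 - (x * x + y * y) ** 0.5)
```

Swapping `step` changes how sharply the brush falls off towards its rim without affecting the template's support or peak.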
A.3 Vector Brushes
A possible extension to our brush formulation is to allow the brush template function to take vector values instead of just scalars. Such an extension would allow creating brushes that twist the surface locally around the interaction point.
Appendix B Additional PDF Estimation Results
In the main paper (Section 5.5), we provided PDF estimation results only for the bunny and the sphere. In Figure 8, we provide them for all the shapes in the dataset. Here, as well, it is clear that our proposed algorithm produces a more uniform distribution than the naive approach.
Appendix C Additional Surface Editing Comparisons
We also run the surface editing comparison experiment for five shapes from the ShapeNet dataset [shapenet] shown in Figure 9. Since the ShapeNet meshes do not have adequately high resolution to capture the desired edits accurately and in order to have a fair comparison between the approaches, we consider as ground truth a high resolution mesh constructed by Marching Cubes [marching_cubes]. This is similar to the process that we adopted for the Sphere and Torus shapes in the experimental comparisons of the main paper. We follow the same protocol of experimental comparisons as in Section 5.6 of the main paper and the results are reported in Table 3. We observe that, once again, our method outperforms the compared approaches consistently for all shapes and with respect to both metrics used (i.e. Mean Chamfer Distance over the whole surface and inside the interaction area).
[Table 3: Mean Chamfer distance to the ground truth for the ShapeNet shapes, over the whole surface and inside the interaction area, for Ours, Naive, and Simple Mesh.]