GenDR: A Generalized Differentiable Renderer

04/29/2022
by Felix Petersen, et al.

In this work, we present and study a generalized family of differentiable renderers. We discuss from scratch which components are necessary for differentiable rendering and formalize the requirements for each component. We instantiate our general differentiable renderer, which generalizes existing differentiable renderers like SoftRas and DIB-R, with an array of different smoothing distributions to cover a large spectrum of reasonable settings. We evaluate an array of differentiable renderer instantiations on the popular ShapeNet 3D reconstruction benchmark and analyze the implications of our results. Surprisingly, the simple uniform distribution yields the best overall results when averaged over 13 classes; in general, however, the optimal choice of distribution heavily depends on the task.


1 Introduction

In the past years, many differentiable renderers have been published. These include the seminal differentiable mesh renderer OpenDR [Loper2014], the Neural 3D Mesh Renderer [Kato2017], and SoftRas [Liu2019-SoftRas], among many others. Using a differentiable renderer enables a multitude of computer vision applications, such as human pose estimation [bogo2016keep], camera intrinsics estimation [Palazzi2019], 3D shape optimization [Kato2017], 3D reconstruction [Kato2017, Liu2019-SoftRas, Chen2019DIB], and 3D style transfer [Kato2017].

A fundamental difference between different classes of differentiable renderers is the choice of the underlying 3D representation. In this work, we focus on differentiable 3D mesh renderers [Loper2014, Kato2017, Liu2019-SoftRas, Chen2019DIB]; however, the aspects that we investigate could also be applied to other differentiable rendering concepts, such as rendering voxels [Yan2016], point clouds [insafutdinov18pointclouds], surfels [Yifan2019-DiffSurfaceSplatting], signed distance functions [Jiang2020-SDFDiff], and other implicit representations [liu2019learning, Sitzmann2019SceneRepresentations].

Differentiable mesh renderers can be constructed in different ways: either using an exact and hard renderer with approximate surrogate gradients, or using an approximate renderer with natural gradients. Loper et al. [Loper2014] and Kato et al. [Kato2017] produce approximate surrogate gradients for their differentiable renderers, while their forward rendering is hard. In contrast, other differentiable renderers approximate the forward rendering in such a way that they produce a natural gradient. This can be achieved by modeling or approximating a renderer under a probabilistic perturbation, which is continuous and makes the renderer differentiable. For that, Rhodin et al. [rhodin2015versatile] model it with a Gaussian distribution, Liu et al. [Liu2019-SoftRas] model it with the square root of a logistic distribution, Petersen et al. [petersen2021learning] use a logistic distribution, and Chen et al. [Chen2019DIB] use the exponential distribution. While this variational interpretation of perturbing by a respective distribution is not stressed in some of these papers [Liu2019-SoftRas, Chen2019DIB], we believe it is important because it explicitly allows comparing the characteristics of the differentiable renderers. Moreover, the methods that only approximate gradients can also be seen as approximately modelling a perturbation: the gradient computed for the Neural 3D Mesh Renderer [Kato2017] is approximately that of a perturbation by a uniform distribution. Note that, here, the solutions for rendering under perturbations are obtained analytically in closed form without sampling.

In this work, we introduce a generalized differentiable renderer (GenDR). By choosing an appropriate probability distribution, we can (at least approximately) recover the above differentiable mesh renderers, which shows that a core distinguishing aspect of differentiable renderers is the type of distributions that they model. The choice of probability distribution herein is directly linked to the sigmoid (i.e., S-shaped) function used for the rasterization. For example, a Heaviside sigmoid function corresponding to the Dirac delta distribution yields a conventional non-differentiable renderer, while a logistic sigmoid function of squared distances corresponds to the square root of a logistic distribution. Herein, the sigmoid function is the cumulative distribution function (CDF) of the corresponding distribution. In this work, we select and present an array of distributions and examine their theoretical properties.

Another aspect of approximate differentiable renderers is their aggregation function, i.e., the function that aggregates the occupancy probabilities of all faces for each pixel. Existing differentiable renderers commonly aggregate the probabilities via the probabilistic sum (⊥^P), which corresponds to the probability that at least one face covers the pixel, assuming that the probabilities for each triangle are stochastically independent (cf. Eq. 4 in [Liu2019-SoftRas] or Eq. 6 in [Chen2019DIB]). In the field of real-valued logics and adjacent fields, this is well-known as a T-conorm, a relaxed form of the logical ‘or’. Two examples of other T-conorms are the maximum T-conorm ⊥^M and the Einstein sum ⊥^E, which models the relativistic addition of velocities. We generalize our differentiable renderer to use any continuous T-conorm and present a variety of suitable T-conorms.

In total, the set of resulting concrete instances arising from our generalized differentiable renderer and the proposed choices amounts to 1,242 concrete differentiable renderers. We extensively benchmark all of them on a shape optimization task and a camera pose estimation task. Further, we evaluate the best performing and most interesting instances on the popular ShapeNet [Chang2015ShapeNet] 13-class single-view 3D reconstruction experiment [Kato2017]. Here, we also include those instances that approximate other existing differentiable renderers. We note that we do not introduce a new shading technique in this paper and instead rely on existing blended shaders.

We summarize our contributions as follows:

  • We propose a generalized differentiable mesh renderer.

  • We identify existing differentiable renderers (approximately) as instances of our generalized renderer.

  • We propose a variety of suitable sigmoid functions and T-conorms and group them by their characteristics.

  • We extensively benchmark concrete differentiable renderers, analyze which characteristics and families of functions lead to a good performance, and find that the best choice heavily depends on the task, class, or characteristics of the data.

2 Related Work

The related work can be classified into those works that present differentiable renderers and those which apply them, although there is naturally also a significant overlap. For additional details on differentiable rendering approaches, cf. the survey by Kato et al. [kato2020differentiable].

Analytical Differentiable Renderers.

The first large category of differentiable renderers comprises those which produce approximate gradients in an analytical and sampling-free way. This can happen either via surrogate gradients during backpropagation, as in [Kato2017], or by making the forward computation naturally differentiable by perturbing the distances between pixels and triangles analytically in closed form [Liu2018, Chen2019DIB, petersen2019pix2vex]. Our work falls into this category and is of the second kind. Existing works each present their renderer for a specific distribution or sigmoid function. We formally characterize the functions necessary for a differentiable renderer and present an array of options.

Monte-Carlo Differentiable Renderers. An alternative to analytical differentiable renderers are those which are based on Monte-Carlo sampling techniques. The first example of this is the “redner” path tracer by Li et al. [Li2018], who use edge sampling to approximate the gradients of their renderer. Loubet et al. [Loubet2019ReparameterizingRendering] build on these ideas and reparameterize the involved discontinuous integrands, yielding improved gradient estimates. Zhang et al. [zhang2020path] extend these ideas by differentiating the full path integrals, which makes the method more efficient and effective. Lidec et al. [lidec2021differentiable] approach Monte-Carlo differentiable rendering by estimating the gradients of a differentiable renderer via the perturbed optimizers method [berthet2020learning].

Applications. Popular applications for differentiable renderers are pose [Loper2014, Kato2017, Liu2019-SoftRas, Chen2019DIB, Palazzi2019, lidec2021differentiable, ravi2020accelerating], shape [Kato2017, zhang2020path, petersen2019pix2vex, ravi2020accelerating], material [liu2017material, shi2020match], texture [Liu2019-SoftRas, Chen2019DIB, Loubet2019ReparameterizingRendering], and lighting [zhang2020path] estimation. Here, the parameters of an initial scene are optimized to match the scene in a reference image or a set of reference images. Another interesting application is single-view 3D shape prediction without 3D supervision. Here, a neural network predicts a 3D representation from a single image, and the rendering of the prediction is compared to the original input image. This learning process is primarily guided by supervision of the object silhouette. It is possible to omit this supervision via adversarial style transfer [petersen2021style]. Other applications are generating new 3D shapes that match a data set [Henzler2018, Henderson_2020_CVPR] as well as adversarial examples in the real world [liu2019beyond].

In our experiments, we use optimization for pose and shape to benchmark all proposed differentiable renderer combinations. As the single-view 3D mesh reconstruction is a complex experiment requiring training a neural network, we benchmark our method on this task only for a selected subset of differentiable renderers.

T-norms and T-conorms. T-norms and T-conorms (triangular norms and conorms) are binary functions that generalize the logical conjunction (‘and’) and disjunction (‘or’), respectively, to real-valued logics or probability spaces [klement2013triangular, van2022analyzing]. A generalization of ‘or’ is necessary in a differentiable renderer to aggregate the occlusion caused by faces. The existing analytical differentiable renderers all use the probabilistic T-conorm.

3 Generalized Differentiable Renderer

In this section, we present our generalized differentiable mesh renderer. By a differentiable renderer, we refer to a renderer that is continuous everywhere and differentiable almost everywhere (a.e.). Note that, in this context, continuity is a stricter criterion than differentiability a.e. because formally (i) conventional renderers are already differentiable a.e. (which does not mean that they can provide any meaningful gradients), and (ii) almost all existing “differentiable” renderers are not actually differentiable everywhere.

Let us start by introducing how a classic hard rendering algorithm operates. The first step is to bring all objects into image space, which is typically a sequence of affine transformations followed by the camera projection. This step is already differentiable. The second step is the rasterization: For each pixel, we need to compute the set of faces (typically triangles) which cover it. If the pixel is covered by at least one face, the face that is closest to the camera is displayed.

3.1 Differentiable Occlusion Test

To make the test whether a pixel p is occluded by a face f differentiable, we start by computing the signed Euclidean distance d(p, f) between the pixel and the face boundary. By convention, pixels inside the triangle have a positive distance, pixels outside the triangle a negative distance. For pixels exactly on the boundary, the distance to the face is 0.

For a hard occlusion test, we would just check whether d(p, f) is non-negative. In a differentiable renderer, we instead introduce a perturbation in the form of a probability distribution together with a temperature or scale parameter τ. We then evaluate the probability that the perturbed distance d(p, f) − τ · ε is non-negative, where ε is distributed according to the chosen distribution. Thus, we compute the probability that f occludes p as

(1)    P( d(p, f) − τ · ε ≥ 0 ) = P( ε ≤ d(p, f) / τ ) = F( d(p, f) / τ ),

where F is the CDF of the distribution and thus yields a closed-form solution for the desired probability (provided that F has a closed-form expression or can be appropriately approximated). In a differentiable renderer, we require F to be continuous. Typically, F has the S-shape of a sigmoid function, see Table 3.2. Therefore, we refer to CDFs as sigmoid functions in this paper.

Most existing differentiable renderers use sigmoid functions, or transformations thereof (see Section 4), to softly evaluate whether a pixel lies inside a triangle. This accords with the probabilistic interpretation in Equation (1), where the probability distribution is defined via the sigmoid function used in each case. Here, the logistic sigmoid function is a popular choice of such a sigmoid function. Note that, recently, it has frequently been referred to as “the” sigmoid in the literature, which is not to be confused with the original and more general terminology.

Example 1 (Logistic Sigmoid).

F(x) = 1 / (1 + e^(−x)) is the logistic sigmoid function, which corresponds to the logistic distribution.
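
To make Equation (1) concrete, the following minimal sketch evaluates the soft occlusion test for a few of the CDFs discussed here; the function names are illustrative and not taken from the GenDR code base:

    import math

    # Sketch of the soft occlusion test in Eq. (1): the probability that a face f
    # covers a pixel p is F(d(p, f) / tau), where d is the signed distance
    # (positive inside the face), tau is the temperature, and F is the CDF of the
    # chosen perturbation distribution.

    def heaviside_cdf(x):      # Dirac delta perturbation -> hard renderer
        return 1.0 if x >= 0 else 0.0

    def logistic_cdf(x):       # logistic perturbation (logistic sigmoid)
        return 1.0 / (1.0 + math.exp(-x))

    def gaussian_cdf(x):       # Gaussian perturbation
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    def cauchy_cdf(x):         # Cauchy perturbation (linear tail decay)
        return math.atan(x) / math.pi + 0.5

    def occlusion_probability(signed_distance, tau=0.1, cdf=logistic_cdf):
        """Soft test whether a face occludes a pixel, cf. Eq. (1)."""
        return cdf(signed_distance / tau)

    # A pixel slightly outside the face (d = -0.05) still receives a small
    # non-zero coverage probability, which is what makes the test differentiable.
    print(occlusion_probability(-0.05), occlusion_probability(0.05))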

3.2 Aggregation

The second step to be made differentiable is the aggregation of multiple faces. While this is conventionally done via a logical ‘or’, the differentiable real-valued counterpart is a T-conorm. T-conorms are formally defined as follows.

Definition 2 (T-conorm).

A T-conorm is a binary operation ⊥ : [0, 1] × [0, 1] → [0, 1], which satisfies

  • associativity: ⊥(a, ⊥(b, c)) = ⊥(⊥(a, b), c),

  • commutativity: ⊥(a, b) = ⊥(b, a),

  • monotonicity: (a ≤ c) ∧ (b ≤ d) ⇒ ⊥(a, b) ≤ ⊥(c, d),

  • 0 is a neutral element: ⊥(a, 0) = a.

Remark 3 (T-conorms and T-norms).

While T-conorms ⊥ are the real-valued equivalents of the logical ‘or’, so-called T-norms ⊤ are the real-valued equivalents of the logical ‘and’. Certain T-conorms and T-norms are dual in the sense that one can derive one from the other using a complement (typically N(a) = 1 − a) and De Morgan’s laws (⊥(a, b) = 1 − ⊤(1 − a, 1 − b)).

Let us proceed by stating the T-conorm which is used in all applicable previous approximate differentiable renderers with natural gradients.

Example 4 (Probabilistic Sum).

The probabilistic sum is a T-conorm that corresponds to the probability that at least one out of two independent events occurs. It is defined as

(2)    ⊥^P(a, b) = a + b − a · b.

An alternative to this is the Einstein sum, which is based on the relativistic addition of velocities.

Example 5 (Einstein Sum).

The Einstein sum is a T-conorm that corresponds to the velocity addition under special relativity:

(3)    ⊥^E(a, b) = (a + b) / (1 + a · b).

Combining the above concepts, we can compute the occupancy or coverage A(p) of a pixel p given a set of faces as

(4)    A(p) = ⊥_f F( d(p, f) / τ ),

where the T-conorm ⊥ is applied over all faces f.
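
A minimal sketch of this aggregation, with the probabilistic and Einstein sums from Equations (2) and (3) (function names are illustrative):

    from functools import reduce

    # Sketch of Eq. (4): the coverage of a pixel is the T-conorm aggregation of
    # the per-face occlusion probabilities.

    def probabilistic_sum(a, b):      # Eq. (2)
        return a + b - a * b

    def einstein_sum(a, b):           # Eq. (3), relativistic velocity addition
        return (a + b) / (1.0 + a * b)

    def aggregate_coverage(face_probs, t_conorm=probabilistic_sum):
        """Fold the per-face occlusion probabilities with a T-conorm (Eq. 4)."""
        return reduce(t_conorm, face_probs, 0.0)   # 0 is the neutral element

    probs = [0.2, 0.5, 0.1]
    print(aggregate_coverage(probs, probabilistic_sum))  # 0.64
    print(aggregate_coverage(probs, einstein_sum))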

Figure 1: Taxonomy of probability distributions corresponding to sigmoid functions. The subdivisions are chosen wrt. properties that have a categorically different influence on the behavior of the corresponding renderer: distributions with finite support are either exact (Dirac delta, i.e., Heaviside) or continuous (uniform, cubic Hermite, Wigner semicircle); distributions with infinite support are either symmetrical (with exponential convergence: Gaussian, Laplace, logistic, hyperbolic secant; or linear convergence: Cauchy, reciprocal) or asymmetrical (Gumbel-Max, Gumbel-Min, exponential, Levy, Gamma, and their reversed variants). The order of splits when going down in the tree (which could be chosen differently, e.g., symmetric/asymmetric could be the first split) reflects the importance of the properties.

Figure 2: Visualization of a selection of sigmoid functions, which are the CDFs of probability distributions. For each distribution, we display a single rendered triangle to demonstrate their different effects. Shown are Heaviside, Uniform, Cubic Hermite, Wigner Semicircle, Gaussian, Laplace, Logistic, Logistic (squares), Hyperbolic secant, Cauchy, Cauchy (squares), Reciprocal, Gumbel-Max, Gumbel-Min, Exponential, Exponential (Rev.), Levy, Levy (Rev.), Gamma (p=0.5, p=1, p=2), their reversed variants, and the corresponding “(squares)” counterparts.

3.3 Shading

The coloring of faces is handled via the Phong model or any other shading model, which is already differentiable. In the literature, Chen et al. [Chen2019DIB] compare different choices. Finally, to aggregate the coloring of each pixel depending on the distance of the face to the camera (depth), there are two popular choices in the literature: no depth perturbations and taking the closest triangle (as in [Loper2014, Kato2017, Chen2019DIB]), or Gumbel depth perturbations (as in [Liu2019-SoftRas, petersen2019pix2vex]). Only the latter choice is truly continuous, and the closed-form solution for Gumbel depth perturbations is the well-known softmin. As there are (i) no closed-form solutions for adequate alternatives to Gumbel perturbations in the literature, and (ii) these two options have been extensively studied in the literature [lidec2021differentiable, Loper2014, Kato2017, Chen2019DIB, Liu2019-SoftRas, petersen2019pix2vex], we do not modify this component in this work and focus on the differentiable silhouette computation and aggregation. While we implement both options in GenDR, in our evaluation we perform all experiments agnostic to the choice of shading aggregation, as the experiments rely solely on the silhouette.
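
For reference, the softmin-based depth blending can be sketched as follows; variable names and the exact weighting are illustrative assumptions, and GenDR's shading code may differ:

    import numpy as np

    # Sketch of depth aggregation with Gumbel perturbations: the closed-form
    # solution is a softmin over face depths, i.e., a softmax of negated depths.

    def softmin_weights(depths, tau=0.1):
        """Per-face blending weights; nearer faces (smaller depth) get larger weight."""
        logits = -np.asarray(depths, dtype=float) / tau
        logits -= logits.max()                 # numerical stabilization
        w = np.exp(logits)
        return w / w.sum()

    def blend_colors(colors, depths, tau=0.1):
        w = softmin_weights(depths, tau)
        return (w[:, None] * np.asarray(colors, dtype=float)).sum(axis=0)

    # Two faces covering the same pixel: the nearer (depth 1.0) red face dominates.
    print(blend_colors([[1, 0, 0], [0, 0, 1]], [1.0, 1.2]))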

4 Instantiations of the GenDR

Let us proceed by discussing instantiations of the generalized differentiable renderer (GenDR).

T-conorm & Symbol & Equal to / where & Continuous & Contin. diff. & Strict & Idempotent & Nilpotent & Archimedean
(Logical ‘or’) & ∨ & & (✗) & (✗) & --- & (✓) & --- & ---
Maximum & ⊥^M & & ✓ & ✗ & ✗ & ✓ & ✗ & ✗
Probabilistic & ⊥^P & = ⊥^H_1 = ⊥^A_1 & ✓ & ✓ & ✓ & ✗ & ✗ & ✓
Einstein & ⊥^E & = ⊥^H_2 & ✓ & ✓ & ✓ & ✗ & ✗ & ✓
Hamacher & ⊥^H_p & p ∈ (0, ∞) & ✓ & ✓ & ✓ & ✗ & ✗ & ✓
Frank & ⊥^F_p & p ∈ (0, ∞) & ✓ & ✓ & ✓ & ✗ & ✗ & ✓
Yager & ⊥^Y_p & p ∈ (0, ∞) & ✓ & ✗ & ✗ & ✗ & ✓ & ✓
Aczél-Alsina & ⊥^A_p & p ∈ (0, ∞) & ✓ & ✓ & ✓ & ✗ & ✗ & ✓
Dombi & ⊥^D_p & p ∈ (0, ∞) & ✓ & ✓ & ✓ & ✗ & ✗ & ✓
Schweizer-Sklar & ⊥^SS_p & p ∈ (−∞, 0) & ✓ & ✓ & ✓ & ✗ & ✗ & ✓
Table 1: Overview of a selection of suitable T-conorms, which we also benchmark.
Figure 3: Plot of four selected T-conorms. From left to right: Maximum, Probabilistic, Einstein, and Yager (w/ p=2). While (b) and (c) are smooth, the Yager T-conorm (d) is non-smooth: it plateaus, and the value is constant outside the unit circle.
Figure 4: Visual comparison of different instances of GenDR. In each image, moving from left to right increases the temperature or scale τ of the distribution. Left: we use a logistic distribution to perturb the triangles and use different T-conorms for aggregation (top to bottom: ⊥^M, ⊥^P, ⊥^Y_2, ⊥^A_0.5). Right: for the first two rows, we use a uniform distribution with ⊥^Y_2 and ⊥^A_0.5; for the last two rows, we use a Cauchy distribution with ⊥^P and ⊥^Y_2.

Distributions. Figure 1 provides a taxonomy of the distributions and sigmoid functions that are visualized in Table 3.2. We classify the distributions into those with finite support as well as others with infinite support, where the support is the set of points for which the PDF is greater than zero. Note that the CDFs are constant outside the support region. Among the distributions with finite support, there is the exact Dirac delta distribution corresponding to the Heaviside function, which yields a discrete, non-differentiable renderer. There are also continuous distributions allowing meaningful gradients, but (due to finite support) only in a limited proximity to each face. Here, we have, among others, the uniform distribution, which corresponds to a piecewise linear step function. The derivative of the uniform distribution is equivalent or very similar (due to minor implementation aspects) to the surrogate gradient of the Neural 3D Mesh Renderer [Kato2017]. The distributions with infinite support can be categorized into symmetrical and asymmetrical. Among the symmetrical distributions, the Gaussian, the Laplace, the logistic, and the hyperbolic secant have an exponential convergence behavior or exponential decay of probability density. On the other hand, there is also the Cauchy distribution, which has a linear convergence. This yields a significantly different behavior. We include the algebraic function x ↦ x / (2 + 2|x|) + 1/2 and call it the reciprocal sigmoid. This also has a linear convergence. Finally, we consider asymmetrical distributions with infinite support. The Gumbel-Max and Gumbel-Min are extreme value distributions [Coles2001] and two-sided, which means that their support covers both positive and negative arguments. The exponential, Gamma, and Levy distributions are one-sided distributions. Here, it is important to not only consider the original distributions but also their mirrored or reversed variants, as well as shifted variations, as can be seen in the last three rows of Table 3.2.

SoftRas [Liu2019-SoftRas] squares the absolute part of the distance before applying the logistic sigmoid function and thus models the square roots of logistic perturbations. Instead of modifying the argument of F, we interpret this as applying a transformed counterpart CDF F_sq, which is more in line with the probabilistic interpretation in Equation (1). More precisely, we compute the occlusion probability as

(5)    F_sq( d(p, f) / τ ),   where   F_sq(x) := F( x · |x| ).

That means that for each choice of F, we obtain a counterpart F_sq. A selection of these for different CDFs F is visualized in Table 3.2, denoted by “(squares)”. For a mathematical definition of each sigmoid function, see SM B.
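
A minimal sketch of this counterpart construction, assuming the standard logistic CDF (the exact scaling used in GenDR may differ):

    import math

    # Sketch of the "squares" counterpart F_sq from Eq. (5): instead of F(x), we
    # apply F to x * |x|, which reproduces the SoftRas-style squaring of the
    # distance while keeping the probabilistic interpretation of Eq. (1).

    def logistic_cdf(x):
        return 1.0 / (1.0 + math.exp(-x))

    def make_squares_counterpart(cdf):
        return lambda x: cdf(x * abs(x))

    logistic_sq = make_squares_counterpart(logistic_cdf)
    print(logistic_cdf(0.5), logistic_sq(0.5))   # the squared variant is flatter near 0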

Aggregations. Table 1 provides an overview of selected T-conorms and displays their properties. The logical ‘or’ is not a T-conorm but its discrete and discontinuous equivalent, which is why we include it here. While there are also discontinuous T-conorms, such as the drastic T-conorm, these are naturally not suitable for a differentiable renderer, which is why we exclude them. All except for the Max and Yager T-conorms are continuously differentiable.

The top four rows in Table 1 contain individual T-conorms, and the remainder are families of T-conorms. Here, we selected only suitable ranges for the parameter p. Note that there are some cases in which the T-conorms coincide, e.g., ⊥^P = ⊥^H_1 = ⊥^A_1. A discussion of the remaining properties and a mathematical definition of each T-conorm can be found in SM C. Figure 3 displays some of the T-conorms and illustrates different properties. In Figure 4, we display example renderings with different settings and provide a visual comparison of how the aggregation function affects rendering.
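
For illustration, a minimal sketch of two of the parametric T-conorm families, using the parametrizations from Table 6 in SM C (names are illustrative):

    # Two parametric T-conorm families from Table 1.

    def yager(a, b, p=2.0):
        # Nilpotent and not continuously differentiable: plateaus at 1.
        return min(1.0, (a ** p + b ** p) ** (1.0 / p))

    def hamacher(a, b, p=1.0):
        # p = 1 recovers the probabilistic sum, p = 2 the Einstein sum.
        return (a + b + (p - 2.0) * a * b) / (1.0 + (p - 1.0) * a * b)

    print(yager(0.6, 0.9, p=2))          # saturates at 1.0
    print(hamacher(0.3, 0.4, p=1))       # 0.58 = probabilistic sum
    print(hamacher(0.3, 0.4, p=2))       # Einstein sum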

Renderer & Distribution & T-conorm
OpenDR [Loper2014] & Uniform (backward) & ---
N3MR [Kato2017] & Uniform (backward) & ---
Rhodin et al. [rhodin2015versatile] & Gaussian & ⊥^P
SoftRas [Liu2019-SoftRas] & Square root of Logistic & ⊥^P
Log. Relax [petersen2021learning] & Logistic & ⊥^P
DIB-R [Chen2019DIB] & Exponential & ⊥^P
Table 2: Differentiable renderers that are (approximately) special cases of GenDR. OpenDR and N3MR do not use a specific T-conorm as their forward computation is hard.

Existing Special Cases of GenDR. In Table~2, we list which existing differentiable renderers are conceptually instances of GenDR. These renderers do each have some other differences, but one key difference lies in the type of distribution employed. Differences regarding shading are also discussed at the end of Section~3.

5 Experiments

The source code will be available at github.com/Felix-Petersen/gendr.

5.1 Shape Optimization

Our first experiment is a shape optimization task. Here, we use the mesh of an airplane and render it from 24 azimuths using a hard renderer. The task is to optimize a mesh (initialized as a sphere) to fit the silhouette of the airplane within 100 optimization steps. Limiting the task to 100 optimization steps is critical for two reasons: (i) The task can be considered to be solved perfectly with any differentiable renderer that produces the correct gradient sign within a large number of steps, but we are interested in the quality of the gradients for the optimization task and how efficient each renderer is. (ii) The total evaluation is computationally expensive because we evaluate a total of 1,242 renderers and perform a grid search over the distribution parameters for each one to provide a fair and reliable comparison.

Setup. For optimization, we use the Adam optimizer [Kingma2014AdamOpt] with parameters β1 = 0.5, β2 = 0.95. For each setting, we perform a grid search over three learning rates (λ ∈ {10^−1.25, 10^−1.5, 10^−1.75}) and temperatures τ ∈ {10^(−0.1·n) | n ∈ ℕ, 0 ≤ n ≤ 80}. Here, λ = 10^−1.5 ≈ 0.03 performs best in almost all cases. As for the scale hyperparameter, it is important to use a fine-grained as well as large grid because it behaves differently for each distribution. Here, we intentionally chose the grid larger than the range of reasonable values to ensure that the best choice is used for each setting; the extreme values for the scale were never optimal. We perform this evaluation from five different elevation angles (−60°, −30°, 0°, 30°, 60°) as independent runs and average the final results for each renderer instance. Additional results for the experiment applied to the model of a chair can be found in SM D.
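
The hyperparameter grids above can be written down directly; a minimal sketch in Python with the values as stated in the setup:

    # Grids used for the shape optimization benchmark: three learning rates and
    # 81 temperatures on a logarithmic grid, evaluated from five elevations.

    learning_rates = [10 ** e for e in (-1.25, -1.5, -1.75)]
    temperatures = [10 ** (-0.1 * n) for n in range(81)]   # 10^0 down to 10^-8
    elevations = [-60, -30, 0, 30, 60]                      # degrees, independent runs

    print(len(temperatures), min(temperatures), max(temperatures))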

Figure 5: Results for the 24-view airplane shape optimization task. The optimization is done within a tight budget of 100 steps and the metric is the loss, i.e., lower (=yellow) is better. Top: original set of distributions F. Bottom: the respective counterparts F_sq in the same location. The marginal histograms display participation in the top 10% of combinations.
Figure 6: Results for the teapot camera pose optimization task. The optimization is done with a temperature τ that is scheduled to decay. The metric is the fraction of camera poses recovered, while the initialization angle errors are uniformly sampled from [15°, 75°]. The figure shows the original set of distributions F; the plot for the respective F_sq can be found in SM D.

Results. In Figure 5, we display the results of our evaluation. We can observe that the regular distributions F typically perform better than the counterparts F_sq, except for the case of Cauchy and reciprocal sigmoid, which are those with a linear convergence rate. We explain this by the fact that, by squaring the distance before applying the sigmoid function, the function has a quadratic convergence rate instead. As the linearly converging functions also perform poorly in comparison to the exponentially converging functions (Gaussian, Laplace, Logistic, Gudermannian), we conclude that linear convergence is inferior to quadratic and exponential convergence. Columns 1-3 contain the distributions with finite support, and these do not perform very well on this task. The block of exponentially decaying distributions (columns 4-7) performs well. The block of linearly decaying distributions (columns 8-9) performs badly, as discussed above. The block of Levy distributions (last 4 columns) performs even worse because it has an even slower convergence. Here, it also becomes slightly better in the squared setting, but it still exhibits worse performance than for linear convergence.

Comparison of Distributions. Gumbel, exponential, and gamma distributions do not all perform equally well, but Gumbel-Min, the reversed exponential, and the reversed gamma are all competitive. Confer Table 3.2, where it becomes clear that this is because Gumbel-Max, exponential, and gamma have all of their mass inside the triangle, i.e., they yield smaller faces. This is problematic because it can cause gaps between neighboring triangles, which hinders optimization. As the reversed counterparts yield larger faces and do not suffer from this problem, they perform better. Note that, in this respect, the asymmetrical distributions have an advantage over the symmetrical distributions because symmetrical distributions always have an accumulated density of 0.5 at the edge, and thus the size of the face stays the same. We can see that, among the asymmetrical distributions, Gamma performs best.

Comparison of T-conorms. We find that ⊥^M and “average” (which is not a T-conorm but was used as a baseline in [Liu2019-SoftRas]) perform poorly. Also, ⊥^Y_4, ⊥^A_2, ⊥^A_4, ⊥^D_2, ⊥^D_4, ⊥^SS_−2, and ⊥^SS_−4 perform poorly overall. This can be explained as they are rather extreme members of their respective T-conorm families; in all of them, the pth power is involved, which can become a problematic component, e.g., x^4 is vanishingly small for x = 0.5. Interestingly, the gamma and the exponential distributions still perform well with these, likely since they are not symmetric and have an accumulated probability of 1 on the edge. Notably, the Yager T-conorm (p = 2) performs very well, although it has a plateau and thus no meaningful gradient outside the unit disc, see Figure 3.

Finally, we compute histograms of how many times each respective distribution and T-conorm is involved in the best 10% of overall results. This is done independently for the top and bottom plots. We can observe that Gamma (p = 0.5, Reversed) performs best overall (because it is more robust to the choice of T-conorm). Among the T-conorms, we find that ⊥^Y_2 and ⊥^D_0.5 perform best. The probabilistic and Einstein sums perform equally and share the next place.

Figure 7: Single-view reconstruction results for each of the 30 selected renderers as a 3D IoU (in %) heatmap for each class. While the uniform distribution (w/ ⊥^P) performs best on average and the square root of logistic (w/ ⊥^P, ⊥^E) performs second-best on average, the optimal setting depends on the characteristics of the respective classes. For the ‘Airplane’ class, the Gamma distribution performed best, and this is also the distribution that performed best in the airplane shape optimization experiment in Section 5.1. For classes of furniture with legs, such as ‘Bench’, ‘Chair’, and ‘Table’, we find that the Gaussian distribution consistently performs best. The pairs of similar classes ‘Display’+‘Phone’, ‘Dresser’+‘Speaker’, and ‘Vessel’+‘Rifle’ also show similar performance patterns. For example, dressers and speakers tend to be cuboid, while rifles and vessels tend to be rather long and slim. Considering the Gaussian distribution, it is interesting to see that for some classes ⊥^P and ⊥^E perform better, while for other classes ⊥^Y_2 performs much better.
Method & Airplane & Bench & Dresser & Car & Chair & Display & Lamp & Speaker & Rifle & Sofa & Table & Phone & Vessel & Mean
Kato et al. [Kato2017] N3MR (Uniform Backward) & 0.6172 & 0.4998 & 0.7143 & 0.7095 & 0.4990 & 0.5831 & 0.4126 & 0.6536 & 0.6322 & 0.6735 & 0.4829 & 0.7777 & 0.5645 & 0.6015
Liu et al. [Liu2019-SoftRas] SoftRas (Square-root of Logistic) & 0.6419 & 0.5080 & 0.7116 & 0.7697 & 0.5270 & 0.6156 & 0.4628 & 0.6654 & 0.6811 & 0.6878 & 0.4487 & 0.7895 & 0.5953 & 0.6234
Chen et al. [Chen2019DIB] DIB-R (Exponential) & 0.570 & 0.498 & 0.763 & 0.788 & 0.527 & 0.588 & 0.403 & 0.726 & 0.561 & 0.677 & 0.508 & 0.743 & 0.609 & 0.612
Probabilistic + Uniform (≈ [Kato2017, Loper2014]) & 0.6456 & 0.4855 & 0.7113 & #1 & 0.5276 & 0.6126 & 0.4611 & 0.6651 & #1 & 0.6835 & 0.4514 & 0.8148 & 0.5971 & #1
Probabilistic + Logistic (= [petersen2021learning]) & 0.6396 & 0.5005 & 0.7105 & 0.7471 & 0.5288 & 0.6022 & 0.4586 & 0.6639 & 0.6742 & 0.6660 & 0.4666 & 0.7771 & 0.5980 & 0.6179
Probabilistic + Logistic (squares) (= [Liu2019-SoftRas]) & 0.6416 & 0.4966 & #1 & 0.7386 & 0.5224 & #1 & 0.4550 & #1 & #1 & 0.6818 & 0.4529 & #1 & 0.5984 & 0.6217
Probabilistic + Exponential (R) (= [Chen2019DIB]) & 0.6321 & 0.4857 & 0.7123 & 0.7298 & 0.5178 & 0.5983 & 0.4611 & 0.6642 & 0.6713 & 0.6546 & 0.4700 & 0.7717 & #1 & 0.6130
Probabilistic + Gaussian (≈ [rhodin2015versatile]) & 0.5922 & #1 & 0.7104 & 0.7561 & #1 & 0.6080 & 0.4399 & #1 & 0.6533 & #1 & #1 & 0.7301 & 0.5894 & 0.6125
Probabilistic + Gamma (R) & #1 & 0.4842 & 0.7093 & 0.7220 & 0.5159 & 0.6033 & #1 & 0.6626 & 0.6719 & 0.6505 & 0.4642 & 0.7778 & 0.5978 & 0.6133
Einstein + Gamma (R, squares) & 0.6438 & 0.4816 & #1 & 0.7284 & 0.5170 & 0.6111 & 0.4654 & 0.6647 & 0.6760 & 0.6546 & 0.4626 & #1 & 0.5973 & 0.6184
Yager (p=2) + Cauchy (squares) & 0.6380 & #1 & 0.7047 & 0.7359 & 0.5188 & 0.5976 & 0.4617 & 0.6612 & 0.6726 & 0.6619 & 0.4819 & 0.7560 & #1 & 0.6149
Table 3: Selected single-view reconstruction results measured in 3D IoU.

5.2 Camera Pose Optimization

In our second experiment, the goal is to find the camera pose for a model of a teapot from a reference image. The angle is randomly modified by an angle uniformly drawn from [15°, 75°], and the distance and camera view angle are also randomized. We sample 600 pairs of a reference image and an initialization and use this set of settings for each method. For optimization, we use Adam with a learning rate of either 0.1 or 0.3 (via grid search) and optimize for 1,000 steps. During the optimization, we transition an initial scale of σ = 10^−1 logarithmically to a final value of σ = 10^−7. This allows us to avoid a grid search for the optimal scale and is sensible because an initially large σ is beneficial for pose optimization: a smoother model has a higher probability of finding the correct orientation of the object. This contrasts with the setting of shape estimation, where such a large scale would be fatal because the vertices would collapse to the center.
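
A minimal sketch of such a logarithmic scale schedule, assuming a log-linear interpolation between the stated endpoints (the exact schedule in GenDR may differ):

    import numpy as np

    # Sigma decays from 1e-1 to 1e-7 over the 1000 optimization steps.

    def scheduled_sigma(step, num_steps=1000, sigma_start=1e-1, sigma_end=1e-7):
        t = step / max(num_steps - 1, 1)
        return float(np.exp((1 - t) * np.log(sigma_start) + t * np.log(sigma_end)))

    print(scheduled_sigma(0), scheduled_sigma(500), scheduled_sigma(999))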

Results. In Figure 6, we display the results of this experiment. A corresponding image of the counterpart distributions F_sq as well as results for the experiment applied to the model of a chair can be found in SM D. The metric is the fraction of settings in which the ground truth pose was matched up to 3°. We find that in this experiment, the results are similar to those in the shape optimization experiment. Note that there are larger yellow areas because the color map ranges from 0% to 90%, while in the shape optimization plot the color map ranges over a rather narrow loss range.

5.3 Single-View 3D Reconstruction

Setup. Finally, we reproduce the popular ShapeNet single-view 3D reconstruction experiment from [Kato2017, Liu2019-SoftRas, Chen2019DIB, petersen2021learning]. We select three T-conorms (⊥^P, ⊥^E, ⊥^Y_2) and 10 distributions (Uniform, Gaussian, Logistic, Logistic (squares), Cauchy, Cauchy (squares), Gumbel-Min, Gamma (R, p=0.5), Gamma (R, p=0.5, squares), and Exponential (R)). These have been selected because they have been used in previous works, are notable (Cauchy, Gumbel-Min, Einstein), or have performed especially well in the airplane shape optimization experiment (Gamma, Yager). For each setting, we perform a grid search over τ at a resolution of 10^0.5. Further experimental details can be found in SM A.

Results. In Figure 7, we display and discuss the class-wise results for all 30 selected renderers. In Table 3, we show the (self-)reported results for existing differentiable renderers in the top block. In the bottom block, we display our results for the methods that are equivalent (=) or very similar (≈) to the six existing differentiable renderers. The differences for equivalent methods can be explained by small variations in the setting and minor implementation differences. Additionally, we include three noteworthy alternative renderers, such as the one that performed best on the prior airplane shape optimization task. We conclude that the optimal choice of renderer heavily depends on the characteristics of the 3D models and the task. Surprisingly, we find that the simple uniform method achieves consistently good results and the best average score.

6 Discussion and Conclusion

In this work, we generalized differentiable mesh renderers and explored a large space of instantiations of our generalized renderer GenDR. We found that there are significant differences between different distributions for the occlusion test, but also between different T-conorms for the aggregation. In our experiments, we observed that the choice of renderer has a large impact on the kind of models that can be rendered most effectively. We find that the uniform distribution outperforms the other tested distributions on average, which is surprising considering its simplicity. Remarkably, the uniform distribution had already been used implicitly for the early surrogate-gradient renderers but was later discarded for the approximate differentiable renderers.

Acknowledgments. This work was supported by the DFG in the Cluster of Excellence EXC 2117 (Project-ID 390829875) and the SFB Transregio 161 (Project-ID 251654672), and the Land Salzburg within the WISS 2025 project IDA-Lab (20102-F1901166-KZP and 20204-WISS/225/197-2019).

Appendix A Implementation Details

For the single-view 3D reconstruction experiment, we closely follow the setup by Liu et al. [Liu2019-SoftRas]. We use the same model architecture [Liu2019-SoftRas] and also train with a batch size of 64 for 250,000 steps using the Adam optimizer [Kingma2014AdamOpt]. We set the learning rate to 10^−4 for the first 150,000 steps and use a learning rate of 3·10^−5 for the remaining training. At this point (after the first 150,000 steps), we also decrease the temperature τ by a factor of 0.3.

Using different learning rates (as an ablation) did not improve the results.
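
A compact sketch of the schedule described above (assuming steps are counted from 0; illustrative only):

    # Learning rate 1e-4 for the first 150,000 steps, then 3e-5; the temperature
    # tau is multiplied by 0.3 at the same point.

    def schedule(step, tau_initial):
        if step < 150_000:
            return 1e-4, tau_initial
        return 3e-5, 0.3 * tau_initial

    print(schedule(0, 0.01), schedule(200_000, 0.01))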

Appendix B Distributions

In this section, we define each of the presented distributions / sigmoid functions. Table 4 displays the respective CDFs and PDFs.

Note that, for each distribution, the PDF f is defined as the derivative of the CDF F. Also, note that a reversed (Rev.) CDF is defined as F_Rev(x) = 1 − F(−x), which means that F_Rev = F for symmetric distributions. The square-root distribution F_sq is defined in terms of F as in Equation (5). Therefore, in the following, we define the distributions via their CDFs F.


The CDFs F of the individual base distributions are given in Equations (6)-(20): Heaviside, Uniform, Cubic Hermite (with y := (x+1)/2), Wigner Semicircle, Gaussian, Laplace, Logistic, Hyperbolic secant, Cauchy, Reciprocal, Gumbel-Max, Gumbel-Min, Exponential, Levy (where Φ is the CDF of the standard normal distribution), and Gamma (where γ(p, x) is the lower incomplete gamma function and p > 0 is the shape parameter).
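
As a reading aid, the following sketch implements several of these sigmoid functions in Python using standard textbook parametrizations (location 0, unit scale); GenDR may use shifted or rescaled variants, so the exact constants here are assumptions rather than the paper's definitions.

    import math

    def heaviside(x):
        return 1.0 if x >= 0 else 0.0

    def uniform(x):                       # assumes uniform density on [-1, 1]
        return min(max((x + 1.0) / 2.0, 0.0), 1.0)

    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    def gaussian(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    def laplace(x):
        return 0.5 * math.exp(x) if x < 0 else 1.0 - 0.5 * math.exp(-x)

    def cauchy(x):
        return math.atan(x) / math.pi + 0.5

    def reciprocal(x):                    # the "reciprocal sigmoid" from Section 4
        return x / (2.0 + 2.0 * abs(x)) + 0.5

    def gumbel_max(x):
        return math.exp(-math.exp(-x))

    def gumbel_min(x):
        return 1.0 - math.exp(-math.exp(x))

    def exponential(x):
        return 1.0 - math.exp(-x) if x >= 0 else 0.0

    def reverse(cdf):                     # F_Rev(x) = 1 - F(-x)
        return lambda x: 1.0 - cdf(-x)

    def squares(cdf):                     # F_sq(x) = F(x * |x|), cf. Eq. (5)
        return lambda x: cdf(x * abs(x))

    print(logistic(0.0), reverse(exponential)(-0.5), squares(logistic)(0.5))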

Table 4: Visualization of CDFs (top) and PDFs (bottom) for the different distributions. Row 1: Heaviside, Uniform, Cubic Hermite, Wigner Semicircle, Gaussian, Gaussian (sq.). Row 2: Laplace, Laplace (sq.), Logistic, Logistic (sq.), Hyperbolic secant, Hyperbolic secant (sq.). Row 3: Cauchy, Cauchy (sq.), Reciprocal, Reciprocal (sq.), Gumbel-Max, Gumbel-Min. Row 4: Exponential, Exponential (sq.), Exponential (Rev.), Levy, Levy (sq.), Levy (Rev.). Row 5: Gamma (p=0.5), Gamma (p=1), Gamma (p=2), Gamma (p=.5, R.), Gamma (p=1, R.), Gamma (p=2, R.). Row 6: Gamma (p=0.5, sq.), Gamma (p=1, sq.), Gamma (p=2, sq.), Gamma (p=.5, R., sq.), Gamma (p=1, R., sq.), Gamma (p=2, R., sq.).


Appendix C T-Norms and T-Conorms

The axiomatic approach to multi-valued logics (which we need to combine the occlusions by different faces in a “soft” manner) is based on defining reasonable properties for truth functions. We stated the axioms for multi-valued generalizations of the disjunction (logical “or”), called T-conorms, in Definition 2. Here we complement this with the axioms for multi-valued generalizations of the conjunction (logical “and”), which are called T-norms.

Definition 6 (T-norm).

A T-norm (triangular norm) is a binary operation ⊤ : [0, 1] × [0, 1] → [0, 1], which satisfies

  • associativity: ⊤(a, ⊤(b, c)) = ⊤(⊤(a, b), c),

  • commutativity: ⊤(a, b) = ⊤(b, a),

  • monotonicity: (a ≤ c) ∧ (b ≤ d) ⇒ ⊤(a, b) ≤ ⊤(c, d),

  • 1 is a neutral element: ⊤(a, 1) = a.

Clearly, these axioms ensure that the corners of the unit square, that is, the value pairs considered in classical logic, are processed as with a standard conjunction: the neutral element and commutativity imply that ⊤(1, 1) = 1, ⊤(0, 1) = 0, and ⊤(1, 0) = 0. From one of the latter two and monotonicity it follows that ⊤(0, 0) = 0. Analogously, the axioms of T-conorms ensure that the corners of the unit square are processed as with a standard disjunction. Actually, the axioms already fix the values not only at the corners, but on the boundaries of the unit square. Only inside the unit square (that is, for (a, b) ∈ (0, 1)²) can T-norms (as well as T-conorms) differ.

Minimum & ⊤^M(a, b) & min(a, b)
Probabilistic & ⊤^P(a, b) & a · b
Einstein & ⊤^E(a, b) & ab / (2 − a − b + ab)
Hamacher & ⊤^H_p(a, b) & ab / (p + (1 − p)(a + b − ab))
Frank & ⊤^F_p(a, b) & log_p(1 + (p^a − 1)(p^b − 1) / (p − 1))
Yager & ⊤^Y_p(a, b) & max(0, 1 − ((1 − a)^p + (1 − b)^p)^(1/p))
Aczél-Alsina & ⊤^A_p(a, b) & exp(−(|log a|^p + |log b|^p)^(1/p))
Dombi & ⊤^D_p(a, b) & (1 + (((1 − a)/a)^p + ((1 − b)/b)^p)^(1/p))^(−1)
Schweizer-Sklar & ⊤^SS_p(a, b) & (a^p + b^p − 1)^(1/p)
Table 5: (Families of) T-norms.

In the theory of multi-valued logics, and especially in fuzzy logic [Klir_and_Yuan_1995], it was established that the largest possible T-norm is the minimum and the smallest possible T-conorm is the maximum: for any T-norm ⊤ it holds that ⊤(a, b) ≤ min(a, b), and for any T-conorm ⊥ it holds that ⊥(a, b) ≥ max(a, b). The other extremes, that is, the smallest possible T-norm and the largest possible T-conorm, are the so-called drastic T-norm, defined as ⊤*(a, b) = 0 for (a, b) ∈ (0, 1)², and the drastic T-conorm, defined as ⊥*(a, b) = 1 for (a, b) ∈ (0, 1)². Hence, ⊤(a, b) ≥ ⊤*(a, b) for any T-norm ⊤ and ⊥(a, b) ≤ ⊥*(a, b) for any T-conorm ⊥. We do not consider the drastic T-conorm for the occlusion test because it clearly does not yield useful gradients.

As already mentioned in the paper, it is common to combine a T-norm ⊤, a T-conorm ⊥, and a negation N (or complement, most commonly N(a) = 1 − a) so that De Morgan’s laws hold. Such a triplet is often called a dual triplet. In Tables 5 and 6, we show the formulas for the families of T-norms and T-conorms, respectively, where matching lines together with the standard negation N(a) = 1 − a form dual triplets. Note that, for some families, we limited the range of values for the parameter p (see Table 1) compared to more general definitions [Klir_and_Yuan_1995].
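
A quick numerical check of this duality for the probabilistic pair (a minimal sketch; function names are illustrative):

    import random

    # With the standard negation N(a) = 1 - a, a dual T-norm/T-conorm pair
    # satisfies  T-conorm(a, b) = 1 - T-norm(1 - a, 1 - b).

    def t_norm_prob(a, b):
        return a * b

    def t_conorm_prob(a, b):
        return a + b - a * b

    for _ in range(5):
        a, b = random.random(), random.random()
        lhs = t_conorm_prob(a, b)
        rhs = 1.0 - t_norm_prob(1.0 - a, 1.0 - b)
        assert abs(lhs - rhs) < 1e-12
    print("De Morgan duality holds for the probabilistic pair.")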

C.1 T-conorm Plots

Figures 8 and 9 display the considered set of T-conorms.

Maximum & ⊥^M(a, b) & max(a, b)
Probabilistic & ⊥^P(a, b) & a + b − ab
Einstein & ⊥^E(a, b) & ⊥^H_2(a, b) = (a + b) / (1 + ab)
Hamacher & ⊥^H_p(a, b) & (a + b + (p − 2)ab) / (1 + (p − 1)ab)
Frank & ⊥^F_p(a, b) & 1 − log_p(1 + (p^(1−a) − 1)(p^(1−b) − 1) / (p − 1))
Yager & ⊥^Y_p(a, b) & min(1, (a^p + b^p)^(1/p))
Aczél-Alsina & ⊥^A_p(a, b) & 1 − exp(−(|log(1 − a)|^p + |log(1 − b)|^p)^(1/p))
Dombi & ⊥^D_p(a, b) & (1 + (((1 − a)/a)^p + ((1 − b)/b)^p)^(−1/p))^(−1)
Schweizer-Sklar & ⊥^SS_p(a, b) & 1 − ((1 − a)^p + (1 − b)^p − 1)^(1/p)
Table 6: (Families of) T-conorms.
Figure 8: T-conorm plots (1/2). Note that ‘Average’ is not a T-conorm and is just included for reference. Also, note how ‘Probabilistic’ is equal to ‘Hamacher p=1’ and ‘Einstein’ is equal to ‘Hamacher p=2’.
Figure 9: T-conorm plots (2/2).

Appendix D Additional Plots

See Figures 10 and 11.

Figure 10: Results for the teapot camera pose optimization task for the respective square-root distributions F_sq.
Figure 11: Shape optimization (left) and camera pose optimization (right) applied to a model of a chair. Top: set of original distributions F. Bottom: set of the respective square-root distributions F_sq.


In our second experiment, the goal is to find the camera pose for a model of a teapot from a reference image. The angle is randomly modified by an angle uniformly drawn from $[15^, 75^]$, and the distance and camera view angle are also randomized. We sample~$600$ pairs of a reference image and an initialization and use this set of settings for each method. For optimization, we use Adam with a learning rate of either~$0.1$ or~$0.3$ (via grid search) and optimize for~$1000$ steps. During the optimization, we transition an initial scale of~$σ=10^-1$ logarithmically to a final value of~$σ=10^-7$. This allows us to avoid a grid search for the optimal scale, and makes sense since an initially large~$σ$ is beneficial for pose optimization, because a smoother model has a higher probability of finding the correct orientation of the object. This contrasts with the setting of shape estimation, where this would be fatal because the vertices would collapse to the center.

Results. In Figure~6, we display the results of this experiment. A corresponding image of the counterpart distributions~$F_sq$ as well as results for the experiment applied to the model of a chair can be found in SM~D. The metric is the fraction of settings which achieved matching the ground truth pose up to $3^$. We find that in this experiment, the results are similar to those in the shape optimization experiment. Note that there are larger yellow areas because the color map ranges from $0%$ to $90%$, while in the shape optimization plot the color map ranges in a rather narrow loss range.

5.3 Single-View 3D Reconstruction

Setup. Finally, we reproduce the popular ShapeNet single-view 3D reconstruction experiment from~[Kato2017, Liu2019-SoftRas, Chen2019DIB, petersen2021learning]. We select three T-conorms ($^P, ^E, ^Y_2$) and~$10$ distributions (Uniform, Gaussian, Logistic, Logistic (squares), Cauchy, Cauchy (squares), Gumbel-Min, Gamma (R, $p=0.5$), Gamma (R, $p=0.5$, squares), and Exponential (R)). These have been selected because they have been used in previous works, are notable (Cauchy, Gumbel-Min, Einstein), or have performed especially well in the aircraft shape optimization experiment (Gamma, Yager). For each setting, we perform a grid search of~$τ$ at resolution~$10^0.5$. Further experimental details can be found in SM~A.

Results. In Figure~7, we display and discuss the class-wise results for all $30$ selected renderers. In Table~3, we show the (self-) reported results for existing differentiable renderers in the top block. In the bottom block, we display our results for the methods that are equivalent ($=$) or very similar ($$) to the six existing differentiable renderers. The differences for equivalent methods can be explained with small variations in the setting and minor implementation differences. Additionally, we include three noteworthy alternative renderers, such as the one that also performed best on the prior airplane shape optimization task. We conclude that the optimal choice of renderer heavily depends on the characteristics of the 3D models and the task. Surprisingly, we find that the simple uniform method achieves consistently good results and the best average score.

6 Discussion and Conclusion

In this work, we generalized differentiable mesh renderers and explored a large space of instantiations of our generalized renderer GenDR. We found that there are significant differences between different distributions for the occlusion test but also between different T-conorms for the aggregation. In our experiments, we observed that the choice of renderer has a large impact on the kind of models that can be rendered most effectively. We find that the uniform distribution outperforms the other tested distributions on average, which is surprising considering it simplicity. Remarkably, the uniform distribution had already been used implicitly for the early surrogate gradient renderers but was later discarded for the approximate differentiable renderers.

Acknowledgments. This work was supported by the DFG in the Cluster of Excellence EXC 2117 (Project-ID 390829875) and the SFB Transregio 161 (Project-ID 251654672), and the Land Salzburg within the WISS 2025 project IDA-Lab (20102-F1901166-KZP and 20204-WISS/225/197-2019).

Appendix A Implementation Details

For the single-view 3D reconstruction experiment, we closely orient ourselves on the setup by Liu~et al. ~[Liu2019-SoftRas]. We use the same model architecture~[Liu2019-SoftRas] and also train with a batch size of $64$ for $250 000$ steps using the Adam optimizer~[Kingma2014AdamOpt] We also schedule the learning rate to $10^-4$ for the first $150 000$ steps and use a learning rate of $310^-5$ for the remaining training. At this point (after the first $150 000$ steps), we also decrease the temperature $τ$ by a factor of $0.3$.

Using different learning rates (as an ablation) did not improve the results.

Appendix B Distributions

In this section, we define each of the presented distributions / sigmoid functions. Figure~4 displays the respective CDFs and PDFs.

Note that, for each distribution, the PDFs $f$ is defined as the derivative of the CDF $F$. Also, note that a reversed (Rev.) CDF is defined as $F_Rev.(x) = 1-F(-x)$, which means that $F_Rev.= F$ for symmetric distributions. The square-root distribution~$F_sq$ is defined in terms of $F$ as in Equation (5). Therefore, in the following, we will define the distributions via their CDFs~$F$.


#1

(6)

#1

(7)

#1

(8)

where $y:=(x+1)/2$.


#1

(9)

#1

(10)

#1

(11)

#1

(12)

#1

(13)

#1

(14)

#1

(15)

#1

(16)

#1

(17)

#1

(18)

#1

(19)

where $Φ$ is the CDF of the standard normal distribution.


#1

(20)

where $γ(p,x)$ is the lower incomplete gamma function and $p>0$ is the shape parameter.

& & & & &
& & & & &
Heaviside & Uniform & Cubic Hermite & Wigner Semicircle & Gaussian & Gaussian (sq.)
& & & & &
& & & & &
Laplace & Laplace (sq.) & Logistic & Logistic (sq.) & Hyperbolic secant & Hyperbolic secant (sq.)
& & & & &
& & & & &
Cauchy & Cauchy (sq.) & Reciprocal & Reciprocal (sq.) & Gumbel-Max & Gumbel-Min
& & & & &
& & & & &
Exponential & Exponential (sq.) & Exponential (Rev.) & Levy & Levy (sq.) & Levy (Rev.)
& & & & &
& & & & &
Gamma ($p=0.5$) & Gamma ($p=1$) & Gamma ($p=2$) & Gamma ($p=.5$, R.) & Gamma ($p=1$, R.) & Gamma ($p=2$, R.)
& & & & &
& & & & &
Gamma ($p=0.5$, sq.) & Gamma ($p=1$, sq.) & Gamma ($p=2$, sq.) & Gamma ($p=.5$, R., sq.) & Gamma ($p=1$, R., sq.) & Gamma ($p=2$, R., sq.)
Table 4: Visualization of CDFs (top) and PDFs (bottom) for different distributions.

~

~

Appendix C T-Norms and T-Conorms

The axiomatic approach to multi-valued logics (which we need to combine the occlusions by different faces in a ``soft'' manner) is based on defining reasonable properties for truth functions. We stated the axioms for multi-valued generalizations of the disjunction (logical ``or''), called T-conorms, in Definition~2. Here we complement this with the axioms for multi-valued generalizations of the conjunction (logical ``and''), which are called T-norms.

Definition 6 (T-norm).

A T-norm (triangular norm) is a binary operation~$: [0,1] ×[0,1] [0,1]$, which satisfies

  • associativity: $(a, (b,c)) = ((a,b), c)$,

  • commutativity: $(a,b) = (b,a)$,

  • monotonicity: $(a c) (b d) ⇒⊤(a,b) ≤⊤(c,d)$,

  • $1$ is a neutral element: $(a,1) = a$.

Clearly these axioms ensure that the corners of the unit square, that is, the value pairs considered in classical logic, are processed as with a standard conjunction: neutral element and commutativity imply that $(1,1) 1$, $(0,1) 0$, $(1,0) 0$. From one of the latter two and monotonicity it follows $(0,0) 0$. Analogously, the axioms of T-conorms ensure that the corners of the unit square are processed as with a standard disjunction. Actually, the axioms already fix the values not only at the corners, but on the boundaries of the unit square. Only inside the unit square (that is, for $(0,1)^2$) T-norms (as well as T-conorms) can differ.

Minimum & $^M(a,b)$ & $min(a,b)$
Probabilistic & $^P(a,b)$ & $ab$
Einstein & $^E(a,b)$ & $ab2-a-b+ab$
Hamacher & $^H_p(a,b)$ & $abp+(1-p)(a+b-ab)$
Frank & $^F_p(a,b)$ & $log_p(1+(p^a-1)(p^b-1)p-1)$
Yager & $^Y_p(a,b)$ & $max(0, 1-((1-a)^p+(1-b)^p)^1p)$
Aczél-Alsina & $^A_p(a,b)$ & $exp(-(|log(a)|^p+|log(b)|^p )^1p)$
Dombi & $^D_p(a,b)$ & $(1+( (1-aa)^p +(1-bb)^p )^1p)^-1$
Schweizer-Sklar & $^S_p(a,b)$ & $(a^p+b^p-1)^1p$
Table 5: (Families of) T-norms.

In the theory of multi-valued logics, and especially in fuzzy logic [Klir_and_Yuan_1995], it was established that the largest possible T-norm is the minimum and the smallest possible T-conorm is the maximum: for any T-norm~$$ it is $(a,b) ≤min(a,b)$ and for any T-conorm~$$ it is $(a,b) ≥max(a,b)$. The other extremes, that is, the smallest possible T-norm and the largest possible T-conorm are the so-called drastic T-norm, defined as $^(a,b) = 0$ for $(a,b) (0,1)^2$, and the drastic T-conorm, defined as $^(a,b) = 1$ for $(a,b) (0,1)^2$. Hence it is $(a,b) ≥⊤^(a,b)$ for any T-norm~$$ and $(a,b) ≤⊥^(a,b)$ for any T-conorm~$$. We do not consider the drastic T-conorm for an occlusion test, because it clearly does not yield useful gradients.

As already mentioned in the paper, it is common to combine a T-norm~$$, a T-conorm~$$ and a negation~$N$ (or complement, most commonly $N(a) = 1-a$) so that DeMorgan's laws hold. Such a triplet is often called a dual triplet. In Tables~5 and~6 we show the formulas for the families of T-norms and T-conorms, respectively, where matching lines together with the standard negation $N(a) = 1-a$ form dual triplets. Note that, for some families, we limited the range of values for the parameter~$p$ (see Table~1) compared to more general definitions~[Klir_and_Yuan_1995].

c.1 T-conorm Plots

Figures~8 and~9 display the considered set of T-conorms.

Maximum & $^M(a,b)$ & $max(a,b)$
Probabilistic & $^P(a,b)$ & $a+b-ab$
Einstein & $^E(a,b)$ & $^H_2(a,b)=a+b1+ab$
Hamacher & $^H_p(a,b)$ & $a+b+(p-2)ab1+(p-1)ab$
Frank & $^F_p(a,b)$ & $1-log_p(1+(p^1-a-1)(p^1-b-1)p-1)$
Yager & $^Y_p(a,b)$ & $min(1, (a^p+b^p)^1p)$
Aczél-Alsina & $^A_p(a,b)$ & $1 -exp(-(|log(1-a)|^p+|log(1-b)|^p )^1p)$
Dombi & $^D_p(a,b)$ & $(1+( (1-aa)^p +(1-bb)^p )^-1p)^-1$
Schweizer-Sklar & $^S_p(a,b)$ & $1-((1-a)^p+(1-b)^p-1)^1p$
Table 6: (Families of) T-conorms.
Figure 8: T-conorm plots (1/2). Note that `Average' is not a T-cornom and just included for reference. Also, Note how `Probabilistic' is equal to `Hamacher $p=1$' and `Einstein' is equal to `Hamacher $p=2$'.
Figure 9: T-conorm plots (2/2).

Appendix D Additional Plots

See Figures 10 and 11.

Figure 10: Results for the tea pot camera pose optimization task for the respective square-root distribution~$F_sq$.
Figure 11: Shape optimization (left) and camera pose optimization (right) applied to a model of a chair. Top: set of original distributions $F$. Bottom: set of the respective square-root distributions $F_sq$

5 Experiments

The source code will be available at github.com/Felix-Petersen/gendr.

5.1 Shape Optimization

Our first experiment is a shape optimization task. Here, we use the mesh of an airplane and render it from $24$ azimuths using a hard renderer. The task is to optimize a mesh (initialized as a sphere) to fit the silhouette of the airplane within $100$ optimization steps. Limiting the task to $100$ optimization steps is critical for two reasons: (i) Given enough steps, the task can be solved essentially perfectly by any differentiable renderer that produces the correct gradient sign; we are instead interested in the quality of the gradients and in how efficiently each renderer optimizes. (ii) The total evaluation is computationally expensive because we evaluate a total of $1{,}242$ renderers and perform a grid search over the distribution parameters for each one to provide a fair and reliable comparison.

Setup. For optimization, we use the Adam optimizer~[Kingma2014AdamOpt] with parameters $\beta_1=0.5$ and $\beta_2=0.95$. For each setting, we perform a grid search over three learning rates ($\lambda \in \{10^{-1.25}, 10^{-1.5}, 10^{-1.75}\}$) and temperatures $\tau \in \{10^{-0.1\,n} \mid n \in \mathbb{N},\, 0 \leq n \leq 80\}$. Here, $\lambda = 10^{-1.5} \approx 0.03$ performs best in almost all cases. As for the scale hyperparameter, it is important to use a fine-grained as well as large grid because it behaves differently for each distribution. We intentionally chose the grid larger than the range of reasonable values to ensure that the best choice is covered for each setting; the extreme values of the scale were never optimal. We perform this evaluation from five different elevation angles ($\{-60^\circ, -30^\circ, 0^\circ, 30^\circ, 60^\circ\}$) as independent runs and average the final results for each renderer instance. Additional results for the experiment applied to the model of a chair can be found in SM~D.
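To make the sweep concrete, the following is a minimal sketch of the grid search described above; `run_shape_optimization` and its return value are hypothetical stand-ins for the actual 100-step silhouette-fitting loop, not the released GenDR code.

```python
import itertools
import random

# Sketch of the hyperparameter sweep described above.
learning_rates = [10 ** e for e in (-1.25, -1.5, -1.75)]
temperatures = [10 ** (-0.1 * n) for n in range(81)]   # tau in {10^(-0.1 n) | 0 <= n <= 80}
elevations = [-60, -30, 0, 30, 60]                      # elevation angles in degrees

def run_shape_optimization(renderer, lr, tau, elevation, steps=100):
    """Placeholder: would run 100 Adam steps and return the final silhouette loss."""
    return random.random()  # dummy value so the sketch is executable

def evaluate_renderer(renderer):
    best_loss = float("inf")
    for lr, tau in itertools.product(learning_rates, temperatures):
        # Average the final loss over the five elevation angles (independent runs).
        losses = [run_shape_optimization(renderer, lr, tau, e) for e in elevations]
        best_loss = min(best_loss, sum(losses) / len(losses))
    return best_loss
```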

Figure 5: Results for the 24-view airplane shape optimization task. The optimization is done within a tight budget of 100 steps, and the metric is the loss, i.e., lower (=yellow) is better. Top: original set of distributions~$F$. Bottom: the respective counterparts~$F_{\mathrm{sq}}$ in the same locations. The marginal histograms display participation in the top $10\%$ of combinations.
Figure 6: Results for the teapot camera pose optimization task. The optimization is done with a temperature $\tau$ that is scheduled to decay. The metric is the fraction of camera poses recovered, while the initialization angle errors are uniformly sampled from $[15^\circ, 75^\circ]$. The figure shows the original set of distributions~$F$; the plot for the respective~$F_{\mathrm{sq}}$ can be found in SM~D.

Results. In Figure~5, we display the results of our evaluation. We observe that the regular distributions~$F$ typically perform better than their counterparts~$F_{\mathrm{sq}}$, except for Cauchy and the reciprocal sigmoid, which are those with a linear convergence rate. We explain this by the fact that squaring the distance before applying the sigmoid function gives the function a quadratic convergence rate instead. As the linearly converging functions also perform poorly in comparison to the exponentially converging functions (Gaussian, Laplace, Logistic, Gudermannian), we conclude that linear convergence is inferior to quadratic and exponential convergence. Columns~$1-3$ contain the distributions with finite support, and these do not perform very well on this task. The block of exponentially decaying distributions (columns $4-7$) performs well. The block of linearly decaying distributions (columns $8-9$) performs badly, as discussed above. The block of Levy distributions (last $4$ columns) performs even worse because these distributions converge even more slowly. Here, the squared setting again helps slightly, but performance remains worse than for linear convergence.

Comparison of Distributions. The Gumbel, exponential, and gamma distributions do not all perform equally well, but Gumbel-Min, the reversed exponential, and the reversed gamma are all competitive. Confer Table 3.2, where it becomes clear that this is because Gumbel-Max, the exponential, and the gamma have all of their mass inside the triangle, i.e., they yield smaller faces. This is problematic because it can cause gaps between neighboring triangles, which hinders optimization. As the reversed counterparts yield larger faces and do not suffer from this problem, they perform better. Note that, in this respect, the asymmetrical distributions have an advantage over the symmetrical distributions: a symmetrical distribution always has an accumulated density of~$0.5$ at the edge, and thus the size of the face stays the same. We can see that, among the asymmetrical distributions, the gamma distribution performs best.

Comparison of T-conorms. We find that $\perp^M$ and ``average'' (which is not a T-conorm but was used as a baseline in~[Liu2019-SoftRas]) perform poorly. Also, $\perp^Y_4$, $\perp^A_2$, $\perp^A_4$, $\perp^D_2$, $\perp^D_4$, $\perp^{SS}_{-2}$, and $\perp^{SS}_{-4}$ perform poorly overall. This can be explained by the fact that they are rather extreme members of their respective T-conorm families; in all of them, the $p$th power is involved, which can become a problematic component, e.g., $x^4$ is vanishingly small for~$x=0.5$. Interestingly, the gamma and exponential distributions still perform well with these, likely because they are not symmetric and have an accumulated probability of~$1$ on the edge. Notably, the Yager T-conorm ($p=2$) performs very well, despite having a plateau and thus no meaningful gradient outside the unit disc; see Figure~3.

Finally, we compute histograms of how many times each distribution and each T-conorm is involved in the best~$10\%$ of overall results; this is computed independently for the top and bottom plots. We observe that Gamma ($p=0.5$, Reversed) performs best overall, because it is most robust to the choice of T-conorm. Among the T-conorms, we find that $\perp^Y_2$ and $\perp^D_{0.5}$ perform best. The probabilistic and Einstein sums perform on par and share the next place.
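To illustrate how different T-conorms aggregate per-face occlusion probabilities into a pixel value, the sketch below implements the probabilistic, Einstein, and Yager ($p=2$) T-conorms following Table 6; the fold over faces is an assumption about how such an aggregation is typically implemented, not the exact GenDR code.

```python
from functools import reduce

# Binary T-conorms from Table 6 (probabilistic, Einstein, and Yager with p=2).
def perp_probabilistic(a, b):
    return a + b - a * b

def perp_einstein(a, b):
    return (a + b) / (1.0 + a * b)

def perp_yager(a, b, p=2.0):
    return min(1.0, (a ** p + b ** p) ** (1.0 / p))

def aggregate(face_probs, conorm):
    """Fold a pixel's per-face coverage probabilities with an associative T-conorm."""
    return reduce(conorm, face_probs, 0.0)  # 0 is the neutral element of a T-conorm

# Example: three faces partially covering one pixel.
probs = [0.2, 0.5, 0.7]
print(aggregate(probs, perp_probabilistic))  # 0.88
print(aggregate(probs, perp_einstein))
print(aggregate(probs, perp_yager))
```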

Figure 7: Single-view reconstruction results for each of the $30$ selected renderers as a 3D IoU (in %) heatmap for each class. While the uniform distribution (w/ $\perp^P$) performs best on average and the square root of the logistic distribution (w/ $\perp^P, \perp^E$) performs second-best on average, the optimal setting depends on the characteristics of the respective classes. For the `Airplane' class, the gamma distribution performs best, and this is also the distribution that performed best in the airplane shape optimization experiment in Section~5.1. For classes of furniture with legs, such as `Bench', `Chair', and `Table', we find that the Gaussian distribution consistently performs best. The pairs of similar classes `Display'+`Phone', `Dresser'+`Speaker', and `Vessel'+`Rifle' also show similar performance patterns. For example, dressers and speakers tend to be cuboid, while rifles and vessels tend to be rather long and slim. Considering the Gaussian distribution, it is interesting to see that for some classes $\perp^P$ and $\perp^E$ perform better, while for other classes $\perp^Y_2$ performs much better.
Method & Airplane & Bench & Dresser & Car & Chair & Display & Lamp & Speaker & Rifle & Sofa & Table & Phone & Vessel & Mean
Kato et al. [Kato2017] N3MR (Uniform Backward) & 0.6172 & 0.4998 & 0.7143 & 0.7095 & 0.4990 & 0.5831 & 0.4126 & 0.6536 & 0.6322 & 0.6735 & 0.4829 & 0.7777 & 0.5645 & 0.6015
Liu et al. [Liu2019-SoftRas] SoftRas (Square-root of Logistic) & 0.6419 & 0.5080 & 0.7116 & 0.7697 & 0.5270 & 0.6156 & 0.4628 & 0.6654 & 0.6811 & 0.6878 & 0.4487 & 0.7895 & 0.5953 & 0.6234
Chen et al. [Chen2019DIB] DIB-R (Exponential) & 0.570 & 0.498 & 0.763 & 0.788 & 0.527 & 0.588 & 0.403 & 0.726 & 0.561 & 0.677 & 0.508 & 0.743 & 0.609 & 0.612
Probabilistic + Uniform ($\approx$ [Kato2017, Loper2014]) & 0.6456 & 0.4855 & 0.7113 & #1 & 0.5276 & 0.6126 & 0.4611 & 0.6651 & #1 & 0.6835 & 0.4514 & 0.8148 & 0.5971 & #1
Probabilistic + Logistic ($=$ [petersen2021learning]) & 0.6396 & 0.5005 & 0.7105 & 0.7471 & 0.5288 & 0.6022 & 0.4586 & 0.6639 & 0.6742 & 0.6660 & 0.4666 & 0.7771 & 0.5980 & 0.6179
Probabilistic + Logistic (squares) ($=$ [Liu2019-SoftRas]) & 0.6416 & 0.4966 & #1 & 0.7386 & 0.5224 & #1 & 0.4550 & #1 & #1 & 0.6818 & 0.4529 & #1 & 0.5984 & 0.6217
Probabilistic + Exponential (R) ($=$ [Chen2019DIB]) & 0.6321 & 0.4857 & 0.7123 & 0.7298 & 0.5178 & 0.5983 & 0.4611 & 0.6642 & 0.6713 & 0.6546 & 0.4700 & 0.7717 & #1 & 0.6130
Probabilistic + Gaussian ($\approx$ [rhodin2015versatile]) & 0.5922 & #1 & 0.7104 & 0.7561 & #1 & 0.6080 & 0.4399 & #1 & 0.6533 & #1 & #1 & 0.7301 & 0.5894 & 0.6125
Probabilistic + Gamma (R) & #1 & 0.4842 & 0.7093 & 0.7220 & 0.5159 & 0.6033 & #1 & 0.6626 & 0.6719 & 0.6505 & 0.4642 & 0.7778 & 0.5978 & 0.6133
Einstein + Gamma (R, squares) & 0.6438 & 0.4816 & #1 & 0.7284 & 0.5170 & 0.6111 & 0.4654 & 0.6647 & 0.6760 & 0.6546 & 0.4626 & #1 & 0.5973 & 0.6184
Yager (p=2) + Cauchy (squares) & 0.6380 & #1 & 0.7047 & 0.7359 & 0.5188 & 0.5976 & 0.4617 & 0.6612 & 0.6726 & 0.6619 & 0.4819 & 0.7560 & #1 & 0.6149
Table 3: Selected single-view reconstruction results measured in 3D IoU.

5.2 Camera Pose Optimization

In our second experiment, the goal is to find the camera pose for a model of a teapot from a reference image. The camera angle is randomly perturbed by an angle uniformly drawn from $[15^\circ, 75^\circ]$, and the distance and camera view angle are also randomized. We sample~$600$ pairs of a reference image and an initialization and use this same set of settings for each method. For optimization, we use Adam with a learning rate of either~$0.1$ or~$0.3$ (chosen via grid search) and optimize for~$1000$ steps. During the optimization, we transition the scale logarithmically from an initial value of~$\sigma=10^{-1}$ to a final value of~$\sigma=10^{-7}$. This allows us to avoid a grid search for the optimal scale and is sensible because an initially large~$\sigma$ is beneficial for pose optimization: a smoother model has a higher probability of finding the correct orientation of the object. This contrasts with the setting of shape estimation, where such a schedule would be fatal because the vertices would collapse to the center.
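A minimal sketch of such a logarithmic scale schedule is shown below; the step count and variable names are illustrative and not taken from the released implementation.

```python
import math

def sigma_schedule(step, num_steps=1000, sigma_start=1e-1, sigma_end=1e-7):
    """Interpolate the smoothing scale logarithmically from sigma_start to sigma_end."""
    t = step / (num_steps - 1)  # progress in [0, 1]
    log_sigma = (1 - t) * math.log10(sigma_start) + t * math.log10(sigma_end)
    return 10 ** log_sigma

# The scale decays by a constant factor per step: 1e-1 at step 0, 1e-7 at the last step.
print(sigma_schedule(0), sigma_schedule(500), sigma_schedule(999))
```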

Results. In Figure~6, we display the results of this experiment. A corresponding plot for the counterpart distributions~$F_{\mathrm{sq}}$ as well as results for the experiment applied to the model of a chair can be found in SM~D. The metric is the fraction of settings in which the ground-truth pose is recovered up to an error of $3^\circ$. We find that the results of this experiment are similar to those of the shape optimization experiment. Note that there are larger yellow areas because the color map ranges from $0\%$ to $90\%$, while in the shape optimization plot the color map covers a rather narrow range of loss values.

5.3 Single-View 3D Reconstruction

Setup. Finally, we reproduce the popular ShapeNet single-view 3D reconstruction experiment from~[Kato2017, Liu2019-SoftRas, Chen2019DIB, petersen2021learning]. We select three T-conorms ($\perp^P, \perp^E, \perp^Y_2$) and~$10$ distributions (Uniform, Gaussian, Logistic, Logistic (squares), Cauchy, Cauchy (squares), Gumbel-Min, Gamma (R, $p=0.5$), Gamma (R, $p=0.5$, squares), and Exponential (R)). These have been selected because they have been used in previous works, are notable (Cauchy, Gumbel-Min, Einstein), or performed especially well in the airplane shape optimization experiment (Gamma, Yager). For each setting, we perform a grid search over~$\tau$ at a resolution of~$10^{0.5}$. Further experimental details can be found in SM~A.
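For clarity, the $30$ evaluated renderer settings are simply the Cartesian product of the three T-conorms and the ten distributions; the sketch below enumerates them, with the names serving only as labels.

```python
from itertools import product

# Hypothetical enumeration of the 30 renderer settings (3 T-conorms x 10 distributions).
t_conorms = ["probabilistic", "einstein", "yager_p2"]
distributions = [
    "uniform", "gaussian", "logistic", "logistic_squares", "cauchy",
    "cauchy_squares", "gumbel_min", "gamma_r_p05", "gamma_r_p05_squares",
    "exponential_r",
]
settings = list(product(t_conorms, distributions))
assert len(settings) == 30
```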

Results. In Figure~7, we display and discuss the class-wise results for all $30$ selected renderers. In Table~3, the top block shows the (self-)reported results for existing differentiable renderers. In the bottom block, we display our results for the settings that are equivalent ($=$) or very similar ($\approx$) to the six existing differentiable renderers. The differences for equivalent methods can be explained by small variations in the setup and minor implementation differences. Additionally, we include three noteworthy alternative renderers, such as the one that also performed best on the prior airplane shape optimization task. We conclude that the optimal choice of renderer heavily depends on the characteristics of the 3D models and the task. Surprisingly, we find that the simple uniform method achieves consistently good results and the best average score.

6 Discussion and Conclusion

In this work, we generalized differentiable mesh renderers and explored a large space of instantiations of our generalized renderer GenDR. We found that there are significant differences between different distributions for the occlusion test, but also between different T-conorms for the aggregation. In our experiments, we observed that the choice of renderer has a large impact on the kind of models that can be rendered most effectively. We find that the uniform distribution outperforms the other tested distributions on average, which is surprising considering its simplicity. Remarkably, the uniform distribution had already been used implicitly for the early surrogate-gradient renderers but was later discarded for the approximate differentiable renderers.

Acknowledgments. This work was supported by the DFG in the Cluster of Excellence EXC 2117 (Project-ID 390829875) and the SFB Transregio 161 (Project-ID 251654672), and the Land Salzburg within the WISS 2025 project IDA-Lab (20102-F1901166-KZP and 20204-WISS/225/197-2019).

Appendix A Implementation Details

For the single-view 3D reconstruction experiment, we closely follow the setup of Liu et al.~[Liu2019-SoftRas]. We use the same model architecture~[Liu2019-SoftRas] and also train with a batch size of $64$ for $250{,}000$ steps using the Adam optimizer~[Kingma2014AdamOpt]. We schedule the learning rate to $10^{-4}$ for the first $150{,}000$ steps and use a learning rate of $3 \cdot 10^{-5}$ for the remaining training. At this point (after the first $150{,}000$ steps), we also decrease the temperature $\tau$ by a factor of $0.3$.

Using different learning rates (as an ablation) did not improve the results.
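The schedule above can be summarized in a few lines; the helpers below are a sketch under the stated hyperparameters, with the surrounding training loop assumed to exist elsewhere.

```python
def learning_rate(step, switch_step=150_000):
    """Learning-rate schedule: 1e-4 for the first 150k steps, 3e-5 afterwards."""
    return 1e-4 if step < switch_step else 3e-5

def temperature(tau_initial, step, switch_step=150_000):
    """Temperature schedule: tau is decreased by a factor of 0.3 after 150k steps."""
    return tau_initial if step < switch_step else 0.3 * tau_initial
```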

Appendix B Distributions

In this section, we define each of the presented distributions / sigmoid functions. Figure~4 displays the respective CDFs and PDFs.

Note that, for each distribution, the PDF $f$ is defined as the derivative of the CDF $F$. Also, note that a reversed (Rev.) CDF is defined as $F_{\mathrm{Rev.}}(x) = 1-F(-x)$, which means that $F_{\mathrm{Rev.}} = F$ for symmetric distributions. The square-root distribution~$F_{\mathrm{sq}}$ is defined in terms of $F$ as in Equation (5). Therefore, in the following, we define the distributions via their CDFs~$F$.


Equations (6)–(20) give the CDFs $F$ of the distributions listed in Table~4: the Heaviside, uniform, cubic Hermite, Wigner semicircle, Gaussian, Laplace, logistic, hyperbolic secant, Cauchy, reciprocal, Gumbel-Max, Gumbel-Min, exponential, Levy, and gamma distributions. For the cubic Hermite sigmoid, $y:=(x+1)/2$; for the Levy distribution, $\Phi$ is the CDF of the standard normal distribution; and for the gamma distribution, $\gamma(p,x)$ is the lower incomplete gamma function and $p>0$ is the shape parameter.
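The reversed and square-root variants can be derived mechanically from a base CDF. The sketch below illustrates this for the logistic sigmoid; the `squared_variant` helper assumes that the square-root/`(squares)' variant applies the base sigmoid to the signed square of its argument (cf. Equation (5) for the exact form used in the paper), which mirrors the squared-distance trick of SoftRas.

```python
import math

def logistic_cdf(x):
    """CDF of the (standard) logistic distribution."""
    return 1.0 / (1.0 + math.exp(-x))

def reversed_cdf(F):
    """Reversed variant: F_Rev(x) = 1 - F(-x); identical to F for symmetric distributions."""
    return lambda x: 1.0 - F(-x)

def squared_variant(F):
    """Assumed 'squares' variant: apply F to the signed square of the distance."""
    return lambda x: F(math.copysign(x * x, x))

F_rev = reversed_cdf(logistic_cdf)   # equals logistic_cdf, since the logistic is symmetric
F_sq = squared_variant(logistic_cdf)
print(logistic_cdf(0.5), F_rev(0.5), F_sq(0.5))
```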

Table 4: Visualization of CDFs (top) and PDFs (bottom) for the following distributions: Heaviside, Uniform, Cubic Hermite, Wigner Semicircle, Gaussian, Gaussian (sq.); Laplace, Laplace (sq.), Logistic, Logistic (sq.), Hyperbolic secant, Hyperbolic secant (sq.); Cauchy, Cauchy (sq.), Reciprocal, Reciprocal (sq.), Gumbel-Max, Gumbel-Min; Exponential, Exponential (sq.), Exponential (Rev.), Levy, Levy (sq.), Levy (Rev.); Gamma ($p=0.5$), Gamma ($p=1$), Gamma ($p=2$), Gamma ($p=0.5$, R.), Gamma ($p=1$, R.), Gamma ($p=2$, R.); Gamma ($p=0.5$, sq.), Gamma ($p=1$, sq.), Gamma ($p=2$, sq.), Gamma ($p=0.5$, R., sq.), Gamma ($p=1$, R., sq.), Gamma ($p=2$, R., sq.).


Appendix C T-Norms and T-Conorms

The axiomatic approach to multi-valued logics (which we need to combine the occlusions by different faces in a ``soft'' manner) is based on defining reasonable properties for truth functions. We stated the axioms for multi-valued generalizations of the disjunction (logical ``or''), called T-conorms, in Definition~2. Here we complement this with the axioms for multi-valued generalizations of the conjunction (logical ``and''), which are called T-norms.

Definition 6 (T-norm).

A T-norm (triangular norm) is a binary operation~$\top\colon [0,1] \times [0,1] \to [0,1]$ which satisfies

  • associativity: $\top(a, \top(b,c)) = \top(\top(a,b), c)$,

  • commutativity: $\top(a,b) = \top(b,a)$,

  • monotonicity: $(a \leq c) \wedge (b \leq d) \Rightarrow \top(a,b) \leq \top(c,d)$,

  • $1$ is a neutral element: $\top(a,1) = a$.

Clearly, these axioms ensure that the corners of the unit square, that is, the value pairs considered in classical logic, are processed as with a standard conjunction: the neutral element and commutativity imply that $\top(1,1) = 1$, $\top(0,1) = 0$, and $\top(1,0) = 0$. From one of the latter two and monotonicity it follows that $\top(0,0) = 0$. Analogously, the axioms of T-conorms ensure that the corners of the unit square are processed as with a standard disjunction. Actually, the axioms already fix the values not only at the corners, but on the entire boundary of the unit square. Only inside the unit square (that is, for $(a,b) \in (0,1)^2$) can T-norms (as well as T-conorms) differ.

Minimum & $\top^M(a,b)$ & $\min(a,b)$
Probabilistic & $\top^P(a,b)$ & $ab$
Einstein & $\top^E(a,b)$ & $\frac{ab}{2-a-b+ab}$
Hamacher & $\top^H_p(a,b)$ & $\frac{ab}{p+(1-p)(a+b-ab)}$
Frank & $\top^F_p(a,b)$ & $\log_p\!\left(1+\frac{(p^a-1)(p^b-1)}{p-1}\right)$
Yager & $\top^Y_p(a,b)$ & $\max\!\left(0,\ 1-\left((1-a)^p+(1-b)^p\right)^{1/p}\right)$
Aczél-Alsina & $\top^A_p(a,b)$ & $\exp\!\left(-\left(|\log(a)|^p+|\log(b)|^p\right)^{1/p}\right)$
Dombi & $\top^D_p(a,b)$ & $\left(1+\left(\left(\frac{1-a}{a}\right)^p+\left(\frac{1-b}{b}\right)^p\right)^{1/p}\right)^{-1}$
Schweizer-Sklar & $\top^{SS}_p(a,b)$ & $\left(a^p+b^p-1\right)^{1/p}$
Table 5: (Families of) T-norms.

In the theory of multi-valued logics, and especially in fuzzy logic [Klir_and_Yuan_1995], it was established that the largest possible T-norm is the minimum and the smallest possible T-conorm is the maximum: for any T-norm~$\top$ it holds that $\top(a,b) \leq \min(a,b)$, and for any T-conorm~$\perp$ it holds that $\perp(a,b) \geq \max(a,b)$. The other extremes, that is, the smallest possible T-norm and the largest possible T-conorm, are the so-called drastic T-norm, which equals $0$ for all $(a,b) \in (0,1)^2$, and the drastic T-conorm, which equals $1$ for all $(a,b) \in (0,1)^2$. Hence, every T-norm is bounded below by the drastic T-norm, and every T-conorm is bounded above by the drastic T-conorm. We do not consider the drastic T-conorm for the occlusion test because it clearly does not yield useful gradients.

As already mentioned in the paper, it is common to combine a T-norm~$\top$, a T-conorm~$\perp$, and a negation~$N$ (or complement, most commonly $N(a) = 1-a$) such that De Morgan's laws hold. Such a triplet is often called a dual triplet. In Tables~5 and~6, we show the formulas for the families of T-norms and T-conorms, respectively, where matching lines together with the standard negation $N(a) = 1-a$ form dual triplets. Note that, for some families, we limited the range of values for the parameter~$p$ (see Table~1) compared to more general definitions~[Klir_and_Yuan_1995].
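As a quick sanity check of this duality, the snippet below derives a T-conorm from its T-norm via $\perp(a,b) = 1 - \top(1-a, 1-b)$ and compares it numerically against the closed form from Table~6 for the Hamacher family; the parameter value, tolerance, and sample grid are arbitrary choices for illustration.

```python
# Numerically verify that the Hamacher T-norm and T-conorm (Tables 5 and 6)
# form a dual pair under the standard negation N(a) = 1 - a.
def hamacher_tnorm(a, b, p):
    return (a * b) / (p + (1 - p) * (a + b - a * b))

def hamacher_tconorm(a, b, p):
    return (a + b + (p - 2) * a * b) / (1 + (p - 1) * a * b)

def dual_of(tnorm, p):
    return lambda a, b: 1 - tnorm(1 - a, 1 - b, p)

p = 3.0
grid = [i / 10 for i in range(11)]
for a in grid:
    for b in grid:
        assert abs(dual_of(hamacher_tnorm, p)(a, b) - hamacher_tconorm(a, b, p)) < 1e-9
```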

C.1 T-conorm Plots

Figures~8 and~9 display the considered set of T-conorms.

Maximum & $\perp^M(a,b)$ & $\max(a,b)$
Probabilistic & $\perp^P(a,b)$ & $a+b-ab$
Einstein & $\perp^E(a,b)$ & $\perp^H_2(a,b)=\frac{a+b}{1+ab}$
Hamacher & $\perp^H_p(a,b)$ & $\frac{a+b+(p-2)\,ab}{1+(p-1)\,ab}$
Frank & $\perp^F_p(a,b)$ & $1-\log_p\!\left(1+\frac{(p^{1-a}-1)(p^{1-b}-1)}{p-1}\right)$
Yager & $\perp^Y_p(a,b)$ & $\min\!\left(1,\ \left(a^p+b^p\right)^{1/p}\right)$
Aczél-Alsina & $\perp^A_p(a,b)$ & $1-\exp\!\left(-\left(|\log(1-a)|^p+|\log(1-b)|^p\right)^{1/p}\right)$
Dombi & $\perp^D_p(a,b)$ & $\left(1+\left(\left(\frac{1-a}{a}\right)^{-p}+\left(\frac{1-b}{b}\right)^{-p}\right)^{-1/p}\right)^{-1}$
Schweizer-Sklar & $\perp^{SS}_p(a,b)$ & $1-\left((1-a)^p+(1-b)^p-1\right)^{1/p}$
Table 6: (Families of) T-conorms.
Figure 8: T-conorm plots (1/2). Note that `Average' is not a T-conorm and is included only for reference. Note also that `Probabilistic' is equal to `Hamacher $p=1$' and `Einstein' is equal to `Hamacher $p=2$'.
Figure 9: T-conorm plots (2/2).

Appendix D Additional Plots

See Figures 10 and 11.

Figure 10: Results for the teapot camera pose optimization task for the respective square-root distributions~$F_{\mathrm{sq}}$.
Figure 11: Shape optimization (left) and camera pose optimization (right) applied to a model of a chair. Top: set of original distributions~$F$. Bottom: set of the respective square-root distributions~$F_{\mathrm{sq}}$.
