1 Introduction
In recent years, many differentiable renderers have been published, including the seminal differentiable mesh renderer OpenDR [Loper2014], the Neural 3D Mesh Renderer [Kato2017], and SoftRas [Liu2019SoftRas],
among many others. Using a differentiable renderer enables a multitude of computer vision applications, such as human pose estimation
[bogo2016keep], camera intrinsics estimation [Palazzi2019], 3D shape optimization [Kato2017], 3D reconstruction [Kato2017, Liu2019SoftRas, Chen2019DIB], and 3D style transfer [Kato2017].

A fundamental difference between classes of differentiable renderers is the choice of the underlying 3D representation. In this work, we focus on differentiable 3D mesh renderers [Loper2014, Kato2017, Liu2019SoftRas, Chen2019DIB]; however, the aspects that we investigate could also be applied to other differentiable rendering concepts, such as rendering voxels [Yan2016], point clouds [insafutdinov18pointclouds], surfels [Yifan2019DiffSurfaceSplatting], signed distance functions [Jiang2020SDFDiff], and other implicit representations [liu2019learning, Sitzmann2019SceneRepresentations].
Differentiable mesh renderers can be constructed in different ways: either using an exact and hard renderer with approximate surrogate gradients, or using an approximate renderer with natural gradients. Loper et al. [Loper2014] and Kato et al. [Kato2017] produce approximate surrogate gradients for their differentiable renderers, while their forward rendering is hard. In contrast, other differentiable renderers approximate the forward rendering in such a way that they produce a natural gradient. This can be achieved by modeling or approximating a renderer under a probabilistic perturbation, which is continuous and makes the renderer differentiable. For that, Rhodin et al. [rhodin2015versatile] model the perturbation with a Gaussian distribution, while Liu et al. [Liu2019SoftRas] model it with the square root of a logistic distribution, Petersen et al. [petersen2021learning] use a logistic distribution, and Chen et al. [Chen2019DIB] use the exponential distribution. While this variational interpretation of perturbing by a respective distribution is not stressed in some of these papers [Liu2019SoftRas, Chen2019DIB], we believe it is important because it explicitly allows comparing the characteristics of the differentiable renderers. Moreover, the methods that only approximate gradients can also be seen as approximately modeling a perturbation: the gradient computed for the Neural 3D Mesh Renderer [Kato2017] approximately corresponds to a perturbation by a uniform distribution. Note that, here, the solutions for rendering under perturbations are obtained analytically in closed form, without sampling.

In this work, we introduce a generalized differentiable renderer (GenDR). By choosing an appropriate probability distribution, we can (at least approximately) recover the above differentiable mesh renderers, which shows that a core distinguishing aspect of differentiable renderers is the type of distribution that they model. The choice of probability distribution is directly linked to the sigmoid (i.e., S-shaped) function used for the rasterization. For example, a Heaviside sigmoid function corresponding to the Dirac delta distribution yields a conventional non-differentiable renderer, while a logistic sigmoid function of squared distances corresponds to the square root of a logistic distribution. Here, the sigmoid function is the cumulative distribution function (CDF) of the corresponding distribution. In this work, we select and present an array of distributions and examine their theoretical properties.
Another aspect of approximate differentiable renderers is their aggregation function, i.e., the function that aggregates the occupancy probabilities of all faces for each pixel. Existing differentiable renderers commonly aggregate the probabilities via the probabilistic sum ($a + b - ab$), which corresponds to the probability that at least one face covers the pixel, assuming that the probabilities for each triangle are stochastically independent (cf. Eq. 4 in [Liu2019SoftRas] or Eq. 6 in [Chen2019DIB]). In the field of real-valued logics and adjacent fields, this is well known as a t-conorm, a relaxed form of the logical 'or'. Two examples of other t-conorms are the maximum t-conorm ($\max(a, b)$) and the Einstein sum ($\frac{a+b}{1+ab}$), which models the relativistic addition of velocities. We generalize our differentiable renderer to use any continuous t-conorm and present a variety of suitable t-conorms.
In total, the set of resulting concrete instances arising from our generalized differentiable renderer and the proposed choices amounts to $1242$ concrete differentiable renderers. We extensively benchmark all of them on a shape optimization task and a camera pose estimation task. Further, we evaluate the best-performing and most interesting instances on the popular ShapeNet [Chang2015ShapeNet] 13-class single-view 3D reconstruction experiment [Kato2017]. Here, we also include those instances that approximate other existing differentiable renderers. We note that we do not introduce a new shading technique in this paper and rely on existing blended shaders instead.

We summarize our contributions as follows:

- We propose a generalized differentiable mesh renderer.
- We identify existing differentiable renderers (approximately) as instances of our generalized renderer.
- We propose a variety of suitable sigmoid functions and t-conorms and group them by their characteristics.
- We extensively benchmark the concrete differentiable renderers, analyze which characteristics and families of functions lead to good performance, and find that the best choice heavily depends on the task, class, or characteristics of the data.
2 Related Work
The related work can be classified into works that present differentiable renderers and works that apply them, although there is naturally a significant overlap. For additional details on differentiable rendering approaches, cf. the survey by Kato et al. [kato2020differentiable].

Analytical Differentiable Renderers. The first large category of differentiable renderers are those which produce approximate gradients in an analytical and sampling-free way. This can happen either via surrogate gradients during backpropagation, as in [Kato2017], or by making the forward computation naturally differentiable by perturbing the distances between pixels and triangles analytically in closed form [Liu2018, Chen2019DIB, petersen2019pix2vex]. Our work falls into this category, specifically the second case. Existing works each present their renderer for a specific distribution or sigmoid function. We formally characterize the functions necessary for a differentiable renderer and present an array of options.

Monte Carlo Differentiable Renderers. An alternative to analytical differentiable renderers are those based on Monte Carlo sampling techniques. The first example of this is the "redner" path tracer by Li et al. [Li2018], who use edge sampling to approximate the gradients of their renderer. Loubet et al. [Loubet2019ReparameterizingRendering] build on these ideas and reparameterize the involved discontinuous integrands, yielding improved gradient estimates. Zhang et al. [zhang2020path] extend these ideas by differentiating the full path integrals, which makes the method more efficient and effective. Lidec et al. [lidec2021differentiable] approach Monte Carlo differentiable rendering by estimating the gradients of a differentiable renderer via the perturbed optimizers method [berthet2020learning].
Applications. Popular applications for differentiable renderers are pose [Loper2014, Kato2017, Liu2019SoftRas, Chen2019DIB, Palazzi2019, lidec2021differentiable, ravi2020accelerating], shape [Kato2017, zhang2020path, petersen2019pix2vex, ravi2020accelerating], material [liu2017material, shi2020match], texture [Liu2019SoftRas, Chen2019DIB, Loubet2019ReparameterizingRendering], and lighting [zhang2020path] estimation. Here, the parameters of an initial scene are optimized to match the scene in a reference image or a set of reference images. Another interesting application is single-view 3D shape prediction without 3D supervision. Here, a neural network predicts a 3D representation from a single image, and the rendering of the prediction is compared to the original input image. This learning process is primarily guided by supervision of the object silhouette. It is possible to omit this supervision via adversarial style transfer [petersen2021style]. Other applications are generating new 3D shapes that match a data set [Henzler2018, Henderson_2020_CVPR] as well as adversarial examples in the real world [liu2019beyond].

In our experiments, we use optimization for pose and shape to benchmark all proposed differentiable renderer combinations. As single-view 3D mesh reconstruction is a complex experiment that requires training a neural network, we benchmark our method on this task only for a selected subset of differentiable renderers.
T-norms and T-conorms. T-norms and t-conorms (triangular norms and conorms) are binary functions that generalize the logical conjunction ('and') and disjunction ('or'), respectively, to real-valued logics or probability spaces [klement2013triangular, van2022analyzing]. A generalization of 'or' is necessary in a differentiable renderer to aggregate the occlusion caused by multiple faces. The existing analytical differentiable renderers all use the probabilistic t-conorm.
3 Generalized Differentiable Renderer
In this section, we present our generalized differentiable mesh renderer. By a differentiable renderer, we refer to a renderer that is continuous everywhere and differentiable almost everywhere (a.e.). Note that, in this context, continuity is the stricter criterion because formally (i) conventional renderers are already differentiable a.e. (which does not mean that they can provide any meaningful gradients), and (ii) almost all existing "differentiable" renderers are not actually differentiable everywhere.
Let us start by introducing how a classic hard rendering algorithm operates. The first step is to bring all objects into image space, which is typically a sequence of affine transformations followed by the camera projection. This step is already differentiable. The second step is the rasterization: For each pixel, we need to compute the set of faces (typically triangles) which cover it. If the pixel is covered by at least one face, the face that is closest to the camera is displayed.
3.1 Differentiable Occlusion Test
To make the test of whether a pixel is occluded by a face differentiable, we start by computing the signed Euclidean distance $d$ between the pixel and the face boundary. By convention, pixels inside the triangle have a positive distance and pixels outside the triangle a negative distance. For pixels exactly on the boundary, the distance to the face is $0$.
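The signed pixel-to-triangle distance described above can be sketched in a few lines. This is an illustrative, hypothetical implementation in 2D image space (the function names and point representation are ours, not the renderer's actual code):

```python
import math

def signed_distance(p, tri):
    """Signed Euclidean distance from point p to the boundary of triangle tri.

    Positive inside, negative outside, zero exactly on the boundary
    (the convention used in the text). `tri` is a list of three 2D vertices.
    """
    def seg_dist(p, a, b):
        # Unsigned distance from p to the segment a-b.
        ax, ay = b[0] - a[0], b[1] - a[1]
        t = ((p[0] - a[0]) * ax + (p[1] - a[1]) * ay) / (ax * ax + ay * ay)
        t = max(0.0, min(1.0, t))  # clamp projection onto the segment
        cx, cy = a[0] + t * ax, a[1] + t * ay
        return math.hypot(p[0] - cx, p[1] - cy)

    a, b, c = tri
    d = min(seg_dist(p, a, b), seg_dist(p, b, c), seg_dist(p, c, a))

    def cross(o, u, v):
        # z-component of the 2D cross product (u - o) x (v - o).
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])

    # p is inside iff it lies on the same side of all three edges.
    s1, s2, s3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    inside = (s1 >= 0 and s2 >= 0 and s3 >= 0) or (s1 <= 0 and s2 <= 0 and s3 <= 0)
    return d if inside else -d
```

For example, for the triangle with vertices (0, 0), (4, 0), (0, 4), the point (1, 1) is inside at distance 1 from the nearest edge, while (-1, 0) is outside at distance 1.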
For a hard occlusion test, we would simply check whether $d$ is non-negative. In a differentiable renderer, we instead introduce a perturbation in the form of a probability distribution with density $f$, together with a temperature or scale parameter $\tau > 0$. We then evaluate the probability that the perturbed distance is non-negative, i.e., that $d - \tau\epsilon \geq 0$, where $\epsilon$ is distributed according to $f$. Thus, we compute the probability that the face occludes the pixel as

(1)  $\Pr(d - \tau\epsilon \geq 0) = \Pr(\epsilon \leq d / \tau) = F(d / \tau)\,,$

where $F$ is the CDF of the distribution $f$ and thus yields a closed-form solution for the desired probability (provided that $F$ has a closed-form solution or can be appropriately approximated). In a differentiable renderer, we require $F$ to be continuous. Typically, $F$ has the S-shape of a sigmoid function, see Table 3.2. Therefore, we refer to CDFs as sigmoid functions in this paper.
Most existing differentiable renderers use sigmoid functions, or transformations thereof (see Section 4), to softly evaluate whether a pixel lies inside a triangle. This accords with the probabilistic interpretation in Equation (1), where the probability distribution is defined via the sigmoid function used in each case. The logistic sigmoid function is a popular choice of such a sigmoid function. Note that, recently, it has frequently been referred to as "the" sigmoid in the literature, which is not to be confused with the original and more general terminology.
Example 1 (Logistic Sigmoid).
$F(x) = \frac{1}{1 + e^{-x}}$ is the logistic sigmoid function, which corresponds to the logistic distribution.
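The soft occlusion test of Equation (1) with the logistic CDF reduces to a one-liner; a minimal sketch (the function name is ours):

```python
import math

def occlusion_probability(d, tau=1.0):
    """Soft occlusion test of Eq. (1) with the logistic CDF:
    F(d / tau) = 1 / (1 + exp(-d / tau)).

    d is the signed pixel-to-face distance, tau the temperature.
    """
    return 1.0 / (1.0 + math.exp(-d / tau))
```

A pixel on the boundary ($d = 0$) gets probability $0.5$; deep inside the face the probability approaches $1$, outside it approaches $0$, and as $\tau \to 0$ the Heaviside step of a hard renderer is recovered.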
3.2 Aggregation
The second step to be made differentiable is the aggregation of multiple faces. While this is conventionally done via a logical 'or', the differentiable real-valued counterpart is a t-conorm. T-conorms are formally defined as follows.
Definition 2 (T-conorm).
A t-conorm is a binary operation $\bot \colon [0, 1] \times [0, 1] \to [0, 1]$ which satisfies
- associativity: $\bot(a, \bot(b, c)) = \bot(\bot(a, b), c)$,
- commutativity: $\bot(a, b) = \bot(b, a)$,
- monotonicity: $a \leq c \,\wedge\, b \leq d \;\Rightarrow\; \bot(a, b) \leq \bot(c, d)$,
- $0$ is a neutral element: $\bot(a, 0) = a$.
Remark 3 (T-conorms and T-norms).
While t-conorms are the real-valued equivalents of the logical 'or', so-called t-norms are the real-valued equivalents of the logical 'and'. Certain t-conorms and t-norms are dual in the sense that one can derive one from the other using a complement (typically $x \mapsto 1 - x$) and De Morgan's laws ($\top(a, b) = 1 - \bot(1 - a, 1 - b)$).
Let us proceed by stating the t-conorm used in all applicable previous approximate differentiable renderers with natural gradients.
Example 4 (Probabilistic Sum).
The probabilistic sum is a t-conorm that corresponds to the probability that at least one of two independent events occurs. It is defined as

(2)  $\bot^{P}(a, b) = a + b - a \cdot b\,.$
An alternative to this is the Einstein sum, which is based on the relativistic addition of velocities.
Example 5 (Einstein Sum).
The Einstein sum is a t-conorm that corresponds to velocity addition under special relativity:

(3)  $\bot^{E}(a, b) = \frac{a + b}{1 + a \cdot b}\,.$
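Both t-conorms are one-liners; a minimal sketch (function names are ours):

```python
def prob_sum(a, b):
    # Probabilistic sum: P(at least one of two independent events occurs).
    return a + b - a * b

def einstein_sum(a, b):
    # Einstein sum: relativistic velocity addition, restricted to [0, 1].
    return (a + b) / (1 + a * b)
```

For instance, two faces each covering a pixel with probability 0.5 aggregate to 0.75 under the probabilistic sum and to 0.8 under the Einstein sum; both leave $0$ as a neutral element, as Definition 2 requires.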
Combining the above concepts, we can compute the occupancy or coverage of a pixel $p$ given a set of faces $T$ as

(4)  $O(p) = {\bot}_{t \in T} \, F\!\left( d(p, t) / \tau \right)\,.$
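The aggregation of Equation (4) can be sketched as a fold over the per-face probabilities; this is an illustrative implementation (names are ours), using the logistic CDF and the probabilistic sum as defaults:

```python
import functools
import math

def pixel_occupancy(distances, tau=0.1, tconorm=lambda a, b: a + b - a * b):
    """Occupancy O(p) of a pixel given its signed distances to all faces
    (Eq. (4)): per-face probabilities F(d / tau) are aggregated with a
    t-conorm (probabilistic sum by default)."""
    probs = (1.0 / (1.0 + math.exp(-d / tau)) for d in distances)
    # 0 is the neutral element of every t-conorm, so it is a valid initializer.
    return functools.reduce(tconorm, probs, 0.0)
```

With no faces the occupancy is 0; a single face far inside yields an occupancy close to 1, and additional covering faces can only increase it (monotonicity), never beyond 1.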
[Figure (fig:sigmoidtaxonomy): Taxonomy of probability distributions corresponding to sigmoid functions. The subdivisions are chosen wrt. properties that have a categorically different influence on the behavior of the corresponding renderer; the order of splits when going down in the tree (which could be chosen differently, e.g., symmetric/asymmetric could be the first split) reflects the importance of the properties. The recoverable branches are: finite support (exact: Dirac delta / Heaviside; continuous: uniform, cubic Hermite, Wigner semicircle) and infinite support (symmetrical with exponential convergence: Gaussian, Laplace, logistic, hyperbolic secant; further symmetrical and asymmetrical branches).]

[Table 3.2: Plots of the sigmoid functions (CDFs) and their derivatives (PDFs): Heaviside, uniform, cubic Hermite, Wigner semicircle, Gaussian, Laplace; logistic, logistic (squares), hyperbolic secant, Cauchy, Cauchy (squares), reciprocal; Gumbel-Max, Gumbel-Min, exponential, exponential (Rev.), Levy, Levy (Rev.); Gamma ($p=0.5$), Gamma ($p=1$), Gamma ($p=2$), and their reversed (Rev.) variants, each also in a (squares) version.]
3.3 Shading
The coloring of faces is handled via the Phong model or any other shading model, which is already differentiable. In the literature, Chen et al.~[Chen2019DIB] compare different choices. Finally, to aggregate the coloring of each pixel depending on the distance of the face to the camera (depth), there are two popular choices in the literature: no depth perturbations, taking the closest triangle (as in [Loper2014, Kato2017, Chen2019DIB]), and Gumbel depth perturbations (as in [Liu2019SoftRas, petersen2019pix2vex]). Only the latter choice is truly continuous, and the closed-form solution for Gumbel depth perturbations is the well-known softmin. As (i) there are no closed-form solutions for adequate alternatives to Gumbel perturbations in the literature, and (ii) these two options have been extensively studied~[lidec2021differentiable, Loper2014, Kato2017, Chen2019DIB, Liu2019SoftRas, petersen2019pix2vex], in this work we do not modify this component and focus on the differentiable silhouette computation and aggregation. While we implement both options in GenDR, we perform all experiments agnostic to the choice of shading aggregation, as the experiments rely solely on the silhouette.
4 Instantiations of the GenDR
Let us proceed by discussing instantiations of the generalized differentiable renderer (GenDR).
Table 1: Overview of selected t-conorms and their properties.
T-conorm & Symbol & Equal to / where & continuous & contin. diff. & strict & idempotent & nilpotent & Archimedean & $↑/↓$ wrt. $p$
(Logical 'or') & $∨$ & & (✗) & (✗) & & (✓) & & &
Maximum & $⊥^M$ & & ✓ & ✗ & ✗ & ✓ & ✗ & ✗ &
Probabilistic & $⊥^P$ & $= ⊥^H_1 = ⊥^A_1$ & ✓ & ✓ & ✓ & ✗ & ✗ & ✓ &
Einstein & $⊥^E$ & $= ⊥^H_0$ & ✓ & ✓ & ✓ & ✗ & ✗ & ✓ &
Hamacher & $⊥^H_p$ & $p ∈ (0, ∞)$ & ✓ & ✓ & ✓ & ✗ & ✗ & ✓ & $↓$
Frank & $⊥^F_p$ & $p ∈ (0, ∞)$ & ✓ & ✓ & ✓ & ✗ & ✗ & ✓ & $↓$
Yager & $⊥^Y_p$ & $p ∈ (0, ∞)$ & ✓ & ✗ & ✗ & ✗ & ✓ & ✓ & $↑$
Aczél-Alsina & $⊥^A_p$ & $p ∈ (0, ∞)$ & ✓ & ✓ & ✓ & ✗ & ✗ & ✓ & $↑$
Dombi & $⊥^D_p$ & $p ∈ (0, ∞)$ & ✓ & ✓ & ✓ & ✗ & ✗ & ✓ & $↑$
Schweizer-Sklar & $⊥^{SS}_p$ & $p ∈ (-∞, 0)$ & ✓ & ✓ & ✓ & ✗ & ✗ & ✓ &
[Figure caption fragment: ... $⊥^A_{0.5}$. For the last two rows, a Cauchy distribution with $⊥^P$ and $⊥^Y_2$ is used.]

Distributions. Figure LABEL:fig:sigmoidtaxonomy provides a taxonomy of the distributions and sigmoid functions that are visualized in Table~3.2. We classify the distributions into those with finite support and those with infinite support, where the support is the set of points for which the PDF is greater than zero. Note that the CDFs are constant outside the support region. Among the distributions with finite support, there is the exact Dirac delta distribution corresponding to the Heaviside function, which yields a discrete, non-differentiable renderer. There are also continuous distributions allowing meaningful gradients, but (due to finite support) only in a limited proximity to each face. Here, we have, among others, the uniform distribution, which corresponds to a piecewise linear step function. The derivative of the uniform distribution is equivalent or very similar (due to minor implementation aspects) to the surrogate gradient of the Neural 3D Mesh Renderer~[Kato2017]. The distributions with infinite support can be categorized into symmetrical and asymmetrical ones. Among the symmetrical distributions, the Gaussian, the Laplace, the logistic, and the hyperbolic secant have an exponential convergence behavior, i.e., an exponential decay of probability density. On the other hand, the Cauchy distribution has a linear convergence, which yields a significantly different behavior. We also include the algebraic function $x \mapsto x / (2 + 2|x|) + 1/2$ and call it the reciprocal sigmoid; it likewise has a linear convergence. Finally, we consider asymmetrical distributions with infinite support. The Gumbel-Max and Gumbel-Min are extreme value distributions~[Coles2001] and two-sided, which means that their support covers both positive and negative arguments. The exponential, Gamma, and Levy distributions are one-sided.
Here, it is important to consider not only the original distributions but also their mirrored or reversed variants, as well as shifted variants, as can be seen in the last three rows of Table~3.2.
SoftRas~[Liu2019SoftRas] squares the absolute part of the distance before applying the logistic sigmoid function and thus models the square roots of logistic perturbations. Instead of modifying the argument of $F$, we interpret this as applying a transformed counterpart CDF $F_{\mathrm{sq}}$, which is more in line with the probabilistic interpretation in Equation (1). More precisely, we compute the occlusion probability as

(5)  $F_{\mathrm{sq}}(x) = F\!\left(\operatorname{sign}(x) \cdot x^2\right)\,.$

That means that for each choice of $F$, we obtain a counterpart $F_{\mathrm{sq}}$. A selection of these for different CDFs $F$ is visualized in Table~3.2, denoted by ``(squares)''. For a mathematical definition of each sigmoid function, see SM~B.
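The counterpart $F_{\mathrm{sq}}$ can be sketched as follows, keeping the sign of the scaled distance while squaring its magnitude (an illustrative implementation with a logistic default; names are ours):

```python
import math

def occlusion_probability_sq(d, tau=1.0,
                             F=lambda x: 1.0 / (1.0 + math.exp(-x))):
    """Occlusion probability under the squared counterpart F_sq of a CDF F:
    the sign of the scaled distance d / tau is kept, but its magnitude is
    squared before applying F, as SoftRas does for the logistic sigmoid."""
    x = d / tau
    return F(math.copysign(x * x, x))
```

On the boundary ($d = 0$) the probability is still $0.5$ for a symmetric $F$, and the antisymmetry $F_{\mathrm{sq}}(x) + F_{\mathrm{sq}}(-x) = 1$ of the logistic case is preserved.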
Aggregations. Table~1 provides an overview of selected t-conorms and displays their properties. The logical 'or' is not a t-conorm but its discrete and discontinuous equivalent, which is why we include it here. While there are also discontinuous t-conorms, such as the drastic t-conorm, these are naturally not suitable for a differentiable renderer, which is why we exclude them. All except the maximum and Yager t-conorms are continuously differentiable.
The top four rows in Table~1 contain individual t-conorms, and the remainder are families of t-conorms. Here, we selected only suitable ranges for the parameter~$p$. Note that there are some cases in which the t-conorms coincide, e.g., $⊥^P = ⊥^H_1 = ⊥^A_1$. A discussion of the remaining properties and a mathematical definition of each t-conorm can be found in SM~C. Figure~3 displays some of the t-conorms and illustrates their different properties. In Figure~4, we display example renderings with different settings and provide a visual comparison of how the aggregation function affects rendering.
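Two of the parametric families can be sketched as follows. This is an illustrative implementation (names are ours); for the Hamacher family we assume the shifted parameterization in which $p = 1$ recovers the probabilistic sum and $p = 0$ the Einstein sum, consistent with the coincidences $⊥^P = ⊥^H_1$ and $⊥^E = ⊥^H_0$ stated in Table 1:

```python
def hamacher_sum(a, b, p=1.0):
    # Hamacher family (assumed parameterization): p = 1 gives the
    # probabilistic sum a + b - ab, p = 0 gives the Einstein sum.
    return (a + b - p * a * b) / (1 + (1 - p) * a * b)

def yager_sum(a, b, p=2.0):
    # Yager family: min(1, (a^p + b^p)^(1/p)). For p = 2 it saturates
    # at 1 ("plateau") outside the unit disc a^2 + b^2 >= 1.
    return min(1.0, (a ** p + b ** p) ** (1.0 / p))
```

The Yager plateau is visible directly: any pair with $a^2 + b^2 \geq 1$ maps to exactly $1$, which is why its gradient vanishes there.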
Table 2: Existing differentiable renderers as special cases of GenDR.
Renderer & Distribution & T-conorm
OpenDR~[Loper2014] & Uniform (backward) & —
N3MR~[Kato2017] & Uniform (backward) & —
Rhodin et al.~[rhodin2015versatile] & Gaussian & $⊥^P$
SoftRas~[Liu2019SoftRas] & Square root of Logistic & $⊥^P$
Log. Relax~[petersen2021learning] & Logistic & $⊥^P$
DIB-R~[Chen2019DIB] & Exponential & $⊥^P$
Existing Special Cases of GenDR. In Table~2, we list existing differentiable renderers that are conceptually instances of GenDR. These renderers each have some further differences, but one key difference lies in the type of distribution employed. Differences regarding shading are also discussed at the end of Section~3.
5 Experiments¹
¹ The source code will be available at github.com/FelixPetersen/gendr.
5.1 Shape Optimization
Our first experiment is a shape optimization task. Here, we use the mesh of an airplane and render it from $24$ azimuths using a hard renderer. The task is to optimize a mesh (initialized as a sphere) to fit the silhouette of the airplane within $100$ optimization steps. Limiting the task to $100$ optimization steps is critical for two reasons: (i) the task can be considered solved perfectly by any differentiable renderer that produces the correct gradient sign given a large number of steps, but we are interested in the quality of the gradients and in how efficient each renderer is for the optimization task; (ii) the total evaluation is computationally expensive because we evaluate a total of $1242$ renderers and perform a grid search over the distribution parameters for each one to provide a fair and reliable comparison.
Setup. For optimization, we use the Adam optimizer~[Kingma2014AdamOpt] with parameters $β_1 = 0.5$, $β_2 = 0.95$. For each setting, we perform a grid search over three learning rates ($λ ∈ \{10^{-1.25}, 10^{-1.5}, 10^{-1.75}\}$) and temperatures $τ ∈ \{10^{0.1 \cdot n} \mid n ∈ \mathbb{N},\, 0 ≤ n ≤ 80\}$. Here, $λ = 10^{-1.5} ≈ 0.03$ performs best in almost all cases. As for the scale hyperparameter, it is important to use a fine-grained as well as large grid because it behaves differently for each distribution. Here, we intentionally chose the grid larger than the range of reasonable values to ensure that the best choice is used for each setting; the extreme values for the scale were never optimal. We perform this evaluation from five different elevation angles $\{-60^∘, -30^∘, 0^∘, 30^∘, 60^∘\}$ as independent runs and average the final results for each renderer instance. Additional results for the experiment applied to the model of a chair can be found in SM~D.

Results. In Figure~5, we display the results of our evaluation. We can observe that the regular distributions $F$ typically perform better than their counterparts $F_{sq}$, except for the Cauchy and reciprocal sigmoids, which are those with a linear convergence rate. We explain this by the fact that, by squaring the distance before applying the sigmoid function, the function attains a quadratic convergence rate instead. As the linearly converging functions also perform poorly in comparison to the exponentially converging functions (Gaussian, Laplace, logistic, Gudermannian), we conclude that linear convergence is inferior to quadratic and exponential convergence. Columns 1–3 contain the distributions with finite support, and these do not perform very well on this task. The block of exponentially decaying distributions (columns 4–7) performs well. The block of linearly decaying distributions (columns 8–9) performs badly, as discussed above. The block of Levy distributions (last 4 columns) performs even worse because it has an even slower convergence; it becomes slightly better in the squared setting but still exhibits worse performance than linear convergence.
Comparison of Distributions.
Gumbel, exponential, and Gamma distributions do not all perform equally well, but Gumbel-Min, the reversed exponential, and the reversed Gamma are all competitive. Confer Table~3.2, where it becomes clear that this is because Gumbel-Max, exponential, and Gamma have all of their mass inside the triangle, i.e., they yield smaller faces. This is problematic because it can cause gaps between neighboring triangles, which hinders optimization. As the reversed counterparts yield larger faces and do not suffer from this problem, they perform better. Note that, in this respect, the asymmetrical distributions have an advantage over the symmetrical distributions because symmetrical distributions always have an accumulated density of $0.5$ at the edge, and thus the size of the face stays the same. We can see that, among the asymmetrical distributions, Gamma performs best.

Comparison of T-conorms. We find that $⊥^M$ and ``average'' (which is not a t-conorm but was used as a baseline in~[Liu2019SoftRas]) perform poorly. Also, $⊥^Y_4$, $⊥^A_2$, $⊥^A_4$, $⊥^D_2$, $⊥^D_4$, $⊥^{SS}_{-2}$, and $⊥^{SS}_{-4}$ perform poorly overall. This can be explained as they are rather extreme members of their respective t-conorm families; in all of them, the $p$-th power is involved, which can become a problematic component, e.g., $x^4$ is vanishingly small for $x = 0.5$. Interestingly, the Gamma and exponential distributions still perform well with these, likely since they are not symmetric and have an accumulated probability of $1$ on the edge. Notably, the Yager t-conorm ($p=2$) performs very well, despite having a plateau and thus no meaningful gradient outside the unit disc, see Figure~3.
Finally, we compute histograms of how many times each distribution and each t-conorm is involved in the best $10\%$ of overall results (independently for the top and bottom plots). We observe that Gamma ($p=0.5$, reversed) performs best overall because it is most robust to the choice of t-conorm. Among the t-conorms, we find that $⊥^Y_2$ and $⊥^D_{0.5}$ perform best. The probabilistic and Einstein sums perform equally well and share the next place.
Table 3: Single-view 3D reconstruction results on ShapeNet (per class and mean).
Method & Airplane & Bench & Dresser & Car & Chair & Display & Lamp & Speaker & Rifle & Sofa & Table & Phone & Vessel & Mean
Kato et al. [Kato2017] N3MR (Uniform Backward) & 0.6172 & 0.4998 & 0.7143 & 0.7095 & 0.4990 & 0.5831 & 0.4126 & 0.6536 & 0.6322 & 0.6735 & 0.4829 & 0.7777 & 0.5645 & 0.6015 
Liu et al. [Liu2019SoftRas] SoftRas (Square root of Logistic) & 0.6419 & 0.5080 & 0.7116 & 0.7697 & 0.5270 & 0.6156 & 0.4628 & 0.6654 & 0.6811 & 0.6878 & 0.4487 & 0.7895 & 0.5953 & 0.6234 
Chen et al.~[Chen2019DIB] DIB-R (Exponential) & 0.570 & 0.498 & 0.763 & 0.788 & 0.527 & 0.588 & 0.403 & 0.726 & 0.561 & 0.677 & 0.508 & 0.743 & 0.609 & 0.612 
Probabilistic + Uniform~~($≈$[Kato2017, Loper2014]) & 0.6456 & 0.4855 & 0.7113 & #1 & 0.5276 & 0.6126 & 0.4611 & 0.6651 & #1 & 0.6835 & 0.4514 & 0.8148 & 0.5971 & #1 
Probabilistic + Logistic~~($=$[petersen2021learning]) & 0.6396 & 0.5005 & 0.7105 & 0.7471 & 0.5288 & 0.6022 & 0.4586 & 0.6639 & 0.6742 & 0.6660 & 0.4666 & 0.7771 & 0.5980 & 0.6179 
Probabilistic + Logistic (squares)~~($=$[Liu2019SoftRas]) & 0.6416 & 0.4966 & #1 & 0.7386 & 0.5224 & #1 & 0.4550 & #1 & #1 & 0.6818 & 0.4529 & #1 & 0.5984 & 0.6217 
Probabilistic + Exponential (R)~~($=$[Chen2019DIB]) & 0.6321 & 0.4857 & 0.7123 & 0.7298 & 0.5178 & 0.5983 & 0.4611 & 0.6642 & 0.6713 & 0.6546 & 0.4700 & 0.7717 & #1 & 0.6130 
Probabilistic + Gaussian~~($≈$[rhodin2015versatile]) & 0.5922 & #1 & 0.7104 & 0.7561 & #1 & 0.6080 & 0.4399 & #1 & 0.6533 & #1 & #1 & 0.7301 & 0.5894 & 0.6125 
Probabilistic + Gamma (R) & #1 & 0.4842 & 0.7093 & 0.7220 & 0.5159 & 0.6033 & #1 & 0.6626 & 0.6719 & 0.6505 & 0.4642 & 0.7778 & 0.5978 & 0.6133 
Einstein + Gamma (R, squares) & 0.6438 & 0.4816 & #1 & 0.7284 & 0.5170 & 0.6111 & 0.4654 & 0.6647 & 0.6760 & 0.6546 & 0.4626 & #1 & 0.5973 & 0.6184 
Yager (p=2) + Cauchy (squares) & 0.6380 & #1 & 0.7047 & 0.7359 & 0.5188 & 0.5976 & 0.4617 & 0.6612 & 0.6726 & 0.6619 & 0.4819 & 0.7560 & #1 & 0.6149 
5.2 Camera Pose Optimization
In our second experiment, the goal is to find the camera pose for a model of a teapot from a reference image. The pose is randomly modified by an angle uniformly drawn from $[15^∘, 75^∘]$, and the distance and camera view angle are also randomized. We sample $600$ pairs of a reference image and an initialization and use this set of settings for each method. For optimization, we use Adam with a learning rate of either $0.1$ or $0.3$ (via grid search) and optimize for $1000$ steps. During the optimization, we transition from an initial scale of $σ = 10^{-1}$ logarithmically to a final value of $σ = 10^{-7}$. This allows us to avoid a grid search for the optimal scale, and it makes sense because an initially large $σ$ is beneficial for pose optimization: a smoother model has a higher probability of finding the correct orientation of the object. This contrasts with the setting of shape estimation, where this would be fatal because the vertices would collapse to the center.
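The logarithmic transition of the scale can be sketched as linear interpolation in log-space; an illustrative helper (name and defaults are ours, matching the $10^{-1} \to 10^{-7}$ schedule over $1000$ steps described above):

```python
import math

def sigma_schedule(step, total_steps=1000, sigma_start=1e-1, sigma_end=1e-7):
    """Logarithmic transition of the scale sigma: linear interpolation
    between log10(sigma_start) and log10(sigma_end) over all steps."""
    t = step / (total_steps - 1)  # t goes from 0 to 1
    return 10 ** ((1 - t) * math.log10(sigma_start) + t * math.log10(sigma_end))
```

The schedule starts exactly at $10^{-1}$, ends at $10^{-7}$, and decreases monotonically in between, so early steps see a smooth renderer and late steps an almost hard one.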
Results. In Figure~6, we display the results of this experiment. A corresponding image for the counterpart distributions $F_{sq}$, as well as results for the experiment applied to the model of a chair, can be found in SM~D. The metric is the fraction of runs that match the ground-truth pose up to $3^∘$. We find that the results are similar to those of the shape optimization experiment. Note that there are larger yellow areas because the color map ranges from $0\%$ to $90\%$, while the color map in the shape optimization plot spans a rather narrow loss range.
5.3 SingleView 3D Reconstruction
Setup. Finally, we reproduce the popular ShapeNet single-view 3D reconstruction experiment from~[Kato2017, Liu2019SoftRas, Chen2019DIB, petersen2021learning]. We select three t-conorms ($⊥^P$, $⊥^E$, $⊥^Y_2$) and $10$ distributions (uniform, Gaussian, logistic, logistic (squares), Cauchy, Cauchy (squares), Gumbel-Min, Gamma (R, $p=0.5$), Gamma (R, $p=0.5$, squares), and exponential (R)). These have been selected because they have been used in previous works, are notable (Cauchy, Gumbel-Min, Einstein), or performed especially well in the airplane shape optimization experiment (Gamma, Yager). For each setting, we perform a grid search over $τ$ at resolution $10^{0.5}$. Further experimental details can be found in SM~A.
Results. In Figure~7, we display and discuss the class-wise results for all $30$ selected renderers. In Table~3, we show the (self-)reported results for existing differentiable renderers in the top block. In the bottom block, we display our results for the methods that are equivalent ($=$) or very similar ($≈$) to the six existing differentiable renderers. The differences for equivalent methods can be explained by small variations in the setting and minor implementation differences. Additionally, we include three noteworthy alternative renderers, such as the one that performed best on the prior airplane shape optimization task. We conclude that the optimal choice of renderer heavily depends on the characteristics of the 3D models and the task. Surprisingly, we find that the simple uniform method achieves consistently good results and the best average score.
6 Discussion and Conclusion
In this work, we generalized differentiable mesh renderers and explored a large space of instantiations of our generalized renderer GenDR. We found that there are significant differences between different distributions for the occlusion test but also between different Tconorms for the aggregation. In our experiments, we observed that the choice of renderer has a large impact on the kind of models that can be rendered most effectively. We find that the uniform distribution outperforms the other tested distributions on average, which is surprising considering it simplicity. Remarkably, the uniform distribution had already been used implicitly for the early surrogate gradient renderers but was later discarded for the approximate differentiable renderers.
Acknowledgments. This work was supported by the DFG in the Cluster of Excellence EXC 2117 (ProjectID 390829875) and the SFB Transregio 161 (ProjectID 251654672), and the Land Salzburg within the WISS 2025 project IDALab (20102F1901166KZP and 20204WISS/225/1972019).
Appendix A Implementation Details
For the singleview 3D reconstruction experiment, we closely orient ourselves on the setup by Liu~et al. ~[Liu2019SoftRas]. We use the same model architecture~[Liu2019SoftRas] and also train with a batch size of $64$ for $250 000$ steps using the Adam optimizer~[Kingma2014AdamOpt] We also schedule the learning rate to $10^4$ for the first $150 000$ steps and use a learning rate of $3⋅10^5$ for the remaining training. At this point (after the first $150 000$ steps), we also decrease the temperature $τ$ by a factor of $0.3$.
Using different learning rates (as an ablation) did not improve the results.
Appendix B Distributions
In this section, we define each of the presented distributions / sigmoid functions. Figure~4 displays the respective CDFs and PDFs.
Note that, for each distribution, the PDFs $f$ is defined as the derivative of the CDF $F$.
Also, note that a reversed (Rev.) CDF is defined as $F_Rev.(x) = 1F(x)$, which means that $F_Rev.= F$ for symmetric distributions.
The squareroot distribution~$F_sq$ is defined in terms of $F$ as in Equation (5).
Therefore, in the following, we will define the distributions via their CDFs~$F$.
#1
(6) 
#1
(7) 
#1
(8) 
where $y:=(x+1)/2$.
#1
(9) 
#1
(10) 
#1
(11) 
#1
(12) 
#1
(13) 
#1
(14) 
#1
(15) 
#1
(16) 
#1
(17) 
#1
(18) 
#1
(20) 
where $γ(p,x)$ is the lower incomplete gamma function and $p>0$ is the shape parameter.
& & & & & 
& & & & & 
Heaviside & Uniform & Cubic Hermite & Wigner Semicircle & Gaussian & Gaussian (sq.) 
& & & & & 
& & & & & 
Laplace & Laplace (sq.) & Logistic & Logistic (sq.) & Hyperbolic secant & Hyperbolic secant (sq.) 
& & & & & 
& & & & & 
Cauchy & Cauchy (sq.) & Reciprocal & Reciprocal (sq.) & GumbelMax & GumbelMin 
& & & & & 
& & & & & 
Exponential & Exponential (sq.) & Exponential (Rev.) & Levy & Levy (sq.) & Levy (Rev.) 
& & & & & 
& & & & & 
Gamma ($p=0.5$) & Gamma ($p=1$) & Gamma ($p=2$) & Gamma ($p=.5$, R.) & Gamma ($p=1$, R.) & Gamma ($p=2$, R.) 
& & & & & 
& & & & & 
Gamma ($p=0.5$, sq.) & Gamma ($p=1$, sq.) & Gamma ($p=2$, sq.) & Gamma ($p=.5$, R., sq.) & Gamma ($p=1$, R., sq.) & Gamma ($p=2$, R., sq.) 
~
~
Appendix C TNorms and TConorms
The axiomatic approach to multivalued logics (which we need to combine the occlusions by different faces in a ``soft'' manner) is based on defining reasonable properties for truth functions. We stated the axioms for multivalued generalizations of the disjunction (logical ``or''), called Tconorms, in Definition~2. Here we complement this with the axioms for multivalued generalizations of the conjunction (logical ``and''), which are called Tnorms.
Definition 6 (Tnorm).
A Tnorm (triangular norm) is a binary operation~$⊤: [0,1] ×[0,1] →[0,1]$, which satisfies

associativity: $⊤(a, ⊤(b,c)) = ⊤(⊤(a,b), c)$,

commutativity: $⊤(a,b) = ⊤(b,a)$,

monotonicity: $(a ≤c) ∧(b ≤d) ⇒⊤(a,b) ≤⊤(c,d)$,

$1$ is a neutral element: $⊤(a,1) = a$.
Clearly these axioms ensure that the corners of the unit square, that is, the value pairs considered in classical logic, are processed as with a standard conjunction: neutral element and commutativity imply that $(1,1) ↦1$, $(0,1) ↦0$, $(1,0) ↦0$. From one of the latter two and monotonicity it follows $(0,0) ↦0$. Analogously, the axioms of Tconorms ensure that the corners of the unit square are processed as with a standard disjunction. Actually, the axioms already fix the values not only at the corners, but on the boundaries of the unit square. Only inside the unit square (that is, for $(0,1)^2$) Tnorms (as well as Tconorms) can differ.
Minimum & $⊤^M(a,b)$ & $min(a,b)$ 
Probabilistic & $⊤^P(a,b)$ & $ab$ 
Einstein & $⊤^E(a,b)$ & $ab2ab+ab$ 
Hamacher & $⊤^H_p(a,b)$ & $abp+(1p)(a+bab)$ 
Frank & $⊤^F_p(a,b)$ & $log_p(1+(p^a1)(p^b1)p1)$ 
Yager & $⊤^Y_p(a,b)$ & $max(0, 1((1a)^p+(1b)^p)^1p)$ 
AczélAlsina & $⊤^A_p(a,b)$ & $exp((log(a)^p+log(b)^p )^1p)$ 
Dombi & $⊤^D_p(a,b)$ & $(1+( (1aa)^p +(1bb)^p )^1p)^1$ 
SchweizerSklar & $⊤^S_p(a,b)$ & $(a^p+b^p1)^1p$ 
In the theory of multivalued logics, and especially in fuzzy logic [Klir_and_Yuan_1995], it was established that the largest possible Tnorm is the minimum and the smallest possible Tconorm is the maximum: for any Tnorm~$⊤$ it is $⊤(a,b) ≤min(a,b)$ and for any Tconorm~$⊥$ it is $⊥(a,b) ≥max(a,b)$. The other extremes, that is, the smallest possible Tnorm and the largest possible Tconorm are the socalled drastic Tnorm, defined as $⊤^∘(a,b) = 0$ for $(a,b) ∈(0,1)^2$, and the drastic Tconorm, defined as $⊥^∘(a,b) = 1$ for $(a,b) ∈(0,1)^2$. Hence it is $⊤(a,b) ≥⊤^∘(a,b)$ for any Tnorm~$⊤$ and $⊥(a,b) ≤⊥^∘(a,b)$ for any Tconorm~$⊥$. We do not consider the drastic Tconorm for an occlusion test, because it clearly does not yield useful gradients.
As already mentioned in the paper, it is common to combine a Tnorm~$⊤$, a Tconorm~$⊥$ and a negation~$N$ (or complement, most commonly $N(a) = 1a$) so that DeMorgan's laws hold. Such a triplet is often called a dual triplet. In Tables~5 and~6 we show the formulas for the families of Tnorms and Tconorms, respectively, where matching lines together with the standard negation $N(a) = 1a$ form dual triplets. Note that, for some families, we limited the range of values for the parameter~$p$ (see Table~1) compared to more general definitions~[Klir_and_Yuan_1995].
c.1 Tconorm Plots
Maximum & $⊥^M(a,b)$ & $max(a,b)$ 
Probabilistic & $⊥^P(a,b)$ & $a+bab$ 
Einstein & $⊥^E(a,b)$ & $⊥^H_2(a,b)=a+b1+ab$ 
Hamacher & $⊥^H_p(a,b)$ & $a+b+(p2)ab1+(p1)ab$ 
Frank & $⊥^F_p(a,b)$ & $1log_p(1+(p^1a1)(p^1b1)p1)$ 
Yager & $⊥^Y_p(a,b)$ & $min(1, (a^p+b^p)^1p)$ 
AczélAlsina & $⊥^A_p(a,b)$ & $1 exp((log(1a)^p+log(1b)^p )^1p)$ 
Dombi & $⊥^D_p(a,b)$ & $(1+( (1aa)^p +(1bb)^p )^1p)^1$ 
SchweizerSklar & $⊥^S_p(a,b)$ & $1((1a)^p+(1b)^p1)^1p$ 
Appendix D Additional Plots
4 Instantiations of the GenDR
Let us proceed by discussing instantiations of the generalized differentiable renderer (GenDR).
T-conorm & & equal to / where & continuous & contin.~diff. & strict & idempotent & nilpotent & Archimedean & $↑/↓$~wrt.~$p$ 
(Logical `or') & $∨$ & & (✗) & (✗) &  & (✓) &  &  &  
Maximum & $⊥^M$ & & ✓& ✗& ✗& ✓& ✗& ✗&  
Probabilistic & $⊥^P$ & $=⊥^H_1 =⊥^A_1$ & ✓& ✓& ✓& ✗& ✗& ✓&  
Einstein & $⊥^E$ & $=⊥^H_0$ & ✓& ✓& ✓& ✗& ✗& ✓&  
Hamacher & $⊥^H_p$ & $p∈(0, ∞)$ & ✓& ✓& ✓& ✗& ✗& ✓& $↓$ 
Frank & $⊥^F_p$ & $p∈(0, ∞)$ & ✓& ✓& ✓& ✗& ✗& ✓& $↓$ 
Yager & $⊥^Y_p$ & $p∈(0, ∞)$ & ✓& ✗& ✗& ✗& ✓& ✓& $↑$ 
Aczél-Alsina & $⊥^A_p$ & $p∈(0, ∞)$ & ✓& ✓& ✓& ✗& ✗& ✓& $↑$ 
Dombi & $⊥^D_p$ & $p∈(0, ∞)$ & ✓& ✓& ✓& ✗& ✗& ✓& $↑$ 
Schweizer-Sklar & $⊥^SS_p$& $p∈(-∞, 0)$ & ✓& ✓& ✓& ✗& ✗& ✓&  
(Figure caption, fragment) …$⊥^A_{0.5}$. For the last two rows, we use a Cauchy distribution and use $⊥^P$ and $⊥^Y_2$.

Distributions. Figure~LABEL:fig:sigmoidtaxonomy provides a taxonomy of the distributions and sigmoid functions that are visualized in Table~3.2. We classify the distributions into those with finite support and those with infinite support, where the support is the set of points for which the PDF is greater than zero. Note that the CDFs are constant outside the support region. Among the distributions with finite support, there is the exact Dirac delta distribution corresponding to the Heaviside function, which yields a discrete, non-differentiable renderer. There are also continuous distributions allowing meaningful gradients, but (due to finite support) only in a limited proximity to each face. Here, we have, among others, the uniform distribution, which corresponds to a piecewise linear step function. The derivative of the uniform distribution is equivalent or very similar (due to minor implementation aspects) to the surrogate gradient of the Neural 3D Mesh Renderer~[Kato2017].

The distributions with infinite support can be categorized into symmetrical and asymmetrical. Among the symmetrical distributions, the Gaussian, the Laplace, the logistic, and the hyperbolic secant have an exponential convergence behavior, i.e., an exponential decay of probability density. On the other hand, there is also the Cauchy distribution, which has only a linear convergence; this yields a significantly different behavior. We include the algebraic function~$x ↦ x / (2 + 2|x|) + 1/2$ and call it the reciprocal sigmoid; it also converges linearly. Finally, we consider asymmetrical distributions with infinite support. The Gumbel-Max and Gumbel-Min are extreme value distributions~[Coles2001] and two-sided, which means that their support covers both positive and negative arguments. The exponential, Gamma, and Lévy distributions are one-sided distributions.
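To make the convergence (tail-decay) classes concrete, the following sketch implements a few of the named sigmoids in standard parameterizations (an illustration, not the paper's implementation; the scale/temperature parameter is omitted):

```python
import math

def logistic(x):
    # Logistic CDF: exponential tail decay.
    return 1.0 / (1.0 + math.exp(-x))

def cauchy(x):
    # Cauchy CDF: heavy tails, only linear (algebraic) convergence to 0/1.
    return 0.5 + math.atan(x) / math.pi

def reciprocal(x):
    # "Reciprocal sigmoid" x / (2 + 2|x|) + 1/2; also linear convergence.
    return x / (2.0 + 2.0 * abs(x)) + 0.5

def uniform(x):
    # Uniform distribution on [-1, 1]: finite support, piecewise-linear CDF.
    return min(1.0, max(0.0, (x + 1.0) / 2.0))

# Tail comparison at x = 10: the exponentially converging sigmoid is much
# closer to 1 than the linearly converging ones.
print(1 - logistic(10))   # ~4.5e-05
print(1 - cauchy(10))     # ~3.2e-02
print(1 - reciprocal(10)) # ~4.5e-02
print(1 - uniform(10))    # 0.0 (finite support)
```

The printed gaps illustrate why the text groups these functions by convergence rate rather than by shape alone.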
Here, it is important not only to consider the original distributions but also their mirrored or reversed variants, as well as shifted variants, as can be seen in the last three rows of Table~3.2.
SoftRas~[Liu2019SoftRas] squares the absolute value of the distance before applying the logistic sigmoid function and thus models the square roots of logistic perturbations. Instead of modifying the argument of~$F$, we interpret this as applying a transformed counterpart CDF~$F_{sq}$, which is more in line with the probabilistic interpretation in Equation~(1). More precisely, we compute the occlusion probability as
$F_{sq}(t) = F\big(\operatorname{sign}(t) \cdot t^2\big)$.   (5)
That means that for each choice of~$F$, we obtain a counterpart~$F_sq$. A selection of these for different CDFs~$F$ is visualized in Table~3.2 denoted by ``(squares)''. For a mathematical definition of each sigmoid function, see SM~B.
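A minimal sketch of this transformation (assuming, as in Equation (5), that the squared distance keeps its sign before being passed to $F$; the logistic CDF is an illustrative choice of $F$):

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def sq_counterpart(F):
    # Counterpart CDF F_sq: apply F to the sign-preserving square of the
    # signed distance, as SoftRas does with the logistic sigmoid.
    return lambda t: F(math.copysign(t * t, t))

logistic_sq = sq_counterpart(logistic)
# For |t| < 1 the squared argument is smaller in magnitude, so F_sq is
# closer to 1/2 (flatter) near the edge than F itself.
print(logistic_sq(0.5), logistic(0.5))
```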
Aggregations. Table~1 provides an overview of selected T-conorms and displays their properties. The logical `or' is not a T-conorm but its discrete and discontinuous equivalent, which is why we include it here. While there are also discontinuous T-conorms, such as the drastic T-conorm, these are naturally not suitable for a differentiable renderer, which is why we exclude them. All except for the Max and Yager T-conorms are continuously differentiable.
The top four rows in Table~1 contain individual T-conorms, and the remainder are families of T-conorms. Here, we selected only suitable ranges for the parameter~$p$. Note that there are some cases in which the T-conorms coincide, e.g., $⊥^P=⊥^H_1=⊥^A_1$. A discussion of the remaining properties and a mathematical definition of each T-conorm can be found in SM~C. Figure~3 displays some of the T-conorms and illustrates their different properties. In Figure~4, we display example renderings with different settings and provide a visual comparison of how the aggregation function affects rendering.
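To illustrate how aggregation works in practice, the following sketch folds a T-conorm over per-face occlusion probabilities to obtain a pixel value (illustrative only; GenDR's actual implementation operates on image tensors):

```python
from functools import reduce

def probabilistic(a, b):
    # Probabilistic sum: a + b - ab
    return a + b - a * b

def einstein(a, b):
    # Einstein sum: (a + b) / (1 + ab)
    return (a + b) / (1.0 + a * b)

def yager2(a, b):
    # Yager T-conorm with p = 2: min(1, sqrt(a^2 + b^2))
    return min(1.0, (a * a + b * b) ** 0.5)

def aggregate(tconorm, occlusions):
    # Associativity and the neutral element 0 justify a left fold.
    return reduce(tconorm, occlusions, 0.0)

probs = [0.2, 0.5, 0.1]
# Analytically 1 - 0.8 * 0.5 * 0.9 = 0.64 for the probabilistic sum.
print(aggregate(probabilistic, probs))
```

Note how the Yager T-conorm saturates at 1 (its plateau), matching the discussion of its missing gradient outside the unit disc.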
Renderer & Distribution & T-conorm 
OpenDR~[Loper2014] & Uniform (backward) &  
N3MR~[Kato2017] & Uniform (backward) &  
Rhodin et al.~[rhodin2015versatile] & Gaussian & $⊥^P$ 
SoftRas~[Liu2019SoftRas] & Square-root of Logistic & $⊥^P$ 
Log.~Relax~[petersen2021learning] & Logistic & $⊥^P$ 
DIB-R~[Chen2019DIB] & Exponential & $⊥^P$ 
Existing Special Cases of GenDR. In Table~2, we list existing differentiable renderers that are conceptually instances of GenDR. These renderers each have some further differences, but one key difference lies in the type of distribution employed. Differences regarding shading are also discussed at the end of Section~3.
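Conceptually, Table 2 is a lookup from existing renderers to (distribution, T-conorm) pairs; a sketch of this mapping (the dictionary and its string keys are illustrative, not an actual GenDR API):

```python
# Existing differentiable renderers as (distribution, T-conorm) instances,
# following Table 2. None marks surrogate-gradient renderers with a hard
# forward pass and hence no aggregation T-conorm.
GENDR_INSTANCES = {
    "OpenDR":        ("uniform (backward)", None),
    "N3MR":          ("uniform (backward)", None),
    "Rhodin et al.": ("gaussian", "probabilistic"),
    "SoftRas":       ("square-root of logistic", "probabilistic"),
    "Log. Relax":    ("logistic", "probabilistic"),
    "DIB-R":         ("exponential", "probabilistic"),
}

dist, tconorm = GENDR_INSTANCES["SoftRas"]
print(dist, tconorm)
```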
5 Experiments$^1$
$^1$The source code will be available at github.com/FelixPetersen/gendr.
5.1 Shape Optimization
Our first experiment is a shape optimization task. Here, we use the mesh of an airplane, and render it from $24$ azimuths using a hard renderer. The task is to optimize a mesh (initialized as a sphere) to fit the silhouette of the airplane within $100$ optimization steps. Limiting the task to $100$ optimization steps is critical for two reasons: (i) The task can be considered to be solved perfectly with any differentiable renderer that produces the correct gradient sign within a large number of steps, but we are interested in the quality of the gradients for the optimization task and how efficient each renderer is. (ii) The total evaluation is computationally expensive because we evaluate a total of $1 242$ renderers and perform a grid search over the distribution parameters for each one to provide a fair and reliable comparison.
Setup. For optimization, we use the Adam optimizer~[Kingma2014AdamOpt] with parameters $β_1=0.5, β_2=0.95$. For each setting, we perform a grid search over three learning rates ($λ∈\{10^{-1.25}, 10^{-1.5}, 10^{-1.75}\}$) and temperatures $τ∈\{10^{-0.1\cdot n} \mid n∈\mathbb{N},\ 0≤n≤80\}$. Here, $λ=10^{-1.5}≈0.03$ performs best in almost all cases. As for the scale hyperparameter, it is important to use a fine-grained as well as large grid because it behaves differently for each distribution. Here, we intentionally chose the grid larger than the range of reasonable values to ensure that the best choice is used for each setting; the extreme values for the scale were never optimal. We perform this evaluation from five different elevation angles $\{-60^∘, -30^∘, 0^∘, 30^∘, 60^∘\}$ as independent runs and average the final results for each renderer instance. Additional results for the experiment applied to the model of a chair can be found in SM~D.

Results. In Figure~5, we display the results of our evaluation. We can observe that the regular distributions~$F$ typically perform better than their counterparts~$F_{sq}$, except in the case of the Cauchy distribution and the reciprocal sigmoid, which are those with a linear convergence rate. We explain this by the fact that squaring the distance before applying the sigmoid function gives the function a quadratic convergence rate instead. As the linearly converging functions also perform poorly in comparison to the exponentially converging functions (Gaussian, Laplace, Logistic, Gudermannian), we conclude that linear convergence is inferior to quadratic and exponential convergence. Columns~$1$–$3$ contain the distributions with finite support, and these do not perform very well on this task. The block of exponentially decaying distributions (columns $4$–$7$) performs well. The block of linearly decaying distributions (columns $8$–$9$) performs badly, as discussed above. The block of Lévy distributions (last $4$ columns) performs even worse because it has an even slower convergence. Here, it also becomes slightly better in the squared setting, but it still exhibits worse performance than linear convergence.
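The sweep described in the setup can be sketched as a plain grid search; `evaluate` is a hypothetical stand-in for running the 100-step optimization and returning the final silhouette loss (the negative exponents follow the reported best values, e.g. $λ≈0.03$):

```python
import itertools

def grid_search(evaluate):
    # Learning-rate grid and an intentionally oversized logarithmic
    # temperature grid at resolution 10^0.1 (81 values).
    lrs  = [10 ** e for e in (-1.25, -1.5, -1.75)]
    taus = [10 ** (-0.1 * n) for n in range(81)]
    return min(itertools.product(lrs, taus), key=lambda c: evaluate(*c))

# Toy stand-in loss whose optimum lies on the grid:
toy = lambda lr, tau: (lr - 10 ** -1.5) ** 2 + (tau - 10 ** -2) ** 2
best_lr, best_tau = grid_search(toy)
```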
Comparison of Distributions. Gumbel, exponential, and gamma distributions do not all perform equally well, but Gumbel-Min, the reversed exponential, and the reversed gamma are all competitive. Confer Table~3.2, where it becomes clear that this is because Gumbel-Max, exponential, and gamma have all of their mass inside the triangle, i.e., they yield smaller faces. This is problematic because it can cause gaps between neighboring triangles, which hinders optimization. As the reversed counterparts yield larger faces and do not suffer from this problem, they perform better. Note that, in this respect, the asymmetrical distributions have an advantage over the symmetrical distributions because symmetrical distributions always have an accumulated density of~$0.5$ at the edge, and thus the size of the face stays the same. We can see that, among the asymmetrical distributions, Gamma performs best.

Comparison of T-conorms. We find that $⊥^M$ and ``average'' (which is not a T-conorm but was used as a baseline in~[Liu2019SoftRas]) perform poorly. Also, $⊥^Y_4$, $⊥^A_2$, $⊥^A_4$, $⊥^D_2$, $⊥^D_4$, $⊥^{SS}_2$, and $⊥^{SS}_4$ perform poorly overall. This can be explained by their being rather extreme members of their respective T-conorm families; in all of them, the $p$th power is involved, which can become a problematic component, e.g., $x^4$ is vanishingly small for~$x=0.5$. Interestingly, the gamma and the exponential distributions still perform well with these, likely since they are not symmetric and have an accumulated probability of~$1$ on the edge. Notably, the Yager T-conorm ($p=2$) performs very well, despite having a plateau and thus no meaningful gradient outside the unit disc, see Figure~3.
Finally, we compute histograms of how many times each respective distribution and T-conorm appears in the best~$10\%$ of overall results. This is independent for the top and bottom plots. We observe that Gamma ($p=0.5$, Reversed) performs best overall (because it is more robust to the choice of T-conorm). Among the T-conorms, we find that $⊥^Y_2$ and $⊥^D_{0.5}$ perform best. The probabilistic and Einstein sums perform equally well and share the next place.
Method & Airplane& Bench & Dresser& Car & Chair & Display& Lamp & Speaker& Rifle & Sofa & Table & Phone & Vessel & Mean 
Kato et al. [Kato2017] N3MR (Uniform Backward) & 0.6172 & 0.4998 & 0.7143 & 0.7095 & 0.4990 & 0.5831 & 0.4126 & 0.6536 & 0.6322 & 0.6735 & 0.4829 & 0.7777 & 0.5645 & 0.6015 
Liu et al. [Liu2019SoftRas] SoftRas (Square-root of Logistic) & 0.6419 & 0.5080 & 0.7116 & 0.7697 & 0.5270 & 0.6156 & 0.4628 & 0.6654 & 0.6811 & 0.6878 & 0.4487 & 0.7895 & 0.5953 & 0.6234 
Chen et al.~[Chen2019DIB] DIB-R (Exponential) & 0.570 & 0.498 & 0.763 & 0.788 & 0.527 & 0.588 & 0.403 & 0.726 & 0.561 & 0.677 & 0.508 & 0.743 & 0.609 & 0.612 
Probabilistic + Uniform~~($≈$[Kato2017, Loper2014]) & 0.6456 & 0.4855 & 0.7113 & #1 & 0.5276 & 0.6126 & 0.4611 & 0.6651 & #1 & 0.6835 & 0.4514 & 0.8148 & 0.5971 & #1 
Probabilistic + Logistic~~($=$[petersen2021learning]) & 0.6396 & 0.5005 & 0.7105 & 0.7471 & 0.5288 & 0.6022 & 0.4586 & 0.6639 & 0.6742 & 0.6660 & 0.4666 & 0.7771 & 0.5980 & 0.6179 
Probabilistic + Logistic (squares)~~($=$[Liu2019SoftRas]) & 0.6416 & 0.4966 & #1 & 0.7386 & 0.5224 & #1 & 0.4550 & #1 & #1 & 0.6818 & 0.4529 & #1 & 0.5984 & 0.6217 
Probabilistic + Exponential (R)~~($=$[Chen2019DIB]) & 0.6321 & 0.4857 & 0.7123 & 0.7298 & 0.5178 & 0.5983 & 0.4611 & 0.6642 & 0.6713 & 0.6546 & 0.4700 & 0.7717 & #1 & 0.6130 
Probabilistic + Gaussian~~($≈$[rhodin2015versatile]) & 0.5922 & #1 & 0.7104 & 0.7561 & #1 & 0.6080 & 0.4399 & #1 & 0.6533 & #1 & #1 & 0.7301 & 0.5894 & 0.6125 
Probabilistic + Gamma (R) & #1 & 0.4842 & 0.7093 & 0.7220 & 0.5159 & 0.6033 & #1 & 0.6626 & 0.6719 & 0.6505 & 0.4642 & 0.7778 & 0.5978 & 0.6133 
Einstein + Gamma (R, squares) & 0.6438 & 0.4816 & #1 & 0.7284 & 0.5170 & 0.6111 & 0.4654 & 0.6647 & 0.6760 & 0.6546 & 0.4626 & #1 & 0.5973 & 0.6184 
Yager (p=2) + Cauchy (squares) & 0.6380 & #1 & 0.7047 & 0.7359 & 0.5188 & 0.5976 & 0.4617 & 0.6612 & 0.6726 & 0.6619 & 0.4819 & 0.7560 & #1 & 0.6149 
5.2 Camera Pose Optimization
In our second experiment, the goal is to find the camera pose for a model of a teapot from a reference image. The angle is randomly modified by an angle uniformly drawn from $[15^∘, 75^∘]$, and the distance and camera view angle are also randomized. We sample~$600$ pairs of a reference image and an initialization and use this set of settings for each method. For optimization, we use Adam with a learning rate of either~$0.1$ or~$0.3$ (chosen via grid search) and optimize for~$1000$ steps. During the optimization, we transition from an initial scale of~$σ=10^{-1}$ logarithmically to a final value of~$σ=10^{-7}$. This allows us to avoid a grid search for the optimal scale, and it makes sense because an initially large~$σ$ is beneficial for pose optimization: a smoother model has a higher probability of finding the correct orientation of the object. This contrasts with the setting of shape estimation, where such a large scale would be fatal because the vertices would collapse to the center.
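The logarithmic transition of the scale corresponds to geometric interpolation between the endpoints; a sketch (the endpoint values assume the scale anneals from a large $10^{-1}$ down to $10^{-7}$, i.e., negative exponents):

```python
import math

def sigma_schedule(step, total_steps, s_init=1e-1, s_final=1e-7):
    # Interpolate linearly in log-space, i.e., decay by a constant
    # factor per step (a logarithmic/geometric transition).
    t = step / total_steps
    return 10 ** ((1 - t) * math.log10(s_init) + t * math.log10(s_final))

print(sigma_schedule(0, 1000))    # 0.1
print(sigma_schedule(500, 1000))  # 0.0001 (geometric midpoint)
print(sigma_schedule(1000, 1000)) # 1e-07
```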
Results. In Figure~6, we display the results of this experiment. A corresponding image for the counterpart distributions~$F_{sq}$, as well as results for the experiment applied to the model of a chair, can be found in SM~D. The metric is the fraction of settings that matched the ground truth pose up to $3^∘$. We find that the results in this experiment are similar to those in the shape optimization experiment. Note that there are larger yellow areas because the color map ranges from $0\%$ to $90\%$, while in the shape optimization plot the color map covers a rather narrow loss range.
5.3 Single-View 3D Reconstruction
Setup. Finally, we reproduce the popular ShapeNet single-view 3D reconstruction experiment from~[Kato2017, Liu2019SoftRas, Chen2019DIB, petersen2021learning]. We select three T-conorms ($⊥^P, ⊥^E, ⊥^Y_2$) and~$10$ distributions (Uniform, Gaussian, Logistic, Logistic (squares), Cauchy, Cauchy (squares), Gumbel-Min, Gamma (R, $p=0.5$), Gamma (R, $p=0.5$, squares), and Exponential (R)). These have been selected because they have been used in previous works, are notable (Cauchy, Gumbel-Min, Einstein), or have performed especially well in the airplane shape optimization experiment (Gamma, Yager). For each setting, we perform a grid search over~$τ$ at a resolution of~$10^{0.5}$. Further experimental details can be found in SM~A.
Results. In Figure~7, we display and discuss the class-wise results for all $30$ selected renderers. In Table~3, we show the (self-)reported results for existing differentiable renderers in the top block. In the bottom block, we display our results for the methods that are equivalent ($=$) or very similar ($≈$) to the six existing differentiable renderers. The differences for equivalent methods can be explained by small variations in the setting and minor implementation differences. Additionally, we include three noteworthy alternative renderers, such as the one that performed best on the prior airplane shape optimization task. We conclude that the optimal choice of renderer heavily depends on the characteristics of the 3D models and the task. Surprisingly, we find that the simple uniform method achieves consistently good results and the best average score.
6 Discussion and Conclusion
In this work, we generalized differentiable mesh renderers and explored a large space of instantiations of our generalized renderer GenDR. We found that there are significant differences not only between different distributions for the occlusion test but also between different T-conorms for the aggregation. In our experiments, we observed that the choice of renderer has a large impact on the kind of models that can be rendered most effectively. We find that the uniform distribution outperforms the other tested distributions on average, which is surprising considering its simplicity. Remarkably, the uniform distribution had already been used implicitly for the early surrogate gradient renderers but was later discarded for the approximate differentiable renderers.
Acknowledgments. This work was supported by the DFG in the Cluster of Excellence EXC 2117 (Project-ID 390829875) and the SFB Transregio 161 (Project-ID 251654672), and by the Land Salzburg within the WISS 2025 project IDALab (20102F1901166KZP and 20204WISS/225/1972019).
Appendix A Implementation Details
For the single-view 3D reconstruction experiment, we closely follow the setup of Liu et al.~[Liu2019SoftRas]. We use the same model architecture~[Liu2019SoftRas] and also train with a batch size of $64$ for $250 000$ steps using the Adam optimizer~[Kingma2014AdamOpt]. We schedule the learning rate to $10^{-4}$ for the first $150 000$ steps and use a learning rate of $3⋅10^{-5}$ for the remaining training. At this point (after the first $150 000$ steps), we also decrease the temperature $τ$ by a factor of $0.3$.
Using different learning rates (as an ablation) did not improve the results.
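The schedule can be sketched as a small helper (illustrative; it assumes the rates are $10^{-4}$ and $3⋅10^{-5}$, i.e., that the exponents are negative, with the drop at step $150 000$):

```python
def schedule(step, tau_init):
    # Piecewise-constant: the learning rate drops after the first
    # 150,000 of the 250,000 steps, and tau is decayed by factor 0.3.
    if step < 150_000:
        return 1e-4, tau_init
    return 3e-5, 0.3 * tau_init

lr, tau = schedule(200_000, tau_init=1.0)
print(lr, tau)  # 3e-05 0.3
```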
Appendix B Distributions
In this section, we define each of the presented distributions / sigmoid functions. Figure~4 displays the respective CDFs and PDFs. Note that, for each distribution, the PDF~$f$ is defined as the derivative of the CDF~$F$. Also, note that a reversed (Rev.) CDF is defined as $F_{Rev.}(x) = 1 - F(-x)$, which means that $F_{Rev.} = F$ for symmetric distributions. The square-root distribution~$F_{sq}$ is defined in terms of $F$ as in Equation (5). Therefore, in the following, we define the distributions via their CDFs~$F$.
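The reversal can be checked numerically; in this sketch (an illustration with the rate-1 exponential and the standard logistic), the symmetric logistic CDF is invariant under reversal while the one-sided exponential is not:

```python
import math

def logistic(x):
    # Symmetric around 0.
    return 1.0 / (1.0 + math.exp(-x))

def exponential(x):
    # One-sided CDF with support x >= 0 (rate 1).
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def reversed_cdf(F):
    # F_Rev(x) = 1 - F(-x): mirror the distribution at the origin.
    return lambda x: 1.0 - F(-x)

log_rev, exp_rev = reversed_cdf(logistic), reversed_cdf(exponential)
print(log_rev(0.7) - logistic(0.7))      # ~0: symmetric, so F_Rev = F
print(exp_rev(-0.5), exponential(-0.5))  # asymmetric: values differ
```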
Heaviside:
$F(x) = \mathbf{1}[x \geq 0]$   (6)
Uniform:
$F(x) = \min(1, \max(0, (x+1)/2))$   (7)
Cubic Hermite:
$F(x) = \min(1, \max(0,\ 3y^2 - 2y^3))$   (8)
where $y:=(x+1)/2$.
Wigner Semicircle:
$F(x) = \frac{1}{2} + \frac{x\sqrt{1-x^2}}{\pi} + \frac{\arcsin(x)}{\pi}$ for $x ∈ [-1,1]$, $0$ for $x<-1$, and $1$ for $x>1$.   (9)
Gaussian:
$F(x) = \frac{1}{2}\left(1 + \operatorname{erf}\left(x/\sqrt{2}\right)\right)$   (10)
Laplace:
$F(x) = \frac{1}{2} + \frac{1}{2}\operatorname{sign}(x)\left(1 - \exp(-|x|)\right)$   (11)
Logistic:
$F(x) = \frac{1}{1 + \exp(-x)}$   (12)
Hyperbolic secant:
$F(x) = \frac{2}{\pi}\arctan\left(\exp\left(\tfrac{\pi}{2}x\right)\right)$   (13)
Cauchy:
$F(x) = \frac{1}{2} + \frac{\arctan(x)}{\pi}$   (14)
Reciprocal:
$F(x) = \frac{x}{2 + 2|x|} + \frac{1}{2}$   (15)
Gumbel-Max:
$F(x) = \exp(-\exp(-x))$   (16)
Gumbel-Min:
$F(x) = 1 - \exp(-\exp(x))$   (17)
Exponential:
$F(x) = \max(0,\ 1 - \exp(-x))$   (18)
Lévy:
$F(x) = \operatorname{erfc}\left(\sqrt{1/(2x)}\right)$ for $x > 0$, and $F(x) = 0$ otherwise.   (19)
Gamma:
$F(x) = \gamma(p, \max(0, x)) \,/\, \Gamma(p)$   (20)
where $γ(p,x)$ is the lower incomplete gamma function and $p>0$ is the shape parameter.
Heaviside & Uniform & Cubic Hermite & Wigner Semicircle & Gaussian & Gaussian (sq.) 
Laplace & Laplace (sq.) & Logistic & Logistic (sq.) & Hyperbolic secant & Hyperbolic secant (sq.) 
Cauchy & Cauchy (sq.) & Reciprocal & Reciprocal (sq.) & Gumbel-Max & Gumbel-Min 
Exponential & Exponential (sq.) & Exponential (Rev.) & Lévy & Lévy (sq.) & Lévy (Rev.) 
Gamma ($p=0.5$) & Gamma ($p=1$) & Gamma ($p=2$) & Gamma ($p=.5$, R.) & Gamma ($p=1$, R.) & Gamma ($p=2$, R.) 
Gamma ($p=0.5$, sq.) & Gamma ($p=1$, sq.) & Gamma ($p=2$, sq.) & Gamma ($p=.5$, R., sq.) & Gamma ($p=1$, R., sq.) & Gamma ($p=2$, R., sq.) 
Appendix C T-Norms and T-Conorms
The axiomatic approach to multi-valued logics (which we need in order to combine the occlusions by different faces in a ``soft'' manner) is based on defining reasonable properties for truth functions. We stated the axioms for multi-valued generalizations of the disjunction (logical ``or''), called T-conorms, in Definition~2. Here we complement this with the axioms for multi-valued generalizations of the conjunction (logical ``and''), which are called T-norms.
Definition 6 (T-norm).
A T-norm (triangular norm) is a binary operation~$⊤: [0,1] ×[0,1] →[0,1]$ which satisfies
associativity: $⊤(a, ⊤(b,c)) = ⊤(⊤(a,b), c)$,
commutativity: $⊤(a,b) = ⊤(b,a)$,
monotonicity: $(a ≤ c) ∧ (b ≤ d) ⇒ ⊤(a,b) ≤ ⊤(c,d)$,
$1$ is a neutral element: $⊤(a,1) = a$.
Clearly, these axioms ensure that the corners of the unit square, that is, the value pairs considered in classical logic, are processed as with a standard conjunction: the neutral element and commutativity imply that $(1,1) ↦ 1$, $(0,1) ↦ 0$, and $(1,0) ↦ 0$. From one of the latter two and monotonicity, it follows that $(0,0) ↦ 0$. Analogously, the axioms of T-conorms ensure that the corners of the unit square are processed as with a standard disjunction. In fact, the axioms fix the values not only at the corners but on the entire boundary of the unit square. Only inside the unit square (that is, on $(0,1)^2$) can T-norms (as well as T-conorms) differ.
Minimum & $⊤^M(a,b)$ & $\min(a,b)$ 
Probabilistic & $⊤^P(a,b)$ & $ab$ 
Einstein & $⊤^E(a,b)$ & $\frac{ab}{2-a-b+ab}$ 
Hamacher & $⊤^H_p(a,b)$ & $\frac{ab}{p+(1-p)(a+b-ab)}$ 
Frank & $⊤^F_p(a,b)$ & $\log_p\left(1+\frac{(p^a-1)(p^b-1)}{p-1}\right)$ 
Yager & $⊤^Y_p(a,b)$ & $\max\left(0,\ 1-\left((1-a)^p+(1-b)^p\right)^{1/p}\right)$ 
Aczél-Alsina & $⊤^A_p(a,b)$ & $\exp\left(-\left((-\log a)^p+(-\log b)^p\right)^{1/p}\right)$ 
Dombi & $⊤^D_p(a,b)$ & $\left(1+\left(\left(\tfrac{1-a}{a}\right)^p+\left(\tfrac{1-b}{b}\right)^p\right)^{1/p}\right)^{-1}$ 
Schweizer-Sklar & $⊤^S_p(a,b)$ & $\left(a^p+b^p-1\right)^{1/p}$ 
In the theory of multi-valued logics, and especially in fuzzy logic [Klir_and_Yuan_1995], it was established that the largest possible T-norm is the minimum and the smallest possible T-conorm is the maximum: for any T-norm~$⊤$ we have $⊤(a,b) ≤ \min(a,b)$, and for any T-conorm~$⊥$ we have $⊥(a,b) ≥ \max(a,b)$. The other extremes, that is, the smallest possible T-norm and the largest possible T-conorm, are the so-called drastic T-norm, defined by $⊤^∘(a,b) = 0$ for $(a,b) ∈ (0,1)^2$, and the drastic T-conorm, defined by $⊥^∘(a,b) = 1$ for $(a,b) ∈ (0,1)^2$. Hence, $⊤(a,b) ≥ ⊤^∘(a,b)$ for any T-norm~$⊤$ and $⊥(a,b) ≤ ⊥^∘(a,b)$ for any T-conorm~$⊥$. We do not consider the drastic T-conorm for the occlusion test because it clearly does not yield useful gradients.
As already mentioned in the paper, it is common to combine a T-norm~$⊤$, a T-conorm~$⊥$, and a negation~$N$ (or complement, most commonly $N(a) = 1-a$) so that De Morgan's laws hold. Such a triplet is often called a dual triplet. In Tables~5 and~6, we show the formulas for the families of T-norms and T-conorms, respectively, where matching lines together with the standard negation $N(a) = 1-a$ form dual triplets. Note that, for some families, we limited the range of values for the parameter~$p$ (see Table~1) compared to more general definitions~[Klir_and_Yuan_1995].
C.1 T-conorm Plots
Maximum & $⊥^M(a,b)$ & $\max(a,b)$ 
Probabilistic & $⊥^P(a,b)$ & $a+b-ab$ 
Einstein & $⊥^E(a,b)$ & $⊥^H_2(a,b)=\frac{a+b}{1+ab}$ 
Hamacher & $⊥^H_p(a,b)$ & $\frac{a+b+(p-2)\,ab}{1+(p-1)\,ab}$ 
Frank & $⊥^F_p(a,b)$ & $1-\log_p\left(1+\frac{(p^{1-a}-1)(p^{1-b}-1)}{p-1}\right)$ 
Yager & $⊥^Y_p(a,b)$ & $\min\left(1,\ \left(a^p+b^p\right)^{1/p}\right)$ 
Aczél-Alsina & $⊥^A_p(a,b)$ & $1-\exp\left(-\left((-\log(1-a))^p+(-\log(1-b))^p\right)^{1/p}\right)$ 
Dombi & $⊥^D_p(a,b)$ & $\left(1+\left(\left(\tfrac{1-a}{a}\right)^p+\left(\tfrac{1-b}{b}\right)^p\right)^{-1/p}\right)^{-1}$ 
Schweizer-Sklar & $⊥^S_p(a,b)$ & $1-\left((1-a)^p+(1-b)^p-1\right)^{1/p}$ 
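As a concrete check of the dual-triplet relation (a sketch for the Einstein pair): with the standard negation $N(a)=1-a$, De Morgan's law $⊥(a,b) = 1-⊤(1-a,\,1-b)$ holds.

```python
def t_norm_einstein(a, b):
    # Einstein T-norm: ab / (2 - a - b + ab)
    return a * b / (2.0 - a - b + a * b)

def t_conorm_einstein(a, b):
    # Einstein T-conorm (Hamacher p=2): (a + b) / (1 + ab)
    return (a + b) / (1.0 + a * b)

def demorgan_gap(T, S, a, b):
    # Residual of S(a,b) = 1 - T(1-a, 1-b); zero for a dual triplet.
    return abs(S(a, b) - (1.0 - T(1.0 - a, 1.0 - b)))

print(demorgan_gap(t_norm_einstein, t_conorm_einstein, 0.3, 0.8))  # ~0
```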
Appendix D Additional Plots
5 Experiments^{1}^{1}1The source code will be available at github.com/FelixPetersen/gendr.
5.1 Shape Optimization
Our first experiment is a shape optimization task. Here, we use the mesh of an airplane, and render it from $24$ azimuths using a hard renderer. The task is to optimize a mesh (initialized as a sphere) to fit the silhouette of the airplane within $100$ optimization steps. Limiting the task to $100$ optimization steps is critical for two reasons: (i) The task can be considered to be solved perfectly with any differentiable renderer that produces the correct gradient sign within a large number of steps, but we are interested in the quality of the gradients for the optimization task and how efficient each renderer is. (ii) The total evaluation is computationally expensive because we evaluate a total of $1 242$ renderers and perform a grid search over the distribution parameters for each one to provide a fair and reliable comparison.
Setup. For optimization, we use the Adam optimizer~[Kingma2014AdamOpt] with parameters $β_1=0.5, β_2=0.95$. For each setting, we perform a grid search over three learning rates ($λ∈{10^1.25, 10^1.5, 10^1.75}$) and temperatures $τ∈{10^0.1⋅n  n∈N, 0≤n≤80}$. Here, $λ= 10^1.5≈
0.03$ performs best in almost all cases. As for the scale hyperparameter, it is important to use a finegrained as well as large grid because this behaves differently for each distribution. Here, we intentionally chose the grid larger than the range of reasonable values to ensure that the best choice is used for each setting; the extreme values for the scale were never optimal. We perform this evaluation from five different elevation angles ${60^
∘,30^∘,0^∘,30^∘,60^∘}$ as independent runs, and average the final results for each renderer instance. Additional results for the experiment applied to the model of a chair can be found in SM~D.Results. In Figure~5, we display the results of our evaluation. We can observe that the regular distributions~$F$ typically perform better than the counterpart~$F_sq$, except for the case of Cauchy and reciprocal sigmoid, which are those with a linear convergence rate. We explain this by the fact that by squaring the distance before applying the sigmoid function, the function has a quadratic convergence rate instead. As the linearly converging functions also perform poorly in comparison to the exponentially converging functions (Gaussian, Laplace, Logistic, Gudermannian), we conclude that linear convergence is inferior to quadratic and exponential convergence. Columns~$13$ contain the distributions with finite support, and these do not perform very well on this task. The block of exponentially decaying distributions (columns $47$) performs well. The block of linearly decaying distributions (columns $89$) performs badly, as discussed above. The block of Levy distributions (last $4$ columns) performs even worse because it has an even slower convergence. Here, it also becomes slightly better in the squared setting, but it still exhibits worse performance than for linear convergence.
Comparison of Distributions.
The Gumbel-Max, exponential, and gamma distributions do not perform well, but Gumbel-Min, the reversed exponential, and the reversed gamma are all competitive. Confer Table~3.2, where it becomes clear that this is because Gumbel-Max, exponential, and gamma have all of their mass inside the triangle, i.e., they yield smaller faces. This is problematic because it can cause gaps between neighboring triangles, which hinders optimization. As the reversed counterparts yield larger faces and do not suffer from this problem, they perform better. Note that, in this respect, the asymmetrical distributions have an advantage over the symmetrical distributions: a symmetrical distribution always has an accumulated density of~$0.5$ at the edge, so the size of the face stays the same. Among the asymmetrical distributions, Gamma performs best.

Comparison of T-conorms. We find that $⊥^M$ and ``average'' (which is not a T-conorm but was used as a baseline in~[Liu2019SoftRas]) perform poorly. Also, $⊥^Y_4$, $⊥^A_2$, $⊥^A_4$, $⊥^D_2$, $⊥^D_4$, $⊥^{SS}_2$, and $⊥^{SS}_4$ perform poorly overall. This can be explained by the fact that they are rather extreme members of their respective T-conorm families; in all of them, the $p$th power is involved, which can become a problematic component, e.g., $x^4$ is vanishingly small for~$x=0.5$. Interestingly, the gamma and exponential distributions still perform well with these, likely because they are not symmetric and have an accumulated probability of~$1$ at the edge. Notably, the Yager T-conorm ($p=2$) performs very well despite having a plateau and thus no meaningful gradient outside the unit disc, see Figure~3.
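The plateau of the Yager T-conorm can be seen in a short sketch (ours; the formulas are the standard fuzzy-logic definitions, which we assume match Table~6):

```python
def probabilistic(a, b):
    # Probabilistic sum.
    return a + b - a * b

def einstein(a, b):
    # Einstein sum.
    return (a + b) / (1.0 + a * b)

def yager(a, b, p=2.0):
    # Yager T-conorm: clipped at 1, hence constant (a plateau with zero
    # gradient) wherever a^p + b^p >= 1, i.e., outside the unit disc for p=2.
    return min(1.0, (a ** p + b ** p) ** (1.0 / p))

print(yager(0.8, 0.8))  # 1.0: outside the unit disc, the output is clipped
print(yager(0.3, 0.4))  # ≈ 0.5: inside the disc, a Euclidean-norm-like value
```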
Finally, we compute histograms of how often each distribution and each T-conorm appears among the best~$10\%$ of overall results; the histograms are computed independently for the top and bottom plots. We can observe that Gamma ($p=0.5$, Reversed) performs best overall (because it is more robust to the choice of T-conorm). Among the T-conorms, we find that $⊥^Y_2$ and $⊥^D_{0.5}$ perform best. The probabilistic and Einstein sums perform equally well and share the next place.
Method & Airplane & Bench & Dresser & Car & Chair & Display & Lamp & Speaker & Rifle & Sofa & Table & Phone & Vessel & Mean 
Kato et al. [Kato2017] N3MR (Uniform Backward) & 0.6172 & 0.4998 & 0.7143 & 0.7095 & 0.4990 & 0.5831 & 0.4126 & 0.6536 & 0.6322 & 0.6735 & 0.4829 & 0.7777 & 0.5645 & 0.6015 
Liu et al. [Liu2019SoftRas] SoftRas (Square-root of Logistic) & 0.6419 & 0.5080 & 0.7116 & 0.7697 & 0.5270 & 0.6156 & 0.4628 & 0.6654 & 0.6811 & 0.6878 & 0.4487 & 0.7895 & 0.5953 & 0.6234 
Chen et al. [Chen2019DIB] DIB-R (Exponential) & 0.570 & 0.498 & 0.763 & 0.788 & 0.527 & 0.588 & 0.403 & 0.726 & 0.561 & 0.677 & 0.508 & 0.743 & 0.609 & 0.612 
Probabilistic + Uniform ($≈$ [Kato2017, Loper2014]) & 0.6456 & 0.4855 & 0.7113 & #1 & 0.5276 & 0.6126 & 0.4611 & 0.6651 & #1 & 0.6835 & 0.4514 & 0.8148 & 0.5971 & #1 
Probabilistic + Logistic ($=$ [petersen2021learning]) & 0.6396 & 0.5005 & 0.7105 & 0.7471 & 0.5288 & 0.6022 & 0.4586 & 0.6639 & 0.6742 & 0.6660 & 0.4666 & 0.7771 & 0.5980 & 0.6179 
Probabilistic + Logistic (squares) ($=$ [Liu2019SoftRas]) & 0.6416 & 0.4966 & #1 & 0.7386 & 0.5224 & #1 & 0.4550 & #1 & #1 & 0.6818 & 0.4529 & #1 & 0.5984 & 0.6217 
Probabilistic + Exponential (R) ($=$ [Chen2019DIB]) & 0.6321 & 0.4857 & 0.7123 & 0.7298 & 0.5178 & 0.5983 & 0.4611 & 0.6642 & 0.6713 & 0.6546 & 0.4700 & 0.7717 & #1 & 0.6130 
Probabilistic + Gaussian ($≈$ [rhodin2015versatile]) & 0.5922 & #1 & 0.7104 & 0.7561 & #1 & 0.6080 & 0.4399 & #1 & 0.6533 & #1 & #1 & 0.7301 & 0.5894 & 0.6125 
Probabilistic + Gamma (R) & #1 & 0.4842 & 0.7093 & 0.7220 & 0.5159 & 0.6033 & #1 & 0.6626 & 0.6719 & 0.6505 & 0.4642 & 0.7778 & 0.5978 & 0.6133 
Einstein + Gamma (R, squares) & 0.6438 & 0.4816 & #1 & 0.7284 & 0.5170 & 0.6111 & 0.4654 & 0.6647 & 0.6760 & 0.6546 & 0.4626 & #1 & 0.5973 & 0.6184 
Yager (p=2) + Cauchy (squares) & 0.6380 & #1 & 0.7047 & 0.7359 & 0.5188 & 0.5976 & 0.4617 & 0.6612 & 0.6726 & 0.6619 & 0.4819 & 0.7560 & #1 & 0.6149 
5.2 Camera Pose Optimization
In our second experiment, the goal is to find the camera pose for a model of a teapot from a reference image. The pose is randomly perturbed by an angle uniformly drawn from $[15^∘, 75^∘]$, and the distance and camera view angle are also randomized. We sample~$600$ pairs of a reference image and an initialization and use this set of settings for each method. For optimization, we use Adam with a learning rate of either~$0.1$ or~$0.3$ (via grid search) and optimize for~$1000$ steps. During the optimization, we transition an initial scale of~$σ=10^{-1}$ logarithmically to a final value of~$σ=10^{-7}$. This allows us to avoid a grid search for the optimal scale, and makes sense since an initially large~$σ$ is beneficial for pose optimization, because a smoother model has a higher probability of finding the correct orientation of the object. This contrasts with the setting of shape estimation, where this would be fatal because the vertices would collapse to the center.
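The logarithmic scale transition can be sketched as follows (our illustration; `sigma_schedule` is our own helper name, and the start/end defaults reflect our reading of the values above):

```python
import math

def sigma_schedule(step, num_steps=1000, sigma_start=1e-1, sigma_end=1e-7):
    # Linear interpolation in log-space between the initial and final scale,
    # so sigma decreases by a constant factor per step.
    t = step / (num_steps - 1)
    log_sigma = (1.0 - t) * math.log10(sigma_start) + t * math.log10(sigma_end)
    return 10.0 ** log_sigma
```

Each optimization step then uses `sigma_schedule(step)` as the current scale of the distribution.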
Results. In Figure~6, we display the results of this experiment. A corresponding plot for the counterpart distributions~$F_{sq}$, as well as results for the experiment applied to the model of a chair, can be found in SM~D. The metric is the fraction of runs that matched the ground-truth pose up to $3^∘$. We find that the results of this experiment are similar to those of the shape optimization experiment. Note that there are larger yellow areas because the color map ranges from $0\%$ to $90\%$, while the color map of the shape optimization plot covers a rather narrow loss range.
5.3 SingleView 3D Reconstruction
Setup. Finally, we reproduce the popular ShapeNet single-view 3D reconstruction experiment from~[Kato2017, Liu2019SoftRas, Chen2019DIB, petersen2021learning]. We select three T-conorms ($⊥^P, ⊥^E, ⊥^Y_2$) and~$10$ distributions (Uniform, Gaussian, Logistic, Logistic (squares), Cauchy, Cauchy (squares), Gumbel-Min, Gamma (R, $p=0.5$), Gamma (R, $p=0.5$, squares), and Exponential (R)). These have been selected because they have been used in previous works, are notable (Cauchy, Gumbel-Min, Einstein), or performed especially well in the airplane shape optimization experiment (Gamma, Yager). For each setting, we perform a grid search over~$τ$ at a resolution of~$10^{0.5}$. Further experimental details can be found in SM~A.
Results. In Figure~7, we display and discuss the class-wise results for all $30$ selected renderers. In Table~3, we show the (self-)reported results for existing differentiable renderers in the top block. In the bottom block, we display our results for the methods that are equivalent ($=$) or very similar ($≈$) to the six existing differentiable renderers. The differences for equivalent methods can be explained by small variations in the setting and minor implementation differences. Additionally, we include three noteworthy alternative renderers, such as the one that performed best on the prior airplane shape optimization task. We conclude that the optimal choice of renderer heavily depends on the characteristics of the 3D models and the task. Surprisingly, we find that the simple uniform method achieves consistently good results and the best average score.
6 Discussion and Conclusion
In this work, we generalized differentiable mesh renderers and explored a large space of instantiations of our generalized renderer GenDR. We found that there are significant differences between different distributions for the occlusion test, but also between different T-conorms for the aggregation. In our experiments, we observed that the choice of renderer has a large impact on the kind of models that can be rendered most effectively. We find that the uniform distribution outperforms the other tested distributions on average, which is surprising considering its simplicity. Remarkably, the uniform distribution had already been used implicitly for the early surrogate gradient renderers but was later discarded for the approximate differentiable renderers.
Acknowledgments. This work was supported by the DFG in the Cluster of Excellence EXC 2117 (ProjectID 390829875) and the SFB Transregio 161 (ProjectID 251654672), and the Land Salzburg within the WISS 2025 project IDALab (20102F1901166KZP and 20204WISS/225/1972019).
Appendix A Implementation Details
For the single-view 3D reconstruction experiment, we closely follow the setup of Liu et al.~[Liu2019SoftRas]. We use the same model architecture~[Liu2019SoftRas] and also train with a batch size of $64$ for $250{,}000$ steps using the Adam optimizer~[Kingma2014AdamOpt]. We schedule the learning rate to $10^{-4}$ for the first $150{,}000$ steps and use a learning rate of $3⋅10^{-5}$ for the remaining training. At this point (after the first $150{,}000$ steps), we also decrease the temperature $τ$ by a factor of $0.3$.
Using different learning rates (as an ablation) did not improve the results.
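The schedule can be summarized as a small helper (our sketch; `schedule` is our own name, the values are as stated above):

```python
def schedule(step):
    # Piecewise training schedule: learning rate 1e-4 for the first 150,000
    # steps, 3e-5 afterwards; tau is multiplied by 0.3 at the same switch point.
    if step < 150_000:
        return {"lr": 1e-4, "tau_factor": 1.0}
    return {"lr": 3e-5, "tau_factor": 0.3}
```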
Appendix B Distributions
In this section, we define each of the presented distributions / sigmoid functions. Figure~4 displays the respective CDFs and PDFs.
Note that, for each distribution, the PDF~$f$ is defined as the derivative of the CDF~$F$.
Also, note that a reversed (Rev.) CDF is defined as $F_{Rev.}(x) = 1 - F(-x)$, which means that $F_{Rev.} = F$ for symmetric distributions.
The square-root distribution~$F_{sq}$ is defined in terms of $F$ as in Equation (5).
Therefore, in the following, we will define the distributions via their CDFs~$F$.
Heaviside:
$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 & \text{for } x \ge 0 \end{cases}$  (6)
Uniform:
$F(x) = \min(1, \max(0, x + 1/2))$  (7)
Cubic Hermite:
$F(x) = \begin{cases} 0 & \text{for } x < -1 \\ 3y^2 - 2y^3 & \text{for } -1 \le x \le 1 \\ 1 & \text{for } x > 1 \end{cases}$  (8)
where $y:=(x+1)/2$.
Wigner Semicircle:
$F(x) = \begin{cases} 0 & \text{for } x < -1 \\ \frac{1}{2} + \frac{x\sqrt{1-x^2}}{\pi} + \frac{\arcsin(x)}{\pi} & \text{for } -1 \le x \le 1 \\ 1 & \text{for } x > 1 \end{cases}$  (9)
Gaussian:
$F(x) = \frac{1}{2}\left(1 + \mathrm{erf}\left(x/\sqrt{2}\right)\right)$  (10)
Laplace:
$F(x) = \begin{cases} \frac{1}{2}\exp(x) & \text{for } x < 0 \\ 1 - \frac{1}{2}\exp(-x) & \text{for } x \ge 0 \end{cases}$  (11)
Logistic:
$F(x) = \frac{1}{1+\exp(-x)}$  (12)
Hyperbolic secant:
$F(x) = \frac{2}{\pi}\arctan\left(\exp\left(\frac{\pi}{2}x\right)\right)$  (13)
Cauchy:
$F(x) = \frac{1}{\pi}\arctan(x) + \frac{1}{2}$  (14)
Reciprocal:
$F(x) = \frac{1}{2}\left(\frac{x}{1+|x|} + 1\right)$  (15)
Gumbel-Max:
$F(x) = \exp(-\exp(-x))$  (16)
Gumbel-Min:
$F(x) = 1 - \exp(-\exp(x))$  (17)
Exponential:
$F(x) = \begin{cases} 1 - \exp(-x) & \text{for } x \ge 0 \\ 0 & \text{for } x < 0 \end{cases}$  (18)
Lévy:
$F(x) = \begin{cases} \mathrm{erfc}\left(\sqrt{1/(2x)}\right) & \text{for } x > 0 \\ 0 & \text{for } x \le 0 \end{cases}$  (19)
Gamma:
$F(x) = \begin{cases} γ(p, x)\,/\,Γ(p) & \text{for } x \ge 0 \\ 0 & \text{for } x < 0 \end{cases}$  (20)
where $γ(p,x)$ is the lower incomplete gamma function and $p>0$ is the shape parameter.
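A minimal sketch (ours, unit scale assumed) of two of these CDFs together with the reversal rule $F_{Rev.}(x) = 1 - F(-x)$:

```python
import math

def gaussian(x):
    # Gaussian CDF with unit scale (an assumption here): Phi(x).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gumbel_max(x):
    return math.exp(-math.exp(-x))

def reversed_cdf(F):
    # F_Rev(x) = 1 - F(-x); equals F for symmetric distributions.
    return lambda x: 1.0 - F(-x)

gumbel_min = reversed_cdf(gumbel_max)  # equals 1 - exp(-exp(x))

# The Gaussian is symmetric, so reversal leaves it unchanged:
assert abs(reversed_cdf(gaussian)(0.7) - gaussian(0.7)) < 1e-12
# Gumbel-Max and Gumbel-Min place different mass at the edge (x = 0):
print(gumbel_max(0.0), gumbel_min(0.0))  # ≈ 0.368 vs. ≈ 0.632
```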
Figure 4: CDFs and PDFs of the distributions. Panels (left to right, top to bottom): Heaviside, Uniform, Cubic Hermite, Wigner Semicircle, Gaussian, Gaussian (sq.); Laplace, Laplace (sq.), Logistic, Logistic (sq.), Hyperbolic secant, Hyperbolic secant (sq.); Cauchy, Cauchy (sq.), Reciprocal, Reciprocal (sq.), Gumbel-Max, Gumbel-Min; Exponential, Exponential (sq.), Exponential (Rev.), Lévy, Lévy (sq.), Lévy (Rev.); Gamma ($p=0.5$), Gamma ($p=1$), Gamma ($p=2$), Gamma ($p=0.5$, R.), Gamma ($p=1$, R.), Gamma ($p=2$, R.); Gamma ($p=0.5$, sq.), Gamma ($p=1$, sq.), Gamma ($p=2$, sq.), Gamma ($p=0.5$, R., sq.), Gamma ($p=1$, R., sq.), Gamma ($p=2$, R., sq.).
Appendix C T-Norms and T-Conorms
The axiomatic approach to multi-valued logics (which we need in order to combine the occlusions by different faces in a ``soft'' manner) is based on defining reasonable properties for truth functions. We stated the axioms for multi-valued generalizations of the disjunction (logical ``or''), called T-conorms, in Definition~2. Here, we complement this with the axioms for multi-valued generalizations of the conjunction (logical ``and''), which are called T-norms.
Definition 6 (T-norm).
A T-norm (triangular norm) is a binary operation~$⊤: [0,1] ×[0,1] →[0,1]$ which satisfies
associativity: $⊤(a, ⊤(b,c)) = ⊤(⊤(a,b), c)$,
commutativity: $⊤(a,b) = ⊤(b,a)$,
monotonicity: $(a ≤c) ∧(b ≤d) ⇒⊤(a,b) ≤⊤(c,d)$,
$1$ is a neutral element: $⊤(a,1) = a$.
Clearly, these axioms ensure that the corners of the unit square, that is, the value pairs considered in classical logic, are processed as with a standard conjunction: the neutral element and commutativity imply that $(1,1) ↦1$, $(0,1) ↦0$, and $(1,0) ↦0$. From either of the latter two and monotonicity, it follows that $(0,0) ↦0$. Analogously, the axioms of T-conorms ensure that the corners of the unit square are processed as with a standard disjunction. Actually, the axioms fix the values not only at the corners but on the entire boundary of the unit square. Only inside the unit square (that is, on $(0,1)^2$) can T-norms (as well as T-conorms) differ.
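These axioms are easy to check numerically for a concrete instance; the sketch below (ours) verifies them on a grid for the probabilistic T-norm $⊤^P(a,b) = ab$:

```python
import itertools

def t_norm(a, b):
    # Probabilistic T-norm as a concrete instance of Definition 6.
    return a * b

grid = [i / 10.0 for i in range(11)]
for a, b, c in itertools.product(grid, repeat=3):
    # Associativity (up to float rounding) and commutativity.
    assert abs(t_norm(a, t_norm(b, c)) - t_norm(t_norm(a, b), c)) < 1e-12
    assert t_norm(a, b) == t_norm(b, a)
# 1 is a neutral element.
assert all(t_norm(a, 1.0) == a for a in grid)
# Monotonicity in the first argument (the second follows by commutativity).
for a, c in itertools.product(grid, repeat=2):
    if a <= c:
        assert all(t_norm(a, b) <= t_norm(c, b) for b in grid)
# Corner behaviour matches the classical conjunction.
assert (t_norm(1, 1), t_norm(1, 0), t_norm(0, 1), t_norm(0, 0)) == (1, 0, 0, 0)
```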
Minimum & $⊤^M(a,b)$ & $\min(a,b)$ 
Probabilistic & $⊤^P(a,b)$ & $ab$ 
Einstein & $⊤^E(a,b)$ & $\frac{ab}{2-a-b+ab}$ 
Hamacher & $⊤^H_p(a,b)$ & $\frac{ab}{p+(1-p)(a+b-ab)}$ 
Frank & $⊤^F_p(a,b)$ & $\log_p\left(1+\frac{(p^a-1)(p^b-1)}{p-1}\right)$ 
Yager & $⊤^Y_p(a,b)$ & $\max\left(0,\ 1-\left((1-a)^p+(1-b)^p\right)^{1/p}\right)$ 
Aczél-Alsina & $⊤^A_p(a,b)$ & $\exp\left(-\left((-\log a)^p+(-\log b)^p\right)^{1/p}\right)$ 
Dombi & $⊤^D_p(a,b)$ & $\left(1+\left(\left(\frac{1-a}{a}\right)^p+\left(\frac{1-b}{b}\right)^p\right)^{1/p}\right)^{-1}$ 
Schweizer-Sklar & $⊤^{SS}_p(a,b)$ & $\left(a^p+b^p-1\right)^{1/p}$ 
In the theory of multi-valued logics, and especially in fuzzy logic [Klir_and_Yuan_1995], it was established that the largest possible T-norm is the minimum and the smallest possible T-conorm is the maximum: for any T-norm~$⊤$, we have $⊤(a,b) ≤\min(a,b)$, and for any T-conorm~$⊥$, we have $⊥(a,b) ≥\max(a,b)$. The other extremes, that is, the smallest possible T-norm and the largest possible T-conorm, are the so-called drastic T-norm, defined as $⊤^∘(a,b) = 0$ for $(a,b) ∈(0,1)^2$, and the drastic T-conorm, defined as $⊥^∘(a,b) = 1$ for $(a,b) ∈(0,1)^2$. Hence, $⊤(a,b) ≥⊤^∘(a,b)$ for any T-norm~$⊤$ and $⊥(a,b) ≤⊥^∘(a,b)$ for any T-conorm~$⊥$. We do not consider the drastic T-conorm for the occlusion test because it clearly does not yield useful gradients.
As already mentioned in the paper, it is common to combine a T-norm~$⊤$, a T-conorm~$⊥$, and a negation~$N$ (or complement, most commonly $N(a) = 1-a$) such that De Morgan's laws hold. Such a triplet is often called a dual triplet. In Tables~5 and~6, we show the formulas for the families of T-norms and T-conorms, respectively, where matching lines together with the standard negation $N(a) = 1-a$ form dual triplets. Note that, for some families, we limited the range of values for the parameter~$p$ (see Table~1) compared to more general definitions~[Klir_and_Yuan_1995].
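For example, the probabilistic and Einstein pairs can be verified numerically to form dual triplets with the standard negation (our sketch):

```python
def t_norm_p(a, b):
    return a * b                             # probabilistic T-norm

def t_conorm_p(a, b):
    return a + b - a * b                     # probabilistic T-conorm

def t_norm_e(a, b):
    return a * b / (2.0 - (a + b - a * b))   # Einstein T-norm

def t_conorm_e(a, b):
    return (a + b) / (1.0 + a * b)           # Einstein T-conorm

N = lambda a: 1.0 - a                        # standard negation

grid = [i / 20.0 for i in range(21)]
for T, S in ((t_norm_p, t_conorm_p), (t_norm_e, t_conorm_e)):
    for a in grid:
        for b in grid:
            # De Morgan's law: S(a, b) = N(T(N(a), N(b)))
            assert abs(S(a, b) - N(T(N(a), N(b)))) < 1e-9
```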
C.1 T-Conorm Plots
Maximum & $⊥^M(a,b)$ & $\max(a,b)$ 
Probabilistic & $⊥^P(a,b)$ & $a+b-ab$ 
Einstein & $⊥^E(a,b)$ & $⊥^H_2(a,b)=\frac{a+b}{1+ab}$ 
Hamacher & $⊥^H_p(a,b)$ & $\frac{a+b+(p-2)ab}{1+(p-1)ab}$ 
Frank & $⊥^F_p(a,b)$ & $1-\log_p\left(1+\frac{(p^{1-a}-1)(p^{1-b}-1)}{p-1}\right)$ 
Yager & $⊥^Y_p(a,b)$ & $\min\left(1,\ \left(a^p+b^p\right)^{1/p}\right)$ 
Aczél-Alsina & $⊥^A_p(a,b)$ & $1-\exp\left(-\left((-\log(1-a))^p+(-\log(1-b))^p\right)^{1/p}\right)$ 
Dombi & $⊥^D_p(a,b)$ & $\left(1+\left(\left(\frac{1-a}{a}\right)^p+\left(\frac{1-b}{b}\right)^p\right)^{-1/p}\right)^{-1}$ 
Schweizer-Sklar & $⊥^{SS}_p(a,b)$ & $1-\left((1-a)^p+(1-b)^p-1\right)^{1/p}$ 
Appendix D Additional Plots