1 Introduction
In the past years, many differentiable renderers have been published. These include the seminal differentiable mesh renderer OpenDR [Loper2014], the Neural 3D Mesh Renderer [Kato2017], and SoftRas [Liu2019-SoftRas]
among many others. Using a differentiable renderer enables a multitude of computer vision applications, such as human pose estimation
[bogo2016keep], camera intrinsics estimation [Palazzi2019], 3D shape optimization [Kato2017], 3D reconstruction [Kato2017, Liu2019-SoftRas, Chen2019DIB], and 3D style transfer [Kato2017].A fundamental difference between different classes of differentiable renderers is the choice of the underlying 3D representation. In this work, we focus on differentiable 3D mesh renderers [Loper2014, Kato2017, Liu2019-SoftRas, Chen2019DIB]; however, the aspects that we investigate could also be applied to other differentiable rendering concepts, such as rendering voxels [Yan2016], point clouds [insafutdinov18pointclouds], surfels [Yifan2019-DiffSurfaceSplatting], signed distance functions [Jiang2020-SDFDiff], and other implicit representations [liu2019learning, Sitzmann2019SceneRepresentations].
Differentiable mesh renderers can be constructed in different ways: either using an exact and hard renderer with approximate surrogate gradients or using an approximate renderer with natural gradients. Loper et al. [Loper2014] and Kato et al. [Kato2017] produce approximate surrogate gradients for their differentiable renderer, while their forward rendering is hard. In contrast, other differentiable renderers approximate the forward rendering in such a way that they produce a natural gradient. This can be achieved by modeling or approximating a renderer under a probabilistic perturbation, which is continuous and makes the renderer differentiable. For that, Rhodin et al. [rhodin2015versatile]
model it with a Gaussian distribution, while Liu
et al. [Liu2019-SoftRas] model it with the square root of a logistic distribution, Petersen et al. [petersen2021learning] use a logistic distribution, and Chen et al. [Chen2019DIB]use the exponential distribution. While this variational interpretation of perturbing by a respective distribution is not stressed in some of these papers
[Liu2019-SoftRas, Chen2019DIB], we believe it is important because it explicitly allows comparing the characteristics of the differentiable renderers. Moreover, the methods that only approximate gradients can also be seen as approximately modelling a perturbation: the gradient computed for the Neural 3D Mesh Renderer [Kato2017] is approximately a perturbation by a uniform distribution. Note that, here, the solutions for rendering under perturbations are obtained analytically in closed-form without sampling.In this work, we introduce a generalized differentiable renderer (GenDR). By choosing an appropriate probability distribution, we can (at least approximately) recover the above differentiable mesh renderers, which shows that a core distinguishing aspect of differentiable renderers is the type of distributions that they model. The choice of probability distribution herein is directly linked to the sigmoid (i.e., S-shaped) function used for the rasterization. For example, a Heaviside sigmoid function corresponding to the Dirac delta distribution yields a conventional non-differentiable renderer, while a logistic sigmoid function of squared distances corresponds to the square root of a logistic distribution. Herein, the sigmoid function is the cumulative distribution function (CDF) of the corresponding distribution. In this work, we select and present an array of distributions and examine their theoretical properties.
Another aspect of approximate differentiable renderers is their aggregation function, i.e., the function that aggregates the occupancy probabilities of all faces for each pixel. Existing differentiable renderers commonly aggregate the probabilities via the probabilistic sum ( or ), which corresponds to the probability that at least one face covers the pixel assuming that probabilities for each triangle are stochastically independent (cf. Eq. 4 in [Liu2019-SoftRas] or Eq. 6 in [Chen2019DIB]). In the field of real-valued logics and adjacent fields, this is well-known as a T-conorm, a relaxed form of the logical ‘or’. Two examples of other T-conorms are the maximum T-conorm and the Einstein sum , which models the relativistic addition of velocities. We generalize our differentiable renderer to use any continuous T-conorm and present a variety of suitable T-conorms.
In total, the set of resulting concrete instances arising from our generalized differentiable renderer and the proposed choices amounts to
concrete differentiable renderers. We extensively benchmark all of them on a shape optimization task and a camera pose estimation task. Further, we evaluate the best performing and most interesting instances on the popular ShapeNet
[Chang2015ShapeNet] 13 class single-view 3D reconstruction experiment [Kato2017]. Here, we also include those instances that approximate other existing differentiable renderers. We note that we do not introduce a new shading technique in this paper, and rely on existing blended shaders instead.We summarize our contributions as follows:
-
We propose a generalized differentiable mesh renderer.
-
We identify existing differentiable renderers (approximately) as instances of our generalized renderer.
-
We propose a variety of suitable sigmoid functions and T-conorms and group them by their characteristics.
-
We extensively benchmark concrete differentiable renderers, analyze which characteristics and families of functions lead to a good performance, and find that the best choice heavily depends on the task, class, or characteristics of the data.
2 Related Work
The related work can be classified into those works that present differentiable renderers and those which apply them, although there is naturally also a significant overlap. For additional details on differentiable rendering approaches, cf. the survey by Kato
et al. [kato2020differentiable].Analytical Differentiable Renderers.
The first large category of differentiable renderers are those which produce approximate gradients in an analytical and sampling-free way. This can either happen by surrogate gradients during backpropagation, as in
[Kato2017], or by making the forward computation naturally differentiable by perturbing the distances between pixels and triangles analytically in closed-form [Liu2018, Chen2019DIB, petersen2019pix2vex]. Our work falls into this category and is of the second case. Existing works each present their renderer for a specific distribution or sigmoid function. We formally characterize the necessary functions to a differentiable renderer and present an array of options.Monte-Carlo Differentiable Renderers. An alternative to analytical differentiable renderers are those which are based on Monte-Carlo sampling techniques. The first example for this is the “redner” path tracer by Li et al. [Li2018], who use edge sampling to approximate the gradients of their renderer. Loubet et al. [Loubet2019ReparameterizingRendering] build on these ideas and reparameterize the involved discontinuous integrands yielding improved gradient estimates. Zhang et al. [zhang2020path] extend these ideas by differentiating the full path integrals which makes the method more efficient and effective. Lidec et al. [lidec2021differentiable] approach Monte-Carlo differentiable rendering by estimating the gradients of a differentiable renderer via the perturbed optimizers method [berthet2020learning].
Applications. Popular applications for differentiable renderers are pose [Loper2014, Kato2017, Liu2019-SoftRas, Chen2019DIB, Palazzi2019, lidec2021differentiable, ravi2020accelerating], shape [Kato2017, zhang2020path, petersen2019pix2vex, ravi2020accelerating], material [liu2017material, shi2020match], texture [Liu2019-SoftRas, Chen2019DIB, Loubet2019ReparameterizingRendering], and lighting [zhang2020path]
estimation. Here, the parameters of an initial scene are optimized to match the scene in a reference image or a set of reference images. Another interesting application is single-view 3D shape prediction without 3D supervision. Here, a neural network predicts a 3D representation from a single image, and the rendering of the image is compared to the original input image. This learning process is primarily guided by supervision of the object silhouette. It is possible to omit this supervision via adversarial style transfer
[petersen2021style]. Other applications are generating new 3D shapes that match a data set [Henzler2018, Henderson_2020_CVPR] as well as adversarial examples in the real world [liu2019beyond].In our experiments, we use optimization for pose and shape to benchmark all proposed differentiable renderer combinations. As the single-view 3D mesh reconstruction is a complex experiment requiring training a neural network, we benchmark our method on this task only for a selected subset of differentiable renderers.
T-norms and T-conorms. T-norms and T-conorms (triangular norms and conorms) are binary functions that generalize the logical conjunction (‘and’) and disjunction (‘or’), respectively, to real-valued logics or probability spaces [klement2013triangular, van2022analyzing]. A generalization of ‘or’ is necessary in a differentiable renderer to aggregate the occlusion caused by faces. The existing analytical differentiable renderers all use the probabilistic T-conorm.
3 Generalized Differentiable Renderer
In this section, we present our generalized differentiable mesh renderer. With a differentiable renderer, we refer to a renderer that is continuous everywhere and differentiable almost everywhere (a.e.). Note that, in this context, continuity is a stricter criterion than differentiable a.e. because formally (i) conventional renderers are already differentiable a.e. (which does not mean that they can provide any meaningful gradients), and (ii) almost all existing “differentiable” renderers are not actually differentiable everywhere.
Let us start by introducing how a classic hard rendering algorithm operates. The first step is to bring all objects into image space, which is typically a sequence of affine transformations followed by the camera projection. This step is already differentiable. The second step is the rasterization: For each pixel, we need to compute the set of faces (typically triangles) which cover it. If the pixel is covered by at least one face, the face that is closest to the camera is displayed.
3.1 Differentiable Occlusion Test
To make the test whether a pixel is occluded by a face differentiable, we start by computing the signed Euclidean distance between pixel and face boundary. By convention, pixels inside the triangle have a positive distance, pixels outside the triangle a negative distance. For pixels exactly on the boundary, the distance to the face is .
For a hard occlusion test, we would just check whether is non-negative. In a differentiable renderer, we instead introduce a perturbation in the form of a probability distribution with density together with a temperature or scale parameter . We then evaluate the probability that the perturbed distance is non-negative, where is distributed according to . Thus, we compute the probability that occludes as
(1) | ||||
where is the CDF of the distribution and thus yields a closed-form solution for the desired probability (provided that has a closed-form solution or can be appropriately approximated). In a differentiable renderer, we require being continuous. Typically, has the S-shape of a sigmoid function, see Table 3.2. Therefore, we refer to CDFs as sigmoid functions in this paper.
Most existing differentiable renderers use sigmoid functions or transformations thereof, see Section 4, to softly evaluate whether a pixel lies inside a triangle. This accords to the probabilistic interpretation in Equation (1) where the probability distribution is defined via the sigmoid function used in each case. Here, the logistic sigmoid function is a popular choice of such a sigmoid function. Note that, recently, it has frequently been referred to as “the” sigmoid in the literature, which is not to be confused with the original and more general terminology.
Example 1 (Logistic Sigmoid).
is the logistic sigmoid function, which corresponds to the logistic distribution.
3.2 Aggregation
The second step to be made differentiable is the aggregation of multiple faces. While this is conventionally done via a logical ‘or’, the differentiable real-valued counterpart is a T-conorm. T-conorms are formally defined as follows.
Definition 2 (T-conorm).
A T-conorm is a binary operation , which satisfies
-
associativity: ,
-
commutativity: ,
-
monotonicity: ,
-
is a neutral element .
Remark 3 (T-conorms and T-norms).
While T-conorms are the real-valued equivalents of the logical ‘or’, so-called T-norms are the real-valued equivalents of the logical ‘and’. Certain T-conorms and T-norms are dual in the sense that one can derive one from the other using a complement (typically ) and De Morgan’s laws ().
Let us proceed by stating the T-conorm which is used in all applicable previous approximate differentiable renderers with natural gradients.
Example 4 (Probabilistic Sum).
The probabilistic sum is a T-conorm that corresponds to the probability that at least one out of two independent events occurs. It is defined as
(2) |
An alternative to this is the Einstein sum, which is based on the relativistic addition of velocities.
Example 5 (Einstein Sum).
The Einstein sum is a T-conorm that corresponds to the velocity addition under special relativity:
(3) |
Combining the above concepts, we can compute the occupancy or coverage of a pixel given a set of faces as
(4) |
squared just tree/.style=
for tree=
edge path=
{edge}](!u.parentanchor)--+(0,-5pt)-|(.childanchor){edgelabel};},align=center,if={(isodd(n_children))&&(n_children>2)}{forchildren={if={equal(n,((n_children("!u"))+1)/2)}{calignwithcurrent}{},}}{},ifnchildren=0{beforepacking={tier=terminus}}{},},}}{minipage}{0.2375}Taxonomyofprobabilitydistributionscorrespondingtosigmoidfunctions.Thesubdivisionsarechosenwrt.~propertiesthathaveacategoricallydifferentinfluenceonthebehaviorofthecorrespondingrenderer.Theorderofsplitswhengoingdowninthetree(whichcouldbechosendifferently,e.g.,symmetric/asymmetriccouldbethefirstsplit)reflectstheimportanceoftheproperties.}fig:sigmoid-taxonomy}minipage} {minipage}{0.725}
}
{!}{{justtree}{%leftjustifications,squaredjusttree,}[TaxonomyofDistributions[FiniteSupport[Exact[DiracDelta
(Heaviside)]][Continuous[Uniform
CubicHermite
WignerSemicircle]]][InfiniteSupport[Symmetrical[ExponentialConv.[Gaussian
Laplace
Logistic
Hyperbolicsecant]
&
&
&
&
&
Heaviside &
Uniform &
Cubic Hermite &
Wigner Semicircle &
Gaussian &
Laplace
&
&
&
&
&
Logistic &
Logistic (squares) &
Hyperbolic secant &
Cauchy &
Cauchy (squares) &
Reciprocal
&
&
&
&
&
Gumbel-Max &
Gumbel-Min &
Exponential &
Exponential (Rev.) &
Levy &
Levy (Rev.)
&
&
&
&
&
Gamma ($p=0.5$) &
Gamma ($p=1$) &
Gamma ($p=2$) &
Gamma ($p=.5$) (Rev.) &
Gamma ($p=1$) (Rev.) &
Gamma ($p=2$) (Rev.)
&
&
&
&
&
Gamma ($p=0.5$) &
Gamma ($p=1$) &
Gamma ($p=2$) &
Gamma ($p=.5$) (Rev.) &
Gamma ($p=1$) (Rev.) &
Gamma ($p=2$) (Rev.)
(squares) &
(squares) &
(squares) &
(squares) &
(squares) &
(squares)
3.3 Shading
The coloring of faces is handled via the Phong model or any other shading model, which is already differentiable. In the literature, Chen~et al. ~[Chen2019DIB] compare different choices. Finally, to aggregate the coloring of each pixel depending on the distance of the face to the camera (depth), there are two popular choices in the literature: no depth perturbations and taking the closest triangle (like [Loper2014, Kato2017, Chen2019DIB]) and Gumbel depth perturbations (like [Liu2019-SoftRas, petersen2019pix2vex]). Only the latter choice is truly continuous, and the closed-form solution for Gumbel depth perturbations is the well known $softmin$. As there are (i) no closed-form solutions for adequate alternatives to Gumbel perturbations in the literature, and (ii) these two options have been extensively studied in the literature~[lidec2021differentiable, Loper2014, Kato2017, Chen2019DIB, Liu2019-SoftRas, petersen2019pix2vex], in this work we do not modify this component and focus on the differentiable silhouette computation and aggregation. While we implement both options in GenDR, in our evaluation, we perform all experiments agnostic to the choice of shading aggregation as the experiments rely solely on the silhouette.
4 Instantiations of the GenDR
Let us proceed by discussing instantiations of the generalized differentiable renderer (GenDR).
T-conorm & & equal to / where & continuous & contin.~diff. & strict & idempotent & nilpotent & Archimedean & $↑/ ↓$~wrt.~$p$ |
(Logical `or') & $∨$ & & (✗) & (✗) & --- & (✓) & --- & --- & --- |
Maximum & $⊥^M$ & & ✓& ✗& ✗& ✓& ✗& ✗& --- |
Probabilistic & $⊥^P$ & $=⊥^H_1 =⊥^A_1$ & ✓& ✓& ✓& ✗& ✗& ✓& --- |
Einstein & $⊥^E$ & $=⊥^H_0$ & ✓& ✓& ✓& ✗& ✗& ✓& --- |
Hamacher & $⊥^H_p$ & $p∈(0, ∞)$ & ✓& ✓& ✓& ✗& ✗& ✓& $↓$ |
Frank & $⊥^F_p$ & $p∈(0, ∞)$ & ✓& ✓& ✓& ✗& ✗& ✓& $↓$ |
Yager & $⊥^Y_p$ & $p∈(0, ∞)$ & ✓& ✗& ✗& ✗& ✓& ✓& $↑$ |
Aczél-Alsina & $⊥^A_p$ & $p∈(0, ∞)$ & ✓& ✓& ✓& ✗& ✗& ✓& $↑$ |
Dombi & $⊥^D_p$ & $p∈(0, ∞)$ & ✓& ✓& ✓& ✗& ✗& ✓& $↑$ |
Schweizer-Sklar & $⊥^SS_p$& $p∈(-∞, 0)$ & ✓& ✓& ✓& ✗& ✗& ✓& --- |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
^A_0.5$. For the last two rows, we use a Cauchy distribution and use $
⊥^P$ and $⊥^Y_2$.Distributions. Figure~LABEL:fig:sigmoid-taxonomy provides a taxonomy of the distributions and sigmoid functions that are visualized in Table~3.2. We classify the distributions into those with finite support as well as others with infinite support, where the support is the set of points for which the PDF is greater than zero. Note that the CDFs are constant outside the support region. Among the distributions with finite support, there is the exact Dirac delta distribution corresponding to the Heaviside function, which yields a discrete, non-differentiable renderer. There are also continuous distributions allowing meaningful gradients, but (due to finite support) only in a limited proximity to each face. Here, we have, among others, the uniform distribution, which corresponds to a piecewise linear step function. The derivative of the uniform distribution is equivalent or very similar (due to minor implementation aspects) to the surrogate gradient of the Neural 3D Mesh Renderer~[Kato2017]. The distributions with infinite support can be categorized into symmetrical and asymmetrical. Among the symmetrical distributions, the Gaussian, the Laplace, the logistic, and the hyperbolic secant have an exponential convergence behavior or exponential decay of probability density. On the other hand, there is also the Cauchy distribution which has a linear convergence. This yields a significantly different behavior. We include the algebraic function~$x↦x / (2 + 2|x|) + 1/2$ and call it reciprocal sigmoid. This also has a linear convergence. Finally, we consider asymmetrical distributions with infinite support. The Gumbel-Max and Gumbel-Min are extreme value distributions~[Coles2001] and two-sided, which means that their support covers both positive and negative arguments. The exponential, Gamma, and Levy distributions are one-sided distributions. Here, it is important to not only consider the original distributions but also their mirrored or reversed variants, as well as shifted variations as can be seen in the last three rows of Table~3.2.
SoftRas~[Liu2019-SoftRas] squares the absolute part of the distance before applying the logistic sigmoid function and thus models the square roots of logistic perturbations. Instead of modifying the argument of~$F$, we instead interpret it as applying a transformed counterpart CDF~$F_sq$, which is more in line with the probabilistic interpretation in Equation~(1). More precisely, we compute the occlusion probability as
(5) |
That means that for each choice of~$F$, we obtain a counterpart~$F_sq$. A selection of these for different CDFs~$F$ is visualized in Table~3.2 denoted by ``(squares)''. For a mathematical definition of each sigmoid function, see SM~B.
Aggregations. Table~1 provides an overview over selected T-conorms and displays their properties. The logical `or' is not a T-conorm but the discrete and discontinuous equivalent, which is why we include it here. While there are also discontinuous T-conorms such as the drastic T-cornom, these are naturally not suitable for a differentiable renderer, which is why we exlude them. All except for the Max and Yager T-conorms are continuously differentiable.
The top four rows in Table~1 contain individual T-conorms, and the remainder are families of T-conorms. Here, we selected only suitable ranges for the parameter~$p$. Note that there are some cases in which the T-conorms coincide, e.g, $⊥^P=⊥^H_1 =⊥^A_1$. A discussion of the remaining properties and a mathematical definition of each T-conorm can be found in SM~C. Figure~3 displays some of the T-conorms and illustrates different properties. In Figure~4, we display example renderings with different settings and provide a visual comparison on how the aggregation function affects rendering.
Renderer & Distribution & T-conorm |
OpenDR~[Loper2014] & Uniform (backward) & --- |
N3MR~[Kato2017] & Uniform (backward) & --- |
Rhodin~et al. ~[rhodin2015versatile] & Gaussian & $⊥^P$ |
SoftRas~[Kato2017] & Square-root of Logistic & $⊥^P$ |
Log.~Relax~[petersen2021learning] & Logistic & $⊥^P$ |
DIB-R~[Chen2019DIB] & Exponential & $⊥^P$ |
Existing Special Cases of GenDR. In Table~2, we list which existing differentiable renderers are conceptually instances of GenDR. These renderers do each have some other differences, but one key difference lies in the type of distribution employed. Differences regarding shading are also discussed at the end of Section~3.
5 Experiments111The source code will be available at github.com/Felix-Petersen/gendr.
5.1 Shape Optimization
Our first experiment is a shape optimization task. Here, we use the mesh of an airplane, and render it from $24$ azimuths using a hard renderer. The task is to optimize a mesh (initialized as a sphere) to fit the silhouette of the airplane within $100$ optimization steps. Limiting the task to $100$ optimization steps is critical for two reasons: (i) The task can be considered to be solved perfectly with any differentiable renderer that produces the correct gradient sign within a large number of steps, but we are interested in the quality of the gradients for the optimization task and how efficient each renderer is. (ii) The total evaluation is computationally expensive because we evaluate a total of $1 242$ renderers and perform a grid search over the distribution parameters for each one to provide a fair and reliable comparison.
Setup.
For optimization, we use the Adam optimizer~[Kingma2014AdamOpt] with parameters $β_1=0.5, β_2=0.95$.
For each setting, we perform a grid search over three learning rates ($λ∈{10^-1.25, 10^-1.5, 10^-1.75}$) and temperatures $τ∈{10^-0.1⋅n | n∈N, 0≤n≤80}$.
Here, $λ= 10^-1.5≈ 0.03$ performs best in almost all cases. As for the scale hyperparameter, it is important to use a fine-grained as well as large grid because this behaves differently for each distribution.
Here, we intentionally chose the grid larger than the range of reasonable values to ensure that the best choice is used for each setting; the extreme values for the scale were never optimal.
We perform this evaluation from five different elevation angles ${-60^
![]() |
![]() |

Results. In Figure~5, we display the results of our evaluation. We can observe that the regular distributions~$F$ typically perform better than the counterpart~$F_sq$, except for the case of Cauchy and reciprocal sigmoid, which are those with a linear convergence rate. We explain this by the fact that by squaring the distance before applying the sigmoid function, the function has a quadratic convergence rate instead. As the linearly converging functions also perform poorly in comparison to the exponentially converging functions (Gaussian, Laplace, Logistic, Gudermannian), we conclude that linear convergence is inferior to quadratic and exponential convergence. Columns~$1-3$ contain the distributions with finite support, and these do not perform very well on this task. The block of exponentially decaying distributions (columns $4-7$) performs well. The block of linearly decaying distributions (columns $8-9$) performs badly, as discussed above. The block of Levy distributions (last $4$ columns) performs even worse because it has an even slower convergence. Here, it also becomes slightly better in the squared setting, but it still exhibits worse performance than for linear convergence.
Comparison of Distributions. Gumbel, exponential, and gamma distributions do not all perform equally well, but Gumbel-Min, the reversed exponential, and the reversed gamma are all competitive.
Confer Table~
Comparison of T-conorms. We find that $⊥^M$ and ``average'' (which is not a T-conorm but was used as a baseline in~[Liu2019-SoftRas]) perform poorly. Also, $⊥^Y_4$, $⊥^A_2$, $⊥^A_4$, $⊥^D_2$, $⊥^D_4$, $⊥^SS_-2$, and $⊥^SS_-4$ perform poorly overall. This can be explained as they are rather extreme members of their respective T-norm families; in all of them, the $p$th power is involved, which can become a problematic component, e.g., $x^4$ is vanishingly small for~$x=0.5$. Interestingly, the gamma and the exponential distributions still perform well with these, likely since they are not symmetric and have an accumulated probability of~$1$ on the edge. Notably, the Yager T-conorm ($p=2$) performs very well, although having a plateau and thus no meaningful gradient outside the unit disc, see Figure~3.
Finally, we compute histograms of how many times each respective distribution and T-conorm is involved in the best~$10%$ of overall results. This is independent for the top and bottom plots. We can observe that Gamma ($p=0.5$, Reversed) performs the best overall (because it is more robust to the choice of T-conorm). Among the T-conorms, we find that $⊥^Y_2$ and $⊥^D_0.5$ perform best. The probabilistic and Einstein sums perform equally, and share the next place.

Method & Airplane& Bench & Dresser& Car & Chair & Display& Lamp & Speaker& Rifle & Sofa & Table & Phone & Vessel & Mean |
Kato et al. [Kato2017] N3MR (Uniform Backward) & 0.6172 & 0.4998 & 0.7143 & 0.7095 & 0.4990 & 0.5831 & 0.4126 & 0.6536 & 0.6322 & 0.6735 & 0.4829 & 0.7777 & 0.5645 & 0.6015 |
Liu et al. [Liu2019-SoftRas] SoftRas (Square-root of Logistic) & 0.6419 & 0.5080 & 0.7116 & 0.7697 & 0.5270 & 0.6156 & 0.4628 & 0.6654 & 0.6811 & 0.6878 & 0.4487 & 0.7895 & 0.5953 & 0.6234 |
Chen~et al. ~[Chen2019DIB] DIB-R (Exponential) & 0.570 & 0.498 & 0.763 & 0.788 & 0.527 & 0.588 & 0.403 & 0.726 & 0.561 & 0.677 & 0.508 & 0.743 & 0.609 & 0.612 |
Probabilistic + Uniform~~($≈$[Kato2017, Loper2014]) & 0.6456 & 0.4855 & 0.7113 & #1 & 0.5276 & 0.6126 & 0.4611 & 0.6651 & #1 & 0.6835 & 0.4514 & 0.8148 & 0.5971 & #1 |
Probabilistic + Logistic~~($=$[petersen2021learning]) & 0.6396 & 0.5005 & 0.7105 & 0.7471 & 0.5288 & 0.6022 & 0.4586 & 0.6639 & 0.6742 & 0.6660 & 0.4666 & 0.7771 & 0.5980 & 0.6179 |
Probabilistic + Logistic (squares)~~($=$[Liu2019-SoftRas]) & 0.6416 & 0.4966 & #1 & 0.7386 & 0.5224 & #1 & 0.4550 & #1 & #1 & 0.6818 & 0.4529 & #1 & 0.5984 & 0.6217 |
Probabilistic + Exponential (R)~~($=$[Chen2019DIB]) & 0.6321 & 0.4857 & 0.7123 & 0.7298 & 0.5178 & 0.5983 & 0.4611 & 0.6642 & 0.6713 & 0.6546 & 0.4700 & 0.7717 & #1 & 0.6130 |
Probabilistic + Gaussian~~($≈$[rhodin2015versatile]) & 0.5922 & #1 & 0.7104 & 0.7561 & #1 & 0.6080 & 0.4399 & #1 & 0.6533 & #1 & #1 & 0.7301 & 0.5894 & 0.6125 |
Probabilistic + Gamma (R) & #1 & 0.4842 & 0.7093 & 0.7220 & 0.5159 & 0.6033 & #1 & 0.6626 & 0.6719 & 0.6505 & 0.4642 & 0.7778 & 0.5978 & 0.6133 |
Einstein + Gamma (R, squares) & 0.6438 & 0.4816 & #1 & 0.7284 & 0.5170 & 0.6111 & 0.4654 & 0.6647 & 0.6760 & 0.6546 & 0.4626 & #1 & 0.5973 & 0.6184 |
Yager (p=2) + Cauchy (squares) & 0.6380 & #1 & 0.7047 & 0.7359 & 0.5188 & 0.5976 & 0.4617 & 0.6612 & 0.6726 & 0.6619 & 0.4819 & 0.7560 & #1 & 0.6149 |
5.2 Camera Pose Optimization
In our second experiment, the goal is to find the camera pose for a model of a teapot from a reference image. The angle is randomly modified by an angle uniformly drawn from $[15^∘, 75^∘]$, and the distance and camera view angle are also randomized. We sample~$600$ pairs of a reference image and an initialization and use this set of settings for each method. For optimization, we use Adam with a learning rate of either~$0.1$ or~$0.3$ (via grid search) and optimize for~$1000$ steps. During the optimization, we transition an initial scale of~$σ=10^-1$ logarithmically to a final value of~$σ=10^-7$. This allows us to avoid a grid search for the optimal scale, and makes sense since an initially large~$σ$ is beneficial for pose optimization, because a smoother model has a higher probability of finding the correct orientation of the object. This contrasts with the setting of shape estimation, where this would be fatal because the vertices would collapse to the center.
Results. In Figure~6, we display the results of this experiment. A corresponding image of the counterpart distributions~$F_sq$ as well as results for the experiment applied to the model of a chair can be found in SM~D. The metric is the fraction of settings which achieved matching the ground truth pose up to $3^∘$. We find that in this experiment, the results are similar to those in the shape optimization experiment. Note that there are larger yellow areas because the color map ranges from $0%$ to $90%$, while in the shape optimization plot the color map ranges in a rather narrow loss range.
5.3 Single-View 3D Reconstruction
Setup. Finally, we reproduce the popular ShapeNet single-view 3D reconstruction experiment from~[Kato2017, Liu2019-SoftRas, Chen2019DIB, petersen2021learning]. We select three T-conorms ($⊥^P, ⊥^E, ⊥^Y_2$) and~$10$ distributions (Uniform, Gaussian, Logistic, Logistic (squares), Cauchy, Cauchy (squares), Gumbel-Min, Gamma (R, $p=0.5$), Gamma (R, $p=0.5$, squares), and Exponential (R)). These have been selected because they have been used in previous works, are notable (Cauchy, Gumbel-Min, Einstein), or have performed especially well in the aircraft shape optimization experiment (Gamma, Yager). For each setting, we perform a grid search of~$τ$ at resolution~$10^0.5$. Further experimental details can be found in SM~A.
Results. In Figure~7, we display and discuss the class-wise results for all $30$ selected renderers. In Table~3, we show the (self-) reported results for existing differentiable renderers in the top block. In the bottom block, we display our results for the methods that are equivalent ($=$) or very similar ($≈$) to the six existing differentiable renderers. The differences for equivalent methods can be explained with small variations in the setting and minor implementation differences. Additionally, we include three noteworthy alternative renderers, such as the one that also performed best on the prior airplane shape optimization task. We conclude that the optimal choice of renderer heavily depends on the characteristics of the 3D models and the task. Surprisingly, we find that the simple uniform method achieves consistently good results and the best average score.
6 Discussion and Conclusion
In this work, we generalized differentiable mesh renderers and explored a large space of instantiations of our generalized renderer GenDR. We found that there are significant differences between different distributions for the occlusion test but also between different T-conorms for the aggregation. In our experiments, we observed that the choice of renderer has a large impact on the kind of models that can be rendered most effectively. We find that the uniform distribution outperforms the other tested distributions on average, which is surprising considering it simplicity. Remarkably, the uniform distribution had already been used implicitly for the early surrogate gradient renderers but was later discarded for the approximate differentiable renderers.
Acknowledgments. This work was supported by the DFG in the Cluster of Excellence EXC 2117 (Project-ID 390829875) and the SFB Transregio 161 (Project-ID 251654672), and the Land Salzburg within the WISS 2025 project IDA-Lab (20102-F1901166-KZP and 20204-WISS/225/197-2019).
Appendix A Implementation Details
For the single-view 3D reconstruction experiment, we closely orient ourselves on the setup by Liu~et al. ~[Liu2019-SoftRas]. We use the same model architecture~[Liu2019-SoftRas] and also train with a batch size of $64$ for $250 000$ steps using the Adam optimizer~[Kingma2014AdamOpt] We also schedule the learning rate to $10^-4$ for the first $150 000$ steps and use a learning rate of $3⋅10^-5$ for the remaining training. At this point (after the first $150 000$ steps), we also decrease the temperature $τ$ by a factor of $0.3$.
Using different learning rates (as an ablation) did not improve the results.
Appendix B Distributions
In this section, we define each of the presented distributions / sigmoid functions. Figure~4 displays the respective CDFs and PDFs.
Note that, for each distribution, the PDFs $f$ is defined as the derivative of the CDF $F$.
Also, note that a reversed (Rev.) CDF is defined as $F_Rev.(x) = 1-F(-x)$, which means that $F_Rev.= F$ for symmetric distributions.
The square-root distribution~$F_sq$ is defined in terms of $F$ as in Equation (5).
Therefore, in the following, we will define the distributions via their CDFs~$F$.
#1
(6) |
#1
(7) |
#1
(8) |
where $y:=(x+1)/2$.
#1
(9) |
#1
(10) |
#1
(11) |
#1
(12) |
#1
(13) |
#1
(14) |
#1
(15) |
#1
(16) |
#1
(17) |
#1
(18) |
#1
(20) |
where $γ(p,x)$ is the lower incomplete gamma function and $p>0$ is the shape parameter.
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Heaviside & Uniform & Cubic Hermite & Wigner Semicircle & Gaussian & Gaussian (sq.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Laplace & Laplace (sq.) & Logistic & Logistic (sq.) & Hyperbolic secant & Hyperbolic secant (sq.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Cauchy & Cauchy (sq.) & Reciprocal & Reciprocal (sq.) & Gumbel-Max & Gumbel-Min |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Exponential & Exponential (sq.) & Exponential (Rev.) & Levy & Levy (sq.) & Levy (Rev.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Gamma ($p=0.5$) & Gamma ($p=1$) & Gamma ($p=2$) & Gamma ($p=.5$, R.) & Gamma ($p=1$, R.) & Gamma ($p=2$, R.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Gamma ($p=0.5$, sq.) & Gamma ($p=1$, sq.) & Gamma ($p=2$, sq.) & Gamma ($p=.5$, R., sq.) & Gamma ($p=1$, R., sq.) & Gamma ($p=2$, R., sq.) |
~
~
Appendix C T-Norms and T-Conorms
The axiomatic approach to multi-valued logics (which we need to combine the occlusions by different faces in a ``soft'' manner) is based on defining reasonable properties for truth functions. We stated the axioms for multi-valued generalizations of the disjunction (logical ``or''), called T-conorms, in Definition~2. Here we complement this with the axioms for multi-valued generalizations of the conjunction (logical ``and''), which are called T-norms.
Definition 6 (T-norm).
A T-norm (triangular norm) is a binary operation~$⊤: [0,1] ×[0,1] →[0,1]$, which satisfies
-
associativity: $⊤(a, ⊤(b,c)) = ⊤(⊤(a,b), c)$,
-
commutativity: $⊤(a,b) = ⊤(b,a)$,
-
monotonicity: $(a ≤c) ∧(b ≤d) ⇒⊤(a,b) ≤⊤(c,d)$,
-
$1$ is a neutral element: $⊤(a,1) = a$.
Clearly these axioms ensure that the corners of the unit square, that is, the value pairs considered in classical logic, are processed as with a standard conjunction: neutral element and commutativity imply that $(1,1) ↦1$, $(0,1) ↦0$, $(1,0) ↦0$. From one of the latter two and monotonicity it follows $(0,0) ↦0$. Analogously, the axioms of T-conorms ensure that the corners of the unit square are processed as with a standard disjunction. Actually, the axioms already fix the values not only at the corners, but on the boundaries of the unit square. Only inside the unit square (that is, for $(0,1)^2$) T-norms (as well as T-conorms) can differ.
Minimum & $⊤^M(a,b)$ & $min(a,b)$ |
Probabilistic & $⊤^P(a,b)$ & $ab$ |
Einstein & $⊤^E(a,b)$ & $ab2-a-b+ab$ |
Hamacher & $⊤^H_p(a,b)$ & $abp+(1-p)(a+b-ab)$ |
Frank & $⊤^F_p(a,b)$ & $log_p(1+(p^a-1)(p^b-1)p-1)$ |
Yager & $⊤^Y_p(a,b)$ & $max(0, 1-((1-a)^p+(1-b)^p)^1p)$ |
Aczél-Alsina & $⊤^A_p(a,b)$ & $exp(-(|log(a)|^p+|log(b)|^p )^1p)$ |
Dombi & $⊤^D_p(a,b)$ & $(1+( (1-aa)^p +(1-bb)^p )^1p)^-1$ |
Schweizer-Sklar & $⊤^S_p(a,b)$ & $(a^p+b^p-1)^1p$ |
In the theory of multi-valued logics, and especially in fuzzy logic [Klir_and_Yuan_1995], it was established that the largest possible T-norm is the minimum and the smallest possible T-conorm is the maximum: for any T-norm~$⊤$ it is $⊤(a,b) ≤min(a,b)$ and for any T-conorm~$⊥$ it is $⊥(a,b) ≥max(a,b)$. The other extremes, that is, the smallest possible T-norm and the largest possible T-conorm are the so-called drastic T-norm, defined as $⊤^∘(a,b) = 0$ for $(a,b) ∈(0,1)^2$, and the drastic T-conorm, defined as $⊥^∘(a,b) = 1$ for $(a,b) ∈(0,1)^2$. Hence it is $⊤(a,b) ≥⊤^∘(a,b)$ for any T-norm~$⊤$ and $⊥(a,b) ≤⊥^∘(a,b)$ for any T-conorm~$⊥$. We do not consider the drastic T-conorm for an occlusion test, because it clearly does not yield useful gradients.
As already mentioned in the paper, it is common to combine a T-norm~$⊤$, a T-conorm~$⊥$ and a negation~$N$ (or complement, most commonly $N(a) = 1-a$) so that DeMorgan's laws hold. Such a triplet is often called a dual triplet. In Tables~5 and~6 we show the formulas for the families of T-norms and T-conorms, respectively, where matching lines together with the standard negation $N(a) = 1-a$ form dual triplets. Note that, for some families, we limited the range of values for the parameter~$p$ (see Table~1) compared to more general definitions~[Klir_and_Yuan_1995].
c.1 T-conorm Plots
Maximum & $⊥^M(a,b)$ & $max(a,b)$ |
Probabilistic & $⊥^P(a,b)$ & $a+b-ab$ |
Einstein & $⊥^E(a,b)$ & $⊥^H_2(a,b)=a+b1+ab$ |
Hamacher & $⊥^H_p(a,b)$ & $a+b+(p-2)ab1+(p-1)ab$ |
Frank & $⊥^F_p(a,b)$ & $1-log_p(1+(p^1-a-1)(p^1-b-1)p-1)$ |
Yager & $⊥^Y_p(a,b)$ & $min(1, (a^p+b^p)^1p)$ |
Aczél-Alsina & $⊥^A_p(a,b)$ & $1 -exp(-(|log(1-a)|^p+|log(1-b)|^p )^1p)$ |
Dombi & $⊥^D_p(a,b)$ & $(1+( (1-aa)^p +(1-bb)^p )^-1p)^-1$ |
Schweizer-Sklar & $⊥^S_p(a,b)$ & $1-((1-a)^p+(1-b)^p-1)^1p$ |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
Appendix D Additional Plots

![]() |
![]() |
![]() |
![]() |
4 Instantiations of the GenDR
Let us proceed by discussing instantiations of the generalized differentiable renderer (GenDR).
T-conorm & & equal to / where & continuous & contin.~diff. & strict & idempotent & nilpotent & Archimedean & $↑/ ↓$~wrt.~$p$ |
(Logical `or') & $∨$ & & (✗) & (✗) & --- & (✓) & --- & --- & --- |
Maximum & $⊥^M$ & & ✓& ✗& ✗& ✓& ✗& ✗& --- |
Probabilistic & $⊥^P$ & $=⊥^H_1 =⊥^A_1$ & ✓& ✓& ✓& ✗& ✗& ✓& --- |
Einstein & $⊥^E$ & $=⊥^H_0$ & ✓& ✓& ✓& ✗& ✗& ✓& --- |
Hamacher & $⊥^H_p$ & $p∈(0, ∞)$ & ✓& ✓& ✓& ✗& ✗& ✓& $↓$ |
Frank & $⊥^F_p$ & $p∈(0, ∞)$ & ✓& ✓& ✓& ✗& ✗& ✓& $↓$ |
Yager & $⊥^Y_p$ & $p∈(0, ∞)$ & ✓& ✗& ✗& ✗& ✓& ✓& $↑$ |
Aczél-Alsina & $⊥^A_p$ & $p∈(0, ∞)$ & ✓& ✓& ✓& ✗& ✗& ✓& $↑$ |
Dombi & $⊥^D_p$ & $p∈(0, ∞)$ & ✓& ✓& ✓& ✗& ✗& ✓& $↑$ |
Schweizer-Sklar & $⊥^SS_p$& $p∈(-∞, 0)$ & ✓& ✓& ✓& ✗& ✗& ✓& --- |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
^A_0.5$. For the last two rows, we use a Cauchy distribution and use $
⊥^P$ and $⊥^Y_2$.Distributions. Figure~LABEL:fig:sigmoid-taxonomy provides a taxonomy of the distributions and sigmoid functions that are visualized in Table~3.2. We classify the distributions into those with finite support as well as others with infinite support, where the support is the set of points for which the PDF is greater than zero. Note that the CDFs are constant outside the support region. Among the distributions with finite support, there is the exact Dirac delta distribution corresponding to the Heaviside function, which yields a discrete, non-differentiable renderer. There are also continuous distributions allowing meaningful gradients, but (due to finite support) only in a limited proximity to each face. Here, we have, among others, the uniform distribution, which corresponds to a piecewise linear step function. The derivative of the uniform distribution is equivalent or very similar (due to minor implementation aspects) to the surrogate gradient of the Neural 3D Mesh Renderer~[Kato2017]. The distributions with infinite support can be categorized into symmetrical and asymmetrical. Among the symmetrical distributions, the Gaussian, the Laplace, the logistic, and the hyperbolic secant have an exponential convergence behavior or exponential decay of probability density. On the other hand, there is also the Cauchy distribution which has a linear convergence. This yields a significantly different behavior. We include the algebraic function~$x↦x / (2 + 2|x|) + 1/2$ and call it reciprocal sigmoid. This also has a linear convergence. Finally, we consider asymmetrical distributions with infinite support. The Gumbel-Max and Gumbel-Min are extreme value distributions~[Coles2001] and two-sided, which means that their support covers both positive and negative arguments. The exponential, Gamma, and Levy distributions are one-sided distributions. Here, it is important to not only consider the original distributions but also their mirrored or reversed variants, as well as shifted variations as can be seen in the last three rows of Table~3.2.
SoftRas~[Liu2019-SoftRas] squares the absolute part of the distance before applying the logistic sigmoid function and thus models the square roots of logistic perturbations. Instead of modifying the argument of~$F$, we instead interpret it as applying a transformed counterpart CDF~$F_sq$, which is more in line with the probabilistic interpretation in Equation~(1). More precisely, we compute the occlusion probability as
(5) |
That means that for each choice of~$F$, we obtain a counterpart~$F_sq$. A selection of these for different CDFs~$F$ is visualized in Table~3.2 denoted by ``(squares)''. For a mathematical definition of each sigmoid function, see SM~B.
Aggregations. Table~1 provides an overview over selected T-conorms and displays their properties. The logical `or' is not a T-conorm but the discrete and discontinuous equivalent, which is why we include it here. While there are also discontinuous T-conorms such as the drastic T-cornom, these are naturally not suitable for a differentiable renderer, which is why we exlude them. All except for the Max and Yager T-conorms are continuously differentiable.
The top four rows in Table~1 contain individual T-conorms, and the remainder are families of T-conorms. Here, we selected only suitable ranges for the parameter~$p$. Note that there are some cases in which the T-conorms coincide, e.g, $⊥^P=⊥^H_1 =⊥^A_1$. A discussion of the remaining properties and a mathematical definition of each T-conorm can be found in SM~C. Figure~3 displays some of the T-conorms and illustrates different properties. In Figure~4, we display example renderings with different settings and provide a visual comparison on how the aggregation function affects rendering.
Renderer & Distribution & T-conorm |
OpenDR~[Loper2014] & Uniform (backward) & --- |
N3MR~[Kato2017] & Uniform (backward) & --- |
Rhodin~et al. ~[rhodin2015versatile] & Gaussian & $⊥^P$ |
SoftRas~[Kato2017] & Square-root of Logistic & $⊥^P$ |
Log.~Relax~[petersen2021learning] & Logistic & $⊥^P$ |
DIB-R~[Chen2019DIB] & Exponential & $⊥^P$ |
Existing Special Cases of GenDR. In Table~2, we list which existing differentiable renderers are conceptually instances of GenDR. These renderers do each have some other differences, but one key difference lies in the type of distribution employed. Differences regarding shading are also discussed at the end of Section~3.
5 Experiments111The source code will be available at github.com/Felix-Petersen/gendr.
5.1 Shape Optimization
Our first experiment is a shape optimization task. Here, we use the mesh of an airplane, and render it from $24$ azimuths using a hard renderer. The task is to optimize a mesh (initialized as a sphere) to fit the silhouette of the airplane within $100$ optimization steps. Limiting the task to $100$ optimization steps is critical for two reasons: (i) The task can be considered to be solved perfectly with any differentiable renderer that produces the correct gradient sign within a large number of steps, but we are interested in the quality of the gradients for the optimization task and how efficient each renderer is. (ii) The total evaluation is computationally expensive because we evaluate a total of $1 242$ renderers and perform a grid search over the distribution parameters for each one to provide a fair and reliable comparison.
Setup.
For optimization, we use the Adam optimizer~[Kingma2014AdamOpt] with parameters $β_1=0.5, β_2=0.95$.
For each setting, we perform a grid search over three learning rates ($λ∈{10^-1.25, 10^-1.5, 10^-1.75}$) and temperatures $τ∈{10^-0.1⋅n | n∈N, 0≤n≤80}$.
Here, $λ= 10^-1.5≈ 0.03$ performs best in almost all cases. As for the scale hyperparameter, it is important to use a fine-grained as well as large grid because this behaves differently for each distribution.
Here, we intentionally chose the grid larger than the range of reasonable values to ensure that the best choice is used for each setting; the extreme values for the scale were never optimal.
We perform this evaluation from five different elevation angles ${-60^
![]() |
![]() |

Results. In Figure~5, we display the results of our evaluation. We can observe that the regular distributions~$F$ typically perform better than the counterpart~$F_sq$, except for the case of Cauchy and reciprocal sigmoid, which are those with a linear convergence rate. We explain this by the fact that by squaring the distance before applying the sigmoid function, the function has a quadratic convergence rate instead. As the linearly converging functions also perform poorly in comparison to the exponentially converging functions (Gaussian, Laplace, Logistic, Gudermannian), we conclude that linear convergence is inferior to quadratic and exponential convergence. Columns~$1-3$ contain the distributions with finite support, and these do not perform very well on this task. The block of exponentially decaying distributions (columns $4-7$) performs well. The block of linearly decaying distributions (columns $8-9$) performs badly, as discussed above. The block of Levy distributions (last $4$ columns) performs even worse because it has an even slower convergence. Here, it also becomes slightly better in the squared setting, but it still exhibits worse performance than for linear convergence.
Comparison of Distributions. Gumbel, exponential, and gamma distributions do not all perform equally well, but Gumbel-Min, the reversed exponential, and the reversed gamma are all competitive.
Confer Table~
Comparison of T-conorms. We find that $⊥^M$ and ``average'' (which is not a T-conorm but was used as a baseline in~[Liu2019-SoftRas]) perform poorly. Also, $⊥^Y_4$, $⊥^A_2$, $⊥^A_4$, $⊥^D_2$, $⊥^D_4$, $⊥^SS_-2$, and $⊥^SS_-4$ perform poorly overall. This can be explained as they are rather extreme members of their respective T-norm families; in all of them, the $p$th power is involved, which can become a problematic component, e.g., $x^4$ is vanishingly small for~$x=0.5$. Interestingly, the gamma and the exponential distributions still perform well with these, likely since they are not symmetric and have an accumulated probability of~$1$ on the edge. Notably, the Yager T-conorm ($p=2$) performs very well, although having a plateau and thus no meaningful gradient outside the unit disc, see Figure~3.
Finally, we compute histograms of how many times each respective distribution and T-conorm is involved in the best~$10%$ of overall results. This is independent for the top and bottom plots. We can observe that Gamma ($p=0.5$, Reversed) performs the best overall (because it is more robust to the choice of T-conorm). Among the T-conorms, we find that $⊥^Y_2$ and $⊥^D_0.5$ perform best. The probabilistic and Einstein sums perform equally, and share the next place.

Method & Airplane& Bench & Dresser& Car & Chair & Display& Lamp & Speaker& Rifle & Sofa & Table & Phone & Vessel & Mean |
Kato et al. [Kato2017] N3MR (Uniform Backward) & 0.6172 & 0.4998 & 0.7143 & 0.7095 & 0.4990 & 0.5831 & 0.4126 & 0.6536 & 0.6322 & 0.6735 & 0.4829 & 0.7777 & 0.5645 & 0.6015 |
Liu et al. [Liu2019-SoftRas] SoftRas (Square-root of Logistic) & 0.6419 & 0.5080 & 0.7116 & 0.7697 & 0.5270 & 0.6156 & 0.4628 & 0.6654 & 0.6811 & 0.6878 & 0.4487 & 0.7895 & 0.5953 & 0.6234 |
Chen~et al. ~[Chen2019DIB] DIB-R (Exponential) & 0.570 & 0.498 & 0.763 & 0.788 & 0.527 & 0.588 & 0.403 & 0.726 & 0.561 & 0.677 & 0.508 & 0.743 & 0.609 & 0.612 |
Probabilistic + Uniform~~($≈$[Kato2017, Loper2014]) & 0.6456 & 0.4855 & 0.7113 & #1 & 0.5276 & 0.6126 & 0.4611 & 0.6651 & #1 & 0.6835 & 0.4514 & 0.8148 & 0.5971 & #1 |
Probabilistic + Logistic~~($=$[petersen2021learning]) & 0.6396 & 0.5005 & 0.7105 & 0.7471 & 0.5288 & 0.6022 & 0.4586 & 0.6639 & 0.6742 & 0.6660 & 0.4666 & 0.7771 & 0.5980 & 0.6179 |
Probabilistic + Logistic (squares)~~($=$[Liu2019-SoftRas]) & 0.6416 & 0.4966 & #1 & 0.7386 & 0.5224 & #1 & 0.4550 & #1 & #1 & 0.6818 & 0.4529 & #1 & 0.5984 & 0.6217 |
Probabilistic + Exponential (R)~~($=$[Chen2019DIB]) & 0.6321 & 0.4857 & 0.7123 & 0.7298 & 0.5178 & 0.5983 & 0.4611 & 0.6642 & 0.6713 & 0.6546 & 0.4700 & 0.7717 & #1 & 0.6130 |
Probabilistic + Gaussian~~($≈$[rhodin2015versatile]) & 0.5922 & #1 & 0.7104 & 0.7561 & #1 & 0.6080 & 0.4399 & #1 & 0.6533 & #1 & #1 & 0.7301 & 0.5894 & 0.6125 |
Probabilistic + Gamma (R) & #1 & 0.4842 & 0.7093 & 0.7220 & 0.5159 & 0.6033 & #1 & 0.6626 & 0.6719 & 0.6505 & 0.4642 & 0.7778 & 0.5978 & 0.6133 |
Einstein + Gamma (R, squares) & 0.6438 & 0.4816 & #1 & 0.7284 & 0.5170 & 0.6111 & 0.4654 & 0.6647 & 0.6760 & 0.6546 & 0.4626 & #1 & 0.5973 & 0.6184 |
Yager (p=2) + Cauchy (squares) & 0.6380 & #1 & 0.7047 & 0.7359 & 0.5188 & 0.5976 & 0.4617 & 0.6612 & 0.6726 & 0.6619 & 0.4819 & 0.7560 & #1 & 0.6149 |
5.2 Camera Pose Optimization
In our second experiment, the goal is to find the camera pose for a model of a teapot from a reference image. The angle is randomly modified by an angle uniformly drawn from $[15^∘, 75^∘]$, and the distance and camera view angle are also randomized. We sample~$600$ pairs of a reference image and an initialization and use this set of settings for each method. For optimization, we use Adam with a learning rate of either~$0.1$ or~$0.3$ (via grid search) and optimize for~$1000$ steps. During the optimization, we transition an initial scale of~$σ=10^-1$ logarithmically to a final value of~$σ=10^-7$. This allows us to avoid a grid search for the optimal scale, and makes sense since an initially large~$σ$ is beneficial for pose optimization, because a smoother model has a higher probability of finding the correct orientation of the object. This contrasts with the setting of shape estimation, where this would be fatal because the vertices would collapse to the center.
Results. In Figure~6, we display the results of this experiment. A corresponding image of the counterpart distributions~$F_sq$ as well as results for the experiment applied to the model of a chair can be found in SM~D. The metric is the fraction of settings which achieved matching the ground truth pose up to $3^∘$. We find that in this experiment, the results are similar to those in the shape optimization experiment. Note that there are larger yellow areas because the color map ranges from $0%$ to $90%$, while in the shape optimization plot the color map ranges in a rather narrow loss range.
5.3 Single-View 3D Reconstruction
Setup. Finally, we reproduce the popular ShapeNet single-view 3D reconstruction experiment from~[Kato2017, Liu2019-SoftRas, Chen2019DIB, petersen2021learning]. We select three T-conorms ($⊥^P, ⊥^E, ⊥^Y_2$) and~$10$ distributions (Uniform, Gaussian, Logistic, Logistic (squares), Cauchy, Cauchy (squares), Gumbel-Min, Gamma (R, $p=0.5$), Gamma (R, $p=0.5$, squares), and Exponential (R)). These have been selected because they have been used in previous works, are notable (Cauchy, Gumbel-Min, Einstein), or have performed especially well in the aircraft shape optimization experiment (Gamma, Yager). For each setting, we perform a grid search of~$τ$ at resolution~$10^0.5$. Further experimental details can be found in SM~A.
Results. In Figure~7, we display and discuss the class-wise results for all $30$ selected renderers. In Table~3, we show the (self-) reported results for existing differentiable renderers in the top block. In the bottom block, we display our results for the methods that are equivalent ($=$) or very similar ($≈$) to the six existing differentiable renderers. The differences for equivalent methods can be explained with small variations in the setting and minor implementation differences. Additionally, we include three noteworthy alternative renderers, such as the one that also performed best on the prior airplane shape optimization task. We conclude that the optimal choice of renderer heavily depends on the characteristics of the 3D models and the task. Surprisingly, we find that the simple uniform method achieves consistently good results and the best average score.
6 Discussion and Conclusion
In this work, we generalized differentiable mesh renderers and explored a large space of instantiations of our generalized renderer GenDR. We found that there are significant differences between different distributions for the occlusion test but also between different T-conorms for the aggregation. In our experiments, we observed that the choice of renderer has a large impact on the kind of models that can be rendered most effectively. We find that the uniform distribution outperforms the other tested distributions on average, which is surprising considering it simplicity. Remarkably, the uniform distribution had already been used implicitly for the early surrogate gradient renderers but was later discarded for the approximate differentiable renderers.
Acknowledgments. This work was supported by the DFG in the Cluster of Excellence EXC 2117 (Project-ID 390829875) and the SFB Transregio 161 (Project-ID 251654672), and the Land Salzburg within the WISS 2025 project IDA-Lab (20102-F1901166-KZP and 20204-WISS/225/197-2019).
Appendix A Implementation Details
For the single-view 3D reconstruction experiment, we closely orient ourselves on the setup by Liu~et al. ~[Liu2019-SoftRas]. We use the same model architecture~[Liu2019-SoftRas] and also train with a batch size of $64$ for $250 000$ steps using the Adam optimizer~[Kingma2014AdamOpt] We also schedule the learning rate to $10^-4$ for the first $150 000$ steps and use a learning rate of $3⋅10^-5$ for the remaining training. At this point (after the first $150 000$ steps), we also decrease the temperature $τ$ by a factor of $0.3$.
Using different learning rates (as an ablation) did not improve the results.
Appendix B Distributions
In this section, we define each of the presented distributions / sigmoid functions. Figure~4 displays the respective CDFs and PDFs.
Note that, for each distribution, the PDFs $f$ is defined as the derivative of the CDF $F$.
Also, note that a reversed (Rev.) CDF is defined as $F_Rev.(x) = 1-F(-x)$, which means that $F_Rev.= F$ for symmetric distributions.
The square-root distribution~$F_sq$ is defined in terms of $F$ as in Equation (5).
Therefore, in the following, we will define the distributions via their CDFs~$F$.
#1
(6) |
#1
(7) |
#1
(8) |
where $y:=(x+1)/2$.
#1
(9) |
#1
(10) |
#1
(11) |
#1
(12) |
#1
(13) |
#1
(14) |
#1
(15) |
#1
(16) |
#1
(17) |
#1
(18) |
#1
(20) |
where $γ(p,x)$ is the lower incomplete gamma function and $p>0$ is the shape parameter.
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Heaviside & Uniform & Cubic Hermite & Wigner Semicircle & Gaussian & Gaussian (sq.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Laplace & Laplace (sq.) & Logistic & Logistic (sq.) & Hyperbolic secant & Hyperbolic secant (sq.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Cauchy & Cauchy (sq.) & Reciprocal & Reciprocal (sq.) & Gumbel-Max & Gumbel-Min |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Exponential & Exponential (sq.) & Exponential (Rev.) & Levy & Levy (sq.) & Levy (Rev.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Gamma ($p=0.5$) & Gamma ($p=1$) & Gamma ($p=2$) & Gamma ($p=.5$, R.) & Gamma ($p=1$, R.) & Gamma ($p=2$, R.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Gamma ($p=0.5$, sq.) & Gamma ($p=1$, sq.) & Gamma ($p=2$, sq.) & Gamma ($p=.5$, R., sq.) & Gamma ($p=1$, R., sq.) & Gamma ($p=2$, R., sq.) |
~
~
Appendix C T-Norms and T-Conorms
The axiomatic approach to multi-valued logics (which we need to combine the occlusions by different faces in a ``soft'' manner) is based on defining reasonable properties for truth functions. We stated the axioms for multi-valued generalizations of the disjunction (logical ``or''), called T-conorms, in Definition~2. Here we complement this with the axioms for multi-valued generalizations of the conjunction (logical ``and''), which are called T-norms.
Definition 6 (T-norm).
A T-norm (triangular norm) is a binary operation~$⊤: [0,1] ×[0,1] →[0,1]$, which satisfies
-
associativity: $⊤(a, ⊤(b,c)) = ⊤(⊤(a,b), c)$,
-
commutativity: $⊤(a,b) = ⊤(b,a)$,
-
monotonicity: $(a ≤c) ∧(b ≤d) ⇒⊤(a,b) ≤⊤(c,d)$,
-
$1$ is a neutral element: $⊤(a,1) = a$.
Clearly these axioms ensure that the corners of the unit square, that is, the value pairs considered in classical logic, are processed as with a standard conjunction: neutral element and commutativity imply that $(1,1) ↦1$, $(0,1) ↦0$, $(1,0) ↦0$. From one of the latter two and monotonicity it follows $(0,0) ↦0$. Analogously, the axioms of T-conorms ensure that the corners of the unit square are processed as with a standard disjunction. Actually, the axioms already fix the values not only at the corners, but on the boundaries of the unit square. Only inside the unit square (that is, for $(0,1)^2$) T-norms (as well as T-conorms) can differ.
Minimum & $⊤^M(a,b)$ & $min(a,b)$ |
Probabilistic & $⊤^P(a,b)$ & $ab$ |
Einstein & $⊤^E(a,b)$ & $ab2-a-b+ab$ |
Hamacher & $⊤^H_p(a,b)$ & $abp+(1-p)(a+b-ab)$ |
Frank & $⊤^F_p(a,b)$ & $log_p(1+(p^a-1)(p^b-1)p-1)$ |
Yager & $⊤^Y_p(a,b)$ & $max(0, 1-((1-a)^p+(1-b)^p)^1p)$ |
Aczél-Alsina & $⊤^A_p(a,b)$ & $exp(-(|log(a)|^p+|log(b)|^p )^1p)$ |
Dombi & $⊤^D_p(a,b)$ & $(1+( (1-aa)^p +(1-bb)^p )^1p)^-1$ |
Schweizer-Sklar & $⊤^S_p(a,b)$ & $(a^p+b^p-1)^1p$ |
In the theory of multi-valued logics, and especially in fuzzy logic [Klir_and_Yuan_1995], it was established that the largest possible T-norm is the minimum and the smallest possible T-conorm is the maximum: for any T-norm~$⊤$ it is $⊤(a,b) ≤min(a,b)$ and for any T-conorm~$⊥$ it is $⊥(a,b) ≥max(a,b)$. The other extremes, that is, the smallest possible T-norm and the largest possible T-conorm are the so-called drastic T-norm, defined as $⊤^∘(a,b) = 0$ for $(a,b) ∈(0,1)^2$, and the drastic T-conorm, defined as $⊥^∘(a,b) = 1$ for $(a,b) ∈(0,1)^2$. Hence it is $⊤(a,b) ≥⊤^∘(a,b)$ for any T-norm~$⊤$ and $⊥(a,b) ≤⊥^∘(a,b)$ for any T-conorm~$⊥$. We do not consider the drastic T-conorm for an occlusion test, because it clearly does not yield useful gradients.
As already mentioned in the paper, it is common to combine a T-norm~$⊤$, a T-conorm~$⊥$ and a negation~$N$ (or complement, most commonly $N(a) = 1-a$) so that DeMorgan's laws hold. Such a triplet is often called a dual triplet. In Tables~5 and~6 we show the formulas for the families of T-norms and T-conorms, respectively, where matching lines together with the standard negation $N(a) = 1-a$ form dual triplets. Note that, for some families, we limited the range of values for the parameter~$p$ (see Table~1) compared to more general definitions~[Klir_and_Yuan_1995].
c.1 T-conorm Plots
Maximum & $⊥^M(a,b)$ & $max(a,b)$ |
Probabilistic & $⊥^P(a,b)$ & $a+b-ab$ |
Einstein & $⊥^E(a,b)$ & $⊥^H_2(a,b)=a+b1+ab$ |
Hamacher & $⊥^H_p(a,b)$ & $a+b+(p-2)ab1+(p-1)ab$ |
Frank & $⊥^F_p(a,b)$ & $1-log_p(1+(p^1-a-1)(p^1-b-1)p-1)$ |
Yager & $⊥^Y_p(a,b)$ & $min(1, (a^p+b^p)^1p)$ |
Aczél-Alsina & $⊥^A_p(a,b)$ & $1 -exp(-(|log(1-a)|^p+|log(1-b)|^p )^1p)$ |
Dombi & $⊥^D_p(a,b)$ & $(1+( (1-aa)^p +(1-bb)^p )^-1p)^-1$ |
Schweizer-Sklar & $⊥^S_p(a,b)$ & $1-((1-a)^p+(1-b)^p-1)^1p$ |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
Appendix D Additional Plots

![]() |
![]() |
![]() |
![]() |
5 Experiments111The source code will be available at github.com/Felix-Petersen/gendr.
5.1 Shape Optimization
Our first experiment is a shape optimization task. Here, we use the mesh of an airplane, and render it from $24$ azimuths using a hard renderer. The task is to optimize a mesh (initialized as a sphere) to fit the silhouette of the airplane within $100$ optimization steps. Limiting the task to $100$ optimization steps is critical for two reasons: (i) The task can be considered to be solved perfectly with any differentiable renderer that produces the correct gradient sign within a large number of steps, but we are interested in the quality of the gradients for the optimization task and how efficient each renderer is. (ii) The total evaluation is computationally expensive because we evaluate a total of $1 242$ renderers and perform a grid search over the distribution parameters for each one to provide a fair and reliable comparison.
Setup.
For optimization, we use the Adam optimizer~[Kingma2014AdamOpt] with parameters $β_1=0.5, β_2=0.95$.
For each setting, we perform a grid search over three learning rates ($λ∈{10^-1.25, 10^-1.5, 10^-1.75}$) and temperatures $τ∈{10^-0.1⋅n | n∈N, 0≤n≤80}$.
Here, $λ= 10^-1.5≈ 0.03$ performs best in almost all cases. As for the scale hyperparameter, it is important to use a fine-grained as well as large grid because this behaves differently for each distribution.
Here, we intentionally chose the grid larger than the range of reasonable values to ensure that the best choice is used for each setting; the extreme values for the scale were never optimal.
We perform this evaluation from five different elevation angles ${-60^
![]() |
![]() |

Results. In Figure~5, we display the results of our evaluation. We can observe that the regular distributions~$F$ typically perform better than the counterpart~$F_sq$, except for the case of Cauchy and reciprocal sigmoid, which are those with a linear convergence rate. We explain this by the fact that by squaring the distance before applying the sigmoid function, the function has a quadratic convergence rate instead. As the linearly converging functions also perform poorly in comparison to the exponentially converging functions (Gaussian, Laplace, Logistic, Gudermannian), we conclude that linear convergence is inferior to quadratic and exponential convergence. Columns~$1-3$ contain the distributions with finite support, and these do not perform very well on this task. The block of exponentially decaying distributions (columns $4-7$) performs well. The block of linearly decaying distributions (columns $8-9$) performs badly, as discussed above. The block of Levy distributions (last $4$ columns) performs even worse because it has an even slower convergence. Here, it also becomes slightly better in the squared setting, but it still exhibits worse performance than for linear convergence.
Comparison of Distributions. Gumbel, exponential, and gamma distributions do not all perform equally well, but Gumbel-Min, the reversed exponential, and the reversed gamma are all competitive.
Confer Table~
Comparison of T-conorms. We find that $⊥^M$ and ``average'' (which is not a T-conorm but was used as a baseline in~[Liu2019-SoftRas]) perform poorly. Also, $⊥^Y_4$, $⊥^A_2$, $⊥^A_4$, $⊥^D_2$, $⊥^D_4$, $⊥^SS_-2$, and $⊥^SS_-4$ perform poorly overall. This can be explained as they are rather extreme members of their respective T-norm families; in all of them, the $p$th power is involved, which can become a problematic component, e.g., $x^4$ is vanishingly small for~$x=0.5$. Interestingly, the gamma and the exponential distributions still perform well with these, likely since they are not symmetric and have an accumulated probability of~$1$ on the edge. Notably, the Yager T-conorm ($p=2$) performs very well, although having a plateau and thus no meaningful gradient outside the unit disc, see Figure~3.
Finally, we compute histograms of how many times each respective distribution and T-conorm is involved in the best~$10%$ of overall results. This is independent for the top and bottom plots. We can observe that Gamma ($p=0.5$, Reversed) performs the best overall (because it is more robust to the choice of T-conorm). Among the T-conorms, we find that $⊥^Y_2$ and $⊥^D_0.5$ perform best. The probabilistic and Einstein sums perform equally, and share the next place.

Method & Airplane& Bench & Dresser& Car & Chair & Display& Lamp & Speaker& Rifle & Sofa & Table & Phone & Vessel & Mean |
Kato et al. [Kato2017] N3MR (Uniform Backward) & 0.6172 & 0.4998 & 0.7143 & 0.7095 & 0.4990 & 0.5831 & 0.4126 & 0.6536 & 0.6322 & 0.6735 & 0.4829 & 0.7777 & 0.5645 & 0.6015 |
Liu et al. [Liu2019-SoftRas] SoftRas (Square-root of Logistic) & 0.6419 & 0.5080 & 0.7116 & 0.7697 & 0.5270 & 0.6156 & 0.4628 & 0.6654 & 0.6811 & 0.6878 & 0.4487 & 0.7895 & 0.5953 & 0.6234 |
Chen~et al. ~[Chen2019DIB] DIB-R (Exponential) & 0.570 & 0.498 & 0.763 & 0.788 & 0.527 & 0.588 & 0.403 & 0.726 & 0.561 & 0.677 & 0.508 & 0.743 & 0.609 & 0.612 |
Probabilistic + Uniform~~($≈$[Kato2017, Loper2014]) & 0.6456 & 0.4855 & 0.7113 & #1 & 0.5276 & 0.6126 & 0.4611 & 0.6651 & #1 & 0.6835 & 0.4514 & 0.8148 & 0.5971 & #1 |
Probabilistic + Logistic~~($=$[petersen2021learning]) & 0.6396 & 0.5005 & 0.7105 & 0.7471 & 0.5288 & 0.6022 & 0.4586 & 0.6639 & 0.6742 & 0.6660 & 0.4666 & 0.7771 & 0.5980 & 0.6179 |
Probabilistic + Logistic (squares)~~($=$[Liu2019-SoftRas]) & 0.6416 & 0.4966 & #1 & 0.7386 & 0.5224 & #1 & 0.4550 & #1 & #1 & 0.6818 & 0.4529 & #1 & 0.5984 & 0.6217 |
Probabilistic + Exponential (R)~~($=$[Chen2019DIB]) & 0.6321 & 0.4857 & 0.7123 & 0.7298 & 0.5178 & 0.5983 & 0.4611 & 0.6642 & 0.6713 & 0.6546 & 0.4700 & 0.7717 & #1 & 0.6130 |
Probabilistic + Gaussian~~($≈$[rhodin2015versatile]) & 0.5922 & #1 & 0.7104 & 0.7561 & #1 & 0.6080 & 0.4399 & #1 & 0.6533 & #1 & #1 & 0.7301 & 0.5894 & 0.6125 |
Probabilistic + Gamma (R) & #1 & 0.4842 & 0.7093 & 0.7220 & 0.5159 & 0.6033 & #1 & 0.6626 & 0.6719 & 0.6505 & 0.4642 & 0.7778 & 0.5978 & 0.6133 |
Einstein + Gamma (R, squares) & 0.6438 & 0.4816 & #1 & 0.7284 & 0.5170 & 0.6111 & 0.4654 & 0.6647 & 0.6760 & 0.6546 & 0.4626 & #1 & 0.5973 & 0.6184 |
Yager (p=2) + Cauchy (squares) & 0.6380 & #1 & 0.7047 & 0.7359 & 0.5188 & 0.5976 & 0.4617 & 0.6612 & 0.6726 & 0.6619 & 0.4819 & 0.7560 & #1 & 0.6149 |
5.2 Camera Pose Optimization
In our second experiment, the goal is to find the camera pose for a model of a teapot from a reference image. The angle is randomly modified by an angle uniformly drawn from $[15^∘, 75^∘]$, and the distance and camera view angle are also randomized. We sample~$600$ pairs of a reference image and an initialization and use this set of settings for each method. For optimization, we use Adam with a learning rate of either~$0.1$ or~$0.3$ (via grid search) and optimize for~$1000$ steps. During the optimization, we transition an initial scale of~$σ=10^-1$ logarithmically to a final value of~$σ=10^-7$. This allows us to avoid a grid search for the optimal scale, and makes sense since an initially large~$σ$ is beneficial for pose optimization, because a smoother model has a higher probability of finding the correct orientation of the object. This contrasts with the setting of shape estimation, where this would be fatal because the vertices would collapse to the center.
Results. In Figure~6, we display the results of this experiment. A corresponding image of the counterpart distributions~$F_sq$ as well as results for the experiment applied to the model of a chair can be found in SM~D. The metric is the fraction of settings which achieved matching the ground truth pose up to $3^∘$. We find that in this experiment, the results are similar to those in the shape optimization experiment. Note that there are larger yellow areas because the color map ranges from $0%$ to $90%$, while in the shape optimization plot the color map ranges in a rather narrow loss range.
5.3 Single-View 3D Reconstruction
Setup. Finally, we reproduce the popular ShapeNet single-view 3D reconstruction experiment from~[Kato2017, Liu2019-SoftRas, Chen2019DIB, petersen2021learning]. We select three T-conorms ($⊥^P, ⊥^E, ⊥^Y_2$) and~$10$ distributions (Uniform, Gaussian, Logistic, Logistic (squares), Cauchy, Cauchy (squares), Gumbel-Min, Gamma (R, $p=0.5$), Gamma (R, $p=0.5$, squares), and Exponential (R)). These have been selected because they have been used in previous works, are notable (Cauchy, Gumbel-Min, Einstein), or have performed especially well in the aircraft shape optimization experiment (Gamma, Yager). For each setting, we perform a grid search of~$τ$ at resolution~$10^0.5$. Further experimental details can be found in SM~A.
Results. In Figure~7, we display and discuss the class-wise results for all $30$ selected renderers. In Table~3, we show the (self-) reported results for existing differentiable renderers in the top block. In the bottom block, we display our results for the methods that are equivalent ($=$) or very similar ($≈$) to the six existing differentiable renderers. The differences for equivalent methods can be explained with small variations in the setting and minor implementation differences. Additionally, we include three noteworthy alternative renderers, such as the one that also performed best on the prior airplane shape optimization task. We conclude that the optimal choice of renderer heavily depends on the characteristics of the 3D models and the task. Surprisingly, we find that the simple uniform method achieves consistently good results and the best average score.
6 Discussion and Conclusion
In this work, we generalized differentiable mesh renderers and explored a large space of instantiations of our generalized renderer GenDR. We found that there are significant differences between different distributions for the occlusion test but also between different T-conorms for the aggregation. In our experiments, we observed that the choice of renderer has a large impact on the kind of models that can be rendered most effectively. We find that the uniform distribution outperforms the other tested distributions on average, which is surprising considering it simplicity. Remarkably, the uniform distribution had already been used implicitly for the early surrogate gradient renderers but was later discarded for the approximate differentiable renderers.
Acknowledgments. This work was supported by the DFG in the Cluster of Excellence EXC 2117 (Project-ID 390829875) and the SFB Transregio 161 (Project-ID 251654672), and the Land Salzburg within the WISS 2025 project IDA-Lab (20102-F1901166-KZP and 20204-WISS/225/197-2019).
Appendix A Implementation Details
For the single-view 3D reconstruction experiment, we closely orient ourselves on the setup by Liu~et al. ~[Liu2019-SoftRas]. We use the same model architecture~[Liu2019-SoftRas] and also train with a batch size of $64$ for $250 000$ steps using the Adam optimizer~[Kingma2014AdamOpt] We also schedule the learning rate to $10^-4$ for the first $150 000$ steps and use a learning rate of $3⋅10^-5$ for the remaining training. At this point (after the first $150 000$ steps), we also decrease the temperature $τ$ by a factor of $0.3$.
Using different learning rates (as an ablation) did not improve the results.
Appendix B Distributions
In this section, we define each of the presented distributions / sigmoid functions. Figure~4 displays the respective CDFs and PDFs.
Note that, for each distribution, the PDFs $f$ is defined as the derivative of the CDF $F$.
Also, note that a reversed (Rev.) CDF is defined as $F_Rev.(x) = 1-F(-x)$, which means that $F_Rev.= F$ for symmetric distributions.
The square-root distribution~$F_sq$ is defined in terms of $F$ as in Equation (5).
Therefore, in the following, we will define the distributions via their CDFs~$F$.
#1
(6) |
#1
(7) |
#1
(8) |
where $y:=(x+1)/2$.
#1
(9) |
#1
(10) |
#1
(11) |
#1
(12) |
#1
(13) |
#1
(14) |
#1
(15) |
#1
(16) |
#1
(17) |
#1
(18) |
#1
(20) |
where $γ(p,x)$ is the lower incomplete gamma function and $p>0$ is the shape parameter.
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Heaviside & Uniform & Cubic Hermite & Wigner Semicircle & Gaussian & Gaussian (sq.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Laplace & Laplace (sq.) & Logistic & Logistic (sq.) & Hyperbolic secant & Hyperbolic secant (sq.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Cauchy & Cauchy (sq.) & Reciprocal & Reciprocal (sq.) & Gumbel-Max & Gumbel-Min |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Exponential & Exponential (sq.) & Exponential (Rev.) & Levy & Levy (sq.) & Levy (Rev.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Gamma ($p=0.5$) & Gamma ($p=1$) & Gamma ($p=2$) & Gamma ($p=.5$, R.) & Gamma ($p=1$, R.) & Gamma ($p=2$, R.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Gamma ($p=0.5$, sq.) & Gamma ($p=1$, sq.) & Gamma ($p=2$, sq.) & Gamma ($p=.5$, R., sq.) & Gamma ($p=1$, R., sq.) & Gamma ($p=2$, R., sq.) |
~
~
Appendix C T-Norms and T-Conorms
The axiomatic approach to multi-valued logics (which we need to combine the occlusions by different faces in a ``soft'' manner) is based on defining reasonable properties for truth functions. We stated the axioms for multi-valued generalizations of the disjunction (logical ``or''), called T-conorms, in Definition~2. Here we complement this with the axioms for multi-valued generalizations of the conjunction (logical ``and''), which are called T-norms.
Definition 6 (T-norm).
A T-norm (triangular norm) is a binary operation~$⊤: [0,1] ×[0,1] →[0,1]$, which satisfies
-
associativity: $⊤(a, ⊤(b,c)) = ⊤(⊤(a,b), c)$,
-
commutativity: $⊤(a,b) = ⊤(b,a)$,
-
monotonicity: $(a ≤c) ∧(b ≤d) ⇒⊤(a,b) ≤⊤(c,d)$,
-
$1$ is a neutral element: $⊤(a,1) = a$.
Clearly these axioms ensure that the corners of the unit square, that is, the value pairs considered in classical logic, are processed as with a standard conjunction: neutral element and commutativity imply that $(1,1) ↦1$, $(0,1) ↦0$, $(1,0) ↦0$. From one of the latter two and monotonicity it follows $(0,0) ↦0$. Analogously, the axioms of T-conorms ensure that the corners of the unit square are processed as with a standard disjunction. Actually, the axioms already fix the values not only at the corners, but on the boundaries of the unit square. Only inside the unit square (that is, for $(0,1)^2$) T-norms (as well as T-conorms) can differ.
Minimum & $⊤^M(a,b)$ & $min(a,b)$ |
Probabilistic & $⊤^P(a,b)$ & $ab$ |
Einstein & $⊤^E(a,b)$ & $ab2-a-b+ab$ |
Hamacher & $⊤^H_p(a,b)$ & $abp+(1-p)(a+b-ab)$ |
Frank & $⊤^F_p(a,b)$ & $log_p(1+(p^a-1)(p^b-1)p-1)$ |
Yager & $⊤^Y_p(a,b)$ & $max(0, 1-((1-a)^p+(1-b)^p)^1p)$ |
Aczél-Alsina & $⊤^A_p(a,b)$ & $exp(-(|log(a)|^p+|log(b)|^p )^1p)$ |
Dombi & $⊤^D_p(a,b)$ & $(1+( (1-aa)^p +(1-bb)^p )^1p)^-1$ |
Schweizer-Sklar & $⊤^S_p(a,b)$ & $(a^p+b^p-1)^1p$ |
In the theory of multi-valued logics, and especially in fuzzy logic [Klir_and_Yuan_1995], it was established that the largest possible T-norm is the minimum and the smallest possible T-conorm is the maximum: for any T-norm~$⊤$ it is $⊤(a,b) ≤min(a,b)$ and for any T-conorm~$⊥$ it is $⊥(a,b) ≥max(a,b)$. The other extremes, that is, the smallest possible T-norm and the largest possible T-conorm are the so-called drastic T-norm, defined as $⊤^∘(a,b) = 0$ for $(a,b) ∈(0,1)^2$, and the drastic T-conorm, defined as $⊥^∘(a,b) = 1$ for $(a,b) ∈(0,1)^2$. Hence it is $⊤(a,b) ≥⊤^∘(a,b)$ for any T-norm~$⊤$ and $⊥(a,b) ≤⊥^∘(a,b)$ for any T-conorm~$⊥$. We do not consider the drastic T-conorm for an occlusion test, because it clearly does not yield useful gradients.
As already mentioned in the paper, it is common to combine a T-norm~$⊤$, a T-conorm~$⊥$ and a negation~$N$ (or complement, most commonly $N(a) = 1-a$) so that DeMorgan's laws hold. Such a triplet is often called a dual triplet. In Tables~5 and~6 we show the formulas for the families of T-norms and T-conorms, respectively, where matching lines together with the standard negation $N(a) = 1-a$ form dual triplets. Note that, for some families, we limited the range of values for the parameter~$p$ (see Table~1) compared to more general definitions~[Klir_and_Yuan_1995].
c.1 T-conorm Plots
Maximum & $⊥^M(a,b)$ & $max(a,b)$ |
Probabilistic & $⊥^P(a,b)$ & $a+b-ab$ |
Einstein & $⊥^E(a,b)$ & $⊥^H_2(a,b)=a+b1+ab$ |
Hamacher & $⊥^H_p(a,b)$ & $a+b+(p-2)ab1+(p-1)ab$ |
Frank & $⊥^F_p(a,b)$ & $1-log_p(1+(p^1-a-1)(p^1-b-1)p-1)$ |
Yager & $⊥^Y_p(a,b)$ & $min(1, (a^p+b^p)^1p)$ |
Aczél-Alsina & $⊥^A_p(a,b)$ & $1 -exp(-(|log(1-a)|^p+|log(1-b)|^p )^1p)$ |
Dombi & $⊥^D_p(a,b)$ & $(1+( (1-aa)^p +(1-bb)^p )^-1p)^-1$ |
Schweizer-Sklar & $⊥^S_p(a,b)$ & $1-((1-a)^p+(1-b)^p-1)^1p$ |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
Appendix D Additional Plots

![]() |
![]() |
![]() |
![]() |
6 Discussion and Conclusion
In this work, we generalized differentiable mesh renderers and explored a large space of instantiations of our generalized renderer GenDR. We found that there are significant differences between different distributions for the occlusion test but also between different T-conorms for the aggregation. In our experiments, we observed that the choice of renderer has a large impact on the kind of models that can be rendered most effectively. We find that the uniform distribution outperforms the other tested distributions on average, which is surprising considering it simplicity. Remarkably, the uniform distribution had already been used implicitly for the early surrogate gradient renderers but was later discarded for the approximate differentiable renderers.
Acknowledgments. This work was supported by the DFG in the Cluster of Excellence EXC 2117 (Project-ID 390829875) and the SFB Transregio 161 (Project-ID 251654672), and the Land Salzburg within the WISS 2025 project IDA-Lab (20102-F1901166-KZP and 20204-WISS/225/197-2019).
Appendix A Implementation Details
For the single-view 3D reconstruction experiment, we closely orient ourselves on the setup by Liu~et al. ~[Liu2019-SoftRas]. We use the same model architecture~[Liu2019-SoftRas] and also train with a batch size of $64$ for $250 000$ steps using the Adam optimizer~[Kingma2014AdamOpt] We also schedule the learning rate to $10^-4$ for the first $150 000$ steps and use a learning rate of $3⋅10^-5$ for the remaining training. At this point (after the first $150 000$ steps), we also decrease the temperature $τ$ by a factor of $0.3$.
Using different learning rates (as an ablation) did not improve the results.
Appendix B Distributions
In this section, we define each of the presented distributions / sigmoid functions. Figure~4 displays the respective CDFs and PDFs.
Note that, for each distribution, the PDFs $f$ is defined as the derivative of the CDF $F$.
Also, note that a reversed (Rev.) CDF is defined as $F_Rev.(x) = 1-F(-x)$, which means that $F_Rev.= F$ for symmetric distributions.
The square-root distribution~$F_sq$ is defined in terms of $F$ as in Equation (5).
Therefore, in the following, we will define the distributions via their CDFs~$F$.
#1
(6) |
#1
(7) |
#1
(8) |
where $y:=(x+1)/2$.
#1
(9) |
#1
(10) |
#1
(11) |
#1
(12) |
#1
(13) |
#1
(14) |
#1
(15) |
#1
(16) |
#1
(17) |
#1
(18) |
#1
(20) |
where $γ(p,x)$ is the lower incomplete gamma function and $p>0$ is the shape parameter.
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Heaviside & Uniform & Cubic Hermite & Wigner Semicircle & Gaussian & Gaussian (sq.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Laplace & Laplace (sq.) & Logistic & Logistic (sq.) & Hyperbolic secant & Hyperbolic secant (sq.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Cauchy & Cauchy (sq.) & Reciprocal & Reciprocal (sq.) & Gumbel-Max & Gumbel-Min |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Exponential & Exponential (sq.) & Exponential (Rev.) & Levy & Levy (sq.) & Levy (Rev.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Gamma ($p=0.5$) & Gamma ($p=1$) & Gamma ($p=2$) & Gamma ($p=.5$, R.) & Gamma ($p=1$, R.) & Gamma ($p=2$, R.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Gamma ($p=0.5$, sq.) & Gamma ($p=1$, sq.) & Gamma ($p=2$, sq.) & Gamma ($p=.5$, R., sq.) & Gamma ($p=1$, R., sq.) & Gamma ($p=2$, R., sq.) |
~
~
Appendix C T-Norms and T-Conorms
The axiomatic approach to multi-valued logics (which we need to combine the occlusions by different faces in a ``soft'' manner) is based on defining reasonable properties for truth functions. We stated the axioms for multi-valued generalizations of the disjunction (logical ``or''), called T-conorms, in Definition~2. Here we complement this with the axioms for multi-valued generalizations of the conjunction (logical ``and''), which are called T-norms.
Definition 6 (T-norm).
A T-norm (triangular norm) is a binary operation~$⊤: [0,1] ×[0,1] →[0,1]$, which satisfies
-
associativity: $⊤(a, ⊤(b,c)) = ⊤(⊤(a,b), c)$,
-
commutativity: $⊤(a,b) = ⊤(b,a)$,
-
monotonicity: $(a ≤c) ∧(b ≤d) ⇒⊤(a,b) ≤⊤(c,d)$,
-
$1$ is a neutral element: $⊤(a,1) = a$.
Clearly these axioms ensure that the corners of the unit square, that is, the value pairs considered in classical logic, are processed as with a standard conjunction: neutral element and commutativity imply that $(1,1) ↦1$, $(0,1) ↦0$, $(1,0) ↦0$. From one of the latter two and monotonicity it follows $(0,0) ↦0$. Analogously, the axioms of T-conorms ensure that the corners of the unit square are processed as with a standard disjunction. Actually, the axioms already fix the values not only at the corners, but on the boundaries of the unit square. Only inside the unit square (that is, for $(0,1)^2$) T-norms (as well as T-conorms) can differ.
Minimum & $⊤^M(a,b)$ & $min(a,b)$ |
Probabilistic & $⊤^P(a,b)$ & $ab$ |
Einstein & $⊤^E(a,b)$ & $ab2-a-b+ab$ |
Hamacher & $⊤^H_p(a,b)$ & $abp+(1-p)(a+b-ab)$ |
Frank & $⊤^F_p(a,b)$ & $log_p(1+(p^a-1)(p^b-1)p-1)$ |
Yager & $⊤^Y_p(a,b)$ & $max(0, 1-((1-a)^p+(1-b)^p)^1p)$ |
Aczél-Alsina & $⊤^A_p(a,b)$ & $exp(-(|log(a)|^p+|log(b)|^p )^1p)$ |
Dombi & $⊤^D_p(a,b)$ & $(1+( (1-aa)^p +(1-bb)^p )^1p)^-1$ |
Schweizer-Sklar & $⊤^S_p(a,b)$ & $(a^p+b^p-1)^1p$ |
In the theory of multi-valued logics, and especially in fuzzy logic [Klir_and_Yuan_1995], it was established that the largest possible T-norm is the minimum and the smallest possible T-conorm is the maximum: for any T-norm~$⊤$ it is $⊤(a,b) ≤min(a,b)$ and for any T-conorm~$⊥$ it is $⊥(a,b) ≥max(a,b)$. The other extremes, that is, the smallest possible T-norm and the largest possible T-conorm are the so-called drastic T-norm, defined as $⊤^∘(a,b) = 0$ for $(a,b) ∈(0,1)^2$, and the drastic T-conorm, defined as $⊥^∘(a,b) = 1$ for $(a,b) ∈(0,1)^2$. Hence it is $⊤(a,b) ≥⊤^∘(a,b)$ for any T-norm~$⊤$ and $⊥(a,b) ≤⊥^∘(a,b)$ for any T-conorm~$⊥$. We do not consider the drastic T-conorm for an occlusion test, because it clearly does not yield useful gradients.
As already mentioned in the paper, it is common to combine a T-norm~$⊤$, a T-conorm~$⊥$ and a negation~$N$ (or complement, most commonly $N(a) = 1-a$) so that DeMorgan's laws hold. Such a triplet is often called a dual triplet. In Tables~5 and~6 we show the formulas for the families of T-norms and T-conorms, respectively, where matching lines together with the standard negation $N(a) = 1-a$ form dual triplets. Note that, for some families, we limited the range of values for the parameter~$p$ (see Table~1) compared to more general definitions~[Klir_and_Yuan_1995].
c.1 T-conorm Plots
Maximum & $⊥^M(a,b)$ & $max(a,b)$ |
Probabilistic & $⊥^P(a,b)$ & $a+b-ab$ |
Einstein & $⊥^E(a,b)$ & $⊥^H_2(a,b)=a+b1+ab$ |
Hamacher & $⊥^H_p(a,b)$ & $a+b+(p-2)ab1+(p-1)ab$ |
Frank & $⊥^F_p(a,b)$ & $1-log_p(1+(p^1-a-1)(p^1-b-1)p-1)$ |
Yager & $⊥^Y_p(a,b)$ & $min(1, (a^p+b^p)^1p)$ |
Aczél-Alsina & $⊥^A_p(a,b)$ & $1 -exp(-(|log(1-a)|^p+|log(1-b)|^p )^1p)$ |
Dombi & $⊥^D_p(a,b)$ & $(1+( (1-aa)^p +(1-bb)^p )^-1p)^-1$ |
Schweizer-Sklar & $⊥^S_p(a,b)$ & $1-((1-a)^p+(1-b)^p-1)^1p$ |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
Appendix D Additional Plots

![]() |
![]() |
![]() |
![]() |
Appendix A Implementation Details
For the single-view 3D reconstruction experiment, we closely orient ourselves on the setup by Liu~et al. ~[Liu2019-SoftRas]. We use the same model architecture~[Liu2019-SoftRas] and also train with a batch size of $64$ for $250 000$ steps using the Adam optimizer~[Kingma2014AdamOpt] We also schedule the learning rate to $10^-4$ for the first $150 000$ steps and use a learning rate of $3⋅10^-5$ for the remaining training. At this point (after the first $150 000$ steps), we also decrease the temperature $τ$ by a factor of $0.3$.
Using different learning rates (as an ablation) did not improve the results.
Appendix B Distributions
In this section, we define each of the presented distributions / sigmoid functions. Figure~4 displays the respective CDFs and PDFs.
Note that, for each distribution, the PDFs $f$ is defined as the derivative of the CDF $F$.
Also, note that a reversed (Rev.) CDF is defined as $F_Rev.(x) = 1-F(-x)$, which means that $F_Rev.= F$ for symmetric distributions.
The square-root distribution~$F_sq$ is defined in terms of $F$ as in Equation (5).
Therefore, in the following, we will define the distributions via their CDFs~$F$.
#1
(6) |
#1
(7) |
#1
(8) |
where $y:=(x+1)/2$.
#1
(9) |
#1
(10) |
#1
(11) |
#1
(12) |
#1
(13) |
#1
(14) |
#1
(15) |
#1
(16) |
#1
(17) |
#1
(18) |
#1
(20) |
where $γ(p,x)$ is the lower incomplete gamma function and $p>0$ is the shape parameter.
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Heaviside & Uniform & Cubic Hermite & Wigner Semicircle & Gaussian & Gaussian (sq.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Laplace & Laplace (sq.) & Logistic & Logistic (sq.) & Hyperbolic secant & Hyperbolic secant (sq.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Cauchy & Cauchy (sq.) & Reciprocal & Reciprocal (sq.) & Gumbel-Max & Gumbel-Min |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Exponential & Exponential (sq.) & Exponential (Rev.) & Levy & Levy (sq.) & Levy (Rev.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Gamma ($p=0.5$) & Gamma ($p=1$) & Gamma ($p=2$) & Gamma ($p=.5$, R.) & Gamma ($p=1$, R.) & Gamma ($p=2$, R.) |
![]() ![]() ![]() ![]() ![]() ![]() |
![]() ![]() ![]() ![]() ![]() ![]() |
Gamma ($p=0.5$, sq.) & Gamma ($p=1$, sq.) & Gamma ($p=2$, sq.) & Gamma ($p=.5$, R., sq.) & Gamma ($p=1$, R., sq.) & Gamma ($p=2$, R., sq.) |
~
~
Appendix C T-Norms and T-Conorms
The axiomatic approach to multi-valued logics (which we need to combine the occlusions by different faces in a ``soft'' manner) is based on defining reasonable properties for truth functions. We stated the axioms for multi-valued generalizations of the disjunction (logical ``or''), called T-conorms, in Definition~2. Here we complement this with the axioms for multi-valued generalizations of the conjunction (logical ``and''), which are called T-norms.
Definition 6 (T-norm).
A T-norm (triangular norm) is a binary operation~$⊤: [0,1] ×[0,1] →[0,1]$, which satisfies
-
associativity: $⊤(a, ⊤(b,c)) = ⊤(⊤(a,b), c)$,
-
commutativity: $⊤(a,b) = ⊤(b,a)$,
-
monotonicity: $(a ≤c) ∧(b ≤d) ⇒⊤(a,b) ≤⊤(c,d)$,
-
$1$ is a neutral element: $⊤(a,1) = a$.
Clearly these axioms ensure that the corners of the unit square, that is, the value pairs considered in classical logic, are processed as with a standard conjunction: neutral element and commutativity imply that $(1,1) ↦1$, $(0,1) ↦0$, $(1,0) ↦0$. From one of the latter two and monotonicity it follows $(0,0) ↦0$. Analogously, the axioms of T-conorms ensure that the corners of the unit square are processed as with a standard disjunction. Actually, the axioms already fix the values not only at the corners, but on the boundaries of the unit square. Only inside the unit square (that is, for $(0,1)^2$) T-norms (as well as T-conorms) can differ.
Minimum & $⊤^M(a,b)$ & $min(a,b)$ |
Probabilistic & $⊤^P(a,b)$ & $ab$ |
Einstein & $⊤^E(a,b)$ & $ab2-a-b+ab$ |
Hamacher & $⊤^H_p(a,b)$ & $abp+(1-p)(a+b-ab)$ |
Frank & $⊤^F_p(a,b)$ & $log_p(1+(p^a-1)(p^b-1)p-1)$ |
Yager & $⊤^Y_p(a,b)$ & $max(0, 1-((1-a)^p+(1-b)^p)^1p)$ |
Aczél-Alsina & $⊤^A_p(a,b)$ & $exp(-(|log(a)|^p+|log(b)|^p )^1p)$ |
Dombi & $⊤^D_p(a,b)$ & $(1+( (1-aa)^p +(1-bb)^p )^1p)^-1$ |
Schweizer-Sklar & $⊤^S_p(a,b)$ & $(a^p+b^p-1)^1p$ |
In the theory of multi-valued logics, and especially in fuzzy logic [Klir_and_Yuan_1995], it was established that the largest possible T-norm is the minimum and the smallest possible T-conorm is the maximum: for any T-norm~$⊤$ it is $⊤(a,b) ≤min(a,b)$ and for any T-conorm~$⊥$ it is $⊥(a,b) ≥max(a,b)$. The other extremes, that is, the smallest possible T-norm and the largest possible T-conorm are the so-called drastic T-norm, defined as $⊤^∘(a,b) = 0$ for $(a,b) ∈(0,1)^2$, and the drastic T-conorm, defined as $⊥^∘(a,b) = 1$ for $(a,b) ∈(0,1)^2$. Hence it is $⊤(a,b) ≥⊤^∘(a,b)$ for any T-norm~$⊤$ and $⊥(a,b) ≤⊥^∘(a,b)$ for any T-conorm~$⊥$. We do not consider the drastic T-conorm for an occlusion test, because it clearly does not yield useful gradients.
As already mentioned in the paper, it is common to combine a T-norm~$⊤$, a T-conorm~$⊥$ and a negation~$N$ (or complement, most commonly $N(a) = 1-a$) so that DeMorgan's laws hold. Such a triplet is often called a dual triplet. In Tables~5 and~6 we show the formulas for the families of T-norms and T-conorms, respectively, where matching lines together with the standard negation $N(a) = 1-a$ form dual triplets. Note that, for some families, we limited the range of values for the parameter~$p$ (see Table~1) compared to more general definitions~[Klir_and_Yuan_1995].
c.1 T-conorm Plots
Maximum & $⊥^M(a,b)$ & $max(a,b)$ |
Probabilistic & $⊥^P(a,b)$ & $a+b-ab$ |
Einstein & $⊥^E(a,b)$ & $⊥^H_2(a,b)=a+b1+ab$ |
Hamacher & $⊥^H_p(a,b)$ & $a+b+(p-2)ab1+(p-1)ab$ |
Frank & $⊥^F_p(a,b)$ & $1-log_p(1+(p^1-a-1)(p^1-b-1)p-1)$ |
Yager & $⊥^Y_p(a,b)$ & $min(1, (a^p+b^p)^1p)$ |
Aczél-Alsina & $⊥^A_p(a,b)$ & $1 -exp(-(|log(1-a)|^p+|log(1-b)|^p )^1p)$ |
Dombi & $⊥^D_p(a,b)$ & $(1+( (1-aa)^p +(1-bb)^p )^-1p)^-1$ |
Schweizer-Sklar & $⊥^S_p(a,b)$ & $1-((1-a)^p+(1-b)^p-1)^1p$ |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
Appendix D Additional Plots

![]() |
![]() |
![]() |
![]() |