1 Introduction
Mathematically it is well known that the problem of inferring shape from shading information (or from contour information) is ill-posed: there exist many possible surfaces that could give rise to (almost) any type of image structure. In our everyday experience, however, we unconsciously “solve” this inverse problem routinely; we readily and effortlessly infer three-dimensional (3D) shape from ambiguous image information. This presents a huge conundrum for theorists: does there exist an invariant that could ground inferences about surface structure on the many different types of image structures and, if so, how might this invariant be used by both brains and machines? In this paper we answer the first part of this question in the affirmative, motivated by two aspects of the second part.
Our approach is motivated by an important (but frequently overlooked) property of human perception: different individuals (or the same individual at different times) perceive quantitatively different but qualitatively similar surfaces—not identical ones—from either shading or contour information (references in the background section below). We take this property to be key: the goal is not to find a unique map between an image and a surface, but rather to identify an equivalence class structure, i.e., to identify which classes of images are consistent with which classes of surfaces. The identification map is then at this abstract level. As we shall show, there are parts of images that do indeed provide a kind of scaffold from which (parts of) surfaces can be reconstructed. We call this scaffold critical contours; in this paper we define them, prove that they are part of a global description, and characterize their invariance over different rendering models. Basically, we show that critical contours in the image are nearly equivalent to critical contours of the surface (slant) function.
We concentrate on image ridges, a geometrical construct that has been important in vision for some time. Viewing the image as a height function, ridges seem intuitively connected to image edges [33, 19], especially those that arise within the interior of a shape [32]. For similar reasons, ridges also relate to features of surface relief [51]. But, to our knowledge, image ridges and surface relief have not been formally identified with one another, except for specific rendering models (e.g., Lambertian). We establish this connection generically; see Figure 1.
The development proceeds in three main steps. We first exploit ridge structure to motivate a limiting process that characterizes how shading distributions concentrate into contours. Second, when the shading distribution orthogonal to the contour is sufficiently “steep,” it becomes what we define as a critical contour. These are special contours that capture the edge connection alluded to above and resemble artists’ line drawings. Building on the surface relief view, they form part of a global, topological network that separates “hills” from “dales.” Formally this network comprises the Morse–Smale (MS) complex on the image [31] and is built with integral curves through the (image) gradient flow that connect maxima, saddles, and minima in a prescribed manner. The MS idea has a rich history in geology [70, 11] but in the modern form is based on singularities of gradient flows [91, 72]. This has three important consequences: (i) it allows a principled (global) simplification process to remove insignificant “bumps” [20]; (ii) the flows ground the computations in physiologically meaningful terms [38]; and (iii) it allows the contours to be interpreted as boundaries of surface parts.
Finally, we show that critical contours are part of the MS complex in a generic sense. Since the natural world is hardly Lambertian, we consider a general class of rendering functions that require little more than dependence on the surface normal (or tangent plane at that point). We provide a completeness theorem: if there is a critical contour in the image for a given surface with one rendering function in the class, then there is a critical contour in the image for every rendering function in this class and, furthermore, they coincide under the limiting process mentioned above. This third result has an unexpected implication: since the surface slant function [93] is in our class of rendering functions, critical contours from the image and critical contours from the surface slant are, in the limit above, equivalent. Thus we relate image-derived properties directly to surface properties.
Since the MS complex is global, there is a shared partition between the image and the surface. By identifying certain contour inferences with shading inferences, it also constrains how the surface can be “filled in” between the image contours. But it does not reduce the reconstructed surface to a singleton. In effect, by working between the geometry of ridges and the topology of surfaces, we are able to find a foundation for qualitative surface inferences. A roadmap for our approach is shown later in Figure 3; it is described in more detail following the background review.
2 Background
Standard computational approaches to shape-from-shading rely on either imposing strong priors (on light sources [94], reflectance models [58], etc.) or imposing a form of regularization tied to a reflectance model (reviews in [36, 10, 82, 105]; see Mach [66] for the original); in either case the goal is a single, unique surface from among the different possibilities [77, 89]. This remains problematic: recent attempts are brittle for related reasons; they work for some images but not others, largely because of reliance on artificial reflectance models, bas relief priors [78], a delicate combination of regularization terms [4], or training on restricted scene classes [96, 22, 12]. We move beyond this brittleness by seeking a qualitative solution.
2.1 Surface perception is qualitative
Results in visual psychophysics question the goal of seeking a unique surface. While subjects tend to agree on the overall shape, constancy is elusive and percepts differ quantitatively; see [95, 74, 68, 73, 97, 21, 13, 90, 15, 44, 53]. Remarkably, this lack of constancy holds even for special shapes such as cylinders (but see [35, 54]). One possibility is that there are different operational modes [95]; another is that priors are applied selectively to advantage [23]. We propose that the solution is topological in nature.
The idea of different modes is consistent with the view, prominent in computer vision, that shape-from-contour is a separate problem from shape-from-shading: contours are one-dimensional entities, while shading is a two-dimensional distribution of intensities. For contours the emphasis tends to be on junctions [100] (the places where surfaces join) to make sure that the surfaces “fit” together properly; see [5, 92, 67, 39]. Again, research in visual psychophysics provides a challenge to this separate view; mutual influences between contours and shading are well documented [84, 98]. We shall show that well-defined, salient, and stable contours can arise out of shaded images; thus both contours and shading play the role of defining surface parts. In our view, shape-from-shading and shape-from-contour are deeply related inverse problems; another of our contributions is to show how, via a limiting process.
While the inference problem works in the inverse direction, the relationship between shading and contour in the forward direction is well established. Non-photorealistic rendering algorithms are used as visualization techniques for given surfaces [61, 80], which shows how rich surface information can be conveyed to a viewer; this is not unlike what artists draw [14]. For example, “suggestive contours” [17] are built from the loci of points (in the image) where the object almost occludes itself when computed from the surface [17]; see also [42, 87] for related forward computations. Our critical contours, which work for the inverse problem, are related to, but not identical with, suggestive contours. More details on the relationship can be found in the appendix.
Folds in material are a prominent example of when suggestive contours are useful (see da Vinci’s notebooks and [62, 29, 43]); we exploit the fact that folds have rather structured shading across them [56]. They tend to occur along extended anisotropic curvature regions of the shape [60] and are related to ridges in computer vision. Detecting these areas of rapid change in image intensity is a well-studied problem [33, 65], although the characterization often remains local in terms of differential geometry [32] or singularity theory [16]. But this local characterization may not yield globally connected patterns, and “small” ridges are no different from “steep” ones. Many have explored a multiscale approach to deal with these difficulties [63, 81, 30, 19]; our work exploits a topological multiscale idea.
The transition to global patterns from local descriptors is necessary, and the classical work on “hills and dales” [70, 11, 86] provides a way forward. It considers the integration of vector fields, and applications to describing waterways [75, 3] and images [64, 52, 30, 7, 65, 51] abound. Putting these together, the idea is to view the image as a “landscape,” with “height” proportional to intensity; “water” then flows from the peaks to the valleys. But viewing the image as a landscape does not provide a formal connection back to the underlying surface from which it was rendered. Our critical contours, also computed from the image, will relate directly to the image landscapes sought by these ridge detectors but will further have a connection to the underlying surface.
It is now that we appeal directly to the psychophysical observation above: that perception is qualitatively similar but not quantitatively identical across subjects [95, 74, 68, 73, 76, 97, 21, 13, 90, 15, 44, 53, 103], plus many others. To us “qualitative” implies that we should be seeking that family of surfaces which are “locked down” by the image. Since global, qualitative constructions are the domain of topology, it is here that the MS complex [31, 20] is central. We review the MS complex in the next section; later we shall show that the critical contours are 1-cells of the MS complex of the shading function with high transversal intensity changes. It follows, then, that they are also 1-cells of the slant (foreshortening) function on the surface.
2.2 The Morse–Smale complex
Our goal is to find patterns in the image, computable from orientations and invariant to a large class of rendering functions, that “anchor” the ill-posed shape-from-shading problem in a qualitative manner. We were inspired by representations of the phase space in nonlinear dynamics and wish to understand how particular contours on the image can constrain global, qualitative shape. This formalizes an intuition from Koenderink (and Picasso): that “vision grasps shape as a hierarchical structure of elliptic patches” [50]. For this, we use the MS complex. As in the watershed algorithms referenced above, the gradient flow is used to assign different regions of the domain to critical points; 2D contours separate these domains into monotonic regions (called 2-cells); these regions are then the “parts” of the shape. Importantly, associated with the MS complex is persistence simplification, a principled way to collapse critical points (equivalently, merge the watershed regions) to create a hierarchy [31]. This is the multiscale component of our approach. This introduction is necessarily brief. More complete treatments can be found in [72, 26, 27, 69, 8] and, for motivation, see [71]. See Figure 2 for an illustration.
Given a manifold $M$, consider a smooth scalar function $f : M \to \mathbb{R}$. (Later, we will consider $M = \mathbb{R}^2$ and $f$ as an image of a surface.) The gradient $\nabla f$ exists at every point. A point $p$ is called a critical point when $\nabla f(p) = 0$. The function $f$ is a Morse function if all its critical points are nondegenerate (meaning the Hessian at those points is nonsingular) and if no two critical points have the same function value.
The gradient field $\nabla f$ gives a direction at every point in the image, except for the critical points, a set of measure zero. Following the vector field will trace out an integral line. Precisely, an integral line is a maximal path on the image whose tangent vectors agree with $\nabla f$ at every point of the path. These integral lines must end at critical points, where the gradient direction is undefined. Thus, one can define an origin and a destination for each integral line. Further, for each critical point, its ascending manifold is defined as the union of integral lines having that critical point as a common origin. Similarly, its descending manifold is the union of integral lines with that critical point as a common destination.
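The notion of an integral line can be made concrete with a small numerical sketch. The test function `f` (two Gaussian bumps), the forward Euler step size, and the stopping tolerance below are all illustrative assumptions, not part of the paper's method; the sketch only shows a path that follows the gradient from a regular point to its destination critical point.

```python
import math

def f(x, y):
    # Hypothetical test "image": two Gaussian bumps centered at (+1, 0) and (-1, 0).
    return math.exp(-((x - 1) ** 2 + y ** 2)) + math.exp(-((x + 1) ** 2 + y ** 2))

def grad(x, y):
    # Analytic gradient of f.
    a = math.exp(-((x - 1) ** 2 + y ** 2))
    b = math.exp(-((x + 1) ** 2 + y ** 2))
    return (-2 * (x - 1) * a - 2 * (x + 1) * b, -2 * y * (a + b))

def trace_integral_line(x, y, step=0.01, tol=1e-6, max_iter=100000):
    """Follow the gradient uphill (forward Euler) until it nearly vanishes.

    The returned path approximates the part of an integral line between the
    start point and its destination, a critical point of f.
    """
    path = [(x, y)]
    for _ in range(max_iter):
        gx, gy = grad(x, y)
        if math.hypot(gx, gy) < tol:
            break
        x, y = x + step * gx, y + step * gy
        path.append((x, y))
    return path

# Two nearby start points on opposite sides of the line x = 0 reach different maxima.
right = trace_integral_line(0.2, 0.5)
left = trace_integral_line(-0.2, 0.5)
print(right[-1], left[-1])
```

In this assumed example the line x = 0 separates the two basins, which is the kind of boundary that the 1-cells discussed next are meant to capture.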
The type of each critical point is defined by its index: the number of negative eigenvalues of the Hessian at that point. For scalar functions on $\mathbb{R}^2$, there are only three types: a maximum (with index 2), a minimum (with index 0), and a saddle point (with index 1). MS functions satisfy an additional transversality condition and are dense in the set of continuous functions. For these, the integral lines only connect critical points of differing index. The ascending manifold associated with a critical point of index $i$ is of dimension $2 - i$. Similarly, the descending manifold for an index $i$ critical point is of dimension $i$.
For two critical points $p$ and $q$, with the index of $p$ one greater than the index of $q$, consider the intersection of the descending manifold of $p$ with the ascending manifold of $q$. This intersection will be either a 1D manifold (a curve called a 1-cell or watershed) or the empty set. For two critical points $p$ and $q$, with the index of $p$ two greater than the index of $q$, the intersection of the descending manifold of $p$ with the ascending manifold of $q$ will either be a 2D manifold (a region called a 2-cell) or the empty set. Thus, the intersection of all ascending manifolds with all descending manifolds partitions the manifold into 2D regions surrounded by 1D curves with intersections at the critical points.
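The index-based classification of critical points can be sketched directly from the entries of a 2x2 Hessian. The function names and the degeneracy tolerance `eps` below are hypothetical choices made for illustration.

```python
import math

def hessian_eigenvalues(fxx, fxy, fyy):
    """Eigenvalues of the symmetric 2x2 Hessian [[fxx, fxy], [fxy, fyy]]."""
    mean = 0.5 * (fxx + fyy)
    radius = math.hypot(0.5 * (fxx - fyy), fxy)
    return mean - radius, mean + radius

def classify_critical_point(fxx, fxy, fyy, eps=1e-12):
    """Classify a critical point by its index (number of negative eigenvalues)."""
    lo, hi = hessian_eigenvalues(fxx, fxy, fyy)
    if abs(lo) < eps or abs(hi) < eps:
        return "degenerate"  # singular Hessian: excluded for Morse functions
    index = int(lo < 0) + int(hi < 0)
    return {0: "minimum", 1: "saddle", 2: "maximum"}[index]

print(classify_critical_point(-2.0, 0.0, -2.0))
print(classify_critical_point(2.0, 0.0, -2.0))
```

The Hessian entries would in practice come from finite differences or a smooth fit of the image; that step is outside this sketch.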
The MS complex is a structure that relates a set of contours to a qualitative function representation. With knowledge only of the scalar function at the critical points and 1-cells, one could reconstruct the 2-cells (and thus the entire function) relatively accurately. For some insight, see [1, 101]. The position, heights, and boundaries of all the bumps, dimples, and ridges are already known and the choices left are how steep to make the transitioning 2-cells in between. Thus, there is a natural connection between the scalar function restricted to the 2D curves (the salient 1-cells) and the scalar function on the entire domain (the unknown 2-cells). Our main theorem will show that particular 1-cells of the image will be nearly invariant under changes in the rendering function.
2.3 Biological considerations
Since our approach is partly motivated by psychophysical considerations, we also consider the underlying physiology; Connor [104, Figure 2] shows the sensitivity of higher-level neurons to ridge-like structure. Our concern here is how this process can get started. Orientation-selective neurons in the visual cortex are the natural substrate for representing shading information as a flow pattern [49, 9, 6], and it clearly relates to the gradient flow above. Shape-from-shading-flow computations have been analyzed [55], and research supports it [25, 45, 34, 2]. But Todd [97, 21], among others, has questioned (forcefully and, to us, in an influential way) whether isophotes and shading flows suffice, because the isophote pattern changes significantly for different renderings and lightings of the same object (cf. Figure 1, row 2), but our perception hardly varies (with regard to shape). If our percepts were based on the isophotes alone, then they, too, Todd argues, should change. But isophotes change more in some places than others, and the conditions in our shading limit proposition identify precisely those locations where the isophote structure remains invariant. Anchoring the shape reconstruction on the locations where the isophote structure is consistent could explain how it is possible for brains to make robust (but qualitative) inferences about shape in three dimensions. Neural responses should be robust around critical contours, but not necessarily elsewhere, which implies that different positions are represented differently. (Earlier computational approaches treat all positions as equals.)
2.4 Overview
An overview of our argument is as follows (Figure 3). Starting in the upper left corner, we are given an image of a surface created via an unknown rendering function. It might be a shaded image, a line drawing, etc. Classical shape-from-shading methods compute directly from this pixel representation and must confront this ill-posed problem (dotted right arrow). Instead, we proceed to first identify image features (that will correspond to surface features) that are invariant to the (unknown) rendering function; this is shown as “image parts.” For shading functions, these image parts are abstracted to stylized lines in section 5; the relationship is summarized in Lemma 1. This allows us to move to “critical contours,” defined in section 7. Then, Corollary 2 allows us to interpret these critical contours as surface curves with important properties (1-cells of the slant MS complex), so that we arrive at “surface 1-cells.” These surface 1-cells correspond to boundaries of qualitative parts of the surface (bumps, valleys, ridges, etc.) and function as a kind of scaffold on which the surface can be “built.” Various inpainting or diffusion algorithms could complete this scaffold of 1-cells back into a scalar field; for an example, see Figure 11. The completion is not unique, which relates directly to perception; the scaffold is the qualitative invariant on which different subjects build their quantitative percept. Thus we arrive at a surface exemplar in the upper right.
The qualitative nature of the solution we are proposing for shape perception bears some resemblance to the categorical parts and necks that can be inferred from the medial axis (or skeleton) [46, 40], but our scheme focuses on interior lines and shading distributions rather than bounding contours. Importantly, just as the medial axis can be used to structure grasping of objects [83], our critical contours may suffice for this as well. When reaching to grasp an object, we preform our hands to reflect the pose and extension of the handle [41, 79]. For both people and robots, then, qualitative properties seem to suffice. In effect, we get global constraints by integrating local conditions on contours; evidence is beginning to accumulate that such segmentations induce psychophysical limits [48].
3 Image formation and assumptions
We now describe the image formation process. Consider an orthogonal projection model and define the 3D coordinate axis so that parametrizes the image plane and is the view direction. Let represent the standard basis (unit vectors in these cardinal directions). We think of the image as being created from a “cue” or, more precisely, by applying a rendering function to the normal field of a smooth surface . That is,
(1)  
(2) 
Many familiar cues have this structure. For example, Lambertian shading is equivalent to for diffuse light sources . The spatial frequency cue of isotropic texture (once ideally blurred) is monotonically related to . Specular shading is equivalent to for specular light sources and some constant . We seek image contours that are always present independent of the choice of .
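To make the "rendering function applied to a normal field" structure concrete, here is a minimal sketch under stated assumptions: the surface is a graph z(x, y), the normal is oriented toward the viewer along the third axis, and the Lambertian cue is taken to be the clamped dot product of the normal with a unit light direction. The helper names (`unit_normal`, `lambertian`, `slant`, `render`) and the test surface are hypothetical, since Equations (1) and (2) are not reproduced in this text.

```python
import math

def unit_normal(zx, zy):
    """Unit normal of the graph z(x, y), oriented toward the viewer (+z)."""
    norm = math.sqrt(1.0 + zx * zx + zy * zy)
    return (-zx / norm, -zy / norm, 1.0 / norm)

def lambertian(light):
    """Return a cue n -> max(0, n . light) for a unit light direction (assumed form)."""
    def cue(n):
        return max(0.0, sum(a * b for a, b in zip(n, light)))
    return cue

def slant(n):
    """Angle between the normal and the view direction (third axis)."""
    return math.acos(min(1.0, n[2]))

def render(height_gradient, cue, xs, ys):
    """Apply a cue to the normal field sampled on a grid (rows indexed by y)."""
    return [[cue(unit_normal(*height_gradient(x, y))) for x in xs] for y in ys]

# Hypothetical surface: the bump z = -(x^2 + y^2) / 2, whose gradient is (-x, -y).
bump_gradient = lambda x, y: (-x, -y)
grid = [i / 4.0 for i in range(-4, 5)]
frontal = render(bump_gradient, lambertian((0.0, 0.0, 1.0)), grid, grid)
oblique = render(bump_gradient, lambertian((0.6, 0.0, 0.8)), grid, grid)
slants = render(bump_gradient, slant, grid, grid)
```

Every image here is computed from the same normal field; only the cue changes. With the frontal light the image equals the cosine of the slant, and with the oblique light the single brightest point moves to where the normal aligns with the light.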
For theoretical analysis, we consider rendering functions so that gradient fields are Lipschitz continuous. The differential of (2) yields
(3)  
(4)  
(5) 
To go from (3) to (4), note that function composition here is matrix multiplication. To simplify notation, we also drop the point of application of . (A comment on notation: we will use bold lettering to denote tensors and matrices, while vectors will be in standard lettering. For an introduction to this tensor notation, see the appendix in [35].)
Here, is the 1-form corresponding to dot product with the gradient . is a vector and is a matrix. The image gradient orientation, that is, the angle of the vector , is generally dependent on both the surface through the operator and the material/cue through the operator . In the most general scenario, where has no constraint, we cannot constrain from the data . Thus, we will now put some weak limits on , , and .
3.1 Rendering function assumptions
Definition 1
The admissible cue class is the set of differentiable rendering functions satisfying the following two criteria:

Bounded variation: there exists a such that for all rendering functions and .

is concave and is bounded: There exists a constant s.t. for all .
We elaborate on the conditions. The bounded variation condition ensures that arbitrarily large changes in the image cannot be due to the rendering function alone but must also require some change in the normal field. Without a constraint on the rendering function such as this one, we could not decipher between gradients due to material changes (such as a painting) and gradients due to natural shading changes. There is perceptual evidence supporting this condition. If an image feature is to be seen as “shading,” it must have generally low contrast. Very high contrast features are often seen as material changes [34]. The concave condition ensures that if the unit sphere were imaged with a rendering function, we would see only one highlight (point of maximum brightness). Note that this condition also relates to “cloudy day” [58, 57] illumination, where the aperture function plays the role of, or is aided by, the surface normal; this also eliminates [37]. Although our rendering function is quite general, it is designed to facilitate our analysis; some light field effects may not be included [59, 47].
Thus, a given rendering function creates an image of a given imaged surface . Suppose we now choose a new rendering function (e.g., by changing the light source) to get a new image of the same surface. Our main theorem describes an important commonality between these images. A registration correspondence exists between these two images, and , but we will not focus on describing or calculating it here. Instead, to simplify analysis, we will regard the second image as a new scalar field on the same coordinate system: we will prove things regarding .
We now restrict to generic interactions between the rendering function and imaged surface.
3.2 Generic surface assumptions
An image of a surface rarely completely describes the surface. Since shape reconstruction is generally ill-posed, surfaces can collude with rendering functions to create images that hide surface features. We seek to remove these rare cases via assumptions here. For example, in Lambertian shading, the image is a projection of the surface normal field onto an unknown direction (given by the light source), so variation in the normal field in directions perpendicular to the light source will have no effect on a local image patch: e.g., a Lambertian right cylinder cone with a light source directly above would create a constant intensity image. In this case, can be arbitrarily large while is 0. We wish to avoid cases like these.
A slight change of light source or viewpoint would make the curvature of the cone visible and would therefore lead to large changes in the image. Thus, we use the term “generic” to represent assumptions that remove rare or unstable configurations. These unstable configurations often vanish with a small perturbation of scene parameters, i.e., light source or viewpoint changes. There are two forms of generic that we will assume for our setup:

Given a curve on a surface , we assume that the three column vectors of the unfolded tensor ,
contain at least two that are linearly independent.

Let represent the acute angle between two vectors in . Given a curve on a surface and rendering function , we assume that there exists an such that for all ,
(6) (7) (8) (9)
That is, the rendering function’s differential does not happen to align along certain differential properties of the surface normal field. Many of these properties are of arbitrarily small measure in the space of continuous configurations. (This removes the Lambertian cone example.) Experimentally, violations of these conditions are rare.
Of course, there are other obstacles to 3D shape reconstruction that we are ignoring, such as multiple scattering, partial occlusions, or textured objects. We are instead focused on understanding the stability and geometric meaning of image contours derived from shading. We now define this relationship.
4 The contour interpreted as a shading limit
We seek a visual pattern that is present across many views and many renderings. We are inspired by artist sketches, where a collection of thin strokes on paper inspires a surface perception. Eventually we will think of these strokes as a robust skeleton for describing part boundaries implied by a shaded image. To understand what is the physical meaning (constraint on the viewed surface) inherent in each stroke, we start by investigating their differential properties.
Consider a line drawing image as a collection of 1D contours. Focusing on one contour, , assume it has bounded image (planar) curvature. For each point , we have a scalar intensity value ; we require this value to be 0 at the endpoints. Without loss of generality, let be arclength parametrized.
Definition 2
An ideal 1D contour , , can be expressed as a scalar field in the following way:
(10)  
(11) 
We wish to understand the behavior of the image derivatives of but, since is discontinuous, the derivatives do not exist. However, we can approximate these derivatives by considering as the limit of a sequence of shaded images as on a tubular neighborhood . We define each shaded image as a convolution with Gaussian functions of with successively smaller standard deviation in the following manner.
For every point on , we parametrize the local neighborhood with two directions. For convenience, write . Define to be the transversal direction at the point , so . is an orthonormal basis. Let be the corresponding coordinate functions. As we are only interested in the limiting behavior and as has bounded curvature, we realign the frame so that is at the origin and define
(12) 
We can now calculate image derivatives of in the directions (see the appendix) with the results summarized below and illustrated in Figure 4.
Lemma 1
Let be defined as above. The sequence of shaded images converge pointwise to the original line drawing and have the following properties on the derivatives as :

for every .

for .

There exists a constant such that for every .
The first and most important condition implies that the contour sides “pinch in” as . Thus an “ideal contour” can be seen as pointwise close to a shading pattern with the above derivatives. This leads us to define critical contours in the next section, which are nearly invariant shading patterns that mimic these artist’s strokes; afterward we will connect such critical contours to MS complexes.
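The “pinching in” behavior can be illustrated numerically on a single transversal cross-section. This is a sketch under an explicit assumption: the cross-section is modeled as an unnormalized Gaussian profile of unit peak height in the transversal coordinate `t`, which need not match the exact form of (12). The names `profile` and `second_derivative` are hypothetical.

```python
import math

def profile(t, sigma):
    """Assumed transversal cross-section: unit-height Gaussian of width sigma."""
    return math.exp(-t * t / (2.0 * sigma * sigma))

def second_derivative(t, sigma):
    """Analytic second derivative of the assumed profile with respect to t."""
    s2 = sigma * sigma
    return (t * t / (s2 * s2) - 1.0 / s2) * profile(t, sigma)

# As sigma shrinks, intensity off the contour (t = 0.3) vanishes, while the
# transversal second derivative on the contour (t = 0) grows like -1 / sigma^2.
for sigma in (0.4, 0.2, 0.1, 0.05):
    print(sigma, profile(0.3, sigma), second_derivative(0.0, sigma))
```

Under this assumption the on-contour intensity stays fixed while the off-contour intensity decays to zero, consistent with pointwise convergence to a line drawing, and the transversal curvature of the intensity becomes arbitrarily steep.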
5 Critical contours
We now define a critical contour, the visual pattern that is (nearly) invariant across the admissible rendering class defined above. This critical contour will have image derivatives similar to those calculated in Lemma 1 for the ideal contour.
Definition 3
A critical contour is a curve on an image such that the following conditions hold for all :

for every ,

for ,

for every ,
for positive .
For the remainder of the paper, let denote a critical contour with the conditions from Definition 3. As , K-critical contours converge pointwise to the ideal contour defined in (10). In Theorem 1, we show these K-critical contours are also 1-cells of the MS complex in any image obtained from a rendering function in our admissible class. In general, K-critical contours are “stronger” 1-cells that persist if the rendering function is changed. Note the condition above is stronger than the usual condition for intensity to be at a transversal maximum for differential geometric ridges [33].
Theorem 1
Let be any two rendering functions in the admissible cue class. Applying these rendering functions to a generic surface , we obtain two corresponding images . For any , there exists a such that the surface region corresponding to an neighborhood of a critical contour in contains an MS 1-cell for image .
To gain intuition for Theorem 1, consider the surface . Note that is a critical point of , and think of as a height function above a plane with normal vector . Now, define to be another height function from the surface defined by to a different plane with normal vector . In general, is not a critical point for . However, if and is large enough, then will have a critical point arbitrarily close to . Thus, and almost “share” critical points.
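As a worked instance of this intuition, consider the following sketch. The paraboloid, the curvature parameter $k$, the tilt angle $\theta$, and the names $h_1$, $h_2$ are assumptions chosen for illustration and are not necessarily the example intended in the text.

```latex
\begin{align*}
  % Assumed surface: a paraboloid with curvature parameter k > 0.
  S &= \{(x, y, -\tfrac{k}{2}(x^2 + y^2))\}, \\
  % Height above the plane with normal (0, 0, 1):
  h_1(x, y) &= -\tfrac{k}{2}(x^2 + y^2),
  & \nabla h_1 = 0 &\iff (x, y) = (0, 0), \\
  % Height above a plane with tilted normal (\sin\theta, 0, \cos\theta), 0 \le \theta < \pi/2:
  h_2(x, y) &= x \sin\theta - \tfrac{k}{2}(x^2 + y^2)\cos\theta,
  & \nabla h_2 = 0 &\iff (x, y) = \left(\tfrac{\tan\theta}{k},\, 0\right).
\end{align*}
```

In this assumed setting the two critical points are separated by $\tan\theta / k$, which tends to zero as $k$ grows for any fixed tilt $\theta < \pi/2$: the higher the curvature, the more closely the two height functions “share” a critical point.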
From the above example, it is plausible to believe that if certain curvatures of a surface are large enough, scalar fields on resulting from different projections of the normal field may share critical points. We generalize this to find when they also share MS 1-cells, a 1D analogue to critical points. See Figure 5. We now need to show that the presence of a critical contour implies sufficient curvature across the contour to support the above intuition.
The proof will follow in three steps (Figure 6). We will show that the presence of a critical contour implies the following “box structure” on in the unknown image function :
We now state Lemmas 2 and 3. Their proofs involve technical calculations via Taylor approximations, so we leave them to the appendix.
Lemma 2
Let be an endpoint of . Given a new rendering function , resulting image , and any , there exists a such that if , the following holds: such that and is a critical point of .
Lemma 3
Recall that is the transversal direction in to . Given a new rendering function , resulting image , and , there exists a such that if , the following holds: Define two curves and . On , and on , .
These two lemmas prove that the vector field behaves as shown in red in Figure 6(b) and that stationary points of are at and . It remains to show that this vector field constraint implies the integral line for inside .
Lemma 4
Let be a critical contour with the conditions from Definition 3. Given a new rendering function , resulting image , and , apply the previous two lemmas. We can find critical points of and two curves arbitrarily close to , as illustrated in Figure 6. Parametrize two line segments with the following properties:
Define the region as that bounded by the curves . Without loss of generality, every integral path of that intersects enters from a point on and leaves on a point on .
Proof
First, we assume that there are no critical points of in the interior of . If there are, bisect into and and repeat the following argument.
Let be the set of all points in on integral curves entering from points on . We say that an integral path enters from when there exists an such that and . Let be the set of all points in on integral curves entering from points on .
Clearly, . It suffices to show that one of the is empty. Suppose not; suppose . Being a tubular neighborhood of a curve , is a topologically connected space in . Thus, and must not be disjoint. There exists a point . As there are no critical points in , . For any , there exists an neighborhood of containing both an integral path and an integral path . However, an integral path is the solution to a differential equation with initial condition . For a Lipschitz continuous gradient field, there is continuous dependence of solutions on the initial conditions [88]. Thus, and must be arbitrarily close together, which yields a contradiction, as they go through points on opposite sides of .
We now have all the pieces to prove the main theorem. It remains to show that, given the conditions in the above lemma, there is an MS 1-cell of contained in .
Proof (of Theorem 1)
From Lemma 4, we see that all integral lines flow from a point on to a point on or vice versa. is a critical point on and is a critical point on . As the flow direction on points outward for all , the critical index of and can only differ by at most 1. Without loss of generality, has an incoming integral path starting from the other side of . Thus must be a saddle point and must be an MS 1-cell traversing .
Corollary 1
As , as in the case of our ideal contour in Lemma 1, a critical contour in any admissible image represents an MS cell in any other admissible image.
Proof
As , the tubular neighborhood of shrinks to zero width. The integral path must traverse and thus must eventually lie on .
This means that an ideal contour represents a visual commonality among all images of the surface . As the normal slant function is a member of our admissible rendering functions, an ideal contour also lies infinitely close to a surface property: an MS 1-cell of the slant function. A decomposition of the slant function into stable and unstable manifolds via its MS 1-cells is a representation of the surface (very similar to a concave/convex representation) that we are investigating further. Thus, we can now interpret an ideal image contour as a surface property that “shines through” in every image created by any of the rendering functions.
Corollary 2
For sufficiently large, a critical contour in image of surface aligns with an MS cell of the slant of the surface normal field of .
Proof
Define = as the rendering function corresponding to a Lambertian surface with light source in the view direction . Define as the image associated with the surface using this rendering function. As the slant of the normal field is a monotonic function of the image , it shares the same MS complex as . Apply the theorem to show that aligns with an MS 1-cell of .
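The step from "monotonic function" to "same MS complex" is an application of the chain rule. A brief sketch, in assumed notation that is not in the original ($f$ is a smooth scalar field and $g$ is a smooth function with $g' \neq 0$ on the range of $f$):

```latex
% Assumed notation: f is a smooth scalar field on the image plane,
% g is a smooth function of one variable with g' nonzero on the range of f.
\[
  \nabla (g \circ f) = g'(f)\, \nabla f ,
  \qquad
  \nabla^2 (g \circ f) = g'(f)\, \nabla^2 f + g''(f)\, \nabla f \, \nabla f^{\top} .
\]
```

Since $g'(f) \neq 0$, the gradient of $g \circ f$ vanishes exactly where the gradient of $f$ does, and elsewhere the two gradients are parallel, so the integral lines coincide as curves. At a critical point the second term of the Hessian vanishes, leaving $g'(f)\,\nabla^2 f$, so nondegeneracy is preserved. If $g' > 0$ the indices are unchanged; if $g' < 0$ the flow reverses, maxima and minima exchange roles, and saddles in two dimensions remain saddles, so the 1-cells are unchanged as point sets.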
6 The Morse–Smale complex on shading and slant
We now apply the above theory to a number of different shapes to illustrate how critical contours computed from a shaded image relate to the MS complex on the slant function of the surface. The results are ordered in complexity and correspond to Figures 7, 8, and 9.
A note on methodology. A 3D mesh was generated for each figure, which was then rendered under different conditions to produce each image. We use [85] to calculate the MS complex and consider persistence simplifications with few critical points from these images. Alternative ways to simplify the MS complex in a more salient manner are described in [87, 102]. We experimentally verified that MS 1-cells with large remain positionally stable across these images, as predicted by our theorem. We observe that, because the computations are run directly on quantized pixel values, there are certain numerical issues. We believe that results could be further improved, while also generalizing to nonsmooth images, by computing on oriented filter responses instead.
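To make the pixel-level setting concrete, the following is a minimal sketch of how critical points might be labeled directly on quantized pixel values. It is not the method of [85]; the function name `classify_pixel` and the 8-neighbor sign-change rule are illustrative assumptions, and ties between equal pixel values (one source of the numerical issues mentioned above) are handled only crudely.

```python
import numpy as np

# Offsets (row, col) of the 8 neighbours, in cyclic order around the centre.
RING = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]


def classify_pixel(img, r, c):
    """Label an interior pixel as 'max', 'min', 'saddle', or 'regular'.

    Hypothetical helper (not from the paper): walks once around the
    8-neighbour ring and counts how often the neighbours switch between
    'higher than the centre' and 'not higher than the centre'.
    Equal values are treated as 'not higher', so flat plateaus are
    reported as maxima; a real implementation needs a tie-breaking rule.
    """
    center = img[r, c]
    higher = [bool(img[r + dr, c + dc] > center) for dr, dc in RING]
    changes = sum(higher[i] != higher[(i + 1) % 8] for i in range(8))
    if changes == 0:
        # Every neighbour is on the same side of the centre value.
        return "min" if all(higher) else "max"
    if changes >= 4:
        # At least two separate 'higher' arcs and two 'lower' arcs.
        return "saddle"
    return "regular"


if __name__ == "__main__":
    ys, xs = np.mgrid[-2:3, -2:3]
    print(classify_pixel(-(xs**2 + ys**2), 2, 2))   # centre of a bump
    print(classify_pixel(xs**2 - 2 * ys**2, 2, 2))  # centre of a saddle
```

The sign-change count is a common discrete stand-in for the Morse index in two dimensions; persistence simplification and the tracing of 1-cells between saddles and extrema are deliberately left out of this sketch.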
The first example (Figure 7) consists of a large bump which, as the light source moves, illustrates the common critiques of flow-based approaches: large movement in the isophotes (and in the location of the maximum in intensity). Notice, however, that two of the MS 1-cells (blue curves) form a circle and remain fixed surrounding the bump: these have large ; i.e., these lie along the large bright-dark-bright transitions and are the critical contours. Note that there are other 1-cells and critical points; these do not satisfy the condition and so are irrelevant to the shape representation.
The second example is the furrow shape (Figure 8) shown from two views and with drastically different lightings. Notice how the isophotes move, how the maxima move, but how the critical contours remain stable.
The next example consists of images of a “blob” shape (Figure 9) constructed from random perturbations of a sphere. Notice again the stability of the critical contours, how these agree across lightings and for the slant function, and how these stable 1-cells correspond to the suggestive contours that were computed from the true 3D shape.
In our next example, we experimentally verify Corollary 2. In Figure 10, we overlay the MS complex for the horse image with the MS complex of the slant field. Note the correspondence between the red segmentation, blue segmentation, and suggestive contours, as predicted by our theory. On those curves where the two MS complexes are not in exact alignment, the value of is not sufficiently large. This indicates where the qualitative structure of the slant (of the normal field) can be immediately and robustly inferred from a shaded image via the MS complex.
A consequence of the global nature of the MS complex is that it provides a qualitative solution that segments the surface into salient parts as in Figure 11(b). There is a maximum on each of the four primary lobes, plus several others. The part regions surrounding these are delimited by MS 2-cells, as are the interior (less reliable) 2-cells. It is these interior 2-cells that will shift with the light sources.
A remaining question is how to quantitatively reconstruct a scalar field from only knowledge of its critical contours. This question, for the complete 1-cells, has been considered, for example, in [1, 101]. In Figure 11(b), we show, with a simple example for the furrow object, that the segmentation induced by the 1-cells of the MS complex can be sufficient for a qualitative understanding of the slant. We used the results from row 1 of Figure 8 and diffused the slant value from the occluding contour (where the slant is π/2) onto the critical contour. Then, we applied an inpainting algorithm [18] to “reconstruct” the scalar slant field. We admit that this is a simple example, but in [28], one can see more complex examples of how the graph structure of the MS complex can capture the essential phenomena of real-world data.
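The reconstruction step can be illustrated with a minimal sketch. This is not the inpainting algorithm of [18]; it swaps in plain harmonic (Laplace) interpolation, and the function name `harmonic_inpaint`, its parameters, and the toy input are all assumptions made for illustration. Values are held fixed on the pixels marked as known (standing in for the occluding and critical contours) and every other pixel is repeatedly replaced by the average of its four neighbors.

```python
import numpy as np


def harmonic_inpaint(values, known, n_iter=2000):
    """Fill unknown pixels by repeated 4-neighbour averaging (Jacobi iteration).

    values : 2D float array; only entries where `known` is True are used
    known  : 2D bool array marking pixels whose value is held fixed
    n_iter : number of averaging sweeps (assumed sufficient for small grids)
    """
    # Start the unknown pixels at the mean of the known values.
    u = np.where(known, values, values[known].mean()).astype(float)
    for _ in range(n_iter):
        # Replicating the border gives a zero-flux condition at the image edge.
        p = np.pad(u, 1, mode="edge")
        avg = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0
        # Known pixels keep their value; all others take the neighbour average.
        u = np.where(known, values, avg)
    return u


if __name__ == "__main__":
    # Toy input: value 0 fixed on the left column, value 1 on the right column.
    values = np.zeros((4, 5))
    values[:, -1] = 1.0
    known = np.zeros((4, 5), dtype=bool)
    known[:, 0] = True
    known[:, -1] = True
    print(np.round(harmonic_inpaint(values, known), 3))
```

On the toy input the fill is a linear ramp between the two fixed columns. Harmonic interpolation cannot create interior extrema away from the constrained pixels, which is one reason a contour-based constraint set can suffice for a qualitative picture; Jacobi iteration is chosen here for brevity, not speed.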
7 Conclusion
We seek a biologically plausible approach to 3D inferences in which an imaged surface is represented via a set of (isophote) contours or equivalent flows. This allows us to separate those portions of the flow that are stable from those that wander, with respect to lighting and the image formation process. We believe this will result in a more robust, nearly invariant approach. To achieve this, we have defined critical contours, computable from the image, with two important characteristics. First, we showed that they are stable over changes in the rendering function. Second, they relate to the MS complex of the surface slant. This allows us to interpret critical contours as boundaries of surface features. Thus, the critical contours are part of a meaningful segmentation of the surface shared by almost all our admissible renderings. (As , “almost” becomes “all.”) It is these stable contours with which we hope to transition from a local (individual gradients) representation to a more global (unions of bumps) representation.
Further, using the MS complex reveals relationships between shading inferences and shape-from-sketching, under the same model. Certain (e.g., isotropic) textures may also allow for a similar analysis, when based on estimated foreshortening rather than intensity. In addition, the invariance of the MS complex to monotonic transformations relates to psychophysical observations seen in [23, 99]. Modeling with the MS complex allows us to assign meaning to individual contours as, e.g., the boundary of a “bump.” By seeking this much weaker surface structure than, e.g., a 3D mesh, we hope to avoid most of the ill-posedness inherent in the 3D reconstruction problem.
We are focusing on two future directions. First, critical contours rarely completely segment the image, as the MS complex may also have unstable 1-cells. Thus, we are pursuing methods to find the “nearest segmentation” to a set of critical contours to complete the complex. Second, we are analyzing the qualitative conclusions that can be drawn from a full segmentation. A constraint labeling problem arises, namely, which contours bound locally convex parts and which bound locally concave parts.
To summarize, we are arguing for the use of critical contours in 3D shape reconstruction from shading and contours. These critical contours are part of the MS complex of the image and give a shape description that is stable, qualitative, and meaningful. Therefore, reconstruction algorithms explicitly using these image features should be more stable while also explicitly capturing important surface features. We believe studying these topological properties will aid our understanding of how the human visual system is able to see a veridical 3D shape under complex and noisy renderings.
8 Appendix
8.1 Comparison of critical contours and suggestive contours
Suggestive contours are contours (drawn from the surface mesh) that illustrate shape [17]. Their generating equation (notation from [17]) is , where is the view direction and is the view direction projected onto the tangent plane. They are the set of minima of in direction . The slant function is simply the angle between and and so is an transformation of . We note that extremal curves are invariant under strictly monotonic transformations via the chain rule, so the suggestive contours can also be seen as maxima of the slant function in the direction .
Under orthographic projection, it is a simple calculation to see that is proportional to the surface gradient (or tilt direction) projected onto the image plane. Thus, under orthographic projection, we can rewrite the generating equation for suggestive contours as the set of points satisfying .
We compare to a critical contour; these are the 1-cells of by Theorem 1, yet computable from the image. We see that suggestive contours depend on whether points in the gradient direction, whereas critical contours depend on the global properties of the field.
8.2 Proof of Lemma 1
Lemma
Let be defined as in section 4. The sequence of shaded images converges pointwise to the original line drawing and has the following properties on the derivatives as :

for every .

for .

There exists a constant such that for every .
Proof
Start from (12):
(13)  
(14) 
which defines a sequence of shaded images so that . We now compute image derivatives, up to second order, along our contour for these shaded images, . Taking the limit as , we will inherit derivatives in the limit for .
We differentiate in the tangent direction:
(15)  
(16) 
For each point on , the limit as is as expected. (That is, the image derivative along the contour is the limit of the derivatives along the shading approximations to the contour.)
We repeat the same process for the remainder of the derivatives up to second order and get
(17)  
(18)  
(19)  
(20) 
We also would like the image derivatives at the endpoints of . To calculate these approximations, we apply a Heaviside step function to the endpoints of the curve . To make the integral feasible, we Taylor approximate the intensity on the contour up to second order:
(21) 
for some constants . This “approximation” becomes exact as . If we are at an endpoint and we move in the positive tangent direction, the image intensity is defined by this Taylor expansion. If we move in the negative tangent direction, the image intensity is zero. For example,
(22)  
(23)  
(24) 
As we require the contour intensity to be at the endpoint, we can set and calculate the limit of as to get .
We can also calculate the other image derivatives at the endpoint . (Note that the other endpoint, , is just the mirror version; we use instead of .)
(25)  
(26)  
(27)  
(28) 
These calculations are consolidated into Lemma 1.
8.3 Proof of Lemmas 2 and 3
We first prove Lemma 5, which allows us to bound terms in a Taylor expansion of and from image derivatives at a point.
Lemma 5
Assume the generic and rendering function assumptions in section 3 hold. Let be a point on a critical contour . If , then is bounded. Similarly, and imply is bounded.
Proof
(29)  
(30) 
By genericity property 2, we must have bounded in Frobenius norm . We see here that this prevents an infinitesimal change in the rendering function (that is, ) resulting in an unbounded change in the image gradient .
We repeat the same argument for , taking one further derivative and leaving off the subscript for clarity:
(31)  
(32)  
(33) 
On the left-hand side, is bounded as each of are bounded. Applying the second generic property, we see that generically is also bounded.
8.3.1 Proof of Lemma 2
Lemma
Let be an endpoint of the critical contour with the conditions from Definition 3. Given a new rendering function , resulting image , and any , there exists a such that if , the following holds: such that and is a critical point of .
Proof
We will consider the image of the normal field on the Gauss sphere in the neighborhood of . The main idea is that if a differentiable function has a large enough gradient at a point , then it has a zero inside a neighborhood of . We take two derivatives of the equation to get the following equation of tensors:
(34) 
To calculate , we replace both with and let , where is any point on . By the concave rendering function assumption,
(35) 
If , then
(36) 
As by the rendering function assumptions in Definition 1,
(37) 
Similarly, for , we get
(38) 
Recall that is the differential of the second rendering function. We expand the operator with a first order multivariate Taylor expansion of around . For example, the derivative of in the direction is .
(39) 
From (37) and (38), we know that and are sufficiently large; we want to find a such that is precisely 0.
We use the first generic property to assume that the span of three vectors contains at least two linearly independent ones. This implies that and are not parallel vectors and thus they span a plane . Generically, contains an intersection point with the unknown vector . That intersection point defines a satisfying .
The above matrix equation (40) represents two equations:
(41)  
(42) 
We note that and