Critical Contours: An Invariant Linking Image Flow with Salient Surface Organization

05/20/2017 ∙ by Benjamin S. Kunsberg, et al. ∙ Brown University, Yale University

We exploit a key result from visual psychophysics -- that individuals perceive shape qualitatively -- to develop a geometrical/topological invariant (the Morse-Smale complex) relating image structure with surface structure. Differences across individuals are minimal near certain configurations such as ridges and boundaries, and it is these configurations that are often represented in line drawings. In particular, we introduce a method for inferring qualitative 3D shape from shading patterns that link the shape-from-shading inference with shape-from-contour. For a given shape, certain shading patches become "line drawings" in a well-defined limit. Under this limit, and invariantly, these shading patterns provide a topological description of the surface. We further show that, under this model, the contours partition the surface into meaningful parts using the Morse-Smale complex. Critical contours are the (perceptually) stable parts of this complex and are invariant over a wide class of rendering models. Intuitively, our main result shows that critical contours partition smooth surfaces into bumps and valleys, in effect providing a scaffold on the image from which a full surface can be interpolated.


1 Introduction

Mathematically it is well known that the problem of inferring shape from shading information—or from contour information—is ill-posed: there exist many possible surfaces that could give rise to (almost) any type of image structure. In our everyday experience, however, we unconsciously “solve” this inverse problem routinely; we readily and effortlessly infer three-dimensional (3D) shape from ambiguous image information. This presents a huge conundrum for theorists: does there exist an invariant that could ground inferences about surface structure on the many different types of image structures and, if so, how might this invariant be used by both brains and machines? In this paper we answer the first part of this question in the affirmative, motivated by two aspects of the second part.

Our approach is motivated by an important (but frequently overlooked) property of human perception: different individuals (or the same individual at different times) perceive quantitatively different but qualitatively similar surfaces—not identical ones—from either shading or contour information (references in the background section below). We take this property to be key: the goal is not to find a unique map between an image and a surface, but rather to identify an equivalence class structure, i.e., to identify which classes of images are consistent with which classes of surfaces. The identification map is then at this abstract level. As we shall show, there are parts of images that do indeed provide a kind of scaffold from which (parts of) surfaces can be reconstructed. We call this scaffold critical contours; in this paper we define them, prove that they are part of a global description, and characterize their invariance over different rendering models. Basically, we show that critical contours in the image are nearly equivalent to critical contours of the surface (slant) function.

We concentrate on image ridges, a geometrical construct that has been important in vision for some time. Viewing the image as a height function, ridges seem intuitively connected to image edges [33, 19], especially those that arise within the interior of a shape [32]. For similar reasons, ridges also relate to features of surface relief [51]. But, to our knowledge, image ridges and surface relief have not been formally identified with one another, except for specific rendering models (e.g., Lambertian). We establish this connection generically; see Figure 1.

Figure 1: Qualitative descriptions of 3D shapes from unknown cues are stable. Row 1: A random shape with different lightings, a slight rotation, and a nonmonotonic transformation as in [24]. Row 2: Isophote patterns generally vary for each corresponding case above yet are stable in some positions (e.g., along the sides of the protrusion). Rows 3, 4: Similar to Rows 1, 2, but with a horse model. Right: Suggestive contours [17] for the same models. Note the similarity between the stable flow patterns and the suggestive contour positions. The method we introduce is inspired by such line drawing images. We will develop this visual commonality (critical contours) using vector field topology.

The development proceeds in three main steps. We first exploit ridge structure to motivate a limiting process that characterizes how shading distributions concentrate into contours. Second, when the shading distribution orthogonal to the contour is sufficiently “steep,” it becomes what we define as a critical contour. These are special contours that capture the edge connection alluded to above and resemble artists’ line drawings. Building on the surface relief view, they form part of a global, topological network that separates “hills” from “dales.” Formally this network comprises the Morse–Smale (MS) complex on the image [31] and is built with integral curves through the (image) gradient flow that connect maxima, saddles, and minima in a prescribed manner. The MS idea has a rich history in geology [70, 11] but in the modern form is based on singularities of gradient flows [91, 72]. This has three important consequences: (i) it allows a principled (global) simplification process to remove insignificant “bumps” [20]; (ii) the flows ground the computations in physiologically meaningful terms [38]; and (iii) it allows the contours to be interpreted as boundaries of surface parts.

Finally, we show that critical contours are part of the MS complex in a generic sense. Since the natural world is hardly Lambertian, we consider a general class of rendering functions that require little more than dependence on the surface normal (or tangent plane at that point). We provide a completeness theorem: if there is a critical contour in the image for a given surface with one rendering function in the class, then there is a critical contour in the image for every rendering function in this class and, furthermore, they coincide under the limiting process mentioned above. This third result has an unexpected implication: Since the surface slant function [93] is in our class of rendering functions, critical contours from the image and critical contours from the surface slant are, in the limit above, equivalent. Thus we relate image-derived properties directly to surface properties.

Since the MS complex is global, there is a shared partition between the image and the surface. By identifying certain contour inferences with shading inferences, it also constrains how the surface can be “filled in” between the image contours. But it does not reduce the reconstructed surface to a singleton. In effect, by working between the geometry of ridges and the topology of surfaces, we are able to find a foundation for qualitative surface inferences. A roadmap for our approach is shown later in Figure 3; it is described in more detail following the background review.

2 Background

Standard computational approaches to shape-from-shading rely on either imposing strong priors (on light sources [94], reflectance models [58], etc.) or imposing a form of regularization tied to a reflectance model (reviews in [36, 10, 82, 105]; see Mach [66] for the original); in either case the goal is a single, unique surface from among the different possibilities [77, 89]. This remains problematic: Recent attempts are brittle for related reasons—they work for some images but not others, largely because of reliance on artificial reflectance models, bas relief priors [78], a delicate combination of regularization terms [4], or training on restricted scene classes [96, 22, 12]. We move beyond this brittleness by seeking a qualitative solution.

2.1 Surface perception is qualitative

Results in visual psychophysics question the goal of seeking a unique surface. While subjects tend to agree on the overall shape, constancy is elusive and percepts differ quantitatively; see [95, 74, 68, 73, 97, 21, 13, 90, 15, 44, 53]. Remarkably, this lack of constancy holds even for special shapes such as cylinders (but see [35, 54]). One possibility is that there are different operational modes [95]; another is that priors are applied selectively to advantage [23]. We propose that the solution is topological in nature.

The idea of different modes is consistent with the view, prominent in computer vision, that shape-from-contour is a separate problem from shape-from-shading: Contours are one-dimensional entities, while shading is a two-dimensional distribution of intensities. For contours the emphasis tends to be on junctions [100] (the places where surfaces join) to make sure that the surfaces “fit” together properly; see [5, 92, 67, 39]. Again, research in visual psychophysics provides a challenge to this separate view; mutual influences between contours and shading are well documented [84, 98]. We shall show that well-defined, salient, and stable contours can arise out of shaded images; thus both contours and shading play the role of defining surface parts. In our view, shape-from-shading and shape-from-contour are deeply related inverse problems; another of our contributions is to show how they are related via a limiting process.

While the inference problem works in the inverse direction, the relationship between shading and contour in the forward direction is well established. Nonphotorealistic rendering algorithms are used as visualization techniques for given surfaces [61, 80], which shows how rich surface information can be conveyed to a viewer; this is not unlike what artists draw [14]. For example, “suggestive contours” [17] are built from the loci of points (in the image) where the object almost occludes itself when computed from the surface; see also [42, 87] for related forward computations. Our critical contours, which work for the inverse problem, are related to, but not identical with, suggestive contours. More details on the relationship can be found in the appendix.

Folds in material are a prominent example of when suggestive contours are useful (see da Vinci’s notebooks and [62, 29, 43]); we exploit the fact that folds have rather structured shading across them [56]. They tend to occur along extended anisotropic curvature regions of the shape [60] and are related to ridges in computer vision. Detecting these areas of rapid change in image intensity is a well-studied problem [33, 65], although the characterization often remains local in terms of differential geometry [32] or singularity theory [16]. But this local characterization may not yield globally connected patterns, and “small” ridges are no different from “steep” ones. Many have explored a multiscale approach to deal with these difficulties [63, 81, 30, 19]; our work exploits a topological multiscale idea.

The transition to global patterns from local descriptors is necessary, and the classical work on “hills and dales” [70, 11, 86] provides a way forward. It considers the integration of vector fields, and applications to describing waterways [75, 3] and images [64, 52, 30, 7, 65, 51] abound. Putting these together, the idea is to view the image as a “landscape,” with “height” proportional to intensity; “water” then flows from the peaks to the valleys. But viewing the image as a landscape does not provide a formal connection back to the underlying surface from which it was rendered. Our critical contours, also computed from the image, will relate directly to the image landscapes sought by these ridge detectors but will further have a connection to the underlying surface.

We now appeal directly to the psychophysical observation above: perception is qualitatively similar but not quantitatively identical across subjects [95, 74, 68, 73, 76, 97, 21, 13, 90, 15, 44, 53, 103], among many others. To us “qualitative” implies that we should be seeking the family of surfaces that is “locked down” by the image. Since global, qualitative constructions are the domain of topology, it is here that the MS complex [31, 20] is central. We review the MS complex in the next section; later we shall show that the critical contours are 1-cells of the MS complex of the shading function with high transversal intensity changes. It follows, then, that they are also 1-cells of the slant (foreshortening) function on the surface.

2.2 The Morse–Smale complex

Our goal is to find patterns in the image, computable from orientations and invariant to a large class of rendering functions, that “anchor” the ill-posed shape-from-shading problem in a qualitative manner. We were inspired by representations of the phase space in nonlinear dynamics and wish to understand how particular contours on the image can constrain global, qualitative shape. For this, we use the MS complex, which formalizes an intuition from Koenderink (and Picasso): that “vision grasps shape as a hierarchical structure of elliptic patches” [50]. As in the watershed algorithms referenced above, the gradient flow is used to assign different regions of the domain to critical points; 2D contours separate these domains into monotonic regions (called 2-cells); these regions are then the “parts” of the shape. Importantly, associated with the MS complex is persistence simplification, a principled way to collapse critical points (equivalently, merge the watershed regions) to create a hierarchy [31]. This is the multiscale component of our approach. This introduction is necessarily brief. More complete treatments can be found in [72, 26, 27, 69, 8] and, for motivation, see [71]. See Figure 2 for an illustration.

Given a 2-manifold $M$, consider a smooth scalar function $f : M \to \mathbb{R}$. (Later, we will consider $M = \mathbb{R}^2$ and $f$ as an image of a surface.) The gradient $\nabla f$ exists at every point. A point $p$ is called a critical point when $\nabla f(p) = 0$. The function $f$ is a Morse function if all its critical points are nondegenerate (meaning the Hessian at those points is nonsingular) and if no two critical points have the same function value.

The gradient field gives a direction at every point in the image, except for the critical points, a set of measure zero. Following the vector field will trace out an integral line. Precisely, an integral line is a maximal path on the image whose tangent vectors agree with $\nabla f$ at every point of the path. These integral lines must end at critical points, where the gradient direction is undefined. Thus, one can define an origin and a destination for each integral line. Further, for each critical point, its ascending manifold is defined as the union of integral lines having that critical point as a common origin. Similarly, its descending manifold is the union of integral lines with that critical point as a common destination.
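As an illustration of how an integral line can be traced in practice, the following minimal sketch follows the gradient uphill with plain Euler steps until the gradient nearly vanishes. The two-bump test function, the step size, and the tolerance are hypothetical choices made for this example only; they are not taken from the paper.

```python
import math

def f(x, y):
    # Hypothetical test function: two Gaussian bumps with maxima near (+1, 0) and (-1, 0).
    return math.exp(-((x - 1) ** 2 + y ** 2)) + math.exp(-((x + 1) ** 2 + y ** 2))

def grad_f(x, y):
    # Analytic gradient of f.
    e1 = math.exp(-((x - 1) ** 2 + y ** 2))
    e2 = math.exp(-((x + 1) ** 2 + y ** 2))
    return (-2 * (x - 1) * e1 - 2 * (x + 1) * e2, -2 * y * (e1 + e2))

def integral_line(x, y, step=0.1, tol=1e-8, max_steps=10000):
    """Approximate the integral line through (x, y), followed uphill.

    Uses Euler steps along the gradient and stops once the gradient
    (nearly) vanishes, i.e. close to a critical point (the destination).
    """
    path = [(x, y)]
    for _ in range(max_steps):
        gx, gy = grad_f(x, y)
        if math.hypot(gx, gy) < tol:
            break
        x, y = x + step * gx, y + step * gy
        path.append((x, y))
    return path

# Starting to the right of the symmetry axis, the line ends near the right maximum.
path = integral_line(0.5, 0.8)
destination = path[-1]
```

Starting points on either side of the line x = 0 reach different maxima, which is the sense in which the flow assigns regions of the domain to critical points.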

The type of each critical point is defined by its index: the number of negative eigenvalues of the Hessian at that point. For scalar functions on a 2-manifold, there are only three types: a maximum (with index 2), a minimum (with index 0), and a saddle point (with index 1). MS functions satisfy an additional transversality condition and are dense in the set of continuous functions. For these, the integral lines only connect critical points of differing index. The ascending manifold associated with a critical point of index $i$ is of dimension $2 - i$. Similarly, the descending manifold for an index $i$ critical point is of dimension $i$.
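The index-based classification can be sketched directly from the entries of the 2x2 Hessian. The function names and the degeneracy threshold below are illustrative assumptions, not part of the paper.

```python
import math

def hessian_index(fxx, fxy, fyy, eps=1e-12):
    """Count the negative eigenvalues of the symmetric Hessian [[fxx, fxy], [fxy, fyy]]."""
    mean = 0.5 * (fxx + fyy)
    radius = math.hypot(0.5 * (fxx - fyy), fxy)
    eigenvalues = (mean - radius, mean + radius)
    if any(abs(ev) < eps for ev in eigenvalues):
        # A singular Hessian means the critical point is degenerate (not Morse).
        raise ValueError("degenerate critical point: Hessian is singular")
    return sum(1 for ev in eigenvalues if ev < 0)

def critical_point_type(fxx, fxy, fyy):
    # Index 0 -> minimum, index 1 -> saddle, index 2 -> maximum.
    return {0: "minimum", 1: "saddle", 2: "maximum"}[hessian_index(fxx, fxy, fyy)]

# Example: f(x, y) = x*y has Hessian [[0, 1], [1, 0]] at the origin.
kind = critical_point_type(0.0, 1.0, 0.0)
```

The closed-form eigenvalues avoid any dependency on a linear algebra library; for larger Hessians a numerical eigensolver would be the natural substitute.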

For two critical points $p$ and $q$, with the index of $p$ one greater than the index of $q$, consider the intersection of the descending manifold of $p$ with the ascending manifold of $q$. This intersection will be either a 1D manifold (a curve called a 1-cell or watershed) or the empty set. For two critical points $p$ and $q$, with the index of $p$ two greater than the index of $q$, the intersection of the descending manifold of $p$ with the ascending manifold of $q$ will either be a 2D manifold (a region called a 2-cell) or the empty set. Thus, the intersection of all ascending manifolds with all descending manifolds partitions the manifold into 2D regions surrounded by 1D curves with intersections at the critical points.
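A crude discrete analogue of this partition can be sketched on a pixel grid: each pixel is labeled by the local maximum reached by steepest ascent and the local minimum reached by steepest descent, and pixels sharing both labels form an approximate 2-cell. This is a simplified stand-in for a proper MS complex algorithm; the grid size, the 8-neighbor walk, and the two-bump height function are assumptions made for the example, and grid boundaries introduce artificial minima.

```python
import math

def height(x, y):
    # Hypothetical "landscape": two Gaussian bumps.
    return math.exp(-((x - 1) ** 2 + y ** 2)) + math.exp(-((x + 1) ** 2 + y ** 2))

def make_grid(n=41, lo=-2.0, hi=2.0):
    # grid[r][c] samples height at x = xs[c], y = xs[r].
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return [[height(x, y) for x in xs] for y in xs]

def follow(grid, r, c, sign):
    """Walk to the best 8-neighbor (uphill if sign=+1, downhill if sign=-1) until stuck."""
    n_rows, n_cols = len(grid), len(grid[0])
    while True:
        best = (r, c)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    if sign * grid[rr][cc] > sign * grid[best[0]][best[1]]:
                        best = (rr, cc)
        if best == (r, c):
            return best  # a discrete local extremum
        r, c = best

def two_cells(grid):
    """Group pixels by the (maximum, minimum) pair their flow lines connect."""
    cells = {}
    for r in range(len(grid)):
        for c in range(len(grid[0])):
            key = (follow(grid, r, c, +1), follow(grid, r, c, -1))
            cells.setdefault(key, []).append((r, c))
    return cells

cells = two_cells(make_grid())
```

Every pixel lands in exactly one group, so the groups partition the grid; the boundaries between groups play the role of the 1-cells.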

The MS complex is a structure that relates a set of contours to a qualitative function representation. With knowledge only of the scalar function at the critical points and 1-cells, one could reconstruct the 2-cells (and thus the entire function) relatively accurately. For some insight, see [1, 101]. The positions, heights, and boundaries of all the bumps, dimples, and ridges are already known, and the only choices left are how steep to make the transitioning 2-cells in between. Thus, there is a natural connection between the scalar function restricted to the 1D curves (the salient 1-cells) and the scalar function on the entire domain (the unknown 2-cells). Our main theorem will show that particular 1-cells of the image will be nearly invariant under changes in the rendering function.

Figure 2: Illustration of the MS complex for a scalar function in two dimensions. (left column) A “mountain range” seen from the side and from above. Contours are level sets in height. If this scalar function was image intensity, the level curves would be isophotes. Colored regions represent 2-cells of the MS complex. White curves represent 1-cells (contours) of the MS complex. Maxima, saddles, and minima are represented by solid blue points, crosses, and solid white points, respectively. Each 2-cell is combinatorially a quadrilateral, but it is possible that the saddles may be identified together creating a loop. This figure illustrates how a set of 1D contours can represent the 2D surface, up to monotonic transformations on each region. Figure from [31].

2.3 Biological considerations

Since our approach is partly motivated by psychophysical considerations, we also consider the underlying physiology; Connor [104, Figure 2] shows the sensitivity of higher-level neurons to ridge-like structure. Our concern here is how this process can get started. Orientation-selective neurons in the visual cortex are the natural substrate for representing shading information as a flow pattern [49, 9, 6], and it clearly relates to the gradient flow above. Shape-from-shading-flow computations have been analyzed [55], and research supports it [25, 45, 34, 2]. But Todd [97, 21], among others, has questioned (forcefully and, to us, in an influential way) whether isophotes and shading flows suffice, because the isophote pattern changes significantly for different renderings and lightings of the same object (cf. Figure 1, row 2), but our perception hardly varies (with regard to shape). If our percepts were based on the isophotes alone, then they too, Todd argues, should change. But isophotes change more in some places than others, and the conditions in our shading limit proposition identify precisely those locations where the isophote structure remains invariant. Anchoring the shape reconstruction on the locations where the isophote structure is consistent could explain how it is possible for brains to make robust (but qualitative) inferences about shape in three dimensions. Neural responses should be robust around critical contours, but not necessarily elsewhere, which implies that different positions are represented differently. (Earlier computational approaches treat all positions as equals.)

2.4 Overview

Figure 3: Overview of our approach, starting in the upper left corner. Suppose we are given an image of a surface created via an unknown rendering function. Classical shape-from-shading methods attempt to infer, from this pixel representation, a unique surface. We follow the vertical path and identify those image features that will correspond to surface features. These are the critical contours that delimit “image parts.” Critical contours are invariant to the (unknown) rendering function. We show specifically how shading “concentrates” into such contours; we call this the shading-contour limit. This allows us to move to “critical contours,” defined in section 5, and Corollary 2 allows us to interpret these critical contours as surface curves with important properties. We arrive at “surface 1-cells,” which correspond to qualitative parts of the surface (bumps, valleys, ridges, etc.); these can be completed back into a scalar field such as surface slant, which leads, finally, to a surface exemplar in the upper right.

An overview of our argument is as follows (Figure 3). Starting in the upper left corner, we are given an image of a surface created via an unknown rendering function. It might be a shaded image, a line drawing, etc. Classical shape-from-shading methods compute directly from this pixel representation and must confront this ill-posed problem (dotted right arrow). Instead, we proceed to first identify image features (that will correspond to surface features) that are invariant to the (unknown) rendering function; this is shown as “image parts.” For shading functions, these image parts are abstracted to stylized lines in section 4; the relationship is summarized in Lemma 1. This allows us to move to “critical contours,” defined in section 5. Then, Corollary 2 allows us to interpret these critical contours as surface curves with important properties (1-cells of the slant MS complex), so that we arrive at “surface 1-cells.” These surface 1-cells correspond to boundaries of qualitative parts of the surface (bumps, valleys, ridges, etc.) and function as a kind of scaffold on which the surface can be “built.” Various inpainting or diffusion algorithms could complete this scaffold of 1-cells back into a scalar field; for an example, see Figure 11. The completion is not unique, which relates directly to perception; the scaffold is the qualitative invariant on which different subjects build their quantitative percept. Thus we arrive at a surface exemplar in the upper right.

The qualitative nature of the solution we are proposing for shape perception bears some resemblance to the categorical parts and necks that can be inferred from the medial axis (or skeleton) [46, 40], but our scheme focuses on interior lines and shading distributions rather than bounding contours. Importantly, just as the medial axis can be used to structure grasping of objects [83], our critical contours may suffice for this as well. When reaching to grasp an object, we preform our hands to reflect the pose and extension of the handle [41, 79]. For both people and robots, then, qualitative properties seem to suffice. In effect, we get global constraints by integrating local conditions on contours; evidence is beginning to accumulate that such segmentations induce psychophysical limits [48].

3 Image formation and assumptions

We now describe the image formation process. Consider an orthogonal projection model and define the 3D coordinate axes so that $(x, y)$ parametrizes the image plane and $z$ is the view direction. Let $\{e_1, e_2, e_3\}$ represent the standard basis (unit vectors in these cardinal directions). We think of the image as being created from a “cue” or, more precisely, by applying a rendering function $f$ to the normal field $N$ of a smooth surface $S$. That is,

\begin{align}
I &: \mathbb{R}^2 \to \mathbb{R}, \tag{1} \\
I(x, y) &= f(N(x, y)). \tag{2}
\end{align}

Many familiar cues have this structure. For example, Lambertian shading is equivalent to $f(N) = \langle L, N \rangle$ for diffuse light sources $L$. The spatial frequency cue of isotropic texture (once ideally blurred) is monotonically related to . Specular shading is equivalent to for specular light sources and some constant . We seek image contours that are always present independent of the choice of $f$.
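The structure "image = rendering function applied to the normal field" can be sketched in a few lines. The example assumes the surface is a graph z = h(x, y), uses a Gaussian bump as a hypothetical surface, and takes an unclamped Lambertian dot product as the rendering function; the function names are illustrative only.

```python
import math

def surface_normal(hx, hy):
    """Unit normal of the graph z = h(x, y), given the partial derivatives hx, hy."""
    norm = math.sqrt(1.0 + hx * hx + hy * hy)
    return (-hx / norm, -hy / norm, 1.0 / norm)

def lambertian(normal, light):
    # Lambertian rendering function: dot product of the normal with the light direction.
    return sum(n * l for n, l in zip(normal, light))

def bump_gradient(x, y):
    # Partial derivatives of the hypothetical surface h(x, y) = exp(-(x^2 + y^2)).
    h = math.exp(-(x * x + y * y))
    return (-2 * x * h, -2 * y * h)

def render(height_gradient, rendering_function, xs, ys):
    """Sample the image I(x, y) = f(N(x, y)) on a grid (rows indexed by y)."""
    return [[rendering_function(surface_normal(*height_gradient(x, y))) for x in xs]
            for y in ys]

xs = [i / 10 for i in range(-10, 11)]
frontal_light = (0.0, 0.0, 1.0)  # light along the view direction
image = render(bump_gradient, lambda n: lambertian(n, frontal_light), xs, xs)
```

Swapping in a different `rendering_function` changes the image while the normal field stays fixed, which is the situation the invariance question is about.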

For theoretical analysis, we consider rendering functions so that gradient fields are Lipschitz continuous. The differential of (2) yields

\begin{align}
dI_{(x,y)} &= df_{N(x,y)} \circ \mathbf{dN}_{(x,y)} \tag{3} \\
&= df_{N(x,y)} \, \mathbf{dN}_{(x,y)} \tag{4} \\
&= df \, \mathbf{dN}. \tag{5}
\end{align}

To go from (3) to (4), note that function composition here is matrix multiplication. To simplify notation, we also drop the point of application of the differentials. (A comment on notation: we will use bold lettering to denote tensors and matrices, while vectors will be in standard lettering. For an introduction to this tensor notation, see the appendix in [35].)

Here, $dI$ is the 1-form corresponding to dot product with the gradient $\nabla I$. $df$ is a vector and $\mathbf{dN}$ is a matrix. The image gradient orientation, that is, the angle of the vector $\nabla I$, is generally dependent on both the surface through the operator $\mathbf{dN}$ and the material/cue through the operator $df$. In the most general scenario, where $f$ has no constraint, we cannot constrain $\mathbf{dN}$ from the data $\nabla I$. Thus, we will now put some weak limits on the rendering function, the surface, and their interaction.
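The chain-rule relationship between the image gradient, the differential of the rendering function, and the differential of the normal field can be checked numerically. The sketch below uses a hypothetical nonlinear rendering function (a squared Lambertian term), a hypothetical light direction, and a Gaussian bump as the surface; all three are assumptions for illustration.

```python
import math

LIGHT = (0.3, 0.2, 0.933)  # hypothetical light direction (approximately unit length)

def normal(x, y):
    # Normal field of the hypothetical surface z = exp(-(x^2 + y^2)).
    h = math.exp(-(x * x + y * y))
    hx, hy = -2 * x * h, -2 * y * h
    s = math.sqrt(1 + hx * hx + hy * hy)
    return (-hx / s, -hy / s, 1 / s)

def f(n):
    # Hypothetical rendering function: squared dot product with the light.
    d = sum(a * b for a, b in zip(LIGHT, n))
    return d * d

def df(n):
    # Differential of f with respect to the normal (a 3-vector).
    d = sum(a * b for a, b in zip(LIGHT, n))
    return tuple(2 * d * l for l in LIGHT)

def dN(x, y, h=1e-5):
    """3x2 Jacobian of the normal field, estimated by central differences."""
    nxp, nxm = normal(x + h, y), normal(x - h, y)
    nyp, nym = normal(x, y + h), normal(x, y - h)
    return [((nxp[i] - nxm[i]) / (2 * h), (nyp[i] - nym[i]) / (2 * h)) for i in range(3)]

def image_gradient_chain(x, y):
    # Image gradient as the product of the row vector df and the matrix dN.
    row, jac = df(normal(x, y)), dN(x, y)
    return tuple(sum(row[i] * jac[i][j] for i in range(3)) for j in range(2))

def image_gradient_direct(x, y, h=1e-5):
    # Image gradient by differentiating I(x, y) = f(N(x, y)) directly.
    image = lambda u, v: f(normal(u, v))
    return ((image(x + h, y) - image(x - h, y)) / (2 * h),
            (image(x, y + h) - image(x, y - h)) / (2 * h))

chain = image_gradient_chain(0.4, -0.3)
direct = image_gradient_direct(0.4, -0.3)
```

The two estimates agree up to finite-difference error, so the image gradient mixes a cue-dependent factor with a surface-dependent factor, as the text describes.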

3.1 Rendering function assumptions

Definition 1

The admissible cue class is the set of differentiable rendering functions satisfying the following two criteria:

  1. Bounded variation: there exists a such that for all rendering functions and .

  2. is concave and is bounded: There exists a constant s.t. for all .

We elaborate on the conditions. The bounded variation condition ensures that arbitrarily large changes in the image cannot be due to the rendering function alone but must also require some change in the normal field. Without a constraint on the rendering function such as this one, we could not distinguish between gradients due to material changes (such as a painting) and gradients due to natural shading changes. There is perceptual evidence supporting this condition. If an image feature is to be seen as “shading,” it must have generally low contrast. Very high contrast features are often seen as material changes [34]. The concave condition ensures that if the unit sphere were imaged with a rendering function, we would see only one highlight (point of maximum brightness). Note that this condition also relates to “cloudy day” [58, 57] illumination, where the aperture function plays the role of, or is aided by, the surface normal; this also eliminates [37]. Although our rendering function is quite general, it is designed to facilitate our analysis; some light field effects may not be included [59, 47].

Thus, a given rendering function creates an image of a given imaged surface . Suppose we now choose a new rendering function (e.g., by changing the light source) to get a new image of the same surface. Our main theorem describes an important commonality between these images. A registration correspondence exists between these two images, and , but we will not focus on describing or calculating it here. Instead, to simplify analysis, we will regard the second image as a new scalar field on the same coordinate system: we will prove things regarding .

We now restrict to generic interactions between the rendering function and imaged surface.

3.2 Generic surface assumptions

An image of a surface rarely completely describes the surface. Since shape reconstruction is generally ill-posed, surfaces can collude with rendering functions to create images that hide surface features. We seek to remove these rare cases via assumptions here. For example, in Lambertian shading, the image is a projection of the surface normal field onto an unknown direction (given by the light source), so variation in the normal field in directions perpendicular to the light source will have no effect on a local image patch: e.g., a Lambertian right circular cone with a light source directly above would create a constant intensity image. In this case, the variation of the normal field can be arbitrarily large while the image gradient is 0. We wish to avoid cases like these.
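The cone example can be verified with a short computation. The sketch assumes the cone is the graph z = -a·sqrt(x² + y²) with its axis along the view direction and the light placed along that same axis; it then tilts the light by a hypothetical small angle to show that the degeneracy disappears.

```python
import math

def cone_normal(x, y, a=1.0):
    """Unit normal of the cone z = -a * sqrt(x^2 + y^2), away from the apex."""
    r = math.hypot(x, y)
    hx, hy = -a * x / r, -a * y / r
    s = math.sqrt(1 + a * a)
    return (-hx / s, -hy / s, 1 / s)

def lambertian(n, light):
    return sum(p * q for p, q in zip(n, light))

def intensity_range(light, samples=36, radius=1.0):
    """Min and max Lambertian intensity on a circle of points around the cone axis."""
    values = []
    for k in range(samples):
        t = 2 * math.pi * k / samples
        n = cone_normal(radius * math.cos(t), radius * math.sin(t))
        values.append(lambertian(n, light))
    return min(values), max(values)

# Light along the cone axis: the normal rotates around the axis, the intensity does not change.
lo_axis, hi_axis = intensity_range((0.0, 0.0, 1.0))

# Light tilted by 0.1 radians: the same normal variation now shows up in the image.
lo_tilt, hi_tilt = intensity_range((math.sin(0.1), 0.0, math.cos(0.1)))
```

With the light on the axis the intensity spread is zero, while the slightly tilted light produces a clearly nonzero spread, which is why such configurations are treated as non-generic.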

A slight change of light source or viewpoint would make the curvature of the cone visible and would therefore lead to large changes in the image. Thus, we use the term “generic” to represent assumptions that remove rare or unstable configurations. These unstable configurations often vanish with a small perturbation of scene parameters, i.e., light source or viewpoint changes. There are two forms of generic that we will assume for our setup:

  1. Given a curve on a surface , we assume that the three column vectors of the unfolded tensor ,

    contain at least two that are linearly independent.

  2. Let represent the acute angle between two vectors in . Given a curve on a surface and rendering function , we assume that there exists an such that for all ,

    (6)
    (7)
    (8)
    (9)

    That is, the rendering function’s differential does not happen to align with certain differential properties of the surface normal field. Configurations that violate these assumptions have arbitrarily small measure in the space of continuous configurations. (This removes the Lambertian cone example.) Experimentally, violations of these conditions are rare.

Of course, there are other obstacles to 3D shape reconstruction that we are ignoring, such as multiple scattering, partial occlusions, or textured objects. We are instead focused on understanding the stability and geometric meaning of image contours derived from shading. We now define this relationship.

4 The contour interpreted as a shading limit

Figure 4: We inherit the meaning of a contour by considering it as a limit of shaded images. (a) From left to right, we start with a blurred version of the lower left contour from Figure 1, drawn in blue. Isophotes are in green. As we remove the blurring, we represent an element of the shading sequence of the contour with successively smaller standard deviation. (b) We show isophotes from a Lambertian shaded image of the surface. Note the similarity of the two isophote patterns near the contour: on either side of the contour the direction of the level curves rotates to be nearly tangential to the contour. Also, note how the contour (blue) is nearly a gradient flow of the shaded image. The dotted red line represents a transversal direction, with plotted pixel values in (c). Note the steep local shading minimum across the contour; this relates to the definition of height ridge found in [63].

We seek a visual pattern that is present across many views and many renderings. We are inspired by artist sketches, where a collection of thin strokes on paper inspires a surface perception. Eventually we will think of these strokes as a robust skeleton for describing part boundaries implied by a shaded image. To understand what is the physical meaning (constraint on the viewed surface) inherent in each stroke, we start by investigating their differential properties.

Consider a line drawing image as a collection of 1D contours. Focusing on one contour, assume it has bounded image (planar) curvature. For each point on the contour, we have a scalar intensity value; we require this value to be 0 at the endpoints. Without loss of generality, let the contour be arc-length parametrized.

Definition 2

An ideal 1D contour , , can be expressed as a scalar field in the following way:

(10)
(11)

We wish to understand the behavior of the image derivatives of the ideal contour but, since it is discontinuous, the derivatives do not exist. However, we can approximate these derivatives by considering the ideal contour as the limit of a sequence of shaded images on a tubular neighborhood of the contour. We define each shaded image as a convolution of the ideal contour with Gaussian functions of successively smaller standard deviation, in the following manner.

For every point on , we parametrize the local neighborhood with two directions. For convenience, write . Define to be the transversal direction at the point , so . is an orthonormal basis. Let be the corresponding coordinate functions. As we are only interested in the limiting behavior and as has bounded curvature, we realign the frame so that is at the origin and define

(12)

We can now calculate image derivatives of in the directions (see the appendix) with the results summarized below and illustrated in Figure 4.

Lemma 1

Let be defined as above. The sequence of shaded images converge pointwise to the original line drawing and have the following properties on the derivatives as :

  1. for every .

  2. for .

  3. There exists a constant such that for every .

The first and most important condition implies that the contour sides “pinch in” as . Thus an “ideal contour” can be seen as pointwise close to a shading pattern with the above derivatives. This leads us to define, in the next section, critical contours: nearly invariant shading patterns that mimic an artist’s strokes. Afterward we will connect such critical contours to MS complexes.
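The pinching behavior can be checked numerically. The minimal sketch below assumes a Gaussian cross-section c * exp(-v^2 / (2 sigma^2)) for the blurred contour, with constant intensity c along the contour; this profile, the function names, and the finite-difference step are illustrative assumptions and are not taken from (12). Under that assumption the transversal second derivative on the contour equals -c / sigma^2, so its magnitude grows without bound as the blur is removed.

```python
import numpy as np

def blurred_contour_profile(v, sigma, c=1.0):
    """Assumed cross-section of a blurred contour at transversal offset v."""
    return c * np.exp(-v**2 / (2.0 * sigma**2))

def transversal_second_derivative(sigma, c=1.0):
    """Central finite difference of the profile at v = 0 (on the contour)."""
    h = 1e-3 * sigma  # step scaled with sigma so the stencil resolves the peak
    f = lambda v: blurred_contour_profile(v, sigma, c)
    return (f(h) - 2.0 * f(0.0) + f(-h)) / h**2

# Successively smaller blur: the second derivative scales like -1 / sigma^2.
sigmas = [1.0, 0.5, 0.25, 0.125]
second_derivs = [transversal_second_derivative(s) for s in sigmas]
for s, d in zip(sigmas, second_derivs):
    print(f"sigma = {s:5.3f}   I_vv on contour = {d:10.3f}")
```

The sign of the second derivative depends on whether the stroke is modeled as brighter or darker than its surround; only the growth in magnitude matters for this illustration.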

5 Critical contours

We now define a critical contour, the visual pattern that is (nearly) invariant across the admissible rendering class defined above. This critical contour will have image derivatives similar to those calculated in Lemma 1 for the ideal contour.

Definition 3

A -critical contour is a curve on an image such that the following conditions hold for all :

  1. for every ,

  2. for ,

  3. for every ,

for positive .

For the remainder of the paper, let denote a -critical contour with the conditions from Definition 3. As , K-critical contours converge pointwise to the ideal contour defined in (10). In Theorem 1, we show these K-critical contours are also 1-cells of the MS complex in any image obtained from a rendering function in our admissible class. In general, K-critical contours are “stronger” 1-cells that persist if the rendering function is changed. Note the condition above is stronger than the usual condition for intensity to be at a transversal maximum for differential geometric ridges [33].

Theorem 1

Let be any two rendering functions in the admissible cue class. Applying these rendering functions to a generic surface , we obtain two corresponding images . For any , there exists a such that the surface region corresponding to an -neighborhood of a -critical contour in contains an MS -cell for image .

To gain intuition for Theorem 1, consider the surface . Note that is a critical point of , and think of as a height function above a plane with normal vector . Now, define to be another height function from the surface defined by to a different plane with normal vector . In general, is not a critical point for . However, if and is large enough, then will have a critical point arbitrarily close to . Thus, and almost “share” critical points.
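A small numerical version of this intuition is sketched below. It uses a parabolic profile z = -K x^2 / 2 as a stand-in surface and a second plane tilted by an angle theta; both are our own illustrative choices and need not match the surface used in the text. The height above the tilted plane is x sin(theta) + z cos(theta), whose critical point sits at x = tan(theta) / K and therefore approaches the original critical point x = 0 as K grows.

```python
import math

def tilted_height(x, K, theta):
    """Height of the profile z = -K x^2 / 2 above a plane tilted by theta."""
    z = -0.5 * K * x**2
    return x * math.sin(theta) + z * math.cos(theta)

def tilted_critical_point(K, theta):
    """Solve d/dx tilted_height = sin(theta) - K x cos(theta) = 0 for x."""
    return math.tan(theta) / K

# The critical point of the tilted height function drifts toward x = 0
# (the critical point of the untilted height) as the curvature K increases.
offsets = {K: tilted_critical_point(K, theta=0.3) for K in (1, 10, 100, 1000)}
for K, x_star in offsets.items():
    print(f"K = {K:5d}   critical point at x = {x_star:.6f}")
```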

Figure 5: In a surface region with anisotropic curvature (the two principal curvatures differ vastly), the image gradient flows are robust as we change the light source. Each curve represents an MS -cell (critical contour) on the image corresponding to the light source with the same color. As we move the light source from A to B, the integral path shifts a small amount.


Figure 6: We have drawn the tubular neighborhood with critical contour as a straight line for simplicity. We show there exists an MS -cell of the new image given the conditions in Definition 3 on . Lemmas 2 and 3 show the existence of and with the property that is outward facing, shown as the red vectors pointing outside . Theorem 1 concludes that there must be an integral path of , shown as , inside and this must be a -cell of .

From the above example, it is plausible that if certain curvatures of a surface are large enough, scalar fields on resulting from different projections of the normal field may share critical points. We generalize this to find when they also share MS 1-cells, a 1D analogue of critical points. See Figure 5. We now need to show that the presence of a critical contour implies sufficient curvature across the contour to support the above intuition.

The proof will follow in three steps (Figure 6). We will show that the presence of a -critical contour implies the following “-box structure” on in the unknown image function :

  1. In Lemma 2, we show there exist critical points that are -close to the endpoints of .

  2. In Lemma 3, we show there exist two curves in the -tubular neighborhood where the gradient points away from . These two lemmas give the “-box structure” on .

  3. In Lemma 4, we show contains a 1-cell, shown as . This is proven by first showing, without loss of generality, that all integral paths flow from left to right. Then, either or must be a saddle and there is an integral path that traverses connecting to at least one of them, proving Theorem 1.

We now state Lemmas 2 and 3. Their proofs involve technical calculations via Taylor approximations, so we leave them to the appendix.

Lemma 2

Let be an endpoint of . Given a new rendering function , resulting image , and any , there exists a such that if , the following holds: such that and is a critical point of .

Lemma 3

Recall that is the transversal direction in to . Given a new rendering function , resulting image , and , there exists a such that if , the following holds: Define two curves and . On , and on , .

These two lemmas prove that the vector field behaves as shown in red in Figure 6(b) and that stationary points of are at and . It remains to show that this vector field constraint implies the integral line for inside .
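The box structure can be illustrated on a toy scalar field. The sketch below assumes the field f(u, v) = 3u^2 - 2u^3 + K v^2 / 2 on a straight contour v = 0 with endpoints at u = 0 and u = 1; this field, the value of K, and the Euler integrator are our own assumptions and are not part of the proof. Its gradient vanishes at both endpoints (a minimum at u = 0 and a saddle at u = 1), and away from the contour the transversal component K v points outward.

```python
import numpy as np

K = 50.0  # assumed transversal curvature of the toy field

def grad(p):
    """Gradient of f(u, v) = 3u^2 - 2u^3 + K v^2 / 2."""
    u, v = p
    return np.array([6.0 * u * (1.0 - u), K * v])

def trace(p0, direction, n_steps, dt=0.01):
    """Euler integration of an integral path (+1 ascends, -1 descends)."""
    p = np.array(p0, dtype=float)
    for _ in range(n_steps):
        p = p + direction * dt * grad(p)
    return p

# Descending from just beside the saddle stays on the contour and ends
# near the minimum: this path plays the role of the 1-cell.
along = trace((0.999, 0.0), direction=-1.0, n_steps=1000)

# Ascending from a point slightly off the contour is pushed outward,
# mirroring the outward-facing gradient on the two bounding curves.
away = trace((0.5, 0.01), direction=+1.0, n_steps=20)

print("end of path along contour:", along)
print("end of path started off contour:", away)
```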

Lemma 4

Let be a -critical contour with the conditions from Definition 3. Given a new rendering function , resulting image , and , apply the previous two lemmas. We can find critical points of and two curves arbitrarily close to , as illustrated in Figure 6. Parametrize two line segments with the following properties:

Define the region as that bounded by the curves . Without loss of generality, every integral path of that intersects enters from a point on and leaves on a point on .

Proof

First, we assume that there are no critical points of in the interior of . If there are, bisect into and and repeat the following argument.

Let be the set of all points in on integral curves entering from points on . We say that an integral path enters from when there exists an such that and . Let be the set of all points in on integral curves entering from points on .

Clearly, . It suffices to show that one of the is empty. Suppose not; suppose . Being a tubular neighborhood of a curve , is a topologically connected space in . Thus, and must not be disjoint. There exists a point . As there are no critical points in , . For any , there exists an neighborhood of containing both an integral path and an integral path . However, an integral path is the solution to a differential equation with initial condition . For a Lipschitz continuous gradient field, there is continuous dependence of solutions on the initial conditions [88]. Thus, and must be arbitrarily close together, which yields a contradiction, as they go through points on opposite sides of .

We now have all the pieces to prove the main theorem. It remains to show that, given the conditions in the above lemma, there is an MS 1-cell of contained in .

Proof (Proof of Theorem 1)

From Lemma 4, we see that all integral lines flow from a point on to a point on or vice versa. is a critical point on and is a critical point on . As the flow direction on points outward for all , the critical index of and can only differ by at most 1. Without loss of generality, has an incoming integral path starting from the other side of . Thus must be a saddle point and must be an MS 1-cell traversing .   

Corollary 1

As , as in the case of our ideal contour in Lemma 1, a -critical contour in any admissible image represents an MS -cell in any other admissible image.

Proof

As , the tubular neighborhood of shrinks to zero width. The integral path must traverse and thus must eventually lie on .

This means that an ideal contour represents a visual commonality among all images of the surface . As the normal slant function is a member of our admissible rendering functions, an ideal contour also lies infinitely close to a surface property: an MS 1-cell of the slant function. A decomposition of the slant function into stable and unstable manifolds via its MS 1-cells is a representation of the surface (very similar to a concave/convex representation) that we are investigating further. Thus, we can now interpret an ideal image contour as a surface property that “shines through” in every image created by any of the rendering functions.

Corollary 2

For sufficiently large, a -critical contour in image of surface aligns with an MS -cell of the slant of the surface normal field of .

Proof

Define = as the rendering function corresponding to a Lambertian surface with light source in the view direction . Define as the image associated with the surface using this rendering function. As the slant of the normal field is a monotonic function of the image , it shares the same MS complex as . Apply the theorem to show that aligns with an MS 1-cell of .

6 The Morse–Smale complex on shading and slant

We now apply the above theory to a number of different shapes to illustrate how critical contours computed from a shaded image relate to the MS complex on the slant function of the surface. The results are ordered in complexity and correspond to Figures 7, 8, and 9.

A note on methodology. A 3D mesh was generated for each figure and then rendered under different conditions to produce each image. We use [85] to calculate the MS complex and consider persistence simplifications with few critical points from these images. Alternative ways to simplify the MS complex in a more salient manner are given in [87, 102]. We experimentally verified that MS 1-cells with large remain positionally stable across these images, as predicted by our theorem. Because the computations are run directly on quantized pixel values, certain numerical issues arise. We believe that the results could be further improved, while also generalizing to nonsmooth images, by computing on oriented filter responses instead.
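The software of [85] is not reproduced here. As a rough illustration of the first ingredient of such a computation, the sketch below classifies an interior pixel as a minimum, maximum, saddle, or regular point by counting how often the sign of (neighbor - center) changes around its 8-neighbor ring. The function name, the ring ordering, and the handling of ties (treated as "not higher") are assumptions made for this example; extracting 1-cells and applying persistence simplification are not attempted.

```python
import numpy as np

# The 8 neighbours of a pixel, listed in cyclic order as (row, col) offsets.
RING = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]

def classify_pixel(img, r, c):
    """Classify interior pixel (r, c) of a 2D array by its neighbour ring."""
    center = img[r, c]
    higher = [bool(img[r + dr, c + dc] > center) for dr, dc in RING]
    # Number of higher/lower alternations when walking once around the ring.
    changes = sum(higher[i] != higher[(i + 1) % 8] for i in range(8))
    if changes == 0:
        return "minimum" if higher[0] else "maximum"
    if changes == 2:
        return "regular"
    return "saddle"

ys, xs = np.mgrid[-2:3, -2:3]
print(classify_pixel(xs**2 - 2 * ys**2, 2, 2))   # centre of a saddle-shaped patch
print(classify_pixel(-(xs**2 + ys**2), 2, 2))    # centre of a bump
print(classify_pixel(xs + 0 * ys, 2, 2))         # linear ramp
```

On quantized images, plateaus of equal pixel values make this test ambiguous, which is one way the numerical issues mentioned above can show up.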

The first example (Figure 7) consists of a large bump which, as the light source moves, illustrates the common critiques of flow-based approaches: large movement in the isophotes (and in the location of the maximum in intensity). Notice, however, that two of the MS 1-cells (blue curves) form a circle and remain fixed surrounding the bump: these have large ; i.e., these lie along the large bright-dark-bright transitions and are the critical contours. Note that there are other 1-cells and critical points; these do not satisfy the condition and so are irrelevant to the shape representation.

The second example is the furrow shape (Figure 8) shown from two views and with drastically different lightings. Notice how the isophotes move, how the maxima move, but how the critical contours remain stable.

The next example consists of images of a “blob” shape (Figure 9) constructed from random perturbations of a sphere. Notice again the stability of the critical contours, how these agree across lightings and for the slant function, and how these stable 1-cells correspond to the suggestive contours that were computed from the true 3D shape.

Figure 7: Top: A slightly perturbed sigmoid rotated around the z-axis. The color indicates the absolute value of the Gaussian curvature. An arrow points to a turquoise band which is centered along a contour of near zero Gaussian curvature; this is a -cell of the MS complex of the slant function. Under a wide class of rendering functions (as described above), the resulting shaded image will contain a -cell along this band. First row: From left to right, the first two images are Lambertian shaded renderings of the above rotated sigmoid with different light sources. The third image is a specular rendering. The fourth image is the slant function. Second row: Corresponding MS complexes to the images above along with isophotes in red. Blue arcs correspond to the -cells. Yellow, green, and red points correspond to maximum, saddle, and minimum critical points. Notice the blue common circular contour (which consist of unions of -cells).
Figure 8: A second example that shows the commonality between some -cells over both large light source changes and large view changes. Row , first column: A “furrow shape” image with a sketched red contour showing the critical contour. Row , columns 1–3: The furrow shape lit from three directions. Row , column 4: True slant. Row 2: The MS -cells with critical points (minima in red, saddles in green, maxima in yellow) corresponding to the images in the first row. Third and fourth rows: Analogous to rows and , with a different viewpoint.
Figure 9: First row, columns 1–3: Lambertian images of a blob under different rendering functions. Light source differences across images are at least degrees. First row, column 4: The true slant function. First row, column 5: Sketched critical contours. Second row, columns 1–4: Corresponding MS complexes to above images. Second row, column 5: For contrast, the suggestive contours (in perspective projection) [17] for the same surface. The extra suggestive contour in upper right is not seen in orthographic projection.

In our next example, we experimentally verify Corollary 2. In Figure 10, we overlay the MS complex for the horse image with the MS complex of the slant field. Note the correspondence between the red segmentation, blue segmentation, and suggestive contours, as predicted by our theory. On those curves where the two MS complexes are not in exact alignment, the value of is not sufficiently large. This indicates where the qualitative structure of the slant (of the normal field) can be immediately and robustly inferred from a shaded image via the MS complex.


Figure 10: Ideal line drawings, as modeled by suggestive contours [17], relate to the MS complex of -cells of both the image and slant scalar functions. In particular, lines are often drawn at the 1D intersections of these two MS complexes. (a) The suggestive contours by [17]. (b) The persistence simplified -cells of the image function (red) and persistence simplified -cells of the slant function (blue). In cases where the two MS complexes don’t exactly align, the value of is not large enough.

A consequence of the global nature of the MS complex is that it provides a qualitative solution that segments the surface into salient parts as in Figure 11(b). There is a maximum on each of the four primary lobes, plus several others. The part regions surrounding these are delimited by MS 2-cells, as are the interior (less reliable) 2-cells. It is these interior 2-cells that will shift with the light sources.

A remaining question is how to quantitatively reconstruct a scalar field from only knowledge of its critical contours. This question, for the complete 1-cells, has been considered, for example, in [1, 101]. In Figure 11(b), we show a simple example for the furrow object that the segmentation induced by 1-cells of the MS complex can be sufficient for a qualitative understanding of the slant. We used the results from row 1 of Figure 8 and diffused the slant value from the occluding contour (where the slant is ) onto the critical contour. Then, we applied an inpainting algorithm [18] to “reconstruct” the scalar slant field. We admit that this is a simple example, but in [28], one can see more complex examples of how the graph structure of the MS complex can capture the essential phenomena of real-world data.
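The diffuse-then-inpaint idea can be sketched in a few lines. The example below is not the algorithm of [18]; it assumes a plain harmonic (Laplace) fill computed by Jacobi iteration, with values held fixed wherever they are known (for instance on the occluding contour and on a critical contour). The function name, the iteration count, and the edge-replicating boundary treatment are all assumptions made for illustration.

```python
import numpy as np

def harmonic_inpaint(values, known, n_iter=2000):
    """Fill unknown pixels so that each equals the mean of its 4 neighbours.

    values : 2D float array, meaningful only where `known` is True
    known  : 2D bool array marking pixels whose values are held fixed
    """
    u = np.where(known, values, values[known].mean()).astype(float)
    for _ in range(n_iter):
        p = np.pad(u, 1, mode="edge")  # replicate the border pixels
        avg = 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])
        u = np.where(known, values, avg)  # re-impose the known values
    return u

# Toy check: fix the left column to 0 and the right column to 1.
values = np.zeros((5, 9))
values[:, -1] = 1.0
known = np.zeros((5, 9), dtype=bool)
known[:, 0] = True
known[:, -1] = True
filled = harmonic_inpaint(values, known)
print(np.round(filled[2], 3))
```

In this toy case the fill is a linear ramp between the two fixed columns. For a slant field, the fixed pixels would be the occluding contour and the critical contours.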


Figure 11: (a), (b) This figure illustrates how a surface is segmented by the full MS complex for the shading into salient parts. Notice the four major lobes, pointing outward like the ends of an “X,” plus some interior parts. Crisp maxima in intensity signal the four dominant lobes. The middle maximum signals an interior part. (c) True slant function (upper left). Remaining three figures: Sample slant reconstructions from images in row of Figure 8. We used a linear inpainting algorithm [18] and knowledge of the position of red critical contour corresponding to a -cell in each of the images in Figure 8. We also used slant information at the occluding boundary, where the normal must be perpendicular to the view direction. Note the strong similarity between the slant reconstructions even though the original images are pointwise very different.

7 Conclusion

We seek a biologically plausible approach to 3D inferences in which an imaged surface is represented via a set of (isophote) contours or equivalent flows. This allows us to separate those portions of the flow that are stable from those that wander, with respect to lighting and the image formation process. We believe this will result in a more robust, nearly invariant approach. To achieve this, we have defined critical contours, computable from the image, with two important characteristics. First, we showed that they are stable over changes in the rendering function. Second, they relate to the MS complex of the surface slant. This allows us to interpret critical contours as boundaries of surface features. Thus, the -critical contours are part of a meaningful segmentation of the surface shared by almost all our admissible renderings. (As , “almost” becomes “all.”) It is these stable contours with which we hope to transition from a local (individual gradients) representation to a more global (unions of bumps) representation.

Further, using the MS complex reveals relationships between shading inferences and shape-from-sketching, under the same model. Certain (e.g., isotropic) textures may also allow for a similar analysis, when based on estimated foreshortening rather than intensity. In addition, the invariance of the MS complex to monotonic transformations relates to psychophysical observations seen in [23, 99]. Modeling with the MS complex allows us to assign meaning to individual contours as, e.g., the boundary of a “bump.” By seeking this much weaker surface structure than, e.g., a 3D mesh, we hope to avoid most of the ill-posedness inherent in the 3D reconstruction problem.

We are focusing on two future directions. First, critical contours rarely segment the image completely, as the MS complex may also have unstable 1-cells. Thus, we are pursuing methods to find the “nearest segmentation” to a set of critical contours to complete the complex. Second, we are analyzing the qualitative conclusions that can be drawn from a full segmentation. A constraint labeling problem arises, namely, which contours bound locally convex parts and which bound locally concave parts.

To summarize, we are arguing for the use of critical contours in 3D shape reconstruction from shading and contours. These critical contours are part of the MS complex of the image and give a shape description that is stable, qualitative, and meaningful. Therefore, reconstruction algorithms explicitly using these image features should be more stable while also explicitly capturing important surface features. We believe studying these topological properties will aid our understanding of how the human visual system is able to see a veridical 3D shape under complex and noisy renderings.

8 Appendix

8.1 Comparison of critical contours and suggestive contours

Suggestive contours are contours (drawn from the surface mesh) that illustrate shape [17]. Their generating equation (notation from [17]) is , where is the view direction and is the view direction projected onto the tangent plane. They are the set of minima of in direction . The slant function is simply the angle between and and so is an transformation of . We note that extremal curves are invariant under strictly monotonic transformations via the chain rule, so the suggestive contours can also be seen as maxima of the slant function in the direction .

Under orthographic projection, it is a simple calculation to see that is proportional to the surface gradient (or tilt direction) projected onto the image plane. Thus, under orthographic projection, we can rewrite the generating equation for suggestive contours as the set of points satisfying .

We compare to a critical contour; these are the 1-cells of by Theorem 1, yet computable from the image. We see that suggestive contours depend on whether points in the gradient direction, whereas critical contours depend on the global properties of the field.

8.2 Proof of Lemma 1

Lemma

Let be defined as in section 4. The sequence of shaded images converges pointwise to the original line drawing and has the following properties on the derivatives as :

  1. for every .

  2. for .

  3. There exists a constant such that for every .

Proof

Start from (12):

(13)
(14)

which defines a sequence of shaded images so that . We now compute image derivatives, up to second order, along our contour for these shaded images, . Taking the limit as , we will inherit derivatives in the limit for .

We differentiate in the tangent direction:

(15)
(16)

For each point on , the limit as is as expected. (That is, the image derivative along the contour is the limit of the derivatives along the shading approximations to the contour.)

We repeat the same process for the remainder of the derivatives up to second order and get

(17)
(18)
(19)
(20)

We also would like the image derivatives at the endpoints of . To calculate these approximations, we apply a Heaviside step function to the endpoints of the curve . To make the integral feasible, we Taylor approximate the intensity on the contour up to second order:

(21)

for some constants . This “approximation” becomes exact as . If we are at an endpoint and we move in the positive tangent direction, the image intensity is defined by this Taylor expansion. If we move in the negative tangent direction, the image intensity is zero. For example,

(22)
(23)
(24)

As we require the contour intensity to be at the endpoint, we can set and calculate the limit of as to get .

We can also calculate the other image derivatives at the endpoint . (Note that the other endpoint, , is just the mirror version; we use instead of .)

(25)
(26)
(27)
(28)

These calculations are consolidated into Lemma 1.

8.3 Proof of Lemmas 2 and 3

We first prove the following lemma, Lemma 5, that allows us to bound terms in a Taylor expansion of and from image derivatives at a point.

Lemma 5

Assume the generic and rendering function assumptions in section 3 hold. Let be a point on a -critical contour . If , then is bounded. Similarly, and imply is bounded.

Proof
(29)
(30)

By genericity property 2, we must have bounded in Frobenius norm . We see here that this prevents an infinitesimal change in the rendering function (that is, ) resulting in an unbounded change in the image gradient .

We repeat the same argument for , taking one further derivative and leaving off the subscript for clarity:

(31)
(32)
(33)

On the left-hand side, is bounded as each of are bounded. Applying the second generic property, we see that generically is also bounded.

8.3.1 Proof of Lemma 2

Lemma

Let be an endpoint of the -critical contour with the conditions from Definition 3. Given a new rendering function , resulting image , and any , there exists a such that if , the following holds: such that and is a critical point of .

Proof

We will consider the image of the normal field on the Gauss sphere in the neighborhood of . The main idea is that if a differentiable function has a large enough gradient at a point , then it has a zero inside a neighborhood of . We take two derivatives of the equation to get the following equation of tensors:

(34)

To calculate , we replace both with and let , where is any point on . By the concave rendering function assumption,

(35)

If , then

(36)

As by the rendering function assumptions in Definition 1,

(37)

Similarly, for , we get

(38)

Recall that is the differential of the second rendering function. We expand the operator with a first order multivariate Taylor expansion of around . For example, the derivative of in the direction is .

(39)

From (37) and (38), we know that and are sufficiently large; we want to find a such that is precisely 0.

We use the first generic property to assume that the span of three vectors contains at least two linearly independent ones. This implies that and are not parallel vectors and thus they span a plane . Generically, contains an intersection point with the unknown vector . That intersection point defines a satisfying .

Define . It remains to show . Recall that is a bounded vector by Lemma 5. From (39),

(40)

The above matrix equation (40) represents two equations:

(41)
(42)

We note that and