Carpentered wooden furniture and objects abound in our everyday lives, from stools to bookshelves to trays for carrying food. Imagine, though, that you find a piece you like at a friend’s house but are unable to find it for purchase, or perhaps you want a slightly different version of it, wider or with changes to the curves in its shape. Suppose, then, that you could simply take some photos of the piece, run some software, and generate a CAD model that represents how the object was made (one that is easy to edit, if desired), and then build it just by cutting parts out of sheets of wood and assembling them, which you could even do yourself. In this paper, we tackle exactly this problem: taking as input a set of images of a carpentered object, we generate a CAD model that is ready for building a replica of the object or an edited version of it.
Typical methods for object capture from images perform structure-from-motion to recover camera poses and sparse scene points, followed by multi-view stereo to densify the reconstruction. The result is usually a point cloud or, with some processing, a dense triangle mesh, exhibiting noise and incomplete coverage. These representations are far from being complete, concise, editable CAD models and further tell us nothing about how to cut and assemble parts to build the model.
In this work, we propose a novel direction for recovering representations of fabricated objects: describing the fabrication process itself. By framing this as a reverse engineering problem in a specific fabrication domain, we introduce constraints that significantly reduce the search space of viable 3D models. In particular, we operate on carpentered objects consisting of parts that are cut from sheets of wood and then connected together. The space of fabrication instructions in this domain is still highly expressive, covering a variety of everyday objects, as we will show, while also adhering to the real-world constraints governing the construction process, so that the output is by definition ready to build.
This reverse engineering problem introduces its own set of challenges: arriving at a fabricable solution requires identifying the parts and optimizing for their precise shapes as well as the part-to-part connections constraining those shapes. This mixture of discrete and continuous degrees of freedom makes for a challenging optimization problem. To make it more tractable, we propose a multi-stage algorithm in which we first select the initial geometry and positions of parts in the assembled object, then progressively detect assembly constraints (i.e., connections between parts), and finally refine the geometry subject to these new constraints. The input images, captured by simply walking around an object and taking photos with a smartphone, guide this process at multiple stages. First, we use the images to recover a multi-view stereo point cloud that, though incomplete, drives the initial CAD part recovery. Second, the images provide evidence of seams (discontinuities in appearance) that indicate how different pieces of wood fit together when the connections are ambiguous based on geometry alone. Finally, by rectifying the images to each part plane, we can co-segment the part faces to obtain more accurate contours, i.e., cut paths for fabrication. These co-segmented contours, extracted at the pixel level, are not concise and may not respect assembly constraints, so we additionally propose an algorithm to find a simple parametric boundary that accurately represents the cut path of each part while respecting contact constraints between parts.
Our key contributions are:
A fabrication-aware pipeline for selecting a plausible part structure representing the input object
An algorithm for recovering contacts between connected parts using geometric and image evidence
A method for extracting cut paths representing part shapes using multiple image views
A method for incorporating assembly constraints into fitting of regularized, parametric contours to imperfect data
We note that, since our approach is based on features observed in images, we assume that the carpentered objects are textured, which is typical for wood that is unfinished or varnished, but not painted a uniform color. Further, though we don’t require complete reconstruction of the surface, we do require that the wood sheet surface of each part be at least partially visible. Finally, we restrict the class of objects reconstructed to those that can be assembled with parts cut from sheets of wood. We demonstrate results on a variety of objects of different size and complexity, and show the efficacy of our method by fabricating two of our results, along with edited versions.
2 Related Work
Reverse Engineering (RE) aims to recover CAD models from measured data [buonamici2018reverse], concisely representing the various parts and geometric features using parametric primitives and surfaces. RE methods can be labeled according to their target CAD representation, and these representations contain varying levels of structure. We can classify them as low-level (volumetric models [chivate1993solid], meshes [xu2011photo]), mid-level (primitives [li2019supervised], B-Rep surfaces [benkHo2001algorithms], surface patch-based representations [eck1996automatic], procedural shape structures [du2018inversecsg]) and high-level (parametric CAD models [smirnov2019deep], and multi-component 3D models with geometric constraints [xu2016interactive, Lau2011FabPartsConnectors]). The distinction we make between mid- and high-level representations is that high-level representations contain both full geometric information and additional structure relating to the semantics of the object, such as shape parameters corresponding to degrees of freedom in the design and relationships between assembled 3D parts. While mid-level representations such as primitives, B-Reps, surface patches, and CSG trees may capture the geometry more concisely than low-level ones, they do not inherently encode these global semantics. Since our target is a CAD model complete with parts and connections that would allow us to physically reproduce the object, we focus on high-level representations. For a survey of works related to optimizing rigid assemblies of parts, see wang2021stateoftheart.
One approach to reverse engineering is to retrieve plausible objects from a shape library and align them with the input. Avetisyan2019CVPR and lim2013parsing detect and align representative objects from a known database with an input 3D scan or image, respectively. schulz2017retrieval and uy2020deformation proposed strategies for deformation-aware retrieval of single CAD shapes. xu2011photo present an interactive method for using predefined 3D models as templates to deform to match an input photograph, and huang2015single retrieves individual parts from a small 3D model database to reverse engineer the object from a single image. Because these methods rely on a database containing examples that sufficiently resemble the query object, they do not generalize to unseen classes of objects. While we recognize the need to limit the scope so that the problem remains tractable, we do so by restricting the fabrication process rather than the types of allowed objects. This admits a variety of inputs beyond what can be memorized in a database, so retrieval-based methods are not suited to our task.
Classical Reverse Engineering
Obtaining CAD models from measured data is a well-studied problem. Many works concerned with reconstructing CAD models follow a similar strategy, with the common features being segmentation of the input point cloud/mesh, fitting analytical surfaces to the segmented regions, and stitching the disjoint surfaces into a complete CAD model buonamici2018reverse, reverseeng2012. The goal of these methods is accurate recovery of CAD surface primitives, which lack the high-level assembly information that we require. Nevertheless, the techniques of feature-based reverse-engineering are widely useful tools for inferring geometry from dense 3D data; we build our model in part based on detected plane and cylinder features.
Interactive Reverse Engineering
In solving the difficult inverse problem of reverse engineering, some works allow for user interaction to resolve ambiguities and provide hints for reconstruction. chen20133 interactively recovers manipulable 3D shapes from a single photograph, guided by user-supplied sketches of generalized cylinders and other primitives. xu2016interactive recovers dynamic mechanical structures from a single image, where the user sketches part profiles and supplies hints for how detected parts should snap to their surroundings. xu2011photo, which deforms retrieved models to match a target image, relies on the user to aid in semantic segmentation of the input image, as well as to select the candidate shape to deform. arikan2013osnap reconstructs architectural models from point clouds by first automatically fitting a coarse 3D model, then using a sketch-based interface to let users add geometric details while optimizing the consistency and accuracy of the model. Our proposed reverse engineering method, once provided with a 3D reconstruction of the object, is fully automatic.
Learning-based Reverse Engineering
Some existing works infer high-level abstract 3D structures from images niu2018im2struct, 3D meshes [tulsiani2017learning] and shape structures jones2020Assembly. In these works, the output structures are represented using coarse cuboids, which fall short of our goal of a full part-based CAD representation, and some require segmented part hierarchies as input jones2020Assembly. In ganapathi2018parsing, hand-crafted abstract structures, consisting of a tree of axis-aligned cuboids and connectors, are fitted to incomplete 3D scans to aid in classification and shape completion. Inferred abstract structures can aid in retrieving complete CAD models, but this is again limited by what is recorded in a database, and further by the need for a learned template for the corresponding object class. In a recent work, smirnov2019deep introduce a unified learning framework in which parametric 2D and 3D primitives can be inferred from raster or voxel representations. Using this framework, it is possible to learn primitives in 2D and 3D that encode certain semantics of reconstructed objects; for example, particular CSG primitives might correspond to the armrests of chairs. However, as with many learning-based works, these semantics are specific to the class of objects on which the method is trained, and for each such class, a template encoding these semantics is a prerequisite for training. If we wished to reconstruct a unique type of object whose only familiar feature is the carpentry construction process, it would be difficult to acquire the data needed to produce meaningful shape and structure predictions using any of the aforementioned learning- or retrieval-based methods. In contrast, our method can recover the shape and structure of such an unknown object, as we only assume that objects obey the geometric constraints imposed by carpentry.
Grammar- and program-based Reverse Engineering
A number of domain-specific reverse engineering methods exist which utilize known structures and semantics of a narrow class of objects. [Lau2011FabPartsConnectors] infers a rich fabricable design consisting of parts and connectors, but it does not generalize beyond a few furniture classes for which specific hand-crafted grammars are defined. [fan2016probabilistic] models the exterior of residential buildings by learning a probabilistic model from street-view images. This enables them to infer plausible building geometry using (potentially occluded) single views as input, but it is inherently limited in how accurately it can model the input due to its non-determinism and the lack of additional evidence to constrain the result. [tian2019learning] presents a domain-specific language (DSL) and a neural program executor for learning to synthesize CAD models, with the ability to represent symmetries and repeated structures through programmatic loops. Their neural program executor allows for differentiable rendering of varying-length programs, facilitating unsupervised fine-tuning on unannotated shapes, which helps the method generalize beyond training categories. However, the design of the DSL incorporates furniture-specific semantic annotations, so it is still limited to classes of objects where such labels apply.
In the carpentry domain, grammars have also been used outside of reverse engineering tasks, including interactive design systems [Umetani:GuidedExploration, koo2014creating, Song2017reconfigurableFurniture, garg2016computational, Fu:InterlockingFurniture] and optimization of fabrication plans [yang2015reforming, Koo:ZeroWasteFurniture, jigfab, Lau2011FabPartsConnectors, wu2019carpentry]. Our work uses similar fabrication-aware representations for reverse-engineering.
[Lin2018RecoveringFM] solves a different but related problem, reverse engineering functional mechanical assemblies from raw scans, and employs ideas similar to ours: their general approach consists of detecting parts from input scans, determining interactions between parts, and globally optimizing the model geometry to satisfy the resulting constraints. However, the parts they consider are limited to a set of predefined templates suited to their domain.
Another line of work converts a 3D surface representation into solid, fabricable parts that recreate the desired surface araujo2019surface2volume,filoscia2020optimizing,yao2017interactive. A related method segments structured 3D models into repeated sub-components demir2015coupled. However, these methods do not directly apply to the problem of reverse engineering from images.
Model-based Reconstruction from Images
Many works in reconstruction from images make use of model-based assumptions, dating back to early “blocks world” work for recovering 3D edges in an image roberts63blocks. A large body of work in model-based reconstruction from images has focused on architecture, exploiting vanishing points criminisi2000single and repeated structures such as windows dick2004modelling for geometric information. This has given rise to interactive modeling systems for recovering building facades sinha2008interactive, facades, as well as fully automatic systems werner2002new. Such systems typically exploit architectural assumptions, such as abutting cuboids sitting on the ground and sloped roofs, that hold for the structures they model, whereas we seek to accurately reconstruct individual, solid parts of objects with arbitrary curved shapes in potentially complex arrangements.
The input to our algorithm is a set of images of a carpentered object taken from different viewpoints. The output is a fabricable model describing the set of parts along with how they should be connected. Parts are assumed to be cut from wood sheets, so that each part's boundary contains two sheet planes (the front and back of the sheet). Each part is represented as a triplet: the wood sheet pose (a rigid transformation), the sheet thickness, and the cut path, represented in the 2D plane of the sheet. We allow individual straight cuts to be made at arbitrary angles relative to the sheet plane to allow for slanted contacts between parts; the cut path also stores these bevel angles. Note that the cut path may include holes in the interior of the shape. The assembly information specifies a list of part pairs that are joined and the surfaces along which these connections occur, which we represent using interfaces that we assume to be planar, as planar contacts are common in carpentry assembly. We say the model is fabricable if, in their assembled configuration, the parts do not overlap and meet snugly at the joints. We refer to the pairwise contact constraints implied by this requirement as assembly constraints.
While assembly constraints assist in recovering partially unobserved parts, the need to infer both the set of parts and the assembly constraints from incomplete data presents a challenge: inferring the shapes and positions of parts depends on how they are connected, and likewise, finding probable connections depends on the part geometry. This interdependence implies that these properties should be considered jointly in order to arrive at a feasible solution. To address the complex search space of possible part assemblies, we adopt a multi-stage approach in which we first detect an approximate set of parts absent any connections, then iteratively refine the model by alternately optimizing for the connection contacts and the part geometry subject to the new contact constraints. Finally, we approximate each cut path with a concise, piecewise-smooth curve that balances simplicity and accuracy. Our approach is illustrated in Figure 2.
Preparing the Input.
Given our images, we use 3D reconstruction software [capturingreality] to obtain camera poses and a semi-dense, oriented point cloud (positions and normals) of the observed surfaces. The point cloud is not expected to capture every surface; entire sides of the input model may be missed. The images are acquired by walking around the model and taking photos, with enough coverage that at least one sheet plane per part is observed well enough to be partially reconstructed. We expect the model to be separated from background points and the point cloud to be oriented so that the object is approximately upright, which in practice means that the user indicates a ground plane and a rough bounding volume for the object.
The goal of the first stage is to recover a set of parts, each with a rough approximation of its sheet pose, thickness, and cut path. These parts should closely match the point cloud and, although they may not strictly satisfy assembly constraints, they should maximize assembly feasibility so that they serve as a plausible basis for subsequent refinement. Our approach is to first detect an over-complete set of candidate parts by searching the point cloud for planes that could be wooden sheets, and to extract an initial thickness and shape from point cloud features. We merge candidate parts when they are better represented as a single sheet, e.g., when they originate from disjoint sets of points observed from opposite sides of the sheet. From among these candidates, we extract the subset of parts that best covers the point cloud while remaining plausible from a fabrication standpoint: parts should not represent cuts through implausibly thick wood planes, and every part should occupy a minimum volume free of overlaps with other parts.
Having decided on an initial set of parts from the part identification stage, we can proceed to infer the assembly—defined by the connections between parts—and refine the part orientations to regularize the angles in the design. Detecting the correct joinery between parts is challenging since it is “hidden” beneath the surface geometry of the model; we address this problem with two key insights. First, we can identify a small set of types of connections possible with our fabrication assumptions, which significantly reduces the search space. Second, we can use image cues, such as the presence of seams or material changes visible in the wood, to identify regions where a connection interface is likely to exist (if geometric cues are insufficient). We use these ideas to disambiguate connections, followed by a global optimization step that aligns near-orthogonal connected parts while staying close to the point cloud.
The final step is to refine the geometry of the parts to obtain a fabricable model consisting of concise parametric curves, ensuring that the final model is consistent with the assembly constraints and uses only the necessary number of primitives. We strive for simplicity to facilitate editing and because simplicity is usually consistent with how objects are designed. We use the shapes visible in the input images to obtain more accurate cut path contours: since the sheet plane position has already been identified for every part, we can use it as a reference to drive a multi-view image segmentation after projecting each image into this plane. We then apply a curve fitting algorithm to the resulting segmentation mask boundary that preserves the assembly constraints while globally minimizing an energy function balancing complexity and accuracy. Our final result is regularized by aligning curves and lines in the resulting shapes. In practice, the refined shapes may reveal new connections, so we iterate model assembly and refinement until no new connections are found.
Given the oriented point cloud, we define the global diameter as the maximum dimension of the points’ bounding box. Our method has many parameters that depend on the scale of the model, which is arbitrary; we therefore define these parameters in terms of the global diameter. For detecting orthogonal and parallel features throughout our pipeline, we use a single global angle threshold.
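The scale normalization can be sketched in Python as follows. The relative factors in `RELATIVE_PARAMS` are hypothetical placeholders rather than the values used in our method, and the sketch assumes an axis-aligned bounding box, which is reasonable once the cloud has been oriented upright.

```python
import numpy as np

# HYPOTHETICAL relative factors: scale-dependent parameters are expressed as
# fractions of the global diameter, but these particular values are placeholders.
RELATIVE_PARAMS = {"ransac_inlier": 0.005, "adjacency": 0.01}


def global_diameter(points: np.ndarray) -> float:
    """Largest side of the axis-aligned bounding box of an (N, 3) point cloud."""
    extent = points.max(axis=0) - points.min(axis=0)
    return float(extent.max())


def scaled_params(points: np.ndarray, relative=RELATIVE_PARAMS) -> dict:
    """Convert relative factors into absolute distances for this particular cloud."""
    diameter = global_diameter(points)
    return {name: factor * diameter for name, factor in relative.items()}
```

Because every distance threshold is derived from the diameter, uniformly rescaling the input cloud rescales all thresholds by the same factor, so the pipeline behaves identically regardless of the arbitrary reconstruction scale.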
4.2 Part Identification
In this stage, we identify the parts that comprise the model and estimate each part’s rigid pose and rough shape, as a basis for subsequent refinement. Our strategy for identifying the set of parts builds on the assumption that parts are cut out of flat sheets and on the fact that planes can be easily detected in 3D point clouds. Based on this insight, we begin with primitive detection on the point cloud, followed by generating 3D parts as extrusions of paths in those planes, which amounts to detecting a cut path and a sheet thickness. Note that using all detected planes as potential part planes results in many more parts than are actually in the model, as any given part has at least two planar surfaces, and more for straight-line cuts; ultimately, only a subset of planes should be used. Our approach is to first generate part geometry for all detected planes to form an over-complete set of candidate parts, then optimize for the subset that best approximates the model while representing a feasible construction.
4.2.1 Primitive Detection and Adjacency
We employ Efficient RANSAC [schnabel2007efficient] to segment the oriented point cloud into clusters of points that fit planes and cylinders; each cluster is a roughly contiguous set of points on its primitive. The planes correspond to wood sheet faces as well as to straight-line cut faces. The points that fit cylinders better tend to lie only on curved cut paths. The RANSAC inlier threshold is a scale-dependent parameter, set relative to the global diameter.
We say that two primitives are adjacent if the minimum distance between their respective point sets is below a threshold defined relative to the global diameter. We track adjacency between plane primitives and all other primitives, recording for each plane primitive the set of primitives adjacent to it. We later use adjacency to guide part depth (thickness) estimation and to help bound the cut path of each part.
At this stage, every plane primitive corresponds to a candidate part, with a rigid transformation that maps the local xy-plane to that primitive's plane. This set of parts is highly redundant; e.g., a fully observed cuboid part has six planes corresponding to its sides, so this one part would initially be over-represented by six candidate parts. We address this redundancy in the part selection phase at the end of this section. First, we estimate the cut path and thickness of every candidate part.
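Efficient RANSAC is more involved than textbook RANSAC: it detects several primitive types at once and uses localized sampling and score estimation. As a simpler stand-in that only illustrates the inlier-threshold idea, the following Python sketch fits a single plane with plain RANSAC; `ransac_plane`, its iteration count, and its seed are our own illustrative choices.

```python
import numpy as np


def ransac_plane(points, threshold, iterations=200, seed=0):
    """Plain RANSAC plane fit on (N, 3) points.

    Returns ((unit_normal, offset), inlier_mask) for the plane n.x + offset = 0
    that has the most points within `threshold` of it.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(iterations):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:  # collinear sample: no unique plane
            continue
        normal = normal / norm
        offset = -normal @ sample[0]
        inliers = np.abs(points @ normal + offset) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (normal, offset)
    return best_plane, best_inliers
```

To extract several planes, one would remove the inliers and repeat; CGAL's Shape Detection package provides an implementation of Efficient RANSAC for production use.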
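A brute-force sketch of this adjacency test in Python follows. The names `min_set_distance` and `plane_adjacency` are hypothetical, and for real point clouds a spatial index such as a k-d tree would replace the all-pairs distance computation used here for clarity.

```python
import numpy as np


def min_set_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Brute-force minimum Euclidean distance between two (N, 3) point sets."""
    diff = a[:, None, :] - b[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).min())


def plane_adjacency(primitives, plane_ids, threshold):
    """Map each plane primitive index to the set of primitive indices adjacent to it.

    primitives: list of (N_i, 3) arrays, one per detected primitive.
    plane_ids: indices of the primitives that are planes.
    """
    adjacency = {i: set() for i in plane_ids}
    for i in plane_ids:
        for j, other in enumerate(primitives):
            if j != i and min_set_distance(primitives[i], other) < threshold:
                adjacency[i].add(j)
    return adjacency
```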
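One way to build such a transformation, sketched below under our own conventions, is to complete the plane normal to an orthonormal frame and store it as a 4×4 homogeneous matrix whose third column is the normal; the in-plane rotation is arbitrary. The inverse (`to_local`, a hypothetical helper) maps world points into the part's local frame, where the sheet plane is z = 0.

```python
import numpy as np


def plane_frame(normal, origin):
    """4x4 rigid transform mapping the local xy-plane onto the plane through
    `origin` with normal `normal` (the in-plane rotation is arbitrary)."""
    z = np.asarray(normal, dtype=float)
    z = z / np.linalg.norm(z)
    # Pick any helper axis not nearly parallel to z to start the frame.
    helper = np.array([1.0, 0.0, 0.0]) if abs(z[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    x = np.cross(helper, z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)  # completes a right-handed frame (x cross y = z)
    T = np.eye(4)
    T[:3, :3] = np.column_stack([x, y, z])
    T[:3, 3] = origin
    return T


def to_local(T, points):
    """Express (N, 3) world points in the part frame (inverse rigid transform)."""
    return (np.asarray(points, dtype=float) - T[:3, 3]) @ T[:3, :3]
```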
4.2.2 Cut Region Approximation
Each detected plane is a candidate for a part’s wood sheet plane. We approximate its cut path as follows: we collect the plane’s associated points, transform them into the part’s local frame and project them onto the sheet plane, convolve them (in 2D) with a Gaussian kernel, sample the resulting field over a regular grid, and extract a set of isocontours at a fixed isovalue defined in terms of Euler’s number e. We assign the contour with the largest enclosed region, along with any contours inside that region, to be the approximate cut path. Note that interior contours enable parts to have holes cut into them. For more details, see the supplementary material.
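The cut-region approximation can be sketched in Python as below. The kernel width, grid cell size, and isovalue are left as parameters because their values are not given here; an isovalue such as `exp(-1)` would be a hypothetical choice. Instead of tracing isocontours (e.g., with marching squares), this sketch thresholds the field and keeps the largest 4-connected region, whose enclosed False cells correspond to holes.

```python
from collections import deque

import numpy as np


def density_field(pts2d, sigma, cell):
    """Sum of isotropic Gaussians centred on (N, 2) points, sampled on a grid.

    Returns (field, xs, ys) with field indexed as field[row_for_y, col_for_x].
    """
    lo = pts2d.min(axis=0) - 3 * sigma
    hi = pts2d.max(axis=0) + 3 * sigma
    xs = np.arange(lo[0], hi[0] + cell, cell)
    ys = np.arange(lo[1], hi[1] + cell, cell)
    gx, gy = np.meshgrid(xs, ys)                        # shape (len(ys), len(xs))
    grid = np.stack([gx, gy], axis=-1)[:, :, None, :]   # (H, W, 1, 2)
    d2 = ((grid - pts2d[None, None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2)).sum(axis=-1), xs, ys


def largest_region(mask):
    """Largest 4-connected True region of a boolean grid (holes stay False)."""
    labels = np.zeros(mask.shape, dtype=int)
    best, best_size, current = 0, 0, 0
    rows, cols = mask.shape
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        current += 1
        labels[start] = current
        queue, size = deque([start]), 0
        while queue:  # breadth-first flood fill of one component
            r, c = queue.popleft()
            size += 1
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < rows and 0 <= nc < cols and mask[nr, nc] and not labels[nr, nc]:
                    labels[nr, nc] = current
                    queue.append((nr, nc))
        if size > best_size:
            best, best_size = current, size
    if best_size == 0:
        return np.zeros(mask.shape, dtype=bool)
    return labels == best
```

Usage: `field, xs, ys = density_field(pts2d, sigma, cell)` followed by `largest_region(field >= iso)`. The dense all-pairs evaluation is only meant for small inputs.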
4.2.3 Initial Sheet Thickness Estimation
The thickness of a part can be determined in two ways: by measuring the cut surface or by the distance between its front and back sheet planes. As we may not always see the back face (e.g., a part facing downward near the floor that happens to be missed during capture), we initially estimate the cut surface width to determine thickness.
For each candidate part, we estimate the cut surface width using a discrete plane sweep. Specifically, we collect the points associated with the primitives adjacent to the part plane. We then move the part plane in discrete steps in the direction opposite its normal and, at each offset, count the adjacent points lying within a small distance of the offset plane, forming a histogram over offsets.
When the plane sweeps past the part’s cut surface, we expect a sharp discontinuity in the histogram, i.e., a peak in the magnitude of its derivative. We compute this derivative using finite differences and identify robust peaks with non-maximum suppression. The largest peak may not correspond to the correct thickness; peaks closer to the part plane should take precedence (see Figure 3 (b)). To address this, we weight the peak magnitudes with a spatial discounting factor that decreases with distance from the part plane. The part thickness after this stage is set to the plane offset of the bin with the largest robust, spatially weighted peak.
We additionally estimate thickness by considering planes that could be the opposite side of a part’s wood sheet. Given a part, for each other part with an opposite sheet normal, we transform that part’s cut path into the first part’s frame and project it onto its sheet plane. If the two cut paths overlap, we consider the distance between the two planes to be a candidate thickness, and we take the minimum over all such candidates. If this minimum is within a small tolerance of the plane-sweep estimate computed above, we adopt it as the part’s thickness. During this step, we also record all other opposing parts with overlapping cut paths whose plane distance is within a tolerance of the final thickness (not just the closest part); this set is used later for merging parts.
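A minimal Python sketch of the plane sweep (histogram, finite differences, a simple non-maximum suppression, and spatial discounting) is shown below. The exponential discount `exp(-decay * offset)` is an assumption, since the exact discounting factor is not reproduced here, and reporting the thickness at the midpoint between the two bins of the drop is likewise our own choice.

```python
import numpy as np


def local_maxima(values):
    """Indices of positive entries that are >= both neighbours (simple NMS)."""
    peaks = []
    for k, v in enumerate(values):
        left = values[k - 1] if k > 0 else -np.inf
        right = values[k + 1] if k + 1 < len(values) else -np.inf
        if v > 0 and v >= left and v >= right:
            peaks.append(k)
    return peaks


def sweep_thickness(normal, offset, adj_points, step, window, n_steps, decay):
    """Estimate sheet thickness by sweeping the part plane against its normal.

    The plane is n.x + offset = 0 with n pointing out of the part; adj_points
    are the (N, 3) points of the adjacent primitives. `decay` parameterizes an
    ASSUMED exponential discount on a peak's distance from the part plane.
    """
    signed = adj_points @ normal + offset  # signed distance to the part plane
    offsets = step * np.arange(n_steps)
    # Count points near the plane shifted by -off along its normal.
    hist = np.array([np.count_nonzero(np.abs(signed + off) < window) for off in offsets])
    drop = -np.diff(hist)  # sweeping past the cut surface gives a positive spike
    peaks = local_maxima(drop)
    if not peaks:
        return None
    weights = [drop[k] * np.exp(-decay * offsets[k]) for k in peaks]
    best = peaks[int(np.argmax(weights))]
    return (best + 0.5) * step  # the count falls between bins best and best + 1
```

For example, with one cut surface spanning 0 to 0.019 below the plane and a denser one spanning 0.05 to 0.101, setting `decay=0` selects the far surface, whereas `decay=20` makes the nearer surface take precedence.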
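The selection logic can be sketched as follows, assuming the cut paths have already been rasterized into boolean masks on a shared grid in the part’s plane and that plane-to-plane distances are precomputed. The function name and the tolerance `tol` are hypothetical.

```python
import numpy as np


def opposite_thickness(mask, opposing, sweep_t, tol):
    """Refine a part's thickness using opposite-facing planes.

    mask: boolean grid of the part's cut region in its own plane.
    opposing: list of (part_id, plane_distance, projected_mask) for parts with an
        opposite sheet normal, masks already projected onto the same grid.
    sweep_t: thickness from the plane sweep. tol: HYPOTHETICAL tolerance.
    Returns (thickness, ids of opposing parts to merge later).
    """
    overlapping = [(pid, dist) for pid, dist, other in opposing if np.any(mask & other)]
    thickness = sweep_t
    if overlapping:
        closest = min(dist for _, dist in overlapping)
        if abs(closest - sweep_t) <= tol:  # trust the plane distance if consistent
            thickness = closest
    # Every overlapping opposite part consistent with the final thickness.
    merge_set = {pid for pid, dist in overlapping if abs(dist - thickness) <= tol}
    return thickness, merge_set
```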
4.2.4 Part Selection
The final step in the part identification stage is to select a subset of parts to be assembled and refined in the next stages. We first reduce the number of parts through pruning and merging steps, and then perform a global optimization to give a set of parts that covers well without too much overlap between parts.
To prune the part set, we first adopt a heuristic: parts are unlikely to be much deeper (thicker) than they are wide. For example, for a cuboid part that is 30cm × 30cm × 1cm, we prefer a 30cm × 30cm face cut from a sheet 1cm thick over a 1cm × 30cm face cut from a sheet 30cm thick. We prune as follows: if the estimated thickness of a candidate yields a cut surface with total area greater than five times the area of its cut path, we discard the candidate. In Figure 3, the red part is one such candidate and is thus discarded.
Next, we merge part candidates if we have evidence that they correspond to a single cut from the same wood sheet. In particular, for each part, the set of opposing parts recorded above contains parts with opposite faces, overlapping cut paths, and planes roughly one sheet thickness away from its own plane. These opposing parts are likely the same cut from the same sheet, so we transform and project the cut path of each opposing part into the plane of the current part, take its union with the current part’s cut path, and then discard the opposing part. This step helps recover parts that were reconstructed only partially from different sides due to occlusions. We repeat this process recursively until no such opposite-and-overlapping candidates remain. Note that when merging one part into another, we also merge their adjacency sets, which are useful later for geometry refinement.
Among the remaining parts, we generally still have over-representation, i.e., parts that overlap each other heavily. Some amount of overlap is tolerable: for example, two cuboid parts that meet at a corner may overlap because it is unclear which part extends all the way to the corner and which has a cut face that abuts the other part; we allow this small overlap and disambiguate it in the next section. We now pose a (non-trivial) discrete optimization problem: select the set of parts that minimizes the distance between the point cloud and the selected parts without conflict. We say that a part is in conflict if more than half of its volume overlaps with other selected parts. We optimize for the final subset with simulated annealing, where our energy function is the total squared distance between the points of the point cloud and the selected parts; we scale down all dimensions by the global diameter to ensure consistent behavior regardless of scale. Specifically, we employ the Metropolis-Hastings (MH) algorithm with transitions that either add parts to or remove parts from the solution set, while prohibiting changes that lead to conflicts. Simply adding or removing individual parts at each step, however, results in poor convergence due to the large number of potential conflicts, so we additionally permit “replacement” transitions, in which a part in the set may be swapped for one outside the set if adding the outside part would otherwise cause the former to be in conflict. To be precise, our proposal distribution is the result of the following decision process: with equal probability, either choose a part uniformly at random to add to or remove from the set, or perform a replacement move between two parts as discussed above. We run MH for 1000 iterations with start and end temperatures of 10 and 0.1, respectively, and find that results typically converge within 10 seconds.
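Treating each cut path as a simple polygon (holes ignored, an assumption of this sketch), the cut-surface area is its perimeter times the thickness, so the pruning rule reduces to a one-line comparison. The Python sketch below checks the 30cm example from the text; the function names are ours.

```python
import numpy as np


def area_and_perimeter(poly):
    """Shoelace area and perimeter of a simple polygon given as (N, 2) vertices."""
    x, y = poly[:, 0], poly[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)
    area = 0.5 * abs(np.sum(x * yn - xn * y))
    perimeter = np.sum(np.hypot(xn - x, yn - y))
    return area, perimeter


def keep_candidate(poly, thickness, ratio=5.0):
    """Keep a candidate unless its cut surface (perimeter x thickness)
    exceeds `ratio` times the area of its cut path."""
    area, perimeter = area_and_perimeter(poly)
    return perimeter * thickness <= ratio * area


def rect(w, h):
    return np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=float)


# The 30cm x 30cm x 1cm example from the text (units: cm).
print(keep_candidate(rect(30, 30), 1))   # True:  120 * 1 <= 5 * 900
print(keep_candidate(rect(1, 30), 30))   # False: 62 * 30 > 5 * 30
```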
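The recursive merge amounts to following chains of opposing parts. The Python sketch below omits the transform-and-project step and assumes each cut path is already a boolean mask on a shared grid in the surviving part’s plane; `merge_opposing` and the dictionary layout are hypothetical.

```python
import numpy as np


def merge_opposing(parts):
    """Recursively merge opposite-and-overlapping candidate parts.

    parts: dict id -> {"mask": bool grid, "adj": set, "opp": set}, where "opp"
    holds the ids of opposing parts recorded during thickness estimation.
    Returns the surviving parts; the input is left unmodified.
    """
    alive = {k: {"mask": v["mask"].copy(), "adj": set(v["adj"]), "opp": set(v["opp"])}
             for k, v in parts.items()}
    for pid in list(alive):
        if pid not in alive:  # already merged into another part
            continue
        part = alive[pid]
        pending = set(part["opp"])
        while pending:
            other = pending.pop()
            if other == pid or other not in alive:
                continue
            merged = alive.pop(other)
            part["mask"] |= merged["mask"]  # union of cut regions
            part["adj"] |= merged["adj"]    # union of adjacency sets
            pending |= merged["opp"]        # follow chains of opposing parts
    return alive
```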
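A compact Python sketch of this annealing procedure follows. It assumes precomputed point-to-part distances (already divided by the global diameter), a pairwise overlap-volume matrix, and a cap `miss` on each point’s distance so that the empty set has finite energy. These inputs, the geometric cooling schedule, and summing pairwise overlaps (which may double-count) are our own simplifications, and the acceptance rule is plain Metropolis without a Hastings correction for the asymmetric replacement move.

```python
import math

import numpy as np


def in_conflict(i, sel, overlap, volume):
    """Part i conflicts if over half its volume overlaps other selected parts."""
    shared = sum(overlap[i, j] for j in sel if j != i)
    return shared > 0.5 * volume[i]


def any_conflict(sel, overlap, volume):
    return any(in_conflict(i, sel, overlap, volume) for i in sel)


def energy(sel, dist, miss=1.0):
    """Total squared (capped) distance from each point to its nearest selected part."""
    if not sel:
        return dist.shape[1] * miss ** 2
    nearest = np.minimum(dist[sorted(sel)].min(axis=0), miss)
    return float(np.sum(nearest ** 2))


def anneal(dist, overlap, volume, iters=1000, t_start=10.0, t_end=0.1, seed=0):
    """dist: (parts, points) distances; overlap: (parts, parts) overlap volumes."""
    rng = np.random.default_rng(seed)
    n_parts = dist.shape[0]
    sel = set()
    e = energy(sel, dist)
    best, best_e = set(sel), e
    for k in range(iters):
        temp = t_start * (t_end / t_start) ** (k / max(iters - 1, 1))  # geometric cooling
        if rng.random() < 0.5:
            new = sel ^ {int(rng.integers(n_parts))}  # add or remove one part
        else:
            # Replacement: swap outside part j for a selected part it would conflict with.
            outside = [p for p in range(n_parts) if p not in sel]
            if not outside:
                continue
            j = outside[int(rng.integers(len(outside)))]
            victims = sorted(i for i in sel if in_conflict(i, sel | {j}, overlap, volume))
            if not victims:
                continue
            new = (sel - {victims[int(rng.integers(len(victims)))]}) | {j}
        if any_conflict(new, overlap, volume):
            continue  # moves that create conflicts are prohibited
        e_new = energy(new, dist)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temp):
            sel, e = new, e_new
            if e < best_e:
                best, best_e = set(sel), e
    return best, best_e
```

On a toy problem in which one redundant part overlaps each of two complementary parts by 80% of its volume, the sketch returns the two complementary parts, the only conflict-free selection with zero energy.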
4.3 Assembly Refinement
Model assembly involves determining which pairs of parts are connected and the surfaces at which parts make contact, known as interfaces. Our approach is to first identify the connections for individual pairs of parts followed by a global alignment step that ensures manufacturability constraints over the graph of connections. The result of finding and classifying the connections between parts is a set of planar interface surfaces along which the pairs of parts join, shown in blue in Figure 4.
First, we identify pairs of parts to connect: we consider two parts connected if the minimum distance between their surfaces is less than a distance threshold.
Next, we determine the type of connection between the parts. Based on our fabrication assumptions, we identify four distinct connection types according to the types of surfaces that make contact, illustrated in Figure 4. We handle only connection types 1 and 2, as we did not observe the other types in any of our example models. If the sheet planes of parts a and b are not orthogonal, as shown in Figure 4 (1), we use bevel cuts to satisfy the planar contact. To detect a type 1 connection between two parts a and b, we determine whether a and b form a T-junction by checking whether b terminates at a’s sheet plane or, in the reverse case, whether a terminates at b’s sheet plane. If both a and b terminate at the other part’s sheet plane, we say they meet at a corner and classify the connection as type 2. For more details on the geometry involved, see the supplementary material.
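The terminate-at-plane test can be sketched as follows. The exact geometry is described in the supplementary material and is not reproduced here, so the criterion below (all of a part’s sample points lie on one side of the other part’s sheet plane and at least one reaches it within a tolerance) is our own assumption, as are the function names and the use of box corners as stand-in part samples.

```python
import itertools

import numpy as np


def box_corners(lo, hi):
    """Eight corners of an axis-aligned box, used here as stand-in part samples."""
    return np.array(list(itertools.product(*zip(lo, hi))), dtype=float)


def terminates_at(points, plane_point, plane_normal, tol):
    """ASSUMED test: points lie on one side of the plane and reach it within tol."""
    s = (points - plane_point) @ plane_normal
    one_side = np.all(s >= -tol) or np.all(s <= tol)
    return bool(one_side and np.min(np.abs(s)) <= tol)


def classify_connection(points_a, plane_a, points_b, plane_b, tol=1e-6):
    """Return 1 (T-junction), 2 (corner), or None.

    plane_a / plane_b are (point, normal) pairs for the relevant sheet planes.
    """
    a_ends = terminates_at(points_a, *plane_b, tol)
    b_ends = terminates_at(points_b, *plane_a, tol)
    if a_ends and b_ends:
        return 2
    if a_ends or b_ends:
        return 1
    return None
```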
4.3.2 Disambiguating corners
A type 2 connection has two equally viable solutions for how the two parts meet. To resolve this ambiguity, we look for evidence of a seam in the images that may indicate which part extends to the corner (see Figure 5 (a) and (b)). Given the approximate part geometry, we can determine which images have an unoccluded view of the (possible) seam. For each such image, we compute two measures of visual discontinuity. First, at each pixel along the seam, we compute the gradient in the direction orthogonal to the seam and average the gradient magnitudes. Second, we compute a very coarse gradient across the seam by averaging the color within a rectangle on either side of the seam (see Figure 5 (c)) and taking the magnitude of the difference between the two average colors. The seam score for this view is computed directly from these two measures. We then average this score across all views of the seam to obtain the final seam score, and choose the type 2 configuration with the higher seam score. If there are no views of a seam (e.g., if it is against the floor and thus not visible), we assign it a seam score of 0.01 (where pixel intensities range from 0.0 to 1.0) so that it can still be chosen if the other seam score is low (i.e., the other candidate is not a seam).
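A grayscale Python sketch of the per-view seam score for a vertical seam at a given image column follows. How the two measures are combined is not reproduced here; the product used below is an assumption, as are the function names and the rectangle `width` parameter. The 0.01 fallback for unobserved seams follows the text.

```python
import numpy as np


def view_seam_score(img, col, width):
    """Seam evidence in one grayscale view for a vertical seam at column `col`."""
    img = img.astype(float)
    # Fine measure: mean absolute finite difference across the seam.
    fine = np.abs(img[:, col] - img[:, col - 1]).mean()
    # Coarse measure: difference of mean intensity in `width`-wide strips on each side.
    coarse = abs(img[:, col - width:col].mean() - img[:, col:col + width].mean())
    return fine * coarse  # ASSUMPTION: product as the combination of the two measures


def seam_score(views, width):
    """Average over (image, column) views; 0.01 if the seam is never visible."""
    if not views:
        return 0.01
    return float(np.mean([view_seam_score(img, col, width) for img, col in views]))
```

The configuration whose candidate seam yields the higher `seam_score` would be selected.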
4.3.3 Interface surfaces & constraints
After determining the connection types, we compute the finite interface surfaces within each interface plane, indicating where the parts make contact. These give an estimate of where the final parts will make contact, which we use to constrain their shapes. For both type 1 and type 2 connections, we find this by intersecting the solid shape of part b with the abutting plane of part a, offset by toward part b to correct for any gaps between parts caused by . This interface can only take the form of one or more rectangles, each with width equal to the thickness of part b.
The constraints imposed by an interface surface on part b’s cut shape are line segments that the shape cannot cross (without butting into another part), formed by the projection of the above surfaces onto the plane of part b. There are also additional constraints imposed by type 2 connections: In our final shape, the parts should meet perfectly at the corner determined by the line of intersection of the sheet planes from a and b on the outer surface of the corner. Finally, aside from connections, we have the constraints implied by adjacent plane primitives in that meet the part plane with a convex interior angle, since neighboring cut surfaces are evidence of the shape boundary. We exclude adjacent planes associated with connected parts, as satisfying the interface constraints should take precedence. These planes only correspond to planar cut faces; we do not include adjacent cylinder primitives, as we found their fits to be less faithful to the curves that they tend to fit. These constraint segments in each part’s sheet plane are used during geometry refinement so that the result conforms to detected surfaces and the precise connections inferred above. They define a half-space in the plane that belongs outside the cut region, for points that project to the line within the segment boundaries. For each constraint segment, we also make note of the angle of the interface surface relative to the part so that we can accurately define the bevel cut angle to enable this contact later.
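The half-space rule for a single constraint segment can be sketched in 2D sheet-plane coordinates. The `outward` sign convention below is a hypothetical encoding of which side lies outside the cut region; points whose projection falls beyond the segment's endpoints are left unconstrained, as described above.

```python
import numpy as np


def violates_constraint(p, a, b, outward):
    """True if 2D point p lies in the excluded half-space of segment a-b.

    outward: +1 if the region to the left of a->b is outside the cut
    region, -1 if the region to the right is (hypothetical convention).
    Only points whose projection falls within the segment are constrained.
    """
    p, a, b = (np.asarray(v, float) for v in (p, a, b))
    d = b - a
    t = np.dot(p - a, d) / np.dot(d, d)     # projection parameter along the segment
    if t < 0.0 or t > 1.0:
        return False                        # projects beyond the segment: unconstrained
    rel = p - a
    side = d[0] * rel[1] - d[1] * rel[0]    # 2D cross product: > 0 means left of a->b
    return bool(outward * side > 0)
```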
We also expect the final model to lie flat against the ground. To incorporate this constraint, we add a ground plane, positioned at the lowest point of the input geometry, and form a fake "part" that we include in the assembly analysis above to obtain additional contact constraints for any parts in contact with the floor.
4.4 Geometry Refinement
In the final stage of our pipeline, we refine the cut path for each part to be consistent with assembly constraints and image evidence and to be represented with concise, piecewise smooth curves. We do this in two main stages: joint image-based segmentation and constrained curve fitting.
4.4.1 Joint Image-Based Segmentation
We leverage multiple views to optimize for a binary segmentation mask in the sheet plane representing the cut region for each part. Taking inspiration from [kowdle2012multiple], aimed at segmentation and plane reconstruction, we pose our multi-view segmentation problem as an MRF optimization. Unlike [kowdle2012multiple], we use the known part plane as reference, leverage visibility cues in the rest of the reconstruction, and incorporate assembly constraints in the segmentation.
In particular, for part , we project each image from its camera viewpoint onto the nearest part plane of . We resample the projection in the plane to form rectified image . We then optimize for by defining a binary MRF on the set of pixels in with 4-connected grid edges denoted by , with the energy
where is an appearance-based cost, and is a pairwise edge-sensitive smoothness term. These energy terms depend on the per pixel mask labels (1 for inside the cut path, 0 for outside), and we seek the lowest energy labeling with respect to all views together. We set .
We model appearance using Gaussian Mixture Models (GMMs) with 5 components for the colors (in LAB space) inside and outside the part’s current cut region for each view, which gives us probabilities and that a pixel belongs inside or outside, respectively, for view . For the interior, we erode the cut region by (multiplied by the world-to-pixel scale factor) since the pixels near the boundary of the initial cut region are uncertain, and then consider only the subset of those pixels potentially visible to view ; for this purpose, we construct a visibility mask which is 0 if a ray from view to pixel intersects another part before reaching the part plane and 1 otherwise. For the exterior, we similarly dilate the cut region and consider all pixels outside of it with .
We now define the data term as:
where is the number of views that can see pixel on the part plane. If is 0, we set to 0 regardless of . In general, some “outside” pixels may belong to other parts with similar appearance, so we modify to incorporate the constraint segments computed earlier: is set to for if is in the excluded region of any of the constraint segments. The result of using these constraint segments for segmentation is shown in Figure 6 (b). The red interface constraints prevent the mask region from including surfaces from adjacent parts, which purely appearance-based segmentation would do.
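One plausible reading of the data term, assuming it averages the negative log-likelihood from each view's appearance model over the views that see a pixel, is sketched below with NumPy. The array layout and the use of an infinite cost for labeling excluded pixels as inside are our assumptions; the per-view probabilities would come from the inside/outside GMMs.

```python
import numpy as np


def data_term(p_in, p_out, vis, excluded, eps=1e-9):
    """Per-pixel unary costs for labels 0 (outside) and 1 (inside).

    p_in, p_out: (V, H, W) per-view probabilities from the appearance models.
    vis: (V, H, W) binary visibility masks.
    excluded: (H, W) bool, True where a constraint segment forbids 'inside'.
    Returns an (H, W, 2) cost array. Averaging -log p over visible views
    is our reading of the text, not a formula copied from it.
    """
    n = vis.sum(axis=0)                     # number of views that see each pixel
    safe_n = np.maximum(n, 1)               # avoid division by zero for unseen pixels
    cost_in = -(vis * np.log(p_in + eps)).sum(axis=0) / safe_n
    cost_out = -(vis * np.log(p_out + eps)).sum(axis=0) / safe_n
    D = np.stack([cost_out, cost_in], axis=-1)
    D[n == 0] = 0.0                         # unseen pixels: no data preference
    D[excluded, 1] = np.inf                 # excluded pixels may not be labeled inside
    return D
```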
We define using a contrast-sensitive Potts model to regularize the result while aligning the mask with high-contrast regions which we expect at shape boundaries.
where (set to 50) controls the strength of the smoothing penalty falloff as contrast increases. Analogous to how we modify , we also set to zero in cases where the 3D locations corresponding to and straddle a constraint segment, to encourage the boundary of to adhere to these known edges; i.e., there is no penalty for label change at these boundaries where label changes are likely.
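The edge weights can be sketched with the standard contrast-sensitive Potts form w = exp(-beta * ||c_p - c_q||^2), which we assume here since the exact expression is not reproduced in this text; beta = 50 follows the value given above. Weights on edges that straddle a constraint segment are zeroed, so a label change there costs nothing.

```python
import numpy as np

BETA = 50.0  # smoothing falloff strength from the text


def potts_weights(img, cross_h, cross_v, beta=BETA):
    """Edge weights for the 4-connected grid of a rectified image.

    img: (H, W, 3) colors in [0, 1].
    cross_h: (H, W-1) bool, True where a horizontal edge straddles a constraint segment.
    cross_v: (H-1, W) bool, same for vertical edges.
    Returns (wh, wv), the horizontal and vertical edge weights.
    """
    dh = np.sum((img[:, 1:] - img[:, :-1]) ** 2, axis=-1)
    dv = np.sum((img[1:, :] - img[:-1, :]) ** 2, axis=-1)
    wh = np.exp(-beta * dh)                 # high contrast -> cheap label change
    wv = np.exp(-beta * dv)
    wh[cross_h] = 0.0                       # free label change across known edges
    wv[cross_v] = 0.0
    return wh, wv


def pairwise_cost(label_p, label_q, weight):
    """Potts penalty: pay the edge weight only when the labels differ."""
    return weight if label_p != label_q else 0.0
```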
In practice, for efficiency, we only consider views for which for at least half the pixels inside , and then use the top seven views sorted by how closely their central viewing rays align with the plane normal. Note that this set of views may come from one or both sides of the part. We solve the MRF using graph cuts [boykov2001interactive] to obtain the final mask. We also re-use the resulting (cut path) to learn more accurate GMM parameters, and re-run the above algorithm once more to slightly improve results.
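The view-selection heuristic might look like the following sketch. It assumes per-view visibility masks on the rectified part grid and a central viewing direction per camera; ranking by the absolute cosine with the plane normal accounts for views from either side of the part.

```python
import numpy as np


def select_views(vis_masks, region, view_dirs, plane_normal, k=7):
    """Pick up to k views for segmenting one part.

    vis_masks: (V, H, W) visibility masks; region: (H, W) bool mask of the
    current cut region; view_dirs: (V, 3) central viewing rays;
    plane_normal: (3,) unit normal of the part plane.
    """
    # Keep views that see at least half of the pixels inside the region.
    frac_visible = (vis_masks[:, region] > 0).mean(axis=1)
    candidates = np.flatnonzero(frac_visible >= 0.5)
    dirs = view_dirs[candidates]
    # |cos| so that views from either side of the part rank equally.
    cos = np.abs(dirs @ plane_normal) / np.linalg.norm(dirs, axis=1)
    order = np.argsort(-cos, kind="stable")
    return candidates[order[:k]].tolist()
```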
Updating model topology
It is possible for to have more than one connected component after optimizing with the assembly constraints, if multiple parts were detected as one in previous steps (as is the case in Figure 6; the legs are forced into separate pieces by the assembly constraints). We restructure the model in these cases by adding each connected component in as a separate part. Finally, we rerun the assembly stage to find new connections and constraints due to the updated shapes and potentially separated parts. We repeat the segmentation and assembly steps until no new connections are found (we observe at most 1 or 2 iterations in our experiments).
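Splitting a mask into connected components, each of which becomes a new part, can be sketched with a breadth-first flood fill. We assume 4-connectivity, matching the MRF grid; the text does not state which connectivity is used for this step.

```python
from collections import deque


def connected_components(mask):
    """Label 4-connected components of a binary mask (nested lists or array).

    Returns a list of components, each a sorted list of (row, col) pixels.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill a new component starting from this pixel.
            queue, comp = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            comps.append(sorted(comp))
    return comps
```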
4.4.2 Global Alignment
Before extracting a final CAD model, we align connected parts that are close to orthogonal, as right angles are a feature of many man-made designs. Alignment also simplifies fabrication considerably: parts connected at right angles only require orthogonal cuts, which can be made with a wider variety of tools. Since this optimization only concerns small perturbations to the orientation, we represent each part by its sheet plane and optimize over plane parameters (normal and offset ) such that detected orthogonal connected parts are aligned. We minimize the total squared distance of the planes to their detected point sets to regularize the result. We take an approach similar to [li2011globfit] for plane alignment; we find
subject to for all for which and are connected and the angle between is within of , where is plane ’s point set, and is the total squared distance. To ensure unit normals, we represent each using 2 angle parameters. We solve this global optimization problem using a sequential least-squares quadratic programming (SLSQP) solver, and then update to align the parts with these new plane parameters.
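A sketch of the plane alignment with SciPy's SLSQP solver (`scipy.optimize.minimize` with `method="SLSQP"`) follows. Planes are written n · x = d with each normal parameterized by two spherical angles, the objective is the total squared point-to-plane distance, and each near-orthogonal connected pair contributes an equality constraint n_i · n_j = 0. The helper names and the angle-tolerance handling are our assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize


def unit_normal(theta, phi):
    # Two angles per normal keep it on the unit sphere.
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])


def near_orthogonal_pairs(normals, connections, tol_deg):
    """Keep connected pairs whose sheet planes are within tol_deg of 90 degrees."""
    pairs = []
    for i, j in connections:
        c = abs(normals[i] @ normals[j]) / (np.linalg.norm(normals[i]) * np.linalg.norm(normals[j]))
        if np.degrees(np.arccos(np.clip(c, 0.0, 1.0))) >= 90.0 - tol_deg:
            pairs.append((i, j))
    return pairs


def align_planes(point_sets, normals, offsets, pairs):
    """Planes are n . x = d; returns a list of refined (normal, offset) tuples."""
    x0 = []
    for n, d in zip(normals, offsets):
        n = np.asarray(n, float) / np.linalg.norm(n)
        x0 += [np.arccos(np.clip(n[2], -1.0, 1.0)), np.arctan2(n[1], n[0]), d]

    def unpack(x):
        return [(unit_normal(x[3 * i], x[3 * i + 1]), x[3 * i + 2])
                for i in range(len(point_sets))]

    def energy(x):
        # Total squared point-to-plane distance keeps planes close to their data.
        return sum(np.sum((P @ n - d) ** 2) for P, (n, d) in zip(point_sets, unpack(x)))

    def ortho(x, i, j):
        planes = unpack(x)
        return planes[i][0] @ planes[j][0]  # zero when the normals are orthogonal

    cons = [{"type": "eq", "fun": ortho, "args": (i, j)} for i, j in pairs]
    res = minimize(energy, np.array(x0, float), method="SLSQP", constraints=cons)
    return unpack(res.x)
```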
4.4.3 Thickness Regularization
Typically, an object is constructed from a small number of wood sheets with only a few distinct thicknesses. However, since our initial thicknesses are based on analysis of noisy point clouds (Section 4.2.3), we typically estimate a different thickness for every part. Minimizing the number of distinct thicknesses in a model makes it more practical to build and therefore more plausible. Thus, we cluster thicknesses by averaging any part thicknesses that differ by less than a threshold .
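The thickness clustering can be sketched as single-linkage grouping of the sorted thicknesses, assuming that "differ by less than a threshold" applies between consecutive values. Note that chaining can then merge values whose overall spread exceeds the threshold; a stricter variant would compare each value against its group's first member.

```python
def cluster_thicknesses(thicknesses, threshold):
    """Merge thicknesses that differ by less than `threshold` and replace
    each group by its mean (single linkage on the sorted values)."""
    if not thicknesses:
        return []
    order = sorted(range(len(thicknesses)), key=lambda i: thicknesses[i])
    groups, current = [], [order[0]]
    for prev, idx in zip(order, order[1:]):
        if thicknesses[idx] - thicknesses[prev] < threshold:
            current.append(idx)             # close to its neighbor: same stock
        else:
            groups.append(current)
            current = [idx]
    groups.append(current)
    result = list(thicknesses)
    for g in groups:
        mean = sum(thicknesses[i] for i in g) / len(g)
        for i in g:
            result[i] = mean
    return result
```

For example, part thicknesses of 18.2, 17.8, 12.1, 18.0, and 12.3 with a threshold of 1.0 collapse to two stock thicknesses, about 18.0 and 12.2.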
4.4.4 Constrained Curve Fitting
Our curve fitting approach draws ideas from prior work that trades off global fit accuracy against curve complexity [plass1983, farin2002curves, fleishman2005robust], as well as prior work that favors straight edges and sharp corners [dominici2020polyfit], and applies them to imperfect binary masks with certain known edges that the solution must adhere to.
For each part, the output of the segmentation step is a cut path defined by the raster boundary of a segmentation mask; it is neither exact nor concise. Our final step is to extract a CAD representation of this path by fitting a low-dimensional 2D shape representation that approximates the segmentation boundary while adhering to any contact constraints. In related works on vectorization, the perceptual criteria of accuracy and simplicity, along with continuity and regularity, are prominent objectives ([hoshyarivectorization, Kopf2011, dominici2020polyfit]). We find these objectives to be well-suited to our problem: We desire a shape that is close to the input boundary while adhering exactly to the contact constraints, and which also provides a simple, continuous explanation for the input mask boundary, while capturing regularity in the man-made objects that are our focus.
We represent cut paths as closed continuous polycurves consisting of connected cubic Bézier curves and straight line segments, where we call the endpoints between neighboring segments nodes. For each part, our algorithm takes as input the raster boundary of the segmentation mask, a (clockwise) ordered set of 2D points . We restrict nodes to lie on points in and therefore have a discrete set of possible nodes, where each segment is the least-squares best fit for the range of data points between nodes.
We solve for the curves by building on the dynamic programming approach outlined in [plass1983]. Let be the cost of fitting a curve to the subrange between and , which is the sum of squared point distance error plus a constant curve cost . We define a sub-total energy as the least total error over all possible choices of nodes between and , giving rise to the recurrence relation , allowing us to solve for the optimal node locations using dynamic programming. Support for continuity at curve transitions is added by pre-computing tangents at each point in using curves fit to local point neighborhoods, and constraining the end tangent directions of curves during fitting.
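The core dynamic program can be illustrated in a reduced form: an open point sequence, line segments only, nodes restricted to input points, and a constant per-segment cost. With E[0] = 0, the recurrence is E[j] = min over i < j of E[i] + err(i, j) + c, where err sums squared distances to the chord between nodes i and j. This is a simplified sketch, not the full typed, tangent-constrained algorithm described here.

```python
import numpy as np


def chord_error(pts, i, j):
    """Sum of squared distances from pts[i..j] to the segment pts[i]-pts[j]."""
    a, b = pts[i], pts[j]
    d = b - a
    seg = pts[i:j + 1] - a
    t = np.clip(seg @ d / (d @ d), 0.0, 1.0)  # closest-point parameters on the chord
    return float(np.sum((seg - np.outer(t, d)) ** 2))


def fit_polyline(pts, curve_cost):
    """Optimal nodes for an open point sequence fit by line segments.

    Returns (node indices, total energy).
    """
    pts = np.asarray(pts, float)
    n = len(pts)
    E = np.full(n, np.inf)
    prev = np.zeros(n, dtype=int)
    E[0] = 0.0
    for j in range(1, n):
        for i in range(j):
            e = E[i] + chord_error(pts, i, j) + curve_cost
            if e < E[j]:
                E[j], prev[j] = e, i
    # Backtrack from the last point to recover the chosen nodes.
    nodes = [n - 1]
    while nodes[-1] != 0:
        nodes.append(int(prev[nodes[-1]]))
    return nodes[::-1], float(E[-1])
```

On an L-shaped sequence such as (0,0), (1,0), (2,0), (3,0), (3,1), (3,2), (3,3) with a curve cost of 0.1, the optimum places nodes at the two ends and the corner, for a total energy of 0.2. The cost as written is cubic in the number of points, which is one reason to restrict the candidate nodes.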
We add support for different curve types by extending to , where is the type of the curve ending at ; sub-total energies are now computed by minimizing over all previous curve types, as well as all previous nodes. We now have separate curve costs for each type. We let indicate line segments, and set the cost to encode a preference for straight lines. For more implementation details, we refer the reader to the supplementary material. We also wish to capture both sharp corners and smooth transitions in our solution. Because the input may contain artifacts, and furthermore is not representative of the final boundary that adheres to all the desired constraints, we do not detect sharp corners in the input as is usually done in vectorization, but rather incorporate the choice into our curve fitting algorithm to encourage sharp corners that lead to a better fit to the data. To allow sharp and smooth tangent behavior at nodes, neighboring curves must be able to agree on either continuous or unconstrained tangents. To make this possible, we parameterize right end tangent behavior using two types and , where each type is a cubic Bézier curve with a constrained and free right end tangent, respectively. This way, we can ensure that each curve’s left tangent behavior matches the previous curve’s right tangent behavior when computing . Finally, we can filter out sharp corners by requiring that a curve with type must meet the next curve with an angle greater than . Because an unconstrained curve will always fit with smaller error, cases where the tangent angle is represent unconstrained curves that differ significantly, and therefore should be preferred in the interest of accuracy. We set the curve cost where is the width of the input boundary’s mask, and in our experiments.
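Written out in notation of our own (the paper's symbols did not survive in this text), the typed recurrence takes roughly the following form, with curve types T = {L, B_c, B_f} for line segments and for Bézier curves with constrained and free right end tangents:

```latex
E(j, t) = \min_{0 \le i < j} \; \min_{t' \in \mathcal{T}}
  \Big[ E(i, t') + F_t(i, j \mid t') + c_t \Big],
\qquad E(0, t) = 0,
\qquad \mathcal{T} = \{ L,\; B_c,\; B_f \}
```

Here F_t(i, j | t') is the least-squares error of a type-t curve fit to points p_i through p_j, with its left tangent treated according to the right-tangent behavior of the preceding type t', and set to infinity whenever the pair (t', t) violates the tangent-compatibility or minimum-corner-angle rules; c_t is the per-type curve cost, with c_L presumably chosen below the Bézier costs to favor straight lines. The optimal sequence is recovered by backtracking from the minimum of E(n-1, t) over t, where n is the number of boundary points.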
The constraint segments used in the segmentation stage are also used in curve fitting; we wish for the curve to "snap" to these segments wherever they are near enough, or if it would result in a simpler solution. We incorporate these constraints into our curve fitting algorithm by first identifying points in within of a constraint segment, and forcing any segment fit to a range containing these points to be a straight line segment. The result is shown in Figure 7 (a) and (b); the dynamic programming fit is guaranteed to produce line segments where they are needed, and does so while still fulfilling its other objectives of continuity and simplicity.
Having guaranteed straight edge segments in the vicinity of constraint segments, we project the nodes bordering lines near these constraint segments to the exact lines of these constraints to obtain a fabricable solution (Figure 7 (c)). Neighboring curves are modified so as to preserve their tangent angle with the displaced lines. It is not always possible to ensure consistent tangent behavior in the above framework; transitions between curves and line segments are troublesome since the latter lack the degrees of freedom to adhere to the pre-computed tangents used for smooth transitions. The inherent order of curves considered by the dynamic programming algorithm prevents curves from correcting for the behavior of subsequent neighbors. We therefore apply an additional smoothing step in which corners below angle are made smooth by altering curve tangents. This step is only done if the resulting change to the shape is not too drastic; we approximate this change by the total displacement of Bézier control points, which we limit to .
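The two geometric helpers used here, marking boundary points near a constraint segment (so that ranges containing them are forced to the line type) and projecting nodes onto a constraint line, can be sketched as follows. The `tol` argument stands in for the distance threshold in the text; both function names are hypothetical.

```python
import numpy as np


def points_near_segment(pts, a, b, tol):
    """Boolean mask of 2D boundary points within `tol` of segment a-b."""
    pts, a, b = (np.asarray(v, float) for v in (pts, a, b))
    d = b - a
    t = np.clip((pts - a) @ d / (d @ d), 0.0, 1.0)   # clamp to the segment
    dist = np.linalg.norm(pts - (a + t[:, None] * d), axis=1)
    return dist <= tol


def project_to_line(p, a, b):
    """Orthogonal projection of 2D point p onto the infinite line through a and b."""
    p, a, b = (np.asarray(v, float) for v in (p, a, b))
    d = b - a
    return a + (p - a) @ d / (d @ d) * d
```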
We additionally find lines that are parallel/orthogonal (within angle ) and further align them. Constraint line orientations are left unchanged, and any edges nearly parallel to them copy their orientation, and likewise for nearly orthogonal edges. In many cases, this produces 90 degree angles in shapes where parts connect.
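The orientation snapping can be sketched on line angles in degrees, treated modulo 180. This simplified version only snaps free edges to the constraint orientations or their perpendiculars; aligning free edges with one another, which the text also describes, is omitted.

```python
def snap_orientations(edge_angles, constraint_angles, tol_deg):
    """Copy constraint orientations onto nearly parallel or orthogonal edges.

    Angles are line orientations in degrees (modulo 180). Constraint lines
    themselves are never changed; only the free edges are returned.
    """
    def diff(a, b):
        # Smallest difference between two orientations, modulo 180.
        d = (a - b) % 180.0
        return min(d, 180.0 - d)

    out = []
    for e in edge_angles:
        new = e
        for c in constraint_angles:
            if diff(e, c) <= tol_deg:           # nearly parallel
                new = c % 180.0
                break
            if diff(e, c + 90.0) <= tol_deg:    # nearly orthogonal
                new = (c + 90.0) % 180.0
                break
        out.append(new)
    return out
```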
5 Experimental Results
We tested our algorithm on seven carpentered objects of varying complexity, with varying numbers of parts and, in some cases, more difficult features: non-axis-aligned parts (the diagonal bookshelf and bookholder), shapes with holes and curves (the stool and tray), non-orthogonal connections, and, in one case (the complex stool), slight deviations from our model assumptions in the form of smoothed corners and grooves in the sheet plane, together with some highly occluded part sheet planes. We obtained fabricable reconstructions for six of the objects and discuss the seventh as a limitation in the final section.
We photographed our objects with a hand-held Google Pixel 3 camera in two distinct, well-lit indoor locations with a variety of backgrounds (different rugs, etc.). For each model, we took between 30 and 70 photos from viewpoints facing the object and situated approximately on a hemisphere around it. We used RealityCapture [capturingreality] to recover camera poses and semi-dense point cloud reconstructions; this step requires minor user input to select a reconstruction region that isolates the object and, in particular, omits the ground plane.
Figure 8 shows results for six of the models. In all cases, there were faces of the model that were either missing or incompletely represented in the point cloud (second row), such as the unseen undersides of the top of the stool and nightstand, as well as faces that are less well-textured or in shadow, as with some parts of the bookshelf. For all six models shown, we were able to generate fabricable results, shown in the fourth row. In the third row, we superimpose the reconstructed CAD model on the input image, showing how faithfully our reconstructions fit the observed object. Minor failures include the horizontal pieces of the nightstand, which are slightly too thick, and two parts of the bookshelf, which are slightly translated from their original positions. In the final row, the cut shapes for all the parts in the output are shown as combinations of line segments and Bézier curves. The nodes, shown in blue, indicate the transitions between curves and may be either smooth transitions or sharp corners. For the most part, these simplified curves are consistent with the input, but in the case of the tray, the smooth transitions on either side of the rounded handles, as well as on the bottom of the handholds, are sharpened. The cut shapes also reveal that the back side of the nightstand was detected as a single piece. In fact, that piece is made up of 4 smaller pieces, as shown in Figure 9 (a). Though we do not decompose detected parts based on detected seams, this result can be “fixed” by adding cuts after the fact. To evaluate the importance of individual components of our technique, we also show some results with various parts of our pipeline simplified in Section 4 of the supplemental material.
To measure the accuracy of the final fabricable, simple CAD model relative to the original object, we use the RMSE distance of the point cloud to the model surface, normalized by the model width for scale independence. As shown in Table 1 for the same five models in Figure 8, this error is on the order of 0.4% of the model width, indicating that, in terms of geometric displacement, our reconstructed shapes remain true to the original objects.
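The error measure reduces to a short formula. Assuming the scale factor is the reciprocal of the model width, and given precomputed point-to-surface distances (computing those requires a mesh query and is omitted here), the reported percentage could be computed as:

```python
import numpy as np


def normalized_rmse(distances, model_width):
    """RMSE of point-to-surface distances as a percentage of the model width."""
    d = np.asarray(distances, float)
    return 100.0 * np.sqrt(np.mean(d ** 2)) / model_width
```

Under this reading, a value of 0.4 corresponds to an RMSE of 0.4% of the model width.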
We also measure the number of incorrect connections, i.e., extra, missing, or misclassified connections. Among the five models, only the nightstand misses some connections; the four parts making up the back surface are detected as one part (see Figure 9 (a)).
Another important measure of accuracy is in the representation of the cut paths themselves; smooth curves and sharp corners should be reflected in the final result. Two of our models contain smooth curves: In the case of the stool, these features were handled without problems; for the tray, some additional corners are present in the handle holes, as well as on either side of the arches.
6 Limitations and Future Work
A key feature and a notable limitation of our method is its reliance on image evidence. The images enable the MVS reconstruction, corner assembly disambiguation, and recovery of accurate cut paths. However, we can only reconstruct what we observe sufficiently well. If a model has structure that hides some parts from view, it can be difficult or impossible for our method to accurately reconstruct the model. In the model in Figure 9b (“complex stool”), the parts highlighted in red are occluded by the boards directly above them, causing them to be detected as disjoint, floating pieces; the result is not fabricable or even connected. This might have been addressed by observing the underside of the red parts (and straightforwardly handling type 3 connections), but the underside was out of view. In future work, it would be interesting to explore the use of additional priors or learned semantics to reconstruct objects with incomplete observations. For instance, if a model could not be assembled using only the detected parts (perhaps due to floating, unconnected pieces), we could potentially hallucinate new structures. Another way to address the method’s reliance on image evidence would be to extend the system to enable real-time capture by incorporating new observations incrementally and guiding the user to capture new images that resolve ambiguities or missing geometry in the model.
Our method has a number of thresholds, cost terms, and other parameters that affect the quality of the final result. Many of these parameters are expressed relative to object dimensions, to limit dependency on object size. The method, particularly its segmentation and curve fitting stages, is sensitive to the choice of some of these parameters. We determined good values by experimentation up front and then used the same parameter values (or proportionality constants for parameters related to object scale) for every model. In the future, it would be desirable to detect some of these parameters adaptively to improve the robustness of our method. For instance, some specific stock thicknesses are more common as building materials, so part thicknesses could be used to detect the true object scale. This could further be used to ensure that parts can be made from readily available materials.
It might also be worth exploiting alternative or complementary features derived from the input data. For example, we could use all seams in the wood as evidence of cuts to find connected co-planar parts as in Figure 9a. To incorporate arbitrary seams robustly, we would need to account for other texture features, such as ever-present wood grain, that could be mistaken for cuts. In addition, man-made objects often have symmetries and repetitions, which could be exploited both to aid in detection and to further regularize the model. Lastly, in addition to planar surfaces detected from point clouds, we could also use the detected curved (cut) surfaces to guide segmentation. This would also help minimize the bias towards straight lines observed in the tray example.
Since our method identifies where and how parts are connected by classifying the types of connection (Figure 4), we can procedurally define connectors over the contact surface for each connection type. In our implementation we use two nails for each contact between parts, one at each end of the participating cut surface, as shown in Figure 10. This information can be used to create physical reproductions of the models, such as that shown in Figure 1. We leave it to future work to consider the problem of assembly in greater detail, in particular ensuring that connectors do not obstruct each other and determining easy-to-follow assembly instructions, which should ensure that a sequence of steps exists such that the model can be assembled without parts obstructing other parts, or the tools needed to assemble them. One direction is to adapt prior work in automatic generation of assembly instructions to the output of our method [agrawala2003designing].
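Placing two nails per contact, one near each end of the contact edge, can be sketched as below. The `inset` distance from each end and the single-nail fallback for very short contacts are hypothetical choices of ours, not details given in the text.

```python
import numpy as np


def nail_positions(seg_start, seg_end, inset):
    """Two nails per contact, one near each end of the contact edge.

    inset: hypothetical distance from each end of the edge to its nail.
    """
    a, b = np.asarray(seg_start, float), np.asarray(seg_end, float)
    length = np.linalg.norm(b - a)
    if length <= 2 * inset:
        return [((a + b) / 2).tolist()]     # too short for two nails: use the midpoint
    u = (b - a) / length                    # unit direction along the contact edge
    return [(a + inset * u).tolist(), (b - inset * u).tolist()]
```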
Finally, it would be interesting to extend our method to more fabrication operations and objectives. Within carpentry, supporting more complex joinery would expand the possible geometries that could be reverse engineered. We currently assume planar contacts, both because they are common and because the internal structure of joints cannot always be observed without disassembly (such as with mortise and tenon joints). In some cases, however, it may be possible to parse joinery directly through image analysis. Optimizing joinery with respect to structural stability, similar to [yao2017interactive], or additional objectives such as fabrication cost and packing efficiency [wang2021stateoftheart] could also guide reconstruction of joinery and other hidden structure.
In this work, we propose a method for recovering accurate representations of built, carpentered objects from a set of photographs by working within the space of the fabrication process itself. This representation is both highly expressive and subject to real world constraints, as the process describes models that can be physically realized, making it a good candidate for solving inverse problems in 3D reconstruction. Given enough images covering the surfaces, our solution in the carpentry domain can recover the parts and connections that comprise captured real world objects, complete with the simplified contours that most concisely describe the cut paths of the model, making it easy to edit with CAD software to create design variations. We hope this result will inspire future work at the intersection of fabrication and computer vision, leading to more end-to-end systems for 3D reconstruction that can take into account multiple materials and fabrication processes.
This work was supported by National Science Foundation grants CCF-2017927 and EEC-2035717, and by UW Reality Lab funding from Facebook, Google, and Futurewei. A. Schulz acknowledges the generous support of the Google Faculty Research Award.
Appendix A Supplementary Material
A.1 Cut Region Approximation
The function from which we extract isocontours to initially approximate the cut shape in Section 4.2.2 is
where are the 2D points, and is (/400). We extract a level set using the Marching Squares algorithm with a grid length of .
A.2 Inferring Connection Types
To detect whether part b approximately terminates at part a’s sheet plane, thus forming a T-junction necessary for considering type 1 connections, we use part a’s sheet normal , and a’s sheet plane offsets and to compute the projected offsets of all points in part b: and . If
are both true, part b is not confined to either side of part a’s sheet, preventing connection type 1. As long as only one is true, the connection is allowed; the interface plane offset is if (6) holds, and if (7) holds. Type 2 connections occur when a type 1 connection is valid for both orderings of parts a and b.
Consequently, there are actually 4 discrete configurations for a type 1 connection between two parts, as shown in Figure 11: For the connection and its reverse (where a and b are swapped), the connection may involve contact with one of two sides of the sheet plane (whether the interface plane offset is or ). For type 2 connections, the positions of these contacts for both the type 1 connection and its reverse are used to determine the corner configuration, as shown in the bottom of Figure 11; knowing where the potential contact surfaces lie is crucial to knowing where in the images to look for seams.
We assume corners (type 2 connections) are right angles. Though it would not be difficult to allow them to vary, doing so would undermine our choice of two "natural" corner configurations requiring only orthogonal cuts: bevel cuts would be needed in either configuration, so potentially more complex joints would need to be detected. Furthermore, non-orthogonal corners are uncommon.
A.3 Curve Fitting
The dynamic programming curve fitting algorithm can infer the optimal set of nodes from the input point set, with the caveat that it requires a starting point from which the optimal sub-ranges belonging to separate curves are determined. This starting point is necessarily a node, since it is the start of the first such range returned by the algorithm. A first instinct might be to choose a starting point that looks like it should be a corner; however, if the input shape has no true sharp corners, the exact tangent behavior at the start of the loop will depend on the behavior at the end of the loop, which violates the sequential order in which we find the curves.
Instead, we do the opposite: We look for a starting node in a region that is as flat as possible (which we determine using the curvature of a Bézier curve fit to a neighborhood of points centered at the query point). Such a region can be assumed to always exist for well-behaved inputs approximating continuous shapes, and allows the tangents at the loop boundaries to be assumed to be smooth. The downside is that the starting node often bisects a region that could be better described with a single curve or line segment. We therefore filter out the extra node whenever possible by merging collinear line segments (the most likely case).
In practice, it is very inefficient to consider every possible sub-range of points as a candidate curve. The space and time complexity of the dynamic programming algorithm is quadratic in the number of candidate nodes, and this number is potentially very large when the input is a dense bitmap mask boundary. So given a maximum number of candidate nodes , we find the input points with the highest curvature and mark them as candidates, since such corners are likely to mark the boundaries between separate curves.
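Choosing the loop's starting node and the candidate node set can be sketched with a cheap curvature proxy: the turning angle between chords to the k-th previous and k-th next boundary points. This stands in for the local Bézier fits described above and is our simplification; the flattest point becomes the start, and the highest-curvature points (plus the start) become candidates.

```python
import numpy as np


def turning_angles(pts, k=3):
    """Absolute turning angle at each point of a closed boundary, measured
    between chords to the k-th previous and k-th next points."""
    pts = np.asarray(pts, float)
    v1 = pts - np.roll(pts, k, axis=0)      # incoming chord
    v2 = np.roll(pts, -k, axis=0) - pts     # outgoing chord
    cross = v1[:, 0] * v2[:, 1] - v1[:, 1] * v2[:, 0]
    dot = np.sum(v1 * v2, axis=1)
    return np.abs(np.arctan2(cross, dot))


def start_and_candidates(pts, max_candidates, k=3):
    """Return (start index, sorted candidate node indices)."""
    ang = turning_angles(pts, k)
    start = int(np.argmin(ang))             # flattest point starts the loop
    top = np.argsort(-ang, kind="stable")[:max_candidates]
    cands = sorted(set(top.tolist()) | {start})  # the start must be a node
    return start, cands
```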
The full recurrence relation for our energy function is
where and are the start and end points of the considered range of points, and is the type of curve fit to the range ending at (so is the energy of a sequence of curves whose last curve has type ). Note that the sub-range energy depends not only on the starting and end node indices, but also the type of the previous and current curve. Because also defines whether a Bézier curve should use the fixed (precomputed) tangent at its last endpoint, this allows us to define the rules, detailed in Section 4.4.4, governing the behavior of neighboring curves, including the angles between their tangents where they meet.
Appendix B Evaluating Design Choices
To gauge the necessity of some of the stages of our pipeline, we discuss how disabling or simplifying them impacts the quality of results.
Joint Image-Based Segmentation
Figure 12 shows the result of skipping image segmentation and directly applying curve fitting to the point set boundaries from Section 4.2.2. Because regions of the model with less visibility, such as cavities and undersides, are often missing from the point cloud, the final result contains gaps. Unlike the full pipeline, this variant cannot expand part shapes into regions of similar texture, which is how the full pipeline closes gaps such as these.
Using Multiple Views in Segmentation
We also show results from using only a single image per part in the segmentation phase in Figure 13. Without multiple views to disambiguate foreground and background, similarly-colored surfaces become erroneously associated with the part shape, leading to an artifact-laden result. In general, material and lighting conditions can make discerning part shapes from certain views difficult; averaging the segmentation energy over multiple reprojected views exploits the view-dependence of pixels not belonging to the part, since similarly-colored background surfaces are less likely to interfere with the same pixels in every view.
Constrained Curve Fitting
The constraints in the curve fitting stage straighten curves in the vicinity of connection contacts, effectively flattening nearby artifacts arising from the segmentation stage (caused by poor visibility at some junctions). In our tilted stool example, one such artifact occurs due to heavy shadowing (see the top of Figure 14), but our constrained curve fitting gives a clean result (see Figure 8 in the main paper). The bottom of Figure 14 shows the result without considering constraints in the curve fitting stage. The artifact persists; furthermore, the contacts between all parts are imperfect. Amending the contacts purely in a post-process gives rise to new challenges, because without enforcing straight segments in the vicinity of contacts, there may not be a single part of the boundary curve that can be “snapped” to the surface. For example, the cut edge on the right of the central part has been chosen as part of a longer, continuous curve.