Sketch-based Shape Retrieval using Pyramid-of-Parts

02/14/2015 ∙ by Changqing Zou, et al. ∙ 0

We present a multi-scale approach to sketch-based shape retrieval. It is based on a novel multi-scale shape descriptor called Pyramid-of-Parts, which encodes the features and spatial relationship of the semantic parts of query sketches. The same descriptor can also be used to represent 2D projected views of 3D shapes, allowing effective matching of query sketches with 3D shapes across multiple scales. Experimental results show that the proposed method outperforms the state-of-the-art method, whether the sketch segmentation information is obtained manually or automatically by considering each stroke as a semantic part.




1 Introduction

Due to their simplicity and intuitiveness, sketch-based interfaces have been popular for 3D shape retrieval. A standard approach is to turn the 2D-3D matching problem involved into a 2D-2D matching problem by first rendering every 3D repository model as 2D contours under multiple views and then matching the query sketch with every resulting contour. How to define effective features to represent both input sketches and 2D contours is a key challenge in sketch-based shape retrieval. We use 2D contours of models and model sketches interchangeably in the following discussion.

Various feature representations (e.g., Gist [Oliva and Torralba 2001], Spherical Harmonics [Funkhouser et al. 2003], Eccentricity [Li and Johan 2012]) have been proposed to represent both queries and model sketches globally. Such global sketch representations are able to encode high-level shape information, but are sensitive to intra-class variation and shape deformation. Recently, the BoW representation [Eitz et al. 2012] has been brought into the field to address this problem. This representation is based on the statistics of local features, such as GALIF [Eitz et al. 2012] and SIFT [Li et al. 2013], and has proven more robust against variations in the query and model sketches. However, approaches based on this representation may easily return locally similar but globally very different models (Figure 1). This is because the local features are still defined at the pixel level, without leveraging any high-level semantics, such as the semantically meaningful parts in the sketch and the 3D models.

Part-level representations have been proven useful for object detection and recognition [Felzenszwalb and Huttenlocher 2005] in the computer vision community. However, existing sketch representations are still largely defined at the pixel level only. On the other hand, many techniques have been proposed to consistently decompose a set of 3D models into semantically meaningful parts [Kalogerakis et al. 2010, Huang et al. 2011, Hu et al. 2012]. In recent years, several techniques have also been developed to semantically segment freehand sketches, either automatically or interactively [Noris et al. 2012, Sun et al. 2012, Huang et al. 2014]. Hence, it is interesting to explore if the use of semantic parts could lead to more discriminative sketch representations for retrieval.

In this paper, we present a new sketch representation, called Pyramid-of-Parts, for sketch-based shape retrieval. Our representation is derived from the available part-level information associated with the query sketch and the 3D repository models. We consider two ways of obtaining sketch segmentation information: manual specification, and automatic segmentation that simply treats each input stroke as a semantic part. Because the semantic segmentation of the query sketch and that of the 3D models might differ, or be at different levels of detail due to the multi-scale nature of objects, we adapt the idea of image pyramids to encode semantic parts in a multi-scale manner. Our retrieval algorithm then compares the Pyramid-of-Parts of the input sketch with those of the 3D models across different scales and returns the list of models ranked according to how well they match the input sketch semantically.

Figure 1: Compared with [Eitz et al. 2012], our sketch-based retrieval method based on the Pyramid-of-Parts returns more relevant models. The segmentation in the query sketch is color-coded.

Thanks to the Pyramid-of-Parts, our sketch-based shape retrieval technique outperforms the state-of-the-art techniques, which are based on either a local descriptor [Eitz et al. 2012] or a global descriptor [Zou et al. 2014], on two sketch datasets. We present a new sketching interface that supports the commonly used coarse-to-fine drawing practice and naturally provides semantically segmented sketches as query sketches. Since our representation encodes spatial information of the query sketch, our technique often produces the desired results even if only a subset of parts is depicted in a query sketch (Figure 2), and performs better than [Eitz et al. 2012] in this task. We refer to this kind of matching based on incomplete input sketches as incomplete matching in this paper.

Figure 2: Given some incomplete query sketches (top row), our method is able to return the desired models (bottom row).

2 Related Work

Sketch-based Shape Retrieval. This problem is often tackled by finding a repository model whose rendered 2D contour (i.e., a silhouette rendering of the 3D model) under a certain viewpoint best matches the query sketch. Existing solutions mainly differ in the feature descriptors used to represent query/model sketches and can be largely categorized into two groups. The first group of approaches makes use of global descriptors (e.g., [Loffler 2000, Funkhouser et al. 2003]) to represent the sketch globally. However, global descriptors are sensitive to intra-class variations and shape deformation, which bring global changes to the descriptors. Global descriptors also have difficulty in handling incomplete query sketches, as the missing information affects the descriptors as well.

The second group of approaches uses statistics about local descriptors for sketch representation. For example, Yoon et al. [2010] represent sketches using statistics of their diffusion tensor fields, leading to a histogram of orientations. Saavedra et al. [2012] represent a sketch by the HOG feature of its “key shape”, which is an approximation of the contour with straight lines. Eitz et al. [2012] adopt the Bag-of-Features (BoF) model to represent a sketch as a histogram of visual words. These methods strike a balance between local and global features of 2D shapes, and are able to tolerate the inaccuracies inherent in sketches to some extent. However, since they discard the spatial relationship among local descriptors, they often return unrelated shapes as similar (Figure 1). Such spatial information of local descriptors can be captured by our proposed Pyramid-of-Parts, resulting in more discriminative power. Moreover, there are some other methods (e.g., [Shao et al. 2011, Li and Johan 2013]) that directly align the query sketch to rendered model views to compute sketch-to-model distances. The main limitation of these approaches is that they are usually inefficient.

Sketch-based Image Retrieval. Sketch-based image retrieval has been under intensive research since the 1990s, and many shape descriptors have been explored. (See [Eitz et al. 2010] for a nice survey.) Among them, Histograms of Oriented Gradients (HOG) [Dalal and Triggs 2005] and Shape Context [Belongie et al. 2002] have become very popular due to their simplicity, generality and discriminative power. The BoF model can be built upon these local descriptors [Eitz et al. 2011], and the resulting feature is shown to be more tolerant to sketch variations. These shape descriptors can be extended to multiple scales, leading to, for example, multi-scale HOG [Hu et al. 2010] and multi-sample BoF [Wu et al. 2009], where the sampled image patches do not have a fixed size. Compared to these works, our multi-scale descriptor is defined over semantic parts in the sketch, rather than image patches whose content might bear no semantic information.

Part-based Models. In recent years, part-based models have been widely used in the computer vision community for the detection or recognition of objects in images. For example, Felzenszwalb and Huttenlocher [2005] present a pictorial structure model to encode the relationship among different body parts. More recently, Felzenszwalb et al. [2010] introduce the Deformable Part Model (DPM), which is able to successfully identify complex objects. Ferrari et al. [2008] propose the $k$AS feature for object detection, with each “part” constructed by linking roughly straight adjacent contour segments. However, it is unclear how to apply these models designed for object detection to our shape retrieval problem.

Image Pyramids. Since the pioneering works in [Burt 1981, Burt and Adelson 1983], pyramid methods have been extensively used for image analysis to capture the underlying patterns in multiple scales. For example, Lazebnik et al. [2006] introduce the idea of spatial pyramid matching (SPM) for natural scene categorization. SPM uses features extracted in regions of different sizes, and organizes them into a spatial pyramid. This idea was later extended and used in many applications, such as image classification [Yang et al. 2009], image matching [Shrivastava et al. 2011] and 3D object recognition [Li and Guskov 2007, Redondo-Cabrera et al. 2012]. We also adopt this idea of a spatial pyramid, since it has proven more effective than single-level approaches. However, unlike existing methods, which construct pyramids of pixels, our method constructs a pyramid of semantic parts.

3 Pyramid-of-Parts

We assume that both the input query sketch and the 2D model contours have been pre-segmented into semantic parts. Due to the multi-scale nature of objects, it is not uncommon that a query sketch and a model contour correspond to the same object but have different segmentations. It is thus important to know the scale of each part and compare parts only at the same scales.

Since the parts do not carry labels, it is challenging to form a semantically meaningful hierarchy of parts for matching. Instead, we adapt the idea of image pyramids to our problem for part scale normalization. Note that the query sketch and the model contours are both represented by pyramids of parts. Using a common pyramid for both not only makes it possible to compare parts at the same scales but also captures the multi-scale nature of objects. Although the following discussion focuses mainly on how we process input sketches, model contours are processed in exactly the same way.

Definition. Like image pyramids, a Pyramid-of-Parts consists of multiple scale levels, with each level containing groups of pre-segmented parts in the input sketch. The groups of parts at the same level have similar scales, and upper levels have larger scales, as illustrated in Figure 3.

3.1 Pyramid Generation

Figure 3: Constructing a 3-level Pyramid-of-Parts by assigning the semantic parts (color-coded) to different regions of the pyramid. $R_i^j$ is the $i$-th region in the $j$-th level.

To generate a Pyramid-of-Parts, we associate each level with a set of regions in the sketch image. Let $R_i^j$ denote the $i$-th region at level $j$. For example, $R_1^1, \dots, R_9^1$ are the nine regions at level 1, as illustrated in Figure 3. Each region corresponds to a group of parts. Note that a region might be associated with zero, one, or multiple parts. The parts associated with a region $R_i^j$ are considered as one group of parts at level $j$. Since groups of parts at upper levels are of larger scale, we require larger regions at upper levels.
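As an illustration, the region layout of a 3-level pyramid can be sketched in Python as follows. The layout is an assumption inferred from the text and Figure 3: nine level-1 regions on a 3x3 grid, four overlapping level-2 regions that each span 2x2 grid cells, and a single level-3 region covering the whole canvas. The function name `build_pyramid_regions` and the rectangle encoding are hypothetical and are not taken from the original system.

```python
from typing import Dict, List, Tuple

# A region is an axis-aligned rectangle: (x_min, y_min, x_max, y_max).
Rect = Tuple[float, float, float, float]


def build_pyramid_regions(side: float) -> Dict[int, List[Rect]]:
    """Return the regions of an ASSUMED 3-level pyramid on a square canvas."""
    cell = side / 3.0
    # Level 1: nine regions on a 3x3 grid (assumed layout).
    level1 = [(c * cell, r * cell, (c + 1) * cell, (r + 1) * cell)
              for r in range(3) for c in range(3)]
    # Level 2: four overlapping regions, each spanning 2x2 grid cells (assumed).
    level2 = [(c * cell, r * cell, (c + 2) * cell, (r + 2) * cell)
              for r in range(2) for c in range(2)]
    # Level 3: one region covering the entire canvas.
    level3 = [(0.0, 0.0, side, side)]
    return {1: level1, 2: level2, 3: level3}


def area(rect: Rect) -> float:
    """Area of a rectangle."""
    return (rect[2] - rect[0]) * (rect[3] - rect[1])
```

Under this assumed layout the region areas at levels 1, 2 and 3 stand in the ratio 1 : 4 : 9, so larger regions sit at upper levels as the text requires.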

The main criterion for determining whether a part $p$ belongs to a region $R_i^j$ is whether $p$ is enclosed by $R_i^j$. When $p$ is completely within $R_i^j$ (e.g., the leg part in Figure 3), it is easy to conclude that $p$ should be in the group of parts associated with $R_i^j$. However, the problem becomes tricky when $p$ covers multiple regions (e.g., the red arm in Figure 3). A part may cover multiple regions at the current level mainly because it should belong to a region at an upper level. For example, the body part in Figure 3 belongs to an upper-level region rather than to any of the lower-level regions it overlaps.

With the above observations, we determine whether a part $p$ is assigned to a region $R_i^j$ by considering both the inclusion of $p$ inside $R_i^j$ and the size of $p$ relative to $R_i^j$. Specifically, the likelihood of $p$ belonging to $R_i^j$ (Eq. 1) combines two terms, a size term $\lambda_s$ and an extrusion term $\lambda_e$.

The size term $\lambda_s$ enforces a penalty when the size of $p$, denoted $s_p$ and defined as the longer side of the bounding box of $p$, is larger than a limit set by a parameter $\alpha$ and by $s_R$, the length of the longer side of $R_i^j$. Here $\alpha$ is fixed in our implementation. Precisely, when $s_p$ does not exceed this limit, $\lambda_s$ takes its maximum value, corresponding to no penalty. Otherwise, $\lambda_s$ decreases as $s_p$ deviates from the limit, and is clamped to 0 if it becomes negative.

Even if the size of a part $p$ is smaller than the size of a region $R_i^j$, it is still possible that $p$ extrudes from $R_i^j$, since $p$ is not necessarily centered at $R_i^j$. We thus use an extrusion term $\lambda_e$ to penalize the extrusion of $p$ from $R_i^j$. It is computed from the stroke length of $p$ and the stroke length of the portion of $p$ inside $R_i^j$, and reaches its maximum when $p$ is completely inside $R_i^j$.

In the end, a part $p$ is assigned to a region $R_i^j$ if its likelihood of belonging to $R_i^j$ is sufficiently high. We also compute a reliability value for each region to quantify the certainty of the assignments made to it; this value is used in later stages to downplay regions that have many uncertain assignments. The reliability of $R_i^j$ is computed from the assignment likelihoods of the parts assigned to it. Each part will eventually be assigned to at least one region, because the topmost-level region in Figure 3 covers the entire image.
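The assignment procedure of this section can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the exact formulation of the paper: the size term is assumed to fall off linearly, the extrusion term is assumed to be the fraction of stroke length inside the region, the two terms are assumed to be multiplied, and the reliability is assumed to be the mean likelihood of the assigned parts. The values `alpha=1.0` and `threshold=0.5` are placeholders, a part is simplified to a single polyline, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def part_size(points: Sequence[Point]) -> float:
    """Longer side of the bounding box of a part."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return max(max(xs) - min(xs), max(ys) - min(ys))


def size_term(size: float, region_side: float, alpha: float) -> float:
    """ASSUMED linear penalty: 1 below the limit, clamped to 0 far above it."""
    limit = alpha * region_side
    if size <= limit:
        return 1.0
    return max(0.0, 1.0 - (size - limit) / limit)


def inside_fraction(points: Sequence[Point], region: Rect, steps: int = 20) -> float:
    """Approximate fraction of the polyline length that lies inside the region."""
    x0, y0, x1, y1 = region
    total = 0.0
    inside = 0.0
    for (ax, ay), (bx, by) in zip(points, points[1:]):
        seg = math.hypot(bx - ax, by - ay)
        total += seg
        # Sample the midpoints of equal sub-segments along this segment.
        for k in range(steps):
            t = (k + 0.5) / steps
            mx, my = ax + t * (bx - ax), ay + t * (by - ay)
            if x0 <= mx <= x1 and y0 <= my <= y1:
                inside += seg / steps
    return inside / total if total > 0 else 0.0


def likelihood(points: Sequence[Point], region: Rect, alpha: float = 1.0) -> float:
    """ASSUMED combination: product of the size term and the extrusion term."""
    region_side = max(region[2] - region[0], region[3] - region[1])
    return size_term(part_size(points), region_side, alpha) * inside_fraction(points, region)


def assign_parts(parts: Sequence[Sequence[Point]], region: Rect,
                 threshold: float = 0.5, alpha: float = 1.0) -> Tuple[List[int], float]:
    """Return indices of parts assigned to the region and an ASSUMED reliability."""
    scored = [(i, likelihood(p, region, alpha)) for i, p in enumerate(parts)]
    kept = [(i, s) for i, s in scored if s > threshold]
    reliability = sum(s for _, s in kept) / len(kept) if kept else 0.0
    return [i for i, _ in kept], reliability
```

Running `assign_parts` once per region of the pyramid yields, for every region, its group of parts and a reliability value that can later be used to weight the region.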

3.2 Feature Extraction

The Pyramid-of-Parts feature is the concatenation of the features extracted from all the regions. Each region in the pyramid is either empty or contains a group of parts. For empty regions, the features are simply all zeros. For the others, the features are Gabor features extracted from the groups of parts in them, as shown in Figure 4. A group is first placed in a bounding square, and then convolved with a set of Gabor filters. Each response is averaged over a grid (Figures 4(c) and 4(d)), and the outcome becomes part of the final feature (Figure 4(e)). The parameters of the Gabor filters differ for each level of the pyramid, and are discussed in Section 5.1.

Figure 4: Extracting the feature for a group of parts with Gabor filters. The group is first placed in a bounding square (a), and then convolved with a set of Gabor filters (b), each resulting in a response map (c). The response maps are further averaged over a coarse grid (d) and concatenated into the final feature (e).
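The grid-averaging and concatenation steps (Figures 4(d) and 4(e)) can be illustrated with a short Python sketch. It assumes that the Gabor response maps have already been computed and are given as 2D lists of numbers; the filtering itself is not shown. The function names `grid_average`, `region_feature` and `empty_feature` are hypothetical.

```python
from typing import List, Sequence

Map2D = Sequence[Sequence[float]]


def grid_average(response: Map2D, n: int) -> List[float]:
    """Average a response map over an n x n grid, in row-major cell order."""
    h, w = len(response), len(response[0])
    feature = []
    for gr in range(n):
        r0, r1 = gr * h // n, (gr + 1) * h // n
        for gc in range(n):
            c0, c1 = gc * w // n, (gc + 1) * w // n
            cells = [response[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feature.append(sum(cells) / len(cells) if cells else 0.0)
    return feature


def region_feature(responses: Sequence[Map2D], n: int) -> List[float]:
    """Concatenate the grid averages of all filter responses of one region."""
    feature: List[float] = []
    for response in responses:
        feature.extend(grid_average(response, n))
    return feature


def empty_feature(num_filters: int, n: int) -> List[float]:
    """All-zero feature used for a region that contains no parts."""
    return [0.0] * (num_filters * n * n)
```

With this layout every region at a given level yields a vector of the same length, whether or not it contains parts, so the per-region vectors can be concatenated into one fixed-length Pyramid-of-Parts feature.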

4 Shape Retrieval Framework

Our shape retrieval engine is built upon the Pyramid-of-Parts feature, and the pipeline is shown in the pipeline overview figure. As in [Eitz et al. 2012], we take a 2D-to-2D matching approach, i.e., matching the input sketch against all the views of all the models in the database. To construct the database, we render each model for each selected view using suggestive contours [DeCarlo et al. 2003] (overview figure, panel (b)), and extract its Pyramid-of-Parts feature (overview figure, panels (d) and (e)). Given a query sketch, its Pyramid-of-Parts feature is extracted and matched against all the features in the database, after which the top matched models are retrieved.

4.1 The Query Sketch

The input query sketch consists of a set of strokes drawn by the user. The sketch is scaled such that a fixed-size canvas forms its bounding square. To extract the Pyramid-of-Parts feature, segmentation of the sketch is required, which can be done automatically [Sun et al. 2012, Huang et al. 2014] or manually [Noris et al. 2012]. For maximum accuracy, here we opt for the manual approach, which is discussed in detail in Section 5.1.

4.2 Database Construction

To construct a 3D database, we need a set of segmented 3D models. This database stores the Pyramid-of-Parts features of the 2D contours of each segmented model under a selected set of views.

The procedure of computing the Pyramid-of-Parts feature of a 3D model is shown in the second and third rows of the pipeline overview figure. Given a segmented 3D model, a 2D model contour is generated from a given view of the model using Suggestive Contours [DeCarlo et al. 2003], with the segmentation information transferred from the 3D model (overview figure, panel (b)). The semantic parts are then processed into a Pyramid-of-Parts (panel (c)), which is used to produce the Pyramid-of-Parts feature (panel (d)) following the procedures described in Section 3. Multiple views are used for each model; these are representative views generated using [Zou et al. 2014]. The view generation process starts by sampling many views uniformly distributed on the viewpoint sphere, among which 42 views covering most of the information given by the dense views are selected.

4.3 Retrieval

To retrieve a model, the Pyramid-of-Parts feature of the sketch is compared with all the Pyramid-of-Parts features in the database, and the nearest neighbors are returned as matches. The distance between two Pyramid-of-Parts features is the weighted sum of distances between the constituent Gabor features in the corresponding regions. Let $F = \{f_i\}$ be a Pyramid-of-Parts feature, where $f_i$ is the Gabor feature of region $R_i$. (The level is not important here.) The distance between two features $F$ and $F'$ is computed as:

$$D(F, F') = \sum_i w_i \, d(f_i, f'_i),$$

where $d(\cdot, \cdot)$ is the distance between two Gabor features. The weight $w_i$ is proportional to the product of region importance and reliability. For region $R_i$, the importance $I_i$ is set roughly proportional to the area of the region. In our 3-level implementation, $I_i$ is set to 1, 4 and 9 for regions at levels 1, 2 and 3, respectively. The reliability $r_i$ is the degree of certainty of assigning the semantic parts to $R_i$, as described in Section 3.1. Given these quantities, the unnormalized weight is $\hat{w}_i = I_i \, r_i$, and the final weight is $w_i = \hat{w}_i / \sum_k \hat{w}_k$.
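The weighted distance described in this section can be sketched in Python as follows. The sketch assumes a Euclidean distance between the per-region Gabor features and a sum-to-one normalization of the weights; neither choice is confirmed by the text, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """ASSUMED distance between two per-region Gabor features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def region_weights(importances: Sequence[float],
                   reliabilities: Sequence[float]) -> List[float]:
    """Weights proportional to importance times reliability, normalized to sum to 1."""
    raw = [imp * rel for imp, rel in zip(importances, reliabilities)]
    total = sum(raw)
    return [x / total for x in raw] if total > 0 else [0.0] * len(raw)


def pyramid_distance(feat_a: Sequence[Sequence[float]],
                     feat_b: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> float:
    """Weighted sum of per-region feature distances."""
    return sum(w * euclidean(a, b) for w, a, b in zip(weights, feat_a, feat_b))
```

For example, with one region per level, importances 1, 4 and 9, and equal reliabilities, the level-3 region contributes 9/14 of the total weight, so a mismatch at the coarsest scale dominates the distance.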

5 Evaluation

We have conducted four experiments, Exp. 1-4, to evaluate the performance of the proposed method. Exp. 1 evaluates the performance when using different region subdivision strategies (Section 5.2). Exp. 2 evaluates the performance when using user-provided sketch segmentation information (Section 5.3). Exp. 3 evaluates the performance of a simple, automatic sketch segmentation strategy by grouping strokes (Section 5.5). Finally, Exp. 4 evaluates the performance on incomplete matching (Section 5.6).

5.1 Experimental Settings

Tools. We have developed a prototype retrieval system, which we used to collect input sketches and evaluate the performance of the proposed method. The interface of the system allows users to draw three types of strokes: bounding box, segmentation and query sketch strokes. The user may draw these strokes in any order. The bounding box strokes represent a bounding box of the sketch, which is only useful for incomplete matching (Section 5.6), where a bounded canvas is needed. The segmentation strokes are used to segment the query sketch into semantic parts. Each segmentation stroke is a closed curve forming a zone, and each stroke of a query sketch is assigned to the zone that contains more than half of it. All the query sketch strokes assigned to one zone are assumed to form one semantic part. Note that it is possible for one semantic part to lie completely within another (e.g., a human eye and head), and they cannot be separated by containment alone because the zone for the larger semantic part will contain that of the smaller one. In this case, a query sketch stroke is assigned to the smaller zone only if more than half of the stroke is inside it. Finally, if there exist $n$ semantic parts, only $n-1$ zones are needed, and the strokes not belonging to any zone are assigned to a background zone, which also represents one semantic part.
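The stroke-to-zone rule above can be sketched in Python as follows. The sketch assumes that each stroke is a list of roughly evenly spaced sample points, so that the fraction of points inside a zone approximates the fraction of stroke length inside it, and that each zone is a closed polygon. Nested zones are handled by testing smaller zones first. The function names and the use of `-1` for the background zone are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, poly: Sequence[Point]) -> bool:
    """Ray-casting test for a point against a closed polygon."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_area(poly: Sequence[Point]) -> float:
    """Unsigned polygon area via the shoelace formula."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0


def assign_stroke(stroke: Sequence[Point], zones: Sequence[Sequence[Point]]) -> int:
    """Index of the smallest zone holding more than half of the stroke, else -1."""
    order = sorted(range(len(zones)), key=lambda i: polygon_area(zones[i]))
    for i in order:
        inside = sum(1 for p in stroke if point_in_polygon(p, zones[i]))
        if inside > len(stroke) / 2.0:
            return i
    return -1  # background zone


def segment_sketch(strokes: Sequence[Sequence[Point]],
                   zones: Sequence[Sequence[Point]]) -> List[int]:
    """Zone index for every stroke; strokes sharing an index form one part."""
    return [assign_stroke(s, zones) for s in strokes]
```

Testing the smaller zone first means a stroke drawn for an eye is assigned to the eye zone even though the enclosing head zone also contains it.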

3D models dataset. Our 3D models come from the PSB dataset [Chen et al. 2009]. This dataset contains 380 models in 19 categories. It contains segmentation results from different segmentation methods and we selected the segmentation results produced by Randomized Cut [Chen et al. 2009].

Sketch dataset. With our prototype retrieval system, we collected a total of 428 complete sketches and 205 partial sketches (see the supplemental). Both full and partial sketches covered all 19 categories of the PSB model dataset. 10 users were invited to freely draw query sketches after we had shown them an example model from each category of the 3D dataset. Users were asked to freely specify the segmentation strokes for their drawings. This sketch dataset is used in most of the experiments where segmentation is needed. To compare with [Eitz et al. 2012] fairly when sketch segmentation is not available, we use a subset of their sketch data, which includes 395 sketches, covering 10 of the PSB model categories. Other sketches used by [Eitz et al. 2012] do not have a corresponding category in the PSB model dataset and thus are discarded.

Performance metrics. To quantitatively evaluate the proposed method, we have adopted four performance metrics: 1) Precision-recall; 2) Top One (TO), which measures the precision of the top-one results, averaged over all queries; 3) First Tier (FT), which measures the precision of the top $K$ results (where $K$ is the number of ground-truth models relevant to the query), averaged over all queries; and 4) Mean Average Precision (mAP), which summarizes the average precision of ranking lists for all queries.
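For concreteness, the per-query versions of Top One, First Tier and average precision can be sketched in Python as follows; averaging each of them over all queries gives TO, FT and mAP. These are the standard definitions as described in the text, and the function names are hypothetical.

```python
from typing import Callable, Hashable, List, Sequence, Set, Tuple


def top_one(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """1 if the top-ranked model is relevant, else 0."""
    return 1.0 if ranked and ranked[0] in relevant else 0.0


def first_tier(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Precision of the top K results, where K is the number of relevant models."""
    k = len(relevant)
    if k == 0:
        return 0.0
    return sum(1 for m in ranked[:k] if m in relevant) / k


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Mean of the precision values at the rank of each relevant model."""
    hits = 0
    total = 0.0
    for rank, model in enumerate(ranked, start=1):
        if model in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_over_queries(metric: Callable[[Sequence[Hashable], Set[Hashable]], float],
                      queries: List[Tuple[Sequence[Hashable], Set[Hashable]]]) -> float:
    """Average a per-query metric over (ranked list, relevant set) pairs."""
    return sum(metric(r, rel) for r, rel in queries) / len(queries) if queries else 0.0
```

As a small check, for the ranking `["a", "x", "b", "y"]` with relevant models `{"a", "b"}`, Top One is 1, First Tier is 1/2, and the average precision is (1/1 + 2/3) / 2 = 5/6.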

Methods for Comparison. We mainly compared our framework with the popular Bag-of-Words framework (denoted as BOW) [Eitz et al. 2012], and the global-feature-based framework (denoted as GF) as used in [Eitz et al. 2011], which encodes the whole query sketch using a chosen shape descriptor (GALIF in our case). Our full method is denoted as OUR-FULL. As all the methods use Gabor filters, the parameters of the Gabor filters are set to the same values for all of them in all the experiments (as described below).

Parameters of the Gabor filters. Gabor filters are used in all the methods compared, and the following parameters are shared among them: the peak response frequency, the frequency bandwidth, the angular bandwidth, and the set of orientations. The explanation of these parameters can be found in [Eitz et al. 2012]. When averaging the Gabor response (Figure 4(d)), the grid size needs to be specified. For our method, it is 2x2, 4x4 and 6x6 for the image regions in Levels 1, 2 and 3, respectively. For GF, it is 6x6, the same as the grid size used for the top-level image region in our method. For BOW, it is 4x4 as in [Eitz et al. 2012].

Figure 5: Three types of input strokes in our system: bounding box (left, gray), sketch (black) and segmentation strokes (right, gray).

5.2 Exp. 1: Region Subdivision Strategy

Our method is based on the Pyramid-of-Parts, as illustrated in the pipeline overview figure. The default number of levels for the Pyramid-of-Parts is 3 and the subdivision of regions is as shown in Figure 3. In this experiment, we study the effect of using different subdivision schemes on our retrieval performance:

  • Without overlapping regions: We divide the sketch into four non-overlapping regions, sized relative to the side length of the enclosing square region, as shown in Figure 6(a). This is denoted as 4R_NO.

  • Using different ways of constructing Level 2 regions: First, we divide the sketch into four different overlapped regions, as shown in Figure 6(b). This is denoted as 4R_O. Second, in addition to the original four regions shown in Figure 3(b), we add two new regions to Level 2 as shown in Figures 6(c1) and 6(c2). This is denoted as 6R_O.

  • Using a different number of levels: First, we add to 4R_NO one more level, with four regions, between the current Levels 2 and 3. This is denoted as 4LV. Second, we remove one level (Level 2) from 4R_NO. This is denoted as 2LV.

In this experiment, the region subdivision for Levels 1 and 3 is fixed. Figure 7 compares the retrieval performances of the above schemes. It shows that introducing region overlapping, adding more regions and adding more levels all help improve the performance. This is because these operations increase the amount of information in the resulting feature, improving its discriminative power. Since the best performance is obtained using the scheme 6R_O, we use it in later experiments.

Figure 6: Different region subdivision schemes: (a) four regions without overlapping in Level 2 (i.e., 4R_NO); (b) four regions with overlapping in Level 2 (i.e., 4R_O); (c1)(c2) two new regions added to Level 2 (i.e., 6R_O); (d) regions of the new level added between Levels 2 and 3 (i.e., 4LV).

Figure 7: Performance comparison of different region subdivision schemes.

5.3 Exp. 2: Full Method Comparison

In this experiment, we compare our full method (OUR-FULL) to the two competing methods, namely Bag-of-Words model (BOW) and retrieval by global feature (GF), over the 428 segmented sketches collected using our system. The results are shown in Figure 8 as red, purple and black curves, respectively.

We can see that our method (OUR-FULL) has achieved the best retrieval performance on all four evaluation metrics. BOW has achieved the second best mean average precision (mAP), but its retrieval accuracies evaluated by TO and FT are worse than those of GF. These results motivated us to investigate whether it is the multi-scale nature of the Pyramid-of-Parts or the use of semantic parts that facilitates the better performance of OUR-FULL over BOW and GF.

5.4 Multi-scale vs. Semantic Parts

Since our method adopts two main ideas, the multi-scale nature of the Pyramid-of-Parts and the use of semantic parts, we would like to understand how the two ideas contribute to the overall retrieval performance. Hence, we tested two approaches to evaluate the two ideas individually.

The first approach (denoted as OUR-NOG) skips the grouping stage, i.e., it removes the effect of multi-scale. However, as the sketches may contain different numbers of semantic parts, if we simply extract a Gabor feature for each part, the final feature vectors will be of different lengths for different sketches, making them hard to compare. As such, to obtain a fixed-length feature vector, we still assign the semantic parts to image regions in one of the levels, but each semantic part is only assigned to one region (the one having the highest assignment likelihood in Eq. 1). After that, the process is the same as the full method.

The second approach (denoted as OUR-PIX) removes all the information about semantic parts but keeps the multi-scale process. To do that, the sketch is rasterized into an image, and all segmentation information is discarded. In the grouping stage, the image patch bounded by each region is regarded as a part, whose Gabor feature is extracted to compose the final feature.

The average retrieval performances of the two approaches on all the sketches are shown in Figure 8. We can see that OUR-PIX performs only slightly better than GF, and OUR-NOG performs much worse than GF and OUR-PIX, while OUR-FULL performs the best. This experiment shows that combining the multi-scale representation with semantic parts improves retrieval performance significantly more than using either idea alone. From our analyses of the results, we have also found that OUR-NOG often performs better on sketches that are segmented into a small number of parts by the user, such as the lower diagram shown in Figure 1(a). The main reason is that with a small number of parts, the segmentation of the 3D models tends to correspond to the segmentation of the input sketches.

Figure 8: Retrieval performance on the 428 segmented sketches collected using our system.

To further evaluate this point, we selected all the sketches whose user-provided segmentation information is largely consistent with that of the 3D models in the dataset. There are 96 such sketches in total. Most of these sketches fall into three categories, "Cup", "Glass", and "Teddy Bear", where the segmentation is less ambiguous. The retrieval results of these query sketches, as shown in Figure 9, indicate that OUR-FULL significantly outperforms the other methods when the segmentation information of the input sketches is consistent with that of the 3D models.

Figure 9: Retrieval performance on the 96 sketches having consistent segmentation information as the 3D models in the dataset.

5.5 Exp. 3: Strokes as Semantic Parts

In this experiment, we investigate how our method performs when segmentation information of the input query sketches is not available. A straightforward approach to cope with this problem is to consider each stroke as a semantic part. This approach is denoted as OUR-STK and is evaluated here over the sketch dataset provided by [Eitz et al. 2012]. The results are shown in Figure 10.

It is interesting to see that OUR-STK performs better than BOW and GF. The reason is that users' strokes tend to approximate the true segmentation to some extent. These results also indicate that our approach can achieve higher performance even on sketches without user segmentation.

Figure 10: Retrieval performance using stroke-based segmentation on sketches provided by [Eitz et al. 2012].

5.6 Exp. 4: Incomplete Matching

As the Pyramid-of-Parts feature is a collection of Gabor features obtained from different image regions, it is possible to compare the Pyramid-of-Parts features of some local regions only. This characteristic of the Pyramid-of-Parts feature suggests an interesting application, incomplete matching, where a partially drawn query sketch can be used for model retrieval. Note that incomplete matching is not exactly the same as partial matching. With partial matching, the input sketch can be matched to any local region of a database model. With incomplete matching, we may make use of the location information of the drawn strokes relative to the canvas (or the user-provided bounding box) so that the matching can be localized.

Here, it may be interesting to compare our method with [Eitz et al. 2012]. Although [Eitz et al. 2012] can also be used for incomplete matching, their method computes global statistics of local features, so the comparison itself is global: it compares the global statistics of an incomplete input sketch with those of the 2D model contour of a database model. In contrast, with our method, we may simply skip the comparison of the Gabor features of those regions with no semantic parts in them.
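The region-skipping idea can be sketched in Python as follows. The sketch relies on the convention of Section 3.2 that an empty region has an all-zero feature. It additionally assumes a Euclidean per-region distance and a renormalization of the weights over the compared regions; these two choices and the function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """ASSUMED distance between two per-region Gabor features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def incomplete_distance(query: Sequence[Sequence[float]],
                        model: Sequence[Sequence[float]],
                        weights: Sequence[float]) -> float:
    """Compare only the regions in which the query sketch contains parts."""
    # A region counts as empty when its query feature is all zeros.
    active = [i for i, f in enumerate(query) if any(v != 0.0 for v in f)]
    total_weight = sum(weights[i] for i in active)
    if total_weight == 0:
        return float("inf")  # nothing drawn, or no weight on the drawn regions
    # ASSUMED renormalization so that distances stay comparable across queries.
    return sum(weights[i] * euclidean(query[i], model[i]) for i in active) / total_weight
```

A model whose features match the drawn regions therefore obtains a small distance regardless of what it contains in the regions the user left empty.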

To evaluate the performance of our method on incomplete matching, we collected 205 partial sketches covering all 19 categories of the PSB model dataset. The methods for comparison include our full method (i.e., OUR-FULL), our method without user segmentation but considering each stroke as a part (i.e., OUR-STK), the Bag-of-Words model [Eitz et al. 2012] (i.e., BOW) and retrieval using global features (i.e., GF). Figure 11 compares the retrieval performances of the above methods. Our method outperforms the existing methods whether the segmentation information is obtained manually or automatically. This is mainly because the competing methods do not support localized matching: they match the incomplete sketch against the complete contours of the models.

Figure 11: Retrieval performance of incomplete matching.

6 Conclusion

In this paper, we have investigated the use of semantic segmentation information to improve the performance of sketch-based 3D shape retrieval. We proposed the Pyramid-of-Parts descriptor to support multi-scale matching of semantic parts. We evaluated the retrieval performance of the proposed method with and without user-provided segmentation information. Our experimental results show that it performs better than the state-of-the-art method of [Eitz et al. 2012] in both situations. We also compared the two methods on incomplete input sketches, where the proposed method performs significantly better than [Eitz et al. 2012].


  • [Belongie et al. 2002] Belongie, S., Malik, J., and Puzicha, J. 2002. Shape matching and object recognition using shape contexts. IEEE TPAMI 24, 4, 509–522.
  • [Burt and Adelson 1983] Burt, P., and Adelson, E. 1983. The Laplacian pyramid as a compact image code. IEEE Trans. on Communications 31, 4.
  • [Burt 1981] Burt, P. 1981. Fast filter transform for image processing. CGIP 16, 1, 20–51.
  • [Chen et al. 2009] Chen, X., Golovinskiy, A., and Funkhouser, T. 2009. A benchmark for 3D mesh segmentation. ACM TOG 28, 3.
  • [Dalal and Triggs 2005] Dalal, N., and Triggs, B. 2005. Histograms of oriented gradients for human detection. In Proc. IEEE CVPR, vol. 1, 886–893.
  • [DeCarlo et al. 2003] DeCarlo, D., Finkelstein, A., Rusinkiewicz, S., and Santella, A. 2003. Suggestive contours for conveying shape. ACM TOG 22, 3, 848–855.
  • [Eitz et al. 2010] Eitz, M., Hildebrand, K., Boubekeur, T., and Alexa, M. 2010. An evaluation of descriptors for large-scale image retrieval from sketched feature lines. Computers & Graphics 34, 5.
  • [Eitz et al. 2011] Eitz, M., Hildebrand, K., Boubekeur, T., and Alexa, M. 2011. Sketch-based image retrieval: Benchmark and bag-of-features descriptors. IEEE TVCG 17, 11, 1624–1636.
  • [Eitz et al. 2012] Eitz, M., Richter, R., Boubekeur, T., Hildebrand, K., and Alexa, M. 2012. Sketch-based shape retrieval. ACM TOG 31, 4.
  • [Felzenszwalb and Huttenlocher 2005] Felzenszwalb, P., and Huttenlocher, D. 2005. Pictorial structures for object recognition. IJCV 61, 1, 55–79.
  • [Felzenszwalb et al. 2010] Felzenszwalb, P., Girshick, R., McAllester, D., and Ramanan, D. 2010. Object detection with discriminatively trained part-based models. IEEE TPAMI 32, 9, 1627–1645.
  • [Ferrari et al. 2008] Ferrari, V., Fevrier, L., Jurie, F., and Schmid, C. 2008. Groups of adjacent contour segments for object detection. IEEE TPAMI 30, 1, 36–51.
  • [Funkhouser et al. 2003] Funkhouser, T., Min, P., Kazhdan, M., Chen, J., Halderman, A., Dobkin, D., and Jacobs, D. 2003. A search engine for 3D models. ACM TOG 22, 1, 83–105.
  • [Hu et al. 2010] Hu, R., Barnard, M., and Collomosse, J. 2010. Gradient field descriptor for sketch based retrieval and localization. In Proc. IEEE ICIP, 1025–1028.
  • [Hu et al. 2012] Hu, R., Fan, L., and Liu, L. 2012. Co-segmentation of 3D shapes via subspace clustering. Computer Graphics Forum 31, 5, 1703–1713.
  • [Huang et al. 2011] Huang, Q., Koltun, V., and Guibas, L. 2011. Joint shape segmentation with linear programming. ACM TOG 30, 6.
  • [Huang et al. 2014] Huang, Z., Fu, H., and Lau, R. 2014. Data-driven segmentation and labeling of freehand sketches. ACM TOG 33, 6.
  • [Kalogerakis et al. 2010] Kalogerakis, E., Hertzmann, A., and Singh, K. 2010. Learning 3D mesh segmentation and labeling. ACM TOG 29, 4.
  • [Lazebnik et al. 2006] Lazebnik, S., Schmid, C., and Ponce, J. 2006. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. IEEE CVPR, vol. 2, 2169–2178.
  • [Li and Guskov 2007] Li, X., and Guskov, I. 2007. 3D object recognition from range images using pyramid matching. In Proc. IEEE ICCV, 1–6.
  • [Li and Johan 2012] Li, B., and Johan, H. 2012. Sketch-based 3D model retrieval by incorporating 2D-3D alignment. Multimedia Tools and Applications, 1–23.
  • [Li and Johan 2013] Li, B., and Johan, H. 2013. Sketch-based 3D model retrieval by incorporating 2D-3D alignment. Multimedia Tools and Applications 65, 3, 363–385.
  • [Li et al. 2013] Li, B., Lu, Y., Godil, A., Schreck, T., Aono, M., Johan, H., Saavedra, J., and Tashiro, S. 2013. SHREC’13 track: Large scale sketch-based 3D shape retrieval. In Proc. EG Workshop on 3D Object Retrieval, 89–96.
  • [Loffler 2000] Loffler, J. 2000. Content-based retrieval of 3D models in distributed web databases by visual shape information. In Proc. IEEE InfoVis, 82–87.
  • [Noris et al. 2012] Noris, G., Sýkora, D., Shamir, A., Coros, S., Whited, B., Simmons, M., Hornung, A., Gross, M., and Sumner, R. 2012. Smart scribbles for sketch segmentation. Computer Graphics Forum 31, 8, 2516–2527.
  • [Oliva and Torralba 2001] Oliva, A., and Torralba, A. 2001. Modeling the shape of the scene: A holistic representation of the spatial envelope. IJCV 42, 3, 145–175.
  • [Redondo-Cabrera et al. 2012] Redondo-Cabrera, C., Lopez-Sastre, R., Acevedo-Rodriguez, J., and Maldonado-Bascon, S. 2012. Surfing the point clouds: Selective 3D spatial pyramids for category-level object recognition. In Proc. IEEE CVPR, 3458–3465.
  • [Saavedra et al. 2012] Saavedra, J., Bustos, B., Schreck, T., Yoon, S., and Scherer, M. 2012. Sketch-based 3D model retrieval using keyshapes for global and local representation. In Proc. EG Workshop on 3D Object Retrieval, 47–50.
  • [Shao et al. 2011] Shao, T., Xu, W., Yin, K., Wang, J., Zhou, K., and Guo, B. 2011. Discriminative sketch-based 3D model retrieval via robust shape matching. Computer Graphics Forum 30, 7, 2011–2020.
  • [Shrivastava et al. 2011] Shrivastava, A., Malisiewicz, T., Gupta, A., and Efros, A. A. 2011. Data-driven visual similarity for cross-domain image matching. ACM TOG 30, 6.
  • [Sun et al. 2012] Sun, Z., Wang, C., Zhang, L., and Zhang, L. 2012. Free hand-drawn sketch segmentation. In Proc. ECCV, 626–639.
  • [Wu et al. 2009] Wu, Z., Ke, Q., Sun, J., and Shum, H. 2009. A multi-sample, multi-tree approach to bag-of-words image representation for image retrieval. In Proc. IEEE ICCV, 1992–1999.
  • [Yang et al. 2009] Yang, J., Yu, K., Gong, Y., and Huang, T. 2009. Linear spatial pyramid matching using sparse coding for image classification. In Proc. IEEE CVPR, 1794–1801.
  • [Yoon et al. 2010] Yoon, S., Scherer, M., Schreck, T., and Kuijper, A. 2010. Sketch-based 3D model retrieval using diffusion tensor fields of suggestive contours. In Proc. ACM Multimedia, 193–200.
  • [Zou et al. 2014] Zou, C., Wang, C., Wen, Y., Zhang, L., and Liu, J. 2014. Viewpoint-aware representation for sketch-based 3D model retrieval. IEEE Signal Processing Letters 21, 8, 966–970.