3D Shape Segmentation via Shape Fully Convolutional Networks

02/28/2017 ∙ by Pengyu Wang, et al. ∙ 0

We propose a novel fully convolutional network architecture for shapes, denoted by Shape Fully Convolutional Networks (SFCN). 3D shapes are represented as graph structures in the SFCN architecture, based on novel graph convolution and pooling operations, which are similar to convolution and pooling operations used on images. Meanwhile, to build our SFCN architecture in the original image segmentation fully convolutional network (FCN) architecture, we also design and implement a generating operation with bridging function. This ensures that the convolution and pooling operation we have designed can be successfully applied in the original FCN architecture. In this paper, we also present a new shape segmentation approach based on SFCN. Furthermore, we allow more general and challenging input, such as mixed datasets of different categories of shapes which can prove the ability of our generalisation. In our approach, SFCNs are trained triangles-to-triangles by using three low-level geometric features as input. Finally, the feature voting-based multi-label graph cuts is adopted to optimise the segmentation results obtained by SFCN prediction. The experiment results show that our method can effectively learn and predict mixed shape datasets of either similar or different characteristics, and achieve excellent segmentation results.



There are no comments yet.


page 1

page 6

page 8

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Shape segmentation aims to divide a 3D shape into meaningful parts, and to reveal its internal structure. This is the basis and prerequisite to explore the inherent characteristics of the shape. The results obtained from shape segmentation can be applied to various fields of computer graphics, such as shape editing Yu et al. (2004), deformation Yang et al. (2013), and modelling Chen et al. (2015). Shape segmentation, therefore, has become a research hotspot, yet difficulties in the fields of digital geometric model processing and instance modelling persist.

Convolutional networks have shown excellent performance in various image processing problems, such as image classification Krizhevsky et al. (2012); Szegedy et al. (2015); Simonyan and Zisserman (2014), and semantic segmentation Ciresan et al. (2012); Farabet et al. (2013); Pinheiro and Collobert (2014). With the emerging encouraging study results, many researchers have devoted their efforts to various deformation studies on CNNs, one of which is the fully convolutional network (FCN) Long et al. (2015). This method can train end-to-end, pixels-to-pixels on semantic segmentation, with no requirement over the size of the input image. Thus, it has become one of the key research topics in CNN networks.

Figure 1: Example of our shape segmentation results on one mixed shape dataset. The shapes on the left are part of the training set, and some segmentation results are shown on the right.

Although FCNs can generate good results in image segmentation, we cannot directly apply it to 3D shape segmentation. This is mainly because image is a type of a static 2D array, which has a very standard data structure and regular neighbourhood relations. Therefore, convolution and pooling can be easily operated when processing FCNs. However, the data structure of a 3D shape is irregular, so it cannot be directly represented as the data structure of an image. As triangle meshes have no regular neighbourhood relations like image pixels, direct convolution and pooling operations on a 3D shape is difficult to fulfil.

Inspired by the FCN architecture in image segmentation, we design and implement a new FCN architecture that operates directly on 3D shapes by coverting a 3D shape into a graph structure. Based on the FCN process of convolution and pooling operation on the image and existing methods of Graph Convolutional Neural Networks

Edwards and Xie (2016); Niepert et al. (2016); Defferrard et al. (2016), we design a shape convolution and pooling operation, which can be applied directly to the 3D shape. Combined with the original FCN architecture, we build a new shape fully convolutional network architecture and name it Shape Fully Convolutional Network (SFCN). Secondly, following the SFCN architecture mentioned above and the basic flow of image segmentation of FCN Long et al. (2015), we devise a novel trained triangles-to-triangles model for shape segmentation. Thirdly, for higher accuracy of segmentation, we use the multiple features of the shape to complete the training on the SFCN. Utilising the complementarity between features, and combined with the multi-label graph cuts method Boykov et al. (2001); Kolmogorov and Zabih (2004); Boykov and Kolmogorov (2004), we optimise the segmentation results obtained by the SFCN prediction, through which the final shape segmentation results are obtained. Our approach can realize the triangles-to-triangles learning and prediction with no requirements on the triangle numbers of the input shape. Furthermore, many experiments show that our segmentation results perform better than those of existing methods Kalogerakis et al. (2010); Guo et al. (2015); Xie et al. (2014), especially when dealing with a large dataset. Finally, our method permits mixed dataset learning and prediction. Datasets of different categories are combined together in the test, and the accuracy of the segmentation results of different shapes decreases very little. As shown in Figure 1, for a mixed shape dataset from COSEG Wang et al. (2012) with several categories of shapes, part of the training set are displayed on the left, and some corresponding segmentation results are shown on the right. Figure 2 shows the process of our method.

Figure 2: The pipeline of our method. It may be divided into 3 stages: training process, using the SFCN architecture to train under three different features; testing process, predicting the test sets through the SFCN architecture; optimisation process, optimising the segmentation results by the voting-based multi-label graph cuts method to obtain the final segmentation results.

The main contributions of this paper include:

  1. We design a fully convolutional network architecture for shapes, named Shape Fully Convolutional Network (SFCN), and is able to achieve effective convolution and pooling operations on the 3D shapes.

  2. We present the shape segmentation and labelling based on SFCN. It can be triangles-to-triangles by three low-level geometric features, and outperforms the state-of-the-art shape segmentation.

  3. Excellent segmentation results on training and predicting mixed datasets of different categories of shapes are achieved.

2 Related Work

Fully convolutional networks. The fully convolutional network Long et al. (2015) proposed in 2015 is pioneering research which can effectively solve the problem of semantic image segmentation by pixel-level classification. Later, a great deal of research has emerged based on the FCN algorithms and achieved good results in various fields such as edge detection Xie and Tu (2015), depth regression Liu et al. (2015), optical flow Dosovitskiy et al. (2015), simplifying sketches Simo-Serra et al. (2016) and so on. However, the existing research on FCN is mainly restricted to image processing, largely because image has a standard data structure, easy for convolution, pooling and other operations.

Graphs CNN. Standard CNN cannot work directly on data which have graph structures. However, there are some previous works Bruna et al. (2014); Duvenaudy et al. (2015); Niepert et al. (2016); Defferrard et al. (2016) that discuss how to train CNN on graphs. These works Bruna et al. (2014); Defferrard et al. (2016) use a spectral method by computing bases of the graph Laplacian matrix, and perform the convolution in the spectral domain. Niepert et al. (2016)

use a PATCHY-SAN architecture which maps from a graph representation onto a vector space representation by extracting locally connected regions from graphs.

Deep learning on 3D shapes. Recent works introduce various methods of deep architectures for 3D shapes. By using volumetric representation Wu et al. (2015); Qi et al. (2016), standard CNN 2d convolution and pooling methods can be easily extended to 3D convolution and pooling. However, it leads to some problems which cannot be completely solved, such as data sparsity and the computation cost of 3D convolution. Kalogerakis et al. (2017) proposed multi-view image-based Fully Convolutional Networks (FCNs) and surface-based CRFs to do the 3D mesh labelling by using RGB-D data as input. However, it may need a lot of time to compute the surface-pixel reference, and more viewpoints to maximally cover the shape surface. These methods Yi et al. (2017); Su et al. (2017) proposed different deep architectures which can be trained directly on point clouds to achieve points segmentation and other 3D points analysis. However, the point cloud structure sampled from the origin meshes may lack the intrinsic structure of the meshes. For better understanding the intrinsic structures, many works directly adapt neural networks to meshes. The first mesh CNN method GCNNMasci et al. (2015) used convolution filters in local geodesic polar coordinates. ACNNBoscaini et al. (2016) used anisotropic heat kernels as filters. MoNetMonti et al. (2017) considered a more generalized CNN architectures by using mixture model. Maron et al. (2017) using a global seamless parameterization for which the convolution operator is well defined but can only be used on sphere-like meshes. As Boscaini et al. (2015); Litany et al. (2017) show that intrinsic CNN methods can achieve better result on shape correspondence and descriptor learning with less training data. More details of these techniques can be found in Bronstein et al. (2017).

Supervised methods for segmentation and labelling.

When using the supervised method to train a collection of labelled 3D shapes, an advanced machine learning approach is used to complete the related training

Kalogerakis et al. (2010); Guo et al. (2015); Xie et al. (2014); Wang et al. (2012, 2013). For example, Kalogerakis et al. (2010) used Conditional Random Fields (CRF) to model and learn the sample example, in order to realize the component segmentation and labelling of 3D mesh shapes. Wang et al. (2013) first projected 3D shapes to 2D space, and the labelling results in 2D space were then projected back to 3D shapes for mesh labelling. Xie et al. (2014) use Extreme Learning Machines (ELM), which can be used to achieve consistent segmentation for unknown shapes. Guo et al. (2015); Kalogerakis et al. (2017) applied deep architectures to produce the mesh segmentation.

Unsupervised segmentation and labelling. A lot of research methods can build a segmentation model Huang et al. (2011); Sidi et al. (2011); Hu et al. (2012); Meng et al. (2013); Xu et al. (2010)

and achieve joint segmentation without any label information. There are predominantly two unsupervised learning methods: matching and clustering. Using the matching method, the matching relation between pair 3D shapes is obtained based on the similarity of a relative unit given by correlation calculation. The segmentation shape of matching shape is then established to realize the joint segmentation

Kreavoy et al. (2007); Huang et al. (2011) of 3D shapes. By contrast, clustering methods analyze all the 3D shapes in the model set and cluster the consistent correlation units of 3D shapes into the same class. A segmentation model is then obtained and applied to consistent segmentation Hu et al. (2012); Meng et al. (2013); Xu et al. (2010).

3 Shape Fully Convolution Network

In the process of image semantic segmentation, it is mainly through operations such as convolution and pooling that fully convolution network architecture completes the image segmentation Long et al. (2015). As analyzed above, the regular data structure amongst the pixels of the image makes it easy to implement these operations. By analogy with images, triangles of the 3D shape can be seen as pixels on the image, but unlike pixels, triangles of the shape have no ordered sequence rules. Figure 3(a) represents the regular pixels on the image, while Figure 3(b) represents the irregular triangles on the shape. It is difficult to complete the convolution and pooling operation on the 3D shape like that of the image. Therefore, based on the characteristics of 3D shape data structure and analogous to the operation on the image, we need to design a new shape convolution and pooling operation.

Figure 3: Representation of different forms of data. (a) Image data representation; (b) Shape data representation; (c) Shape data represented as graph structure.

As a 3D shape is mainly composed of triangles and the connections among them, we can use graph structure to describe it. Each 3D shape can be represented as , with vertex for a triangle and edge for the connection of adjacent triangles. The small triangle shown in 3(b) corresponds to the graph structure shown in Figure 3(c). Based on the graph structure, we design and implement a shape convolution and pooling operation, which will be detailed in the following section.

3.1 Convolution on Shape

Convolution is one of the two key operations in the FCN architecture, allowing for locally receptive features to be highlighted in the input image. When convolution is applied to images, a receptive field (a square grid) moves over each image with a particular step size. The receptive field reads the feature values of the pixels for each channel once, and a patch of values is created for each channel. Since the pixels of an image have an implicit arrangement, a spatial order, the receptive fields always move from left to right and top to bottom. Analogous to the convolution operation on the image, therefore, we need to focus on the following two key ideas when employing convolution operation on the shapes:

  1. Determining the neighbourhood sets around each triangle for the convolution operation according to the size of the receptive field.

  2. Determining the order of execution of the convolution operation on each neighbourhood set.

For the first point, the convolution operation on the image is mainly based on the neighbourhood relationship between pixels. Accordingly, we need to construct locally connected neighbourhoods from the input shape. These neighbourhoods are generated efficiently and serve as the receptive fields of a convolutional architecture, permitting the architecture to learn shape representations effectively.

Figure 4: Convolution process on a shape. (a) Shape represented as a graph; (b) The neighbourhood nodes of different source nodes searched by a breadth-first search, among which 4, 6, 7 represent source nodes. The areas circled by orange, red and green dotted lines are the neighbourhood nodes searched by each source node. (c) Convolution order of each neighbourhood set. The ellipsis is for other neighbourhood sets not represented.

Shape has a neighbourhood relationship just like an image, but its irregularity restrains it to be directly represented and applied to the FCN learning. Yet when expressed as graph structures, the locally connected neighbourhoods of each triangle of the shape can be easily determined with various search strategies. In this paper, each triangle of the shape is viewed as a source node. We use the breadth-first search to expand its neighbourhood nodes on the constructed graph so as to obtain the neighbourhood sets of each triangle in the shape. Suppose the receptive field is set as , the size of the neighbourhood set will be the same, including neighbourhood nodes and a source node, all of which will be used for a follow-up convolution operation. Figure 4(a) shows the graph structure of the shape, while Figure 4(b) shows the neighbourhood sets of each source node (that is, each triangle on the 3D shape) we obtained with the method outlined above.

As for the second point, when performing the convolution operation on the image, it is easy to determine the order of the convolution operation according to the spatial arrangement of the pixels. However, it is rather difficult to determine the spatial orders among triangles on the 3D shape. A new strategy is thus needed to reasonably sort the elements in the neighbourhood sets. Sorting is to ensure that the elements in each neighbourhood set can be convolved by the same rules, so that the convolution operation can better activate features. For each node, all nodes in its neighbourhood set can be sorted by feature similarity (L2 similarity in the input feature space). Using this method, we can not only determine the order of the convolution operation of each neighbourhood set, but also ensure that the nodes in different sets have the same contribution regularity to their own source nodes in the convolution operation. The final convolution order for each neighbourhood set is shown in Figure 4(c). As shown in Figure 4(b), the execution order of the convolution operation of the neighbourhood set obtained from the source node is determined by calculating the input feature similarity.

3.2 Pooling on Shape

Pooling is the other key operation in the FCN architecture. The pooling operation is utilised to compress the resolution of each feature map (the result of the convolution operation) in the spatial dimensions, leaving the number of feature maps unchanged. Applying a pooling operation across a feature map enables the algorithm to handle a growing number of feature maps and generalises the feature maps by resolution reduction. Common pooling operations are those of taking the average and maximum of receptive fields over the input map Edwards and Xie (2016). We share the same pooling operation on shape fully convolutional networks with the operation mentioned above. However, we need to address a key concern; that is, to determine the pooling operating order of the SFCN on the shape feature map.

Similar to the convolution operation, we cannot directly determine the pooling operation order on SFCN based on spatial relationships among the triangles of the 3D shape. Since the 3D shape has been expressed as a graph structure, we can determine the pooling operation order according to the operation of convolutional neural networks on the graph. In this paper, the pooling operation on the SFCN is computed by adopting the fast pooling of graphs Niepert et al. (2016); Defferrard et al. (2016).

The pooling method for graphs Niepert et al. (2016); Defferrard et al. (2016) coarsens the graph with the Graclus multi-level clustering algorithm Dhillon et al. (2007). Graph coarsening aims to determine the new structure of the graph after pooling. We first present each shape as a graph structure, then we exploit the feature information on the triangles of the shape and the Graclus multi-level clustering algorithm Dhillon et al. (2007) to complete shape coarsening; that is, to determine the new connection relationship of the shape feature map after pooling, which is shown in Figure 5(a).

In the pooling process, traversing the feature map in a certain order according to the size of the receptive field is a key step to complete the calculation, namely, to determine the operation order of pooling. Following the method of pooling for the graph proposed by Defferrard et al. (2016), the vertices of the input graph and its coarsened versions are irregularly arranged after graph coarsening. To define the pooling order, therefore, a balanced binary tree is built by adding fake code to sort the vertices. Lastly, the pooling operation of the graph is completed based on the nodes order and the use of a 1D signal pooling method. After shape coarsening, we apply the same approach in this study to determine the order of pooling operations on the shape fully convolutional networks architecture, as shown in Figure 5(b).

4 Shape Segmentation via Shape Fully Convolution Network

Figure 5: Example of graph coarsening and pooling. (a) Graph coarsening process. Note: the original graph has 9 arbitrarily ordered vertices. For a pooling of size 4, two coarsenings of size 2 are needed. To ensure that in the coarsening process the balanced binary tree can be constructed, we need to add appropriate fake nodes through calculation, which are identified in red. After coarsening, the node order on is still arbitrary, yet it can be manually set. Then backstepping the coarsening process, we can determine the node order of and , and the corresponding relationship between nodes in each layer according to the order of . At that point the arrangement of vertices in permits a regular 1D pooling. (b) Pooling order. The nodes in the first layer (blue and red) represent the node order; the second layer (yellow and red) represents node order; the third layer (purple) represents the node order and the corresponding relationship between nodes in each layer. The red nodes are fake nodes, which are set to 0 in the pooling process, as we carry out maximum pooling.

We design a novel shape segmentation method based on SFCN, analogous to the basic process of image semantic segmentation on FCN Long et al. (2015). Firstly, we extract three categories of commonly used geometric features of the shape as an input for SFCN training and learning. Secondly, based on the shape convolution and pooling operation proposed in Section 3 and the basic process of image semantic segmentation on FCN, we design a lattice structure suitable for 3D shape segmentation. Through training and learning of the network, we produce triangles-to-triangles labelling prediction. Finally, we introduce the optimisation process of shape segmentation.

4.1 Geometric Feature Extraction

Our approach is designed to complete the network training and learning based on some common and effective low-level features. In this paper, therefore, we extract three from the existing commonly used ones as the main features for the network training and learning. The three features include: average geodesic distance (AGD) Hilaga et al. (2001), shape context (SC) Belongie et al. (2002) and spin image (SI) Johnson and Hebert (1999). These features describe the characteristics of each triangle on a shape from multiple perspectives well. We also found in the experiment that these three features are complementary, which will be analyzed in detail in the experimental part.

4.2 Shape Segmentation Networks Structure

As the convolution and pooling operations on a shape are different from those on an image, the FCN architecture originally used in image segmentation cannot be directly applied on the 3D shape. We modify the original FCN architecture according to the convolution and pooling characteristics, so that it can be conducted in shape segmentation.

Our training network is made up of four parts: convolution, pooling, generating and deconvolution layers, as shown in Figure 7. The convolution layer corresponds to a feature extractor that transforms the input shape to multidimensional feature representation. The convolution operation is completed by the method proposed in section 3.1. The pooling layer is used to reduce the feature vector of the convolution layer, and expand its receptive field to integrate feature points in the small neighbourhood into the new ones as output. The pooling operation is completed by the method proposed in section 3.2.

As our convolution and pooling operations are designed for shapes, the original FCN architecture cannot be used directly. Compared with the original FCN’s architecture, the architecture of the SFCN in this paper needs to record every neighbourhood set of each shape that participated in the convolution obtained in Section 3, as well as the convolution and pooling order of each shape. Thus, we add a generating layer in the original FCN architecture, whose diagram of concrete meaning is shown in Figure 6.

Figure 6: Diagram of the generating layer. (a) The pooling order. A red number represents the offset, nodes in the blue circles represent nodes on the graph, nodes in the red circles are fake nodes. (b) Recorded neighbourhood set of each node. The numbers in the table represent the offset of each node after pooling sorting. (c) Data storage of the generating layer proposed in this paper, based on which our method can implement convolution operation by row and pooling operation by column.
Figure 7: The SFCN architecture we designed, mainly including 4 parts: convolution, pooling, generating, and deconvolution layers, which are represented by “gen”, “con”, “pool” and “deconv” respectively. Additionally, “fully” stands for fully connectivity and the numerical values reflect the dimension changes of each layer after calculation. C represents feature dimension, h represents the number of triangles of each shape and n represents the shape segmentation labels.

Firstly, as shown in Figure 6(a), we can calculate the pooling order between nodes of the graph by the shape pooling method proposed in Section 3.2 (these nodes are equivalent to the triangle of the shape). We store the nodes in the calculation order on the generating layer, as shown in Figure 6(c). Figure 6(a) gives the pooling order on a shape, where the figures represent the number of nodes (i.e. triangles).

Secondly, we need to record the neighbourhood sets of each node (i.e. each triangle of the shape) involved in the convolution computation, as shown in Figure 6(b), where we store the neighbourhood sets by reading the offset in Figure 6(a). After the nodes are sorted by column on the generating layer, we record their neighbourhood sets, in which the nodes are sorted according to the convolution order calculated in Section 3.1. As shown in Figure 6(c), each row that is sequenced in the convolution order represents a neighbourhood set of a node, where the figures represent the number of nodes (i.e. triangles). By storing the data in this way, we can achieve the convolution operation by row and the pooling operation by column, as shown in Figure 6(c). Another advantage of such storage is that, after pooling, the new neighbourhood set and the pooling order required by the next convolution layer of each new node can still be obtained and be applied to the next generating layer using the method in Section 3.2.

The deconvolution layer can be used to perform upsampling and densify our graph representation. As mention above, we regard pooling as a graph coarsening by clustering 4 vertices, as shown in Figure7. Conversely, the upsampling operation can be regarded as a reverse process of the graph coarsening. We record how vertices are being matched before and after graph coarsening. Therefore it is easy to reuse the efficient deconvolution implementation based on the image-based FCNLong et al. (2015)

. In this paper, the width of the convolution kernel in the deconvolution layer of the original FCN architecture is changed to 1 and the height is set to the size of the pooling we use, thereby obtaining the deconvolution layer of SFCN. The final output of the layer is a probability map, indicating the probability of each triangle on a shape that belongs to one of the predefined classes.

Based on the FCN architecture proposed by Long et al. (2015), we design an SFCN architecture suitable for shape segmentation. Our convolution has five convolutional layers altogether, with each convolution layer having a generating layer before generating data for the next convolution and pooling, and followed by a pooling layer. Two fully connected layers are augmented at the end to impose class-specific projection. Corresponding to the pooling layer, there are five deconvolution layers, through which we can obtain the prediction probability of each triangle of the shape that belongs to each class. In the prediction process, we used the same skip architecture Long et al. (2015). It can combine segmentation information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations as the original FCN architecture. The specific process is shown in Figure 8

. The prediction probability of each layer can be obtained by adding the results of a deconvolution layer and the corresponding results of the pooling layer after convolution, which also functions as the input for the next deconvolution layer. The number of rows will be repeated 5 times to return to the initial number of triangles, where the value of each channel is the probability of this class, realizing the triangle level prediction. Another difference from the original FCN architecture is that, in order to normalise the data, we add a batch normalisation layer after the convolution operation of the first layer of the original FCN architecture, using the default implementation of the BN in the Caffe

Jia et al. (2014).

Figure 8: The skip architecture for prediction. Our DAG nets learn to combine the coarse, high layer information with the fine, low layer one. We get the prediction results of the final layer by upsampling the score, then upsample again the prediction results combined with the final layer and the pool4 layer. After upsampling 5 times, we obtain the final prediction results, combining the final layer, pool5, pool4, pool3, pool2 and pool1 layer information, which then achieves triangles-to-triangles prediction.

4.3 Shape Segmentation Optimisation

We train these three features separately using the network structure provided in Section 4.2. Given testing shapes, the SFCN produces the corresponding segmentation results under each feature, which can describe the segmentation of 3D shapes from different perspectives. Besides, due to the different input geometry features, there may be some differences among the predicted segmentation results of the same shape. To obtain the final segmentation results, we leverage the complementarity among features and the multi-label graph cuts method Boykov et al. (2001); Kolmogorov and Zabih (2004); Boykov and Kolmogorov (2004) to optimise the segmentation results of the testing shape. The final segmentation result is obtained mainly through the optimisation of the following formula.


In this formula, and are labels of triangles and , data item describes the energy consumption of triangle marked as label , and smoothing item describes the energy consumption of neighboring triangles marked using different labels.

The first item of the formula is optimised mainly based on the probability that triangle is marked as label . We predict the shape under the three features respectively, so the same triangle will have its own prediction probability under each feature. In this paper, utilising the feature’s complementarity, we vote the labelling results to get the final prediction probability, and serve its negative logarithm similar to the paper Guo et al. (2015) as the first item of the multi-label graph cut. The second item in the formula smooths the segmentation results mainly through the calculation of the dihedral angle of the triangle and its neighbouring one. In this paper, the dihedral angle multiplied by the side length makes the second item of the formula to complete optimisation. Energy E is minimized by employing Multi-label Graph Cuts OptimisationBoykov et al. (2001); Kolmogorov and Zabih (2004); Boykov and Kolmogorov (2004), through which we can obtain more accurate shape segmentation results.

5 Experimental Results and Discussion

Data. In this paper, deep learning is used to segment the shape. Therefore to verify the effectiveness of this method, we first tested 3 existing large datasets from COSEG Wang et al. (2012), including the chair dataset of 400 shapes, the vase dataset of 300 shapes and the alien dataset of 200 shapes. The experiment of the mixed dataset was also carried out on the Princeton Segmentation Benchmark (PSB) Chen et al. (2009) dataset. In addition, to confirm the robustness of the above method, we also selected 11 categories in the PSBChen et al. (2009) dataset for testing, each of them containing 20 shapes. For COSEG datasets, we chose the groundtruth as in the paperWang et al. (2012). For PSB datasets, we chose the groundtruth as in the dissertationKalogerakis et al. (2010).

SFCN Parameters.

We trained by SGD with momentum, using momentum 0.9 and weight decay of 1e-3. We chose ReLU as the activation function. Dropout used in the original classifier nets is included. The per-triangle, unnormalised softmax loss is a natural choice for segmenting shapes of any size, with which we trained our nets.

Computation Time. We used two Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz with 12 cores and NVIDIA GeForce TITAN X GPU. In large datasets, when we used 50 shapes for training and the triangles of each shape range from 1000 to 5000, the training would take approximately 120 minutes (including the time to compute the input descriptors), and the test and optimisation of a shape took about 30 seconds.

Results. In the experiment, we used the classification accuracy proposed by Kalogerakis Kalogerakis et al. (2010) and Sidi Sidi et al. (2011) for the quantitative evaluation of our method.

Firstly, to compare with the methods of Guo et al. (2015) in the COSEG’s large dataset, we randomly selected 50 shapes for training from the chair dataset and 20 shapes for training from the vase dateset. We compared our method with three shape segmentation methods Sidi et al. (2011); Kim et al. (2013); Guo et al. (2015). The results are presented in Table 1. It should be noted that the results and data obtained by other methods come from Guo et al. (2015). The table shows that the results obtained by our method outperform the existing methods, thus proving that our method is effective.

TOG [2011] TOG [2013] TOG [2015] Ours
Chair 80.20% 91.20% 92.52% 93.11%
Vase 69.90% 85.60% 88.54% 88.91%

ToG[2011] is short for Sidi et al. [TOG 2011], ToG[2013] is short for Kim

et al.[TOG 2013],ToG[2015] is short for Guo et al.[2015].

Table 1: Labelling Accuracy of Large Datasets.
Figure 9: Results under Large Datasets

Secondly, to verify the effectiveness of our method for large datasets, we randomly selected 25%, 50% and 75% of the shapes for training from each of the three large datasets from COSEG. To verify whether the SFCN prediction accuracy becomes higher as the number of training sets increases, we used the same 25% shapes for testing in each experiment for each large dateset. In other words, the training sets gradually increased, while the test set remained the same. The results of this can be seen in Table 2. Because the work of Xie et al. (2014) carried out a similar experiment under a 75% training condition with their method, we also make a comparison with their results in Table 2. It can be seen from the table that the results obtained by our method perform better than theirs. Furthermore, with the increase of the training set, the classification accuracy of our method grows steadily in the same test set. This shows that with the increase of the training set, both the learning ability and the generalisation ability of the network architecture become stronger, which also proves the effectiveness of the designed network in this paper.

Ours 25% Ours 50% Ours 75% CGF[2014] 75%
Chair 93.43% 94.38% 95.91% 87.09%
Vase 88.04% 90.95% 91.17% 85.86%
Tele-alien 91.02% 92.76% 93.00% 83.23%

CGF[2014] is short for Xie et al.[CGF 2014]

Table 2: Labelling Accuracy of Large Datasets. Here are the results of the same test set for different numbers of training sets.

Thirdly, we visualize the segmentation results obtained by using our method in the three large datasets of COSEG, as shown in Figure 9. All the results shown are optimised and obtained using the 75% training set. The segmentation results appear visually accurate, which proves the effectiveness of this method.

Mixed dataset training and testing were performed as well. We respectively mixed airplane and bird, human and teddy, which are of similar classes. It must be noted that, unlike the method of Guo et al. (2015) which merges similar class labels, we retained these different labels. So the body of the plane and bird have different labels; their wings as well. 12 shapes of each dataset were selected as the training set, and the remaining as the test set. Our approach was repeated three times to compute the average performance, the segmentation results of which are shown in Figure 10. Figure 10(a) is part of the result of the training set while Figure 10(b) is part of the testing set. The classification accuracy of the two datasets is shown in Table 3, which suggests that we can obtain good segmentation results when mixing similar data together. Although the segmentation accuracy is not as high as training, the average is above 90%. Additionally, the basic segmentation is correct according to the visualisation of the results, proving that SPFCN network architecture is powerful in distinguishing features and learning.

Airplane & Bird Human & Teddy
Accuracy 90.04% 92.28%
Table 3: Labelling Accuracy of Mixed Datasets of Similar Shapes

In addition, we mixed glasses and pliers, glasses, pliers, fish and tables of different categories. From the mixed datasets, we selected some data for training, with the remaining used for testing. Here 12 shapes of each dataset were selected as the training set, and the remaining as the test set. We also mixed two large datasets with 400 chairs and 300 vases. We selected 50% of the shapes for training from each dataset and the remaining 50% as the test set. Our approach was repeated three times to compute the average performance, the segmentation results of which are shown in Figure 11 and Figure 1. Figure 11(a) is part of the result of the training set while Figure 11(b) is part of the testing set. The classification accuracy of the three datasets is shown in Table 4, which also indicates that both the segmentation results and classification accuracy achieve impressive performance. In other words, this method can be used to train and predict any shape set, and proves once again that our SFCN architecture has good feature distinguishing ability and learning ability.

Glasses & Plier Chair(400) & Vase(300) Glasses & Plier & Fish & Table
Accuracy 96.53% 87.71% 91.82%
Table 4: Labelling Accuracy of Mixed Datasets of Different Shapes

Lastly, as in the papers of Guo et al. (2015), we separately trained several small datasets of PSB when and (i.e., SB6, SB12), in which is the number of randomly selected shapes in each training process. For each category of dataset, as in the papers of Guo et al. (2015), we repeated our approach five times to compute the average performance. The comparison results with the related methods are presented in Table 5. It should be noted that the results and data obtained by other methods come from Guo et al. (2015). As shown in Table 5, the results of several datasets of PSB obtained by our method perform much better in most categories than the existing methods Kalogerakis et al. (2010); Guo et al. (2015); Xie et al. (2014); Wang et al. (2013); Chang and Lin (2007); Torralba et al. (2007), which proves the effectiveness of the method. On a few individual datasets, such as airplane, chair and table etc., our results do not go beyond those of other methods, yet are very close to the best ones, which also serves to prove that our method is effective. Moreover, the segmentation effect gradually strengthens as the training data increases, which shows that the learning ability of the SFCN architecture is enhanced with the increase of training samples. We also visualize the segmentation results of several datasets of PSB including teddy, pliers and fish etc. on the condition of SB12, the optimised ones of which are shown in Figure 12. Just like the large dataset, the results of the small one are visually accurate, which indicates that our method is feasible.

Figure 10: The Segmentation Results of Mixed Datasets of Similar Shapes. (a) Part of the shapes in the training set; (b) Segmentation results of part of the shapes in the testing set.
Figure 11: The Segmentation Results of Mixed Datasets of Different Shapes. (a) Part of the shapes in the training set; (b) Segmentation results of part of the shapes in the testing set.
SVM SB6 JointBoost SB6 ToG[2010]1 SB6 ToG[2013]2 SB6 ToG[2015]3 SB6 Ours SB6 ToG[2010]1 SB12 ToG[2013]2 SB12 ToG[2015]3 SB12 Ours SB12
Cup 94.11% 93.12% 99.1% 97.5% 99.49% 99.59% 99.60% 99.60% 99.73% 99.74%
Glasses 95.92% 93.59% 96.10% - 96.78% 97.15% 97.20% - 97.60% 97.79%
Airplane 80.43% 91.16% 95.50% - 95.56% 93.90% 96.10% - 96.67% 95.30%
Chair 81.38% 95.67% 97.80% 97.90% 97.93% 97.05% 98.40% 99.60% 98.67% 98.26%
Octopus 97.04% 96.26% 98.60% - 98.61% 98.67% 98.40% - 98.79% 98.80%
Table 90.16% 97.37% 99.10% 99.60% 99.11% 99.25% 99.30% 99.60% 99.55% 99.41%
Teddy 91.01% 85.68% 93.30% - 98.00% 98.04% 98.10% - 98.24% 98.27%
Plier 92.04% 81.55% 94.30% - 95.01% 95.71% 96.20% - 96.22% 96.55%
Fish 87.05% 90.76% 95.60% - 96.22% 95.63% 95.60% - 95.64% 95.76%
Bird 81.49% 81.80% 84.20% - 87.51% 89.03% 87.90% - 88.35% 89.48%
Mech 81.87% 75.73% 88.90% 90.20% 90.56% 91.72% 90.50% 91.30% 95.60% 96.05%
Average 88.41% 89.34% 94.77% 96.30% 95.89% 95.98% 96.12% 97.53% 96.82% 96.85%

ToG[2010] is short for Kalogerakis et al. [2010], ToG[2013] is short for Wang et al. [2013],ToG[2015] is short for Guo et al.[2015].

Table 5: Labelling Accuracy on Princeton Benchmark(SB6/SB12)
Figure 12: More results of our method.

Feature sensibility. In this paper, we combine three features, average geodesic distance (AGD) Hilaga et al. (2001), shape context (SC) Belongie et al. (2002) and spin image (SI) Johnson and Hebert (1999) to produce the shape segmentation. To verify the validity of this combination, we carried out a comparative experiment. The classification accuracy of each dataset of PSB under a single feature, and the one obtained with our method are compared in the condition of SB6, as shown in Figure 13

. It can be seen that the classification accuracy of individual datasets under individual features is higher, but not significantly more so than that of our method. On the contrary, most datasets perform much better under our method. This shows that the features selected in this paper are complementary and the feature combination is effective.

Figure 13: The Comparison of Segmentation Accuracy under Different Features

In the papers of Kalogerakis et al. (2010) and Guo et al. (2015), seven features are combined to perform segmentation experiments. In addition to the three features used in this paper, they also used four other features including curvature (CUR) Kavukcuoglu et al. (2009), PCA feature (PCA) Shapira et al. (2010), shape diameter function (SDF) Liu et al. (2009) and distance from medial surface (DIS) Long et al. (2015). In this paper, several datasets of PSB are randomly selected to perform experiments with the combination of seven features. Under the same experimental conditions, the comparison results with the combination of three features are presented in Figure 14. The experiment tells us that for most datasets, the combination of three features brings higher classification accuracy than seven features, and for the remaining few datasets, the classification accuracy of the two is very close to each other. This indicates that the three features used in this paper can not only better complement, but also are more suitable for the network structure. In addition, we also trained the combining three features in one SFCN network. However, it didn’t produce a higher accuracy result and nearly doubled the training time. So, as a trade-off between performance and efficiency, we empirically trained 3 features separately to perform a segmentation task.

Figure 14: The Comparison between Segmentation Accuracy under Three Features and under Seven Features

Skip architecture. During the SFCN training process, we utilised skip architecture of five layers and integrated the cross layer information with different steps to improve the SFCN segmentation results, and gradually improve the segmentation details. To verify the skip architecture can do mesh labelling learning and improve the segmentation results, we visualised the cross-layer prediction results with different steps. The segmentation results of several shapes crossing one to five layers are shown in Figure 15, where (f) is the groundtruth of the corresponding shape. Through tracking of the network computing process, we find that the network convergence is faster with the increase of the cross layers. In addition, it can be seen from the comparison results of cross layers and groundtruth in Figure 15, that with the increase of cross layers, the classification quality is gradually improved.

Comparison before and after optimisation. In this paper, we use the multi-label graph cut optimisation method Boykov et al. (2001); Kolmogorov and Zabih (2004); Boykov and Kolmogorov (2004) to optimise the segmentation results of testing shapes, based on the complementarity between features. The comparison results before and after optimisation of several shapes are shown in Figure 16. As shown in Figure 16(a), the results before optimisation are the final ones obtained by voting on the three different features tested by SFCN architecture. Figure 16(b) represents results after optimisation. As the results before optimisation are predicted in the triangle level, the boundary may be not smooth or there may be prediction errors of individual triangles in some areas. The above problems are well addressed after the optimisation with the multi-label graph cuts optimisation method Boykov et al. (2001); Kolmogorov and Zabih (2004); Boykov and Kolmogorov (2004), which proves that optimisation plays a significant role. In addition, the number below each figure is the classification accuracy of the corresponding shape, which shows that optimisation can improve classification accuracy.

Figure 15: The Segmentation Results of Layer Information with Different Steps. (a)- (e) Segmentation results Crossing 1 to 5 layers; (f) Groundtruth.
Figure 16: Comparison Results before and after optimisation. (a) Segmentation results before optimisation; (b) Segmentation results after optimisation. The number below each shape is the segmentation accuracy.

Limitations. Although the proposed method is effective at completing the 3D shape segmentation task, it has some limitations. Firstly, the shapes involved in the calculation must be manifold meshes, for they can easily determine the connection between triangles. Secondly, the designed SFCN architecture has no feature selection function, thus we carry on the separating training for the three features. Thirdly, to obtain better parameter sharing, the SFCN structure may need all training meshes to be the same triangulation granularity. Therefore, in future work, we will seek to improve the SFCN architecture, making it possible for automatic feature selection, and build an end-to-end structure.

6 Conclusions

We design a novel shape fully convolutional network architecture, which can automatically carry out triangles-to-triangles learning and prediction, and complete the segmentation task with high quality. In the SFCN architecture proposed here, similar to convolution and pooling operation on images, we design a novel shape convolution and pooling operation with a 3D shape represented as a graph structure. Moreover, based on the original image segmentation FCN architecture Long et al. (2015), we first design and implement a new generating operation, which functions as a bridge to facilitate the execution of shape convolution and pooling operations directly on 3D shapes. Additionally, accurate and detailed segmentation of 3D shapes is completed through skip architecture. To produce more accurate segmentation results, we optimise the segmentation results obtained by SFCN prediction by utilising the complementarity between the three geometric features and the multi-label graph cut method Boykov et al. (2001); Kolmogorov and Zabih (2004); Boykov and Kolmogorov (2004), which can also improve the local smoothness of triangle labels. The experiments show that the proposed method can not only obtain good segmentation results both in large datasets and small ones with the use of a small number of features, but also outperform some existing state-of-the-art shape segmentation methods. More importantly, our method can effectively learn and predict mixed shape datasets either of similar or of different characters, and achieve excellent segmentation results, which demonstrates that our method has strong generalisation ability.

In the future, we want to strengthen our method to overcome some limitations mentioned above and produce more results based on various challenging datasets, such as ShapeNet. Additionally, we hope that the SFCN architecture proposed can be applied to other shape fields, such as shape synthesis, line drawing extraction and so on.


We would like to thank all anonymous reviewers for their constructive comments. This research has been supported by the National Science Foundation of China (61321491, 61100110, 61272219) and the Science and Technology Program of Jiangsu Province (BY2012190, BY2013072-04).


  • Yu et al. (2004) Yu, Y, Zhou, K, Xu, D, Shi, X, Bao, H, Guo, B, et al. Mesh editing with poisson-based gradient field manipulation. ACM Transactions on Graphics 2004;23(3):644–651.
  • Yang et al. (2013) Yang, Y, Xu, W, Guo, X, Zhou, K, Guo, B. Boundary-aware multi-domain subspace deformation. IEEE Transactions on Visualization & Computer Graphics 2013;19(19):1633–45.
  • Chen et al. (2015) Chen, X, Zhou, B, Lu, F, Tan, P, Bi, L, Tan, P. Garment modeling with a depth camera. ACM Transactions on Graphics 2015;34(6):1–12.
  • Krizhevsky et al. (2012) Krizhevsky, A, Sutskever, I, Hinton, GE. Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 2012;25(2):2012.
  • Szegedy et al. (2015) Szegedy, C, Liu, W, Jia, Y, Sermanet, P, Reed, S, Anguelov, D, et al. Going deeper with convolutions.

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015;:1–9.

  • Simonyan and Zisserman (2014) Simonyan, K, Zisserman, A. Very deep convolutional networks for large-scale image recognition. Computer Science 2014;.
  • Ciresan et al. (2012) Ciresan, D, Giusti, A, Gambardella, LM, Schmidhuber, J.

    Deep neural networks segment neuronal membranes in electron microscopy images.

    In: Advances in neural information processing systems. 2012, p. 2843–2851.
  • Farabet et al. (2013) Farabet, C, Couprie, C, Najman, L, Lecun, Y. Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis & Machine Intelligence 2013;35(8):1915–29.
  • Pinheiro and Collobert (2014) Pinheiro, P, Collobert, R. Recurrent convolutional neural networks for scene parsing. ICML 2014;.
  • Long et al. (2015) Long, J, Shelhamer, E, Darrell, T. Fully convolutional networks for semantic segmentation. CVPR 2015;.
  • Edwards and Xie (2016) Edwards, M, Xie, X. Graph based convolutional neural network. BMVC 2016;.
  • Niepert et al. (2016) Niepert, M, Ahmed, M, Kutzkov, K. Learning convolutional neural networks for graphs. ICML 2016;.
  • Defferrard et al. (2016) Defferrard, M, Bresson, X, Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. NIPS 2016;.
  • Boykov et al. (2001) Boykov, Y, Veksler, O, Zabih, R. Efficient approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis & Machine Intelligence 2001;20(12):1222–1239.
  • Kolmogorov and Zabih (2004) Kolmogorov, V, Zabih, R. What energy functions can be minimizedvia graph cuts? Pattern Analysis & Machine Intelligence IEEE Transactions on 2004;26(2):147–59.
  • Boykov and Kolmogorov (2004) Boykov, Y, Kolmogorov, V. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis & Machine Intelligence 2004;26(9):1124–37.
  • Kalogerakis et al. (2010) Kalogerakis, E, Hertzmann, A, Singh, K. Learning 3d mesh segmentation and labeling. Acm Transactions on Graphics 2010;29(4):157–166.
  • Guo et al. (2015) Guo, K, Zou, D, Chen, X. 3d mesh labeling via deep convolutional neural networks. ACM Transactions on Graphics 2015;35(1):3.
  • Xie et al. (2014) Xie, Z, Xu, K, Liu, L, Xiong, Y. 3d shape segmentation and labeling via extreme learning machine. Computer Graphics Forum 2014;33(5):85–95.
  • Wang et al. (2012) Wang, Y, Asafiy, S, van Kaickz, O, Zhang, H, Cohen-Or, D, Chen, B. Active co-analysis of a set of shapes. Acm Transactions on Graphics 2012;31(6):157:1–157:10.
  • Xie and Tu (2015) Xie, S, Tu, Z. Holistically-nested edge detection. IEEE International Conference on Computer Vision 2015;:1395–1403.
  • Liu et al. (2015) Liu, F, Shen, C, Lin, G, Reid, I. Learning depth from single monocular images using deep convolutional neural fields. IEEE Transactions on Pattern Analysis & Machine Intelligence 2015;38(10):1–1.
  • Dosovitskiy et al. (2015) Dosovitskiy, A, Fischer, P, Ilg, E, H’́ausser, P, Hazırbaş, C, Golkov, V, et al. Flownet: Learning optical flow with convolutional networks. IEEE International Conference on Computer Vision & (ICCV) 2015;:2758–2766.
  • Simo-Serra et al. (2016) Simo-Serra, E, Iizuka, S, Sasaki, K, Ishikawa, H. Learning to simplify: Fully convolutional networks for rough sketch cleanup. Acm Transactions on Graphics 2016;35(4):1–11.
  • Bruna et al. (2014) Bruna, J, Zaremba, W, Szlam, A, LeCun, Y. Spectral networks and locally connected networks on graphs. ICLA 2014;.
  • Duvenaudy et al. (2015) Duvenaudy, D, Maclauriny, D, Aguilera-Iparraguirre, J, Gomez-Bombarelli, R, Hirzel, T, Aspuru-Guzik, A, et al. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems 2015;:2224–2232.
  • Wu et al. (2015) Wu, Z, Song, S, Khosla, A, Yu, F, Zhang, L, Tang, X, et al. A deep representation for volumetric shapes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015;.
  • Qi et al. (2016) Qi, CR, Su, H, Nießner, M, Dai, A, Yan, M, Guibas, L. Volumetric and multi-view cnns for object classification on 3d data. Computer Vision and Pattern Recognition& (CVPR) 2016;.
  • Kalogerakis et al. (2017) Kalogerakis, E, Averkiou, M, Maji, S, Chaudhuri, S. 3d shape segmentation with projective convolutional networks. Proceedings of the IEEE Computer Vision and Pattern Recognition 2017;.
  • Yi et al. (2017) Yi, L, Su, H, Guo, X, Guibas, L. Syncspeccnn: Synchronized spectral cnn for 3d shape segmentation. Proceedings of the IEEE Computer Vision and Pattern Recognition 2017;.
  • Su et al. (2017) Su, H, Qi, C, Mo, K, Guibas, L. Pointnet: Deep learning on point sets for 3d classification and segmentation. Proceedings of the IEEE Computer Vision and Pattern Recognition 2017;.
  • Masci et al. (2015) Masci, J, Boscaini, D, Bronstein, MM, Vandergheynst, P. Geodesic convolutional neural networks on riemannian manifolds. 3DRR 2015;.
  • Boscaini et al. (2016) Boscaini, D, Masci, J, Rodolà, E, Bronstein, MM. Learning shape correspondence with anisotropic convolutional neural networks. NIPS 2016;.
  • Monti et al. (2017) Monti, F, Boscaini, D, Masci, J, Rodolà, E, Svoboda, J, Bronstein, MM. Geometric deep learning on graphs and manifolds using mixture model cnns. CVPR 2017;.
  • Maron et al. (2017) Maron, H, Galun, M, Aigerman, N, Trope, M, Dym, N, Yumer, E, et al. Convolutional neural networks on surfaces via seamless toric covers. SIGGRAPH 2017;.
  • Boscaini et al. (2015) Boscaini, D, Masci, J, Bronstein, MM, Castellani, U. Learning class-specific descriptors for deformable shapes using localized spectral convolutional networks. Eurographics Symposium on Geometry Processing 2015;34(5):13–23.
  • Litany et al. (2017) Litany, O, Remez, T, Rodolà, E, Bronstein, AM, Bronstein, MM. Deep functional maps: Structured prediction for dense shape correspondence. ICCV 2017;.
  • Bronstein et al. (2017) Bronstein, MM, Bruna, J, LeCun, Y, Szlam, A, Vandergheynst, P. Geometric deep learning: going beyond euclidean data. IEEE Sig Proc Magazine 2017;.
  • Wang et al. (2013) Wang, Y, Gongy, M, Wang, T, Cohen-Or, D, Zhang, H, Chen, B. Projective analysis for 3d shape segmentation. Acm Transactions on Graphics 2013;32(6):1–12.
  • Huang et al. (2011) Huang, Q, Koltun, V, Guibas, L.

    Joint shape segmentation with linear programming.

    ACM Transactions on Graphics(TOG) 2011;30(6):125:1–125:12.
  • Sidi et al. (2011) Sidi, O, van Kaick, O, Kleiman, Y, Zhang, H, Cohen-Or, D.

    Unsupervised co-segmentation of a set of shapes via descriptor-space spectral clustering.

    In: SIGGRAPH Asia Conference. 2011, p. 1.
  • Hu et al. (2012) Hu, R, Fan, L, Liu, L. Co-segmentation of 3d shapes via subspace clustering. Computer Graphics Forum 2012;31(5):1703–1713.
  • Meng et al. (2013) Meng, M, Xia, J, Luo, J, He, Y. Unsupervised co-segmentation for 3d shapes using iterative multi-label optimization. Computer-Aided Design 2013;45(2):312–320.
  • Xu et al. (2010) Xu, K, Li, H, Zhang, H, Cohen-Or, D, Xiong, Y, Cheng, ZQ. Style-content separation by anisotropic part scales. Acm Transactions on Graphics 2010;29(6):184.
  • Kreavoy et al. (2007) Kreavoy, V, Julius, D, Sheffer, A. Model composition from interchangeable components. Conference on Computer Graphics & Applications 2007;:129–138.
  • Dhillon et al. (2007) Dhillon, IS, Guan, Y, Kulis, B.

    Weighted graph cuts without eigenvectors: A multilevel approach.

    IEEE Trans Pattern Anal Mach Intell 2007;29(11):1944–1957.
  • Hilaga et al. (2001) Hilaga, M, Shinagawa, Y, Kohmura, T, Kunii, TL.

    Topology matching for fully automatic similarity estimation of 3d shapes.

    In: Conference on Computer Graphics and Interactive Techniques. 2001, p. 203–212.
  • Belongie et al. (2002) Belongie, S, Malik, J, Puzicha, J. Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis & Machine Intelligence 2002;24(4):509–522.
  • Johnson and Hebert (1999) Johnson, AE, Hebert, M. Using spin images for efficient object recognition in cluttered 3d scenes. IEEE Transactions on Pattern Analysis & Machine Intelligence 1999;21(5):433–449.
  • Jia et al. (2014) Jia, Y, Shelhamer, E, Donahue, J, Karayev, S, Long, J, Girshick, R, et al. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint 2014;arXiv:1408.5093.
  • Chen et al. (2009) Chen, X, leksey Golovinskiy, , Funkhouser, T. A benchmark for 3d mesh segmentation. Acm Transactions on Graphics 2009;28(3):341–352.
  • Kim et al. (2013) Kim, VG, Li, W, Mitra, NJ, Chaudhuri, S, DiVerdi, S, Funkhouser, T. Learning part-based templates from large collections of 3d shapes. Acm Transactions on Graphics 2013;32(4):1.
  • Chang and Lin (2007) Chang, CC, Lin, CJ.

    Libsvm: A library for support vector machines.

    Acm Transactions on Intelligent Systems & Technology 2007;2(3, article 27):389–396.
  • Torralba et al. (2007) Torralba, A, Murphy, KP, Freeman, WT. Sharing visual features for multiclass and multiview object detection. IEEE Transactions on Pattern Analysis & Machine Intelligence 2007;29(5):854–869.
  • Kavukcuoglu et al. (2009) Kavukcuoglu, K, Ranzato, M, Fergus, R, LeCun, Y. Learning invariant features through topographic filter maps. In: IEEE Conference on Computer Vision & Pattern Recognition. 2009, p. 1605–1612.
  • Shapira et al. (2010) Shapira, L, Shalom, S, Shamir, A, Cohen-Or, D, Zhang, H. Contextual part analogies in 3d objects. International Journal of Computer Vision 2010;89(2):309–326.
  • Liu et al. (2009) Liu, R, Zhang, H, Shamir, A, Cohen-Or, D. A part-aware surface metric for shape analysis. Computer Graphics Forum 2009;28(2):397–406.