Accelerate 3D Object Processing via Spectral Layout

by Yongyu Wang, et al.

3D image processing is an important problem in the computer vision and pattern recognition fields. Compared with 2D image processing, its computational difficulty and cost are much higher due to the extra dimension. To fundamentally address this problem, we propose to embed the essential information of a 3D object into 2D space via spectral layout. Specifically, we construct a 3D adjacency graph to capture the spatial structure of the 3D voxel grid. Then we calculate the eigenvectors corresponding to the second and third smallest eigenvalues of its graph Laplacian and perform spectral layout to map each voxel into a pixel in the 2D Cartesian coordinate plane. The proposed method can achieve high quality 2D representations for 3D objects, which enables the use of 2D-based methods to process 3D objects. The experimental results demonstrate the effectiveness and efficiency of our method.




1 Introduction

3D object processing is an important problem in the computer vision, pattern recognition and image processing fields. Processing 3D objects is challenging for the following reasons: 1) A 3D object has a more complex representation than a 2D image. 3D objects are usually represented by volumes or point clouds, whereas a 2D image can be represented simply by a pixel grid [11]. 2) The typical size of a 3D representation is usually very large [16]. As a result, processing 3D objects is much more difficult than processing 2D objects.

In recent years, substantial effort has been devoted to processing 3D objects effectively, and numerous methods have been developed. [12, 5, 9] proposed to use multi-view representations of point clouds to handle 3D object classification tasks. [1] proposed to encode 3D point clouds using multi-view feature maps. [3, 4] proposed to use feature-based deep convolutional neural networks for shape classification. [17] proposed a regularization-based pruning method for 3D CNN acceleration; [12] proposed to use varying camera extrinsics to extract image features for different views. [6] proposed to build lightweight deep neural networks by using depth-wise separable convolutions. However, none of these methods achieves a one-to-one mapping between a 3D object and a 2D image, so the computational complexity of processing 3D objects can be alleviated but not fundamentally solved.

In this work, we propose to generate a one-to-one mapping between a 3D object and a 2D image via spectral layout. By mapping a 3D object into a 2D image, we can use 2D-based methods for 3D tasks. Experimental results demonstrate the effectiveness of our method.

2 Preliminaries

2.1 k-Nearest Neighbor Graph

The underlying manifold structure has proven to be useful for improving the performance of image processing and computer vision tasks such as shape retrieval and image clustering [8, 15]. Among the various manifold representation methods, the $k$-nearest neighbor ($k$-NN) graph is the most widely used due to its superior capability of capturing the local manifold [10]. In a $k$-nearest neighbor graph, each node is connected to its $k$ nearest neighbors. The construction is shown in Algorithm 1.

Input: Data samples $x_1, \dots, x_n$, neighborhood size $k$.
Output: Graph $G$.

1:  for each data sample $x_i$ do
2:     Compute distances between $x_i$ and all the other data samples;
3:     Sort the computed distances;
4:     Connect $x_i$ with its $k$ nearest data samples;
5:  end for
Algorithm 1 $k$-NN graph construction
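
The steps of Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `knn_graph`, the use of Euclidean distance, and the choice to symmetrize the result (an edge is kept if either endpoint selects the other) are all assumptions made here.

```python
import numpy as np

def knn_graph(X, k):
    """Return a symmetric 0/1 adjacency matrix of the k-NN graph of the rows of X."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances (assumed metric), shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbor
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        nearest = np.argsort(dist[i])[:k]  # sort distances, keep the k smallest
        A[i, nearest] = 1
    # Assumption: treat the graph as undirected by symmetrizing.
    return np.maximum(A, A.T)

# Four points on a line; with k = 1 the graph becomes the chain 0-1-2-3.
points = np.array([[0.0, 0.0], [1.0, 0.0], [2.5, 0.0], [10.0, 0.0]])
A = knn_graph(points, k=1)
```

The dense distance matrix costs quadratic memory in the number of samples, so this form is only suitable for small inputs.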

2.2 Spectral Graph Theory

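Consider a graph $G = (V, E, w)$, where $V$ is its vertex set, $E$ is its edge set, and $w$ denotes the weight function that assigns positive weights to all of its edges. The elements of its graph Laplacian matrix $L_G$ are given by:

$$
L_G(p, q) =
\begin{cases}
-w(p, q) & \text{if } (p, q) \in E, \\
\sum_{(p, t) \in E} w(p, t) & \text{if } p = q, \\
0 & \text{otherwise.}
\end{cases}
$$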

Spectral graph theory uses the eigenvalues and eigenvectors of the graph Laplacian matrix to study the properties and structure of a graph. [2] showed that the second smallest eigenvalue and its corresponding eigenvector provide a good approximation to the optimal cut. Spectral clustering [14] uses the bottom eigenvectors of the graph Laplacian built from the adjacency matrix of the data to detect clusters.
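
As a small illustration of how the second smallest eigenvalue relates to a cut, the sketch below builds the standard combinatorial Laplacian (degree matrix minus weight matrix) for a toy graph and splits its nodes by the sign of the corresponding eigenvector. The toy graph, the edge weights and the helper name `graph_laplacian` are assumptions chosen for this example only.

```python
import numpy as np

def graph_laplacian(W):
    """Combinatorial Laplacian D - W of a symmetric, non-negative weight matrix W."""
    return np.diag(W.sum(axis=1)) - W

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by one weak edge (2,3).
W = np.zeros((6, 6))
for p, q, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
                (2, 3, 0.1)]:
    W[p, q] = W[q, p] = w

L = graph_laplacian(W)
eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
fiedler = eigvecs[:, 1]               # eigenvector of the second smallest eigenvalue
side = fiedler > 0                    # sign pattern gives a two-way partition
```

Here the sign pattern separates the two triangles, which is the cut that removes only the weak edge. The overall sign of an eigenvector is arbitrary, so which triangle ends up on the positive side is not determined.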

2.3 Convolutional Neural Networks

Convolutional Neural Networks (CNNs) have become one of the most popular tools in machine learning. They are powerful at learning features and image descriptors [15]. The architecture of a 2D CNN is simple, consisting of an input layer, hidden layers and an output layer, as shown in Fig. 1.

Figure 1: architecture of LeNet-5 CNN.

While it is natural to generalize 2D CNNs to 3D, 3D CNNs often have more complex architectures with significantly more feature maps and parameters, making the training process very difficult and prone to over-fitting [13].

3 Methods

3.1 3D adjacency graph construction

In 2D space, graphs are often used to capture the relationships among entities in order to analyze the underlying structure. The most common way is to depict each entity as a node and use a node-link diagram to connect closely related entities. In this work, we extend this representation to 3D space by connecting each voxel in the 3D voxel grid with its adjacent voxels to form a 3D graph, as shown in Fig. 2, where the nodes in the graph represent the voxels and the dashed lines represent the edges between them. Then we calculate the Laplacian matrix corresponding to this 3D adjacency graph.

Figure 2: 3D adjacency graph.
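
A minimal sketch of this construction is given below. The text does not state which voxels count as adjacent, so the code assumes 6-connectivity (voxels sharing a face) and unit edge weights; the function name `grid_laplacian` is likewise hypothetical.

```python
import numpy as np

def grid_laplacian(shape):
    """Laplacian of the 6-connected adjacency graph of a voxel grid of the given shape."""
    n = int(np.prod(shape))
    index = np.arange(n).reshape(shape)  # voxel (i, j, k) -> node id
    A = np.zeros((n, n))
    for axis in range(3):
        # Pair every voxel with its successor along this axis.
        lo = np.take(index, np.arange(shape[axis] - 1), axis=axis).ravel()
        hi = np.take(index, np.arange(1, shape[axis]), axis=axis).ravel()
        A[lo, hi] = 1.0
        A[hi, lo] = 1.0
    # Degree matrix minus adjacency matrix.
    return np.diag(A.sum(axis=1)) - A

L = grid_laplacian((3, 3, 3))  # 27 voxels, 54 edges
```

The dense matrix keeps the example short. Since each voxel has at most six neighbors under this assumption, a sparse representation would be the natural choice for realistic grid sizes.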

3.2 Spectral Layout

Spectral layout places the nodes of a graph, which may reside in a high-dimensional space, on a two-dimensional plane using eigenvectors of a matrix associated with the graph [7]. Using the eigenvectors corresponding to the bottom eigenvalues has been shown to significantly improve the recognition of complex patterns [14]. In this paper, we first calculate the two eigenvectors corresponding to the second and third smallest eigenvalues of the Laplacian matrix of the 3D adjacency graph. Then, we use the entries of these two eigenvectors as the 2D Cartesian coordinates for locating each voxel in a 2D Cartesian coordinate plane, as shown in Fig. 3.

(a) 3D adjacency graph
(b) 2D grid
Figure 3: Mapping from 3D space to 2D plane
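
The coordinate computation can be sketched as follows. For brevity the example uses a 6-node cycle graph rather than a voxel grid, and a dense eigensolver rather than a sparse one; both choices, and the function name `spectral_layout`, are assumptions made for illustration.

```python
import numpy as np

def spectral_layout(L):
    """Return an (n, 2) array of 2D coordinates taken from the eigenvectors of the
    second and third smallest eigenvalues of a symmetric Laplacian L."""
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigvecs[:, 1:3]                # skip the constant eigenvector (eigenvalue 0)

# Toy graph: a cycle on 6 nodes.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

coords = spectral_layout(L)  # row i holds the (x, y) position of node i
```

On this cycle the layout places the nodes on a circle. When the second and third eigenvalues coincide, as they do here, the coordinates are determined only up to a rotation or reflection, so the exact values can differ between eigensolvers.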

3.3 Aggregate voxel values to form pixel intensities

One novelty of our method is that the size of the embedded 2D image can be set arbitrarily. On the 2D Cartesian coordinate plane, the start and end points of the X-axis are set to the minimum and maximum entries of the eigenvector corresponding to the second smallest eigenvalue, and those of the Y-axis to the minimum and maximum entries of the eigenvector corresponding to the third smallest eigenvalue. Then, we divide each axis interval into subintervals of equal length. The number of subintervals equals the desired dimension of the embedded 2D image. Each resulting square on the Cartesian coordinate plane is used as a pixel of the embedded 2D image. If the dimension is small, multiple voxels will be mapped into the same pixel; we simply sum their voxel values to form this pixel's intensity. The complete flow is shown in Algorithm 2.
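
The binning and summation step might look like the sketch below. The function name `rasterize`, the parameter name `d` for the image dimension, and the convention that the Y coordinate selects the row and the X coordinate the column are assumptions of this example, not details given in the text.

```python
import numpy as np

def rasterize(coords, values, d):
    """Sum voxel values into a d x d image using equal-width bins on each axis."""
    image = np.zeros((d, d))
    mins = coords.min(axis=0)
    spans = coords.max(axis=0) - mins
    spans[spans == 0] = 1.0  # guard against a degenerate axis
    # Scale to [0, d) and clip so the maximum coordinate falls in the last bin.
    bins = np.minimum((d * (coords - mins) / spans).astype(int), d - 1)
    # np.add.at accumulates correctly when several voxels hit the same pixel.
    np.add.at(image, (bins[:, 1], bins[:, 0]), values)
    return image

# Four voxels; the first two land in the same pixel and their values are summed.
coords = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 0.0], [1.0, 1.0]])
values = np.array([1.0, 2.0, 3.0, 4.0])
image = rasterize(coords, values, d=2)
```

Plain fancy-index assignment (`image[rows, cols] += values`) would count only one voxel per pixel when indices repeat, which is why `np.add.at` is used here.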

Input: A 3D object and desired 2D image dimension $d$.
Output: A 2D image.

1:  Construct an adjacency graph $G$ to represent the spatial structure of the 3D voxel grid;
2:  Compute the Laplacian matrix $L_G$ corresponding to graph $G$;
3:  Calculate the eigenvectors $u_2$ and $u_3$ of $L_G$ corresponding to its second and third smallest eigenvalues;
4:  Perform spectral layout using $u_2$ and $u_3$ to map the 3D grid into the 2D plane;
5:  Partition the 2D Cartesian coordinate plane into $d \times d$ squares;
6:  Map each voxel into a square on the 2D Cartesian coordinate plane based on its entries in $u_2$ and $u_3$;
7:  Use each square on the 2D Cartesian coordinate plane as a pixel of the 2D image and sum the voxels in it to form its pixel intensity.
Algorithm 2 Spectral layout-based 3D object processing

4 Experiments

4.1 Classification Result

In this paper, we sample 5 categories (cup, bowl, laptop, lamp, and stool) of the ModelNet40 dataset to perform classification. ModelNet40 is available from the Princeton ModelNet data set.

We embed each object into a 2D image of fixed dimension and use a simple 2D CNN with one convolutional layer, one pooling layer and two dense layers. After training for only 2 epochs, we obtained 91% classification accuracy.

4.2 Embedding Result

We show the embedded 2D images of some categories to demonstrate the mapping quality.

Figure 4: embedded 2D table.
Figure 5: embedded 2D stool.
Figure 6: embedded 2D bed.
Figure 7: embedded 2D lamp.
Figure 8: embedded 2D bowl.
Figure 9: embedded 2D laptop.
Figure 10: embedded 2D toilet.
Figure 11: embedded 2D guitar.

5 Conclusion

In this paper, we use spectral layout to provide very high quality embeddings of 3D objects. Our method enables us to use very simple 2D CNNs to process 3D objects with guaranteed solution quality.

6 Acknowledgments

Some explorations of this work were made during the Fall 2017 and Spring 2018 semesters, when Yongyu Wang was a student at Michigan Technological University. The authors would like to thank Zhuo Feng for his helpful discussions during that period.


  • [1] X. Chen, H. Ma, J. Wan, B. Li, and T. Xia (2017) Multi-view 3d object detection network for autonomous driving. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 1907–1915. Cited by: §1.
  • [2] F. R. Chung and F. C. Graham (1997) Spectral graph theory. American Mathematical Soc.. Cited by: §2.2.
  • [3] M. Engelcke, D. Rao, D. Z. Wang, C. H. Tong, and I. Posner (2017) Vote3deep: fast object detection in 3d point clouds using efficient convolutional neural networks. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 1355–1361. Cited by: §1.
  • [4] K. Guo, D. Zou, and X. Chen (2015) 3d mesh labeling via deep convolutional neural networks. ACM Transactions on Graphics (TOG) 35 (1), pp. 1–12. Cited by: §1.
  • [5] V. Hegde and R. Zadeh (2016) Fusionnet: 3d object classification using multiple data representations. arXiv preprint arXiv:1607.05695. Cited by: §1.
  • [6] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861. Cited by: §1.
  • [7] Y. Koren (2005) Drawing graphs by eigenvectors: theory and practice. Computers & Mathematics with Applications 49 (11-12), pp. 1867–1888. Cited by: §3.2.
  • [8] V. Premachandran and R. Kakarala (2013) Consensus of k-nns for robust neighborhood selection on graph-based manifolds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1594–1601. Cited by: §2.1.
  • [9] C. R. Qi, H. Su, M. Nießner, A. Dai, M. Yan, and L. J. Guibas (2016) Volumetric and multi-view cnns for object classification on 3d data. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5648–5656. Cited by: §1.
  • [10] S. T. Roweis and L. K. Saul (2000) Nonlinear dimensionality reduction by locally linear embedding. science 290 (5500), pp. 2323–2326. Cited by: §2.1.
  • [11] X. Shen (2019) A survey of object classification and detection based on 2d/3d data. arXiv preprint arXiv:1905.12683. Cited by: §1.
  • [12] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller (2015) Multi-view convolutional neural networks for 3d shape recognition. In Proceedings of the IEEE international conference on computer vision, pp. 945–953. Cited by: §1.
  • [13] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri (2015) Learning spatiotemporal features with 3d convolutional networks. In Proceedings of the IEEE international conference on computer vision, pp. 4489–4497. Cited by: §2.3.
  • [14] U. Von Luxburg (2007) A tutorial on spectral clustering. Statistics and computing 17 (4), pp. 395–416. Cited by: §2.2, §3.2.
  • [15] M. Yan, Y. Zhu, N. Jin, and J. Bohg (2020) Self-supervised learning of state estimation for manipulating deformable linear objects. IEEE robotics and automation letters 5 (2), pp. 2372–2379. Cited by: §2.1, §2.3.
  • [16] A. Yang (2019) 3D object detection from ct scans using a slice-and-fuse approach. Ph.D. Thesis, Doctoral dissertation, Robotics Institute. Cited by: §1.
  • [17] Y. Zhang, H. Wang, Y. Luo, L. Yu, H. Hu, H. Shan, and T. Q. Quek (2019) Three-dimensional convolutional neural network pruning with regularization-based method. In 2019 IEEE International Conference on Image Processing (ICIP), pp. 4270–4274. Cited by: §1.