1 Introduction
3D object processing is an important problem in computer vision, pattern recognition, and image processing. Processing 3D objects is challenging for two reasons: 1) a 3D object has a more complex representation than a 2D image: 3D objects are usually represented by volumes or point clouds, whereas a 2D image can be represented simply by a pixel grid [11]; 2) the typical size of a 3D representation is very large [16]. Consequently, processing 3D objects is much harder than processing 2D images.
In recent years, substantial effort has been devoted to processing 3D objects effectively, and numerous methods have been developed. [12, 5, 9] proposed multi-view representations of point clouds for 3D object classification. [1] proposed to encode 3D point clouds using multi-view feature maps. [3, 4] proposed feature-based deep convolutional neural networks for shape classification.
[17] proposed a regularization-based pruning method for 3D CNN acceleration; [12] proposed to use varying camera extrinsics to extract image features for different views. [6] proposed to build lightweight deep neural networks using depthwise separable convolutions. However, none of these methods achieves a one-to-one mapping between a 3D object and a 2D image, so the computational complexity of processing 3D objects is alleviated but not fundamentally resolved.
In this work, we propose to generate a one-to-one mapping between a 3D object and a 2D image via spectral layout. By mapping a 3D object into a 2D image, we can apply 2D-based methods to 3D tasks. Experimental results demonstrate the effectiveness of our method.
2 Preliminaries
2.1 k-Nearest Neighbor Graph
The underlying manifold structure has proven useful for improving the performance of image processing and computer vision tasks such as shape retrieval and image clustering [8, 15]. Among various manifold representation methods, the k-nearest neighbor (kNN) graph is the most widely used due to its superior capability of capturing the local manifold [10]. In a kNN graph, each node is connected to its k nearest neighbors. The algorithm is shown in Algorithm 1.
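A kNN graph of this kind can be sketched as follows. This is a brute-force illustration rather than the paper's Algorithm 1, which is not reproduced here; the function name and the toy points are ours:

```python
import numpy as np

def knn_graph(points, k):
    """Boolean adjacency matrix linking each point to its k nearest neighbors."""
    n = len(points)
    # Pairwise squared Euclidean distances (brute force; fine for small n).
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)           # exclude self-loops
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in np.argsort(d2[i])[:k]:    # indices of the k closest points to i
            adj[i, j] = adj[j, i] = True   # symmetrize: kNN relations need not be mutual
    return adj

# Two well-separated pairs of 2D points; with k = 1 each point links to its twin.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
A = knn_graph(pts, k=1)
```

For large point sets, a spatial index (e.g. a k-d tree) would replace the quadratic distance computation, but the output graph is the same.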
2.2 Spectral Graph Theory
Consider a graph G = (V, E, w), where V is its vertex set, E is its edge set, and w denotes the weight function that assigns positive weights to all of its edges. The elements of its graph Laplacian matrix L are given by:

L(i, j) = -w(i, j) if (i, j) ∈ E;  L(i, i) = Σ_{(i,k)∈E} w(i, k);  L(i, j) = 0 otherwise. (1)
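For a symmetric weight matrix this definition is equivalent to L = D − W, where D is the diagonal degree matrix. A minimal sketch (the helper name is ours):

```python
import numpy as np

def graph_laplacian(W):
    """Laplacian L = D - W for a symmetric weight matrix W with zero diagonal."""
    D = np.diag(W.sum(axis=1))   # degree matrix: row sums of W
    return D - W

# Unit-weight triangle graph on 3 nodes.
W = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
L = graph_laplacian(W)
# Every row of a graph Laplacian sums to zero, so the constant
# vector is always an eigenvector with eigenvalue 0.
```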
Spectral graph theory uses the eigenvalues and eigenvectors of the graph Laplacian matrix to study the properties and structure of a graph. [2] has shown that the second smallest eigenvalue and its corresponding eigenvector provide a good approximation to the optimal cut. Spectral clustering [14] uses the bottom eigenvectors of the adjacency matrix of the data to detect clusters.
2.3 Convolutional Neural Networks
Convolutional Neural Networks (CNNs) have become the most popular tool in machine learning. They are powerful at learning features and image descriptors [15]. The architectures of 2D CNNs are simple, consisting of an input layer, hidden layers, and an output layer, as shown in Fig. 1. While it is natural to generalize 2D CNNs to 3D, 3D CNNs often have more complex architectures with significantly more feature maps and parameters, making the training process difficult and prone to overfitting [13].
3 Methods
3.1 3D adjacency graph construction
In 2D space, graphs are often used to capture the relationships between entities and to analyze the underlying structure. The most common approach is to depict each entity as a node and use a node-link diagram to connect closely related entities. In this work, we extend this representation to 3D space by connecting each voxel in a 3D voxel grid with its adjacent voxels to form a 3D graph, as shown in Fig. 2, where the nodes represent the voxels and the dashed lines represent the edges between them. We then calculate the Laplacian matrix of this 3D adjacency graph.
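The adjacency-graph construction can be sketched as below. We assume face (6-connected) adjacency between occupied voxels, which Fig. 2 suggests but the text does not pin down; the function name is ours:

```python
import numpy as np

def voxel_adjacency(grid):
    """Edges between occupied voxels that share a face (6-connectivity).

    grid: boolean 3D array. Returns (list of occupied voxel coords, edge list
    over the indices of that list)."""
    occupied = list(zip(*np.nonzero(grid)))
    index = {v: i for i, v in enumerate(occupied)}
    edges = []
    for (x, y, z) in occupied:
        # Check only the +x, +y, +z neighbors so each edge is emitted once.
        for dx, dy, dz in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]:
            nb = (x + dx, y + dy, z + dz)
            if nb in index:
                edges.append((index[(x, y, z)], index[nb]))
    return occupied, edges

# Three occupied voxels forming an L-shape in a 2x2x2 grid.
grid = np.zeros((2, 2, 2), dtype=bool)
grid[0, 0, 0] = grid[1, 0, 0] = grid[0, 1, 0] = True
voxels, edges = voxel_adjacency(grid)
```

The edge list (with unit weights) is exactly what the Laplacian computation of Section 2.2 consumes.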
3.2 Spectral Layout
Spectral layout places the nodes of a graph in a two-dimensional plane using eigenvectors of a matrix associated with the graph [7].
Using the eigenvectors corresponding to the bottom eigenvalues has proven to significantly improve performance in recognizing complex patterns [14]. In this paper, we first calculate the two eigenvectors corresponding to the second and third smallest eigenvalues of the Laplacian matrix of the 3D adjacency graph. We then use the entries of these two eigenvectors as 2D Cartesian coordinates to locate each voxel in a 2D Cartesian coordinate plane, as shown in Fig. 3.
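This spectral-layout step can be sketched in a few lines. The dense eigensolver below is illustrative; for large voxel graphs a sparse solver (e.g. scipy.sparse.linalg.eigsh) would be used instead:

```python
import numpy as np

def spectral_layout(L):
    """2D coordinates from the eigenvectors of the 2nd and 3rd smallest
    eigenvalues of the Laplacian L (one (x, y) row per graph node)."""
    vals, vecs = np.linalg.eigh(L)   # eigh returns eigenvalues in ascending order
    return vecs[:, 1:3]              # skip column 0: the trivial constant eigenvector

# Path graph 0-1-2-3; its Fiedler vector orders the nodes along the path.
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W
xy = spectral_layout(L)              # x = Fiedler vector, y = next eigenvector
```

The eigenvector sign is arbitrary, but that only mirrors the layout; the relative positions of the nodes are preserved.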
3.3 Aggregate voxel values to form pixel intensities
One novelty of our method is that the size of the embedded 2D image can be set arbitrarily. On the 2D Cartesian coordinate plane, the start and end points of the X-axis and Y-axis are set to the minimum and maximum values of the entries of the eigenvectors corresponding to the second and third smallest eigenvalues, respectively. We then divide each interval into subintervals of equal length, where the number of subintervals equals the desired dimension of the embedded 2D image. Each resulting square on the Cartesian coordinate plane serves as a pixel of the embedded 2D image. If the dimension is small, multiple voxels may be mapped to the same pixel; we simply sum their voxel values to form that pixel's intensity. The complete algorithm is shown in Algorithm 2.
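The aggregation step amounts to a weighted 2D histogram over the spectral coordinates. A minimal sketch (the helper name is ours; np.histogram2d performs the equal-width binning and summation described above):

```python
import numpy as np

def embed_image(coords, values, dim):
    """Sum voxel values falling into each cell of a dim x dim grid.

    coords: (n, 2) spectral-layout coordinates; values: (n,) voxel values."""
    x, y = coords[:, 0], coords[:, 1]
    # Bin edges span the min/max entries of each eigenvector, as in the text.
    img, _, _ = np.histogram2d(
        x, y, bins=dim,
        range=[[x.min(), x.max()], [y.min(), y.max()]],
        weights=values)
    return img

# Three voxels, two of which collide in the same cell of a 2x2 image.
coords = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0]])
vals = np.array([1.0, 2.0, 5.0])
img = embed_image(coords, vals, dim=2)
```

Colliding voxels simply add up, matching the intensity rule above; a larger `dim` reduces collisions at the cost of a bigger image.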
4 Experiments
4.1 Classification Result
In this paper, we sample 5 categories (cup, bowl, laptop, lamp, and stool) of the ModelNet40 dataset to perform classification. ModelNet40 is available from the Princeton ModelNet dataset (https://modelnet.cs.princeton.edu/).
We set the 2D image dimension to and use the simplest 2D CNN, with one convolutional layer, one pooling layer, and two dense layers. After training for only 2 epochs, we achieve 91% classification accuracy.
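A network of the described shape might look like the following PyTorch sketch. The channel counts, the 32x32 input size, and the hidden width are our assumptions for illustration, not values from the paper:

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """One convolutional layer, one pooling layer, and two dense layers."""
    def __init__(self, dim=32, n_classes=5):   # hypothetical image size / 5 sampled classes
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # the single conv layer
            nn.ReLU(),
            nn.MaxPool2d(2),                             # the single pooling layer
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(8 * (dim // 2) ** 2, 64),          # dense layer 1
            nn.ReLU(),
            nn.Linear(64, n_classes),                    # dense layer 2 (logits)
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SimpleCNN()
out = model(torch.zeros(4, 1, 32, 32))   # batch of 4 dummy embedded images
```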
4.2 Embedding Result
We show the embedded 2D images ( dimension) of some categories to demonstrate the mapping quality.
5 Conclusion
In this paper, we use spectral layout to provide very high quality embeddings of 3D objects. Our method enables us to use very simple 2D CNNs to process 3D objects with guaranteed solution quality.
6 Acknowledgments
Some explorations of this work were made during 2017 Fall and 2018 Spring semesters when Yongyu Wang was a student at Michigan Technological University. The authors would like to thank Zhuo Feng for his helpful discussions during that period.
References
[1] (2017) Multi-view 3D object detection network for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1907–1915.
[2] (1997) Spectral graph theory. American Mathematical Society.
[3] (2017) Vote3Deep: fast object detection in 3D point clouds using efficient convolutional neural networks. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 1355–1361.
[4] (2015) 3D mesh labeling via deep convolutional neural networks. ACM Transactions on Graphics (TOG) 35 (1), pp. 1–12.
[5] (2016) FusionNet: 3D object classification using multiple data representations. arXiv preprint arXiv:1607.05695.
[6] (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
[7] (2005) Drawing graphs by eigenvectors: theory and practice. Computers & Mathematics with Applications 49 (11–12), pp. 1867–1888.
[8] (2013) Consensus of kNNs for robust neighborhood selection on graph-based manifolds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1594–1601.
[9] (2016) Volumetric and multi-view CNNs for object classification on 3D data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5648–5656.
[10] (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290 (5500), pp. 2323–2326.
[11] (2019) A survey of object classification and detection based on 2D/3D data. arXiv preprint arXiv:1905.12683.
[12] (2015) Multi-view convolutional neural networks for 3D shape recognition. In Proceedings of the IEEE International Conference on Computer Vision, pp. 945–953.
[13] (2015) Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489–4497.
[14] (2007) A tutorial on spectral clustering. Statistics and Computing 17 (4), pp. 395–416.
[15] (2020) Self-supervised learning of state estimation for manipulating deformable linear objects. IEEE Robotics and Automation Letters 5 (2), pp. 2372–2379.
[16] (2019) 3D object detection from CT scans using a slice-and-fuse approach. Ph.D. Thesis, Robotics Institute.
[17] (2019) Three-dimensional convolutional neural network pruning with regularization-based method. In 2019 IEEE International Conference on Image Processing (ICIP), pp. 4270–4274.