Learning for Active 3D Mapping

08/07/2017 ∙ by Karel Zimmermann, et al. ∙ Czech Technical University in Prague 0

We propose an active 3D mapping method for depth sensors, which allow individual control of depth-measuring rays, such as the newly emerging solid-state lidars. The method simultaneously (i) learns to reconstruct a dense 3D occupancy map from sparse depth measurements, and (ii) optimizes the reactive control of depth-measuring rays. To make the first step towards the online control optimization, we propose a fast prioritized greedy algorithm, which needs to update its cost function in only a small fraction of pos- sible rays. The approximation ratio of the greedy algorithm is derived. An experimental evaluation on the subset of the KITTI dataset demonstrates significant improve- ment in the 3D map accuracy when learning-to-reconstruct from sparse measurements is coupled with the optimization of depth-measuring rays.



There are no comments yet.


page 3

page 8

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

In contrast to rotating lidars, the SSL uses an optical phased array as a transmitter of depth measuring light pulses. Since the built-in electronics can independently steer pulses of light by shifting its phase as it is projected through the array, the SSL can focus its attention on the parts of the scene important for the current task. Task-driven reactive control steering hundreds of thousands of rays per second using only an on-board computer is a challenging problem, which calls for highly efficient parallelizable algorithms. As a first step towards this goal, we propose an active mapping method for SSL-like sensors, which simultaneously (i) learns to reconstruct a dense 3D voxel-map from sparse depth measurements and (ii) optimize the reactive control of depth-measuring rays, see Figure 1. The proposed method is evaluated on a subset of the KITTI dataset [5], where sparse SSL measurements are artificially synthesized from captured lidar scans, and compared to a state-of-the-art 3D reconstruction approach [3].

Figure 1: Active 3D mapping with Solid State Lidar. Iteratively learned deep convolutional network reconstructs local dense occupancy map from sparse depth measurements. The local map is registered to a global occupancy map, which in turn serves as an input for the optimization of depth-measuring rays along the expected vehicle trajectory. The dense occupancy maps are visualized as isosurfaces. See https://www.youtube.com/watch?v=KNex0zjeGYE

The main contribution of this paper lies in proposing a computationally tractable approach for very high-dimensional active perception task, which couples learning of the 3D reconstruction with the optimization of depth-measuring rays. Unlike other approaches such as active object detection [6] or segmentation [7]

, SSL-like reactive control has significantly higher dimensionality of the state-action space, which makes a direct application of unsupervised reinforcement learning 

[6] prohibitively expensive. Keeping the on-board reactive control in mind, we propose prioritized greedy optimization of depth measuring rays, which in contrast to a naïve greedy algorithm re-evaluates only rays in each iteration. We derive the approximation ratio of the proposed algorithm.

The 3D mapping is handled by an iteratively learned convolution neural network (CNN), as CNNs proved their superior performance in 

[3, 17]

. The iterative learning procedure stems from the fact that both (i) the directions in which the depth should be measured and (ii) the weights of the 3D reconstruction network are unknown. We initialize the learning procedure by selecting depth-measuring rays randomly to learn an initial 3D mapping network which estimates occupancy of each particular voxel. Then, using this network, depth-measuring rays along the expected vehicle trajectory can be planned based on the expected reconstruction (in)accuracy in each voxel. To reduce the training-planning discrepancy, the mapping network is re-learned on optimized sparse measurements and the whole process is iterated until validation error stops decreasing.

2 Previous work

High performance of image-based models is demonstrated in [14], where a CNN pooling results from multiple rendered views outperforms commonly used 3D shape descriptors in object recognition task. Qi  [10] compare several volumetric and multi-view network architectures and propose an anisotropic probing kernel to close the performance gap between the two approaches. Our network architecture uses a similar design principle.

Choy  [3] proposed a unified approach for single and multi-view 3D object reconstruction which employs a recurrent neural architecture. Despite providing competitive results in the object reconstruction domain, the architecture is not suitable for dealing with high-dimensional outputs due to its high memory requirements and would need significant modifications to train with full-resolution maps which we use. We provide a comparison of this method to ours in Sec. 6.2, in a limited setting.

Model-fitting methods such as [13, 15, 11] rely on a manually-annotated dataset of models and assume that objects can be decomposed into a predefined set of parts. Besides that these methods are suited mostly for man-made objects of rigid structure, fitting of the models and their parts to the input points is computationally very expensive; e.g., minutes per input for [13, 15]. Decomposition of the scene into plane primitives as in [8] does not scale well with scene size (quadratically due to candidate pairs) and could not most likely deal with the level of sparsity we encounter.

Geometrical and physical reasoning comprising stability of objects in the scene is used by Zheng  [18] to improve object segmentation and 3D volumetric recovery. Their assumption of objects being aligned with coordinate axes which seems unrealistic in practice. Moreover, it is not clear how to incorporate learned shape priors for complex real-world objects which were shown to be beneficial for many tasks (e.g., in [9]). Firman  [4] use a structured-output regression forest to complete unobserved geometry of tabletop-sized objects. A generative model proposed by Wu  [17]

, termed Deep Belief Network, learns joint probability distribution

of complex 3D shapes across various object categories .

End-to-end learning of stochastic motion control policies for active object and scene categorization is proposed by Jayaraman and Grauman [6]. Their CNN policy successively proposes views to capture with RGB camera to minimize categorization error. The authors use a look-ahead error as an unsupervised regularizer on the classification objective. Andreopoulos  [2] solve the problem of an active search for an object in a 3D environment. While they minimize the classification error of a single yet apriori unknown voxel containing the searched object, we minimize the expected reconstruction error of all voxels. Also, their action space is significantly smaller than ours because they consider only local viewpoint changes at the next position while the SSL planning chooses from tens of thousands of rays over a longer horizon.

3 Overview of the active 3D mapping

We assume that the vehicle follows a known path consisting of discrete positions and a depth measuring device (SSL) can capture at most rays at each position. The set of rays to be captured at position is denoted .

We denote the global ground-truth occupancy map, its estimate, and the map of the sparse measurements. All these map share common global reference frame corresponding to the first position in the path. For each of these maps there are local counterparts , and , respectively. Local maps corresponding to position all share a common reference frame which is aligned with the sensor and captures its local neighborhood. The global ground-truth map is used to synthesize sensor measurements and to generate local ground-truth maps for training.

The active mapping pipeline, consisting of a measure-reconstruct-plan loop, is depicted in Fig. 1 and detailed in Alg. 1.

1:Initialize position and select depth-measuring rays randomly.
2:Measure depth in the directions selected for position and update global sparse measurements and dense reconstruction with these measurements.
3:Obtain local measurements

by interpolating

4:Compute local occupancy confidence using the mapping network .
5:Update global occupancy confidence .
6:Plan depth-measuring rays along the expected vehicle trajectory over horizon given reconstruction .
7:Repeat from line 2 for next position .
Algorithm 1 Active mapping

Neglecting sensor noise, the set of depth-measuring rays obtained from the planning, the measurements , and the resulting reconstruction can all be seen as a deterministic function of mapping parameters and . If we assume that that ground-truth maps come from a probability distribution, both learning of and planning of the depth-measuring rays approximately minimize common objective


where is the weighted logistic loss, and denote the elements of and , respectively, corresponding to voxel . In learning, are used to balance the two classes, empty with and occupied with , and to ignore the voxels with unknown occupancy. We assume independence of measurements and use, for corresponding voxels , additive updates of the occupancy confidence with . denotes the conditional probability of voxel being occupied given measurements and is its current estimate.

4 Learning of 3D mapping network

The learning is defined as approximate minimization of Equation 1. Since (i) the result of planning is not differentiable with respect to and (ii) we want to reduce variability of training data111We introduce a canonical frame by using the local maps instead of the global ones, which helps the mapping network to capture local regularities., we locally approximate the criterion around a point as


by fixing the result of planning in . We also introduce a canonical frame by using the local maps instead of the global ones, which helps the mapping network to capture local regularities. The learning then becomes the following iterative optimization


where minimization in each iteration is tackled by Stochastic Gradient Descent. Learning is summarized in Alg. 


1:Initialize and obtain dataset by running the pipeline with the rays being selected randomly, instead of using the planner.
2:Train the mapping network on to obtain with parameters .
3:Obtain by running Alg. 1 and using for mapping.
4:Set and repeat from line 2 until validation error stops decreasing.
Algorithm 2 Learning of active mapping

Note, that in order to achieve (i) local optimality of the criterion and (ii) statistical consistency of the learning process (i.e., that the training distribution of sparse measurements corresponds to the one obtained by planning), one would have to find a fixed point of Equation 3. Since there are no guarantees that any fixed point exists, we instead iterate the minimization until validation error is decreasing.

Figure 2: Architecture of the mapping network. Top: An example input with sparse measurements, showing only the occupied voxels. Bottom: The corresponding reconstructed dense occupancy confidence after thresholding. Right: Schema of the network architecture, composed from the convolutional layers (denoted conv), linear rectifier units (relu), pooling layers (pool), and upsampling layers (deconv).

The mapping network consists of 6 convolutional layers with kernels followed by linear rectifier units (element-wise

) and, in 2 cases, by max pooling layers with

kernels and stride

, see Fig. 2. In the end, there is an fourfold upsampling layer so that the output has same size as input. The network was implemented in MatConvNet [16].

5 Planning of depth measuring rays

Planning at position searches for a set of rays , which approximately minimizes the expected logistic loss between ground truth map and reconstruction obtained from sparse measurements at the horizon . The result of planning is the set of rays , which will provide measurements for a sparse set of voxels. This set of voxels is referred to as covered by and denoted as . While the mapping network is trained offline on the ground-truth maps, the planning have to search the subset of rays online without any explicit knowledge of the ground-truth occupancy . Since it is not clear how to directly quantify the impact of measuring a subset of voxels on the reconstruction , we introduce simplified reconstruction model , which predicts the loss based on currently available map . This model conservatively assumes that the reconstruction in covered voxels is correct (i.e. ) and reconstruction of not covered voxels does not change (i.e. ). Given this reconstruction model, the expected loss simplifies to:


Since the ground-truth occupancy of voxels is apriori unknown, neither the voxel-wise loss nor the coverage are known. We model the expected loss in voxel as



is the entropy of the Bernoulli distribution with parameter

, denoting the probability of outcome from the possible outcomes

. The vector of concatenated losses

is denoted .

The length of particular rays is also unknown, therefore coverage of voxels by particular rays cannot be determined uniquely. Consequently, we introduce probability that voxel will not be covered by ray . This probability is estimated from currently available map as the product of (i) the probability that the voxels on ray which lie between voxel and the sensor are unoccupied and (ii) the probability that at least one of the following voxels or the voxel itself are occupied. If ray does not intersect voxel , then . The vector of probabilities for ray is denoted . Assuming that rays are independent measurements, the expected loss is modeled as .

The planning searches for the set of subsets of depth-measuring rays for the following positions, which minimize the expected loss, subject to budget constraints


where denotes cardinality of the set .

This is a non-convex combinatorial problem222In our experiments, the number of possible combinations is greater then . which needs to be solved online repeatedly for millions of potential rays. We tried several convex approximations, however the high-dimensional optimization has been extremely time consuming and the improvement with respect to the significantly faster greedy algorithm was negligible. As a consequence of that, we have decided to use the greedy algorithm. We first introduce its simplified version (Alg. 3) and derive its properties, the significantly faster prioritized greedy algorithm (Alg. 4) is explained later.

We denote the list of available rays at position as . At the beginning, the list of all available rays is initialized as follows . Alg. 3 successively builds the set of selected rays . In each iteration the best ray is selected, added into and removed from . The position from which the ray is chosen is denoted . If the budget of is reached, all rays from are removed from .

In order to avoid multiplication of all selected rays at each iteration, we introduce the vector , which keeps voxel loss. Vector is initialized as and whenever ray is selected, voxel losses are updated as follows , where denotes element-wise multiplication.

1:Set of available rays and budget
2: 4.5cm Initialization
4:while  do
5:      4.5cm Add the best ray
7:      4.5cm Update voxel costs
8:      4.5cm Remove from
9:     if  then
10:           4.5cm Close position
11:     end if
12:end while
13:return Set of selected rays
Algorithm 3 Greedy planning

The rest of this section is organized as follows: Section 5.1 shows the upper bound for the approximation ratio of the greedy algorithm. Section 5.2 introduces the prioritized greedy algorithm, which in each iteration needs to re-evaluate the cost function only for a small fraction of rays.

5.1 Approximation ratio of the greedy algorithm

We define the approximation ratio of a minimization algorithm to be , where is the cost function achieved by the algorithm and opt is the optimal value of the cost function. Given , we know that the algorithm provides solution whose value is at most . In this section we derive the upper bound of the approximation ratio of Algorithm 3. Figure 3 shows values of for different number of positions .

The greedy algorithm successively selects rays that reduce the cost function the most. To show how cost function differs from opt, an upper bound on the cost function need to be derived. Let us suppose that in the beginning of an arbitrary iteration we have voxel losses given by vector , the following lemma states that for arbitrary voxel , there always exists a ray , that reduces the cost function to , where is the unknown optimum value of the cost function which is achievable by rays .

Lemma 5.1.

If for some rays then


Proof: We know that there is optimal solution consisting from rays. Without loss of generality we assume that holds for first rays, then


This holds for an arbitrary positive scaling factor , therefore


We sum up inequalities over all voxels


We switch sums in the left hand side of the inequality to obtain addition of terms as follows


Hence, we know that at least one of these terms has to be smaller than or equal to of the right hand side

Especially, if there is only one position, all optimal rays are either already selected or still available. This assumption allows to derive the following upper bound on the cost function of the greedy algorithm after iterations for .

Theorem 5.1.

Upper bound of the greedy algorithm after iterations is


where and is Euler number.

Proof: We prove the upper bound by complete induction. In the beginning no ray is selected, per-voxel loss is and the value of the cost function . Using Lemma 5.1, we know that there exists ray such that , therefore we know that


Greedy algorithm continues by updating the per-voxel loss . In the second iteration there are two possible cases: (i) we have either used the optimal ray in the first iteration, then the situation is better and we know there is rays which achieves optimum, or (ii) we have not selected the optimal ray in the first iteration, therefore we have still rays which achieves the optimum. Since the cost function reduction in the latter case gives the upper bound on the cost function reduction in the former one, we assume that there is still optimal rays available, therefore there exists ray such that


We assume that the following holds


and prove the inequality for . Using the assumption (16) and Lemma 5.1, the following inequalities hold


Since 333 for . and , the upper bound for cost function of the greedy algorithm in th iteration is                  

Theorem 5.1 reveals that the approximation ratio of the greedy algorithm after iterations has following upper bound


We can simply find by considering for each voxel the best rays independently.

So far we have assumed that the greedy algorithm chooses only rays and that all rays are available in all iterations. Since there are positions and the greedy algorithm can choose only rays at each position, some rays may be no longer available when choosing th ray. In the worst case possible, the rays from the most promising position will become unavailable. Since we have not chosen optimal rays we can no longer achieve opt. Nevertheless, we can still choose from rays which achieve a new optimum.

We introduce as the optimum achievable after closing positions. Obviously . Let us assume that, when the first position is closed we cannot lose more than , therefore . Without any additional assumption, could be arbitrarily large. We discuss potential assumptions later. Similarly , and . The following theorem states the upper bound for as a function of .

Theorem 5.2.

Upper bound of the greedy algorithm after iterations is




We start from the result (17) shown in the proof of Theorem 5.1. Since there is rays achieving optimum , the cost function in th iteration is bounded as follows


In the th iteration, there are two possible cases: (i) rays from some position become not available and there is rays available which can achieve a new optimum which is not higher than or (ii) all rays are available and there is still rays which achieve . Noticing that the upper bound is increasing in and , we can cover both cases by considering there is still rays which achieves , therefore


We can now continue up to the iteration in which the upper bound is as follows


For th iteration the situation is similar as for th iteration. In order to cover both cases, we consider that there is rays which achieves and continue up to the th iteration, which yields the following upper bound


Finally after iterations the upper bound is


The last inequality stems from the fact that and that . ∎

Finally we derive the upper bound of the approximation ratio .

Theorem 5.3.

Upper bound of the approximation ratio is


where is lower bound of the opt.



The approximation ratio depends on the opt, if then , if then . If we make an assumption that each position covers only fraction of voxels, then . Figure 3 shows values of for different ratios of for this case.

Figure 3: as a function of ratios with .

5.2 Prioritized greedy planning

In practice we observed a significant speed up of the greedy planning (Alg. 3) by imposing prioritized search for . Namely, let us denote the decrease of the expected reconstruction error achieved by selecting ray in iteration , and show that it is non-increasing. For and it follows that . Summing the inequalities for all voxels , we get


for an arbitrary ray selected in iteration . Note that for any .

Now, when we search for maximizing in decreasing order of , , we can stop once for the next ray because none of the remaining rays can be better than . Moreover, we can take advantage of the fact that all the remaining rays including remained sorted when updating the priority for the next iteration. The proposed planning is detailed in Alg. 4.

The number of re-evaluations of in Alg. 4 was approximately smaller than in Alg. 3. Despite the sorting took about a of the computation time, the prioritized planning was about faster and took on average using a single-threaded implementation.

1:1cm Set of rays at positions , budget voxel costs probability vectors mapping from ray to position
2: 4.5cm No rays selected
3: 4.5cm Force recompute
4: 4.5cm Sequence of ray indices, denotes the th element in the sequence, the subsequence from the th to the th element.
5:while  do
6:     for  do
8:          if  then
9:               break
10:          end if
11:     end for
12:     Sort subsequence s.t.
13:     Merge sorted subsequences and
15:      4.5cm Add the best ray
16:      4.5cm Update voxel costs
17:     if  then
18:           4.5cm Close position
19:     else
20:           4.5cm Remove from
21:     end if
22:end while
23:return Selected rays at every position
Algorithm 4 Prioritized greedy planning

6 Experiments


All experiments were conducted on selected sequences from categories City and Residential from the KITTI dataset [5]. We first brought the point clouds (captured by the Velodyne HDL-64E laser scanner) to a common reference frame using the localization data from the inertial navigation system (OXTS RT 3003 GPS/IMU) and created the ground-truth voxel maps from these. The voxels traced from the sensor origin towards each measured point were updated as empty except for the voxels incident with any of the end points which were updated as occupied for each incident end point. The dynamic objects were mostly removed in the process since the voxels belonging to these objects were also many times updated as empty while moving. All maps used axis-aligned voxels of edge size .

For generating the sparse measurements, we consider an SSL sensor with the field of view of horizontally and vertically discretized in directions. At each position, we select rays and ray-trace in these directions until an occupied voxel is hit or the maximum distance of is reached. Only the rays which end up hitting an occupied voxel produce valid measurements, as is the case with the time-of-flight sensors. Local maps and contain volume of discretized into voxels.

6.1 Active 3D mapping

In this experiment, we used and sequences from the Residential category for training and validation, respectively, and sequences from the City category for testing. We evaluate the iterative planning-learning procedure described in Sec. 4. For learning the mapping networks, we used learning rate

based on epoch number

, batch size , and momentum . Networks were trained for epochs.

The ROC curves shown in Fig. 4 (left) are computed using ground-truth maps and predicted global occupancy maps . The performance of the network (denoted Coupled) significantly outperforms the network (Random), which shows the benefit of the proposed iterative planning-mapping procedure. Examples of reconstructed global occupancy maps are shown in Fig. 5. Note that the valid measurements covered around of the input voxels.

Figure 4: ROC curves of occupancy prediction from active 3D mapping on test sets. Left: Random denotes the global occupancy obtained by using with random sparse measurements, Coupled the occupancy obtained by using with the prioritized greedy planning. The voxels which are more than from what could possibly be measured are excluded, together with the false positives which can be attributed to discretization error (in 1-voxel distance from an occupied voxel). Right: Random denotes the local occupancy maps obtained by using , Coupled the maps obtained by using , and Res3D-GRU-3 denotes the reconstruction obtained by the network adapted from [3].
Figure 5: Examples of global map reconstruction. Top: Sparse measurement maps . Middle: Reconstructed occupancy maps in form of isosurface. Bottom: Ground-truth maps . The black line denotes trajectory of the car.

6.2 Comparison to a recurrent image-based architecture

We provide a comparison with the image-based reconstruction method of Choy  [3]. Namely, we modify their residual Res3D-GRU-3 network to use sparse depth maps of size instead of RGB images. The sensor pose corresponding to the last received depth map was used for reconstruction. The number of views were fixed to , with randomly selected depth-measuring rays in each image. For this experiment, we used 20 sequences from the Residential category—18 for training, 1 for validation and 1 for testing. Since the Res3D-GRU-3 architecture is not suited for high-dimensional outputs due to its high memory requirements, we limit the batch size to and the size of the maps to , which corresponds to recurrent units. Our mapping network was trained and tested on voxel maps instead of depth images.

The corresponding ROC curves, computed from local maps and , are shown in Fig. 4 (right). Both and networks outperforms the Res3D-GRU-3 network. We attribute this result mostly to the fact that our method is implicitly provided the known trajectory, while the Res3D-GRU-3 network is not. Another reason may be the ray-voxel mapping which is also known implicitly in our case, compared to [3].

7 Conclusions

We have proposed a computationally tractable approach for the very high-dimensional active perception task. The proposed 3D-reconstruction CNN outperforms a state-of-the-art approach by in recall, and it is shown that when learning is coupled with planning, recall increases by additional on the same false positive rate. The proposed prioritized greedy planning algorithm seems to be a promising direction with respect to on-board reactive control since it is about faster and requires only of ray evaluations compared to a naïve greedy solution.


The research leading to these results has received funding from the European Union under grant agreement FP7-ICT-609763 TRADR and No. 692455 Enable-S3, from the Czech Science Foundation under Project 17-08842S, and from the Grant Agency of the CTU in Prague under Project SGS16/161/OHK3/2T/13.


  • [1] E. Ackerman. Quanergy announces $250 solid-state LIDAR for cars, robots, and more. In IEEE Spectrum, January 2016.
  • [2] A. Andreopoulos, S. Hasler, H. Wersing, H. Janssen, J. K. Tsotsos, and E. Körner. Active 3D object localization using a humanoid robot. IEEE Transactions on Robotics, 27(1):47–64, 2011.
  • [3] C. B. Choy, D. Xu, J. Gwak, K. Chen, and S. Savarese. 3D-R2N2: A unified approach for single and multi-view 3D object reconstruction. In Computer Vision – ECCV 2016: 14th European Conference on, pages 628–644, Cham, 2016. Springer International Publishing.
  • [4] M. Firman, O. M. Aodha, S. Julier, and G. J. Brostow. Structured prediction of unobserved voxels from a single depth image. In

    2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    , pages 5431–5440, June 2016.
  • [5] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun. Vision meets robotics: The KITTI dataset. International Journal of Robotics Research, 32(11):1231–1237, Sept. 2013.
  • [6] D. Jayaraman and K. Grauman. Look-ahead before you leap: End-to-end active recognition by forecasting the effect of motion. In Computer Vision – ECCV 2016: 14th European Conference on, pages 489–505. Springer International Publishing, Cham, 2016.
  • [7] A. K. Mishra, Y. Aloimonos, L. F. Cheong, and A. Kassim. Active visual segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4):639–653, April 2012.
  • [8] A. Monszpart, N. Mellado, G. J. Brostow, and N. J. Mitra. RAPter: Rebuilding man-made scenes with regular arrangements of planes. ACM Trans. Graph., 34(4):103:1–103:12, July 2015.
  • [9] D. T. Nguyen, B. S. Hua, M. K. Tran, Q. H. Pham, and S. K. Yeung. A field model for repairing 3d shapes. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5676–5684, June 2016.
  • [10] C. R. Qi, H. Su, M. Nießner, A. Dai, M. Yan, and L. J. Guibas. Volumetric and multi-view CNNs for object classification on 3D data. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5648–5656, June 2016.
  • [11] J. Rock, T. Gupta, J. Thorsen, J. Gwak, D. Shin, and D. Hoiem. Completing 3d object shape from one depth image. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2484–2493, June 2015.
  • [12] H. Seifa and X. Hub. Autonomous driving in the iCity—HD maps as a key challenge of the automotive industry. Autonomous Robots, 2(2):159–162, 2016.
  • [13] C.-H. Shen, H. Fu, K. Chen, and S.-M. Hu. Structure recovery by part assembly. ACM Trans. Graph., 31(6):180:1–180:11, Nov. 2012.
  • [14] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller. Multi-view convolutional neural networks for 3D shape recognition. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 945–953, Dec 2015.
  • [15] M. Sung, V. G. Kim, R. Angst, and L. Guibas. Data-driven structural priors for shape completion. ACM Trans. Graph., 34(6):175:1–175:11, Oct. 2015.
  • [16] A. Vedaldi and K. Lenc. Matconvnet: Convolutional neural networks for matlab. In Proceedings of the 23rd ACM International Conference on Multimedia, MM ’15, pages 689–692, New York, NY, USA, 2015. ACM.
  • [17] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3D ShapeNets: A deep representation for volumetric shapes. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1912–1920, June 2015.
  • [18] B. Zheng, Y. Zhao, J. C. Yu, K. Ikeuchi, and S. C. Zhu.

    Beyond point clouds: Scene understanding by reasoning geometry and physics.

    In 2013 IEEE Conference on Computer Vision and Pattern Recognition, pages 3127–3134, June 2013.