Unpaired Point Cloud Completion on Real Scans using Adversarial Training

03/29/2019 · by Xuelin Chen, et al.

As 3D scanning solutions become increasingly popular, several deep learning setups have been developed for the task of scan completion, i.e., plausibly filling in regions that were missed in the raw scans. These methods, however, largely rely on supervision in the form of paired training data, i.e., partial scans with corresponding desired completed scans. While these methods have been successfully demonstrated on synthetic data, they cannot be directly used on real scans in the absence of suitable paired training data. We develop a first approach that works directly on input point clouds, does not require paired training data, and hence can be directly applied to real scans for scan completion. We evaluate the approach qualitatively on several real-world datasets (ScanNet, Matterport, KITTI), quantitatively on the 3D-EPN shape completion benchmark dataset, and demonstrate realistic completions under varying levels of incompleteness.


1 Introduction

Robust, efficient, and scalable solutions now exist for easily scanning large environments and workspaces [6, 3]. The resultant scans, however, are often partial and have to be completed (i.e., missing parts have to be hallucinated and filled in) before they can be used in downstream applications, e.g., virtual walk-through, path planning, etc.

The most popular data-driven scan completion methods rely on paired supervision data, i.e., for each incomplete training scan, corresponding complete data (e.g., voxels, point sets, etc.) is required. One way to establish such a shape completion network is then to train a suitably designed encoder-decoder architecture [8, 7].

Figure 1: We present a point-based shape completion network that can be directly used on raw scans without requiring paired training data. Here we show a sampling of results from the ScanNet [6], Matterport [3], KITTI [9], and 3D-EPN [8] datasets.

The required paired training data is obtained by virtually scanning 3D objects (e.g., from the SunCG [25] and ShapeNet [4] datasets) to simulate occlusion effects. Such approaches, however, are unsuited for real scans, for which large volumes of paired supervision data remain difficult to collect. To the best of our knowledge, no point-based unpaired method exists that learns a mapping to translate noisy and incomplete point clouds from raw scans to clean and complete point sets.

In this paper, we propose an unpaired point-based scan completion method that can be trained without requiring explicit correspondence between partial point sets (e.g., raw scans) and example complete shape models (e.g., synthetic models). Note that the network does not require explicit examples of real complete scans; hence, existing (unpaired) large-scale real 3D scans (e.g., [6, 3]) and virtual 3D object repositories (e.g., [25, 4]) can directly be leveraged as training data. Figure 1 shows a sampling of scan completions produced by our framework.

We achieve this by designing a generative adversarial network (GAN) wherein a generator, i.e., an adaptation network, transforms the input into a suitable latent representation such that a discriminator cannot differentiate between the transformed latent variables and the latent variables obtained from training data (i.e., complete shape models). Intuitively, the generator is responsible for the key task of mapping raw partial point sets to clean and complete point sets, and the process is regularized by working in two different latent spaces that have separately learned manifolds of scanned and synthetic object data.

We demonstrate our method on several publicly available real-world scan datasets, namely (i) ScanNet [6] chairs and tables; (ii) Matterport [3] chairs and tables; and (iii) KITTI [9] cars. In the absence of suitable ground truth, we cannot directly compute accuracy for the completed scans. Hence, in order to quantitatively evaluate the performance of the network, we report numbers on a synthetic benchmark dataset [8] for which completed versions are available to measure the accuracy of completions. Finally, we compare our method against baseline methods to demonstrate the advantages of the proposed unpaired scan completion framework.

2 Related Work

Deep Learning on Point clouds

Our method is built upon recent advances in deep neural networks for point clouds. PointNet [22] is the pioneering work in this field. For an input point set, it uses point-wise MLP layers, followed by a symmetric, permutation-invariant function that aggregates information over the whole shape into a compact global feature, which can then be used for multiple tasks (e.g., classification and segmentation). Although many alternatives to PointNet have been proposed [27, 17, 23, 16, 37] to achieve higher performance, the simplicity and effectiveness of PointNet and its extension PointNet++ [23] make them popular for many other tasks [34, 33, 35, 11].
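This building block can be written down compactly. Below is a minimal sketch (PyTorch assumed; layer widths are illustrative, and PointNet's input/feature transform networks are omitted) of point-wise MLPs followed by a symmetric max-pooling that yields a permutation-invariant global feature.

```python
import torch
import torch.nn as nn

class PointNetEncoder(nn.Module):
    """Point-wise MLP + symmetric max-pool -> one global feature per cloud."""
    def __init__(self, latent_dim=128):
        super().__init__()
        # shared (point-wise) MLP, applied identically to every point
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, latent_dim, 1),
        )

    def forward(self, points):                         # points: (batch, n, 3)
        feats = self.mlp(points.transpose(1, 2))       # (batch, latent_dim, n)
        global_feat, _ = feats.max(dim=2)              # symmetric pooling over points
        return global_feat                             # (batch, latent_dim)

# Usage: a batch of 2048-point clouds -> one 128-d code per cloud.
codes = PointNetEncoder()(torch.rand(4, 2048, 3))
```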

Recently, [1] showed that an autoencoder network derived from PointNet is able to learn compact representations of point clouds. The authors also performed a study of different generative models on point clouds, showing that training a GAN in the fixed latent space to generate point clouds from a random vector significantly outperforms training a GAN on raw point clouds. Inspired by this work, we design a GAN that translates between two different latent spaces to perform unpaired shape completion.

Shape Completion

With the advances of deep neural networks, many learning-based approaches have been proposed to address the shape completion task. Inspired by CNN-based 2D image completion, voxel grids along with 3D convolutional neural networks have been widely adopted for the shape completion task in 3D [7, 8, 24, 12, 28, 31, 29]. As quantizing the shape into voxel grids leads to a loss of geometric information, several recent approaches [36, 35, 1] that operate directly on point sets have been proposed for filling in the missing parts of point sets.

However, these works require paired supervision, i.e., partial-complete data pairs as input for training a parameterized model (usually a deep neural network) that directly maps the partial input to its ground truth shape. Since ground truth for real-world data is often unavailable, paired training and testing data is usually generated by a virtual synthesis procedure, and the methods are evaluated on synthetic datasets.

Realizing the gap between synthetically generated data and real-world data, researchers have proposed to work directly on real-world data [26]. They also work in a latent space created for clean and complete data, but measure the reconstruction loss using a maximum likelihood estimator. Instead, we propose two latent spaces and use a GAN setup to learn the translation between them. Defining the reconstruction (Hausdorff) loss on point clouds allows us to work directly with point sets, instead of having to voxelize the input.

Figure 2: Unpaired Scan Completion Network.

Generative Adversarial Network

GANs [10] have become increasingly popular and have been introduced into many other tasks since they were first proposed. In the 2D image domain, researchers have extended adversarial training to recover richer information from low-resolution or corrupted images [14, 30, 18, 20, 2, 32, 13]. In the 3D context, the authors of [31, 29] combine 3D CNNs and generative adversarial training to complete voxel-based shapes under the supervision of ground truth data.

3 Method

Given a noisy and partial point set $x$ as input, our goal is to lift the input to produce a clean and complete point set as output. (Note that there is no explicit correspondence between the two point sets; being complete, the clean point set is typically sparser than the partial input.) The challenge is to perform this lifting without paired training data. We achieve this by learning two separate point set manifolds: a latent space $\mathcal{X}$ for the scanned inputs and a latent space $\mathcal{Y}$ for clean and complete shapes. Solving the shape completion problem then amounts to learning a mapping between the two latent spaces. We train a generator $G: \mathcal{X} \to \mathcal{Y}$ to perform the mapping. In the absence of paired training data, we score the generated output, i.e., the learned mapping, by setting up a min-max game where the generator is trained to fool a discriminator $F$, whose goal is to differentiate between encoded clean and complete shapes and mapped encodings of the raw/partial inputs. Figure 2 shows the setup of the proposed scan completion network. The latent space encoders/decoders, the mapping generator, and the discriminator are all trained as detailed next.

3.1 Learning latent spaces for point sets

First, we train two autoencoders (AE) that operate directly on the noisy-partial and clean-complete point sets, respectively. We work directly on the point sets instead of quantizing them to voxel grids or signed distance fields. For the noisy-partial point sets, we learn an encoder network $E_X$ that maps from the original parameter space $\mathbb{R}^{3n}$, defined by the concatenation of the coordinates of the $n$ points in $x$ ($n = 2048$ in all our tests), to a lower-dimensional latent space $\mathcal{X}$. A decoder network $D_X$ performs the inverse transformation back to $\mathbb{R}^{3n}$, giving us a reconstructed point set with $n$ points. Both networks are trained with the reconstruction loss

$\min_{\theta_{E_X}, \theta_{D_X}} \; \mathbb{E}_{x \sim \mathcal{N}} \; d_{EMD}\big(x, D_X(E_X(x))\big),$   (1)

where $x$ denotes point set samples drawn from the set $\mathcal{N}$ of noisy and partial point sets, $d_{EMD}$ is the Earth Mover's Distance (EMD) between point sets, and $\theta_{E_X}$ and $\theta_{D_X}$ are the parameters of the encoder and decoder networks, respectively. Once trained, the weights of both networks are held fixed, and the latent code $E_X(x)$ of a raw input point set provides a more suitable representation for subsequent training, implicitly capturing the manifold of noisy-partial scans. The encoder and decoder blocks consist of MLPs, and their details are provided in the supplementary. The architecture of the encoder and decoder is similar to PointNet [1, 22]: a 5-layer MLP lifts individual points to a deeper feature space, followed by a symmetric function to maintain permutation invariance. This results in a $k$-dimensional latent code that describes the entire point cloud ($k = 128$ in all our experiments). The decoder simply transforms the latent vector using 3 fully connected layers, the first two with ReLUs, to reconstruct an $n \times 3$ point cloud.
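A minimal sketch (PyTorch assumed) of such a point-set autoencoder follows. Only the latent size $k = 128$ and the point count $n = 2048$ come from the text; the channel widths of the 5-layer point-wise MLP and the hidden sizes of the decoder are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PointAE(nn.Module):
    """PointNet-style encoder (5 point-wise layers + max-pool) and a 3-layer FC decoder."""
    def __init__(self, n_points=2048, k=128):
        super().__init__()
        self.encoder = nn.Sequential(                      # 5 point-wise layers
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, k, 1),
        )
        self.decoder = nn.Sequential(                      # 3 fully connected layers
            nn.Linear(k, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, n_points * 3),
        )
        self.n_points = n_points

    def encode(self, x):                                   # x: (batch, n, 3)
        return self.encoder(x.transpose(1, 2)).max(dim=2).values   # (batch, k)

    def decode(self, z):                                   # z: (batch, k)
        return self.decoder(z).view(-1, self.n_points, 3)  # (batch, n, 3)

    def forward(self, x):
        return self.decode(self.encode(x))
```

The same architecture is instantiated twice: once for the noisy-partial sets ($E_X$, $D_X$) and once for the clean-complete sets ($E_Y$, $D_Y$) described next.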

Similarly, for point sets coming from the set $\mathcal{C}$ of clean and complete point sets, we train another encoder and decoder pair ($E_Y$, $D_Y$) that provides a latent parameterization $\mathcal{Y}$ of the clean and complete point sets. Analogous to Eq. 1, the reconstruction loss is defined as

$\min_{\theta_{E_Y}, \theta_{D_Y}} \; \mathbb{E}_{y \sim \mathcal{C}} \; d_{EMD}\big(y, D_Y(E_Y(y))\big),$   (2)

where $y$ denotes point set samples drawn from the clean and complete point sets.

As demonstrated in the context of images [15], an encoder-decoder network implicitly performs denoising when trained on noisy images by regressing to the mean image, as the low dimensionality of the latent space prevents it from modeling the high-entropy noise. The idea naturally extends to point sets, and such an autoencoder can be used to denoise a noisy scan without requiring paired input, i.e., the denoised point set is given by $D(E(x))$ for an encoder-decoder pair ($E$, $D$) trained on such scans. As shown in Figure 3, such an AE works well for denoising complete scans, but cannot be trained to perform scan completion, which is our goal. Hence, as described next, we propose a GAN setup to learn a mapping between the latent spaces of partial and complete scans, i.e., $G: \mathcal{X} \to \mathcal{Y}$.

Figure 3: Similar to image-domain methods [15], a point-based autoencoder (AE), when trained to encode-decode noisy but complete scans, can be effectively used for denoising noisy point clouds (top). However, such an AE by itself cannot do scan completion (bottom), as is our goal.

3.2 Learning a mapping between latent spaces

We set up a min-max game between a generator $G$ and a discriminator $F$ to perform the mapping between the latent spaces. The generator is trained to perform the mapping such that the discriminator fails to reliably tell whether a latent variable comes from the clean-complete latent space $\mathcal{Y}$ or is a remapped code $G(E_X(x))$. The generator and discriminator architecture details can be found in the supplementary material.

The latent representation $E_X(x)$ of a noisy and partial scan is mapped by the generator to $G(E_X(x))$. The task of the discriminator $F$ is then to distinguish between the latent representations $E_Y(y)$ and $G(E_X(x))$.

We train the mapping network using an adversarial loss [10]. Given training examples of clean latent variables $E_Y(y)$ and remapped noisy latent variables $G(E_X(x))$, we seek to optimize the following adversarial loss over the mapping generator $G$ and a discriminator $F$:

$\min_G \max_F \; \mathbb{E}_{y \sim \mathcal{C}} \big[\log F(E_Y(y))\big] + \mathbb{E}_{x \sim \mathcal{N}} \big[\log\big(1 - F(G(E_X(x)))\big)\big].$   (3)

In our experiments, we found the least-squares GAN [19] to be easier to train and hence used the discriminator and generator losses

$\mathcal{L}_F = \tfrac{1}{2}\, \mathbb{E}_{y \sim \mathcal{C}} \big[(F(E_Y(y)) - 1)^2\big] + \tfrac{1}{2}\, \mathbb{E}_{x \sim \mathcal{N}} \big[F(G(E_X(x)))^2\big],$   (4)

$\mathcal{L}_G = \tfrac{1}{2}\, \mathbb{E}_{x \sim \mathcal{N}} \big[(F(G(E_X(x))) - 1)^2\big].$   (5)
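As a concrete reading of Equations 4 and 5, the following sketch (PyTorch tensors assumed) evaluates the least-squares objectives purely on latent codes. Here E_x and E_y stand for the frozen encoders from Section 3.1, G for the mapping generator, and F for the latent-space discriminator, all passed in as callables; none of the network definitions themselves are given here.

```python
def discriminator_loss(F, G, E_x, E_y, x_partial, y_complete):
    """Least-squares discriminator loss, Eq. 4."""
    real = F(E_y(y_complete))            # codes of clean, complete training shapes
    fake = F(G(E_x(x_partial)))          # remapped codes of noisy, partial scans
    return 0.5 * ((real - 1.0) ** 2).mean() + 0.5 * (fake ** 2).mean()

def generator_adv_loss(F, G, E_x, x_partial):
    """Least-squares generator (adversarial) loss, Eq. 5."""
    fake = F(G(E_x(x_partial)))
    return 0.5 * ((fake - 1.0) ** 2).mean()
```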

This adversarial setup encourages the generator to perform the mapping such that the decoded output $D_Y(G(E_X(x)))$ is a clean and complete point cloud. However, the generator is free to map a noisy latent vector to any point on the manifold of valid shapes in $\mathcal{Y}$, including shapes that are far from the original partial scan $x$. As shown in Figure 4, the result is then a complete and clean point cloud that is not similar in shape to the partial scanned input. To prevent this, we add a reconstruction term to the generator loss:

$\mathcal{L}_G = \alpha\, \tfrac{1}{2}\, \mathbb{E}_{x \sim \mathcal{N}} \big[(F(G(E_X(x))) - 1)^2\big] + \beta\, \mathbb{E}_{x \sim \mathcal{N}}\; d_{HL}\big(x, D_Y(G(E_X(x)))\big),$   (6)

where $d_{HL}$ denotes the Hausdorff distance between two point sets, and $\alpha$ and $\beta$ are trade-off parameters that we keep fixed across all our experiments unless otherwise specified.
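The combined generator loss can be sketched as below (PyTorch assumed). Treating the Hausdorff term as one-sided, measured from the partial input towards the decoded completion, follows the discussion in Section 4.6 that HL only asks the output to cover the observed points; this directionality is our reading of the text rather than a confirmed implementation detail, and alpha = beta = 1 are placeholder defaults, not the paper's values.

```python
import torch

def hausdorff_partial_to_output(x_partial, y_pred):
    # x_partial: (batch, n, 3) observed points; y_pred: (batch, m, 3) completion
    d = torch.cdist(x_partial, y_pred)         # (batch, n, m) pairwise distances
    nearest = d.min(dim=2).values              # distance from each input point to the output
    return nearest.max(dim=1).values.mean()    # worst-covered input point, averaged over batch

def generator_loss(F, G, E_x, D_y, x_points, alpha=1.0, beta=1.0):
    z = G(E_x(x_points))                       # remapped latent code
    adv = 0.5 * ((F(z) - 1.0) ** 2).mean()     # least-squares GAN term (Eq. 5)
    rec = hausdorff_partial_to_output(x_points, D_y(z))
    return alpha * adv + beta * rec            # Eq. 6 (alpha, beta: trade-off weights)
```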

Figure 4: Effect of unpaired scan completion without (Equation 5) and with the Hausdorff loss (Equation 6). Without the HL term, the network produces a clean and complete chair that differs in shape from the input. With the HL term, the network produces a clean point set that matches the input.
Figure 5: Qualitative comparisons on 3D-EPN dataset.

4 Experimental Evaluation

In this section, we present quantitative and qualitative experimental results on several noisy and partial datasets. First, to evaluate our method, we present results on the 3D-EPN dataset [8], which provides ground truth data for evaluation, and show comparisons to several baseline methods. Second, since the 3D-EPN dataset offers no control over point cloud incompleteness, we derive a synthetic dataset based on ShapeNet [4], on which we investigate the performance of our method under different levels of input incompleteness. Last, we conduct experiments on real-world datasets (ScanNet, Matterport, and KITTI), as working directly on real-world data is the main focus of our work.

4.1 Datasets

Clean and Complete Point Sets

are obtained by virtually scanning models from ShapeNet. We use a subset of 4 categories, namely chair, table, plane, and car, in our experiments. To generate the clean and complete point set of a model, we virtually scan the model by performing ray-intersection tests from virtual scanner cameras placed around it to obtain a dense point set, followed by a down-sampling procedure to obtain a relatively sparse point set of $n$ points (as in Section 3.1). Note that we use the models without any pose or scale augmentation.
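The down-sampling procedure is not specified further in the text; farthest point sampling is one common choice for producing an evenly spread subset and is sketched here (NumPy) purely for illustration.

```python
import numpy as np

def farthest_point_sample(points, n_out=2048):
    """Greedily keep points that are far from those already kept."""
    # points: (N, 3) dense virtual scan, N >= n_out
    chosen = [np.random.randint(len(points))]
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for _ in range(n_out - 1):
        idx = int(np.argmax(dist))                                   # farthest remaining point
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return points[chosen]
```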

These clean and complete point sets are used in all our experiments as the training data for learning the clean and complete point set manifold. The following datasets, with different data distributions, serve as the partial input data to our method.

3D-EPN Dataset

provides partial reconstructions of ShapeNet objects (8 categories), created by using the volumetric fusion method [5] to integrate depth maps scanned along a virtual scanning trajectory around each model. For each model, a set of trajectories with different levels of incompleteness is generated in order to reflect real-world scanning with a hand-held commodity RGB-D sensor. The entire dataset covers 8 categories and a total of 25590 object instances (the test set comprises 5384 models). We take a subset of 4 categories (namely chair, table, plane, and car) from the 3D-EPN training data, along with the corresponding test data. Note that in the original 3D-EPN dataset, training data is represented as a Signed Distance Field (SDF) and test data as a Distance Field (DF). As our method works on pure point sets, we only use the point cloud representations of the training data provided by the authors, instead of the SDF data, which holds richer information and is claimed in [8] to be crucial for completing partial data.

model   |        AE        |   Ours w/o GAN   |      3D-EPN      |       Ours       |      Ours+
        | acc. comp.  F1   | acc. comp.  F1   | acc. comp.  F1   | acc. comp.  F1   | acc. comp.  F1
chair   | 75.3  63.4  68.8 | 25.8  67.0  37.3 | 65.3  75.5  70.0 | 58.6  61.3  60.0 | 75.9  73.8  74.9
table   | 82.6  72.8  77.4 | 32.6  75.9  45.6 | 66.8  74.6  70.5 | 61.1  72.5  66.3 | 83.3  82.5  82.9
plane   | 88.9  82.6  85.7 | 31.9  97.8  48.2 | 90.0  88.2  89.1 | 85.5  80.6  83.0 | 96.0  93.6  94.8
car     | 56.4  54.4  55.4 | 51.1  91.4  65.6 | 60.5  73.2  66.2 | 77.0  75.0  76.0 | 92.7  92.2  92.4
Table 1: Comparison of ours with various baselines on the 3D-EPN benchmark dataset. Note that 3D-EPN [8] requires paired supervision data, while ours does not and hence can be used also for completing raw scans.

Synthetic Data Generation

serves the purpose of controlling the incompleteness of the partial input, in order to test the performance of our method under varying levels of incompleteness. We again use ShapeNet to generate a synthetic dataset in which we can control the incompleteness of the synthetic partial point sets. To generate the partial input data with incompleteness control, we take a subset of 4 categories (chair, table, plane, and car) of ShapeNet objects and, for each category, split the models into 90%/10% train/test sets. For each model, we have already scanned its clean and complete point set (as described earlier in this subsection); we randomly choose a point from this point set and remove its nearest neighbor points, where the fraction of removed points controls the incompleteness of the synthetically generated input. Furthermore, we add Gaussian noise to each point (with mean 0 and standard deviation 0.01 in all our experiments). Last, we duplicate points in the resulting point sets to generate point sets with an equal number of points.
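A minimal sketch (NumPy) of this synthetic partial-data generation under the description above: remove the neighborhood of a random seed point (the removed fraction controls the incompleteness), add Gaussian noise with sigma = 0.01, and duplicate points so every sample returns to a fixed size. The default incompleteness value is an illustrative assumption.

```python
import numpy as np

def make_partial(points, incompleteness=0.3, sigma=0.01):
    """Turn a clean, complete point set (n, 3) into a noisy, partial one of the same size."""
    n = len(points)                                          # e.g. n = 2048
    seed = points[np.random.randint(n)]
    d = np.linalg.norm(points - seed, axis=1)
    keep = np.argsort(d)[int(incompleteness * n):]           # drop the seed's nearest neighbors
    partial = points[keep] + np.random.normal(0.0, sigma, (len(keep), 3))
    pad = np.random.choice(len(partial), n - len(partial))   # duplicate points back up to n
    return np.concatenate([partial, partial[pad]], axis=0)
```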

Real-world Data

comes from three sources. The first is derived from the ScanNet dataset, which provides many mesh objects that have been pre-segmented from their surrounding environments. For the purpose of training and testing our network, we extract 550 chair objects and 550 table objects from ScanNet and manually align them to be consistently oriented with models in the ShapeNet dataset. We also split these objects into 90%/10% train/test sets.

The second consists of 20 chairs and 20 tables from the Matterport dataset, processed with the same extraction and alignment procedure as used for ScanNet. Note that we train our method only on ScanNet training data and use the trained model to test on Matterport data, to show how our method generalizes to entirely unseen data. For both the ScanNet and Matterport datasets, we uniformly sample points on the surface mesh of each object to obtain the partial input.

Last, we extract car observations from the KITTI dataset using the provided ground truth bounding boxes for training and testing our method. We use KITTI Velodyne point clouds from the 3D object detection benchmark and the split of [21]. We filter the observations such that each car observation contains at least 100 points to avoid overly sparse observations.
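A minimal sketch (NumPy) of this extraction step: keep the Velodyne points that fall inside a ground-truth box and discard observations with fewer than 100 points. The axis-aligned box test is a simplification for illustration; the actual KITTI annotations are oriented boxes.

```python
import numpy as np

def extract_car(velodyne_points, box_min, box_max, min_points=100):
    # velodyne_points: (N, 3); box_min, box_max: (3,) opposite corners of the bounding box
    inside = np.all((velodyne_points >= box_min) & (velodyne_points <= box_max), axis=1)
    car = velodyne_points[inside]
    return car if len(car) >= min_points else None           # filter overly sparse observations
```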

4.2 Evaluation Metrics

To evaluate the completion results against the ground truth, we adopt three standard metrics. Analogous to precision, recall, and the F1 score, we define the following:

Accuracy

Let $Y_{gt}$ denote the ground truth point set and $\tilde{Y}$ denote the point set completed from the partial input. Accuracy measures the fraction of points in $\tilde{Y}$ that are matched by the ground truth $Y_{gt}$. More specifically, for each point $p \in \tilde{Y}$, we compute the distance to its nearest neighbor in $Y_{gt}$; if this distance is within a threshold $\epsilon$, we count the point as a correct match. We report the fraction of matched points.

Completeness

Similarly, completeness records the fraction of points in $Y_{gt}$ that are within the distance threshold $\epsilon$ of any point in $\tilde{Y}$.

F1

We define the completion F1 score by the harmonic average of the accuracy and completeness, where F1 score reaches its best value at 1 (perfect accuracy and completeness) and worst at 0.
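A minimal sketch (NumPy) of the three metrics defined above; the distance threshold eps is left unspecified in this section and is therefore a parameter here.

```python
import numpy as np

def completion_metrics(pred, gt, eps):
    # pred: (m, 3) completed point set; gt: (k, 3) ground-truth point set
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=2)   # (m, k) pairwise distances
    accuracy = (d.min(axis=1) < eps).mean()       # predicted points matched by the ground truth
    completeness = (d.min(axis=0) < eps).mean()   # ground-truth points covered by the prediction
    f1 = 2 * accuracy * completeness / max(accuracy + completeness, 1e-8)
    return accuracy, completeness, f1
```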

In the following, we present all experimental and evaluation results. We train separate networks for each category in the dataset; during testing, the ground truth class label of the input shape is used to determine which network to use. Note that the ground truth shape is only used for evaluating our method, as our method does not require ground truth shapes for training. More training details can be found in the supplementary.

4.3 Experimental Results on 3D-EPN Data

We train our network using the 3D-EPN partial training data and test on its test data to obtain completion results. We compare our method to several baseline methods (listed below) and present both quantitative and qualitative comparisons on the 3D-EPN test set:

  • Autoencoder (AE). The autoencoder trained only with clean and complete point sets.

  • Ours w/o GAN. To compare with the idea of [26], in which there is no adversarial training and the network only optimizes in a single clean latent space, we modify our network by setting $\alpha = 0$ in Equation 6 to switch off the adversarial training module, showing that adversarial training is crucial to our success.

  • 3D-EPN method. A supervised method, which requires richer (SDF) and paired data and is trained with supervision from the ground truth. We obtained the results of the 3D-EPN method from the authors, then converted their distance field results into surface meshes, from which we uniformly sample points for calculating our point-based metrics.

  • Ours+. Since our method receives no supervision from the ground truth data, we also adapt our network to train with the ground truth for a fair comparison. More specifically, we use an EMD loss between the output and the ground-truth complete point set as the reconstruction term. More details and discussion about adapting our method to leverage supervision from the ground truth can be found in the supplementary.

Figure 6: Real-world data results. Top: ScanNet [6] chairs and tables. Bottom: Matterport [3] chairs and tables.
Figure 7: Completion results on KITTI [9] cars.

Table 1 shows quantitative results for the 4 classes on the 3D-EPN test set and summarizes the comparisons: while our network is trained with unpaired data, we still achieve performance comparable to that of 3D-EPN, which is trained with ground truth. After adapting our method to be supervised by the ground truth, Ours+ achieves the best F1 score over all other methods. In addition, switching off adversarial training leads to a dramatic decrease in completion performance. Last, we find that a simple autoencoder network trained with only clean and complete data can produce quantitatively good results, especially when the input scan is nearly complete.

Furthermore, we present qualitative comparisons in Figure 5, showing side by side the partial input, the AE, 3D-EPN, Ours, and Ours+ results, and the ground truth point set. Even though our method is not quantitatively the best among all these methods, our results are qualitatively very plausible, as the generator is restricted to generate point sets from the learned clean and complete shape manifold.

model   |   Ours w/o GAN   | Ours w/o recons. loss | Ours w/ EMD loss |       Ours
        | acc. comp.  F1   |  acc.  comp.  F1      | acc. comp.  F1   | acc. comp.  F1
chair   | 25.8  67.0  37.3 |  43.6  43.0   43.3    | 45.5  42.8  44.1 | 58.6  61.3  60.0
table   | 32.6  75.9  45.6 |  48.9  42.0   45.2    | 66.1  65.9  66.0 | 61.1  72.5  66.3
plane   | 31.9  97.8  48.2 |  83.6  76.9   80.1    | 83.3  77.4  80.3 | 85.5  80.6  83.0
car     | 51.1  91.4  65.6 |  72.4  70.8   71.6    | 79.1  77.7  78.4 | 77.0  75.0  76.0
Table 2: Ablation study showing the importance of the various terms in our proposed network.

4.4 Experimental Results on Synthetic Data

To further evaluate our method under different levels of input incompleteness, we conduct experiments on our synthetic data, in which we can control the fraction of missing points. More specifically, we train our network with varying levels of incompleteness by randomizing the incompleteness level during training, and afterwards fix it at each tested level during testing. Table 4 shows the performance evaluation of the different classes under increasing amounts of incompleteness.

4.5 Experimental Results on Real-world Data

To evaluate the applicability of our method to real-world data, we train and test our network on noisy and partial chairs and tables extracted from the ScanNet dataset. We further test the network trained on ScanNet on chairs and tables extracted from the Matterport dataset, to show how well our network generalizes to entirely unseen data. We present a qualitative comparison of our method and the AE in Figure 6: on real-world data, the AE fails to complete the noisy and partial point sets with high quality for most examples, while our method consistently produces highly plausible completions for ScanNet and Matterport data. Note that our network is trained with only around 500 training samples.

Completing the car observations from KITTI is extremely challenging, as each car instance receives only a handful of measurements from the LiDAR scanner. Figure 7 shows qualitative results of our method on completing the sparse point sets of KITTI cars; our network can still generate highly plausible cars from such sparse inputs.

Since ground truth for these real-world chair, table, and car data is unavailable for evaluation, we use a point-based object part segmentation network (as described in [23]) to indirectly evaluate our completion results on real-world data. In the absence of ground truth segmentation for our output completion point sets, we compute an approximated segmentation accuracy for each completion result; a sketch of this metric follows Table 3. For example, for a point set of the chair class, we count the predicted segmentation label of a point as correct as long as it falls into the set of 4 parts of the chair class (namely, seat, back, leg, and armrest). Table 3 shows that our completions significantly improve segmentation accuracy over the raw inputs.

        | raw input | completion
chair   |   24.8    |   77.2
table   |   83.5    |   96.4
car     |    5.2    |   98.0
Table 3: Segmentation accuracy (%) on the real-world data.
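A minimal sketch (NumPy) of the approximated segmentation accuracy referenced above: a predicted per-point part label counts as correct whenever it belongs to the part set of the object's class (seat, back, leg, and armrest for chairs). The numeric label ids are hypothetical placeholders.

```python
import numpy as np

CHAIR_PARTS = {0, 1, 2, 3}   # hypothetical ids for seat, back, leg, armrest

def approx_seg_accuracy(predicted_labels, valid_parts=CHAIR_PARTS):
    """Fraction of points whose predicted part label belongs to the object's class."""
    predicted_labels = np.asarray(predicted_labels)
    correct = np.isin(predicted_labels, list(valid_parts))
    return correct.mean()
```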

4.6 Ablation Study

  • Ours w/o GAN is an ablation of adversarial training: to verify the effectiveness of adversarial training in our network, we switch off the GAN module by setting $\alpha = 0$ in Equation 6.

  • Ours w/o reconstruction loss is an ablation to verify the effectiveness of the reconstruction term in the generator loss.

  • Ours w/ EMD loss shows that using the Hausdorff distance for the reconstruction loss (HL) is superior to an EMD loss, as HL only guides the network to generate a point set that partially matches the input, whereas an EMD loss would force the network to reconstruct the entire partial input.

Table 2 shows quantitative results for all ablation experiments, demonstrating the importance of the various modules in our proposed network.

incomp. | model | acc. comp.  F1   | model | acc. comp.  F1
  10    | chair | 80.7  84.8  82.7 | plane | 94.2  95.4  94.8
  20    |       | 76.4  78.7  77.6 |       | 92.9  93.6  93.2
  30    |       | 72.4  72.5  72.5 |       | 90.6  92.0  91.3
  40    |       | 67.3  66.3  66.8 |       | 88.5  89.8  89.2
  50    |       | 62.2  61.9  62.1 |       | 88.6  90.0  89.3
  10    | table | 85.0  87.7  86.3 | car   | 82.2  81.4  81.8
  20    |       | 82.2  84.2  83.2 |       | 79.7  78.5  79.1
  30    |       | 79.2  79.6  79.4 |       | 76.6  75.0  75.8
  40    |       | 75.5  73.3  74.4 |       | 72.6  71.8  72.2
  50    |       | 72.5  68.8  70.6 |       | 66.9  66.5  66.7
Table 4: Performance evaluation (Ours) on different classes under increasing amounts of incompleteness (incomp. denotes the percentage of the shape left without sampled points). Note that although F1 scores go down with increasing incompleteness, the output continues to remain a valid, complete, and clean scan, as our formulation restricts generation to the learned shape manifolds.

5 Conclusion

We presented a point-based unpaired shape completion framework that can be applied directly to raw partial scans to obtain clean and complete point clouds. At the core of the algorithm is an adaptation network acting as a generator that transforms latent code encodings of the raw point scans and maps them to latent code encodings of clean and complete object scans. The two latent spaces regularize the problem by restricting the transfer to the respective data manifolds. We extensively evaluated our method on real and virtual scans, demonstrating that our approach consistently leads to plausible completions and performs better than other unpaired methods. This work opens up the possibility of generalizing our approach to scene-level scan completion, rather than object-specific completion. Another interesting future direction is to combine point and image features to apply the completion setup to both geometry and texture details.

References

  • [1] P. Achlioptas, O. Diamanti, I. Mitliagkas, and L. Guibas. Learning representations and generative models for 3d point clouds. In Proc. Int. Conf. on Machine Learning, pages 40–49, 2018.
  • [2] A. Bulat, J. Yang, and G. Tzimiropoulos. To learn image super-resolution, use a gan to learn how to do image degradation first. In Proc. Euro. Conf. on Comp. Vis., pages 185–200, 2018.
  • [3] A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Nießner, M. Savva, S. Song, A. Zeng, and Y. Zhang. Matterport3D: Learning from RGB-D data in indoor environments. 2017.
  • [4] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu. ShapeNet: An Information-Rich 3D Model Repository. Technical Report arXiv:1512.03012 [cs.GR], Stanford University — Princeton University — Toyota Technological Institute at Chicago, 2015.
  • [5] B. Curless and M. Levoy. A volumetric method for building complex models from range images. In Proc. SIGGRAPH, 1996.
  • [6] A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2017.
  • [7] A. Dai, D. Ritchie, M. Bokeloh, S. Reed, J. Sturm, and M. Nießner. Scancomplete: Large-scale scene completion and semantic segmentation for 3d scans. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2018.
  • [8] A. Dai, C. Ruizhongtai Qi, and M. Nießner. Shape completion using 3d-encoder-predictor cnns and shape synthesis. In Proc. Int. Conf. on Comp. Vis., pages 5868–5877, 2017.
  • [9] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. In Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
  • [10] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014.
  • [11] P. Guerrero, Y. Kleiman, M. Ovsjanikov, and N. J. Mitra. Pcpnet learning local shape properties from raw point clouds. In Computer Graphics Forum, volume 37, pages 75–85, 2018.
  • [12] X. Han, Z. Li, H. Huang, E. Kalogerakis, and Y. Yu. High-resolution shape completion using deep neural networks for global structure and local geometry inference. In Proc. Int. Conf. on Comp. Vis., pages 85–93, 2017.
  • [13] S. Iizuka, E. Simo-Serra, and H. Ishikawa. Globally and locally consistent image completion. ACM Trans. on Graph., 36(4):107, 2017.
  • [14] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4681–4690, 2017.
  • [15] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila. Noise2noise: Learning image restoration without clean data. CoRR, abs/1803.04189, 2018.
  • [16] J. Li, B. M. Chen, and G. Hee Lee. So-net: Self-organizing network for point cloud analysis. In Proc. IEEE Conf. on Comp. Vis. and Pat. Rec., pages 9397–9406, 2018.
  • [17] Y. Li, R. Bu, M. Sun, W. Wu, X. Di, and B. Chen. Pointcnn: Convolution on x-transformed points. In Advances in Neural Information Processing Systems, 2018.
  • [18] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. Paul Smolley. Least squares generative adversarial networks. In Proc. Int. Conf. on Comp. Vis., pages 2794–2802, 2017.
  • [19] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, and Z. Wang. Multi-class generative adversarial networks with the L2 loss function. CoRR, abs/1611.04076, 2016.
  • [20] S.-J. Park, H. Son, S. Cho, K.-S. Hong, and S. Lee. Srfeat: Single image super-resolution with feature discrimination. In Proc. Euro. Conf. on Comp. Vis., pages 439–455, 2018.
  • [21] C. R. Qi, W. Liu, C. Wu, H. Su, and L. J. Guibas. Frustum pointnets for 3d object detection from rgb-d data. In Proc. IEEE Conf. on Comp. Vis. and Pat. Rec., pages 918–927, 2018.
  • [22] C. R. Qi, H. Su, K. Mo, and L. J. Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proc. IEEE Conf. on Comp. Vis. and Pat. Rec., pages 652–660, 2017.
  • [23] C. R. Qi, L. Yi, H. Su, and L. J. Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems, pages 5099–5108, 2017.
  • [24] A. Sharma, O. Grau, and M. Fritz. Vconv-dae: Deep volumetric shape learning without object labels. In Proc. Euro. Conf. on Comp. Vis., pages 236–250, 2016.
  • [25] S. Song, F. Yu, A. Zeng, A. X. Chang, M. Savva, and T. Funkhouser. Semantic scene completion from a single depth image. Proceedings of 29th IEEE Conference on Computer Vision and Pattern Recognition, 2017.
  • [26] D. Stutz and A. Geiger. Learning 3d shape completion from laser scan data with weak supervision. In Proc. IEEE Conf. on Comp. Vis. and Pat. Rec., pages 1955–1964, 2018.
  • [27] H. Su, V. Jampani, D. Sun, S. Maji, E. Kalogerakis, M.-H. Yang, and J. Kautz. Splatnet: Sparse lattice networks for point cloud processing. In Proc. IEEE Conf. on Comp. Vis. and Pat. Rec., pages 2530–2539, 2018.
  • [28] D. Thanh Nguyen, B.-S. Hua, K. Tran, Q.-H. Pham, and S.-K. Yeung. A field model for repairing 3d shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5676–5684, 2016.
  • [29] W. Wang, Q. Huang, S. You, C. Yang, and U. Neumann. Shape inpainting using 3d generative adversarial network and recurrent convolutional networks. In Proc. Int. Conf. on Comp. Vis., pages 2298–2306, 2017.
  • [30] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. C. Loy. Esrgan: Enhanced super-resolution generative adversarial networks. In Proc. Euro. Conf. on Comp. Vis., pages 63–79. Springer, 2018.
  • [31] B. Yang, S. Rosa, A. Markham, N. Trigoni, and H. Wen. 3d object dense reconstruction from a single depth view. arXiv preprint arXiv:1802.00411, 1(2):6, 2018.
  • [32] R. A. Yeh, C. Chen, T. Yian Lim, A. G. Schwing, M. Hasegawa-Johnson, and M. N. Do. Semantic image inpainting with deep generative models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5485–5493, 2017.
  • [33] K. Yin, H. Huang, D. Cohen-Or, and H. Zhang. P2p-net: bidirectional point displacement net for shape transform. ACM Trans. on Graph., 37(4):152, 2018.
  • [34] L. Yu, X. Li, C.-W. Fu, D. Cohen-Or, and P.-A. Heng. Ec-net: an edge-aware point set consolidation network. In Proc. Euro. Conf. on Comp. Vis., pages 386–402, 2018.
  • [35] L. Yu, X. Li, C.-W. Fu, D. Cohen-Or, and P.-A. Heng. Pu-net: Point cloud upsampling network. In Proc. IEEE Conf. on Comp. Vis. and Pat. Rec., pages 2790–2799, 2018.
  • [36] W. Yuan, T. Khot, D. Held, C. Mertz, and M. Hebert. Pcn: Point completion network. In 2018 International Conference on 3D Vision (3DV), pages 728–737, 2018.
  • [37] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola. Deep sets. In Advances in Neural Information Processing Systems, 2017.