1 Introduction
Robust, efficient, and scalable solutions now exist for easily scanning large environments and workspaces [6, 3]. The resultant scans, however, are often partial and have to be completed (i.e., missing parts have to be hallucinated and filled in) before they can be used in downstream applications, e.g., virtual walkthrough, path planning, etc.
The most popular data-driven scan completion methods rely on paired supervision, i.e., for each incomplete training scan, corresponding complete data (e.g., voxels, point sets, etc.) is required. One way to establish such a shape completion network is then to train a suitably designed encoder-decoder architecture [8, 7]. The required paired training data is obtained by virtually scanning 3D objects (e.g., the SunCG [25] and ShapeNet [4] datasets) to simulate occlusion effects. Such approaches, however, are unsuited for real scans, where large volumes of paired supervision data remain difficult to collect. To the best of our knowledge, no point-based unpaired method exists that learns a mapping to translate noisy and incomplete point clouds from raw scans to clean and complete point sets.
In this paper, we propose an unpaired point-based scan completion method that can be trained without requiring explicit correspondence between partial point sets (e.g., raw scans) and example complete shape models (e.g., synthetic models). Note that the network does not require explicit examples of real complete scans, and hence existing (unpaired) large-scale real 3D scans (e.g., [6, 3]) and virtual 3D object repositories (e.g., [25, 4]) can be leveraged directly as training data. Figure 1 shows some example scan completions produced by our framework.
We achieve this by designing a generative adversarial network (GAN) wherein a generator, i.e., an adaptation network, transforms the input into a suitable latent representation such that a discriminator cannot differentiate between the transformed latent variables and the latent variables obtained from training data (i.e., complete shape models). Intuitively, the generator is responsible for the key task of mapping raw partial point sets to clean and complete point sets, and the process is regularized by working in two different latent spaces that have separately learned the manifolds of scanned and synthetic object data.
We demonstrate our method on several publicly available real-world scan datasets, namely (i) ScanNet [6] chairs and tables; (ii) Matterport [3] chairs and tables; and (iii) KITTI [9] cars. In the absence of suitable ground truth, we cannot directly compute accuracy for the completed scans. Hence, in order to quantitatively evaluate the performance of the network, we report numbers on a synthetic benchmark dataset [8] where complete versions are available to measure the accuracy of completions. Finally, we compare our method against baseline methods to demonstrate the advantages of the proposed unpaired scan completion framework.
2 Related Work
Deep Learning on Point Clouds
Our method is built upon recent advances in deep neural networks for point clouds. PointNet [22] is the pioneering work in this field. For an input point set, it uses point-wise MLP layers, followed by a symmetric, permutation-invariant function to aggregate information over the whole shape and obtain a compact global feature, which can then be used for multiple tasks (e.g., classification and segmentation). Although many alternatives to PointNet have been proposed [27, 17, 23, 16, 37] to achieve higher performance, the simplicity and effectiveness of PointNet and its extension PointNet++ make them popular for many other tasks [34, 33, 35, 11]. Recently, [1] showed that their autoencoder network, which is derived from PointNet, is able to learn compact representations of point clouds. The authors also performed a study of different generative models on point clouds, showing the significant advantage of training a GAN in a fixed latent space to generate point clouds from a random vector, over training directly on raw point clouds. Inspired by this work, we design a GAN that translates between two different latent spaces to perform unpaired shape completion.
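To make the symmetric-function idea concrete, here is a minimal NumPy sketch (not the paper's implementation; all function names and layer sizes are illustrative) of a PointNet-style global feature: a shared per-point MLP followed by max pooling, which is invariant to the ordering of the input points.

```python
import numpy as np

def pointwise_mlp(points, weights, biases):
    """Apply the same small MLP (ReLU) to every point independently."""
    h = points
    for W, b in zip(weights, biases):
        h = np.maximum(h @ W + b, 0.0)  # weights shared across points
    return h

def global_feature(points, weights, biases):
    """Lift each point, then max-pool over the point axis.

    Max pooling is symmetric, so any permutation of the input
    points yields the same global feature vector."""
    return pointwise_mlp(points, weights, biases).max(axis=0)

rng = np.random.default_rng(0)
pts = rng.normal(size=(2048, 3))                 # one point cloud
Ws = [rng.normal(size=(3, 64)), rng.normal(size=(64, 128))]
bs = [np.zeros(64), np.zeros(128)]

f1 = global_feature(pts, Ws, bs)
f2 = global_feature(pts[rng.permutation(2048)], Ws, bs)  # shuffled input
assert np.allclose(f1, f2)  # permutation invariance
```

The same pooling trick is what lets the encoders described in Section 3.1 summarize a whole cloud in a single fixed-size code.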
Shape Completion
With the advances of deep neural networks, many learning-based approaches have been proposed to address the shape completion task. Inspired by CNN-based 2D image completion, voxels along with 3D convolutional neural networks have been widely adopted for shape completion in 3D [7, 8, 24, 12, 28, 31, 29]. As quantizing the shape to a voxel grid leads to a loss of geometric information, several recent approaches [36, 35, 1] that operate directly on point sets have been proposed for filling in the missing parts of point sets. However, these works require paired supervision, i.e., partial-complete paired data as input for training a parameterized model (usually a deep neural network) that directly maps the partial input to its ground truth shape. Since ground truth for real-world data is often unavailable, paired training and testing data is usually generated by a virtual synthesis procedure, and the methods are tested on synthetic datasets.
Realizing the gap between synthetically-generated data and real-world data, researchers have proposed to directly work on real-world data [26]. They also work in a latent space created for clean and complete data, but measure reconstruction loss using a maximum likelihood estimator. Instead, we propose two latent spaces and use a GAN setup to learn the translation between the two spaces. Defining the reconstruction Hausdorff loss on point clouds allows us to directly work with point sets, instead of having to voxelize the input.
Generative Adversarial Network
Since its introduction, GAN [10] has become increasingly popular and has been adopted in many other tasks. In the 2D image domain, researchers have extended adversarial training to recover richer information from low-resolution or corrupted images [14, 30, 18, 20, 2, 32, 13]. In the 3D context, the authors of [31, 29] combine 3D CNNs and generative adversarial training to complete voxel-based shapes under the supervision of ground truth data.
3 Method
Given a noisy and partial point set as input, our goal is to lift the input to produce a clean and complete point set as output. (Note that there is no explicit correspondence between the two point sets, and the clean and complete point set is typically sparser than the partial input.) The challenge is to perform this lifting without paired training data. We achieve this by learning two separate point set manifolds, one for the scanned inputs and one for clean and complete shapes. Solving the shape completion problem then amounts to learning a mapping between the two latent spaces. We train a generator to perform the mapping. In the absence of paired training data, we score the generated output, i.e., the learned mapping, by setting up a min-max game where the generator is trained to fool a discriminator whose goal is to differentiate between encoded clean and complete shapes and mapped encodings of the raw partial inputs. Figure 2 shows the setup of the proposed scan completion network. The latent space encoders/decoders, the mapping generator, and the discriminator are all trained as detailed next.
3.1 Learning latent spaces for point sets
First, we train two autoencoders (AE) that operate directly on the noisy-partial and clean-complete point sets, respectively. We work directly on the point sets instead of quantizing them to voxel grids or signed distance fields. For the noisy and partial point sets $\mathcal{X}$, we learn an encoder network $E_{\mathcal{X}}$ that maps from the original parameter space, defined by the concatenation of the coordinates of the $n$ points ($n = 2048$ in all our tests), to a lower-dimensional latent space $\mathcal{Z}_{\mathcal{X}}$. A decoder network $D_{\mathcal{X}}$ performs the inverse transformation back to the original space, giving us a reconstructed point set with $n$ points. Both networks are trained with a reconstruction loss,
$$\mathcal{L}_{AE}^{\mathcal{X}} = \mathbb{E}_{x \sim \mathcal{X}} \; d_{EMD}\big(x, \; D_{\mathcal{X}}(E_{\mathcal{X}}(x; \theta_E); \theta_D)\big) \qquad (1)$$
where $x$ denotes point set samples drawn from the set of noisy and partial point sets, $d_{EMD}$ is the Earth Mover's Distance (EMD) between point sets, and $\theta_E$ and $\theta_D$ are the parameters of the encoder and decoder networks, respectively. Once trained, the weights of both networks are held fixed, and the latent code $z_x = E_{\mathcal{X}}(x)$ for a raw input point set provides a more suitable representation for subsequent training and implicitly captures the manifold of noisy-partial scans. The encoder and decoder blocks consist of MLPs and their details are provided in the supplementary. The architecture of the encoder and decoder is similar to PointNet [1, 22]: a 5-layer MLP lifts individual points to a deeper feature space, followed by a symmetric function to maintain permutation invariance. This results in a $k$-dimensional latent code that describes the entire point cloud ($k = 128$ in all our experiments). The decoder simply transforms the latent vector using 3 fully connected layers, the first two with ReLUs, to reconstruct an $n \times 3$ point cloud.
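The EMD used in the reconstruction loss is the minimum, over all bijections between two equal-size point sets, of the average matched-pair distance. The brute-force sketch below (illustrative name `emd_bruteforce`, tiny sets only; practical systems use approximate solvers) makes the definition explicit.

```python
import itertools
import numpy as np

def emd_bruteforce(a, b):
    """Earth Mover's Distance between two equal-size point sets,
    computed by exhaustive search over bijections (tiny sets only)."""
    n = len(a)
    best = float("inf")
    for perm in itertools.permutations(range(n)):
        cost = sum(np.linalg.norm(a[i] - b[j]) for i, j in enumerate(perm))
        best = min(best, cost / n)
    return best

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
assert emd_bruteforce(a, b) == 0.0  # same set, different order
```

Because EMD matches every point, it is a natural loss for whole-shape reconstruction, in contrast to the partial-matching Hausdorff term used later for the generator.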
Similarly, for point sets coming from the set of clean and complete point sets $\mathcal{Y}$, we train another encoder-decoder pair $(E_{\mathcal{Y}}, D_{\mathcal{Y}})$ that provides a latent parameterization $\mathcal{Z}_{\mathcal{Y}}$ for the clean and complete point sets. Analogous to Eq. 1, the reconstruction loss is defined as,
$$\mathcal{L}_{AE}^{\mathcal{Y}} = \mathbb{E}_{y \sim \mathcal{Y}} \; d_{EMD}\big(y, \; D_{\mathcal{Y}}(E_{\mathcal{Y}}(y; \theta_E); \theta_D)\big) \qquad (2)$$
where $y$ denotes point set samples drawn from the clean and complete point sets.
As demonstrated in the context of images [15], an encoder-decoder network implicitly performs denoising when trained on noisy images by regressing to the mean image, as the low dimensionality of the latent space prevents it from modeling the high-entropy noise. The idea naturally extends to point sets, and such an autoencoder can be used to denoise a noisy scan without requiring paired input, i.e., the denoised point set is given by $D_{\mathcal{Y}}(E_{\mathcal{Y}}(y))$. As shown in Figure 3, such an AE works well for denoising complete scans, but cannot be trained to perform scan completion, which is our goal. Hence, as described next, we propose a GAN setup to learn a mapping between the latent spaces of partial and complete scans, i.e., $G: \mathcal{Z}_{\mathcal{X}} \rightarrow \mathcal{Z}_{\mathcal{Y}}$.
3.2 Learning a mapping between latent spaces
We set up a min-max game between a generator and a discriminator to perform the mapping between the latent spaces. The generator $G$ is trained to perform the mapping such that the discriminator $F$ fails to reliably tell if a latent variable comes from $\mathcal{Z}_{\mathcal{Y}}$ or is a remapped code $G(z_x)$. The generator and discriminator architecture details can be found in the supplementary material.
The latent representation $z_x = E_{\mathcal{X}}(x)$ of a noisy and partial scan $x$ is mapped by the generator to $G(z_x)$. The task of the discriminator $F$ is then to distinguish between latent representations $z_y = E_{\mathcal{Y}}(y)$ and $G(z_x)$.
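At test time, the pipeline chains encode, latent-space mapping, and decode. The toy sketch below shows only this composition; the lambdas are stand-in placeholders, not the actual networks, and all names are illustrative.

```python
import numpy as np

def complete(x_partial, enc_x, gen, dec_y):
    """Inference pipeline: encode the raw partial scan into a latent
    code, map it into the clean/complete latent space with the
    generator, then decode to a clean and complete point set."""
    z_x = enc_x(x_partial)
    z_y = gen(z_x)
    return dec_y(z_y)

# toy stand-ins: all of these are illustrative placeholders
enc = lambda x: x.mean(axis=0)       # (n, 3) point set -> latent (3,)
gen = lambda z: z + 1.0              # latent-to-latent mapping
dec = lambda z: np.tile(z, (4, 1))   # latent -> (4, 3) point set

out = complete(np.zeros((8, 3)), enc, gen, dec)
assert out.shape == (4, 3)
```

Note that only the generator is trained adversarially; the encoders and decoders are frozen after the autoencoder training of Section 3.1.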
We train the mapping network using an adversarial loss [10]. Given training examples of clean latent variables $z_y$ and remapped noisy latent variables $G(z_x)$, we seek to optimize the following adversarial loss over the mapping generator $G$ and a discriminator $F$,
$$\min_{G} \max_{F} \; \mathbb{E}_{y \sim \mathcal{Y}}\big[\log F(z_y)\big] + \mathbb{E}_{x \sim \mathcal{X}}\big[\log\big(1 - F(G(z_x))\big)\big] \qquad (3)$$
In our experiments, we found the least-squares GAN [19] to be easier to train, and hence used the following discriminator and generator losses,
$$\mathcal{L}_{F} = \tfrac{1}{2}\,\mathbb{E}_{y \sim \mathcal{Y}}\big[(F(z_y) - 1)^2\big] + \tfrac{1}{2}\,\mathbb{E}_{x \sim \mathcal{X}}\big[F(G(z_x))^2\big] \qquad (4)$$

$$\mathcal{L}_{G} = \tfrac{1}{2}\,\mathbb{E}_{x \sim \mathcal{X}}\big[(F(G(z_x)) - 1)^2\big] \qquad (5)$$
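Assuming the standard least-squares GAN formulation (discriminator pushed toward 1 on clean codes and 0 on remapped codes, generator pushed toward 1), the losses reduce to simple squared errors over batches of discriminator outputs. The function names below are illustrative.

```python
import numpy as np

def lsgan_d_loss(f_real, f_fake):
    """Discriminator: push F(z_y) -> 1 for clean latent codes and
    F(G(z_x)) -> 0 for remapped codes."""
    return 0.5 * np.mean((f_real - 1.0) ** 2) + 0.5 * np.mean(f_fake ** 2)

def lsgan_g_loss(f_fake):
    """Generator: push F(G(z_x)) -> 1, i.e. fool the discriminator."""
    return 0.5 * np.mean((f_fake - 1.0) ** 2)

# a perfect discriminator incurs zero discriminator loss
assert lsgan_d_loss(np.ones(4), np.zeros(4)) == 0.0
# a generator that fully fools the discriminator incurs zero loss
assert lsgan_g_loss(np.ones(4)) == 0.0
```

In practice, `f_real` and `f_fake` would be the discriminator's outputs on batches of $z_y$ and $G(z_x)$, respectively.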
The above setup encourages the generator to perform the mapping such that $D_{\mathcal{Y}}(G(z_x))$ is a clean and complete point cloud. However, the generator is free to map a noisy latent vector $z_x$ to any point on the manifold of valid shapes in $\mathcal{Z}_{\mathcal{Y}}$, including shapes that are far from the original partial scan $x$. As shown in Figure 4, the result is then a complete and clean point cloud that is not similar in shape to the partial scanned input. To prevent this, we add a reconstruction term to the generator loss:
$$\mathcal{L}'_{G} = \alpha\,\mathcal{L}_{G} + \beta\;\mathbb{E}_{x \sim \mathcal{X}}\big[d_{HL}\big(x, \; D_{\mathcal{Y}}(G(E_{\mathcal{X}}(x)))\big)\big] \qquad (6)$$
where $d_{HL}$ denotes the Hausdorff distance between two point sets, and $\alpha$ and $\beta$ are the trade-off parameters. Unless specified otherwise, we use fixed values of $\alpha$ and $\beta$ in all our experiments.
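One common way to realize a partial-matching Hausdorff term is the directed (one-sided) variant, measured from the partial input to the completed output: it only requires every input point to be covered by the output, so hallucinated points in the missing regions are not penalized. The text above does not state which variant is used, so this sketch (illustrative name) assumes the directed form.

```python
import numpy as np

def directed_hausdorff(a, b):
    """Max over points in a of the distance to the nearest point in b.
    A small value means every point of a lies near some point of b,
    without requiring b to be covered by a."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).max()

partial = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
completed = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
assert directed_hausdorff(partial, completed) == 0.0  # input fully covered
assert directed_hausdorff(completed, partial) == 1.0  # extra point penalized
```

The asymmetry illustrates why this term constrains the completion to stay close to the observed scan while leaving the generator free to fill in missing geometry.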
4 Experimental Evaluation
In this section, we present quantitative and qualitative experimental results on several noisy and partial datasets. First, to evaluate our method, we present results on the 3D-EPN dataset [8], which contains ground truth data for evaluation, and show comparisons to several baseline methods. Second, since the 3D-EPN dataset lacks control over the point cloud incompleteness, we derive a synthetic dataset based on ShapeNet [4], on which we can investigate the performance of our method under different levels of input incompleteness. Last but not least, we conduct experiments on real-world datasets (ScanNet, Matterport, and KITTI), as working directly on real-world data is the main focus of our work.
4.1 Datasets
Clean and Complete Point Sets
are obtained by virtually scanning the models from ShapeNet. We use a subset of 4 categories, namely chair, table, plane, and car, in our experiments. To generate the clean and complete point set of a model, we virtually scan the model by performing ray-intersection tests from virtual scanner cameras placed around it to obtain a dense point set, followed by a downsampling procedure to obtain a relatively sparse point set of $n$ points. Note that we use the models without any pose or scale augmentation.
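The text does not specify the downsampling scheme; farthest point sampling (FPS) is one common choice for obtaining an evenly spread sparse subset from a dense scan, sketched below with an illustrative name.

```python
import numpy as np

def farthest_point_sampling(points, m, seed=0):
    """Greedy FPS: repeatedly pick the point farthest from those
    already chosen, keeping the subset evenly distributed.
    (A hypothetical stand-in; the paper's actual downsampling
    procedure is not described in this text.)"""
    rng = np.random.default_rng(seed)
    idx = [rng.integers(len(points))]                 # random start point
    dist = np.linalg.norm(points - points[idx[0]], axis=1)
    for _ in range(m - 1):
        nxt = int(dist.argmax())                      # farthest remaining point
        idx.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return points[idx]

dense = np.random.default_rng(1).normal(size=(5000, 3))
sparse = farthest_point_sampling(dense, 2048)
assert sparse.shape == (2048, 3)
```

Any subsampling that yields a fixed-size, roughly uniform point set would serve the same role here.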
This dataset is used for training to learn the clean and complete point set manifold in all our experiments. The following datasets, with different data distributions, provide the partial input data for our method.
3D-EPN Dataset
provides partial reconstructions of ShapeNet objects (8 categories), obtained by using a volumetric fusion method [5] to integrate depth maps scanned along a virtual scanning trajectory around each model. For each model, they generate a set of trajectories with different levels of incompleteness in order to reflect real-world scanning with a handheld commodity RGB-D sensor. The entire dataset covers 8 categories and a total of 25,590 object instances (the test set is composed of 5,384 models). We take a subset of 4 categories (namely chair, table, plane, and car) from the training data of the 3D-EPN dataset, along with the corresponding test data. Note that in the original 3D-EPN dataset, the training data is represented as a Signed Distance Field (SDF) and the test data as a Distance Field (DF). As our method works on pure point sets, we only use the point cloud representations of the training data provided by the authors, instead of using the SDF data, which holds richer information and is claimed in [8] to be crucial for completing partial data.
Table 1: Completion results on the 3D-EPN test set; each cell lists accuracy / completeness / F1 (%).

model | AE               | Ours w/o GAN     | 3D-EPN           | Ours             | Ours+
chair | 75.3 / 63.4 / 68.8 | 25.8 / 67.0 / 37.3 | 65.3 / 75.5 / 70.0 | 58.6 / 61.3 / 60.0 | 75.9 / 73.8 / 74.9
table | 82.6 / 72.8 / 77.4 | 32.6 / 75.9 / 45.6 | 66.8 / 74.6 / 70.5 | 61.1 / 72.5 / 66.3 | 83.3 / 82.5 / 82.9
plane | 88.9 / 82.6 / 85.7 | 31.9 / 97.8 / 48.2 | 90.0 / 88.2 / 89.1 | 85.5 / 80.6 / 83.0 | 96.0 / 93.6 / 94.8
car   | 56.4 / 54.4 / 55.4 | 51.1 / 91.4 / 65.6 | 60.5 / 73.2 / 66.2 | 77.0 / 75.0 / 76.0 | 92.7 / 92.2 / 92.4
Synthetic Data Generation
serves the purpose of controlling the incompleteness of the partial input. To test the performance of our method under varying levels of incompleteness, we use ShapeNet to generate a synthetic dataset in which we can control the incompleteness of the synthetic partial point sets. To generate the partial input data, we take a subset of 4 categories (chair, table, plane, and car) of ShapeNet objects and, for each category, split the models into 90%/10% train/test sets. For each model, we have already scanned its clean and complete point set (as described earlier in this subsection); we randomly choose a point from this point set and remove its $k$ nearest neighbor points, where the parameter $k$ controls the incompleteness of the synthetically-generated input. Furthermore, we add Gaussian noise to each point ($\mu = 0$ and $\sigma = 0.01$ in all our experiments). Last, we duplicate points in the resulting point sets to generate point sets with an equal number of points.
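The generation procedure above (random seed point, nearest-neighbor removal, Gaussian noise, duplication back to a fixed count) can be sketched as follows. The function name and the `p_remove` fraction parameterization are illustrative; the noise follows the text's $\mu = 0$, $\sigma = 0.01$.

```python
import numpy as np

def make_partial(points, p_remove, sigma=0.01, seed=0):
    """Remove the nearest neighbors of a random seed point
    (a p_remove fraction of the cloud), add Gaussian noise,
    then duplicate points back to the original count."""
    rng = np.random.default_rng(seed)
    n = len(points)
    k = int(p_remove * n)
    center = points[rng.integers(n)]
    order = np.linalg.norm(points - center, axis=1).argsort()
    kept = points[order[k:]]                          # drop the k closest
    kept = kept + rng.normal(0.0, sigma, kept.shape)  # simulate sensor noise
    pad = kept[rng.integers(len(kept), size=n - len(kept))]
    return np.concatenate([kept, pad], axis=0)        # restore fixed size

cloud = np.random.default_rng(2).normal(size=(2048, 3))
partial = make_partial(cloud, p_remove=0.3)
assert partial.shape == cloud.shape  # fixed-size output after duplication
```

Duplicating points keeps every training example at the same point count, which the fixed-size encoder input requires.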
Real-world Data
comes from three sources. The first is derived from the ScanNet dataset, which provides many mesh objects that have been pre-segmented from their surrounding environment. For the purpose of training and testing our network, we extract 550 chair objects and 550 table objects from the ScanNet dataset and manually align them to be consistently oriented with models in the ShapeNet dataset. We also split these objects into 90%/10% train/test sets.
The second consists of 20 chairs and 20 tables from the Matterport dataset; the same extraction and alignment as for the ScanNet dataset is applied here. Note that we train our method only on ScanNet training data and use the trained model to test on Matterport data, to show how our method generalizes to entirely unseen data. For both the ScanNet and Matterport datasets, we uniformly sample points on the surface mesh of each object to obtain the partial input.
Last, we extract car observations from the KITTI dataset using the provided ground truth bounding boxes for training and testing our method. We use KITTI Velodyne point clouds from the 3D object detection benchmark and the split of [21]. We filter the observations such that each car observation contains at least 100 points to avoid overly sparse observations.
4.2 Evaluation Metrics
To evaluate the completion results against the ground truth, we adopt three standard metrics. Analogous to precision, recall, and F1 score, we define the following metrics:
Accuracy
Let $\mathcal{G}$ denote the ground truth point set and $\mathcal{P}$ denote the completed point set from the partial input. The accuracy measures the fraction of points in $\mathcal{P}$ that are matched by the ground truth $\mathcal{G}$. More specifically, for each point $p \in \mathcal{P}$, we compute the distance to its closest point in $\mathcal{G}$. If this distance is within a threshold $\tau$, we count it as a correct match. We report the fraction of matched points.
Completeness
Similarly, the completeness records the fraction of points in $\mathcal{G}$ that are within distance threshold $\tau$ of any point in $\mathcal{P}$.
F1
We define the completion F1 score as the harmonic mean of the accuracy and completeness; the F1 score reaches its best value at 1 (perfect accuracy and completeness) and its worst at 0.
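All three metrics reduce to thresholded nearest-neighbor distances between the two point sets. A NumPy sketch (illustrative function name; `tau` is the distance threshold from the definitions above):

```python
import numpy as np

def completion_scores(pred, gt, tau):
    """Accuracy: fraction of predicted points within tau of some GT point.
    Completeness: fraction of GT points within tau of some prediction.
    F1: harmonic mean of the two."""
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    acc = (d.min(axis=1) <= tau).mean()
    comp = (d.min(axis=0) <= tau).mean()
    f1 = 0.0 if acc + comp == 0 else 2 * acc * comp / (acc + comp)
    return acc, comp, f1

gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pred = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
acc, comp, f1 = completion_scores(pred, gt, tau=0.1)
assert (acc, comp, f1) == (0.5, 0.5, 0.5)
```

Note the asymmetry: a method can score high accuracy with an incomplete output, or high completeness with a bloated one, which is why the harmonic mean is reported.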
In the following, we show all experimental and evaluation results. When training our network, we train separate networks for each category in the dataset; during testing, the ground truth class label of the input shape is used to determine which network to use. Note that the ground truth shape is only used for evaluation, as our method does not require ground truth shapes for training. More training details can be found in the supplementary.
4.3 Experimental Results on 3D-EPN Data
We train our network using the 3D-EPN partial training data and test on its test data to obtain completion results. We compare our method to several baseline methods (listed below) and present both quantitative and qualitative comparisons on the 3D-EPN test set:

Autoencoder (AE). The autoencoder trained only with clean and complete point sets.

Ours w/o GAN. To compare with the idea of [26], in which there is no adversarial training and the network only optimizes in a single clean latent space, we modify our network by setting the adversarial loss weight to zero to switch off the adversarial training module, showing that adversarial training is crucial to our success.

3D-EPN method. A supervised method that requires richer (SDF) and paired data and is trained with supervision from the ground truth. We obtained the results of the 3D-EPN method from the authors, then converted their Distance Field results into surface meshes, from which we uniformly sample points for calculating our point-based metrics.

Ours+. Since our method receives no supervision from the ground truth data, we also adapt our network to train with the ground truth for a fair comparison. More specifically, we set the loss weights accordingly and use an EMD loss between the output and the corresponding complete shape as the reconstruction term. More details and discussion about adapting our method to leverage supervision from the ground truth can be found in the supplementary.
Table 1 shows quantitative results for the 4 classes on the 3D-EPN test set and summarizes the comparisons: while our network is trained with unpaired data, we still achieve performance comparable to that of 3D-EPN, which is trained with ground truth. After adapting our method to be supervised by the ground truth, Ours+ achieves the best F1 score among all methods. In addition, switching off the adversarial training leads to a dramatic decrease in completion performance. Last, we found that a simple autoencoder trained with only clean and complete data can produce quantitatively good results, especially when the input scan is nearly complete.
Furthermore, we present qualitative comparisons in Figure 5, showing side-by-side the partial input, the AE, 3D-EPN, Ours, and Ours+ results, and the ground truth point set. Even though our method is not quantitatively the best among these methods, our results are qualitatively very plausible, as the generator is restricted to generate point sets from the learned clean and complete shape manifold.
Table 2: Ablation results; each cell lists accuracy / completeness / F1 (%).

model | Ours w/o GAN     | Ours w/o recons. loss | Ours w/ EMD loss  | Ours
chair | 25.8 / 67.0 / 37.3 | 43.6 / 43.0 / 43.3    | 45.5 / 42.8 / 44.1 | 58.6 / 61.3 / 60.0
table | 32.6 / 75.9 / 45.6 | 48.9 / 42.0 / 45.2    | 66.1 / 65.9 / 66.0 | 61.1 / 72.5 / 66.3
plane | 31.9 / 97.8 / 48.2 | 83.6 / 76.9 / 80.1    | 83.3 / 77.4 / 80.3 | 85.5 / 80.6 / 83.0
car   | 51.1 / 91.4 / 65.6 | 72.4 / 70.8 / 71.6    | 79.1 / 77.7 / 78.4 | 77.0 / 75.0 / 76.0
4.4 Experimental Results on Synthetic Data
To further evaluate our method under different levels of input incompleteness, we conduct experiments on our synthetic data, in which we can control the fraction of missing points. To be more specific, we train our network with varying levels of incompleteness by randomizing the incompleteness level during training, and then fix it during testing. Table 4 shows the performance of different classes under increasing amounts of incompleteness.
4.5 Experimental Results on Real-world Data
To evaluate the applicability of our method to real-world data, we train and test our network on noisy and partial chairs and tables extracted from the ScanNet dataset. We further test the network trained on ScanNet on chairs and tables extracted from the Matterport dataset, to show how well our network generalizes to entirely unseen data. We present a qualitative comparison of our method and the AE in Figure 6: on real-world data, the AE fails to complete the noisy and partial point sets with high quality for most examples, while our method consistently produces highly plausible completions for ScanNet and Matterport data. Note that our network is trained with only around 500 training samples.
Completing the car observations from KITTI is extremely challenging, as each car instance receives only a small number of measurements from the Lidar scanner. Figure 7 shows qualitative results of our method on completing sparse point sets of KITTI cars; our network can still generate highly plausible cars from such sparse inputs.
Since ground truth for these real-world chair, table, and car data is unavailable for evaluation, we use a point-based object part segmentation network (as described in [23]) to indirectly evaluate our completion results on real-world data. Due to the absence of ground truth segmentation for our output completion point sets, we calculate an approximated segmentation accuracy for each completion result. For example, for a point set of the chair class, we count the predicted segmentation label of a point as correct as long as the predicted label falls into the set of 4 parts (namely seat, back, leg, and armrest) of the chair class. Table 3 shows the significant improvement our completions bring in segmentation accuracy.
Table 3: Approximated segmentation accuracy (%) on the raw partial input vs. our completion.

class | raw input | completion
chair | 24.8      | 77.2
table | 83.5      | 96.4
car   | 5.2       | 98.0
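The approximated segmentation accuracy described above reduces to checking whether each predicted part label belongs to the part set of the object's class. A minimal sketch (hypothetical function and label names, since the actual label taxonomy comes from the segmentation network of [23]):

```python
def approx_seg_accuracy(pred_labels, valid_parts):
    """Count a point as correct if its predicted part label belongs to
    the part set of the object's class (e.g. seat/back/leg/armrest for
    chairs), since per-point ground truth segmentation is unavailable."""
    hits = sum(1 for label in pred_labels if label in valid_parts)
    return hits / len(pred_labels)

chair_parts = {"seat", "back", "leg", "armrest"}
preds = ["seat", "back", "wing", "leg"]   # "wing" is not a chair part
assert approx_seg_accuracy(preds, chair_parts) == 0.75
```

This is a permissive proxy (any in-class part label counts), so it measures whether the completion looks like a recognizable instance of the class rather than per-part correctness.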
4.6 Ablation Study

Ours w/o GAN is an ablation of adversarial training: to verify the effectiveness of using adversarial training in our network, we switch off the GAN module by setting the adversarial loss weight to zero.

Ours w/o reconstruction loss is an ablation to verify the effectiveness of the reconstruction loss term in the generator loss.

Ours w/ EMD loss shows that using the Hausdorff distance for the reconstruction loss (HL) is superior to an EMD loss, as HL only guides the network to generate a point set that partially matches the input, whereas an EMD loss would force the network to reconstruct the overall partial input.
Table 2 shows quantitative results for all ablation experiments, demonstrating the importance of the various modules in our proposed network.
Table 4: Completion performance under increasing input incompleteness; each cell lists accuracy / completeness / F1 (%).

incomp. (%) | chair              | plane
10          | 80.7 / 84.8 / 82.7 | 94.2 / 95.4 / 94.8
20          | 76.4 / 78.7 / 77.6 | 92.9 / 93.6 / 93.2
30          | 72.4 / 72.5 / 72.5 | 90.6 / 92.0 / 91.3
40          | 67.3 / 66.3 / 66.8 | 88.5 / 89.8 / 89.2
50          | 62.2 / 61.9 / 62.1 | 88.6 / 90.0 / 89.3

incomp. (%) | table              | car
10          | 85.0 / 87.7 / 86.3 | 82.2 / 81.4 / 81.8
20          | 82.2 / 84.2 / 83.2 | 79.7 / 78.5 / 79.1
30          | 79.2 / 79.6 / 79.4 | 76.6 / 75.0 / 75.8
40          | 75.5 / 73.3 / 74.4 | 72.6 / 71.8 / 72.2
50          | 72.5 / 68.8 / 70.6 | 66.9 / 66.5 / 66.7
5 Conclusion
We presented a point-based unpaired shape completion framework that can be applied directly to raw partial scans to obtain clean and complete point clouds. At the core of the algorithm is an adaptation network acting as a generator that transforms latent code encodings of the raw point scans and maps them to latent code encodings of clean and complete object scans. The two latent spaces regularize the problem by restricting the transfer to the respective data manifolds. We extensively evaluated our method on real scans and virtual scans, demonstrating that our approach consistently leads to plausible completions and performs better than other unpaired methods. The work opens up the possibility of generalizing our approach to scene-level scan completions, rather than object-specific completions. Another interesting future direction is to combine point and image features to apply the completion setup to both geometry and texture details.
References
[1] P. Achlioptas, O. Diamanti, I. Mitliagkas, and L. Guibas. Learning representations and generative models for 3D point clouds. In Proc. Int. Conf. on Machine Learning, 2018.

[2] A. Bulat, J. Yang, and G. Tzimiropoulos. To learn image super-resolution, use a GAN to learn how to do image degradation first. In Proc. Euro. Conf. on Comp. Vis., pages 185–200, 2018.
[3] A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Nießner, M. Savva, S. Song, A. Zeng, and Y. Zhang. Matterport3D: Learning from RGB-D data in indoor environments. In Proc. Int. Conf. on 3D Vision (3DV), 2017.
[4] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu. ShapeNet: An Information-Rich 3D Model Repository. Technical Report arXiv:1512.03012 [cs.GR], Stanford University, Princeton University, and Toyota Technological Institute at Chicago, 2015.
[5] B. Curless and M. Levoy. A volumetric method for building complex models from range images. In Proc. SIGGRAPH, 1996.

[6] A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2017.
[7] A. Dai, D. Ritchie, M. Bokeloh, S. Reed, J. Sturm, and M. Nießner. ScanComplete: Large-scale scene completion and semantic segmentation for 3D scans. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2018.
[8] A. Dai, C. Ruizhongtai Qi, and M. Nießner. Shape completion using 3D-encoder-predictor CNNs and shape synthesis. In Proc. IEEE Conf. on Comp. Vis. and Pat. Rec., pages 5868–5877, 2017.
 [9] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. In Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[10] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014.
[11] P. Guerrero, Y. Kleiman, M. Ovsjanikov, and N. J. Mitra. PCPNet: Learning local shape properties from raw point clouds. In Computer Graphics Forum, volume 37, pages 75–85, 2018.
[12] X. Han, Z. Li, H. Huang, E. Kalogerakis, and Y. Yu. High-resolution shape completion using deep neural networks for global structure and local geometry inference. In Proc. Int. Conf. on Comp. Vis., pages 85–93, 2017.
[13] S. Iizuka, E. Simo-Serra, and H. Ishikawa. Globally and locally consistent image completion. ACM Trans. on Graph., 36(4):107, 2017.
[14] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proc. IEEE Conf. on Comp. Vis. and Pat. Rec., pages 4681–4690, 2017.
[15] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila. Noise2Noise: Learning image restoration without clean data. CoRR, abs/1803.04189, 2018.
[16] J. Li, B. M. Chen, and G. Hee Lee. SO-Net: Self-organizing network for point cloud analysis. In Proc. IEEE Conf. on Comp. Vis. and Pat. Rec., pages 9397–9406, 2018.
[17] Y. Li, R. Bu, M. Sun, W. Wu, X. Di, and B. Chen. PointCNN: Convolution on X-transformed points. In Advances in Neural Information Processing Systems, 2018.
 [18] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. Paul Smolley. Least squares generative adversarial networks. In Proc. Int. Conf. on Comp. Vis., pages 2794–2802, 2017.
[19] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, and Z. Wang. Multi-class generative adversarial networks with the L2 loss function. CoRR, abs/1611.04076, 2016.
[20] S.-J. Park, H. Son, S. Cho, K.-S. Hong, and S. Lee. SRFeat: Single image super-resolution with feature discrimination. In Proc. Euro. Conf. on Comp. Vis., pages 439–455, 2018.
[21] C. R. Qi, W. Liu, C. Wu, H. Su, and L. J. Guibas. Frustum PointNets for 3D object detection from RGB-D data. In Proc. IEEE Conf. on Comp. Vis. and Pat. Rec., pages 918–927, 2018.
[22] C. R. Qi, H. Su, K. Mo, and L. J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proc. IEEE Conf. on Comp. Vis. and Pat. Rec., pages 652–660, 2017.
[23] C. R. Qi, L. Yi, H. Su, and L. J. Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems, pages 5099–5108, 2017.
[24] A. Sharma, O. Grau, and M. Fritz. VConv-DAE: Deep volumetric shape learning without object labels. In Proc. Euro. Conf. on Comp. Vis., pages 236–250, 2016.
[25] S. Song, F. Yu, A. Zeng, A. X. Chang, M. Savva, and T. Funkhouser. Semantic scene completion from a single depth image. In Proc. IEEE Conf. on Comp. Vis. and Pat. Rec., 2017.
[26] D. Stutz and A. Geiger. Learning 3D shape completion from laser scan data with weak supervision. In Proc. IEEE Conf. on Comp. Vis. and Pat. Rec., pages 1955–1964, 2018.
[27] H. Su, V. Jampani, D. Sun, S. Maji, E. Kalogerakis, M.-H. Yang, and J. Kautz. SPLATNet: Sparse lattice networks for point cloud processing. In Proc. IEEE Conf. on Comp. Vis. and Pat. Rec., pages 2530–2539, 2018.
[28] D. Thanh Nguyen, B.-S. Hua, K. Tran, Q.-H. Pham, and S.-K. Yeung. A field model for repairing 3D shapes. In Proc. IEEE Conf. on Comp. Vis. and Pat. Rec., pages 5676–5684, 2016.
[29] W. Wang, Q. Huang, S. You, C. Yang, and U. Neumann. Shape inpainting using 3D generative adversarial network and recurrent convolutional networks. In Proc. Int. Conf. on Comp. Vis., pages 2298–2306, 2017.
[30] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. C. Loy. ESRGAN: Enhanced super-resolution generative adversarial networks. In Proc. Euro. Conf. on Comp. Vis., pages 63–79. Springer, 2018.
[31] B. Yang, S. Rosa, A. Markham, N. Trigoni, and H. Wen. 3D object dense reconstruction from a single depth view. arXiv preprint arXiv:1802.00411, 2018.

[32] R. A. Yeh, C. Chen, T. Yian Lim, A. G. Schwing, M. Hasegawa-Johnson, and M. N. Do. Semantic image inpainting with deep generative models. In Proc. IEEE Conf. on Comp. Vis. and Pat. Rec., pages 5485–5493, 2017.
[33] K. Yin, H. Huang, D. Cohen-Or, and H. Zhang. P2P-NET: Bidirectional point displacement net for shape transform. ACM Trans. on Graph., 37(4):152, 2018.
[34] L. Yu, X. Li, C.-W. Fu, D. Cohen-Or, and P.-A. Heng. EC-Net: An edge-aware point set consolidation network. In Proc. Euro. Conf. on Comp. Vis., pages 386–402, 2018.
[35] L. Yu, X. Li, C.-W. Fu, D. Cohen-Or, and P.-A. Heng. PU-Net: Point cloud upsampling network. In Proc. IEEE Conf. on Comp. Vis. and Pat. Rec., pages 2790–2799, 2018.
[36] W. Yuan, T. Khot, D. Held, C. Mertz, and M. Hebert. PCN: Point completion network. In 2018 International Conference on 3D Vision (3DV), pages 728–737, 2018.
[37] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola. Deep sets. In Advances in Neural Information Processing Systems, 2017.