I Introduction
Odometry (ego-motion) estimation is a core module in simultaneous localization and mapping (SLAM), with various applications in autonomous robotics [1] and 3D mapping [2, 3]. So far, most odometry modules have been model-based, using cameras [4, 5, 6] or LiDAR [7, 8, 9]. For example, visual-LiDAR odometry and mapping (V-LOAM) holds first place in the KITTI odometry benchmark and has shown remarkable accuracy. Despite their superior performance, model-based methods face challenges such as vulnerability to environmental disturbance and sensitivity to parameter selection. Therefore, recent studies have started to examine learning-based methods, mostly for visual odometry, in both a supervised [10, 11] and an unsupervised [12, 13] manner.
Similar to vision, some effort toward learning-based odometry using a range sensor (e.g., LiDAR) has been initiated. The major challenge, however, is handling a dense point cloud as input to a deep neural network. Several recent studies have focused on feeding the point cloud directly to the network [14, 15], albeit for object-sized point cloud data. As examples for odometry, learning-based approaches for point clouds were presented in [16, 17], where the authors relied on a supervised method requiring ground-truth labeled sequences. Unlike these previous approaches, we examine an unsupervised manner for deep LiDAR odometry in order to achieve scalability and flexibility in the training phase.
In this paper, we propose unsupervised deep LiDAR odometry, called DeepLO. For efficiency, we feed a rendered vertex map and a normal map into a network and regress the 6D relative pose between two frames, while an NICP-like loss [2] is calculated using the same representations; thus, the overall training pipeline is unsupervised. fig:thumbnail_map shows the trajectory of the Oxford RobotCar dataset [3] learned in an unsupervised manner. The figure indicates that our method successfully captures the relative motion of a long trajectory (10 km) without ground-truth. To the best of our knowledge, our work is the first unsupervised learning-based odometry for a range sensor.
Our contributions are:

We propose a general pipeline for deep-learning-based LiDAR odometry that can be trained in both a supervised and an unsupervised manner.

For efficient training and inference, we use a vertex map and a normal map as network inputs and reuse them in the loss calculation. In the unsupervised setting, this removes the time-consuming labeling procedure. These summarized representations are exploited in both the training and inference stages.

The proposed learning system can be used for any LiDAR point cloud (or submap) regardless of the hardware type or configuration (e.g., the 3D surround scans of the KITTI dataset and the 2D push-broom scans of the Oxford RobotCar dataset).
The remainder of the paper is organized as follows. In related_works, recent model-based and learning-based odometry estimation methods are discussed. In proposed_method, we introduce the architecture and loss functions of our learning-based LiDAR odometry. The performance of our method is compared with other state-of-the-art methods in experimental_results. Conclusions and ideas for future work are given in conclusion.
II Related Works
In this section, we summarize existing model-based and learning-based methods for odometry estimation.
II-A Model-based Odometry Estimation for a Range Sensor
For range sensors such as an RGB-D camera and LiDAR, many model-based methods [8, 18, 9] including an odometry module have been proposed; these minimize the error between two consecutive frames, or between a frame and a map, using ICP [19, 20, 21, 2]. LOAM extracts edge and planar features for matching and runs two parallel modules at different frequencies for fast and accurate ego-motion estimation [8]. Recently, Behley and Stachniss proposed a surfel-based mapping method, called SuMa, for 3D laser range data [9]. The SuMa representation is considered efficient and accurate for dense mapping, similar to ElasticFusion [18] with an RGB-D camera. This line of research [18, 9] also minimizes the error between the current frame and a rendered view of the map using ICP. By leveraging image-coordinate-parameterized normal and vertex information, SuMa can employ almost all points for ICP (unlike feature-based methods such as LOAM) while minimizing the loss of original information (e.g., from filtering, rasterization, or feature extraction).
II-B Learning-based Odometry Estimation
II-B1 Visual Sensor
Recently, a number of learning-based visual odometry methods have been developed. Wang et al. [10] proposed a supervised visual odometry network using a recurrent network structure, while Zhou et al. [11] introduced a method that configures tracking and mapping processes in a network structure. However, since it is challenging to construct large amounts of ground-truth data to train a network, many unsupervised learning methods that leverage photo-consistency have recently been introduced. Zhou et al. [12] reported that metric depth could be learned from monocular sequences by warping consecutive images. Li et al. [13] proposed self-supervised learning of depth and ego-motion through spatial and temporal stereo consistency. Recently, [22] proposed joint learning of the depth, odometry, and optical flow of consecutive scenes, with consideration of outlier (dynamic) pixels for robustness. In addition, [23] introduced a hybrid pipeline for robust ego-motion estimation; this method learns disparity and depth via deep networks and predicts relative motion with model-based random sample consensus (RANSAC) outlier rejection.
II-B2 Range Sensor
Unlike the aforementioned methods for visual sensors, there are few learning-based methods for range sensors, because range sensor data (e.g., a 3D point cloud) is sparse and irregular; this makes it difficult to directly employ conventional modules such as 2D convolution and up-convolution due to memory-inefficiency issues. Recently, some methods that consume irregular point cloud data directly and achieve permutation invariance have been proposed for object recognition and segmentation problems [14, 24]. However, to the best of our knowledge, there have not yet been any empirical reports of odometry estimation that directly leverages a point cloud. To avoid these issues, a few studies proposed rasterized-image-based learning methods for 3D LiDAR odometry [16, 17]. However, they lose the original point cloud information (e.g., the 3D point coordinates as real numbers) via rasterization, and their training is performed in a supervised manner; thus they have low scalability for the increasingly available 3D point cloud data as LiDAR becomes more popular.
In contrast to the aforementioned methods, we propose a deep LiDAR odometry network that can be trained in both an unsupervised and a supervised manner. By incorporating traditional model-based point errors into the deep architecture, we build an unsupervised learning pipeline for LiDAR-based odometry without ground-truth relative poses.
III Proposed Method
In this section, we describe the details of our approach. Our system is composed of feature networks (FeatNet) and a pose network (PoseNet). The FeatNets extract feature vectors from consecutive frames, and PoseNet estimates the relative motion of the frames from these features. The networks can be trained in both a supervised and an unsupervised manner; the training strategy can be chosen according to the availability of ground-truth labels. The overall pipeline for training and inference is depicted in fig:network.
III-A Input Representation
Before describing the details, we first explain the input representation given a LiDAR point cloud. To cope with the unordered nature of a LiDAR point cloud, we reformulate it using an image-coordinate-parameterized representation which, unlike a rasterized image (e.g., the range image in [17]), preserves the 3D point information as real numbers. We employ a projection function $\Pi$ to project the 3D point cloud onto a 2D image plane in spherical coordinates. Each 3D point $\mathbf{p} = (x, y, z)$ in the sensor frame is mapped onto 2D image coordinates $(u, v)$ as
$$
\begin{pmatrix} u \\ v \end{pmatrix} =
\begin{pmatrix}
\left\lfloor \tfrac{1}{2}\left(1 - 2\arctan(y, x)\, f_h^{-1}\right) w \right\rfloor \\
\left\lfloor \left(1 - \left(\arcsin(z\, d^{-1}) + f_{down}\right) f_v^{-1}\right) h \right\rfloor
\end{pmatrix}
\qquad (1)
$$
where $d = \lVert \mathbf{p} \rVert_2$ is the depth; $f_h$ and $f_v$ are the horizontal (azimuth) and vertical (elevation) fields-of-view, respectively; the vertical field-of-view is composed of upper ($f_{up}$) and lower ($f_{down}$) parts, i.e., $f_v = f_{up} + f_{down}$. Here, $w$ and $h$ are the horizontal and vertical resolutions of the pixel representation. If several 3D points are projected onto the same pixel coordinates, we choose the nearest point as the pixel value. We define the mapped representation as the vertex map $\mathcal{V}$, which has 2D coordinates and 3-channel values storing the 3D point. fig:input_repre shows an example point cloud at timestamp $t$, the corresponding vertex map $\mathcal{V}_t$, and the normal map $\mathcal{N}_t$ of frame $t$.
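The projection above can be sketched in a few lines. This is a minimal, hedged sketch, not the authors' implementation: the field-of-view values below (2.0° up, 24.8° down) and the full 360° horizontal coverage are assumptions chosen to roughly match an HDL-64E-style sensor, and `int()` truncation stands in for the floor operation.

```python
import math

def project(point, w=720, h=52, fov_up=math.radians(2.0), fov_down=math.radians(-24.8)):
    # Spherical projection of one 3D point to pixel coordinates (Eq. (1)-style).
    x, y, z = point
    d = math.sqrt(x * x + y * y + z * z)           # depth
    yaw = math.atan2(y, x)                          # azimuth
    pitch = math.asin(z / d)                        # elevation
    fov = fov_up - fov_down                         # total vertical FOV
    u = int(0.5 * (1.0 - yaw / math.pi) * w)        # horizontal pixel (360 deg assumed)
    v = int((1.0 - (pitch - fov_down) / fov) * h)   # vertical pixel
    return u, v, d

def build_vertex_map(points, w=720, h=52):
    # When several points fall onto the same pixel, keep the nearest one,
    # as the text specifies. Values stay real-valued 3D coordinates.
    vmap = {}
    for p in points:
        u, v, d = project(p, w, h)
        if 0 <= u < w and 0 <= v < h:
            if (u, v) not in vmap or d < vmap[(u, v)][1]:
                vmap[(u, v)] = (p, d)
    return {uv: p for uv, (p, d) in vmap.items()}
```

Note that two collinear points at different ranges collapse onto the same pixel, and only the closer one survives in the vertex map.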
We then assign a normal vector to each vertex, adopting the normal estimation method in [2]. The normal vector of each vertex is computed from its nearest vertices in the vertex map; because we have already built the vertex map, extensive queries on a kd-tree are not required. For reliable normal vector estimation, we discard vertices that are distant from the center vertex. In this paper, we set the threshold range to 50 cm and filter simply by the depth values computed during vertex map generation.
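A normal for one vertex can be sketched from its in-image neighbors. This is a hedged sketch under the assumption that the normal is the cross product of the two edge vectors to the right and lower neighbors (a common choice for vertex maps); the 50 cm neighbor-rejection threshold comes from the text.

```python
import math

def normal_from_neighbors(v, v_right, v_down, max_dist=0.5):
    # Cross product of the two in-image edge vectors; reject distant
    # neighbors using the 50 cm threshold from the text.
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
    def norm(a):
        return math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    e1, e2 = sub(v_right, v), sub(v_down, v)
    if norm(e1) > max_dist or norm(e2) > max_dist:
        return None  # unreliable: neighbor too far from center vertex
    n = (e1[1] * e2[2] - e1[2] * e2[1],
         e1[2] * e2[0] - e1[0] * e2[2],
         e1[0] * e2[1] - e1[1] * e2[0])
    l = norm(n)
    return (n[0] / l, n[1] / l, n[2] / l) if l > 0 else None
```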
To verify the frame representation, we compare the proposed representation with the existing range-based representation, which uses the point range and extra characteristics (e.g., intensity and height) as pixel values. fig:pcd_comparison shows comparisons of the point clouds reconstructed from each description: the vertex map (blue) and the range map (orange). The left plot is a sample frame of an urban scene, and the right plot is a top view of a wall (plane). As seen in the enlarged view of the wall, the range-map-based representation [17, 16] introduces offsets in the reconstructed point cloud due to the angular discretization of the range image. The vertex map, on the other hand, stores the raw point cloud, so discretization offsets are prevented.
III-B Proposed Network
The proposed network is composed of three parts, as shown in fig:network: a vertex network (VertexNet), a normal network (NormalNet), and a pose network (PoseNet). VertexNet and NormalNet take the vertex and normal maps of consecutive frames as input.
VertexNet is used to infer the scale information of the motion: it takes the vertex maps of consecutive frames as input and embeds the translational motion into a feature. NormalNet is configured to extract the rotation information between the two frames. The output of each network is a feature vector of size 1024, and the sum of the two feature vectors is used as input to PoseNet, which is designed as fully-connected layers that convert the features into metric information and predicts translation and rotation separately.
VertexNet and NormalNet are designed with residual blocks [25] as fully convolutional networks. For PoseNet, we construct a decoupled pose estimation composed of a translation $\mathbf{t}$ and a rotation as a quaternion $\mathbf{q}$.
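The fusion and decoupled regression step can be sketched as follows. This is not the authors' architecture: only the 1024-d feature vectors, the summation, and the separate translation/quaternion heads come from the text; the hidden size, single ReLU layer, and weight scale are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

class PoseHead:
    # Sketch of the PoseNet stage: sum the two 1024-d feature vectors,
    # then regress translation (3-d) and a unit quaternion (4-d) separately.
    def __init__(self, feat_dim=1024, hidden=256):
        self.W1 = rng.standard_normal((feat_dim, hidden)) * 0.01
        self.Wt = rng.standard_normal((hidden, 3)) * 0.01   # translation head
        self.Wq = rng.standard_normal((hidden, 4)) * 0.01   # rotation head

    def __call__(self, feat_vertex, feat_normal):
        f = feat_vertex + feat_normal        # fuse VertexNet and NormalNet features
        h = np.maximum(f @ self.W1, 0.0)     # ReLU hidden layer (assumed)
        t = h @ self.Wt
        q = h @ self.Wq
        q = q / np.linalg.norm(q)            # normalize to a unit quaternion
        return t, q
```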
III-C Objective Losses
In this section, we introduce two types of objective losses: unsupervised and supervised. The losses are used selectively according to the availability of labels in the training data.
III-C1 Unsupervised Loss
For unsupervised training, we integrate the ICP method into the deep-learning framework. Given the predicted relative motion $T$, we define the orthogonal distance of the point correspondences as the loss value. For the correspondence search, projective data association is used: each vertex $\mathbf{v}$ in the vertex map $\mathcal{V}_{t-1}$ is transformed into frame $t$ as $T\mathbf{v}$, where $T \in SE(3)$ is the transformation matrix predicted by the network. Next, the corresponding vertex and normal vectors are assigned via the projection function $\Pi$ as
$$\hat{\mathbf{v}} = \mathcal{V}_t\left(\Pi(T\mathbf{v})\right), \qquad (2)$$
$$\hat{\mathbf{n}} = \mathcal{N}_t\left(\Pi(T\mathbf{v})\right), \qquad (3)$$
where $\hat{\mathbf{v}}$ and $\hat{\mathbf{n}}$ are the corresponding vertex and normal vectors of $\mathbf{v}$ in frame $t$, respectively. Given the point correspondences, the ICP loss is
$$\mathcal{L}_{icp} = \sum_{\mathbf{v} \in \mathcal{V}_{t-1}} \left| \hat{\mathbf{n}} \cdot \left(T\mathbf{v} - \hat{\mathbf{v}}\right) \right|, \qquad (4)$$
which is the sum of the point-to-plane (normal) distances of the point correspondences.
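Given the correspondences, the point-to-plane residual sum can be sketched directly. A minimal sketch assuming the correspondences have already been found by projective data association; the L1 accumulation of residuals is an assumption consistent with the "sum of normal distances" description.

```python
def icp_loss(pairs):
    # pairs: list of (transformed source vertex T*v, matched vertex v_hat,
    # matched normal n_hat). Accumulates |n_hat . (T*v - v_hat)|.
    loss = 0.0
    for tv, v_hat, n_hat in pairs:
        r = sum(n_hat[i] * (tv[i] - v_hat[i]) for i in range(3))
        loss += abs(r)
    return loss
```

A displacement along the surface normal is fully penalized, while sliding tangentially along the matched plane contributes nothing, which is the usual point-to-plane behavior.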
We also introduce a field-of-view loss (FOV loss), which prevents training from diverging to out-of-field-of-view configurations: although the ICP loss is essential for training convergence, additional regularization is needed for a stable training process. Because the ICP loss is zero when there are no correspondences, a naive ICP loss may lead the network toward a large relative motion that yields no correspondences. To avoid such cases, we use a penalty loss that hard-counts out-of-FOV points. The FOV loss is expressed as
$$\mathcal{L}_{fov} = \sum_{\mathbf{v} \in \mathcal{V}_{t-1}} \left[ H(u_{\mathbf{v}} - w) + H(-u_{\mathbf{v}}) + H(v_{\mathbf{v}} - h) + H(-v_{\mathbf{v}}) \right], \qquad (5)$$
where $(u_{\mathbf{v}}, v_{\mathbf{v}}) = \Pi(T\mathbf{v})$, $H(\cdot)$ is the Heaviside step function, and $w$ and $h$ are the width and height of the vertex map, respectively. Finally, the overall unsupervised loss is obtained as
$$\mathcal{L}_{uns} = e^{-s_{icp}}\,\mathcal{L}_{icp} + s_{icp} + e^{-s_{fov}}\,\mathcal{L}_{fov} + s_{fov}, \qquad (6)$$
where $s_{icp}$ and $s_{fov}$ are trainable scaling factors that balance the magnitudes of the two losses.
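The balancing of the two terms with trainable scale factors can be sketched as below. The exponential form is an assumption modeled on homoscedastic-uncertainty weighting (as in the cited pose-regression literature); the paper only states that the factors are trainable and balance the loss magnitudes.

```python
import math

def combined_loss(l_icp, l_fov, s_icp, s_fov):
    # Each loss is down-weighted by exp(-s); the additive s terms keep the
    # scale factors from growing without bound during training.
    return math.exp(-s_icp) * l_icp + s_icp + math.exp(-s_fov) * l_fov + s_fov
```

With both scale factors at zero, the combination reduces to a plain sum of the two losses; as training drives a factor up, the corresponding term is softly de-emphasized.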
The behavior of the loss under motion perturbation is depicted in fig:loss_vals. The red cross in each subplot marks the loss value at the ground-truth relative pose. We add perturbations to the translation (fig:error_on_t) and rotation (fig:error_on_q) and track the unsupervised loss over the motion errors. Each curve over translation and rotation is convex around the ground-truth. This tendency supports the validity of the proposed loss for training.
III-C2 Supervised Loss
In this section, we describe the supervised loss, which can be applied when a ground-truth pose or a reference pose (e.g., from visual odometry) is available. Similar to [26, 27], our network estimates the relative rotation in quaternion form, which has a bounded magnitude of rotation. However, if quaternion subtraction is used as the training loss, normalization is not considered and the rotation difference is not reflected correctly. To overcome this issue, we transform the quaternion into Euler angles in degrees. In practice, we find that the Euler-transformed representation yields better performance and training convergence. Finally, the supervised loss is expressed as follows,
$$\mathcal{L}_{sup} = e^{-s_t}\,\mathcal{L}_t + s_t + e^{-s_q}\,\mathcal{L}_q + s_q, \qquad (7)$$
where $\mathcal{L}_t = \lVert \mathbf{t} - \mathbf{t}^* \rVert_1$ is the translation loss with ground-truth $\mathbf{t}^*$, $\mathcal{L}_q = \lVert \phi(\mathbf{q}) - \phi(\mathbf{q}^*) \rVert_1$ is the rotation loss with ground-truth $\mathbf{q}^*$ (with $\phi(\cdot)$ the quaternion-to-Euler conversion), and $s_t$ and $s_q$ are trainable scale factors that balance the translation and rotation losses during training. We use the L1-norm for both losses.
fig:euler_mode clarifies the effect of the proposed rotation loss. We verified both the Euler format (fig:euler_on) and the quaternion format (fig:euler_off) with the same training setup. The plot shows the rotation error of each frame after training finished. As can be seen, the proposed representation guides the network to learn the rotation better.
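The quaternion-to-Euler conversion used by the rotation loss can be sketched as follows. The (w, x, y, z) component ordering and ZYX (roll-pitch-yaw) convention are assumptions; the paper only states that the quaternion is converted to Euler angles in degrees before an L1 comparison.

```python
import math

def quat_to_euler_deg(q):
    # Unit quaternion (w, x, y, z) -> (roll, pitch, yaw) in degrees, ZYX order.
    w, x, y, z = q
    roll = math.atan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y))
    pitch = math.asin(max(-1.0, min(1.0, 2 * (w * y - z * x))))
    yaw = math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
    return tuple(math.degrees(a) for a in (roll, pitch, yaw))

def rotation_loss(q_pred, q_gt):
    # L1 distance between the Euler-angle representations.
    e_p, e_g = quat_to_euler_deg(q_pred), quat_to_euler_deg(q_gt)
    return sum(abs(a - b) for a, b in zip(e_p, e_g))
```

Unlike raw quaternion subtraction, this loss reflects the actual angular difference: a 90° yaw rotation against the identity yields a loss of 90, independent of quaternion sign conventions.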
IV Experimental Results
In this section, we evaluate the proposed supervised model (DeepLO-Sup) and unsupervised model (DeepLO-Uns) via both qualitative and quantitative comparisons on the publicly available KITTI and Oxford RobotCar datasets.
IV-A Implementation and Training
The proposed network was implemented in PyTorch and trained on an NVIDIA GTX 1080 Ti. We employed the Adam solver [28]. We started the training with an initial learning rate and controlled it with a step scheduler with a step size of 20 epochs. The scaling factors in the loss functions eq:L_uns and eq:L_sup were initialized for automatic scale learning, and we set the horizontal and vertical fields-of-view to process the raw point clouds into the vertex map. With the corresponding horizontal and vertical resolutions, the size of the input vertex map was 720 × 52.
Unlike supervised learning, unsupervised learning needs guided training at the start. The initial relative motion from the network has random values, so we first trained the network with a fixed motion beforehand: we employed a simple forward motion (1 m forward, no rotation) for the first 20 iterations and then switched the training to the unsupervised mode.
tab:kitti. Average translation and rotation RMSE drift on KITTI sequences 00-10; each cell is translation drift (%) / rotation drift (deg/100 m).

Sequence                          00            01            02            03            04            05
Proposed       DeepLO-Uns     1.90 / 0.80  37.83 / 0.86   2.05 / 0.81   2.85 / 1.43   1.54 / 0.87   1.72 / 0.92
               DeepLO-Sup     0.32 / 0.12   0.16 / 0.05   0.15 / 0.05   0.04 / 0.01   0.01 / 0.01   0.11 / 0.07
Learning-based Zhu et al. [23] 4.56 / 2.46  78.98 / 3.03   5.89 / 2.16   6.84 / 2.42   9.12 / 1.42   3.93 / 2.09
               SfMLearner [12] 66.35 / 6.13 35.17 / 2.74  58.75 / 3.58  10.78 / 3.92   4.49 / 5.24  18.67 / 4.10
               UnDeepVO [13]   4.41 / 1.92  69.07 / 1.60   5.58 / 2.44   5.00 / 6.17   4.49 / 2.13   3.40 / 1.50
Model-based    SuMa [9]        2.10 / 0.90   4.00 / 1.20   2.30 / 0.80   1.40 / 0.70  11.90 / 1.10   1.50 / 0.80

Sequence                          06            07            08            09            10
Proposed       DeepLO-Uns     0.84 / 0.47   0.70 / 0.67   1.81 / 1.02   6.55 / 2.19   7.74 / 2.84
               DeepLO-Sup     0.03 / 0.07   0.08 / 0.05   0.09 / 0.04  13.35 / 4.45   5.83 / 3.53
Learning-based Zhu et al. [23] 7.48 / 3.76   3.13 / 2.25   4.81 / 2.24   8.84 / 2.92   6.65 / 3.89
               SfMLearner [12] 25.88 / 4.80 21.33 / 6.65  21.90 / 2.91  18.77 / 3.21  14.33 / 3.30
               UnDeepVO [13]   6.20 / 1.98   3.15 / 2.48   4.08 / 1.79   7.01 / 3.61  10.63 / 4.65
Model-based    SuMa [9]        1.00 / 0.60   1.80 / 1.20   2.50 / 1.00   1.90 / 0.80   1.80 / 1.00
IV-B Evaluation on the KITTI Dataset
We evaluated our method on the well-known odometry dataset of the KITTI Vision Benchmark [29]. The KITTI dataset contains 3D point clouds from a Velodyne HDL-64E with ground-truth global 6D poses. The dataset provides sequences 00-10 from different environments containing dynamic objects (e.g., urban areas, highways, and streets).
Training details. Similar to previously reported learning-based odometry methods [13, 23], we used sequences 00-08 for training and 09-10 for testing. In addition, we verified our method with both training strategies (i.e., supervised and unsupervised).
Evaluation. fig:kitti_result shows the trajectory comparisons of the proposed methods under the different strategies. The trajectories reflect the soundness of the network fit (training: 00-08) and the generality of our method (test: 09-10). Note that all tests used the same parameter settings for both learning models, DeepLO-Sup and DeepLO-Uns.
The trajectories of DeepLO-Sup on the training sets (00 to 08) fit the ground-truth well, but its performance was relatively low on the test sequences. With DeepLO-Uns, however, the results on both the training and test sequences showed similar performance. This finding indicates that unsupervised learning attained better generality.
tab:kitti contains the detailed results: the average translation and rotation RMSE drift over lengths of 100-800 m. We evaluated our method against several previously reported ones, namely UnDeepVO [13], SfMLearner [12], and Zhu's method [23]; the values for these were taken from the results in [23]. We also compared our method to SuMa [9], a recent model-based SLAM system using LiDAR measurements; the SuMa values are from its frame-to-frame estimation results.
Compared to the learning-based methods, our method gives better results on the training sequences. Unlike DeepLO-Sup, however, DeepLO-Uns showed a large translation error on sequence 01. Sequence 01 (highway) includes dynamic objects, fewer structures, and larger translational motions than the other sequences; thus it is difficult to capture the true translation with an unsupervised loss that relies on the geometric consistency of structures. The same effect can be seen in the other learning-based methods. For the test sequences, DeepLO-Uns performed better than the other learning-based methods, while DeepLO-Sup was slightly worse than DeepLO-Uns. This is because DeepLO-Sup overfits the training sets; better results could be obtained by adding validation sets to prevent overfitting. Compared to SuMa, DeepLO-Uns showed excellent performance on the training data because it repeatedly learns the geometric and motion characteristics of the dataset. To achieve better performance on test sequences, we could train on more sequences with various motions and guide the training loss with robust kernels that reduce the effects of dynamic objects.
IV-C Evaluation on the Oxford RobotCar Dataset
The Oxford RobotCar dataset [3] comprises data collected by repeating the same route dozens of times, for long-term autonomy research.
Dataset preparation. Because this dataset uses a push-broom-style 2D LiDAR, a single scan does not carry enough information to train a network and infer the robot pose. Therefore, we built submaps of sufficient length (80 m in our work) by accumulating 2D scans, using each scan pose interpolated from the inertial navigation system (INS) data. The ground-truth pose of a submap is the interpolated global pose (in the world frame) of its center scan. The 3D point coordinates of the submap are represented in the robot frame, with the ground-truth global pose of the submap taken as the origin. Submaps are sampled as 3D scans every 1-2 m using a truncated normal distribution.
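The pose interpolation used when accumulating scans can be sketched for the translational part. This is a simplified sketch: it linearly interpolates position between two timestamped INS poses, whereas a full implementation would also interpolate orientation (e.g., with quaternion slerp).

```python
def interpolate_position(t, t0, p0, t1, p1):
    # Linear interpolation of a 3D position between two INS samples
    # (t0, p0) and (t1, p1) at query time t, with t0 <= t <= t1.
    a = (t - t0) / (t1 - t0)
    return tuple((1 - a) * c0 + a * c1 for c0, c1 in zip(p0, p1))

def accumulate_submap(scans, origin):
    # Shift accumulated scan points into the submap (robot) frame whose
    # origin is the interpolated pose of the center scan, as in the text.
    ox, oy, oz = origin
    return [(x - ox, y - oy, z - oz) for (x, y, z) in scans]
```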
Training details. The Oxford RobotCar dataset can be divided into three route types: Long, Alternate, and Short. Since the focus of the dataset is seasonal diversity, we used the Long sequences (2015-02-03-08-45-10 and 2015-03-10-14-18-10) as training data and the other two types (Alternate and Short) as test data. Note that the Oxford RobotCar dataset has less trajectory diversity than the KITTI dataset; we therefore applied transfer learning to secure network generality and improve learning stability, using the weights of the network trained on the KITTI dataset as the initial weights. We found that this strategy significantly improved the training phase by enabling learning on a dataset with less training diversity.
Evaluation. We compared our method with the ground-truth trajectory and the stereo visual odometry provided in the Oxford RobotCar dataset. Since the trajectories have different frame rates, we evaluated each method using the trajectory evaluation method of [30].
To compute relative errors, sub-trajectory segments of the tested methods are selected along different travel distances. Each sub-trajectory is aligned using its first state, and the errors are calculated over all sub-trajectories. fig:oxford_result shows the trajectories aligned using the entire trajectory. The figure includes the trajectory comparisons against the ground-truth and the corresponding error plots (relative translation error (%) and heading error (deg)).
For trajectory alignment, a transformation was estimated and applied to the tested methods. Each trajectory in fig:oxford_result shows the results on the trained sequence (fig:oxford_result(a)) and the test sequences (fig:oxford_result(b) and (c)); below each trajectory is the quantitative result compared to the ground-truth. The trajectories may be shifted from the initial position due to the alignment. We compared our methods with the stereo odometry provided by the dataset.
tab:oxford. Absolute trajectory error (RMSE, m) on test sequences.

Type                  StereoVO [3]   DeepLO-Sup   DeepLO-Uns
Alternate                 37.74         14.71        19.93
Alternate (reverse)       34.55         12.89        22.93
Alternate                 36.09         16.94        12.80
Short                      4.22          9.53         6.78
In the case of the training sequence, the stereo trajectory was not accurate enough to be used as a baseline and was thus excluded from the comparison. As can be seen, our methods showed stable and comparable performance on all of the sequences, with the trained networks able to capture the relative motion of the test sets. Note that our proposed models achieved performance better than or close to StereoVO [3].
tab:oxford reports the absolute trajectory error (ATE) on the test sequences; we compare the translation error (m) over whole trajectories. The first two sequences in the table are Alternate and Alternate in the reverse direction, the latter of which is not shown in fig:oxford_result. This evaluation quantifies the quality of the whole trajectory. Since the ATE is sensitive to errors at the beginning of the path, interesting results appear compared to the relative errors: looking at the results for sequence 2014-06-25-16-22-15, we found that errors at the beginning of the DeepLO-Uns and DeepLO-Sup trajectories also affected the absolute trajectory error.
V Conclusion
We demonstrated a novel learning-based LiDAR odometry estimation pipeline in both an unsupervised and a supervised manner. We suggested surfel-like representations (vertex and normal maps) as network inputs without precision loss. We showed that the proposed unsupervised loss captures the geometric consistency of point clouds. To the best of our knowledge, ours is the first unsupervised approach for deep-learning-based LiDAR odometry. In addition, our method showed prominent performance compared to other learning-based and model-based methods in various environments. We also derived training adaptation via transfer learning across heterogeneous environments. In future work, we plan to design networks and loss functions for large and fast motions such as in sequence 01 of the KITTI dataset, and to extend our framework to sequential approaches with recurrent neural networks.
References
 Cadena et al. [2016] C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. Reid, and J. J. Leonard, “Past, present, and future of simultaneous localization and mapping: Toward the robustperception age,” IEEE Trans. Robot., vol. 32, no. 6, pp. 1309–1332, 2016.
 Serafin and Grisetti [2015] J. Serafin and G. Grisetti, “NICP: Dense normal based point cloud registration,” in Proc. IEEE/RSJ Intl. Conf. on Intell. Robots and Sys., 2015, pp. 742–749.
 Maddern et al. [2017] W. Maddern, G. Pascoe, C. Linegar, and P. Newman, “1 year, 1000 km: The Oxford RobotCar dataset,” Intl. J. of Robot. Research, vol. 36, no. 1, pp. 3–15, 2017.
 Mur-Artal et al. [2015] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos, “ORB-SLAM: a versatile and accurate monocular SLAM system,” IEEE Trans. Robot., vol. 31, no. 5, pp. 1147–1163, 2015.
 Forster et al. [2014] C. Forster, M. Pizzoli, and D. Scaramuzza, “SVO: Fast semidirect monocular visual odometry,” in Proc. IEEE Intl. Conf. on Robot. and Automat., 2014, pp. 15–22.
 Engel et al. [2018] J. Engel, V. Koltun, and D. Cremers, “Direct sparse odometry,” IEEE Trans. Pattern Analysis and Machine Intell., vol. 40, no. 3, pp. 611–625, 2018.
 Bosse et al. [2012] M. Bosse, R. Zlot, and P. Flick, “Zebedee: Design of a springmounted 3D range sensor with application to mobile mapping,” IEEE Trans. Robot., vol. 28, no. 5, pp. 1104–1119, 2012.
 Zhang and Singh [2014] J. Zhang and S. Singh, “LOAM: Lidar Odometry and Mapping in Realtime,” in Proc. Robot.: Science & Sys. Conf., Berkeley, USA, July 2014.
 Behley and Stachniss [2018] J. Behley and C. Stachniss, “Efficient SurfelBased SLAM using 3D Laser Range Data in Urban Environments,” in Proc. Robot.: Science & Sys. Conf., Pittsburgh, Pennsylvania, June 2018.
 Wang et al. [2017] S. Wang, R. Clark, H. Wen, and N. Trigoni, “DeepVO: Towards end-to-end visual odometry with deep recurrent convolutional neural networks,” in Proc. IEEE Intl. Conf. on Robot. and Automat., 2017, pp. 2043–2050.
 Zhou et al. [2018] H. Zhou, B. Ummenhofer, and T. Brox, “DeepTAM: Deep tracking and mapping,” in Proc. European Conf. on Comput. Vision, 2018, pp. 851–868.
 Zhou et al. [2017] T. Zhou, M. Brown, N. Snavely, and D. G. Lowe, “Unsupervised learning of depth and ego-motion from video,” in Proc. IEEE Conf. on Comput. Vision and Pattern Recog., 2017, pp. 1851–1860.
 Li et al. [2018a] R. Li, S. Wang, Z. Long, and D. Gu, “UnDeepVO: Monocular visual odometry through unsupervised deep learning,” in Proc. IEEE Intl. Conf. on Robot. and Automat., 2018, pp. 7286–7291.
 Qi et al. [2017] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “PointNet: Deep learning on point sets for 3D classification and segmentation,” in Proc. IEEE Conf. on Comput. Vision and Pattern Recog., 2017, pp. 652–660.
 Uy and Lee [2018] M. A. Uy and G. H. Lee, “PointNetVLAD: Deep point cloud based retrieval for large-scale place recognition,” in Proc. IEEE Conf. on Comput. Vision and Pattern Recog., 2018, pp. 4470–4479.
 Nicolai et al. [2016] A. Nicolai, R. Skeele, C. Eriksen, and G. A. Hollinger, “Deep learning for laser based odometry estimation,” in RSS workshop Limits and Potentials of Deep Learning in Robotics, 2016.
 Velas et al. [2018] M. Velas, M. Spanel, M. Hradis, and A. Herout, “CNN for IMU assisted odometry estimation using velodyne LiDAR,” in Proc. Intl. Conf. Aut. Rob. Sys. and Comp., 2018, pp. 71–77.
 Whelan et al. [2016] T. Whelan, R. F. SalasMoreno, B. Glocker, A. J. Davison, and S. Leutenegger, “ElasticFusion: Realtime dense SLAM and light source estimation,” Intl. J. of Robot. Research, vol. 35, no. 14, pp. 1697–1716, 2016.
 Besl and McKay [1992] P. J. Besl and N. D. McKay, “A method for registration of 3D shapes,” IEEE Trans. Pattern Analysis and Machine Intell., vol. 14, no. 2, pp. 239–256, 1992.
 Rusinkiewicz and Levoy [2001] S. Rusinkiewicz and M. Levoy, “Efficient variants of the ICP algorithm,” in Proc. IEEE Intl. Conf. on 3D Imaging, Modeling, Processing, Visualization and Transmission, 2001, pp. 145–152.
 Segal et al. [2009] A. Segal, D. Haehnel, and S. Thrun, “GeneralizedICP,” in Proc. Robot.: Science & Sys. Conf., Seattle, USA, June 2009.
 Luo et al. [2018] C. Luo, Z. Yang, P. Wang, Y. Wang, W. Xu, R. Nevatia, and A. Yuille, “Every pixel counts++: Joint learning of geometry and motion with 3D holistic understanding,” arXiv preprint arXiv:1810.06125, 2018.
 Zhu et al. [2018] A. Z. Zhu, W. Liu, Z. Wang, V. Kumar, and K. Daniilidis, “Robustness meets deep learning: An endtoend hybrid pipeline for unsupervised learning of egomotion,” arXiv preprint arXiv:1812.08351, 2018.
 Li et al. [2018b] Y. Li, R. Bu, M. Sun, W. Wu, X. Di, and B. Chen, “PointCNN: Convolution On XTransformed Points,” in Advances in Neural Information Processing Sys. Conf., 2018, pp. 828–838.
 He et al. [2016] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. on Comput. Vision and Pattern Recog., 2016, pp. 770–778.
 Kendall et al. [2015] A. Kendall, M. Grimes, and R. Cipolla, “PoseNet: A convolutional network for realtime 6dof camera relocalization,” in Proc. IEEE Intl. Conf. on Comput. Vision, 2015, pp. 2938–2946.
 Valada et al. [2018] A. Valada, N. Radwan, and W. Burgard, “Deep auxiliary learning for visual localization and odometry,” in Proc. IEEE Intl. Conf. on Robot. and Automat., 2018, pp. 6939–6946.
 Kingma and Ba [2014] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
 Geiger et al. [2012] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite,” in Proc. IEEE Conf. on Comput. Vision and Pattern Recog., 2012.
 Zhang and Scaramuzza [2018] Z. Zhang and D. Scaramuzza, “A tutorial on quantitative trajectory evaluation for visual (inertial) odometry,” in Proc. IEEE/RSJ Intl. Conf. on Intell. Robots and Sys., 2018, pp. 7244–7251.