I Introduction
Robotic manipulation in unstructured environments is an unsolved research problem [1, 2], and recent work has led to the development of systems capable of perceiving the world and extracting task-relevant information (e.g. object identity and pose) useful for manipulation and planning [3, 4, 5, 6]. As a result, today's state-of-the-art systems are making substantial progress towards coarse manipulation (e.g. picking and sorting) but still struggle with manipulation tasks that demand tight tolerances in unstructured environments. To address this, we present an object-recognition and pose-estimation algorithm, SegICP-DSR, that works by (a) fusing a series of RGB-D observations into a dense, semantically-labeled point cloud, (b) generating candidate point clouds from existing CAD models by raycasting, and (c) performing model-to-scene registration via iterative closest point (ICP) using a model-to-scene correspondence metric [6] (see Figs. 1 and 2). SegICP-DSR provides high-accuracy, low-variance pose estimates in unstructured environments because it attenuates sensor noise by smoothing successive measurements and handles dynamic scenes in which objects may become occluded due to object or camera motion.
In our test cases, SegICP-DSR achieves level position accuracy and standard deviation () and degree-level orientation accuracy and standard deviation (), which enables tight-tolerance manipulation in unstructured environments.
II Related Work
Accurate pose estimation in cluttered, unstructured environments is challenging, and many successful manipulation techniques (e.g. visual servoing) bypass the problem entirely. Several recent manipulation approaches have shown great success by avoiding an explicit representation of object pose and directly mapping raw sensor observations to motor behavior [7, 8, 9]. However, these approaches do not provide the semantic expressiveness required by symbolic task planners [10], which are convenient for complex, long-horizon, multi-step manipulation tasks. Furthermore, object recognition and pose estimation provide an elegant, compact representation of high-dimensional sensor input. In this regard, we focus primarily on approaches that first solve the object-identification and pose-estimation problem and subsequently tackle manipulation.
II-A Point Cloud Registration
Recent efforts in pose estimation have contributed to faster and more precise architectures for locating objects of interest in cluttered scenes. Among these various approaches are point cloud registration algorithms [11, 12, 13, 14, 5, 6], which are prone to converge to local minima and are brittle to partial or full occlusions. Global registration schemes have been proposed, but are not yet real-time, do not scale [15], or require additional information (e.g. surface normals) [16] which can degrade the solution at specific perspectives. To counter these shortcomings, several solutions have been proposed which first segment the scene point cloud and then use a local registration method (usually ICP) for model-to-scene alignment, with either a prior pose estimate or the segmentation centroid as a seed [5, 6]. These approaches also fuse multiple observations from different viewpoints to improve the pose estimate and can run in real-time (), but their pose estimation errors still plateau at , which is insufficient for tight-tolerance robotic manipulation tasks such as an open-loop peg-in-hole insertion (note that closed-loop manipulation strategies employing visual or force feedback may also be used to further improve performance). For this reason, we investigate combining semantic segmentation with dense scene reconstruction to further reduce pose estimation errors and variance by increasing the density and accuracy of the semantically-labeled point clouds used for registration.
II-B Dense Scene Reconstruction
A number of real-time dense visual SLAM (simultaneous localization and mapping) systems have been developed [17, 18, 19, 20, 21, 22, 23, 24, 25]. While many of these approaches scale well, they have poor performance with locally loopy trajectories [21], perform no explicit map reconstruction, or require pose graph optimization steps and merging of key surface element (surfel) views to create the final map [22]. Systems that place a large emphasis on the accuracy of the reconstructed map over the estimated trajectories are offline [23]. Thus, for our application, we elect to begin with ElasticFusion [24, 25] because of its real-time operation and map-centric approach, which avoids post-processing or pose graph optimization and uses surface prediction at every frame for simultaneous dense mapping and camera pose estimation. ElasticFusion has found numerous recent applications, such as SemanticFusion, a system that produces semantically fused dense reconstructions [26]. SemanticFusion's architecture leveraged a convolutional neural network for segmentation, ElasticFusion for visual SLAM and frame-to-frame correspondences, and a Bayesian update scheme to track object class probabilities per surfel. In a similar approach, DA-RNN [27], a 3D semantic mapping system, has demonstrated high performance using KinectFusion [20] to produce 3D semantic scenes. Additionally, LabelFusion has demonstrated the use of dense scene reconstruction to generate large datasets of labeled segmentation and pose images, specifically a million object instances in multi-object scenes [28]. Contrary to this approach, where a hand-aligned object pose is used to generate the segmentation, we are interested in the reverse problem: high-accuracy object pose estimation using online segmentation.
III Technical Approach
We present a novel architecture (shown in Figure 2) that improves upon SegICP [6], explicitly tackling the following shortcomings: (1) subpar RGB-D calibration and (2) segmentation errors, both of which result in pose-estimation errors due to incorrect points being included in the local registration step; (3) sensor noise, which leads to large variance in the pose estimates; and (4) registration errors caused by model crops that include points not present in the observed point cloud.
To address the subpar RGB-D calibration, we developed an automated method to simultaneously calibrate the extrinsic and intrinsic parameters of our color and depth cameras, which resulted in a pixel residual error (Section III-A). Next, we adversarially trained a semantic segmentation convolutional neural network (CNN) to further increase segmentation accuracy (Section III-B). Exploiting this precise calibration and segmentation performance, we replaced the pixel-intensity-based photometric error term in ElasticFusion [24] with the semantic label, resulting in a real-time, labeled, dense scene reconstruction (Section III-C). Using realistic model point clouds generated by raycasting (Section III-D), we then performed model-to-scene registration to acquire each object's pose estimate, while leveraging a time history of viewpoints for more accurate and lower-variance pose estimation.
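The model-to-scene registration step above is built on standard point-to-point ICP. As a reference, a minimal sketch of such a registration loop (brute-force nearest-neighbor correspondences followed by an SVD-based least-squares alignment) is given below; the function names are illustrative and not taken from the SegICP-DSR codebase:

```python
import numpy as np

def best_fit_transform(src, dst):
    """Least-squares rigid transform (R, t) mapping src onto dst (Kabsch/SVD)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # correct an improper (reflection) solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t

def icp(model, scene, iters=30):
    """Point-to-point ICP: align `model` (N x 3) to `scene` (M x 3)."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = model.copy()
    for _ in range(iters):
        # brute-force nearest-neighbor correspondences (a k-d tree in practice)
        d2 = ((cur[:, None, :] - scene[None, :, :]) ** 2).sum(-1)
        matched = scene[d2.argmin(axis=1)]
        R, t = best_fit_transform(cur, matched)
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

As discussed in Section III-D, the quality of the converged solution depends heavily on the initial correspondences, which is why the segmentation crop and realistic model clouds matter.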
III-A Simultaneous Intrinsic and Extrinsic Calibration
Our automated data collection and optimization procedure to calibrate the Kinect1 RGB and IR cameras works as follows. First, we collect a set of measured observation pairs, consisting of the robot arm end effector position computed from forward kinematics and the four corner pixel locations of the calibration target as measured by an ArUco tag detector [29]. Next, we fit this data with a first-order Brown-Conrady distortion [30] pinhole model using L-BFGS-B [31], with bounds centered around the initial parameter estimates for each camera, by optimizing for the projection and distortion parameters that minimize the pixel-wise projection error in image space. The minimization is given by:
\min_{T,\,\mathbf{k},\,f,\,c}\; \sum_{i} \left\lVert \mathbf{p}_i - \pi\left(T, \mathbf{k}, f, c;\, \mathbf{x}_i\right) \right\rVert_2^2 \qquad (1)

where T is the camera extrinsic matrix; \mathbf{k} is the vector of radial and tangential distortion coefficients; f and c are the focal lengths and principal points, respectively; and \pi(T, \mathbf{k}, f, c;\, \mathbf{x}_i) is the projection of \mathbf{x}_i (the position of the target corners as measured by forward kinematics) into the unrectified image using the model defined by the minimization parameters, with \mathbf{p}_i the corresponding corner pixel locations measured by the tag detector. After applying this procedure, our Kinect1 camera achieved a average pixel error over a calibration dataset of RGB and IR images (the results of which are shown in Fig. 3).

III-B Segmentation via Adversarial Networks
Segmentation plays an important role in our overall framework: it crops the scene point cloud to obtain only the points corresponding to the objects of interest, which reduces the number of local minima available to ICP. Thus, as prior results indicate, our pose accuracy is directly related to the quality of segmentation [6]. To improve our segmentation performance (measured as intersection over union, or IOU), we investigated various state-of-the-art network architectures and trained them adversarially [32]. Previous results indicate that training two networks (one as a semantic segmentor and another as an adversarial discriminator) leads to increased labelling accuracy [33].
Table I outlines the network architectures we tested for better segmentation quality. These networks were all trained and evaluated using PyTorch (a tensors and dynamic neural networks Python package with strong GPU acceleration; http://pytorch.org) with a dataset of multi-object images in the automotive maintenance domain.

TABLE I
Network       | IOU Metric | Parameters | Forward pass*
SegNet-A      |            | M          | /
DeepLab [34]  |            | M          | /
SegNet [35]   |            | M          | /
We found that training segmentors adversarially increased the segmentation quality, even between networks with identical architectures, which is in agreement with [33]. As such, for SegICP-DSR, we use an adversarially trained encoder-decoder SegNet-A architecture for segmentation.
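To make the adversarial objective concrete, the following numpy sketch shows a hybrid segmentation loss in the spirit of [33]: a per-pixel cross-entropy term plus a term that rewards predicted label maps the discriminator scores as ground-truth-like. The weighting λ, the function names, and the scalar discriminator interface are our own illustrative choices, not the exact training objective used here:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def segmentor_loss(logits, labels, d_prob_fake, lam=0.1):
    """Hybrid adversarial segmentation loss (illustrative).

    logits:      (H, W, C) raw segmentor scores
    labels:      (H, W) integer ground-truth labels
    d_prob_fake: scalar in (0, 1): discriminator's belief that the
                 predicted label map is ground truth
    """
    p = softmax(logits)
    h, w = labels.shape
    # per-pixel cross-entropy against the ground-truth labels
    ce = -np.log(p[np.arange(h)[:, None], np.arange(w)[None, :], labels] + 1e-12).mean()
    # fool-the-discriminator term: low when the discriminator is fooled
    adv = -np.log(d_prob_fake + 1e-12)
    return ce + lam * adv
```

The discriminator is trained with the opposite objective (distinguishing predicted from ground-truth label maps), and the two are alternated as in standard GAN training [32].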
III-C Fusing Consistent Segmentation Frames
ElasticFusion uses trajectories of raw RGB-D frames as input and provides an estimated camera pose in addition to a dense reconstruction of the scene (by minimizing a cost which enforces constant intensity at each pixel, based on the sum of the RGB values). Our goal is instead to create a labeled, dense, unorganized point cloud to be used for efficient segmentation and registration (as shown in Figure 4). The SemanticFusion [26] system does so by using raw RGB-D camera frames as input to ElasticFusion and relying on a Bayesian update scheme to generate the final map from the outputs of ElasticFusion and a segmentor. Instead, we directly provide ElasticFusion with a labeled point cloud, rather than a colorized (RGB) point cloud, avoiding the Bayesian update step.
Therefore, SegICP-DSR aims to find the motion parameters that minimize the difference between the latest segmentation label S(u, \mathcal{L}^{l}_{t}) for each measured point and the label of the transformed active model of the last frame, S(\pi(K \hat{T} p(u, \mathcal{D}^{l}_{t})), \mathcal{L}^{a}_{t-1}), where \hat{T} p(u, \mathcal{D}^{l}_{t}) is the transformed point corresponding to pixel u from the active model. For simplicity in the formulation, we substituted \hat{T} = \exp(\hat{\xi})\, T (see [24] for details).
We represent this cost over the label differences as E_{label}, which we use instead of the existing photometric error in our ElasticFusion implementation [24]:

E_{label} = \sum_{u \in \Omega} \left( S\left(u, \mathcal{L}^{l}_{t}\right) - S\left(\pi\left(K \hat{T}\, p\left(u, \mathcal{D}^{l}_{t}\right)\right), \mathcal{L}^{a}_{t-1}\right) \right)^2 \qquad (2)

where S uses the output of SegNet to assign integer values to pixels according to their semantic label. The cost encourages the reconstruction algorithm to find correspondences between frames directly in the space of segmentation labels given by S, instead of intensity. As such, this allows us to bypass the Bayesian update scheme and directly take the raw output of ElasticFusion as our semantically-labeled point cloud without an extra update step.
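To make the label-difference cost of Eq. (2) concrete, the following toy numpy sketch compares integer label images under a simple 2-D pixel shift, which stands in for the full camera-motion reprojection; the exhaustive shift search and function names are illustrative only (the real system minimizes over a 6-DOF motion):

```python
import numpy as np

def label_cost(curr_labels, prev_labels, shift):
    """Squared label-difference cost between two integer label images,
    with a 2-D pixel shift standing in for the camera-motion reprojection.
    Integer labels are compared directly instead of RGB intensities."""
    dy, dx = shift
    warped = np.roll(np.roll(prev_labels, dy, axis=0), dx, axis=1)
    diff = curr_labels.astype(np.int64) - warped.astype(np.int64)
    return (diff ** 2).sum()

def best_shift(curr_labels, prev_labels, radius=3):
    """Exhaustively search small shifts for the minimum-cost alignment."""
    best, best_c = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            c = label_cost(curr_labels, prev_labels, (dy, dx))
            if c < best_c:
                best, best_c = (dy, dx), c
    return best
```

The point of the sketch is that frame-to-frame correspondence can be recovered purely in label space: the cost is zero exactly when the semantic maps align.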
Scene Cropping. In order to obtain object-cropped scene clouds for registration, we iterate through all points in the reconstructed point cloud and select those with semantic labels corresponding to each of the objects, which is an efficient operation. The dense reconstruction output typically has fewer points than a single-frame Kinect1 point cloud: the Kinect1 point cloud at resolution has points, whereas typical dense scene reconstructions contain to points (see Fig. 4). This means that the per-object scene cropping is faster on the dense reconstruction than on the raw sensor point cloud.
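The cropping operation itself reduces to a label mask over the fused cloud; a minimal numpy sketch (function name illustrative):

```python
import numpy as np

def crop_objects(points, labels):
    """One pass over the fused, labeled cloud: map each semantic label to
    its object-cropped sub-cloud. `points` is (N, 3), `labels` is (N,)."""
    return {int(l): points[labels == l] for l in np.unique(labels)}
```

Because the fused cloud is smaller than a raw sensor frame, this per-object masking is correspondingly cheap.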
III-D Generating Realistic Candidate Point Clouds
Point-to-point registration of observed point clouds to points generated from a CAD model is challenging because the ICP optimization contains many local minima. Many of these minima arise if extraneous points are included in the generated model point cloud which do not have corresponding points in the observed point cloud due to occlusions or the camera viewpoint. If the ICP least-squares minimization finds one of these minima, it results in registration errors, as seen in Figure 5. Because of this, we cannot use a trivial model-cropping strategy to generate these point clouds; instead, we use a VTK implementation of raycasting [36] to render plausible model point clouds, providing sets of visible points for the object at various azimuths and elevations around the object mesh (see Fig. 5). This approach is robust to objects with concave components (e.g. funnels) and removes many of the undesirable local minima in the registration step. In addition, raycasting allows us to conveniently control the point density of individual object clouds, which also contributes to better registration solutions.
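Our implementation uses VTK's raycaster [36]; to convey the idea without that dependency, here is a self-contained numpy sketch that renders the visible points of an analytic sphere (a stand-in for a mesh) from a chosen azimuth and elevation, keeping only first-intersection, camera-facing surface points. The parameterization is illustrative:

```python
import numpy as np

def raycast_sphere(center, radius, azimuth, elevation, dist=5.0, n=24):
    """Stand-in for a mesh raycaster: return the points of a sphere that
    are visible from a camera at the given azimuth/elevation and distance.
    Only the nearest ray-surface intersections are kept, so back-facing
    geometry never appears in the candidate cloud."""
    center = np.asarray(center, float)
    cam = center + dist * np.array([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation)])
    # aim a grid of rays at targets scattered around the model center
    g = np.linspace(-radius, radius, n)
    targets = center + np.stack(np.meshgrid(g, g, g), -1).reshape(-1, 3) * 0.5
    dirs = targets - cam
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # analytic ray-sphere intersection: t^2 + 2 b t + c = 0
    oc = cam - center
    b = dirs @ oc
    c = oc @ oc - radius ** 2
    disc = b ** 2 - c
    hit = disc >= 0
    t = -b[hit] - np.sqrt(disc[hit])      # nearest (visible) intersection
    return cam + t[:, None] * dirs[hit]
```

Sweeping azimuth and elevation over a viewing hemisphere yields a bank of candidate clouds, one per viewpoint, mirroring the role of the VTK-rendered model crops.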
Additionally, by dynamically ignoring candidate model crops based on their Euclidean and quaternion metric distance, we achieve a – registration speedup by reducing the number of ICP registrations which must be performed at each time step.
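A sketch of this pruning test in numpy, combining a Euclidean translation threshold with the quaternion angular metric; the thresholds, tuple layout, and function names are illustrative:

```python
import numpy as np

def quat_angle(q1, q2):
    """Relative rotation angle (radians) between two unit quaternions."""
    d = np.clip(abs(np.dot(q1, q2)), 0.0, 1.0)
    return 2.0 * np.arccos(d)

def prune_candidates(candidates, prior_t, prior_q, max_dist, max_angle):
    """Keep only candidate model crops whose rendering viewpoint lies close
    to the prior pose estimate in both Euclidean and quaternion distance.
    Each candidate is a (translation, unit quaternion, cloud) tuple."""
    return [c for c in candidates
            if np.linalg.norm(c[0] - prior_t) <= max_dist
            and quat_angle(c[1], prior_q) <= max_angle]
```

Only the surviving candidates are passed to ICP, which is where the reported registration speedup comes from.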
IV Experiments
We performed three experiments to evaluate the performance of SegICP-DSR as compared to SegICP [6] on real-world, tight-tolerance manipulation tasks, all on physical hardware: either on a calibrated Kinect1 RGB-D sensor or on our integrated KUKA robotic system (a KUKA LBR iiwa 14 arm mounted onto a torso and equipped with a Robotiq 2F-85 Adaptive Gripper, Directed Perception PTU-E46 head, and Microsoft Kinect1 RGB-D sensor).
These experiments (illustrated in Figure 6) evaluate both perception accuracy and variance, as well as our ability to perform tight-tolerance manipulation in the real world. They are as follows:


– Experiment 1: LabelFusion dataset. Benchmarking accuracy and variance of SegICP-DSR vs. SegICP on a frame video generated by LabelFusion (a pipeline to rapidly generate annotated segmentation and pose RGB-D datasets; http://labelfusion.csail.mit.edu) [28].
– Experiment 2: Tight-tolerance grasping. Sub level accuracy is required for successfully grasping the oil-filler cap, whose radius is (with gripper width ).
– Experiment 3: Tight-tolerance, semi-rigid peg-in-hole insertion. This is a sub-cm-level precision insertion task with a funnel tip radius of and an engine hole radius of .
Experiment Results
We perform the following experiments in order to confirm the accuracy and stability of our proposed SegICP-DSR approach, integrate it in the loop with model-based manipulation, and explore its limitations in real-world manipulation tasks. The results of these experiments are as follows:
– Experiment 1: LabelFusion dataset. In order to generate a large dataset for evaluation, we used LabelFusion [28] to create various multi-frame videos with a motorcycle engine in the scene. LabelFusion produces annotated segmentation and pose data by first building a dense scene reconstruction of a trajectory around the object with ElasticFusion, registering the model to the dense point cloud, and transforming the object poses into each individual frame in the camera trajectory. We benchmark both SegICP and SegICP-DSR against this frame LabelFusion dataset using a raycasting resolution of .
Our results are illustrated in Figure 7 with histograms of the pose estimation results and the success measure indicated in red. SegICP-DSR outperformed SegICP considerably in both accuracy and variance. SegICP-DSR had an average position error of and an average angle error of , with the angle error defined as , while successfully identifying the pose in of images (with metric and [6]). SegICP-DSR achieves notably higher success than SegICP due to reconstructive segmentation, which leverages a time history of segmentations, whereas SegICP is brittle to instantaneous false detections of the segmentor (SegNet) and to noise in the depth map.
For this dataset, SegICP-DSR achieves increase in accuracy and increase in success as compared to SegICP.


Method      | Success | Successful poses | Euclidean std. (success) | Euler std. (success) | Euclidean std. (failure) | Euler std. (failure)
SegICP [6]  |
SegICP-DSR  |
– Experiment 2: Tight-tolerance grasping. We demonstrate the accuracy of SegICP-DSR by grasping and removing the oil-filler cap from a motorcycle engine (see Fig. 6). Using only our pose estimate, knowledge of the object geometry, and inverse kinematics, we compute a motion plan for this grasping task. We randomly placed the engine in arbitrary poses within reach of the robot, first running SegICP, then SegICP-DSR, to acquire a time history of pose estimates for comparison. As SegICP did not provide a high enough quality pose for successful grasping, we only performed grasps with SegICP-DSR. First, we directly use our pose estimate to derive the filler cap DOF pose and solve for a trajectory to grasp the cap using Drake (a planning, control, and analysis toolbox for nonlinear dynamical systems; http://drake.mit.edu). Of the twenty trials, the robot successfully performed grasps. The trial engine poses are illustrated in Figure 8 (colored green and red to indicate successes and failures, respectively), along with statistics for the pose estimates from both SegICP and SegICP-DSR. SegICP-DSR provided pose estimates that were more accurate than SegICP's, with an average range (difference between max and min) for each component of the pose estimate () that was also similarly smaller for SegICP-DSR. The range and standard deviation reported are over multiple observations for each unique engine pose.
– Experiment 3: Tight-tolerance, peg-in-hole insertion. We characterize the accuracy of our pose estimation in the loop with a difficult, tight-tolerance insertion task. The task is to insert a plastic blue funnel (with a tip radius of ) into the engine fill port ( radius), which places a sub tolerance on the pose-estimation error if the robot is to successfully execute an insertion without distorting the funnel or moving the engine.
We executed insertions on the KUKA iiwa arm by first estimating the engine's pose and then finding a kinematic trajectory relative to the estimated object frame; these trials consisted of ten randomly positioned engine poses within reach of the robot. For each pose, we attempted to insert the funnel into the engine fill port five times each with SegICP and SegICP-DSR. The results are shown in Figure 9. Again, the detected poses using SegICP-DSR exhibited smaller standard deviation (SegICP-DSR Euclidean std: , Euler std: , compared to SegICP Euclidean std: , Euler std: ), which resulted in higher success rates () for the insertion task compared to SegICP (). The high variability in the pose estimation of SegICP appears correlated with the trials that exhibited insertion failures, given by Euclidean std. and Euler std.
Success measure. We categorized an insertion as successful if the blue funnel's tip ended completely inside the engine fill port ( depth); see Figure 6. If the arm pushes the engine considerably due to poor pose estimation and the tip of the funnel collides with the edge of the fill port, the trial is categorized as a failure. Successful poses are then the unique engine poses (out of the possible ten) that resulted in at least () successful insertions.
V Conclusion
We present a real-time, dense, semantic scene reconstruction and pose estimation system, SegICP-DSR, that achieves level pose accuracy and standard deviation ( and ), enabling tight-tolerance manipulation in unstructured environments. We acquired these results on thousands of video frames and demonstrated that, with accurate perception alone, we can solve for kinematic trajectories using SegICP-DSR pose estimates to accomplish tight-tolerance manipulation tasks with high probabilities of success ( successful tight-tolerance insertions). Because SegICP-DSR operates over a fused, dense scene reconstruction, it produces a smoothed pose estimate between consecutive frames (overall Euclidean std: , Euler std: ) without additional filtering.
We are actively investigating more sophisticated registration methods (e.g. local point-to-mesh and global registration) to further increase our pose estimation accuracy. The availability of real-time, high-quality pose estimation improves the applicability and utility of advanced task planners [10] and motion planners that incorporate dynamics [37, 38]. Accurate initial pose estimates can serve as a starting point for local feature tracking or learned controllers [9, 7]; ultimately, we are interested in a combination of these methods in order to solve more complex manipulation tasks in unstructured environments.
References
 [1] G. Pratt and J. Manzo, “The DARPA Robotics Challenge,” IEEE Robotics & Automation Magazine, vol. 20, no. 2, pp. 10–12, 2013.
 [2] P. R. Wurman and J. M. Romano, “The Amazon Picking Challenge 2015,” IEEE Robotics and Automation Magazine, vol. 22, no. 3, pp. 10–12, 2015.
 [3] M. Fallon, S. Kuindersma, S. Karumanchi, M. Antone, T. Schneider, H. Dai, C. P. D'Arpino, R. Deits, M. DiCicco, D. Fourie et al., "An Architecture for Online Affordance-based Perception and Whole-body Planning," Journal of Field Robotics, vol. 32, no. 2, pp. 229–254, 2015.
 [4] R. Jonschkowski, C. Eppner, S. Höfer, R. Martín-Martín, and O. Brock, "Probabilistic multi-class segmentation for the Amazon Picking Challenge," in International Conference on Intelligent Robots and Systems (IROS), 2016, pp. 1–7.

 [5] A. Zeng, K.-T. Yu, S. Song, D. Suo, E. Walker, A. Rodriguez, and J. Xiao, "Multi-view self-supervised deep learning for 6D pose estimation in the Amazon Picking Challenge," in International Conference on Robotics and Automation (ICRA), 2017, pp. 1383–1386.
 [6] J. M. Wong, V. Kee, T. Le, S. Wagner, G.-L. Mariottini, A. Schneider, L. Hamilton, R. Chipalkatty, M. Hebert, D. Johnson et al., "SegICP: Integrated Deep Semantic Segmentation and Pose Estimation," in International Conference on Intelligent Robots and Systems (IROS), 2017.

 [7] S. Levine, C. Finn, T. Darrell, and P. Abbeel, "End-to-end training of deep visuomotor policies," Journal of Machine Learning Research, vol. 17, no. 39, pp. 1–40, 2016.
 [8] J. Mahler, F. T. Pokorny, B. Hou, M. Roderick, M. Laskey, M. Aubry, K. Kohlhoff, T. Kröger, J. Kuffner, and K. Goldberg, "Dex-Net 1.0: A Cloud-Based Network of 3D Objects for Robust Grasp Planning Using a Multi-Armed Bandit Model with Correlated Rewards," in International Conference on Robotics and Automation (ICRA), 2016, pp. 1957–1964.
 [9] J. Mahler, J. Liang, S. Niyaz, M. Laskey, R. Doan, X. Liu, J. A. Ojea, and K. Goldberg, "Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics," arXiv preprint:1703.09312, 2017.
 [10] L. P. Kaelbling and T. Lozano-Pérez, "Integrated task and motion planning in belief space," The International Journal of Robotics Research, vol. 32, no. 9-10, pp. 1194–1227, 2013.
 [11] P. J. Besl and N. D. McKay, "A Method for Registration of 3-D Shapes," in Robotics-DL tentative. International Society for Optics and Photonics, 1992, pp. 586–606.
 [12] S. Rusinkiewicz and M. Levoy, “Efficient variants of the ICP algorithm,” in International Conference on 3D Digital Imaging and Modeling. IEEE, 2001, pp. 145–152.

 [13] C. Yuan, X. Yu, and Z. Luo, "3D point cloud matching based on principal component analysis and iterative closest point algorithm," in International Conference on Audio, Language and Image Processing (ICALIP), 2016, pp. 404–408.
 [14] T. Schmidt, R. A. Newcombe, and D. Fox, "DART: Dense Articulated Real-Time Tracking," in Robotics: Science and Systems, 2014.
 [15] G. Izatt and R. Tedrake, "Globally Optimal Object Pose Estimation in Point Clouds with Mixed-Integer Programming."

 [16] Q.-Y. Zhou, J. Park, and V. Koltun, "Fast Global Registration," in European Conference on Computer Vision. Springer, 2016, pp. 766–782.
 [17] S. M. Seitz and C. R. Dyer, "Photorealistic scene reconstruction by voxel coloring," International Journal of Computer Vision, vol. 35, no. 2, pp. 151–173, 1999.

 [18] R. A. Newcombe and A. J. Davison, "Live dense reconstruction with a single moving camera," in Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2010, pp. 1498–1505.
 [19] A. Geiger, J. Ziegler, and C. Stiller, "StereoScan: Dense 3D Reconstruction in Real-time," in Intelligent Vehicles Symposium (IV), 2011, pp. 963–968.
 [20] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon, "KinectFusion: Real-Time Dense Surface Mapping and Tracking," in International Symposium on Mixed and Augmented Reality (ISMAR), 2011, pp. 127–136.
 [21] T. Whelan, M. Kaess, H. Johannsson, M. Fallon, J. J. Leonard, and J. McDonald, "Real-time large-scale dense RGB-D SLAM with volumetric fusion," The International Journal of Robotics Research, vol. 34, no. 4-5, pp. 598–626, 2015.
 [22] J. Stückler and S. Behnke, "Multi-resolution surfel maps for efficient dense 3D modeling and tracking," Journal of Visual Communication and Image Representation, vol. 25, no. 1, pp. 137–147, 2014.
 [23] Q.-Y. Zhou and V. Koltun, "Dense scene reconstruction with points of interest," ACM Transactions on Graphics (TOG), vol. 32, no. 4, p. 112, 2013.
 [24] T. Whelan, S. Leutenegger, R. Salas-Moreno, B. Glocker, and A. Davison, "ElasticFusion: Dense SLAM without a pose graph," Robotics: Science and Systems, 2015.
 [25] T. Whelan, R. F. Salas-Moreno, B. Glocker, A. J. Davison, and S. Leutenegger, "ElasticFusion: Real-time dense SLAM and light source estimation," The International Journal of Robotics Research, vol. 35, no. 14, pp. 1697–1716, 2016.
 [26] J. McCormac, A. Handa, A. Davison, and S. Leutenegger, “SemanticFusion: Dense 3D semantic mapping with convolutional neural networks,” in International Conference on Robotics and Automation (ICRA), 2017, pp. 4628–4635.
 [27] Y. Xiang and D. Fox, "DA-RNN: Semantic Mapping with Data Associated Recurrent Neural Networks," arXiv preprint:1703.03098, 2017.
 [28] P. Marion, P. R. Florence, L. Manuelli, and R. Tedrake, "A Pipeline for Generating Ground Truth Labels for Real RGB-D Data of Cluttered Scenes," arXiv preprint:1707.04796, 2017.
 [29] S. Garrido-Jurado, R. Muñoz-Salinas, F. Madrid-Cuevas, and M. Marín-Jiménez, "Automatic generation and detection of highly reliable fiducial markers under occlusion," Pattern Recognition, vol. 47, no. 6, pp. 2280–2292, 2014.
 [30] D. C. Brown, "Close-range camera calibration," Photogrammetric Engineering, vol. 37, no. 8, pp. 855–866, 1971.
 [31] C. Zhu, R. H. Byrd, P. Lu, and J. Nocedal, "Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization," ACM Transactions on Mathematical Software (TOMS), vol. 23, no. 4, pp. 550–560, 1997.
 [32] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative Adversarial Nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
 [33] P. Luc, C. Couprie, S. Chintala, and J. Verbeek, “Semantic Segmentation using Adversarial Networks,” arXiv preprint:1611.08408, 2016.
 [34] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs," arXiv preprint:1606.00915, 2016.
 [35] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A Deep Convolutional EncoderDecoder Architecture for Image Segmentation,” arXiv preprint:1511.00561, 2015.
 [36] W. J. Schroeder, B. Lorensen, and K. Martin, The Visualization Toolkit: An Object-Oriented Approach to 3D Graphics. Kitware, 2004.
 [37] B. Plancher, Z. Manchester, and S. Kuindersma, “Constrained Unscented Dynamic Programming,” 2017.
 [38] Z. Manchester and S. Kuindersma, “DIRTREL: Robust Trajectory Optimization with Ellipsoidal Disturbances and LQR Feedback,” 2017.