SLAMBooster: An Application-aware Controller for Approximation in SLAM

Simultaneous Localization and Mapping (SLAM) is the problem of constructing a map of an agent's environment while localizing or tracking the mobile agent's position and orientation within the map. Algorithms for SLAM have high computational requirements, which has hindered their use on embedded devices. Approximation can be used to reduce the time and energy requirements of SLAM implementations as long as the approximations do not prevent the agent from navigating correctly through the environment. Previous studies of approximation in SLAM have assumed that the entire trajectory of the agent is known before the agent starts to move, and they have focused on offline controllers that use features of the trajectory to set approximation knobs at the start of the trajectory. In practice, the trajectory is not usually known ahead of time, and allowing knob settings to change dynamically opens up more opportunities for reducing computation time and energy. We describe SLAMBooster, an application-aware online control system for SLAM that adaptively controls approximation knobs during the motion of the agent. SLAMBooster is based on a control technique called hierarchical proportional control, but our experiments showed that this application-agnostic control on its own led to an unacceptable reduction in the quality of localization. To address this problem, SLAMBooster exploits domain knowledge: it uses features extracted from input frames and from the estimated motion of the agent in its algorithm for controlling approximation. We implemented SLAMBooster in the open-source SLAMBench framework. Our experiments show that SLAMBooster reduces the computation time and energy consumption by around half on average on an embedded platform, while maintaining the accuracy of the localization within reasonable bounds. These improvements make it feasible to deploy SLAM on a wider range of devices.

1. Introduction

Approximate computing is effective in reducing the time and energy required to execute application programs in domains like computational science (Sidiroglou-Douskos et al., 2011; Samadi et al., 2014; Carbin et al., 2012, 2013; Park et al., 2015; Sampson et al., 2011, 2014). Emerging problem domains like autonomous vehicles, robot navigation, augmented reality and the Internet of Things have opened up new opportunities for the use of approximate computing. Many of these applications need to run on embedded and low-power devices, so reducing their power and energy requirements permits them to be deployed on a wider range of devices for longer durations. Moreover, they are usually streaming applications: inputs are not provided at the start of the program, as they are in computational science applications, but are supplied to the application over a period of time. Approximation in such programs must be performed in an adaptive, time-dependent manner to exploit temporal properties of the streaming input, so these types of applications are useful case studies for the design of systems support for approximate computing.

This paper is a case study of the use of principled approximation in Simultaneous Localization and Mapping (SLAM) (Mur-Artal et al., 2015; Newcombe et al., 2011; Whelan et al., 2012; Boikos and Bouganis, 2016; Ratter et al., 2013; Whelan et al., 2015; Borthwick and Durrant-Whyte, 1994), which is an important problem in domains such as robot navigation, augmented reality, and control of drones, robots and other autonomous agents. Unmanned agents have sensors like cameras or LIDAR to probe their environments. The SLAM problem is to use this sensory input to (i) construct a map of the agent’s environment (mapping), and (ii) determine the agent’s position and orientation in this environment (localization). The mapping and localization steps are performed repeatedly as the agent explores the environment.

Many algorithms (e.g., KinectFusion, ElasticFusion, and ORB-SLAM) have been proposed for solving the SLAM problem (Izadi et al., 2011; Newcombe et al., 2011; Whelan et al., 2012; Mur-Artal et al., 2015; Borthwick and Durrant-Whyte, 1994; Whelan et al., 2015). These algorithms have high computational cost for two reasons. First, they require repeated application of floating-point intensive kernels such as stencil computations and filters to sensor data (Bodin et al., 2016; Palomar et al., 2017). Second, the systems must deal with noise that is inherent in operating in real-world environments, and this requires the use of complex algorithms like extended Kalman filtering and particle filters. This high computational cost may lead to poor performance such as low frame rates, leading to suboptimal user experience and presenting a barrier to deploying SLAM algorithms on battery-operated, low-power devices such as cell phones, robots and drones, even though these are natural targets of SLAM.

Approximating SLAM

Approximate computing can be used to reduce the time and energy requirements of SLAM implementations as long as the approximations do not prevent the agent from navigating correctly through the environment. Implementations of SLAM algorithms usually expose a number of algorithmic parameters, also called knobs, that trade off computation for accuracy of localization and mapping. Prior work on using approximation in SLAM has assumed that the entire trajectory is known before the agent starts to move, and properties of the trajectory are used to set the knobs at the start of the trajectory (Nardi et al., 2015; Palomar et al., 2017; Bodin et al., 2016). This approach is called offline control since knob settings are determined once and for all before the computation begins. These studies have primarily highlighted opportunities for exploiting approximation in SLAM algorithms. In most applications however, trajectories are not known ahead of time. In addition, permitting knob settings to be controlled adaptively during the trajectory opens up more opportunities for reducing time and energy requirements.

In this paper, we describe SLAMBooster, an application-aware online control system for a popular SLAM algorithm called KinectFusion (Newcombe et al., 2011; Izadi et al., 2011). KinectFusion takes depth images from a Kinect sensor and performs 3D reconstruction (mapping) and localization in indoor room-sized environments (Section 2). SLAMBooster adaptively controls approximation knobs during the motion of the agent, and is based on hierarchical proportional control, which is a standard application-agnostic control methodology (Section 3). However, our experiments showed that this application-agnostic approach by itself led to an unacceptable reduction in the quality of localization. To address this problem, SLAMBooster exploits domain knowledge in two ways: it uses features extracted from input frames and from the estimated motion of the agent in its algorithm for controlling approximation (Section 4).

We implemented SLAMBooster in the open-source SLAMBench framework to control approximation in the execution of KinectFusion. We evaluated SLAMBooster on more than a dozen trajectories, some from the literature and some from our own study (Section 5). Our experiments show that, on average, SLAMBooster reduces the computation time by 60% and the energy consumption by 42% on an embedded platform, while maintaining the accuracy of the localization within reasonable bounds. We also compare the performance of SLAMBooster with an application-agnostic control system for software applications (Imes et al., 2015; Filieri et al., 2014). Our studies show that our system is more effective than this traditional controller at simultaneously meeting accuracy constraints and providing good performance for many input trajectories.

SLAMBooster therefore provides a case study of how to control approximation online in a principled way in a new application domain, and it is an important step towards enabling efficient execution of SLAM algorithms like KinectFusion on a wider range of platforms.

2. KinectFusion and SLAMBench

This section briefly describes the KinectFusion algorithm (Section 2.1) and the SLAMBench infrastructure (Section 2.2), and highlights the performance challenges in using KinectFusion on resource-constrained platforms (Section 2.3).

(a) Depth frame captured by an agent
(b) 3D reconstructed surface
Figure 1. KinectFusion takes depth frames as inputs and produces a 3D reconstructed surface with the trajectory of the agent in that environment.

2.1. Steps in KinectFusion

KinectFusion is a dense SLAM algorithm which takes depth frames from a Kinect sensor and processes them to perform 3D reconstruction (mapping) and localization in room-sized indoor environments. It produces a 3D reconstructed map and a trajectory incrementally as it processes incoming depth frames. The algorithm performs the following high-level steps for each input frame (Newcombe et al., 2011; Izadi et al., 2011).

  1. Acquisition: an input depth frame such as the one shown in Figure 1(a) is read in either from a camera or from disk.

  2. Preprocessing: depth values in the frame are normalized and a bilateral filter is applied for noise reduction.

  3. Localization: a new estimate of the position and orientation (together called a pose) of the camera is computed using an iterative closest point (ICP) algorithm. The algorithm determines the difference in the alignment of the current normalized depth frame with the depth frame computed from the previous camera pose (see raycasting below). This phase is also called tracking in the SLAM literature.

  4. Integration: the existing 3D map is updated to incorporate the aligned data for the current frame using the pose determined in the tracking phase.

  5. Raycasting: a depth frame from the new camera pose is computed from the global 3D map using a raytracing algorithm.

  6. Rendering: a visualization of the 3D surface is generated.

2.2. SLAMBench

The KinectFusion algorithm has been implemented in the open-source SLAMBench framework (Nardi et al., 2015). This implementation exposes the following algorithmic parameters (knobs), which are set to certain default values in SLAMBench.

  1. Compute size ratio (csr): resolution of the depth image used as input.

  2. Volume resolution (vr): resolution at which the scene is reconstructed.

  3. Tracking rate (tr): rate at which tracking and localization are performed.

  4. Integration rate (ir): rate at which new frames are integrated to the scene.

  5. ICP threshold (icp): threshold for the ICP algorithm.

  6. Pyramid level iterations (pd): maximum number of iterations ICP algorithm performs on each level of the image pyramid.

  7. Mu distance (mu): the truncation distance in the output volume representation.

These knobs can be tuned to optimize the computation time or energy required for processing each frame (in our study, we ignore the image acquisition time and rendering time since these steps are platform-independent and cannot be approximated from SLAMBench). However, this tuning needs to be done under the constraint that the output quality is acceptable since KinectFusion can construct inaccurate 3D maps or trajectories when too much approximation is introduced. Constraints on output quality can be defined more precisely as follows.

  • Tracking loss happens when a frame is discarded because it cannot be aligned with the depth frame computed from the previous camera pose. In some cases, KinectFusion can never recover from a loss in tracking. Therefore, a system for controlling approximation in KinectFusion should attempt to prevent tracking loss when optimizing computation time and energy.

  • The difference between the actual and the computed location of the agent at any frame is defined as the instantaneous trajectory error (ITE). One widely used quality metric for SLAM is the average trajectory error (ATE), which is the average of the ITE over all the frames of the trajectory.

Since ATE is a property of the entire trajectory, it is not known until the end of the trajectory. Furthermore, it requires knowing ground truth (i.e., the actual trajectory taken by the agent), which is usually not available except in simulated SLAM environments. Online control of knobs in SLAM therefore requires a proxy for the ITE that can be computed at each frame rather than at the end of the computation. Section 3 describes the proxy used in SLAMBooster.
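The relationship between the two metrics can be illustrated with a minimal sketch. It assumes that the "difference" between the actual and computed locations is the Euclidean distance between 3D positions; the function names and the sample trajectories are hypothetical and are not taken from SLAMBench.

```python
import math

def instantaneous_trajectory_errors(estimated, ground_truth):
    """Per-frame error: Euclidean distance between estimated and true positions
    (the choice of Euclidean distance is an assumption of this sketch)."""
    if len(estimated) != len(ground_truth):
        raise ValueError("trajectories must have the same number of frames")
    return [math.dist(e, g) for e, g in zip(estimated, ground_truth)]

def average_trajectory_error(estimated, ground_truth):
    """ATE: mean of the per-frame ITE values over the whole trajectory."""
    ites = instantaneous_trajectory_errors(estimated, ground_truth)
    return sum(ites) / len(ites)

# Hypothetical three-frame trajectories (positions in meters).
estimated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.3)]
truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(average_trajectory_error(estimated, truth))
```

Both functions need the ground truth and the second needs the complete trajectory, which is why neither can drive an online controller directly.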

2.3. Performance of KinectFusion in SLAMBench

(a) Computation time
(b) ATE
(c) Computation time (in seconds)
(d) ATE
Figure 2. Performance of the default and the most accurate configurations of unmodified KinectFusion. The top row shows performance on an Intel Xeon system, while the bottom row shows performance with an ODROID XU4 system.

To get a sense of the performance of KinectFusion with the default knob settings used in SLAMBench, we ran an unmodified OpenCL implementation of KinectFusion on two platforms with different compute capabilities. The first platform is a high-end Intel Xeon E5-2630 system with an Nvidia Quadro M4000 GPU, and the other is an ODROID XU4 board, which is a widely used platform for emulating embedded systems. The ODROID XU4 has an octa-core Exynos 5422 big.LITTLE processor and a Mali-T628 MP6 GPU. The top row in Figure 2 shows the computation time per frame and ATE on the Xeon system, while the bottom row shows these values for the ODROID system. In each figure, the horizontal axis shows four living room trajectories named lr0-lr3 from the ICL-NUIM dataset (Handa et al., 2014). The dataset includes ground truth, so it is possible to compute the ATE for these trajectories.

For computation time performance, the vertical axis shows the computation time required by KinectFusion to process each input frame. For the ATE, the vertical axis shows the average deviation between the reconstructed trajectories and the ground truth. In SLAM literature, an ATE of 5 cm or less is considered reasonable (Bodin et al., 2016); the horizontal dashed lines in Figures 2(b) and 2(d) show the 5 cm constraint on the ATE. The left bar, DEFAULT, represents the default settings of the parameters in KinectFusion as set in SLAMBench. The right bar, ACCURATE, represents the most accurate configuration of the parameters for KinectFusion and thus incurs more overhead than DEFAULT. Note that neither DEFAULT nor ACCURATE can process lr3 well because lr3 has frames that are too difficult to be tracked, so the difference in the amount of error for lr3 introduced by DEFAULT and ACCURATE can be ignored.

From Figure 2, we see that the best frame rate attainable by KinectFusion on the high-end Xeon system is approximately 90 fps, but it is only 3-4 fps on the big.LITTLE embedded system. Thus, while KinectFusion can achieve real-time processing rates on high-end hardware, it performs poorly on an embedded system with constraints on resources such as hardware capabilities, energy or power consumption, and peak frequency. One way around this problem is to use approximation but this needs to be done without reducing the quality of the output to an unacceptable level. The rest of the paper explores how this is done in SLAMBooster.

3. Design Choices in Approximation Controller

(a) Ranking knobs for computation time
(b) Ranking knobs for ATE
Figure 3. Ranking knobs by importance for ATE and computation time

In this section, we describe the main choices made in the design of SLAMBooster. Section 3.1 motivates why we need online control of knobs. As mentioned in Section 2, online control requires an online proxy for error that can be computed as frames are processed. Section 3.2 describes the proxy we use in SLAMBooster. Section 3.3 describes how we reduce the size of the knob space to simplify the design of SLAMBooster.

3.1. Offline vs. Online Control of Knobs

An offline control system for SLAM would use fixed knob settings for the entire computation, and it would choose the knob settings using (i) cost and error models built from training data and (ii) features of the input trajectory. Offline control has been used successfully to control approximation in long-running compute-intensive programs (Sui et al., 2016) but there are some obvious drawbacks in using this approach for SLAM. In most applications of SLAM such as robot motion, the environment is discovered while moving through it so the trajectory is not known before the SLAM computation begins. Furthermore, online control permits knob settings to be set adaptively, utilizing information from each frame, and this can be more efficient than setting the knobs once and for all at the start of the SLAM computation. For example, if the scene has objects like chairs or tables, localization and integration are relatively easy and the SLAM computation can be performed with lower precision. Conversely, when the scene has only smooth surfaces like walls, it complicates the process of tracking and aligning frames. Hence, a more precise computation may be needed to avoid large tracking error. For these reasons, SLAMBooster uses online control of approximation.

3.2. Proxy for Instantaneous Trajectory Error

An online control system needs online metrics to monitor the performance of the application during execution. As discussed in Section 2, the usual error metric used in SLAM is the Average Trajectory Error (ATE) but this is a property of the entire trajectory so it cannot be used directly as an online error estimator. If ground truth is available, we can use the instantaneous trajectory error (ITE) but in most applications of SLAM, even the ground truth is not available.

To devise a proxy for ITE, we exploit the basic assumption in the KinectFusion algorithm that the movement of the agent between successive frames is small (for example, this assumption helps ensure the success of the iterative closest point algorithm during the localization step to align the incoming frame with the global map).

In the spirit of this observation, we evaluated several plausible metrics including the inter-frame difference in depth values and the alignment error as online proxies for error; intuitively, large values of these metrics suggest that the scene is changing suddenly, requiring higher precision computation to avoid large tracking error. However, our experiments showed that there is no strong correlation between these metrics and the ITE. The metric which correlated best with the ITE is the velocity of the agent. The localization phase in the KinectFusion algorithm estimates the pose of the agent, and by looking at the difference in poses between successive frames (assuming every frame is tracked), we can estimate the velocity of the agent. We used Lasso (Tibshirani, 1996) to confirm a positive correlation between the velocity and the ITE for the four ICL-NUIM trajectories for which we have the ground truth. Therefore, we use the estimated velocity of the agent as a proxy for the ITE.
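A minimal sketch of this proxy follows. The text does not specify how the difference between two poses is reduced to a scalar, so the sketch assumes that the velocity is the distance between the translation components of two consecutive 4x4 pose matrices, measured per frame; the helper names are hypothetical.

```python
import math

def translation(pose):
    """Extract the (x, y, z) translation from a 4x4 row-major pose matrix."""
    return (pose[0][3], pose[1][3], pose[2][3])

def estimate_velocity(prev_pose, curr_pose):
    """Distance moved between two consecutive tracked frames (per frame).
    Using only the translation part is an assumption of this sketch."""
    return math.dist(translation(prev_pose), translation(curr_pose))

def pose_at(x, y, z):
    """Hypothetical helper: a pose with identity rotation at position (x, y, z)."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

v = estimate_velocity(pose_at(0.0, 0.0, 0.0), pose_at(0.03, 0.04, 0.0))
print(v)
```

Because the estimate relies only on poses that the localization phase already computes, it needs no ground truth and adds very little work per frame.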

3.3. Reducing the Knob Space

Precise modeling of the relationship between knob settings and computation time or error requires extensive exploration of the knob space and is input-sensitive, which makes it intractable (Bodin et al., 2016). To make the control problem more tractable, we ignore knobs that are ill-suited for online tuning, such as vr and mu, since they require recomputing the global map data structure, which is expensive. We also do not control the tracking rate tr or integration rate ir and we set them to one, since every frame should be tracked and integrated in an effort to not violate the assumption that the movement between successive frames is small.

We ranked the remaining knobs by their influence on ATE and computation time. These knobs are csr, icp, and pd. The knob pd has three components, referred to as pd0, pd1, and pd2. Figures 3(a) and 3(b) show how computation time and ATE change for the first three living room trajectories from the ICL-NUIM dataset when knobs are changed one at a time, keeping all other knobs fixed at their default values. We find that knobs csr, icp and pd0 have the most impact on performance, and this finding is consistent with prior work (Bodin et al., 2016). Among the three, csr has the dominant impact on computation time and also significantly impacts ATE, which is intuitive since it controls the resolution of the depth image to be used for computation. Knobs pd1 and pd2 did not significantly influence ATE or computation time, and hence are less interesting for online control. Therefore, we only use csr, icp and pd0 for approximation control in SLAMBooster.

4. SLAMBooster Online Control System

This section describes SLAMBooster in stages. Section 4.1 presents a naïve hierarchical proportional controller that controls knobs following the order of importance identified in Section 3.3. This naïve controller is successful in reducing computation time but it violates the error constraint by a significant margin for some trajectories. Therefore, we improve it by exploiting domain-specific knowledge of the KinectFusion algorithm, using image feature extraction (Section 4.2) and pose correction (Section 4.3). The computation time performance is further improved by using reduced-precision floating-point operations in some of the computation phases (Section 4.4).

1: Frame_i ← KF.acquisition()
2: if i < BOOTSTRAP_FRAMES then
3:      Knob_i ← Knob_default
4:      Frame_i ← KF.preprocessing(Frame_i, Knob_i)
5:      Pose_i ← KF.tracking(Frame_i, Knob_i)
6:      KF.integration(Pose_i, Frame_i, Knob_i)
7:      KF.raycasting(Pose_i, Knob_i)
8:      KF.rendering(Pose_i, Knob_i)
9:      continue
10: end if
11:
12: feature_trigger ← ExtractFeatures(Frame_i)
13: if velocity_trigger or feature_trigger then
14:      Knob_i ← Knob_{i-1} - 1    ▷ Increase precision
15: else
16:      Knob_i ← Knob_{i-1} + 1    ▷ Increase approximation
17: end if
18:
19: Frame_i ← KF.preprocessing(Frame_i, Knob_i)
20: Pose_i ← KF.tracking(Frame_i, Knob_i)
21:
22: V_i ← Pose_i - Pose_{i-1}    ▷ Compute velocity
23: if V_i > EXTRAPOLATE_THRES then
24:      Pose_i ← T_{i-1} * Pose_{i-1}
25:      T_i ← T_{i-1}
26: else
27:      T_i ← Pose_i * Pose_{i-1}^{-1}
28: end if
29:
30: velocity_window ← velocity_window ∪ {V_i}
31: V_avg ← sum(velocity_window) / WINDOW_LEN
32: ascending_trigger ← is_asc_order(velocity_window)
33: velocity_trigger ← (V_i > THRESHOLD)
34:                           ||  (V_avg > AVG_THRESHOLD)
35:                           ||   ascending_trigger
36:                           ||  (V_i > EXTRAPOLATE_THRES)
37:
38: KF.integration(Pose_i, Frame_i, Knob_i)
39: KF.raycasting(Pose_i, Knob_i)
40: KF.rendering(Pose_i, Knob_i)
Algorithm 1. Online controller in SLAMBooster for the KinectFusion algorithm, applied on each input frame Frame_i.

4.1. Hierarchical Proportional Controller

In general, a proportional controller adjusts knob settings by looking at the difference between a reference value and a value that is derived from the state of the system. For example, an automobile cruise control based on proportional control looks at the difference between the speed of the car and the desired speed, and either accelerates or slows down the car proportionately to this difference.

When there are multiple knobs, a hierarchical proportional controller tunes one knob at a time, following the order of importance of the knobs. The most important knob is tuned unless it has already reached its maximum (or minimum) value, in which case the next most important knob is tuned, and so on. The controller we have designed follows the order of importance discussed in Section 3.3. To introduce more approximation, csr is dialed up until it reaches its maximum value (in this paper, increasing a knob value introduces more approximation), while icp and pd0 are anchored to their default values. The knob icp is changed only after csr has reached its maximum value, and so on. The reverse order is followed when reducing the amount of approximation.
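The knob-stepping order described above can be sketched as follows. This is an illustration under stated assumptions rather than the SLAMBooster implementation: each knob is modeled as an integer approximation level, and the level ranges in the example are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Knob:
    name: str
    level: int       # current approximation level (higher = more approximate)
    min_level: int   # most precise setting (hypothetical range)
    max_level: int   # most approximate setting (hypothetical range)

def increase_approximation(knobs):
    """Step the most important knob that has not yet reached its maximum."""
    for knob in knobs:  # knobs are ordered from most to least important
        if knob.level < knob.max_level:
            knob.level += 1
            return knob.name
    return None  # every knob is already at its maximum

def increase_precision(knobs):
    """Reduce approximation in the reverse order of importance."""
    for knob in reversed(knobs):
        if knob.level > knob.min_level:
            knob.level -= 1
            return knob.name
    return None  # every knob is already at its most precise setting

# Order of importance from Section 3.3: csr, then icp, then pd0.
knobs = [Knob("csr", 0, 0, 2), Knob("icp", 0, 0, 1), Knob("pd0", 0, 0, 1)]
steps = [increase_approximation(knobs) for _ in range(3)]
undo = increase_precision(knobs)
print(steps, undo)
```

In this sketch the less important knobs stay at their starting levels until csr saturates, and the most recently changed knob is the first to be restored when precision is needed.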

Algorithm 1 shows the pseudocode for SLAMBooster and incorporates the refinements discussed in this section. KF in Algorithm 1 stands for KinectFusion, and lines showing operations on KF represent logic from the unmodified KinectFusion algorithm. The highlighted code implements optimizations to the baseline controller, and should be ignored for now.

  • The code shows the computations performed for each input frame Frame acquired from the I/O device (line 1). Pixels in the frame are depth values.

  • After acquiring the frame, the controller determines whether it is still in the bootstrap phase. No approximation is done for the first few frames to allow KinectFusion to start building an accurate initial global 3D map, so the unmodified KinectFusion algorithm is executed (lines 3-9).

  • If SLAM is not in the bootstrap phase, the controller adjusts the knobs based on a predicate velocity_trigger computed while processing the previous frame (lines 13-17) (feature_trigger in the predicate of the conditional is a refinement introduced later in this section; for now, it can be assumed to be false). Knobs can be dialed up or down depending on this predicate. These knob settings are fed into the five phases of the SLAM algorithm described in Section 2 to process the current frame with the desired level of approximation.

  • Lines 22-36 (ignore highlighted lines) show the computation of the velocity_trigger for use in the next frame. The current velocity (denoted by V_i) is estimated using the difference between the poses in the current frame and the previous frame. Intuitively, approximation needs to be dialed down if this value is above some threshold.

    In addition to this, we found it beneficial to maintain a sliding window of the history of V_i, and track the average velocity (denoted by V_avg) for the current window. The controller also checks whether the velocity is consistently increasing over a window, which is an indicator that more precise computation is desirable. These measures are compared with pre-defined thresholds, and the value of velocity_trigger is set for use by the next frame.
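The velocity-based predicate in the last bullet can be sketched as below. The window length and thresholds are hypothetical values chosen for illustration, since the text does not state them, and the sketch omits the extrapolation check that belongs to the pose-correction refinement.

```python
from collections import deque

# Hypothetical constants; the text does not give the values used in SLAMBooster.
WINDOW_LEN = 5
THRESHOLD = 0.05
AVG_THRESHOLD = 0.03

class VelocityTrigger:
    """Decides whether the next frame should be processed with more precision."""

    def __init__(self):
        self.window = deque(maxlen=WINDOW_LEN)  # sliding window of velocities

    def update(self, velocity):
        self.window.append(velocity)
        values = list(self.window)
        # Algorithm 1 divides by WINDOW_LEN; dividing by the current length is
        # equivalent once the window is full and avoids a low early average.
        average = sum(values) / len(values)
        # Assumption: only a full window counts as "consistently increasing".
        ascending = (len(values) == WINDOW_LEN and
                     all(a < b for a, b in zip(values, values[1:])))
        return velocity > THRESHOLD or average > AVG_THRESHOLD or ascending

trigger = VelocityTrigger()
for v in [0.001, 0.002, 0.003, 0.004, 0.005]:
    fired = trigger.update(v)
print(fired)
```

In the example, every velocity is far below both thresholds, yet the predicate fires on the fifth frame because the window is strictly increasing.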

Our experiments showed that although this simple hierarchical proportional controller is effective in reducing the computation time by more than half, it fails to meet the trajectory error constraint for many of our real-world trajectories (see Section 5.3). We address this problem next.

4.2. Extracting Image Features

In the tracking and integration phases, KinectFusion merges information from the current depth frame with the global 3D map. Each incoming depth frame in KinectFusion is an abstraction of the scene at which the camera is pointing. If the scene has objects like chairs and tables, it is easier for the algorithm to integrate a depth frame with the existing 3D map than if the scene is a blank wall for example. Therefore, it is desirable to adapt the level of approximation to the scene.

To implement this idea, we sample the four quadrants of the input depth frame, leaving out pixels at the margins of the frame since the Kinect sensor is known to potentially produce invalid depth pixels at the periphery. To reduce computational overhead, a fixed number of pixels are sampled from each quadrant, independent of the resolution of the frame; this number is chosen so that at the lowest resolution, all pixels are sampled. We then compute the standard deviation of depth values within each quadrant. If all these standard deviations are below some threshold value, the camera might be pointing to a smooth surface so the control system increases the precision of KinectFusion computation.

Algorithm 1 shows the augmented control system using feature detection. The ExtractFeatures function implements smooth feature detection (line 12). This additional information is used by the controller to manipulate knobs (line 13). Our experiments indicate that compared with the naïve controller, this improves the ATE with little additional computation overhead.
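A minimal sketch of this smooth-surface check is shown below. The margin width, sampling density, and threshold are hypothetical, the frame is a plain list of rows of depth values, and the sketch does not filter invalid depth pixels; it is not the ExtractFeatures code from SLAMBooster.

```python
import statistics

def quadrant_samples(frame, margin, per_side):
    """Yield strided depth samples for each of the four quadrants,
    skipping `margin` pixels at the border of the frame."""
    height, width = len(frame), len(frame[0])
    row_ranges = (range(margin, height // 2), range(height // 2, height - margin))
    col_ranges = (range(margin, width // 2), range(width // 2, width - margin))
    for rows in row_ranges:
        for cols in col_ranges:
            # The stride grows with resolution, so the sample count stays
            # roughly constant regardless of the frame size.
            row_step = max(1, len(rows) // per_side)
            col_step = max(1, len(cols) // per_side)
            yield [frame[r][c] for r in rows[::row_step] for c in cols[::col_step]]

def is_smooth(frame, margin=1, per_side=4, std_threshold=0.05):
    """True if the depth standard deviation is below the threshold in
    every quadrant, which suggests a featureless surface such as a wall."""
    return all(statistics.pstdev(samples) < std_threshold
               for samples in quadrant_samples(frame, margin, per_side))

flat_wall = [[1.0] * 8 for _ in range(8)]
cluttered = [[1.0 + 0.5 * ((r + c) % 2) for c in range(8)] for r in range(8)]
print(is_smooth(flat_wall), is_smooth(cluttered))
```

A True result corresponds to feature_trigger being set, which makes the controller dial approximation down for that frame.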

4.3. Pose Correction

The final enhancement we make to the basic hierarchical control system is to use a simple form of Kalman filtering (Grewal and Andrews, 2014; Pei et al., 2017) to recompute the pose when it appears that the agent has made a sudden movement. Informally, Kalman filtering is a method for combining a number of uncorrelated estimates of some unknown quantity to obtain a more reliable estimate. In many practical problems, the unknown quantity is the state of a dynamical system, and there are two estimates of this state at each time step, one from a model of state evolution and one from measurement, that are combined using Kalman filtering.

In the context of SLAM, the unknown state is the pose of the agent. When a frame is processed by KinectFusion, the tracking module uses the measured depth values in the frame to provide an estimate of the new pose, as shown in line 20 in Algorithm 1. However, if this estimated pose differs substantially from the pose in the previous frame, it violates the assumption that the movement of the agent between successive frames should be small. This indicates that KinectFusion has potentially inferred an inaccurate pose, and the pose estimate from the tracking module may be unreliable. Using approximation with this inaccurate pose may lead to large drifts in the estimates produced in the real-world 3D trajectories used in our experimental study.

In the spirit of Kalman filtering, we use a simple model to estimate the pose if the estimate from the measurement produced by the tracking module is substantially different from the pose in the previous frame. Lines 23-28 in Algorithm 1 show the pseudocode. KinectFusion represents the live 6DOF camera pose estimate by a rigid body transformation matrix. T_i represents the transformation matrix calculated at frame i when Pose_i is computed from Pose_{i-1}. The logic compares the velocity V_i with a threshold to check whether correcting Pose_i is required. If the velocity is below the threshold, the matrix T_i is calculated using the current and the previous pose. On the other hand, if the difference in poses is abnormally large, Pose_i is recomputed by applying T_{i-1} to Pose_{i-1}, following the assumption that the movement between successive frames should be small. Downstream KinectFusion kernels work using this corrected estimate of Pose_i.
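The correction step can be sketched with plain 4x4 matrices. The sketch reads the inter-frame transformation as the matrix that maps the previous pose to the current one (current pose times the inverse of the previous pose), and it measures velocity as translation distance; both readings are assumptions, and all function names and the threshold value are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rigid_inverse(m):
    """Inverse of a rigid transform [R t; 0 1], which is [R^T, -R^T t; 0 1]."""
    rot_t = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [m[i][3] for i in range(3)]
    new_t = [-sum(rot_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [rot_t[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def pose_at(x, y, z):
    """Hypothetical helper: a pose with identity rotation at (x, y, z)."""
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]

def velocity(prev_pose, pose):
    """Translation distance between two poses (an assumed velocity measure)."""
    return math.dist([row[3] for row in prev_pose[:3]],
                     [row[3] for row in pose[:3]])

def correct_pose(prev_pose, tracked_pose, prev_transform, extrapolate_thres):
    """Return (pose, transform) for the current frame."""
    if velocity(prev_pose, tracked_pose) > extrapolate_thres:
        # Implausible jump: replay the previous inter-frame motion instead.
        return matmul(prev_transform, prev_pose), prev_transform
    # Plausible motion: keep the tracked pose and remember the motion.
    return tracked_pose, matmul(tracked_pose, rigid_inverse(prev_pose))

identity = pose_at(0.0, 0.0, 0.0)
pose1, t1 = correct_pose(identity, pose_at(0.01, 0.0, 0.0), identity, 0.1)
pose2, t2 = correct_pose(pose1, pose_at(5.0, 0.0, 0.0), t1, 0.1)
print(pose2[0][3])
```

In the example, the second tracked pose jumps by roughly five units in one frame, so it is discarded and the previous motion of 0.01 along x is replayed instead.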

Section 5.3 shows that correcting pose estimations in this way improves the accuracy of the trajectory reconstruction substantially, with minimal control overhead.

4.4. Reduced-precision Floating-point Format

Finally, we explored the benefit of using half-precision floating-point numbers instead of the default single-precision floating-point numbers. Half-precision format can potentially improve vector operation efficiency and cache miss rates. The OpenCL extension cl_khr_fp16 has support for half scalar and vector types as built-in types that can be used for arithmetic operations, type casts, etc.

Some phases in KinectFusion such as Localization and Integration have constants that are too small to be represented in half-precision format. Therefore, we manually converted only the Raycasting and Preprocessing phases to half-precision format. Since these phases perform a large number of vector operations, use of reduced precision can be beneficial (Palomar et al., 2017).
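To illustrate why very small constants rule out half precision, the following sketch checks whether a value survives a round trip through IEEE 754 binary16 using NumPy's `float16`. The two constants and the helper `survives_fp16` are hypothetical and chosen only to show the failure mode; they are not taken from KinectFusion.

```python
import numpy as np

def survives_fp16(value, rel_tol=1e-2):
    """True if `value` stays non-zero and within rel_tol after conversion to fp16."""
    half = float(np.float16(value))
    return half != 0.0 and abs(half - value) <= rel_tol * abs(value)

# Hypothetical constants, for illustration only.
tiny_constant = 1e-8   # below 2**-24, the smallest positive fp16 value
safe_constant = 0.1    # well inside the fp16 normal range

print(survives_fp16(tiny_constant))
print(survives_fp16(safe_constant))
```

A check of this kind could be run over the constants of a kernel before deciding whether the kernel is a candidate for conversion.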

5. Experimental Results

This section evaluates the benefits and effectiveness of SLAMBooster for approximating KinectFusion.

5.1. Methodology

We implemented SLAMBooster in the open-source SLAMBench (Nardi et al., 2015) infrastructure (https://github.com/pamela-project/slambench).

Platform

Figure 2 shows that unmodified KinectFusion achieves good performance (90 fps) on a high-end Intel Xeon E5-2630 system with an NVIDIA Quadro M4000 GPU. Our experiments with the Intel Xeon system show that SLAMBooster is able to improve the performance further (200 fps) without violating the accuracy constraint. We do not show results with the Intel Xeon system for lack of space. Instead, we present detailed results of SLAMBooster on a low-power embedded environment using an ODROID XU4 board with a Samsung Exynos 5422 octa-core processor. The Exynos processor has four big Cortex-A15 cores running at 2 GHz and four Cortex-A7 cores running at 1.4 GHz, and has 2 GB LPDDR3 RAM. The XU4 board is equipped with a Mali-T628 MP6 GPU that supports OpenCL 1.2, and runs Ubuntu 16.04.4 LTS with Linux Kernel 4.14 LTS. Because the XU4 board does not have on-board power monitors, we use a SmartPower2 device (http://www.hardkernel.com/main/products/prdt_info.php?g_code=G148048570542) to monitor energy consumption for the whole board.

Benchmark trajectories

SLAMBench supports the ICL-NUIM RGB-D dataset (http://www.doc.ic.ac.uk/~ahanda/VaFRIC/iclnuim.html), which is used for benchmarking SLAM algorithms (Handa et al., 2014). The dataset contains two scenes, a living room and an office room, obtained by using the Kintinuous system (Whelan et al., 2012), and the ground truth trajectory. Each scene has several synthetically-generated trajectories. We were unable to execute trajectories for the office room scene correctly with SLAMBench. We also exclude the lr3 benchmark from our experiments since even the ACCURATE configuration cannot meet the error constraint (Figure 2, Section 2.3). Therefore, we use three trajectories from the living room scene, referred to as lr0, lr1 and lr2 in this paper.

To increase the diversity of trajectories, we used a first-generation Kinect camera to collect fourteen additional trajectories from real-world scenes in an indoor environment. The fourteen trajectories are: ktcn0, ktcn1, lab0, lab1, lab2, lab3, mr0, mr1, mr2, off0, off1, off2, pd0, and pd1. All the inputs are collected at 30 fps with resolution 640x480. We do not have ground truth for these trajectories, hence we use the trajectory computed by the most accurate setting of KinectFusion as a stand-in for ground truth. We have verified using SLAMBench GUI that KinectFusion is able to rebuild the trajectories correctly. Column 2 in Table 1 shows the length of each trajectory.

5.2. Performance of SLAMBooster

Figure 4. Online KinectFusion control system performance on the embedded platform: (a) ATE, (b) average computation time, (c) average energy consumption per frame.
| Trajectory | # frames | Default: % frames tracked | Default: computation time (s) | Default: ATE (cm) | Default: energy (J)/frame | SLAMBooster: % frames tracked | SLAMBooster: computation time (s) | SLAMBooster: ATE (cm) | SLAMBooster: energy (J)/frame |
|---|---|---|---|---|---|---|---|---|---|
| lr0 | 1510 | 100 | 0.235 | 0.98 | 2.13 | 100 | 0.096 | 1.48 | 1.32 |
| lr1 | 967 | 100 | 0.234 | 0.61 | 2.19 | 100 | 0.085 | 3.11 | 1.27 |
| lr2 | 882 | 100 | 0.246 | 1.07 | 2.28 | 100 | 0.156 | 1.80 | 1.52 |
| ktcn0 | 1550 | 100 | 0.225 | 1.05 | 1.87 | 100 | 0.156 | 0.99 | 1.26 |
| ktcn1 | 800 | 100 | 0.204 | 2.70 | 1.94 | 100 | 0.136 | 2.56 | 1.38 |
| lab0 | 800 | 100 | 0.239 | 0.63 | 1.96 | 100 | 0.061 | 4.90 | 1.00 |
| lab1 | 1250 | 100 | 0.212 | 0.39 | 1.85 | 100 | 0.083 | 1.21 | 1.15 |
| lab2 | 1250 | 100 | 0.230 | 3.63 | 1.95 | 99.5 | 0.077 | 4.02 | 1.03 |
| lab3 | 1250 | 100 | 0.226 | 0.65 | 1.96 | 100 | 0.072 | 4.97 | 1.01 |
| mr0 | 929 | 100 | 0.273 | 0.97 | 2.18 | 100 | 0.139 | 3.27 | 1.36 |
| mr1 | 1494 | 100 | 0.255 | 2.19 | 2.15 | 100 | 0.136 | 3.73 | 1.24 |
| mr2 | 1450 | 100 | 0.236 | 0.49 | 2.06 | 100 | 0.070 | 2.86 | 1.07 |
| off0 | 750 | 100 | 0.215 | 0.17 | 1.92 | 100 | 0.115 | 0.49 | 1.20 |
| off1 | 1050 | 100 | 0.248 | 0.78 | 2.20 | 100 | 0.123 | 0.61 | 1.31 |
| off2 | 1200 | 100 | 0.221 | 0.45 | 1.91 | 100 | 0.064 | 1.36 | 0.99 |
| pd0 | 1600 | 100 | 0.251 | 0.21 | 2.12 | 99.0 | 0.057 | 3.63 | 0.98 |
| pd1 | 1500 | 100 | 0.216 | 0.57 | 2.01 | 100 | 0.095 | 3.16 | 1.14 |
| geomean | | 100 | 0.232 | 0.75 | 2.04 | 99.9 | 0.094 | 2.13 | 1.09 |

Table 1. Detailed results with input trajectories controlled using the SLAMBooster control system on the ODROID XU4 platform.

Figure 4 shows the ATE, average computation time per frame, and average energy consumption per frame for the benchmark trajectories on the embedded platform. The figures compare two configurations: using the default knob settings in SLAMBench, and using SLAMBooster to control knobs online. Each bar is the average of three trials. The same numbers are listed in Table 1.

For most trajectories, ATE is higher when knobs are controlled with SLAMBooster, as expected. Nevertheless, all individual ATEs are less than 5 cm and therefore meet the required quality constraint. Figure 4(b) shows the average computation time for each frame. The average speedup of SLAMBooster over DEFAULT is 2.5x, so SLAMBooster achieves a throughput of about 10 frames per second. Although this frame rate does not meet the real-time constraint, computer graphics researchers still consider 10 fps to be reasonable for providing a smooth user experience.
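As a worked check of the speedup and throughput figures, the sketch below recomputes them from the per-frame computation times in Table 1 using a geometric mean. The helper `geomean` and the variable names are ours; the result should land close to the reported 2.5x and 10 fps, with small differences due to rounding in the table.

```python
import math

# Per-frame computation time (s) from Table 1, in row order lr0 .. pd1.
default_time = [0.235, 0.234, 0.246, 0.225, 0.204, 0.239, 0.212, 0.230, 0.226,
                0.273, 0.255, 0.236, 0.215, 0.248, 0.221, 0.251, 0.216]
booster_time = [0.096, 0.085, 0.156, 0.156, 0.136, 0.061, 0.083, 0.077, 0.072,
                0.139, 0.136, 0.070, 0.115, 0.123, 0.064, 0.057, 0.095]

def geomean(values):
    """Geometric mean of a list of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Speedup is the ratio of mean per-frame times; throughput is the inverse time.
speedup = geomean(default_time) / geomean(booster_time)
fps = 1.0 / geomean(booster_time)
print(f"speedup ~ {speedup:.1f}x, throughput ~ {fps:.0f} fps")
```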

Figure 4(c) shows another important metric, the average energy consumed for processing a frame. The idle power dissipation of an ODROID XU4 board is about 4 W. The energy saving for each frame is about 42%. The reduction in energy per frame is not as significant as the reduction in computation time per frame because of the frequent up and down scaling of certain data structures, which is required while tuning the csr knob.

3D map comparison

Figure 5. Evaluation of the quality of 3D maps: (a) 3D surface reconstructed by ACCURATE, (b) 3D surface reconstructed by SLAMBooster, (c) 3D surface difference.

Since SLAM is used for navigation, the 3D map is mainly a means to an end rather than an end in itself; nevertheless it is interesting to study the quality of the 3D map produced by SLAMBooster. Figure 5 compares a typical global 3D map built by using the most accurate knob settings (Figure 5(a)) with the one built by using SLAMBooster (Figure 5(b)). Figure 5(c) is a diff of these two maps in which pixels that are substantially different are marked in red. We see that using SLAMBooster does not impact the quality of the 3D map substantially.

Control overhead

The overhead of SLAMBooster arises mainly from downsampling and upsampling data structures when tuning the csr knob. Measurements show that the average overhead introduced by the controller is 1 ms on the ODROID XU4 board, which is about 1% of the total computation time. Therefore, the overhead of the control logic is negligible.

5.3. Impact of Optimizations

Figure 6. Incremental optimization impact on ATE, computation time, and energy: (a) absolute ATE, (b) average computation time, (c) average energy consumption per frame.

Figure 6 shows the incremental impact of the different optimizations in the naïve SLAMBooster controller. The controller configurations are listed below.

  • DEFAULT setting provided by SLAMBench

  • Proportional controller

  • Proportional controller + image feature extraction

  • Proportional controller + image feature extraction + pose correction

  • SLAMBooster: proportional controller + image feature extraction + pose correction + half-precision floating point

Figures 6(a), 6(b) and 6(c) show the ATE, computation time and energy consumption respectively for the different controller configurations. The naïve proportional controller achieves much of the time savings but violates the error constraint for 5 out of 17 inputs. This is not acceptable since nearly a third of the inputs fail to meet the error constraint. When image feature extraction is incorporated into the controller, the overall error performance is improved and only three inputs fail to meet the 5 cm error constraint. Note that feature extraction introduces a 15% computation time overhead compared to the proportional controller.

When pose correction is also incorporated, all the benchmarks meet the error constraint. It is interesting to note that this reduces the overall computation time compared to using only image feature extraction. The reason is that this setting improves localization and mapping, so more approximation can be done safely, leading to reduced computation time. Finally, when half-precision floating-point arithmetic is added, the overall computation time and energy consumption are both improved by 5% at the cost of slightly worse error.

5.4. Effectiveness of Online Control

Figure 7. Knob activity for benchmark mr1: (a) trajectory error, (b) knob activity: csr, (c) trigger activity: feature trigger and correction trigger.

The results in the previous section showed the effectiveness of SLAMBooster for entire trajectories. To get a better sense of how online control in SLAMBooster works, it is useful to visualize how knob settings change from the beginning to the end of a complete trajectory. Figure 7 is such a visualization for the mr1 trajectory. This is one of the more difficult trajectories in our benchmark set and the naïve proportional controller, even with image feature extraction, fails to meet the error constraint. In Figure 7, the x-axis represents frame number in chronological order, so it is a proxy for time.

The black line in Figure 7(a) shows instantaneous trajectory error (ITE) over time when SLAMBooster is used for the entire trajectory. These ITEs are computed after the execution by comparing the reconstructed trajectory with the ground truth, which is assumed to be the trajectory reconstructed with the most accurate setting of KinectFusion. Figure 7(b) and Figure 7(c) show the settings of the csr knob and the activation of feature extraction and pose correction triggers respectively.
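A per-frame error of this kind can be computed offline once both trajectories are available. The sketch below assumes, for illustration, that the error at each frame is the Euclidean distance between the estimated and reference camera positions, and that an overall figure is the mean of those distances; the function names are hypothetical and the exact metric used by SLAMBench may differ in detail.

```python
import math

def instantaneous_errors(estimated, reference):
    """Per-frame Euclidean distance between two lists of (x, y, z) positions."""
    if len(estimated) != len(reference):
        raise ValueError("trajectories must have the same number of frames")
    return [math.dist(e, r) for e, r in zip(estimated, reference)]

def mean_error(estimated, reference):
    """Mean of the per-frame errors over the whole trajectory."""
    errors = instantaneous_errors(estimated, reference)
    return sum(errors) / len(errors)

# Two-frame toy trajectories in metres: the second frame is 5 cm off.
est = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ref = [(0.0, 0.0, 0.0), (1.0, 0.03, 0.04)]
errors = instantaneous_errors(est, ref)
```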

To demonstrate the effectiveness of SLAMBooster, we show another configuration in Figure 7(a) (represented by the light gray line). In this configuration, we switch SLAMBooster to the naïve hierarchical controller after frame 500. During frames 520–590, the camera encounters a smooth surface and the naïve controller cannot deal with it. As shown by the gray line in Figure 7(a), ITE increases dramatically (the peak ITE is 100 cm and is too large to be plotted on this scale) and the ITE never comes down to an acceptable level. This shows that the optimization techniques used in SLAMBooster are critical for complex trajectories like mr1.

Figure 7(b) shows the activity of the most important knob, csr. Frames 200 to 400 are relatively easy and speed is low, so SLAMBooster tunes the knob all the way up to approximate the computation. For frames between 700 and 1200, ITE gets larger because the velocity of the agent increases. As a result, csr is set to the most accurate value by SLAMBooster in order to handle the drift in the trajectory.

This example shows that SLAMBooster can control the knobs dynamically to exploit opportunities to save computation time and energy while ensuring that the localization error is within some reasonable bound.

5.5. Comparison with Application-Agnostic Controller

We compare SLAMBooster with an application-agnostic control system for software applications. Application-agnostic control systems have been used successfully in the literature to control a diverse set of applications (Filieri et al., 2014; Imes et al., 2015). In the following, we first briefly present a general strategy to design such a control system, which we call TC, and then describe our adaptations to SLAM.

  • The application execution is divided into intervals or windows of some size (e.g., 64 or 32 frames in our experiments). TC tries to meet the desired performance constraints and objectives on the average for each window.

  • In each interval, TC tracks how well the performance constraint has been met, and this information is used to decide whether the system should be sped up or slowed down in the next interval to better meet the performance constraint. This desired performance level is normalized by the performance obtained by setting knobs to their default values, and this dimensionless quantity ("performance speedup") is used to find the knob settings for the next interval.

  • To find knob settings for a desired performance speedup, TC consults a configuration table, which returns Pareto optimal knob settings for a given performance speedup (in some cases, it returns a pair of knob configurations but this detail can be ignored). The configuration table is constructed ahead of time by profiling the program using representative inputs and knob configurations.
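The three steps above can be sketched as a small window-based controller. Everything concrete in this sketch is an assumption made for illustration: the table entries and csr values are invented, `lookup` and `next_speedup` are hypothetical names, and the integral-style update is one simple way to turn the observed shortfall into a new speedup target, not necessarily the rule used by the cited systems.

```python
from bisect import bisect_left

# Hypothetical configuration table: (speedup over default, knob settings),
# sorted by speedup. A real table would come from offline profiling.
CONFIG_TABLE = [
    (1.0, {"csr": 1}),
    (1.8, {"csr": 2}),
    (3.0, {"csr": 4}),
    (4.5, {"csr": 8}),
]

def lookup(target_speedup):
    """Return the least aggressive profiled configuration that reaches the target.

    Falls back to the most aggressive entry if the target exceeds the table.
    """
    speedups = [s for s, _ in CONFIG_TABLE]
    idx = min(bisect_left(speedups, target_speedup), len(CONFIG_TABLE) - 1)
    return CONFIG_TABLE[idx]

def next_speedup(current_speedup, measured_rate, target_rate, default_rate):
    """Update the speedup target for the next window.

    The shortfall (target minus measured rate) is normalized by the rate
    obtained with default knobs, giving a dimensionless correction.
    """
    error = target_rate - measured_rate
    return max(1.0, current_speedup + error / default_rate)

# One control step: the last window ran at 4 fps, the goal is 10 fps.
target = next_speedup(1.0, measured_rate=4.0, target_rate=10.0, default_rate=4.0)
speedup, knobs = lookup(target)
```

Because the table is fixed after profiling, the controller can only pick among configurations that were good on average for the profiling inputs.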

We implemented an online controller for SLAM following the traditional control design scheme described above. Instead of a performance speedup requirement, each time interval is given an error budget (i.e., the total amount of error allowed in the next interval) whose value is computed using TC’s strategy for determining performance speedup. Since ground truth is not available for most of our trajectories, we use velocity as a proxy for the actual error in each interval (a reference velocity is defined as the required velocity). The base velocity for each window, corresponding to the base performance in the original traditional controller, is provided by ground truth instead of implementing Kalman filtering (Imes et al., 2015). Note that this value is at least as accurate as what Kalman filtering estimates. We used two approaches to build the configuration table. The first one used the three synthetically-generated living room trajectories lr0, lr1 and lr2 for profiling. Since these trajectories are not representative of the entire set of trajectories in our benchmark set, we would expect the performance of the controller to be poor. The second approach is to use a more diverse set of inputs, one from each scene in the benchmark set (for example, mr1 is picked from the meeting room category).

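| Ref velocity | Living Rooms CFG: # error violated | Living Rooms CFG: # better time | Diverse Rooms CFG: # error violated | Diverse Rooms CFG: # better time |
|---|---|---|---|---|
| 0.004 | 3.5 | 1 | 1 | 0 |
| 0.005 | 5.5 | 2 | 1 | 0 |
| 0.006 | 7 | 3 | 1 | 0 |
| 0.008 | 10 | 3 | 2 | 0 |
| 0.010 | 10 | 5 | 2 | 0 |
| 0.012 | 10 | 6 | 2 | 0 |

Table 2. Traditional online controller performance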

Table 2 shows the performance of the traditional control scheme compared to SLAMBooster. The evaluation includes running all the benchmarks using the traditional controller on each configuration table with different values for the reference velocity. The Error Violated column shows the number of inputs that violate the 5 cm error constraint, while the Better Time column shows the number of trajectories that satisfy the error constraint and have less computation time than with SLAMBooster. When the configuration table is built using only the living room trajectories (Living Rooms CFG), the controller introduces approximation too aggressively because the living room trajectories in the profiling set are relatively simple to approximate compared to the real-world trajectories. When the error budget (reference velocity) gets looser, more trajectories achieve better computation time at the cost of unacceptable tracking error. On the other hand, when profiling is done with the diverse trajectory set (Diverse Rooms CFG), the error constraint is rarely violated but the computation times are longer than with SLAMBooster because the configuration table is overly conservative.

Prior work has shown that a traditional controller can control a diverse set of applications (Imes et al., 2015; Filieri et al., 2014). However, the high input sensitivity and low error tolerance characteristic of SLAM make it difficult for a traditional controller to control KinectFusion. Intuitively, the configuration table is a model of system behavior that averages over all the trajectories used in the profiling (training) phase, so the controller cannot optimize the behavior of the system for the particular trajectory of interest in a given execution. In addition, this controller does not exploit the SLAM-specific techniques in SLAMBooster such as extracting image features and pose correction, which proved essential in Section 5.4.

6. Related Work

We discuss work on approximation in SLAM algorithms and on using control-theoretic approaches for optimal resource management.

6.1. Approximating SLAM

Recent work has used KinectFusion and the SLAMBench infrastructure to study the performance impact of reduced-precision floating-point arithmetic in SLAM algorithms (Palomar et al., 2017; Oh et al., 2016). Unlike SLAMBooster, these approaches do not exploit approximation at the algorithmic level.

Offline control of KinectFusion has been explored by Bodin et al. (Bodin et al., 2016) using an active learning technique. Given the entire trajectory ahead of time, they use a random forest of decision trees to characterize the input trajectory, and generate a Pareto optimal set of configurations that trades off computation time, energy consumption and the ATE. In addition to algorithmic knobs, they also explore approximation of compiler and hardware-level knobs. Their study is limited to a subset of frames for one synthetically-generated trajectory. Follow-up work utilizes the motion information of the autonomous agent to improve offline control and evaluates the offline approximation technique on other SLAM algorithms (Nardi et al., 2017; Saeedi et al., 2017). In contrast, SLAMBooster performs online control so it does not need to know the entire trajectory before the agent begins to move. Adaptive control of knobs also permits SLAMBooster to optimize knob settings dynamically to take advantage of diverse environments, which is not possible with offline control.

6.2. Adaptive Resource Management

Several systems have been proposed that aim to balance power or energy consumption along with performance and program accuracy (Hoffmann, 2015; Farrell and Hoffmann, 2016; Imes et al., 2015). JouleGuard is a runtime control system that coordinates approximate applications with system resource usage to provide control-theoretic guarantees on energy consumption (i.e., will not exceed a given threshold), while maximizing accuracy (Hoffmann, 2015). The control system uses reinforcement learning to identify the most energy-efficient configuration, and uses an adaptive PID controller-like mechanism to maintain application accuracy. POET measures program progress and power consumption, uses feedback control theory to ensure the timing goals are met, and solves a linear optimization problem to select minimal energy resource allocations based on a user-provided specification (Imes et al., 2015). MeanTime is an approximation system for embedded hardware that uses control theory for resource allocation (Farrell and Hoffmann, 2016). These techniques apply PID-like control techniques to various parts of the system, and provide empirical demonstrations of overall system behavior rather than guarantees. Recent work on optimal resource allocation focuses on designing sophisticated control systems using, for example, linear quadratic Gaussian control (Pothukuchi et al., 2016; Mishra et al., 2018; Rahmani et al., 2018). Unlike SLAMBooster, they do not exploit application-specific properties to perform control.

7. Conclusion

SLAM algorithms are being deployed in low-power systems for augmenting user experience. A big obstacle to widespread adoption of SLAM is its computational expense. In this work, we present SLAMBooster, a heuristic-based proportional control system, to approximate the computation in KinectFusion, which is a popular SLAM algorithm. We show that the SLAMBooster controller, augmented with insights from the application domain, is effective in reducing the computation time and the energy consumption, with acceptable bounds on the localization accuracy. Our work shows the opportunity of introducing controlled approximation in SLAM algorithms.

Acknowledgments

This work was supported by NSF grants 1337281, 1406355, and 1618425, and by DARPA contracts FA8750-16-2-0004 and FA8650-15-C-7563. The authors would like to thank Behzad Boroujerdian for helpful discussions about SLAM.

References

  • Bodin et al. (2016) Bruno Bodin, Luigi Nardi, M. Zeeshan Zia, Harry Wagstaff, Govind Sreekar Shenoy, Murali Emani, John Mawer, Christos Kotselidis, Andy Nisbet, Mikel Lujan, Björn Franke, Paul H.J. Kelly, and Michael O’Boyle. 2016. Integrating Algorithmic Parameters into Benchmarking and Design Space Exploration in 3D Scene Understanding. In Proceedings of the 2016 International Conference on Parallel Architectures and Compilation (PACT ’16). ACM, New York, NY, USA, 57–69. https://doi.org/10.1145/2967938.2967963
  • Boikos and Bouganis (2016) K. Boikos and C. S. Bouganis. 2016. Semi-Dense SLAM on an FPGA SoC. In 2016 26th International Conference on Field Programmable Logic and Applications (FPL). 1–4. https://doi.org/10.1109/FPL.2016.7577365
  • Borthwick and Durrant-Whyte (1994) S. Borthwick and H. Durrant-Whyte. 1994. Simultaneous Localisation and Map Building for Autonomous Guided Vehicles. In Intelligent Robots and Systems ’94. ’Advanced Robotic Systems and the Real World’, IROS ’94. Proceedings of the IEEE/RSJ/GI International Conference on, Vol. 2. 761–768 vol.2. https://doi.org/10.1109/IROS.1994.407552
  • Carbin et al. (2012) Michael Carbin, Deokhwan Kim, Sasa Misailovic, and Martin C. Rinard. 2012. Proving Acceptability Properties of Relaxed Nondeterministic Approximate Programs. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’12). ACM, New York, NY, USA, 169–180. https://doi.org/10.1145/2254064.2254086
  • Carbin et al. (2013) Michael Carbin, Sasa Misailovic, and Martin C. Rinard. 2013. Verifying Quantitative Reliability for Programs That Execute on Unreliable Hardware. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications (OOPSLA ’13). ACM, New York, NY, USA, 33–52. https://doi.org/10.1145/2509136.2509546
  • Farrell and Hoffmann (2016) Anne Farrell and Henry Hoffmann. 2016. MEANTIME: Achieving Both Minimal Energy and Timeliness with Approximate Computing. In 2016 USENIX Annual Technical Conference (USENIX ATC 16). USENIX Association, Denver, CO, 421–435. https://www.usenix.org/conference/atc16/technical-sessions/presentation/farrell
  • Filieri et al. (2014) Antonio Filieri, Henry Hoffmann, and Martina Maggio. 2014. Automated Design of Self-adaptive Software with Control-Theoretical Formal Guarantees. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, 299–310. https://doi.org/10.1145/2568225.2568272
  • Grewal and Andrews (2014) Mohinder S. Grewal and Angus P. Andrews. 2014. Kalman Filtering: Theory and Practice with MATLAB (4th ed.). Wiley-IEEE Press.
  • Handa et al. (2014) A. Handa, T. Whelan, J. McDonald, and A. J. Davison. 2014. A Benchmark for RGB-D Visual Odometry, 3D Reconstruction and SLAM. In 2014 IEEE International Conference on Robotics and Automation (ICRA). 1524–1531. https://doi.org/10.1109/ICRA.2014.6907054
  • Hoffmann (2015) Henry Hoffmann. 2015. JouleGuard: Energy Guarantees for Approximate Applications. In Proceedings of the 25th Symposium on Operating Systems Principles (SOSP ’15). ACM, New York, NY, USA, 198–214. https://doi.org/10.1145/2815400.2815403
  • Imes et al. (2015) C. Imes, D. H. K. Kim, M. Maggio, and H. Hoffmann. 2015. POET: A Portable Approach to Minimizing Energy Under Soft Real-time Constraints. In 21st IEEE Real-Time and Embedded Technology and Applications Symposium. 75–86. https://doi.org/10.1109/RTAS.2015.7108419
  • Izadi et al. (2011) Shahram Izadi, David Kim, Otmar Hilliges, David Molyneaux, Richard Newcombe, Pushmeet Kohli, Jamie Shotton, Steve Hodges, Dustin Freeman, Andrew Davison, and Andrew Fitzgibbon. 2011. KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera. In Proceedings of the 24th annual ACM symposium on User interface software and technology. ACM, 559–568.
  • Mishra et al. (2018) Nikita Mishra, Connor Imes, John D. Lafferty, and Henry Hoffmann. 2018. CALOREE: Learning Control for Predictable Latency and Low Energy. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’18). ACM, New York, NY, USA, 184–198. https://doi.org/10.1145/3173162.3173184
  • Mur-Artal et al. (2015) R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós. 2015. ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Transactions on Robotics 31, 5 (Oct. 2015), 1147–1163. https://doi.org/10.1109/TRO.2015.2463671
  • Nardi et al. (2017) L. Nardi, B. Bodin, S. Saeedi, E. Vespa, A. J. Davison, and P. H. J. Kelly. 2017. Algorithmic Performance-Accuracy Trade-off in 3D Vision Applications Using HyperMapper. ArXiv e-prints (Feb. 2017). arXiv:1702.00505
  • Nardi et al. (2015) Luigi Nardi, Bruno Bodin, M. Zeeshan Zia, John Mawer, Andy Nisbet, Paul H. J. Kelly, Andrew J. Davison, Mikel Luján, Michael F. P. O’Boyle, Graham Riley, Nigel Topham, and Steve Furber. 2015. Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM. In IEEE International Conference on Robotics and Automation (ICRA). arXiv:1410.2167.
  • Newcombe et al. (2011) Richard A. Newcombe, Shahram Izadi, Otmar Hilliges, David Molyneaux, David Kim, Andrew J. Davison, Pushmeet Kohli, Jamie Shotton, Steve Hodges, and Andrew Fitzgibbon. 2011. KinectFusion: Real-Time Dense Surface Mapping and Tracking. In IEEE ISMAR. IEEE.
  • Oh et al. (2016) Jinwook Oh, Jungwook Choi, Guilherme C Januario, and Kailash Gopalakrishnan. 2016. Energy-Efficient Simultaneous Localization and Mapping via Compounded Approximate Computing. In Signal Processing Systems (SiPS), 2016 IEEE International Workshop on. IEEE, 51–56.
  • Palomar et al. (2017) Oscar Palomar, Andy Nisbet, John Mawer, Graham Riley, and Mikel Lujan. 2017. Reduced precision applicability and trade-offs for SLAM algorithms. In Third Workshop on Approximate Computing (WACAS).
  • Park et al. (2015) Jongse Park, Hadi Esmaeilzadeh, Xin Zhang, Mayur Naik, and William Harris. 2015. FlexJava: Language Support for Safe and Modular Approximate Programming. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). ACM, New York, NY, USA, 745–757. https://doi.org/10.1145/2786805.2786807
  • Pothukuchi et al. (2016) Raghavendra Pradyumna Pothukuchi, Amin Ansari, Petros Voulgaris, and Josep Torrellas. 2016. Using Multiple Input, Multiple Output Formal Control to Maximize Resource Efficiency in Architectures. In Proceedings of the 43rd International Symposium on Computer Architecture (ISCA ’16). IEEE Press, Piscataway, NJ, USA, 658–670. https://doi.org/10.1109/ISCA.2016.63
  • Rahmani et al. (2018) Amir M. Rahmani, Bryan Donyanavard, Tiago Mück, Kasra Moazzemi, Axel Jantsch, Onur Mutlu, and Nikil Dutt. 2018. SPECTR: Formal Supervisory Control and Coordination for Many-core Systems Resource Management. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’18). ACM, New York, NY, USA, 169–183. https://doi.org/10.1145/3173162.3173199
  • Ratter et al. (2013) A. Ratter, C. Sammut, and M. McGill. 2013. GPU Accelerated Graph SLAM and Occupancy Voxel Based ICP for Encoder-Free Mobile Robots. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. 540–547. https://doi.org/10.1109/IROS.2013.6696404
  • Saeedi et al. (2017) Sajad Saeedi, Luigi Nardi, Edward Johns, Bruno Bodin, Paul H. J. Kelly, and Andrew J Davison. 2017. Application-oriented Design Space Exploration for SLAM Algorithms. IEEE, 5716–5723. https://doi.org/10.1109/ICRA.2017.7989673
  • Samadi et al. (2014) Mehrzad Samadi, Davoud Anoushe Jamshidi, Janghaeng Lee, and Scott Mahlke. 2014. Paraprox: Pattern-Based Approximation for Data Parallel Applications. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’14). ACM, New York, NY, USA, 35–50. https://doi.org/10.1145/2541940.2541948
  • Sampson et al. (2011) Adrian Sampson, Werner Dietl, Emily Fortuna, Danushen Gnanapragasam, Luis Ceze, and Dan Grossman. 2011. EnerJ: Approximate Data Types for Safe and General Low-Power Computation. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’11). ACM, New York, NY, USA, 164–174. https://doi.org/10.1145/1993498.1993518
  • Sampson et al. (2014) Adrian Sampson, Pavel Panchekha, Todd Mytkowicz, Kathryn S. McKinley, Dan Grossman, and Luis Ceze. 2014. Expressing and Verifying Probabilistic Assertions. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’14). ACM, New York, NY, USA, 112–122. https://doi.org/10.1145/2594291.2594294
  • Sidiroglou-Douskos et al. (2011) Stelios Sidiroglou-Douskos, Sasa Misailovic, Henry Hoffmann, and Martin Rinard. 2011. Managing Performance vs. Accuracy Trade-offs With Loop Perforation. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering (ESEC/FSE ’11). ACM, New York, NY, USA, 124–134. https://doi.org/10.1145/2025113.2025133
  • Sui et al. (2016) Xin Sui, Andrew Lenharth, Donald S. Fussell, and Keshav Pingali. 2016. Proactive Control of Approximate Programs. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’16). ACM, New York, NY, USA, 607–621. https://doi.org/10.1145/2872362.2872402
  • Tibshirani (1996) Robert Tibshirani. 1996. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society 58, 1 (1996), 267–288.
  • Whelan et al. (2015) Thomas Whelan, Stefan Leutenegger, Renato Salas Moreno, Ben Glocker, and Andrew Davison. 2015. ElasticFusion: Dense SLAM Without A Pose Graph. In Proceedings of Robotics: Science and Systems. Rome, Italy. https://doi.org/10.15607/RSS.2015.XI.001
  • Whelan et al. (2012) Thomas Whelan, John Mcdonald, Michael Kaess, Maurice Fallon, Hordur Johannsson, and John J. Leonard. 2012. Kintinuous: Spatially Extended KinectFusion. In 3rd RSS Workshop on RGB-D: Advanced Reasoning with Depth Cameras.
  • Pei et al. (2017) Yan Pei, Swarnendu Biswas, Donald S. Fussell, and Keshav Pingali. 2017. An Elementary Introduction to Kalman Filtering. ArXiv e-prints (Oct. 2017). arXiv:1710.04055