I Introduction
Autonomous vehicles operating in the real world must navigate through a priori unknown environments using onboard, limitedrange sensors. As a vehicle makes progress towards a goal and receives new sensor information about the environment, rigorous safety analysis is critical to ensure that the system’s behavior does not lead to dangerous collisions. In order to provide such safety guarantees for real vehicles, any analysis should take into account multiple sources of uncertainty, such as modelling error, external disturbances, and unknown parts of the environment.
A variety of mechanisms have been proposed to ensure robustness to modeling error and external disturbances [24, 16, 34]. Additionally, safety guarantees for systems using limitedrange sensors in unknown environments have been the subject of recent investigation [21, 22, 32, 19]. Although interesting results have emerged from these studies, the safety guarantees are provided by imposing specific assumptions on the sensor and/or the planner that are rather restrictive for a variety of realworld autonomous systems and sensors used for navigational purposes. In contrast, rather than proposing a specific planning and sensing paradigm that guarantees safety, we aim to design a safety framework that is compatible with a broad class of sensors, planners, and dynamics.
There are two main challenges with providing such a framework. The first challenge relates to ensuring safety with respect to unknown parts of the environment and external disturbances while minimally interfering with goaldriven behavior. Second, realtime safety assurances need to be provided as new environment information is acquired, which requires approximations that are both computationally efficient and not overly conservative. Moreover, this safety analysis should be applicable to a wide variety of realworld sensors, planners, and vehicles.
In this paper, we propose a safety framework that can overcome these challenges for autonomous vehicles operating in a priori unknown static environments under the assumption that the sensors work perfectly within their ranges. Erroneous and noisy sensors can make safety analysis significantly more challenging and we defer this to future work. Our framework is based on Hamilton Jacobi (HJ) reachability analysis [25, 27], a verification method for guaranteeing safety and performance for dynamical systems with general nonlinear dynamics and disturbances. This approach provides not only the set of states from which the dynamical system can always satisfy state constraints (i.e. remain collisionfree), but also provides an optimal controller that guarantees the system will never violate the state constraints. In particular, we treat the unknown environment at any given time as an obstacle and use HJ reachability to compute the backward reachable set (BRS), i.e. the set of states from which the vehicle can enter the unknown and potentially unsafe part of the environment, despite the best control effort. The complement of the BRS therefore represents the safe set for the vehicle. With this computation, we also obtain the corresponding least restrictive safety controller, which does not interfere with the planner unless the safety of the vehicle is at risk. Use of HJ reachability analysis in our framework thus allows us to overcome the first challenge—our framework can be applied to general nonlinear vehicles, sensors, and planners.
In general, due to the computationally expensive nature of HJ computations, this approach has not been leveraged in settings where the environment is not known beforehand and rather is sensed at runtime. To overcome this challenge, we propose a novel, realtime algorithm to compute the BRS. Our algorithm only locally updates the BRS in light of new environment information, which significantly alleviates the computational burden of HJ reachability while still maintaining the safety guarantees at all times. To summarize, our key contributions are:

a provablysafe framework for navigation in static unknown environments that is applicable to a broad class of sensors, planners, and dynamics,

an algorithm for online safe set updates from new sensor measurements as the robot navigates,

demonstration of our approach on different sensors and planners on a vehicle with nonlinear dynamics in the presence of external disturbances,

a hardware demonstration of our approach to provide safety around a stateoftheart learningbased planner which uses only monocular RGB images for planning.
Ii Related Work
An extensive body of research deals with motion planning and safe exploration for robots in unknown environments, some of which focuses on safety guarantees despite modeling error and external disturbances. We cannot hope to summarize all these works here, but we attempt to discuss several of the most closely related approaches.
Iia Safe motion planning
Methods that ensure safety despite modeling error and disturbances are largely motivated by the tradeoff between safety and efficiency during realtime planning. A popular approach is to perform offline computations that quantify disturbances and modeling error which can be used online to determine collisionfree trajectories [24, 16, 34]. Alternatively, [2, 3] use control barrier functions to design provably stable controllers while satisfying given statespace constraints. However, these methods assume that a recursively feasible collisionfree path can be obtained despite the unknown environment, which may not be possible in realworld environments. Several works address this problem for singleagent scenarios within a model predictive control framework [29, 31], as well as for multiple vehicles using sequential trajectory planning [33, 5]. However, these works assume a priori knowledge of all obstacles, whereas our framework guarantees safety in an a priori unknown environments for potentially highorder nonlinear dynamics.
Ensuring safety with respect to both modeling error and limited sensing horizons have been studied using sumofsquares [21], linear temporal logic [22], reactive synthesis approaches [32], graphbased kinodynamic planner [19] among others. These works typically impose restrictions on sensors or planners to ensure safety with respect to the unknown environment. In contrast, the proposed framework is sensor and planner agnostic, provided that the sensor can accurately identify the obstacles within it’s sensing region.
IiB Safe exploration
The problem of finding feasible trajectories to a specified goal in an unknown environment has also been studied in the robotic exploration literature for simplified kinematic motion models using frontierexploration methods [36] and D* [20]. Other works include samplingbased motion planners for driftless dynamics [8] and dynamic exploration methods for vehicles with a finite stopping time [17]
. Robotic exploration has been also studied within the context of fully and partially observable Markov decision processes
[30, 28][1, 18]to reduce collision probabilities; however, no theoretical safety guarantees are typically provided.
Safe exploration has also been studied in terms of Lyapunov stability [9, 11]. Even though stability is often desirable, it is insufficient to guarantee collision avoidance. In contrast, our formulation uses a stronger definition of safety, and is more in line with [15, 13], which characterize safety using reachable sets.
Iii Problem Statement
In this work, we study the problem of autonomous navigation in apriori unknown static environments. Consider a stable, deterministic, nonlinear dynamical model of the vehicle
(1) 
where , , and represent the state, the control, and the disturbance experienced by the vehicle. Here, can include the effect of both the external disturbances or dynamics mismatch. For convenience, we partition the state into the position component and the nonposition component : . We assume that the flow field is uniformly continuous in time, and Lipschitz continuous in for fixed and . With this assumption, given , there exists a unique trajectory solving (1) [12]. We also assume that the vehicle state can be accurately sensed at all times.
Let and denote the start and the goal state of the vehicle. The vehicle aims to navigate from to in an a priori unknown environment, , whose map or topology is not available to the robot. At any time and state , the vehicle has a planner , which outputs the control command to be applied at time . The vehicle also has a sensor which at any given time exposes a region of the state space
, and provides a conservative estimate of the occupancy within
. For example, if the vehicle has a camera sensor, would be a triangular region (prismatic in 3D) representing the fieldofview of the camera. We assume perfect perception within this limited sensor range. Dealing with erroneous perception, sensor noise, and dynamic environments are problems in their own right, and we defer them to future work. Finally, we assume that there is a known initial obstaclefree region around given by ; e.g. this is the case when the vehicle is starting at rest and its initial state is collisionfree.Given , , , the planner , and the sensor , the goal of this paper is to design a least restrictive control mechanism to navigate the vehicle to the goal state while remaining safe, which means avoiding obstacles at all times. Since the environment is unknown, the safety needs to be ensured given the partial observations of the environment obtained through the sensor, which in general is challenging. We use the HJ reachabilitybased framework to ensure safety despite only partial knowledge of the environment.
Iv Running Example
To illustrate our approach and provide intuition behind the proposed framework, we introduce a simple running example: a 3dimensional Dubins’ car system with disturbances added to the velocity. The dynamics of the system are given by:
(2)  
where is the state, is the position, is the heading, and is the disturbance experienced by the vehicle. The control of the vehicle is , where is the speed and is the turn rate. Both controls have a lower and upper bound, which for this example are chosen to be , , . The disturbance bound is chosen as .
The environment setup for is shown in Figure 2. The vehicle start and the goal state are given by (shown in black) and (the center of the green area). The goal is to reach within of (the light green area). However, there is an obstacle in the environment which is not known to the vehicle beforehand (shown in grey). At the beginning of the running example navigation task, we assume that there is no obstacle within of , and obtain the initial obstaclefree region (the area inside the dashed black line).

To demonstrate the sensoragnostic nature of our approach, we simulate the Dubins’ car with two different sensors: a LiDAR and a camera. For a LiDAR, the sensing region is given by a circle of radius centered around the current position , where in this simulation (shown in Figure 2). For a camera, the sensing region is determined by a triangular region with solid angle (also called the fieldofview) and apex at the current vehicle heading, and a maximum extent of . We use and for our simulations (shown in Figure 2). However, part of the regions of can be occluded by the obstacles, as would be the case for any realworld sensors.
Additionally, for each sensor, we demonstrate our approach on two different planners : a samplingbased planner and a modelbased planner. For the samplingbased planner, we use a RapidlyExploring Random Tree (RRT) [23], and for the modelbased planner, we use a splinebased planner [35]. Our goal is to safely navigate to the goal position despite the unknown obstacles.
V HamiltonJacobi Reachability
Our framework is based on Hamilton Jacobi (HJ) reachability analysis [25, 27]. This analysis has been successfully applied in a variety of domains, such as aircraft autolanding and safe multivehicle path planning [5, 6]. In this work, we will be using reachability analysis to compute a backward reachable set (BRS) given a set of unsafe states . Intuitively, is the set of states such that the system trajectories that start from this set can enter within a time horizon of for some disturbance despite the best control efforts. In contrast, for any trajectory that starts from , there exists a control such that the system trajectory will never enter , despite the worstcase disturbance. Here, represents the complement of the set .
The computation of the BRS can be formulated as a differential game between the control and disturbance, which can be solved using the principle of dynamic programming. The cost functional corresponding to this differential game is given by:
(3) 
where represents the system state at time starting from state at time and applying control with disturbance . In (3), the function is the implicit surface function representing the unsafe set . Intuitively, keeps track of whether the system trajectory even entered the unsafe set during the time horizon , and if so, the cost corresponding to that trajectory is negative.
The value function corresponding to the cost functional in (3) is given by:
(4) 
where represents the set of nonanticipative strategies [27]. If the value function is negative for a given state, then starting from this state the system cannot avoid entering into the unsafe set eventually. Thus, the value function in (4) keeps track of all unsafe trajectories of the system, which in turn can be used to compute the safe trajectories for the system. For further details on this formulation, we refer the interested readers to [27, 6].
The value function in (4) can be obtained using dynamic programming, which yields a Hamilton JacobiIsaacs Variational Inequality (HJIVI) [14, 25]. Ultimately, a BRS can be computed by solving the following final value HJIVI:
(5)  
Here, and denote the time and space derivatives of the value function respectively. The Hamiltonian, , encodes the role of system dynamics, control, and disturbance, and is given by
(6) 
Once the value function is computed, the BRS, and consequently, the set of safe states are given by
(7)  
(8) 
HJI reachability also provides the optimal control to keep the system in the safe set and is given by
(9) 
In fact, the system can safely apply any control as long as it is not at the boundary of the unsafe region. If the system reaches the boundary of , the control in (9) steers the system away from the unsafe states. This least restrictive controller provided by HJI reachability is also the basis for ensuring safety for any planner in a least restrictive fashion in our framework.
Vi Our Approach
We propose an HJreachabilitybased framework to ensure safety in an a priori unknown environment. Our framework also uses a novel, realtime computation of a conservative approximation of the safe set based on new observations of the environment as the vehicle is navigating. We first describe our framework, and then present a realtime algorithm to update the safe set.
Via Ensuring safety in unknown environments
Our framework treats the unsensed environment at any given time as an obstacle. The unsensed environment along with the sensed obstacles are used to compute a safe region for the vehicle using HJI reachability. This ensures that the vehicle never enters the unknown and potentially unsafe part of the environment, despite the worst case disturbance.
Let denote the sensed obstaclefree region of the environment at any time . Given the initial obstaclefree region , we compute the corresponding safe set by solving the HJIVI in (5), assuming everything outside is an obstacle. For this computation, the initial value function in (5) is given by , where is positive inside and negative outside. One such function is given by the signed distance to . Starting from , the HJIVI is solved to obtain the value function . Here, is the dummy computation variable in (5). is then used to compute the safe region (see (8)). As long as the vehicle is inside , a controller exists to ensure that it does not collide with the known or unknown obstacles.
We next execute a controller on the system for the time horizon as per the following control law:
(10) 
where is the optimal safe controller corresponding to and is given by (9). Also, until the safe set is updated, we use the last computed safe set for finding the optimal safe controller, i.e., . The control mechanism in (10) is least restrictive in the sense that it lets the planner execute the desirable control on the system, except when the system is at the risk of violating safety. Note that the control horizon in our framework can be arbitrarily chosen by the system designer while still ensuring safety.
While the system is executing the control law in (10), it will obtain new sensor measurements at each time , which is used to obtain , the free space sensed at that time. If the sensor is completely occluded by an obstacle at any time, the corresponding free space is an empty set. Thus, the overall known free space at time is given by:
(11) 
At the end of the control time horizon, we compute another safe region assuming everything outside is an obstacle. This safe region is obtained by solving HJIVI until convergence. We then execute a control law similar to in (10), except that the safety controller intervenes only when the system is at the boundary of . The entire procedure is repeated until the system reaches the goal state.
Since the safety controller does not allow the system trajectory to leave the known free space, the proposed framework is guaranteed to avoid collision at all times. However, the safe set can be rather conservative especially early on when most of the environment is still unexplored, which is a tradeoff we make to ensure safety against all unexpected obstacles. If additional information about the obstacles in the environment is known, it can be incorporated and will only reduce the conservativeness of the safe set.
Note that the safe set does not necessarily need to be updated every seconds. It can be updated faster, slower or at the same rate as the control horizon. Essentially one can use the most recent safe set in the control law in (10), and still ensure safety at all times. This is because the safe set at any time is only smaller than the safe set at time when . However, the safe set should be updated as quickly as possible to minimize interference with the planner.
ViB Efficient update of the BRS
Our framework requires the computation of a safe set in realtime as the vehicle is navigating through the environment. In general, this is challenging due to the exponentially scaling computational complexity of HJI reachability with respect to the state dimension [6]. To mitigate some of the computational challenges, we introduce two novel approaches to computing the BRS: warmstarting and local value function updates.
ViB1 Warmstart approach
At any given time, the vehicle senses only a small additional part of the environment. Consequently, the free space map only changes by a small amount in a small time horizon. Intuitively, this should only cause a small change in the safe region. We leverage this intuition to propose a novel, faster way to update the reachable set. For brevity, we explain our approach assuming that the safe set is updated every seconds, but the same results hold when the safe set is updated at a nonfixed rate.
Given the last computed safe set at time , and the maps at and the current time , we “warmstart” the value function in (5) for the BRS computation at time as follows:
(12) 
where as before is defined such that it is positive inside and negative outside. Intuitively, instead of initializing the value function with everywhere in the state space, (12) initializes it with the last computed value function for the states where no new information has been obtained since the last computation, and with only at the states which were previously assumed to be occupied but are actually obstaclefree. This leads to a much faster computation of BRS because the value function needs to be updated only for a much smaller number of states that are newly found out to be free. At all the other states, the value function is already almost accurate and only small refinements are required. Interestingly, this procedure also maintains the conservativeness of the safe region, which is sufficient to ensure collision avoidance at all times.
Lemma 1
Let be the solution of the following warmstarted HJIVI:
where is defined as in (12). Let be the solution of the HJIVI in (5) with . Then for all . In particular, and .
Since represents the converged value function at time , we have . Therefore, from (12), . Now we are ready to prove our claim.
Intuitively, Lemma 1 states that the safe set obtained by the warmstart approach is an underapproximation of the actual safe set obtained by solving full HJIVI. Thus, it can be used to ensure safety for the vehicle while being computationally efficient. In practice, for the sensors and navigation problems in this paper, the amount of conservatism incurred by warmstarting is very small, as we demonstrate in Section VII. Our overall approach with warmstarting to update the safe set is summarized in Algorithm 1.
We start with the initial known free space and compute the initial safe set using HJIVI (Line 6). The value function for this computation is initialized by the signed distance to . We also maintain the last computed BRS , the safe set , and the corresponding time (Line 7). At every state, the vehicle obtains the current sensor observation and extracts the sensed free space (Line 9 and 10). Next, a control command is applied to the vehicle (Line 11). If the vehicle is inside , the planner is used to obtain the control command; otherwise, the safety controller is applied. Every seconds, the safe set and controller are updated based on the free space sensed by the vehicle so far using HJIVI (Line 15). The value function for this computation is warmstarted with except at the states which are discovered to be obstacle free since as described in (12) (Line 14). The whole procedure is repeated until the vehicle reaches its goal.
ViB2 Local update of the BRS
In the last section, we discussed how warmstarting the value function computation might lead to a faster convergence of the value function; however, the value function is still computed over the entire state space. In this section, we present a more practical algorithm that leverages the advantages of warmstarting by computing and updating the value function only locally at the states for which new information has been obtained since the last value function computation. Our safety framework is still same as what described in Algorithm 1—only the computational procedure for the safe set computation (Line 15 in Algorithm 1) is being modified to update the value function locally. We outline this procedure in Algorithm 2.
In Algorithm 2, we maintain a list of states at which the value function needs to be updated in light of the new environment observations. is initialized to be the set of states that are newly discovered to be free since , i.e., (Line 1). Since the change in the value of the states in (compared to ) would also cause a change in the value of the neighboring states, , we also add them to (Line 2). Thus, . Typically, the value function in HJIVI is computed by discretizing the statespace into a grid and solving the VI over that grid. In such cases, the spatial derivative of the value function (required to compute the Hamiltonian in the HJIVI in (5)) is computed numerically using the neighboring grid points. This spatial derivative is precisely responsible for the propagation of the change in the value function at a state to its neighboring states. In such cases, might represent the neighboring grid points used to compute the spatial derivative of the value function for the states in ; however, other neighboring criteria can be used.
Once the neighbors are added to , the value for all the states in is initialized as per (12) (Line 3), and their value is updated using HJIVI in (5) for some time step (Line 6). This computation is much faster than classical HJIVI computation since it is typically performed for many fewer states. Next, we remove all those states from whose value function hasn’t changed significantly over this (Line 8 and 9), as these states won’t cause any change in the value function for any other state. The neighbors of the remaining states are next added to (Line 10) and the entire procedure is repeated until the value function is converged for all states. Note that Algorithm 2 still maintains the conservatism of the safe set since it is just a different computational procedure for computing the warmstarted value function, which is still used within the safety framework in Algorithm 1.
Vii Simulations
Viia Running example revisited
We now return to our running example and demonstrate the proposed approach in simulation (described in Section IV). We implement our safety framework with three different methods to update the BRS: using the full HJIVI, the warmstart approach (Section VIB1), and the local update approach (Section VIB2). The corresponding system trajectories for different planners and sensors for all the three methods are shown in Figure 3. For all combinations of planners and sensors, the proposed framework is able to safely navigate the vehicle to its goal position despite the external disturbances and no a priori knowledge of the obstacle (none of the trajectories go through the obstacle). As the vehicle navigates through the environment, the planner makes optimistic decisions at several states that might lead to a collision; however, the safety controller intervenes to ensure safety. States where the safety controller is applied are marked in red.
Note that the safety controller intervenes more frequently for the camera sensor as compared to the LiDAR. This is because the fieldofview (FoV) of a camera is typically much narrower than a LiDAR (which senses the obstacles in all directions within a range). Given this limited FoV, the safety controller needs to account for a much larger unexplored environment, which in turn leads to more cautious control.
We compare the computation time required for each of the three methods to compute the BRS for the camera and LiDAR sensors in Table I. All computations were done on a MATLAB implementation on a desktop computer with a Core i7 5820K processor using the Level Set toolbox [26]. As expected, across all scenarios, warmstarting the value function for the BRS computation leads to a significant improvement in computation time compared to full HJIVI; however, the computation time might still not be practical for most realworld applications. Only locally updating the value function in addition to warmstarting leads to a significant further improvement in the computation time, and the BRS is updated in approximately 1s on average for all sensors and planners. This improvement is impressive considering that the computation was done in MATLAB without any parallelization which is known to decrease the computation time by a factor of 100 [10].
Lemma 1 indicates that the safe set obtained by warmstarting the value function is conservative compared to the one obtained by full HJIVI. Therefore, we also compare the percentage volume of the states at which the safe set is conservative. This overconservative volume is typically limited to 0.5% which indicates that the warmstarting approach is able to approximate the true value function quite well.


Finally, we take a closer look at how the safe control comes into play when the system is operating with a rangelimited sensor. Figure 4 showcases a Dubins’ car with a camera sensor and an RRT planner, where the current robot state is shown in black, the corresponding sensed region is in dark blue, and the trajectory and corresponding sensed regions are shown in grey and light blue respectively. Since the camera’s FoV is occluded by an obstacle at the current state, it cannot sense the environment past the obstacle. Figure 4 illustrates the corresponding current belief map of the environment which is the union of the free space sensed by the vehicle so far (shown in white). Since the current sensed region is contained within the sensed region at the previous state, no new environment information is obtained and hence the BRS is not updated. The slice of the safe set at the current vehicle heading is shown in Figure 4 (the area within the red boundary). Since the vehicle is at the boundary of the safe set, the safety controller intervenes and applies a control that leads the system towards the interior of the safe set (the red arrow) to ensure collision avoidance.
ViiB Safety for a learningbased planner
Since the proposed safety framework is planneragnostic, we can use it to ensure safe navigation even in the presence of a learningbased planner. In particular, we use the visionbased planner proposed in [7]
, which takes an RGB camera image and the goal position as input, and uses a Convolutional Neural Networkbased perception module to produce a desired next state that moves the robot towards its goal while trying to avoid obstacles on its way. This desired next state is used by a modelbased lowlevel planner to produce a smooth trajectory from the vehicle’s current state to the desired state. The authors demonstrate that the proposed planner can leverage robot’s prior experience to navigate efficiently in novel indoor cluttered environments; however, it still leads to collisions in several realworld scenarios, like when the vehicle needs to go through narrow spaces. We use the proposed safety framework to ensure both safe and efficient planning in such difficult navigation scenarios.
The task setup for our simulation is shown in Figure 5. The robot needs to go through a very narrow hallway, followed by a door into the room to reach its goal (the green circle) starting from the initial state (black arrow). At the beginning, the robot has no knowledge about the obstacles (shown in dark grey). We simulate this scenario using the S3DIS dataset which contains mesh scans of several Stanford buildings [4]. By rendering this mesh at any state, we can obtain the image observed by the camera (used by the planner) as well as the occupancy information within the robot’s FoV (used for the safety computation). For the robot dynamics, we use the 4D Dubins’ car model:
(13) 
where is the position, is the heading, and is the speed of the vehicle. The control is , where is the acceleration and is the turn rate.
The trajectory taken by the learningbased planner in the absence of the safety module is shown in Figure 5. Even though the vehicle is able to go through the narrow hallway, it collides with the door eventually. The trajectory taken by the vehicle when the planner is combined with the proposed safety framework is shown in Figure 5. When the planner takes an unsafe action near the door, the safety controller intervenes (marked in red) and guides the robot to safely go through the doorway. We also illustrate the image observed by the robot near the doorway in Figure 5. Even though most of the robot’s vision is blocked by the door, the planner makes a rather optimistic decision of moving forward and leads to a collision. In contrast, the safety controller makes a conservative decision of rotating in place to explore the environment more before moving forward, and eventually goes through the doorway to reach the goal. The planneragnostic nature of our framework allows us to provide safety guarantees around learningbased planners as well.
Viii Experiments
We test the proposed approach in hardware using a TurtleBot 2 with a mounted stereo RGB camera. For the vehicle state measurement, we use the onboard odometry sensors on the TurtleBot. In our experiment, the vehicle needs to navigate through an unknown cluttered indoor environment to reach its goal (shown in Figure 1). For the BRS computation, we use the dynamics model in (13). We premap the environment using an opensource Simultaneous Localization and Mapping (SLAM) algorithm and the onboard stereo camera. This premapping step is used to avoid the significant delay and inaccuracies in the realtime SLAM map update. However, the full map is not provided to the robot during deployment. Instead, for the safe set computation at any given time, the current FoV of the camera is projected on the SLAM map and only the information within the FoV is used to update the safe set. This emulates the limited sensor range of the Turtlebot’s camera. Regardless, this alludes to one of the important and interesting future research directions of ensuring safety despite sensor noise.
For planning, we use the learningbased planner described in Section VIIB that uses the current RGB image to determine a candidate next state. A topview of our experiment setting is shown in Figure 6. The vehicle starting position and heading are shown in black, the goal region is shown in green, and the obstacles (unknown to the vehicle beforehand) are shown in grey. We ran the experiment with and without the safety controller and show the corresponding trajectories in Figure 6. Without the safety controller, the learningbased planner struggles with making sharp turns near the corner, and eventually collides into the obstacle (the chair, in this case). For context, we also show the RGB observation received by the planner near the corner. Even though the robot is very close to the chair, the planner makes the unsafe decision of continuing to move forward. However, when the learningbased planner is used within the proposed safety framework, the safety controller is able to account for this unsafe situation and safely steer the vehicle away from the obstacle. We show the corresponding safe set when the vehicle is at the obstacle boundary and the corresponding vehicle trajectory obtained using the safety controller. Afterwards, the planner takes over and steers the vehicle to the goal.
Ix Practical Considerations
When implementing a safety framework on real systems, there are many practical considerations that should be acknowledged. Below we discuss some of the main practical considerations we encountered during our simulation and hardware experiments when using our proposed framework:

Since the value function is computed over a discretized state space, it might incur some numerical inaccuracies. Using a finer discretization and a higher order approximation for the spatial derivative is often helpful in alleviating these issues; however, the computation time also increases consequently. In our experience, we have found the 3rd or higher order approximations schemes to work pretty well.

Due to the complicated geometry of realworld obstacles, the sensed map at any given time could appear highly irregular. Such irregular maps induce irregular implicit surface functions, which can significantly hamper the value function computation. Thus, it might often be desirable to convert the occupancy map into a regular, wellbehaved function, such as a signed distance map, and use that for the value function computation.

Theoretically speaking, the safety controller only needs to be applied when the vehicle is at the boundary of the safe set. However, in practice, due to numerical inaccuracies, it might be desirable to apply the safety controller at a positive level of the value function.
X Conclusion And Future Work
Providing safety guarantees for realworld autonomous systems operating in a priori unknown environments is a challenging but important problem. In this paper, we propose an HJ reachabilitybased safety framework for navigation in unknown environments that is applicable to a wide variety of planners and sensors. To overcome the computation complexity of classical HJ reachability analysis, we propose a novel, realtime algorithm to update the reachable set as the vehicle traverses the environment. We demonstrate our approach on multiple sensors and planners, including a learningbased planner, both in simulation and on a hardware testbed.
Several interesting future directions emerge from this work. First, the proposed framework currently assumes perfect state estimation and sensor measurements. However, these assumptions may not hold in practice, and need to be appropriately accounted for in the framework. Additionally, the current safety guarantees hold for static environments–extensions to dynamic, multiagent environments is another interesting and valuable research direction.
Acknowledgments
This research is supported in part by the DARPA Assured Autonomy program under agreement number FA875018C0101, by NSF under the CPS Frontier project VeHICaL project (1545126), and by SRC under the CONIX Center, and by Berkeley Deep Drive.
The authors would also like to thank Kene Akametalu, Ellis Ratner, and Anca Dragan for their helpful advice on various technical issues examined in this paper.
References
 [1] J. Achiam, D. Held, A. Tamar, and P. Abbeel. Constrained policy optimization. In ICML, 2017.
 [2] A. Agrawal and K. Sreenath. Discrete control barrier functions for safetycritical control of discrete systems with application to bipedal robot navigation. In RSS, 2017.
 [3] A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada. Control barrier function based quadratic programs for safety critical systems. IEEE Transactions on Automatic Control, 62(8):3861–3876, 2017.

[4]
I. Armeni, A. Sax, A. R. Zamir, and S. Savarese.
Joint 2D3DSemantic Data for Indoor Scene Understanding.
arXiv preprint, 2017.  [5] S. Bansal, M. Chen, J. Fisac, and C. J. Tomlin. Safe sequential path planning under disturbances and imperfect information. In ACC, 2017.
 [6] S. Bansal, M. Chen, S. Herbert, and C. J. Tomlin. HamiltonJacobi reachability: A brief overview and recent advances. In CDC, 2017.
 [7] S. Bansal, V. Tolani, S. Gupta, J. Malik, and C. Tomlin. Combining optimal control and learning for visual navigation in novel environments. arXiv preprint, 2019.
 [8] K. E. Bekris and L. E. Kavraki. Greedy but safe replanning under kinodynamic constraints. In ICRA, 2007.
 [9] F. Berkenkamp, M. Turchetta, A. Schoellig, and A. Krause. Safe modelbased reinforcement learning with stability guarantees. NIPS ’17.
 [10] M. Chen, S. Bansal, K. Tanabe, and C. J. Tomlin. Provably safe and robust drone routing via sequential path planning: A case study in san francisco and the bay area. arXiv preprint, 2017.
 [11] Y. Chow, O. Nachum, E. DuenezGuzman, and M. Ghavamzadeh. A lyapunovbased approach to safe reinforcement learning. In NIPS ’18.

[12]
E. A. Coddington and N. Levinson.
Theory of ordinary differential equations
. McGrawHill, New York, 1955. pp. 1–42.  [13] J. Fisac, A. Akametalu, M. Zeilinger, S. Kaynama, J. Gillula, and C. J. Tomlin. A general safety framework for learningbased control in uncertain robotic systems. IEEE Trans. on Automatic Control, 2018.
 [14] J. Fisac, M. Chen, C. J. Tomlin, and S. Sastry. Reachavoid problems with timevarying dynamics, targets and constraints. In HSCC, 2015.
 [15] T. Fraichard and H. Asama. Inevitable collision statesa step towards safer robots? Advanced Robotics, 18(10):1001–1024, 2004.
 [16] S. Herbert, M. Chen, S. Han, S. Bansal, J. Fisac, and C. J. Tomlin. Fastrack: a modular framework for fast and guaranteed safe motion planning. In CDC, 2017.
 [17] L. Janson, T. Hu, and M. Pavone. Safe motion planning in unknown environments: Optimality benchmarks and tractable policies. arXiv preprint, 2018.
 [18] G. Kahn, A. Villaflor, V. Pong, P. Abbeel, and S. Levine. Uncertaintyaware reinforcement learning for collision avoidance. arXiv preprint, 2017.
 [19] D.F. Keil, J. Fisac, and C. J. Tomlin. Safe and complete realtime planning and exploration in unknown environments. arXiv preprint, 2018.
 [20] S. Koenig and M. Likhachev. Fast replanning for navigation in unknown terrain. IEEE Trans. on Robotics, 21(3):354–363, 2005.
 [21] S. Kousik, S. Vaskov, F. Bu, M. J. Roberson, and R. Vasudevan. Bridging the gap between safety and realtime performance in recedinghorizon trajectory design for mobile robots. arXiv preprint, 2018.
 [22] M. Lahijanian, M. R. Maly, D. Fried, L. E. Kavraki, H. KressGazit, and M. Vardi. Iterative temporal planning in uncertain environments with partial satisfaction guarantees. IEEE Trans. on Robotics, 32:583–599, 2016.
 [23] Steven M LaValle. Rapidlyexploring random trees: A new tool for path planning. 1998.
 [24] A. Majumdar and R. Tedrake. Funnel libraries for realtime robust feedback motion planning. The International Journal of Robotics Research, 36(8):947–982, 2017.
 [25] K. Margellos and J. Lygeros. HamiltonJacobi Formulation for Reach–Avoid Differential Games. IEEE Trans. on Automatic Control, 56(8):1849–1861, 2011.
 [26] I. Mitchell. A toolbox of level set methods. http://www. cs. ubc. ca/mitchell/ToolboxLS/toolboxLS. pdf, Tech. Rep. TR200409, 2004.
 [27] I. Mitchell, A. Bayen, and C. J. Tomlin. A timedependent hamiltonjacobi formulation of reachable sets for continuous dynamic games. IEEE Trans. on automatic control, 50(7):947–957, 2005.
 [28] T. Moldovan and P. Abbeel. Safe exploration in markov decision processes. arXiv preprint, 2012.
 [29] A. Richards and J. How. Model predictive control of vehicle maneuvers with guaranteed completion time and robust feasibility. ACC ’03.
 [30] C. Richter, W. VegaBrown, and N. Roy. Bayesian learning for safe highspeed navigation in unknown environments. In Robotics Research, pages 325–341. 2018.
 [31] U. Rosolia and F. Borrelli. Learning model predictive control for iterative tasks. a datadriven control framework. IEEE Trans. on Automatic Control, 63(7):1883–1896, 2018.
 [32] S. Sarid, Bingxin X., and H. Kressgazit. Guaranteeing highlevel behaviors while exploring partially known maps. In RSS, 2012.
 [33] T. Schouwenaars, J. How, and E. Feron. Decentralized cooperative trajectory planning of multiple aircraft with hard safety guarantees. In AIAA Guidance, Navigation, and Control Conference, 2004.
 [34] S. Singh, A. Majumdar, J. Slotine, and M. Pavone. Robust online motion planning via contraction theory and convex optimization. In ICRA ’17.

[35]
R. Walambe, N. Agarwal, S. Kale, and V. Joshi.
Optimal trajectory generation for cartype mobile robot using spline interpolation.
IFACPapersOnLine, 49(1):601–606, 2016.  [36] L. Yoder and S. Scherer. Autonomous exploration for infrastructure modeling with a micro aerial vehicle. In Field and service robotics, 2016.