Technical Report: Reactive Semantic Planning in Unexplored Semantic Environments Using Deep Perceptual Feedback

02/25/2020 ∙ by Vasileios Vasilopoulos, et al. ∙ University of Pennsylvania

This paper presents a reactive planning system that enriches the topological representation of an environment with a tightly integrated semantic representation, achieved by incorporating and exploiting advances in deep perceptual learning and probabilistic semantic reasoning. Our architecture combines object detection with semantic SLAM, affording robust, reactive logical as well as geometric planning in unexplored environments. Moreover, by incorporating a human mesh estimation algorithm, our system is capable of reacting and responding in real time to semantically labeled human motions and gestures. New formal results allow tracking of suitably non-adversarial moving targets, while maintaining the same collision avoidance guarantees. We suggest the empirical utility of the proposed control architecture with a numerical study including comparisons with a state-of-the-art dynamic replanning algorithm, and physical implementation on both a wheeled and legged platform in different settings with both geometric and semantic goals.




I Introduction

I-A Motivation and Prior Work

Navigation is a fundamentally topological problem [4] reducible to a purely reactive (i.e., closed-loop state feedback based) solution, given perfect prior knowledge of the environment [24]. For geometrically simple environments, “doubly reactive” methods that reconstruct the local obstacle field on the fly [20, 9], or operate with no need for such reconstruction at all [2], can guarantee collision-free convergence to a designated goal with no need for further prior information. However, imperfectly known environments presenting densely cluttered or non-convex obstacles have heretofore required incremental versions of random sampling-based tree construction [10] whose probabilistic completeness can be slow to be realized in practice, especially when confronting settings with narrow passages [18].

Fig. 1: Ghost Spirit [5] following a human, while avoiding some familiar and some novel obstacles in a previously unexplored environment. Familiar obstacles are recognized and localized using visually detected semantic keypoints (bottom left inset) [21], combined with geometric features (top left inset) [3] and avoided by a local deformation of space (Fig. 3) that brings them within the scope of a doubly reactive navigation algorithm [2]. Novel obstacles are detected by LIDAR and assumed to be convex, thus falling within the scope of [2]. Formal guarantees are summarized in Theorems 1 and 2 of Section V, and experimental settings are summarized in Fig. 7.
Fig. 2: Snapshot Illustration of Key Ideas, following [33]: The robot moves in the physical space, in an environment with known exterior boundaries (walls), toward a goal (pink), discovering along the way (black) both familiar objects of known geometry but unknown location (dark grey) and unknown obstacles (light grey), with an onboard sensor of limited range (orange disk). As in [33], these obstacles are processed by the perceptual pipeline (Fig. 4) and stored permanently in the semantic space if they have familiar geometry, or temporarily, with just the corresponding sensed fragments, if they are unknown. The consolidated obstacles (formed by overlapping catalogued obstacles from the semantic space), along with the perceptually encountered components of the unknown obstacles, are again stored in the mapped space. A change of coordinates, entailing an online computation greatly streamlined relative to its counterpart in [33], deforms the mapped space to yield a geometrically simple but topologically equivalent model space. This new change of coordinates defines a vector field on the model space, which is transformed in realtime through the diffeomorphism to generate the input in the physical space.

Monolithic end-to-end learning approaches to navigation – whether supporting metric [7] or topological [25] representations of the environment – suffer from the familiar problems of overfitting to specific settings or conditions. More modular data-driven methods that separate the recruitment of learned visual representations to support learned control policies achieve greater generalization [28], but even carefully modularized approaches that handcraft the interaction of learned topological global plans with learned reactive motor control in a physically informed framework [17] cannot bake into their architectures the exploitation of crucial properties that robustify design and afford guaranteed policies.

I-B Summary of Contributions

I-B1 Architectural Contributions

In [33], we introduced a Deep Vision based object recognition system [21] as an “oracle” for informing a doubly reactive motion planner [2, 32], incorporating a Semantic SLAM engine [3] to integrate observations and semantic labels over time. Here, we extend this architecture in two different ways (see Fig. 4). First, new formal advances (described below) streamline the reactive computation, enabling robust online and onboard implementation (perceptual updates at 4Hz; reactive planning updates at 30Hz), affording tight realtime integration of the Semantic SLAM engine. Second, we incorporate a separate deep neural net that captures a wire mesh representation of encountered humans [13], enabling our reactive module to track and respond in realtime to semantically labeled human motions and gestures.

I-B2 Theoretical Contributions

We introduce a new change of coordinates, replacing the (potentially combinatorially growing) on-the-fly triangulation of [33] with a fixed convex decomposition [6] for each catalogued obstacle, and revisit the prior hybrid dynamics convergence result [33] to once again guarantee obstacle-free geometric convergence. However, this streamlined computation, enabling full realtime integration of the Semantic SLAM engine, now allows us to react logically as well as geometrically within unexplored environments. In turn, realtime semantics combined with human recognition capability motivates the statement and proof of new rigorous guarantees for the robot to track suitably non-adversarial (see Definition 4) moving targets, while maintaining collision avoidance guarantees.

I-B3 Empirical Contributions

We suggest the utility of the proposed architecture with a numerical study including comparisons with a state-of-the-art dynamic replanning algorithm [19], and physical implementation on both a wheeled and legged platform in highly varied environments (cluttered outdoor and indoor spaces including sunlight-flooded linoleum floors as well as featureless carpeted hallways). Targets are robustly followed up to speeds amenable to the perceptual pipeline’s tracking rate. Importantly, the semantic capabilities of the perceptual pipeline are exploited to introduce more complex task logic (e.g., track a given target unless encountering a specific human gesture).

I-C Organization of the Paper and Supplementary Material

After stating the problem in Section II, Section III describes the environment representation assumed for the diffeomorphism construction between the mapped and model spaces in Section IV. Section V includes our main formal results, Section VI and Section VII continue with our numerical and experimental studies, and Section VIII concludes with ideas for future work. The supplementary video submission provides visual context for our empirical studies; we also include pointers to open-source software implementations, including both the MATLAB simulation package and the ROS-based controller, in C++ and Python.

II Problem Formulation

As in [32, 33], we consider a robot with radius , centered at , navigating a compact, polygonal, potentially non-convex workspace , with known boundary , towards a target . The robot is assumed to possess a sensor with fixed range , for recognizing “familiar” objects and estimating distance to nearby obstacles. (For our hardware implementation, this idealized sensor is reduced to a combination of a LIDAR for distance measurements to obstacles and a monocular camera for object recognition and pose identification.) We define the enclosing workspace as the convex hull of the closure of the workspace.

The workspace is cluttered by a finite but unknown number of disjoint obstacles, denoted by . The set also includes non-convex “intrusions” of the boundary of the physical workspace into , that can be described as the connected components of . As in [2, 33], we define the freespace as , where is the open ball centered at with radius , and denotes its closure. Similarly to the enclosing workspace, we define the enclosing freespace as the convex hull of the closure of the freespace.

Although none of the positions of any obstacles in are à-priori known, a subset of these obstacles, indexed by , is assumed to be “familiar” in the sense of having a known, readily recognizable polygonal geometry, that the robot can identify and localize instantaneously from online sensory measurement. We require that this subset also includes all connected components of . The remaining obstacles in , indexed by , are assumed to be convex but are in all other regards completely unknown to the robot, while nevertheless satisfying the curvature condition given in [2, Assumption 2].

To simplify the notation, we neglect the robot dimensions, by dilating each obstacle by , and assume that the robot operates in . We denote the set of dilated obstacles derived from and , by and respectively. Since obstacles in are polygonal, and dilations of polygonal obstacles are not in general polygonal, we approximate obstacles in with conservative polygonal supersets. For obstacles in we require the following separation assumptions [2].

Assumption 1

Each obstacle has a positive clearance from any obstacle , with , and a positive clearance from .

Then, similarly to [24, 32, 33], we describe each polygonal obstacle by an obstacle function, a real-valued map providing an implicit representation that the robot can construct online after it has localized the obstacle. (Although the construction of such functions is a separate problem in its own right, here we derive implicit representations using so-called “R-functions” from the constructive solid geometry literature [27]; we have already successfully implemented such constructions in a similar setting for star-shaped [32] and arbitrary polygonal obstacles [33].) We also require the following technical assumption.
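As a concrete illustration of the R-function idea mentioned above, the following minimal sketch of our own builds an implicit obstacle function for a convex polygon by folding the edge half-plane functions with the classical Rvachev R-conjunction; the sign convention (negative inside, zero on the boundary, positive outside) and the restriction to convex, counter-clockwise polygons are our assumptions for illustration, not the paper's exact construction:

```python
import math

def polygon_obstacle_function(vertices):
    """Implicit function beta for a convex, counter-clockwise polygon:
    beta < 0 inside, beta = 0 on the boundary, beta > 0 outside.
    Edge half-plane functions g_k(p) = n_k . p - d_k (<= 0 inside) are
    combined with the R-conjunction r(a, b) = a + b + sqrt(a^2 + b^2),
    which is negative exactly when both arguments are negative."""
    n = len(vertices)
    edges = []
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        # Outward unit normal of a CCW polygon edge: (dy, -dx), normalized.
        nx, ny = (y2 - y1), -(x2 - x1)
        norm = math.hypot(nx, ny)
        nx, ny = nx / norm, ny / norm
        edges.append((nx, ny, nx * x1 + ny * y1))

    def beta(p):
        g = [nx * p[0] + ny * p[1] - d for nx, ny, d in edges]
        val = g[0]
        for gi in g[1:]:
            val = val + gi + math.hypot(val, gi)  # R-conjunction fold
        return val

    return beta
```

For the unit-side square, the function is negative at the center, zero on an edge, and positive outside, which is the behavior the reactive planner needs from an online-constructed obstacle representation.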

Assumption 2

For each , there exists such that the set has a positive clearance from any obstacle .

Note that Assumptions 1 and 2 constrain the shape and placement (convex and sufficiently separated, respectively) only of obstacles that have never previously been encountered. Familiar (polygonal, dilated by the robot radius) obstacles, while fixed, can be placed completely arbitrarily with no further prior information, and are allowed to overlap with the boundary of the enclosing freespace. Finally, we assume that the freespace is path-connected, i.e., the robot operates in a non-adversarial environment.

Based on these assumptions and considering first-order dynamics , the problem consists of finding a Lipschitz continuous controller , that leaves the freespace positively invariant and steers the robot to the (possibly moving) goal .

III Environment Representation

In this Section, we establish notation for the four distinct representations of the environment that we will refer to as planning spaces [32, 33], as shown in Fig. 2.

III-A Physical Space

The physical space is a description of the geometry of the actual world, inaccessible to the robot. It describes the enclosing workspace , punctured with the obstacles , giving rise to . The robot navigates this space toward , and discovers and localizes new obstacles along the way.

We denote by the set of physically “instantiated” familiar objects, indexed by , that drives the construction of the semantic, mapped and model spaces described next. Such elements of the power set also index the modes of our hybrid system (Section V, [33]).

III-B Semantic Space

The semantic space describes the robot’s evolving information about the environment, from the observable portions of a subset of unrecognized obstacles in , together with the polygonal boundaries of the familiar obstacles, that are instantiated when the sensory data triggers the identification and localization of a familiar obstacle.

We denote the set of unrecognized obstacles in the semantic space by , indexed by , and the set of familiar obstacles in the semantic space by . This environment is constantly updated, both by discovering and storing new familiar obstacles in the semantic map and by updating information regarding obstacles in . Here the robot is treated as a point particle.

III-C Mapped Space

Although the semantic space contains all the relevant geometric information about the obstacles the robot has encountered, it does not explicitly contain any topological information about the explored environment, since Assumption 2 does not exclude overlaps between obstacles in . Their consolidation in real time reduces the number of actual obstacles, by taking unions of elements of , making up (i.e., a new set of consolidated familiar obstacles indexed by with ), as well as copies of the sensed fragments of unknown obstacles from (i.e., ) to form the mapped space, . By Assumption 2, convex obstacles are assumed to be far enough away from familiar obstacles, such that no overlap occurs in the above union.
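The consolidation step just described, taking unions of overlapping catalogued obstacles to reduce the number of actual obstacles, can be sketched with a union-find grouping; in this sketch of ours, a conservative bounding-box test stands in for a true polygon-intersection test, and all names are illustrative:

```python
def find(parent, i):
    # Path-compressing find for the union-find structure.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def consolidate(obstacles, overlaps):
    """Group obstacles into consolidated connected components: obstacles that
    (transitively) overlap end up in the same group, whose union would form
    one consolidated obstacle. `overlaps(a, b)` is the pairwise test."""
    parent = list(range(len(obstacles)))
    for i in range(len(obstacles)):
        for j in range(i + 1, len(obstacles)):
            if overlaps(obstacles[i], obstacles[j]):
                parent[find(parent, i)] = find(parent, j)
    groups = {}
    for i in range(len(obstacles)):
        groups.setdefault(find(parent, i), []).append(obstacles[i])
    return list(groups.values())

def aabb(poly):
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return min(xs), min(ys), max(xs), max(ys)

def aabb_overlap(a, b):
    # Axis-aligned bounding-box overlap: a conservative stand-in used here
    # only for illustration; a real implementation would intersect polygons.
    ax0, ay0, ax1, ay1 = aabb(a)
    bx0, by0, bx1, by1 = aabb(b)
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1
```

Two overlapping squares plus one isolated square, for instance, consolidate into two groups, mirroring how overlapping catalogued obstacles collapse into a single mapped obstacle.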

Next, since Assumption 2 allows overlaps between obstacles in and the boundary of the enclosing freespace , for any connected component of such that , we take and include in a new set , indexed by . The rest of the connected components in , which do not intersect , are included in a set , indexed by . The idea here is that obstacles in should be merged to the boundary of , and obstacles in should be deformed to disks, since and need to be diffeomorphic.

III-D Model Space

The model space has the same boundary as and consists of copies of the sensed fragments of the unrecognized visible obstacles in , and a collection of Euclidean disks corresponding to the consolidated obstacles in that are deformed to disks. The centers and radii of the disks are chosen so that is in the interior of , as in [23]. The obstacles in are merged into , to make and topologically equivalent, through a map , described next.

Fig. 3: Diffeomorphism construction via direct convex decomposition: Any arbitrary convex decomposition (e.g., [6]) defines a tree (left), which induces the sequence of purging transformations that map the polygon’s boundary and exterior to the boundary and exterior of an equivalent disk. The purging transformation for each convex piece is defined by a pair of convex polygons that limit the effect of the diffeomorphism to a neighborhood of . The final map is guaranteed to be smooth, as shown by a visualization of its determinant in logarithmic scale (right).

IV Diffeomorphism Construction

Here, we describe our method of constructing the diffeomorphism, , between and . We assume that the robot has recognized and localized the obstacles in , and has, therefore, identified obstacles to be merged to the boundary of the enclosing freespace , stored in , and obstacles to be deformed to disks, stored in .

IV-A Obstacle Representation and Convex Decomposition

As a natural extension to doubly reactive algorithms for environments cluttered with convex obstacles [2, 20], we assume that the robot has access to the convex decomposition of each obstacle. For polygons without holes, we are interested in decompositions that do not introduce Steiner points (i.e., additional points beyond the polygon vertices), as this guarantees that the dual graph of the convex partition is a tree. Here, we acquire this convex decomposition using Greene's method [6] and its C++ implementation in CGAL [29], whose running time is governed by the number of polygon vertices and the number of reflex vertices. Other algorithms [15] could be used as well, such as Keil's decomposition algorithm [12, 11].

As shown in Fig. 3, convex partitioning results in a tree of convex polygons corresponding to , with a set of vertices identified with convex polygons (i.e., vertices of the dual of the resulting partition) and a set of edges encoding polygon adjacency. Therefore, we can pick any polygon as root and construct the tree based on the adjacency properties induced by the dual graph of the decomposition, as shown in Fig. 3. If , we pick as root the polygon with the largest surface area, whereas if , we pick as root any polygon adjacent to .
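The tree construction just described can be sketched as follows (a minimal sketch of our own): the shared-whole-edge adjacency test relies on the partition being free of Steiner points, so that adjacent pieces share an entire edge with identical endpoints, and the largest-area root rule covers one of the two root-selection cases in the text:

```python
from collections import deque

def shoelace_area(poly):
    # Absolute area of a polygon given as a list of (x, y) vertex tuples.
    return 0.5 * abs(sum(x1 * y2 - x2 * y1
                         for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1])))

def edges_of(poly):
    # Undirected edges as frozensets of endpoint pairs.
    return {frozenset((poly[i], poly[(i + 1) % len(poly)]))
            for i in range(len(poly))}

def dual_tree(pieces):
    """Build the dual tree of a Steiner-point-free convex partition.
    Two pieces are adjacent when they share a whole edge. Returns
    (root_index, parent map), rooting at the largest-area piece."""
    edge_sets = [edges_of(p) for p in pieces]
    adj = {i: [] for i in range(len(pieces))}
    for i in range(len(pieces)):
        for j in range(i + 1, len(pieces)):
            if edge_sets[i] & edge_sets[j]:
                adj[i].append(j)
                adj[j].append(i)
    root = max(range(len(pieces)), key=lambda i: shoelace_area(pieces[i]))
    parent, queue = {root: None}, deque([root])
    while queue:  # BFS from the root assigns each piece its parent
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return root, parent
```

The resulting parent map directly induces the leaf-to-root order in which the purging transformations of Section IV-B are applied.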

IV-B The Map Between the Mapped and the Model Space

As shown in Fig. 3, the map between the mapped and the model space is constructed in several steps, involving the successive application of purging transformations by composition, during execution time, for all leaf polygons of all obstacles in and , in any order, until their root polygons are reached. We denote by this final intermediate space, where all obstacles in have been deformed to their root polygons. We denote by and the intermediate spaces before and after the purging transformation of leaf polygon respectively.

We begin our exposition with a description of the purging transformation that maps the boundary of a leaf polygon onto the boundary of its parent, , and continue with a description of the map that maps the boundaries of root polygons of obstacles in and to and the corresponding disks in respectively.

IV-B1 The Map Between and

We first find admissible centers and polygonal collars that encompass the actual polygon and limit the effect of the purging transformation in their interior, while keeping its value equal to the identity everywhere else (see Fig. 3).

Definition 1

An admissible center for the purging transformation of the leaf polygon , denoted by , is a point in such that the polygon with vertices the original vertices of and is convex.

Definition 2

An admissible polygonal collar for the purging transformation of the leaf polygon is a convex polygon such that:

  1. does not intersect the interior of any polygon with , or any .

  2. , and .

Examples are shown in Fig. 3. As in [33], we also construct implicit functions corresponding to the leaf polygon such that and , using tools from [27].

Based on these definitions, we construct the switch of the purging transformation for the leaf polygon as a function , equal to 1 on the boundary of , equal to 0 outside and smoothly varying (except at the polygon vertices) between 0 and 1 everywhere else (see (8) in Appendix B). Finally, we define the deforming factors as the functions , responsible for mapping the boundary of the leaf polygon onto the boundary of its parent (see (9) in Appendix B). We can now construct the map between and as in [32, 33].
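Concretely, in the blended form used by this paper's antecedents [32, 33], each purging transformation gates a scaling toward the admissible center by the switch, and reduces to the identity where the switch vanishes. The sketch below takes the switch and deforming factor as caller-supplied callables, standing in for the actual constructions in (8) and (9) of Appendix B:

```python
def purging_map(center, sigma, nu):
    """One purging transformation in blended form:
    h(x) = sigma(x) * (center + nu(x) * (x - center)) + (1 - sigma(x)) * x.
    `sigma` is the switch (1 on the leaf polygon boundary, 0 outside the
    collar) and `nu` is the deforming factor; both are supplied by the
    caller, so this sketch only fixes the algebraic skeleton."""
    def h(x):
        s, n = sigma(x), nu(x)
        return tuple(s * (c + n * (xi - c)) + (1.0 - s) * xi
                     for xi, c in zip(x, center))
    return h
```

Outside the collar (switch equal to 0) the map is exactly the identity, which is what confines each purging transformation's effect to a neighborhood of its leaf polygon; on the leaf boundary (switch equal to 1) the point is pulled toward the center by the deforming factor.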

Proposition 1

The map is a diffeomorphism between and away from the polygon vertices of , none of which lies in the interior of .


Proof: Included in Appendix B. ∎

We denote by the map between and , arising from the composition of purging transformations .

IV-B2 The Map Between and

Here, for each root polygon , we define the polygonal collar and the switch of the transformation as in Definition 2 and (8) (see Appendix B) respectively, and we distinguish between obstacles in and in for the definition of the centers as follows (see Fig. 3).

Definition 3

An admissible center for the transformation of:

  1. the root polygon , corresponding to , is a point in the interior of (here identified with ).

  2. the root polygon , corresponding to , is a point , such that the polygon with vertices the original vertices of and is convex.

Finally, we define the deforming factors as in Section IV-B1 for obstacles in , and as the function for obstacles in (see Fig. 3). We construct the map between and as

with .

We can similarly arrive at the following result.

Proposition 2

The map is a diffeomorphism between and away from any sharp corners, none of which lie in the interior of .

IV-B3 The Map Between and

Based on the construction of and , we can finally write the map between the mapped space and the model space as the function given by . Since both and are diffeomorphisms away from sharp corners, it is straightforward to show that the map is a diffeomorphism between and away from any sharp corners, none of which lie in the interior of .
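Since the overall map is a composition of simpler maps, a useful numerical sanity check on any candidate implementation is that the Jacobian determinant of the composition (the quantity visualized in logarithmic scale in Fig. 3) stays positive away from corners. The finite-difference utility below is a testing aid of our own, not part of the paper's construction:

```python
def compose(*maps):
    # Compose planar maps left-to-right: compose(f, g)(x) applies f, then g,
    # mirroring the leaf-to-root application of purging transformations.
    def h(x):
        for f in maps:
            x = f(x)
        return x
    return h

def jacobian_det(f, x, eps=1e-6):
    """Central finite-difference Jacobian determinant of a planar map f at x.
    A diffeomorphism must have a nonvanishing (here, positive) determinant
    away from the polygon vertices."""
    px = f((x[0] + eps, x[1])); mx = f((x[0] - eps, x[1]))
    py = f((x[0], x[1] + eps)); my = f((x[0], x[1] - eps))
    j11 = (px[0] - mx[0]) / (2 * eps)
    j21 = (px[1] - mx[1]) / (2 * eps)
    j12 = (py[0] - my[0]) / (2 * eps)
    j22 = (py[1] - my[1]) / (2 * eps)
    return j11 * j22 - j12 * j21
```

For two linear maps the determinant of the composition is the product of the individual determinants, which the finite-difference estimate recovers to high accuracy.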

Fig. 4: The online reactive planning architecture: Advancing beyond [33], camera output is run through a perceptual pipeline incorporating three separate neural networks (run onboard at 4Hz) whose function is to: (a) detect familiar obstacles and humans [22]; (b) localize corresponding semantic keypoints [21]; and (c) perform 3D human mesh estimation [13]. Keypoint locations on the image, other detected geometric features, and an egomotion estimate provided by visual inertial odometry are used by the semantic mapping module [3] to give updated robot () and obstacle poses (). The reactive planner, now streamlined to run onboard at 3x the rate of the corresponding module in [33], merges consolidated obstacles in (recovered from ), along with LIDAR data for unknown obstacles, to provide the robot inputs and close the control loop. In this new architecture, the estimated human meshes are used to update the target's position in the reported human tracking experiments, to detect a specific human gesture or pose related to the experiment's semantics, or (optionally) to introduce additional obstacles into the semantic mapping module for some out-of-scope experiments.

V Reactive Planning Algorithm

The analysis in Section IV describes the diffeomorphism construction between and for a given index set of instantiated familiar obstacles. However, the onboard sensor might incorporate new obstacles in the semantic map, updating . Therefore, as in [33], we give a hybrid systems description of our reactive controller, where each mode is defined by an index set of familiar obstacles stored in the semantic map, the guards describe the sensor trigger events where a previously “unexplored” obstacle is discovered and incorporated in the semantic map (thereby changing , and , ) [33, Eqns. (31),(35)], and the resets describe transitions to new modes that are equal to the identity in the physical space, but might result in discrete “jumps” of the robot position in the model space [33, Eqns. (32), (36)]. In this work, this hybrid systems structure is not modified, and we just focus on each mode separately.

For a fully actuated particle with dynamics ẋ = u, the control law in each mode is given as

u(x) = (Dₓh)⁻¹ v(h(x))    (1)

with the control input v in the model space given as [2]

v(x̄) = k ( Π_{LF(x̄)}(x̄_d) − x̄ )

Here, x̄ = h(x) and x̄_d = h(x_d) denote the robot and goal position in the model space respectively, and Π_{LF(x̄)} denotes the projection onto the convex local freespace for x̄, LF(x̄), defined as the Voronoi cell in [2, Eqn. (25)], separating x̄ from all the model space obstacles (see Fig. 2). We use the following definition to characterize a slowly moving, non-adversarial target.
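As a concrete illustration, the sketch below (our own, written as if the diffeomorphism were the identity) builds a Voronoi-style convex local freespace from midway separating hyperplanes and steers toward the goal's projection onto it; the halfspace construction is simplified relative to [2, Eqn. (25)], and Dykstra's alternating projections stand in for an exact polytope projection:

```python
import numpy as np

def local_freespace_halfspaces(x, obstacle_points):
    """Halfspace description {y : A y <= b} of a convex local freespace
    around the robot position x: for each sensed obstacle point p, keep the
    side of the hyperplane midway between x and p that contains x."""
    A, b = [], []
    for p in obstacle_points:
        a = np.asarray(p, dtype=float) - x
        A.append(a)
        b.append(a @ (x + 0.5 * a))
    return np.array(A), np.array(b)

def project(y0, A, b, iters=500):
    """Dykstra's alternating projections onto an intersection of halfspaces."""
    y = np.asarray(y0, dtype=float).copy()
    if len(A) == 0:
        return y
    corr = np.zeros((len(A), y.size))
    for _ in range(iters):
        for i in range(len(A)):
            z = y + corr[i]
            viol = A[i] @ z - b[i]
            p = z - max(viol, 0.0) * A[i] / (A[i] @ A[i])
            corr[i] = z - p
            y = p
    return y

def control(x, x_goal, obstacle_points, k=1.0):
    """u = k (Pi_LF(x_goal) - x): steer toward the goal's projection onto
    the convex local freespace, mirroring the model-space control law."""
    x = np.asarray(x, dtype=float)
    A, b = local_freespace_halfspaces(x, obstacle_points)
    return k * (project(np.asarray(x_goal, dtype=float), A, b) - x)
```

With no sensed obstacles the law reduces to pure attraction toward the goal; an obstacle point between robot and goal pulls the projected target back to the robot's side of the midway hyperplane, which is the mechanism behind the collision avoidance guarantee.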

Definition 4

The smooth function is a non-adversarial target if its model space velocity, given as , always satisfies either , or .

Intuitively, this Definition requires the moving target to slow down when the robot gets too close to obstacles (i.e., when becomes small) or the target itself (i.e., when ), proportionally to the control gain , unless the target approaches the robot (i.e., ). We use Definition 4 to arrive at the following central result.

Theorem 1

With the terminal mode of the hybrid controller (the terminal mode of the hybrid system is indexed by the improper subset, where all familiar obstacles in the workspace have been instantiated), the reactive controller in (1) leaves the freespace positively invariant, and:

  1. tracks by not increasing , if is a non-adversarial target (see Definition 4).

  2. asymptotically reaches a constant with its unique continuously differentiable flow, from almost any placement , while strictly decreasing along the way.


Proof: Included in Appendix B. ∎

In [32], we extended our algorithm to differential drive robots, by constructing a smooth diffeomorphism away from sharp corners. We summarize the details of the construction in Appendix A, and present our main result below, whose proof follows similar patterns to that of Theorem 1 and is omitted for brevity.

Theorem 2

With the terminal mode of the hybrid controller, the reactive controller for differential drive robots (see (3) in Appendix A) leaves the freespace positively invariant, and:

  1. tracks by not increasing , if is a non-adversarial target (see Definition 4).

  2. asymptotically reaches a constant with its unique continuously differentiable flow, from almost any robot configuration in , without increasing along the way.

VI Numerical Studies

In this Section, we present numerical studies run in MATLAB using ode45 that illustrate our formal results. Our reactive controller is implemented in Python and communicates with MATLAB using the standard MATLAB-Python interface. For our numerical results, we assume perfect robot state estimation and localization of obstacles, using a fixed range sensor that can instantly identify and localize either the entirety of familiar obstacles intersecting its footprint, or the fragments of unknown obstacles within its range.
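As a self-contained stand-in for the MATLAB/ode45 loop described above (a minimal sketch of our own, not the released simulation package), the closed-loop vector field can be integrated with a fixed-step classical Runge-Kutta scheme:

```python
def rk4_step(f, x, dt):
    # One classical fourth-order Runge-Kutta step for x' = f(x).
    k1 = f(x)
    k2 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
    k3 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
    k4 = f([xi + dt * ki for xi, ki in zip(x, k3)])
    return [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def simulate(f, x0, dt=0.01, steps=1000):
    # Integrate the closed-loop vector field from x0 for steps * dt seconds.
    x = list(x0)
    for _ in range(steps):
        x = rk4_step(f, x, dt)
    return x

# Example: away from obstacles the controller reduces to pure attraction
# u = k (x_goal - x), so the simulated flow should converge to the goal.
goal, k = (2.0, 1.0), 1.0
field = lambda x: [k * (g - xi) for g, xi in zip(goal, x)]
final = simulate(field, (0.0, 0.0))
```

In our actual studies the vector field is the full reactive controller evaluated through the diffeomorphism, but the integration scaffold is the same.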

VI-A Illustrations of the Navigation Framework

We begin by illustrating the performance of our reactive planning framework in two different settings (Fig. 5), for both a fully actuated and a differential drive robot, also included in the accompanying video submission. In the first case (Fig. 5-a), the robot is tasked with moving to a predefined location in an environment resembling an apartment layout with known walls, cluttered with several familiar obstacles of unknown location and pose, from different initial conditions. In the second case (Fig. 5-b), the robot navigates a room cluttered with both familiar and unknown obstacles from several initial conditions. In both cases, the robot avoids all the obstacles and safely converges to the target. The robot radius used in our simulation studies is 0.2m.

Fig. 5: Top: Navigation in an indoor layout cluttered with multiple familiar obstacles of previously unknown pose. Bottom: Navigation in a room cluttered with known non-convex (dark grey) and unknown convex (light grey) obstacles. Simulations are run from different initial conditions.

VI-B Comparison with RRT^X [19]

In the second set of numerical results, we compare our reactive controller with a state-of-the-art path replanning algorithm, RRT^X [19]. We choose to compare against this specific algorithm instead of another sampling-based method for static environments (e.g., RRT* [10]), since both our reactive controller and RRT^X are dynamic in nature; they are capable of incorporating new information about the environment and modifying the robot's behavior appropriately. For our simulations, we assume that RRT^X possesses the same sensory apparatus as our algorithm: an “oracle” that can instantly identify and localize nearby obstacles. The computed paths are then reactively tracked using [1].

Fig. 6 demonstrates the performance degradation of RRT^X in the presence of narrow passages; as the passage becomes narrower (yet always wider than the robot's diameter), the minimum number of (offline-computed) samples needed for successful replanning and safe navigation becomes increasingly large. By contrast, our algorithm always guarantees safe passage to the target, without any prior offline computation. In the accompanying video attachment, we also include a different mode of failure for RRT^X: in the presence of multiple narrow passages and with an insufficient number of initially provided samples, RRT^X can cycle, constantly replanning as it searches for new openings, before (incorrectly) reporting failure and halting.

Fig. 6: Minimum number of (offline-computed) samples needed for successful online implementation of RRT^X [19] in an unexplored environment with two familiar obstacles forming a narrow passage. The number becomes increasingly large as the gap becomes smaller. The robot diameter is 50cm.

VII Experiments

Fig. 7: Types of environments used in our experiments. Visual context is included in the supplementary video submission.
Fig. 8: Top: Turtlebot reactively follows a human until a stop gesture is given and detected – Bottom: Turtlebot safely returns to its starting position.

VII-A Experimental Setup

Our experimental layout is summarized in Fig. 4. Since the algorithms introduced in this paper take the form of first-order vector fields, we mainly use a quasi-static platform, the Turtlebot robot [31], for our physical experiments. We suggest the robustness of these feedback controllers by performing several experiments on the more dynamic Ghost Spirit legged robot [5], using a rough approximation to the quasi-static differential drive motion model. In both cases, the main computer is an NVIDIA Jetson AGX Xavier GPU unit, responsible for running our perception and navigation algorithms during execution time. This GPU unit communicates with a Hokuyo LIDAR, used to detect unknown obstacles, and a ZED Mini stereo camera, used for visual-inertial state estimation and for detecting humans and familiar obstacles.

Our perception pipeline supports the detection and 3D pose estimation of objects and humans, who, for the purposes of this paper, are used as moving targets. We use the YOLOv3 detector [22] to detect 2D bounding boxes on the image, which are then processed based on the class of the detected object. If one of the specified object classes is detected, we follow the semantic keypoints approach of [21] to estimate keypoints of the object on the image plane. The object classes used in our experiments are chair, table, ladder, cart, gascan and pelican case. The training data for the particular instances of interest are collected with a semi-automatic procedure, similarly to [21]. Given the bounding box and keypoint annotations for each image, the two networks are trained with their default configurations until convergence. If, on the other hand, the bounding box corresponds to a person detection, we use the approach of [13], which provides us with the 3D mesh of the person.

Our semantic mapping infrastructure relies on the algorithm presented in [3], and is implemented in C++. This algorithm fuses inertial information (here provided by the position tracking implementation from StereoLabs on the ZED Mini stereo camera), geometric (i.e., geometric features on the 2D image), and semantic information (i.e., the detected keypoints and the associated object labels as described above) to give a posterior estimate for both the robot state and the associated poses of all tracked objects, by simultaneously solving the data association problem arising when several objects of the same class exist in the map.

Finally, our reactive controller is also implemented in C++ using Boost Geometry [26] for the underlying polygon operations, and communicates with our perception and semantic mapping pipelines using ROS, as shown in Fig. 4.

VII-B Empirical Results

As also reported in the supplementary video, we distinguish between two classes of physical experiments in the several different environments shown in Fig. 7: tracking either a predefined static target or a moving human, and tracking a given semantic target (e.g., approaching a desired object).

VII-B1 Geometric tracking of a (moving) target amidst obstacles

Fig. 1 shows Spirit tracking a human in a previously unexplored environment, cluttered with both catalogued obstacles (whose number and placement is unknown in advance) as well as completely unknown obstacles. (The formal results of Section V require that these unknown obstacles be convex. However, here we clutter the environment with a mix of unknown obstacles, some convex but others of more complicated nonconvex shapes (e.g., unknown walls), to establish empirical robustness even in environments that are out of the scope of the underlying theory.) The robot uses familiar obstacles both to localize itself against them [3] and to reactively navigate around them. Fig. 7 summarizes the wide diversity of à-priori unexplored environments, with different lighting conditions, successfully navigated indoors (by Turtlebot and Spirit) and outdoors (by Spirit), while tracking humans along thousands of body lengths.

As anticipated, the few failures we recorded were associated with the inability of the SLAM algorithm to localize the robot in long, featureless environments. It should be noted, however, that even when the robot or object localization process fails, collision avoidance is still guaranteed by the onboard LIDAR. Collisions could nevertheless occur with obstacles that the 2D horizontal LIDAR cannot detect (e.g., the red gascan shown in Fig. 8); extensions of the presented sensory infrastructure (e.g., the use of a 3D LIDAR) could still guarantee safety under such circumstances.
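The last-resort LIDAR safety check described above can be sketched minimally as follows (the interface and threshold are assumptions for illustration, not our implementation):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative last-resort safety check of the kind a 2D LIDAR affords
// even when localization fails: command an emergency stop whenever any
// beam returns a range below a clearance threshold.
bool emergency_stop(const std::vector<double>& ranges, double clearance) {
  return !ranges.empty() &&
         *std::min_element(ranges.begin(), ranges.end()) < clearance;
}
```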

VII-B2 Logical reaction using predefined semantics

In the second set of experimental runs, we exploit the new online semantic capabilities to introduce logic into our reactive tracking process. For example, Fig. 8 depicts a tracking task requiring the robot to respond to the human’s stop signal (a raised left or right hand) by returning to its starting position. The supplementary video presents several other semantically specified tasks requiring autonomous reactions of both a logical and a geometric nature, all involving the negotiation of novel environments under the arbitrary geometric circumstances associated with different contexts of logical triggers.
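The logical layer of such a task can be viewed as a small discrete supervisor that switches the reactive controller's objective when a semantic trigger fires. A minimal sketch, with gesture labels and mode names that are assumptions rather than our actual interface:

```cpp
#include <cassert>
#include <string>

// Illustrative discrete supervisor for the gesture-triggered task of
// Fig. 8: the robot follows the human until a stop gesture (raised left
// or right hand) is detected, after which it navigates back to its
// starting position.
enum class Mode { FollowHuman, ReturnToStart };

Mode step(Mode current, const std::string& gesture) {
  if (current == Mode::FollowHuman &&
      (gesture == "raised_left_hand" || gesture == "raised_right_hand"))
    return Mode::ReturnToStart;  // latch: the stop signal is not undone
  return current;
}
```

In each mode the same reactive controller runs unchanged; only its target (the human, or the recorded start pose) differs.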

VIII Conclusion and Future Work

This paper presents a reactive planner that can provably safely and semantically engage non-adversarial moving targets in planar workspaces cluttered with an arbitrary mix of previously geometrically and semantically catalogued obstacles. We document the practicability of this approach by comparing our method with a state-of-the-art replanning algorithm and by reporting empirical results on tasks requiring both geometric and logical reactions in unexplored environments. We require only modest computational hardware, and we reuse an identical code base, whether reacting to a geometric or a semantically tagged target, across varied environments and on both a wheeled robot and a dynamic legged platform. Future work seeks to extend past hierarchical mobile manipulation schemes built on early versions of this architecture [34] to incorporate both more dexterous manipulation [30] and logically complex abstract specifications (e.g., using temporal logic [14]).


This work was supported by AFRL grant FA865015D1845 (subcontract 669737-1), AFOSR grant FA9550-19-1-0265 (Assured Autonomy in Contested Environments), and ONR grant #N00014-16-1-2817, a Vannevar Bush Fellowship held by the last author, sponsored by the Basic Research Office of the Assistant Secretary of Defense for Research and Engineering. The authors thank Diedra Krieger for administrative support and assistance with video recording.


  • [1] O. Arslan and D. E. Koditschek (2017) Smooth Extensions of Feedback Motion Planners via Reference Governors. In IEEE Int. Conf. Robotics and Automation, pp. 4414–4421. Cited by: §VI-B.
  • [2] O. Arslan and D. E. Koditschek (2018-07) Sensor-Based Reactive Navigation in Unknown Convex Sphere Worlds. Int. J. Robotics Research 38 (1-2), pp. 196–223. Cited by: Appendix A, Appendix B, Appendix B, Appendix B, Fig. 1, §I-A, §I-B1, §II, §II, §II, §IV-A, §V.
  • [3] S. L. Bowman, N. Atanasov, K. Daniilidis, and G. J. Pappas (2017-05) Probabilistic data association for semantic SLAM. In IEEE Int. Conf. Robotics and Automation, pp. 1722–1729. Cited by: Fig. 1, §I-B1, Fig. 4, §VII-A, §VII-B1.
  • [4] M. Farber (2006) Topology of Robot Motion Planning. Morse Theoretic Methods in Nonlinear Analysis and in Symplectic Topology, pp. 185–230. Cited by: §I-A.
  • [5] Ghost Robotics (2020) Ghost Robotics Spirit 40. External Links: Link Cited by: Fig. 1, §VII-A.
  • [6] D. H. Greene (1983) The decomposition of polygons into convex parts. Computational Geometry 1, pp. 235–259. Cited by: §I-B2, Fig. 3, §IV-A.
  • [7] S. Gupta, J. Davidson, S. Levine, R. Sukthankar, and J. Malik (2017) Cognitive Mapping and Planning for Visual Navigation. In IEEE CVPR, pp. 7272–7281. Cited by: §I-A.
  • [8] M. W. Hirsch (1976) Differential Topology. Springer. Cited by: Appendix B.
  • [9] B. D. Ilhan, A. M. Johnson, and D. E. Koditschek (2018-08) Autonomous legged hill ascent. J. Field Robotics 35 (5), pp. 802–832. External Links: ISSN 1556-4967, Document Cited by: §I-A.
  • [10] S. Karaman and E. Frazzoli (2012) High-speed flight in an ergodic forest. In IEEE Int. Conf. Robotics and Automation, pp. 2899–2906. Cited by: §I-A, §VI-B.
  • [11] J. M. Keil (1985) Decomposing a Polygon into Simpler Components. SIAM Journal on Computing 14 (4), pp. 799–817. Cited by: §IV-A.
  • [12] M. Keil and J. Snoeyink (2002) On the time bound for convex decomposition of simple polygons. Int. J. Computational Geometry & Applications 12 (3), pp. 181–192. Cited by: §IV-A.
  • [13] N. Kolotouros, G. Pavlakos, M. J. Black, and K. Daniilidis (2019) Learning to reconstruct 3D human pose and shape via model-fitting in the loop. In IEEE Int. Conf. Computer Vision, pp. 2252–2261. Cited by: §I-B1, Fig. 4, §VII-A.
  • [14] H. Kress-Gazit, G. E. Fainekos, and G. J. Pappas (2009) Temporal-logic-based reactive mission and motion planning. IEEE Trans. Robotics 25 (6), pp. 1370–1381. Cited by: §VIII.
  • [15] J. Lien and N. M. Amato (2006) Approximate convex decomposition of polygons. Comp. Geometry 35 (1-2), pp. 100–123. Cited by: §IV-A.
  • [16] W. S. Massey (1992) Sufficient conditions for a local homeomorphism to be injective. Topology and its Applications 47, pp. 133–148. Cited by: Appendix B.
  • [17] X. Meng, N. Ratliff, Y. Xiang, and D. Fox (2019) Neural Autonomous Navigation with Riemannian Motion Policy. In IEEE Int. Conf. Robotics and Automation, pp. 8860–8866. Cited by: §I-A.
  • [18] I. Noreen, A. Khan, and Z. Habib (2016) Optimal path planning using RRT* based approaches: a survey and future directions. Int. J. Advanced Computer Science and Applications 7 (11), pp. 97–107. Cited by: §I-A.
  • [19] M. Otte and E. Frazzoli (2015) RRTˣ: Asymptotically optimal single-query sampling-based motion planning with quick replanning. Int. J. of Robotics Research 35 (7), pp. 797–822. Cited by: §I-B3, Fig. 6, §VI-B, §VI-B.
  • [20] S. Paternain, D. E. Koditschek, and A. Ribeiro (2017) Navigation Functions for Convex Potentials in a Space with Convex Obstacles. IEEE Trans. Automatic Control. External Links: ISSN 0018-9286, Document Cited by: §I-A, §IV-A.
  • [21] G. Pavlakos, X. Zhou, A. Chan, K. G. Derpanis, and K. Daniilidis (2017-05) 6-DoF object pose from semantic keypoints. In IEEE Int. Conf. Robotics and Automation, pp. 2011–2018. Cited by: Fig. 1, §I-B1, Fig. 4, §VII-A.
  • [22] J. Redmon and A. Farhadi (2018) YOLOv3: an incremental improvement. arXiv pre-print: 1804.02767. Cited by: Fig. 4, §VII-A.
  • [23] E. Rimon and D. E. Koditschek (1989) The Construction of Analytic Diffeomorphisms for Exact Robot Navigation on Star Worlds. Trans. American Mathematical Society 327 (1), pp. 71–116. Cited by: Appendix B, §III-D.
  • [24] E. Rimon and D. E. Koditschek (1992) Exact Robot Navigation Using Artificial Potential Functions. IEEE Trans. Robotics and Automation 8 (5), pp. 501–518. Cited by: §I-A, §II.
  • [25] N. Savinov, A. Dosovitskiy, and V. Koltun (2018) Semi-parametric topological memory for navigation. arXiv preprint: 1803.00653. Cited by: §I-A.
  • [26] B. Schling (2011) The Boost C++ Libraries. XML Press. Cited by: §VII-A.
  • [27] V. Shapiro (2007) Semi-analytic geometry with R-functions. Acta Numerica 16, pp. 239–303. Cited by: Appendix B, §IV-B1, footnote 4.
  • [28] W. B. Shen, D. Xu, Y. Zhu, L. J. Guibas, L. Fei-Fei, and S. Savarese (2019) Situational Fusion of Visual Representation for Visual Navigation. In IEEE Int. Conf. Computer Vision, pp. 2881–2890. Cited by: §I-A.
  • [29] The CGAL Project (2019) CGAL user and reference manual. 4.14 edition, CGAL Editorial Board. External Links: Link Cited by: §IV-A.
  • [30] T. T. Topping, V. Vasilopoulos, A. De, and D. E. Koditschek (2019) Composition of templates for transitional pedipulation behaviors. In Int. Symposium on Robotics Research, Cited by: §VIII.
  • [31] TurtleBot2 (2019) Open-source robot development kit for apps on wheels. External Links: Link Cited by: §VII-A.
  • [32] V. Vasilopoulos and D. E. Koditschek (2018) Reactive Navigation in Partially Known Non-Convex Environments. In 13th Int. Workshop on the Algorithmic Foundations of Robotics (WAFR). Cited by: Appendix A, Appendix B, Appendix B, §I-B1, §II, §II, §III, §IV-B1, §V, footnote 4.
  • [33] V. Vasilopoulos, G. Pavlakos, K. Schmeckpeper, K. Daniilidis, and D. E. Koditschek (2020) Reactive Navigation in Partially Familiar Planar Environments Using Semantic Perceptual Feedback. Under review; arXiv pre-print: 2002.08946. Cited by: Appendix A, Appendix B, Appendix B, Appendix B, Fig. 2, §I-B1, §I-B2, §II, §II, §II, §III-A, §III, Fig. 4, §IV-B1, §IV-B1, §V, footnote 4.
  • [34] V. Vasilopoulos, T. T. Topping, W. Vega-Brown, N. Roy, and D. E. Koditschek (2018) Sensor-Based Reactive Execution of Symbolic Rearrangement Plans by a Legged Mobile Manipulator. In IEEE/RSJ Int. Conf. Intelligent Robots and Systems, pp. 3298–3305. Cited by: §VIII.

Appendix A Controller for Differential Drive Robots

Since a differential drive robot, whose dynamics are given by $(\dot{\mathbf{x}}, \dot{\psi}) = (v \, (\cos\psi, \sin\psi), \omega)$ (we use the ordered set notation and the matrix notation for vectors interchangeably), with $\mathbf{x} \in \mathbb{R}^2$ and $\psi \in S^1$, and with $v$ and $\omega$ the linear and angular input respectively, operates in $SE(2)$ instead of $\mathbb{R}^2$, we first need a smooth diffeomorphism away from sharp corners on the boundary of the freespace, and then establish the results about our controller.

Following our previous work [32, 33], we construct our map from to as , with , and . Here, and , with denoting the projection onto the first two components. We show in [33] that is a diffeomorphism from to away from sharp corners, none of which lie in the interior of .

Based on the above, we can then write with , and the inputs related to through and , with . The idea now is to use the control strategy in [2] to find inputs in , and then use the relations above to find the actual inputs in that achieve as


with fixed gains.
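Although the exact transformed-space relations are given in [32, 33], the final step, realizing a desired planar velocity with unicycle inputs, can be illustrated by a generic differential-drive law in the spirit of [2]; the gains and names below are assumptions, not the exact controller:

```cpp
#include <cassert>
#include <cmath>

struct Inputs { double v, w; };

// Generic differential-drive law: project a desired planar velocity
// (ux, uy) onto the robot's heading psi to obtain the linear input v,
// and turn toward the desired direction to obtain the angular input w.
// Illustrative of the construction in Appendix A, not the exact
// transformed-space controller of [32, 33].
Inputs unicycle_inputs(double ux, double uy, double psi,
                       double k_v, double k_w) {
  double v = k_v * (ux * std::cos(psi) + uy * std::sin(psi));
  double heading_error = std::atan2(uy, ux) - psi;
  // wrap the error to (-pi, pi] before applying the angular gain
  heading_error = std::atan2(std::sin(heading_error),
                             std::cos(heading_error));
  return {v, k_w * heading_error};
}
```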

Appendix B Proofs of Main Results

Proof of Proposition 1.

We follow similar patterns to the proof of [33, Proposition 1]. We first need to show that the functions are smooth away from the polygon vertices, none of which lies in the interior of . We begin with . First of all, with the procedure outlined in [32], the only points where and are not smooth are vertices of and respectively. We use the function [8] described by


and parametrized by , that has derivative


and define the smooth auxiliary switches


with , and tunable parameters. We note that is smooth everywhere, since does not belong in and is exactly 0 on the vertices of . Therefore, by defining as



defining the shared hyperplane between

and , we get that can only be non-smooth on the vertices of except for (i.e., on the vertices of the polygon ), and on points where its denominator becomes zero. Since both and vary between 0 and 1, this can only happen when and , i.e., only on and . The fact that is smooth everywhere else derives immediately from the fact that is a smooth function, and is smooth everywhere except for the polygon vertices.
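While the precise switch definitions appear in [33], the generic recipe for such smooth auxiliary switches uses the classical $C^\infty$ bump construction built from $e^{-1/x}$; the particular form below is an illustrative assumption rather than our exact parametrization:

```cpp
#include <cassert>
#include <cmath>

// Classical C-infinity building block: eta(x) = exp(-1/x) for x > 0 and
// 0 otherwise. All derivatives vanish at x = 0, which is what makes the
// resulting switch smooth where it turns on and off.
double eta(double x) { return x > 0.0 ? std::exp(-1.0 / x) : 0.0; }

// Smooth switch rising from 0 (for x <= 0) to 1 (for x >= eps), with a
// tunable transition width eps > 0. Illustrative of the auxiliary
// switches used in the proof, not their exact definition.
double smooth_switch(double x, double eps) {
  return eta(x) / (eta(x) + eta(eps - x));
}
```

Because the denominator is positive for all $x$ when $\epsilon > 0$, the quotient inherits the smoothness of $\eta$ everywhere, mirroring the argument made for the switches above.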

On the other hand, the singular points of the deforming factor , defined as




the normal vector corresponding to the shared edge between and , are the solutions of the equation