Autonomous aerial vehicles with onboard vision-based sensing and planning have been shown to be capable of performing fast and agile maneuvers. However, long-horizon planning required to achieve a task has proven to be a challenge, particularly with limited onboard compute. We propose a fully integrated vision-based multirotor platform that allows minimally trained operators to teleoperate the vehicle at high speeds with agility, while remaining safe around obstacles in unstructured outdoor environments. At high speeds, the environment around the vehicle changes quickly, and is subject to dynamic obstacles and lighting conditions. Our multirotor architecture integrates the following to achieve agile and collision-free flight in these scenarios: 1) an extension of motion primitive-based teleoperation Yang2018 to generate jerk-continuous local trajectories, a crucial component to prevent instability in agile flights, and 2) efficient local mapping for collision avoidance purposes using a KD-tree.
Many prior works in high speed flight exploit the field-of-view (FOV) of stereo cameras for fast collision checking for autonomous flights, including florence2016; florence2017, where trajectories are constrained to be inside the FOV of the depth sensor with a max range of m. In matthies2014, trajectories generated by RRT are checked for collisions directly in the disparity space. lopez2017 presents aggressive flight on a kg MAV, achieving a velocity of m/s. These methods achieve fast collision checking by circumventing the need to construct a local map and checking for collisions in the sensor’s FOV. This however, limits the range of motions the vehicle can perform. Approaches with local map generation using a laser range finder chen2016 and a monocular RGB camera daftry2016 have been shown to achieve maximum velocities of m/s and m/s respectively, but data processing limits the update rate of the local maps. In our approach, we give a minimally trained operator full control of the vehicle and show that fast and agile flights can be achieved with a human-in-the-loop while maintaining safety.
We perform a series of high speed collision avoidance trials in both indoor and outdoor environments with untrained operators. In our experiments, our hexarotor attains speeds exceeding m/s and accelerations exceeding m/s. We are able to safely avoid obstacles at speeds up to m/s and accelerations of m/s, while retaining a local map.
2 Technical Approach
2.1 Smooth Motion Primitive-Based Teleoperation
Aggressive multirotor flights demand large angular velocities and large angular accelerations, which are directly related to the jerk and snap of the reference position mellinger_minimum_2011. Thus to avoid incurring large tracking error due to discontinuous trajectories, we extend forward-arc motion primitives Yang2018 to generate trajectories that retain differentiability up to jerk and continuity up to snap. From the resulting trajectories, we can calculate desired vehicle attitude, angular velocity, and angular acceleration for use as feedforward terms in the controller.
The motion primitives are parameterized as follows. We define a local frame to be a fixed -axis aligned frame, taken at a snapshot in time. The motion primitive definition will be provided in the local frame at the time at which an input is issued, and can be freely transformed into a fixed global frame or body frame for control purposes. An action specified by a continuous external input, such as a joystick, generates a single motion primitive. For a multirotor, we define an action as in the local frame at which the input is issued, where is the -velocity (i.e., the forward velocity of the vehicle), is the -velocity, and is the yaw rate. Let
denote a vector containing the position and yaw of the vehicle, i.e.. Then, the endpoint velocities are defined by the unicycle model Pivtoraiko2009. The unicycle model dynamics are given by , where . The position endpoints are unconstrained and we enforce all higher order derivatives above velocity to be zero.
A regeneration step occurs when a new input is received from the joystick, or the previous trajectory finishes executing. Alternatively, a fixed regeneration rate can be chosen in order to accommodate changes in the environment for collision avoidance. Suppose regeneration step occurs at time . Then, a library of dynamically feasible motion primitives is generated in the local frame specified by the reference state at time , i.e. , given a set of discretized actions . Each motion primitive is a vector of four order polynomials that specify the trajectory along the position coordinates , , and yaw coordinate . Given an action at regeneration step , each motion in the motion primitive library is generated in frame according to
where specifies the time derivative. Note that all constraints are appropriately transformed into . The duration of the trajectory and the maximum , , and yaw velocities are specified according to the desired operational range. We further allow the operator to freely rotate the motion primitive library with respect to ’s -axis to allow for sideway slalom motions.
The result of having snap-continuous trajectories (see Fig. 1) ensures that we have smoothness in error dynamics, thus minimizing instabilities and tracking error.
2.2 Pruning & Trajectory Selection.
At every time step, a family of motion primitives, called the motion primitive library, is created. The motion primitive library is constructed by discretizing the continuous input along each action dimension, such that each action is selected from a convex set with size where is the dimension of the space of each input. An example of the discretization is shown in Fig. 2.
At every time step, the operator input is mapped to the closest input in the action space, as defined by the Euclidean norm. A priority queue that minimizes input distance from the selected input to each input in the action space is used to iterate through the action space until a feasible, collision-free trajectory is found. This results in having the operator input mapped to the feasible motion primitive in the library that is parameterized by the closest discretized action, i.e. . A trajectory is deemed collision-free if the minimum distance between any point along the trajectory and the surrounding environment is above the sum of the vehicle size and an operator specified collision radius. Algorithm 1 describes teleoperation with reactive collision avoidance in detail.
The effect of this pruning algorithm is that the vehicle exhibits natural behavior in the presence of obstacles. If a pillar is in front of the vehicle, then the vehicle chooses a motion primitive some angle away and avoids the obstacle. If a wall is present, then the vehicle will choose linear velocities that gradually decrease until the vehicle is stopped.
2.3 Local Map Representation using KD-Trees
We present a local mapping framework that generates a spatially consistent local map of the robot surroundings represented as voxel grids. The local map is generated by retaining only the depth sensor measurements obtained at poses that lie in the vicinity of the vehicle’s current pose. This enables trajectories to span in the space observed by all of the retained sensor measurements. Sequential sensor measurements obtained using a stereo imaging sensor often contain redundant information about the surroundings of the robot. The novelty of information in an incoming sensor measurement at the resolution of the voxel grids is dictated by the rotation and translation displacement of the robot between the current frame and the previous frames. To enforce spatial locality, we dynamically select anchor frames and transform subsequent sensor measurements that contain novel information about the surroundings in to the anchor frames. In order to efficiently incorporate only novel information in the local map, we classify each incoming sensor measurement into one of the following categories: a KeyFrame (KF), a SubFrame (SF), or a BufferFrame (BF) (see Table LABEL:table:frame_classification). The local map (L) is updated in a locally consistent coordinate frame according to the type of sensor frame ().
A sensor measurement that is more than meters away from the current KF is classified as a KF. Sensor measurements that are not new KFs, but are either more than meters in position or degrees in heading away from the previous SF, are classified as SFs. Sensor measurements that are neither KFs nor SFs are classified as BFs, which do not contain sufficient novel information about the surroundings, but are registered to L to account for dynamic changes in the scene. BFs are in L only for the iterations in which they are observed.