Model Predictive Path Integral Control Framework for Partially Observable Navigation: A Quadrotor Case Study

04/18/2020 ∙ Ihab S. Mohamed et al. ∙ Inria, Université Nice Sophia Antipolis

Recently, the Model Predictive Path Integral (MPPI) control algorithm has been extensively applied to autonomous navigation tasks, where the cost map is mostly assumed to be known and only 2D navigation tasks are performed. In this paper, we propose a generic MPPI control framework that can be used for 2D or 3D autonomous navigation tasks in either fully or partially observable environments, the latter being the most prevalent in robotics applications. This framework directly exploits the 3D-voxel grid acquired from an on-board sensing system to perform collision-free navigation. We test the framework, in realistic RotorS-based simulation, on goal-oriented quadrotor navigation tasks in a cluttered environment, for both fully and partially observable scenarios. Preliminary results demonstrate that the proposed framework performs successfully, under partial observability, in both 2D and 3D cluttered environments.


Multimedia Material

The supplementary video attached to this work is available at: https://urlz.fr/cs2L

I Introduction

Having a safe and reliable system for the autonomous navigation of robotic systems such as Unmanned Aerial Vehicles (UAVs) is a highly challenging and only partially solved problem for the robotics community, especially in cluttered and GPS-denied environments such as dense forests, crowded offices, corridors, and warehouses. Solving this problem is essential for many complex applications, such as surveillance, search-and-rescue, and environmental mapping. To carry these out, UAVs should be able to navigate with complete autonomy while avoiding all kinds of obstacles in real-time. To this end, they must be able to (i) perceive their environment, (ii) understand the situation they are in, and (iii) react appropriately.

Naturally, this problem has already been addressed in the literature, particularly in works on dynamics and control, motion planning, and trajectory generation in unstructured environments with obstacles [10, 6, 11, 1, 13]. Moreover, applications of path-integral control theory have recently become more prevalent. One of the most noteworthy is Williams's iterative path-integral method, namely the MPPI control framework [15]. In this method, the control sequence is iteratively updated to obtain the optimal solution on the basis of importance sampling of trajectories. In [16], the authors derived a different iterative method that eliminates the control- and noise-affine dynamics constraints of the original MPPI framework. This framework is based on an information-theoretic interpretation of optimal control using KL-divergence and free energy, whereas the original was based on linearizing the Hamilton-Jacobi-Bellman (HJB) equation and applying the Feynman-Kac lemma. Although different methods are used to derive the MPPI framework, they are practically equivalent and theoretically related (in the sense that the method in [16] exactly recovers that in [15] when the dynamics are affine in control; the iterative method in [16] can thus be seen as a generalization of the former). An extension to Williams's information-theoretic approach is presented in [9], where a learned model is used to generate informed sampling distributions.

The attractive features of the MPPI controller over alternative methods can be summarized as follows: (i) it is a derivative-free optimization method, i.e., it needs no derivative information to find the optimal solution; (ii) it does not require approximating the system dynamics and cost functions with linear and quadratic forms, i.e., non-linear and non-convex functions can be employed naturally, and the dynamics and cost models can even be represented by neural networks; (iii) planning and execution are combined into a single step, providing an elegant control framework for autonomous vehicles. However, one drawback of MPPI is its empirically slow convergence rate, which has been addressed in [12].

In the context of autonomous navigation, the MPPI controller has so far been applied mainly to aggressive driving and to UAV navigation in 2D cluttered environments. To do so, MPPI requires a cost map, as an environment representation, to drive the autonomous vehicle. For autonomous driving, the cost map is obtained either off-line [16] or from an on-board monocular camera using deep-learning approaches [4, 2]. For UAV navigation in cluttered environments, the obstacle (i.e., cost) map is assumed to be available, and only static 2D floor maps are used. In practice, however, real environments are often partially observable and contain dynamic obstacles. Moreover, only 2D navigation tasks have been performed so far, which limits the applicability of the control framework. For these reasons, this paper focuses on MPPI for 2D and 3D navigation tasks in previously unseen and dynamic environments. The main contributions of our work can be summarized as follows:

  1. We propose a generic MPPI framework for autonomous navigation in cluttered 2D and 3D environments, which are inherently uncertain and partially observable. To the best of our knowledge, such a framework has not been reported in the literature, and it opens up new directions for research.

  2. We demonstrate this framework on a set of simulated quadrotor navigation tasks using the RotorS simulator and Gazebo [5], assuming either (i) a priori knowledge of the environment (the fully observable case), or (ii) no a priori information (the partially observable case). In the latter case, the robot builds and updates the map representing the environment online as it goes along, which makes it possible to navigate in dynamic environments.

  3. To ensure a realistic simulation, the proposed framework is evaluated in the presence of modelling errors, noisy sensors, and windy environments.

This paper is organized as follows. Section II describes the quadrotor dynamics model which represents our case study for the framework validation, whereas in Section III the real-time MPPI control strategy is explained. Our proposed framework is then described in Section IV and evaluated in Section V. Finally, concluding remarks are provided in Section VI.

II Quadrotor Dynamics Model

Considering the quadrotor vehicle model illustrated in Fig. 1, the dynamics model can be defined by assigning a fixed inertial frame and a body frame attached to the vehicle. The origin of the body frame $\{x_B, y_B, z_B\}$ is located at the center of mass of the quadrotor, where $x_B$ and $y_B$ lie in the quadrotor plane defined by the centers of the four rotors, and $z_B$ is perpendicular to this plane and points upward. The inertial reference frame is defined by $\{x_I, y_I, z_I\}$, with $z_I$ pointing upward. We assume that the first rotor (i.e., along the $x_B$ axis) and the third rotor rotate counter-clockwise, whilst the second (i.e., along the $y_B$ axis) and fourth rotors rotate clockwise. Here, the vehicle is subject only to: (i) a gravitational acceleration $g$ in the $-z_I$ direction; (ii) the sum of the forces generated by each rotor, which acts along the $z_B$ direction.

Fig. 1: Schematic of the considered quadrotor model in conjunction with the coordinate systems and forces acting on a vehicle frame.

Furthermore, Euler angles with the $Z$-$Y$-$X$ transformation sequence are used to model the rotation of the quadrotor with respect to the inertial frame, where the roll $\phi$, pitch $\theta$, and yaw $\psi$ angles refer to rotations about the $x$, $y$, and $z$ axes, respectively. The rotation matrix from the body frame to the inertial frame is accordingly expressed as

$R = \begin{bmatrix} c_\psi c_\theta & c_\psi s_\theta s_\phi - s_\psi c_\phi & c_\psi s_\theta c_\phi + s_\psi s_\phi \\ s_\psi c_\theta & s_\psi s_\theta s_\phi + c_\psi c_\phi & s_\psi s_\theta c_\phi - c_\psi s_\phi \\ -s_\theta & c_\theta s_\phi & c_\theta c_\phi \end{bmatrix},$

where $c_x \triangleq \cos(x)$ and $s_x \triangleq \sin(x)$. The transformation matrix from the Euler angular velocities, $\dot{\eta} = [\dot{\phi}, \dot{\theta}, \dot{\psi}]^\top$, to the body-frame angular velocity, $\omega$, is given by

$T = \begin{bmatrix} 1 & 0 & -s_\theta \\ 0 & c_\phi & c_\theta s_\phi \\ 0 & -s_\phi & c_\theta c_\phi \end{bmatrix}, \qquad \omega = T\,\dot{\eta}.$
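To make these conventions concrete, the following NumPy sketch (an illustration assuming the $Z$-$Y$-$X$ sequence, not the authors' code) builds the rotation matrix and the Euler-rate-to-body-rate map:

```python
import numpy as np

def rotation_matrix(phi, theta, psi):
    """Rotation from body to inertial frame, Z-Y-X (yaw-pitch-roll) sequence."""
    cf, sf = np.cos(phi), np.sin(phi)
    ct, st = np.cos(theta), np.sin(theta)
    cp, sp = np.cos(psi), np.sin(psi)
    Rz = np.array([[cp, -sp, 0], [sp, cp, 0], [0, 0, 1]])
    Ry = np.array([[ct, 0, st], [0, 1, 0], [-st, 0, ct]])
    Rx = np.array([[1, 0, 0], [0, cf, -sf], [0, sf, cf]])
    return Rz @ Ry @ Rx

def euler_rate_map(phi, theta):
    """T such that body angular velocity omega = T @ [phi_dot, theta_dot, psi_dot]."""
    cf, sf = np.cos(phi), np.sin(phi)
    ct, st = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, -st],
                     [0.0, cf, ct * sf],
                     [0.0, -sf, ct * cf]])
```

A quick sanity check is that the composed matrix is orthogonal with determinant one, and that the rate map reduces to the identity at hover attitude.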

Given all the considerations above and according to the well-described models in [10] and [8], the dynamics of the position $p$, linear velocity $v$, orientation locally defined by the Euler angles $\eta = [\phi, \theta, \psi]^\top$, and body angular rates $\omega$ can be written as

$\dot{p} = v, \qquad m\dot{v} = -m g\, e_3 + F\, R\, e_3, \qquad \dot{\eta} = T^{-1}\omega, \qquad J\dot{\omega} = -\omega \times J\omega + \tau, \qquad$ (1)

where $e_3 = [0, 0, 1]^\top$, $m$ is the mass of the quadrotor, $J$ its inertia matrix expressed in the body frame, $F = \sum_{i=1}^{4} f_i$ is the accumulated force (i.e., thrust) generated by all rotors and constitutes the first control input to the system, $\tau = [\tau_x, \tau_y, \tau_z]^\top$ is the total torque applied to the vehicle, with its components expressed in the body frame, and represents the second control input, and $d$ is the distance from the center of the vehicle to the axis of rotation of each rotor. In this paper, the state of the system is defined as $\mathbf{x} = [p^\top, v^\top, \eta^\top, \omega^\top]^\top$. Each rotor produces a vertical force $f_i$ and a moment $m_i$ according to $f_i = c_f \omega_i^2$ and $m_i = c_m \omega_i^2$, where $\omega_i$ is the angular velocity of the $i$-th rotor, $c_f$ is the rotor force constant, and $c_m$ is the rotor moment constant. Accordingly, the mapping between the control inputs, namely $F$ and $\tau$, and the system's input, i.e., the rotor speeds, in order to control the quadrotor, can be expressed as

$\begin{bmatrix} F \\ \tau_x \\ \tau_y \\ \tau_z \end{bmatrix} = \begin{bmatrix} c_f & c_f & c_f & c_f \\ 0 & d\,c_f & 0 & -d\,c_f \\ -d\,c_f & 0 & d\,c_f & 0 \\ -c_m & c_m & -c_m & c_m \end{bmatrix} \begin{bmatrix} \omega_1^2 \\ \omega_2^2 \\ \omega_3^2 \\ \omega_4^2 \end{bmatrix}. \qquad$ (2)
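As an illustrative sketch of this mapping (not the authors' implementation), the following NumPy snippet assumes a plus-configuration with rotors 1 and 3 (counter-clockwise) on the $x_B$ axis and rotors 2 and 4 (clockwise) on the $y_B$ axis; the constants `c_f`, `c_m`, and `d` are hypothetical values, not those of the simulated vehicle:

```python
import numpy as np

# Hypothetical constants for illustration only.
c_f, c_m, d = 8.5e-6, 1.4e-7, 0.17  # rotor force/moment constants, arm length

# Allocation matrix: [F, tau_x, tau_y, tau_z]^T = A @ [w1^2, w2^2, w3^2, w4^2]^T,
# for rotors 1/3 (CCW) on the +/- x_B axis and rotors 2/4 (CW) on the +/- y_B axis.
A = np.array([
    [c_f,      c_f,     c_f,      c_f],
    [0.0,      d * c_f, 0.0,     -d * c_f],  # roll torque from rotors 2 and 4
    [-d * c_f, 0.0,     d * c_f,  0.0],      # pitch torque from rotors 1 and 3
    [-c_m,     c_m,    -c_m,      c_m],      # yaw torque from rotor drag reactions
])

def rotor_speeds(F, tau):
    """Invert the allocation to recover the four rotor angular velocities."""
    w_sq = np.linalg.solve(A, np.concatenate(([F], tau)))
    return np.sqrt(np.clip(w_sq, 0.0, None))
```

For pure thrust with zero torque, the inversion yields four equal rotor speeds, and applying the allocation matrix to the squared speeds recovers the commanded wrench.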

III MPPI Control Strategy

The MPPI controller is a stochastic Model Predictive Control (MPC) method that can be applied to non-linear dynamics and non-convex cost objectives; it is thus a sampling-based and derivative-free approach. The key idea of MPPI is to sample thousands of trajectories from the system dynamics in real-time, based on Monte-Carlo simulation. Each trajectory is then evaluated according to a predefined cost function, and the optimal control sequence is updated using a cost-weighted average over all trajectories. This can be done in real-time by exploiting the parallel nature of the sampling on a Graphics Processing Unit (GPU).

Let us assume that the discrete-time stochastic dynamical system has the form

$\mathbf{x}_{t+1} = f(\mathbf{x}_t, \mathbf{u}_t + \delta\mathbf{u}_t), \qquad$ (3)

where $\mathbf{x}_t \in \mathbb{R}^n$ is the state vector of the system at time $t$, $\mathbf{u}_t \in \mathbb{R}^m$ denotes the control input, and $\delta\mathbf{u}_t$ is a zero-mean Gaussian noise vector with variance $\Sigma_{\mathbf{u}}$, i.e., $\delta\mathbf{u}_t \sim \mathcal{N}(0, \Sigma_{\mathbf{u}})$, which represents the control input updates. Given a finite time-horizon of $N$ timesteps, the objective of the stochastic optimal control problem is to find a control sequence $U = (\mathbf{u}_0, \dots, \mathbf{u}_{N-1})$ that minimizes the expectation over all generated trajectories taken with respect to (3), i.e., $\min_U \mathbb{E}[S(\tau)]$, where $S(\tau)$ is the state-dependent cost-to-go of a trajectory $\tau$. Accordingly, the optimal control problem can be formulated as

$\min_U \; \mathbb{E}\Big[\phi(\mathbf{x}_N) + \sum_{t=0}^{N-1}\big(q(\mathbf{x}_t) + \tfrac{1}{2}\mathbf{u}_t^\top R\,\mathbf{u}_t\big)\Big], \qquad$ (4)

where $\phi(\mathbf{x}_N)$, $q(\mathbf{x}_t)$, and $R$ are a final terminal cost, a state-dependent running cost, and a positive-definite control weight matrix, respectively. To solve this optimization problem, we consider the iterative update law derived in [15], in which the MPPI algorithm updates the control sequence, from $t$ onward, as

$\mathbf{u}_t \leftarrow \mathbf{u}_t + \frac{\sum_{k=1}^{K} \exp\big(-\tilde{S}(\tau_{t,k})/\lambda\big)\,\delta\mathbf{u}_{t,k}}{\sum_{k=1}^{K} \exp\big(-\tilde{S}(\tau_{t,k})/\lambda\big)}, \qquad$ (5)

where $K$ is the number of random samples (namely, rollouts), $\lambda$ is a hyper-parameter, the so-called inverse temperature, and $\tilde{S}(\tau_{t,k})$ is the modified cost-to-go of the $k$-th rollout from time $t$ onward.
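A minimal NumPy sketch of this update law, with the usual baseline subtraction of the minimum cost for numerical stability (array shapes and names are illustrative):

```python
import numpy as np

def mppi_update(u, du, costs, lam):
    """MPPI update: cost-weighted average of the sampled control perturbations.

    u:     (N, m) nominal control sequence
    du:    (K, N, m) sampled control perturbations
    costs: (K,) cost-to-go of each rollout
    lam:   inverse-temperature hyper-parameter
    """
    beta = costs.min()                 # baseline for numerical stability
    w = np.exp(-(costs - beta) / lam)
    w /= w.sum()                       # normalized importance-sampling weights
    return u + np.einsum('k,knm->nm', w, du)
```

In the limit of a small inverse temperature, the update concentrates on the lowest-cost rollouts; larger values of `lam` blend more rollouts together.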

Given:
          $K, N$: number of rollouts (samples) and timesteps
          $U$: initial control sequence
          $f, \Delta t$: dynamics and time-step size
          $\phi, q, R, \lambda, \nu, \Sigma_{\mathbf{u}}$: cost functions/hyper-parameters
          SGF: Savitzky-Golay (SG) convolutional filter

1:while task not completed do
2:     $\mathbf{x}_0 \leftarrow$ StateEstimator()
3:     $\delta\mathbf{u} \leftarrow$ RandomNoiseGenerator($\Sigma_{\mathbf{u}}, K, N$)
4:     $\tilde{S} \leftarrow$ TrajectoryCostInitializer($K$)
5:     for $k \leftarrow 1$ to $K$ do
6:         $\mathbf{x} \leftarrow \mathbf{x}_0$
7:         for $t \leftarrow 0$ to $N-1$ do
8:             $\mathbf{x} \leftarrow f(\mathbf{x}, \mathbf{u}_t + \delta\mathbf{u}_{t,k})$
9:             $\tilde{S}_k \leftarrow \tilde{S}_k + \tilde{q}(\mathbf{x}, \mathbf{u}_t, \delta\mathbf{u}_{t,k})$
10:         end for
11:         $\tilde{S}_k \leftarrow \tilde{S}_k + \phi(\mathbf{x})$
12:     end for
13:     $\beta \leftarrow \min_k \tilde{S}_k$
14:     for $t \leftarrow 0$ to $N-1$ do
15:         $\mathbf{u}_t \leftarrow \mathbf{u}_t + \big(\sum_{k=1}^{K} e^{-(\tilde{S}_k - \beta)/\lambda}\,\delta\mathbf{u}_{t,k}\big) \big/ \big(\sum_{k=1}^{K} e^{-(\tilde{S}_k - \beta)/\lambda}\big)$
16:     end for
17:     $U \leftarrow$ SGF($U$)
18:     SendToActuators($\mathbf{u}_0$)
19:     for $t \leftarrow 1$ to $N-1$ do
20:         $\mathbf{u}_{t-1} \leftarrow \mathbf{u}_t$
21:     end for
22:     $\mathbf{u}_{N-1} \leftarrow$ ControlSequenceInitializer()
23:     Check for task completion
24:end while
Algorithm 1 Real-Time MPPI Control Scheme [15]

In this work, the modified running cost is defined as

$\tilde{q}(\mathbf{x}, \mathbf{u}, \delta\mathbf{u}) = q(\mathbf{x}) + \frac{1 - \nu^{-1}}{2}\,\delta\mathbf{u}^\top R\,\delta\mathbf{u} + \mathbf{u}^\top R\,\delta\mathbf{u} + \frac{1}{2}\,\mathbf{u}^\top R\,\mathbf{u}, \qquad$ (6)

where $\nu$ refers to the exploration noise, which determines how aggressively the state-space is explored. It is noteworthy that low values of $\nu$ cause many sampled trajectories to be rejected because their cost is too high, while too-large values cause the controller to produce control inputs with significant chatter.
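The modified running cost above can be sketched directly (the form follows [15]; names and shapes are illustrative):

```python
import numpy as np

def modified_running_cost(q_x, u, du, R, nu):
    """Modified running cost: state cost q(x) plus the control terms of (6).

    q_x: scalar state-dependent cost q(x)
    u:   (m,) nominal control at this timestep
    du:  (m,) sampled control perturbation
    R:   (m, m) positive-definite control weight matrix
    nu:  exploration-noise parameter
    """
    return (q_x
            + 0.5 * (1.0 - 1.0 / nu) * du @ R @ du
            + u @ R @ du
            + 0.5 * u @ R @ u)
```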

The real-time control cycle of the MPPI algorithm is described in more detail in Algorithm 1. At each time-step, the current system state is estimated and random control variations are generated on a GPU using CUDA's random-number-generation library (lines 2-3). Then, exploiting the parallel nature of the sampling, all trajectory samples are executed individually in parallel. For each trajectory, the dynamics are predicted forward and its expected cost is computed (lines 5-12), bearing in mind that the cost of each trajectory is zero-initialized (line 4). The control sequence is then updated (lines 13-16), taking into account the minimum sampled cost (line 13). Since the stochastic nature of the sampling procedure leads to significant chattering in the resulting control, the control sequence is then smoothed using a Savitzky-Golay (SG) filter (line 17). Finally, the first control is executed (line 18), while the remaining sequence of length $N-1$ is slid down to be used at the next time-step (lines 19-22).
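The SG smoothing step can be sketched in pure NumPy as follows; the window and polynomial order are free parameters here (the experiments later use 51 and 3), and the endpoints are left unsmoothed for brevity, which is a simplification of a production filter:

```python
import numpy as np

def savgol_coeffs(window, order):
    """Least-squares Savitzky-Golay smoothing coefficients for the window center."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    A = np.vander(offsets, order + 1, increasing=True)  # (window, order+1)
    # Row 0 of the pseudo-inverse gives the polynomial fit evaluated at offset 0.
    return np.linalg.pinv(A)[0]

def savgol_smooth(seq, window=5, order=2):
    """Smooth a 1D control sequence; endpoints are copied unchanged."""
    h = savgol_coeffs(window, order)
    out = seq.copy()
    half = window // 2
    for i in range(half, len(seq) - half):
        out[i] = h @ seq[i - half:i + half + 1]
    return out
```

A useful property for checking the implementation: a polynomial signal of degree up to `order` passes through the interior of the filter unchanged.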

IV Generic MPPI Framework

In this section, we present a generic MPPI framework, illustrated in Fig. 2, that not only navigates autonomously in previously unseen 2D or 3D environments while avoiding collisions with obstacles, but can also explore and map them. Moreover, we describe how the environment is represented so that MPPI can perform both partially and fully observable navigation tasks. Fig. 2 shows the block diagram of our proposed control framework integrated into the Robot Operating System (ROS); its individual components are described in detail in the following subsections.

Fig. 2: The global architecture of our proposed framework.

IV-A Environment Representation

To clarify how MPPI can be used for 2D or 3D navigation in unseen environments, we assume that the environment is discretized into a 2D or 3D grid (see Fig. 2), where each cell is labeled as free, occupied, or unknown. In practice, this labeling can be acquired from depth sensors using OctoMap [7]. As the 3D grid is a grid of cubic volumes of equal size called voxels, we can represent the environment by a set of layers along the $z$ direction, whose number is determined by the real environment's height and the voxel size. Accordingly, each layer represents a 2D occupancy grid, producing in total one 2D grid per layer for a given environment, as illustrated in Fig. 3. Since this work is mainly concerned with the control framework, the perception problem is not presently covered. Thus, perception is imitated here by a so-called 2D or 3D mask, which represents the sensor's field of view (FoV), as the robot cannot normally perceive the entire environment.
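The layer decomposition amounts to simple arithmetic; a small sketch (function names are ours, not the paper's):

```python
import math

def num_layers(height_m, voxel_m):
    """Number of 2D layers needed to cover an environment of the given height."""
    return math.ceil(height_m / voxel_m)

def layer_index(z_m, voxel_m):
    """Index of the layer containing altitude z (layer 0 is the floor)."""
    return int(z_m // voxel_m)
```

For instance, a 4 m tall environment discretized with 0.5 m voxels yields 8 layers, and an altitude of 1.2 m falls in layer 2.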

Fig. 3: 3D environment representation of a generic MPPI.

For the sake of simplicity, we assume that the 3D mask has fixed dimensions and an orientation, where the first two dimensions represent the number of cells in the 2D grid along the $x$ and $y$ axes, while the third represents the number of layers along $z$, assuming that the robot's vertical FoV is centered on the layer in which the robot is located (see Fig. 3). Note that, for 2D tasks, the 3D mask reduces to a 2D mask.

IV-B Fully/Partially Observable 2D/3D Navigation Tasks

Let us assume a global map representation of a given environment, initialized using a priori knowledge about the environment, and a local map for MPPI of the same size, whose cells are all initialized as unknown. This local map is continuously updated using an on-board sensing system, as depicted in Fig. 2.

IV-B1 Fully Observable Case

In the fully observable case, MPPI is directly provided with the global map to compute the trajectories' cost for avoiding obstacles, taking into account the current robot position obtained from a state estimator. For a 2D navigation task, a 2D floor grid is sufficient: only the first layer of the 3D voxel-grid is accessed by MPPI. For the 3D case, the full 3D voxel-grid is required to perform collision-free navigation: given the robot's position, particularly its $z$-component, the layer corresponding to that $z$-component is used for evaluating the cost-to-go of each sampled trajectory (see Fig. 3). At the moment, the main limitation of the fully observable case is that our framework can only handle static environments; this is an important motivation for defining a partially observable navigation task, which is prevalent in robotics applications.

IV-B2 Partially Observable Case

In this case, the robot is located in a previously unseen and dynamic environment and must either (i) navigate to a predefined goal, or (ii) explore and map that area. As the environment is unknown and the global map is not directly accessible by MPPI, the robot must perceive its environment (in our case, through the predefined sensor mask) and react appropriately. Here, the MPPI map, initialized as fully unknown, is fed directly into the control framework as a local map for avoiding obstacles and is continuously updated as the robot moves around. For the sake of clarity, consider the map in Fig. 3 as the local map: based on the intersection between the sensor mask and the grid, only the layers within the robot's vertical FoV are updated, while the other layers remain unchanged.
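This local-map update can be sketched as follows, assuming an axis-aligned box FoV and an illustrative integer cell encoding (both are simplifications of the paper's mask):

```python
import numpy as np

UNKNOWN, FREE, OCCUPIED = -1, 0, 1  # illustrative cell encoding

def update_local_map(local_map, global_map, robot_cell, fov_half_extent):
    """Copy the cells inside the sensor's field of view from the (hidden)
    global map into MPPI's local map; cells outside the FoV keep whatever
    was last observed (or UNKNOWN if never seen)."""
    lo = np.maximum(np.array(robot_cell) - fov_half_extent, 0)
    hi = np.minimum(np.array(robot_cell) + fov_half_extent + 1, global_map.shape)
    sl = tuple(slice(l, h) for l, h in zip(lo, hi))
    local_map[sl] = global_map[sl]
    return local_map
```

Re-running this update every time-step as the robot moves is what lets the local map track a dynamic environment.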

Given:
          : Global and local map of MPPI
          : Sensor’s FoV and its parameters

1:while task not completed do
2:      StateEstimator(),
3:      MapUpdate()
4:      MPPIController()
5:     Check for task completion
6:end while
Algorithm 2 Real-Time Generic MPPI Framework

Generally speaking, the real-time implementation of our proposed generic MPPI framework for 2D or 3D navigation in cluttered environments is described in Algorithm 2, which employs the MPPI control scheme described above. At each time-step, the current state is estimated (line 2). The local map of MPPI is then updated accordingly, given the global map (line 3). As discussed previously, the sensor mask in conjunction with the global map is used for updating the local map, since perception modules are not considered in the current work. In practice, the robot must be equipped with an on-board sensing system, e.g., a depth camera or laser scanner, with a maximum FoV. The currently sensed data is used to obtain a 2D or 3D occupancy map, which is continuously updated; for instance, a quadrotor-based exploration algorithm is proposed in [3] to build a real-time 3D map. Finally, this map enables the controller to find the optimal control to be executed, resulting in collision-free navigation (line 4).
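The loop of Algorithm 2 can be sketched as follows, with the three framework modules left as placeholder callbacks (names are ours, not the framework's API):

```python
import numpy as np

def navigation_loop(state_estimator, map_update, mppi_controller,
                    goal, goal_tol=0.5, max_steps=1000):
    """Real-time loop of Algorithm 2: estimate the state, refresh the local
    map from the sensed FoV, then let MPPI compute and apply the control."""
    for _ in range(max_steps):
        x = state_estimator()                        # line 2: current state
        local_map = map_update(x)                    # line 3: local map update
        mppi_controller(x, local_map)                # line 4: optimal control
        if np.linalg.norm(x[:3] - goal) < goal_tol:  # line 5: task completion
            return True
    return False
```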

V Simulation Details and Results

In this section, we describe how we evaluate our approach in terms of simulation scenarios and performance metrics. Moreover, we conduct realistic simulations to evaluate and demonstrate the performance of the proposed framework.

V-A Simulation Setup, Scenarios, and Metrics

V-A1 Simulation Setup

In order to evaluate the performance of our proposed MPPI framework in a previously unseen and cluttered environment, simulation studies have been performed using the RotorS simulator and Gazebo [5]. The parameters of the simulated quadrotor (namely, the Hummingbird quadrotor) are tabulated in Table I. To ensure a realistic RotorS-based simulation, all navigation tasks are carried out by (i) using noisy sensors such as GPS and IMU, (ii) adding external disturbances such as continuous wind with changing speed and direction, as proposed in [14], and (iii) considering 10% modelling errors in the real values of the mass and inertia of the prediction model given in (1). The MPPI controller generates 2700 samples each time-step; its time horizon, control frequency, and remaining hyper-parameters are listed in Table I, where the listed variance value represents the noise in the thrust input. For the SG filter, we set the length of the filter window and the order of the polynomial function to 51 and 3, respectively. The real-time execution of MPPI is performed on a GeForce GTX 1080 Ti, and all algorithms are written in Python and implemented using ROS.

TABLE I: Parameters of quadrotor and MPPI

In our RotorS-based simulations, the full-state information is directly provided to the controller by an odometry estimator based on an Extended Kalman Filter (EKF). Since the actual control signal of the quadrotor is the angular velocity of each rotor, the controller outputs, namely the thrust and torques, are directly converted into rotor speeds using (2) and sent to the quadrotor as an Actuators message. In summary, the closed loop of MPPI in ROS at each time-step can be summarized as follows: (i) MPPI first receives the odometry message and the updated map obtained from the on-board sensor; (ii) the control action is then computed and published. As mentioned before, the whole process of our proposed framework is summarized in Fig. 2.

Since we are interested in goal-oriented quadrotor navigation tasks in cluttered environments, the state-dependent cost function is defined as

where:

The first term penalizes collision with the ground or obstacles, while the second prevents the quadrotor from (i) going too fast, (ii) using too-aggressive roll and pitch angles, or (iii) colliding with the ceiling, using boolean indicator variables. The last term drives the quadrotor towards the desired position and orientation through a weighting matrix.
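A hedged sketch of such a goal-oriented running cost follows; the weights, limits, and exact terms are illustrative choices, not the paper's values:

```python
import numpy as np

# Illustrative weights; the paper's actual values are not reproduced here.
W_CRASH, W_LIMIT = 1e4, 1e3

def state_cost(pos, vel, roll, pitch, occupied, above_ceiling,
               goal, Q, v_max=1.5, ang_max=0.5):
    """Goal-oriented running cost: a large indicator penalty for collision,
    a penalty for violating speed/attitude/ceiling limits, and a quadratic
    term driving the vehicle towards the goal position."""
    crash = W_CRASH * float(occupied or pos[2] <= 0.0)
    limit = W_LIMIT * float(np.linalg.norm(vel) > v_max
                            or abs(roll) > ang_max
                            or abs(pitch) > ang_max
                            or above_ceiling)
    err = pos - goal
    return crash + limit + err @ Q @ err
```

The indicator penalties dominate the quadratic goal term by construction, so sampled trajectories that touch an obstacle are effectively rejected by the exponential weighting in (5).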

V-A2 Simulation Scenarios

Two different scenarios are considered for evaluating the proposed framework: a 2D scenario and a 3D scenario. The 2D scenario is a windy forest of cylindrical obstacles placed in a 2D grid pattern with equal spacing. The 3D scenario is the same environment with two added horizontal layers of cylinders at different heights, with the same spacing, as shown in Fig. 4. The voxel size is set small enough to ensure high accuracy and to remain realistic when a 3D occupancy grid is involved; as a result, the 3D voxel grid has 43 layers, each representing a 2D grid. (The cell size of the 2D grid is at 1:1 scale, since all obstacles in our case are placed in a 2D grid pattern with integer coordinates.) The 3D scenario is used for performing fully and partially observable 3D navigation tasks, while the 2D scenario is used for the 2D tasks. For both scenarios, two different cases are considered: (i) the Fully Observable Case (FOC), in which a priori knowledge about the environment is used to initialize the global map, which then matches the environment exactly; and (ii) the Partially Observable Case (POC), where no a priori information about the environment is assumed, so the local map is discovered and built online as the robot moves around.

Fig. 4: The 3D cluttered environment used for 3D tasks.

V-A3 Performance Metrics

We define two metrics for evaluation. First, the performance of MPPI in the POC is compared to its performance in the FOC, considering the latter as a baseline. To achieve a fair comparison, in all simulations the robot navigates to the same specified goals with the same maximum velocity; at each time-step, the desired yaw angle is updated so that the front camera points towards the next goal, and the UAV lands after reaching the final goal. Second, for both cases (FOC/POC), we use a number of indicators to describe the general performance: the number of collisions, task completion percentage, average completion time, average flying distance, average flying speed, and average energy consumption (for more details, we refer to [14]). In all simulations, the task is considered terminated when the quadrotor has reached the given goals (100% completion) or has crashed into an obstacle.

V-B Simulation Results

The performance of our proposed framework is validated on both fully and partially observable 2D/3D quadrotor navigation tasks, considering the predefined goals. In all partially observable tasks, the orientation of the sensor mask is set to the angle between the current and next goal. To test whether the quadrotor is able to navigate successfully through the cluttered environment, we performed 5 trials for both the FOC and the POC. The reader is invited to watch the full simulation results at https://bit.ly/2PAbESO.

V-B1 2D Navigation Results

Figure 5 shows an example of a final trajectory generated by MPPI when (i) using a 2D global map in the FOC, or (ii) using only a 2D local map in the POC. The 2D floor map (grey circles) and its updated local map (blue circles, representing the obstacles within the robot's FoV) are shown in Fig. 6 for the given goals, together with the trajectories generated in both cases and the robot's FoV. In both the 2D FOC and POC, it can be seen that MPPI safely navigates through the windy obstacle field, in spite of the presence of modelling errors and measurement noise.

Fig. 5: Comparison of 3D trajectories in a 2D scenario.
Fig. 6: The obstacle field map, i.e., 2D , and its updated 2D map using a 2D . Two lines represent the trajectories generated by MPPI in both cases, i.e., FOC and POC.
Fig. 7: Comparison of 3D trajectories in a 3D scenario.

V-B2 3D Navigation Results

Another example of successfully generated trajectories, for fully and partially observable 3D navigation tasks in the 3D scenario, is depicted in Fig. 7. Since the map in this scenario consists of 43 2D grids, it is very difficult to visualize clearly in this paper the updates over all layers and to verify that the robot performs collision-free navigation. We therefore first show only the layer that was mainly updated while the robot was moving towards its goals and while it was landing after task completion, as an example of how the 3D map is updated (see Fig. 8); second, we rely on the indicators described previously, which are especially informative for the 3D cases.

Fig. 8: The update of the layer in the 3D , showing the generated trajectory in POC.

V-B3 Overall Performance

Table II shows the performance statistics for the five trials in 2D and 3D scenarios, considering the parameters of the controller tabulated in Table I.

Indicators 2D Scenario 3D Scenario
FOC POC FOC POC
Ncol (tcomp) 0 (100%) 0 (100%) 0 (100%) 0 (100%)
TABLE II: General performance of MPPI in 2D and 3D scenarios for both FOC and POC

In all trials (i.e., 20 trials in total), we observe that the quadrotor navigates autonomously while avoiding obstacles, with an average flying speed close to the maximum specified velocity, regardless of the limited FoV of the sensor in both the 2D and 3D scenarios. Note that these 20 trials amount to a substantial distance of autonomous navigation. Moreover, the average execution time of MPPI per iteration in the 3D scenario is approximately equal to that in the 2D scenario (with very low standard deviations), showing the efficiency of the proposed framework even with 3D voxel grids and its applicability to 3D navigation tasks. This is not surprising, as the 2D or 3D voxel grids are stored and used directly by MPPI on the GPU. For the other indicators, we observe that the performance of MPPI in the POC cases differs slightly from its performance in the FOC cases, even though the same parameters were set for all simulations; the reason for this difference is that the control variations are generated randomly, each time-step, on the GPU.

Indicators 2D Scenario 3D Scenario
FOC POC FOC POC
Tuning Case 1 (short time horizon):
Ncol (tcomp [%]) 1 (57.43) 1 (57.43) 1 (40.36) 1 (36.39)
Tuning Case 2:
Ncol (tcomp [%]) 0 (100%) 0 (100%) 0 (100%) 0 (100%)
dav [m] 51.7 ± 0.79 51.7 ± 1.1 51.1 ± 0.16 52.1 ± 0.68
Tuning Case 3:
Ncol (tcomp [%]) 0 (100%) 0 (100%) 0 (100%) 0 (100%)
dav [m] 50.4 ± 0.86 50.7 ± 0.25 50.2 ± 0.49 50.5 ± 0.36
Tuning Case 4:
Ncol (tcomp [%]) 0 (100%) 0 (100%) 0 (100%) 0 (100%)
dav [m] 52.4 ± 0.78 52.5 ± 0.36 52.8 ± 0.74 53.0 ± 0.45
Tuning Case 5:
Ncol (tcomp [%]) 0 (100%) 0 (100%) 0 (100%) 0 (100%)
dav [m] 51.9 ± 0.59 51.5 ± 0.85 52.1 ± 0.25 52.6 ± 0.36
Tuning Case 6:
Ncol (tcomp [%]) 0 (100%) 0 (100%) 0 (100%) 0 (100%)
dav [m] 50.9 ± 0.25 51.0 ± 0.46 51.5 ± 0.93 51.9 ± 0.33
TABLE III: Effect of changing the time horizon and exploration variance on the behavior of the proposed controller

The number of timesteps (which is related to the time horizon and control frequency) and the exploration variance (which is related to the number of sampled trajectories) play an important role in determining the behavior of MPPI. We therefore tested six different tuning cases with different values of these parameters, as shown in Table III. In the first three tuning cases, we study the influence of changing the time horizon while the exploration variance remains constant (i.e., as defined in Table I), while the effect of changing the exploration variance is studied in the last three cases. For the sake of simplicity, each tuning case was tested by letting the quadrotor navigate only to the first goal and then land, with three trials per case for both the FOC and POC. For Tuning Case 1, where a short time horizon is chosen, the MPPI controller is clearly unable to complete the trials at a satisfactory rate: in both the 2D FOC and POC, only 1 of 3 trials succeeds. For the other tuning cases (namely, cases 2 to 6), the controller performs well and successfully completes all trials while avoiding obstacles. We also notice that as the time horizon and exploration variance increase, the performance of the controller improves. In Tuning Cases 3 and 6, where high values are chosen, the average flying distance (for both 2D and 3D scenarios) is the shortest among the successful cases, with low standard deviations, which means that the quadrotor takes a more direct route to the goal. However, a too-long time horizon increases the computational effort dramatically, because each trajectory takes more time to simulate, while a too-large exploration variance causes the controller to produce control inputs with significant chatter.
We also noticed during our simulations that higher values of the control weight matrix lead to empirically slow convergence of MPPI towards the goal and to fluctuating motion. For this reason, in all experiments we set the time horizon, exploration variance, and control weight matrix to the values given in Table I, for which MPPI performs very consistently for all given goals.

VI Conclusions and Future Work

In this work, we proposed an extension of the classical MPPI framework that enables a robot to navigate autonomously in 2D or 3D environments while avoiding collisions with obstacles. The key point of our framework is to provide MPPI with a 2D or 3D grid representing the real world in order to perform collision-free navigation. The framework has been successfully tested in realistic simulations with a quadrotor, considering both fully and partially observable cases. The current simulations illustrate the efficiency and robustness of the proposed controller for 2D and 3D navigation tasks. Although the theoretical stability of MPPI has not yet been proved in the literature, practical stability can be achieved by setting the MPPI parameters carefully, as described above. In future work, we will explore methods for studying the theoretical stability of MPPI, and we will implement the framework in practical applications.

References

  • [1] T. Baca, D. Hert, G. Loianno, M. Saska, and V. Kumar (2018-Oct.) Model predictive trajectory tracking and collision avoidance for reliable outdoor deployment of unmanned aerial vehicles. In IEEE/RSJ International Conference on Intelligent Robots and Systems, Madrid, Spain, pp. 6753–6760. Cited by: §I.
  • [2] A. Buyval, A. Gabdullin, K. Sozykin, and A. Klimchik (2019) Model predictive path integral control for car driving with autogenerated cost map based on prior map and camera image. In 2019 IEEE Intelligent Transportation Systems Conference, pp. 2109–2114. Cited by: §I.
  • [3] T. Cieslewski, E. Kaufmann, and D. Scaramuzza (2017-Sept.) Rapid exploration with multi-rotors: a frontier selection method for high speed flight. In IEEE/RSJ International Conference on Intelligent Robots and Systems, Vancouver, Canada, pp. 2135–2142. Cited by: §IV-B2.
  • [4] P. Drews, G. Williams, B. Goldfain, E. A. Theodorou, and J. M. Rehg (2017) Aggressive deep driving: combining convolutional neural networks and model predictive control. In Conference on Robot Learning, pp. 133–142. Cited by: §I.
  • [5] F. Furrer, M. Burri, M. Achtelik, and R. Siegwart (2016) RotorS—a modular Gazebo MAV simulator framework. In Robot Operating System (ROS): The Complete Reference (Volume 1), pp. 595–625. Cited by: item 2, §V-A1.
  • [6] D. González, J. Pérez, V. Milanés, and F. Nashashibi (2015-Nov.) A review of motion planning techniques for automated vehicles. IEEE Transactions on Intelligent Transportation Systems 17 (4), pp. 1135–1145. Cited by: §I.
  • [7] A. Hornung, K. M. Wurm, M. Bennewitz, C. Stachniss, and W. Burgard (2013) OctoMap: an efficient probabilistic 3D mapping framework based on octrees. Autonomous robots 34 (3), pp. 189–206. Cited by: §IV-A.
  • [8] J. Kai, G. Allibert, M. Hua, and T. Hamel (2017-07) Nonlinear feedback control of quadrotors exploiting first-order drag effects. In IFAC World Congress, Toulouse, France, pp. 8189–8195. Cited by: §II.
  • [9] R. Kusumoto, L. Palmieri, M. Spies, A. Csiszar, and K. O. Arras (2019-05) Informed information theoretic model predictive control. In International Conference on Robotics and Automation, Montreal, Canada, pp. 2047–2053. Cited by: §I.
  • [10] D. Mellinger, N. Michael, and V. Kumar (2012) Trajectory generation and control for precise aggressive maneuvers with quadrotors. The International Journal of Robotics Research 31 (5), pp. 664–674. Cited by: §I, §II.
  • [11] K. Mohta, M. Watterson, Y. Mulgaonkar, S. Liu, C. Qu, A. Makineni, K. Saulnier, K. Sun, A. Zhu, J. Delmerico, et al. (2018) Fast, autonomous flight in GPS-denied and cluttered environments. Journal of Field Robotics 35 (1), pp. 101–120. Cited by: §I.
  • [12] M. Okada and T. Taniguchi (2018-Sept.) Acceleration of gradient-based path integral method for efficient optimal and inverse optimal control. In IEEE International Conference on Robotics and Automation, Brisbane, Australia, pp. 3013–3020. Cited by: §I.
  • [13] M. Ryll, J. Ware, J. Carter, and N. Roy (2019-05) Efficient trajectory planning for high speed flight in unknown environments. In International Conference on Robotics and Automation, Montreal, Canada, pp. 732–738. Cited by: §I.
  • [14] T. Tezenas Du Montcel, A. Negre, J. Gomez-Balderas, and N. Marchand (2019-05) BOARR: a benchmark for quadrotor obstacle avoidance based on ROS and RotorS. In French national conference on ROS (ROSCon Fr), Paris, France. External Links: Link Cited by: §V-A1, §V-A3.
  • [15] G. Williams, A. Aldrich, and E. A. Theodorou (2017) Model predictive path integral control: from theory to parallel computation. Journal of Guidance, Control, and Dynamics 40 (2), pp. 344–357. Cited by: §I, §III, Algorithm 1, footnote 1.
  • [16] G. Williams, P. Drews, B. Goldfain, J. M. Rehg, and E. A. Theodorou (2018) Information-theoretic model predictive control: theory and applications to autonomous driving. IEEE Transactions on Robotics 34 (6), pp. 1603–1622. Cited by: §I, §I, footnote 1.