Flappy Hummingbird: An Open Source Dynamic Simulation of Flapping Wing Robots and Animals

02/25/2019
by   Fan Fei, et al.
Purdue University

Insects and hummingbirds exhibit extraordinary flight capabilities and can simultaneously master seemingly conflicting goals: stable hovering and aggressive maneuvering, unmatched by small scale man-made vehicles. Flapping Wing Micro Air Vehicles (FWMAVs) hold great promise for closing this performance gap. However, the design and control of such systems remain challenging due to various constraints. Here, we present an open source high fidelity dynamic simulation for FWMAVs to serve as a testbed for the design, optimization and flight control of FWMAVs. For simulation validation, we recreated the hummingbird-scale robot developed in our lab in the simulation. System identification was performed to obtain the model parameters. The force generation and the open-loop and closed-loop dynamic responses of simulated and experimental flights were compared and validated. The unsteady aerodynamics and the highly nonlinear flight dynamics present challenging control problems for conventional and learning control algorithms such as Reinforcement Learning. The interface of the simulation is fully compatible with the OpenAI Gym environment. As a benchmark study, we present a linear controller for hovering stabilization and a Deep Reinforcement Learning control policy for goal-directed maneuvering. Finally, we demonstrate direct simulation-to-real transfer of both control policies onto the physical robot, further demonstrating the fidelity of the simulation.


I Introduction

Flying animals possess extraordinary capabilities and demonstrate a rich repertoire of agile maneuvers, often under a variety of disturbances such as wind gusts and rain [1]. They remain surprisingly stable during hover and can make sharp turns in a split second; for example, the escape maneuver of a hummingbird takes only 8 wingbeats (a quarter of a second) to complete, as shown in Fig. 2. This is unmatched by man-made counterparts. Great progress has been made in recent years in the development of Flapping Wing Micro Air Vehicles (FWMAVs), among which Delfly [2], RoboBee [3], Nanohummingbird [4], KUBeetle [5], COLIBRI [6] and the Purdue Hummingbird robot [7] have demonstrated successful takeoff and hovering.

Due to the complex nature of the unsteady aerodynamics during high-frequency flapping motion, developing such platforms to match the performance of nature’s flyers remains extremely challenging. On the design side, the system design and optimization problems are further complicated by stringent weight, size and power constraints [7]. On the control side, the unsteady aerodynamics, high-frequency flapping oscillations, and noisy nonlinear flight dynamics present extreme hurdles for maneuver control [8]. In summary, substantial progress is needed in all aspects of the system before a truly bio-inspired vehicle can approach the performance of its biological counterpart.

Furthermore, the difficulty of building such hardware platforms and their limited availability could deter interest or slow progress in FWMAVs. By comparison, conventional robotic platforms such as manipulators, ground vehicles, underwater robots, legged robots, and drones/quadcopters are much more accessible. There are also various simulation and analytical tools with which interested researchers can test their ideas [9, 10, 11, 12]. However, there is not yet an easy-to-use flapping wing MAV simulation toolkit.

Fig. 1: Diagrams of the FWMAV robot platform and its simulation environment.
Fig. 2: A hummingbird flying from stable hovering to maneuvering back to hovering under 8 wingbeats [13]. The silhouette of the hummingbird is enhanced for better visibility.

To facilitate the design of FWMAV platforms and the study of flapping flight control in general, we present an open source high fidelity dynamic simulation for FWMAVs and flapping-wing animals such as hummingbirds and insects. Using the flapping-wing robot developed in our lab [7] as a blueprint, we built its virtual counterpart in the simulation environment. The simulation is written in C++ with Python bindings, using customized flapping-wing aerodynamic models and the DART [14] physics engine to solve multi-body kinematics and dynamics. The physical parameters were obtained by performing system identification on the robot. The aerodynamic modeling is validated through wing kinematics and force/torque measurements. Open loop flight tests were conducted and the state transition statistics were verified. Finally, we demonstrate that the fidelity of the simulation is suitable for continuous control tasks. A feedback flight controller is designed in the simulation to achieve stable position tracking for the robot. When the simulated controller was implemented directly onboard the robot, the same flight performance was achieved on the vehicle. We also developed a goal-directed flight maneuvering control policy using deep reinforcement learning. The policy was optimized in simulation and directly transferred to the robot. The successful transfer of both controllers further validates the fidelity and effectiveness of the simulation.

With this tool, MAV designers can iterate on and optimize their designs and parameters before tediously building physical variations and testbeds, as the vehicle dynamics are detailed down to the component level in simulation. This tool is built on top of DART, so topics such as state estimation, perception, localization and mapping can be studied through integration with ROS and Gazebo. We also challenge control and learning researchers to control such a system to be equal or better at maneuvering than the animal, for which we provide hummingbird data for comparison (Fig. 2). Faster-than-realtime simulation and an OpenAI Gym interface support research topics in Control Theory, Deep Reinforcement Learning and Imitation Learning. For experimental biologists, we provide several flapping-wing animal models with full degrees of freedom of the wing motion, aiding the study of neuromuscular control, flapping flight behaviors and evolution. We are open to providing experimental support on the physical robots for simulation users, like Robotarium [15] and DuckieTown [16]. The code and data will be available online.
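To illustrate what a Gym-style interface implies for a user, the following minimal sketch runs one episode through the classic `reset`/`step` loop. The class `ToyAltitudeEnv`, its point-mass dynamics, the gains, and the target height are hypothetical stand-ins chosen for illustration; they are not the simulator's actual environment or API.

```python
class ToyAltitudeEnv:
    """Hypothetical 1-D point mass that follows the classic Gym reset/step convention."""

    def __init__(self, dt=0.01, target=0.4, max_steps=200):
        self.dt = dt
        self.target = target
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        # Start at rest on the ground and return the initial observation.
        self.z = 0.0
        self.vz = 0.0
        self.steps = 0
        return (self.z, self.vz)

    def step(self, action):
        # `action` is treated as a net vertical acceleration (placeholder dynamics).
        self.vz += action * self.dt
        self.z += self.vz * self.dt
        self.steps += 1
        reward = -abs(self.target - self.z)
        done = self.steps >= self.max_steps
        return (self.z, self.vz), reward, done, {}


def pd_policy(obs, target=0.4, kp=20.0, kd=8.0):
    """Simple PD law used here only to exercise the interface."""
    z, vz = obs
    return kp * (target - z) - kd * vz


def run_episode(env, policy):
    """Roll out one episode and return the accumulated reward."""
    obs = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        obs, reward, done, _info = env.step(policy(obs))
        total_reward += reward
    return total_reward
```

Because any controller or learned policy only needs to map an observation to an action, the same `run_episode` loop can serve a hand-tuned controller and a reinforcement learning agent alike.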

II System Model

II-A System Definition

The robotic vehicle used in this study is a motor-driven FWMAV on which two motors drive the two wings independently. It has a wingspan of and weighs . Torsion springs are used to achieve resonance. Details of the platform are presented in [7]. We use a wingbeat modulation technique to generate thrust and control torques [17]. The four input signals are defined as amplitude , amplitude difference , bias and split-cycle parameter , which control thrust, roll, pitch, and yaw torque, respectively.
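As a rough illustration of how four such inputs could shape the two wing drive signals, the sketch below builds a split-cycle cosine for each wing. The function names, the sign conventions, and the way the amplitude difference and split-cycle shift are applied to the left and right wings are assumptions made for this example, not the formulation used on the robot or in [17].

```python
import math


def wing_command(t, freq, amplitude, bias, split=0.5):
    """Split-cycle cosine: the first half-stroke occupies `split` of each period.

    With split = 0.5 this reduces to amplitude * cos(2*pi*freq*t) + bias.
    """
    period = 1.0 / freq
    tau = (t % period) / period  # normalized phase in [0, 1)
    if tau < split:
        phase = math.pi * tau / split
    else:
        phase = math.pi + math.pi * (tau - split) / (1.0 - split)
    return amplitude * math.cos(phase) + bias


def wing_pair_commands(t, freq, amplitude, amp_diff, bias, split_shift):
    """Return (left, right) commands from the four inputs (assumed mapping).

    amplitude   -> common stroke amplitude (thrust)
    amp_diff    -> opposite amplitude change on each wing (roll)
    bias        -> common mid-stroke offset (pitch)
    split_shift -> opposite half-stroke timing on each wing (yaw)
    """
    left = wing_command(t, freq, amplitude + amp_diff, bias, 0.5 + split_shift)
    right = wing_command(t, freq, amplitude - amp_diff, bias, 0.5 - split_shift)
    return left, right
```

A non-zero `split_shift` makes one half-stroke faster than the other, which is the mechanism split-cycle schemes rely on to produce a net cycle-averaged yaw torque.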

To recreate the FWMAV platform in the simulation, we describe the vehicle with five rigid bodies: one torso, two leading edge frames, and two wings. The leading edge is linked to the torso with a stroke joint, and the wing is linked to the leading edge with a rotation joint. The stroke joints are configured with spring constants. We simulate motor torques to drive the leading edge frame back and forth. Aerodynamic forces and torques can be calculated and applied on the wings at the span-wise and chord-wise center of pressure and . The stroke and rotation angles are set with limited movement range.

Fig. 3: Coordinate frames of the flapping wing in simulation, with the applied forces and torques illustrated. The angle of attack (AoA) formed by passive rotation of the wing, the torso, the left leading edge and the left wing joint are shown.

We define the wing movement with four degrees of freedom so the aerodynamic model can be generalized to robots and animals. The leading edge has three degrees of freedom: stroke plane offset , stroke angle , deviation angle , and the wing has one degree of freedom , which is the rotation angle. As shown in Fig. 3, and are fixed at zero for our vehicle platform. The coordinate system of the body and the left wing is illustrated in Fig. 4, where is the center of mass (CoM), is the left shoulder, is the distance from CoM to the shoulder and is half shoulder width. The positive direction of each degree of freedom is defined such that for both wings, positive can produce positive yaw torque, positive corresponds to upstroke, positive corresponds to heaving or abduction and positive corresponds to pronation.

II-B Aerodynamics

To accurately capture the body dynamics of the vehicle, we need to calculate the instantaneous aerodynamic forces and torques. Based on the blade element method and quasi-steady model, we calculate the normal force and rotational moment on the wing from the effective wing kinematics by incorporating body kinematics into wing motion through coordinate transformation.

Fig. 4: Coordinate frame definition of the left wing’s movement. The origins of all wing frames are located at shoulders . The four degrees of freedom are used to describe the wing kinematics of both robots and animals.

To calculate the aerodynamic force we need the velocity of the wing at span-wise location , and its angle of attack . For convenience, we divide the velocity into two components: one out-of-plane component that is normal to the plane, one in-plane component within the plane and normal to the leading edge.

(1)

where indicates left wing and indicates right wing, subscript indicates in-plane component, indicates out of plane component.

For the left wing, coefficient , , and . is the in-plane component, where its sign follows the positive stroke direction (right hand about ), and is the out-of-plane component, defined along the axis; and are the linear and angular velocity of the left leading edge in frame.

The body linear and angular velocities are and . The left shoulder velocity in the body frame can be calculated by

(2)
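A minimal sketch of a shoulder-velocity calculation, assuming the standard rigid-body relation (point velocity equals body linear velocity plus angular velocity crossed with the offset from the CoM). The function names and the example offset are hypothetical and only illustrate the kind of computation involved.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as sequences."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def shoulder_velocity(body_linear_velocity, body_angular_velocity, shoulder_offset):
    """Velocity of a body-fixed point: v + omega x r, all expressed in the body frame."""
    rotational_part = cross(body_angular_velocity, shoulder_offset)
    return tuple(v + w for v, w in zip(body_linear_velocity, rotational_part))
```

For example, a body translating at 0.5 along its first axis while yawing at 1 rad/s gives a point offset by 1 along that axis an extra sideways velocity of 1.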

To get the velocity components in the leading edge frame, we define a rotation matrix which first rotates about axis then , then finally axis. The left shoulder velocity in the leading edge frame is

(3)

The angular velocity of the leading edge frame is

(4)

Knowing and , can be expressed in frame as

(5)

With (3) and (5), the coefficient and in equation (1) can be calculated. Right wing velocity is calculated similarly.

For simplicity and computational efficiency, we consider the angle of attack at the span-wise center of pressure

(6)

Define normal force in direction, aerodynamic moment and rotational damping moment in right-hand direction, from observation, we have .

To calculate the forces, the velocity squared at can be written as

(7)

Integrating the blade element force along the wingspan [18], the normal force, aerodynamic moment and rotational damping moment are

(8)

where a non-dimensional chord-wise center of pressure is adopted from [19] and implemented as periodic.

The total instantaneous aerodynamic forces applied on the wing are about axis and at chord-wise and span-wise.
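The blade element idea can be sketched numerically as follows. This is a simplified illustration, assuming a stationary body so that the local wing speed is just stroke rate times radius, and using a placeholder normal-force coefficient; the coefficient model, the wing dimensions, and the function names are assumptions for the example, not the model parameters used in the simulation.

```python
import math

AIR_DENSITY = 1.225  # kg/m^3, sea-level air


def normal_force_coefficient(aoa):
    """Placeholder quasi-steady coefficient; the 3.4 scale is illustrative only."""
    return 3.4 * math.sin(aoa)


def blade_element_normal_force(stroke_rate, aoa, span, chord_at, n_elements=1000):
    """Sum 0.5 * rho * C_N * c(r) * U(r)^2 * dr over span-wise strips (midpoint rule)."""
    dr = span / n_elements
    integral = 0.0
    for i in range(n_elements):
        r = (i + 0.5) * dr                 # strip midpoint radius
        speed_sq = (stroke_rate * r) ** 2  # local speed squared for pure stroke rotation
        integral += chord_at(r) * speed_sq * dr
    return 0.5 * AIR_DENSITY * normal_force_coefficient(aoa) * integral


# Hypothetical rectangular wing: 80 mm span, 25 mm chord, 100 rad/s stroke rate.
force = blade_element_normal_force(100.0, math.pi / 4, 0.08, lambda r: 0.025)
```

For a constant chord the integral has the closed form c * omega^2 * R^3 / 3, which gives a convenient check on the numerical sum.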

III Model Validation

It is well known that flapping wing robots are sensitive to mechanical imperfections in force production [20, 8]. To verify the fidelity and accuracy of the simulation quantitatively, it is ideal to have a model identical to the real robot. We first conduct system identification to tweak the uncertain system parameters to best approximate the mechanical trim condition of the real robot, then we validate its wing kinematics and open loop state transition. Note that for controller and vehicle design, small parametric uncertainty is acceptable, as the overall dynamic behavior is not affected and the small mechanical trim can be compensated by the controller. For reinforcement learning applications, dynamics randomization can be used to achieve a robust control policy [21, 22].

III-A System Identification and Force Mapping

Most system parameters, such as the motor torque constant, mass and wing shape parameters, can be measured directly and stay constant, assuming no physical damage occurs. Some parameters cannot be measured accurately but have non-negligible effects on body torque generation, such as the spring resting position and wing rotation angle limits, which create net pitch and yaw torques. We measured all parameters and used system identification to tweak the uncertain parameters within small bounds. Given the large number of uncertain parameters and the highly coupled nonlinear dynamics, we use a genetic algorithm to find the best fit. The parameters to be adjusted are: motor resistance and , spring stiffness in both directions , and , , mid-stroke resting angle and and wing rotation angle upper and lower limits , , and .

Since the mass properties of the robot are largely constant and can be easily measured, the system identification process focuses on accurate force generation. We use an ATI Nano 17 sensor to measure the cycle-averaged force generated by the robot at different operating points. A total of 37 different inputs were used, and 6 body forces and torques were measured at each operating point. This force map with 222 data points is used as the ground truth to measure the accuracy of the force generation of the simulation. The cost is defined as the sum of squared errors between the measured force and the force calculated from simulation across all data points.
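The cost described above can be written as a short function. The sketch below assumes the force map is stored as 37 rows of 6 cycle-averaged channels; the data layout, the function name, and the toy values are assumptions, and any scaling between force and torque channels is left out.

```python
N_OPERATING_POINTS = 37  # distinct inputs in the force map
N_CHANNELS = 6           # body forces and torques per operating point


def force_map_cost(measured, simulated):
    """Sum of squared errors over every channel of every operating point."""
    if len(measured) != len(simulated):
        raise ValueError("force maps must have the same number of operating points")
    cost = 0.0
    for measured_row, simulated_row in zip(measured, simulated):
        if len(measured_row) != len(simulated_row):
            raise ValueError("each operating point must have the same channels")
        cost += sum((m - s) ** 2 for m, s in zip(measured_row, simulated_row))
    return cost


# Toy data: every simulated channel is off by 0.1 from the measurement.
measured_map = [[0.0] * N_CHANNELS for _ in range(N_OPERATING_POINTS)]
simulated_map = [[0.1] * N_CHANNELS for _ in range(N_OPERATING_POINTS)]
toy_cost = force_map_cost(measured_map, simulated_map)
```

With 37 operating points and 6 channels, the sum runs over 37 × 6 = 222 terms, matching the number of data points in the force map.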

Fig. 5: The force map of the real and the simulated vehicle. Thrust and the three control torques match well after system identification.

The parameters were optimized with 200 individuals for 200 generations. The result with the best fit is shown in Fig. 5. The simulated force map matches the measurement well, with minor errors under larger inputs. This could be caused by the nonlinearity of the spring at a large deviation angle. The total error is 4.1%.
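A bounded genetic search of this kind can be sketched in a few lines. The selection, crossover, and mutation operators below (tournament selection, blend crossover, Gaussian mutation, elitism) are generic choices assumed for illustration, since the specific operators used for the identification are not stated; the quadratic demo cost merely stands in for the force-map cost.

```python
import random


def genetic_minimize(cost, bounds, pop_size=200, generations=200,
                     mutation_scale=0.1, elite=2, seed=0):
    """Minimize `cost` over box `bounds` with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []  # best cost at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=cost)
        history.append(cost(ranked[0]))
        next_population = [ind[:] for ind in ranked[:elite]]  # keep the best unchanged
        while len(next_population) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            parent_a = min(rng.sample(ranked, 3), key=cost)
            parent_b = min(rng.sample(ranked, 3), key=cost)
            child = []
            for (lo, hi), a, b in zip(bounds, parent_a, parent_b):
                w = rng.random()
                gene = w * a + (1.0 - w) * b                      # blend crossover
                gene += rng.gauss(0.0, mutation_scale * (hi - lo))  # Gaussian mutation
                child.append(max(lo, min(hi, gene)))              # stay within bounds
            next_population.append(child)
        population = next_population
    best = min(population, key=cost)
    return best, cost(best), history


# Demo on a stand-in quadratic cost with a small population to keep it fast.
demo_target = (0.2, -0.3, 0.5)
demo_bounds = [(-1.0, 1.0)] * 3
demo_cost = lambda x: sum((xi - ti) ** 2 for xi, ti in zip(x, demo_target))
best, best_cost, history = genetic_minimize(demo_cost, demo_bounds,
                                            pop_size=30, generations=40)
```

The defaults mirror the 200 individuals and 200 generations reported above; elitism guarantees that the best cost never increases from one generation to the next.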

III-B Wing Kinematics

To further validate the system identification, we compare the wing kinematics of the real vehicle with the simulation. A high-speed camera is used to record the wing motion at , and the wing stroke and rotation angles are extracted using [23]. The real wings have a bi-stable design with the majority of the area constructed as a rigid plate. Because they still have a certain degree of twist, we measure the rotation angle at the wing tip, where the velocity is highest. The simulated robot has a constant geometric AoA.

The wing kinematics under sinusoidal input are shown in Fig. 6. As seen from the figure, a larger stroke amplitude on the right wing corresponds to negative roll torque in the force measurement, a positive bias in stroke angle correlates with the positive pitch trim, and the difference in the rotation angle limits between the two wings results in a net yaw torque.

Fig. 6: A sample wing kinematics comparison of left and right wings between the robot and simulation with input.

III-C Open Loop State Transition

Fig. 7: The normalized state transition error between simulation and real robot is shown. The averaged absolute error after 1 wingbeat of each state variable is listed at the bottom.

For continuous control, the behavior of a vehicle can be viewed as a Markov decision process (MDP) with state space and action space . For the simulation to be statistically meaningful, we need to evaluate whether the open loop state transition dynamics of the simulation match those of the real vehicle. The state of the vehicle is and the action is . A total of 20 open loop flights were conducted. To avoid ground effect, only data points with an altitude of at least five wing chord lengths were used. A total of 2500 valid samples were collected.

To evaluate the simulation state transition, we use each sample from the flight data as the initial state, run the simulation with the recorded input, and compare the state values with measurements after a given time. The averaged result is shown in Fig. 7 for each state. The error is normalized by the maximum range within one wingbeat across the 2500 samples collected for each state. The error shows that the simulation can accurately capture the state transition within one wingbeat with less than 5% error, where the Euler angle error is smaller than and the position error is about . The state transition error is still acceptable after 2 wingbeats, with only pitch and velocity showing larger error. This is expected as pitch and direction correspond to the severe body vibration caused by the cyclic aerodynamic forces.
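The normalization step can be sketched as follows, assuming each sample is a list of state values and the per-state reference range (the maximum range within one wingbeat) has already been computed. The function name, data layout, and toy numbers are assumptions for illustration.

```python
def normalized_transition_error(simulated, measured, reference_ranges):
    """Mean absolute error per state, divided by that state's reference range."""
    n_samples = len(simulated)
    n_states = len(reference_ranges)
    abs_error_sums = [0.0] * n_states
    for sim_state, real_state in zip(simulated, measured):
        for k in range(n_states):
            abs_error_sums[k] += abs(sim_state[k] - real_state[k])
    return [abs_error_sums[k] / n_samples / reference_ranges[k] for k in range(n_states)]


# Toy example with two samples and two state variables.
sim_states = [[1.0, 10.0], [2.0, 20.0]]
real_states = [[1.1, 10.0], [1.9, 22.0]]
errors = normalized_transition_error(sim_states, real_states, [2.0, 40.0])
```

Dividing by a per-state range puts angles, positions, and velocities on a common dimensionless scale, so a single threshold such as 5% can be applied to all of them.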

IV Flight Control Baselines with Experimental Validation

IV-A Closed Loop Position Controller

Fig. 8: a,b) These two figures show the body Euler angles and the position of the real (top) and simulated (bottom) robots, respectively. c) Plots of the positions of both vehicles in the inertial frame; the dots indicate the first 8 seconds of the flight. d,e) Composed image sequences of the closed loop controlled flights of the real (left) and simulated (right) robots. The first 8 seconds of the flight are shown, demonstrating direct sim-to-real transfer of the controller and control gains.
Fig. 9: a,b) These two figures demonstrate the body Euler angles and the position of real (left) and simulated (right) robots, respectively. c,d) Composed image sequences of the controlled flights of the real (left) and simulated (right) robots. Direct sim-to-real transfer of the reinforcement learning maneuver policy is shown here.

We now show that the simulation can be used for flight controller design. We constructed a simple PID flight controller based on rigid body dynamics for the FWMAV in the simulation. The controller has a cascaded structure in which the outer loop is a position and velocity PD controller that generates the target attitude, and the inner loop is a PID attitude controller. Heading (yaw) is controlled independently. The simulation ran at 10 kHz. Virtual Vicon and IMU sensors were implemented in the simulation at 150 Hz and 500 Hz respectively, with noise characteristics and delay similar to their physical counterparts. The sensor fusion and control algorithms ran at 500 Hz. We manually tuned the control gains to achieve stable flight. The controller with the tuned gains was then transferred to the robotic FWMAV platform.
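The cascaded structure and the rate relationship between simulation and control can be sketched for a single axis as below. The class and function names, the gains, and the tilt limit are hypothetical; the sketch only mirrors the described layout (outer PD loop producing a target attitude, inner PID loop on attitude, control at 500 Hz inside a 10 kHz simulation).

```python
SIM_RATE_HZ = 10_000
CONTROL_RATE_HZ = 500
STEPS_PER_CONTROL = SIM_RATE_HZ // CONTROL_RATE_HZ  # simulation steps per control update


class PID:
    """Textbook PID on an error signal, with a finite-difference derivative."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def outer_loop_target_attitude(pos_error, vel_error, kp, kd, max_tilt):
    """Outer PD loop: position and velocity errors give a clipped target tilt angle."""
    target = kp * pos_error + kd * vel_error
    return max(-max_tilt, min(max_tilt, target))


def inner_loop_torque(attitude_pid, target_attitude, attitude, dt):
    """Inner PID loop: attitude error gives a torque command."""
    return attitude_pid.update(target_attitude - attitude, dt)


def count_control_updates(n_sim_steps):
    """Number of control updates issued during `n_sim_steps` simulation steps."""
    return sum(1 for step in range(n_sim_steps) if step % STEPS_PER_CONTROL == 0)
```

Running the controller every 20 simulation steps keeps the simulated control loop at the same rate as the onboard one, which is one reason gains tuned in simulation can be carried over unchanged.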

As expected, the closed loop performance of the robotic vehicle is similar to that observed in the simulation. The closed loop control errors of the two are very close, as shown in Table I. Sample flight data from the vehicle and the simulation using the same controller with the same reference input are shown in Fig. 8. The vehicle is commanded to take off and hover at a height of 0.4 m. In Fig. 8 a) and b), each axis exhibits similar behavior and tracking error, indicating that the closed loop dynamics are very similar between simulation and experiment. Both vehicles move to their left during takeoff as a result of the negative roll torque offset. The non-zero pitch angle is due to a small thrust component in the direction. These phenomena further support the accuracy of the simulation and system identification.

RMS error      Roll (deg)   Pitch (deg)   Yaw (deg)   x (mm)   y (mm)   z (mm)
Experimental   1.76         3.95          5.23        34.9     37.2     24.4
Simulation     1.60         3.86          4.79        36.5     38.7     26.9
TABLE I: Closed loop control error

IV-B Goal-directed Maneuvering

To further demonstrate the fidelity of the simulation, we present a reinforcement learning policy transfer for maneuvering flight of the FWMAV. The goal of the flight maneuver is to move from position with yaw heading to with yaw heading , in hope of mimicking the hummingbird’s fast escape maneuver [13].

We use a standard reinforcement learning setup to optimize a maneuvering policy approximated by a standard MLP. The state transition dynamics are the closed-loop dynamics of the vehicle with the feedback controller. The input is the state , and the output in this case is the additional control effort . The reward is selected such that the vehicle receives a positive reward near and with the correct heading.
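A reward of this kind, together with the way a policy output can be added on top of the feedback controller, might look like the sketch below. The sparse reward shape, the tolerances, and all function names are assumptions made for illustration; the actual reward used for training is not specified beyond being positive near the goal with the correct heading.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def goal_reward(position, yaw, goal_position, goal_yaw, pos_tol=0.05, yaw_tol=0.2):
    """Return 1.0 when close to the goal with the right heading, else 0.0."""
    distance = math.dist(position, goal_position)
    yaw_error = abs(wrap_angle(yaw - goal_yaw))
    return 1.0 if distance < pos_tol and yaw_error < yaw_tol else 0.0


def combined_action(feedback_action, policy_action):
    """Add the policy's extra control effort to the feedback controller output."""
    return [f + p for f, p in zip(feedback_action, policy_action)]
```

Wrapping the heading error matters for turn maneuvers, since headings of pi and -pi describe the same orientation and should not be penalized as a full-turn error.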

Since the system is largely deterministic, the popular actor-critic algorithm Deep Deterministic Policy Gradient (DDPG) is selected to train the policy. We use 2 hidden layers of 32 hidden units for the actor network and 2 hidden layers of 64 hidden units for the critic network. The implementation is based on [24] with the same hyperparameters as [25]. Dynamics randomization [21] is used during training, where we slightly randomize the physical parameters of the vehicle to improve the robustness of training and ensure transfer from simulation to the real world. The learning curve averaged over 5 runs with different random seeds is shown in Fig. 10.
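Dynamics randomization can be sketched as resampling the physical parameters around their nominal values at the start of each training episode. The parameter names, the unit nominal values, and the 5% range below are placeholders assumed for illustration; they are not the identified parameters or the ranges used in training.

```python
import random


def randomize_parameters(nominal, relative_range, rng):
    """Scale each nominal parameter by a random factor in [1 - range, 1 + range]."""
    return {
        name: value * (1.0 + rng.uniform(-relative_range, relative_range))
        for name, value in nominal.items()
    }


# Placeholder nominal values (unit scale), for illustration only.
nominal_parameters = {
    "spring_stiffness": 1.0,
    "motor_resistance": 1.0,
    "body_mass": 1.0,
}

rng = random.Random(0)
episode_parameters = randomize_parameters(nominal_parameters, 0.05, rng)
```

Training across many slightly different vehicles discourages the policy from exploiting quirks of one exact parameter set, which is the property that helps it tolerate the residual mismatch between simulation and hardware.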

Fig. 10: Training curve of the maneuver policy averaged over 5 random seeds.

The trajectory produced by the optimized policy is a unique multi-axis fast maneuver that minimizes the travel time. The resulting simulated and real flights are shown in Fig. 9. Details of this study are presented in [26].

V Conclusions

In this work, we developed an open source high fidelity simulation with realistic multi-body dynamics for flapping wing flight. Instantaneous aerodynamics were simulated using blade element theory and a quasi-steady aerodynamic model, which was validated by force measurements. The open loop state transition dynamics of the simulation were validated by calculating the state transition error between the simulation and the vehicle. The error shows that the simulation can accurately predict instantaneous state transitions and capture the dynamic effects. With successful sim-to-real transfer, we demonstrate the fidelity of the simulation in two controller design applications: 1) a linear cascaded PID controller for FWMAV position control, and 2) goal-directed maneuvering of the FWMAV using a policy optimized by reinforcement learning. For both applications, no special treatment was needed in controller implementation. The experimental data match the simulation results, supporting the fidelity of the simulation. With motor and contact dynamics, the current feedback could also be used as tactile sensing to mimic animal somatosensation [27], which could be exploited in simulation for control and trajectory planning design. This open source simulation can serve as a design and flight control testbed for scientists and researchers interested in studying flapping wing animals and robots. The code, baselines, and data will be available online, and experimental support on the robot will be provided.

References

  • [1] C. Ellington, “The aerodynamics of insect flight. ii. morphological parameters,” Phil. Trans. R. Soc. Lond. B, vol. 305, pp. 17–40, 1984.
  • [2] G. De Croon, K. De Clercq, R. Ruijsink, B. Remes, and C. de Wagter, “Design, aerodynamics, and vision-based control of the delfly,” International Journal of Micro Air Vehicles, vol. 1, no. 2, pp. 71–97, 2009.
  • [3] K. Y. Ma, P. Chirarattananon, S. B. Fuller, and R. J. Wood, “Controlled flight of a biologically inspired, insect-scale robot,” Science, vol. 340, no. 6132, pp. 603–607, 2013.
  • [4] M. Keennon, K. Klingebiel, and H. Won, “Development of the nano hummingbird: A tailless flapping wing micro air vehicle,” in 50th AIAA aerospace sciences meeting including the new horizons forum and aerospace exposition, 2012, p. 588.
  • [5] H. V. Phan, T. Kang, and H. C. Park, “Design and stable flight of a 21 g insect-like tailless flapping wing micro air vehicle with angular rates feedback control,” Bioinspiration & biomimetics, vol. 12, no. 3, p. 036006, 2017.
  • [6] A. Roshanbin, H. Altartouri, M. Karásek, and A. Preumont, “Colibri: A hovering flapping twin-wing robot,” International Journal of Micro Air Vehicles, vol. 9, no. 4, pp. 270–282, 2017.
  • [7] J. Zhang, F. Fei, Z. Tu, and X. Deng, “Design optimization and system integration of robotic hummingbird,” in Robotics and Automation (ICRA), 2017 IEEE International Conference on.   IEEE, 2017, pp. 5422–5428.
  • [8] J. Zhang, Z. Tu, F. Fei, and X. Deng, “Geometric flight control of a hovering robotic hummingbird,” in Robotics and Automation (ICRA), 2017 IEEE International Conference on.   IEEE, 2017, pp. 5415–5421.
  • [9] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, “Ros: an open-source robot operating system,” in ICRA workshop on open source software, vol. 3, no. 3.2.   Kobe, Japan, 2009, p. 5.
  • [10] E. Todorov, T. Erez, and Y. Tassa, “Mujoco: A physics engine for model-based control,” in Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on.   IEEE, 2012, pp. 5026–5033.
  • [11] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “Openai gym,” arXiv preprint arXiv:1606.01540, 2016.
  • [12] N. P. Koenig and A. Howard, “Design and use paradigms for gazebo, an open-source multi-robot simulator.” in IROS, vol. 4.   Citeseer, 2004, pp. 2149–2154.
  • [13] B. Cheng, B. W. Tobalske, D. R. Powers, T. L. Hedrick, S. M. Wethington, G. T. Chiu, and X. Deng, “Flight mechanics and control of escape manoeuvres in hummingbirds i. flight kinematics,” Journal of Experimental Biology, pp. jeb–137 539, 2016.
  • [14] J. Lee, M. X. Grey, S. Ha, T. Kunz, S. Jain, Y. Ye, S. S. Srinivasa, M. Stilman, and C. K. Liu, “Dart: Dynamic animation and robotics toolkit,” The Journal of Open Source Software, vol. 3, no. 22, p. 500, 2018.
  • [15] D. Pickem, P. Glotfelter, L. Wang, M. Mote, A. Ames, E. Feron, and M. Egerstedt, “The robotarium: A remotely accessible swarm robotics research testbed,” in Robotics and Automation (ICRA), 2017 IEEE International Conference on.   IEEE, 2017, pp. 1699–1706.
  • [16] L. Paull, J. Tani, H. Ahn, J. Alonso-Mora, L. Carlone, M. Cap, Y. F. Chen, C. Choi, J. Dusek, Y. Fang et al., “Duckietown: an open, inexpensive and flexible platform for autonomy education and research,” in Robotics and Automation (ICRA), 2017 IEEE International Conference on.   IEEE, 2017, pp. 1497–1504.
  • [17] D. B. Doman, M. W. Oppenheimer, and D. O. Sigthorsson, “Wingbeat shape modulation for flapping-wing micro-air-vehicle control during hover,” Journal of guidance, control, and dynamics, vol. 33, no. 3, pp. 724–739, 2010.
  • [18] J. P. Whitney and R. J. Wood, “Aeromechanics of passive rotation in flapping flight,” Journal of fluid mechanics, vol. 660, pp. 197–220, 2010.
  • [19] W. Dickson, A. Straw, C. Poelma, and M. Dickinson, “An integrative model of insect flight control,” in 44th AIAA Aerospace Sciences Meeting and Exhibit, 2006, p. 34.
  • [20] P. Chirarattananon, K. Y. Ma, and R. J. Wood, “Adaptive control of a millimeter-scale flapping-wing robot,” Bioinspiration & biomimetics, vol. 9, no. 2, p. 025004, 2014.
  • [21] X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Sim-to-real transfer of robotic control with dynamics randomization,” in 2018 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2018, pp. 1–8.
  • [22] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain randomization for transferring deep neural networks from simulation to the real world,” in Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference on.   IEEE, 2017, pp. 23–30.
  • [23] T. Hedrick, MATLAB tools for digitizing video files and calibrating cameras. [Online]. Available: http://www.unc.edu/~thedrick/software1.html
  • [24] Y. Duan, X. Chen, R. Houthooft, J. Schulman, and P. Abbeel, “Benchmarking deep reinforcement learning for continuous control,” in International Conference on Machine Learning, 2016, pp. 1329–1338.
  • [25] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, 2015.
  • [26] F. Fei, Z. Tu, J. Zhang, and X. Deng, “Learning extreme hummingbird maneuvers on flapping wing robots,” in 2019 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2019.
  • [27] Z. Tu, F. Fei, J. Zhang, and X. Deng, “Acting is seeing: Navigating tight space using flapping wings,” in 2019 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2019.