A Study on the Challenges of Using Robotics Simulators for Testing

04/15/2020 ∙ by Afsoon Afzal, et al. ∙ Carnegie Mellon University

Robotics simulation plays an important role in the design, development, and verification and validation of robotic systems. Recent studies have shown that simulation may be used as a cheaper, safer, and more reliable alternative to the manual, widely used process of field testing. This is particularly important in the context of continuous integration pipelines, where integrated automated testing is key to reducing costs while maintaining system safety. However, simulation and automated testing are not seeing the degree of widespread adoption in practice that their potential would motivate. Our goal in this paper is to develop a principled understanding of the ways developers use simulation in their process, and the challenges they face in doing so. This type of understanding can guide the development of more effective simulators and testing techniques for modern robotics development. To that end, we conduct a survey of 82 robotics developers from a diversity of backgrounds that addresses the current capabilities and limits of simulation technology in practice. We find that simulation is used by almost 85% of participants for testing, and that many participants desire to use simulation as part of their test automation. We identify 10 high-level challenges that impede developers from using simulation in general, for manual testing, and for automated testing. These challenges include the gap between simulation and reality, a lack of reproducibility, and considerable resource costs associated with using simulators. Finally, we outline avenues for improvement in the development of new simulators that can help simulation reach its potential as a means of verification and validation.




I Introduction

Robotics simulators are invaluable tools that allow developers to rapidly and inexpensively design, prototype, and test robots in a controlled environment without the need for physical hardware. Popular simulators, such as Gazebo [1], V-REP [2], and Webots [3], have been used to simulate a variety of systems including industrial robots, unmanned aerial vehicles, and autonomous (self-driving) cars.

Simulation is particularly promising for verification and validation (V&V) of robotic systems, potentially providing an automated, cost-effective, and scalable alternative to the manual and expensive process of field testing [4, 5, 6, 7]. Simulation can effectively and automatically discover bugs in a variety of robot application domains [8, 9, 10, 11]. Numerous companies involved in the autonomy sector, such as Uber [12], NVIDIA [13], and Waymo [14], use simulation on a large scale to develop, train, and test their algorithms. The high demand for simulation in this sector has led to the development of a new generation of specialized simulators, such as CARLA [15], LGSVL [16], AirSim [17], and AADS [18].

However, simulation-based testing does not dominate robotics V&V to a degree commensurate with its potential. Instead, field testing remains the predominant means of V&V for robotic systems [19, 20]. Studies have compared robotics simulators on aspects such as their features, usability, performance, documentation, and graphical user interface (GUI) [21, 22]. However, these studies do not answer the question of why simulation is not more widely adopted as a core V&V practice, or what challenges developers face when using it. Indeed, prior work studying the challenges of testing in robotics [20] and in cyber-physical systems (CPSs) [19] more generally identifies simulation as a key element of robotics/CPS testing that requires improvement.

Our goal in this paper is to develop a grounded understanding of the ways developers use simulation in their process and the challenges they face in doing so. This type of understanding can guide the development of more effective simulators and testing techniques for modern robotics development that are better suited to developer needs and that can ultimately result in higher quality robots.

To this end, we conduct a study of robotics developers to understand how they perceive simulation-based testing, and what challenges they face while using simulators. Our survey with 82 participants confirms that simulation is a popular tool among robotics developers and that testing is its most common use case. From our participants’ responses, we identified 10 challenges that make it difficult for developers to use simulation in general, for testing, and specifically for automated testing. The general challenges include the difficulties of learning and using simulators, the lack of realism, and the absence of specific capabilities, which constrain the way developers use simulation. The challenges that limit the extent of simulation-based testing include a lack of reproducibility, the complexities of scenario and environment construction, and considerable resource costs. Finally, the absence of automation features, a lack of reliability, and API instability discourage developers from using simulation for test automation and prevent developers from realizing the benefits of continuous integration. We believe that the results of this study can inform the construction of a new generation of software-based simulators, designed to better accommodate the needs of developers that arise during robotics testing.

Overall, we make the following contributions:

  • We conduct a study of 82 robotics developers from a variety of organizations and with diverse levels of experience in robotics.

  • We find that developers are using simulation extensively for testing their robots and that many developers want to incorporate simulation into their test automation.

  • We identify and explore ten key challenges that impede or prevent developers from using simulation in general and for manual and automated testing.

  • We suggest opportunities for improvement that may address the identified challenges.

  • We provide our survey materials and additional results to allow the community to build on our research.

II Methodology

In this study, we aim to better understand the ways in which robotics developers use simulation as part of their testing process, and the challenges they face in doing so, by addressing the following research questions:


RQ1: What challenges do developers face when using simulation in general?

RQ2: What challenges do developers face when using simulation for testing?

RQ3: What challenges do developers face when using simulation for test automation?

To answer these questions, we conducted an open-ended online survey of robotics developers in November 2019.

To reach our intended audience (i.e., robotics developers), we distributed our survey via social media outlets, email, and several popular forums within the community. We posted to the ROS and Robotics subreddits on Reddit (https://reddit.com), the ROS Discourse (https://discourse.ros.org), and the RoboCup forums (http://lists.robocup.org/cgi-bin/mailman/listinfo). We also advertised our survey on Facebook and Twitter, and posted a recruitment email to the Carnegie Mellon Robotics Institute and National Robotics Engineering Center mailing lists.

In total, 151 participants took the survey, out of which 82 completed it. For the purpose of analysis, we only consider the 82 completed responses. All 82 participants that completed the survey reported that they had used a robotics simulator. Figure 1 provides an overview of the demographics of the 82 participants that completed the survey. In terms of experience, more than two thirds of participants (71.95%) reported having worked with robotics software for more than three years. Most participants (79.27%) reported that they had worked with robotics in academia at some point during their life, and almost two thirds (65.85%) reported working with robotics in industry at some point. Participants report that they currently work at organizations of varying sizes. Overall, our study sample is composed of a diverse array of candidates with differing levels of experience that have worked in a variety of organizations, thus ensuring that the results of the study are not limited to any one population.

Experience
Years of experience | # | %
Less than one year | 10 | 12.20%
Between one and three years | 13 | 15.85%
Between three and ten years | 40 | 48.78%
More than ten years | 19 | 23.17%

Organization
Type | # | %
Academia | 65 | 79.27%
Industry | 54 | 65.85%
Individual | 35 | 42.68%
Government | 12 | 14.63%
Other | 9 | 10.98%

Size of organization
Number of people | # | %
1–10 people | 22 | 26.83%
11–50 people | 23 | 28.05%
51–100 people | 9 | 10.98%
More than 100 people | 28 | 34.15%

Fig. 1: Demographics for the 82 survey participants that completed the survey in terms of their experience, the types of organization at which they had worked, and the size of the most recent organization to which they belonged.

To analyze the open-ended survey responses, we used descriptive coding [23] to assign one or more short labels to each segment of data, identifying the topic(s) of that segment. After developing an initial set of codes, we conducted a process of adjudication to reach consistency and agreement, before using code mapping to organize the codes into larger categories [23, 24, 25]. Finally, we used axial coding to examine relationships between categories and to identify a small number of overarching research themes.

To facilitate data reuse and to aid others in the reproduction of our results, we share the recruitment materials, questionnaire, and codebook for our study at the following URL: https://bit.ly/2wRuEFP. We provide access to anonymized survey responses upon request.

III Results

Fig. 2: An overview of the high-level reasons that participants gave for using simulation (82 responses).

Applies to | Challenge | Description | Representative quote
General | Reality gap | The simulator does not sufficiently replicate the real-world behavior of the robot to a degree that is useful. | “[Simulation is n]ot realistic enough for accurately modeling task; preferred running on real robot” – P33
General | Complexity | The time and resources required to setup a sufficiently accurate, useful simulator could be better spent on other activities. | “It was easier and more accurate to setup and test on a physical system than simulate” – P4
General | Lacking capabilities | Simulators may not possess all of the capabilities that users desire, or those simulators that do may be prohibitively expensive. | “[M]ost simulators are good at one thing, some are good at simulating the vehicles (drone,robot,car,etc) some are good at simulating the environment (good for generating synthetic data) some are good at senors, some are good at physics, some are good at pid control, etc. but not one has all these attributes.” – P77
Testing | Reproducibility | Simulations are non-deterministic, making it difficult to repeat simulations, recreate issues encountered in simulation or on real hardware, and track down problems. | “Deterministic execution: the same starting conditions must produce absolutely identical results.” – P42
Testing | Scenario and environment construction | It is difficult to create the scenarios and environments required for testing the system in simulation. | “Setting up a simulation environment is too much work, so I don’t do it often.” – P38
Testing | Resource costs | The computational overhead of simulation requires special hardware and computing resources which adds to the financial cost of testing. | “Simulating multiple cameras (vision sensors) with full resolution at a high frame rate is usually very slow and therefore not practical.” – P37
Test automation | Automation features | The simulator is not designed to be used for automated testing and does not allow headless, scripted or parallel execution. | “Most simulations are NOT designed to run headless, nor are they easily scriptable for automatic invocation.” – P34
Test automation | Continuous integration | It is difficult to deploy the simulator in suitable environments for continuous integration (e.g., cloud computing servers). | “The simulation requires some computational resources which can be difficult to be part of CI, especially when our CI is running on the cloud” – P62
Test automation | Simulator reliability | The simulation is not reliable enough to be used in test automation in terms of the stability of the simulator software, and the timing and synchronization issues introduced by the simulator. | “There were many challenges - 1. Getting difference in the real time and simulation time 2. Changing the entire physics engine source code for our application 3. Glitch during the process of trying to move the real hardware with the simulation model.” – P80
Test automation | Interface stability | The simulator’s interface is not stable enough or sufficiently well-documented to work with existing code or testing pipelines. | “[We have automation difficulties with] integration into existing code, missing APIs, stability of libraries” – P28

Fig. 3: Summary of challenges participants encountered when using simulation in general, specifically for testing, and for test automation.

Our survey asked participants broadly about their use of simulation. We find that all of our participants are familiar with simulation and use it on a regular basis for a variety of important purposes: 59 out of 82 (71.95%) participants reported that they used simulation within the last month at the time of completing the survey. When asked about their most recent project that involved simulation, 51 of 82 (62.20%) participants reported that they used a simulator daily, and 25 of 82 (30.49%) participants reported that they used a simulator on a weekly basis.

Figure 2 presents the variety and popularity of purposes for which our participants use simulation. Almost 85% of participants have used simulation for testing, and testing is the most popular use case for simulation. When asked for details on how they use simulation for testing, participants reported using it for testing algorithms, variability testing, component testing, sanity checking, and multi-robot testing. This finding suggests that developers generally see value in using simulation for testing.

Participants also reported that they use simulation for testing when it is unsuitable or impractical to test on real hardware or in a real environment. They reported using simulation to better understand the design and behavior of existing robotic systems and their associated software, and to incorporate simulation into automated robotics testing, including continuous integration (CI).

Of the 85% of participants who have used simulation for testing, we find that roughly 60% of them also have tried to use simulation as part of their test automation. These findings demonstrate that developers find simulation to be a valuable tool for testing, and there is a desire to incorporate simulation-based testing into their test automation processes.
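The proportions quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 69 participants used as the denominator in Section III-C are the ones who reported using simulation for testing; that reading is inferred from the reported percentages and is not a count stated directly in the text. The variable names are illustrative only.

```python
def percent(part, whole):
    """Return `part` as a percentage of `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


completed = 82             # participants who completed the survey
used_for_testing = 69      # assumed: denominator reported in Section III-C
attempted_automation = 42  # attempted simulation-based test automation
never_attempted = 27       # never attempted it
faced_difficulties = 33    # of those who attempted, reported difficulties

# Share of all participants who used simulation for testing ("almost 85%").
print(percent(used_for_testing, completed))
# Share of those who also tried test automation ("roughly 60%").
print(percent(attempted_automation, used_for_testing))
# Share who never attempted test automation.
print(percent(never_attempted, used_for_testing))
# Share of attempters who faced difficulties.
print(percent(faced_difficulties, attempted_automation))
```

Under that assumption the two groups in Section III-C (27 and 42) add up to 69, which is consistent with the "almost 85%" and "roughly 60%" figures above.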

Given the ubiquity of simulation and its importance to robotics testing and development, it is vital that we, as a community, understand the barriers that developers face when using simulation. By bringing these barriers to the attention of the community, we can work to lower those barriers and empower developers, bringing us closer to the potential of simulation, and advancing the state of robotics software development and quality assurance.

Key Insight: Simulation is an essential tool for developers that is used extensively for building and testing robot software. Given its importance, it is vital that we better understand the challenges that prevent developers from realizing its full potential.

In the following sections, we present the challenges of using simulators, given in Figure 3. Section III-A discusses challenges that apply broadly to many uses of simulation, Section III-B narrows the focus to those challenges that apply when simulation is used for testing, and Section III-C further narrows to the challenges specific to test automation.

III-A RQ1: What challenges do developers face when using simulation in general?

Reason for not using simulation | # | %
Lack of time or resources | 15 | 53.57%
Not realistic/accurate enough | 15 | 53.57%
Lack of expertise or knowledge on how to use software-based simulation | 6 | 21.43%
There was no simulator for the robot | 4 | 14.29%
Not applicable | 4 | 14.29%
Too much time or compute resources | 2 | 7.14%
Nobody suggested it | 0 | 0.00%
Other | 2 | 7.14%

Fig. 4: An overview of the reasons that participants gave for not using simulation for a particular project, based on 28 responses.

Although we find that simulation is popular among developers, 28 of 82 (34.15%) participants reported making a decision to not use simulation for a project for a variety of reasons, given in Figure 4. By analyzing both these reasons and the difficulties that participants experienced when they did use simulation, we identified three high-level challenges of using simulation in general, discussed below.


Reality gap:

A number of participants cited an inadequate representation of physical reality (i.e., the reality gap) as both a challenge when trying to use simulation, and a reason not to use it in the first place. P33 notes that simulation can produce unrealistic behaviors that would not occur in the real world. P16 highlighted that accounting for all relevant physical phenomena can also be challenging: “my simple simulation model did not include a tire model, so simulations at higher speeds did not account for realistic behaviors for cornering or higher accelerations or deceleration.” In particular, realistically modeling stochastic processes (e.g., signal noise) and integrating those models into the simulation as a whole is a challenge: P15 shared, “A classic problem is integrating wireless network simulation with physical terrain simulation. This also applies to GPS signal simulation, as well.”

For some, such as P29, the reality gap can be too large to make simulation valuable: “too big discrepancy between simulation results and reality (physical interaction).” For others, perhaps many, simulation can still serve as a valuable tool despite the existence of the reality gap. As P36 puts it, “Software behavior in sim is different compared to real, so not everything can be tested, but a lot can be.”


Complexity:

Accurate simulation of the physical world is an inherently challenging process that naturally involves a composition of various models. Alongside the essential complexity of simulation are sources of accidental complexity [26] that do not relate to the fundamental challenges of simulation itself, but rather the engineering difficulties that developers face when trying to use simulation. These sources of accidental complexity may ultimately lead users to abandon or not use simulation at all. Inaccurate, inadequate, or missing documentation can make it difficult to learn and use a simulator, as P22 highlighted: “Lack of documents for different platform types and sometimes wrong documentation makes us lose a lot of time working on [stuff] that will never work, for example, the Gazebo simulator does not work well in Windows.” In some cases, documentation may be written in another language; P74 stopped using simulation for a project for that reason: “The language was Japanes[e], but we don’t speak that language so we couldn’t use well the simulator.”

Hard-to-use APIs make it difficult to extend the simulator with new plugins, and a lack of support for industry-standard 3D modeling formats in widely used simulators such as Gazebo makes creating models a tedious and error-prone process:

“Gazebo is the de-facto [simulator] right now and is poorly documented and difficult to customize to any degree.” – P4

Together, these sources of complexity increase the learning curve of many simulators and may lead developers to abandon or not use them in the first place. P20 said, “Steep learning curve in understanding the test environment software setup and libraries. Without a good software engineering skills the simulated environment will not replicate the real environment.”

Lacking capabilities:

Finding a simulator that provides all of the characteristics a user desires can be challenging. P77 highlighted that while it is possible to find a simulator that is good in one particular aspect, it is hard to find a simulator that is good in all desired aspects.

As P4 pointed out, simulators that do possess all of the desired qualities also tend to be very expensive: “Adding plugins is usually very challenging, and the only good frameworks that do any of this stuff well are very expensive (V-Rep and Mujoco for example).”

We asked which simulation features participants desired most but are unable to use in their current setups. Among the most important features they mentioned were the ability to simulate at faster-than-real-time speeds, native support for headless execution (discussed in Section III-C), and an easier means of constructing environments and scenarios (discussed in Section III-B).

Numerous participants desired the ability to run simulation at faster-than-real-time speeds but were unable to do so in their current simulation setups. For example, P52 said, “W[e] needed to speed up simulation time, but that was difficult to achieve without breaking the stability of the physics engine.” The ability to run at faster-than-real-time speeds is useful not only for reducing the wall-clock time taken to perform testing, but for other purposes, as P62 highlighted:

“Faster than real time is really important to produce training data for deep learning.”

Several participants also desired particular features that would increase the overall fidelity of the simulation. P46 wanted support for “Advanced materials in environments (custom fluids, deformable containers, etc.).” Interestingly, P69 desired the ability to tune the fidelity of the simulation: “Ability for controllable physics fidelity. First order to prove concepts then higher fidelity for validation. Gazebo doesn’t have that.” Recent studies have shown that low-fidelity simulation can be used as an effective and inexpensive way of discovering many bugs in a resource-limited environment [4, 5, 6].
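The tension P52 describes between simulation speed and physics stability can be illustrated with a toy example. The sketch below is not how Gazebo or any particular physics engine works; it assumes a single undamped spring integrated with a fixed-step semi-implicit Euler scheme, and the function name and parameters are hypothetical. A larger time step means fewer steps per simulated second (and so less wall-clock time), but for this scheme the solution stays bounded only while the product of the natural frequency and the step size is below 2.

```python
def simulate_spring(dt, duration=50.0, omega=1.0):
    """Integrate x'' = -omega^2 * x with semi-implicit Euler.

    Returns the largest |x| seen, starting from x = 1, v = 0.
    """
    x, v = 1.0, 0.0
    peak = abs(x)
    for _ in range(int(duration / dt)):
        v -= omega * omega * x * dt  # update velocity from the spring force
        x += v * dt                  # then position, using the new velocity
        peak = max(peak, abs(x))
    return peak


if __name__ == "__main__":
    # Small step: many iterations, amplitude stays close to 1.
    print("dt=0.01 peak:", simulate_spring(0.01))
    # Large step: only a handful of iterations, but the result diverges.
    print("dt=2.5  peak:", simulate_spring(2.5))
```

The same trade-off is one way to read P69's request for controllable fidelity: a coarse step may be adequate for a first proof of concept, while validation needs a finer one.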

Other capabilities specified by participants include native support for multi-robot simulation and large environments, and the ability to efficiently distribute a simulation session across multiple machines.

Ultimately, the complexities of setting up and using simulation, the reality gap, and the time and resources necessary to make the simulation useful led some participants to use physical hardware instead. As P4 said, “It was easier and more accurate to setup and test on a physical system than simulate.”

Key Insight: Developers find considerable value in simulation, but difficulties of learning and using simulators, combined with a lack of realism and specific capabilities, constrain the way that developers use simulation. By alleviating these challenges, simulation can be used for a wider set of domains and applications.

III-B RQ2: What challenges do developers face when using simulation for testing?

Participants reported a variety of challenges in attempts to use simulation for testing, summarized in Figure 3. We identified the following challenges that mainly affect the use of simulation for testing:



Reproducibility:

The lack of reproducibility and presence of non-determinism in simulators lead to difficulties when testing, as reported by participants. P42 highlighted that a “Lack of deterministic execution of simulators leads to unrepeatable results.” This points to a need to accurately reproduce system failures that are discovered in testing, in order to diagnose and debug those failures. If a tester cannot consistently reproduce the failures detected in simulation, it will be difficult to know whether changes made to the code have fixed the problems. P7 pointed to the particular difficulty with achieving reproducibility in Gazebo: “Resetting gazebo simulations was not repeatable enough to get good data.” P48 and P81 also mentioned a desire for reproducibility.

Consistent and systematic testing procedures rely on deterministic test outcomes. This is particularly the case when incorporating test automation and continuous integration tests, which rely on automatically detecting when a test has failed, as a sign that there is a problem with software changes. Flaky [27] and non-deterministic tests may lead to the false conclusion that a problematic software change does not have a problem (a false negative) or that a good change has problems (a false positive).
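A simple way to picture the determinism that participants ask for is a check that repeated runs from the same starting conditions produce identical results. The sketch below is a toy stand-in, not a real simulator interface: `simulate`, `unseeded`, and `is_reproducible` are hypothetical names, and the "robot" is a one-dimensional point with noisy velocity. It assumes that all randomness flows through a single seeded generator, which real simulators may not guarantee.

```python
import random


def simulate(seed, steps=100, dt=0.1):
    """Toy 1-D robot with noisy velocity; returns the position trajectory."""
    rng = random.Random(seed)  # private generator, no hidden global state
    x, trajectory = 0.0, []
    for _ in range(steps):
        velocity = 1.0 + rng.gauss(0.0, 0.2)  # nominal speed plus noise
        x += velocity * dt
        trajectory.append(x)
    return trajectory


def unseeded(_seed):
    """Same model, but the seed is ignored, as in a non-deterministic run."""
    return simulate(None)  # Random(None) seeds itself from the OS


def is_reproducible(run, seed, repeats=5):
    """Return True if repeated runs with the same seed match exactly."""
    reference = run(seed)
    return all(run(seed) == reference for _ in range(repeats - 1))


if __name__ == "__main__":
    print("seeded:  ", is_reproducible(simulate, seed=42))
    print("unseeded:", is_reproducible(unseeded, seed=42))
```

A check of this kind could run before a test suite is trusted in CI, so that a failing test can be attributed to the software change and not to run-to-run variation.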

Scenario and environment construction:

Testing in simulation requires a simulated environment and a test scenario. Participants reported difficulty in constructing both test scenarios and environments. P38 said: “Setting up a simulation environment is too much work, so I don’t do it often,” and P3 contributed, “Scripting scenarios was not easy. Adding different robot dynamics was also not easy.” They wanted to be able to construct these more easily or automatically. Participants pointed out that the scenarios or environments they require sometimes must be created “by hand,” which requires a heavy time investment and is subject to inaccuracies. P4 said, “Making URDF files is a tremendous pain as the only good way to do it right now is by hand which is faulty and error prone,” while P67 wanted, “Automated generation of simulation environments under some [custom] defined standards,” because “The automated simulation environment generation is not easy. Plenty of handy work must be done by human operators.”
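The kind of automated scenario generation that P67 asks for can be sketched, in its simplest form, as enumerating combinations of scenario parameters. The example below is illustrative only: the parameter names and values are hypothetical, the output is a plain dictionary and not any simulator's scenario format, and turning such a description into an actual world file would still require simulator-specific work.

```python
import itertools


def generate_scenarios(parameter_grid, base_seed=0):
    """Yield one scenario dict per combination of parameter values."""
    names = sorted(parameter_grid)  # fixed order keeps output deterministic
    value_lists = [parameter_grid[name] for name in names]
    for index, values in enumerate(itertools.product(*value_lists)):
        scenario = dict(zip(names, values))
        scenario["seed"] = base_seed + index  # distinct, repeatable seed
        yield scenario


# Hypothetical parameters, for illustration only.
grid = {
    "obstacle_count": [0, 5, 20],
    "floor_friction": [0.4, 0.8],
    "start_heading_deg": [0, 90],
}
scenarios = list(generate_scenarios(grid))

if __name__ == "__main__":
    print(len(scenarios), "scenarios generated")
    print(scenarios[0])
```

Giving each scenario its own fixed seed keeps generated tests repeatable, which ties this challenge back to the reproducibility concerns above.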

Resource costs:

Simulation is computationally intensive. It often benefits from specialized hardware, such as GPUs. Participants report that these hardware requirements contribute strongly to the expense of simulation. These costs are compounded when tests are run multiple times, such as in test automation. For example, P42 reported that difficulties in using simulation as a part of test automation include: “High hardware requirements (especially GPU-accelerated simulators) driving high cloud server costs.” Participants reported difficulties with running simulations in parallel or taking advantage of distributed computing across several machines. Participants also reported challenges in simulating large environments and simulations of long duration, as they became too resource demanding to be practical. P67 requested, “High computational performance when the environment size grows large (Gazebo performance drops down rapidly when the number of models raises).” Participants also had issues relating to the cost of obtaining licenses for appropriate simulators. P66 reported that cost drove the choice not to use a particular simulator: “Back then, Webots was not free,” and P1 complained: “Not to mention the licensing price for small companies.”

Key Insight: Almost 85% of participants used simulation for testing, but a lack of reproducibility, the complexities of scenario and environment construction, and considerable resource costs limit the extent of such testing.

III-C RQ3: What challenges do developers face when using simulation for test automation?

Research has shown that test automation can provide many benefits, including cost savings and higher software quality [28]. Despite the benefits of test automation, 27 of 69 (39.13%) participants reported never attempting to use simulation for this purpose. Responses indicated that the challenges with using simulation, both in general and for testing, prevented participants from attempting to incorporate it into test automation. Their reasons fell into three general categories:

  1. Lack of value, where they did not find test automation valuable or necessary. As P24 mentioned, “There were no obvious test harnesses available for the simulation environments I use and it did not seem obviously valuable enough to implement myself.”

  2. Distrust of simulation, where they found the limitations of simulation to be too restrictive to be used in test automation. The reality gap and lacking capabilities discussed in Section III-A contribute to this belief. P33 mentioned, “[Simulation is] not realistic enough for accurately modeling task; preferred running on real robot,” and P20 believed, “Without a good software engineering skills the simulated environment will not replicate the real environment.”

  3. Time and resource limitations, where the complexity of the simulator (Section III-A) and resource costs (Section III-B) prevented them from attempting test automation. P14 explained, “I didn’t think of it most probably because I hadn’t seen an example where software-based simulation was used for automated testing,” and P17 simply reported, “seemed very hard to do.”

Among 42 people who have attempted using simulation as part of their test automation, 33 (78.57%) reported facing difficulties. Based on their descriptions of these difficulties, we identified the following four challenges specifically affecting test automation:


Automation features:

Although a GUI is an important component of a simulator, participants reported a preference towards running the simulator headless (i.e., without the GUI) when used for test automation. Disabling the GUI eliminates the computational overhead of the simulator caused by rendering heavy graphical models. Not being able to run the simulator headless is one of the major difficulties our participants face for automation.

“Making the simulator run without GUI on our Jenkins server turned out to be more difficult than expected. We ended up having to connect a physical display to the server machine in order to run the simulation properly.” – P37 (Jenkins is a continuous integration service: https://jenkins.io)

Furthermore, the ability to set up, monitor, and interact with the simulation via scripting, without the need for manual intervention, is vital for automation. Our participants reported the need to devise creative solutions in the absence of support for scripting. P8 shared, “Ursim needs click-automation to run without human interaction,” where they used a click-automation tool to be able to run the simulator automatically.

Continuous integration:

Continuous integration (CI) is emerging as one of the most successful techniques in automated software maintenance. CI systems are used to automate the building, testing, and deployment of software. Research has shown that CI practices have a positive effect on software quality and productivity [29].

CI, by definition, is an automated method, and in many cases involves the use of cloud services such as TravisCI (https://travis-ci.org). Our participants faced difficulties engineering the simulation to be used in CI and run on cloud servers. For example, P66 mentioned, “I wasn’t able to setup a CI pipeline which runs in GPU machines, for use with rendering sensors.”

Many of these difficulties arise from lacking automation features (e.g., headless execution) and high resource costs (e.g., requiring expensive, GPU-heavy hardware), discussed earlier as challenges. P77 reported “It is expensive to spin up cloud GPU VMs to run the simulator.”

Simulator reliability:

One of the challenges of using a simulator in a test automation pipeline is the reliability of the simulator itself. Participants reported facing unexpected crashes, as well as timing and synchronization issues, while using the simulator in automation. P29, P54, P73, and P80 all reported software stability and timing issues as difficulties they faced for automation. P29 further described the difficulty of ensuring a clean termination of the simulator. That is, when the simulator crashes, it should properly store the logs and results before termination of the simulation, and properly kill all processes to prevent resource leaks. Clean termination is particularly relevant to test automation because resource leaks may compound when simulations are repeated, up to the point where they interfere with the ability to run additional simulations and require manual intervention.
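On the test-harness side, one common way to guard against leaked processes is to launch each simulation in its own process group and signal the whole group when the run ends or times out. The sketch below is a minimal POSIX-only illustration built on Python's standard `subprocess`, `os`, and `signal` modules; it is an assumption about how a harness might be written, not a description of what any participant did. `run_with_clean_termination` is a hypothetical helper, and a sleeping Python process stands in for the simulator.

```python
import os
import signal
import subprocess
import sys


def run_with_clean_termination(cmd, timeout_s, grace_s=5.0):
    """Run cmd in its own process group and clean the group up afterwards.

    Returns (exit_code, timed_out). POSIX only.
    """
    # start_new_session=True makes the child a process-group leader, so any
    # helper processes it spawns can be signalled together with it.
    proc = subprocess.Popen(cmd, start_new_session=True)
    timed_out = False
    try:
        proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        timed_out = True
        # Ask politely first so the process can flush logs and results.
        os.killpg(proc.pid, signal.SIGTERM)
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            pass  # fall through to the forced cleanup below
    finally:
        # Force-kill anything left in the group to avoid leaking resources.
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except (ProcessLookupError, PermissionError):
            pass  # the group is already gone
        proc.wait()
    return proc.returncode, timed_out


if __name__ == "__main__":
    # Stand-in for a simulator that hangs: a process that sleeps for a minute.
    stand_in = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_clean_termination(stand_in, timeout_s=1.0))
```

Sending a termination request before the forced kill gives a well-behaved simulator the chance to store its logs and results, which is the behavior P29 describes.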

Interface stability:

The stability of the simulator interface can have a significant impact on the automation process because inconsistent simulator APIs can lead to failures in client applications [30]. Our participants reported unstable and fragile interfaces as a challenge for automation. For example, P39 mentioned “APIs are pretty fragile and a lot of engineering need to be done to get it working.”

Five participants reported difficulties in integrating existing code or infrastructure with simulation APIs. P80 mentioned changing the entire physics engine source code for their application. More specifically, participants desired better integration of simulators with ROS. For example, P74 shared “I would like that [the simulator] can be use with ROS.”

Key Insight: Developers desire to include simulation as part of their test automation, but most developers that attempt to do so face numerous difficulties. These difficulties include an absence of automation features, and a lack of reliability and API stability. Ultimately, these challenges discourage developers from using simulation for test automation, limit the extent to which it is used, and prevent developers from leveraging the benefits of continuous integration.

IV Discussion

Despite the myriad benefits of using simulation for testing, many popular simulators do not appear to make suitable accommodations for that use case. For example, participants report that Gazebo, the de facto simulator for the Robot Operating System [31], does not adequately support headless execution, lacks reproducibility, and performs poorly when used to simulate complex and large environments.

As robots and their associated codebases become larger and more complex, the need for, and cost of, a continuous process of verification and validation will increase considerably. The popular but expensive practice of using field testing to assure correctness will be unable to handle these increased needs by itself due to practical limits on hardware, human resources, and safety [20]. Simulation-based testing may serve as a cheaper, safer, and more reliable alternative, provided that the challenges identified in this paper are addressed.

To achieve this potential: (1) simulators should be made easier to use for both basic and advanced purposes, (2) simulators should ambitiously expand their capabilities to support complex, large-scale environments that better resemble the robots’ deployment in the physical world, and (3) simulators should be built to support scalable automation.

Simulators could be made easier to use in general by eliminating sources of complexity, introducing user-friendly features, and improving documentation. Examples of user-friendly features that participants requested include: providing a web interface by default, rather than a traditional graphical user interface; support for models written in industry-standard formats; and augmented reality visualizations. Such changes would allow more developers to reap the benefits of simulation by reducing the learning curve, and would reduce the considerable investment of time required to use simulation for more advanced purposes such as automated testing.

The scope and capabilities of simulators could be expanded to support larger, more realistic simulations of real-world robot deployments for a wider range of domains. To do so, simulators must efficiently support large, detailed environments that may contain multiple robots, and achieve greater physical fidelity without increasing resource costs. Additionally, simulators should strive to provide powerful, interactive tools that allow developers to easily design and generate vast scenarios and environments.

To support scalable automation of simulation-based testing as part of a continuous process of verification and validation, simulators should: (a) provide support for headless execution, and scripting via stable and well-designed APIs; (b) ensure reproducible results and reliable simulation to allow developers to quickly and easily investigate discovered failures; and (c) substantially reduce the resource costs and hardware requirements of simulation. Addressing these needs would allow simulation to be deployed inexpensively in cloud environments as part of continuous integration.
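Requirement (b) can be checked mechanically in a test pipeline. The sketch below is illustrative only and assumes that a simulation run can be invoked as a function of a random seed and returns a JSON-serialisable trace of states; `trace_digest`, `is_reproducible`, and `toy_scenario` are hypothetical names, and `toy_scenario` is a stand-in for a real simulator run, not part of any simulator's API. The check repeats a run with the same seed and compares fingerprints of the resulting traces.

```python
import hashlib
import json
import random


def trace_digest(trace):
    """Hash a JSON-serialisable trace into a fixed-length fingerprint."""
    payload = json.dumps(trace, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_reproducible(run_scenario, seed, repeats=3):
    """Return True if every run with the same seed yields an identical trace."""
    digests = {trace_digest(run_scenario(seed)) for _ in range(repeats)}
    return len(digests) == 1


def toy_scenario(seed, steps=50):
    """Stand-in for a simulator run (assumption): a seeded 1-D random walk."""
    rng = random.Random(seed)
    position = 0.0
    trace = []
    for step in range(steps):
        position += rng.uniform(-1.0, 1.0)
        trace.append({"step": step, "position": position})
    return trace
```

With a real simulator, a failing check of this kind would indicate nondeterminism (for example, from timing or unseeded randomness) that makes discovered failures harder to investigate. Exact trace equality is a strict criterion; a tolerance-based comparison may be more appropriate where small floating-point differences are expected.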

The state of the art

Significant progress towards these goals is being made by a new generation of simulators. Several of these new simulators are specialized for particular domains: CARLA [15], LGSVL [16], and AADS [18] are specialized for automated driving applications, and AirSim [17] simulates a wider variety of autonomous vehicles. Notably, all of these simulators are built on top of popular video game engines, and support complex, dynamic urban environments. AADS [18] enhances the visual fidelity of simulation by integrating photos, videos, and sensor readings, allowing for more realistic testing of perception components. In contrast to these specialized simulators, Ignition Gazebo [32], the descendant of the Gazebo simulator, is agnostic to application and domain. Instead, Ignition Gazebo’s API supports various rendering and physics backends, allowing the developer to customize the simulator to better fit the needs of a particular use case (e.g., fidelity and performance). AWS RoboMaker [33] is a web-based IDE, simulator, and fleet management front-end designed to make it easier to develop, test, and deploy robot applications. RoboMaker internally builds on top of Gazebo by adding infrastructure for parallel simulations and automatic hardware scaling, and providing numerous prebuilt environments (e.g., indoor rooms, retail stores, and race tracks). Although each of these simulators addresses at least one of our identified challenges, they have yet to become widely adopted in the community, and it is as yet unclear whether they address enough of developers’ needs in the right combinations to succeed.

V Conclusion

In this paper, we conducted a study of 82 robotics developers to explore how robotics simulators are used, and the challenges that developers commonly face when using simulation for general purposes, testing, and test automation. Our results indicate that simulation is a popular tool among robotics developers, and is commonly used for testing: 85% of participants reported having used simulation for testing, 60% of whom have also used simulation as part of their test automation. We identified 10 high-level challenges associated with the use of simulation, and discussed these challenges in detail. We further outlined ideas on how the community can tackle these challenges to unlock the full potential of simulation-based testing.


We would like to thank both the ROS and Reddit communities, and in particular, Chris Volkoff and Olly Smith, for their invaluable support in distributing our survey.

This research was partially funded by AFRL and DARPA: the authors are grateful for their support. Any opinions, findings, or recommendations expressed are those of the authors and do not necessarily reflect those of the US Government.


  • [1] N. Koenig and A. Howard, “Design and use paradigms for Gazebo, an open-source multi-robot simulator,” in International Conference on Intelligent Robots and Systems, ser. IROS ’04, vol. 3, 2004, pp. 2149–2154.
  • [2] E. Rohmer, S. P. N. Singh, and M. Freese, “V-REP: A versatile and scalable robot simulation framework,” in Intelligent Robots and Systems, ser. IROS ’13, 2013, pp. 1321–1326.
  • [3] O. Michel, “Cyberbotics Ltd. Webots™: Professional mobile robot simulation,” International Journal of Advanced Robotic Systems, vol. 1, no. 1, p. 5, 2004.
  • [4] T. Sotiropoulos, H. Waeselynck, J. Guiochet, and F. Ingrand, “Can robot navigation bugs be found in simulation? An exploratory study,” in Software Quality, Reliability and Security, ser. QRS ’17, 2017, pp. 150–159.
  • [5] C. S. Timperley, A. Afzal, D. S. Katz, J. M. Hernandez, and C. Le Goues, “Crashing simulated planes is cheap: Can simulation detect robotics bugs early?” in International Conference on Software Testing, Validation, and Verification, ser. ICST ’18, 2018, pp. 331–342.
  • [6] C. Robert, T. Sotiropoulos, J. Guiochet, H. Waeselynck, and S. Vernhes, “The virtual lands of Oz: testing an agribot in simulation,” Empirical Software Engineering, 2020.
  • [7] C. Gladisch, T. Heinz, C. Heinzemann, J. Oehlerking, A. von Vietinghoff, and T. Pfitzer, “Experience paper: Search-based testing in automated driving control applications,” in Automated Software Engineering, ser. ASE ’19, 2019, pp. 26–37.
  • [8] A. Gambi, M. Mueller, and G. Fraser, “Automatically testing self-driving cars with search-based procedural content generation,” in International Symposium on Software Testing and Analysis, ser. ISSTA ’19, 2019, pp. 318–328.
  • [9] G. E. Mullins, P. G. Stankiewicz, and S. K. Gupta, “Automated generation of diverse and challenging scenarios for test and evaluation of autonomous vehicles,” in International Conference on Robotics and Automation, ser. ICRA ’17, 2017, pp. 1443–1450.
  • [10] C. E. Tuncali, T. P. Pavlic, and G. Fainekos, “Utilizing S-TaLiRo as an automatic test generation framework for autonomous vehicles,” in International Conference on Intelligent Transportation Systems, ser. ITSC ’16, 2016, pp. 1470–1475.
  • [11] E. Rocklage, H. Kraft, A. Karatas, and J. Seewig, “Automated scenario generation for regression testing of autonomous vehicles,” in International Conference on Intelligent Transportation Systems, ser. ITSC ’17, 2017, pp. 476–483.
  • [12] Uber, “Self-Driving Simulation.” [Online]. Available: https://www.uber.com/us/en/atg/research-and-development/simulation
  • [13] NVIDIA, “NVIDIA DRIVE Constellation: Virtual reality autonomous vehicle simulator.” [Online]. Available: https://www.nvidia.com/en-us/self-driving-cars/drive-constellation
  • [14] Waymo, “Waymo safety report: On the road to fully self-driving,” 2018. [Online]. Available: https://waymo.com/safety
  • [15] A. Dosovitskiy, G. Ros, F. Codevilla, A. López, and V. Koltun, “CARLA: An open urban driving simulator,” in Conference on Robot Learning, ser. CoRL, 2017, pp. 1–16.
  • [16] LG, “LGSVL Simulator.” [Online]. Available: https://www.lgsvlsimulator.com
  • [17] S. Shah, D. Dey, C. Lovett, and A. Kapoor, “AirSim: High-fidelity visual and physical simulation for autonomous vehicles,” in Field and Service Robotics, M. Hutter and R. Siegwart, Eds., 2018, pp. 621–635.
  • [18] W. Li, C. W. Pan, R. Zhang, J. P. Ren, Y. X. Ma, J. Fang, F. L. Yan, Q. C. Geng, X. Y. Huang, H. J. Gong, W. W. Xu, G. P. Wang, D. Manocha, and R. G. Yang, “AADS: Augmented autonomous driving simulation using data-driven algorithms,” Science Robotics, vol. 4, no. 28, 2019.
  • [19] X. Zheng, C. Julien, M. Kim, and S. Khurshid, “Perceptions on the state of the art in verification and validation in cyber-physical systems,” IEEE Systems Journal, vol. 11, no. 4, pp. 2614–2627, Dec 2017.
  • [20] A. Afzal, C. Le Goues, M. Hilton, and C. S. Timperley, “A study on challenges of testing robotic systems,” in International Conference on Software Testing, Validation, and Verification, ser. ICST ’20, 2020.
  • [21] A. Staranowicz and G. L. Mariottini, “A survey and comparison of commercial and open-source robotic simulator software,” in Pervasive Technologies Related to Assistive Environments, ser. PETRA ’11, 2011, pp. 1–8.
  • [22] L. Pitonakova, M. Giuliani, A. Pipe, and A. Winfield, “Feature and performance comparison of the V-REP, Gazebo and ARGoS robot simulators,” in Annual Conference Towards Autonomous Robotic Systems, 2018, pp. 357–368.
  • [23] J. Saldaña, The coding manual for qualitative researchers.   Sage, 2015.
  • [24] K. M. MacQueen, E. McLellan-Lemal, K. Bartholow, and B. Milstein, Team-based Codebook Development: Structure, Process, and Agreement.   Rowman Altamira, 2008, pp. 119–135.
  • [25] K. Charmaz, Constructing Grounded Theory.   Sage, 2014.
  • [26] F. P. Brooks Jr., “No silver bullet: Essence and accidents of software engineering,” Computer, vol. 20, no. 4, pp. 10–19, April 1987.
  • [27] J. Micco, “Flaky tests at Google and how we mitigate them,” May 2016. [Online]. Available: https://testing.googleblog.com/2016/05/flaky-tests-at-google-and-how-we.html
  • [28] V. Garousi and M. V. Mäntylä, “When and what to automate in software testing? A multi-vocal literature review,” Information and Software Technology, vol. 76, pp. 92–117, 2016.
  • [29] M. Hilton, T. Tunnell, K. Huang, D. Marinov, and D. Dig, “Usage, costs, and benefits of continuous integration in open-source projects,” in International Conference on Automated Software Engineering, ser. ASE ’16, 2016, pp. 426–437.
  • [30] L. Xavier, A. Brito, A. Hora, and M. T. Valente, “Historical and impact analysis of API breaking changes: A large-scale study,” in Software Analysis, Evolution and Reengineering, ser. SANER ’17, 2017, pp. 138–147.
  • [31] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, “ROS: an open-source Robot Operating System,” in ICRA Workshop on Open Source Software, 2009, p. 5.
  • [32] Ignition Robotics, “Ignition Gazebo: A Robotic Simulator.” [Online]. Available: https://ignitionrobotics.org/libs/gazebo
  • [33] Amazon Web Services, “AWS RoboMaker.” [Online]. Available: https://aws.amazon.com/robomaker