The automotive industry is currently investing considerable effort and resources in developing an autonomous vehicle that meets the specification of SAE level 3. Several companies have in fact already marketed vehicles with semi-autonomous capabilities belonging to SAE level 2, ranging from adaptive lane keeping to self-parking features. The most relevant difference between levels 2 and 3 in the SAE hierarchy lies in who is responsible for monitoring the driving environment: in SAE level 2 the system assists the human driver with lateral and longitudinal adjustments, while the driver is expected to perform all the remaining tasks; in SAE level 3, instead, this is not required, meaning that the vehicle itself should be able to manage the dynamic driving tasks while the human driver is only expected to intervene upon request.
The software necessary to manage the diversity of situations that a vehicle can face is bound to be complex and computationally intensive, especially considering that the software in modern vehicles already exceeds one gigabyte in size. Moreover, as all vehicles share the same basic capabilities but differ in the software functionality they provide, the latter can be expected to become a relevant practical difference between automakers for customers, thus fuelling a functionality race that will shift the concentration of value from the hardware, i.e., the vehicle itself, to its software and capabilities.
I-A Continuous Experimentation
When the software takes over the competitively distinguishing role from the hardware in the value-creation process, delivering updates and new functionality quickly becomes a necessity. This is very apparent in the software industry, especially for web-based software, where development techniques have been introduced to accelerate the process as much as possible by learning from how users and customers interact with such systems. Among them are Continuous Integration, Continuous Deployment, and Continuous Experimentation. These techniques refer to automated mechanisms that respectively allow new software to be integrated into the code base as soon as possible, newly integrated code to be deployed immediately to the actual systems, and a number of experiments, i.e., different versions of the official software, to be deployed and run alongside the official software in order to evaluate their respective performance. While it adds computational overhead to the systems, Continuous Experimentation relies on real-world data to confirm or reject hypotheses about the suitability of the software for a given task, as opposed to relying only on simulations or speculation to steer the development of new software. Continuous Experimentation has proven very effective on web-based software systems. However, applying this way of working verbatim to safety-critical cyber-physical systems such as vehicles faces additional challenges that are specific to the automotive context. A first challenge is the additional complexity arising from the fact that the target systems, in the case of vehicles, are not virtual machines in server farms but highly mobile physical objects with limited computational performance.
Additionally, there is a resource availability problem caused by the Continuous Experimentation practice itself, which introduces the need for additional computational power. This can pose issues for the automotive industry, which, being based on economies of scale, has always built vehicles with hardware that is just powerful enough to fulfill its tasks in order to lower production costs.
Nonetheless, new competitors seem to embrace this challenge, as can be seen from a manufacturer of luxury electric vehicles. Its quarterly financial reports have mentioned since 2015 the systematic gathering of driving and sensor data via “field data feedback loops” that are used to “enable the system to continually learn and improve its performance”. While software experiments are not explicitly named, a company representative did mention the practice of installing “an ‘inert’ feature on vehicles” in order to “watch over tens of millions of miles how a feature performs” by logging its behavior in a real-world scenario.
A previous investigation in the automotive field by the authors shows that practitioners believe they would benefit from the introduction of the Continuous Experimentation practice, even if it faces these additional challenges. Another recent study showed that the literature is devoting increasing effort to the study of this practice, but that only a small portion of these studies actually propose practical experiments. Hence, the current work was devised to fill this research gap by proposing and evaluating a proof-of-concept for Continuous Experimentation, based on previously identified design criteria, on a real automotive system; for this purpose, a commercial truck tractor operated on a daily basis by a logistics company in Sweden was used (the truck is still in use throughout 2020).
I-B Scope of this work
While the aim of this work is to draw conclusions that are valid for the automotive field, it is worth highlighting the differences that set the scope of the experimental work performed for this study apart from a real-world automotive scenario. Such a scenario would generally involve a fleet of vehicles, likely passenger cars, which are controlled by a number of highly resource-constrained Electronic Control Units (ECUs). The experimental work for this study was instead performed on a single vehicle, i.e., a commercial truck tractor, equipped with a consumer-grade computing unit, which is more powerful than a typical ECU, and the software was written in a high-level programming language. These differences are due to the fact that the aim of this study is to provide and evaluate a proof-of-concept for the Continuous Experimentation process rather than a ready-to-use automotive solution. A key aspect is however preserved: both in the real-world case and in this study, the vehicle is physically inaccessible to the manufacturer, forcing all software deployment and data exchange to be performed via an Over-The-Air (OTA) connection while the vehicle is in operation. Finally, it should be noted that the scope of this study does not include autonomous driving tasks, as the vehicle used in the experimental setting is manually driven by the logistics company.
I-C Research Goal
Previous investigations show that the literature lacks design science studies about Continuous Experimentation in realistic cyber-physical systems contexts, and especially in automotive contexts. This study attempts to bridge this research gap. The Research Goal (RG) of this work can be expressed as:
RG: To provide and evaluate a proof-of-concept that shows the feasibility and benefit of a Continuous Experimentation decision cycle, based on previously identified design criteria, in the context of an automotive system.
The Research Goal of this article can be divided further into the following Research Questions (RQ):
RQ1: To what extent do previously identified design criteria for Continuous Experimentation in the context of automotive cyber-physical systems hold?
RQ2: What are the lessons learned from this Continuous Experimentation proof-of-concept?
This work implemented a Continuous Experimentation cycle on a computational system housed in a commercial truck tractor, deploying experimental software and retrieving performance data via a cellular connection while the automotive system was operated by the owner. Finally, design criteria identified in a previous study were discussed and their relevance validated in light of the present work.
I-E Structure of the document
The article is structured as follows: related works are discussed in Section II; Section III explains the methodology and Section IV the system architecture of the automotive system used in the experimental phase of the work; results are collected in Section V and discussed in Section VI; finally, Section VII concludes the article and provides possible directions for follow-up works.
II Related Works
A number of studies explore the Continuous Experimentation practice, in both its native application field, i.e., web-based systems, and more recently in the context of cyber-physical systems.
Fagerholm et al. [11] defined an organizational model for Continuous Experimentation in the context of web-based products, comprising the tasks and artefacts that different roles involved in planning and implementation of a software product should manage in order to enable the experimentation process.
Recent mapping studies on the Continuous Experimentation practice show that the majority of the works they encountered explore the statistical methods sub-topic and are often rooted in the web-based applications context, which is the originating field of this practice; only a minority of studies are addressing the Continuous Experimentation practice in the cyber-physical systems field [9, 12].
A previous work led by the authors explored the design characteristics that a cyber-physical system should possess in order to enable a Continuous Experimentation process on an autonomous vehicle. These design criteria are evaluated in this study to discuss their validity in the light of the work that has been performed and considering the difference between the scopes of the two studies.
Mattos et al. [13] performed a literature review to identify a set of challenges for Continuous Experimentation in cyber-physical systems, which was used as a starting point for a case study where they tried to identify possible solutions with industrial representatives.
III Methodology
A Design Study methodology, i.e., the design and investigation of artifacts in context [14], was adopted to achieve the Research Goal. A software infrastructure was prepared to host a number of software modules that would run and interact on an automotive platform performing a Continuous Experimentation cycle. While the Continuous Experimentation cycle performs an experiment, the focus of this study is not the experiment itself (i.e., what the production and experimental modules actually do) but the process: for the purpose of this study it does not matter whether the experiment concludes successfully, but rather whether an experiment can be carried out according to the Continuous Experimentation principles.
The experiment itself consisted of running different Machine Learning-based object detectors connected to the live video feed in order to find an object detector module that would recognise, as accurately as possible, items and road users in the vehicle’s field of view. The experiment was run in a series of short sessions and the resulting data were analysed manually by the researchers. The machine learning software modules were based on publicly available detection models (https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md) pre-trained on the COCO dataset [15]. This dataset was chosen because of the breadth of its scope, which encompasses automotive items and more, making it a valuable choice for a general-purpose object detector.
III-A Software Architecture
The experimentation process is based on the interaction of three modules, Production Software, Experimental Software, and Supervisor, as shown in Figure 1. As the names suggest, Production Software simulates a production component, whose performance must not be influenced by any additional components. Each instance of the Experimental Software module represents an experiment deployed to test a new software variant, which should run in a sandboxed way, i.e., it must not issue commands to the actual system (especially any actuators) but instead have its output logged for later analysis, similarly to what is done by an automotive manufacturer who revealed that it uses “inert features”. Finally, Supervisor acts as the experiment manager, monitoring the other modules’ performance and deciding at any time whether or not to continue with the experiment, depending on whether the Experimental Software modules abide by the experiment parameters.
When the Supervisor is deployed to the computing system, it is provided with an Experimentation Protocol, a file collecting relevant parameters for the experiment, e.g., CPU thresholds that the Experimental Software should not cross. Upon starting, the Supervisor waits for the other software modules to start and then manages the experimentation process. If the Supervisor detects a performance drop in the Production Software or an increase in resource consumption by the Experimental Software modules, the change is compared to the thresholds specified in the Experimentation Protocol. If necessary, the Supervisor can request from the Experimental Software modules either a “performance degradation”, so that they consume fewer resources, leaving more for the Production Software, or a full stop of the experiment, if the violations are deemed too severe. During the experiment execution, relevant data about the detection performance are collected and regularly transmitted back to the remote team, allowing them to analyze the experiment results and finally decide which software version fulfilled its functional objectives more effectively.
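The Supervisor's decision logic described above can be sketched as follows. This is a minimal illustration and not the implementation used in the study: the protocol fields (`cpu_soft_limit`, `cpu_hard_limit`, `min_production_fps`), the use of frames per second as the Production Software performance metric, and the three action names are all assumptions introduced here, since the text does not specify the format or contents of the Experimentation Protocol beyond CPU thresholds.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    DEGRADE = "degrade"  # ask the experiment to consume fewer resources
    STOP = "stop"        # abort the experiment


@dataclass(frozen=True)
class ExperimentationProtocol:
    # Hypothetical parameters; names and units are illustrative only.
    cpu_soft_limit: float      # experiment CPU share (%) that triggers degradation
    cpu_hard_limit: float      # experiment CPU share (%) that triggers a full stop
    min_production_fps: float  # lowest acceptable Production Software throughput


def decide(protocol, experiment_cpu, production_fps):
    """Return the Supervisor's action for one monitoring sample."""
    # A violation of the hard limit is treated as too severe to continue.
    if experiment_cpu > protocol.cpu_hard_limit:
        return Action.STOP
    # A milder violation, or a drop in production performance, asks the
    # experiment to degrade so that resources are freed for production.
    if (experiment_cpu > protocol.cpu_soft_limit
            or production_fps < protocol.min_production_fps):
        return Action.DEGRADE
    return Action.CONTINUE


protocol = ExperimentationProtocol(cpu_soft_limit=40.0,
                                   cpu_hard_limit=70.0,
                                   min_production_fps=10.0)
print(decide(protocol, experiment_cpu=50.0, production_fps=15.0))
```

In this sketch a single sample decides the action; a real supervisor would more likely smooth the measurements over a time window before reacting.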
III-B Software Development and Deployment
To simplify the deployment phase, all software modules were developed and encapsulated using Docker (https://www.docker.com). Docker uses OS-level resource isolation to execute software in environments called containers, which run on top of the existing OS kernel and are thus more lightweight than full-stack virtualization software. Each container is an instance of a Docker image, which acts as a machine template and can be used to store and deliver applications.
The versioning, integration, and deployment operations were run in a GitLab-based environment. GitLab is a web-based DevOps lifecycle environment that includes a Git repository manager and, among other services, a Continuous Integration/Continuous Deployment pipeline. The resulting development cycle follows these steps: first, a new change is introduced in the code base via a Git repository commit; then, the Continuous Integration pipeline is automatically triggered and the new code is integrated and built within the code base; finally, the Continuous Deployment pipeline is executed, building a Docker image that is then ready for distribution. If the new code is part of a software experiment, at the end of these three steps a Docker image with the experimental software is ready to be deployed and executed. These steps embody what an actual Continuous Experimentation cycle can be expected to look like should the practice be adopted in the automotive field, from development to deployment to execution and finally, by instrumenting the code, to data collection, analysis, and choice of a final software variant.
IV System Architecture
To provide a proof-of-concept for Continuous Experimentation in the automotive context and better understand the underlying challenges, a research project was initiated as a collaboration between Chalmers University of Technology’s Revere vehicular laboratory, Volvo, Trafikverket, GDL, Kerry Logistics, Speed Group, Borås Stad, Ellos, and Combitech to equip a modern Volvo tractor with a platform consisting of two computers, five cameras, and a GPS/IMU system for daily data logging during typical operations of a logistics company, as shown in Figure 4.
As shown in Figure 2, the system is designed in the following manner. The automotive platform, a commercial truck tractor, is equipped with a Linux-based, Docker-enabled computer and an Accelerated Processing Unit (APU). The main computer is equipped with an Intel Core i9-9900K CPU and an NVidia GP107 GPU. While it receives most of the sensor data and performs the most intensive computational jobs, e.g., running prototypical ML algorithms, the smaller unit has the purpose of providing a stable, low-power, highly available connection to the system. Moreover, the secondary unit has access to one camera, the GPS unit, and the truck’s CAN signals, which means that it not only enables remote access for maintenance but can also act as a reliable fail-over system should the main unit fail during operation. The sensors available to the computing units are five cameras, GPS units, and two 4G mobile data transmission modules. Additionally, the computing systems are able to access a subset of the CAN signals of the automotive platform, specifically the ones containing speed and IMU information. To provide a stable power supply and not limit operations to the time when the engine runs, the system is powered by an additional battery pack, which is recharged by the truck when the engine is running. The system is monitored by the team through a software dashboard, shown in Figure 3, that gives easy access to important parameters such as vehicle speed, GPS position, CPU consumption, and storage disk space utilization, among others. The software to collect or process sensor data can be, and normally is, deployed and monitored remotely via the mobile data connection.
As the truck is in daily operation by the logistics company, the resulting data is also extracted remotely, which makes this project and platform well suited for this study on Continuous Experimentation, as it represents the use case of a vehicle in a fleet that can run software experiments but cannot be directly accessed by the manufacturer.
V Results
During the development of the Experimental Software, the average code base change took around 4 minutes to be integrated, while the Docker image building phase lasted around 7 minutes. This means that a little more than 10 minutes after new code is committed to the code base, it is already available for deployment into the system. These phases take place at the team’s end of the process and off-site from the automotive system itself, which has to download the software modules over the mobile connection. In the described setup, the resulting Docker image for an Experimental Software module amounted to approximately 5 GB in size due to the machine learning models and dependencies. The download of this image into the automotive system took approximately 14 minutes. While its size is significant, it is worth noting that neither optimization nor compression was applied to the Docker image, which would likely have reduced significantly the amount of data to be deployed. Moreover, since the Docker experiment containers were spawned from the same image, the image itself had to store utility data necessary for all its future containers; a specialized image for each container would therefore have greatly reduced the final Docker image size. At the end of the experiment it was concluded by manual inspection of the reported data that the object detector used as Production Software performed less accurately than one of the Experimental Software modules. In the described experimental setup, the process proved to be feasible, leading to a successful experiment cycle that produced a data-based answer to a software development question.
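The timings reported above can be combined into a rough end-to-end estimate. The short calculation below uses only the approximate figures given in the text; treating 5 GB as 5 × 10^9 bytes and assuming that the three phases follow each other without idle time are simplifications introduced here, so the derived values are indicative only.

```python
# Approximate figures reported in the text.
integration_min = 4   # Continuous Integration step
image_build_min = 7   # Docker image build step
download_min = 14     # image download over the mobile connection
image_size_gb = 5     # Docker image size, taken as 5 * 10**9 bytes (assumption)

# Time from commit until the image is ready for distribution.
commit_to_ready_min = integration_min + image_build_min

# Time from commit until the image is on the vehicle, assuming no idle
# time between the phases (deployment was triggered manually in the study).
commit_to_vehicle_min = commit_to_ready_min + download_min

# Effective download throughput in Mbit/s implied by size and duration.
throughput_mbit_s = image_size_gb * 8 * 1000 / (download_min * 60)

print(commit_to_ready_min, commit_to_vehicle_min, round(throughput_mbit_s, 1))
# prints: 11 25 47.6
```

Under these assumptions the download accounts for more than half of the commit-to-vehicle time, which is consistent with the observation that a smaller or specialized image would shorten the cycle.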
VI Discussion
VI-A Research Question 1
In a previous study on the subject, a set of design properties was identified that would enable Continuous Experimentation on a complex cyber-physical system such as an autonomous vehicle. These properties are listed in the following and discussed in the light of the work described so far, considering the difference in scope between the self-driving vehicles of the previous study and the human-driven vehicle in the present one.
- Access to perception sensors and systems: obviously needed for experimentation purposes, and applied in this study.
- Access to full vehicle control: not needed in this work, as controlling the vehicle was outside the scope of the study.
- Logging of internal activity and other relevant metrics: a necessary step to allow the analysis of the experimental results.
- Data transmission from the developers to the deployed system, and a feedback loop in the opposite direction: also necessary, in order to deploy software and retrieve the resulting data.
- Reliability: implemented in the software through health-checking techniques adopted to limit and manage fault propagation.
- Testability: all changes in functionality were first tested on local machines fed with recordings of past camera streams, to ensure that the new code to be deployed to the system would perform as expected.
- Safety: in this case the software had no physical control over the actual vehicle, meaning that even in case of faults the safety implications were limited. Nonetheless, safety constraints were implemented in the form of thresholds on the amount of computational power that the software modules could use.
- Scalability: an automotive system is naturally distributed across several computational units; in the present case, the system is distributed over two computing nodes. While one was used to execute the Production and Experimental Software modules, the other was still involved in the process, as it was accessed to retrieve the camera feed used by the software. Had it been possible or necessary, the modular nature of the software would have allowed for an even more spread-out distribution, since the communication between software modules was performed via UDP multicast message exchange, requiring only a network connection among computing nodes.
- Separation of concerns: the establishment of abstraction layers between hardware and software and between data and exchanged messages, a necessary part of any software running on complex cyber-physical systems.
- Simplicity to involve new developers: a feature of the development process more than of the physical system itself, in this case provided mostly by the ease of use of the development tools, which automated the majority of the steps in the Continuous Integration/Deployment pipelines.
- Facilitation for operators: the software should not be hard to operate for those who are not developers. In this study it was not possible to acquire an external perspective on this point, as the only tester and operator of the Continuous Experimentation cycle was also the developer.
- Short cycle from development to deployment: necessary whenever possible in order to roll out changes and new features at a fast pace, and present in this study thanks to the automated Continuous Integration/Deployment mechanisms.
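The UDP multicast exchange mentioned under scalability can be illustrated with Python's standard `socket` module. This is only a sketch of the general technique: the multicast group, the port, and the JSON message layout are hypothetical choices made here, and the text does not describe the message format or middleware actually used between the modules.

```python
import json
import socket
import struct

GROUP = "239.255.0.1"  # hypothetical group in the administratively scoped range
PORT = 5007            # hypothetical port


def encode(sender, payload):
    """Serialize a message as UTF-8 JSON (the wire format is an assumption)."""
    return json.dumps({"sender": sender, "payload": payload}).encode("utf-8")


def decode(data):
    """Parse a datagram produced by encode()."""
    return json.loads(data.decode("utf-8"))


def open_sender(ttl=1):
    """Create a UDP socket for sending to a multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # A TTL of 1 keeps datagrams on the local network segment.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def open_receiver(group=GROUP, port=PORT):
    """Create a UDP socket that has joined the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Join the group on the default interface (0.0.0.0).
    membership = struct.pack("4s4s",
                             socket.inet_aton(group),
                             socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)
    return sock
```

A module would publish with `open_sender().sendto(encode("experiment-1", {"cpu": 12.5}), (GROUP, PORT))`, and any number of modules on the same network could receive it with `decode(open_receiver().recvfrom(65535)[0])`. Because senders do not need to know the receivers, modules can be moved between computing nodes without reconfiguration, which is the property the scalability criterion relies on.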
To summarize, it is the authors’ conclusion that the aforementioned design features do hold for a Continuous Experimentation process on an automotive vehicle, with the only discrepancies explained by the lack of autonomous capabilities in the present study’s vehicle and the availability of a single developer/tester instead of different people covering different roles.
VI-B Research Question 2
The presented prototypical implementation of Continuous Experimentation proved that it is possible to obtain enough data feedback from a vehicular system to better understand the performance of running and/or ‘inert’ software, allowing researchers to decide how to proceed with future software development efforts based on data coming from the automotive system operating in its context.
A number of challenges were not addressed in this study, e.g., the computational limits of native automotive ECUs, which for this reason were not used, and the safety constraints on the software running on those computing nodes, which would make the experimentation software compliant with vehicular regulations but more complex to develop. Additional smaller challenges were posed by practical issues such as the size of the software downloads to be undertaken by the automotive system, which were slowed by the bandwidth of the system's mobile data connection. These challenges are deferred to future investigations, as the focus of this work was on the evaluation of a functional, albeit unsophisticated, proof-of-concept. The Research Goal is thus considered fulfilled.
VI-C Threats to Validity
A number of factors may threaten the validity of this work.
One of the main possible threats is the fact that the experiment infrastructure and the software modules do not abide by available automotive standards such as [16, 17]. For example, one of the main differences between the software used in this work and the software that would be legal to have in a vehicle is the use of dynamic memory allocation, which is currently forbidden in safety-critical systems because it introduces vulnerabilities that could disrupt critical software capabilities when they are needed. This threatens the generalizability of the result, since what was achieved in this study could be technically much harder to obtain while abiding by the strict automotive software standards. This threat is less pressing, however, when considering that this work had the goal of providing a proof-of-concept showing that a working exemplar of a Continuous Experimentation-enabled vehicle is within reach for the automotive industry, rather than providing one ready for commercial use.
Connected to the aforementioned threat, another potential issue is the fact that the software developed for this work could only run one or two experiments at the same time. While this may sound like an important limitation, it is worth noting that a higher number of experiments running concurrently would require a higher amount of spare computational power in a real-world scenario. Moreover, if a vehicle can only run a pre-set, limited number of experiments at the same time, this could work in favor of the development efforts necessary to tackle the aforementioned threat to validity: the variables that would normally require an amount of memory dependent on the number of experiments could in this case be dimensioned a priori.
Lastly, it should be noted that it is not necessarily possible to generalize the results obtained with Continuous Experimentation in the automotive field to the rest of the cyber-physical systems context. While the challenges lurking in the automotive field are more and more recognized and faced, it is possible and not unlikely that several more issues peculiar to different cyber-physical systems sub-fields are still in the way and will prevent a rapid widespread adoption of this practice.
VII Conclusions and Future Work
The presented work demonstrated and evaluated the execution of a prototypical Continuous Experimentation cycle on an automotive system that is in daily commercial operation by a logistics company. The system was equipped with computing units and sensors and accessed remotely via a mobile connection, which was the only communication channel used to deploy software and retrieve the data resulting from running a software experiment. A set of previously identified design criteria to enable Continuous Experimentation on autonomous vehicles was discussed in light of the (non-autonomous) system built for this work. The study concludes that the implementation was a success, while highlighting some relevant challenges still in the way of a fully functional Continuous Experimentation-enabled vehicle.
One direction for future studies could be for example the automation of those steps that were manually performed in this work, e.g., the deployment of software to the automotive system, or the analysis of the resulting experiment data.
Another interesting follow-up study would be the replication of this proof-of-concept using software or hardware closer to those adopted for consumer vehicles. That would require the software to abide at least partly by existing regulations, and the hardware to be more resource-constrained in order to provide more realistic performance.
This work was supported by the project Highly Automated Freight Transports, funded by Vinnova FFI [2016-05413].
[1] Society of Automotive Engineers, Warrendale, PA, “SAE J3016, taxonomy and definitions for terms related to on-road automated motor vehicles,” 2014.
[2] B. W. Smith, “SAE levels of driving automation,” 2013, Accessed 2020-01-31. [Online]. Available: http://cyberlaw.stanford.edu/blog/2013/12/sae-levels-driving-automation
[3] M. Hiller, “Thoughts on the Future of the Automotive Electronic Architecture,” 2016, Accessed 2019-10-22. [Online]. Available: http://www.fuse-project.se/final-seminar-presentation-33558763
[4] S. Gupta, R. Kohavi, D. Tang, Y. Xu, R. Andersen, E. Bakshy, N. Cardin, S. Chandran, N. Chen, D. Coey et al., “Top challenges from the first practical online controlled experiments summit,” ACM SIGKDD Explorations Newsletter, vol. 21, no. 1, pp. 20–35, 2019.
[5] F. Giaimo, C. Berger, and C. Kirchner, “Considerations about continuous experimentation for resource-constrained platforms in self-driving vehicles,” in European Conference on Software Architecture. Springer, 2017, pp. 84–91.
[6] “Tesla financials & accounting information,” Accessed 2020-01-31. [Online]. Available: https://ir.tesla.com/financial-information/quarterly-results
[7] P. E. Ross, “Tesla reveals its crowdsourced autopilot data,” Accessed 2020-01-31. [Online]. Available: http://spectrum.ieee.org/cars-that-think/transportation/self-driving/tesla-reveals-its-crowdsourced-autopilot-data
[8] F. Giaimo, H. Andrade, and C. Berger, “The automotive take on continuous experimentation: A multiple case study,” in 2019 45th Euromicro Conference on Software Engineering and Advanced Applications (SEAA). IEEE, 2019, pp. 126–130.
[9] R. Ros and P. Runeson, “Continuous experimentation and A/B testing: A mapping study,” in Proceedings of the 4th International Workshop on Rapid Continuous Software Engineering, ser. RCoSE ’18. New York, NY, USA: ACM, 2018, pp. 35–41.
[10] F. Giaimo and C. Berger, “Design criteria to architect continuous experimentation for self-driving vehicles,” in 2017 IEEE International Conference on Software Architecture (ICSA). IEEE, 2017, pp. 203–210.
[11] F. Fagerholm, A. S. Guinea, H. Mäenpää, and J. Münch, “The right model for continuous experimentation,” Journal of Systems and Software, vol. 123, pp. 292–305, 2017.
[12] F. Auer and M. Felderer, “Current state of research on continuous experimentation: a systematic mapping study,” in 2018 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA). IEEE, 2018, pp. 335–344.
[13] D. I. Mattos, J. Bosch, and H. H. Olsson, “Challenges and strategies for undertaking continuous experimentation to embedded systems: Industry and research perspectives,” in International Conference on Agile Software Development. Springer, 2018, pp. 277–292.
[14] R. J. Wieringa, Design science methodology for information systems and software engineering. Springer, 2014.
[15] T.-Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. L. Zitnick, and P. Dollár, “Microsoft COCO: Common objects in context,” 2014.
[16] “ISO 26262-1:2011, Road vehicles - Functional safety,” Accessed 2019-11-04. [Online]. Available: https://www.iso.org/standard/43464.html
[17] “ISO 21448:2019, Road vehicles - Safety of the intended functionality,” Accessed 2019-11-04. [Online]. Available: https://www.iso.org/standard/70939.html