Chaos Engineering for Enhanced Resilience of Cyber-Physical Systems
Cyber-physical systems (CPS) encompass the complex, large-scale engineered systems behind critical infrastructure operations, such as water distribution networks, energy delivery systems, healthcare services, manufacturing systems, and transportation networks. Industrial CPS in particular need to simultaneously satisfy requirements for available, secure, safe, and reliable system operation in the face of diverse threats, in an adaptive and sustainable way. These threats can be accidental or malicious in nature and may include natural disasters, hardware or software faults, cyberattacks, or even infrastructure design and implementation faults. They may drastically affect the results of CPS algorithms and mechanisms, and subsequently the operations of industrial control systems (ICS) deployed in those critical infrastructures. Such a demanding combination of properties and threats calls for resilience-enhancement methodologies and techniques that work in real time during operation. However, analyzing CPS resilience is difficult, as it involves evaluating various interdependent layers with heterogeneous computing equipment, physical components, network technologies, and data analytics. In this paper, we apply the principles of chaos engineering (CE) to industrial CPS in order to demonstrate the benefits of such practices for system resilience. The systemic uncertainty of adverse events can be tamed by applying runtime CE-based analyses to CPS in production, in order to anticipate changes in the environment and apply mitigation measures that limit the range and severity of an event and minimize its blast radius.