A Framework for Checkpointing and Recovery of Hierarchical Cyber-Physical Systems
This paper tackles the problem of making complex resource-constrained cyber-physical systems (CPS) resilient to sensor anomalies. In particular, we present a framework for checkpointing and roll-forward recovery of state-estimates in nonlinear, hierarchical CPS with anomalous sensor data. We introduce three checkpointing paradigms for ensuring different levels of checkpointing consistency across the hierarchy. Our framework has algorithms implementing the consistent paradigm to perform accurate recovery in a time-efficient manner while managing the tradeoff with system resources and handling the interplay between diverse anomaly detection systems across the hierarchy. Further in this work, we detail bounds on the recovered state-estimate error, maximum tolerable anomaly duration and the accuracy-resource gap that results from the aforementioned tradeoff. We explore use-cases for our framework and evaluate it on a case study of a simulated ground robot to show that it scales to multiple hierarchies and performs better than an extended Kalman filter (EKF) that does not incorporate a checkpointing procedure during sensor anomalies. We conclude the work with a discussion on extending the proposed framework to distributed systems.
READ FULL TEXT