When to Reset Your Keys: Optimal Timing of Security Updates via Learning
Cybersecurity is increasingly threatened by advanced and persistent attacks. Because these attacks are often designed to disable a system (or a critical resource, e.g., a user account) repeatedly, the defender must keep updating its security measures, balancing the risk of being compromised against the cost of security updates. Moreover, these decisions often need to be made with limited and delayed feedback due to the stealthy nature of advanced attacks. Beyond targeted attacks, optimal timing under incomplete information has broad applications in cybersecurity, including key rotation, password changes, application of patches, and virtual machine refreshing. However, rigorous studies of optimal timing are rare. Further, existing solutions typically rely on a pre-defined attack model that is known to the defender, which is often not the case in practice. In this work, we make an initial effort towards achieving optimal timing of security updates in the face of unknown stealthy attacks. We consider a variant of the influential FlipIt game model with asymmetric feedback and an unknown attack time distribution, which provides a general model of consecutive security updates. The defender's problem is then modeled as a time-associative bandit problem with dependent arms. We derive learning policies based on upper confidence bounds that achieve low regret compared with optimal periodic defense strategies, which can only be derived when attack time distributions are known.
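To illustrate the idea of learning an update period with upper confidence bounds, the following is a minimal sketch using the textbook UCB1 rule over a finite set of candidate periods. It is not the policy derived in the paper: it treats the arms as independent and ignores the time-associative structure. The environment is entirely hypothetical, assuming exponentially distributed attack times, a fixed `move_cost`, and a per-round reward clipped to [0, 1]; all function and parameter names are illustrative.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """Pick an arm by the standard UCB1 rule; untried arms are played first."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(
        range(len(counts)),
        key=lambda i: sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i]),
    )


def simulate_round(period, attack_rate, move_cost, rng):
    """Hypothetical environment (an assumption, not the paper's model).

    The attacker compromises the resource after an exponential delay; the
    defender resets after `period`. The reward is the fraction of the period
    spent uncompromised minus the amortized update cost, clipped to [0, 1].
    """
    attack_time = rng.expovariate(attack_rate)
    safe_time = min(attack_time, period)
    reward = (safe_time - move_cost) / period
    return max(0.0, min(1.0, reward))


def run_ucb1(periods, rounds, attack_rate=1.0, move_cost=0.1, seed=0):
    """Run UCB1 over candidate periods; return per-arm play counts and reward sums."""
    rng = random.Random(seed)
    counts = [0] * len(periods)
    sums = [0.0] * len(periods)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, sums, t)
        reward = simulate_round(periods[arm], attack_rate, move_cost, rng)
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums


if __name__ == "__main__":
    candidate_periods = [0.25, 0.5, 1.0, 2.0, 4.0]  # illustrative values
    counts, sums = run_ucb1(candidate_periods, rounds=5000)
    for p, n, s in zip(candidate_periods, counts, sums):
        print(f"period={p}: plays={n}, mean reward={s / n:.3f}")
```

In this sketch each arm is a fixed reset period, so the per-round average for an arm estimates the long-run payoff per unit time of the corresponding periodic strategy. Handling dependent arms and asymmetric feedback, as the abstract describes, would require a different index than plain UCB1.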