Universal Caching
In the learning literature, the performance of an online policy is commonly measured in terms of the static regret metric, which compares the cumulative loss of the policy to that of an optimal benchmark in hindsight. In the definition of static regret, the benchmark policy remains fixed throughout the time horizon. Naturally, the resulting regret bounds become loose in non-stationary settings, where fixed benchmarks often perform poorly. In this paper, we investigate a stronger notion of regret minimization in the context of an online caching problem. In particular, we allow the action of the offline benchmark at any round to be decided by a finite-state predictor containing arbitrarily many states. Using ideas from the universal prediction literature in information theory, we propose an efficient online caching policy with an adaptive sub-linear regret bound. To the best of our knowledge, this is the first data-dependent regret bound known for the universal caching problem. We establish this result by combining a recently proposed online caching policy with an incremental parsing algorithm, such as Lempel-Ziv '78. Our methods also yield a simpler learning-theoretic proof of the improved regret bound, in contrast to the more involved and problem-specific combinatorial arguments used in earlier works.
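To make the incremental-parsing ingredient concrete, below is a minimal sketch (not from the paper) of Lempel-Ziv '78 parsing applied to a request sequence. LZ78 splits the sequence into distinct phrases, each the shortest extension of a previously seen phrase; in a universal-caching scheme of the kind the abstract describes, each phrase (a node of the LZ78 parse tree) could index its own instance of an online caching policy, though that wiring is an assumption here and the function name `lz78_parse` is hypothetical.

```python
def lz78_parse(requests):
    """Parse a sequence into LZ78 phrases and return the phrase list.

    Each new phrase is the shortest prefix of the remaining sequence not
    yet in the phrase dictionary, so the dictionary grows by one entry
    per phrase and the number of phrases grows sub-linearly in the
    sequence length over a finite alphabet.
    """
    phrases = {}   # phrase (tuple of symbols) -> phrase index
    current = ()   # phrase currently being extended
    parse = []
    for x in requests:
        candidate = current + (x,)
        if candidate in phrases:
            current = candidate       # keep extending a known phrase
        else:
            phrases[candidate] = len(phrases) + 1
            parse.append(candidate)   # emit the newly discovered phrase
            current = ()              # restart from the root
    if current:
        parse.append(current)         # trailing partial phrase, if any
    return parse

# Highly repetitive request streams produce few, slowly growing phrases:
print(lz78_parse("abababab"))
# [('a',), ('b',), ('a', 'b'), ('a', 'b', 'a'), ('b',)]
```

The key property the abstract exploits is that the number of distinct phrases grows sub-linearly in the sequence length, which is one standard route to adaptive, data-dependent regret bounds against finite-state benchmarks.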