On Learning the cμ Rule: Single and Multiserver Settings
We consider learning-based variants of the c μ rule -- a classic and well-studied scheduling policy -- in single and multi-server settings for multi-class queueing systems. In the single server setting, the c μ rule is known to minimize the expected holding-cost (weighted queue-lengths summed both over classes and time). We focus on the setting where the service rates μ are unknown, and are interested in the holding-cost regret -- the difference in the expected holding-costs between that induced by a learning-based rule (that learns μ) and that from the c μ rule (which has knowledge of the service rates) over any fixed time horizon. We first show that empirically learning the service rates and then scheduling using these learned values results in a regret of holding-cost that does not depend on the time horizon. The key insight that allows such a constant regret bound is that a work-conserving scheduling policy in this setting allows explore-free learning, where no penalty is incurred for exploring and learning server rates. We next consider the multi-server setting. We show that in general, the c μ rule is not stabilizing (i.e. there are stabilizable arrival and service rate parameters for which the multi-server c μ rule results in unstable queues). We then characterize sufficient conditions for stability (and also concentrations on busy periods). Using these results, we show that learning-based variants of the cμ rule again result in a constant regret (i.e. does not depend on the time horizon). This result hinges on (i) the busy period concentrations of the multi-server c μ rule, and that (ii) our learning-based rule is designed to dynamically explore server rates, but in such a manner that it eventually satisfies an explore-free condition.
READ FULL TEXT