On Learning the cμ Rule in Single and Parallel Server Networks
We consider learning-based variants of the c μ rule for scheduling in single and parallel server settings of multi-class queueing systems. In the single server setting, the c μ rule is known to minimize the expected holding-cost (weighted queue-lengths summed over classes and a fixed time horizon). We focus on the problem where the service rates μ are unknown with the holding-cost regret (regret against the c μ rule with known μ) as our objective. We show that the greedy algorithm that uses empirically learned service rates results in a constant holding-cost regret (the regret is independent of the time horizon). This free exploration can be explained in the single server setting by the fact that any work-conserving policy obtains the same number of samples in a busy cycle. In the parallel server setting, we show that the c μ rule may result in unstable queues, even for arrival rates within the capacity region. We then present sufficient conditions for geometric ergodicity under the c μ rule. Using these results, we propose an almost greedy algorithm that explores only when the number of samples falls below a threshold. We show that this algorithm delivers constant holding-cost regret because a free exploration condition is eventually satisfied.
READ FULL TEXT