When Two is Worse Than One
This note is concerned with the impact on job latency of splitting a token bucket into multiple sub-token buckets with equal aggregate parameters and offered the same job arrival process. The situation commonly arises in distributed computing environments where job arrivals are rate controlled (each job needs one token to enter the system), but capacity limitations call for distributing jobs across multiple compute resources with scalability considerations preventing the use of a centralized rate control component (each compute resource is responsible for monitoring and enforcing that the job stream it receives conforms to a certain traffic envelope). The question we address is to what extent splitting a token bucket into multiple sub-token buckets that individually rate control a subset of the original arrival process affects job latency, when jobs wait for a token whenever the token bucket is empty upon their arrival. Our contribution is to establish that independent of the job arrival process and how jobs are distributed across compute resources (and sub-token buckets), splitting a token bucket always increases the sum of job latencies in the token buckets, and consequently the average job latency.
READ FULL TEXT