Parallel Combining: Making Use of Free Cycles
There are two intertwined factors that affect the performance of concurrent data structures: the ability of processes to access the shared data in parallel and the cost of synchronization. It has been observed that, for a class of "concurrency-averse" data structures, the use of fine-grained locking for parallelization does not pay off: an implementation based on a single global lock outperforms fine-grained solutions. The combining paradigm exploits this observation: the thread holding the global lock combines requests and then executes them sequentially on behalf of other (waiting) concurrent threads. The downside is that the waiting threads sit idle, even when the concurrently submitted requests could be performed in parallel. In this paper, we propose parallel combining, a technique that leverages the computational power of the waiting threads: the combiner assigns the waiting threads to execute requests synchronously using a parallel algorithm. We discuss two applications of the technique. First, we use it to transform a sequential data structure into a concurrent one optimized for read-dominated workloads. Second, we use it to construct a concurrent data structure from a batched one that allows synchronous invocations of sets of operations. In both cases, we obtain significant performance gains with respect to state-of-the-art algorithms.
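To make the idea concrete, the sketch below shows one hypothetical way a combiner guarded by a single global lock could hand read requests back to waiting threads, in the spirit of the read-dominated application described above. The class and member names (ParallelCombiningMap, Request, RUN_SELF, and so on) are illustrative assumptions, not code from the paper: the combiner applies updates sequentially, then delegates pending reads to their owning threads, which execute them in parallel before the lock is released.

```java
// A minimal, hypothetical sketch of parallel combining over a map guarded by a
// single global lock. All names here are illustrative, not taken from the paper.
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.concurrent.locks.ReentrantLock;

public class ParallelCombiningMap {
    enum Op { GET, PUT }
    static final int PENDING = 0, RUN_SELF = 1, DONE = 2;

    static final class Request {
        final Op op; final int key, value;
        volatile Object result;        // filled in by whoever executes the request
        volatile int state = PENDING;  // PENDING -> DONE, or PENDING -> RUN_SELF -> DONE
        Request(Op op, int key, int value) { this.op = op; this.key = key; this.value = value; }
    }

    private final TreeMap<Integer, Integer> map = new TreeMap<>();
    private final ReentrantLock globalLock = new ReentrantLock();
    private final AtomicReferenceArray<Request> slots;  // one publication slot per thread

    public ParallelCombiningMap(int maxThreads) {
        slots = new AtomicReferenceArray<>(maxThreads);
    }

    public Object apply(int threadId, Op op, int key, int value) {
        Request req = new Request(op, key, value);
        slots.set(threadId, req);                        // announce the request
        while (req.state == PENDING) {
            if (globalLock.tryLock()) {                  // become the combiner
                try { combine(req); } finally { globalLock.unlock(); }
            }
        }
        if (req.state == RUN_SELF) {                     // the combiner delegated this read to us
            req.result = map.get(req.key);               // safe: only readers are active now
            req.state = DONE;
        }
        while (req.state != DONE) { Thread.onSpinWait(); }
        return req.result;
    }

    // Runs with the global lock held; 'mine' is the combiner's own request.
    private void combine(Request mine) {
        // Phase 1: apply all announced updates sequentially.
        for (int i = 0; i < slots.length(); i++) {
            Request r = slots.get(i);
            if (r != null && r.state == PENDING && r.op == Op.PUT) {
                slots.set(i, null);                      // retire the slot before completing
                r.result = map.put(r.key, r.value);
                r.state = DONE;
            }
        }
        // Phase 2: hand the announced reads back to their owners, to run in parallel.
        List<Request> delegated = new ArrayList<>();
        for (int i = 0; i < slots.length(); i++) {
            Request r = slots.get(i);
            if (r == null || r.state != PENDING || r.op != Op.GET) continue;
            slots.set(i, null);
            if (r == mine) {                             // the combiner executes its own read
                r.result = map.get(r.key);
                r.state = DONE;
            } else {                                     // the owner executes it in parallel
                delegated.add(r);
                r.state = RUN_SELF;
            }
        }
        // Hold the lock until every delegated read has finished, so no later
        // combiner can apply updates concurrently with the parallel read phase.
        for (Request r : delegated) {
            while (r.state != DONE) { Thread.onSpinWait(); }
        }
    }
}
```

Under these assumptions, the parallel phase only ever runs read-only operations against the sequential map, which is why delegating them is safe; a production implementation would also need backoff, bounded spinning, and a way to recycle publication slots.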