Intermediate Value Linearizability: A Quantitative Correctness Criterion

by Arik Rinberg, et al.

Big data processing systems often employ batched updates and data sketches to estimate certain properties of large data. For example, a CountMin sketch approximates the frequencies at which elements occur in a data stream, and a batched counter counts events in batches. This paper focuses on the correctness of concurrent implementations of such objects. Specifically, we consider quantitative objects, whose return values are from a totally ordered domain, with an emphasis on (e,d)-bounded objects that estimate a quantity with an error of at most e with probability at least 1 - d. The de facto correctness criterion for concurrent objects is linearizability. Under linearizability, when a read overlaps an update, it must return the object's value either before the update or after it. Consider, for example, a single batched increment operation that counts three new events, bumping a batched counter's value from 7 to 10. In a linearizable implementation of the counter, an overlapping read must return one of these. We observe, however, that in typical use cases, any intermediate value would also be acceptable. To capture this degree of freedom, we propose Intermediate Value Linearizability (IVL), a new correctness criterion that relaxes linearizability to allow returning intermediate values, for instance 8 in the example above. Roughly speaking, IVL allows reads to return any value that is bounded between two return values that are legal under linearizability. A key feature of IVL is that concurrent IVL implementations of (e,d)-bounded objects remain (e,d)-bounded. To illustrate the power of this result, we give a straightforward and efficient concurrent implementation of an (e, d)-bounded CountMin sketch, which is IVL (albeit not linearizable). Finally, we show that IVL allows for inherently cheaper implementations than linearizable ones.




1 Introduction

1.1 Motivation

Big data processing systems often perform analytics on incoming data streams, and must do so at a high rate due to the speed of incoming data. Data sketching algorithms, or sketches for short [cormode2012synopses], are an indispensable tool for such high-speed computations. Sketches typically estimate some function of a large stream, for example, the frequency of certain items [cormode2005improved], how many unique items have appeared [datar2002comparing, flajolet1983probabilistic, gibbons2001estimating], or the top-k most common items [metwally2005efficient]. They are supported by many data analytics platforms such as PowerDrill [heule2013hyperloglog], Druid [druid], Hillview [hillview], and Presto [presto], as well as standalone toolkits [apache-datasketches].

Sketches are quantitative objects that support update and query operations, where the return value of a query is from a totally ordered set. They are essentially succinct (sublinear) summaries of a data stream. For example, a sketch might estimate the number of packets originating from any IP address, without storing a record for every packet. Typical sketches are probably approximately correct (PAC), estimating some aggregate quantity with an error of at most ε with probability at least 1 − δ, for some parameters ε and δ.

The ever increasing rates of incoming data create a strong demand for parallel stream processing [cormode2011algorithms, heule2013hyperloglog]. In order to allow queries to return fresh results in real-time without hampering data ingestion, it is paramount to support queries concurrently with updates [rinberg2019fast, stylianopoulos2020delegation]. But parallelizing sketches raises some important questions, for instance: What are the semantics of overlapping operations in a concurrent sketch?, How can we prove error guarantees for such a sketch?, and, in particular, Can we reuse the myriad of clever analyses of existing sketches’ error bounds in parallel settings without opening the black box? In this paper we address these questions.

1.2 Our contributions

The most common correctness condition for concurrent objects is linearizability. Roughly speaking, it requires each parallel execution to have a linearization, which is a sequential execution of the object that “looks like” the parallel one. (See Section 2 for a formal definition.) But sometimes linearizability is too restrictive, leading to a high implementation cost.

In Section 3, we propose Intermediate Value Linearizability (IVL), a new correctness criterion for quantitative objects. Intuitively, the return value of an operation of an IVL object is bounded between two legal values that can be returned in linearizations. The motivation for allowing this is that if the system designer is happy with either of the legal values, then an intermediate value should also be fine. For example, consider a system where processes count events, and a monitoring process detects when the number of events passes a threshold. The monitor constantly reads a shared counter, which other processes increment in batches. If an operation increments the counter from 7 to 10, batching three events, IVL allows a concurrent read by the monitoring process to return 8, although there is no linearization in which the counter holds 8. We formally define IVL and prove that this property is local, meaning that a history composed of IVL objects is itself IVL. This allows reasoning about single objects rather than about the system as a whole. We formulate IVL first for sequential objects, and then extend it to capture randomized ones.
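The counting scenario above can be simulated in a few lines of Python (our own illustration, not code from the paper): a batched increment applied one event at a time, recording the counter values a concurrent read could observe mid-batch.

```python
# Illustrative sketch: a batch of 3 events applied as single increments.
# A read landing mid-batch may observe 8 or 9, which IVL permits even
# though no linearization yields these values.
counter = 7

def batched_increment(n):
    """Apply a batch of n events one step at a time, recording the
    counter value visible after each step."""
    global counter
    observed = []
    for _ in range(n):
        counter += 1
        observed.append(counter)  # values a concurrent read could see
    return observed

values = batched_increment(3)
assert values == [8, 9, 10]  # 8 and 9 are intermediate, IVL-legal values
```

Under IVL, any value in [7, 10] is an acceptable return for a read overlapping this batch.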

Sketching algorithms have sequential error analyses, which we wish to leverage for the concurrent case. In Section 4 we formally define (ε,δ)-bounded objects, including concurrent ones. We then prove a key theorem about IVL, stating that an IVL implementation of a sequential (ε,δ)-bounded object is itself (ε,δ)-bounded. The importance of this theorem is that it provides a generic way to leverage the vast literature on sequential (ε,δ)-bounded sketches [morris1978counting, flajolet1985approximate, cichon2011approximate, liu2016one, cormode2005improved, agarwal2013mergeable] in concurrent implementations.

As an example, in Section 5, we present a concurrent CountMin sketch [cormode2005improved], which estimates the frequencies of items in a data stream. We prove that a straightforward parallelization of this sketch is IVL. By the aforementioned theorem, we deduce that the concurrent sketch adheres to the error guarantees of the original sequential one, without having to “open” the analysis. We note that this parallelization is not linearizable.

Finally, we show that IVL is sometimes inherently cheaper than linearizability. We illustrate this in Section 6 via the example of a batched counter. We present a wait-free IVL implementation of this object from single-writer-multi-reader (SWMR) registers with O(1) step complexity for update operations. We then prove a lower bound of Ω(n) step complexity for the update operation of any wait-free linearizable implementation using only SWMR registers, where n is the number of processes. This exemplifies an inherent and unavoidable cost of linearizable algorithms, which can be circumvented by implementing IVL algorithms instead.

2 Preliminaries

Section 2.1 discusses deterministic shared memory objects and defines linearizability. In Section 2.2 we discuss randomized algorithms and their correctness criteria.

2.1 Deterministic objects

We consider a standard shared memory model [herlihy1990linearizability], where a set of processes access atomic shared memory variables. Accessing these shared variables is instantaneous. Processes take steps according to an algorithm, which is a deterministic state machine; a step can access a shared memory variable, do local computations, and possibly return some value. An execution of an algorithm is an alternating sequence of steps and states. We focus on algorithms that implement objects, which support operations, such as read and write. Operations begin with an invocation step and end with a response step. A schedule, denoted σ, is the order in which processes take steps, and the operations they invoke in invocation steps with their parameters. Because we consider deterministic algorithms, σ uniquely defines an execution of a given algorithm.

A history is the sequence of invoke and response steps in an execution. Given an algorithm A and a schedule σ, H(A, σ) is the history of the execution of A with schedule σ. A sequential history is an alternating sequence of invocations and their responses, beginning with an invoke step. We denote the return value of operation op in history H by rval_H(op). We refer to the invocation step of operation op with parameter arg by process i as invoke_i(op, arg), and to its response step with return value v as response_i(op, v). A history defines a partial order on operations: operation op_1 precedes op_2 in history H, denoted op_1 ≺_H op_2, if op_1's response precedes op_2's invocation in H. Two operations are concurrent if neither precedes the other.

A well-formed history is one that does not contain concurrent operations by the same process, and in which every response event for an operation op is preceded by an invocation of the same operation. A schedule is well-formed if it gives rise to a well-formed history, and an execution is well-formed if it is based on a well-formed schedule. We denote by H|x the sub-history of H consisting only of invocations and responses on object x. Operation op is pending in a history H if op is invoked in H but does not return.

Correctness of an object's implementation is defined with respect to a sequential specification S, which is the object's set of allowed sequential histories. If the history spans multiple objects, S consists of the sequential histories H such that for every object x, H|x pertains to x's sequential specification S(x). A linearization [herlihy1990linearizability] of a concurrent history H is a sequential history l such that (1) after removing some pending operations from H and completing others, l contains the same invocations and responses as H, with the same parameters and return values, and (2) l preserves the partial order ≺_H. Algorithm A is a linearizable implementation of a sequential specification S if every history of a well-formed execution of A has a linearization in S.

2.2 Randomized algorithms

In randomized algorithms, processes have access to coin flips from some domain Ω. Every execution is associated with a coin flip vector c = (c_1, c_2, …), where c_i is the i-th coin flip in the execution. A randomized algorithm A is a probability distribution over deterministic algorithms {A_c} (we do not consider non-deterministic objects in this paper), arising when A is instantiated with different coin flip vectors. We denote by H(A_c, σ) the history of the execution of randomized algorithm A observing coin flip vector c in schedule σ.

Golab et al. show that randomized algorithms that use concurrent objects require a stronger correctness criterion than linearizability, and propose strong linearizability [golab2011linearizable]. Roughly speaking, strong linearizability stipulates that the mapping of histories to linearizations must be prefix-preserving, so that future coin flips cannot impact the linearization order of earlier events. In contrast to us, they consider deterministic objects used by randomized algorithms. In this paper, we focus on randomized object implementations.

3 Intermediate value linearizability

Section 3.1 introduces definitions that we utilize to define IVL. Section 3.2 defines IVL for deterministic algorithms and proves that it is a local property. Section 3.3 extends IVL for randomized algorithms, and Section 3.4 compares IVL to other correctness criteria.

3.1 Definitions

Throughout this paper we consider the strongest progress guarantee, bounded wait-freedom. An operation op is bounded wait-free if, whenever any process i invokes op, op returns a response in a bounded number of i's steps, regardless of steps taken by other processes. An operation's step complexity is the maximum number of steps a process takes during a single execution of this operation. We can convert every bounded wait-free algorithm to a uniform step complexity one, in which each operation takes the exact same number of steps in every execution. This can be achieved by padding shorter execution paths with empty steps before returning. Note that in a randomized algorithm with uniform step complexity, coin flips have no impact on the algorithm's execution times. For the remainder of this paper, we consider algorithms with uniform step complexity.

Our definitions use the notion of skeleton histories: a skeleton history is a sequence of invocation and response events, where the return values of the responses are undefined, denoted ⊥. For a history H, we define the operator ⊥ as altering all response values in H to ⊥, resulting in a skeleton history, which we denote H^⊥.
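In Python terms (our own modeling, not the paper's), a history can be represented as a list of invocation and response events, and the ⊥ operator simply blanks all response values:

```python
# Model a history as a list of (kind, process, op, value) events; the
# skeleton operator replaces every response value with None (standing in
# for the undefined value ⊥).
history = [
    ("invoke",   "p1", "update", 5),
    ("invoke",   "p2", "query",  None),
    ("response", "p2", "query",  3),
    ("response", "p1", "update", None),
]

def skeleton(h):
    """Blank the return value of every response event."""
    return [(k, p, op, None) if k == "response" else (k, p, op, v)
            for (k, p, op, v) in h]

assert skeleton(history)[2] == ("response", "p2", "query", None)
```

The event order is untouched; only response values are erased, which is exactly what makes skeletons independent of coin flips later on.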

In this paper we formulate correctness criteria for a class of objects we call quantitative. These are objects that support two operations: (1) update, which captures all mutating operations and does not return a value; and (2) query, which returns a value from a totally ordered domain. In a deterministic quantitative object the return values of query operations are uniquely defined. Namely, the object's sequential specification S contains exactly one history for every sequential history skeleton H; we denote this history by τ(H). Thus, τ(H^⊥) = H for every history H ∈ S. Furthermore, for every sequential skeleton history H, by definition, τ(H) ∈ S.

Example 1.

Consider an execution in which a batched counter initialized to 0 is incremented by 5 by process p1 concurrently with a query by process p2, which returns v. Its history is:

H = invoke_p1(update, 5), invoke_p2(query), response_p2(query, v), response_p1(update).

The skeleton history H^⊥ is:

H^⊥ = invoke_p1(update, 5), invoke_p2(query), response_p2(query, ⊥), response_p1(update).

A possible linearization l of H^⊥ is:

l = invoke_p1(update, 5), response_p1(update), invoke_p2(query), response_p2(query, ⊥).

Given the sequential specification of a batched counter, the query in this linearization returns 5. In a different linearization, in which the query precedes the update, the query may return 0 instead.

3.2 Intermediate value linearizability

We now define intermediate value linearizability for quantitative objects.

Definition 1 (Intermediate value linearizability).

A history H of an object is IVL with respect to a sequential specification S if there exist two linearizations l_1, l_2 of H^⊥ such that for every query Q that returns v in H,

rval_{τ(l_1)}(Q) ≤ v ≤ rval_{τ(l_2)}(Q),

where τ(l) denotes the unique history in S whose skeleton is l.

Algorithm A is an IVL implementation of a sequential specification S if every history of a well-formed execution of A is IVL with respect to S.

Note that a linearizable object is trivially IVL, as the linearization of H plays the roles of both l_1 and l_2. The following theorem, proven in Appendix A, shows that this property is local (as defined in [herlihy1990linearizability]):

Theorem. A history H of a well-formed execution of algorithm A over a set of objects X is IVL if and only if for each object x ∈ X, H|x is IVL.

Locality allows system designers to reason about their system in a modular fashion. Each object can be built separately, and the system as a whole still satisfies the property.

3.3 Extending IVL for randomized algorithms

In a randomized algorithm with uniform step complexity, every invocation of a given operation returns after the same number of steps, regardless of the coin flip vector c. This, in turn, implies that for a given schedule σ and any c_1, c_2, the arising histories H(A_c1, σ) and H(A_c2, σ) differ only in the operations' return values but not in the order of invocations and responses, as the latter is determined by σ; their skeletons are therefore equal. For randomized algorithm A and schedule σ, we denote this arising skeleton history by H^⊥(A, σ).

We are faced with a dilemma when defining the specification of a randomized algorithm A, as the algorithm itself is a distribution over a set of algorithms {A_c}. Without knowing the observed coin flip vector c, the execution behaves unpredictably. We therefore define a deterministic sequential specification S_c for each coin flip vector c, so the sequential specification is a probability distribution over a set of sequential specifications {S_c}.

A correctness criterion for randomized objects needs to capture the property that the distribution of a randomized algorithm's outcomes matches the distribution of behaviors allowed by the specification. Consider, e.g., some sequential skeleton history H of an object defined by {S_c}. Let Q be a query that returns ⊥ in H, and assume that in τ_c(H) (the unique history in S_c whose skeleton is H), Q has some probability p to return a value in a given range for a randomly sampled c. Intuitively, we would expect that if a randomized algorithm A "implements" the specification {S_c}, then Q has a similar probability to return a value in that range in sequential executions of A with the same history, and to some extent also in concurrent executions of A of which H is a linearization. In other words, we would like the distribution of outcomes of A to match the distribution of outcomes in {S_c}.

We observe that in order to achieve this, it does not suffice to require that each history have an arbitrary linearization as we did for deterministic objects, because this might not preserve the desired distribution. Instead, for randomized objects we require a common linearization for each skeleton history that will hold true under all possible coin flip vectors. We therefore formally define IVL for randomized objects as follows:

Definition 2 (IVL for randomized algorithms).

Consider a skeleton history H = H^⊥(A, σ) of some randomized algorithm A with schedule σ. H is IVL with respect to {S_c} if there exist linearizations l_1, l_2 of H such that for every coin flip vector c and query Q that returns v_c in H(A_c, σ),

rval_{τ_c(l_1)}(Q) ≤ v_c ≤ rval_{τ_c(l_2)}(Q),

where τ_c(l) denotes the unique history in S_c whose skeleton is l.

Algorithm A is an IVL implementation of a sequential specification distribution {S_c} if every skeleton history of a well-formed execution of A is IVL with respect to {S_c}.

Note that since we require a common linearization under all coin flip vectors, we do not need to strengthen IVL for randomized settings in the manner that strong linearizability strengthens linearizability. This is because the linearizations we consider are a fortiori independent of future coin flips.

3.4 Relationship to other relaxations

In spirit, IVL resembles the regularity correctness condition for single-writer registers [lamport1986interprocess], where a query must return either a value written by a concurrent write or the last value written by a write that completed before the query began. Stylianopoulos et al. [stylianopoulos2020delegation] adopt a similar condition for data sketches, which they informally describe as follows: “a query takes into account all completed insert operations and possibly a subset of the overlapping ones.” If the object’s estimated quantity (return value) is monotonically increasing throughout every execution, then IVL essentially formalizes this condition, while also allowing intermediate steps of a single update to be observed. But this is not the case in general. Consider, for example, an object supporting increment and decrement, and a query that occurs concurrently with an increment and an ensuing decrement. If the query takes only the decrement into account (and not the increment), it returns a value that is smaller than all legal return values that may be returned in linearizations, which violates IVL. Our interval-based formalization is instrumental to ensuring that a concurrent IVL implementation preserves the probabilistic error bounds of the respective sequential sketch, which we prove in the next section.

Previous work on set-linearizability [neiger1994set] and interval-linearizability [castaneda2018unifying] has also relaxed linearizability, allowing a larger set of return values in the presence of overlapping operations. The return values, however, must be specified in advance by a given state machine; operations’ effects on one another must be predefined. In contrast to these, IVL is generic, and does not require additional object-specific definitions; it provides an intuitive quantitative bound on possible return values of arbitrary quantitative objects.

Henzinger et al. [henzinger2013quantitative] define the quantitative relaxation framework, which allows executions to differ from the sequential specification up to a bounded cost function. Alistarh et al. expand upon this and define distributional linearizability [alistarh2018distributionally], which requires a distribution over the internal states of the object for its error analysis. Rinberg et al. consider strongly linearizable r-relaxed semantics for randomized objects [rinberg2019fast]. We differ from these works in two ways: first, a sequential history of an IVL object must adhere to the sequential specification, whereas in these relaxations even a sequential history may diverge from it. Second, these relaxations are measured with respect to a single linearization, whereas we bound the return value between two legal linearizations. The latter is the key to preserving the error bounds of sequential objects, as we show next.

4 (ε,δ)-bounded objects

In this section we show that for a large class of randomized objects, IVL concurrent implementations preserve the error bounds of the respective sequential ones. More specifically, we focus on randomized objects like data sketches, which estimate some quantity (or quantities) with probabilistic guarantees. Sketches generally support two operations: update(a), which processes element a, and query(arg), which returns the quantity estimated by the sketch as a function of the previously processed elements. Sequential sketch algorithms typically have probabilistic error bounds. For example, the Quantiles sketch estimates the rank of a given element in a stream of n elements to within εn of its true rank, with probability at least 1 − δ [agarwal2013mergeable].

We consider in this section a general class of (ε,δ)-bounded objects capturing PAC algorithms. A bounded object's behavior is defined relative to a deterministic sequential specification S, which uniquely defines the ideal return value for every query in a sequential execution. In an (ε,δ)-bounded object, each query returns the ideal return value within an error of at most ε with probability at least 1 − δ. More specifically, it over-estimates (and similarly under-estimates) the ideal quantity by at most ε with probability at least 1 − δ. Formally:

Definition 3.

A sequential randomized algorithm A implements an (ε,δ)-bounded object if for every query Q returning v in an execution of A_c with any schedule σ and a randomly sampled coin flip vector c,

Pr[v ≤ I(Q) + ε] ≥ 1 − δ  and  Pr[v ≥ I(Q) − ε] ≥ 1 − δ,

where I(Q) is the ideal return value of Q, induced by the deterministic specification S.


A sequential algorithm satisfying Definition 3 induces a sequential specification distribution {S_c} of an (ε,δ)-bounded object. We next discuss parallel implementations of this specification.

To this end, we must specify a correctness criterion on the object’s concurrent executions. As previously stated, the standard notion (for randomized algorithms) is strong linearizability, stipulating that we can “collapse” each operation in the concurrent schedule to a single point in time. Intuitively, this means that every query returns a value that could have been returned by the randomized algorithm at some point during its execution interval. So the query returns an approximation of the ideal value at that particular point. But this point is arbitrarily chosen, meaning that the query may return an approximation of any value that the ideal object takes during the query’s execution. We therefore look at the minimum and maximum values that the ideal object may take during a query’s interval, and bound the error relative to these values.

We first define these minimum and maximum values as follows: for a history H, denote by Lin(H) the set of linearizations of H^⊥. For a query Q that returns in H and an ideal specification S, we define:

v_min(Q) ≜ min_{l ∈ Lin(H)} rval_{τ(l)}(Q),    v_max(Q) ≜ max_{l ∈ Lin(H)} rval_{τ(l)}(Q).

Note that even if H is infinite and has infinitely many linearizations, because Q returns in H, it appears in each linearization by the end of its execution interval, and therefore Q can return only a finite number of different values in these linearizations, so the minimum and maximum are well-defined. Correctness of concurrent (ε,δ)-bounded objects is then formally defined as follows:
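For a toy history, v_min and v_max can be computed by brute force over all linearizations. The following sketch (our own illustration, not code from the paper) does so for a batched counter with a single update(5) concurrent with a single read:

```python
from itertools import permutations

# Brute-force v_min and v_max for a toy history of a batched counter:
# one update(5) concurrent with one read. Each linearization orders the
# read either before or after the update.
ops = [("update", 5), ("read", None)]

def read_value(linearization):
    """Return value of the read in a sequential batched-counter execution."""
    total = 0
    for op, arg in linearization:
        if op == "update":
            total += arg
        else:
            return total

values = {read_value(lin) for lin in permutations(ops)}
v_min, v_max = min(values), max(values)
assert (v_min, v_max) == (0, 5)  # IVL allows any read value in [0, 5]
```

With more concurrent operations the enumeration grows factorially, but the principle is the same: v_min and v_max bracket every IVL-legal return value.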

Definition 4.

A concurrent randomized algorithm A implements an (ε,δ)-bounded object if for every query Q returning v in an execution of A_c with any schedule σ and a randomly sampled coin flip vector c,

Pr[v ≤ v_max(Q) + ε] ≥ 1 − δ  and  Pr[v ≥ v_min(Q) − ε] ≥ 1 − δ.


In some algorithms, ε depends on the stream size, i.e., the number of updates preceding a query; to avoid cumbersome notation we use a single variable ε, which should be set to the maximum value that the sketch's error bound takes during the query's execution interval.

The following theorem shows that IVL implementations allow us to leverage the “legacy” analysis of a sequential object’s error bounds.

Theorem 1.

Consider a sequential specification distribution {S_c} of an (ε,δ)-bounded object (Definition 3). Let A be an IVL implementation of {S_c} (Definition 2). Then A implements a concurrent (ε,δ)-bounded object (Definition 4).


Proof. Consider a skeleton history H of A with some schedule σ, and a query Q that returns v_c in H(A_c, σ). As A is an IVL implementation of {S_c}, there exist linearizations l_1 and l_2 of H such that for every c, rval_{τ_c(l_1)}(Q) ≤ v_c ≤ rval_{τ_c(l_2)}(Q). As {S_c} captures a sequential (ε,δ)-bounded object, these return values are bounded as follows:

Pr[rval_{τ_c(l_1)}(Q) ≥ I_{l_1}(Q) − ε] ≥ 1 − δ  and  Pr[rval_{τ_c(l_2)}(Q) ≤ I_{l_2}(Q) + ε] ≥ 1 − δ,

where I_l(Q) is the ideal return value of Q in linearization l.

Furthermore, by definition of v_min(Q) and v_max(Q):

v_min(Q) ≤ I_{l_1}(Q)  and  I_{l_2}(Q) ≤ v_max(Q).

Therefore, with probability at least 1 − δ, v_c ≥ v_min(Q) − ε, and with probability at least 1 − δ, v_c ≤ v_max(Q) + ε, as needed. ∎

While easy to prove, Theorem 1 shows that IVL is in some sense the "right" correctness property for (ε,δ)-bounded objects. It is less restrictive than linearizability, and, as we show below, sometimes cheaper to implement, yet strong enough to preserve the salient properties of sequential executions of (ε,δ)-bounded objects. As noted in Section 3.4, previously suggested relaxations do not inherently guarantee that error bounds are preserved. For example, regular-like semantics, where a query "sees" some subset of the concurrent updates [stylianopoulos2020delegation], satisfy IVL (and hence bound the error) for monotonic objects, albeit not for general ones. Indeed, if object values can both increase and decrease, the results returned under such regular-like semantics can arbitrarily diverge from possible sequential ones.

The importance of Theorem 1 is that it allows us to leverage the vast literature on sequential (ε,δ)-bounded objects [morris1978counting, flajolet1985approximate, cichon2011approximate, liu2016one, cormode2005improved, agarwal2013mergeable] in concurrent implementations. In the next section we present an IVL parallelization of a popular data sketch; by Theorem 1, it preserves the original sketch's error bounds.

5 Concurrent CountMin sketch

Cormode et al. propose the CountMin (CM) sketch [cormode2005improved], which estimates the frequency f(a) of each item a in a data stream over some alphabet U. The CM sketch supports two operations: update(a), which updates the object based on a, and query(a), which returns an estimate of the number of update(a) calls that preceded the query.

The sequential algorithm's underlying data structure is a d × w matrix C of counters, for parameters d and w determined according to the desired error and probability bounds. The sketch uses d hash functions h_1, …, h_d, where h_i : U → {1, …, w} for 1 ≤ i ≤ d. The hash functions are generated using the random coin flip vector c, and have certain mathematical properties whose details are not essential for understanding this paper. The algorithm's input (i.e., the schedule) is generated by a so-called weak adversary, namely, the input is independent of the randomly drawn hash functions.

The CountMin sketch, denoted CM, is illustrated in Figure LABEL:img:cmSketch, and its pseudo-code is given in Algorithm 1. On update(a), the sketch increments the counters C[i][h_i(a)] for every 1 ≤ i ≤ d. query(a) returns min_{1≤i≤d} C[i][h_i(a)].

1:array C[1…d][1…w], initialized to 0
2:hash functions h_1, …, h_d, initialized using c
4:procedure update(a)
5:     for i = 1, …, d do
6:         atomically increment C[i][h_i(a)]
7:procedure query(a)
8:     m ← ∞
9:     for i = 1, …, d do
10:         v_i ← C[i][h_i(a)]
11:         if v_i < m then m ← v_i
12:     return m
Algorithm 1 CountMin (CM) sketch.
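For concreteness, here is a minimal sequential Python rendering of Algorithm 1 (our own sketch; the salted built-in hash stands in for the pairwise-independent hash family the analysis assumes):

```python
import random

class CountMinSketch:
    """Sequential CountMin in the spirit of Algorithm 1. The hash family
    here is a simplification of the hashes the error analysis assumes."""
    def __init__(self, d, w, seed=0):
        self.d, self.w = d, w
        rng = random.Random(seed)
        # One random salt per row plays the role of the coin flip vector c.
        self.salts = [rng.getrandbits(64) for _ in range(d)]
        self.C = [[0] * w for _ in range(d)]

    def _h(self, i, a):
        return hash((self.salts[i], a)) % self.w

    def update(self, a):
        for i in range(self.d):
            self.C[i][self._h(i, a)] += 1  # increment (line 6)

    def query(self, a):
        # min over rows (lines 8-12)
        return min(self.C[i][self._h(i, a)] for i in range(self.d))

cm = CountMinSketch(d=4, w=64)
for x in ["a"] * 5 + ["b"] * 3:
    cm.update(x)
assert cm.query("a") >= 5  # CM never under-estimates a frequency
assert cm.query("b") >= 3
```

Hash collisions can only inflate counters, which is why the estimate is an upper bound on the true frequency.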

Cormode et al. show that, for desired bounds ε and δ, given appropriate values of d and w, with probability at least 1 − δ, the estimate v returned by a query satisfies f(a) ≤ v ≤ f(a) + εN, where f(a) is the true frequency of a and N is the number of updates preceding the query. Thus, taking the error to be εN, CM is a sequential (εN, δ)-bounded object. Its sequential specification distribution is {S_c}, where S_c is determined by the hash functions drawn using c.

Proving an error bound for an efficient parallel implementation of the CM sketch for existing criteria is not trivial. Using the framework defined by Rinberg et al. [rinberg2019fast] requires the query to take a strongly linearizable snapshot of the matrix [ovens2019strongly]. Distributional linearizability [alistarh2018distributionally] necessitates an analysis of the error bounds directly in the concurrent setting, without leveraging the sketch’s existing analysis for the sequential setting.

Instead, we utilize IVL to leverage the sequential analysis for a parallelization that is not strongly linearizable (or indeed linearizable), without using a snapshot. Consider the straightforward parallelization of the CM sketch, whereby the operations of Algorithm 1 may be invoked concurrently, and each counter is atomically incremented on line 6 and read on line 10. We call this parallelization the concurrent CM sketch. We next prove that it is IVL.
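The straightforward parallelization can be mimicked in Python as follows (our own illustration: per-cell locks stand in for atomic counter increments, and the query takes no snapshot of the matrix):

```python
import threading

# A sketch of the straightforward parallelization: updates increment cells
# "atomically" (here via per-cell locks), and queries read cells one by one
# with no snapshot. Hash scheme simplified as before.
d, w = 4, 64
C = [[0] * w for _ in range(d)]
locks = [[threading.Lock() for _ in range(w)] for _ in range(d)]
salts = list(range(d))

def h(i, a):
    return hash((salts[i], a)) % w

def update(a):
    for i in range(d):
        with locks[i][h(i, a)]:
            C[i][h(i, a)] += 1  # atomic increment (line 6)

def query(a):
    return min(C[i][h(i, a)] for i in range(d))  # no snapshot

threads = [threading.Thread(target=update, args=("a",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert query("a") == 8  # all 8 updates completed before this query
```

A query that runs concurrently with the updates could instead observe any value between the counts at its start and at its end, which is precisely what the IVL proof below formalizes.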

Lemma 1.

The concurrent CM sketch is an IVL implementation of CM's sequential specification distribution {S_c}.


Proof. Let H be a history of an execution of the concurrent CM sketch. Let l_1 be a linearization of H^⊥ such that every query is linearized prior to every concurrent update, and let l_2 be a linearization of H^⊥ such that every query is linearized after every concurrent update. Let E_j, for j ∈ {1, 2}, be a sequential execution of CM with history τ_c(l_j) (i.e., l_j with return values filled in according to S_c). Consider some query(a) that returns v in H.

Denote by v_i the value read by query(a) from C[i][h_i(a)] in line 10 of Algorithm 1 in the concurrent execution, and by v_i^1 and v_i^2 the corresponding values read in E_1 and E_2, respectively. As processes only increment counters, for every i, v_i is at least v_i^1 (the value when the query starts) and at most v_i^2 (the value after all updates concurrent to the query complete). Therefore, v_i^1 ≤ v_i ≤ v_i^2.

Consider a randomly sampled coin flip vector c. Since v = min_i v_i, for every j we have v ≤ v_j ≤ v_j^2, and therefore v ≤ min_j v_j^2 = rval_{τ_c(l_2)}(query(a)). Furthermore, let i be the loop index the last time the query alters the value of its local variable m (line 11), i.e., the index of the minimum read value, so that v = v_i. Then v = v_i ≥ v_i^1 ≥ min_j v_j^1 = rval_{τ_c(l_1)}(query(a)). Therefore:

rval_{τ_c(l_1)}(query(a)) ≤ v ≤ rval_{τ_c(l_2)}(query(a)),

as needed. ∎

Combining Lemma 1 and Theorem 1, and utilizing the sequential error analysis of [cormode2005improved], we obtain the following corollary:

Corollary 1.

Let v be the return value of some query(a). Let f_1(a) be the ideal frequency of element a when the query starts, let f_2(a) be its ideal frequency when the query ends, and let ε' = εN, where N is the stream length at the end of the query. Then:

Pr[v ≥ f_1(a) − ε'] ≥ 1 − δ  and  Pr[v ≤ f_2(a) + ε'] ≥ 1 − δ.

The following example demonstrates that the concurrent CM sketch is not a linearizable implementation of CM.

Example 2.

Consider the following execution of the concurrent CM sketch with d = 2: assume that c is such that h_1(a) = h_1(b) and h_2(a) ≠ h_2(b), and assume that initially C[1][h_1(a)] = 0, C[2][h_2(a)] = 0, and C[2][h_2(b)] = 1 (e.g., due to a previously completed update of an element that collides with b only in row 2).

First, process p_1 invokes update(a), which increments C[1][h_1(a)] to 1 and stalls. Then, process p_2 invokes query(b), which reads C[1][h_1(b)] = 1 and C[2][h_2(b)] = 1 and returns 1, followed by query(a), which reads C[1][h_1(a)] = 1 and C[2][h_2(a)] = 0 and returns 0. Finally, p_1 increments C[2][h_2(a)] to 1.

Assume by contradiction that l is a linearization of this history. The return values imply that update(a) ≺_l query(b) (otherwise query(b) would return 0) and query(a) ≺_l update(a) (otherwise query(a) would return at least 1). As l is a linearization, it preserves the partial order of operations in the history, so query(b) ≺_l query(a). Together, update(a) ≺_l query(b) ≺_l query(a) ≺_l update(a), which is a contradiction.

6 Shared batched counter

We now show an example where IVL is inherently less costly than linearizability. In Section 6.1 we present an IVL batched counter, and show that its update operation has O(1) step complexity. The algorithm uses single-writer-multi-reader (SWMR) registers. In Section 6.2 we prove that in any wait-free linearizable implementation of a batched counter from SWMR registers, the update operation has Ω(n) step complexity, where n is the number of processes.

6.1 IVL batched counter

We consider a batched counter object, which supports the operations update(x), where x ≥ 0, and read(). The sequential specification of this object is simple: a read operation returns the sum of all values passed to update operations that precede it, or 0 if no update operations were invoked. The update operation returns nothing. When the object is shared, we denote an invocation of update by process i as update_i. We denote the sequential specification of the batched counter by .
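The sequential specification above can be written in a few lines of Python (an illustrative sketch; the class name is ours, not the paper's):

```python
# Sequential batched counter: read returns the sum of all values
# passed to preceding update operations (0 if there were none).
class SeqBatchedCounter:
    def __init__(self):
        self._sum = 0

    def update(self, x):
        self._sum += x  # update returns nothing

    def read(self):
        return self._sum

c = SeqBatchedCounter()
assert c.read() == 0
c.update(3)
c.update(4)
assert c.read() == 7
```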

1: shared array v[1..n] of SWMR registers
2: procedure update(x)
3:     v[i] ← v[i] + x
4: procedure read
5:     sum ← 0
6:     for j = 1 to n do
7:         sum ← sum + v[j]
8:     return sum
Algorithm 2 Algorithm for process i, implementing an IVL batched counter.

Algorithm 2 presents an IVL implementation of a batched counter for n processes using an array of n SWMR registers. The implementation is a trivial parallelization: an update operation increments the invoking process's local register, while a read scans all registers and returns their sum. This implementation is not linearizable because the reader may see a later update and miss an earlier one, as illustrated in Figure 1. We now prove the following lemma:
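Algorithm 2's trivial parallelization can be sketched in Python (a single-threaded illustration; per-process registers are modeled as list slots, and all names are ours):

```python
# Sketch of Algorithm 2: each of n processes owns one SWMR register;
# update(i, x) touches register i only, read sums all registers.
n = 3
regs = [0] * n  # shared array of SWMR registers

def update(i, x):
    regs[i] += x  # O(1): only the caller's register is accessed

def read():
    s = 0
    for j in range(n):  # O(n) scan
        s += regs[j]
    return s

update(0, 5)
update(1, 2)
assert read() == 7
```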

Lemma 2.

Algorithm 2 is an IVL implementation of a batched counter.


Let H be a well-formed history of an execution of Algorithm 2. We first complete H by adding appropriate responses to all pending update operations and removing all pending read operations; we denote the completed history by H'.

Let L1 be a linearization of H' given by ordering update operations by their response steps, and ordering each read operation after all operations that precede it in H' and before all operations concurrent with it. Operations left unordered by these rules are ordered arbitrarily. Let L2 be a linearization of H' given by ordering update operations by their invocations, and ordering each read operation before all operations that follow it in H' and after all operations concurrent with it; again, remaining operations are ordered arbitrarily. For k ∈ {1,2}, let E_k be a sequential execution of a batched counter with history L_k.

By construction, L1 and L2 are linearizations of H'. Let r be some read operation that completes in H. Let A be the array as read by r in H, A1 as read by r in E1, and A2 as read by r in E2. To show that r's return value in H is bounded between its return values in E1 and E2, we show that A1[j] ≤ A[j] ≤ A2[j] for every index j.

Fix an index j; only p_j can increment v[j]. By the construction of L1, all update operations that precede r in L1 also precede it in H, therefore A1[j] ≤ A[j]. Assume by contradiction that A[j] > A2[j]. Registers only grow, so after all update operations concurrent to r complete, the value of index j is at least A[j]. However, by construction, r is ordered after all concurrent update operations in L2, therefore A2[j] ≥ A[j]. This is a contradiction, and therefore A[j] ≤ A2[j].

As all entries in the array are non-negative, it follows that Σ_j A1[j] ≤ Σ_j A[j] ≤ Σ_j A2[j], and therefore r's return value in H is bounded between its return values in E1 and E2, as required. ∎

Figure 1 shows a possible concurrent execution of Algorithm 2. This algorithm can efficiently implement a distributed or NUMA-friendly counter, as updating processes only access their local registers, thereby lowering the cost of incrementing the counter. This is of great importance, as memory latencies are often the main bottleneck in shared object emulations [mahapatra1999processor]. As there are no waits in either update or read, the algorithm is wait-free. Furthermore, the read step complexity is O(n), and the update step complexity is O(1). Thus, we have shown the following theorem:

Theorem 2.

There exists a bounded wait-free IVL implementation of a batched counter using only SWMR registers, such that the step complexity of update is O(1) and the step complexity of read is O(n).

Figure 1: A possible concurrent history of the IVL batched counter: two processes update their local registers while a third reads. The read returns an intermediate value between the counter's state when it starts and the counter's state when it completes.
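The kind of interleaving behind Figure 1 can be replayed deterministically with a pausable read (a toy scheduler using a Python generator; this is entirely our illustration, not the paper's):

```python
# A read that pauses mid-scan can return a value strictly between
# the counter's state at its start and its state at its end.
n = 2
regs = [0, 0]

def read_steps():
    s = 0
    for j in range(n):
        s += regs[j]
        yield s  # pause point after reading each register

r = read_steps()
next(r)            # the read scans regs[0], seeing 0
regs[0] += 1       # this update lands after regs[0] was read: missed
regs[1] += 1       # this update lands before regs[1] is read: seen
total = next(r)    # the read scans regs[1], seeing 1
assert total == 1  # intermediate: start state was 0, end state is 2
```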

6.2 Lower bound for linearizable batched counter object

The incentive for using an IVL batched counter instead of a linearizable one stems from a lower bound on the step complexity of wait-free linearizable batched counter implementations from SWMR registers. To show the lower bound, we first define the binary snapshot object. A snapshot object has n components written by separate processes, and allows a reader to capture the states of all components instantaneously. We consider the binary snapshot object, in which each component may be either 0 or 1 [hoepman1993binary]. The object supports the update(v) and scan operations, where the former sets the calling process's component to the value v and the latter returns the states of all components instantaneously. It is trivial that the scan operation must read all n states, therefore its step complexity is Ω(n). Israeli and Shirazi [israeli1998time] show that the update step complexity of any implementation of a snapshot object from SWMR registers is also Ω(n). This lower bound was later shown to hold for multi-writer registers as well [attiya2006complexity]. While their proof was originally given for a multi-value snapshot object, it holds in the binary case as well [hoepman1993binary].

1: local variable v_i, initialized to 0
2: shared batched counter object C
3:
4: procedure update(v)
5:     if v_i = v then return
6:     v_i ← v
7:     if v = 1 then C.update(2^i)
8:     if v = 0 then C.update(2^n − 2^i)
9: procedure scan
10:     sum ← C.read()
11:     V ← [0, …, 0] ▷ Initialize an array of 0's
12:     for j = 0 to n−1 do
13:         if bit j is set in sum then V[j] ← 1
14:     return V
Algorithm 3 Algorithm for process i, solving binary snapshot with a batched counter object.

To show a lower bound on the update operation of wait-free linearizable batched counters, we show a reduction from binary snapshot to a batched counter in Algorithm 3. It uses a local variable v_i and a shared batched counter object C. In a nutshell, the idea is to encode the value of the i-th component of the binary snapshot in the i-th least significant bit of the counter. When the component changes from 0 to 1, update adds 2^i to the counter, and when it changes from 1 to 0, update adds 2^n − 2^i. We now prove the following invariant:
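The encoding can be sketched in Python (a sequential illustration of Algorithm 3's idea; names such as prev and counter are ours):

```python
# Process i flips its snapshot bit via counter additions: 0→1 adds
# 2**i and 1→0 adds 2**n - 2**i, so the low n bits of the counter's
# sum always spell out the components.
n = 4
counter = 0       # stands in for the shared batched counter
prev = [0] * n    # each process's local variable

def update(i, v):
    global counter
    if prev[i] == v:
        return
    prev[i] = v
    counter += 2**i if v == 1 else 2**n - 2**i

def scan():
    s = counter
    return [(s >> j) & 1 for j in range(n)]

update(2, 1)
update(0, 1)
update(2, 0)
assert scan() == [1, 0, 0, 0]
```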

Invariant 1.

At any point in a history H of a sequential execution of Algorithm 3, the sum held by the counter is Σ_{i=0}^{n−1} 2^i·v_i + m·2^n, such that v_i is the parameter passed to the last invocation of update by process i in H before that point if such an invocation exists, and v_i = 0 otherwise, for some non-negative integer m.


We prove the invariant by induction on the length of H, i.e., the number of invocations in H, denoted |H|. As H is a sequential history, each invocation is immediately followed by its response.

Base: The base case is |H| = 0, i.e., H is the empty history. In this case no updates have been invoked, therefore v_i = 0 for every i. The sum returned by the counter is 0, and choosing m = 0 satisfies the invariant.

Induction step: Our induction hypothesis is that the invariant holds for a history of length k. We prove that it holds for a history of length k+1. The last invocation is either a scan or an update(v) by some process i. If it is a scan, the counter value doesn't change and the invariant holds. Otherwise, it is an update(v). Let v' be v_i's value prior to the invocation. If v' = v, then the update returns without altering the sum and the invariant holds. Otherwise, v' ≠ v. We analyze two cases, v = 1 and v = 0. If v = 1, then v' = 0, and the sum after the update is sum + 2^i; the bit of index i changes from 0 to 1 while m is unchanged, and the invariant holds. If v = 0, then v' = 1, and the sum after the update is sum + 2^n − 2^i; the bit of index i changes from 1 to 0 while m grows by 1, and the invariant holds. ∎

Using the invariant, we prove the following lemma:

Lemma 3.

For any sequential history H, if a scan returns the array V, and update(v) is the last update invocation by process i in H prior to the scan, then V[i] = v. If no such update exists, then V[i] = 0.


Let s be a scan in H, and consider the value sum it reads from the counter. From Invariant 1, the value held by the counter is Σ_{i=0}^{n−1} 2^i·v_i + m·2^n. There are two cases: either there is an update invocation by process i prior to s, or there isn't. If there isn't, then by Invariant 1 the corresponding v_i = 0; the process sees that bit i of sum is 0 and returns V[i] = 0, so the lemma holds.

Otherwise, there is an update by process i prior to s in H. As sum equals Σ_{i=0}^{n−1} 2^i·v_i + m·2^n, by Invariant 1 bit i of sum equals 1 iff the parameter passed to the last such update was 1. Therefore, the scan returns the parameter of the last update by process i, and the lemma holds. ∎
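Lemma 3's statement can likewise be spot-checked by tracking, per process, the parameter of the last update (an illustrative randomized check of ours, not part of the paper):

```python
import random

# After every sequential update, a scan of the Algorithm 3 encoding
# must return, per component, the last parameter written (0 if none).
n = 4
counter = 0
prev = [0] * n
last = [0] * n  # last parameter passed to each process's update

def update(i, v):
    global counter
    last[i] = v
    if prev[i] == v:
        return
    prev[i] = v
    counter += 2**i if v == 1 else 2**n - 2**i

def scan():
    s = counter
    return [(s >> j) & 1 for j in range(n)]

random.seed(1)
for _ in range(500):
    update(random.randrange(n), random.randrange(2))
    assert scan() == last  # Lemma 3 holds at every step
```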

Lemma 4.

Algorithm 3 implements a linearizable binary snapshot using a linearizable batched counter.


Let H be a history of Algorithm 3, and let H' be H where each operation is linearized at its access to the linearizable batched counter, or at its response if it returns on line 5. Applying Lemma 3 to H', every scan returns the last value written to each component, and therefore H is linearizable. ∎

It follows from the algorithm that if the counter object is bounded wait-free, then the scan and update operations are bounded wait-free. Therefore, the lower bound proved by Israeli and Shirazi [israeli1998time] holds, and update must take Ω(n) steps. Other than the access to the counter, the update operation takes O(1) steps. Therefore, the access to the counter object must take Ω(n) steps. We have proven the following theorem.

Theorem 3.

For any linearizable wait-free implementation of a batched counter object with n processes from SWMR registers, the step complexity of the update operation is Ω(n).

7 Conclusion

We have presented IVL, a new correctness criterion that provides flexibility in the return values of quantitative objects while bounding the error that this may introduce. IVL has a number of desirable properties: First, like linearizability, it is a local property, allowing designers to reason about each part of the system separately. Second, also like linearizability but unlike other relaxations of it, IVL preserves the error bounds of PAC objects. Third, IVL is generically defined for all quantitative objects, and does not necessitate object-specific definitions. Finally, IVL is inherently amenable to cheaper implementations than linearizability in some cases.

Via the example of a CountMin sketch, we have illustrated that IVL provides a generic way to efficiently parallelize data sketches while leveraging their sequential error analysis to bound the error in the concurrent implementation.



Appendix A Locality proof

We now prove that IVL is a local property.


Let be a deterministic algorithm, and let be the sequential specification of object , for every . The “only if” part is immediate.

Denote by the linearizations of guaranteed by the definition of IVL. We first construct a linearization of , defined by the order as follows: For every pending operation on object , we either add the corresponding response or remove it based on . We then construct a partial order of operations as the union of and the real-time order of operations in . As must adhere to the real-time order of , and therefore , and the order of operations in are disjoint, this partial order is well defined. Consider two concurrent operations in . If they do not belong to the same history for some object , we order them arbitrarily in . We construct a linearization of by defining the order of operations in a similar fashion.

By construction all invocations and responses appearing in appear both in and in , and and preserve the partial order . Therefore, and are linearizations of .

Consider some read on some object that returns in . As is IVL . Note that . Furthermore, for , as objects other than do not affect the return value of this operation. Therefore is IVL. ∎