Efficient Partial Snapshot Implementations

In this work, we propose the λ-scanner snapshot, a variation of the snapshot object, which supports any fixed number 0 < λ ≤ n of different SCAN operations being active at any given time. Whenever λ is equal to the number n of processes in the system, the λ-scanner object implements a multi-scanner object, while for λ equal to 1 it implements a single-scanner object. We present the λ-Snap snapshot object, a wait-free λ-scanner snapshot implementation that has a step complexity of O(λ) for UPDATE operations and O(λm) for SCAN operations. The space complexity of λ-Snap is O(λm). λ-Snap provides a trade-off between the step/space complexity and the maximum number of SCAN operations that the system can afford to have active at any given point in time. The low space complexity that our implementations provide makes them more appealing for real system applications. Moreover, we provide a slightly modified version of the λ-Snap implementation, called partial λ-Snap, that is able to support dynamic partial scan operations. In such an object, processes can execute modified SCAN operations called PARTIAL_SCAN that obtain a part of the snapshot object without reading the whole set of components. In this work, we first provide a simple single-scanner version of λ-Snap, called 1-Snap. We provide 1-Snap for presentation purposes, since it is simpler than λ-Snap. The UPDATE in 1-Snap has a step complexity of O(1), while the SCAN has a step complexity of O(m). This implementation uses O(m) CAS registers.

1 Introduction

We inarguably live in an era where almost any activity is supported either by smart devices or by potent servers relying on multi-core CPUs. As this hardware promises to perform more services per time unit while executing increasingly complex jobs, any application that does not use the many cores it provides is gradually becoming obsolete.

At the heart of exploiting the potential that multiple cores provide, are concurrent data structures, since they are essential building blocks of concurrent algorithms. The design of concurrent data structures, such as lists [25], queues [15, 22], stacks [6, 22], and even trees [7, 10] is a thoroughly explored topic. Compared to sequential data structures, the concurrent ones can simultaneously be accessed and/or modified by more than one process. Ideally, we would like to have the best concurrent implementation, in terms of space and step complexity, of any given data structure. However, this cannot always be the case since the design of those data structures is a complex task.

In this work, we present a snapshot object, a concurrent object that consists of components which can be read and modified by any process. Concurrent snapshot objects are used in numerous applications in order to provide a coherent “view” of the memory of a system. They are also used to design and validate various concurrent algorithms, such as the construction of concurrent timestamps [13], approximate agreement [5], etc., and the ideas at their core can be further developed in order to implement more complex data structures [2]. Applications of snapshots also appear in sensor networks, where snapshot implementations can be used to provide a consistent view of the state of the various sensors of the network. Under certain circumstances, snapshots can even be used to simulate concurrent graphs, as seen e.g. in [20]. The graph data structure is widely used by many applications, such as the representation of transport networks [1], video-game design [8], and the automated design of digital circuits [19], making the study of snapshot objects pertinent even to these areas.

There are many different implementations of snapshot objects, classified by the progress guarantee that they provide. However, in order to be fault-tolerant against process failures, a concurrent object has to provide strong progress guarantees, such as wait-freedom, i.e. the guarantee that an operation invoked by any process that does not fail returns a result after a finite number of steps. We provide two wait-free algorithms that implement a snapshot object, namely an algorithm for a single-scanner snapshot object, i.e. a snapshot object where only one process is allowed to read the values of the components, although any process may modify the values of components; and an algorithm for a λ-scanner snapshot object, where up to λ predefined processes may read the components of the object, while any process may change the value of any component. Note that λ should be lower than or equal to n, i.e. the number of processes in the system. In case the value of λ is equal to n, we obtain a general multi-scanner snapshot object. Our λ-scanner implementation allows us to study trade-offs, since increasing the value of λ leads to a linear increase of the space and step complexity. Our algorithms can be modified to obtain partial snapshot implementations (see Sections 3.1 and 4.1), where processes execute modified SCAN operations that can obtain the values of just a subset of the snapshot components.

In terms of shared registers, our algorithm has a low space complexity of O(λm), where m is the number of components of the snapshot object. This does not come with major compromises in terms of step complexity, since the step complexity of an UPDATE operation is O(λ), while that of a SCAN operation is O(λm). The registers we use are of unbounded size, although the only unbounded value that they store is a sequence number. This is a common practice in many state-of-the-art implementations [11, 24]. The atomic primitive the registers need to support is CAS (Compare&Swap), although we present a version of the algorithm using LL/SC registers in order to be more comprehensible and easier to prove correct. An LL/SC register can be constructed from CAS registers using known constructions [16, 23].
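To make the last point concrete, the following sketch shows one standard way to emulate an LL/SC & R/W register from a single CAS word by pairing the stored value with a version tag. It is a simplified illustration in the spirit of the constructions in [16, 23], not a reproduction of them; all names are ours, and tag wrap-around is ignored.

#include <stdatomic.h>
#include <stdint.h>

/* Sketch: an LL/SC register emulated from a 64-bit CAS word. The value
   shares the word with a version tag, so an SC fails if any successful
   SC or Write occurred since the matching LL (this avoids ABA). A real
   construction [16, 23] must also handle tag wrap-around. */
typedef struct { _Atomic uint64_t word; } llsc_reg;  /* high 32 bits: tag, low 32 bits: value */
typedef struct { uint64_t snap; } llsc_ctx;          /* per-process LL context */

uint32_t LL(llsc_reg *r, llsc_ctx *c) {
        c->snap = atomic_load(&r->word);             /* remember tag and value together */
        return (uint32_t)c->snap;
}

int SC(llsc_reg *r, llsc_ctx *c, uint32_t v) {
        uint64_t expected = c->snap;
        uint64_t desired = (((expected >> 32) + 1) << 32) | v;  /* bump tag, install value */
        return atomic_compare_exchange_strong(&r->word, &expected, desired);
}

void Write(llsc_reg *r, uint32_t v) {                /* unconditional write: also bumps the tag */
        uint64_t old_w, new_w;
        do {
                old_w = atomic_load(&r->word);
                new_w = (((old_w >> 32) + 1) << 32) | v;
        } while (!atomic_compare_exchange_weak(&r->word, &old_w, new_w));
}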

The rest of this work is organized as follows. Section 1.1 provides a brief comparison of our work with other state-of-the-art algorithms that solve similar problems. Section 2 presents the theoretical framework we use. Section 3 presents 1-Snap, our wait-free implementation of a single-scanner snapshot object, and Section 4 presents λ-Snap, our wait-free λ-scanner implementation. Section 5 contains a concluding discussion.

1.1 Related work

Most current multi-scanner snapshot implementations that use registers of relatively small size either have step complexity that is linear in the number of processes n [4, 14] or space complexity that is linear in n [3, 14, 17, 18, 20]. The only exception is the multi-scanner snapshot implementation presented by Fatourou and Kallimanis in [9]. However, this snapshot implementation uses unrealistically large registers, since it requires registers that contain a vector of m values as well as a sequence number. The step complexity of λ-Snap is O(λm) for SCAN and O(λ) for UPDATE, while it uses O(λm) registers. In cases where λ is a relatively small constant, the number of registers used is reduced almost to O(m), while the step complexity of SCAN is almost linear in m and the step complexity of UPDATE is almost constant. Compared to current single-scanner snapshot implementations [12, 9, 11, 18, 21, 24], λ-Snap offers the capability to have more than one SCAN operation active at each point in time at the cost of slightly worse step complexity. In the worst case, where the value of λ is equal to n, λ-Snap provides an implementation of a multi-scanner snapshot object that uses fewer registers than the implementations in [4, 17, 18, 24]. To the best of our knowledge, λ-Snap provides the first trade-off between the number of active scanners and the step/space complexity.

We now compare the λ-Snap snapshot with other multi-scanner algorithms. In Table 1, we present the basic characteristics of each snapshot implementation that is reviewed in this section. Riany et al. have presented in [24] an implementation of snapshot objects that uses CAS or LL/SC registers together with Fetch&Inc registers, and achieves O(1) and O(n) step complexity for UPDATE and SCAN operations respectively. Attiya, Herlihy & Rachman present in [4] a snapshot object that has O(n) step complexity for both SCAN and UPDATE operations, while it uses an unbounded number of dynamic Test&Set registers.

Fatourou and Kallimanis [9] present a multi-scanner implementation with O(m) step complexity for SCAN operations and O(1) step complexity for UPDATE operations. In contrast to λ-Snap, this snapshot implementation requires registers that contain a vector of m values as well as a sequence number. Moreover, the multi-scanner snapshot implementation of [9] does not support partial SCAN operations.

Kallimanis and Kanellou [20] present a wait-free implementation of a graph object. This implementation can be slightly modified to simulate a snapshot object which supports partial SCAN operations. The algorithm implements SCAN and UPDATE operations with a step complexity that depends on the number of active processes in a given execution. It also maintains a low space complexity, linear in n, but the registers used are of unbounded size. In essence, the algorithm needs registers that can contain many integer values, half of which are unbounded.

Imbs and Raynal [14] provide two implementations of a partial snapshot object. The first implementation uses simpler registers than those used in the second one, but it has a higher space complexity. Thus, we concentrate on the second implementation, which achieves step complexities for SCAN and UPDATE that are linear in n, up to a factor that depends on the helping mechanism the operations provide. This implementation uses Read/Write (abbr. R/W) and LL/SC registers. Finally, the implementation of Imbs and Raynal provides a new helping mechanism by implementing the “write first, help later” technique.

Attiya, Guerraoui and Ruppert [3] provide a partial snapshot algorithm that uses CAS registers. The step complexity of the UPDATE operations of this implementation depends on the amount of contention. The step complexity of a PARTIAL_SCAN is a function of the number of active UPDATE operations whose execution intervals overlap with the execution interval of the PARTIAL_SCAN, and of the maximum number of components that any PARTIAL_SCAN operation may read in any given execution.

Implementation                    Partial   Register type                    Registers
λ-Snap                            no        LL/SC & R/W                      O(λm)
partial λ-Snap                    yes       LL/SC & R/W                      O(λm)
Attiya et al. [4]                 no        dynamic Test&Set                 unbounded
Fatourou & Kallimanis [9]         no        CAS & R/W
Jayanti [18]                      no        CAS or LL/SC & R/W
Jayanti [17]                      no        CAS or LL/SC & R/W
Riany et al. [24]                 no        CAS or LL/SC & Fetch&Inc & R/W
Kallimanis & Kanellou [20]        yes       CAS or LL/SC & R/W
Imbs & Raynal [14]                yes       LL/SC & R/W
Attiya, Guerraoui & Ruppert [3]   yes       CAS & R/W
Table 1: Known multi-scanner snapshot implementations

We now compare the 1-Snap and λ-Snap snapshot objects with other single-scanner algorithms. Recall that λ-Snap gives a snapshot object the ability to have more than one SCAN operation active at each point in time by slightly worsening the step complexity. In Table 2, we present the basic characteristics of each single-scanner snapshot implementation that is reviewed in this section.

In [9, 11], Fatourou and Kallimanis provide a single-scanner snapshot implementation that achieves O(1) step complexity for UPDATE and O(m) for SCAN. By applying some trivial modifications to it, a partial snapshot implementation with O(1) step complexity for UPDATE and O(|A|) for PARTIAL_SCAN could be derived. In contrast to 1-Snap, it uses an unbounded number of registers. Moreover, the other snapshot implementations presented in [9, 11] do not support partial SCAN operations. In [18], Jayanti presents a single-scanner snapshot algorithm with O(1) step complexity for UPDATE and O(m) for SCAN, while it uses LL/SC & R/W registers. The algorithm of [18] could easily be modified to support partial SCAN operations without any negative impact on step and space complexity. Therefore, 1-Snap and λ-Snap (for λ = 1) match the step complexity of the implementations presented in [9, 11, 18], which is O(1) for UPDATE and O(m) for SCAN. Note that the single-scanner implementation of [11, 9] mentioned above uses an unbounded number of registers, while 1-Snap and λ-Snap use O(m) and O(λm) registers, respectively. The partial versions of 1-Snap and λ-Snap (for λ = 1) have a step complexity for PARTIAL_SCAN that is reduced to O(|A|), where |A| is the number of components the operation wants to read.

Kirousis et al. [21] provide a single-scanner implementation that uses an unbounded number of registers and has unbounded time complexity for SCAN. A register-recycling technique is applied to this snapshot implementation, resulting in a snapshot implementation with bounded step complexity for both UPDATE and SCAN operations. Riany et al. [24] present a single-scanner implementation, which is a simplified variant of the algorithm presented in [21]. This snapshot implementation achieves constant step complexity for UPDATE and step complexity linear in m for SCAN. By applying some trivial modifications, a partial snapshot implementation could be derived. However, the snapshot implementation of [24] is a single-updater snapshot object, since it does not allow more than one process to update the same component at each point in time.

Implementation                             Partial   Register type    Registers
1-Snap                                     no        LL/SC & SW R/W   O(m)
partial 1-Snap                             yes       LL/SC & SW R/W   O(m)
Fatourou & Kallimanis [12, 11]             no
Fatourou & Kallimanis [9, 11] (modified)   yes                        unbounded
Fatourou & Kallimanis [9, 11]              no                         unbounded
Kirousis et al. [21]                       no                         unbounded
Riany et al. [24]                          no
Jayanti [18]                               no        LL/SC & R/W
Table 2: Known single-scanner snapshot implementations

In [12, 11], Fatourou and Kallimanis provide a further single-scanner snapshot algorithm (see Table 2 for its basic characteristics). This implementation does not support partial SCAN operations.

2 Model

We consider a system consisting of n uniquely distinguishable processes modeled as sequential state machines, where processes may fail by crashing. The processes are asynchronous and communicate through shared base objects. A base object stores a value and provides a set of primitives, through which the object’s value can be accessed and/or modified.

  1. A Read/Write register R (abbr. R/W register) is a shared object that stores a value from some set and that supports the following primitives: (i) Write(R, v), which writes the value v in R and returns ack, and (ii) Read(R), which returns the value stored in R.

  2. An LL/SC register O is a shared object that stores a value from some set and supports the following primitives: (i) LL(O), which returns the value of O, and (ii) SC(O, v), which can be executed by a process p only after the execution of an LL(O) by the same process. An SC(O, v) writes the value v in O only if the state of O hasn’t changed since p executed its last LL(O), in which case the operation returns true; it returns false otherwise.

  3. An LL/SC & R/W register O is a shared object that stores a value from some set. It supports the same primitives as an LL/SC register and, in addition, the primitive Write(O, v), which writes the value v in O and returns ack.

A shared object is a data structure that can be accessed and/or modified by the processes in the system. Each shared object provides a set of operations. Any process can access and/or modify the shared object by invoking operations that are supported by it. An implementation of a shared object uses base objects to store the state of the shared object and provides a set of algorithms that use the base objects to implement each operation of the shared object. An operation consists of an invocation by some process and terminates by returning a response to the process that invoked it. Similarly to each base object, each process also has an internal state. A configuration of the system is a vector that contains the state of each of the n processes and the value of each of the base objects at some point in time. In an initial configuration, the processes are in an initial state and the base objects hold initial values. We denote an initial configuration by C0. A step taken by a process consists either of a primitive applied to some base object or of the response to that primitive. Operation invocations and responses are also considered steps. Each step is executed atomically.

An execution α is a (possibly infinite) sequence C0, s1, C1, s2, C2, …, alternating between configurations and steps, starting from some initial configuration C0, where each configuration Ci results from applying step si to configuration Ci−1. If C is a configuration that is present in α, we write C ∈ α. An execution interval of a given execution α is a subsequence of α which starts with some configuration Ci and ends with some configuration Cj (where i ≤ j). An execution interval of an operation op is the execution interval whose first configuration is the one right after the step where op was invoked and whose last configuration is the one right after the step where op responded.

Given an execution α, we say that a configuration Ci precedes a configuration Cj if i < j. Similarly, we say that step si precedes step sj if i < j. We say that a configuration Ci precedes the step sj in α if i < j. On the other hand, we say that the step si precedes Cj in α if i ≤ j. We furthermore say that an operation op1 precedes an operation op2 if the step where op1 responds precedes the step where op2 is invoked. Given two execution intervals I1 and I2 of α, we say that I1 precedes I2 if any configuration contained in I1 precedes any configuration contained in I2.

An operation op1 is called concurrent with an operation op2 in execution α if there is at least one configuration C ∈ α such that both op1 and op2 are active in C. An execution is called sequential if in any given configuration there is at most one active operation. An execution that is not sequential is called concurrent. Executions α and α′ are equivalent if they contain the same operations, those operations are invoked in both of them by the same processes, and they have the same responses in α and α′.

An execution α is linearizable if it is possible to assign a linearization point inside the execution interval of each operation op in α, so that the response of op in α is the same as its response would be in the equivalent sequential execution that would result from performing the operations of α sequentially, following the order of their linearization points. An implementation of a shared object is linearizable if all executions it produces are linearizable. An implementation of a shared object is wait-free if any operation op of a process that does not crash in α responds after a finite number of steps. The maximum number of those steps is called the step complexity of op.

A snapshot object is a shared object that consists of m components, each taking values from some set, that provides the following two primitives: (i) SCAN(), which returns a vector of size m containing the values of the m components of the object, and (ii) UPDATE(i, v), which writes the non-NULL value v on the i-th component of the object. A partial snapshot object is a shared object that consists of m distinct components denoted c0, …, cm−1, each taking values from some set, that provides the following two primitives: (i) PARTIAL_SCAN(A), which, given a set A that contains integer values ranging from 0 to m − 1, returns for each i ∈ A the value of the component ci, and (ii) UPDATE(i, v), which writes the non-NULL value v on ci. A snapshot implementation is single-scanner if in any execution produced by the implementation there is no configuration in which more than one SCAN operation is active. Similarly, a snapshot implementation is λ-scanner if in any execution produced by the implementation there is no configuration in which more than λ SCAN operations are active.
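For concreteness, the following trivial sequential implementation (a sketch of ours; only the operation names mirror the pseudocode used later) captures the behavior that a linearizable snapshot implementation must appear to provide:

/* Sequential specification of a snapshot object with m components.
   A linearizable concurrent implementation must behave as if each
   operation below were executed atomically at its linearization point. */
#define m 8
typedef int val;
static val components[m];

void UPDATE(int i, val v) { components[i] = v; }          /* write v to component i */

void SCAN(val view[m]) {                                  /* read all m components */
        for (int i = 0; i < m; i++) view[i] = components[i];
}

void PARTIAL_SCAN(const int *A, int len, val view[m]) {   /* read only components in A */
        for (int k = 0; k < len; k++) view[A[k]] = components[A[k]];
}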

3 1-Snap

In this section, we present the 1-Snap snapshot object; its data structures and pseudocode are shown in the listings below.

In 1-Snap, only a single, predefined process is allowed to invoke SCAN operations, while all processes can invoke UPDATE operations on any component of the snapshot object. 1-Snap uses the shared integer variable seq, with initial value 0, in order to provide sequence numbers to operations. Each applied operation gets a sequence number by reading the value of seq. An operation op1 that is applied with a smaller sequence number than that of another operation op2 is considered to be applied before op2. Since only SCAN operations can increase the value of seq by one, and since in any given configuration there is at most one active SCAN operation in our implementation, the seq register can safely be a R/W register.

struct value_struct {
        val value;              // current value of the component
        int seq;                // sequence number of the last applied UPDATE
        val proposed_value;     // value that an announced UPDATE wants to apply
};
struct pre_value_struct {
        val value;              // previous value of the component
        int seq;                // sequence number of that previous value
};
shared int seq;                 // sequence numbers for SCAN operations, initially 0
shared value_struct values[0..m-1]=[<NULL,NULL,NULL>,…,<NULL,NULL,NULL>];
shared pre_value_struct pre_values[0..m-1]=[<NULL,NULL>,…,<NULL,NULL>];
private val view[0..m-1]=[NULL,NULL,…,NULL];   // the scanner's local copy of the components

1-Snap uses the shared vector values, consisting of m structs, to represent the components of the snapshot object. Each struct of values is stored in an LL/SC & R/W register, and any process can execute LL and SC operations on each of them. The j-th component of the snapshot object is stored in the j-th struct of the values data structure; this struct is denoted values[j] and its type is value_struct. Each of those structs contains the following three fields: (1) a val variable called value, which stores the value of the component of the snapshot object that is simulated by values[j]; (2) an integer variable called seq, which stores the sequence number of the last UPDATE operation that has been applied to the j-th component of the snapshot (this is also referred to as the sequence number of the component); and (3) a val variable called proposed_value, which stores the value that the announced UPDATE operation wants to apply to the j-th component of the snapshot.

This means that each component of the snapshot object can store two values, namely its current value and the proposed value. The process that executes the SCAN operations uses a vector pre_values, which consists of m structs that are stored in LL/SC & R/W registers; any process can execute LL and SC operations on them. The j-th struct of pre_values, pre_values[j], contains a previous value of the j-th component and a sequence number for the j-th component of the snapshot object. This sequence number is always smaller than that of the SCAN currently executed by the scanner process. Since we apply a helping mechanism, any SCAN and UPDATE operation can read and modify the structs of this data structure regardless of its process id.

A SCAN operation increases the value of seq by one and uses this increased value as its sequence number (first line of SCAN). UPDATE operations that have been applied with a sequence number greater than or equal to that of this SCAN are not “visible” to it (recall that operations are considered to be applied in increasing order of their assigned sequence numbers). Afterwards, for each component of the snapshot object (the for loop of SCAN), the SCAN performs the following steps through ApplyUpdate: (1) it tries to copy the current value of the component to the pre_values data structure if the sequence number of the component is lower than the sequence number of the corresponding SCAN (first loop of ApplyUpdate); (2) it tries to apply an announced UPDATE to this component of the snapshot object (final if statement of ApplyUpdate). (3) Finally, SCAN returns its copy of the snapshot object (last line of SCAN).

An UPDATE operation op on the j-th component executed by a process p first tries to announce the new value that it wants to store in the j-th component of the snapshot. This is achieved by trying to write to the proposed_value field of the component (the LL/SC pair in the body of UPDATE). Afterwards, op invokes ApplyUpdate, which tries to copy the value of the j-th component of the snapshot to the pre_values data structure if needed, and then tries to update the value of the j-th component of the snapshot using a local copy of seq as its sequence number. If the announcement was successful, the UPDATE ends its execution after this last step. Otherwise, it repeats all the previous steps one last time. Doing so makes sure that an UPDATE operation (which may or may not be op) on the j-th component of the snapshot object is applied, and furthermore linearized, inside the execution interval of op.

void UPDATE(int j, val value){
        int i;
        struct value_struct up_value, cur_value;
        for (i=0; i<2; i++){                    // at most two attempts
                cur_value=LL(values[j]);
                up_value=cur_value;
                up_value.proposed_value=value;
                if (cur_value.proposed_value==NULL){
                        if (SC(values[j],up_value)){    // announce the new value
                                ApplyUpdate(j);
                                break;
                        }
                }
                ApplyUpdate(j);                 // help the previously announced UPDATE
        }
}
pointer SCAN(){
        int j;
        struct value_struct v1;
        struct pre_value_struct v2;
        seq=seq+1;                              // take a new sequence number
        for (j=0;j<m;j++){
                ApplyUpdate(j);                 // help a pending UPDATE on component j
                v1=values[j];
                v2=pre_values[j];
                if (v1.seq<seq){                // component not overwritten by a newer UPDATE
                        view[j]=v1.value;
                }else{
                        view[j]=v2.value;       // otherwise use the saved previous value
                }
        }
        return view[0..m-1];
}
void ApplyUpdate(int j) {
        int t, cur_seq;
        struct value_struct cur_value;
        struct pre_value_struct cur_pre_value, proposed_pre_value;
        cur_value=LL(values[j]);
        cur_seq=seq;
        for (t=0; t<2; t++) {                   // save the current value for the scanner
                cur_pre_value=LL(pre_values[j]);
                cur_value=values[j];
                if (cur_value.seq<seq){
                        proposed_pre_value.seq=cur_value.seq;
                        proposed_pre_value.value=cur_value.value;
                        SC(pre_values[j],proposed_pre_value);
                }
        }
        if (cur_value.proposed_value!=NULL) {   // apply the announced UPDATE, if any
                cur_value.value=cur_value.proposed_value;
                cur_value.seq=cur_seq;
                cur_value.proposed_value=NULL;
                SC(values[j],cur_value);
        }
}
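As a usage illustration, here is a hedged sketch of how the designated scanner and a set of updaters might drive 1-Snap. The thread wiring is ours and not part of the algorithm; it assumes the 1-Snap routines above are available as C functions, with SCAN returning a pointer to the scanner's private view, as in the listing.

#include <pthread.h>
#include <stdio.h>

typedef int val;
extern val *SCAN(void);              /* 1-Snap SCAN: only one designated thread may call it */
extern void UPDATE(int j, val v);    /* 1-Snap UPDATE: any thread may call it */
#define m 4

void *scanner_thread(void *arg) {
        for (int r = 0; r < 10; r++) {
                val *view = SCAN();                  /* a consistent view of all m components */
                for (int j = 0; j < m; j++) printf("%d ", view[j]);
                printf("\n");
        }
        return NULL;
}

void *updater_thread(void *arg) {
        int id = *(int *)arg;
        for (int k = 0; k < 100; k++)
                UPDATE(k % m, 100 * id + k);         /* any process may update any component */
        return NULL;
}

int main(void) {
        pthread_t s, u[3];
        int ids[3] = {0, 1, 2};
        pthread_create(&s, NULL, scanner_thread, NULL);
        for (int i = 0; i < 3; i++) pthread_create(&u[i], NULL, updater_thread, &ids[i]);
        pthread_join(s, NULL);
        for (int i = 0; i < 3; i++) pthread_join(u[i], NULL);
        return 0;
}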

3.1 A partial version of 1-Snap

The 1-Snap snapshot implementation can be trivially modified in order to implement a partial snapshot object (see the listing below). In order to do that, a new function Read is introduced. This function is invoked by PARTIAL_SCAN operations in order to read the values of the components indicated by A, which is a subset of the components of the snapshot object. For each component cj that is contained in A, the PARTIAL_SCAN operation tries to help an UPDATE operation that wants to update the value of cj by invoking ApplyUpdate. Afterwards, it reads the value of cj by invoking the Read function.

void PARTIAL_SCAN(set A){
        seq=seq+1;                      // take a new sequence number
        for each j in A{
                ApplyUpdate(j);         // help a pending UPDATE on component j
                Read(j);
        }
}
val Read(int j){
        struct value_struct v1;
        struct pre_value_struct v2;
        v1=values[j];
        v2=pre_values[j];
        if (v1.seq<seq){
                view[j]=v1.value;
        }else{
                view[j]=v2.value;
        }
        return view[j];
}

3.2 Step and space complexity of 1-Snap

The step complexity of any operation of 1-Snap is measured by the number of accesses to shared registers that are executed inside its execution interval.

We start with the worst-case analysis of ApplyUpdate.

  1. Before its loop, ApplyUpdate performs only an LL operation on values[j] and a read of the shared variable seq.

  2. The loop of ApplyUpdate is executed at most two times. In each iteration of this loop, at most two LL/SC operations are executed (the LL and the SC on pre_values[j]) together with one read of values[j].

  3. The if statement after the loop contains just a single SC operation on values[j].

Thus, ApplyUpdate executes O(1) shared memory accesses.

We now proceed with the worst-case analysis of the step complexity of any UPDATE. The loop of UPDATE can be executed at most two times and contains an LL on values[j], an SC on values[j] and two invocations of ApplyUpdate. We previously proved that any ApplyUpdate executes O(1) shared memory accesses. It follows that any UPDATE operation executes O(1) shared memory accesses.

Finally, the worst-case analysis of the step complexity of any SCAN is as follows.

  1. A write operation on the shared variable seq is executed at the beginning of SCAN.

  2. The loop of SCAN is executed exactly m times. In each iteration of the loop, an invocation of ApplyUpdate is executed and two read operations (on values[j] and pre_values[j]) are performed.

It follows that any SCAN operation executes O(m) shared memory accesses.

Both the partial and the non-partial 1-Snap provide the same O(1) step complexity for UPDATE operations, and both have the same space complexity of O(m). However, partial 1-Snap provides a step complexity of O(|A|) for PARTIAL_SCAN operations, where |A| is the number of elements contained in A. In contrast, the step complexity that non-partial 1-Snap provides for SCAN operations is O(m), higher than that of the partial version, since |A| ≤ m.

The space complexity of the 1-Snap algorithm is measured by counting the number of shared registers that are needed for its implementation. The implementation of 1-Snap deploys three different shared objects:

  1. A shared integer variable called seq, which is stored in a multi-reader/multi-writer R/W register.

  2. A shared array called values that consists of m LL/SC & R/W registers.

  3. A shared array called pre_values that consists of m LL/SC & R/W registers.

Thus, our implementation deploys 2m unbounded LL/SC & R/W registers and one R/W register. It follows that the space complexity of our algorithm is O(m).

The implementation of 1-Snap presented in this work uses registers of unbounded size (each stores one sequence number together with two val values). Although the registers are in principle unbounded, it can be proven that their sequence-number fields only need O(log r) bits, where r is the maximum number of SCAN operations in a given execution. Thus, in executions where the maximum number of SCAN operations is not too big, 1-Snap may use bounded registers.
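As a back-of-the-envelope sketch of this remark (under our reading that the sequence number is the only field that grows): seq is incremented once per SCAN, so after r SCAN operations every sequence-number field stores a value in {0, 1, …, r}, and a register of size

    \lceil \log_2 (r+1) \rceil + c \text{ bits}

suffices, where c is the (bounded) number of bits needed for the val fields stored alongside the sequence number.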

Theorem 3.1.

1-Snap is a wait-free linearizable concurrent single-scanner snapshot implementation that uses O(m) LL/SC & R/W registers, and it provides O(1) step complexity to UPDATE operations and O(m) to SCAN operations.

4 λ-Snap

In this section, we present the λ-Snap snapshot object; its data structures and pseudocode are shown in the listings below.

struct value_struct {
        val value;              // current value of the component
        val proposed_value;     // value that an announced UPDATE wants to apply
        int seq;                // sequence number of the last applied UPDATE
};
struct pre_value_struct {
        val value;              // previous value of the component
        int seq;                // sequence number of that previous value
};
struct scan_struct {
        int seq;                // sequence number of the scanner's current SCAN
        boolean write_enable;   // whether the scanner still needs a sequence number
};
shared int seq;                 // shared sequence-number register (LL/SC)
shared value_struct values[0..m-1]=[<NULL,NULL,NULL>,…,<NULL,NULL,NULL>];
shared pre_value_struct pre_values[0..λ-1][0..m-1]=[<NULL,NULL>,…,<NULL,NULL>];
shared scan_struct s_table[0..λ-1]=[<NULL,0>,<NULL,0>,…,<NULL,0>];
private val view[0..m-1]=[NULL,NULL,…,NULL];   // each scanner's local copy of the components

In λ-Snap, only a predefined set of λ processes is allowed to invoke SCAN operations, while all processes can perform UPDATE operations on any component. Each applied operation gets a sequence number by reading the shared register seq. Sequence numbers are assigned both to SCAN and UPDATE operations. More specifically, SCAN operations get a sequence number at the beginning of their execution, while UPDATE operations get their actual sequence number at the point where they successfully update the component with their value. We often refer to this as the sequence number of the operation. One role of the sequence number is that an operation with a smaller sequence number than that of another operation is considered to be applied before it. Also, a sequence number predetermines which UPDATE operations are visible to a SCAN operation. More specifically, UPDATE operations that have been applied with a sequence number greater than or equal to the sequence number of a SCAN operation are not visible to this SCAN.

void UPDATE(int j, val value){
        int i;
        struct value_struct up_value, cur_value;
        for (i=0; i<2; i++){                    // at most two attempts
                cur_value=LL(values[j]);
                up_value=cur_value;
                up_value.proposed_value=value;
                if (cur_value.proposed_value==NULL){
                        if (SC(values[j],up_value)){    // announce the new value
                                ApplyUpdate(j);
                                break;
                        }
                }
                ApplyUpdate(j);                 // help the previously announced UPDATE
        }
}
pointer SCAN(){
        int i, j, cur_seq;
        struct scan_struct cur_s_table;
        struct value_struct v1;
        struct pre_value_struct v2;
        s_table[p_id]={seq,1};                  // announce this SCAN: record seq, set write_enable
        for (i=0;i<3;i++){                      // consensus-like protocol on seq
                cur_seq=LL(seq);
                for (j=0;j<λ;j++){              // help every announced scanner get a sequence number
                        cur_s_table=LL(s_table[j]);
                        if (cur_s_table.seq<seq+2 && cur_s_table.write_enable==1){
                                cur_s_table.write_enable=0;
                                cur_s_table.seq=seq+2;
                                SC(s_table[j],cur_s_table);
                        }
                }
                SC(seq,cur_seq+1);              // try to increase seq by one
        }
        for (j=0;j<m;j++){
                ApplyUpdate(j);                 // help a pending UPDATE on component j
                v1=values[j];
                v2=pre_values[p_id][j];
                if (v1.seq<s_table[p_id].seq){  // component not overwritten by a newer UPDATE
                        view[j]=v1.value;
                } else {
                        view[j]=v2.value;       // otherwise use the saved previous value
                }
        }
        return view[0..m-1];
}
void ApplyUpdate(int j) {
        int i, t, cur_seq;
        struct value_struct cur_value;
        struct pre_value_struct cur_pre_value, proposed_pre_value;
        cur_value=LL(values[j]);
        cur_seq=seq;
        for (i=0; i<λ; i++) {                   // save the current value for each of the λ scanners
                for (t=0; t<2; t++) {
                        cur_pre_value=LL(pre_values[i][j]);
                        cur_value=values[j];
                        if (cur_value.seq<s_table[i].seq){   // scanner i has not yet seen this value
                                proposed_pre_value.seq=cur_value.seq;
                                proposed_pre_value.value=cur_value.value;
                                SC(pre_values[i][j], proposed_pre_value);
                        }
                }
        }
        if (cur_value.proposed_value!=NULL) {   // apply the announced UPDATE, if any
                cur_value.value=cur_value.proposed_value;
                cur_value.seq=cur_seq;
                cur_value.proposed_value=NULL;
                SC(values[j], cur_value);
        }
}

For assigning sequence numbers to SCAN and UPDATE operations, λ-Snap employs the shared register seq, which takes integer values. Only SCAN operations are able to increase the value of seq by one. In contrast to 1-Snap, SCAN operations in λ-Snap get sequence numbers in a more complex way. More specifically, SCAN operations use a consensus-like protocol in order to increase seq (using LL/SC instructions) and get a new sequence number. In contrast to 1-Snap, more than one SCAN operation may get the same sequence number. However, for all SCAN operations that get the same sequence number, the following hold: (1) they are performed by different processes, (2) the increment of the seq register using SC instructions takes place inside their execution intervals, and (3) all these SCAN operations are eventually linearized at the point of the same increment of register seq. Note that an UPDATE operation op which has been applied with a sequence number greater than or equal to the sequence number of some SCAN operation S is not visible to S. Since op is not visible to S, op is linearized after S. In order to ensure that the SC instructions take place in the execution interval of a SCAN operation, and that SCAN operations help themselves as well as one another, the consensus-like protocol is executed three times (the outer loop at the beginning of SCAN).

Each process that is able to execute SCAN operations owns a shared array of m LL/SC & R/W registers, namely the row pre_values[p_id] of the pre_values matrix. This array of registers stores a previous value and its sequence number for each component that a SCAN wants to read. As a first step, each SCAN operation tries to increase the value of seq by executing the consensus-like protocol described above. Afterwards, for each component of the snapshot object, a SCAN operation performs the following steps: (1) it tries to copy the value of this component to every pre_values row that is used by SCAN operations (the nested loops of ApplyUpdate); (2) it tries to apply an announced UPDATE to this component of the snapshot object (the final if statement of ApplyUpdate); and (3) depending on whether the sequence number of the component is lower than the sequence number of the corresponding SCAN, it records either the current or the previous value of the component, and finally returns its copy of the snapshot object (the last line of SCAN).

We now concentrate on describing UPDATE operations. Each component of the snapshot object stores two values. The first one is the current value of the component (the value field of values[j]) and the second one is the proposed value (the proposed_value field of values[j]); simply put, this is the value that an UPDATE currently wants to write to the component. An UPDATE operation op on the j-th component executed by some process p first tries to propose the new value that it wants to store in the j-th component of the snapshot. This is achieved by trying to write to the proposed_value field of the j-th component of the snapshot object (the LL/SC pair in the body of UPDATE). Afterwards, it tries to copy the current value of the j-th component of the snapshot to every pre_values register for this component (one per scanner), if needed (the nested loops of ApplyUpdate). Then it tries to update the value of the j-th component of the snapshot using a local copy of seq as its sequence number (the final if statement of ApplyUpdate). If the proposal of the new value was successful, the UPDATE operation ends its execution. Otherwise, it repeats all the previous steps one last time. Doing so makes sure that an UPDATE operation (which may or may not be op) on the j-th component of the snapshot object is applied, and furthermore linearized, inside the execution interval of op. By writing a sequence number together with its value, an UPDATE operation that has been applied with a sequence number smaller than the sequence number of some SCAN operation S is visible to S.

In λ-Snap, we employ a helping mechanism where UPDATE and SCAN operations try to help SCAN operations that are slow or stalled (the nested loops of ApplyUpdate). More specifically, an UPDATE operation on some component cj helps at most λ SCAN operations on the j-th component. On the other hand, a SCAN operation helps at most λ SCAN operations per component that it reads. Thus, the non-partial version of λ-Snap helps at most λm SCAN operations (in the case of the partial version of λ-Snap, a PARTIAL_SCAN operation helps at most λ|A| SCAN operations, where |A| is the number of components it wants to read).

4.1 A partial version of λ-Snap

We now present a slightly modified version of λ-Snap (see the listing below) that implements a partial snapshot object. The data structures used in this modified version of λ-Snap remain exactly the same as in the original. Furthermore, the pseudocode of UPDATE and of the ApplyUpdate function remains the same. A new function is introduced, called PARTIAL_SCAN. This function is invoked by scanner processes in order to read the values of a subset of the components of the snapshot object.

pointer PARTIAL_SCAN(set A) {
        int i, j, cur_seq;
        struct scan_struct cur_s_table;
        s_table[p_id]={seq,1};                  // announce this PARTIAL_SCAN
        for (i=0; i<3; i++) {                   // consensus-like protocol on seq
                cur_seq=LL(seq);
                for (j=0;j<λ;j++) {
                        cur_s_table=LL(s_table[j]);
                        if (cur_s_table.seq<seq+2 && cur_s_table.write_enable==1) {
                                cur_s_table.write_enable=0;
                                cur_s_table.seq=seq+2;
                                SC(s_table[j],cur_s_table);
                        }
                }
                SC(seq,cur_seq+1);
        }
        for each j in A {
                ApplyUpdate(j);                 // help pending UPDATEs on component j
                Read(j);
        }
        return view[0..m-1];                    // only the entries with index in A are meaningful
}
val Read(int j){
        struct value_struct v1;
        struct pre_value_struct v2;
        v1=values[j];
        v2=pre_values[p_id][j];
        if (v1.seq<s_table[p_id].seq){
                view[j]=v1.value;
        }else{
                view[j]=v2.value;
        }
        return view[j];
}

The only modification in this version of λ-Snap is that the PARTIAL_SCAN operations do not read every component of the snapshot object; they read only the components of the set A. For each component cj that is contained in A (the set of components that a PARTIAL_SCAN wants to read), the PARTIAL_SCAN operation tries to help UPDATE operations on the j-th component by invoking the ApplyUpdate function. Afterwards, it reads the value of the j-th component by invoking the Read function.

Both the partial and the non-partial λ-Snap have the same step complexity for UPDATE operations and the same space complexity. However, partial λ-Snap provides a step complexity of O(λ|A|) for PARTIAL_SCAN operations, where |A| is the number of components that the operation reads.

4.2 Step and space complexity of λ-Snap

The step complexity of an operation of λ-Snap is measured by the number of operations that are executed on shared registers inside its execution interval.

We start with the worst-case analysis of ApplyUpdate.

  1. Before its loops, ApplyUpdate performs only an LL operation on values[j] and a read of the shared variable seq.

  2. The outer loop of ApplyUpdate is executed exactly λ times, and in each of its iterations the inner loop is executed exactly two times. In each iteration of the inner loop, at most four shared register operations are executed: an LL on pre_values[i][j], two read operations (on values[j] and on s_table[i]) and an SC on pre_values[i][j]. Thus, each iteration of the outer loop executes at most eight shared register operations, and since the outer loop is executed exactly λ times, the nested loops execute at most 8λ shared register operations.

  3. The if statement after the loops contains just a single SC operation on values[j].

It follows that ApplyUpdate executes at most 8λ + 3 shared memory accesses. Thus, ApplyUpdate has a step complexity of O(λ).

We now proceed with the worst-case analysis of the step complexity of any UPDATE. The loop of UPDATE can be executed at most two times and contains an LL on values[j], an SC on values[j] and two invocations of ApplyUpdate. We previously proved that any ApplyUpdate executes O(λ) shared memory accesses. It follows that any UPDATE operation executes O(λ) shared memory accesses.

We can finally proceed with the worst-case analysis of the step complexity of any SCAN.

  1. A write operation on the shared array s_table is executed at the beginning of SCAN.

  2. The consensus-like protocol is a loop that is executed exactly three times. In each iteration of that loop, an LL on seq and an SC on seq are executed. Furthermore, the inner loop over the λ entries of s_table is executed, and each of its λ iterations performs at most three shared memory accesses (an LL on s_table[j], a read of the shared variable seq and an SC on s_table[j]). It follows that the inner loop executes O(λ) shared memory accesses, and since the outer loop is executed exactly three times, the whole protocol executes O(λ) shared memory accesses.

  3. The last loop of SCAN is executed exactly m times. In each iteration of that loop, ApplyUpdate is invoked and two read operations are performed (on values[j] and pre_values[p_id][j]). Since ApplyUpdate executes O(λ) shared memory accesses and it is invoked exactly m times, this loop executes O(λm) shared memory accesses.

It follows that any SCAN operation executes O(λm) shared memory accesses.
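Putting the three counts together, a sketch of the total (using the bound of at most 8λ + 3 accesses per ApplyUpdate derived earlier):

    \underbrace{1}_{\text{write to } s\_table} \;+\; \underbrace{3\,(2 + 3\lambda)}_{\text{consensus-like protocol}} \;+\; \underbrace{m\,(8\lambda + 3 + 2)}_{\text{loop over the } m \text{ components}} \;=\; O(\lambda m).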

The space complexity of the λ-Snap algorithm is measured by counting the number of shared registers that are needed for its implementation. The implementation of λ-Snap deploys four different shared objects:

  1. A shared LL/SC register called seq.

  2. A shared array called values that consists of m LL/SC & R/W registers.

  3. A shared array called pre_values that consists of λm LL/SC & R/W registers.

  4. A shared array called s_table that consists of λ LL/SC & R/W registers.

Thus, our implementation deploys λm + m + λ + 1 registers. It follows that the space complexity of our algorithm is O(λm).

Theorem 4.1.

λ-Snap is a wait-free linearizable concurrent λ-scanner snapshot implementation that uses O(λm) LL/SC & R/W registers, and it provides O(λ) step complexity to UPDATE operations and O(λm) to SCAN operations.

5 Discussion

This work proposes the λ-scanner snapshot object and its implementations, providing a solution to the single-scanner snapshot problem and the multi-scanner snapshot problem simultaneously. If λ is equal to 1, then our algorithm simulates a single-scanner snapshot object, while if λ is equal to the maximum number of processes n, then it simulates a multi-scanner snapshot object. To the best of our knowledge, there is no prior publication that provides a solution to the snapshot problem supporting a preset number of SCAN operations that may run concurrently.

1-Snap solves the single-scanner flavor of the snapshot problem. Although in our algorithm we only allow one process with a certain id to invoke SCAN operations, this is a restriction that can be easily lifted. The system can support invocations of SCAN operations by any process, provided that only one SCAN operation is active in any given configuration of the execution. In this case, our algorithm would be correct only in executions where no more than one SCAN is active in any given configuration of the execution.

A λ-Snap object can be efficiently applied in systems where only a preset number of processes may want to execute SCAN operations. Especially in systems where the number of processes that may want to invoke a SCAN operation is small enough, our algorithm has almost the same performance as a single-scanner snapshot object. An example of such a system is a sensor network, where many sensors communicate with a small number of monitoring devices. In this case, the sensors essentially perform UPDATE operations, while the monitoring devices may invoke SCAN operations.
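A hedged sketch of this scenario, with invented names for the sensor-side and monitor-side glue, using the λ-Snap interface from Section 4:

typedef int val;
extern void UPDATE(int j, val v);    /* λ-Snap UPDATE: callable by any process */
extern val *SCAN(void);              /* λ-Snap SCAN: callable by the λ predefined scanners */
extern val read_hardware_sensor(int sensor_id);   /* hypothetical driver call */
extern void render_dashboard(val *view);          /* hypothetical display call */

/* Each of the n sensors owns one component and publishes its latest sample. */
void sensor_loop(int sensor_id) {
        for (;;) {
                val sample = read_hardware_sensor(sensor_id);
                UPDATE(sensor_id, sample);       /* O(λ) steps per published sample */
        }
}

/* Each of the λ monitor devices repeatedly takes a consistent view of all sensors. */
void monitor_loop(void) {
        for (;;) {
                val *view = SCAN();              /* O(λm) steps per consistent view */
                render_dashboard(view);
        }
}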

References

  • [1] J. Añez, T. de la Barra, and B. Pérez (1996) Dual graph representation of transport networks. Transportation Research Part B: Methodological 30 (3), pp. 209–216.
  • [2] J. Aspnes and M. Herlihy (1990) Wait-free data structures in the asynchronous PRAM model. In Proceedings of the Second Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA ’90, New York, NY, USA, pp. 340–349.
  • [3] H. Attiya, R. Guerraoui, and E. Ruppert (2008) Partial snapshot objects. In Proceedings of the Twentieth Annual Symposium on Parallelism in Algorithms and Architectures, SPAA ’08, New York, NY, USA, pp. 336–343.
  • [4] H. Attiya, M. Herlihy, and O. Rachman (1995) Atomic snapshots using lattice agreement. Distributed Computing 8 (3), pp. 121–132.
  • [5] H. Attiya, N. Lynch, and N. Shavit (1994) Are wait-free algorithms fast?. J. ACM 41 (4), pp. 725–763.
  • [6] G. Bar-Nissan, D. Hendler, and A. Suissa (2011) A dynamic elimination-combining stack algorithm. CoRR abs/1106.6304.
  • [7] T. Brown, F. Ellen, and E. Ruppert (2014) A general technique for non-blocking trees. In Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’14, New York, NY, USA, pp. 329–342.
  • [8] V. Bulitko, Y. Björnsson, N. R. Sturtevant, and R. Lawrence (2011) Real-time heuristic search for pathfinding in video games. In Artificial Intelligence for Computer Games, pp. 1–30.
  • [9] P. Fatourou and N. D. Kallimanis (2007) Time-optimal, space-efficient single-scanner snapshots & multi-scanner snapshots using CAS. In Proceedings of the Twenty-Sixth Annual ACM Symposium on Principles of Distributed Computing, PODC ’07, New York, NY, USA, pp. 33–42.
  • [10] P. Fatourou and N. D. Kallimanis (2014) Highly-efficient wait-free synchronization. Theory of Computing Systems 55 (3), pp. 475–520.
  • [11] P. Fatourou and N. D. Kallimanis (2017) Lower and upper bounds for single-scanner snapshot implementations. Distributed Computing 30 (4), pp. 231–260.
  • [12] P. Fatourou and N. D. Kallimanis (2006) Single-scanner multi-writer snapshot implementations are fast!. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Principles of Distributed Computing, PODC ’06, pp. 228–237.
  • [13] R. Gawlick, N. Lynch, and N. Shavit (1992) Concurrent timestamping made simple. In Theory of Computing and Systems, D. Dolev, Z. Galil, and M. Rodeh (Eds.), Berlin, Heidelberg, pp. 171–183.
  • [14] D. Imbs and M. Raynal (2012) Help when needed, but no more: efficient read/write partial snapshot. Journal of Parallel and Distributed Computing 72 (1), pp. 1–12.
  • [15] P. Jayanti and S. Petrovic (2005) Logarithmic-time single deleter, multiple inserter wait-free queues and stacks. In FSTTCS 2005: Foundations of Software Technology and Theoretical Computer Science, S. Sarukkai and S. Sen (Eds.), Berlin, Heidelberg, pp. 408–419.
  • [16] P. Jayanti and S. Petrovic (2006) Efficiently implementing a large number of LL/SC objects. In Principles of Distributed Systems, J. H. Anderson, G. Prencipe, and R. Wattenhofer (Eds.), pp. 17–31.
  • [17] P. Jayanti (2002) F-arrays: implementation and applications. In Proceedings of the Twenty-First Annual Symposium on Principles of Distributed Computing, PODC ’02, New York, NY, USA, pp. 270–279.
  • [18] P. Jayanti (2005) An optimal multi-writer snapshot algorithm. In Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing, STOC ’05, New York, NY, USA, pp. 723–732.
  • [19] F. M. Johannes (1996) Partitioning of VLSI circuits and systems. In Proceedings of the 33rd Annual Design Automation Conference, DAC ’96, New York, NY, USA, pp. 83–87.
  • [20] N. Kallimanis and E. Kanellou (2016) Wait-free concurrent graph objects with dynamic traversals. In 19th International Conference on Principles of Distributed Systems (OPODIS 2015).
  • [21] L. M. Kirousis, P. Spirakis, and P. Tsigas (1994) Reading many variables in one atomic operation: solutions with linear or sublinear complexity. IEEE Transactions on Parallel and Distributed Systems 5 (7), pp. 688–696.
  • [22] A. Kogan and E. Petrank (2011) Wait-free queues with multiple enqueuers and dequeuers. Vol. 46, pp. 223–234.
  • [23] M. M. Michael (2004) Practical lock-free and wait-free LL/SC/VL implementations using 64-bit CAS. In Distributed Computing, R. Guerraoui (Ed.), Berlin, Heidelberg, pp. 144–158.
  • [24] Y. Riany, N. Shavit, and D. Touitou (2001) Towards a practical snapshot algorithm. Theoretical Computer Science 269 (1), pp. 163–201.
  • [25] S. Timnat, A. Braginsky, A. Kogan, and E. Petrank (2012) Wait-free linked-lists. In Principles of Distributed Systems, R. Baldoni, P. Flocchini, and R. Binoy (Eds.), Berlin, Heidelberg, pp. 330–344.