Consistency Analysis of Replication-Based Probabilistic Key-Value Stores

02/14/2020
by Ramy E. Ali, et al.
Penn State University

Partial quorum systems are widely used in distributed key-value stores due to their latency benefits at the expense of providing weaker consistency guarantees. The probabilistically bounded staleness framework (PBS) studied the latency-consistency trade-off of Dynamo-style partial quorum systems through Monte Carlo event-based simulations. In this paper, we study the latency-consistency trade-off for such systems analytically and derive a closed-form expression for the inconsistency probability. Our approach allows fine-tuning of latency and consistency guarantees in key-value stores, which is intractable using Monte Carlo event-based simulations.


I Introduction

Key-value stores commonly replicate the data across multiple nodes to make the data available and accessible with low latency despite possible failures or stragglers. In order to ensure strong consistency, these systems use strict quorums where the write and the read quorums must intersect [1, 2]. Specifically, in a system of $n$ servers where $k_w$ and $k_r$ denote the write and the read quorum sizes respectively, $k_w$ and $k_r$ are chosen such that $k_w + k_r > n$. In order to have fast access to the data, many key-value stores, including Amazon's Dynamo [3] and Cassandra [4], allow non-strict (partial, probabilistic or sloppy) quorums with $k_w + k_r \leq n$ to provide lower latency, and only guarantee that users will eventually read consistent data if there are no new write operations [5, 6]. However, eventual consistency does not specify how eventual or how fast this will happen.
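As a small illustration of this intersection condition (the helper below and its name are our own, not part of any of the systems cited above), a replication configuration can be classified as strict or partial as follows:

```python
def quorum_mode(n: int, k_w: int, k_r: int) -> str:
    """Classify a quorum configuration for n replicas.

    Strict quorums require every write quorum (size k_w) to intersect
    every read quorum (size k_r), i.e. k_w + k_r > n; otherwise the
    configuration is a partial (probabilistic) quorum.
    """
    return "strict" if k_w + k_r > n else "partial"

# Dynamo-style 3-way replication examples
print(quorum_mode(3, 2, 2))  # strict: 2 + 2 > 3
print(quorum_mode(3, 1, 1))  # partial: reads may miss the latest write
```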

Several works studied probabilistic quorum systems, attempting to quantify the staleness of the retrieved data and how soon users can retrieve consistent data, and to provide adaptive consistency guarantees depending on the application, including [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]. In [7], $\epsilon$-intersecting probabilistic quorum systems were designed such that the probability that any two quorums do not intersect is at most $\epsilon$. In [12], an adaptive approach was proposed that tunes the inconsistency probability, assuming that the response times of the servers are negligible, by controlling the number of servers involved in the read operations at run time based on a monitoring module. The monitoring module provides a real-time estimate of the network delays. In this approach, the write operation completes when any server responds to the write client. While the data is being propagated to the remaining servers, every server except the first one that responded to the write operation is pessimistically considered stale. Hence, this approach does not fully capture expanding write quorums (anti-entropy) [20].

In [11, 14], the trade-off that partial quorum systems provide between the staleness of the retrieved data and the latency was studied in $n$-way replication-based key-value stores. Specifically, this work answered the question of how stale the retrieved data is through the notion of $k$-staleness, which measures the probability that the users retrieve one of the latest $k$ complete versions. The question of how soon a user can read consistent data was also studied in [11, 14] through the notion of $t$-visibility, which measures the probability of returning the value of a write operation $t$ units of time after it completes. While the write operation completes upon receiving acknowledgments from any $k_w$ servers, more servers receive the write request after that and the write quorum continues to expand. Characterizing the $t$-visibility is challenging as it depends on how the write quorum expands based on the delays of the write and the read requests. Hence, the study of [14] focused on obtaining insights about this question for $n$-way replication through Monte Carlo event-based simulations.

In this paper, we study the problem of providing probabilistic consistency guarantees for partial quorum systems analytically for replication-based key-value stores. We study the inconsistency probability of such systems in terms of the quorum sizes and the mean write and read delays. For 3-way replication-based systems, we derive a simple closed-form expression for the inconsistency probability in terms of those parameters.

The rest of this paper is organized as follows. In Section II, we describe the system model and provide a background. In Section III, we study expanding quorums that have a dynamic size. We analyze the inconsistency probability of replication-based partial quorum systems in Section IV. Finally, concluding remarks are discussed in Section V.

II System Model and Background

In this section, we describe our system model and provide a brief background about the order statistics and the sum of exponential random variables.

II-A System Model: Partial Quorums

We consider a distributed system with $n$ servers, denoted by $1, 2, \dots, n$, storing a shared object. A client that issues a write request sends the request to all servers and waits for the acknowledgment of $k_w$ servers for the write operation to complete. We denote the time that a write request takes to reach server $i$, in addition to the server's response time, by $W_i$, where $i \in \{1, 2, \dots, n\}$. We assume that $W_1, W_2, \dots, W_n$ are independent and identically distributed exponential random variables with parameter $\lambda$. A client that issues a read request sends the request to all servers and waits for $k_r$ servers to respond. The time the read request takes to reach server $i$, plus the server's response time, is denoted by $R_i$. We assume that the read delays $R_1, R_2, \dots, R_n$ are independent and identically distributed exponential random variables with parameter $\nu$. Finally, we assume that write and read acknowledgments are instantaneous (see Remark 1).
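As a minimal sketch of this delay model (the function name, the example rate, and the use of NumPy are our own illustrative choices), the write latency is the $k_w$-th order statistic of the $n$ exponential write delays and can be estimated as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

def write_latency(n: int, k_w: int, lam: float, num_samples: int = 100_000) -> float:
    """Estimate the mean write latency: the k_w-th smallest of n i.i.d.
    Exp(lam) write delays, since the write completes once k_w servers ack."""
    delays = rng.exponential(scale=1.0 / lam, size=(num_samples, n))
    kth = np.sort(delays, axis=1)[:, k_w - 1]
    return float(kth.mean())

# Example: 3 replicas, write completes after one acknowledgment, rate lam = 1
print(write_latency(n=3, k_w=1, lam=1.0))  # ~1/3, the mean of min(W_1, W_2, W_3)
```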

In strict quorum systems, $k_w$ and $k_r$ are chosen such that $k_w + k_r > n$. In partial quorum systems, however, $k_w + k_r \leq n$, and hence the write and the read quorums may not intersect. This may result in a consistency violation. In real-world quorum systems, however, the write quorum expands as the write request propagates to more servers. In [11], the notion of $t$-visibility was developed, which aims to capture the probability of inconsistency for expanding quorums for a read operation that starts $t$ units of time after the write completes. Our goal in this work is to characterize the inconsistency probability for expanding quorums as a function of $t$ and the quorum sizes.

Remark 1.

While we assume that the write acknowledgments are instantaneous for simplicity, a deterministic acknowledgment delay, denoted by $\Delta$, can be taken into account by studying the consistency $t + \Delta$ units of time after $k_w$ servers respond to the write request.

II-B Background: Order Statistics and Sum of Exponentials

In this subsection, we provide a brief background on exponential random variables that we build on later in Section III to study expanding quorums. We first recall the following useful lemma [21] for the order statistics of independent exponential random variables with a common parameter $\lambda$.

Lemma 1 (Order Statistics of Independent Exponentials).

Let $X_1, X_2, \dots, X_n$ be independent and identically distributed random variables according to $\mathrm{Exp}(\lambda)$; then we have

$$X_{(j)} \overset{d}{=} \sum_{i=1}^{j} \frac{Z_i}{n-i+1}, \qquad j = 1, 2, \dots, n, \tag{1}$$

where $X_{(j)}$ denotes the $j$-th smallest of $X_1, X_2, \dots, X_n$, and $Z_1, Z_2, \dots, Z_n$ are independent $\mathrm{Exp}(\lambda)$ random variables.
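A quick Monte Carlo sanity check of this representation, as reconstructed above (the script and its parameter choices are ours), compares the mean of each order statistic with the mean of the corresponding partial sum:

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam, trials = 5, 2.0, 200_000

# Left-hand side: order statistics of n i.i.d. Exp(lam) samples.
samples = rng.exponential(scale=1.0 / lam, size=(trials, n))
order_stats = np.sort(samples, axis=1)

# Right-hand side: partial sums of independent Exp(lam) variables Z_i
# scaled by 1/(n - i + 1), as in the Renyi-style representation (1).
z = rng.exponential(scale=1.0 / lam, size=(trials, n))
scaled = z / (n - np.arange(1, n + 1) + 1)
renyi = np.cumsum(scaled, axis=1)

for j in range(n):
    print(j + 1, order_stats[:, j].mean(), renyi[:, j].mean())
# The two columns agree (up to Monte Carlo error) for every j.
```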

We also recall the following Lemma from [22] which studies the sum of independent exponential random variables with different parameters.

Lemma 2 (Sum of Exponentials).

Let $X_1, X_2, \dots, X_k$ be independent exponential random variables with distinct parameters $\mu_1, \mu_2, \dots, \mu_k$, respectively, where $f_{X_i}$ denotes the density function of $X_i$. The density function of

$$Y = \sum_{i=1}^{k} X_i \tag{2}$$

is given by

$$f_Y(y) = \left[\prod_{i=1}^{k} \mu_i\right] \sum_{i=1}^{k} \frac{e^{-\mu_i y}}{\prod_{j \neq i} \left(\mu_j - \mu_i\right)}, \qquad y \geq 0. \tag{3}$$
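The following sketch evaluates the density in (3) as reconstructed above and checks that it integrates to one and has mean $\sum_i 1/\mu_i$ (the function name and the example rates are ours):

```python
import numpy as np

def hypoexp_pdf(y, rates):
    """Density of a sum of independent exponentials with distinct rates mu_i,
    as in (3): f_Y(y) = [prod_i mu_i] * sum_i exp(-mu_i y) / prod_{j != i}(mu_j - mu_i)."""
    y = np.asarray(y, dtype=float)
    rates = np.asarray(rates, dtype=float)
    total = np.zeros_like(y)
    for i, mu_i in enumerate(rates):
        denom = np.prod(np.delete(rates, i) - mu_i)
        total += np.exp(-mu_i * y) / denom
    return np.prod(rates) * total

rates = [1.0, 2.0, 5.0]
y = np.linspace(0.0, 25.0, 5001)
pdf = hypoexp_pdf(y, rates)
dy = y[1] - y[0]

print(np.sum(pdf) * dy)      # ~1.0: the density integrates to one
print(np.sum(y * pdf) * dy)  # ~1/1 + 1/2 + 1/5 = 1.7: the mean of the sum
```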

III Expanding Quorums

In this section, we characterize the probability distribution of the number of servers in the write quorum $t$ units of time after the write completes. As we have explained, a client that issues a write request sends the request to all $n$ servers and waits to receive acknowledgments from any $k_w$ servers. The first $k_w$ received responses determine the write latency, but the write quorum continues to expand as more servers receive the write request. We denote the number of servers that have received the write value $t$ units of time after the write completes by $N(t)$, where $N(0) = k_w$. In Theorem 1, we characterize the probability mass function (PMF) of $N(t)$.

Theorem 1 (Dynamic Quorum Size).

The PMF of $N(t)$, the number of servers that have received a complete version $t$ units of time after the write completes, is given by

(4)
(5)

for $j = k_w, k_w + 1, \dots, n$.

We provide the proof of Theorem 1 in Appendix A.
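Since the closed form in (4)-(5) is not reproduced here, the PMF of $N(t)$ can also be estimated by directly simulating the model of Section II-A; the sketch below (our own, using the notation introduced above) is one way to generate curves such as those in Figs. 1 and 2:

```python
import numpy as np

rng = np.random.default_rng(2)

def pmf_quorum_size(n, k_w, lam, t, trials=200_000):
    """Monte Carlo estimate of P(N(t) = j): the write-quorum size t time
    units after the write completes, under i.i.d. Exp(lam) write delays."""
    delays = rng.exponential(scale=1.0 / lam, size=(trials, n))
    completion = np.sort(delays, axis=1)[:, k_w - 1]            # k_w-th fastest ack
    counts = (delays <= (completion + t)[:, None]).sum(axis=1)  # servers reached by time t
    return {j: float(np.mean(counts == j)) for j in range(k_w, n + 1)}

print(pmf_quorum_size(n=3, k_w=1, lam=1.0, t=0.5))
# At t = 0 the PMF puts all mass on j = k_w; as t grows it shifts toward j = n.
```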

In Figs. 1 and 2, we show the PMF of $N(t)$ for two representative choices of the parameters $n$, $k_w$, $\lambda$, and $t$.

Fig. 1: The probability mass function of $N(t)$.
Fig. 2: The probability mass function of $N(t)$.

IV Consistency Analysis

In this section, we study the inconsistency probability of replication-based partial quorum systems. The worst-case probability of inconsistency assuming non-expanding write quorums and instantaneous reads is given by

$$\frac{\binom{n-k_w}{k_r}}{\binom{n}{k_r}}. \tag{6}$$

Since the write quorum expands as the write request propagates to more servers, equation (6) is in fact an upper bound on the inconsistency probability [11].
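Assuming (6) is the non-intersection probability reconstructed above, the bound is straightforward to compute (the helper name is ours):

```python
from math import comb

def worst_case_inconsistency(n: int, k_w: int, k_r: int) -> float:
    """Probability that a read quorum of size k_r misses every one of the
    k_w replicas that acknowledged the write (non-expanding quorums,
    instantaneous reads), i.e. the bound in (6)."""
    if k_w + k_r > n:
        return 0.0          # strict quorums always intersect
    return comb(n - k_w, k_r) / comb(n, k_r)

# 3-way replication examples
print(worst_case_inconsistency(3, 1, 1))  # 2/3
print(worst_case_inconsistency(3, 1, 2))  # 1/3
print(worst_case_inconsistency(3, 2, 1))  # 1/3
print(worst_case_inconsistency(3, 2, 2))  # 0.0 (strict)
```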

Our objective in this section is to characterize the exact inconsistency probability for expanding quorums. The read client returns inconsistent data if the first $k_r$ servers that respond to the read request all return stale data. A server is considered stale if it replies to the read request before receiving the latest complete version. That is, server $i$ is stale if $W_i > W_{(k_w)} + t + R_i$. Denote the first $k_r$ servers that respond to the read request by $s_1, s_2, \dots, s_{k_r}$, where $s_1$ is the server that replies first, $s_2$ is the server that replies second, and so on. The event that server $s_j$ is stale is expressed as follows

(7)

The probability that a read returns stale data $t$ units of time after the latest version completes is the probability that all of $s_1, s_2, \dots, s_{k_r}$ return stale data. Thus, the inconsistency probability can be expressed as follows

(8)

We note that characterizing the inconsistency probability exactly is challenging as the staleness events of the servers $s_1, s_2, \dots, s_{k_r}$ are dependent; hence, we express the inconsistency probability as follows

(9)

In order to find the inconsistency probability, we first need to characterize the PMF of the number of servers in the write quorum $t + R_{(j)}$ units of time after the write completes, where $R_{(j)}$ denotes the $j$-th smallest read delay.

Lemma 3.

The probability mass function of the number of servers in the write quorum $t + R_{(j)}$ units of time after the write completes, where $j \in \{1, 2, \dots, k_r\}$, is given by

(10)
(11)

for $m = k_w, k_w + 1, \dots, n$.

The proof of Lemma 3 is straightforward, but we provide it in Appendix B for completeness.
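Assuming Lemma 3 concerns the quorum size at the time the $j$-th read response arrives, as reconstructed above, its PMF can be cross-checked by simulation (the sketch and parameter choices are ours):

```python
import numpy as np

rng = np.random.default_rng(4)

def pmf_quorum_at_read(n, k_w, lam, nu, t, j, trials=300_000):
    """Monte Carlo estimate of the PMF of the write-quorum size at time
    t + R_(j) after the write completes, where R_(j) is the j-th smallest
    of n i.i.d. Exp(nu) read delays."""
    w = rng.exponential(scale=1.0 / lam, size=(trials, n))
    r = rng.exponential(scale=1.0 / nu, size=(trials, n))
    completion = np.sort(w, axis=1)[:, k_w - 1]            # write latency
    r_j = np.sort(r, axis=1)[:, j - 1]                     # j-th smallest read delay
    counts = (w <= (completion + t + r_j)[:, None]).sum(axis=1)
    return {m: float(np.mean(counts == m)) for m in range(k_w, n + 1)}

print(pmf_quorum_at_read(n=3, k_w=1, lam=1.0, nu=1.0, t=0.2, j=1))
```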

In Theorem 2, we provide our main result, in which we characterize the inconsistency probability of the widely used 3-way replication technique.

Theorem 2 (Inconsistency Probability of Replication-based Systems with $n = 3$).
  • The worst-case inconsistency probability for the case where $k_w = 1$ and $k_r = 1$ is expressed as follows

    (12)
  • The worst-case inconsistency probability for the case where $k_w = 2$ and $k_r = 1$ is expressed as follows

    (13)
  • The worst-case inconsistency probability for the case where $k_w = 1$ and $k_r = 2$ is expressed as follows

    (14)

The proof of Theorem 2 can be found in Appendix C.
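While the closed-form expressions in (12)-(14) are not reproduced here, the inconsistency probability under the model of Section II-A can be estimated by direct simulation and used to cross-check them; the following sketch (our own, with illustrative parameter values) implements the staleness criterion described above:

```python
import numpy as np

rng = np.random.default_rng(3)

def inconsistency_prob(n, k_w, k_r, lam, nu, t, trials=500_000):
    """Monte Carlo estimate of the probability that a read issued t time
    units after the write completes returns stale data.

    Model: write delays W_i ~ Exp(lam), read delays R_i ~ Exp(nu), all i.i.d.;
    the write completes at the k_w-th smallest W_i; the read client waits for
    the k_r fastest read responses and is inconsistent if all of them come
    from servers that have not yet received the write.
    """
    w = rng.exponential(scale=1.0 / lam, size=(trials, n))
    r = rng.exponential(scale=1.0 / nu, size=(trials, n))
    completion = np.sort(w, axis=1)[:, k_w - 1]               # write latency
    respond_at = completion[:, None] + t + r                  # read-response times
    stale = w > respond_at                                    # replied before receiving the write
    first_kr = np.argsort(r, axis=1)[:, :k_r]                 # the k_r fastest responders
    all_stale = np.take_along_axis(stale, first_kr, axis=1).all(axis=1)
    return float(all_stale.mean())

# n = 3, k_w = k_r = 1: the estimate decays with t and stays below the bound in (6)
for t in (0.0, 0.5, 1.0, 2.0):
    print(t, inconsistency_prob(3, 1, 1, lam=1.0, nu=1.0, t=t))
```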

Remark 2.

It can be verified that at $t = 0$, the limit of the inconsistency probability of Theorem 2 as the read rate $\nu$ grows is equal to the inconsistency probability assuming instantaneous reads given in (6). That is, we have

(15)
Remark 3.

It is worth noting that the upper bound on the inconsistency probability given in (6) is quite loose. In order to see this, we observe that this bound gives an inconsistency probability of $1/3$ for the case where $k_w = 1$ and $k_r = 2$, and also for the case where $k_w = 2$ and $k_r = 1$. Hence, this bound does not differentiate between these two cases.
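Assuming (6) is the non-intersection probability as reconstructed above, the arithmetic behind Remark 3 for $n = 3$ is

$$\frac{\binom{3-1}{2}}{\binom{3}{2}} = \frac{1}{3} \quad \text{for } (k_w, k_r) = (1, 2), \qquad \frac{\binom{3-2}{1}}{\binom{3}{1}} = \frac{1}{3} \quad \text{for } (k_w, k_r) = (2, 1),$$

so the bound coincides for the two configurations even though, by Remark 4, the exact inconsistency probability is asymmetric in $k_w$ and $k_r$.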

We show the probability of inconsistency for the different cases in Fig. 3 as a function of $t$.

Fig. 3: The probability of inconsistency as a function of $t$.
Remark 4 (Asymmetry).

It is worth noting that the inconsistency probability is asymmetric in the write and read quorum sizes, and also in the write and read mean delays.

Remark 5 (Replication Factor).

While the case of $n = 3$ is the typical case in replication-based systems, our approach can also be used to derive the inconsistency probability for general $n$ and quorum sizes. In general, there are more cases to be considered. For instance, for one such configuration, the following cases lead to violating the consistency

  1. ,

  2. ,

  3. ,

  4. ,

  5. ,

  6. .

V Conclusion

In this paper, we have studied the consistency-latency trade-off of Dynamo-style replication-based key-value stores analytically and derived a closed-form expression for the inconsistency probability of the 3-way replication technique. Our study allows fine-tuning of latency and consistency guarantees based on the mean values of the write and read delays of the data store. An immediate future direction is to incorporate our tuning policy in a distributed key-value store and evaluate its performance. Extending this study to derive a tight upper bound on the inconsistency probability for any given distributions of the write delays, read delays, and acknowledgment delays is also an interesting future research direction.

VI Appendices

VI-A Proof of Theorem 1

For the case where , we have

where the last equality follows from Lemma 1.
For the case where , we have

where $D_i = Z_i/(n-i+1)$. Since $Z_1, Z_2, \dots, Z_n$ are independent and identically distributed exponential random variables with parameter $\lambda$, each $D_i$ is an exponential random variable with parameter $(n-i+1)\lambda$ by Lemma 1. Since the $D_i$ are independent exponential random variables, from Lemma 2, we have

VI-B Proof of Lemma 3

Based on Lemma 2, we can express the probability density function of

(16)

as follows

where . Therefore, from Theorem 1, we can express as follows

where and . Similarly for , we have

VI-C Proof of Theorem 2

The probability of inconsistency for the case where and can be expressed as follows

Similarly, for the case where and , we have

For the case where , we can express the probability of inconsistency as follows

If , it may happen that as well or and these two cases need to be handled separately. Therefore, we express the inconsistency probability as follows

It is important to note that

Hence, we have

where

(17)
(18)
(19)

and

(20)

Therefore, we can express the probability of inconsistency in this case as follows

Acknowledgment

The author would like to thank Viveck Cadambe and Mohammad Fahim for their helpful comments.

References

  • [1] N. A. Lynch, Distributed algorithms.   Elsevier, 1996.
  • [2] H. Attiya, A. Bar-Noy, and D. Dolev, “Sharing memory robustly in message-passing systems,” Journal of the ACM (JACM), vol. 42, no. 1, pp. 124–142, 1995.
  • [3] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, “Dynamo: amazon’s highly available key-value store,” in ACM SIGOPS operating systems review, vol. 41, no. 6.   ACM, 2007, pp. 205–220.
  • [4] A. Lakshman and P. Malik, “Cassandra: a decentralized structured storage system,” ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010.
  • [5] D. Abadi, “Consistency tradeoffs in modern distributed database system design: Cap is only part of the story,” Computer, vol. 45, no. 2, pp. 37–42, 2012.
  • [6] W. Vogels, “Eventually consistent,” Communications of the ACM, vol. 52, no. 1, pp. 40–44, 2009.
  • [7] D. Malkhi, M. K. Reiter, A. Wool, and R. N. Wright, “Probabilistic quorum systems,” Information and Computation, vol. 170, no. 2, pp. 184–206, 2001.
  • [8] X. Wang, S. Yang, S. Wang, X. Niu, and J. Xu, “An application-based adaptive replica consistency for cloud storage,” in 2010 Ninth International Conference on Grid and Cloud Computing.   IEEE, 2010, pp. 13–17.
  • [9] S. Sakr, L. Zhao, H. Wada, and A. Liu, “Clouddb autoadmin: Towards a truly elastic cloud-based data store,” in 2011 IEEE International Conference on Web Services.   IEEE, 2011, pp. 732–733.
  • [10] H. Wada, A. Fekete, L. Zhao, K. Lee, and A. Liu, “Data consistency properties and the trade-offs in commercial cloud storage: the consumers’ perspective.” in CIDR, vol. 11, 2011, pp. 134–143.
  • [11] P. Bailis, S. Venkataraman, M. J. Franklin, J. M. Hellerstein, and I. Stoica, “Probabilistically bounded staleness for practical partial quorums,” Proceedings of the VLDB Endowment, vol. 5, no. 8, pp. 776–787, 2012.
  • [12] H.-E. Chihoub, S. Ibrahim, G. Antoniu, and M. S. Perez, “Harmony: Towards automated self-adaptive consistency in cloud storage,” in 2012 IEEE International Conference on Cluster Computing.   IEEE, 2012, pp. 293–301.
  • [13] ——, “Consistency in the cloud: When money does matter!” in 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing.   IEEE, 2013, pp. 352–359.
  • [14] P. Bailis, S. Venkataraman, M. J. Franklin, J. M. Hellerstein, and I. Stoica, “Quantifying eventual consistency with PBS,” The VLDB Journal, vol. 23, no. 2, pp. 279–302, 2014.
  • [15] W. Golab, M. R. Rahman, A. AuYoung, K. Keeton, and I. Gupta, “Client-centric benchmarking of eventual consistency for cloud storage systems,” in 2014 IEEE 34th International Conference on Distributed Computing Systems.   IEEE, 2014, pp. 493–502.
  • [16] S. Liu, S. Nguyen, J. Ganhotra, M. R. Rahman, I. Gupta, and J. Meseguer, “Quantitative analysis of consistency in nosql key-value stores,” in International Conference on Quantitative Evaluation of Systems.   Springer, 2015, pp. 228–243.
  • [17] M. McKenzie, H. Fan, and W. Golab, “Fine-tuning the consistency-latency trade-off in quorum-replicated distributed storage systems,” in 2015 IEEE International Conference on Big Data (Big Data).   IEEE, 2015, pp. 1708–1717.
  • [18] S. Chatterjee and W. Golab, “Brief announcement: A probabilistic performance model and tuning framework for eventually consistent distributed storage systems,” in Proceedings of the ACM Symposium on Principles of Distributed Computing.   ACM, 2017, pp. 259–261.
  • [19] M. R. Rahman, L. Tseng, S. Nguyen, I. Gupta, and N. Vaidya, “Characterizing and adapting the consistency-latency tradeoff in distributed key-value stores,” ACM Transactions on Autonomous and Adaptive Systems (TAAS), vol. 11, no. 4, p. 20, 2017.
  • [20] A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, and D. Terry, “Epidemic algorithms for replicated database maintenance,” in Proceedings of the sixth annual ACM Symposium on Principles of distributed computing, 1987, pp. 1–12.
  • [21] A. Rényi, “On the theory of order statistics,” Acta Mathematica Hungarica, vol. 4, no. 3-4, pp. 191–231, 1953.
  • [22] M. Bibinger, “Notes on the sum and maximum of independent exponentially distributed random variables with different scale parameters,” arXiv preprint arXiv:1307.3945, 2013.