Bucket Oblivious Sort: An Extremely Simple Oblivious Sort

by Gilad Asharov, et al.

We propose conceptually simple oblivious sort and oblivious random permutation algorithms called bucket oblivious sort and bucket oblivious random permutation. Bucket oblivious sort uses $6n \log n$ time (measured by the number of memory accesses) and $2Z$ client storage with an error probability exponentially small in $Z$. The above runtime is only $3\times$ slower than a non-oblivious merge sort baseline; for $2^{30}$ elements, it is $5\times$ faster than bitonic sort, the de facto oblivious sorting algorithm in practical implementations.




1 Introduction

With the increased use of outsourced storage and computation, privacy of the outsourced data has been of paramount importance. A canonical setting is where a client with a small local storage outsources its encrypted data to an untrusted server. In this setting, encryption alone is not sufficient to preserve privacy. The access patterns to the data may reveal sensitive information.

Two fundamental building blocks for oblivious storage and computation [GO96, GM11, SS13] are oblivious sorting and oblivious random permutation. In these two problems, an array of elements is stored on an untrusted server, encrypted under a trusted client’s secret key. The client wishes to sort or permute the elements in a data-oblivious fashion. That is, the sequence of accesses it makes to the server should not reveal any information about the elements (e.g., their relative ranking). The client has a small amount of local storage, the access pattern to which cannot be observed by the server. This work presents simple and efficient algorithms for these two problems, named bucket oblivious sort and bucket oblivious random permutation.

1.1 State of the Affairs

For oblivious sort, it is well-known that one can leverage sorting networks such as AKS [AKS83] and Zig-zag sort [Goo14] to obliviously sort $n$ elements in $O(n \log n)$ time. Unfortunately, these algorithms are complicated and incur enormous constants, rendering them completely impractical. Thus, almost all known practical implementations [SS13, LWN15, NWI15] instead employ the simple bitonic sort algorithm [Bat68]. While asymptotically worse at $O(n \log^2 n)$, bitonic sort performs much better in practice due to its small leading constants.

Oblivious random permutation (ORP) can be realized by assigning a sufficiently long random key to each element, and then obliviously sorting the elements by the keys. To the best of our knowledge, this remains the most practical solution for ORP. It then follows that while $O(n \log n)$ algorithms exist in theory, practical instantiations resort to bitonic sort. There exist algorithms such as the Melbourne shuffle [OGTU14] that do not rely on oblivious sort, but they require $O(\sqrt{n})$ client storage to permute $n$ elements. Other approaches include the famous Thorp shuffle [CV14] and random permutation networks [Czu15], but none of these solutions are competitive in performance either asymptotically or concretely.

1.2 Our Results

Let $Z$ be a statistical security parameter that controls the error probability. Our bucket oblivious sort runs in $6n \log n$ time ($4n \log n$ for bucket ORP) and has an error probability around $e^{-Z/6}$ when the client can store $2Z$ elements locally. This is at most $3\times$ slower than the non-oblivious merge sort, and is at least $5\times$ faster than bitonic sort for $n = 2^{30}$ (cf. Table 1). Therefore, we recommend bucket oblivious sort and bucket ORP as attractive alternatives to bitonic sort in practical implementations.

Table 1: Runtime of bucket oblivious sort and classic non-oblivious and oblivious sort algorithms. Bitonic sort requires about $\frac{1}{4} n \log^2 n$ comparisons. The numbers of comparisons for AKS sort and Zig-zag sort are cited from [Goo14]. Runtime represents the number of memory accesses, which is four times the number of comparisons.
Figure 1: Oblivious random bin assignment with 8 buckets. The MergeSplit procedure takes elements from two buckets at level $i$ and puts them into two buckets at level $i+1$, according to the $(i+1)$-th most significant bit of the keys. At level $i$, every $2^i$ consecutive buckets are semi-sorted by the most significant $i$ bits of the keys.

The core of our algorithms is to assign each element to a random bin and then route the elements through a butterfly network to their assigned random bins. This part is inspired by Bucket ORAM [FNR15]. In more detail, we divide the $n$ elements into $B = 2n/Z$ buckets of size $Z/2$ each and add $Z/2$ dummy elements to each bucket. Now, imagine that these $B$ buckets form the inputs of a butterfly network (for simplicity, assume $B$ is a power of two). Each element is uniformly randomly assigned to one of the $B$ output buckets, represented by a key of $\log B$ bits. The elements are then routed through the butterfly network to their respective destinations. Assuming the client can store two buckets locally at a time, at level $i$, the client simply reads elements from two buckets that are distance $2^i$ away in level $i$ and writes them to two adjacent buckets in level $i+1$, using the $(i+1)$-th most significant bit of each element’s key to make the routing decision. We refer readers to Figure 1 for a graphical illustration.

The above algorithm is clearly oblivious, as the order in which the client reads and writes the buckets is fixed and independent of the input array. If no bucket overflows, all elements reach their assigned destinations. By setting $Z$ appropriately, we can bound the overflow probability.

Our bucket oblivious sort and bucket ORP algorithms are derived from the above oblivious random bin assignment building block.

From oblivious random bin assignment to ORP and oblivious sort.

To obtain a random permutation, we simply remove all dummy elements and randomly permute each bucket of the final layer. Since the client can hold an entire bucket of $Z$ elements, permuting each bucket can be done locally. We show that the algorithm is oblivious and gives a random permutation despite revealing the number of dummy elements in each destination bucket. To get oblivious sort, we can first perform ORP on the input array and then apply any non-oblivious, comparison-based sorting algorithm (e.g., quick sort or merge sort). We show that the composition of ORP and non-oblivious sort results in an oblivious sort.

Dealing with small client storage.

In Section 4.1, we extend our algorithms to support $O(1)$ client storage. We can rely on bitonic sort to realize the MergeSplit operation, which operates on 4 buckets at a time; this results in $O(n \log n \log^2 Z)$ runtime.


Locality.

Algorithmic performance when the data is stored on disk has been studied in the external disk model (e.g., [RW94, AFGV97, Vit01, Vit06] and references within). Recently, Asharov et al. [ACN19] extended this study to oblivious algorithms. We discuss how our algorithms can be made locality-friendly in Section 4.3.

2 Preliminaries

Notations and conventions.

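Let $[x]$ denote the set $\{1, \ldots, x\}$. Throughout this paper, we will use $n$ to denote the size of the instance and use $\lambda$ to denote the security parameter. For an ensemble of distributions $\{D_\lambda\}$ (parametrized with $\lambda$), we denote by $x \leftarrow D_\lambda$ a sampling of an instance from the distribution $D_\lambda$. We say two ensembles of distributions $\{X_\lambda\}$ and $\{Y_\lambda\}$ are $\epsilon(\lambda)$-statistically-indistinguishable, denoted $\{X_\lambda\} \approx_{\epsilon(\lambda)} \{Y_\lambda\}$, if for any unbounded adversary $\mathcal{A}$,

$$\left| \Pr_{x \leftarrow X_\lambda}\left[\mathcal{A}(1^\lambda, x) = 1\right] - \Pr_{y \leftarrow Y_\lambda}\left[\mathcal{A}(1^\lambda, y) = 1\right] \right| \le \epsilon(\lambda).$$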

Random-access machines.

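A RAM is an interactive Turing machine that consists of a memory and a CPU. The memory is denoted as $\mathsf{mem}[N, w]$, and is indexed by the logical address space $[N] = \{1, 2, \ldots, N\}$. We refer to each memory word also as a block and we use $w$ to denote the bit-length of each block. The memory supports read/write instructions $(\mathsf{op}, \mathsf{addr}, \mathsf{data})$, where $\mathsf{op} \in \{\mathsf{read}, \mathsf{write}\}$, $\mathsf{addr} \in [N]$, and $\mathsf{data} \in \{0,1\}^w \cup \{\bot\}$. If $\mathsf{op} = \mathsf{read}$, then $\mathsf{data} = \bot$ and the returned value is the content of the block located in logical address $\mathsf{addr}$ in the memory. If $\mathsf{op} = \mathsf{write}$, then the memory data in logical address $\mathsf{addr}$ is updated to $\mathsf{data}$. We use the standard setting that $w = \Theta(\log N)$ (so a word can store an address).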

Obliviousness.

Intuitively, a RAM program $M'$ obliviously simulates a RAM program $M$ if: (1) it has the same input/output behavior as $M$; (2) there exists a simulator $\mathsf{Sim}$ that produces an access pattern that is statistically close to the access pattern of $M'(x)$, i.e., it can simulate all memory addresses accessed by $M'$ during the execution on $x$, without knowing $x$. In case the access pattern and the functionality are randomized, we have to consider the joint distribution of the simulator and the output of the RAM program or the functionality.

For a RAM machine $M$ and input $x$, let $\mathsf{AccessPattern}(M(x))$ denote the distribution of memory addresses the machine $M$ produces on the input $x$.

Definition 2.1.

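A RAM algorithm $M_{\mathcal{F}}$ obliviously implements the functionality $\mathcal{F}$ with $\epsilon$-obliviousness if the following holds: there exists a simulator $\mathsf{Sim}$ such that for every input $x$,

$$\left\{\left(\mathsf{Sim}(1^\lambda, |x|),\ \mathcal{F}(x)\right)\right\} \approx_{\epsilon(\lambda)} \left\{\left(\mathsf{AccessPattern}(M_{\mathcal{F}}(x)),\ M_{\mathcal{F}}(x)\right)\right\}.$$

If $\epsilon = 0$, we say $M_{\mathcal{F}}$ is perfectly oblivious.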

The two main functionalities that we focus on in this paper are the following:

Oblivious sort:

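This is a deterministic functionality in which the input is an array $A[1, \ldots, n]$ of memory blocks (i.e., each $A[i] \in \{0,1\}^w$, representing a key). The goal is to output an array $A'[1, \ldots, n]$ which is some permutation $\pi : [n] \to [n]$ of the array $A$, i.e., $A'[i] = A[\pi(i)]$, such that $A'[1] \le \cdots \le A'[n]$.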

Oblivious permutation:

This is a randomized functionality in which the input is an array $A[1, \ldots, n]$ of memory blocks. The functionality chooses a random permutation $\pi : [n] \to [n]$ and outputs an array $A'[1, \ldots, n]$ such that $A'[i] = A[\pi(i)]$ for every $i$.

3 Our Construction

We first present the oblivious random bin assignment algorithm (Section 3.1) and then use it to implement our bucket oblivious random permutation (Section 3.2) and bucket oblivious sort (Section 3.3).

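Algorithm 3.1: Oblivious Random Bin Assignment
Input: an array $X$ of size $n$
Choose a bucket size $Z$ and let $B$ be the smallest power of two that is $\ge 2n/Z$.
Define $\log B + 1$ arrays $A^{(0)}, \ldots, A^{(\log B)}$, each containing $B$ buckets of size $Z$. Denote the $j$-th bucket of the $i$-th array $A^{(i)}_j$.
For each element in $X$, assign a uniformly random key in $\{0, \ldots, B-1\}$.
Evenly divide $X$ into $B$ groups. Put the $j$-th group into $A^{(0)}_j$ and pad with dummy elements to have size $Z$.
for $i = 0, \ldots, \log B - 1$ do
     for $j = 0, \ldots, B/2 - 1$ do
          $j' \leftarrow \lfloor j / 2^i \rfloor \cdot 2^i$
          $(A^{(i+1)}_{2j}, A^{(i+1)}_{2j+1}) \leftarrow$ MergeSplit$(A^{(i)}_{j'+j}, A^{(i)}_{j'+j+2^i}, i)$
          Input: $j$-th pair of buckets with distance $2^i$ in $A^{(i)}$; Output: $j$-th pair of buckets in $A^{(i+1)}$
     end for
end for
Output: $A^{(\log B)}$.
function MergeSplit($A_0$, $A_1$, $i$)
     $A'_0$ receives all real elements in $A_0 \cup A_1$ where the $(i+1)$-st MSB of the key is 0
     $A'_1$ receives all real elements in $A_0 \cup A_1$ where the $(i+1)$-st MSB of the key is 1
     If either $A'_0$ or $A'_1$ receives more than $Z$ real elements, the procedure aborts with overflow
     Pad $A'_0$ and $A'_1$ to size $Z$ with dummy elements and return $(A'_0, A'_1)$
end function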

3.1 Oblivious Random Bin Assignment

The input to the oblivious random bin assignment algorithm is an array $X$ of $n$ elements. The goal is to obliviously and uniformly randomly distribute the elements into a set of $B$ bins. Each element is assigned to an independent random bin, and elements are then routed into the bins obliviously.

The algorithm first chooses a bucket size $Z$, which can be set to the security parameter $\lambda$. Then, it constructs $B = 2n/Z$ buckets each of size $Z$. Without loss of generality, assume $B$ is a power of 2 (if not, pad it to the next power of 2). Note that the algorithm introduces dummy elements, and the output is twice the size of the input array.

Figure 1 gives a graphic illustration of the algorithm for 8 input buckets and Algorithm 3.1 gives the pseudocode. Each element in $X$ is assigned a random key in $\{0, \ldots, B-1\}$ which represents a destination bucket. Next, the algorithm repeatedly calls the MergeSplit subroutine to exchange elements between bucket pairs over $\log B$ levels to distribute elements into their destination buckets. The operation involves four buckets at a time, distributing the elements in the two input buckets $A_0$ and $A_1$ into two output buckets $A'_0$ and $A'_1$. $A'_0$ receives all the keys with $(i+1)$-th most significant bit (MSB) as 0 and $A'_1$ receives all the keys with $(i+1)$-th MSB as 1.

For now, assume the client can locally store two buckets. For each MergeSplit, it reads (and decrypts) the two input buckets, swaps elements in the two buckets according to the above rule, and writes to the two output buckets (after re-encryption). It is then easy to see that Algorithm 3.1 is oblivious since the order in which the client reads and writes the buckets is fixed and independent of the input array.
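The butterfly routing can be sketched in Python as follows. This is a minimal in-memory illustration, not an oblivious implementation: there is no server, encryption, or client/server split, and the names `merge_split`, `random_bin_assignment`, and `DUMMY` are chosen for this example only. Buckets are indexed from 0, and the pairing rule is one indexing that realizes "the $j$-th pair of buckets at distance $2^i$".

```python
import random

DUMMY = None  # marker for dummy (padding) elements; real elements are (key, value)

def merge_split(a0, a1, bit_index, num_bits, Z):
    """Route the real elements of two buckets by one bit of their keys."""
    out0, out1 = [], []
    for elem in a0 + a1:
        if elem is DUMMY:
            continue
        key, _ = elem
        # bit_index = 0 selects the most significant of the num_bits key bits.
        bit = (key >> (num_bits - 1 - bit_index)) & 1
        (out1 if bit else out0).append(elem)
    if len(out0) > Z or len(out1) > Z:
        raise OverflowError("bucket overflow")
    out0 += [DUMMY] * (Z - len(out0))
    out1 += [DUMMY] * (Z - len(out1))
    return out0, out1

def random_bin_assignment(X, Z, rng=random):
    """Return B buckets of size Z; bucket b ends up holding the keys equal to b."""
    n = len(X)
    B = 2  # smallest power of two with B * Z >= 2n (this sketch uses at least 2)
    while B * Z < 2 * n:
        B *= 2
    num_bits = B.bit_length() - 1  # log2(B)
    keyed = [(rng.randrange(B), x) for x in X]
    # Spread the elements evenly over the B input buckets, then pad with dummies.
    level = [keyed[j::B] for j in range(B)]
    level = [bucket + [DUMMY] * (Z - len(bucket)) for bucket in level]
    for i in range(num_bits):
        nxt = [None] * B
        for j in range(B // 2):
            first = (j // 2**i) * 2**i + j  # first bucket of the j-th pair
            nxt[2 * j], nxt[2 * j + 1] = merge_split(
                level[first], level[first + 2**i], i, num_bits, Z)
        level = nxt
    return level
```

The loop structure (which buckets are read and written, and in what order) depends only on `n` and `Z`, which is the property the obliviousness argument relies on.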

When no bucket overflows, all real elements are correctly put into their assigned bins. We now show that the probability of overflow is exponentially small in $Z$. Intuitively, this is because each bucket contains (in expectation) half dummy elements that serve as a form of “slack” to prevent overflow.

Lemma 3.2.

Overflow happens with at most $\frac{2n}{Z} \log \frac{2n}{Z} \cdot e^{-Z/6}$ probability.


Proof. Consider a bucket $A^{(i)}_b$ at level $i$. Observe that this bucket can receive real elements from $2^i$ initial buckets, each containing $Z/2$ real elements. For each such element, we have chosen an independent and uniformly random key; the element reaches $A^{(i)}_b$ only when the most significant $i$ bits of its key match the $i$-bit prefix associated with $A^{(i)}_b$, which happens with exactly $2^{-i}$ probability. A Chernoff bound shows that $A^{(i)}_b$ overflows with less than $e^{-Z/6}$ probability. Hence, a union bound over all $\log B$ levels and all $B = 2n/Z$ buckets shows that overflow happens with less than $\frac{2n}{Z} \log \frac{2n}{Z} \cdot e^{-Z/6}$ probability. ∎
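The Chernoff step can be spelled out as follows. This is a sketch under stated assumptions: buckets have size $Z$ and initially hold at most $Z/2$ real elements each, $X$ (notation introduced here) is the number of real elements reaching a fixed bucket at level $i$, and the bound used is the standard multiplicative Chernoff bound $\Pr[X \ge (1+\delta)\mu] \le \exp(-\delta^2 \mu / 3)$ for $0 < \delta \le 1$, which also holds when $\mu$ is an upper bound on $\mathbb{E}[X]$.

```latex
\begin{align*}
\mathbb{E}[X] &\le 2^{i} \cdot \frac{Z}{2} \cdot 2^{-i} = \frac{Z}{2} =: \mu, \\
\Pr[X > Z] &\le \Pr[X \ge (1+1)\,\mu] \le \exp\!\left(-\frac{1^{2} \cdot \mu}{3}\right) = e^{-Z/6}, \\
\Pr[\text{some bucket overflows}] &\le B \log B \cdot e^{-Z/6},
\qquad B = \frac{2n}{Z}.
\end{align*}
```

The last line is the union bound over the $B$ buckets in each of the $\log B$ levels that receive routed elements.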

3.2 Bucket Oblivious Random Permutation

After performing the oblivious random bin assignment, ORP can be simply achieved as follows: scan the array and delete dummy elements from each bin (note that within each bin it is guaranteed that the real elements appear before the dummy elements). Then obliviously permute each bin and finally concatenate all bins. We have:
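The final ORP step can be sketched as below. It assumes the bins have already been produced by the bin assignment, uses `None` as a hypothetical dummy marker, and lets `random.shuffle` stand in for the per-bin permutation that the client performs locally; the function name `finish_orp` is illustrative.

```python
import random

DUMMY = None  # hypothetical marker for dummy elements

def finish_orp(bins, rng=random):
    """Drop dummies, permute each bin locally, and concatenate the bins."""
    output = []
    for bucket in bins:
        real = [e for e in bucket if e is not DUMMY]
        rng.shuffle(real)  # local permutation of a single bin
        output.extend(real)
    return output

bins = [
    ["a", "b", DUMMY, DUMMY],
    ["c", DUMMY, DUMMY, DUMMY],
    ["d", "e", "f", DUMMY],
]
result = finish_orp(bins)
```

Note that the length of each bin's contribution to `result` reveals the bin loads, which is exactly the leakage the proof of Lemma 3.3 accounts for.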

Lemma 3.3.

Bucket ORP obliviously implements the permutation functionality except with $\frac{2n}{Z} \log \frac{2n}{Z} \cdot e^{-Z/6}$ probability.


Proof. We first describe the simulator. The access pattern of the oblivious bin assignment algorithm is deterministic and the same for every input, and the overflow event is independent of the input itself. Therefore, it is easy to simulate the bin assignment. The simulator then simulates the random permutation of each bin. Then, the simulator chooses random loads $(\ell_1, \ldots, \ell_B)$, where $\ell_i$ is the load of the real elements in the $i$-th bin. This is done by simply throwing $n$ elements into $B$ bins (“in the head”). If there is some $i$ for which $\ell_i > Z$, then the simulator aborts. The removal of the dummy elements is equivalent to the revealing of these loads.

Clearly, $(\ell_1, \ldots, \ell_B)$ are distributed the same as in the real execution. The only difference between the simulated access pattern and the real one is in the case where the algorithm aborts as a result of an overflow before the last level, which occurs with at most $\frac{2n}{Z} \log \frac{2n}{Z} \cdot e^{-Z/6}$ probability.

We next show that the output of the algorithm is a random permutation, conditioned on the access pattern. As we previously described, it is actually enough to condition on the vector of random loads $L = (\ell_1, \ldots, \ell_B)$. We show that given any such vector, all permutations are equally likely.

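Fix a particular load $L = (\ell_1, \ldots, \ell_B)$. The algorithm works by first assigning the real elements into the bins, and then permuting within each bin. For every input, there are exactly $\binom{n}{\ell_1, \ldots, \ell_B} = \frac{n!}{\ell_1! \cdots \ell_B!}$ ways to distribute the real elements into the bins while achieving the vector of loads $L$. Then, each bin is individually permuted, i.e., within each bin $i$, we have $\ell_i!$ different possible orderings. Overall, the total number of possible outputs with that load is then

$$\binom{n}{\ell_1, \ldots, \ell_B} \cdot \ell_1! \cdots \ell_B! = n!.$$

That is, even conditioned on some specific loads $L$, all permutations are still equally likely. Therefore, for every input $X$, every load vector $L$, and every permutation $\pi$,

$$\Pr\left[\text{the output is } \pi(X) \mid L\right] = \frac{1}{n!}.$$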

Our algorithm fails to implement the ORP only when some bin overflows during the oblivious random bin assignment, which happens with probability at most $\frac{2n}{Z} \log \frac{2n}{Z} \cdot e^{-Z/6}$ by Lemma 3.2. ∎
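The counting argument in the proof can be checked by brute force on a small instance. The sketch below (function name and the example loads are illustrative choices) enumerates every assignment of 4 elements to 3 bins with loads (2, 1, 1) and every ordering inside each bin, then counts how often each concatenated output appears.

```python
from collections import Counter
from itertools import permutations, product

def enumerate_outputs(n, loads):
    """Count each output over all bin assignments with `loads` and all in-bin orders."""
    counts = Counter()
    num_bins = len(loads)
    for assignment in product(range(num_bins), repeat=n):
        if [assignment.count(b) for b in range(num_bins)] != list(loads):
            continue  # keep only assignments that realize the fixed load vector
        bins = [[x for x in range(n) if assignment[x] == b] for b in range(num_bins)]
        # Every combination of orderings, one ordering per bin.
        for orders in product(*(permutations(bucket) for bucket in bins)):
            counts[tuple(x for order in orders for x in order)] += 1
    return counts

counts = enumerate_outputs(4, (2, 1, 1))
```

Each of the $4! = 24$ permutations should appear exactly once in `counts`, matching the identity that the multinomial coefficient times the product of the $\ell_i!$ equals $n!$.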

3.3 Bucket Oblivious Sort

Once we have ORP, it is easy to achieve oblivious sort: just invoke any non-oblivious comparison-based sort after ORP.

Since the functionality is deterministic, it is enough to consider correctness and simulation separately. Correctness follows directly from the correctness of the ORP and the non-oblivious sort. As for obliviousness, given any input array, one can easily simulate the algorithm by first randomly permuting the array and then running the comparison-based non-oblivious sort. The access patterns of a comparison-based sort depend only on the relative ranking of the input elements, which is independent of the input array once the array has been randomly permuted.
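The composition can be sketched as below. This only shows the structure: `random.shuffle` is a stand-in for bucket ORP (an assumption of this example, since a local shuffle is not oblivious against a server), and the plain merge sort is the non-oblivious comparison-based sort. The function names are illustrative.

```python
import random

def merge_sort(items):
    """Ordinary (non-oblivious) comparison-based merge sort."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

def sort_after_permuting(items, rng=random):
    """Randomly permute first, then run the non-oblivious sort."""
    permuted = list(items)
    rng.shuffle(permuted)  # stand-in for bucket ORP (assumption)
    return merge_sort(permuted)

print(sort_after_permuting([5, 3, 9, 1, 3]))
```

The branches taken inside `merge_sort` depend on the input only through the relative order of the permuted elements, which is why running it after a random permutation leaks nothing about the original arrangement.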

3.4 Efficiency

We analyze the efficiency of our algorithms and compare them to classic non-oblivious and oblivious sorting algorithms in Table 1. We measure runtime using the number of memory accesses the client needs to perform on the server.

For our algorithms, assuming the client can store $2Z$ elements locally, each $2n$-sized array is read and written once and there are $\log(2n/Z)$ of them. So oblivious bin assignment and bucket ORP run in (less than) $4n \log n$ time. Note that the last step of ORP, i.e., permuting each output bucket, can be incorporated with the last level of oblivious bin assignment. Bucket oblivious sort additionally invokes a non-oblivious sort, and thus runs in $6n \log n$ time. This is within $3\times$ of merge sort and beats bitonic sort when $n$ is moderately large; for example, it is $5\times$ faster than bitonic sort for $n = 2^{30}$. For an overflow probability of $2^{-80}$ and most reasonable values of $n$, $Z = 512$ suffices.
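The ratios quoted above can be reproduced with a few lines of arithmetic. In this sketch the cost formulas are assumptions inferred from the text: $2n \log n$ memory accesses for merge sort, $6n \log n$ for bucket oblivious sort, and $n \log^2 n$ for bitonic sort; `overflow_bound` assumes a union bound of the form $B \log B \cdot e^{-Z/6}$ with $B = 2n/Z$. All function names are illustrative.

```python
from math import exp, log2

def merge_sort_accesses(n):
    return 2 * n * log2(n)

def bucket_sort_accesses(n):
    return 6 * n * log2(n)

def bitonic_sort_accesses(n):
    return n * log2(n) ** 2

def overflow_bound(n, Z):
    """Union bound over B buckets and log B levels (assumed form)."""
    B = 2 * n / Z
    return B * log2(B) * exp(-Z / 6)

n = 2 ** 30
slowdown_vs_merge = bucket_sort_accesses(n) / merge_sort_accesses(n)
speedup_vs_bitonic = bitonic_sort_accesses(n) / bucket_sort_accesses(n)
print(slowdown_vs_merge, speedup_vs_bitonic, overflow_bound(n, 512))
```

Because the bitonic cost grows with $\log^2 n$ while the bucket cost grows with $\log n$, the speedup over bitonic sort under these formulas is $\log n / 6$ and increases with $n$.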

4 Extensions

4.1 Extension to Constant Client Storage

We now discuss how to extend our algorithms to the case where the client can only store $O(1)$ elements locally.

Each MergeSplit can be realized with a single invocation of bitonic sort. Concretely, we first scan the two input buckets to count how many real elements should go to bucket $A'_0$ vs. $A'_1$, then tag the correct number of dummy elements going to either bucket, and finally perform a bitonic sort.
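The count, tag, and sort steps can be sketched as follows. This is an illustration under assumptions: buckets are Python lists of equal power-of-two size `Z`, real elements are `(key, value)` pairs, `None` marks a dummy, and `bit_of` is a caller-supplied function returning the routing bit. The compare-exchange positions of `bitonic_sort` are fixed in advance, but the Python code itself is not constant-time.

```python
DUMMY = None  # hypothetical marker for dummy elements

def bitonic_sort(items, key):
    """In-place bitonic sorting network; len(items) must be a power of two."""
    n = len(items)
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    a, b = key(items[i]), key(items[partner])
                    if (ascending and a > b) or (not ascending and a < b):
                        items[i], items[partner] = items[partner], items[i]
            j //= 2
        k *= 2

def merge_split_bitonic(a0, a1, bit_of, Z):
    """MergeSplit realized by one bitonic sort over the 2Z input slots."""
    elems = a0 + a1
    # Scan once to count the real elements headed to each output bucket.
    c0 = sum(1 for e in elems if e is not DUMMY and bit_of(e) == 0)
    c1 = sum(1 for e in elems if e is not DUMMY and bit_of(e) == 1)
    if c0 > Z or c1 > Z:
        raise OverflowError("bucket overflow")
    dummies_to_0 = Z - c0  # the remaining dummies pad the second bucket
    tagged = []
    for e in elems:
        if e is DUMMY:
            side = 0 if dummies_to_0 > 0 else 1
            dummies_to_0 -= 1 if side == 0 else 0
            tagged.append((side, 1, e))
        else:
            tagged.append((bit_of(e), 0, e))
    # Sort by destination first, then real elements before dummies.
    bitonic_sort(tagged, key=lambda t: (t[0], t[1]))
    out = [e for _, _, e in tagged]
    return out[:Z], out[Z:]
```

Sorting on the pair (destination, is-dummy) also leaves the real elements ahead of the dummies inside each output bucket.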

Next, we need to permute each output bucket obliviously with $O(1)$ local storage. This can be done as follows. First, assign each element in a bucket a uniformly random label of $\Theta(\log n)$ bits. Then, obliviously sort the elements by their random labels using bitonic sort. Since the labels are “short” (i.e., logarithmic in size), we may have collisions with probability $n^{-c}$ for some constant $c$, in which case we simply retry. In expectation, it succeeds in $O(1)$ trials.
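The label-and-retry idea can be sketched as below. Assumptions of this example: Python's built-in `sorted` stands in for the bitonic sort, collisions are detected with a set rather than by an oblivious scan of adjacent sorted labels, and the names `permute_by_labels` and `collision_bound` are illustrative. `collision_bound` is the standard birthday-style union bound over pairs of labels.

```python
import secrets

def collision_bound(bucket_size, label_bits):
    """Upper bound on the probability that two labels in a bucket collide."""
    pairs = bucket_size * (bucket_size - 1) // 2
    return pairs / 2 ** label_bits

def permute_by_labels(bucket, label_bits):
    """Permute a bucket by sorting on fresh random labels, retrying on collision."""
    while True:
        labels = [secrets.randbits(label_bits) for _ in bucket]
        if len(set(labels)) == len(labels):  # all labels distinct
            break
    order = sorted(range(len(bucket)), key=labels.__getitem__)
    return [bucket[i] for i in order]

bucket = list("abcdefgh")
permuted = permute_by_labels(bucket, label_bits=40)
print(permuted, collision_bound(512, 40))
```

Retrying on collision matters for uniformity: with distinct labels every ordering of the bucket is equally likely, whereas ties would have to be broken by some fixed rule.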

Since we invoke $B/2 = n/Z$ instances of bitonic sort on $2Z$ elements at each level, the runtime is roughly $2n \log n \log^2 Z$.

4.2 Better Asymptotic Performance

Our algorithms can also be extended to have better asymptotic performance. For this instantiation, we use a primitive called oblivious tight compaction. Oblivious tight compaction receives $n$ elements each marked as either 0 or 1, and outputs a permutation of the $n$ elements such that all elements marked 0 appear before the elements that are marked 1. It is not hard to see that oblivious tight compaction can be used to achieve MergeSplit. Using the $O(1)$-client-storage and $O(n)$-time oblivious tight compaction construction from [AKL18], bucket oblivious sort achieves $O(n \log n + n \log^2 Z)$ runtime and $O(1)$ client storage. Setting $Z = \omega(1) \cdot \log n$, bucket oblivious sort achieves $O(n \log n)$ runtime, $O(1)$ client storage, and an error probability negligible in $n$.

4.3 Locality

Algorithmic performance when the data is stored on disk has been studied in the external disk model (e.g., [RW94, AFGV97, Vit01, Vit06] and references within). Recently, Asharov et al. [ACN19] extended this study to oblivious algorithms. In this setting, an algorithm is said to have $(d, \ell)$-locality if it has access to $d$ disks and accesses in total $\ell$ discontiguous memory regions in all disks combined. As an example, it is not hard to see that merge sort is a non-oblivious sorting algorithm that sorts an array of size $n$ in $O(n \log n)$ work and $(3, O(\log n))$-locality, whereas quick sort is not local for any reasonable $d$. This locality metric is motivated by the fact that real-world storage media such as disks support sequential accesses much faster than random seeks. Thus an algorithm that makes mostly sequential accesses would execute much faster in practice than one that makes mostly random accesses, even if the two have the same runtime in a standard word-RAM model.

Guided by this new metric, Asharov et al. [ACN19] consider how to design oblivious algorithms and ORAM schemes that achieve good locality. Since sorting is one of the most important building blocks in the design of oblivious algorithms, Asharov et al. [ACN19] inevitably show a locality-friendly sorting algorithm. Concretely, they show that there is a specific way to implement the bitonic sort meta-algorithm such that the entire algorithm requires accessing $O(\log^2 n)$ distinct memory regions (i.e., as many as the depth of the sorting network) and requires only 2 disks to be available; in other words, the algorithm achieves $(2, O(\log^2 n))$-locality.

We observe that our algorithm, when implemented properly, is a locality-friendly oblivious sorting algorithm. Our algorithm outperforms Asharov et al. [ACN19]’s scheme by an almost logarithmic factor improvement in locality. To achieve this, the crux is to implement all instances of MergeSplit in the same layer of the butterfly network while accessing a small number of discontiguous regions. Specifically, the MergeSplit operation works on 4 buckets at a time, while reading two buckets from the input layer, and writing to two consecutive buckets in the output layer. Moreover, the different invocations of MergeSplit on the same layer deal with consecutive buckets. By carefully distributing the buckets among the different disks, and by using bitonic sort while implementing the MergeSplit operation, we conclude:

Corollary 4.1.

There exists a statistically oblivious sort algorithm which, except with $\approx e^{-Z/6}$ probability, completes in $O(n \log n \log^2 Z)$ work and with $(3, O(\log n \log^2 Z))$ locality.


  • [ACN19] Gilad Asharov, T-H Hubert Chan, Kartik Nayak, Rafael Pass, Ling Ren, and Elaine Shi. Locality-preserving oblivious RAM. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 214–243. Springer, 2019.
  • [AFGV97] Lars Arge, Paolo Ferragina, Roberto Grossi, and Jeffrey Scott Vitter. On sorting strings in external memory (extended abstract). In ACM Symposium on the Theory of Computing (STOC ’97), pages 540–548, 1997.
  • [AKL18] Gilad Asharov, Ilan Komargodski, Wei-Kai Lin, Kartik Nayak, Enoch Peserico, and Elaine Shi. OptORAMa: optimal oblivious RAM. Cryptology ePrint Archive, 2018.
  • [AKS83] Miklós Ajtai, János Komlós, and Endre Szemerédi. An $O(n \log n)$ sorting network. In Proceedings of the fifteenth annual ACM symposium on Theory of computing, pages 1–9. ACM, 1983.
  • [Bat68] Kenneth E Batcher. Sorting networks and their applications. In Proceedings of the April 30–May 2, 1968, spring joint computer conference, pages 307–314. ACM, 1968.
  • [CV14] Artur Czumaj and Berthold Vöcking. Thorp shuffling, butterflies, and non-markovian couplings. In ICALP (1), volume 8572 of Lecture Notes in Computer Science, pages 344–355. Springer, 2014.
  • [Czu15] Artur Czumaj. Random permutations using switching networks. In STOC, pages 703–712. ACM, 2015.
  • [FNR15] Christopher W Fletcher, Muhammad Naveed, Ling Ren, Elaine Shi, and Emil Stefanov. Bucket ORAM: Single online roundtrip, constant bandwidth oblivious RAM. Cryptology ePrint Archive, 2015.
  • [GM11] Michael T Goodrich and Michael Mitzenmacher. Privacy-preserving access of outsourced data via oblivious RAM simulation. In International Colloquium on Automata, Languages, and Programming, pages 576–587. Springer, 2011.
  • [GO96] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious rams. Journal of the ACM, 43(3):431–473, 1996.
  • [Goo10] Michael T Goodrich. Randomized Shellsort: A simple oblivious sorting algorithm. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 1262–1277. SIAM, 2010.
  • [Goo14] Michael T Goodrich. Zig-zag sort: A simple deterministic data-oblivious sorting algorithm running in $O(n \log n)$ time. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 684–693. ACM, 2014.
  • [LWN15] Chang Liu, Xiao Shaun Wang, Kartik Nayak, Yan Huang, and Elaine Shi. ObliVM: A programming framework for secure computation. In Symposium on Security and Privacy. IEEE, 2015.
  • [NWI15] Kartik Nayak, Xiao Shaun Wang, Stratis Ioannidis, Udi Weinsberg, Nina Taft, and Elaine Shi. GraphSC: Parallel secure computation made easy. In Symposium on Security and Privacy. IEEE, 2015.
  • [OGTU14] Olga Ohrimenko, Michael T Goodrich, Roberto Tamassia, and Eli Upfal. The melbourne shuffle: Improving oblivious storage in the cloud. In International Colloquium on Automata, Languages, and Programming, pages 556–567. Springer, 2014.
  • [RW94] Chris Ruemmler and John Wilkes. An introduction to disk drive modeling. IEEE Computer, 27(3):17–28, 1994.
  • [SS13] Emil Stefanov and Elaine Shi. Oblivistore: High performance oblivious cloud storage. In Symposium on Security and Privacy. IEEE, 2013.
  • [Vit01] Jeffrey Scott Vitter. External memory algorithms and data structures. ACM Comput. Surv., 33(2):209–271, 2001.
  • [Vit06] Jeffrey Scott Vitter. Algorithms and data structures for external memory. Foundations and Trends in Theoretical Computer Science, 2(4):305–474, 2006.