The growing adoption of IoT has led to an ever-increasing number of applications that collect sensitive user data. This growth has come with mounting concerns over protecting the data and the privacy of users. To date, the norm has been that user data is collected and governed by application providers, e.g., Fitbit. The problem with this status quo is that, because data lives in narrow and disjoint silos, it severely limits a user’s ability to control access to her data, extract additional value from it, or otherwise move data across applications. This problem has led many – from both the technical and non-technical communities – to call for new user-centric models for IoT services, in which the storage of user data is decoupled from the application logic, and control over access to this data is in the hands of end users rather than service providers [30, 102, 70, 107, 105].
However, if we are to realize this vision, we need system designs that guarantee the security and privacy of user data while at the same time ensuring that users can securely, selectively, and flexibly grant access to their data to third parties, i.e., principals. Realizing such flexible yet secure access control is key if we are to extract insightful value from user data, e.g., drive large-scale analytics from IoT data. (Note that users can always delegate control to a third-party provider just like today; this is permissible, just not the de-facto model.)
Such access control must ideally provide the following properties: (i) strong data confidentiality and integrity, with cryptographic guarantees, accompanied by efficient cryptographic operations. This is particularly essential in the context of resource-constrained IoT devices and the high volumes of time series data they generate. (ii) fine-grained access control, i.e., the ability to specify who can access what part of a data stream. (iii) no trusted intermediaries; systems today rely heavily on trusted intermediaries, e.g., for delegated access, making them trust bottlenecks. In addition to the above, any solution must satisfy standard requirements for access control, such as support for revocation and (optionally) auditability.
No existing solution simultaneously provides all of the above properties. The de-facto approach in deployments today [74, 9, 34, 53, 95] assumes that the entity that enforces access control – e.g., Fitbit or a storage provider – is within the data owner’s trusted domain and consequently can see data in the clear. However, this approach does not meet our goals of user-centric control (since the provider controls data access); in fact, as many have argued [72, 97, 91, 102, 33, 110, 92, 90, 89], this approach fails to provide even basic privacy since the provider sees data in cleartext and consequently can share or sell data without user authorization [100, 38].
The emerging alternative to the above approach is to ensure end-to-end encryption of data [47, 102, 110, 84]; here data is encrypted at the user device and stored encrypted at the storage provider; encryption/decryption is only executed at authorized parties, without disclosing any secret keys to the storage provider. This, however, introduces the challenge of providing selective sharing of encrypted data. Solutions adopted today for sharing [98, 64, 58] fall short in expressiveness (i.e., allowing fine-grained access policies), flexibility (i.e., updates to access control), and usability (i.e., key management and revocation). For instance, one approach is to encrypt data towards a principal’s public key; this suffers from hard-coded access control and does not scale to fine-grained access policies, especially when considering high-volume and high-velocity data streams. Attribute-based encryption [55, 104] is a promising alternative, but it is still prohibitively expensive in the context of large volumes of time series data (§4). Hence, existing approaches either fail to ensure security/privacy or are too inefficient to be practical.
The main question, and the focus of this paper, is then: how do we enforce access control in this architecture? A solution to access control has two parts: (i) data protection (e.g., encrypting data such that a principal can only access the authorized data segment), and (ii) authorization (e.g., verifying the identity of a principal and the access permissions).
The canonical authorization approach in today’s systems is to use a standalone authorization service that is decoupled from data protection; i.e., a data owner registers the authorized principals with an authorization service, such as OAuth2, which serves as a trusted intermediary to issue and later verify access tokens for resources at a service provider. Current authorization frameworks, besides suffering from several vulnerabilities [102, 33], have two key design problems that we address with this work. First, they require users to put unlimited trust in the intermediaries running the services. A few companies dominate this space and also learn about all services users interact with. Trusted third parties are, however, inherently prone to compromise, misconduct [38, 18, 15], and collusion/corruption. Second, these schemes leave the enforcement of access control to the service provider. Hence, they do not provide any assurance about data access, as they are decoupled from the underlying data protection. Consequently, users have no guarantees that their data will not be shared against their will, nor that the sharing relationship will remain private. Our system, Droplet, resembles an authorization service similar to OAuth2, but it does not suffer from these limitations.
In this paper, we devise a new system architecture and a crypto-based data access scheme to address the above problems. Droplet builds on two insights. The first is that access control and authorization need to be co-designed for end-to-end encrypted systems. The second is that there is a need for decentralized authorization services which operate without relying on trusted intermediaries. Hence, we opt to leverage blockchains, i.e., replicated state machines. In contrast to traditional append-only distributed databases, blockchains provide guarantees about the existence and status of a shared state in an environment where no single trusted intermediary is in charge and control over data is not logically centralized.
While blockchains enable operation in a trustless environment, their use comes with challenges. Blockchains inherently exhibit a high overhead and low capacity due to their consensus protocols (i.e., narrow bandwidth). While read operations are fast, chain-writes are inherently slow. The key challenge is avoiding/bypassing these limitations. We design Droplet such that we store the absolute minimum control metadata in the blockchain and outsource data streams and metadata to off-chain storage, via indirections. We construct the authorization service of Droplet by leveraging a running blockchain to build a replicated access control state machine. Any node can independently bootstrap the authorization state in a decentralized manner and check the access permissions.
To realize the crypto-based access control in Droplet, devices encrypt their data before uploading it to remote storage. Data owners register data streams and securely associate privacy-preserving access permissions through Droplet’s authorization service. Only authorized principals are cryptographically able to access (decrypt) the intended data segments. We design a novel key distribution and management scheme to enable efficient key updates and fine-grained yet scalable sharing of both arbitrary temporal ranges and open-ended streams. Our design builds on key regression and hash trees via a layered encryption technique. Droplet can (optionally) guarantee data integrity such that even the data owner cannot alter data once uploaded. In summary, Droplet ensures data owners’ sovereignty and ownership over their data, such that they maintain the ultimate power to selectively and flexibly share their data with desired parties.
With a prototype implementation of Droplet (available at https://dropletchain.github.io/), we quantify Droplet’s overhead and compare its performance to state-of-the-art systems. When deploying Droplet with Amazon’s S3 as a storage layer, we experience a slowdown of only 3% in request throughput compared to vanilla S3. Moreover, we show the potential of Droplet as an authorization service for the serverless computing domain with an AWS Lambda-based prototype. We show Droplet’s performance is within the range of the industry-standard protocol for authorization (OAuth2). Also, we deploy Droplet with a decentralized storage layer to give insights about its potential for the emerging decentralized storage services [64, 98]. With our example apps on top of Droplet, we show that real-world applications with unaltered user experience (i.e., perceived delay) can be developed. In summary, the contributions of this paper are:
Droplet, the first decentralized authorization service that enables secure sharing of encrypted data and works without trusted intermediaries.
a new crypto-enforced access control scheme that allows flexible and fine-grained access to encrypted data streams.
a design that couples authorization with crypto-enforced access to mitigate the limitations of sharing in end-to-end encryption (i.e., static policies) and current authorization services (i.e., lack of cryptographic guarantees).
an open-source prototype and evaluation of Droplet showing its feasibility and competitive performance.
Internet of Things Data. Today, IoT services control sensitive data (e.g., health, home) with little or no transparency. Enabling secure and transparent sharing and access to data is crucial to the success of the IoT, as data in this space becomes most valuable when processed and fused by external services (e.g., analytical). This is especially relevant for bilateral data sharing (Droplet’s focus), which in contrast to multilateral sharing (e.g., crowd-sourcing, model training, statistical datasets) cannot benefit from group privacy techniques (e.g., differential privacy). Time series data is deemed the most pervasive type of data in the IoT space [31, 62, 27]. It is a sequence of time-stamped data points, where time is the primary axis. IoT data is predominantly characterized as immutable time series data, i.e., append-only data records, with a single writer who generates data and multiple readers who consume data. Analytics over such data are often centered around analyzing data within a specific time period attributed to an event (e.g., vital body signs during running). These insights have shaped the design of our crypto-enforced access scheme, which is tailored to these characteristics (§5.1.1).
Blockchain. Blockchain is a replicated state machine, with a consensus protocol as its foundation that forms agreement over the sequence of updates to a shared state among untrusted participants. The consensus protocol tackles the Byzantine fault tolerance problem, where a given threshold of malicious participants in the network can be tolerated. From a system’s point of view, a blockchain is a distributed append-only global log, without centralized control. Logs are itemized within blocks which are cryptographically chained together via hashes. As each new block contains a hash to the last valid block, blocks form a tamper-evident data structure. On top of the ordered logs, decentralized application logic can be realized [43, 3].
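To illustrate why chaining blocks via hashes yields a tamper-evident log, consider the following minimal Python sketch. It is only an illustration under simplifying assumptions: there is no consensus protocol, blocks are plain dictionaries, and the names `append_block` and `verify_chain` are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor hash for the first block


def block_hash(block: dict) -> str:
    """Hash the canonical JSON encoding of a block."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, entries: list) -> None:
    """Append a block that commits to the hash of its predecessor."""
    prev = block_hash(chain[-1]) if chain else GENESIS_HASH
    chain.append({"prev_hash": prev, "entries": entries})


def verify_chain(chain: list) -> bool:
    """Check that every block references the hash of the block before it."""
    expected = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != expected:
            return False
        expected = block_hash(block)
    return True


chain = []
append_block(chain, ["register stream A"])
append_block(chain, ["grant reader B access to stream A"])
append_block(chain, ["revoke reader B"])
```

Modifying any block other than the most recent one changes its hash and breaks the link stored in its successor. In a real blockchain, the consensus protocol additionally fixes which block is the latest valid one.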
Permissionless blockchains are open to unknown participants and typically run a Proof-of-Work (PoW) probabilistic leader election algorithm, such as the Nakamoto consensus. Permissioned (closed) blockchains have a designated set of authorized validators and use a variant of the practical Byzantine fault tolerance (PBFT) consensus, which tolerates malicious behavior by up to $f$ validators among $3f+1$. PBFT can handle a higher transaction throughput compared to PoW blockchains. Due to the high communication overhead of traditional PBFT (in $O(n^2)$ for $n$ validators), only deployments with up to a few tens of validators are practical.
Blockchain Evolution. Permissionless blockchains have to cope with dynamic membership, where anyone can join and leave at any time. Hence, they leverage the expensive PoW to mitigate Sybil attacks, which induces high overhead regarding throughput, latency, and energy footprint. To understand the extent of this overhead, consider Bitcoin as an example; it has a throughput of 7 transactions per second with an average latency of 10 min and finality after 6 blocks. Next-generation blockchains [52, 28, 45, 77, 69, 68] promise higher throughputs and lower latencies, which is crucial for the adoption of blockchain-based systems in retail payments and the financial sector, and for realizing large-scale decentralized applications. Recent works [68, 69] introduce a hybrid consensus that uses the slow PoW to bootstrap the faster PBFT algorithm, where for each epoch a random set of validators is selected. Hence, they bring the best of both worlds: secure open enrollment together with high throughput and low latency. These scalable blockchain protocols, e.g., OmniLedger, lay the groundwork for practical advanced decentralized services, such as Droplet. Blockchain research is moving at a fast pace to address associated security [68, 51, 66], privacy [88, 26, 56], and scalability issues. Droplet can be deployed on top of any blockchain that supports total ordering of transactions, as elaborated in §5.1.3.
3 Security Model
Threat model. (i) Storage: the threat model addressed by Droplet consists of an honest-but-curious (passive) adversary, who is interested in learning about users’ data without necessarily being noticed (i.e., it follows the protocol correctly). Our threat model covers malicious storage nodes, potential real-world security vulnerabilities leading to data leakages, as well as external adversaries who gain access to data as a result of system compromise. We also consider an adversary who coerces the storage provider to hand out data without the owner’s consent. Moreover, an adversary can launch a data scraping attack against storage nodes. (ii) Access Permissions State: an adversary may access and bootstrap the access control state machine, but it cannot alter or learn sensitive information about the access permissions (e.g., sharing relationships or keying material). For an adversary to alter the access permission states, it needs to break the security of the underlying blockchain. The general blockchain threat model assumes that, for the blockchain to be considered secure, an adversary cannot control more than a defined ratio of nodes in the network. The actual ratio depends on the consensus protocol deployed by the underlying blockchain. For instance, given $n$ total blockchain nodes and $f$ adversary nodes, a ratio of $n \geq 2f+1$ for Nakamoto-style consensus mechanisms or $n \geq 3f+1$ for PBFT consensus mechanisms is required for the honest majority.
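As a small worked example of these ratios, the following sketch computes the largest number of adversarial nodes a network of a given size can tolerate. It assumes the standard bounds of an honest majority (n ≥ 2f + 1) for Nakamoto-style consensus and n ≥ 3f + 1 for PBFT, and it counts nodes rather than hash power, which is a simplification. The function names are hypothetical.

```python
def max_faulty_nakamoto(n: int) -> int:
    """Largest f such that n >= 2f + 1 (honest majority)."""
    if n < 1:
        raise ValueError("network must contain at least one node")
    return (n - 1) // 2


def max_faulty_pbft(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    if n < 1:
        raise ValueError("network must contain at least one node")
    return (n - 1) // 3


# With 10 nodes, an honest majority tolerates 4 adversarial nodes,
# whereas PBFT tolerates 3.
tolerated = (max_faulty_nakamoto(10), max_faulty_pbft(10))
```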
Guarantees. Droplet embodies a decentralized encryption-based access control mechanism that enables secure and selective access to stream data within the threat model discussed above. Data is encrypted at the client side and keys are never revealed to the storage provider, guaranteeing confidentiality. Decryption keys are only shared with authorized parties via a blockchain-based indirection. Data chunks are digitally signed, allowing parties without decryption keys to verify data authenticity and integrity. Droplet enables checking the freshness of data and optionally provides data immutability via an authenticated data structure anchored in the blockchain, such that even the data owner can no longer modify past data. Droplet cryptographically prevents evicted users from accessing future data; evicted users may, however, have already cached past data. Droplet encodes user-defined access permissions in the blockchain, eliminating trusted intermediaries and assuring collusion resistance and auditability. Even malicious institutions cannot illegitimately modify access permissions. Moreover, we employ privacy-preserving access permissions, preventing an observer from learning the identities of the sharing parties. Droplet does not protect against denial-of-service attacks nor does it hide access patterns. It could be extended with ORAM techniques to hide access patterns [63, 96]. Cryptographic techniques alone are not sufficient to prevent a malicious storage provider from denying service or destroying data. Hence, adequate replication strategies across multiple providers are necessary to ensure preservation and availability of data.
Assumptions. In Droplet, we make the following assumptions. We assume the storage nodes to be honest-but-curious, such that they follow the protocol correctly. This is a valid assumption, since the storage node could face financial (and potentially legal) consequences upon detection of misbehavior. We assume the adversaries to be subject to standard cryptographic hardness assumptions and the underlying blockchain to be secure, i.e., similar to previous work [3, 101, 6, 17], we assume transactions are immutable after a confirmation period and the blockchain network to be reliable. We assume users store their keys securely and that key recovery techniques are deployed (we discuss in §7.3 potential recovery techniques, such as Shamir’s secret sharing). We assume data producers report correct data and perform data serialization, including encryption, correctly. We assume there is a financial agreement between the storage provider and data owner to provide persistent storage, which can be facilitated through the cryptocurrency feature of the underlying blockchain.
4 Related Work
Droplet’s main objective is to empower users with full control (ownership) over their data while ensuring data confidentiality. We define data ownership as having the right to and control over data, wherein the owner can define/restrict access, restrict the scope of data utility (e.g., sharing aggregated/homomorphically-encrypted data), delegate these privileges, or give up ownership entirely without the need to rely on any trusted entities to facilitate this. A true realization of this definition requires work on two fronts: (i) privacy-preserving computation (i.e., differential privacy and secure computation) and (ii) secure and privacy-preserving access control of remotely stored data with strong confidentiality guarantees. In this work, we focus on the latter, specifically in the context of time series data. We briefly discuss limitations of current solutions in facilitating ownership and make a case for Droplet.
Crypto-enforced Data Access. End-to-end encryption provides the strongest level of protection for data stored in the cloud, as data remains encrypted and only authorized entities are trusted with decryption keys. However, fine-grained access and sharing of data is a challenge here. A simple approach to selective sharing of encrypted data is to encrypt the target data segments towards the principal’s public key; although simple, this approach suffers from three drawbacks: (i) hard-coded access control; the access permission is defined at encryption time and cannot subsequently be altered or revoked, (ii) storage overhead; if the same data is shared with multiple principals, the user ends up storing redundant data as she needs to encrypt the same data under each principal’s public key, and (iii) scalability and practicality issues, particularly when considering fine-grained access policies. These drawbacks are pronounced with time series data, where high volumes of data are continuously produced and frequent key rotation is necessary to ensure flexible control in encryption-based access control.
Various cryptographic schemes [23, 8] have been introduced to overcome some of these challenges, among which attribute-based encryption (ABE) [55, 102, 87, 54] offers the best expressiveness. ABE encrypts data towards a policy (i.e., associated with a set of attributes), and only principals with a secret key satisfying the policy can decrypt the data. Several ABE-based systems [102, 104] introduce crypto-based access control for remote storage services. However, ABE suffers from expensive crypto operations and the costs grow linearly with the number of attributes, limiting the granularity of access due to computational burdens [50, 2]. The overhead dominates even with a hybrid encryption technique [102, 104], where data is encrypted with fast encryption and only encryption keys are encrypted with the expensive ABE, e.g., only two attributes already result in 100 ms for enc/decryption on desktops and a few seconds on low-power IoT devices. In Droplet, we opt to design an efficient crypto-based data access mechanism that is tailored for the velocity of data streams and supports scalable fine-grained sharing (§5.1.1).
Signature-based schemes (e.g., public-key certificates [21, 40]) require a centralized, hierarchical network of certification authorities (CA) to issue certificates, which come with their weaknesses. Alternative public-key based approaches, e.g., SPKI/SDSI and follow-up schemes, eliminate the need for complex X.509 public key infrastructure and CAs. However, these schemes are either based on the idea of local names and suitable for deployments under a single administrative domain (e.g., smart home) or build upon an organically growing trust model. While the key idea of public-key-based schemes underpins Droplet, our system neither suffers from certificate-chain discovery nor requires a complex certificate infrastructure (§5.1.2).
Blockchain-based Systems. Decentralized blockchain-based applications (i.e., without trusted intermediaries) beyond cryptocurrencies have gained more attention in recent years. Example applications include medical data access, IoT device commissioning and management, financial auditing, name and identity management, software-update transparency and verifiability, and preventing unauthorized certificate issuance. Closest to our work are Enigma and Calypso. Enigma [110, 111] envisions a decentralized personal data management and secure multi-party computation platform for multilateral sharing. It uses a single data encryption key among the sharing parties (i.e., no fine-grained crypto-based access) and requires blockchain transactions for each read/write request (i.e., limited scalability). Calypso introduces on-chain secrets with associated access policies. A set of trustees collectively enforces the policies via threshold encryption and distributed key generation. None of the above systems addresses the challenge of fine-grained access control for encrypted time series data.
5 Droplet Design
Droplet in a Nutshell. At a high level, Droplet is a decentralized access control system that enables users to securely and selectively share their IoT data streams with principals. Droplet’s design marries a novel crypto-enforced access control scheme tailored for time series data and a decentralized authorization service. Our crypto-enforced access control scheme enables users to express flexible access control policies (§5.1.1). Data is end-to-end encrypted yet can be selectively shared and accessed with our crypto-based data access scheme. The key idea behind our encryption-based access control is to serialize time series data into chunks, where each chunk is encrypted with a unique encryption key. This yields a crypto-based access control that allows expressing access policies at chunk granularity. The challenge then becomes how to efficiently generate and manage the large number of unique encryption keys and how to express access policies with a minimal shared state from which all decryption keys associated with the access policy can be derived. To address this specific challenge, we introduce a novel key management scheme (§5.1.1). Although crypto-based access control is powerful, it does not suffice by itself, as it does not handle authorization and revocation adequately. To address this issue, we introduce a decentralized authorization service (§5.1.2) that interplays with our crypto-based access control scheme. Our decentralized authorization is in its essence similar to OAuth2. However, we realize the access control state machine on top of a running blockchain (§5.1.3) to eliminate the need for the trusted intermediaries on which OAuth2 realizations today heavily depend. The access control state machine assembles the current global state (i.e., access permissions and data ownership) through embedded private state transitions.
System Overview. As illustrated in Figure 1, our design considers the following four actors. The data owner is someone who owns a set of IoT devices (e.g., wearables, appliances, or apps) which produce time series data, i.e., data producers. In an industrial setting, the data owner can be an organization that owns a swarm of IoT devices. The generated data is stored on remote storage services and data owners can decide to selectively expose their data to data consumers (i.e., principals) who can produce added value from the data (e.g., fuse several streams for prediction tasks). Each principal computes the corresponding decryption keys locally based on an authorization token (i.e., one that embodies an access policy state) shared through Droplet. Data owner, data producer, and data consumer run Droplet’s client engine, which covers the tasks of data serialization, enc/decryption, and key management. The storage node is in charge of storing data and providing access to principals as defined by the data owner. The storage node grants or denies access requests via Droplet’s decentralized authorization, i.e., in accordance with user-defined access permissions. The storage node can take various forms, such as edge, decentralized (e.g., a node in a p2p storage service), or cloud storage (e.g., Amazon’s S3). The storage node runs Droplet’s storage engine and can additionally run an instance of Droplet’s authorization service to handle access requests locally. In fact, anyone can run an instance of Droplet’s authorization service to either expose it as a service or to monitor the state of access permissions. The data owner and data consumer run an instance of Droplet’s decentralized authorization to set/adjust/monitor access permissions. Note that Droplet’s decentralized authorization instances are stateless and can selectively persist relevant access permissions for fast lookup.
We now introduce and discuss different aspects of Droplet. We begin with a simplified description of our system components and gradually converge to the full system design.
5.1 Decentralized Access Control
In the following, we elaborate on our crypto-enforced data access scheme. As the backbone of our crypto-based data access, we present the design of an efficient key-management scheme. Later, we discuss how we manage identities and access permissions for authorization.
5.1.1 Crypto-enforced Data Access
There exist three common types of sharing modalities desired for time series data, varying based on the role and purpose of the data consumer: (i) subscription, where the data consumer is granted continuous access to the data stream as it is generated, either temporarily or until revoked (e.g., a visualization app rendering an overview of the user’s daily activity based on wearable data), (ii) sharing arbitrary intervals of past data (e.g., a practitioner app accessing and analyzing a user’s health data during a past pregnancy), and (iii) a combination of (i) and (ii). Droplet supports the above sharing abstractions. To meet our design objective of realizing an efficient, fine-grained, yet scalable crypto-based access control, we have to accommodate resource-constrained IoT devices (i.e., computationally limited) and the large volume of time series data (i.e., a vast number of encryption keys).
At a high level, the cryptographic access control in Droplet is based on hybrid encryption. Each data chunk of the data stream is encrypted under a random symmetric key derived from a hash tree. Encryption keys are time-encoded and mapped to the corresponding data chunk – we focus on time series data with a stream nature, where data records are generated continuously and serialized using time-based chunking (§5.2). Keys are rotated for each chunk (i.e., epoch key rotation), permitting access permissions at the chunk level. Access policies represent tokens that a principal can use to derive the necessary keys to decrypt the data stream segments that they are authorized to access. Note that data owners do not need to define access policies prior to encryption; owners can introduce new and different access policies for various principals at any point in time during the stream lifespan. Droplet allows flexible access policies for individual data consumers, without the need for data re-encryption or introducing redundant data.
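The mapping between time and per-chunk keys can be illustrated with a minimal sketch of time-based chunking. It assumes fixed-length epochs counted from the start of the stream; the names `chunk_index` and `chunk_bounds`, as well as the 10-minute interval, are hypothetical choices for illustration and not taken from the Droplet prototype.

```python
from datetime import datetime, timedelta, timezone


def chunk_index(ts: datetime, stream_start: datetime, interval: timedelta) -> int:
    """Map a record timestamp to the index of the chunk (epoch) it falls into."""
    if ts < stream_start:
        raise ValueError("timestamp precedes the start of the stream")
    return (ts - stream_start) // interval


def chunk_bounds(index: int, stream_start: datetime, interval: timedelta):
    """Return the half-open time interval [start, end) covered by a chunk."""
    start = stream_start + index * interval
    return start, start + interval


stream_start = datetime(2018, 1, 1, tzinfo=timezone.utc)
interval = timedelta(minutes=10)

# A record produced 25 minutes into the stream belongs to the third chunk,
# so it is encrypted under the key of epoch 2.
record_time = stream_start + timedelta(minutes=25)
epoch = chunk_index(record_time, stream_start, interval)
```

Because each chunk index corresponds to exactly one key, a time range in an access policy translates directly into a contiguous range of key indices.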
The design of our cryptographic access control at its core builds on hash trees and key regression to enable expressing stream-specific access policies and efficient management of encryption keys. Both key regression and hash trees support computing a large segment of keys from a single shared state, instead of sharing individual keys. We construct our system such that the shared state maps to an access policy. We discuss how, by combining a hash tree and key regression into a compound key management, we can define expressive access policies for data streams. In the following, we first describe Droplet’s overall key management and the role of hash trees in our design. We then introduce dual key regression, which builds on the basic key regression to support bounded interval sharing. As our key management heavily relies on hash chains, we later discuss how to construct compact chains for an efficient and fast key rotation.
Droplet’s Key Management. In the following, we introduce the two key components of Droplet’s key management. We later describe how they come together to create a hybrid key management scheme. We start by describing the role of binary hash trees (BHT) in our system. A BHT is constructed top-down using two cryptographic hash functions, one to derive the left child node and one to derive the right child node. Initially, the hash functions are applied to the root node, a secret seed. Afterwards, they are successively applied to the left and right outputs (i.e., the parent serves as input to the corresponding hash functions to compute the left and right child nodes). BHTs are similar to hash chains in that, due to the preimage resistance of cryptographic hash functions, it is computationally intractable to find the parent of a given child node, while the reverse is efficiently computable.
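The top-down construction can be sketched as follows. This is a minimal illustration, assuming SHA-256 with a one-byte prefix as the two hash functions; the text does not fix a concrete instantiation, and the function names are hypothetical.

```python
import hashlib


def left_child(parent: bytes) -> bytes:
    """Hash function for the left child (assumed: SHA-256 with prefix 0x00)."""
    return hashlib.sha256(b"\x00" + parent).digest()


def right_child(parent: bytes) -> bytes:
    """Hash function for the right child (assumed: SHA-256 with prefix 0x01)."""
    return hashlib.sha256(b"\x01" + parent).digest()


def derive_node(node: bytes, path: str) -> bytes:
    """Walk down from a node; '0' selects the left child, '1' the right child."""
    for bit in path:
        if bit not in "01":
            raise ValueError("path must consist of '0' and '1' characters")
        node = left_child(node) if bit == "0" else right_child(node)
    return node


def leaf(root: bytes, depth: int, index: int) -> bytes:
    """Derive the leaf with the given index in a tree of the given depth."""
    if not 0 <= index < 2 ** depth:
        raise ValueError("leaf index out of range")
    path = format(index, f"0{depth}b") if depth > 0 else ""
    return derive_node(root, path)


root = hashlib.sha256(b"secret seed (illustrative only)").digest()
inner = derive_node(root, "10")      # inner node covering leaves 4 and 5
leaf_5 = derive_node(inner, "1")     # a holder of `inner` can derive leaf 5
```

Anyone holding an inner node can derive every leaf below it, but cannot move upwards or sideways in the tree without inverting the hash functions.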
For Droplet’s key management, we construct a hash tree of depth $h$. The leaf nodes deliver the data encryption keys (i.e., via a key derivation function), as depicted in Figure 2. The encryption keys are time-encoded such that each key maps to a well-defined time interval, during which a data chunk is generated. To share any arbitrary interval, the data owner just shares the inner nodes in the BHT necessary to compute the corresponding keys. For instance, in Figure 2, given the two highlighted inner nodes, a data consumer is granted access to the two disjoint intervals covered by the corresponding subtrees and can compute the corresponding decryption keys. Our construction so far, while consistent with our efficiency and low overhead requirements, lacks support for sharing in subscription mode, where data consumers have continuous access to data streams. Realizing this mode of sharing with a BHT requires maintaining and sharing a growing state per data consumer.
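Which inner nodes must be shared for a given interval can be computed with a standard range decomposition over the tree. The sketch below is one possible way to do this, not necessarily the one used in Droplet; nodes are identified by their bit-string path from the root ('0' for left, '1' for right), and the function names are hypothetical.

```python
def covering_nodes(first: int, last: int, depth: int) -> list:
    """Return the paths of the fewest subtree roots whose leaves are exactly
    the leaf indices first..last (inclusive) in a tree of the given depth."""
    if not 0 <= first <= last < 2 ** depth:
        raise ValueError("invalid leaf range")
    paths = []

    def visit(path: str, node_first: int, node_last: int) -> None:
        if last < node_first or node_last < first:
            return  # subtree lies entirely outside the requested range
        if first <= node_first and node_last <= last:
            paths.append(path)  # subtree lies entirely inside: share its root
            return
        mid = (node_first + node_last) // 2
        visit(path + "0", node_first, mid)
        visit(path + "1", mid + 1, node_last)

    visit("", 0, 2 ** depth - 1)
    return paths


def leaves_under(path: str, depth: int) -> list:
    """List the leaf indices that can be derived from the node at `path`."""
    span = 1 << (depth - len(path))
    start = int(path or "0", 2) * span
    return list(range(start, start + span))


# Sharing chunks 2..5 of an 8-leaf tree requires only two inner nodes.
shared = covering_nodes(2, 5, 3)
```

The number of shared nodes grows at most logarithmically with the length of the interval, which keeps the shared state small even for long ranges.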
To overcome this challenge, we combine BHT with dual key regression via a layered encryption technique. Dual key regression resembles a linear chain of keys, where given a constant state, i.e., beginning and end indices, one can compute all the keys in between, as described in the next section. Conceptually, we exploit the hash tree to allow arbitrary sharing of intervals and the dual key regression to support sharing in subscription mode. The layered encryption consists of two steps: (i) the hash tree delivers time-encoded data encryption keys $k_i$, which we use to encrypt data generated during the time epoch $i$. (ii) the dual key regression also delivers time-encoded subscriber encryption keys $s_i$ for the epoch $i$. We use $s_i$ to encapsulate the corresponding data encryption key: $\mathrm{Enc}_{s_i}(k_i)$. For fast access, each encrypted data chunk holds the encapsulated $k_i$.
With this construction, we can give access to data encryption keys either via the hash tree (arbitrary intervals) or dual key regression (subscription), as depicted in Figure 3. To a subscriber, the keys $k_i$ appear as random encryption keys per epoch. For principals with access to past data, the keys $k_i$ are derived from the leaf nodes of the BHT, which they locally compute based on the shared inner nodes (e.g., root nodes of the corresponding subtrees). Note that a principal can be granted access in both modes simultaneously, as illustrated in the example of Figure 3. In this example, the data owner has granted the principal access to two intervals of past data, where access to the corresponding data encryption keys is realized through the hash tree. Additionally, the principal is granted a subscription starting from a given epoch, which is realized over dual key regression.
Dual Key Regression. The concept of key regression  relies on hash chains. Given a single hash token, one can derive all previous keys by applying the hash function successively. However, no future keys can be computed (i.e., forward-secrecy). More specifically, given key in time one can compute all keys until the initial key , i.e., . This is not always desirable, since key regression enables sharing of all keys from the beginning until current time (i.e., all-or-none principle). Hence, we design a key management mechanism that enables sharing with a defined lower time bound, e.g., access to data of a particular stream from Jan’18 till revoked. To realize this, we extend key regression with an additional hash chain in the reverse order, to cryptographically enforce both boundaries of the shared interval (Figure 4). In key regression, hash tokens are consumed in the reverse order of chain generation as input to a key derivation function to derive the current key. Due to the pre-image resistance property of hash functions, it is computationally hard to compute future tokens and hence future keys. However, the reverse can be computed efficiently. We leverage this property of hash chains for defining the beginning of an interval through a secondary hash chain in the reverse order, as depicted in Figure 4.
In dual key regression, the KDF takes a second token : ) = , with from the secondary hash chain (Figure 4). For instance, to share a data stream from time to , the user provides the tokens and . Since it is infeasible to compute , no key posterior to can be computed. Conversely, since it is infeasible to compute , no key prior to can be computed. With access to the two hash tokens (, ), indicating the beginning and end of the shared interval, one can compute all the encryption keys within this interval.
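As a rough illustration of the two-chain idea, the following Python sketch builds a primary chain that is consumed in reverse order and a secondary chain that runs forward, and derives each epoch key from one token of each chain. SHA-256 as the hash function, the simple concatenate-and-hash KDF, and all names are assumptions for this sketch, not the paper's concrete construction.

```python
import hashlib


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_chain(seed: bytes, length: int) -> list:
    """Return [seed, H(seed), H(H(seed)), ...] with `length` entries."""
    chain = [seed]
    for _ in range(length - 1):
        chain.append(H(chain[-1]))
    return chain


def kdf(primary_token: bytes, secondary_token: bytes) -> bytes:
    # Stand-in KDF that mixes one token from each chain.
    return H(b"key" + primary_token + secondary_token)


class DualKeyRegression:
    def __init__(self, seed1: bytes, seed2: bytes, n: int):
        # Primary tokens are consumed in reverse order of generation, so
        # H(primary[i]) == primary[i - 1]: earlier tokens follow from later ones.
        self.primary = build_chain(seed1, n)[::-1]
        # Secondary tokens run forward, so H(secondary[i]) == secondary[i + 1].
        self.secondary = build_chain(seed2, n)

    def key(self, epoch: int) -> bytes:
        return kdf(self.primary[epoch], self.secondary[epoch])

    def share(self, start: int, end: int) -> tuple:
        """Tokens that bound the shared interval [start, end]."""
        return self.secondary[start], self.primary[end]


def consumer_keys(start: int, end: int, s_start: bytes, p_end: bytes) -> dict:
    """Recompute the keys of epochs start..end from the two shared tokens."""
    n = end - start + 1
    secondary = build_chain(s_start, n)    # s_start, s_{start+1}, ..., s_end
    primary = build_chain(p_end, n)[::-1]  # p_start, ..., p_{end-1}, p_end
    return {start + j: kdf(primary[j], secondary[j]) for j in range(n)}


owner = DualKeyRegression(b"seed-1", b"seed-2", 10)
s, p = owner.share(3, 6)
assert consumer_keys(3, 6, s, p) == {i: owner.key(i) for i in range(3, 7)}
```

Keys before epoch 3 would need a preimage on the secondary chain, and keys after epoch 6 a preimage on the primary chain, which is what bounds the interval on both sides in this sketch.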
Key Distribution. An important aspect to address in crypto-based access control schemes is how to efficiently distribute keys. In our system, this is especially tricky for the subscription mode, where new data chunks arrive continuously and each one is encrypted with a new key. We now describe our key distribution mechanism and refer to §5.1.2 for insights on obtaining the keying material over the decentralized authorization service storage network. When a new data consumer is added, an authorization token reflecting the defined access policies is issued, which contains either (i) the state to compute decryption keys for past data intervals (i.e., inner nodes of the hash tree) or (ii) in case of sharing in the subscription mode, the hash token for the start of the interval (i.e., dual key regression). For the subscription mode, the challenge is to give the active set of subscribers continuous access to the latest token (i.e., from the main chain), such that they can compute the current decryption key. If we were to encrypt the current hash token for each subscriber individually, this would incur communication/computation overheads in O(n), given n subscribers.
To reduce this overhead, we distribute the latest dual key regression token within a digitally signed and encrypted lockbox. Authorized subscribers obtain the long-term distribution key to open the lockbox. Note that lockbox encryption is significantly more efficient than per subscriber encryption. When sharing access to a data stream, we share the distribution key encrypted for the new subscriber through the authorization service (§5.1.2). While data encryption keys and hence dual key regression tokens are frequently updated at a defined interval, the distribution key is only updated after an access revocation event, as detailed next.
A subscriber decrypts the current data encryption key given the current token and start token as:
with as a hash function. The secondary token is stored along the long-term per principal key information (§5.1.2).
Revocation. To revoke data stream access, the data owner updates the distribution key (i.e., crypto-based access) and issues a state update transaction (i.e., authorization) to evict the revoked service. The transaction includes a new distribution key contained in the encrypted key information per subscriber. Hereafter, the new data encryption key is only available to the remaining authorized subscribers, protected with the new distribution key.
With the newly issued transaction, the global access permission state is updated (§5.1.2). In our access control model, Droplet cryptographically prevents any future access to new data by the evicted subscriber. Any future access requests by the evicted subscriber to old data are declined during authorization. However, we cannot prevent access to data that the user has already cached or stored locally.
Compact Hash Chains. Our key management, specifically dual key regression, relies heavily on hash chains. The underlying chains can grow quickly due to frequent key updates. Due to the memory constraints of IoT devices, a combination of re-computing on demand and storing a segment of the hash chain is desirable to achieve fast and efficient key rotations. We leverage hierarchical hash chains which maintain the same security features as traditional hash chains but reduce the worst case compute time to . In our evaluation discussions in §7.1, we show how compact chains allow for a key rotation speed-up of two orders of magnitude.
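The trade-off between storing and re-computing tokens can be sketched with a simple checkpointing scheme. Note that this is a deliberately simpler technique than the hierarchical hash chains referenced above: it stores every `stride`-th token and re-computes the rest on demand. The class name and parameters are hypothetical.

```python
import hashlib


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class CheckpointedChain:
    """Hash chain that keeps only every `stride`-th token in memory."""

    def __init__(self, seed: bytes, length: int, stride: int):
        self.length = length
        self.stride = stride
        self.checkpoints = {}
        token = seed
        for i in range(length):
            if i % stride == 0:
                self.checkpoints[i] = token  # token == H^i(seed)
            token = H(token)

    def token(self, i: int) -> bytes:
        """Return H^i(seed) using at most stride - 1 hash evaluations."""
        if not 0 <= i < self.length:
            raise IndexError("token index out of range")
        base = (i // self.stride) * self.stride
        token = self.checkpoints[base]
        for _ in range(i - base):
            token = H(token)
        return token


chain = CheckpointedChain(b"seed", length=9000, stride=100)
assert len(chain.checkpoints) == 90  # 90 stored tokens instead of 9000
```

With these illustrative numbers, any token is reachable with fewer than 100 hash evaluations, at the cost of keeping 90 tokens in memory.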
5.1.2 Decentralized Authorization Service
So far, we covered Droplet’s crypto-based access control mechanism. Now we describe Droplet’s authorization service which handles access permissions. At a high level, through Droplet’s exposed API, users can view their data streams, the associated sharing policies, and storage information, and can set/edit access permissions accordingly. Similar to today’s authorization frameworks, e.g., OAuth, our authorization service acts on behalf of users, forgoing direct interaction of individual services with the data owner. Storage providers query Droplet’s authorization service directly to validate access requests. Moreover, principals query the authorization service to get their authorization token for computing decryption keys.
In our design, we employ a publicly verifiable blockchain to maintain an accountable distributed access control system without a central trusted entity. The reasons why we opt to design an authorization service that requires no trusted intermediaries and instead utilizes blockchain are manifold: (i) resilience against vulnerabilities of trusted intermediaries, (ii) identity management, specifically of relevance to the IoT, (iii) transparent auditability of access permissions by authorized parties, (iv) immutability of data streams after a defined interval, and (v) potential of nano-payments for storage services and data market.
Droplet embeds ownership of data streams and corresponding access permissions in the blockchain transactions. We now describe the owner-device pairing, blockchain encoded access permissions, and how we protect the privacy of principals.
Owner-Device Pairing. The blockchain ecosystem relies on public key cryptography for identification and authentication of the involved principals. The hash digest of the public key serves as a unique pseudo-identity in the network. We leverage this feature to allow IoT devices to securely and autonomously interact with the storage service. This way we overcome the hurdle of passwords and rely on public-key crypto for authentication and authorization. During the bootstrap phase of a new device, it creates a pair of public-private keys locally, where the private key is stored securely (e.g., in the trusted hardware) and never leaves the device. Through an initial two-way multisignature registration transaction on the blockchain, Droplet allows the binding of the IoT device (, ) to the owner (, ). Henceforth, the owner can set access permissions (via the signing key ) and the IoT device is permitted to securely store data (via the signing key ). The necessary keying material for encryption (§5.1.1) on the data producer is also exchanged during the initial phase. Note that the data owner’s signing key is powerful in that it sets/updates access permissions. Hence, a key recovery mechanism must be in place for handling a potential key loss (see §7.3).
In the event of device decommissioning, the new owner must issue a new multisignature device-binding transaction, to gain ownership of future data. Note that there is no need for the IoT device to interact with the blockchain network directly. The owner creates the raw multisignature registration transaction and uses an out-of-band channel (e.g., Bluetooth Low Energy) to get the device’s signature. After adding her signature, she broadcasts the register transaction to the network. During this process, neither the owner’s nor the device’s private keys leave the secure local memory area.
Access Permissions. We utilize the blockchain to store access permissions in a secure, tamperproof, and time-ordered manner. Access permissions are granted per data stream. Initially, the data owner issues a transaction including the stream ID which creates the initial state. To change this state, e.g., grant read access permissions to a principal, the data owner issues a subsequent transaction which holds, among others, (i) the stream ID, (ii) the public key of the principal they want to share their data with, (iii) the temporal scope of access (e.g., intervals of past or open-end subscription), and (iv) encrypted keying material for data decryption (§5.1.1). Note that for public key discovery of principals decentralized identity management solutions, such as Keybase , can be leveraged. Such solutions serve as a key directory that maps online identities (e.g., Twitter, Github, Facebook) to public keys in a publicly auditable manner.
For any request to store or retrieve data, storage nodes query their instance of Droplet’s authorization service (§5.1.3) for the corresponding access permissions, as illustrated in Figure 5. To enforce the permissions, the storage node verifies the identity of the requesting user via signature-based authentication. Data owners express and dynamically adjust permissions through Droplet, which interacts only with the underlying blockchain and not with individual services. The authorization service additionally protects storage nodes’ network resources (i.e., bandwidth/memory) from unauthorized users. For instance, this mitigates an attack where malicious parties flood the network with download/storage requests of large files. The storage node can terminate malicious sessions (e.g., data scraping and storage spamming attacks) after checking the access permissions (§5.1.3). Droplet supports auditing of access permissions by authorized entities, as we explain in the next section.
In public blockchains, users are represented through virtual addresses, providing pseudonymity. However, advanced clustering heuristics can potentially lead to the de-anonymization of users[76, 5]. Access permissions in Droplet should be enforceable by storage nodes (i.e., verify authorization) and be auditable by authorized parties. However, we want to protect the privacy of sharing relationships from the public. To realize this, we leverage dual-key stealth addresses. Stealth addresses  do not require off-blockchain communication (i.e., no out-of-band channel) and provide strong anonymity for the principals that are granted access permissions. Moreover, different streams shared with the same principal are unlinkable. Conceptually, each principal is represented by two public keys (main and viewer keys: , , , ), which are used by other parties to generate a new unlinkable address . The viewer private key can be shared with an auditor to audit the permissions. However, access to both main and viewer private keys is required for data access, i.e., and are needed to compute , of which only the principal is capable.
ACL Indirections. Blockchain storage is scarce and expensive, as it is replicated and maintained by the blockchain network. This entails placing only the minimum necessary logic in the blockchain. To keep the number and, more importantly, the size of transactions as low as possible, our design incorporates off-chain storage of the access control list (ACL), as illustrated in Figure 6. Instead of holding the address information of all services, the transaction just includes an indirection to the ACL via its hash digest. This allows managing access permissions for an unlimited number of services in a single transaction. Besides, the ACL can now contain advanced access control logic (e.g., XACML), such as access groups and delegating parties. Any change to the ACL requires a new transaction. The hash digest serves as a data pointer and, more importantly, ensures integrity protection of the ACL. The ACL is stored off-chain either in the P2P storage network of Droplet (§5.1.3) or in alternative storage services. The time until an access permission change comes into effect is tied to the transaction confirmation time of the underlying blockchain, ranging from a few seconds to minutes depending on the blockchain.
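The indirection can be sketched in a few lines of Python. The ACL layout, the JSON canonicalization, the field names, and the use of SHA-256 are all assumptions for illustration; the paper does not specify the encoding.

```python
import hashlib
import json


def acl_digest(acl: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so that the same ACL
    # always yields the same digest.
    encoded = json.dumps(acl, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_transaction(stream_id: str, acl: dict) -> dict:
    """On-chain payload: only the stream ID and a pointer to the off-chain ACL."""
    return {"stream_id": stream_id, "acl_hash": acl_digest(acl)}


def verify_acl(transaction: dict, acl: dict) -> bool:
    """Check an ACL fetched from off-chain storage against the on-chain digest."""
    return transaction["acl_hash"] == acl_digest(acl)


acl = {"readers": ["pubkey-service-a", "pubkey-service-b"], "scope": "subscription"}
tx = make_transaction("stream-42", acl)
assert verify_acl(tx, acl)

tampered = dict(acl, readers=acl["readers"] + ["pubkey-attacker"])
assert not verify_acl(tx, tampered)
```

The transaction size stays constant regardless of how many services the ACL lists, and any modification of the off-chain copy is detected by the digest comparison.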
5.1.3 Access Control State Machine
Today, there are two main options developers can take for realizing decentralized applications that employ a blockchain as a ubiquitous trust network (i.e., a shared ground truth): (i) operating a new blockchain, or (ii) embedding the application logic into an existing secure blockchain deployment [101, 80]. We opt for the latter, where we embed our logic without alteration of the underlying blockchain. This allows us to benefit from the security properties of an existing blockchain and make our design generic. We briefly discuss the reasons why we opt for this choice and detail how we realize this efficiently.
Integrating a new application logic into a running blockchain typically results in consensus-breaking changes and hard forks , i.e., a new blockchain with only a subset of peers enforcing the new logic. While necessary for specific applications, this results in parallel blockchains which may not exhibit strong security properties due to a smaller network of peers. To benefit from security properties of a strong and robust blockchain, new applications can embed their log of state changes in transactions . This is in turn used to bootstrap the global state in a secure and decentralized manner.
We employ the approach of virtualchain [80, 3] which allows us to embed Droplet’s logic efficiently. A virtualchain is a fork*-consistent replicated state machine, allowing a different application logic to run on top of any production blockchain, without breaking the consensus. While the combination of virtualchain and blockchain is comparable to Ethereum with its built-in scripting language, we explicitly decide to make our design independent of a specific blockchain, such that Droplet can be deployed on the most secure and efficient blockchain of choice. The virtualchain instance in Droplet creates a global state of access permissions based on the blockchain’s totally-ordered and tamper-resistant transaction logs and updates the state according to new state transitions. A virtualchain instance essentially scans the blockchain for the corresponding access permission transactions and maintains the global state in a database that can be queried for access permissions of a given data stream and principal. The virtualchain  incorporates several technical solutions that make maintaining the global state efficient and fast, such as automatic fork-resolution, cross-chain migration, and fast bootstrap via checkpointing and skip lists. Droplet’s decentralized authorization instance is realized employing virtualchains, which anyone can run, either as a storage node to lookup access permissions or just to provide it as a service. Droplet’s virtualchain instances span among each other a peer-to-peer storage network, which we employ to store part of access control metadata, e.g., ACL and keying material. Note that since the integrity of the off-chain metadata is protected via the blockchain, they can also be stored on the data storage.
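The replay of a totally-ordered transaction log into a queryable permission state can be sketched as follows. This is a toy state machine under stated assumptions: the operation names (`register`, `set_acl`), the transaction fields, and the in-memory dictionary are hypothetical, and signature validation is assumed to have happened before a transaction reaches this code.

```python
def apply_transaction(state: dict, tx: dict) -> dict:
    """Apply one already-authenticated transaction to the permission state."""
    stream = tx["stream_id"]
    if tx["op"] == "register":
        # The first registration wins; later duplicates are ignored.
        if stream not in state:
            state[stream] = {"owner": tx["sender"], "acl_hash": None}
    elif tx["op"] == "set_acl":
        entry = state.get(stream)
        # Only the registered owner may change the permissions of a stream.
        if entry is not None and entry["owner"] == tx["sender"]:
            entry["acl_hash"] = tx["acl_hash"]
    return state


def replay(log: list) -> dict:
    """Rebuild the global state by scanning the log in blockchain order."""
    state = {}
    for tx in log:
        apply_transaction(state, tx)
    return state


log = [
    {"op": "register", "stream_id": "s1", "sender": "alice"},
    {"op": "set_acl", "stream_id": "s1", "sender": "mallory", "acl_hash": "bad"},
    {"op": "set_acl", "stream_id": "s1", "sender": "alice", "acl_hash": "abc123"},
]
state = replay(log)
assert state["s1"] == {"owner": "alice", "acl_hash": "abc123"}
```

Because every node replays the same ordered log with the same rules, all honest instances arrive at the same state without coordinating with each other.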
5.2 Data Serialization
Droplet focuses on time series data with a stream nature, where data records are generated continuously, as depicted in Figure 7. In Droplet’s data model, a data stream is divided into chunks of predefined time intervals; chunking and batching are common techniques for time series data [57, 48, 71, 108]. Instead of storing individual data records, we store data chunks, which are ordered batches of data records of an arbitrary type (i.e., pairs of timestamp/value). Although chunking prevents random access at the record level, it results in a performance gain for data retrieval, since in time series data most queries require access to temporally co-located data [57, 108]. For example, data analytics apps work with temporal data records (e.g., all records of a day).
Encryption. Each data chunk is initially compressed and then encrypted at the source with an efficient symmetric cipher. We rely on AES-GCM, as an authenticated encryption scheme. Note that NIST bounds the use of AES-GCM to encryptions for a given key/nonce pair. Due to our frequent key rotations, we stay far below this threshold. The chunks have a metadata segment containing, among others, the chunk identifier, the owner’s address, hashes to previous chunks (§7), and the stream identifier. The data field contains the encrypted and compressed data records. Services with access to the encryption key can verify the integrity of the chunk and perform an authenticated decryption. To ensure data ownership, for instance towards the storage node, each chunk is also digitally signed. This allows parties without access to the encryption key to still be able to verify the owner of the data stream, albeit at a higher computation cost. In general, digital signature operations are three orders of magnitude slower than symmetric key operations, as discussed in §7.1.
Storage Interface. The storage nodes in Droplet expose a key-value interface, with a common store/get interface with various flavors of get, such as getAll or getRange. For each incoming request, the storage node first verifies the identity of the client (i.e., authentication) and looks up the corresponding access permissions regarding the client’s identity (i.e., authorization). Each request is accompanied by a universally unique identifier (UUID), defined as the hash of the tuple (owner address, streamID, counter), where streamID is a unique identifier of an owner’s data stream. Traditional indexing for data retrieval cannot be applied here, as data chunks are encrypted. Hence, we need to devise a mechanism to perform temporal range queries over encrypted data efficiently. To avoid consistency issues of a shared index, we exploit a simple local lookup mechanism to enable temporal range queries. For a constant lookup time of a record with timestamp , we compute the counter of the chunk holding it based on the known time interval of the chunks: . For instance, we can map the lookup of value 7 in Figure 7 to the identifier of chunk 1. The chunk metadata is included in the initial stream registering transaction, as depicted in Figure 6. Note that the chunk metadata additionally enables freshness checks for chunks, since the chunk interval indicates the frequency and time at which new data chunks are generated.
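One plausible way to realize such a local lookup is sketched below in Python. The integer-division formula, the stream start time parameter, the string encoding of the tuple, and SHA-256 as the hash are assumptions for illustration and are not taken from the paper.

```python
import hashlib


def chunk_counter(timestamp: int, stream_start: int, chunk_interval: int) -> int:
    """Index of the chunk that holds a record with the given timestamp."""
    if timestamp < stream_start:
        raise ValueError("timestamp precedes the start of the stream")
    return (timestamp - stream_start) // chunk_interval


def chunk_id(owner_address: str, stream_id: str, counter: int) -> str:
    """Identifier as a hash of the tuple (owner address, streamID, counter)."""
    data = "{}|{}|{}".format(owner_address, stream_id, counter).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def range_ids(owner_address, stream_id, start_ts, end_ts, stream_start, chunk_interval):
    """Identifiers of all chunks that overlap the time range [start_ts, end_ts]."""
    first = chunk_counter(start_ts, stream_start, chunk_interval)
    last = chunk_counter(end_ts, stream_start, chunk_interval)
    return [chunk_id(owner_address, stream_id, c) for c in range(first, last + 1)]


# Hourly chunks: a record 5 h and 10 s after the stream start lies in chunk 5.
assert chunk_counter(5 * 3600 + 10, 0, 3600) == 5
# A one-day range maps to 24 chunk identifiers, computed without any index.
assert len(range_ids("owner-addr", "stream-1", 0, 24 * 3600 - 1, 0, 3600)) == 24
```

Since the client computes identifiers locally from public stream metadata, the storage node never needs to inspect the encrypted chunk contents to serve a range query.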
Strong Data Immutability. While Droplet provides integrity protection via authenticated encryption and digital signatures, the data owner can still modify old data. Specific applications might require a stronger notion of immutability, such that even the data owner can no longer modify the data (e.g., contractual agreements in logistics). Droplet enables such a notion of immutability through the blockchain’s append-only property. The application developer can define a grace period after which data chunks become immutable. For sensitive applications, this can be per chunk. Otherwise, a longer period can be selected. To accommodate the narrow bandwidth of blockchains, we leverage an anchoring technique, where data immutability transactions are reduced to the level of the grace period. To realize this, the first data chunk holds a pointer to the registration transaction, and after the grace period a transaction with a pointer to the latest chunk is issued, as depicted in Figure 8. Since all data chunks are cryptographically linked via hashes, all data chunks in the grace period become immutable at once, forming a chain of data chunks. To avoid a linear verification time, chunks hold hashes of several previous chunks, forming a geometric series. This enables a logarithmic verification time.
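The effect of geometrically spaced back-links can be illustrated with a short Python sketch. The specific spacing (distances 1, 2, 4, ...), the way linked digests are folded into each chunk digest, and the function names are assumptions; the paper only states that the links form a geometric series.

```python
import hashlib


def back_links(i: int) -> list:
    """Indices of the previous chunks that chunk i links to: i-1, i-2, i-4, ..."""
    links = []
    distance = 1
    while distance <= i:
        links.append(i - distance)
        distance *= 2
    return links


def link_chunks(payloads: list) -> list:
    """Digest of each chunk, covering its payload and all linked digests."""
    digests = []
    for i, payload in enumerate(payloads):
        h = hashlib.sha256(payload)
        for j in back_links(i):
            h.update(digests[j])
        digests.append(h.digest())
    return digests


def verification_path(anchor: int, target: int) -> list:
    """Hops from the anchored chunk back to `target`, taking the largest link each time."""
    path = [anchor]
    current = anchor
    while current > target:
        distance = 1
        while distance * 2 <= current - target:
            distance *= 2
        current -= distance
        path.append(current)
    return path


# From anchored chunk 100 back to chunk 3 in three hops instead of 97.
assert verification_path(100, 3) == [100, 36, 4, 3]
```

Each hop in the path is one of the links stored in the chunk, so a verifier only has to check a logarithmic number of digests between the anchored chunk and the chunk of interest.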
5.3 Privacy and Security Analysis
Authorization. To alter access permissions in the blockchain, an adversary must forge a digital signature (i.e., break public key cryptography with a 128-bit security level) or gain control over the majority of the computing power in the blockchain network. Existing production blockchains, such as Bitcoin and Ethereum, can be subject to security attacks, such as routing and selfish mining, which can cause access permission state update transactions to be dropped, delayed, or excluded.
An adversary is not capable of learning sensitive information from the public blockchain, since only unlinkable pseudo-identities and stream identifiers are stored. In profiling attacks, the adversary creates profiles of all user identifiers and the network of users . An adversary can break the pseudonymity of specific users. Hence, a large body of research aims at concealing identity and relationships in public blockchains while maintaining verifiability [88, 26, 56]. Droplet currently employs dual-key stealth addresses, where the anonymity set is equal to the set of users using nonspendable stealth addresses. A malicious storage node could hand out data without permission or data leakage might take place due to system compromise. However, the impact of this action is limited since data is end-to-end encrypted. The blockchain provides auditable information about when a stream was shared with whom; a crucial piece of information to prove or disprove access right violations should the need arise.
Data Serialization. Data chunks are encrypted, integrity protected, and authenticated. Any data chunk manipulations are detectable with the digital signature and authenticated encryption. The optional data immutability is based on the security and immutability of blockchain. The secure channel (i.e., TLS) for storing and fetching data prevents replay attacks, in addition to ensuring an authenticated and confidential channel. An adversary with access to disclosed encryption keys cannot alter old data, as it requires access to the signing private key (in case additional immutability is not enabled).
[Table 1: costs of the crypto operations per platform. Columns: AES Encrypt, SHA Hash, ECDSA Sign.]
Our reference implementation of Droplet is composed of three entities implemented in Python: the client engine, the storage-node engine, and the virtualchain. Droplet’s client engine is in charge of composing a data stream and data serialization. It handles viewing, setting, and modifying access permissions. It provides the interface to interact with the storage layer. The client engine is implemented in 1700 sloc. We utilize Python’s cryptography library for our crypto functions. For compression, we use Lepton for images and zlib for all other value types.
The storage engine can run either in the cloud or on nodes of a p2p storage network. Currently, we have integrated drivers for Amazon’s S3 storage service. On individual nodes, we employ LevelDB, an efficient key-value database. We also have a realization of Droplet on a serverless computing platform, with AWS Lambda serving as the interface to the storage (i.e., S3). Once Lambda is invoked, it performs a lookup in the access control state machine to process the authorization request. For comparison, we also implement an OAuth2 authorization, based on AWS Cognito. For the distributed storage, we build a DHT-based storage network. We instantiate a Kademlia library and extend it with the security features of S/Kademlia. The Kademlia protocol runs an asynchronous JSON/RPC over UDP. Our extensions amount to 2400 sloc.
The virtualchain is instantiated from Blockstack  and extended to implement our access control state machine. The virtualchain scans the blockchain, filters relevant transactions, validates the encoded operations, and applies the outcome to the global state. The state is persisted in an SQLite database. The global state can either be queried through a REST API or accessed directly through the SQLite database. Our extensions to the virtualchain amount to 1400 sloc. As the underlying blockchain, we employ a Bitcoin test-network with a low block generation time to emulate a hybrid consensus blockchain  (i.e., ca. 15 s block confirmation).
We now discuss the micro-benchmark evaluation of Droplet functionalities and then present the overall system performance in the end-to-end evaluation. We present the performance of Droplet on top of centralized and decentralized storage layers, i.e., Amazon’s S3 and 1024 DHT nodes running in real time on an emulated network. Evaluating and prototyping Droplet within a decentralized storage setting is an interesting case, as peer-to-peer storage networks could become a viable solution for the IoT . Additionally, this setup resembles storage-oriented blockchains (e.g., Storj , Filecoin ), which still lack adequate mechanisms for secure data sharing, where Droplet can be helpful.
We additionally evaluate the performance of Droplet in a serverless setting and compare it to OAuth2 authorization. Emerging serverless platforms, such as Lambda, require request-level authorization. Hence, this is a particularly insightful setting, as Droplet serves as an independent Authorization as a Service, which can be particularly useful to the Function as a Service (FaaS) paradigm.
Setup. The serverless setup is composed of Lambda, S3 storage, and AWS Cognito in case of the OAuth2 baseline benchmark. Our setup for the decentralized storage consists of a memory-optimized instance of Amazon’s EC2 (r3.4xlarge, 122 GiB, 16 vCPU), where we run up to 1024 instances of storage nodes. We use netem  to emulate a network roundtrip time of 20 ms between storage node instances. We use one instance of Amazon’s S3 storage service (20 ms latency) for the centralized storage scenario. For the crypto operations, we use four classes of devices: (i) IoT: OpenMotes are equipped with 32-bit ARM Cortex-M3 SoC at 32 MHz, with a public-key crypto accelerator running up to 250 MHz. Fitbit trackers utilize a similar class of microcontrollers; (ii) smartphone: LG Nexus5 equipped with a 2.3 GHz quad-core 64-bit CPU and 2 GiB RAM; (iii) laptop: MacBook Pro equipped with 2.2 GHz Intel i7 and 8 GiB RAM; (iv) Cloud: EC2 t2.micro (1 vCPU, 1 GiB RAM).
Datasets. We validate Droplet on three datasets and quantify the end-to-end overhead: (i) for the Fitbit activity tracker, we use one co-author’s data for one year (16 data types, 130 MB); (ii) for the Ava health tracker, we use an anonymized dataset we received from Ava (10 s intervals, 13 sensors, 1.3 GB); (iii) for the ECOviz smart meter dashboard, we use the ECO dataset (1.85 GB) for 6 households over a period of 8 months (1 Hz accuracy).
We instrument the client engine to perform the micro benchmark operations in isolation with up to 1000 repetitions.
Cryptographic Operations. Table 1 summarizes the costs of the crypto operations involved in Droplet on four different platforms. All these operations, namely AES encryption, SHA hash, and ECDSA signature are performed once per chunk for store requests. For data retrieval, the client does not perform a signature verification, since AES-GCM has built-in authentication. Running the crypto operations only in software on the IoT devices shows the highest cost, with 3.4k encryptions/hashes per sec and only 3.7 signatures per sec. With the onboard hardware crypto, the cost of AES and SHA is improved by one order of magnitude and approaches that of smartphones. Note that overall signatures are three orders of magnitude slower than symmetric key operations.
Crypto-based Access. Hash computations are the basis for dual key regression. The computation takes place at the initial setup and each key update if the client chooses to re-compute keys on-demand rather than store them. Assuming a chain length of 9000 (hourly key updates for one year), it takes 405 ms to compute the entire chain on smartphones and 2.7 s on an IoT device without hardware crypto engine. With compact hash chains, we reduce this worst case compute time to 4.3 and 28.2 ms, respectively. The performance gains become pronounced with smaller epoch intervals. The hash tree induces computations for keys, which amounts to 48 s (laptop) with keys.
The per chunk overhead consists of key computation (hash tree and dual key regression), chunk encryption, key encryption, and signature, which amounts to 1.5 ms (laptop) without caching. Compared to ABE (§4), Droplet’s crypto-based data access is 57x faster: with ABE, the per chunk overhead with only two attributes (timestamp for temporal access and data type) amounts to 86 ms (laptop).
7.2 System Performance
To model the real-world performance of Droplet, we constructed an end-to-end system setup, where we use our three app datasets. Note that we do not cache any data to emulate worst case scenarios. The stream chunk size is set to 8 KiB. We evaluate get and store requests to the storage layer, which include the overhead of Droplet’s access control.
Serverless Computing. In the serverless setting, Lambda either runs Droplet for the access control or uses the AWS Cognito service, which runs OAuth2, as the baseline. Lambda with both Droplet and Cognito exhibits a latency of around 118 ms (0.4% longer with Droplet). Note that with OAuth2, to reach the same level of access granularity as with Droplet, separate access tokens are required for each data chunk, which is impractical. This is why in practice long-lived and more broadly-scoped access tokens are granted.
Cloud. We extend AWS S3 storage with Droplet and compare its performance against vanilla S3. Figure 9(a) shows the throughput for different request types. We follow Amazon’s guidelines to maximize throughput: for instance, the chunk names are inherently well distributed allowing the best performance of the underlying hash-table lookup. The vanilla S3 throughput of 211 gets/s is within Amazon’s optimal range (100-300). With Droplet, we maintain an average rate of 204 get/s (3% drop). Figure 9(b) shows the latency for individual store and get operations. In Droplet, the latency overhead is 13% for get and 11% for store (incl. crypto). Part of the overhead is due to the expensive signature operation. Also there is an overhead for a fresh lookup of access permissions at the access control DB of the virtualchain instance.
Distributed Storage. We measure the performance of get and store requests on a secure DHT with Droplet, with varying network sizes, from 16 to 1024 nodes. Figure 9(a) shows the throughput results. As the number of nodes increases from 16 to 1024, the performance decreases from 142 to 96 get/s. Figure 9(b) shows the latency results, divided into routing and retrieval. The total get latency increases from 76 to 140 ms as the number of nodes grows. This is about 3 times slower than S3’s centralized storage. However, note that this slowdown is dominated by the routing cost. After resolving the address of the storage node with the data chunk, the secure retrieval time is similar to that of S3. Also, note that get requests have a lower routing overhead than store requests. This is because, for get requests, the routing process is aborted as soon as a node holding the data chunk is found.
Applications. For our three applications, we measure the overhead of store and get for different views in the app running on top of Droplet. As the storage layer, we discuss the case of the decentralized storage setting with 1024 nodes. Fitbit and Ava rely on a smartphone to store their data. Due to memory constraints, data synchronization is required at least weekly for Fitbit and daily for Ava. This results in an average store latency of 176 ms and 1.2 s for Fitbit and Ava, respectively. Note that store operations run in the background. For different views, the maximum get latency is below 150 ms. Hence, the user experience remains unaltered.
In contrast to Fitbit and Ava, the smart meter node has direct Internet connectivity. Instead of synchronizing periodically, it stores each chunk as soon as it is generated. This takes 176 ms per chunk. The most comprehensive view in the ECOViz dashboard can visualize the entire data stream. Figure 10 shows the latency to fetch chunks as a function of the number of days requested. Fetching 128 days of data with a 6 h chunk size takes about 10 s, whereas a one-week request takes less than 1 s.
Blockchain. In Droplet, we inherit the security and reliability properties, as well as the limitations, of the underlying blockchain. Consequently, the performance of Droplet, specifically with respect to access permission updates, is bound to that of the underlying blockchain. In our prototype, the transaction confirmation time is set to 15 s, similar to that of Ethereum. The slow blockchain writes have a direct impact on the time until new access permissions take effect, which is significantly longer than with the OAuth2 protocol. Read throughput is, however, fast and comparable to that of OAuth2. Data stream registrations and access permission adjustments (e.g., grant/revoke access) require transaction writes. To scale Droplet to billions of data streams, a blockchain throughput of a few thousand transactions per second is necessary (assuming 25% of streams require an access permission modification per day). While currently deployed blockchains achieve only a fraction of this throughput (e.g., 0.5% with Ethereum), next-generation blockchains promise to close this gap. Note that Droplet only anchors indirections in the blockchain (§5.1.2), as we store ACLs and metadata off-chain, minimizing the memory impact.
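The throughput estimate above can be reproduced with a short back-of-the-envelope calculation. The sketch assumes exactly one billion streams (the lower end of "billions") and permission updates spread uniformly over the day; both are simplifying assumptions made for this illustration, and `required_tps` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_tps(num_streams, daily_update_fraction):
    """Average transactions per second needed if the given fraction of
    streams changes its access permissions once per day."""
    return num_streams * daily_update_fraction / SECONDS_PER_DAY


# Assumption: one billion streams, 25% of which update permissions daily.
tps = required_tps(1_000_000_000, 0.25)
print(round(tps))             # 2894 transactions per second

# The 0.5% figure quoted for Ethereum then corresponds to roughly:
print(round(tps * 0.005, 1))  # 14.5 transactions per second
```

With several billion streams the requirement grows linearly, which is consistent with the "few thousand transactions per second" estimate.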
7.3 Discussion & Limitations
We highlight some research questions that remain open.
Deployment. Droplet is designed to work and coexist with several storage modalities, from cloud to emerging decentralized storage systems [64, 98]. Droplet does not map to the conventional business models of today’s IoT ecosystems. On the contrary, our design echoes the voice of a large body of research [94, 30, 102, 70, 107, 105] that calls for breaking up the monolithic designs dominating today’s deployments. As long as we lack adequate alternative economic models that facilitate horizontal and modular development, IoT apps will be developed and deployed in isolated, vertical silos, making it hard to fuse data streams from a variety of sources to provide holistic and large-scale analytics.
Beyond IoT. An authorization service with Droplet’s properties is crucial for systems that advocate for data sovereignty [44, 100, 107] or handle privacy-sensitive data, e.g., sharing medical records and humanitarian aid. The storm of recent privacy incidents [36, 19] has prompted a rethinking of this space. Moreover, decentralized storage services that run on a blockchain (e.g., Filecoin) can integrate Droplet for data sharing. Services with different trust assumptions can instead run Droplet’s authorization service on a federated set of servers (i.e., a permissioned blockchain).
Beyond Streams. This paper addresses the challenge of flexible and granular authorization, sharing, and accessing of stream data in an end-to-end encrypted setting, without the need for centralized trusted entities. The decentralized authorization service and crypto-enforced access are the two key components designed to achieve this goal. Our authorization service is not tied to stream data and could be coupled with other crypto-based access control schemes (e.g., ABE) for other types of data, e.g., large files.
Usability. Droplet is a user-centric system that empowers data owners with control over their data. However, this leaves several important usability considerations open. In this design paradigm, data owners have to make and manage granular decisions regarding their data. We acknowledge that this is a challenge that needs to be addressed with practical abstractions. Additionally, in an end-to-end encryption model, protection and recovery mechanisms for private master keys should be addressed with adequate solutions. For instance, Shamir’s secret sharing scheme allows reconstruction of the secret from a set of recovery keys which are, e.g., distributed among the data owner’s personal devices or a group of friends. The recovery keys can only collectively reconstruct a lost master secret key.
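As an illustration of the recovery mechanism mentioned above, the following is a minimal sketch of Shamir's secret sharing over a prime field. It is not part of Droplet's prototype. The field size (the Mersenne prime 2^521 - 1), the function names, and the example key are assumptions chosen for this sketch. Setting the threshold equal to the number of shares gives the case in which the recovery keys can only reconstruct the secret collectively.

```python
import secrets

# Illustrative field: the Mersenne prime 2**521 - 1, large enough to
# hold a 256-bit master key as a single field element.
PRIME = 2**521 - 1


def split_secret(secret, threshold, num_shares):
    """Split `secret` into `num_shares` points on a random polynomial of
    degree `threshold - 1`; any `threshold` points recover the secret."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be a field element")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # The constant term is the secret; the other coefficients are random.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange-interpolate the polynomial at x = 0 from the given shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical 256-bit master key, split so that all 3 recovery keys
# are needed to reconstruct it.
master_key = int.from_bytes(bytes(range(32)), "big")
recovery_keys = split_secret(master_key, threshold=3, num_shares=3)
assert recover_secret(recovery_keys) == master_key
```

A lower threshold (e.g., 3 out of 5) trades some protection against collusion for tolerance of lost recovery keys. Which setting is appropriate depends on how the recovery keys are distributed.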
To empower users with full control of their data, this paper introduces Droplet, a decentralized access control system that enables secure, selective, and flexible access control. With Droplet, we present a design that marries a decentralized authorization service with a novel encryption-based access control scheme tailored for time series data. Our prototype implementation and experimental results show the feasibility and applicability of Droplet as a decentralized authorization service for end-to-end encrypted data streams.
References
-  Adzic, G., and Chatley, R. Serverless Computing: Economic and Architectural Impact. In ACM Symposium on the Foundations of Software Engineering (FSE) (2017).
-  Agrawal, S., and Chase, M. FAME: Fast Attribute-based Message Encryption. In ACM Conference on Computer and Communications Security (CCS) (2017).
-  Ali, M., Nelson, J., Shea, R., and Freedman, M. J. Blockstack: A Global Naming and Storage System Secured by Blockchains. In USENIX Annual Technical Conference (ATC) (2016).
-  Anderson, A., Nadalin, A., Parducci, B., Engovatov, D., Lockhart, H., Kudo, M., Humenn, P., Godik, S., Anderson, S., Crocker, S., et al. eXtensible Access Control Markup Language (XACML) Version 1.0. OASIS (2003).
-  Androulaki, E., Karame, G. O., Roeschlin, M., Scherer, T., and Capkun, S. Evaluating User Privacy in Bitcoin. In Financial Cryptography and Data Security (FC) (2013).
-  Andrychowicz, M., Dziembowski, S., Malinowski, D., and Mazurek, L. Secure Multiparty Computations on Bitcoin. In IEEE Symposium on Security and Privacy (SP) (2014).
-  Apostolaki, M., Zohar, A., and Vanbever, L. Hijacking Bitcoin: Routing Attacks on Cryptocurrencies. In IEEE Symposium on Security and Privacy (SP) (2017).
-  Ateniese, G., Fu, K., Green, M., and Hohenberger, S. Improved Proxy Re-encryption Schemes with Applications to Secure Distributed Storage. In Symposium on Network and Distributed System Security (NDSS) (2005).
-  AWS. Identity and Access Management (IAM). https://aws.amazon.com/iam/.
-  AWS Cognito. https://aws.amazon.com/cognito/.
-  AWS Lambda. https://aws.amazon.com/lambda/.
-  Azaria, A., Ekblaw, A., Vieira, T., and Lippman, A. Medrec: Using Blockchain for Medical Data Access and Permission Management. In IEEE Conference on Open and Big Data (OBD) (2016).
-  Bano, S., Sonnino, A., Al-Bassam, M., Azouvi, S., McCorry, P., Meiklejohn, S., and Danezis, G. Consensus in the age of blockchains. arXiv preprint arXiv:1711.03936 (2017).
-  Baumgart, I., and Mies, S. S/Kademlia: A Practicable Approach Towards Secure Key-based Routing. In IEEE International Conference on Parallel and Distributed Systems (2007).
-  BBC. Fitness App Strava Lights up Staff at Military Bases. Online: http://www.bbc.com/news/technology-42853072, 2018.
-  Beckel, C., Kleiminger, W., Cicchetti, R., Staake, T., and Santini, S. The ECO Data Set and the Performance of Non-Intrusive Load Monitoring Algorithms. In Proceedings of the ACM International Conference on Embedded Systems for Energy-Efficient Buildings (BuildSys) (2014).
-  Bentov, I., and Kumaresan, R. How to use Bitcoin to Design Fair Protocols. In International Cryptology Conference (2014).
-  Biddle, S. Stop Using Unroll.me, right now. It sold your data to Uber. Online: https://theintercept.com/2017/04/24/, 2017.
-  Biggs, J. It’s time to build our own Equifax with blackjack and crypto, 2017. https://techcrunch.com/2017/09/08/its-time-to-build-our-own-equifax-with-blackjack-and-crypto/.
-  Birgisson, A., Politz, J. G., Erlingsson, U., Taly, A., Vrable, M., and Lentczner, M. Macaroons: Cookies with Contextual Caveats for Decentralized Authorization in the Cloud. In Symposium on Network and Distributed System Security (NDSS) (2014).
-  Blaze, M., Feigenbaum, J., and Lacy, J. Decentralized Trust Management. In IEEE Symposium on Security and Privacy (SP) (1996).
-  Blockstack Virtualchain. https://github.com/blockstack/virtualchain.
-  Boldyreva, A., Goyal, V., and Kumar, V. Identity-based Encryption with Efficient Revocation. In ACM Conference on Computer and Communications Security (CCS) (2008).
-  Bonneau, J., Miller, A., Clark, J., Narayanan, A., Kroll, J. A., and Felten, E. W. SoK: Research Perspectives and Challenges for Bitcoin and Cryptocurrencies. In IEEE Symposium on Security and Privacy (SP) (2015).
-  Briscoe, B. MARKS: Zero Side Effect Multicast Key Management Using Arbitrarily Revealed Key Sequences. Networked Group Communication 1736 (1999), 301–320.
-  Bünz, B., Bootle, J., Boneh, D., Poelstra, A., Wuille, P., and Maxwell, G. Bulletproofs: Short Proofs for Confidential Transactions and More. In IEEE Symposium on Security and Privacy (SP) (2018).
-  Burkhalter, L., Shafagh, H., Hithnawi, A., and Ratnasamy, S. TimeCrypt: A Scalable Private Time Series Data Store. arXiv preprint arXiv:1811.03457 (2018).
-  Buterin, V., and Griffith, V. Casper the Friendly Finality Gadget. arXiv preprint arXiv:1710.09437 (2017).
-  Castro, M., and Liskov, B. Practical Byzantine Fault Tolerance and Proactive Recovery. ACM Transactions on Computer Systems (TOCS) 20, 4 (2002), 398–461.
-  Chajed, T., Gjengset, J., Van Den Hooff, J., Kaashoek, M. F., Mickens, J., Morris, R., and Zeldovich, N. Amber: Decoupling User Data from Web Applications. In ACM Workshop on Hot Topics in Operating Systems (HotOS) (2015).
-  Challapalli, K. The Internet of Things: A time series data challenge. IBM, Online: http://www.ibmbigdatahub.com/blog/internet-things-time-series-data-challenge, 2014.
-  Chaum, D., and Van Antwerpen, H. Undeniable Signatures. In Conference on the Theory and Application of Cryptology (1989).
-  Chen, E. Y., Pei, Y., Chen, S., Tian, Y., Kotcher, R., and Tague, P. OAuth demystified for Mobile Application Developers. In ACM Conference on Computer and Communications Security (CCS) (2014).
-  Google Cloud. Identity and Access Management (IAM). https://cloud.google.com/iam/.
-  Compression Library zlib. https://zlib.net/.
-  Confessore, N. Cambridge Analytica and Facebook: The Scandal and the Fallout So Far. The New York Times, Online: https://www.nytimes.com/2018/04/04/us/politics/cambridge-analytica-scandal-fallout.html, April 2018.
-  Courtois, N. T., and Mercer, R. Stealth Address and Key Management Techniques in Blockchain Systems. In International Conference on Information Systems Security and Privacy (ICISSP) (2017).
-  Dredge, S. Yes, those Free Health Apps are Sharing your Data with other Companies. Guardian, Online: theguardian.com/technology/appsblog/2013/sep/03/fitness-health-apps-sharing-data-insurance, 2013.
-  Dropbox JPEG compression. https://github.com/dropbox/lepton.
-  Ellison, C. M., Frantz, B., Thomas, B., Ylonen, T., Rivest, R., and Lampson, B. SPKI Certificate Theory. RFC 2693 (Sep 1999), Online: https://www.ietf.org/rfc/rfc2693.txt, 1999.
-  Erbsen, A., Shankar, A., and Taly, A. Distributed Authorization in Vanadium. arXiv preprint arXiv:1607.02192 (2016).
-  Erlingsson, Ú., Pihur, V., and Korolova, A. Rappor: Randomized Aggregatable Privacy-Preserving Ordinal Response. In ACM Conference on Computer and Communications Security (CCS) (2014).
-  Ethereum. White-Paper. Online: https://github.com/ethereum/wiki/wiki/White-Paper, 2017.
-  European Union. GDPR: Council regulation (EU) no 679/2016. GDPR, Online: http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679&rid=1, 2016.
-  Eyal, I., Gencer, A. E., Sirer, E. G., and Van Renesse, R. Bitcoin-NG: A Scalable Blockchain Protocol. In USENIX Symposium on Networked Systems Design and Implementation (NSDI) (2016).
-  Eyal, I., and Sirer, E. G. Majority is not Enough: Bitcoin Mining is Vulnerable. In Financial Cryptography and Data Security (FC) (2014).
-  Feldman, A. J., Zeller, W. P., Freedman, M. J., and Felten, E. W. SPORC: Group Collaboration Using Untrusted Cloud Resources. In USENIX Symposium on Operating Systems Design and Implementation (OSDI) (2010).
-  Freedman, M. Time-series data: Why (and how) to Use a Relational Database Instead of NoSQL. Timescale, Online: https://blog.timescale.com/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c, 2017.
-  Fu, K., Kamara, S., and Kohno, T. Key Regression: Enabling Efficient Key Distribution for Secure Distributed Storage. In Symposium on Network and Distributed System Security (NDSS) (2006).
-  Garrison, W. C., Shull, A., Myers, S., and Lee, A. J. On the Practicality of Cryptographically Enforcing Dynamic Access Control Policies in the Cloud. In IEEE Symposium on Security and Privacy (SP) (2016).
-  Gervais, A., Karame, G. O., Wüst, K., Glykantzis, V., Ritzdorf, H., and Capkun, S. On the Security and Performance of Proof of Work Blockchains. In ACM Conference on Computer and Communications Security (CCS) (2016).
-  Gilad, Y., Hemo, R., Micali, S., Vlachos, G., and Zeldovich, N. Algorand: Scaling Byzantine Agreements for Cryptocurrencies. In ACM Symposium on Operating Systems Principles (SOSP) (2017).
-  Gollmann, D. Computer Security. John Wiley & Sons, Inc., New York, NY, USA, 1999.
-  Goyal, V., Jain, A., Pandey, O., and Sahai, A. Bounded Ciphertext Policy Attribute Based Encryption. In International Colloquium on Automata, Languages and Programming (ICALP) (2008).
-  Goyal, V., Pandey, O., Sahai, A., and Waters, B. Attribute-based Encryption for Fine-grained Access Control of Encrypted Data. In ACM Conference on Computer and Communications Security (CCS) (2006).
-  Green, M., and Miers, I. Bolt: Anonymous Payment Channels for Decentralized Currencies. In ACM Conference on Computer and Communications Security (CCS) (2017).
-  Gupta, T., Singh, R. P., Phanishayee, A., Jung, J., and Mahajan, R. Bolt: Data Management for Connected Homes. In USENIX Symposium on Networked Systems Design and Implementation (NSDI) (2014).
-  Handy, P. How Storj Increases Object Storage Security Exponentially. Storj Blog, Online: https://blog.storj.io/post/145305561698/how-storj-increases-object-storage-security, June 2016.
-  Hardjono, T., and Smith, N. Cloud-based Commissioning of Constrained Devices using Permissioned Blockchains. In Workshop on IoT Privacy, Trust, and Security (2016).
-  Hithnawi, A., Li, S., Shafagh, H., Gross, J., and Duquennoy, S. CrossZig: Combating Cross-Technology Interference in Low-power Wireless Networks. In ACM International Conference on Information Processing in Sensor Networks (IPSN) (2016).
-  Hu, Y.-C., Jakobsson, M., and Perrig, A. Efficient Constructions for One-way Hash Chains. In International Conference on Applied Cryptography and Network Security (ACNS) (2005).
-  InfluxData, Inc. Modern IoT Data Platform. Online: https://www.influxdata.com/customers/iot-data-platform/, 2017.
-  Jia, Y., Moataz, T., Tople, S., and Saxena, P. OblivP2P: An Oblivious Peer-to-Peer Content Sharing System. In USENIX Security Symposium (USENIX Security) (2016).
-  Benet, J. IPFS - Content Addressed, Versioned, P2P File System (DRAFT 3). https://github.com/ipfs/papers, 2017.
-  Keybase. Publicly Auditable Proofs of Identity. Online: https://keybase.io/, (accessed May 03, 2018).
-  Kiayias, A., Russell, A., David, B., and Oliynykov, R. Ouroboros: A Provably Secure Proof-of-Stake Blockchain Protocol. In International Cryptology Conference (CRYPTO) (2017).
-  Kokoris-Kogias, E., Alp, E. C., Siby, S. D., Gailly, N., Gasser, L., Jovanovic, P., Syta, E., and Ford, B. CALYPSO: Auditable Sharing of Private Data over Blockchains. Cryptology ePrint Archive:209 https://eprint.iacr.org/2018/209.pdf (2018).
-  Kokoris-Kogias, E., Jovanovic, P., Gailly, N., Khoffi, I., Gasser, L., and Ford, B. Enhancing Bitcoin Security and Performance with Strong Consistency via Collective Signing. In USENIX Security Symposium (USENIX Security) (2016).
-  Kokoris-Kogias, E., Jovanovic, P., Gasser, L., Gailly, N., Syta, E., and Ford, B. Omniledger: A secure, scale-out, decentralized ledger via sharding. In IEEE Symposium on Security and Privacy (SP) (2018).
-  Kolb, J., Chen, K., and Katz, R. H. The Case for a Local Tier in the Internet of Things. In Technical Report No. UCB/EECS-2016-222 (2016).
-  Lautenschlager, F., Philippsen, M., Kumlehn, A., and Adersberger, J. Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in Operational Data. In USENIX Conference on File and Storage Technologies (FAST) (2017).
-  Le Blond, S., Cuevas, A., Troncoso-Pastoriza, J. R., Jovanovic, P., Ford, B., and Hubaux, J.-P. On Enforcing the Digital Immunity of a Large Humanitarian Organization. In IEEE Symposium on Security and Privacy (SP) (2018).
-  LevelDB by Google. https://github.com/google/leveldb.
-  Lodderstedt, T., McGloin, M., and Hunt, P. OAuth 2.0 Threat Model and Security Considerations. IETF, RFC 6819 (January 2013).
-  Matsumoto, S., and Reischuk, R. M. IKP: Turning a PKI Around with Decentralized Automated Incentives. In IEEE Symposium on Security and Privacy (SP) (2017).
-  Meiklejohn, S., Pomarole, M., Jordan, G., Levchenko, K., McCoy, D., Voelker, G. M., and Savage, S. A Fistful of Bitcoins: Characterizing Payments Among Men with no Names. In ACM Internet Measurement Conference (IMC) (2013).
-  Miller, A., Xia, Y., Croman, K., Shi, E., and Song, D. The Honey Badger of BFT Protocols. In ACM Conference on Computer and Communications Security (CCS) (2016).
-  Nakamoto, S. Bitcoin: A Peer-to-Peer Electronic Cash System, 2008.
-  Narula, N., Vasquez, W., and Virza, M. zkLedger: Privacy-Preserving Auditing for Distributed Ledgers. In USENIX Symposium on Networked Systems Design and Implementation (NSDI) (2018).
-  Nelson, J., Ali, M., Shea, R., and Freedman, M. J. Extending Existing Blockchains with Virtualchain. In Workshop on Distributed Cryptocurrencies and Consensus Ledgers (2016).
-  Netem. https://wiki.linuxfoundation.org/networking/netem.
-  Nikitin, K., Kokoris-Kogias, E., Jovanovic, P., Gailly, N., Gasser, L., Khoffi, I., Cappos, J., and Ford, B. CHAINIAC: Proactive Software-Update Transparency via Collectively Signed Skipchains and Verified Builds. In USENIX Security Symposium (USENIX Security) (2017).
-  Popa, R. A. The Importance of Eliminating Central Points of Attack. Preveil, Online: https://www.preveil.com/blog/importance-eliminating-central-points-attack/, 2017.
-  Popa, R. A., Lorch, J. R., Molnar, D., Wang, H. J., and Zhuang, L. Enabling Security in Cloud Storage SLAs with CloudProof. In USENIX Annual Technical Conference (ATC) (2011).
-  Python Cryptography Library. https://cryptography.io/.
-  Python DHT library (Kademlia). https://github.com/bmuller/kademlia.
-  Sahai, A., and Waters, B. Fuzzy Identity-Based Encryption. In Advances in Cryptology (EUROCRYPT) (2005).
-  Sasson, E. B., Chiesa, A., Garman, C., Green, M., Miers, I., Tromer, E., and Virza, M. Zerocash: Decentralized Anonymous Payments from Bitcoin. In IEEE Symposium on Security and Privacy (SP) (2014).
-  Shafagh, H., Burkhalter, L., Hithnawi, A., and Duquennoy, S. Towards Blockchain-based Auditable Storage and Sharing of IoT Data. In ACM Cloud Computing Security Workshop (CCSW) (2017).
-  Shafagh, H., and Hithnawi, A. Privacy-preserving Quantified Self: Secure Sharing and Processing of Encrypted Small Data. In ACM SIGCOMM 2017 Workshop on Mobility in the Evolving Internet Architecture (MobiArch) (2017).
-  Shafagh, H., Hithnawi, A., Burkhalter, L., Fischli, P., and Duquennoy, S. Secure Sharing of Partially Homomorphic Encrypted IoT Data. In ACM Conference on Embedded Networked Sensor Systems (SenSys) (2017).
-  Shafagh, H., Hithnawi, A., Dröscher, A., Duquennoy, S., and Hu, W. Talos: Encrypted Query Processing for the Internet of Things. In ACM Conference on Embedded Networked Sensor Systems (SenSys) (2015).
-  Shamir, A. How to Share a Secret. Communications of the ACM 22, 11 (1979), 612–613.
-  Shen, C., Singh, R. P., Phanishayee, A., Kansal, A., and Mahajan, R. Beam: Ending Monolithic Applications for Connected Devices. In USENIX Annual Technical Conference (ATC) (2016).
-  Stamp, M. Information Security: Principles and Practice, 2nd ed. Wiley Publishing, 2011.
-  Stefanov, E., and Shi, E. Oblivistore: High Performance Oblivious Cloud Storage. In IEEE Symposium on Security and Privacy (SP) (2013).
-  Tassanaviboon, A., and Gong, G. OAuth and ABE based Authorization in Semi-trusted Cloud Computing. In ACM Workshop on Data Intensive Computing in the Clouds (2011).
-  Technical Report. Filecoin: A Cryptocurrency Operated File Network. http://filecoin.io/filecoin.pdf, 2014.
-  Technical Report. Storj: A Peer-to-Peer Cloud Storage Network. https://storj.io/storj.pdf, 2016.
-  Thielman, S. Your Private Medical Data is for Sale and it is Driving a Business Worth Billions. The Guardian, Online: https://www.theguardian.com/technology/2017/jan/10/medical-data-multibillion-dollar-business-report-warns, 2018.
-  Tomescu, A., and Devadas, S. Catena: Efficient Non-Equivocation via Bitcoin. In IEEE Symposium on Security and Privacy (SP) (2017).
-  Wang, F., Mickens, J., Zeldovich, N., and Vaikuntanathan, V. Sieve: Cryptographically Enforced Access Control for User Data in Untrusted Clouds. In USENIX Symposium on Networked Systems Design and Implementation (NSDI) (2016).
-  Wang, X., Zhang, J., Schooler, E. M., and Ion, M. Performance Evaluation of Attribute-based Encryption: Toward Data Privacy in the IoT. In IEEE Conference on Communications (ICC) (2014).
-  Yu, S., Wang, C., Ren, K., and Lou, W. Achieving Secure, Scalable, and Fine-grained Data Access Control in Cloud Computing. In IEEE International Conference on Computer Communications (INFOCOM) (2010).
-  Zachariah, T., Klugman, N., Campbell, B., Adkins, J., Jackson, N., and Dutta, P. The Internet of Things Has a Gateway Problem. In Proceedings of the 16th International Workshop on Mobile Computing Systems and Applications (HotMobile) (2015).
-  Zetter, K. Google Discovers Fraudulent Digital Certificate issued for its Domain. https://www.wired.com/2013/01/google-fraudulent-certificate/.
-  Zhang, B., Mor, N., Kolb, J., Chan, D. S., Lutz, K., Allman, E., Wawrzynek, J., Lee, E., and Kubiatowicz, J. The Cloud is Not Enough: Saving IoT from the Cloud. In USENIX HotCloud (2015).
-  Zheng, W., Li, F., Popa, R. A., and Stoica, I. Minicrypt: Reconciling Encryption and Compression for Big Data Stores. In EuroSys (2015).
-  Zimmermann, P. R. The official PGP user’s guide. MIT press, 1995.
-  Zyskind, G., Nathan, O., and Pentland, A. Decentralizing Privacy: Using Blockchain to Protect Personal Data. In IEEE Security and Privacy Workshops (2015).
-  Zyskind, G., Nathan, O., and Pentland, A. Enigma: Decentralized Computation Platform with Guaranteed Privacy. arXiv (whitepaper) http://www.enigma.co/enigma_full.pdf, 2015.