SecureStreams: A Reactive Middleware Framework for Secure Data Stream Processing

The growing adoption of distributed data processing frameworks in a wide diversity of application domains challenges end-to-end integration of properties like security, in particular when considering deployments in the context of large-scale clusters or multi-tenant Cloud infrastructures. This paper therefore introduces SecureStreams, a reactive middleware framework to deploy and process secure streams at scale. Its design combines the high-level reactive dataflow programming paradigm with Intel's low-level software guard extensions (SGX) in order to guarantee privacy and integrity of the processed data. The experimental results of SecureStreams are promising: while offering a fluent scripting language based on Lua, our middleware delivers high processing throughput, thus enabling developers to implement secure processing pipelines in just a few lines of code.





1. Introduction

The data deluge imposed by a world of ever-connected devices, whose most emblematic example is the Internet of Things (IoT), has fostered the emergence of novel data analytics and processing technologies to cope with the ever increasing volume, velocity, and variety of information that characterize the big data era. In particular, to support the continuous flow of information gathered by millions of IoT devices, data streams have emerged as a suitable paradigm to process flows of data at scale. However, as some of these data streams may convey sensitive information, stream processing requires support for end-to-end security guarantees in order to prevent third parties from accessing restricted data.

This paper therefore introduces SecureStreams, our initial work on a middleware framework for developing and deploying secure stream processing on untrusted distributed environments. SecureStreams supports the implementation, deployment, and execution of stream processing tasks in distributed settings, from large-scale clusters to multi-tenant Cloud infrastructures. More specifically, SecureStreams adopts a message-oriented middleware (Curry, 2005), which integrates with the SSL protocol (Freier et al., 2011) for data communication and the current version of Intel®’s software guard extensions (SGX) (Costan and Devadas) to deliver end-to-end security guarantees along data stream processing stages. SecureStreams can scale vertically and horizontally by adding or removing processing nodes at any stage of the pipeline, for example to dynamically adjust to the current workload. The design of the SecureStreams system is inspired by the dataflow programming paradigm (Uustalu and Vene, 2006): the developer combines several independent processing components (e.g., mappers, reducers, sinks, shufflers, joiners) to compose specific processing pipes. Regarding packaging and deployment, SecureStreams smoothly integrates with industrial-grade lightweight virtualization technologies like Docker (doc, 2017a).

In this paper, we propose the following contributions: (i) we describe the design of SecureStreams, (ii) we provide details of our reference implementation, in particular on how to smoothly integrate our runtime inside an SGX enclave, and (iii) we perform an extensive evaluation with micro-benchmarks, as well as with a real-world dataset.

The remainder of the paper is organized as follows. To better understand the design of SecureStreams, Section 2 delivers a brief introduction to today’s SGX operating mechanisms. The architecture of SecureStreams is then introduced in Section 3. Our implementation choices and an example of a SecureStreams program are reported in Section 4. Section 5 discusses our extensive evaluation, presenting a detailed analysis of micro-benchmark performance, as well as more comprehensive macro-benchmarks with real-world datasets. Related work is discussed in Section 6. Finally, Section 7 briefly describes our future work and concludes.

2. SGX Lightning Tour

The design of SecureStreams revolves around the availability of SGX features in the host machines. SGX is a trusted execution environment (TEE) recently introduced in Intel® Skylake processors, similar in spirit to ARM TrustZone (arm, 2009) but much more powerful. Applications create secure enclaves to protect the integrity and the confidentiality of the data and the code being executed.

The SGX mechanism, as depicted in Figure 1, allows applications to access confidential data from inside the enclave. The architecture guarantees that an attacker with physical access to a machine will not be able to tamper with the application data without being noticed. The CPU package represents the security boundary. Moreover, data belonging to an enclave is automatically encrypted and authenticated when stored in main memory. A memory dump on a victim’s machine will produce encrypted data. A remote attestation protocol allows one to verify that an enclave runs on a genuine Intel® processor with SGX. An application using enclaves must ship a signed (not encrypted) shared library (a shared object file in Linux) that can possibly be inspected by malicious attackers.

In the current version of SGX, the enclave page cache (EPC) is a limited area of memory predefined at boot to store enclaved code and data; future releases of SGX might relax this limitation (McKeen et al., 2016). Only part of the EPC can be used by the application’s memory pages, while the remaining area is used to maintain SGX metadata. Any access to an enclave page that does not reside in the EPC triggers a page fault. The SGX driver interacts with the CPU to choose which pages to evict. The traffic between the CPU and the system memory is kept confidential by the memory encryption engine (MEE) (Gueron, 2016), which is also in charge of tamper resistance and replay protection. If a cache miss hits a protected region, the MEE encrypts data before sending it to the system memory, or decrypts data fetched from it, and performs integrity checks. Data can also be persisted on stable storage protected by a seal key. This allows the storage of certificates, waiving the need for a new remote attestation every time an enclave application restarts.

Figure 1. SGX core operating principles.

The execution flow of a program using SGX enclaves is as follows. First, an enclave is created (see Figure 1-➊). As soon as a program needs to execute a trusted function (➋), it executes SGX’s primitive ecall (➌). The call goes through the SGX call gate to bring the execution flow inside the enclave (➍). Once the trusted function is executed by one of the enclave’s threads (➎), its result is encrypted and sent back (➏) before giving back the control to the main processing thread (➐).

3. Architecture

The architecture of SecureStreams comprises a combination of two different types of base components: worker and router. A worker component continuously listens for incoming data by means of non-blocking I/O. As soon as data flows in, an application-dependent business logic is applied. A typical use-case is the deployment of a classic filter/map/reduce pattern from the functional programming paradigm (Bird and Wadler, 1988). In such a case, worker nodes execute only one function, namely map, filter, or reduce. A router component acts as a message broker between workers in the pipeline and transfers data between them according to a given dispatching policy. Figure 2 depicts a possible implementation of this dataflow pattern using the SecureStreams middleware.
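The two component types can be illustrated with a minimal single-process sketch, assuming in-memory queues in place of network sockets. The names `new_router` and `new_worker` are hypothetical and are not part of the SecureStreams API; the sketch only shows how routers decouple workers that each apply one function.

```lua
-- Router: a FIFO queue that hands messages to the next stage.
local function new_router()
  local queue = {}
  return {
    push = function(msg) queue[#queue + 1] = msg end,
    pull = function() return table.remove(queue, 1) end,
  }
end

-- Worker: pulls from an inbound router, applies one function,
-- and pushes every non-nil result to an outbound router.
local function new_worker(inbound, outbound, fn)
  return function()
    local msg = inbound.pull()
    while msg ~= nil do
      local out = fn(msg)
      if out ~= nil then outbound.push(out) end
      msg = inbound.pull()
    end
  end
end

local source, mapped, filtered = new_router(), new_router(), new_router()
local mapper = new_worker(source, mapped, function(p) return p.age end)
local filter = new_worker(mapped, filtered, function(age)
  if age > 18 then return age end
end)

-- Hypothetical input records.
for _, p in ipairs({ { age = 12 }, { age = 30 }, { age = 45 } }) do
  source.push(p)
end
mapper()
filter()

-- Reduce stage: average of the surviving values.
local sum, count = 0, 0
local age = filtered.pull()
while age ~= nil do
  sum, count = sum + age, count + 1
  age = filtered.pull()
end
local average = sum / count
```

In an actual deployment each worker and router would run in its own container and exchange messages over the network; here the stages run one after the other purely for illustration.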

Figure 2. Example of SecureStreams pipeline architecture.

SecureStreams is designed to support the processing of sensitive data inside SGX enclaves. As explained in the previous section, the enclave page cache (EPC) is currently limited in size. To overcome this limitation, we settled on a lightweight yet efficient embeddable runtime, based on the Lua virtual machine (LuaVM) (Ierusalimschy et al., 1996) and the corresponding multi-paradigm scripting language (lua, 2017). The Lua runtime requires only a few kilobytes of memory, it is designed to be embeddable, and as such it represents an ideal candidate to execute in the limited space allowed by the EPC. Moreover, the application-specific functions can be quickly prototyped in Lua, and even complex algorithms can be implemented with an almost 1:1 mapping from pseudo-code (Leonini et al., 2009). We provide further implementation details of the embedding of the LuaVM inside an SGX enclave in Section 4.

Each component is wrapped inside a lightweight Linux container (in our case, the de facto industrial standard Docker (doc, 2017a)). Each container embeds all the required dependencies, while guaranteeing the correctness of their configuration, within an isolated and reproducible execution environment. By doing so, a SecureStreams processing pipeline can be easily deployed without changing the source code on different public or private infrastructures. For instance, this will allow developers to deploy SecureStreams to Amazon EC2 container service (aws), where SkyLake-enabled instances will soon be made available (ama, 2017), or similarly to Google compute engine (gce, 2017). The deployment of the containers can be transparently executed on a single machine or a cluster, using a Docker network and the Docker Swarm scheduler (doc, 2017c).

The communication between workers and routers leverages ZeroMQ, a high-performance asynchronous messaging library (zer, 2017a). Each router component hosts inbound and outbound queues. In particular, the routers use the ZeroMQ’s pipeline pattern (zer, 2017b) with the Push-Pull socket types.

The inbound queue is a Pull socket. The messages are streamed from a set of anonymous Push peers (e.g., the upstream workers in the pipeline). Here, anonymous refers to a peer without any identity: the server socket ignores which worker sent the message. The inbound queue uses fair-queuing scheduling to deliver the messages to the upper layer. Conversely, the outbound queue is a Push socket, sending messages using a round-robin algorithm to a set of anonymous Pull peers (e.g., the downstream workers).
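The two dispatching policies can be sketched in plain Lua, assuming in-memory tables in place of sockets. ZeroMQ implements these policies internally; the functions `round_robin` and `fair_queue` below are hypothetical helpers that only illustrate the resulting message order.

```lua
-- Outbound side: round-robin over downstream peers.
local function round_robin(peers)
  local next_index = 0
  return function(msg)
    next_index = next_index % #peers + 1
    local peer = peers[next_index]
    peer[#peer + 1] = msg
    return next_index
  end
end

-- Inbound side: fair queuing, i.e. take one message from each
-- non-empty upstream queue in turn until all queues are drained.
local function fair_queue(upstreams)
  local out = {}
  local remaining = true
  while remaining do
    remaining = false
    for _, q in ipairs(upstreams) do
      if #q > 0 then
        out[#out + 1] = table.remove(q, 1)
        remaining = true
      end
    end
  end
  return out
end

-- Seven messages spread over three downstream workers.
local workers = { {}, {}, {} }
local push = round_robin(workers)
for i = 1, 7 do push("m" .. i) end

-- Two upstream peers with unequal backlogs.
local merged = fair_queue({ { "a1", "a2", "a3" }, { "b1" } })
```

Because neither policy depends on the identity of a peer, workers can join or leave a stage without the router being reconfigured.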

This design allows us to dynamically scale up and down each stage of the pipeline in order to adapt it to application’s needs or the workload. Finally, ZeroMQ guarantees that the messages are delivered across each stage via reliable TCP channels.

We define the processing pipeline components and their chaining by means of Docker’s Compose (doc, 2017b) description language. Listing 1 reports a snippet of the description used to deploy the architecture in Figure 2. Once the processing pipeline is defined, the containers must be deployed on the computing infrastructure. We exploit Docker Swarm’s placement constraints to force the scheduler to deploy workers requiring SGX capabilities onto appropriate hosts. In the example, an sgx_mapper node is deployed on an SGX host by specifying "constraint:type==sgx" in the Compose description.

sgx_mapper:
  image: "${IMAGE_SGX}"
  entrypoint: ./sgx-mapper.lua
  environment:
    - TO=tcp://router_mapper_filter:5557
    - FROM=tcp://router_data_mapper:5556
    - "constraint:type==sgx"
  devices:
    - "/dev/isgx"

router_data_mapper:
  image: "${IMAGE}"
  hostname: router_data_mapper
  entrypoint: lua router.lua
  environment:
    - TO=tcp://*:5556
    - FROM=tcp://*:5555
    - "constraint:type==sgx"

# Service name assumed from the entrypoint; it is missing in the extracted listing.
data_stream:
  image: "${IMAGE}"
  entrypoint: lua data-stream.lua
  environment:
    - TO=tcp://router_data_mapper:5555
    - "constraint:type==sgx"
    - DATA_FILE=the_stream.csv
Listing 1: SecureStreams pipeline examples. Some attributes (volume, networks, env_file) are omitted.

4. Implementation Details

SecureStreams is implemented in Lua (v5.3). The implementation of the middleware itself requires careful engineering, especially with respect to the integration in the SGX enclaves (explained later). However, a SecureStreams use-case can be implemented in remarkably few lines of code. For instance, the implementation of the map/filter/reduce accounts for only lines of code (without counting the dependencies). The framework partially extends RxLua (git, 2017b), a library for reactive programming in Lua. RxLua provides to the developer the required API to design a data stream processing pipeline following a dataflow programming pattern (Uustalu and Vene, 2006).

Listing 2 provides an example of a RxLua program (and consequently a SecureStreams program) to compute the average age of a population by chaining :map, :filter, and :reduce functions. (Note that in our evaluation the code executed by each worker is confined into its own Lua file.) The :subscribe function subscribes 3 functions to the data stream. Following the observer design pattern (Szallies, 1997), these functions are observers, while the data stream is an observable.

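-- Assumed setup: the first line of the original listing (the source
-- observable) is missing, so a small in-memory table is used here.
local Rx = require("rx")
local people = { { age = 12 }, { age = 30 }, { age = 45 } }

Rx.Observable.fromTable(people, ipairs)
  :map(
    function(person)
      return person.age
    end
  )
  :filter(
    function(age)
      return age > 18
    end
  )
  :reduce(
    function(accumulator, age)
      -- Use field access; indexing with undefined variables would fail.
      accumulator.count = (accumulator.count or 0) + 1
      accumulator.sum = (accumulator.sum or 0) + age
      return accumulator
    end,
    {} -- seed: start from an empty accumulator table
  )
  :subscribe(
    function(datas)
      print("Adult people average:", datas.sum / datas.count)
    end,
    function(err)
      print(err)
    end,
    function()
      print("Process complete!")
    end
  )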
Listing 2: Example of process pipeline with RxLua.
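To make the observer pattern behind this chaining explicit, the following is a minimal, dependency-free sketch of an observable with the same four operators. It is a simplified stand-in written for illustration and is not the RxLua implementation; the names `create` and `from_list` are hypothetical.

```lua
-- Minimal stand-in for an observable; this is NOT the RxLua API.
local Observable = {}
Observable.__index = Observable

local function create(subscribe_fn)
  return setmetatable({ _subscribe = subscribe_fn }, Observable)
end

-- Emits every item of a list, then signals completion.
local function from_list(items)
  return create(function(on_next, on_error, on_completed)
    for _, item in ipairs(items) do on_next(item) end
    on_completed()
  end)
end

function Observable:map(fn)
  return create(function(on_next, on_error, on_completed)
    self:subscribe(function(v) on_next(fn(v)) end, on_error, on_completed)
  end)
end

function Observable:filter(pred)
  return create(function(on_next, on_error, on_completed)
    self:subscribe(function(v)
      if pred(v) then on_next(v) end
    end, on_error, on_completed)
  end)
end

-- Emits a single accumulated value once the upstream completes.
function Observable:reduce(fn, seed)
  return create(function(on_next, on_error, on_completed)
    local acc = seed
    self:subscribe(function(v) acc = fn(acc, v) end, on_error, function()
      on_next(acc)
      on_completed()
    end)
  end)
end

-- The three observers: value handler, error handler, completion handler.
function Observable:subscribe(on_next, on_error, on_completed)
  local ok, err = pcall(self._subscribe, on_next, on_error, on_completed)
  if not ok then on_error(err) end
end

local result, completed
from_list({ { age = 12 }, { age = 30 }, { age = 45 } })
  :map(function(person) return person.age end)
  :filter(function(age) return age > 18 end)
  :reduce(function(acc, age)
    acc.count, acc.sum = acc.count + 1, acc.sum + age
    return acc
  end, { count = 0, sum = 0 })
  :subscribe(
    function(acc) result = acc.sum / acc.count end,
    function(err) print(err) end,
    function() completed = true end
  )
```

Nothing is computed until `subscribe` is called: each operator only wraps the upstream observable, which is what allows the stages to be placed in separate workers.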

SecureStreams dynamically ships the business logic for each component into a dedicated Docker container and executes it. The communication between the Docker containers (the router and the worker components) happens through ZeroMQ (v4.1.2) and the corresponding Lua bindings (git, 2017a). Basically, SecureStreams abstracts the underlying network and computing infrastructure from the developer, by relying on ZeroMQ and Docker.

Under the SGX threat model where the system software is completely untrusted, system calls are not allowed inside secure enclaves. As a consequence, porting a legacy application or runtime, such as the Lua interpreter, is challenging. To achieve this task, we traced all system calls made by the interpreter to the standard C library and replaced them by alternative implementations that either mimic the real behavior or discard the call. Our changes to the vanilla Lua source code consist of the addition of about lines of code, or of its total size. By doing so, Lua programs operating on files, network sockets or any other input/output device do not execute as they normally do outside the enclaves. This inherent SGX limitation also reinforces the system security guarantees offered to the application developers. The SecureStreams framework safely ships the data and code to enclaves. Hence, the Lua scripts executed within the SGX enclave do not use (read/write) files or sockets. Wrapper functions are nevertheless installed in the SGX-enabled LuaVM to prevent any of such attempts.
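The effect of such wrappers can be approximated at the Lua level with a restricted environment. This is only a sketch under stated assumptions: the paper replaces calls in the interpreter's C sources, whereas the hypothetical helpers `denied`, `restricted_env`, and `run_restricted` below stub out a few I/O entry points from Lua itself.

```lua
-- Stub that refuses the call, mimicking a wrapper that blocks I/O.
local function denied(name)
  return function()
    error(name .. " is not available inside the enclave", 2)
  end
end

-- Restricted environment: pure computation is allowed, I/O is stubbed.
local function restricted_env()
  return {
    ipairs = ipairs, pairs = pairs, tostring = tostring, tonumber = tonumber,
    math = math, string = string, table = table,
    print = function() end,                    -- silently discarded
    io = { open = denied("io.open"), read = denied("io.read") },
    os = { execute = denied("os.execute"), time = function() return 0 end },
  }
end

-- Run a script (given as source text) inside the restricted environment.
local function run_restricted(source)
  local chunk, err = load(source, "script", "t", restricted_env())
  if not chunk then return false, err end
  return pcall(chunk)
end

local ok_pure, value = run_restricted("return math.max(3, 7)")
local ok_io, message = run_restricted("return io.open('/etc/passwd')")
```

The sketch shows both behaviors mentioned above: some calls are discarded (`print`), while others fail explicitly (`io.open`).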

An additional constraint imposed by the secure SGX enclaves is the impossibility of dynamically linking code. The reason is that the assurance that a given code is running inside an SGX-enabled processor is made through the measurement of its content when the enclave is created. More specifically, this measurement is the result of the EREPORT instruction, which produces an SGX-specific report containing a cryptographically secure hash of code, data and a few data structures, which overall builds a snapshot of the state of the enclave (including threads, memory heap size, etc.) and the processor (security version numbers, keys, etc.). Allowing more code to be linked dynamically at runtime would break the assurance given by the attestation mechanism on the integrity of the code being executed, allowing for example an attacker to load a malicious library inside the enclave.

In the case of Lua, a direct consequence is the impossibility of loading Lua extensions using the traditional dynamic linking technique. Every extension has to be statically compiled and packed with the enclave code. To ease the development of SecureStreams applications, we statically compiled json (rfc, 2014), and csv (Shafranovich, 2005) parsers within our enclaved Lua interpreter. With these libraries, the size of the VM and the complete runtime still remains reasonably small, approximately ( larger than the original).

Figure 3. Integration between Lua and Intel® SGX.

While this restricted Lua has been adapted to run inside SGX enclaves, we still had to provide support for communications and the reactive streams framework itself. To do so, we use an external vanilla Lua interpreter, with a couple of adaptations that allow the interaction with the SGX enclaves and the LuaVM therein. Figure 3 shows the resulting architecture. We extend the Lua interface with 3 functions: sgxprocess, sgxencrypt, and sgxdecrypt. The first one forwards the encrypted code and data to be processed in the enclave, while the remaining two provide cryptographic functionalities. In this work, we assume that attestation and key establishment were previously performed. As a result, keys safely reside within the enclave. We plan to release our implementation as open-source.
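The interplay of the three functions can be sketched as follows. The function names come from the text, but their signatures are not given, so the argument lists below are assumptions. The bodies are mock stand-ins running in ordinary Lua: the XOR routine is a toy placeholder with no security value, and no enclave is involved.

```lua
-- Toy XOR "cipher" used only to make the sketch runnable; NOT secure.
local KEY = 0x5A
local function xor_bytes(text)
  local out = {}
  for i = 1, #text do
    out[i] = string.char(text:byte(i) ~ KEY)
  end
  return table.concat(out)
end

-- Hypothetical stand-ins; the real signatures are not given in the paper.
local function sgxencrypt(plain) return xor_bytes(plain) end
local function sgxdecrypt(cipher) return xor_bytes(cipher) end

-- Stand-in for the enclave entry point: decrypt code and data, run the
-- code on the data, and return an encrypted result.
local function sgxprocess(enc_code, enc_data)
  local env = { tonumber = tonumber }
  local fn = assert(load(sgxdecrypt(enc_code), "enclave", "t", env))()
  return sgxencrypt(tostring(fn(sgxdecrypt(enc_data))))
end

-- Untrusted side: only ciphertext crosses the boundary.
local code = sgxencrypt("return function(x) return tonumber(x) * 2 end")
local reply = sgxprocess(code, sgxencrypt("21"))
local result = sgxdecrypt(reply)
```

In the described architecture the decryption and execution steps would take place inside the enclave, with the keys never leaving it; the sketch only mirrors the order of operations.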

5. Evaluation

This section reports on our extensive evaluation of SecureStreams. First, we present our evaluation settings. Then, we describe the real-world dataset used in our macro-benchmark experiments. We then dig into a set of micro-benchmarks that evaluate the overhead of running the LuaVM inside the SGX enclaves. Finally, we deploy a full SecureStreams pipeline, scaling the number of workers per stage, to study the limits of the system in terms of throughput and scalability.

5.1. Evaluation Settings

We have experimented on machines using an Intel® Core™ i7-6700 processor (int, 2017) and RAM. We use a cluster of 2 machines based on Ubuntu 14.04.1 LTS (kernel 4.2.0-42-generic). The choice of the Linux distribution is driven by compatibility reasons with the Intel® SGX SDK (v1.6). The machines run Docker (v1.13.0) and each node joins a Docker Swarm (doc, 2017c) (v1.2.5) using the Consul (con, 2017) (v0.5.2) discovery service. The Swarm manager and the discovery service are deployed on a distinct machine. Containers building the pipeline leverage the Docker overlay network to communicate with each other, while machines are physically interconnected using a switched network.

5.2. Input Dataset

In our experiments, we process a real-world dataset released by the American Bureau of Transportation Statistics (rit, 2017). The dataset reports on the flight departures and arrivals of air carriers (sta, 2017). We implement a benchmark application atop SecureStreams to compute average delays and the total number of delayed flights for each air carrier (cf. Table 1). We design and implement the full processing pipeline, which (i) parses the input datasets (in a comma-separated-value format) into data structures (map), (ii) filters data by relevancy (i.e., if the data concerns a delayed flight), and (iii) finally reduces it to compute the desired information. (This experiment is inspired by Kevin Webber’s blog entry diving into Akka streams.) We use the last four years of the available dataset (from 2005 to 2008), for a total of millions of entries to process and of data.
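The three steps of this pipeline can be sketched in plain Lua. The two-column record layout (`carrier,arrival_delay_minutes`) and the sample lines are hypothetical; the real dataset has more fields, and the benchmark application itself is not reproduced here.

```lua
-- Hypothetical column layout: carrier,arrival_delay_minutes
local lines = {
  "AA,12", "AA,0", "UA,30", "UA,-5", "AA,18", "DL,0",
}

-- (i) map: parse one CSV line into a record.
local function parse(line)
  local carrier, delay = line:match("^([^,]+),(%-?%d+)$")
  return { carrier = carrier, delay = tonumber(delay) }
end

-- (ii) filter: keep delayed flights only.
local function is_delayed(flight) return flight.delay > 0 end

-- (iii) reduce: per-carrier count and total delay.
local function accumulate(stats, flight)
  local s = stats[flight.carrier] or { count = 0, total = 0 }
  s.count, s.total = s.count + 1, s.total + flight.delay
  stats[flight.carrier] = s
  return stats
end

local stats = {}
for _, line in ipairs(lines) do
  local flight = parse(line)
  if is_delayed(flight) then stats = accumulate(stats, flight) end
end

-- Average delay per carrier, derived from the reduced state.
local average_delay = {}
for carrier, s in pairs(stats) do
  average_delay[carrier] = s.total / s.count
end
```

Keeping only a count and a running total per carrier means the reduce stage holds a small, bounded state regardless of the number of entries processed.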

System layer Size (LoC)
DelayedFlights app
SecureStreams library
RxLua runtime
Table 1. Benchmark app based on SecureStreams.

5.3. Micro-Benchmark: Lua in SGX

We begin our evaluation with a set of micro-benchmarks to evaluate the performance of the integration of the LuaVM inside the SGX enclaves. First, we estimate the cost of execution for functions inside the enclave. This test averages the execution time of 1 million function calls, without any data transfer. We compare against the same result without SGX. While non-enclaved function calls took , the performances inside the enclave drop down to on average i.e., approximately two orders of magnitude worse. We then assess the cost of copying data from the unshielded execution to the enclave and we compare it with the time required to compute the same on the native system. We initialize a buffer of with random data and copy its content inside the enclave. The data is split into chunks of increasing sizes. Our test executes one function call to transfer each chunk, until all data is transferred. Each point in the plot corresponds to the average of runs. Correctness of the copies was verified by SHA256 digest comparison between reproduced memory areas.

Figure 4 shows the results for different variants, comparing the native and the SGX version to only copy the data inside the enclave (in) or to copy it inside and copying it back (in/out). When using smaller chunks, the function call overhead plays an important role in the total execution time. Moreover, we notice that the call overhead steadily drops until the chunk size reaches the size of (vertical line). We can also notice that copying data back to non-SGX execution imposes an overhead of at most when compared to the one-way copy. These initial results are used as guidelines to drive the configuration of the streaming pipeline, in particular with respect to the size of the chunks exchanged between the processing stages. The larger the chunks, the smaller the overhead induced by the transfer of data within the SGX enclave.
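The relation between chunk size and call overhead can be illustrated with a simple cost model: a fixed cost per enclave call plus a per-byte copy cost. The two constants below are hypothetical values chosen only for illustration; they are not the measured figures from this evaluation.

```lua
-- Hypothetical costs, chosen only for illustration (not measured values).
local CALL_COST_US = 2.0      -- fixed cost of one enclave call, microseconds
local BYTE_COST_US = 0.001    -- cost of copying one byte, microseconds

-- Estimated time to copy `total_bytes` using chunks of `chunk_bytes`.
local function copy_time_us(total_bytes, chunk_bytes)
  local calls = math.ceil(total_bytes / chunk_bytes)
  return calls * CALL_COST_US + total_bytes * BYTE_COST_US
end

-- Share of the total time spent on call overhead.
local function overhead_share(total_bytes, chunk_bytes)
  local calls = math.ceil(total_bytes / chunk_bytes)
  return calls * CALL_COST_US / copy_time_us(total_bytes, chunk_bytes)
end

local total = 1024 * 1024
local small = overhead_share(total, 64)          -- many small chunks
local large = overhead_share(total, 64 * 1024)   -- few large chunks
```

Under these assumed constants the call overhead dominates for small chunks and becomes marginal for large ones, which is the qualitative trend described above.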

Figure 4. Execution time to copy of memory inside an SGX enclave (in) or to copy it back outside in/out.

Once the data and the code are copied inside the enclave, the LuaVM must indeed execute the code before returning the control. Hence, we evaluate here the raw performances of the enclaved SGX LuaVM. We select available benchmarks from a standard suite of tests (Bolz and Tratt, 2015). We based this choice on their library dependencies (by selecting the most standalone ones) and the number of input/output instructions they execute (selecting those with the fewest I/O). Each benchmark runs times with the same pair of parameters of the original paper, shown in the even and odd lines of Table 2. Figure 5 depicts the total time (average and standard deviation) required to complete the execution of the benchmarks. We use a bar chart plot, where we compare the results of the Native and SGX modes. For each of the benchmarks, we present two bars next to each other (one per executing mode) to indicate the different configuration parameters used. Finally, for the sake of readability, we use a different y-axis scale for the binarytrees case (from to  s), on the right-side of the figure.

benchmark       configuration parameter   memory peak   ratio SGX/Native
dhrystone       50 K                      275 KB        1.14
dhrystone       5 M                       275 KB        1.04
fannkuchredux   10                        28 KB         0.99
fannkuchredux   11                        28 KB         1.04
nbody           2.5 M                     38 KB         0.99
nbody           25 M                      38 KB         1.00
richards        10                        106 KB        1.02
richards        100                       191 KB        0.97
spectralnorm    500                       52 KB         1.00
spectralnorm    5 K                       404 KB        0.99
binarytrees     14                        25 MB         1.18
binarytrees     19                        664 MB        4.76
Table 2. Parameters and memory usage for Lua benchmarks.

We note that, in the current version of SGX, it is required to pre-allocate all the memory area to be used by the enclave. The most memory-eager test (binarytrees) peaked at 664 MB of memory (Table 2), hence using the wall clock time comparison would not be fair for smaller tests. In such cases, almost the whole execution time is dedicated to memory allocation. Because of that, we subtracted the allocation time from the measurements of enclave executions, based on the average over the runs. Fluctuations in this measurement produced slight variations in the execution times, sometimes producing the unexpected result of having SGX executions faster than native ones (by at most 3 %, per the ratios in Table 2). Table 2 lists the parameters along with the maximum amount of memory used and the ratio between runtimes of SGX and Native executions. When the memory usage is low, the ratio between the SGX and Native versions is small (at most 1.18 in our experiments). However, when the amount of memory usage increases, performance drops to almost 5× worse (a ratio of 4.76), as reflected in the case of the binarytrees experiment. The smaller the memory usage, the better performance we can obtain from SGX enclaves.

Synthesis. To conclude this series of micro-benchmarks, taming the overhead of secured executions based on SGX requires balancing the size of the chunks transferred to the enclave with the memory usage within this enclave. In the context of stream processing systems, SecureStreams therefore uses reactive programming principles to balance the load within processing stages in order to minimize the execution overhead.

5.4. Benchmark: Streaming Throughput

The previous set of experiments allowed us to verify that our design, implementation, and the integration of the LuaVM into the SGX enclaves are sound. Next, we deploy a SecureStreams pipeline which includes mappers, filters and reducers. To measure the achievable throughput of our system, as well as the network overhead of our architecture, we deploy the SecureStreams pipeline in 3 different configurations. In each case, the setup of the pipeline architecture, i.e. the creation of the set of containers, has been done in for the lightest configuration, in for the heaviest one.

The first configuration allows the streaming framework to blindly bypass the SGX enclaves. Further, it does not encrypt the input dataset before injecting it into the pipeline. This mode operates as the baseline, yet completely unsafe, processing pipeline. The second mode encrypts the dataset but lets the encrypted packets skip the SGX enclaves. This configuration requires the deployers to trust the infrastructure operator. Finally, we deploy a fully secure pipeline, where the input dataset is encrypted and the data processing is operated inside the enclaves. The data nodes inject the dataset, split into 4 equally-sized parts, as fast as possible. We gather bandwidth measurements by exploiting Docker’s internal monitoring and statistical module.

Figure 5. Enclave versus native running times for Lua benchmarks.

The results of these deployments are presented in Figure 6. For each of the mentioned configurations, we also vary the number of workers per stage: one (Figure 6-a,b,c), two (Figure 6-d,e,f), or four (the remaining ones). We use a representation based on stacked percentiles. The white bar at the bottom represents the minimum value, the pale grey on top the maximal value. Intermediate shades of grey represent the 25th, 50th, and 75th percentiles. For instance, in Figure 6-a (our baseline) the median throughput at into the experiment almost hits , meaning that half of the nodes at that moment are outputting data at or less. The baseline configuration, with only one worker per stage, completes in , with a peak of . Doubling the number of workers reduces the processing time down to (Figure 6-d), a speed-up of . Scaling up the workers to 4 in the baseline configuration (Figure 6-g) did not produce a similar speed-up.
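The stacked-percentile representation reduces, for each time step, the per-node throughput samples to five values. A minimal sketch using the nearest-rank method is given below; the sample values are hypothetical, and the percentile method actually used to produce the plots is not stated in the text.

```lua
-- Nearest-rank percentile of a list of throughput samples.
local function percentile(samples, p)
  local sorted = {}
  for i, v in ipairs(samples) do sorted[i] = v end
  table.sort(sorted)
  local rank = math.max(1, math.ceil(p / 100 * #sorted))
  return sorted[rank]
end

-- Summary used for one time step of a stacked-percentile plot.
local function summarize(samples)
  return {
    min = percentile(samples, 0),
    p25 = percentile(samples, 25),
    p50 = percentile(samples, 50),
    p75 = percentile(samples, 75),
    max = percentile(samples, 100),
  }
end

-- Hypothetical per-node throughput samples (arbitrary units).
local summary = summarize({ 40, 10, 30, 20, 50, 60, 80, 70 })
```

With these samples, half of the nodes output at the median value (`summary.p50`) or less, which is how the median band of the plots should be read.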

As we start injecting encrypted datasets (Figure 6-b and follow-up configurations with 2 and 4 workers), the processing time almost doubles (). The processing of the dataset is done after the messages are decrypted. We also pay a penalty in terms of overall throughput, i.e., the median value rarely exceeds . On the other hand, now we observe substantial speed-ups when increasing the workers per stage, down to and with 2 and 4 workers, respectively.

The deployment of the most secure set of configurations (right-most column of plots in Figure 6) shows that when using encrypted datasets and executing the stream processing inside SGX enclaves one must expect longer processing times and lower throughputs. This is the (expected) price to pay for higher-security guarantees across the full processing pipeline. Nevertheless, one can observe that the more workers are deployed, the smaller the penalty imposed by the end-to-end security guarantees provided by SecureStreams.

(a) Streaming throughput. Data in clear text, no SGX, 1 worker per stage.
(b) Streaming throughput. Encrypted data, no SGX, 1 worker per stage.
(c) Streaming throughput. Encrypted data, processing SGX, 1 worker per stage.
(d) Streaming throughput. Data in clear text, no SGX, 2 workers per stage.
(e) Streaming throughput. Encrypted data, no SGX, 2 workers per stage.
(f) Streaming throughput. Encrypted data, processing SGX, 2 workers per stage.
(g) Streaming throughput. Data in clear text, no SGX, 4 workers per stage.
(h) Streaming throughput. Encrypted data, no SGX, 4 workers per stage.
(i) Streaming throughput. Encrypted data, processing SGX, 4 workers per stage.
Figure 6. Throughput comparison between normal processing (with cleartext data and no encryption), encrypted data but without enclaves, and with encrypted data and SGX processing. We scale the number of worker nodes per stage from 1 (panels a to c) to 2 (panels d to f) and 4 (panels g to i).
Figure 7. Scalability: processing time, average and standard deviation. The experiment is repeated 5 times, with a variation on the number of workers for each stage, each worker using SGX.
Figure 8. Scalability: processing time, average and standard deviation. The experiment is repeated 5 times, with a variation on the number of mappers SGX, other workers—1 filter worker and 1 reduce worker—do not use SGX.

5.5. Benchmark: Workers’ Scalability

To conclude our evaluation, we study SecureStreams in terms of scalability. We consider a pipeline scenario similar to Figure 2 with some variations in the number of workers deployed for each stage. We do so to better understand to what extent the underlying container scheduling system can exploit the hardware resources at its disposal.

First, we increase the number of workers for each stage of the pipeline, from to . For each of the configurations, the experiment is repeated 5 times. We present average and standard deviation of the total completion time to process the full dataset in Figure 7. As expected, we observe ideal speed-up from a configuration using worker to that using workers. However, in the configuration using workers per stage, we do not reach the same acceleration. We attribute this to the fact that, in this latter case, the number of deployed containers (which equals the sum of input data streams, workers, and routers, hence containers) is greater than the number of physical cores of the hosts ( for each of the hosts used in our deployment, i.e., cores on our evaluation cluster).
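The container-count argument can be made concrete with a small calculation. The sketch assumes a three-stage pipeline with one router in front of each stage, a single data stream, and a cluster of 2 hosts with 4 physical cores each; these figures are illustrative assumptions and not the values elided above.

```lua
-- Containers needed for a three-stage pipeline (map, filter, reduce).
-- Assumed layout: one router in front of each stage.
local function container_count(data_streams, workers_per_stage)
  local stages, routers = 3, 3
  return data_streams + stages * workers_per_stage + routers
end

-- True when every container can have its own physical core.
local function fits(data_streams, workers_per_stage, total_cores)
  return container_count(data_streams, workers_per_stage) <= total_cores
end

-- Hypothetical cluster: 2 hosts with 4 physical cores each, 1 data stream.
local cores = 2 * 4
local fits_one = fits(1, 1, cores)   -- 1 + 3 + 3 = 7 containers
local fits_four = fits(1, 4, cores)  -- 1 + 12 + 3 = 16 containers
```

Once the container count exceeds the number of cores, containers compete for CPU time, so adding workers no longer yields a proportional speed-up.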

We also study the total completion time while increasing only the number of mapper workers in the first stage of the pipeline (which we identified as the one consuming most resources) from to and maintaining the numbers of filters and reducers in the following stages constant. As in the previous benchmark, the experiment is repeated 5 times for each configuration and we measure the average and standard deviation of the total completion time. Figure 8 presents the results. Here again, we observe ideal speed-up until the number of deployed containers reaches the number of physical cores. Beyond this number, we do not observe further improvements. These two experiments clearly show that the scalability of SecureStreams according to the number of deployed workers across the cluster is primarily limited by the total number of physical cores available.

Apart from this scalability limitation, other factors reduce the observed streaming throughput, with or without SGX enclaves. For instance, our throughput experiments show that the system does not always manage to saturate the available network bandwidth. We believe this behaviour can be explained by the lack of optimizations in the application logic, as well as by tuning options of the internal ZeroMQ queues that we have not yet explored.

As part of our future work, we therefore plan to investigate these effects further and to build on this knowledge to scale only the appropriate workers, in order to maximize the overall speed-up of the deployed application. In particular, we intend to leverage the elasticity of workers at runtime to cope with the memory constraints imposed by SGX and with the configuration of the underlying hardware architecture of each available node, so as to offer the best performance for the secure execution of data stream processing applications built atop SecureStreams.

6. Related Work

Spark (Zaharia et al., 2013) has recently gained a lot of traction as a prominent solution to implement efficient stream processing. It leverages Resilient Distributed Datasets (RDD) to provide a uniform view on the data to process. Despite its popularity, Spark only handles unencrypted data and hence does not offer security guarantees. Recent proposals (Shah et al., 2016) study possible software solutions to overcome this limitation.

Several big industrial players have introduced their own stream processing solutions. These systems are mainly used to ingest massive amounts of data and efficiently perform (real-time) analytics. Twitter’s Heron (Kulkarni et al., 2015) and Google’s Cloud DataFlow (Akidau et al., 2015) are two prominent examples. These systems are typically deployed on the provider’s premises and are not offered as a service to end-users.

A few dedicated solutions exist today for distributed stream processing using reactive programming. For instance, Reactive Kafka (rea, 2017) allows stream processing atop Apache Kafka (apa, 2017a; Kreps et al., 2011). These solutions do not, however, support secure execution in a trusted execution environment.

More recently, some open-source middleware frameworks (e.g., Apache Spark (apa, 2017c), Apache Storm (apa, 2017b), Infinispan (inf, 2017)) introduced APIs to allow developers to quickly set up and deploy stream processing infrastructures. These systems rely on the Java virtual machine (JVM) (Lindholm et al., 2014). However, SGX currently imposes a hard limit of 128 MB on enclave code and data; crossing this limit triggers expensive encrypted memory paging and serious performance overheads (Pires et al., 2016; Brenner et al., 2016). Moreover, executing a fully-functional JVM inside an SGX enclave would currently involve significant re-engineering efforts.

DEFCon (Migliavacca et al., 2010) also relies on the JVM. This event processing system focuses on security by enforcing constraints on event flows between processing units. The event flow control is enforced using application-level virtualisation to separate processing units in an ad-hoc JVM.

A few recent contributions tackle privacy-preserving data processing, particularly in a MapReduce scenario. This is the case of Airavat (Roy et al., 2010) and Gupt (Mohan et al., 2012). These systems leverage differential-privacy techniques (Dwork et al., 2006) and address a different threat model than the one supported by SGX and hence by SecureStreams. In particular, when deploying such systems on a public infrastructure, one needs to trust the cloud provider. Our system greatly reduces the trust boundaries, and requires trusting only Intel® and its SGX implementation.
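
To make the differential-privacy approach concrete, the sketch below releases a noisy count using the Laplace mechanism of Dwork et al. (2006). It is a generic illustration and not code from Airavat or Gupt; the function names, the sample delay values and the choice of epsilon are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample zero-mean Laplace noise with the given scale.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so the Laplace scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    delays = [5, 42, 0, 17, 63, 8]  # hypothetical delays in minutes
    rng = random.Random()
    print(private_count(delays, lambda d: d > 15, epsilon=1.0, rng=rng))
```

The party that adds the noise still sees the raw records, which is why such systems must trust the infrastructure on which they run.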

Some authors contend that public clouds may be secure enough for some parts of an application. They propose to split the jobs, running only the critical parts in private clouds. A privacy-aware framework on hybrid clouds (Xu and Zhao, 2015) has been proposed to work on tagged data, at different granularity levels. A MapReduce preprocessor splits data into private and public clouds according to their sensitivity. Sedic (Zhang et al., 2011) does not offer the same tagging granularity, but proposes to automatically modify reducers to optimize the data transfers in a hybrid cloud. These solutions require splitting application and data in two parts (sensitive and not) and impose higher latencies due to data transfers between two different clouds. Yet, they cannot offer better security guarantees than the software stack itself offers, be it public or private.

MrCrypt (Tetali et al., 2013) proposes using homomorphic encryption instead of trusted elements. Through static code analysis, it selects a suitable homomorphic encryption scheme for every data column. Still, some of the demonstrated benchmarks are ten times slower than the unencrypted execution. SecureStreams avoids complex encryption schemes: it decrypts data entering the enclaves and processes it in plaintext.
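
As an illustration of what computing on encrypted data involves, the sketch below implements a toy version of the Paillier cryptosystem, a well-known additively homomorphic scheme. It is not taken from MrCrypt; the function names are hypothetical, and the tiny primes make it insecure and suitable for demonstration only.

```python
import math
import random


def keygen(p: int, q: int):
    """Build a toy Paillier key pair from two primes (generator g = n + 1)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse, requires Python 3.8+
    return n, (lam, mu)


def encrypt(n: int, m: int, rng: random.Random) -> int:
    """Encrypt 0 <= m < n as (n + 1)^m * r^n mod n^2 with random r."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, private_key, c: int) -> int:
    """Recover m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = private_key
    l_value = (pow(c, lam, n * n) - 1) // n
    return (l_value * mu) % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)


if __name__ == "__main__":
    n, private_key = keygen(61, 53)  # toy primes, NOT secure
    rng = random.Random()
    total = add_encrypted(n, encrypt(n, 17, rng), encrypt(n, 25, rng))
    print(decrypt(n, private_key, total))
```

Each addition costs a multiplication modulo n squared, and each encryption needs two modular exponentiations, which hints at why homomorphic pipelines can be much slower than plaintext processing inside an enclave.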

The Styx (Stephen et al., 2016) system uses partially homomorphic encryption to allow for efficient stream processing in trusted cloud environments. Interestingly, the authors of that system mention Intel® SGX as a possible alternative to deploy stream processing systems on trusted hardware offered by untrusted/malicious cloud environments. SecureStreams offers insights on the performance of exactly this approach.

To the best of our knowledge, SecureStreams is the first lightweight and low-memory footprint stream processing framework that can fully execute within SGX enclaves.

As described earlier, SecureStreams executes processes that take advantage of SGX enclaves inside Docker containers. SCONE (Pietzuch et al., 2016), which is not yet openly available, is a recently introduced system that offers a secure container mechanism for Docker to leverage the SGX trusted execution support. It proposes a generic technology to embed any C program to execute inside an SGX enclave. Rather than generic programs, SecureStreams offers support to execute a lightweight LuaVM inside an SGX enclave and securely execute chunks of Lua code inside it. In our experiments, we execute this LuaVM inside Docker containers.

7. Conclusion

Secure stream processing is becoming a major concern in the era of the Internet of Things and big data. This paper introduces our design and evaluation of SecureStreams, a concise and efficient middleware framework to implement, deploy and evaluate secure stream processing pipelines for continuous data streams. The framework is designed to exploit the SGX trusted execution environments readily available in Intel®’s commodity processors, such as the latest SkyLake. We implemented the prototype of SecureStreams in Lua and based its APIs on the reactive programming approach. Our initial evaluation results based on real-world traces are encouraging, and pave the way for deployment of stream processing systems over sensitive data on untrusted public clouds.

In our future work, we plan to further extend SecureStreams and thoroughly evaluate it against other known approaches to secure stream processing, such as Styx (Stephen et al., 2016), MrCrypt (Tetali et al., 2013) or DEFCon (Migliavacca et al., 2010). In particular, we plan to extend SecureStreams with full automation of container deployments, and to enrich the framework with a library of standard stream processing operators and efficient yet secure native plugins, to ease the development of complex stream processing pipelines.


The research leading to these results has received funding from the European Commission, Information and Communication Technologies, H2020-ICT-2015 under grant agreement number 690111 (SecureCloud project). Rafael Pires is also sponsored by CNPq, National Council for Scientific and Technological Development, Brazil.


  • (aws) Amazon EC2 Container Service.
  • arm (2009) 2009. Building a Secure System using TrustZone® Technology. ARM Limited (2009).
  • rfc (2014) The JavaScript Object Notation (JSON) Data Interchange Format. RFC 7159.
  • ama (2017) Amazon EC2 and SkyLake CPUs.
  • apa (2017a) Apache Kafka.
  • apa (2017b) Apache Storm.
  • con (2017) Consul.
  • sta (2017) Data Expo’09 ASA Statistics Computing and Graphics.
  • doc (2017a) Docker.
  • doc (2017b) Docker Compose.
  • doc (2017c) Docker Swarm.
  • gce (2017) Google Compute Engine and SkyLake CPUs.
  • inf (2017) Infinispan.
  • int (2017) Intel® Core™ i7-6700.
  • lua (2017) Lua.
  • git (2017a) Lua binding to ZeroMQ.
  • git (2017b) Reactive Extensions for Lua.
  • rea (2017) Reactive Streams for Kafka.
  • rit (2017) RITA | BTS.
  • apa (2017c) Spark Streaming.
  • zer (2017a) ZeroMQ.
  • zer (2017b) ZeroMQ Pipeline.
  • Akidau et al. (2015) Tyler Akidau, Robert Bradshaw, Craig Chambers, Slava Chernyak, Rafael J. Fernández-Moctezuma, Reuven Lax, Sam McVeety, Daniel Mills, Frances Perry, Eric Schmidt, and Sam Whittle. 2015. The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-scale, Unbounded, Out-of-order Data Processing. Proc. VLDB Endow. 8, 12 (Aug. 2015), 1792–1803.
  • Bird and Wadler (1988) Richard Bird and Philip Wadler. 1988. Introduction to Functional Programming. Prentice Hall.
  • Bolz and Tratt (2015) Carl Friedrich Bolz and Laurence Tratt. 2015. The impact of meta-tracing on VM design and implementation. Science of Computer Programming 98 (2015), 408–421.
  • Brenner et al. (2016) Stefan Brenner, Colin Wulf, David Goltzsche, Nico Weichbrodt, Matthias Lorenz, Christof Fetzer, Peter Pietzuch, and Rüdiger Kapitza. 2016. SecureKeeper: Confidential ZooKeeper Using Intel® SGX. In Proceedings of the 17th International Middleware Conference (Middleware ’16). ACM, 14:1–14:13.
  • Costan and Devadas (2016) Victor Costan and Srinivas Devadas. 2016. Intel® SGX explained. Technical Report. Cryptology ePrint Archive, Report 2016/086.
  • Curry (2005) Edward Curry. 2005. Message-Oriented Middleware. John Wiley & Sons, Ltd, 1–28.
  • Dwork et al. (2006) Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference. Springer, 265–284.
  • Freier et al. (2011) Alan Freier, Philip Karlton, and Paul Kocher. 2011. The secure sockets layer (SSL) protocol version 3.0. (2011).
  • Gueron (2016) Shay Gueron. 2016. A Memory Encryption Engine Suitable for General Purpose Processors. IACR Cryptology ePrint Archive 2016 (2016), 204.
  • Ierusalimschy et al. (1996) Roberto Ierusalimschy, Luiz Henrique de Figueiredo, and Waldemar Celes Filho. 1996. Lua—An Extensible Extension Language. Software: Practice and Experience 26, 6 (June 1996), 635–652.
  • Kreps et al. (2011) Jay Kreps, Neha Narkhede, Jun Rao, and others. 2011. Kafka: A distributed messaging system for log processing. In Proceedings of the NetDB. 1–7.
  • Kulkarni et al. (2015) Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel, Karthik Ramasamy, and Siddarth Taneja. 2015. Twitter Heron: Stream Processing at Scale. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD ’15). ACM, New York, NY, USA, 239–250.
  • Leonini et al. (2009) Lorenzo Leonini, Étienne Rivière, and Pascal Felber. 2009. SPLAY: Distributed Systems Evaluation Made Simple (or How to Turn Ideas into Live Systems in a Breeze). In Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation (NSDI’09). USENIX Association, Berkeley, CA, USA, 185–198.
  • Lindholm et al. (2014) Tim Lindholm, Frank Yellin, Gilad Bracha, and Alex Buckley. 2014. The Java Virtual Machine Specification: Java SE 8 Edition. Pearson Education.
  • McKeen et al. (2016) Frank McKeen, Ilya Alexandrovich, Ittai Anati, Dror Caspi, Simon Johnson, Rebekah Leslie-Hurd, and Carlos Rozas. 2016. Intel® Software Guard Extensions (Intel® SGX) Support for Dynamic Memory Management Inside an Enclave. In Proceedings of the Hardware and Architectural Support for Security and Privacy 2016 (HASP 2016). ACM, New York, NY, USA, Article 10, 9 pages.
  • Migliavacca et al. (2010) Matteo Migliavacca, Ioannis Papagiannis, David M. Eyers, Brian Shand, Jean Bacon, and Peter Pietzuch. 2010. DEFCON: High-performance Event Processing with Information Security. In Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference (USENIXATC’10). USENIX Association, Berkeley, CA, USA, 1–1.
  • Mohan et al. (2012) Prashanth Mohan, Abhradeep Thakurta, Elaine Shi, Dawn Song, and David Culler. 2012. GUPT: Privacy Preserving Data Analysis Made Easy. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD ’12). ACM, New York, NY, USA, 349–360.
  • Pietzuch et al. (2016) P. R. Pietzuch, S. Arnautov, B. Trach, F. Gregor, T. Knauth, A. Martin, C. Priebe, J. Lind, D. Muthukumaran, D. O’Keeffe, M. Stillwell, D. Goltzsche, D. Eyers, K. Rüdiger, and C. Fetzer. 2016. SCONE: Secure Linux Containers with Intel SGX. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016. USENIX.
  • Pires et al. (2016) Rafael Pires, Marcelo Pasin, Pascal Felber, and Christof Fetzer. 2016. Secure Content-Based Routing Using Intel® Software Guard Extensions. In Proceedings of the 17th International Middleware Conference (Middleware ’16). ACM, New York, NY, USA, Article 10, 10 pages.
  • Roy et al. (2010) Indrajit Roy, Srinath T. V. Setty, Ann Kilzer, Vitaly Shmatikov, and Emmett Witchel. 2010. Airavat: Security and Privacy for MapReduce. In Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation (NSDI’10). USENIX Association, Berkeley, CA, USA, 20–20.
  • Shafranovich (2005) Yakov Shafranovich. 2005. Common Format and MIME Type for Comma-Separated Values (CSV) Files. RFC 4180.
  • Shah et al. (2016) S. Y. Shah, B. Paulovicks, and P. Zerfos. 2016. Data-at-rest security for Spark. In 2016 IEEE International Conference on Big Data (Big Data). 1464–1473.
  • Stephen et al. (2016) Julian James Stephen, Savvas Savvides, Vinaitheerthan Sundaram, Masoud Saeida Ardekani, and Patrick Eugster. 2016. STYX: Stream Processing with Trustworthy Cloud-based Execution. In Proceedings of the Seventh ACM Symposium on Cloud Computing (SoCC ’16). ACM, New York, NY, USA, 348–360.
  • Szallies (1997) Constantin Szallies. 1997. On using the observer design pattern. XP-002323533,(Aug. 21, 1997) 9 (1997).
  • Tetali et al. (2013) Sai Deep Tetali, Mohsen Lesani, Rupak Majumdar, and Todd Millstein. 2013. MrCrypt: Static analysis for secure cloud computations. ACM Sigplan Notices 48, 10 (2013), 271–286.
  • Uustalu and Vene (2006) Tarmo Uustalu and Varmo Vene. 2006. The Essence of Dataflow Programming. Springer Berlin Heidelberg, Berlin, Heidelberg, 135–167.
  • Xu and Zhao (2015) Xiangqiang Xu and Xinghui Zhao. 2015. A Framework for Privacy-Aware Computing on Hybrid Clouds with Mixed-Sensitivity Data. In IEEE International Symposium on Big Data Security on Cloud. IEEE, 1344–1349.
  • Zaharia et al. (2013) Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. 2013. Discretized Streams: Fault-tolerant Streaming Computation at Scale. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP ’13). ACM, New York, NY, USA, 423–438.
  • Zhang et al. (2011) Kehuan Zhang, Xiaoyong Zhou, Yangyi Chen, XiaoFeng Wang, and Yaoping Ruan. 2011. Sedic: privacy-aware data intensive computing on hybrid clouds. In Proceedings of the 18th ACM conference on Computer and communications security. ACM, 515–526.