In cloud event processing, data generated at the edge is processed in real-time by cloud resources. Both distributed stream processing (DSP) and Function-as-a-Service (FaaS) have been proposed to implement such event processing applications. FaaS emphasizes fast development and easy operation, while DSP emphasizes efficient handling of large data volumes. Despite their architectural differences, both can be used to model and implement loosely-coupled job graphs. In this paper, we consider the selection of FaaS and DSP from a cost perspective. We implement stateless and stateful workflows from the Theodolite benchmarking suite using cloud FaaS and DSP. In an extensive evaluation, we show how application type, cloud service provider, and runtime environment can influence the cost of application deployments and derive decision guidelines for cloud engineers.
The growing volume of data generated at the edge, e.g., by web clients or IoT devices, has led to increasing demand for live data and event processing in the cloud [1, 2, 3]. Today, the most popular paradigms for this are distributed stream processing (DSP) and Function-as-a-Service (FaaS) [4, 5]. In both paradigms, data processing applications are modeled as loosely-coupled graphs of data operations.
In DSP, this graph is a network of operators deployed on a stream processing engine that runs on a distributed cluster of compute nodes. The stream processing engine partitions incoming data across nodes for horizontal scalability, thereby parallelizing the data processing workflow on behalf of the developer [1, 6]. Typical examples of stream processing engines include Apache Flink (https://flink.apache.org/) and Google Cloud Dataflow (https://cloud.google.com/dataflow/) [7, 8]. FaaS platforms, e.g., AWS Lambda (https://aws.amazon.com/lambda/) and Google Cloud Functions (https://cloud.google.com/functions/), allow developers to deploy small, stateless functions on managed infrastructure that are billed per invocation and run duration. These functions can also be chained to build larger applications, e.g., through synchronous invocations or asynchronously by sharing state in a database [9, 10]. The managed approach promises high elasticity and scalability for developers and allows cloud service providers to allocate their infrastructure more efficiently [11, 12].
Despite their architectural differences, both DSP and FaaS can be used to model the loosely-coupled job graphs that underlie cloud data processing [13, 5]. Beyond qualitative concerns, their different billing models introduce a cost dimension that should be taken into account when designing data processing applications and choosing between paradigms [14, 11]. In this paper, we quantify this cost dimension through cost benchmarking to help application developers and cloud engineers make more informed decisions when designing event processing applications. We make the following contributions:
We present an application-centric benchmark with both stateful and stateless applications for cost-benchmarking of DSP and FaaS environments (Section 3).
In experiments, we analyze the impact of processing paradigm, type of application, execution environment, and choice of cloud provider on the cost of an event processing application deployment (Section 4).
We provide decision guidelines for application developers based on our quantitative data (Section 5).
We discuss the limitations of our work and derive avenues for future work (Section 6).
We make our implementation available as open source (https://github.com/pfandzelter/cloud-event-processing-costs) to enable other researchers and practitioners to conduct their own experiments.
While the concept of cloud computing is well established in both research and industry, paradigms for cloud applications are constantly evolving. In this section, we give an overview of distributed stream processing and Function-as-a-Service, two of today’s most common cloud data processing paradigms [1, 5], and introduce the related terminology.
Most distributed stream processing engines extend the well-known MapReduce pattern with support for processing continuous data streams. In modern DSP engines, developers define dataflow graphs (called pipelines or jobs) of operators using a declarative programming model [7, 6]. Prominent examples of DSP engines are the open source projects Apache Flink, Apache Samza, and Apache Kafka Streams, as well as cloud services such as Google Cloud Dataflow. Apache Beam (https://beam.apache.org/) is a framework providing a unified programming model to define dataflow graphs, which can be executed by many stream processing engines.
DSP engines are deployed as clusters of multiple instances (e.g., on different computing nodes). To enable horizontal scalability, data streams between operators are partitioned and operators are scheduled on multiple instances, where each operator instance processes only a portion of the data. The key idea is that state should only be maintained locally in an operator instance. Additionally, DSP engines often use periodic checkpointing and require durable, replayable data sources to ensure fault tolerance.
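To illustrate the partitioning idea, the following minimal Python sketch (not part of our benchmark) routes keyed events to parallel operator instances by hashing the key, so that each instance keeps state only for the keys it owns. The class and function names are hypothetical, and real engines implement their own partitioning schemes.

```python
from collections import defaultdict
from zlib import crc32


class CountingOperator:
    """One parallel instance of a stateful operator; its state is purely local."""

    def __init__(self) -> None:
        self.counts: dict[str, int] = defaultdict(int)

    def process(self, key: str) -> None:
        self.counts[key] += 1


def partition(key: str, parallelism: int) -> int:
    """Map a key to an instance index. crc32 is used instead of the built-in
    hash(), whose result for strings is randomized per Python process."""
    return crc32(key.encode("utf-8")) % parallelism


def run(keys: list[str], parallelism: int) -> list[CountingOperator]:
    """Route each keyed event to exactly one operator instance."""
    instances = [CountingOperator() for _ in range(parallelism)]
    for key in keys:
        instances[partition(key, parallelism)].process(key)
    return instances


if __name__ == "__main__":
    events = ["sensor-1", "sensor-2", "sensor-1", "sensor-3", "sensor-1"]
    for index, operator in enumerate(run(events, parallelism=2)):
        print(index, dict(operator.counts))
```

Because all events with the same key reach the same instance, per-key state never has to be shared across instances, which is what allows the operator to scale horizontally.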
While stream processing engines have traditionally been operated as long-running clusters on virtual machines, they are now often deployed as standalone, cloud-native applications. In particular, containerization and Kubernetes, the de-facto standard for container orchestration, are used to reduce the operational complexity of running DSP jobs at large scale; managed Kubernetes services are offered on all major cloud platforms. In addition to a fixed cluster management fee, users of such services are billed variable costs for the allocated VMs or, more recently, for the actual resource usage of containers.
In the FaaS programming model, developers deploy applications in the form of individual functions to a FaaS platform that handles event-driven code invocation and horizontal scaling. Function infrastructure is completely handled by the cloud service provider, i.e., “serverless”, and consumers pay per request based on the resources consumed by a function [12, 22]. Functions can be implemented in a number of programming languages and can be invoked by web requests, IoT sensor readings, database updates, and even other functions, so-called function chaining [23, 24].
A key element of horizontal scalability is that function instances logically exist only for the duration of a single invocation and do not retain any state beyond that execution. To support stateful applications, functions usually leverage serverless, pay-per-request cloud datastores such as Google Cloud Firestore (https://cloud.google.com/firestore/) or AWS DynamoDB (https://aws.amazon.com/dynamodb/) [25, 26].
In combination with lightweight virtualization techniques, such as containers or microVMs [27, 28], FaaS platforms can quickly spin up new and destroy old function instances, enabling rapid elasticity. The low management burden for the consumer and the wide range of possible applications are clear advantages for developers. For cloud service providers, the fine-grained execution of functions enables a more efficient allocation of their infrastructure.
Both FaaS and DSP can be used to build cloud event processing applications. To quantify the cost dimension of choosing between the two paradigms for such an application, we introduce a new application-centric cost benchmark that can be applied to any cloud processing paradigm. The proposed benchmark comprises three parts: an application implementing two example use-cases (which could easily be extended), a load generator that creates requests for the application, and the benchmark configuration. The system under test (SUT) in each benchmark is either a FaaS or a DSP platform. In this section, we present our proposed benchmark and the methodology for executing it.
Our example application is derived from the Theodolite suite of stream processing scalability benchmarks. Both use-cases are designed for Industrial Internet of Things (IIoT) event processing in the context of a smart factory, where sensors at the edge produce large amounts of data that require real-time event processing in the cloud. We chose both a stateless and a stateful use-case in order to quantify the impact of state management on application cost.
The first use-case (UC1) persists incoming data in a cloud database. Such an operation is often required for archiving events and making them accessible to other applications. As shown in Fig. 1, incoming events are first transformed to match the data format required by the database API and then written to that database system. Since each data item is treated individually, no state is maintained within the application.
As a second use-case, we chose a sliding window data aggregation (UC2). Such aggregations are used in many scenarios, e.g., to derive a smoothed trend. As illustrated in Fig. 2, incoming data is first windowed in a sliding window. Data within a window is then aggregated by computing summary statistics, yielding a moving aggregation. Results of this windowed aggregation can then be used in further data processing. In this use-case, the windowing of data requires application state.
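The window assignment in UC2 can be sketched as follows. This is a simplified, in-memory Python illustration, not our Beam or FaaS implementation; the window size of 30 s and slide of 3 s mirror the configuration used in our experiments, while the event tuple layout and the chosen summary statistics are assumptions. With these parameters, every event falls into 30 / 3 = 10 overlapping windows.

```python
from collections import defaultdict
from statistics import mean

WINDOW_SIZE_S = 30  # window length in seconds
SLIDE_S = 3         # a new window starts every SLIDE_S seconds


def assign_windows(ts: int, size: int = WINDOW_SIZE_S, slide: int = SLIDE_S) -> list[int]:
    """Return the start times of all windows [start, start + size) that contain ts.
    Window starts are aligned to multiples of the slide."""
    start = ts - ts % slide  # latest window start at or before ts
    starts = []
    while start > ts - size:
        starts.append(start)
        start -= slide
    return starts


def aggregate(events: list[tuple[str, int, float]]) -> dict[tuple[str, int], dict[str, float]]:
    """Group (sensor_id, timestamp_s, value) events per sensor and window, then summarize."""
    windows: dict[tuple[str, int], list[float]] = defaultdict(list)
    for sensor, ts, value in events:
        for start in assign_windows(ts):
            windows[(sensor, start)].append(value)
    return {
        key: {"count": len(values), "min": min(values), "max": max(values), "mean": mean(values)}
        for key, values in windows.items()
    }


if __name__ == "__main__":
    print(len(assign_windows(31)))  # 10 overlapping windows per event
    result = aggregate([("s1", 0, 1.0), ("s1", 2, 3.0)])
    print(result[("s1", 0)])
```

The number of windows an event belongs to (size divided by slide) directly determines how much state has to be updated per event, which becomes relevant for the cost of the FaaS implementation.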
Executing the benchmark for a platform entails three steps:
An application containing the two use-cases is implemented for the chosen platform.
For different rates of data ingress, i.e., workload levels, load is generated against the application.
The total cost of running the application is measured for the duration of processing a constant data rate.
Our benchmark first requires that the two use-cases are implemented for a chosen SUT, such as a DSP in a specific cloud environment or a FaaS offering. Although we present some implementations in Section 4, we cannot provide a generic, ready-to-use implementation for all possible SUTs, as implementation details are highly SUT-specific. The application implementation should also conform to any best practices for the chosen SUT to support a fair comparison.
The benchmark is not restricted to evaluating only the difference between DSP and FaaS, but can also be used to evaluate other scenarios: Users might, e.g., use the benchmark to compare two different stream processing engines, compare the same engine deployed on different cloud providers, or compare the same FaaS functions using different event triggers. We explore such options in our experiments in Section 4.
Load is generated using the Theodolite load generators, deployed as a dedicated load generator within the same cloud datacenter as the SUT. The load generators emulate a number of sensors that send data at a fixed interval in an open workload model, i.e., requests are non-blocking. By varying the number of sensors emulated by the load generator, we obtain cost estimates for varying request loads. To simplify our cost calculations, we set the fixed interval to one second so that, e.g., 500 emulated sensors lead to a load of 500 requests/s.
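The following Python sketch illustrates the difference between the open workload model used by our load generator and a closed model. It only computes send timestamps; the even staggering of sensors within each interval is an assumption for illustration and not necessarily how the Theodolite load generators schedule requests. In the open model, send times do not depend on response latency, so 500 sensors with a 1 s interval yield 500 requests/s; in the closed model, latency lowers the effective rate.

```python
from collections import Counter


def open_model_schedule(num_sensors: int, duration_s: int, interval_s: float = 1.0) -> list[float]:
    """Send times when each sensor emits every interval_s seconds, regardless of responses."""
    times = []
    for sensor in range(num_sensors):
        offset = interval_s * sensor / num_sensors  # assumption: stagger sensors evenly
        k = 0
        while (t := offset + k * interval_s) < duration_s:
            times.append(t)
            k += 1
    return sorted(times)


def closed_model_schedule(num_sensors: int, duration_s: int,
                          interval_s: float, latency_s: float) -> list[float]:
    """Contrast: each sensor blocks on its response before waiting for the next interval."""
    times = []
    for _ in range(num_sensors):
        t = 0.0
        while t < duration_s:
            times.append(t)
            t += latency_s + interval_s
    return sorted(times)


def requests_per_second(times: list[float]) -> Counter:
    """Number of requests sent in each one-second bucket."""
    return Counter(int(t) for t in times)


if __name__ == "__main__":
    open_rps = requests_per_second(open_model_schedule(500, duration_s=10))
    closed = closed_model_schedule(500, duration_s=10, interval_s=1.0, latency_s=0.25)
    print(open_rps[0], len(closed) / 10)
```

With a 0.25 s response latency, the closed model only reaches 400 requests/s on average, which is why an open model is needed to hold the request rate constant independent of the SUT's response times.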
A constant arrival rate does not necessarily reflect real world data ingress patterns. However, the goal of our benchmark is not to measure scalability or elasticity of a given platform but rather to explore the cost of operating an application for a given data rate that may reflect an average rate over time.
The result of our benchmark is an hourly cost estimate for the implementation of a use-case for a given level of constant load. To yield such an estimate, we can leverage different kinds of information provided by cloud platforms or measurements. For experiments on FaaS platforms with pay-per-request pricing models, cost estimates can be derived by extrapolating from small-scale environments, as the cost can be expected to scale linearly with the number of requests for current cloud pricing models. Additionally, any costs for database reads and writes can be derived by tracking database access and calculating the resulting cost based on per-request cost of the database system used. To achieve a cost estimate for a DSP deployment, we benchmark multiple infrastructure configurations until the least expensive deployment, which can still handle the configured load without an increase in event consumer lag, is found [30, 31].
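The two estimation strategies can be sketched in Python as follows. The measurement record, the prices, and the `sustains` callback (standing in for an actual benchmark run at a given configuration) are hypothetical. `lag_is_stable` shows one possible way to decide whether consumer lag increases, namely via the least-squares slope of periodic lag samples; it is an assumption, not necessarily the criterion applied by our tooling.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FaaSMeasurement:
    """Totals observed during a small-scale FaaS benchmark run (hypothetical record)."""
    invocations: int
    function_cost_usd: float  # billed invocation and duration cost of the run
    db_reads: int
    db_writes: int


def faas_hourly_cost(m: FaaSMeasurement, rate_rps: float,
                     usd_per_read: float, usd_per_write: float) -> float:
    """Extrapolate linearly: with pay-per-request pricing, cost is proportional to requests."""
    run_cost = m.function_cost_usd + m.db_reads * usd_per_read + m.db_writes * usd_per_write
    return run_cost / m.invocations * rate_rps * 3600


def lag_is_stable(lag_samples: list[float], max_slope: float = 0.0) -> bool:
    """True if the least-squares slope of consumer lag over time does not exceed max_slope."""
    n = len(lag_samples)
    if n < 2:
        raise ValueError("need at least two lag samples")
    x_mean = (n - 1) / 2
    y_mean = sum(lag_samples) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(lag_samples))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den <= max_slope


def cheapest_dsp_deployment(rate_rps: float,
                            candidates: list[tuple[float, int]],
                            sustains: Callable[[int, float], bool]) -> Optional[tuple[float, int]]:
    """Try (hourly_cost, num_nodes) candidates from cheapest to most expensive and return
    the first one for which a benchmark run (sustains) shows no growing consumer lag."""
    for hourly_cost, nodes in sorted(candidates):
        if sustains(nodes, rate_rps):
            return hourly_cost, nodes
    return None
```

In practice, `sustains` would deploy the given configuration, generate load at `rate_rps`, sample the consumer lag, and return the result of `lag_is_stable`; a small positive `max_slope` tolerates measurement noise.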
In this section, we present an extensive evaluation of different cloud event processing deployments. After an initial comparison of DSP and FaaS (Section 4.1), we use our benchmark to explore the parameter space. Specifically, we evaluate the impact of the chosen event passing paradigm (Section 4.2), the cloud service provider (Sections 4.3 and 4.4), the FaaS runtime environment (Section 4.5), the choice of DSP engine (Section 4.6), a serverless DSP offering (Section 4.7), and a managed Kubernetes service (Section 4.8).
As our baseline, we compare Google Cloud Functions and Apache Flink, running Apache Beam pipelines on Google Kubernetes Engine (GKE).
In the stateless storage use-case (UC1), client events are sent over HTTP and stored in Google Cloud Firestore (see Fig. (a)a). As necessary for Apache Flink, HTTP events are enqueued in Apache Kafka by a middleware prior to processing (see Fig. (b)b). Cloud Functions, on the other hand, can directly expose an HTTP endpoint.
The stateful windowed aggregation application (UC2) also receives events over HTTP, but results are emitted to the output log of the respective platform. In a real application, a further stateless operation such as UC1 might be performed afterwards, yet our goal here is to study the stateful operator in isolation. For our implementation with Flink, we use the built-in window aggregation mechanisms with RocksDB as state backend (see Fig. (d)d). To support stateful windowed aggregation on stateless functions, we store intermediate window state in a Google Cloud Firestore collection for each window (see Fig. (c)c). Both implementations are configured to aggregate data over windows of 30 seconds, with a new window starting every 3 seconds. This results in 10 windows per emulated sensor that are maintained in parallel.
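To make the state access pattern of the FaaS implementation concrete, the following Python sketch handles one event the way a stateless function would: for every window the event belongs to, it reads the window's partial aggregate from an external store and writes the updated aggregate back. `CountingStore` is a hypothetical in-memory stand-in for Cloud Firestore that only counts operations, and the sketch omits the transactions a production implementation would likely need for concurrent invocations. With 30 s windows and a 3 s slide, each invocation issues ten reads and ten writes.

```python
from dataclasses import dataclass, field

WINDOW_SIZE_S = 30
SLIDE_S = 3


@dataclass
class CountingStore:
    """Hypothetical in-memory stand-in for a document database; counts billed operations."""
    docs: dict = field(default_factory=dict)
    reads: int = 0
    writes: int = 0

    def get(self, key: str):
        self.reads += 1
        return self.docs.get(key)

    def set(self, key: str, value: dict) -> None:
        self.writes += 1
        self.docs[key] = value


def handle_event(store: CountingStore, sensor: str, ts: int, value: float) -> None:
    """Body of a stateless function: all window state lives in the external store.
    Note: plain read-modify-write; concurrent invocations would need transactions."""
    start = ts - ts % SLIDE_S  # latest window start at or before ts
    while start > ts - WINDOW_SIZE_S:
        key = f"{sensor}/{start}"
        agg = store.get(key) or {"count": 0, "sum": 0.0}
        store.set(key, {"count": agg["count"] + 1, "sum": agg["sum"] + value})
        start -= SLIDE_S


if __name__ == "__main__":
    store = CountingStore()
    handle_event(store, "sensor-1", ts=7, value=2.0)
    print(store.reads, store.writes)  # 10 10: one read and one write per window
```

In the Flink implementation, the same per-window updates happen against local RocksDB state inside the taskmanagers, so they do not translate into billed database requests.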
As Apache Flink and its operators are implemented in Java, we also use the Java 11 runtime for our cloud functions to control for effects caused by programming language or runtime. We set the function memory to 256 MB, the smallest amount that supports a function execution without running into memory errors. This also limits our per-function compute resources to 0.1667 vCPU.
For our streaming implementation, we deploy Flink in a GKE cluster with different numbers of e2-standard-4 virtual machines. The overall deployment consists of one coordinating Flink jobmanager, varying numbers of Flink taskmanagers, a three-node Apache Kafka cluster, a component redirecting incoming HTTP requests to Kafka as well as some additional components for monitoring and cluster management. To ensure a reasonable degree of fault tolerance, Flink is configured with a 30-second checkpointing interval and each Kafka partition is replicated across three brokers.
All experiments are conducted in the europe-west3 (Frankfurt) Google Cloud region, with the load generators deployed on e2-highcpu-4 virtual machines on Google Compute Engine in the same region.
We show the results of our baseline evaluation in Fig. 4. For the applications we consider, costs scale linearly with request load, yet at different rates. This is expected for functions, which are billed per request and where requests can be processed independently: FaaS incurs variable costs only. In stream processing, we instead observe a step pattern, which can be seen in Fig. (a)a (and more pronounced at a larger scale in Fig. (b)b). This results from a more coarse-grained allocation of resources, i.e., whole servers that need to be added to the cluster. Additionally, there is a minimum cost of running the cluster, namely the cost of a single server. Overall, DSP costs here combine a fixed cost with a variable cost that grows in discrete steps. As a result, function and cluster costs intersect at a specific request rate (200 req/s for UC1 and 5 req/s for UC2): below this break-even point, the fixed cost of running a single-server cluster exceeds the per-request cost of FaaS functions. Beyond this request rate, the overhead of operating full servers in a cluster is negligible compared to the premium charged for serverless functions.
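The break-even behavior described above follows from a simple cost model, sketched below in Python. FaaS cost is linear in the request rate, whereas DSP cost is a fixed fee plus a step function of the number of servers needed. All prices and the per-server capacity are hypothetical parameters, not our measured values.

```python
import math
from typing import Optional


def faas_hourly_cost(rate_rps: float, usd_per_request: float) -> float:
    """Pay-per-request: purely variable cost."""
    return usd_per_request * rate_rps * 3600


def dsp_hourly_cost(rate_rps: float, fixed_usd_per_hour: float,
                    node_usd_per_hour: float, node_capacity_rps: float) -> float:
    """Fixed fee plus whole servers; at least one server always runs."""
    nodes = max(1, math.ceil(rate_rps / node_capacity_rps))
    return fixed_usd_per_hour + nodes * node_usd_per_hour


def break_even(max_rate: int, usd_per_request: float, fixed_usd_per_hour: float,
               node_usd_per_hour: float, node_capacity_rps: float) -> Optional[int]:
    """Smallest integer request rate at which DSP is no more expensive than FaaS."""
    for rate in range(max_rate + 1):
        dsp = dsp_hourly_cost(rate, fixed_usd_per_hour, node_usd_per_hour, node_capacity_rps)
        if dsp <= faas_hourly_cost(rate, usd_per_request):
            return rate
    return None


if __name__ == "__main__":
    print(break_even(1000, usd_per_request=1e-6, fixed_usd_per_hour=0.10,
                     node_usd_per_hour=0.15, node_capacity_rps=100))
```

For these illustrative parameters, the function returns 70 req/s. Because DSP cost is a step function, the two curves may cross more than once near step boundaries (with the parameters above, FaaS is cheaper again at 101 req/s, where a second server is needed); the sketch only reports the first crossing.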
Interestingly, the break-even point is at a higher load rate for the stateless UC1 than for the stateful aggregation in UC2. For the cloud function implementation of UC2, the largest share of costs per request is caused by writes (62.2%) and reads (20.8%) to Cloud Firestore, as shown in Fig. 5. This database access is required to store intermediate state: in our implementation, each window is stored as a database entry, leading to ten reads and ten writes for each function invocation. In the streaming implementation, on the other hand, no such database access is required since all state is maintained inside the Flink taskmanagers.
Our baseline experiments show that FaaS is an economical choice over DSP for stateless applications with low to medium event arrival rates, in our case from 0 to 200 req/s. For stateful applications, where functions need to store intermediate state in a database, the cost of database access makes FaaS infeasible for anything but low-rate event processing.
While we use HTTP endpoints for sensors in our baseline evaluation, this does not necessarily reflect all IoT environments, where data distribution paradigms such as publish/subscribe are more common. We thus further quantify the impact of endpoint choice on DSP and FaaS costs.
We extend our baseline implementation with support for Google Cloud Pub/Sub (https://cloud.google.com/pubsub/). For our function implementation, this requires adding an event trigger and application logic for event parsing. In our Apache Flink setup, we replace the previous HTTP middleware and the Apache Kafka deployment with a direct connection to Google Cloud Pub/Sub, using the PubSubIO connectors provided by Apache Beam. Instead of sending JSON objects as done with our HTTP implementation, we send binary-encoded Apache Avro (https://avro.apache.org/) records via Pub/Sub.
As shown in Fig. 5, using Cloud Pub/Sub has a noticeable effect on the execution duration of our FaaS implementations, especially in UC1, where processing costs increase by 154.6%. This effect is less pronounced for UC2, where duration increases by 8.6%. One possible explanation is the additional overhead of message parsing: with HTTP, request data is passed to our function directly as JSON, whereas with Pub/Sub, the function first has to decode the binary Avro record. However, due to the relatively high costs of database access, this has only a small impact on total costs (12.9% increase for UC1 and 1.4% increase for UC2). At less than $0.04 per 1,000,000 messages, the cost per Cloud Pub/Sub message is two orders of magnitude smaller than the cost of processing it.
Figure 6 shows how costs increase with increasing load when using Cloud Pub/Sub in our Apache Flink implementation. Pub/Sub introduces an additional cost factor to the overall deployment. These costs increase at a steeper rate than the costs for the Kubernetes cluster: While the share of Pub/Sub costs in total costs is 1.5% for UC1 and 2.9% for UC2 at a load intensity of 100 req/s, it grows to 2.6% and 17.5%, respectively, at a load of 1,000 req/s. On the other hand, these additional costs are compensated by the slightly higher loads which Flink can process with Pub/Sub before requiring an additional virtual machine. Figure 7 shows that, averaged over all evaluated load profiles, costs for processing messages from Pub/Sub are similar to redirecting HTTP requests via Kafka.
Our experiments show no clear cost difference between Pub/Sub and HTTP, in either DSP or FaaS. Small savings are possible when events arrive in a format that simplifies processing (in our FaaS experiments, JSON over HTTP), but they are too small to justify adding a dedicated message transformation layer just to save costs.
In our baseline FaaS evaluation, we use Google Cloud Functions, yet other cloud providers offer their own serverless platforms that may have different runtime behavior and pricing, impacting the cost results of our experiments. In this experiment, we thus compare our Google Cloud Functions implementation with an implementation on AWS Lambda.
We implement our benchmark for AWS Lambda with an AWS DynamoDB serverless database. To ensure comparability, we use the Java 11 runtime and conduct our experiments in the eu-central-1 (Frankfurt) region. We again set the memory limit to 256 MB. Our load generator for this implementation runs in the same region on an m5.xlarge EC2 instance.
As we expect the costs for function execution to scale linearly with event arrival rate, we consider the average cost for individual function execution which we show in Fig. 5. The average cost per function execution is 6.4% higher on AWS Lambda than on Google Cloud Functions for both applications, which is caused mainly by the more expensive database access in DynamoDB over Cloud Firestore.
In our experiments, the choice of FaaS provider had only a limited impact on the total cost of execution. However, the cost difference can depend on the type of application: applications that use other cloud platform services, such as databases, may incur significant additional costs that vary between providers.
Similar to our evaluation of different FaaS platforms, we also compare GKE and AWS Elastic Kubernetes Service (EKS).
Deployment descriptions for Kubernetes are largely platform-independent, allowing us to use almost the same deployment with EKS as with GKE. As in our evaluation of different FaaS platforms, we write incoming events in our UC1 implementation to an AWS DynamoDB serverless database. Both our EKS cluster and the load generator for this implementation use m5.xlarge EC2 instances, running in the eu-central-1 (Frankfurt) region.
As shown in Fig. (a)a, the costs for our UC1 deployment on EKS increase at a steeper rate than in the GKE deployment. Averaged over all evaluated load profiles, EKS has 24.3% higher costs than GKE, as shown in Fig. (a)a. Interestingly, EKS has higher costs although the EKS deployment requires significantly fewer Flink taskmanager instances: loads up to 1,100 req/s can be processed by a single taskmanager, compared to 8 instances required in the GKE deployment. However, higher costs per VM instance and especially higher costs per database write outweigh this superior performance. As we do not see such a difference in resource usage for UC2, we conclude that either DynamoDB provides faster writes than Firestore or Beam’s DynamoDB writer is more resource efficient than the Firestore writer.
In our implementation of the stateful application, we use only native Apache Beam functionality. As shown in Fig. (b)b, costs increase in EKS at a similar rate as in GKE. Depending on the load intensity, at which VMs have to be added to the cluster, either GKE or EKS is cheaper. Averaged over all evaluated load profiles, EKS has 8.8% higher costs than GKE (see Fig. (b)b). This is in accordance with the slightly higher costs per VM instance in AWS.
Similar to our findings from evaluating different FaaS platforms, the choice of cloud infrastructure for running a DSP engine has a small but noteworthy impact on the total costs. The discrepancy results mainly from different costs for cloud resources, which even outweigh significant performance gaps.
In our baseline FaaS evaluation, we use the Java 11 runtime in order to account for effects of programming language or runtime performance when comparing to Apache Flink. Most modern FaaS platforms support a wider variety of runtimes, and the choice of language may have an indirect impact on execution cost when an implementation requires more resources or function executions take more time.
To quantify the effect of runtime choice, we implement our benchmark in Node.js and Go. Node.js is one of the most popular choices for cloud functions, while Go is the only programming language supported by Google Cloud Functions that is compiled directly to machine code and may thus have the smallest performance overhead.
As shown in Fig. 5, the choice of programming language has only a small effect on the cost of function execution, with overall costs changing by -1.9% and -7.5% (Go) and 0.4% and -1.9% (Node.js) for UC1 and UC2, respectively. Although the duration of a function execution changes by -22.7% and -50.8% for UC1 and UC2 with Go, the effect on costs is insignificant compared to costs for database access. Surprisingly, the Node.js implementation is as efficient as our Java implementation. This might be caused by a more mature and optimized execution environment in Google Cloud Functions, as Node.js is one of the most popular languages for FaaS functions.
As the majority of costs for the execution of a function are incurred by database access and not function duration, the choice of programming language has no considerable effect on the cost of our application. For stateless applications without database access, and especially for more complex functions where the largest share of costs is incurred by execution duration rather than function invocation, comparing implementation runtimes may nevertheless be beneficial.
We use Apache Flink for our baseline evaluation, which is a DSP engine originating in academia and extensively studied in research. In this experiment, we compare this to Apache Samza, an open source DSP engine developed in industry at LinkedIn. Samza is built around similar concepts as Flink and can also be used to run Apache Beam pipelines.
Thanks to Apache Beam, we can use exactly the same implementation for Samza as we use for Flink. In contrast to Flink, Samza does not need a dedicated coordinator, but instead uses our existing Kafka/ZooKeeper deployments for coordination among instances.
In the case of the stateless application, we found that Samza has a significantly higher resource demand than Flink, causing higher costs as shown in Fig. (a)a. As processing 300 req/s already requires 14 Samza instances, we extrapolated the costs for higher loads. We suspect that this large discrepancy arises because we did not enable bundling, a Beam feature used by Beam’s FirestoreIO to write multiple records as a batch. Bundling is disabled by default, and its usage is not documented for Samza.
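We did not investigate bundling further, but the intuition behind this explanation can be illustrated with a small Python sketch: without batching, every record causes its own round trip to the database, whereas batching amortizes that overhead across records. `CountingWriter` and the batch sizes are hypothetical and do not model Beam's FirestoreIO.

```python
from itertools import islice
from typing import Iterable, Iterator


def batches(records: Iterable, batch_size: int) -> Iterator[list]:
    """Yield consecutive chunks of at most batch_size records."""
    it = iter(records)
    while chunk := list(islice(it, batch_size)):
        yield chunk


class CountingWriter:
    """Hypothetical database sink that counts round trips and written records."""

    def __init__(self) -> None:
        self.round_trips = 0
        self.written = 0

    def write(self, chunk: list) -> None:
        self.round_trips += 1
        self.written += len(chunk)


def write_all(records: Iterable, batch_size: int = 1) -> CountingWriter:
    """Write all records using chunks of batch_size (1 means one round trip per record)."""
    writer = CountingWriter()
    for chunk in batches(records, batch_size):
        writer.write(chunk)
    return writer


if __name__ == "__main__":
    for size in (1, 50):
        w = write_all(range(300), batch_size=size)
        print(size, w.round_trips, w.written)
```

If each round trip carries a fixed processing overhead in the operator, the unbatched variant pays that overhead 300 times instead of 6 times for the same 300 records, which would translate into a higher resource demand.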
With the stateful application, Samza performs similarly to Flink. However, as Samza scales in smaller steps, the rather small load profiles studied here result in slightly lower costs for Samza, as shown in Fig. (b)b.
In general, different stream processing engines can be operated at similar costs. However, different feature sets and inappropriate configuration options might cause cost pitfalls, particularly when interacting with other cloud services.
In our baseline evaluation, we compare serverless FaaS implementations with streaming implementations running in Kubernetes. Major cloud vendors also provide managed streaming offerings, which run DSP pipelines on top of hosted stream processing engines. While requiring the same development skills as other DSP engines, serverless stream processing services can be considered a middle ground between self-operated DSP engines and FaaS in terms of operational complexity.
To compare the costs of self-operating a DSP engine with a fully-managed one, we run our Apache Beam implementations on Google Cloud Dataflow with varying numbers of e2-standard-4 instances. Similar to the other engines, Dataflow should be used with a durable data source instead of ingesting data directly via HTTP. As we consider using a serverless DSP service along with a self-operated Kafka cluster to be less realistic for real-world systems, we focus on processing data from Google Cloud Pub/Sub and use the Flink experiments with Pub/Sub as baseline.
As shown in Fig. 6, Google Cloud Dataflow has significantly lower costs than our Apache Flink on Kubernetes deployment. Averaged over all evaluated load profiles (see Fig. 7), Dataflow has 85.6% of the costs of operating Flink for UC1 and only 41.2% for UC2. This is primarily due to the massively reduced costs for virtual machines, as Dataflow requires fewer instances to process the same load; e.g., the stateful application can be run with a single VM at all tested load rates. We observed that costs for Dataflow could be further reduced by using smaller instances such as n1-standard-1. Additionally, Dataflow has no cluster management fee, while Google charges customers $0.10 per hour for managing a Kubernetes cluster; the impact of this fee on total costs decreases with increasing load. Since database writes are the largest cost driver in the stateless application, costs are reduced less than in the stateful application. An in-depth analysis of resource efficiency advantages in Dataflow is beyond the scope of this work, but possible reasons are:
Dataflow might in general offer a better performance than other stream processing engines.
Apache Beam might be optimized for Google Cloud Dataflow and, as shown in previous research , Flink provides much better performance when running native Flink pipelines instead of using Beam.
Flink’s default configuration might not be optimal and additional tuning is required to reach comparable performance.
Resource utilization when running Flink in small Kubernetes clusters might not be optimal.
Processing event streams with Google Cloud Dataflow had significantly lower costs in our experiments compared to our Flink deployment. Thus, serverless stream processing services can be a compelling alternative to running stream processing engines manually in Kubernetes, reducing both operational complexity and costs.
Recently, cloud providers started offering managed Kubernetes services, which charge users per container resource usage instead of for the underlying VM instances. A prominent example of such a service is GKE Autopilot.
As autoscaling of the Kubernetes cluster takes a considerable amount of time, running dedicated experiments with GKE Autopilot is impractical. However, we can get a reasonable cost approximation by using the results of our baseline evaluation, in which we determined the required number of Flink taskmanagers per load profile on a sufficiently dimensioned cluster. Total costs are then the costs for the taskmanagers, combined with the constant costs for other components such as Kafka, HTTP Bridge, or monitoring.
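The approximation can be written down as a short Python function. It assumes that Autopilot bills per pod resource request (vCPU and memory), with the prices, pod sizes, and the number of required taskmanagers as inputs; all names and values are hypothetical, and per-pod minimum charges or other billing details are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodSpec:
    """Resources requested by one pod (hypothetical sizing)."""
    vcpu: float
    memory_gib: float


def pod_hourly_cost(spec: PodSpec, usd_per_vcpu_hour: float, usd_per_gib_hour: float) -> float:
    """Assumed per-pod price: requested vCPU and memory times their hourly rates."""
    return spec.vcpu * usd_per_vcpu_hour + spec.memory_gib * usd_per_gib_hour


def autopilot_estimate(taskmanagers: int, taskmanager: PodSpec, constant_pods: list[PodSpec],
                       usd_per_vcpu_hour: float, usd_per_gib_hour: float) -> float:
    """Hourly cost: load-dependent taskmanager pods plus load-independent pods
    (e.g., Kafka, HTTP bridge, monitoring). `taskmanagers` would come from the
    baseline measurement for the given load profile."""
    variable = taskmanagers * pod_hourly_cost(taskmanager, usd_per_vcpu_hour, usd_per_gib_hour)
    constant = sum(pod_hourly_cost(p, usd_per_vcpu_hour, usd_per_gib_hour) for p in constant_pods)
    return variable + constant
```

Because the constant pods are charged regardless of load, their share of the total shrinks as more taskmanagers are required, which is consistent with a relative cost difference that decreases with higher loads.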
Independent of the load profile and the use case, GKE Autopilot has higher costs compared to GKE’s default mode (see Fig. 6). The relative cost difference appears to decrease with higher loads. This can be explained by a minimal cost per container that is charged independent of the actual resource usage. Moreover, the cost difference is less pronounced in the stateless application, where costs are heavily influenced by database writes (see Fig. 7).
While serverless Kubernetes offerings reduce the management burden, they also have higher cloud service costs. Nevertheless, costs for running self-operated DSP engines in a serverless Kubernetes cluster are still lower than for FaaS at medium and high loads.
In our experiments, we have quantitatively evaluated the choice between functions and stream processing for cloud event processing and have explored the impact of choosing cloud providers, endpoints, programming languages, and platforms. We see that the major influences on cost are the rate at which events arrive and the type of application. As shown in Fig. 8, FaaS is the economical choice for applications that manage little to no state and process events at low to medium arrival rates. DSP is better suited for operations that require state, such as windowed aggregation, and for applications that process more events, i.e., on the order of thousands of events per second.
Beyond these factors, we could not observe a considerable impact of other deployment parameters on costs. The choice of a specific messaging paradigm, such as Pub/Sub or HTTP, should thus be based on functional differences rather than on cost. Similarly, the choice of cloud service provider did not influence costs significantly and is more likely to be driven by the specific services a provider offers.
In our benchmarks and guidelines, we consider solely the costs incurred by cloud resources for different deployments of our applications. In particular, we did not attempt to quantify the “human resource” costs of implementing, operating, and maintaining a specific target design. Beyond these two types of cost, other aspects may influence the design of a cloud event processing application. We discuss these perspectives here and derive avenues along which our work could be extended in the future.
In our benchmark experiments, we consider a constant event arrival rate, as our goal is to measure deployment costs at a specific load. In some domains, workloads may instead fluctuate, requiring elasticity from the processing application. This elasticity is handled differently in DSP and FaaS: As functions are stateless and can be scaled horizontally quickly, load peaks can be processed in real time. This briefly increases costs for a FaaS deployment. In DSP, such peaks may be handled by queuing events and processing them once load has decreased. This requires no additional infrastructure and hence incurs no additional costs, as long as sufficient queue capacity exists. Alternatively, infrastructure can be expanded easily by adding more compute nodes to the cluster. Compared to FaaS platforms, however, such horizontal scaling is rather slow and still requires queuing. Depending on the billing scheme of the runtime platform as well as the scale-in strategy, short load spikes can also leave the DSP cluster overprovisioned (and thus overly expensive) for some time after the spike, whereas with FaaS, the provider bears the cost of keeping functions warm after a load spike.
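The queuing behavior described above can be illustrated with a per-second simulation. This is a minimal sketch under hypothetical assumptions: a DSP deployment with a fixed processing capacity and a fixed cost per second, and a FaaS deployment billed per event, with made-up prices and arrival rates.

```python
# Hypothetical prices, NOT measured values.
FAAS_COST_PER_EVENT = 0.5e-6  # $ per processed event
DSP_COST_PER_SECOND = 2e-4    # $ per second for a fixed-size cluster


def simulate_backlog(arrivals, capacity):
    """Queue backlog after each second for a DSP job with fixed capacity (events/s)."""
    backlog, history = 0, []
    for arriving in arrivals:
        backlog += arriving
        backlog -= min(backlog, capacity)  # process as much as capacity allows
        history.append(backlog)
    return history


def fits_in_queue(arrivals, capacity, queue_limit):
    """True if the peak backlog never exceeds the queue capacity."""
    return max(simulate_backlog(arrivals, capacity)) <= queue_limit


def faas_cost(arrivals):
    # Every event is processed immediately by scaled-out function instances.
    return sum(arrivals) * FAAS_COST_PER_EVENT


def dsp_cost(arrivals):
    # A fixed-size cluster costs the same regardless of load.
    return len(arrivals) * DSP_COST_PER_SECOND


flat = [100] * 18
spike = [100] * 5 + [400] * 3 + [100] * 10  # three-second load spike

print("backlog:", simulate_backlog(spike, capacity=200))
print("extra FaaS cost:", faas_cost(spike) - faas_cost(flat))
print("extra DSP cost:", dsp_cost(spike) - dsp_cost(flat))
```

In this toy run, the spike builds up a backlog of 600 events that drains a few seconds after the spike ends without any additional DSP cost, while the FaaS bill grows with the 900 extra events. The price of the DSP approach is the added queuing delay.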
Our experiments show that building a stateful processor with serverless functions leads to high costs from the database accesses needed to persist state. Recently, there have been some proposals to add mechanisms for stateful stream processing to function platforms, e.g., [25, 26, 36]. These approaches typically include a dedicated datastore directly in the FaaS platform, which could reduce access costs. However, public cloud vendors do not offer such services at this time, leaving engineers only the option of dedicated cloud datastores. As an alternative, engineers might use an open source FaaS system, retrofit “sharding by key” into its load balancer, and use local ephemeral storage for state. This would require significant engineering and infrastructure management effort, breaking with the concept of “serverless” platforms.
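The “sharding by key” idea can be sketched as a load balancer that routes all events with the same key to the same function instance, so that each instance can keep that key’s state locally instead of in a remote datastore. The classes `KeyShardingBalancer` and `Instance` are hypothetical and not part of any existing FaaS system; `zlib.crc32` is used because, unlike Python’s built-in `hash()` for strings, it is stable across processes.

```python
import zlib
from collections import defaultdict


class Instance:
    """Hypothetical function instance keeping per-key state in local memory."""

    def __init__(self):
        self.sums = defaultdict(float)

    def handle(self, key, value):
        # Stateful aggregation without a remote datastore access.
        self.sums[key] += value
        return self.sums[key]


class KeyShardingBalancer:
    """Hypothetical load balancer that pins each key to one instance."""

    def __init__(self, num_instances):
        self.instances = [Instance() for _ in range(num_instances)]

    def route(self, key):
        # crc32 is deterministic across processes, unlike hash() for str.
        return self.instances[zlib.crc32(key.encode("utf-8")) % len(self.instances)]

    def dispatch(self, key, value):
        return self.route(key).handle(key, value)


balancer = KeyShardingBalancer(num_instances=4)
balancer.dispatch("sensor-1", 1.0)
print(balancer.dispatch("sensor-1", 2.5))  # prints 3.5
```

With this modulo scheme, changing the number of instances remaps most keys, so state would have to be migrated on every scaling step; consistent hashing reduces, but does not eliminate, this problem. This is part of the engineering effort mentioned above.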
In addition to deployment costs, there are “hidden” costs to building cloud applications with managed services such as Kubernetes engines or FaaS platforms: Lock-in effects increase the effort required to move between cloud vendors. Such effects could also influence the decision between DSP and FaaS as paradigms for a cloud event processing application: We were able to move our Apache Flink benchmark implementation from Google Kubernetes Engine to AWS EKS with little effort (Section 4.4), as both platforms understand similar Kubernetes application descriptions. Porting our implementation from Google Cloud Functions to AWS Lambda (Section 4.3), however, required rewriting the highly platform-specific function implementation almost completely.
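One way to limit the code that has to change when switching providers is to keep the processing logic platform-independent and confine the provider-specific parts to thin entry points. The sketch below assumes an SQS-triggered AWS Lambda function in Python (each record carries the message in `body`) and a Pub/Sub-triggered 1st-gen Google Cloud Function (the message arrives base64-encoded in `event["data"]`); the `process` function and its unit conversion are hypothetical.

```python
import base64
import json


def process(reading):
    """Platform-independent logic (hypothetical): convert a watt reading to kilowatts."""
    return {"sensor": reading["sensor"], "value_kw": reading["value_w"] / 1000}


def lambda_handler(event, context):
    """AWS Lambda entry point for an SQS trigger: messages are in Records[*].body."""
    return [process(json.loads(record["body"])) for record in event["Records"]]


def pubsub_handler(event, context):
    """Cloud Functions (1st gen) Pub/Sub entry point: payload is base64 in event["data"]."""
    reading = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    return process(reading)


# Local usage with hand-built events (context objects are not needed here).
message = {"sensor": "s1", "value_w": 1500}
print(lambda_handler({"Records": [{"body": json.dumps(message)}]}, None))
print(pubsub_handler({"data": base64.b64encode(json.dumps(message).encode("utf-8"))}, None))
```

The entry points, event formats, deployment descriptors, and trigger configuration remain provider-specific, so such a split reduces the porting effort but does not remove lock-in.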
A further factor beyond the scope of our benchmark is the influence of different service level agreements (SLAs) and service level objectives (SLOs) on the true cost of an application deployment. For self-managed streaming applications in Kubernetes, the cloud provider guarantees only very basic SLAs, such as the availability of compute instances. Application-level SLOs, such as a maximum latency, must be monitored and managed by the operator. As FaaS platforms are fully managed by the provider, they may provide further guarantees on availability.
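Monitoring an application-level latency SLO, as the operator of a self-managed deployment would have to do, can in its simplest form mean checking a percentile over a window of measured end-to-end latencies. A minimal sketch, assuming a nearest-rank percentile and a hypothetical 500 ms p99 target; the function names are illustrative only.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples or not 1 <= p <= 100:
        raise ValueError("need at least one sample and 1 <= p <= 100")
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100) in integer arithmetic
    return ordered[rank - 1]


def latency_slo_met(latencies_ms, p=99, threshold_ms=500):
    """True if the p-th percentile latency of the window stays within the threshold."""
    return percentile(latencies_ms, p) <= threshold_ms


window = [120, 95, 310, 480, 150, 700, 130, 110, 90, 105]  # hypothetical latencies (ms)
print(latency_slo_met(window))
```

Integer arithmetic for the rank avoids floating-point rounding at exact percentile boundaries. In practice, an operator would also have to decide how to react to a violation, e.g., by scaling out, which is exactly the management burden described above.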
Finding a cost-optimal configuration (e.g., machine type, cluster size, or stream processing engine settings) for a self-operated DSP deployment is a complex task, especially in comparison to FaaS. This is even more important when comparing managed stream processing services against self-operated ones and may also explain why we found Cloud Dataflow to be significantly less expensive than running Apache Flink. We cannot rule out that Apache Flink could be tuned for better performance to achieve similar or better cost efficiency than FaaS for low event rates, or than Google Cloud Dataflow in general. However, such performance tuning requires expert knowledge or extensive benchmarking.
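The benchmarking-based search mentioned above can be sketched as an exhaustive search over machine types and cluster sizes that keeps the cheapest configuration sustaining the required event rate. The machine types, prices, and per-node throughputs are hypothetical, and the sketch assumes that throughput scales linearly with the number of nodes, which real DSP engines do not guarantee; engine settings such as parallelism or memory would add further dimensions.

```python
from itertools import product

# Hypothetical machine types: (price per node-hour, benchmarked events/s per node).
MACHINE_TYPES = {
    "small": (0.05, 800),
    "medium": (0.10, 1_800),
    "large": (0.20, 3_400),
}


def cheapest_config(required_rate, max_nodes=16):
    """Cheapest (machine type, node count, hourly cost) that sustains required_rate, or None."""
    best = None
    for (name, (price, per_node)), nodes in product(MACHINE_TYPES.items(), range(1, max_nodes + 1)):
        # Assumes throughput scales linearly with the node count.
        if nodes * per_node < required_rate:
            continue
        cost = nodes * price
        if best is None or cost < best[2]:
            best = (name, nodes, cost)
    return best


print(cheapest_config(5_000))
```

Each entry in `MACHINE_TYPES` stands for a benchmark run, which is where the effort lies: the search itself is trivial, but obtaining trustworthy per-configuration throughput numbers is not.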
Although including a cost model in cloud benchmarking studies is considered good scientific practice, existing benchmarking studies on FaaS [38, 39, 40, 24, 41] and DSP [42, 43, 4] mainly include cost evaluations for cloud functions, where the pay-per-execution pricing model represents a significant paradigm shift.
LIBRA is an approach that offloads FaaS function invocations to self-managed function infrastructure to leverage economies of scale and decrease costs for FaaS applications. Conversely, SplitServe offloads latency-sensitive Apache Spark jobs to a FaaS platform to handle unexpected spikes in demand. Chadha et al. present a comprehensive evaluation of the impact of runtime, region, and processor architecture on the performance and cost of compute-intensive functions on Google Cloud Functions. Similarly, Eivy gives an overview and discussion of cloud FaaS pricing, and Cordingly et al. introduce SAAF, a cost and performance predictor for serverless functions. In the context of DSP, Truong et al. present a resource provisioning strategy that optimizes costs for cloud data processing, and Bedini et al. show an approach to modeling the performance of the Apache Storm stream processing engine. To the best of our knowledge, no existing work has compared FaaS and DSP implementations of the same application.
Copik et al. evaluate how Infrastructure-as-a-Service (IaaS) costs relate to FaaS costs, finding that IaaS provides better performance at lower cost if high utilization can be achieved. Similarly, Müller et al. compare the costs of Query-as-a-Service systems with FaaS costs and show that cold data can be queried significantly more cheaply with FaaS.
Previous research comparing different stream processing systems focuses on self-operated, open source systems such as Apache Storm, Apache Flink, and Apache Spark and does not include cloud services for DSP [52, 53, 54, 4]. Akidau et al. present a performance comparison of Apache Flink and Google Cloud Dataflow on GCP. These evaluations, however, do not focus on cloud infrastructure costs.
In previous work, we have considered the choice between functions, stream processing, and batch processing for IoT data and event processing in the fog from a qualitative perspective and derived a set of best practices. Focusing on cloud event processing, this paper extends that work with a quantitative evaluation of the cost dimension.
In this paper, we have presented a cost perspective on cloud event processing. We have introduced a novel application-centric cost benchmark with workflows from an IIoT context that include both a stateless and a stateful job graph. Further, we have used this benchmark to compare distributed stream processing and Function-as-a-Service, today’s most popular cloud event processing paradigms, and have explored the parameter space to evaluate which factors influence the cost of operating event processing applications in the cloud. Based on these findings, we have derived guidelines for designing such applications.
Partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 415899119. This material is based upon work supported by the Google Cloud Research Credits program under awards GCP209186206 and GCP203304083.