Serverless computing is a new paradigm in cloud computing that allows developers to develop and deploy applications on cloud platforms without having to manage any underlying infrastructure, e.g., load balancing, auto-scaling, and operational monitoring (Carreira et al., 2018; de Lara et al., 2016; Malawski et al., 2020; Jonas et al., 2017; Chard et al., 2017; Bila et al., 2017; Fouladi et al., 2017). Due to its significant advantages, serverless computing has become an increasingly hot topic in both academia (Akkus et al., 2018; McGrath and Brenner, 2017; Jonas et al., 2019) and industry (Shahrad et al., 2020; 4; 36; 19); its market growth is expected to exceed $8 billion per year by 2021 (17). In serverless computing, developers prototype an event-driven application as a set of interdependent functions (called serverless functions), each of which performs a single logical task (Datta et al., 2020). To facilitate coordination among these functions, major cloud providers have in recent years rolled out serverless workflow services (e.g., AWS Step Functions (5)), which aim to orchestrate serverless functions reliably. Specifically, a serverless workflow allows developers to specify only the execution logic among functions, without having to implement this logic through complex nested function calls (Akkus et al., 2018).
With the help of serverless workflows, complex application scenarios (e.g., data processing pipelines and machine learning pipelines) can be accomplished more efficiently (37). Given the surging interest in serverless computing and its increasing dependence on serverless workflows, characterizing existing serverless workflow services is of great significance. On the one hand, it can help developers understand the pros and cons of these services and thus choose among them according to their application scenarios. On the other hand, it can provide insightful implications for cloud providers to improve these services in a more targeted manner. However, to the best of our knowledge, the characteristics of these serverless workflow services have not been systematically analyzed.
To fill this knowledge gap, this paper presents the first empirical study characterizing and comparing existing serverless workflow services. Specifically, we focus our analysis on four mainstream serverless workflow services: AWS Step Functions (ASF), Azure Durable Functions (ADF) (42), Alibaba Serverless Workflow (ASW) (3), and Google Cloud Composer (GCC) (18). We first review their official documentation and compare them in terms of six dimensions, including orchestration way, data payload limit, parallelism support, etc. Then, we compare the performance of the four services (including execution time of functions, execution time of workflows, and orchestration overhead time of workflows) under varied experimental settings. The comparison is performed in two representative application scenarios, i.e., sequence applications and parallel applications, which refer to applications that can be prototyped as multiple functions executing in sequence and in parallel, respectively; they are represented as sequence workflows and parallel workflows, respectively. More specifically, we focus on the following three aspects, on which we provide insights for developers and cloud providers:
The effect of activity complexity. We first compare the performance of the selected serverless workflow services under various levels of activity complexity (Cardoso, 2006) (i.e., the number of serverless functions contained in a workflow). We find that the execution time of workflows, the execution time of functions, and the orchestration overhead time of workflows all become longer for ASF, ADF, ASW, and GCC as activity complexity increases, in both sequence and parallel workflows, except that the orchestration overhead time of GCC fluctuates in parallel workflows. Additionally, in sequence workflows, the execution time of workflows in ASF, ADF, and ASW is dominated by the execution time of functions, whereas in GCC it is dominated by the orchestration overhead. In parallel workflows, the execution time of functions gradually increases as more functions are included, and it determines the trend of the execution time of workflows.
The effect of data-flow complexity. We then compare the performance of the four serverless workflow services under different levels of data-flow complexity (Cardoso, 2006) (i.e., the size of data payloads passed among functions). We find that data-flow complexity noticeably affects the performance of ASF, ADF, and ASW only at high levels, in both sequence and parallel workflows, whereas the performance of GCC depends on whether a payload is present at all.
The effect of function complexity. We also compare the performance of these serverless workflow services under different levels of function complexity (i.e., the specified duration of serverless functions). We find that the execution time of workflows and the execution time of functions become longer for ASF, ADF, and ASW as function complexity increases, in both sequence and parallel workflows, whereas there is no obvious impact on GCC. Besides, the orchestration overhead of workflows is less affected by function complexity.
Based on these findings, we draw insightful implications for developers and cloud providers. The specific findings and implications are shown in Table 1. As an additional contribution to the research community, we also offer the source code used in this study (to be made public later) so that other researchers can replicate and build upon our work.
2. Feature Comparison of Serverless Workflow Services
We first select four mainstream serverless workflow services from public cloud platforms: AWS Step Functions (ASF) (released December 1, 2016), Azure Durable Functions (ADF) (released May 7, 2018), Alibaba Serverless Workflow (ASW) (released July 2019), and Google Cloud Composer (GCC) (released May 1, 2018). These services have relatively mature application practices and are more standardized than those of private cloud platforms. Then, by reviewing their official documentation, we compare the features of these serverless workflow services in the following dimensions:
Orchestration way: the workflow definition model and model definition language of serverless workflow services.
Data payload limit: the size constraint of data payloads transmitted among serverless functions of a serverless workflow.
Parallelism support: whether serverless workflow services support executing multiple serverless functions in parallel.
Execution time limit: the maximum execution time of workflows supported by serverless workflow services.
Reusability: whether a serverless workflow can be used as part (called a sub-workflow) of another serverless workflow.
Supported development language: the supported development languages for serverless workflow services.
Table 2 shows the results of the feature comparison as of Sep. 2020. The features in each dimension are explained as follows:
Orchestration way. We explain the orchestration way of serverless workflow services from two perspectives. From the perspective of the workflow definition model, each serverless workflow service uses its own workflow model. Specifically, ASF and ASW are based on a State Machine (a collection of states, the relationships among the states, and their inputs and outputs) and a Flow (which defines the business logic description and the general information required to execute the process), respectively, whereas ADF uses a new type of function called Orchestrator Functions (the heart of ADF, which describe the order in which actions are executed), and GCC defines workflows by creating a Directed Acyclic Graph (DAG, a collection of tasks organized to reflect their directional interdependencies). From the perspective of the model definition language, developers write the workflow models of ASF and ASW in JSON, whereas ADF and GCC require procedural code. Specifically, the workflow models of ASF and ASW are defined in the Amazon States Language (https://states-language.net/spec.html) and the Flow Definition Language (https://help.aliyun.com/document_detail/122492.html?spm=a2c4g.11186623.6.575.fecf52c2lWEbbq, in Chinese), respectively, both of which use the JSON format. In ADF, by contrast, developers design and orchestrate serverless functions in the code of the orchestrator function, whereas the DAG of GCC is written as a Python script.
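To make the JSON-based definition style concrete, the following is a minimal illustrative sketch (not necessarily the definitions used in this study) that generates Amazon States Language definitions for the two workflow shapes considered here: a chain of n Task states for sequence workflows, and a single Parallel state with n single-task branches for parallel workflows. The state names (f1 to fn, fanout) and the Lambda ARN are hypothetical placeholders.

```python
import json

# Hypothetical placeholder; substitute the ARN of a deployed sleep function.
SLEEP_FN_ARN = "arn:aws:lambda:us-east-1:123456789012:function:sleep-fn"


def task_state(resource: str) -> dict:
    """A Task state that invokes a single serverless function."""
    return {"Type": "Task", "Resource": resource}


def sequence_workflow(n: int, resource: str = SLEEP_FN_ARN) -> dict:
    """Chain n Task states f1 -> f2 -> ... -> fn; the last one ends the execution."""
    states = {}
    for i in range(1, n + 1):
        state = task_state(resource)
        if i < n:
            state["Next"] = f"f{i + 1}"
        else:
            state["End"] = True
        states[f"f{i}"] = state
    return {"StartAt": "f1", "States": states}


def parallel_workflow(n: int, resource: str = SLEEP_FN_ARN) -> dict:
    """A single Parallel state whose n branches each run one Task state."""
    branches = [
        {"StartAt": f"f{i}", "States": {f"f{i}": {**task_state(resource), "End": True}}}
        for i in range(1, n + 1)
    ]
    return {
        "StartAt": "fanout",
        "States": {"fanout": {"Type": "Parallel", "Branches": branches, "End": True}},
    }


if __name__ == "__main__":
    print(json.dumps(sequence_workflow(3), indent=2))
```

The printed JSON could be supplied as the state machine definition; ADF and GCC would express the same shapes in orchestrator-function code and a Python DAG script, respectively.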
Data payload limit. The official documentation of ASF and ASW explicitly describes the data payload constraints in workflows, whereas that of ADF and GCC does not. Specifically, in ASF, the maximum input or result data size for a task or execution defaults to 256KB (6). For ASW, by contrast, the total size of the input, output, and local variables cannot exceed 32KB. Note that local variables store the outputs of functions, and our experiments verify that their size counts toward this total. Although the documentation of ADF and GCC does not mention data constraints, our later experiments reveal some interesting phenomena. For example, ADF can transmit larger data (i.e., 1024KB) than ASF (i.e., 256KB). For GCC, without the help of external storage, native data is limited to 32KB by the database storage built into the Cloud Composer environment.
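The two documented limits can be expressed as simple pre-flight checks. The sketch below assumes that size is measured as the UTF-8 length of the JSON-serialized value and that KB means 1,024 bytes (consistent with the 262,144-byte ASF validation limit observed in our experiments); the function names are hypothetical and not part of any provider SDK.

```python
import json

ASF_MAX_BYTES = 256 * 1024  # 262,144 bytes: ASF limit on a task/execution input or result
ASW_MAX_BYTES = 32 * 1024   # 32,768 bytes: ASW limit on input + output + local variables


def json_size(value) -> int:
    """Assumed size measure: UTF-8 bytes of the JSON-serialized value."""
    return len(json.dumps(value).encode("utf-8"))


def fits_asf(payload) -> bool:
    """True if a single input or result stays within the ASF limit."""
    return json_size(payload) <= ASF_MAX_BYTES


def fits_asw(step_input, step_output, local_vars) -> bool:
    """True if input, output, and local variables together stay within the ASW limit."""
    total = json_size(step_input) + json_size(step_output) + json_size(local_vars)
    return total <= ASW_MAX_BYTES


if __name__ == "__main__":
    payload = "x" * 16_000
    print(fits_asf(payload))                             # True
    print(fits_asw(payload, payload, {}))                # True: 32,006 bytes
    print(fits_asw(payload, payload, {"out": payload}))  # False: local variables count too
```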
Parallelism support. ASF, ADF, and ASW support the parallel execution of serverless functions. However, the GCC documentation does not explicitly mention a parallel structure. By reviewing the operator relationships in the DAG concept (40), we find that a list relationship can express function parallelism.
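In a Cloud Composer DAG script, Airflow's bitshift syntax such as `start >> [f1, f2, f3] >> end` places the listed tasks directly downstream of the same task, so they may run concurrently. Because real Airflow code needs an Airflow installation, the sketch below uses a hypothetical ToyTask class that only mimics how `>>` records dependencies for single tasks and lists; it illustrates the list relationship and is not Airflow's implementation.

```python
from typing import List, Union


class ToyTask:
    """Hypothetical stand-in for an Airflow operator; it only records >> dependencies."""

    def __init__(self, task_id: str):
        self.task_id = task_id
        self.downstream: List["ToyTask"] = []

    def __rshift__(self, other: Union["ToyTask", List["ToyTask"]]):
        # task >> other  or  task >> [t1, t2, ...]: each target becomes downstream.
        targets = other if isinstance(other, list) else [other]
        self.downstream.extend(targets)
        return other

    def __rrshift__(self, other: List["ToyTask"]):
        # [t1, t2, ...] >> task: lists define no >>, so Python falls back to this method.
        for upstream in other:
            upstream.downstream.append(self)
        return self


if __name__ == "__main__":
    start, end = ToyTask("start"), ToyTask("end")
    fns = [ToyTask(f"f{i}") for i in range(1, 4)]
    start >> fns >> end  # fan out to three parallel functions, then merge
    print([t.task_id for t in start.downstream])  # ['f1', 'f2', 'f3']
```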
Execution time limit. ASF and ASW limit the execution time of workflows, whereas ADF and GCC impose no limit and can execute for a long time. Specifically, when creating a state machine in ASF, developers can choose between two types (39): (i) the Standard type, which can run for up to one year and is ideal for long-running, durable, and auditable workflows; and (ii) the Express type, which can run for up to five minutes and is ideal for high-volume, event-processing workloads. Additionally, ASW supports flow executions of up to one year.
Reusability. ASF, ADF, and ASW can invoke workflows through their own APIs to form nested workflows. When building new workflows, using nested workflows can reduce the complexity of the main workflow. However, GCC cannot integrate and reuse its workflows because it relies on a managed Airflow deployment. Airflow is an approach to workflow management based on a dedicated, long-running workflow-execution engine, which implies that DAGs composed of multiple tasks cannot be treated as serverless functions.
We consider two representative scenarios, i.e., sequence applications and parallel applications, which refer to applications that can be prototyped as multiple functions executing in sequence and in parallel, respectively. They can be represented as sequence workflows and parallel workflows, respectively. To measure and compare the performance of serverless workflow services under varied experimental settings, we follow the methodology outlined in Figure 1.
Step 1: Determine performance metrics. In the first step, we determine the performance metrics of our study. Generally, the time of a workflow execution is spent on executing its functions and on the orchestration overhead among them. We therefore consider the following metrics. (i) totalTime is the total execution time to complete a workflow. (ii) funTime is the actual execution time of the functions contained in a workflow, and it is calculated differently in the two application scenarios: in sequence workflows, funTime is the sum of the actual execution times of all functions, whereas in parallel workflows, it is the interval between the start of the first function execution and the completion of the last function. (iii) overheadTime is the actual overhead time produced in the orchestration process of a workflow; it may include the durations of workflow start, function scheduling, data state transition, parallel branching and merging, etc. Note that funTime and overheadTime sum to totalTime for a workflow. (iv) theooverheadTime is the theoretical overhead time produced in the orchestration process of a parallel workflow. In theory, executing multiple functions in parallel costs only the time of the function with the longest execution time, i.e., the theoretical parallel workflow is a zero-overhead parallel composition. Subtracting this specified function duration from totalTime yields theooverheadTime. By comparing overheadTime with theooverheadTime, we expect to uncover interesting findings.
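The metric definitions above can be summarized in a short sketch. It assumes that the workflow start/end times and each function's start/end times are available in seconds (e.g., exported from each service's execution history); FunctionRun and compute_metrics are hypothetical names, not part of any service API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FunctionRun:
    start: float  # seconds
    end: float    # seconds


def fun_time(runs: List[FunctionRun], parallel: bool) -> float:
    if parallel:
        # From the first function start to the last function completion.
        return max(r.end for r in runs) - min(r.start for r in runs)
    # Sequence: sum of the individual function execution times.
    return sum(r.end - r.start for r in runs)


def compute_metrics(wf_start: float, wf_end: float, runs: List[FunctionRun],
                    parallel: bool, specified_duration: Optional[float] = None) -> Dict[str, float]:
    total = wf_end - wf_start
    fun = fun_time(runs, parallel)
    metrics = {"totalTime": total, "funTime": fun, "overheadTime": total - fun}
    if parallel and specified_duration is not None:
        # A zero-overhead parallel composition would take just the (longest) specified duration.
        metrics["theooverheadTime"] = total - specified_duration
    return metrics


if __name__ == "__main__":
    seq = [FunctionRun(0.1, 1.1), FunctionRun(1.2, 2.2)]
    print(compute_metrics(0.0, 2.3, seq, parallel=False))
```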
Step 2: Set up experiments. Most of our experiments were conducted between June 15 and August 20, 2020. To exclude the cold start of spawning function containers, we keep the serverless functions in workflows warm and thereby avoid undesired startup latency. A common practice is to reuse launched containers by keeping them warm for a period of time: the first call to a serverless function is still cold, but subsequent calls can be served by the warm container. Therefore, each group of experiments is repeated several times to ensure the correctness of the results. We discard the result of the first measurement and keep the remaining results to evaluate the final performance of the serverless workflow services. Specifically, we use the median of the remaining results to compare totalTime, funTime, and overheadTime across the serverless workflow services.
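The measurement protocol of Step 2 can be sketched as follows: drop the first (cold) run and take the median of the warm runs. The number of repetitions is not specified in the text, so the values below are purely illustrative, not measured data, and warm_median is a hypothetical helper name.

```python
import statistics
from typing import Sequence


def warm_median(measurements: Sequence[float]) -> float:
    """Discard the first (cold-start) measurement and return the median of the rest."""
    if len(measurements) < 2:
        raise ValueError("need at least one warm measurement after the cold one")
    return statistics.median(measurements[1:])


if __name__ == "__main__":
    # Illustrative totalTime values in seconds; the first run includes a cold start.
    runs = [9.8, 5.3, 5.1, 5.6, 5.2]
    print(warm_median(runs))  # 5.25
```

The median is less sensitive than the mean to occasional slow runs, which matters when a few executions hit scheduling delays.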
Step 3: Measure effects of activity complexity. Activity complexity describes the number of functions in a workflow (Cardoso, 2006). Considering both sequence and parallel workflows, we configure various numbers of functions in workflows. Because ASW limits the number of parallel branches to 100, we set the number of functions to 2, 5, 10, 20, 40, 80, 100, and 120. In particular, the test with 120 functions is intended to verify the parallel limits of ASF, ADF, and GCC, because their documentation does not state them explicitly. Shahrad et al. (2020) reported that the distribution of average function execution times on Azure Functions (7) is well fitted by a log-normal distribution. We find that the most probable function execution time is about one second, which suggests that many serverless functions complete their work in about one second. Therefore, in the activity complexity experiments, we set all serverless functions to sleep for one second. Step 3 corresponds to experimental setting ES1.
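Setting ES1 can be enumerated programmatically. The sketch below (hypothetical names; not the study's experiment harness) lists every service/scenario/function-count combination, flags the combination beyond ASW's documented 100-branch parallel limit, and computes the theoretical funTime against which measured funTime is later compared: n seconds for n one-second functions in sequence, and one second in parallel.

```python
from typing import List, Tuple

SERVICES = ["ASF", "ADF", "ASW", "GCC"]
SCENARIOS = ["sequence", "parallel"]
FUNCTION_COUNTS = [2, 5, 10, 20, 40, 80, 100, 120]
ASW_MAX_PARALLEL_BRANCHES = 100  # the only documented parallel limit among the four
SLEEP_SECONDS = 1.0


def es1_configs() -> List[Tuple[str, str, int]]:
    """All (service, scenario, number of functions) combinations in ES1."""
    return [(s, sc, n) for s in SERVICES for sc in SCENARIOS for n in FUNCTION_COUNTS]


def within_documented_limits(service: str, scenario: str, n: int) -> bool:
    # n = 120 probes the undocumented parallel limits of ASF, ADF, and GCC.
    return not (service == "ASW" and scenario == "parallel" and n > ASW_MAX_PARALLEL_BRANCHES)


def theoretical_fun_time(scenario: str, n: int, d: float = SLEEP_SECONDS) -> float:
    """n sequential functions take n * d; n parallel functions ideally take d."""
    return n * d if scenario == "sequence" else d


if __name__ == "__main__":
    configs = es1_configs()
    skipped = [c for c in configs if not within_documented_limits(*c)]
    print(len(configs), skipped)  # 64 [('ASW', 'parallel', 120)]
```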
Step 4: Measure effects of data-flow complexity. Data-flow complexity reflects the data payloads used in the pre- and post-conditions of function execution (Cardoso, 2006). Considering both sequence and parallel workflows, we configure various data payloads passed among the functions of workflows. Table 2 shows that the maximum data size between functions in ASW is 32KB (2^15 B). To verify whether ASW can pass a larger data payload, we set data payloads of 0B, B, B, B, B. For most sequence applications (37), we find that about five serverless functions basically fulfill their requirements unless the applications need additional functionality. For example, video processing needs the following steps: (i) video extraction, (ii) slicing, (iii) transcoding, (iv) merging, and (v) post-processing, plus additional functionality (e.g., watermark insertion or information update). Thus, the sequence workflows in our study contain five identical serverless functions, each of which receives a parameter (i.e., the payload), sleeps for one second, and returns this parameter. For parallel applications (e.g., data processing pipelines), however, a large amount of data may need to be processed in parallel, and the number of parallel functions generally depends on the data scale and the expected efficiency, so we cannot determine a representative number of functions for parallel workflows. Since our main purpose is to explore the effects of various payloads on parallel workflows, the comparison remains fair as long as all serverless workflow services use the same number of functions in parallel workflows. In our experiments, parallel workflows also contain five identical serverless functions. Step 4 corresponds to experimental setting ES2.
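The identical functions described above can be sketched in Python. The handler follows the AWS Lambda-style (event, context) signature purely for illustration; the event keys "payload" and "duration" and the make_payload helper are hypothetical, and each platform's real function signature differs.

```python
import time


def make_payload(size_bytes: int) -> str:
    """Hypothetical helper: an ASCII string of exactly size_bytes bytes."""
    return "x" * size_bytes


def sleep_function(event: dict, context=None) -> dict:
    """Receive a payload, sleep for the specified duration, and return the payload.

    The duration is passed along too, so the next function in a sequence
    workflow sleeps for the same time.
    """
    duration = event.get("duration", 1.0)  # seconds; 1s is the ES1/ES2 setting
    time.sleep(duration)
    return {"payload": event.get("payload", ""), "duration": duration}


if __name__ == "__main__":
    out = sleep_function({"payload": make_payload(1024), "duration": 0.05})
    print(len(out["payload"]))  # 1024
```

Varying the "duration" value in the same sketch corresponds to the function complexity setting of Step 5.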
Step 5: Measure effects of function complexity. Function complexity reflects the time required to execute a serverless function. Considering both sequence and parallel workflows with five identical serverless functions, we configure various specified durations of serverless functions. Shahrad et al. (2020) reported that 96% of serverless functions take less than 60 seconds on average. Thus, we set the specified duration of the sleep functions to 50ms, 100ms, 1s, 10s, 20s, 40s, 60s, and 120s, without data payloads. Step 5 corresponds to experimental setting ES3.
In this section, we present and discuss the results under various levels of activity complexity, data-flow complexity, and function complexity in both sequence and parallel application scenarios. All result values obtained from our experiments are in seconds, and the figures are available on our GitHub. We then report a series of findings and implications for developers and cloud providers. For each setting, we first discuss the execution time of workflows (totalTime) and the execution time of functions (funTime), then the orchestration overhead time of workflows (overheadTime), and finally the distributions of the measurement results for these three metrics.
4.1. Activity Complexity (ES1)
Activity complexity reflects the number of serverless functions contained in a workflow.
4.1.1. Sequence application scenario
Figure 2 shows totalTime, funTime, and overheadTime of sequence workflows with various numbers of functions for ASF, ADF, ASW, and GCC. The horizontal axis is the number of functions, and the vertical axis is the duration in seconds. Each bar in Figure 2 consists of the funTime and overheadTime of a workflow with a fixed number of functions, which sum to its totalTime. The value next to each bar indicates the percentage of funTime in totalTime.
For totalTime and funTime in Figure 2, as more serverless functions are added to sequence workflows, both totalTime and funTime increase for ASF, ADF, ASW, and GCC. This is expected: when a sequence workflow contains more functions, funTime inevitably increases, and so does totalTime. For example, five one-second functions should have a totalTime of more than five seconds, ten one-second functions more than ten seconds, and so on. We find that totalTime of ASF, ADF, and ASW basically follows this growth trend and is dominated by funTime. Specifically, the percentage of funTime fluctuates between 96.76% and 98.19% for ASF, between 91.00% and 95.85% for ADF, and between 90.74% and 94.13% for ASW. However, the percentage for GCC is only between 4.12% and 8.60%, indicating that most of the time on GCC is spent on overheadTime rather than funTime. This may be due to the environment setting of GCC itself.
For overheadTime in Figure 2, we find that overheadTime of ASF, ADF, ASW, and GCC gets longer as more functions are added to sequence workflows. Thus, the number of functions contained in sequence workflows will affect the orchestration overhead of workflows.
To compare the performance of ASF, ADF, ASW, and GCC comprehensively, we display the statistical results of all measurements as box plots. Figure 3 compares totalTime under varied numbers of functions for ASF, ADF, ASW, and GCC. We observe that ASF has the lowest and most stable totalTime, whereas GCC is the opposite. Additionally, when sequence workflows contain no more than 40 functions, the overall totalTime of ADF is lower than that of ASW; when the number of functions exceeds 40, however, totalTime of ADF begins to exceed that of ASW. To explore which factors affect totalTime, we examine the distributions of funTime and overheadTime. Figure 4 shows that funTime is longer than the theoretical execution time of functions, where the origin of the Y-axis in each sub-graph is the theoretical execution time of functions, i.e., 2s, 5s, 10s, 20s, 40s, 80s, 100s, and 120s. Figure 4 also shows that ADF has the lowest funTime, followed by ASW, ASF, and finally GCC. The distribution of overheadTime is not displayed due to space limits. We find that, no matter how many functions the sequence workflows contain, ASF usually has the lowest overheadTime, and GCC still has the highest. In particular, the trend of overheadTime for ADF and ASW is basically consistent with the comparison of totalTime in Figure 3. Thus, the differences in totalTime among services in sequence workflows are mainly driven by the orchestration overhead of workflows. Reducing the orchestration overhead is therefore vital for serverless workflows, and cloud providers need to rethink their strategies for workflow start, state transition, and function scheduling.
4.1.2. Parallel application scenario
Figure 5 shows totalTime, funTime, and overheadTime of parallel workflows with varied numbers of functions for ASF, ADF, ASW, and GCC. As more serverless functions are added to parallel workflows, totalTime of ASF, ADF, ASW, and GCC shows an increasing trend. The percentage values (the ratio of funTime to totalTime) show that totalTime of ASF and ADF is mainly spent on funTime: the values range from 85.33 to 99.25 for ASF and from 85.62 to 95.94 for ADF. For ASW, when the number of functions is small, totalTime is also mainly spent on function execution; however, as more functions are added to parallel workflows, the percentage gradually decreases, indicating that overheadTime increases with the number of functions. By contrast, for GCC, the percentage is very low when the number of functions is small, indicating that overheadTime dominates in GCC. As more functions participate in parallel workflows, funTime of GCC gradually increases, possibly because the execution scheduling among many parallel functions is heavy.
In parallel workflows, theoretically, all serverless functions with the same task start and complete at the same time. Thus, subtracting the execution time of a single function from totalTime yields the theoretical orchestration overhead (theooverheadTime) of the workflow. Figure 6 compares overheadTime and theooverheadTime under various numbers of functions contained in parallel workflows for ASF, ADF, ASW, and GCC. It shows that theooverheadTime increases as more functions are added to parallel workflows and that theooverheadTime is much larger than overheadTime. The value next to each bar indicates the percentage by which theooverheadTime exceeds overheadTime. Specifically, this value reaches 12862.54 for ASF, 1832.45 for ADF, 200.91 for ASW, and 4020.73 for GCC. In addition, as the number of parallel functions increases, overheadTime becomes longer for ASF, ADF, and ASW, since processing the branching and merging in parallel workflows takes a certain amount of time. However, overheadTime of GCC fluctuates, possibly because of its environment.
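From the Step 1 definitions, overheadTime = totalTime - funTime and theooverheadTime = totalTime - d, where d is the specified duration of one function, so theooverheadTime - overheadTime = funTime - d: the gap is the time by which the combined span of the parallel functions exceeds a single function's duration. The sketch below computes the plotted percentage under the assumption that it is the relative difference (theooverheadTime - overheadTime) / overheadTime × 100; the exact formula behind Figure 6 is not stated in the text, and the helper names are hypothetical.

```python
def percent_higher(theo_overhead: float, overhead: float) -> float:
    """Assumed reading of Figure 6: how much larger theooverheadTime is, in percent."""
    if overhead <= 0:
        raise ValueError("overheadTime must be positive")
    return (theo_overhead - overhead) / overhead * 100.0


def overhead_gap(fun_time: float, specified_duration: float) -> float:
    """theooverheadTime - overheadTime, which equals funTime - d."""
    return fun_time - specified_duration


if __name__ == "__main__":
    # Illustrative numbers only: overheadTime 0.25s, theooverheadTime 1.0s.
    print(percent_higher(1.0, 0.25))  # 300.0
```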
Figures 7 and 8 compare totalTime and overheadTime, respectively, under various numbers of functions contained in parallel workflows. Figure 7 shows that GCC has the longest totalTime in the parallel experiments. When the number of parallel functions is small (at most 10), totalTime of ASF, ADF, and ASW differs little, although ADF is the lowest. When more functions (between 10 and 100) are executed in parallel, ASW shows an advantage, with lower and more stable totalTime than ASF and ADF. Since ASW limits parallel tasks to 100, it cannot execute more parallel functions. With more than 100 parallel functions, ADF completes workflows in a shorter totalTime. By observing the distribution of funTime (not displayed due to space limits), we find that funTime of ASF, ADF, and ASW follows the same pattern as totalTime in Figure 7, indicating that the trend of totalTime depends on funTime. Figure 6 shows that the number of functions affects overheadTime of parallel workflows, and Figure 8 shows the same characteristic. Moreover, when the number of functions does not exceed 40, ADF has the lowest overheadTime; as the number of functions increases from 40 to 120, ASF exhibits lower overheadTime than ADF, ASW, and GCC. However, the overheadTime values of ASF, ADF, and ASW are relatively small and have little effect on their totalTime. Thus, in practice, totalTime is the main consideration for parallel workflows.
For ES1, see Findings F.1, F.2, F.3, F.4 and Implications I.1, I.2, I.3, I.4 in Table 1.
4.2. Data-flow Complexity (ES2)
Data-flow complexity reflects the size of data payloads passed among serverless functions in a workflow.
4.2.1. Sequence application scenario
Figure 9 shows the performance under data payloads from 0B to B for ASF, ADF, ASW, and GCC. We add extra measurements for each serverless workflow service and observe the following problems. (i) For ASF, the size limit of the data payload in a workflow is 256KB (2^18 B). We measure a data payload of B, for which totalTime, funTime, and overheadTime are 6.599s, 5.683s, and 0.916s, respectively. When we add measurements with data payloads larger than B, a validation error occurs, stating that the value at “input” failed to satisfy the constraint that its length must be less than or equal to 262,144 (i.e., 2^18 B). This error shows that the data payload restriction described in the ASF documentation is consistent with actual usage. (ii) For ADF, the documentation does not mention a data payload size limit. To check whether ADF supports a larger data payload, we measure a data payload of B, for which totalTime, funTime, and overheadTime are 27.613s, 6.688s, and 20.925s, respectively. (iii) ASW has the concept of local variables. When the payload is set to B, the execution fails, which verifies that the total size of the input, output, and local variables of a step in ASW cannot exceed 32KB. To observe the impact of the data payload, we add measurements of a data payload of B to Figure 9. (iv) For GCC, when the data payload is set to B, a “mysql” error occurs stating that the message to be stored is bigger than 65,535 bytes. Examining the environment resources of GCC, we find that Cloud SQL is used to store Airflow metadata to minimize the possibility of data loss. Thus, experiments with a data payload of B cannot be performed due to this storage limitation.
Figure 9 shows that when the data payload is less than or equal to B, totalTime, funTime, and overheadTime of ASF, ADF, and ASW change little. When the data payload is greater than B, totalTime of ASF and ASW increases slightly. However, considering the results of ADF with a data payload of B, totalTime of ADF increases significantly. Thus, we conclude that the data payload has little impact on ASF, ADF, and ASW under low data payload conditions, and only high data payloads have a noticeable impact. We also find that totalTime of ADF with a data payload of B is not much different from totalTime of ASF with a data payload of B, which indicates that ASF is more suitable for high data payloads (between B and B). However, ADF can handle larger data payloads (larger than B) in sequence workflows than ASF, ASW, and GCC. For GCC in Figure 9, we observe that the data payload affects overheadTime much more than funTime, which is consistent with Finding F.2 in Table 1.
To compare the result distributions of the measurements, we draw their respective box plots. Figure 10 compares totalTime under various data payloads in sequence workflows. In the low data payload range ( B), totalTime of ASF, ADF, and ASW differs little, and totalTime of ASF and ADF is lower than that of ASW. In the high data payload range (between B and B), totalTime of ASF is the lowest. Similarly, the earlier results for data payloads between B and B in Figure 9 show that ASF has lower totalTime than ADF. In both the low and high data payload ranges, however, totalTime of GCC is the highest. For the distribution of funTime (not displayed due to space limits), similar to Figure 4, ADF has the lowest funTime, followed by ASW, ASF, and finally GCC. Specifically, when payloads are within B, the values of ASF, ADF, and ASW remain between 5s and 5.6s, while those of GCC are between 20s and 150s. Figure 11 compares overheadTime under various data payloads in sequence workflows. We observe that overheadTime of ASF is the lowest among the four services, and its result distribution is more compact and stable when the data payload is no greater than B. Additionally, the earlier results for data payloads between B and B in Figure 9 also show that ASF has lower overheadTime than ADF. In this situation, ASF is the best choice. However, when the data payload passed among functions grows beyond 256KB, developers who still want to use ASF are advised to store the data in Amazon S3 and pass its Amazon Resource Name (ARN) instead of the raw data.
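The S3-based workaround is commonly known as the claim-check pattern: large data is written to external storage, and only a reference travels through the workflow. The sketch below is a minimal illustration under stated assumptions: InMemoryObjectStore is a hypothetical stand-in for Amazon S3 (a real deployment would use an S3 client, e.g., boto3's put_object and get_object), the bucket name and key layout are made up, and the reference uses the arn:aws:s3:::bucket/key form.

```python
import json
import uuid

ASF_MAX_BYTES = 256 * 1024  # 262,144 bytes


class InMemoryObjectStore:
    """Hypothetical stand-in for an Amazon S3 bucket, used only for illustration."""

    def __init__(self, bucket: str):
        self.bucket = bucket
        self._objects = {}

    def put(self, key: str, body: bytes) -> str:
        self._objects[key] = body
        return f"arn:aws:s3:::{self.bucket}/{key}"

    def get(self, arn: str) -> bytes:
        prefix = f"arn:aws:s3:::{self.bucket}/"
        if not arn.startswith(prefix):
            raise KeyError(arn)
        return self._objects[arn[len(prefix):]]


def to_state_input(data: dict, store: InMemoryObjectStore, limit: int = ASF_MAX_BYTES) -> dict:
    """Pass data inline if it fits; otherwise store it and pass only its ARN."""
    inline = {"inline": data}
    if len(json.dumps(inline).encode("utf-8")) <= limit:
        return inline
    body = json.dumps(data).encode("utf-8")
    return {"ref": store.put(f"payloads/{uuid.uuid4()}.json", body)}


def from_state_input(state_input: dict, store: InMemoryObjectStore) -> dict:
    """Inverse of to_state_input, called inside the receiving function."""
    if "inline" in state_input:
        return state_input["inline"]
    return json.loads(store.get(state_input["ref"]))
```

Small payloads keep the direct path (and its low overheadTime), while only oversized ones pay the extra storage round trip.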
4.2.2. Parallel application scenario
Figure 12 shows the performance under various data payloads in parallel workflows for ASF, ADF, ASW, and GCC. First, we discuss totalTime. totalTime of ASF and ASW is basically unaffected by low data payloads. However, when a data payload of B is passed into a parallel workflow in ASF, the five parallel functions produce a workflow output five times that size, and this large output (larger than B) causes the workflow execution to fail. ASW has a data payload limit of 32KB, and the merged output of parallel functions must also be taken into account. From Figure 12, we observe that totalTime of ADF remains stable when data payloads are within B and increases under high data payloads (> B). We also run the parallel experiments with a data payload of B, for which totalTime is 2,130s, larger than that with a data payload of B. For GCC, totalTime is affected by whether there is a payload: with a payload, totalTime increases, but it shows no regular trend as the payload grows. Then, we discuss funTime and overheadTime. funTime and overheadTime of ASF and ASW change little and are basically stable (within acceptable fluctuations, e.g., 100ms). For ADF, under high data payloads, overheadTime increases greatly while funTime changes little; thus, overheadTime is the main factor driving the change in totalTime of ADF in parallel workflows under high data payloads. For GCC, both funTime and overheadTime increase when payloads are transmitted in parallel workflows. Thus, the data payload noticeably affects ASF, ADF, and ASW only under high payload conditions, whereas GCC is affected by whether there is a payload at all.
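The ASF failure follows from how a Parallel state combines its results: its output is a JSON array with one element per branch, so n branches that each return a payload of size s produce an output of roughly n × s bytes, even when each individual payload is under the limit. The sketch below (hypothetical helper names; size assumed to be the UTF-8 length of the serialized JSON) estimates that aggregated size.

```python
import json

ASF_MAX_BYTES = 256 * 1024  # 262,144 bytes


def parallel_output_size(branch_output, n_branches: int) -> int:
    """UTF-8 size of a Parallel state's output: a JSON array of all branch outputs."""
    return len(json.dumps([branch_output] * n_branches).encode("utf-8"))


def parallel_output_fits(branch_output, n_branches: int, limit: int = ASF_MAX_BYTES) -> bool:
    return parallel_output_size(branch_output, n_branches) <= limit


if __name__ == "__main__":
    payload = "x" * 60_000  # each branch returns about 60KB, well under the limit
    print(parallel_output_fits(payload, 1))  # True
    print(parallel_output_fits(payload, 5))  # False: five branches give about 300KB
```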
The distribution of totalTime is shown in Figure 13. The distributions of funTime and overheadTime are similar to Figure 13 and, for space reasons, are not displayed. When the data payload is small ( B), the totalTime, funTime, and overheadTime of ASF and ADF are low and relatively stable, whereas those of GCC are scattered and volatile. When data payloads are between B and B, ASF has the lowest totalTime, funTime, and overheadTime. However, if developers want to pass a large payload, only ADF, or ASF with external storage, can execute parallel workflows.
For ES2, see Findings F.5, F.6, F.7 and Implications I.5, I.6, I.7 in Table 1.
4.3. Function Complexity (ES3)
Function complexity is reflected by the specified duration time of the serverless functions contained in a workflow.
4.3.1. Sequence application scenario
Figure 14 presents the performance of ASF, ADF, ASW, and GCC under various specified function duration times in sequence workflows. The totalTime of ASF, ADF, and ASW increases as the specified duration time of functions grows, whereas the totalTime of GCC is not affected. Besides, GCC shows no obvious trend in funTime or overheadTime across the specified duration times. The behavior in Figure 14 is consistent with “F.2” in Table 1.
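One way to emulate the function complexity used here is a benchmark function that simply runs for a specified duration. The sketch below is a local, hypothetical approximation: `make_timed_function` and `run_sequence` are illustrative names, and `time.sleep` stands in for real work. It chains such functions as a sequence workflow and measures totalTime and funTime; unlike a real workflow service, it has almost no orchestration overhead.

```python
import time
from typing import Callable

Handler = Callable[[dict], dict]


def make_timed_function(duration_s: float) -> Handler:
    """Hypothetical benchmark function whose only 'work' lasts the specified duration."""
    def handler(event: dict) -> dict:
        time.sleep(duration_s)  # stand-in for the function's specified duration time
        return event
    return handler


def run_sequence(functions: list[Handler], event: dict) -> tuple[dict, float, float]:
    """Run functions in order, feeding each output to the next.

    Returns (final_output, totalTime, funTime) in seconds.
    """
    fun_time = 0.0
    start = time.perf_counter()
    for fn in functions:
        t0 = time.perf_counter()
        event = fn(event)
        fun_time += time.perf_counter() - t0
    total_time = time.perf_counter() - start
    return event, total_time, fun_time


if __name__ == "__main__":
    for d in (0.01, 0.05, 0.1):
        _, total, fun = run_sequence([make_timed_function(d) for _ in range(3)], {})
        print(f"duration={d}s totalTime={total:.3f}s funTime={fun:.3f}s")
```

Increasing `duration_s` raises both totalTime and funTime. Their difference stays small here only because the loop adds negligible overhead; on a managed service, that difference is where the orchestration overhead would appear.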
The specific changes in overheadTime are shown in Table 3. When the number of functions contained in a sequence workflow is fixed, the overheadTime of ASF, ADF, and ASW generally does not increase significantly and stays roughly within a certain range.
For ASF, the range is 0.1s to 0.2s; for ADF, 0.2s to 0.5s; and for ASW, 0.5s to 1.2s. Thus, we conclude that changes within a function may not affect overheadTime, whereas changes in the workflow structure and data payload may have a certain impact on it. However, the overheadTime of GCC fluctuates considerably, ranging from 430s to 900s.
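The conclusion that function-internal changes leave overheadTime unaffected presupposes a way to separate overheadTime from function execution time. Below is a minimal sketch for sequence workflows. It assumes (this is not taken from the paper's tooling) that start and end timestamps are available for the workflow and for each function, that funTime is the sum of the function execution times, and that overheadTime = totalTime - funTime. `FunctionRecord` and `decompose_sequence` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FunctionRecord:
    """Start/end timestamps (seconds) of one function invocation, e.g., from execution logs."""
    name: str
    start: float
    end: float


def decompose_sequence(workflow_start: float, workflow_end: float,
                       records: list[FunctionRecord]) -> dict[str, float]:
    """Split a sequence workflow's totalTime into funTime and overheadTime.

    Assumption: overheadTime is everything outside function execution, i.e.,
    scheduling, state transitions, and data passing between functions.
    """
    total_time = workflow_end - workflow_start
    fun_time = sum(r.end - r.start for r in records)
    return {"totalTime": total_time, "funTime": fun_time,
            "overheadTime": total_time - fun_time}


if __name__ == "__main__":
    records = [FunctionRecord("f1", 0.25, 5.25), FunctionRecord("f2", 5.5, 9.75)]
    print(decompose_sequence(0.0, 10.0, records))
```

Because function bodies appear only in funTime under this definition, lengthening a function's specified duration should, in principle, leave overheadTime unchanged, which matches the ranges reported for ASF, ADF, and ASW.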
For space reasons, the distribution figures of totalTime, funTime, and overheadTime are not displayed. We find that both ASF and ADF have lower totalTime than ASW and GCC, and that the measurements of ASF are more stable than those of ADF. For funTime, similar to Figure 4, ADF has the lowest values, followed by ASW, ASF, and finally GCC. For overheadTime, ASF has the lowest results overall among all serverless workflow services.
4.3.2. Parallel application scenario
When the number of functions in a parallel workflow is fixed and the specified duration time of functions increases, totalTime and funTime are expected to increase. However, the totalTime and funTime of GCC do not increase with the specified duration time of functions in parallel workflows; for space reasons, this figure is not displayed. To observe the changes in overheadTime more intuitively, Figure 15 compares overheadTime and theooverheadTime. We find that theooverheadTime is larger than overheadTime. We also find that neither overheadTime nor theooverheadTime changes significantly for ASF, ADF, ASW, or GCC as the specified duration time of functions grows; both fluctuate within a certain range, below 0.5s for ASF, ADF, and ASW and below 200s for GCC. At the same time, Figure 15 shows that the overheadTime of ASF is relatively stable, whereas ADF, ASW, and GCC show some fluctuation.
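The expectation that totalTime grows with function duration in parallel workflows rests on a simple property. Ideally, the wall-clock time of a parallel stage is bounded below by its longest branch rather than by the sum of the branches. The sketch below illustrates this locally with threads; it is an assumption-laden toy (`branch` and `run_parallel` are hypothetical, and `time.sleep` stands in for work), not a model of how any of the four services schedules branches.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def branch(duration_s: float) -> float:
    """Hypothetical parallel branch that 'works' for duration_s seconds; returns its run time."""
    t0 = time.perf_counter()
    time.sleep(duration_s)
    return time.perf_counter() - t0


def run_parallel(durations: list[float]) -> tuple[float, list[float]]:
    """Run one branch per duration concurrently; return (wall-clock time, per-branch times)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(durations)) as pool:
        branch_times = list(pool.map(branch, durations))
    return time.perf_counter() - start, branch_times


if __name__ == "__main__":
    for d in (0.05, 0.1, 0.2):
        total, _ = run_parallel([d] * 5)
        print(f"5 branches of {d}s -> wall-clock {total:.3f}s (sum would be {5 * d:.2f}s)")
```

With a fixed number of branches, the wall-clock time tracks the longest branch, so it rises with the specified duration; any excess over that bound is the parallel-stage overhead.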
The distributions of totalTime, funTime, and overheadTime are similar and, for space reasons, are not displayed. ADF has the lowest results in terms of totalTime, funTime, and overheadTime, whereas GCC has the highest. In terms of stability, however, ASF is the best. The results of ASW are less stable than those of ADF and contain more outliers.
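The stability and outlier judgments above are read from box plots. As a hedged illustration, the helper below computes the usual box-plot quantities with Python's standard `statistics.quantiles` and flags outliers using the common 1.5 × IQR whisker rule. The quantile method and whisker rule behind the paper's figures are not stated, so both are assumptions here; `box_stats` is a hypothetical name.

```python
import statistics


def box_stats(samples: list[float]) -> dict:
    """Box-plot summary for judging stability: quartiles, IQR, and Tukey-style outliers.

    Requires at least two samples. Outliers lie more than 1.5 * IQR beyond the quartiles.
    """
    q1, median, q3 = statistics.quantiles(samples, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in samples if x < low or x > high]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


if __name__ == "__main__":
    # Hypothetical overheadTime samples (seconds) for one service.
    print(box_stats([0.31, 0.29, 0.33, 0.30, 0.32, 0.95, 0.31]))
```

A smaller IQR and fewer flagged outliers correspond to the "more compact and stable" distributions described in the text.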
For ES3, see Findings F.8, F.9 and Implications I.8, I.9 in Table 1.
To verify our findings, we conduct experiments with two serverless application workloads, i.e., KMeans and MapReduce. We then discuss some limitations of our study.
The KMeans application is implemented as a sequence workflow and clusters point sets in three-dimensional space. First, a serverless function generates 1500 points; we use 1500 rather than 2000 points because the data payload limit of ASW cannot accommodate 2000 points. Second, the centroids are initialized randomly. Because the KMeans algorithm requires the number of clusters K to be given in advance, we adopt the Elbow Method presented by Yuan et al. (Yuan and Yang, 2019) and set K to 8. Next, KMeans clustering is performed on the point set and the centroids. Finally, the clustering result is output and shown.
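Below is a minimal pure-Python sketch of the four KMeans stages. It assumes uniformly random points and Lloyd's algorithm with random initial centroids; the function names and the coordinate range are hypothetical. In the workflow, each stage would run as a separate serverless function, and the point list would be passed between stages as the data payload, which is why the number of points is bounded by ASW's payload limit. The `sse` helper computes the within-cluster sum of squared errors that the Elbow Method plots against K. It is not necessarily the exact procedure of Yuan and Yang (2019).

```python
import math
import random

Point = tuple[float, float, float]


def generate_points(n: int = 1500, seed: int = 0) -> list[Point]:
    """Stage 1: generate n random 3-D points (uniform in [0, 100] per axis, an assumption)."""
    rng = random.Random(seed)
    return [(rng.uniform(0, 100), rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n)]


def init_centroids(points: list[Point], k: int = 8, seed: int = 0) -> list[Point]:
    """Stage 2: choose k distinct input points at random as initial centroids."""
    return random.Random(seed).sample(points, k)


def assign(points: list[Point], centroids: list[Point]) -> list[int]:
    """Label each point with the index of its nearest centroid (Euclidean distance)."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]


def kmeans(points: list[Point], centroids: list[Point],
           max_iter: int = 100) -> tuple[list[Point], list[int]]:
    """Stage 3: Lloyd's algorithm, alternating assignment and mean update until stable."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members))
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def sse(points: list[Point], centroids: list[Point], labels: list[int]) -> float:
    """Within-cluster sum of squared errors; the Elbow Method plots this against K."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


if __name__ == "__main__":
    pts = generate_points()
    final, labels = kmeans(pts, init_centroids(pts))
    # Stage 4: output the clustering result (cluster sizes here).
    print([labels.count(c) for c in range(len(final))])
```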
Figure 16 compares the totalTime, funTime, and overheadTime of the KMeans application for ASF, ADF, ASW, and GCC. ASF has the shortest totalTime and overheadTime, and ADF has the shortest funTime (F.8 in Table 1). This is consistent with implication I.3 in Table 1. We also confirm that the trend of totalTime in sequence workflows is mainly driven by overheadTime (F.3 in Table 1), because funTime is relatively low and stable in this KMeans application. Regarding data-flow complexity, our previous conclusion (I.6 in Table 1) is that, when the data payload is less than , ASF is advisable. In the KMeans application, the data payload size is within , and ASF performs best in terms of totalTime and overheadTime. Figure 17 shows the execution times of the individual functions. Compared with GCC, each function's execution time is lower and more stable for ASF, ADF, and ASW. It can still be concluded that ADF performs best in function execution (F.8 in Table 1).
The MapReduce application is implemented as a parallel workflow, based on the workflow solution example (15). The application generates a batch of data to be processed, where each data value is either  or . It then counts the occurrences of each value using the MapReduce processing model.
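The counting logic of this application can be sketched as a map step per parallel branch followed by a reduce step after the branches join. The Python version below runs the branches as threads. It is not the code of the cited example (15), and because the two data values are not specified here, the test uses the hypothetical values `"a"` and `"b"`.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_count(chunk: list[str]) -> Counter:
    """Map step (one parallel branch): count occurrences within a single chunk."""
    return Counter(chunk)


def reduce_counts(partials: list[Counter]) -> Counter:
    """Reduce step (after the parallel branches join): merge the partial counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def map_reduce_count(data: list[str], branches: int = 4) -> Counter:
    """Split data into at most `branches` chunks, count them concurrently, then merge."""
    if not data:
        return Counter()
    size = -(-len(data) // branches)  # ceiling division so every item lands in a chunk
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=branches) as pool:
        partials = list(pool.map(map_count, chunks))
    return reduce_counts(partials)


if __name__ == "__main__":
    print(map_reduce_count(["a", "b", "a"] * 1000, branches=5))
```

In a workflow service, each `map_count` call would be a parallel branch receiving its chunk as the data payload, so the chunk size, and therefore the payload passed into and out of the parallel state, grows with the input batch.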
I.1 in Table 1 states that ADF is suited to small-scale, activity-intensive parallel workflows. Figure 18 likewise shows that ADF has a relatively short totalTime. However, the totalTime and overheadTime of ASF are more stable than those of ADF. The MapReduce application transmits a certain data payload. For this case, our previous conclusion is that ASF is more suitable when data payloads are less than B in parallel workflows (I.6 in Table 1). Accordingly, ASF shows relatively satisfactory totalTime and overheadTime. For the execution times of individual functions, we also find that ADF performs best, consistent with our previous conclusions (F.8 in Table 1). For space reasons, the distribution figure of individual function execution times is not displayed.
Limitations. We discuss the limitations of our study. (i) Selection of application scenarios. Our study is based on sequence and parallel workflows. It may overlook other complex structures, e.g., choice, and thus miss valuable insights into the structural complexity of workflows. In future work, we plan to extend our study to more diverse workflow structures to obtain further findings. (ii) Experiments on GCC. In our study, the results of GCC fluctuate greatly, which we suspect may be related to its environment settings. To minimize this impact, we repeat the measurements several times. From the perspective of serverless computing, we suspect that the functions executed in the DAGs of GCC are not serverless, i.e., GCC is not designed for orchestrating serverless functions. In October 2020, this assumption was supported when the beta of the Workflows service (43) was released in the serverless computing category of Google Cloud. However, its functionality is still immature, and we look forward to studying it in future work.
6. Related Work
In this section, we summarize related work on serverless computing and serverless workflow.
Serverless computing is a new paradigm of cloud computing. In general, computation offloading for various applications can be accomplished on the cloud (Huang et al., 2016; Chen et al., 2020; Huang et al., 2019) or on mobile clients (Zhang et al., 2017; Huang et al., 2017) using different hardware (such as CPUs and GPUs). In particular, edge computing is an emerging and promising technology dedicated to improving the user experience of today’s interactive applications (Xu et al., 2020a). This also promotes the development of service usage. In this context, service-oriented situational applications have shown great potential for solving immediate, quick roll-out problems (Liu et al., 2014). The concepts and techniques (Liu et al., 2019; Xu et al., 2020b) of service computing are becoming increasingly mature. Thus, Function-as-a-Service has become a popular trend. Major cloud vendors present corresponding strategies, drawing on previous mobile web work on rethinking resource management (Liu et al., 2018a, 2016b, 2016a) and on performance (Ma et al., 2015; Liu et al., 2018b). In serverless computing, cloud platforms manage resources uniformly to ensure scalability and load balancing.
Nowadays, serverless computing has been used in various scenarios, including the Internet of Things and edge computing (de Lara et al., 2016), data processing (Jonas et al., 2017; Chard et al., 2017; Fouladi et al., 2017), scientific workflows (Malawski et al., 2020), and system security (Bila et al., 2017). Authors generally believe that running applications in a serverless architecture is more cost-efficient than using microservices or monoliths. Wang et al. (Wang et al., 2018) conducted the largest measurement study of AWS Lambda, Azure Functions, and Google Cloud Functions, using more than 50,000 function instances to characterize their architectures, performance, and resource management. McGrath et al. (McGrath and Brenner, 2017) performed preliminary measurements on AWS Lambda, Azure Functions, Google Cloud Functions, and IBM OpenWhisk, and found that AWS can achieve better scalability, cold-start latency, and throughput than the other platforms.
In this paper, we present the first empirical study on characterizing and comparing existing serverless workflow services, i.e., AWS Step Functions, Azure Durable Functions, Alibaba Serverless Workflow, and Google Cloud Composer. We first compare their characteristics along six dimensions, e.g., orchestration method, data payload limit, and parallelism support. Then we measure the performance of these services under varied experimental settings (i.e., different levels of activity complexity, data-flow complexity, and function complexity). The results yield findings that can help guide developers and cloud providers; e.g., the performance of serverless workflows is noticeably affected only under high data-flow complexity conditions. Finally, we report a series of findings and implications to further facilitate the current practice of serverless workflow.
- 2018 serverless community survey: huge growth in serverless usage. Note: https://www.serverless.com/blog/2018-serverless-community-survey-huge-growth-usage Retrieved on September 10, 2020. Cited by: §3.
- SAND: towards high-performance serverless computing. In Proceedings of the 2018 USENIX Annual Technical Conference, pp. 923–935. Cited by: §1, §6.
- Aliyun serverless workflow (in Chinese). Note: https://help.aliyun.com/product/113549.html?spm=a2c4g.111866220.127.116.1124f72UVzDSo Retrieved on September 10, 2020. Cited by: §1.
- Amazon. Note: https://aws.amazon.com/?nc2=h_lg Retrieved on September 10, 2020. Cited by: §1.
- AWS step functions documentation. Note: https://docs.aws.amazon.com/step-functions/index.html Retrieved on September 10, 2020. Cited by: §1.
- AWS step functions increases payload size to 256kb. Note: https://aws.amazon.com/about-aws/whats-new/2020/09/aws-step-functions-increases-payload-size-to-256kb/?nc1=h_ls Retrieved on September 10, 2020. Cited by: §2.
- Azure functions. Note: https://azure.microsoft.com/en-us/services/functions/ Retrieved on September 10, 2020. Cited by: §3.
- Leveraging the serverless architecture for securing linux containers. In Proceedings of 37th IEEE International Conference on Distributed Computing Systems Workshops, pp. 401–404. Cited by: §1, §6.
- Approaches to compute workflow complexity. In Proceedings of Role of Business Processes in Service Oriented Architectures (Dagstuhl Seminar Proceedings), Cited by: 1st item, 2nd item, §3, §3.
- A case for serverless machine learning. In Proceedings of Workshop on Systems for ML and Open Source Software at NeurIPS, Vol. 2018. Cited by: §1.
- Ripple: home automation for research data management. In Proceedings of 37th IEEE International Conference on Distributed Computing Systems Workshops, pp. 389–394. Cited by: §1, §6.
- A comprehensive study on challenges in deploying deep learning based software. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 750–762. Cited by: §6.
- Valve: securing function workflows on serverless computing platforms. In Proceedings of the 29th International Conference on World Wide Web, pp. 939–950. Cited by: §1.
- Hierarchical serverless computing for the mobile edge. In Proceedings of IEEE/ACM Symposium on Edge Computing, pp. 109–110. Cited by: §1, §6.
- ETL-dataprocessing using mapreduce. Note: https://github.com/awesome-fnf/ETL-DataProcessing Retrieved on September 10, 2020. Cited by: §5.
- Encoding, fast and slow: low-latency video processing using thousands of tiny threads. In Proceedings of 14th USENIX Symposium on Networked Systems Design and Implementation, pp. 363–376. Cited by: §1, §6.
- Function-as-a-service market by user type (developer-centric and operator-centric), application (web & mobile based, research & academic), service type, deployment model, organization size, industry vertical, and region - global forecast to 2021. Note: https://www.marketsandmarkets.com/Market-Reports/function-as-a-service-market-127202409.html Retrieved on September 10, 2020. Cited by: §1.
- Google cloud composer. Note: https://cloud.google.com/composer?hl=en Retrieved on September 10, 2020. Cited by: §1.
- Google. Note: https://cloud.google.com/ Retrieved on September 10, 2020. Cited by: §1.
- Programming situational mobile web applications with cloud-mobile convergence: an internetware-oriented approach. IEEE Transactions on Services Computing 12 (1), pp. 6–19. Cited by: §6.
- Software-defined infrastructure for decentralized data lifecycle governance: principled design and open challenges. In 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), pp. 1674–1683. Cited by: §6.
- Shuffledog: characterizing and adapting user-perceived latency of android apps. IEEE Transactions on Mobile Computing 16 (10), pp. 2913–2926. Cited by: §6.
- Occupy the cloud: distributed computing for the 99%. In Proceedings of 2017 Symposium on Cloud Computing, pp. 445–451. Cited by: §1, §6.
- Cloud programming simplified: a berkeley view on serverless computing. arXiv preprint arXiv:1902.03383. Cited by: §1.
- ReWAP: reducing redundant transfers for mobile web browsing via app-specific resource packaging. IEEE Transactions on Mobile Computing 16 (9), pp. 2625–2638. Cited by: §6.
- Data-driven composition for service-oriented situational web applications. IEEE Transactions on Services Computing 8 (1), pp. 2–16. Cited by: §6.
- Rethinking resource management in mobile web: measurement, deployment, and runtime. In 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), pp. 1347–1356. Cited by: §6.
- SWAROVsky: optimizing resource loading for mobile web browsing. IEEE Transactions on Mobile Computing 16 (10), pp. 2941–2954. Cited by: §6.
- Decentralized services computing paradigm for blockchain-based data governance: programmability, interoperability, and intelligence. IEEE Transactions on Services Computing 13 (2), pp. 343–355. Cited by: §6.
- I-jacob: an internetware-oriented approach to optimizing computation-intensive mobile web browsing. ACM Transactions on Internet Technology 18 (2), pp. 1–23. Cited by: §6.
- Triggerflow: trigger-based orchestration of serverless workflows. In Proceedings of the 14th ACM International Conference on Distributed and Event-based Systems, pp. 3–14. Cited by: §6.
- Comparison of faas orchestration systems. In Proceedings of 2018 IEEE/ACM International Conference on Utility and Cloud Computing Companion, pp. 148–153. Cited by: §6.
- Measurement and analysis of mobile web cache performance. In Proceedings of the 24th International Conference on World Wide Web (WWW), pp. 691–701. Cited by: §6.
- Serverless execution of scientific workflows: experiments with HyperFlow, AWS lambda and google cloud functions. Future Generation Computer Systems 110, pp. 502–514. Cited by: §1, §6.
- Serverless computing: design, implementation, and performance. In Proceedings of 2017 IEEE 37th International Conference on Distributed Computing Systems Workshops, pp. 405–410. Cited by: §1, §6.
- Microsoft. Note: https://azure.microsoft.com/en-us/ Retrieved on September 10, 2020. Cited by: §1.
- Serverless workflow applicable scenarios and best practices (in Chinese). Note: https://developer.aliyun.com/article/751573 Retrieved on September 10, 2020. Cited by: §1, §3.
- Serverless in the wild: characterizing and optimizing the serverless workload at a large cloud provider. In Proceedings of the 2020 USENIX Annual Technical Conference, pp. 205–218. Cited by: §1, §3, §3.
- Standard vs. express workflows. Note: https://docs.aws.amazon.com/step-functions/latest/dg/concepts-standard-vs-express.html Retrieved on September 10, 2020. Cited by: §2.
- The concept of DAG. Note: https://airflow.apache.org/docs/stable/concepts.html Retrieved on September 10, 2020. Cited by: §2.
- Peeking behind the curtains of serverless platforms. In Proceedings of the 2018 USENIX Annual Technical Conference, pp. 133–146. Cited by: §6.
- What are durable functions? Note: https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview?tabs=csharp Retrieved on September 10, 2020. Cited by: §1.
- Workflows documentation on google cloud. Note: https://cloud.google.com/workflows/docs Retrieved on October 07, 2020. Cited by: §5.
- The case for fpga-based edge computing. IEEE Transactions on Mobile Computing. Cited by: §6.
- Approximate query service on autonomous iot cameras. In Proceedings of the 18th International Conference on Mobile Systems, Applications, and Services (MobiSys), pp. 191–205. Cited by: §6.
- Research on k-value selection method of k-means clustering algorithm. J—Multidisciplinary Scientific Journal 2 (2), pp. 226–235. Cited by: §5.
- Enabling accurate and efficient modeling-based cpu power estimation for smartphones. In 2017 IEEE/ACM 25th International Symposium on Quality of Service (IWQoS), pp. 1–10. Cited by: §6.