Harnessing the power of cloud computing can improve the delivery of computing, storage, networking, and applications over the Internet. Cloud computing enables IT enterprises to run multi-tenant applications and databases in large scale without considering upfront costs, updating software, and disposing outdated hardware. These merits drive IT enterprises to exploit either public or private cloud according to the required performance, the scale of their business, duration of resource usage, and compliance with data and application security. Hybrid cloud is a combination of private and pubic clouds to attain ”the best of all possible worlds” in terms of expanding resources, data availability, and data security111Hybrid and Multi-cloud solutions:https://azure.microsoft.com/en-au/solutions/hybrid-cloud-app/ [rimal2009].
However, the centralization of cloud servers introduces delay for time critical processing since a tremendous amount of data should be transferred from data sources to a cloud on a Wide Area Network (WAN). This downside has led to a new paradigm of computing, so-called edge computing, that enables processing and storage of data close to data sources rather than relying on sending data through multiple hops to a cloud and receiving response. Thus, edge computing improves response time and reduces bandwidth consumption for time-critical IoT applications. Nevertheless, edge computing is constrained by energy, networking, storage, and computing resources, which makes it unsuitable to deploy all kinds of applications and databases [wang2017enorm]. Such constraints imply that cloud and edge computing are complementary rather than alternatives to each other.
Therefore, the synergy of cloud and edge computing leads to task offloading
concept which means that a task is transferred from resource-constrained devices to platforms such as cluster, grid and cloud, or generally to more powerful servers. A task may vary from video gaming, Machine Learning (ML) processing, decision analytics, to database operations. The task offloading concept raises the question ofwhen, where and what task should be offloaded222The answer to “when” a task should be transferred is beyond the scope of this paper. Interested readers are refereed to [Jiang2019].?
To answer where a task should be offloaded, we considered three options: edge device, adjacent server, and remote servers. In our work, a laptop and a cluster of Raspberry Pi (RPi) are considered edge devices. A high performance server with a distance of several meters from the edge devices is adjacent server. Hybrid cloud is another option that consists of a combination of remote servers in OpenStack and Microsoft Azure clouds [Mansouri20]. All edge devices, adjacent server, and the hybrid cloud have been connected through WireGuard333WireGuard: https://www.wireguard.com VPN. These three options allow us to investigate offloading tasks under different scenarios in which the richness of resources and the distance between the location of data generators and data processing come into play.
The execution of database workloads on nodes is supposed to be the task in this study. Databases are generally classified into NoSQL and Relational databases[li2013performance]. The former one is designed to be used across large scale distributed infrastructure with different data models, while the latter one is used in small-scale infrastructure with a pre-defined data model. In addition, NoSQL databases aim to be faster and more scalable in comparison to Relational databases. Currently, there are more than 225 commercial and open-source NoSQL databases444NoSQL list: https://hostingdata.co.uk/nosql-database/ which are mostly used in large scale infrastructure to handle big data. Our main criteria to select NoSQL and Relational databases are popularity, usage, and commercialization by the well-known cloud providers - Amazon Web Services (AWS) and Microsoft Azure. Thus, we selected Cassandra, Mongo, Redis, and MySQL. These databases are often evaluated only in terms of execution time and scalability [Rabl2012][Kuhlenkamp2014][Li2013]. Despite being important, the sole use of these metrics is not enough. This is because such evaluation hides several attributes (e.g. energy consumption) that are quite critical in some scenarios such as mobile edge computing. Moreover, the existing studies (e.g., [Mansouri20][Li2014]) investigate the performance of these databases in single node/cloud environment. However, there is a paucity of research aimed at evaluating these databases in an integrated edge-cloud environment, where a database’s workload is offloaded from an edge device to a powerful cloud. Therefore, we investigated the resources cost of these databases as the workloads are offloaded from edge devices to the more powerful computing nodes.
The resources cost consists of energy, bandwidth and storage consumption. Energy consumption is a key cost function in the context of offloading [chen2018multi]. This is due to the fact that edge devices commonly have limited battery life, which depletes quicker with higher consumption of energy. Hence, users need to recharge the device on frequent basis, which directly affects users’ experience. Not only from the edge device perspective, energy efficiency is also important for adjacent servers/cloud. According to [gao2012s], in 2020, 8% of the worldwide electricity was expected to be consumed by data centers. Therefore, in this paper, we study energy consumption with a focus on the energy consumed by CPU, RAM and the rest of the system (i.e., SSD, ports, screen, and so on). Bandwidth consumption is another imperative cost of offloading so that it is accounted one of the reasons behind shifting from edge to cloud computing [yu2017survey]. We define this cost as the amount of bandwidth required to transfer data from edge devices to the more powerful servers. Storage cost, another essential attribute, refers to the data storage consumption of edge node or remote servers on which a task is performed [shi2016edge]. Therefore, the imperative research question we investigate in this work is: How efficient is a database in terms of resource costs (energy, bandwidth, and storage consumption) while performing various operations (e.g., read, write, insert) under different scenarios?. To answer this question, we designed different usage scenarios based on the richness of resources, connection types, and distance between the location of data generators and task processors.
In order to follow the selected criteria for different scenarios, we leveraged different computing nodes: RPi, laptop (termed edge node hereafter), high performance server (termed edge server node henceforth), a cluster of RPi, and a cluster of VMs in the hybrid cloud. We also exploited WiFi and cable connection options between a node that offloads database workloads/operations to a node that processes the database workloads. We consider offloading from resource-constrained nodes to richer computing nodes. Thus, we have six main categories for offloading scenarios: A single RPi (edge node, edge server node, hybrid cloud), edge node (edge server node, hybrid cloud), edge server node (hybrid cloud). In each scenario, the node in the left side of arrow offloads database workloads to the nodes in the right side of arrow. In addition, we considered non-offloaded (local) scenarios in which the node that issues and processes the database operations is the same. For this purpose, we ran databases on a RPi, edge node, and edge server node. We evaluated the indicated scenarios in terms of resource costs with the help of different tools. To measure energy consumption as a part of resource cost, we exploited Intel’s Running Average Power Limit (RAPL) [Khan2018]. This hardware feature allows us to measure energy consumption with reasonable accuracy. We used iperf3555Iperf3: https://iperf.fr, ifconfig666ifconfig: https://linux.die.net/man/8/ifconfig, and iftop777iftop: https://linux.die.net/man/8/iftop network tools to measure the traffic and bandwidth between the database worker/client that runs the YCSB workloads [Cooper2010] and the DB servers that host databases. These tools support bidirectional data transfer measurement of both TCP and UDP traffic. To measure storage consumption, we exploited df (abbreviation for disk free) as a standard Linux tool to monitor the amount of available disk space for file system on which database is deployed. In summary, our contributions are twofold:
We present a modular and extensible edge-cloud framework in which the most prominent distributed databases are installed and clustered
We evaluate the up-to-date resource usage evaluation and comparison between the four most used and modern distributed databases. This leads us to find several insights about the impact of resources richness, communication types and distance on the workloads offloading of distributed databases.
|[Rabl2012]||DDB||Private cloud||Cassandra, HBase, Redis,|
|Voldemort, VoltDB, MySQL||✘||✓(throughput,latency)||✘||✓|
|[Kuhlenkamp2014]||NDB||Public cloud||Cassandra and HBase||✘||✓(throughput,scalability)||✘||✘|
|[Klein2015]||NDB||Public cloud||MongoDB, Cassandra, Riak||✘||✓(throughput vs. consistency)||✘||✘|
|[Li2013]||DDB||Private cloud||MongoDB, RavenDB, CouchDB, MySQL Cassandra,Hypertable, Couchbase||✘||✓(throughput,latency)||✘||✘|
|[Abramova2013]||NSDB||Private cloud||Cassandra, MongoDB||✘||✓(latency)||✘||✘|
|[Mansouri20]||DDB||Hybrid cloud||Cassnadra, MongoDB, Riak,|
|CouchDB, Redis, MySQL||✘||✓(throughput, latency)||✘||✘|
|[Mansouri20dis]||DDB||Hybrid cloud||Cassnadra, MongoDB, Riak,|
|CouchDB, Redis, MySQL||✘||✓(throughput vs. distance)||✘||✘|
|[Bani2016]||DDB||NA||MongoDB, MySQL, PostgresSQL||✓||✓(response time)||✘||✘|
|[Li2014]||DDB,BD||Single node||Cassandra, HBase, Hive, Hadoop||✓||✓(response time)||✘||✘|
|[Hajji2016]||General||RPis||Hadhoop, Spark||✓||✓(CPU usage)||✓||✘|
|[Scolati2019]||General||RPis||Hadhoop, Spark||✘||✓(CPU and RAM usage)||✓||✘|
|Our work||DDB||Hybrid cloud, Edge||MongoDB, Cassandra, Redis, MySQL||✓||✓(Run-time)||✓||✓|
DDB stands for distributed databases and includes both relational and NoSQL databases. NDB stands for NosQL databases and includes only NoSQL databases. BD stands for Big Data.
2 Related Work
To position the novelty of our work with respect to the state-of-the-art, we divided the related studies into the following categories. Table I compares these notable studies.
Performance Evaluation of Distributed Databases on Clouds: With the advent of NoSQL databases, researchers conducted a variety of experimental evaluations and achieved notable results from performance perspective. Rabl et al. [Rabl2012] presented a comprehensive performance evaluation in terms of throughput, latency and disk usage for six modern databases on two different private clusters using the YCSB workloads. Kuhlenkamp et al. [Kuhlenkamp2014] evaluated the correlation between scaling speed and throughput for Cassandra and HBase on different Amazon EC2 infrastructure configurations. Klein et al. [Klein2015] analysed the impact of consistency models (eventual, quorum-based, and strong) of MongoDB, Cassandra and Riak running on a single node and a cluster of nodes at Amazon EC2. In [Li2013], authors investigated the read and write performance. They concluded that not all No-SQL databases have outperformed SQL database. In [Abramova2013], the study compared MongoDB and Cassandra in read and write performance on VMware Player. We recently evaluated the performance of six distributed databases on a hybrid cloud [Mansouri20]. Also, we measured the impact of distance on the performance of distributed databases as the vertical and horizontal scalability of a hybrid cloud are changed [Mansouri20dis]. Differently, this work evaluated the resource utilization of distributed databases in the edge-cloud framework.
Energy Evaluations of Distributed Databases on Cloud: Several studies evaluated distributed databases in terms of energy efficiency, where queries optimization comes into play. Mahajan et al. [Mahajan2017] evaluated the impact of the optimized indexed/non-indexed and simple/complex quires on performance, power and energy efficiency for MongoDB, Cassandra and MySQL in a single and shared server. Bani [Bani2016] presented an empirical study on the impact of cloud applications (i.e., Local Database Proxy, Local Sharding-Based Router, and Priority Message Queue) on the performance and energy consumption of MongoDB, MySQL, and PostgreSQL. Li et al. [Li2014] studied a benchmark of energy consumption of Selection, Grep, Aggregation, and Join operations for Cassandra, Hbase, Hive, and Hadoop on a single node. These studies differ from our study in the infrastructure model, applications, and evaluation metrics.
Performance Evaluation of Distributed Databases in Edge Computing: Some researchers studied the capability of this paradigm in the context of data-intensive applications. In [Morabito2017], the author deployed different models of RPis in edge computing in the form of native (bare metal) and Docker virtualization to evaluate energy, network, disk and RAM consumption under intensive- computing and networking scenarios. Several studies made effort to select or adapt cloud-based distributed databases for edge computing paradigm. Alelaiwi et al. [ALELAIWI2018] explored an analytic hierarchy process to evaluate usability, portability, and supportability of database development tools for IoT databases in the edge computing. Mayer et al. [Mayer2017] tailored distributed data store for fog computing and deployed the MaxiNet network emulator [szymaniak2005] on a server with 8 cores to simulate 6 fog nodes to measure operations latency for Cassandra based on the proposed policy. Lin et al. [Lin2019] presented a protocol to measure CPU and bandwidth usage of read-only and update transactions for PostgresSQL. None of these studies evaluated the energy, bandwidth and storage consumption of distributed databases.
Several studies explored the deployment of big data frameworks on RPi. In [Hajji2016], authors evaluated the performance of HDFS (Hadoop Distributed File System) on a single RPi and a 12-node RPi cluster and demonstrated that overhead for intensive computing workloads is significant. Scolati et al. [Scolati2019] conducted the evaluation of the same frameworks, however, on a containerized cluster of RPis to measure CPU and RAM usage and delineated the limitation of RPis usage for such frameworks. Our work differs from these studies in application and infrastructure types.
Computation Offloading Towards Edge Computing: Researchers recently focused on computation-intensive tasks offloading from mobile devices to the richer computational nodes to extend battery life of a resource-constrained device/node. Authors in [vu2020] provided a joint offloading and resource allocation framework for a hierarchical cooperative fog computing nodes to optimise energy consumption through intensive simulation experiments. Pei et al. [Pei2020] studied energy-efficient resource allocation through latency-sensitive tasks offloading in the hierarchical Mobile Edge Computing (MEC) architecture in heterogeneous networks. Their numerical simulation-based experiments exhibited energy efficiency improvements of the proposed solution. In contrast, Echo framework [Lin2019] accelerates computational tasks and save energy consumption in the layered architecture including mobile and cloud-based server nodes. Kang et al. [Kang2020] developed an effective offloading model through collaboration between edge nodes to avoid overload on a particular node and improve response time. Ghmary et al. [Khan2020] conducted simulation-based experiments for offloading delay-constrained tasks from a single mobile device to MEC resources to save energy consumption and reduce processing time of tasks. Authors in [Duy2019] developed human- and device-driven intelligent algorithms for offloading tasks to reduce energy consumption and latency in the context of edge computing nodes. Wang et al.[Wang2019], however, proposed a new architecture for data synchronization through offloading from cloud computing to improve performance and data security. For more details in this context, interested readers are refereed to [AAZAM2018][MANSOURI2021]. Compared to the above discussed studies, we have empirically investigated energy, bandwidth, and storage consumption for the workload offloading of distributed databases in the edge-cloud context.
3 Design and Implementation of Edge-Cloud Framework
This section discusses the design and implementation of our edge-cloud framework.
3.1 Implementation of Edge-cloud Framework
We designed a hierarchical edge-cloud framework (Fig. 1). The bottom layer is hardware infrastructure that consists of on-premise resources, Microsoft Azure datacenter, RPis, and edge nodes. The network connection layer includes WireGuard, as a modern VPN, that provides network communication across different nodes. We extended these two layers of the framework with three main components (Fig. 2).
The first component is a hybrid cloud in which a private cloud is combined with one or more public clouds. To have a repetitive, safe, and easy procedure to create, change, and improve infrastructure deployment across both clouds, we used Terraform888Terraform: https://www.terraform.io/. Cloud solutions usually provide illusion of unlimited resources, shift the cost burden from capital to operational expenditure, and support high scalability. However, increasing data and real-time analysis requirements have given rise to edge computing to combat the drawbacks of cloud. Thus, as depicted in Fig. 2, we deployed RPis and edge nodes, where their computing and storage resources form a hierarchy with regard to resource richness. To have richer resources in the edge computing, we built a cluster of 8 RPis, where RPis are connected through a Gigabit switch.
To make a network connection across all computing nodes in the edge-cloud framework, we leveraged WireGuard that is faster and more cost efficient in comparison to the VPNs provided by the commercial cloud providers [Mansouri20]. A key metric of network connection strength is the network throughput (measured in transferred and received data per second). We leveraged Iperf3 999Iperf3: https://iperf.fr, a cross-platform networking tool, to measures the throughput between the two end nodes in both directions. We installed this tool on all nodes and ran for 10 minutes to record the throughput between each pair of computing nodes, as labeled on links in Fig. 2. As can be seen, the network connection between VMs in the private cloud achieves the highest throughput of 3.57 Gbits/sec, where this value for VMs in the public cloud holds the second rank and is of 998 Mbits/sec. The reason behind such values is the VMs in the private cloud may reside on the same server, while in the public cloud VMs might be provisioned in different servers or even different racks. In contrast, the lowest network throughput is observed between two VMs across private and public clouds (24.6 Mbits/sec), and the master RPi and the broker VM in the private cloud (34.4 Mbit/sec).
The last layer of the framework is VMs and edge deployment (Fig. 1), where we used Terraform to provision the required VMs in the hybrid cloud. The output of this layer is a set of VM IPs, which enables the deployment of distributed databases across computing nodes to measure the resource utilization discussed in the next section.
|Edge-node-energy-consumption||The energy consumption of edge node/edge server node||RAPL|
|Battery||The energy consumption of battery for edge node||Upower|
|USB-energy-consumption||The energy consumption of master RPi||USB Power Meter|
|Power-socket-energy-consumption||The energy consumption of RPis cluster||Energy Cost Meter|
|Bandwidth consumption||The transferred data between all computing nodes||iftop utility command|
|Storage consumption||The required storage to run the YCSB Worklaod||df utility command|
3.2 Implementation of Resources Consumption Probes
We discus the following resource consumption probes as summarized in Table II.
Energy consumption probes: These probes are implemented through both software and hardware tools, which depend on the facilities provided by the computing nodes. For the edge node and edge server node, we provided an edge-node-energy-consumption-probe that leverages energy Running Average Power Limit (RAPL) to measure the energy consumption of CPU and RAM [Khan2018]. For the edge node, we implemented a battery-probe, which exploits Upower101010Upower: https://www.commandlinux.com/man-page/man1/upower.1.html command to measure the battery depletion of the edge node. Based on these two probes, we measured the energy consumed by the rest of the system (i.e., storage, ports, screen, etc.) in the edge node. For the master RPi, we implemented an USB-energy-consumption-probe in which the energy consumption of the master RPi is recorded with the help of USB Power Meter (UPM) – WEB U2 model. UPM can provide voltage readings down to 0.01V and current to 0.001A, which can be either displayed on the built-in LCD or recorded in a file. We exploited Energy Cost Meter (ECM) to implement power-socket-energy-consumption-probe to measure the energy of the whole cluster of RPis. ECM measures voltage and current range of 200-276V AC and 0.01-10A, respectively. For the virtualisation resources in the hybrid cloud, we did not provide energy measurement probes for two main reasons. (i) The depletion of energy resources of the edge computing nodes is crucial in the context of edge computing. (ii) It is almost impossible to measure energy consumption of a server for individual task in a cloud since each server provides multi-tenant services.
Bandwidth consumption probe: This probe captures the amount of data transferred and received between nodes. This measurement is implemented through iftop, which monitors the ingress and egress bandwidth of a network interface. This service is termed bandwidth-consumption-probe and we activate it on the network interfaces of computing nodes issuing and receiving operations.
Storage consumption probe: This probe measures the consumed storage during operations execution against a particular database. We used the standard df to implement the storage-consumption-probe. We activate this service during experiments on the disk hosting the database.
4 Distributed Databases and Workloads
The deployment of distributed databases includes database server (DBS) and worker, where the former hosts the database engine and data and the latter one executes operations against database servers.
4.1 NoSQL and Relational Databases
NoSQL databases are designed to be exploited across large distributed systems. These databases are significantly more scalable and faster in handling very large data loads than traditional relational databases [Davoudian2018]. Unlike relational databases, NoSQL databases allow for querying and storing loosely-structured data. We deployed MongoDB and Cassandra as the document-based and the key-value NoSQL databases respectively. For relational data model, we selected MySQL as the most-used in the industry sector. In addition to these disk-based databases, we chose Redis as a general purpose and in memory database. It is worth noting that all these selected databases are supported by the well-known cloud providers (i.e., Azure, AWS, and Google). The interested readers are referred to [Mansouri2017a][Han2011].
|Workload A||50% Read + 50% Update||Write-intensive|
|Workload B||95% Read + 5% Update||Read-intensive|
|Workload C||100% Read||Read-only|
|Workload D||95% Read + 5% Insert||Read-latest|
|Workload E||95% Scan + 5% Insert||Scan|
|Workload F||50% Read + 50% update||RMW|
RMW stands for read-modify-write
We used YCSB as a well-known benchmark workload to evaluate both NoSQL and relational databases. The YCSB workload facilitates a set of tunable parameters and acts on a loose schema including a string key assigned to a collection of fields, which themselves are string to binary blob key-value pairs. The YCSB Workload consists of elementary operations such as read, write, insert for a record based on a single key. YCSB also supports a complicated “scan” operation, which refers to a paging operation starting from a particular key. Due to these advantages, our experiments targeted 6 core workloads as summarised in Table III.
The YCSB workload provides a set of configuration parameters. We used the default values except for two parameters: the number of records and operations. We adjusted them based on our hardware infrastructure support. For RPis and the edge node, we setup 10K records, while for edge server node, which is more powerful, we set this parameter to a value of 10M records. Nevertheless, we used a variable value for the number of operations in each workload for RPi, edge node, edge server node. The reason behind such setting is that the information about battery depletion of the edge node is updated every two minutes. If we set the number of operations with a small value, then the implemented battery-depletion-probe might record zero energy consumption. This implies that the workload runs out before updating data regarding to battery depletion. To avoid such issue, we initially ran the workload for 10K operations and then the number of operations was calculated as throughput achieved for 10K operations multiplied by 1200 second (20 minutes). This duration time of 20 minutes for running the YCSB workload gives a good enough precision with respect to the battery depletion information.
5 Performance Evaluation
In this section, we describe the setup of our edge-cloud framework and delineate our experimental results.
|Private VM||1-8||2||4 GiB||40 GiB|
|Public VM||0-7||1||2 GiB||30 GiB|
|Server edge node||1||8||16 GiB||1 TB|
|Edge node||1||4||8 GiB||250 GiB|
|RPi||7||4||1 GiB||16 GiB|
5.1 Testbed Setup
The edge-cloud framework consists of the following computing components as summarized in Table IV.
Hybrid cloud: We built the hybrid cloud on the on-premises infrastructure virtualized through OpenStack at Adelaide University and Azure datacenter in Sydney region [Mansouri20]. We exploited clusters of VMs in the hybrid cloud with a size of (), where and are the number of nodes on the private and public clouds, respectively. Based on the cluster size, we considered 3 combinations of the hybrid cloud configuration settings: , , . This allowed us to evaluate the hybrid cloud when (a) most nodes sit on either the private or public cloud, or (b) nodes are equally distributed in each cloud side. Each VM in the private cloud has 2 VCPUs, 4 GiB RAM, and 40 GiB HDD, and the size of each VM in the public cloud is Standard B1m (1 vCPU, 2 GiB, 30 GiB HDD).
RPis cluster: We built a homogeneous cluster of 8 RPis 3 Model B+, where each RPi is equipped with a Quad core CPU, 1 GiB RAM, and 16 GiB microSD storage.111111Please note that a single VM on both private and public clouds possesses CPU with the less core compared to the RPi, but a cluster of VMS provides more CPU cores. The impact of horizontal and vertical scalability of VMs on the energy consumption remains as a future work.
Edge node: We deployed two different edge nodes from the perspective of resource richness. (i) a laptop, refereed as edge node, has a Quad core CPU, 16 GiB RAM and 256 GiB SDD. (ii) The high performance edge server, refereed as edge server node, provides 8-cores CPU, 32 GiB RAM, 1 TB SSD.
|Scenario#||Database worker||Database servers||Concept|
|Scenario 2||RPi||Edge node (C)||Offloading|
|Scenario 3||RPi||Edge node (W)||Offloading|
|Scenario 4||RPi||Edge server node (C)||Offloading|
|Scenario 5||RPi||Edge server node (W)||Offloading|
|Scenario 6||RPi||Hybrid cloud||Offloading|
|Scenario 7||Edge node||Edge node||Non-offloading|
|Scenario 8||Edge node||Edge server node (C)||Offloading|
|Scenario 9||Edge node||Edge server node (W)||Offloading|
|Scenario 10||Edge node||Hybrid cloud||Offloading|
|Scenario 11||Edge server node||Edge server node||Non-offloading|
|Scenario 12||Edge server node||Hybrid cloud||Offloading|
|Scenario 13||RPi||Cluster of RPis||Offloading|
C stands for a cable connection.
W stands for a WiFi connection.
Experimental scenarios: As summarized in Table V, we considered three types of worker and four types of database servers. We generally offloaded data from the resource-constrained to the more powerful computing resources (the database servers). This concept of offloading includes all scenarios except scenarios 1, 7 and 11 in which the worker and the server are the same computing node. Such scenarios, termed non-offloading (local) scenarios, provide more insight into the databases in terms of energy consumption when databases are utilised locally. Furthermore, we considered different connection types for the RPi and edge node that give us insight into the effectiveness of databases from energy consumption perspective as a faster connection (cable vs. WiFi) is used (Scenarios 2, 3, 4, 5, 8, and 9). For simplicity of presentation, a scenario of (A B (C/W)) indicates that A is the worker and B is the database server, and connection between them is either Cable or WiFi.
To evaluate the experimental scenarios, we implemented them in a modular approach. As illustrated in Fig. 3, we have three components: controller node, worker node, and database server nodes. The controller node initially receives IPs of computing nodes as an input and then runs installation and cluster configuration of databases across those database servers if needed. In the same time, the controller node communicates with the worker node to setup the probes and runs the YCSB workload. Once the database operations are sent to the DB server nodes, all resource consumption probes are activated to record the consumed energy, bandwidth and storage of the worker and server(s). It should be noted that uploading the probes consumes energy, and we thus exclude it in the experimental results. In addition, we ran all scenarios without running YCSB on the worker node for 20 minutes and measured only the idle energy consumption. Then, this idle energy consumption is subtracted from the one for the corresponding scenario in which the YCSB workload was run.
5.2 Experimental Results
This sections explains energy, bandwidth and storage consumption for the scenarios listed in Table V.
5.2.1 Energy Consumption
This set of experiments investigates the energy consumption (measured in Joules per Million Operations (J/MOPs)) of different databases for different scenarios 121212It is worth mentioning that there is a direct correlation between energy consumption and database throughput in all experiments. Though, we did not plot here due to space constraint. .
(A) The energy consumption of a single RPi (Scenarios 1-6).
Fig. 4 shows the energy consumption of Cassandra. In the case of (RPi RPi), the energy consumption is about 5000 J/MOPs for workloads (A, C and D) and about 2 and 2.5 times of this value for workloads B and F, respectively. As we move to (RPi edge node (C)), the energy consumption for workloads A, B and F respectively reduces by 28%, 46%, and 78% compared to the ones for (RPi RPi). In contrast, in the same scenario with the WiFi (W) connection, the energy consumption increases by 190%-363% for all workloads compared to the ones for (RPi RPi). This implies that faster connections cause less energy consumption. For (RPi edge server node(C/W)), Cassandra requires less energy to serve workloads as compared to both discussed scenarios. The value for this scenario decreases between 50% (Workload A - Cable) - 82% (Workload B - WiFi) by contrast to the values for (RPi RPi). This indicates more powerful computing resources at the close distance with the worker allows saving energy. For (RPi hybrid cloud), as the number of VMs in the public cloud increases, the energy consumption of all workloads raises from 10 KJ/MOPs for (8_0) to 25 KJ/MOPs for (1_7). This means the worker requires more time to receive response from the DB servers due to a longer distance. All workloads (except F) have the same energy consumption (more than 15 KJ/MOPs) for (1_7), which implies the energy consumption is dominated by the distance between nodes regardless of the workload. In summary, two out of six scenarios are energy-efficient in offloading for Cassandra (Table VII).
Fig. 4 illustrates the energy consumption of Mongo. For (RPi RPi), RPi consumed energy between 2800 (workload C)- 4500 J/MOPs (workload F), which is 39% - 65% less than the consumed energy for Cassandra. This fact can be explained by memory swapping occurring on RPi to operate Cassandra due to RAM constraints. In contrast, the relaxation of this constraint through hosting Mongo on the edge node (i.e., RPi edge node (C)) with more memory capacity, the energy consumption grows by a factor of (1.05 - 2.41) against Cassandra. This is due to the fact that Cassandra utilises the CPU effectively compared to Mongo, which results from the internal design and implementation of these databases [lakshman2010cassandra]. For the WiFi connection, there is no obvious supremacy of Mongo and Cassandra to each other because the network fluctuations have impact on the execution time of databases, which leads to the increment/decrement of energy consumption. Similarly, we observe the same trend for (RPi edge server node (C)) in which Mongo requires more energy by a factor of (at most) 1.44 for workload C in comparison with Cassandra. For (RPi hybrid cloud), Mongo, compared to Cassandra, increases the energy consumption by (30% -80%) for (8_0) and by (9% - 47%) for (1_7). This is due to Cassandra is balancing data placement , while Mongo is not131313Due to space constraint, we did not present the bandwidth consumption across participant nodes for (RPi hybrid cloud). However, we presented this result for (edge node hybrid cloud) and observed such trend. . In summary, Mongo consumes energy more than Cassandra on average except for the scenario which suffers from the memory shortage. Furthermore, the hierarchy of scenarios for Mongo has changed slightly compared to the one for Cassandra (Table VI).
Fig. 4 demonstrates the energy consumption of Redis. Redis significantly reduces energy consumption compared to Cassandra and Mongo for all scenarios (except for (RPi edge node (W)). For (RPi RPi), Redis decreases energy consumption by (55%-92%) and by (75% - 90%) with respect to Cassandra and Mongo, respectively. Similarly, we can see the same trend for (RPi edge node (C/W)). Redis consumes (88% - 90%) and (55% - 92%) less energy than Mongo and Cassandra for (RPi edge node (C)); Likewise, (66% - 73%) and (60% - 63%) for (RPi edge server node (C)). For the WiFi setting, Redis also outperforms Cassandra and Mongo except for workloads B and D (Fig. 4), where we observed instability in connection. In fact, Redis reduces energy consumption by (15% - 30%) and (5% -38%) in comparison to Mongo and Cassandra respectively as it is hosted on the edge node (W). Similarly, (28% - 76%) and (17% - 25%) for the edge server node (W).
As data is offloaded into the hybrid cloud, Redis outperforms both Cassandra and Mongo in the energy consumption. As an example, the maximum energy consumption by (1_7) is slightly more than 10 KJ/MOPs for all workloads except F, while for Cassandra and Mongo, this value grows to 15 and 20 KJ/MOPs, respectively. In summary, apart from (RPi edge node (W)), Redis outperforms Mongo, which in turn, outweighs Cassandra in energy consumption. Also, the hierarchy of scenarios for Redis is different with the one for Cassandra and Mongo as shown in Table VI.
Fig. 4 depicts the energy consumption of MySQL. Results expose that for write-related workloads (A, F) under the (RPi RPi) scenario, MySQL consumes more energy than Redis by (1.5-1.9) times, whilst it uses less energy than Cassandra and Mongo by (3.74-6.97) and (2.27-2.37) times respectively. This can be explained by the fact that Redis outperforms MySQL in response time due to RAM-based nature, while MySQL provides better response time compared to Cassandra and Mongo. By contrast, for the same scenarios, the value for read-related workloads (B, C, and D), Mongo outperforms MySQL (1.1-1.9) times in energy consumption. This indicates the relative superiority of Mongo over MySQL in response time.
The (RPi edge node (C)) and (RPi edge server node (C)) scenarios respectively are more energy-efficient by 40-67% and 11-18% in offloading vs. non-offloading due to the fast CPU and connection. In contrast, using same computing nodes with the Wifi connection makes offloading non-effective. Under the same scenarios, MySQL performs worse than Redis, and these scenarios consume (4.05-13.5) and (2.56-3.12) times more energy respectively. This is because more memory allows to run faster Redis on the edge and edge server nodes. However, on average, MySQL saves 9% (resp. 18%) energy compared to Mongo (resp. Cassandra) as databases are deployed on the edge node (resp. the edge server node).
For (RPi hybrid cloud), unlike the other databases, the configuration of hybrid cloud does not impact on the energy consumption of MySQL. The results show MySQL consumes 6-10 KJ/MOPs for workloads (A-D) and around 12.5 KJ/MOPs for workload F, which is less than the ones for Cassandra and Mongo and stays competitive with the energy consumption of Redis. The reason behind such results is that MySQL supports strong consistency in a data node group (i.e., two replica on the private cloud) and then the updated data is asynchronously propagated to other data node groups. Thus, the only latency between RPi and the VMs in the private cloud is reflected in the energy consumption. In summary, Redis outperforms MySQL in almost all scenarios in terms of energy consumption, while MySQL is relatively effective in energy consumption compared to Mongo and Cassandra for (RPi hybrid cloud). Furthermore, Table VI summarizes the energy consumption of different scenarios from lowest to highest, where the rank of (RPi edge server node (W)) and (RPi hybrid cloud) is exchangeable based on the workload.
|Edge server node (C)||Edge server node (C)||Energy server node (C)||Edge server node (C)|
|Edge server node (W)||RPi (local)||RPi (local)||RPi (local)|
|RPi (local)||Hybrid cloud (4_4, 8_0)||Edge node (C)||Edge server node(B,C,D)|
|Edge node (C)||Edge server node (W)||Edge server node (W)||Hybrid cloud (A,F)|
|Hybrid cloud (all)||Edge node (C/W)||Hybrid clouds(all)||Edge node (C)|
|Edge node (W)||Hybrid cloud (1_7)||Edge node (W)||Edge node (W)|
Fig. 5 depicts energy consumption of workload E for scenarios 1-6. Results show the energy consumption of workload E is substantially higher than the one for the other workloads. This is because workload E is expensive in terms of operations, where a set of keys are searched, loaded into RAM for modification, and finally sent back to the disk. Cassandra and Mongo respectively consume the highest (48.15 KJ/MOPs) and the lowest (13.5 KJ/MOPs) energy under the non-offloading scenario. When other edge computing nodes host databases, only the edge server node (C) provides promising offloading, where its energy consumption decreases by 73% for Cassandra, 6% for Redis, and 28% for MySQL. Mongo suffers 15% energy usage increase under the same conditions. This shows that more capacity of RAM accelerates the response time of Cassandra and Redis, which results in energy consumption reduction. For (RPi hybrid cloud), MySQL operates the best and Redis behaves the worst in the case of energy usage with a value of (300 - 600) KJ/MOPs and (520 - 600) KJ/MOPs, respectively. This is because Redis transmits more data to the public, while MySQL requires the least. In the same scenario, Cassandra (at most 140 KJ/MOPs) and Mongo (at most 185 KJ/MOPs) achieves middle ranks in energy consumption.
(B) The energy consumption of the edge node (Scenarios 7-10). We make the following observations from Fig. 6. (i) Locally running YCSB and databases on the edge node requires the lowest energy consumption compared to the non-local running. As expected, Redis outperforms all databases in energy consumption (203-312 J/MOPs), while MySQL has the worst performance in energy consumption (480-3650 J/MOPs). This is because Redis is RAM-based, and MySQL, also, has to update data and logs to disk regularly compared to NoSQL databases. (ii) As databases are deployed on the edge server node (C/W), Redis and MySQL still exhibit the lowest (301-521 J/MOPs for cable and 4170-7610 J/MOPs for WiFi) and the highest (382-556 J/MOPs for cable and 7880-13110 J/MOPs for WiFi) energy consumption, respectively. The ratio of energy consumption for cable to the one for WiFi is (10-25) times for Cassandra, (1.5-25) times for Mongo, (10-16.5) times for Redis, and (1.5-10) times for MySQL. These values exhibit that the faster connection between the worker and the database servers is, the less the energy consumption is. (iii) The more nodes reside on the private cloud, the less consumed energy is for all databases except MySQL. Having on essential impact on latency, distance increases energy usage. The energy consumption of Cassandra, Mongo, and Redis respectively is at the level of 60, 80, and (80-100) KJ/MOPs for workloads (A-D) with the configuration of (7_0). With the same condition and workloads, the energy consumption for (8_0) drops by (38-76%) for Cassandra, (25-37%) for Mongo, (27-34%) for Redis. This results show that the highest reduction happens for Cassandra since it requires to read and write data on a quorum of replicas. The energy consumption of Workload F is more than workloads A-D, so that the ratio is is (1.3-2.5) times for Cassandra, (3-4.8) times for Mongo, (1.36-1.73) times for Redis and (1.96-2.7) times for MySQL. As more VMs are exploited on the public cloud, this factor significantly decreases, which implies running all workloads across WAN network is expensive.
In summary, as databases are deployed on the edge and edge server nodes (C/W), Redis and MySQL consume the lowest and highest energy respectively, followed by Mongo and Cassandra. In contrast, for the edge server node (W), there is no preference between Mongo and Cassandra in energy consumption. Furthermore, only under the (edge node edge server node (C)) scenario, offloading is effective for MySQL (Table VII).
|Edge node||Edge node||Edge node||Edge server node (C)|
|Edge server node (C)||Edge server node (C)||Edge server node (C)||Edge node|
|Edge server node (W)||Edge server node (W)||Edge server node (W)||Edge server node (W)|
|Hybrid cloud (8_0)||Hybrid cloud (8_0)||Hybrid cloud (8_0)||Hybrid cloud (all)|
|Hybrid cloud (4_4)||Hybrid cloud (4_4)||Hybrid cloud (4_4)||-|
|Hybrid Cloud (1_7)||Hybrid cloud (1_7)||Hybrid cloud (1_7)||-|
There is no particular hierarchy among hybrid cluster configurations for MySQL, and we denoted Hybrid cloud (all) in the table.
Fig. 7 shows the energy consumption of workload E for scenarios 7-10. Running workload E on the edge node consumes the lowest energy for Mongo (1980 J/MOPs) and the highest for Redis (11420 J/MOPs). For offloading data to other computing resources, Redis still needs the highest energy (19/246 KJ/MOPs) and even more (1361 - 2455 KJ/MOPs) as it is deployed on the edge server (C/W) and hybrid cloud, respectively. By contrast, Cassandra has the lowest energy consumption on the edge server node (Fig. 7). This is because Cassandra sends and receives less data across WAN to satisfy quorum consistency (See Appendix B, Table 1, workload E).
(C) The energy consumption of the edge server node (Scenarios 11-12 ). We evaluated the energy consumption of databases for scenarios 11 and 12, where the edge server node is the database worker. We observed the same trend of energy consumption for different databases so that the more computing nodes/VMs are selected at the close distance with the worker, the less energy is consumed. Similarly, workload E is the most expensive workload for all databases. Full details are provided in Appendix A.
|Workloads (A,F)||Workload E|
(D) The energy consumption of RPi Cluster (Scenario 13). Fig. 8 plots energy consumption to run the YCSB workload on a RPi and host Cassandra, Mongo, and Redis on 7 RPis141414We did not measure energy consumption for MySQL because the RPi does not support MySQL clustering. For more details, see §6.. Note that energy consumption is reported for the whole cluster including 8 RPis. Fig. 7(a) shows that the energy usage of Cassandra is the highest compared to Mongo and Redis. Cassandra requires 80-97 KJ/MOPs to run write-related workloads. This value drops by 2-6 KJ/MOPs for read-related workloads. This is likely due to memory swapping occurring to satisfy the RAM requirements (2GiB) for Cassandra, which reduces the speed of writing operations. In contrast, Redis is the most energy-efficient (74-214 J/MOPs), while Mongo is in the middle position (390-918 J/MOPs) for all workloads, except E. For workload E, this position of Redis and Mongo has been changed since Redis generally requires longer data transfer between computing nodes to serve workload E, which leads to longer execution time, in consequence, higher energy consumption (see 7(b)). Table VIII summarises the above discussion.
(E) Breakdown of the energy consumption of edge node. Fig. 9 breaks down the energy consumption of the edge node including CPU, RAM and the rest of the system (monitor, peripheral devices, ports, etc.) - termed by REST - for workload E151515Results for other workloads are skipped due to space constraints.. Simply, the energy consumption of ”REST” is the energy measured through Upower utility for battery depletion minus the one through RAPL for CPU and RAM. Results also show that the energy consumption of RAM was the lowest, which is under 7% for most of scenarios and databases. Thus, CPU and REST have the most contribution in the energy consumption of the edge node. Interestingly, when databases are locally hosted, the energy consumption of CPU made a significant contribution of 60% to the whole consumed energy, while the energy consumption of REST is 25-39%. As the databases are moved into the edge server node and hybrid cloud, this percentage of energy consumption for CPU decreases and for REST increases. This is because the worker incurs waiting time to receive response from database servers with consequence of energy consumption regardless of the activity. As an example, to run Cassandra on edge server node (C), CPU and REST respectively require 35% and 57% of the whole energy consumption, while these values respectively changed to 14% and 77% as the edge server node (W) was deployed. This is because the worker waits significantly longer to receive response from the edge server node through WiFi compared to cable. This waiting time increases the energy consumption of REST, while CPU is idle without using significant energy. We also observed the same trend between different cluster configurations of the hybrid cloud, in which more VMs in the public cloud results in longer waiting, thus, more energy consumption of REST.
5.2.2 Bandwidth Consumption
This section presents the amount of data transferred (TX) and received (RX) between the private and public clouds (Fig. 2), where the worker is the edge node161616Henceforth, due to space constraint, we only report results for the edge node as a worker.. Fig. 10 depicts TX and RX in bytes per operation for the OpenStack broker sub-net only since these metrics are symmetric for both sub-nets. Clearly, the values of TX and RX are zero for (8_0) due to all nodes being in the same cloud.
As shown in Fig. 10, we can make several observations. (i) As expected, TX and RX for workload E is at the level of several KB per operations, while for the other workloads the values are at the level of several hundreds bytes per operation. This confirms the fact in the previous experiments that workload E is the most expensive in execution time, which in turn, has the highest energy usage. (ii) The values of TX and RX for (4_4) is less than the ones for (1_7), which implies the more nodes in the public cloud the higher TX and RX values. This is again another confirmation of more energy consumption for (1_7) compared to (4_4) and (8_0). (iii) The TX values are less than TX values for the most databases and cluster configurations since TX includes the request issuing from the worker and RX is the response returning from the database server. Thus, we focus on RX for (1_7) and (4_4).
MySQL has the lowest RX values compared to the other databases for all workloads for (1_7). This is because, as summarized in Tables in Appendix B, the worker mostly sends/receives data to/from the node in the private cloud. Hence, we have a small portion of data transferred across clouds for MySQL. For the same configuration, Mongo possesses the highest RX values, followed by Redis and Cassandra. This is because, as summarized in Appendix B - Table 1, Mongo mostly sends data to the public cloud (node 7) while Redis and Cassandra almost equally spread data across all nodes in the hybrid cloud.
For (4_4), Mongo and MySQL serve the read-related workloads through the private cloud since they should satisfy eventual and strong consistency. Strong consistency is possible to provide for MySQL because the default replicas number for MySQL is two, which are located in the private cloud. For workloads A and F, Mongo and Redis obtain the highest RX values. For workload E, Cassandra and Redis generate the highest traffic on the WAN while MySQL and Mongo create the lowest. This is due to MySQL and Mongo locally serve workload E, while Cassandra and Redis spread data across all nodes (Appendix B, Table 2 - workload E).
5.2.3 Storage consumption
This set of experiments plots the storage consumption (measured in Bytes/operation), where the edge node is worker and the hybrid cloud is database server. Skipping the repetitive results for the sake of brevity, we only report results of write-related and scan workloads. For (1_7) and (8_0), Fig. 11 and 11 show that Mongo is the worst in terms of storage consumption compared to other databases for write-related workloads, where workload A uses storage space more than workload F. This is because Mongo uses a document-based data model with full replication as the default setting. By contrast, for the same configuration, MySQL is the most efficient database in storage consumption (35-57 Bytes/Ops for workload A vs. 29-43 Bytes/Ops for workload F) due to using two replicas rather than full replication for Mongo and three replicas for Cassandra. Fig. 11 exhibits the storage consumption of databases for workload E, which is more than the one for the write-related workloads. Redis uses the largest amount of storage, followed by Mongo with a 20% reduction. This correlates with high RX value of workload E for Redis. Cassandra and MySQL stay close to each other with the lowest value in the storage usage. Table IX summarizes discussed results.
We discuss findings, practical experiences and technical challenges that we encountered during experimentation.
Research findings: From the discussed evaluated experiments, it is challenging problem to select a specific database solution which incurs the lowest resource consumption (energy, bandwidth, and storage) in an edge-cloud framework for all workloads. However, from the results, we have extracted several insights as follows. (i) In terms of offloading, a few scenarios make data offloading profitable in terms of energy usage. Indeed, if database operations are offloaded from source-constrained edge nodes to powerful computing nodes with a high bandwidth and low latency connection, then we expect to save energy for weak devices (e.g., RPi edge server node (C)). (ii) Connection bandwidth and latency have a direct impact on the energy usage of data offloading. Hence, all databases exhibit less energy consumption with a faster connection between workers and data servers. (iii) The limitation of memory can increase the energy consumption of disk-based databases such as Cassandra because memory swapping further increases the response time, which directly impacts on the energy consumption. (iv) The distance between a worker and database servers, and the spread of data across computing nodes in a cluster of hybrid cloud are two key factors that affect the response time, which results in the energy consumption increment. To a large extent, the greater is the distance between worker and database servers, the greater is the energy consumption. The more data is distributed evenly among nodes in the hybrid cloud, the less energy is consumed. This is because more operations can be served through the private cloud, as seen in the case of Cassandra and Redis. (v) The energy consumption of CPU and RAM has the highest and lowest contribution respectively in the total energy consumption. This is likely the reason why Redis is superior to disk-based databases in terms of energy consumption in the most cases.
With respect to the superiority of databases to each other, Redis consumes the least amount of energy followed by Cassandra if an edge computing node supports high amount of memory capacity. This superiority is also valid when we run workloads A-F locally (i.e., on RPi, edge node, edge server node), and offload these workloads from the database worker to the edge node and edge server nodes. By contrast, for workload E, Redis performs the worst in energy consumption, while MySQL requires the least energy on average. For offloading data from the database worker to the hybrid cloud, MySQL consumes the lowest energy, followed by Redis particularly when we deployed more nodes on the public cloud. With regard to the bandwidth consumption across clouds, MySQL sends and receives the least amount of data irrespective of cloud configuration. This correlates with MySQL requiring less energy under hybrid cloud scenarios compared to other databases. In the case of storage consumption, MySQL and Cassandra require the lowest storage capacity.
Practical experiences: While we fully automated the installation and configuration of the databases across cloud and edge use cases, the ARM architecture of RPi caused some issues with MySQL. The default MySQL server package provided by Ubuntu 20.04.1 does not come with clustering components included. Thus, we had to compile our own version with the network clustering functionality explicitly enabled. Compiling on RPi node itself was failing as more than 12GB of RAM was required to complete the build process. Enabling swap file allowed us to proceed, however, the resulting build performance was unacceptably slow, requiring several days to complete. Thus, we also attempted cross-compiling ARM binaries on high-performance x86_64 server, which was significantly faster. Unfortunately, both produced packages crashed upon execution on RPi nodes due to lack of L3 cache available. Upon a brief inspection of MySQL source code and assessing the time constraints, we decided to skip MySQL server tests for RPi nodes on this occasion. Further investigation of this issue and corresponding code changes might be useful in the future.
As discussed, we leveraged different hardware and software tools to measure energy consumption of edge computing nodes. To evaluate energy consumption of edge node and edge server node, we used RAPL that exploits a software power model to estimate energy usage with the help of hardware performance. The main issue about this utility is the maximum energy range of 65Billion Micro-joules for its counter. This imposes constraints on the duration of experiment for each workload because when the energy consumption reaches this value, the counter resets, and consequently the energy consumption probe records a wrong value. Hence, we had to take extra care to adjust the counter values to compensate for this limitation.
Future Research Areas: In this work, we considered the default values for parameters in terms of replication factor, consistency model, logging databases activities, and the size of fields. Also, we conducted our experiments on homogeneous VMs and same model RPis . All these default values can raise several open research questions related to their impact on energy consumption. Thus, it would be interesting to evaluate the impact of replicas number and field size on energy and storage consumption. Another open research can be sought in the effect of vertical and horizontal scalability of the edge-cloud cluster. Last but not the least, we can repeat the same scenarios to measure energy consumption of big data frameworks (Hadoope, Spark and Flink) as they are deployed on the edge-cloud framework.
Selection of a distributed database to deploy across a edge-cloud framework is not a trivial task as results highly depend on different factors. To disclose these factors, we conducted an extensive evaluation of popular NoSQL databases (Cassandra, Mongo, and Redis) and Relational database (MySQL) through a variety of scenarios in which operations are issued from resource-constrained edge computing nodes to more powerful ones via cable and WiFi connection. We implemented these scenarios through a repeatable way and a modular framework to obtain flexibility and accuracy in trustworthy experimental data. Our evaluation demonstrated that the energy consumption of edge computing highly depends on the connection speed and latency and the computational power of database servers. Furthermore, our results exhibit that the distance between the database worker issuing operations and the database servers hosting databases is another key factor that should be considered. The last factor to take into consideration is bandwidth consumption in cloud-edge scenarios, which can impact on the energy consumption of databases. With respect to these factors, for local and offloading data to the edge devices, Redis consumes the least energy for most workloads due to not using disk storage. For offloading data to the hybrid cloud, MySQL is the most efficient in energy consumption for the most workloads on average since it transmits the least amount of data across private and public clouds. Mongo and Cassandra hold a rank after MySQL and Redis in terms of energy consumption, where Cassandra commonly outperforms Mongo as more nodes of hybrid cloud reside on the public cloud.