Researchers at Gartner estimate that there will be 20 billion IoT devices connected to the Internet by 2020. The burgeoning of such devices has sparked many efforts into researching the optimal device design. Since most IoT devices are constrained in terms of processing power and energy resources, the traditional approach has been to transmit data generated by the device to a cloud platform for server-based processing. Although cloud computing has been successfully employed, it is sometimes not desirable due to concerns about latency, connectivity, energy, privacy and security [2, 3, 4].
To overcome these concerns, edge and fog computing have emerged. These architectures aim to push processing capabilities closer to the IoT devices themselves, which has become feasible given the significant increase in their processing power. For example, the archetype of modern IoT devices, the Raspberry Pi 3, offers a quad-core processor with 1GB of RAM for only $30. The reduction in latency offered by utilizing such devices in edge and fog computing is critical to the success of applications such as object detection and image classification. These applications are used in mission-critical systems such as autonomous vehicles, surgical devices, security cameras, obstacle detection for the visually-impaired, rescue drones, and authentication systems [5, 6, 7]. However, these tasks consume a considerable amount of energy. Thus, it is especially important to understand the relationship between these algorithms and their respective energy consumption in order to use the IoT device's power resources efficiently. This is particularly important for two reasons. First, many IoT devices work in a duty-cycled fashion: they are triggered when an external event happens, perform a processing task, and transition back to sleep mode. A sample scenario is a security camera that captures an image when motion is detected. Another example is a flood monitoring system that captures images of a river when the water level exceeds a certain threshold in order to detect the type of debris being carried by the water. Enhancing energy efficiency is essential for these types of applications, especially when they are battery powered or rely on energy harvesting technologies. The second important motivation for energy profiling and enhancement is reducing carbon emissions. According to a study published by the Centre for Energy Efficient Telecommunications, the cloud was estimated to consume up to 43 TWh in 2015, compared to only 9.2 TWh in 2012, an increase of 460%.
This is roughly equivalent to adding 4.9 million cars to the roads. Given the dramatic impact of inefficient energy management, it has become important to ensure that the most intensive tasks, especially image classification, use the appropriate resources and minimize their energy consumption footprint.
Various ML algorithms, offering different accuracy and complexity, have been proposed to tackle the challenges of image classification. Despite their exceptional accuracy, they require high processing power and large storage. For example, some of the state-of-the-art neural network architectures, such as AlexNet, GoogLeNet, and ResNet, require over a million parameters to represent them and more than a billion multiply-and-accumulate (MAC) computations. Each MAC operation is generally associated with a number of memory accesses. In the worst-case scenario, where there is no data re-use, each operation requires 3 reads and 1 write to memory. The simplest neural network among the aforementioned models requires around 2172M memory reads and 724M memory writes. Since these operations consume a considerable amount of processing power, the energy consumption of these algorithms might not meet the requirements of various application scenarios. However, the overall energy consumption can be reduced if the number of operations performed by these algorithms is also reduced. This is possible through various approaches such as reducing image resolution, reducing dataset size, and choosing the algorithm that addresses the application requirements without introducing additional processing overhead. For example, ResNet-50 processing a 224×224×3 image uses around 7 billion operations per inference. Running this neural network on a 160×160 image would almost halve the number of operations and double the speed, immensely reducing the energy consumption. In terms of algorithm selection, some algorithms are better suited for servers (where there is a wider variety of accessible resources), whereas others can perform well on IoT devices. If, for example, the energy consumed to classify a single image on the device is considerably less than the energy consumed to transmit the image to the cloud and receive the result, then, as one scales, it becomes advantageous to compute locally.
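As a rough illustration of this trade-off, the sketch below works through the worst-case memory-access arithmetic quoted above and a hypothetical break-even check between on-device inference and transmission. The radio and inference energy constants are invented for illustration only; they are not measured values from this study.

```python
# Worst-case memory accesses for ~724M MACs: 3 reads and 1 write per MAC.
MACS = 724_000_000
READS, WRITES = 3 * MACS, 1 * MACS   # no data re-use assumed

# Hypothetical break-even check: compute locally or transmit to the cloud?
IMAGE_BYTES = 160 * 160 * 3          # raw RGB image
RADIO_J_PER_BYTE = 5e-6              # assumed radio energy cost per byte
LOCAL_INFERENCE_J = 0.2              # assumed per-image on-device cost

def transmit_energy(image_bytes, j_per_byte=RADIO_J_PER_BYTE):
    """Energy to send one image (ignores protocol overhead and the reply)."""
    return image_bytes * j_per_byte

def cheaper_locally(image_bytes, local_j=LOCAL_INFERENCE_J):
    """True when on-device classification costs less than transmission."""
    return local_j < transmit_energy(image_bytes)
```

With these toy constants, local classification wins; in practice the decision depends on the measured per-image inference energy of the chosen algorithm and device.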
There have been research efforts to deliver preliminary observations as to how resource-constrained embedded devices perform while executing ML algorithms. Cui et al. used a Raspberry Pi 2 as a gateway and a commercial Intel SR1560SF server. To understand how different platforms perform while executing ML algorithms, they measured energy and performance. They found a strong relationship between energy and data size. In addition, they found that for some scenarios, the gateway, which employs a low-power processor, performs data processing tasks using a lower amount of energy compared to the server over a long period of time. However, their study focused on how ML algorithms perform for general tasks and generated a model that predicts energy consumption solely based on data size. Unfortunately, they did not consider how the type and phase of the algorithm, or specific data characteristics such as image resolution, impact performance. Carbajales et al. investigated the power requirement of IoT monitoring and sensing on a Raspberry Pi 2 for a smart home application. Their goal was to present a user-friendly visualization of energy consumption across several single board computers (SBCs), including the Raspberry Pi 2B and the BeagleBone Black. Their data processing was limited to time-scaling, averaging, summing, and rounding, with no consideration for more complex processing such as ML. In addition, they did not propose any method to predict or forecast the energy requirements of the system. Lane et al.
characterized neural network algorithms for various embedded devices, including wearables and smartphones. They chose three hardware platforms, the Nvidia Tegra, Qualcomm Snapdragon, and Intel Edison, and measured execution time and energy consumption for each. Of the four deep learning architectures studied, two were used for object detection, namely AlexNet and Street View House Numbers (SVHN). While AlexNet offers state-of-the-art accuracy and can distinguish more than 1,000 object classes, SVHN has a narrower use case: extracting numbers from noisy scenes. Although this research incorporated deep learning, it did not analyze how data characteristics (such as image resolution) influence energy consumption. To summarize, despite the insights provided by the research efforts mentioned above into performance in terms of duration and energy, none of them investigated the relationship between image input data and energy, duration, and accuracy. Furthermore, these studies did not provide a useful technique for predicting energy consumption when multiple parameters are taken into account.
The contributions of this paper are two-fold. First, we identify and characterize how each individual factor of image classification can affect energy cost, duration and accuracy. This will equip the research community with the tools necessary to make an informed decision about the design of their edge/fog systems in a way that balances cost with performance. Second, we present a reliable method for predicting the energy consumption of a system without needing to construct and measure data from a prototype. More specifically, in this paper:
We analyze and visualize the relationships between energy consumption, duration, and accuracy versus dataset size, image resolution, algorithm type, algorithm phase (i.e., training and testing), and device type, when executing ML algorithms on IoT devices. The machine learning algorithms we used in this study are support vector machines (SVM), k-nearest neighbors (k-NN), and logistic regression. These algorithms were selected based on their popularity for image classification, as well as their abundant implementations across several frameworks. We chose the Raspberry Pi 3 (RPi) and the BeagleBone Black Wireless (BB) because they are widely used by the IoT community. We found that despite the BB's access to lower-voltage DDR3L RAM, which has twice the clock speed and transfer rate potential of the RPi's RAM, the BB consistently took significantly longer to perform experiments, ultimately leading it to consume more energy. This discrepancy is credited, in part, to the RPi's CPU, which, despite being unable to utilize all four of its cores for some experiments, still has a 20% faster clock speed than that of the BB. We present evidence suggesting that increasing image resolution serves only to increase energy consumption while providing minimal benefit to accuracy. For example, using the RPi, we find that increasing the resolution of dataset images for datasets of size 300 and 1500 by 40% results in an average increase in processing time of 191% and 217%, and an average increase in energy of 208% and 214%, respectively. Despite these significant increases in energy consumption, the accuracy for the same datasets decreased by 3.64% and 4.64%, respectively, suggesting that, in general, for small datasets it is not beneficial to increase image resolution. Additionally, we conducted experiments utilizing the RPi's multi-core functionality for supported algorithms and compared the results with the corresponding single-core data.
In this way, we found that using multiple cores provided many benefits including a 70% and 43% reduction in processing time as well as a 63% and 60% decrease in energy consumption for k-NN and logistic regression, respectively.
Since energy measurement is a lengthy process and requires the use of an accurate power measurement tool, we utilize our experimental data to present a novel energy prediction model. In our attempts to generate the most accurate model, we used three ML algorithms: multiple linear regression, Gaussian process, and random forest regression. After applying the ML algorithms to the validation datasets, random forest regression proved to be the most accurate method, with an R-squared value of 0.95 for the Caltech-256 dataset and 0.79 for the Flowers dataset. The proposed model facilitates decision making about the target hardware platform, ML algorithm, and adjustments of parameters, based on the application at hand.
The remainder of the paper is organized as follows. Section II discusses the methodology of our experiments. Section III discusses the results of our experimentation and provides a list of guidelines for the purpose of maximizing performance and system longevity. Section IV describes our methods for generating a random forest model capable of predicting energy consumption. The paper concludes in Section V by reiterating our findings as well as describing how future work can be focused on making the random forest model generated more robust.
In this section, we present the main components of our measurement methodology, including hardware platforms, the power measurement tool, the ML algorithms, and the datasets.
II-A Hardware Platforms
In order to maximize the relevance and applicability of our model, we selected hardware devices that are widely adopted throughout the IoT community. Recent surveys suggest that the RPi is the most popular single board computer (SBC) [18, 19]. The RPi board contains a 1.2GHz quad-core ARM Cortex-A53 BCM2837 processor and 1 GB of DDR2 SDRAM. The RPi also utilizes a 400MHz Broadcom VideoCore IV GPU and has Wi-Fi, Bluetooth, and Ethernet capabilities. Similarly, the BB was selected because existing surveys place it between the second and third most popular SBC on the market [18, 19]. The BB contains a 1GHz AM3358 ARM Cortex-A8 OSD3358-512M-BAS processor and 512MB of DDR3L SDRAM. The BB also has both Wi-Fi and Bluetooth capabilities.
When comparing the hardware specifications of both devices, it is important to note two key differences. First, while the RPi has nearly twice the SDRAM of the BB, it uses DDR2 SDRAM, which has roughly half the clock speed and transfer rate, at 400 to 1,066 MHz and 3,200 to 8,533 MB/s, respectively. Additionally, DDR2 SDRAM requires 1.8V to operate, which is relatively high by modern standards. In contrast, the BB, which utilizes DDR3 “Low Voltage” (DDR3L), requires only 1.35V. A second major difference between the two boards concerns their processor caches. The RPi's L1 cache contains 32kB of storage, while its L2 cache contains 512kB. The BB has 64kB of L1 cache, subdivided into 32kB of instruction cache and 32kB of data cache, and 256kB of L2 cache. Table I presents the hardware characteristics of these two boards.
| | Raspberry Pi 3 (RPi) | BeagleBone Black (BB) |
| --- | --- | --- |
| L2 cache | 512kB | 256 kB |
| RAM | 1 GB LPDDR2 | 512MB DDR3L |
| Storage | SD | 4GB eMMC, SD |
According to the IoT Developer Survey conducted by Eclipse in 2018, Linux (71.8%) remains the leading operating system across IoT devices, gateways, and cloud backends. As a result, we used Ubuntu Mate on the RPi and Debian Jessie on the BB. Both operating systems are 32-bit.
In many industrial applications, IoT devices are often under strict energy constraints. Under these circumstances, the devices are set to use only absolutely essential protocols and hardware in order to reduce the power consumption of the system. There are many benefits to this system layout including an increase in power efficiency, a reduction of operating costs for line-powered systems, and an increase in the operating life for battery-powered systems [8, 23]. In order for our energy consumption model to be useful, we needed to eliminate the effect of all unwanted components on performance. Consequently, we disabled all the unnecessary modules that may interfere with energy consumption, such as Bluetooth, Wi-Fi, and Ethernet. In addition, we used a serial connection to communicate with the boards. This method consumes a negligible amount of energy, as opposed to a traditional Ethernet or HDMI connection.
II-B Power Measurements
Accurate measurement of energy consumption requires enabling and disabling an energy measurement tool based on the operation being performed. For example, during our experiments, it was necessary to enable and disable energy measurement right before and after the training phase, respectively. Therefore, we required a tool that could be directly controlled by the ML program running on the RPi or BB. To this end, we use the EMPIOT tool, which enables the devices under test to precisely control the instances of energy measurement using the GPIO pins. EMPIOT supersamples approximately 500,000 readings per second into data points streamed at 1kHz. The current and voltage resolution of this platform are 100µA and 4mV, respectively, when the 12-bit resolution mode is configured. The flexibility of this platform allowed us to integrate it with our testbed.
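A minimal sketch of how a measurement window can be gated around a single phase from within the ML program itself. The actual pin number and wiring are EMPIOT-specific details not shown here; `set_pin` stands in for the GPIO write (e.g. `RPi.GPIO.output` on the device).

```python
def measured(fn, set_pin, *args, **kwargs):
    """Run fn with the measurement window asserted high.

    set_pin: callable taking True/False; on the device this would wrap
    the GPIO write that triggers EMPIOT (pin choice is an assumption).
    """
    set_pin(True)                # start energy capture
    try:
        return fn(*args, **kwargs)
    finally:
        set_pin(False)           # stop energy capture, even on error

# hypothetical usage on the device:
#   measured(clf.fit, lambda hi: GPIO.output(17, hi), X_train, y_train)
```

Wrapping only the phase of interest (training or testing) ensures the recorded energy excludes setup work such as loading the dataset.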
II-C Machine Learning Algorithms
Our paper focuses on supervised image classification. Supervised learning uses labeled data. A labeled example consists of an input and output pair. The job of the supervised algorithm is to produce a model that is able to map a given input to the correct output. Types of learning tasks that are considered supervised learning include classification and regression. Popular supervised algorithms include Support Vector Machine (SVM) and linear classifiers.
In order to grasp the impact of the ML algorithm’s effect on energy consumption, it is important to test each algorithm on a wide variety of datasets. As a result, we selected three algorithms: SVM, logistic regression, and k-Nearest Neighbors (k-NN). In addition to being very popular ML algorithms, each has specific strengths and weaknesses that we study in this paper.
SVM operates by mapping input data to a high-dimensional feature space so that data points can be classified, even when the points are not otherwise linearly separable. The data is transformed in such a way that a hyper-plane can separate them. The objective of SVM is to maximize the distance (margin) from the separating hyper-plane to the support vectors. Logistic regression is used to predict the probability of an event by fitting data to a logistic curve. It is intended for predicting a binary dependent variable (e.g., 0 or 1). k-NN works by classifying a new sample point based on the majority label of its k-nearest neighbors.
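For concreteness, the logistic curve mentioned above is the sigmoid function, which maps a real-valued score to a probability:

```python
import math

def sigmoid(z):
    """The logistic curve: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# A score of 0 sits exactly on the decision boundary (probability 0.5);
# large positive or negative scores saturate toward 1 or 0.
```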
SVM was selected for our experiments because it can effectively model non-linear decision boundaries while simultaneously being well insulated against pitfalls such as overfitting. However, because SVM may utilize multi-dimensional kernels, it is often memory intensive, thereby leading us to believe it would consume large amounts of energy for datasets with more than two classes. Additionally, because SVM was originally designed to be a binary classifier, we wanted to measure the effectiveness of current implementations that allow SVMs to be applied to datasets with multiple classes. Scikit-Learn, the ML library used for our experiments, implements SVM using the “one-vs-one” method to generate its classifiers. Using this approach, given a dataset with n classes, all pairwise classifiers are evaluated, resulting in n(n-1)/2 distinct binary classifiers. These classifiers, in turn, vote on the test values, which are eventually labeled as the class with the greatest number of votes.
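The pairwise growth of one-vs-one can be sketched as follows; the count is simply the number of unordered class pairs:

```python
from itertools import combinations

def ovo_classifier_count(n_classes):
    """Number of pairwise ("one-vs-one") binary classifiers for n classes."""
    return n_classes * (n_classes - 1) // 2

# Each pair of classes gets its own binary classifier, so a 10-class
# dataset such as MNIST Digits requires 45 of them.
```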
Similar to SVM, logistic regression is also designed as a binary classifier, though it does not have the same access to non-linear kernels that SVM does. While logistic regression generally performs well for datasets consisting of two classes, its performance drops considerably as the number of classes increases. For Scikit-Learn’s implementation of logistic regression, when the dataset contains a number of classes greater than two, it uses the “one-vs-all” method. This involves training a separate binary classifier for each class. As the number of classes increases, so does the processing time per class.
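A minimal sketch of the one-vs-all idea described above; `fit_binary` is a hypothetical stand-in for fitting a single "class c vs. everything else" model, which makes the linear growth in training cost with the number of classes explicit:

```python
def ovr_train(classes, fit_binary):
    """One-vs-all: fit one binary classifier per class, so training cost
    grows linearly with the number of classes."""
    return {c: fit_binary(c) for c in classes}

def ovr_predict(scores):
    """Prediction: the class whose binary classifier scored highest."""
    return max(scores, key=scores.get)
```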
The k-NN algorithm is among the simplest and most powerful ML algorithms used for classification and regression. When the task is classification, k-NN classifies an object by assigning it to the most common class among its k-nearest neighbors . While k-NN is generally recognized as a high-accuracy algorithm, the quality of predictions greatly depends on the method used for proximity measurements . Consequently, it was important to select an implementation that used an appropriate distance measurement method, especially when the data points occupy multiple dimensions.
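A toy brute-force k-NN classifier makes the role of the proximity measure explicit; Euclidean distance (`math.dist`) is used as the default here, but any distance function can be plugged in:

```python
import math
from collections import Counter

def knn_classify(query, examples, k=3, dist=math.dist):
    """Majority vote among the k nearest examples.

    examples: list of (point, label) pairs; dist is pluggable, since
    prediction quality depends on the proximity measure used.
    """
    nearest = sorted(examples, key=lambda e: dist(query, e[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```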
II-D Scikit-Learn Framework
For the purposes of our experiments, we used Scikit-Learn, a Python library that includes ML algorithms such as SVM, logistic regression, and k-NN. Although Scikit-Learn offers options to utilize multi-core processing, only two of our three algorithms as implemented in Scikit-Learn can make use of multiple cores, namely k-NN and logistic regression. In order to measure the benefits of multi-core utilization, we recorded data for the RPi and compared it with the data gathered throughout our single-core experimentation. The BB was excluded from this iteration of experimentation because it only has a single core.
II-E Training Datasets
In order to utilize a diverse range of data, we chose a total of 5 datasets that originally varied in many factors such as image resolution, number of classes, and dataset size. We standardized all of these factors in order to fairly compare energy consumption results across multiple datasets. No classes overlapped between the datasets, ensuring that our results were pooled from a wide range of test sources. The datasets are summarized in the following section and in Figure 1.
II-E1 MNIST Digits
The Modified National Institute of Standards and Technology (MNIST) Digits dataset consists of 70,000 black and white images of handwritten digits. Each digit has been centered in a standardized 28×28 image. Each digit corresponds to a separate class, resulting in a total of 10 classes. This dataset was selected because it is a standard benchmarking dataset.
II-E2 Fashion-MNIST
The Fashion-MNIST dataset was created by researchers at an e-commerce company called Zalando. Though it is not part of MNIST, the fashion dataset carries the name due to its similarity to the Digits dataset. According to its creators, it is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking ML algorithms. Similar to the Digits dataset, the Fashion-MNIST dataset consists of 70,000 black and white 28×28 images separated into 10 classes. We selected this dataset because it is a more challenging version of the Digits dataset. Though widely popular, the Digits dataset has become too elementary, with many ML algorithms easily achieving 97% accuracy. Furthermore, most pairs of digits can be distinguished with a single pixel.
II-E3 CIFAR-10
The CIFAR-10 (CIFAR) dataset was created by the Canadian Institute For Advanced Research and consists of 60,000 color images. Each image is 32×32 pixels, and there are a total of 10 classes. The classes in this dataset are very diverse, ranging from dogs and cats to airplanes and trucks. This dataset was selected because it is considered challenging relative to the other datasets. The 3-D color images ensure the matrices representing this dataset's images are dense, thus requiring more computation. Additionally, because this dataset has 10 significantly different classes and the maximum dataset size is a mere 1,500 images, it was intended to represent a scenario where accuracy is low.
II-E4 Chest X-Ray
The Chest X-Ray (CHEST) dataset is provided by research conducted by Kermany et al. . The dataset contains 5,863 high-resolution greyscale X-ray images divided into two classes: normal and pneumonia. Images for this dataset were not square and resolutions were non-uniform. This dataset was selected because it only has two classes, which is ideal for SVM and logistic regression.
II-E5 Faces in the Wild
The Labeled Faces in the Wild dataset was created by researchers at the University of Massachusetts Amherst and consists of 13,000 non-square, color images. The images were collected from the web using the Viola-Jones face detector. Each image is labeled with the name of the person in the picture. A total of 1,680 of the individuals pictured within the dataset have at least two distinct photos. This dataset was selected because, much like the CIFAR dataset, it contains many classes and color images. However, unlike CIFAR, the images within the Faces in the Wild dataset are two dimensional.
II-F Dataset Standardization
We performed dataset standardization in order to fairly determine the nature of the relationship between certain parameters and energy consumption when executing image classification algorithms. We began by selecting 1,500 images from each dataset. Then, we created four more subsets by reducing the size by 300 images at each iteration. This yielded subsets of 1,200, 900, 600, and 300 images. Next, we scaled each of the images from those five subsets into three resolutions: 28×28, 22×22, and 17×17. For 3-D data, the dimensionality of the images was maintained.
For each iteration of the experiment, we tested a unique combination selected from 3 ML algorithms, 5 datasets, 2 phases, 5 sizes, and 3 resolutions. This resulted in 450 tests per complete experiment iteration. Furthermore, in order to ensure reliable measurements, the experiment was run 5 times, for a total of 2,250 experiments. Figure 2 depicts a visualization of the total number of single-core experiments conducted.
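The experimental grid can be enumerated directly to confirm these counts:

```python
from itertools import product

# The five factors varied in our single-core experiments.
algorithms  = ['SVM', 'k-NN', 'logistic regression']
datasets    = ['Digits', 'Fashion', 'CIFAR', 'CHEST', 'Faces']
phases      = ['training', 'testing']
sizes       = [300, 600, 900, 1200, 1500]
resolutions = ['17x17', '22x22', '28x28']

# 3 * 5 * 2 * 5 * 3 = 450 configurations per complete iteration.
configs = list(product(algorithms, datasets, phases, sizes, resolutions))
```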
III Results and Analysis
This section presents and analyzes the different relationships we observed throughout the experiments. Specifically, we explore how the algorithm used as well as various image characteristics affect energy consumption, processing duration, and accuracy.
III-A Image Resolution
In this section, we first study the effect of image resolution on energy consumption and processing duration. We observed a linear trend between image resolution and energy consumption for each algorithm during both phases when the dataset size is held constant. Figure 3 displays a subset of the collected results and demonstrates this trend for both the RPi and BB during the training phase of logistic regression when the dataset size is held constant.
[Table: Dataset | Device | Dataset Size | % Increase]
A higher resolution implies the device must analyze more pixels in order to classify the image. An increase in the number of features (pixels) increases the memory consumption and prediction latency. For a matrix of M instances with N features, the space complexity is in O(MN). From a computing perspective, this also means that the number of basic operations (e.g., multiplications for vector-matrix products) increases as well. Overall, the prediction time will increase at least linearly with the number of features. Depending on the global memory footprint and the underlying estimator used, the prediction time may increase non-linearly. Increasing the memory use and the prediction latency directly increases the energy consumption.
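A quick way to see the O(MN) space cost, assuming 8-byte (64-bit float) values per feature:

```python
def dense_matrix_bytes(n_samples, n_features, bytes_per_value=8):
    """Space cost of a dense feature matrix grows as O(M * N).

    bytes_per_value=8 assumes 64-bit floats.
    """
    return n_samples * n_features * bytes_per_value

# 1,500 flattened grayscale images: 28x28 needs ~2.7x the memory of 17x17.
hi_res = dense_matrix_bytes(1500, 28 * 28)
lo_res = dense_matrix_bytes(1500, 17 * 17)
```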
III-B Dataset Size
For this experiment, we held all other parameters constant and varied the number of images in the training and testing sets using the 5 standardized datasets and sizes. Similar to the first experiment, we observed that a linear relationship exists between energy consumption and dataset size for each algorithm when the image resolution was held constant. Figure 4 is another subset of the collected results that shows this relationship on both hardware platforms during the testing phase of k-NN.
This trend was expected: as the number of images the device must process during the training phase and classify during the testing phase increases, so does the time the device spends running and consuming energy.
III-C Image Dimensions
Datasets with 3-D data (e.g., 28×28×3) generally show higher energy consumption than datasets with 2-D data (e.g., 28×28). Both the CIFAR and CHEST datasets had 3-D data, and their energy consumption was consistently higher than that of the remaining datasets. In addition, the CIFAR dataset consistently consumes the most energy because not only does it contain 3-D data, but the matrices representing the images are not sparse. In order to quantify this increase, we took an average of the energy consumption for each line cluster on our plots and compared it with the lines not belonging to a cluster. For example, in Figure 3(b), we calculated the average energy consumption of the non-CIFAR datasets and compared it to the average energy consumption of the CIFAR dataset. However, in Figure 4(c), we calculated the average energy consumption of the Fashion, Digits, and Faces datasets and compared it with the average energy consumption of CIFAR and CHEST separately. On average, we found that training a logistic regression model using CIFAR images for dataset sizes of 300, 900, and 1,500 consumes 550%, 612%, and 636% more energy, respectively, on the RPi. We observe a similar trend on the BB, which, under the same circumstances, consumes 446%, 583%, and 633% more energy, respectively. This is because the CIFAR dataset images contain various colors throughout the entire image as shown in Figure 1
, whereas the CHEST images are greyscale and the variance in color is concentrated in the center of the images (the CHEST images generally show a white chest in the center and a black background), thus resulting in sparser matrices. Scipy, the Python module that Scikit-Learn is built on top of, provides sparse matrix data structures which are optimized for storing sparse data. The main benefit of sparse formats is that the space and time complexity decrease significantly. Specifically, the space complexity decreases because the format does not store zeros. Storing a non-zero value requires, on average, one 32-bit integer position, a 64-bit floating point value, and an additional 32 bits per row or column in the matrix. Therefore, prediction latency can be dramatically reduced by using sparse input, since only the non-zero valued features impact the dot product and thus the model predictions. For example, suppose there are 500 non-zero values in an n-dimensional space; using Scipy's sparse formats reduces the number of multiply and add operations from n to 500.
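The saving can be sketched in a few lines: a dot product over a sparse vector only touches the stored non-zeros. Scipy's CSR format implements the same idea with optimized storage; the dictionary representation below is a simplified stand-in.

```python
def sparse_dot(nonzeros, weights):
    """Dot product touching only the stored non-zeros.

    nonzeros: dict mapping feature index -> non-zero value, a toy
    stand-in for Scipy's compressed sparse formats.
    """
    return sum(v * weights[i] for i, v in nonzeros.items())

weights = [2.0] * 1_000_000
x = {3: 1.5, 999_999: 0.5}   # 2 non-zeros in a million dimensions
# -> only 2 multiply-adds instead of 1,000,000
```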
Though the image characteristics in isolation affect energy consumption, our results show that the ML algorithm used is consistently the greatest predictor of energy consumption and processing duration. This is because each algorithm is tailored to specific tasks. For example, the CHEST dataset, which contains 3-D images, generally consumes the second highest amount of energy when using SVM and k-NN. However, when logistic regression, which is designed for binary classification, was run on the CHEST dataset, we saw a dramatic decrease in energy levels despite its high dimensionality, because the CHEST dataset only has two classes.
In general, we found that logistic regression's training phase consumes significantly more energy than the training phases of the other two algorithms. Figures 3(e) and 3(f) show that the training phase of logistic regression consumes up to approximately 450J and 1,400J on the RPi and BB, respectively. In comparison, the training phases of k-NN and SVM consume 2J and 100J on the RPi, and 9J and 450J on the BB, respectively. This large discrepancy in energy cost is observed because training a logistic regression model for more than two classes involves creating a separate classifier for each class. On the other hand, the testing phase of logistic regression consumes significantly less energy than SVM and k-NN because predicting a single image is simply a matter of taking the maximum output across the classifiers generated during the training phase. This trade-off is an important consideration when determining which algorithm to use. For example, one could offload the training phase to a server but execute predictions on the device itself, thereby ensuring the least energy consumption across these three algorithms.
For k-NN, we also observed linear trends, as shown in Figure 4, for both the RPi and the BB. During its training phase, k-NN simply stores all the images in the training set. Suppose there are N training examples, each of dimension d; the complexity of computing the distance to one example is O(d). To find a single nearest neighbor, the complexity is O(Nd). Thus, to find the k nearest neighbors, the complexity is O(kNd). As the dataset size increases, the overall complexity increases, which in turn increases the energy consumed.
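A toy sketch of this brute-force cost, assuming N stored examples of dimension d and a naive repeated search for each of the k neighbors:

```python
def knn_query_ops(n_train, dim, k=1):
    """Order-of-magnitude operation count for a brute-force k-NN query.

    Every stored example is compared against the query, so the cost
    grows as k * N * d under naive repeated nearest-neighbor search.
    """
    return k * n_train * dim

# 1,500 stored 28x28 grayscale images, naive 3-NN query:
ops = knn_query_ops(1500, 28 * 28, k=3)
```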
For SVM, the energy consumption depends on the number of support vectors. A higher number of support vectors indicates a higher model complexity. Our results show that the processing duration grows asymptotically linearly with the number of support vectors. The number of support vectors increases when we increase resolution or dataset size, as demonstrated in Figure 5.
In addition, the non-linear kernel used (the radial basis function in Scikit-Learn) also influences the latency, as it is used to compute the projection of the input vector once per support vector. Furthermore, since the core of an SVM is a quadratic programming problem that separates support vectors from the rest of the training data, Scikit-Learn's implementation of the quadratic solver for SVM scales between O(d·n²) and O(d·n³), where d is the number of features and n is the number of samples. If the input data is very sparse, d should be replaced by the average number of non-zero features in a sample vector. Figure 6 shows that the CIFAR and Faces datasets (which have dense matrices) consistently consume more energy relative to the other datasets during the training phase of the algorithm. Figure 6(e) highlights this trend, with the CIFAR dataset consuming 636% more energy than the average consumption of the other four datasets.
III-E Time and Accuracy
In addition to measuring energy consumption, we also measured processing time and classification accuracy. Specifically, these studies enable us to offer guidelines for establishing trade-offs between energy consumption and accuracy. Tables V and VI show the accuracy increase for each algorithm and dataset pair at dataset sizes of 300 and 1,500 images.
In general, when holding all other factors constant, accuracy does not change significantly when the resolution is changed. For example, on the RPi, increasing the resolution from 17×17 to 28×28 (a 65% increase per dimension) while keeping the dataset size constant at 300 images always resulted in at least double the time and little to no additional increase in accuracy. Table V shows that the maximum accuracy increase across all the experiments is approximately 14%, observed when running logistic regression on a subset of the CIFAR dataset consisting of 300 images. However, this increases time by 236%. We observe that 7 out of the 15 experiments have the same accuracy even when increasing the resolution. Additionally, 6 out of the 15 experiments show a decrease in accuracy. Thus, 13 out of the 15, or roughly 87%, of the experiments show that there is no additional benefit to using higher-resolution images. These accuracy trends, which are identical for the BB, are a critical consideration for many applications. As a result, one should generally opt for the reduced resolution.
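As a rough analogue of this experiment (on Scikit-learn’s 8×8 digits rather than the paper’s datasets), downsampling images before training shows the same pattern of largely preserved accuracy at a fraction of the feature count:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy illustration: downsample 8x8 digit images to 4x4 by 2x2 block
# averaging, then compare classification accuracy at both resolutions.
X, y = load_digits(return_X_y=True)
low = X.reshape(-1, 4, 2, 4, 2).mean(axis=(2, 4)).reshape(-1, 16)

acc = {}
for name, feats in (("8x8", X), ("4x4", low)):
    Xtr, Xte, ytr, yte = train_test_split(feats, y, random_state=0)
    acc[name] = LogisticRegression(max_iter=5000).fit(Xtr, ytr).score(Xte, yte)
    print(name, round(acc[name], 3))
```

The 4×4 model trains on a quarter of the features, which is where the time and energy savings come from.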
III-F Multi-core vs. Single-core
To quantitatively determine how the usage of multiple cores affects energy consumption and processing time, we executed k-NN and logistic regression (the algorithms which could make use of multi-core processing) using all four cores on the RPi. Figures 7 and 8 demonstrate the differences in processing time and energy consumption between multi-core and single-core execution for the testing phase of k-NN and the training phase of logistic regression. For both time and energy, there is a significant gap between multi-core and single-core execution. On average, utilizing multi-core functionality reduced the processing time of k-NN and logistic regression by 70% and 42%, respectively. Using multiple cores also translated into a 63% and 60% decrease in energy consumption for the same two algorithms, respectively.
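In Scikit-learn, this choice is exposed through the n_jobs parameter; a minimal sketch on synthetic data (the parallel and serial models produce identical predictions, only time and energy differ):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)

# n_jobs=-1 spreads neighbor searches over all available cores;
# n_jobs=1 pins the work to a single core.
knn_multi = KNeighborsClassifier(n_jobs=-1).fit(X, y)
knn_single = KNeighborsClassifier(n_jobs=1).fit(X, y)

# For logistic regression, n_jobs parallelizes over the per-class
# classifiers when the problem is handled one-vs-rest.
lr_multi = LogisticRegression(n_jobs=-1, max_iter=1000).fit(X, y)
```

Wrapping each fit/predict in a timer (or a power meter such as EMPIOT) reproduces the kind of comparison shown in Figures 7 and 8.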
III-G Design Considerations
In this section, we present our main observations regarding the effect of hardware on performance as well as a set of design guidelines to create real-time IoT systems for the purpose of image classification.
While the trends we identified in the previous sections are consistent across the two hardware platforms, it is important to note the dramatic differences in their individual energy consumption. The RPi not only has a CPU that is 20% faster than the BB, but also boasts nearly twice as much RAM. This difference in hardware results in the RPi completing the experiments much faster than the BB. Consequently, because the BB had to run longer to complete each task, it consumes more energy on average. This conclusion is best demonstrated by Figures 3(e) and 3(f), which display the training phase of logistic regression when a single core is used. The RPi, which could generally train a logistic regression model for more than 2 classes in less than 4 minutes, consumed around 450J at most. By contrast, the BB, which generally took 15 to 19 minutes to train a logistic regression model, consumed nearly 1,400J at most. This trend was consistent across all algorithms. More importantly, the RPi can achieve significantly higher performance than the BB when the ML algorithm utilizes all four available cores. In particular, the rapid proliferation of low-cost, multi-core processors justifies their adoption at the IoT edge to lower energy consumption and enhance real-time processing.
Our guidelines are primarily concerned with balancing performance against energy cost. Foremost, we observe that, for small datasets, it is rarely beneficial to increase image resolution. In most cases, doing so is detrimental to the accuracy of the system, and in all cases there is a significant increase in energy consumption as a result of the additional image analysis. Second, we suggest that dataset size be kept to a minimum. While increasing the training set to tens of thousands of images would likely result in greater accuracy, for the small dataset increments associated with current IoT systems, adding images provides negligible benefit. However, similar to increasing image resolution, increasing the dataset size or dimensionality will always translate into higher energy consumption. Third, we advise that images be captured in greyscale format if possible, since color images often do not lend themselves to sparse matrix representations and thus cannot benefit from the optimizations associated with sparse matrix formats. It should be noted that, in addition to enhancing performance, these methods can also be applied to improve user privacy. For example, low-resolution and sparse images that do not reveal a person’s identity could be captured by thermal cameras to achieve low-power, real-time activity recognition.
IV Modeling and Predicting Energy Consumption
Our experimentation provided us with a sizable amount of data that can be used to model and predict the energy consumption. In this section, we utilize three statistical analysis techniques, discuss the drawbacks and benefits of each, and compare their performance in terms of prediction accuracy.
IV-A Linear Regression
The first statistical analysis technique that we used is linear regression. Linear regression is the most common predictive model for identifying the relationship between independent variables and a dependent variable. Equation 1 represents the multiple linear regression line, in which we have more than one independent predictor variable:

y = β0 + β1·x1 + β2·x2 + … + βp·xp + ε    (1)

This line is fit to the data such that the sum of the squared errors is minimized.
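The least-squares fit of Equation 1 can be sketched in a few lines (synthetic data; the coefficient values are arbitrary):

```python
import numpy as np

# Multiple linear regression: y = b0 + b1*x1 + ... + bp*xp + error,
# fit by minimizing the sum of squared errors (ordinary least squares).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 + X @ np.array([1.5, -0.7, 0.3]) + 0.01 * rng.normal(size=200)

A = np.column_stack([np.ones(len(X)), X])     # prepend an intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares solution
print(np.round(beta, 2))                      # ~ [2.0, 1.5, -0.7, 0.3]
```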
IV-B Gaussian Process
The second technique used is the Gaussian process (GP). The main assumption of GP is that the data is sampled from a multivariate Gaussian distribution. A key benefit of using GP regression for energy consumption prediction is that we can obtain both a predictive mean and a variance. The function-space view shows that a GP is completely specified by its mean function m(x) and covariance function k(x, x′), shown in Equations 2 and 3:

m(x) = E[f(x)]    (2)
k(x, x′) = E[(f(x) − m(x))(f(x′) − m(x′))]    (3)

The GP can then be written as

f(x) ∼ GP(m(x), k(x, x′))
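A minimal GP regression sketch (synthetic one-dimensional data, not our energy measurements) showing the predictive mean and standard deviation that distinguish a GP from a point predictor:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# A GP is specified by a mean function (zero here) and a covariance kernel.
X = np.linspace(0, 5, 20).reshape(-1, 1)
y = np.sin(X).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X, y)

# Unlike a point predictor, a GP returns a predictive mean AND uncertainty.
mean, std = gp.predict(np.array([[2.5]]), return_std=True)
print(round(mean[0], 3), round(std[0], 3))
```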
IV-C Random Forest Regression
In random forest regression, a multivariate input x is used to estimate a continuous label y using the probability density function. Our input contains the following features: device, resolution, number of images, color, number of dimensions, algorithm, and phase of the algorithm. All the features were coded to be categorical. For a feature with v possible values, we created v − 1 binary variables to represent it. For example, for the 3 possible resolutions, we created 2 binary columns such that a resolution of 17×17 is represented by both columns being 0. The full encoding of the features is summarized in Table VII.
Feature      Possible values     Binary variables
Resolution   17, 22, 28          2
Algorithm    k-NN, SVM, Log      1
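This encoding corresponds to dummy coding with the first value as the all-zeros baseline; a hedged sketch using pandas (the column names here are illustrative, not the ones used in our pipeline):

```python
import pandas as pd

# Dummy-code each categorical feature with v values into v-1 binary columns;
# the dropped first value (e.g. resolution 17) becomes the all-zeros baseline.
df = pd.DataFrame({"resolution": [17, 22, 28, 17],
                   "device": ["RPi", "BB", "RPi", "BB"]})
coded = pd.get_dummies(df, columns=["resolution", "device"], drop_first=True)
print(list(coded.columns))
# ['resolution_22', 'resolution_28', 'device_RPi']
```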
Constructing a random forest model can be broken down into the following steps:
1. Using random sampling with replacement, choose n samples from the training data.
2. For each sample, randomly choose m features and construct a decision tree using these features.
3. Repeat steps 1 and 2 B times to generate B decision tree models.
The above process results in a random forest of B trees. To predict the output of a new query point, pass the input to each of the B trees. For regression, the output is the average of the decision tree outputs; for classification, it is the majority class label among the decision tree outputs. There are several benefits to using a random forest. First, it is efficient for large datasets with hundreds of input variables. Second, it does not require data pruning. Lastly, the generated forest can generalize well to data it has not been trained on.
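These steps closely follow what Scikit-learn’s RandomForestRegressor performs internally; a minimal sketch on synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=5.0, random_state=0)

# B trees, each trained on a bootstrap sample with a random feature subset;
# the regression output is the average of the individual tree predictions.
forest = RandomForestRegressor(n_estimators=100, max_features="sqrt",
                               random_state=0).fit(X, y)
print(len(forest.estimators_))  # -> 100
```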
Generating an accurate model using linear regression and Gaussian process regression was unsuccessful because each model makes assumptions that are invalid for our data. For example, linear regression, as the name suggests, requires the relationship between the independent and dependent variables to be linear, which is not guaranteed by our data. After fitting a linear regression model to this data, the predictions were in error, as shown in Table X, especially when we attempted to extrapolate beyond the range of the sample data. Likewise, a Gaussian process also requires certain assumptions about the data: all finite-dimensional distributions must be multivariate Gaussian. Specifically, each observation must be normally distributed. Considering our application, each observation includes a color-dimension feature that takes the value 0 if the image is 2-D or 1 if it is 3-D. Since we only allow a 0 or 1 value for this feature, it immediately precludes the normality assumption imposed by a Gaussian model.
We used k-fold cross-validation to select the random forest that produces the maximum R-squared value and minimum error. Performing k-fold cross-validation with k = 5 or k = 10 has been shown empirically to yield test error rate estimates that suffer neither from excessively high bias nor from very high variance.
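A sketch of the k-fold procedure with k = 10 using Scikit-learn (synthetic regression data, not our measurements):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)

# 10-fold cross-validation: each fold serves once as the held-out set, and
# the averaged R-squared score is used to compare candidate models.
scores = cross_val_score(RandomForestRegressor(n_estimators=50, random_state=0),
                         X, y, cv=10, scoring="r2")
print(round(scores.mean(), 2))
```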
To assess how the proposed prediction model performs on datasets that are not part of the original data, we chose two new datasets and collected data on their energy consumption. The first dataset was drawn from Caltech-256 and was chosen because it contains a more challenging set of object categories. From this dataset, we drew 10 separate, mutually exclusive classes. The images within these sub-datasets are in color and of varying resolutions. The second verification dataset contains images of flowers. We chose this dataset because it contains five classes, a characteristic the random forest was not trained to predict. The images within this dataset are also in color and of varying resolutions. Following the same experimental protocols, we separated the images from each dataset into the standard dataset sizes and resolutions. For both datasets, we chose only the 3-D images because our prior experiments demonstrated that datasets with 3-D images had the highest variations in energy consumption. The characteristics of both datasets are summarized in Table VIII.
Table IX highlights the drastic differences between the three attempted prediction models. Linear regression performed the worst, with the lowest R-squared value across both validation datasets and our original testing set. This table also demonstrates the poor performance exhibited by the Gaussian process model across all datasets. Though it did not perform as poorly as the linear regression model, it did not see the same level of success exhibited by the random forest.
The random forest predicts the energy consumption of the Caltech dataset with an R-squared value of 0.95 and the Flowers dataset with an R-squared value of 0.79. The coefficient of determination, known as R-squared, can be interpreted as the percentage of the variation in the dependent variable y that is explained by variation in the predictors. An R-squared value of 1 indicates that all of the data points fall perfectly on the regression line, meaning the predictors (features such as image size, resolution, etc.) account for all of the variation in y. In general, the closer the R-squared is to 1.0, the better the model.
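For reference, R-squared can be computed directly from its definition:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: share of variance in y explained."""
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(y_true, y_true))                      # perfect fit -> 1.0
print(r_squared(y_true, np.full(4, y_true.mean())))   # mean predictor -> 0.0
```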
The random forest model is capable of ameliorating the high error rates of the two previous models because it can capture the non-linearity in the data by dividing the space into smaller sub-spaces depending on the features. In addition, there is no prior assumption regarding the underlying distribution of the features.
In order to quantitatively measure the random forest’s performance, we examined its RMSE. To place it in the context of the data the model was predicting, the RMSE must be normalized. One such method of normalization involves dividing the RMSE by the range of the observed values to better capture the variety of data points. Table X highlights the range, RMSE, and normalized RMSE separated by algorithm and phase for each dataset. Datasets are coded as follows: original (O), Flowers (F), and Caltech (C).
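The normalization is a one-liner; the values below are hypothetical energy readings, not measurements from Table X:

```python
import numpy as np

def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range of the observed values."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())

# Hypothetical energy readings (J) and model predictions.
y_true = np.array([0.13, 1.2, 3.5, 6.61])
y_pred = np.array([0.20, 1.0, 3.9, 6.00])
print(round(normalized_rmse(y_true, y_pred), 3))
```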
To evaluate the random forest’s performance on the two validation datasets, we took the average of their normalized RMSEs for each algorithm and phase. The average normalized RMSEs for k-NN, SVM, and logistic regression during the training and testing phases are 72.1%, 11.6%, 8.4%, 15.8%, 18.7%, and 69.5%, respectively. Among these values, SVM has the lowest average error rate, at 12.1%. This error is approximately 3.4 times lower than the averages exhibited by k-NN and logistic regression. The model performs poorly for k-NN’s training phase and logistic regression’s testing phase on the verification datasets because of the extreme polarity and variation in the data for these algorithm and phase combinations. For example, for the Flowers dataset, the maximum value for logistic regression testing was 6.61J, while the minimum was 0.13J. This variation may cause the random forest to poorly predict this configuration. As anticipated, the forest outputs the lowest prediction accuracy for the Flowers dataset because the training set for the random forest did not include a dataset with five classes.
(Table IX compares the R-squared values of each ML model on the Flowers, CALTECH-256, and original datasets.)
As IoT systems become increasingly powerful, edge computing has become more practical compared to offloading data processing to cloud platforms. This trend has unlocked enormous potential in sectors focused on real-time measurement, as it allows IoT systems to quickly and reliably process data while consuming less energy overall. This is particularly useful for IoT systems involved in image classification, where the timely processing of data is critical. Our experiments sought to explore the relationships between energy consumption, processing duration, and accuracy versus various parameters, including dataset size, image resolution, algorithm, phase, and hardware characteristics. To reliably identify and study the parameters affecting energy consumption, we benchmarked two IoT devices by running a wide variety of experiments. Our studies show that there are distinct linear relationships between dataset size and energy as well as between image resolution and energy. Choosing a lower resolution dramatically speeds up execution time and reduces energy consumption while maintaining accuracy comparable to a model trained on higher-resolution images. Since energy profiling is a lengthy process that requires an accurate and programmable power measurement tool, we proposed a random forest model that predicts energy consumption given a specific set of features.
While we demonstrated that our model can predict energy consumption with acceptable accuracy for inputs with previously unseen characteristics, it relied on most of the remaining parameters being similar. To address this concern, the model would greatly benefit from additional training on datasets with completely different characteristics. To further increase the versatility of the model, future work could focus on recording data from additional hardware devices to train the model to predict energy consumption across a list of devices that more accurately reflects the diversity of hardware options in the IoT community. Additionally, in its current state, this model can only be used statically. Should a user rely on this model for a specific task, they would have to test each algorithm against that task prior to deployment to find which would consume the least energy. Future work may focus on creating a system that dynamically changes the algorithm based on the real-time data returned from the model. Furthermore, future work could also focus on adding more ML algorithms to the model. Pairing this concept with a model trained on multiple hardware devices would result in a robust system capable of performing tasks optimally and making the best use of its limited computational resources, especially energy.
This research has been partially supported by the Santa Clara Valley Water District (grant# SCWD02) and Latimer Energy Lab. This project involves the development of a flood monitoring system where Linux-based wireless systems, which rely on solar or battery power, capture images for analysis using ML to classify and report the debris carried by rivers and streams.
-  M. Hung, “Leading the IoT, Gartner Insights on How to Lead in a Connected World,” 2018.
-  A. Kumar, S. Goyal, and M. Varma, “Resource-efficient Machine Learning in 2 KB RAM for the Internet of Things,” in International Conference on Machine Learning, 2017, pp. 1935–1944.
-  R. K. Naha, S. Garg, D. Georgakopoulos, P. P. Jayaraman, L. Gao, Y. Xiang, and R. Ranjan, “Fog Computing: Survey of Trends, Architectures, Requirements, and Research Directions,” IEEE Access, 2018.
-  J. Ni, K. Zhang, X. Lin, and X. S. Shen, “Securing Fog Computing for Internet of Things Applications: Challenges and Solutions,” IEEE Communications Surveys & Tutorials, vol. 20, no. 1, pp. 601–628, 2017.
-  J. Lin, W. Yu, N. Zhang, X. Yang, H. Zhang, and W. Zhao, “A Survey on Internet of Things: Architecture, Enabling Technologies, Security and Privacy, and Applications,” IEEE Internet of Things Journal, vol. 4, no. 5, pp. 1125–1142, Oct 2017.
-  J. Wei, “How Wearables Intersect with the Cloud and the Internet of Things : Considerations for the Developers of Wearables,” IEEE Consumer Electronics Magazine, vol. 3, no. 3, pp. 53–56, July 2014.
-  D. Metcalf, S. T. J. Milliard, M. Gomez, and M. Schwartz, “Wearables and the Internet of Things for Health: Wearable, Interconnected Devices Promise More Efficient and Comprehensive Health Care,” IEEE Pulse, vol. 7, no. 5, pp. 35–39, Sept 2016.
-  I. Amirtharaj, T. Groot, and B. Dezfouli, “Profiling and Improving the Duty-Cycling Performance of Linux-based IoT Devices,” arXiv preprint arXiv:1808.10097, 2018.
-  Centre for Energy-Efficient Telecommunications, “The Power of Wireless Cloud,” 2013.
-  C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going Deeper with Convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
-  K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
-  M. Shafique, T. Theocharides, C.-S. Bouganis, M. A. Hanif, F. Khalid, R. Hafız, and S. Rehman, “An Overview of Next-Generation Architectures for Machine Learning: Roadmap, Opportunities and Challenges in the IoT era,” in Design, Automation & Test in Europe Conference & Exhibition. IEEE, 2018, pp. 827–832.
-  A. Ltd, “Machine Learning on Arm — Deploying Neural networks on Android-Based Mobile and Embedded Devices.” [Online]. Available: https://developer.arm.com/technologies/machine-learning-on-arm/developer-material/how-to-guides/optimizing-neural-networks-for-mobile-and-embedded-devices-with-tensorflow/deploying-neural-networks-on-android-based-mobile-and-embedded-devices-single-page
-  W. Cui, Y. Kim, and T. S. Rosing, “Cross-platform Machine Learning Characterization for Task Allocation in IoT Ecosystems,” in Computing and Communication Workshop and Conference (CCWC), IEEE 7th Annual, 2017, pp. 1–7.
-  R. J. Carbajales, M. Zennaro, E. Pietrosemoli, and F. Freitag, “Energy-Efficient Internet of Things Monitoring with Low-Capacity Devices,” in IEEE World Forum on Internet of Things: 14-16 December, Milan, Italy: proceedings. Institute of Electrical and Electronics Engineers (IEEE), 2015, pp. 305–310.
-  N. D. Lane, S. Bhattacharya, P. Georgiev, C. Forlivesi, and F. Kawsar, “An Early Resource Characterization of Deep Learning on Wearables, Smartphones and Internet-of-Things Devices,” in Proceedings of the international workshop on internet of things towards applications. ACM, 2015, pp. 7–12.
-  “Top 10 Open Source Linux and Android SBCs,” Aug 2014. [Online]. Available: https://www.linux.com/news/top-10-open-source-linux-and-android-sbcs
-  “Top 10 Best Open-Spec Hacker SBCs,” Jun 2016. [Online]. Available: https://www.linux.com/news/raspberry-pi-stays-top-survey-81-open-spec-sbcs
-  “BCM2837 ARM Peripherals.” [Online]. Available: https://web.stanford.edu/class/cs140e/docs/BCM2837-ARM-Peripherals.pdf
-  “BeagleBone Black Wireless.” [Online]. Available: https://beagleboard.org/black-wireless
-  B. Cabe, “Key Trends from the IoT Developer Survey 2018,” Eclipse Foundation, 2018.
-  D. Dang, “Reducing the Cost, Power and Size of Connectivity in IoT Designs,” Feb 2018. [Online]. Available: http://www.ti.com/lit/wp/sway013/sway013.pdf
-  B. Dezfouli, I. Amirtharaj, and C.-C. Li, “EMPIOT: An energy measurement platform for wireless IoT devices,” Journal of Network and Computer Applications, vol. 121, pp. 135 – 148, 2018.
-  M. Mahmud, M. S. Kaiser, A. Hussain, and S. Vassanelli, “Applications of Deep Learning and Reinforcement Learning to Biological Data,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 6, pp. 2063–2079, 2018.
-  C. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. Cambridge University Press, 2008. [Online]. Available: https://books.google.com/books?id=t1PoSh4uwVcC
-  F. E. Harrell, “Ordinal Logistic Regression,” in Regression modeling strategies. Springer, 2015, pp. 311–325.
-  Wikipedia contributors, “k-Nearest Neighbors Algorithm,” Wikipedia, The Free Encyclopedia, 2018.
-  Z. Wang and X. Xue, “Multi-Class Support Vector Machine,” in Support Vector Machines Applications. Springer, 2014, pp. 23–48.
-  F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
-  Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-Based Learning Applied to Document Recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, Nov 1998.
-  H. Xiao, K. Rasul, and R. Vollgraf. (2017) Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms.
-  A. Krizhevsky, “Learning multiple layers of features from tiny images,” University of Toronto, Tech. Rep., 2009.
-  D. S. Kermany et al., “Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning,” Cell, vol. 172, no. 5, pp. 1122–1131.e9, 2018.
-  G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, “Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments,” University of Massachusetts, Amherst, Tech. Rep. 07-49, October 2007.
-  O. Veksler, “Nonparametric Density Estimation: Nearest Neighbors, KNN,” lecture notes.
-  E. Griffiths, S. Assana, and K. Whitehouse, “Privacy-preserving image processing with binocular thermal cameras,” Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., vol. 1, no. 4, pp. 133:1–133:25, Jan. 2018. [Online]. Available: http://doi.acm.org/10.1145/3161198
-  G. A. Seber and A. J. Lee, Linear Regression Analysis. John Wiley & Sons, 2012, vol. 329.
-  D. A. Richard, “Lecture Notes from Department of Statistics: Gaussian Processes.”
-  Y. Suter, C. Rummel, R. Wiest, and M. Reyes, “Fast and Uncertainty-Aware Cerebral Cortex Morphometry Estimation Using Random Forest Regression,” in Biomedical Imaging (ISBI), International Symposium on. IEEE, 2018, pp. 1052–1055.
-  Y. Feng and S. Wang, “A forecast for bicycle rental demand based on random forests and multiple linear regression,” in Computer and Information Science (ICIS), 16th International Conference. IEEE, 2017, pp. 101–105.
-  J. K. Jaiswal and R. Samikannu, “Application of Random Forest Algorithm on Feature Subset Selection and Classification and Regression,” in World Congress on Computing and Communication Technologies (WCCCT). IEEE, 2017, pp. 65–68.
-  R. Nau, “Lecture Notes: Testing the Assumptions of Linear Regression.”
-  G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning. Springer, 2013, vol. 112.
-  G. Griffin, A. Holub, and P. Perona, “Caltech-256 Object Category Dataset,” 2007.
-  A. Mamaev, “Flowers Dataset, Reviewed Dataset from Kaggle.”
-  “STAT 501: Regression Models, The Coefficient of Determination, R-squared,” 2018.