Zygarde: Time-Sensitive On-Device Deep Intelligence on Intermittently-Powered Systems

by Bashima Islam, et al.

In this paper, we propose a time-, energy-, and accuracy-aware scheduling algorithm for intermittently powered systems that execute compressed deep learning tasks which are suitable for MCUs and are powered solely by harvested energy. The sporadic nature of harvested energy, the resource constraints of the embedded platform, and the computational demand of deep neural networks (even when compressed) pose a unique and challenging real-time scheduling problem for which no solutions have been proposed in the literature. We empirically study the problem and model the energy harvesting pattern as well as the trade-off between the accuracy and execution of a deep neural network. We develop an imprecise-computing-based scheduling algorithm that improves the schedulability of deep learning tasks on intermittently powered systems. We also exploit the dependence of a deep learning model's computational need on the quality of the data sample and propose early termination of deep neural networks. We further propose a semi-supervised machine learning model that exploits the deep features and contributes to determining the imprecise partition of a task. We implement our proposed algorithms on two different datasets and in real-life scenarios and show that the system increases the accuracy by up to 9.45%, decreases the execution time by 14%, and successfully schedules up to 33% more tasks.





1. Introduction

Batteryless Systems. The Internet of Things (IoT) promises to make our lives efficient, productive, enjoyable, and healthier by making everyday objects capable of sensing, computation, and communication. Many of these so-called IoT devices are powered by limited-capacity batteries, which makes them mobile, small, and lightweight. Batteries, however, require periodic maintenance (e.g., replacement and recharging), which is an inconvenience at a large scale. To address this practical problem, batteryless IoT devices have been proposed, which power up by harvesting energy from ambient sources, e.g., solar, thermal, kinetic, and RF. These devices, in principle, last forever, as long as the energy harvesting conditions are met. They typically consist of low-power sensors, microcontrollers, and energy-harvesting and management circuitry, and they target many deploy-and-forget scenarios, e.g., wildlife monitoring, remote surveillance, environment and infrastructure monitoring, wearables, and implantables.

Time-Aware Inference. Many IoT applications require timely feedback. For instance, in an acoustic monitoring system, audio events need to be detected and reported as fast as possible to initiate prompt actions. Similarly, an air-quality monitoring system needs to identify the increase of a certain air component in time to take proper actions. Likewise, shared resources, such as gym equipment and shared bikes on a campus, can be monitored in real time to detect misuse or malfunction and to inform the authority about the incident on time. While a batteryless system is desirable in these real-time sensing and event detection applications, the unpredictability of the harvested energy, combined with the complexity of on-device event detection tasks, complicates timely execution of machine learning-based event detection tasks on batteryless systems.

Figure 1. (a) With constant power, both deadlines are met. (b) With intermittent power, a task misses its deadline. (c) When execution of the whole task is not necessary, both tasks can be scheduled successfully.

Prior Work on Timeliness. Prior works on time-aware batteryless computing systems can be broadly categorized into two types. The first category focuses on time-keeping, i.e., maintaining a reliable system clock (Rahmati et al., 2012; Hester et al., 2016b) even when the power is out. The sporadic nature of an energy harvesting system forces it to run intermittently by going through alternating episodes of power ON and OFF phases, which disrupts the continuity of the system clock. By exploiting the rate of decay of an internal capacitor and the content of the SRAM, these systems enable time-keeping during the absence of power. The second category proposes runtime systems that consider the temporal aspect of data across power failures (Hester et al., 2017; Buettner et al., 2011; Zhu et al., 2012; Yıldırım et al., 2018). For instance, (Hester et al., 2017) discards data after a predefined interval and thus saves energy by not processing stale data, and (Yıldırım et al., 2018; Buettner et al., 2011; Zhu et al., 2012) propose energy-aware runtime systems to increase the chances of task completion. However, none of these consider the utility of data or exploit the properties of inference tasks to plan real-time, deadline-aware execution of tasks.

Real-Time Intermittent Computing. Scheduling time-aware machine learning tasks on a batteryless computing system is an extremely challenging feat. The two main sources of challenges are the intermittent power supply and the computational demand of machine learning tasks. These two challenges have been studied extensively in non-real-time settings. For instance, (Ransford et al., 2012; Mirhoseini et al., 2013a; Balsamo et al., 2015, 2016; Maeng et al., 2017; Colin and Lucia, 2016; Maeng and Lucia, 2018) enable seamless execution of non-real-time tasks on intermittently powered systems by proposing techniques that save and restore the program state across power failures. (Gobieski et al., 2018, 2019; Nirjon, 2018; Islam and Nirjon, 2019a) propose lightweight and compressed deep neural network inference for on-device machine learning on batteryless systems. However, none of these works consider the timing constraints of the machine learning tasks. In a real-time setting, simply applying these two types of solutions in conjunction with an existing real-time scheduling algorithm does not quite solve the problem at hand, as illustrated in Figure 1. We consider two tasks, released at times 0 and 25 and with deadlines 45 and 56, respectively; both have an execution time of 28. In Figure 1(a), we observe that under earliest deadline first (EDF) scheduling, both tasks meet their deadlines when the power is uninterrupted. But when power is intermittent, Figure 1(b) shows that the second task misses its deadline.

Observations. The goal of this paper is to overcome the aforementioned challenges. Towards this end, we study the energy harvesting pattern and the accuracy-execution trade-off of compressed deep neural networks (DNNs) that are executable on small systems (Gobieski et al., 2019). From these studies, we make two observations. First, energy generated by a harvester is bursty, and therefore its energy harvesting pattern can be modeled using a stochastic framework over a short duration. Here, burstiness indicates that energy generation is maintained during a short period. Second, since deeper layers of a DNN extract fine-grained and more detailed features of the input, for a given accuracy, the amount of DNN computation required for an input sample depends on the quality of the data itself.

The Zygarde Approach. By exploiting these two observations, we design an imprecise-computing-based (Shih and Liu, 1992; Liu et al., 1991) online scheduling algorithm that considers both the intermittent nature of the power supply and the accuracy-execution trade-off of the DNN model. This design allows us to increase both the accuracy and the timeliness of DNN execution on intermittent systems. For example, in Figure 1(c), when a full execution of a task is not necessary, i.e., tasks are imprecise, both tasks can meet their deadlines. Our work complements previous work on time-keeping (Rahmati et al., 2012; Hester et al., 2016b) and intermittent execution of non-real-time tasks (Gobieski et al., 2018, 2019; Ransford et al., 2012; Mirhoseini et al., 2013a; Balsamo et al., 2015, 2016; Maeng et al., 2017; Colin and Lucia, 2016; Maeng and Lucia, 2018). We extend the state-of-the-art intermittent DNN execution framework, SONIC (Gobieski et al., 2019), by implementing a runtime framework that supports intelligent scheduling of real-time DNN tasks.

To enable time-aware adaptive deep learning on intermittently powered systems, we make three key technical contributions. First, we devise a predictability factor to model the predictability of an energy harvester's source. This metric indicates the probability of a harvester maintaining its current state over a short period in time. It abstracts away the unpredictability of an energy harvesting source and enables the development of scheduling algorithms that can make informed decisions based on predicted energy over a short period in the future. Second, we redesign the DNN construction and training algorithms to enable early termination of a DNN task based on the quality of the input data. To enable this, we propose a layer-aware loss function to improve the accuracy of a clustering-based, semi-supervised inference algorithm that uses DNN layer outputs as the representation of the input examples. Third, we propose an imprecise-computing-based online scheduling algorithm that improves the timeliness of DNN inference tasks running on energy harvesting systems. This algorithm leverages the predictability factor of the energy source, along with the properties of the input data, to adapt the execution of real-time DNN tasks.

Main Results. We implement the system on a TI MSP430FR5994 and evaluate its performance using two datasets, MNIST (LeCun et al., 1998) and ESC-10 (Piczak, 2015), as well as in real-world acoustic event detection experiments. We achieve 3.19%-9.45% higher accuracy than the baseline DNN algorithms for MCUs and a 14% reduction in execution time. The proposed scheduling algorithm successfully schedules 12%-33% more tasks than traditional scheduling algorithms.
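The Figure 1 scenario can be reproduced with a small discrete-time preemptive-EDF simulation. This is a sketch: the task parameters match the example above, but the power-outage interval is chosen for illustration only.

```python
def edf_sim(tasks, power_on):
    """tasks: list of (release, deadline, exec_time); power_on(t) -> bool.
    Returns each task's finish time under preemptive EDF, or None if it
    never completes before the simulation horizon (i.e., misses its deadline)."""
    remaining = [e for (_, _, e) in tasks]
    finish = [None] * len(tasks)
    for t in range(max(d for (_, d, _) in tasks)):
        ready = [i for i, (r, d, _) in enumerate(tasks)
                 if r <= t < d and remaining[i] > 0]
        if ready and power_on(t):
            i = min(ready, key=lambda i: tasks[i][1])  # earliest deadline first
            remaining[i] -= 1
            if remaining[i] == 0:
                finish[i] = t + 1
    return finish

tasks = [(0, 45, 28), (25, 56, 28)]               # release, deadline, execution time
print(edf_sim(tasks, lambda t: True))              # constant power: both meet deadlines
print(edf_sim(tasks, lambda t: not 10 <= t < 25))  # a 15-slot outage: second task misses
```

With constant power both tasks finish in time; inserting one outage starves the second task even though total energy would have sufficed, which is exactly the failure mode Figure 1(b) illustrates.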

2. Motivation

On-Device Learning. Resource-constrained sensing and inference systems enjoy the benefit of machine learning in two ways: either they send raw or partially processed sensor readings to a remote server for inference, or they do everything on device. While most low-power IoT systems largely use the former method (de Godoy et al., 2018; Shao et al., 2018; Choudhury et al., [n. d.]; Chandrasekaran et al., 2016; Xia et al., 2019), in recent years we see an increasing trend of on-device machine learning on embedded systems (goo, 2017a; Blog, 2018; Ravi, 2017; goo, 2017b). This can partly be attributed to the limitations of server-based systems, which are generally energy-demanding, slow, less reliable, and privacy invasive. The other reason behind this increasing trend is the advancement in hardware and software technologies (qua, 2017a, b; Edg, 2018; app, 2018; Qua, 2017; goo, 2017a) that are bringing powerful machine learning features to small and low-power systems.

Learning on Batteryless Systems. Previous works have addressed the need for computation, including inference of deep neural networks, on energy-harvesting devices (Gobieski et al., 2018, 2019; Ransford et al., 2012; Colin and Lucia, 2016; Maeng et al., 2017). However, the necessity of updating the models, and of timeliness, is yet to be explored. As no two scenarios or people are truly alike, the need for a customized model is inevitable. Besides, the models of a forever-executing batteryless system become outdated with time. To address this issue, on-device learning (including training) has been introduced (goo, 2017a), which improves accuracy and provides a personalized system that ensures privacy, lower latency, and reliability. Moreover, most batteryless systems are deployed at unreachable places (e.g., deep jungles, calamity-prone areas) where a traditional power source is absent and replacing a battery is unrealistic. It is also not feasible to collect data and update the model on-site in such cases. With time, such devices observe a massive amount of information, which is likely to be wasted due to the inefficiency of data transmission. To fully utilize the potential of batteryless devices, on-device training of the models is needed. However, the small footprint of batteryless devices works against the extra computational and energy load imposed by training models, and the unavailability of labeled data obstructs complicated adaptation techniques (e.g., backpropagation). Lightweight semi-supervised learning algorithms need to be considered to evolve previously trained models with the incoming data stream.

Deep Inference.

Due to their non-linear, parametric structure, deep neural networks (DNNs) exhibit better performance than other traditional models, e.g., support vector machines (SVMs). Here, DNN refers to neural networks with more than one hidden layer. Sending the inference result instead of the raw data is more energy efficient, and high accuracy is essential to maintain the system's usability. (Gobieski et al., 2019) shows that inference accuracy determines the end-to-end system performance.

Time-Aware Inference. The higher accuracy of DNNs is achieved through more rigorous computation, which results in higher execution and response times. For a usable system, the response time needs to be tolerable (Zhu et al., 2012; Moser et al., 2006, 2007); high delay hampers a system's usability regardless of accuracy. Though some works have addressed real-time requirements for batteryless sensor nodes, the accuracy of the system itself has not been considered yet. Precision and responsiveness are both crucial for the usability of a learning system, yet they pull in opposite directions; the goal is to find the sweet spot where the highest accuracy can be achieved with acceptable delay. Such time-aware applications include event detection and monitoring of wildlife, natural calamities, wearables, implantables, infrastructure, and buildings. To illustrate, an acoustic event detector at home enables home activity monitoring, intruder detection, and elderly monitoring. Similarly, gym equipment and shared bike usage can be monitored using kinetic energy harvesting systems, which can inform the authority about required maintenance.

3. System Modeling

In this section, we study and model the energy harvesting pattern of energy harvesters and the accuracy-execution trade-off of deep neural networks.

3.1. Modeling Energy Harvesting Pattern

Energy Events. Transiently powered systems operate intermittently because energy is not always available to harvest and, even when energy is available, buffering sufficient energy to perform adequate work takes time. In most cases, the pattern of this intermittency is stochastic, and thus modeling this pattern is not straightforward. To schedule the workload of an intermittently operating system at run-time, we decide at each time instant whether or not to start executing a task. This decision heavily depends on the availability of harvestable energy. To model the availability of energy, we define an energy event, which expresses the availability of sufficient energy during a period: an energy event represents a successful generation of at least K Joules of energy in total during a time slot of length T. Here, K and T are system dependent. In order to better understand the properties of energy events, we observe the phenomena causing them. For example, in a piezo-electric harvester, taking a minimal number of steps that generates at least K Joules of energy during a time slot is considered equivalent to the occurrence of an energy event. Similarly, we consider a minimal number of packet transmissions per time slot and a minimum solar intensity per time slot as energy events for RF and solar harvesters, respectively.

Properties of Energy Events. We study the energy event patterns of three commonly used harvesters, a piezo-electric harvester, a solar harvester, and an RF harvester, from datasets (Meyer et al., 2016; ipd, [n. d.]). These datasets contain the number of steps taken during every 5-minute time slot for 61 days, harvested solar energy measurements for three days, and the outbound packet transmission rate of an RF transmitter for 30 days. This study reveals two interesting observations about the pattern of energy events: (1) energy events occur in bursts, where burstiness is the intermittent increase and decrease in activity or frequency of an event (Akbarpour and Jackson, 2018), and (2) a probabilistic relation exists among consecutive energy events during a short period. In other words, the occurrence of an energy event increases the probability of the next energy event during a short period. To illustrate, when a person starts walking, the probability of continuing the walk is high within the first few time slots and decreases with time. Likewise, when a person is sitting, the probability of remaining seated is high immediately, but decreases after a while.

Conditional Energy Event. We define the conditional energy event (CEE), which represents the conditional probability of an energy event occurring based on the occurrence/absence of the previous consecutive energy events. CEE(N) is the probability that an energy event will occur given that the immediately preceding N consecutive energy events occurred (for N > 0) or did not occur (for N < 0). Formally, with E_t ∈ {0, 1} denoting whether an energy event occurs in slot t:

CEE(N) = P(E_t = 1 | E_{t-1} = 1, ..., E_{t-N} = 1) for N > 0,
CEE(-N) = P(E_t = 1 | E_{t-1} = 0, ..., E_{t-N} = 0) for N > 0.

To illustrate, CEE(10) = 90% implies that the next energy event will occur with 90% probability if the 10 immediately preceding consecutive energy events occurred. Similarly, CEE(-15) = 5% indicates that the probability of an energy event at the current time slot is 5%, given that there were no energy events in the last 15 slots.
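The CEE statistic defined above can be estimated directly from a binary trace of per-slot energy events (1 = event, 0 = none); a minimal sketch on a synthetic trace:

```python
def cee(trace, n):
    """Estimate CEE(n): P(event at t | previous |n| slots all had events (n > 0)
    or all had no events (n < 0)). Returns None if the condition never occurs."""
    assert n != 0
    k, want = abs(n), 1 if n > 0 else 0
    hits = total = 0
    for t in range(k, len(trace)):
        if all(trace[t - i] == want for i in range(1, k + 1)):
            total += 1
            hits += trace[t]
    return hits / total if total else None

trace = [1] * 12 + [0] * 20 + [1] * 8 + [0] * 15   # synthetic bursty trace
print(cee(trace, 3))    # P(event | 3 preceding events)   -> 0.875
print(cee(trace, -3))   # P(event | 3 preceding non-events), close to 0
```

On this bursty trace the estimate behaves as the text describes: a run of recent events makes the next event likely, while a run of empty slots makes it unlikely.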

Figure 6. CEE curves, with N = 20 used for calculating KW and the predictability factor: (a) persistent power source (KW = 0, factor = 1); (b) piezo-electric harvester (KW = 0.17, factor = 0.65); (c) solar harvester (KW = 0.06, factor = 0.84); (d) RF harvester (KW = 0.11, factor = 0.76).

The CEE of a system powered by a persistent power supply, or by an ideal harvester that has no intermittence, looks like Figure 6(a). Figure 6(b-d) shows the CEE of three energy-harvested systems. From these figures, we observe that for small values of N these systems demonstrate similarity with the ideal harvester. We measure the similarity between the CEE of a harvested system and that of a persistently powered or ideal harvested system using the Kantorovich-Wasserstein (KW) distance (Ramdas et al., 2017). Throughout this paper, we use KW to denote the Kantorovich-Wasserstein distance between the CEE of a system (H) and the CEE of a persistently powered system (P). We also observe that for large N the CEE drops significantly, because as the interval between the first and the current event grows, their probabilistic relation weakens. For example, a person who has been walking for a long time has a high probability of stopping.
Predictability Factor. To quantify the predictability of a harvester, we take inspiration from (Srinivasan et al., 2008) and define a predictability metric. Despite being informative, the KW distance alone is not sufficient to measure predictability, because it does not address the imbalance between the number of elements in the CEE with positive and negative N values. To address this, we quantify the predictability factor of a harvester (H) as its distance from a persistently powered system relative to the distance of a random harvester (R) with the same energy event rate from a persistently powered system, where a random harvester is one whose energy events are completely independent. That is, the factor can be expressed as 1 - KW(H, P) / KW(R, P).

A factor of 1 indicates that the power is persistent, while a factor of 0 indicates that energy events are totally random. This metric also depends on the number of consecutive energy events considered; as CEE(N) is close to zero for high N, we consider small N throughout the paper. Figure 6(b-d) shows the KW distance and the predictability factor with N = 20. The factor not only varies across different harvesters but also changes over time for a specific harvester. For transiently powered devices this change depends on different parameters, e.g., change of human weight, change of seasons or locations, or the distance between the RF transmitter and receiver. However, because these changes are infrequent, they can be ignored.
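As a rough sketch of how such a factor could be computed: the 1 - distance-ratio form is one plausible reading of the definition above, and a mean absolute difference between CEE curves stands in here for the paper's KW distance computation, which is not reproduced in this copy.

```python
def curve_dist(p, q):
    """Mean absolute difference between two CEE curves over the same N values;
    a simple stand-in for the Kantorovich-Wasserstein distance."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)

def predictability(cee_h, cee_p, cee_r):
    """1 for persistent power, 0 for a fully random harvester
    (assumed form: 1 - d(H, P) / d(R, P))."""
    return 1.0 - curve_dist(cee_h, cee_p) / curve_dist(cee_r, cee_p)

cee_p = [1.0] * 8                                     # persistent source: CEE(N) = 1
cee_r = [0.5] * 8                                     # random harvester, 50% event rate
cee_h = [0.9, 0.85, 0.8, 0.8, 0.75, 0.7, 0.7, 0.65]   # a bursty harvester
print(predictability(cee_p, cee_p, cee_r))  # 1.0: persistent
print(predictability(cee_r, cee_p, cee_r))  # 0.0: random
print(predictability(cee_h, cee_p, cee_r))  # between 0 and 1
```

The two endpoint checks match the definition in the text: a persistent source scores 1, a fully random one scores 0, and real harvesters fall in between.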

3.2. Modeling Deep Neural Network

In order to execute machine learning tasks (e.g., deep neural network (DNN) inference with convolution and fully connected layers) on a resource-constrained batteryless system, we need to minimize memory and computation costs. To achieve this goal, we study several attributes that are unique to deep learning.

Significance of Depth. DNNs have layered structures where the input of the first layer comes from an external source, e.g., sensors. The output of each layer is fed as the input to the next layer, and so on until the end of the network. The total number of layers in a DNN is called its depth. A shallow neural network, e.g., an early perceptron, is composed of one input layer, at most one hidden layer, and one output layer; a neural network having more than three layers (including input and output) qualifies as a "deep" neural network. Increased numbers of layers and neurons both contribute to more complicated computation, resulting in higher accuracy (He et al., 2016). Moreover, a shallow network requires width exponential in that of a deeper network to achieve similar accuracy. Therefore, the performance of a neural network depends not only on the number of parameters but also on its depth. For example, VGGNet has 16 layers with about 140M parameters, while ResNet beats its performance with 152 layers and far fewer parameters.

Figure 7. Example of how deep learning learns layers of features.

The depth of a DNN is significant because at each layer the nodes train on a distinct set of features based on the output of the previous layer. The complexity and abstraction of the features increase with depth, which is known as the feature hierarchy. Deep learning can extract features from data without human intervention; this automatic feature extraction is known as representation learning. To understand the effect of the depth of a DNN, consider the face detector in Figure 7(a). The first layer of this deep-learning-based face detector learns basic features, e.g., edges. The next layer learns collections of edges, e.g., a nose. A deeper layer learns a higher-level feature, e.g., a face abstraction.

Figure 9. Hard data requires complex representation to achieve similar accuracy.

Required Depth. To decrease the execution time, we exploit the fact that the required depth is highly data-dependent (Bolukbasi et al., 2017). If the target classes are profoundly distinctive, then simple features can be used to distinguish them. For example, in Figure 9, audio of cat and water (easy data) is very distinguishable; thus a single-layer CNN achieves 93% classification accuracy. On the other hand, similar classes, e.g., train and helicopter (hard data), need more complex representations to be distinct and thus require five CNN layers to achieve 81% accuracy. Representation learning uses a deep neural network to extract features from raw data; by executing only the necessary layers based on the complexity of the data, we can achieve similar accuracy with decreased execution time. We consider the execution of each data sample as a task and capture the required depth with an imprecise task model (Shih and Liu, 1992; Liu et al., 1991; Canon et al., 2018). Each imprecise task consists of two portions: mandatory and optional. The mandatory portion of a task is necessary to achieve the required accuracy, while executing the optional portion further improves performance. We consider the required depth for a data sample as the mandatory portion and the rest as optional.

DNNs for Ultra-Low-Power Systems. Fitting neural networks into resource-constrained energy harvesting systems is a challenge, especially due to limited memory capacity. The processors most commonly used in existing intermittent systems are TI MSP430 low-power microcontrollers (Colin et al., 2018; Hester et al., 2016a; Gobieski et al., 2019; Hester et al., 2015; Hester and Sorber, 2017), which include 1-4KB of SRAM and 32-256KB of FRAM. Therefore, only small compressed networks are suitable for these systems. The compressed networks mentioned in Table 1 and in previous work (Gobieski et al., 2019) have 10,411-48,136 parameters. For a 16-bit fixed-point data type, these parameters require 20-96.3KB of memory, whereas the same networks require 180-943.66KB of memory without compression. Such networks can be executed on batteryless systems only after compression. Larger networks, e.g., ResNet (image) and EnvNet (audio), require 5MB and 94MB of memory, respectively, which does not suit small-memory-footprint devices even after compression. We execute the compressed MNIST network on a TI MSP430 to estimate the size of the code and other variables in FRAM, and observe that the instructions and other parameters (including a buffer to perform matrix operations) require around 128KB of memory. Though an increasing number of parameters increases the buffer size, for simplicity we ignore this. The remaining memory can store approximately 64,000 16-bit fixed-point parameters, so a deep neural network that requires fewer than 64,000 parameters can execute on a microcontroller with 256KB of FRAM. The number of hidden layers and the number of neurons in each layer depend on the number of inputs and outputs (Heaton, 2011); therefore, very large datasets, e.g., ImageNet, are not suitable for such systems. These conditions are sufficient for executing 4-6 layer networks, depending on the network configuration (Gobieski et al., 2019).

This paper considers the two networks summarized in Table 1. MNIST (LeCun et al., 1998) represents image-based applications, and ESC-10 (Piczak, [n. d.]) represents audio applications. We use two known techniques, rank decomposition or separation (Bhattacharya and Lane, 2016; Chollet, 2017; Xue et al., 2013; De Lathauwer et al., 2000a, b; Tucker, 1966) and pruning (Han et al., 2015; Nabhan and Zomaya, 1994), to compress each layer of the networks. Note that our semi-supervised models avoid the last layers for inference.
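The required-depth idea, running only as many layers as a sample needs before a confidence test passes, can be sketched without any DNN library. The toy layers, the two class centroids, and the margin-based confidence rule below are illustrative stand-ins, not the paper's exact utility function.

```python
def early_exit_infer(x, layers, classify, confident, mandatory):
    """Run the layers in order; once the mandatory depth is reached, stop as
    soon as the classifier is confident about the current features."""
    feats = x
    for depth, layer in enumerate(layers, start=1):
        feats = layer(feats)
        if depth >= mandatory:                     # mandatory portion is done
            label, margin = classify(feats)
            if confident(margin) or depth == len(layers):
                return label, depth                # early (or final) exit
    return classify(feats)[0], len(layers)

# Toy demo: scalar "features", two class centroids at 0 and 1 (all hypothetical).
layers = [lambda v: v * 0.5, lambda v: v * 0.5, lambda v: round(v)]

def classify(v):
    d0, d1 = abs(v - 0.0), abs(v - 1.0)
    return (0 if d0 < d1 else 1), abs(d0 - d1)     # label, margin between centroids

print(early_exit_infer(3.2, layers, classify, lambda m: m > 0.9, mandatory=1))
print(early_exit_infer(1.4, layers, classify, lambda m: m > 0.9, mandatory=1))
```

An "easy" input exits after one layer, while a "hard" one near the decision boundary runs all three, mirroring the cat/water versus train/helicopter example above.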

Dataset  Layer                        Compressed
MNIST    Convolution 20x1x5x5         3 1-D convolutions
MNIST    Convolution 100x20x5x5       1253
MNIST    Fully Connected 200x1600     5456
MNIST    Fully Connected 500x200      1892
MNIST    Fully Connected 10x500       -
ESC-10   Convolution 16x1x5x5         3 1-D convolutions
ESC-10   Convolution 32x16x5x5        1280
ESC-10   Convolution 64x32x5x5        5068
ESC-10   Fully Connected 96x256       2703
ESC-10   Fully Connected 10x96        -
Table 1. Networks considered in this paper.
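The memory budget discussed above reduces to simple arithmetic; the 256KB FRAM capacity, the 128KB code-plus-buffer footprint, and the 16-bit parameter width are the figures reported in the text.

```python
FRAM_KB = 256          # TI MSP430FR5994 FRAM capacity
CODE_KB = 128          # measured footprint of instructions + matrix buffer (from text)
BYTES_PER_PARAM = 2    # 16-bit fixed-point parameters

max_params = (FRAM_KB - CODE_KB) * 1024 // BYTES_PER_PARAM
print(max_params)      # 65536, i.e., the ~64,000-parameter budget stated above
```

Both networks in Table 1 fall well under this budget after compression, which is why they fit where uncompressed variants (180-943.66KB) do not.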

4. Zygarde System Design

Zygarde is a system architecture that executes semi-supervised deep learning with timing constraints on an intermittently powered system. It uses a deep neural network, pre-trained on a high-end device, to extract complex features from data samples. Zygarde adapts to new unlabeled incoming data by using semi-supervised models, e.g., seed-based k-means (Basu et al., 2002), in which the initial centroids are defined from labeled data in the training phase and the unlabeled data updates these centroids at runtime. Zygarde relies on imprecise computing to maximize the number of samples meeting timing constraints on a batteryless platform. Zygarde aims to achieve three goals simultaneously: (1) minimize classification error, (2) maximize the number of samples meeting time constraints, and (3) minimize energy waste. It addresses these goals by terminating the deep feature-extraction network early when appropriate and by scheduling the samples with a special online scheduling algorithm for batteryless systems that considers time, classification error, and availability of energy simultaneously.
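Seed-based k-means, with centroids initialized from labeled seeds and then nudged by unlabeled samples at runtime, can be sketched as follows. The incremental running-mean update is one common choice, not necessarily the paper's exact rule.

```python
class SeededKMeans:
    def __init__(self, seeds):
        # seeds: {label: list of feature vectors} from the labeled training data
        self.centroids = {lbl: [sum(c) / len(vs) for c in zip(*vs)]
                          for lbl, vs in seeds.items()}
        self.counts = {lbl: len(vs) for lbl, vs in seeds.items()}

    def predict(self, x):
        # nearest centroid by squared Euclidean distance
        return min(self.centroids,
                   key=lambda l: sum((a - b) ** 2
                                     for a, b in zip(x, self.centroids[l])))

    def update(self, x):
        # assign the unlabeled sample to its nearest centroid, then move
        # that centroid toward the sample (incremental mean)
        lbl = self.predict(x)
        self.counts[lbl] += 1
        n, c = self.counts[lbl], self.centroids[lbl]
        self.centroids[lbl] = [ci + (xi - ci) / n for ci, xi in zip(c, x)]
        return lbl

km = SeededKMeans({0: [[0, 0], [0, 1]], 1: [[5, 5], [6, 5]]})
print(km.predict([1, 1]))   # near the label-0 seeds
print(km.update([4, 4]))    # assigned to cluster 1; centroid 1 drifts toward it
```

The incremental update keeps memory constant (only centroids and counts), which is what makes this style of adaptation plausible on an MCU-class device.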

Figure 10. Zygarde system architecture.

4.1. System Components

Zygarde consists of five major components: a task generator, an energy manager, an agile DNN model, a scheduler, and adaptive models.

Task Generator. Zygarde gathers data from sensors (e.g., microphone, camera) and considers each data sample as a task. A task includes inference by the agile DNN model and semi-supervised learning, with adaptation, using the adaptive model for a data sample. To sense, it takes advantage of the analog-to-digital converter (ADC) and direct memory access (DMA), which writes sensor data to non-volatile memory without occupying the CPU. Each task (data sample) is pushed into the task queue upon arrival and contains two portions: mandatory and optional. Only tasks in the task queue are considered for execution, and a task leaves the queue at the end of execution or at its deadline.

Energy Manager. The energy manager monitors the status of the energy storage (e.g., a capacitor) and the energy harvesting rate. To measure the energy harvesting rate, it relies on the system operating voltage and the voltage across the capacitor. These parameters are fed to the scheduler, which determines whether to execute a task. When the energy is less than a minimum threshold, a power failure occurs and nothing gets executed. We use SONIC (Gobieski et al., 2019) to handle intermittent execution of tasks in this system.

Agile DNN Model. The agile DNN model is a pre-trained feature-extraction deep neural network. It is trained with labeled data on a high-end device (e.g., a server or GPU) to extract features from data for semi-supervised learning. We compress the trained network using rank decomposition and separation to fit it into memory-constrained systems. To achieve better classification/clustering accuracy in the earlier layers of the network, we propose a layer-aware loss function whose goal is to extract distinctive features in the early stages of the network when possible.

Adaptive Models.

Adaptive models are a set of seed-based k-means models. They classify the sensor data using the features extracted by the agile DNN model. To select only useful features and to decrease the model size, these models use the features with the highest Chi-squared statistics. The adaptive models are incrementally updated to evolve with new data and adapt to a dynamic environment. They are also used to determine the confidence of accuracy (utility) that decides whether to exit the DNN layers early.

Scheduler. The scheduler decides which task to process and partitions each task based on utility, where the utility decides whether the data sample requires further processing for a more confident decision. The scheduler uses the mandatory and optional segments, the achieved confidence (utility), and the energy status from the energy manager to decide which data sample to process. When a function of the system's current energy and the predictability factor is less than a threshold, Zygarde becomes conservative in its choice for execution and considers only the mandatory portions of tasks; this decision reflects the high probability of low energy harvesting and a possible power failure in the near future. Otherwise, Zygarde considers both mandatory and optional portions for execution.
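The energy gate described above can be expressed as a small predicate. The product form of the "function of current energy and predictability" and the numeric values below are illustrative assumptions, not the paper's exact formulation.

```python
def eligible_portions(energy_now, predictability, threshold):
    """Decide which task portions the scheduler may run in this slot.
    A simple product stands in for the 'function of current energy and
    predictability' mentioned in the text (an assumption)."""
    if energy_now * predictability < threshold:
        return ("mandatory",)              # conservative: power failure likely soon
    return ("mandatory", "optional")

print(eligible_portions(0.3, 0.6, 0.25))   # 0.18 < 0.25: mandatory only
print(eligible_portions(0.8, 0.9, 0.25))   # enough predicted energy: both portions
```

The point of the gate is that optional (accuracy-improving) work is attempted only when the harvester is predicted to keep delivering energy.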

Table 2. Task Description
4 1 3
4 2 2

Figure 11. Example of execution of Zygarde.

Table 3. Description of Figure 11
Time | Reason of the Action
No task in the system.
(the only task) gets scheduled.
Since < , the optional portion is not scheduled.
The system prioritized over (see Section 5).
Since < , no task is scheduled.
The system prioritized the mandatory over the optional .
Only optional tasks remain and > . The system prioritizes over due to its tighter deadline.
(the only task) gets scheduled.
(the only task) gets scheduled.

4.2. Example Execution

We describe a simple workload consisting of the real-time inference of two data samples and demonstrate how Zygarde executes it. Table 2 describes the tasks, where refers to task i. Figure 11 demonstrates the execution of tasks and along with the energy status. Here, refers to layer of task i. Table 3 lists the actions taken at each time step and the reasons for them. Note that this example uses simplified assumptions (e.g., each layer requires a single time unit to execute). The algorithm in Section 5 handles the full complexities (e.g., different execution times per layer, multiple time units per layer, an unknown number of mandatory layers, and power failure during layer execution).

5. Real-Time Scheduler

This section describes a generic real-time task scheduler for intermittently powered systems where each task executes as a chain of sequential subtasks that can be partitioned into mandatory and optional parts. We define the task model and task prioritization metric, and describe the scheduling algorithm.

Figure 12. Task model for task

5.1. Task Model

Define Task. We define each data sample entering Zygarde as an imprecise task111According to the definitions in the real-time systems community, we should call each data sample a job. However, as each task has only one job in this system, we use task for each data sample for simplicity. (Shi et al., 2009; Liu et al., 1991), . Data samples enter Zygarde sporadically, and multiple data samples/tasks can exist at any point in time. The task is defined as , where and are the deadline of task and the execution time of subtask of task . The subtasks maintain a strict precedence order. An imprecise task is divided into two portions – mandatory and optional. According to the definition of imprecise scheduling, a task is considered schedulable if its mandatory portion executes successfully within the deadline (Shi et al., 2009; Liu et al., 1991). Figure 12 shows the task model of a task . Each subtask consists of multiple units that execute atomically in a batteryless system while maintaining the precedence order. A unit is similar to a task in the task-based intermittent models of (Gobieski et al., 2019, 2018; Colin and Lucia, 2018; Maeng et al., 2017), which must restart if power fails before it finishes. The scheduler of Zygarde works with the subtasks, while the units are managed by SONIC (Gobieski et al., 2019).

Utility and Runtime Task Partitioning. The utility is an application-specific parameter that indicates the system's goal. For example, in a control system, the completion of a correct control task yields maximum utility. The utility of a task has a non-decreasing and non-linear correspondence with the execution of each subtask. The mandatory portion of a task contains the subtasks that must execute to achieve a minimum utility; the remaining subtasks belong to the optional portion.
Unlike traditional imprecise computing (Shi et al., 2009; Liu et al., 1991), where the number of subtasks in the mandatory portion is known in advance, the number of subtasks in the mandatory portion of a Zygarde task is determined at run-time. We call such an imprecise computing model Dynamic Imprecise Computing.

Preemption and Task Switching. Each subtask performs a semantically integrated operation and cannot be preempted by the scheduler. However, the scheduler is allowed to preempt a task at the end of each subtask. This task model follows the cooperative preemption model at the subtask level (Musliner et al., 1993). Note that a subtask can be preempted by a power failure via the energy manager, but this is unrelated to the scheduler.
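To make the task model concrete, the following minimal sketch (names and fields are our own illustration, not the paper's implementation) captures a chain of sequential subtasks whose mandatory/optional boundary is discovered only at run time:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ImpreciseTask:
    """Sketch of a dynamic imprecise task: a chain of sequential subtasks
    whose mandatory/optional boundary is discovered only at run time."""
    deadline: float                      # absolute deadline of the task
    subtask_times: List[float]           # execution time of each subtask, in order
    completed: int = 0                   # subtasks finished so far
    mandatory_end: Optional[int] = None  # set once utility crosses the threshold

    def in_mandatory(self) -> bool:
        # until the utility threshold is reached, all remaining subtasks
        # are treated as potentially mandatory
        return self.mandatory_end is None or self.completed < self.mandatory_end

    def remaining_mandatory_time(self) -> float:
        # worst-case time still needed to reach schedulability
        end = len(self.subtask_times) if self.mandatory_end is None else self.mandatory_end
        return sum(self.subtask_times[self.completed:end])
```

A scheduler can compare `remaining_mandatory_time()` against the time left until `deadline` to decide whether a task is still worth keeping in the queue.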

5.2. Scheduling for Persistent Systems

Before introducing the scheduling algorithm for an intermittently powered system, we discuss the scheduling algorithm for a persistently powered system. We consider that the CPU utilization (Yuan and Nahrstedt, 2003) can exceed one, because in an intermittent system power failures virtually block the CPU and increase the utilization. Since theoretically no scheduler can schedule all tasks when the CPU utilization is greater than one (Yuan and Nahrstedt, 2003), our goal is to maximize the number of tasks that can be scheduled. To schedule dynamic imprecise tasks online, we propose a priority function () that considers both the deadline and the utility of tasks, as well as the effect of the mandatory and optional portions, which are crucial for imprecise scheduling. We define the priority function as follows:


Here, and are the deadline and the current utility of the task, respectively. is the current time, and and are scaling factors. Finally, is the imprecise factor, which indicates whether a task is currently executing a mandatory subtask or an optional one. The following equation expresses the imprecise factor ().


Here, is the utility threshold that indicates the end of the mandatory portion. The imprecise factor guarantees that mandatory portions take precedence over optional portions.
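Because the symbols of the two equations above were lost in this rendering, the following Python sketch shows one plausible shape consistent with the description: a deadline-urgency term plus a utility-need term, with the imprecise factor lifting every mandatory subtask above every optional one. The function names, scaling factors, and the additive "boost" encoding of the precedence are our assumptions, not the paper's exact formula.

```python
def imprecise_factor(utility, utility_threshold):
    # 1 while the task is still in its mandatory portion (utility below the
    # threshold), 0 once it has crossed into the optional portion
    return 1 if utility < utility_threshold else 0

def priority(deadline, utility, now, alpha=1.0, beta=1.0,
             utility_threshold=0.5, boost=1e6):
    # urgency grows as the deadline approaches; need grows while utility is
    # low; the imprecise factor raises all mandatory work above all optional
    # work, mirroring the precedence guarantee described in the text
    urgency = alpha / max(deadline - now, 1e-9)
    need = beta * (1.0 - utility)
    return imprecise_factor(utility, utility_threshold) * boost + urgency + need
```

At each scheduling point, the task with the highest priority value is run next.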

Task Set 1 | Task Set 2
0 7 2 | 45 5 59
25 7 4 | 56 2 69
50 7 3 | 93 1 84
Table 4. Task Description
Figure 13. Subtask execution time
Figure 14. (a) EDF fails to meet imprecise deadline. (b) Priority function meets imprecise deadline (c) EDF meets deadline with 80% accuracy (d) Priority function meets deadline with 88.7% accuracy.

Example. We consider the two sets of three tasks described in Table 4 and take accumulated accuracy as the utility function. Each task has seven subtasks with the execution times shown in Figure 13. Figure 14(a-b) shows the execution of the tasks from set 1 using EDF and the priority-based scheduling algorithm. In Figure 14(a), EDF fails to schedule the second task; the priority-based scheduler, however, schedules all three tasks in Figure 14(b). In Figure 14(c-d) we consider the tasks from set 2. Although both EDF and the priority function succeed in scheduling all the tasks, the accumulated accuracy of the priority-function schedule (88.7%) is higher than that of the EDF schedule (80%).

5.3. Scheduling for Intermittent System

Scheduling an intermittently powered system is challenging due to power failures. In Section 3.1, we introduce a probability metric to measure the predictability of an energy harvester. We use an influenced priority function () to schedule tasks in an intermittent system.


Here, is the current energy generation rate and is the threshold energy generation rate. For an energy harvester with high , we leverage the predictability of energy generation to boost utilization. When is high, we schedule both mandatory and optional subtasks, opportunistically taking advantage of correlated energy-event occurrences. Otherwise, the scheduler accounts for the high probability that no energy event occurs and schedules conservatively, executing only the mandatory subtasks. When   is low, the predictability of energy generation is minimal, so the system schedules conservatively unless the energy generation rate is very high. minimizes two types of energy waste in energy harvesting systems (Buettner et al., 2011). The first is running unnecessary tasks, which we avoid when . The second occurs when tasks are not run while the harvester receives continuous energy from the source to keep the capacitor charged; we handle this waste by running optional subtasks when .
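The energy-influenced decision just described can be sketched as follows. The predictability metric, the threshold names, and the "2x" cutoff standing in for "very high" generation are all illustrative assumptions, since the original symbols were lost in this rendering:

```python
def allowed_portions(predictability, pred_threshold, energy_rate, rate_threshold):
    """Which portions the scheduler may run, per the description in
    Section 5.3. All parameter names and thresholds are assumptions."""
    if predictability >= pred_threshold:
        # predictable harvester: run optional work when energy is plentiful,
        # exploiting correlated energy-event occurrences
        if energy_rate >= rate_threshold:
            return ("mandatory", "optional")
        return ("mandatory",)
    # unpredictable harvester: stay conservative unless generation is very high
    if energy_rate >= 2 * rate_threshold:
        return ("mandatory", "optional")
    return ("mandatory",)
```

The first branch avoids wasting energy on unnecessary optional work; the final branch avoids wasting harvested energy by idling while the capacitor is kept charged.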

5.4. A Special Case: Scheduling Deterministic Intermittent System.

A deterministic intermittent power source is a special type of intermittent source whose energy harvesting pattern is known; e.g., an RF harvester with periodic signal transmission. We propose a simpler approach for such systems: we treat the energy intermittence as a hypothetical periodic task (termed the energy task) with the highest priority, which is pre-scheduled offline. This assumption allows us to schedule tasks with the energy constraints taken into account. Note that we assume the energy task can preempt the conditionally preemptive tasks at any time instant.

Figure 16. (a) The system misses the imprecise deadline of the task by not taking the deterministic intermittent power into account. (b) Both tasks are successfully scheduled by considering the deterministic energy pattern as energy tasks.

In Figure 16, we consider the first two tasks ( and ) from Figure 14(a-b) and a periodic power source with a period of 8 time units. In Figure 16(a), the system does not account for the energy intermittence and misses the imprecise deadline of . By considering the deterministic power source as energy tasks, however, the system schedules both tasks successfully.
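The energy-task idea can be illustrated by computing the execution windows that remain once the deterministic outages are pre-scheduled as the highest-priority periodic task. The function and its parameters are hypothetical; only the 8-unit period echoes the example above:

```python
def available_slots(horizon, period, off_duration):
    """Model deterministic power gaps as a highest-priority periodic 'energy
    task': each period ends with `off_duration` time units of power loss.
    Returns the intervals left over for real tasks."""
    slots, t = [], 0
    while t < horizon:
        on_end = min(t + period - off_duration, horizon)
        if on_end > t:
            slots.append((t, on_end))
        t += period
    return slots

# a source with period 8 and a 3-unit outage leaves 5-unit execution windows
windows = available_slots(horizon=24, period=8, off_duration=3)
```

Ordinary tasks are then scheduled only inside these windows, which is how the constraint is respected in Figure 16(b).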

6. Agile DNN and Model Adaptation

In this section, we first describe the task model for semi-supervised learning with deep neural features. We then describe the construction of the agile DNN model and the approximate adaptation of the adaptive models.

6.1. Agile DNN Task Model

Define Task. We define each data sample entering Zygarde as a task. The execution of one layer of the agile DNN, along with the corresponding k-means model, is defined as a subtask. In a subtask, Zygarde extracts features of the data sample from a specific layer of the agile DNN model, uses those features to execute a semi-supervised k-means model, and updates the centroids. Figure 17 shows the task model of a task, . , and are the subtasks of that represent the , and layers of the DNN, with the corresponding semi-supervised k-means clustering, respectively.

Figure 17. Flow of agile DNN task where , and are different DNN layers (subtasks). , and are the utility of after execution of , and layers.

Early Termination. For an energy-constrained system, computing the accuracy of unlabeled data is expensive. Popular model validation techniques, including inter-intra cluster distance measures, are computationally heavy (Islam and Nirjon, 2019a). Therefore, instead of using accuracy as the utility, we propose a lightweight utility function: we define the utility as the difference between the distances of the data point from its two nearest centroids.


Here, and are the distances of the data sample from the closest and the second-closest centroids, respectively. The intuition behind this definition is that a data point at similar distances from two centroids cannot be confidently assigned to either cluster, so a more complex representation is required to determine the cluster with confidence. In Figure 18, the distances between the data sample and its two nearest centroids, and , are and respectively. In Figure 18(a), the difference between and is very small, so the confidence that the data sample is a member of cluster 3 is low; a more complex representation is needed for a more confident result, and further execution of the agile DNN is required. In Figure 18(b), on the other hand, the difference between and is prominent, so the data sample belongs to cluster 3 with high confidence and further execution is not needed.
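A minimal sketch of this utility, assuming Manhattan distance (the lightweight metric discussed in Section 9); the function name and the example centroids are our own illustration:

```python
import numpy as np

def termination_utility(features, centroids):
    # Manhattan distance from the sample to every centroid
    d = np.abs(centroids - features).sum(axis=1)
    d1, d2 = np.sort(d)[:2]   # closest and second-closest centroids
    return d2 - d1            # large gap -> confident assignment -> exit early

centroids = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
confident = termination_utility(np.array([0.5, 0.5]), centroids)  # near one centroid
ambiguous = termination_utility(np.array([5.0, 0.0]), centroids)  # between two centroids
```

A sample whose utility exceeds the threshold exits the DNN at the current layer; an ambiguous sample (utility near zero) continues to the next, more complex representation.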

Figure 18. Early Termination Policy

Figure 17 shows the early termination policy and the execution of subtasks. Here, , and are the utility of after the execution of the , and layers. is the threshold utility.

6.2. Agile DNN Construction

Our termination policy utilizes the output vector at each layer as a learned representation, rather than solving a joint optimization problem to perform both classification and clustering at the last layer (Yang et al., 2016). To ensure high utility, we need to learn a representation that maximizes the distance between the representations of different classes while minimizing the distance between representations of the same class. A contrastive loss function achieves this at the final layer (Islam and Nirjon, 2019b). However, easier samples can be distinguished with the simpler representations of earlier layers, so we need to achieve distinctive representations in earlier layers as well.

Layer-Aware Loss Function. To accomplish this, we propose a layer-aware loss function inspired by the contrastive loss (Islam and Nirjon, 2019b). We use a convex combination of the contrastive loss at each layer as the loss function, which gives the network better distinguishable representations at preceding layers. To ensure distinctive representations at earlier layers, early layers receive higher weights than deeper layers. Therefore, easy samples obtain a sufficiently distinguishable representation at an earlier layer of the network, removing the need to execute all the layers. Note that this loss function is used during the training of the agile DNN model on a traditional deep learning computing system. The layer-aware loss function, , is defined as follows.


Here, is the convex coefficient at layer . and represent the total number of layers and classes in the network, respectively. denotes the learnable weights of the network at layer . are the vectors of each class at layer . is the contrastive loss function; for two classes and at layer , it is defined as follows.


Here, is the representation output of a member of class j, where j = 1, 2, ..., N, at layer . The coefficient is if and belong to the same class and otherwise. The term represents the distance margin maintained between the representations of different classes.
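Because the equations above lost their symbols in this rendering, the following sketch shows the structure they describe: a per-layer contrastive term and a convex combination across layers with larger weights on earlier layers. The classic contrastive form, the margin, and the pair format are illustrative assumptions:

```python
import numpy as np

def contrastive_loss(r_a, r_b, same_class, margin=1.0):
    # classic contrastive loss between two representations at one layer:
    # pull same-class pairs together, push different-class pairs beyond margin
    d = np.linalg.norm(r_a - r_b)
    return d ** 2 if same_class else max(margin - d, 0.0) ** 2

def layer_aware_loss(per_layer_pairs, weights):
    """Convex combination of per-layer contrastive losses; earlier layers get
    larger weights so that easy samples separate early. The weight schedule
    is an assumption."""
    assert abs(sum(weights) - 1.0) < 1e-9, "convex coefficients must sum to 1"
    total = 0.0
    for w, pairs in zip(weights, per_layer_pairs):
        total += w * np.mean([contrastive_loss(a, b, s) for a, b, s in pairs])
    return total
```

Training against this combined objective is what lets an early layer already place easy samples far from the wrong centroids, enabling the early termination of Section 6.1.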

Adaptation. Due to early termination at previous layers, a group of data samples never affects the clustering models that use the more complex representations of deeper layers, which hinders the adaptation of those cluster models. One solution is to execute the non-linear computation for every sample to obtain the more complex representation and update the models; this computation consists of matrix multiplications, additions, and a non-linear function, e.g., the RELU activation. However, it contradicts our goal of avoiding unnecessary computation for simple data. This conflict poses an interesting challenge: updating the centroid of layer from the centroid of layer with a lightweight calculation, where k = 1, 2, 3, …, n. This challenge has not arisen before, as this is the first work to combine model updates with early termination. Let be a centroid of the clustering model at layer , where is the number of members in the cluster and . For the next layer, , the centroid is


Here, and are the weight and bias for layer , respectively, and is the non-linear function, e.g., RELU. This formula requires at least multiplications. As multiplication is an expensive operation, our goal is to avoid it. Therefore, we approximate Equation 10 using the following equation, reducing the number of multiplications by .


We assume that the non-linear function is the RELU activation function, so . After applying the RELU function, we observe that the error, , is


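Because the formulas above lost their symbols in this rendering, here is a small numerical sketch of the approximation they describe: instead of pushing every cluster member through the layer and averaging, only the centroid itself is pushed through, trading a small error near the RELU kinks for far fewer multiplications. The shapes, the random data, and the error tolerance are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def exact_next_centroid(members, W, b):
    # exact update: push every cluster member through the layer, then average
    # (one matrix multiply per member)
    return relu(members @ W.T + b).mean(axis=0)

def approx_next_centroid(centroid, W, b):
    # lightweight approximation: push only the centroid through the layer,
    # eliminating the per-member multiplies
    return relu(W @ centroid + b)

W = rng.normal(size=(3, 4))                         # layer weights
b = rng.normal(size=3)                              # layer bias
centroid = rng.normal(size=4)
members = centroid + 0.01 * rng.normal(size=(50, 4))  # a tight cluster
```

For a tight cluster the two updates agree closely; the residual error stems from averaging and the non-linearity not commuting exactly.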
7. Evaluation

Figure 22. Zygarde  experimental setup

7.1. Experimental Setup

Computational Device. For the evaluation we use a TI-MSP430FR5994 (msp, [n. d.]) at 16MHz. This micro-controller is equipped with 256KB of FRAM, 8KB of SRAM, a 16-channel 12-bit ADC, 6-channel direct memory access (DMA), and a Low Energy Accelerator (LEA). It has an operating voltage range of 1.8V to 3.6V. To program this device we use a Linux distribution and the GCC compiler with an ez-FET programmer. Zygarde uses fixed-point calculations and flip-flop buffers to enable DNN execution on the MSP430222The implementation of Zygarde is available at https://github.com/zygarde-sensys/Zygarde.git. To train the agile DNN model mentioned in Table 1 we use an Intel Core i7 PC with an RTX2080 GPU. We train the network offline and compress it with rank decomposition or separation (Bhattacharya and Lane, 2016; Chollet, 2017; Xue et al., 2013; De Lathauwer et al., 2000a, b; Tucker, 1966) and pruning (Han et al., 2015; Nabhan and Zomaya, 1994). We execute the compressed trained network for inference on our target device, the TI-MSP430FR5994.

Energy Harvester. We harvest energy from two ambient sources – solar and RF. We use a flexible Ethylene Tetrafluoroethylene (ETFE) based solar panel (sol, [n. d.]) that outputs 6V at 1W, and an LTC3105 step-up DC/DC converter with a start-up voltage of 250mV (ltc, [n. d.]) to charge the capacitor from the solar panel. A Powercast P2110B (pow, [n. d.]a) is used to harvest RF energy from a 3W Powercaster transmitter (pow, [n. d.]b), Figure 22. For all experiments with intermittent power we use a 50mF capacitor. For persistently powered experiments we rely on the power supply from the ez-FET of the MSP430 launchpad.

Sensor Peripheral. As the audio sensor, we use an Electret microphone (BOB, [n. d.]) that draws 3.1mA of current and has a start-up time of 125ms. We utilize the built-in ADC and internal clocks to read data from the microphone. To calculate the FFT we use the LEA, and we use DMA to write data to the FRAM without occupying the CPU.
Time-Keeping Peripheral. Like (Yıldırım et al., 2018; Hester et al., 2017), we use a real-time clock, DS3132 (tim, [n. d.]), via I2C for time keeping. This choice is made for ease of implementation. Note that we only use this clock during power up, to sync and maintain time with the internal clock of the MCU. This real-time clock is easily replaceable with an SRAM- and capacitor-based time-keeping system during power-off periods (Rahmati et al., 2012; Hester et al., 2016b). Note that both time keeping and intermittent execution in batteryless systems are out of the scope of this paper.

Libraries. We use the MSP430 libraries provided by Texas Instruments. For intermittent execution, Zygarde uses SONIC (Gobieski et al., 2019) and its dependent libraries (e.g., ALPACA (Maeng et al., 2017)). For training the agile DNN model we use Google's TensorFlow (Abadi et al., 2016).

Datasets. To evaluate our algorithms, we use two popular datasets, MNIST (LeCun et al., 1998) and ESC-10 (Piczak, [n. d.]). MNIST is an image dataset consisting of pixel images and 10 classes. ESC-10 consists of environmental sounds of 10 classes. Each audio clip is 5s long and has a sampling rate of 44KHz. To accommodate this dataset on our resource-constrained device, we take the middle 1s of audio and down-sample it to 8KHz.

Controlled Energy Source. To evaluate the system with different , we perform controlled experiments with the energy sources. For RF, we vary the distance between the harvester and the transmitter from 1 foot to 5 feet. For solar, we simulate the sun with three dimmable bulbs of varying intensity (5.6 Klx - 35 Klx), as shown in Figure 22. Note that for the real-life experiments we use outdoor scenarios and windowed rooms to get sunlight.

7.2. Algorithm Evaluation

Effect of Layer-Aware Loss Function and Early Termination. In this section, we evaluate our proposed layer-aware loss function, early termination, and adaptation. We train all models with the same hyper-parameters using only the training dataset. During inference we rely on the testing datasets provided by MNIST and ESC-10 (we use fold 1 only), containing 10,000 and 80 samples, respectively. Note that the layer-aware loss function and early termination are applicable to any system, not just intermittently powered ones; therefore, we leave the energy harvesting aspect out of this evaluation and perform it on a persistently powered system. We compare our layer-aware loss function (L) and the layer-aware loss function with adaptation (AL) against cross-entropy loss (CE) (Zhang and Sabuncu, 2018) and contrastive loss (C) (Islam and Nirjon, 2019b). We also consider a network trained with cross-entropy loss that exits only at the last layer, similar to the one shown in SONIC (Gobieski et al., 2019).

Figure 23. Prior Termination Evaluation MNIST.
Figure 24. Prior Termination Evaluation ESC-10.

For the MNIST dataset in Figure 23, we observe that early termination with cross-entropy loss (CE) decreases the accuracy along with the execution time. With the layer-aware loss (L), however, we increase the accuracy by 9.45% over cross-entropy loss (CE) and by 3.19% over contrastive loss (C) while keeping a similar execution time. The layer-aware loss function with adaptation (AL) increases the accuracy by a further 1% and allows more samples to terminate in the early layers.

Figure 25. Realtime Scheduling for Systems with different () on MNIST test dataset.
Figure 26. Realtime Scheduling for Systems with different () on ESC-10 test dataset.

ESC-10 is a complex dataset on which previous work achieves 73% accuracy using models with three convolution layers, data augmentation, and a 44KHz sampling rate (Piczak, 2015). Our network takes downsampled audio data of 1s duration and achieves 70% accuracy. In Figure 24, the layer-aware loss function (L) achieves 76.25% accuracy and decreases the execution time by 1.22 minutes. Note that the layer-aware loss function with approximation (AL) shows no improvement over the layer-aware loss in this scenario because all data samples of several classes exit the network in earlier layers, minimizing the need for approximate adaptation at deeper layers.

Performance of Real-Time Scheduler. In this section, we evaluate our proposed scheduling algorithm for dynamic imprecise tasks, which is applicable to both persistently (Section 5.2) and intermittently (Section 5.3) powered systems, for different . We evaluate the system with two different CPU utilizations. For the MNIST dataset in Figure 25, we consider a CPU utilization ; thus, even with persistent power, we cannot schedule all tasks. Without early termination, the majority of samples could not be scheduled, as they keep waiting in the queue; with early exit, 75% of the samples can be scheduled. We consider sporadic tasks with a period of 3 seconds and a deadline of twice the period. Following the definition of imprecise computing, we consider the completion of the mandatory portion a successful scheduling (Shih and Liu, 1992). For the ESC-10 dataset in Figure 26, we consider a CPU utilization , a period of 0.36 minutes, and a deadline of twice the period. We compare our approach with the earliest deadline first (EDF) algorithm and a variation of EDF that executes only the mandatory portion (EDF-M). We consider various   and use three systems that are persistently powered, solar powered, and RF powered. In all cases, Zygarde successfully schedules a higher number of tasks with higher accuracy.
There are two major things to notice in Figure 25 and Figure 26. When   is high, Zygarde increases the number of tasks with correct results by executing some of the optional portions. For a 100% accurate utility function, the performance of Zygarde and EDF-M would be the same. For small  , the performance of Zygarde and EDF-M is the same because the second condition of Equation 6 is satisfied, which executes only the mandatory portion. The number of tasks that can be scheduled depends not only on   but also on the harvested energy.

Figure 27. Real-life evaluation of Zygarde  with solar
Figure 28. Real-life evaluation of Zygarde  with RF
Scenario | Energy Source | Source Power | Harvester Placement | Additional Cause | Target Event | Other Audios
Street | Solar | 74 – 80 Klx | Pavement | Vehicle on the closest lane | Car Honk | Silence, Dog, Human Voice, Car
Shaded Park | Solar | 1.8 – 0.6 Klx | Under Tree | People, objects and cloud | Dog Bark | Silence, Car, Car Honk, Human Voice
Staircase | Solar | 0.5 – 0.3 Klx | Edge of the Railing | People and cloud | Human Voice | Silence, Car, Car Honk, Dog Bark
Nursery | RF | -.48 – -1.66dB | On Desk | Change of distance | Baby Cries | Silence, Human Voice, Washer, Printer
Laundry | RF | -.48 – -1.91dB | On Counter | Change of distance | Washer | Silence, Human Voice, Baby Cries, Printer
Office | RF | -1.59 – -1.91dB | On Desk | Change of distance | Printer | Silence, Human Voice, Baby Cries, Printer
Table 5. Real-life evaluation setup

7.3. Real-World Application Evaluation

Experimental Setup. In this section, we evaluate our system in an uncontrolled environment, presenting an acoustic event detector as the application. We choose six scenarios with seven acoustic events. In three of these scenarios we use a solar harvester, and for the rest we use an RF harvester. Table 5 describes the scenarios along with the target events. For the solar harvested systems, we rely on natural events to hinder power generation. For example, in the street scenario, the harvester with the solar panel is kept on the edge of the pavement; vehicles (cars, buses) passing through the lane nearest the pavement block the sun and thus hinder energy generation. However, due to the lack of a programmable RF harvester or a proper setup to harvest from WiFi, we change the distance between the RF harvester and the transmitter. We use the same computation and sensing setup described in Section 7.1. Note that other sound sources were present in all scenarios, and we included silence as an event in the classifier during training.

Performance. Figure 27 shows the voltage across the capacitor along with the operating voltage of the computational unit for the three scenarios where the system is powered by a solar harvester. Note that the voltage across the capacitor and the operating voltage are used to measure the energy generation rate. The system runs a DNN with a single convolution layer and two fully connected layers. The execution time varies between 1.7 s and 3 s depending on the early exit. Among the 30 events we experienced, four were detected incorrectly. We experience three deadline misses even though the detections are accurate, and we miss four samples due to intermittence, as the system did not have enough energy to turn on. Similar to Figure 27, Figure 28 shows the results for the scenarios with the RF harvester. We notice two interesting facts in this experiment. The first is that the MCU experiences a higher intermittence rate with the RF harvester than with the solar harvester. The second is that the duration of each power-off period is much longer with the solar harvester, especially when we do not have direct sunlight.

8. Related Work

Intermittent Computing. Intermittently powered systems experience frequent power failures that reset the software execution, resulting in repeated execution of the same code and inconsistency in non-volatile memory. Previous works address progress and memory consistency using software check-pointing (Ransford et al., 2012; Maeng and Lucia, 2018; Hicks, 2017; Lucia and Ransford, 2015; Van Der Woude and Hicks, 2016; Jayakumar et al., 2014; Mirhoseini et al., 2013a; Bhatti and Mottola, 2016), hardware interruption (Balsamo et al., 2015, 2016; Mirhoseini et al., 2013b), atomic task-based models (Maeng et al., 2017; Colin and Lucia, 2016, 2018), and non-volatile processors (NVP) (Ma et al., 2017, 2015). Recently, (Gobieski et al., 2018, 2019) proposed a special software system for the intermittent execution of deep neural inference, combining the atomic task-based model with loop continuation. Zygarde relies on (Gobieski et al., 2018, 2019) for the intermittent computation of deep neural networks.

Timeliness of Batteryless Systems. Prior works on intermittent computing propose runtime systems that increase the likelihood of task completion by finding the optimum voltage for task execution (Buettner et al., 2011), adapting the execution rate (Sorber et al., 2007; Dudani et al., 2002), and discarding stale data (Hester et al., 2017). However, none of these works consider the accuracy/utility of the running application or the real-time, deadline-aware execution of tasks. Some works have addressed scheduling in wireless sensors (Zhu et al., 2012; Moser et al., 2006, 2007), but none of them consider the higher computational load of intermittent computing systems. (Yıldırım et al., 2018) proposes a reactive kernel that enables energy-aware dynamic execution of multiple threads, but it considers neither deep neural tasks nor early termination of tasks for increasing schedulability. Unlike Zygarde, which schedules multiple incoming data samples, (Yıldırım et al., 2018) schedules kernel threads and can only have one data sample in the system at a time. Other works focus on maintaining time keeping through power loss (Rahmati et al., 2012; Hester et al., 2016b); our work is complementary to these and relies on their techniques for time keeping.

Compression and Partial Execution of DNN. Recent works have focused on reducing the cost of DNN inference by pruning and splitting models without compromising accuracy (Han et al., 2015; Nabhan and Zomaya, 1994; Yao et al., 2017; Yao et al., 2018b, c, a). Other works reduce floating-point and weight precision (Johnson, 2018; De Sa et al., 2017; Han et al., 2016) or factorize computation (Bhattacharya and Lane, 2016; Chollet, 2017; Nakkiran et al., 2015; Szegedy et al., 2017; Szegedy et al., 2015, 2016; Wang et al., 2017) to reduce storage and computation costs. The proposed binary networks (Courbariaux et al., 2015; Hubara et al., 2016; Rastegari et al., 2016; Courbariaux and Bengio, 2017) are not suitable for energy harvesting systems due to the higher number of parameters such systems need (Gobieski et al., 2019). Even though these works are crucial for enabling DNN execution on batteryless systems, they can be enhanced by exploiting the fact that real-life data is usually a combination of easy and difficult examples, and the easy examples do not need full DNN inference (Bolukbasi et al., 2017; Figurnov et al., 2017; Leroux et al., 2017). Unlike prior works that require an additional classifier after each layer, Zygarde depends on semi-supervised models to reduce the computational overhead. (Bateni and Liu, 2018; Zhou et al., 2018) propose scheduling algorithms for deep neural networks on GPUs and do not consider the constraints introduced by embedded systems and intermittent power supplies.

Modeling Energy Harvesting Systems. (San Miguel et al., 2018) analytically models the trade-off associated with backing up data to maximize forward propagation. Even though energy harvesting systems for specific energy sources have been analyzed and modeled before (Crovetto et al., 2014; Sharma et al., 2018; Jia et al., 2018), none of the prior works focus on modeling energy harvesting systems irrespective of the energy source.

9. Discussion

After executing a DNN layer, Zygarde calculates the Manhattan distance of the data sample from all centroids using the top features, where is the number of clusters. This requires subtractions, comparisons (to determine absolute values), and additions. Zygarde needs comparisons to find the minimum distance, and additions, multiplications, and divisions to update a centroid. In our implementation, the highest values of and are 10 and 50, respectively. The total computation needed after each layer is 10 times smaller than the multiplications and additions required by the smallest network layer mentioned in Table 1.

10. Conclusion

In this paper we propose a generic metric   that expresses the stability of an energy harvesting system. We also exploit the fact that real-life data samples are a combination of easy and hard samples, and that not all samples require the same amount of computation to achieve similar performance. We propose early termination of deep neural networks without compromising much accuracy. To decrease the accuracy loss due to early termination, we propose a layer-aware loss function and achieve a 9.45% - 3.19% increase in accuracy and a 14% decrease in execution time. We then model such DNN tasks as imprecise tasks and propose a scheduling algorithm that considers the time, the energy condition,  , and the performance of the system. We compare our system against state-of-the-art scheduling algorithms, and our algorithm successfully schedules 33% - 12% more tasks.


  • sol ([n. d.]) [n. d.]. Flexible Ethylene Tetrafluoroethylene (ETFE) based solar panel. https://www.amazon.com/gp/product/B01EY5FIGW/ref=oh_aui_search_asin_title?ie=UTF8&psc=1.
  • ltc ([n. d.]) [n. d.]. LTC3105 step up DC/DC converter. https://www.analog.com/media/en/technical-documentation/data-sheets/3105fb.pdf.
  • msp ([n. d.]) [n. d.]. MSP430FR5994. http://www.ti.com/lit/ds/symlink/msp430fr5994.pdf.
  • ipd ([n. d.]) [n. d.]. Network Traffic Dataset. https://www.kaggle.com/jsrojas/ip-network-traffic-flows-labeled-with-87-apps.
  • pow ([n. d.]a) [n. d.]a. Powercast P2210B. http://www.powercastco.com/wp-content/uploads/2016/12/P2110B-Datasheet-Rev-3.pdf.
  • pow ([n. d.]b) [n. d.]b. Powercaster Transmitter. http://www.powercastco.com/wp-content/uploads/2016/11/User-Manual-TX-915-01-Rev-A-4.pdf.
  • BOB ([n. d.]) [n. d.]. SparkFun Electret Microphone Breakout. http://www.ti.com/product/OPA344.
  • tim ([n. d.]) [n. d.]. Timer Module. https://partnums.com/gtin/00747465491461.
  • goo (2017a) 2017a. Federated Learning: Collaborative Machine Learning without Centralized Training Data. http://www.googblogs.com/federated-learning-collaborative-machine-learning-without-centralized-training-data/
  • goo (2017b) 2017b. On-Device Conversational Modeling with TensorFlow Lite. http://www.googblogs.com/on-device-conversational-modeling-with-tensorflow-lite/
  • Qua (2017) 2017. Qualcomm Neural Processing SDK. https://developer.qualcomm.com/software/qualcomm-neural-processing-sdk
  • qua (2017a) 2017a. Qualcomm On Device AI. https://www.qualcomm.com/news/onq/2017/08/16/we-are-making-device-ai-ubiquitous?cmpid=oofyus181544
  • qua (2017b) 2017b. Qualcomm On Device AI. https://www.qualcomm.com/media/documents/files/making-on-device-ai-ubiquitous.pdf
  • app (2018) 2018. Apple AI Strategy. https://www.cnbc.com/2018/06/13/apples-ai-strategy-devices-not-cloud.html
  • Edg (2018) 2018. Google Edge TPU. https://cloud.google.com/edge-tpu/
  • Abadi et al. (2016) Martin Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 265–283. https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf
  • Akbarpour and Jackson (2018) Mohammad Akbarpour and Matthew O Jackson. 2018. Diffusion in networks and the virtue of burstiness. Proceedings of the National Academy of Sciences 115, 30 (2018), E6996–E7004.
  • Balsamo et al. (2016) Domenico Balsamo, Alex S Weddell, Anup Das, Alberto Rodriguez Arreola, Davide Brunelli, Bashir M Al-Hashimi, Geoff V Merrett, and Luca Benini. 2016. Hibernus++: a self-calibrating and adaptive system for transiently-powered embedded devices. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 35, 12 (2016), 1968–1980.
  • Balsamo et al. (2015) Domenico Balsamo, Alex S Weddell, Geoff V Merrett, Bashir M Al-Hashimi, Davide Brunelli, and Luca Benini. 2015. Hibernus: Sustaining computation during intermittent supply for energy-harvesting systems. IEEE Embedded Systems Letters 7, 1 (2015), 15–18.
  • Basu et al. (2002) Sugato Basu, Arindam Banerjee, and Raymond Mooney. 2002. Semi-supervised clustering by seeding. In Proceedings of the 19th International Conference on Machine Learning (ICML-2002). Citeseer.
  • Bateni and Liu (2018) Soroush Bateni and Cong Liu. 2018. ApNet: Approximation-Aware Real-Time Neural Network. In 2018 IEEE Real-Time Systems Symposium (RTSS). IEEE, 67–79.
  • Bhattacharya and Lane (2016) Sourav Bhattacharya and Nicholas D Lane. 2016. Sparsification and separation of deep learning layers for constrained resource inference on wearables. In Proceedings of the 14th ACM Conference on Embedded Network Sensor Systems CD-ROM. ACM, 176–189.
  • Bhatti and Mottola (2016) Naveed Bhatti and Luca Mottola. 2016. Efficient state retention for transiently-powered embedded sensing. In International Conference on Embedded Wireless Systems and Networks. 137–148.
  • Blog (2018) Google Research Blog. 2018. Introducing the CVPR 2018 On-Device Visual Intelligence Challenge. https://research.googleblog.com/search/label/On-device%20Learning
  • Bolukbasi et al. (2017) Tolga Bolukbasi, Joseph Wang, Ofer Dekel, and Venkatesh Saligrama. 2017. Adaptive neural networks for efficient inference. arXiv preprint arXiv:1702.07811 (2017).
  • Buettner et al. (2011) Michael Buettner, Ben Greenstein, and David Wetherall. 2011. Dewdrop: an energy-aware runtime for computational RFID. In Proc. USENIX NSDI.
  • Canon et al. (2018) Louis-Claude Canon, Aurélie Kong Win Chang, Yves Robert, and Frédéric Vivien. 2018. Scheduling independent stochastic tasks under deadline and budget constraints. Ph.D. Dissertation. Inria-Research Centre Grenoble–Rhône-Alpes.
  • Chandrasekaran et al. (2016) Rishikanth Chandrasekaran, Daniel de Godoy, Stephen Xia, Md Tamzeed Islam, Bashima Islam, Shahriar Nirjon, Peter Kinget, and Xiaofan Jiang. 2016. SEUS: A Wearable Multi-Channel Acoustic Headset Platform to Improve Pedestrian Safety: Demo Abstract. In Proceedings of the 14th ACM Conference on Embedded Network Sensor Systems CD-ROM. ACM, 330–331.
  • Chollet (2017) François Chollet. 2017. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition. 1251–1258.
  • Choudhury et al. ([n. d.]) Tanzila Choudhury, Bashima Islam, and ABM Alim Al Islam. [n. d.]. Super-savior: A System to Aid Combating Harassment and Violence Against Women. ([n. d.]).
  • Colin and Lucia (2016) Alexei Colin and Brandon Lucia. 2016. Chain: tasks and channels for reliable intermittent programs. ACM SIGPLAN Notices (2016).
  • Colin and Lucia (2018) Alexei Colin and Brandon Lucia. 2018. Termination checking and task decomposition for task-based intermittent programs. In Proceedings of the 27th International Conference on Compiler Construction. ACM, 116–127.
  • Colin et al. (2018) Alexei Colin, Emily Ruppel, and Brandon Lucia. 2018. A Reconfigurable Energy Storage Architecture for Energy-harvesting Devices. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 767–781.
  • Courbariaux and Bengio (2017) M Courbariaux and Y Bengio. 2017. BinaryNet: Training deep neural networks with weights and activations constrained to +1 or −1. arXiv preprint arXiv:1602.02830 (2016).
  • Courbariaux et al. (2015) Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. 2015. Binaryconnect: Training deep neural networks with binary weights during propagations. In Advances in neural information processing systems. 3123–3131.
  • Crovetto et al. (2014) Andrea Crovetto, Fei Wang, and Ole Hansen. 2014. Modeling and optimization of an electrostatic energy harvesting device. Journal of Microelectromechanical Systems 23, 5 (2014), 1141–1155.
  • de Godoy et al. (2018) Daniel de Godoy, Bashima Islam, Stephen Xia, Md Tamzeed Islam, Rishikanth Chandrasekaran, Yen-Chun Chen, Shahriar Nirjon, Peter R Kinget, and Xiaofan Jiang. 2018. Paws: A wearable acoustic system for pedestrian safety. In 2018 IEEE/ACM Third International Conference on Internet-of-Things Design and Implementation (IoTDI). IEEE, 237–248.
  • De Lathauwer et al. (2000a) Lieven De Lathauwer, Bart De Moor, and Joos Vandewalle. 2000a. A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications 21, 4 (2000), 1253–1278.
  • De Lathauwer et al. (2000b) Lieven De Lathauwer, Bart De Moor, and Joos Vandewalle. 2000b. On the best rank-1 and rank-(r 1, r 2,…, rn) approximation of higher-order tensors. SIAM Journal on Matrix Analysis and Applications 21, 4 (2000), 1324–1342.
  • De Sa et al. (2017) Christopher De Sa, Matthew Feldman, Christopher Ré, and Kunle Olukotun. 2017. Understanding and optimizing asynchronous low-precision stochastic gradient descent. In ACM SIGARCH Computer Architecture News, Vol. 45. ACM, 561–574.
  • Dudani et al. (2002) Ajay Dudani, Frank Mueller, and Yifan Zhu. 2002. Energy-conserving feedback EDF scheduling for embedded systems with real-time constraints. In ACM SIGPLAN Notices, Vol. 37. ACM, 213–222.
  • Figurnov et al. (2017) Michael Figurnov, Maxwell D Collins, Yukun Zhu, Li Zhang, Jonathan Huang, Dmitry P Vetrov, and Ruslan Salakhutdinov. 2017. Spatially Adaptive Computation Time for Residual Networks.. In CVPR, Vol. 2. 7.
  • Gobieski et al. (2018) Graham Gobieski, Nathan Beckmann, and Brandon Lucia. 2018. Intermittent Deep Neural Network Inference. SysML (2018).
  • Gobieski et al. (2019) Graham Gobieski, Brandon Lucia, and Nathan Beckmann. 2019. Intelligence Beyond the Edge: Inference on Intermittent Embedded Systems. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 199–213.
  • Han et al. (2016) Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A Horowitz, and William J Dally. 2016. EIE: efficient inference engine on compressed deep neural network. In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). IEEE, 243–254.
  • Han et al. (2015) Song Han, Huizi Mao, and William J Dally. 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149 (2015).
  • He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770–778.
  • Heaton (2011) J Heaton. 2011. Programming Neural Networks with Encog3 in Java (1st ed.). Heaton Research.
  • Hester et al. (2016a) Josiah Hester, Travis Peters, Tianlong Yun, Ronald Peterson, Joseph Skinner, Bhargav Golla, Kevin Storer, Steven Hearndon, Kevin Freeman, Sarah Lord, et al. 2016a. Amulet: An energy-efficient, multi-application wearable platform. In Proceedings of the 14th ACM Conference on Embedded Network Sensor Systems CD-ROM. ACM, 216–229.
  • Hester et al. (2015) Josiah Hester, Lanny Sitanayah, and Jacob Sorber. 2015. Tragedy of the coulombs: Federating energy storage for tiny, intermittently-powered sensors. In Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems. ACM, 5–16.
  • Hester and Sorber (2017) Josiah Hester and Jacob Sorber. 2017. Flicker: Rapid Prototyping for the Batteryless Internet-of-Things. In Proceedings of the 15th ACM Conference on Embedded Network Sensor Systems. ACM, 19.
  • Hester et al. (2017) Josiah Hester, Kevin Storer, and Jacob Sorber. 2017. Timely Execution on Intermittently Powered Batteryless Sensors. In Proceedings of the 15th ACM Conference on Embedded Network Sensor Systems. ACM, 17.
  • Hester et al. (2016b) Josiah Hester, Nicole Tobias, Amir Rahmati, Lanny Sitanayah, Daniel Holcomb, Kevin Fu, Wayne P Burleson, and Jacob Sorber. 2016b. Persistent clocks for batteryless sensing devices. ACM Transactions on Embedded Computing Systems (TECS) 15, 4 (2016), 77.
  • Hicks (2017) Matthew Hicks. 2017. Clank: Architectural support for intermittent computation. In 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA). IEEE, 228–240.
  • Hubara et al. (2016) Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Binarized neural networks. In Advances in neural information processing systems. 4107–4115.
  • Islam and Nirjon (2019a) Bashima Islam and Shahriar Nirjon. 2019a. Poster Abstract: On-Device Training from Sensor Data onBatteryless Platforms. In The 18th ACM/IEEE Conference on Information Processing in Sensor Networks. ACM/IEEE.
  • Islam and Nirjon (2019b) Md Tamzeed Islam and Shahriar Nirjon. 2019b. SoundSemantics: exploiting semantic knowledge in text for embedded acoustic event classification. In Proceedings of the 18th International Conference on Information Processing in Sensor Networks. ACM, 217–228.
  • Jayakumar et al. (2014) Hrishikesh Jayakumar, Arnab Raha, and Vijay Raghunathan. 2014. QuickRecall: A low overhead HW/SW approach for enabling computations across power cycles in transiently powered computers. In VLSI Design and 2014 13th International Conference on Embedded Systems, 2014 27th International Conference on. IEEE, 330–335.
  • Jia et al. (2018) Jinda Jia, Xiaobiao Shan, Deepesh Upadrashta, Tao Xie, Yaowen Yang, and Rujun Song. 2018. Modeling and Analysis of Upright Piezoelectric Energy Harvester under Aerodynamic Vortex-induced Vibration. Micromachines 9, 12 (2018), 667.
  • Johnson (2018) Jeff Johnson. 2018. Rethinking floating point for deep learning. arXiv preprint arXiv:1811.01721 (2018).
  • LeCun et al. (1998) Yann LeCun, Corinna Cortes, and Christopher JC Burges. 1998. The MNIST database of handwritten digits, 1998. URL http://yann.lecun.com/exdb/mnist 10 (1998), 34.
  • Leroux et al. (2017) Sam Leroux, Steven Bohez, Elias De Coninck, Tim Verbelen, Bert Vankeirsbilck, Pieter Simoens, and Bart Dhoedt. 2017. The cascading neural network: building the Internet of Smart Things. Knowledge and Information Systems 52, 3 (2017), 791–814.
  • Liu et al. (1991) Jane W.-S. Liu, Kwei-Jay Lin, Wei Kuan Shih, Albert Chuang-shi Yu, Jen-Yao Chung, and Wei Zhao. 1991. Algorithms for scheduling imprecise computations. In Foundations of Real-Time Computing: Scheduling and Resource Management. Springer, 203–249.
  • Lucia and Ransford (2015) Brandon Lucia and Benjamin Ransford. 2015. A simpler, safer programming and execution model for intermittent systems. ACM SIGPLAN Notices 50, 6 (2015), 575–585.
  • Ma et al. (2017) Kaisheng Ma, Xueqing Li, Jinyang Li, Yongpan Liu, Yuan Xie, Jack Sampson, Mahmut Taylan Kandemir, and Vijaykrishnan Narayanan. 2017. Incidental computing on IoT nonvolatile processors. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 204–218.
  • Ma et al. (2015) Kaisheng Ma, Yang Zheng, Shuangchen Li, Karthik Swaminathan, Xueqing Li, Yongpan Liu, Jack Sampson, Yuan Xie, and Vijaykrishnan Narayanan. 2015. Architecture exploration for ambient energy harvesting nonvolatile processors. In 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA). IEEE, 526–537.
  • Maeng et al. (2017) Kiwan Maeng, Alexei Colin, and Brandon Lucia. 2017. Alpaca: Intermittent execution without checkpoints. Proceedings of the ACM on Programming Languages 1, OOPSLA (2017), 96.
  • Maeng and Lucia (2018) Kiwan Maeng and Brandon Lucia. 2018. Adaptive Dynamic Checkpointing for Safe Efficient Intermittent Computing. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18).
  • Meyer et al. (2016) Elijah Meyer, Mark C Greenwood, and Tan Tran. 2016. Daily Step Count Profile Data for 61 Days [dataset]. (2016).
  • Mirhoseini et al. (2013a) Azalia Mirhoseini, Ebrahim M Songhori, and Farinaz Koushanfar. 2013a. Automated checkpointing for enabling intensive applications on energy harvesting devices. In Low Power Electronics and Design (ISLPED), 2013 IEEE International Symposium on. IEEE, 27–32.
  • Mirhoseini et al. (2013b) Azalia Mirhoseini, Ebrahim M Songhori, and Farinaz Koushanfar. 2013b. Idetic: A high-level synthesis approach for enabling long computations on transiently-powered ASICs. In 2013 IEEE International Conference on Pervasive Computing and Communications (PerCom). IEEE, 216–224.
  • Moser et al. (2006) Clemens Moser, Davide Brunelli, Lothar Thiele, and Luca Benini. 2006. Lazy scheduling for energy harvesting sensor nodes. In IFIP Working Conference on Distributed and Parallel Embedded Systems. Springer.
  • Moser et al. (2007) Clemens Moser, Davide Brunelli, Lothar Thiele, and Luca Benini. 2007. Real-time scheduling for energy harvesting sensor nodes. Real-Time Systems (2007).
  • Musliner et al. (1993) David J Musliner, Edmund H Durfee, and Kang G Shin. 1993. CIRCA: A cooperative intelligent real-time control architecture. IEEE Transactions on Systems, Man, and Cybernetics 23, 6 (1993), 1561–1574.
  • Nabhan and Zomaya (1994) Tarek M Nabhan and Albert Y Zomaya. 1994. Toward generating neural network structures for function approximation. Neural Networks 7, 1 (1994), 89–99.
  • Nakkiran et al. (2015) Preetum Nakkiran, Raziel Alvarez, Rohit Prabhavalkar, and Carolina Parada. 2015. Compressing deep neural networks using a rank-constrained topology. (2015).
  • Nirjon (2018) Shahriar Nirjon. 2018. Lifelong Learning on Harvested Energy. In Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services. ACM, 500–501.
  • Piczak ([n. d.]) Karol J. Piczak. [n. d.]. ESC: Dataset for Environmental Sound Classification. In Proceedings of the 23rd Annual ACM Conference on Multimedia (2015-10-13). ACM Press, 1015–1018. https://doi.org/10.1145/2733373.2806390
  • Piczak (2015) Karol J Piczak. 2015. Environmental sound classification with convolutional neural networks. In 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, 1–6.
  • Rahmati et al. (2012) Amir Rahmati, Mastooreh Salajegheh, Dan Holcomb, Jacob Sorber, Wayne P Burleson, and Kevin Fu. 2012. TARDIS: Time and remanence decay in SRAM to implement secure protocols on embedded devices without clocks. In Proceedings of the 21st USENIX conference on Security symposium. USENIX Association, 36–36.
  • Ramdas et al. (2017) Aaditya Ramdas, Nicolás Trillos, and Marco Cuturi. 2017. On wasserstein two-sample testing and related families of nonparametric tests. Entropy 19, 2 (2017), 47.
  • Ransford et al. (2012) Benjamin Ransford, Jacob Sorber, and Kevin Fu. 2012. Mementos: System support for long-running computation on RFID-scale devices. Acm Sigplan Notices 47, 4 (2012), 159–170.
  • Rastegari et al. (2016) Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. 2016. Xnor-net: Imagenet classification using binary convolutional neural networks. In European Conference on Computer Vision. Springer, 525–542.
  • Ravi (2017) Sujith Ravi. 2017. Projectionnet: Learning efficient on-device deep networks using neural projections. arXiv preprint arXiv:1708.00630 (2017).
  • San Miguel et al. (2018) Joshua San Miguel, Karthik Ganesan, Mario Badr, and Natalie Enright Jerger. 2018. The EH model: Analytical exploration of energy-harvesting architectures. IEEE Computer Architecture Letters 17, 1 (2018), 76–79.
  • Shao et al. (2018) Chong Shao, Bashima Islam, and Shahriar Nirjon. 2018. Marble: Mobile augmented reality using a distributed ble beacon infrastructure. In 2018 IEEE/ACM Third International Conference on Internet-of-Things Design and Implementation (IoTDI). IEEE, 60–71.
  • Sharma et al. (2018) Himanshu Sharma, Ahteshamul Haque, and Zainul Jaffery. 2018. Modeling and Optimisation of a Solar Energy Harvesting System for Wireless Sensor Network Nodes. Journal of Sensor and Actuator Networks 7, 3 (2018), 40.
  • Shi et al. (2009) Qinfeng Shi, James Petterson, Gideon Dror, John Langford, Alex Smola, and SVN Vishwanathan. 2009. Hash kernels for structured data. Journal of Machine Learning Research 10, Nov (2009), 2615–2637.
  • Shih and Liu (1992) W-K Shih and Jane W-S Liu. 1992. On-line scheduling of imprecise computations to minimize error. In Real-Time Systems Symposium, 1992. IEEE, 280–289.
  • Sorber et al. (2007) Jacob Sorber, Alexander Kostadinov, Matthew Garber, Matthew Brennan, Mark D Corner, and Emery D Berger. 2007. Eon: a language and runtime system for perpetual systems. In Proceedings of the 5th international conference on Embedded networked sensor systems. ACM, 161–174.
  • Srinivasan et al. (2008) Kannan Srinivasan, Maria A Kazandjieva, Saatvik Agarwal, and Philip Levis. 2008. The β-factor: measuring wireless link burstiness. In Proceedings of the 6th ACM conference on Embedded network sensor systems. ACM, 29–42.
  • Szegedy et al. (2017) Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi. 2017. Inception-v4, inception-resnet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence.
  • Szegedy et al. (2016) Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2818–2826.
  • Tucker (1966) Ledyard R Tucker. 1966. Some mathematical notes on three-mode factor analysis. Psychometrika 31, 3 (1966), 279–311.
  • Van Der Woude and Hicks (2016) Joel Van Der Woude and Matthew Hicks. 2016. Intermittent computation without hardware support or programmer intervention. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 17–32.
  • Wang et al. (2017) Min Wang, Baoyuan Liu, and Hassan Foroosh. 2017. Factorized Convolutional Neural Networks.. In ICCV Workshops. 545–553.
  • Xia et al. (2019) Stephen Xia, Daniel de Godoy, Bashima Islam, Md Tamzeed Islam, Shahriar Nirjon, Peter R Kinget, and Xiaofan Jiang. 2019. Improving Pedestrian Safety in Cities using Intelligent Wearable Systems. IEEE Internet of Things Journal (2019).
  • Xue et al. (2013) Jian Xue, Jinyu Li, and Yifan Gong. 2013. Restructuring of deep neural network acoustic models with singular value decomposition.. In Interspeech. 2365–2369.
  • Yang et al. (2016) Bo Yang, Xiao Fu, Nicholas D Sidiropoulos, and Mingyi Hong. 2016. Towards k-means-friendly spaces: Simultaneous deep learning and clustering. arXiv preprint arXiv:1610.04794 (2016).
  • Yao et al. (2018a) Shuochao Yao, Yiran Zhao, Huajie Shao, Shengzhong Liu, Dongxin Liu, Lu Su, and Tarek Abdelzaher. 2018a. FastDeepIoT: Towards Understanding and Optimizing Neural Network Execution Time on Mobile and Embedded Devices. arXiv preprint arXiv:1809.06970 (2018).
  • Yao et al. (2018b) Shuochao Yao, Yiran Zhao, Huajie Shao, Aston Zhang, Chao Zhang, Shen Li, and Tarek Abdelzaher. 2018b. Rdeepsense: Reliable deep mobile computing models with uncertainty estimations. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1, 4 (2018), 173.
  • Yao et al. (2018c) Shuochao Yao, Yiran Zhao, Huajie Shao, Chao Zhang, Aston Zhang, Dongxin Liu, Shengzhong Liu, Lu Su, and Tarek Abdelzaher. 2018c. Apdeepsense: Deep learning uncertainty estimation without the pain for iot applications. In 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS). IEEE, 334–343.
  • Yao et al. (2017) Shuochao Yao, Yiran Zhao, Aston Zhang, Lu Su, and Tarek Abdelzaher. 2017. Deepiot: Compressing deep neural network structures for sensing systems with a compressor-critic framework. In Proceedings of the 15th ACM Conference on Embedded Network Sensor Systems. ACM, 4.
  • Yıldırım et al. (2018) Kasım Sinan Yıldırım, Amjad Yousef Majid, Dimitris Patoukas, Koen Schaper, Przemyslaw Pawelczak, and Josiah Hester. 2018. Ink: Reactive kernel for tiny batteryless sensors. In Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems. ACM, 41–53.
  • Yuan and Nahrstedt (2003) Wanghong Yuan and Klara Nahrstedt. 2003. Energy-efficient soft real-time CPU scheduling for mobile multimedia systems. In ACM SIGOPS Operating Systems Review, Vol. 37. ACM, 149–163.
  • Zhang and Sabuncu (2018) Zhilu Zhang and Mert Sabuncu. 2018. Generalized cross entropy loss for training deep neural networks with noisy labels. In Advances in Neural Information Processing Systems. 8778–8788.
  • Zhou et al. (2018) Husheng Zhou, Soroush Bateni, and Cong Liu. 2018. S^3DNN: Supervised Streaming and Scheduling for GPU-Accelerated Real-Time DNN Workloads. In 2018 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS). IEEE, 190–201.
  • Zhu et al. (2012) Ting Zhu, Abedelaziz Mohaisen, Yi Ping, and Don Towsley. 2012. DEOS: Dynamic energy-oriented scheduling for sustainable wireless sensor networks. In INFOCOM, 2012 Proceedings IEEE. IEEE, 2363–2371.