Internet of Things (IoT) has attracted much research interest in recent years. It is predicted that there will be around 50 billion IoT devices by 2020. In IoT applications, it is common to collect a large amount of information from distributed sensors and perform a complex task, e.g., executing a Machine Learning (ML) inference model over the collected data. The key requirements for IoT solutions include accuracy, latency and power consumption.
Currently, there are two main paradigms for IoT solutions, namely, fully decentralized and cloud-based. In fully decentralized solutions, sensors are equipped with processing units and can perform basic inference tasks. To reduce computational complexity, low-complexity ML models have been investigated in the literature. Quantized CNN reduces computational complexity by quantizing the filter weights of convolutional layers. Huang and Wang propose a pruning mechanism that removes unnecessary connections and/or layers of a deep neural network. BlockDrop focuses on residual networks (ResNet) and suggests dropping unnecessary residual blocks of a ResNet for a given input image. The authors show that it is possible to achieve almost the same accuracy as a complete ResNet model even though not all residual blocks are used. In cloud-based solutions, sensor data are collected at a cloud node (CN) for centralized processing. Cloud-based solutions are attractive because they are generally not limited by computation resources and can take advantage of the global view of all sensors for inference. However, such solutions suffer from high communication overhead when transferring data from distributed sensors, and potentially from high latency.
To mitigate these problems in cloud-based solutions, Teerapittayanon et al., building upon their earlier work on BranchyNet, propose a distributed deep neural network (DDNN) architecture for cloud-based ML. In a DDNN, before sending data to the CN, sensors first send their data to an edge device, called the local aggregator (LA), which checks whether it can perform the task itself. If the LA succeeds, this saves both time and communication compared with sending the data to the cloud. The authors show that the scheme reduces latency and communication cost while achieving almost the same accuracy. If the LA does not succeed, sensors send their data to the cloud for further processing, which incurs high communication overhead.
In this paper, we consider the same scenario as DDNN, in which many sensors collect data and transmit them to the CN for processing (e.g., for object classification). The LA resides on the local area network that connects the sensors, so transmissions from the sensors to the LA incur low communication overhead. As in DDNN, the LA may have sufficient computation capacity to perform the early stages of inference tasks. However, when the LA has insufficient confidence in its inference result, sensors should send their data to the CN. In such cases, DDNN has all sensors send their data to the CN for further processing. In this work, instead of transmitting all sensor data to the CN, we propose SensorDrop, a sensor selection approach that transmits only relevant data from a subset of sensors. SensorDrop is motivated by the observation that distributed sensor data tend to have high spatial correlation. Take object recognition using a multi-view multi-camera network as an example. An object can appear in the fields of view (FoV) of multiple cameras concurrently. For object recognition, it may suffice to use the inputs from one or a subset of these cameras, which reduces communication costs by transmitting less data to the CN.
It is non-trivial to identify a suitable subset of sensors to transmit data without expert knowledge. Such decisions are clearly input dependent, since both the correlation among sensor data and the informativeness of each sensor for the target task vary with the type and location of the events of interest. For instance, in the multi-view multi-camera object recognition application discussed earlier, even among cameras that have a common object in their FoV, the size and clarity of the object generally differ from one camera to another. To determine which sensors should transmit their data, SensorDrop uses a reinforcement learning-based approach to identify useful sensor data. We adopt the Advantage Actor-Critic (A2C) reinforcement learning method and devise a reward function that takes into account both the accuracy of the inference task and the communication cost. Evaluations on a multi-view multi-camera dataset show that SensorDrop outperforms baseline methods in communication cost with only minor degradation in inference accuracy. Specifically, with over 74% reduction in communication costs, the inference accuracy degrades by only about 10%. We further conduct extensive experiments to investigate the impact of parameter settings on the trade-off between accuracy and communication costs.
The remainder of the paper is organized as follows. In Section II, we review the basic concept of A2C reinforcement learning. Section III presents the system model and the problem definition. The proposed A2C SensorDrop controller is discussed in Section IV. Section V describes the dataset and the experimental setup, and demonstrates the effectiveness of the proposed scheme in balancing inference accuracy and communication overhead. Finally, Section VI concludes the paper.
II Background - Advantage Actor-Critic Based Reinforcement Learning Method (A2C)
Reinforcement learning approaches are used in many problems that involve mapping situations to actions so as to maximize a reward signal. Unlike supervised ML methods, the learner (agent) is not told which actions to take; instead, it explores different actions to maximize the feedback it receives from its environment. In addition to the agent and the environment, the other major components of reinforcement learning are the policy, the reward and value functions. A policy is the rule the agent uses to decide which actions to take. The reward is a signal that indicates how good or bad the agent's actions are for the environment. The agent aims to maximize the cumulative reward. Value functions estimate the value of a state, i.e., the expected return of being in a given state.
Actor-critic approaches are one family of RL algorithms used in a variety of applications. A general structure of these algorithms is shown in Fig. 1. The actor network decides which actions should be taken, and the critic network evaluates the actor's actions by observing the resulting new state and the corresponding reward.
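The actor and critic parameters are denoted by $\theta$ and $w$, respectively. The actor gradually learns the optimal policy, while the critic learns to estimate the Q-function from the observed reward $r_t$. Mathematically, let $V(s_{t+1})$ be the value function of the next state and $\gamma$ be the constant discount factor. The Q-function can be written as:

$$Q(s_t, a_t) = \mathbb{E}\left[r_t + \gamma V(s_{t+1})\right].$$

$Q(s_t, a_t)$ can be further decomposed into the state-value function $V(s_t)$ and the advantage value $A(s_t, a_t)$. The advantage measures how much better a certain action is than the other actions, on average, in a given state, i.e., $Q(s_t, a_t) = V(s_t) + A(s_t, a_t)$. Equivalently, we have:

$$A(s_t, a_t) = Q(s_t, a_t) - V(s_t) = \mathbb{E}\left[r_t + \gamma V(s_{t+1})\right] - V(s_t).$$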
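To make these relations concrete, the following Python sketch computes the one-step estimate $Q(s_t, a_t) \approx r_t + \gamma V(s_{t+1})$, the corresponding advantage $A = Q - V(s_t)$, and the per-transition actor and critic losses commonly used in A2C. It is a minimal, framework-free illustration under our own assumptions (scalar values, a single transition, hypothetical function names), not the training code used in this work. Since each SensorDrop decision concerns a single set of sensor readings, one plausible setup treats every decision as a one-step episode (`done=True`), in which case the target reduces to the immediate reward.

```python
import math


def td_target(reward: float, next_value: float, gamma: float, done: bool = False) -> float:
    """One-step estimate of Q(s_t, a_t) = r_t + gamma * V(s_{t+1}).

    For a terminal transition there is no next state, so the target is just r_t.
    """
    return reward if done else reward + gamma * next_value


def advantage(reward: float, value: float, next_value: float,
              gamma: float, done: bool = False) -> float:
    """A(s_t, a_t) = Q(s_t, a_t) - V(s_t), with Q replaced by its one-step estimate."""
    return td_target(reward, next_value, gamma, done) - value


def a2c_losses(log_prob: float, reward: float, value: float, next_value: float,
               gamma: float, done: bool = False) -> tuple:
    """Per-transition (actor_loss, critic_loss) as commonly used in A2C.

    log_prob is log pi(a_t | s_t; theta) for the action actually taken.
    The advantage is treated as a constant in the actor loss; in an autodiff
    framework it would be detached so that it does not update the critic.
    """
    adv = advantage(reward, value, next_value, gamma, done)
    actor_loss = -log_prob * adv   # minimizing this raises the log-prob of better-than-expected actions
    critic_loss = adv ** 2         # squared TD error; pulls V(s_t) toward the target
    return actor_loss, critic_loss


if __name__ == "__main__":
    # Toy numbers, not taken from the experiments.
    print(advantage(reward=1.0, value=0.5, next_value=0.8, gamma=0.9))
    print(a2c_losses(math.log(0.5), 1.0, 0.5, 0.8, 0.9))
```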
III System Model and Problem Definition
Similar to DDNN, we consider a system architecture that consists of three main parts: 1) end devices (sensors), 2) an LA on the same network as the sensors, and 3) a CN in the cloud. Fig. 2 shows the overall structure of the model.
The end devices include sensors that collect information from their environment. As in DDNN, we assume that sensors are low-cost and capable of performing simple tasks, such as a limited number of convolutional neural network (CNN) layers. To further reduce complexity, only quantized CNNs are used at the sensors.
An LA is an edge device on the same network as the sensors. It can collect sensor data at a much lower communication overhead than transferring the data to the cloud, and it can perform simple control tasks. LA nodes have lower computation capability than cloud nodes.
The CN is a node located in the cloud. It has high computational capabilities and can perform complex ML tasks with significant data processing requirements.
The network is deployed to collect data from distributed sensors and perform a complicated ML task (e.g., object recognition and tracking). It is assumed that sensors are not able to execute the complete ML operations individually (either because they do not have sufficient processing power or the ML task requires data from many sensors).
The system architecture is well suited for IoT applications. However, one main disadvantage is the large communication cost of sending data from all sensors to the CN. To reduce this cost, we take advantage of the correlations among data from different sensors as well as the processing power at the LA: the LA selectively instructs sensors to send their data to the CN.
Data selection at the LA can greatly reduce the communication overhead. However, the LA has no prior knowledge of the degree of redundancy among sensors or of how relevant a sensor's data is to the intended ML task at the CN. Reinforcement learning methods provide a framework for learning and decision making in the presence of such uncertainty.
In this study, we utilize a neural network (NN) located at the LA to act as a controller and decide which sensor data should be transmitted. The approach is particularly suitable in the following scenarios: a) data collected by individual sensors (e.g., an image from one of the security cameras) may not contain much information for the task performed at the cloud. For instance, the image of one camera is blurry or no object is visible in that image; b) multiple sensors report very similar data (e.g., several security cameras having similar views at the same time), and thus it suffices to send only a subset of them to the CN. This situation is common in IoT networks due to sparsity and locality of spatial data.
It is worth mentioning that, unlike DDNN, no further processing of the sensor data is performed at the LA; rather, the LA makes binary decisions on whether data from a sensor should be sent to the cloud or not. In fact, our approach is orthogonal and complementary to DDNN, i.e., it is possible to have an LA that first makes an inference locally and, if it is not successful, performs the sensor-selection procedure to selectively send sensor data to the CN.
IV Proposed Scheme
We aim to design a controller at the LA that, given some representation of the sensors' information, can determine the utility of forwarding a sensor's data to the CN. Consider a network of $N$ sensors. Let $x_i$ be the raw data collected by sensor $i$ ($i = 1, \dots, N$). The sensor may pass the raw information to the CN or, if it has enough computational power, perform some initial processing on it. In general form, we use $y_i = f_i(x_i)$, $i = 1, \dots, N$, to denote the data that sensor $i$ transmits to the CN after such a processing step (if the sensor is very simple and cannot perform any processing, $f_i(\cdot)$ could be the identity function). If transmitted, $y_i$ is used by the CN to perform the target ML task.
Each sensor also has another output toward the LA, which the LA uses to decide which sensors should send data to the CN. This output is denoted by $z_i = g_i(x_i)$, $i = 1, \dots, N$. Since the LA does not perform the actual ML task on the data and is responsible only for sensor selection, $z_i$ is usually a low-dimensional representation (e.g., a lower-resolution image) of $x_i$. Note that $g_i(\cdot)$ could also be the identity function, but this is not a good choice: it means sending all raw sensor data to the LA, which does not need that level of detail.
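As an illustration of the two sensor outputs $y_i = f_i(x_i)$ and $z_i = g_i(x_i)$, the sketch below assumes a simple sensor whose $f_i(\cdot)$ is the identity and whose $g_i(\cdot)$ downsamples a grayscale image by non-overlapping average pooling. Both functions and the list-of-lists image format are hypothetical choices made for this example only.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values (assumed format)


def f_identity(x: Image) -> Image:
    """Hypothetical f_i for a sensor that cannot process data: y_i is a copy of x_i."""
    return [row[:] for row in x]


def g_downsample(x: Image, factor: int = 2) -> Image:
    """Hypothetical g_i: low-resolution z_i via non-overlapping average pooling.

    Trailing rows/columns that do not fill a whole factor x factor block are ignored.
    """
    height, width = len(x), len(x[0])
    z = []
    for r in range(0, height - height % factor, factor):
        row = []
        for c in range(0, width - width % factor, factor):
            block = [x[r + i][c + j] for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        z.append(row)
    return z


def sensor_outputs(x: Image) -> Tuple[Image, Image]:
    """Return (y_i, z_i): z_i always goes to the LA; y_i goes to the CN only if selected."""
    return f_identity(x), g_downsample(x)


if __name__ == "__main__":
    x = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    y, z = sensor_outputs(x)
    print(z)  # [[2.5, 4.5], [10.5, 12.5]]
```

The LA thus works on a quarter of the pixels here, while the full-resolution $y_i$ is only sent onward when the controller selects the sensor.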
The controller (located at the LA) should be trained to observe the received $z_i$ and then decide, based on a metric, which sensors should transmit their data. Since inference accuracy is important, the metric should preserve accuracy while reducing the overall communication overhead. Clearly, there is a trade-off between these two factors, and an appropriate selection metric should be defined based on the requirements of each application.
For RL modeling, we consider the combination of the sensors and the CN as the environment. In this environment, each sensor collects data and sends $z_i$, $i = 1, \dots, N$, to the LA. The set of $z_i$ received from all sensors constitutes the state of the environment, $s = \{z_1, \dots, z_N\}$.
Based on the network state, the RL agent determines the appropriate action. In our model, the action space is the set of $N$-dimensional binary vectors $a \in \{0,1\}^N$ that indicate which sensors should send their data to the CN and which should not; i.e., the $i$th element $a_i$ determines whether the $i$th sensor's data is transmitted ($a_i = 1$) or dropped ($a_i = 0$). The action is given by $a = \pi(s; \theta)$, where $\pi(\cdot\,; \theta)$ is the function implemented by the agent's neural network for environment state $s$, and $\theta$ denotes the parameters of the actor network, which are set during training.
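One common way to realize a policy over $N$-dimensional binary actions, assumed here rather than taken from the paper, is to let the actor output one logit per sensor and sample each $a_i$ independently from a Bernoulli distribution. The hypothetical helpers below sample such an action, return its log-probability (the quantity the actor loss needs), and provide a deterministic variant for test time.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-u))


def sample_action(logits: Sequence[float], rng: random.Random) -> Tuple[List[int], float]:
    """Sample a in {0,1}^N with independent a_i ~ Bernoulli(sigmoid(logit_i)).

    a_i = 1: sensor i transmits y_i to the CN; a_i = 0: its data is dropped.
    Also returns log pi(a | s), the sum of the per-sensor log-probabilities.
    """
    action, log_prob = [], 0.0
    for u in logits:
        p = sigmoid(u)
        a = 1 if rng.random() < p else 0
        action.append(a)
        log_prob += math.log(p) if a == 1 else math.log1p(-p)
    return action, log_prob


def greedy_action(logits: Sequence[float]) -> List[int]:
    """Deterministic test-time decision: transmit iff sigmoid(logit) > 0.5, i.e. logit > 0."""
    return [1 if u > 0 else 0 for u in logits]


if __name__ == "__main__":
    # Hypothetical actor outputs for six cameras.
    logits = [1.5, -2.0, 0.3, -0.7, 0.0, 2.2]
    print(sample_action(logits, random.Random(42)))
    print(greedy_action(logits))
```

A factorized Bernoulli policy keeps the actor's output size linear in $N$, whereas a categorical distribution over all $2^N$ subsets would grow exponentially.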
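After receiving data from a subset of the sensors, the CN performs the desired ML task. The CN first computes $\bar{y}$ as the average of the received information,

$$\bar{y} = \frac{1}{K} \sum_{i=1}^{N} a_i \, y_i, \qquad K = \sum_{i=1}^{N} a_i, \tag{6}$$

where $a_i \in \{0,1\}$ is the $i$th element of the action vector and $K$ is the number of selected sensors.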
The averaging in (6) can be replaced according to the requirements of the application; the only requirement is that the input dimension of the CN network stays constant regardless of the number of selected sensors. With averaging, for example, the CN neural network does not need to know how many sensors are selected. The resulting $\bar{y}$ is then passed through the trained CN neural network $h(\cdot)$ for classification, $\hat{c} = h(\bar{y})$. Finally, the reward of the action is determined based on the inference accuracy and the communication overhead. The RL agent uses this reward to make better decisions in future steps.
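Equation (6) can be sketched as a masked average over the transmitted sensor outputs. Here each $y_i$ is flattened to a list of floats (an assumption for brevity), and the case in which no sensor is selected, which the text does not address, simply raises an error in this sketch.

```python
from typing import List, Sequence


def aggregate_at_cn(ys: Sequence[Sequence[float]], action: Sequence[int]) -> List[float]:
    """Average of the transmitted y_i as in (6): (1/K) * sum_i a_i * y_i with K = sum_i a_i.

    The result has the same length as each y_i regardless of how many sensors
    transmitted, so the CN classifier always sees a fixed input dimension.
    """
    k = sum(action)
    if k == 0:
        # Not covered by (6); raising is an assumption made for this sketch.
        raise ValueError("no sensor selected: the average in (6) is undefined for K = 0")
    dim = len(ys[0])
    total = [0.0] * dim
    for y, a in zip(ys, action):
        if a:
            for j in range(dim):
                total[j] += y[j]
    return [t / k for t in total]


if __name__ == "__main__":
    ys = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # toy y_i from three sensors
    print(aggregate_at_cn(ys, [1, 0, 1]))      # [3.0, 4.0]
```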
The policy we seek should capture which types of data are useful for the ML task. The critic network measures the quality of the controller's decision based on the classification result at the CN, updates its parameters, and provides feedback to the actor about the effect of the action taken. The actor uses this feedback to update its network weights.
To fully specify the A2C RL algorithm, we need to define a reward function that measures how good an action is. Here we define a sample reward function that accounts for both accuracy and communication overhead.
where $K$ is the number of selected sensors (the number of ones in the action vector $a$). In (7), by setting its two parameters to different positive values, we can strike a balance between reducing communication overhead and achieving high inference accuracy. If the CN misclassifies, a negative reward is returned to the RL agent; otherwise, a positive reward is generated. This positive reward is larger when fewer sensors have transmitted their data to the CN, i.e., when the communication overhead is lower.
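The sketch below uses one possible reward with the qualitative behavior described above: a fixed penalty on misclassification and a positive reward that grows as fewer sensors transmit. The functional form and the names `alpha` and `beta` are our assumptions for illustration; they are not necessarily the exact form of (7).

```python
def reward_sketch(correct: bool, k: int, n: int, alpha: float, beta: float) -> float:
    """Hypothetical reward with the behaviour described for (7); not its exact form.

    correct: whether the CN classified the sample correctly
    k:       number of selected sensors (ones in the action vector), 1 <= k <= n
    n:       total number of sensors
    alpha, beta: positive weights; -beta is the penalty for a misclassification
    """
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if not correct:
        return -beta
    # Positive for every k, and largest when only one sensor transmits.
    return alpha * (n - k + 1) / n


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, reward_sketch(True, k, 6, alpha=1.0, beta=2.0))
```

Raising `beta` relative to `alpha` makes errors costlier, which pushes the agent toward keeping more sensors; raising `alpha` makes each dropped sensor more valuable.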
Fig. 3 provides the detailed network structure of the proposed scheme. Note that the specific operations at the sensors and the neural network architecture at the CN are tailored to the particular application described in Section V. For other applications, the sensor and CN neural networks can be modified as needed; this choice is independent of the proposed RL-based sensor-dropping mechanism.
In this section, we present the implementation and evaluation of the proposed approach.
We use the same dataset as DDNN to evaluate the proposed method. Roig et al. first introduced this multi-view multi-camera dataset, which contains multiple views of the same objects captured simultaneously by 6 cameras. The DDNN authors further cleaned up this dataset. As in DDNN, the dataset is split into 680 training samples and 171 test samples. It contains four classes: car, bus, person, and images that contain none of these objects.
V-B Implementation details
Neural Network Structures: The detailed network setup of SensorDrop for multi-view multi-camera image classification is provided in Fig. 3. Following the terminology of DDNN, the basic processing block of our network is called ConvP, which consists of a convolutional layer followed by a max-pooling layer.
We adopt a network topology and neural network structures at the sensors and the CN similar to those in DDNN, i.e., 6 cameras (sensors) collect their images ($x_i$), and $f_i(\cdot)$ is a ConvP layer following the binary neural network (BNN) architecture. In this setup, the LA input $z_i$ is defined as the average over all channels of the sensor data.
The actor and critic neural networks have similar structures, each consisting of two consecutive ConvP layers and a fully-connected layer. The CN takes the selected $y_i$ and computes their average. Two convolutional layers are then applied, followed by a max-pooling layer and a fully-connected layer with 4 outputs, one for each of the four labels: bus, person, car, and images with none of these objects. The exact size of each layer is given in Fig. 3.
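A ConvP block (convolution followed by max pooling) can be sketched in plain Python for a single channel. The sketch also includes deterministic sign binarization of the kernel in the style of BinaryConnect, where each weight becomes +1 or -1; real BNN layers additionally handle multiple channels, normalization, and training-time details that are omitted here. Kernel size, pooling size, and function names are assumptions for illustration, not the layer sizes from Fig. 3.

```python
from typing import List

Matrix = List[List[float]]


def binarize(kernel: Matrix) -> Matrix:
    """Deterministic BinaryConnect-style binarization: +1 if w >= 0 else -1."""
    return [[1.0 if w >= 0 else -1.0 for w in row] for row in kernel]


def conv2d_valid(x: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 'valid' 2-D convolution (cross-correlation, as in most DL libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def max_pool2d(x: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; incomplete windows at the border are dropped."""
    return [[max(x[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(x[0]) - size + 1, size)]
            for r in range(0, len(x) - size + 1, size)]


def convp(x: Matrix, kernel: Matrix, pool: int = 2, binary: bool = False) -> Matrix:
    """Hypothetical single-channel ConvP block: convolution, then max pooling."""
    k = binarize(kernel) if binary else kernel
    return max_pool2d(conv2d_valid(x, k), pool)


if __name__ == "__main__":
    x = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    print(convp(x, [[1, 0], [0, 1]]))  # [[12]]
```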
Training: Training of the RL agent depends on its environment and on the reward it receives for its actions. Therefore, before training the RL agent, the environment must be completely set up, i.e., the neural networks at the sensors and the CN must already be trained. In our experiments, these networks are trained with a standard supervised method, assuming that all sensors send their data to the CN. In this stage, the Adam optimizer is used with a learning rate of 0.001 and a batch size of 50.
V-C Experimental Results
We evaluate the inference accuracy of the system and the reduction in communication overhead.
Fig. 4 shows the reward, accuracy, and communication overhead over training iterations. In the plots, accuracy and communication cost are normalized by the baseline values without SensorDrop. The reward and the communication overhead generally improve with more training iterations, and the higher rewards are achieved at the expense of a small decrease in accuracy. As can also be seen, at some points during training the accuracy decreases because sensors are incorrectly dropped, but the algorithm corrects its decisions in subsequent iterations.
The bottom plot of Fig. 4 shows how effective SensorDrop is at reducing communication overhead while still maintaining accurate prediction. As the RL agent sees more (action, reward) tuples, it gradually learns when to drop sensors to reduce overhead without substantially affecting accuracy.
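Table I: Test performance on the 171 held-out samples (communication overhead normalized by the Baseline).

| Method | Comm. overhead | Reward |
|---|---|---|
| SensorDrop - reward (7) | 25.8% | 0.84 |
| SensorDrop - reward (8) | 35.2% | – |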
Table I summarizes the test performance of SensorDrop on the 171 held-out samples that were not seen during training. The first line is the result of the proposed method with (7) as the reward function. Since we are not aware of any similar competing technique in the literature for dropping sensor information, for comparison we report the results of two naive methods, Baseline and RandomDrop. In the Baseline method, no sensor data is dropped. In RandomDrop, to reduce communication overhead, each sensor randomly decides to send its data with a fixed probability, which in this experiment is chosen to achieve a target reduction in communication overhead. The last line of Table I presents the test performance of SensorDrop with another reward function, discussed in Section V-C2.
In Table I, the communication overhead of all schemes is normalized by that of the Baseline. The Baseline has the highest communication overhead and achieves the highest accuracy, since all sensor data are transmitted.
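The Baseline and RandomDrop comparisons can be reproduced in miniature with the sketch below: `random_drop` lets each sensor transmit independently with probability `p`, and `normalized_overhead` divides the number of transmitted readings by what the no-drop Baseline would send, assuming every $y_i$ has the same size. The value `p = 0.3` in the usage example is arbitrary and not the setting used in Table I. With a finite number of test samples, the measured overhead fluctuates around `p`.

```python
import random
from typing import List, Sequence


def random_drop(n_sensors: int, p: float, rng: random.Random) -> List[int]:
    """RandomDrop: every sensor independently transmits its data with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n_sensors)]


def normalized_overhead(actions: Sequence[Sequence[int]]) -> float:
    """Transmitted readings divided by what the no-drop Baseline would transmit.

    Assumes every y_i has the same size, so overhead is proportional to the number of ones.
    """
    sent = sum(sum(a) for a in actions)
    baseline = sum(len(a) for a in actions)
    return sent / baseline


if __name__ == "__main__":
    rng = random.Random(0)
    # 171 test samples and 6 cameras as in Section V; p = 0.3 is arbitrary.
    actions = [random_drop(6, 0.3, rng) for _ in range(171)]
    print(f"RandomDrop overhead: {normalized_overhead(actions):.3f}")  # near, not exactly, 0.3
```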
RandomDrop achieves lower communication overhead at the cost of lower classification accuracy. An important property of RandomDrop is that its classification accuracy is not stable: it is sometimes higher and sometimes lower from one run to another. This is because random dropping sometimes happens to keep the useful data, yielding high accuracy, and sometimes drops important images, leading to poor classification results. The measured communication overhead of RandomDrop also deviates slightly from its expected value because of the limited number of test samples; with more test samples, it would be closer to the expected value.
Among all schemes, SensorDrop has the lowest communication overhead, with over 74% reduction in data transmission compared with the Baseline approach. Furthermore, since sensor selection is based on a well-trained controller, the accuracy of the scheme degrades by only about 10%.
To further understand the effectiveness of SensorDrop, Fig. 5 depicts the average contribution of each sensor (camera), i.e., how often each sensor transmits its data. Most sensors send only a small percentage of the data they collect, with the exception of camera 3, which contributes the most data.
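The per-camera contributions in Fig. 5 amount to counting how often each entry of the action vector is 1 over the test set. The sketch below computes these fractions from a log of binary actions and flags rarely used cameras; the toy action log and the threshold are made up for illustration and are not experimental data.

```python
from typing import List, Sequence


def sensor_contribution(actions: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of samples in which each sensor was selected to transmit (cf. Fig. 5)."""
    n_sensors = len(actions[0])
    counts = [0] * n_sensors
    for action in actions:
        for i, a in enumerate(action):
            counts[i] += a
    return [c / len(actions) for c in counts]


def rarely_used(contribution: Sequence[float], threshold: float) -> List[int]:
    """1-based camera indices whose contribution falls below a (hypothetical) threshold."""
    return [i + 1 for i, c in enumerate(contribution) if c < threshold]


if __name__ == "__main__":
    # Toy action log for 6 cameras; made up for illustration, not experimental data.
    log = [[0, 0, 1, 0, 0, 1],
           [0, 0, 1, 0, 0, 0],
           [1, 0, 1, 0, 0, 1],
           [0, 0, 0, 0, 0, 1]]
    contrib = sensor_contribution(log)
    print(contrib)                    # [0.25, 0.0, 0.75, 0.0, 0.0, 0.75]
    print(rarely_used(contrib, 0.1))  # [2, 4, 5]
```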
The outputs of SensorDrop also provide insights into sensor deployment. For instance, in the security camera dataset, data from cameras 3 and 6 are important for correct inference, whereas cameras 4 and 5 contribute little. This suggests that cameras 4 and 5 are positioned in suboptimal locations, and we might decide to relocate them or shut them down to conserve resources.
To gain further insight into the behavior of the algorithm, Fig. 6 shows three samples, each containing the six images from the six cameras, together with the decisions made by the RL agent in SensorDrop. Two of the samples result in correct classification and one in incorrect classification.
In Fig. 6, the first row contains images from the different cameras when a person is in the view of 4 cameras but not in the view of the remaining 2 cameras (shown as blank images). As expected, both blank images are correctly dropped, and two of the non-blank images are selected by SensorDrop for transmission to the CN. The CN correctly classifies the target using these two images.
In the second row of Fig. 6, there are 6 images of a bus from different views. Clearly, there is redundancy among these images, so not all of them are expected to be needed for the inference task (i.e., to identify the presence of a bus). The RL agent acts reasonably by selecting the last two images, which results in correct classification.
The last row of Fig. 6 gives an example in which SensorDrop fails. Here, a pedestrian is in the view of only two of the six cameras. SensorDrop selects the input from a single camera, which results in a wrong classification at the CN. Note that such events are infrequent, as the overall accuracy of the model is high, as reported in Table I.
V-C2 Trade-off between Accuracy and communication overhead
As the final set of experiments, we explore the tradeoff between accuracy and communication overhead by adjusting the parameters in (7).
As discussed in Section IV, the suitable reward function is problem dependent and should be defined based on the requirements of the specific problem. We have already demonstrated the effectiveness of SensorDrop when (7) is used as the reward function. To show SensorDrop works with other rewards, we experiment with another simple reward function as,
where $K$ is the number of active sensors (the number of ones in the action vector $a$), and the weighting parameter is a real number that can be tuned to give more importance to accuracy or to communication overhead; a smaller value leads to a greater reduction in communication overhead, and vice versa. The other parameter is the negative reward the agent receives when a wrong prediction occurs.
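For illustration only, one reward with the properties described for (8) is a weighted combination of a correctness term and a communication-saving term, with a fixed negative reward on errors. Under this assumed form, lowering the weighting parameter increases the payoff for each dropped sensor, which pushes the agent toward lower overhead. The parameter names and the formula below are hypothetical and are not the exact definition of (8).

```python
def weighted_reward(correct: bool, k: int, n: int, weight: float, penalty: float) -> float:
    """Hypothetical reward in the spirit of (8); NOT its exact definition.

    correct: whether the CN classified the sample correctly
    k:       number of active sensors (ones in the action vector)
    n:       total number of sensors
    weight:  trade-off parameter in (0, 1); larger values emphasise accuracy
    penalty: negative reward returned on a wrong prediction
    """
    if not 0.0 < weight < 1.0:
        raise ValueError("weight must lie strictly between 0 and 1")
    if not correct:
        return penalty
    saving = 1.0 - k / n  # fraction of sensors that were dropped
    return weight * 1.0 + (1.0 - weight) * saving


if __name__ == "__main__":
    for w in (0.1, 0.5, 0.9):
        gain = weighted_reward(True, 1, 6, w, -1.0) - weighted_reward(True, 6, 6, w, -1.0)
        print(f"weight={w}: extra reward for using 1 instead of 6 sensors = {gain:.3f}")
```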
We trained SensorDrop with the new reward function (8). Figure 7 shows the convergence of SensorDrop; specifically, it plots the training accuracy and the communication overhead normalized by a baseline without dropping. The model learns to remove redundancy in the data and reduce the communication overhead while maintaining highly accurate object detection. The test performance of this setting is reported in the last line of Table I. The new reward function yields slightly higher accuracy than the reward function in (7), but it saves less communication: its normalized overhead is 35.2% versus 25.8% for (7), i.e., about 9.4 percentage points higher. We do not report a reward value for this case in the table, since it is computed with a different reward function and is therefore not comparable with the other rows of Table I.
Fig. 8 presents the results of Fig. 7 in a different way. Each dot shows the performance (accuracy and communication overhead) of the proposed method at a particular training iteration. The color of each dot indicates how long the network had been trained (from blue to red), i.e., blue dots represent the network performance in early training stages, and red dots correspond to the results after many iterations. The axes of Fig. 8 are the accuracy and the communication overhead of SensorDrop at that iteration. As can be seen, the model moves from high overhead to low overhead while maintaining high object-detection accuracy.
Lastly, we study how the weighting parameter in (8) affects the trade-off between accuracy and communication cost. In this set of experiments, the parameter varies from 0.1 to 0.9. The test accuracy and communication overhead are depicted in Fig. 9.
For large values of the parameter, the model learns to attain high accuracy but incurs a higher communication overhead. At the other end, small values lead to a significant reduction in communication overhead (to about 25%) at the expense of lower accuracy. Note that even though accuracy drops for small values, the reduction is moderate, since SensorDrop intelligently selects which sensors' data to drop.
In this paper, we investigated the problem of reducing the communication overhead from distributed sensors to a cloud node for complex inference tasks. We designed a controller that selects a subset of sensors to send data to the cloud while keeping the accuracy at an acceptable level. Considering the dynamics of the data collected at the sensors, we devised an Advantage Actor-Critic based RL scheme to train the controller. The performance of SensorDrop was evaluated in different settings using a real-world multi-view camera dataset. SensorDrop was shown to greatly outperform naive schemes such as no-drop and random drop in communication overhead, with only a minor loss in inference accuracy. We also demonstrated how the parameters of the RL reward function can be tuned to make an appropriate trade-off between accuracy and communication overhead.
- (2015) BinaryConnect: training deep neural networks with binary weights during propagations. In Advances in Neural Information Processing Systems, pp. 3123–3131.
- (2018) Big data challenges and trade-offs in energy efficient Internet of Things systems. In 26th International Conference on Software, Telecommunications and Computer Networks (SoftCOM), pp. 1–6.
- (2012) A survey of actor-critic reinforcement learning: standard and natural policy gradients. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42 (6), pp. 1291–1307.
- Data-driven sparse structure selection for deep neural networks. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 304–320.
- (2011) Conditional random fields for multi-camera object detection. In 2011 International Conference on Computer Vision, pp. 563–570.
- (2018) Reinforcement learning: an introduction. MIT Press.
- (2017) Distributed deep neural networks over the cloud, the edge and end devices. In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pp. 328–339.
- BranchyNet: fast inference via early exiting from deep neural networks. In 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 2464–2469.
- (2016) Quantized convolutional neural networks for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4820–4828.
- (2018) BlockDrop: dynamic inference paths in residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8817–8826.