Video Surveillance (VS) technology has become a fundamental tool for public and private sector security, with applications such as traffic monitoring, indoor monitoring, and crime and violence detection [1, 2, 3]. Edge Artificial Intelligence (EAI) is a promising technology that combines Artificial Intelligence (AI), Internet of Things (IoT), and Edge Computing (EC) technologies [4, 5, 6]. Applying EAI technology in VS migrates computing workloads from the network center to the edge of the network, which reduces the heavy network communication overhead and provides real-time and accurate video analytics solutions. However, this work also faces several serious challenges: (1) how to synchronize distributed AI models in an EC environment; (2) how to design a feasible edge computing architecture for the VS system, taking into account large-scale monitoring terminals, massive video streams, and heavy network communication overhead; and (3) how to keep the workload balanced among edge nodes under the complicated scenarios of unbalanced connection of monitoring terminals and unbalanced computing capacities of edge nodes.
In the field of VS, existing AI and deep learning algorithms, such as Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs), are mainly used for static image analysis rather than image streaming and video analysis [13, 14]. Most current VS systems rely on traditional centralized or cloud-based solutions, which suffer from heavy data communication overhead, high latency, and severe packet loss [12, 3]. Existing studies have proposed various distributed AI and Deep Learning (DL) algorithms for distributed computing clusters and cloud computing platforms, such as distributed CNN, DNN, and LSTM [15, 9, 16]. However, there is still much room to explore distributed AI algorithms and VS systems in EC environments [4, 9, 17].
In this paper, we focus on intelligent video surveillance systems based on AI and EC technologies, propose a Distributed Intelligent Video Surveillance (DIVS) system using a distributed DL model, and deploy the DIVS system in an edge computing environment. The contributions of this paper are summarized as follows:
We establish a multi-layer edge computing architecture and a distributed DL training model for the DIVS system. It migrates computing workloads from the network center to network edges to reduce communication overhead and provide low-latency analysis solutions.
We provide task-level and model-level parallel training methods for the distributed DL model. In the task-level parallel, multiple DL sub-models with different structures are deployed on each edge node and different data analysis tasks are performed in parallel. In the model-level parallel, training processes of the CNN model are further parallelized on each edge node.
A model parameter updating method is proposed to realize the model synchronization of the global DL model on the EC platform with low communication cost.
Considering the unbalanced connection of monitor terminals and unbalanced computing capacities of edge nodes, we propose a dynamic data migration approach to improve the workload balance of the DIVS system.
The remainder of this paper is structured as follows. Section 2 reviews the related work. Section 3 establishes a multi-layer edge computing architecture and a distributed DL training model for the DIVS system. The implementation of the proposed DIVS system is described in Section 4. Experimental evaluation of the DIVS system is presented in Section 5. Finally, Section 6 concludes the paper.
II Related Work
Various distributed AI and DL algorithms have been proposed in distributed computing, cloud computing, fog computing, and edge computing environments to improve their performance and scalability [15, 16, 17, 18, 19, 20]. In our previous work, we proposed a two-layer parallel CNN training architecture in a distributed computing cluster. Li et al. discussed the application of Machine Learning (ML) in smart industry and introduced an efficient manufacture inspection system using fog computing. Diro et al. proposed a Long Short-Term Memory (LSTM) network for distributed attack detection in fog computing environments. Focusing on edge computing, Khelifi et al. discussed the applicability of merging DL models, such as CNN, RNN, and RL, in EC environments. Li et al. designed an offloading strategy to optimize the performance of IoT deep learning applications in EC environments.
A trunk-branch ensemble CNN platform was proposed for video-based face recognition, which can extract complementary information from holistic face images. In the VS of the public safety field, a temporally memorized similarity learning neural network was presented for person re-identification. In the field of traffic monitoring, Zhang et al. proposed a CNN-based vehicle detection and annotation algorithm that can identify vehicle positions and extract vehicle properties from video streams. However, most existing methods train the DL models on static images and are rarely used for video analysis.
To efficiently handle large-scale video datasets and improve the performance of VS systems, researchers have attempted to deploy VS systems in distributed computing, cloud computing, and edge computing environments [12, 17, 22]. Kavalionak et al. introduced a distributed protocol for a face recognition system, which exploits the computing power of the monitoring devices to perform person recognition. Yi et al. built a video analytics system on an EC platform that offloads computing tasks between monitoring devices and edge nodes and provides low-latency video analysis. Park et al. proposed a scalable architecture for an automatic surveillance system using edge computing to reduce cloud resource consumption and alleviate wireless network limitations.
III Proposed DIVS System Architecture
III-A Multi-layer EC structure of DIVS System
In the IoT and big data era, VS systems hold the characteristics of massive monitoring terminals, wide scope of monitoring, and endless video streams. At the same time, VS systems face increasing demands of accurate data analysis and low-latency response. We propose a distributed intelligent video surveillance system by combining the IoT, AI, and EC technologies. We establish a multi-layer edge computing platform for the DIVS system, providing flexible and scalable computing capabilities and effectively reducing network communication overhead. The DIVS system consists of a large number of high-definition monitoring devices, multi-layer edge nodes, a cloud server, and a distributed DL model. The proposed DIVS system architecture is shown in Fig. 1. The main components of the DIVS system are described as follows.
(1) Monitoring terminals. Monitoring Terminals (MTs, also called monitoring devices) are deployed at various positions of monitoring spots, such as roads, railway stations, airports, or parks. Each MT is equipped with a high-definition camera with a physical resolution of 1080P or 4K and adopts the H.264 or H.265 video coding standard. Each MT periodically submits the collected video dataset to the edge node to which it belongs by wired or wireless communication.
(2) Edge nodes and cloud server. Multi-layer Edge Nodes (ENs) are deployed in the DIVS system at different management levels, such as the streets, districts, and counties of a city. The first-level ENs are responsible for connecting the MTs of each monitoring spot. Each middle-level EN is connected to all low-level ENs within its jurisdiction and to the high-level EN to which it belongs. High-level ENs connect to the cloud server, which provides task scheduling, data management, and resource allocation.
(3) Distributed DL model for video analytics applications. The video management and analytics application is a core component of the DIVS system, which provides management functions and video analysis functions. In this work, we only focus on the deep learning models of video analytics.
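The layered structure of MTs, ENs, and the cloud server described above can be pictured as a small tree. The following Python sketch is only an illustration under assumed names (`EdgeNode`, `attach`, `path_to_cloud`, and the street and district labels are hypothetical); it is not the data structure used by the DIVS system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EdgeNode:
    """One edge node (EN); level 1 is the level closest to the MTs."""
    name: str
    level: int
    terminals: List[str] = field(default_factory=list)  # MT ids (first-level ENs only)
    children: List["EdgeNode"] = field(default_factory=list, repr=False)
    parent: Optional["EdgeNode"] = field(default=None, repr=False, compare=False)

    def attach(self, child: "EdgeNode") -> None:
        """Connect a lower-level EN to this EN."""
        child.parent = self
        self.children.append(child)

    def terminal_count(self) -> int:
        """Number of MTs served by this EN and by all ENs below it."""
        return len(self.terminals) + sum(c.terminal_count() for c in self.children)


def path_to_cloud(node: EdgeNode) -> List[str]:
    """Chain of node names from an EN up to the cloud server."""
    path = [node.name]
    while node.parent is not None:
        node = node.parent
        path.append(node.name)
    path.append("cloud")  # the highest-level EN connects to the cloud server
    return path


# Hypothetical two-level example: two street-level ENs under one district-level EN.
district = EdgeNode("district-1", level=2)
street_a = EdgeNode("street-a", level=1, terminals=["mt-1", "mt-2", "mt-3"])
street_b = EdgeNode("street-b", level=1, terminals=["mt-4"])
district.attach(street_a)
district.attach(street_b)

print(district.terminal_count())
print(path_to_cloud(street_a))
```

The uneven terminal lists in the example mirror the unbalanced MT connections that motivate the workload balancing discussed later in the paper.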
III-B EC-based Distributed Deep Learning Model
Most distributed deep learning solutions for big data applications are deployed on distributed cloud centers, which leads to substantial network communication overhead. In this work, we propose an EC-based distributed deep learning model for the DIVS system and deploy the distributed DL models on the edge nodes. The proposed distributed deep learning model based on edge computing is illustrated in Fig. 2.
The cloud server is responsible for task scheduling, data management, resource allocation, and data visualization. During the model training process, the cloud server monitors the training time costs on edge nodes and migrates datasets to achieve workload balancing.
As an input data source, each monitoring terminal submits surveillance video data to the corresponding edge node in a streaming manner. On the edge node, the video stream from each MT is divided into data frames in a sliding-time-window approach. Then, according to the data access requirements of separate DL sub-models, data blocks are further extracted from the data frames as an input of each DL sub-model. After obtaining the input dataset, each DL sub-model trains itself and updates the local weight parameters. The training process on each edge node is performed in parallel.
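The sliding-time-window division of a video stream can be sketched as follows. This is a minimal illustration, assuming that a stream is available as a sequence of decoded frames and that the window length and step are given in frames; the function names and the `every` sub-sampling rule for data blocks are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def sliding_windows(frames: Sequence, window: int, step: int) -> Iterator[Tuple[int, List]]:
    """Yield (start_index, frames_in_window) for each full window over the stream."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, len(frames) - window + 1, step):
        yield start, list(frames[start:start + window])


def extract_block(window_frames: List, every: int) -> List:
    """Assumed access rule of a sub-model: keep every `every`-th frame of a window."""
    return window_frames[::every]


# Integers stand in for decoded video frames.
stream = list(range(10))
for start, frames in sliding_windows(stream, window=4, step=2):
    # One sub-model might consume the whole window, another only a sub-sample.
    print(start, frames, extract_block(frames, every=2))
```

Because every sub-model reads from the same windows, the stream needs to be received by the edge node only once, regardless of how many sub-models consume it.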
IV Implementation of DIVS System
Based on the proposed DIVS system architecture, we implement the DIVS system and address the problems of parallel training, model synchronization, and workload balancing. We introduce task-level parallel and model-level parallel training methods in Section IV-A to further accelerate the video analysis process. Section IV-B presents a model parameter updating method to achieve model synchronization of the global DL model in a distributed EC environment. Section IV-C presents a dynamic data migration approach to address the workload imbalance of edge nodes.
IV-A Parallel Training of Distributed DL Model
Benefitting from the powerful and scalable computing capacities of edge nodes, we propose two parallel training approaches for the distributed DL model in the EC environment to accelerate the video analysis process of the DIVS system.
IV-A1 Task-level Parallel Training
In actual scenarios, multiple video analytics tasks are typically performed on the same video stream in a DIVS system. For example, many researchers have applied different deep learning algorithms to traffic monitoring, such as CNN models for vehicle classification [5, 24] and LSTM models for traffic flow prediction [25, 26]. Therefore, we propose a task-level parallel training method for the distributed DL model. We deploy multiple DL models with different structures (e.g., CNN and LSTM) on each edge node to perform different data analysis tasks in parallel. Each DL model is divided into multiple sub-models and allocated to the corresponding edge nodes. Taking the field of traffic monitoring as an example, we deploy two DL models to perform two video surveillance tasks: a CNN model for vehicle classification and an LSTM model for traffic flow prediction.
(1) CNN model for vehicle classification.
To classify all types of vehicles from the traffic monitoring video streams, we build a distributed CNN model in the DIVS system based on the existing work. The original traffic monitoring video stream is submitted from each MT to its EN and is divided into multiple video frames. Then, the CNN model extracts all vehicles from each video frame with a realistic and complex background and saves them as separate sub-images. An example of the CNN model structure for vehicle classification is illustrated in Fig. 3.
The structure of the CNN vehicle classification model is as follows. Each feature extractor unit consists of one convolutional layer and one pooling layer. This unit is repeated twice, and three fully-connected layers are stacked on top.
In the first feature extractor unit, the original image of each video frame is sent to a convolutional layer to obtain a feature map. We use a pooling layer to compress the features and determine whether the current area contains vehicles. In the second feature extractor unit, intermediate images containing vehicles are sent to the second convolutional layer to obtain a fine-grained feature map. From the second pooling layer, we extract each potential vehicle and send it as a separate input to the fully-connected layers. SVM classifiers and Bayesian networks are applied in the fully-connected layers to accurately classify all types of vehicles. Finally, the CNN vehicle classification model is copied and distributed to each edge node for parallel training. Due to limited space, we do not discuss the detailed process of the CNN model.
(2) LSTM model for traffic flow prediction.
In existing work, LSTM models have been applied in video surveillance, especially for traffic flow prediction [25, 26]. To make full use of the traffic monitoring video stream, we build a distributed LSTM model for the DIVS system based on the existing work. It predicts the traffic flow from the same input video stream as the CNN vehicle classification model. The LSTM model learns time series with dependencies and automatically evaluates the optimal time periods for time series prediction. An example of the LSTM model structure for traffic flow prediction is illustrated in Fig. 4.
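In this work, the LSTM traffic flow prediction model is composed of an input layer, a recurrent hidden layer, and an output layer. We express the monitoring video streams as a time series, denoted as $X = \{x_1, x_2, \ldots, x_T\}$, where $x_t$ is the video frame at the $t$-th time step. We calculate the memory cell $c_t$, which is a core module of the LSTM model, based on the input $x_t$. Three gates are employed to control the value of $c_t$: a forget gate $f_t$ to forget the current value of $c_t$, an input gate $i_t$ to read its input, and an output gate $o_t$ to output the value of $c_t$. In the standard LSTM form, the gates and cells of the LSTM model are defined as follows:

$$
\begin{aligned}
i_t &= \sigma(W_{ix} x_t + W_{ih} h_{t-1}),\\
f_t &= \sigma(W_{fx} x_t + W_{fh} h_{t-1}),\\
o_t &= \sigma(W_{ox} x_t + W_{oh} h_{t-1}),\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_{cx} x_t + W_{ch} h_{t-1}),\\
h_t &= o_t \odot \tanh(c_t),
\end{aligned}
\tag{1}
$$

where $\sigma$ is the sigmoid function, $\odot$ denotes element-wise multiplication, $h_t$ is the output of the hidden layer at time step $t$, and matrices $W$ are the weight parameters of the LSTM model. The LSTM traffic flow prediction model is copied and distributed to each edge node for parallel training. Due to limited space, we do not discuss the detailed process of the LSTM model. Readers can explore other DL models on each edge node to train on the monitoring video datasets.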
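As a rough illustration of how the three gates control the memory cell, the following Python sketch runs one step of a standard LSTM cell on scalar values. The scalar state, the omission of bias terms, and all names (`LSTMWeights`, `lstm_step`, the toy flow values) are assumptions made for illustration; they are not taken from the DIVS implementation.

```python
import math
from dataclasses import dataclass


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class LSTMWeights:
    """Input (x) and recurrent (h) weights for each gate and the candidate cell."""
    w_ix: float = 0.0
    w_ih: float = 0.0
    w_fx: float = 0.0
    w_fh: float = 0.0
    w_ox: float = 0.0
    w_oh: float = 0.0
    w_cx: float = 0.0
    w_ch: float = 0.0


def lstm_step(x_t: float, h_prev: float, c_prev: float, w: LSTMWeights):
    """One time step of a standard LSTM cell (scalar case, no biases)."""
    i_t = sigmoid(w.w_ix * x_t + w.w_ih * h_prev)    # input gate: how much new input to read
    f_t = sigmoid(w.w_fx * x_t + w.w_fh * h_prev)    # forget gate: how much old cell to keep
    o_t = sigmoid(w.w_ox * x_t + w.w_oh * h_prev)    # output gate: how much cell to expose
    g_t = math.tanh(w.w_cx * x_t + w.w_ch * h_prev)  # candidate cell value
    c_t = f_t * c_prev + i_t * g_t                   # updated memory cell
    h_t = o_t * math.tanh(c_t)                       # hidden output
    return h_t, c_t


# Toy run over a short series of normalized traffic-flow values (hypothetical data).
weights = LSTMWeights(w_ix=0.5, w_ih=0.1, w_fx=0.3, w_fh=0.2,
                      w_ox=0.4, w_oh=0.1, w_cx=0.8, w_ch=0.3)
h, c = 0.0, 0.0
for flow in [0.2, 0.5, 0.9]:
    h, c = lstm_step(flow, h, c, weights)
print(h, c)
```

A real model would use vectors and weight matrices per gate; the scalar case keeps the gate arithmetic visible.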
IV-A2 Model-level Parallel Training
Considering that the edge nodes are equipped with multi-core CPUs and have potential parallel computing power, we propose a model-level parallel training method to further accelerate the training process of each DL sub-model on each edge node. In this section, we use the CNN model as an example to introduce the parallelization of two important training processes, namely the convolutional layers and the fully-connected layers.
(1) Parallelization of convolutional layer.
A video frame is used as an input matrix $X$ of the CNN sub-model. In the convolutional layer, a filter parameter matrix $F$ is introduced to transform the input matrix $X$ into a feature map $A$ and extract the key features. We can partition $X$ into multiple convolution areas to perform the convolutional operation in parallel. Assuming that $(D_x, H_x, W_x)$ is the shape (i.e., depth, height, and width) of $X$ and $(D_f, H_f, W_f)$ is the shape of $F$, we can calculate the shape $(D_a, H_a, W_a)$ of $A$ as:

$$
H_a = \frac{H_x - H_f + 2P}{S} + 1, \qquad W_a = \frac{W_x - W_f + 2P}{S} + 1, \tag{2}
$$

where $P$ is the number of zero padding of $X$, $S$ is the stride of the convolutional operation, and $D_a$ equals the number of filters applied. Then, we extract each convolution area $X_{i,j}$ (the start and end rows and columns) of $X$ for each parallel task:

$$
X_{i,j} = X\left[\, iS : iS + H_f - 1,\; jS : jS + W_f - 1 \,\right], \qquad 0 \le i < H_a,\; 0 \le j < W_a, \tag{3}
$$

where rows and columns are indexed from 0 on the zero-padded input and both bounds are inclusive.

Each convolution area $X_{i,j}$ is convolved separately with the filter parameter matrix $F$ to get the result $a_{i,j}$, an element of the feature map $A$. Each element in the feature map is computed from the corresponding convolution area in $X$ and from $F$. Because different tasks access different convolution areas in $X$ simultaneously without updating their values, there is no data dependency among these tasks. An example of the parallel convolutional computation of each CNN sub-model is illustrated in Fig. 5 and the steps of this process are described in Algorithm 1.
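The decomposition of a convolutional layer into independent convolution areas can be sketched in Python as follows. This is a simplified illustration, not Algorithm 1 itself: it assumes a single-channel 2D input, a single filter, and a thread pool, and the function names (`output_shape`, `conv_area`, `parallel_conv2d`) are hypothetical. As is usual in CNNs, the "convolution" is computed without flipping the filter.

```python
from concurrent.futures import ThreadPoolExecutor


def output_shape(h_x, w_x, h_f, w_f, padding=0, stride=1):
    """Height and width of the feature map for a given padding and stride."""
    return ((h_x - h_f + 2 * padding) // stride + 1,
            (w_x - w_f + 2 * padding) // stride + 1)


def zero_pad(x, p):
    """Return a copy of the 2D list x surrounded by p rows/columns of zeros."""
    width = len(x[0]) + 2 * p
    border = [[0] * width for _ in range(p)]
    middle = [[0] * p + list(row) + [0] * p for row in x]
    return border + middle + [[0] * width for _ in range(p)]


def conv_area(i, j, h_f, w_f, stride):
    """(row_start, row_end, col_start, col_end) of area (i, j); ends are exclusive."""
    return (i * stride, i * stride + h_f, j * stride, j * stride + w_f)


def conv_element(x, f, area):
    """Multiply one convolution area with the filter and sum the products."""
    r0, r1, c0, c1 = area
    return sum(x[r][c] * f[r - r0][c - c0]
               for r in range(r0, r1) for c in range(c0, c1))


def parallel_conv2d(x, f, padding=0, stride=1, workers=4):
    """Compute the feature map with one independent task per output element."""
    xp = zero_pad(x, padding)
    h_f, w_f = len(f), len(f[0])
    h_a, w_a = output_shape(len(x), len(x[0]), h_f, w_f, padding, stride)
    areas = [conv_area(i, j, h_f, w_f, stride)
             for i in range(h_a) for j in range(w_a)]
    # Tasks only read xp and f, so they can run concurrently without locking.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flat = list(pool.map(lambda a: conv_element(xp, f, a), areas))
    return [flat[i * w_a:(i + 1) * w_a] for i in range(h_a)]


frame = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
print(parallel_conv2d(frame, kernel))
```

In CPython, threads do not speed up pure-Python arithmetic, so the sketch shows only the task decomposition; a practical implementation would rely on processes or native kernels running on the multi-core CPUs of the edge nodes.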
(2) Parallelization of fully-connected layer.
In a fully-connected network, neurons are arranged in layers, with multiple neurons in each layer. Each neuron in the $l$-th layer $L_l$ is connected to all the neurons in $L_{l-1}$, and the output of neurons in $L_{l-1}$ is the input of neurons in $L_l$. That is, data dependencies occur among neurons in $L_{l-1}$ and $L_l$. In contrast, there is no connection among neurons in the same layer, namely, there is no logical or data dependency between them. Therefore, the calculation process of neurons in the same layer can be executed in parallel.

Assume that there are $n_{l-1}$ neurons in the layer $L_{l-1}$ and $n_l$ neurons in $L_l$, and that $W^{(l)} = \{w_{ij}^{(l)}\}$ is the weight set for the neurons in $L_{l-1}$ and $L_l$. For each neuron $j$ in $L_l$, we calculate the output $y_j^{(l)}$, as defined in Eq. (4):

$$
y_j^{(l)} = f\left(\sum_{i=1}^{n_{l-1}} w_{ij}^{(l)}\, y_i^{(l-1)}\right), \tag{4}
$$

where $f(\cdot)$ is the activation function between $L_{l-1}$ and $L_l$. The computation tasks of each neuron in $L_l$ can be executed in parallel. An example of parallel training of the fully-connected network is illustrated in Fig. 6.
In Fig. 6 (a), we parallelize the computation tasks of each hidden layer into independent sub-tasks, one for each neuron in the layer. In addition, after the outputs of all neurons in the hidden layer are obtained, we also calculate each neuron in the output layer simultaneously. As shown in Fig. 6 (b), the computation tasks of the output layer are likewise decomposed into parallelizable sub-tasks.
The maximum parallelism degree of the entire fully-connected network is the neuron width of the entire fully-connected network, namely, the number of neurons in the layer having the largest number of neurons, as defined in Eq. (5):

$$
P_{\max} = \max_{l}\, n_l, \tag{5}
$$

where $n_l$ is the number of neurons in the layer $L_l$.
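The per-neuron parallelism of a fully-connected layer and the resulting maximum parallelism degree can be sketched as follows. The sketch assumes one weight row per neuron, no bias terms, and a thread pool; `neuron_output`, `layer_forward`, `max_parallelism`, and the ReLU activation are illustrative choices rather than details confirmed by the paper.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def relu(z: float) -> float:
    return max(0.0, z)


def neuron_output(inputs: Sequence[float], weights: Sequence[float],
                  activation: Callable[[float], float]) -> float:
    """Weighted sum of the previous layer's outputs, passed through the activation."""
    return activation(sum(w * x for w, x in zip(weights, inputs)))


def layer_forward(inputs: Sequence[float], weight_rows: Sequence[Sequence[float]],
                  activation: Callable[[float], float], workers: int = 4) -> List[float]:
    """Compute all neurons of one layer as independent tasks.

    Neurons in the same layer only read `inputs`, so there is no dependency
    between the tasks; the next layer must wait until this list is complete.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: neuron_output(inputs, row, activation),
                             weight_rows))


def max_parallelism(layer_widths: Sequence[int]) -> int:
    """Upper bound on concurrent neuron tasks: the width of the widest layer."""
    return max(layer_widths)


hidden = layer_forward([1.0, 2.0], [[1.0, 1.0], [1.0, -1.0], [0.5, 0.5]], relu)
output = layer_forward(hidden, [[1.0, 1.0, 1.0]], relu)
print(hidden, output, max_parallelism([2, 3, 1]))
```

Layers are still evaluated one after another, which is why the widest layer, not the total neuron count, bounds the useful number of parallel tasks.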
IV-B Weight Parameter Update and Model Synchronization
In actual scenarios, each edge node may connect to a different number of monitoring terminals due to the uneven deployment of monitoring terminals. In addition, edge nodes have different computing capacities due to their heterogeneity. These conditions lead to different workloads and training speeds among edge nodes of the same level, and further lead to synchronization problems during global weight updating. Therefore, we propose a weight parameter update and model synchronization method for the distributed DL model. The workflow of the proposed method is illustrated in Fig. 7.
Definition 1: Local weight set. The weight parameters of each DL sub-model on each edge node are defined as a local weight set. Each local weight set is trained on the local video frames and updated on a low-level edge node.
Definition 2: Global weight set. The weight parameters of the entire DL model are defined as a global weight set. The global weight set is updated on a high-level edge node by collecting the local weight sets of all of its sub-models.
We use a batch training approach to train the DL sub-model and update the corresponding local weight set. As shown in Fig. 7, the initial global weight set $W^{(0)}$ of a DL model is shared with all low-level ENs to train the corresponding sub-models. All edge nodes use $W^{(0)}$ to train the first sample of the first batch and get the corresponding outputs. Each edge node $e_i$ updates $W^{(0)}$ into a local weight set $w_i$ based on the output. Then, $w_i$ is used to train the second sample on $e_i$ and obtain an updated value of $w_i$. This step is repeated until all of the samples in the first batch on $e_i$ are trained.
After all edge nodes complete a batch of training, the latest local weight set $w_i$ trained on each edge node is aggregated at a high-level edge node to calculate a new version of the global weight set $W^{(1)}$. Given $M$ MTs and $N$ ENs in a DIVS system, the average number of MTs connected to each EN is $M/N$. Assume that, for each MT, the video stream per unit time is divided into $K$ video frames. The video frames on each edge node are further divided into multiple batches of training subsets; let $B$ be the number of batches. Assuming that video streams from $m_i$ MTs arrive at the $i$-th edge node $e_i$, the batch size on $e_i$ is calculated as $b_i = m_i K / B$, and the average number of samples in each training batch on an edge node is $\bar{b} = MK/(NB)$. Hence, we define the exponential value of the difference between the batch width of each edge node and the average batch width of all edge nodes as its contribution to the global weight set:

$$
q_i = \exp\left(b_i - \bar{b}\right). \tag{6}
$$

The global weight set $W^{(k)}$ for the $k$-th batch training is defined in Eq. (7), where $w_i^{(k)}$ is the updated local weight set of the DL sub-model on $e_i$, which is trained by the $k$-th batch training. After obtaining the updated global weight set $W^{(k)}$, the entire DL model is synchronized. Then, the high-level edge node shares $W^{(k)}$ with each edge node for the next batch of training. The above process is repeated until the weight set reaches a steady state. The steps of weight parameter update and model synchronization are described in Algorithm 2.
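One way to picture the aggregation step is a contribution-weighted average of the local weight sets. The Python sketch below is an assumption-laden illustration rather than Algorithm 2: the contribution is taken as the exponential of the difference between a node's batch size and the average batch size, as described in the text, while the normalization by the sum of contributions and all function names are assumptions of this sketch.

```python
import math
from typing import List, Sequence


def batch_sizes(streams_per_node: Sequence[int], frames_per_stream: int,
                num_batches: int) -> List[float]:
    """Samples per batch on each edge node: (streams * frames per stream) / batches."""
    return [m * frames_per_stream / num_batches for m in streams_per_node]


def contributions(sizes: Sequence[float]) -> List[float]:
    """Exponential of (node batch size - average batch size) for each node."""
    avg = sum(sizes) / len(sizes)
    return [math.exp(b - avg) for b in sizes]


def aggregate(local_weights: Sequence[Sequence[float]],
              contribs: Sequence[float]) -> List[float]:
    """Assumed aggregation rule: contribution-weighted average of local weight sets."""
    total = sum(contribs)
    dim = len(local_weights[0])
    return [sum(q * w[k] for q, w in zip(contribs, local_weights)) / total
            for k in range(dim)]


# Hypothetical example: three edge nodes serving 2, 3, and 4 video streams.
sizes = batch_sizes([2, 3, 4], frames_per_stream=10, num_batches=10)
q = contributions(sizes)
local = [[0.10, 0.50], [0.20, 0.40], [0.30, 0.30]]  # local weight sets after one batch
global_weights = aggregate(local, q)
print(sizes, q, global_weights)
```

Nodes that trained on larger batches pull the global weight set toward their local result. Since the exponential grows quickly, a practical implementation would probably scale the difference before exponentiating; that choice is outside what the text specifies.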
IV-C Dynamic Video Data Migration
As described in the previous section, edge nodes have different workloads and training speeds, which leads to synchronization problems during global weight updating. Therefore, we propose a Dynamic Data Migration (DDM) strategy to maximize the workload balance of the distributed EC system and minimize the synchronization waiting time in global weight updating.
Let $E = \{e_1, e_2, \ldots, e_N\}$ be the edge nodes in the DIVS system. Assume that there are $n_i$ video frames in the training subset on edge node $e_i$, and that $\bar{t}_i$ is the average time it takes for $e_i$ to complete a training process on a video frame in the current migration assessment period. The migration assessment period is defined as the time period between two migration operations. Hence, the time of an epoch of iteration training on $e_i$ is calculated as $T_i = n_i \times \bar{t}_i$.

Let $\bar{T} = \frac{1}{N}\sum_{i=1}^{N} T_i$ be the average time in which all edge nodes complete a training iteration. If $T_i > \bar{T}$, $e_i$ needs to migrate video frames out to reduce its training time. If $T_i < \bar{T}$, $e_i$ has available computing resources to receive some immigrated video frames. If $T_i = \bar{T}$, $e_i$ neither needs to migrate nor can receive immigrated video frames. The number of video frames on $e_i$ that need to be migrated is calculated in Eq. (8):

$$
\Delta n_i = \frac{T_i - \bar{T}}{\bar{t}_i}. \tag{8}
$$

The set of migration data is denoted as $\Delta N = \{\Delta n_1, \Delta n_2, \ldots, \Delta n_N\}$. $\Delta n_i > 0$ indicates the number of video frames to be migrated out of $e_i$, and we set $n_i^{out} = \Delta n_i$. $\Delta n_i < 0$ indicates the number of video frames allowed to be immigrated to $e_i$, and we set $n_i^{in} = |\Delta n_i|$. After migration, the new execution time on $e_i$ is calculated as $T_i' = (n_i - \Delta n_i) \times \bar{t}_i$. Therefore, the problem of workload balancing of the EC system is formulated in Eq. (9).

The smaller the value obtained from Eq. (9), the more balanced the workload of the entire EC system. Let $\varepsilon$ be the threshold of the workload balance. If the value exceeds $\varepsilon$, we continue to calculate the number of migrations and the migration plan. Otherwise, no migration is done. That is, instead of migrating all the time, we only conduct a migration assessment in each migration interval.
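The migration assessment can be sketched in Python as follows. The epoch time per node and the excess-time-to-frames conversion follow the description in the text; using the population standard deviation of the epoch times as the balance measure is purely an assumption of this sketch, and all names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def epoch_times(frames: Sequence[int], per_frame_time: Sequence[float]) -> List[float]:
    """Time of one training epoch on each node: frames * average time per frame."""
    return [n * t for n, t in zip(frames, per_frame_time)]


def migration_counts(frames: Sequence[int], per_frame_time: Sequence[float]) -> List[float]:
    """Frames to move per node: positive = move out, negative = room to take in.

    The excess (or spare) time relative to the average epoch time is converted
    to frames with the node's own per-frame training time.
    """
    times = epoch_times(frames, per_frame_time)
    avg = mean(times)
    return [(t_i - avg) / t for t_i, t in zip(times, per_frame_time)]


def imbalance(times: Sequence[float]) -> float:
    """Assumed balance measure: population standard deviation of the epoch times."""
    return pstdev(times)


def needs_migration(times: Sequence[float], threshold: float) -> bool:
    """Plan a migration only when the imbalance exceeds the threshold."""
    return imbalance(times) > threshold


# Hypothetical example: a slow node with few frames and a fast node with many frames.
frames = [100, 300]
per_frame = [0.02, 0.01]  # seconds per frame on each node
times = epoch_times(frames, per_frame)
print(times, migration_counts(frames, per_frame), needs_migration(times, 0.1))
```

In this example the fast node should give up about 50 frames while the slow node can absorb only about 25, because the same frame costs twice as long on the slow node. Such mismatches are what the matching step with an offset has to handle.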
After getting the number of video frames that each edge node needs to migrate, we match the migration requirements with the immigration capacities of the edge nodes. We try to move the entire dataset to be migrated to the target edge node as a whole, so that a minimum number of edge nodes is involved in each migration. We introduce an offset $\delta$ for the data migration matching: $n_j^{in} \approx n_i^{out}$ if $|n_j^{in} - n_i^{out}| \le \delta$, where $n_j^{in}$ is the number of video frames that $e_j$ allows to move in and $n_i^{out}$ is the number of video frames that $e_i$ needs to move out. The data migration matching strategy is described as follows:
Find edge nodes that do not meet the migration conditions. For an edge node $e_i$, if $n_i^{out} \le \delta$, $e_i$ does not need to migrate data and is removed from the migration list. Similarly, if $n_j^{in} \le \delta$, $e_j$ cannot provide a significant resource for data immigration and is removed from the immigration list.
Find edge nodes with the best migration matching. If $n_i^{out} \approx n_j^{in}$, the data migration is performed between edge nodes $e_i$ and $e_j$ with the amount of $n_i^{out}$.
Find from the current immigration list, then find from the migration list. It is easy to prove that .
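The matching strategy can be approximated by a greedy procedure. The sketch below is a guess at one workable realization, not the authors' algorithm: the filtering rule (drop entries not larger than the offset), the amount moved in a match (the smaller of the two values), and the largest-first pairing of the remaining nodes are all assumptions, as are the function and node names.

```python
from typing import Dict, List, Tuple


def match_migrations(out_need: Dict[str, float], in_cap: Dict[str, float],
                     delta: float) -> List[Tuple[str, str, float]]:
    """Return a list of (source, target, frames) migrations."""
    if delta < 0:
        raise ValueError("delta must be non-negative")

    # Step 1: drop nodes whose requirement or capacity is within the offset.
    out_list = {n: v for n, v in out_need.items() if v > delta}
    in_list = {n: v for n, v in in_cap.items() if v > delta}
    plan: List[Tuple[str, str, float]] = []

    # Step 2: pair nodes whose requirement and capacity agree within the offset,
    # so that a whole dataset moves between exactly two nodes.
    for src in sorted(out_list, key=out_list.get, reverse=True):
        need = out_list[src]
        close = [d for d in in_list if abs(in_list[d] - need) <= delta]
        if close:
            dst = min(close, key=lambda d: abs(in_list[d] - need))
            plan.append((src, dst, min(need, in_list[dst])))
            del out_list[src], in_list[dst]

    # Step 3 (assumed): pair the largest remaining requirement with the largest
    # remaining capacity until one of the lists is exhausted.
    while out_list and in_list:
        src = max(out_list, key=out_list.get)
        dst = max(in_list, key=in_list.get)
        moved = min(out_list[src], in_list[dst])
        plan.append((src, dst, moved))
        out_list[src] -= moved
        in_list[dst] -= moved
        if out_list[src] <= delta:
            del out_list[src]
        if in_list[dst] <= delta:
            del in_list[dst]
    return plan


# Hypothetical frame counts: e1-e3 are overloaded, e4-e6 have spare capacity.
plan = match_migrations({"e1": 50, "e2": 4, "e3": 30},
                        {"e4": 48, "e5": 40, "e6": 2}, delta=5)
print(plan)
```

Each pass of the final loop removes at least one node from a list, so the procedure terminates after at most as many steps as there are listed nodes.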
V-A Experimental Setting
We conduct experiments to evaluate the effectiveness and efficiency of the proposed EC-based DIVS system. We built an EC system with two levels of edge nodes, including 200 monitoring terminals, 35 EC servers, and a cloud server. Monitoring terminals are deployed at the intersections of 30 streets. Then, 30 first-level edge nodes are deployed on the corresponding streets to collect video streams from MTs in the current street. There are 5 ENs in the second level of the EC system, each of which is connected to 6 first-level ENs. Half of the edge nodes are equipped with Intel Core i5-6400 quad-core CPU, 6 GB DRAM and 32 GB main memory. The remaining edge nodes are equipped with Intel Xeon Nehalem EX six-core CPU, 8 GB DRAM and 64 GB main memory. A high-speed Gigabit network is used between MTs and the edge nodes. In the experiments, seven days of traffic monitoring videos are collected from these monitoring terminals.
V-B Performance Evaluation
We evaluate the performance of the proposed DIVS system from the perspective of the scale of edge nodes and video analysis tasks. To be fair, we set the structure and parameters of the DL models so that each DL model has similar computational complexity. The total execution time of the entire DIVS system is recorded and compared, as shown in Fig. 8.
In each case in Fig. 8 (a) with the same number of MTs, the total execution time of the DIVS system decreases continuously as the number of edge nodes increases. When the number of edge nodes increases to 30, the system needs only 1773.62 s and 851.28 s for 200 MTs and 50 MTs, respectively. Interestingly, the advantages in large-scale MT scenarios are more obvious than in small-scale MT scenarios. For example, when the system is expanded from 5 edge nodes to 30 edge nodes, the execution time for 50 MTs drops from 2335.13 s to 851.28 s, a decline rate of 63.54%. In contrast, the execution time for 200 MTs is reduced from 6117.22 s to 1773.62 s, a decline rate of 71.01%. Fig. 8 (b) shows that increasing the number of tasks does not multiply the total execution time proportionally. The reason is that no matter how many tasks are required on each edge node, the input video stream needs to be transmitted from the MTs to the edge nodes only once, which saves considerable data communication delay. In addition, the execution time advantage for multiple tasks becomes more pronounced as the number of MTs increases. Therefore, the experimental results show that the DIVS system achieves good scalability.
V-C Data Communication and Workload Balance
We evaluate the proposed DIVS architecture in terms of data communication cost and workload balance. The number of MTs gradually increases from 30 to 200, and the number of edge nodes increases from 5 to 30 in each case. The experimental results are shown in Fig. 9.
It is clear from Fig. 9 that the DDM strategy achieves the best workload balance at a moderate data communication cost in most cases. As shown by the DDM curves in Fig. 9 (a), owing to dynamic data migration, the workload balance of the EC system remains steady as the scale of MTs increases. In contrast, although the approaches without the DDM strategy incur lower data communication cost in most cases, the workload on the edge nodes is seriously unbalanced, which also leads to long waiting time for synchronization and more execution time for the entire DIVS system. In addition, as the EC system scale increases in Fig. 9 (b), the workload balance of the EC system with DDM also remains steady. The experimental results demonstrate that the DDM strategy of DIVS significantly improves the workload balance of the EC system with acceptable communication cost.
In this paper, we built a Distributed Intelligent Video Surveillance (DIVS) system and deployed it on a multi-layer edge computing architecture to provide flexible and scalable training capabilities. We addressed the problems of parallel training, model synchronization, and workload balancing. Two levels of parallelization were provided to improve the throughput of the DIVS system, and a model parameter updating method was proposed to realize the model synchronization of the global DL model. In addition, we proposed a dynamic data migration approach to address the workload and computational power imbalance problems of edge nodes. Experimental results showed that the DIVS system can efficiently handle video surveillance and analysis tasks.
This work is partially funded by the National Key R&D Program of China (Grant No. 2018YFB1003401), the National Outstanding Youth Science Program of National Natural Science Foundation of China (Grant No. 61625202), the International (Regional) Cooperation and Exchange Program of National Natural Science Foundation of China (Grant No. 61661146006, 61860206011), and the International Postdoctoral Exchange Fellowship Program (Grant No. 2018024). This work is also supported in part by NSF through grants IIS-1526499, IIS-1763325, CNS-1626432, and NSFC 61672313.
[1] U. L. N. Puvvadi, K. D. Benedetto, A. Patil, K.-D. Kang, and Y. Park, “Cost-effective security support in real-time video surveillance,” IEEE Trans. Ind. Informat., vol. 11, no. 6, pp. 1457–1465, 2015.
[2] Y. Zhou, L. Liu, L. Shao, and M. Mellor, “Fast automatic vehicle annotation for urban traffic surveillance,” IEEE Trans. Intell. Transport. Syst., vol. 19, no. 6, pp. 1973–1984, 2018.
[3] M. Valera, S. A. Velastin, A. Ellis, and J. Ferryman, “Communication mechanisms and middleware for distributed video surveillance,” IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 2, pp. 1795–1809, 2011.
[4] Z. Zhou, H. Liao, B. Gu, K. M. S. Huq, S. Mumtaz, and J. Rodriguez, “Robust mobile crowd sensing: When deep learning meets edge computing,” IEEE Network, vol. 32, no. 4, pp. 54–60, 2018.
[5] C. Ding and D. Tao, “Trunk-branch ensemble convolutional neural networks for video-based face recognition,” IEEE Trans. Pattern Anal. Machine Intell., vol. 40, no. 4, pp. 1002–1014, 2018.
[6] L. Ding, Y. Tian, H. Fan, Y. Wang, and T. Huang, “Rate-performance-loss optimization for inter-frame deep feature coding from videos,” IEEE Trans. Image Processing, vol. 26, no. 12, pp. 5743–5757, 2017.
[7] D. Li, L. Deng, Z. Cai, B. Franks, and X. Yao, “Intelligent transportation system in macao based on deep self-coding learning,” IEEE Trans. Ind. Informat., vol. 14, no. 7, pp. 3253–3260, 2018.
[8] P. Li, Z. Chen, L. T. Yang, Q. Zhang, and M. J. Deen, “Deep convolutional computation model for feature learning on big data in internet of things,” IEEE Trans. Ind. Informat., vol. 14, no. 2, pp. 790–798, 2018.
[9] Z. Zhao, K. M. Barijough, and A. Gerstlauer, “Deepthings: Distributed adaptive deep learning inference on resource-constrained iot edge clusters,” IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol. 37, no. 1, pp. 2348–2359, 2018.
[10] H. Li, K. Ota, and M. Dong, “Learning iot in edge: Deep learning for the internet of things with edge computing,” IEEE Network, vol. 32, no. 1, pp. 96–101, 2018.
[11] Z. Zheng, Y. Yang, X. Niu, H.-N. Dai, and Y. Zhou, “Wide and deep convolutional neural networks for electricity-theft detection to secure smart grids,” IEEE Trans. Ind. Informat., vol. 14, no. 4, pp. 1606–1615, 2018.
[12] H. Kavalionak, C. Gennaro, and G. Amato, “Distributed video surveillance using smart cameras,” J. Grid Compu., pp. 1–19, 2018.
[13] H. Han, A. K. Jain, F. Wang, S. Shan, and X. Chen, “Heterogeneous face attribute estimation: A deep multi-task learning approach,” IEEE Trans. Pattern Anal. Machine Intell., vol. 40, no. 11, pp. 2597–2609, 2018.
[14] D. Cho, Y.-W. Tai, and I. S. Kweon, “Deep convolutional neural network for natural image matting using initial alpha mattes,” IEEE Trans. Image Processing, vol. 28, no. 3, pp. 1054–1067, 2019.
[15] J. Chen, K. Li, K. Bilal, X. Zhou, K. Li, and P. S. Yu, “A bi-layered parallel training architecture for large-scale convolutional neural networks,” IEEE Trans. Parallel Distrib. Syst., pp. 1–1, 2018.
[16] M. Langer, A. Hall, Z. He, and W. Rahayu, “Mpca sgd: A method for distributed training of deep learning models on spark,” IEEE Trans. Parallel Distrib. Syst., vol. 29, no. 11, pp. 2540–2556, 2018.
[17] S. Yi, Z. Hao, Q. Zhang, Q. Zhang, W. Shi, and Q. Li, “Lavea: latency-aware video analytics on edge computing platform,” in Proceedings of the Second ACM/IEEE Symposium on Edge Computing. ACM, 2017, p. 15.
[18] L. Li, K. Ota, and M. Dong, “Deep learning for smart industry: Efficient manufacture inspection system with fog computing,” IEEE Trans. Ind. Informat., vol. 14, no. 10, pp. 4665–4673, 2018.
[19] A. Diro and N. Chilamkurti, “Leveraging lstm networks for attack detection in fog-to-things communications,” IEEE Commun. Mag., vol. 56, no. 9, pp. 124–130, 2018.
[20] H. Khelifi, S. Luo, B. Nour, A. Sellami, H. Moungla, S. H. Ahmed, and M. Guizani, “Bringing deep learning at the edge of information-centric internet of things,” IEEE Commun. Lett., pp. 1–1, 2018.
[21] D. Zhang, W. Wu, H. Cheng, and R. Zhang, “Image-to-video person re-identification with temporally memorized similarity learning,” IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 10, pp. 2622–2632, 2018.
[22] W. Sun, J. Liu, Y. Yue, and H. Zhang, “Double auction-based resource allocation for mobile edge computing in industrial internet of things,” IEEE Trans. Ind. Informat., vol. 14, no. 10, pp. 4692–4701, 2018.
[23] H. D. Park and O.-G. Min, “Scalable architecture for an automated surveillance system using edge computing,” J. Supercomput, vol. 73, no. 3, pp. 926–939, 2017.
[24] S. Yu, Y. Wu, W. Li, Z. Song, and W. Zeng, “A model for fine-grained vehicle classification based on deep learning,” Neurocomputing, vol. 257, pp. 97–103, 2017.
[25] X. Ma, Z. Tao, Y. Wang, H. Yua, and Y. Wang, “Long short-term memory neural network for traffic speed prediction using remote microwave sensor data,” Transportation Research, vol. 54, pp. 187–197, 2015.
[26] Z. Zhao, W. Chen, X. Wu, P. C. Y. Chen, and J. Liu, “Lstm network: a deep learning approach for short-term traffic forecast,” IET Intelligent Transport Systems, vol. 11, no. 2, pp. 68–75, 2017.