Internet of Things (IoT) applications have permeated industries and society in sectors ranging from energy and agriculture, to child care, conservation and manufacturing. Many recent applications of IoT describe themselves as being smart to emphasize the use of technology to enhance service provision, for example smart cities or smart health (Venkatesh et al., 2018). The push towards smart services is further motivated by complex and expanding urban environments, in which technology plays a fundamental role to support social inclusion, economic development and environmental sustainability (Anttiroiko et al., 2014).
Smartness implies a notion of intelligence embedded within the system and the services it offers. The concept of intelligence typically refers to the ability of computational system components to perform data processing tasks (Ibarra-Esquer et al., 2017). Intelligence has become integral to the IoT, and the successes of machine learning make it a natural candidate for providing the data processing capabilities that enable smart services. Yet, while running machine learning models on readily available datasets has become easy, deploying them in real-life, large-scale production environments is difficult (Ratner et al., 2019a). When used on a daily basis to serve millions of users in a variety of settings, predictive accuracy alone is insufficient to measure a model’s performance. Deployment concerns, cost issues and accessibility are calling for systems approaches that consider machine learning from an integrated software and hardware perspective. In IoT applications, algorithms and models are subject to the requirements of the cyber system and the physical environment in which they are deployed. This raises new design concerns, which must be considered alongside deployment, scaling and distribution requirements.
The computational capabilities of many IoT devices are limited. Cloud technologies, which provide elastic and unlimited computational power, are an essential extension of the IoT to enable machine learning and other data processing techniques, to provide data storage and to support auxiliary services (Atlam et al., 2017). However, the cloud is not a panacea, as the wireless communication link between devices and the cloud introduces delays, costs, security risks and privacy concerns. Alternative approaches are thus required to scale and distribute machine learning systems across heterogeneous computing resources in the IoT. Recent advances in scaling machine learning in data centers draw on distributed computing paradigms to parallelize training and inference tasks. Edge computing is extending the data processing continuum from devices to gateways, edge servers and the cloud.
This survey presents a comprehensive review of the challenges and opportunities of using machine learning to enable smart services in the IoT. We aim to provide the reader with a detailed and holistic, systems-level view of the multi-layered technical considerations that collectively enable intelligence in the IoT. The survey brings together current applications and architectures for machine learning in the IoT and application analysis methodologies from cyber physical systems. We summarize the vast experience of the machine learning systems community in deploying machine learning systems in production. Furthermore, we present the advances made in distributed machine learning and edge computing, and highlight their potential for scaling and distributing machine learning in the IoT. Finally, we show how the design concerns and diverse technology layers can be structured and conceptualized to order emerging research, and discuss open challenges.
1.1. Complementary Surveys
There are a number of related surveys that are complementary to this one. Samie et al. (2016) provide an overview of machine learning in the IoT from an embedded computing point of view, with emphasis on applications and machine learning approaches. Mahdavinejad et al. (2018) also review machine learning for IoT data analysis, with a focus on smart cities and data characteristics. Jagannath et al. (2019) take a functional perspective and review the application of machine learning for wireless communication in the IoT. These surveys cover algorithms and applications, but do not survey machine learning systems and distributed computing concerns. Diaz et al. (2016) review integration issues for the IoT and cloud computing, but do not focus on machine learning. Dias et al. (2016) survey prediction-based data reduction in wireless sensor networks, but do not focus on machine learning specifically, or applications beyond data reduction.
1.2. Structure of the Survey
This review starts by introducing the IoT and machine learning in Section 2. Section 3 presents machine learning applications in and design aspects of the IoT, as well as the state of the art architectures for deploying machine learning in the IoT. Key challenges of machine learning systems in production are discussed in Section 4. Considerations for scaling and distributing machine learning systems are outlined in Section 5. Section 6 presents emerging technologies that are enabling the development of machine learning systems for the IoT. Section 7 highlights open problems for machine learning systems for smart services in the IoT and Section 8 concludes the review.
2. Preliminary: Machine Learning in the Internet of Things
2.1. The Internet of Things and Cyber Physical Systems
The Internet of Things (IoT) extends the digital realm of the Internet and the Web to the physical world of objects (Greer et al., 2019). Objects, or things, in the IoT must have at least one of the following capabilities: to be uniquely identified, know their precise location, be able to obtain data about their state or the environment and modify the environment through remotely controlled actuation (Ibarra-Esquer et al., 2017). In addition, things may have the capability of processing information. In parallel to the IoT, Cyber Physical Systems (CPS), which comprise technical and computational systems, have emerged out of the fields of systems engineering and control. Technical systems process, transport and transfer materials and energy in the physical world. Computational systems process information. An embedded system is a computational (cyber) system that is a fixed component of a technical (physical) system (Schlingloff, 2016). CPS connect embedded systems through communication technology. Sensors and actuators are components of CPS that link the physical and cyber realms by providing information through sensor observations and action through actuation (CPS Public Working Group, 2017). CPS have the capabilities of collecting data about the physical environment through sensors, of transporting data using communication technology like Bluetooth and the Internet, of processing data on local processors or cloud servers, and of receiving messages to perform an action to alter the state of the physical environment. Human users and operators can interact with CPS.
The definitions of IoT and CPS have been converging over time towards a common understanding of systems that integrate logical components such as network connectivity and computation, with physical components through the use of transducing components, which are sensors and actuators (Greer et al., 2019). Perspectives in the two fields differ on four key issues: system-level control, whether the one is a platform for the other, Internet connectivity and machine-human interactions. However, Greer et al. (2019) argue that the practical benefits of a common definition outweigh the differences in perspective. This paper takes a unified perspective on the two fields to retain focus on the integration of machine learning systems with systems of hybrid physical-logical nature. Unless specified otherwise, the term IoT refers to systems that are either classified as IoT or CPS.
2.2. Overview of Machine Learning
Machine learning methods fit complex functions over data to discover patterns and correlations which can be exploited for building Models and making predictions (Hastie et al., 2009). Models learned from data are distinguished from knowledge-based Models by capitalising the latter. A model is learned from a training dataset by approximating a useful function that transforms input variables to an output. Once such a function has been approximated, it can be used to calculate an output value for a new input value. In practice, fitting a function over training data is called model training, while using a model to make predictions is called inference (Amershi et al., 2019). The test for a model is its ability to produce the correct output for a new input. Models are only useful if they can generalise beyond the training data (Domingos, 2012). Machine learning approaches are promising if well understood Models are not available or if they are too complex to design manually (Mitchell, 2006). Similarly, if Models describe a changing process or context and are required to evolve over time, learning from data provides a mechanism for model updates within a changing context.
Two broad categories of machine learning approaches exist: supervised and unsupervised learning. Supervised learning requires that each training sample is labelled, meaning that for each input value there is a known output. Supervised model training uses the labelled training data as a guide to find the parameterized function that minimizes the error between the model’s predicted output values and the real output values (i.e. the labels). The goal of unsupervised machine learning is to discover structure in the input data in the absence of training labels that are otherwise used to approximate the error for each observation (Hastie et al., 2009). Typical tasks that machine learning algorithms perform are clustering, classification and regression, which can be used to group data based on characteristic attributes, to detect patterns and anomalies in the data, to make recommendations, discover trends, make forecasts, and emulate audio and visual perception. In particular, deep neural networks, commonly referred to as deep learning, are seen as a promising technology to process the data generated by sensors in the IoT (Chen and Ran, 2019)(Wang et al., 2019b)(Wang et al., 2018). This multi-layered, neural network based, supervised machine learning approach has provided state of the art results for many perception based tasks (Lecun et al., 2015).
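As a minimal sketch of supervised learning as error minimization, the following fits a straight line to labelled samples by ordinary least squares and then uses it for inference. The data, the variable names and the choice of a linear model are illustrative assumptions and are not taken from any of the surveyed works.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) minimising the mean squared error on labelled data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def predict(model, x):
    """Inference: apply the learned function to a new input."""
    slope, intercept = model
    return slope * x + intercept

def mean_squared_error(model, xs, ys):
    return mean((predict(model, x) - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical labelled samples: outdoor temperature (input) and heating load (label).
train_x = [0.0, 5.0, 10.0, 15.0, 20.0]
train_y = [10.1, 7.9, 6.0, 4.1, 1.9]

model = fit_line(train_x, train_y)                                # model training
test_error = mean_squared_error(model, [2.5, 12.5], [9.0, 5.0])   # evaluation on held-out samples
forecast = predict(model, 8.0)                                    # inference for a new input
```

Evaluating on samples that were not used for fitting is what tests whether the model generalises beyond the training data.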
Machine learning methods scale to very large datasets and improve with more data (Mitchell, 2006). This has made them successful in analysing the extreme quantities of data produced by digital, online services and applications. Paired with the flexible and accessible computing infrastructure made available through the cloud and open source software, machine learning has become an essential technology for extracting value from data in modern, digital environments (Stoica et al., 2017). In a static machine learning workflow, raw data is processed and transformed, and then used to train a model. After sufficient evaluation, the model can be used for inference with new data samples. For many models, parameters can be optimized alongside model training to better suit the data (Hastie et al., 2009). Arriving at a good model depends as much on the training data as on the optimization and evaluation functions (Domingos, 2012). Feature engineering is used in data processing to ensure that the data input contains independent variables that are correlated to the output. It is usual to train and evaluate more than one model, and ensembles of models usually perform better than any one model. Training machine learning models is an optimization problem and has considerably greater computational cost and time requirements than inference (Chahal et al., 2020). In dynamic environments, model updates are required over time. This can be done explicitly by retraining the model with new data, or implicitly by using online learning algorithms over streaming data (Gama et al., 2014).
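The implicit update path can be illustrated with a small online learning sketch. It assumes a linear model updated by stochastic gradient descent, one sample at a time, and a hypothetical noiseless sensor stream whose underlying relationship changes halfway through. The function names and the learning rate are illustrative choices.

```python
def sgd_update(weights, x, y, learning_rate=0.1):
    """One online gradient step for a linear model y ≈ w0 + w1 * x under squared error."""
    w0, w1 = weights
    error = (w0 + w1 * x) - y
    return (w0 - learning_rate * error, w1 - learning_rate * error * x)

def stream(n, slope, intercept):
    """Hypothetical sensor stream: noiseless readings cycling over inputs in [0, 1)."""
    for i in range(n):
        x = (i % 10) / 10.0
        yield x, slope * x + intercept

weights = (0.0, 0.0)
for x, y in stream(5000, slope=2.0, intercept=1.0):
    weights = sgd_update(weights, x, y)
before_drift = weights

# The process changes; the same update rule tracks the new relationship
# without an explicit retraining step.
for x, y in stream(5000, slope=-1.0, intercept=3.0):
    weights = sgd_update(weights, x, y)
after_drift = weights
```

Each update touches a single sample, so the memory and compute cost per step stays constant, which is one reason online methods are attractive for streaming data.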
3. Deploying Machine Learning in the IoT
3.1. Machine Learning Applications in the IoT
Machine learning technologies are used for behavioural profiling, monitoring, prediction, planning and perception in the IoT in diverse sectors like agriculture (Elijah et al., 2018), health (Fang et al., 2016), home (Saad Al-Sumaiti et al., 2014), building (Molina-Solana et al., 2017), electricity (Zhang et al., 2018)(Ghorbanian et al., 2019)(Wang et al., 2019c), water (Wang et al., 2015), wearables (Fortino et al., 2013) and transport (Liang et al., 2019)(Raza et al., 2019). While differences exist between sectors, many of the tasks that machine learning technologies can perform have application across domains. For example, in the electricity and water sectors, clustering smart meter data enables behavioural profiling to model and predict the consumption behaviour of customers (Alahakoon and Yu, 2016)(Eggimann et al., 2017). Human activity recognition is used to identify physical user activity from wearable and mobile data (Wang et al., 2019a). Similar approaches are used for appliance-level load disaggregation in the electricity sector (Zhang et al., 2018). Pattern recognition approaches are used to discover anomalous events (Prasad et al., 2009) such as electricity theft (Nagi et al., 2010) and health incidents such as falls (Davis et al., 2016). In home and building systems, scheduling and optimization can be used to automate appliance usage to achieve various objectives, for example least cost or maximum comfort (Saad Al-Sumaiti et al., 2014). In transport systems it is desirable to predict real-time traffic flow to optimize routes for an objective, like minimal delay (Tang et al., 2019). Computer vision based perception applications are used in autonomous vehicles (Mallozzi et al., 2019) and security applications.
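To make the smart meter profiling example concrete, the sketch below clusters a handful of daily load profiles with plain k-means. The profiles, the four-period representation and the initial centroids are hypothetical. The cited works may use different clustering methods and features.

```python
def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest_centroid(profile, centroids):
    return min(range(len(centroids)), key=lambda k: squared_distance(profile, centroids[k]))

def k_means(profiles, centroids, iterations=10):
    """Plain k-means: assign each profile to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for profile in profiles:
            clusters[nearest_centroid(profile, centroids)].append(profile)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            [sum(values) / len(members) for values in zip(*members)] if members else centroid
            for members, centroid in zip(clusters, centroids)
        ]
    labels = [nearest_centroid(profile, centroids) for profile in profiles]
    return centroids, labels

# Hypothetical daily load profiles (kWh for morning, midday, evening, night).
profiles = [
    [1.2, 0.3, 2.5, 0.4],  # evening-peaking households
    [1.0, 0.4, 2.8, 0.5],
    [1.1, 0.2, 2.6, 0.3],
    [0.4, 2.2, 0.6, 0.2],  # midday-peaking households
    [0.5, 2.5, 0.5, 0.3],
    [0.3, 2.4, 0.7, 0.2],
]
centroids, labels = k_means(profiles, centroids=[profiles[0], profiles[3]])
```

The resulting centroids act as representative consumption profiles, and the labels assign each customer to one of them.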
3.2. Design Aspects and Stakeholder Concerns
The physical world is not entirely predictable (Lee, 2008). Yet, engineered systems are designed to deliver reliable, predictable and robust performance (Gunes et al., 2014), which is of utmost importance in systems that are critical to human life. The hybrid nature of the IoT imposes more stringent requirements than what would be the case for purely physical or solely cyber systems. Several national, regional and global initiatives have been actively formalising the requirements of IoT systems and producing standards pertaining to aspects of these systems. Examples are the IEEE 2413-2019 Standard for an Architectural Framework for the Internet of Things (IoT) and the oneM2M TS-0001-V3.20.0 Functional Architecture. The Cyber Physical Systems Framework, developed as an analysis methodology by the US National Institute of Standards and Technology (CPS Public Working Group, 2017), defines system concepts that are of interest to one or more stakeholders as CPS concerns. At a higher level, the framework groups related concerns into nine aspects, as summarised in Table 1. Concerns are related and typically present trade-offs. For example, in considering the uncertainty concern, the latency imposed by specifying and managing uncertainty must also be considered. Not all concerns are relevant within a particular application and stakeholders may prioritize concerns differently. Requirements can be used to specify properties of the system that address the concerns. Given the conceptual similarity of the IoT and CPS, the CPS Framework can provide insights into application concerns of smart services and requirements of machine learning systems in the IoT.
|Aspect||Concerns|
|functional||actuation, communication, controllability, functionality, manageability, monitorability, performance, physical, physical context, sensing, states, uncertainty|
|business||enterprise, cost, environment, policy, quality, regulatory, time to market, utility|
|human||human factors, usability|
|trustworthiness||privacy, reliability, resilience, safety, security|
|timing||logical time, synchronization, time awareness, time-interval and latency|
|data||data semantics, identity, operations on data, relationship between data, data velocity, data volume|
|boundaries||behavioural, networkability, responsibility|
|composition||adaptability, complexity, constructivity, discoverability|
|lifecycle||deployability, disposability, engineerability, maintainability, operability, procurability, producibility|
Machine learning systems for smart service applications are part of the logical system in the IoT. Unlike the low risk analytical settings in which statistical machine learning has been developed, the hybrid physical-logical nature of the IoT introduces real-world, potentially life-threatening consequences to system malfunction or failure. As with other software systems, specifying the target system behaviour during a requirements analysis process is thus essential (Schlingloff, 2016)(Rahman et al., 2019). Traditionally, predictive performance has been the key concern of machine learning research. Likewise, in the IoT the focus of machine learning implementations has been on functional aspects such as algorithm selection and parameter optimization for various application domains (Mahdavinejad et al., 2018)(Ravi et al., 2017). Increasingly however, challenges such as privacy, security, big data and latency are raised as important (Usman et al., 2019)(Samie et al., 2019)(Fei et al., 2019). Other context specific challenges that have been highlighted in various domains are summarized in Table 2. Despite recognizing these challenges, only a few domains, like wearables, incorporate explicit requirements analysis processes to specify system requirements upfront (Fortino et al., 2013). Some individual works such as Akbar et al. (2017) consider error estimation and the scalability of the architecture in their evaluation, but generally metrics beyond performance are not yet designed for or evaluated.
|Domain||Challenges|
|Smart grids (Ghorbanian et al., 2019)(Tu et al., 2017)(NIST, 2014)||data heterogeneity, data integration, data storage, data velocity, data volume, implementability, interoperability, latency, privacy, reliability, resilience, security, visualisation|
|Agriculture (Elijah et al., 2018)||appropriate technology selection, business model, cost, data ownership, ease of use, interoperability, localisation, privacy, regulation, reliability, resource optimization (hardware and software), roaming, scalability, security|
|Health (Fang et al., 2016)(Shishvan et al., 2019)||adaptability, complexity, data heterogeneity, human interaction, latency, longitudinal analysis, multidisciplinary interaction, noisy data, personalisation, privacy, scalability, visualisation|
|Mobility (Raza et al., 2019) (Mallozzi et al., 2019)||cooperation, deployability, latency, liability, mobility (location updates), privacy, reliability, resource management (hardware), resource optimization (hardware), safety, scheduling, security, trust, uncertainty, validation|
|Wearables (Mukhopadhyay, 2015)(Fortino et al., 2013)||comfort, connectivity, data volume, efficiency, human physiology, latency, interoperability, power consumption, privacy, programming effectiveness, reliability, safety, sustainability, usability, user acceptance, variations over time|
3.3. Cloud-based System Architecture
The IoT has become synonymous with big data (Kalburgi et al., 2015)(Ibarra-Esquer et al., 2017), a concept that embodies the business opportunities and challenges entailed in extracting value from high volume, high velocity and high variety data. Large scale data processing, analytics and storage are prerequisites for gaining insights from big data. Cloud platforms have made data processing and storage ubiquitous and convenient by providing unlimited, on-demand computing resources without upfront user commitment, and by allowing users to pay for use as needed, for as long as needed (Jonas et al., 2019). The cloud is seen as essential for analysing and mining big data. It is thus no surprise that cloud technologies are viewed as integral to the implementation of smart services in many domains (Elijah et al., 2018)(Ghorbanian et al., 2019)(Zanella et al., 2014).
State of the art architectures for IoT applications are similar and characterised by their reliance on cloud technologies for data processing and machine learning (Atlam et al., 2017)(Siegel et al., 2018)(Samie et al., 2019). They are designed to facilitate the flow of data from tags and sensors that collect observations, over local communication networks and the Internet to the processing and storage facilities of the cloud. In the cloud the data can be processed and fused with other datasets, machine learning models can be trained, inference can be done and the data is stored for future use. The processed data can then be used in applications via APIs, user interfaces or visualisations to support decision making. The outputs of the inference process can also be transmitted over the communication network back to actuator devices to affect a change in the physical environment. Figure 1 shows how devices, communication networks and the cloud collect, transmit and process raw data from sensors for applications, and how processed data can be returned to devices for actuation.
Many open source technologies that support this kind of architecture exist, and have been discussed in detail (Diaz et al., 2016)(Kalburgi et al., 2015). Both open source software and commercially available cloud platforms are making it easy to launch new IoT applications with machine learning capabilities. The high computational demands of deep learning place further reliance on cloud computing resources (Mohammadi et al., 2018). Despite the success of cloud architectures, IoT applications face challenges of both large-scale machine learning, and computing challenges due to the scale and distributed nature of the IoT. These need to be overcome to deliver data processing capabilities that meet smart service requirements.
4. Machine Learning Systems in Production
This section highlights challenges of machine learning systems in production environments which will prevail in IoT applications. Table 3 presents a summary of these challenges at different stages of the machine learning life cycle: data provenance, model training, inference and ongoing machine learning operations.
|Data Provenance||Model Training||Inference||Software Engineering|
|Dirty data||Availability of training data||Efficient inference||System synergy|
|Data errors||Learning under uncertainty||Inference quality assurance||Data & model management|
|Data dependencies||Resource-performance trade-off||Interpretable inference||Model boundaries & limitations|
|Attacks on data||Training malicious models||Protecting private information||Designing for change|
4.1. Data Provenance
4.1.1. Dirty data
Machine learning approaches are successful where manual processes of creating models are laborious or not possible, and functions can be approximated from data instead (see Section 2.2). Just like code is the foundation of software, data is the foundation of machine learning (Amershi et al., 2019)(Ratner et al., 2019b). Due to data’s central role in machine learning, missing values (Bießmann et al., 2019)(Schelter et al., 2018), data redundancy and noise (Zhou et al., 2019) significantly impact model performance and continue to be researched to improve data preprocessing and cleaning.
4.1.2. Data errors
In addition to impurities that occur in raw data, performance degrading data errors that are generated during preprocessing and feature generation can propagate through the entire machine learning workflow. As an extension of the software metaphor, such data errors are like bugs in code (Breck et al., 2019). Data errors can be created by the data-generating code, for example by dropping features, by inconsistently changing the representation of some features or by creating improbable feature values. Feature skew and training skew are the result of variations in and inconsistent distributions of feature values between training and serving time. Scoring or serving skew refers to the selective serving of model outputs, which results in a self-reinforcing training set. Finally, mismatches can exist between the assumptions made in the training code and the expected data, resulting in impossible data values (e.g. negative log values).
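A few of these data errors can be caught with simple checks that compare serving data against statistics recorded at training time. The sketch below is an illustrative assumption of what such a check could look like for one numeric feature. It is not the validation system described by Breck et al. (2019), and the threshold and function names are hypothetical.

```python
from statistics import mean, pstdev

def summarise(values):
    return {"mean": mean(values), "std": pstdev(values), "min": min(values)}

def validate_serving_feature(training_values, serving_values,
                             max_shift=3.0, require_positive=False):
    """Return a list of human-readable anomalies for one numeric feature."""
    if not serving_values:
        return ["feature missing at serving time"]  # e.g. a dropped feature
    anomalies = []
    train, serve = summarise(training_values), summarise(serving_values)
    # Skew: the serving mean lies far outside the spread seen during training.
    scale = train["std"] or 1.0
    shift = abs(serve["mean"] - train["mean"]) / scale
    if shift > max_shift:
        anomalies.append(f"mean shifted by {shift:.1f} training standard deviations")
    # Assumption mismatch: a later log transform requires strictly positive inputs.
    if require_positive and serve["min"] <= 0:
        anomalies.append("non-positive value would break a log transform")
    return anomalies

# Hypothetical case: the representation changes from Celsius to Fahrenheit.
celsius_training = [20.0, 21.0, 19.0, 22.0, 20.0]
fahrenheit_serving = [68.0, 70.0, 66.0, 71.0]
unit_change = validate_serving_feature(celsius_training, fahrenheit_serving)
log_input = validate_serving_feature([1.0, 2.0, 3.0], [1.5, 0.0, 2.5], require_positive=True)
dropped = validate_serving_feature([1.0, 2.0, 3.0], [])
```

Comparing only means is a coarse test. It would miss changes in the shape of a distribution, so a production check would likely need richer statistics.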
4.1.3. Data dependencies
The concept of technical debt was introduced by Ward Cunningham in 1992 to consider the long term cost of moving quickly in software engineering. In software engineering, code dependency increases complexity and is classified as technical debt. In machine learning systems, unstable and underutilised data dependencies present technical debt that arises when model input signals change over time and when features are correlated, become redundant or add no value to the model (Sculley et al., 2015). These data dependencies are often hidden and can have unexpected effects that make machine learning systems brittle and error diagnosis expensive.
4.1.4. Security attacks on data
Machine learning systems are offering new attack surfaces that jeopardise system security. Poisoning and evasion attacks in particular exploit the dependence of machine learning systems on data. Poisoning attacks pollute the training data in either a targeted manner by influencing specific model outputs, or in an untargeted manner by lowering the model’s predictive accuracy. Evasion attacks happen during inference when an adversary modifies the input data to induce incorrect model outputs (Ji et al., 2018).
4.2. Model Training
The essential resource requirements for model training are (labelled) input data, computing power, electricity to power the computing resources and memory to store training data and models. Most challenges of model training relate to constraints around or lack of one of these resources.
4.2.1. Availability of training data
Supervised learning relies on high fidelity labelled training data to learn models. Accumulating and labelling sufficient training data is often the most expensive and time consuming aspect of applying machine learning (Ratner et al., 2016). While some existing IoT applications have collected training data that may be readily labelled retrospectively, for many applications the cost and availability of data labelling presents significant challenges (Yao et al., 2018b).
4.2.2. Learning under uncertainty
Noise in data consists of values that obscure the underlying signal. Noise can result from random or systematic errors in the observations, or from data that has been tampered with. Hidden stratification, where training data contains unrecognised categories that are not represented in the labels but that affect predictive outcomes, presents an additional challenge (Oakden-Rayner et al., 2019). Learning under label noise (Natarajan et al., 2013) and adversarial machine learning (… et al., 2011), which studies the behaviour of machine learning techniques when subjected to the malicious attack of an adversary, have been well studied in the statistical and theoretical machine learning communities. Developing machine learning systems that exhibit robust behaviour in the presence of noisy and adversarial input is viewed as a particular challenge for deploying machine learning in critical applications (Stoica et al., 2017). This includes systems that can handle inputs for which they were not trained, and models that decline to make predictions of which they are not confident. In the IoT the multi-layered hardware architecture introduces additional uncertainties such as connectivity loss and latency due to the wireless communication network. Developing machine learning systems that can provide sensing quality assurance and guaranteed results in unstable operating conditions is thus important (Abdelzaher et al., 2020).
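The idea of a model that declines to predict when it is not confident can be sketched as a simple reject option. The example assumes a classifier that outputs raw class scores and uses the softmax probability of the top class as a confidence proxy. The threshold, labels and scores are hypothetical, and softmax confidence is not in general a calibrated uncertainty estimate.

```python
import math

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_or_abstain(scores, labels, min_confidence=0.8):
    """Return (label, confidence), or (None, confidence) when confidence is too low."""
    probabilities = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    confidence = probabilities[best]
    if confidence < min_confidence:
        return None, confidence
    return labels[best], confidence

labels = ["walking", "sitting", "falling"]  # hypothetical activity classes
clear_case = predict_or_abstain([4.0, 0.5, 0.2], labels)
ambiguous_case = predict_or_abstain([1.0, 0.9, 0.8], labels)
```

An abstained prediction can then be routed to a fallback, such as a human operator or a more capable model elsewhere in the system.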
4.2.3. Resource-performance trade-off
The AlphaGo Zero programme (Silver et al., 2017), which has learned Go playing abilities, was trained on 29 million self-play games over a period of 40 days. For each iteration of self-play, a neural network was trained using 64 GPU workers and 19 CPU parameter servers. The superhuman game playing performance that AlphaGo Zero exhibits has come at an exceptionally large cost of computing power. This reflects a broader trend: since 2012, the computing power required to produce the best models has increased by a factor of 300 000 (Amodei et al., 2019). This is happening at a time when processors can no longer deliver increased computing power at the same rate as in the past, due to the ending of Moore’s law (Stoica et al., 2017). Understanding the trade-off between computing power, memory and energy consumption on the one hand, and model performance on the other hand is an ongoing research challenge (Abdelzaher et al., 2020). A simple heuristic is that available storage and compute resources, as well as communication overhead, all increase as the distance to the remote data center decreases. Large storage and compute are desirable, while communication overheads are not.
4.2.4. Training malicious models
Due to the complexity and large training cost of learning deep neural network models, many applications rely on readily available machine learning services offered online, or pre-trained, primitive models for download from online repositories. While the availability of pre-trained models enables the development of new applications and increases access to machine learning technologies, they also pose security risks. Backdoor attacks on neural network classifiers occur when a pre-trained model works well on regular inputs, but provides spurious outputs for specific inputs that are only known by the attacker. Gu et al. (2019) show that networks trained with such backdoors retain their effectiveness even when used in new settings after undergoing transfer learning. Ji et al. (2018) consider a more general class of model reuse attacks in which malicious models are trained to provide predictably incorrect outputs for target input values. These types of attacks are characterised as being effective, evasive, elastic and easy. They pose a significant risk to using machine learning systems where robust performance is required.
4.3. Inference
4.3.1. Efficient inference
The resource requirements for training models are much greater than those required for inference. In the case of AlphaGo Zero, the trained model uses 4 TPUs on one machine at match time (Silver et al., 2017), a fraction of the computing power required to train it. However, the scale of model serving and thus inference requirements in modern data centers necessitates optimized inference pipelines with high throughput and low latency (Kraft et al., 2020) to provide a seamless user experience. With the shift towards using deep learning on mobile (Jiang et al., 2020) and embedded devices (Rusci et al., 2020), efficient inference is becoming an important challenge to address due to on-device energy and computing power constraints.
4.3.2. Inference with quality assurance
Several machine learning models have been found to produce incorrect predictions when input data contain slight perturbations (Goodfellow et al., 2015). This can be abused by adversaries in evasion attacks. In general, models that are easy to optimize are easy to perturb. Linear models, ensembles and models that model the input distribution are not resistant to adversarial examples. Adversarial examples aside, achieving predictable performance with regard to model throughput and latency, as well as the correctness of results, is essential in all operating conditions for critical applications. To provide high quality inference, developing methods that are capable of guaranteeing model outputs is an important challenge (Abdelzaher et al., 2020). This requires reliable uncertainty estimates that can be calculated at run-time, as well as an understanding of how results are affected by resource availability. In the IoT, methods for calculating uncertainty estimates must be resource efficient.
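The sensitivity of linear models to small perturbations can be shown with a short sketch. For a linear score the gradient with respect to the input is the weight vector, so shifting every feature by a small amount against the sign of its weight lowers the score by the perturbation size times the sum of absolute weights. The weights, the input and the perturbation size below are hypothetical.

```python
def score(weights, bias, x):
    return sum(w * v for w, v in zip(weights, x)) + bias

def classify(weights, bias, x):
    return 1 if score(weights, bias, x) > 0 else 0

def sign(value):
    return (value > 0) - (value < 0)

def perturb(weights, x, true_label, epsilon):
    """Shift each feature by at most epsilon in the direction that hurts the true class."""
    direction = -1 if true_label == 1 else 1
    return [v + direction * epsilon * sign(w) for w, v in zip(weights, x)]

# Hypothetical linear model over four normalised sensor features.
weights, bias = [0.9, -0.6, 0.4, -0.8], -0.1
x = [0.5, 0.3, 0.4, 0.2]

original_label = classify(weights, bias, x)
x_adv = perturb(weights, x, true_label=original_label, epsilon=0.1)
adversarial_label = classify(weights, bias, x_adv)
```

No single feature moves by more than 0.1, yet the predicted class flips, because the small shifts accumulate across all features.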
4.3.3. Interpretable inference
Efficient and reliable inference strengthen the systems level performance of machine learning, but this alone is not enough. Many popular machine learning techniques, like neural networks, are considered black-box methods (Ribeiro et al., 2016), meaning that the model output cannot be explained easily in relation to properties of the model input. While models may exhibit predictive performance as good as, or exceeding that of human experts, they are prone to making errors that no human would make and that defy common sense (Oakden-Rayner et al., 2019). This erodes the value of the model for real-world decision making. Model interpretability is often likened to the ability of humans to understand how a model works. Lipton (2018) categorises the techniques used to render models interpretable into two categories, transparency and post hoc explanations. Transparency comprises simulatability, decomposability and algorithmic transparency. Explanations provide justification for model predictions, irrespective of whether the model is transparent. Where trust, causal reasoning, transferability, informativeness, fair and ethical decision making are necessary, model interpretability is considered important (Lipton, 2018).
4.3.4. Protecting private information
Alongside explainability, safeguarding data confidentiality is critical, amongst other reasons to comply with regulatory frameworks such as the General Data Protection Regulation in Europe. Reducing the granularity of data representation is a typical approach to preserving privacy, but comes at the cost of some loss of algorithmic effectiveness (Aggarwal and Yu, 2008). Differential privacy can provide privacy guarantees and has become one of the leading approaches to ensuring private computation (Williams and McSherry, 2010). However, by introducing noise into computations, differential privacy presents a trade-off between accuracy and privacy during inference (Stoica et al., 2017). Furthermore, differential privacy assumes independent and identically distributed data, which does not hold true when data are temporally correlated (Cao et al., 2019), as is the case with time series. While user-level privacy is not affected in this case, event-level privacy (i.e. privacy related to each time point) may deteriorate over time.
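The accuracy and privacy trade-off can be made concrete with a small sketch of the Laplace mechanism, one standard way of achieving differential privacy for numeric queries. The readings, their bounds and the values of the privacy parameter epsilon are hypothetical. The sensitivity used here assumes that neighbouring datasets differ by replacing a single reading.

```python
import numpy as np

def laplace_mean(values, lower, upper, epsilon, rng):
    """Release the mean of bounded values with Laplace noise calibrated to epsilon."""
    clipped = np.clip(values, lower, upper)
    # Replacing one reading changes the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return clipped.mean() + noise

rng = np.random.default_rng(42)
readings = rng.uniform(0.0, 5.0, size=1000)  # hypothetical smart meter readings
true_mean = readings.mean()

errors = {}
for eps in (0.01, 0.1, 1.0):
    estimates = np.array(
        [laplace_mean(readings, 0.0, 5.0, eps, rng) for _ in range(2000)]
    )
    errors[eps] = float(np.mean(np.abs(estimates - true_mean)))

for eps, err in errors.items():
    print(f"epsilon={eps}: mean absolute error {err:.4f}")
```

Smaller values of epsilon give stronger privacy but require more noise, so the released mean drifts further from the true value.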
4.4. Software Engineering for Machine Learning
4.4.1. System synergy
Sculley et al. (2015) analyse the hidden technical debt that arises when teams move fast to develop machine learning products without considering the long term maintainability of the systems they produce. The actual machine learning code, that is the code responsible for training and inference, is only a small component in the greater system, which includes configuration, data collection, data verification, feature extraction, machine resource management, analysis tools, process management tools, serving infrastructure and monitoring. Within this complex setup of code and components, it is difficult to enforce strong abstraction boundaries and models become entangled. This often creates hidden dependencies between components. Small changes cascade down the entire user chain and can have unintended and unnoticed consequences. Data dependencies, feedback loops and configuration code create further dependencies that production teams must manage.
4.4.2. Data and model management
Since the study done by Sculley et al. (2015) at Google, researchers and software engineers at other companies have documented and expanded on the software engineering challenges of machine learning systems. Lwakatare et al. (2019) interviewed 12 experts, from various companies in different domains, with experience in software engineering for machine learning. From the experiences described by the interviewees, the authors created a taxonomy of challenges experienced in different phases of the machine learning lifecycle (i.e. dataset assembly, model creation, model training and evaluation, model deployment) based on the maturity of the machine learning system within its commercial application (i.e. prototyping, non-critical deployment, critical deployment, cascading deployment). In a study conducted at Microsoft, Amershi et al. (2019) interviewed 14 software engineers, with different levels of experience and on different teams, involved within the company’s artificial intelligence ecosystem. They found that the top challenge experienced across respondents, irrespective of experience level and team, was data discovery and management. Tracking and versioning model input data as projects grow is particularly challenging, as datasets are often taken from different schemas. Convenient tools that (automatically) codify the knowledge of individual engineers that gather and process data are necessary to do this. However, heterogeneity in coding languages, technical skill levels of end users and application use cases poses challenges to a one-size-fits-all approach (Schelter et al., 2018).
4.4.3. Model boundaries and limitations
Models are not modular in the way that software is. Due to dependencies, individual models are not extensible and multiple models interact in non-obvious ways. Customisation and reuse present further challenges. Taking a step back, even defining a model is difficult (Schelter et al., 2018). In the most narrow sense, a model consists of the algorithm that specifies the machine learning task and the parameters obtained after training. However, input data must be transformed into features, and machine learning pipelines can be used to combine and track the combination of feature transformations and parameters (Meng et al., 2016). Still, the model depends on the data it was trained on, as well as the assumptions of the underlying data distribution. Capturing and managing these implicit assumptions is difficult. While a model may be simple to reuse in the same domain, it can require significant changes when used in a different context. Defining a model becomes even more difficult when considering an ensemble of models, or meta-models such as neural architecture searches. While systems for extracting and managing model metadata have been developed (Schelter et al., 2017), there exists no declarative abstraction for the whole machine learning pipeline (i.e. the machine learning equivalent of SQL) (Schelter et al., 2018).
4.4.4. Designing for change
Models evolve as data changes, methods improve or software dependencies change, thus making ongoing model validation critical (Schelter et al., 2018). However, comparing model performance is challenging. Training and evaluation must happen on the same data, while avoiding overfitting. For data like time series, which are not independent and identically distributed, standard train/validation/test splits are invalid. The same code must be used to compute evaluation metrics throughout and evaluations must track the information on which they depend. Still, in complex, long-running experiments it is difficult to determine the exact reason for performance changes and to preserve backward compatibility of trained models. Detecting change in data and deciding when to retrain a model is necessary to retain model integrity over time. Keeping humans in the loop is important for auditing predictions when training data is noisy, but brings its own challenges.
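One common alternative to shuffled splits for time-ordered data is forward-chaining (expanding window) evaluation, in which every test sample lies later in time than all of its training samples. The following is a minimal sketch of that idea. The function name and its parameters are illustrative, and the approach is not one prescribed by the works cited above.

```python
def forward_chaining_splits(n_samples, n_folds, min_train):
    """Yield (train_idx, test_idx) pairs in which every test index follows every train index."""
    fold_size = (n_samples - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size  # the training window grows with each fold
        test_end = train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))

splits = list(forward_chaining_splits(n_samples=10, n_folds=3, min_train=4))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Each fold reuses all earlier observations for training, so the evaluation never lets information from the future leak into the model being tested.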
5. Scaling and Distributing Machine Learning Systems
In the traditional cloud architecture presented in Section 3.3, individual sensor and actuator nodes send and receive data (via gateways) to the cloud, which functions as a central server that performs all the computing tasks. Network topologies that follow this centralized paradigm treat devices merely as data collection objects and edge nodes, typically gateways or small servers, as transmission objects that provide data for model training and inference in the cloud. The cloud provides the advantage of elastic computing resources but presents challenges to scaling machine learning in the IoT, as the communication links that transfer data from devices to the cloud can be a major bottleneck (Shi et al., 2016).
Decentralized and distributed structures are two alternative network architectures (Baran, 1962) that have been proposed for networked control systems (Ge et al., 2017) and communication in the IoT (Minerva et al., 2015), making distributed computing a logical consideration. Distributed machine learning is already an active area of research for scaling machine learning in the cloud, and holds similar relevance for distributing machine learning in the IoT. Edge computing is emerging as an area of research to facilitate distributed machine learning across the cloud, edge and devices. Table 4 summarises the cloud challenges, considerations for distributed computing and for edge computing, to scale and distribute machine learning in the IoT. They are discussed in greater detail in the remainder of this section.
| Cloud challenges | Distributed computing | Edge computing |
|---|---|---|
| Data transfer & intermittency | Parallelisation | Digital devices for smart services |
| Privacy | Synchronization | On-device storage & processing |
| Network security | System architectures | Wireless communication networks |
| | System optimization | Edge architectures |
5.1. The Cloud is not Enough
In 2018 an estimated 17.8 billion connected devices were reported in use (Lasse Lueth, 2018). This number is projected to almost double by 2025. The bandwidth requirements to transfer data to and from devices to enable centralized machine learning systems are enormous (Zhang et al., 2015). Processing data locally, and discarding it immediately where possible, would reduce the data transfer burden. In mobile and multi-media IoT applications where deep learning has become essential for perception related tasks such as language, speech and vision services, distributing machine learning workloads is already an active area of research (Li et al., 2018; Chen and Ran, 2019; Zhou et al., 2019). Similarly, in transport applications vehicular edge computing is emerging as an important research area (Raza et al., 2019). Generally, when applications require low latency, user privacy and uninterrupted service, a centralized cloud-only approach is insufficient due to intermittency, durability and security challenges introduced by data transfer.
5.1.1. Data transfer and intermittency
Wireless data transfer relies on messages passing data between sensors and the cloud, where the data is processed as input to machine learning systems. The volume and flow size of network traffic is constrained by the bandwidth of the communication system. The quality of service that can be provided is affected by latencies and system downtime. High data traffic volumes can incur significant financial costs (Shi et al., 2016). Devices that predominantly upload or download control traffic, for example home automation sensors, work appliances, health and wearable devices, account for a smaller portion of traffic volume (Mazhar and Shafiq, 2020) and are more likely to be affected by latency than bandwidth constraints. Applications that upload or download media content, like smart cameras, game consoles and smart TVs, place a greater burden on traffic volumes and are much more affected by bandwidth constraints. Historically, broadband networks have more downstream bandwidth than upstream bandwidth. At scale, content-heavy IoT applications are likely to saturate the upload bandwidth (Zhang et al., 2015), creating data transfer bottlenecks.
In addition to bandwidth-induced bottlenecks, centralized, cloud-based IoT machine learning systems will experience unpredictable latencies due to sensing, wireless transmission, gateway processing, internet delivery and cloud processing (Zhang et al., 2015). Ad hoc latencies can be the result of normal packet-to-packet variations from network traffic routing, operator error, software error or denial of service attacks. At present, Internet downtimes are common (Grover et al., 2013). Web users typically tolerate them, but in the IoT the temporary unavailability of sensors or actuators will directly impact the physical world. While service level agreements from cloud providers can provide compensation for poor service quality (Dhirani et al., 2018), a single cloud provider may not be able to provide the required service guarantees (Atlam et al., 2017).
5.1.2. Privacy
Cloud-based machine learning applications cede control of the flow of information of billions of connected devices to centralized platforms, creating a threat to privacy in the process (Roman et al., 2013). Aspects of privacy that require consideration are anonymity, control and trust. Privacy leakage of user information pertaining to data values, location and usage compromises the anonymity of data providers (Alrawais et al., 2017). Location-based services can infer device location based on communication patterns, while usage data can reveal sensitive temporal activity patterns (Mazhar and Shafiq, 2020). For example, home occupancy and fine-grained appliance usage can be inferred from electric smart meter data (Zhang et al., 2018). Unprotected usage and location data provide opportunity for stalking from entities that profile and track users without their consent. Liang et al. (2018) show how convolutional neural networks can be used for human activity recognition from sensor data, like tap positions on the screen of a mobile phone, to infer private information, like passwords, without requiring user consent and without the user’s knowledge.
Providing users with the sense of being in control rather than being controlled by an unknown external entity is necessary to prevent the notion that the IoT erodes privacy. Tools that allow users and data generators to retain their anonymity by controlling the granularity and location of the data they produce and share become important elements to provide access control and ascertain privacy (Roman et al., 2013). While devices in centralized IoT systems can be configured to share or hide particular data streams, the type of services a system provides may be limited based on the amount of data it receives. The privacy of data collections that are analysed at a later stage or archived must also be considered. As the cloud is out of users’ control, there is no effective way of verifying that data has been completely destroyed (Zhang et al., 2015), even if a user revokes access.
In cloud-based systems, both data providers and information consumers must completely trust the central entity. Roman et al. (2013) consider two dimensions of trust. Firstly, IoT systems require trust between all collaborating entities in their current and future interactions. Secondly, they require system level trust, so that the user does not feel subjected to external control. If users trust the central entity, encryption can protect data privacy during transfer. However, if the central entity does not hold the secret key required for decryption, it acts either only as a storage service, or advanced cryptographic mechanisms are required to do computations on encrypted data (Bost et al., 2015). Resource constrained connected devices may lack the ability to encrypt and decrypt generated data (Alrawais et al., 2017), or provision the necessary computing power and energy for cryptographic computation (Yan et al., 2014).
5.1.3. Network security
Sending data over public communication infrastructure like the Internet poses security risks (Roman et al., 2013). Broadly speaking, network layer attacks aim to prevent data from reaching its destination, steal data on devices and in transit, or hijack data and devices to gain control over system components. Exhaustion attacks such as Denial of Service (DoS) flood network resources with redundant requests. In the IoT such attacks can be initiated both from within and from outside the system. Due to the pervasive nature of devices, DoS can also be achieved by physically damaging or destroying devices. Node capture and storage attacks extract and alter user information stored on devices or the cloud, while eavesdropping extracts information from data traffic and Man-in-the-Middle attacks intercept and alter messages in transit. Exploit attacks steal information on a system to gain control of it (Burhan et al., 2018). Other types of attacks may also be launched to gain partial or full system control, with the impact of an attack being determined by the importance of the data managed and services rendered by an entity (Roman et al., 2013). Network security issues are not unique to cloud-based IoT systems. While distributing machine learning can reduce vulnerabilities linked to the centralization of resources and information flow, it presents new opportunities for attack, as distributed system components are more difficult to protect.
5.2. Distributed Computing Considerations
Distributed machine learning applies high performance computing theory to optimize machine learning training and inference across cloud servers (Verbraeken et al., 2019). For large scale distributed machine learning, the impetus is usually to distribute model training, as it is computationally expensive (Chahal et al., 2020). Many of the advances have been driven by research in deep learning, where the need to scale data processing has been pressing (Bengio, 2013). Key questions when distributing computations are:
Which system components can be executed concurrently?
When should the outputs of concurrent operations be synchronized?
How should concurrent operations be distributed?
How can system performance be optimized?
How does the system recover when computations are interrupted?
5.2.1. Parallelisation
Efficiencies in distributed machine learning are gained by executing computational tasks concurrently. Parallelisation considers which system components can be executed at the same time. Common strategies are data, model and pipeline parallelism, as well as hybrids that combine different approaches (Ben-Nun and Hoefler, 2019). Data parallelism partitions the input samples into minibatches and distributes the batches across worker nodes, which apply the same model to different datasets. Once trained, the model parameters of each worker need to be synchronized. This approach works well for compute-intensive operations with few parameters, but is limited by synchronization requirements when the number of parameters is large (Mayer and Jacobsen, 2019). Increasing the batch size can alleviate synchronization challenges, but reduces model convergence. Assumptions for data parallel training are that the data is independent and identically distributed (i.i.d.), and that the model fits onto a single device. In the IoT both of these can pose a challenge. Model parallelism, in contrast, partitions the model by the architectural structure. This reduces the memory storage required for each worker, allowing for larger models to be used. Training data is passed to the model input layer, and then transferred to different workers that execute different parts of the model. Finding the optimal way of splitting models is hard (Mayer et al., 2017) and model partitioning does not necessarily reduce the training time. Passing model outputs between workers also introduces communication overheads. Pipeline parallelism like Huang et al. (2019)’s GPipe combines data and model parallelism. Thus, instead of waiting for all the data to pass through each partition of a split model, the data is also partitioned into micro-batches. Once a worker has computed the outputs for its model partition, the micro-batch is immediately propagated to the next worker. 
Successful examples of hybrid parallelism are Project Adam (Chilimbi et al., 2014) and DistBelief (Dean et al., 2012).
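The synchronization step in data parallelism can be sketched for a simple linear model: each worker computes a gradient on its own shard of the data, and the averaged gradient matches the one computed on the full batch when shards are of equal size. This is an illustrative toy with hypothetical data and a simulated set of workers, not the implementation used by any of the systems cited above.

```python
import numpy as np

def gradient(w, X, y):
    """Gradient of the mean squared error of a linear model y ~ X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))   # hypothetical training inputs
y = X @ rng.normal(size=5)      # hypothetical targets
w = np.zeros(5)                 # model parameters, replicated on every worker

n_workers = 4
shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))

# Each worker applies the same model to a different shard of the data.
worker_grads = [gradient(w, X_shard, y_shard) for X_shard, y_shard in shards]

# Synchronization: average the worker gradients into one global update.
averaged = np.mean(worker_grads, axis=0)
full = gradient(w, X, y)
print(np.allclose(averaged, full))
```

Every synchronization exchanges one value per model parameter per worker, which is why the approach becomes communication bound when the number of parameters is large.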
5.2.2. Synchronization
Where parallelisation distributes tasks, synchronization reassembles the outputs of the tasks. An important consideration is thus when to synchronize the outputs of concurrent operations and how to manage dependencies between tasks. Common strategies that are applied are synchronous, asynchronous and bounded asynchronous approaches (Mayer and Jacobsen, 2019). Synchronous approaches gather and aggregate updates from workers after each iteration. A worker can only start a new iteration after the newly aggregated global update has been received. This ensures that models are always in sync, but introduces a synchronization barrier and straggler problem (Chahal et al., 2020), where convergence can take a long time due to the time spent waiting for the slowest workers. This becomes particularly problematic with slow network connections and when failures are frequent. In asynchronous approaches, models are updated independently of each other. This gives workers great flexibility and removes the straggler problem, but introduces a new challenge of staleness as workers may be computing updates that lag behind the global model. This results in slower convergence and reduced training performance. Bounded asynchronous approaches aim to find a middle ground by leveraging the approximate nature of machine learning models to allow workers a degree of freedom that is only curbed when a model becomes too stale (Cipar et al., 2013).
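The three synchronization strategies can be compared with a toy simulation of worker clocks. The rule below, under which a worker may only start a new iteration if it is at most `staleness` iterations ahead of the slowest worker, is a simplified sketch of bounded staleness. The worker speeds and the number of ticks are hypothetical.

```python
def simulate(speeds, staleness, ticks):
    """Return the number of iterations each worker completes within `ticks` time steps.

    speeds[i] is the number of ticks worker i needs for one iteration.
    """
    clocks = [0] * len(speeds)     # completed iterations per worker
    progress = [0] * len(speeds)   # ticks spent on the current iteration
    for _ in range(ticks):
        slowest = min(clocks)
        for i, cost in enumerate(speeds):
            # A worker that is too far ahead waits before starting a new iteration.
            if progress[i] == 0 and clocks[i] - slowest > staleness:
                continue
            progress[i] += 1
            if progress[i] == cost:
                clocks[i] += 1
                progress[i] = 0
    return clocks

speeds = [1, 1, 4]  # two fast workers and one straggler
synchronous = simulate(speeds, staleness=0, ticks=12)
bounded = simulate(speeds, staleness=2, ticks=12)
asynchronous = simulate(speeds, staleness=10**9, ticks=12)
print(synchronous, bounded, asynchronous)
```

With a bound of zero the fast workers idle until the straggler catches up. With a very large bound they never wait, but they run many iterations ahead of the slowest worker, which is the source of stale updates.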
5.2.3. System architectures
Machine learning system architectures determine how to distribute computing operations that can occur concurrently. The choice of architecture depends on computation and communication requirements and constraints, such as the desired fault tolerance, bandwidth, communication latency and, in the case of deep neural networks, the network topology and parameter update frequency (Ben-Nun and Hoefler, 2019). The key architectural objectives are firstly to provide scalability to allow a large number of parallel workers to regularly compute, send and receive model updates. Secondly, the system should be easy to configure and thirdly, it should optimally exploit existing lower-level primitives for tasks such as communication (Mayer and Jacobsen, 2019).
Centralized, decentralized and distributed systems are standard topologies, defined by the degree of distribution that the system implements. The way that the system boundary is defined can also affect the classification of the topology. Verbraeken et al. (2019) consider a centralized system as one that employs a strictly hierarchical approach to aggregate learned parameters in a single location. Through this lens, decentralized training performs intermediate aggregations that are either broadcast to all nodes, or sent as updated model partitions to multiple parameter servers. More typically, parameter servers are considered a centralized architecture, as they maintain a central view on the state of the model (Mayer and Jacobsen, 2019; Ben-Nun and Hoefler, 2019). In a decentralized architecture, nodes exchange parameters directly. The effectiveness of the decentralized architecture depends on the communication patterns between worker nodes. In a fully-connected allreduce structure, each worker communicates with all other workers. Ring topologies on the other hand only require communication updates between neighbours. While this reduces the communication overhead, it takes longer to propagate the updated parameters to all workers.
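The difference between the two communication patterns can be sketched by summing one value per worker. The ring below is a simplified pass-along scheme, not the bandwidth-optimised reduce-scatter and all-gather variant used in practice, and the function names are illustrative. Each function returns the per-worker results, the number of communication rounds, and the number of peers each worker sends to per round.

```python
def all_to_all_sum(values):
    """Fully connected exchange: every worker sends its value to all other workers."""
    n = len(values)
    total = sum(values)
    return [total] * n, 1, n - 1

def ring_sum(values):
    """Ring exchange: in every round each worker forwards one value to its right-hand neighbour."""
    n = len(values)
    totals = list(values)
    in_flight = list(values)
    for _ in range(n - 1):
        # Worker i receives the value that worker i - 1 held in the previous round.
        in_flight = [in_flight[(i - 1) % n] for i in range(n)]
        totals = [t + received for t, received in zip(totals, in_flight)]
    return totals, n - 1, 1

values = [1, 2, 3, 4]  # one hypothetical parameter update per worker
full_result = all_to_all_sum(values)
ring_result = ring_sum(values)
print(full_result)
print(ring_result)
```

Both patterns leave every worker with the same sum. The ring only needs one neighbour link per worker, at the price of a number of rounds that grows with the number of workers.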
Conventional knowledge views decentralized algorithms as a compromise; however, Lian et al. (2017) show that it is possible for decentralized algorithms to outperform centralized ones. Decentralized architectures have the further advantages that they do not require the implementation, resource allocation and tuning of a parameter server and that fault tolerance can be achieved more easily, because there is no single point of failure (Mayer and Jacobsen, 2019). A potential limitation of decentralized approaches is the cost of synchronization. Both centralized and decentralized approaches applied in distributed machine learning assume balanced, i.i.d. training data, and a network with homogeneous, high bandwidth (Mayer and Jacobsen, 2019). In the IoT these assumptions generally do not hold.
5.2.4. System optimization
The primary objective of distributed machine learning is to minimise the time required to execute computing tasks (Mayer et al., 2017). While parallelisation and distributed architectures increase the available computing resources, applying them naively can harm, rather than improve system performance (Yan et al., 2016). For model training, the system optimization, resource allocation and scheduling challenges essentially are concerned with determining how to partition a model, where to place model parts and when to train which part of the model (Mayer and Jacobsen, 2019) in the shortest possible time, fully utilising available computing resources. Deep learning tasks in particular have the challenges that frameworks are typically not designed with dependability in mind, that training tasks are long running, relying heavily on GPUs that generate excessive heat and that burden communication networks in data centers (Jayaram et al., 2019). When data transfer exceeds available bandwidth, communication becomes the bottleneck in the overall training process and compute resources are underutilised (Mayer and Jacobsen, 2019). Efficiency gains from additional computing power thus do not translate to the desired reduction in training time. Communication strategies can be designed for continuous communication to manage bursts, to avoid message overlaps that exceed bandwidth capacity, or to prioritise specific messages over others. To optimize system performance, scheduling and synchronizing communication with computing operations across servers (Chahal et al., 2020), and scheduling and optimally allocating computing tasks to processing resources (Mayer et al., 2017) is necessary.
Managing the complexity of large-scale, distributed machine learning systems is challenging, as it requires tuning both system-level configurations and machine learning parameters (Carreira et al., 2019). This is frequently done manually. Automatically mapping tasks to hardware resources, scheduling and balancing workloads and determining the task execution order is thus important. Recent efforts have investigated adaptive, dynamic load balancing (Li et al., 2019), optimal resource allocation and dynamic scheduling (Yan et al., 2016), and automated, dependence-aware scheduling (Wei et al., 2019) to improve training speed and system response to varying loads. Further meta-optimizations that can be automated to improve model and system performance are parameter search, hyper-parameter search and neural architecture search (Ben-Nun and Hoefler, 2019).
5.2.5. Fault tolerance
Failures in deep learning training workloads have been categorised as being infrastructure, AI engine and user related. The most significant error types reported are: incorrect or unreadable inputs, inability to create a model checkpoint, network connection errors, semantic errors due to library version mismatch or dependencies, CPU running out of memory, violation of memory access and core dumps. Despite the frequency of failures, many existing machine learning frameworks, like Caffe2, Horovod, PyTorch and PaddlePaddle, do not implement approaches for fault tolerance to handle process and hardware errors (Amatya et al., 2017; Verbraeken et al., 2019). Instead, users need to include checkpoints in their code from which the system can resume after failure (Jayaram et al., 2019). Design choices can however be made to build fault tolerant systems either by reactive or proactive means. Reactive approaches, like checkpointing, replication and logging, respond to a failure. Proactive approaches employ pre-emptive measures or detect failure patterns, like predictive fault tolerance which mitigates failures by observing related errors.
Different reactive strategies for building fault tolerant deep learning systems have been discussed in the literature (Amatya et al., 2017). In checkpoint-restart applications, the state of the data and computation are saved periodically so that systems can be restarted from their latest checkpoint after failure. Detect-resume models redistribute the workload from a failed process to the remaining nodes to enforce continued execution. Re-spawning creates new workers after failure. User-level fault mitigation can fix broken communicators on the fly and is well suited to recover messaging processes from faults if they do not carry much process-specific state information. Reinit on the other hand reinitialises the message passing interface (MPI) automatically if a fault is detected, and is suitable when the code complexity of recovery operations is high. Proactive strategies can be adopted as part of the software engineering lifecycle. User errors in code or configuration, and erroneous data formats are responsible for a large number of job failures (Jeon et al., 2019). Simple approaches for preventing errors are syntax checking, pre-running jobs and improved failure handling by performing a schema check.
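The checkpoint-restart strategy can be sketched as a training loop that periodically saves its state and resumes from the latest saved state after a failure. The loop, the stand-in update rule, the file name and the simulated failure are hypothetical and use only the Python standard library.

```python
import json
import os
import tempfile

def save_checkpoint(state, path):
    """Write the state to a temporary file first so a crash cannot corrupt the checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)

def train(total_steps, checkpoint_path, checkpoint_every, fail_at=None):
    """Run a toy training loop that resumes from the latest checkpoint if one exists."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "weight": 0.0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated worker failure")
        state["weight"] += 0.5  # stand-in for one training update
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, checkpoint_path)
    return state

path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
try:
    train(10, path, checkpoint_every=3, fail_at=7)
except RuntimeError:
    pass  # the job died after step 7; the latest checkpoint was written at step 6

final = train(10, path, checkpoint_every=3)
print(final)
```

The work done between the last checkpoint and the failure is repeated on restart, so the checkpoint interval trades the overhead of saving state against the amount of computation lost per failure.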
Generally, decentralized and asynchronous systems are considered to be less failure-prone. In decentralized systems there is no single point of failure (Mayer and Jacobsen, 2019). Asynchronous systems do not suffer from synchronization barriers and are designed to tolerate stragglers and worker failure (Verbraeken et al., 2019). DistBelief (Dean et al., 2012) is an asynchronous system with a sharded parameter server that represents one of the earliest fault tolerant deep learning implementations. It combines data and model parallelism, and uses asynchronicity on two levels to achieve redundancy by running both the parameter server shards and the model replicas independently. DistBelief has subsequently been replaced by TensorFlow (Abadi et al., 2016). Fault tolerance may come at some performance cost, but this may be a worthwhile trade-off for a dependable and cost-effective execution environment (Jayaram et al., 2019).
5.3. Extending Machine Learning to the Edge
In the IoT, digital technologies that are located at the periphery of the Internet are called the edge. Edge technologies vary in processing capabilities and connectivity (Samie et al., 2016). At the lowest level, sensing and actuator devices observe and control the environment. Their embedded microcontrollers can be exploited for computation, but processing and memory resources are scarce and power consumption is severely restricted, especially if devices are battery operated (Farella et al., 2017). Gateways typically have more computational power than devices and are used to settle the heterogeneity between diverse protocols of different networks and the Internet. Smartphones can act as gateways, and gateways can be used to perform data processing. Finally, fogs are servers that extend the cloud computing paradigm (Samie et al., 2016) by distributing computation, communication, control and storage closer to the end users (Chiang and Zhang, 2016). Shi et al. (2016) define edge computing as the enabling technologies that allow computation to be performed at the edge of the network, including any computing and network resources along the path between data sources and cloud data centers.
Distributed machine learning achieves scale by combining the localised computational power of many processors. In the IoT, edge computing can extend the computing power of the cloud to the observation endpoints (Sittón-Candanedo et al., 2019), thus creating a geographically distributed network of processors that can be used for model training and inference. While local model training and inference reduce data transfer challenges, they introduce constraints due to the low computing power and energy requirements of devices. Deep learning applications for mobile phones and wearables have been driving the development of more efficient and lightweight machine learning approaches, as on-device memory and processing power, energy and bandwidth are testing the limits of complex, large deep learning models in low resource environments (Chen and Ran, 2019; Zhang et al., 2019; Tang et al., 2017).
5.3.1. Digital devices for smart services
Devices collect data to enable services that serve the objectives of application domains. For example, assisted living applications can use wearable devices to recognise human activities and provide services to infer abnormal behaviour or detect emergency situations (Bianchi et al., 2019). Four different configurations can be used to map devices to services (Samie et al., 2016). A one-to-one relationship maps a single device to a single service. One-to-many maps a single device to multiple services, thus sharing device resources between applications. In many-to-one configurations multiple devices serve a single service. This setup introduces high communication between devices and a large amount of redundancy. Finally, in many-to-many relationships multiple devices serve multiple services. While shared devices reduce the cost of hardware and maintenance, they introduce new challenges of timing, resource constraints and allocation. When many devices operate in close proximity, interference can affect data transmission, which may increase the energy consumption of devices, reduce the service quality and delay real-time applications. It is necessary to consider the devices-to-service relationship in the scheduling of training jobs. Single-tenant scheduling provisions the resources for a single training job, while multi-tenant scheduling requires resource schedulers to allocate multiple training jobs over shared resources (Mayer and Jacobsen, 2019).
5.3.2. On-device data storage and processing
The objective of an application determines the quality and rate at which data must be acquired and transferred by devices to deliver a desired service. Devices acquire data by sensing the physical environment, process data, execute control actions, store and transmit data to and from upstream technologies and communicate with devices around them (Henkel et al., 2017). All these operations require processing power and consume energy, meaning that the processing and connectivity capabilities of devices determine the smart services that can be delivered. In general, both computation and communication present trade-offs between quality of service and energy consumption constraints. The device capabilities are determined by the hardware and software technologies included in the device (Samie et al., 2016). For typical IoT devices, processing (Farella et al., 2017), memory (Yao et al., 2018b) and power (Henkel et al., 2017) are limited, especially for battery-powered devices. The instantaneous power consumed by a device depends on the underlying hardware technology, voltage, frequency and power settings (like active, sleep and off states) and the efficient execution of software implementations. The quality of collected data depends on the resolution and sampling rate of the sensor (Samie et al., 2016). The data generation rate varies based on the type of sensor, and thus type of data that is generated. Figure 2 shows the bit per second data generation rate for different categories of sensors. The power consumption of a sensor increases with increased data generation rates, but also with increased quality requirements. Error detection, encryption and transmission further increase energy consumption. Energy efficiency is a prerequisite for enabling further processing, like machine learning, on devices (Farella et al., 2017) and approaches for profiling, reducing, scheduling and optimizing data flows and energy consumption are required. 
Due to the heterogeneity of devices and machine learning frameworks, portability of models (Verbraeken et al., 2019) and performance benchmarks of machine learning systems in the IoT are a challenge and an emerging area of research (Banbury et al., 2020).
5.3.3. Wireless communication networks
Data center settings offer reliable, free and high-throughput communication links for large-scale, distributed machine learning systems. To distribute machine learning in the IoT, devices need to perform computations locally, or offload computations onto edge and cloud servers. Offloading offers access to greater computing power at higher levels, but presents a trade-off against communication cost, uncertainty and latency (Samie et al., 2019), which impedes the realisation of real-time applications (Tang et al., 2019). Moreover, wireless communication links are lossy and noisy (Al-Fuqaha et al., 2015), resulting in variability in latency that affects the quality of supply and presents a risk of completely losing connectivity (Abdelzaher et al., 2020). The data generation rate of devices determines the required bandwidth and the technologies that are suitable for delivering connectivity. The choice of wireless communication technology constrains the range over which, and the rate at which, data can be transferred (Samie et al., 2016). Bandwidth and latency determine the data throughput of the communication network, which must meet the requirements of the smart service.
Wireless communication technologies connect devices either as a local network, or they connect individual devices and local networks to the Internet (Samie et al., 2016). Common topologies in communication networks are star, peer-to-peer and cluster-tree topologies (Al-Fuqaha et al., 2015). Data transfer is constrained by low power requirements of devices. In addition to the network topology, routing schemes significantly impact the energy efficiency, scalability, latency and adaptivity of wireless networks in the IoT. Data compression, data fusion and approaches for minimising communication cost have been studied in routing schemes to improve communication performance and reduce on-device energy consumption (Luo et al., 2006). Besides the data path, the timing of data transfer matters for managing bottlenecks and the power consumption of devices. Timing schemes can accommodate continuous, sporadic and on-demand data transfer. On-demand timing can either happen on user request, or when an event is detected. Key challenges for machine learning systems are how to reduce the amount of data to be transferred over the wireless network and how to handle the uncertainty and unpredictability introduced by wireless communication (Samie et al., 2016). Furthermore, on-device processing and computation offloading present a trade-off, raising the questions of when to switch between local and remote processing and what that decision should be based on (Abdelzaher et al., 2020).
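The trade-off between local processing and offloading can be illustrated with a minimal sketch. It assumes a latency-only cost model in which offloading pays for a round trip, the payload transfer and the remote compute time; energy, packet loss and queueing are deliberately ignored. The `Link` type, the function names and all parameters are hypothetical and are not taken from the cited works.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical description of a wireless uplink."""
    bandwidth_bps: float  # usable bandwidth in bits per second
    rtt_s: float          # round-trip latency in seconds


def local_latency(cycles: float, device_hz: float) -> float:
    """Time to run the workload on the device itself."""
    return cycles / device_hz


def offload_latency(payload_bits: float, cycles: float,
                    server_hz: float, link: Link) -> float:
    """Round trip + payload transfer + remote compute time."""
    return link.rtt_s + payload_bits / link.bandwidth_bps + cycles / server_hz


def should_offload(payload_bits: float, cycles: float, device_hz: float,
                   server_hz: float, link: Link) -> bool:
    """Offload only if it is expected to finish sooner than local execution."""
    remote = offload_latency(payload_bits, cycles, server_hz, link)
    return remote < local_latency(cycles, device_hz)


if __name__ == "__main__":
    # Illustrative numbers only: a 100 kB payload and a 2e8-cycle workload.
    fast = Link(bandwidth_bps=1e7, rtt_s=0.05)
    slow = Link(bandwidth_bps=2.5e5, rtt_s=0.5)
    print(should_offload(8e5, 2e8, 1e8, 2e9, fast))
    print(should_offload(8e5, 2e8, 1e8, 2e9, slow))
```

With these illustrative numbers the same workload is offloaded over the fast link and kept on the device over the slow link, which is why a runtime decision needs a current estimate of link quality.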
5.3.4. System architectures for edge computing
Data processing, model training and inference in the IoT can be device, gateway, fog or cloud centric (Samie et al., 2016). As discussed, the key question for device centric computation is whether and when to offload computation, which is more challenging if the decision must be made at runtime. Gateway centric computation requires wireless communication, which introduces unpredictable latencies that affect availability and quality of service as the number of devices transmitting data to the gateway increases. Fogs provide greater computational power than devices and gateways, and less latency than transmitting data to the cloud. Finally, cloud centric approaches deliver a high quality of service and offer unlimited storage and processing resources, but come with communication overheads. In the edge computing context, cloud-only computation is considered to be centralized, whereas decentralized architectures implement local training and inference on fogs, gateways or devices (Zhou et al., 2019). For ease of reference, this section refers to gateways and fogs as edge servers.
Hybrid architectures that distribute training and inference across the cloud, edge and devices are an active area of research (Chen and Ran, 2019). Figure 3 presents example architectures that distribute model training and inference across the cloud, edge and devices in different configurations. The architectures are named by training-inference location. From top left to bottom right both processing power and communication overheads decrease, as inference moves from the cloud to devices, and training is distributed to the edge. Figure 3a) is a typical cloud-only architecture. In Figure 3b), the simplest hybrid scenario, a cloud-edge architecture trains models in the cloud and downloads trained models onto edge servers, which perform local inference with reduced latency on data offloaded from nearby devices. Baharani et al. (2019) use this type of architecture to perform model training in the cloud and inference on an edge server. By reducing the data and energy footprint required for inference, computation can be moved downstream onto devices with reduced processing capacity, thus further reducing latency (Tang et al., 2017; Yao et al., 2018b). The typical architecture for mobile inference and wearables is shown in Figure 3c). For sensitive data with privacy concerns and for applications where communication costs are prohibitive, federated learning offers strategies for distributing training across edge servers and the cloud (Kairouz et al., 2019). Figure 3e) is the common setup for federated learning with training distributed between the cloud and some mobile devices. Architectures d) and e) introduce additional complexities due to parallelisation and synchronization requirements. A fully distributed architecture as shown in Figure 3f) is only possible if an application has no requirements for real-time, remote monitoring.
The main challenge with distributing computing across the IoT is to find the optimal balance between local processing and computation offloading, and to find the best timing for offloading, while taking the constraints posed by heterogeneous devices and communication technologies into consideration (Samie et al., 2016). The cost, latency and variability of wireless data transfer must be considered in relation to the frequency and size of training updates, privacy and real-time requirements when designing an architecture.
6. Emerging Trends
Smart service applications in the IoT consist of interdependent machine learning, communication and computing technologies with cross-cutting concerns. Figure 4 consolidates the challenges and considerations for developing intelligent applications in the IoT, as discussed in the preceding sections. At the highest level, the application objective determines system requirements for a smart service. Based on the objective, an application may consist of a single device or multiple devices, which in turn can be specific to one or several applications. At the lowest level, the device hardware (the sensors and actuators, the processor, memory, connectivity protocols, communication network and the power sources) constrains device capabilities. The software implementations on devices manage data acquisition, processing, storage, transfer and control actions. Achieving synergy between hardware and software implementations across heterogeneous devices is important for high system performance. Depending on the device capabilities and the rate of data generation, data processing, training and inference can be done locally on devices, or offloaded to more powerful upstream servers. Offloading requires data transfer over wireless communication networks. The efficiency and quality of offloading depend on message timing, the communication network topology and throughput, the last of which depends strongly on network bandwidth.
To distribute computations across devices, edge nodes and cloud servers, workloads must be parallelized and synchronized. These processes are closely tied to the system architecture and are further affected by the quality and latency of data transfer. Improved scheduling, resource allocation and communication strategies must be considered alongside computation requirements for system optimization. Network interruptions are bound to affect computation and fault tolerance is a necessary consideration. Finally, the machine learning system requires considerations for model training and inference, alongside mechanisms to ascertain data provenance and sound software engineering practices. The class of learning problem is determined by the characteristics of the data distribution, and the application objective.
|ML Sys.||Comp. Layer||Challenge addressed||Work||Sec.||Approach||Year|
|DP||C||dirty data||DataWig (Bießmann et al., 2019)||||imputation software for missing values||2019|
|DP||C||data errors||Breck et al. (2019)||||proactive code validation||2019|
|T||C||training data availability||Koller et al. (2019)||6.2||multi-stream sequence constraints for weak supervision||2019|
|T||C||training data availability||Osprey (Bringer et al., 2019)||6.2||high level interface for weak supervision over imbalanced data||2019|
|T||C||training data availability||Ratner et al. (2016)||6.2||data programming with labeling functions||2016|
|T||C||learning under uncertainty||BlinkML (Park et al., 2019)||||quality assurance: bounded approximate models||2019|
|T||C||learning under uncertainty||DeepSense (Yao et al., 2016)||||deep learning framework for noisy data & feature customization||2016|
|T||C||quality assurance||CCU (Meinke and Hein, 2020)||||mathematical guarantees for out-of-distribution detection||2020|
|T||C||resources vs performance||Yuan et al. (2020)||||hybrid parallelism with independent subnets||2020|
|T||C||resources vs performance||Dziedzic et al. (2019)||||resource usage control with band-limited training||2019|
|T||C||resources vs performance||Pan et al. (2017)||||data parallelism with backup workers||2017|
|T||C||resources vs performance||gossiping SGD (Jin et al., 2016)||||decentralized, asynchronous data parallelism||2016|
|T||C||resources vs performance||SqueezeNet (Iandola et al., 2016)||6.1||model pruning & compression for improved training efficiency||2016|
|T||C||protecting private info.||PPAN (Tripathy et al., 2019)||||data release with data-driven, optimal privacy-utility tradeoff||2019|
|T||C||protecting private info.||v.d. Hoeven (2019)||6.3||local differential privacy with user-defined guarantees||2019|
|T||C||training malicious models||AggregaThor (Damaskinos et al., 2019)||||robust gradient aggregation for Byzantine resilience||2019|
|T||C||system optimization||Gandivafair (Chaudhary et al., 2020)||6.2||gang-aware scheduling with load balancing & GPU trading||2020|
|T||C||system optimization||AlloX (Le et al., 2020)||6.2||min-cost bipartite matching for job scheduling||2020|
|T||C||system optimization||Xie et al. (2019)||6.2||randomly wired neural nets from stochastic network generators||2019|
|T||C||system optimization||Cirrus (Carreira et al., 2019)||||serverless computing||2019|
|T||C||system optimization||Zoph and Le (2017)||6.2||neural architecture search||2017|
|T||C||system optimization||Omnivore (Hadjis et al., 2016)||6.2||centralized, asynchronous data parallelism||2016|
|T||C||context adaptation||Li and Hoiem (2018)||||learning without forgetting, using only new data||2018|
|T||C/D||system architectures||Koloskova et al. (2019)||||decentralized training with gradient compression||2019|
|T||C/D||protecting private info.||FedProx (Li et al., 2020)||6.3||FedAvg permitting partial work||2020|
|T||C/D||protecting private info.||Seif et al. (2020)||6.3||wireless federated learning with local differential privacy||2020|
|T||C/D||protecting private info.||Truex et al. (2019)||6.3||federated learning, diff. privacy & secure multiparty comput.||2019|
|T||C/D||protecting private info.||FURL (Bui et al., 2019)||6.3||locally trained user representations in a federated mobile setup||2019|
|T||C/D||protecting private info.||FedAvg (Brendan McMahan et al., 2017)||6.3||synchronous, data parallel training on mobile devices||2017|
|T||C/D||synchronization||ADSP (Hu et al., 2019)||||online search to optimize parameter update rate||2019|
|T||C/D||context adaptation||RILOD (Li et al., 2019)||||incremental learning for object detection||2019|
|T||E/D||context adaptation||DeepCham (Li et al., 2016)||||context-aware adaptation learning||2016|
|I||C||system optimization||Willump (Kraft et al., 2020)||||feature computation compiler optimization||2020|
|I||C||efficient inference||Clipper (Crankshaw et al., 2017)||||prediction serving with caching, batching & model selection||2017|
|I||C||efficient inference||SERF (Yan et al., 2016)||||interference-aware, queuing-based scheduler||2016|
|I||C||protecting private info.||Dwork et al. (2016)||6.3||differential privacy - calibrating noise to sensitivity||2016|
|I||C||protecting private info.||Bost et al. (2015)||6.3||classification over encrypted data||2015|
|I||C/D||protecting private info.||RAE (Malekzadeh et al., 2018b)||6.3||AutoEncoder for feature-based replacement of sensitive data||2018|
|I||C/D||protecting private info.||GEN (Malekzadeh et al., 2018a)||||framework to trade-off app utility & user privacy||2018|
|I||C/D||throughput||FilterForward (Canel et al., 2019)||||application specific microclassification to reduce data transfer||2019|
|I||D||efficient inference||DeepMon (Huynh et al., 2017)||||computation offloading onto mobile GPUs||2017|
|I||D||on-device proc. & stor.||TinyML (Banbury et al., 2020)||6.1||performance benchmark for ultra-low-power devices||2020|
|I||D||on-device proc. & stor.||ShrinkBench (Blalock et al., 2020)||6.1||standardized evaluation of pruned neural networks||2020|
|I||D||on-device proc. & stor.||Rusci et al. (2020)||6.1||rule-based, iterative mixed-precision model compression||2020|
|I||D||on-device proc. & stor.||AMC (He et al., 2018)||6.1||reinforcement learning for model compression||2018|
|I||D||on-device proc. & stor.||DeepEye (Mathur et al., 2017)||||caching & model compression||2017|
|I||D||hard/software co-design||Serenity (Ahn et al., 2020)||6.1||dynamic prog. for memory-optimized compiler schedules||2020|
|I||D||hard/software co-design||SkyNet (Zhang et al., 2020)||||hardware-aware neural architecture search||2020|
|I||D||hard/software co-design||HAQ (Wang et al., 2019b)||6.1||hardware-aware, automated mixed-precision quantization||2019|
|I||D||hard/software co-design||DeepX (Lane et al., 2016)||6.1||automatic model decomposition & runtime compression||2016|
|SE||C||lifecycle management||Kang et al. (2020)||6.2||model assertions to monitor, validate & continuously improve||2020|
|SE||C||lifecycle management||ease.ml/ci (Renggli et al., 2019)||6.2||continuous integration for machine learning||2019|
|SE||D||hard/software co-design||TVM (Chen et al., 2018)||6.1||deep learning workload compilation & optimization||2018|
|SE||D||hard/software co-design||CMSIS-NN (Lai and Suda, 2018)||6.1||library of software kernels for neural nets on Cortex-M cores||2018|
|SE||D||on-device proc. & stor.||PoET-BiN (Chidambaram et al., 2020)||6.1||look-up tables for tiny binary neurons||2020|
|SE||D||on-device proc. & stor.||Riptide (Fromm et al., 2020)||6.1||optimized neural network binarization||2020|
Machine Learning System (ML Sys.): Data Provenance (DP), Training (T), Inference (I), Software Engineering (SE)
Computing Layer (Comp. Layer): Cloud (C), Cloud/Device (C/D), Edge/Device (E/D), Device (D)
Based on the design considerations in Figure 4, Table 5 lists emerging research that cuts across challenges and considerations. Figure 4 can be used to systematically discover emerging research at the intersection of technology layers and design considerations to expand Table 5, which is open to further additions. The first column in the table specifies the machine learning system component that is targeted in the research. This can be data provenance, training, inference or software engineering. The second column specifies the compute location where machine learning components are executed. This can be the cloud, edge, device or distributed between locations. The third column maps to Figure 4 and specifies the design consideration or challenge addressed in the work. The second-to-last column gives a short summary of the approach that has been taken to address the challenge. The remainder of this section highlights three emerging research areas that have particular significance for machine learning systems in the IoT: on-device inference, automation and optimization, and privacy-preserving machine learning.
6.1. On-device inference
Most of the recent technology developments have been focused on deep learning workloads. While training deep learning models continues to be done on the cloud, on-device inference is a major trend for latency and privacy sensitive settings. Recent research has focused on making deep learning models fit on devices with limited memory, executing inference efficiently in low power and memory-constrained environments, and approaches for inference on heterogeneous devices.
The two key techniques used for model compression to reduce model size are quantization, which reduces the floating point precision of parameters and gradients, and pruning, which eliminates insignificant parameters. SqueezeNet (Iandola et al., 2016) is a convolutional neural network architecture for image processing, that uses model pruning and quantization to reduce the model size of AlexNet, the benchmark against which it compares, by 510x and the number of parameters by 50x, while retaining model performance. The authors use principled design space exploration on a micro and macro level to evaluate the effect of the organization and dimensionality of individual network layers, and the network structure as a whole. He et al. (2018) use reinforcement learning to provide the optimal model pruning policy for several image models for mobile devices. They found their automated AMC approach to offer improvements in terms of accuracy and inference latency. Rusci et al. (2020) use a rule-based, mixed bitwidth low-precision quantization to select the optimal bitwidth for each layer of a MobilenetV1 image processing network. They use quantization-aware retraining to improve performance and successfully deploy their model on a memory-constrained microcontroller device with 512kB RAM and 2MB FLASH memory. The hardware-aware automated quantization (HAQ) framework developed by Wang et al. (2019b) also allows for quantization with mixed precision on different neural network layers on the two MobilenetV1/2 image models. HAQ uses reinforcement learning to determine the optimal quantization policy, and takes feedback from the hardware accelerator into consideration. The approach shows a reduction in latency and energy consumption, as well as negligible accuracy loss when compared to a fixed 8-bit quantized network. The approach has the ability to adapt to different hardware architectures. 
Riptide (Fromm et al., 2020) is an end-to-end system for training and deploying high-speed binarized networks. It performs extreme quantization by reducing bitwidth to 1-, 2- or 3-bit precision and achieves superior efficiency and speedups without degrading accuracy. PoET-BiN (Chidambaram et al., 2020) maps binary neural networks to look-up tables on Field Programmable Gate Arrays (FPGAs) to reduce energy consumption and compute time significantly on the CIFAR-10, MNIST and SVHN datasets.
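The two compression techniques discussed above can be illustrated on a flat list of weights. The following is a minimal sketch, assuming symmetric uniform quantization to signed integers and unstructured magnitude pruning; it is not the procedure used by any of the cited systems, and the function names are hypothetical. Binarized networks such as those targeted by Riptide and PoET-BiN typically use a sign function instead, which this sketch does not cover.

```python
def quantize_uniform(weights, bits):
    """Map float weights to signed integers in [-qmax, qmax] with one shared scale."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits (1 sign bit + magnitude)")
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from integers and the shared scale."""
    return [q * scale for q in quantized]


def prune_by_magnitude(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:n_prune])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    weights = [0.8, -0.05, 0.3, -1.0, 0.02, 0.6]
    q, scale = quantize_uniform(weights, bits=8)
    print(q, dequantize(q, scale))
    print(prune_by_magnitude(weights, sparsity=0.5))
```

Lower bitwidths shrink storage per parameter at the cost of a larger rounding error (at most half the scale per weight here), while pruning trades accuracy for sparsity; mixed-precision approaches choose the bitwidth per layer rather than globally.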
The results obtained for on-device processing are promising, but challenging to compare across research papers. Blalock et al. (2020) note that despite its popularity, model pruning lacks adequate performance benchmarks, which makes it impossible to confidently compare competing techniques. They introduce ShrinkBench, an open-source framework to standardize the evaluation of pruned neural networks. The TinyML benchmark (Banbury et al., 2020) is being developed to enable performance comparison for ultra-low-power devices and forms part of the larger suite of MLPerf benchmarks (Mattson et al., 2020). In addition to benchmarks, software tools are being developed to facilitate the deployment of deep learning workloads. TVM (Chen et al., 2018) is a compiler that can be used with popular high-level machine learning frameworks, like TensorFlow, PyTorch and Keras, to train and deploy models on diverse hardware backends. Serenity (Ahn et al., 2020) optimizes compiler schedules for irregularly wired neural networks by using dynamic programming to adhere to on-device memory constraints. CMSIS-NN (Lai and Suda, 2018) is an optimized library of software kernels to enable the deployment of neural networks on Cortex-M cores. DeepX (Lane et al., 2016) is a software accelerator that offers runtime resource control through neural network layer compression and by automatically decomposing a deep learning model across available processors while accounting for dynamic resource constraints of mobile devices.
6.2. Automation and Optimization
To manage complex configuration, deployment and maintenance tasks across multiple technologies, automation and optimization are increasingly relied on throughout the distributed machine learning development lifecycle. Neural architecture search automates and optimizes the process of finding good network architectures for deep learning models. The original approach to neural architecture search proposed by Zoph and Le (2017) uses a recurrent neural network as a controller to generate hyperparameters, and reinforcement learning to train the controller, in order to arrive at a state-of-the-art convolutional neural network architecture. The concept of exploring a search space for optimal network architectures has since been extended. For example, Xie et al. (2019) propose the use of network generators to sample potential architectures from a distribution, and have discovered that randomly wired neural networks can produce results that are competitive with conventional structures.
While deep learning models are producing state-of-the-art results, their reliance on labelled training data is a constraint. Advances in weak supervision are providing mechanisms for automating label generation. Ratner et al. (2016) propose data programming for programmatically creating training datasets with an extraction framework called Snorkel. Rather than labeling individual data points, Snorkel allows subject matter experts to define labeling functions to describe the processes by which data points could be labeled. The labeling functions are then used as programs to label the data. Osprey (Bringer et al., 2019) extends Snorkel and provides a high level interface for non-programmers to specify complex labeling functions. Koller et al. (2019) apply weak supervision to sequence learning problems with parallel sub-problems in multi-stream video processing. They exploit temporal sequence constraints within independent streams and combine them by explicitly imposing synchronization points. In the context of continuous improvement, Kang et al. (2020) use model assertions to monitor and validate machine learning models, and to provide weak supervision. The assertions validate the consistency of model outputs and present correction rules that propose new training labels for data that fail the assertions. ease.ml/ci (Renggli et al., 2019), on the other hand, is a continuous integration system that aims to reduce the amount of labels required to a practical amount. The system enables users to state integration conditions with reliability constraints. Optimization techniques are then used to reduce the number of training labels while providing performance guarantees that meet production requirements.
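The idea of labeling functions can be sketched in a few lines. The example below is an illustration under stated assumptions: the sensor fields, thresholds and function names are hypothetical, and the votes are combined by a simple majority rule rather than by the learned model that data programming uses to weigh labeling functions. It does not use the Snorkel API.

```python
from collections import Counter

ABSTAIN = None
NORMAL, ABNORMAL = 0, 1


def lf_fall_spike(reading):
    """Heuristic: a large acceleration peak suggests a fall."""
    return ABNORMAL if reading["accel_peak_g"] > 3.0 else ABSTAIN


def lf_resting_heart_rate(reading):
    """Heuristic: a heart rate in a typical resting range suggests normal activity."""
    return NORMAL if 50 <= reading["heart_rate"] <= 100 else ABSTAIN


def lf_long_inactivity(reading):
    """Heuristic: very long inactivity suggests an abnormal situation."""
    return ABNORMAL if reading["still_minutes"] > 120 else ABSTAIN


LABELING_FUNCTIONS = [lf_fall_spike, lf_resting_heart_rate, lf_long_inactivity]


def weak_label(reading, lfs=LABELING_FUNCTIONS):
    """Combine labeling-function votes by majority; abstain on ties or no votes."""
    votes = [v for v in (lf(reading) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    reading = {"accel_peak_g": 4.2, "heart_rate": 130, "still_minutes": 150}
    print(weak_label(reading))
```

Each function encodes one piece of domain knowledge and may abstain, so experts write a handful of heuristics instead of labeling every data point; conflicting or missing votes are left unlabeled in this sketch.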
Allocating and scheduling distributed workloads across computing resources can be optimized to meet diverse resource utilization objectives. Hadjis et al. (2016) study trade-off factors to minimize the training time of convolutional neural networks. Based on their findings, they present Omnivore, an optimizer for asynchronous, data parallel training. Gandivafair (Chaudhary et al., 2020) is a GPU cluster scheduler that guarantees fair allocation of compute resources amongst users, while maximizing the efficiency of the cluster by limiting idle resources. The scheduler uses a gang-aware approach that allocates GPUs in an all-or-nothing manner to a job, distributes jobs evenly through the cluster by using a load balancer and finally implements an automated GPU trading strategy to ensure user-level fairness. AlloX (Le et al., 2020) also addresses fair resource allocation amongst users, but aims to simultaneously optimize performance to reduce the average job completion time.
6.3. Privacy-preserving machine learning
Recent work on privacy-preserving machine learning is predominantly focused on retaining the anonymity of data providers both during training and inference. Differential privacy (Dwork et al., 2016) adds random noise, drawn from a chosen distribution, to the result of a data query before returning it to the user. This approach has provable privacy guarantees and has become the standard for protecting sensitive data in machine learning. v.d. Hoeven (2019) generalizes this approach by allowing the data provider to control the noise distribution and by implication the privacy guarantees, while keeping the parameters of the distribution hidden during training.
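A minimal sketch of noise calibration for a counting query is given below. It assumes the Laplace mechanism with noise scale equal to sensitivity divided by the privacy parameter epsilon, and a sensitivity of 1 because adding or removing one record changes a count by at most 1. The function names are hypothetical, and the sketch is meant for illustration rather than production use.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of the values that satisfy `predicate`."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0  # one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 67, 45, 71, 34, 80, 52]
    print(private_count(ages, lambda a: a >= 65, epsilon=0.5))
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of a less accurate answer.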
On-device processing reduces the risk of data leakage. While on-device processing is feasible for inference, the larger resource requirements for training have given rise to federated learning (Brendan McMahan et al., 2017), which keeps sensitive data on mobile devices and performs global parameter aggregation on the cloud. Privacy is thus preserved and large data transfers between devices and the cloud are avoided. The Federated Averaging (FedAvg) algorithm (Brendan McMahan et al., 2017) performs data parallel training on mobile devices and synchronously averages parameter updates on a central server. Many extensions to federated learning have been proposed. FedProx (Li et al., 2020) is an adaptation of FedAvg that permits partial work from straggling devices to be committed for averaging. While keeping data on devices improves privacy protection, communicating the update parameters and model over the wireless network presents security risks. Adding differential privacy to federated learning has been proposed by McMahan et al. (2018) and integrated into the TensorFlow Privacy library. Truex et al. (2019) propose federated learning with differential privacy and secure multiparty computation to improve model accuracy by reducing the added noise without sacrificing privacy, as the number of participating devices increases. Seif et al. (2020) show that interference in the wireless communication channel can provide strong guarantees for local differential privacy and improved bandwidth efficiency for federated learning. To allow for personalization, FURL (Bui et al., 2019) splits parameters into local, private parameters that remain on a mobile device and federated parameters that are averaged on a central server.
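The synchronous averaging step can be sketched as follows. This is a simplified illustration, assuming a two-parameter linear model, plain per-sample gradient descent on each client and participation of every client in every round; the function names and hyperparameters are hypothetical. Only model parameters, never raw samples, are passed to the averaging step.

```python
def local_sgd(weights, samples, lr, epochs):
    """Client side: fit y = w * x + b on local samples, starting from global weights."""
    w, b = weights
    for _ in range(epochs):
        for x, y in samples:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return [w, b]


def federated_average(client_weights, client_sizes):
    """Server side: average parameters, weighted by each client's sample count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def fedavg(clients, rounds, lr=0.05, epochs=5):
    """Run synchronous rounds: broadcast, local training, weighted averaging."""
    global_weights = [0.0, 0.0]
    sizes = [len(samples) for samples in clients]
    for _ in range(rounds):
        updates = [local_sgd(list(global_weights), s, lr, epochs) for s in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


if __name__ == "__main__":
    # Two clients hold disjoint samples of the same relation y = 2x + 1.
    clients = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0), (-1.0, -1.0)]]
    print(fedavg(clients, rounds=200))
```

Weighting by sample count lets clients with more data contribute proportionally more to the global model; extensions such as FedProx change what clients may submit, not the basic broadcast-train-average loop.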
For privacy-preserving inference on the cloud, Bost et al. (2015) present an approach for classification over encrypted data. To limit sensitive inference on application data accessed by third parties, the Replacement AutoEncoder (RAE) algorithm proposed by Malekzadeh et al. (2018b) differentiates between black-listed (i.e. sensitive), grey-listed (i.e. not useful) and white-listed (i.e. desired information) predictions. RAE learns to replace black-listed predictions with values from grey-listed predictions. The Guardian-Estimator-Neutralizer (GEN) framework (Malekzadeh et al., 2018a) uses feature learning and data reconstruction to trade off the utility that an application provides based on the data it has access to, and the sensitive user information that is revealed.
7. Open problems
Machine learning has enabled dramatic progress in realising smart services within the IoT. However, many challenges remain to ensure that the deployment of intelligence moves from fragile, cloud-based algorithms to fully functional, trustworthy and business-aligned systems. This section discusses some of the open challenges.
7.1. Full Stack Machine Learning Support
End-to-end software support for machine learning systems in the IoT must facilitate the development, testing, configuration, deployment, management and maintenance of all technologies that affect data provenance, model training and inference. Due to the scale and distribution of devices in the IoT, many of the required tasks will need to be automated. As noted by Hartsell et al. (2019), traceability and reproducibility are necessary at every development step for safety and mission-critical applications typical in CPS and the IoT. Machine learning systems embedded within the IoT are no exception, and must be traceable and transparent to human users. Data history and quality are of particular importance, as they replace traditional analytical models in machine learning systems. Extending and adapting data provenance and software engineering tools developed for cloud technologies to the device-edge-cloud continuum can serve as a starting point. Necessary tasks include automated data validation, catching data errors, making implicit assumptions about data explicit, detecting anomalies between training and serving data, performance profiling and deciding when to retrain a model. The scarcity of labelled training data needs to be overcome and alternatives to supervised learning will need to be explored. Within the context of the IoT, traditional semi-supervised learning with generative adversarial networks presents challenges due to increased model complexity, training instability and performance degradation in multi-modal sensing applications (Yao et al., 2018a). Automated labelling is viewed as promising for addressing the challenge of accumulating sufficient training data (Abdelzaher et al., 2020), but its limitations still need to be explored. Many smart services will need to respond to changing local conditions. Context adaptation, personalization, incremental learning and continual learning present promising avenues to evolve models in dynamic environments.
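One of the tasks listed above, detecting anomalies between training and serving data, could be approximated for a single numeric feature by comparing the two empirical distributions. The Python sketch below uses the two-sample Kolmogorov-Smirnov statistic as one possible measure; the text does not prescribe a particular statistic. The function names and the threshold of 0.3 are hypothetical and would need tuning per feature in practice.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(train: Sequence[float], serve: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of two samples (0 = identical, 1 = disjoint)."""
    a, b = sorted(train), sorted(serve)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # The gap between two step functions is largest at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def drift_detected(train: Sequence[float], serve: Sequence[float],
                   threshold: float = 0.3) -> bool:
    """Flag a feature whose serving distribution has moved away from training."""
    return ks_statistic(train, serve) > threshold


if __name__ == "__main__":
    training_temps = [20.1, 20.5, 21.0, 21.3, 22.0, 22.4]
    serving_temps = [27.9, 28.4, 29.0, 29.2, 30.1, 30.5]  # e.g. a relocated sensor
    print(ks_statistic(training_temps, serving_temps))
    print(drift_detected(training_temps, serving_temps))
```

A flag raised by such a check could feed the decision of when to retrain a model, although a deployed system would also need to handle categorical features, missing values and multiple-testing effects across many features.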
7.2. Comprehensive Approach to Trustworthiness
Predictions in the IoT must cater for open world assumptions. New categories, unseen examples, black swan events and foreign attack models require an extension of existing machine learning privacy and robustness considerations to broader trustworthiness concerns that arise within the IoT context. These should include reliability, resilience, safety and security concerns, together with comprehensive trustworthiness guarantees. A nuanced treatment of privacy that extends beyond anonymity to trust and control is necessary. Designing systems that can learn on encrypted data, that can train on devices, and that can provide sufficient and efficient inference performance with coarser and less data, will improve anonymity, control and trust of machine learning systems in the IoT. Machine learning systems must anticipate network security attacks that can result in altered, noisy and missing data and models. Proactive approaches to dealing with attacks include finding ways to measure data provenance, devising mechanisms for learning under uncertainty, declining predictions, verifying data and model authenticity, providing inference with quality assurance and developing systems that are resilient under attack. Intermittent and unreliable data transfer over wireless channels will result in missing values that can limit inference quality and affect system level predictive performance. This requires machine learning systems that can handle data dropouts, that are fault tolerant and that provide reliable results in unstable communication environments. Issues of fairness, accountability and transparency that have emerged in the machine learning community (Wallach, 2014) need to be accounted for in machine learning systems in IoT applications. To incorporate trustworthiness concerns within system requirements and design, mechanisms for specifying and measuring them, and for evaluating trade-offs presented by trustworthiness concerns are needed.
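Two of the mechanisms mentioned above, tolerating data dropouts and declining predictions, can be combined in a thin wrapper around a model. The Python sketch below is a minimal illustration under stated assumptions: missing sensor values arrive as `None`, they are filled with the last known value, and the model returns a label together with a confidence score. All names and thresholds (`impute_last_known`, `predict_or_decline`, `max_missing`, `min_confidence`) are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Model = Callable[[List[float]], Tuple[str, float]]  # returns (label, confidence)


def impute_last_known(reading: Sequence[Optional[float]],
                      last_known: Sequence[float]) -> Tuple[List[float], float]:
    """Fill dropped values with the last known ones and report the missing fraction."""
    if not reading or len(reading) != len(last_known):
        raise ValueError("reading and last_known must be non-empty and equally long")
    filled = [lk if r is None else r for r, lk in zip(reading, last_known)]
    missing_fraction = sum(r is None for r in reading) / len(reading)
    return filled, missing_fraction


def predict_or_decline(reading: Sequence[Optional[float]],
                       last_known: Sequence[float],
                       model: Model,
                       max_missing: float = 0.5,
                       min_confidence: float = 0.8) -> Optional[str]:
    """Return a label, or None when the input or the model is too uncertain."""
    filled, missing_fraction = impute_last_known(reading, last_known)
    if missing_fraction > max_missing:
        return None  # too much of the input was lost in transit
    label, confidence = model(filled)
    if confidence < min_confidence:
        return None  # the model itself is not sure enough
    return label


if __name__ == "__main__":
    # Stand-in model: a threshold on the mean temperature with a fixed confidence.
    def toy_model(values: List[float]) -> Tuple[str, float]:
        mean = sum(values) / len(values)
        return ("hot", 0.9) if mean > 25.0 else ("cold", 0.6)

    print(predict_or_decline([30.0, None, 28.0], [29.0, 29.0, 29.0], toy_model))
    print(predict_or_decline([None, None, 28.0], [29.0, 29.0, 29.0], toy_model))
```

Returning `None` makes the decision to abstain explicit, so a downstream service can fall back to a default action or request fresh data instead of acting on an unreliable prediction.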
7.3. Socio-Technical Perspective of Smart Services
Smart services in the IoT can be viewed as socio-technical systems involving complex, interacting and interdependent cyber-physical technical components and networks of independent actors consisting, for example, of users, data generators, network providers, data processors and service providers. Important research directions lie at the intersection of the different enabling technologies highlighted in Figure 4, within the contexts provided by the design aspects and concerns outlined in Table 1. The technical system design demands interdisciplinary expertise, and an expanded understanding of the interacting design considerations of distributed machine learning systems and IoT communication, hardware and device technologies. Domain experts need a nuanced understanding of the availability and limitations of machine learning systems and cloud technologies. Moreover, the multi-stakeholder environment gives rise to conflicting requirements and priorities between actors. To apply a systems approach to the development of smart services, frameworks and processes are needed to navigate conflicting design concerns and the trade-offs between stakeholder requirements that they represent. Providing an overview of available technologies and design choices with explicitly specified limitations, quality of service guarantees and uncertainty estimates at each design step can help in negotiating requirements trade-offs and technology choices amongst stakeholders.
8. Conclusion
This article presents a structured review of considerations and concerns for deploying machine learning systems in the IoT to enable smart services with intelligence. Four key perspectives have emerged through the review process. Firstly, a unified view of cyber-physical systems and the IoT can benefit the development of machine learning systems for smart services of hybrid logical-physical nature. In particular, sharing approaches across these two communities enables IoT applications to draw on solid frameworks developed for cyber-physical systems to identify IoT-specific design aspects and stakeholder concerns. Secondly, our review highlights the challenges presented by state-of-the-art cloud architectures for machine learning systems in the IoT, and the rising demand to consider alternative architectures. Thirdly, many machine learning systems have been deployed in production and the challenges for ongoing operations are known. These challenges will prevail in IoT applications as well, and require a systems perspective that appraises machine learning technologies beyond algorithms. Finally, advances in distributed computing and edge computing are promising for scaling and distributing machine learning in the IoT, and will be integral to the future architectures of smart services.
- TensorFlow: A System for Large-Scale Machine Learning. In USENIX Symposium on Operating System Design and Implementation, Savannah, USA. External Links: Cited by: §5.2.5.
- Five Challenges in Cloud-enabled Intelligence and Control. ACM Transactions on Internet Technology 20 (1), pp. 1–19. Cited by: §4.2.2, §4.2.3, §4.3.2, §5.3.3, §5.3.3, §7.1.
- A General Survey of Privacy-Preserving Data Mining Models and Algorithms. In Privacy Preserving Data Mining: Models and Algorithms, pp. 11–52. External Links: Cited by: §4.3.4.
- Ordering Chaos: Memory-Aware Scheduling of Irregularly Wired Neural Networks for Edge Devices. In Proceedings of the 3rd MLSys Conference, Austin, US. External Links: Cited by: §6.1, Table 5.
- Predictive analytics for complex IoT data streams. IEEE Internet of Things Journal 4 (5), pp. 1571–1582. External Links: Cited by: §3.2.
- Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications. IEEE Communications Surveys & Tutorials 17 (4), pp. 2347. External Links: Cited by: §5.3.3, §5.3.3.
- Smart Electricity Meter Data Intelligence for Future Energy Systems: A Survey. IEEE Transactions on Industrial Informatics 12 (1), pp. 425–436. External Links: Cited by: §3.1.
- Fog Computing for the Internet of Things: Security and Privacy Issues. IEEE Internet Computing 21 (2), pp. 34–42. External Links: Cited by: §5.1.2, §5.1.2.
- What does fault tolerant deep learning need from MPI?. In Proceedings of the 24th European MPI Users’ Group Meeting, New York, NY, USA. External Links: Cited by: §5.2.5, §5.2.5.
- Software Engineering for Machine Learning: A Case Study. Proceedings - 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice, ICSE-SEIP 2019, pp. 291–300. External Links: Cited by: §2.2, §4.1.1, §4.4.2.
- AI and Compute (2019 update). External Links: Cited by: §4.2.3.
- Smart cities in the new service economy: Building platforms for smart services. AI and Society 29 (3), pp. 323–334. External Links: Cited by: §1.
- Integration of cloud computing with internet of things: Challenges and open issues. Proceedings - 2017 IEEE International Conference on Internet of Things, IEEE Green Computing and Communications, IEEE Cyber, Physical and Social Computing, IEEE Smart Data, iThings-GreenCom-CPSCom-SmartData 2017, pp. 670–675. External Links: Cited by: §1, §3.3, §5.1.1.
- Real-Time Deep Learning at the Edge for Scalable Reliability Modeling of Si-MOSFET Power Electronics Converters. IEEE Internet of Things Journal 6 (5), pp. 7375–7385. External Links: Cited by: §5.3.4.
- Benchmarking TinyML Systems: Challenges and Direction. External Links: Cited by: §5.3.2, §6.1, Table 5.
- On Distributed Communication Networks. First Congress of the Information Systems Sciences. External Links: Cited by: §5.
- Demystifying parallel and distributed deep learning: An in-depth concurrency analysis. ACM Computing Surveys 52 (4). External Links: Cited by: §5.2.1, §5.2.3, §5.2.3, §5.2.4.
- Deep Learning of Representations: Looking Forward. In International Conference on Statistical Language and Speech Processing, External Links: Cited by: §5.2.
- IoT Wearable Sensor and Deep Learning: An Integrated Approach for Personalized Human Activity Recognition in a Smart Home Environment. IEEE Internet of Things Journal 6 (5), pp. 8553–8562. External Links: Cited by: §5.3.1.
- DataWig: Missing value imputation for tables. Journal of Machine Learning Research 20, pp. 1–6. External Links: Cited by: §4.1.1, Table 5.
- What is the State of Neural Network Pruning. In Proceedings of the 3rd MLSys Conference, Austin, US. Cited by: §6.1, Table 5.
- Machine Learning Classification over Encrypted Data. NDSS 4324, pp. 1–34. External Links: Cited by: §5.1.2, §6.3, Table 5.
- Data Validation for Machine Learning. In Proceedings of the 2nd SysML Conference, pp. 1–14. Cited by: §4.1.2, Table 5.
- Communication-efficient learning of deep networks from decentralized data. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017 54. External Links: Cited by: §6.3, Table 5.
- Osprey: Weak Supervision of Imbalanced Extraction Problems without Code. Proceedings of the ACM SIGMOD International Conference on Management of Data. External Links: Cited by: §6.2, Table 5.
- Federated User Representation Learning. External Links: Cited by: §6.3, Table 5.
- IoT elements, layered architectures and security issues: A comprehensive survey. Sensors (Switzerland) 18 (9), pp. 1–37. External Links: Cited by: §5.1.3.
- Scaling Video Analytics on Constrained Edge Nodes. In Proceedings of the 2nd SysML Conference, Palo Alto, US. External Links: Cited by: Table 5.
- Quantifying Differential Privacy in Continuous Data Release under Temporal Correlations. Technical report External Links: Cited by: §4.3.4.
- Cirrus: a Serverless Framework for End-to-end ML Workflows. In Proceedings of the ACM Symposium on Cloud Computing, pp. 13–24. External Links: Cited by: §5.2.4, Table 5.
- A Hitchhiker’s Guide On Distributed Training Of Deep Neural Networks. Journal of Parallel and Distributed Computing 137, pp. 65–76. External Links: Cited by: §2.2, §5.2.2, §5.2.4, §5.2.
- Balancing Efficiency and Fairness in Heterogeneous GPU Clusters for Deep Learning. In EuroSys 2020, External Links: Cited by: §6.2, Table 5.
- Deep Learning With Edge Computing: A Review. Proceedings of the IEEE 107 (8), pp. 1655–1674. External Links: Cited by: §1.1, §2.2, §5.1, §5.3.4, §5.3.
- TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In USENIX Symposium on Operating System Design and Implementation, Carlsbad, US. External Links: Cited by: §6.1, Table 5.
- Fog and IoT: An Overview of Research Opportunities. IEEE Internet of Things Journal 3 (6), pp. 854–864. External Links: Cited by: §5.3.
- PoET-BiN: Power Efficient Tiny Binary Neurons. In Proceedings of the 3rd MLSys Conference, Austin, US. Cited by: §6.1, Table 5.
- Project ADAM: Building an efficient and scalable deep learning training system. Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2014, pp. 571–582. External Links: Cited by: §5.2.1.
- Solving the straggler problem with bounded staleness. In HotOS’13: Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems, Cited by: §5.2.2.
- Framework for Cyber-Physical Systems: Volume 1, Overview. Technical report Vol. 1500-201, National Institute of Standards and Technology. External Links: Cited by: §2.1, §3.2, Table 1.
- Clipper: A Low-Latency Online Prediction Serving System. In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’17), Boston, US. External Links: Cited by: Table 5.
- AGGREGATHOR: Byzantine Machine Learning via Robust Gradient Aggregation. In Proceedings of the 2nd SysML Conference, Palo Alto, US. External Links: Cited by: Table 5.
- Activity Recognition Based on Inertial Sensors for Ambient Assisted Living. 2016 19th International Conference on Information Fusion (FUSION), pp. 371–378. External Links: Cited by: §3.1.
- Large Scale Distributed Deep Networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS’12), External Links: Cited by: §5.2.1, §5.2.5.
- Cloud computing and Internet of Things fusion: Cost issues. Proceedings of the International Conference on Sensing Technology, ICST 2017-Decem (January 2018), pp. 1–6. External Links: Cited by: §5.1.1.
- A survey about prediction-based data reduction in wireless sensor networks. ACM Computing Surveys 49 (3). External Links: Cited by: §1.1.
- State-of-the-art, challenges, and open issues in the integration of Internet of things and cloud computing. Journal of Network and Computer Applications 67, pp. 99–117. External Links: Cited by: §1.1, §3.3.
- A few useful things to know about machine learning. Communications of the ACM 55 (10), pp. 78–87. External Links: Cited by: §2.2, §2.2.
- Calibrating noise to sensitivity in private data analysis. Journal of Privacy and Confidentiality 7 (3), pp. 17–51. External Links: Cited by: §6.3, Table 5.
- Band-limited training and inference for convolutional neural networks. 36th International Conference on Machine Learning, ICML 2019 2019-June, pp. 3139–3155. External Links: Cited by: Table 5.
- The Potential of Knowing More: A Review of Data-Driven Urban Water Management. Environmental Science and Technology 51 (5), pp. 2538–2553. External Links: Cited by: §3.1.
- An Overview of Internet of Things (IoT) and Data Analytics in Agriculture: Benefits and Challenges. IEEE Internet of Things Journal 5 (5), pp. 3758–3773. External Links: Cited by: §3.1, §3.3, Table 2.
- Computational health informatics in the big data age: A survey. ACM Computing Surveys 49 (1). External Links: Cited by: §3.1, Table 2.
- Technologies for a thing-centric internet of things. Proceedings - 2017 IEEE 5th International Conference on Future Internet of Things and Cloud, FiCloud 2017 2017-January, pp. 77–84. External Links: Cited by: Figure 2, §5.3.2, §5.3.
- CPS data streams analytics based on machine learning for Cloud and Fog Computing: A survey. Future Generation Computer Systems 90, pp. 435–450. External Links: Cited by: §3.2.
- Enabling effective programming and flexible management of efficient body sensor network applications. IEEE Transactions on Human-Machine Systems 43 (1), pp. 115–133. External Links: Cited by: §3.1, §3.2, Table 2.
- Riptide: Fast end-to-end Binarized Neural Networks. In Proceedings of the 3rd MLSys Conference, Austin, Texas. Cited by: §6.1, Table 5.
- A survey on concept drift adaptation. ACM Computing Surveys 46 (4). External Links: Cited by: §2.2.
- Distributed networked control systems: A brief overview. Information Sciences 380, pp. 117–131. External Links: Cited by: §5.
- Big Data Issues in Smart Grids: A Survey. IEEE Systems Journal PP, pp. 1–12. External Links: Cited by: §3.1, §3.3, Table 2.
- Explaining and harnessing adversarial examples. 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, pp. 1–11. External Links: Cited by: §4.3.2.
- Cyber-Physical Systems and Internet of Things NIST Special Publication 1900-202. Technical report National Institute of Standards and Technology. Cited by: §2.1, §2.1.
- Peeking behind the NAT: An empirical study of home networks. Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC, pp. 377–389. External Links: Cited by: §5.1.1.
- BadNets: Evaluating Backdooring Attacks on Deep Neural Networks. IEEE Access 7, pp. 47230–47243. External Links: Cited by: §4.2.4.
- A survey on concepts, applications, and challenges in cyber-physical systems. KSII Transactions on Internet and Information Systems 8 (12), pp. 4242–4268. External Links: Cited by: §3.2.
- Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs. External Links: Cited by: §6.2, Table 5.
- CPS design with learning-enabled components: A case study. Proceedings of the International Workshop on Rapid System Prototyping, pp. 57–63. External Links: Cited by: §7.1.
- The Elements of Statistical Learning. Second edition, Springer. External Links: Cited by: §2.2, §2.2, §2.2.
- AMC: AutoML for model compression and acceleration on mobile devices. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11211 LNCS, pp. 815–832. External Links: Cited by: §6.1, Table 5.
- Ultra-Low Power and Dependability for IoT Devices. IoT Technologies. Cited by: §5.3.2.
- Distributed Machine Learning through Heterogeneous Edge Systems. External Links: Cited by: Table 5.
- Adversarial machine learning. In AI Sec, Chicago, US. External Links: Cited by: §4.2.2.
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. In 33rd Conference on Neural Information Processing Systems, Cited by: §5.2.1.
- DeepMon: Mobile GPU-based deep learning framework for continuous vision applications. MobiSys 2017 - Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, pp. 82–95. External Links: Cited by: Table 5.
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. External Links: Cited by: §6.1, Table 5.
- Tracking the evolution of the internet of things concept across different application domains. Sensors (Switzerland) 17 (6), pp. 1–24. External Links: Cited by: §1, §2.1, §3.3.
- Machine learning for wireless communications in the Internet of Things: A comprehensive survey. Ad Hoc Networks 93, pp. 101913. External Links: Cited by: §1.1.
- FFDL: A Flexible Multi-tenant Deep Learning Platform. Middleware 2019 - Proceedings of the 2019 20th International Middleware Conference, pp. 82–95. External Links: Cited by: §5.2.4, §5.2.5, §5.2.5.
- Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads. In USENIX Annual Technical Conference, External Links: Cited by: §5.2.5, §5.2.5.
- Model-reuse attacks on deep learning systems. Proceedings of the ACM Conference on Computer and Communications Security, pp. 349–363. External Links: Cited by: §4.1.4, §4.2.4.
- MNN: A Universal and Efficient Inference Engine. In Proceedings of the 3rd MLSys Conference, Austin, Texas. External Links: Cited by: §4.3.1.
- How to scale distributed deep learning?. (i), pp. 1–16. External Links: Cited by: Table 5.
- Cloud Programming Simplified: A Berkeley View on Serverless Computing. External Links: Cited by: §3.3.
- Advances and Open Problems in Federated Learning. pp. 1–105. External Links: Cited by: §5.3.4.
- i-IoT (Intelligent Internet of Things). Vol. 4. External Links: Cited by: §3.3, §3.3.
- Model Assertions for Monitoring and Improving ML Models. In Proceedings of the 3rd SysML Conference, Austin, US. Cited by: §6.2, Table 5.
- Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos. IEEE Transactions on Pattern Analysis and Machine Intelligence XX (X), pp. 1–1. External Links: Cited by: §6.2, Table 5.
- Decentralized Deep Learning with Arbitrary Communication Compression. In ICLR 2020 Conference Paper, pp. 1–22. External Links: Cited by: Table 5.
- Willump: A Statistically-Aware End-to-End Optimizer. In Proceedings of the 3rd MLSys Conference, Austin, Texas. Cited by: §4.3.1, Table 5.
- Enabling deep learning at the IoT edge. IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD, pp. 1–6. External Links: Cited by: §6.1, Table 5.
- DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices. 2016 15th ACM/IEEE International Conference on Information Processing in Sensor Networks, IPSN 2016 - Proceedings (1), pp. 1–12. External Links: Cited by: §6.1, Table 5.
- State of the IoT 2018: Number of IoT devices now at 7B – Market accelerating. External Links: Cited by: §5.1.
- AlloX: Compute Allocation in Hybrid Clusters. In EuroSys 2020, Vol. 16. External Links: Cited by: §6.2, Table 5.
- Deep learning. Nature 521 (7553), pp. 436–444. External Links: Cited by: §2.2.
- Cyber Physical Systems: Design Challenges. Technical report UC Berkeley. External Links: Cited by: §3.2.
- DeepCham: Collaborative edge-mediated adaptive deep learning for mobile object recognition. In Proceedings - 1st IEEE/ACM Symposium on Edge Computing, SEC 2016, pp. 64–76. External Links: Cited by: Table 5.
- Learning IoT in Edge: Deep Learning for the Internet of Things with Edge Computing. IEEE Network 32 (1), pp. 96–101. External Links: Cited by: §5.1.
- Distributed machine learning load balancing strategy in cloud computing services. Wireless Networks 2. External Links: Cited by: §5.2.4, Table 5.
- Federated Optimization in Heterogeneous Networks. In Proceedings of the 3rd MLSys Conference, Austin, US. Cited by: §6.3, Table 5.
- Learning without Forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (12), pp. 2935–2947. External Links: Cited by: Table 5.
- Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent. Advances in Neural Information Processing Systems 2017-December (1), pp. 5331–5341. External Links: Cited by: §5.2.3.
- Toward Intelligent Vehicular Networks: A Machine Learning Framework. IEEE Internet of Things Journal 6 (1), pp. 124–135. External Links: Cited by: §3.1.
- Deep Learning Based Inference of Private Information Using Embedded Sensors in Smart Devices. IEEE Network 32 (4), pp. 8–14. External Links: Cited by: §5.1.2.
- The Mythos of Model Interpretability. Cited by: §4.3.3.
- Routing Correlated Data with Fusion Cost in Wireless Sensor Networks. IEEE Transactions on Mobile Computing 5 (11), pp. 1620–1632. External Links: Cited by: §5.3.3.
- A Taxonomy of Software Engineering Challenges for Machine Learning Systems: An Empirical Investigation. Agile Processes in Software Engineering and Extreme Programming 1, pp. 260. External Links: Cited by: §4.4.2.
- Machine learning for internet of things data analysis: a survey. Digital Communications and Networks 4 (3), pp. 161–175. External Links: Cited by: §1.1, §3.2.
- Protecting sensory data against sensitive inferences. Proceedings of the Workshop on Privacy by Design in Distributed Systems, P2DS 2018. External Links: Cited by: §6.3, Table 5.
- Replacement autoencoder: a privacy-preserving algorithm for sensory data analysis. Proceedings - ACM/IEEE International Conference on Internet of Things Design and Implementation, IoTDI 2018, pp. 165–176. External Links: Cited by: §6.3, Table 5.
- Autonomous Vehicles: State of the Art, Future Trends, and Challenges. Automotive Systems and Software Engineering, pp. 347–367. External Links: Cited by: §3.1, Table 2.
- DeepEye: Resource efficient local execution of multiple deep vision models using wearable commodity hardware. MobiSys 2017 - Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, pp. 68–81. External Links: Cited by: Table 5.
- MLPerf Training Benchmark. In Proceedings of the 3rd MLSys Conference, Austin, US. External Links: Cited by: §6.1.
- Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools. Technical report Vol. 1. External Links: Cited by: §1.1, §5.2.1, §5.2.2, §5.2.3, §5.2.3, §5.2.3, §5.2.4, §5.2.5, §5.3.1.
- The tensorflow partitioning and scheduling problem. In Proceedings of the 1st Workshop on Distributed Infrastructures for Deep Learning, pp. 1–6. External Links: Cited by: §5.2.1, §5.2.4.
- Characterizing Smart Home IoT Traffic in the Wild. External Links: Cited by: §5.1.1, §5.1.2.
- A General Approach to Adding Differential Privacy to Iterative Training Procedures. External Links: Cited by: §6.3.
- Towards neural networks that provably know when they don’t know. In ICLR 2020, pp. 1–18. External Links: Cited by: Table 5.
- MLlib: Machine learning in Apache Spark. Journal of Machine Learning Research 17, pp. 1–7. External Links: Cited by: §4.4.3.
- Towards a Definition of the Internet of Things (IoT). IEEE Internet Initiative, pp. 1–86. External Links: Cited by: §5.
- The discipline of machine learning. Vol. 9, Carnegie Mellon University, School of Computer Science, Machine Learning …. Cited by: §2.2, §2.2.
- Deep learning for IoT big data and streaming analytics: A survey. IEEE Communications Surveys and Tutorials 20 (4), pp. 2923–2960. External Links: Cited by: §3.3.
- Data science for building energy management: A review. Renewable and Sustainable Energy Reviews 70 (December 2016), pp. 598–609. External Links: Cited by: §3.1.
- Wearable sensors for human activity monitoring: A review. IEEE Sensors Journal 15 (3), pp. 1321–1330. External Links: Cited by: Table 2.
- Nontechnical loss detection for metered customers in power utility using support vector machines. IEEE Transactions on Power Delivery 25 (2), pp. 1162–1171. External Links: Cited by: §3.1.
- Learning with noisy labels. Advances in Neural Information Processing Systems, pp. 1–9. External Links: Cited by: §4.2.2.
- Framework and Roadmap for Smart Grid Interoperability Standards. Technical report Vol. 0, National Institute of Standards and Technology. External Links: Cited by: Table 2.
- Hidden Stratification Causes Clinically Meaningful Failures in Machine Learning for Medical Imaging. In Machine Learning for Health at NeurIPS 2019, External Links: Cited by: §4.2.2, §4.3.3.
- Revisiting Distributed Synchronous SGD. pp. 1–10. External Links: Cited by: Table 5.
- BlinkML: Efficient maximum likelihood estimation with probabilistic guarantees. Proceedings of the ACM SIGMOD International Conference on Management of Data (i), pp. 1135–1152. External Links: Cited by: Table 5.
- Anomaly detection. Computers, Materials and Continua 14 (1), pp. 1–22. External Links: Cited by: §3.1.
- Machine Learning Software Engineering in Practice: An Industrial Case Study. CoRR abs/1906.07154, pp. 1–21. External Links: Cited by: §3.2.
- MLSys: The New Frontier of Machine Learning Systems. pp. 1–4. External Links: Cited by: §1.
- Data programming: Creating large training sets, quickly. Advances in Neural Information Processing Systems (Nips), pp. 3574–3582. External Links: Cited by: §4.2.1, §6.2, Table 5.
- The role of massively multi-task and weak supervision in software 2.0. CIDR 2019 - 9th Biennial Conference on Innovative Data Systems Research. Cited by: §4.1.1.
- A Deep Learning Approach to on-Node Sensor Data Analytics for Mobile or Wearable Devices. IEEE Journal of Biomedical and Health Informatics 12 (1), pp. 106–137. External Links: Cited by: §3.2.
- A survey on vehicular edge computing: Architecture, applications, technical issues, and future directions. Wireless Communications and Mobile Computing 2019. External Links: Cited by: §3.1, Table 2, §5.1.
- Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous yet Practical Treatment. In Proceedings of the 2nd SysML Conference, Palo Alto, US. External Links: Cited by: §6.2, Table 5.
- “Why Should I Trust You?”. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16, New York, New York, USA, pp. 1135–1144. External Links: Cited by: §4.3.3.
- On the features and challenges of security and privacy in distributed internet of things. Computer Networks 57 (10), pp. 2266–2279. External Links: Cited by: §5.1.2, §5.1.2, §5.1.2, §5.1.3.
- Memory-Driven Mixed Low Precision Quantization for Enabling Deep Network Inference on Microcontrollers. In Proceedings of the 3rd MLSys Conference, Austin, Texas. External Links: Cited by: §4.3.1, §6.1, Table 5.
- Smart home activities: A literature review. Electric Power Components and Systems 42 (3-4), pp. 294–305. External Links: Cited by: §3.1.
- IoT Technologies for Embedded Computing: A Survey. In CODES/ISSS ’16, Pittsburgh, USA. External Links: Cited by: §1.1, Figure 2, §5.3.1, §5.3.2, §5.3.3, §5.3.3, §5.3.4, §5.3.4, §5.3.
- From cloud down to things: An overview of machine learning in internet of things. IEEE Internet of Things Journal 6 (3), pp. 4921–4934. External Links: Cited by: §3.2, §3.3, §5.3.3.
- On Challenges in Machine Learning Model Management. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, pp. 5–13. External Links: Cited by: §4.1.1, §4.4.2, §4.4.3, §4.4.4.
- Automatically Tracking Metadata and Provenance of Machine Learning Experiments. Machine Learning Systems Workshop at NIPS, pp. 1–8. External Links: Cited by: §4.4.3.
- Cyber-Physical Systems Engineering. In Engineering Trustworthy Software Systems, pp. 256–289. External Links: Cited by: §2.1, §3.2.
- A large-scale study of failures in high-performance computing systems. In Proceedings of the International Conference on Dependable Systems and Networks (DSN2006), Philadelphia, USA. Cited by: §5.2.5.
- Hidden Technical Debt in Machine Learning Systems. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Cambridge, MA, USA, pp. 2503–2511. Cited by: §4.1.3, §4.4.1, §4.4.2.
- Wireless Federated Learning with Local Differential Privacy. External Links: Cited by: §6.3, Table 5.
- Edge Computing: Vision and Challenges. IEEE Internet of Things Journal 3 (5), pp. 637–646. External Links: Cited by: §5.1.1, §5.3, §5.
- Incorporating Artificial Intelligence into Medical Cyber Physical Systems: A Survey. Connected Health in Smart Cities. External Links: Cited by: Table 2.
- The future internet of things: Secure, efficient, and model-based. IEEE Internet of Things Journal 5 (4), pp. 2386–2398. External Links: Cited by: §3.3.
- Mastering the game of Go without human knowledge. Nature 550 (7676), pp. 354–359. External Links: Cited by: §4.2.3, §4.3.1.
- A review of edge computing reference architectures and a new global edge proposal. Future Generation Computer Systems 99 (2019), pp. 278–294. External Links: Cited by: §5.3.
- A Berkeley View of Systems Challenges for AI. External Links: Cited by: §2.2, §4.2.2, §4.2.3, §4.3.4.
- Enabling Deep Learning on IoT Devices. IEEE Computer, pp. 92–96. Cited by: §5.3.4, §5.3.
- Delay-Minimization Routing for Heterogeneous VANETs with Machine Learning Based Mobility Prediction. IEEE Transactions on Vehicular Technology 68 (4), pp. 3967–3979. Cited by: §3.1, §5.3.3.
- Privacy-Preserving Adversarial Networks. 2019 57th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2019, pp. 495–505. Cited by: Table 5.
- Effects of differential privacy and data skewness on membership inference vulnerability. Proceedings - 1st IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications, TPS-ISA 2019, pp. 82–91. Cited by: §6.3, Table 5.
- Big data issues in smart grid – A review. Renewable and Sustainable Energy Reviews 79, pp. 1099–1107. Cited by: Table 2.
- A survey on big multimedia data processing and management in smart cities. ACM Computing Surveys 52 (3). Cited by: §3.2.
- User-Specified Local Differential Privacy in Unconstrained Adaptive Online Learning. Cited by: §6.3, Table 5.
- Modular and Personalized Smart Health Application Design in a Smart City Environment. IEEE Internet of Things Journal 5 (2), pp. 614–623. Cited by: §1.
- A Survey on Distributed Machine Learning. 1 (1), pp. 1–33. Cited by: §1.1, §5.2.3, §5.2.5, §5.2.5, §5.2, §5.3.2.
- Big data, machine learning, and the social sciences: fairness, accountability, and transparency. Cited by: §7.2.
- Deep learning for sensor-based activity recognition: A survey. Pattern Recognition Letters 119, pp. 3–11. Cited by: §3.1.
- Deep learning for smart manufacturing: Methods and applications. Journal of Manufacturing Systems 48, pp. 144–156. Cited by: §2.2.
- HAQ: Hardware-aware automated quantization with mixed precision. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2019-June, pp. 8604–8612. Cited by: §2.2, §6.1, Table 5.
- Review of Smart Meter Data Analytics: Applications, Methodologies, and Challenges. IEEE Transactions on Smart Grid 10 (3), pp. 3125–3148. Cited by: §3.1.
- Cyber-physical systems for water sustainability: Challenges and opportunities. IEEE Communications Magazine 53 (5), pp. 216–222. Cited by: §3.1.
- Automating Dependence-Aware Parallelization of Machine Learning Training on Distributed Shared Memory. In Proceedings of the Fourteenth EuroSys Conference 2019. Cited by: §5.2.4.
- Probabilistic inference and differential privacy. Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010, pp. 1–9. Cited by: §4.3.4.
- Exploring randomly wired neural networks for image recognition. Proceedings of the IEEE International Conference on Computer Vision 2019-October, pp. 1284–1293. Cited by: §6.2, Table 5.
- SERF: Efficient Scheduling for Fast Deep Neural Network Serving via Judicious Parallelism. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Cited by: §5.2.4, §5.2.4, Table 5.
- A survey on trust management for Internet of Things. Journal of Network and Computer Applications 42, pp. 120–134. Cited by: §5.1.2.
- SenseGAN: Enabling Deep Learning for Internet of Things with a Semi-Supervised Framework. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2 (3), pp. 1–21. Cited by: §7.1.
- DeepSense: A Unified Deep Learning Framework for Time-Series Mobile Sensing Data Processing. Cited by: Table 5.
- Deep Learning for the Internet of Things. Technical report Cited by: §4.2.1, §5.3.2, §5.3.4.
- Distributed Learning of Deep Neural Networks using Independent Subnet Training. Cited by: Table 5.
- Internet of things for smart cities. IEEE Internet of Things Journal 1 (1), pp. 22–32. Cited by: §3.3.
- The Cloud is Not Enough: Saving IoT from the Cloud. 7th USENIX Workshop on Hot Topics in Cloud Computing, HotCloud ’15, Santa Clara, CA, USA, July 6-7, 2015, pp. 21–21. Cited by: §5.1.1, §5.1.1, §5.1.2, §5.1.
- Sequence-to-point learning with neural networks for non-intrusive load monitoring. 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, pp. 2604–2611. Cited by: §3.1, §5.1.2.
- Thermal comfort modeling for smart buildings: A fine-grained deep learning approach. IEEE Internet of Things Journal 6 (2), pp. 2540–2549. Cited by: §5.3.
- SkyNet: A Hardware-efficient Method for Object Detection and Tracking on Embedded Systems. In Proceedings of the 3rd MLSys Conference, Austin, US. Cited by: Table 5.
- Big data analytics in smart grids: a review. Energy Informatics 1 (1). Cited by: §3.1.
- Edge Intelligence: Paving the Last Mile of Artificial Intelligence With Edge Computing. arXiv. Cited by: §1.1, §4.1.1, §5.1, §5.3.4.
- Neural Architecture Search with Reinforcement Learning. 5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings, pp. 1–16. Cited by: §6.2, Table 5.