Big Data Analytics for Manufacturing Internet of Things: Opportunities, Challenges and Enabling Technologies

by   Hong-Ning Dai, et al.

Recent advances in information and communication technology (ICT) have promoted the evolution of the conventional computer-aided manufacturing industry into smart data-driven manufacturing. Data analytics on massive manufacturing data can extract huge business value, but it also poses research challenges due to the heterogeneous data types, enormous volume and real-time velocity of manufacturing data. This paper provides an overview of big data analytics in manufacturing Internet of Things (MIoT). It starts with a discussion of the necessities and challenges of big data analytics on manufacturing data in MIoT. Then, the enabling technologies of big data analytics for manufacturing data are surveyed and discussed. Finally, this paper outlines future directions in this promising area.


I Introduction

The manufacturing industry is experiencing a paradigm shift from automated manufacturing to “smart manufacturing” [1]. During this evolution, the Internet of Things (IoT) plays an important role in connecting the physical environment of manufacturing to the cyberspace of computing platforms and decision-making algorithms, consequently forming a Cyber-Physical System (CPS) [2]. In this paper, we refer to the industrial IoT dedicated to the manufacturing industry as manufacturing IoT (MIoT).

MIoT consists of a wide diversity of manufacturing equipment, sensors, actuators, controllers, RFID tags and smart meters, which are connected with computing platforms through wired or wireless communication links. MIoT generates a surge of data traffic. MIoT data is characterized by large volume and heterogeneous types (i.e., structured, semi-structured and unstructured), and it is generated in real time. The analytics of MIoT data can bring many benefits, such as improving factory operation and production, reducing machine downtime, improving product quality, enhancing supply chain efficiency and improving customer experience [3, 4, 5]. However, data analytics in MIoT also faces many challenges in the different phases of its life cycle.

There are several surveys on data analytics in the manufacturing industry. The work of [5] proposes a data-driven smart manufacturing framework and provides several application scenarios based on this conceptual framework. The necessities of big data analytics in smart manufacturing are summarized in [6]. The work of [4] provides an overview of data analytics in manufacturing with a case study. Tao and Qi present an overview of service-oriented manufacturing in [7]. However, most of the aforementioned studies lack an introduction to the enabling technologies corresponding to the challenges, which are of interest to both academic researchers and industrial practitioners.

Therefore, the aim of this paper is to provide an overview of data analytics in MIoT from the perspectives of opportunities, challenges and enabling technologies. The main contributions of this paper can be summarized as follows.

  • We provide a summary on key characteristics of MIoT and a life cycle of big data analytics for MIoT data. We also discuss necessities and challenges of big data analytics in MIoT.

  • We present an overview on enabling technologies of big data analytics for MIoT from the aspects of data acquisition, data preprocessing and data analytics.

  • We give an outline of future research directions in the aspects of security, privacy, fog computing and new data analytics methods.

The rest of this paper is organized as follows. Section II gives the discussion on necessities and challenges of big data analytics in MIoT. Section III introduces enabling technologies of big data analytics in MIoT. Section IV discusses the future research directions. Finally, this paper is concluded in Section V.

II Necessities and challenges of big data analytics for Manufacturing Internet of Things

In this section, we first introduce the key characteristics of Manufacturing Internet of Things in Section II-A. We then introduce the life cycle of big data analytics for MIoT in Section II-B. We next discuss the necessities of big data analytics for MIoT in Section II-C and the challenges in Section II-D.

II-A Key characteristics of Manufacturing Internet of Things

In this paper, we roughly categorize IoT into consumer Internet of Things (CIoT) and Manufacturing Internet of Things (MIoT). Table I compares MIoT with CIoT. In contrast to MIoT, CIoT mainly serves consumers. Hence, CIoT mainly consists of consumer devices (e.g., smart phones, wearable electronics) and smart appliances (e.g., refrigerators, TVs, washing machines). CIoT mainly aims to improve user experience, while MIoT mainly focuses on improving factory operations and production, reducing machine downtime and improving product quality. Moreover, MIoT usually works in harsh industrial environments (with vibration, noise and extremely high or low temperatures), while CIoT works in moderate environments. In addition, MIoT applications usually require high data-rate network connections with low delay, while CIoT applications have relaxed requirements on network connections. Furthermore, MIoT systems are usually mission-critical and sensitive to system failure or machinery downtime, while CIoT systems are non-mission-critical.

In this paper, we mainly focus on MIoT. MIoT connects various things (smart objects) equipped with electronic or mechanical sensors, actuators, instruments and software systems, which can sense and collect information from the physical environment and then act on it. During this procedure, data analytics plays an important role in extracting valuable information, forecasting upcoming events and predicting increases or decreases in production.

                    | Manufacturing IoT                                         | Consumer IoT
Goal                | Manufacturing-industry centric                            | Consumer centric
Devices             | Machines, sensors, controllers, actuators, smart meters   | Consumer devices and smart appliances
Working environment | Harsh (vibration, noise, extremely high/low temperature)  | Moderate
Data rate           | High (usually)                                            | Low or average
Delay               | Delay sensitive                                           | Delay tolerant
Mission             | Mission-critical                                          | Non-mission-critical
TABLE I: Comparison between MIoT and CIoT

II-B Life cycle of big data analytics for MIoT

We first introduce the life cycle of big data analytics for MIoT. Figure 1 shows that the life cycle of big data analytics for MIoT consists of three consecutive stages: 1) Data Acquisition, 2) Data Preprocessing and Storage, 3) Data Analytics. There are other taxonomies [8, 9, 5]. We categorize the life cycle of big data analytics into the above three stages since this taxonomy can accurately capture the key features of big data analytics in MIoT.

  1. Data acquisition consists of data collection and data transmission. Firstly, data collection involves acquiring raw data from various data sources in the whole manufacturing process via dedicated data collection technologies. For example, RFID tags are scanned by RFID readers in product warehouse. Then, the collected data will be transmitted to the data storage system through either wired or wireless communication systems. Details about enabling technologies of data acquisition are given in Section III-A.

  2. Data preprocessing and storage. After data collection, the raw data needs to be preprocessed before it is kept in data storage systems because of its large volume, redundancy and uncertainty [4]. Typical data preprocessing techniques include data cleaning, data integration and data compression. Data storage refers to the process of storing and managing massive data sets. We divide the data storage system into two components: storage infrastructure and data management software. The infrastructure includes not only the storage devices but also the network devices connecting the storage devices together. In addition to the networked storage devices, data management software is also necessary for the data storage system. Details about enabling technologies of data preprocessing and data storage are given in Section III-B.

  3. Data analytics. In the data analysis phase, various data analytical schemes are used to extract valuable information from the massive manufacturing data sets. We roughly categorize the data analytical schemes into four types: (i) statistical modeling, (ii) data visualization, (iii) data mining and (iv) machine learning. Details about enabling technologies of data analysis are presented in Section III-C.


II-C Necessities of big data analytics for MIoT

An enormous amount of data is generated across the whole manufacturing chain, which consists of raw material supply, manufacturing, product distribution, logistics and customer support, as shown in Figure 1. Such “big data” needs to be extensively analysed so that valuable information can be extracted.

We summarize the reasons for conducting big data analytics in MIoT as follows:

  • Improving factory operations and production. The predictive analytics of manufacturing data and customer demand data can help to improve machinery utilization, consequently enhancing factory operations. For example, the demand for certain products is often related to weather or seasonal conditions (e.g., down coats and cold weather). The forecast of a cold wave can be used to allocate machinery resources proactively and to pre-purchase raw materials to meet the surge in demand.

  • Reducing machine downtime. The sensors deployed throughout the whole production line can collect various data reflecting machinery status. For example, the analysis of machinery health data can help to identify the root cause of a failure, consequently reducing machine downtime [4]. Moreover, the sensory data from an automatic assembly line can also be used to detect excessive load on machines so as to balance the load among multiple machines [10].

  • Improving product quality. On the one hand, the analysis of market demand and customer requirements can be used to improve product design. On the other hand, during the manufacturing procedure, the analysis of manufacturing data can help to reduce the ratio of defective goods by identifying the root cause of defects. As a result, the product quality can be improved.

  • Enhancing supply chain efficiency. The proliferation of sensors, RFID and other tags across suppliers, manufacturing and transportation generates massive supply chain data, which can be used to analyse supply risk, predict delivery time, plan optimal logistic routes, etc. Moreover, the analysis of inventory data can reduce holding costs and meet dynamic demands by establishing safety stock levels. In addition, big data analytics in IoT-enabled intelligent manufacturing shops [3] can also help to make accurate logistic plans and schedules. As a result, the system efficiency can be greatly improved.

  • Improving customer experience. Companies can obtain customer data from various sources, such as sales channels, partner distributors, retailers and social media platforms. Big data analytics on customer data then offers descriptive, predictive and prescriptive solutions that enable companies to improve product design, quality, delivery, warranty and after-sales support. As a result, the customer experience can be improved. For example, the IoT data across the whole food supply chain also helps to prevent mischievous actions and guarantee food safety [11].

Fig. 1: Life cycle of Big Data Analytics for MIoT

II-D Challenges of big data analytics for MIoT

MIoT data has the following characteristics: (1) massive volume, (2) heterogeneous data types, (3) generation in real time and (4) huge business value and social value. These unique features give rise to research challenges in big data analytics for MIoT. We summarize the challenges in the following aspects.

1. Challenges in data acquisition

Data acquisition addresses the issues including data collection and data transmission, during which there are the following challenges.

  • Difficulty in data representation. MIoT data has different types, heterogeneous structures and various dimensions. For example, manufacturing data can be categorized into structured, semi-structured and unstructured data [5]. How to represent these structured, semi-structured and unstructured data becomes one of the major challenges in big data analytics for MIoT.

  • Efficient data transmission. How to transmit the tremendous volumes of data to data storage infrastructure in an efficient way becomes a challenge due to the following reasons: (i) high bandwidth consumption since the transmission of big data becomes a major bottleneck of wireless communication systems [8]; (ii) energy efficiency is one of major constraints in many wireless industrial systems, such as industrial wireless sensor networks [12].

2. Challenges in data preprocessing and storage

Data generated from MIoT leads to the following research challenges in data preprocessing.

  • Data integration. Data generated in MIoT has various types and heterogeneous features. It is necessary to integrate the various types of data so that efficient data analytics schemes can be implemented. However, it is quite challenging to integrate different types of MIoT data.

  • Redundancy reduction. The raw data generated from MIoT is characterized by the temporal and spatial redundancy, which often results in the data inconsistency consequently affecting the subsequent data analysis. How to mitigate the data redundancy in MIoT data becomes a challenge.

  • Data cleaning and data compression. In addition to data redundancy, MIoT data is often erroneous and noisy due to defective machinery or sensor errors. Moreover, the large volume of the data makes the process of data cleaning more challenging. Therefore, it is necessary to design effective schemes to compress MIoT data and clean its errors.

Data storage plays an important role in data analysis and value extraction. However, designing an efficient and scalable data storage system is challenging in MIoT. We summarize the challenges in data storage as follows.

  • Reliability and persistency of data storage. Data storage systems must ensure the reliability and the persistency of MIoT data. However, it is challenging to fulfill the above requirements of big data analytics while balancing the cost due to the tremendous amount of data [13].

  • Scalability. Besides storage reliability, another challenging issue lies in the scalability of storage systems for big data analytics. The various data types, the heterogeneous structures and the large volume of MIoT data sets make conventional databases infeasible for big data analytics. As a result, new storage paradigms need to be proposed to support large-scale data storage systems for big data analytics.

  • Efficiency. Another concern with data storage systems is the efficiency. In order to support the vast number of concurrent accesses or queries initiated during the data analytics phase, data storage needs to fulfill the efficiency, the reliability and the scalability requirements together, which is extremely challenging.

3. Challenges in data analytics

Big data analytics for MIoT is quite challenging due to the tremendous volume, the heterogeneous structures and the high dimensionality of the data. The major challenges in this phase are summarized as follows.

  • Data temporal and spatial correlation. Different from the data in conventional data warehouses, MIoT data is usually spatially and temporally correlated. How to manage the data and extract valuable information from the temporally and spatially correlated MIoT data becomes a new challenge.

  • Efficient data mining schemes. The tremendous volume of MIoT data leads to the challenge in designing efficient data mining schemes due to the following reasons: (i) it is not feasible to apply conventional multi-pass data mining schemes due to the huge volume of data, (ii) it is critical to mitigate the data errors and uncertainty due to the erroneous features of MIoT data.

  • Privacy and security. It is quite challenging to preserve the privacy and ensure the security of data during the analytics process. Though there are a number of conventional privacy-preserving data analytical schemes, they may not be applicable to MIoT data with its huge volume, heterogeneous structures and spatio-temporal correlations. Therefore, new privacy-preserving data mining schemes need to be proposed to address the above issues.

III Enabling Technologies

In this section, we discuss the enabling technologies of big data analytics in MIoT. According to the three phases in the life cycle of big data analytics in MIoT, we roughly categorize these technologies into data acquisition, data preprocessing and storage, data analytics. In particular, we first discuss the data acquisition related technologies in Section III-A. We then describe the data preprocessing and storage in Section III-B. In Section III-C, we discuss the data analytics in MIoT.

III-A Data acquisition

Fig. 2: Wireless Communication Technologies for MIoT (figure is not to scale)

As shown in Figure 1, the whole manufacturing chain involves multiple parties such as suppliers, manufacturers, distributors, logistics, retailers and customers. As a result, each of these sectors generates different types of data. Take a manufacturing factory as an example. Sensors deployed at the production line can collect device data, product data, ambient data (like temperature, humidity, air pressure), electricity consumption, etc. In the product warehouse, RFID or other tags can help to identify and track products. RFID tags attached to products can be read wirelessly over a short distance by an RFID reader.

The collected data can then be transmitted to the next stage via either wired or wireless connections. Industrial Ethernet is one of the most typical wired connections in manufacturing. When Ethernet is applied to an industrial setting, more rugged connectors and more durable cables are often required to withstand the harsh environment (like vibration, noise and temperature). Compared with wired communications, wireless communications do not require communication wiring and related infrastructure, consequently saving cost and improving scalability. The major obstacle to the wide deployment of wireless communications in industrial systems is their lower throughput and higher delay compared with wired communications. However, recent advances in wireless communications make wireless connections feasible in industrial settings.

Various sensors, RFID tags and other tags can connect with IoT gateways, WiFi Access Points (APs), small base stations (BSs) and macro BSs to form an industrial wireless sensor network (IWSN) [14]. It is worth mentioning that different wireless technologies have different coverage and bandwidth capabilities. Figure 2 compares various wireless technologies with regard to coverage and bandwidth. In particular, Figure 2 shows that conventional wireless technologies like Near Field Communications (NFC), RFID, Bluetooth Low Energy (LE), wireless body area networks (WBAN), IPv6 over Low-power Wireless Personal Area Networks (6LoWPAN) and Wireless Highway Addressable Remote Transducer (WirelessHART) [15] suffer from short communication range (i.e., most of them typically cover less than a few hundred meters). As a result, they cannot support wide-coverage industrial applications, like smart metering, smart cities and smart grids [16]. Other wireless technologies such as WiFi (IEEE 802.11) and mobile communication technologies (such as 2G, 3G, 4G networks) can provide longer coverage range, but they often require high energy consumption at handsets, whereas most sensor nodes have limited energy (i.e., they are supplied by batteries). Therefore, WiFi and other mobile communication technologies may not be feasible in IWSN due to the high energy consumption.

Recently, Low Power Wide Area Networks (LPWAN) have emerged as a solution to the wide coverage demand while saving energy. Typical LPWAN technologies include Sigfox, LoRa and Narrowband IoT (NB-IoT) [17]. LPWAN has lower power consumption than WiFi and mobile communication technologies. Take NB-IoT as an example. It is shown in [16] that an NB-IoT node can have a ten-year battery life. Moreover, LPWAN has a longer communication range than RFID, Bluetooth and 6LoWPAN. In particular, LPWAN technologies have a communication range from 1 km to 10 km. Furthermore, they can also support a large number of concurrent connections (e.g., NB-IoT can support 52,547 connections as shown in [16]). However, one of the limitations of LPWAN technologies is the low data rate (e.g., NB-IoT can only support a data rate of up to 250 kbps). Therefore, LPWAN technologies should be complemented by conventional RFID, 6LoWPAN and other wireless technologies so that the various data acquisition requirements can be supported.
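To make the data-rate limitation concrete, the following minimal sketch estimates the ideal transfer time of a payload at the peak NB-IoT rate quoted above. The payload sizes and the function name are hypothetical, chosen only for illustration, and the estimate ignores protocol overhead, retransmissions and duty-cycle limits.

```python
def transfer_seconds(payload_bytes: int, rate_bps: float) -> float:
    """Ideal time to send a payload at a given link rate (no protocol overhead)."""
    if rate_bps <= 0:
        raise ValueError("rate_bps must be positive")
    return payload_bytes * 8 / rate_bps


# Peak NB-IoT data rate quoted in the text: 250 kbps.
NB_IOT_PEAK_BPS = 250_000

# Hypothetical payload sizes (assumptions, not taken from the survey).
SENSOR_SAMPLE_BYTES = 64          # e.g. one temperature/humidity reading
CAMERA_FRAME_BYTES = 1_000_000    # e.g. one compressed inspection image

print(transfer_seconds(SENSOR_SAMPLE_BYTES, NB_IOT_PEAK_BPS))
print(transfer_seconds(CAMERA_FRAME_BYTES, NB_IOT_PEAK_BPS))
```

Under these assumptions a small sensor sample takes about 2 ms, whereas a 1 MB image takes 32 s, which suggests why LPWAN suits sparse sensor readings and why bulkier data calls for complementary technologies.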

III-B Data preprocessing and storage

III-B1 Data preprocessing

Data acquired from MIoT has the following characteristics:

  • Heterogeneous data types. The whole manufacturing chain generates various data types including sensory data, RFID readings, product records, text, logs, audio, video, etc. The data takes structured, semi-structured and unstructured forms.

  • Erroneous and noisy data. The data obtained from industrial environment is often erroneous and noisy mainly due to the following reasons: (a) interference during the process of data collection especially in industrial environment, (b) the failure and malfunction of sensors or machinery, (c) intermittent loss or outage of wireless or wired communications [18]. For example, wireless communications are often susceptible to harsh industrial environmental factors like blockage, shadowing and fading effects. Moreover, data transmission may fail in industrial WSNs due to the depletion of batteries of sensors or machinery.

  • Data redundancy. Data generated in MIoT often contains excessively redundant information. For instance, it is shown in [19] that there are excessive duplicated RFID readings when multiple RFID tags are scanned by several RFID readers at different time slots. The data redundancy often results in data inconsistency.

Fig. 3: Data preprocessing techniques

Data preprocessing approaches on MIoT data include data cleaning, data integration and data compression, as shown in Figure 3. In industrial environments, sensory data is usually uncertain and erroneous due to the depletion of battery power of sensors, imprecise measurement of sensors and communication failures. Several approaches have been proposed to address these issues. For example, [20] proposed the RFID-Cuboids approach to remove redundant readings and eliminate missing values. Moreover, an Indoor RFID Multi-variate Hidden Markov Model (IR-MHMM) was proposed in [21] to identify uncertain data and remove duplicated RFID readings. Furthermore, a machine-learning based method was proposed to filter out invalid RFID readings [22]. In addition, the study of [23] proposed an auto-correlation based scheme to remove duplicated time-series temperature data. In [24], a novel data cleaning mechanism was proposed to clean erroneous data in environmental sensing applications. Besides duplicated readings, there also exist missing values in MIoT data. In [25], an interpolation method was proposed to recover the missing values of smart grid data. Moreover, energy saving is a critical issue in data-cleaning algorithms used in MIoT. In [26], an energy-efficient data-cleaning scheme was proposed.
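The two recurring cleaning tasks above, removing duplicated readings and recovering missing values, can be illustrated with a minimal sketch. This is a generic illustration and not the method of any of the cited works [20]-[26]; the function names, the fixed time window and the use of linear interpolation are all assumptions made here.

```python
from typing import Dict, List, Optional, Tuple

Reading = Tuple[str, float]  # (tag_id, timestamp in seconds)


def drop_duplicate_reads(readings: List[Reading], window: float) -> List[Reading]:
    """Keep a read only if the same tag was not already kept within `window` seconds."""
    last_kept: Dict[str, float] = {}
    cleaned: List[Reading] = []
    for tag, ts in sorted(readings, key=lambda r: r[1]):
        if tag not in last_kept or ts - last_kept[tag] >= window:
            cleaned.append((tag, ts))
            last_kept[tag] = ts
    return cleaned


def interpolate_missing(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior None gaps linearly; leading and trailing gaps stay None."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical data: tag "A" is read twice in quick succession by nearby readers.
reads = [("A", 0.0), ("A", 0.4), ("B", 0.5), ("A", 2.0), ("B", 0.9)]
print(drop_duplicate_reads(reads, window=1.0))
print(interpolate_missing([20.0, None, None, 23.0, None]))
```

A fixed window is the simplest possible duplicate rule; the cited approaches additionally exploit spatial, probabilistic or learned models of the readings.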

III-B2 Data storage

Data storage plays an important role in big data analytics for MIoT. We summarize the solutions of data storage in two aspects: 1) storage infrastructure and 2) data management software.

Storage infrastructure consists of a number of interconnected storage devices. Storage devices typically include magnetic Hard Disk Drives, Solid-State Drives, magnetic tapes, USB flash drives, Secure Digital (SD) cards, micro SD cards, Read-Only Memory (ROM), CD-ROMs, DVD-ROMs, etc. These storage devices can be connected together (via wired or wireless connections) to form the storage infrastructure for MIoT in industrial environments.

Besides storage infrastructure, data management software plays an important role in constructing a scalable, effective and reliable storage system to support big data analytics in MIoT. As shown in Figure 1, the data management software consists of the following layered components:

  • Distributed file systems. Google File System (GFS) was proposed and developed by Google [27] to support large data-intensive distributed applications such as search engines. Moreover, Hadoop Distributed File System (HDFS) was proposed by Apache [28] as an alternative to GFS. In addition, there are other distributed file systems, such as Cosmos proposed by Microsoft [29], XtreemFS [30] and Haystack proposed by Facebook [31]. Most of them partially or fully support the storage of large-scale data sets and can therefore support the large-scale storage of MIoT data.

  • Database management systems (DBMS). DBMS offers a solution to organize the data in an efficient and effective manner. DBMS software tools can be roughly categorized into two types: traditional relational DBMS (aka SQL databases) and non-relational DBMS (aka NoSQL databases). SQL databases have been a primary data management approach, especially useful for Material Requirements Planning (MRP), Supply Chain Management (SCM) and Enterprise Resource Planning (ERP) in the whole manufacturing chain. Typical SQL databases include commercial databases, such as Oracle, Microsoft SQL Server and IBM DB2, and open-source alternatives, such as MySQL, PostgreSQL and SQLite. SQL databases usually store data in tables of records (or rows). This storage method nevertheless scales poorly. For example, when the data grows, it becomes necessary to distribute the load among multiple servers. One of the benefits of SQL databases is that most of them can guarantee the ACID (Atomicity, Consistency, Isolation, Durability) properties of database transactions, which are crucial to many commercial applications (e.g., ERP and inventory management). Different from SQL databases, NoSQL databases support various types of data, such as records, text and binary objects. Compared with traditional relational databases, most NoSQL databases are highly scalable and can support tremendous amounts of data. Therefore, NoSQL databases are promising for managing sensory data, device data and RFID trajectory data in MIoT [4].

  • Distributed computing models. A number of distributed computing models have been proposed for big data analytics. For example, Google MapReduce [32] is one of the typical programming models used for processing large data sets. Hadoop MapReduce [33] is the open-source implementation of Google MapReduce. However, MapReduce lacks support for iterations and recursions, which are required by many data analytics applications, such as data mining, graph analysis and social network analysis. There are some extensions to MapReduce that address this concern, including HaLoop [34], Berkeley Orders of Magnitude (BOOM) Analysis [35], Twister [36], iHadoop [37] and iMapReduce [38]. In addition to MapReduce, there are other alternatives such as Dryad [39], the Nephele/PACTs system [40], Spark [41], Pregel [42], Hive [43] and GraphLab [44].

  • Virtual machines and containers. Virtual machines (VMs) have been widely used to support cloud computing. Through virtualization, multiple VMs can be emulated on a single computer system. VMs help to isolate multiple virtual operating systems, on top of which multiple applications can be supported. Different from VMs, containers run on top of a single operating system and a single set of hardware, while separating the applications as well as the underlying binary and library files. Therefore, containers achieve lightweight virtualization, consequently resulting in fast booting, small size and low resource consumption (compared with VMs). The lightweight features of containers make them feasible for edge computing scenarios (to be illustrated in Section III-D).
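The MapReduce programming model mentioned among the distributed computing models above can be sketched in a few lines. The example below is a single-process imitation of the map, shuffle and reduce phases and not the Hadoop API; the record layout (machine identifier, temperature) and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, float]  # (machine_id, temperature), a hypothetical layout


def map_phase(record: Record) -> Iterator[Tuple[str, float]]:
    """Emit one (key, value) pair per input record."""
    machine_id, temperature = record
    yield machine_id, temperature


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group all values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[float]) -> Tuple[str, float]:
    """Collapse the values of one key into a single result (here, the maximum)."""
    return key, max(values)


def run_job(records: Iterable[Record]) -> Dict[str, float]:
    mapped = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


readings = [("m1", 71.5), ("m2", 64.0), ("m1", 80.2), ("m2", 66.3)]
print(run_job(readings))
```

The job makes exactly one pass over the input. An iterative algorithm would have to chain many such jobs, which is the limitation that the extensions listed above aim to address.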

III-C Data analytics

III-C1 Typical data analytics approaches

Typical data analytics approaches include: 1) Statistical modeling schemes, 2) Data mining schemes, 3) Machine learning schemes and 4) Data visualization.

Statistical modeling methods are mainly based on statistical theory. There are three types of statistical methods: (i) descriptive statistics, which is used to quantify relationships in data; (ii) inferential statistics, which is used to deduce generalizations from sample data sets [46]; (iii) stochastic modeling methods, which can capture the dynamic features of data traffic, predict user mobility and track objects [47, 48].
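As a small illustration of how descriptive statistics quantifies a relationship in data, the sketch below computes the Pearson correlation coefficient between two series. The choice of Pearson correlation and the sample values (spindle temperature and vibration amplitude) are assumptions made for illustration only.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples: spindle temperature (deg C) and vibration amplitude (mm/s).
temperature = [61.0, 63.5, 66.0, 70.5, 74.0]
vibration = [1.1, 1.3, 1.4, 1.9, 2.2]
print(round(pearson(temperature, vibration), 3))
```

A value close to 1 would indicate that the two quantities rise together, which is a descriptive finding only and does not by itself establish a cause.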

Data mining is the process of extracting useful information from massive data sets. There are a wide variety of data mining algorithms that can be used in MIoT such as Apriori algorithm, Frequent Pattern Growth (FP-Growth) algorithm, Density-based spatial clustering of applications with noise (DBSCAN), Generalized Sequential Pattern (GSP), Sequential Pattern Discovery Using Equivalent Class (SPADE) and Prefix-Projected Sequential Pattern Mining (PrefixSpan) [49].
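To give a flavour of the frequent-pattern algorithms listed above, the following is a minimal sketch of the Apriori idea: frequent itemsets of size k are built only from frequent itemsets of size k-1. It is a simplified, unoptimized illustration, and the event names in the toy transactions are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset contained in at least `min_support` transactions."""

    def count(candidates: Set[FrozenSet[str]]) -> Dict[FrozenSet[str], int]:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {item for t in transactions for item in t}
    frequent = count({frozenset([i]) for i in items})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


# Hypothetical event logs, one set of observed events per production batch.
logs = [
    {"overheat", "vibration", "stop"},
    {"overheat", "vibration"},
    {"vibration", "stop"},
    {"overheat", "vibration", "stop"},
]
for itemset, support in apriori(logs, min_support=3).items():
    print(sorted(itemset), support)
```

Each level requires another scan of all transactions, which is one reason why multi-pass schemes of this kind become costly on very large data sets.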

Machine learning aims to construct self-adaptive algorithms that can learn from existing data and perform predictive analysis. As one of the typical applications of machine learning, data mining emphasizes extracting valuable information from data. Typical machine learning algorithms include support vector machines (SVMs), naive Bayes, decision tree learning [52], k-Nearest Neighbors (k-NN) [53], hidden Markov models, Bayesian networks, neural networks [55], ensemble methods [56], k-means [57], singular value decomposition (SVD), Principal Component Analysis (PCA) and reinforcement learning algorithms such as Q-learning.

Fig. 4: Data analytics

III-C2 Taxonomy of data analytics approaches in MIoT

We next present an overview of data analytics in MIoT from the perspective of MIoT applications. In particular, data analytics methods in MIoT can be roughly categorized into: 1) Descriptive analytics, 2) Diagnostic analytics, 3) Predictive analytics, 4) Prescriptive analytics. This classification better represents data analytics in MIoT applications at different levels of complexity and extracted value. Figure 4 depicts the different levels of data analytics methods in MIoT applications. Both descriptive and diagnostic analytics methods are reactive, while predictive and prescriptive analytics approaches are proactive. Moreover, predictive and prescriptive analytics approaches are more complicated than descriptive and diagnostic analytics methods, though they can bring more value. We then present an overview of existing studies at the four levels of data analytics.

(1) Descriptive analytics

Descriptive analytics is an exploratory analysis of historical data to tell what happened. During this stage, most data mining and statistical methods can be used to reveal data characteristics, recognize patterns and identify relationships among data objects. Descriptive analytics can be used in the whole life cycle of manufacturing data. In particular, a real-time monitoring system was proposed in [59] to track different manufacturing resources. Zhong et al. [60] proposed the RFID-Cuboid framework to integrate production logistic data with RFID data and offered a system prototype to visualize logistic trajectory data. Moreover, the study of [61] presented a cloud-based approach to evaluate the energy consumption during the product manufacturing process. In addition, an air-quality monitoring system based on wireless sensor networks at a logistics shipping base was proposed in [62].

(2) Diagnostic analytics

Diagnostic analytics takes a deeper look at data to understand the causes of events and behaviours. Diagnostic analysis of machines and other equipment can help to identify possible faults and predict failures, thereby reducing machine downtime. For example, a method integrating SVM and an artificial neural network (ANN) was presented to detect and diagnose machinery faults of centrifugal pumps [63]. The study of [64] proposed fault detection methods for propeller ventilation of vessels based on the Kalman filter. Wuest et al. [65] put forth a supervised machine learning method to monitor product quality. Compared with supervised machine learning methods, unsupervised learning methods require less feature engineering effort, consequently saving time and labor. In [66], a two-stage unsupervised learning method was proposed to conduct diagnostic analysis of machine faults. In addition to fault diagnosis, anomaly detection (or outlier detection) identifies data objects that do not comply with an expected pattern. In [25], a deep learning based method was proposed to detect electricity theft via anomaly detection of electricity consumption data in smart grids.
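Anomaly detection does not always require a learned model. As a simple statistical baseline (not the deep learning method of the cited study), the following Python sketch flags readings whose modified z-score, computed from the median and the median absolute deviation (MAD), exceeds a threshold. The function name, the threshold of 3.5 (a common rule of thumb) and the sensor readings are assumptions made for illustration.

```python
import statistics


def mad_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds threshold."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # All readings (or at least half of them) are identical;
        # the modified z-score is undefined, so nothing is flagged.
        return []
    # 0.6745 rescales the MAD so that it is comparable to a standard
    # deviation for normally distributed data.
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - median) / mad) > threshold
    ]


# Hypothetical power readings (kW) with one suspicious spike.
READINGS = [10.1, 9.8, 10.0, 10.3, 9.9, 25.0, 10.2]
anomalies = mad_anomalies(READINGS)
```

The median-based score is used here instead of the mean and standard deviation because a single large outlier in a short series inflates the standard deviation and can mask itself.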

(3) Predictive analytics

Predictive analytics mainly utilizes historical data to anticipate the trends of data (i.e., what will occur in the future). In [67], a random forests (RFs) based method was proposed to predict tool (machine) wear in the manufacturing cycle. It is also shown in [67] that the RFs method outperforms ANN and SVMs in terms of prediction accuracy. One of the challenges in data analytics of MIoT data is the imbalanced number of negative and positive samples [4]. The study of [68] proposed a cost-sensitive decision tree ensemble algorithm to address this issue. Extensive experimental results show that the proposed method outperforms other existing baseline methods. Moreover, in [69], a deep-learning based method was proposed to predict product surface defects. In addition, consumer behaviour prediction plays an important role in the manufacturing business stage, e.g., to improve predictions of consumers’ purchase decisions. In [70], a Bayesian network based approach was proposed to predict customer purchase behaviour. In particular, the analysis is based on massive RFID data, which was collected through RFID tags attached to customers.
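The idea of cost sensitivity can be illustrated independently of any particular classifier. The sketch below is not the ensemble algorithm of [68]; it only shows the standard decision rule that follows from assigning different costs to false positives and false negatives (with zero cost for correct decisions): a sample is labelled positive when its predicted probability reaches cost_fp / (cost_fp + cost_fn). The function names and cost values are hypothetical.

```python
def cost_sensitive_threshold(cost_fp, cost_fn):
    """Probability threshold that minimizes expected misclassification cost.

    Predicting positive costs (1 - p) * cost_fp in expectation and
    predicting negative costs p * cost_fn, so positive is the cheaper
    choice whenever p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def is_defect(prob_defect, cost_fp, cost_fn):
    """Label a sample as defective under the given (assumed) costs."""
    return prob_defect >= cost_sensitive_threshold(cost_fp, cost_fn)


# Assumed costs: missing a defect is 99 times worse than a false alarm.
threshold = cost_sensitive_threshold(cost_fp=1.0, cost_fn=99.0)
flag_weighted = is_defect(0.05, cost_fp=1.0, cost_fn=99.0)
flag_equal = is_defect(0.05, cost_fp=1.0, cost_fn=1.0)
```

Under equal costs the threshold is 0.5, so a sample with a 5% defect probability is ignored; under the assumed 1:99 costs the threshold drops to 0.01 and the same sample is flagged.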

(4) Prescriptive analytics

Prescriptive analytics extends the results of descriptive, diagnostic and predictive analytics to make the right decisions in order to achieve predicted outcomes (i.e., what should we do to achieve the goal?). Prescriptive methods typically include simulation, decision-making, optimization and reinforcement learning algorithms. In particular, in [71], a conceptual design approach was proposed to simulate the configuration and procedural training in a bio-ethanol plant. The study of [72] presents a novel method for manufacturing-network design via intelligent decision-making on selecting suppliers to fulfill the requirements of frugal innovation. In [73], an analytic hierarchy process (AHP) based method was proposed to evaluate manufacturing sustainability performance. Moreover, in [74], a novel method integrating Timed Colored Petri Nets (CTPNs) and reinforcement learning (RL) was proposed to solve the problem of manufacturing scheduling.
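To indicate how an AHP-style decision step works, the following Python sketch derives criterion weights from a pairwise comparison matrix using the row geometric mean, a common approximation of the principal eigenvector. It is not the method of [73]; the function name, the three criteria and the comparison values are hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric mean method.

    pairwise[i][j] states how much more important criterion i is than
    criterion j; the matrix is expected to be reciprocal.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical criteria: energy use, waste, cost.
# Energy is judged twice as important as waste and four times as cost.
PAIRWISE = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(PAIRWISE)
```

Because this assumed matrix is perfectly consistent, the resulting weights are exactly proportional to 4 : 2 : 1. For inconsistent judgements, a full AHP analysis would also check a consistency ratio, which this sketch omits.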





Descriptive: What happened?
    Methods: Association rule mining; Clustering, sequential pattern mining; Querying, statistic reporting; Data visualization
    Applications: Misbehaviour pattern capturing; Product status checking; Benchmark analysis
    References: [59] [60] [61] [62]

Diagnostic: Why it happened?
    Methods: Bayesian analysis
    Applications: Fault diagnosis; Root-cause analysis of failure; Anomaly detection
    References: [63] [64] [65] [66] [25]

Predictive: What might happen in the future?
    Methods: Classification, regression; Machine learning (supervised; Deep learning
    Applications: Trajectory prediction; Consumer behaviour prediction; Device maintenance prediction
    References: [67] [68] [69] [70]

Prescriptive: What should be done?
    Methods: Reinforcement learning (e.g., Q-Learning); Decision making: e.g., Analytic Hierarchy Process (AHP), The Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS)
    Applications: System resilience; System reliability; System optimization
    References: [71] [72] [73] [74]

TABLE II: Classification of data analytics approaches in MIoT

Table II summarizes data analytics methods used for MIoT. We categorize them into four types according to their levels of complexity and extracted value, and we enumerate representative methods and application cases in each category.

III-C3 Data visualization in MIoT

In addition to the aforementioned data analytics, data visualization is also an important tool for MIoT data. An effective data visualization procedure can help to extract and interpret informative values from complex and high-dimensional MIoT data [75]. Typical data visualization methods include information visualization, exploratory data analysis and statistical plots. The typical quantitative messages conveyed by data visualization include time series, ranking, frequency distribution, deviation, correlation, part-to-whole and geographic relationships [76]. The basic data visualization techniques include: 1) various statistical plots (e.g., bar charts, histograms, pie diagrams, scatter plots), 2) word clouds of text data, 3) correlation coefficient matrices/functions, 4) network/graph diagrams of non-structural data and 5) heat maps of geographic data.

Typical data visualization toolboxes include Matlab plot, gnuplot, Python’s Seaborn, Pandas plot and Matplotlib. Moreover, web-based visualization tools have also been widely used. Representative web-based data visualization tools include Tableau, Plotly, Sisense and D3.js.

III-D Case studies

To demonstrate the feasibility of distributed computing models in MIoT, we developed a system prototype. Figure 5(a) shows that the system framework consists of a production line, industrial devices and computing units. In particular, the production line consists of various manufacturing devices, instruments, sensors, actuators and robot arms, all of which are connected through wired or wireless links, consequently forming the MIoT. In addition to the production line and industrial devices, there are a number of computing units supporting diverse data processing tasks. For example, edge computing servers equipped with embedded computers are deployed in proximity to the MIoT. Moreover, computing-intensive tasks may be uploaded to remote cloud servers while latency-sensitive tasks may be processed at edge servers.

(a) System Prototype
(b) Realistic deployment of system prototype
Fig. 5: Case study for distributed computing models for MIoT

From the computing perspective, we develop a distributed computing platform that orchestrates remote cloud computing and local edge computing. In particular, we deploy the Xen hypervisor at remote cloud servers and Docker containers at edge servers. On top of the virtual machines, we further utilize Hadoop distributed computing platforms to support big data processing tasks. In order to coordinate the edge and cloud computing tasks, we design and implement a hybrid edge/cloud computing framework (details can be found in [77]).

Figure 5(b) gives the realistic prototype of a printed circuit board (PCB) production line based on our proposed system framework. This production line consists of conveyor belts, product feeding machines, robot arms, sensors and cameras. We choose industrial WLANs as the wired connections and 6LoWPAN as the wireless connections. In addition, we adopt 4 edge servers, each of which has the identical configurations: a single-board computer with a quad-core Broadcom BCM2837 CPU, 1GB memory and 64GB SSD storage. Furthermore, there is a remote cloud server (i.e., IBM X3650 M3) with 2 Intel Xeon Processors, 24 GB memory and 1TB SSD storage.

We then evaluate the performance of the proposed hybrid edge/cloud computing framework on top of the prototype. In particular, we consider a pure cloud computing framework and a pure edge computing framework as baseline models. Moreover, image recognition tasks with varied image size were chosen to be executed at edge and cloud servers. We further adopt OpenCV frameworks on both edge and cloud servers to support the image recognition tasks.

Table III shows the latency values of the three computing frameworks versus varied image sizes. In particular, the latency is calculated by averaging the results over 100 images, each with the same image size (e.g., 10 MB). Table III shows that the average latency of the pure cloud and pure edge schemes increases with the image size; this effect may be due to the increased computational complexity of image recognition algorithms with larger images. We also observe from Table III that the proposed hybrid cloud and edge scheme outperforms the pure cloud computing scheme and the pure edge computing scheme for larger image sizes (e.g., 16 MB, 18 MB and 20 MB). This can be explained as follows: 1) pure cloud computing has strength in processing large images but suffers from long end-to-end latency; 2) the pure edge computing scheme can complete computing tasks with smaller image sizes (e.g., 12 MB) and achieve short end-to-end latency due to its deployment proximity; 3) the hybrid edge/cloud computing scheme can not only exploit the strength of cloud computing to process complicated tasks but also harness the benefit of edge computing in short latency, consequently obtaining better performance in the cases with larger image sizes.

Image size                       10 MB   12 MB   14 MB   16 MB   18 MB   20 MB
Cloud Computing Only (second)     1.20    1.48    1.67    1.82    2.08    2.45
Edge Computing Only (second)      0.61    0.86    0.97    1.15    1.26    1.43
Hybrid Cloud and Edge (second)    0.75    0.93    0.98    0.86    0.97    0.96
TABLE III: Performance evaluation
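The comparison in Table III can be checked mechanically. The short Python sketch below copies the latency values from the table and reports, for each image size, which scheme has the lowest average latency; the variable names are our own and the script adds no new measurements.

```python
IMAGE_SIZES_MB = [10, 12, 14, 16, 18, 20]

# Average latency in seconds, copied from Table III.
LATENCY_S = {
    "cloud": [1.20, 1.48, 1.67, 1.82, 2.08, 2.45],
    "edge": [0.61, 0.86, 0.97, 1.15, 1.26, 1.43],
    "hybrid": [0.75, 0.93, 0.98, 0.86, 0.97, 0.96],
}

# Scheme with the lowest latency for each image size.
best_scheme = {
    size: min(LATENCY_S, key=lambda scheme: LATENCY_S[scheme][i])
    for i, size in enumerate(IMAGE_SIZES_MB)
}

for size in IMAGE_SIZES_MB:
    print(f"{size} MB: {best_scheme[size]}")
```

The pure edge scheme is fastest up to 14 MB (by a margin of only 0.01 s at 14 MB), and the hybrid scheme is fastest from 16 MB onwards.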

IV Future research directions

In this section, we discuss open issues as well as future directions in big data analytics for MIoT. Figure 6 summarizes the future directions in big data analytics in MIoT.

IV-A Security and Privacy Concerns

Privacy and security are emerging challenges of big data analytics for MIoT. Privacy concerns the proper utilization of data while preserving private enterprise information, whereas security ensures data confidentiality, integrity and availability [78]. We next summarize the research issues related to privacy and security in big data analytics for MIoT.

  • Security assurance in data acquisition. The proliferation of wireless connections in the manufacturing industry makes security assurance during data acquisition challenging, because the open wireless medium is susceptible to malicious attacks such as passive eavesdropping [79]. The typical countermeasure is to apply encryption schemes in wireless networks [80]. However, it may not be feasible to apply cryptography-based techniques in all IoT networks due to the inferior computational capability and the limited battery power of some smart objects such as RFID tags and sensors. Therefore, new protection schemes with low computational complexity and low energy consumption shall be developed for MIoT in the future. Blockchain, featured with security and reliability, can potentially improve the security and reliability of MIoT [81].

  • Privacy preservation and security assurance in data preprocessing and storage. After data acquisition, MIoT data will be preprocessed and stored locally (at servers of factories or other departments) or remotely (at remote cloud servers) [82]. However, the distribution of MIoT data throughout an enterprise consisting of multiple manufacturing sites across different regions often results in vulnerability to various malicious attacks from insiders and outsiders of the enterprise. It is challenging to offer a solution against such attacks. There are several possible directions for solving this issue: 1) proper key management [83], including proper key distribution and key validity periods, 2) authentication mechanisms, including access control of files and data records, and 3) traceability of data access, which makes any data access or modification identifiable so that malicious behaviours can be avoided or revoked.

  • Privacy preservation in data analytics. In order to protect data privacy, the data is often encrypted and stored at a server (or in a cloud). Before data analytics, the data needs to be decrypted. However, the decryption process is often time-consuming, consequently resulting in inefficient data analytics in MIoT. How to design a privacy-preservation scheme that balances efficiency and privacy becomes a challenge [84, 85].
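Traceability of data access, listed above as one possible direction, can be illustrated with a hash-chained audit log in which each entry stores the hash of its predecessor, so that later modification of an entry becomes detectable. The Python sketch below is only an illustrative assumption and not a scheme proposed in the cited works; the function names, field names and log entries are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder hash preceding the first entry


def _digest(body):
    """SHA-256 over a canonical JSON encoding of the entry body."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_record(log, actor, action, record_id):
    """Append an access record that is chained to the previous entry."""
    entry = {
        "actor": actor,
        "action": action,
        "record_id": record_id,
        "prev_hash": log[-1]["hash"] if log else GENESIS_HASH,
    }
    entry["hash"] = _digest(entry)  # computed before "hash" is added
    log.append(entry)
    return entry


def verify(log):
    """Return True if no entry was altered, removed or reordered."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical access events on stored MIoT records.
audit_log = []
append_record(audit_log, "operator-7", "read", "batch-001")
append_record(audit_log, "analyst-2", "read", "batch-001")
append_record(audit_log, "operator-7", "update", "batch-002")
intact_before = verify(audit_log)
```

Such a chain only detects tampering if the most recent hash is kept somewhere an attacker cannot rewrite; otherwise the whole chain could be recomputed.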

Fig. 6: Future directions in big data analytics of MIoT

IV-B Edge Computing for big data analytics in MIoT

The integration of cloud computing with manufacturing brings opportunities to save capital investment in information and communication technologies (ICT) and provides flexible ICT resources to small and medium enterprises [82, 83]. However, cloud computing also has limitations such as high latency, performance bottlenecks, a single point of failure and privacy leakage [86]. Recently, mobile edge computing (or fog computing) has become a new complement to cloud computing by offloading both computational and storage tasks from remote cloud servers to local edge servers [87, 88, 89]. In this manner, computing-intensive and delay-tolerant tasks will be executed at remote cloud servers while delay-critical and less computing-intensive tasks will be offloaded to edge servers. As a result, real-time tasks like sensing, monitoring and controlling can be enabled in proximity to factories and enterprises. The case study in Section III-D also demonstrates the effectiveness of hybrid edge and cloud computing in MIoT.

However, there are many challenges in edge computing for big data analytics in MIoT.

  • Collaboration between cloud and edge servers. There is a diversity of computing resources in manufacturing networks. For example, remote cloud servers usually have superior computing capability compared with local edge servers, while uploading tasks to remote cloud servers incurs a longer delay than uploading them to local edge servers. Therefore, it is necessary to determine how to allocate computational tasks between cloud servers and edge servers. For example, computing-intensive and delay-tolerant tasks should be uploaded to remote cloud servers while less computing-intensive and delay-critical tasks can be executed locally at edge servers. In this sense, edge servers can be deployed within factories and remote clouds can be deployed outside factories (they may even be provided by third parties). To the best of our knowledge, there are few studies investigating collaboration between cloud and edge servers, especially across the whole manufacturing network. In the future, research efforts should be made in allocating and coordinating the various computing resources distributed across cloud and edge servers in manufacturing.

  • Design lightweight data analytics methods for MIoT. Many data analytics tasks that are delay-critical should be executed locally at edge servers (or at manufacturing devices). However, due to the resource limitations of edge servers, conventional data analytics methods might be too complicated to be executed at edge servers. Therefore, the models of the data analytics methods need to be trained at remote cloud servers first and then transferred to local edge servers. However, transmitting a model from the remote cloud servers to the edge servers can result in a huge communication cost. For example, the study of [90] shows that AlexNet (i.e., a typical deep learning method) has a model size of 240MB, which is so large that it can cause extra delay from the cloud server to the edge server. Therefore, it is necessary to design lightweight data analytics schemes which can be deployed locally at edge servers in proximity to users [91].
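Both challenges above come down to simple delay arithmetic. The following Python sketch, under stated assumptions, estimates the time needed to transfer a 240 MB model over a link and picks the execution site with the lowest estimated latency (round-trip time plus upload time plus compute time). The function names, bandwidths, round-trip times and compute rates are hypothetical illustration values, not measurements from the prototype, and the latency model is deliberately simplistic.

```python
def transfer_time_s(size_mb, bandwidth_mbit_s):
    """Seconds needed to move size_mb megabytes over the given link."""
    return size_mb * 8 / bandwidth_mbit_s


def choose_site(task_size_mb, work_gflop, sites):
    """Pick the site with the lowest estimated end-to-end latency.

    sites maps a name to (uplink in Mbit/s, round-trip time in s,
    compute rate in GFLOP/s).
    """
    def latency(name):
        uplink, rtt, gflops = sites[name]
        upload = transfer_time_s(task_size_mb, uplink)
        return rtt + upload + work_gflop / gflops

    return min(sites, key=latency)


# Assumed site characteristics (hypothetical values).
SITES = {
    "edge": (1000.0, 0.002, 10.0),
    "cloud": (100.0, 0.050, 500.0),
}

# A 240 MB model sent over an assumed 100 Mbit/s link.
model_delay_s = transfer_time_s(240, 100.0)

light_task_site = choose_site(task_size_mb=1, work_gflop=1, sites=SITES)
heavy_task_site = choose_site(task_size_mb=1, work_gflop=100, sites=SITES)
```

Under these assumptions the model transfer alone takes 19.2 s, the light task is placed at the edge and the heavy task is placed in the cloud, which matches the qualitative allocation rule described above.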

IV-C New data analytics methods for MIoT data

Although much effort has been devoted to developing data analytics methods for MIoT data, there are still many open research issues in this area.

  • Imbalanced data samples. Different from data analytics in traditional fields (e.g., commercial database systems), manufacturing data often has a highly imbalanced number of positive and negative samples. For example, it is shown in [4] that the ratio of positive samples to negative samples (or vice versa) can be 99,000,000 to 1. It is challenging to apply conventional data analytics methods to analyse such imbalanced datasets. Therefore, new data analytics methods should be developed to solve this issue. To the best of our knowledge, only a few studies such as [68] have addressed this issue.

  • Stream data processing. In MIoT, a tremendous volume of real-time data is generated (e.g., sensory data from industrial wireless sensor networks) [92]. It is impossible to store and process the entire data in the memory of computers. Consequently, conventional methods that require keeping the whole data set in memory cannot work in this scenario. It is therefore challenging to analyse the massive data streams of MIoT, and it is worthwhile to investigate new data analytics approaches to process them.
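The memory constraint of stream processing can be illustrated with a one-pass statistic. The Python sketch below uses Welford's online algorithm to maintain the running mean and sample variance of a sensor stream with constant memory, without storing past readings. It is a generic textbook technique rather than a method from the cited works, and the class name and readings are hypothetical.

```python
class RunningStats:
    """Running mean and sample variance via Welford's online algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Fold one new reading into the statistics in O(1) time and memory."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two readings were seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


# Hypothetical sensor stream; in practice readings arrive one at a time.
stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)
```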

V Conclusion

This paper presents an in-depth survey on big data analytics in manufacturing Internet of Things (MIoT). This paper first presents a life cycle of big data analytics in MIoT and discusses the necessities as well as challenges of big data analytics in MIoT. Then, the enabling technologies of big data analytics in MIoT are summarized according to three phases in the life cycle of big data analytics: data acquisition, data preprocessing and storage, and data analytics. Moreover, this paper also outlines the future directions and discusses the open research issues. We believe big data analytics will play an important role in promoting manufacturing industry to evolve into smart manufacturing in the foreseeable future.

Disclosure statement

The authors declare that they have no potential conflict of interest.


The research of Hong-Ning Dai and Hao Wang is supported by Macao Science and Technology Development Fund under Grant No. 0026/2018/A1, National Natural Science Foundation of China (NSFC) under Grant No. 61672170, NSFC-Guangdong Joint Fund under Grant No. U1401251, and Science and Technology Program of Guangzhou under Grant No. 201807010058. Guangquan Xu’s work is supported by the State Key Development Program of China (No. 2017YFE0111900), National Science Foundation of China (No. 61572355, U1736115). Jiafu Wan’s work is supported by Science and Technology Program of Guangzhou (No. 201802030005), Guangdong Province Key Areas R & D Program (No. 2019B090919002). Muhammad Imran’s work is supported by the Deanship of Scientific Research, King Saud University through research group number RG-1435-051.


  • [1] A. Kusiak, “Smart manufacturing,” International Journal of Production Research, vol. 56, no. 1-2, pp. 508–517, 2018.
  • [2] H.-N. Dai, R. C.-W. Wong, H. Wang, Z. Zheng, and A. V. Vasilakos, “Big data analytics for large scale wireless networks: Challenges and opportunities,” ACM Computing Surveys, 2019.
  • [3] R. Y. Zhong, C. Xu, C. Chen, and G. Q. Huang, “Big data analytics for physical internet-based intelligent manufacturing shop floors,” International Journal of Production Research, vol. 55, no. 9, pp. 2610–2621, 2017.
  • [4] P. Lade, R. Ghosh, and S. Srinivasan, “Manufacturing analytics and industrial internet of things,” IEEE Intelligent Systems, vol. 32, no. 3, pp. 74–79, 2017.
  • [5] F. Tao, Q. Qi, A. Liu, and A. Kusiak, “Data-driven smart manufacturing,” Journal of Manufacturing Systems, 2018.
  • [6] A. Kusiak, “Smart manufacturing must embrace big data.” Nature, vol. 544, no. 7648, p. 23, 2017.
  • [7] F. Tao and Q. Qi, “New it driven service-oriented smart manufacturing: Framework and characteristics,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 1, pp. 81–91, Jan 2019.
  • [8] H. Hu, Y. Wen, T. S. Chua, and X. Li, “Toward scalable systems for big data analytics: A technology tutorial,” IEEE Access, vol. 2, pp. 652–687, 2014.
  • [9] R. Casado and M. Younas, “Emerging trends and technologies in big data processing,” Concurr. Comput. : Pract. Exper., vol. 27, no. 8, pp. 2078–2091, 2015.
  • [10] J. Wang, Y. Ma, L. Zhang, R. X. Gao, and D. Wu, “Deep learning for smart manufacturing: Methods and applications,” Journal of Manufacturing Systems, 2018.
  • [11] K. Leng, L. Jin, W. Shi, and I. Van Nieuwenhuyse, “Research on agricultural products supply chain inspection system based on internet of things,” Cluster Computing, Feb 2018.
  • [12] E. Azoidou, Z. Pang, Y. Liu, D. Lan, G. Bag, and S. Gong, “Battery lifetime modeling and validation of wireless building automation devices in thread,” IEEE Transactions on Industrial Informatics, 2017.
  • [13] J. Guerra, H. Pucha, J. Glider, W. Belluomini, and R. Rangaswami, “Cost effective storage using extent based dynamic tiering,” in Proceedings of the 9th USENIX Conference on File and Storage Technologies (FAST), 2011.
  • [14] Q. Chi, H. Yan, C. Zhang, Z. Pang, and L. D. Xu, “A Reconfigurable Smart Sensor Interface for Industrial WSN in IoT Environment,” IEEE Transactions on Industrial Informatics, vol. 10, no. 2, pp. 1417–1425, 2014.
  • [15] S. Petersen and S. Carlsen, “Wirelesshart versus isa100.11a: The format war hits the factory floor,” IEEE Industrial Electronics Magazine, vol. 5, no. 4, pp. 23–34, Dec 2011.
  • [16] J. Xu, J. Yao, L. Wang, Z. Ming, K. Wu, and L. Chen, “Narrowband internet of things: Evolutions, technologies and open issues,” IEEE Internet of Things Journal, vol. PP, no. 99, pp. 1–13, 2017.
  • [17] K. Mekki, E. Bajic, F. Chaxel, and F. Meyer, “A comparative study of LPWAN technologies for large-scale IoT deployment,” ICT Express, 2018.
  • [18] A. Siddiqa, I. A. T. Hashem, I. Yaqoob, M. Marjani, S. Shamshirband, A. Gani, and F. Nasaruddin, “A survey of big data management: Taxonomy and state-of-the-art,” Journal of Network and Computer Applications, vol. 71, pp. 151–166, 2016.
  • [19] G. Ertek, X. Chi, and A. N. Zhang, “A framework for mining rfid data from schedule-based systems,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 11, pp. 2967–2984, 2017.
  • [20] R. Y. Zhong, G. Q. Huang, S. Lan, Q. Dai, X. Chen, and T. Zhang, “A big data approach for logistics trajectory discovery from rfid-enabled production data,” International Journal of Production Economics, vol. 165, pp. 260–272, 2015.
  • [21] A. I. Baba, H. Lu, T. B. Pedersen, and M. Jaeger, “Cleansing indoor rfid tracking data,” SIGSPATIAL Special, vol. 9, no. 1, pp. 11–18, July 2017.
  • [22] H. Ma, Y. Wang, and K. Wang, “Automatic detection of false positive RFID readings using machine learning algorithms,” Expert Systems with Applications, vol. 91, pp. 442 – 451, 2018.
  • [23] S. Bhandari, N. Bergmann, R. Jurdak, and B. Kusy, “Time series data analysis of wireless sensor network measurements of temperature,” Sensors, vol. 17, no. 6, 2017.
  • [24] S. Tasnim, N. Pissinou, and S. S. Iyengar, “A novel cleaning approach of environmental sensing data streams,” in 2017 14th IEEE Annual Consumer Communications Networking Conference (CCNC), 2017, pp. 632–633.
  • [25] Z. Zheng, Y. Yang, X. Niu, H. N. Dai, and Y. Zhou, “Wide and Deep Convolutional Neural Networks for Electricity-Theft Detection to Secure Smart Grids,” IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1606–1615, 2018.
  • [26] C. Deng, R. Guo, C. Liu, R. Y. Zhong, and X. Xu, “Data cleansing for energy-saving: a case of cyber-physical machine tools health monitoring system,” International Journal of Production Research, vol. 56, no. 1-2, pp. 1000–1015, 2018.
  • [27] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The google file system,” in Proceedings of ACM SOSP, 2003.
  • [28] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The hadoop distributed file system,” in Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010.
  • [29] R. Chaiken, B. Jenkins, P.-A. k. Larson, B. Ramsey, D. Shakib, S. Weaver, and J. Zhou, “Scope: Easy and efficient parallel processing of massive data sets,” Proc. VLDB Endow., vol. 1, no. 2, pp. 1265–1276, Aug. 2008.
  • [30] F. Hupfeld, T. Cortes, B. Kolbeck, J. Stender, E. Focht, M. Hess, J. Malo, J. Marti, and E. Cesario, “The xtreemfs architecture—a case for object-based file systems in grids,” Concurrency and Computation: Practice and Experience, vol. 20, no. 17, pp. 2049–2060, Dec. 2008.
  • [31] D. Beaver, S. Kumar, H. C. Li, J. Sobel, and P. Vajgel, “Finding a needle in haystack: Facebook’s photo storage,” in Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (OSDI), 2010.
  • [32] J. Dean and S. Ghemawat, “Mapreduce: Simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113, Jan. 2008.
  • [33] Apache, “Hadoop mapreduce,” 2014.
  • [34] Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst, “Haloop: Efficient iterative data processing on large clusters,” Proc. VLDB Endow., vol. 3, no. 1-2, Sept. 2010.
  • [35] P. Alvaro, T. Condie, N. Conway, K. Elmeleegy, J. M. Hellerstein, and R. Sears, “Boom analytics: Exploring data-centric, declarative programming for the cloud,” in Proceedings of the 5th European Conference on Computer Systems (EuroSys), 2010.
  • [36] J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.-H. Bae, J. Qiu, and G. Fox, “Twister: A runtime for iterative mapreduce,” in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC ), 2010.
  • [37] E. Elnikety, T. Elsayed, and H. E. Ramadan, “ihadoop: Asynchronous iterations for mapreduce,” in IEEE CloudCom, 2011.
  • [38] Y. Zhang, Q. Gao, L. Gao, and C. Wang, “imapreduce: A distributed computing framework for iterative computation,” Journal of Grid Computing, vol. 10, no. 1, pp. 47–68, 2012.
  • [39] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, “Dryad: Distributed data-parallel programs from sequential building blocks,” SIGOPS Oper. Syst. Rev., vol. 41, no. 3, pp. 59–72, Mar. 2007.
  • [40] D. Battré, S. Ewen, F. Hueske, O. Kao, V. Markl, and D. Warneke, “Nephele/pacts: A programming model and execution framework for web-scale analytical processing (socc),” in Proceedings of the 1st ACM Symposium on Cloud Computing, 2010.
  • [41] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, “Spark: Cluster computing with working sets,” in Proceedings of the 2Nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud), 2010.
  • [42] G. Malewicz et al., “Pregel: A system for large-scale graph processing,” in Proceedings of ACM SIGMOD, 2010.
  • [43] A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Antony, H. Liu, and R. Murthy, “Hive - a petabyte scale data warehouse using hadoop,” in IEEE 26th International Conference on Data Engineering (ICDE), 2010.
  • [44] Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein, “Distributed graphlab: A framework for machine learning and data mining in the cloud,” Proc. VLDB Endow., vol. 5, no. 8, pp. 716–727, Apr. 2012.
  • [45] W. M. Trochim, J. Donnelly, and K. Arora, Research Methods The Essential Knowledge Base, 2nd ed.   Cengage Learning, 2016.
  • [46] P. S. Bandyopadhyay and M. R. Forster, Philosophy of Statistics.   Elsevier., 2011.
  • [47] P. Newson and J. Krumm, “Hidden markov map matching through noise and sparseness,” in Proceedings of ACM SIGSPATIAL, 2009.
  • [48] Y. Liao, H. Panetto, P. C. Stadzisz, and J. M. Simão, “A notification-oriented solution for data-intensive enterprise information systems – A cloud manufacturing case,” Enterprise Information Systems, vol. 12, no. 8-9, pp. 942–959, 2018.
  • [49] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, third edition ed.   Boston, USA: Morgan Kaufmann, 2012.
  • [50] V. N. Vapnik, The Nature of Statistical Learning Theory.   New York, NY, USA: Springer-Verlag New York, Inc., 1995.
  • [51] X. Wu, V. Kumar, J. Ross Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, P. S. Yu, Z.-H. Zhou, M. Steinbach, D. J. Hand, and D. Steinberg, “Top 10 algorithms in data mining,” Knowl. Inf. Syst., vol. 14, no. 1, pp. 1–37, 2008.
  • [52] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach (3rd Edition), 3rd ed.   Prentice Hall, 2009.
  • [53] N. S. Altman, “An introduction to kernel and nearest-neighbor nonparametric regression,” The American Statistician, vol. 46, no. 3, pp. 175–185, 1992.
  • [54] J. Qiu, Q. Wu, G. Ding, Y. Xu, and S. Feng, “A survey of machine learning for big data processing,” EURASIP Journal on Advances in Signal Processing, vol. 2016, no. 1, pp. 1–16, 2016.
  • [55] G. P. Zhang, “Neural networks for classification: a survey,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 30, no. 4, pp. 451–462, 2000.
  • [56] Z.-H. Zhou, Ensemble Methods: Foundations and Algorithms, 1st ed.   Chapman & Hall/CRC, 2012.
  • [57] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, “An efficient k-means clustering algorithm: Analysis and implementation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 881–892, July 2002.
  • [58] I. Jolliffe, Principal Component Analysis.   Springer-Verlag, 2002.
  • [59] Y. Zhang, G. Zhang, J. Wang, S. Sun, S. Si, and T. Yang, “Real-time information capturing and integration framework of the internet of manufacturing things,” International Journal of Computer Integrated Manufacturing, vol. 28, no. 8, pp. 811–822, 2015.
  • [60] R. Y. Zhong, S. Lan, C. Xu, Q. Dai, and G. Q. Huang, “Visualization of rfid-enabled shopfloor logistics big data in cloud manufacturing,” The International Journal of Advanced Manufacturing Technology, vol. 84, no. 1, pp. 5–16, Apr 2016.
  • [61] Y. Zuo, F. Tao, and A. Nee, “An internet of things and cloud-based approach for energy consumption evaluation and analysis for a product,” International Journal of Computer Integrated Manufacturing, vol. 31, no. 4-5, pp. 337–348, 2018.
  • [62] J. Molka-Danielsen, P. Engelseth, and H. Wang, “Large scale integration of wireless sensor network technologies for air quality monitoring at a logistics shipping base,” Journal of Industrial Information Integration, vol. 10, pp. 20–28, 2018.
  • [63] A. Azadeh, M. Saberi, A. Kazem, V. Ebrahimipour, A. Nourmohammadzadeh, and Z. Saberi, “A flexible algorithm for fault diagnosis in a centrifugal pump with corrupted data and noise based on ann and support vector machine with hyper-parameters optimization,” Applied Soft Computing, vol. 13, no. 3, pp. 1478 – 1485, 2013.
  • [64] H. Wang, S. Fossen, F. Han, I. A. Hameed, and G. Li, “Towards data-driven identification and analysis of propeller ventilation,” in OCEANS, 2016, pp. 1–6.
  • [65] T. Wuest, C. Irgens, and K.-D. Thoben, “An approach to monitoring quality in manufacturing using supervised machine learning on product state data,” Journal of Intelligent Manufacturing, vol. 25, no. 5, pp. 1167–1180, Oct 2014.
  • [66] Y. Lei, F. Jia, J. Lin, S. Xing, and S. X. Ding, “An intelligent fault diagnosis method using unsupervised feature learning towards mechanical big data,” IEEE Transactions on Industrial Electronics, vol. 63, no. 5, pp. 3137–3147, May 2016.
  • [67] D. Wu, C. Jennings, J. Terpenny, R. X. Gao, and S. Kumara, “A comparative study on machine learning algorithms for smart manufacturing: tool wear prediction using random forests,” Journal of Manufacturing Science and Engineering, vol. 139, no. 7, p. 071018, 2017.
  • [68] A. Kim, K. Oh, J.-Y. Jung, and B. Kim, “Imbalanced classification of manufacturing quality conditions using cost-sensitive decision tree ensembles,” International Journal of Computer Integrated Manufacturing, pp. 1–17, 2017.
  • [69] R. Ren, T. Hung, and K. C. Tan, “A generic deep-learning-based approach for automated surface inspection,” IEEE Transactions on Cybernetics, vol. 48, no. 3, pp. 929–940, March 2018.
  • [70] Y. Zuo, “Prediction of consumer purchase behaviour using Bayesian network: An operational improvement and new results based on RFID data,” Int. J. Knowl. Eng. Soft Data Paradigm., vol. 5, no. 2, pp. 85–105, Apr. 2016.
  • [71] I. Gerlach, V. C. Hass, and C.-F. Mandenius, “Conceptual design of an operator training simulator for a bio-ethanol plant,” Processes, vol. 3, no. 3, pp. 664–683, 2015.
  • [72] D. Mourtzis, E. Vlachou, N. Boli, L. Gravias, and C. Giannoulis, “Manufacturing networks design through smart decision making towards frugal innovation,” Procedia CIRP, vol. 50, pp. 354–359, 2016, 26th CIRP Design Conference.
  • [73] A. Kluczek, “Application of multi-criteria approach for sustainability assessment of manufacturing processes,” Management and Production Engineering Review, vol. 7, no. 3, pp. 62–78, 2016.
  • [74] M. Drakaki and P. Tzionas, “Manufacturing scheduling using colored Petri nets and reinforcement learning,” Applied Sciences, vol. 7, no. 2, 2017.
  • [75] A. C. Telea, Data visualization: principles and practice.   CRC Press, 2014.
  • [76] F. H. Post, G. Nielson, and G.-P. Bonneau, Data Visualization - the State of the Art.   Springer-Verlag, 2003.
  • [77] X. Li, J. Wan, H.-N. Dai, M. Imran, M. Xia, and A. Celesti, “A hybrid computing solution and resource scheduling strategy for edge computing in smart manufacturing,” IEEE Transactions on Industrial Informatics, vol. (early access), pp. 1–9, 2019.
  • [78] X. Wang, L. T. Yang, H. Liu, and M. J. Deen, “A big data-as-a-service framework: State-of-the-art and perspectives,” IEEE Transactions on Big Data, vol. 4, no. 3, pp. 325–340, Sep. 2018.
  • [79] X. Li, Q. Wang, H.-N. Dai, and H. Wang, “A novel friendly jamming scheme in industrial crowdsensing networks against eavesdropping attack,” Sensors, vol. 18, no. 6, 2018.
  • [80] C. Hennebert and J. D. Santos, “Security Protocols and Privacy Issues into 6LoWPAN Stack: A Synthesis,” IEEE Internet of Things Journal, vol. 1, no. 5, pp. 384–398, 2014.
  • [81] H.-N. Dai, Z. Zheng, and Y. Zhang, “Blockchain for internet of things: A survey,” IEEE Internet of Things Journal, 2019.
  • [82] P. Wang, R. X. Gao, and Z. Fan, “Cloud computing for cloud manufacturing: benefits and limitations,” Journal of Manufacturing Science and Engineering, vol. 137, no. 4, pp. 1–9, 2015.
  • [83] C. Esposito, A. Castiglione, B. Martini, and K. K. R. Choo, “Cloud manufacturing: Security, privacy, and forensic concerns,” IEEE Cloud Computing, vol. 3, no. 4, pp. 16–22, July 2016.
  • [84] N. Wang, X. Xiao, Y. Yang, T. D. Hoang, H. Shin, J. Shin, and G. Yu, “Privtrie: Effective frequent term discovery under local differential privacy,” in IEEE International Conference on Data Engineering (ICDE), 2018.
  • [85] M. Babar, F. Arif, M. A. Jan, Z. Tan, and F. Khan, “Urban data management system: Towards big data analytics for internet of things based smart urban environment using customized Hadoop,” Future Generation Computer Systems, vol. 96, pp. 398–409, 2019.
  • [86] H. Liu, F. Eldarrat, H. Alqahtani, A. Reznik, X. de Foy, and Y. Zhang, “Mobile edge cloud system: Architectures, challenges, and approaches,” IEEE Systems Journal, vol. PP, no. 99, pp. 1–14, 2017.
  • [87] T. X. Tran, A. Hajisami, P. Pandey, and D. Pompili, “Collaborative mobile edge computing in 5G networks: New paradigms, scenarios, and challenges,” IEEE Communications Magazine, vol. 55, no. 4, pp. 54–61, April 2017.
  • [88] D. Wu, S. Liu, L. Zhang, J. Terpenny, R. X. Gao, T. Kurfess, and J. A. Guzzo, “A fog computing-based framework for process monitoring and prognosis in cyber-manufacturing,” Journal of Manufacturing Systems, vol. 43, pp. 25–34, 2017.
  • [89] X. Wang, L. T. Yang, X. Xie, J. Jin, and M. J. Deen, “A cloud-edge computing framework for cyber-physical-social services,” IEEE Communications Magazine, vol. 55, no. 11, pp. 80–85, Nov 2017.
  • [90] Y. Lin, S. Han, H. Mao, Y. Wang, and W. J. Dally, “Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training,” in International Conference on Learning Representations (ICLR), 2018.
  • [91] C. Leng, H. Li, S. Zhu, and R. Jin, “Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM,” in AAAI, 2018.
  • [92] X. Wang, W. Wang, L. T. Yang, S. Liao, D. Yin, and M. J. Deen, “A Distributed HOSVD Method With Its Incremental Computation for Big Data in Cyber-Physical-Social Systems,” IEEE Transactions on Computational Social Systems, vol. 5, no. 2, pp. 481–492, June 2018.