Merging Subject Matter Expertise and Deep Convolutional Neural Network for State-Based Online Machine-Part Interaction Classification

by Hao Wang, et al.
University of Michigan

Machine-part interaction classification is a key capability required by Cyber-Physical Systems (CPS), a pivotal enabler of Smart Manufacturing (SM). While previous relevant studies on the subject have primarily focused on time series classification, change point detection is equally important because it provides temporal information on changes in behavior of the machine. In this work, we address change point detection and time series classification for machine-part interactions with a deep Convolutional Neural Network (CNN) based framework. The CNN in this framework utilizes a two-stage encoder-classifier structure for efficient feature representation and convenient deployment customization for CPS. Though data-driven, the design and optimization of the framework are Subject Matter Expertise (SME) guided. An SME-defined Finite State Machine (FSM) is incorporated into the framework to prohibit intermittent misclassifications. In the case study, we implement the framework to perform machine-part interaction classification on a milling machine, and the performance is evaluated using a testing dataset and deployment simulations. The implementation achieved an average F1-Score of 0.946 across classes on the testing dataset and an average delay of 0.24 seconds on the deployment simulations.





Cyber-Physical Systems (CPS), defined as the integration of physical assets with computing and communication technologies [14], are becoming omnipresent in Smart Manufacturing (SM). One notable characteristic of CPS is that the physical asset is monitored, classified, controlled, and coordinated by its cyber counterpart to achieve more efficient performance [15], and hence monitoring and classification are fundamental requirements of CPS. One important capability is machine-part interaction classification, which can be defined as the continuous characterization of the machine’s effects on the work-piece in real-time [19]. Machine-part interaction classification is of special interest, because it provides information to address a number of manufacturing needs such as machine fault detection [19], [7] and root cause analysis [18]. The outputs of a machine-part interaction classification system are referred to as the interactive states of the machine, which are used to characterize the interactions between the machine and the work-piece.

In principle, machine-part interaction classification involves the identification of interactive states and the transitions between interactive states in real-time. The general concept of machine-part interaction classification has been used to formulate problems in other fields such as construction [17], [20], [16] and health care [5], [12]. While researchers have developed and tailored solutions to specific problems, the vast majority of solutions share similar detection principles and classification algorithms. Furthermore, one can observe a common trend that deep learning techniques are becoming more widely used [7], [16], [5]. Although deep learning models have achieved promising results in the manufacturing sector, they should be used with caution and proper constraints. The task of machine-part interaction classification can be conveniently formulated as a sequence of classification problems, which are well suited for deep learning models. However, classifications at different time instances are not independent of each other. As a result, a stand-alone deep learning model, regardless of its performance, is not sufficient to capture the underlying dynamics in the data. Furthermore, without proper error checking mechanisms, a deep learning model can be prone to misclassifications due to noise in the data. To construct a robust and reliable machine-part interaction classification system, we need to incorporate proper error checking mechanisms into the solution.

After reviewing the state of the art in machine-part interaction classification and similar domains, we identify two gaps in the literature. First, a vast majority of solutions are designed to perform time series classification on steady-state behaviors, without addressing transitions between the states. While this approach is valid for a wide range of applications, it is insufficient for classifying machine-part interactions in the manufacturing sector. We argue that the information about when transitions occur is valuable for addressing manufacturing needs such as anomaly detection and root cause analysis, because temporal information of transitions can help partition useful time-series signals based on the machine-part interactions [19]. Other software and modules in the SM eco-system can leverage the partitioned signals to address corresponding manufacturing needs. Second, most existing solutions do not explicitly incorporate Subject Matter Expert (SME) knowledge. Little information about what SME knowledge is included or how to incorporate SME knowledge systematically is provided. This information is critical for adopting the solution to a specific machine or manufacturing system [11].

We address the gaps in the literature by introducing an integrated framework featuring both deep learning techniques and SME knowledge for online state-based machine-part interaction classification. The first contribution of this paper is a framework that is capable of classifying the interactive states of the machine and the part and detecting transitions between these states. Our proposed deep convolutional neural network architecture explicitly exploits local connectivity of time series data in the temporal domain, while maintaining computational efficiency. The second contribution is the systematic incorporation of SME knowledge into a data-driven approach: SME knowledge improves the performance of the proposed classification framework, especially through the introduction of error checking mechanisms, and we describe how it can be integrated systematically into the system.

The remaining sections of the paper are organized as follows. An introduction to neural networks and a review of classification systems are presented in Section II. In Section III, an online machine-part interaction classification framework, featuring both SME and deep learning techniques, is presented. The proposed classification system is implemented for a milling machine, and the system is evaluated in terms of testing performance as well as deployment simulation performance in Section IV. Lastly, Section V provides conclusions and some future directions of this research.


Neural Networks

A Neural Network (NN), in its most generic form, is a stack of layers that consists of computational units called neurons. Unlike traditional machine learning techniques, such as Support Vector Machine and Decision Tree, an NN does not rely on human-engineered features and is capable of extracting highly complex features. Due to recent development in processing architecture like the Graphics Processing Unit, and the ever-increasing availability of data, NN-based models have achieved groundbreaking performance in domains such as computer vision and speech recognition. However, NNs are not preferred for some applications because of their poor interpretability; they are often regarded as black-box models.

Convolutional Neural Network (CNN) is one of the most popular variants of NN. CNN utilizes convolutional layers that perform convolutions on the input. In the context of pattern recognition of time series data, convolution is defined as follows

$$ s(t) = (x * w)(t) = \sum_{a} x(a)\, w(t - a) $$

where $x$ is the input, $w$ is the kernel, and $s$ is called the feature map [4]. For the purpose of this paper, we can understand convolution as the operation of sliding the kernel along the input and performing dot product between the kernel, a learnable filter capturing local features, and the overlapped section of the input [6]. Convolutional layers are designed to capture local connectivities of the inputs, and this property makes CNN one of the most effective models for pattern recognition [9].
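The sliding dot-product view of convolution can be sketched in a few lines of NumPy. The function name `slide_dot` and the toy signal and kernel are illustrative assumptions, not taken from the paper. The sketch does not flip the kernel, so it is the cross-correlation form of the operation, which NumPy exposes as `np.correlate`.

```python
import numpy as np

def slide_dot(x, w):
    """Slide kernel w along input x and take a dot product at each position."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n_out = len(x) - len(w) + 1  # keep only positions where the kernel fully overlaps
    return np.array([np.dot(x[t:t + len(w)], w) for t in range(n_out)])

# Toy input (a rising then falling ramp) and a kernel that responds to slope.
x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])
w = np.array([1.0, 0.0, -1.0])
feature_map = slide_dot(x, w)
print(feature_map)
```

The feature map changes sign where the ramp changes direction, which is the kind of local feature a learned kernel can capture.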

Classification Systems

Classification systems are widely used in a number of domains, such as manufacturing [13], construction [17], and health care [12]. While subjects of monitoring tasks differ from case to case, NN-based approaches have dominated the research landscape in recent years. This is not surprising because most monitoring tasks involve classifying segments of data, and NN-based models have succeeded in many classification tasks.

In [7], Janssens et al. performed bearing condition and fault monitoring and classification using three different machine learning models: random forest classifier, support vector machine, and CNN. The models were tasked to classify the input into eight bearing conditions, and the CNN outperformed other models by at least 6%. The CNN used in the study was composed of one convolutional layer and one fully connected layer. In another study, Slaton et al. used an NN-based model to classify activities of excavators [20]. The model used in this study is adapted from [12], and was originally designed for human activity monitoring. With some modifications, the model achieved an average F1-score of 0.78 in the classification task involving seven excavator activities. Unlike the model in [7], the model in this study utilizes a much deeper architecture featuring four convolutional layers and two recurrent dense layers.

NN-based approaches have achieved great success in classification, including various monitoring related tasks. However, existing solutions cannot be readily adapted to perform machine-part interaction classification, which requires simultaneous interactive state classification and transition detection.

Requirements of a Machine-Part Interaction Classification System

The first requirement for a machine-part interaction classification system is being able to classify interactive states and detect transitions between interactive states. Most frameworks, as presented in the background section, are designed to classify steady-state behaviors, which exhibit consistent machine characteristics over a period of time. Such a design works well if the transitions between the states are considered irrelevant to the classification task. Unlike the applications in equipment or human activity classification, transitions between interactive states are important for classifying machine-part interactions, because the vast majority of Computer Numerical Control (CNC) machine operations are precisely predefined in their G-Code and need to be executed exactly. Without pinpointing temporal occurrences of transitions between interactive states, the monitoring system may fail to provide adequate information on how the interactive state of the machine changes over time. While G-Code provides information about actions taken by the machine, interactive states cannot be readily determined from G-Code. The monitoring system needs to infer the interactive state from sensory inputs.

The second requirement is that the delay of the system must not exceed the maximum delay determined by SMEs. Since the proposed classification framework operates in real-time, the delay caused by the system directly impacts how quickly control and coordination actions can take place. Furthermore, an excessive delay can derail the entire classification effort and render the classification system useless.


In this section, we present an online machine-part interaction classification framework. At a high level, the framework takes in a continuous stream of sensory inputs, partitions the inputs, classifies the partitioned inputs to either an interactive state or a transitioning event, and finally determines and outputs the interactive state of the machine. The machine is modeled as a Finite State Machine (FSM), whose state transitions reflect changes of the interactive state of the machine. The framework consists of three components: signal processing unit, CNN classification unit, and decision coordinator. The sliding window technique is used to partition time-series data in real-time. The components of the framework are presented in detail, followed by a brief discussion of how SME knowledge is incorporated into the framework. The framework, along with SME involvement, is illustrated in Fig. 1.


Signal Processing Unit

To classify the machine-part interaction state in real-time, the continuous stream of data needs to be partitioned into segments that can be processed by the classification system. For online applications, the sliding window technique is the preferred option [2]. Two parameters, window size and overlap between windows, are determined by SMEs based on characteristics of the signals and time scale of the machine operations. The window size determines how much context is used in classification, and intuitively, larger windows can make the system more robust to noise. The overlap between windows dictates the resolution of the monitoring system; more overlap leads to better resolution (higher sensitivity for transition detection), but demands more computational resources. The sliding window technique used in this work is illustrated in Fig. 2.
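A minimal sketch of the sliding window partitioning follows, assuming the signal is already available as an array (a live deployment would fill a buffer instead). The function name `sliding_windows` and the toy sizes are hypothetical. The stride between window starts is taken to be the window size minus the overlap.

```python
import numpy as np

def sliding_windows(signal, window_size, overlap):
    """Yield (start_index, window) pairs; consecutive windows share `overlap` points."""
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    stride = window_size - overlap  # how far the window advances each step
    for start in range(0, len(signal) - window_size + 1, stride):
        yield start, signal[start:start + window_size]

# Toy stream of 10 samples, windows of 4 samples overlapping by 2.
signal = np.arange(10)
windows = list(sliding_windows(signal, window_size=4, overlap=2))
starts = [start for start, _ in windows]
print(starts)
```

Increasing `overlap` shortens the stride, so transitions are checked more often at the cost of more windows to classify per second.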


Another important aspect of signal processing is denoising because it can potentially improve the performance of the classification model. There are many denoising techniques available, and the SME is in charge of selecting the appropriate techniques and relevant details, such as coefficients of filters.

Classification Unit

The unit is a neural network composed of two CNNs, and its basic architecture is shown in Fig. 3. The primary motivation for using such a design is that the class scores trajectory, obtained from a CNN trained to recognize steady-state behaviors, can well document transitions between steady states. A sequence of discrete instances of the class scores trajectory not only provides information for classifying steady-state behaviors, but also indicates transitions between states. Hence, the upstream CNN is designed to encode the signal window (segment) at a time instance into class scores, and its downstream counterpart performs classification based on the sequence of class scores. While the classification process involves the entire sequence of class scores, the result only applies to the most recent window. In other words, the classification of the current window depends on how class scores change over a fixed amount of time.

The upstream CNN acts as the encoder. Each window contains a fixed number of data points (the window size), and consecutive windows overlap by a fixed number of data points, as shown in Fig. 2. The input to the upstream CNN is a sequence of windows, so the input is of size (sequence length) × (window size). The upstream CNN produces a class score output for each of the windows, and hence the output is of size (sequence length) × (number of interactive states).

The downstream CNN acts as the classifier, which transforms this sequence of class scores into a single set of class scores with (number of interactive states) + (number of transitioning events) entries. The class score is the end output of the classification unit, and it is then fed to the decision coordinator.


Even though the classification unit does not require human-engineered features as inputs, features are still explicitly generated as an intermediate output and used for subsequent classification, as required by the encoder-classifier architecture. Other architectures such as single-stage CNNs can potentially perform as well as the proposed architecture, but the proposed architecture generates and utilizes highly compact intermediate feature representation. This architecture is easier to train and more efficient in real-time because of the compact intermediate feature representation, especially when dealing with high-dimensional data.
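The data flow through the two stages can be illustrated with a shape-only sketch. The random linear maps below are placeholders standing in for the two trained CNNs and are purely an assumption for illustration. The sizes (8 windows of 400 points, 4 interactive states, 3 transitioning events) are the values reported in the case study.

```python
import numpy as np

rng = np.random.default_rng(0)
SEQ_LEN, WINDOW, N_STATES, N_EVENTS = 8, 400, 4, 3

# Placeholder weights: random linear maps, NOT the CNNs used in the paper.
W_enc = rng.normal(size=(WINDOW, N_STATES))
W_cls = rng.normal(size=(SEQ_LEN * N_STATES, N_STATES + N_EVENTS))

def encode(windows):
    """Upstream stage: one score per interactive state for every window."""
    return windows @ W_enc                      # shape (SEQ_LEN, N_STATES)

def classify(score_sequence):
    """Downstream stage: scores over interactive states and transitioning events."""
    return score_sequence.reshape(-1) @ W_cls   # shape (N_STATES + N_EVENTS,)

windows = rng.normal(size=(SEQ_LEN, WINDOW))    # one input to the classification unit
scores = encode(windows)
final = classify(scores)
predicted_class = int(np.argmax(final))         # class with the highest score
print(windows.size, scores.size, final.shape)
```

With these sizes the intermediate representation holds 32 values instead of the 3200 raw data points, which is the compactness the paragraph above refers to.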

Decision Coordinator

By design, the framework is capable of mapping out the entire state trajectory and detecting transitions. However, without proper error checking mechanisms or restrictions imposed on state transitioning, the system would suffer from robustness issues in deployment. Two major issues have been identified during this study:

  1. The machine jumps from state A to state B without transitions being detected.

  2. The machine transitions from state A to state B with an impermissible transition being detected.

To address both issues, the machine is modeled as an FSM $G = (X, E, f, \Gamma, x_0)$ [3], where $X$ is the state set, $E$ is the event set, $f$ is the transition function, $\Gamma$ is the active event function, and $x_0$ is the initial state. The definition of $G$ is determined by an SME.

Under this modeling formalism, the output from the classification unit formally takes one of two forms: an interactive state $x \in X$ or a transitioning event $e \in E$, depending on which class has the highest class score. By definition, a transition out of state $x$ cannot be triggered unless an event $e \in \Gamma(x)$ is present, and hence the active event function helps resolve issue 1 mentioned above by requiring a transitioning event.

Issue 2 is addressed through the transition function $f$. The formalism requires $f(x, e)$ to be defined in order for the transition to take place. A memory retaining device is utilized to capture $x$ and $e$, and the definition is checked whenever a transitioning event is detected. This error checking mechanism is a primary motivation for requiring the classification unit to classify steady state behaviors in addition to transition events.

At its core, the decision coordinator is an FSM implemented to perform error checking and maintain state information of the machine during operation. If either of the issues occurs in deployment, the decision coordinator will reject the proposed state transition and record the incident. The recorded incidents are added to the training dataset, and the classification unit is retrained on the new training dataset after a period of deployment.
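One way such a coordinator could be organized is sketched below. This is an assumption about the mechanics, not the authors' implementation: the class name, the string labels, and the choice to change state as soon as a valid event arrives are all hypothetical. Only the three transitions observed in the milling dataset are encoded.

```python
class DecisionCoordinator:
    """Minimal FSM that accepts or rejects classifier outputs (illustrative only)."""

    def __init__(self, transitions, initial_state):
        # transitions maps (state, event) -> next state; a missing key means undefined.
        self.transitions = dict(transitions)
        self.events = {event for _, event in self.transitions}
        self.state = initial_state
        self.incidents = []  # rejected proposals, kept for later review and retraining

    def step(self, output):
        """Process one classifier output, which is either a state or an event label."""
        if output in self.events:
            next_state = self.transitions.get((self.state, output))
            if next_state is None:
                # Issue 2: an event that is not permitted in the current state.
                self.incidents.append((self.state, output, "undefined transition"))
            else:
                self.state = next_state
        elif output != self.state:
            # Issue 1: a state change proposed without any transitioning event.
            self.incidents.append((self.state, output, "state change without event"))
        return self.state

transitions = {
    ("no_interaction", "no_int_to_entry"): "entry",
    ("entry", "entry_to_const"): "constant_milling",
    ("constant_milling", "const_to_exit"): "exit",
}
fsm = DecisionCoordinator(transitions, "no_interaction")
stream = ["no_interaction", "constant_milling", "no_int_to_entry",
          "entry", "const_to_exit", "entry_to_const"]
trajectory = [fsm.step(output) for output in stream]
print(trajectory)
print(fsm.incidents)
```

In this toy stream the spurious `constant_milling` output and the out-of-order `const_to_exit` event are both rejected and logged, while the reported state stays unchanged.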

Merging the Data Driven Approach with SME Knowledge

SME knowledge, though not appropriately acknowledged, is often utilized in relevant studies. Design details, such as the architecture of the system and denoising techniques, are determined by SMEs. To facilitate adaptation and further development of the framework, we present a summary of how SME knowledge is incorporated into the framework.

While a deep learning model does not require SME-engineered features, its performance is likely affected by the quality of its input. SME knowledge is critical in selecting and packaging inputs of the monitoring system. It is an SME’s responsibility to select trustworthy and distinctive signals that can reliably capture the dynamics of the machine. Furthermore, SMEs should be in charge of tuning some parameters, such as the window size, sequence length, and overlap, according to the requirements of the system and characteristics of the machine and the selected signals.

The most significant involvement of SMEs in this framework is the error checking mechanism introduced in the previous subsection. The SME-defined FSM can help remedy robustness issues, and it can markedly improve the performance of the classification system. Additionally, violations of the restrictions imposed by the FSM are recorded and reported to SMEs for further analysis and adjustment. The integration of SME knowledge within the framework is summarized in Fig. 1.


In this section, a machine-part interaction classification system is implemented, according to the proposed framework, to monitor a milling machine. The detailed architecture of the system is presented. The system’s performance is evaluated using a testing dataset and five deployment simulations. Lastly, we provide a brief discussion on the performance of the implemented system, and how it compares to the standard method presented in the background section.

Problem Statement

The case study is a simulation study and is carried out using the NASA Milling Dataset [1], which features 167 milling trials of 13 different sets of operation conditions (feed rate, materials, and depth of cut). The monitoring system is tasked with partitioning each trial’s DC spindle current data, which is sampled at 250 Hz and has a length of 9000 data points, into the following four sequential interactive states.

  1. No interaction: The endmill spins freely and makes no contact with the work-piece.

  2. Entry: The endmill starts to make contact with the work-piece.

  3. Constant milling: The endmill makes constant contact with the work-piece.

  4. Exit: The endmill gradually makes less contact with the work-piece.

Based on the progression of interactive states presented in the milling trials, we consider three transitioning events, which trigger the following transitions: no interaction to entry, entry to constant milling, and constant milling to exit. A typical DC spindle current signal, with interactive states labeled and transitions indicated with rectangles, is presented in Fig. 4.


Framework Implementation

The signal processing unit is implemented to partition the incoming stream of the DC current signal into segments of 400 data points with each segment overlapping the preceding and succeeding segment by 25 data points. Eight successive segments are packaged into an input to the classification unit.

The classification unit accepts inputs of size $8 \times 400$. The upstream CNN encodes the input into eight 4-class scores (4 interactive states), of total size $8 \times 4$, which are then fed into the downstream CNN. A single 7-class score (4 interactive states + 3 transitioning events) is produced as the final output of the classification unit.

The upstream CNN is pretrained with 2936 samples, of length 400 data points, extracted from well-defined steady-state regions. Since the upstream CNN is significantly larger than its downstream counterpart, pretraining provides stability and faster convergence in training.

The entire classification unit is trained end-to-end with the pretrained upstream CNN and randomly initialized downstream CNN. The entire dataset used in this step contains 3576 samples of length 3200 data points. The optimizer and loss function used in this case study are Adam [8] and Cross-Entropy loss with Softmax function [10], respectively. Architectural details of the classification unit are presented in Tab. 4 in Appendix A.
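For reference, the standard softmax cross-entropy for a single sample can be written as the short NumPy sketch below. The function name and the example logits are illustrative assumptions, and the paper's training code is not shown in the source.

```python
import numpy as np

def softmax_cross_entropy(logits, target_index):
    """Cross-entropy of softmax(logits) against one integer class label."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[target_index]

# Seven classes: 4 interactive states + 3 transitioning events.
uninformed = np.zeros(7)               # equal scores for every class
confident = np.array([0.0, 0.0, 6.0, 0.0, 0.0, 0.0, 0.0])
loss_uniform = softmax_cross_entropy(uninformed, target_index=2)
loss_confident = softmax_cross_entropy(confident, target_index=2)
print(loss_uniform, loss_confident)
```

An uninformed prediction over seven classes gives a loss of ln 7, and the loss falls toward zero as the score of the correct class grows.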

The decision coordinator is implemented as an FSM of four states: no interaction, entry, constant milling, and exit. There are seven permissible state transitions, but only three of them exist in the dataset. The state transitioning diagram of the decision coordinator is presented in Fig. 5.



We use a testing dataset and deployment simulations to evaluate the effectiveness of the implemented system. Performance on the testing dataset quantifies the classification unit’s ability to identify interactive state/transitioning events at a given time instance. However, the performance does not provide any insights into the robustness of the system, nor can it quantify delays in indicating the transition to the next interactive state. To address such limitations, we perform deployment simulations, through which we can gauge how the system would perform in real-time.

The testing dataset consists of 894 samples of size $8 \times 400$. The samples are constructed in the same manner as the training samples, and they reflect the interactive state of the machine or indicate a transitioning event at a time instance. We use precision, recall, and F1-Score to evaluate the performance of the classification unit.

For deployment simulations, the DC current signal is streamed as if in real-time. The result from each simulation is a series of classifications representing the state of the machine or a transitioning event. The simulation results are compared with the actual state and occurrences of transition to determine the correctness of classifications and delays in detecting transitions.
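The delay comparison can be expressed as a small helper, sketched here under the assumption that detected and labeled transitions are given as sample indices into the same signal. The function name and the index values are hypothetical; only the 250 Hz sampling rate comes from the dataset description.

```python
def transition_delays(detected_idx, labeled_idx, sampling_rate_hz=250.0):
    """Return one delay in seconds per transition; negative means early detection."""
    if len(detected_idx) != len(labeled_idx):
        raise ValueError("each labeled transition needs one detected counterpart")
    return [(d - l) / sampling_rate_hz for d, l in zip(detected_idx, labeled_idx)]

# Hypothetical indices for the three transitions of one trial.
delays = transition_delays(detected_idx=[1040, 2600, 7050],
                           labeled_idx=[1000, 2650, 7000])
print(delays)
```

A positive value means the detection lags the labeled transition point, and a negative value means the detection leads it.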

Results and Discussion

We first present the results from the testing dataset, as shown in Tab. 1. The classification unit achieves an F1-Score above 0.9 for all but one class: entry/const transitioning event class. This indicates that the classification unit misclassifies the entry/const transitioning event class more often than other classes. This is expected because entry/const transitioning events are more gradual than other transitioning events, and more data is required to capture their full characteristics. In other words, this class is more difficult to classify than other classes, given a fixed time interval.

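Tab. 1: Precision, recall, and F1-Score of the classification unit on the testing dataset.

| Classes | Precision | Recall | F1-Score |
|---|---|---|---|
| No Int. | 0.974 | 1.000 | 0.987 |
| Entry | 0.992 | 0.959 | 0.975 |
| Const. | 0.979 | 0.995 | 0.987 |
| Exit | 0.919 | 0.971 | 0.944 |
| No Int./Entry Tran. | 0.954 | 0.977 | 0.966 |
| Entry/Const. Tran. | 0.896 | 0.796 | 0.843 |
| Const./Exit Tran. | 0.956 | 0.896 | 0.925 |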
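As a quick consistency check, the F1-Scores can be recomputed from the reported precision and recall, and the unweighted average across classes can be taken. The sketch below uses only the rounded values from Tab. 1, so small differences in the third decimal are expected. The variable names are illustrative.

```python
# (precision, recall, reported F1-Score) per class, copied from Tab. 1.
table_1 = {
    "No Int.":             (0.974, 1.000, 0.987),
    "Entry":               (0.992, 0.959, 0.975),
    "Const.":              (0.979, 0.995, 0.987),
    "Exit":                (0.919, 0.971, 0.944),
    "No Int./Entry Tran.": (0.954, 0.977, 0.966),
    "Entry/Const. Tran.":  (0.896, 0.796, 0.843),
    "Const./Exit Tran.":   (0.956, 0.896, 0.925),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

recomputed = {name: f1(p, r) for name, (p, r, _) in table_1.items()}
macro_f1 = sum(reported for _, _, reported in table_1.values()) / len(table_1)

for name, (_, _, reported) in table_1.items():
    print(f"{name:22s} reported={reported:.3f} recomputed={recomputed[name]:.3f}")
print(f"average F1-Score across classes: {macro_f1:.3f}")
```

The average of the rounded entries comes to about 0.947, in line with the average F1-Score stated in the abstract up to rounding.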

We now present the result of the deployment simulations. The system correctly partitions all five trials into segments that correspond to different interactive states, and intermittent misclassifications only occur in one segment. Since the system is designed for online application, one key performance metric is the delay in identifying the transition to the next interactive state. We manually label the transition points (the time instance when a transition occurs) for the five trials, and the results of the deployment simulations are compared with our labels. Delays, measured in seconds, for each transition are presented in Tab. 2.

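Tab. 2: Delays (in seconds) in detecting each transition in the five deployment simulations.

| Transition | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 |
|---|---|---|---|---|---|
| No Int. to Entry | 0.156 | 0.160 | 0.136 | 0.088 | 0.140 |
| Entry to Const. | -0.700 | 0.820 | 0.380 | 0.036 | 0.324 |
| Const. to Exit | 0.160 | 0.132 | 0.100 | 0.288 | 0.180 |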

It is worthwhile to point out that one of the delays (entry to constant transition of Trial 1) is negative, which means that the detection leads the actual occurrence of the transition. While lagging detections are expected because more context is required beyond the transition point for the classifier to recognize meaningful changes present in the signal, leading detections in general seem unreliable. Such a phenomenon occurs largely due to the fact that entry/const transitioning events are more gradual. Compared to the other two types of transitioning events, entry/const transitioning events are less defined in the sense that a range of data points around the predefined transition point can be viewed as valid transition points.

Finally, we compare our system to standard methods introduced in the background section. To perform such a comparison, we implement a CNN (referred to as the baseline CNN) capable of classifying signals of length 400 as one of the four classes: No interaction, Entry, Constant, and Exit. The baseline CNN can achieve similar performance in the four classes to our system. Since it does not consider transitioning events explicitly, we can only infer occurrences of transition when there is a change in state. The baseline CNN becomes more uncertain as it approaches the transition points, and this is expected because the classifier is only trained for steady-state classes. As a result, it is unable to detect transitioning events consistently, and misclassifications are frequent in the neighborhood of transition points. To quantify the transition detection performance of the baseline CNN, we perform deployment simulations using the baseline CNN. Transition points are manually inferred based on the changes in state, and intermittent misclassifications after the transition points are ignored. The result is presented in Tab. 3. Our system is able to reliably detect transitioning events and outperforms the baseline CNN in terms of delay in transition detection. It is also able to guard against intermittent misclassifications because of the decision coordinator. Under the restrictions imposed by the decision coordinator, illegal transitions are prevented and the system provides the expected result.

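Tab. 3: Delays (in seconds) in detecting each transition using the baseline CNN.

| Transition | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 |
|---|---|---|---|---|---|
| No Int. to Entry | 0.356 | 0.36 | 0.448 | 0.396 | 0.448 |
| Entry to Const. | 0.116 | 0.500 | 0.588 | 0.408 | 0.260 |
| Const. to Exit | 0.552 | 0.400 | 0.528 | 0.456 | 0.432 |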
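Inferring transitions from a state-only classifier amounts to looking for changes between consecutive predictions. The sketch below, with a hypothetical function name and a made-up prediction sequence, shows why intermittent misclassifications near a transition point produce spurious transitions when no coordinator filters them.

```python
def infer_transitions(predictions):
    """Return (index, previous_state, new_state) wherever the predicted state changes."""
    changes = []
    for i in range(1, len(predictions)):
        if predictions[i] != predictions[i - 1]:
            changes.append((i, predictions[i - 1], predictions[i]))
    return changes

# Made-up baseline output that flickers around the no interaction/entry boundary.
preds = ["no_int", "no_int", "entry", "no_int", "entry", "entry", "const"]
changes = infer_transitions(preds)
print(changes)
```

Two real transitions occur in this sequence, yet four state changes are reported, two of them caused by a single flickering prediction.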

Conclusions and Future Work

A machine-part interaction classification system is a key component of CPS and can help address a number of manufacturing needs, including anomaly detection, root cause diagnosis, and equipment performance analysis. In this work, a novel framework featuring deep learning techniques combined with Subject Matter Expert knowledge is presented to perform online machine operation classification. The framework is capable of classifying the steady state behaviors of the machine as well as detecting transitions between steady states, and it is validated through a testing dataset and deployment simulations on a milling machine. The deployment simulations illustrate the effectiveness of the SME-designed error checking mechanism.

In future work, we will investigate applications of the proposed framework to additional case studies. Another potential direction for future work is minimizing the delay in transition detection. Since the system performs classification in real time, minimal delay is always preferred. Currently, the framework controls delay through the overlap between successive windows. A Recurrent Neural Network-based architecture for the classification unit is one option worth exploring.


  • [1] A. Agogino and K. Goebel. Milling data set. NASA Ames Prognostics Data Repository, NASA Ames Research Center, Moffett Field, CA.
  • [2] O. Banos, J. M. Galvez, M. Damas, H. Pomares, and I. Rojas (2014) Window size impact in human activity recognition. Sensors (Switzerland) 14 (4), pp. 6474–6499.
  • [3] C. G. Cassandras and S. Lafortune (2010) Introduction to discrete event systems. pp. 20.
  • [4] I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep learning. MIT Press.
  • [5] M. Inoue, S. Inoue, and T. Nishida (2018) Deep recurrent neural network for mobile human activity recognition with high throughput. Artificial Life and Robotics 23 (2), pp. 173–185.
  • [6] H. Ismail Fawaz, G. Forestier, J. Weber, L. Idoumghar, and P. A. Muller (2019) Deep learning for time series classification: a review. Data Mining and Knowledge Discovery 33 (4), pp. 917–963. arXiv:1809.04356.
  • [7] O. Janssens, V. Slavkovikj, B. Vervisch, K. Stockman, M. Loccufier, S. Verstockt, R. Van de Walle, and S. Van Hoecke (2016) Convolutional neural network based fault detection for rotating machinery. Journal of Sound and Vibration 377, pp. 331–345.
  • [8] D. P. Kingma and J. L. Ba (2015) Adam: a method for stochastic optimization. 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, pp. 1–15. arXiv:1412.6980.
  • [9] Y. Lecun, Y. Bengio, and G. Hinton (2015) Deep learning. Nature 521 (7553), pp. 436–444.
  • [10] W. Liu, Y. Wen, Z. Yu, and M. Yang (2017) Large-margin softmax loss for convolutional neural networks. arXiv:1612.02295.
  • [11] J. Moyne, Y. Qamsane, E. C. Balta, I. Kovalenko, J. Faris, K. Barton, and D. M. Tilbury (2020) A requirements driven digital twin framework: specification and opportunities. IEEE Access 8, pp. 107781–107801.
  • [12] F. J. Ordóñez and D. Roggen (2016) Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors (Switzerland) 16 (1).
  • [13] Y. Qamsane, C. Chen, E. C. Balta, B. Kao, S. Mohan, J. Moyne, D. Tilbury, and K. Barton (2019) A unified digital twin framework for real-time monitoring and evaluation of smart manufacturing systems. In 2019 IEEE 15th International Conference on Automation Science and Engineering (CASE), pp. 1394–1401.
  • [14] R. Rajkumar, I. Lee, L. Sha, and J. Stankovic (2010) Cyber-physical systems: the next computing revolution. In Design Automation Conference, pp. 731–736.
  • [15] R. Rajkumar, I. Lee, L. Sha, and J. Stankovic (2010) Cyber-physical systems: the next computing revolution. Proceedings - Design Automation Conference, pp. 731–736.
  • [16] K. M. Rashid and J. Louis (2019) Times-series data augmentation and deep learning for construction equipment activity recognition. Advanced Engineering Informatics 42 (June), pp. 100944. External Links: Document, ISSN 14740346, Link Cited by: INTRODUCTION.
  • [17] K. M. Rashid and J. Louis (2020) Automated Activity Identification for Construction Equipment Using Motion Data From Articulated Members. Frontiers in Built Environment 5 (January). External Links: Document, ISSN 22973362 Cited by: INTRODUCTION, Classification Systems.
  • [18] M. A. Saez, F. P. Maturana, K. Barton, and D. M. Tilbury (2019) Context-Sensitive Modeling and Analysis of Cyber-Physical Manufacturing Systems for Anomaly Detection and Diagnosis. IEEE Transactions on Automation Science and Engineering 17 (1), pp. 29–40. External Links: Document, ISSN 15583783 Cited by: INTRODUCTION.
  • [19] M. Saez, F. Maturana, K. Barton, and D. Tilbury (2017) Anomaly detection and productivity analysis for cyber-physical systems in manufacturing. IEEE International Conference on Automation Science and Engineering 2017-Augus, pp. 23–29. External Links: Document, ISBN 9781509067800, ISSN 21618089 Cited by: INTRODUCTION, INTRODUCTION.
  • [20] T. Slaton, C. Hernandez, and R. Akhavian (2020) Construction activity recognition with convolutional recurrent networks. Automation in Construction 113 (August 2019), pp. 103138. External Links: Document, ISSN 09265805, Link Cited by: INTRODUCTION, Classification Systems.

Appendix A: Classification Unit Architecture

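| Layer | Upstream CNN: Description | Upstream CNN: Parameters | Downstream CNN: Description | Downstream CNN: Parameters |
|-------|---------------------------|--------------------------|-----------------------------|----------------------------|
| 1     | Conv 3×3, 5               | 15                       | Conv 3×3, 8                 | 96                         |
| 2     | MaxPool, 2                |                          | BatchNorm                   | 8×3                        |
| 3     | ReLU                      |                          | Conv 3×3, 16                | 384                        |
| 4     | BatchNorm, 5              | 5×3                      | BatchNorm, 16               | 16×3                       |
| 5     | Conv 3×3, 25              | 375                      | Linear 128×64               | 8192+64                    |
| 6     | MaxPool, 2                |                          | Linear 64×7                 | 448+7                      |
| 7     | ReLU                      |                          |                             |                            |
| 8     | BatchNorm, 25             | 25×3                     |                             |                            |
| 9     | Conv 3×3, 50              | 3750                     |                             |                            |
| 10    | MaxPool, 2                |                          |                             |                            |
| 11    | ReLU                      |                          |                             |                            |
| 12    | BatchNorm, 50             | 50×3                     |                             |                            |
| 13    | Linear 2500×200           | 500000+200               |                             |                            |
| 14    | Linear 200×10             | 2000+10                  |                             |                            |
| 15    | Linear 10×4               | 40+4                     |                             |                            |
| Total |                           | 506634                   |                             | 9263                       |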

In this section, we provide a brief overview of the network architecture used in the case study. Overall, the network is small and lightweight compared to state-of-the-art architectures used in pattern recognition. With only 15 trainable layers and 515897 parameters (506634 in the upstream CNN and 9263 in the downstream CNN), the network trains end to end in less than 5 minutes on a GPU.
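The totals can be checked by summing the per-layer entries of the table above. The script below is only a bookkeeping sketch: the layer names (`conv1`, `bn1`, and so on) are labels introduced here for readability, and the batch normalization entries are read as products (number of channels times 3), following the table.

```python
# Per-layer trainable parameter counts, transcribed from the appendix table.
# Layer names are hypothetical labels; they do not appear in the original.
UPSTREAM = {
    "conv1": 15,
    "bn1": 5 * 3,
    "conv2": 375,
    "bn2": 25 * 3,
    "conv3": 3750,
    "bn3": 50 * 3,
    "fc1": 2500 * 200 + 200,  # weights + biases
    "fc2": 200 * 10 + 10,
    "fc3": 10 * 4 + 4,
}

DOWNSTREAM = {
    "conv1": 96,
    "bn1": 8 * 3,
    "conv2": 384,
    "bn2": 16 * 3,
    "fc1": 128 * 64 + 64,
    "fc2": 64 * 7 + 7,
}


def total(params: dict) -> int:
    """Sum the trainable parameters of one network."""
    return sum(params.values())


if __name__ == "__main__":
    up, down = total(UPSTREAM), total(DOWNSTREAM)
    print("upstream:", up)
    print("downstream:", down)
    print("combined:", up + down)
    print("trainable layers:", len(UPSTREAM) + len(DOWNSTREAM))
    print("share of upstream fc1:", round(UPSTREAM["fc1"] / (up + down), 3))
```

Under this tally, the first fully connected layer of the upstream CNN accounts for roughly 97% of all parameters, so its input size is the main lever on the overall model size.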

The upstream CNN consists of 3 conventional CONV-POOL-RELU sub-modules, each followed by a batch normalization layer, and then 3 fully connected layers. Unlike its upstream counterpart, the downstream CNN omits pooling layers and nonlinear activation layers. We experimented with several hyperparameter settings (number of layers, filter size of the convolutional layers, and dimensions of the fully connected layers) and found that the performance of the network is not heavily influenced by these hyperparameters.

Though the detailed architecture is task-specific, we provide a simple heuristic for designing networks that follow the proposed encoder-classifier structure. Depending on the number of input signals, the designer can increase or decrease the number of CONV-POOL-RELU sub-modules. Generally, as the number of input signals increases, we expect the network to need more convolutional layers in order to learn more complicated features. For the downstream CNN, we recommend a smaller and shallower architecture, because it only processes the class scores received from the upstream CNN, which are highly compact and simplified features.
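As a rough illustration of this heuristic, the sketch below traces how the size of the flattened feature vector, and the convolutional weight count, change with the number of CONV-POOL-RELU sub-modules. It rests on several assumptions that are not stated in the appendix: one-dimensional, bias-free convolutions with kernel length 3 that preserve the signal length, pooling by a factor of 2, a single input channel, and a hypothetical input window of 400 samples. These choices were picked only because they are consistent with the parameter counts listed for the upstream CNN; the function names are likewise hypothetical.

```python
from __future__ import annotations


def flattened_size(input_len: int, channels: list[int], pool: int = 2) -> int:
    """Features reaching the first Linear layer after len(channels) sub-modules."""
    if not channels:
        raise ValueError("at least one sub-module is required")
    length = input_len
    for _ in channels:
        length //= pool  # each sub-module shrinks the length by the pool factor
    if length < 1:
        raise ValueError("input too short for this many sub-modules")
    return channels[-1] * length


def conv_params(in_channels: int, channels: list[int], kernel: int = 3) -> list[int]:
    """Bias-free convolution weights per layer: kernel * in * out (assumed)."""
    counts = []
    prev = in_channels
    for out in channels:
        counts.append(kernel * prev * out)
        prev = out
    return counts


if __name__ == "__main__":
    widths = [5, 25, 50, 100]  # the last width is a hypothetical extension
    for depth in range(1, len(widths) + 1):
        chosen = widths[:depth]
        print(depth, "sub-modules:",
              "flattened size", flattened_size(400, chosen),
              "| conv weights", sum(conv_params(1, chosen)))
```

Adding a sub-module increases the convolutional weight count while the pooling step keeps the input to the fully connected layers from growing, which is one way to scale the encoder without inflating the classifier.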