Approaches and Applications of Early Classification of Time Series: A Review

05/06/2020
by   Ashish Gupta, et al.
Indian Institute Of Technology

Early classification of time series has been extensively studied for minimizing class prediction delay in time-sensitive applications such as healthcare and finance. The primary task of an early classification approach is to classify an incomplete time series as soon as possible with some desired level of accuracy. Recent years have witnessed several approaches for early classification of time series. As these approaches have tackled the early classification problem from different perspectives, a thorough review of the existing solutions is needed to understand the current status of the area. These solutions have demonstrated reasonable performance in a wide range of applications, including human activity recognition, gene expression based health diagnosis, and industrial monitoring. In this paper, we present a systematic review of the current literature on early classification approaches for both univariate and multivariate time series. We divide the existing approaches into four exclusive categories based on their solution strategies: prefix based, shapelet based, model based, and miscellaneous approaches. We also discuss applications of early classification in several areas, including industrial monitoring, intelligent transportation, and medicine. Finally, we provide a quick summary of the current literature along with future research directions.

I Introduction

Due to advancements in energy-efficient, small, and low-cost embedded devices, time series data has received unprecedented attention in several fields of research, including healthcare [liu2015efficient, chen2019automated, 15], finance [idrees2019prediction, yin2011financial], speech and activity recognition [esling2013multiobjective, pei2017multivariate, 34], and others [aminikhanghahi2018real, 35, shumway2017time]. The attributes (or data points) of a time series have an inherent temporal dependency, which allows researchers to analyze the behavior of a process over time. In addition, time series naturally lend themselves to visualization, satisfying the human desire to see the structure (or shape) of data [esling2012time]. With these properties, numerous data mining algorithms have been developed to study various aspects of time series such as indexing [paparrizos2018fast], forecasting [mahalakshmi2016survey, deb2017review], clustering [aghabozorgi2015time], and classification [bakeoff]. Indexing algorithms focus on speeding up the search for a query time series in a large dataset. Forecasting algorithms attempt to predict future data points of a time series [mahalakshmi2016survey]. Clustering algorithms aim to partition unlabeled time series instances into a suitable number of groups based on their similarities [aghabozorgi2015time]. Finally, classification algorithms attempt to predict the class label of an unlabeled time series by learning a mapping between training instances and their labels [bakeoff, sharabiani2017efficient].

Time Series Classification (TSC) has remained a topic of great interest since labeled dataset repositories such as UCR [UCR] and UCI [UCI] became available. As a consequence, a large number of TSC algorithms have emerged, introducing efficient and cutting-edge strategies for distinguishing classes. The authors in [wang2013experimental, lines2015time, rakthanmanon2013addressing, sharabiani2017efficient] focused on instance-based learning, where a similarity score is computed between a testing time series and each training instance, and the class label of the most similar training instance is assigned to the testing time series. Dynamic Time Warping (DTW) [berndt1994using] and its variations [rakthanmanon2013addressing, sharabiani2017efficient], combined with the 1-Nearest Neighbor (1-NN) rule, have been extensively used as similarity measures in instance-based TSC algorithms. Another family of TSC algorithms [ji2019fast, grabocka2016fast, ye2011time, rakthanmanon2013fast] focused on finding the most discriminatory subsequences of time series, called shapelets. A class label is identified by the presence of one or more of its shapelets in the testing instance. Dictionary based algorithms [hatami2019bag, lang2009dictionary] and ensemble approaches [bagnall2015time, lines2018time] have also demonstrated significant progress in time series classification. Finally, a summary of deep learning based TSC algorithms can be found in [fawaz2019deep].
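To make the instance-based idea concrete, the sketch below pairs a plain dynamic-programming DTW with the 1-NN rule. It is an illustrative sketch under our own assumptions: no warping window, a squared-difference local cost, and a square root on the final cumulative cost; the cited works use various DTW variants, and the function names here are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic O(n*m) DTW distance with squared-difference local cost (no warping window)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def one_nn_classify(train, query):
    """Return the label of the training series closest to the query under DTW (1-NN).

    train: list of (series, label) pairs.
    """
    _, best_label = min(train, key=lambda pair: dtw_distance(pair[0], query))
    return best_label
```

For example, with `train = [([0, 0, 1, 2, 1, 0], "peak"), ([0, 0, 0, 0, 0, 0], "flat")]`, the shifted query `[0, 1, 2, 1, 0, 0]` aligns with the first series at zero DTW cost and is labeled `"peak"`, even though a point-by-point Euclidean comparison would penalize the shift.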

The main objective of TSC algorithms is to maximize classification accuracy using the complete time series. However, in time-sensitive applications such as gas leakage detection [8], earthquake early warning [Fauvel2020ADM], and electricity demand prediction [dachraoui2013early], it is desirable to classify a time series as early as possible. A classification approach that aims to classify an incomplete time series is referred to as early classification [1, 2, 4]. Xing et al. [1] stated that earliness can only be achieved at the cost of accuracy. Hence, the main challenge for early classification approaches is to optimize the balance between two conflicting objectives, i.e., accuracy and earliness. One of the first known approaches for early classification of time series was proposed in [3]; since then, several researchers have pursued this direction and published a wide range of research articles at renowned venues. After an exhaustive search, we found only one brief survey on early classification approaches [santos2016literature], which covers just a handful of approaches and does not provide any categorization.

Fig. 1: Example of application scenarios of early classification of time series.
Fig. 2: Categorization of the early classification approaches. [MPL: Minimum Prediction Length, RNN: Reverse Nearest Neighbor]

Applications of early classification: The literature indicates several potential applications of early classification using Univariate Time Series (UTS) or Multivariate Time Series (MTS). An MTS consists of multiple correlated time series collected for a single event over a specified duration. Some of the important application scenarios are illustrated in Fig. 1 and discussed below:

  1. Human activity: Early classification of human activities refers to identifying an ongoing activity before its complete execution [16, 21, 23, 34, 37]. Such early classification helps to minimize the response time of the system and in turn improves the user experience [34]. The researchers in [34, 16, 21] utilized MTS to classify various human activities such as walking, running, sitting, climbing upstairs, and eating.

  2. Gene expression: Gene expression data forms an MTS that contains crucial information about the biological condition of a human. It has been used to study viral infection in patients, drug response in disease, and patient recovery from disease [11, 12, 14]. Early classification of gene expression time series can significantly reduce the consequences of disease.

  3. Electrocardiogram (ECG): An ECG is a time series of electrical signals generated by the activity of the heart. It is usually recorded by placing multiple electrodes on the chest of the patient. Early classification of ECG [25, 13, 15] helps to diagnose an abnormal heartbeat as early as possible, which reduces the risk of heart failure.

  4. Industrial monitoring: With advancements in sensor technology, monitoring industrial processes with sensors has become convenient and effortless. The sensors generate time series that must be classified to determine the status of the operation. In chemical industries, even a minor chemical leakage can have hazardous effects on the health of crew members [8]. Early classification not only reduces health risks but also minimizes maintenance cost by ensuring smooth operation at all times. Some notable industrial applications of early classification are gas leakage detection [8], fault mode identification [39], wafer quality [13, 15], and hydraulic system monitoring [43].

  5. Intelligent transportation: As modern vehicles are equipped with several sensors, the generated sensor data makes it easy to monitor the behavior of the driver, the road surface condition, the inside and outside environment, etc. An early classification algorithm is presented in [35] to classify the type of road surface using sensors such as an accelerometer, a light sensor, and a temperature sensor. Such early classification of the road surface helps to choose an alternative path if the surface condition is poor, i.e., bumpy or rough.

In the absence of a thorough review of early classification approaches, a researcher must expend enormous effort to identify potential research gaps. We therefore carry out a comprehensive survey of the current literature on early classification approaches and propose a useful categorization for a better understanding of the status of the area. This paper presents a systematic review of early classification approaches for both univariate and multivariate time series data. We categorize the various approaches into four broad groups based on the strategy they follow for early classification. Fig. 2 illustrates the categorization hierarchy with groups and their subgroups.

The next section discusses the fundamentals of early classification approaches and their categorization into four broad groups, as shown in Fig. 2. The first group (i.e., prefix based) includes approaches that utilize prefixes of time series to achieve earliness; Section III discusses them in detail. The second group (i.e., shapelet based) uses key shapelets (subsequences of time series) for reliable prediction of the class label of an incomplete time series; these approaches are reviewed in Section IV. The third group (i.e., model based) is discussed in Section V; these approaches develop a mathematical model for optimizing the balance between earliness and a desired level of accuracy. The fourth group includes miscellaneous approaches that do not meet the inclusion criteria of the aforementioned categories; they are discussed in Section VI. Finally, Section VII summarizes the review along with some promising directions for further research. We also provide a nomenclature table for quick reference to the abbreviations and notations used in this paper.

II Fundamentals and Categorization of Early Classification Approaches

In this section, we first discuss the fundamentals of time series that are prerequisites for a sound understanding of the various early classification approaches. Later, we explain the categories into which the approaches are grouped.

II-A Fundamentals

This subsection defines the notations and terminologies used in this paper.

II-A1 Time series

It is defined as a sequence of ordered observations typically taken at equally spaced time intervals [ye2009time]. A time series is denoted as $X = \{x_1, x_2, \ldots, x_T\}$, where $T$ denotes the length of the complete time series, $d$ is the dimension, and $x_t \in \mathbb{R}^d$ for $t = 1, 2, \ldots, T$. If $d = 1$, the time series is referred to as univariate; otherwise, it is multivariate. A time series that is one dimension (or part) of an MTS is referred to as a component [34, 35]. In general, a time series is univariate unless it is explicitly mentioned as multivariate.

II-A2 Time series classification

It refers to the prediction of the class label of a time series by a classifier constructed from a labeled training dataset [bakeoff]. Let $D = \{(X_i, y_i)\}_{i=1}^{N}$ be a training dataset consisting of $N$ instances, each a pair of a time series $X_i$ and its class label $y_i$. The time series classifier learns a mapping function $f: X \rightarrow y$. The classifier can predict the class label of a testing time series $X'$ only if it is complete, i.e., the length of $X'$ must be the same as the length of the training instances [5].

II-A3 Early classification of time series

According to [30], early classification is an extension of time series classification with the ability to classify an unlabeled incomplete time series. In other words, an early classifier is able to classify a testing time series $X'$ with only $l$ data points, where $l < T$. Early classification is desirable in applications where data collection is costly or late prediction causes hazardous consequences [9, 35]. Intuitively, an early classifier may make a more informed decision about the class label if more data points are available in the testing time series [6], but waiting for them delays the decision. Therefore, researchers have focused on optimizing the accuracy of prediction with minimum delay (or maximum earliness). Further, early classification of time series is analogous to a case of missing features, with the constraint that the features are missing only because the corresponding data points have not yet arrived [6]. Such unavailable data points make the time series incomplete, and one has to wait for more data points to complete it. In the context of early classification, a test time series is referred to as an incomplete or incoming time series. Fig. 3 illustrates an early classification framework for predicting the class label of an incoming time series $X'$.

Fig. 3: Illustration of an early classification framework for time series.

II-A4 Earliness

It is an important measure for evaluating the effectiveness of early classification approaches. Let $l$ be the number of data points of the testing time series that are used by the early classifier. The authors in [34] defined the earliness as $\frac{T - l}{T} \times 100\%$, where $T$ is the length of the complete time series; a higher earliness therefore indicates an earlier decision. Earliness is also called timeliness [7].
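As a small worked example, the sketch below computes earliness for individual decisions and averages it over a test set. It assumes the percentage form $(T - l)/T \times 100\%$ (larger means earlier) and test series that all share the same full length $T$; the function names are hypothetical.

```python
def earliness(l, T):
    """Earliness (in %) of a decision made after observing l of T data points.

    Assumes the form (T - l) / T * 100, so larger values mean earlier decisions.
    """
    if not 0 < l <= T:
        raise ValueError("expected 0 < l <= T")
    return (T - l) / T * 100.0


def mean_earliness(lengths_used, T):
    """Average earliness over test series that all have full length T."""
    return sum(earliness(l, T) for l in lengths_used) / len(lengths_used)
```

For instance, a classifier that decides after 30 of 120 data points achieves `earliness(30, 120) == 75.0`, while waiting for the complete series gives `0.0`.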

II-A5 Prefix

In [2], the prefix of a time series $X$ is defined as the subsequence $X[1, l] = \{x_1, x_2, \ldots, x_l\}$, where $l$ denotes the length of the prefix. Moreover, the training dataset is said to be in the prefix space of length $l$ if it contains only the prefix of length $l$ of each time series, together with its associated class label.
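The prefix and prefix-space definitions translate directly into code. This minimal sketch represents a labeled dataset as a list of `(series, label)` pairs, which is our own assumption about the data layout; `prefix` and `prefix_space` are hypothetical helper names.

```python
def prefix(series, l):
    """Return the prefix X[1, l] = (x_1, ..., x_l); l is counted from 1 as in the text."""
    if not 1 <= l <= len(series):
        raise ValueError("prefix length must satisfy 1 <= l <= len(series)")
    return list(series[:l])


def prefix_space(dataset, l):
    """Map a labeled dataset [(X_i, y_i), ...] into the prefix space of length l."""
    return [(prefix(x, l), y) for x, y in dataset]
```

Training one classifier per prefix space, e.g. on `prefix_space(D, l)` for `l = 1, ..., T`, is the starting point of the prefix based approaches discussed later.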

II-A6 Shapelet

It is defined as a quadruple $f = (s, l, \delta, c)$, where $s$ is a subsequence of a time series, $l$ is its length, $\delta$ is the distance threshold, and $c$ is the associated class label [10, 11]. The distance threshold is learned from the training instances and is used to determine whether the shapelet matches any subsequence of a testing time series.
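A minimal sketch of the quadruple and the threshold-based match test follows. It assumes the shapelet-to-series distance is the smallest Euclidean distance between $s$ and any same-length window of the series, without z-normalization; individual approaches define the distance differently. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Shapelet:
    s: list        # the subsequence
    l: int         # its length, len(s)
    delta: float   # distance threshold learned from training data
    c: str         # associated class label


def min_subsequence_distance(s, x):
    """Smallest Euclidean distance between s and any window of x of the same length."""
    l = len(s)
    if l > len(x):
        return math.inf  # not enough data points observed yet to compare
    return min(
        math.dist(s, x[start:start + l])
        for start in range(len(x) - l + 1)
    )


def matches(shapelet, x):
    """The shapelet matches x if some window of x lies within distance delta of it."""
    return min_subsequence_distance(shapelet.s, x) <= shapelet.delta
```

Because the match only needs one window, it can be evaluated on an incomplete series: as soon as a matching window appears, the shapelet's class label $c$ becomes a candidate prediction.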

II-A7 Interpretability

It refers to how convincing the classification results are to domain experts. In healthcare applications, the adoption of any early classification approach heavily relies on its interpretability [4]. The authors in [4, 10, 14, 15] assert that a short segment of a time series is more convincing and helpful than the whole time series if that segment contains class discriminatory patterns.

II-A8 Reliability

It expresses the guarantee that the probability of the early predicted class label of an incomplete time series meets a user-specified threshold [6, 7]. Reliability is a crucial parameter for ensuring the minimum required accuracy in early classification. It is also termed an uncertainty estimate or confidence measure in different studies [10, 9, 29].

II-B Categorization of early classification approaches

The literature contains a large number of early classification approaches for time series data. These approaches address problems from a wide range of research areas, including healthcare [griffin2001toward, 11, 12, 14, 18, 24], human activity recognition [21, 23, 34], and industry [8, 39, bernaille2006traffic]. After a comprehensive survey, we found that UTS has attracted more researchers than MTS, for the following reasons: i) MTS has complicated relationships among its dimensions (time series), ii) MTS may have redundant dimensions that could mislead the classifier, and iii) classifiers find it challenging to handle MTS data due to the curse of dimensionality.

This work categorizes the various early classification approaches into meaningful groups for a better understanding of their differences and similarities. We believe that one of the most meaningful ways to categorize these approaches is by the strategy they use to achieve earliness. We broadly categorize the early classification approaches (for both univariate and multivariate time series) into four major groups, as shown in Fig. 2, and the papers included in each group are listed in Table I.

II-B1 Prefix based early classification

The strategy is to learn a minimum prefix length of time series from the training instances and then classify a testing time series using its prefix of the learned length. During training, a set of classifiers (one for each prefix space) is constructed and then checked for the stability of the relationship between the results in the prefix space and in the full-length space. The classifier that achieves the desired level of stability with the minimum prefix length is considered the early classifier, and the corresponding prefix length is called the Minimum Prediction Length (MPL) [2, 5, 20] or Minimum Required Length (MRL) [34, 35, 36]. This early classifier can classify an ongoing time series as soon as MPL data points are available.

II-B2 Shapelet based early classification

A family of early classification approaches [4, 10, 11, 12, 13, 14, 15, 17, 18, 19, 24] focuses on obtaining a set of key shapelets from the training dataset and using them as class discriminatory features of time series. As the training dataset contains a huge number of candidate shapelets, these approaches attempt to select only those shapelets that provide maximum earliness and uniquely manifest the class label. The selected shapelets are matched against the ongoing testing time series, and the class label of the best matched shapelet is assigned to the time series.

II-B3 Model based early classification

Another set of early classification approaches [7, 8, 16, 23, 25, 26, 30] proposes mathematical models based on conditional probabilities. These approaches obtain the conditional probabilities either by fitting a discriminative classifier or by using generative classifiers on the training data. A decision (or stopping) rule is designed to ensure the reliability of the early prediction of the class label. Some of these approaches have also developed a cost-based trigger function for making reliable predictions.

II-B4 Miscellaneous approaches

The early classification approaches that do not fit any of the above categories are included here. Some of these approaches employ deep learning techniques [22, 32], reinforcement learning [31], and other strategies [3, 39].

Category | Subcategory | Approaches
Prefix based early classification | MPL computation using RNN | [2, 5, 20]
Prefix based early classification | MPL computation using posterior probabilities | [29, 36, 35, 37, 43, 34]
Early classification using shapelets | Key shapelets selection using utility measure | [4, 10, 11, 12, 18, 19]
Early classification using shapelets | Key shapelets selection using clustering | [13, 14, 15, 17, 38, 24]
Model based early classification | Using discriminative classifier | [8, 44, 25, 28, 27, 39, 33]
Model based early classification | Using generative classifier | [6, 7, 9, 26, 16, 21, 23]
Miscellaneous approaches | With tradeoff | [31, 32]
Miscellaneous approaches | Without tradeoff | [3, 39, 22]
TABLE I: Categorization of early classification approaches.

II-C Statistical evaluation of early classifier

One of the most useful techniques for the statistical evaluation of an early classifier was proposed in [dachraoui2014evaluation]. As early classifiers address two conflicting objectives (i.e., earliness and accuracy) together, assessing whether one early classifier is significantly better than another is more challenging than in standard classification. The authors in [dachraoui2014evaluation] therefore employed two well-known statistical methods, the Wilcoxon signed-rank test [demvsar2006statistical] and Pareto optimality [jin2008pareto], for evaluating early classifiers on many UCR datasets [UCR]. The evaluation technique uses the Wilcoxon signed-rank test for independent comparison, where two early classifiers are compared on each objective separately over the same datasets. Further, it uses Pareto optimality, under which an early classifier is considered better than another if it is superior on one objective without degrading on the other.
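The Pareto part of this evaluation reduces to a dominance check over (accuracy, earliness) pairs. The sketch below assumes both objectives are oriented so that larger is better (e.g., earliness measured as the percentage of the series not yet observed); `dominates` and `pareto_front` are hypothetical names, and the input numbers in the test are toy values, not results from any study.

```python
def dominates(a, b):
    """True if classifier a Pareto-dominates classifier b.

    a, b: (accuracy, earliness) pairs, larger is better for both.
    a dominates b if it is at least as good on both objectives and strictly
    better on at least one.
    """
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(results):
    """Names of classifiers not dominated by any other.

    results: dict mapping classifier name -> (accuracy, earliness).
    """
    return sorted(
        name
        for name, score in results.items()
        if not any(
            dominates(other, score)
            for other_name, other in results.items()
            if other_name != name
        )
    )
```

For the independent per-objective comparison, SciPy's `scipy.stats.wilcoxon` could be applied to the paired accuracy (or earliness) values of two classifiers across datasets.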

III Prefix based early classification

This section discusses the prefix based early classification approaches in detail. The key idea is to learn a stable prefix length of each time series during training and then utilize it for classifying an incomplete time series during testing. One of the first notable prefix based early classification approaches was proposed in [1]. The authors in [1] introduced two methods, Sequential Classification Rule (SCR) and Generalized Sequential Decision Tree (GSDT), for early classification of symbolic sequences. For a given training dataset, the SCR method first extracts a large number of sequential rules from prefix spaces of different lengths and then selects the top-$k$ rules based on their support and prediction accuracy. These selected rules are used as the early classifier. The GSDT method also extracts sequential rules, but it aims to find rules with shorter length and higher earliness.

Next, we split the prefix based approaches into two groups according to their MPL computation methods. In the first group, the approaches [2, 5, 20] developed the concept of Reverse Nearest Neighbor (RNN) to compute the MPL of time series. In the second group [29, 34, 35, 36], the authors employed a probabilistic classifier to first obtain posterior class probabilities and then used these probabilities for MPL computation.

III-A MPL computation using RNN

We first discuss the concept of RNN for time series data and then describe the approaches that have used RNN for MPL computation. Let $D = \{X_1, X_2, \ldots, X_N\}$ be a labeled time series dataset with $N$ instances of length $T$. According to [2], the RNN of a time series $X \in D$ is the set of time series in $D$ that have $X$ among their nearest neighbors. It is mathematically given as

$$\mathrm{RNN}^{l}(X) = \{X_j \in D \mid X \in \mathrm{NN}^{l}(X_j)\} \tag{1}$$

where $\mathrm{NN}^{l}(X_j)$ is the set of nearest neighbors of $X_j$ computed on prefixes of length $l$, $l$ denotes the length of time series in the prefix space, and $l = T$ for the dataset in the full-length space. Fig. 4 illustrates an example of RNN with a dataset of six time series. An arrow from $X_i$ to $X_j$ indicates that $X_j$ is the nearest neighbor of $X_i$ under the given distance measure. Note that a time series can also have an empty RNN.
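Eq. (1) can be computed by brute force: find each series' nearest neighbor set in the prefix space, then invert that relation. This sketch assumes Euclidean distance on length-$l$ prefixes and keeps all tied neighbors; it is quadratic in $N$ per prefix length, so it only illustrates the definition. The helper names are hypothetical.

```python
import math


def nearest_neighbors(dataset, j, l):
    """Indices of the nearest neighbor(s) of dataset[j] on length-l prefixes (ties kept)."""
    dists = {
        i: math.dist(dataset[i][:l], dataset[j][:l])
        for i in range(len(dataset))
        if i != j
    }
    best = min(dists.values())
    return {i for i, d in dists.items() if d == best}


def rnn(dataset, i, l):
    """RNN^l(X_i): indices j whose nearest-neighbor set at prefix length l contains i."""
    return {
        j for j in range(len(dataset))
        if j != i and i in nearest_neighbors(dataset, j, l)
    }
```

With the one-dimensional toy dataset `[[0], [1], [3]]`, the series `[1]` is the nearest neighbor of both others, so `rnn(D, 1, 1) == {0, 2}`, while `[3]` is nobody's nearest neighbor and has an empty RNN.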

Fig. 4: An example of RNN with six time series.

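To compute the MPL of a time series $X$, the authors in [2] compare $\mathrm{RNN}^{T}(X)$ in the full-length space with $\mathrm{RNN}^{l}(X)$ in the prefix spaces of all lengths. The MPL of $X$ is set to $l$ if the following conditions are satisfied:

$$\mathrm{RNN}^{l'}(X) = \mathrm{RNN}^{T}(X) \tag{2}$$
$$\mathrm{RNN}^{T}(X) \neq \emptyset \tag{3}$$
$$\mathrm{RNN}^{l-1}(X) \neq \mathrm{RNN}^{T}(X) \tag{4}$$

where Eq. 2 must hold for every $l'$ with $l \le l' \le T$. Further, if $\mathrm{RNN}^{T}(X) = \emptyset$, the MPL of $X$ is equal to $T$. Here, Eq. 4 checks the stability of the RNN using the prefix of $X$ with length $l - 1$, which ensures that $l$ is the shortest stable prefix length.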
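The conditions in Eqs. (2)-(4) amount to scanning backward from the full length and stopping at the first prefix length whose RNN differs from the full-length RNN. The sketch below is a self-contained illustration under our own assumptions (Euclidean 1-NN on prefixes, ties kept, at least two training series); it is not the authors' optimized implementation, and the function names are hypothetical.

```python
import math


def _rnn_all(dataset, l):
    """RNN^l for every series as a list of index sets (Euclidean 1-NN on prefixes, ties kept)."""
    n = len(dataset)
    if n < 2:
        raise ValueError("need at least two training series")
    result = [set() for _ in range(n)]
    for j in range(n):
        dists = [(math.dist(dataset[i][:l], dataset[j][:l]), i) for i in range(n) if i != j]
        best = min(d for d, _ in dists)
        for d, i in dists:
            if d == best:
                result[i].add(j)  # i is a nearest neighbor of j, so j belongs to RNN(i)
    return result


def minimum_prediction_lengths(dataset):
    """MPL of each series: smallest l with RNN^{l'} == RNN^T != empty for all l <= l' <= T.

    If RNN^T is empty, the MPL is T.
    """
    T = len(dataset[0])
    rnn_by_length = {l: _rnn_all(dataset, l) for l in range(1, T + 1)}
    mpls = []
    for i in range(len(dataset)):
        full = rnn_by_length[T][i]
        if not full:
            mpls.append(T)
            continue
        mpl = T
        # walk back while the RNN stays identical to the full-length RNN (Eq. 2);
        # the first mismatch corresponds to Eq. 4 and fixes the MPL
        for l in range(T - 1, 0, -1):
            if rnn_by_length[l][i] != full:
                break
            mpl = l
        mpls.append(mpl)
    return mpls
```

For the toy dataset `[[0, 0], [0, 5], [0, 6]]`, the first series has an empty full-length RNN (MPL 2), the second has a stable RNN from the first data point (MPL 1), and the third only stabilizes at full length (MPL 2).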

Xing et al. [2] developed two algorithms, Early 1-NN and Early Classification of Time Series (ECTS), for UTS data. The Early 1-NN algorithm computes the MPL of each time series of the training dataset $D$ using 1-NN. The computed MPLs are first arranged in ascending order and then used for early classification of an incoming testing time series $X'$. Let $l_{\min}$ be the smallest of the computed MPLs. As soon as the number of data points in $X'$ becomes equal to $l_{\min}$, Early 1-NN starts classifying $X'$ in the prefix space of length $l = l_{\min}$. It first computes the 1-NN of $X'$ with $l$ data points as follows

$$\mathrm{NN}^{l}(X') = \arg\min_{X_j \in D'} \mathrm{dist}\left(X'[1, l],\, X_j[1, l]\right) \tag{5}$$

where $D' \subseteq D$ is the set of training time series whose MPL is at most $l$, and the function $\mathrm{dist}(\cdot, \cdot)$ computes the Euclidean Distance (ED) between two time series. If $\mathrm{NN}^{l}(X')$ contains more than one time series of the training dataset, the most dominant class label among them is assigned to $X'$. If $\mathrm{NN}^{l}(X')$ is empty, the classifier waits for more data points in $X'$ and repeats the above process.
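The decision step of Eq. (5) can be sketched as a function that is called each time a new data point arrives and returns `None` while the classifier should keep waiting. This follows the description above under stated assumptions: MPLs are precomputed (e.g., as in the previous sketch), distances are Euclidean, and ties among equally near neighbors are broken by majority label. The function name is hypothetical.

```python
import math
from collections import Counter


def early_1nn_predict(train, mpls, incoming):
    """One Early 1-NN decision on the data points observed so far.

    train: list of (series, label); mpls: precomputed MPL of each training series;
    incoming: observed prefix of the test series.
    Returns a class label, or None meaning "wait for more data points".
    """
    l = len(incoming)
    # D': training series whose MPL is at most the number of observed points
    eligible = [(x, y) for (x, y), mpl in zip(train, mpls) if mpl <= l]
    if not eligible:
        return None
    dists = [(math.dist(incoming, x[:l]), y) for x, y in eligible]
    best = min(d for d, _ in dists)
    nearest_labels = [y for d, y in dists if d == best]
    # several equally near neighbors: assign the most dominant label among them
    return Counter(nearest_labels).most_common(1)[0][0]
```

In use, the caller appends each new data point to `incoming` and calls the function again until it returns a label.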

Early 1-NN has two major drawbacks: i) each time series can have a different MPL, and ii) the computed MPLs are short and not robust enough due to the overfitting problem of 1-NN. To overcome these drawbacks, the ECTS algorithm [2] first clusters the time series based on their similarities in the full-length space. It employs agglomerative hierarchical clustering [agglomerative] with single linkage, parameterized by a minimum support threshold to avoid overfitting. ECTS then computes only one MPL for each cluster, yielding a more generalized set of MPLs for reliable early classification of an incomplete time series. In [5], the authors presented an extension of ECTS, called Relaxed ECTS, to find shorter MPLs. Relaxed ECTS relaxes the stability condition of RNN while computing the MPLs of the clusters: to compute the MPL of a cluster, it requires only a subset of the time series to have a stable RNN instead of all of them. It also speeds up the learning process.

In [20], the authors proposed the MTS Early Classification based on PAA (MTSECP) approach, where PAA stands for the Piecewise Aggregate Approximation method [keogh2001dimensionality]. MTSECP first applies a center sequence method [li2015piecewise] to transform each MTS instance of the dataset into a UTS and then reduces the length of the transformed UTS by using PAA. Let $X$ denote an MTS with $d$ components and $Y = \{y_1, y_2, \ldots, y_T\}$ denote its corresponding transformed UTS. Mathematically, each data point of $Y$ is obtained using the center sequence method as given below

(6)

Next, MTSECP [20] represents $Y$ using PAA as $\bar{Y} = \{\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_w\}$, where $w < T$ and the $i$-th element $\bar{y}_i$ is computed as

$$\bar{y}_i = \frac{w}{T} \sum_{j = \frac{T}{w}(i-1)+1}^{\frac{T}{w} i} y_j \tag{7}$$

Finally, the PAA representations of the MTS training instances are used to compute class-wise MPLs by utilizing RNN.
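PAA simply replaces each of $w$ equal-length frames by its mean, which is what the factor $w/T$ in the standard PAA formula does. The sketch below assumes, as in the classic formulation, that the series length is divisible by $w$; the function name is hypothetical, and the center sequence step that precedes PAA in MTSECP is not reproduced here.

```python
def paa(y, w):
    """Piecewise Aggregate Approximation: the mean of each of w equal-length frames.

    Assumes len(y) is divisible by w, as in the classic formulation.
    """
    T = len(y)
    if w <= 0 or T % w != 0:
        raise ValueError("w must be a positive divisor of len(y)")
    frame = T // w
    return [sum(y[i * frame:(i + 1) * frame]) / frame for i in range(w)]
```

For example, `paa([1, 3, 2, 4, 6, 8], 3)` averages consecutive pairs and returns `[2.0, 3.0, 7.0]`, halving the length that subsequent RNN computations have to scan.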

Remarks: Learning MPLs using RNN is one of the simplest ways to achieve earliness in classification. However, Early 1-NN, ECTS, and Relaxed ECTS deal with UTS data and cannot be easily extended to MTS. MTSECP is proposed for early classification of MTS, but it works on the transformed UTS instead. This indicates that MTSECP does not utilize the correlation among the different dimensions of MTS, which helps to capture class-identifiable information from multiple dimensions together.

III-B MPL computation using posterior probabilities

Apart from RNN, some researchers have also utilized posterior probabilities for the MPL computation of time series. This group of early classification approaches computes a class discriminative MPL for each class label of the dataset. For a given training dataset, these approaches fit a probabilistic classifier in the prefix space of length $l$, where $l = 1, 2, \ldots, T$. The probabilistic classifier provides posterior class probabilities for each time series of the training dataset. The class with the highest posterior probability is taken as the prediction and used to compute the accuracy of the probabilistic classifier on the training data in the prefix space of length $l$. Finally, the class discriminative MPL for class label $c$ is set to the smallest $l$ that satisfies

$$A_c^{l} \ge \alpha \cdot A_c^{T} \tag{8}$$

where $A_c^{l}$ and $A_c^{T}$ are the training accuracies for class $c$ in the prefix space of length $l$ and in the full-length space, respectively. The parameter $\alpha$ denotes the desired level of accuracy of the early classification, with $0 < \alpha \le 1$. From the literature, we found that the Gaussian Process (GP) classifier [GP] has been the most preferred probabilistic classifier for early classification of time series [29, 34, 35, 36, 30].
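Given per-class training accuracy curves over prefix lengths, the class discriminative MPLs follow from a backward scan. In this sketch we additionally require the accuracy condition to hold for every longer prefix as well, so that a momentary accuracy spike at a short length is not selected; that extra requirement, the dictionary layout, and the function name are our own assumptions, and the accuracy values in the test are toy numbers.

```python
def class_discriminative_mpls(acc, alpha):
    """Class-wise MPLs from accuracy curves.

    acc: dict mapping class c -> list of training accuracies at prefix lengths 1..T.
    alpha: desired fraction (0 < alpha <= 1) of the full-length accuracy.
    Returns, per class, the smallest l such that acc[c][l'] >= alpha * acc[c][T]
    for every l' >= l (the "every longer prefix" part is an assumption of this sketch).
    """
    mpls = {}
    for c, curve in acc.items():
        target = alpha * curve[-1]
        T = len(curve)
        mpl = T
        for l in range(T, 0, -1):  # scan from full length toward shorter prefixes
            if curve[l - 1] < target:
                break
            mpl = l
        mpls[c] = mpl
    return mpls
```

A class whose accuracy rises quickly receives a short MPL, so at prediction time it can be announced long before slower classes become discriminable, which is the timeline illustrated in Fig. 5.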

Fig. 5 shows an example of class discriminative MPLs for five different classes along the progress of a time series. The MPL of a class is essentially a timestamp of the time series after which the class can be discriminated from the other classes of the dataset. In addition, a threshold parameter must be learned along with the MPLs to check the reliability of predictions [29].

Fig. 5: Illustration of the timeline of class discriminative MPLs for five classes.

Mori et al. [29] proposed an Early Classification framework based on DIscriminativeness and REliability (ECDIRE) of the classes over time. ECDIRE employs a GP classifier to compute the class discriminative MPLs. It also assigns thresholds to each class label to ensure the reliability of predictions. These thresholds are computed from the two highest posterior probabilities obtained by applying the GP classifier on the training dataset. Let $p_1^{l}(X)$ and $p_2^{l}(X)$ denote the first and second highest posterior probabilities for a training time series $X$ using its prefix of length $l$, respectively. The threshold $\theta_c^{l}$ for any class $c$ is then computed as

$$\theta_c^{l} = \min_{X \in D_c} \left( p_1^{l}(X) - p_2^{l}(X) \right) \tag{9}$$

where $D_c$ consists of the time series that are correctly classified into class $c$ by the GP classifier. These thresholds are used to check the reliability of the predicted class label during the classification of an incomplete time series. The authors in [29] also conducted a case study on early identification of bird species from their chirping sounds. Additionally, they analyzed the statistical significance of ECDIRE using two widely used tests from [demvsar2006statistical].
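The threshold rule, computed as the smallest gap between the top two posteriors among correctly classified training series of each class, and its use at prediction time can be sketched as follows for a single prefix length. The posteriors are assumed to come from some probabilistic classifier (a GP classifier in ECDIRE) and are passed in as plain dictionaries; the function names and the toy probabilities in the test are hypothetical.

```python
def reliability_thresholds(posteriors, labels):
    """Per-class threshold: min (p1 - p2) over training series correctly classified as c.

    posteriors: list of {label: probability} dicts for training prefixes of one length l.
    labels: true labels of those training series (at least two classes per dict).
    """
    thresholds = {}
    for probs, true in zip(posteriors, labels):
        predicted = max(probs, key=probs.get)
        if predicted != true:
            continue  # only correctly classified series contribute
        top_two = sorted(probs.values(), reverse=True)[:2]
        gap = top_two[0] - top_two[1]
        thresholds[true] = min(gap, thresholds.get(true, gap))
    return thresholds


def is_reliable(probs, thresholds):
    """Accept a prediction only if its top-two gap reaches the predicted class's threshold."""
    predicted = max(probs, key=probs.get)
    top_two = sorted(probs.values(), reverse=True)[:2]
    return predicted in thresholds and top_two[0] - top_two[1] >= thresholds[predicted]
```

A class with no correctly classified training series at this length gets no threshold, and the sketch then refuses to commit to it, which is one conservative way of handling that case.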

In [36], the authors utilized game theory for early classification of Indian rivers using time series of water quality parameters such as pH value, turbidity, and dissolved oxygen. They first formulate an optimization problem involving accuracy and earliness, and then solve it by proposing a game model. This optimization helps to compute the class-wise MPLs while maintaining the desired level of accuracy $\alpha$.

The authors in [35, 37, 43] attempted to classify an incoming MTS as early as possible with at least $\alpha$ accuracy. The main focus of these works is to handle a special type of MTS collected over a fixed period of time by sensors with different sampling rates. To classify such an incoming MTS, the proposed approaches [35, 37, 43] first estimate the class-wise MPLs for each component (i.e., time series) of the MTS separately. The approaches in [35, 37] then developed a class forwarding method to classify an incoming MTS early using the computed MPLs. In contrast, the approach in [43] proposed a divide-and-conquer based method to handle components with different sampling rates during the classification of an incoming MTS. These approaches [35, 37, 43] implicitly utilize the correlation among the components and thus maximize the earliness of class label prediction. Apart from this, the authors in [35] also employed a Hidden Markov Model (HMM) [rabiner1989tutorial] during prediction to further improve earliness. Finally, they evaluated the proposed approach for classifying the type of road surface using sensor-generated MTS data.

Gupta et al. [34] extended the concept of early classification for the MTS with faulty or unreliable components. They proposed a Fault-tolerant Early Classification of MTS (FECM) approach to classify an ongoing human activity by using its MTS of unreliable sensors. FECM first identifies the faulty components using auto regressive integrated moving average model [box2015time] whose parameters are learned from training instances. Later, these faulty components are removed from the MTS and only reliable components are used for classification. During training, the FECM employed GP classifier and -means clustering for estimating the MPLs for each component separately. An utility function is developed to optimize the tradeoff between accuracy and earliness , and is formulated as given below

(10)

The accuracy is computed using the confusion matrix obtained by applying k-means clustering to the training instances. Next, the MPL of a time series is computed as

(11)

Later, FECM employed a kernel density estimation method [di2005kernel] for estimating the class-wise MPLs, which are used for the classification of an ongoing human activity.
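The text does not specify how the kernel density estimate is turned into a class-wise MPL. As a hypothetical illustration only, the sketch below fits a Gaussian kernel density estimate with a fixed, hand-picked bandwidth to the per-instance MPLs of one class and returns the candidate prefix length with the highest estimated density. The function names and the mode-based selection rule are assumptions, not FECM's published procedure.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function evaluating a 1-D Gaussian kernel density estimate."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def class_wise_mpl(instance_mpls, max_length, bandwidth=2.0):
    """Hypothetical rule: the prefix length (1..max_length) with the highest KDE density."""
    density = gaussian_kde(instance_mpls, bandwidth)
    return max(range(1, max_length + 1), key=density)


# Per-instance MPLs of one class; the outlier at 30 barely moves the estimate.
print(class_wise_mpl([10, 11, 12, 12, 13, 30], max_length=40))  # prints 12
```

Taking the density mode rather than, say, the maximum per-instance MPL makes the estimate robust to a few late-classified outliers, at the cost of some accuracy for those instances.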

Remarks: This group of approaches has covered diverse aspects of time series, such as MTS components with different sampling rates [35, 43], faulty components [34], and the application of a game theory model for optimizing the tradeoff [36]. The approaches [35, 43, 34] also focused on utilizing the correlation that may exist among the components of MTS.

IV Shapelet based early classification

This section presents a detailed review of the approaches that use shapelets for early classification of time series. The authors of [ye2009time, ye2011time] introduced shapelets for time series classification, which motivated many researchers to use shapelets to achieve earliness in classification. Moreover, shapelets improve the interpretability of the classification results [4, 10, 14], which makes such approaches easier to adopt in real-world applications such as health informatics and industrial process monitoring. In the early classification approaches [10, 11, 13, 14, 15, 17], the authors focused on extracting a set of perfect shapelets (called key shapelets) from the given training dataset. Ideally, a perfect shapelet is powerful enough to distinguish all the time series of one class from those of the other classes. However, finding such perfect shapelets is impractical. Researchers have therefore focused on developing criteria that yield a set of effective (if not perfect) shapelets for early classification [13, 14, 19, 24].

For a given training dataset, the early classification approaches first extract all possible subsequences (segments) of the time series with different lengths and then evaluate the quality and earliness of these subsequences to obtain a set of key shapelets. To compute the distance threshold $\delta_s$ of a shapelet $s$, the distances between $s$ and each time series of the training dataset are calculated. The distance between $s$ and a time series $T$ is computed as

$d(s, T) = \min_{s' \sqsubseteq T,\; |s'| = |s|} \mathrm{dist}(s, s')$     (12)

where the symbol $\sqsubseteq$ selects a subsequence $s'$ from the set of all subsequences of $T$, and $\mathrm{dist}$ is a distance between equal-length sequences (typically the Euclidean distance). The existing approaches [10, 11, 13, 14, 15, 17] have developed different methods for computing the distance threshold of a shapelet from its distances to the training time series. These shapelets are then filtered based on their utility to obtain the most useful (key) shapelets. Finally, the key shapelets are used for early classification of an incomplete time series.

An example of early classification using shapelets is illustrated in Fig. 6, where an incoming time series $T$ is to be classified using key shapelets. The class label of a shapelet $s$ is assigned to $T$ if the distance between $s$ and $T$ is less than the pre-computed threshold $\delta_s$ of $s$. The shapelet based approaches can be further divided into two groups based on their key shapelet selection methods.

Fig. 6: Illustration of early classification of time series using shapelets.
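The distance in Eq. (12) and the matching step in Fig. 6 can be sketched as follows. This is a minimal illustration assuming univariate series, the Euclidean distance, and key shapelets given as hypothetical (shapelet, threshold, label) triples; it is not the exact procedure of any single approach reviewed here.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shapelet_distance(shapelet, series):
    """Minimum Euclidean distance between the shapelet and any equal-length subsequence."""
    m = len(shapelet)
    if len(series) < m:
        return math.inf  # the series is still too short to contain a match
    return min(euclidean(shapelet, series[i:i + m]) for i in range(len(series) - m + 1))


def classify_early(stream, key_shapelets):
    """Return (label, points used) at the first shapelet match, or (None, points seen)."""
    seen = []
    for value in stream:
        seen.append(value)
        for shapelet, threshold, label in key_shapelets:
            m = len(shapelet)
            # Earlier windows were already checked, so only the newest window can match now.
            if len(seen) >= m and euclidean(shapelet, seen[-m:]) < threshold:
                return label, len(seen)
    return None, len(seen)


keys = [([0.0, 1.0, 0.0], 0.5, "A"), ([1.0, 1.0, 1.0], 0.5, "B")]
print(classify_early([5, 5, 0, 1, 0, 7], keys))  # prints ('A', 5)
```

Checking only the newest window at each step keeps the per-point cost proportional to the total shapelet length, instead of recomputing Eq. (12) over the whole prefix.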

IV-A Key shapelet selection using utility measure

The authors of [4] were the first to address the early classification problem using shapelets. They developed an approach called Early Distinctive Shapelet Classification (EDSC), which uses local distinctive subsequences as shapelets (or features) for early classification of time series. EDSC consists of two major steps: feature extraction and feature selection. In the former step, it first finds all local distinctive subsequences in the training dataset and then computes a distance threshold for each subsequence. For threshold computation, EDSC finds a Best Match Distance (BMD) for each shapelet by computing the distances from that shapelet to each training time series. EDSC employed two methods, kernel density estimation [di2005kernel] and Chebyshev's inequality [papoulis2002probability], to compute the threshold of each shapelet based on the distribution of BMDs. Next, in the feature selection step, the authors selected key shapelets based on their utility. In EDSC, the utility of a shapelet is computed using its precision and weighted recall, as given below

(13)

The precision captures the class-distinctive ability of the shapelet on the training dataset. On the other hand, the weighted recall captures the earliness and frequency of the shapelet in the training instances. Additionally, the authors [4] discussed a heuristic technique to speed up the key shapelet selection process of EDSC.
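The two EDSC ingredients described above can be sketched under stated assumptions. The Chebyshev-based threshold assumes the construction of placing the threshold k standard deviations below the mean of the shapelet's BMDs to time series of other classes, so that, by Chebyshev's inequality, at most 1/k² of that distribution falls at or below it. The utility assumes an F-measure-style harmonic mean of precision and weighted recall, since Eq. (13) is not reproduced here. Both forms and the default k are assumptions to be checked against [4].

```python
import statistics


def chebyshev_threshold(non_target_bmds, k=3.0):
    """Distance threshold from the BMDs of a shapelet to time series of other classes.

    Chebyshev: P(|X - mu| >= k * sigma) <= 1 / k**2, so at most 1/k**2 of the
    non-target BMDs are expected at or below mu - k * sigma.
    """
    mu = statistics.mean(non_target_bmds)
    sigma = statistics.pstdev(non_target_bmds)
    return max(mu - k * sigma, 0.0)


def utility(precision, weighted_recall):
    """F-measure-style combination (assumed form of Eq. 13)."""
    if precision + weighted_recall == 0:
        return 0.0
    return 2 * precision * weighted_recall / (precision + weighted_recall)


print(chebyshev_threshold([4.0, 6.0], k=1.0))  # prints 4.0
print(utility(0.8, 0.5))
```

The harmonic mean rewards shapelets only when both precision and weighted recall are high, so a very precise but rarely (or late) matching shapelet still receives a low utility.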

As EDSC does not provide any estimate of certainty when deciding the class label of an incoming time series, Ghalwash et al. [10] presented an extension of EDSC that additionally estimates uncertainty. They named their approach Modified EDSC with Uncertainty (MEDSC-U). The uncertainty estimate indicates the confidence level with which the prediction is made; if it is below a user-defined confidence level, the decision may be delayed even after a shapelet is matched.

In [11], the authors utilized shapelets for early classification of gene expression data. A Multivariate Shapelets Detection (MSD) method is proposed to classify an incoming MTS by extracting key shapelets from the training dataset. MSD finds multivariate shapelets that span all dimensions of the MTS with the same start and end points. It computes an information gain based distance threshold for each multivariate shapelet to facilitate matching with an incoming MTS. In addition, the authors formulated a weighted information gain based utility measure to select the key shapelets and to prune unnecessary shapelets in the process.
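An information gain based distance threshold of the kind mentioned for MSD (and later for REACT and IPED) can be sketched as follows: sort the training instances by their distance to the shapelet and choose the cut that maximizes the information gain of the resulting split. This is a generic univariate sketch with hypothetical function names; MSD applies the idea to multivariate shapelets and adds its own weighting.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(distances, labels):
    """Return (threshold, gain) of the distance cut with the highest information gain.

    distances[i] is the shapelet's distance to training series i, labels[i] its class.
    Series with distance below the threshold are treated as matched by the shapelet.
    """
    pairs = sorted(zip(distances, labels))
    base = entropy(labels)
    n = len(pairs)
    best_thr, best_gain = None, -1.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut can separate equal distances
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_thr, best_gain = (pairs[i - 1][0] + pairs[i][0]) / 2.0, gain
    return best_thr, best_gain


print(best_threshold([0.1, 0.2, 0.9, 1.0], ["A", "A", "B", "B"]))
```

Placing the threshold midway between two consecutive sorted distances is a common convention; any value strictly between them yields the same split on the training data.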

Ghalwash et al. [12] pointed out two major limitations of MSD: i) shapelets must have the same start and end points in all dimensions of the MTS, and ii) it cannot handle varying response rates, a common problem in clinical data. To overcome these limitations, a hybrid Early Classification Model (ECM) is presented in [12] that combines a generative model (HMM) with a discriminative model. First, several HMM classifiers are trained over short segments of time series to learn the distribution of patterns in the training data. Next, these trained HMMs generate an array of log-likelihood values for the disjoint shapelets of a time series. This array of log-likelihoods is used as features for training a Support Vector Machine (SVM). For an incoming MTS, when the number of arrived data points equals the length of the shortest segment, the respective HMMs generate a set of log-likelihood values, which is given as input to the SVM to estimate the probability scores of the possible classes. If a probability score is higher than a confidence threshold, ECM assigns the respective class label to the MTS; otherwise, it waits for more data points. The authors also proposed an extension of ECM in [18], which aims to find the relevant segment lengths so that the HMMs can leverage the temporal dependencies in patient-specific time series.

Lin et al. [19] developed a Reliable EArly ClassificaTion (REACT) approach for MTS in which some components are categorical and others numerical. REACT first discretizes the categorical time series and then generates their shapelets along with those of the numerical time series. It employs the concept of Equivalence Classes Mining [lo2008mining] to avoid a large number of redundant shapelets while retaining distinctive ones. In addition, the authors proposed pruning techniques to further reduce redundant shapelets. REACT then uses an information gain based utility measure to select the key shapelets.

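Let $D$ be a training dataset of MTS instances belonging to $k$ different classes. REACT first defines the entropy of the dataset as $E(D) = -\sum_{i=1}^{k} \frac{|D_i|}{|D|} \log \frac{|D_i|}{|D|}$, where $|D_i|$ is the number of instances in class $i$. Let $D_s$ be the sub-dataset consisting only of those instances of $D$ in which the shapelet $s$ appears as a subsequence. In other words, $D_s$ includes a time series $T$ if $d(s, T) < \delta_s$, where $d(s, T)$ can be computed using Eq. 12 and $\delta_s$ is the distance threshold of $s$. REACT measures the utility of shapelet $s$ using the following equation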

(14)

where a weighting parameter controls the importance of information gain relative to the earliness of the shapelet. The other term is referred to as weighted support, which is similar to the weighted recall in Eq. 13.

Owing to the large number of shapelets, REACT incurs a high computational overhead. The authors therefore implemented REACT using parallel computing and executed it on a GPU-based system, which greatly improved its efficiency.

Remarks: EDSC was the first to adopt the idea of shapelets for early classification, but it deals only with UTS, and its modified version MEDSC-U is likewise limited to UTS. However, both approaches drew the attention of many researchers towards shapelet based early classification of MTS. Gene expression classification has remained a common interest among the shapelet based approaches for MTS discussed in this group.

IV-B Key shapelet selection using clustering

He et al. [13] attempted to solve the imbalanced class problem in ECG classification, where the abnormal class has far fewer training instances than the normal class. They addressed this problem in the framework of early classification of MTS and proposed an approach called Early Prediction on Imbalanced MTS (EPIMTS). First, EPIMTS extracts all possible subsequences (candidate shapelets) of different lengths from each component of the MTS separately. Unlike in MSD [11], the extracted candidate shapelets are univariate and thus need not have the same start and end points in all dimensions. These candidate shapelets are clustered using the Silhouette Index method [rousseeuw1987silhouettes]. The shapelets in each cluster are then ranked according to a Generalized Extended F-Measure (GEFM), and the highest-ranked shapelet is used to represent the respective cluster. For a shapelet, GEFM is computed as

(15)

where weight parameters control the relative importance of earliness, precision, and recall. In EPIMTS, GEFM worked well for measuring the quality of shapelets. Finally, key (core) shapelets are selected from each component of the MTS based on the ranking of the obtained clusters.

In [14], the authors proposed an approach called Interpretable Patterns for Early Diagnosis (IPED) for studying viral infection in humans using their gene expression data. Similar to MSD, IPED extracts multivariate candidate shapelets from the training MTS, but it allows a multivariate shapelet to have different start and end points in different dimensions. IPED computes an information gain based distance threshold for each shapelet. The authors formulated an optimization problem to find the relevant components of the MTS, and the key shapelets are selected from these components only. For each relevant component, the candidate shapelets are clustered into as many groups as there are classes in the dataset, and then only one key shapelet is selected from each group. IPED finds such key shapelets by optimizing a logistic loss over the training instances.

One of the major drawbacks of the MSD [11], EPIMTS [13], and IPED [14] approaches is that they do not incorporate the correlation among the shapelets of different components of the MTS during classification. Such correlation helps to improve the interpretability of the shapelets. To overcome this drawback, the authors in [15, 17] developed an approach called Mining Core Features for Early Classification (MCFEC), where core features are the key shapelets. MCFEC first obtains candidate shapelets from each component independently and then discovers the correlation among the shapelets of different components to enhance their interpretability. The key shapelets are then selected using the Silhouette Index method [rousseeuw1987silhouettes] based on their ranking computed using GEFM (given in Eq. 15). To classify an incomplete MTS, MCFEC employed a Query By Committee (QBC) [seung1992query] strategy, where a class label is first predicted for each component of the MTS and the class label predicted by the majority is assigned to the incomplete MTS.
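The QBC step in MCFEC, where each component votes and the majority label wins, reduces to the small function below. Treating `None` as a component that has not yet produced a confident label, using a plurality rather than a strict majority, and breaking ties by first appearance are assumptions of this sketch.

```python
from collections import Counter


def qbc_vote(component_predictions):
    """Majority (plurality) vote over per-component labels.

    None marks a component that has not produced a label yet; ties go to the
    label encountered first, because Counter.most_common keeps insertion order
    for equal counts.
    """
    votes = Counter(p for p in component_predictions if p is not None)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


print(qbc_vote(["walk", "run", "walk", None]))  # prints walk
```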

The authors in [38] presented a Confident Early Classification framework for MTS with interpretable Rules (CECMR), where key shapelets are extracted using the concepts of local extremum and turning points. A point $x_i$ of a time series $T = (x_1, x_2, \dots, x_L)$ is a local extremum if

$x_{i-1} < x_i > x_{i+1}$

or

$x_{i-1} > x_i < x_{i+1}$

where $1 < i < L$. Next, a point of the time series is a turning point if the following condition holds

CECMR first discovers interpretable rules from the sets of candidate shapelets and then estimates the confidence of each rule to select the key shapelets. The correlation among the components of the MTS is also incorporated in CECMR.
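Detecting local extremum points, assuming the strict-inequality definition (a point larger than both neighbours or smaller than both), is straightforward. The sketch below returns their indices; CECMR's turning-point condition is not reproduced here, and the function name is hypothetical.

```python
def local_extrema(series):
    """0-based indices i (0 < i < len - 1) that are strict local maxima or minima."""
    return [
        i
        for i in range(1, len(series) - 1)
        if (series[i - 1] < series[i] > series[i + 1])
        or (series[i - 1] > series[i] < series[i + 1])
    ]


print(local_extrema([0, 2, 1, 1, 3, 0]))  # prints [1, 4]
```

Note that the flat stretch `1, 1` produces no extremum under strict inequalities; a definition using non-strict inequalities would behave differently on plateaus.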

Recently, shapelets have been adopted for estimating an appropriate time to transfer a patient to the Intensive Care Unit (ICU) using MTS data of physiological signs [24]. The authors in [24] proposed a Multivariate Early Shapelet (MEShapelet) approach to estimate this time. In this setting, the measurements of different physiological signs are not recorded at the same intervals, which produces an asynchronous MTS whose components may have different lengths. MEShapelet first extracts candidate asynchronous multivariate shapelets and then computes a time tolerance threshold for each extracted candidate shapelet. This threshold limits the deviation among the dimensions of a shapelet. The key shapelets are then selected using clustering and used to construct two information gain based classifiers: a decision tree and a random forest. The authors collected ICU data of 2127 patients to examine the effectiveness of MEShapelet.

Remarks: Clustering candidate shapelets has proven effective for selecting more distinctive key shapelets than selection by a utility measure. However, much of the credit goes to GEFM, which ranks the shapelets based on their distinctiveness, earliness, and frequency. The approaches [17, 15, 38] utilize the correlation among the components of the shapelets, which considerably improved their earliness and interpretability. Notably, MEShapelet [24] is able to classify even the asynchronous MTS of patients' ICU data.

V Model based early classification

This section discusses model based early classification approaches for time series data. Unlike prefix based or shapelet based approaches, model based approaches [6, 9, 7, 27, 30] formulate a mathematical model to optimize the tradeoff between earliness and reliability of prediction. Most of these approaches aim to design a decision or stopping rule using conditional probabilities. These conditional probabilities are either generated by generative classifiers or computed by fitting a discriminative classifier on the training dataset. Further, some approaches [16, 21, 23, 26] do not incorporate a reliability parameter in the model but still provide significant earliness.

Mathematically, generative classifiers estimate the joint probability distribution $P(X, Y)$ from the given labeled instances, where $X$ and $Y$ denote the time series and its label, respectively. These classifiers use Bayes' rule to make predictions by computing the conditional probability $P(Y \mid X)$, and thus they learn the distribution of each class [ng2002discriminative]. During classification, a testing instance is first modeled using the learned distribution and then classified by comparing its model with the models of the training instances. On the other hand, discriminative classifiers calculate the conditional probability $P(Y \mid X)$ directly by mapping the input instances to their labels. These classifiers attempt to learn a decision boundary between classes [ng2002discriminative], which is then used to classify the testing instance. We divide the model based approaches into the following two groups based on the type of classifier adopted.

V-A Using discriminative classifier

In [8], the authors developed an ensemble model based early classification approach to recognize a type of gas using an incomplete 8-dimensional time series generated by a sensor-based electronic nose. The ensemble model consists of a set of classifiers with a reject option, which allows them to express doubt about the reliability of the predicted class label. Each probabilistic classifier assigns a class label to an incomplete time series using the posterior class probabilities, where

(16)

If is close to then the classifier can choose the reject option to express its doubt about the class label. A threshold is used to decide whether the classifier should choose the reject option.

The classifiers are arranged serially along the progress of the time series so that a prediction can be made from a small portion of data points. If sufficient data points have not arrived, the prediction is carried out again by the next classifier when another portion of data points arrives. This process is repeated until a majority of the classifiers are confident enough about the predicted class label. The decision to choose the reject option also depends on the cost of data collection time. Another work [44] focused on minimizing response time to achieve earliness in classification. This work developed an empirical risk function that minimizes the risk associated with early prediction and thus optimizes the response time and earliness with confidence.
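The reject option and the serial arrangement described above can be illustrated with a Chow-style rule (reject when the highest posterior is below a threshold) and a loop that stops once more than half of the classifiers applied so far agree confidently on one label. The thresholding rule, the majority criterion, and the input format (one dictionary of posteriors per portion of data) are assumptions for illustration; [8] may define the reject condition differently.

```python
def predict_with_reject(posteriors, threshold):
    """Chow-style reject option: the most probable label, or None if not confident enough."""
    label = max(posteriors, key=posteriors.get)
    return label if posteriors[label] >= threshold else None


def serial_ensemble(posterior_sequence, threshold):
    """Apply one classifier per newly arrived portion of data.

    Stops as soon as more than half of the classifiers applied so far agree
    confidently on the same label; returns (label, number of portions used).
    """
    decisions = []
    for step, posteriors in enumerate(posterior_sequence, start=1):
        decisions.append(predict_with_reject(posteriors, threshold))
        for label in set(d for d in decisions if d is not None):
            if decisions.count(label) > len(decisions) / 2:
                return label, step
    return None, len(decisions)  # never confident enough


portions = [{"A": 0.5, "B": 0.5}, {"A": 0.8, "B": 0.2}, {"A": 0.9, "B": 0.1}]
print(serial_ensemble(portions, threshold=0.7))  # prints ('A', 3)
```

Raising the threshold trades earliness for reliability: more portions are rejected, so a confident majority forms later.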

Dachraoui et al. [25] proposed a non-myopic early classification approach, where non-myopic means that at each time step the classifier estimates the optimal future time at which a reliable prediction can be made. For an incomplete time series $x_t$ with $t$ data points, the optimal time is calculated as

$\tau^* = \arg\min_{\tau \in \{0, 1, \dots, T - t\}} f_\tau(x_t)$     (17)

where $T$ is the length of a complete time series and the function $f_\tau(x_t)$ estimates the expected cost for the future time step $t + \tau$. The expected cost function is formulated as

(18)

where the clusters are obtained by clustering the training instances, and the class labels range over the set of all class labels of the dataset. One term computes the membership probability of the incomplete time series in a cluster, and another estimates the posterior probabilities of the classes using the training data. The remaining terms represent the cost of misclassification and the expected cost of classification that may be incurred after the additional waiting steps, respectively. The cost function in Eq. 18 works as a trigger function that decides whether sufficient data has arrived to make a reliable prediction. If the optimal time is the current time step (i.e., waiting cannot reduce the expected cost), the classifier is allowed to predict the class label. As the cost function also uses the data points that have already arrived to estimate the optimal time, the proposed approach is also adaptive.
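The non-myopic trigger of Eq. (17) can be sketched given estimates of the expected cost for every remaining waiting time. The toy cost model below (an exponentially decaying misclassification cost plus a linear delay cost) is purely hypothetical and stands in for the cluster-based estimate of Eq. (18); only the argmin-and-trigger logic follows the description above.

```python
import math


def optimal_wait(expected_costs):
    """tau* = argmin over tau = 0..T-t of f_tau(x_t); ties go to the earliest tau."""
    return min(range(len(expected_costs)), key=lambda tau: expected_costs[tau])


def should_predict_now(expected_costs):
    # Predict now only if waiting cannot lower the expected cost.
    return optimal_wait(expected_costs) == 0


def toy_expected_costs(t, total_length, delay_cost=0.01):
    """Hypothetical f_tau(x_t): misclassification cost decays, delay cost grows."""
    return [
        0.5 * math.exp(-(t + tau) / 10.0) + delay_cost * (t + tau)
        for tau in range(total_length - t + 1)
    ]


print(optimal_wait(toy_expected_costs(5, 50)))         # prints 11
print(should_predict_now(toy_expected_costs(20, 50)))  # prints True
```

At t = 5 the classifier defers (its estimated best time is step 16), while at t = 20 any further waiting only adds delay cost, so it predicts immediately. This re-evaluation at every step is what makes the approach adaptive.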

The authors in [28] pointed out two weaknesses of [25]: i) the assumption of low intra-cluster variability, which is impractical when obtaining membership probabilities through clustering, and ii) clustering is carried out on complete time series, which may affect the estimation of the optimal time. In [28], two algorithms (NoCluster and 2Step) are introduced to overcome these weaknesses while preserving the adaptive and non-myopic properties. The expected cost function in the NoCluster algorithm is given as

(19)

where and is the distance between and a training time series . Next, the expected cost is computed on a per-time-series basis and is expressed as

(20)

where the class probability term can be obtained by applying a probabilistic classifier to the training data. In the 2Step algorithm, the authors build a set of classifiers for achieving earliness and a set of regressors for maintaining the non-myopic property.

Mori et al. [27] proposed the EarlyOpt framework, where a separate probabilistic classifier is constructed for each time step of the time series. They formulated a stopping rule using the two highest posterior probabilities obtained from the classifiers. The main objective of EarlyOpt is to minimize the cost of prediction while satisfying the stopping rule. EarlyOpt employed two widely used discriminative classifiers, GP and SVM. In another work [30], the authors developed two different stopping rules using the class-wise posterior probabilities. These stopping rules include real-valued parameters that are optimized using genetic algorithms [holland1992adaptation]. In addition to the stopping rules, the authors developed three cost functions, based on two different norms, to optimize the accuracy and earliness of the classification.
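A stopping rule built from the two highest posterior probabilities can take a simple linear form, sketched below: stop when γ1·p1 + γ2·(p1 − p2) + γ3·(t/L) > 0, where p1 and p2 are the two highest posteriors at step t and L is the full length. This form and the parameter values are assumptions for illustration; in [30] the real-valued parameters of the stopping rules are tuned with a genetic algorithm rather than set by hand.

```python
def stop_rule(p1, p2, t, length, gammas):
    """Stop when g1*p1 + g2*(p1 - p2) + g3*(t / length) > 0 (assumed linear form)."""
    g1, g2, g3 = gammas
    return g1 * p1 + g2 * (p1 - p2) + g3 * (t / length) > 0


def first_stop(posterior_sequence, gammas):
    """First time step at which the rule fires; the full length if it never does."""
    length = len(posterior_sequence)
    for t, probs in enumerate(posterior_sequence, start=1):
        p1, p2 = sorted(probs, reverse=True)[:2]  # two highest posteriors at step t
        if stop_rule(p1, p2, t, length, gammas):
            return t
    return length


# Hypothetical gammas: stop once the top-two margin exceeds half of p1.
posteriors = [[0.5, 0.5], [0.6, 0.4], [0.9, 0.1]]
print(first_stop(posteriors, gammas=(-1.0, 2.0, 0.0)))  # prints 3
```

A positive γ3 would make the rule increasingly eager to stop as time passes, which is one way such rules encode the earliness side of the tradeoff.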

The authors in [33] introduced a two-tier early classification approach (TEASER) based on a master-slave paradigm. In the first tier, a slave classifier computes the posterior probabilities of each class label of the dataset and then constructs a feature vector for each training time series from the set of posterior probabilities obtained for that series, as follows

(21)

where the feature vector includes the most probable class label and the difference between the highest and second-highest posterior probabilities. The feature vector is passed to a master classifier. In the second tier, the authors employed a one-class classifier (e.g., one-class SVM [scholkopf2001estimating]) as the master classifier to check the reliability of the probable class label.
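Following the description above, the slave's output for one (possibly truncated) time series can be turned into the master's feature vector as below. The layout (posteriors, then the index of the most probable class, then the top-two margin) and the function name are assumptions, since Eq. (21) is not reproduced here; the resulting vectors would then be used to fit a one-class classifier such as scikit-learn's `OneClassSVM`.

```python
def teaser_features(posteriors):
    """Master-classifier input: posteriors, most probable class index, top-two margin."""
    ranked = sorted(posteriors, reverse=True)
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    margin = ranked[0] - ranked[1]  # small margin means the slave is unsure
    return list(posteriors) + [best, margin]


print(teaser_features([0.2, 0.7, 0.1]))
```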

Remarks: In this group, two interesting approaches [25, 28] address the early classification problem with an additional property known as non-myopic. However, the computational complexity of these approaches during classification is very high.

V-B Using generative classifier

The authors in [6, 9] formulated a decision rule to classify an incomplete test time series with some pre-defined reliability. They employed Gaussian Mixture Model estimation and joint Gaussian estimation to estimate the distribution of an incomplete time series by modeling the complete time series of the training dataset as random variables. Two classifiers, a linear SVM and Quadratic Discriminant Analysis (QDA) [srivastava2007bayesian], were combined with the formulated decision rule to provide a desired level of reliability (or accuracy) in early classification. The authors in [7] also employed QDA to classify an incomplete time series with a desired level of reliability. The proposed approach (called Early QDA) assumes that the training time series follow a Gaussian distribution, which makes it easy to estimate the parameters (i.e., mean and covariance) from training instances.
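Why a Gaussian assumption makes early classification convenient can be seen in a small sketch: the marginal of a multivariate Gaussian over the first t time steps uses only the first t entries of the mean and the leading t × t block of the covariance, so a prefix can be scored without any imputation. For brevity, the sketch assumes a diagonal covariance (which turns QDA into Gaussian naive Bayes) and uses hypothetical function names; Early QDA itself works with full covariance matrices and a reliability criterion not shown here.

```python
import math


def fit_diagonal_gaussian(series_list, min_var=1e-6):
    """Per-time-step mean and variance of one class (diagonal covariance assumed)."""
    n = len(series_list)
    length = len(series_list[0])
    means = [sum(s[j] for s in series_list) / n for j in range(length)]
    variances = [
        max(sum((s[j] - means[j]) ** 2 for s in series_list) / n, min_var)
        for j in range(length)
    ]
    return means, variances


def prefix_log_likelihood(prefix, means, variances):
    # zip stops at len(prefix): the marginal over the first t steps only
    # needs the first t means and variances.
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(prefix, means, variances)
    )


def classify_prefix(prefix, models, priors):
    """Bayes' rule: argmax_c log P(c) + log p(prefix | c)."""
    scores = {
        c: math.log(priors[c]) + prefix_log_likelihood(prefix, *models[c])
        for c in models
    }
    return max(scores, key=scores.get)


train = {
    "A": [[0.0, 0.0, 1.0], [0.2, 0.2, 1.2]],
    "B": [[1.0, 1.0, 0.0], [1.2, 1.2, 0.2]],
}
models = {c: fit_diagonal_gaussian(xs) for c, xs in train.items()}
priors = {"A": 0.5, "B": 0.5}
print(classify_prefix([0.1, 0.1], models, priors))  # prints A
```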

Antonucci et al. [26] developed a generative model based approach for early recognition of Japanese vowel speakers using their speech time series data. The approach employed an imprecise HMM (iHMM) [antonucci2015robust] to compute likelihood intervals of the incoming time series with respect to the training instances. It uses the expectation-maximization algorithm to infer the parameters without observing the state variables. For reliable prediction, a class label is assigned to the incoming time series only if the ratio of the two highest likelihoods is greater than a predefined threshold.

Li et al. [16] employed a stochastic process model, the point process, to capture the temporal dynamics of different components of MTS. They proposed a Multilevel Discretized Marked Point Process (MD-MPP) approach for early classification of MTS. MD-MPP models the temporal dynamics of each component independently and then computes sequential cues to capture the temporal order of events occurring across components over time. The authors also incorporated the correlation among the components of MTS using a variable-order Markov model. Another point process based approach, called Dynamic Marked Point Process with Prediction by Partial Matching (DMP+PPM), is presented in [21]. DMP+PPM captures the temporal dynamics of three-dimensional observations of human actions. It also incorporates temporal dependencies among different human joints during classification of an ongoing action.

Finally, the authors in [23] proposed a complex activity recognition framework for mobile platforms, named Simultaneous complex activities Recognition and Action sequence Discovering (SimRAD). It incorporates two probabilistic models, one for action sequences and another for complex activities. The probabilistic models estimate distribution parameters from the training data and use them to infer the class label of an incomplete MTS corresponding to an ongoing activity.

Remarks: Early classification approaches based on generative classifiers are more complicated than those based on discriminative classifiers. Moreover, these approaches depend heavily on estimating the data distribution by fitting stochastic processes, which makes them difficult to understand and degrades the interpretability of their results.

VI Miscellaneous approaches

This section covers the early classification approaches that do not meet the inclusion criteria of the other categories. A primary objective of every early classification approach is to build a classifier that provides earliness while maintaining a desired level of reliability or accuracy. However, some approaches [3, 39, 22] can classify an incomplete time series without ensuring reliability; in other words, they do not attempt to optimize the tradeoff between accuracy and earliness.

VI-A With tradeoff

The authors in [31] introduced a reinforcement learning based early classification framework using a Deep Q-Network (DQN) [mnih2015human] agent. The framework uses a reward function to balance accuracy and earliness. It also defines a suitable set of states and actions for the observations of the training time series. During training, the DQN agent learns an optimal decision-making strategy, which it uses during testing to pick a suitable action after receiving each observation of the incoming time series.

In another work [32], the authors developed a deep neural network based early classification framework that optimizes the tradeoff by estimating stopping decision probabilities at all time stamps of the time series. The authors formulated a new loss function to compute the loss of the classifier if a class label is predicted for an incomplete time series with $t$ data points. The loss at time $t$ is given as

(22)

where a tradeoff parameter controls the weighting of the classification loss and the earliness loss. The authors implemented the framework using a set of Long Short-Term Memory (LSTM) layers and a single convolutional layer together with the new loss function.
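One way to read the tradeoff in Eq. (22) is as a convex combination of a classification loss and an earliness loss at time t. The sketch below assumes cross-entropy for the classification term and, purely as an illustrative choice, an earliness term that rewards assigning high probability to the true class while many time steps remain; the function name is hypothetical and the exact loss in [32] may differ.

```python
import math


def tradeoff_loss(probs, true_label, t, length, alpha):
    """alpha * classification loss + (1 - alpha) * earliness loss at time t.

    probs: predicted class probabilities at time t (dict label -> probability).
    The earliness term is an illustrative assumption: it is most negative
    (best) when the true class gets high probability while many steps remain.
    """
    classification = -math.log(probs[true_label])  # cross-entropy for one sample
    earliness = -probs[true_label] * (1.0 - t / length)
    return alpha * classification + (1.0 - alpha) * earliness


print(tradeoff_loss({"A": 0.8, "B": 0.2}, "A", t=5, length=10, alpha=0.5))
```

With alpha = 1 the loss reduces to plain cross-entropy and earliness is ignored; smaller alpha values push the model to commit confidently at earlier time steps.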

Remarks: Recently, the approaches [31, 32] have successfully employed reinforcement learning and deep learning techniques for early classification, opening a new direction for further research.

Paper | Abbreviated name of the approach | Classifier used | Datasets for experimental evaluation | Type of time series | Category
--- | --- | --- | --- | --- | ---
[1] | SCR and GSDT | Rule based and decision tree | ECG [UCR], synthetic control [UCI], DNA sequence [UCI] | UTS | Prefix based early classification
[2] | ECTS | 1-NN | 7 UCR datasets | UTS | Prefix based early classification
[5] | Relaxed ECTS | 1-NN | 7 UCR datasets | UTS | Prefix based early classification
[29] | ECDIRE | GP classifier | 45 UCR datasets | UTS | Prefix based early classification
[36] | - | GP classifier | River dataset [riverd] | UTS | Prefix based early classification
[20] | MTSECP | 1-NN | Wafer and ECG [waferandecg], character trajectories [UCI], robot execution failures [UCI] | MTS | Prefix based early classification
[37] | - | GP classifier | Daily and sports activities [UCI] | MTS | Prefix based early classification
[34] | FECM | GP classifier | Human activity classification (collected), NTU RGB+D [ntu], daily and sports activities [UCI], heterogeneity human activity recognition [UCI] | MTS | Prefix based early classification
[35] | - | GP and HMM classifiers | Road surface classification (collected), PEMS-SF [UCI], heterogeneity human activity recognition [UCI], gas mixtures detection [UCI] | MTS | Prefix based early classification
[43] | - | GP classifier | Hydraulic system monitoring [UCI], PEMS-SF [UCI], daily and sports activities [UCI] | MTS | Prefix based early classification
[4] | EDSC | Closest shapelet using ED | 7 UCR datasets | UTS | Shapelet based early classification
[10] | MEDSC-U | Closest shapelet using ED | 20 UCR datasets | UTS | Shapelet based early classification
[11] | MSD | Closest multivariate shapelet using ED | 8 gene expression datasets [zaas2009gene, baranzini2004transcription] | MTS | Shapelet based early classification
[12, 18] | ECM | Hybrid model using HMM and SVM | 5 gene expression datasets [baranzini2004transcription] | MTS | Shapelet based early classification
[13] | EPIMTS | Closest multivariate shapelet using ED | Wafer and ECG [waferandecg], 2 synthetic datasets | MTS | Shapelet based early classification
[14] | IPED | Closest multivariate shapelet using ED | 2 gene expression datasets [zaas2009gene], ECG [goldberger2000physiobank] | MTS | Shapelet based early classification
[15, 17] | MCFEC | MCFEC-rule and MCFEC-QBC classifiers | Wafer and ECG [waferandecg], 2 synthetic datasets | MTS | Shapelet based early classification
[19] | REACT | Decision tree | Gene expression dataset [baranzini2004transcription], Wafer and ECG [waferandecg], robot execution failures [UCI] | MTS | Shapelet based early classification
[38] | CECMR | Closest multivariate shapelet using ED | Wafer and ECG [waferandecg], 5 UCI datasets | MTS | Shapelet based early classification
[24] | MEShapelet | Decision tree and random forest | ICU data of 2127 patients (collected) | MTS | Shapelet based early classification
[7] | Early QDA | QDA | 1 synthetic and 4 UCR datasets | UTS | Model based early classification
[6, 9] | - | Linear SVM and local QDA | 15 UCR datasets | UTS | Model based early classification
[25] | - | Naive Bayes and multilayer perceptron | TwoLeadECG [UCR] | UTS | Model based early classification
[26] | - | HMM and iHMM | Japanese vowel speaker [UCI] | UTS | Model based early classification
[27] | EarlyOpt | GP and SVM | 45 UCR datasets | UTS | Model based early classification
[44] | - | Linear SVM | CBF [UCR], control charts [UCI], character trajectories [UCI], localization data for person activity [UCI] (after preprocessing) | UTS | Model based early classification
[28] | NoCluster, 2Step | SVM | 76 UCR datasets | UTS | Model based early classification
[30] | - | GP and SVM | 45 UCR datasets | UTS | Model based early classification
[33] | TEASER | Two-tier classifier using variants of SVM, DTW | 45 UCR datasets, PLAID [gao2014plaid], ACS-F1 [gisler2013appliance] | UTS | Model based early classification
[8] | - | SVM | Gas dataset (collected) | MTS | Model based early classification
[16] | MD-MPP | Stochastic process | Auslan [UCI], PEMS-SF [UCI], motion capture [mocap] | MTS | Model based early classification
[21] | DMP+PPM | Stochastic process | Motion capture [mocap], NTU RGB+D [ntu], UT Kinect-Action [xia2012view] | MTS | Model based early classification
[23] | SimRAD | Statistical model with deep neural network | Complex activities dataset (collected) | MTS | Model based early classification
[3] | - | AdaBoost ensemble classifier | CBF [UCR], control charts [UCI], trace [UCR], Auslan [UCI] | UTS | Miscellaneous approaches
[31] | DQN | Reinforcement learning agent | 3 UCR datasets | UTS | Miscellaneous approaches
[32] | - | Combination of LSTM and CNN | 46 UCR datasets | UTS | Miscellaneous approaches
[39] | CBR | 1-NN with ED and DTW | Simulated plant faults dataset (collected) | MTS | Miscellaneous approaches
[22] | MDDNN | Combination of CNN and LSTM | Wafer and ECG [waferandecg], Auslan [UCI] | MTS | Miscellaneous approaches

TABLE II: Summary of the early classification approaches for time series.

VI-B Without tradeoff

One of the first works to mention early classification of time series is [3]. Although this work aimed to classify an incomplete time series, it does not attempt to optimize the tradeoff between reliability and earliness. The authors in [3] divide the time series into intervals and treat each interval as a predicate. To achieve earliness, they use only the available predicates in the classification and ignore the unavailable ones.

The authors in [39] applied a Case-Based Reasoning (CBR) method for early classification of faults in a simulated dynamic system. CBR employed a k-NN classifier to classify a fault using an incomplete time series. The simulation studies showed that the CBR method achieved significant earliness, but without ensuring a desired level of reliability or accuracy.
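The core of a nearest-neighbour early classifier such as the CBR method can be sketched as comparing the observed prefix with the same-length prefix of every stored case. This minimal version assumes 1-NN, the Euclidean distance (via `math.dist`, available from Python 3.8), equally sampled series, and a hypothetical function name; the DTW variant listed in Table II and any case-adaptation step are omitted.

```python
import math


def nn_classify_prefix(prefix, cases):
    """1-NN on the observed prefix against the same-length prefix of each stored case.

    cases: list of (complete_series, label); each series must be at least as
    long as the prefix. Ties keep the first case found.
    """
    t = len(prefix)
    best_label, best_dist = None, math.inf
    for series, label in cases:
        d = math.dist(prefix, series[:t])  # Euclidean distance, Python 3.8+
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


cases = [([0.0, 0.0, 5.0, 5.0], "fault1"), ([0.0, 3.0, 3.0, 3.0], "fault2")]
print(nn_classify_prefix([0.0, 2.5], cases))  # prints fault2
```

Nothing in this rule checks whether the prefix is long enough to be trustworthy, which is exactly the missing reliability guarantee noted above: with only the first point observed, both cases tie and the first one wins by default.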

Recently, Huang et al. [22] proposed a Multi-Domain Deep Neural Network (MDDNN) based early classification framework for MTS. MDDNN employs two widely used deep learning techniques: the Convolutional Neural Network (CNN) [krizhevsky2012imagenet] and LSTM [hochreiter1997long]. It first truncates the training MTS at a fixed time step and feeds the truncated MTS to a CNN layer, which is followed by another CNN layer and an LSTM layer. Frequency domain features are also computed from the truncated MTS and fed to a similar CNN-CNN-LSTM stack. The output features of both stacks are passed to a fully connected layer, and a softmax function applied to its output yields the class assignment probabilities for the given incomplete MTS.
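The two inputs of MDDNN can be sketched as follows. This minimal sketch, written in pure Python to stay self-contained, prepares only the time-domain input (the MTS truncated at time step t) and a frequency-domain input computed from the same truncated MTS. We assume the frequency-domain features are the DFT magnitudes of each truncated component, which may differ from the exact transform used in [22]. The CNN-CNN-LSTM stacks are omitted, and `softmax` only shows how the outputs of the fully connected layer become class assignment probabilities.

```python
import cmath
import math
from typing import List

Matrix = List[List[float]]  # one list of values per MTS component


def truncate(mts: Matrix, t: int) -> Matrix:
    """Keep the first t time steps of every component (time-domain input)."""
    return [component[:t] for component in mts]


def dft_magnitude(x: List[float]) -> List[float]:
    """Magnitudes of the first len(x)//2 + 1 DFT coefficients of a real signal."""
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        coef = sum(x[s] * cmath.exp(-2j * math.pi * k * s / n) for s in range(n))
        mags.append(abs(coef))
    return mags


def frequency_input(mts: Matrix, t: int) -> Matrix:
    """Frequency-domain input computed from the same truncated MTS."""
    return [dft_magnitude(component) for component in truncate(mts, t)]


def softmax(logits: List[float]) -> List[float]:
    """Class assignment probabilities from the fully connected layer's outputs."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Both inputs are derived from the same truncated MTS, so the frequency branch never sees data beyond time step t.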

Remarks: The approaches in [3, 39] are relatively old and did not focus on optimizing the tradeoff, which is a primary objective of early classification. Nevertheless, these approaches laid the foundation for the concept of early classification. We also found a recent work [22] that does not optimize the tradeoff but exploits deep learning models to achieve an adequate level of earliness.
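To make the tradeoff concrete, the sketch below computes the two quantities listed in the nomenclature for a set of test predictions: accuracy, and earliness taken as the average fraction t/L of each time series observed before the prediction was made (lower is earlier). Defining earliness this way, and summarizing the tradeoff with the harmonic mean of accuracy and (1 - earliness), are common conventions that we assume here; individual papers may define their own measures. The `Prediction` record is hypothetical.

```python
from typing import List, NamedTuple


class Prediction(NamedTuple):
    predicted: str
    actual: str
    t: int   # number of data points observed when the prediction was made
    L: int   # length of the complete time series


def accuracy(preds: List[Prediction]) -> float:
    return sum(p.predicted == p.actual for p in preds) / len(preds)


def earliness(preds: List[Prediction]) -> float:
    """Mean fraction t/L of the series consumed before deciding (0 = earliest)."""
    return sum(p.t / p.L for p in preds) / len(preds)


def harmonic_mean(preds: List[Prediction]) -> float:
    """One common single-number summary of the accuracy/earliness tradeoff."""
    acc, timeliness = accuracy(preds), 1.0 - earliness(preds)
    if acc + timeliness == 0:
        return 0.0
    return 2 * acc * timeliness / (acc + timeliness)
```

Under this summary, a classifier that always waits for the complete series has earliness 1 and a harmonic mean of 0 regardless of its accuracy, while one that decides very early but is mostly wrong is penalized through its accuracy.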

VII Discussion and Future Research Directions

With the presented categorization of early classification approaches, one can quickly understand the notable contributions that have been made over the years. After reviewing the literature, we found that most early classification approaches appeared after ECTS [2]. Although some approaches (e.g., [3, 39]) addressed earliness long before ECTS, they did not maintain a desired level of accuracy or reliability of the class prediction, which is a primary criterion of a true early classifier. We therefore included such approaches in the without tradeoff group of the miscellaneous category. Table II summarizes all the categorized approaches to give a quick overview of the various categories and the papers they include. The table also lists the employed classifiers and the datasets used for experimental evaluation, and it makes it easy to separate the approaches by the type of time series (i.e., UTS or MTS). We make the following observations after a thorough review of the early classification approaches:

  • Prefix based approaches are easy to understand and have provided satisfactory results on various UCR and UCI datasets. In these approaches, the 1-NN classifier for UTS and the GP classifier for MTS have been common choices for learning MPLs.

  • The majority of shapelet based approaches have focused on early classification of gene expression data (i.e., MTS) and are thus suitable for medical applications. As doctors may be reluctant to adopt an approach without interpretable results, the primary objective of these approaches was to obtain key shapelets that can exclusively represent all the time series of one class. Such shapelets are easy to interpret because they can be linked to the patient’s disease.

  • Model based early classification approaches are difficult to understand because they involve complex statistical methods for developing the stopping rule. From Table II, we observe that SVM has been widely employed to obtain the conditional probabilities used to evaluate the stopping rule or trigger function.

  • Recently, the researchers in [32, 22] have shown an interest in deep learning models for early classification of time series and have achieved promising results for both UTS and MTS.
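As a simplified illustration of how a 1-NN classifier can be used to learn an MPL, the sketch below finds the smallest prefix length l such that, for every longer prefix, leave-one-out 1-NN with ED gives each training series the same label it receives on the full-length series. This dataset-level criterion is our own simplification; ECTS [2] and its successors learn MPLs per training instance (for example, via reverse nearest neighbors), which this sketch does not reproduce. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def loo_1nn_labels(data: List[Sequence[float]], labels: List[str], l: int) -> List[str]:
    """Leave-one-out 1-NN label of every training series using prefixes of length l."""
    out = []
    for i, x in enumerate(data):
        best, best_d = None, float("inf")
        for j, y in enumerate(data):
            if i == j:
                continue  # leave the series itself out
            d = euclidean(x[:l], y[:l])
            if d < best_d:
                best, best_d = labels[j], d
        out.append(best)
    return out


def dataset_mpl(data: List[Sequence[float]], labels: List[str]) -> int:
    """Smallest l from which LOO 1-NN predictions match the full-length ones."""
    L = len(data[0])
    full = loo_1nn_labels(data, labels, L)
    mpl = L
    # Walk backwards from L - 1 and stop at the first length whose predictions differ,
    # so every prefix length >= mpl reproduces the full-length predictions.
    for l in range(L - 1, 0, -1):
        if loo_1nn_labels(data, labels, l) != full:
            break
        mpl = l
    return mpl
```

At test time, a prefix-based classifier of this simplified kind would wait until the incoming series reaches the learned MPL and then apply 1-NN to the available prefix.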

Challenges: Despite the promising results of prefix based and model based approaches, end users may not prefer these approaches for medical applications due to the lack of interpretability in their classification results. Moreover, model based approaches are sophisticated and thus can only be used as black boxes. On the other hand, shapelet based approaches are well suited to medical applications but are computationally expensive, because a huge number of candidate shapelets must be extracted from the training instances.
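The computational burden of shapelet based approaches follows from the size of the candidate set. The sketch below counts every subsequence, with length between a minimum and a maximum, that can be extracted from N training series of length L, and computes the best match distance (BMD) of one candidate to a time series; a candidate covers a series when its BMD is within the shapelet's distance threshold. In a full method, the BMD has to be evaluated against every training instance for every candidate. The length bounds, the use of unnormalized Euclidean distance, and the function names are assumptions for illustration.

```python
import math
from typing import Sequence


def num_candidates(n_series: int, length: int, min_len: int, max_len: int) -> int:
    """Count all subsequences of lengths min_len..max_len from n_series series."""
    per_series = sum(length - l + 1 for l in range(min_len, max_len + 1))
    return n_series * per_series


def best_match_distance(shapelet: Sequence[float], series: Sequence[float]) -> float:
    """Smallest Euclidean distance between the shapelet and any equal-length window."""
    k = len(shapelet)
    best = float("inf")
    for start in range(len(series) - k + 1):
        window = series[start:start + k]
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(shapelet, window)))
        best = min(best, d)
    return best


def matches(shapelet: Sequence[float], threshold: float, series: Sequence[float]) -> bool:
    """A shapelet covers a series when its BMD falls within the distance threshold."""
    return best_match_distance(shapelet, series) <= threshold
```

For a hypothetical training set of 100 series of length 500 with candidate lengths from 10 to 500, `num_candidates(100, 500, 10, 500)` already gives 12,078,600 candidates, each of which would need a BMD against every training series.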

Research directions: Despite the many existing early classification approaches, several promising areas remain open for further research, as discussed below:

  • One of the most promising research directions is to incorporate interpretability into prefix based approaches without imposing heavy computation.

  • An imbalanced distribution of time series among the classes is a common problem in applications where some classes have far fewer instances than others. Only EPIMTS [13] has addressed this problem, using shapelets, which indicates scope for better solutions through other types of approaches.

  • A few recent studies have employed deep learning models, such as LSTM and CNN, within the framework of early classification. This also opens a new direction toward enhancing the interpretability of the neurons in these models, which in turn would make such approaches easier to adopt.

  • In addition, the deep learning based early classification framework can be extended to incorporate the correlation among the components of an MTS while classifying an incoming MTS. Exploiting such correlation is likely to improve the early classification results.

Nomenclature

1-NN 1-Nearest Neighbor
BMD Best Match Distance
CBR Case-Based Reasoning
CECMR Confident Early Classification framework for MTS with interpretable Rules
CNN Convolutional Neural Network
DMP+PPM Dynamic Marked point Process with Prediction by Partial Matching
DQN Deep Q-Network
DTW Dynamic Time Warping
ECDIRE Early Classification framework based on DIscriminativeness and REliability
ECG Electrocardiogram
ECM Early Classification Model
ECTS Early Classification of Time Series
ED Euclidean Distance
EDSC Early Distinctive Shapelet Classification
EPIMTS Early Prediction on Imbalanced MTS
FECM Fault-tolerant Early Classification of MTS
GEFM Generalized Extended F-Measure
GP Gaussian Process
GSDT Generalized Sequential Decision Tree
HMM Hidden Markov Model
ICU Intensive Care Unit
IPED Interpretable Patterns for Early Diagnosis
LSTM Long Short-Term Memory
MCFEC Mining Core Features for Early Classification
MDDNN Multi-Domain Deep Neural Network
MD-MPP Multilevel Discretized Marked Point Process
MEDSC-U Modified EDSC with Uncertainty
MEShapelet Multivariate Early Shapelet
MPL Minimum Prediction Length
MSD Multivariate Shapelets Detection
MTS Multivariate Time Series
MTSECP MTS Early Classification based on PAA
PAA Piecewise Aggregate Approximation
QBC Query By Committee
QDA Quadratic Discriminant Analysis
REACT Reliable EArly ClassificaTion
RNN Reverse Nearest Neighbor
SCR Sequential Rule Classification
SimRAD Simultaneous complex activities Recognition and Action sequence Discovering
SVM Support Vector Machines
TEASER a two-tier early classification approach
TSC Time Series Classification
UCI Datasets repository
UCR Time series datasets repository
UTS Univariate Time Series
A training dataset with labeled time series
Number of time series in dataset
Length of complete time series
Number of class labels in
A time series of
A class label
Prefix length variable with
data point of
A time series with data points
An incomplete (or testing) time series
An incomplete time series with data points
An MTS with components (dimensions), where
Earliness
Accuracy of classifier using data points
A shapelet with quadruple
A subsequence of time series of length
Distance threshold of a shapelet
Nearest neighbors of in
Set of reverse nearest neighbors of in

References