The field of quantum computing has experienced rapid growth in recent years, both in the number of quantum computing hardware providers and in the computing power of their respective processors. Companies such as D-Wave Systems, Rigetti, and IBM offer access to their quantum processors, and their use in proof-of-concept demonstrations has been widely discussed in the literature. Quantum processing units (QPUs) have been used to solve a wide variety of problems such as traffic flow [trafficflow], logistics and scheduling [jobshop, flightscheduling], quantum simulation [vwchemistry, McCaskey2019, Grimsley2019], and more [Venturelli2019, advertising]. Notably, a recent study by Google showed how their QPU can perform the task of sampling from random quantum circuits faster than state-of-the-art classical software [quantumsupremacy], ushering in a new era in the field of quantum computing. These applications use so-called noisy intermediate-scale quantum (NISQ [nisq]) processors to solve various forms of optimization and sampling problems. Most commonly, the problem is formulated as a quadratic unconstrained binary optimization (QUBO) problem, or its equivalent form of an Ising Hamiltonian. The former uses a basis of binary variables x_i ∈ {0, 1}, and the latter makes use of spin variables s_i ∈ {-1, +1}. Both can be solved using existing quantum computing hardware.
The QPUs provided by D-Wave Systems use a quantum annealing algorithm that implements a transverse-field Ising Hamiltonian [manufacturedspins]. This quantum protocol prepares an initial Hamiltonian with a simple ground state and transitions to a Hamiltonian whose ground state is difficult to find. This is referred to as Adiabatic Quantum Computation (AQC) [aqc], and under open quantum system conditions as quantum annealing [nishimori]. Because AQC has been shown to be polynomially equivalent to gate-based quantum computation [aharonov2008adiabatic], and minimizing the Ising spin-glass has been shown to be NP-hard [barahona], AQC (and quantum annealing) has the potential to significantly impact the fields of optimization, machine learning, and operations research. Equivalently, with gate-model QPUs such as those produced by Google, IBM, and Rigetti, the quantum approximate optimization algorithm (QAOA) is used to solve such Ising Hamiltonians. This algorithm also attempts to minimize a target Ising Hamiltonian, alternating between a driver and a mixer Hamiltonian until the sampling procedure converges to the target state population. The derivation and details of the QAOA algorithm are beyond the scope of this paper, and are discussed in detail in [qaoa].
At the end of a quantum annealing run, or a QAOA circuit execution, the measurements are a projection along the z-component of the qubits' spins, resulting in a sequence of classical bit strings. These states can be interpreted as approximations to finite-temperature Boltzmann states of the classical spin-glass Ising Hamiltonian [temperature, qaoaboltzmann]:

H(s) = \sum_{i<j} J_{ij} s_i s_j + \sum_i h_i s_i, \qquad P(s) \propto e^{-H(s)/T}.
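As a concrete illustration (not part of the original formulation), the finite-temperature Boltzmann distribution over the spin configurations of a small classical Ising Hamiltonian can be computed exactly by enumeration; the couplings below are arbitrary toy values:

```python
import itertools
import math

def ising_energy(spins, J, h):
    """Energy of a classical Ising configuration:
    H(s) = sum_{i<j} J_ij s_i s_j + sum_i h_i s_i."""
    n = len(spins)
    e = sum(h[i] * spins[i] for i in range(n))
    e += sum(J.get((i, j), 0.0) * spins[i] * spins[j]
             for i in range(n) for j in range(i + 1, n))
    return e

def boltzmann_distribution(n, J, h, T=1.0):
    """Exact Boltzmann distribution over all 2^n spin configurations
    at temperature T (brute force, so only viable for small n)."""
    states = list(itertools.product([-1, 1], repeat=n))
    weights = [math.exp(-ising_energy(s, J, h) / T) for s in states]
    Z = sum(weights)
    return {s: w / Z for s, w in zip(states, weights)}

# Tiny two-spin ferromagnet: J_01 = -1 favours aligned spins.
dist = boltzmann_distribution(2, {(0, 1): -1.0}, [0.0, 0.0], T=0.5)
```

At low temperature the two aligned (ground) states dominate the distribution, which is the sense in which annealing samples approximate low-energy solutions.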
Therefore, the task of programming a quantum annealer or QAOA circuit involves finding a suitable Ising Hamiltonian representation of the optimization task. In this paper, we motivate classification of time series (TS) data based on extracting features which exist within the data, and use combinatorial optimization techniques to match and reconstruct data with other time series. We start by reducing the dimensionality of the TS data and encoding it as a string. We introduce a pulling procedure, comparing the encoded strings to form a collection of sets in which common features between the strings are extracted. All extracted common features are pooled together, which we can then use to construct new TS and compare between existing ones. We perform these tasks by using the pulled features as elements of the universe in the set cover problem, which has a known QUBO/Ising formulation and can be solved using quantum computers. By reformulating the critical task in our clustering algorithm as a set cover problem, we introduce two novel ideas to quantum clustering algorithms: (1) we avoid representing single vectors with polynomial numbers of qubits, instead representing the features within the data as the qubits, and (2) we perform the clustering task by transferring the core concepts of clustering (and reconstruction) to the quantum algorithm for set cover, as opposed to a direct translation of a distance-based minimization procedure.
The rest of this paper is organized as follows. Section 2 provides a short overview of existing methods for both classical and quantum clustering. Section 3 motivates the task of TS reconstruction, explains the methods used to discretize the data, and shows how to convert the discretized data to the set cover problem and its representative QUBO. Section 4 shows how to extend the reconstruction method to classify the TS data. Section 5 outlines the experiments performed to test the developed method using various open-source data sets, and conclusions are presented in Section 6.
2 Previous works
Quantum computing-based approaches which exist in the literature involve fundamentally different approaches than the one introduced in this work; we provide a brief overview of some key methods and algorithms related to quantum clustering. Assuming the existence of error-corrected quantum processors (and the existence of quantum RAM), it has been shown that quantum computers could perform k-means clustering exponentially faster than their classical counterparts [Lloyd2013QuantumAF]. Other works have also shown how to reformulate parts of classical clustering algorithms as quantum subroutines that can be executed on error-corrected gate-model QPUs [quantum_TS1, quantum_clustering2, quantum_clustering3, quantum_clustering4]. In quantum annealing, a similar approach has been shown in which the objective function of the clustering task (minimizing distance metrics between high-dimensional vectors) is directly translated to a QUBO, with each vector's possible assignment represented via one-hot encoding to physical qubits [booz, quantum_clust].
Classical time series (TS) analysis is considered to be a challenging task due to the high number of dimensions involved, resulting in the curse-of-dimensionality phenomenon. A series of works address the question of efficient dimensionality reduction [HOTSAX, symbol2, symbol3, senin2018grammarviz, schafer2012sfa, Patel2002MiningMI, PAA_paper, 1021475], explaining the trade-off between information loss and search space size. The main results presented in this manuscript are obtained with the Symbolic Fourier Approximation (SFA) method [schafer2012sfa], due to its pruning power, noise robustness, and scalability. SFA represents each real-valued TS in the frequency domain by a symbolic string using the discrete Fourier transform. These transformed TS can then be used by classical string-based similarity algorithms such as phonetic-distance-based measures, the Levenshtein, Hamming, Jaro, and Jaro-Winkler measures, and more [gomaa2013survey].
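SFA proper involves coefficient selection and multiple-coefficient binning learned from the data; as a rough, simplified sketch of the underlying idea (low-frequency DFT features quantized into letters via quantile bins; the function names are ours, not from the SFA reference), one might write:

```python
import numpy as np

def dft_features(ts, n_coeffs):
    """Real and imaginary parts of the first n_coeffs DFT coefficients,
    i.e. a low-frequency approximation of the series."""
    c = np.fft.rfft(ts)[:n_coeffs]
    return np.concatenate([c.real, c.imag])

def fit_bins(train, n_coeffs, alphabet):
    """Per-feature quantile breakpoints computed on a training corpus
    (a crude stand-in for SFA's learned binning)."""
    feats = np.array([dft_features(t, n_coeffs) for t in train])
    qs = np.linspace(0, 1, len(alphabet) + 1)[1:-1]
    return np.quantile(feats, qs, axis=0)  # shape: (len(alphabet)-1, n_features)

def encode(ts, bins, n_coeffs, alphabet):
    """Map each Fourier feature to a letter by counting exceeded breakpoints."""
    f = dft_features(ts, n_coeffs)
    idx = (f[None, :] > bins).sum(axis=0)
    return "".join(alphabet[i] for i in idx)
```

Using the same fitted bins for every series guarantees that all encoded strings share one alphabet, which the pulling procedure below relies on.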
Classical TS clustering techniques can be split into the following categories: model-based, feature-based, shape-based, and their combinations [TS_clust4]. In the model-based approach, the TS is encoded and fitted by parametric models, and clustering is applied to the extracted parameters [TS_clust2]. In feature-based methods, features of the TS (such as Fourier components, periodicity, trend, number of peaks, and variance) are extracted and later clustered by conventional algorithms [Hautamaki2008TimeseriesCB, CHRIST201872]. Shape-based approaches refer to comparing shapes of TS directly and matching them according to specifically chosen metrics. A typical example of this approach is Dynamic Time Warping (DTW) [DTW_original], which has been shown to outperform Euclidean metrics [Chu2002IterativeDD, Vlachos2002DiscoveringSM]. DTW-based classical methods are used to evaluate the accuracy of our approach in Section 5. For more details on classical approaches to TS clustering, we refer the reader to [TS_clust1, TS_clust3, TS_clust4].
3 Time Series Reconstruction: Problem Formulation
Clustering techniques generally require specific data representation, similarity measure definitions, and clustering algorithm selection. Similarly, in our quantum computing-based approach, we represent the TS data as encoded strings, from which we formulate semi-supervised clustering and optimal reconstruction as a set cover problem, and provide metrics based on solutions to the set cover problem. While different from classical approaches [reconstr2, reconstr1, reconstr3, reconst4], we preserve the computational complexity of the problem, while introducing a method that is based on latent features within the data.
To reconstruct given time series data, we start by discretizing the data and comparing the encoded strings to generate the elements of our universe for the set cover. This pulling technique is crucial, as it allows feature-wise comparison of the data, as well as arbitrary reconstruction of TS using existing (or training) data. We use existing techniques for discretization, and explain the pulling procedure in detail. We then show how to use this data to construct the set cover problem for quantum optimization.
3.1 Discretization and Pulling Technique
There are many ways to discretize time series data, as reviewed previously. For our purposes, we use the symbolic Fourier approximation (SFA) method [schafer2012sfa], as it provides differentiation between separate TS classes and features in high-dimensional data sets, allowing us to use these representative symbols for our quantum algorithm. Nevertheless, the exact discretization is data-dependent, with various hyperparameters (such as the number of letters in the alphabet, the length of each encoded string, etc.) present in the method; for a full explanation we refer the reader to [schafer2012sfa]. Given the encoded strings, we compare the time series using the following pulling procedure, illustrated in Figure 1. This pair-wise comparison is a preprocessing step necessary to formulate our set cover problem. Starting with one fixed string (red in the figure), we consider each encoded character as an independent element of the universe set (U in the figure). (Note that by using the same encoding scheme for all TS data, we ensure that all string characters belong to the same alphabet.) A second string (green in the figure) is compared element-wise by successively moving the second string along the first, as illustrated. At every iteration, all character matches between the two strings are recorded as a new set. The resulting set of sets for the example is shown in Figure 1.
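A minimal sketch of the pulling procedure described above, representing universe elements by their positions in the fixed string (one plausible bookkeeping choice; the paper's exact convention may differ):

```python
def pull(target, other):
    """Slide `other` along `target`; at each offset, record the positions in
    `target` whose characters match. Each non-empty match set becomes one subset."""
    subsets = []
    n, m = len(target), len(other)
    for shift in range(-(m - 1), n):  # every offset with a non-empty overlap
        matched = frozenset(
            i for i in range(n)
            if 0 <= i - shift < m and target[i] == other[i - shift]
        )
        if matched:
            subsets.append(matched)
    return subsets
```

Pulling a string against itself always yields the full universe as one subset, so a test TS that appears verbatim in the training set is trivially reconstructable.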
The procedure is repeated for the rest of the encoded training TS to form the set of subsets V = {V_i}. This set, which is a union of all subsets obtained via the pulling technique, represents the features in common between the target time series and all other time series in the data set. Given this aggregate set, the goal is now to select the minimal subset that most closely reconstructs the universe, which is the NP-hard set cover problem. In the case illustrated in Figure 1, the optimal selection of subsets is underlined in red. In principle, solutions of this set cover problem do not preserve the order of elements, and allow the use of the same element multiple times. This feature is useful for TS comparison, as elements of the time series data can be permuted and duplicated without affecting our reconstruction method.
3.2 Formulating the Set Cover Problem
Given the encoded strings and the common set of features V, we can now formulate the set cover problem as a QUBO, following the method demonstrated in [Lucas]. Consider the universe set U = {1, ..., n} and a set of subsets V_i ⊆ U, i = 1, ..., N, such that ∪_i V_i = U. Finding the smallest number of subsets whose union is U is a well-known NP-hard optimization problem in the worst case [Karp].
In order to map the set cover problem to a QUBO problem, we use the following binary variables: x_i, equal to 1 if subset V_i is selected, and x_{α,m}, equal to 1 if element α of the universe appears exactly m times in the selected subsets. We consider the full QUBO as a sum of two components:

H_A = A \sum_{\alpha=1}^{n} \Big(1 - \sum_{m=1}^{M} x_{\alpha,m}\Big)^2 + A \sum_{\alpha=1}^{n} \Big(\sum_{m=1}^{M} m\, x_{\alpha,m} - \sum_{i : \alpha \in V_i} x_i\Big)^2,

H_B = B \sum_{i=1}^{N} x_i.

The complete QUBO is given by H = H_A + H_B [Lucas]. The first summation in H_A imposes, via a one-hot encoding, that exactly one x_{α,m} must be selected in the minimum. The second term in H_A imposes that the number of times α is selected is equal to the number of selected subsets in which α appears (as only one x_{α,m} can be 1 in the minimum). The term H_B serves to minimize the number of subsets V_i needed to cover the universe U. The total number of variables required is N + nM, where M is the maximal number of sets that contain a given element of U (see [Lucas] for details). The limiting case where each included subset covers only one element of U constrains the coefficients to B < A. The closer the coefficients B and A, the more weight is given to H_B, minimizing the number of elements selected from V.
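The QUBO above can be assembled mechanically by expanding the squared penalty terms over binary variables (using y² = y). A sketch in Python (the variable naming is ours, and the coefficients A and B are illustrative):

```python
from collections import defaultdict

def set_cover_qubo(universe, subsets, A=2.0, B=1.0):
    """Set cover QUBO following our reading of Lucas's formulation.
    Variables: ('x', i) = 1 if subsets[i] is chosen;
               ('y', a, m) = 1 if element a is covered exactly m times (m = 1..M)."""
    Q = defaultdict(float)
    const = 0.0

    def add(u, v, w):
        key = (u, v) if u <= v else (v, u)
        Q[key] += w

    M = max(sum(1 for S in subsets if a in S) for a in universe)
    for a in universe:
        ys = [('y', a, m) for m in range(1, M + 1)]
        xs = [('x', i) for i, S in enumerate(subsets) if a in S]
        # A * (1 - sum_m y_{a,m})^2: for binary y, -2A*y + A*y^2 = -A*y
        const += A
        for u in ys:
            add(u, u, -A)
        for p, u in enumerate(ys):
            for v in ys[p + 1:]:
                add(u, v, 2 * A)
        # A * (sum_m m*y_{a,m} - sum_{i: a in S_i} x_i)^2
        terms = [(u, m) for m, u in enumerate(ys, start=1)] + [(v, -1) for v in xs]
        for p, (u, cu) in enumerate(terms):
            add(u, u, A * cu * cu)
            for (v, cv) in terms[p + 1:]:
                add(u, v, 2 * A * cu * cv)
    for i in range(len(subsets)):
        add(('x', i), ('x', i), B)  # B * sum_i x_i
    return dict(Q), const

def qubo_energy(Q, const, s):
    """Evaluate E(s) = const + sum Q[u,v] * s_u * s_v for a binary assignment s."""
    return const + sum(w * s.get(u, 0) * s.get(v, 0) for (u, v), w in Q.items())
```

For a valid cover, both penalty terms vanish and the energy equals B times the number of chosen subsets.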
In our application of time series reconstruction, the final size of the QUBO is heavily dependent on our choices during discretization. For example, the number of binary variables depends on both the number of TS in the training set used for reconstruction and the length of the string that encodes each TS. Increasing the string length used to encode each TS changes the size of the universe U, and allowing longer encoded strings to represent the data creates more subsets V_i. Therefore, there exists a trade-off between the granularity of the encoded strings and the ability to solve the set cover representation of the problem. Including more characters in our alphabet for discretization changes the non-empty sets V_i, on which the number of quadratic elements in the QUBO depends. The general trend, however, is that the number of quadratic elements decreases as the number of characters used in our alphabet increases. This can be explained by the properties of the pulling procedure described above, since a smaller alphabet produces more non-empty subsets which can be used for reconstruction of the universe U. In Figure 2 we show how varying these hyperparameters of the discretization affects the size of the QUBO problem, based on 20 test samples from the BeetleFly data set [hills2014classification].
4 Semi-supervised classification
We can now combine the methods described in the previous sections (constructing the universe set U from discretized data and the subsets V_i) to perform semi-supervised clustering. We start by separating the input TS data into two groups: training and test data. In our case we use training data sets with known labels, and the task we solve is to use the labeled data to assign labels to the test set. Normally, the training set with labeled data is significantly smaller than the unlabeled test set, which we exploit in our method.
We encode both the training and test data sets into strings using the method described in Section 3.1. We then perform the reconstruction procedure for every TS in our test set using the entire training set. Each TS from the test set is assumed to individually form a universe U, and is to be reconstructed using the sets V_i obtained via the pulling procedure. Explicitly, in terms of Figure 1, the red string is the TS from the test data set, and all strings in the training set are pulled through (green strings) to obtain the V_i's. This allows us to compare every test TS to the full training set in a one-versus-all manner. Then, using the universe U and the V_i's from the pulling procedure, we formulate the set cover problem outlined in Section 3.2. Thus, a single solution to that set cover problem (even a sub-optimal one in the worst case) allows us to reconstruct each TS from the test set using a set of discretized features obtained from all elements which appear in the training set. Furthermore, since annealing-based sampling methods produce finite-temperature Boltzmann distributions [temperature], various optima of the set cover problem could yield different ways to reconstruct the test TS using the training set. It is therefore the user's task to use these reconstructions to associate each test TS with a label from the training set. We outline the steps of our classification procedure using pseudo-code in Algorithm 1.
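The overall classification loop can be sketched as follows, with a greedy set cover standing in for the annealing-based QUBO solve and a largest-subset rule used for labeling (a classical stand-in for illustration, not the QUBO pipeline itself):

```python
def pull_subsets(target, other):
    """Position sets in `target` matched by `other` at each shift (the pulling step)."""
    out = []
    n, m = len(target), len(other)
    for shift in range(-(m - 1), n):
        s = frozenset(i for i in range(n)
                      if 0 <= i - shift < m and target[i] == other[i - shift])
        if s:
            out.append(s)
    return out

def greedy_cover(universe, labeled_subsets):
    """Greedy set cover: a classical stand-in for the annealing-based solve."""
    uncovered, chosen = set(universe), []
    while uncovered and labeled_subsets:
        best = max(labeled_subsets, key=lambda ls: len(ls[0] & uncovered))
        if not best[0] & uncovered:
            break  # remaining universe cannot be covered
        chosen.append(best)
        uncovered -= best[0]
    return chosen

def classify(test_string, train):
    """One-versus-all: pull every training string through the test string,
    cover the test universe, and label by the largest selected subset."""
    universe = range(len(test_string))
    subsets = [(s, label) for string, label in train
               for s in pull_subsets(test_string, string)]
    chosen = greedy_cover(universe, subsets)
    return max(chosen, key=lambda ls: len(ls[0]))[1] if chosen else None
```

In the actual method, the greedy step is replaced by sampling low-energy solutions of the set cover QUBO.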
To classify the reconstructed test TS data, we evaluated three different similarity metrics using set cover solutions: the largest common subset, the highest number of common subsets, and the largest sum of common elements in the selected V_i. We briefly explain how each metric is calculated, and discuss the performance of each.
Largest common subset. Given a candidate solution to the set cover problem, the label corresponding to the V_i which contains the most elements is selected. This label is then assigned to the test TS. This metric captures the longest continuous set of features from the training TS data, and assumes that it is sufficient to determine the label.
Number of common subsets. Frequently, multiple V_i's from the same training TS are used to reconstruct a test TS. In this metric, we count the number of subsets used to cover the universe. The test TS is assigned the same label as the training TS which appears most frequently in the set cover solution.
Largest sum of subsets. This metric is a combination of the previous two. For every training TS that is used to reconstruct a test TS, the total number of elements it contributes is counted (summed over all of its V_i's). The label which corresponds to the training TS with the largest sum is assigned to the test TS.
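The three metrics can be expressed compactly given a set cover solution represented as (subset, training-label) pairs (this representation is ours, for illustration):

```python
from collections import Counter

def largest_common_subset(solution):
    """Label of the training TS owning the single largest selected subset."""
    return max(solution, key=lambda s: len(s[0]))[1]

def most_common_subsets(solution):
    """Label of the training TS contributing the most selected subsets."""
    return Counter(label for _, label in solution).most_common(1)[0][0]

def largest_sum_of_subsets(solution):
    """Label of the training TS whose selected subsets cover the most elements in total."""
    totals = Counter()
    for subset, label in solution:
        totals[label] += len(subset)
    return totals.most_common(1)[0][0]
```

Note that the three rules can disagree on the same solution, e.g. when one training TS contributes a single long subset and another contributes many short ones.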
These metrics allow us to quantify the accuracy of our semi-supervised clustering algorithm. The first two metrics, being based on large sets of common features between the TS, performed the best (results are shown in the next section). There was no significant difference between the two, and the superiority of one metric over the other varied between data sets. The third metric, which is a combination of the first two, performed worse than either of the first two in the majority of the cases tested. While initially unexpected, this observation can be explained by the fact that the third metric admits matches with many small subsets selected in the set cover, and can therefore miss significant signatures present in the TS data. The largest common subset metric was selected for the experiments presented in the next section. It should also be noted that this use of a small amount of labeled training data is not expected to reach the accuracy of fully supervised learning methods. Moreover, there are modifications that could be made to the presented methods to improve the accuracy, for example increasing the word length and/or using a larger training set. Both are constrained in our use case to prohibit excessively large QUBOs from being constructed. The goal of this method, as described, is to allow for relatively high accuracy using small sets of training data.
We provide an illustrative example of our QUBO-based reconstruction and classification in Figure 3, using the BeetleFly data set. The task is to reconstruct the data in Figure 3 (a) using (b) and (c). For this example, an alphabet of size 5 was used for encoding, color-coded in the figure. The results of the set cover problem, formulated using the methods explained in previous sections, are the three sets shown in Figure 3. That is, each box (representing a fifth of the TS data per box) that appears in one of the subsets forming the solution is designated as such; together, the selected subsets cover the characters 'A', 'E', 'E', 'B', and 'C'. Therefore, the union of the three sets is U, where U = 'ACEEB' is the test TS data to reconstruct. For classifying the reconstructed sample, we refer to the classes of the training data used for the reconstruction, and note that the training samples in Figure 3 (b) and (c) belong to two different classes. Using the similarity metrics defined above, it is easy to determine that two of the sets originate from the time series in (b), whereas only one set (which contains only a single element) is obtained from (c). Therefore, (a) is assigned the same label as (b). This example is representative of the majority of cases encountered during classification, with components of the reconstructed TS varying across multiple training samples, and often also across multiple classes.
5 Experiments and Results
The experiments performed in this work used open-source labeled TS data that is publicly available [ts_class, ts_class_web]. We restricted our analysis to univariate TS data with two classes and small training set sizes to make our work amenable to NISQ devices in the near future. However, this method of semi-supervised classification can be used with any number of classes, at the cost of QUBO size. Since both the number of TS in the training data and the word length used to encode the TS contribute to the number of variables in the QUBO, we selected data sets that have small numbers of TS in the training set. The exact sizes of the QUBOs for each data set are shown in Figure 4. The smallest QUBOs, from the Chinatown data set, were between 200 and 300 variables, and the largest were over 700 variables, from the GunPoint data set.
Due to their size, the QUBOs were optimized using simulated thermal annealing with 20,000 samples and 1000 sweeps [DWave_soft], which was sufficient to ensure that low-energy local minima were sampled. The specific parameters used for the TS encoding are shown in Table 1. In general, the longer the TS are and the fewer TS are in the training set, the finer the discretization method required to accurately classify the test data. In all data sets we were able to reconstruct each test TS with elements from the training set, as explained in Section 3.2. To measure the performance of our classification method, we compared the accuracy of our labeling to semi-supervised and unsupervised classical classification methods, explained in more detail below. The results of these experiments for the various data sets are summarized in Table 2.
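As a rough illustration of the kind of solver used, here is a minimal single-read simulated thermal annealing routine for a QUBO (a pedagogical sketch, far simpler than the referenced software; it recomputes the full energy per flip rather than using incremental updates):

```python
import math
import random

def qubo_energy(Q, s):
    """E(s) = sum over Q entries of w * s_u * s_v (diagonal entries use u == v)."""
    return sum(w * s[u] * s[v] for (u, v), w in Q.items())

def simulated_annealing(Q, n_sweeps=1000, T0=10.0, T1=0.1, seed=0):
    """One annealing read: geometric temperature schedule, single-bit-flip
    Metropolis updates. Starts from all zeros, returns the best state seen."""
    rng = random.Random(seed)
    variables = sorted({v for pair in Q for v in pair})
    s = {v: 0 for v in variables}
    e = qubo_energy(Q, s)
    best_s, best_e = dict(s), e
    for sweep in range(n_sweeps):
        T = T0 * (T1 / T0) ** (sweep / max(1, n_sweeps - 1))
        for v in variables:
            s[v] ^= 1                  # propose a flip
            e_new = qubo_energy(Q, s)  # full recompute: O(|Q|), fine for a sketch
            if e_new <= e or rng.random() < math.exp((e - e_new) / T):
                e = e_new
                if e < best_e:
                    best_s, best_e = dict(s), e
            else:
                s[v] ^= 1              # reject, flip back
    return best_s, best_e
```

Production solvers additionally perform many independent reads (samples), which is what makes the finite-temperature Boltzmann interpretation of the returned population meaningful.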
(Table 2. Columns include: Data set, Data Type, ...)
As expected, the semi-supervised QUBO-based method outperforms classical unsupervised methods. To benchmark the results of our method, we use K-means clustering with pairwise DTW metrics calculated on the original TS (before encoding), with labels assigned based on membership in one of the two clusters. We also employ a classical semi-supervised method that is similar in spirit to the QUBO-based one, where the test TS labels are assigned by the DTW metric directly, calculated pairwise between each training and test TS (without encoding). We note, however, that the QUBO-based method operates on a reduced dimensionality, in contrast to the classical methods, which use the original TS, where full information is preserved. Even under this consideration, the accuracy of the QUBO-based method is comparable with the semi-supervised DTW methods, and could be improved further by enriching the set of subsets V_i, i.e., by augmenting the training set or increasing the discretization granularity.
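The DTW benchmark distance can be computed with the standard dynamic program (the textbook version, without the windowing constraints production implementations often add):

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with squared pointwise cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # D[i][j]: minimal accumulated cost aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j],      # stretch a
                                 D[i][j - 1],      # stretch b
                                 D[i - 1][j - 1])  # step both
    return D[n][m] ** 0.5
```

For the semi-supervised baseline, each test TS is simply assigned the label of the training TS minimizing this distance.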
The worst performance of the QUBO-based algorithm is observed on the TwoLeadECG data set. This can be explained by the nature of our method, as well as the sensitivity of the ECG data. By using the set cover problem, we allow for permutations of subsets of TS data in the reconstruction of the test TS. It is likely that this permutation of TS segments, together with the similar representation in Fourier space of the signals from the two leads in the ECG measurements, makes our method unsuitable for this kind of data. To confirm this is the case, and to improve the classification accuracy, we applied SAX encoding [senin2018grammarviz], which is based on sliding-window time series magnitudes rather than the Fourier transform. Using this method, and encoding each TS as a word of length 5 constructed from a 5-letter alphabet, the accuracy improved to 0.62 and 0.85 for the two respective classes (0.74 weighted average). In contrast to the Fourier encoding, the better results with the ECG200 data are due to the significant differences between the classes of normal and ischemia ECG readings.
The highest accuracy is obtained using the BeetleFly and Chinatown data sets. In the first case, many permutations of the training set to construct the test set are permissible, which our method takes advantage of. The accuracy of our method is additionally improved by the relative size of the training set, further augmenting the combinatorial space of permutations. This robustness can also be explained by the dimensionality reduction technique used for this data set: the 2D BeetleFly images (with different orientations) were mapped to 1D series of distances to the image centre, which again is beneficial for permutation-based methods. The Chinatown data set, in comparison, contains significantly shorter TS than BeetleFly. Encoding the Chinatown TS data with the same word length as BeetleFly resulted in higher-granularity representations, and ultimately higher accuracy. This provides additional evidence that the accuracy of our method can be improved by increasing the granularity of the encoding.
6 Conclusions
We present a QUBO-based method for TS reconstruction and semi-supervised classification that reaches accuracy scores comparable with classical DTW pairwise approaches, and in most cases outperforms unsupervised clustering. Among the advantages of our method are the utilization of significantly less data with respect to conventional classical methods, as well as a one-versus-all comparison that allows the selection of segments of data from multiple sources to reconstruct a single TS. This provides additional robustness in our method to permutations of TS segments during the reconstruction. We showed how to reformulate the task of TS reconstruction as the set cover problem with a minimal number of subsets. In order to formulate this problem as a QUBO, we apply TS dimensionality reduction by encoding each time series as a separate string. This encoding procedure and the selection of comparison metrics (as discussed in Section 4) define the hyperparameter space of the problem. The QUBO-based classification method performed best on image and traffic data, which is consistent with our method's inherent ability to utilize permutations of features/data within the TS to perform reconstruction.
Time series reconstruction and classification has a wide variety of useful applications, such as the management of energy systems, factory process control, and sensor systems, among many others. The methods introduced in this paper show how to reformulate the tasks of reconstruction and classification of such data using quantum computing. The fact that our work uses small training sets of labeled data means that the QUBOs produced could be solved by next-generation NISQ devices. Using quantum technologies, this method could analyze significantly more complex TS data, even in a live setting. The results of the optimization process (the selected subsets used for the reconstruction) would also be informative as feedback for live process optimization. Future work in this area will focus on generalising the method to multivariate TS cases, finding application-ready data sets, and executing the presented methods on quantum processors. Specifically, with the advancement of hybrid quantum-classical algorithms, we will focus on converting the methods presented in this paper to be suitable for commercial applications.