
Hyperparameter optimization of hybrid quantum neural networks for car classification

05/10/2022
by Asel Sagingalieva, et al.

Image recognition is one of the primary applications of machine learning algorithms. Nevertheless, machine learning models used in modern image recognition systems consist of millions of parameters that usually require significant computational time to be adjusted. Moreover, adjustment of model hyperparameters leads to additional overhead. Because of this, new developments in machine learning models and hyperparameter optimization techniques are required. This paper presents a quantum-inspired hyperparameter optimization technique and a hybrid quantum-classical machine learning model for supervised learning. We benchmark our hyperparameter optimization method over standard black-box objective functions and observe performance improvements in the form of reduced expected run times and fitness in response to the growth in the size of the search space. We test our approaches in a car image classification task, and demonstrate a full-scale implementation of the hybrid quantum neural network model with the tensor train hyperparameter optimization. Our tests show a qualitative and quantitative advantage over the corresponding standard classical tabular grid search approach used with a deep neural network ResNet34. A classification accuracy of 0.97 was obtained by the hybrid model after 18 iterations, whereas the classical model achieved an accuracy of 0.92 after 75 iterations.


Introduction

The field of quantum computing has seen large leaps in building usable quantum hardware during the past decade. As one of the first vendors, D-Wave provided access to a quantum device that can solve specific types of optimization problems johnson2011quantum. Motivated by this, quantum computing has not only received much attention in the research community, but has also come to be perceived as a valuable technology in industry. Volkswagen published a pioneering result on using the D-Wave quantum annealer to optimize traffic flow in 2017 neukart2017traffic, which prompted a number of works by other automotive companies mehta2019quantum; ohzeki2019control; yarkoni2021multi. Since then, quantum annealing has been applied to a number of industry-related problems in chemistry streif2019solving; xia2017electronic, aviation stollenwerk2019quantum, logistics feld2019hybrid and finance grant2021benchmarking. Aside from quantum annealing, gate-based quantum devices have gained increased popularity, not least after the first demonstration of a quantum device outperforming its classical counterparts arute2019quantum. A number of industry-motivated works have since been published in the three main application areas that are currently of interest for gate-based quantum computing: optimization streif2021beating; streif2020training; amaro2021case; dalyac2021qualifying; luckow2021quantum, quantum chemistry and simulation arute2020hartree; malone2021towards, and machine learning rudolph2020generation; skolik2021layerwise; skolik2021quantum; peters2021machine; alcazar2020classical. Research in the industrial context has been largely motivated by noisy intermediate-scale quantum (NISQ) devices – early quantum devices with a small number of qubits and no error correction. In this regime, variational quantum algorithms (VQAs) have been identified as the most promising candidate for near-term advantage due to their robustness to noise cerezo2021variational. In a VQA, a parametrized quantum circuit (PQC) is optimized by a classical outer loop to solve a specific task like finding the ground state of a given Hamiltonian or classifying data based on given input features. As qubit numbers are expected to stay relatively low within the next years, hybrid alternatives to models realized purely by PQCs have been explored zhang2021variational; mari2020transfer; zhao2019qdnn; dou2021unsupervised; sebastianelli2021circuit; pramanik2021quantum. In these works, a quantum model is combined with a classical model and optimized end-to-end to solve a specific task. In the context of machine learning, this means that a PQC and a neural network (NN) are trained together as one model, where the NN can be placed either before or after the PQC in the chain of execution. When the NN comes first, it can act as a dimensionality reduction technique for the quantum model, which can then be implemented with relatively few qubits.

In this work, we use a hybrid quantum-classical model to perform image classification on a subset of the Stanford Cars data set KrauseStarkDengFei-Fei_3DRR2013. Image classification is a ubiquitous problem in the automotive industry, and can be used for tasks like sorting out parts with defects. Supervised learning algorithms for classification have also been extensively studied in the quantum literature havlivcek2019supervised; schuld2019quantum; schuld2020circuit; rebentrost2014quantum, and it has been proven that there exist specific learning tasks based on the discrete logarithm problem for which a separation between quantum and classical learners exists liu2021rigorous. While the separation in liu2021rigorous is based on Shor's algorithm and is therefore not expected to transfer to realistic learning tasks such as the car classification mentioned above, it motivates further experimental study of quantum-enhanced models for classification on real-world data sets.

When combining PQCs and classical NNs into hybrid quantum-classical models, we encounter the challenge of finding hyperparameter configurations that produce performance gains in terms of model accuracy and training. Hyperparameters are values that are set for the model and do not change during training; they may include variables such as the learning rate, decay rates, the choice of optimizer, the number of qubits, or layer sizes. In practice, these parameters are often selected by experts based on a priori knowledge and trial-and-error. This limits the search space, but in turn can lead to a suboptimal model configuration.

Hyperparameter optimization is the process of automating the search for the best set of hyperparameters, reducing the need for expert knowledge at the cost of increased computation to evaluate candidate model configurations in search of an optimum. In the 1990s, researchers reported performance gains by leveraging a wrapper method, which tuned parameters for specific models and data sets using best-first search and cross-validation KOHAVI1995304. More recently, researchers have proposed search algorithms using bandits Li2017, which leverage early stopping methods. Successive Halving algorithms, such as the one introduced in pmlr-v28-karnin13 and the parallelized version introduced in Li2018, allocate more resources to more promising configurations. Sequential model-based optimization leverages Bayesian optimization with an aggressive dual racing mechanism and has also shown performance improvements for hyperparameter optimization Hutter2011; Lindauer_Hutter_2018. Evolutionary and population-based heuristics have likewise achieved state-of-the-art results when applied to hyperparameter optimization in numerous black-box optimization competitions vermetten2020sequential; back1996; awad2020squirrel. In recent years, a whole field has formed around automating the search for optimal hyperparameters of machine learning models, prime examples being neural architecture search elsken2019neural and automated machine learning (AutoML) hutter2019automated. Automating the search for hyperparameters in a quantum machine learning (QML) context has also started to attract attention, and the authors of berganza2022towards have explored a first version of AutoQML.

Our contribution in this paper is not only to examine the performance gains of hybrid quantum-classical models over purely classical ones, but also to investigate whether quantum-enhanced or quantum-inspired methods may offer an advantage in automating the search over the configuration space of the models. We show a reduction in computational complexity with regard to the expected run times and number of evaluations for various model configurations, the high cost of which motivates this investigation. We investigate using the tensor train decomposition to search the hyperparameter space of the hybrid quantum neural network (HQNN), framed as a global optimization problem as in globalOptimizationTensorTrains. This method has been successful in optimizing models of social networks sergey2019tensor and as a method of compression for deep neural networks WANG2021320.

Results

1. Hyperparameter Optimization

Figure 1: The hyperparameter optimization problem description (a). The tabular methods for hyperparameter optimization: the grid search algorithm (b) and the tensor train algorithm (c-d).

The problem of hyperparameter optimization (HPO) is described schematically in Fig. 1(a). Given a certain data set and a machine learning (ML) model, the learning model demonstrates an accuracy $A$ that depends on the hyperparameters $\mathbf{h}$. To achieve the best possible model accuracy, one has to optimize the hyperparameters. To perform the HPO, an unknown black-box function $A(\mathbf{h})$ has to be explored. The exploration is an iterative process, where at each iteration the HPO algorithm provides a set of hyperparameters $\mathbf{h}$ and receives the corresponding model accuracy $A(\mathbf{h})$. As a result of this iterative process, the HPO algorithm outputs the best achieved accuracy $A_{\max}$ with the corresponding hyperparameters $\mathbf{h}_{\max}$.

The HPO can be organized in different ways. One of the standard methods for HPO is the tabular method of grid search (GS), also known as a parameter sweep (Fig. 1(b)). To illustrate how grid search works, we have chosen two hyperparameters: the learning rate and the multiplicative factor of the learning rate decay. They are plotted along the horizontal and vertical axes, respectively. The color of the contour shows the accuracy of the model for the two given hyperparameters, ranging from light pink (the lowest accuracy) to dark green (the highest accuracy). In the GS method, the hyperparameter values are discretized, which results in a grid of values shown as big dots. The GS algorithm goes through all the values of this grid with the goal of finding the maximum accuracy. As one can see in this figure, there are only three points at which this method can find a high accuracy within 25 iterations (shown as 25 points in Fig. 1(b)). This example shows that there could be a better tabular HPO method in terms of the best achievable accuracy and the number of iterations used.
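To make the tabular search concrete, the snippet below sketches a grid search over two discretized hyperparameters. The train_and_score function is a hypothetical placeholder for training the model and returning its test accuracy; the ranges and grid sizes are illustrative only.

```python
# Minimal grid-search sketch over two hyperparameters (learning rate and
# its multiplicative decay factor).  `train_and_score` is a placeholder.
import itertools

import numpy as np


def train_and_score(lr, gamma):
    # Stand-in objective: in practice this trains the model with the given
    # hyperparameters and returns the test accuracy.
    return np.exp(-((np.log10(lr) + 3) ** 2 + (gamma - 0.7) ** 2))


learning_rates = np.logspace(-5, -1, 5)      # 5 discretized values
decay_factors = np.linspace(0.1, 0.9, 5)     # 5 discretized values

best_score, best_cfg = -np.inf, None
for lr, gamma in itertools.product(learning_rates, decay_factors):
    score = train_and_score(lr, gamma)       # one HPO iteration per grid point
    if score > best_score:
        best_score, best_cfg = score, (lr, gamma)

print(f"25 evaluations, best accuracy {best_score:.3f} "
      f"at lr={best_cfg[0]:.0e}, gamma={best_cfg[1]:.2f}")
```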

2. Tensor train approach to hyperparameter optimization

Here, we propose a quantum-inspired approach to hyperparameter optimization based on tensor train (TT) programming. The TT approach was initially introduced in the context of quantum many-body system analysis, e.g., for finding a ground state of minimal energy of multi-particle Hamiltonians via the Density Matrix Renormalization Group (DMRG) DMRG. In this approach, the ground state is represented in the TT format, often referred to as the Matrix Product State in physics MPS. We employ the TT representation (shown in Fig. 1(c)) in another way here, and use it for hyperparameter optimization. As one can see in Fig. 1(c), the TT is represented as a product of tensors, where an individual tensor is shown as a circle whose number of "legs" corresponds to the rank of that tensor. The boundary circles are matrices of dimension $n \times r$, and each interior circle is a rank-3 tensor of dimensions $r \times n \times r$, where $n$ is the number of discretization points per hyperparameter and $r$ is the TT rank. The two arrows in Fig. 1(c) illustrate the sweeps right and left along the TT; these refer to the algorithm described below. Leveraging the locality of the problem, i.e., the small correlation between hyperparameters, we perform the black-box optimization based on the cross-approximation technique for tensors tt_cross; tt_min.

Similar to the previously discussed GS method, in TT optimization (TTO) we discretize the hyperparameter space and then consider a tensor composed of scores that can be estimated by running the ML model with the corresponding set of hyperparameters. However, compared to GS, the TT method is dynamic, which means that the next set of points to evaluate in the hyperparameter space is chosen based on the knowledge accumulated during all previous evaluations. With TTO we do not estimate all the scores available to the model. Instead, we approximate the score tensor in the TT format, referring to a limited number of tensor elements using the cross-approximation method tt_cross. During the process, new sets of hyperparameters for which the model needs to be evaluated are determined using the MaxVol routine maxvol. The MaxVol routine is an algorithm that finds an $r \times r$ submatrix of maximum volume, i.e., a square submatrix with maximum determinant modulus, in an $n \times r$ matrix.
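For illustration, the following is a minimal MaxVol-style sketch (not the implementation from maxvol): starting from a naive choice of rows, it repeatedly swaps in the row that most increases the volume of the selected $r \times r$ submatrix.

```python
import numpy as np


def maxvol(A, tol=1.05, max_iters=200):
    """Greedy MaxVol sketch: return row indices of an r x r submatrix of the
    n x r matrix A (n >= r) with approximately maximal determinant modulus."""
    n, r = A.shape
    rows = list(range(r))              # naive initial guess (LU pivoting is more robust)
    for _ in range(max_iters):
        sub = A[rows, :]               # current r x r submatrix
        B = A @ np.linalg.inv(sub)     # every row of A expressed in the basis of `sub`
        i, j = np.unravel_index(np.argmax(np.abs(B)), B.shape)
        if abs(B[i, j]) <= tol:        # no swap can grow the volume noticeably
            break
        rows[j] = i                    # replace the j-th chosen row with row i
    return np.array(rows)


# Tiny usage example: pick 3 of 40 random rows.
A = np.random.default_rng(0).normal(size=(40, 3))
idx = maxvol(A)
print("selected rows:", idx, "| |det| =", abs(np.linalg.det(A[idx, :])))
```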

Hyperparameters are changed in an iterative process, in which one is likely to find a better accuracy after each iteration, and thus a good set of hyperparameters. Notably, the TTO algorithm requires the estimation of $O(dnr^2)$ tensor elements and $O(dnr^3)$ arithmetic operations, where $d$ is the number of hyperparameters, $n$ is the number of discretization points per hyperparameter, and $r$ is a fixed rank. If one compares this with the GS algorithm, which requires the estimation of $n^d$ elements, one expects to observe practical advantages, especially for a large number of hyperparameters.

Algorithm 1: Tensor Train Optimization
Input: $d$ hyperparameters, $n$ discretization points per hyperparameter, rank $r$, number of full sweeps $N_{\mathrm{sweep}}$
Output: best accuracy $A_{\max}$ and the corresponding hyperparameters
  Discretize each of the $d$ hyperparameters with $n$ points
  Randomly choose $r$ combinations of $(h_2, \ldots, h_d)$
  $A_{\max} \leftarrow 0$, $i_{\mathrm{sweep}} \leftarrow 0$
  while $i_{\mathrm{sweep}} < N_{\mathrm{sweep}}$ do
      for each sweep direction (right, then left) do
          $k \leftarrow 1$
          while $k \le d$ do
              if $k = 1$ then
                  Estimate the accuracy of $n \times r$ elements with all $n$ values of $h_1$ and the $r$ fixed combinations of $(h_2, \ldots, h_d)$
                  if the best estimated accuracy exceeds $A_{\max}$ then update $A_{\max}$
                  MaxVol: find the $r \times r$ submatrix of maximum volume
                  Fix the corresponding $r$ values of $h_1$
              else
                  Estimate the accuracy of $rn \times r$ elements for the $r$ fixed combinations of $(h_1, \ldots, h_{k-1})$ with all $n$ values of $h_k$
                  if the best estimated accuracy exceeds $A_{\max}$ then update $A_{\max}$
                  MaxVol: find the $r \times r$ submatrix of maximum volume
                  Fix the corresponding $r$ combinations of $(h_1, \ldots, h_k)$
              end if
              $k \leftarrow k + 1$
          end while
          Change the index order $(h_1, \ldots, h_d) \rightarrow (h_d, \ldots, h_1)$ and relabel
      end for
      $i_{\mathrm{sweep}} \leftarrow i_{\mathrm{sweep}} + 1$
  end while

The TTO algorithm for the HPO is presented as the Algorithm 1 pseudocode, which also corresponds to Fig. 1(d). The TTO algorithm can be described with the following steps (a minimal code sketch of one sweep is given after the list):

  1. Suppose each of the $d$ hyperparameters $h_j$ is defined on some interval $[a_j, b_j]$, where $j = 1, \ldots, d$. One first discretizes each hyperparameter by defining $n$ points on its interval.

  2. Then, we need to choose the rank $r$. This choice is a trade-off between computational time and accuracy, which respectively favor a small and a large rank.

  3. $r$ combinations of the hyperparameters $(h_2, \ldots, h_d)$,

    $(h_2^{(i)}, h_3^{(i)}, \ldots, h_d^{(i)}), \quad i = 1, \ldots, r,$     (1)

    are chosen at random.

  4. In the next three steps, we implement an iterative process called the "sweep right". The first step of this iterative process is related to the evaluation of the first TT core:

    • The accuracy is estimated for $n \times r$ elements, with all $n$ values of the first hyperparameter $h_1$ and the $r$ combinations of $(h_2, \ldots, h_d)$:

      $A^{(1)}_{ij} = A\big(h_1^{(i)}, h_2^{(j)}, \ldots, h_d^{(j)}\big), \quad i = 1, \ldots, n, \quad j = 1, \ldots, r.$     (2)

    • In this matrix of size $n \times r$ we search for the $r \times r$ submatrix with maximum determinant modulus:

      $\{i_1, \ldots, i_r\} = \mathrm{MaxVol}\big(A^{(1)}\big).$     (3)

      The corresponding $r$ values of the first hyperparameter, $h_1^{(i_1)}, \ldots, h_1^{(i_r)}$, are fixed.

  5. The next step of this iterative process is related to the evaluation of the second TT core:

    • We fix the $r$ values of $h_1$ from the previous step as well as the $r$ combinations from the third step. We then estimate the accuracy of $rn \times r$ elements with all $n$ values of the second hyperparameter $h_2$:

      $A^{(2)}_{(a,i)j} = A\big(h_1^{(a)}, h_2^{(i)}, h_3^{(j)}, \ldots, h_d^{(j)}\big), \quad a = 1, \ldots, r, \quad i = 1, \ldots, n, \quad j = 1, \ldots, r.$     (4)

    • Again, in this matrix of size $rn \times r$ we search for the $r \times r$ submatrix with the maximum determinant modulus:

      $\{(a_1, i_1), \ldots, (a_r, i_r)\} = \mathrm{MaxVol}\big(A^{(2)}\big).$     (5)

      $r$ combinations of the first and the second hyperparameters are fixed.

  6. The $k$-th TT core evaluation:

    • We fix the $r$ combinations from the $(k-1)$-th TT core as well as the $r$ combinations from the third step. We then estimate the accuracy of $rn \times r$ elements with all $n$ values of $h_k$:

      $A^{(k)}_{(a,i)j} = A\big(h_1^{(a)}, \ldots, h_{k-1}^{(a)}, h_k^{(i)}, h_{k+1}^{(j)}, \ldots, h_d^{(j)}\big).$     (6)

    • Again, in this matrix of size $rn \times r$ we search for the $r \times r$ submatrix with the maximum determinant modulus:

      $\{(a_1, i_1), \ldots, (a_r, i_r)\} = \mathrm{MaxVol}\big(A^{(k)}\big).$     (7)

      $r$ combinations of the hyperparameters $(h_1, \ldots, h_k)$ are fixed.

    The end of one "sweep right" is reached.

  7. Similar to step 3, we have $r$ combinations of hyperparameters, but they are no longer random. We next perform a similar procedure in the reverse direction (from the last hyperparameter to the first). This process is called the "sweep left". One first changes the index order:

    $(h_1, h_2, \ldots, h_d) \rightarrow (h_d, h_{d-1}, \ldots, h_1),$     (8)

    and then continues from the fourth step of the TTO algorithm.

  8. A combination of a "sweep right" and a "sweep left" is a full sweep. We perform $N_{\mathrm{sweep}}$ full sweeps in this algorithm.

  9. During all the iterations, we record every new maximum score that is estimated.
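The sketch below is a minimal illustration (not the authors' implementation) of one "sweep right" over a discretized hyperparameter grid, following steps 3-6 and 9 above. It assumes the maxvol function from the earlier snippet and a placeholder objective that stands in for training the model and returning its accuracy; the full algorithm alternates such sweeps with "sweep left" passes obtained by reversing the hyperparameter order, as in steps 7-8.

```python
# Requires the `maxvol` function sketched earlier in the text.
import numpy as np


def tt_sweep_right(objective, grids, rank, rng):
    """One 'sweep right' over d discretized hyperparameters; returns the best
    score seen and the corresponding hyperparameter values."""
    d, n = len(grids), len(grids[0])
    best_val, best_cfg = -np.inf, None

    # Step 3: r random combinations of (h_2, ..., h_d), stored as grid indices.
    suffixes = rng.integers(0, n, size=(rank, d - 1))
    prefixes = [[]]                                # fixed index combinations of (h_1, ..., h_{k-1})

    for k in range(d):                             # steps 4-6: evaluate the k-th TT core
        n_cols = rank if k < d - 1 else 1          # the last core has no free suffix
        scores = np.zeros((len(prefixes) * n, n_cols))
        rows = []                                  # (prefix id, grid index) labelling each row
        for a, pref in enumerate(prefixes):
            for i in range(n):
                rows.append((a, i))
                for j in range(n_cols):
                    idx = list(pref) + [i] + list(suffixes[j, k:])
                    val = objective([grids[m][idx[m]] for m in range(d)])
                    if val > best_val:             # step 9: record every new maximum
                        best_val = val
                        best_cfg = [grids[m][idx[m]] for m in range(d)]
                    scores[a * n + i, j] = val
        if k < d - 1:
            sel = maxvol(scores)                   # r rows of (approximately) maximal volume
            prefixes = [prefixes[rows[s][0]] + [rows[s][1]] for s in sel]
    return best_val, best_cfg


# Example: d = 3 hyperparameters on [0, 1], n = 4 points each, rank r = 2.
rng = np.random.default_rng(0)
grids = [np.linspace(0.0, 1.0, 4) for _ in range(3)]
print(tt_sweep_right(lambda h: -sum((x - 0.5) ** 2 for x in h), grids, rank=2, rng=rng))
```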

3. Benchmarking HPO Methods

Figure 2: Tensor Train (TT) and Grid Search (GS): expected runtime, in maximum objective function evaluations, vs. growth of the problem dimension $d$.

In order to ascertain the solution quality of our proposed method for hyperparameter optimization, we tested it over three black-box objective functions: the Schwefel, Fletcher-Powell, and Vincent functions from the optproblems Python library OptProblems. We ran 100 randomly initialized trials and recorded the average fitness and the maximum number of function evaluations in response to the change in problem size for each objective function. We compared grid search (GS) and tensor train (TT), both tabular methods for hyperparameter optimization. For both methods, we partitioned the hyperparameter ranges with 4 discrete points per hyperparameter. For tensor train we used a fixed rank parameter $r$.
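As a self-contained illustration of this setup, the snippet below writes out Schwefel's function directly (rather than importing it from optproblems) and reuses the tt_sweep_right sketch from above to contrast the number of points visited by grid search with one low-rank TT sweep. The values produced here are illustrative only, not the benchmark results reported in Table 1.

```python
# Requires `tt_sweep_right` (and therefore `maxvol`) from the earlier sketches.
import numpy as np


def schwefel(x):
    """Schwefel's function (minimum 0 near x_i = 420.9687 on [-500, 500]^d)."""
    x = np.asarray(x, dtype=float)
    return 418.9829 * len(x) - np.sum(x * np.sin(np.sqrt(np.abs(x))))


d, n = 3, 4                                        # 3 dimensions, 4 points per dimension
grids = [np.linspace(-500.0, 500.0, n) for _ in range(d)]

# Grid search visits every grid point: n**d = 64 evaluations for d = 3,
# growing exponentially with d.
print("grid search evaluations:", n ** d)

# One TT sweep with rank 2 touches far fewer points; negate to maximize.
rng = np.random.default_rng(1)
best, cfg = tt_sweep_right(lambda h: -schwefel(h), grids, rank=2, rng=rng)
print("TT best fitness found:", -best, "at", cfg)
```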

Schwefel
HPO Method    Average Fitness    d     ER
TT            -541.76            3     32
GS            -541.76            3     64
TT            -1083.53           6     80
GS            -1083.53           6     4092
TT            -1805.89           10    144
GS            -1805.89           10    10000

Fletcher-Powell
HPO Method    Average Fitness    d     ER
TT            5136.64            3     32
GS            4113.78            3     64
TT            23954.5            6     80
GS            14295.2            6     4092
TT            78101.4            10    144
GS            36890.11           10    10000

Vincent
HPO Method    Average Fitness    d     ER
TT            -0.232             3     32
GS            -0.243             3     64
TT            -0.242             6     80
GS            -0.243             6     4092
TT            -0.241             10    144
GS            -0.243             10    10000

Table 1: Comparison of HPO methods for the Schwefel, Fletcher-Powell, and Vincent objective functions. Average fitness values and expected runtimes (ER), in maximum function evaluations, were calculated over 100 runs for varying problem dimensions $d$ (lower is better). Methods obtaining the best average fitness are highlighted in bold, with ties broken by lower ER.
Figure 3: Classical (a) and Hybrid (b) quantum neural network architectures.

4. Car Classification with Hybrid Quantum Neural Networks

Computer vision and classification systems are ubiquitous within the mobility and automotive industries. In this article, we investigate the car classification problem using the car data set Cars provided by the Stanford CS Department. Examples of cars from the data set are shown in Fig. 3. The Stanford Cars data set contains 16,185 images of 196 classes of cars. The data is split into 8,144 training images and 8,041 testing images. The classes are typically at the level of make, model, and year, e.g., Volkswagen Golf Hatchback 1991 or Volkswagen Beetle Hatchback 2012. Since the images in this data set have different sizes, we resized all images to 400 by 400 pixels. In addition, we apply random rotations of up to 15°, random horizontal flips, and normalization to the training data. For the testing data, only normalization is applied.
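A minimal torchvision sketch of this preprocessing is given below; the normalization statistics (standard ImageNet means and standard deviations) are an assumption, since the exact constants are not stated above.

```python
from torchvision import transforms

# ImageNet statistics, assumed here because the model backbone is pretrained on ImageNet.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

train_transform = transforms.Compose([
    transforms.Resize((400, 400)),        # unify the image sizes
    transforms.RandomRotation(15),        # random rotation of up to 15 degrees
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])

test_transform = transforms.Compose([     # test data: normalization only
    transforms.Resize((400, 400)),
    transforms.ToTensor(),
    normalize,
])
```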

We use transfer learning to solve the car classification problem. Transfer learning is a powerful method for training neural networks in which experience in solving one problem helps in solving another problem Transfer_Learning. In our case, the ResNet (Residual Neural Network) ResNet, pretrained on the ImageNet data set ImageNet, is used as the base model. One can fix the weights of the base model, but if the base model is not flexible enough, one can "unfreeze" certain layers and make them trainable. Training deep networks is challenging due to the vanishing gradient problem, but ResNet addresses this problem with so-called residual blocks: inputs are passed to the next layer in the residual block, so that deeper layers can see information about the input data. ResNet has established itself as a robust network architecture for solving image classification problems. We downloaded ResNet34 via PyTorch PyTorch, where the number after the model name, 34, indicates the number of layers in the network.

As shown in Fig. 3(a), in the classical network we add three fully-connected layers after ResNet34. Each output neuron corresponds to a particular class of the classification problem, e.g., Volkswagen Golf Hatchback 1991 or Volkswagen Beetle Hatchback 2012. The output neuron with the largest value determines the output class. Since the output of ResNet34 is composed of 512 features, the first fully-connected layer consists of 512 input neurons plus a bias neuron and $n_1$ output features. The second fully-connected layer connects $n_1$ input neurons plus a bias neuron with $n_2$ output features. The values of $n_1$ and $n_2$ can vary, thus changing the number of weights in the classical network. Since the network classifies $N$ classes in the general case, the third fully-connected layer takes $n_2$ neurons plus a bias neuron as input and feeds $N$ neurons as output.
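The following PyTorch sketch shows one way to build such a classical head on top of a pretrained ResNet34. The symbols n1 and n2 follow the description above, while the ReLU activations between the fully-connected layers are an assumption not specified in the text.

```python
import torch.nn as nn
from torchvision import models


def build_classical_model(n1, n2, num_classes=2, freeze_backbone=True):
    backbone = models.resnet34(pretrained=True)     # weights pretrained on ImageNet
    if freeze_backbone:
        for p in backbone.parameters():
            p.requires_grad = False                 # keep the base model fixed
    backbone.fc = nn.Sequential(                    # ResNet34 outputs 512 features
        nn.Linear(512, n1),
        nn.ReLU(),                                  # assumed activation between FC layers
        nn.Linear(n1, n2),
        nn.ReLU(),
        nn.Linear(n2, num_classes),                 # one output neuron per class
    )
    return backbone
```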

In the hybrid analog, as shown in Fig. 3(b), we replace the second fully-connected layer with a quantum one. It is worth noting that the number of qubits needed for the efficient operation of the model is initially unknown. In the quantum layer, the Hadamard transform is applied to each qubit, and then the input data is encoded into the angles of single-qubit rotations. A variational layer consists of CNOT gates and rotations along the $x$-, $y$-, and $z$-axes. The number of variational layers can vary; accordingly, the number of weights in the hybrid network can also change. The measurement is made in the $Z$-basis: for each qubit, the local expectation value of the Pauli-$Z$ operator is measured. This produces a classical output vector, suitable for additional post-processing. Since the optimal number of variational layers (i.e., the depth of the quantum circuit) and the optimal number of qubits are not known in advance, we choose these values as hyperparameters.
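The paper's quantum layer was run with the basiq SDK on the QMware cloud; purely as an illustration, the sketch below reproduces the described circuit structure in PennyLane (Hadamards, angle encoding of the inputs, a variable number of variational layers of CNOTs and x-, y-, z-rotations, and Pauli-Z expectation values). The choice of the encoding rotation axis and the ring-shaped CNOT pattern are assumptions.

```python
import pennylane as qml


def make_quantum_layer(n_qubits, depth):
    dev = qml.device("default.qubit", wires=n_qubits)

    @qml.qnode(dev, interface="torch")
    def circuit(inputs, weights):
        for q in range(n_qubits):
            qml.Hadamard(wires=q)
            qml.RY(inputs[q], wires=q)              # angle encoding (axis is an assumption)
        for layer in range(depth):                  # `depth` variational layers
            for q in range(n_qubits):
                qml.CNOT(wires=[q, (q + 1) % n_qubits])   # assumed ring entanglement
            for q in range(n_qubits):
                qml.RX(weights[layer, q, 0], wires=q)
                qml.RY(weights[layer, q, 1], wires=q)
                qml.RZ(weights[layer, q, 2], wires=q)
        return [qml.expval(qml.PauliZ(q)) for q in range(n_qubits)]

    weight_shapes = {"weights": (depth, n_qubits, 3)}   # 3 * n_qubits * depth trainable angles
    return qml.qnn.TorchLayer(circuit, weight_shapes)   # usable as a torch.nn module

# The layer can then replace the second fully-connected layer, e.g.
# nn.Sequential(nn.Linear(512, n_qubits), make_quantum_layer(n_qubits, depth), nn.Linear(n_qubits, 2)).
```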

We use the cross-entropy as a loss function,

$L = -\sum_{c=1}^{M} y_c \log(p_c),$     (9)

where $p_c$ is the prediction probability, $y_c$ is 0 or 1, determining respectively whether the image belongs to the predicted class, and $M$ is the number of classes. We run our model for 10 epochs and apply weight decay and gradient clipping to prevent interference from large gradient or weight values. We use the Adam optimizer Adam; kingma2014adam and reduce the learning rate after several epochs. There is no one-size-fits-all rule for choosing a learning rate. Moreover, in most cases, dynamic control of the learning rate of a neural network can significantly improve the efficiency of the backpropagation algorithm. For these reasons, we choose the initial learning rate, the period of the learning rate decay, and the multiplicative factor of the learning rate decay as hyperparameters. In total, together with the number of variational layers and the number of qubits, we optimize five hyperparameters, presented in Table 2, to improve the accuracy of solving the car classification problem.
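A minimal PyTorch sketch of this training setup (cross-entropy loss, Adam with weight decay, a step learning-rate schedule, and gradient clipping) is shown below. The concrete hyperparameter values are placeholders rather than the HPO results of Table 2, and train_loader is assumed to be a DataLoader over the preprocessed car images.

```python
import torch


def train(model, train_loader, lr=1e-3, step_size=3, gamma=0.1, epochs=10):
    """Training-loop sketch; all numeric values are placeholders."""
    criterion = torch.nn.CrossEntropyLoss()                                    # Eq. (9)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=gamma)
    for _ in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)   # gradient clipping
            optimizer.step()
        scheduler.step()          # decay the learning rate every `step_size` epochs by `gamma`
    return model
```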

Hyperparameter Label Range Hybrid HPO values Classical HPO values
number of qubits, number of neurons
depth of quantum circuit
number of neurons
initial learning rate
step of learning rate
multiplicative factor of learning rate decay
Table 2: The table shows which hyperparameters are being optimized, their labels, limits of change, and the best values found during HPO.
Figure 4: (a) Dependence of the accuracy on the number of HPO iterations. TTO for the hybrid model found a set of hyperparameters that gives an accuracy of 0.852 after 6 iterations and 0.977 after 18 iterations; for the classical model it found 0.977 after 6 iterations. Grid search for the hybrid model found a set of hyperparameters that gives an accuracy of 0.989 after 75 iterations; for the classical model it found 0.920 after 75 iterations. (b) Dependence of the accuracy on the number of epochs with the optimal set of hyperparameters found.

5. Simulation Results

We next perform a simulation of the hybrid quantum residual neural network described in the previous section. The simulation is compared to its classical analog, the residual neural network, on a test car classification task. Because of the limited number of qubits available and computational time constraints, we used a binary classification between two classes, Volkswagen Golf Hatchback 1991 and Volkswagen Beetle Hatchback 2012, to compare the classical and hybrid networks fairly. In total, we used 88 testing images and 89 training images. Both the hybrid quantum neural network (HQNN) model and the classical NN model were used together with the GS and TTO methods for hyperparameter optimization. All machine learning simulations were carried out on the QMware cloud, on which the classical part was implemented with the PyTorch framework and the quantum part was implemented with the basiq SDK QMWare. The results of the simulations are shown in Fig. 4.

Fig. 4(a) shows the dependence of the accuracy on the number of HPO iterations on the test data, where one HPO iteration is one run of the model. Green shows the dependence of accuracy on the number of iterations for the HQNN, and blue shows it for the classical NN. As one can see from Fig. 4(a), TTO works more efficiently than GS and finds hyperparameters that give an accuracy above 0.9 in fewer iterations. The HQNN with TTO (marked with green crosses) finds a set of hyperparameters that yields 97.7% accuracy within 18 iterations. As for GS (marked with a solid green line), it took 44 iterations to pass the threshold of 98% accuracy.

TTO finds, in 6 iterations, a set of hyperparameters for the classical NN that gives an accuracy of 97.7%, which is the same as the accuracy given by the set of hyperparameters for the HQNN found in 18 iterations. As for GS, the optimization for the HQNN clearly works more efficiently than for the classical NN: the optimization of the HQNN requires fewer iterations to achieve higher accuracy than the optimization of the classical NN. A possible reason is that a quantum layer with a relatively large number of qubits and a greater depth works better than its classical counterpart.

Figure 5: Examples of test car images that were correctly classified by the hybrid quantum residual neural network.

The best values found during HPO are displayed in Table 2. The quantum circuit corresponding to the optimal set of hyperparameters has 52 variational parameters, leading to a total of 6749 weights in the HQNN. The classical NN has 9730 weights. Therefore, there are significantly fewer weights in the HQNN than in the classical NN. Nevertheless, as can be seen from Fig. 4(b), the HQNN, with the hyperparameters found using GS, reaches the highest overall accuracy (98.9%). Fig. 5 shows examples of car images that were classified correctly by the HQNN model. The HQNN with an optimized set of hyperparameters achieved an accuracy of 0.989.

Discussion

We introduced two new ML developments for image recognition. First, we presented a quantum-inspired method based on the tensor train decomposition for choosing ML model hyperparameters. This decomposition enabled us to optimize hyperparameters similarly to other tabular search methods, e.g., grid search, but required only $O(dnr^2)$ hyperparameter choices instead of the $n^d$ required by the grid search method. We verified this method on various black-box functions and found that the tensor train method achieved comparable results in average fitness, with a reduced expected run time for most of the test functions compared to grid search. This indicates that the method may be useful for high-dimensional hyperparameter searches over expensive black-box functions. Future work could investigate using this method in combination with a local search heuristic, where the tensor train optimizer performs a sweep over a larger search space within a budget and seeds another optimization routine for a local search around this region. This method could also be applied to successive halving algorithms by decomposing the search space to find the optimal ratio of budget to number of configurations. Future work could investigate these applications in more detail.

Second, we presented a hybrid quantum neural network model for supervised learning. The hybrid model consists of the combination of ResNet34 and a quantum circuit part, whose size and depth become hyperparameters. The size and flexibility of the hybrid ML model allowed us to apply it to car image classification. The hybrid ML model with GS reached an accuracy of 0.989 after 75 iterations in our binary classification tests with images of the Volkswagen Golf Hatchback 1991 and the Volkswagen Beetle Hatchback 2012. This accuracy was better than that of a comparable classical ML model with GS, which showed an accuracy of 0.920 after 75 iterations. In the same test, the hybrid ML model with TTO showed an accuracy of 0.977 after 18 iterations, whereas the comparable classical ML model with TTO showed the same accuracy of 0.977 after 6 iterations. Our developments provide new ways of using quantum and quantum-inspired methods in practical industry problems. In future research, exploring the sample complexity of the hybrid quantum model is of importance, in addition to generalization bounds of quantum models similar to the research in Ref. caro2021generalization. Future work could also entail investigating state-of-the-art improvements in hyperparameter optimization for classical and quantum-hybrid neural networks and other machine learning models by leveraging quantum-inspired or quantum-enhanced methods.

References