Introduction
The field of quantum computing has seen large leaps in building usable quantum hardware during the past decade. As one of the first vendors, D-Wave provided access to a quantum device that can solve specific types of optimization problems johnson2011quantum. Motivated by this, quantum computing has not only received much attention in the research community, but has also started to be perceived as a valuable technology in industry. Volkswagen published a pioneering result on using the D-Wave quantum annealer to optimize traffic flow in 2017 neukart2017traffic, which prompted a number of works by other automotive companies mehta2019quantum; ohzeki2019control; yarkoni2021multi. Since then, quantum annealing has been applied to a number of industry-related problems in areas like chemistry streif2019solving; xia2017electronic, aviation stollenwerk2019quantum, logistics feld2019hybrid and finance grant2021benchmarking. Aside from quantum annealing, gate-based quantum devices have gained increased popularity, not least after the first demonstration of a quantum device outperforming its classical counterparts arute2019quantum. A number of industry-motivated works have since been published in the three main application areas that are currently of interest for gate-based quantum computing: optimization streif2021beating; streif2020training; amaro2021case; dalyac2021qualifying; luckow2021quantum, quantum chemistry and simulation arute2020hartree; malone2021towards, and machine learning rudolph2020generation; skolik2021layerwise; skolik2021quantum; peters2021machine; alcazar2020classical. Research in the industrial context has been largely motivated by noisy intermediate-scale quantum (NISQ) devices, i.e., early quantum devices with a small number of qubits and no error correction. In this regime, variational quantum algorithms (VQAs) have been identified as the most promising candidate for near-term advantage due to their robustness to noise cerezo2021variational. In a VQA, a parametrized quantum circuit (PQC) is optimized by a classical outer loop to solve a specific task like finding the ground state of a given Hamiltonian or classifying data based on given input features. As qubit numbers are expected to stay relatively low within the next years, hybrid alternatives to models realized purely by PQCs have been explored zhang2021variational; mari2020transfer; zhao2019qdnn; dou2021unsupervised; sebastianelli2021circuit; pramanik2021quantum. In these works, a quantum model is combined with a classical model and optimized end-to-end to solve a specific task. In the context of machine learning, this means that a PQC and a neural network (NN) are trained together as one model, where the NN can be placed either before or after the PQC in the chain of execution. When the NN comes first, it can act as a dimensionality reduction technique for the quantum model, which can then be implemented with relatively few qubits.
In this work, we use a hybrid quantum-classical model to perform image classification on a subset of the Stanford Cars data set KrauseStarkDengFeiFei_3DRR2013. Image classification is a ubiquitous problem in the automotive industry, and can be used for tasks like sorting out parts with defects. Supervised learning algorithms for classification have also been extensively studied in the quantum literature havlivcek2019supervised; schuld2019quantum; schuld2020circuit; rebentrost2014quantum, and it has been proven that there exist specific learning tasks based on the discrete logarithm problem where a separation between quantum and classical learners exists for classification liu2021rigorous. While the separation in liu2021rigorous is based on Shor’s algorithm and is therefore not expected to transfer to realistic learning tasks such as the car classification mentioned above, it motivates further experimental study of quantum-enhanced models for classification on real-world data sets.
In combining PQCs and classical NNs into hybrid quantum-classical models, we encounter a challenge in searching for hyperparameter configurations that produce performance gains in terms of model accuracy and training. Hyperparameters are values that are set for the model and do not change during training. They may include variables such as the learning rate, decay rates, the choice of optimizer for the model, the number of qubits, or layer sizes. In practice, these parameters are often selected by experts based upon some a priori knowledge and trial-and-error. This limits the search space, but in turn can lead to a suboptimal model configuration.
Hyperparameter optimization is the process of automating the search for the best set of hyperparameters. It reduces the need for expert knowledge in configuring models, at the cost of the additional computation required to evaluate candidate configurations in search of an optimum. In the 1990s, researchers reported performance gains leveraging a wrapper method, which tuned parameters for specific models and data sets using best-first search and cross validation KOHAVI1995304. In more recent years, researchers have proposed search algorithms using bandits Li2017, which leverage early stopping methods. Successive Halving algorithms such as the one introduced in pmlrv28karnin13 and the parallelized version introduced in Li2018 allocate more resources to more promising configurations. Sequential model-based optimization leverages Bayesian optimization with an aggressive dual racing mechanism, and has also shown performance improvements for hyperparameter optimization Hutter2011; Lindauer_Hutter_2018. Evolutionary and population-based heuristics for black-box optimization have also achieved state-of-the-art results when applied to hyperparameter optimization in numerous black-box optimization competitions vermetten2020sequential; back1996; awad2020squirrel. In recent years, a whole field has formed around automating the process of finding optimal hyperparameters for machine learning models, with some prime examples being neural architecture search elsken2019neural and automated machine learning (AutoML) hutter2019automated. Automating the search of hyperparameters in a quantum machine learning (QML) context has also started to attract attention, and the authors of berganza2022towards have explored the first version of AutoQML.
Our contribution in this paper is not only to examine the performance gains of hybrid quantum-classical models vs. purely classical ones, but also to investigate whether quantum-enhanced or quantum-inspired methods may offer an advantage in automating the search over the configuration space of the models. We show a reduction in computational complexity with regard to expected run times and evaluations for various configurations of models, the high cost of which motivates this investigation. We investigate using the tensor train decomposition for searching the hyperparameter space of the hybrid quantum neural network (HQNN), framed as a global optimization problem as in globalOptimizationTensorTrains. This method has been successful in optimizing models of social networks in sergey2019tensor, and as a method of compression for deep neural networks WANG2021320.
Results
.1 Hyperparameter Optimization
The problem of hyperparameter optimization (HPO) is described schematically in Fig. 1(a). Given a certain data set and a machine learning (ML) model, the model demonstrates an accuracy $A(\bar{h})$ that depends on the hyperparameters $\bar{h}$. To achieve the best possible model accuracy, one has to optimize the hyperparameters. To perform HPO, an unknown black-box function $A(\bar{h})$ has to be explored. The exploration is an iterative process, where at each iteration the HPO algorithm provides a set of hyperparameters $\bar{h}$ and receives the corresponding model accuracy $A(\bar{h})$. As a result of this iterative process, the HPO algorithm outputs the best achieved performance $A(\bar{h}_{\max})$ with the corresponding hyperparameters $\bar{h}_{\max}$.
HPO can be organized in different ways. One of the standard methods for HPO is the tabular method of grid search (GS), also known as a parameter sweep (Fig. 1(b)). To illustrate how a grid search works, we have chosen two hyperparameters: the learning rate and the multiplicative factor of the learning rate. They are plotted along the x-axis and the y-axis, respectively. The color on the contour shows the accuracy of the model for the two given hyperparameters, changing from light pink (the lowest accuracy) to dark green (the highest accuracy). In the GS method, the hyperparameter values are discretized, which results in a grid of values shown as big dots. The GS algorithm goes through all the values from this grid with the goal of finding the maximum accuracy. As one can see in this figure, there are only three points at which this method can find a high accuracy with 25 iterations (shown as 25 points in Fig. 1(b)). This example shows that there could be a better tabular HPO method in terms of the best achievable accuracy and the number of iterations used.
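The 25-point sweep described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: `toy_accuracy` is a hypothetical stand-in for training the model and returning its accuracy, and the grid values are made up.

```python
import itertools


def grid_search(score, grids):
    """Evaluate `score` on every grid point; return (best score, best point, evaluations)."""
    best_score, best_point, n_evals = float("-inf"), None, 0
    for point in itertools.product(*grids):
        value = score(*point)
        n_evals += 1
        if value > best_score:
            best_score, best_point = value, point
    return best_score, best_point, n_evals


def toy_accuracy(lr, gamma):
    """Hypothetical accuracy surface with its maximum at lr=0.010, gamma=0.5."""
    return 1.0 - 1e3 * (lr - 0.010) ** 2 - (gamma - 0.5) ** 2


lr_grid = [0.001, 0.004, 0.007, 0.010, 0.013]   # assumed learning rates
gamma_grid = [0.1, 0.3, 0.5, 0.7, 0.9]          # assumed decay factors
best_score, best_point, n_evals = grid_search(toy_accuracy, [lr_grid, gamma_grid])
print(best_point, n_evals)
```

The cost is fixed in advance: every one of the 5 x 5 grid points is evaluated, regardless of what earlier evaluations revealed.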
.2 Tensor train approach to hyperparameter optimization
Here, we propose a quantum-inspired approach to hyperparameter optimization based on tensor train (TT) programming. The TT approach was initially introduced in the context of quantum many-body system analysis, e.g., for finding a ground state with minimal energy of multi-particle Hamiltonians via the Density Matrix Renormalization Group (DMRG) DMRG. In this approach, the ground state is represented in the TT format, often referred to as the Matrix Product State in physics MPS. We employ the TT representation (shown in Fig. 1(c)) in another way here, and use it for hyperparameter optimization. As one can see in Fig. 1(c), the TT is represented as a multiplication of tensors, where an individual tensor is shown as a circle with a number of “legs” that corresponds to the rank of the tensor. The circles at the two ends of the train are matrices of dimension $n \times r$, and each circle in between is a rank-3 tensor of dimensions $r \times n \times r$, where $n$ is the number of discretization points per hyperparameter and $r$ is the TT rank. The two arrows in Fig. 1(c) illustrate sweeps right and left along the TT, which refer to the algorithm described below. Leveraging the locality of the problem, i.e., a small correlation between hyperparameters, we perform the black-box optimization based on the cross-approximation technique for tensors tt_cross; tt_min.
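To make the "multiplication of tensors" concrete, the sketch below evaluates a single element of a tensor stored in TT format by multiplying one slice from each core. The core layout (first core $n \times r$, inner cores $r \times n \times r$, last core $r \times n$) and all names are assumptions chosen for illustration. The example tensor, with entries $i + j + k$, is a toy case that happens to have TT rank 2.

```python
def tt_element(cores, index):
    """Return T[index] for a tensor given by its TT cores (nested lists)."""
    vector = list(cores[0][index[0]])              # row of the first core, length r
    for core, i in zip(cores[1:-1], index[1:-1]):
        # Multiply the row vector by the r x r slice core[:, i, :].
        vector = [sum(vector[a] * core[a][i][b] for a in range(len(vector)))
                  for b in range(len(core[0][i]))]
    last = cores[-1]
    return sum(vector[a] * last[a][index[-1]] for a in range(len(vector)))


n = 3  # points per dimension (assumed)
cores = [
    [[i, 1] for i in range(n)],                                # n x 2
    [[[1, 0] for _ in range(n)], [[j, 1] for j in range(n)]],  # 2 x n x 2
    [[1] * n, list(range(n))],                                 # 2 x n
]
print(tt_element(cores, (2, 1, 0)))  # entry i + j + k at (2, 1, 0)
```

Each element costs only a few small matrix-vector products, which is what makes it practical to work with the train instead of the full tensor.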
Similar to the previously discussed GS method, in TT optimization (TTO) we discretize the hyperparameter space and then consider a tensor composed of scores that can be estimated by running an ML model with the corresponding set of hyperparameters. However, compared to GS, the TT method is dynamic, which means that the next set of points to evaluate in the hyperparameter space is chosen based on the knowledge accumulated during all previous evaluations. With TTO we do not estimate all the scores available to the model. Instead, we approximate the score tensor via a TT, referring to a limited number of tensor elements using the cross-approximation method tt_cross. During the process, new sets of hyperparameters for which the model needs to be evaluated are determined using the MaxVol routine maxvol. The MaxVol routine is an algorithm that finds an $r \times r$ submatrix of maximum volume, i.e., a square matrix with maximum determinant modulus, in an $n \times r$ matrix.
Hyperparameters are changed in an iterative process, in which one is likely to find a better accuracy after each iteration, and thus find a good set of hyperparameters. Notably, the TTO algorithm requires an estimate of $O(dnr^2)$ elements and $O(dnr^3)$ calculations, where $d$ is the number of hyperparameters, $n$ is the number of discretization points, and $r$ is a fixed rank. By comparison, the GS algorithm requires an estimate of $O(n^d)$ elements, so one expects practical advantages for TTO, especially with a large number of hyperparameters.
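The gap between the two scalings is easy to tabulate. The snippet below assumes the leading-order counts $d n r^2$ for TTO and $n^d$ for GS, with $d$ the number of hyperparameters, $n$ the number of discretization points per hyperparameter and $r$ the rank. Constant factors and the number of sweeps are ignored, and the values of `n` and `r` are chosen purely for illustration.

```python
def tt_evaluations(d, n, r):
    """Leading-order number of model evaluations for TTO: d * n * r**2."""
    return d * n * r ** 2


def gs_evaluations(d, n):
    """Number of model evaluations for an exhaustive grid: n**d."""
    return n ** d


n, r = 10, 2  # assumed values, for illustration only
for d in (2, 4, 6, 8):
    print(d, tt_evaluations(d, n, r), gs_evaluations(d, n))
```

The TTO count grows linearly in the number of hyperparameters, while the grid grows exponentially, so the difference becomes decisive beyond a handful of hyperparameters.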
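The TTO algorithm for HPO is presented as pseudocode in Algorithm 1, which also corresponds to Fig. 1(d). Throughout, $d$ denotes the number of hyperparameters, $n$ the number of discretization points per hyperparameter, $r$ the TT rank, and $A$ the model accuracy. The TTO algorithm can be described with the following steps:

1. Suppose each of the $d$ hyperparameters is defined on some interval $h_i \in [h_i^{\min}, h_i^{\max}]$, where $i \in [1, d]$. One first discretizes each of the $d$ hyperparameters by defining $n$ points
$$\{h_i(1), h_i(2), \dots, h_i(n)\}_{i=1}^{d}.$$

2. Then, we need to choose the rank $r$. This choice is a trade-off between computational time and accuracy, which respectively require a small and a large rank.

3. $r$ combinations of
$$\{h_2^{j}, h_3^{j}, \dots, h_d^{j}\}_{j=1}^{r} \tag{1}$$
are chosen.

4. In the next three steps, we implement an iterative process called the “sweep right”. The first step of this iterative process is related to the first TT core evaluation:
   (a) The accuracy of $nr$ elements is estimated with all $n$ values of the first hyperparameter $\{h_1(i_1)\}_{i_1=1}^{n}$ and for the $r$ combinations of $\{h_2^{j}, h_3^{j}, \dots, h_d^{j}\}_{j=1}^{r}$:
   $$\{A(h_1(i_1), h_2^{j}, \dots, h_d^{j})\}_{i_1=1,\, j=1}^{n,\, r} \tag{2}$$
   (b) In this matrix of size $n \times r$ we search for an $r \times r$ submatrix with maximum determinant modulus:
   $$\{A(h_1^{i_1}, h_2^{j}, \dots, h_d^{j})\}_{i_1=1,\, j=1}^{r,\, r} \tag{3}$$
   The corresponding $r$ values of the first hyperparameter, $\{h_1^{i_1}\}_{i_1=1}^{r}$, are fixed.

5. The next step of this iterative process is related to the second TT core evaluation:
   (a) We fix the $r$ values $\{h_1^{i_1}\}_{i_1=1}^{r}$ of the previous step as well as the $r$ combinations $\{h_3^{j}, \dots, h_d^{j}\}_{j=1}^{r}$ of the third step. We then estimate the accuracy of the $nr^2$ elements with all $n$ values of the second hyperparameter $\{h_2(i_2)\}_{i_2=1}^{n}$:
   $$\{A(h_1^{i_1}, h_2(i_2), h_3^{j}, \dots, h_d^{j})\}_{i_1=1,\, i_2=1,\, j=1}^{r,\, n,\, r} \tag{4}$$
   (b) Again, in this matrix of size $nr \times r$ we search for an $r \times r$ submatrix with the maximum determinant modulus:
   $$\{A(h_1^{i_2}, h_2^{i_2}, h_3^{j}, \dots, h_d^{j})\}_{i_2=1,\, j=1}^{r,\, r} \tag{5}$$
   The $r$ combinations $\{(h_1^{i_2}, h_2^{i_2})\}_{i_2=1}^{r}$ of the first and the second hyperparameters are fixed.

6. The $(d-1)$-th TT core evaluation:
   (a) We fix the $r$ combinations $\{(h_1^{i_{d-2}}, \dots, h_{d-2}^{i_{d-2}})\}_{i_{d-2}=1}^{r}$ of the $(d-2)$-th TT core as well as the $r$ combinations $\{h_d^{j}\}_{j=1}^{r}$ of the third step. We then estimate the accuracy of the $nr^2$ elements with all $n$ values of the hyperparameter $\{h_{d-1}(i_{d-1})\}_{i_{d-1}=1}^{n}$:
   $$\{A(h_1^{i_{d-2}}, \dots, h_{d-2}^{i_{d-2}}, h_{d-1}(i_{d-1}), h_d^{j})\}_{i_{d-2}=1,\, i_{d-1}=1,\, j=1}^{r,\, n,\, r} \tag{6}$$
   (b) Again, in this matrix of size $nr \times r$ we search for an $r \times r$ submatrix with the maximum determinant modulus:
   $$\{A(h_1^{i_{d-1}}, \dots, h_{d-1}^{i_{d-1}}, h_d^{j})\}_{i_{d-1}=1,\, j=1}^{r,\, r} \tag{7}$$
   The $r$ combinations $\{(h_1^{i_{d-1}}, \dots, h_{d-1}^{i_{d-1}})\}_{i_{d-1}=1}^{r}$ of hyperparameters are fixed. The end of one “sweep right” is reached.

7. Similar to step 3, we have $r$ combinations of hyperparameters, but they are not random anymore. We next perform a similar procedure in the reverse direction (from the last hyperparameter to the first). The process is called the “sweep left”. One first changes the index order:
$$h_i \to h_{d-i+1}, \quad i = 1, \dots, d, \tag{8}$$
and then continues from the fourth step of the TTO algorithm.

8. A combination of the “sweep right” and the “sweep left” is a full sweep. We perform a fixed number of full sweeps in this algorithm.

9. During all the iterations, we record every new maximum score that we estimate.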
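The “sweep right” can be sketched in plain Python as follows. This is an illustrative sketch under several assumptions, not the implementation used in this work. An exhaustive search over row subsets stands in for the MaxVol routine, which is only viable for tiny matrices. `toy_score` is a hypothetical replacement for training and evaluating the model, and the grid, rank and random seed are arbitrary.

```python
import itertools
import random


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [list(row) for row in matrix]
    size, result = len(m), 1.0
    for c in range(size):
        p = max(range(c, size), key=lambda i: abs(m[i][c]))
        if m[p][c] == 0:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            result = -result          # a row swap flips the sign
        result *= m[c][c]
        for i in range(c + 1, size):
            f = m[i][c] / m[c][c]
            for j in range(c, size):
                m[i][j] -= f * m[c][j]
    return result


def max_volume_rows(matrix, r):
    """Exhaustive stand-in for MaxVol: r rows whose r x r submatrix has the largest |det|."""
    return max(itertools.combinations(range(len(matrix)), r),
               key=lambda rows: abs(det([matrix[i] for i in rows])))


def sweep_right(score, grids, r, rng):
    """One sweep over cores 1..d-1; returns (best score, best point, fixed left parts)."""
    d = len(grids)
    # Step 3: r random combinations of hyperparameters 2..d.
    rights = [tuple(rng.choice(g) for g in grids[1:]) for _ in range(r)]
    lefts = [()]  # fixed combinations of the hyperparameters already swept
    best_score, best_point = float("-inf"), None
    for k in range(d - 1):
        rows = [left + (h,) for left in lefts for h in grids[k]]
        cols = [right[k:] for right in rights]
        matrix = []
        for row in rows:
            values = []
            for col in cols:
                value = score(row + col)
                values.append(value)
                if value > best_score:  # step 9: record every new maximum
                    best_score, best_point = value, row + col
            matrix.append(values)
        lefts = [rows[i] for i in max_volume_rows(matrix, r)]
    return best_score, best_point, lefts


def toy_score(point):
    """Hypothetical score with its maximum at 0.5 in every coordinate."""
    return -sum((x - 0.5) ** 2 for x in point)


grids = [[0.0, 0.25, 0.5, 0.75]] * 3
best_score, best_point, lefts = sweep_right(toy_score, grids, r=2, rng=random.Random(0))
print(best_score, best_point, lefts)
```

A “sweep left” would apply the same routine to the reversed list of grids, starting from the combinations fixed here instead of random ones.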
.3 Benchmarking HPO Methods
To ascertain the solution quality of our proposed method for hyperparameter optimization, we tested it on three black-box objective functions: the Schwefel, Fletcher-Powell, and Vincent functions from the optproblems Python library OptProblems. We ran 100 randomly initialized trials and recorded the average fitness and the maximum number of function evaluations as the problem size changed for each objective function. We compared grid search (GS) and tensor train (TT), both tabular methods for hyperparameter optimization. For both tensor train and grid search, we partitioned the hyperparameter ranges into 4 discrete points per hyperparameter. For tensor train, we additionally fixed the rank parameter.
Schwefel
HPO Method  Average Fitness  Problem size  ER
TT          541.76           3             32
GS          541.76           3             64
TT          1083.53          6             80
GS          1083.53          6             4092
TT          1805.89          10            144
GS          1805.89          10            10000

Fletcher-Powell
HPO Method  Average Fitness  Problem size  ER
TT          5136.64          3             32
GS          4113.78          3             64
TT          23954.5          6             80
GS          14295.2          6             4092
TT          78101.4          10            144
GS          36890.11         10            10000

Vincent
HPO Method  Average Fitness  Problem size  ER
TT          0.232            3             32
GS          0.243            3             64
TT          0.242            6             80
GS          0.243            6             4092
TT          0.241            10            144
GS          0.243            10            10000
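As a sanity check on the grid search numbers, the sketch below runs an exhaustive grid over a Schwefel-type function for problem size 3. The fitness definition (the sum of $x_i \sin\sqrt{|x_i|}$, to be maximized), the domain $[-500, 500]$ and the equally spaced grid are assumptions made here; the optproblems implementation may use a different sign or offset convention. Under these assumptions the grid maximum agrees with the 541.76 listed for GS at problem size 3, reached with 64 evaluations.

```python
import itertools
import math


def schwefel_fitness(x):
    """Assumed fitness: sum of x_i * sin(sqrt(|x_i|)), to be maximized."""
    return sum(xi * math.sin(math.sqrt(abs(xi))) for xi in x)


def grid_points(low, high, n):
    """n equally spaced points on [low, high], end points included."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


d, n = 3, 4                           # problem size and points per dimension
axis = grid_points(-500.0, 500.0, n)  # assumed search domain
evaluated = [(schwefel_fitness(p), p) for p in itertools.product(axis, repeat=d)]
best_fitness, best_point = max(evaluated)
print(len(evaluated), round(best_fitness, 2), best_point)
```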
.4 Car Classification with Hybrid Quantum Neural Networks
Computer vision and classification systems are ubiquitous within the mobility and automotive industries. In this article, we investigate the car classification problem using the Cars data set Cars provided by the Stanford CS Department. Examples of cars in the data set are shown in Fig. 3. The Stanford Cars data set contains 16,185 images of 196 classes of cars. The data is split into 8,144 training images and 8,041 testing images. The classes are typically at the level of Make, Model, Year, e.g., Volkswagen Golf Hatchback 1991 or Volkswagen Beetle Hatchback 2012. Since the images in this data set have different sizes, we resized all images to 400 by 400 pixels. In addition, we applied random rotations of at most 15°, random horizontal flips, and normalization to the training data. For the testing data, only normalization was applied.
We use transfer learning to solve the car classification problem. Transfer learning is a powerful method for training neural networks in which experience in solving one problem helps in solving another problem Transfer_Learning. In our case, the ResNet (Residual Neural Network) ResNet is pretrained on the ImageNet data set ImageNet, and is used as a base model. One can fix the weights of the base model, but if the base model is not flexible enough, one can “unfreeze” certain layers and make them trainable. Training deep networks is challenging due to the vanishing gradient problem, but ResNet solves this problem with so-called residual blocks: inputs are passed to the next layer in the residual block. In this way, deeper layers can see information about the input data. ResNet has established itself as a robust network architecture for solving image classification problems. We downloaded ResNet34 via PyTorch PyTorch, where the number after the model name, 34, indicates the number of layers in the network.
As shown in Fig. 3(a), in the classical network we add three fully-connected layers after ResNet34. Each output neuron corresponds to a particular class of the classification problem, e.g., Volkswagen Golf Hatchback 1991 or Volkswagen Beetle Hatchback 2012. The output neuron with the largest value determines the output class. Since the output from ResNet34 is composed of 512 features, the first fully-connected layer consists of 512 input neurons and a bias neuron and $n$ output features. The second fully-connected layer connects $n$ input neurons and a bias neuron with $m$ output features. The values of $n$ and $m$ can vary, thus changing the number of weights in the classical network. Since the network classifies $k$ classes in the general case, the third fully-connected layer takes $m$ neurons and a bias neuron as input and feeds $k$ neurons as output.
In the hybrid analog, shown in Fig. 3(b), we replace the second fully-connected layer with a quantum one. It is worth noting that the number of qubits needed for the efficient operation of the model is initially unknown. In the quantum layer, the Hadamard transform is applied to each qubit, then the input data is encoded into the angles of single-qubit rotations. The variational layer consists of the application of the CNOT gate and rotations about the $x$, $y$, and $z$ axes. The number of variational layers can vary. Accordingly, the number of weights in the hybrid network can also change. Finally, each qubit is measured: for each qubit, the local expectation value of the corresponding Pauli operator is obtained. This produces a classical output vector, suitable for additional post-processing. Since the optimal number of variational layers (the depth of the quantum circuit) and the optimal number of qubits are not known in advance, we choose these values as hyperparameters.
We use the cross-entropy as a loss function
$$l = -\sum_{c=1}^{k} y_c \log p_c \tag{9}$$
where $p_c$ is the prediction probability, $y_c$ is 0 or 1, indicating whether the image belongs to the predicted class, and $k$ is the number of classes. We run our model for 10 epochs and apply weight decay and gradient clipping to prevent interference from large gradient or weight values. We use the Adam optimizer Adam; kingma2014adam and reduce the learning rate after several epochs. There is no one-size-fits-all rule for choosing a learning rate. Moreover, in most cases, dynamic control of the learning rate of a neural network can significantly improve the efficiency of the backpropagation algorithm. For these reasons, we choose the initial learning rate, the period of learning rate decay, and the multiplicative factor of the learning rate decay as hyperparameters. In total, together with the number of variational layers and the number of qubits, we optimize five hyperparameters, presented in Table 2, to improve the accuracy of car classification.
Hyperparameter  Label  Range  Hybrid HPO values  Classical HPO values
number of qubits, number of neurons
depth of quantum circuit
number of neurons
initial learning rate
step of learning rate
multiplicative factor of learning rate decay
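The three learning-rate hyperparameters and the loss can be illustrated with a short, library-free sketch. The step rule below multiplies the rate by a fixed factor every `step_size` epochs, which mirrors the behaviour of PyTorch's `StepLR` scheduler; whether that particular class was used here is an assumption. All numeric values are hypothetical.

```python
import math


def step_lr(initial_lr, step_size, gamma, epoch):
    """Learning rate at `epoch` when it is multiplied by `gamma` every `step_size` epochs."""
    return initial_lr * gamma ** (epoch // step_size)


def cross_entropy(probabilities, one_hot):
    """Cross-entropy -sum_c y_c * log(p_c) for a single sample."""
    return -sum(y * math.log(p) for p, y in zip(probabilities, one_hot) if y)


# Hypothetical settings: initial rate 1e-3, halved every 3 epochs, 10 epochs.
schedule = [step_lr(1e-3, step_size=3, gamma=0.5, epoch=e) for e in range(10)]
# Two-class example: the true class is predicted with probability 0.9.
loss = cross_entropy([0.9, 0.1], [1, 0])
print(schedule, loss)
```

With these settings the rate is halved three times over 10 epochs, so the last epoch trains at one eighth of the initial rate.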
.5 Simulation Results
We next perform a simulation of the hybrid quantum residual neural network described in the previous section. The simulation is compared to its classical analog, the residual neural network, in a test car classification task. Because of the limited number of qubits available and computational time constraints, we restricted the task to a classification between two classes, Volkswagen Golf Hatchback 1991 and Volkswagen Beetle Hatchback 2012, to compare the classical and hybrid networks fairly. In total, we used 88 testing images and 89 training images. Both the hybrid quantum model (HQNN) and the classical NN model were used together with the GS and TTO methods for hyperparameter optimization. All machine learning simulations were carried out in the QMware cloud, on which the classical part was implemented with the PyTorch framework, and the quantum part was implemented with the basiq SDK QMWare. The results of the simulations are shown in Fig. 4.
Fig. 4(a) shows how the accuracy on the test data depends on the number of HPO iterations, where one iteration of HPO is one run of the model. Green shows this dependence for the HQNN, and blue for the classical NN. As one can see from Fig. 4(a), TTO works more efficiently than GS and finds hyperparameters that give an accuracy above 0.9 in fewer iterations. The HQNN with TTO (marked with green crosses) finds a set of hyperparameters that yields 97.7% accuracy within 18 iterations. As for GS (marked with a solid green line), it took 44 iterations to pass the threshold of 98% accuracy.
For the classical NN, TTO finds in 6 iterations a set of hyperparameters that gives an accuracy of 97.7%, the same accuracy as that given by the set of hyperparameters found for the HQNN in 18 iterations. As for GS, it is clear that the optimization of the HQNN works more efficiently than that of the classical NN: it requires fewer iterations to achieve a higher accuracy. A possible reason is that a quantum layer with a relatively large number of qubits and a greater depth works better than its classical counterpart.
The best values found during HPO are displayed in Table 2. The quantum circuit corresponding to the optimal set of hyperparameters has 52 variational parameters, leading to a total of 6749 weights in the HQNN. In the classical NN there are 9730 weights. Therefore, there are significantly fewer weights in the HQNN than in the classical NN. Nevertheless, as can be seen from Fig. 4(b), the HQNN, with the hyperparameters found using GS, reaches the highest overall accuracy (98.9%). Fig. 5 shows examples of car images that were classified correctly by the HQNN model.
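The weight totals quoted above can be checked with simple parameter counting for the layers added after ResNet34 (the backbone itself is not counted). A fully-connected layer with a bias neuron has (inputs + 1) x outputs weights. The layer sizes used below (16 and 80 neurons for the classical network, 13 qubits for the hybrid one, 2 output classes) are assumptions chosen because they reproduce the stated totals; they are not read from Table 2.

```python
def linear_params(n_in, n_out):
    """Weights of a fully-connected layer, including one bias per output."""
    return (n_in + 1) * n_out


def classical_head_params(n, m, k, features=512):
    """Three fully-connected layers: features -> n -> m -> k."""
    return linear_params(features, n) + linear_params(n, m) + linear_params(m, k)


def hybrid_head_params(n_qubits, n_variational, k, features=512):
    """features -> n_qubits, quantum layer with n_variational parameters, n_qubits -> k."""
    return linear_params(features, n_qubits) + n_variational + linear_params(n_qubits, k)


# Assumed sizes (hypothetical, picked to match the totals quoted in the text).
classical_total = classical_head_params(n=16, m=80, k=2)
hybrid_total = hybrid_head_params(n_qubits=13, n_variational=52, k=2)
print(classical_total, hybrid_total)
```

Under these assumptions the hybrid head has roughly 30% fewer trainable weights than the classical one, since the quantum layer replaces a dense layer with a few dozen rotation angles.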
Discussion
We introduced two new ML developments to image recognition. First, we presented a quantum-inspired method of tensor train decomposition for choosing ML model hyperparameters. This decomposition enabled us to optimize hyperparameters similarly to other tabular search methods, e.g., grid search, but required only $O(dnr^2)$ hyperparameter choices instead of $O(n^d)$ in the grid search method, where $d$ is the number of hyperparameters, $n$ is the number of discretization points per hyperparameter, and $r$ is the TT rank. We verified this method over various black-box functions and found that the tensor train method achieved comparable results in average fitness, with a reduced expected run time for most of the test functions compared to grid search. This indicates that this method may be useful for high-dimensional hyperparameter searches for expensive black-box functions. Future work could investigate using this method in combination with a local search heuristic, where the tensor train optimizer performs a sweep over a larger search space within a budget, and seeds another optimization routine for a local search around this region. This method could also be applied to the successive halving algorithm, by decomposing the search space to find the optimal ratio of budget to configurations. Future work could investigate these applications in more detail.
Second, we presented a hybrid quantum neural network model for supervised learning. The hybrid model consisted of the combination of ResNet34 and a quantum circuit part, whose size and depth became hyperparameters. The size and flexibility of the hybrid ML model allowed us to apply it to car image classification. The hybrid ML model with GS showed an accuracy of 0.989 after 75 iterations in our binary classification tests with images of Volkswagen Golf Hatchback 1991 and Volkswagen Beetle Hatchback 2012. This accuracy was better than that of a comparable classical ML model with GS, which showed an accuracy of 0.920 after 75 iterations. In the same test, the hybrid ML model with TTO showed an accuracy of 0.977 after 18 iterations, whereas the comparable classical ML model with TTO showed the same accuracy of 0.977 after 6 iterations. Our developments provide new ways of using quantum and quantum-inspired methods in practical industry problems. In future research, exploring the sample complexity of the hybrid quantum model is of importance, in addition to generalization bounds of the quantum models similar to the research in Ref. caro2021generalization. Future work could also entail investigating state-of-the-art improvements in hyperparameter optimization for classical and quantum-hybrid neural networks and other machine learning models by leveraging quantum-inspired or quantum-enhanced methods.