The computational power of a quantum computer grows exponentially with its number of qubits. For this reason, quantum computers are expected to surpass the computational capabilities of classical computers and achieve disruptive impact on numerous industry sectors, such as global energy and materials, pharmaceuticals, telecommunication, travel and logistics, and finance. Finance, in particular, is estimated to be the first industry sector to benefit from quantum computing, not only in the medium and long terms but even in the short term, due to the large number of financial use cases that lend themselves to quantum computing and their amenability to being solved effectively even in the presence of approximations [mckinsey_quantum]. This is especially important for taking advantage of today’s Noisy Intermediate-Scale Quantum (or NISQ) devices [preskill2018quantum], which are characterized by their low quantum bit (or qubit) counts, short coherence times, and high operation noise.
This review paper presents the state of the art of quantum algorithms for financial applications, focusing in particular on those use cases that can be solved via Machine Learning (ML). The applicability of ML to finance has become increasingly significant as highly efficient ML algorithms have evolved over time to support different data types and scale to larger data sets. ML operations applicable to finance include regression for asset pricing, classification for portfolio optimization, clustering for portfolio risk analysis and stock selection, generative modeling for market regime identification, feature extraction for fraud detection, reinforcement learning for algorithmic trading, and Natural Language Processing (NLP) for risk assessment, financial forecasting, accounting, and auditing. Deep learning is often used for image recognition and text classification, as well as in any use case characterized by large unstructured datasets.
Given the complexity of the algorithms involved, and the size of the data being analyzed, ML has been identified as one of the most important domains of applicability of Quantum Computing. This has become even more evident with the discovery of new quantum algorithms for linear algebra, which offer the potential for executing linear-algebra computations on a quantum computer more efficiently and accurately than their corresponding classical counterparts [harrow2009linear]. Under certain conditions, the quantum speedup can be even exponential [Wiebe-exponential-matrix-arithmetics], modulo some caveats [aaronson2015read]:
Efficiently loading classical data onto quantum computers and reading out classical outputs resulting from quantum computations is still a field of ongoing research. The majority of the quantum algorithms devised so far are based on the existence of a Quantum Random Access Memory (QRAM) for accessing the classical data [giovannetti2008quantum]. A QRAM has been shown to be realizable in theory, but concrete hardware implementations are still under development. Alternatively, classical data can be loaded into quantum states via specialized circuits [Cortese-loading-classical-data].
It is not always possible to apply a quantum linear-algebra algorithm out of the box to solve a specific financial use case; several conditions must be met and customizations are often necessary to address unique use-case-dependent requirements [aaronson2015read]. Furthermore, multiple classical and quantum algorithmic components are usually involved in the end-to-end solution of a financial use case, with the potential for any such component to become the bottleneck and negate the overall quantum advantage. The task of computing the quantum speedup of the solution of a specific use case is, therefore, not always intuitive.
As of today, no end-to-end application of quantum ML with exponential speedup over its classical counterpart has been discovered, but several promising directions have been proposed. Meanwhile, a large body of research and engineering work has been successfully dedicated to the realization of quantum algorithms with significant polynomial speedups in their data-processing subroutines, if not in the data loading and output extraction.
A central task in supervised learning is regression, or the problem of training a simple model to approximate real-valued functions. This is an extremely important routine in experimental sciences, and has recently started to be employed on massive datasets. Computationally, this reduces to minimizing a loss function that captures the quality of the fit on training data; the common choices are the $\ell_\infty$ norm for worst-case error, and the $\ell_1$ or $\ell_2$ norms for average-case error. The smoothness of the $\ell_2$ norm makes it attractive to optimization algorithms, leading to the ubiquity of least squares regressions, where the minimization problem, given
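$M$ $N$-dimensional feature vectors arranged as the rows of a matrix $X \in \mathbb{R}^{M \times N}$ and target values $y \in \mathbb{R}^{M}$, can be expressed as follows:
$$\min_{w \in \mathbb{R}^{N}} \| X w - y \|_2^2,$$
which is a convex quadratic minimization problem and reduces to solving an $N \times N$ system of linear equations.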
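To make the reduction to a linear system concrete, the following classical sketch solves a tiny least squares problem through the normal equations (the Gram matrix of the features times the weights equals the feature-target moments). The helper names `solve_linear_system` and `least_squares` and the toy data are illustrative assumptions, not part of any cited algorithm; the quantum algorithms discussed in this section target the same linear system.

```python
from typing import List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copied so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares(features: List[List[float]], targets: List[float]) -> List[float]:
    """Minimize the squared error by solving the normal equations."""
    d = len(features[0])
    gram = [[sum(row[i] * row[j] for row in features) for j in range(d)] for i in range(d)]
    moment = [sum(row[i] * t for row, t in zip(features, targets)) for i in range(d)]
    return solve_linear_system(gram, moment)


# Toy data generated by y = 1 + 2 x; the first feature is a constant bias term.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
weights = least_squares(X, y)
```

The size of the linear system depends only on the feature dimension, which is why the cost of accessing the data, rather than the solve itself, tends to dominate for large datasets.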
A quantum speedup for this task was first considered by Wiebe et al. [weibe2012data] based on quantum algorithms for solving systems of linear equations. Their algorithm requires access to the entries of the data matrix and weight vector in superposition, and outputs a quantum state that encodes the output in time , where $\tilde{O}$ neglects polylogarithmic factors in the complexity. While the run time of this algorithm is exponentially smaller than classical equivalents, there are two main caveats. First, the construction of a data structure allowing superposition access to the data points may in general require time, via the construction of a QRAM [giovannetti2008quantum]. Second, copies of the output state are required to obtain a full description of the output.
Some attempts have been made to address this issue, albeit with smaller speedups. Wang [wang2017regression] gives an algorithm for regression that obtains a classical solution in time . A second approach is to use algorithms that do not output a full description of , but rather for a new data point . Schuld et al. [schuld2016regression] obtain such an algorithm given a quantum state encoding of the training data and that takes time . A third approach is to consider special quantum data structures for accessing the training data. These efficient QRAM data structures are different from a general QRAM, in that the cost of inserting, updating or deleting a single entry is . Such data structures can be useful when there is a large initial corpus of data, but the regression task must be performed repeatedly, and each update to the data set is small (or constant). The construction of the original dataset thus has cost , but future upkeep and regression tasks each cost . Thus, the cost of the initial construction is effectively amortized over the lifetime of the deployment. This setting can be used to obtain $\ell_2$-regression algorithms with complexity [chakraborty2019regression, kerenidis2020gradient].
There also exist quantum-inspired classical algorithms [chia2018quantuminspired] based on the last approach that use data structures providing sampling access to the data and obtain algorithms with cost . $\ell_1$ and $\ell_\infty$ regression problems are not smooth and are therefore often solved via smooth relaxations that are problem-specific.
A heuristic approach to regression is to train parameterized quantum circuits (PQCs) as function approximators. A common application of classical regression techniques is in learning time series via Recurrent Neural Networks (RNNs), often using Long Short Term Memory (LSTM) units. These techniques are commonly used to make predictions about evolving processes from historical data. Quantum versions of RNNs that use PQCs as the function model have been proposed [bausch2020recurrent, chen2020quantum, takaki2021temporal], with empirical results suggesting improvements in convergence or error rate.
With the above discussion on various quantum regression algorithms, we next look at several financial applications that take advantage of these techniques.
II-A Asset Pricing
Asset pricing is the task of assigning prices to various categories of financial instruments, such as stocks, bonds, and derivatives. There are several economic models used to assign these prices based on a sequence of instantaneous (or spot) prices, including general equilibrium pricing and arbitrage-free pricing [cochrane2009asset]. The common approach to predicting these spot prices is to model them as functions of simple underlying stochastic processes, such as Brownian or Geometric Brownian motion. Historical financial data can then be used to determine the parameters of these stochastic models. More generally, however, predicting spot prices (as well as many other time-varying quantities of financial interest) can be modeled as a time series learning problem. Specifically, given a sequence of historic prices up to a given time, can accurate predictions be made for prices in the future? Since this reduces to predicting real values based on training data, it is best modeled as a supervised-regression problem. Stochastic pricing models with historically calibrated parameters can be viewed as ad hoc solutions based on domain knowledge.
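As an illustration of calibrating a stochastic pricing model from historical data, the sketch below simulates a Geometric Brownian Motion path and then recovers its drift and volatility from the log-returns. The function names, the daily time step of 1/252 years, and the parameter values are assumptions chosen for the example only.

```python
import math
import random


def simulate_gbm(s0, mu, sigma, dt, steps, rng):
    """Simulate one Geometric Brownian Motion path using the exact log-normal step."""
    prices = [s0]
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)
        growth = (mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z
        prices.append(prices[-1] * math.exp(growth))
    return prices


def calibrate_gbm(prices, dt):
    """Estimate (mu, sigma) from the sample mean and variance of log-returns."""
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    n = len(log_returns)
    mean = sum(log_returns) / n
    var = sum((r - mean) ** 2 for r in log_returns) / (n - 1)
    sigma = math.sqrt(var / dt)
    # The mean log-return equals (mu - sigma^2 / 2) * dt, so undo the correction.
    mu = mean / dt + 0.5 * sigma**2
    return mu, sigma


rng = random.Random(0)          # fixed seed so the example is reproducible
dt = 1.0 / 252.0                # assumed: one trading day, in years
path = simulate_gbm(100.0, 0.05, 0.2, dt, 5000, rng)
mu_hat, sigma_hat = calibrate_gbm(path, dt)
```

The volatility estimate converges quickly with the number of observations, whereas the drift estimate remains noisy over realistic horizons, which is one reason such calibrated models are complemented by learned time-series predictors.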
The increasing success of deep RNNs for time-series prediction, especially those that leverage LSTM, has led researchers to consider using these general-purpose algorithms for asset pricing. Gu et al. [gu2020empirical] and Chen et al. [chen2021deep] investigate the use of LSTM-based deep-learning methods for asset pricing, obtaining promising results. Specifically, Gu et al. [gu2020empirical] show that ML forecasts on the S&P 500 achieve an out-of-sample annualized Sharpe Ratio of 0.77 versus the 0.51 of a buy-and-hold investor. They also find that a value-weighted long-short decile spread strategy based on neural network forecasts of stock prices achieves an annualized Sharpe Ratio of 1.35, nearly double that of the state-of-the-art classical regression approaches. Chen et al. further show that refined models based on LSTM forecasts can achieve out-of-sample annualized Sharpe Ratios much larger than naive deep-learning forecasts as well as classical approaches, including the Fama-French five-factor model.
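For readers less familiar with the metric quoted above, the following minimal sketch computes an annualized Sharpe Ratio as the mean excess return divided by its standard deviation, scaled by the square root of the number of periods per year. The monthly frequency, the zero risk-free rate, and the sample returns are assumptions for illustration and are not taken from the cited studies.

```python
import math
import statistics


def annualized_sharpe(returns, risk_free_rate=0.0, periods_per_year=12):
    """Mean excess return over its sample standard deviation, annualized."""
    excess = [r - risk_free_rate for r in returns]
    per_period = statistics.mean(excess) / statistics.stdev(excess)
    return per_period * math.sqrt(periods_per_year)


# Hypothetical monthly strategy returns.
monthly = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04]
sharpe = annualized_sharpe(monthly)
```

With a zero risk-free rate the ratio is invariant to leverage: doubling every return leaves it unchanged, which is why it is used to compare strategies with different risk levels.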
The biggest challenge in deploying deep-learning methods is that training complex neural networks can often be a much more computationally intensive process than the simple parameter calibration required by classical approaches. PQCs may offer advantages over classical variational regression models in terms of expressivity, training complexity and prediction performance. In 2020, the use of PQCs to formulate RNNs was described [bausch2020recurrent], along with proposals of quantum LSTM models [chen2020quantum]. Both approaches show potential empirical improvements over classical neural networks for particular functions, although the applicability to asset pricing is yet to be investigated.
II-B Multi-Asset Trend Following Strategies
Regression models can be used to predict $n$-day returns of a multi-asset class portfolio. Each financial asset class (e.g., equity, bond, cash or commodities) might have different internal dynamics. Nevertheless, a regression model might be able to encompass the global dynamic. An example of global dynamic is the following: if the equity market is bearish, an investor might prefer safer investments, say, bonds, and therefore cause a rise in bond prices. Indeed, there has been a negative correlation between equities and bonds since the beginning of the century. However, this correlation was positive between 1970 and 2000 [fan2017equity]. Therefore, the relationship between asset classes evolves and should not be taken as general truth.
To predict returns daily, one can use historical prices from various time points (e.g., 1 month, 3 months, etc.) to introduce trend information in the input data. However, this causes an increase in the number of features in the data, and could result in over-fitting. Consequently, one would rather use a Lasso regression [tibshirani1996regression] than a vanilla regression. Indeed, by adding an $\ell_1$ penalty term to the cost function of the regression problem, the model will select a subset of relevant features. As with any regression method, one can treat it as a classification model. The classes here would be buy, hold or sell. The $\ell_1$ norm may not be the most attractive regularization term for quantum implementation. Nevertheless, Du et al. [du2020quantum] provide the closest known implementation with a differentially private Lasso estimator.
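The feature-selection effect of the penalty can be seen in a small classical sketch. It minimizes the mean squared error divided by two plus the penalty times the sum of absolute weights, using cyclic coordinate descent with soft-thresholding. The function names, the penalty value, and the toy data (one strong feature, one weak feature) are assumptions for illustration; this is not the estimator of Du et al.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, returning exactly 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(features, targets, penalty, sweeps=200):
    """Cyclic coordinate descent for the Lasso objective described above."""
    n, d = len(features), len(features[0])
    weights = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for row, target in zip(features, targets):
                partial = sum(weights[k] * row[k] for k in range(d) if k != j)
                rho += row[j] * (target - partial)
            rho /= n
            scale = sum(row[j] ** 2 for row in features) / n
            weights[j] = soft_threshold(rho, penalty) / scale
    return weights


# Targets follow y = 2 * x1 + 0.05 * x2: the second feature is nearly irrelevant.
features = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
targets = [2.05, -1.95, 1.95, -2.05]
lasso_weights = lasso_coordinate_descent(features, targets, penalty=0.1)
```

In this example the weak feature receives a weight of exactly zero, while the strong feature is kept with a slightly shrunken coefficient, which is the selection behavior the paragraph above relies on.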
II-C Implied-Volatility Estimation
The implied-volatility metric captures the financial market’s view of the likelihood of changes in a given security’s price. The analysis of volatility is crucial for risk management, portfolio hedging and option pricing, all of which require a precise notion of the market’s expectation of volatility [fengler2002analysis]. Portfolios are sensitive to volatility changes. For instance, it has been shown that implied volatilities, such as those of oil, gold and the US stock market, play a role in the returns of the equity sector. In particular, their impact on the prices and returns of the ten most representative US equity sectors has been quantified [ahmad2021us].
A quantum approach for learning implied volatilities has been proposed [sakuma2020application]. This uses the deep quantum neural networks first introduced by Beer et al. [beer2020training]. Given a set of options, the input data are their strike prices, and the outputs are the corresponding implied volatilities. A sigmoid function is used to convert the strike prices to numbers in $(0, 1)$, which are then represented as quantum states. The network consists of one input neuron, one output neuron, and one hidden layer with two neurons. The output of the network is a density matrix. The implied volatility of each of the options is then calculated using its respective element in the density matrix.
The goal of classification in ML is to predict the labels for new data points using a model that is fit to a labeled dataset. Well-known traditional classification algorithms include Linear Classification, Nearest Centroid and Support Vector Machines (SVMs). More recently, neural-network-based methods have seen tremendous success. Once a neural network has had its weights trained via a labeled dataset, it can be used to perform inference on unseen data instances. It has been empirically demonstrated that neural networks achieve better performance than traditional methods, especially on large datasets [zhang2000neural]. Neural networks, on the other hand, fall short in terms of transparency and interpretability, which could be desirable when it comes to making decisions or performing tasks that involve handling sensitive information [johri2020nearest].
A linear classifier allows for classifying an object in a dataset based on the value of a linear combination of that object’s characteristics, known as feature values and stored in a feature vector. The common algorithm used for this is the perceptron method [minsky2017perceptrons], which finds a margin classifier given -dimensional data points in time . Quantum algorithms have been shown to be able to provide speedup, bringing the running time down to [Lloyd2021] or [wiebe2016perceptron]. It was later discovered that the optimal classical algorithm for training classifiers with constant margin runs in [clarkson2012sublinear], and that a corresponding optimal quantum classification algorithm can bring about quadratic speedup, leading to running time [li2019sublinear].
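As a classical point of reference for the perceptron method mentioned above, the sketch below implements the standard mistake-driven update rule on a small linearly separable dataset. The function names and data are illustrative assumptions; the quantum variants cited above accelerate the search for misclassified points and are not reproduced here.

```python
def train_perceptron(points, labels, max_epochs=100):
    """Classical perceptron: update the weights on every misclassified point."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in zip(points, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if label * activation <= 0:
                # Move the hyperplane toward the misclassified point.
                weights = [w + label * xi for w, xi in zip(weights, x)]
                bias += label
                mistakes += 1
        if mistakes == 0:
            break  # every training point is now on the correct side
    return weights, bias


def predict(weights, bias, x):
    """Return +1 or -1 depending on the side of the learned hyperplane."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1


points = [(2.0, 1.0), (3.0, 2.0), (-1.0, -1.0), (-2.0, -3.0)]
labels = [1, 1, -1, -1]
weights, bias = train_perceptron(points, labels)
```

On separable data the loop terminates after a number of updates that grows as the margin shrinks, which is the dependence the running times quoted above are expressed in.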
Distance-based classifiers predict the label of a new data point based on its distance to reference points according to some metric. Typical examples are the nearest-centroid and $k$-nearest-neighbors ($k$-NN) classifiers [hastie01statisticallearning].
The nearest-centroid algorithm is a good baseline classifier that offers interpretable results. This algorithm takes as input a number of labeled data points, where each data point belongs to a specific class. The model fitting consists of computing the centroids, which are the barycenters of data points that cluster together in space. Once the centroids are found, a new data point is classified by finding its closest centroid in terms of Euclidean distance. A quantum version of the nearest centroid algorithm was used to perform classification on the Modified National Institute of Standards and Technology (MNIST) handwritten-digit dataset [johri2020nearest]. The experiments were executed on the IonQ trapped-ion quantum computer. This approach utilizes novel quantum procedures for loading the classical data onto quantum states and estimating distances between these states. Its accuracy matches that of the classical nearest-centroid algorithm. The authors, however, do not claim any quantum speedup in terms of time complexity.
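The two stages of the classical nearest-centroid algorithm (fitting the per-class barycenters, then assigning a new point to the closest one) can be sketched as follows. The function names and toy data are assumptions for illustration; the quantum version replaces the distance computation with a quantum distance-estimation routine.

```python
import math
from collections import defaultdict


def fit_centroids(points, labels):
    """Compute the barycenter of the training points of each class."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)
    return {
        label: [sum(coord) / len(members) for coord in zip(*members)]
        for label, members in grouped.items()
    }


def classify(centroids, point):
    """Assign the label of the centroid closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], point))


train_points = [(0.0, 0.0), (0.0, 2.0), (4.0, 4.0), (6.0, 4.0)]
train_labels = ["a", "a", "b", "b"]
centroids = fit_centroids(train_points, train_labels)
```

For example, `classify(centroids, (1.0, 1.0))` returns `"a"`, because the point lies closer to the barycenter of class `a` than to that of class `b`.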
Another distance-based classifier is the $k$-NN algorithm, which predicts the label of a data point based on a majority vote among the $k$ closest training samples according to a metric, such as the Euclidean distance. Several quantum approaches for $k$-NN have been proposed. In particular, [li2021quantum] utilizes an algorithm for computing Hamming distances in superposition, and the quantum minimum-finding method [durr1996quantum] to find the neighbors with the smallest Hamming distances to a sample. If feature vectors lie in a low-dimensional space, the algorithm can classify a new sample with worst-case time complexity , where is the training set size. Another quantum $k$-NN approach achieves a query complexity of [basheer2020quantum]. The authors design an oracle that encodes the fidelity between two states into a quantum register, which allows for the usage of a quantum algorithm for finding $k$-minima [miyamoto2019quantum].
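The two subroutines that the quantum approaches accelerate, computing Hamming distances to all training samples and finding the smallest ones, have a direct classical counterpart, sketched below on bit strings. The function names and the toy dataset are assumptions for illustration; ties in distance are broken by training order in this sketch.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, labels, sample, k):
    """Majority vote among the k training samples closest in Hamming distance."""
    order = sorted(range(len(train)), key=lambda i: hamming(train[i], sample))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


train = ["0000", "0001", "1111", "1110", "0011"]
labels = [0, 0, 1, 1, 0]
prediction = knn_predict(train, labels, "1101", k=3)
```

The classical cost is dominated by the scan over all training samples; the quantum proposals aim to reduce the dependence on the training set size by evaluating distances in superposition and using minimum finding.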
Support Vector Machines
Training an SVM [boser1992training] consists of solving a convex quadratic optimization problem to find the hyperplane that results in the maximum margin between two classes of data. The dual problem, in which the data enter only through a kernel $K$, is the one usually solved. $K$ is symmetric and positive semi-definite; a common example is the Radial Basis Function. $K$ induces the, potentially non-linear, feature map $\phi$. The Hilbert Space built from such maps is called the Reproducing Kernel Hilbert Space (RKHS) with reproducing kernel $K$. The optimal classifier in RKHS is one that is a linear combination of $K(x_i, \cdot)$’s over a subset of the training data [hastie01statisticallearning].
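To make the role of the kernel concrete, the sketch below evaluates a Radial Basis Function kernel, the textbook form of the SVM dual objective (maximize $\sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$ subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$), and the resulting kernel expansion of the decision function. This standard hard-margin form, the symbol names, the kernel width, and the two-point dataset are assumptions for illustration and may differ from the notation of the cited works.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Radial Basis Function kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def gram_matrix(points, kernel):
    """Matrix of kernel values between all pairs of training points."""
    return [[kernel(p, q) for q in points] for p in points]


def dual_objective(alphas, labels, gram):
    """Textbook SVM dual objective for given multipliers alphas."""
    n = len(alphas)
    quadratic = sum(
        alphas[i] * alphas[j] * labels[i] * labels[j] * gram[i][j]
        for i in range(n)
        for j in range(n)
    )
    return sum(alphas) - 0.5 * quadratic


def decision_function(alphas, labels, points, bias, kernel, x):
    """Classifier as a linear combination of kernel functions centered on training points."""
    return sum(a * y * kernel(p, x) for a, y, p in zip(alphas, labels, points)) + bias


points = [(0.0, 0.0), (1.0, 0.0)]
labels = [1, -1]
gram = gram_matrix(points, rbf_kernel)
# For two points with opposite labels the dual optimum can be written in closed form.
alpha = 1.0 / (1.0 - gram[0][1])
alphas = [alpha, alpha]
```

With these multipliers and zero bias, the decision function evaluates to +1 and -1 on the two training points, that is, both lie exactly on the margin, as expected for support vectors.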
One of the proposed quantum enhancements to the SVM is based on evidence that universal quantum computation, most likely, cannot be efficiently simulated on a classical computer [quantum_enhanced_feature_spaces_2019]. Thus, one should be able to construct a quantum circuit for the map $x \mapsto |\Phi(x)\rangle$ by applying a unitary operation, parameterized by $x$, to the computational basis state consisting of all qubits in the $|0\rangle$ state, such that simulating this operation is not classically feasible. This is called a quantum feature map and maps classical data into a quantum Hilbert Space that is exponentially large in the dimension of $x$. A potential quantum kernel is
$$K(x, y) = |\langle \Phi(x) | \Phi(y) \rangle|^2,$$
which is symmetric and positive semi-definite. In this case the RKHS is spanned by functionals that are constructed from quantum circuits. The coefficients of the decision function in RKHS can be computed by a convex optimizer running on a classical computer; the quantum computer is used to evaluate the kernel. This hybrid model is called QSVM [quantum_enhanced_feature_spaces_2019]. There is potential quantum advantage in the expressibility of the feature map, as long as the associated kernel is infeasible for a classical device to compute. This kernel can be computed on a quantum device utilizing either the destructive SWAP [Mitarai_2019] or the controlled-SWAP [buhrman2001quantum] tests. The latter has better asymptotic complexity, but is not as feasible on NISQ devices. While the former’s asymptotic scaling is prohibitive, it can efficiently be implemented on small quantum computers; this allows for experimentation in the near-term.
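The following classical simulation illustrates how a kernel given by the squared overlap of two feature states can be estimated from a controlled-SWAP test, whose ancilla returns 0 with probability one half plus one half of the squared overlap. The product feature map used here (one rotation angle per qubit) is a deliberately simple assumption that is easy to simulate classically, so it offers no quantum advantage; it only shows the estimation procedure and the effect of a finite number of shots.

```python
import math
import random


def feature_state(x):
    """Toy product feature map: qubit i is prepared in cos(x_i/2)|0> + sin(x_i/2)|1>."""
    state = [1.0]
    for angle in x:
        qubit = (math.cos(angle / 2), math.sin(angle / 2))
        # Tensor product of the state built so far with the next qubit.
        state = [amp * q for amp in state for q in qubit]
    return state


def exact_kernel(x, y):
    """Squared overlap of the two feature states (amplitudes are real here)."""
    overlap = sum(a * b for a, b in zip(feature_state(x), feature_state(y)))
    return overlap**2


def swap_test_kernel(x, y, shots, rng):
    """Estimate the kernel from simulated controlled-SWAP test outcomes."""
    p_zero = 0.5 + 0.5 * exact_kernel(x, y)
    zeros = sum(rng.random() < p_zero for _ in range(shots))
    return 2.0 * zeros / shots - 1.0


x, y = [0.3, 1.2], [1.0, 0.4]
k_exact = exact_kernel(x, y)
k_estimate = swap_test_kernel(x, y, shots=20000, rng=random.Random(0))
```

The statistical error of the estimate decreases only with the square root of the number of shots, so each kernel entry requires many circuit repetitions to reach a given precision.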
The class of Instantaneous Quantum Polynomial (IQP) circuits has been suggested as a potential candidate for the feature-map circuit [quantum_enhanced_feature_spaces_2019]:
where the functions represent classical preprocessing. If the IQP circuit is deep enough, it is believed that computing inner products of states resulting from these embeddings is #P-Hard [quantum_enhanced_feature_spaces_2019], thereby potentially out of reach of classical devices.
A secondary consideration is whether quantum algorithms can be used to accelerate the training of classical SVMs. Algorithms of complexity have been proposed for training kernel classifiers and margin SVMs [li2019sublinear]. These algorithms are optimal and provide a quadratic speedup over corresponding optimal classical algorithms [clarkson2012sublinear]. However, these algorithms have complexity polynomial in the inverse of the error . There exist classical algorithms with complexity using interior-point methods; Kerenidis et al. [kerenidis2021quantum] propose a quantum algorithm to speed up these methods in terms of the dimension . While the quantum run time depends on terms that are difficult to bound directly, for random instances, the quantum algorithm can indeed provide a speedup, leading to a complexity, compared to the classical algorithm’s complexity.
In SVM, an optimal hyperplane is obtained that divides the dataset into multiple classes, with a time complexity of , where represents the feature space dimension, the number of input points, and the accuracy. QSVMs have mathematically been proven to have a run time of [qsvm_lloyd].
Variational Quantum Classifiers
Variational Quantum Classifiers (VQCs) are hybrid quantum-classical ML architectures meant for classification tasks that utilize the quantum state space as a feature space to potentially obtain a quantum advantage. A VQC circuit mainly consists of a quantum embedding, a PQC for processing the quantum data, a measurement routine, and a classical optimization loop for updating the parameters of the PQC. First, classical input data is mapped non-linearly to a quantum state by applying the feature-map circuit defined in Equation 2 to the computational basis state with all qubits in the $|0\rangle$ state.
Next, a PQC with trainable parameters is constructed and applied to the embedded state. An example of such a PQC is one made from compositions of single-qubit rotations and entangling gates. PQC architectures have been discussed where descriptors, such as the entangling capability and expressibility, are used to characterize the performance of the PQCs [expressibility_and_entangling_capability].
In the case of a binary-classification problem, a measurement routine is used to get a binary output. This is accomplished by measuring the resulting state in the Pauli-$Z$ basis and mapping the output bit-string through a function with binary outcome. The probability of obtaining a given outcome is the expectation value of the corresponding measurement operator in that state. We repeat this step over a number of measurement shots, which gives an empirical distribution over the outcomes.
Then, a classical cost function is formulated to enable optimizing the circuit parameters together with an added bias parameter. Once the classifier is trained on the training data set using a classical optimizer, the trained circuit can be used to assign labels to unlabelled data. Several optimizers have been proposed and used, both gradient-based ones, such as ADAM and SPSA [adam_optimizer, li2006simultaneous], and gradient-free ones, such as COBYLA [optimizers_without_derivatives].
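The embedding, PQC, measurement and optimization loop described above can be illustrated with a deliberately minimal, classically simulated single-qubit classifier. The assumptions here are all illustrative: the input is embedded with one $R_Y$ rotation, the PQC is a single trainable $R_Y$ rotation, the label is the measured bit, the cost is the mean squared error, the bias parameter is omitted, and gradients are obtained with the parameter-shift rule followed by plain gradient descent.

```python
import math
import random


def prob_one(x, theta):
    """P(measure 1) after RY(x) then RY(theta) on |0>: sin^2((x + theta) / 2)."""
    return math.sin((x + theta) / 2) ** 2


def estimate_prob_one(x, theta, shots, rng):
    """Empirical frequency of outcome 1 over a finite number of measurement shots."""
    return sum(rng.random() < prob_one(x, theta) for _ in range(shots)) / shots


def cost(theta, xs, ys):
    """Mean squared error between outcome probabilities and binary labels."""
    return sum((prob_one(x, theta) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cost_gradient(theta, xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        # Parameter-shift rule for a rotation gate: evaluate at theta +/- pi/2.
        shift = (prob_one(x, theta + math.pi / 2) - prob_one(x, theta - math.pi / 2)) / 2
        total += 2 * (prob_one(x, theta) - y) * shift
    return total / len(xs)


def train(xs, ys, theta=1.5, rate=0.5, steps=200):
    """Plain gradient descent on the single circuit parameter."""
    for _ in range(steps):
        theta -= rate * cost_gradient(theta, xs, ys)
    return theta


xs = [0.2, 0.5, 2.6, 2.9]
ys = [0, 0, 1, 1]
theta_star = train(xs, ys)
predictions = [1 if prob_one(x, theta_star) > 0.5 else 0 for x in xs]
```

On hardware the exact probability `prob_one` is not available and would be replaced by a shot-based estimate such as `estimate_prob_one`, which makes both the cost and its gradient noisy; this is one reason stochastic optimizers such as SPSA are popular in this setting.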
VQCs have some limitations, and addressing them is an active area of research. Barren plateaus occur in the optimization of quantum ML models when the cost landscape becomes flat, so that gradients vanish and the optimizer cannot make progress [cerezo2020barren, sharma2020trainability]. Architecture design, such as choosing the correct cost functions and initializing the parameters, is a very complex process that is not yet completely understood [cerezo2021cost]. Additionally, a given variational quantum circuit with fixed form may not be able to capture all of the necessary states in the Hilbert space in its parameterization; as a result, work on adaptive variational quantum algorithms, such as the Evolutionary Variational Quantum Eigensolver (EVQE), may be applicable to VQC [rattew2019domain].
There is a connection between the QSVM and VQC formulations [quantum_enhanced_feature_spaces_2019, schuld2021supervised, huang2021power] similar to the connection between classical Neural Networks and SVMs [jacot2020neural]. There are various discussions on how data encoding affects VQCs [huang2021power, Schuld_2021], such as repeatedly encoding the inputs [P_rez_Salinas_2020]. In addition, efficient methods were presented for encoding categorical features [yano2020efficient]. Lastly, there has been research into the expressiveness of PQCs [Schuld_2021, Abbas_2021].
Support vector machines have been used for over two decades to predict stock prices, as well as financial distress and companies’ credit ratings [svm_in_finance].
Next, we look at a few example financial applications where the aforementioned quantum classification techniques have been applied.
III-A Prediction of Binary Options
SVM can be used to predict the outcome of exotic options. The double no-touch is a binary option [nekritin2012binary] with a constant payout that is earned if and only if the underlying asset price remains between a predefined lower and upper bound until expiration. Unlike other options, such as a vanilla call, the payoff is not continuous, but all-or-nothing. Therefore, one can use SVM to separate the two classes corresponding to the binary option outcome. As these classes are not linearly separable, one needs a kernel to predict the outcome. This type of exotic option is often used in foreign exchange. The features selected to train the model could be the average directional index and the ratio of realized volatility to implied volatility.
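The all-or-nothing payoff that defines the two classes can be written down directly. The sketch below labels a price path according to whether it stays within the barriers; the function names, the strict inequalities, and the assumption that the path is observed at discrete times are illustrative choices, and the resulting labels are the targets a kernel SVM would be trained on.

```python
def double_no_touch_payoff(path, lower, upper, payout=1.0):
    """Pay `payout` only if every observed price stays strictly inside (lower, upper)."""
    return payout if all(lower < price < upper for price in path) else 0.0


def outcome_label(path, lower, upper):
    """Binary class label (+1 paid, -1 knocked out), suitable as an SVM target."""
    return 1 if double_no_touch_payoff(path, lower, upper) > 0 else -1


# Hypothetical exchange-rate paths observed until expiration.
quiet_path = [1.10, 1.11, 1.09, 1.10, 1.12]
breached_path = [1.10, 1.13, 1.16, 1.12, 1.10]
training_labels = [outcome_label(p, 1.05, 1.15) for p in (quiet_path, breached_path)]
```

Note that the second path ends inside the corridor but still pays nothing, because the barrier was touched along the way; the outcome depends on the whole path, not only on the terminal price.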
III-B Financial Forecasting
Financial forecasting is a planning tool that helps businesses adapt to uncertainty based on predictions. In particular, an algorithm to forecast annual earnings is of interest to any company. Such an algorithm, which leverages $k$-NN, has been proposed [easton2020forecasting]. It matches a company’s recent trend in annual earnings to historical earning sequences of other firms that are similar, known as neighbor firms. Some of the features taken into account to find such neighbors include matches based on industry, size and past accruals.
III-C Credit Scoring
Credit scoring is a method to evaluate the credit risk of loan applications. It helps credit analysts decide whether applicants are worthy of credit. Credit scoring predicts future behavior based on past experience. An algorithm for this has been proposed using weighted $k$-NN [mukid2018credit]. The credit applicants are classified into one of two groups: a group whose members are likely to repay their debts and another group that should be denied credit because of a high likelihood of defaulting.
Clustering consists of identifying groups of data points that are close to each other according to certain metrics. The feature space in which the data is encoded and the grouping metric are proxies for the actual similarities and differences of the data points. Inspired by quantum mechanics and suitable for high-dimensional data, Quantum Clustering (QC) [horn2001algorithm] is an algorithm that belongs to the family of density-based clustering algorithms, where clusters are defined by regions of higher density of data points. The basic idea of QC is to map each data point to a Gaussian distribution centered at that sample. An analytical form computed from the Schrödinger equation is used to determine the potential that gives rise to a mixture of these Gaussians as its ground state. The minima in the system’s potential energy function are used to identify clusters and are found via gradient-descent methods. Other points are assigned to clusters in a similar way. Dynamic Quantum Clustering (DQC) [weinstein2009dynamic], an improvement of QC, adopts the time-dependent Schrödinger Equation in order to study the evolution of quantum states associated with data points and the structure of the potential energy function. Being data-agnostic, DQC can be applied in a wide range of fields, especially finance, for example on S&P 500 data [weinstein2013bigdata].
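The QC procedure can be sketched in one dimension. Assuming a Hamiltonian of the form $-\tfrac{\sigma^2}{2}\nabla^2 + V$ and requiring the Gaussian mixture $\psi$ to be an eigenstate with energy $E$ gives $V(x) - E = \tfrac{\sigma^2}{2}\,\psi''(x)/\psi(x)$, which is evaluated analytically below. The function names, the width $\sigma$, the step size, and the finite-difference gradient are assumptions for illustration rather than the cited authors' implementation.

```python
import math


def parzen_wavefunction(x, data, sigma):
    """Sum of Gaussians of width sigma centered on the data points."""
    return sum(math.exp(-((x - xi) ** 2) / (2 * sigma**2)) for xi in data)


def qc_potential(x, data, sigma):
    """V(x) - E = (sigma^2 / 2) * psi''(x) / psi(x) for the 1D Gaussian mixture psi."""
    psi = parzen_wavefunction(x, data, sigma)
    weighted = sum(
        (x - xi) ** 2 * math.exp(-((x - xi) ** 2) / (2 * sigma**2)) for xi in data
    )
    return weighted / (2 * sigma**2 * psi) - 0.5


def descend(x, data, sigma, rate=0.05, steps=500, h=1e-5):
    """Gradient descent on the potential using a central finite difference."""
    for _ in range(steps):
        slope = (qc_potential(x + h, data, sigma) - qc_potential(x - h, data, sigma)) / (2 * h)
        x -= rate * slope
    return x


def quantum_clustering(data, sigma, tolerance=0.1):
    """Label each point by the potential minimum that it descends to."""
    minima, labels = [], []
    for xi in data:
        end = descend(xi, data, sigma)
        for index, minimum in enumerate(minima):
            if abs(end - minimum) < tolerance:
                labels.append(index)
                break
        else:
            minima.append(end)
            labels.append(len(minima) - 1)
    return labels, minima


points = [0.0, 0.2, -0.1, 5.0, 5.1, 4.8]
labels, minima = quantum_clustering(points, sigma=0.5)
```

The width $\sigma$ plays the role of a resolution parameter: larger values merge nearby minima into a single cluster, while smaller values reveal finer structure.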
Classical-algorithm-inspired quantum-clustering techniques have also been proposed. For example, $k$-means is a well-known classical clustering algorithm that identifies, among all data points, the most significant clusters and their representative centroids. Inspired by $k$-means, and providing the same robustness guarantees against some level of noise as the classical $\delta$-$k$-means algorithm, the quantum $q$-means algorithm [kerenidis2018q] has time complexity that is poly-logarithmic in the size of the dataset, and can be implemented using distance estimation and quantum matrix multiplication. A quantum spectral clustering algorithm for data represented as a graph has also been proposed [kerenidis2021quantum]. To overcome the potentially huge time/space overhead of loading large datasets onto a quantum device, coresets have been proposed [tomesh2020coreset], which are small datasets combined with weight functions to sufficiently summarize original datasets. If small enough and still a faithful representation of the original dataset, a coreset could be used to enable execution on a NISQ computer [khan2019kmeans, tomesh2020coreset, mendelson2019quantumassisted, aimeur2007quantum].
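The classical iteration underlying these algorithms, and the way a weighted coreset enters it, can be sketched as follows. This is plain Lloyd-style $k$-means in which each point carries a weight, so that a small weighted coreset can stand in for a larger dataset; the function name, the initial centroids, and the toy data are assumptions for illustration, and no quantum subroutine is simulated.

```python
import math


def weighted_kmeans(points, weights, centroids, iterations=20):
    """Lloyd's iteration where each point contributes with its coreset weight."""
    centroids = [list(c) for c in centroids]
    assignment = [0] * len(points)
    dim = len(points[0])
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the weighted mean of its members.
        for c in range(len(centroids)):
            members = [(p, w) for p, w, a in zip(points, weights, assignment) if a == c]
            total = sum(w for _, w in members)
            if total > 0:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(w * p[k] for p, w in members) / total for k in range(dim)]
    return centroids, assignment


# A hypothetical coreset: the first point summarizes three original data points.
coreset_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
coreset_weights = [3.0, 1.0, 1.0, 1.0]
centroids, assignment = weighted_kmeans(
    coreset_points, coreset_weights, centroids=[(0.0, 0.0), (10.0, 10.0)]
)
```

Because the update uses weighted means, the first centroid is pulled toward the heavily weighted point, as it would be if the three summarized points were present individually.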
Next we briefly discuss several use cases of these quantum clustering algorithms in the financial sector.
IV-A Fraud Detection
Clustering techniques can be used to perform anomaly detection by learning, from existing data, the normal mode(s), and then using this information to identify whether a new data point is normal or anomalous [aggarwal2005effective]. Clustering can improve learning from imbalanced datasets, which is often the case for fraud data [singh2018clustering]. Clustering can also be combined with additional feature-selection and extraction techniques. For example, in time series data, a series could be rising abnormally fast but still stay in a normal value range. Adding derivatives into the clustering algorithm can help detect such an anomaly [sathyapriya2019cluster].
IV-B Stock Selection
Cluster analysis has also been used by investors for maximizing profit and minimizing loss. Stock returns are likely to be similar in a region thanks to geographic and macroeconomic features. Identification of stock clusters allows one to track those with similar returns but different risks. Once stocks are grouped by cluster analysis, informed investors can use the output for guidance. They will, for instance, look for same-return stocks and then choose those that minimize risk. Alternatively, they will pick a cluster of same-risk stocks and choose those with high returns [da2005stock].
IV-C Exchange Rate Regimes
In 1999, Levy-Yeyati and Sturzenegger [levy2005classifying] set out to expose the inconsistency between the self-reported de jure classification from the International Monetary Fund (IMF) and the actual behavior shown in the data. In order to overcome bias, the authors proposed to use $k$-means to perform cluster analysis for exchange rate regimes. This led to a de facto classification that has since been widely used as well as tested against prior methodologies [eichengreen2013reliable].
IV-D Hedge Fund Clustering
Due to the variety of hedge funds (and, therefore, of investing strategies), it can be hard for investors to classify such investment vehicles. Moreover, hedge funds tend to reveal less information than other types of funds, as they do not fall under the same disclosure requirements. Predefined classes may not correctly accommodate future types of hedge funds. Hence, clustering methods, such as $k$-means, have been used to overcome this issue [das2003hedge]. The features considered are based on available characteristics of hedge funds, such as asset classes, size, fees, leverage and liquidity.
V Generative Modeling
A Generative Model learns a probability distribution over data [Goodfellow-et-al-2016]. In supervised learning, where the model is provided with a set of input/label pairs $(x, y)$, the model learns $p(x, y)$, the joint probability distribution of inputs and labels [ng2002discriminative]. In unsupervised learning, these models can be used to generate new data given only samples [radford2016unsupervised]. Since measuring a quantum state naturally results in a probability distribution over the outcomes, it makes sense to see if quantum computation can be utilized for generative modeling.
The Boltzmann Machine [koller2009probabilistic] is defined by a collection of visible (observed) and hidden (marginalized out) random variables, and an undirected graph of conditional dependencies among them. It originates from thermodynamics, where the nodes represent a system of correlated classical spins, $s_i \in \{-1, +1\}$, under an external magnetic field. The classical Ising Hamiltonian represents the energy of the system. Probabilistic inference is performed by sampling from the steady-state distribution (a Gibbs state) over the visible nodes. This is usually done utilizing Markov Chain Monte Carlo (MCMC) methods [koller2009probabilistic]. In most cases, the graph is restricted to being bipartite to make sampling feasible, resulting in the Restricted Boltzmann Machine (RBM) [Amin_2018].
To formulate the quantum Boltzmann Machine, we quantize the Ising Hamiltonian by making the replacements $s_i \to \sigma_i^z$, where $\sigma_i^z$ is the Pauli spin operator for the $i$-th qubit. This results in a quantum Hamiltonian, and thus nodes are associated with qubits, and sampling is performed by projective measurements on the visible qubits.
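For reference, one common convention for the quantities named above is sketched below. The symbols $J_{ij}$ (couplings), $h_i$ (local fields) and $\beta$ (inverse temperature), as well as the sign convention, are assumptions made for illustration and may differ from those used in the cited works.

```latex
\begin{align}
  H(s) &= -\sum_{i<j} J_{ij}\, s_i s_j \;-\; \sum_i h_i\, s_i,
        \qquad s_i \in \{-1, +1\}, \\
  p(s) &= \frac{e^{-\beta H(s)}}{\sum_{s'} e^{-\beta H(s')}}, \\
  \hat{H} &= -\sum_{i<j} J_{ij}\, \sigma_i^z \sigma_j^z \;-\; \sum_i h_i\, \sigma_i^z,
        \qquad
  \rho = \frac{e^{-\beta \hat{H}}}{\operatorname{Tr}\, e^{-\beta \hat{H}}}.
\end{align}
```

The distribution over the visible nodes is obtained from $p(s)$ by summing over the hidden spins. Quantum Boltzmann Machine proposals, such as [Amin_2018], typically add non-commuting terms (for example, a transverse field) to $\hat{H}$, so that the resulting Gibbs state $\rho$ is no longer diagonal in the computational basis.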
One potential quantum method to sample from the visible nodes of the Gibbs state is to utilize Quantum Annealing (QA) [farhi2000quantum, Amin_2018, Dixit_2021]. For example, QA can be performed using the D-Wave devices [Harris_2010].
Alternatively, we can prepare the quantum Gibbs state for this system by performing Imaginary Time Evolution (ITE) [Zoufal_2021]. If the initial state is maximally mixed, performing ITE according to a quantum Hamiltonian will result in the associated Gibbs state. ITE can be performed variationally, via McLachlan’s principle, on a gate-based quantum computer [Yuan2019theoryofvariational]. Interestingly, the model introduced by Zoufal et al. [Zoufal_2021] can be utilized to formulate a Boltzmann Machine without restricted connections that is tractable on a quantum device.
Generative Adversarial Learning
As another prominent architecture for modeling probability distributions, Generative Adversarial Networks (GANs) [goodfellow2014generative] operate by simultaneously training a generator network $G$ and a discriminator network $D$ against each other in an adversarial game, in which $G$ tries to fool $D$ by generating fake data samples that are indistinguishable from those drawn from the real distribution, whereas $D$ tries to tell them apart and not be fooled by $G$. Quantum GANs (qGANs) have since been proposed [PhysRevLett.121.040502, PhysRevA.98.012324] and experimentally tested, for example, on superconducting quantum computers [Hueaav2761]. In a qGAN, the generator, the discriminator, or both can take the form of quantum circuits. In addition to the cross entropy of the original GAN, other distance metrics, such as the Wasserstein distance [chakrabarti2019quantum], have also been proposed to improve adversarial training on NISQ devices.
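In the original formulation of [goodfellow2014generative], the adversarial game between a generator $G$ and a discriminator $D$ is the minimax problem below. Here $p_{\mathrm{data}}$ denotes the real data distribution and $p_z$ the prior over the generator's latent input $z$; these symbol names are chosen for illustration.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The first term rewards $D$ for assigning high probability to real samples, and the second rewards it for rejecting generated ones; $G$ is trained to make the second term as small as possible.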
Quantum Born Machine
Closely related to quantum Boltzmann Machines and qGANs, Quantum Born Machines [Cheng_2018, coyle2020born] are another class of methods based on PQCs that have been studied for performing distribution-learning tasks. For example, Coyle et al. [coyle2020born] propose using the maximum mean discrepancy, the Stein discrepancy, and the Sinkhorn divergence to improve the training of a subclass of quantum circuit Born machines.
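To give a sense of what a sample-based training cost looks like, the sketch below computes a standard (biased) estimator of the squared maximum mean discrepancy between two sets of samples using a Gaussian kernel. It is a classical illustration only: the kernel choice, the bandwidth, and the toy integer samples (standing in for measured bitstrings read as integers) are assumptions and are not taken from the cited work.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalar outcomes."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def mmd_squared(samples_p, samples_q, bandwidth=1.0):
    """Biased estimator of the squared maximum mean discrepancy."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(x, y, bandwidth) for x in a for y in b)
        return total / (len(a) * len(b))
    return (
        mean_kernel(samples_p, samples_p)
        - 2.0 * mean_kernel(samples_p, samples_q)
        + mean_kernel(samples_q, samples_q)
    )

# Hypothetical outcomes: `close` is a reshuffling of `target`, `far` is not.
target = [0, 1, 1, 2, 2, 2, 3, 3]
close = [0, 1, 2, 2, 2, 3, 3, 1]
far = [6, 7, 7, 7, 6, 7, 6, 7]

loss_close = mmd_squared(target, close)
loss_far = mmd_squared(target, far)
```

A model whose samples match the target distribution yields a value near zero, while mismatched samples yield a larger value, which is what makes the quantity usable as a training loss.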
Having discussed several quantum generative modeling techniques, we next look at sample use cases in the finance domain where these techniques can be applied.
V-A Fraud Detection
Quantum versions of the Boltzmann Machine have been utilized for generative-learning and discriminative-learning tasks [Amin_2018, Dixit_2021]. Specifically for fraud detection, a Variational ITE Boltzmann Machine methodology has been utilized to classify anomalous credit-card transactions [Zoufal_2021]. The system Hamiltonian is represented by a sum of Pauli strings whose coefficients are functions of trained parameters and input features. As mentioned earlier, this formulation is not restricted to the Ising Hamiltonian typically utilized by Boltzmann Machines. Predictions are performed by sampling from a single visible qubit indicating whether the transaction was fraudulent.
qGANs were combined with a framework for Generative Adversarial Anomaly detection, AnoGAN [Herr_2021]. The generator was a PQC; the continuous output from the expectation of Pauli Z operators on each qubit was fed into a classical affine upscaling layer to achieve the full input feature dimension. The goal of the generator was to model the distribution of non-fraudulent transactions.
V-B Probability Distribution Preparation
One crucial step for achieving quantum advantage in many financial applications is the efficient preparation of input probability distributions. qGANs [zoufal2019quantum, SITU2020193] and quantum Born Machines [Cheng_2018, coyle2020born] have both been utilized to learn PQCs for loading probability distributions. Upon convergence, the quantum circuit, as an efficient representation of the underlying distribution, can, for example, be used in amplitude estimation to perform derivative pricing tasks [stamatopoulos2020option], with a theoretical quadratic speedup compared to classical Monte Carlo simulations. Additional techniques have been explored for the general creation of continuous distributions [haner2018optimizing, grover2002creating], as well as for certain families of continuous distributions, such as the work of Rattew et al. [rattew2020quantum] on the preparation of normal distributions.
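To make the loading target concrete: a circuit that loads a discrete distribution $p$ over $2^n$ grid points is meant to prepare a state whose amplitudes are $\sqrt{p_i}$, so that measuring it reproduces $p$. The sketch below computes these target amplitudes classically for a discretized normal distribution. The grid bounds, the number of qubits and the function names are hypothetical, and the code does not construct any circuit.

```python
import math

def discretized_normal(num_qubits, mean, std, low, high):
    """Probabilities of a normal density sampled on 2**num_qubits grid points."""
    n = 2 ** num_qubits
    grid = [low + (high - low) * i / (n - 1) for i in range(n)]
    weights = [math.exp(-0.5 * ((x - mean) / std) ** 2) for x in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]

def target_amplitudes(probabilities):
    """Amplitudes a loading circuit should prepare: a_i = sqrt(p_i)."""
    return [math.sqrt(p) for p in probabilities]

grid, probs = discretized_normal(num_qubits=3, mean=0.0, std=1.0, low=-3.0, high=3.0)
amps = target_amplitudes(probs)

# The squared amplitudes must sum to one for the state to be normalized.
norm = sum(a * a for a in amps)
```

A trained qGAN or Born machine can then be judged by how closely the measurement statistics of its circuit match `probs`.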
VI Quantum-Assisted Feature Extraction
Feature extraction refers to the set of techniques used to identify attributes of a dataset that are potentially helpful in ML tasks such as classification and regression. A quantum algorithm may help in feature extraction by computing properties of the dataset that a classical computer would fail to identify, or would take a very long time to identify. By encoding data onto a quantum state, we can map low-dimensional classical data to a much higher dimension in the Hilbert space. The expanded dimensionality of the quantum representation may be used to identify features invisible to a classical algorithm [schuld2019quantum]. The growing interest in quantum kernels [chatterjee2016generalized, wang2021towards], used in conjunction with Support Vector Machines, has also culminated in an experimental demonstration [bartkiewicz2020experimental].
A widely used algorithm to extract low-dimensional features from high-dimensional data is Principal Component Analysis (PCA). In PCA, a large feature space is analyzed to identify attributes with the highest variance. Classical PCA takes time that is polynomial in the dimension, or number of features, of the original dataset. If such classical data is mapped to a quantum density matrix, the quantum version of the algorithm can perform PCA exponentially faster, that is, in time polynomial in the logarithm of the dimension [lloyd2014quantum].
Extracting features is particularly challenging when analyzing images, where a large number of pixels have to be analyzed to identify image attributes. For these applications, a quantum computer may help with edge detection in images [zhou2019quantum].
In finance, feature extraction may be used in detecting anomalies in transactions. As an example use case, graph theoretic tools are used to study bidding markets to identify colluding communities or cartels [wachs2019network]. Quantum-aided graph kernel methods [schuld2019quantum, bai2017quantum] have been proposed to detect non-trivial features, such as communities [shaydulin2019network], in a graph, which may represent, for instance, a network of financial parties that frequently transact with each other. When working with graph representations of data, we often want to measure the similarity between two graphs. In fact, Gaussian Boson Sampling can be used to check if two graphs are isomorphic to each other [bradler2021graph]. Moreover, Gaussian Boson Sampling can be used to construct kernel vectors representing the similarity between any two graphs [Schuld_2021].
Feature selection consists of choosing a subset of the available features to pass to the model [hastie01statisticallearning]. This contrasts with methods, such as PCA, that perform a transformation on the features. Feature selection can be formulated as a combinatorial minimization problem with binary decision variables designating whether to select a feature or not. Such binary optimization problems can be solved utilizing QA [farhi2000quantum].
Below we present examples of how these techniques can be applied to financial use cases.
VI-A Model Reduction
PCA is a widely used method for dimensionality reduction that can be seen from the perspective of singular value decomposition. With the matrix decomposition $X = U \Sigma V^{\top}$ of the data matrix $X$, where $\Sigma$ is a rectangular diagonal matrix, the $k$ principal components are the first $k$ columns of $V$.
In 2014, Lloyd et al. [lloyd2014quantum] described a quantum PCA with exponential speedup over its classical counterpart. This theoretical speedup is realizable under certain conditions as it is based on HHL [harrow2009linear]. The algorithm can be used in finance to ease model tuning: as market conditions evolve, models need to be tuned in order to match the implied volatility (the volatility estimated by the model) with the market volatility. By using PCA, one reduces the number of components and, consequently, the number of parameters, thereby easing the model tuning.
For example, in a product based on foreign exchange, the input parameters are numerous and can range from global market data, such as the risk-free interest rate, to asset-specific parameters, such as the spot price. As a consequence, model tuning becomes computationally expensive due to the high number of inputs. However, as just the top three principal components can often explain over 95% of the output variations, one can tune the model faster, and still accurately, by using only these three components.
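The share of variation captured by the leading components can be checked classically through the explained-variance ratio. The sketch below is a plain-Python illustration that uses power iteration with deflation on the covariance matrix, chosen only to keep the example self-contained. It is not the quantum algorithm, and the three-column data set, driven by a single underlying factor plus small perturbations, is hypothetical.

```python
import math

def covariance_matrix(rows):
    """Sample covariance of the columns of a list-of-rows data set."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenpair(matrix, iterations=500):
    """Power iteration for the dominant eigenvalue and eigenvector."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigenvalue, v

def explained_variance_ratios(rows, k):
    """Share of total variance captured by each of the first k principal components."""
    cov = covariance_matrix(rows)
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))
    ratios = []
    for _ in range(k):
        lam, v = top_eigenpair(cov)
        ratios.append(lam / total)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return ratios

# Hypothetical inputs: three columns driven by one factor t, plus small perturbations.
rows = [(float(t), 2.0 * t + 0.1 * (-1) ** t, -1.0 * t + 0.1 * ((t % 3) - 1))
        for t in range(12)]
ratios = explained_variance_ratios(rows, k=2)
```

When the leading ratios sum to nearly one, as in this constructed example, the remaining components can be dropped with little loss, which is the basis for the reduced tuning problem described above.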
A variation of quantum PCA has been implemented on hardware [martin2021toward] to solve a similar problem by reducing the volatility factor dimension of the Heath-Jarrow-Morton model [10.2307/2951677] in order to estimate forward rates.
VI-B Combinatorial Feature Selection for Credit Score Classification
As mentioned earlier, feature selection can be cast as a combinatorial optimization problem. In the case of supervised learning, it is important to select features that are independent and relevant to the learning task. More specifically, for classification, the correlation coefficients between label and features can represent the relevance. The correlation matrix of the features can be used to represent the dependence between features. This can be formulated as a Quadratic Unconstrained Binary Optimization (QUBO) problem, where the quadratic terms are the correlation coefficients between features, and the linear terms are the correlations between the features and the label. QA [farhi2000quantum] can be used to solve the QUBO utilizing heuristics provided by quantum mechanics. This exact formulation, solved with a Quantum Annealer, was applied to reduce the number of features used for assessing the creditworthiness of applicants [milne2017optimal].
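A toy version of such a QUBO can be written down and solved by exhaustive enumeration, which stands in for the annealer at sizes where all bitstrings can be listed. In the sketch below, the trade-off weight `alpha`, the use of absolute correlations, and all numerical values are assumptions made for illustration; the precise objective used in the cited work may differ.

```python
from itertools import product

def qubo_energy(x, relevance, redundancy, alpha):
    """Energy to minimize: reward relevant features, penalize correlated pairs."""
    n = len(x)
    linear = -alpha * sum(abs(relevance[i]) * x[i] for i in range(n))
    quadratic = (1.0 - alpha) * sum(
        abs(redundancy[i][j]) * x[i] * x[j]
        for i in range(n) for j in range(i + 1, n)
    )
    return linear + quadratic

def solve_by_enumeration(relevance, redundancy, alpha=0.5):
    """Exhaustive search over all bitstrings; feasible only for a handful of features."""
    n = len(relevance)
    return min(
        product((0, 1), repeat=n),
        key=lambda x: qubo_energy(x, relevance, redundancy, alpha),
    )

# Hypothetical correlations: features 0 and 1 are near-duplicates,
# feature 2 is fairly independent, and feature 3 is barely relevant.
relevance = [0.8, 0.75, 0.6, 0.05]          # feature-label correlations
redundancy = [                               # feature-feature correlations
    [1.0, 0.95, 0.1, 0.0],
    [0.95, 1.0, 0.1, 0.0],
    [0.1, 0.1, 1.0, 0.2],
    [0.0, 0.0, 0.2, 1.0],
]
selected = solve_by_enumeration(relevance, redundancy)
```

The binary vector `selected` marks the chosen features; in this example the minimizer keeps one of the two near-duplicate features together with the independent one.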
VII Reinforcement Learning
Reinforcement learning (RL) [sutton2018reinforcement] is an ML technique where an agent attempts to learn through interactions with the environment. Classical RL has demonstrated remarkable capabilities in areas such as video games [mnih2015human], board games [silver2016mastering, silver2017mastering], robotics [kober2013reinforcement] and self-driving vehicles [sallab2017deep].
Classical RL is often formulated as a Markov Decision Process (MDP). MDPs enable the modeling of environments where actions are non-deterministic, that is, where taking a given action may probabilistically lead to one of multiple possible outcomes. As such, MDPs are useful for modeling many real-world problems where RL agents are exposed to inherent uncertainty. An MDP is characterized by a set of states $S$, a set of actions $A(s)$ available at each state $s \in S$, transition dynamics $P(s' \mid s, a)$ specifying the probability of obtaining state $s'$ upon taking action $a$ at state $s$, and a reward function $R$. Importantly, an agent selects actions according to a policy, which is maintained as a probability distribution over the actions available at any given state. The objective of an RL agent is to learn an optimal policy (one which selects actions maximizing the expected cumulative rewards) given that both the transition dynamics and the reward function of the environment are unknown a priori.
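The ingredients of an MDP, and the fact that the agent learns without knowing the transition dynamics, can be illustrated with a small tabular Q-learning sketch. The three-state chain environment, its success probability of 0.8, and all hyperparameters below are hypothetical; the agent interacts with the environment only through the `step` function.

```python
import random

def step(state, action, rng):
    """Hypothetical chain with states 0, 1, 2; state 2 is terminal.

    Action 1 tries to move right and action 0 tries to move left. The move
    succeeds with probability 0.8; otherwise the agent stays where it is.
    """
    if rng.random() < 0.8:
        next_state = min(state + 1, 2) if action == 1 else max(state - 1, 0)
    else:
        next_state = state
    reward = 1.0 if next_state == 2 else 0.0
    return next_state, reward, next_state == 2

def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(3)]  # q[state][action]
    for _episode in range(episodes):
        state = 0
        for _t in range(100):  # cap on episode length
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)  # explore, or break ties at random
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action, rng)
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q = q_learning()
# Greedy policy in the two non-terminal states.
policy = [0 if q[s][0] > q[s][1] else 1 for s in range(2)]
```

With these settings, the learned greedy policy is expected to move right in both non-terminal states, since that is the only way to reach the rewarding terminal state.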
Utilizing quantum computers to perform RL was first discussed by Dong et al. in 2005 [dong2005quantum], with a follow-up in 2008 [dong2008quantum].
In their approach, the possible actions at any given state in the environment are maintained in a quantum superposition, and amplitude amplification is used to increase the probability of measuring a good action at any given state.
In 2017, Dunjko et al. published a framework for quantum RL, where they expand upon the amplitude-amplification approach, which assumes access to an oracle representing the environment [dunjko2017advances].
Furthermore, they introduce more general techniques for learning model meta-parameters, and additionally observe that there is significant potential for quantum advantage in luck-favoring task environments (i.e., environments where a lucky agent finds good sequences of actions much sooner than an unlucky agent) following from quantum search-based speedups.
In a 2021 paper, Wang et al. derive a quantum RL algorithm with quadratic performance improvements in various parameters over corresponding classical algorithms for the evaluation of an optimal policy, state-values, and state-action pair values (q-values) in an MDP [wang2021quantum].
They explain that this work is applicable to any RL problem where the environment may be classically simulated, as a classical circuit implementing the simulator may be efficiently turned into a quantum circuit.
Additionally, recent studies have explored the use of variational PQCs to implement both RL and Deep RL (DRL) in continuous action spaces [chen2020variational, wu2020quantum].
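The effect of amplitude amplification, mentioned above in connection with increasing the probability of measuring a good action, can be quantified with the standard closed-form expression: if a good outcome is initially measured with probability $p = \sin^2\theta$, then after $k$ amplification rounds it is measured with probability $\sin^2((2k+1)\theta)$. The sketch below evaluates this formula classically. It is not an implementation of any of the cited quantum RL algorithms, and the function names are hypothetical.

```python
import math

def success_probability(initial_probability, rounds):
    """Probability of measuring a good outcome after `rounds` amplification rounds."""
    theta = math.asin(math.sqrt(initial_probability))
    return math.sin((2 * rounds + 1) * theta) ** 2

def best_round_count(initial_probability):
    """Round count that brings the rotation angle (2k + 1) * theta closest to pi / 2."""
    theta = math.asin(math.sqrt(initial_probability))
    return max(0, round(math.pi / (4 * theta) - 0.5))

# A good action that is initially measured with probability 1%.
p0 = 0.01
k = best_round_count(p0)
p_amplified = success_probability(p0, k)
```

The number of rounds needed scales as $1/\sqrt{p}$, compared with the roughly $1/p$ repetitions needed to observe a good outcome by repeated classical sampling, which is the source of the quadratic speedups discussed in this section.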
Next, we present some use cases showing how quantum RL techniques can be utilized in the finance domain.
VII-A Algorithmic Trading
The process of executing trades of financial instruments systematically by accounting for market variables with limited or no human intervention is referred to as algorithmic or automated trading. Generally, algorithmic trading is performed by first making predictions in a supervised manner and then obtaining optimal trading decisions under the uncertainty associated with those predictions and with market volatility. RL bypasses the need for predictions by casting algorithmic trading as a sequential decision-making problem in which trading decisions that maximize the cumulative returns over a finite time horizon are obtained directly [zhang2020deep]. The domain of RL, and more specifically DRL, has demonstrated huge applicability for algorithmic trading [pricope2021deep]. However, such RL approaches for automated trading operate under certain strong assumptions and may benefit from quantum ML techniques for improved time and model complexity.
Algorithmic trading can be cast as a multi-period portfolio-selection problem that involves rebalancing the portions of capital invested in selected assets at each stage. There have been attempts to solve this multi-stage optimization problem with a QA device to obtain an optimal trading trajectory [rosenberg2016solving]. However, this approach does not adopt any RL technique based on policy or value function approximation. Due to the hardware limitations of current quantum devices, quantum RL approaches have not yet been directly applied to automated trading. Nevertheless, components of algorithmic trading can certainly benefit from quantum advantages offered by quantum RL. For instance, the LSTM neural network architecture used as a q-value estimator [li2019deep] could potentially be replaced with a quantum LSTM [chen2020quantum] for improved performance. Also, variational quantum circuits [wu2020quantum] can be used for different DRL components applied to decision-making in algorithmic trading.
VII-B Market Making
Market makers have an important role in financial markets as they increase the liquidity of exchanges, thereby facilitating transactions and investment [avellaneda2008high, gueant2013dealing]. A market maker is responsible for maintaining a set of sell orders (asks) and buy orders (bids) at various quantities and prices. When incoming market orders are made on a security held by the market maker, they are required to transact. As such, they inherently assume risk, as a position they are forced to acquire can subsequently depreciate. Market makers profit by taking advantage of the gap, called the spread, between the lowest ask and the highest bid. For instance, assuming an incoming market order is made to sell security $X$, the market maker will fulfill the order by purchasing it at their bid price. If another market order is immediately made to purchase security $X$, the market maker fulfills the order by selling it at their ask price, thereby obtaining a profit equal to the spread.
Market making is amenable to quantum RL, where the problem can be modelled with an agent state, taking into account attributes such as inventory and risk-tolerance, and an environment state where the agent only has partial information, which may not necessarily be Markovian [spooner2018market].
VIII Natural Language Processing
Natural Language Processing (NLP) is the field concerned with automated text and language analysis. A drawback of most search engines that use classical NLP is that they understand individual words but not grammatical structure. This has triggered research in distributional compositional semantics (DisCo). A particular DisCo model is the Coecke, Sadrzadeh and Clark (CSC) model [clark2008compositional, coecke2010mathematical], based on tensor-product composition inspired by quantum theory.
In modern classical NLP, the vector space model [schutze1998automatic] is used to compute the meaning of individual words. Given an individual word $w$ in a text, its meaning is computed by first setting up basis words (i.e., the most common words in the text) and then, for each of $w$’s nearby basis words, counting its frequency throughout the text. The proximity of two words is measured by the similarity between them, calculated, for example, using the inner product of their normalized representative vectors. These are called distributional methods and cannot be extended to find the meaning of long sentences, as the same sentence is rarely repeated. In contrast, algorithms based on compositional semantics derive the meaning of a sentence from the known meanings of its component words. The DisCo model combines both approaches to introduce grammatical understanding to the composition of word vectors.
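The distributional construction described above can be sketched in a few lines. The example builds co-occurrence vectors over a small set of basis words and compares words by the inner product of their normalized vectors. The toy corpus, the basis words and the window size of two tokens are hypothetical choices made for illustration.

```python
import math
from collections import Counter

def context_vector(word, tokens, basis, window=2):
    """Count how often each basis word occurs within `window` tokens of `word`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != word:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in basis:
                counts[tokens[j]] += 1
    return [counts[b] for b in basis]

def cosine_similarity(u, v):
    """Inner product of the normalized vectors; zero if either vector is empty."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)

text = ("the bank raised the loan rate . the lender raised the loan rate . "
        "the trader sold the stock .").split()
basis = ["raised", "loan", "rate", "sold", "stock"]

bank = context_vector("bank", text, basis)
lender = context_vector("lender", text, basis)
trader = context_vector("trader", text, basis)

sim_bank_lender = cosine_similarity(bank, lender)
sim_bank_trader = cosine_similarity(bank, trader)
```

Words that appear in similar contexts, such as "bank" and "lender" here, obtain similar vectors, while words used in different contexts do not.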
In the CSC model, each grammatical type in the text is assigned a tensor product space based on some grammar (e.g., Lambek’s pregroup grammar [lambek2008word]). For instance, a transitive verb takes a subject noun as a left argument and an object noun as a right argument. The meaning of a noun is calculated as in the distributional model; its vector space is denoted as $N$. Therefore, the meaning of a transitive verb is a tensor in the space $N \otimes S \otimes N$, where $S$ is the meaning space for sentences. An important feature of this model is the use of diagrammatic notation for vectors, tensors and linear maps. This model has the computational challenge of large tensor product spaces. Even though there exist classical approaches, such as dimensionality reduction [polajnar2013learning], that avoid the calculation of the full tensor product, they make certain assumptions that are not always met.
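The composition rule for a transitive sentence can be sketched as a tensor contraction: the verb tensor, with one index for the subject, one for the sentence space and one for the object, is contracted with the subject and object vectors to leave a vector in the sentence space. The two-dimensional noun and sentence spaces and all numerical entries below are hypothetical.

```python
def sentence_meaning(subject, verb, obj):
    """Contract a verb tensor verb[i][s][j] with subject[i] and obj[j]."""
    dim_s = len(verb[0])
    return [
        sum(subject[i] * verb[i][s][j] * obj[j]
            for i in range(len(subject)) for j in range(len(obj)))
        for s in range(dim_s)
    ]

# Hypothetical noun space with basis (institution, asset).
bank = [1.0, 0.0]
bond = [0.0, 1.0]

# Hypothetical verb tensor; the sentence space has basis (plausible, implausible).
buys = [
    [[0.1, 0.9], [0.9, 0.1]],  # subject index 0 (institution): rows are s, columns are j
    [[0.2, 0.8], [0.3, 0.7]],  # subject index 1 (asset)
]

bank_buys_bond = sentence_meaning(bank, buys, bond)
bond_buys_bank = sentence_meaning(bond, buys, bank)
```

Because the verb tensor has one index per argument, swapping subject and object generally changes the result, which is how word order enters the sentence meaning. The tensor has $\dim(N)^2 \cdot \dim(S)$ entries, which illustrates why realistic noun spaces quickly lead to large tensor product spaces.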
The recent development of encoding classical data on quantum hardware using variational PQCs enables quantum NLP to be particularly suitable for NISQ devices.
In particular, the quantum CSC model can encode linguistic structures faster in comparison to its classical counterpart.
Its quantum speedup stems from the quantum nearest-neighbor algorithm that is employed for the sentence similarity calculations in the DisCo framework.
If certain conditions are met, then for a noun meaning space of a given dimension there is a quantum algorithm capable of classifying any CSC model sentence composed of tensors into a given number of classes with a time complexity that improves on that of classical methods; the precise bounds are given in [zeng2016quantum].
Below are a few potential applications of the discussed quantum NLP techniques in the financial sector.
VIII-A Risk Assessment
Banks can quantify the chances of a successful loan repayment based on a credit risk assessment. Usually, the payment capacity is calculated based on previous spending patterns and past loan payment history. However, this information is not always available, especially for underbanked applicants. NLP techniques can be applied to solve this problem by using multiple data points to assess credit risk. For instance, NLP can measure attitude and an entrepreneurial mindset in business loans. Similarly, it can flag incoherent data for further scrutiny. Moreover, subtle aspects, such as the lender’s and borrower’s emotions during a loan process, can be incorporated with the help of NLP [purda2015accounting, fisher2016natural].
VIII-B Financial Forecasting
Financial forecasting is based on many macroeconomic factors, which are unstructured and scattered across different sources. This is the reason why NLP techniques are frequently employed [xing2018natural]. For example, NLP has been proposed for classification of news articles as significant or non-significant from the financial point of view [yildirim2018classification]. In addition, sentiment analysis, which plays an important role in decision-making by traders, has also been carried out with the help of NLP techniques [mishev2020evaluation].
VIII-C Accounting and Auditing
Another application of NLP is accounting and auditing [fisher2016natural], whose objective is the detection and prevention of fraud via evaluation of accounting systems, monitoring of internal controls, assessment of fraud risk, and interpretation of financial data for anomalous trends. NLP has been proposed for the creation of semantic knowledge bases or trees for financial accounting standards. Auditors can also detect anomalies in financial statements by applying NLP techniques.
In this paper, we presented an introduction to quantum ML techniques and their applications in the financial services sector. We identified seven machine learning tasks for which several quantum algorithms have been previously proposed in the literature: regression, classification, clustering, generative learning, feature extraction, sequential decision-making, and Natural Language Processing. We analyzed the speedups offered by various quantum ML techniques, and discussed the financial applications that could benefit from such quantum acceleration. Moreover, where the literature on finance-specific quantum ML techniques remains sparse, we provided insights into applying state-of-the-art general quantum ML techniques to specific financial use cases. Additionally, we considered the realities of implementing quantum computing techniques in the financial sector, for example, the challenges imposed by hardware limitations. In summary, this article serves as a road map towards enriching the finance industry with quantum ML techniques in the NISQ era and beyond.
This paper was prepared for information purposes by the Future Lab for Applied Research and Engineering (FLARE) group of JPMorgan Chase Bank, N.A. This paper is not a product of the Research Department of JPMorgan Chase & Co. or its affiliates. Neither JPMorgan Chase & Co. nor any of its affiliates make any explicit or implied representation or warranty and none of them accept any liability in connection with this paper, including, but not limited to, the completeness, accuracy, reliability of information contained herein and the potential legal, compliance, tax or accounting effects thereof. This document is not intended as investment research or investment advice, or a recommendation, offer or solicitation for the purchase or sale of any security, financial instrument, financial product or service, or to be used in any way for evaluating the merits of participating in any transaction.