1 Background
Permutation problems are well studied combinatorial optimisation problems in nature inspired computing. They have many realworld applications especially in planning and logistics. Some of the most frequently studied permutation problems in literature are the wellknown Travelling Salesman Problem (TSP) and Quadratic Assignment Problem (QAP). Since these problems are NPhard, heuristics and metaheuristics have been proposed for solving them.
Several classes of algorithms have been applied to problems naturally represented as permutations e.g. Estimation of Distribution Algorithm
arza2020kernels, Iterated Local Search and Differential Evolution Algorithm santucci2015algebraic. Quantuminspired algorithms such as the Digital Annealer (DA) has also been shown to present more promising performance on the QAP when compared to CPLEX and QBSolve matsubara2020digital. The DA was particularly shown to be up to three or four orders of magnitude faster than CPLEX on QAP instances and maximum cut instances.Quantum and quantuminspired methods have been of research interest in recent years. This is because they are able to exploit the use of specialised hardware to solve optimisation problems much quicker than classical algorithms implemented on general purpose machines aramon2019physics. It is common to formulate combinatorial optimisation problems such as permutation problems in quadratic and binary form such that algorithms that use specialised hardware including (but not limited to) quantum devices can be used to solve them. In recent years, Quadratic Unconstrained Binary Optimisation (QUBO) has become a unifying model for representing many combinatorial optimisation problems verma2020penalty. QUBO (or the equivalent Ising) formulations of common combinatorial optimisation problems are presented in lucas2014ising
. As the name depicts, QUBO problems are unconstrained, quadratic and of binary form. Since the representation only supports binary information, the natural representation of permutation problems can therefore not be used. While some classical optimisation algorithms such as genetic algorithms use permutation representation
hussain2017genetic to solve QAP and/or TSP, other algorithms require alternative representations e.g. random keys ayodele2016rk, factoradics regnier2014factoradic, binary baluja1994population, matrix representation larranaga1999genetic or twoway onehot lucas2014ising. The twoway onehot representation glover2018tutorial; lucas2014ising; matsubara2020digital, also known as permutation matrix Birdal_2021_CVPR, is often used in QUBO formulations of permutation problems and is used in this study. In this representation, a substring of bits is used to represent each entity (e.g a location in TSP). In each substring, only one bit can be turned on and in the entire solution, the bit turned on in each substring must be unique. This representation ensures that the mutual exclusivity constraint of permutation problems is respected. QUBO solvers such as Path Relinking method used in verma2020penalty and the first generation DA in aramon2019physics are single flip solvers and are therefore not able to preserve twoway onehot validity. To ensure that the problem to be solved is in ‘unconstrained’ form, penalty weights are applied to combine the cost function (unconstrained objective function) with the constraint function. Solutions are penalised by the magnitude of violation of the constraint(s).Setting penalty weights is not a trivial task. Values that are too large make the search too difficult for the solver as the penalty terms overwhelm the original objective function verma2020penalty. Penalty weights that are too small are highly undesirable as infeasible solutions will displace feasible solutions in the search, causing the solver to return infeasible solution(s). Penalty weights are however not unique and there are often a range of values that work well glover2019quantum. The primary objective of setting the penalty weights is often to ensure that the optimal solution of the QUBO is the optimal solution of the original constrained problem. However, it is also important that these values are not too large.
In literature, there are many approaches of setting penalty weights for QUBOs. The value can be set by domain experts glover2019quantum. A common approach is to derive the penalty weights empirically using methods that increase the weights until feasibility is achieved rosenberg2016solving. This approach is however computationally intensive as a full run of the algorithm and analysis of results is required each time until the right value is reached.
Another common approach is to set the penalty weight to a value greater than the largest possible objective value. Deriving the range (upper bound and lower bound) of the objective function is often problem specific. These values can also be too large. Although there are some general methods of determining the bounds of a QUBO, it is however often the case that methods with less computational complexity lead to values that are too large while methods that can provide better bounds are often computationally expensive boros2008max. Moreover, while better bounds can lead to smaller but valid penalty weights (i.e guarantees feasibility of the optimal), the penalty values are still often larger than desired. In cseker2020digital, the performance of the DA was analysed using different penalty weights that are fractions of the range of the objective function. The range was derived using problem specific information, the authors found much smaller values can lead to better performance of the DA.
Furthermore, the maximum coefficient of the QUBO has been used as penalty weights when solving problems like the TSP. The idea behind this is that, if a TSP solution is penalised by the maximum distance between any two cities, feasibility of the optimal solution can always be achieved. In takehara2019multiple, the authors used a multitrial approach. The values used were within a range defined using the minimum and the maximum distance between any two locations. Similarly, in goh2020hybrid, fractions of the maximum distance were used to derive penalty weights for the TSP. This is an example of a scenario where values much lower than the full range of the objective function can be valid. This approach may however not generalise to other problems verma2020penalty.
In verma2020penalty, a preprocessing method that can be used to generate penalties for equality constraints within the context of single flip QUBO solvers was presented. The authors measure the maximum change in objective function that can be obtained as a result of any single flip in a solution. This method, which is referred to as VLM in this study, is explained in more details in Section 3.
Examples of QUBO (or Ising) solvers that use specialised hardware are Dwave’s Quantum Annealer johnson2011quantum and Fujitsu’s DA aramon2019physics. The Quantum Annealer uses Quantum Processing Units (QPU) to achieve its speed up while the DA uses applicationspecific CMOS hardware. In this study, we analyse the effect of different methods of generating penalty weights for permutation problems (TSP and QAP). Initial experiments are based on a CPU implementation of the DA Algorithm presented in aramon2019physics. Further analysis of the effect of penalty weights are done using the third generation DA (DA3) as the QUBO solver da3.
The rest of this paper is structured as follows. Section 2 presents the permutation formulations and QUBO formulations of the TSP and QAP used in this study. Section 3 presents the methods of generating penalty weights. Section 4 presents a description of the DA Algorithm. Experimental Settings are presented in Section 5. Analyses of results and conclusion are presented in Sections 6 and 7 respectively.
2 Permutation Problems
A valid permutations is described as , where . In general, QUBO problems can be defined as follows:
(1) 
where and are QUBO matrix and constant term, the solution is an
dimensional vector, and
is the energy (or fitness) of . To formulate permutation problems as QUBO, it is important for the problem to be formulated as zeros and ones as these are the only values supported by QUBO solvers. Some of the wellknown approaches of transforming integer values to zeros and ones are binary, gray code and onehot. Within the context of QUBOs, the twoway onehot (permutation matrix) encoding is often used to represent permutations and will be used in this study.The energy (fitness function) of the problem is shown in Eq. (2) where and are respectively cost and constraint functions while is the penalty weight. The generic cost and constraint functions of permutation problems are presented in Eqs. (3) and (4) goh2020hybrid.
(2) 
(3) 
(4) 
Note that the constraint function is designed to ensure that the solutions can be converted to valid permutations i.e. penalise solutions that do not satisfy,
(5) 
In Eqs. (3)  (5), binary variable indicates whether an object is assigned to position or not. We however note that is solved as a vector of size rather than a matrix. Also, while in Eq. (3) is the QUBO coefficient that captures the relationship between an object being in position and an object being in position . In the rest of this paper, QUBO matrices and representing the cost or constraint functions are presented as matrices.
2.1 Quadratic Assignment Problem
The QAP can be described as the problem of assigning a set of facilities to a set of locations. For each pair of locations, a distance is specified. For each pair of facilities, a flow (or weight) is specified. The aim is to assign each facility to a unique location such that the sum of the products between flows and distances is minimised.
Formally, the QAP consists of two input matrices and , where is the flow between facilities and , and is the distance between locations and , the solution to the QAP is a permutation where represents the location that facility is assigned to. The objective function (total cost) is formally defined as follows
(6) 
We aim to solve Eq. (2) where the cost function of the QUBO representing the QAP is presented in Eq. (7) and the constraint function is the same as Eq. (4).
(7) 
Any solution can be encoded with variables, when is presented in vector format.
2.2 Travelling Salesman Problem
The TSP consists of locations and a matrix representing distances between any two locations. The aim of the TSP is to minimise the distance travelled while visiting each location exactly once and returning to the location of origin. Given that is used to denote the city and is the distance between and . The solution to the TSP is a permutation where each represents the location to visit. The TSP is formally defined as
(8) 
We aim to solve Eq. (2) where the cost function, of the QUBO representing the TSP is presented in Eq. (9) and the constraint function is the same as Eq. (4).
(9) 
The TSP instances used in this work are symmetric, we can therefore fix the first city, reducing the size of to lucas2014ising.
Note that QUBOs for these problems can be generated using packages such as PyQUBO zaman2021pyqubo. QUBOs are generated in this study using method in moraglio_georgescu.
3 Penalty Weights
The aim of this study is to derive methods of setting in Eq. (2) such that the optimal solution to the penalised objective function is the optimal solution of the original constrained problem. We do this without problem specific knowledge but use information captured in the QUBO matrices representing the cost and constraint functions. This is shown in Eq. (10), is used to denote the cost function of the optimal solution . is the solution space of infeasible solutions. Note that produces a nonnegative value, if the solutions are feasible but for infeasible solutions. The value of increases according to the degree of constraint violation.
(10) 
(11) 
In the rest of this study, and are used to denote the QUBO matrices representing and respectively. Note that , which is the QUBO matrix optimised by the solver, can be derived by aggregating the matrices (i.e. ), where . The methods of generating penalty weights used in this study are described as follow:
UB: A common method of setting penalty weights is based on the Upper Bound (UB) of the objective function. The UB of used in this study is presented in Eq. (12). This is a valid upper bound for problems with all positive QUBO coefficients. We note that a solution consisting of all 1s is an infeasible solution but it gives an estimate of how large the objective function could be.
(12) 
MQC: The Maximum QUBO Coefficient (MQC) which also corresponds to the maximum distance between any two cities in the TSP has been used as penalty weights in previous study lucas2014ising. MQC is defined in Eq. (13).
(13) 
VLM: This is the method proposed by Verma and Lewis verma2020penalty. For a 1flip solver like the DA, any variable can be flipped from 0 to 1 and vice versa at each iteration of the algorithm. VLM focuses on deriving a good estimate for the numerator of Eq. (11), i.e. . The method estimates the amount of gain/loss in objective function that can be achieved by either turning a bit on or off. They do not consider the denominator (i.e ) which the authors recognise will be hard to estimate without complete enumeration. Since in infeasible solutions, VLM (Eq. (14)) will always be valid.
(14)  
(15) 
MOMC: We propose an amendment to VLM verma2020penalty. We refer to the proposed method as the Maximum change in Objective function divided by Minimum Constraint function of infeasible solutions (MOMC). We note that is not considered in Eq. (14). VLM can be reduced such that is still valid, if we know the minimum constraint function () of any infeasible solution. This can be computed from by estimating the minimum change in constraint function that is greater than 0 as shown in Eq. (16).
(16)  
(17) 
For permutation problems represented as twoway onehot, of any solution that is a flip away from a feasible solution is (i.e. ). Method of generation using the proposed MOMC is presented in Eq. 18 where and are derived as shown in Eqs. (14) and (17)
(18) 
MOC: We propose another amendment to the VLM method. The method presented here is derived by selecting the Maximum value derived from dividing each change in Objective function with the corresponding change in Constraint function (MOC). MOC captures possible equivalent increase in constraint function as a result of a change in objective function which could be achieved by flipping any bit from 0 to 1 or vice versa. Method of generation using the proposed MOC is presented in Eq. (19) where and are derived as shown in Eqs. (14) and (16)
(19) 
4 Digital Annealer
The DA is a technology designed to solve large scale combinatorial optimisation problems in much shorter time than most classical algorithms.
The first generation DA is a single flip solver with similar properties as the Simulated Annealing (SA). It is however designed to be more effective than the classical SA algorithm aramon2019physics. The standard algorithm of the first generation DA is presented in Alg. 1
. The DA algorithm exploits the use of specialised hardware such that all neighbouring solutions are explored in parallel and in constant time regardless of the number of neighbours. This approach significantly improves acceptance probabilities of the regular SA algorithm. The DA does not completely evaluate each solution but computes the energy difference resulting from flipping any single bit of the parent solution. Also, the DA uses an escape mechanism to avoid being trapped in local optimal. As shown in the algorithm,
is used to relax acceptance criteria. The is incremented by a parameter (offset_increase_rate) each time no neighbour which satisfies the acceptance criteria is found. The acceptance criteria used in this study is where is the probability of accepting the flip. Note that represents the difference in energy as a result of flipping the bit and is the current temperature.More extensive details of the algorithm can be seen in aramon2019physics. The first and second generation DAs were released in May 2018 and December 2018. Both versions were designed to solve optimisation problems that have been formulated as QUBOs. The most recent generation of the DA is the DA3 which is able to find optimal or sub optimal solutions to Binary Quadratic Problems (BQP) of up to 100,000 bits da3. BQPs include QUBO but also other binary and quadratic formulations that may have constraints. Note that the algorithm that supports DA3 has been updated to perform better than the current algorithm presented. For simplicity, we use the CPU implementation of Algorithm 1 in this study to evaluate the quality of solution derived when using different penalty weights. We however also presented some results derived using DA3.
5 Experimental Setup
Problem sets, measure of performance and parameter settings used in this study are presented in this section.
5.1 Datasets
In order to compare the behaviour of the DA with different penalty weights, we used common TSP and QAP instances from TSPLIB reinelt1991tsplib and QAPLIB burkard1997qaplib. The instances used in this study and the corresponding solution sizes are presented in Table 1.
QAP Instances  

had12  144 
had14  196 
had16  256 
had18  324 
had20  400 
QAP Instances  

rou12  144 
rou15  225 
rou20  400 
tai40a  1600 
tai40b  1600 
TSP Instances  

bays29  784 
bayg29  784 
berlin52  2601 
brazil58  3249 
dantzig42  1681 
TSP Instances  

fri26  625 
gr17  256 
gr21  400 
gr24  529 
st70  4761 
QUBO matrices (in upper triangular format) representing the cost and constraint functions of these QAP and TSP instances used are made available^{1}^{1}1https://github.com/mayoayodelefujitsu/QUBOs
5.2 Performance Measure
We compare the performance of the DA using different methods of generating penalty weights. The performance measure used is the Average Relative Percentage Deviation (ARPD) defined in Eq. (20)
(20) 
Note that is the best energy returned by the DA for the run using penalty weight set to and is the number of runs. We set and the optimal value are obtained from QAPLIB and TSPLIB.
5.3 Parameter Settings: DA Algorithm
The parameter settings used in the DA and DA3 are shown in Table 2. For the DA, the temperature is set to decrease from ‘Initial Temperature’ to ‘Final Temperature’ by a fraction ‘Temperature Decay’. Note where denotes the temperature at iteration . In the DA3, temperature and offset increment related parameters are automatically set. Therefore, no manual setting is required for the DA3, these parameters are thus shown as ‘NA’ for DA3 in Table 2. The stopping criteria used in the DA (CPU implementation) is ‘number of iterations’ but ‘time (in seconds)’ is used in DA3. This is because the two stopping criteria allowed in the DA3 are time and target energy (fitness). The DA is executed for iterations while the time limit for DA3 is set to the ceiling of 3% of in seconds ( is presented in Table 1). Each experiment is executed independently 20 times. The parameters are chosen based on preliminary experiments. Note that (Eq. (14)).
Parameter  DA  DA3 

Initial Temperature  0.1, , 10  NA 
Final Temperature  1  NA 
Stopping Criteria  sec  
offset_increase_rate  NA  
Number of Runs  20  20 
Temperature Decay  0.001  NA 
6 Results
In this section, different methods of generating penalty weights are compared using the parameter settings presented in Section 5.3. Results using the CPU implementation of the first generation DA algorithm are presented. Further results relating to the third generation DA are also presented.
6.1 Penalty Weights for TSP and QAP instances
Table 3 shows the penalty weights derived for different TSP and QAP instances using the methods of generating penalty weights defined in Section 3. The smallest and valid penalty weights are highlighted in bold. Validity of the penalty weights are assessed in Section 6.2. It should be noted that the methods were applied to QUBO matrices in upper triangular format.


Penalty Weights  

UB  MQC  VLM  MOMC  MOC  
QAP  had12  249,240  126  5,460  2,730  488  
QAP  had14  573,484  162  8,968  4,484  533  
QAP  had16  1,014,488  162  12,580  6,290  545  
QAP  had18  1,832,940  200  16,102  8,051  1,513  
QAP  had20  2,950,640  220  20,928  10,464  1,335  
QAP  rou12  40,734,756  19,602  874,944  437,472  34,531  
QAP  rou15  98,340,328  19,602  1,498,176  749,088  79,715  
QAP  rou20  346,044,384  19,602  2,569,174  1,284,587  123,342  
QAP  tai40a  5,904,547,332  19,602  10,418,804  5,209,402  176,904  
QAP  tai40b  1,767,388,016,312  32,656,592  4,524,144,275  2,262,072,138  56,133,309  
TSP  bayg29  3,381,534  386  6,279  3,140  2,404  
TSP  bays29  4,259,764  509  8,593  4,297  3,003  
TSP  berlin52  74,165,126  1,716  55,515  27,758  27,148  
TSP  brazil58  379,655,572  8,700  288,552  144,276  55,557  
TSP  dantzig42  4,814,472  192  5,029  2,515  1,915  
TSP  fri26  1,455,150  280  4,833  2,417  1,616  
TSP  gr17  1,005,188  745  7,981  3,991  3,074  
TSP  gr21  2,666,064  865  11,160  5,580  2,853  
TSP  gr24  1,609,942  389  5,185  2,593  1,888  
TSP  st70  16,647,424  129  5,055  2,528  2,079 
6.2 CPU implementation of the DA Algorithm: comparing methods of generating penalty weights
In this section, results derived using CPU implementation of the first generation DA algorithm are presented. Table 4 shows the number of feasible solutions returned by the DA within the stopping criteria when different methods of generating penalty weights are used.

Number of feasible runs  
UB  MQC  VLM  MOMC  MOC  
had12  20  0  20  20  20  
had14  20  0  20  20  20  
had16  20  0  20  20  20  
had18  20  0  20  20  20  
had20  20  0  20  20  20  
rou12  20  0  20  20  13/14/14  
rou15  20  0  20  20  20  
rou20  20  0  20  20  20  
tai40a  20  0  20  20  20  
tai40b  20  0  20  20  20 

Number of feasible runs  

UB  MQC  VLM  MOMC  MOC  
bayg29  20  20  20  20  20  
bays29  20  20  20  20  20  
berlin52  20  20  20  20  20  
brazil58  20  20  20  20  20  
dantzig42  20  20  20  20  20  
fri26  20  20  20  20  20  
gr17  20  20  20  20  20  
gr21  20  20  20  20  20  
gr24  20  20  20  20  20  
st70  20  20  20  20  20 
Tables 5, 6 and 7 respectively present the ARPD derived by the DA at initial temperature set to and using different methods of generating penalty weights. The ARPD for MQC is not presented for QAP instances in any of the tables because no feasible solution was found.
UB, MQC and MOC present their best ARPD averaged across TSP instances when the temperature is set to while MOMC and VLM present their best ARPD on TSP instances when temperature is set to . UB presents its best performance on QAP instances when temperature is set to , and MOC, MOMC and VLM present the best ARPD averaged across all QAP instances when the temperature is set to .

Optimal  ARPD  
UB  VLM  MOMC  MOC  
had12  1,652  14.15  12.98  11.98  6.40  
had14  2,724  16.20  14.85  13.86  6.28  
had16  3,720  12.23  13.63  10.76  5.50  
had18  5,358  11.97  11.24  9.25  6.35  
had20  6,922  12.46  12.15  8.99  6.25  
rou12  235,528  29.12  29.15  27.98  10.37  
rou15  354,210  30.75  33.34  28.21  16.28  
rou20  725,522  24.04  25.96  20.42  14.35  
tai40a  3,139,370  20.44  20.73  16.08  13.00  
tai40b  637,250,948  77.76  76.51  52.92  11.73  
Avg  24.91  25.05  20.05  9.65 

Optimal  ARPD  

UB  MQC  VLM  MOMC  MOC  
bayg29  1,610  189.59  52.94  180.69  125.19  114.98  
bays29  2,020  190.45  55.52  188.23  130.47  116.61  
berlin52  7,542  295.70  100.40  289.46  212.58  209.45  
brazil58  25,395  389.04  138.94  390.04  276.79  260.21  
dantzig42  699  340.26  95.05  335.62  225.04  224.17  
fri26  937  177.06  57.32  177.34  120.84  107.51  
gr17  2,085  112.06  29.67  107.56  84.75  66.97  
gr21  2,707  170.91  44.82  166.01  123.01  93.32  
gr24  1,272  166.45  52.37  160.64  114.54  102.29  
st70  675  444.14  124.52  419.62  330.44  325.83  
Avg  247.57  75.16  241.52  174.37  162.13 

Optimal  ARPD  
UB  VLM  MOMC  MOC  
had12  1,652  15.25  7.65  8.33  6.54  
had14  2,724  15.26  9.37  9.76  6.43  
had16  3,720  13.27  8.13  8.75  5.41  
had18  5,358  11.40  7.08  7.04  6.55  
had20  6,922  12.86  7.38  7.66  6.74  
rou12  235,528  32.12  20.34  20.75  9.58  
rou15  354,210  31.33  22.00  21.37  15.75  
rou20  725,522  24.49  17.77  17.91  13.69  
tai40a  3,139,370  20.43  16.10  16.13  12.89  
tai40b  637,250,948  78.61  51.81  51.25  11.49  
Avg  25.50  16.76  16.89  9.51 

Optimal  ARPD  

UB  MQC  VLM  MOMC  MOC  
bayg29  1,610  194.42  54.69  127.05  120.22  117.59  
bays29  2,020  196.69  57.22  124.98  122.92  120.10  
berlin52  7,542  298.15  102.47  217.57  214.73  214.16  
brazil58  25,395  380.88  137.75  278.99  277.66  261.90  
dantzig42  699  350.45  100.25  238.07  232.41  222.18  
fri26  937  185.14  59.04  116.29  114.82  109.82  
gr17  2,085  128.08  31.41  70.44  62.56  60.52  
gr21  2,707  190.91  52.99  114.31  105.63  98.48  
gr24  1,272  178.25  52.85  114.77  104.18  98.75  
st70  675  435.61  129.66  335.66  331.78  329.34  
Avg  253.86  77.83  173.81  168.69  163.28 

Optimal  ARPD  
UB  VLM  MOMC  MOC  
had12  1,652  11.26  7.99  8.51  6.22  
had14  2,724  15.56  9.13  9.48  6.11  
had16  3,720  14.02  8.19  8.19  5.12  
had18  5,358  11.80  7.07  7.31  6.03  
had20  6,922  12.57  7.32  7.33  6.43  
rou12  235,528  28.30  18.94  16.50  10.02  
rou15  354,210  33.98  21.02  20.16  14.57  
rou20  725,522  25.19  17.80  17.36  13.05  
tai40a  3,139,370  20.96  16.00  15.97  12.54  
tai40b  637,250,948  79.65  50.85  49.94  12.10  
Avg  25.33  16.43  16.07  9.22 

Optimal  ARPD  

UB  MQC  VLM  MOMC  MOC  
bayg29  1,610  193.29  57.27  127.64  122.24  122.83  
bays29  2,020  196.84  57.57  132.84  124.92  111.88  
berlin52  7,542  300.65  103.48  218.34  215.59  214.92  
brazil58  25,395  375.04  139.30  285.40  273.91  265.80  
dantzig42  699  333.03  100.39  238.24  227.12  230.27  
fri26  937  180.85  62.51  124.09  114.74  11.27  
gr17  2,085  122.95  30.19  70.65  64.84  60.17  
gr21  2,707  178.13  52.73  115.05  107.30  99.54  
gr24  1,272  179.79  56.82  116.19  106.69  103.71  
st70  675  452.83  126.34  337.61  334.76  325.69  
Avg  251.34  78.66  176.61  169.21  164.56 
In Table 5, the DA presents the best average ARPD on QAP instances when the method of setting penalty weight is set to MOC. The DA however presents the best average ARPD on TSP instances when MQC is used. These results are expected since the MOC and MQC respective present the smallest yet valid penalty weights for QAP and TSP instances. ARPD for QAP instance rou12, is shown in italics when the method of setting penalty weight is set to MOC because the ARPD is only computed using the 13 feasible solutions found while others are generated using 20 feasible solutions. The DA presents the worst average ARPD on QAP (or TSP) instances when the method of setting penalty weights is set to VLM (or UB).
In Tables 6 and 7, similar to the results produced in Table 5, the DA presents the best average ARPD on QAP instances when the methods of setting penalty weight is set to MOC and the best average ARPD on TSP instances when set to MQC. ARPD for QAP instance rou12, is shown in italics when the method of setting penalty weight is set to MOC because the ARPD is only computed using the 14 feasible solutions found while others are generated using 20 feasible solutions. The DA presents the worst average ARPD on QAP and TSP instances when the method of setting penalty weights is set to UB.
In general, the results show that the methods which produced the smallest valid penalty weights (MQC for TSP and MOC for QAP) consistently produced the best ARPD. Conversely, the results also show that the method that produced the largest penalty weights on TSP and QAP instances (UB) often presents the worst ARPD. For permutation problems represented as twoway onehot, all neighbours of a feasible solution are infeasible solutions. It is therefore important for penalty weights to be small enough to encourage the algorithm to explore infeasible solutions in order to find better feasible solutions.
6.3 DA3: Comparing methods of generating penalty weights
In section 6.2, we show how different methods of generating penalty weights can affect the performance of the 1st generation DA (CPU implementation). In this section, we present results using DA3 (i.e. version of the third generation DA which benefits from hardware speedup). It should be noted that the DA3 has more capabilities and can be executed in many modes. A major improvement presented by the DA3 is the ability to handle linear inequality and onehot constraints.

Optimal  ARPD  

UB  VLM  MOMC  MOC  
had12  1,652  0.00  0.00  0.00  0.00  
had14  2,724  0.00  0.00  0.00  0.00  
had16  3,720  0.00  0.00  0.00  0.00  
had18  5,358  0.00  0.00  0.00  0.00  
had20  6,922  0.00  0.00  0.00  0.00  
rou12  235,528  0.00  0.00  0.00  0.00  
rou15  354,210  0.00  0.00  0.00  0.00  
rou20  725,522  0.00  0.00  0.00  0.00  
tai40a  3,139,370  0.07  0.07  0.07  0.07  
tai40b  637,250,948  0.00  0.00  0.00  0.00  
Average  0.01  0.01  0.01  0.01 

Optimal  ARPD  

UB  MQC  VLM  MOMC  MOC  
bayg29  1,610  0.00  0.00  0.00  0.00  0.00  
bays29  2,020  0.00  0.00  0.00  0.00  0.00  
berlin52  7,542  1.86  2.96  4.16  6.05  2.20  
brazil58  25,395  1.53  3.33  1.53  1.58  1.43  
dantzig42  699  0.00  0.00  0.00  0.00  0.00  
fri26  937  0.00  0.00  0.00  0.00  0.00  
gr17  2,085  0.00  0.00  0.00  0.00  0.00  
gr21  2,707  0.00  0.00  0.00  0.00  0.00  
gr24  1,272  0.00  0.00  0.00  0.00  0.00  
st70  675  2.67  1.48  2.41  2.37  2.39  
Average  0.61  0.78  0.81  1.00  0.60 
We present the ARPD achieved within seconds (where represents the size of the solution) of executing DA3 with different methods of generating penalty weights in Table 8
. DA3 achieves 100% feasibility on TSP instances with any of the penalty methods, it also achieves 100% feasibility on QAP when UB, VLM, MOMC and MOC methods are used. Furthermore, the standard deviation across 20 runs of the DA3 is often 0. For all QAP instances as well as 7 out of 10 TSP instances, DA3 obtains the same ARPD regardless of the penalty weights. DA3 presents varying ARPD on the largest TSP instances when different methods of generating penalty weights are used. Best ARPD was obtained for berlin52, brazil58 or st70 when UB, MOC or MQC is used respectively. There is therefore no clear evidence of one method being the best. We can therefore not make the same conclusions as the previous section, where there was clear evidence of smaller and valid penalty weights leading to better solution quality. Similarities in performance of DA3 regardless of penalty weights used may be because of the algorithmic changes made since 1st generation DA. DA3 is designed to solve problems formulated as twoway onehot more efficiently
da3. It is also able to automatically find the best parameter settings for any BQP. The algorithm that supports DA3 is however not publicly available, it is therefore difficult to be precise about the reason for the difference in performance when compared to the first generation DA.7 Conclusion and Further Work
Permutation problems like TSP and QAP can be formulated as QUBO such that algorithms that use specialised hardware e.g. Quantum Annealer or DA can solve them. Transforming these problems into QUBO form requires the setting of penalty weights. In this study, we examined different methods of generating penalty weights within the context of using the DA algorithm for solving permutation problems. The permutation problems used are TSP and QAP. We present improvements to existing methods of generating penalty weights leading to better performance of the DA. Although the DA algorithm, which shares similar properties with SA, was influenced by the magnitude of penalty weights, we could not reach the same conclusions with DA3. It was impossible to do deeper analysis because the DA3 algorithm is not publicly available. Further research into how various algorithms behave with different mechanisms of generating penalty weights is therefore necessary.