
# Building separable approximations for quantum states via neural networks

Finding the closest separable state to a given target state is a notoriously difficult task, even more difficult than deciding whether a state is entangled or separable. To tackle this task, we parametrize separable states with a neural network and train it to minimize the distance to a given target state, with respect to a differentiable distance, such as the trace distance or Hilbert-Schmidt distance. By examining the output of the algorithm, we can deduce whether the target state is entangled or not, and construct an approximation for its closest separable state. We benchmark the method on a variety of well-known classes of bipartite states and find excellent agreement, even up to local dimension of d=10. Moreover, we show our method to be efficient in the multipartite case, considering different notions of separability. Examining three and four-party GHZ and W states we recover known bounds and obtain novel ones, for instance for triseparability. Finally, we show how to use the neural network's results to gain analytic insight.


## I Introduction

Entanglement is now considered a defining feature of quantum theory, with broad implications in modern physics, from quantum information processing to many-body physics.

The detection and characterisation of entanglement is, however, a notoriously challenging problem Horodecki et al. (2009); Gühne and Toth (2009). First of all, it is known that the problem of determining whether a given density matrix is entangled or separable is NP-hard Gurvits (2003); Gharibian (2010). There exist, however, general methods for detecting entanglement, notably the celebrated negativity under partial transposition (NPT) criterion, which ensures that the considered density matrix is entangled Peres (1996); Horodecki et al. (1996). The converse, however, does not hold, as there exist entangled states which are positive under partial transposition, so-called bound (or PPT) entanglement Horodecki et al. (1998). Other techniques have been developed, yet all of them are only useful in specific cases. Moving beyond the bipartite case, the certification of multipartite entanglement, of which there exists a zoology of different forms, is even more challenging and less understood.

Beyond the question of determining whether a given quantum state is entangled or not, one may consider the problem of approximating a given target state via a separable one. More precisely, if the target state is separable, the task is to provide an explicit (separable) decomposition for the density matrix. If, on the other hand, the state is entangled, the task is to construct a separable state that minimizes a certain distance (in the Hilbert space) with respect to the target state.

This question has been addressed indirectly in the studies of entanglement measures based on the distance from the set of separable states Horodecki et al. (2009); Vedral et al. (1997), and is particularly relevant when constructing entanglement witnesses Pittenger and Rubin (2002); Bertlmann et al. (2002); Pittenger and Rubin (2003); Bertlmann et al. (2005); Bertlmann and Krammer (2008). Additionally, finding the closest separable state has been studied directly, but this task is difficult even for two-qubit systems Kim et al. (2010). For a very specific notion of distance, it has also been studied directly through the concept of “best separable approximation” of a quantum state Lewenstein and Sanpera (1998). The construction of separable approximations for multipartite states is largely unexplored, except for specific families of states, which typically have a high level of symmetry Ishizaka (2002); Hayashi et al. (2008, 2009); Hübener et al. (2009); Parashar and Rana (2011); Carrington et al. (2015); Quesada and Sanpera (2014); Akulin et al. (2015); Rodriques et al. (2014).

In the present work, we attack these questions using tools from machine learning. Specifically, we devise neural networks for constructing a separable approximation, given a target density matrix. We define a notion of “closest separable state”, which represents the separable state minimizing a given distance with respect to the target; note that this does not coincide with the best separable approximation in general. We benchmark our method with two distance measures, the trace distance and Hilbert-Schmidt distance, on several examples, including a bipartite entangled state of local dimension up to $d=10$. We also demonstrate the potential of our method in the multipartite case, where we construct multi-separable decompositions for several classes of entangled states (noisy GHZ and W states) up to four qubits. In particular, we obtain tighter bounds or establish new estimates on multi-separability for several classes of states. We conclude with a number of open questions and directions for future research. Finally, in the appendices we study the output of the neural network in order to gain analytic insight into the closest separable state to a Bell state, as well as random two-qubit states. From the intuition gained we create ansätze for closest separable states for both cases, and derive an exact bound for the two-qubit generic case.

## II Related work

Previous work on using machine learning for the separability problem has focused either on having the machine choose good measurements and then using an existing entanglement criterion Wang (2017); Yosefpor et al. (2020), or on viewing the task as a classification problem Lu et al. (2018); Gao et al. (2018); Ma and Yung (2018); Gray et al. (2018); Yang et al. (2019); Goes et al. (2021); Ren and Chen (2019). For classification, typically a training set is constructed where quantum states are labeled as separable or entangled. The machine learns on this training set and, given a new example, predicts whether it is entangled or separable. There are several difficulties with this approach. First, the machine just gives a guess of whether the state is entangled or separable, and does not provide any kind of certificate. Second, the training data can only be generated in a regime where we already understand the problem well, which results in the machine giving only marginal new insight at best. This could be circumvented by using suboptimal criteria (e.g. PPT) in order to create the training data; however, the machine would then just learn this criterion instead of correctly identifying the entanglement/separability boundary.

We overcome these challenges by using a generative model, which tries to give an explicit separable decomposition of a target state. This way we immediately get a certified upper bound on the distance from the separable states. A similar approach has been taken in Refs. Harney et al. (2020, 2021), where the authors represent the quantum states with “quantum neural network states” Carleo and Troyer (2017); Melko et al. (2019), and their extension to density matrices Yoshioka and Hamazaki (2019); Hartmann and Carleo (2019); Nagy and Savona (2019); Vicentini et al. (2019), as opposed to the dense representation we utilise. Their results show a more limited flexibility in the loss function and in the design of types of separable states. One such family of separable states that is not examined is the very challenging, yet interesting question of multipartite bi- and triseparability (relevant for 3-party and 4-party genuine multipartite entanglement), which we address here.

## III Preliminaries

In this section we first introduce the notions of separability for bipartite and multipartite systems and then define the closest separable state. Finally we introduce the basic concepts of neural networks. For more detailed introductions on separability and entanglement or on neural networks, we refer the interested reader to Ref. Horodecki et al. (2009) and Goodfellow et al. (2016), respectively.

A quantum state $\rho$ acting on $\mathcal{H}_1\otimes\mathcal{H}_2$, shared between two parties, is said to be separable if it can be constructed as a convex combination of some local quantum density matrices $\rho_1^k$ acting on $\mathcal{H}_1$, and $\rho_2^k$ acting on $\mathcal{H}_2$, as

$$\rho=\sum_{k=1}^{K}p_k\,\rho_1^k\otimes\rho_2^k, \tag{1}$$

with $p_k$ a normalized discrete probability distribution. Any state which is not separable is entangled. For finite dimensional systems, i.e. where $\dim(\mathcal{H}_i)=d_i<\infty$ for $i\in\{1,2\}$, the local states of the decomposition, $\rho_i^k$, can be taken to be pure. Due to Caratheodory’s theorem, the number of terms required in the sum, $K$, is upper bounded by $(d_1 d_2)^2$.
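The decomposition in Eq. (1) can be made concrete with a short NumPy sketch. It draws random pure local states and random weights and sums the tensor products; the function names, the random sampling, and the choice of $K$ are illustrative assumptions, not part of the method described in the text.

```python
import numpy as np

def random_pure_state(dim, rng):
    """Return the projector |psi><psi| of a random normalized state vector."""
    psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    psi /= np.linalg.norm(psi)
    return np.outer(psi, psi.conj())

def random_separable_state(d1, d2, K, rng):
    """Build rho = sum_k p_k rho_1^k (x) rho_2^k, as in Eq. (1)."""
    p = rng.random(K)
    p /= p.sum()  # normalized discrete probability distribution
    rho = np.zeros((d1 * d2, d1 * d2), dtype=complex)
    for pk in p:
        rho += pk * np.kron(random_pure_state(d1, rng), random_pure_state(d2, rng))
    return rho

rng = np.random.default_rng(0)
rho = random_separable_state(2, 2, K=16, rng=rng)  # K = (d1*d2)^2 for two qubits
print(np.trace(rho).real)
```

Any state built this way is separable by construction, which is what makes an explicit decomposition usable as a certificate.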

For a multipartite system of $n$ parties several notions of separability exist. The straightforward generalization of Eq. (1) results in the notion of a fully separable decomposition,

$$\rho=\sum_{k=1}^{K}p_k\,\rho_1^k\otimes\rho_2^k\otimes\cdots\otimes\rho_n^k. \tag{2}$$

Naturally, one can also just examine bipartite separability on the multipartite system by grouping the parties together. This leads to the notion of biseparability with respect to the partition $I|\bar{I}$,

$$\rho=\sum_{k=1}^{K}p_k\,\rho_{I}^k\otimes\rho_{\bar{I}}^k, \tag{3}$$

where $I$ denotes a subset of the indices $\{1,\dots,n\}$ and $\bar{I}$ denotes its complement. A multipartite state is called biseparable if it can be decomposed as a convex mixture of states that are separable considering all possible bipartitions, namely

$$\rho=\sum_{k=1}^{K}p_k\,\rho_{I_k}^k\otimes\rho_{\bar{I}_k}^k, \tag{4}$$

where crucially, now each $I_k$ can be different.

There are many ways to quantify the entanglement of a target state $\rho_T$, among which a particularly useful one is based on the distance of a state from the set of separable states. Any distance measure $D(\sigma_1,\sigma_2)$ between quantum states $\sigma_1,\sigma_2$, which is zero if and only if $\sigma_1=\sigma_2$, and for which $D(\Lambda(\sigma_1),\Lambda(\sigma_2))\le D(\sigma_1,\sigma_2)$ for any completely positive trace preserving map $\Lambda$, can be used to construct an entanglement measure, by minimizing over separable states Horodecki et al. (2009); Vedral et al. (1997). (We use the term distance, in line with the literature; note, however, that $D$ need not be a metric, and is thus more closely related to the notion of a divergence.) We will use the neural network to find the closest separable state with respect to a distance $D$, formally

$$\rho_{CSS}:=\underset{\rho_{Sep}}{\arg\min}\; D(\rho_T,\rho_{Sep}), \tag{5}$$

where $\rho_{Sep}$ is a separable state. Note that the closest separable state is not necessarily unique. For the neural network method presented in this paper, any $D$ which is differentiable with respect to one of the states can be used. We choose to work with two distances; the first is the trace distance (related to the Schatten 1-norm) Eisert et al. (2003),

$$D_{Tr}(\sigma_1,\sigma_2)=\frac{1}{2}\mathrm{Tr}\sqrt{(\sigma_1-\sigma_2)^2}=\frac{1}{2}\sum_i|\mu_i|, \tag{6}$$

where $\mu_i$ are the eigenvalues of $\sigma_1-\sigma_2$. Note that the trace-distance-based measure can be useful in quantum hypothesis testing, and, among other measures, is an important measure in the study of closest classical states, which is distinct from the closest separable state Aaronson et al. (2013); Paula et al. (2013); Nakano et al. (2013); Modi et al. (2010); Bellomo et al. (2012). We will not examine closest classical states in this work, but note that our methods can easily be adapted for their study.

The second distance we consider is the Hilbert-Schmidt distance (related to the Schatten 2-norm) Vedral and Plenio (1998); Witte and Trucks (1999); Krammer (2009)

$$D_{HS}(\sigma_1,\sigma_2)=\sqrt{\mathrm{Tr}\left[(\sigma_1-\sigma_2)^2\right]}. \tag{7}$$

The Hilbert–Schmidt-based measure can be useful for constructing entanglement witnesses Pittenger and Rubin (2002); Bertlmann et al. (2002); Pittenger and Rubin (2003); Bertlmann et al. (2005); Bertlmann and Krammer (2008). Both the trace distance and Hilbert-Schmidt distance can be used as a basis for an entanglement measure, however, one could consider others, such as the Bures distance Vedral and Plenio (1998), relative entropy of entanglement Vedral et al. (1997) or the robustness of entanglement Vidal and Tarrach (1999); see e.g. Ref. Zyczkowski and Bengtsson (2006) for an overview and other examples of geometric measures of entanglement.
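As a quick illustration of Eqs. (6) and (7), the two distances can be evaluated numerically from the difference of two density matrices. The sketch below uses NumPy and assumes Hermitian inputs; the function names are our own.

```python
import numpy as np

def trace_distance(sigma1, sigma2):
    """Eq. (6): half the sum of absolute eigenvalues of sigma1 - sigma2."""
    mu = np.linalg.eigvalsh(sigma1 - sigma2)  # difference of Hermitian matrices is Hermitian
    return 0.5 * np.sum(np.abs(mu))

def hilbert_schmidt_distance(sigma1, sigma2):
    """Eq. (7): square root of Tr[(sigma1 - sigma2)^2]."""
    delta = sigma1 - sigma2
    return np.sqrt(np.real(np.trace(delta @ delta)))

zero = np.diag([1.0, 0.0])   # |0><0|
mixed = np.eye(2) / 2        # maximally mixed qubit
print(trace_distance(zero, mixed), hilbert_schmidt_distance(zero, mixed))
```

For the qubit example, the difference is diag(1/2, -1/2), so the trace distance is 1/2 and the Hilbert-Schmidt distance is $1/\sqrt{2}$.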

Let us now concisely introduce the concept of an artificial neural network Goodfellow et al. (2016), the basis of our numerical representation of separable states. A neural network is a numeric model which can in principle represent any multivariate function. A crucial point is to be able to adjust the parameters of the neural network in order to represent the desired function; in many use-cases, however, this can be done surprisingly efficiently with the techniques of deep learning.

In this work we will be using one of the simplest types of neural networks, the so-called multilayer perceptron. It is characterized by the number of neurons per layer (width), the number of layers (depth), and the activation functions used at the neurons. Altogether these model an iterative sequence of parametrized affine and fixed nonlinear transformations on the input; namely, the map from layer $l$ to layer $l+1$ is

$$r_{l+1}=h(W_l r_l+b_l), \tag{8}$$

where the weight matrix $W_l$ and bias vector $b_l$ parametrize the affine transformation, $h$ is a fixed differentiable nonlinear function (activation function), and $r_l$ is the input of layer $l$, whose length signifies the width (number of “neurons”) of layer $l$. The input of the first layer (the output of the last layer) is the input (output) of the whole model. At initialization, the weights and biases of all layers are set randomly. During training, the parameters of the model (the weights $W_l$ and biases $b_l$) are updated such that they minimize a differentiable loss function of the training set, which, as we will see later, in our case will be the trace or Hilbert–Schmidt distance. This is done by first evaluating the model for a batch of inputs, and then by slightly updating the parameters via a method called backpropagation, which relies on the gradient of the loss function with respect to the model parameters. This is repeated for many batches, until the model converges, a maximum training time is reached, or a satisfactory loss is achieved. Once trained, the neural network can be evaluated on new input instances.
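The layer map of Eq. (8) amounts to a few lines of code. The following NumPy sketch shows only the forward pass of a multilayer perceptron; the layer widths, the tanh activation, and the initialization scale are assumptions chosen for illustration, and training by backpropagation is left to a deep learning framework.

```python
import numpy as np

def init_mlp(widths, rng):
    """Random weights W_l and biases b_l for consecutive layer widths."""
    return [
        (rng.normal(scale=1 / np.sqrt(m), size=(n, m)), rng.normal(scale=0.1, size=n))
        for m, n in zip(widths[:-1], widths[1:])
    ]

def forward(params, r, h=np.tanh):
    """Apply r_{l+1} = h(W_l r_l + b_l), Eq. (8), layer by layer."""
    for W, b in params:
        r = h(W @ r + b)
    return r

rng = np.random.default_rng(1)
params = init_mlp([4, 16, 9], rng)   # input width 4, one hidden layer, output width 9
one_hot = np.eye(4)[2]               # one-hot encoding of the integer input 2
out = forward(params, one_hot)
print(out.shape)
```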

## IV Neural networks as separable states

The task is to find the closest separable state to a given target density matrix. The central idea of this work is to use a neural network as a variational ansatz for the density matrix by representing the local components of the separable decomposition with a single neural network. The approach is inspired by a similar approach taken for nonlocality, where neural networks represent the local components of a Bell-local behavior Kriváchy et al. (2020).

In order to demonstrate the method, let us examine the example of a bipartite 2-qubit state. We ask a neural network to represent the map

$$k\to(p_k,\rho_1^k,\rho_2^k), \tag{9}$$

where we take $\rho_1^k$ and $\rho_2^k$ to be pure states. That is, the neural network takes as input an integer value $k$ between $1$ and $K$ (in a one-hot representation), and outputs $p_k$ together with the complex amplitudes of the two pure states, such that normalization for each subsystem is satisfied. Note that for each complex number, two real numbers are output, the real and imaginary part. We evaluate the neural network for $K$ values of $k$, normalize the probability vector and sum up the outputs in order to construct a separable state via Eq. (1), namely $\rho_{Sep}=\sum_{k=1}^{K}p_k\,\rho_1^k\otimes\rho_2^k$. The neural network is trained to minimize the distance between the target density matrix $\rho_T$ and the constructed separable density matrix $\rho_{Sep}$, i.e. $D(\rho_T,\rho_{Sep})$. The process is roughly illustrated in Fig. 1, where the $p_k$ are not shown explicitly.

By construction the neural network represents a single density matrix $\rho_{Sep}$, so for each target state $\rho_T$, the network must be retrained in order to obtain an approximation of the closest separable state to that target state. During training, requiring $K$ values of $k$ in order to evaluate the state technically means working with a batch size of $K$. That is, we evaluate $K$ inputs ($k=1,\dots,K$) in order to construct $\rho_{Sep}$ and only then calculate the gradients required for the optimization of the neural network. A crucial point of the method is that the size of the neural network depends only on the number of parties and the local Hilbert space dimensions, and not on the number of elements in the decomposition, $K$.
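The step from raw network outputs to a loss value can be sketched as follows. The layout of the output vector, the way normalization is enforced (absolute values for the weights, division by the norm for the amplitudes), and the use of random numbers in place of an actual trained network are all assumptions made for illustration.

```python
import numpy as np

def outputs_to_state(raw, d1, d2):
    """Turn K rows of real outputs into rho_Sep via Eq. (1).

    Assumed row layout: [p_k, Re(psi_1), Im(psi_1), Re(psi_2), Im(psi_2)].
    """
    p = np.abs(raw[:, 0])
    p = p / p.sum()                                   # normalize the probability vector
    psi1 = raw[:, 1:1 + d1] + 1j * raw[:, 1 + d1:1 + 2 * d1]
    off = 1 + 2 * d1
    psi2 = raw[:, off:off + d2] + 1j * raw[:, off + d2:off + 2 * d2]
    psi1 = psi1 / np.linalg.norm(psi1, axis=1, keepdims=True)
    psi2 = psi2 / np.linalg.norm(psi2, axis=1, keepdims=True)
    # Product vectors |psi_1^k>|psi_2^k>, one per row, shape (K, d1*d2).
    prod = np.einsum("ki,kj->kij", psi1, psi2).reshape(len(p), d1 * d2)
    return np.einsum("k,ka,kb->ab", p, prod, prod.conj())

def trace_distance(s1, s2):
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(s1 - s2)))

K = 16
rng = np.random.default_rng(2)
raw = rng.normal(size=(K, 1 + 2 * 2 + 2 * 2))         # stand-in for K network evaluations
rho_sep = outputs_to_state(raw, 2, 2)
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho_target = np.outer(phi, phi)                       # two-qubit Bell state as target
loss = trace_distance(rho_target, rho_sep)
print(loss)
```

In an actual training loop the same construction would be written in an automatic-differentiation framework so that the loss can be backpropagated through the batch of $K$ evaluations.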

More generally, for more parties or higher dimensions, the neural network represents the map

$$k\to(p_k,\rho_1^k,\rho_2^k,\dots,\rho_n^k), \tag{10}$$

where we take the $\rho_i^k$ ($i=1,\dots,n$) to be pure, and the neural network explicitly outputs the parameters of the pure states. By evaluating this neural network for $K$ values of $k$, we construct a separable state via either Eq. (1) for the bipartite case ($n=2$), or any of Eqs. (2,3,4) for the different notions of multipartite separability. Recall that by Caratheodory’s theorem, in principle the largest $K$ needed is $\left(\prod_i d_i\right)^2$; however, even fewer terms could be sufficient. Thus we keep $K$ as a free hyperparameter, which we set before training begins. More technical details on the neural networks we used can be found in App. D or in the sample code provided in the Code Availability section.

The neural network is optimized in the high-dimensional non-convex landscape of the network’s weights, so it is not guaranteed to converge to the optimal solution. However, in practice, optimization procedures based on gradient descent reach close-to-optimal solutions efficiently. Notice that even for suboptimal solutions we obtain an upper bound on the amount of entanglement of the target state, since the utilized distances serve as entanglement measures Vedral et al. (1997); Vedral and Plenio (1998); Horodecki et al. (2000). However, we can go one step further, and examine families of states parametrized by a single parameter, which we refer to as $q$, typically of the form

$$\rho_T(q)=q\,\rho_{ent}+(1-q)\,\rho_{sep}, \tag{11}$$

where $\rho_{ent}$ is an entangled state and $\rho_{sep}$ is a separable state, oftentimes the maximally mixed state. If $\rho_{ent}$ is truly entangled, then when decreasing $q$, at some threshold value we will cross the separability boundary. We can observe this transition by varying $q$ and retraining the neural network from scratch for each target state. An approximation of the threshold becomes clear from how close the algorithm can get to the target states for different $q$ values.

## V Results

In order to benchmark the method, we first use the algorithm to examine the separability boundary for some exemplary families of bipartite states, including an example where the partial transpose criterion is inconclusive. Then, we examine some multipartite cases, up to 4 parties, where many things are still unknown about the separability boundary even for quintessential cases, such as GHZ and W states mixed with white noise. We provide numeric estimates on these thresholds. Additionally, in Appendix A we compare our neural network algorithm to a naive gradient-descent based heuristic to show its advantage. In Appendix B, we provide analytic guesses of the closest separable state to the Bell state based on the numerical results, and in Appendix C we examine the performance of the algorithm on random bipartite density matrices and conjecture an analytic ansatz of the closest separable state for 2-qubit states, which we find to be very close in trace distance to the solutions found by the neural network, and prove a bound on the trace distance. The examples presented in the appendix are meant to serve as inspiration in how numeric techniques can aid analytic insight, particularly when constructing ansätze and conjectures.

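Werner states and isotropic states are highly symmetric bipartite states that are separable if and only if they have a positive partial transpose. They are defined for local systems of the same dimension, $d$. Isotropic states are

$$\rho_{iso}(q)=\frac{1-q}{d^2}\,\mathbb{I}_{12}+q\,|\phi^+\rangle\langle\phi^+|, \tag{12}$$

where $\mathbb{I}_{12}$ is the identity operator on the joint space and $|\phi^+\rangle$ is the canonical maximally entangled state

$$|\phi^+\rangle=\frac{1}{\sqrt{d}}\sum_{i=1}^{d}|i\rangle_1|i\rangle_2, \tag{13}$$

where $\{|i\rangle_1\}$ and $\{|i\rangle_2\}$ are bases of $\mathcal{H}_1$ and $\mathcal{H}_2$, respectively.

Werner states are defined as

$$\rho_{Werner}(q)=(1-q)\frac{2}{d(d+1)}P_{sym}+q\,\frac{2}{d(d-1)}P_{as}, \tag{14}$$

where

$$P_{sym}=\frac{1}{2}(\mathbb{I}_{12}+F_{12}),\qquad P_{as}=\frac{1}{2}(\mathbb{I}_{12}-F_{12}),$$

with $F_{12}$ the flip operator. Isotropic states are separable for $q\le\frac{1}{d+1}$, while Werner states are separable for $q\le\frac{1}{2}$.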
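Since both families are separable exactly when their partial transpose is positive, the known thresholds can be cross-checked numerically. The sketch below builds the states of Eqs. (12) and (14) with NumPy and evaluates the smallest eigenvalue of the partial transpose, which changes sign at $q=1/(d+1)$ for isotropic states and at $q=1/2$ for Werner states. Function names and the choice $d=3$ are ours.

```python
import numpy as np

def isotropic_state(d, q):
    """Eq. (12), with |phi+> = sum_i |ii> / sqrt(d)."""
    phi = np.eye(d).reshape(d * d) / np.sqrt(d)
    return (1 - q) * np.eye(d * d) / d**2 + q * np.outer(phi, phi)

def flip_operator(d):
    """F |i>|j> = |j>|i>."""
    F = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            F[i * d + j, j * d + i] = 1.0
    return F

def werner_state(d, q):
    """Eq. (14), a mixture of normalized symmetric and antisymmetric projectors."""
    I, F = np.eye(d * d), flip_operator(d)
    P_sym, P_as = (I + F) / 2, (I - F) / 2
    return (1 - q) * 2 / (d * (d + 1)) * P_sym + q * 2 / (d * (d - 1)) * P_as

def min_pt_eigenvalue(rho, d1, d2):
    """Smallest eigenvalue of the partial transpose on the second subsystem."""
    pt = rho.reshape(d1, d2, d1, d2).transpose(0, 3, 2, 1).reshape(d1 * d2, d1 * d2)
    return np.linalg.eigvalsh(pt).min()

d = 3
for q in (0.2, 1 / (d + 1), 0.3):
    print("iso   ", round(q, 3), min_pt_eigenvalue(isotropic_state(d, q), d, d))
for q in (0.4, 0.5, 0.6):
    print("werner", round(q, 3), min_pt_eigenvalue(werner_state(d, q), d, d))
```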

For both the isotropic and Werner states, we run the neural network independently for 11 values of $q$, and additionally for the exact separability boundary value. The results for both the trace distance and Hilbert-Schmidt distance for are depicted in Fig. 2 (each line is plotted with its respective loss function, the trace or Hilbert-Schmidt distance). They confirm that the algorithm works properly in this regime, finding a sharp transition at the known separability thresholds. When making a linear fit to the data that is outside the seemingly flat separable region, we recover the thresholds with a precision of at least . To give an example of the running time on a personal computer (timed with an Intel i7-8700k CPU @ 3.70 GHz with 6 cores (12 threads) and 16 GB RAM), for isotropic states the training for a single target state for took at most 15 minutes, while for it took at most 30 seconds. When the trace distance is found to be smaller than , we choose to stop the training, and conclude that the state is separable. Otherwise we run the algorithm until the resulting trace distance converges, i.e. it doesn’t change by more than in one epoch.

Additionally, for $d=10$ we examine the Werner states, also plotted in Fig. 2. For such a large state, with $K=100$, training took about 1 hour 15 minutes on a personal computer for a single epoch (3000 batches), which was reduced to 45 minutes when training on a GPU (an RTX-3080 with 10 GB memory). Due to the increased runtime we only ran one epoch for each point in Fig. 2, and did not wait until convergence. We observe that the neural network struggles more in finding a closest separable state in the separable area; however, it works remarkably well in the entangled regime, and still manages to give qualitatively interpretable results on where the entanglement boundary lies. For increased accuracy one could run the algorithm several times independently and take the smallest value for each $q$, or one could run the algorithm with a larger batch size $K$. For example, for the separability boundary at $q=1/2$, by using a larger $K$ instead of 100, after 5 epochs (5 times 3000 batches), the trace distance reduced to 0.024 from the 0.045 seen in Fig. 2.

Before moving on to the multipartite setting, we consider another family of states from the bipartite scenario, introduced in Ref. Horodecki et al. (1999), however, we adopt the parametrization used in Ref. Mintert et al. (2005). This family of 2-qutrit states exhibits bound entanglement, i.e. a PPT entangled region. The states are

$$\rho_q=\frac{1}{21}\begin{pmatrix}
2&0&0&0&2&0&0&0&2\\
0&\beta_-&0&0&0&0&0&0&0\\
0&0&\beta_+&0&0&0&0&0&0\\
0&0&0&\beta_+&0&0&0&0&0\\
2&0&0&0&2&0&0&0&2\\
0&0&0&0&0&\beta_-&0&0&0\\
0&0&0&0&0&0&\beta_-&0&0\\
0&0&0&0&0&0&0&\beta_+&0\\
2&0&0&0&2&0&0&0&2
\end{pmatrix}, \tag{15}$$

with , and , however, we only consider $q\ge 0$ since the negative regime gives the same states up to permutations. It is known that $\rho_q$ is separable for , is PPT entangled for and is NPT entangled for . For several values of $q$ we train the neural network to approximate $\rho_q$, and display the results in Fig. 2. We can see that by explicitly constructing the separable decomposition, our results are not sensitive to whether the partial transpose is positive or negative, and the neural network approach successfully identifies the separable and entangled regions.

We now consider three and four qubit multipartite states by examining the exemplary GHZ and W states, mixed with white noise. The GHZ state is

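$$|GHZ\rangle=\frac{1}{\sqrt{2}}\left(|00\dots0\rangle+|11\dots1\rangle\right), \tag{16}$$

while the W state is

$$|W\rangle=\frac{1}{\sqrt{n}}\left(|0\dots01\rangle+|0\dots10\rangle+\cdots+|10\dots0\rangle\right). \tag{17}$$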
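For concreteness, the states of Eqs. (16) and (17) and their mixtures with white noise, in analogy with Eq. (12), can be generated as below. This is a minimal NumPy sketch; the function names are ours and the noise parameter `q` is the weight of the pure state.

```python
import numpy as np

def ghz_state(n):
    """Eq. (16): (|0...0> + |1...1>) / sqrt(2) on n qubits."""
    psi = np.zeros(2**n)
    psi[0] = psi[-1] = 1 / np.sqrt(2)
    return psi

def w_state(n):
    """Eq. (17): equal superposition of all bit strings with a single 1."""
    psi = np.zeros(2**n)
    for i in range(n):
        psi[1 << i] = 1 / np.sqrt(n)
    return psi

def with_white_noise(psi, q):
    """q |psi><psi| + (1 - q) * identity / dim."""
    dim = len(psi)
    return q * np.outer(psi, psi.conj()) + (1 - q) * np.eye(dim) / dim

rho_ghz = with_white_noise(ghz_state(3), 0.5)
rho_w = with_white_noise(w_state(4), 0.5)
print(rho_ghz.shape, rho_w.shape)
```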

We mix both with the maximally mixed state as we did for the isotropic states in Eq. (12). For three qubits, we use the neural network to distinctly examine

1. full separability, as in Eq. (2) (1|2|3),

2. biseparability with respect to a single partition (1|23), as in Eq. (3),

3. biseparability, as in Eq. (4),

and for the four qubits,

1. full separability, as in Eq. (2) (1|2|3|4),

2. biseparability with respect to the partition (1|234), as in Eq. (3),

3. biseparability with respect to the partition (12|34), as in Eq. (3),

4. biseparability with respect to 2 vs. 2 partitions, i.e. as in Eq. (4), except all partitions are constrained to have 2 parties,

5. biseparability with respect to 1 vs. 3 partitions, i.e. as in Eq. (4), except all partitions are constrained to have 1 party (and thus the complements have 3 parties),

6. triseparability, as a generalization of Eq. (4), namely $\rho=\sum_{k=1}^{K}p_k\,\rho_{I_k}^k\otimes\rho_{J_k}^k\otimes\rho_{L_k}^k$, with $I_k|J_k|L_k$ a partitioning of $\{1,2,3,4\}$ into three nonempty subsets for each $k$,

7. biseparability, as in Eq. (4).

For the biseparable and triseparable case, on a technical level, for each $k$ we ask the neural network to output density matrices for all possible partitions, i.e. for each $k$ it actually outputs 3 terms at a time for the 3-party case, and 6 terms for the 4-party case.
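The partitions that enter these notions of separability can be enumerated explicitly. The following sketch, written with the standard library only, lists the bipartitions of $n$ parties and the partitions of four parties into three groups; the helper names are hypothetical. It yields 3 bipartitions for three parties, 7 bipartitions for four parties (four of type 1 vs. 3 and three of type 2 vs. 2), and 6 tripartitions for four parties.

```python
from itertools import combinations

def bipartitions(n):
    """All splits of parties 1..n into two nonempty groups (unordered)."""
    parties = tuple(range(1, n + 1))
    out = []
    for size in range(1, n):
        for subset in combinations(parties, size):
            if 1 in subset:  # keep party 1 in the first group to avoid double counting
                rest = tuple(p for p in parties if p not in subset)
                out.append((subset, rest))
    return out

def tripartitions_of_four():
    """All splits of parties 1..4 into three nonempty groups: one pair, two singletons."""
    parties = (1, 2, 3, 4)
    out = []
    for pair in combinations(parties, 2):
        rest = tuple(p for p in parties if p not in pair)
        out.append((pair, (rest[0],), (rest[1],)))
    return out

print(len(bipartitions(3)), len(bipartitions(4)), len(tripartitions_of_four()))
```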

We present the results in Fig. 3, except for 4-qubit separability with respect to a fixed partition, in order to not overcrowd the figure, however, note that those results are qualitatively similar. The consistent straight lines formed from independent runs give us confidence that the algorithm works well for approximately detecting the separability boundaries.

From Fig. 3, we extract estimates of the separability bound by fitting linear curves to the data that is outside the seemingly flat separable region. We summarize these values in Tables 1 and 2. With the flexibility of the current technique, we are able to quickly get estimates on the noise thresholds for many notions of separability, or alternatively, entanglement. In cases where the exact threshold is known, our estimate is close to it. Where the boundary is not known to be exact, we can see how close it is to being tight. We observe that in these cases (3-qubit W separability w.r.t. a fixed partition, and 4-qubit biseparability for W states), the analytic upper bounds seem to be close to optimal, or in fact optimal. Finally, we establish estimates for many notions of separability, for which we did not find previous estimates or bounds in the literature, marked with a "?" in the tables.
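The threshold extraction by linear fit can be sketched as follows. The data here are synthetic, generated from an idealized piecewise-linear curve with a threshold at 0.25, and are not results from this work; the cutoff `flat_tol` that separates the flat region from the rising region is likewise a hypothetical choice.

```python
import numpy as np

def estimate_threshold(qs, dists, flat_tol=1e-2):
    """Fit a line to points above flat_tol and return where it crosses zero."""
    qs, dists = np.asarray(qs), np.asarray(dists)
    mask = dists > flat_tol                  # discard the seemingly flat separable region
    slope, intercept = np.polyfit(qs[mask], dists[mask], 1)
    return -intercept / slope

qs = np.linspace(0, 1, 11)
synthetic = np.maximum(0.0, 0.75 * (qs - 0.25))  # illustrative data only
q_est = estimate_threshold(qs, synthetic)
print(q_est)
```

With real training results the points in the flat region are small but nonzero, so the cutoff should be chosen with the achieved accuracy of the optimization in mind.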

## VI Conclusion and outlook

In summary, we have addressed the question of constructing the closest separable state to a given target state, by using a neural network as a compact model for separable states. We avoided the bottleneck of having to explicitly model many (up to the Caratheodory bound) separable pure states in a decomposition by using a single neural network to represent them all. We demonstrated that by training the model independently on multiple states from a family, we can identify the separability boundary well. We did this for examples where the boundaries are known, PPT entangled states, as well as 3- and 4-party examples where there are still major gaps in our knowledge of the various separability boundaries. Additionally, in the Appendices, we provided examples of how to extract analytic guesses and insight from the numeric results for Bell states and for random 2-qubit states. We provide an ansatz for the closest separable state to the Bell state, as well as a generic approximation to the closest separable state, and based on the random state numerics, we observe that both the trace distance and Hilbert-Schmidt distance of the closest separable states are bounded by the absolute value of the smallest eigenvalue of the partial transpose.

The technique presented here opens up avenues to a variety of numeric applications in quantum foundations. In particular, for any task with reasonable Hilbert space sizes, it is possible to optimize over the set of separable states, as long as the loss function is differentiable. Among other potential applications, it can be especially helpful for obtaining (estimates or bounds on) entanglement measures, measures of robustness, separable ground state energies, and with minor modifications can be easily adapted to finding the closest classical state. Moreover, a particularly fruitful avenue for research could be focused on combining our approach with other generative neural network approaches to quantum state representations, namely “quantum neural network states” Carleo and Troyer (2017); Melko et al. (2019), particularly their extension to density matrices Yoshioka and Hamazaki (2019); Hartmann and Carleo (2019); Nagy and Savona (2019); Vicentini et al. (2019). Using such an ansatz for the separability problem has been examined in Ref. Harney et al. (2021). Such prospects of further developing the algorithms give the promise of exciting novel numerical tools for a broad range of tasks, both for numerical work and gaining analytic insight.

## VIII Acknowledgments

We thank Pavel Sekatski for discussions. We acknowledge financial support from the Swiss National Science Foundation (project and NCCR QSIT). TK additionally acknowledges funding from the Swiss National Science Foundation Doc.Mobility grant (project P1GEP2_199676).

## Appendix A Comparing with gradient descent

In order to see the advantage of using a neural network, we compare our algorithm with the naive optimization algorithm of gradient descent, for the simplest case of two qubits.

We parametrize the quantum state in a similar way as in Eq. (9), i.e. the free parameters are the probabilities and the real and imaginary parts of the pure states composing the separable state according to Eq. (1), with . The gradient descent algorithm varies these parameters in order to minimize the trace distance with respect to a target state, which we chose to be the Bell state, namely Eq. (13), with $d=2$. The gradient descent algorithm was run with an initial learning rate of 1, decreased by a factor of 0.98 each round for 250 rounds, and with a momentum factor of 0.2.

Recall that the neural network, even with one layer, did not have any trouble finding the closest separable state with a trace distance of 0.5. However, as shown in the left panel of Fig. 4, we notice that already for this simple case the gradient descent technique has difficulties in finding the closest state. Somewhat surprisingly, if only real numbers are chosen to represent the state, the gradient descent technique performs better and converges to a good solution. Note that for higher dimensions, e.g. , the real-valued gradient descent also has difficulties, as shown in the right panel of Fig. 4.

## Appendix B Analyzing a Bell state

Though in the current work we primarily use the neural network technique to find transitions from separability to entanglement on families of states, it can just as well be used to study specific states. Here we will look at what closest separable state the neural network finds for the Bell state $|\phi^+\rangle$. We examine and deduce a family of closest separable states which are all at the same distance to $|\phi^+\rangle$. Based on our numerics, we believe the closest separable state with respect to the trace distance to be at a distance of 0.5. However, we have not found this explicitly proven in the literature, even though the related concept of finding the closest classical state has been well studied Paula et al. (2013); Nakano et al. (2013); Aaronson et al. (2013).

When using the trace distance as the loss function of the neural network, it finds the separable state

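$$\begin{pmatrix}
\left(\frac{1}{2}-a\right)&c&\bar{c}&a\\
\bar{c}&a&b&-\bar{c}\\
c&\bar{b}&a&-c\\
\bar{a}&-c&-\bar{c}&\left(\frac{1}{2}-a\right)
\end{pmatrix}, \tag{18}$$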

with , and small values. However, when using the Hilbert–Schmidt distance as the loss, the neural network converges to the , and solution. Both solutions have the same trace distance from the Bell state. From these two extremes, we constructed the ansatz (18) for the closest separable state and verify that for , and they indeed all give a trace distance of 0.5. We even go further and find other values of for which the trace distance is 0.5. For example if all parameters are set to be real, and , then it is a closest separable state for . The same holds for , (with all parameters real). Clearly there are countless others, but characterizing the whole range of values which give a trace distance of 0.5 is beyond the scope of this paper. Indeed this analysis stands here to show how one can gain insight by looking at the output state of the neural network.
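The ansatz (18) is easy to probe numerically. The sketch below evaluates two parameter points chosen by us for illustration, $a=b=c=0$ and $a=1/4$, $b=c=0$; they are not necessarily the values found in the text. For both, it checks that the matrix is a valid two-qubit state with positive partial transpose (hence separable) and computes its trace distance to the Bell state, which comes out as 0.5 in both cases.

```python
import numpy as np

def ansatz(a, b, c):
    """The matrix of Eq. (18) for given parameters a, b, c."""
    cc, bc, ac = np.conj(c), np.conj(b), np.conj(a)
    return np.array([
        [0.5 - a, c,   cc,  a],
        [cc,      a,   b,   -cc],
        [c,       bc,  a,   -c],
        [ac,      -c,  -cc, 0.5 - a],
    ], dtype=complex)

def trace_distance(s1, s2):
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(s1 - s2)))

def min_pt_eigenvalue(rho):
    """Smallest partial-transpose eigenvalue of a two-qubit state."""
    pt = rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)
    return np.linalg.eigvalsh(pt).min()

phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
bell = np.outer(phi, phi)

for a in (0.0, 0.25):                      # illustrative parameter points, b = c = 0
    rho = ansatz(a, 0.0, 0.0)
    print(a, trace_distance(bell, rho), min_pt_eigenvalue(rho))
```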

## Appendix C Random states

When benchmarking the method on random states, we noticed that there is a strong connection between the obtained trace distance of the closest separable state and the lowest eigenvalue of the partial transpose. In this section we first show benchmark results for the method on random two-qubit states ($d=2$), where the PPT criterion clearly distinguishes entangled from separable states. We observe a strong correlation between the trace distance and Hilbert–Schmidt distance of the closest separable state and the smallest eigenvalue of the partial transpose of the state. Finally, we present an analytic ansatz of the closest separable state, based on the numerical results of the neural network and our intuition, which we numerically validate to be very close to the actual closest separable state.

In the two-qubit case the positive partial transpose criterion is a necessary and sufficient condition for separability. Thus, in Fig. 5 we plot the distance to the closest separable state obtained by the neural network against the smallest eigenvalue of the partial transpose, which we will refer to as $\lambda(\rho_T)$. We tested 400 random states with the trace distance as the loss function, and 300 with the Hilbert–Schmidt distance as the loss (the neural network was retrained 5 times for each state and the lowest distance was kept).
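The horizontal axis of such a plot, the smallest eigenvalue of the partial transpose, is straightforward to compute. The sketch below shows one way to do it for random two-qubit states. The sampling scheme (normalizing $GG^\dagger$ for a complex Gaussian matrix $G$) is an assumption, since the text does not state how the random states were drawn, and the function names are hypothetical.

```python
import numpy as np

def random_density_matrix(dim, rng):
    """Random state G G^dagger / Tr(G G^dagger) with complex Gaussian G (assumed sampling scheme)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def min_pt_eigenvalue(rho):
    """Smallest eigenvalue of the partial transpose of a two-qubit state."""
    pt = rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)
    return np.linalg.eigvalsh(pt).min()

rng = np.random.default_rng(0)
lams = np.array([min_pt_eigenvalue(random_density_matrix(4, rng)) for _ in range(400)])

# For two qubits, a negative value certifies entanglement and a non-negative one separability.
entangled_fraction = np.mean(lams < 0)
print(f"fraction with negative smallest eigenvalue: {entangled_fraction:.2f}")
```

Pairing each value in `lams` with the distance returned by the trained network would give the scatter plot described above.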

First, we observe that the neural network achieves close to zero distance in the separable regime for all states. Clearly it cannot and should not reach zero distance for entangled states (i.e. on the left side of the figures, where $\lambda(\rho_T) < 0$). We observe a much stronger relation: in fact the Hilbert–Schmidt distances of the closest separable state seem to fall on a line with slope , while the trace distance results seem to be below this. We formulate these two observations; namely in the entangled regime, for $\lambda(\rho_T) < 0$,

\begin{align}
D_{HS}(\rho_T, \rho_{CSS;HS}) &\le -\lambda(\rho_T), \tag{19} \\
D_{Tr}(\rho_T, \rho_{CSS;Tr}) &\le -\lambda(\rho_T), \tag{20}
\end{align}

where we explicitly denoted which distance was minimized in the subscript of $\rho_{CSS}$.

Finally, we provide an ansatz for the closest separable state with respect to the trace distance. Intuitively, we set the smallest eigenvalue of the partial transpose to be 0 instead of negative, and adjust the others such that the trace remains unchanged.

###### Theorem 1.

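Let $\rho_T$ be an entangled state whose partial transpose has an eigendecomposition of $\rho_T^\Gamma = U D U^\dagger$, with $D = \mathrm{diag}(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$, where $\lambda_1$ is the smallest eigenvalue (i.e. $\lambda_1 = \lambda(\rho_T)$). Then let our ansatz of the closest separable state be $\rho' = (U D' U^\dagger)^\Gamma$ with $D' = \mathrm{diag}(0, \lambda_2 + \lambda_1/3, \lambda_3 + \lambda_1/3, \lambda_4 + \lambda_1/3)$ and $X^\Gamma$ denoting the partial transpose of $X$. If $\rho'$ is a valid density matrix then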

$$
D_{Tr}(\rho_T, \rho') \le -\lambda(\rho_T). \tag{21}
$$

Before proceeding to the proof, note that $\rho'$ is only actually a separable density matrix if . However, only about of random states have a $\rho'$ approximation which is not a valid separable density matrix. The trace distances of the approximations of the 400 random states examined previously are depicted in Fig. 5.
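The construction in Theorem 1 can be checked numerically. The following is a rough sketch under stated assumptions: the form of $D'$ is read off from the difference $D - D'$ that appears in the proof (smallest eigenvalue set to zero, $\lambda_1/3$ added to each of the other three), the random states are drawn from normalized $GG^\dagger$ with Gaussian $G$, and the function names are hypothetical.

```python
import numpy as np

def partial_transpose(rho):
    """Partial transpose on the second qubit of a two-qubit operator."""
    return rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)

def trace_distance(rho, sigma):
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()

def ppt_ansatz(rho):
    """Return (rho_prime, lambda_1) for the ansatz of Theorem 1 (assumed form of D')."""
    lam, U = np.linalg.eigh(partial_transpose(rho))  # eigenvalues in ascending order
    lam1 = lam[0]
    d_prime = lam.copy()
    d_prime[0] = 0.0           # smallest eigenvalue set to zero
    d_prime[1:] += lam1 / 3.0  # the rest absorb lambda_1 so the trace stays 1
    rho_prime = partial_transpose((U * d_prime) @ U.conj().T)
    return rho_prime, lam1

rng = np.random.default_rng(1)
checked = []
for _ in range(200):
    g = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    rho = g @ g.conj().T
    rho /= np.trace(rho).real
    rho_prime, lam1 = ppt_ansatz(rho)
    if lam1 < 0:  # only entangled states are covered by the theorem
        checked.append((trace_distance(rho, rho_prime), -lam1))

worst_gap = min(bound - dist for dist, bound in checked)
print(f"{len(checked)} entangled states, smallest slack in Ineq. (21): {worst_gap:.3e}")
```

Note that this only tests the distance bound; whether each `rho_prime` is itself a valid separable state would need a separate positivity check on `rho_prime` and on `d_prime`.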

###### Proof.

Recall that $D_{Tr}(\rho, \sigma) = \frac{1}{2}\sum_i |e_i|$, where $\{e_i\}$ is the set of eigenvalues of the difference $\rho - \sigma$. As a first step let us examine this difference.

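$$
\begin{aligned}
\rho_T - \rho' &= \left(U (D - D') U^\dagger\right)^\Gamma \\
&= \left(U \begin{pmatrix} \lambda_1 & 0 & 0 & 0 \\ 0 & -\lambda_1/3 & 0 & 0 \\ 0 & 0 & -\lambda_1/3 & 0 \\ 0 & 0 & 0 & -\lambda_1/3 \end{pmatrix} U^\dagger\right)^\Gamma \\
&= \lambda_1 \left(\tfrac{4}{3} U E_{11} U^\dagger - I/3\right)^\Gamma \\
&= \lambda_1 \left(\tfrac{4}{3} u u^\dagger - I/3\right)^\Gamma,
\end{aligned}
$$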

where $E_{11}$ is the matrix with a single nonzero entry in its first position, and thus $u$ is the first column of $U$.

In order to prove the theorem we must show that no matter what $u$ appears in the decomposition, the trace distance is bounded, namely that

$$
\max_u D_{Tr}(\rho_T, \rho') \le -\lambda_1, \tag{22}
$$

which, after canceling out $|\lambda_1| = -\lambda_1$, reads explicitly as

$$
\max_u \frac{1}{2} \sum_i \left| \mathrm{e.v.}_i\!\left(\tfrac{4}{3} (u u^\dagger)^\Gamma - I/3\right) \right| \le 1, \tag{23}
$$

where we have used the notation $\mathrm{e.v.}_i(\cdot)$ for the $i$-th eigenvalue. Using that the identity matrix is jointly diagonalizable with $(u u^\dagger)^\Gamma$, the left-hand side becomes

$$
\max_u \frac{1}{6} \sum_i |4\nu_i - 1|, \tag{24}
$$

where $\nu_i$ are the eigenvalues of $(u u^\dagger)^\Gamma$ in non-decreasing order. Notice that the partial transpose preserves the trace, so $\sum_i \nu_i = 1$, since $u$ is the (unit-length) first column of a unitary matrix. So essentially, we must maximize (24) by distributing 1 among the four eigenvalues $\nu_i$. Due to the absolute value, the value $1/4$ becomes a divider: eigenvalues below it should be as small as possible, while eigenvalues above it should be as large as possible. So we split the eigenvalues into two parts

$$
S_{\le \frac{1}{4}} := \{\nu_i \mid \nu_i \le \tfrac{1}{4}\}, \qquad S_{> \frac{1}{4}} := \{\nu_i \mid \nu_i > \tfrac{1}{4}\}.
$$

Case by case, we give an upper bound for Expression (24), based on the number of $\nu_i$ in $S_{\le \frac{1}{4}}$. For any eigenvalue $\nu_i \in S_{> \frac{1}{4}}$, the absolute value simply disappears when upper bounding Expression (24). So if $S_{\le \frac{1}{4}}$ is empty, then

$$
\max_u \sum_i |4\nu_i - 1| \le \sum_i (4\nu_i - 1) = 4\sum_i \nu_i - 4 = 0. \tag{25}
$$

If $S_{\le \frac{1}{4}}$ contains a single eigenvalue, $\nu_1$, then the best we can do is push $\nu_1$ down to be as negative as possible, so that the other eigenvalues can jointly be larger ($\sum_{i=2}^{4} \nu_i > 1$ if $\nu_1 < 0$). Additionally, it is known that there can be at most one negative eigenvalue of the partial transpose Rana (2013); Johnston and Kribs (2010), and that all eigenvalues are at least $-1/2$ Rana (2013), i.e.

\begin{align}
-\tfrac{1}{2} &\le \nu_1, \tag{26} \\
0 &\le \nu_2. \tag{27}
\end{align}

Using the first, and that the eigenvalues sum to 1, we see that

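\begin{align}
\max_u \sum_{i=1}^{4} |4\nu_i - 1| &= \max_u \sum_{i=2}^{4} (4\nu_i - 1) - (4\nu_1 - 1) \tag{28} \\
&= \max_u\, 4\left(\sum_{i=2}^{4} \nu_i\right) - 4\nu_1 - 2 \tag{29} \\
&\le 4(1 - \nu_1) - 4\nu_1 - 2 \tag{30} \\
&= 2 - 8\nu_1 \le 2 + \tfrac{8}{2} = 6. \tag{31}
\end{align}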

Finally, note that it does not make sense to add more eigenvalues to $S_{\le \frac{1}{4}}$, since if $\nu_2 \in S_{\le \frac{1}{4}}$, then by Ineq. (27), namely that $0 \le \nu_2$, we cannot increase the weight of , i.e. . So essentially we are in the same position as when $S_{\le \frac{1}{4}}$ contains only $\nu_1$, and thus the upper bound is 6. Placing this back in Expression (24), or Ineq. (23), we see that the theorem is proven. ∎
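As an informal sanity check of the key step, one can sample unit vectors $u$ and evaluate Expression (24) directly. The sketch below does this for two-qubit vectors drawn from a complex Gaussian and normalized (an arbitrary sampling choice made here), and also records the two eigenvalue facts used in Ineqs. (26) and (27). It is a numerical illustration only, not a substitute for the proof.

```python
import numpy as np

def pt_spectrum(u):
    """Eigenvalues (ascending) of the partial transpose of the projector u u^dagger."""
    proj = np.outer(u, u.conj())
    pt = proj.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)
    return np.linalg.eigvalsh(pt)

def expression_24(u):
    """(1/6) * sum_i |4 nu_i - 1| for a given unit vector u."""
    return np.abs(4 * pt_spectrum(u) - 1).sum() / 6

rng = np.random.default_rng(2)
values, nu1_min, nu2_min = [], np.inf, np.inf
for _ in range(2000):
    u = rng.normal(size=4) + 1j * rng.normal(size=4)
    u /= np.linalg.norm(u)  # unit length, like a column of a unitary
    nu = pt_spectrum(u)
    values.append(expression_24(u))
    nu1_min = min(nu1_min, nu[0])  # compare with Ineq. (26)
    nu2_min = min(nu2_min, nu[1])  # compare with Ineq. (27)

max_value = max(values)
print(f"largest sampled value of Expression (24): {max_value:.6f}")
print(f"smallest nu_1: {nu1_min:.4f}, smallest nu_2: {nu2_min:.2e}")
```

A maximally entangled $u$, such as $(|00\rangle + |11\rangle)/\sqrt{2}$, has partial-transpose spectrum $(-1/2, 1/2, 1/2, 1/2)$ and attains the value 1 exactly.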

## Appendix D Technical details of the utilized neural networks

The main idea of how we use neural networks can be found in the main text, while the implemented code can be found in the online repository provided. Here, we briefly describe some of the technical details and hyperparameters that we used.

As described in the main text, we use a feedforward neural network to represent a generic separable state of a fixed dimension and separability structure. We use a multilayer perceptron with rectified linear units as activations, except in the final layer where we use sigmoid activations. The outputs are normalized via a softmax function for the probability vectors, and by dividing by the 2-norm for the complex entries of the pure states. For the calculations in the main text we employed a single hidden layer, with a width of 100, or 200 for more difficult calculations. The number of elements in the separable decomposition, , is analytically upper bounded by ; however, in the implementation, typically gives satisfactory results and allows for much quicker training. For training we use the Adadelta optimizer.
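To make the output normalization concrete, here is a minimal NumPy sketch of a forward pass that turns network outputs into a bipartite separable state $\sum_k p_k\, |a_k\rangle\langle a_k| \otimes |b_k\rangle\langle b_k|$. It is not the repository code. The input vector, the number of terms `n = 10`, the weight initialization, and the way real outputs are paired into complex amplitudes are all assumptions made for illustration, and training (e.g. with Adadelta) is omitted.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def separable_state(x, params, n, d_a, d_b):
    """Map an input vector to a separable state with n product terms (illustrative layout)."""
    w1, b1, w2, b2 = params
    hidden = relu(w1 @ x + b1)        # single hidden layer with ReLU
    out = sigmoid(w2 @ hidden + b2)   # sigmoid on the final layer
    p = softmax(out[:n])              # probability vector
    rest = out[n:]
    # Assumed pairing: consecutive outputs are (real, imaginary) parts of one amplitude.
    # Sigmoid outputs lie in (0, 1); any re-centering in the actual code is not described here.
    amp_a = rest[: 2 * n * d_a].reshape(n, d_a, 2)
    amp_b = rest[2 * n * d_a:].reshape(n, d_b, 2)
    psi_a = amp_a[..., 0] + 1j * amp_a[..., 1]
    psi_b = amp_b[..., 0] + 1j * amp_b[..., 1]
    psi_a /= np.linalg.norm(psi_a, axis=1, keepdims=True)  # 2-norm normalization
    psi_b /= np.linalg.norm(psi_b, axis=1, keepdims=True)
    rho = np.zeros((d_a * d_b, d_a * d_b), dtype=complex)
    for k in range(n):
        v = np.kron(psi_a[k], psi_b[k])
        rho += p[k] * np.outer(v, v.conj())
    return rho

# Hypothetical sizes: 8 inputs, hidden width 100 (as in the text), n = 10 terms, two qubits.
rng = np.random.default_rng(3)
in_dim, width, n, d_a, d_b = 8, 100, 10, 2, 2
out_dim = n + 2 * n * (d_a + d_b)
params = (
    rng.normal(scale=0.1, size=(width, in_dim)), np.zeros(width),
    rng.normal(scale=0.1, size=(out_dim, width)), np.zeros(out_dim),
)
rho = separable_state(rng.normal(size=in_dim), params, n, d_a, d_b)
print("trace:", np.trace(rho).real, "min eigenvalue:", np.linalg.eigvalsh(rho).min())
```

Because the state is assembled as a convex mixture of product projectors, it is separable by construction for any weights, so a loss such as the trace distance to the target can be minimized over the weights without leaving the separable set.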