I Introduction
Entanglement is now considered a defining feature of quantum theory, with broad implications in modern physics, from quantum information processing to many-body physics.
The detection and characterisation of entanglement is, however, a notoriously challenging problem Horodecki et al. (2009); Gühne and Toth (2009). First of all, it is known that the problem of determining whether a given density matrix is entangled or separable is NP-hard Gurvits (2003); Gharibian (2010). There exist, however, general methods for detecting entanglement, notably the celebrated negativity under partial transposition (NPT) criterion: a density matrix whose partial transpose is not positive semidefinite must be entangled Peres (1996); Horodecki et al. (1996). The converse, however, does not hold, as there exist entangled states which are positive under partial transposition, so-called bound (or PPT) entanglement Horodecki et al. (1998). Other techniques have been developed, yet all of them are only useful in specific cases. Moving beyond the bipartite case, the certification of multipartite entanglement, of which there exists a zoo of different forms, is even more challenging and less understood.
Beyond the question of determining whether a given quantum state is entangled or not, one may consider the problem of approximating a given target state via a separable one. More precisely, if the target state is separable, the task is to provide an explicit (separable) decomposition of the density matrix. If the state is entangled, the task is instead to construct a separable state that minimizes a certain distance (in the Hilbert space) with respect to the target state.
This question has been addressed indirectly in studies of entanglement measures based on the distance from the set of separable states Horodecki et al. (2009); Vedral et al. (1997), and is particularly relevant when constructing entanglement witnesses Pittenger and Rubin (2002); Bertlmann et al. (2002); Pittenger and Rubin (2003); Bertlmann et al. (2005); Bertlmann and Krammer (2008). Additionally, finding the closest separable state has been studied directly, but this task is difficult even for two-qubit systems Kim et al. (2010). For a very specific notion of distance, it has also been studied directly through the concept of “best separable approximation” of a quantum state Lewenstein and Sanpera (1998). The construction of separable approximations for multipartite states is largely unexplored, except for specific families of states, which typically have a high level of symmetry Ishizaka (2002); Hayashi et al. (2008, 2009); Hübener et al. (2009); Parashar and Rana (2011); Carrington et al. (2015); Quesada and Sanpera (2014); Akulin et al. (2015); Rodriques et al. (2014).

In the present work, we attack these questions using tools from machine learning. Specifically, we devise neural networks for constructing a separable approximation, given a target density matrix. We define a notion of “closest separable state”, which represents the separable state minimizing a given distance with respect to the target; note that this does not coincide with the best separable approximation in general. We benchmark our method with two distance measures, the trace distance and the Hilbert–Schmidt distance, on several examples, including a bipartite entangled state of local dimension up to
. We also demonstrate the potential of our method in the multipartite case, where we construct multiseparable decompositions for several classes of entangled states (noisy GHZ and W states) up to four qubits. In particular, we obtain tighter bounds or establish new estimates on multiseparability for several classes of states. We conclude with a number of open questions and directions for future research. Finally, in the appendices we study the output of the neural network in order to gain analytic insight into the closest separable state to a Bell state, as well as to random two-qubit states. From the intuition gained we create ansätze for the closest separable states in both cases, and derive an exact bound for the generic two-qubit case.
II Related work
Previous work on using machine learning for the separability problem has focused either on having the machine choose good measurements and then using an existing entanglement criterion Wang (2017); Yosefpor et al. (2020), or on viewing the task as a classification problem Lu et al. (2018); Gao et al. (2018); Ma and Yung (2018); Gray et al. (2018); Yang et al. (2019); Goes et al. (2021); Ren and Chen (2019). For classification, typically a training set is constructed where quantum states are labeled as separable or entangled. The machine learns on this training set and, given a new example, predicts whether it is entangled or separable. There are several difficulties with this approach. First, the machine just gives a guess of whether the state is entangled or separable, and does not provide any kind of certificate. Second, the training data can only be generated in a regime where we already understand the problem well, which results in the machine giving only marginal new insight at best. This could be circumvented by using suboptimal criteria (e.g. PPT) in order to create the training data; however, the machine would then just learn this criterion instead of correctly identifying the entanglement/separability boundary.
We overcome these challenges by using a generative model, which tries to give an explicit separable decomposition of a target state. This way we immediately get a certified upper bound on the distance from the set of separable states. A similar approach has been taken in Refs. Harney et al. (2020, 2021), where the authors represent the quantum states with “quantum neural network states” Carleo and Troyer (2017); Melko et al. (2019), and their extension to density matrices Yoshioka and Hamazaki (2019); Hartmann and Carleo (2019); Nagy and Savona (2019); Vicentini et al. (2019), as opposed to the dense representation we utilise. Their results show more limited flexibility in the choice of loss function and in the types of separable states considered. One family of separable states that is not examined there is that of multipartite bi- and triseparability (relevant for 3-party and 4-party genuine multipartite entanglement), a very challenging yet interesting question which we address here.
III Preliminaries
In this section we first introduce the notions of separability for bipartite and multipartite systems and then define the closest separable state. Finally we introduce the basic concepts of neural networks. For more detailed introductions on separability and entanglement or on neural networks, we refer the interested reader to Ref. Horodecki et al. (2009) and Goodfellow et al. (2016), respectively.
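A quantum state $\rho$ acting on $\mathcal{H}_A \otimes \mathcal{H}_B$, shared between two parties, is said to be separable if it can be constructed as a convex combination of local density matrices $\rho_A^{\lambda}$ acting on $\mathcal{H}_A$ and $\rho_B^{\lambda}$ acting on $\mathcal{H}_B$, as

$$\rho = \sum_{\lambda=1}^{\Lambda} p_\lambda \, \rho_A^{\lambda} \otimes \rho_B^{\lambda}, \tag{1}$$

with $p_\lambda$ a normalized discrete probability distribution. Any state which is not separable is entangled. For finite-dimensional systems, i.e. where $\dim(\mathcal{H}_X) = d_X < \infty$ for $X \in \{A, B\}$, the local states of the decomposition, $\rho_X^{\lambda}$, can be taken to be pure. Due to Carathéodory’s theorem, the number of terms required in the sum, $\Lambda$, is upper bounded by $(d_A d_B)^2$.

For a multipartite system of $n$ parties several notions of separability exist. The straightforward generalization of Eq. (1) results in the notion of a fully separable decomposition,

$$\rho = \sum_{\lambda} p_\lambda \, \rho_1^{\lambda} \otimes \rho_2^{\lambda} \otimes \cdots \otimes \rho_n^{\lambda}. \tag{2}$$

Naturally, one can also just examine bipartite separability on the multipartite system by grouping the parties together. This leads to the notion of biseparability with respect to the partition $(S|\bar{S})$,

$$\rho = \sum_{\lambda} p_\lambda \, \rho_S^{\lambda} \otimes \rho_{\bar{S}}^{\lambda}, \tag{3}$$

where $S$ denotes a subset of the indices $\{1, \dots, n\}$ and $\bar{S}$ denotes its complement. A multipartite state is called biseparable if it can be decomposed as a convex mixture of states that are separable with respect to possibly different bipartitions, namely

$$\rho = \sum_{\lambda} p_\lambda \, \rho_{S_\lambda}^{\lambda} \otimes \rho_{\bar{S}_\lambda}^{\lambda}, \tag{4}$$

where crucially, now each $S_\lambda$ can be different.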
There are many ways to quantify the entanglement of a target state $\rho$, among which a particularly useful one is based on the distance of the state from the set of separable states. Any distance measure $D(\rho, \sigma)$ between quantum states $\rho$ and $\sigma$, which is zero if and only if $\rho = \sigma$, and for which $D(\mathcal{E}(\rho), \mathcal{E}(\sigma)) \le D(\rho, \sigma)$ for any completely positive trace-preserving map $\mathcal{E}$, can be used to construct an entanglement measure, by minimizing over separable states Horodecki et al. (2009); Vedral et al. (1997). (We use the term distance, in line with the literature; note however that $D$ need not be a metric, and is thus more closely related to the notion of a divergence.) We will use the neural network to find the closest separable state with respect to a distance $D$, formally

$$\rho_{\mathrm{CSS}} = \underset{\rho_{\mathrm{sep}}}{\arg\min}\; D(\rho, \rho_{\mathrm{sep}}), \tag{5}$$

where $\rho_{\mathrm{sep}}$ is a separable state. Note that the closest separable state is not necessarily unique. For the neural network method presented in this paper, any $D$ which is differentiable with respect to one of the states can be used. We choose to work with two distances; the first is the trace distance (related to the Schatten 1-norm) Eisert et al. (2003),

$$D_{\mathrm{tr}}(\rho, \sigma) = \frac{1}{2} \mathrm{Tr}\,|\rho - \sigma| = \frac{1}{2} \sum_i |\mu_i|, \tag{6}$$

where $\mu_i$ are the eigenvalues of $\rho - \sigma$. Note that the trace-distance-based measure can be useful in quantum hypothesis testing, and, among other measures, is an important measure in the study of closest classical states, which is distinct from the closest separable state Aaronson et al. (2013); Paula et al. (2013); Nakano et al. (2013); Modi et al. (2010); Bellomo et al. (2012). We will not examine closest classical states in this work, but note that our methods can easily be adapted for their study.

The second distance we consider is the Hilbert–Schmidt distance (related to the Schatten 2-norm) Vedral and Plenio (1998); Witte and Trucks (1999); Krammer (2009)
(7) 
The Hilbert–Schmidt-based measure can be useful for constructing entanglement witnesses Pittenger and Rubin (2002); Bertlmann et al. (2002); Pittenger and Rubin (2003); Bertlmann et al. (2005); Bertlmann and Krammer (2008). Both the trace distance and the Hilbert–Schmidt distance can be used as a basis for an entanglement measure; however, one could consider others, such as the Bures distance Vedral and Plenio (1998), the relative entropy of entanglement Vedral et al. (1997) or the robustness of entanglement Vidal and Tarrach (1999); see e.g. Ref. Zyczkowski and Bengtsson (2006) for an overview and other examples of geometric measures of entanglement.
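As a concrete illustration of the two distances, the following NumPy sketch evaluates both for a pair of density matrices. The normalizations are assumptions made for this example: the trace distance is taken as half the sum of the absolute eigenvalues of the difference, and the Hilbert–Schmidt distance as the Frobenius norm of the difference. The function names are hypothetical and are not taken from the sample code.

```python
import numpy as np

def trace_distance(rho, sigma):
    """Half the sum of absolute eigenvalues of rho - sigma (assumed 1/2 convention)."""
    eigvals = np.linalg.eigvalsh(rho - sigma)  # the difference is Hermitian
    return 0.5 * np.sum(np.abs(eigvals))

def hilbert_schmidt_distance(rho, sigma):
    """sqrt(Tr[(rho - sigma)^2]), i.e. the Frobenius norm (assumed convention)."""
    return np.linalg.norm(rho - sigma, ord="fro")

# Two-qubit Bell state (|00> + |11>)/sqrt(2) ...
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell = np.outer(phi, phi.conj())
# ... and the separable state (|00><00| + |11><11|)/2
dephased = np.diag([0.5, 0.0, 0.0, 0.5])

d_tr = trace_distance(bell, dephased)
d_hs = hilbert_schmidt_distance(bell, dephased)
```

With these conventions, the dephased Bell state sits at trace distance 0.5 from the Bell state, the same value reported for the closest separable state in Appendix B.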
Let us now concisely introduce the concept of an artificial neural network Goodfellow et al. (2016), the basis of our numerical representation of separable states. A neural network is a numeric model which can in principle represent any multivariate function. A crucial point is to be able to adjust the parameters of the neural network in order to represent the desired function; in many use cases, however, this can be done surprisingly efficiently with the techniques of deep learning.
In this work we will be using one of the simplest types of neural networks, the so-called multilayer perceptron. It is characterized by the number of neurons per layer (width), the number of layers (depth), and the activation functions used at the neurons. Altogether these model an iterative sequence of $L$ parametrized affine, and fixed nonlinear transformations, on the input; namely the map from layer $l$ to $l+1$ is

$$x^{(l+1)} = g\left(W^{(l)} x^{(l)} + b^{(l)}\right), \tag{8}$$

where the weight matrix $W^{(l)}$ and bias vector $b^{(l)}$ parametrize the affine transformation, $g$ is a fixed differentiable nonlinear function (activation function), and $x^{(l)}$ is the input of layer $l$, whose length signifies the width (number of “neurons”) of the layer. The vector $x^{(0)}$ ($x^{(L)}$) is the input (output) of the whole model. At initialization, the weights and biases of all layers are set randomly. During training, the parameters of the model ($W^{(l)}, b^{(l)}$) are updated such that they minimize a differentiable loss function of the training set, which, as we will see later, in our case will be the trace or Hilbert–Schmidt distance. This is done by first evaluating the model for a batch of inputs, and then by slightly updating the parameters via a method called backpropagation, which relies on the gradient of the loss function with respect to the model parameters. This is repeated for many batches, until the model converges, a maximum training time is reached, or a satisfactory loss is achieved. Once trained, the neural network can be evaluated on new input instances.
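The layer-to-layer map described above can be sketched in a few lines of NumPy. This is only a minimal forward pass under illustrative assumptions: the layer widths, the tanh activation, the initialization scales and the function names `init_mlp` and `forward` are all hypothetical choices, and no training step is shown.

```python
import numpy as np

def init_mlp(widths, rng):
    """Random weight matrices and bias vectors for consecutive layer widths."""
    params = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        W = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_out, n_in))
        b = rng.normal(scale=0.1, size=n_out)
        params.append((W, b))
    return params

def forward(params, x, activation=np.tanh):
    """Apply the affine map followed by the fixed nonlinearity, layer by layer."""
    for W, b in params:
        x = activation(W @ x + b)
    return x

rng = np.random.default_rng(0)
params = init_mlp([4, 16, 16, 5], rng)   # input width 4, two hidden layers, output width 5
one_hot = np.array([0.0, 1.0, 0.0, 0.0])  # one-hot encoded input
output = forward(params, one_hot)
```

In practice the gradients needed for training would come from an automatic differentiation framework rather than from hand-written code.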
IV Neural networks as separable states
The task is to find the closest separable state to a given target density matrix. The central idea of this work is to use a neural network as a variational ansatz for the density matrix by representing the local components of the separable decomposition with a single neural network. The approach is inspired by a similar approach taken for nonlocality, where neural networks represent the local components of a Bell-local behavior Kriváchy et al. (2020).
In order to demonstrate the method, let us examine the example of a bipartite 2-qubit state. We ask a neural network to represent the map

$$\lambda \mapsto \left( p_\lambda, |\psi_A^{\lambda}\rangle, |\psi_B^{\lambda}\rangle \right), \tag{9}$$

where we take $\rho_X^{\lambda} = |\psi_X^{\lambda}\rangle\langle\psi_X^{\lambda}|$ to be pure states, with $X \in \{A, B\}$. That is, the neural network will take as input an integer index $\lambda$, ranging over $\Lambda$ values (in a one-hot representation), and will output the numbers specifying $p_\lambda$, $|\psi_A^{\lambda}\rangle$ and $|\psi_B^{\lambda}\rangle$, such that normalization for each subsystem is satisfied. Note that for each complex number, two real numbers are output, the real and imaginary part. We evaluate the neural network for all $\Lambda$ values of $\lambda$, normalize the probability vector and sum up the outputs in order to construct a separable state via Eq. (1), namely $\rho_{\mathrm{sep}} = \sum_{\lambda} p_\lambda \, |\psi_A^{\lambda}\rangle\langle\psi_A^{\lambda}| \otimes |\psi_B^{\lambda}\rangle\langle\psi_B^{\lambda}|$. The neural network is trained to minimize the distance between the target density matrix $\rho$ and the constructed separable density matrix $\rho_{\mathrm{sep}}$, i.e. $D(\rho, \rho_{\mathrm{sep}})$. The process is roughly illustrated in Fig. 1, where the $p_\lambda$ are not shown explicitly.
By construction the neural network represents a single density matrix $\rho_{\mathrm{sep}}$, so for each target state $\rho$, the network must be retrained in order to obtain an approximation of the closest separable state to that target state. During training, requiring $\Lambda$ values of $\lambda$ in order to evaluate the state technically means working with a batch size of $\Lambda$. That is, we evaluate $\Lambda$ inputs in order to construct $\rho_{\mathrm{sep}}$ and only then calculate the gradients required for the optimization of the neural network. A crucial point of the method is that the size of the neural network depends only on the number of parties and the local Hilbert space dimensions, and not on the number of elements in the decomposition, $\Lambda$.
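To make the construction step concrete, the sketch below turns a batch of raw real-valued outputs into a separable two-qubit density matrix. It is a sketch under stated assumptions rather than the authors' implementation: the output layout (one weight followed by real and imaginary parts of the local amplitudes), the use of absolute values to obtain non-negative weights, and the function name `separable_state_from_outputs` are all hypothetical, and random numbers stand in for actual network outputs.

```python
import numpy as np

def separable_state_from_outputs(raw, d_a=2, d_b=2):
    """Build sum_k p_k |a_k><a_k| (x) |b_k><b_k| from raw real outputs.

    raw has shape (n_terms, 1 + 2*d_a + 2*d_b); each row holds
    [weight, Re(a), Im(a), Re(b), Im(b)] (an assumed, hypothetical layout).
    """
    weights = np.abs(raw[:, 0])
    probs = weights / weights.sum()                      # normalize the probability vector
    a = raw[:, 1:1 + d_a] + 1j * raw[:, 1 + d_a:1 + 2 * d_a]
    off = 1 + 2 * d_a
    b = raw[:, off:off + d_b] + 1j * raw[:, off + d_b:off + 2 * d_b]
    a = a / np.linalg.norm(a, axis=1, keepdims=True)     # normalize each local pure state
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    rho = np.zeros((d_a * d_b, d_a * d_b), dtype=complex)
    for p, psi_a, psi_b in zip(probs, a, b):
        psi = np.kron(psi_a, psi_b)                      # product state of the two parties
        rho += p * np.outer(psi, psi.conj())
    return rho

rng = np.random.default_rng(1)
raw_outputs = rng.normal(size=(50, 9))                   # stand-in for 50 network evaluations
rho_sep = separable_state_from_outputs(raw_outputs)
```

Because every term is a product of normalized pure states weighted by a probability, the result is a valid separable state by construction, whatever values the network outputs.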
More generally, for more parties or higher dimensions, the neural network represents the map

$$\lambda \mapsto \left( p_\lambda, \rho_1^{\lambda}, \rho_2^{\lambda}, \dots, \rho_n^{\lambda} \right), \tag{10}$$

where we take the $\rho_i^{\lambda}$ ($i = 1, \dots, n$) to be pure, and the neural network explicitly outputs the parameters of the pure states. By evaluating this neural network for all values of the index $\lambda$, we construct a separable state via either Eq. (1) for the bipartite case ($n=2$), or any of Eqs. (2,3,4) for the different notions of multipartite separability. Recall that by Carathéodory’s theorem, in principle the largest number of terms $\Lambda$ needed is the square of the total Hilbert space dimension; however, even fewer could be sufficient. Thus we keep $\Lambda$ as a free hyperparameter, which we set before training begins. More technical details on the neural networks we used can be found in App. D or in the sample code provided in the Code Availability section.

The neural network is optimized in the high-dimensional nonconvex landscape of the network’s weights, so it is not guaranteed to converge to the optimal solution. However, in practice, optimization procedures based on gradient descent reach close-to-optimal solutions efficiently. Notice that even for suboptimal solutions we obtain an upper bound on the amount of entanglement of the target state, since the utilized distances serve as entanglement measures Vedral et al. (1997); Vedral and Plenio (1998); Horodecki et al. (2000). However, we can go one step further, and examine families of states parametrized by a single parameter, which we refer to as $p$, typically of the form

$$\rho(p) = p \, \rho_{\mathrm{ent}} + (1 - p) \, \rho_{\mathrm{sep}}, \tag{11}$$

where $\rho_{\mathrm{ent}}$ is an entangled state and $\rho_{\mathrm{sep}}$ is a separable state, oftentimes the maximally mixed state. If $\rho_{\mathrm{ent}}$ is truly entangled, then when decreasing $p$, for some value $p^*$ we will cross the separability boundary. We can observe this transition by varying $p$ and retraining the neural network from scratch for each target state. An approximation of $p^*$ becomes clear from how close the algorithm can get to the target states for different $p$ values.
V Results
In order to benchmark the method, we first use the algorithm to examine the separability boundary for some exemplary families of bipartite states, including an example where the partial transpose criterion is inconclusive. Then, we examine some multipartite cases, up to 4 parties, where many things are still unknown about the separability boundary even for quintessential cases, such as GHZ and W states mixed with white noise. We provide numeric estimates on these thresholds. Additionally, in Appendix A we compare our neural network algorithm to a naive gradient-descent-based heuristic to show its advantage. In Appendix B, we provide analytic guesses of the closest separable state to the Bell state based on the numerical results, and in Appendix C we examine the performance of the algorithm on random bipartite density matrices and conjecture an analytic ansatz of the closest separable state for 2-qubit states, which we find to be very close in trace distance to the solutions found by the neural network, and prove a bound on the trace distance. The examples presented in the appendix are meant to serve as inspiration in how numeric techniques can aid analytic insight, particularly when constructing ansätze and conjectures.

Werner states and isotropic states are highly symmetric bipartite states that are separable if and only if they have a positive partial transpose. They are defined for local systems of the same dimension, $d$. Isotropic states are
(12) 
where is the identity operator on the joint space and is the canonical maximally entangled state
(13) 
where and are bases of and , respectively.
Werner states are defined as
(14) 
where
with the flip operator. Isotropic states are separable for , while Werner states for .
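Since both families are separable exactly when they are PPT, the boundary can be cross-checked numerically. The sketch below does this for isotropic states, assuming the common parametrization $p\,|\phi^+\rangle\langle\phi^+| + (1-p)\,\mathbb{1}/d^2$ (an assumption of this example; it may differ from the parametrization of Eq. (12)). Under that assumption the smallest eigenvalue of the partial transpose is $(1 - p(d+1))/d^2$, which changes sign at $p = 1/(d+1)$. The function names are hypothetical.

```python
import numpy as np

def isotropic_state(p, d):
    """p |phi+><phi+| + (1 - p) * identity / d^2 (assumed parametrization)."""
    phi = np.eye(d).reshape(d * d) / np.sqrt(d)   # sum_i |i>|i> / sqrt(d)
    return p * np.outer(phi, phi) + (1 - p) * np.eye(d * d) / d**2

def min_eig_partial_transpose(rho, d_a, d_b):
    """Smallest eigenvalue of the partial transpose over the second party."""
    pt = rho.reshape(d_a, d_b, d_a, d_b).transpose(0, 3, 2, 1)
    return np.linalg.eigvalsh(pt.reshape(d_a * d_b, d_a * d_b)).min()

d = 3
threshold = 1 / (d + 1)
below = min_eig_partial_transpose(isotropic_state(threshold - 0.05, d), d, d)
at = min_eig_partial_transpose(isotropic_state(threshold, d), d, d)
above = min_eig_partial_transpose(isotropic_state(threshold + 0.05, d), d, d)
```

The three values are positive, zero and negative respectively, which locates the PPT boundary that the neural network results can be compared against.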
For both the isotropic and Werner states, we run the neural network independently for 11 values of , and additionally for the exact separability boundary value. The results for both the trace distance and the Hilbert–Schmidt distance for are depicted in Fig. 2 (each line is plotted with its respective loss function, the trace or Hilbert–Schmidt distance). They confirm that the algorithm works properly in this regime, finding a sharp transition at the known separability thresholds. When making a linear fit to the data that is outside the seemingly flat separable region, we recover the thresholds with a precision of at least . To give an example of the running time on a personal computer, for isotropic states the training for a single target state for took at most 15 minutes, while for it took at most 30 seconds (timed with an Intel i7-8700K CPU @ 3.70 GHz with 6 cores (12 threads) and 16 GB RAM). When the trace distance is found to be smaller than , we choose to stop the training, and conclude that the state is separable. Otherwise we run the algorithm until the resulting trace distance converges, i.e. it does not change by more than
in one epoch.
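The threshold extraction by linear fit can be sketched as follows. This is an illustration on synthetic data, not the authors' fitting script: the cutoff used to discard the flat separable region, the function name `estimate_threshold` and the toy distance curve are all assumptions of this example.

```python
import numpy as np

def estimate_threshold(param_values, distances, flat_cutoff=1e-2):
    """Fit a line to the points above the flat region and return its zero crossing."""
    param_values = np.asarray(param_values, dtype=float)
    distances = np.asarray(distances, dtype=float)
    mask = distances > flat_cutoff                      # keep only the entangled-side points
    slope, intercept = np.polyfit(param_values[mask], distances[mask], deg=1)
    return -intercept / slope

# Synthetic curve: zero distance up to 0.25, then linear growth (toy data).
params = np.linspace(0.0, 1.0, 11)
toy_distances = np.maximum(0.0, 0.6 * (params - 0.25))
estimate = estimate_threshold(params, toy_distances)
```

On real training results the points near the boundary are noisier, so the choice of cutoff affects the estimate and is worth varying as a sanity check.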
Additionally, for we examine the Werner states, also plotted in Fig. 2. For such a large state, with , training took about 1 hour 15 minutes on a personal computer for a single epoch (3000 batches), which was reduced to 45 minutes when training on a GPU (an RTX 3080 with 10 GB memory). Due to the increased runtime we only ran one epoch for each point in Fig. 2, and did not wait until convergence. We observe that the neural network struggles more in finding a closest separable state in the separable region; however, it works remarkably well in the entangled regime, and still manages to give qualitatively interpretable results on where the entanglement boundary lies. For increased accuracy one could run the algorithm several times independently and take the smallest value for each , or one could run the algorithm with a larger batch size . For example, for the separability boundary at , by using instead of 100, after 5 epochs (5 times 3000 batches), the trace distance reduced to 0.024 from the 0.045 seen in Fig. 2.
Before moving on to the multipartite setting, we consider another family of states from the bipartite scenario, introduced in Ref. Horodecki et al. (1999), for which we adopt the parametrization used in Ref. Mintert et al. (2005). This family of 2-qutrit states exhibits bound entanglement, i.e. a PPT entangled region. The states are
(15) 
with , and , however, we only consider since the negative regime gives the same states up to permutations. It is known that is separable for , is PPT entangled for and is NPT entangled for . For several values of we train the neural network to approximate , and display the results in Fig. 2. We can see that by explicitly constructing the separable decomposition, our results are not sensitive to whether the partial transpose is positive or negative, and the neural network approach successfully identifies the separable and entangled regions.
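We now consider three- and four-qubit multipartite states by examining the exemplary GHZ and W states, mixed with white noise. The $n$-qubit GHZ state is

$$|\mathrm{GHZ}_n\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle^{\otimes n} + |1\rangle^{\otimes n} \right), \tag{16}$$

while the W state is

$$|W_n\rangle = \frac{1}{\sqrt{n}} \left( |10\cdots0\rangle + |01\cdots0\rangle + \cdots + |00\cdots1\rangle \right). \tag{17}$$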
We mix both with the maximally mixed state as we did for the isotropic states in Eq. (12). For three qubits, we use the neural network to distinctly examine
- full separability, as in Eq. (2) ($n=3$),
- biseparability with respect to a single partition (1|23), as in Eq. (3),
- biseparability, as in Eq. (4),

and for the four qubits,
- full separability, as in Eq. (2) ($n=4$),
- biseparability with respect to the partition (12|34), as in Eq. (3),
- biseparability with respect to the partition (1|234), as in Eq. (3),
- biseparability with respect to 2 vs. 2 partitions, i.e. as in Eq. (4), except all partitions are constrained to have 2 parties,
- biseparability with respect to 1 vs. 3 partitions, i.e. as in Eq. (4), except all partitions are constrained to have 1 party (and thus the complements have 3 parties),
- triseparability, as a generalization of Eq. (4), namely $\rho = \sum_{\lambda} p_\lambda \, \rho_{S_1^\lambda}^{\lambda} \otimes \rho_{S_2^\lambda}^{\lambda} \otimes \rho_{S_3^\lambda}^{\lambda}$, with $(S_1^\lambda | S_2^\lambda | S_3^\lambda)$ a partitioning of the parties into three groups for each $\lambda$,
- biseparability, as in Eq. (4).

For the biseparable and triseparable case, on a technical level, for each $\lambda$ we ask the neural network to output density matrices for all possible partitions, i.e. for each $\lambda$ it actually outputs 3 terms at a time for the 3-party case, and 6 terms for the 4-party case.
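The number of terms per index follows from counting the ways of grouping the parties. The short sketch below enumerates these groupings by brute force; the function name `partitions_into_blocks` is hypothetical, and the enumeration is only meant to make the counting explicit, not to reproduce how the sample code organizes its outputs.

```python
from itertools import product

def partitions_into_blocks(n_parties, n_blocks):
    """All ways to split parties 0..n_parties-1 into n_blocks non-empty, unordered groups."""
    found = set()
    for labels in product(range(n_blocks), repeat=n_parties):
        if len(set(labels)) != n_blocks:
            continue                                   # some group would be empty
        blocks = [tuple(i for i, lab in enumerate(labels) if lab == b)
                  for b in range(n_blocks)]
        found.add(frozenset(blocks))                   # ignore the order of the groups
    return sorted(sorted(part) for part in found)

bipartitions_3 = partitions_into_blocks(3, 2)          # 1|23, 2|13, 3|12
bipartitions_4 = partitions_into_blocks(4, 2)          # four 1-vs-3 and three 2-vs-2 splits
tripartitions_4 = partitions_into_blocks(4, 3)
two_vs_two = [p for p in bipartitions_4 if all(len(block) == 2 for block in p)]
```

Three parties admit 3 bipartitions and four parties admit 6 partitions into three groups, matching the term counts quoted above; four parties admit 7 bipartitions in total.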
Table 1: 3-qubit states
Separability   GHZ estimate   GHZ previous bound                  W estimate   W previous bound
Full sep.      0.199          0.2* Dür and Cirac (2000)           0.188        0.178* Chen and Jiang (2020)
1|23 sep.      0.198          0.2* Dür and Cirac (2000)           0.208        0.210 Szalay (2011)
Bisep.         0.429          0.429* Gühne and Seevinck (2010)    0.480        0.479* Jungnitsch et al. (2011)
We present the results in Fig. 3, except for 4-qubit separability with respect to a fixed partition, in order not to overcrowd the figure; note, however, that those results are qualitatively similar. The consistent straight lines formed from independent runs give us confidence that the algorithm works well for approximately detecting the separability boundaries.
From Fig. 3, we extract estimates of the separability bound by fitting linear curves to the data that is outside the seemingly flat separable region. We summarize these values in Tables 1 and 2. With the flexibility of the current technique, we are able to quickly get estimates on the noise thresholds for many notions of separability, or alternatively, entanglement. In cases where the exact threshold is known, our estimate is close to it. Where the boundary is not known to be exact, we can see how close it is to being tight. We observe that in these cases (3-qubit W separability w.r.t. a fixed partition, and 4-qubit biseparability for W states), the analytic upper bounds seem to be close to optimal, or in fact optimal. Finally, we establish estimates for many notions of separability, for which we did not find previous estimates or bounds in the literature, marked with a "?" in the tables.
Table 2: 4-qubit states
Separability   GHZ estimate   GHZ previous bound                  W estimate   W previous bound
Full sep.      0.110          0.111* Dür and Cirac (2000)         0.087        0.093* Chen and Jiang (2020)
(12|34)        0.109          0.111* Dür and Cirac (2000)         0.108        ?
(1|234)        0.108          0.111* Dür and Cirac (2000)         0.123        ?
2 vs. 2 sep.   0.271          ?                                   0.254        ?
1 vs. 3 sep.   0.333          ?                                   0.453        ?
Trisep.        0.194          ?                                   0.262        ?
Bisep.         0.467          0.467* Jungnitsch et al. (2011)     0.473        0.474 Jungnitsch et al. (2011)
VI Conclusion and outlook
In summary, we have addressed the question of constructing the closest separable state to a given target state, by using a neural network as a compact model for separable states. We avoided the bottleneck of having to explicitly model many (up to ) separable pure states in a decomposition by using a single neural network to represent them all. We demonstrated that by training the model independently on multiple states from a family, we can identify the separability boundary well. We did this for examples where the boundaries are known, for PPT entangled states, as well as for 3- and 4-party examples where there are still major gaps in our knowledge of the various separability boundaries. Additionally, in the Appendices, we provided examples of how to extract analytic guesses and insight from the numeric results for Bell states and for random 2-qubit states. We provide an ansatz for the closest separable state to the Bell state, as well as a generic approximation to the closest separable state, and based on the random state numerics, we observe that both the trace distance and the Hilbert–Schmidt distance of the closest separable states are bounded by the absolute value of the smallest eigenvalue of the partial transpose.
The technique presented here opens up avenues to a variety of numeric applications in quantum foundations. In particular, for any task with reasonable Hilbert space sizes, it is possible to optimize over the set of separable states, as long as the loss function is differentiable. Among other potential applications, it can be especially helpful for obtaining (estimates or bounds on) entanglement measures, measures of robustness, separable ground state energies, and with minor modifications can be easily adapted to finding the closest classical state. Moreover, a particularly fruitful avenue for research could be focused on combining our approach with other generative neural network approaches to quantum state representations, namely “quantum neural network states” Carleo and Troyer (2017); Melko et al. (2019), particularly their extension to density matrices Yoshioka and Hamazaki (2019); Hartmann and Carleo (2019); Nagy and Savona (2019); Vicentini et al. (2019). Using such an ansatz for the separability problem has been examined in Ref. Harney et al. (2021). Such prospects of further developing the algorithms give the promise of exciting novel numerical tools for a broad range of tasks, both for numerical work and gaining analytic insight.
VII Code availability
We have made sample code available at www.github.com/Antoine0Girardin/Neuralnetworkforseparabilityproblem.
VIII Acknowledgments
We thank Pavel Sekatski for discussions. We acknowledge financial support from the Swiss National Science Foundation (project and NCCR QSIT). TK additionally acknowledges funding from the Swiss National Science Foundation Doc.Mobility grant (project P1GEP2_199676).
Appendix A Comparing with gradient descent
In order to see the advantage of using a neural network, we compare our algorithm with the naive optimization algorithm of gradient descent, for the simplest case of two qubits.
We parametrize the quantum state in a similar way as in Eq. (9), i.e. the free parameters are the probabilities and the real and imaginary parts of the pure states composing the separable state according to Eq. (1), with . The gradient descent algorithm varies these parameters in order to minimize the trace distance with respect to a target state, which we chose to be the Bell state, namely Eq. (13), with . The gradient descent algorithm was run with an initial learning rate of 1, decreased by a factor of 0.98 each round for 250 rounds, and with a momentum factor of 0.2.
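For orientation, the update rule described here can be sketched on a toy problem. The sketch assumes that "decreased by a factor of 0.98" means multiplying the learning rate by 0.98 after every round, and it uses a simple quadratic objective as a stand-in for the trace distance; the function name `momentum_descent` and the toy target are hypothetical.

```python
import numpy as np

def momentum_descent(grad, x0, lr0=1.0, decay=0.98, momentum=0.2, rounds=250):
    """Gradient descent with momentum and a geometrically decaying learning rate."""
    x = np.array(x0, dtype=float)
    velocity = np.zeros_like(x)
    lr = lr0
    for _ in range(rounds):
        velocity = momentum * velocity - lr * grad(x)   # momentum-smoothed step
        x = x + velocity
        lr *= decay                                     # assumed multiplicative decay
    return x

# Toy objective 0.5 * |x - target|^2, whose gradient is x - target.
target = np.array([0.3, -1.2])
x_final = momentum_descent(lambda x: x - target, x0=[2.0, 2.0])
```

On this convex toy problem the iteration converges without difficulty; the difficulties reported in this appendix arise from the nonconvex landscape of the actual separable-state parametrization.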
Recall that the neural network, even with one layer, did not have any trouble finding the closest separable state with a trace distance of 0.5. However, as shown in the left panel of Fig. 4, we notice that already for this simple case the gradient descent technique has difficulties in finding the closest state. Somewhat surprisingly, if only real numbers are chosen to represent the state, the gradient descent technique performs better and converges to a good solution. Note that for higher dimensions, e.g. , the real-valued gradient descent also has difficulties, as shown in the right panel of Fig. 4.
Appendix B Analyzing a Bell state
Though in the current work we primarily use the neural network technique to find transitions from separability to entanglement on families of states, it can just as well be used to study specific states. Here we will look at what closest separable state the neural network finds for the Bell state . We examine and deduce a family of closest separable states which are all at the same distance to . Based on our numerics, we believe the closest separable state with respect to the trace distance to be at a distance of 0.5. However, we have not found this explicitly proven in the literature, even though the related concept of finding the closest classical state has been well studied Paula et al. (2013); Nakano et al. (2013); Aaronson et al. (2013).
When using the trace distance as the loss function of the neural network, it finds the separable state
(18) 
with , and small values. However, when using the Hilbert–Schmidt distance as the loss, the neural network converges to the , and solution. Both solutions have the same trace distance from the Bell state. From these two extremes, we constructed the ansatz (18) for the closest separable state and verified that for , and they indeed all give a trace distance of 0.5. We even go further and find other values of for which the trace distance is 0.5. For example, if all parameters are set to be real, and , then it is a closest separable state for . The same holds for , (with all parameters real). Clearly there are countless others, but characterizing the whole range of values which give a trace distance of 0.5 is beyond the scope of this paper. Indeed, this analysis serves to show how one can gain insight by looking at the output state of the neural network.
Appendix C Random states
When benchmarking the method on random states, we noticed that there is a strong connection between the obtained trace distance of the closest separable state and the lowest eigenvalue of the partial transpose. In this section we first show benchmark results for the method on random two-qubit states (), where the PPT criterion clearly distinguishes entangled from separable states. We observe a strong correlation between the trace distance and Hilbert–Schmidt distance of the closest separable state and the smallest eigenvalue of the partial transpose of the state. Finally, we present an analytic ansatz of the closest separable state, based on the numerical results of the neural network and our intuition, which we numerically validate to be very close to the actual closest separable state.
In the two-qubit case the positive partial transpose criterion is a necessary and sufficient condition for separability. Thus, in Fig. 5 we plot the distance to the closest separable state obtained by the neural network against the smallest eigenvalue of the partial transpose, which we will refer to as . We tested 400 random states with the trace distance as the loss function, and 300 with the Hilbert–Schmidt distance as the loss (the neural network was retrained 5 times for each state and the lowest distance was kept).
First, we observe that the neural network achieves close to zero distance in the separable regime for all states. Clearly it cannot and should not reach zero distance for entangled states (i.e. on the left side of the figures, where ). We observe a much stronger relation: in fact the Hilbert–Schmidt distances of the closest separable state seem to line up on a line with slope , while the trace distance results seem to be below this. We formulate these two observations; namely in the entangled regime, for ,
(19)  
(20) 
where we explicitly denoted which distance was minimized in the subscript of .
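As an illustration of the quantities compared in Fig. 5, the following minimal Python sketch computes the smallest eigenvalue of the partial transpose together with the two distances. It is not taken from the online repository: the function names, the NumPy implementation, the Ginibre-type sampling of random states, and the normalization of the Hilbert–Schmidt distance as the square root of the trace of the squared difference are all assumptions made for illustration.

```python
import numpy as np

def partial_transpose(rho, dims=(2, 2)):
    """Transpose the second subsystem of a bipartite density matrix."""
    dA, dB = dims
    r = rho.reshape(dA, dB, dA, dB)              # indices (a, b, a', b')
    # Swap b and b' to transpose only the second subsystem.
    return r.transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

def min_pt_eigenvalue(rho, dims=(2, 2)):
    """Smallest eigenvalue of the partial transpose (negative iff NPT)."""
    return np.linalg.eigvalsh(partial_transpose(rho, dims))[0]

def trace_distance(rho, sigma):
    """Half the sum of absolute eigenvalues of the difference."""
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()

def hilbert_schmidt_distance(rho, sigma):
    """sqrt(Tr[(rho - sigma)^2]); this normalization is an assumption."""
    diff = rho - sigma
    return np.sqrt(np.real(np.trace(diff.conj().T @ diff)))

def random_state(rng, dim=4):
    """Random density matrix from a complex Gaussian matrix (assumed sampling)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rho = random_state(rng)
    print("smallest PT eigenvalue:", min_pt_eigenvalue(rho))
```

For the two-qubit Bell state the smallest eigenvalue of the partial transpose is -1/2, whereas for any product state it is 0, so the sign of this quantity separates the two regimes shown in the figure.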
Finally, we provide an ansatz for the closest separable state with respect to the trace distance. Intuitively, we set the smallest eigenvalue of the partial transpose to be 0 instead of negative, and adjust the others such that the trace remains unchanged.
Theorem 1.
Let be an entangled state whose partial transpose has an eigendecomposition of , with , where is the smallest eigenvalue (i.e. ). Then let our ansatz of the closest separable state be with and denoting the partial transpose of . If is a valid density matrix then
(21) 
Before proceeding to the proof, note that is only actually a separable density matrix if . However, only about of random states have an approximation that is not a valid separable density matrix. The trace distances of the approximations of the 400 random states examined previously are depicted in Fig. 5.
Proof.
Recall that , where is the set of eigenvalues of the difference. As a first step let us examine this difference.
where is the matrix with a single nonzero entry in its first position, and thus is the first column of .
In order to prove the theorem we must show that no matter what appears in the decomposition, the trace distance is bounded, namely that
(22) 
which, after canceling out , reads explicitly as
(23) 
where we have used the notation for the th eigenvalue. Using that the identity matrix is jointly diagonalizable with , the left-hand side becomes
(24) 
where are the eigenvalues of in nondecreasing order. Notice that the partial transpose preserves the trace, so , since is the (unit-length) first column of a unitary matrix. So essentially, we must maximize Expression (24) by distributing 1 among the four eigenvalues . Due to the absolute value, the value becomes a divider: eigenvalues below it should be as small as possible, while eigenvalues above it should be as large as possible. So we split the eigenvalues into two parts. Case by case, we give an upper bound for Expression (24), based on the number of in . For any eigenvalues appearing in the absolute value just disappears when upper bounding Expression (24). So if , then
(25) 
If we have , then the best we can do is push down to be as negative as possible, so that the other eigenvalues can jointly be larger ( if ). Additionally, it is known that there can be at most one negative eigenvalue of the partial transpose Rana (2013); Johnston and Kribs (2010), and that all eigenvalues are larger than Rana (2013), i.e.
(26)  
(27) 
Using the first, and that the eigenvalues sum to 1, we see that
(28)  
(29)  
(30)  
(31) 
Finally, note that it does not make sense to add more eigenvalues to , since if , then by Ineq. (27), namely that , we cannot increase the weight of , i.e. . So essentially we are in the same position as when , and thus the upper bound is 6. Placing this back in Expression (24), or Ineq. (23), we see that the theorem is proven. ∎
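The construction behind the ansatz of Theorem 1 can also be explored numerically. The snippet below is a minimal sketch and not the authors' code: it assumes that adjusting the remaining eigenvalues means adding one third of the negative eigenvalue to each of the other three (a uniform redistribution, which is an assumption of this sketch), and all function names are hypothetical.

```python
import numpy as np

def partial_transpose(rho, dims=(2, 2)):
    """Transpose the second subsystem of a bipartite density matrix."""
    dA, dB = dims
    r = rho.reshape(dA, dB, dA, dB)
    return r.transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

def trace_distance(rho, sigma):
    """Half the sum of absolute eigenvalues of the difference."""
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()

def ansatz_separable_state(rho):
    """Candidate separable approximation of a two-qubit state.

    Assumption: the negative eigenvalue of the partial transpose is set
    to zero and its weight is spread evenly over the other three
    eigenvalues, so that the trace stays equal to one.
    """
    lam, U = np.linalg.eigh(partial_transpose(rho))   # ascending order
    if lam[0] >= 0:
        return rho.copy()                             # already PPT
    new = lam.copy()
    new[1:] += lam[0] / 3.0                           # keep the trace fixed
    new[0] = 0.0
    sigma_pt = (U * new) @ U.conj().T                 # rebuild from eigenpairs
    return partial_transpose(sigma_pt)                # undo the transpose

def is_valid_separable(sigma, tol=1e-12):
    """Two-qubit check: positive semidefinite and PPT."""
    psd = np.linalg.eigvalsh(sigma)[0] >= -tol
    ppt = np.linalg.eigvalsh(partial_transpose(sigma))[0] >= -tol
    return bool(psd and ppt)

if __name__ == "__main__":
    psi = np.zeros(4, dtype=complex)
    psi[0] = psi[3] = 1 / np.sqrt(2)
    bell = np.outer(psi, psi.conj())
    sigma = ansatz_separable_state(bell)
    print(is_valid_separable(sigma), trace_distance(bell, sigma))
```

For the Bell state, this sketch returns a valid separable state at trace distance 0.5, consistent with the value reported in Appendix B. The validity check matters in general, since the construction does not guarantee a positive semidefinite result for every input.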
Appendix D Technical details of the utilized neural networks
The main idea of how we use neural networks can be found in the main text, while the implemented code can be found in the online repository provided. Here, we briefly describe some of the technical details and hyperparameters that we used.
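To make the output parametrization described in the next paragraph concrete, here is a minimal NumPy sketch of a single forward pass. It is only an illustration under stated assumptions and not the code from the repository: the network input x, the random weight initialization, the number of terms k, the ordering of the outputs, and the shift of the sigmoid outputs to the interval (-1, 1) before forming complex amplitudes are all hypothetical choices.

```python
import numpy as np

def output_size(k, dims=(2, 2)):
    """k probabilities plus real and imaginary parts of every local state."""
    return k + k * sum(2 * d for d in dims)

def init_params(rng, n_in, width, n_out):
    """Randomly initialized single-hidden-layer perceptron (assumed init)."""
    return {
        "W1": rng.normal(scale=1 / np.sqrt(n_in), size=(width, n_in)),
        "b1": np.zeros(width),
        "W2": rng.normal(scale=1 / np.sqrt(width), size=(n_out, width)),
        "b2": np.zeros(n_out),
    }

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def separable_state(x, params, k=4, dims=(2, 2)):
    """Map network outputs to sum_i p_i |a_i><a_i| (x) |b_i><b_i|."""
    h = np.maximum(0.0, params["W1"] @ x + params["b1"])            # ReLU
    y = 1.0 / (1.0 + np.exp(-(params["W2"] @ h + params["b2"])))    # sigmoid
    p = softmax(y[:k])                      # probability vector
    amps = 2.0 * y[k:] - 1.0                # assumed shift to (-1, 1)
    dim = int(np.prod(dims))
    rho = np.zeros((dim, dim), dtype=complex)
    offset = 0
    for i in range(k):
        term = np.array([[p[i]]], dtype=complex)
        for d in dims:
            re = amps[offset:offset + d]
            im = amps[offset + d:offset + 2 * d]
            offset += 2 * d
            psi = re + 1j * im
            psi = psi / np.linalg.norm(psi)     # divide by the 2-norm
            term = np.kron(term, np.outer(psi, psi.conj()))
        rho += term
    return rho

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, dims = 4, (2, 2)
    params = init_params(rng, n_in=16, width=100, n_out=output_size(k, dims))
    rho = separable_state(rng.normal(size=16), params, k, dims)
    print(np.trace(rho).real)
```

Because every term is a product of normalized pure states weighted by a probability, the output is a separable density matrix by construction, whatever the weights are. Training would then amount to minimizing a distance between this output and the target state.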
As described in the main text, we use a feedforward neural network to represent a generic separable state of a fixed dimension and separability structure. We use a multilayer perceptron with rectified linear units as activations, except in the final layer, where we use sigmoid activations. The outputs are normalized via a softmax function for the probability vectors, and by dividing by the 2-norm for the complex entries of the pure states. For the calculations in the main text we employed a single hidden layer with a width of 100, or 200 for more difficult calculations. The number of elements in the separable decomposition, , is analytically upper bounded by ; however, in the implementation, typically gives satisfactory results and allows for much quicker training. For training we use the Adadelta optimizer.
References
 Horodecki et al. (2009) R. Horodecki, P. Horodecki, M. Horodecki, and K. Horodecki, Quantum entanglement, Reviews of Modern Physics 81, 865 (2009).
 Gühne and Toth (2009) O. Gühne and G. Toth, Entanglement detection, Physics Reports 474, 1 (2009).
 Gurvits (2003) L. Gurvits, in Proceedings of the thirty-fifth annual ACM symposium on Theory of computing (Association for Computing Machinery, New York, NY, USA, 2003), STOC ’03, pp. 10–19.
 Gharibian (2010) S. Gharibian, Strong NP-hardness of the quantum separability problem, Quantum Information and Computation 10, 343 (2010).
 Peres (1996) A. Peres, Separability Criterion for Density Matrices, Physical Review Letters 77, 1413 (1996).
 Horodecki et al. (1996) M. Horodecki, P. Horodecki, and R. Horodecki, Separability of mixed states: necessary and sufficient conditions, Physics Letters A 223, 1 (1996).
 Horodecki et al. (1998) M. Horodecki, P. Horodecki, and R. Horodecki, Mixed-State Entanglement and Distillation: Is there a “Bound” Entanglement in Nature?, Physical Review Letters 80, 5239 (1998).
 Vedral et al. (1997) V. Vedral, M. B. Plenio, M. A. Rippin, and P. L. Knight, Quantifying Entanglement, Physical Review Letters 78, 2275 (1997).
 Pittenger and Rubin (2002) A. O. Pittenger and M. H. Rubin, Convexity and the separability problem of quantum mechanical density matrices, Linear Algebra and its Applications 346, 47 (2002).
 Bertlmann et al. (2002) R. A. Bertlmann, H. Narnhofer, and W. Thirring, Geometric picture of entanglement and Bell inequalities, Physical Review A 66, 032319 (2002).
 Pittenger and Rubin (2003) A. O. Pittenger and M. H. Rubin, Geometry of entanglement witnesses and local detection of entanglement, Physical Review A 67, 012327 (2003).
 Bertlmann et al. (2005) R. A. Bertlmann, K. Durstberger, B. C. Hiesmayr, and P. Krammer, Optimal entanglement witnesses for qubits and qutrits, Physical Review A 72, 052331 (2005).
 Bertlmann and Krammer (2008) R. A. Bertlmann and P. Krammer, Geometric entanglement witnesses and bound entanglement, Physical Review A 77, 024303 (2008).
 Kim et al. (2010) H. Kim, M.-R. Hwang, E. Jung, and D. Park, Difficulties in analytic computation for relative entropy of entanglement, Physical Review A 81, 052325 (2010).
 Lewenstein and Sanpera (1998) M. Lewenstein and A. Sanpera, Separability and Entanglement of Composite Quantum Systems, Physical Review Letters 80, 2261 (1998).
 Ishizaka (2002) S. Ishizaka, The reduction of the closest disentangled states, Journal of Physics A: Mathematical and General 35, 8075 (2002).
 Hayashi et al. (2008) M. Hayashi, D. Markham, M. Murao, M. Owari, and S. Virmani, Entanglement of multiparty-stabilizer, symmetric, and antisymmetric states, Physical Review A 77, 012104 (2008).
 Hayashi et al. (2009) M. Hayashi, D. Markham, M. Murao, M. Owari, and S. Virmani, The geometric measure of entanglement for a symmetric pure state with nonnegative amplitudes, Journal of Mathematical Physics 50, 122104 (2009).
 Hübener et al. (2009) R. Hübener, M. Kleinmann, T.-C. Wei, C. González-Guillén, and O. Gühne, Geometric measure of entanglement for symmetric states, Physical Review A 80, 032324 (2009).
 Parashar and Rana (2011) P. Parashar and S. Rana, Entanglement and discord of the superposition of Greenberger-Horne-Zeilinger states, Physical Review A 83, 032301 (2011).
 Carrington et al. (2015) M. E. Carrington, G. Kunstatter, J. Perron, and S. Plosker, On the geometric measure of entanglement for pure states, Journal of Physics A: Mathematical and Theoretical 48, 435302 (2015).
 Quesada and Sanpera (2014) R. Quesada and A. Sanpera, Best separable approximation of multipartite diagonal symmetric states, Physical Review A 89, 052319 (2014).
 Akulin et al. (2015) V. M. Akulin, G. A. Kabatiansky, and A. Mandilara, Essentially entangled component of multipartite mixed quantum states, its properties, and an efficient algorithm for its extraction, Physical Review A 92, 042322 (2015).
 Rodriques et al. (2014) S. Rodriques, N. Datta, and P. Love, Bounding polynomial entanglement measures for mixed states, Physical Review A 90, 012340 (2014).
 Wang (2017) B. Wang, Learning to Detect Entanglement, arXiv:1709.03617 [quant-ph] (2017).
 Yosefpor et al. (2020) M. Yosefpor, M. R. Mostaan, and S. Raeisi, Finding semi-optimal measurements for entanglement detection using autoencoder neural networks, Quantum Science and Technology 5, 045006 (2020).
 Lu et al. (2018) S. Lu, S. Huang, K. Li, J. Li, J. Chen, D. Lu, Z. Ji, Y. Shen, D. Zhou, and B. Zeng, Separability-entanglement classifier via machine learning, Physical Review A 98, 012315 (2018).
 Gao et al. (2018) J. Gao, L.-F. Qiao, Z.-Q. Jiao, Y.-C. Ma, C.-Q. Hu, R.-J. Ren, A.-L. Yang, H. Tang, M.-H. Yung, and X.-M. Jin, Experimental Machine Learning of Quantum States, Physical Review Letters 120, 240501 (2018).
 Ma and Yung (2018) Y.-C. Ma and M.-H. Yung, Transforming Bell’s inequalities into state classifiers with machine learning, npj Quantum Information 4, 1 (2018).
 Gray et al. (2018) J. Gray, L. Banchi, A. Bayat, and S. Bose, Machine-Learning-Assisted Many-Body Entanglement Measurement, Physical Review Letters 121, 150503 (2018).
 Yang et al. (2019) M. Yang, C.-L. Ren, Y.-C. Ma, Y. Xiao, X.-J. Ye, L.-L. Song, J.-S. Xu, M.-H. Yung, C.-F. Li, and G.-C. Guo, Experimental Simultaneous Learning of Multiple Nonclassical Correlations, Physical Review Letters 123, 190401 (2019).
 Goes et al. (2021) C. B. D. Goes, A. Canabarro, E. I. Duzzioni, and T. O. Maciel, Automated machine learning can classify bound entangled states with tomograms, Quantum Information Processing 20, 99 (2021).
 Ren and Chen (2019) C. Ren and C. Chen, Steerability detection of an arbitrary two-qubit state via machine learning, Physical Review A 100, 022314 (2019).
 Harney et al. (2020) C. Harney, S. Pirandola, A. Ferraro, and M. Paternostro, Entanglement classification via neural network quantum states, New Journal of Physics 22, 045001 (2020).
 Harney et al. (2021) C. Harney, M. Paternostro, and S. Pirandola, Mixed state entanglement classification using artificial neural networks, New Journal of Physics 23, 063033 (2021).
 Carleo and Troyer (2017) G. Carleo and M. Troyer, Solving the quantum many-body problem with artificial neural networks, Science 355, 602 (2017).
 Melko et al. (2019) R. G. Melko, G. Carleo, J. Carrasquilla, and J. I. Cirac, Restricted Boltzmann machines in quantum physics, Nature Physics 15, 887 (2019).
 Yoshioka and Hamazaki (2019) N. Yoshioka and R. Hamazaki, Constructing neural stationary states for open quantum many-body systems, Physical Review B 99, 214306 (2019).
 Hartmann and Carleo (2019) M. J. Hartmann and G. Carleo, Neural-Network Approach to Dissipative Quantum Many-Body Dynamics, Physical Review Letters 122, 250502 (2019).
 Nagy and Savona (2019) A. Nagy and V. Savona, Variational Quantum Monte Carlo Method with a Neural-Network Ansatz for Open Quantum Systems, Physical Review Letters 122, 250501 (2019).
 Vicentini et al. (2019) F. Vicentini, A. Biella, N. Regnault, and C. Ciuti, Variational Neural-Network Ansatz for Steady States in Open Quantum Systems, Physical Review Letters 122, 250503 (2019).
 Goodfellow et al. (2016) I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016).
 Eisert et al. (2003) J. Eisert, K. Audenaert, and M. B. Plenio, Remarks on entanglement measures and nonlocal state distinguishability, Journal of Physics A: Mathematical and General 36, 5605 (2003).
 Aaronson et al. (2013) B. Aaronson, R. L. Franco, G. Compagno, and G. Adesso, Hierarchy and dynamics of trace distance correlations, New Journal of Physics 15, 093022 (2013).
 Paula et al. (2013) F. M. Paula, T. R. de Oliveira, and M. S. Sarandy, Geometric quantum discord through the Schatten 1norm, Physical Review A 87, 064101 (2013).
 Nakano et al. (2013) T. Nakano, M. Piani, and G. Adesso, Negativity of quantumness and its interpretations, Physical Review A 88, 012117 (2013).
 Modi et al. (2010) K. Modi, T. Paterek, W. Son, V. Vedral, and M. Williamson, Unified View of Quantum and Classical Correlations, Physical Review Letters 104, 080501 (2010).
 Bellomo et al. (2012) B. Bellomo, G. L. Giorgi, F. Galve, R. Lo Franco, G. Compagno, and R. Zambrini, Unified view of correlations using the square-norm distance, Physical Review A 85, 032104 (2012).
 Vedral and Plenio (1998) V. Vedral and M. B. Plenio, Entanglement measures and purification procedures, Physical Review A 57, 1619 (1998).
 Witte and Trucks (1999) C. Witte and M. Trucks, A new entanglement measure induced by the Hilbert-Schmidt norm, Physics Letters A 257, 14 (1999).
 Krammer (2009) P. Krammer, Characterizing entanglement with geometric entanglement witnesses, Journal of Physics A: Mathematical and Theoretical 42, 065305 (2009).
 Vidal and Tarrach (1999) G. Vidal and R. Tarrach, Robustness of entanglement, Physical Review A 59, 141 (1999).
 Zyczkowski and Bengtsson (2006) K. Zyczkowski and I. Bengtsson, An Introduction to Quantum Entanglement: a Geometric Approach, arXiv:quant-ph/0606228 (2006).
 Kriváchy et al. (2020) T. Kriváchy, Y. Cai, D. Cavalcanti, A. Tavakoli, N. Gisin, and N. Brunner, A neural network oracle for quantum nonlocality problems in networks, npj Quantum Information 6, 1 (2020).
 Horodecki et al. (2000) P. Horodecki, M. Lewenstein, G. Vidal, and I. Cirac, Operational criterion and constructive checks for the separability of low-rank density matrices, Physical Review A 62, 032310 (2000).
 Horodecki et al. (1999) P. Horodecki, M. Horodecki, and R. Horodecki, Bound Entanglement Can Be Activated, Physical Review Letters 82, 1056 (1999).
 Mintert et al. (2005) F. Mintert, A. R. R. Carvalho, M. Kuś, and A. Buchleitner, Measures and dynamics of entangled states, Physics Reports 415, 207 (2005).
 Dür and Cirac (2000) W. Dür and J. I. Cirac, Classification of multiqubit mixed states: Separability and distillability properties, Physical Review A 61, 042314 (2000).
 Chen and Jiang (2020) X.-Y. Chen and L.-Z. Jiang, Noise tolerance of Dicke states, Physical Review A 101, 012308 (2020).
 Szalay (2011) S. Szalay, Separability criteria for mixed three-qubit states, Physical Review A 83, 062337 (2011).
 Gühne and Seevinck (2010) O. Gühne and M. Seevinck, Separability criteria for genuine multiparticle entanglement, New Journal of Physics 12, 053002 (2010).
 Jungnitsch et al. (2011) B. Jungnitsch, T. Moroder, and O. Gühne, Taming Multiparticle Entanglement, Physical Review Letters 106, 190502 (2011).
 Rana (2013) S. Rana, Negative eigenvalues of partial transposition of arbitrary bipartite states, Physical Review A 87, 054301 (2013).
 Johnston and Kribs (2010) N. Johnston and D. W. Kribs, A family of norms with applications in quantum information theory, Journal of Mathematical Physics 51, 082202 (2010).