Nick von Selzamnick.von-selzam@mpl.mpg.deMax Planck Institute for the Science of Light, 91058 Erlangen, Germany Florian MarquardtMax Planck Institute for the Science of Light, 91058 Erlangen, GermanyDepartment of Physics, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058 Erlangen, Germany

(July 5, 2024)

###### Abstract

Measurement correlations in quantum systems can exhibit non-local behavior, a fundamental aspect of quantum mechanics with applications such as device-independent quantum information processing. However, the explicit construction of local hidden-variable (LHV) models remains an outstanding challenge in the general setting. To address this, we develop an approach that employs gradient-descent algorithms from machine learning to find LHV models which reproduce the statistics of arbitrary measurements for quantum many-body states. In contrast to previous approaches, our method employs a general ansatz, enabling it to discover an LHV model in all cases where the state is local. Therefore, it provides actual estimates for the critical noise levels at which two-qubit Werner states and three-qubit GHZ and W states become non-local. Furthermore, we find evidence suggesting that two-spin subsystems in the ground states of translationally invariant Hamiltonians are local, while bigger subsystems are in general not. Our method now offers a quantitative tool for determining the regimes of non-locality in any given physical context, including scenarios involving non-equilibrium and decoherence.

## I Introduction

Bell famously showed that quantum mechanics predicts states which display non-local measurement correlations [1]. By now this has also been confirmed experimentally [2, 3, 4, 5, 6, 7, 8, 9]. It remains a difficult task to determine whether any given state is local or not, though. Even for the simple example of two-qubit Werner states [10] – mixtures of the Bell singlet with white noise – the critical visibility at which these switch from being local to non-local is not known. While entanglement is necessary for non-locality [10] and pure states are non-local if and only if they are entangled [11, 12], there are entangled mixed states which nevertheless remain local [10]. That is, entanglement and non-locality are inequivalent resources [13, 14].

In this work, we present a general approach to solve the challenge of determining whether a given quantum state is local. Specifically, we use machine learning methods to discover local hidden-variable (LHV) models which reproduce the measurement statistics of quantum many-body states.

For fixed finite measurement options one would obtain a finite set of correlations. Locality of these correlations is fairly well understood and there exist efficient algorithms which determine whether such correlations are local or not (see e.g. [15] and references therein). However, to reliably decide whether a given state is local, we are interested in the significantly more general case of an infinite continuum of measurement options (e.g., all projective measurements). We demonstrate locality by explicitly constructing local models that take into account this whole continuum. Only checking particular Bell inequalities, which is often done in studies of non-locality in quantum many-body systems, is not sufficient to solve this challenge.

To demonstrate the broad applicability of our approach we illustrate it within two different research areas.From the quantum information perspective the noise robustness of non-locality is of interest. Importantly, there is a series of works establishing bounds on the critical visibility of the two-qubit Werner states [2, 10, 16, 17, 18, 19, 20, 21, 22]. With our method we significantly improve on this by providing an actual estimate for the critical visibility. Likewise, we are able to obtain estimates for the critical visibilities of noisy three-qubit GHZ and W states.

From the condensed matter perspective the study of non-locality in quantum many-body systems has long been of interest [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35] (see [36] for a review). It has been demonstrated that for some systems simple Bell inequalities, such as the CHSH inequality, are not violated in the ground state or even during non-equilibrium time evolution [23, 24]. In fact, there exists a ‘monogamy of Bell correlations’relation [37] which implies that the CHSH Bell-inequality is never violated in translation-invariant systems [38, 39]. We can now take the same kind of systems and analyze whether these states are also genuinely local. Moreover, we are able to go beyond two-spin subsystems and study locality of larger subsystems for which even less is known.

We mention some related approaches:In [40] restricted Boltzmann machines and reinforcement learning are used to find maximal violations of a given Bell inequality. In [41] neural networks together with genetic algorithms are used to predict a non-locality quantifier for given finite correlations. In [42] neural networks are used to construct LHV models in network scenarios for finite measurement settings.In [43] a similar task is addressed without neural networks.In [44, 45, 22] locality of states is analyzed using a combination of numerical and analytic methods. There are also some purely analytic results [46]. To the best of our knowledge all these approaches have at least one of the following shortcomings: They only allow for a finite set of measurement settings, they only consider a single Bell inequality or they are specialized to specific states. Additionally, they are often not constructive. With our numerical scheme we are able to overcome all of these restrictions simultaneously.

## II Physical LHV Models

An LHV model describes a particular form of joint measurement statistics. For$N$ parties the probabilities for obtaining measurement results$a=(a_{1},\ldots,a_{N})$ given measurement settings$x=(x_{1},\ldots,x_{N})$ are (Bell-)local if they can be written as [1]

$\displaystyle P^{LHV}(a\,|\,x)=\int_{\Lambda}\text{d}\lambda\,p(\lambda)\prod_%{j=1}^{N}q_{j}(a_{j}|x_{j},\lambda).$ | (1) |

Here,$\lambda\in\Lambda$ is the hidden variable with some distribution$p(\lambda)$. The *measurement rule*$q_{j}(a_{j}|x_{j},\lambda)$ is the probability for the$j^{\text{th}}$ party to measure$a_{j}$ given their input$x_{j}$ and a fixed value of the hidden variable.

Our goal is to match $P^{LHV}$ to the quantum measurement statistics of an arbitrary$N$-particle state$\rho$. In that scenario, we may reformulate (1) in an equivalent but more physical way, where each particle carries its own hidden variable. As we will show below, this allows us to set up a description where the measurement rule is independent of the underlying quantum state $\rho$, as would be plausibly expected. Mathematically, each particle$j$ is now described by a local hidden state$\lambda_{j}$ as indicated in Fig.1 (a). Upon measurement the measurement rule$q_{j}$ describes the outcome as visualized in Fig.1 (b). Information about the state$\rho$ is only encoded in the *hidden-state distribution*$p(\lambda_{1},\ldots,\lambda_{N})$.

Typically, the particles will be of the same type. In this case the measurement in- and outputs are all of the same form. For example, in the case of spin-$1/2$ particles the measurement choice is a normal vector – e.g., the orientation of a Stern-Gerlach magnet – and the possible results are ‘up’or ‘down’. This is the same for every spin. Therefore, the hidden states are now of the same form as well,$\lambda_{j}\in\Lambda$. Moreover, the measurement rule does not depend on the particle anymore – it makes no difference whether we measure the first or the second spin. We have the following equivalent definition of local correlations which incorporates all of this, with$\lambda=(\lambda_{1},\ldots,\lambda_{N})$

$\displaystyle P^{LHV}(a\,|\,x)=\int_{\Lambda^{N}}\text{d}\lambda\,p(\lambda)%\prod_{j=1}^{N}q(a_{j}|x_{j},\lambda_{j})$ | (2) |

(see Appendix F).

In the most general case the inputs$x_{j}$ correspond to arbitrary POVM measurements. The quantum mechanical measurement probabilities depend only on the outcome (they are non-contextual): Suppose the measurement operator representing the outcome for particle$j$ is$a_{j}$, $a=(a_{1},\ldots,a_{N})$ (for projective measurements these would be projection operators). Then

$\displaystyle P^{QM}_{\rho}(a|x)=\Tr(a_{1}\otimes\cdots\otimes a_{N}\cdot\rho).$ | (3) |

For all possible measurements the LHV prediction should match the quantum mechanical one. It follows that we may restrict ourselves to non-contextual measurement rules (see Appendix G). Once we fix a sufficiently general measurement rule it is an optimization task to find the best hidden-state distribution given any quantum state

$\displaystyle\rho\xmapsto{?}p(\lambda).$ | (4) |

Note that there are hidden-state distributions not corresponding to any quantum state (see Appendix E).

## III Optimization

We have set up the structure of our LHV models. Now given an $N$-particle state$\rho$, how do we optimize the hidden-state distribution$p(\lambda_{1},\ldots,\lambda_{N})$ such that the resulting measurement statistics (eq.(2)) are as close as possible to the quantum mechanical ones (eq.(3))? Borrowing some machine learning terminology, we achieve this by minimising a scalar loss function: a distance measure between$P^{QM}$ and$P^{LHV}$. This includes the whole continuum of possible measurements$x$. Since the method should not be tailored to specific states, all measurement choices are equally important. Therefore, we propose to choose a measure of distanceD between$P^{QM}$ and$P^{LHV}$ for fixed measurement choices and then average over all of them. The resulting deviation between LHV model and QM is of the form

$\displaystyle\mathcal{L}({LHV}\,||\,{QM})=\left\langle\text{D}\left(P^{QM}(%\cdot|x),P^{LHV}(\cdot|x)\right)\right\rangle_{x}.$ | (5) |

Since we are dealing with probabilities it is natural to choose the Kullback-Leibler divergence for the distance measureD.Note that there are many alternative choices for the loss function. If one is interested only in specific states or measurements, one may tailor it accordingly. In the case of local states a corresponding LHV model leads to a vanishing loss independent of the specific form. However, for non-local states only approximate LHV models can be found and the necessary compromises in the hidden-state distribution will depend on the chosen loss function.

What are the parameters we are optimizing?The hidden-state distribution$p(\lambda_{1},\ldots,\lambda_{N})$ for$N$ particles and$d$-dimensional hidden states ($\lambda_{j}\in\Lambda=\mathbb{R}^{d}$) is a function on$\mathbb{R}^{N\times d}$.It is infeasible to discretize this space for large$N$ or$d$. Therefore, we adopt a Monte-Carlo like approach: The distribution$p$ is represented by$N_{\text{h}}$ samples of hidden-state tuples$(\lambda_{1},\ldots,\lambda_{N})$.To minimize the loss function we vary the$N_{\text{h}}\cdot N\cdot d$ real parameters of this *hidden-state cloud* (see Fig.1 (c) for a visualization). Note that the average over all measurement settings $x$ in the loss function corresponds to an integral that we can not evaluate. As is common in the field of machine learning we approximate the exact average by repeatedly sampling finite batches of measurement combinations. The optimization is carried out via stochastic gradient descent.

Optimizing a particle cloud and sampling batches of arbitrarily chosen measurements are the two crucial technical innovations which allow us to go to $N$-particle states and continuous measurements.

We note that such methods – requiring efficient automatic differentiation with respect to millions of parameters – have only recently become routinely available through frameworks such as TensorFlow, PyTorch, JAX and due to powerful GPUs.

## IV Spin-1/2 Systems

So far, the setup has been very general. We now make it more concrete by applying it to projective measurements of spin-$1/2$ systems, where we illustrate our approach in several examples. In this case the measurements correspond to asking whether the spin is ‘up’or ‘down’(possible measurement results$a$) along a direction$\hat{n}\in S^{2}$ (measurement choice $x$).The average over measurement settings in the loss function (eq.(5)) becomes an average over directions$(\hat{n}_{1},\ldots,\hat{n}_{N})\in(S^{2})^{N}$. We simply sample each direction uniformly and independently from the surface of a sphere. See Appendix H for the specific slightly simplified loss function we use in practice.

We now describe how to systematically obtain the most general measurement rule for spin-$1/2$ systems. Due to this generality there is no need for a dependence on the state$\rho$.With a hidden-state space$\Lambda$, the measurement rule becomes a function$q:S^{2}\times\Lambda\rightarrow[0,\,1]$. It describes the probability for measuring ‘up’along the direction$\hat{n}$ given a hidden state$\lambda$: $q({\hat{n}},\lambda)\equiv q(`\rm\text{up}^{\prime}|x={\hat{n}},\lambda)$.It is known that deterministic rules,$q\in\{0,1\}$, already capture everything (see e.g. [15]). Therefore, the most general rule can be written as

$\displaystyle q(\hat{n},\lambda)=H(f_{\lambda}(\hat{n})),$ | (6) |

where$H$ is the Heaviside step function and the$f_{\lambda}$ are arbitrary real-valued functions defined on the surface of a sphere. Without loss of generality, we can impose (see Appendix G)

$\displaystyle q(-\hat{n},\lambda)=1-q(\hat{n},\lambda).$ | (7) |

That is, the$f_{\lambda}$ are odd functions. We may expand them into (real) odd spherical harmonics (these were also used in a similar but more handcrafted way in [46] for a specific class of states) . Without loss of generality, the expansion coefficients can be taken to become the components of the hidden-state vector$\vec{\lambda}\in\mathbb{R}^{\mathbb{N}}$. If$\vec{S}(\hat{n})$ is a vector containing all odd spherical harmonics in$\hat{n}\in S^{2}$, this can formally be written as

$\displaystyle q(\hat{n},\vec{\lambda})=H(\vec{S}(\hat{n})\cdot\vec{\lambda}).$ | (8) |

See Appendix I for a generalization to POVM measurements.

For the numerical implementation we modify two things: First, we introduce a cutoff in the spherical harmonics expansion, writing$\vec{S}_{D}(\hat{n})$ for a vector containing all odd spherical harmonics up to degree$D$. For example (up to normalization factors)

$\displaystyle\vec{S}_{3}(\hat{n})\sim(x,y,z,xyz,\ldots,3x^{2}y-y^{3}),\quad(x,%y,z)=\hat{n}.$ | (9) |

This leads to finite-dimensional hidden variables$\vec{\lambda}\in\mathbb{R}^{d}$, with$d=\tfrac{1}{2}(D+1)(D+2)$.Second, for the numerical optimization we require non-vanishing gradients with respect to the hidden-state vectors. To achieve this, we replace the Heaviside function by a sigmoid function, bringing us back to probabilistic measurement rules:

$\displaystyle q(\hat{n},\vec{\lambda})=\sigma(\vec{S}_{D}(\hat{n})\cdot\vec{%\lambda}),\qquad\sigma(x)=\frac{1}{1+e^{-x}}.$ | (10) |

For large norms$\norm*{\vec{\lambda}}\to\infty$ we recover the deterministic case.The choice$D=1$ would lead to

$\displaystyle q(\hat{n},\hat{\lambda})=H(\hat{n}\cdot\hat{\lambda}),\qquad\hat%{\lambda}\in S^{2},$ | (11) |

which was already considered by Bell in his original paper [1]. In Appendix J we show that this ‘Bell’s rule’at least can represent all separable$N$-spin-$1/2$ states.In all of our examples, including complex many-body states, we find that$D=5$ already yields results of high accuracy.

We now apply our framework to illustrative examples of states, both for elementary entangled states and for ground states of strongly correlated quantum many-body systems.

## V Werner States

Werner states were introduced to show the existence of entangled states which are local [10]. For two spins they are mixtures of the Bell singlet with the maximally mixed state

$\displaystyle\rho_{v}=v\outerproduct{\psi}{\psi}+(1-v)\frac{\text{1}}{4},\quad%\ket{\psi}=\frac{1}{\sqrt{2}}\left(\ket{\uparrow\downarrow}-\ket{\downarrow%\uparrow}\right).$ | (12) |

The parameter$v\in[0,\,1]$ is the visibility of the singlet. It is known that these states are separable for$v\leq\frac{1}{3}$ and entangled otherwise. Werner showed that$\rho_{\frac{1}{2}}$ is still a local state for all projective measurements. Up to the present day, the critical visibility$v_{c}$ at which the Werner states cease to be local is unknown. The best bounds so far were obtained in [22]

$\displaystyle 0.6875\leq v_{C}\leq 0.6961.$ | (13) |

Note that the CHSH inequality is only violated for$v>\frac{1}{\sqrt{2}}\approx 0.7071$, giving a comparatively loose upper bound.

We now apply our algorithm. That is, we optimize hidden-state distributions for Werner states with different visibilities, once with Bell’s measurement rule ($D$=1) and once with the general spherical-harmonics rule up to degree five ($D=5$).We monitor the loss during the optimization procedure to ensure convergence (Fig.2, (a)). The optimized deviations as a function of the visibility are shown in Fig.2 (b). While Bell’s rule is only expressive enough to represent Werner states up to$v\approx 1/2$, the more general rule can represent all Werner states which are already known to be local. Moreover, it is able to find LHV models beyond the previously known range, up to a visibility of

$\displaystyle v_{C}\approx 0.691.$ | (14) |

This, for the first time, provides an actual numerical estimate for the critical visibility, as opposed to merely lower and upper bounds. In Appendix C we also obtain estimates for the critical visibilities of noisy three-qubit GHZ and W states.

In order to gauge the precision of these results, we have performed the following additional steps (see Appendix A):We explain the particular non-zero value of the loss in the local region. For single spin states as well as planar measurements of Werner states we validate that our method agrees with the analytic results. And finally, we observe convergence of the results (e.g., the estimate for the critical visibility) with increasing hidden-state cloud size$N_{\text{h}}$ and increasing maximal degree$D$ in the spherical-harmonics expansion.

We note that the algorithm itself can likely be improved. In particular, there are more sophisticated ways to represent high-dimensional probability distributions such as the hidden-state distribution, for example via generative neural networks. In Appendix B we give details on the complexity and run-time of the algorithm in terms of the hyper parameters.

## VI XXZ Models

We now turn to quantum many-body systems. In this setting even less is known about locality and researchers have studied mostly the violation of individual Bell inequalities. Local models for$N$-spin states have been constructed only in very special cases (see e.g. [47] or [14]).

As an example we analyze locality in the XXZ model, first for two-spin subsystems and then for$N$-spin subsystems up to$N=6$. The Hamiltonian is

$\displaystyle H_{XXZ}=\sum_{\langle j,k\rangle}\left(S^{x}_{j}S^{x}_{k}+S^{y}_%{j}S^{y}_{k}+\Delta S^{z}_{j}S^{z}_{k}\right),$ | (15) |

where$S_{j}^{\alpha}$ is the spin operator in direction$\alpha\in\{x,y,z\}$ at site$j$ and$\Delta$ is the anisotropy parameter.

A ground state respects the symmetries of the Hamiltonian (except in cases of degeneracies).This allows to parameterize all its possible two-spin subsystem states by the correlators$\langle\sigma_{x}\otimes\sigma_{x}\rangle$ and$\langle\sigma_{z}\otimes\sigma_{z}\rangle$ [48]. As a function of these two parameters it is known which correlations correspond to separable or entangled states [48]. We also know for which parameters the corresponding state violates the CHSH Bell-inequality [24].However, it is an open problem to decide which ones are local and which ones are not. We optimize the hidden-state distribution to find LHV models for the whole range of parameters. The optimized deviations are shown in Figure3.We find that the CHSH Bell-inequality is almost tight in this case: Most states not violating it can be represented by an LHV model. In particular, all states which are actually realized by infinite XXZ lattice models (that is, those which not only have the correct symmetries but are also partial traces of infinite translation-invariant states) are local. In Appendix D we provide more support for this claim.

Our numerical approach allows us to construct LHV models for more than two spins, which is crucial to acquire a complete understanding of the locality properties of quantum many-body states. To study this regime in an example, we numerically obtain the ground state of a finite one-dimensional XXZ chain with$22$spins and optimize LHV models for sub-chains of$N=2,\ldots,6$ neighboring spins. This is shown in Fig.4. The two-spin subsystems are local for all choices of the anisotropy parameter$\Delta$, consistent with the results above. The three-spin subsystems are local for almost all$\Delta$. For$N\geq 4$ the correlations are essentially always non-local unless the full ground state is separable.In Appendix D we also show how the$N$-spin LHV models perform when restricted to a two-spin subsystem.

## VII Conclusion

We have introduced an algorithm for the automatic construction of LHV models which reproduce the measurement statistics of quantum many-body states. Importantly, this includes arbitrary continuous measurement settings. We demonstrated the effectiveness of the algorithm by applying it to projective measurements of spin-$1/2$ systems. For two-qubit Werner states as well as noisy three-qubit GHZ and W states we were able for the first time to present estimates for their critical visibilities. Furthermore, we established that two-spin subsystems of the ground states in nearest-neighbor XXZ lattice models are always local. On the other hand, $N$-spin sub-chains in a one dimensional chain appear to be non-local for$N\geq 4$, in particular around the gapless-to-anti-ferromagnetic phase transition.

We speculate that these properties are not specific to XXZ models. As mentioned before, it is already known that two-spin subsystems in the ground state of translationally invariant Hamiltonians never violate the CHSH inequality. Based on our results one may expect them to be genuinely local as well. On the other hand, we observe that larger subsystems are in general not local, unless the ground state is separable.

While the method is numerical and therefore approximate in nature, it can be made arbitrarily accurate in a systematic way. The local measurement rule becomes more and more general as one increases the maximal degree$D$ of the spherical harmonics; increasing the hidden-state cloud size$N_{\text{h}}$ allows us to represent ever more complex hidden-state distributions accurately. Taking both limits together we can represent arbitrary LHV models.

Our algorithm readily enables the study of locality in all kinds of situations. For example, one could calculate the critical temperature at which thermal states of many-body systems become local. Also, when comparing against analytical methods, we are not restricted to ‘nice’states such as those with a high degree of symmetry. Hence, it is now straightforward to obtain the critical visibilities for more complicated states or noise channels more elaborate than the fully depolarizing noise studied for Werner states. Finally, in special situations the CHSH inequality is not violated even during non-equilibrium time evolution. With the help of the method developed here, it is now possible to answer the question whether the corresponding states are also local.

## VIII Data and Code Availability

The implementation of the algorithm in Python (we use JAX) along with an example Jupyter notebook demonstrating its application is available on GitHub [49]. There we also provide the figures and corresponding data files.

## References

- Bell [1964]J.S.Bell,On the Einstein PodolskyRosen Paradox,Physics Physique Fizika1,195–200 (1964).
- Clauser
*etal.*[1969]J.F.Clauser, M.A.Horne,A.Shimony,andR.A.Holt,Proposed Experiment to Test Local Hidden-VariableTheories,Physical Review Letters23,880–884 (1969). - Aspect
*etal.*[1982]A.Aspect, J.Dalibard,andG.Roger,Experimental Test of Bell’s Inequalities UsingTime-Varying Analyzers,Physical Review Letters49,1804–1807 (1982). - Weihs
*etal.*[1998]G.Weihs, T.Jennewein,C.Simon, H.Weinfurter,andA.Zeilinger,Violation of Bell’s Inequality under Strict EinsteinLocality Conditions,Physical Review Letters81,5039–5043 (1998). - Christensen
*etal.*[2013]B.G.Christensen, K.T.McCusker, J.B.Altepeter, B.Calkins,T.Gerrits, A.E.Lita, A.Miller, L.K.Shalm, Y.Zhang, S.W.Nam, N.Brunner, C.C.W.Lim,N.Gisin,andP.G.Kwiat,Detection-Loophole-Free Test of Quantum Nonlocality, andApplications,Physical Review Letters111,130406 (2013). - Hensen
*etal.*[2015]B.Hensen, H.Bernien,A.E.Dréau, A.Reiserer, N.Kalb, M.S.Blok, J.Ruitenberg, R.F.L.Vermeulen, R.N.Schouten, C.Abellán, W.Amaya,V.Pruneri, M.W.Mitchell, M.Markham, D.J.Twitchen, D.Elkouss, S.Wehner, T.H.Taminiau,andR.Hanson,Loophole-free Bell inequality violation using electron spins separated by1.3 kilometres,Nature526,682–686 (2015). - Giustina
*etal.*[2015]M.Giustina, M.A.M.Versteegh, S.Wengerowsky, J.Handsteiner, A.Hochrainer, K.Phelan,F.Steinlechner, J.Kofler, J.-A.Larsson, C.Abellán, W.Amaya, V.Pruneri, M.W.Mitchell, J.Beyer, T.Gerrits,A.E.Lita, L.K.Shalm, S.W.Nam, T.Scheidl, R.Ursin, B.Wittmann,andA.Zeilinger,Significant-Loophole-Free Test of Bell’s Theorem with EntangledPhotons,Physical Review Letters115,250401 (2015). - Shalm
*etal.*[2015]L.K.Shalm, E.Meyer-Scott,B.G.Christensen,P.Bierhorst, M.A.Wayne, M.J.Stevens, T.Gerrits, S.Glancy, D.R.Hamel, M.S.Allman, K.J.Coakley,S.D.Dyer, C.Hodge, A.E.Lita, V.B.Verma, C.Lambrocco, E.Tortorici, A.L.Migdall, Y.Zhang, D.R.Kumor, W.H.Farr, F.Marsili, M.D.Shaw,J.A.Stern, C.Abellán, W.Amaya, V.Pruneri, T.Jennewein, M.W.Mitchell, P.G.Kwiat, J.C.Bienfang, R.P.Mirin,E.Knill,andS.W.Nam,Strong Loophole-Free Test of Local Realism,Physical Review Letters115,250402 (2015). - Rosenfeld
*etal.*[2017]W.Rosenfeld, D.Burchardt, R.Garthoff,K.Redeker, N.Ortegel, M.Rau,andH.Weinfurter,Event-Ready Bell Test Using Entangled AtomsSimultaneously Closing Detection and Locality Loopholes,Physical Review Letters119,010402 (2017). - Werner [1989]R.F.Werner,Quantum states withEinstein-Podolsky-Rosen correlations admitting a hidden-variable model,Physical Review A40,4277–4281 (1989).
- PopescuandRohrlich [1992]S.PopescuandD.Rohrlich,Generic quantumnonlocality,Physics Letters A166,293–297 (1992).
- GisinandPeres [1992]N.GisinandA.Peres,Maximal violation of Bell’sinequality for arbitrarily large spin,Physics Letters A162,15–17 (1992).
- Augusiak
*etal.*[2015]R.Augusiak, M.Demianowicz, J.Tura,andA.Acín,Entanglement and Nonlocality are Inequivalentfor Any Number of Parties,Physical Review Letters115,030404 (2015). - Bowles
*etal.*[2016]J.Bowles, J.Francfort,M.Fillettaz, F.Hirsch,andN.Brunner,Genuinely Multipartite Entangled Quantum States with Fully LocalHidden Variable Models and Hidden Multipartite Nonlocality,Physical Review Letters116,130401 (2016). - Brunner
*etal.*[2014]N.Brunner, D.Cavalcanti,S.Pironio, V.Scarani,andS.Wehner,Bell nonlocality,Reviews of Modern Physics86,419–478 (2014). - Acín
*etal.*[2006]A.Acín, N.Gisin,andB.Toner,Grothendieck’s constant and local models fornoisy entangled quantum states,Phys. Rev. A73,062105 (2006). - Vértesi [2008]T.Vértesi,More efficient Bellinequalities for Werner states,Phys. Rev. A78,032112 (2008).
- Hua
*etal.*[2015]B.Hua, M.Li, T.Zhang, C.Zhou, X.Li-Jost,andS.-M.Fei,TowardsGrothendieck constants and LHV models in quantum mechanics,Journal of Physics A: Mathematical and Theoretical48,065302 (2015). - Brierley
*etal.*[2016]S.Brierley, M.Navascues,andT.Vertesi,Convex separation from convexoptimization for large-scale problems (2016),arXiv:1609.05011 [quant-ph] . - Diviánszky
*etal.*[2017]P.Diviánszky, E.Bene,andT.Vértesi,Qutrit witness from the Grothendieckconstant of order four,Phys. Rev. A96,012113 (2017). - Hirsch
*etal.*[2017]F.Hirsch, M.T.Quintino, T.Vértesi, M.Navascués,andN.Brunner,Better local hiddenvariable models for two-qubit Werner states and an upper bound on theGrothendieck constant $K_{G}(3)$,Quantum1,3 (2017). - Designolle
*etal.*[2023]S.Designolle, G.Iommazzo, M.Besançon,S.Knebel, P.Gelß,andS.Pokutta,Improved local models and new Bell inequalities via Frank-Wolfealgorithms,Physical Review Research5,043059 (2023). - BatleandCasas [2010]J.BatleandM.Casas,Nonlocality and entanglement in the$\mathit{XY}$ model,Physical Review A82,062101 (2010).
- JustinoanddeOliveira [2012]L.JustinoandT.R.deOliveira,Bell inequalitiesand entanglement at quantum phase transitions in the $\mathit{XXZ}$ model,Physical Review A85,052128 (2012).
- Sun
*etal.*[2015]Z.-Y.Sun, B.Guo,andH.-L.Huang,Global multipartite nonlocality and Bell-typeinequalities in infinite-size quantum spin chains,Physical Review A92,022120 (2015). - Batle
*etal.*[2017]J.Batle, M.Naseri,M.Ghoranneviss, A.Farouk, M.Alkhambashi,andM.Elhoseny,Shareability of correlations in multiqubit states: Optimization ofnonlocal monogamy inequalities,Physical Review A95,032123 (2017). - Wagner
*etal.*[2017]S.Wagner, R.Schmied,M.Fadel, P.Treutlein, N.Sangouard,andJ.-D.Bancal,Bell Correlations in a Many-Body System with FiniteStatistics,Physical Review Letters119,170403 (2017). - Wang
*etal.*[2017]Z.Wang, S.Singh,andM.Navascués,Entanglement and Nonlocality in Infinite 1DSystems,Physical Review Letters118,230401 (2017). - Getelina
*etal.*[2018]J.C.Getelina, T.R.deOliveira,andJ.A.Hoyos,Violation of the Bellinequality in quantum critical random spin-1/2 chains,Physics Letters A382,2799–2804 (2018). - Bao
*etal.*[2020]J.Bao, B.Guo, H.-G.Cheng, M.Zhou, J.Fu, Y.-C.Deng,andZ.-Y.Sun,Multipartitenonlocality in the Lipkin-Meshkov-Glick model,Physical Review A101,012110 (2020). - Cheng
*etal.*[2020]H.-G.Cheng, M.Li, Y.-Y.Wu, M.Wang, D.Zhang, J.Bao, B.Guo,andZ.-Y.Sun,Magnetic-field-induced oscillation ofmultipartite nonlocality in spin ladders,Physical Review A101,052116 (2020). - Sun
*etal.*[2021]Z.-Y.Sun, M.Li, L.-H.Sheng,andB.Guo,Multipartite nonlocality in one-dimensional quantum spinchains at finite temperatures,Physical Review A103,052205 (2021). - Vieira
*etal.*[2021]C.H.S.Vieira, C.Duarte, R.C.Drumond,andM.TerraCunha,Bell Non-Locality inMany-Body Quantum Systems with Exponential Decay of Correlations,Brazilian Journal of Physics51,1603–1616 (2021). - Yang
*etal.*[2022]Y.-H.Yang, X.Yang,andM.-X.Luo,General spin systems without genuinelymultipartite nonlocality,The European Physical Journal D76,61 (2022). - Sun
*etal.*[2023]Z.-Y.Sun, L.-F.Liu,H.-X.Wen, S.Qu, F.-Q.Xu,andB.Guo,Multipartite nonlocality and symmetry breaking in one-dimensionalquantum chains,Physical Review A107,052216 (2023). - ChiaraandSanpera [2018]G.D.ChiaraandA.Sanpera,Genuine quantumcorrelations in quantum many-body systems: a review of recent progress,Reports on Progress in Physics81,074002 (2018).
- TonerandVerstraete [2006]B.TonerandF.Verstraete,Monogamy of Bellcorrelations and Tsirelson’s bound,arXiv10.48550/arXiv.quant-ph/0611001 (2006),arXiv:quant-ph/0611001.
- Oliveira
*etal.*[2013]T.R.d.Oliveira, A.Saguia,andM.S.Sarandy,Nonviolation of Bell’sinequality in translation invariant systems,Europhysics Letters100,60004 (2013). - Sun
*etal.*[2013]Z.-Y.Sun, Y.-Y.Wu,J.Xu, H.-L.Huang, B.-J.Chen,andB.Wang,Violation of Bell inequality in perfect translation-invariantsystems,Physical Review A88,054101 (2013). - Deng [2018]D.-L.Deng,Machine Learning Detectionof Bell Nonlocality in Quantum Many-Body Systems,Physical Review Letters120,240402 (2018).
- Canabarro
*etal.*[2019]A.Canabarro, S.Brito,andR.Chaves,Machine Learning NonlocalCorrelations,Physical Review Letters122,200401 (2019). - Kriváchy
*etal.*[2020]T.Kriváchy, Y.Cai,D.Cavalcanti, A.Tavakoli, N.Gisin,andN.Brunner,A neural network oracle for quantum nonlocality problems innetworks,npj Quantum Information6,1–7 (2020). - daSilvaandParisio [2023]J.M.daSilvaandF.Parisio,Numerically assisteddetermination of local models in network scenarios,Physical Review A108,052602 (2023).
- Cavalcanti
*etal.*[2016]D.Cavalcanti, L.Guerini,R.Rabelo,andP.Skrzypczyk,General Method for Constructing Local Hidden VariableModels for Entangled Quantum States,Physical Review Letters117,190401 (2016). - Hirsch
*etal.*[2016]F.Hirsch, M.T.Quintino, T.Vértesi,M.F.Pusey,andN.Brunner,Algorithmic Construction of Local Hidden VariableModels for Entangled Quantum States,Physical Review Letters117,190402 (2016). - Toner [2007]B.F.Toner,
*Quantifying quantum nonlocality*,Phdthesis,California Institute of Technology (2007). - TóthandAcín [2006]G.TóthandA.Acín,Genuine tripartite entangled stateswith a local hidden-variable model,Physical Review A74,030306 (2006).
- VerstraeteandCirac [2006]F.VerstraeteandJ.I.Cirac,Matrix product statesrepresent ground states faithfully,Physical Review B73,094423 (2006).
- von Selzam [2024]N.vonSelzam,AutoLHVs,https://github.com/Nick-von-Selzam/AutoLHVs (2024).

## Appendix A Appendix

## Appendix B A. Accuracy of the Method

The presented approach is numerical and thereby approximate to some degree. We evaluate its accuracy and justify how we interpret the results.

First, we explain the particular value of the optimized loss for local states. Additionally to the examples considered in the main text we optimize LHV models for product states and noisy GHZ states of different numbers of spins. We show the optimized losses in Fig.5 (a). As in all examples of local states studied so far we consistently get values around$\mathcal{L}\approx 2\cdot 10^{-8}$ to$3\cdot 10^{-8}$. We can explain this particular value from the combination of two things: We work with single precision floats (‘float32’) since this is efficient computationally. And we chose the Kullback-Leibler divergence as the loss function. If one only optimizes a single scalar to match a target value using gradient descent with the KL-divergence as loss function and with single precision floats, the loss will still plateau at roughly the same value. Thus we are assured the observed behaviour is not an artifact of the rest of the algorithm, in particular not of the way we represent the hidden-state distribution. Note that the precision of the mantissa in float32 numbers is of the same order ($2^{-24}\approx 6\cdot 10^{-8}$). Changing the datatype (e.g. to doubles or shorts) or the loss function (e.g. to a square difference loss) will in general change this value.

Next, we consider two situations where the result is known.Using Bell’s measurement rule we have shown that there exist hidden-state distributions reproducing the single spin measurement statistics exactly and we know what these distributions are (see section J below). We find that they are correctly found by our optimization scheme: As a demonstration consider a single spin in either the state$\ket{\uparrow}$ or the state$\rho(\frac{1}{2},0,0)$. The optimized hidden-state clouds are shown in Fig.5 (b). Visually, the optimized distributions coincide well with the exact results and the optimized deviation is again roughly$2\cdot 10^{-8}$ to$3\cdot 10^{-8}$.

If one restricts to measurements in a plane, the critical visibility for Werner states is known exactly [46]

$\displaystyle v_{C}^{\text{(2D)}}=\frac{1}{\sqrt{2}}.$ | (16) |

To analyze this scenario, we optimize LHV models for the Werner states while only sampling measurement directions in the$xy$-plane. The result is shown in Fig.6 (a). We observe the transition to increasing deviations at the correct critical visibility. This gives us confidence that our previous estimate for the critical visibility in case of all projective measurements is accurate as well.

Finally we analyze the dependence of the solution on the size of the hidden-state cloud$N_{\text{h}}$ (Fig.6 (b)). Larger clouds can represent more complex distributions and approximate continuous distributions more accurately. We optimize LHV models for the Werner states (now again for all projective measurements) using different$N_{\text{h}}$. As expected, we see that the optimized deviations converge to some lower bound as we increase$N_{\text{h}}$. Already for small clouds we reach$\mathcal{L}\approx 2.5\cdot 10^{-8}$ for small visibilities. For larger clouds the region of small loss grows. This corresponds to a convergence of the estimate for$v_{C}$ to some point in the previously unknown region. Increasing the maximal degree in$D$ in the spherical-harmonics expansion leads to similar behaviour.

### B.1 B. Glassy Optimization Landscape and Hyper Parameters

We already mentioned in the main text that convergence of the hidden-state distribution takes longer close to the critical visibility of Werner states. We observe similar behaviour for different kinds of states always at the boundary between local and non-local states. It seems to be easy for our class of LHV models to represent very local states, such as separable ones. Conversely, for highly entangled, very non-local states there is not much to optimize (the approximation remains bad). Hence in these cases convergence is fast. In between, close to the boundary, the optimization landscape seems to be very complicated or ‘glassy’in the sense that the optimal hidden-state distribution depends on, e.g., the visibility in a non-continuous way. This can be seen by optimizing the hidden-state cloud for some visibility$v$ and using the result as initialization for$v^{\prime}=v\pm\Delta v$. We observe that close to$v_{C}$ the results obtained in this manner for$v^{\prime}$ are significantly worse compared to optimizing with unbiased initialization (i.e. fresh initialization for each value of the visibility). In detail, we individually optimize LHV models for a range of Werner states. In a second step we ‘interpolate’in between each pair of neighboring visibilities by always using the result for the closest visibility as initialization. This is shown in Fig.7. We interpret these observations as meaning that the loss function has many local minima whose relative depths depend on the visibility, thereby constantly changing the global minimum in a non-continuous way.

The hyper-parameters which change the run-time of the algorithm are (together with the typical values we used)

- •
Number of gradient descent steps$N_{\text{steps}}=10^{4}\ldots 10^{7}$

- •
Number of spins$N=1\ldots 6$

- •
hidden-state cloud size$N_{\text{h}}=2^{10}\ldots 2^{16}$

- •
Maximal degree$D=1\ldots 5$ in the spherical-harmonics expansion of the measurement rule

- •
Dimension of the hidden states (per spin)$d=\frac{1}{2}(D+1)(D+2)$

- •
The batch size$N_{\text{batch}}=2^{8}\ldots 2^{10}$, that is the number of combinations of measurement directions sampled per gradient descent step.

For completeness we mention the learning ratelr with typical values$\text{lr}/N_{\text{h}}=5\cdot 10^{-6}\ldots 5\cdot 10^{-5}$. To give a specific example for the run-time, with hyper parameters$N_{\text{steps}}=10^{4},\ N=2,\ N_{\text{h}}=2^{14},\ D=5,\ d=21,\ N_{\text{%batch}}=2^{10}$ sampling all projective measurements for a Werner state, the run-time on a single NVIDIA Quadro RTX 6000 GPU is about$16$ seconds ($18$ seconds when running the jitted function for the first time). Optimizing multiple states in parallel allows for an additional speed up. Generating the data for Fig.2 takes a few days with most of the time spent on the visibilities close to$v_{C}$.

### B.2 C. GHZ and W States

Werner states are noisy Bell singlets. Analogously, we can consider noisy three-qubit GHZ and W states

$\displaystyle\ket{GHZ}=\frac{1}{\sqrt{2}}\left(\ket{\uparrow\uparrow\uparrow}+%\ket{\downarrow\downarrow\downarrow}\right),\qquad\ket{W}=\frac{1}{\sqrt{3}}%\left(\ket{\downarrow\uparrow\uparrow}+\ket{\uparrow\downarrow\uparrow}+\ket{%\uparrow\uparrow\downarrow}\right)$ | (17) |

Similar to the two-qubit Werner states, only bounds for their critical visibilities are known [22]. We optimize LHV models for both of them and again find numerical estimates for their critical visibilities (see Fig.8)

$\displaystyle v_{C}^{GHZ}\approx 0.485,\qquad v_{C}^{W}\approx 0.518.$ | (18) |

### B.3 D. More on XXZ Groundstates

In the main text we claimed that for the two-qubit states respecting the symmetries of the XXZ Hamiltonian the CHSH inequality is almost tight, except close to the Werner states. To see this more clearly, we optimize LHV models along the boundary, that is along the dashed and the red line in Fig.3. This is shown in Fig.9 (a). We can clearly see that along the physical boundary the deviation stays small throughout. Along the CHSH boundary the deviation is small except close to the Werner states where we find a local maximum.

So far we spoke of ‘all projective measurements’for the spin-$1/2$ systems. In practice, we only considered the non-trivial ones, that is excluding the option of not measuring a spin. However, if all non-trivial measurements are correctly captured by the LHV model we automatically get the correct result for the trivial measurements as well. To check how well this works in practice we take the optimized models for the$N$-spin sub-chains of the XXZ ground state in one dimension (Fig.4) and test them on just two spins (Fig.9 (b)). We observe that our expectation is essentially correct: In case the$N$-spin subsystem is local and we found an accurate LHV model, it also performs well on the two spin subsystem. In case the$N$-spin subsystem is non-local and the optimized LHV model is bad it also performs poorly on the subsystem.

### B.4 E. Remark on Observables

Suppose a model of the form in eq.(2) describes the measurement statistics of a given $N$-particle quantum state$\rho$ exactly for all projective measurements. Consider any observable$A$ for the$N$ particles. It corresponds to a hermitian operator which can be decomposed as a linear combination of products of single particle projection operators

$\displaystyle A=\sum_{k}c_{k}P^{(k)}_{1}\otimes\cdots\otimes P^{(k)}_{N}.$ | (19) |

Therefore, the quantum mechanical expectation value can be calculated using the LHV model

$\displaystyle\langle A\rangle$ | $\displaystyle=\sum_{k}c_{k}\Tr(P^{(k)}_{1}\otimes\cdots\otimes P^{(k)}_{N}%\cdot\rho)=\sum_{k}c_{k}P^{LHV}(P^{(k)}_{1},\ldots,P^{(k)}_{N}\,|\,\rho).$ |

The decomposition of$A$ is not unique. There are infinitely many choices for such representations. As long as all measurement statistics are exactly reproduced by the LHV model, this is not a problem. However, in practice this will only hold approximately when we represent non-local states using an LHV model (also, strictly speaking, even in the local region due to small remaining approximation errors). In this situation the LHV statistics generically do not represent a quantum state. Then, the expectation value predicted by the LHV model depends on the decomposition of the observable.The simplest way to see that there are LHV models which are incompatible with quantum mechanics is for a single qubit:For any qubit state and two measurement directions$\hat{n}_{1},\hat{n}_{2}\in S^{2}$ the quantum mechanical probabilities for obtaining ‘up’are not independent in the sense that

$\displaystyle P^{QM}(\hat{n}_{1})+P^{QM}(\hat{n}_{2})\leq 1+\absolutevalue{%\cos(\theta/2)},\qquad\hat{n}_{1}\cdot\hat{n}_{2}=\cos(\theta).$ | (20) |

###### Proof.

Let the state be$\rho=\rho(\vec{r})$, then

$\displaystyle P^{QM}(\hat{n}_{1})+P^{QM}(\hat{n}_{2})$ | $\displaystyle=\frac{1}{2}\left(1+\hat{n}_{1}\cdot\vec{r}\right)+\frac{1}{2}%\left(1+\hat{n}_{2}\cdot\vec{r}\right)=1+\frac{1}{2}(\hat{n}_{1}+\hat{n}_{2})%\cdot\vec{r}$ | (21) | ||

$\displaystyle\leq 1+\frac{1}{2}\norm{\hat{n}_{1}+\hat{n}_{2}}\norm{\vec{r}}%\leq 1+\frac{1}{2}\norm{\hat{n}_{1}+\hat{n}_{2}}$ | (22) | |||

$\displaystyle=1+\frac{1}{2}\sqrt{\norm{\hat{n}_{1}}^{2}+\norm{\hat{n}_{2}}^{2}%+2\hat{n}_{1}\cdot\hat{n}_{2}}=1+\frac{1}{\sqrt{2}}\sqrt{1+\hat{n}_{1}\cdot%\hat{n}_{2}}$ | (23) | |||

$\displaystyle=1+\absolutevalue{\cos(\theta/2)}.$ | (24) |

∎

For distinct measurement directions the sum of probabilities is strictly less than two. However, our class of LHV models contains hidden-state distributions that for any given pair of measurement directions with relative angle less than$\pi$ allow for this sum to be equal to two. This means there are LHV models whose correlations do not correspond to any state. One could try to constrain the hidden-state distributions, however it is not clear how to do this.

### B.5 F. Different Definitions of Bell-Locality

The two presented ways to construct Bell-local correlations are equivalent in the sense that correlations can be written in the form of eq.(1) if and only if they can be written in the form of eq.(2).

###### Proof.

Suppose$P(a|x)$ can be written in the form of eq.(1). Define the new hidden-state space

$\displaystyle\Lambda^{\prime}=\Lambda\times\{1,\ldots,N\}\ni\lambda^{\prime}=(%\lambda,n).$ | (25) |

Set the hidden-state distribution to be

$\displaystyle p^{\prime}(\lambda_{1}^{\prime},\ldots,\lambda_{N}^{\prime})=%\delta(\lambda_{1}-\lambda_{2})\cdots\delta(\lambda_{1}-\lambda_{N})\delta_{n_%{1},1}\cdots\delta_{n_{N},N}p(\lambda_{1})$ | (26) |

and the local measurement rule to

$\displaystyle q^{\prime}(a|x,(\lambda,n))=q_{n}(a|x,\lambda).$ | (27) |

Then

$\displaystyle P^{\prime}(a|x)$ | $\displaystyle=\sum_{n_{1}=1}^{N}\int_{\Lambda}\text{d}\lambda_{1}\cdots\sum_{n%_{N}=1}^{N}\int_{\Lambda}\text{d}\lambda_{N}\,\delta(\lambda_{1}-\lambda_{2})%\cdots\delta(\lambda_{1}-\lambda_{N})\delta_{n_{1},1}\cdots\delta_{n_{N},N}p(%\lambda_{1})$ | (28) | ||

$\displaystyle\hskip 150.0pt\times q_{n_{1}}(a_{1}|x_{1},\lambda_{1})\cdots q_{%n_{N}}(a_{N}|x_{N},\lambda_{N})$ | (29) | |||

$\displaystyle=\int\text{d}\lambda_{1}\,p(\lambda_{1})q_{1}(a_{1}|x_{1},\lambda%_{1})\cdots q_{N}(a_{N}|x_{N},\lambda_{1})=P(a|x).$ | (30) |

Now, suppose$P(a|x)$ can be written in the form of eq.(2). Define the new hidden-state space

$\displaystyle\Lambda^{\prime}=\Lambda^{N}\ni\lambda^{\prime}=(\lambda_{1},%\ldots,\lambda_{N})$ | (31) |

with hidden-state distribution

$\displaystyle p^{\prime}(\lambda^{\prime})=p(\lambda_{1},\ldots,\lambda_{N})$ | (32) |

and local measurement rules

$\displaystyle q_{j}(a|x,\lambda^{\prime})=q(a|x,\lambda_{j}).$ | (33) |

Then

$\displaystyle P^{\prime}(a|x)$ | $\displaystyle=\int_{\Lambda^{N}}\text{d}\lambda^{\prime}\,p^{\prime}(\lambda^{%\prime})q_{1}(a_{1}|x_{1},\lambda^{\prime})\cdots q_{N}(a_{N}|x_{N},\lambda^{%\prime})$ | (34) | ||

$\displaystyle=\int_{\Lambda}\text{d}\lambda_{1}\cdots\int_{\Lambda}\text{d}%\lambda_{N}\,p(\lambda_{1},\ldots,\lambda_{N})q(a_{1}|x_{1},\lambda_{1})\cdotsq%(a_{N}|x_{N},\lambda_{N})=P(a|x).$ | (35) |

∎

### B.6 G. Non-Contextual Measurement Rules

Besides the hidden-state vector$\lambda$ the local measurement rule depends on the measurement input$x$ and the measurement outcome$a$. A measurement corresponds to a POVM (for simplicity we stick to discrete ones here)

$\displaystyle x=\{M_{k}\}_{k=1}^{K},\quad 0\leq M_{k}\leq\text{1},\quad\sum_{k%=1}^{K}M_{k}=\text{1},\quad K\in\mathbb{N}\cup\{\infty\}.$ | (36) |

And the possible outputs are the POVM elements:$a=M_{k}\in x$.In this sense the local measurement rule is contextual: The probability for a specific outcome can depend on the possible alternative outcomes defined by the measurement setup. However, without loss of generality we can assume non-contextual measurement rules. To see this suppose an$N$-particle state$\rho$ admits an LHV model

$\displaystyle P^{QM}(a|x)=\int_{\Lambda^{N}}\text{d}\lambda p(\lambda)\prod_{j%=1}^{N}q(a_{j}|x_{j},\lambda_{j})\quad\forall x=(x_{1},\ldots,x_{N}),\ \foralla%=(a_{1},\ldots,a_{N}),a_{j}\in x_{j}.$ | (37) |

Since the quantum mechanical measurement probabilities are non-contextual

$\displaystyle P^{QM}(a|x)=\Tr(a_{1}\otimes\cdots\otimes a_{N}\cdot\rho)=P^{QM}%(a)$ | (38) |

we can simply choose an arbitrary valid context. For instance

$\displaystyle\overline{q}(a,\lambda)\equiv q(a|\{a,\text{1}-a\},\lambda)\ %\Rightarrow\ P^{QM}(a)=\int_{\Lambda^{N}}\text{d}\lambda p(\lambda)\prod_{j=1}%^{N}\overline{q}(a_{j},\lambda_{j})\quad\forall a.$ | (39) |

That is,$\rho$ admits a non-contextual LHV model and it is sufficient to look for such models to determine whether a given state is local or not. If one is interested in a subset of all POVM measurements the same argument applies as long as one can fix a context for all possible measurement outcomes. For spin-$1/2$ systems this shows that we did not loose anything by working (only) with$q(\hat{n},\lambda)=q(\uparrow|\hat{n},\lambda)$.

With the choice above (which is in particular valid for projective measurements) we have the normalization

$\displaystyle\overline{q}(a,\lambda)+\overline{q}(1-a,\lambda)=q(a,\{a,\text{1%}-a\},\lambda)+q(1-a,\{a,\text{1}-a\},\lambda)=1$ | (40) |

which translates toeq.(7) under the change of notation $a=P_{\hat{n}}\mapsto\hat{n}$, $\text{1}-P_{\hat{n}}=P_{-\hat{n}}\mapsto-\hat{n}$ ($P_{\hat{n}}$ is the projection onto the ‘up’eigenstate of$\hat{n}\cdot\vec{\sigma}$). We note that this is slightly more than ‘just normalization’: It implicitly assumes that measuring ‘up’along direction$\hat{n}$ is the same as measuring down in direction$-\hat{n}$. It is not obvious why this would hold for individual instances of the hidden variable. However, the argument above shows that we can assume so.

### B.7 H. Efficient Loss Function

Our initially proposed loss function for spin-$1/2$ systems would be the average of the KL-divergence between$P^{QM}$ and$P^{LHV}$ over all possible combinations of measurement directions$x=(\hat{n}_{1},\ldots,\hat{n}_{N})$. That is, the average of

$\displaystyle\text{D}_{KL}\left(P^{QM}(\cdot|x)||P^{LHV}(\cdot|x)\right)$ | $\displaystyle=\sum_{s_{1},\ldots,s_{N}\in\{\pm 1\}}P^{QM}(s_{1}\hat{n}_{1},%\ldots,s_{N}\hat{n}_{N})\ln(\frac{P^{QM}(s_{1}\hat{n}_{1},\ldots,s_{N}\hat{n}_%{N})}{P^{LHV}(s_{1}\hat{n}_{1},\ldots,s_{N}\hat{n}_{N})})$ | (41) |

This is an exponentially large sum in the number of spins. A more efficient alternative is to only consider the outcomes ‘all spins up’and ‘not all spins up’. That is, we consider the KL-divergence between the simplified distributions$\{P^{QM}(x),1-P^{QM}(x)\}$ and$\{P^{LHV}(x),1-P^{LHV}(x)\}$ (we write $P(x)\equiv P(\uparrow\cdots\uparrow|x)$)

$\displaystyle\mathcal{L}({LHV}\,||\,{QM})=\left\langle P^{QM}(x)\ln(\frac{P^{%QM}(x)}{P^{LHV}(x)})+\left(1-P^{QM}(x)\right)\ln(\frac{1-P^{QM}(x)}{1-P^{LHV}(%x)})\right\rangle_{x}.$ | (42) |

This is enough because we sample all possible measurement directions. A vanishing loss means$P^{QM}(x)=P^{LHV}(x)$ for all$x$, i.e. the LHV model reproduces quantum mechanics perfectly. For non-local states this change of the loss function can lead to different compromises for the optimal hidden-state distribution as mentioned before.

### B.8 I. POVM Measurements for Spin-1/2 Particles

In this appendix, we work out how to deal with general POVM measurements. We have not implemented this so far in our numerics, though.The most general single qubit POVM element is a positive two by two matrix bounded by the identity$0\leq M\leq\text{1}$. A general positive two by two matrix can be obtained by scaling a single qubit state ($\vec{r}$ is a Bloch vector)

$\displaystyle M=m\rho(\vec{r})=\frac{m}{2}\left(\text{1}+\vec{r}\cdot\vec{%\sigma}\right),\qquad m\geq 0,\ \norm{\vec{r}}\leq 1.$ | (43) |

The condition$M\leq\text{1}$ can then be implemented by requiring

$\displaystyle m\leq\frac{2}{1+\norm{\vec{r}}}.$ | (44) |

This can be rewritten as ($m_{0}=m/2,\vec{m}=m\vec{r}$)

$\displaystyle M=M(m_{\mu})=\sum_{\mu}m_{\mu}\sigma^{\mu},\qquad\norm{\vec{m}}%\leq\min(m_{0},1-m_{0}),$ | (45) |

where$\sigma^{\mu}=(\text{1},\vec{\sigma})$ and implicitly$0\leq m_{0}\leq 1$. The projection operators correspond to the special case$m_{0}=1/2$. We have

$\displaystyle\text{1}-M(m_{0},\vec{m})=M(1-m_{0},-\vec{m}).$ | (46) |

For implementations it will be convenient to rewrite this expression by setting $m_{0}^{\rm old}=m_{0}^{\rm new}+1/2$ such that

$\displaystyle M(m_{\mu})=\left(\frac{1}{2}+m_{0}\right)\text{1}+\vec{m}\cdot%\vec{\sigma},\qquad\absolutevalue{m_{0}}\leq\frac{1}{2},\ \norm{\vec{m}}\leq%\frac{1}{2}-\absolutevalue{m_{0}}.$ | (47) |

In this case

$\displaystyle\text{1}-M(m_{\mu})=M(-m_{\mu}),$ | (48) |

which means the local measurement rule can again be written as

$\displaystyle q(m_{\mu},\lambda)=\sigma(f_{\lambda}(m_{\mu}))$ | (49) |

with odd functions$f_{\lambda}$. The mean over POVM elements in the loss function can just be taken with respect to a uniform distribution over the allowed (compact) range of$m_{\mu}$. The functions$f_{\lambda}$ can be expanded into odd polynomials where the coefficients again become the components of the hidden-state vector.

### B.9 J. Exact LHV Models for Separable States

If a given measurement rule$q$ allows to represent all single particle states ($M$ is a POVM element)

$\displaystyle\forall\rho\,\exists\,p(\lambda|\rho):\ \Tr(M\rho)=\int_{\Lambda}%\text{d}\lambda\,q(M|\lambda)p(\lambda|\rho),$ | (50) |

then all non-entangled $N$-particle states can be represented exactly.

###### Proof.

Any separable $N$particle state is a convex linear combination of product states

$\displaystyle\rho=\sum_{k}t_{k}\rho^{(k)}_{1}\otimes\cdots\otimes\rho^{(k)}_{N%},\qquad\sum_{k}t_{k}=1.$ | (51) |

By assumption we have exact hidden-state distributions$p(\lambda|\rho^{(k)}_{j})$ for the individual parts$\rho^{(k)}_{j}$. We claim that

$\displaystyle p(\lambda_{1},\ldots,\lambda_{N}|\rho)=\sum_{k}t_{k}p(\lambda_{1%}|\rho^{(k)}_{1})\cdots p(\lambda_{N}|\rho^{(k)}_{N})$ | (52) |

is the correct hidden-state distribution. Indeed this reproduces the correct measurement statistics

$\displaystyle P^{LHV}(M_{1},\ldots,M_{N}|\rho)$ | (53) | ||

$\displaystyle=\int_{\Lambda}\text{d}\lambda_{1}\cdots\int_{\Lambda}\text{d}%\lambda_{N}\,p(\lambda_{1},\ldots,\lambda_{N}|\rho)q(M_{1},\lambda_{1})\cdots q%(M_{N},\lambda_{N})$ | (54) | ||

$\displaystyle=\sum_{k}t_{k}\int_{\Lambda}\text{d}\lambda_{1}\,p(\lambda_{1}|%\rho^{(k)}_{1})q(M_{1},\lambda_{1})\cdots\int_{\Lambda}\text{d}\lambda_{N}\,p(%\lambda_{N}|\rho^{(k)}_{N})q(M_{N},\lambda_{N})$ | (55) | ||

$\displaystyle=\sum_{k}t_{k}P^{LHV}(M_{1}|\rho^{(k)}_{1})\cdots P^{LHV}(M_{N}|%\rho^{(k)}_{N})$ | (56) | ||

$\displaystyle=\sum_{k}t_{k}\Tr(M_{1}\rho^{(k)}_{1})\cdots\Tr(M_{N}\rho^{(k)}_{%N})$ | (57) | ||

$\displaystyle=\Tr(\sum_{k}t_{k}M_{1}\otimes\cdots\otimes M_{N}\cdot\rho^{(k)}_%{1}\otimes\cdots\otimes\rho^{(k)}_{N})$ | (58) | ||

$\displaystyle=\Tr(M_{1}\otimes\cdots\otimes M_{N}\cdot\rho)=P^{QM}(M_{1},%\ldots,M_{N}|\rho).$ | (59) |

∎

And with Bells measurement rule – and thereby with the spherical harmonic rule of any degree greater or equal to one – at least all separable $N$spin-$1/2$ states can be represented exactly with respect to projective measurements. The hidden-state distribution for a single spin in the state$\rho=\rho(\vec{r})$ ($\vec{r}$ is the Bloch vector) is given by

$\displaystyle p(\hat{\lambda}|\vec{r})$ | $\displaystyle=\frac{1}{\pi}\left(\vec{r}\cdot\hat{\lambda}\,H(\vec{r}\cdot\hat%{\lambda})+\frac{1-\absolutevalue{\vec{r}}}{4}\right).$ | (60) |

###### Proof.

We have

$\displaystyle P^{LHV}_{\uparrow}(\hat{n})=\int_{S^{2}}\text{d}A(\hat{\lambda})%\,p(\hat{\lambda})H(\hat{n}\cdot\hat{\lambda}).$ | (61) |

Let us simplify this for a general hidden-state distribution$p(\hat{\lambda})=p(\tilde{\phi},\tilde{\theta})$

$\displaystyle P(\phi,\theta)$ | $\displaystyle\equiv P_{\uparrow}^{LHV}(\hat{n}(\phi,\theta))=\int_{S^{2}}\text%{d}A(\hat{\lambda})\,p(\hat{\lambda})H(\hat{n}(\phi,\theta)\cdot\hat{\lambda}),$ | |||

$\displaystyle P(\phi,0)$ | $\displaystyle=\int_{0}^{2\pi}\text{d}\tilde{\phi}\int_{0}^{1}\text{d}\cos(%\tilde{\theta})\,p(\tilde{\phi},\tilde{\theta}),$ | |||

$\displaystyle\partial_{\theta}P(\phi,\theta)$ | $\displaystyle=\int_{S^{2}}\text{d}A(\hat{\lambda})\,p(\hat{\lambda})\partial_{%\theta}(\hat{n}\cdot\hat{\lambda})\delta(\hat{n}\cdot\hat{\lambda}).$ | (62) |

Introduce

$\displaystyle\hat{n}_{x}=\begin{bmatrix}\cos(\theta)\cos(\phi)\\\cos(\theta)\sin(\phi)\\-\sin(\theta)\end{bmatrix}=\partial_{\theta}\hat{n},\qquad\hat{n}_{y}=\begin{%bmatrix}-\sin(\phi)\\\cos(\phi)\\0\end{bmatrix}.$ | (63) |

Together with$\hat{n}_{z}=\hat{n}$ they form a right-handed orthonormal basis. The delta function$\delta(\hat{n}\cdot\hat{\lambda})$ means the integration becomes restricted to the circle (centered at the origin) that is orthogonal to$\hat{n}$, that is the circle around the origin in the$\hat{n}_{x},\,\hat{n}_{y}$ plane. We can parameterize this circle as

$\displaystyle\hat{\lambda}(\tau)$ | $\displaystyle=\cos(\tau)\hat{n}_{x}+\sin(\tau)\hat{n}_{y},\quad\tau\in[0,\,2%\pi],$ | |||

$\displaystyle\norm*{\dot{\hat{\lambda}}(\tau)}^{2}$ | $\displaystyle=\norm{-\sin(\tau)\hat{n}_{x}+\cos(\tau)\hat{n}_{y}}^{2}=1.$ | (64) |

Then

$\displaystyle\partial_{\theta}P(\phi,\theta)$ | $\displaystyle=\int_{0}^{2\pi}\text{d}\tau\,p(\hat{\lambda}(\tau))\left(\hat{n}%_{x}\cdot\hat{\lambda}(\tau)\right)\norm*{\dot{\hat{\lambda}}(\tau)}=\int_{0}^%{2\pi}\text{d}\tau\,\cos(\tau)p(\hat{\lambda}(\tau)).$ |

Plugging in the specific trajectory we obtain

$\displaystyle\partial_{\theta}P(\phi,\theta)$ | $\displaystyle=\int_{0}^{2\pi}\text{d}\tau\,\cos(\tau)\,p\left(\begin{bmatrix}%\cos(\tau)\cos(\theta)\cos(\phi)-\sin(\tau)\sin(\phi)\\\cos(\tau)\cos(\theta)\sin(\phi)+\sin(\tau)\cos(\phi)\\-\cos(\tau)\sin(\theta)\end{bmatrix}\right)$ | (65) |

Consider a general single spin state

$\displaystyle\rho=\rho(\vec{r})=\frac{1}{2}\left(\text{1}+\vec{r}\cdot\vec{%\sigma}\right).$ | (66) |

This yields

$\displaystyle P^{QM}_{\uparrow}(\hat{n})=\frac{1}{2}\left(1+\vec{r}\cdot\hat{n%}\right).$ | (67) |

We have (and this uniquely determines$P^{QM}_{\uparrow}(\phi,\theta)$)

$\displaystyle P_{\uparrow}^{QM}(\phi,\theta=0)$ | $\displaystyle=\frac{1}{2}\left(1+r_{z}\right),\qquad\partial_{\theta}P_{%\uparrow}^{QM}(\phi,\theta)=\frac{1}{2}\vec{r}\cdot\hat{n}_{x}.$ |

First consider the case$\rho=\outerproduct{0}{0}$, that is$\vec{r}=\hat{e}_{z}=(0,0,1)$. Then

$\displaystyle P^{QM}_{\uparrow}(\phi,\theta=0)=1,\qquad\partial_{\theta}P_{%\uparrow}^{QM}(\phi,\theta)=-\frac{1}{2}\sin(\theta).$ | (68) |

We claim that the correct hidden-state distribution which reproduces this is (with$\hat{\lambda}=\hat{\lambda}(\tilde{\phi},\tilde{\theta})$)

$\displaystyle p(\hat{\lambda}|\vec{r}=\hat{e}_{z})=\frac{\hat{e}_{z}\cdot\hat{%\lambda}}{\pi}H(\hat{e}_{z}\cdot\hat{\lambda})=\frac{\cos(\tilde{\theta})}{\pi%}H(\cos(\tilde{\theta}))=\frac{\lambda_{z}}{\pi}H(\lambda_{z}).$ | (69) |

First of all this is a valid probability distribution over the sphere

$\displaystyle\int_{S^{2}}\text{d}A(\hat{\lambda})\,p(\hat{\lambda}|\hat{e}_{z})$ | $\displaystyle=\int_{0}^{2\pi}\text{d}\tilde{\phi}\int_{-1}^{1}\text{d}(\cos(%\tilde{\theta}))\frac{\cos(\tilde{\theta})}{\pi}H(\cos(\tilde{\theta}))=\frac{%2\pi}{\pi}\int_{0}^{1}\text{d}u\,u=1.$ | (70) |

Next, the value at$\theta=0$ is correct (this is the same integral as the normalization above)

$\displaystyle P(\phi,0)$ | $\displaystyle=\int_{0}^{2\pi}\text{d}\tilde{\phi}\int_{0}^{1}\text{d}(\cos(%\tilde{\theta}))\,p(\tilde{\phi},\tilde{\theta}|\hat{e}_{z})=1.$ | (71) |

And lastly, the derivative is correct

$\displaystyle\partial_{\theta}P(\phi,\theta)$ | $\displaystyle=\int_{0}^{2\pi}\text{d}\tau\,\cos(\tau)p(\hat{\lambda}(\tau)|%\hat{e}_{z})$ | |||

$\displaystyle=\int_{0}^{2\pi}\text{d}\tau\,\cos(\tau)\left(-\frac{1}{\pi}\cos(%\tau)\sin(\theta)H(-\cos(\tau)\sin(\theta))\right)$ | ||||

$\displaystyle=-\frac{1}{\pi}\sin(\theta)\int_{0}^{2\pi}\text{d}\tau\,\cos^{2}(%\tau)H(-\cos(\tau))$ | ||||

$\displaystyle=-\frac{1}{\pi}\sin(\theta)\int_{\pi/2}^{3\pi/2}\text{d}\tau\,%\cos^{2}(\tau)=-\frac{1}{2}\sin(\theta)$ | (72) |

because$\sin(\theta)\geq 0$ as$\theta\in[0,\,\pi]$.

Now assume$\hat{r}$ is any normalized vector corresponding to an arbitrary pure state. It corresponds to the hidden-state distribution$p(\hat{\lambda}|\hat{r})$ such that

$\displaystyle P^{QM}_{\uparrow}(\hat{n}|\hat{r})=P^{LHV}_{\uparrow}(\hat{n}|%\hat{r})=\int_{S^{2}}\text{d}A(\hat{\lambda})\,p(\hat{\lambda}|\hat{r})H(\hat{%n}\cdot\hat{\lambda}).$ | (73) |

We know that for any rotation matrix$R$ it holds that

$\displaystyle P^{QM}_{\uparrow}(\hat{n}|R\hat{r})=\frac{1}{2}(1+(R\hat{r})%\cdot\hat{n})=\frac{1}{2}(1+\hat{r}\cdot(R^{T}\hat{n}))=P^{QM}_{\uparrow}(R^{T%}\hat{n}|\hat{r}).$ | (74) |

Therefore the hidden-state distributions for Bloch vectors on the surface of the Bloch sphere must satisfy

$\displaystyle\int_{S^{2}}\text{d}A(\hat{\lambda})\,p(\hat{\lambda}|R\hat{r})H(%\hat{n}\cdot\hat{\lambda})\stackrel{{\scriptstyle!}}{{=}}\int_{S^{2}}\text{d}A%(\hat{\lambda})\,p(\hat{\lambda}|\hat{r})H((R^{T}\hat{n})\cdot\hat{\lambda})$ | |||

$\displaystyle=\int_{S^{2}}\text{d}A(R\hat{\lambda})\,p(R^{T}(R\hat{\lambda})|%\hat{r})H(\hat{n}\cdot(R\hat{\lambda}))=\int_{S^{2}}\text{d}A(\hat{\lambda})\,%p(R^{T}\hat{\lambda}|\hat{r})H(\hat{n}\cdot\hat{\lambda}).$ | (75) |

This means if$p(\hat{\lambda}|\hat{r})$ is a valid solution for the Bloch vector$\hat{r}$, then$p(R^{T}\hat{\lambda}|\hat{r})$ is always a valid solution for the Bloch vector$R\hat{r}$ (it is correctly normalized), i.e.

$\displaystyle p(R^{T}\hat{\lambda}|\hat{r})=p(\hat{\lambda}|R\hat{r})\quad%\forall R,\hat{r},\hat{\lambda}.$ | (76) |

Since we know the solution for one unit Bloch vector($\hat{e}_{z}$) we can use this property to obtain the solution for all unit Bloch vectors: Let$R$ be the rotation matrix mapping$\hat{e}_{z}$ to$\hat{r}$, then

$\displaystyle p(\hat{\lambda}|\vec{r})$ | $\displaystyle=p(\hat{\lambda}|R\hat{e}_{z})=p(R^{T}\hat{\lambda}|\hat{e}_{z})=%\frac{\hat{e}_{z}\cdot(R^{T}\hat{\lambda})}{\pi}H(\hat{e}_{z}\cdot(R^{T}\hat{%\lambda}))$ | |||

$\displaystyle=\frac{(R\hat{e}_{z})\cdot\hat{\lambda}}{\pi}H((R\hat{e}_{z})%\cdot\hat{\lambda})=\frac{\hat{r}\cdot\hat{\lambda}}{\pi}H(\hat{r}\cdot\hat{%\lambda}).$ | (77) |

Finally, we extend the result to non-unit Bloch vectors. Let$\vec{r}$ be any Bloch vector and$\alpha\in[0,\,1]$. Then

$\displaystyle P^{QM}_{\uparrow}(\hat{n}|\alpha\vec{r})$ | $\displaystyle=\frac{1}{2}(1+\alpha\vec{r}\cdot\hat{n})=\frac{1}{2}\left(1+2%\alpha\left(\frac{1}{2}\left(1+\vec{r}\cdot\hat{n}\right)-\frac{1}{2}\right)\right)$ | ||

$\displaystyle=\frac{1-\alpha}{2}+\alpha P^{QM}_{\uparrow}(\hat{n}|\vec{r}).$ |

This necessitates

$\displaystyle\int_{S^{2}}\text{d}A(\hat{\lambda})\,p(\hat{\lambda}|\alpha\hat{%r})H(\hat{n}\cdot\hat{\lambda})\stackrel{{\scriptstyle!}}{{=}}\frac{1-\alpha}{%2}+\alpha\int_{S^{2}}\text{d}A(\hat{\lambda})\,p(\hat{\lambda}|\hat{r})H(\hat{%n}\cdot\hat{\lambda})$ | |||

$\displaystyle=\int_{S^{2}}\text{d}A(\hat{\lambda})\,\left(\alpha p(\hat{%\lambda}|\hat{r})+\frac{1-\alpha}{4\pi}\right)H(\hat{n}\cdot\hat{\lambda}).$ | (78) |

We obtain the valid (and correctly normalized, non-negative) solutions (with$\hat{r}=\vec{r}/\norm{\vec{r}}$)

$\displaystyle p(\hat{\lambda}|\vec{r})$ | $\displaystyle=p(\hat{\lambda}|\norm{\vec{r}}\hat{r})=\norm{\vec{r}}p(\hat{%\lambda}|\hat{r})+\frac{1-\norm{\vec{r}}}{4\pi}=\norm{\vec{r}}\frac{\hat{r}%\cdot\hat{\lambda}}{\pi}H(\hat{r}\cdot\hat{\lambda})+\frac{1-\norm{\vec{r}}}{4\pi}$ | |||

$\displaystyle=\frac{\vec{r}\cdot\hat{\lambda}}{\pi}H(\vec{r}\cdot\hat{\lambda}%)+\frac{1-\norm{\vec{r}}}{4\pi}.$ | (79) |

∎