### Optical circuit

The input of the interferometer is provided by a single optical parametric oscillator (OPO), emitting pulsed single-mode squeezed states at a 6 MHz rate that are then sent to three concatenated, programmable, loop-based interferometers. Each loop contains a VBS, including a programmable phase shifter, and an optical fibre delay line acting as a buffer memory for light, and allows for the interference of modes that are temporally adjacent (τ = (6 MHz)−1), or separated by six or 36 time bins (6 τ or 36 τ) in the first, second and third loop, respectively. Optical delays provide a compact and elegant method to mediate short- and long-range couplings between modes. The high-dimensional Gaussian state generated for this experiment can be visualized, as depicted above the three loops in Fig. 1, using a three-dimensional lattice representation. Given a lattice of size a = 6, where a is the number of modes separating two interacting pulses in the second loop, one can form a cubic lattice by injecting M = a3 = 216 squeezed-light pulses into the interferometer.

Owing to the use of a single time-multiplexed squeezed-light source, all temporal modes are, to very good approximation, indistinguishable in all degrees of freedom except time signature, and passively phased locked with respect to each other; the squeezer is driven by pump pulses engineered to generate nearly single-temporal-mode squeezed-light pulses on a 6 MHz clock. Spatial overlap is ensured by using single-mode fibre coupling at the entrance and exit of each loop delay, and samples are collected using an array of photon-number resolving (PNR) detectors based on superconducting transition-edge sensors (TES) with 95% detection efficiency39,40. These samples consist of 216 (integer) photon-number measurement outcomes for as many modes. To bridge the gap between the 6 MHz clock, chosen to maintain manageable fibre loop lengths, and the slower relaxation time of the TES detectors, a 1-to-16 binary-tree switching network was used to partially demultiplex the pulse train after the loops and before the detectors.

### Experimental challenges

Despite the simple conceptual design of Borealis (Fig. 1), building a machine capable of delivering quantum computational advantage in a programmable fashion using photonics, in a large photon-number regime, required solving considerable technological hurdles that were previously outstanding. These include: (1) lack of PNR-compatible single-mode squeezed-light sources and non-invasive phase stabilization techniques requiring bright laser beams, (2) slow PNR reset times that would necessitate unfeasibly long fibre loops and (3) lack of sufficiently fast and low-loss electro-optic modulators (EOMs) preventing programmability. Our solutions to these challenges for this work are, respectively, (1) the design of a bright and tunable source of single-mode squeezed states and phase stabilization techniques (OPO and interferometer) using locking schemes compatible with PNR detectors, (2) active demultiplexing to increase the effective rate of PNR acquisition by a factor of 60, compared to previous systems40, by constructing a low-loss 1-to-16 binary switch tree and developing new photon-number extraction techniques and (3) the use of new, efficient and fast customized EOMs (QUBIG GmbH) that enable arbitrary dynamic programming of photonic gates with low loss and high speeds. The success of this experiment also relies on a robust calibration routine, accurately extracting all experimental parameters contained in the transfer matrix T and the squeezing parameters r that define each GBS instance. We describe each of these advances in the following sections. Other details pertinent to the apparatus can be found in the Supplementary Information.

With further fabrication and device optimization, the raw operational speed of PNR detectors can be increased, eliminating the need for the demultiplexer (demux) and associated losses (roughly 15%). Improvements to the filter stack (20% loss) would also considerably increase performance. Several paths thus exist to even further increase the robustness of our machine against hypothetical improved classical adversaries. In addition, in trial runs we have extended the number of accessible modes to 288 (see Supplementary Information) without any changes to the physical architecture, and expect further scalability in this number to be readily achievable by improving the long-time stabilization of the device. Such scaling will place the device even further ahead of the regime of classical simulability and potential vulnerability to spoofing.

For applications requiring a universal interferometer, a recirculation loop long enough to accommodate all 216 modes could be implemented41, replacing any two of the three existing loops. The remaining existing loop would be nested in the larger 216-mode loop, allowing repeated application of the remaining VBS to all 216 modes, albeit at the cost of higher losses.

### Pulsed squeezed-light source

The main laser is an ultralow phase noise fibre laser with a sub-100 Hz linewidth centred at 1,550 nm, branched out into different paths. To prepare the pump, in one path pulses are carved using a 4 GHz lithium niobate electro-optic intensity modulator. It is then amplified and upconverted to 775 nm using a fibre-coupled MgO:LN ridge waveguide. The resulting pump is a 6 MHz stream of 3-ns-duration rectangular pulses with an average power of 3.7 mW. Squeezed-light pulses are generated in a doubly resonant, phase-stabilized hemilithic cavity42 comprising a 10-mm-long plano-convex potassium titanyl phosphate crystal with its temperature stabilized at 32.90 °C using a Peltier element, for optimal Type-0 phase matching (Supplementary Information). All spectral side bands of the OPO cavity, around the degenerate frequency band, are suppressed by more than 25 dB using a pair of fibre Bragg gratings (0.04 nm bandwidth at 0.5 dB), one in reflection and the other in transmission (more details in Supplementary Information).

### Programmable photonic processor

A train of single-mode squeezed vacuum pulses is emitted by the OPO, coupled into a single-mode fibre and directed towards the programmable photonic processor consisting of three loop-based interferometers in series, as shown in Fig. 1. Each loop ({ell }=0,1,2) is characterized by a VBS with transfer matrix

$$B{S}^{{ell }}({alpha }_{k},{varphi }_{k})=(begin{array}{cc}{e}^{i{varphi }_{k}},cos ,{alpha }_{k} & isqrt{{eta }_{{ell }}}{e}^{i{mu }_{{ell }}},sin ,{alpha }_{k}\ i{e}^{i{varphi }_{k}},sin ,{alpha }_{k} & sqrt{{eta }_{{ell }}}{e}^{i{mu }_{{ell }}},cos ,{alpha }_{k}end{array})$$

(5)

where each phase ϕk = [−π/2, π/2] and αk = [0, π/2] can be programmed independently, ({mu }_{{ell }}) is a phase offset associated with each loop and ({eta }_{{ell }}) is the energy transmittance coefficient associated with one complete circulation in loop ({ell }). The time delay experienced in the first loop is τ = 1/(6 MHz), equals the delay between two consecutive squeezed-light pulses, whereas the second and third loops have 6 τ and 36 τ time delay, respectively. The transmittance tk of a VBS with parameter αk is given by tk = cos2αk. For tk = 1 all the incoming light is directed into the fibre delay, whereas the light entering the VBS from the fibre delay is fully coupled out. The output of the last loop is coupled into a single-mode fibre and directed towards the final sampling stage of the experiment.

All three loops are independently phase stabilized using a counter-propagating laser beam, piezo transducers and lock-in techniques. To avoid stray light from reflections of this beam towards the detectors, we alternate between measurement (65 μs) and phase stabilization of the loops (35 μs), leading to a sampling rate of 10 kHz. The estimated phase noise (standard deviation from the mean) inside the interferometer is 0.02, 0.03 and 0.15 rad for the first, second and third loops, respectively, as measured with classical pulses. We carefully reduced mode mismatch throughout the entire interferometer: spatial overlap is ensured using single-mode fibres, with coupling efficiencies >97%, and the length of each loop delay is carefully adjusted to have >80% classical visibility between 250-ps-long classical pulses, which gives >99% temporal overlap for the squeezed states.

### Connectivity

The programmable time-domain multiplexed architecture implemented here and introduced in ref. 5 generates sufficiently connected transmission matrices (in which two-thirds of the entries of the matrix are non-zero) to furnish a high level of entanglement between the modes (we estimate the log negativity between modes 0…i−1 and i…216 for the ground truth to be on average 5.96 for (iin {36,72,108,144,180})), while keeping losses sufficiently low (with transmission above 33%). This is not the case for other architectures in which one either has to give up programmability1,2 or suffer steep losses that, in the asymptotic limit of many modes, render the sampling task roughly simulable as the loss scales exponentially with the system size31. In a universal programmable interferometer each mode passes through several lossy components (with transmission ηunit) proportional to the number of modes. For the interferometers considered here, each mode sees a fixed number (six) of beamsplitters in which the loss is dominated by the transmission of the largest loop. If the shortest loop, which accommodates only one mode, has transmission ηunit then the largest loss is given by ({eta }_{{rm{unit}}}^{36}), which should be contrasted with ({eta }_{{rm{unit}}}^{216}) for a universal interferometer. Whereas we sacrifice some connectivity, the many-mode entanglement predicted in our ground truth (logarithmic negativity43 of 6.08 when splitting the modes of the ground truth between the first and last 108) is comparable to the one found in Gaussian state prepared using a random Haar-interferometer with a comparable net transmission and brightness (for which the logarithmic negativity across the same bipartition is 15.22). For the largest experiment considered below, the net transmittance is around 33%. As discussed in the Methods, combined with the high brightness of our source averaging r ~ 1.1, places our experiment well beyond any attempt at a now-known polynomial-time approximate classical simulation31.

### Sampling of high-dimensional GBS instances

All temporal modes of our synthesized high-dimensional Gaussian states are sampled using superconducting TES allowing photon-number resolution up to 13 photons per detector in our data. Relaxation time of our TES, back to baseline following illumination, is of the order of 10 to 20 μs corresponding to 50–100 kHz (ref. 40), and depends on the expected photon number. At this speed, the length of the shortest loop delay would be 2 km, leading to excessive losses and more challenging phase stabilization in our system. Thus our pulse train and thus processing speed of 6 MHz, chosen to maintain manageable loop lengths, is too fast for a reliable photon-number extraction. To bridge the gap between the typical PNR speed and our processing speed, we use a demultiplexing device allowing to speed up by effectively 16×, and to develop a postprocessing scheme, described below, for ‘tail-subtraction’ enabling operation of each PNR at 375 kHz.

The role of the demux, depicted as a binary tree in Fig. 1, is to reroute squeezed-light pulse modes from the incoming train into 16 separate and independent spatial modes, each containing a fibre-coupled PNR-TES detector. There are 15 low-loss resonant EOMs grouped in four different layers. EOMs in each layer have a preset frequency: one at 3 MHz, two at 1.5 MHz, four at 750 kHz and eight at 375 kHz. Each EOM is sandwiched between two polarizing beamsplitter and a quarter-waveplate at 45° in front. The modulators are driven by a standalone unit, generating several phase-locked sine wave signals temporally synchronized with the input train. The switching extinction ratio is measured to be above 200:1 for all modulators.

Several methods have been demonstrated to extract photon numbers from a PNR’s output voltage waveform, each with their own advantages44,45,46,47. Here we use a modified version of the method presented in ref. 47. First, each detector is calibrated using well separated pulses of squeezed light with a high mean photon number around (langle nrangle approx 1) and 500 × 103 repetitions. This gives enough high photon-number events to ensure that at least the 0 to 11 photon clusters can be identified using the area method. From each cluster, the mean shape of the waveforms is defined. To extract the photon-number arrays from experiment, the mean square distance between each waveform and the mean shape is estimated. The photon number is then assigned to the closest cluster. Because we operate the individual PNRs at 375 kHz, faster than the relaxation time (back to baseline following illumination), the tail of each pulse still persists when the next pulse arrives at the same PNR. To avoid these tails reducing photon-number extraction fidelity in a pulse, the mean shape for the identified previous photon number is subtracted. See Supplementary Information for details.

### Estimation of the ground truth parameters

Given that all the squeezed states come from the same squeezer and the programmability of our system, we can parametrize and characterize the loss budget of our system using a very small set of parameters. The first set of parameters correspond to the relative efficiencies of the 16 different demux-detector channels, ηdemux,i for (iin {0,1,ldots ,15}). The second parameter is simply the common transmittance ηC. Finally, we have the transmittance associated with a round-trip through each loop ηk for (kin {0,1,2}).

To characterize the first two parameter sets, namely the demux and common loss, we set all the loops to a ‘bar’ state (αk = π/2), preventing any light from entering the delays. As the input energy is the same, we can simply estimate the ratio of the transmittance of the different demux-detector channels as ({eta }_{{rm{demux}},i}/{eta }_{{rm{demux}},j}={bar{n}}_{i}/{bar{n}}_{j}) where ({bar{n}}_{j}) is the mean photon number measured in the detector j. Without loss of generality, we can take the largest of the ηdemux,i to be equal to one and assign any absolute loss from this and any other channel into the common loss ηC. To determine the common loss, we use the noise reduction factor (NRF), defined as48,49

$${rm{NRF}}=frac{{Delta }^{2}({n}_{i}-{n}_{j})}{langle {n}_{i}+{n}_{j}rangle },$$

(6)

where ni and nj are the photon-number random variables measured in mode i and j, and we write variances as ({Delta }^{2}X={langle Xrangle }^{2}-{langle Xrangle }^{2}).

If losses can be considered as uniform, which is an excellent approximation if we use only the loop with the shortest delay, it is straightforward to show that the NRF of a two-mode squeezed vacuum gives directly the loss seen by the two modes as NRFTMSV = 1−η. To prepare the two-mode squeezed vacuum we set our VBS matrix to be proportional to ((begin{array}{ll}1 & i\ i & 1end{array})) when the two single-mode squeezed pulses meet at the beamsplitter. To this end, we use the following sequence [t0 = 0, t1 = 1/2, t2 = 0], where, recall, we write ti = cos2αi to indicate the transmittance of a particular loop time bin i. We can now scan the controllable phase of the VBS, ϕk, and determine where the minimum occurs (({varphi }_{k}^{{rm{min }}}={mu }_{0},{rm{mod}},pi )), and at the same time provide the relative offset in the first loop and the net transmittance of the setup. This observation can be used to obtain the phase offset of any other loop round-trip. Although in the current version of our system these are set by the locking system, they can in principle also be made programmable. The transmittance η = 1 − NRFTMSV = ηC × η0 × ηdemux is the product of the common transmittance ηC, the round-trip in the first loop η0 and the average transmittance associated with two demux-detector channels used to detect the two halves of the two-mode squeezed vacuum ({eta }_{{rm{demux}}}=frac{1}{2}{{eta }_{{rm{demux}},i}+{eta }_{{rm{demux}},j}}). From this relation, we can find

$${eta }_{C}=frac{1-{{rm{NRF}}}_{{rm{TMSV}}}}{{eta }_{0}times {eta }_{{rm{demux}}}}.$$

(7)

This calibration depends on knowing the value of the round-trip transmittance factor associated with the first loop. To estimate the round-trip transmittance of a particular loop ({ell }), we bypass the other loop delays and compare the amount of light detected when light undergoes a round-trip through a particular loop, relative to when all the round-trip channels are closed, that is, all loops in a ‘bar’ state. We obtain ({eta }_{{ell }}), which we can then plug in equation (7) to complete the calibration sequence.

Finally, having characterized the loss budget in the experiment, we can obtain the brightness and squeezing parameters at the source by measuring photon numbers when all the loops are closed and then dividing by the net transmittance. For any of the three regimes considered in the main text the standard deviation of the estimated squeezing parameters and mean photon numbers is below 1% of the respective means.

From the same data acquired above for a pair of modes, we calculate the unheralded second-order correlation

$${g}^{(2)}=frac{langle {n}_{i}^{2}rangle -langle {n}_{i}rangle }{{langle {n}_{i}rangle }^{2}}$$

(8)

for each pair of temporal modes. When we attain the minimum NRF at ϕk = μ0, that is, when we prepare two-mode squeezed vacuum, it is easy to see that50

$${g}^{(2)}=1+frac{1}{K},$$

(9)

where K is the so-called Schmidt number of the source. This quantifies the amount of spectral mixedness in the generated squeezed light. An ideal squeezed vacuum light source would yield g(2) = 2. We report K = 1.12 for g(2) = 1.89 for the dataset used in the large mode and photon-number regime.

### Theory sections

#### Transfer matrix, T

The loop-based interferometer, as well as any other interferometer, can be described by a transfer matrix T that uniquely specifies the transformation effected on the input light. For our GBS implementation, this interferometer is obtained by combining three layers of phase gates and beamsplitters (two-mode gates), interfering modes that are contiguous, or separated by six or 36 time bins, which we write as

$$T=sqrt{{eta }_{C}}{T}_{{rm{demux}}}[mathop{mathop{otimes }limits_{d=0}}limits^{D-1}mathop{mathop{otimes }limits_{i=0}}limits^{M-{a}^{d}}{B}_{i,i+{a}^{d}}({{rm{VBS}}}^{d}({alpha }_{i},{varphi }_{i}))]$$

(10)

where in our case D = 3 gives the number of loops, while ({a}^{d}{|}_{din {0,1,2}}=) {1, 6, 36} with a = 6 gives the number of modes that each loop can hold. ({B}_{i,i+{a}^{d}})(VBS) is an M × M transfer matrix that acts like the VBS in the subspace of modes i and j = i + ad and like the identity elsewhere.

In the last equation, ηC is the common transmittance throughout the interferometer associated with the escape efficiency of the squeezer cavity and the propagation loss in common elements. Tdemux is a diagonal matrix that contains the square roots of the energy transmittance into which any of the modes are rerouted for measurement using the demux. Because the demux has 16 channels, it holds that ({({T}_{{rm{demux}}})}_{i,i}={({T}_{{rm{demux}}})}_{i+16,i+16}=sqrt{{eta }_{{rm{demux}},i}}). Finally, we set the phases of the VBS to be uniformly distributed in the range [−π/2, π/2] and the transmittances to be uniformly in the range [0.45, 0.55]. This range highlights the programmability of the device while also generating high degrees of entanglement that are typically achieved when the transmittance is half.

In the idealized limit of a lossless interferometer, the matrix representing it is unitary, otherwise the matrix T is subunitary (meaning its singular values are bounded by 1). The matrix T together with the input squeezing parameters r defines a GBS instance. Squeezed states interfered in an interferometer (lossy or lossless) always lead to a Gaussian state, that is, one that has a Gaussian Wigner function. Moreover, loss is never able to map a non-classical state (having noise in a quadrature below the vacuum level) to a classical state. Thus there exists a finite separation in Hilbert space between lossy-squeezed states and classical states. To gauge this separation, and how it influences sampling, we use the results from ref. 31 to show in the section ‘Regimes of classical simulability’ that the probability distribution associated with the ground truth programmed into the device cannot be well-approximated by any classical-Gaussian state.

Similar to previous GBS experiments in which the ground truth to which a quantum computer is compared contains imperfections due to loss, we also benchmark our machine against the operation of a lossy unitary. In this more realistic scenario in which losses are included, the state generated at the output cannot be described by a state vector and thus one cannot assign probability amplitudes to an event. In this case, probabilities are calculated from the density matrix of the Gaussian state using the standard Born rule and then the probability of an N photon event is proportional to the hafnian of a 2N × 2N matrix.

#### Regimes of classical simulability

As a necessary but not sufficient test for beyond-classical capabilities of our machine, we consider the GBS test introduced in ref. 31. This test states that a noisy GBS device can be classically efficiently simulated up to error ϵ if the following condition is satisfied:

$$text{sec}{rm{h}}left{frac{1}{2},max left[0,,mathrm{ln},frac{1-2{q}_{D}}{eta {e}^{-2r}+1-eta }right]right} > {e}^{-frac{{{epsilon }}^{2}}{4M}}.$$

(11)

Here qD is the dark count probability of the detectors, η is the overall transmittance of the interferometer, r is the squeezing parameter of the M input squeezed states (assumed to be identical) and ϵ is a bound in the TVD of the photon-number probability distributions of GBS instance and the classical adversary. For our experiment, we estimate an average transmittance of η = Tr(TT)/M = 0.32, qD = 10−3, an average squeezing parameter of r = 1.10 and M is the total number of modes. With these parameters we find that the inequality above has no solution for ({epsilon }in [0,1]), meaning that our machine passes this non-classicality test.