SOFTWARE AND ARCHITECTURE

Uploaded On 2023-05-29

Presentation Transcript

1. SOFTWARE AND ARCHITECTURE FOR RELIABLE QUANTUM COMPUTING
Poulami Das
ETH Zurich
Sep 27, 2022

2. Why Quantum Computing?
Quantum computers can fundamentally change what is computable. They promise computational advantages over conventional machines for many important applications:
- Material science
- High-energy physics
- Optimization
- Integer factorization
- Machine learning

3. Quantum Computing 101
Quantum computers get their computational advantage by using the properties of qubits, the fundamental unit of information: superposition, entanglement, and an exponential state space. Quantum algorithms use quantum gates to manipulate qubits.
Program flow: initialize qubits (encode problem) -> manipulate qubits (apply gates) -> measure qubits (obtain outcome)

4. Quantum Computers are Here!!!
Quantum computers with 100+ qubits are already available, with 1000+ qubits by 2023 (IBM Condor), but they have high error rates. Hardware errors limit us from running most practical quantum applications today.
- Use Quantum Error Correction (QEC): 20x-1000x physical qubits per logical qubit. Impractical.
- Noisy Intermediate-Scale Quantum (NISQ): errors will happen, live with them.

5. Reliability Challenges in the Quantum Roadmap
Can software and architecture solutions bridge the gap between applications and noisy devices?
[Figure: roadmap spanning Algorithms, Compiler, Architecture, and Devices, with application classes Beyond Classical, VQE/QAOA, Grover, Chem, Shor, and QEC]
Works placed on the roadmap: Multiprogramming QC [MICRO’19], ForeSight [ArXiv], JigSaw [MICRO’21], ADAPT [MICRO’21], LILLIPUT [ASPLOS’22], AFS [HPCA’22], Superconducting Acc. [Best Paper, CF’19], HAMMER [ASPLOS’22], FrozenQubits [ASPLOS’23]
Numbered themes: (1) Error Mitigation, (2) Understand QEC, (3) Accurate, Fast, Scalable QEC

6. Outline
- Background and Motivation
- Measurement Error Mitigation for the NISQ Era
- Understanding Quantum Error Correction in the Near-Term
- Enabling Accurate, Fast, and Scalable Decoding in Fault-Tolerant Systems

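7. Quantum Errors and NISQ Computing Model
Quantum programs are vulnerable to different sources of hardware errors: idle errors, gate errors, measurement errors, and correlated errors.
Flow: the NISQ compiler maps the program onto the qubits of the NISQ device (q[0], q[1]) and translates it; the device then executes it over repeated trials.
Program:
X q[0]
CNOT q[0], q[1]
Measure q[0]
Measure q[1]
Translated program:
X q[0]
CX q[0], q[1]
Measure q[0]
Measure q[1]
Example trial outcomes: 01, 11, 11, 10. The outcome 11 is correct; the others are erroneous.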

8. The Problem: Measurement Errors
Measurement errors are a dominant source of errors in large programs. Each trial measures all program qubits, and all measurements must be error-free.
[Figure: program qubits q0 and q1 on a NISQ device]
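The requirement that all measurements be error-free can be made concrete with a rough sketch. It assumes that each qubit's readout fails independently with the same probability; the slides do not state an error model, and the 2% error rate below is a hypothetical value chosen only for illustration.

```python
def all_measurements_correct(num_qubits: int, readout_error: float) -> float:
    """Probability that every qubit in one trial is read out correctly.

    Assumes independent, identical readout errors (an assumption made
    for this sketch, not a model taken from the slides).
    """
    return (1.0 - readout_error) ** num_qubits


if __name__ == "__main__":
    # Hypothetical 2% per-qubit readout error.
    for n in (2, 10, 50, 100):
        print(n, round(all_measurements_correct(n, 0.02), 3))
```

Under this model the chance of a fully correct readout shrinks geometrically with the number of measured qubits, which is why large programs are hit hardest.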

9. Even Bigger Problem: Measurements at Scale
Measurement crosstalk increases with program size and can increase the effective error rate.
[Figure labels: Crosstalk, Isolated, Multiple]

10. Prior Works on Measurement Error Mitigation
Existing measurement error mitigation schemes rely on measuring all qubits.
- Matrix-based approach from IBM: a 2^n x 2^n matrix over all basis states (00, 01, 10, 11 for two qubits), obtained by calibration and applied in post-processing
- State-transformation approach, Flip-And-Measure (MICRO-2019): measurement error depends on the state
Probability of success increases from 50% to 70%.
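As a minimal sketch of the matrix-based idea, the example below calibrates a single qubit, so the matrix is 2 x 2 rather than 2^n x 2^n. It is a generic calibration-matrix inversion, not IBM's implementation; the function names and the flip probabilities are hypothetical.

```python
def calibration_matrix(p_flip0: float, p_flip1: float):
    """Column j is the measured distribution when basis state j is prepared."""
    return [[1.0 - p_flip0, p_flip1],
            [p_flip0, 1.0 - p_flip1]]


def apply(matrix, vec):
    """Matrix-vector product for small dense matrices."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def invert_2x2(m):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical error rates: P(read 1 | prepared 0) = 2%, P(read 0 | prepared 1) = 8%.
M = calibration_matrix(0.02, 0.08)
ideal = [0.3, 0.7]                       # true output distribution
noisy = apply(M, ideal)                  # what the device would report
mitigated = apply(invert_2x2(M), noisy)  # post-processing step
```

For n qubits the same post-processing needs a matrix with 2^n rows and columns, and every qubit has to be measured to fill it.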

11. Goal: Reduce Measurement Errors
Insight: measure fewer qubits and reconstruct the distribution.
Are high-fidelity circuits with partial measurements alone sufficient?
[Figure labels: X, Ideal]

12. Need for Correlation
Ideally, we want full correlation and high fidelity.
[Figure labels: X, Ideal, Correlation?]

13. JigSaw: Design
The original program gives full correlation; Circuits with Partial Measurements (CPM) give high fidelity. JigSaw combines both.
- Global mode (50% of trials): the original program runs on the NISQ device
- Subset mode (50% of trials): the CPMs run on the NISQ device
- Each mode has a Recompile step
- Bayesian updates combine the two results
Reference: P. Das, S. Tannu, M. Qureshi, JigSaw: Boosting Fidelity of NISQ Programs via Measurement Subsetting, MICRO-2021

14. JigSaw-M: Multi-Layer JigSaw
The effectiveness of JigSaw can be improved further by using heterogeneous CPMs.
- Default subset size: 2
- Other subset size: 3, which gives more unique CPMs

15. Impact of JigSaw
- Average: 3.1x; best case: 8.4x
- Outperforms IBM’s measurement error mitigation and Flip-and-Measure
- Greater effectiveness when combined with other optimizations

16. Outline
- Background and Motivation
- Error Mitigation Techniques for the NISQ Era
- Understanding Quantum Error Correction in the Near-Term
- Enabling Accurate, Fast, and Scalable Decoding in Fault-Tolerant Systems

17. Quantum Error Correction
QEC can protect quantum information by tracking errors periodically. Quantum Error Correction (QEC) is more challenging than classical techniques because of no-cloning and collapse on measurement.
A logical qubit consists of data qubits and parity qubits.
Flow labels: Initialize, Syndrome Extraction, Decode Errors, Repeat, Measure Data Qubits
Questions answered from the syndrome (e.g., 0101):
- Did an error occur?
- Where did the error occur?
- What was the type of error?

18. QEC in the Near-Term Using Surface Codes
Surface codes are widely regarded as the most promising QEC candidate. Demonstrations of small surface codes represent a significant milestone for QEC.
[Figure: surface code with code distance d=3, showing data qubits, X stabilizers, and Z stabilizers]
- Bit-flip (X) errors -> Z stabilizers (A)
- Phase-flip (Z) errors -> X stabilizers (B)

19. Real-Time Accurate Decoding
Real-time decoding is essential to prevent the accumulation of errors. Decoding, or the identification of errors, must be done in real time in a cryogenic environment.
[Figure: Qubits, Control/Readout Logic, Software (slow)]

20. Goal: Real-Time Decoding in Near-Term QEC
LILLIPUT: a low-cost, accurate, and real-time decoder for practical adoption.
Insight: don't use a software decoder; look up from tables -> LILLIPUT
[Figure: Qubits, Control/Readout Logic, Lookup Tables (syndrome in, error assignment out); FPGA-based, low-cost, low-latency, accurate]
Reference: P. Das, A. Locharla, C. Jones, LILLIPUT: A Lightweight Low Latency Lookup Table Decoder for Near-Term Quantum Error Correction, ASPLOS 2022

21. Step 1: Detection of Errors
LILLIPUT uses FIFOs and XOR operations to detect error events. An error detection event is the XOR between the syndromes from consecutive cycles.
[Figure: stabilizer measurements over Cycle-1 to Cycle-4 (no error, Z error on “A”, no error, Z error on “B”) and the error detection events obtained by XORing consecutive cycles]
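The XOR step can be sketched in a few lines. The syndrome values below are hypothetical (they are not the values from the slide's figure), the one-entry deque stands in for the hardware FIFO, and comparing the first cycle against an all-zero reference is an assumption.

```python
from collections import deque


def detection_events(syndromes):
    """Yield the XOR of each cycle's syndrome with the previous cycle's.

    The first cycle is compared against an all-zero reference
    (an assumption made for this sketch).
    """
    fifo = deque([0], maxlen=1)  # holds only the previous cycle's syndrome
    for current in syndromes:
        yield fifo[0] ^ current
        fifo.append(current)


# Hypothetical 4-bit syndromes over four cycles: an error appears in
# cycle 2, persists in cycle 3, and is gone in cycle 4.
rounds = [0b0000, 0b0110, 0b0110, 0b0000]
events = [format(e, "04b") for e in detection_events(rounds)]
```

A syndrome that stays unchanged between cycles yields an all-zero event, so only changes in the stabilizer measurements are passed on to the decoder.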

22. Step 2: Handling Different Types of Errors
(1) Errors on data qubits
(2) Gate errors in syndrome extraction
(3) Readout errors on parity qubits
(4) Readout errors on data qubits

23. Step 2: Handling Different Types of Errors
LILLIPUT can handle errors in any operation in the quantum hardware.
(1) Errors on data qubits
(2) Gate errors in syndrome extraction
(3) Readout errors on parity qubits
(4) Readout errors on data qubits
Decoding graph: Space, Space-Time, Time
- Decode multiple syndrome rounds
- Logical measurement -> syndrome

24. Step 3: Error Assignments
LILLIPUT programs the LUTs offline and performs decoding online. The LUTs are programmed using a software Minimum Weight Perfect Matching (MWPM) decoder.
- Offline: software MWPM produces the LUT data, one entry per error event (000…000, 000…001, …, 111…111)
- Online: Qubits -> Control/Readout -> LUTs -> logical error?
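The offline/online split can be illustrated on a toy code. The sketch below swaps in two simplifications, both assumptions made for brevity: a three-qubit bit-flip repetition code instead of a surface code, and a brute-force minimum-weight search instead of an MWPM decoder. All names are hypothetical.

```python
from itertools import product

N_DATA = 3  # three-qubit bit-flip repetition code (stand-in for a surface code)


def syndrome(error):
    """Two parity checks: q0 XOR q1 and q1 XOR q2."""
    return (error[0] ^ error[1], error[1] ^ error[2])


def build_lut():
    """Offline step: map each syndrome to a minimum-weight error that causes it."""
    lut = {}
    # Visit errors in order of increasing weight so the first hit is minimal.
    for error in sorted(product((0, 1), repeat=N_DATA), key=sum):
        lut.setdefault(syndrome(error), error)
    return lut


LUT = build_lut()


def decode(observed_syndrome):
    """Online step: a single table lookup, no search."""
    return LUT[observed_syndrome]
```

All of the expensive work happens in `build_lut`; `decode` costs one memory access, which is what makes a lookup-table decoder attractive for real-time use.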

25. LILLIPUT Operations Overview
LILLIPUT maintains error logs and internal state to accurately track errors. It operates in streaming mode over a sliding window:
(1) Assign errors to the oldest round
(2) Account for window boundary crossings
(3) Track internal state (in the LUT entry)
Goals: avoid premature matching; avoid inaccurate matching

26. LILLIPUT: Design Overview

27. Challenge: Memory Complexity of LILLIPUT
The size of the LUTs scales rapidly. In the example shown, each cycle contributes a 4-bit syndrome, and each entry holds a 9-bit error assignment plus 4 bits of state.
Configurations shown: [d=3, m=1], [d=3, m=2], [d=4, m=2], [d=5, m=2]
LUT sizes:
[d=3, m=2]: 832 B
[d=4, m=2]: 238 KB
[d=5, m=2]: 148 MB
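The quoted sizes for odd distances can be reproduced with some back-of-the-envelope arithmetic. The formula below is inferred from the numbers on the slides, not taken from the LILLIPUT paper: it assumes (d^2 - 1)/2 syndrome bits per round, one entry per possible m-round syndrome, and an entry made of two fields of (d^2 assignment bits + one round of state bits). The factor of two is an assumption chosen because it matches the quoted sizes; the formula does not reproduce the d=4 figure.

```python
def lut_size_bytes(d: int, m: int) -> int:
    """Estimated LUT size for odd code distance d and m syndrome rounds.

    Inferred from the sizes quoted in the slides; every term here is an
    assumption rather than the documented LILLIPUT layout.
    """
    syndrome_bits = (d * d - 1) // 2           # parity checks of one type
    entries = 2 ** (syndrome_bits * m)         # one entry per m-round syndrome
    entry_bits = 2 * (d * d + syndrome_bits)   # assumed: two (assignment + state) fields
    return entries * entry_bits // 8


if __name__ == "__main__":
    for d, m in [(3, 2), (3, 3), (5, 2)]:
        print(f"[d={d}, m={m}]: {lut_size_bytes(d, m)} bytes")
```

The number of entries doubles with every additional syndrome bit, so moving from d=3 to d=5 at m=2 grows the table from 2^8 to 2^24 entries.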

28. Tackling the Memory Complexity of LILLIPUT
LUT sizes scale exponentially with the distance of the QEC code. We propose Compressed LUTs (CLUTs) to reduce the memory complexity.
- Not all error events are equally likely (shown for distance = 3, p_err = 0.1%), and some are uncorrectable -> store selective entries in the LUTs
- LUT entries themselves store sparse data -> compress

29. Performance of LILLIPUT
LILLIPUT: a low-cost, accurate, and real-time decoder for practical adoption.
- 42 ns latency, < 7% FPGA logic, sufficient up to d=5
- Logical error rate < physical error rate
- CLUTs have negligible impact on accuracy
- Up to 107x memory reduction with CLUTs

30. Outline
- Background and Motivation
- Error Mitigation Techniques for the NISQ Era
- Understanding Quantum Error Correction in the Near-Term
- Enabling Accurate, Fast, and Scalable Decoding in Fault-Tolerant Systems

31. Goal: Real-Time Accurate Decoding at Scale
Decoders must be accurate, fast, and scalable for practical implementation.
- Software decoders: accurate but not fast
- LILLIPUT: accurate, fast, but not scalable
- Prior works do not analyze the system-level challenges in designing decoders
[Figure: Qubits, Control/Readout Logic, Software]
Reference: P. Das, C. Pattison, S. Manne, D. Carmean, K. Svore, M. Qureshi, N. Delfosse, AFS: Accurate, Fast, and Scalable Error-Decoding for Fault-Tolerant Quantum Computers, HPCA 2022

32. Union-Find Decoding (UFD) Algorithm -> AFS Decoder
The AFS decoder translates the three steps of UFD into three distinct pipeline stages and leverages micro-architectural optimizations to improve decoding latency.
UFD step (operations) -> pipeline stage:
- Cluster generation (grow clusters, merge clusters) -> Graph Generator
- Cluster traversal (graph traversal) -> DFS Engine
- Peeling (reverse traversal) -> Corr Engine
Input: syndrome. Output: error assignments.
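The merge-clusters step rests on the classic union-find data structure, sketched below with a root table and a size table. This is the generic textbook structure with path halving and union by size, not the AFS hardware pipeline; the class name and the six-vertex example are hypothetical.

```python
class UnionFind:
    """Disjoint-set structure used to track and merge clusters."""

    def __init__(self, num_vertices: int):
        self.root = list(range(num_vertices))  # root table: parent of each vertex
        self.size = [1] * num_vertices         # size table: valid at cluster roots

    def find(self, v: int) -> int:
        """Return the root of v's cluster, shortening the path as it goes."""
        while self.root[v] != v:
            self.root[v] = self.root[self.root[v]]  # path halving
            v = self.root[v]
        return v

    def union(self, a: int, b: int) -> int:
        """Merge the clusters of a and b; the larger cluster's root wins."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.root[rb] = ra
        self.size[ra] += self.size[rb]
        return ra


# Hypothetical example: two clusters grow until they touch and are merged.
uf = UnionFind(6)
uf.union(0, 1)
uf.union(1, 2)
uf.union(4, 5)
```

Both tables need one entry per vertex of the decoding graph, so their footprint grows with the size of the system being decoded.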

33. AFS Decoder: Design Overview

34. Challenge: Increasing Memory Cost
The memory requirement of the AFS decoder scales very rapidly with system size. The AFS decoder is a memory-intensive design (registers, tables, stacks).
Technology for implementation, superconducting (tight memory budget) or CMOS (tight power budget), is an open problem.

35. Conjoined Decoder Architecture (CDA)
CDA allows restricted sharing of hardware with negligible impact on accuracy.
- Dedicated decoders per logical qubit: linear increase in hardware cost
- Single decoder for all logical qubits: least hardware cost, poor decoding capability
- CDA: reduced hardware cost, negligible impact
Insight: allow sharing if P_timeout < P_logical error

36. Performance of AFS Decoder
The AFS decoder is accurate, fast, and scalable.
[Figure panels: Accurate, Fast, Scalable (9.96 MB vs. 2.81 MB)]

37. Conclusion
- Hardware errors limit us from running most practical quantum applications
- Software techniques can improve the fidelity of applications in the NISQ era
- Fault-tolerant quantum computers can power a wider range of applications
- Architecture and system-level solutions can help us build accurate, fast, and scalable decoders for fault-tolerant quantum computing

38. Thank You!

39. BACK-UP Slides (JigSaw)

40. Bayesian Reconstruction?
Global mode (cannot infer solution):
Q2 Q1 Q0 | Prob.
0 0 0 | 0.05
0 0 1 | 0.05
0 1 0 | 0.1
0 1 1 | 0.05
1 0 0 | 0.05
1 0 1 | 0.2
1 1 0 | 0.1
1 1 1 | 0.2
Subset mode:
Q1 Q0 | Prob.
0 0 | 0.1
0 1 | 0.1
1 0 | 0.1
1 1 | 0.7
Update coefficients:
Q1 Q0 | Q2=0 | Q2=1
0 0 | 0.5 | 0.5
0 1 | 0.2 | 0.8
1 0 | 0.5 | 0.5
1 1 | 0.2 | 0.8
Scores:
Q2 Q1 Q0 | Score
0 0 0 | 0.06
0 0 1 | 0.02
0 1 0 | 0.06
0 1 1 | 0.47
1 0 0 | 0.06
1 0 1 | 0.09
1 1 0 | 0.06
1 1 1 | 1.86
Output (correct!):
Q2 Q1 Q0 | Prob.
0 0 0 | 0.02
0 0 1 | 0.01
0 1 0 | 0.02
0 1 1 | 0.18
1 0 0 | 0.02
1 0 1 | 0.03
1 1 0 | 0.02
1 1 1 | 0.70
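The numbers on this slide are consistent with a simple update rule, sketched below. The rule is inferred from the slide's values and may differ in detail from the published JigSaw algorithm: each global outcome gets a coefficient equal to its share of the global probability among outcomes with the same subset bits, the score is that coefficient times p / (1 - p) for the matching subset-mode probability p, and the scores are then normalized. The function and parameter names are hypothetical.

```python
def bayesian_update(global_pmf, subset_pmf, subset_bits):
    """Reweight a global distribution using a higher-fidelity subset distribution.

    subset_bits are the string positions of the subset qubits within each
    global outcome. Assumes every subset probability is below 1.
    """
    def key_of(outcome):
        return "".join(outcome[i] for i in subset_bits)

    # Total global probability of each group of outcomes sharing subset bits.
    group_total = {}
    for outcome, prob in global_pmf.items():
        group_total[key_of(outcome)] = group_total.get(key_of(outcome), 0.0) + prob

    scores = {}
    for outcome, prob in global_pmf.items():
        key = key_of(outcome)
        coefficient = prob / group_total[key]   # update coefficient
        p_sub = subset_pmf[key]
        scores[outcome] = coefficient * p_sub / (1.0 - p_sub)

    total = sum(scores.values())
    return {outcome: score / total for outcome, score in scores.items()}


# Values from the slide; outcome strings are ordered Q2 Q1 Q0.
global_pmf = {"000": 0.05, "001": 0.05, "010": 0.1, "011": 0.05,
              "100": 0.05, "101": 0.2, "110": 0.1, "111": 0.2}
subset_pmf = {"00": 0.1, "01": 0.1, "10": 0.1, "11": 0.7}
output = bayesian_update(global_pmf, subset_pmf, subset_bits=(1, 2))
```

In global mode the outcomes 101 and 111 tie at 0.2, so the answer cannot be inferred; after the update, 111 stands out with a probability of about 0.70.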

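41. Measurement Errors: Sources
[Figure: readout chain from the measured qubit (0/1) through resonator, amplifier, ADC, and discriminator, spanning temperature stages of 20 mK, 3-4 K, and 300 K; hardware and software parts are marked]
Sources of error:
- Frequency shift is sensitive to noise
- Thermal noise
- SNR
- Long latency
- Inaccurate classification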

42. How many CPM do we need?
An N-qubit program has C(N, 2) possible CPMs of subset size 2. Adding more CPMs gives diminishing returns.
[Figure: QAOA-12 (p4) on IBMQ-Paris]
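The count of candidate CPMs is easy to check by enumerating qubit pairs. This is a small illustration using the 12-qubit program size from the slide; the function name is hypothetical.

```python
from itertools import combinations
from math import comb


def size2_subsets(num_qubits: int):
    """All unordered qubit pairs, i.e. every possible CPM of subset size 2."""
    return list(combinations(range(num_qubits), 2))


pairs = size2_subsets(12)  # 12 qubits, as in QAOA-12
num_pairs = len(pairs)
```

The number of pairs grows quadratically with program size, so running every possible CPM quickly becomes expensive.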

43. Device Variability

44. Impact of CPM

45. Impact of Number of Trials

46. Impact of Recompilation

47. Impact of Recompilation
Average: no recompilation -> 1.9x; with recompilation -> 2.9x

48. Scalability Analysis
Complexity is determined by the number of unique outcomes. JigSaw does updates only for non-zero outcomes (limited by trials).
Program Size (Num. of Qubits) | JigSaw Memory (GB) | JigSaw Operations (Billion) | JigSaw-M Memory (GB) | JigSaw-M Operations (Billion)
100 | 1 | 0.4 | 4 | 1.7
500 | 5 | 2.1 | 20 | 8.4
*Assuming 1 million trials and, pessimistically, that each trial yields a unique outcome

49. BACK-UP Slides (LILLIPUT)

50. Results for Latency and Hardware Complexity
Decoder Configuration | Frequency (MHz) | Latency (ns) | Total LEs/ALMs | Total Registers | Memory | Logic Utilization | Memory Utilization
[d=3, m=2] | 250 | 28 | 353 | 209 | 832 B | 6% | 1%
[d=3, m=3] | 240.7 | 29.1 | 418 | 239 | 13 KB | 7% | 21%
[d=4, m=2] | 209.8 | 33.4 | 557 | 340 | 238 KB | <1% | 40%
[d=4, m=3] | 244.4 | 40.8 | 217 | 409 | 53.8 MB | <1% |
[d=5, m=2] | 232.9 | 42 | 246 | 486 | 148 MB | <1% |
Low-latency, low-cost. CLUT sizes shown: 702 KB and 1.38 MB; up to 107x reduction from CLUTs.

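51. How to use Compressed LUTs?
Example: [d=3, m=2], Hamming weight cut-off = 3
[Figure: the full address space (0x00 … 0x0F … 0xA0 … 0xAF … 0xF0 … 0xFF) is reduced to the retained entries (0x00 … 0x0F … 0xA0 … 0xAA …); the rest are discarded]
- Discarding entries causes memory fragmentation; retained entries are grouped into segments (Segment-A, Segment-B) of data frames (16-entry data frame, 10-entry data frame)
- Entries are compressed: 36b -> 16b
- Sizes shown: 416 B and 140 B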

52. BACK-UP Slides (AFS)

53. Error Decoding as a Matching Problem
[Figure labels: Error Assignments, Measurement Errors]

54. Cost Analysis of Memory Reduction
Design Component | AFS without CDA (in MB) | AFS with CDA (in MB)
STM (Gr-Gen) | 1.97 | 0.99 (2x)
Root Table (Gr-Gen) | 3.17 | 0.79 (4x)
Size Table (Gr-Gen) | 3.46 | 0.87 (4x)
Stacks (DFS Engine) | 1.35 | 0.34 (4x)
Total | 9.96 | 2.81 (3.5x)

55. Bandwidth Challenges: Scalability

56. Syndrome Compression for Bandwidth Reduction
Insight: syndrome data is sparse and can be compressed.
- Each error flips two parity qubits
- Long error chains have fewer parity flips
- 3d^3 fault locations -> 6d^3 p possible non-zero syndrome bits
- Example: d = 11, p = 10^-3 -> 8 non-zero bits in a 1000-bit syndrome
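The arithmetic behind the example, and one way to exploit the sparsity, can be sketched as follows. The expected count uses the 6d^3 p expression from the slide. The index-list encoding is an assumption chosen for illustration and is not necessarily the compression scheme used by AFS; all function names are hypothetical.

```python
def expected_nonzero_bits(d: int, p: float) -> float:
    """Expected number of non-zero syndrome bits, using 6 * d^3 * p from the slide."""
    return 6 * d ** 3 * p


def compress(bits):
    """Sparse encoding: keep only the positions of the set bits."""
    return [i for i, b in enumerate(bits) if b]


def decompress(indices, length):
    """Rebuild the full syndrome from the stored positions."""
    bits = [0] * length
    for i in indices:
        bits[i] = 1
    return bits


def compressed_size_bits(num_set: int, length: int) -> int:
    """Bits needed to send num_set indices into a syndrome of the given length."""
    index_width = (length - 1).bit_length()
    return num_set * index_width


# Hypothetical 1000-bit syndrome with 8 flipped parity bits.
syndrome = [0] * 1000
for position in (3, 40, 41, 250, 251, 600, 777, 999):
    syndrome[position] = 1
encoded = compress(syndrome)
```

With 8 set bits and 10-bit indices, this encoding needs 80 bits instead of 1000.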

57. Bandwidth Reduction
30x reduction on average. Effectiveness increases with code distance.