Well, hello everyone, and welcome back to the Qiskit Global Summer School 2024. Today I'll be your host; my name is Pedro Rivero, and I'm a technical lead in the quantum algorithm engineering team. Today we'll be talking about how to execute your quantum computational workloads on noisy quantum hardware, and how to fight noise. So let's get started.

The first thing to say about today's talk is that we can approach these topics from two different points of view. First, we can think in terms of the science behind it: we're going to talk about different techniques to mitigate and suppress errors, so we could ask why these techniques work from a scientific point of view. Or we can take another approach and think of it more as an engineering problem: what are the things we need to incorporate into a problem to actually solve the issues that we find? That is the approach we're going to follow. Throughout this lecture I'll give you a bird's-eye view of the different techniques and routines you can run to improve your results, plus just the theory you need to understand the different configurations and steps you have to take — but we will not delve into the deep mathematics that sustains these methodologies.

With that out of the way, let's get moving. As a refresher, we start by introducing the different steps of a classical-quantum computational workload, or algorithm. As you are all aware, we call these Qiskit patterns. Step number one starts with mapping your particular problem of interest into a quantum computational formalism, so that whatever science you're trying to solve can actually run on a quantum computer. Step number two is to optimize that mapping — the particular circuits, observables, or other constructs you came up with — so they run more efficiently on your hardware. Step number three is to run things on the hardware, and finally, in step number four, you take the outcomes of those executions and analyze them to get the results you're looking for. In this talk we'll focus on step number three: how to execute these things on quantum hardware, the trade-offs involved, and the different things you need to take into account when doing so.

All right, so why is this a problem? Why is this a topic worth an entire lecture in this summer school? The main reason is that quantum systems today are very noisy.
There's a famous picture, which you can see on the slide, that says: your quantum computer is broken in every way possible, simultaneously. And this is actually not far from the truth. There are many different things that can go wrong in a quantum system; this technology is very difficult to control and maneuver, and so we need to find ways to get results out of these machines before we reach the fault-tolerant regime.

So what can we do about this? Generally speaking, we need some interim solution to the problem of not having fault-tolerant computers today, and these interim ways of dealing with noise can be classified into two different approaches. One of them is to limit the amount of noise that we see in the hardware — in some way, we try to reduce the amount of noise that is present. The second approach is to say: okay, let the noise do its thing, and afterwards we filter it out and clean up our signal. These are the two main things we can do, and we'll see that they overarch the whole presentation moving forward.

So what does this look like in reality? Instead of taking our quantum circuit — the workload we want to run — and executing it directly in a naive way, we first run modified quantum computations that are equivalent, in some way, to the original one. Then we collect the outputs from the quantum computer, load them into a classical device, process them, and compute improved results compared to what we would have had from the naive execution.

So what are the different ways we have to fight noise in quantum systems? Generally speaking, there are three approaches, and they map very nicely onto the two interim solutions we expressed before. The first one is error suppression. Maybe you have heard of this in the past and are not quite sure what it means, or how it relates to other techniques like quantum error mitigation or quantum error correction.
So let's clear the air and see what each of these actually means or entails. Quantum error suppression is a set of techniques that tries to reduce or avoid the impact of errors on your computation — like the first of the two approaches we had before. Error suppression usually happens before or during the execution itself, and it requires additional classical resources: maybe you need to optimize your circuit before you run it, or compile a certain sequence of gates in a different way before execution, and that optimization process can be quite complex.

Then we have quantum error mitigation. How does this differ from quantum error suppression? It's actually the second way we had of dealing with noise in an interim fashion. In quantum error mitigation we're not necessarily trying to avoid or reduce the errors that happen during the computation. Instead, we let the noise happen and then clean up our signal: we filter the noise out of the results once we get them. Therefore, mitigation typically happens after or during the execution, and it usually requires additional quantum resources: if you're running something on your hardware, you're probably going to need to run more things, or for more time.

Last but not least, we have quantum error correction. This is the crown jewel of dealing with noise and errors in quantum computers, and it is all about detecting and fixing errors as they occur, so that your computation remains as close to fault-tolerant as possible. Indeed, quantum error correction is a key piece in achieving fault tolerance in quantum computing. It usually runs during execution, and it requires both quantum and classical resources, additional of course to the baseline. The problem with quantum error correction is, again, the amount of extra resources you need in order to run it: the overhead it entails today makes it very difficult to use on state-of-the-art quantum computers, so it's not really a solution we can use today. But stay tuned in the coming years.

All right, let's focus then on quantum error suppression and quantum error mitigation. Before we jump into that: we saw in a previous lecture of this summer school what the different types of noise in quantum computers are, so let's do a brief recap of where these sources of error can arise from.
Without trying to be comprehensive, and just giving a very general and broad categorization, we can essentially identify three places where noise arises. The first is the so-called SPAM errors, related to imperfect state preparation or imperfect measurements: you're trying to get some readout from the machine, but it's not perfect, not ideal, and you get some errors along the way. The second source is gate errors: you're trying to perform a certain operation on your qubits and you cannot do it perfectly well. And the last one is everything that is not due to the actual operations you want to perform on the qubits — what I'm describing here as environmental noise. This is anything not directly related to performing something on the qubits, and it could be things like crosstalk or effects of that sort.

All right, before we move on to explaining the different error suppression and mitigation techniques, let's have a quick recap of how you can use our very own Qiskit Runtime to execute things on the hardware.
What you're seeing on screen right now is a very brief code snippet showing the different commands you have to run in Qiskit Runtime to execute using the primitives. We're basically just importing the Qiskit Runtime service, and we're importing the Sampler V2 and Estimator V2 primitives — very important: all of these slides are prepared to deal with primitives version two, so just a word of warning there. We load up a service, so we authenticate with the service, and then we get a backend; in this case we're just getting the least busy one because it's the most convenient for this exercise. Then we instantiate our primitives — either a Sampler or an Estimator, depending on what you want to do — and we apply some baseline configuration. We're going to drag these baseline settings along through the whole presentation. They could be different in your particular case, but let me motivate why we chose them. First, the default number of shots, in both the Estimator and the Sampler, we set to 1,024. This could be more or less depending on what you want, but for the sake of this example 1,024 should be good enough. Then, for the Estimator, we have two special options that we will explain at the very end of the lecture, called optimization level and resilience level, and we set both to zero. The reason is that we want to level-set all of our configuration so that there's no error suppression or error mitigation happening by default, and we can then start adding it manually ourselves, with the granular options that Runtime provides. A sketch of this baseline setup is shown below.
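As a reference, here is a minimal sketch of the baseline setup just described. It assumes a 2024-era qiskit-ibm-runtime with V2 primitives and a saved account; exact constructor signatures may differ slightly between versions.

```python
# Minimal sketch of the baseline setup (V2 primitives).
# Assumes qiskit-ibm-runtime is installed and an account is saved;
# older versions use backend=... instead of mode=... in the constructors.
from qiskit_ibm_runtime import QiskitRuntimeService, SamplerV2, EstimatorV2

service = QiskitRuntimeService()  # authenticate with the saved account
backend = service.least_busy(operational=True, simulator=False)

sampler = SamplerV2(mode=backend)
estimator = EstimatorV2(mode=backend)

# Baseline configuration used throughout this lecture:
sampler.options.default_shots = 1024
estimator.options.default_shots = 1024
estimator.options.optimization_level = 0  # no error suppression by default
estimator.options.resilience_level = 0    # no error mitigation by default
```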
You can also find references to the Estimator and Sampler options here, if you want to dig further into what's available to you.

All right, so let's begin with error suppression. As we mentioned before, error suppression tries to avoid or reduce the amount of noise that occurs during execution. We're going to see two techniques in the realm of error suppression: one dealing with the environmental noise we described before, called dynamical decoupling, and the other dealing with gate errors — in this particular case a technique known as Pauli twirling, or randomized compiling. Let's see what these techniques are about.

First of all, dynamical decoupling is useful when you have errors of the crosstalk sort. For instance, given the qubit connectivity of your device, one of the qubits may be idling — doing nothing, because other operations are happening elsewhere in the device. While that qubit is idling and waiting for operations to arrive, it remains physically connected to other qubits, which means that operations happening on those other qubits can affect the state of the qubit that is just waiting there, and this obviously induces errors. So what do we do? We can apply gates to that qubit while it's idling, in a way that amounts to the identity, so that the overall evolution is preserved: you're not doing anything different from what you were doing originally, but now the qubit is not just sitting there waiting, possibly being affected by other sources of noise. This is what dynamical decoupling is at its core, put very simply.

Now, we have to make sure that when we apply these new operations to keep the qubit from idling, we're not adding more error than we're removing. What do I mean by this? Sometimes, adding these new gates to stop the qubit from idling may induce other kinds of errors: you may be removing some environmental noise or crosstalk, but you may be adding gate errors. So there's a trade-off here, and you have to be very aware of where you stand when you utilize dynamical decoupling, because if the crosstalk noise is very small — as is the case, for instance, in our latest generation of devices like Heron — then dynamical decoupling may actually make your results worse. So be very aware of what you're running when you use dynamical decoupling.

All right, so what does dynamical decoupling look like? On the screen you can see a quantum circuit expressed as a sequence of microwave pulses that will take place on the hardware, and we see certain areas where our qubits are idling — in particular the two that I'm marking on the screen. While those qubits are idling, errors can occur, as we've already explained. So dynamical decoupling looks like what you're seeing on screen: this is one very particular example, with a very simple sequence, where all you're really doing is taking that qubit, negating it with an X gate, and then negating it again with another X gate. At the end of the day the state of your qubit remains the same, but those two qubits I marked are no longer idling while waiting for new operations to arrive — and this is basically the heart of dynamical decoupling.

One interesting thing to notice is that the other two qubits, qubits 2 and 3, seem to be idling as well. However, we don't care as much about that idling time at the very beginning of the circuit, because we can deal with it by performing state preparation at the right moment. If we just leave it like this, we avoid introducing extra gate errors that, at the end of the day, would not solve anything in terms of crosstalk, because the state preparation can simply be delayed.

All right, so how do we add dynamical decoupling using Runtime? Very easy.
We start by importing the Sampler options and the Estimator options. And maybe this is something to point out from the previous slide: as we saw there, when you instantiate your Sampler or your Estimator, you're allowed to pass in a number of options. These are the options I'm going to show you how to configure; you can then pass them on to your primitive and be done with it. So we import the Sampler options and the Estimator options, and we configure the baseline we explained before: the default shots set to 1,024, and, in the case of the Estimator, the optimization level and resilience level set to zero — again, just to avoid having any baseline mitigation or suppression happening.

From that moment on it's super easy: all you do is set options.dynamical_decoupling.enable to true, and that's it. That's really all you have to do, and it will get you dynamical decoupling configured and running once you submit your workload to Runtime.

But there are other things you can do on top of that. You can select the sequence type you want to insert. In the previous example the sequence we inserted was XX — a couple of X gates which, again, amount to the identity — but you can try other types of sequences; visit the documentation to see all the sequences available. You can, for instance, use an X+/X- sequence or an XY4, and in certain cases those other sequences will give you better results. So experiment, play with it, and see what makes the most sense for your particular workload. Another configuration available for dynamical decoupling is the extra slack distribution: in case there is some slack time to deal with during dynamical decoupling, you have to choose where to place it, and the default is to place it in the middle of the interval you're decoupling over. Last but not least, you also have the scheduling method — how you are going to schedule your sequence of pulses — where the default is "as late as possible". And that's it; this is really all the configuration there is for dynamical decoupling. As you see, it's very easy to use; here's a sketch:
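A minimal sketch of the dynamical decoupling configuration just described, under the same version assumptions as before (the same dynamical_decoupling option group also exists on the Sampler options):

```python
# Sketch: enabling dynamical decoupling on top of the baseline options.
from qiskit_ibm_runtime import QiskitRuntimeService, EstimatorV2
from qiskit_ibm_runtime.options import EstimatorOptions

service = QiskitRuntimeService()
backend = service.least_busy(operational=True, simulator=False)

options = EstimatorOptions()
options.default_shots = 1024
options.optimization_level = 0
options.resilience_level = 0

options.dynamical_decoupling.enable = True
# Optional extras (defaults shown; try other sequences and compare):
options.dynamical_decoupling.sequence_type = "XX"          # or "XpXm", "XY4"
options.dynamical_decoupling.extra_slack_distribution = "middle"
options.dynamical_decoupling.scheduling_method = "alap"    # as late as possible

estimator = EstimatorV2(mode=backend, options=options)
```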
Moving on, we have randomized compiling, or twirling. This is a very general technique — it comes originally from randomized benchmarking — and we're not going to try to explain all the details; the mathematical intricacies are quite involved and could take a long time. I'm just going to give you a brief intuition of what it is in the context of error suppression.

So, what are the particular features of Pauli twirling that are relevant for error suppression? In general, we use twirling to convert an arbitrary noise channel into another form of noise — and by arbitrary noise channel we really mean any channel that is CPTP — transforming it into a different type of channel. The way this is accomplished is actually quite interesting: instead of running a single modified circuit, we run an ensemble of circuits that are all equivalent in some way, and the effect this has is to give our noise a different structure.

Out of all the possible twirling techniques, the one we're looking at here is so-called Pauli twirling, a special kind of twirling that projects the channel being twirled onto the Pauli basis — it converts the channel into a Pauli channel. How does this work? Very easy: this is the first identity you have here on screen. You take a general CPTP channel, which we're calling Lambda, and all you do is sandwich it between Paulis, in this fashion, where the Pauli before and after are the same — that's why both of them are called P. You sandwich it between Paulis, but you do that randomly: instead of doing it just once, you do it several times, and every time you choose a different Pauli at random. When you aggregate all the results of this ensemble of new circuits, what you end up with is effectively a new channel, which we're calling Lambda-P, which is just the projection of the original channel onto the Pauli basis — it is, in fact, a Pauli channel.

All right, but you may be wondering: if we want to do this to the noise in our circuit, how can we actually apply it to a quantum circuit? At the end of the day, it's not like we can see the noise channel expressed in our circuit and just go sandwich it — and you're completely right. So, to understand how this works, let me present a second identity that you're going to need. Say you want to twirl the noise associated with a certain unitary. What you do is, for that ideal unitary, find an ensemble of different expressions that are equivalent to the original unitary you want to twirl. That means we're going to have expressions of the form: a Pauli, then the original unitary, then another Pauli P-prime — such that the sequence P U P' is the same as the original U. Again, these Paulis are chosen at random, so you don't have just one such identity — you have several of them.
An example: if the unitary is a CNOT, the original CNOT is equivalent to applying a Z on qubit 0, then the CNOT, then a Z on qubit 0 again; or applying an X on qubit 0, then the CNOT, then an X on qubit 0 and an X on qubit 1. You can check that all of these are equivalent — the overall operation being applied is the same — and the idea is that you randomly choose which one you execute, hence the ensemble of circuits. In this particular case of the CNOT we're only showing four of these equivalences, but there are a total of 16. The reason there are 16 is that for two qubits we have 16 possible Paulis, and the CNOT gate is a Clifford gate, which means that any Pauli can be permuted through the CNOT and will simply transform into another Pauli. So you effectively have as many combinations as you possibly can, which is 16. (Two of these equivalences are written out below.)
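To make this concrete, here are two of the sixteen CNOT equivalences written as operator identities — a sketch of the math described above, with qubit 0 as the control and operators composed right to left (the rightmost Pauli is applied first):

```latex
% Two of the 16 Pauli-sandwich identities for CNOT (control = qubit 0):
% Z on the control commutes straight through the CNOT,
(Z \otimes I)\,\mathrm{CNOT}\,(Z \otimes I) = \mathrm{CNOT},
% while X on the control propagates to an X on both qubits.
(X \otimes X)\,\mathrm{CNOT}\,(X \otimes I) = \mathrm{CNOT}.
```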
The same thing happens for the ECR gate — nothing too fancy about it, just a different set of combinations.

All right, before we move on, let me re-express this last identity, P U P' = U, in a form that is slightly more useful for understanding the math behind this, without doing any numbers. You can convince yourself that the two forms are equivalent by multiplying both sides by P' on the right: on one side you get P U P' P', and those two P-primes cancel (Paulis are self-inverse), while on the other side you get U P'. So at the end you see that, regardless of the unitary, saying that P U P' = U is the same as saying that P U = U P'. This is just an algebraic construct — nothing too fancy — but you'll see why we want it for the derivation.
So now you have these two identities: first, what Pauli twirling does to an arbitrary channel — projecting it onto the Pauli basis — and second, how to sandwich a particular unitary between Paulis while preserving the overall operation. Now we can see what happens when we sandwich a unitary that is noisy, not ideal as before. This is what we're expressing here: same as before, we have P, then U-tilde — meaning the original, ideal unitary but with noise attached — and then P'. We break this apart by expressing the noisy unitary as the ideal unitary followed by the noise channel; that's just how we define it, nothing fancy going on.

In principle, what we would like is to insert a Pauli right here, where I'm marking, so that the identity from the beginning applies and I can twirl my noise channel. But this is not directly possible, of course, because the ideal unitary sticks to the noise channel and you cannot separate them — that's part of the problem. So what we do is this: by sandwiching between P and P', we can apply our previous identity. In here we have U P', which, as we saw, is the same as P U, so it can be substituted. After the substitution, what we're left with is P, then the noise channel, then P again — and that is exactly our first identity. So you can convince yourself, with very simple symbolic algebra, that applying this ensemble in this way effectively gives us a new noisy unitary, but one where the noise attached to it is Pauli noise instead of whatever arbitrary noise was attached before. Schematically:
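In symbols, here is a compact sketch of the argument just described (superoperator notation; composition reads right to left, and P, P' denote Pauli conjugations):

```latex
% First identity: averaging over Pauli sandwiches projects any CPTP
% channel \Lambda onto a Pauli channel \bar{\Lambda}:
\bar{\Lambda}(\rho) = \frac{1}{4^n} \sum_{P} P\,\Lambda(P \rho P)\,P .
% Second identity (U Clifford): for every Pauli P there is a Pauli P'
% such that P U P' = U, i.e. U P' = P U.
% Sandwich the noisy gate \tilde{U} = \Lambda \circ U and average over P:
\mathbb{E}_P\!\left[ P \circ \Lambda \circ U \circ P' \right]
  = \mathbb{E}_P\!\left[ P \circ \Lambda \circ P \right] \circ U
  = \bar{\Lambda} \circ U .
```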
That's all there is to it. It's a bit of a construct, and if you want to see the full mathematical derivation I encourage you to look at the literature, but it can take a while to follow along.

What do we use this for? Essentially, as we said, to suppress the effects of noise — in particular, the effects of coherent noise. We know that coherent noise tends to accumulate quadratically as we run a circuit; on the other hand, when we apply Pauli twirling and get Pauli noise, that noise accumulates only linearly. So by converting all your noise into Pauli noise, you're making the noise stack up less dramatically, more slowly — and in that sense you're suppressing it.

There are also second-order benefits to Pauli twirling that we'll discuss a little later; I'll just give you a brief introduction now. By virtue of having Pauli noise instead of some general random noise, you're making your noise somewhat more predictable. For instance, if your noise channel were an amplitude damping channel, you would know that you're always drifting towards the ground state of your system, and your results would degrade in the way characteristic of amplitude damping; if you had another kind of channel, your results would degrade in a different way, and fundamentally you don't know in advance which way that is. Pauli noise, however, we know to be unital, and this means the quantum state you're representing always degrades in the same way: towards the maximally mixed state. This already gives us information that we'll be able to use later to perform error mitigation, as we'll see once we get to ZNE.

All right, enough of the theory — let's jump into how we can use Pauli twirling with Runtime.
Once again, we see the exact same first lines, where we create either the Sampler options or the Estimator options, depending on which primitive you want to apply Pauli twirling with, and then we have a bunch of configurations we can turn on. First, we have twirling.enable_gates, which we set to true; this performs Pauli twirling on every two-qubit gate in your circuit. Then we have another one called twirling.enable_measure, and we set that to false. What this option does is Pauli twirling as well, but on the measurement gates. We're setting it to false for now because the gate case is what we've shown so far from a theory point of view, but we'll get to what measurement twirling is very shortly.

The other options are the number of randomizations — this has to do with the ensemble of circuits you're creating, as mentioned before: how many circuits you want in your ensemble, chosen randomly of course. By default it is set to auto, so it will be chosen for you depending on the particular workload you provide. Then you have the shots per randomization, which tells you how many shots to take for every circuit in the ensemble; again, by default it's set to auto, which basically means it will look at the total number of shots you want, look at the number of randomizations you have, and figure out how to split that up accordingly.

And last but not least, we have the twirling strategy. This is a slightly fancier configuration, and I encourage you to read the twirling options in the documentation if you want to fully understand what's happening here, but basically the default says: twirl only the qubits that are active in the particular layer you're twirling, but do it in an accumulative way. What this means is: if I start twirling my circuit and in the first layer the active qubits are, say, 0 and 1, then I twirl qubits 0 and 1. Then I move to the next layer, where qubits 2 and 3 happen to be active, so I twirl qubits 2 and 3 — but I also twirl qubits 0 and 1, because it's accumulative. That's the active-accumulative strategy for performing the twirling. And that's it: you configure your twirling like that, pass in the options, run your workload, and Runtime will take care of it for you in a seamless way. Here's a sketch:
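A minimal sketch of the twirling configuration just described, with the defaults written out explicitly (same version assumptions as the earlier sketches; the same twirling group exists on the Sampler options):

```python
# Sketch: configuring Pauli (gate) twirling with Runtime.
from qiskit_ibm_runtime.options import EstimatorOptions

options = EstimatorOptions()
options.default_shots = 1024
options.optimization_level = 0
options.resilience_level = 0

options.twirling.enable_gates = True         # twirl every two-qubit gate
options.twirling.enable_measure = False      # measurement twirling off, for now
options.twirling.num_randomizations = "auto"       # size of the ensemble
options.twirling.shots_per_randomization = "auto"  # total shots split evenly
options.twirling.strategy = "active-accum"         # accumulate active qubits
```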
Okay, moving along: error mitigation. Everything we've talked about so far was error suppression — trying to reduce or avoid the impact of noise during execution. Now we come to error mitigation, and again, as a reminder: when we do error mitigation, we let the noise occur, but afterwards we want to clean up our signal and get a better result. For this we're going to show two particular techniques. One of them is called TREX, and it has to do with mitigating readout errors, the special type of SPAM errors we introduced before; the other is ZNE, or zero-noise extrapolation, which deals with gate errors. Let's get to it.

Before we begin explaining how to mitigate errors in the readout, let me give a very brief introduction to what these errors look like. First of all, they are caused simply by the fact that when you have a qubit and you measure it, you may not read exactly what was prepared in the qubit — some bit flips may happen during the measurement process. The nice thing about this kind of error is that it can be modeled as a classical noise channel acting on the bits of the readout, at least in a simplified scenario. Furthermore, you can measure these readout errors on a per-qubit basis: you look at qubit 0 and ask what its readout error is, then qubit 1, and so on. This is what we're showing in the picture below: four matrices, which read as follows. For qubit 0, the probability of preparing state 0 and reading state 0 is basically 99.2%; however, the probability of preparing state 0 but reading state 1 is 0.8%. So this is just a random bit flip that occurs with low probability — but it does sometimes occur. In the same way, the probability of preparing a 1 but reading a 0 is 0.4%, and the probability of preparing a 1 and reading a 1 is 99.6%. You repeat that same process for every single qubit; this is done via a calibration process.

What you can do then is construct a similar matrix for a higher-dimensional space — the space of all four qubits at once — simply by taking the tensor product of all the single-qubit matrices. That gives you a readout error transfer matrix for the whole four-qubit space. Now, that is a bit of a simplification: certain forms of readout error may have correlations, so you may not be able to build the full matrix from the simple single-qubit ones, but it's a good enough approximation for us today.

Okay, now that you have this matrix — the big matrix represented here — the way it reads is, for instance: if I prepared state 0000000, what is the probability of reading 0000001, and so on and so forth. If we have the matrix, we could in principle invert it classically and apply the inverse to our sequence of readouts; that way we would be mitigating, because we're applying the inverse of the readout error matrix. The problem is that this matrix scales exponentially with the number of qubits: the more qubits you have, the larger it gets, and very quickly you run into a situation where inverting it becomes infeasible. So what do we do about that? That's the main problem in performing readout error mitigation. (A tiny numerical illustration of the naive inversion idea follows.)
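Here is a toy illustration of that naive inversion idea for two qubits. Qubit 0's numbers are taken from the slide; everything else (qubit 1's matrix, the measured distribution) is hypothetical, just to show the mechanics — and the exponential scaling of the tensor product is exactly what TREX avoids.

```python
# Toy sketch (hypothetical numbers): tensor per-qubit readout matrices,
# then "invert" the readout noise on a measured distribution.
import numpy as np
from functools import reduce

# Columns: prepared state; rows: read-out state (per qubit).
A0 = np.array([[0.992, 0.004],
               [0.008, 0.996]])
A1 = np.array([[0.990, 0.006],
               [0.010, 0.994]])

# 4x4 transfer matrix for both qubits (bit ordering is illustrative).
A = reduce(np.kron, [A0, A1])

p_noisy = np.array([0.50, 0.02, 0.03, 0.45])  # measured over 00,01,10,11
p_mitigated = np.linalg.solve(A, p_noisy)     # apply the inverse
print(p_mitigated)
```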
For this we have a technique called TREX, which stands for twirled readout error extinction. What it does, basically, is use the concept of measurement twirling, which is essentially Pauli twirling — but instead of applying it to a unitary in the middle of your circuit, you apply it to the measurement gates. Before, we had all those equivalences for the CNOT, for instance; what we have now is that a simple measurement gate happens to be exactly the same, logically speaking, as performing first an X gate — a negation of the quantum bit — then the measurement, and then a negation of the resulting classical bit. You can convince yourself that these two are the same. In this way we've found an equivalence, and we can apply the same theory we saw before to transform the readout error attached to the measurement gate into a different type of error that is diagonal. That's the key insight: if you apply TREX in this way, the readout error transfer matrix becomes diagonal, and therefore trivial to invert. That's the power of this technique.

The cost you pay is that you need to calibrate that diagonal matrix. This can be done relatively efficiently, so there's no problem on that front, but you do pay the extra cost of running a bunch of identity circuits to identify what the matrix looks like after performing the twirled measurements. One caveat of this technique is that it can only be used for expectation value problems: this readout transfer matrix is expressed for expectation values, not for the readouts you usually get in terms of bit strings. So TREX can fundamentally only be applied to problems that compute expectation values.

All right, so how do we perform TREX using Runtime? In this case, you see that instead of importing both the Sampler options and the Estimator options, we only import the Estimator options. The reason is that, because TREX only applies to expectation value problems, it cannot be applied to the Sampler;
the only primitive we can apply TREX to is, naturally, the Estimator. So, very easy: we take the Estimator options once again, with our baseline settings of optimization level zero and resilience level zero, just to clear the ground of any default mitigation or suppression, and then we configure TREX. How? It's very easy: you go to the options.resilience settings and set measurement mitigation to true, and that automatically activates TREX for you.

Once again, you have a couple of additional options you can configure, under measurement noise learning. These have to do with the calibration — the learning of the diagonal matrix: how many randomizations you want to run in order to do the measurement twirling and calibrate the matrix (the default is 32), and, in the same way, how many shots to perform per randomization for learning the noise. That one is usually set to auto, which means it will take however many shots you want to run, take the number of randomizations you requested, and compute the shots per randomization accordingly. That's it — that's all the configuration there is for TREX, and all you need to do to mitigate your readouts using Runtime. There's really no reason not to do it: it's a powerful technique that's super easy to use. A sketch:
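A minimal sketch of the TREX configuration just described, under the same version assumptions as the earlier snippets:

```python
# Sketch: turning on TREX (measurement error mitigation) in the Estimator.
from qiskit_ibm_runtime.options import EstimatorOptions

options = EstimatorOptions()
options.default_shots = 1024
options.optimization_level = 0
options.resilience_level = 0

options.resilience.measure_mitigation = True  # activates TREX
# Calibration (learning) of the diagonal readout matrix:
options.resilience.measure_noise_learning.num_randomizations = 32
options.resilience.measure_noise_learning.shots_per_randomization = "auto"
```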
One thing to keep in mind: we saw before, when talking about Pauli twirling, that there was an option called twirling.enable_measure, which we set to false. What that option does — now that you know — is perform twirling on the measurement gates, which is of course necessary in order to perform TREX. So whenever you set measurement mitigation to true, the twirling.enable_measure option is automatically set to true as well, because TREX requires it. If you want to learn more about the different options and what they do, you can always visit the documentation, where you'll find really nice explanations of what you can do with this software.

All right, so the last error mitigation technique I'll be talking about today is zero-noise extrapolation, or ZNE. This one is, I guess, the lengthiest to explain, but it's also the most powerful, so stay tuned.
First of all, zero-noise extrapolation is a very easy technique to understand from a conceptual point of view, but it is rather difficult to implement once you go into the details of what has to happen on the hardware. So it's a really good thing that IBM Runtime provides this at basically no cost to the user, because before we had these tools it was very involved, and you had to be very much a domain expert to perform these techniques.

ZNE is divided into two phases — very simple. You usually have some noise factor — some amount of noise present in your circuit — and, just for the sake of representation, we mark 1 as the base level of noise: when you run your circuit without doing anything, that's your noise level 1, and you compute an expectation value out of that. Again, ZNE, like TREX, only works for expectation value problems — so, again, only for the Estimator. So you compute your expectation value and you see here that it's roughly 0.39, 0.38, something like that, while the exact value you're expecting is actually 1 — so obviously you're not getting quite the result you're looking for.

What do you do to mitigate these errors? There are two phases in ZNE, and the first one is: you amplify the noise in your circuit and take more measurements of these expectation values. In this case, for instance, we have a noise factor of 1.5, meaning I make the noise 1.5 times larger than the baseline, and a noise factor of 2, meaning I double the amount of noise in my system. And, fair enough, as we make the noise larger and larger, our result degrades more and more, getting farther from the exact value we wanted to measure.

But now we go into the second phase of ZNE, which is extrapolation. We take these data points and use them to fit a curve, and actually extrapolate what the expectation value would be in the zero-noise limit. This is what it looks like: in this case we have fit an exponential curve and recovered a value very close to the exact one. So the nice thing about ZNE, as you can see, is that it has great potential for mitigating errors. The downside is that, again, it's only valuable for expectation value problems, and we'll see later that it can be a little unstable — you have to be careful when doing ZNE, because you could also overshoot and end up far from the exact result.

All right, so this is the overall picture of what ZNE does — very simple: you make your results incrementally worse, and then you try to extrapolate, from that information, what you would have had if there were no noise. But how do you amplify the noise to begin with? How do you make your results get worse in a controlled way?
This is actually quite a difficult problem to solve, and many people have been trying to address it for quite a while. In particular, we can look at three different techniques for amplifying the noise.

The first one is pulse stretching. This technique was introduced in a 2019 paper by some of our colleagues here at IBM, and it was featured in Nature. The technique is about looking at the microwave pulses happening on the hardware and trying to come up with pulses that represent the exact same action on the qubits, but are, in a sense, more susceptible to noise — in a way that is controlled. That's the key concept in noise amplification: you have to be very careful to control the amount of extra noise you're putting in. What this means in the particular case of pulse stretching, under certain assumptions and simplifications, is that instead of having very brief pulses, you extend the pulses so they last longer — but of course their amplitude has to be smaller so that the overall operation is preserved. You can think of it as retaining the area under the curve of the pulse while stretching the pulse out, so that it's more susceptible to noise. If you do that in a very precise and controlled way, you can increase the noise in a suitable manner, so that you can recover information later via extrapolation. The downside of this technique, even though it's very cool, is that it's difficult to reproduce: it requires low-level access to the hardware, it's sometimes not trivial how to do it, and you really have to be an expert in these devices to perform this kind of noise amplification.

So people started coming up with alternatives, and one of the first to appear is the concept of gate folding.
The way gate folding works is very simple, because it's based on a heuristic. You have a gate with some noise attached to it, and what you do is apply that gate, then its inverse, and then the gate again. The overall evolution of gate–inverse–gate is the same as just applying the gate, but you're now applying more operations, and therefore you expect the noise to increase. This is very nice, for instance, when the gate is a CNOT, because the inverse of a CNOT is also a CNOT: you move from applying one CNOT to applying three CNOTs, which essentially amounts to a noise factor of three — three gates, three times as much noise. You can keep going: five CNOTs, again, is the same as one, but introduces five times as much noise. This technique is a heuristic: the theoretical guarantees that come with it are limited to very particular and specific types of noise, but we have seen that in practice it can give very nice results, and it's cheap to do. You don't have to run any calibrations, you don't have to learn pulse sequences, you don't have to do anything like that — it's as easy as tripling or quintupling the gates, depending on the amount of noise you want to insert. Here's a toy sketch of the idea:
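The following is only an illustration of the folding heuristic itself — Runtime's ZNE performs folding for you; the helper fold_cnots is a hypothetical name introduced here for the example.

```python
# Toy sketch of gate folding: replace each CNOT by an odd number of CNOTs
# (CNOT is its own inverse), so the logical action is unchanged but the
# two-qubit gate noise is multiplied by the noise factor.
from qiskit import QuantumCircuit

def fold_cnots(circuit: QuantumCircuit, noise_factor: int) -> QuantumCircuit:
    """noise_factor must be odd: 1 -> unchanged, 3 -> G G G, 5 -> G G G G G."""
    assert noise_factor % 2 == 1, "folding only yields odd noise factors"
    folded = QuantumCircuit(*circuit.qregs, *circuit.cregs)
    for inst in circuit.data:
        reps = noise_factor if inst.operation.name == "cx" else 1
        for _ in range(reps):
            folded.append(inst.operation, inst.qubits, inst.clbits)
    return folded

qc = QuantumCircuit(2)
qc.h(0)
qc.cx(0, 1)
print(fold_cnots(qc, 3))  # same unitary, ~3x the CNOT noise
```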
All right, our very last noise amplification technique is one that was featured in a recent IBM paper — the one that showcased the utility of quantum computers in finding solutions that compete with classical methods, at least when those are limited to brute-force calculations. This technique is known as probabilistic error amplification, or PEA for short. It requires a certain amount of learning of the noise that happens in your circuit, followed by some additional steps, but it's not as involved as pulse stretching — so it's kind of an in-between. The nice thing about this technique, compared to the previous one, gate folding, is that it comes with very strong theoretical guarantees — strong theoretical backing that what you're doing is correct and that the noise you're inserting is the noise you need. So that's what I have for you in terms of noise amplification; we'll revisit PEA later, and I'll talk in a bit more detail about how it works.

The second step in ZNE is extrapolation — and a word of warning here. Many people think the difficult part of ZNE is noise amplification, because, of course, it's not trivial: we've just seen the different methods, the difficulty of establishing theoretical guarantees that the noise you apply is exactly the amount you want, and so on. And there are good reasons to think noise amplification is very difficult, because it is. The problem is that this line of thought sometimes leads people to think the extrapolation part is almost trivial, just because it's something we're more used to in a classical setting. However, it's actually key to getting good results from ZNE: the extrapolation plays an equally important role as the noise amplification, and it's make-or-break — it's that simple. If you don't do good extrapolation, you're not going to get good results, and being able to perform good extrapolation will get you a long way. So what are the complications with extrapolation? Let me point out a few of them.
First of all, we have certain theoretical and experimental results predicting that the decay of our expectation values will have an exponential shape. This is particularly the case when we twirl the noise in our circuit. Why is that? We mentioned before that when we tailor the noise into a Pauli channel, we know that as our quantum state degrades, it always tends towards the maximally mixed state. So for pretty much any Pauli observable you may want to compute, the expectation value of that observable is going to tend towards zero. This information we now have: whatever this curve looks like, we know it decays to zero. So, once again, twirling has the second-order benefit of assisting the extrapolation in ZNE.

Now, what's the problem with exponential extrapolation? The problem is that, while it can mitigate very aggressively — it can remove a lot of the error in your computation — it's also very unstable. The reason it's unstable is that you don't know the fundamental scale of the exponential curve you're fitting. The scale of this curve, as you see here, is given by this parameter a, and that a is, in and of itself, the zero-noise limit you're trying to compute. So in a sense, in order to perform a good extrapolation, you need to already know the result of the extrapolation — which is, of course, not easy to deal with.

In practice, what can happen with exponential extrapolation is what I'm depicting at the bottom of the slide. Every one of these pictures shows the exact same points — the same noisy expectation values — but we can fit several different curves that vary widely. In the first case, for the exact same three points, the predicted zero-noise expectation value is 0; in the next one it's 0.2, then 0.5, then 1 — and all the curves seem to fit. In that sense exponential extrapolation becomes unstable, and this translates into the error bars of the extrapolated result blowing up and becoming massively large. So exponential extrapolation can be tricky on that front.

What can you do if exponential extrapolation fails? You can resort to more conservative approaches, such as polynomial extrapolation. Polynomial extrapolation is nice because it's stable, and the reason it's stable is that it retains the rough scale of your noisy results. Of course, the downside of retaining that scale is that you won't be able to mitigate as much as you could with an exponential extrapolation. So there's a trade-off to take into account, and depending on the problem and the regime you're working in, you may want one or the other. All in all, extrapolation can be very powerful, but you need to be careful about how you deal with it. Here's a toy sketch contrasting the two fits:
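The data below are hypothetical, chosen only to resemble the slide's example; the point is to show how each model's zero-noise intercept is read off.

```python
# Toy sketch: fit the same noisy data with an exponential and a linear
# model, then compare the zero-noise intercepts of the two fits.
import numpy as np
from scipy.optimize import curve_fit

noise_factors = np.array([1.0, 1.5, 2.0])
expvals = np.array([0.39, 0.28, 0.21])  # noisy expectation values

def exp_model(x, a, b):
    # 'a' is both the scale of the curve and the zero-noise limit itself,
    # which is why exponential fits can be unstable.
    return a * np.exp(-b * x)

(a, b), _ = curve_fit(exp_model, noise_factors, expvals, p0=(1.0, 0.5))
print("exponential zero-noise estimate:", exp_model(0.0, a, b))

# Linear fit: more conservative and stable, but mitigates less.
slope, intercept = np.polyfit(noise_factors, expvals, 1)
print("linear zero-noise estimate:", intercept)
```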
Let's see how we implement this in Runtime, then. That was basically the whole explanation of zero-noise extrapolation. Once again, we begin by importing our Estimator options — ZNE is a technique that only works with expectation values, so it only works with the Estimator; no Sampler allowed here. We begin, once again, by setting our baseline options to optimization level zero and resilience level zero, to avoid any additional mitigation being there by default, and then you configure ZNE — which, again, is super easy. You just set options.resilience.zne_mitigation to true, and that's it: that turns on ZNE for you. Nothing more to do.

If you want, you can also play around with some configurations inside ZNE. For instance, you can configure the noise factors to apply to your circuit; the default values are 1, 3, and 5. The reason is that the default amplification Runtime uses today is based on gate folding, and because of the structure we explained before — applying one CNOT, or three CNOTs, or five CNOTs — those are the natural noise factors to work with. However, we have ways internally to deal with other, fractional noise factors, so you could in principle also use 1.5, 2, or whatever you want; the default is just 1, 3, 5, and you can play around with it. One reason you may want to change this setting is if your circuit is very deep. If your circuit is very deep, then by the end of the execution the noise will be very large even at the base level, and your result will be very damped — out towards the tail of the exponential. If you then multiply the noise by three, you're pretty much going to lose all the signal. A good way to deal with very deep circuits is to use other noise factors, for instance 1.2 and 1.5; those work better in such situations.

The last option you can configure in ZNE is the extrapolator. The latest version of the primitives in Runtime has this really cool option where you can provide several extrapolators instead of just one. By default we have exponential followed by linear, which means the extrapolator will try an exponential extrapolation first; if that fails — according to a heuristic involving the error bars and instability we already explained — it falls back to a linear extrapolation, giving you a result that is a little more stable, although probably mitigating slightly worse. So these are the options for ZNE with Runtime — again, very easy to use, with a really low barrier to entry: you don't have to take care of the amplification or the extrapolation yourself; it's all handled for you through these very nice and simple options. Here's a sketch:
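A minimal sketch of the ZNE configuration just described, under the same version assumptions as the earlier snippets:

```python
# Sketch: enabling ZNE in the Estimator with explicit noise factors
# and a chain of extrapolators, tried in order.
from qiskit_ibm_runtime.options import EstimatorOptions

options = EstimatorOptions()
options.default_shots = 1024
options.optimization_level = 0
options.resilience_level = 0

options.resilience.zne_mitigation = True
options.resilience.zne.noise_factors = (1, 3, 5)  # defaults, suit gate folding
# For very deep circuits, smaller factors such as (1, 1.2, 1.5) may help.
options.resilience.zne.extrapolator = ("exponential", "linear")
```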
If you want to learn more about them — as with all the other mitigation techniques — we encourage you to visit the documentation and read more.

All right. Since the utility paper IBM published some time ago has proven so relevant in pushing the field into what we call the utility era, let's talk a little about probabilistic error amplification for ZNE: what it means to amplify the noise probabilistically, and how to use it with Runtime.

The first thing to understand is that PEA is just the amplification: it's a particular way of doing noise amplification within ZNE. The second is that it's based on a concept very similar to twirling — it actually performs twirling internally — which has to do with executing ensembles of circuits instead of just one. What it basically does is take the original circuit and create a bunch of circuits that are equivalent in some way, but with some random additional gates inserted, which cumulatively — once you look at the results coming from the whole ensemble — have the effect of amplifying the noise.

How is this done? First of all, you have to analyze all the different layers in your circuit, so this is very circuit-dependent. Then, for every one of those layers, there are two tasks to perform. The first is to learn the noise associated with that layer, done in three steps: step one, twirl the layer so that the noise you have to learn is Pauli noise — again, more predictable and easier to deal with; step two, build circuits where you repeat identity pairs of layers over and over; step three, from those executions, derive a fidelity for each noise channel, meaning each of these layers. That's the first task: you basically run a calibration to learn the noise present in the different layers of your circuit. The second task is to inject the noise. Now that you have learned and calibrated it — much like we calibrated the readout noise in TREX before — you have to inject it somehow, and the way we do that is, again, via an ensemble of circuits, inserting random gates depending on the particular instance of the random circuit being generated. That's how PEA works, in very layman's terms. Once again, the nice thing about this technique is its very general applicability and very strong theoretical backing: there is really strong theoretical support, under very general assumptions, that this should work.

Okay, how do we try PEA in Runtime? As of today — as of the recording of this video — PEA is available via the experimental options in the Estimator.
Okay, so how do we try PEA on Runtime? As of today, as of the recording of this video, PEA is available via the experimental options in the Estimator. The way to configure it, as you see here, is that lines 1 to 8 are exactly the same piece of code we had before for configuring ZNE, and we're essentially adding lines 10 to 15 on top. Line 10 just says options.experimental, and what you're doing there is setting the ZNE amplifier equal to PEA. The default amplifier would be gate folding instead of PEA, but through this experimental option you can ask: hey, please perform the noise amplification using the PEA method. This is going to trigger the noise learning automatically, and it's going to trigger the randomized insertion of noise in the correct way for you, without you having to do anything about it.
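Concretely, a minimal sketch of that configuration, assuming a recent qiskit-ibm-runtime release. The nested dictionary layout of the experimental option is an assumption based on the API at the time of recording (newer releases expose it directly as options.resilience.zne.amplifier), so check the documentation for your version:

```python
from qiskit_ibm_runtime import QiskitRuntimeService, EstimatorV2 as Estimator
from qiskit_ibm_runtime.options import EstimatorOptions

# Same ZNE configuration as before.
options = EstimatorOptions()
options.resilience_level = 0                      # start from no default mitigation
options.resilience.zne_mitigation = True          # turn on ZNE
options.resilience.zne.noise_factors = (1, 3, 5)  # noise amplification factors
options.resilience.zne.extrapolator = "exponential"

# Experimental option: amplify noise probabilistically (PEA) instead of gate folding.
options.experimental = {"resilience": {"zne": {"amplifier": "pea"}}}

service = QiskitRuntimeService()
backend = service.least_busy(operational=True, simulator=False)
estimator = Estimator(mode=backend, options=options)
```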
So, speaking precisely about learning the noise: the other options that you have regarding PEA concern how you want to learn the noise, and for this there are four particular options, grouped under what is called layer noise learning. The first one is the maximum number of layers that you want to learn. Say your circuit has, for instance, 10 layers: what's the maximum number of layers that you want to calibrate? Because the more layers you calibrate, the more expensive it gets, and the more time you're going to have to spend on the quantum computer. The default is four, and this is usually okay, because typical circuits have a very repetitive structure, so you usually don't have more than four distinct layers; but in certain scenarios you may have to bump it up, and if you just want to learn all the layers regardless of how many there are, you can set this option to None and every single layer it finds will be learned. The second option is the number of randomizations, again very similar to twirling (as a matter of fact, it performs twirling): out of the ensemble of circuits that you create for learning the noise, how many of them do you want to actually realize? The default is set to 32. In a very similar spirit, you have the number of shots per randomization, which defaults to 128. And finally, the last option you can configure is the layer pair depths that you want to use for learning the noise: basically, during the calibration experiments, which depths do you want to explore. Once more, for more information on what the different options are and how you can use and configure them, please visit the documentation page; a sketch of these settings follows below.
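As an indication, here is how those four settings might look, using the layer_noise_learning fields exposed by qiskit-ibm-runtime; the layer_pair_depths values shown are the library defaults at the time of writing:

```python
from qiskit_ibm_runtime.options import EstimatorOptions

options = EstimatorOptions()

# Layer noise learning knobs used by PEA's calibration stage.
lnl = options.resilience.layer_noise_learning
lnl.max_layers_to_learn = 4                    # set to None to learn every distinct layer
lnl.num_randomizations = 32                    # circuits in the learning ensemble
lnl.shots_per_randomization = 128              # shots for each random instance
lnl.layer_pair_depths = (0, 1, 2, 4, 16, 32)   # repetition depths to sample
```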
So these are all the mitigation techniques that we've seen so far, and we've seen how to configure each of them individually; let's make a quick recap of what we have. First of all, different types of noise, or different sources of noise, require different suppression and mitigation techniques to address them. For instance, you have environmental noise, in the form of crosstalk as an example, and you can use dynamical decoupling to deal with that. You can also have gate errors, which you can address using ZNE, and ZNE with Pauli twirling as a combined action. And last but not least, you have readout errors, which you can mitigate by implementing TREX. However, your circuit may contain several of these types of noise; you don't necessarily have just one of them. In a similar way, you can combine all of these techniques to mitigate all of the errors at the same time, and this is how that would happen on Runtime. What you're seeing right now, and I know it's a little bit of a longish block of code at about 30 lines, is really just all the different blocks of code that you've seen before pasted together (except for the PEA part): just the basic configuration of all the previous mitigation techniques. As you see, we're of course using the Estimator, so that we can include the techniques that only apply to expectation values, but the same would work for the Sampler after removing the TREX and ZNE configuration. We begin by importing the Runtime service, the Estimator, and the Estimator options; in line 3 we create our baseline options, again setting the optimization level to zero and the resilience level to zero to avoid any default mitigation or suppression that may be happening; then from lines 5 to 9 we configure dynamical decoupling, from lines 11 to 16 we configure Pauli twirling, from lines 18 to 21 we configure TREX, and from lines 23 to 26 we configure ZNE. After that, all you do is instantiate your service, so you get access to your Quantum account; you get a backend, in this case the least busy one, as in the very first example in this lecture; and then you instantiate your Estimator, passing in the backend that you want to use and providing the options that you just configured. So this is nice: it's 30 lines, it's not necessarily long code, and you could just copy and paste it. A reconstruction of that block is sketched below.
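Here is a sketch of that combined configuration, reconstructed from the walkthrough above. The line numbers quoted refer to the code shown on screen, not to this sketch, and the specific twirling and ZNE values are placeholder choices, so verify names and defaults against the documentation for your release:

```python
from qiskit_ibm_runtime import QiskitRuntimeService, EstimatorV2 as Estimator
from qiskit_ibm_runtime.options import EstimatorOptions

options = EstimatorOptions()
options.optimization_level = 0   # baseline: no default circuit transformations
options.resilience_level = 0     # baseline: no default mitigation

# Dynamical decoupling (suppresses environmental noise such as crosstalk).
options.dynamical_decoupling.enable = True
options.dynamical_decoupling.sequence_type = "XpXm"

# Pauli twirling (tailors gate noise toward Pauli noise).
options.twirling.enable_gates = True
options.twirling.enable_measure = True
options.twirling.num_randomizations = 32
options.twirling.shots_per_randomization = 128

# TREX (mitigates readout errors).
options.resilience.measure_mitigation = True
options.resilience.measure_noise_learning.num_randomizations = 32

# ZNE (mitigates gate errors).
options.resilience.zne_mitigation = True
options.resilience.zne.noise_factors = (1, 3, 5)
options.resilience.zne.extrapolator = "exponential"

service = QiskitRuntimeService()
backend = service.least_busy(operational=True, simulator=False)
estimator = Estimator(mode=backend, options=options)
```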
However, it's a little bit cumbersome, right? Sure, this walkthrough was very useful for you to understand exactly all the things you can configure, to understand the different suppression techniques and their nuances, and the same for the mitigation techniques; but isn't there a better way of configuring all this? Why do I have to go and set every one of these options manually? The answer is that yes, there is a better way, and you don't have to configure things in this cumbersome fashion. To address that, we introduced what is known as the resilience level. You may remember that at the very beginning we were setting our resilience level to zero, but you can actually set it to other levels, and that will automatically turn on certain mitigation techniques for you, so that you don't have to go one by one turning everything on. So what are these levels? Resilience level zero means that no mitigation is going to be performed: there are absolutely no techniques in resilience level zero, which is what we wanted before. The default, however, is resilience level one, and what it does is apply mitigation with minimal cost: only the mitigation techniques that don't add a lot of overhead to your computation. This comes in the form of mitigating readout errors, so it turns on twirled readout error extinction, or TREX, and measurement twirling as well. On top of that you also have resilience level two, which slightly increases the cost of your computation but, of course, is going to produce better results: the results you get will usually have a reduced bias, although probably some of the bias is still going to remain. The techniques it includes are all of the techniques from level one, so TREX and measurement twirling, and on top of that it adds ZNE with gate folding and gate twirling.
So that's it: if you want to run things using these defaults and you don't want to configure anything manually, you can just turn on whichever resilience level works for you and do the execution. Very simple, just one line of code: for instance, here you could say resilience level equal to two, and that's it; you could skip all of the manual configuration completely.
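For example, a minimal sketch, again assuming the V2 Estimator options:

```python
from qiskit_ibm_runtime import QiskitRuntimeService, EstimatorV2 as Estimator
from qiskit_ibm_runtime.options import EstimatorOptions

# One line replaces the manual configuration: level 2 turns on TREX,
# measurement twirling, and ZNE with gate folding and gate twirling.
options = EstimatorOptions(resilience_level=2)

service = QiskitRuntimeService()
backend = service.least_busy(operational=True, simulator=False)
estimator = Estimator(mode=backend, options=options)
```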
And with that, we get to the end of today's lecture. Just as a reminder, we have walked through the different reasons why we need to mitigate noise on current state-of-the-art quantum devices; we've worked through the different ways we have to address it; and then we did a bit of a deep dive into different error suppression techniques, namely dynamical decoupling and Pauli twirling, as well as the error mitigation techniques we chose to focus on, TREX and ZNE. I hope this was useful, and I'll see you next time.