Hi, I'm Matthew Treinish. I'm a senior software engineer at IBM Quantum, where I'm the technical lead for the development of the Qiskit SDK. Today I'm going to be talking about quantum circuit compilation with Qiskit.

Before we dive in, a quick note on terminology. In Qiskit, the quantum circuit compiler is known as the transpiler. This is a historical name that doesn't accurately represent the current capabilities of quantum circuit compilation in Qiskit: typically when you look up the term "transpiler" it refers to source-to-source compilation, like translating between programming languages. During the lecture I'll be using the terms compiler and transpiler interchangeably; if you hear me do that, I'm referring to the same thing, not two different concepts.

So why do we need quantum circuit compilation? If you've programmed a quantum computer before, you know we do it at a very low level: we program individual operations on individual qubits, which is much lower level than the programming languages you normally use for classical computation. The thing is, even at this low level, there is still a large number of abstractions between the quantum circuits you write and what actually runs on hardware, and what the capabilities of the hardware are. There are numerous constraints in current quantum systems that need to be accounted for, and when you're writing a quantum circuit you don't want to worry about all of them all of the time. The quantum circuit compiler's job is to take the abstract, higher-level circuit you construct, optimize it, and translate it into something that can run on the actual system.
To demonstrate this, I've put together a quick Bernstein-Vazirani circuit, shown on the bottom; it's a simple six-qubit, standard textbook circuit. To show how important a quantum circuit compiler is, I compiled it with three different settings: a really bad compiler setting, a good compilation, and a better one that took some effort to produce. You can see the three different outcomes. The really bad compilation is the giant circuit on top, so big you can't even make out the individual gates. The good compilation is pretty compact, only a little larger than what we wrote. The best compilation is smaller still, even smaller than the original circuit. All three are equivalent representations of the first circuit.

When we run these on hardware and look at the outcomes, the difference is stark. For the bad compilation, you can't even tell what the expected result is; the counts are mixed across all of the different outcomes. For the good compilation we get the expected result the vast majority of the time, but there are still outliers. For the better compilation, the results are clearly dominated by the expected answer. That is the difference a quantum compiler can make when running your circuit on actual hardware.
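As a rough sketch of that comparison (not from the lecture), here is how you could reproduce the bad and good settings with Qiskit's `optimization_level` knob; the hidden string is illustrative, and `GenericBackendV2` stands in for real hardware (the "better" result additionally used the qubit-reuse plugin shown later):

```python
from qiskit import QuantumCircuit, transpile
from qiskit.providers.fake_provider import GenericBackendV2

# Six-qubit Bernstein-Vazirani circuit for a hypothetical hidden string
hidden = "10110"
qc = QuantumCircuit(6, 5)
qc.x(5)                      # ancilla into |1>, then H gives |->
qc.h(range(6))
for i, bit in enumerate(reversed(hidden)):
    if bit == "1":
        qc.cx(i, 5)          # oracle marks the hidden bits
qc.h(range(5))
qc.measure(range(5), range(5))

backend = GenericBackendV2(num_qubits=6)   # stand-in for a real device
bad = transpile(qc, backend, optimization_level=0)   # minimal effort
good = transpile(qc, backend, optimization_level=3)  # heaviest preset
print(bad.count_ops())
print(good.count_ops())
```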
So to start, let's look at the constraints of actual quantum hardware. The first one is pretty obvious: the set of operations each quantum computer supports. Every quantum computer out there has different operations that are native to it, that it can actually execute. Typically, hardware implements a universal gate set that can be used to construct any unitary operation. For example, the IBM Heron systems, the current state of the art from IBM, support CZ as a two-qubit gate and RZ, √X, and X as single-qubit gates, and you can use these gates to reconstruct any unitary operation on the quantum computer.

The other thing worth noting is that all qubits do not necessarily support the same set of operations. IBM's current systems do all support the same set, but some retired platforms, like the IBM Falcon processors, actually supported different two-qubit gates between different qubits. So while you may see many quantum computers with a global set of operations supported across the whole device, that's not a universal rule.
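As a minimal sketch (not from the lecture) of what targeting that Heron basis looks like, you can hand `transpile` the basis gates directly:

```python
# Translate a textbook gate set into the Heron basis (cz, rz, sx, x).
from qiskit import QuantumCircuit, transpile

qc = QuantumCircuit(2)
qc.h(0)
qc.cx(0, 1)

native = transpile(qc, basis_gates=["cz", "rz", "sx", "x"])
print(native.count_ops())  # only cz/rz/sx/x remain
```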
The next constraint the quantum circuit compiler has to worry about is qubit connectivity. On many types of quantum computers there is limited connectivity between the qubits. For IBM's hardware in particular, which consists of superconducting quantum processors, connectivity is limited by where there is actual physical connectivity on the processor. If you look at this die photo from one of IBM's old five-qubit systems, the qubits are the squares labeled Q1, Q2, and so on, and you can see the physical connections between them; those connections are exactly the connectivity limitation on the processor. Multi-qubit operations can only run where that physical connectivity exists. Other qubit technologies may not have the same constraints, or may have different reasons for their connectivity constraints, but we'll mostly be looking at IBM hardware here, where it is a physical constraint. And while five qubits is easy to reason about by hand, quantum processors have gotten far more sophisticated in the utility era: IBM Rochester, a 53-qubit system from a few years ago that is now retired, is already much harder to reason about.
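As a small sketch (not from the lecture), connectivity is exposed programmatically as a coupling map; here `GenericBackendV2` with a line topology stands in for real hardware:

```python
# Querying which qubit pairs support two-qubit gates.
from qiskit.providers.fake_provider import GenericBackendV2
from qiskit.transpiler import CouplingMap

backend = GenericBackendV2(num_qubits=5, coupling_map=CouplingMap.from_line(5))
print(backend.coupling_map)                 # edges where 2q gates can run
print(backend.target.build_coupling_map())  # same thing, derived from the target
```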
With the current IBM Torino, a Heron processor with 133 qubits, it is much harder still to reason about the connectivity limitations when you have that many qubits. So that raises the question: what happens when there is insufficient connectivity on the processor to execute the circuit as written? In this example I've built an abstract circuit where two qubits, qubit 0 and qubit 1, each require connectivity to the other four qubits in the circuit, and if you look at the connectivity graphs on the previous slides, none of them have two qubits that are each connected to four others. What happens in this case is that the compiler inserts swap gates, or uses some other technique, to move the state between qubits so that the circuit can execute. This comes at the cost of extra operations: here you can see the circuit transformed with swap gates that move the qubit states between different qubits so it can actually run on hardware. This is potentially expensive, because every swap gate is itself three two-qubit operations: when you translate a swap to the IBM basis set, you get three CZ operations plus some √X gates. So you want to minimize swap operations; otherwise you'll be adding many new two-qubit operations to your circuit.
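A quick sketch (not from the lecture) confirming that cost:

```python
# A swap becomes three two-qubit gates once translated to the hardware basis.
from qiskit import QuantumCircuit, transpile

qc = QuantumCircuit(2)
qc.swap(0, 1)
translated = transpile(qc, basis_gates=["cz", "rz", "sx", "x"])
print(translated.count_ops()["cz"])  # 3 two-qubit gates per swap
```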
The last constraint, which is the largest consideration when compiling for a quantum computer, is noise and errors. Every single operation you perform in your quantum circuit is a potential source of noise. Gates have individual error rates; they're not perfect. You also have to worry about decoherence, which limits how long you can maintain the quantum state, and there's readout error when you measure. In this visualization you can see a wide distribution and variability in the error rates reported across three different generations of IBM quantum processors, along with the coherence times for each system. What this means is that the job of Qiskit's transpiler is to minimize the number of operations it performs, decreasing the error rate of executing the circuit and increasing the fidelity. This is the most important job of Qiskit's compilation. As we walk through how the transpiler works, keep in mind that what we're trying to do is create a representation of the circuit that can be executed on the hardware with the least number of operations.
In Qiskit, all of the constraints we just looked at are represented by a single object called the target. The target combines all of these constraints, and some others we didn't cover because they're not relevant for this discussion, into a single object that is queryable by the transpiler. It contains a list of every instruction, meaning the operation or gate together with the qubits it operates on, and the properties associated with that instruction, such as error rate and duration. Additionally, it has properties like qubit coherence times and other information a transpiler pass might need to know. Here I've printed an example target from an old five-qubit device that I had stored around, because it's nice and compact to visualize. You can see the target shows how many qubits there are, that there's a CX gate supported between qubits (0, 1) and (1, 0), and the duration and error rate associated with each of those instructions.
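As a sketch (not from the lecture) of querying a target yourself, again with `GenericBackendV2` standing in for hardware:

```python
# Walk every instruction in a Target and print its per-qubit properties.
from qiskit.providers.fake_provider import GenericBackendV2

target = GenericBackendV2(num_qubits=5).target
print(target.num_qubits, target.operation_names)
for op_name in target.operation_names:
    for qargs, props in target[op_name].items():
        if props is not None:
            print(op_name, qargs, props.error, props.duration)
```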
All of the Qiskit transpiler passes consult the target when they need information about what they're targeting, hence the name.

Now let's take a look at how the transpiler works on the inside. The first thing to consider is how the transpiler represents a quantum circuit for its operations, and that is the DAG circuit, the directed acyclic graph representation. In the visualization on the right we have a quantum circuit I quickly threw together (the details aren't important here), converted into the DAG representation. What this lets us do is track the data flow for each qubit in the circuit: you can trace from the green nodes, which are the state of each qubit at the beginning of the circuit, through the operations, which are the blue nodes, until we reach the output nodes, represented by the red circles. This lets us reason about the dependencies and ordering between operations, which is critically important as we start transforming the circuit, because we need to know the relationships between the operations we're performing to make sure every transform we apply is valid. We'll look at this in more depth when we discuss some of the specific optimizations in Qiskit's transpiler, but there are some common operations we perform on the DAG.
One is traversing the DAG in topological order, meaning we traverse it in an order such that all dependencies are met. For example, in this circuit you could go controlled-RY, controlled-RY, controlled-RY, the excitation-preserving gate, the barrier, and then the four measurements. For more complicated circuits this ordering can be tricky to figure out; this circuit is simple enough that it isn't bad. Another operation we perform quite frequently is in-place substitution; we'll look at this in some depth in a second, but this excitation-preserving gate can be substituted with a larger subcircuit when we need to replace it with its equivalent definition. Doing this on a DAG is very easy: we replace the node with a sub-DAG and maintain the relationships between all of the gates simply by reconnecting the edges where they need to go.
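Here is a minimal sketch (not from the lecture) of both operations, using Qiskit's converter and DAG APIs; the rz-sx-rz replacement matches a Hadamard up to global phase:

```python
from math import pi

from qiskit import QuantumCircuit
from qiskit.converters import circuit_to_dag, dag_to_circuit

qc = QuantumCircuit(2)
qc.h(0)
qc.cx(0, 1)
qc.measure_all()

dag = circuit_to_dag(qc)
for node in dag.topological_op_nodes():  # dependency-respecting order
    print(node.name, [dag.find_bit(q).index for q in node.qargs])

# In-place substitution: swap the Hadamard node for an equivalent sub-DAG.
equiv = QuantumCircuit(1)
equiv.rz(pi / 2, 0)
equiv.sx(0)
equiv.rz(pi / 2, 0)
h_node = next(n for n in dag.topological_op_nodes() if n.name == "h")
dag.substitute_node_with_dag(h_node, circuit_to_dag(equiv))
print(dag_to_circuit(dag))
```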
The transpiler is built up of pass managers, which operate on these DAGs. A pass manager is a way of scheduling the execution of passes. A pass is something you can think of as a function that does one small, well-defined task, either analyzing or transforming the quantum circuit via the DAG representation. Passes execute in sequence, but there is potential for control flow: you can do things like if-else statements, for loops, do-while loops, and so on. So the transpiler is built of passes that each do a single job, and the pass manager's job is to organize and schedule them and to share state between them, which is stored in a property set. If a pass performs analysis and wants to store data for passes that execute later, it puts it in the property set.

Beyond an individual pass manager, things are broken up into stages, which are themselves pass managers. The Qiskit compilation pipeline is divided into logical stages that perform higher-level functions. Out of the box, Qiskit defines six stages: init, layout, routing, translation, optimization, and scheduling. These are just the defaults Qiskit uses in its preset pass managers; if you're manually creating your own pass manager you can make however many stages you want, with whatever names and implied functionality you want. In Qiskit the stages execute sequentially, and inside each stage is a pass manager like the one we just looked at, which can have more involved execution and scheduling of passes, so that you can have, for example, a do-while loop in the optimization stage that optimizes repeatedly until some exit condition is reached.
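A short sketch (not from the lecture) of building the preset staged pass manager and inspecting those stages:

```python
# Build the default staged pipeline and look at its stages.
from qiskit.providers.fake_provider import GenericBackendV2
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager

backend = GenericBackendV2(num_qubits=5)
pm = generate_preset_pass_manager(optimization_level=2, backend=backend)
print(pm.stages)
# ('init', 'layout', 'routing', 'translation', 'optimization', 'scheduling')

# Each stage is itself a PassManager and can be swapped out, e.g.:
# pm.translation = my_custom_translation_pass_manager
```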
Each of these stages is pluggable and extendable. If you wanted to, you could take a pass manager built by Qiskit's preset functions and substitute one of the stages with your own implementation, or modify a stage in place by adding a pass to the beginning or end. There is also the capability for external Python packages to advertise stage plugins for the compilation pipeline. In the example on the right, I installed the qiskit-qubit-reuse package from PyPI, which provides an init-stage plugin called qubit_reuse, which you can see right here. When I transpile this circuit with the argument saying to use this external plugin, it transforms the circuit to reuse qubits; this is actually how I produced the "better" example on that earlier slide. This is very powerful: if a technique is bespoke to your application or your hardware, and isn't necessarily general purpose, you can package it up in a plugin and make it easily reusable for yourself and other users. We'll be looking at this kind of flexibility in the lab for this lecture.
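As a sketch of that workflow (plugin name taken from the lecture; it assumes the qiskit-qubit-reuse package is installed):

```python
# Discover which init-stage plugins are installed, then opt into one.
from qiskit.transpiler.preset_passmanagers.plugin import list_stage_plugins

print(list_stage_plugins("init"))   # e.g. ['default', 'qubit_reuse']

# from qiskit import transpile
# compiled = transpile(qc, backend, init_method="qubit_reuse")
```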
Now let's take a look at what Qiskit does by default in its preset pass managers. The first stage is the init stage, which is broken into two phases. First, it decomposes any larger operations of three or more qubits into one- and two-qubit gates. This is a requirement for the next stage, layout: layout works by treating the connectivity of the quantum processor as a graph, and to do that it can only deal with one- and two-qubit operations; otherwise you couldn't model the problem as a graph. So the init stage's job is to set the stage for the layout stage. Second, the init stage performs logical optimizations: any optimizations you can make to the circuit before you know anything about the hardware.

In this example, an image I generated from Qiskit's preset pass managers, the first three passes, UnitarySynthesis, HighLevelSynthesis, and the BasisTranslator, do the decomposition of larger gates I mentioned. UnitarySynthesis looks for unitary gates, gates represented solely by their unitary matrix, and synthesizes each into a discrete circuit of one- and two-qubit gates; in this stage we only run it on unitaries of three or more qubits. Then HighLevelSynthesis runs, which similarly looks for higher-level abstract mathematical objects placed in the quantum circuit and synthesizes any of three or more qubits. The BasisTranslator likewise translates any standard gates we know about that act on three or more qubits, decomposing them into one- and two-qubit operations. That is the first phase of the stage.

The next few passes, ElidePermutations, RemoveDiagonalGatesBeforeMeasure, InverseCancellation, and CommutativeCancellation, are the logical optimization passes. Each performs one small optimization we know we can make before we're aware of the target constraints. ElidePermutations looks for permutation gates or swap gates in the circuit and eliminates them by simply relabeling the virtual qubits, a freedom we have before we've picked qubits on the target. RemoveDiagonalGatesBeforeMeasure is exactly what the name implies: it removes diagonal gates immediately before measurement operations, since a diagonal gate does not affect the measurement outcome. InverseCancellation, as its name implies, cancels any adjacent inverse pairs of operations. CommutativeCancellation we'll cover in more depth in the optimization stage, since we run it there as well.

To see what this transform looks like, take our original circuit, which is very high level and abstract: we have three controlled-RY gates and an excitation-preserving block, which is much more involved. When we run the init stage on it, the circuit is transformed: the excitation-preserving block, which operates on four qubits, has been decomposed into one- and two-qubit operations. This sets the stage for the layout stage.
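As a sketch (not from the lecture), you can run just the init stage of the preset pipeline to watch that decomposition in isolation; the multi-controlled X is an illustrative stand-in for a large gate:

```python
# Run only the init stage and verify >=3-qubit gates get broken down.
from qiskit import QuantumCircuit
from qiskit.providers.fake_provider import GenericBackendV2
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager

qc = QuantumCircuit(4)
qc.h(range(4))
qc.mcx([0, 1, 2], 3)   # a 4-qubit gate the init stage must decompose

pm = generate_preset_pass_manager(optimization_level=2,
                                  backend=GenericBackendV2(num_qubits=5))
after_init = pm.init.run(qc)
print(after_init.count_ops())  # should contain only 1- and 2-qubit gates
```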
The next stage is layout, whose job is to map the circuit's qubits from our abstract representation onto qubits of the specified target. This is a really important job; when I first started working on Qiskit, it was deceptive how simple it seems relative to how important it actually is. The reason is twofold. First, no two qubits are equal: if you remember the earlier slide showing the variability graph, the error rates across the qubits of a target differ widely, so you want to pick the best-performing qubits. Second, and this is what makes layout extra important, a poor layout can induce extra swaps during the routing stage later, when connectivity is limited. So you want a layout that uses the best qubits but also places them such that a minimal number of swaps is needed. Because this is so important, we run multiple techniques to try to pick the best layout. Looking at this pass manager, you can see we run the VF2Layout pass, then the SabreLayout pass, and at the end we apply whatever layout we found, transforming the circuit to take it into account.
Let's look at what this transform does. Coming into the layout stage from the init stage, we have the circuit with no physical qubits assigned from the target. When we run the layout stage, the biggest change is that the qubits have been reordered and each assigned a qubit index from the specified target. In this case I believe I picked IBM Torino as the target, the 133-qubit device, and for this circuit it picked qubits 62, 63, 64, and 73, mapped appropriately.

Now let's look at how these passes work. The first is the VF2Layout pass. This pass is not guaranteed to succeed: it tries to find what we call a perfect layout, a layout such that no additional swaps would be needed to execute the circuit. Depending on your circuit this may or may not be possible; for the circuit we're looking at, with that excitation-preserving block, it is not possible.
The way the pass works is that it iterates over your circuit, finds all of the two-qubit interactions, and builds a graph representation of them, shown here as the interaction graph: the qubits that have two-qubit operations between them. Then it takes the connectivity graph, the graph of two-qubit operations supported by the target, and tries to find a subgraph of the connectivity graph that is isomorphic to the interaction graph. Stripping away the fancy words: can we match this small graph onto this larger graph? That's all the pass is trying to do. It turns out this is a very computationally hard problem, and to make it efficient we use a library called rustworkx, which has a particularly fast implementation of this search, with some extra heuristics well suited to what we're doing here. As I alluded to earlier, in this specific case human intuition is enough: you can look and clearly see there's no way to map this graph onto that one. But for larger circuits and larger connectivity graphs, it's not obvious by inspection. When this pass fails, we need a fallback: we have to pick a layout that we know is not perfect, since VF2Layout quickly determines whether a perfect layout exists.
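A toy sketch (not from the lecture) of that subgraph-isomorphism check written directly against rustworkx, using the degree-4 interaction pattern from the earlier example and a simple line as the connectivity graph:

```python
import rustworkx as rx

# Interaction graph: qubits 0 and 1 each interact with the other four.
interaction = rx.PyGraph()
interaction.extend_from_edge_list(
    [(0, q) for q in range(2, 6)] + [(1, q) for q in range(2, 6)]
)

# Connectivity graph: a 6-qubit line, max degree 2.
connectivity = rx.PyGraph()
connectivity.extend_from_edge_list([(i, i + 1) for i in range(5)])

# Is the interaction graph a subgraph of the connectivity graph?
mappings = rx.vf2_mapping(connectivity, interaction, subgraph=True, induced=False)
print(next(mappings, None))  # None -> no perfect layout exists
```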
So the next pass we run is called SABRE. SABRE is a heuristic layout pass that tries to pick a layout minimizing the number of swap gates needed. It is not looking for a perfect layout, nor for the qubits with the lowest error rates; its job is to find the layout that results in the fewest swap gates when we route the circuit. The way the algorithm works is that it actually runs a routing algorithm: SABRE starts with a random guess, assigning all of your circuit's qubits randomly to qubits on the target, and then runs routing, but instead of inserting a swap gate it just swaps the mapping. It goes back and forth over the circuit, making repeated swaps of qubits in the mapping, and doing this typically minimizes the number of swaps needed, because it moves qubits that interact a lot closer to each other.
To show this in practice, I've generated this visualization. It's not for the problem we've been looking at, but you can see the algorithm working: where the red dots appear is where it is evaluating whether it needs to make a swap, and when it decides a swap is required it colors the node green to indicate it has been swapped. You can see the indices change every time it makes a swap, and when the algorithm finishes, the mapping, the indices on the nodes, has changed significantly from where we started.

But as I said, we start with a random guess, and that random guess can have a big influence on the outcome of the algorithm. To account for that, one of the innovations in Qiskit, building off the original algorithm, is that we wrote it in Rust, so we can take advantage of multithreaded parallelism and execute it in parallel. What we do is run it with 20 different random starting points, and that's what this next visualization shows: the same algorithm with 20 different random starts. While we're zoomed out too far to see the details, each run ends up with a different mapping, and we pick only the one that results in the fewest swap gates. Out of these 20, we look at which one would produce the fewest swaps and use that. This is what I meant about layout being so important that we go to a lot of computational effort to pick a good one.
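A sketch (not from the lecture) of running SabreLayout directly with multiple randomized trials; the parameter names follow Qiskit's SabreLayout API, though defaults can vary across versions:

```python
from qiskit import QuantumCircuit
from qiskit.transpiler import CouplingMap, PassManager
from qiskit.transpiler.passes import SabreLayout

# A star-shaped circuit that cannot embed perfectly in a line.
qc = QuantumCircuit(4)
qc.h(0)
for i in range(1, 4):
    qc.cx(0, i)

cmap = CouplingMap.from_line(10)
pm = PassManager([SabreLayout(cmap, seed=1234, layout_trials=20)])
routed = pm.run(qc)
print(pm.property_set["layout"])              # the chosen layout
print(routed.count_ops().get("swap", 0))      # swaps from embedded routing
```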
So then, once we've picked a layout and applied it, we know which physical qubits on the target all of our operations will execute on, and now we run the routing stage. The routing stage does what we've been talking about: it inserts swap gates to move qubit state around and make up for a lack of connectivity. The default pass we use for this is called SabreSwap, which uses the same routing algorithm we looked at in layout but actually inserts the physical swaps. If you're running with all default settings, as we are here, SabreSwap doesn't actually execute as a standalone pass, because it's more efficient to insert the swaps at the end of the layout search, since we've already computed them. That's purely an efficiency detail: if you're doing anything custom, like changing your layout stage or using a plugin, the routing stage will run SabreSwap to ensure that at the output of the routing stage your circuit is routed and can actually execute within the connectivity constraints of the given target. In this pass manager visualization, that is represented by the conditional controller: this whole block of passes sits inside an if statement saying, if the circuit already matches the target's connectivity constraints, do not run these passes.

One pass we do run in the routing stage by default, regardless of whether SabreSwap is needed, is called VF2PostLayout.
VF2PostLayout tries to find a better layout than the one SABRE picked, not with regard to minimizing swap gates, but with regard to the per-qubit error rates we were talking about; let's look at that in a little more detail. As the name implies, VF2PostLayout uses the same algorithm as VF2Layout, but the difference is that because we've gone through routing, we know an isomorphic subgraph now exists. That's what routing's job is, if you think about it: we inserted those swap gates precisely so the circuit can run on the target, which the connectivity graph represents. Since we already have one isomorphic subgraph, we can use the same subgraph-isomorphism search from rustworkx to find others, and among those we pick the layout with the lower predicted error rate, that is, the higher predicted fidelity. You can see that after routing, our circuit has a clear mapping onto the connectivity graph: these four qubits could be placed on this set of four, or that set, or various other combinations. To show what this looks like, I have a visualization of the algorithm in practice. When we start, the current layout is shown in green, and the red shows the different trials, the other available isomorphic subgraphs it found in the connectivity graph. It tries all of them, evaluates the predicted error rate of running on those qubits, and then moves the layout to the qubits shown in green at the end, because they have a lower error rate and the circuit will perform better when executed on them.
That leads us into the next stage, translation. This stage takes the circuit with the layout set and any inserted swap gates, and its job is to translate the circuit into the instructions listed in the target, so that at the output of the translation stage all operations are ones the target supports. The same three passes we ran in the init stage, UnitarySynthesis, HighLevelSynthesis, and the BasisTranslator, run again, but this time without the three-or-more-qubits constraint, because now we know which qubits we're running on and we always want to make sure every qubit runs only supported operations. We'll look at one pass in particular, the BasisTranslator, which does the bulk of the work; the UnitarySynthesis and HighLevelSynthesis passes are more specialized, for gates that represent higher-level objects or are represented by a unitary matrix. The way the BasisTranslator works is that Qiskit has a library, extendable if you have custom gates, that maintains a mapping from each gate to equivalent circuits implementing the same unitary operation as that gate, and this is represented as a giant graph.
I have a visualization of that graph down here, but it contains so many gates and circuits that you can't see it holistically, so I've zoomed in on a small section around translating a Hadamard gate. The BasisTranslator takes this graph and runs Dijkstra's algorithm to find a path from our source gate: if we have a Hadamard gate in our circuit and it's not a native operation supported by the target, we find a path from Hadamard to gates that are within the target. If you traverse the graph and follow the edges, each edge carries an equivalent circuit that translates between the operations. Once a path is found, the BasisTranslator recursively substitutes the gates along that path from the equivalence library, building an equivalent circuit from our source gate down to something in the target. Then we iterate over the DAG and replace every instance of the Hadamard with the equivalent circuit we've built.

Here's what this looks like in practice. We take the circuit that is the output of the routing stage, run the BasisTranslator on it, and we get this giant circuit at the end, the result of making those substitutions for each gate that is not on the target (I believe a few of these gates were already supported on the target, but the majority were not).
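A minimal sketch (not from the lecture) of invoking the BasisTranslator by hand against Qiskit's session equivalence library, the same graph search shown on the slide:

```python
from qiskit import QuantumCircuit
from qiskit.circuit.equivalence_library import SessionEquivalenceLibrary
from qiskit.transpiler import PassManager
from qiskit.transpiler.passes import BasisTranslator

qc = QuantumCircuit(2)
qc.h(0)
qc.cx(0, 1)

translator = BasisTranslator(SessionEquivalenceLibrary,
                             target_basis=["cz", "rz", "sx", "x"])
print(PassManager([translator]).run(qc))
```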
You can see that this giant circuit would not make for very efficient output, which is why we go into the next stage: optimization. At this point, at the end of the translation stage, we have a circuit we could execute on hardware; all of the target's constraints are met. But we would not get good results, and that's why we run the optimization stage: all of the previous stages have grown the circuit to make up for the constraints, and the optimization stage's job is to take that big circuit and optimize it as much as we can, reducing the number of operations needed to perform the quantum circuit. If you look at the stage's construction, it's a little different from the others: it all executes in a big do-while loop, running repeatedly until the circuit's depth and size reach a minimum point, where we can't shrink the circuit or reduce the operation count any further.
So let's look at the optimizations actually performed in this stage. The first is single-qubit unitary peephole optimization. This pass looks for sequences of single-qubit gates acting on a single qubit. I pulled this sequence of seven single-qubit gates from the basis translator's output just to show it in isolation: in that output circuit, these seven single-qubit operations run in sequence. If you visualize the state transform on a Bloch sphere, you can track what the sequence is doing: starting in the |0⟩ state, you can follow the state after each gate as it executes. What single-qubit unitary peephole optimization tries to do is reproduce the same end state with fewer gates; running the pass on this exact sequence simplifies those seven gates down to three. The way this works is that it is essentially simulating what the gates do: it performs a single-qubit simulation to build the unitary matrix representation of the whole sequence, then synthesizes that single-qubit unitary into the target basis set, hopefully in fewer operations. That's how we reduce seven gates to three in this case. The pass iterates over the whole DAG, finds every sequence of single-qubit gates (we call them "runs" in the transpiler), simulates each run to get its unitary, and resynthesizes that unitary, for every run it identifies.
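A sketch (not from the lecture) of that peephole pass applied to an artificial seven-gate run; the exact gate count after resynthesis depends on the sequence, with at most five gates (rz-sx-rz-sx-rz) for a generic unitary in this basis:

```python
from qiskit import QuantumCircuit
from qiskit.transpiler import PassManager
from qiskit.transpiler.passes import Optimize1qGatesDecomposition

qc = QuantumCircuit(1)
for _ in range(3):
    qc.sx(0)
    qc.rz(0.4, 0)
qc.sx(0)              # seven single-qubit gates in a row
print(qc.size())      # 7

opt = Optimize1qGatesDecomposition(basis=["rz", "sx", "x"])
print(PassManager([opt]).run(qc).size())  # fewer gates after resynthesis
```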
While we do this for single-qubit gates, we can also do it for two-qubit blocks, and this is one of the places where the DAG representation and its view of data flow is very powerful. This example circuit is an old one I've used before; it's not related to the circuit we've been following, but it demonstrates what the DAG enables. In this three-qubit circuit there are two two-qubit blocks, shown by the green and red shading. In the quantum circuit view it's hard to identify where each block is, because the visualization overlaps; you wouldn't be able to see that this T and Hadamard gate are actually part of the bigger block in the upper-left corner. But when you look at the DAG, you can clearly see the relationship between the two qubits just by following the data flow between operations: this block operates on the same two qubits in isolation, and these gates over here operate on another two qubits in isolation, which would be hard to see from the quantum circuit view. That's the flexibility and analysis the DAG circuit gives us.
So the next pass in optimization is two-qubit unitary peephole optimization. We run two-qubit block collection: we analyze the big circuit we got out of basis translation and find all of the blocks that operate on two qubits in isolation. Then we consolidate each one down to a unitary matrix representation, the same way we did with single-qubit operations, but now for a two-qubit unitary: we replace each block of gates in the circuit with its unitary matrix representation, which you can see here. Then we resynthesize those unitary matrices into the target basis using fewer gates. You can see we go from the big circuit, to the unitaries, to this smaller circuit at the output. It's still much bigger than our original, but it is an optimized representation of the same quantum circuit that can actually execute on the target, with far fewer gates than we would have had from basis translation alone.
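A sketch (not from the lecture) of the collect-consolidate-resynthesize pipeline using the corresponding Qiskit passes:

```python
from qiskit import QuantumCircuit
from qiskit.transpiler import PassManager
from qiskit.transpiler.passes import (Collect2qBlocks, ConsolidateBlocks,
                                      UnitarySynthesis)

qc = QuantumCircuit(2)
qc.h(0)
qc.cx(0, 1)
qc.t(1)
qc.cx(0, 1)
qc.h(0)

pm = PassManager([
    Collect2qBlocks(),                          # find 2q blocks via the DAG
    ConsolidateBlocks(force_consolidate=True),  # replace each with a unitary
    UnitarySynthesis(basis_gates=["cz", "rz", "sx", "x"]),  # resynthesize
])
print(pm.run(qc).count_ops())
```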
The last optimization pass we run by default in Qiskit is commutative cancellation. Some quantum operations commute, meaning that if you have two consecutive gates and you reorder them, they perform exactly the same overall operation; the order of execution of those two gates doesn't matter. What commutative cancellation does is look for these commutation relationships, pairs of gates we can reorder without changing the effect of the operation, and when it finds them it tries the reordering and checks for cancellation opportunities. In this example, which I built just to demonstrate the pass, the Z gate and the CX gate commute, so you could swap the CX and the Z here, or the CX and the Z there. When you commute those operations and reorder them, you get two CXs next to each other, and two adjacent CXs are the identity, so both can simply be removed; they have no effect. So what was a three-gate sequence with two two-qubit gates and one single-qubit gate is now just a single one-qubit gate, because the CX gates cancel out once reordered. The pass iterates over the circuit, checking all pairs of adjacent operations to see whether they commute, and if they do, it reorders them and looks for gates to cancel.
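A minimal sketch (not from the lecture) of exactly that example:

```python
from qiskit import QuantumCircuit
from qiskit.transpiler import PassManager
from qiskit.transpiler.passes import CommutativeCancellation

qc = QuantumCircuit(2)
qc.cx(0, 1)
qc.z(0)      # Z commutes with the control of CX
qc.cx(0, 1)

out = PassManager([CommutativeCancellation()]).run(qc)
print(out.count_ops())  # the two CXs cancel; only the Z (or an
                        # equivalent rotation) remains
```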
So that's just another of the optimizations we apply. We run all of these passes in a do-while loop and exit when we've reached a minimum point in the depth and size of the circuit. And as I mentioned before, this commutative cancellation pass also runs in the init stage.

The last stage I'm going to talk about, which is the final stage in Qiskit's preset pass managers today, is the scheduling stage. Scheduling takes a mapped and optimized circuit, so at this point we've finalized all of the circuit transforms we want and we're ready to execute on hardware, and it takes into account the execution time of all of the operations, explicitly accounting for their timing. This is done in two phases. First, we analyze the circuit: iterate over it, find the duration of every gate from the target, and account for the idle periods where qubits have no operations running on them. Second, once we've found those, we run a pass that inserts operations to account for that time. The simplest example is a delay gate, but there are more advanced techniques like dynamical decoupling, where you insert gates to try to suppress errors. To show what this looks like in practice, I ran scheduling on the output of our optimization loop. The only difference here is the inserted delay gates: if you set a scheduling method option, the timing is accounted for, and these delays indicate the idle time on each qubit.
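A sketch (not from the lecture) of enabling the optional scheduling stage through `transpile`; this assumes the backend's target reports gate durations, as `GenericBackendV2` does:

```python
from qiskit import QuantumCircuit, transpile
from qiskit.providers.fake_provider import GenericBackendV2

backend = GenericBackendV2(num_qubits=3)
qc = QuantumCircuit(3)
qc.h(0)
qc.cx(0, 1)
qc.cx(1, 2)
qc.measure_all()

# "alap" schedules operations as late as possible, then pads idle time.
scheduled = transpile(qc, backend, scheduling_method="alap")
print(scheduled.count_ops().get("delay", 0))  # idle time is now explicit
```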
Scheduling is optional and off by default, so you don't get it out of the box, but if you're doing something where the timing matters, or you want to try more advanced error suppression techniques like dynamical decoupling, you can use the scheduling stage to do it.

And with that, we've covered Qiskit's built-in quantum circuit compilation functionality. The key thing I'd like you to take away from this lecture is that while Qiskit has a lot of built-ins to get the best performance on hardware, you often want custom transformations or transpiler passes that suit your application or your specific hardware, things that can't be done in a general way. So experiment with the preset pass managers and with what works best for your specific application, because as I showed at the very beginning, the difference between bad compilation and good compilation can be the difference between getting noise out of the system and getting a signal you can do something useful with. With that, thank you; that's the end of the lecture.