- Welcome to Understanding Quantum Information and Computation. My name is John Watrous, and I'm the technical director of IBM Quantum Education. The purpose of this series is to provide you with a solid understanding of quantum information and computation, and that means getting into the technical details of quantum information and seeing how quantum computing really works, what it can do and what it can't do. If you haven't already watched the overview video for this series, I encourage you to check that out. In that video, you'll find information about what sort of background preparation is recommended for
this series, as well as an outline of the topics to be covered. Also, be sure to check out the Qiskit textbook, which covers the same material as this video series in a written form, and it also includes interactive components and interfaces to hardware and software designed to help you to better understand quantum information and computation. You'll find links to the overview video and the Qiskit textbook in the description. This is lesson one of unit one of the series. Unit one covers the basics of quantum information, and in the first lesson of this unit, we'll focus
on how quantum information works for single systems, meaning that we have just one physical system or a device of some sort that stores quantum information and we're considering how that one system or device works in isolation. In the lesson following this one, we'll discuss how quantum information works when we have multiple systems and that will allow us to describe interesting and important concepts such as quantum algorithms that make use of multiple qubits and quantum protocols involving multiple individuals that process quantum information and transmit it back and forth to one another. But before getting into those
discussions, it makes sense to start with the comparatively simple setting of a single system in isolation, partly because it's good to start simple, but also because as we will see, the description of how quantum information works for single systems in isolation leads very naturally and logically to the description of how quantum information works for multiple systems. Here's a brief overview of the lesson. We're gonna start by discussing classical information. Now, the purpose of this lesson is not to explain how classical information works, it's to explain how quantum information works, but to explain how quantum information
works, it's very helpful to begin with classical information and to use it as a familiar point of reference when we discuss quantum information. At a mathematical level, quantum and classical information actually have pretty similar descriptions with some key differences, of course. In fact, quantum information really isn't separate from classical information at all. It's an extension of classical information. The fact of the matter is, at least in my view, that you really can't understand quantum information if you don't understand classical information, so starting with classical information makes sense. I do expect that this content will be
familiar to some viewers, but even if you are familiar with it, I don't recommend that you skip it. In addition to highlighting the aspects of classical information that are most relevant to understanding quantum information, we're gonna introduce the Dirac notation in this part, which is a standard way that we're gonna use to describe vectors and matrices throughout the series. As it turns out, the Dirac notation isn't specific to quantum information. It can equally well be used in the context of classical information as well as many other settings where vectors and matrices arise. We'll then move
on to quantum information, and specifically, we'll discuss the representation of quantum states as vectors, a simple type of measurement called a standard basis measurement, and unitary operations, which describe discreet time changes in quantum systems. Again, all in the setting of single systems. Before we get started with lesson itself, it will be helpful to clarify that there are actually two descriptions of quantum information, the simplified description, which is what we'll be focusing on in this lesson and throughout this first unit and the general description, which is covered in a later unit, specifically in the third unit
of the series. The simplified description, true to its name, is simpler and it's typically learned first. In the simplified description, as I've already suggested, quantum states are represented by vectors and operations are represented by unitary matrices. This description of quantum information is sufficient for an understanding of most quantum algorithms, for instance, so it isn't oversimplified, it's the real thing. However, as we will discover as we go along, it does have some limitations. For instance, if we wanna model the effects of noise on quantum systems, this description turns out to be lacking. The general description, on
the other hand, is more general and ultimately, it's more powerful in a mathematical sense. In this description, quantum states are represented by a special class of matrices called density matrices, and a more general class of measurements and operations can be described, including noise, for instance. It's entirely consistent with the simplified description and in fact, you can essentially view the simplified description as a special case of the general description, and it also includes classical information as a special case. It's also really quite beautiful in how it all works, so I strongly encourage you to learn about this
general description when the time is right, but the simplified description is the more natural place to start, and so that is what we will do. We're going to begin by talking about classical information with an eye on the aspects of classical information that are most relevant to quantum information and the way that it's described mathematically. Imagine that we have a physical system or a device of some sort that stores information. We're gonna give this system the name X, but there's nothing important about this name. We could use any other name if we preferred, but just
to pick a name, we're gonna call the system X. We're gonna make an assumption about X, which is that there is a finite set of classical states that X can possibly be in at any given moment. Here, when I refer to a classical state, I mean a configuration of the system that can be recognized and described unambiguously without any uncertainty or error. Really, this is something that you can think about intuitively. At a mathematical level, we simply define the set of classical states, we choose it however we want to or in whatever way best describes
the device that we're working with. Let us give the name sigma to this set of classical states, and again, let me emphasize that we're assuming that sigma is a finite set. The set of classical states of this system X is not infinitely large like the set of all natural numbers. It's finite, and that's just an assumption that we're making. For example, it could be that X is a bit, in which case, the classical state set is the set containing two elements: zero and one. Sometimes we refer to this set as the binary alphabet. Another example
is that X is an ordinary six-sided die, in which case, the classical states are the numbers one through six. Of course, we're talking about an abstraction here. An actual six-sided die may have many different attributes such as its precise position, its orientation, its temperature, and so on. But with respect to the information that's most typically relevant when we're talking about a six-sided die, the classical states are the numbers one through six indicated by the number of dots on whichever side of the die is facing up. If X is a switch on a standard sort of
electric fan that you might have in your home, then perhaps the classical states are the settings high, medium, low, and off. These are just examples. We could imagine systems having different classical state sets than these, but hopefully, they convey the idea that this notion of a classical state is something that should be familiar and can be understood in simple and intuitive terms. Mathematically speaking, the set of classical states is just a finite set, and of course, this set must also be nonempty. There has to be at least one classical state that the
system can be in, and there have to be at least two classical states if the system is going to be useful at all for storing information. Now, in some situations, we might know that X is in some particular state, but in many situations that arise in the setting of information processing, there may be uncertainty about the classical state of a system at a particular moment so that each possible classical state has some probability associated with it. For example, if X is a bit, then perhaps it's the case that X is in the classical state zero
with probability three quarters and it's in the state one with probability one quarter. We're going to refer to this as a probabilistic state, and we could express this probabilistic state, as you see right here, simply by indicating what the probabilities are for the different classical states. A more succinct way to express this probabilistic state is by a vector, specifically the representation that you see right here is a column vector, and this vector indicates what the probabilities are for the two possible classical states. To be more specific, the first entry indicates what the probability is for
X to be in the state zero, and the second entry tells us what the probability is for the system to be in the state one. This is simply the natural way to order the binary alphabet, or at least, it's the way that we are trained to do this, with zero first and one second. Going forward, we're gonna use the term probability vector to indicate a vector like this. Specifically, it's a vector whose entries are all non-negative real numbers and where the sum of the entries is equal to one, so it makes sense to think about
these entries as being probabilities. Here in this example, there are just two entries because there are just two classical states of the system we're talking about, but in general, there could be any number of entries with the understanding being that there's one entry for each classical state of whatever system we're talking about.
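If you'd like to see this concretely in code, here's a minimal numpy sketch of the probability vector from the example; the variable names are just illustrative.

```python
import numpy as np

# The probabilistic state of the bit X from the example, as a column
# vector: the first entry is the probability of the classical state 0,
# and the second entry is the probability of the classical state 1.
v = np.array([[3/4],
              [1/4]])

# A probability vector has non-negative entries that sum to one.
assert np.all(v >= 0) and np.isclose(v.sum(), 1.0)
```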
notation. There are a couple of other aspects of the Dirac notation that we'll see a little bit later in the lesson. This notation is gonna be quite central to the way that we describe quantum information, so it's essential to understand how it works, but fortunately, it's very simple and it really doesn't have anything specifically to do with quantum information, so we can introduce it now in the familiar context of classical information. As before, we assume that we have a set sigma of classical states of some system, and let's assume that we have an ordering of
these classical states, just like how we referred to an ordering of the binary alphabet in the previous example. Another way of saying this is that the elements of sigma are placed in correspondence with the integers from one to however many elements are contained in sigma, which is what this notation right here with two bars around sigma means. That's the number of elements in the set sigma. So there's a first state, a second state, and so on. The specific ordering or correspondence that we choose isn't actually gonna matter all that much. What's really important is that
we have an ordering and we stick to it. For many classical state sets that we'll encounter, there will be a standard ordering just like we have with the binary alphabet, and if there isn't a standard ordering, we can just choose one. But whatever it is, the understanding is that we stick to it. Now, here's the notation that's being introduced. For any choice of a classical state a from our set sigma, we write this right here, which is read as ket a, to mean the column vector that has a one in the entry corresponding to a and
a zero for all other entries. For example, if we return to the binary alphabet, which, as you might have guessed, is a very important example that we'll return to again and again, then ket zero and ket one are the vectors that are defined right here. Ket zero is the vector that has a one in the entry corresponding to the classical state zero, which is the first entry, and a zero for all other entries, and in this case, there's only one other entry. Similarly, ket one is this vector right here where we have a one in
the entry corresponding to one, which is the second entry, and a zero for all other entries, which, again, is just one entry. Here's another example. In this example, we have a classical state set sigma corresponding to the four suits in a standard deck of playing cards: clubs, diamonds, hearts, and spades. In this case, there really isn't a universally agreed upon ordering of these suits. For many card games, there is no ordering at all. They're just four different suits and that's all that matters, and there are different card games that do order these suits in different
ways, but it doesn't really matter. We can just choose an ordering like the one that you see right here. This happens to be alphabetical ordering according to the names of these four suits written in English. Assuming that we've picked this particular ordering, we define these four vectors: ket clubs, ket diamonds, ket hearts, and ket spades as you see right here, and these vectors correspond to the general description that you see up here, given that we've chosen this particular ordering for these four suits. We can do the same thing for any classical state set that
we choose, keeping in mind that a classical state set always means a finite and nonempty set. Vectors of this form are called standard basis vectors, and it's a very basic fact about vectors that every vector can be expressed in a unique way as a linear combination of standard basis vectors. For instance, going back to the example from just a moment ago, the probability vector we saw can be expressed as you see right here. It's basically just an alternative way of expressing vectors that's gonna be very handy for several reasons going forward.
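As a quick illustration of this fact, here's a small numpy sketch; the `ket` helper is our own naming for the purposes of this example, not a standard library function.

```python
import numpy as np

def ket(a, states=(0, 1)):
    """Standard basis column vector with a one in the entry for a."""
    e = np.zeros((len(states), 1))
    e[states.index(a)] = 1
    return e

# The probability vector (3/4, 1/4) from before, expressed as a
# linear combination of the standard basis vectors:
v = 3/4 * ket(0) + 1/4 * ket(1)
print(v.flatten())  # [0.75 0.25]
```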
And with that notation now in our hands, let's return to the discussion of classical information and probabilistic states, in particular. And let's consider what happens when we measure a system while it's in some probabilistic state. We might not ordinarily refer to a measurement or the act of measuring something in a classical setting, but the term makes sense. In essence, when we talk about measuring the system X in this context, we just mean that we look at it and we recognize whatever state it's in. Of course, when we look at the system, we don't see a probabilistic state, we see a
classical state, and the particular classical state that we see is one that we can imagine is chosen randomly according to the probabilities. Now, you might object on philosophical grounds to the idea that the classical state we see is somehow chosen at random at the moment that we measure or look at the system; perhaps the system simply was in some particular classical state all along and we just happened to have learned what the state was. But for the sake of thinking about the mathematical framework we're working with, it's okay to think about it this way and
doing so will allow us to draw a parallel with quantum information a little bit later. Let's suppose that the classical state that we see is the classical state a, which could be any state in our set sigma. This changes the probabilistic state of X in general, at least from our point of view. The probabilities no longer have any significance because we now know what the classical state of X is. It's a, and there's no longer any uncertainty about this. So we would now say that the probability that the system is in the state a
is one. We know this because we just looked at it and we saw that the classical state is a, so the new probabilistic state of X is one in which the probability of being in the classical state a is equal to one. And that probabilistic state is represented by the probability vector ket a. We have a one in the entry corresponding to the classical state a and a zero in all other entries. For example, if we again consider the situation where X is a bit and we have the same probabilistic state as before, then if
we measure X, we can imagine that a transition takes place. With probability three quarters, we see the classical state zero, and by measuring, we transition to ket zero being our probabilistic state, and with probability one quarter, we transition to ket one. You can think about this as a transition of knowledge if you prefer as opposed to an actual physical transition, but what's important for the sake of this lesson is to recognize how the mathematics works. It's kind of trivial actually, but by recognizing this triviality, the analogous description for the quantum setting might seem a little
bit less mysterious. One final note is that to another person who didn't happen to see what the classical state of X was when we measured it, the probabilistic state naturally wouldn't change as a result of us having measured it. That's okay. Different individuals can have different knowledge of a particular system and correspondingly choose different probabilistic states to represent their knowledge of that system.
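To make the mathematics of measuring a probabilistic state concrete, here's a minimal simulation sketch in numpy; the `measure` helper is hypothetical, just for illustration.

```python
import numpy as np

rng = np.random.default_rng()

def measure(v, states=(0, 1)):
    """Sample a classical state according to the probability vector v.
    Return the outcome and the updated probabilistic state, which is
    the standard basis vector corresponding to that outcome."""
    outcome = rng.choice(states, p=v.flatten())
    post = np.zeros_like(v)
    post[states.index(outcome)] = 1
    return outcome, post

# Measuring the example state: outcome 0 with probability 3/4,
# outcome 1 with probability 1/4; post is always ket(outcome).
outcome, post = measure(np.array([[3/4], [1/4]]))
```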
In the last part of this discussion of classical information, we'll talk about classical operations that can be performed on a system, starting with deterministic operations. When we perform an operation on a system, its state generally changes as a result, and in this context, the word deterministic means that the result of performing the operation depends entirely on whatever classical state the system was in prior to the operation being performed. In other words, there's no element of chance when a deterministic operation is performed, so there's no randomness or uncertainty involved. In mathematical terms, deterministic operations are described by functions. Every function f of the form that you see right here, which means that both the input and the output of the function correspond to elements of our classical state set sigma
describes a deterministic operation. The classical state a is transformed into the state f of a for each choice of a state a. For every function f of this form, there will always be a unique matrix M satisfying this condition that you see right here, and that is that multiplying M to the vector ket a gives us the vector ket f of a, and that's true for every choice of a classical state a. It's not too hard to write down an explicit description of M given whatever function f of this form we've chosen. It will
always have exactly one entry equal to one in each column, with every other entry equal to zero. And here's the formula. The b, a entry of M is equal to one if b is equal to f of a and otherwise, it's equal to zero, and by this, we mean this is the entry of M whose row corresponds to b and whose column corresponds to a. We'll see a few examples in just a moment, and we'll see why this formula works. Now, it might not be clear why we would want to take a function and express it as a
matrix like this, but it does make sense to do this. One of the reasons, and perhaps, it's the main reason, is that if we have a system in a probabilistic state and we perform a deterministic operation on that system, then the resulting probabilistic state can be obtained by matrix-vector multiplication, i.e., if our system is in a probabilistic state that's represented by some probability vector v and we perform the deterministic operation described by M on that system, then the resulting probabilistic state will be given by M times v.
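Here's a small numpy sketch of this formula; the `matrix_from_function` helper is our own construction, built directly from the rule that the b, a entry is one exactly when b equals f of a.

```python
import numpy as np

def matrix_from_function(f, states=(0, 1)):
    """Build M with M[b, a] = 1 when b == f(a), and 0 otherwise."""
    n = len(states)
    M = np.zeros((n, n))
    for col, a in enumerate(states):
        M[states.index(f(a)), col] = 1
    return M

# The bit flip (NOT) function as a matrix, acting on a probability
# vector by matrix-vector multiplication:
M = matrix_from_function(lambda a: 1 - a)
v = np.array([[3/4], [1/4]])
print(M @ v)  # [[0.25], [0.75]]
```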
Here, by the way, this arrow with a little bar on it like this is just a common notation in mathematics for indicating how one thing gets mapped to or transformed into another thing. You could just use an ordinary arrow, but this is conventional and it's recognizable and maybe it's a little bit more specific than just an arrow, which could mean many different things. Now let's take a look at an example or a collection of examples, really. Let's consider what happens when we take our classical state set sigma to be once again the binary alphabet. There aren't too many different functions of the form
we've been discussing when this is our classical state set and to be more precise, there are just four functions of this form. Here, those four functions are described by their so-called tables of values. The first table describes the first function, which we've named f1. For each possible input on the left, we list the corresponding output on the right, so f1 of zero is equal to zero, and f1 of one is, again, equal to zero, and that is to say that f1 is the constant zero function. It always takes the value zero. The second function f2
is the identity function. The output is always equal to the input as you can see from the second table. The third function f3 is described by the third table. f3 of zero is equal to one, and f3 of one is equal to zero. This function is better known as the not function or logical negation. We sometimes also call it a bit flip because it flips zero to one and one to zero. The last function f4 is, again, a constant function just like the first one, but this one is the constant one function because it always
takes the value one. And those are the only four functions from the binary alphabet to itself. Here are the four matrices that correspond to these four functions that are described by the formula from before, which you can see right here, and if you're so inclined, you can pause the video and check this. Just remember that with matrices, the order of the indices is always rows first, columns second. So the b, a entry corresponds to row b and column a, or the row that corresponds to the classical state b and the column that corresponds to the
classical state a. So for example, if we look at the first function, we have that f1 of zero is equal to zero and correspondingly, the zero, zero entry of M1 is equal to one, while the one, zero entry, which is right here, is equal to zero. And you can do the same thing for the case where the input is equal to one and similarly, for the other three functions. You can also check that this formula from before is true and you can just go one by one through all the different possibilities to verify this. And
in fact, if you think about the way that matrix multiplication works, really you can just kind of eyeball this. When you multiply a matrix to a standard basis vector like this, you can imagine that you're taking the input and dropping it into the top of one of the matrices in whichever column corresponds to the classical state inside of the ket, and the output that you get is simply that column as a probability vector. So for example, M3 multiplied to ket zero gives you zero, one as a probability vector, which is ket one. M3 multiplied to
ket one gives you one, zero, which is ket zero and so on. We'll discuss operations on probabilistic states more in just a moment, but first, let's introduce the second part of the Dirac notation. This is gonna be helpful for talking about operations and thinking about them as matrices, and it's also gonna be a standard tool that we'll continue to use going forward. The setup is just like it was before. We have whatever classical state set sigma that we're working with and we assume that we have some way of ordering those classical states. We then
write this thing that you see right here where this time the angle bracket is on the left rather than the right to mean the row vector having a one in the entry that corresponds to a and a zero for all of the other entries corresponding to all the other classical states. So the difference between this thing and ket a is that this is a row vector and ket a is a column vector, but otherwise, the vectors are defined in a similar way. We read this as bra a. I'm not sure if that's supposed to be
funny, but the idea is that the names bra and ket are two halves of the word bracket, and we'll have more to say about that in just a moment. But first, let's take a look at a very quick example. For the binary alphabet, we have that bra zero is this row vector right here, analogous to ket zero, except it's a row vector instead of a column vector, and bra one is this vector right here. Now in general, if we multiply a row vector to a column vector, just thinking about these as matrices that happen to
have just one row or just one column, then we get a one by one matrix, which we can think about as being a scalar. Here, by the way, you should think about these stars as representing arbitrary numbers. In particular, if we multiply the row vector bra a to the column vector ket b, then there are two things that can happen. One is that a and b are equal, so they're the same classical state, in which case, the ones will line up and the product will be equal to one, or it could be the case that
a and b are different, and so the ones don't line up and the product is equal to zero. A simple way of expressing these observations is like this right here. And just to keep things tidy, we write this product like this with just one bar in the middle and it means exactly the same thing as this right here. And now you can see how the bra and the ket go together to form a bracket, which is also called an inner product, and that's something that's quite important that we'll come back to in lesson number three.
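Here's a brief numpy sketch of these inner products; the `ket` and `bra` helpers are illustrative names, with `bra` defined simply as the transpose of the corresponding ket (complex conjugation plays no role yet for these real vectors).

```python
import numpy as np

def ket(a, states=(0, 1)):
    e = np.zeros((len(states), 1))
    e[states.index(a)] = 1
    return e

def bra(a, states=(0, 1)):
    return ket(a, states).T  # the corresponding row vector

# A bra times a ket is a 1-by-1 matrix, i.e., a scalar:
print(bra(0) @ ket(0))  # [[1.]] -- the ones line up
print(bra(0) @ ket(1))  # [[0.]] -- the ones don't line up
```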
Multiplying a column vector to a row vector, on the other hand, gives us a matrix as you can see right here. So for example, going back yet again to the binary alphabet, if we multiply ket zero to bra zero, what we get is this matrix right here, and we can go through the other possibilities to see what happens. Ket zero times bra one gives us this matrix right here where the one has now moved over to this position. Ket one, bra zero looks like this and here's ket one, bra one. What we see happening is
that there is just a single one in a matrix that's otherwise all zeros, and the position of that one is determined by which classical states we choose in the bra and the ket. More precisely, and this is true in general, for any choice of a classical state set, the matrix that we get by multiplying ket a to bra b is the matrix with a one in the a, b entry and a zero for all other entries. And now that we have both the first and the second parts of the Dirac notation, we have a very
handy tool for connecting operations with matrices. So let's go back briefly to deterministic operations and recall that a bit earlier in the lesson, it was stated that for every function f of this form right here, we have a deterministic operation that transforms a into f of a for each possible classical state a. And we said that for any function like this, there's always a unique matrix M that acts like this on standard basis states. Using the Dirac notation, we can now very easily express this matrix like you see right here, and we can see how
sometimes it's possible to work with matrices entirely within the framework of the Dirac notation without ever explicitly converting to vectors and matrices, meaning rectangles filled with numbers. In this case, we can check that the required formula is true by multiplying M to ket a. We substitute our expression for M like this, and we can remove the parentheses and snap together the bra b with the ket a like this. And that's because matrix multiplication is linear, in this case, in the first argument, and it's associative, so we can remove the parentheses. And now, recalling that we
have a fixed a that we're talking about as we sum over all of the possible choices of b, this bracket right here is always gonna be equal to zero when b is not equal to a, so there won't be any contribution to the sum, and when b is equal to a, the bracket will be equal to one and so we'll be left with this one term right here. That's the formula we wanted and so we have a very nice and simple way of expressing a function as a matrix.
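Here's a numpy sketch of this construction, building M as a sum of outer products; again, the helper names are just illustrative.

```python
import numpy as np

def ket(a, states=(0, 1)):
    e = np.zeros((len(states), 1))
    e[states.index(a)] = 1
    return e

def matrix_for(f, states=(0, 1)):
    """M as the sum over a of |f(a)><a|, built from outer products."""
    return sum(ket(f(a), states) @ ket(a, states).T for a in states)

f_not = lambda a: 1 - a
M = matrix_for(f_not)
# Verify the defining property: M|a> = |f(a)> for every state a.
for a in (0, 1):
    assert np.array_equal(M @ ket(a), ket(f_not(a)))
```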
In addition to deterministic operations, we also have operations that themselves introduce randomness or uncertainty about the classical state of a system. We use the term probabilistic operation to mean operations like this, and to be precise, when we say probabilistic operation, we mean operations that might introduce randomness, so deterministic operations are actually a special case of probabilistic operations that just don't happen to introduce any randomness. Here's a simple example where some randomness is introduced. This is an operation on a single bit. The way that it works is that if the classical state of the bit is a zero, then nothing happens. The bit
stays as a zero, but if the classical state of the bit is a one, then the operation flips the bit to zero with probability equal to 1/2. There's no special name for this operation, at least not that I'm aware of. It's just an example, but you could imagine performing this operation on a bit. Just like deterministic operations, probabilistic operations can be represented by matrices and their actions on probabilistic states are given by matrix-vector multiplication just like before. This time, it's not necessarily the case that the matrices have exactly one entry equal to one in each column and zeros
otherwise. Probabilistic operations are more generally represented by stochastic matrices. These are matrices whose entries are all non-negative real numbers and where the entries in each column sum to one. That's equivalent to saying that stochastic matrices are matrices where every column forms a probability vector. That makes sense when you think about the action of a matrix on a standard basis vector. You consider whatever classical state you wish and you imagine dropping that state into the top of the matrix in whatever column corresponds to that classical state, and that column tells you what the probabilistic state is
that comes out as a probability vector. The probabilistic operation in our example, for instance, is described by this matrix right here. If we perform this operation on the classical state zero, then the probabilistic state we get is given by the first column, which is, again, the classical state zero or equivalently, the probabilistic state where zero occurs with probability one. If instead, we perform this operation on the classical state one, then we effectively just randomize the bit and correspondingly, we get this probability vector here where the probability for each of the possible states is equal to
1/2. In general, if we perform this operation on any probabilistic state, then the effect is essentially just a weighted average between the two columns because that's how matrix-vector multiplication works. It's linear, so we just average the two possible outcomes accordingly.
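Here's what that looks like in a short numpy sketch of the example operation, assuming the same ordering of the binary alphabet as before.

```python
import numpy as np

# The example operation: 0 is left alone, while 1 is flipped to 0
# with probability 1/2 (and left as 1 with probability 1/2).
M = np.array([[1.0, 0.5],
              [0.0, 0.5]])

# Stochastic: non-negative entries, and every column sums to one.
assert np.all(M >= 0) and np.allclose(M.sum(axis=0), 1.0)

print(M @ np.array([[0.0], [1.0]]))    # [[0.5], [0.5]]
print(M @ np.array([[0.75], [0.25]]))  # [[0.875], [0.125]]
```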
When we think about probabilistic operations in this way, it's natural to think about them as potentially introducing or injecting randomness. We can just read the columns one by one and we can see the randomness that they introduce for each possible classical state. You can also think about probabilistic operations in a slightly different way, which is that they're random choices of deterministic operations. For instance, in our example, we can think about this operation as being equivalent to flipping a fair coin and either performing the constant zero function or the identity function each with probability 1/2, and you can see that reflected right here by this equation. That's always possible for any probabilistic operation, and it's pretty intuitive. If you made random choices as part of some process, you could always imagine making those random choices in advance and then hard coding those choices into some collection of deterministic operations. It's sometimes quite helpful
to think about probabilistic operations in this way, and we'll see an example of this down the road in a couple of lessons. The last thing to say about classical information for this lesson concerns composing probabilistic operations, which just means performing one after the other. Let's suppose that we have some system named X and M1 through Mn are stochastic matrices representing probabilistic operations on X. Imagine first that we have a probability vector v, which represents a probabilistic state of X at some moment, and then we first apply the first probabilistic operation to X, which is represented
by the matrix M1, and then we apply the second probabilistic operation, which is represented by M2. If we do that, then the probability vector we get after applying just the first operation is M1 times v, and then when we apply the second operation, the resulting probability vector is M2 multiplied to whatever vector we obtained after applying M1. So it looks like this. Now, matrix multiplication is an associative operation, which means that it doesn't matter where we put these parentheses. So we could just as easily put the parentheses like this. That's very simple, but it's also
interesting because it reveals that we can think about the operation where we first apply M1 and then we apply M2 as a single operation. In other words, the probabilistic operation we get by composing the first and second probabilistic operation is represented by the matrix product, M2 times M1, and that's the same operation regardless of what probabilistic state we started with. We don't need to know anything about the probability vector v to think about or to describe this composed operation. Notice that the order here is M2 on the left and M1 on the right, which is
because the matrix on the right, M1, is the one that gets multiplied to the vector v, whereas M2 gets multiplied to whatever vector we get after multiplying by M1. So we always have to keep in mind that the order in some sense gets reversed. The operation performed first is the one on the right and the operation that gets performed last or in this case, second, is the one on the left. If we don't stop with the second operation and we apply all of them in order starting with the first and ending with the last one,
which is Mn, then the composition of all of these operations together is given by this product that you see right here, and you see that the ordering once again is reversed for exactly the same reason as before. So in short, compositions of probabilistic operations are represented by products of the stochastic matrices that represent them, keeping in mind that this is the ordering of the matrices in the product. You will always get a stochastic matrix by taking a product of stochastic matrices like this, and sometimes we express this fact by saying that the stochastic matrices are closed
under multiplication, and the way that you can think about this is that you can't get out of the set of stochastic matrices by multiplying them because the set is closed. Getting back to the ordering of the matrices in this product, it's important to note that the ordering really does matter. Matrix multiplication is not commutative. If you change the ordering of the matrices in the product, you might change the result, and we don't need to look any further than the deterministic operations to see this. Here's a simple example of two stochastic matrices, both of which represent
deterministic operations where the ordering matters. M1 represents the constant zero function; here we're imagining that these two operations are on bits, just like in the example from earlier. M2 represents the not operation or a bit flip. If you perform M1 first and then you perform M2, the result is this matrix right here, which represents the constant one function. If you set a bit to zero and then you flip it, it's the same thing as just setting that bit to one. On the other hand, if you perform M2 first and then you perform M1, this is
the result right here, and this makes sense because if you flip a bit and then you set it to zero, it's the same thing as just setting it to zero. It didn't matter that you flipped it. Obviously, these two matrices are not the same, and so we have a very simple example revealing that the order does matter.
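Here's the same example as a quick numpy check; M1 is the constant zero function and M2 is the bit flip, as above.

```python
import numpy as np

M1 = np.array([[1, 1],   # constant zero function
               [0, 0]])
M2 = np.array([[0, 1],   # NOT (bit flip)
               [1, 0]])

# M1 first and then M2 is the product M2 @ M1:
print(M2 @ M1)  # [[0 0], [1 1]] -- the constant one function
# M2 first and then M1 is M1 @ M2, which is different:
print(M1 @ M2)  # [[1 1], [0 0]] -- the constant zero function
```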
Of course, this is not surprising at all. We all know that the order in which operations are performed matters or at least, it can matter. For example, if you light a match and then you blow on it, the result is very different from blowing on the match first and then lighting it. At this point, we've talked quite a lot about classical information and now it's finally time to discuss quantum information. And what we'll find is that mathematically speaking, quantum information works in a very similar way to classical information with some key differences. The first key difference is right at the start, how we define a state, in this case, a quantum state of a system, and there is a sense in which this one choice of how we define a quantum state largely determines how quantum
information works. We'll also define how measurements and operations work, and in the next lesson, we'll discuss how quantum information works not just for single systems but for multiple systems, but these things actually follow pretty naturally once the notion of a quantum state of a single system in isolation has been established. Here's the definition. A quantum state of a system is represented by a column vector whose indices are placed in correspondence with the classical states of that system. So we're assuming here that the system we're talking about has some set of classical states, just like when
we talked about classical information and probabilistic states. The entries in a quantum state vector are complex numbers as opposed to the probabilistic case where the entries are non-negative real numbers. And this time, the sum of the absolute values squared of the entries must equal one as opposed to the sum being equal to one as we have in the probabilistic case. The complex numbers that appear in a quantum state vector are sometimes called amplitudes and they play a role that's similar to probabilities, but they aren't probabilities and they don't have the same interpretation. Now, I can't
give you a good reason for why quantum states should be defined like this other than to say that physicists have discovered that this definition is physically relevant and it allows us to describe and model quantum mechanical systems. In other words, we choose this definition because it works, not because there are any particular reasons why quantum states should be defined like this. With respect to the mathematics of this notion of a quantum state, the first step towards understanding it and working with it is to recall this definition for the Euclidean norm of a vector, which you
can think of geometrically as the length of a vector, just like you would think about the length of a two-dimensional real vector if you drew it on a piece of paper as an arrow starting from the origin in the usual way. Specifically, if we have a column vector v like this, which we assume has complex number entries, alpha one through alpha n, then the Euclidean norm of v is defined like you see right here. This is the notation we use for the Euclidean norm putting these double bars around whatever vector we're talking about. In this
case, it's v, and the way the Euclidean norm is defined is that it's the square root of the sum of the absolute values squared of the entries of the vector. So another way to define a quantum state vector is as a column vector having complex number entries whose Euclidean norm is equal to one, which is to say that it's a unit vector with respect to the Euclidean norm. That's because the sum of the absolute values squared is equal to one if and only if the Euclidean norm is equal
to one. The square root doesn't change anything when the sum of the absolute values squared is equal to one. So that's what a quantum state vector is. It's a column vector with indices that correspond to the classical states of a system having complex number entries and Euclidean norm equal to one. What that means or what it implies or how we should interpret this notion is a different matter and in some sense, the study of quantum information and computation is an exploration of what this definition implies, at least, in terms of information processing. This is simply
the starting point. As an aside, let me mention that there are other so-called norms besides the Euclidean norm and this notation right here with the double bars around a vector doesn't always mean the Euclidean norm. In different contexts, this notation could refer to a different norm. For example, the sum of the absolute values with no squares or square root is a different norm, often called the one norm. There are quite a lot of different norms, in fact. Sometimes you'll see this notation being used with a subscript after the second set of bars that provides more
information about what norm we're talking about. And when we do that, it's pretty typical that the number two as a subscript means the Euclidean norm. But in this series, whenever we have a column vector like this, this double bar notation all by itself means the Euclidean norm. Here are a few examples of quantum state vectors and these particular quantum state vectors are qubit state vectors, which means that the classical states of our system are zero and one. The word qubit is short for quantum bit, which is really just a bit that can be in a
quantum state. The first two examples are really simple. The standard basis vectors ket zero and ket one are quantum state vectors. They're both column vectors, and the indices correspond to the classical states zero and one, and the entries of these vectors are complex numbers. For these particular vectors, the entries are zero and one, which are real numbers, and in fact, they're integers, but they're also complex numbers that happen to have imaginary part equal to zero. Each vector has one entry equal to one and one entry equal to zero. So when we sum the absolute
values squared of the entries, we get one in both cases. So the conditions required for these vectors to be quantum state vectors are satisfied. The next two examples are very commonly encountered qubit states called the plus state and the minus state. They're denoted as you see here on the screen, either with a ket plus or a ket minus like this, and they're defined as you can see right here. Again, the entries of these vectors are complex numbers. This time, they're either positive one over square root of two or negative one over square root of two, and
when we sum the absolute values squared, we get 1/2 plus 1/2, which is one. By the way, putting a plus sign or a minus sign inside of a ket like this might seem a little strange because these aren't classical states of our system, but this notation is used nevertheless, and we'll come back to this momentarily. But for now, in short, we can use the Dirac notation to put whatever name we want for a vector inside of a ket. The vector doesn't have to be a standard basis vector when we use the Dirac notation. Here's a
final qubit state example, for the moment anyway. It's a quantum state vector of a qubit that doesn't have a special name, and it's not particularly significant. The entries of this vector are one plus two times i over three and negative 2/3. These are complex numbers, and if you compute the absolute values squared of these two entries, you'll get 5/9 for the first one and 4/9 for the second one. So the sum of the absolute values squared is equal to one. And here's one last example, this time for a system whose classical states are the four
card suits. I don't know why we would have a system whose classical states are the four card suits being in a quantum state, but it is possible in principle and it's just meant as an example. The entries corresponding to the different suits are 1/2 for clubs, negative i over two for diamonds, zero for hearts because it doesn't appear in the sum, and one over square root of two for spades. And here's what it looks like as a column vector, assuming that we've ordered the classical states as we did before, which is the order that you
see right here. Taking the absolute value squared of these entries gives one quarter for the first one, also one quarter for the next one. The absolute value squared of zero is equal to zero, so that entry doesn't contribute to the sum. And finally, for the last one, we get 1/2. So the sum is equal to one, as is required for this vector to be a quantum state vector.
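Here's a short numpy sketch that verifies the norm condition for the example states from this part of the lesson; the `is_quantum_state` helper is just an illustrative name.

```python
import numpy as np

def is_quantum_state(v):
    """The Euclidean norm of v (square root of the sum of the
    absolute values squared of the entries) must equal one."""
    return np.isclose(np.linalg.norm(v), 1.0)

plus  = np.array([[1], [1]]) / np.sqrt(2)
minus = np.array([[1], [-1]]) / np.sqrt(2)
psi   = np.array([[(1 + 2j) / 3], [-2 / 3]])
suits = np.array([[1 / 2], [-1j / 2], [0], [1 / np.sqrt(2)]])

for v in (plus, minus, psi, suits):
    assert is_quantum_state(v)
```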
Now just a moment ago, we used the notation ket plus and ket minus to refer to two qubit state vectors that aren't standard basis vectors. And I'd like to return to this point briefly and explain how the Dirac notation is used for arbitrary vectors and not just for standard basis vectors. As I've already mentioned, we can use whatever name we want inside of a bra or ket to refer to a vector. Kets are column vectors and bras are row vectors just like before. As an example, the Greek letter psi is very commonly used inside of a ket to refer to some arbitrary vector, and here, we're using it to give a name to the state vector from before that doesn't have a special name.
When we do this, the letter psi doesn't necessarily have any meaning all by itself. It's just the name of a vector, and it's inside of a ket to help us to clarify that it's a column vector. This can be a little bit confusing sometimes, and there is a potential for ambiguity if the letter psi could be associated with a classical state of some system, for instance, but it generally doesn't cause any problems, and you don't have to use the Dirac notation like this if you don't want to. Using a lower case letter such as
u, v, or w without a ket is also a fine way to denote a column vector or more specifically, a quantum state vector. It's your choice what notation you use to express yourself. You just need to make sure that number one, the meaning of what you are trying to express is clear to others. And number two, that you understand what others are trying to express even when their preferences for notation aren't the same as yours. There is one very important rule for using the Dirac notation for arbitrary vectors and that is that if you have
a column vector that you've decided to write as ket psi, for instance, or it could be a different name inside of a ket, then the row vector bra psi is understood to be the conjugate transpose of the vector ket psi. And here, this is written as an equation where this dagger right here refers to the conjugate transpose. What that is specifically is the row vector you get by, number one, transposing the vector, meaning that you flip the vector on its side and you change it from a column into a row without actually changing the entries.
And number two, taking the complex conjugate of each of the entries. You can actually perform those two operations, the transpose and the entry-wise complex conjugate, in either order; in other words, those two operations commute. So for the vector in this example, given that we've decided to call this vector ket psi, it's understood that bra psi means this vector right here where we've transposed the vector by turning ket zero and ket one into bra zero and bra one, and we've taken the complex conjugate of each entry, which is why one plus two times i over three turns into
one minus two times i over three right here. Here's what these vectors look like when they're written explicitly as a column vector and a row vector. Notice, by the way, that this rule is consistent with the notation we use when we have classical states inside of the bra and the ket. In that case, the entries are all zero except for a single one, and so taking the complex conjugate doesn't do anything in that case.
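In code, this rule is just the conjugate transpose; here's a minimal numpy sketch for the example vector.

```python
import numpy as np

ket_psi = np.array([[(1 + 2j) / 3],
                    [-2 / 3]])

# bra psi: flip the column into a row and conjugate each entry.
bra_psi = ket_psi.conj().T

# As a consistency check, bra psi times ket psi gives the sum of the
# absolute values squared, which is one for a quantum state vector:
print((bra_psi @ ket_psi).real)  # [[1.]]
```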
We've now seen how quantum states are represented as vectors, but it's not clear at this point what the meaning or the implications are in terms of what you can do with the system in a quantum state or how you can interact with it. The first thing we need to do to bring these issues into focus is to discuss measurements. Intuitively speaking, measurements provide a mechanism for extracting classical information from quantum systems. When we look at a system when it's in a quantum state, we don't see a quantum state, just like we don't see probabilistic states; we see classical states, and this notion of a measurement is what provides the interface. If we, as humans, want to know something
about a quantum system, we have to measure it, and in doing so, we'll extract classical information about whatever quantum state that system was in. There are different notions of measurements. The difference is one of generality really, and we're going to start with the simplest and most basic one, which is typically called a standard basis measurement. We will generalize this in subsequent lessons. We'll talk about so-called projective measurements in lesson three, and then later on in the third unit of the series, we'll talk about general measurements, which are also called positive operator valued measures or POVMs
for short. But for now, we'll restrict our attention to standard basis measurements. When a measurement is performed on a system, there will be some set of possible outcomes of that measurement, which are classical outcomes, and in the case of a standard basis measurement, those possible outcomes are precisely the classical states of whatever system is being measured. Each of those classical states will be the outcome of the measurement with some probability, and that probability is the absolute value squared of the entry corresponding to that classical state of whatever quantum state vector the system was in immediately
prior to being measured. We know that the absolute value squared of any complex number is a non-negative real number, and we know from the definition of quantum state vectors that the sum of the absolute values squared of the entries of a quantum state vector is equal to one. So this makes sense. Each classical state will appear as the outcome of the measurement with some probability and the probabilities sum to one. Here are a few examples. First, suppose we have a qubit in the plus state. If we measure this qubit with respect to a standard basis
measurement, we get either a zero or a one as the outcome. The probability of getting a zero is the absolute value squared of one over root two, which is 1/2, and that's the same probability of getting a one. So if you measure a plus state, you'll get a uniform random bit. If the system is in a minus state rather than a plus state, then measuring works in exactly the same way. The probabilities are, again, 1/2 for each possible outcome. The only difference here is that the plus sign turned into a minus sign, but when we
take the absolute value, nothing changes. If we measure this qubit state, the probability to get the outcome zero is the absolute value squared of one plus two times i over three, which is 5/9, and the probability to get the outcome one is the absolute value squared of negative 2/3, which is 4/9. And finally, measuring the standard basis state ket zero gives the outcome zero with certainty, and likewise, measuring the state ket one yields the outcome one with certainty. That's because the absolute value squared of one is one. Similar to what we had in the probabilistic
setting, we can associate these quantum states, ket zero and ket one, with the system being in those classical states, and this example is consistent with that interpretation. Now, just like we had in the probabilistic setting, if we measure a system when it's in a quantum state, then the state will change, in general, as a result of having performed that measurement. In particular, and again, this is exactly like we had in the probabilistic setting, if we measure a system and the outcome of the measurement is the classical state a, then the new quantum state of the
system becomes ket a. So for this state right here, for instance, we see the same kind of behavior that we had in the probabilistic setting. If we measure, then with probability 5/9, we obtain the outcome zero, in which case, the state transitions to ket zero, and with probability 4/9, the outcome is one, and the state transitions to ket one.
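Here's a minimal numpy simulation of a standard basis measurement, including the collapse; the helper name is hypothetical, just for illustration.

```python
import numpy as np

rng = np.random.default_rng()

def measure_standard_basis(psi, states=(0, 1)):
    """Sample an outcome with probability |amplitude|**2 and return
    it along with the post-measurement state, which is the standard
    basis vector corresponding to that outcome."""
    probs = np.abs(psi.flatten()) ** 2
    outcome = rng.choice(states, p=probs)
    post = np.zeros_like(psi)
    post[states.index(outcome)] = 1
    return outcome, post

psi = np.array([[(1 + 2j) / 3], [-2 / 3]])
# Outcome 0 with probability 5/9, outcome 1 with probability 4/9.
outcome, post = measure_standard_basis(psi)
```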
Sometimes this phenomenon is referred to as a collapse of the quantum state, and it's the kind of thing that keeps people who study the foundations of quantum mechanics awake at night. But from a purely mathematical perspective, it's really quite simple and very much analogous to what we have in the probabilistic case. The source of tension here, in some sense, concerns the fundamental nature of quantum states and what they represent and how they differ from probabilistic states. But at a mathematical level, this is perhaps what you might expect. Notice, by the way, that if you measured a second time, then you would get exactly the same result as you did for the first measurement. So there's a limit on how much classical information can be extracted from a quantum state, and we'll come back
to this point soon in another lesson. We've talked about quantum states as well as measurements, specifically standard basis measurements, which provide a way to extract classical information from quantum states. The remaining topic for this lesson is operations, specifically unitary operations, which describe how quantum states of systems can be changed. Naturally, given that quantum state vectors are different from probability vectors, you would expect that the set of allowable operations on quantum states is different than what we have for classical information, and indeed, that's the case. Whereas operations on probabilistic states are represented by stochastic matrices, operations on
quantum state vectors are represented by unitary matrices. Here's the definition of unitary matrices. A square matrix U having complex number entries is unitary if it satisfies the equalities that you see right here. The dagger represents the conjugate transpose, which we already saw for vectors when we talked about the Dirac notation a few moments ago. Here we're performing the conjugate transpose on a matrix rather than a vector, but it's essentially the same thing. We transpose the matrix, which just means that we swap rows and columns, so the j, k entry becomes the k, j entry, and also we
take the complex conjugate of each entry. Also, this notation right here means the identity matrix. These two equalities right here are actually equivalent. If you have one, then you automatically have the other. They're both equivalent to saying that U is invertible, and the inverse is equal to the conjugate transpose. That's only true for square matrices, by the way. If you have a matrix that isn't square, then one of those two equalities might be true, but then the other one won't be true. But here, we're talking about square matrices. So this is the standard definition for
unitary matrices, and it's nice because it's easy to check if a matrix is unitary. You just multiply the matrix to its conjugate transpose on either side, and you see if you get the identity matrix.
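Here's that check as a tiny numpy sketch; the `is_unitary` helper is an illustrative name.

```python
import numpy as np

def is_unitary(U):
    """Check that U times its conjugate transpose, in both orders,
    gives the identity matrix."""
    I = np.eye(U.shape[0])
    Udag = U.conj().T
    return np.allclose(U @ Udag, I) and np.allclose(Udag @ U, I)

# The NOT operation from earlier happens to be both stochastic
# and unitary:
X = np.array([[0, 1],
              [1, 0]])
assert is_unitary(X)
```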
There's an equivalent way to characterize unitary matrices, and that is that a square matrix is unitary if and only if it never changes the Euclidean norm of any vector when you multiply: an n by n matrix U is unitary if and only if the Euclidean norm of U times v is equal to the Euclidean norm of v for every n-dimensional column vector v with complex number entries. And therefore, if v is a quantum state vector, then U times v is also a quantum state vector because quantum state vectors are simply the vectors having Euclidean norm equal to one. In fact, we can say a little bit more, which is that the unitary matrices are precisely the matrices that always transform quantum state vectors into quantum state vectors. That's a similar situation to what we have for stochastic matrices and probability vectors. The stochastic matrices are precisely the matrices that always transform probability vectors into probability vectors. So once we've decided
that operations should be represented by matrices, which is the same as saying that transformations act linearly on vectors representing states, these choices for the sets of allowable operations follow naturally, at least, in a mathematical sense. Now let's take a look at some examples of qubit unitary operations. We'll see many more examples of unitary operations in subsequent lessons, including unitary operations on systems having more than two classical states. But for now, we'll focus just on qubit unitary operations. We're going to be seeing these particular examples arising again and again, so it's definitely worth getting to know
these operations. The first collection of examples are the so-called Pauli operations. These are the operations that correspond to the Pauli matrices, which are shown here. They're all unitary, and you can check that for each one according to the definition from before. These particular matrices happen to be equal to their own conjugate transposes, which is to say that they're all Hermitian matrices. So checking that they're all unitary, in this case, is a matter of squaring each one and seeing that the result is the identity matrix. The names you see for these matrices are pretty standard, and
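The three matrices named in what follows are

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},$$

and a quick numpy check of the unitarity argument might look like this (a sketch of my own, not from the lesson):

import numpy as np

X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]])

for P in (X, Y, Z):
    assert np.allclose(P, P.conj().T)     # Hermitian: equal to its own conjugate transpose
    assert np.allclose(P @ P, np.eye(2))  # squaring gives the identity, so P is unitary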
The names you see for these matrices are pretty standard, and you'll also see sigma x, sigma y, and sigma z called simply X, Y, and Z. Do be aware, though, that the capital letters X, Y, and Z are also commonly used for other purposes, even in the context of quantum information, but they are very common names for these matrices. The sigma x or X operation is also called a bit flip or a not operation, which we've already seen in the classical context, and the sigma z or Z operation is also called a phase flip. The action of these operations on the standard basis vectors, shown below, kind of explains where these names come from.
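Written out, those actions are

$$\sigma_x|0\rangle = |1\rangle, \quad \sigma_x|1\rangle = |0\rangle, \qquad \sigma_y|0\rangle = i|1\rangle, \quad \sigma_y|1\rangle = -i|0\rangle, \qquad \sigma_z|0\rangle = |0\rangle, \quad \sigma_z|1\rangle = -|1\rangle.$$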
Certainly, it makes sense to refer to sigma x as a bit flip based on this action: it simply flips the bit, from zero to one or from one to zero. It also makes sense to call sigma z a phase flip, but if that's not crystal clear at the moment, that's okay. We'll encounter this operation over and over, and the significance of putting a minus sign in front of the ket one basis vector, and not ket zero, and why that's called a phase flip, will become clearer later on. The next example is the Hadamard operation, which is represented by the matrix shown below and is pretty much always named H. Checking that H is unitary is once again a straightforward calculation. Similar to the Pauli matrices, H is its own conjugate transpose, and when we perform the multiplication, we get the identity matrix, as is required.
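For reference, the matrix and the check are

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad H^\dagger H = H^2 = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = \mathbb{1}.$$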
The third example is really an entire class of unitary operations known as phase operations. These are the operations represented by any matrix of the form shown below, where theta is any real number, so the bottom-right entry is always some complex number on the unit circle. Matrices like this are always unitary, which I will leave to you to verify.
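The form in question (often denoted $P_\theta$) is

$$P_\theta = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\theta} \end{pmatrix},$$

and if you'd like to spot-check the unitarity claim numerically rather than by hand, a sketch along these lines would do it (the function name is my own):

import numpy as np

def phase(theta):
    # A phase operation: diagonal, with e^(i*theta) in the bottom-right entry.
    return np.array([[1, 0], [0, np.exp(1j * theta)]])

for theta in (0.0, np.pi / 4, np.pi / 2, 2.5):
    P = phase(theta)
    assert np.allclose(P.conj().T @ P, np.eye(2))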
This time, the conjugate transpose won't be equal to the original matrix unless $e^{i\theta}$ happens to be one or minus one, in which case we end up with either the identity matrix or sigma z, which we've already encountered. Specifically, transposition won't do anything to this matrix, but taking the complex conjugate of each entry changes $e^{i\theta}$ to $e^{-i\theta}$. Two particular phase operations are especially important examples. For the first one, we take theta to be pi over two, and we get the matrix commonly called an S operation, or an S gate when we're talking about circuits, which we won't discuss in this lesson, but that's an important topic that's coming up soon. The second one is the T operation or T gate, for which theta is equal to pi over four, so that when we compute the exponential, we get the value one plus i over the square root of two.
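Written out, these two matrices are

$$S = P_{\pi/2} = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}, \qquad T = P_{\pi/4} = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & \frac{1+i}{\sqrt{2}} \end{pmatrix}.$$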
Here are just a few quick examples to see a couple of these operations in action. First, the action of the Hadamard operation on ket zero and ket one: if we go through the matrix-vector multiplication, we see that what we obtain are the plus state and the minus state, respectively. If we perform the Hadamard operation on the plus state and the minus state, on the other hand, we get back ket zero and ket one. So, cleaning things up a little bit, we can see the action more clearly: the Hadamard operation transforms back and forth between the zero state and the plus state, and between the one state and the minus state.
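In equations:

$$H|0\rangle = |+\rangle, \quad H|1\rangle = |-\rangle, \qquad H|+\rangle = |0\rangle, \quad H|-\rangle = |1\rangle.$$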
Concerning the last two equations, notice that they give us a simple way to detect the difference between a plus state and a minus state. As we saw earlier in the lesson, if we measure a plus state or a minus state with respect to a standard basis measurement, we get a uniformly random bit in both cases, which doesn't help us to detect the difference. But if we perform a Hadamard operation and then measure, we'll see a zero if the original state was a plus state and a one if the original state was a minus state. So these two states are indeed different, and they can be discriminated perfectly by the method just described, which is to first perform a Hadamard operation and then to measure.
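Here's a small numpy sketch of that procedure, computing the measurement outcome probabilities directly from the amplitudes (the code is mine, not from the lesson):

import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
plus = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)

for name, state in (("plus", plus), ("minus", minus)):
    probs = np.abs(H @ state) ** 2  # standard basis measurement probabilities
    print(name, probs.round(10))    # plus -> [1, 0], minus -> [0, 1]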
Going back to the example of a qubit state we saw before that isn't particularly special, we can see what the Hadamard operation does to that state by converting to matrix and vector forms and performing the multiplication. The result can be converted back to the Dirac notation if we wish, and that's one way to compute these sorts of actions. Here's another example, which illustrates that you don't always have to convert explicitly to matrix-vector form: you can also perform these computations directly using the Dirac notation. This example concerns the T operation.
On the standard basis states, T leaves ket zero alone and multiplies ket one by $e^{i\pi/4}$. If we want to compute the action of the T operation on a plus state, for instance, we can start by expanding the definition of the plus state, then use linearity, plug in those two expressions, and do just a little bit of simplification for the second term to obtain the result.
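Written out, that calculation is

$$T|+\rangle = T\!\left(\frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle\right) = \frac{1}{\sqrt{2}} T|0\rangle + \frac{1}{\sqrt{2}} T|1\rangle = \frac{1}{\sqrt{2}}|0\rangle + \frac{e^{i\pi/4}}{\sqrt{2}}|1\rangle.$$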
If for whatever reason we'd like to apply a Hadamard operation to the result, we can do a similar thing. First, we substitute the expression for T acting on the plus state and use linearity like we did before. Then, substituting the expressions we already have for the Hadamard operation acting on ket zero and ket one, expanding the plus state and the minus state, and simplifying just by adding, we reach the final result.
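That chain of steps looks like this:

$$H\big(T|+\rangle\big) = \frac{1}{\sqrt{2}} H|0\rangle + \frac{e^{i\pi/4}}{\sqrt{2}} H|1\rangle = \frac{1}{\sqrt{2}}|+\rangle + \frac{e^{i\pi/4}}{\sqrt{2}}|-\rangle = \frac{1 + e^{i\pi/4}}{2}|0\rangle + \frac{1 - e^{i\pi/4}}{2}|1\rangle.$$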
There's nothing particularly special about this example; it's just meant to illustrate this alternative but equivalent way of calculating the actions of these operations. You can use whichever way is more convenient in the situation at hand. One final point about unitary operations is that compositions of unitary operations are represented by matrix multiplication, just like in the probabilistic setting. That is, you can compute the action of a sequence of unitary operations by simply multiplying the matrices together, and what you'll get is a single unitary matrix representing the combined action of the sequence of operations. Just like we had for stochastic matrices, the unitary matrices are closed under multiplication, so you'll always get a unitary matrix when you compose unitary operations. You do have to make sure you multiply the matrices in the correct order, but the way it works is just like it was for stochastic operations: the matrix for the first operation you perform always appears on the right-hand side of the matrix product, and the last one appears on the left-hand side. So, just like before, the order is reversed in the sense that you compose from right to left rather than left to right. Here's an interesting example, which gives rise to an operation known as the square root of not operation. Suppose that we first apply a Hadamard operation, followed by an S operation, followed by another Hadamard operation. The combined action of these three operations is described by the product of the second Hadamard matrix, the S matrix, and the first Hadamard matrix, in that order. If we perform the matrix multiplication, which you can do for yourself if you wish, the result is the matrix shown below.
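Written out, the product and its result are

$$H S H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix} \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1+i & 1-i \\ 1-i & 1+i \end{pmatrix}.$$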
It's a unitary matrix, and we can either check that directly or we can trust in the fact that the unitary matrices are closed under multiplication: it has to be unitary because these three matrices are unitary, as we've already seen. We don't have any particular quantum state vector in mind here; we can imagine performing this sequence of operations on any quantum state vector, and this matrix describes the action. The reason why this combined operation is called a square root of not operation is that by performing it twice, we get a not operation, or in other words, a sigma x or X operation. Performing the operation twice is represented by squaring the matrix, or multiplying it by itself, and if you do this, you'll see that the result is the Pauli x matrix.
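Here's a quick numpy check of that claim (again, a sketch of my own):

import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]])
X = np.array([[0, 1], [1, 0]])

# Compose first H, then S, then H; the first operation goes on the right.
M = H @ S @ H
assert np.allclose(M @ M, X)  # performing the operation twice gives NOT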
And that's kind of peculiar, and it gives you just a hint that you can do some interesting things with quantum information that you can't do with classical information. There's no classical operation, meaning one represented by a stochastic matrix, that gives us a not operation if we perform it twice. So it's a nice example that reveals that quantum operations behave very differently from classical operations. And that's it for lesson one. In this lesson, we discussed how quantum information works for single systems in isolation, and in the next lesson, we'll talk about how quantum information works for multiple systems. Thanks for watching. If you have any questions, feel free to leave them in the comments section below.