1. Signatures, Hashing, Hash Chains, e-cash, and Motivation

208.97k views11946 WordsCopy TextShare

MIT OpenCourseWare

MIT MAS.S62 Cryptocurrency Engineering and Design, Spring 2018 Instructor: Neha Narula, Tadge Dryja ...

Video Transcript:

the following content is provided under a Creative Commons license your support will help MIT OpenCourseWare continue to offer high quality educational resources for free to make a donation or to view additional materials from hundreds of MIT courses visit MIT opencourseware at ocw.mit.edu okay so um so let's get started okay so um great we're here to talk about cryptocurrency engineering and design I think the first question that comes up that's very obvious is what is a cryptocurrency so this word was kind of invented ten years ago when I don't know how many of you know the

origin story of where Bitcoin came from but basically a pseudonym on the internet dropped a paper and some open source code in a forum on an email list and said hey I have this idea for this thing called Bitcoin it's kind of like electronic cash here's how I think it could work and here is some code if you want to run it and become part of this peer-to-peer network we don't know who this person is this person is basically virtually disappeared from the internet and from the world but it's created something that has captured so

many people's imaginations and has sort of you know depending on how you measure it created billions and billions of dollars of economic value and sort of inspired a lot of people to think about how to use this technology to solve a myriad of different problems not just electronic payments so you know crypto currencies and the technology behind them are inspiring people to think about how to bank the unbanked add more auditability and traceability to our world get rid of trusted intermediaries and institutions in certain situations and you know basically solve every problem if you read

if you read about what block chains can do on the Internet now that's not exactly what this class is about this class is not gonna be about applications this class is going to be about technology and infrastructure you're gonna learn how to create a cryptocurrency what goes inside a cryptocurrency what's important what are the techniques and you know what application you choose that to apply that to down the line that's kind of up to you but we're not going to be doing digital identity or healthcare records or something like that we're gonna be talking about

the technology so a big question is how are cryptocurrency is different from regular currencies and another thing that I want to make really clear is that the term in this space are still being defined so you will hear people throw around all sorts of terms cryptocurrency blockchain consensus and and and these words are kind of have floating evolving meanings right now part of that is because Bitcoin the first cryptocurrency didn't come from academia as far as we know it came from a community of enthusiasts on the internet and so it doesn't necessarily have the same

basis and rigor that we might expect from most of our academic fields of study it's totally okay we're figuring it out as we go along and academia is really embracing this topic so if any of you are graduate students who are looking for an area in which to do research I think more basically the number of papers published on cryptocurrencies and blockchain technology and respected academic venues is doubling every year so there's huge opportunity here so cryptocurrencies are not regular currencies you know they're not a dollar or pound or a euro what we normally think

of as currency there's something different Bitcoin was sort of created out of nowhere and and what does it mean to create a cryptocurrency you know who says you can create a cryptocurrency what backs a cryptocurrency why is it valuable well I think in order well first before we answer that question I just want to make it really clear what this class is not about okay we are not going to help you I see oh if you are interested in IC owing just you just go like this is not that's not what this bus is gonna

be about we are not going to offer any trading advice we have zero opinions on whether you should buy Bitcoin now or sell or whatever you know Zen cash or whatever all these things are so like none of that don't even ask us we're not interested and this class is not really going to be about permissioned blockchains either now you might not know what this term means yet and that's totally okay but I just want to make it clear that what we're talking about here are crypto currencies they're open permissionless systems in which there is

a token which has some value so that's what we're not going to do in the class so going back to and let me just pause there for a moment let me and ask you if there are any questions so far about what I've said yeah no not at all and let's start to get into that so the question was do tokens always have to have value so I think really to understand what our cryptocurrency is what are tokens what do they mean we have to talk about money okay and we have to talk about what

money is and what it means so this is gonna be very hand wavy and I'm sure not very satisfying to a real monetary economist but you know money developed there are a few different theories about how money developed there is this thing called the coincidence of wants so maybe I have a sheep and tag has some wheat I am hungry and would like to make bread Tadgh would really like to make a sweater and so we can barter we can trade you know I have won one set of goods that is useful to tag cash

is another set of goods that are useful to me we can get together and make an exchange so that's fantastic barter is incredibly important it barters existed for a long time but what if you know tag doesn't have weeped tag has vegetables and I don't want vegetables I want wheat but tad still wants the wool from the Sheep how do we execute this trade we don't have a coincidence of wants we don't actually want the exact same thing from each other so some theories are that money evolved out of this problem and money can be

represented in so many different ways it's it's it's it's several it's from money I think was first created around 5000 BC so it's really really really old the things that represented money usually had certain properties they were rare they were they were not easily reproducible people at times used things like shells or bead beads for money you know the first coins this is like a really interesting coin that was developed precious metals were often used for money and then eventually we sort of evolved into what we think of as money now which is paper bills

currency another theory of how money came about is this idea of receipts debt and credit so you know maybe I have a sheep and I share all of my sheep and collect a lot of wool what I can do is I can store that role somewhere and I can get a receipt from someone from having stored that wool and that receipt is a value it entitles the person who holds the receipt to the good that is being stored and it and and so another theory of money is that money evolved out of these receipts trading

these receipts back and forth instead of taking all that wool with you you leave it in one place in a depository and the receipt acts as a bearer instrument whoever owns it has access to the wool in the depository and so you can kind of see two different ideas about what money is develop from this one is well it's it's a bead or a coin or you know something that I hold something physical that we've decided to assign value to in and of itself and another idea is I'm going to use a trusted institution I'm

going to deposit something with that institution and they are going to ensure the validity of that deposit and manage who has access to that deposit so this doesn't really get it the question that was originally asked which is why do tokens have value but one thing I want to point out is you know why well a question I want to ask you guys actually is why do these things have value does anyone have any ideas yes because everyone agrees that they do any other thoughts on why those things have value they're backed by institutions say

a little bit more about that what does that mean okay the government's coming too promising to respect the value of that does anyone want to add to that or have another reason yes and see your name payment for taxes okay so that kind of connects to the government thing okay great anybody else yes they have value because they're rare okay interesting all right so those are all really interesting ideas I think that those are all sort of properties of what makes things value a valuable there are definitely things that are rare that are not valuable

right I can think of some things you know that that might be extraordinarily rare there's only one or two of them universe and you would have no interest in owning them whatsoever you wouldn't assign value to them certainly it's really important that you can pay taxes with this stuff because taxes is you know pretty much a requirement of living in any country there are things that have value that you don't necessarily use for taxes so that's that's a little confusing and then there's this idea that it's backed that it's backed by something right and you

know the dollar used to be backed by something and actually if you look at a dollar I think it still says this right it's backed by The Full Faith and Credit of the United States government they don't say that anymore they used to say that but that's what like a lot of people say about about money it's backed by the by the Full Faith and Credit of the United States government you know what does that really mean I think what it all goes back to is these things are valuable because we think they're valuable we've

all decided they're valuable and you know that if you have a dollar bill and you want to buy something from someone they're gonna take it that's gonna that you can make that exchange and the reason that they're gonna take it is because they know that someone else is going to take it right these things hold value because we think that they hold value it's it's a collective story that we all tell so I think once you look at money that way then when you start to look at tokens which are essentially digital representations of these

things things that are rare and a little bit special then when you ask well why does this token have value because we think it has value so what makes a token inherently valuable the fact that we think it's valuable and a lot of different things can go into that maybe we think it's valuable because it you know it's very rare or maybe we think it's valuable because someone's promised that you can use it to pay for storage like with Dropbox or maybe we think it's valuable for a completely different reason because we like the name

or we like the people who are running the network but ultimately tokens are valuable these digital representations are valuable because we think they're valuable yes well so my argument is that the fact that they're limited is something that goes into our perception that makes it valuable great ok so now that we've learned a little bit about money talks a little bit about money I want to go into how payments work because ultimately we're gonna get to cryptocurrencies and cryptocurrencies are electronic cash so here's the way that digital payments kind of work right now you have

an institution called a bank you have Alice and you have Bob and Alice and Bob have accounts at this Bank right and so the way so the bank is keeping track of sort of who owns what and these are these are records these might be digital records they might be paper Ledger's or paper records whatever the bank is using to keep track of sort of who has what in their account and so the way that I've set up this example right now Alice and Bob both have bank accounts Alice has $10 with the bank and

Bob does not have any money with the bank so let's say that Alice wants to pay Bob let's say that Alice and Bob have gotten together maybe they're in the same coffee shop and Alice wants to buy a sandwich from Bob and Bob says okay you need to pay me a dollar if you give me a dollar then I'll give you the sandwich so how can a let's do this how can she transfer a dollar to Bob well if she had a paper dollar she could just do that right but you know let's say that

she doesn't have a paper dollar so Alice can ask the bank to make this transfer for her or five dollars so Alice sends a message to the bank and authenticates with the bank and you know to show the bank that she is in fact Alice but I'm not going to go into the details on how that works and then the bank confirms that makes the transfer in its ledger says Alice now has five dollars and Bob now has five dollars Alice tells Bob hey I did this I talked to the bank go check you can

you can verify it for yourself Bob checks with the bank and sees yes in fact the bank is saying that he has five dollars now whereas before he had zero and then Bob gives Alice a sandwich because he believes that he now has five dollars and the bank sort of preserved the property that money was not created out of nowhere that that the balance was ultimately maintained so the bank is very important in this scenario the bank is critical this is how digital payments work credit cards venmo banks it's kind of all sort of based

on the same idea that there's some trusted institution that is handling that payment for us and that is keeping track of everything now what are the pros and cons of this scenario anyone want to throw a couple out yeah around right so we're putting a lot of trust in this bank and maybe you know should we trust the bank banks fail sometimes banks are hacked banks have humans who are running them who occasionally might want to change those balances in their favor this has all happened anything else yeah yeah there's a lot you know I

almost have to talk to the banks and that's kind of annoying so there's that anything else yeah so okay so this is like getting a little bit more advanced here you know what if everyone takes their balances out at the same time well you know we need to make sure that the bank actually has that money so to speak we're not gonna be talking about that problem right now but very good problem so to kind of talk through some of the pros and the cons of this situation one of them one of the big pros

I think is that even if Allison and Bob are not in the same physical location Alice can still pay Bob if they can talk to the bank so that's pretty cool and that's something that you can't do with dollar bills or with coins or with bars of gold right so having the stress of institution that you can communicate with electronically means that Alice and Bob could be halfway around the world from each other and they can still pay each other so that's that's pretty awesome and that is definitely a property that we want to have

in terms of cons I think we covered quite a few of them which is you know we're really putting this Bank kind of in the middle of everything here and there are a few different ways that that can cause us trouble so the bank needs to be online during every transaction if the bank is offline then how does Bob know whether he got paid or not the bank could fail at some point in time which is kind of related to that the bank could simply decide that they don't want to do this anymore and can

block transactions and then privacy the bank has kind of insight into everyone and their payments and this is like incredibly sensitive information payments are quite quite important and we're gonna be talking about privacy a lot in this class during the second half of this so just like an example a couple of like visual examples of that the bank could just totally go away and then what happens to that ledger who knows right I mean literally it could just disappear it's just you know maybe it's paper and it gets burnt or maybe it's bits on a

computer and it wasn't replicated um the bank could decide that they don't like Alice for some reason and that they don't feel like processing Alice's transactions this happens all the time in the real world so there have been designs for electronic cash that that work a little bit differently and and we're gonna kind of step up to the most sort of like the design that came right before Bitcoin and we're gonna do that iteratively so let's talk about ecash and how he cash works so the way that ecash works is Alice tells the bank instead

of saying hey Bank do this transfer for me Alice says hey I would like a digital representation of a coin can you give me something that is digital so I don't have to be in the same physical place as you and that you know that I can use in such a way that I can prove to someone else that I have this thing and then I haven't double spent it because that's the problem with digital representations of coins like a fundamental problem is that bits can be copied so whatever system you use to design your

electronic cash you need to make sure that people can't just copy coins and give this what what is sort of the same coin to multiple people in the previous example the bank was making sure this happened the bank was maintaining balances and you know Deb debiting Alice's account and crediting Bob's account but you know if we want to think about something that doesn't involve the bank and we're starting to get there then we need to think about you know how to ensure that a coin can't be what is known as double spent so Alice asked

the bank for coin and you know maybe she does that maybe she has an account with a bank like before or maybe she gives the bank teller like actual physical money in order to get one of these coins so the bank generates a unique number s n stands for serial number and decides that this is the physical this is the digital representation of the coin then gives that coin to Alice in a way that it's clear that the bank did this usually this is done using a technique called digital signatures we're not we're gonna get

to that in as class progresses but not right now once Alice has this coin then she can give it to Bob and Bob can take a look at this coin and hopefully there's enough going on with this coin that Bob can be convinced that this is a real coin Alice didn't make it up out of nowhere she actually had the funds so to speak to give to Bob and that it hasn't been double spent and with traditional ecash the way that this was and and once Bob is convinced of that he can give alice to

sandwich now in traditional ecash the way that this is done is Bob actually goes back to the bank and says here's this coin Alice just gave me this coin you know is this an OK coin but the fact of the matter is that the bank in this case has a serial number and knows that it gave that unique serial number to Alice and then Bob is showing up with a coin that is that serial number and what the bank is doing here in this example is the bank is the way that the bank checks to

make sure that this coin is correct if it looks at the serial number and it makes sure that it hasn't been spent before right so the bank can link the coin between Alice and Bob which is unfortunate the bank can also the bank also kind of still sort of has to be on line not to do the actual payment between Alice and Bob but in order for Bob to have confidence that this coin is real and and you know later on Bob can say you know I would like a dollar for this coin that I've

just given you or something like that or Bob can have an account with the bank and can maintain a balance there so just to kind of like go through some of the pros and cons here oK we've kind of done something where the bank's not in the middle except the bank is still really in the middle we're getting like a step closer but we're not we're not there we don't need you know the the Alice can technically give Bob this electronic thing that represents value but Bob still needs to talk to the bank to make

sure it's real and it hasn't been well spent and we still have this problem where the bank is the one who's minting these things the bank can decide not to give Alice a coin if it feels like it and we still have this privacy problem because the the secret number the serial number that we invent for the coin can be linked across these payments so there's this notion of something called Chow me and E cash so David Chapman was a cryptographer and he developed the system which has slightly nicer properties than previous forms of ecash

so the idea here that's really key is instead of the bank choosing the secret number Alice chooses as a secret number and we have ways of generating random numbers that we can be fairly sure are unique so we can let everybody generate their own random numbers so in show me an e cash Alice chooses the secret number that represents a coin and then Alice blinds her message so Alice adds some randomness to the secret number such that the bank doesn't know what that number actually is and we'll get into more detail about exactly what that

means that's actually it's it's all in the paper that was assigned reading for this class so make sure that you take a look at it so when the bank verifies that the secret number is a real secret number and it's really a coin and Alice gave the bank of dollar or something like that the bank does so on the blinded secret number and Alice actually has the ability to remove that randomness or that blinding later and end up with a valid signature on a secret number so Alice does the same thing that she did before

she gives Bob a representation of that electronic coin and when Bob redeems it note that the bank never sees what the number is so when Bob redeems it the bank has no way of linking the payment between Alice and Bob so just to get into how this works visually Alice will talk to the bank and Alex will use a blinding factor on the secret number and so when Alice talks to the bank the bank doesn't actually see what the secret number is they can't decode it again Alice gives a dollar or something like - to

get this coin from the bank and the bank signs this Alice can remove the blinding factor later and this is what the coin is the coin is a valid Bank signature on the secret number and also the number itself which Alice can then send to Bob Bob can check and make sure that this is a valid signature from the bank and if that's correct then Bob can give Alice a sandwich in order to redeem this Bob gives this coin to the bank the bank says okay I've never seen the secret number before so I'm gonna

assume that it's you know and you have my signature on it so I'm gonna assume that I went through this process with somebody and you know signed something and now I'm gonna record that secret number once that happens Bob can be sure that this coin hasn't been spent before the bank keeps a running list of all the secret numbers its seen and it makes sure that if it ever sees one again it can it can say no this is this is not correct this I should never see a secret number more than once now okay

but we you know what about Alice could give one version of that to Bob Alice could also give a version of that to Charlie and how are Charlie and Bob supposed to know what you know whose coin is correct because remember we wanted to try to get the bank out of the way when doing this and so in Chow me an e cash the way that this works is the bank actually keeps a bit more information and the information that the bank is keeping won't let the bank link these transactions together unless Alice happens to

give this to two people and so if Alice gives the same coin to two different people the bank will be able to detect it and the bank will be able to know it was Alice and so this is kind of a motivator for Alice not to do that so the idea being here is that the way that we get around the fact that we don't know if a coin is has been double spent or not is we we add punishment if the coin is spent so bob doesn't know for sure that this coin he receives

hasn't been double spent but he does know that if it was someone's gonna know it was Alice and they're gonna punish her so this is a pretty clever scheme and this actually gets us around a lot of problems we have digital payments we can make the actual transfer without the bank in the middle we have some privacy now because the bank can't link transactions together and we have this way of doing double spend detection we have a way of motivating people not to double spend their coins which means that you probably don't have to check

at the time you receive a coin whether or not it's been double spent of course this still suffers from a really big problem which is that a bank can still decide that they just don't want to do this with you they can just decide that they don't want to play this game with you they don't want to issue coins they don't want to maybe they don't like you specifically maybe they don't want to take your coins and exchange them so this scheme tchau me and ecash solves quite a bit of problems when it comes to

how do we have electronic money with you know some nice features but it doesn't quite get to all of them and so the real question in this class is how do we do electronic money really in a peer-to-peer way where there's no institution in the way there's no sort of entity that can say no so he cashes the math is really interesting it never really that kept her allowing on these banks that's I've never quite got off the ground so I'm I'm going to talk about somewhat more abstract and low-level primitives I'm not going to

quite get into cash or you know tokens or transfers on or anything this this lecture so but I'm going to talk about the really basic primitives that you need that we already sort of mentioned hash functions and signatures signatures obviously are we talked about a little bit what you need to be able to sign messages in order to send these tokens around but first I'll talk about hash functions which are like basically the most fundamental basic thing we use in these systems and I think if you've used computers or if you've programmed a little you

probably have some familiarity with hash functions they're simple right but they're actually extremely powerful the hash function is basically you have some data a bunch of bytes a bunch of ones and zeros you run it through a hash function and you get an output that's also a bunch of ones and zeros generally the input data can be of any size you can hash something that's a mech you know put in a megabyte put in a gigabyte or put in a single byte and generally the output is of a fixed size so in the case of

Bitcoin we use sha-256 the output size is 32 bytes long or 256 bits long and you know this is used for lots of things in computers I guess the reason they call it a hash is because it's like when you take the potatoes and chop them up into little squares and grill them on the you know if for breakfast it's sort of that idea that we're taking this data and you know the data going in gets chopped up and smushed around and then comes out into an output the properties so this is not a sufficient

different definition but I will say that you can sort of do everything with hash functions there's some fun things that you can't do but you could make a cryptocurrency only using a single hash function and I think people have like sort of for experimental reasons if there's you you limit the fun stuff you can do but you can do signatures you can do encryption you can do all sorts of things like that okay so this is not a sufficient definition right that okay there's any size input fixer size output and the output is random looking

that's sort of wishy-washy but what is random looking mean it's not actually random right if you put in the same input everyone will get the same output so if you say okay well what's the hash of one you'll get some output and if you someone else says okay what's the hash of one you'll get the same thing however the output does look you know while it is deterministic it's sort of high entropy in that the output should have about as many 1 bits as 0 bits if you take the hash of 1 it's just gonna

look like a big random number and the hash of 2 will look like a completely unrelated random number the outputs look like noise so if you've ever seen hash functions you can run it on your computer you say echo you know hello pipe shock 256 sum and you'll just get some kind of crazy random thing there doesn't seem to be any order to the outputs a little bit more well-defined we usually talk about the avalanche effect in the changing a single bit in the input should change about half the bits of the output right so

even though you have extremely similar inputs they should be completely dissimilar outputs well completely dissimilar as in about half the output changed if every bit changes then it just is the inverse of what you had and so it's easily correlated but the avalanche effect is is sort of how hash functions are constructed we're generally there iterative rounds and so you say okay I'm gonna sort of swap these things and multiply these things and shift these bits around and such that if any change in the beginning will sort of propagate an avalanche to so that all

the output bits have been affected by it okay and a little bit more well-defined generally the hash functions are defined by what they should not do so that three main things they should have preimage resistance second preimage resistance which I'll sort of skip over and collision resistance and we can define what these things are okay so a pre-image is the thing that came before the output so it's sort of a mathy term but the is okay if you know why you can't find any X such that the hash of X is equal to Y so

if I give you a hash output and that's all I give you you should not be able to find an input that leads to that output so if I just say hey here's a hash output it's you know 3 5 o 2 1 FF whatever you know some long string you won't be able to figure out what I used to put in to get that of course you can't find it eventually right therefore for any given Y there's probably some X in fact there's probably a lot of X's that will lead to that Y since

Y is a fixed length and there's you know 2 to the 256 possible wise but there's an infinite number of X's right because X is not bounded in length you can have a megabyte or a gigabyte or a terabyte size X so since there's sort of infinite numbers of X's and a fixed though a very large number of Y's you know as long as it is a random mapping there will be lots of different X's that can lead to this Y and so you should be able to find it it's just impractical right it's like

yeah maybe you'll be able to find it but it's gonna take you 2 to the 256 tries to find any specific Y value and that's about 10 to the 78 which is a number that's big enough that you can sorta round it up to infinity well I mean not quite but big enough that you're not going to be able to compute that right the Sun will burn out and the universal diet stuff like that so that's pretty much resistance you can't go backwards given the hash you can't find what what led to that ok any

questions about preimage resistance seems reasonable it's a little interesting in that given y is a little tricky and then it's like okay well someone came up with you know someone might know X in order for them to have computed Y or maybe it's just completely random and no one actually knows what the X is right so so there's a there's a sort of loss of information in the idea of a pre-image stack ok second preimage resistance this one's a little trickier and and can get messy so I'll define it but we won't go into it

too much the idea is given x and y such that the hash of X is equal to Y you can't find X Prime where X prime is not equal to X and the hash of X prime is equal to Y so we're sort of giving you a pre-image we're saying hey here's this number X and here's this result Y I bet you can't find another X that leads to it this one is actually poorly defined in the literature and so it's like it's literally like well who made X and who gets to choose and is

it any X Prime and things like that so and it's not actually that useful so we can just sort of gloss over that one just just sort of mentioning it and then the the other one that's very proper that's very important is collision resistance where the idea is that nobody can find any X Z pair such that X is not equal to Z but the hash of X is equal to the hash of Z and this one's sort of a lot cleaner in that there's no lack of information there's no like secrets or anything it's

just like look no one can find this and so it's really easy to disprove you can just say hey look here's an X and here's a Z try hashing them oh shoot the hashes are equal and it doesn't really matter how you got these or who's doing it so that's a really nice easy clear property and again you can find this eventually right so if your output size is 256 bits long you'll be able to find two inputs that map to the same output in fact you do not need to try 256 times I not

gonna go into the details but you actually only have to try a hundred and twenty-eight times to I sorry two to the 128 so you need to take the square root of the number of attempts in order to find this collision because sort of the intuitive reason is well you just start trying things and keeping track of all their hashes and there's what's called the birthday attack which as you keep trying them there's more possibilities so you know the next thing you try you can collide with any of these things you've tried before and so

you actually only have to do the square root and it's called the birthday attack because um there's sort of the birthday paradox which is not really a paradox put the ideas so in this room there's people that have the same birthday it's almost certain which seems kind of weird because the intuitive thing is like well there's 365 days a year maybe once you get like a hundred 50 6070 people in a room you're gonna have two people with the same birthday but actually it's like 22 or something anyway that that it becomes likely that people

will have the same birthday so it's kind of counterintuitive and and it applies in this case as well so to find a collision you need the square root of the output space but you know a hash function should not have collisions if you can find a collision if any collision exists for this hash function you can consider the hash function broken it's a little bit different than premature assistance because it's hard to sort of definitively prove that you've broken four images that's something of an interactive process where you say hey here's a why and then

someone comes back with an X oh okay you proved to me that you can find free images but that's hard to tell - the rest of the world because it was sort of interactive whereas collisions are very clear and non interactive you can just say hey here's an X and here's a Z anyone can verify these you don't really matter how you got it okay so sort of some practical how do these functions work practically speaking the collision resistance is a harder property so there are many functions where the collision resistance has been broken where

the preimage resistance has not been broken so examples of this are sha-1 and md5 md5 is a fairly old one written by ron rivest over it well I guess it wasn't at the Stata Center because it was in the 80s but this was you know message digests 5 I guess there were several before that and that is quite broken you shouldn't use it it's collision resistance is you know trivially broken you can find collisions and like under a second on a modern computer sha-1 was more you know happened later in the late 90s I think

and the NSA made it and there there have been collisions fact I think there's really only one collision that's been found basically by a team at Google and some Italian University last year and they they spent like a lot of computer time to find this collision but they did find it and then once you find one it's sort of like oh yeah we really shouldn't use this anymore but in both of these cases sha-1 and md5 there's no feasible preimage attack so given a hash output for either these you can't find what the input was

or you can't find a different input so generally it's it's a lot easier to make a function strong against preimages collisions is sort of harder to do we deal with also practically speaking how do these hash functions work it's a little bit of black magic there's no proofs that a hash function can even exist right so if you could prove if you can prove that there is a one-way function you would you you get the Fields Medal right it's like a million-dollar prize so if you can prove that there is such a thing as a

hash function like you will be a super famous mathematician we have no idea that this is even mathematically possible or you know maybe the universe doesn't work this way it seems to though it seems like there are these things that work like hash functions that work like one-way functions but we have no proof of that so so even the most fundamental part of that that everything hinges on we don't even know if it exists so and then this is sort of closely related if you're in the computer science e stuff to like P and NP

that like anyway so we don't we don't know that these actually work and and also in practice there's the hash functions are sort of not nice math cool things like elliptic curves and like RSA prime numbers and stuff like that they're really like if you look at the code it's sort of like well I'm gonna take these bytes and I'm going to swap them and then I'm gonna add these two numbers and then I'm gonna rotate the bits over here and then I'm gonna XOR these things and then I'm gonna do that 50 times and

why 50 well it seems like 50 is a good number it's not too slow like so no really like it's sort of black magic shot 256 uses 64 rounds nice even number different functions like Blake to be uses 20 rounds but then also a version that uses 12 rounds which is faster and people think well it's still seems quite secure but if you want to be really secured use the 20 round variant if you want to be probably secure enough use the 12 round very so so this is not like there's no proof there's there's

heuristics and things like that and best practices but it is a little like this kind of cryptography is a little bit black magic and it's not based on any cool like mathematical number theory stuff either the way that elliptic curve cryptography or RSA kind of stuff is so like if you break RSA you can say hey I can now factor these you know composite numbers very quickly that's in of itself a cool mathematical discovery the breaking of sha-1 there's not really any cool math insight it was just like yeah we found this fairly specific weird

path that we were able to break sha-1 after a couple years of computer so it's cool and some people are super into it but it's something of a niche to like actually build hash functions I would recommend not building your own national yes and my question is is breaking a hash function literally just guess and check where is there so so know if you so like break if you say hey I found a collision by doing one two to the 128 attempts one nobody's done two to the 128 attempts that's still seen as like beyond

technology today but if that's how you break the function that's not really considered a break because that's sort of the definition it's like yeah well we know this is 256 bits long so to find a pre-image if you do 252 to the 256 attempts you'll find it so that's not considered a break a break is considered hey I I found a pre-image in two to the 240 attempts or I have a proof that you can you will be able to find a pre-image in two to the 240 attempts and here's how to do it and

that's considered a break even if you know it's still impractical to the 240 is still impossible in today's technology but that would if you had a paper and people looked at it like oh yeah that would work you wouldn't be able to do it but that's still considered broken and so those are different class okay so so something like md5 md5 oops output size was 16 bytes or 128 bits so collisions even if even if it were you know strong it would still be sort of too short today that collisions would be able to be

found in two to the two two to the 64 iterations which is doable on today's computers like if you rent a bunch of stuff on a TBS you can do to the 64 in a couple days so so but like that's sort of the different definitions of breaking the function sort of fun Ethan Hellman who's at BU and we work with he and we all sort of broke the iota wrote their own hash function which is like some crypto currency and we we found collisions in it and it's kind of fun but yeah it was

like weird it wasn't like number theory was just like oh I wrote this Python script and then we have this go script and we you know tried this thing we got a collision so usages what do you use these sasser's for there's lots of cool things you can use it for it you can use them sort of as names or references where instead of naming a file you can just take the hash of a file and that is a good representing you know a good compact representation of you know so you can point to what

you're talking about so the hash of a file is a unique representation and if you change any bit in that file the hash will change and so you know that okay here's this you know way to point to the file you can also use it as sort of a reference or pointer in different algorithms so you can say like you know anything you're using pointers for linked lists or like maps and stuff like that you can say well I'm going to use a hash and as a pointer and then be able to like sort through

it that way so these are so anytime you sort of think of pointers and thin like graph theory and stuff like that in computer science think well could I use a hash function here instead of just a like regular memory pointer and in many cases you can in some cases you can't so you can't have cycles right since the idea is you can't find free images you won't be able to find a whereas you could make you know a cycle of pointers in the computer where like a points to be be points to see see

points back to a you shouldn't be able to produce that with hash functions because having that cycle means okay well somehow you found this preimage right but but in many cases you can do this and then another way to look at is the hash is a commitment you can sort of say well I'm not going to tell you what X is but I'll tell you what Y is and I can reveal X later and then since everyone remembers why they can be sure that yeah he's revealing the right thing right he's there are no collisions

in this function so we can be sir if were presented with X that this was the X that was committed to yesterday so I'll give a little example of that I've sort of commit and revealed so you can commit to some kind of secret or something you want to reveal later and reveal the privilege so here's my commitment I'm this is an actual hash sha-256 I just made it on my computer and there is a string you know there's an ASCII string that maps into this and it is a prediction about the weather but that's

all I'll say and like given that information and given this hash you probably can't find my prediction you can try to try all these different ascii strings about the weather today but i'll reveal it okay so i think it won't snow Wednesday but it actually anyway and then I put this number it and so if you you know put this in your computer and Linux I think in Mac it's a slightly different come in it's like shocked to something or something but in Linux this will work and you can say I I think it won't

snow Wednesday and then I put these some random numbers here because if I had committed to just the phrase I think it won't snow Wednesday you might have been able to guess that right you can say well he said it was about the weather I'm gonna you know take all sorts of built millions of different strings related to days and weather and like common English words and I'm gonna try hashing them and see if I find a collision and you might be able to write but I added this four bytes of randomness at the end

to make that difficult it doesn't really contribute to my commitment I sort of you know you know the list doesn't really mean anything but it makes it harder to guess what my input was you know cuz I've already revealed that it is it's not a fully random input right so you might be able to guess things like that so I could you know I could say hey I'm gonna make a prediction about the weather commit to it and then reveal my prediction tomorrow and we'll see if I was right this can be useful in the

case where like you know not the weather but in other things if if knowing my prediction could influence the actual events this would be a nice way to you know commit to what my prediction is without everyone knowing what the prediction is and then revealing it the next day yes hashing this again well in so in Bitcoin they hash everything twice it's it's like generally you don't need to there's no explanation for why they do that in Bitcoin you could but there are things you can construct where you can say like append some extra data

and then hash this again so you could say here's my prediction for next week and like this is the hash and then lash it again so you can make sort of chains of commitments and then reveal iterations of it actually I only had some slides where you can sort of you know hash something again and again and start revealing it incrementally that might be useful I actually have stuff like that in software I've written where you want to reveal secrets but you let's say I want to reveal secrets but I don't want everyone to have

to store all of them so I can sort of make a chain of hashes commit to the last one and then as I reveal six successive free images you don't have to store all of them you can just store the like the latest free image and you can reconstruct all the hashes from that so to evaluate like if you want to try this it it's imperceptible to perform one shot to p6 hash is I don't know billionth of a second or something you can do you can generally do like a hundred two megabytes to a

gigabyte of hash output on a regular CPU core she's asking does it make it harder to find a pre-image if you hash twice oh the answer is sort of note it it might so like I don't know chained md5 can you still find collisions I'm not sure but but generally you're generally thinking is if the hash function is broken and you can either find collisions or preimages yeah maybe it gets a little harder by chaining like by iterating it but you should just stop using it and use something that's secure but yeah it seems that

like finding preimages would be harder since it's essentially adding more rounds right by hashing it twice and then there are some attacks so it's like fairly out there but um it's called length extension attacks due to how function hash functions are constructed where if you do say okay I'm gonna take the hash and then take the hash of that you do prevent certain types of attacks that are like fairly niche but like a length extension attack you know you know merkle-damgard construction will be prevented by this so generally no generally you don't need to do

this but there are sort of different constructions where you're gonna hash a bunch of times I don't have the slides here but like a Merkel tree is a binary tree of hashes where you're sort of taking the hashes of these things and then hashing it again and again and that's a really useful data structure and a blockchain is essentially a chain of hashes and that's what we'll talk about next week but yeah okay so I'm gonna go a little faster so that's that's an interesting use case where you can commit and reveal and yeah adding

randomness so you can't guess the preimage this is called a hash based methods message authentication code where there's part of it a secret and part of it not and this is kind of a proto signature this is getting towards a signature right where I've committed to something and then I reveal it and everyone knows yeah that's that must be what he committed to the the day before it not quite a signature but it's getting to that direction and so next I'm going to talk about signatures what is the signature it's useful and it's a message

signed by someone and so I'll define what a signature is through the functions that it uses there's three functions will allow you to create a signature scheme generate keys sign and verify and so these and these different things generate keys you make a secret key and a public key and so the idea is like there's some public key which is your identity and there's some secret key which you only control and you use that to sort of prove your identity and prove these that these messages are signed by you so yeah you generate a key

pair the holder of the secret key can sign a message and then anyone possessing the public key can verify a message signature pair so I'll go into detail on these three functions and this is this applies generally so I'm going to talk about a hash based signature in detail but there are many different signature schemes DSA el-gamal RSA signatures elliptic curve signatures there's tons of different cool math systems that that allow that these kinds of functions and I'll talk about in some ways this is one of the simplest ones okay so yeah there's these three

functions the first one is generate keys and it returns a private key public key pair and it generally doesn't take any arguments but it takes in like randomness you need to flip coins you need to find random 1 and 0 bits so you know and there has to be long enough that no one else can guess what your private key is ok so you have a private key public key public key is you know public key tell everyone private key is or secret key actually I think I in the code I always say secret key

it's usually better to say secret key because at least it starts with a letter that's not P anyway ok and then the signing function where you take your secret key in your message and it signs a message and returns a signature okay and this thing you know all these things are just strings of ones and zeros right it's just a bunch of bytes public key a private key a signature a message these are all just bytes right and then the verify function which is sort of the most complex a verify function takes a public key

that you've seen a message and a signature and it returns a boolean whether this was valid or not so it returns a single bit if it's zero it says yeah these things don't match up either maybe maybe the message just changed or maybe the signature has changed or maybe it's from a different public year something but if all three of these are correct you know and the signing function was the private key the secret key associated with this public key was signed this message and produced this signature than it will return true and so you

can get into the sort of math properties of like okay how what does it mean to forge a signature and like can they be Forja Balcon pute a tional e like eventually a lot of these things since it's bits you could eventually guess the forgery but maybe that takes two to the 256 attempts or something okay so any questions about the basic structure of like what constituents a signature scheme or mostly makes sense and you can see how this is useful you can say okay you know you can publish a public key and say hey

I'm Tadgh this is my public key and in fact on my business card I have a RSA public key and so if people get my business card and then I sign a message and email to them they could be sure that well this is probably the same guy nobody ever cares but but you know and you know it's useful for the stuff we were talking about before with like chomi and cash where alice needs to authenticate to the bank and one way to do it is to sign a message and say hey I'm Alice give

me a coin and then Alice can sign a message to Bob and so on so this is really useful as a basic building block for all these kinds of messages okay so I'll talk in the last 14 minutes about signatures from hashes this is doable using just hash functions you can construct a signature scheme and in fact that's the first problem-set and you implement a signature system using only hashes and the hash function is already defined for you it's in the standard library it's just shot 256 the same thing Bitcoin uses and this is called

Liam port signatures Leslie Lamport wrote about this late 70s I forget exactly when the paper came out but this was one of the earliest cryptographic signature schemes and it's kind of cool and it works another fun thing is it's quantum resistant so if you know about like quantum computers quantum computers kind of ruin all the fun in terms of cryptography all the cool things we can do with cryptography not all but most of them get ruined by quantum computers but hash functions are quite resistant to quantum computers because they're not based on any fun that

they're based on this black magic of just you know X whoring and shifting numbers around that's a huge oversimplification but but yes those hash functions are generally seen to be quantum resistant so if you have a signature scheme that only uses hash functions well it still works even if someone invents the quantum computer and can break all these other things like RSA in elliptic curves so there's actually sort of renewed interest in these kinds of systems recently ok so how do you make a signature seem with just hash functions so how do you generate a

key in this case so a public key and a private key you want to generate right so first we generate our private key now these squares are 32 bytes each just and you generate 256 of them on this row 256 of them on that row so you're generating 256 times 2 or 512 32 byte blocks right and these blocks are each 256 bits or 32 bytes so in total that's what 8k 8 kilobytes I think pretty big but anyway you're saying okay here's my private key it's all completely random I just you know take debt

slash dev slash you random or whatever just flip coins 8,000 times or however many this is total and generate all these different blocks and store them on my hard drive and keep it secret then I want to generate the public key so for each of these 32 byte blocks I take the hash of it which will also be 32 bytes and I you know take the hat so there's now 512 hashes 256 on this row to 256 on this row the Green will be my public key okay and the grey one is my secret key

so they all look the same they all look like just a bunch of random ones and zeros the gray ones actually are a bunch of random ones and zeros the green ones are actually hashes though of all the great ones and I publish the green ones you know I just to serialize it I just put in a row I just say okay here's this first 32 minutes second third fourth and then go to this row or whatever scheme you want okay so how is this useful now everyone knows a bunch of hashes and I know

a bunch of the preimages right and I don't so now it's sort of this commit reveal thing where if I reveal to you this you can verify that oh yeah that mapped to this one right later on okay any questions so far about this process seems sort of useless but fairly straightforward okay then I want to sign so first to sign a message I'm gonna take the hash of the message to sign and this is often done it's done in Bitcoin it's on and most signature schemes where I want a fixed-length number to sign so

I don't it's it's annoying to have to say well what if I want to sign a megabyte long file or what if I want to sign up ten byte long string you want to standardize it so look I'm whatever I'm signing it's always 256 bits long right so if I want to sign the message hi first I take the hash of the message hi which in shot 56 is this is the hash of hi and so I look at this as 256 bits and I say okay I'm gonna pick the private key blocks to reveal

based on the bits here so the first bit here is a 1 right because it's an 8 and so I'll reveal and and I indicated before that there's sort of this zero row and this one row and now what that means is okay well the first bit of my message to sign is a1 so I'm going to reveal this gray square and the next bit the next four bits actually since it's me are going to be zero so I'll reveal this and then I'll reveal this this and this and I just made it up but

like yeah so for example if I'm signing and it starts with 0 1 1 0 1 1 1 0 I reveal this free image this pretty much the screen is this pretty much these 3 this one and so I reveal pre images based on the bit representation of the message I'm trying to sign and then give everyone these so so my signature will just be this sequence I can I can go in a row order here yeah it's probably a lot easier so I go in sequence I say okay here's the first 32 bits of

bytes of my signature here's the next here's the next here's the next and so my signature ends up being 256 blocks long each of which are 256 bits so it's like 8k wait the keys are 16 K and this is a fairly big but totally doable on a computer today right hey kilobytes is no big deal okay now to verify take the signature hash each block of the signature and see that it maps into that part of the public key right so the people who are verifying the signature they have your public key they have

all the green squares and now they have been given a signature which is these gray squares and they say okay well let me hash this one oh it maps to that so it maps to a zero oh this maps to a 1 this maps to a 1 its maps to a 0 and they can go through and say yeah this is a message this is a signature on that message right in the case of limp or signatures you can actually determine what the message is just from the signature and the public key right since you're

given these you know if you're given this and you're not told whether it's a 1 or a 0 well just compare you know hash it and compared to these two green ones you'll be able to see and that's a useful signature okay because no one can forge that because no one knows these pretty images except for the person who holds the secret key so given your public heat I can't forge a signature from you once the signature is issued I also can't forge a signature right the only bit sequence I know is the one that

you revealed and so I know part of your private key I know half of it right but the aisle I only that half only lets me sign the message you just signed so I can't really do anything extra with this so this is a usable signature scheme any-any I think I just showed it but any downsides that you can think of with this is that what your signatures are sort of public so yes you're saying that um you can sign a message once and give it to a bunch of people and that's sort of a

feature not a bug I guess there are different signature schemes where you want like I only want this signature to be valid to this person there's different ways to do that with like diffie-hellman key exchange and stuff but generally the the signature scheme we've we've talked about here with these three functions the public he's really public and anyone can verify and that's that's something we want if you don't want that there's other ways to do it but yeah the big one is wait you can only sign once as suit once you generate a key pair

your private key your your public key and you tell everyone these green squares if you try to sign again if you'll reveal more pieces of your private key right so if I sign two different messages sometimes it it's the same bit sometimes it's different bits and now I start revealing more pieces of my private key and now people can start to forge signatures because I can say okay well the first bit I can I can sign anything right on the first bit I'm still constrained here and here and here but in several locations I can

I can you know I can sign whichever bit I want and so the the basic thing is if there's one signature I can't Forge anything if you give me two signatures since it's generally you know random on average half of the bits of the signature will be constrained so in this case if it's 256 bits long and you signed twice I probably still can't Forge anything because 128 bits I have the freedom to pick either and on the other 128 bits I'm stuck with either a won't you know either the one or the zero and

I don't get to choose so mote that means most messages I want to sign I won't be able to write because it's gonna like if I try 102 to the 128 attempts I'll be able to find a forged signature but that's a lot and so maybe you can sign twice but again it's probabilistic you might get unlucky and reveal quite a bit more than one you know 128 bits where you get both but but on average and then once you have three signatures okay now I've probably revealed three-fourths of the locations you're gonna have both

the one and zero row and you can start and that this starts to be practical because this in this case it would be you needed two to the 64 attempts to for this to forge a signature and that's that's doable on today's computers you