Distributed Systems 2.1: The two generals problem

56.81k views2015 WordsCopy TextShare

Martin Kleppmann

Accompanying lecture notes: https://www.cl.cam.ac.uk/teaching/2122/ConcDisSys/dist-sys-notes.pdf Ful...

Video Transcript:

hello everybody welcome back in the last lecture we introduced distributed systems by looking at some concrete examples such as the web and rpc which are examples of client server systems in this lecture we're going to move things to a little bit more abstract and a little bit more general model and we're going to talk about system models for distributed systems which are uh descriptions of the assumptions that we make when we're designing an algorithm to run in a distributed systems so a system model is very important because as we discussed things can go wrong in

the distributed systems nodes can crash networks can fail and so on and we have to be precise about what failures we are assuming are possible and what failures we are assuming are not possible so we're going to start this by looking at two classic thought experiments from distributed systems the two generals problem and the byzantine generals problem and we're going to start now with the two generals problem so uh this is it's kind of like a military analogy so i'm sorry about that i'm not really a big fan of that but it's widely known as

the two generals problems so i'm just going to stick with the convention here and so the setting of this thought experiment is that we have two armies each army is controlled by one general and these armies are wanting to attack and capture a city now the city is well defended and so if only one of the armies attacks at one time that army will get defeated and so it's very important that if the two generals are going to attack they attack at the same time because then if both the armies attack at the same time

they know that they are going to win so you can see from this truth table here what happens is okay it's it's all right if neither of the two armies attacks but if only one of the two armies attacks then it it's all going to go terribly wrong so what we really want is that one army attacks if and only if the other army attacks so both are attack together now what makes this difficult is that the two generals can't just talk to each other and agree on their plan of when to attack but they

can only communicate via messengers and so these messengers there are people who run with the messengers through the forest and as they run through the forest they might get captured by by forces of the city and so whenever an ah whenever one of the general sends a messenger to the other general that message may or may not get through and there's no way for the sender of the message to know whether the message got through except by receiving a response and so the problem here is now this so imagine for example general one has decided

to attack on the 10th of november and so he sends a message to general two uh saying we're going to attack on the 10th november are you okay with that and general 2 receives the message and says yes i'm on board we're going to attack together on the 10th of november sends that response back but unfortunately the response message is captured so the initial message the request get through but the response is lost and so now general one does not receive a response now this uh this is one scenario of what could happen here's another

scenario of what could happen it could also happen that general one sends the message attack on the 10th of november to general two but that initial message is lost and never gets through to general two and now general two doesn't receive any message so he's not going to respond either and the end result is also the general one does not receive a response so in both of these two cases the only thing that general one observes is no response but general one does not know whether there's no response because the initial message didn't get through

or whether the response was lost but there's a big difference between the two of these because from general two's point of view in the first case general two has agreed to attack in the second case general two doesn't even know about the attack so they look the same from general two's point of view but they look very different from general sorry they look the same from general one's point of view but they look very different from general two's point of view so let's try to design an algorithm which will nevertheless get the two generals into

agreement and so let's think about it first of all from the point of view of general one general one basically has two choices either general one is going to always attack no matter whether any response is received or general one is going to wait and only attack if it receives a response from general two so let's start with the first case so uh general one always attacks even if no response is received so in this case uh general one wants to make sure that general two is also going to attack because also because otherwise general

one is going to be in a problem problematic situation so we could say that general one is going to send lots and lots and lots of messengers over to general two all saying attack at this time attack attack and if one of those gets through then things are probably okay because general two knows that general one is always going to attack so general two knows that it's safe for uh general two to also go into battle uh even without responding to general one because after all general one has promised that um that that general one

is always going to attack however it could happen that all of the messengers are lost and so in this case general two does not know about the attack so general one ends up going into battle alone and loses so that means this first option of general one always attacking is not really great so let's consider the alternative which is that general one does not promise to always attack but general one will only attack if it receives a positive response from general two and in this case general one is safe because general one knows that it

will only go into battle if general two uh is going into battle but now if you think about it from general two's point of view now general two uh knows that the um that general one will only go into attack if the response from general two to general one gets through because after all general one is waiting for that response from number two and so now the general two is in exactly the same situation as general one was in the first option and that is either general two must commit to always attack in which case

he risks being alone in the battle or general two will wait for a response from general one but now you know general one has to reply and so you end up with these potentially infinite chains of yes yes i'm going to attack okay i'm going to attack if you attack yes okay i will attack but only if you also attack yes yes already said i'm going to attack and so on and so they have to send each other back and forth these messages you get actually an infinite chain before there's any certainty that they're actually

both going to attack together and so what is this is called in distributed systems the problem of having no common knowledge so there's no knowledge in the system that one node knows and the other node knows that the first node knows it and the sec the first node knows that the second node knows that the first node knows it and so on so you can construct these arbitrary chains and the end result is just that no matter how many finite sequences of messages we send back and forth we never actually have absolute certainty that general

one is going to attack if and only if general two is going to attack so you can build up gradually increasing uh like probabilistic certainty maybe depending on your assumptions of whether messengers get captured or not but it's actually impossible to reach complete certainty here now let's take this abstract thought experiment and apply it to a concrete example so we had in the last lecture this example of an online shop making an rpc request to a payment service in order to charge a credit card and really what we want here is that the online shop

dispatches the goods if and only if the customer pays for the goods because you can imagine you know if the online shop dispatches the goods but the payment service does not charge the credit card then the shop is unhappy because the shop has just given away some goods for free if the uh if the online shop does not dispatch the goods but the payment service does charge the credit card then the customer is unhappy because the customer got charged without receiving any goods so really what we want is something that looks extremely similar to the

two generals problem here that the online shop dispatches to goods if and only if the payment service charges the card and as you can imagine the rpc between the online shop and the payment service looks very much like the messengers running through the forest in the two generals problem which is to say that the messages might get lost either one or the other might get lost and so it is actually not possible for the online shop and the payment service to achieve the certainty that one action will happen if and only if the other other

action happens now in practice actually of course online shops do work but the reason they work is because there are a bunch of second level safeguards then which uh ensure a reasonable outcome so for example if it turns out that the paint the card got charged but the online shop doesn't actually have the goods in stock anymore then the online shop will just send an apology email saying oh sorry actually we're out of stock we've refunded your card and so that way it's fine and so this it's possible to get out of this situation because

the the charge is actually a revocable action it's possible to refund the charge and therefore it's back it's possible to get back into into a safe state where neither the goods dispatch nor the payment has effectively happened or another option is you know that the the payment service may or may not have charged a card and so the online shop then when the network is is repaired and the messenger's messages can get through again then the online shop checks with the payment service saying now did you actually charge that card or not because i never

heard back from you whether you charge it or not and so what will probably happen is that the payment service will always go ahead and charge the card um because even if uh it's not certain that the online shop is going to dispatch the goods because uh in this case it's fine because the payment could get refunded if necessary so that is the way in which this online shopping problem is not actually exactly the same as the two generals problem but nevertheless the two generals problem does illustrate this issue of uh uncertainty that we have

in a distributed system when we're not sure if the messages got through or not