In this section we'll study TCP congestion control, and as we'll see, TCP uses many of the insights, and also the basic mechanisms, that we studied in the previous section in order to implement congestion control. We'll start off by looking at what we might call classic TCP, where the TCP senders gradually ramp up their rate until loss occurs and then back off. We'll also look at a delay-based approach to congestion control, where RTTs are measured, and a network-assisted approach towards congestion control called explicit congestion notification, where the IP routers play an active role in congestion control. And then finally we'll wrap up with a look at TCP fairness. So let's get started.

Let's begin by recalling our discussion in the previous section of congestion collapse. This was that very non-intuitive behavior where, as the TCP senders ramp up their aggregate rate, the aggregate end-to-end throughput actually goes down. Back in the 1980s, before TCP had a congestion control mechanism, congestion collapse was actually beginning to be observed within the network, and so in 1988 a networking researcher, Van Jacobson, published a seminal paper on congestion avoidance and control. That paper laid out the foundations of the approach towards congestion control that we're going to study. It's just absolutely amazing that we've had the same mechanism in place now for 30 years, even as the amount of traffic carried by the Internet has gone up by more than a factor of a billion. Amazing!

TCP takes an end-to-end approach towards congestion control, with congestion being detected by loss events rather than, say, being signaled by a congested network router. The idea behind TCP's congestion control algorithm is really simple: when a network path is not congested, that is, the TCP connection experiences no loss, the TCP sender can increase its sending rate (assuming it's got data to send at a higher rate), but when segment loss occurs it must decrease its sending rate. This increase/decrease approach gives rise to a kind of sawtooth behavior, as shown here, which plots the TCP sending rate on the y-axis as a function of time: the sending rate increases until a loss event, then decreases, then increases again over time, then decreases at the next loss event, and so on. The particular increase/decrease algorithm that TCP uses is known as additive increase, multiplicative decrease, or AIMD. When increasing, the sender increases its rate by one segment per RTT, and when there's loss it cuts its sending rate in half. That's pretty simple.

How can we think about this sawtooth behavior? Here's one way that I really like to think about it: a TCP sender is basically trying to find the fastest rate at which it can transmit without incurring loss, so it increases, increases, increases until loss occurs, then decreases by a factor of two, and then it begins increasing again. If you watch kids, you know this is exactly how kids operate when they want something: they figure out some minimal amount they can get, and then they say "how about some more? how about some more? how about some more?" until finally a parent says "hey, that's it, if you don't like what you have then you can't have any," and the kids back off. But then after some time they start asking for more again, trying to see what the new limit is.
It's another great example of how many Internet protocols really do have their human analogies.

There's a bit of detail here, having to do with cutting the TCP window in half, that we should deal with now. This window halving happens only when loss is detected by a triple duplicate ACK at the TCP sender, that is, when three duplicate cumulative acknowledgments in a row arrive for the same segment of data. As we discussed earlier in Section 3.5, that means there's a missing segment, but three subsequent segments after that missing segment have been received, with three duplicate cumulative acknowledgments generated by the receiver. If there's a timeout, TCP is going to reset its window size all the way down to one segment.

Now, you might ask, why additive increase and not, say, multiplicative increase? Well, the concern with a faster increase really has to do with the issue of stability. In fact, it's actually been formally shown, though much after the 1988 introduction of TCP congestion control, that TCP's congestion control algorithm serves as a distributed asynchronous optimization algorithm, with different senders individually setting their sending rates based on the loss feedback signals they individually receive, and this distributed asynchronous algorithm actually optimizes flow rates and has some nice stability properties as well. So there's much to recommend AIMD, and it's really an amazing tribute to the tremendously deep insights behind the design of the AIMD congestion control algorithm.
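To make these rules concrete, here's a minimal sketch in Python (a toy model with hypothetical names, not a real TCP implementation) of how a sender might adjust its congestion window on the three events we've just described: an RTT with no loss, a triple duplicate ACK, and a timeout.

```python
MSS = 1460  # maximum segment size in bytes (a typical value)

class AimdSender:
    """Toy model of TCP's AIMD window adjustment -- illustration only."""

    def __init__(self):
        self.cwnd = 1 * MSS  # congestion window, in bytes

    def on_rtt_without_loss(self):
        # Additive increase: grow by one segment per RTT.
        self.cwnd += MSS

    def on_triple_duplicate_ack(self):
        # Multiplicative decrease: cut the window (and thus the rate) in half.
        self.cwnd = max(self.cwnd // 2, MSS)

    def on_timeout(self):
        # A timeout is treated more drastically: back down to one segment.
        self.cwnd = 1 * MSS
```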
The implementation of TCP's congestion control algorithm is really pretty simple. Recall our earlier discussion of the sender's window over the sequence number space, back when we were talking about Go-Back-N and Selective Repeat. For TCP, the sender's window looks something like this: there are segments that have been acknowledged, shown in green here; there are segments that have been sent but not yet acknowledged, shown in yellow here; then there are sequence numbers for segments that could be sent, that is, the congestion control algorithm would allow such segments to be sent if there were data to send; and then there are sequence numbers for segments that can't be sent, because sending them, if there were data to send, would mean that TCP is sending too fast. These blue and yellow sequence numbers are in what's called the TCP congestion window. So TCP sets its transmission rate by regulating the size of the congestion window, and the way in which the congestion window is regulated is precisely the AIMD algorithm we just discussed.

Now, there's a direct relationship between the congestion window size and the TCP sender's sending rate, shown here: if the round-trip time is RTT and there are cwnd bytes of data in flight, and let's say that all these in-flight segments are acknowledged, then TCP is delivering cwnd divided by RTT bytes of data per second.
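Just to put some illustrative numbers on that relationship: if cwnd is 100,000 bytes and the RTT is 100 milliseconds, the sender is delivering roughly 100,000 / 0.1 = 1,000,000 bytes per second, or about 8 megabits per second; doubling cwnd, or halving the RTT, would double that rate.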
What we've discussed so far is the AIMD behavior of an ongoing TCP connection; we should also say a few words about how a TCP connection's throughput ramps up. TCP's algorithm to ramp up from an initial sending rate of 1 MSS per RTT is known as slow start. This initial phase of ramping up is actually a multiplicative increase, doubling TCP's sending rate every RTT, as shown here. So TCP's initial sending rate, which starts at 1 MSS per RTT, leads to an exponentially fast increase in the sending rate. Don't let the word "slow" fool you; an exponential increase is anything but slow.

Well, we've learned about AIMD and we've learned about slow start; let's now see how we can join those two pieces together. How does TCP actually switch from the initial slow start phase, which again really isn't that slow, to the more careful additive increase phase? Well, it switches from slow start to AIMD when the value of cwnd reaches half of the value it had when it last cut its sending rate down to one. And so we see at the red loss event here, at time t = 8, how a variable known as ssthresh, which tracks the transition point between slow start and congestion avoidance, is reset to be half of the congestion window size when that loss event was detected.
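Putting slow start, ssthresh, and the additive-increase phase together, a per-RTT sketch in Python might look like the following (this is just an illustration of the transition; the full behavior, including how duplicate ACKs are counted, is in the FSM on the next slide):

```python
MSS = 1460              # bytes
cwnd = 1 * MSS          # slow start begins at one segment
ssthresh = 64 * 1024    # an arbitrary initial threshold, for illustration

def on_rtt_all_segments_acked():
    """Called once per RTT in this simplified model."""
    global cwnd
    if cwnd < ssthresh:
        cwnd *= 2       # slow start: double every RTT (exponential growth)
    else:
        cwnd += MSS     # congestion avoidance: additive increase

def on_timeout():
    global cwnd, ssthresh
    ssthresh = cwnd // 2  # remember half the window size at the loss event
    cwnd = 1 * MSS        # and start over from slow start
```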
Whoa! Well, don't get alarmed here; we're not going to walk through this slide, actually we're not going to walk through all of the events and actions shown here in this finite-state-machine-like specification. But the finite state machine does go into all of the nitty-gritty details of needing to count duplicate ACKs, setting the cwnd and ssthresh values, handling timeouts, and more. So just remember that this FSM description is here for your reference if you've got questions.

TCP's AIMD algorithm has been in place now for more than 30 years. There have been a number of modifications proposed over time, some implemented, some even deployed, but really the one modification that has seen widespread deployment is known as TCP CUBIC, and as we'll see, TCP CUBIC operates just like AIMD except for how it increases its congestion window size. The way to understand TCP CUBIC is to recall our earlier notion of probing for usable bandwidth, and here's the insight. Suppose that Wmax is the sending rate where a sender last experienced congestion loss, and the sending rate has now been halved. The question then is, how should that TCP sender again start probing, that is, increasing its sending rate? The original TCP congestion control algorithm of course says to increase it linearly; that's what AIMD is all about. CUBIC says, hey, the congestion state of the link probably hasn't changed that much since we just experienced loss, so let's ramp up a little more quickly to something close to the rate where loss occurred, but then be very careful about increasing the rate once we get close to that value. This little animation here shows TCP CUBIC ramping up more quickly and then more cautiously approaching Wmax, and if you compare the red curve, which is AIMD, with the blue dotted curve, which is TCP CUBIC, you can see that the throughput with TCP CUBIC is higher on average than with AIMD.

So specifically, TCP CUBIC operates as follows. It first chooses a future point in time at which it wants to have ramped its window size back up to the window size it had when it last experienced a loss event; that's Wmax. TCP CUBIC then increases the window size as a function of the cube of the difference between the current time and the time at which it will hit that sending rate where losses were last incurred. This means that TCP CUBIC increases its sending rate more quickly than AIMD when farther away from Wmax, but increases it more slowly when close to Wmax, as shown in the diagram here. Because the performance of CUBIC is generally felt to be better than that of AIMD, it's now the default congestion avoidance algorithm in the Linux implementation of TCP. Also, a recent measurement study shows that among the 5,000 most popular web servers, nearly 50 percent of them were running a version of TCP CUBIC.
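Here's a small Python sketch of that cubic growth rule (the constant C and the chosen values of Wmax and K below are illustrative assumptions; the actual CUBIC specification defines them precisely):

```python
def cubic_window(t, w_max, k, c=0.4):
    """
    CUBIC-style window growth, illustration only.
      t     -- time (seconds) since the window was last reduced
      w_max -- window size (in segments, say) when loss last occurred
      k     -- future time at which the window is scheduled to reach w_max
      c     -- scaling constant (0.4 is a commonly cited value)
    Growth is fast while t is far from k and flattens as t approaches k,
    giving the cautious approach to w_max described above.
    """
    return c * (t - k) ** 3 + w_max

# The shape of the curve, assuming w_max = 100 segments and k = 5 seconds:
for t in range(0, 8):
    print(t, round(cubic_window(t, w_max=100, k=5), 1))
```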
Well, we've seen how classic TCP congestion control algorithms, including CUBIC, take a loss-based, end-to-end approach towards congestion control. There have been two other flavors of TCP that have actually been fairly widely deployed and that take different approaches: the first takes a delay-based approach, and the second takes a network-assisted approach, where the TCP sender receives explicit feedback involving both the TCP receiver and a congested IP router. Let's take a look at how those work.

Let's begin by taking a step back and looking again at the big picture. We've seen how AIMD and CUBIC increase the TCP sending rate, assuming there's data to send, until packet loss occurs. Let's look at where this loss occurs: at a congested router somewhere on the source-to-destination path. We'll refer to this congested link as the bottleneck link. A flow's packets are generally passing through other routers and links just fine, but it's here at the bottleneck link where they get lost, and remember, there are actually multiple flows experiencing loss at this bottleneck link. So we want to abstract away the other routers, they're not the problem, and focus on this bottleneck link.

Now, AIMD would have the senders fill up the buffers at this bottleneck link until loss occurs, but let's think about what actually happens when this bottleneck link is congested, that is, when it's pretty much always busy sending. In this case, if the link's already almost always busy, having senders increase their sending rate actually does no good: the router can only deliver R bits per second on the congested output link, and if the TCP senders send faster, it can still deliver no more than R bits per second. So what we'd like to do is keep the router busy, but not have the TCP senders send so fast as to overflow the router's buffers.
In other words, we'd like to keep the end-to-end pipe just full, but no fuller; in particular, not so full as to cause congestion loss. That's the insight behind a delay-based approach towards congestion control, and here's how it works. It's based on using RTT measurements, and remember, that's a topic we covered earlier in Section 3.5. Suppose the sender has a currently measured RTT value, RTTmeasured. The sender also knows how many bytes of data it sent during this last RTT, and so the sender also knows its measured throughput, which is simply the number of bytes sent in that last interval divided by the measured interval length, RTTmeasured. The sender also knows RTTmin, the minimum of all the RTT delays it has measured, and we'll take this value as the uncongested RTT value. Now, in the uncongested state, when there are cwnd bytes in flight, the throughput the sender should see is simply cwnd divided by RTTmin; again, that's the throughput with no congestion.

So the idea behind a delay-based approach to congestion control is conceptually simple. If the measured throughput, that is, cwnd divided by RTTmeasured, is close to the uncongested throughput, then the path isn't congested and the sender can send more, i.e., increase cwnd. On the other hand, if the measured throughput is below that ideal case, that means there's a larger measured RTT, and delays at the congested router; the path is congested, and so the TCP sender should back off, that is, decrease cwnd.
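A minimal Python sketch of that decision rule might look like this (the threshold and the increase/decrease steps are made-up illustrative choices, not those of any particular deployed protocol):

```python
def adjust_cwnd(cwnd, mss, rtt_measured, rtt_min, threshold=0.9):
    """
    Delay-based window adjustment, illustration only.
      cwnd / rtt_min      -- throughput we'd expect on an uncongested path
      cwnd / rtt_measured -- throughput we're actually achieving
    """
    uncongested = cwnd / rtt_min
    measured = cwnd / rtt_measured
    if measured >= threshold * uncongested:
        return cwnd + mss            # path looks uncongested: send a bit faster
    return max(cwnd - mss, mss)      # queueing delay is building: back off
```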
Well, there have been a couple of delay-based protocols proposed and deployed that differ a bit in how they increase and decrease the window size, but they operate using the same basic principle we've described here. So we see that delay-based congestion control seeks to control congestion without loading up the congested router so much as to induce loss; the idea is to keep the pipe full, but just full enough to keep the congested router always busy, and no fuller. A fairly recent version of TCP called BBR, which stands for Bottleneck Bandwidth and RTT, is used by Google for TCP traffic on its internal network that interconnects its data centers, where it replaced CUBIC. It's also being deployed on Google's and YouTube's web servers.
Explicit congestion notification, ECN, is a form of network-assisted congestion control that involves TCP senders and receivers as well as the network routers. It's an approach that avoids congestion without having to drive a router into overload, i.e., a loss scenario, as shown in the figure below. In ECN, a congested network router will set a bit in an IP datagram header (which we'll take a look at in Chapter 4) to indicate that the router's congested. The TCP receiver then informs the TCP sender of this congestion indication by setting an ECE bit in the header of a receiver-to-sender TCP acknowledgment segment. The TCP sender in turn reacts to this congestion by halving its congestion window, just as it would have reacted to a lost segment detected by fast retransmit. It's worth noting here that what counts as congestion isn't actually defined in an RFC; it's configurable by the network operator. There's a standardized mechanism, the ECN bits in the IP header and the ECE bit in the TCP header, but not a standardized policy for how to set them.
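As a rough sketch of the sender-side reaction (the class and field names here are purely illustrative stand-ins for the real header bits):

```python
from dataclasses import dataclass

MSS = 1460  # bytes

@dataclass
class Ack:
    ece: bool                 # stands in for the ECE bit echoed by the receiver

@dataclass
class SenderState:
    cwnd: int                 # congestion window, in bytes
    reduced_this_window: bool = False

def on_ack(ack: Ack, state: SenderState) -> None:
    """Illustrative ECN reaction at a TCP sender -- not a real stack."""
    if ack.ece and not state.reduced_this_window:
        state.cwnd = max(state.cwnd // 2, MSS)  # react as if a segment were lost
        state.reduced_this_window = True        # reduce at most once per window of data
```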
Well, let's wrap up our study of congestion control by considering the question of fairness. What do we mean by fairness in the first place? A reasonable desideratum is given here: if K TCP sessions share the same bottleneck link that has a transmission rate R, then each of the TCP flows should get an average rate of R/K bits per second of throughput, assuming they can use that much. Do you think the TCPs we've studied so far will result in fair sharing of the bottleneck link? Well, let's see.

There's a really nice graphical way to look at this question of fairness for the two-flow case. On this two-dimensional plot we're going to look at the throughput achieved by two TCP flows sharing a bottleneck link. If TCP shares the link bandwidth equally between the two flows, then the realized throughputs should fall somewhere along this 45-degree line; and on this blue line here, the sum of the two throughputs would equal R. Let's suppose the connections start at this red dotted point, where connection 1 is getting more throughput than connection 2. The question is: will TCP's AIMD algorithm drive the achieved throughputs to this fair line here, and to this point in particular, where the bandwidth R is fairly shared?

Well, let's see. Because the aggregate throughput of the two connections is less than R, no loss is going to occur at this red dotted point, and both connections will increase their window by one MSS per RTT as a result of TCP's AIMD algorithm. So the throughput of the two connections proceeds along this 45-degree line, since there's an equal increase of one maximum segment size per RTT for both connections. Eventually the aggregate throughput of the two connections will be greater than R, and so packet loss will occur. Let's assume both flows 1 and 2 detect loss and thus decrease their windows by a factor of 2, ending up here. The two flows then again increase their throughputs along a 45-degree line; again, eventually loss will occur; the two connections again decrease their window sizes by a factor of two; and so on. If we play this out, the throughput of the two connections eventually fluctuates along the equal fair-share line. You should convince yourself, actually, that the two connections will converge to this behavior regardless of where they start in this two-dimensional space. That's pretty amazing.
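In fact, you can watch that convergence happen with a few lines of simulation; here's a toy Python model (with arbitrary numbers, and the simplifying assumption that both flows see loss whenever the sum of their rates exceeds R):

```python
R = 100.0              # bottleneck link rate, arbitrary units
x1, x2 = 70.0, 10.0    # deliberately unequal starting throughputs
step = 1.0             # additive increase per "RTT"

for _ in range(200):
    x1 += step
    x2 += step
    if x1 + x2 > R:    # bottleneck overloaded: both flows detect loss ...
        x1 /= 2        # ... and both halve their rates
        x2 /= 2

# After a while the two rates are nearly equal, whatever the starting point.
print(round(x1, 1), round(x2, 1))
```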
Let's wrap up our discussion of fairness by taking a bigger-picture view. Remember that TCP bundles together reliability, flow control, and congestion control. If someone wanted to build an application on top of UDP that had reliability and flow control, they could certainly do that, and that application would compete for bandwidth unfairly with TCP connections. When the TCP connections experienced loss, they'd cut their sending rates, as we've seen; the UDP-based application, on the other hand, with reliability but no congestion control, would not have to cut its sending rate, and so would continue to send at a higher rate, getting better performance and perhaps starving out the TCP connections. Interestingly, there is no Internet police to, well, police the fair sharing of bandwidth at a congested link, and yet we don't see any kind of widespread cheating, if you will. Perhaps that's because application designers that need reliability just build their applications on top of TCP. Maybe you also remember that when you studied HTTP, you learned that many web browsers open up multiple parallel connections between a client and the web server, allowing the web application to get more throughput than if it just opened one connection. Is that fair? Well, you decide. And in any case, even if it is unfair, there's still no Internet police to call if you see it happening.

Well, in this section we've covered TCP congestion control. We started by looking at classic TCP congestion control, which uses losses to both indicate and then avoid congestion. We also looked at delay-based approaches that use RTT measurements, as well as a network-assisted approach to congestion control called explicit congestion notification. Then we wrapped up by looking at the issue of fairness among TCP senders. Now, I think it's really hard to overestimate the importance of TCP to the success of the Internet. You've already seen that there are tons of Internet protocols, and yet when we talk about the Internet protocol stack, we sometimes just call it the TCP/IP protocol stack. I think that reflects the very central role that TCP has played in the success of the Internet, in particular the importance of TCP's congestion control algorithms.