Microservices are Technical Debt

NeetCodeIO
🚀 https://neetcode.io/newsletter - System design newsletter 🧑‍💼 LinkedIn: https://www.linkedin....
Video Transcript:
um I think microservices are technical debt I think everyone to a certain extent makes a distributed monolith I'm one of these people who thinks that Go is a good programming language a week ago I made a video talking about a blog post that DoorDash posted where they discussed transitioning from a monolith over to a microservices architecture so they had a Python monolith and they split it into microservices it's better for a team to own a microservice because it can be deployed independently and one of the original authors of that post who's an engineering leader
over at DoorDash his name is Matt he actually saw that video and so I got to sit down and actually discuss microservices and he had actually quite a lot of interesting hot takes and I definitely learned a lot just kind of hearing his experience over like the last couple decades with microservices and I think you'll learn a lot from it too it's just a Google Meet like the footage itself is just me and him in a Google Meet but I don't think you guys care about that I know you guys probably care more about
the substantive stuff in the video so I think it's definitely worth your time um you know I'm just one person I don't speak for the for the company but I can tell you how the company is thinking about these things and it has changed since we wrote that blog post I think there are some topics about microservices that are that are really interesting that if you're if you're curious about this sort of thing like this is this is like there's some tricky bits in here that that I think people should talk more about so what
I think the first thing you should start with is really why why would you make a microservice but like so if I just ask you in your own opinion regardless of what I'm doing at DoorDash why do you think someone should ever make a microservice yeah I think that's a good question so it kind of takes me back to just like what exactly is scaling and so when I was kind of reading the DoorDash post I initially thought well if they got this far with like a pretty
large monolith which of course like I've never gotten to the point where I'm scaling a monolith to a very high degree uh nor have I gotten to like that cusp where it's like okay well this is the line where if you jump over that that's the point where microservices are worth it and if if you go to the other end of that line maybe you just want to stay away from them and I guess maybe this is what you're getting at where people think that line is is maybe changing over time and I think for
many companies I'm sure it is worth it my short experience at Google made me realize that well if they have all this tooling already set up and they have like a large team of dedicated SREs that are handling all of this it's probably worth it at that point to you know stick with microservices there's really no need to simplify like the architecture cuz we really didn't have to worry about those kind of issues like those latency things all that was abstracted away from us and from that perspective I think it's Google has very mature
tooling you know they've been they've been at this for longer than basically anybody and so you would sort of expect that their stuff works really well like that's I guess that shouldn't be too surprising but also you probably can't take what happens at Google and infer too much about how to like make that be advice for anyone else um but so I'm gonna I I really really wanted to stick to that to the like to the why though because I think it's super interesting so like why would you take your monolithic application doesn't really
matter what language it's written in where there's a single deployable artifact like one big binary one big ball of Python code like one you know whatever you know language you write it in but it's like one thing inside of it the modules talk to each other by making function calls and when you want to change it you just deploy the whole thing right the monolith right so like what would make you ever want to do a microservice or I hate the word microservice honestly just what would ever make you want to introduce the idea of
making gRPCs or RPCs network calls instead of function calls and I I mean I so I think that the answer usually works out like this you're cruising along in your monolith and everything's fine and then your company you know starts to get bigger you know your development team starts to get bigger your your maybe your user base gets bigger but like the the the scaling of the traffic is not I don't think an issue I think it's often cited as a reason to do services I don't I don't think it should be um if it
is that there's probably something very specific about that environment but I there are very high-scale monoliths out there I think it's that's definitely like possible but anyway um what probably starts to happen is you have a lot of people all working on the same thing and then they start stepping on each other not while writing the code but while rolling it out so you got to roll it out you cannot just flip the switch and in this day and age you never turn your software off so like traditional software engineering texts you know they're all
like maintenance windows or whatever you know what I mean it's like not of this era right so our software you can never turn off and in a world where you can never turn it off and you've got like maybe Global reach and Global audience you're going to be making changes slowly so like developer writes code it rolls out slowly for a day or so maybe some hours but not seconds you know what I mean ain't no seconds right well the problem now is if you what if you if you have one developer or five developers
is probably okay it takes like four hours to do a deploy but if you have 100 developers now boy they're probably gonna like start stepping on each other where like oh there's already a deploy underway so then you go okay we'll do a release train everybody get on the train all the changes that want to go out they all go out together but now what happens if one of them was bad if you ship a bug you got to roll everybody's change back and I think this is the reason to do services is that you have
a large engineering team and people are stepping on each other because you're changing it a lot I I think you could you can imagine other examples of things like why it would be worth talking over the network like maybe specialized hardware like you're doing GPU inference you know ML inference or you've got some other large in-memory data structure like a map you know you're computing ETAs right and so you've got a bunch of road vectors in an in-memory you know graph maybe you'll need a handful of those right
and then it would be silly to embed that in the monolith like that big in-memory data structure yeah could be but like in terms of the microservices where everyone's like I'm gonna move independently like I I think that's what forces you into that would you would you agree with that yeah I think I would and of course like you have much more experience than me in this regard but I think that's definitely the case at least in my experience so you mentioned like why would you want to make RPC calls from microservices that
kind of reminds me that it's not exclusive from the perspective of like back-end development for example um there was an application Google Cloud itself the front end for that is just one really big single page application and it started out as that and so there there's not RPCs being invoked like from those pages to each other they kind of actually realized the exact same thing this single page application when there's dozens of services on Google Cloud there's thousands of developers working on it and they ran into the same problem if you're going to be deploying
it one team is going to break the other team and you can't really do anything about that you're going to have to wait days weeks and so they did something very interesting and I'm not sure exactly how they did it but they actually made that single page application deployable in independent units and it was all a matter of like developer productivity it had nothing to do with the scale I mean it's a single page application it's going to be served the same way regardless and it's going to hit the same backend endpoints yeah I
think that's exactly right that in my limited experience I think I would agree with that as well yeah yeah Okay cool so so I I think that is actually like that's like it you just pointed out as an interesting way of saying well it that's not necessarily doing services but they found a way to modularize it to break it up so people aren't stepping on each other and you know this this kind of there are different approaches depending on what it is the thing that you're deploying but but but somehow the issue of if you
got too many people contributing to the same thing and you have to deploy it slowly that's what makes you want to split it out into services okay cool I can tell you that DoorDash did not have the luxury of this kind of careful analysis about thinking about like oh how should we do our thing uh so this is before I joined but they had the pandemic and all of a sudden everyone was like well I can't go out um we have to order food and so all of a sudden DoorDash had way more traffic
than they had before and more than that you know people needed bugs fixed and new restaurants wanted to onboard and just like all the stuff right so the the development velocity needed to increase dramatically and so they hired a bunch of people and I eventually was one of them and um but we were just stepping on each other like crazy you know like it was not it was not good and the microservices like kind of effort at DoorDash started with some people just being very practical going like well I've never done microservices before but
I am who works here so like we gotta I'll just do the best I could do right and they started you know figuring it out and you know kind of kept learning like better ways to do it better ways to do it we learned a ton and I would say like some surprising things that I think that that I learned about this is like I actually think I think I've been fond of saying this to my colleagues who all kind of groan when I say it now but um I think
microservices are technical debt and I say that intentionally because in in that I do think that they help you move faster at first I think if you're you're you got a bunch of code running one place and somebody says you know what why don't you just I'm gonna write some code over here you just call me with a thing for a time that does help that one person that one team move faster like they didn't have to get involved in this whole teams release process and their development whatever they're just like I wrote my code
and I got it working now just call me over RPC like it actually did make it faster for that team for that one thing but the problem is now that this thing is running it's part of the call graph it has to be up right so I know a lot of a lot of microservices theory talks about like loose coupling and oh what about fallbacks and does everything have to be up and if everything has to be up then you've made a distributed monolith and that may be true and it's kind of like not
a useful thing to say I think because like I think everyone to a certain extent makes a distributed monolith unless you really aggressively do fault injection like really aggressively I think you have a distributed most people have a distributed monolith whether they realize it or not because they probably don't actually know what will happen if their things slowly grind down to a you know a much lower performance and everything stacks up behind them or you know just all kinds of different failure modes but okay it's a nice ideal none of the companies that I've worked
for have ever made anything other than a distributed monolith either like all the things have to be up they just have to be up and so you could say well you shouldn't do microservices and maybe maybe you shouldn't but they are technical debt they helped people move faster for a time if there's one if there's one thing that I think that that I wish people would change their thinking around it's that idea because it's not like it's black or white it's not like oh microservices are bad like you've got something in exchange for for your
debt right like you borrowed against someone else's productivity in the future and to be clear the productivity that you're borrowing against is someone else later needs to come to you and say oh we have a feature that needs a change to services A B and C and first we got to roll out a change to C we got to get that thing rolled out now we got to roll out a change to B and now we got to roll out a change to A now we got to make sure that the whole thing works and
none of those changes got rolled back and then later if you're lucky we can go back and say hey A you can clean up that code and do a release just to clean up the code C you can do it B you can do it but what we find is there's very little incentive to clean up that old code and it just has a way of like making certain kinds of changes not only expensive because you had to do like six deployments there to make one change you had to do six deployments
it's crazy but also it tends sociologically like it my colleague Chris Meiklejohn uh has this phrase where he calls microservices a sociotechnical problem and I think that's a really good way of putting it it's like it's not purely if it was purely technical you would just say like well do a good design do a different design but like there are people's motivations someone wants to get you know this product shipped right away and they don't care what happens next year they just want to ship it right now kind
of what you're saying is that it's like a people problem it's like maybe a management problem or just like an organizational problem um I would definitely agree with that in my experience we had like a microservice that belonged to a different team originally and then so we kind of inherited it at one point and then you kind of mentioned like as the number of microservices grow it's kind of technical debt like there could be consolidation like in hindsight you look at it and it's like well these didn't need to be separate you could consolidate them
and we even got to a point once where we had one service using a data store and then we thought well rather than creating our own instance of that same data store for the second microservice we could just reuse that same one and so if they're reusing the same data store well then that's kind of feeling like a monolith but you know in that case we were choosing to do the monolith to go faster actually we're saving time by doing that and then it kind of creates the complicated issue where it's like you don't
really know what it is anymore and you're just optimizing for time yeah it's kind of like a really hairy problem and I guess I don't know necessarily the solution to it if everybody had infinite time you'd kind of clean everything up but you don't for sure for sure for sure and you must realize surely at this point that the comment threads are going to explode with people saying that microservices should never share databases like can you believe that that like absolute just like sacrilege that you've committed there by having two Services share the same database
like how do you live with yourself I mean it must must be rough um yeah I mean dude we did the same thing I mean like the early DoorDash you know microservice deployments were just like migrations are hard and like you want to make us like literally copy the data out of this database into this other database like you know so um but anyway I I think the people it's I wouldn't say it's just a people problem I I would think if if it was we I I don't
think we would have so many Engineers having so many opinions and being being you know kind of wrestling with this challenge I think it is it's at the intersection of people you know people at work and you know people like getting paid to make software right um and the software itself and there is a technical component it's like the software resists being changed by too many people too quickly right like that's the technical problem it's like if you could find a way to safely let a thousand people change the same thing and just know that
they weren't breaking each other then you would probably let them do that right that's the technical problem but and it's but what we have is like kind of somewhere in between so you say like well if you can put your stuff on the other end of this RPC then maybe we can start to reason about okay did did your thing break or did my thing break et cetera even though in practice with a call graph that like propagates errors back everyone on the call graph ends up getting paged right it's super hard to say like oh the
reason I'm sending back errors is because two Services down started sending back errors you just end up paging everybody and we had the exact same problem where we had like latency charts like within our UI and if like the Upstream service like increased latency then we kind of get paged as well and it was never our fault like it was always a false alarm and then so you kind of get into the Habit where it's like the boy who cried wolf and it's like every time you check it there's nothing actually wrong so then you
get in the habit but maybe if there's an actual issue you're not going to double check it and that's not a really good position either yeah that's right that's right yeah I uh I I don't I don't know how much time both of us have but I sent you a couple of links here one is to uh Chris Meiklejohn's PhD thesis which he did on microservices um and based on research that he did at DoorDash um which is where he talks about the sort of sociotechnical problem um I also I also dropped
you a link to a talk that I gave in 2016 that talks about uh where where I where I suggest what might come after microservices but there's a there are a couple screen that was when I worked at Uber there are a couple screenshots in there of a tool that they built at Uber that is super cool and I've never seen anything else like it that does try to tell you in a RPC call graph whose fault is it so like if if you're sending back errors you're burning your SLO you're doing you know whatever
like it's your service is in trouble it's a a tool that will tell you whose fault it really is usually I think it's pretty cool interesting yeah it I mean I think it is possible to build tooling to make some of these microservices problems like less bad I think people would be upset if I didn't ask this question when it came to that migration was it worth it to kind of go into the microservices direction or is it more complicated than that in that it helped you initially but now maybe you're kind of paying that
tax or have you already paid that tax and now you know it's just smooth sailing from here what's kind of the situation right now no I mean it's that's a really good question um DoorDash had to do it you know they they just they couldn't they couldn't move any faster with their monolith architecture theoretically it was possible with enough time they could have rearchitected it so this is one separate monolith and the driver you know assignment is different just you could you could imagine dividing it up into some chunks right it could have could have done that
they actually actually did do some of that but but even so it was actually still the same code because there's all this like shared code and and the test Suite was taking forever to run and it was you know it needed it needed a major re architecture and they didn't have time you know they needed to add features fix bugs and do stuff there there was no time to re architect the monolith and like I said you can move faster at first by breaking stuff out um where are we now like I don't know we're
kind of still in this like there's no more monolith like the monolith is gone and you know we have hundreds of services 500 I think you need about like 100 to like place an order and get it delivered I think so I think that's about right you know what was it worth it like of course it was worth it because the the company was gonna just the the monolith was grinding to its death you know it was like overwhelmed and you couldn't fix it no one knew how to fix it in enough time so like it
was worth it because we you know DoorDash still exists right like here we are had you know had I joined the company sooner I might have tried to steer us into some like slightly different directions but like you know it it solved the problem like it it it made it so the the company you know could could keep growing but at the same time like we learned just a ton a ton a ton about like what are the hard parts about microservices and yeah you know stuff that we talked about in the blog post
but it's it's really it's all those sociotechnical problems you know it's like oh you want like we realize this problem like if everyone would just please upgrade their gRPC client um then we can fix this weird interaction that we have sometimes but there's very little incentive for people to want to like go upgrade their gRPC clients or pick you know pick your other library right so like now you've got 500 individual things that are running some version of some set of dependencies defined by somebody somewhere and then like you want to say like well
you we really want you to go in and change your code to this you know the API is different we need you to get down this new dependency like it'd be really great if you did you know like it's kind of hard to justify if I were to give people advice I would say use as as few as possible Right like are we stepping on each other okay fine find a like split it out then and then like that is I think very reasonable and not just oh by my domain driven design ideas the consumer
service shouldn't know about the something something piece of data therefore it should be a separate service because what you have I I'll tell you a fun stat shoot I should have looked this up before um okay I'm going to get this a little wrong but it's it's order of magnitude correct um the average fan-out when you make a request to the DoorDash like front end average fan-out is you make about a thousand RPCs and I think that might actually be too low I think it's more than a thousand but like
the reason is because of this domain principle people are like oh well I'm not the consumer profile service so I shouldn't know about consumer profiles so then like you go through this call graph and every step of the way someone's like oh I need the consumer profile so then they go get the consumer profile and you know you can have caches and stuff and that's fine but like if you you make everything have to go out over the network everything is really gonna end up going over the network like like to an absurd degree you
know where even simple things that you might think like oh that maybe could that just be a library and then someone will say yeah but where's the storage come from and then you have all these philosophical arguments about how what's the right way to do microservices and next thing you know you have 500 services and a thousand-way fan-out on average yeah that's really interesting and honestly I have one question for you and maybe this will be a little bit uh controversial but I think what you're saying kind of reminds me
of how many times like programmers have this kind of like philosophy they're kind of reading off like a Bible and like everything has to be done this way like you know there are some interesting things that people say like you know when it comes to let's say like unit testing or like object-oriented programming which you know some people have like their opinions on but obviously like the context matters like you could say that this is a principle and like it shall never be broken but if like the context of the problem is like dictating that
okay well the solution is just more simple to do it this way or maybe we just don't need all these unit tests or like the unit tests aren't actually helpful at all you can pretend like they are you can pretend like it's going to help you sleep better at night and of course I'm not against unit tests or anything like that I have an application where I haven't written a single unit test and it's a relatively small code base and I own it I know every line of code in it and so for that reason
I know 90% of the time when something's going to break if I make a change I'll manually test a few things and to be honest I I do think it saved me time now some people might say well that's that's dumb and maybe it is I don't know maybe I'll have a major outage in the future but at the end of the day it saved me I would say at least 50% of the development time yeah okay so yeah your your your overall question is are there just these ideas that programmers have that that they
stick to they're not exactly sure why and perhaps they stick to them to their detriment um and the example you cited is tests 100% um yeah an example of that is you know the thing I talked about like oh you have to have that as a separate service because that's how you do microservices um testing is a fascinating topic I too am not against writing tests they have saved me many times um but uh tests have a curious way of making some changes harder because they kind of like solidify
the way a thing works and you might look at doing something and you know that it would be right to kind of refactor something move it around you're like oh man I'm gonna have to rewrite all these tests and this is another sociotechnical problem but like in projects that have a lot of tests it tends to encourage uh people to work around things because they just want to keep the tests working and not necessarily get the best architecture um also test coverage I I
am deeply frustrated with this as like a metric of performance because all that really matters is your assertions like you can have 100% coverage and just say assert one equals one right at the end you've covered all your code good job but there's no way to measure assertion quality and I actually kind of wish there was because if there was a way I think I would be on board with you should probably have some amount of you know tests with quality assertions but lacking any way of measuring the quality of your assertions I think
it's kind of silly to chase to chase test coverage numbers but but we do it it's my personal opinion we of course say oh no no it's very important that you have test coverage and it tends to be that people also when they're in there making their test they tend to have good assertions you know but it's like it's a little misleading like seems like but anyway yeah people have that and it's really hard to um give people a framework for understanding when they should question these kind of dogmatic ideas I I actually don't know
all I know is I've been doing this for long enough that I just kind of have a sense of when I can get away with it and you know and then people kind of look at me and they're like yeah but isn't 80% better than 70% test coverage you know and we'll have those arguments but I don't know man yeah this is a huge problem I think is that like how do you know when this you know ideal is serving you or is actually hurting you it's that problem hm do you have any
other like last minute like hot takes where you think the industry is just wrong about something or just a general trend that you've seen that's just like counterintuitive or just not helpful at all how do you feel about object-oriented programming I guess or or any other kind of I don't know I mean I I think I'm I'm one of these people who thinks that Go is a good programming language because it has fewer features and makes you think about your programs differently um and it's somewhat somewhat controversial but I I actually think it is a
it's a wonderful uh language for a large team to collaborate around I think the YouTube audience is gonna agree with you on that YouTube definitely loves Go yeah yeah but I mean it's that definitely gets me a lot of uh sort of heated discussions in the you know sort of professional software engineering uh community who are like yes but why not you know Java or Kotlin and they're like fine fine fine you want some modern compiled language you should do Rust you know and all that's true um but
you know what here's my my final thing is I actually think that it's a shame that the industry has not produced uh an alternative we say uh oh monolith obviously to start you make a Django or whatever you know kind of thing you start your company with like that's cool and then when you break it out you do you do microservices and then like you read some like ThoughtWorks books or or whatever and you like you you get the the cool way of doing it and I I think it's it's bad I think
I think we should build frameworks that offer a different set of trade-offs like this this trade-off of it's either all in the same thing or it's in a million things and you have to like design it explicitly around these assumptions that it's all in one monolith or it's always making network calls all over the place I think it's terrible like this industry like should produce new abstractions new frameworks I don't I don't know exactly what they are we're I'm actually working on one at DoorDash that's not done yet and we'll see how well it
works but you know it was a huge battle to try to get it funded because everyone's like but the rest of the industry is doing microservices and I'm like I know but they're bad I think we should do better as an industry and you know so like Google has that thing called Service Weaver I think that's an interesting step in this direction it it doesn't fully address the issues of like even if you write one program you still might need to like interact with this whole RPC ecosystem um
but it's cool I think it's a really cool project but in general I want the industry to do better we should not have to pick between monolith and microservices there should there should be there should be something in between I guess okay last last question because I know people are going to want to know this so do you have any advice for other programmers that are listening to this maybe they're in college maybe they're working full-time and they kind of want to follow in your footsteps and get really good at application development backend development and
you know architecture design what should they do are there blog posts are there books they should read should they come join you over at DoorDash what should they do well I mean that that's certainly one way of doing it I mean I don't here's the thing I I don't actually know like I got super lucky in my career in that I just happened to be in the right place at some right times and you know met some people that gave me some opportunities um I I just I don't know man I just write a
lot of software and I just don't I I I don't uh I'm not settled or not satisfied when someone says ah here's this library like just use it like I always want to know what's in there you know I I I I have a deep fear of dependencies like I don't like to run other people's code unless it's really important and whenever someone says hey we're going to use this library I'm like all right let's have a look you know and I I want to like like I read the dependencies and see what
they do um uh I man I don't know I just it it's very hard to replicate luck but I I just think like if you if you care about your craft you care about like understanding like how stuff works and just like what good looks like and not necessarily getting sucked into these kind of dogmatic ideals like we were talking about you just like what like good is contextual based on what kind of project you're working on and I think it also evolves as the industry evolves technology evolves just I don't know figure figure out
how stuff works just make make software better I honestly completely agree and I'm so grateful thank you so much for like joining us and I'm so glad that you kind of said some of the things that I feel like if I were to say them I might get in trouble but you know now I got I got the credentials there and I think like most importantly a lot of these are like open-ended things like there's no one right answer and at the end of the day you're just trying to create applications that work and that
serve users and the tech behind the scenes isn't you know the number one thing yeah so sociotechnical problems that's uh that's that's how you should think about it and you should think that you should you should definitely think that microservices are technical debt I mean I'm just not satisfied with that let's let's do better let's find a way to do better
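
The point in the conversation about test coverage (you can have 100% coverage and just assert one equals one) can be sketched in a few lines of Python. This is only an illustration, not anything from DoorDash: the function `member_fee`, its pricing rule, and the deliberate bug are all made up for the example. Both checks execute every line and both branches of the function, so a coverage tool would report the same number for each, but only the second one checks the results.

```python
def member_fee(base_fee: float, is_member: bool) -> float:
    """Hypothetical function under test: members are supposed to pay half."""
    if is_member:
        return base_fee * 2  # deliberate bug for illustration: should be base_fee / 2
    return base_fee


def check_vacuous() -> None:
    # Executes both branches, so line and branch coverage are complete...
    member_fee(10.0, True)
    member_fee(10.0, False)
    assert 1 == 1  # ...but nothing about the results is checked.


def check_meaningful() -> None:
    # Same coverage, but the assertions pin down the expected behavior.
    assert member_fee(10.0, False) == 10.0
    assert member_fee(10.0, True) == 5.0  # fails here, exposing the bug


def passes(check) -> bool:
    """Run a check function and report whether all of its assertions held."""
    try:
        check()
    except AssertionError:
        return False
    return True


if __name__ == "__main__":
    print("vacuous check passes:", passes(check_vacuous))
    print("meaningful check passes:", passes(check_meaningful))
```

Run as a script, the vacuous check passes despite the bug and the meaningful one fails, which is the gap between a coverage number and assertion quality that the speaker is describing.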