it's no secret that I'm a pretty big fan of serverless. Even though I just put out a video about why we're moving off of it, that doesn't mean I'm moving all my things off; in fact, we just built a new service for Upload Thing around it. Generally speaking, everything I build is still built around serverless paradigms, but I haven't taken the time recently to break down why, and to really showcase the truth of serverless. I've also not been able to do it without a certain sponsor behind me, not that they ever had meaningful influence over the things I said about serverless, but I'm hoping that we can have a bit more of a trustworthy conversation about it, because there are a lot of things that people do and don't understand about building serverless applications. There are also some fun little secrets that I think people have missed about what makes serverless applications, and the serverless mindset for building, really, really great for most software developers. That doesn't mean we don't have a sponsor today though, so let's quickly hear from them. Today's sponsor is Prisma.
You probably know them for their ORM, but what if I told you they actually have one of the best database products available today? You might be surprised about Prisma Postgres, because I hadn't heard about it until recently and it is mind-blowing. They're not exaggerating when they say it's three clicks to deploy; it's absurd. It's like Vercel levels of DX but for database setups, and you get five projects on the free tier, which is insane. That's a lot of database to be hosting for free. And it has some features that I've honestly never seen before that are super cool. I'm just going to Command-F for cache... what?! At the database level, at the ORM level, you can put in a cache strategy, and now you won't even have to connect to the database to get that value back; it'll be instant on their edge network. So cool. And it's also serverless ready for all of us hosting on things like Vercel and Netlify: they'll pool the connections so you don't have to worry about hitting the server too hard. It's so good. Oh, and for those of you who like real-time updates in your servers when something changes in the database, they've got you covered. You can stream in results, so you can call prisma.user.stream() here, and now you'll get a response in an event stream whenever something happens. How cool is that? Setting this stuff up yourself is not fun. I've never seen a service that integrates the ORM this well with the database itself, providing features that are essential to most services. Thanks to Prisma Postgres for sponsoring; check it out today at soydev.link/prismaDB.
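Since I'm name-dropping APIs here, this is roughly what those two features look like in code, based on Prisma's Accelerate and Pulse extensions. I'm sketching this from memory, so treat the exact option names and the user model as illustrative rather than gospel:

```ts
import { PrismaClient } from "@prisma/client";
import { withAccelerate } from "@prisma/extension-accelerate";
import { withPulse } from "@prisma/extension-pulse";

const prisma = new PrismaClient()
  .$extends(withAccelerate())
  .$extends(withPulse({ apiKey: process.env.PULSE_API_KEY! }));

// Cached read: while the cache entry is fresh, this is served from the
// edge cache and the database itself never even sees the query.
const user = await prisma.user.findUnique({
  where: { id: "123" },
  cacheStrategy: { ttl: 60, swr: 30 }, // seconds
});

// Real-time updates: an async iterable of change events for the table.
const events = await prisma.user.stream();
for await (const event of events) {
  console.log("user table changed:", event);
}
```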
The truth about serverless. This is going to be a complex topic, and believe me, we're going to go deep, so if you're a person who knows serverless well but doesn't necessarily understand every detail of what's good and bad about it, stick around. But first we need to do a very brief top-level overview of what serverless is. Obviously, serverless doesn't mean no servers. What it means is you're no longer provisioning servers for your application and for your code. If you were building things the traditional, I don't know, LAMP stack way, which was the cool thing when I was a kid: LAMP stack stood for Linux, Apache, MySQL, and PHP, and when you were building with the LAMP stack, you would run all four of those things on the same server. You'd have a server that was running Linux, you'd have Apache as, like, the top level orchestrating the HTTP requests, you had PHP, which would actually do the work of generating the pages, and you'd have MySQL, which was the database things would connect to to get the data, write it, read it, you know what a database does, hopefully. Then we had the MEAN stack, which was MongoDB, Express, Angular, and Node, and from that point the idea of everything running on one server
started to fade, due to a specific company called AWS. So the way things used to work is you would have a box. We'll call this the old way, and in the old way you'd have this box; we'll say this is just running Linux. You would have a top level in this box that was something like Apache, then you would have another layer on the same server that was PHP, and then one last layer on the same server, which was MySQL. Actually, to rethink how I drew this here, let's make these all different elements inside. As I was describing with the LAMP stack, think of this box as one server, and it would run Apache, PHP, and MySQL all here. And if you wanted to swap out the PHP code, you would do that by FTPing in and replacing the files that this PHP code was calling. Because PHP is interpreted, you can just swap the PHP files out for different PHP files on the fly and you're fine. But then something interesting happened; well, two things. The tech we were building with changed slightly: we would still maybe use Apache, but usually we wouldn't even bother. So the MEAN stack would be Express and Node. You would still have Angular in here, but the Angular code wasn't running on this server; the Angular code was a script tag that was part of the HTML that Express would resolve with, and then you would load that script tag, probably from something like S3, to actually do the Angular side, or the React side if you did the MERN stack, locally on your device. But something quickly started to happen here, which was that the tooling to swap out the code for the JavaScript stuff was complex enough that just FTPing in and swapping it was not super reliable. On top of that, getting it to behave was annoying. So a combination of this and this little company you guys might have heard of called AWS, I know, they're a small startup, but they did something interesting: they broke this up. So instead of it being one server that has all these things in it, they have one server that's just the Express and Node code, and a separate server that is just the database, and this enabled a lot of cool things. It meant making changes to this didn't ever risk the integrity of your database. It meant you could have multiple of these in multiple different places; if you needed more boxes to handle more traffic, you can just run more of them, and if they're all connected to the same database, it's fine. And slowly we started doing this with other databases, it wasn't just MySQL even early on, but the idea of a database platform being servers that are separately running from your application code was, at a point in time, kind of a new concept. The idea of connecting to a
database on a different box somewhere else as the main way of dealing with database reads and writes in your API was weird; people were not used to that at the time. Eventually this went so far that we started having microservices, where we have different Express and Node code bases entirely, running on different boxes, all using some database as the source of truth. A large inspiration for why we did this was the ability to swap out the code running on these boxes, so this one is running Go code, this one is running Python code or something else, and the ability to swap it without restarting, so to speak, wasn't there like it is in PHP. The only way to properly swap out this code would be to spin up a new server with the new code, start running traffic to that, kill the traffic on the old one, and then kill this box. And this type of setup, in order to swap what was inside of the box that users were hitting, got more and more complex. On top of that, more and more of the state that we used to keep inside of this box, like we did here with MySQL, started moving out. So instead of this being stateful, in this server that we had running, having a bunch of data that would be lost if the server shut down, we started making it so at any point you could just kill the server and you wouldn't lose anything. But at that point, how much are we benefiting from having a long-running server in the first place? And what happens if we have a huge traffic spike and we need more provisioning than this server has in it? Boy, do we have a solution: everyone's
favorite, Kubernetes. Kubernetes was created to make it way easier to manage all of these different servers: how many of them there are, what code they're all running. "Auto scaling groups with EC2 was also a big part." Good call-out from Ryan; I didn't know you knew the infra side this well. Yeah, EC2 was another path there, and then eventually ECS and EKS, all built around these Kubernetes-style primitives. The goal was all the same though: make it so you don't think as much about what specific code specific servers are running. Instead, you configure the ability to spin up a server, and you can just spin them up and kill them whenever. As long as your database is separate, who cares? Because if you have one server or 100 servers, if they're all connecting to the database, fine, it doesn't really matter how many servers you have. I've learned to just assume Ryan knows everything; it's usually accurate. Yes, but one of the highest honors I've ever had was helping a thing click in Ryan's head. It is magical to take someone who does know everything like he does and actually help them understand a thing they happen to not have an understanding of because they haven't done it. It's humbling. Yeah, a big part of why Ryan knows so much is he takes the time to learn it, and he talks to the people who he knows know better, and is very willing to learn and be wrong. But yes, it feels like he just knows everything. Anyways, enough of me explaining why Ryan Carniato is one of the smartest people in tech; back to another smart thing in tech, Kubernetes. I hope bringing up Kubernetes, and even typing that word out, gave y'all a bit of trauma, because one of the specific things I have tried to push with my brand and my channel is taking these things that work for very specific types of workloads in large companies and making sure the average developer knows they shouldn't touch them. Things like 100% code coverage, things like Kubernetes, might make sense at specific large companies, but for the majority of medium to small-sized companies, or even medium to small-sized teams at big companies, they make a lot less sense. Do you know who else realized this? AWS. I think we have
now contextualized how we got here. First we had everything on one box. Then we had the separation of our application code and our database layer, so that we could swap this out without losing our data. Then we realized we could actually run multiple of these, all connecting to the same database, and it didn't really matter, which made it way easier to scale. It made something like Node viable. I should probably point out that Node's concurrency model, despite being great for I/O-bound things, was not as good for heavy compute work, so the ability to spin up multiple servers was very, very useful for Node specifically. But to leave these here: something started to become obvious at AWS. Why do we have to think about how many of these servers we're running, and why are we running them all of the time? Let's set up a theoretical scenario. Going to make a quick little diagram here. Let's say this is number of users, and the bottom, we'll just say, is time. Now let's say when our company started, the number of users was relatively small, and we could comfortably fit it on just one server. Awesome.
Let's build some thresholds out here. Let's say this line is the point at which one server can no longer handle the traffic; this would be two servers, this would be three, this would be four. For a long time we can get away with this, but things don't start to get messy until you cross a threshold for the first time, and now we need two servers. Let's say we only get those traffic spikes that need that at very specific times, like, I don't know, Black Friday caused our traffic to spike up a whole bunch. Now we're doing too much traffic to justify having this one server here, so we need two. We could set up an autoscaler to detect when the traffic gets to a certain threshold. What that usually looks like is you pick a threshold, we'll say here, and when we see that traffic, number of users, whatever metric we're tracking, get to a specific point, then we spin up another server. So if we go past that point, we have enough servers to handle the traffic. This also means you have to write your code in a way that can handle running in multiple places, and makes no assumptions about multiple users being connected to the same box. But if you make all of those assumptions, and you build this in a stateless way, and you build in these thresholds to detect when something should scale up and down, you can handle this. The issue is how long it takes for that to spin up. I'm going to copy-paste this, because I want to have some more fun examples. Let's say instead of a slow ramp-up because it's Black Friday, let's say instead the ramp-up is because Theo mentioned you on Twitter or put you in a video. Let's say the ramp-up looks like that instead, so it happens almost immediately, where in seconds we go from a small number of users to 3x or more. We pass this threshold, so we spin up the next server, but do we even have the capacity to notice that we needed two servers, not one? The servers take time to spin up, and you need to have enough servers to even know how much traffic is coming in. So if we broke our threshold, we don't necessarily know how hard we broke that threshold, so how many servers do we spin up? It gets complex, and as a result, what I would see many companies do, and to this day many still do, I was guilty of this myself at Amazon: we would just provision up to our theoretical peak. So instead of running one server or two servers, we would just always run three, because the alternative is some users get an error page, or get a hanging connection, because there was no server to actually resolve their request, because the server that we had was overburdened and underprovisioned.
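To make that threshold idea concrete, here's a toy sketch of the logic an autoscaler implements. This is not a real AWS API, just the shape of the problem: watch a metric, add capacity past a threshold, and accept that the polling interval plus boot time leaves a window where you're underprovisioned:

```ts
// Toy autoscaler: one server per 1,000 concurrent users (made-up threshold).
const USERS_PER_SERVER = 1_000;

function desiredServers(currentUsers: number): number {
  return Math.max(1, Math.ceil(currentUsers / USERS_PER_SERVER));
}

function autoscaleLoop(
  getUsers: () => number,
  setServerCount: (n: number) => Promise<void>
) {
  // The catch described above: if traffic triples in seconds, we won't
  // even notice until the next poll, and new servers still take minutes
  // to boot. That gap is where users see errors.
  setInterval(() => {
    void setServerCount(desiredServers(getUsers()));
  }, 30_000); // poll every 30 seconds
}
```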
This stuff sucks, I want to be real clear about that. Doing this is not fun. If you're the type of person that finds this fun, cool, awesome, you're going to take a lot of my money in the process, but this sucks. And it's funny, somebody said "just don't use JS" in chat, because a lot of other languages make this even worse. Rails blocks on I/O, so if you're doing a database check for one user, the next user can't even start their request yet. At least with JavaScript, I/O won't block the other things going on; when I'm waiting for the database to do something, I can process other requests at the same time. You can build this yourself in other languages, like you can build this type of concurrency model for handling non-blocking I/O separate from your requests in languages like Go. Maybe you can do it in Rust, but at this point I hope y'all know how miserable it is to do concurrency and parallelism in Rust. Even the Rust community seems to have mostly acknowledged this, and written some awesome content on how hard it is to do parallelism and multi-threading correctly in these other languages. But non-blocking I/O is a difficult problem, and no, Rails has not solved it. Rails has a cool thing called Rack where you can spin up multiple threads on one box, but each of those threads can be blocked by I/O; that's just a fact. So believe it or not, JavaScript is actually a pretty good language for running in environments like this, when you consider the balance of the code being much easier to work in and digest because you're not writing heavily async code.
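A minimal sketch of what that looks like in Node. The database call here is a made-up stand-in, but the behavior is the point: while one request awaits I/O, the event loop keeps serving everyone else:

```ts
import http from "node:http";

// Hypothetical stand-in for a real database query that takes 100ms.
function queryDatabase(userId: string): Promise<string> {
  return new Promise((resolve) =>
    setTimeout(() => resolve(`profile for ${userId}`), 100)
  );
}

http
  .createServer(async (req, res) => {
    // While this await is pending, Node is free to start handling
    // other incoming requests; the I/O never blocks the process.
    const profile = await queryDatabase(req.url ?? "anonymous");
    res.end(profile);
  })
  .listen(3000);
```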
The only code base I've ever seen that is truly fully threaded and is readable was written in Elixir, because Elixir was built for these things. But yeah, if you think Rust is beautiful, you haven't written enough concurrent and async Rust, because in that code you lose a lot of the beauty really quickly. Anyways, if we wanted to make this solution work great, there's a few things we would need to do. So here are, like, the bullet points of how to make this great, and again, I'm talking about the load-balancing solution. The things we'd have to do to make load balancing work well are: make it so servers can spin up fast, because if I have to handle a traffic spike, I don't want to wait two minutes for a server to spin up, I need that ASAP, so cold start times and the time it takes to spin up a server become much more important; make sure that we can detect the amount of traffic more trivially, make traffic patterns easier to track; and of course, make the code stateless. So of course Amazon worked hard to do all of these things. They went out of their way to make the spinning up of a new server as fast as possible, to make it easier to use other tools to actually detect what traffic levels look like, so not having a server capable of handling the traffic doesn't prevent you from knowing how much traffic you're getting, and to encourage us to move our state off of our servers into things like their database products, their fun new serverless database solutions, stuff like that. But in order to make a load-balanced solution work well, these things went from stuff that no one ever thought about to high priorities.
I promise you, up until recently, nobody in the PHP world was thinking about how long it takes for your PHP code to initialize, or to spin up a box from scratch and get PHP running on it. These weren't things people thought about, because they ran PHP on a fixed number of servers that were always there, so they didn't think about how long it takes to start up, because they don't start it up; they started it up 17 years ago and haven't touched it since. So these aren't things that most of these devs thought about, but in order to handle these types of traffic patterns and these behaviors, AWS has had to think about these things, and I think those who are tuned in enough are going to see what happened here: they accidentally invented serverless. Serverless happened naturally when the efforts were put in to do all of these things so that load-balanced services could spin up faster. Because if you need to go from one server to two servers to three servers, you need to be able to handle the fact that there's different state on those different servers, you need to handle the fact that these need to spin up almost immediately, and you need a way to know how much traffic is there so you know how many servers to spin up. So the obvious question that led here is: what if it was automatic? This is why serverless exists. The process of automating load balancing almost naturally comes to the conclusion of rebuilding Lambda, and the reason serverless works this way is because it's where you naturally end up when you try to optimize these problems. So, to actually explain what serverless is, because again, there is a server, it's just that we've abstracted a
bit further: if we go back here to the distributed MEAN stack example, where we have servers one, two, and three, and all these servers are running Express and Node, and we have our database, I'm going to move this separately, because I want to separate these into a clear set of things. A quick way to diagram things differently from what I was showing before is if we set this up as stateful on one side and stateless on the other, because this code needs to be able to run and die and handle all of those cases, so we have to offload the state over here. The state could be in a database, the state could be in Redis, the state could be in a lot of different places, but when these servers need state, they have to get it from somewhere else instead, because we need to be able to kill this server at any time, or spin up another next to it, and not lose track of your user information and stuff like that.
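As a concrete example of what offloading state means, here's a sketch using the node-redis client. The store matters less than the pattern; the env var and key names are made up:

```ts
// Stateless pattern: session data lives in Redis, not in process memory,
// so any server (or serverless instance) can handle any request, and any
// server can die at any time without losing state.
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// DON'T do this: const sessions = new Map(); // dies with the server

export async function getSession(sessionId: string) {
  const raw = await redis.get(`session:${sessionId}`);
  return raw ? JSON.parse(raw) : null;
}

export async function setSession(sessionId: string, data: unknown) {
  // Expire after an hour so abandoned sessions clean themselves up.
  await redis.set(`session:${sessionId}`, JSON.stringify(data), { EX: 3600 });
}
```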
And when you separate things this way, the next step almost becomes obvious. I'm going to quickly change this to be "your Express code" and make this a bit bigger so it's clearer, because this server is running your Express code and Node, which is what the code is being executed on. So now, if it turns out you need more traffic, you have to spin up a whole other one of these boxes, and these boxes are running Linux, so we have to wait for Linux to boot. So now we have an empty box with just Linux. Then, once that's booted, we spin up Node; we have to load in Node, all of its dependencies, everything else, and then we load in your Express code. And now the server, once it's processed that code, JITed it, launched it, and got it running, now, after however many seconds, if not minutes, that all took, oh, and by the way, you have to form a database connection too, your Node process needs to connect to your database to be able to do things, once all of that work is done, now this server can start responding to requests. What if we moved the Express code, though? What if all of these servers were already running, but without your code in them? You could do this yourself, as I
showed before: you can just spin up a server for all the different things that you might be doing, and all of the concurrency that you might hit, all the user counts you might reach, you can spin up the right number of servers for that, but it'd be hella expensive. But if you're Amazon-sized, spinning up that many servers and just keeping them around isn't as big of a deal, especially if you are dealing with the traffic of, like, half of the web, because if one website is spiking in traffic, that means other websites aren't. So if you have all of these servers that are already running, and you're able to temporarily share these with people, effectively it's a different layer of the same abstraction, where when you're renting a server through EC2, they're not going to a server rack and plugging in a new box that they just bought for you and assigning it to you. They already have a bunch of these machines spun up, ready to go; they're just giving you a virtual layer in one of them. So what they've done since, and what serverless actually is, is they've abstracted it one step higher. Now they're not just running a server waiting for your code to come take over and run it indefinitely; now they have all of these servers idling, waiting for a payload. The payload could be a Docker image, but generally they don't recommend that; they recommend using their images, so you can use their pre-provisioned servers. Because now, when a user makes a request, all they have to do is move your code in, handle the request, and then move it out. That's what serverless enables. It's not that there aren't servers; it's that you don't own them the same way. In the previous model, all of these boxes are things you could see in your AWS dashboard. You had a server that was my server one, my server two, my server three; these were yours, and they stayed around until you told them to go away. What serverless means isn't that there are no more servers; it means the part of what you own no longer includes the actual servers, because the server is only running when a request is being made. So it could be an event triggered through SQS, or, like, you uploaded
a file on S3, which triggered a Lambda. There's a lot of different reasons that one of these things might spin up, but by default they are not there, and your code is just sitting in S3, waiting for one of these servers to receive a request so that it can load in that code and start to execute. Now we're starting to get the good questions: does serverless mean the dependencies get installed for every call? Effectively, yes. Oh, AJ's here, good stuff. Please correct me if I say anything stupid; I'm trying to do a very dumbed-down explanation of how these things work, but if I say something that's, like, outright egregiously incorrect, you know what you're talking about, please correct me. "Doing great." That's, like, the serverless god saying I'm doing great. Cool, we're doing good so far, boys. This is actually a really good question: does serverless mean the dependencies get installed for every call? For now, I'm going to say yes. I'm going to describe it, and then I'll show you how it's not fully true, but for starters, yes. If I have a service that has a low number
of requests, let's say one a minute or so, the request comes in, AWS sees the request and says, oh, this request is to this URL, which means it needs this code to run. So, to be very clear, not only do you have to spin up all the dependencies from your code base, any node modules you have, any complex stuff there, not only does that all have to get included, you also have to form any additional connections. So if Express is using an ORM in your JavaScript code, that then has to form a connection to your database, to Redis, to whatever else, and the forming of those connections can take time. Because again, back in the old serverfull days, where everything was on the same box, they didn't care how long it took for PHP to connect to MySQL, because it's already connected, who cares? Suddenly we have to care a lot more, because that is now a blocking thing that keeps your code from executing. So if it was a five-second connection time that you only ever paid once, in serverless all of a sudden it's happening hundreds if not thousands of times
a day, depending on the services that you're building. This also meant that things like balancing how you connect to your database mattered more, because, like, a default Postgres deployment on Heroku had a maximum of 10 connections; 10 connections could be made, because again, they were assuming you had one to ten servers that were always connected. But if I have 15 users, I might have 15 connections, and now the old way we connected to databases started falling apart, and that's why we started seeing things like PgBouncer, pgpool, and all of these tools to abstract and build a way to hold connections so they could be shared across different deployments.
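The app-side version of that idea looks something like this with node-postgres: a capped pool shares a handful of connections across every request, which is the same trick PgBouncer pulls at the infrastructure level. The table and env var here are hypothetical:

```ts
import { Pool } from "pg";

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 10, // mirror the Heroku-era connection cap from above
});

export async function getUser(id: number) {
  // Borrows a connection from the pool, runs the query, and returns
  // the connection for the next request to reuse.
  const { rows } = await pool.query("SELECT * FROM users WHERE id = $1", [id]);
  return rows[0];
}
```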
But as I mentioned before, we were building all these things anyways in order to make server-backed load balancing work as well, because it would really suck if the 11th server shows up and all of a sudden your database can't be connected to, because it had a limit of 10 connections. We had to solve these problems, but as we went further and further with those solutions, we made this way more viable. And now the term serverless database, despite sounding
kind of dumb, because it is kind of dumb, what it actually means is the server that your database is running on is prepared to handle an absurd number of parallel connections, because if you have 1,500 users, you might have 1,500 connections to your database, even if you're not reading and writing too much data from it. So this just sounds terrible, right? Like, why would we do this? Well, there's one piece that makes it way less bad than it sounds. When this code responds, this box doesn't immediately die; it sticks around for a little bit. It holds the connection that you made, and if another request comes in fast enough, it will handle that for you, which means you don't have to pay the cost of moving this code over, you don't have to pay the cost of JITing the code and getting it running, you don't have to pay the cost of connecting to your database; all those things have already been paid. It just sticks around for the next request, and if you have enough users and enough relative traffic, the number of times you're eating cold starts goes down quite a bit.
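In code, that warm-container behavior is exactly why the standard Lambda pattern is to create expensive things outside the handler. A sketch, assuming a Prisma-style client and a hypothetical user model:

```ts
import { PrismaClient } from "@prisma/client";

// Created once per container, not once per request. On a cold start this
// line runs and the connection cost is paid; on every warm invocation of
// the same container, it's already there.
const prisma = new PrismaClient();

export async function handler(event: { userId: string }) {
  // Warm invocations reuse the existing database connection.
  const user = await prisma.user.findUnique({
    where: { id: event.userId },
  });
  return { statusCode: 200, body: JSON.stringify(user) };
}
```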
But if you have a traffic spike, if you have a traffic decline and then it goes back to normal, or you just have, like, an event where a lot of people are saturating existing Lambdas and then one person makes an additional request past your current provisioning, you will eat a cold start. So we needed to reduce the cost of those as much as we could, and we have: the cost of spinning up your database connections, the cost of initializing the JavaScript code, the cost of a lot of these things has gone down significantly, and now you can run things serverless and get surprisingly great performance if architected correctly. Now that we have broken all of this down, I think we're finally ready to get to the thing I actually want to talk about for this video, which isn't how serverless works, or even how we got here. It's the title up here: the truth about serverless. When we talk about the truth, we talk about the facts that we've just established. First, serverless requires fast startup times. Two, serverless scales to zero. Three, serverless requires stateless design of your application code. And four, serverless costs more money than running a server for the same amount of time. That's a cool call-out from AJ, pretty true too: "AWS won't say it publicly, but I wrote a blog post with code which proves that they're doing some predictive autoscaling. Pretty neat. I call it proactive initialization. A percentage of cold starts were pre-warmed, but your mileage may vary." Interesting, so even cold starts might be pre-warmed, because Amazon is trying to detect those things. Very interesting. Yeah, is anything here egregious, AJ? I think this is a fair set of points to call out. I'm going to do a little thing up here: fun potential tangents that I'm successfully avoiding. We'll see how long I'm able to successfully avoid these for. One is how Cloudflare uses V8 to avoid cold starts, one is Vercel caching bytecode, one is HTTP connection methods for databases. There's a lot of these things I want to touch on, but we have more to talk about. So it is relatively agreed upon that these are things that you have to understand and deal with if you're building in a serverless mindset. Speaking of building in a serverless
mindset, AJ has been helping us out a ton throughout this. If I say anything about AWS or serverless and he disagrees, I am wrong, he is right; listen to him. Great stuff. He also streams on Twitch, benchmarking a lot of these different things, so if you're curious to see what this actually looks like to do, AJ's a really fun follow, and I have absolutely used his threads, his content, and his streams to keep up on these things. I can't believe I wasn't following him; that was pathetic on my part. Anyways, I want to riff on these things, because I have a real spicy take, and the spicy take is the thing I'm actually filming this for. Here's my spicy take: the requirements to build in a serverless manner make you write better software. And here's where we get to get spicy. The secret piece that I'm going to try and convince you guys of throughout: serverless is the biggest win for functional programming quite possibly of all time. Functional programming is the concept that you build a pipeline of code and functions, and when given an input, it will always generate the same output: pure functions. Pure functions and functional programming are not the most popular thing in the world; I'm not going to sit here and pretend everyone's writing everything functionally. But functional programming isn't just a thing that we talk about because we're evil and we hate OO or whatever; it makes reasoning about your application logic comically simpler. This is one of those few places where I think I and the unit testing and testing world get along quite well, because if your functions are written in a pure way, where the same input will always generate the same output, it is really easy
to test, it is really easy to debug, it is really easy to reason about, and some of the most complex code bases that I have worked in were made significantly simpler by using patterns like this. Like, it was easier to read our insane concurrent pipelines for dealing with video ingest for the content team I was on at Twitch, because it was written in Elixir with good primitives, than it is to read and reason about, I don't know, a WebSocket server written in Rust. If your code has clear inputs and outputs, and those pieces, the functions, can be composed in a way that is logical, your software will be better and easier to maintain. I have a few pieces of content where I talk about this. There's a classic video, one of my favorite tech videos ever, want to find the original, yo, is this the Brian Will video? I have a video on my channel of me reacting to it. "Object-Oriented Programming is Bad" is so good; it's such a good video. It got me into coding YouTube for real when I was younger, and I probably would not be a YouTuber if it wasn't for this, and it wasn't for Brian. Unbelievable piece of content. I would argue this is, like, a must-watch for all developers, not even because OO is terrible, just to give you the perspective of how we got there and why other methodologies of programming are good. So, as I was saying with serverless, one of the fun things of building serverless is that your state has to be separated. You have to have your state somewhere else, and when you're executing a serverless function, let's say I have an API that you hit to get your user profile
data. So you're hitting this API, you have a cookie on your headers that I'm using to know who you are, and I'm fetching data from the database to give you your profile. The server cannot contain the state that comes from the request; it is passed to it. And it cannot contain the database, because that's running somewhere else. So when you write this code, and this pipe, so to speak, is being hit, it gets two inputs: the first input is the request, the second input is the database connection, so to speak, and through those two pieces we are now able to... actually, technically speaking, the only input on the top-level function is the request. So this function takes the request, it sees that this request is for the get-user-profile endpoint, so it grabs the database connection, and it passes the request and the database connection to another function. That function checks the request to figure out who this user is. Once it has authenticated you and knows you're you, it then makes a call to the database to get the user's profile info, and then it sends that down to the user.
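Here's a sketch of that exact pipeline. The names and the db shape are all hypothetical; the point is that every function is just inputs to outputs, with the request and the database connection passed down explicitly and no state living on the server:

```ts
// Hypothetical database interface, passed in rather than held globally.
type Db = {
  getUserIdForCookie(cookie: string): Promise<string | null>;
  getProfile(userId: string): Promise<unknown>;
};

async function authenticate(req: Request, db: Db): Promise<string> {
  const cookie = req.headers.get("cookie") ?? "";
  const userId = await db.getUserIdForCookie(cookie);
  if (!userId) throw new Error("unauthenticated");
  return userId;
}

async function getUserProfile(req: Request, db: Db): Promise<Response> {
  const userId = await authenticate(req, db);
  const profile = await db.getProfile(userId);
  return Response.json(profile);
}

// Top-level entry point: request in, response out.
export async function handleRequest(req: Request, db: Db): Promise<Response> {
  if (new URL(req.url).pathname === "/get-user-profile") {
    return getUserProfile(req, db);
  }
  return new Response("not found", { status: 404 });
}
```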
None of that state lives on that server. It doesn't matter if we have one server doing that, 100 servers doing it, or 100,000 servers doing it; as long as the database can handle the concurrent connections, you're good. But if that state is distributed all over the place, or it's attached to random objects, and you need to update the server and you don't know what's going to happen to that connection, all of those things, it gets chaotic quick, and most of the hardest problems I've had to debug in software have come from the fact that state is being put in the wrong places. I'm going to drop one more hot take. This is also where the HTMX folks and I agree: the fewer places your state lives, the better, and ideally your state doesn't live in places it doesn't have to, like business logic, like client-side logic. If you can move all of the state from your servers away from the client, and all the client gets is HTML, it makes reasoning about the client and its relationship with the server significantly easier. If you can move all the state out of your server-side logic and endpoints, and have the logic and the complexity of the code be there while the state is something that it's pulling in externally or receiving through a request, reasoning through what happens when and how becomes significantly easier. It's so cool, and as a functional programming nerd, it has gotten me excited about building and writing code again, because the actual code I'm writing is so simple, and all the other things we used to have to think about, like how is this deploying, what's the deploy script, how many servers do I have, what is the size of the server, how many
virtual CPUs, how much RAM do I have, all those things. I had to start worrying about that again recently, because I'm spinning up some Laravel stuff on the side, and I forgot how annoying it is to play the guessing game to make sure you've provisioned things correctly. It is not fun, and it's just another one of those things that's taking up room in your head: where does this state live, or how many cores do I have on the server, or how many requests can this handle. All of these things are things that have been wasting time in my brain, and when you are forced to not be able to have those things, I actually find that the code I'm writing, and the amount of my brain that I'm using, is simpler. It's easier to reason about these things. I have been very surprised that these requirements aren't flaws, they aren't things I have to worry about constantly; these are things that make my code better. There are obviously times where I can't do these things, like if I have to have state. For example, the ingest server for Upload Thing needs to know the current set of chunks it has and where they are going, and that is ephemeral state, but it has to live on the server. We are putting some of it in Redis, so that if a server dies immediately, like, we can recover, but the nature of how it's holding connections is such that it should be on a server, so it is. But if you can build this way, I would go as far as to say you probably should, because it makes reasoning about these things comically simpler. But I have to riff on two additional pieces here. I think the first chunk of what we talked about helps explain why the requirement for fast startup times makes sense, and why it's actually good, because it makes everything from load balancing to serverless better. I think I did a good job here with point two, which is "serverless requires stateless design of your application code"; I just described why not only is that reasonable, it's actually kind of good, and can make your code significantly better and easier to reason about. But now let's riff on these next two: scales to zero, and it costs too much. Let's have a
conversation about scale to zero. This is a thing I've actually changed my mind on a little bit, and I'm going to tell the story of how I came to these conclusions by not talking about serverless. We're going to talk about databases, specifically two companies: PlanetScale and Turso. These two companies might seem pretty similar on the outside; they're both serverless database providers that are trying to make it easier for full-stack devs to deploy things and do database work. But there are some key differences. Obviously, PlanetScale is MySQL plus Vitess. It is focused on massive throughput and, a bit less so, on massive amounts of data, but the most important thing is it requires servers to run, and that means it's expensive for them to run. And I still love PlanetScale; we're still using it for Upload Thing, and I cannot imagine building Upload Thing without it. Turso is quite a bit different. It's SQLite; technically it's libSQL, because they forked SQLite, because getting changes merged into SQLite is impossible, but I honestly see libSQL becoming the standard long term. It's in a really good state, simple to adopt, multi-tenant. Multi-tenant in this case means that you can spin up different databases for different customers, because, realistically speaking, are you ever trying to join data across different users' tables? Like, if I have two customers with Upload Thing that are uploading files to different places in different regions, what is the likelihood I have to fire a query that is joining those two different customers? It's effectively zero. And if you can separate, like, different orgs, different customers, different applications into different full-on databases, it can make managing those databases way easier, and Turso really focuses on doing things like that.
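From the app side, database-per-tenant looks something like this with the libSQL client. The URL format and names here are illustrative, but the shape matches the one-URL-per-database model:

```ts
import { createClient } from "@libsql/client";

// Every customer gets their own database; the app just picks the right
// connection per request, and queries never need to cross tenants.
function dbForTenant(tenant: string) {
  return createClient({
    url: `libsql://${tenant}-myorg.turso.io`, // hypothetical URL scheme
    authToken: process.env.TURSO_TOKEN,
  });
}

const acmeDb = dbForTenant("acme");
const rows = await acmeDb.execute("SELECT * FROM files LIMIT 10");
```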
Where PlanetScale is using Vitess to shard, you have one massive database that is broken up based on different keys in the database, so maybe they have different, like, groups of data in the database, sharded based on the user ID they're attached to. Pretty cool, but very different approaches to similar problems. They are both trying to be serverless-friendly databases, though. One more detail on Turso's side: data is stored in S3. Since it's based on SQLite, and as you hopefully know, SQLite is just storing to a file by default, Turso just stores those files in S3 or some random object store, and if your database isn't getting a lot of traffic, and then a user goes to your service and now that database has to be hit, similar to what we were describing before with serverless, they just pull your database from S3, resolve the request, see if more come through for a bit, and if they don't, they just put it back in S3, and then it comes out again when the requests are being made. That means it's insanely cheap to run. Since PlanetScale has a server, or multiple, for each of their databases, it doesn't matter how many users you have, each one requires multiple servers, and that's expensive, and that's why PlanetScale ultimately had to kill their free tier, whereas Turso has one of the most generous free tiers I've ever seen in my life. The free tier gives you 500 databases, 9 gigs of total storage, 1 billion row reads, and unlimited embedded replicas. That's insane for a free tier, but when you understand the pricing model, it makes a lot more sense, because their pricing isn't based on how many databases you have costing them significantly more money because they're all
running servers; it's based on the fact that they're just storing that data in S3. This is also why, when I'm working on smaller projects now, and the requirements for raw amounts of data are lower than something like Upload Thing with millions if not tens of millions of files, Turso is a really good choice, and that's why I use Turso for Pic Thing. Yes, I'm actually shipping Turso in production now, and it's been a pretty good experience overall. So why am I bringing all of this up? This is the PlanetScale pricing page. One of the things that I really like about PlanetScale is their concept of branches. A branch is a copy of your database's schema, with none of the data; it's just the same schema. So you can use that in dev, in staging, whatever, to make changes to the database's schema, like add new rows, add new tables, delete things, create new indexes, whatever. Once that has been reviewed and people agree on it, you can then do what's called a deploy request, similar to a pull request on GitHub, to request that these schema changes get added to the production schema. But those development branches have to run actual servers, because again, PlanetScale is running real MySQL Vitess servers, which means each of these branches costs them pretty close to the same amount of money that the actual database costs them, which is why they have to bill hourly based on those branches. Previously, you'd pay for a fixed number of branches, but that got miserable quick; like, our bills for just branches were pretty rough, especially when we tried to automate the creation of staging environment stuff. It got bad, so they changed the billing to be based on hours of usage. So a month is, say, 30 days times 24 hours: 720 hours in the month. As they say, that's two times the hours of the month, so built into their main tier you can have two branches open the whole month. But if you need a third branch, you have a third developer working on things, you have to pay money for that. If you don't remember to sleep these, or kill an old one when you're done with it, you have to pay for that. And it's not that they're just nickel-and-diming you to squeeze every penny out; it actually costs them money. I'm not sitting here trying to demonize PlanetScale. It's the thing that we're using for a reason; it's a phenomenal service and it's worth every cent. But these types of features cost them a lot of money to run, and the reason they built this feature is to streamline the process of deploying your changes so that it feels more similar to what we're used to with tools like GitHub and Vercel and all these other awesome things in the ecosystem. Turso... do they even mention branching in here? Yeah, branching isn't a "how many" line item, it's just "yes". There is no concept of branch hours, or number of branches, or any of these things, because again, it doesn't cost them money. It's funny to look at: where on PlanetScale it's, like, complex math you have to do, here it's just "yes". So this is where my brain started to flip, because previously, especially when chatting with PlanetScale and Sam there, I still love Sam, we would talk some smack on scale to zero, because the people who push really hard for scale
to zero as, like, a reason to adopt a service are doing that because they have no users, and they want their service with zero users to cost zero, and they want their service that theoretically might have thousands of users someday to cost a reasonable amount of money. When traffic goes down, they don't want to pay anything. And when you're like me... I don't even want to see how many repos I have right now. 192. So I have 192 projects, and I actually go through and delete them pretty regularly, so that's a lot. And if I have all of those different projects, very few of them are getting actual traffic, and if I'm paying for a $5 VPS for each of those, that's not a cheap monthly bill. And now imagine each of those has multiple pull requests that have preview environments. If each of those has, let's say, an average of five open PRs, all of which have a preview environment, and they all have a $5-a-month server for each of those, which is generous, because theoretically the production version probably needs more: 192 repos times 5 pull requests times $5, that's $4,800 a month for just my GitHub repos. And then I was starting to play with Turso more and thinking more about their cost model, and realized the magic of scale to zero has nothing to do with having zero users on your production environments. Scale to zero doesn't matter for production, and that is what has flipped my brain: realizing that this isn't a feature that allows for people with zero users to have a good time. Like, yes, it does that, cool. The benefit of scale to zero is you don't have to worry about things in your development environment, your staging
environment, and all the other things costing you money when they have no users. If you're running a $3,000-a-month gigantic server to handle insane amounts of traffic, and you're working on things in dev, you either run it on a smaller server, so you're no longer replicating the actual production environment, or you're doubling your costs for every single instance of a developer environment, and I've seen companies doing crazy stuff to make that viable. Now, to showcase the company I said I wouldn't talk about too much: this is my personal Vercel account. Notice that it is hobby tier. I am not paying on my personal Vercel account, because I don't need to. The things that get enough traffic are on my business account, but for my personal stuff, hobby tier is fine. But what I care about here has very little to do with the number of apps I have, which, by the way, I have a lot of apps deployed. I hit the show more button, show more again... yeah, I have a lot of apps deployed on a free tier, and it's fine. But even more fun: if I go to something that had branches on it, this would actually be a fine one, T3 Astro, you go to deployments, these are all the deployments for different pull requests that I did two-plus years ago. Ready for the real magic? I just clicked the link for a deploy from two years ago, and it worked, fine, almost immediately. This might have had no users; it objectively had no users. Nobody is checking a preview build of my application from two years ago. This type of workflow is only viable in serverless environments, and it's the secret win that scale to zero actually provides. If the
cost of your service having zero users is that close to zero, you can now do things that you would never otherwise be able to do. We saw the branching model on PlanetScale; it's cool, they got it working, but it takes so long to spin up a branch, and it costs you money unless you manually sleep it, and it costs you money anyways. Whereas with something like Turso, just go spin up more databases, who cares? Spinning up a new one is, you just click a button, it's there immediately, because it's just making a new SQLite file. You don't have to worry about those things, and that enables workflows, ideas, and things that just aren't possible otherwise. You cannot have a preview environment experience this good without something like serverless, because you can't leave this code running. You can't leave all of these pull requests on my personal site, on my free-tier Vercel account; you can't have these hundreds of previews up and ready to go, where I can click any of these links and it will immediately resolve the page. Look, this is still my old version of my website.
That is so cool. And up until recently, Vercel would just indefinitely keep all of these deployments available forever. Now there's an option where they'll automatically kill them in 60 days, but I don't care; why should I? All it's doing is grabbing the right code from S3 that it's been sitting on for years now. It's not like the code is big; it's, like, a few megabytes, but it can pull those into the Lambda, spin it up immediately, and it costs them effectively nothing. So if I have an outage and I'm trying to figure out when the thing broke, I can just scroll through, or I can hit filter, production: here's every production deployment I've ever had on my personal site. I haven't done one for over a year; I need to write more blog posts, clearly. And these are available, so if I have a bug and I'm curious when it broke: click, does it work here? Oh cool, it works here; guess it was an older one. This is not possible with servers, unless you massively over-provision, build your own workflow for dynamically loading code in and out, and deal with the latency necessary there. Like, I've seen people spin up things like this with stuff like Fly.io: it's going to be 5 to 10 seconds minimum from when you click the link until it finishes spinning up the server and actually shows you the content, which makes emergencies suck. Also, if I want to switch to the old version, here's how I do it: click. Oh, sorry, hobby customers can't instant rollback. But if I was on a paid account, one click, and all traffic is going to the old version. But the traffic isn't being redirected; what's actually happening
is, when a user shows up, it just resolves different code instead. That's magical, and these types of workflows are only possible because of this: the production environment, who cares about the scale of that one, but all of these other environments, every single old production environment I've ever had, these are all currently benefiting from scaling to zero, because if they didn't scale to zero, they would have to be terminated; they couldn't stay there. This is totally true: "scale to zero is useful if your product is seasonal, or even just active a bit during the day and then not, but for production services which are interacting with other automated services, scale to zero is kind of pointless. You can save money using provisioned mode for Lambda, but it's still expensive compared to even Fargate." Yeah, I agree. And honestly, building things in a way where serverless is all of the stuff going on in the developer world, when you're writing the code, you're working on things in development, whatever else, awesome, you use serverless for all that, and then, if you want to min-max the cost vectors for the actual deployed version, make the production version Fargate. That is
a totally fine, acceptable path. I would actually find that pretty cool, because again, serverless and stateless code can be run just fine on a server; code that's expecting a serverfull runtime is both harder to debug and can't just be thrown into a serverless environment and be expected to work. I like Coolify a lot; I've been chatting with the Coolify dev a bit more now too. But I want to be very clear: Coolify helps streamline a lot of these things, and it's one of the few ways to replicate these workflows on a server, but nobody has this without serverless. You can't just have every single instance ever deployed of your application with a one-click swap-over; that doesn't happen otherwise. I should probably clarify what Coolify is for those who don't know. Coolify is self-hosting with superpowers. Either you pay them to use their cloud, or you self-host, because it's all open source, and you deploy Coolify. Actually, I'll see if I can find my Coolify dashboard. Cool, just for proof that I'm actually using Coolify; I'll hide most of it so you all don't get too mad at me. You can connect the GitHub project, configure things, and get it running as Docker images on an existing server. So Coolify runs on a dedicated VPS, and then you spin up little Docker images for different things. So here I have my Laravel deployment, and I have my Postgres; these are separate Docker images that talk to each other, so I'm able to replicate the stateless workflow here some amount, but these still need to be provisioned. Thankfully, with PHP you can kind of just hot-swap the code, it doesn't give a damn, so you can get a little closer with PHP, but other languages have startup times and expectations; you can't just hot-swap those. And as cool as Coolify is, and trust me, it's pretty cool, it does not come close to the power of one-click redeploys instantaneously redeploying, having every single GitHub PR get a preview environment link that is there forever, so if I go and look at an old PR and I want to see how it works, I can click the link and it works. No one else does these things this way. No other method, I should say. Doing this with
servers isn't really possible unless you're spending a ton of money, and as Yash just said in chat, "I know someone whose company racks up $100,000 a year in bills on their QA and dev environments." I have personally racked up more than that at Amazon and Twitch; it's not even that hard to do. AJ gave me a fantastic transition point here to go from point three to point four. I hope I have properly established here that the benefit of scale to zero isn't that if you have zero users, things are cool; it's that you can have an infinite number of deployments and not think about it, because who cares, it's just data sitting in S3, and it gets loaded when you make the request. When you start building in this way, you just get superpowers that you wouldn't otherwise have, and they're magical. It's similar to my favorite functional-based thing, React: the magic of "oh, I can just reuse that component" is like, oh, that fundamentally changes how I build software. And that's how I felt when I was doing this; I just didn't realize that scale to zero is the reason that these things can work and are so cool. Like, I intuitively knew it, and if you asked me, I'd be able to figure it out, but it was using Turso that made it click for me. The reason Turso can be so generous with things like branching, the number of databases, and all of that, is because the databases are stored in S3, and they just have a server running SQLite, pulling from S3, reading or writing or whatever, and then updating S3 accordingly. It's so cool, and it's basically the exact same model as how Lambda works, and how Vercel works, and how Netlify works, and arguably how Cloudflare works, and it is so hard for me to not go down the Cloudflare V8 tangent. Please, in the comments, tell me you want to hear this, so I can justify filming that video, because I don't think people care, but if you can convince me, I would love to film it; it'd be very fun for me. But now we need to talk about the final point here: serverless costs more money than running a server for the same amount of time. If you were to price out
how expensive it is to run a $5 VPS and look at how much compute you got, it's like, oh, that's five bucks a month and I get all of this compute. Whereas doing things serverless is no longer billed on how many servers you have; it's billed in gigabyte-hours, a measurement of how many gigs of RAM are being used for how long, and explaining gigabyte-hours is really hard. AJ, how do you do it? How do you break down GB-hours in a way for laymen to understand? Because it's a stupid metric. Gabriel: "it's like gigabytes per second, but for hours." Goddamn it, I haven't had a question hurt me this deeply in a long time. This might make me do an additional tangent. Okay, I have one more tangent that has come to mind; this one I might actually do while recording this: tiers of abstraction, bare metal, EC2, ECS, Lambda. That'll be an annoying one. "Hey, just writing a blog"; he also probably is doing his job and having a life. This is actually basically how I was going to describe it: a function allocated 1 GB that runs for 1 second is considered to have consumed
1 GB-second. So if you have a Lambda that has 1 gig of RAM and it runs for 1 second, you just used 1 gigabyte-second. This is annoying, because the amount of CPU and other things that exist in there are additional cost vectors, but the gigabyte-second metric is how most of your serverless billing is being measured. So let's say I've configured my service to use 4 GB of RAM on a Lambda, it gets 100 requests per hour, and each request takes 1 second; then every hour I'd be using 400 gigabyte-seconds. Take the number of seconds per request, times the number of requests, times the number of gigabytes, and when you combine all those numbers together, that's your usage. So let's bump this: let's do a thousand requests per hour and maintain the 1-second request time. This would equal out to 4,000 gigabyte-seconds per hour, so every hour we're using 4,000 gigabyte-seconds (GB times s, to be clear, thank you for correcting that, chat), and that would be 2,880,000 GB-s per month. So let's go look at the pricing here. We're going to literally skip everything else; we'll look at the 4
Gig why is it showing prices per 1 millisecond that is so annoying so here's the 4 gigs of RAM we'll times this by a th so that we have actual seconds here's the gab per second cost so if we go back here 4 GB servers GBS cost equals that tiny number so if we grab the 2, 880,000 and we multiply it by this number that would cost you $192 a month so the total cost here 192 bucks and when you look at that you think about it ow especially when you compare to an equivalent VPS
EC2 pricing if we look at an EC2 instance I wish they would stop showing the hourly rate we'll grab a t4g.medium times that by 24 times 30 that can't be right that's hourly right so for an equivalent 4 GB RAM server running for that whole month it'd be 24 bucks a month so if you had just run a server that whole time instead it would have been hilariously cheaper it's not the $5 a month VPS clearly there's been some inflation but for about 25 bucks a month you could just run this server and not pay
that absurd amount of money but there are a lot of catches here that I want to bring in the first one is this thousand requests per hour there are very very few services that have a flat level of requests like this and the ones that do are like cron jobs or ingest things pulling off of a queue stuff like that in reality most services have traffic that looks a lot like this where the amount of traffic goes up and down constantly and depending on how heavy those spikes are for you the cost of spinning up enough servers to run
during those peaks is quite possibly very high so the question that matters isn't given an exact fixed rate of traffic what's cheaper servers or serverless the question is how big is the gap between the lows and the highs in your traffic because that gap sucks to deal with the space between these points is the magic of serverless we don't have to care how high or how low our traffic is at any given time our cost will scale with it and if we have priced out our service so that the amount of money a
given user's request costs us is a reasonable amount then we can just kind of turn our brains off because here's the painful reality when you're not building this way let's say this is the 4 GB server that costs 25 bucks a month if you are here and that is your peak you're paying the 24 bucks a month but the moment you go past it you just doubled your costs and the gap from here to here having a 2x in your cost sucks and if you're ever going to worry about these things and your auto scaling
is too slow and you end up losing traffic as a result of a user having a page time out because the server didn't respond fast enough or spin up a new server fast enough those costs can get bad both the loss-of-the-customer cost and the over-provisioning cost so we go back to this breakdown and we look at a thousand requests per hour assuming each one takes one second oh turns out I did some of my math really badly here don't know why nobody corrected me it's 720,000 seconds of compute times the 4 GB per-second rate so if you're
leaving a rude comment right now because of how big that cost was I hope you double-checked my math before doing that because otherwise you would look really foolish cool all of a sudden this starts to look more reasonable because the actual cost here would be 48 bucks remember that VPS we just looked at I lost track of it but you get the idea it was about 24 bucks a month for the 4 GB server so handling all this traffic through Lambdas would be twice the price is that horrible and terrifying? not really
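here's that corrected comparison as a quick sketch, with both unit prices being assumptions based on AWS's published us-east-1 on-demand rates at the time of writing (about $0.0000166667 per GB-second for Lambda and about $0.0336 an hour for a t4g.medium), so check current pricing before trusting it:

```ts
// Assumed us-east-1 on-demand rates; check current AWS pricing before relying on these.
const LAMBDA_USD_PER_GB_SECOND = 0.0000166667;
const T4G_MEDIUM_USD_PER_HOUR = 0.0336;

// 720,000 one-second requests a month at 4 GB = 2,880,000 GB-seconds.
const gbSeconds = 720_000 * 1 * 4;

const lambdaMonthly = gbSeconds * LAMBDA_USD_PER_GB_SECOND; // about $48
const ec2Monthly = T4G_MEDIUM_USD_PER_HOUR * 24 * 30;       // about $24

console.log(lambdaMonthly.toFixed(2), ec2Monthly.toFixed(2)); // "48.00" "24.19"
```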
but again go back to my chart the moment you cross this threshold either a user has to wait for a new server to spin up or you decided to over-provision so you're just paying for two servers instead anyways but now let's play with the way these numbers are distributed because I set a thousand requests per hour that is 720,000 requests over the month we could say those are happening at an even rate throughout the day but let's change it a little let's say for our theoretical alternative we only get 500 requests per day we'll
do the math so 29 days times 500 requests that's 14,500 requests for 29 days of the month but then one day or I'll say one hour sees a huge spike and we'll say that spike is the gap there so that's 720,000 minus 14,500 actually I'll even go lower we'll say 600,000 requests so one hour on one day gets this massive spike of 600,000 requests when we're used to getting 500 a day oh wait was my earlier math right because I did the four gig look at the math I just did
oh yeah I did forget the times four I should plan these things out more before doing them live so I once again recant because it was 4 gigs that'll be the number but I am going to change something for the sake of being realistic we're gonna make it one gig of RAM now because realistically speaking a server handling all of your requests with 4 gigs of RAM is comparable to a function handling individual requests with one gig of RAM each it's probably over-provisioned to even have one gig of RAM in those so I am switching that
back we're assuming one gig of RAM here so it is a slightly less powerful setup but if each user's request used four gigs of RAM you couldn't handle concurrent requests anyways so cool my math was right but yeah anyways so if we have our 1 gig of RAM 500 requests per day usually but then this one hour where it spikes to 600,000 good luck handling that yeah
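for fun, here's what that spiky month costs on the GB-second model, using the same assumed Lambda rate as before; the takeaway is that the viral hour itself is only about ten bucks of compute:

```ts
const LAMBDA_USD_PER_GB_SECOND = 0.0000166667; // assumed us-east-1 rate

// A month of 500 requests/day at 1 GB and 1 s each,
// plus one viral hour of 600,000 requests.
const baselineGbSeconds = 500 * 30 * 1 * 1; //  15,000 GB-seconds
const spikeGbSeconds = 600_000 * 1 * 1;     // 600,000 GB-seconds

const monthlyCost = (baselineGbSeconds + spikeGbSeconds) * LAMBDA_USD_PER_GB_SECOND;
console.log(monthlyCost.toFixed(2)); // "10.25": the spike barely shows up on the bill
```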
and here's where things start to get fun if you are building this with servers and you need to be prepared to handle that massive of a spike your options are quite limited if your traffic usually lives down here and most days it hangs out in this range but then for just a brief moment it does that do you know how miserable it is to provision for this because even if your load balancer code is incredible if the spike is aggressive enough you can't load balance it away so your option is now to provision based on this potential spike so you have your servers provisioned way more heavily because if you just went viral and your servers were not set up to handle that a significant percentage
of your initial users are going to hit a dead page because it cannot resolve and in order to make that resolve in order to handle that spike you have to account for it by over-speccing your server by like 600x and that's the issue how your traffic is distributed fundamentally changes the way these costs work for you of course a load balancer can handle that how do you think AWS does it do you think a load balancer makes new servers when Amazon runs out of servers Amazon solves this
problem by being the most over-provisioned company of all time Amazon's servers are already there they already bought them they're sitting there so yes if you want to eat a bunch of cost like Amazon does you can massively over-provision too but one of the cool things about Amazon is that they have so many people buying so many servers that it effectively balances out because if my service is having a massive spike there's a bunch of other services that aren't and as such they can justify buying a much larger number of servers because they can handle everyone's
spikes for them it's priced into their model but if you're not Amazon you should not be running a massively over-provisioned set of servers for those single spikes it just doesn't make sense chat says worrying about a theoretical random viral event seems insane just look at levels' traffic after massive advertising spikes uh everything I've ever worked on has random 5x-plus spikes literally everything I've ever worked on in my life be it upload thing having someone randomly decide they want to move all their files over and moving a million files in an hour be it twitch chat where Drake
shows up in Ninja's stream and we go from I don't know 500,000 messages a second to like 15 million a second that's just real production workloads I would go as far as to say if your stance is well not everyone goes viral you don't work on real production workloads this is expected okay chat's right that was viewers not messages but one viewer can send a shitload of messages yes levels has a person whose full-time job is managing the load balancing for his servers and as people are pointing out he also over-provisions massively if your traffic can have multiplicative changes
where it can be five times higher for a period of time and the speed at which it goes from the low traffic to the high traffic is fast and these are the things I was trying to figure out how to word it's starting to come to me there are certain characteristics that make serverless the cheaper and better option the big two are the gap between the highs and lows of your traffic and the velocity of change between high and low traffic because if we have this traffic spike here and instead of it going straight up it does that
where it gradually increases load balancers can handle that fine chat says generally high variance I like that phrasing if the variance of your traffic going up and down is high and the speed at which it goes up and down is high too serverless makes a lot of sense and there's a reason Amazon uses Lambda and serverless as heavily as they do it's the right primitive when your traffic can spike in these random ways it is so powerful
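if you wanted to eyeball those two characteristics from your own traffic data, a rough sketch could look like this; the thresholds here are made up for illustration, not any real capacity-planning rule:

```ts
// Toy heuristic: from requests-per-second samples, measure
// (1) the gap between highs and lows and (2) how fast traffic ramps.
function trafficShape(samples: number[], secondsBetweenSamples: number) {
  const peak = Math.max(...samples);
  const trough = Math.max(1, Math.min(...samples)); // avoid dividing by zero
  let maxRampPerSecond = 0;
  for (let i = 1; i < samples.length; i++) {
    const ramp = (samples[i] - samples[i - 1]) / secondsBetweenSamples;
    maxRampPerSecond = Math.max(maxRampPerSecond, ramp);
  }
  return { peakToTrough: peak / trough, maxRampPerSecond };
}

// Mostly-idle traffic with one fast spike, sampled once a minute.
const shape = trafficShape([2, 3, 2, 170, 160, 3], 60);
// High variance plus fast ramps is the shape where serverless wins.
console.log(shape.peakToTrough > 5 && shape.maxRampPerSecond > 1 ? "serverless" : "servers");
```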
and a thing I think people fail to understand when they say how do you think Amazon does it is that Amazon's services are largely built based on what they needed internally as I walked through at the beginning of this very long video they found themselves on serverless via a pretty logical path we went from here the old way to separating out our database so that we could scale our application code independently to realizing we could move the application code out of the application boxes but to go back here it's hard to make theoreticals for how many resources 600,000 requests in an hour would take so let's use some real numbers Basecamp did
5,250 requests per second at 500 cores at max load that's three boxes each running a Zen 5 192-core AMD chip for 5,250 requests per second okay so let's figure out what that would cost I'll even Hetzner it not that many real production environments are using Hetzner because you never know when they're going to randomly shut your servers down is there like a Hetzner pricing page cool we'll do AMD you can only do the 48-core models for this 48 cores 96 threads he said he needs how many cores 500 cores at max load so I'll even
round down for him and say you need 10 of these times 12 months that's 28 grand a year not looking good so far but wait that's €2,360 a month let's translate that to USD it's 262 bucks for one so $2,620 a month for enough boxes for his traffic 5,250 requests per second cool so if we assume this is all evenly distributed dhh's number was 5,250 requests per second and he said 500 cores cool and that's for handling the peak traffic so if it's an eight-hour workday peak traffic is probably somewhere in the middle of it I think it's fair to say that most seconds
in the day are probably closer to a tenth of that probably even lower so let's work with that assumption I'll average it out and say it's as low as a tenth and as high as 5,250 and we'll average this to 2,625 which is generous to say that's the number of requests per second on average I'm sure it's way lower and on the weekends it's almost nothing so let's multiply that out for minutes hours and days and we'll say that's the number of requests per month cool now let's blindly assume all of these
numbers on the serverless side which is obviously unfair too there are a lot of other things we might or might not have here I will say that one gig of RAM is almost certainly not necessary if all this is doing is grabbing values out of a database and rendering HTML so we'll call it half a gig I'm going to grab the request count here also 1 second per request is insane so we'll say it's 0.5 seconds per request 283.5 million requests a month and I want to be clear once again this is napkin math we end up with 70 million GB-seconds let's do
the math quick now that we have all the numbers I'll round up to 71 million and look at that with relatively generous assumptions on both sides that I would consider quite fair it would be half as expensive to run the same service on serverless because remember those servers still cost money on the weekends when no one is using them he could scale down but he can't really do that because he's buying his own hardware it is less than half the price and I am sure you can poke a lot of holes in this and I'm sure
people are going to I already know it's going to be screenshotted and put on Twitter and people are going to say I'm really stupid to be clear if you are getting sustained traffic at this level or if on an average day you're only going as low as like half that level of traffic absolutely go spin up a server for it but if your peak traffic is that much higher than your weekend traffic using a system like this is hilariously cheap and if you don't want to have to spend all of the time figuring out
the provisioning the cost-performance balancing all of that stuff which by the way if this way of building was easier for them than this way they should go with that because a $1,000 a month gap is $12,000 a year if I could save my engineers a meaningful amount of effort and make them happier by paying 12 grand a year that's a no-brainer of a companywide expense both of these numbers are really small relative to the engineers that you have at your company that's just reality and I would hope most people are starting
to see the benefits of building this way it makes your applications more resilient it makes it easier to do deployments and rollbacks and preview environments and all of these types of things it'd be really hard to convince me the developer experience of doing things the serverless way is worse than doing it the dhh way and the vast majority of real application workloads that face external sources be it users be it external APIs whatever else have enough traffic fluctuation that you have to either set up really complex load balancing or massively over-provision
this is what the over-provisioning looks like and again relative to the number of devs it's still quite cheap and when I was at twitch and we were doing complex video ingest stuff this is the route we went don't do video ingest on serverless it makes literally no sense to do that but we were spinning up servers that cost a lot more than $2,600 a month and weren't even thinking twice about it because it was still way cheaper than the engineers working on the thing and the time it was saving them I had some bad math
here that I want to correct for this video don't know how we will deal with that in post so we agreed the 283,500,000 requests per month number was reasonable and this is the mistake I made I did 60 × 60 × 30 when it should have been 60 × 24 × 30 so the actual number of requests isn't even half that so let's redo the math and like I was already being generous to the dhh side and I had made a huge error against my own case 28.35 million GB-seconds is 472 bucks but serverless is so expensive huh
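here's the corrected napkin math in one place; every input is an assumption from this video (the fixed request count, half a second and half a gig per request, the assumed Lambda rate, and roughly $262 a month per Hetzner box), not anything dhh published:

```ts
const LAMBDA_USD_PER_GB_SECOND = 0.0000166667; // assumed us-east-1 rate

// Corrected napkin-math inputs (all assumptions from this video).
const requestsPerMonth = 2_625 * 60 * 24 * 30; // 113,400,000 after the fix
const secondsPerRequest = 0.5;
const gbPerRequest = 0.5;

const gbSeconds = requestsPerMonth * secondsPerRequest * gbPerRequest; // 28,350,000
const serverlessMonthly = gbSeconds * LAMBDA_USD_PER_GB_SECOND;        // about $472
const hetznerMonthly = 262 * 10;                                       // $2,620 for 10 boxes

console.log(Math.round(serverlessMonthly), hetznerMonthly); // 473 2620
```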
God no I'm just annoyed at this point so to close this very long one out I'm going to reference a video of mine and it's pretty easy to find if you go to my channel hit videos then popular it's this guy the real cost of AWS and how to avoid it it's about to hit half a million plays which will make it my most popular video ever the real cost of AWS isn't all of the things we've been talking about today it's the amount of time you're spending building everything yourself and if you can use
a service like Lambda or Vercel or Netlify or Cloudflare or upload thing or PlanetScale or Turso or any of these things and it saves a meaningful amount of time for your developers it is probably cheaper than whatever it costs so as fun as it is for all of us to sit here and nerd out about these gaps they don't compare to payroll and if this solution here requires you to have two more engineers it's not worth it if this solution requires you to have one more engineer it's also not worth it so be reasonable about
how you make these decisions and stop thinking so much about these numbers because the gap's not that big serverless is not only not too expensive it's often cheaper than the alternative for any realistic traffic and scale to zero results in a much better development environment and experience and lets you do things you can't do with servers and yeah that's it I should probably do a real outro for a video this long but I don't care let me know why I'm wrong in the comments until next time peace nerds