AI models are in a race to the bottom. They're working as hard as they can to make them both as chea...
Video Transcript:
token batches. This is what's happened in the two years since GPT-3 came out: costs have plummeted from $60 per million tokens to cents per million tokens. Things are getting wild, and I want to talk about how we got here, what this means for the industry, and, long term, how AI is going to change, I guess, everything going forward. Since these videos aren't AI generated, I do have to pay my team, so a quick word from today's sponsor, then we'll get right back to it.

I'm really excited about today's sponsor because it's a company I've loved for a very long time, and they're giving me an excuse to talk about one of my favorite things: Elixir. If you're trying to ship quality fast and at scale, DockYard has you covered. These guys know how to scale better than almost anyone I've talked to. They've been contributing heavily to open source ecosystem stuff, especially the Elixir world, which, as you might know, I love. There's a reason that tech scales so well and why everyone from Discord to WhatsApp leans heavily on those tools. They know scale incredibly well, and I'm so excited to see them diving more into this AI world too. These guys aren't your usual AI devs; I know that term has some specific connotations to it. I know them as open source champions doing incredible work in the ecosystems I've spent a lot of time in. They're good friends, and I can't imagine being in bad hands with them. Not that you have to trust me, by the way: there are a lot of companies here you've probably heard about before, like maybe Nasdaq or Dolby. If you want to work with the same engineers that those companies reached out to for help, they're pretty hard to beat. They know everything you need to know for good full-stack development, be it all the crazy server-side stuff (I mean, they're the Elixir guys, of course they know that) or modern tech stacks. I know they know React surprisingly well, but they're also deep on Swift. Crazy enough, they built their own alternative to React Native in order to render Elixir
apps natively on iOS. Whether you're trying to spin up from scratch or you're trying to make sure your existing tech scales, these guys have you covered. If you want to learn more, you should definitely book a call. It's free and they'll give you a ton of useful info. Make sure you tell them that Theo sent you. Thank you to DockYard for sponsoring today's video. Check them out today; it's wv. l/ doyard

So what happened here? Especially the 3 to 3.5 move, where it went from $60 per million tokens to $20 for better inference, to way, way cheaper soon after that. A lot of things happened. The biggest one, without question, though, is competition. There are two core axes being fought on in these wars: there is the quality of outputs, and there is price. There's a bunch of other random things they like to talk about that I care a lot about too, like speed, API capabilities, I'll say UI/UX, and product features. All of these are cool. All of these things demo really well on Twitter and are super fun. None of that is the model war, though, because these are entirely different things. Most of what goes on here, I can do. I can make real change here even though I'm not making models. I'm not training on a bunch of GPUs, but I'm still able to impact real things here. I can make cool stuff in this space. But there's a limited number of people with very expensive, specific resources who are able to compete in this section here, and this has had interesting impacts on the industry. In particular, it seemed for a while that the only way to make money in AI was to live here, because if you weren't competing in this area but you built a really good UI or a really good product that used models from someone else, OpenAI could just show up and do that. That was the assumption. Turns out that assumption's kind of bad, because now we're doing the opposite, where people like me are coming in making a clone of OpenAI's product that's better, still providing OpenAI models, and the result is really good.
Someone else pointed out another really good thing that they're kind of competing on: context window. This is one of those ones where we can't do much from the outside; same with speed. So if we break these down, I'd say for the first category only model makers can really change that. For context window, it's mostly model makers, but we can hack it. And then the bottom section here, anyone can kind of do this. If we think of the model wars in these three categories, it'll make it a lot easier to process what's going on right now, because this war has been going brutally for a while and the results are showing in pretty much every chart.

Let's chart these two. We'll start with quality, so we have time on one side and quality on the other. This has been an interesting race. I'll use blue to represent OpenAI; I know it's their favorite color. For a certain amount of time (I'm just going to draw an arbitrary cutoff here) we'll say pre and post GPT-3. Kind of funny that we have before GPT-3 and after GPT-3. I can label them B-GPT-3 and A... I'm not going to make the joke. Anyways, up until then there were cool things going on, like autocorrect and automatic translations, and we were making real improvements slowly but surely. The quality of what a computer could generate was going up, but GPT-3 represented a monumental leap in the ability of a computer to generate text given arbitrary input. This was the move from autocomplete to actually useful. And when this happened, it wasn't just the model they dropped. They also put out a bunch of research papers detailing how LLMs worked and why this model was so valuable, and as a result we quickly saw others starting to build their own stuff and try to catch up. We got a pretty monumental leap from the competition relatively early. We saw crazy things like Mistral showing up, the Llama work starting, and a bunch of random other open models that weren't particularly great, but a lot of work started happening really quickly with the launch of GPT-3. But
as soon as the quality started getting even close, 3.5 dropped, and with 3.5 we saw yet another big quality jump. The hype was going, people were diving in on it, and OpenAI was working hard to make things more efficient behind the scenes too, so they could lower the price. They also saw that the gap between them and the competition was closing. Especially near the end of GPT-3, it seemed like those other models were starting to really catch up and this space was starting to close. But every time they drop something new, the space widens again, we all realize that OpenAI is pretty far ahead, and then things start catching up. Two things have happened, though: the amount of time to catch up has been going down, and the size of the wins OpenAI is seeing has gone down too. 3.5 to 4 was nowhere near as big as I and many others were expecting, and 4 to 4o, I would argue, was even smaller. It was cool for the price side, which we'll get to in a second, but it was not a particularly monumental win in quality. As such, this line has been catching up more and more, and the amount of work it has to do to catch up is going down each time. It's getting closer sooner. Then we got o1 and o3, which helped bump the quality again. I really want to think of quality in this way, where the quality bar is kind of set by OpenAI and the rest of the industry is competing to get as close as possible to where OpenAI is at any given time. We have certainly surpassed 4o, but I haven't really seen much of anything that's truly surpassed o1, and certainly not o3 mini on high. The quality you can get out of those models is nuts. Claude is better at some things, but it's not a meaningfully better model overall, especially when you consider the price for it.

This is just the quality side, though, so let's take this and do price instead, because price is a very, very different chart. 3 to 3.5 was a huge drop in price. I'll even extend this a bit to be more realistic about how absurd it was, because 3 to 3.5 was a crazy drop in price. The important thing to know is that as alternatives started coming out, none of them started that high, and the alternatives have been fighting this crazy race to the bottom, with ups and downs of their own, to be fair. But the alternatives were always priced quite a bit lower, and if I was to be more realistic, it probably looks like that. With each of the new ships that OpenAI does, they do their best... actually, I don't think 4 was a big lowering in price initially. I can check that other chart that we had. 3.5 Turbo was a huge drop, actually. So Turbo wasn't a big quality win at all, but it was a huge drop in price. I think 4 was basically the same price at that point. It's almost like keeping 3 in here makes the chart unreadable. OpenAI has a weird ability to make charts unreadable.

I've spent a lot of time on Artificial Analysis recently, which is a place to take a look at different models, how they compare, and what their performance looks like. Here's a bunch of the more popular, well-established, trusted models right now. The thing I was saying about OpenAI is that on this chart, intelligence versus price, o1 is so hilariously expensive that it throws the chart off. That said, o1's price is half of what GPT-3's price was at launch. If I turn off o1 here, the chart suddenly becomes useful again. I just thought it was really funny that o1 is so skewed in price that it ruins charts by being on there. It gives you a whole different story when you add and remove it. This also shows why I'm so frustrated with Claude, because the price relative to what it can do is not there anymore. So if we put o1 in here, it's going to be here, which is not necessarily useful for the chart I'm trying to draw of the race to the bottom. But if we compare o1 to, I don't know, o3: hilariously cheaper, similar quality. To go back here, o3
mini actually fits in that quadrant of well priced. o1 is literally, what, 10x more expensive? Let me go to my chart here. Yeah: o3 mini is $1.10 for input, $4.40 for output. o1 is literally more than 10 times as expensive. Insanity. But there's a reason that the first thing OpenAI put out after o1 was o3, and specifically o3 mini. When o1 came out, they put all of the models with it out at once. When o3 came out, it didn't come out. o3 is still not out; only o3 mini is. I think that happened because of another scary thing over here. o1 was the first major quality leap we'd seen in a while, and they did that because it was the first major reasoning model. This seemed like it would be hard for the industry to catch up with, and it was, until something interesting happened: R1 got so close to o1 that it definitely set some fear in over in the OpenAI world. But what was much crazier is that despite o1 going up in price (we're going to zoom in to see where we're putting this line), DeepSeek stayed in that cheap range. So o1, to get that quality, had to 10x prices. DeepSeek got to stay in the cheap range and also be an open model, which, as many of us saw and probably even felt, caused the stock market some pain as well. Seems like the way they made R1 so cheap was by taking it out of the stock for all the companies we're invested in.

Real talk, though, there was a strategic decision made here by OpenAI. You might notice, if you look at the o3 and the R1 numbers closely, they literally, strategically picked exactly double R1's costs with o3 mini. You can't tell me that o3 mini wasn't clearly a move by OpenAI to make sure DeepSeek wasn't going to destroy them. It's very, very obvious that is what was going on here, because this number doesn't make any sense relative to the other numbers they charge. It's not a fraction of one of them, it's not close to any of them. $4.40 is a really weird amount to charge, unless your competition is $2.20, so you're doing exactly double, promising more quality, and way easier to host, because R1 still sucks to actually run.

It is clear now that, despite how in the quality world OpenAI leads the pack and everyone else fights to catch up, price is the opposite: everyone else is leading and OpenAI is forced to catch up when crazy things happen, like what we saw with R1. R1 was so brutal to what OpenAI was trying to do that they were forced to release o3, specifically that mini model. It's unprecedented for them to do a mini model first, to lower the price as much as they did, and to foot-gun themselves like that, because o3 mini is pretty close to o1 in quality. The only reason they'd do that is because they were scared of how this red line looked. And if you were to draw this as a gap, where we took the price and put it at the bottom here (I'll make it green for price) relative to the time of all these things releasing, the gap between industry pricing and OpenAI pricing for the best models in quality was insane. And if we compare it to where DeepSeek was, yeah, this is what happened. This space here, the quality-to-price ratio, has gone insane over the last few months in particular. It was going down, but the race to the bottom there has been nuts to watch, and to see something like Gemini come out at the lowest prices we've seen from any major model, with quality comparable to that of 4o, is just insane.

The reason I took the AI pill and started building T3 Chat was that I was so impressed with DeepSeek V3. I saw for the first time, like, oh, that's literally 20 times cheaper than Claude at a similar quality level. I want to build things with this. Wow, the DeepSeek site kind of sucks; I want to build my own better one. And that spiraled into a relatively successful AI app. This race has been nuts to watch, and it made me go from sitting on the sidelines to participating, and that's what's been fun.
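
To make the pricing comparison above concrete, here is a minimal Python sketch of the arithmetic. It uses only the per-million-token figures quoted in the video ($60 for GPT-3 at launch, $1.10 in and $4.40 out for o3 mini, $2.20 for R1). Treating the $2.20 R1 figure as an output price is an assumption, and the helper names `token_cost` and `price_ratio` and the 250,000-token workload are hypothetical, chosen only for illustration.

```python
# Prices in USD per one million tokens, as quoted in the video.
GPT3_LAUNCH_PER_M = 60.00     # GPT-3 at launch
O3_MINI_INPUT_PER_M = 1.10    # o3 mini, input tokens
O3_MINI_OUTPUT_PER_M = 4.40   # o3 mini, output tokens
R1_OUTPUT_PER_M = 2.20        # DeepSeek R1 (assumed here to be the output price)


def token_cost(price_per_million: float, tokens: int) -> float:
    """Return the USD cost of `tokens` tokens at a per-million-token price."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return price_per_million * tokens / 1_000_000


def price_ratio(expensive: float, cheap: float) -> float:
    """Return how many times larger one per-million price is than another."""
    if cheap <= 0:
        raise ValueError("cheap price must be positive")
    return expensive / cheap


if __name__ == "__main__":
    tokens = 250_000  # hypothetical workload, not a figure from the video
    for name, price in [
        ("GPT-3 at launch", GPT3_LAUNCH_PER_M),
        ("o3 mini output", O3_MINI_OUTPUT_PER_M),
        ("R1 output", R1_OUTPUT_PER_M),
    ]:
        print(f"{name}: ${token_cost(price, tokens):.2f} for {tokens:,} tokens")

    ratio = price_ratio(O3_MINI_OUTPUT_PER_M, R1_OUTPUT_PER_M)
    print(f"o3 mini output vs R1 output: {ratio:.1f}x")
```

Run as a script, this prints the cost of the hypothetical workload at each quoted price and the o3 mini to R1 ratio, which works out to 2 with these figures.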
This has had an interesting side effect, though, which is that the moat these companies had is starting to erode. I have another video I'm planning on doing soon (I might even record it today, funny enough) all about how the wrappers are the winners here, because none of the battle we're talking about here helps OpenAI. This hurts them. Their margins are rapidly going down and their lead in quality across the industry is rapidly closing. Their advantage in quality is dying fast and their advantage in price is non-existent. They're fighting their hardest to catch up in price while at the same time releasing a $200 a month subscription, and losing money on that, by the way. It's not like they're arbitrarily pricing these things really high; it's that they've been so focused on raising quality that price wasn't as much of a concern. They dropped it where they could because the industry trend was scary, but o1 shows that they weren't that concerned. They felt like their quality lead was great enough that they could price it however they wanted, and honestly, they kind of could. But that's dead now. They can't keep fighting the battle from that position. The quality gap has now closed enough that they can't do a 10x multiplier on price, certainly not the 100x-plus they're doing compared to some of these other options. That position isn't tenable anymore. And if that was just their margin, that's one thing, if they could just erase the margin. "Insane thing: we are currently losing money on OpenAI Pro subscriptions." That's the $200 a month tier that introduces o1 Pro. That model is so expensive for them to run that just the GPU and inference cost alone is greater than the money they're making from the subscriptions, which is insane. Their models are actually that expensive to run, so their margin here isn't crazy; their costs are. And now that we have these open models that are close in quality, now that we have companies like Google competing on quality and demolishing on price, things are changing fast. And if
it was really hard to switch between models, they'd have a moat. Let's compare this to something like AWS, the web services that most of us use for the web. If AWS came out with a crazy price that was super high and suddenly the industry spun up a bunch of alternatives that were way more competitive in price, it would still be hard to move off of AWS because you've built your infra on it. You can't just click a button or change one line of code and move to a different provider. You can literally do that with these AI apps. Here is the actual code for T3 Chat for rendering with the right model and picking the model when you're actually generating the text. If I look for where this is being consumed, in streamText, this is where I would have to change the code. I can change which model I'm passing here. I can change a lot of other things. I could even just put OpenAI 4o mini, and if I had that imported, this would just work. Cool, now that's running 4o mini. Let's change it again. Let's change the default model. I don't know, what do we want to use today? Let's use Google's. Sure, if we want Flash 2. Cool, now the model's Flash 2. It's actually that easy. When I changed the title generation over in T3 Chat from 4o mini to use Gemini, it was a one-line code change. There is no moat in the model provider world, and that's awesome. This is going to be a crazy, crazy competitive industry. It's less like car pricing, more like food pricing. If a new grocery store opened up right next to your grocery store and everything was one-tenth as expensive at pretty much the same quality, you'd stop going to the old grocery store. Since these things are so expendable, so consumable, so easy to switch between, OpenAI has to fight hard in order to stay on top, and it shows. The industry has driven the price into the ground. This is an article from November of last year; it's gotten crazier since, to be fair. You won't even be able to see it, because the $60
per million tokens number at the start here is so insane that the scale makes it hard to appreciate that we're seeing 50% wins consistently. The price of this has gone down by more than 50% year over year for the last three years now. It's kind of nuts. Sorry, that's not even fair: it's closer to a 10x decrease in price every year right now. Insane. And that lines up too. If we go back here and look at 4o mini: groundbreaking in price. But next to 4o, 4o was still cheap at the time, $2.50 in and $10 out. I prefer Flash to 4o, and it is a 20th the price. That's insane. Basically, what I'm saying is that if Anthropic doesn't get a new model out way cheaper soon, they're done. They're running on fumes right now.

OpenAI has been cornered to the point where they can't just keep dropping new expensive models. The expensive stuff is no longer going to be models; it's going to be products. So instead of just dropping o3 at an even higher price, like they could have, they put out o3 mini for cheap and they make a new product that's an entirely different world, like Deep Research. OpenAI realized this space sucks to compete in. If they're looking at the three wars to fight in the space, the war they started with sucks and they don't like it anymore. In this war, yeah, they made it a little bit faster, but they don't really seem to care about competing here either. This is where they want to win. OpenAI seems a lot more focused on product right now than models. The model wars are making them look bad enough that they're talking about this more again, but the pieces they're releasing are here: things like Deep Research, things like the scheduling product, things like Operator, which can control your browser. They want to compete there now, because this has been so heavily commoditized that they can't make as much money here anymore, and they know the writing's on the wall. This war sucked, and this war is also more fun to compete in. It's less expensive, more flexible, and you have engineers working on things that are
much more fun. You don't need a bunch of scientists inventing new mathematical processes to try and make GPUs 20% more efficient 5% of the time. So I'm curious to see what this ends up meaning for the industry. My hypothesis is that companies like Claude and OpenAI are going to stop competing with each other and are going to start competing with people like me, people like Perplexity, people who are building the products that take advantage of these models, because the model wars aren't a place where you can make as much money anymore. And with this race to the bottom, even if you make a groundbreaking model that's 10 times cheaper and 99% as good, someone else will make something 11 times cheaper and 99.