DeepSeek R1 - Everything you need to know

Greg Isenberg
Ray Fernando, a former Apple engineer, gives an in-depth tutorial on DeepSeek AI and local model imp...
Video Transcript:
Ray Fernando is on the pod. He's a 12-year ex-Apple engineer, he streams AI coding, and he's building an AI startup in real time. I needed to have you on. What are we going to talk about today, Ray?

Today we're going to talk about prompting, and specifically prompting with the new reasoning models, with DeepSeek R1. There are a lot of caveats with these models, because they're now able to think and reason, and that can even lead to superhuman capabilities. These models have become so advanced, and this specific one from DeepSeek is out of China. They've made it open source, so it's available for us to study, and it's apparently on par with ChatGPT's o1 reasoning model. The reason it's taken the world by storm is that it's also free to use on their website, deepseek.com.

When we say free, there's a bit of a caveat if you don't know the details, so today I also want to cover a little of the architecture and explain what you're getting into if you use something like DeepSeek. Then I'll show how you can run this in a container hosted in North America or some other region, because your data is really important, especially if you're doing anything for business. And the third, kind of secret bonus is how to run this locally on your machine, so you get the capability of these models privately for your own business, whether you're a lawyer, a doctor, or whatever; there are a lot of implications you'll probably want to look into. I think this episode is going to be super helpful even if you're just beginning and don't know the advanced stuff, or don't know code. That's okay: it just takes using plain English to describe these things and get the output and intelligence of these models to do some really cool stuff. I'm pretty excited.

All right, let's get into it.

Cool, excellent. To start out, you have a couple of options for using these models. One is going directly to deepseek.com, which is currently hosted in China. A little background: my computer is here, in North America, for example. If I go to deepseek.com or download the app from the App Store, the app will actually be talking to a region over in China, and whenever you send your data to another country, they have their own rules, laws, and regulations. So I would be very careful about anything sensitive you put into this system, because it won't reside in a region you live in or have control over. There are alternatives, which we're going to cover: using a web UI and going to API providers like Fireworks or Groq, and also running the model locally on your machine so nothing goes out to any of these providers; you can even run it while flying on a plane, which is really exciting. As an example for DeepSeek, we're just going to use something that's already public information, so I don't really mind having it sent out.
So, as far as prompting, one thing I frequently do is transcribe my live streams. I made a little app that transcribes videos: I take my live stream, run it through the transcriber, and it generates transcripts from the video. It's usually pretty fast; it processes on my device and then sends it up to Groq for the endpoint. When it's done, it looks something like this, and you can copy the transcript and put it into something like DeepSeek if you want.

To use the model on deepseek.com, just click where it says DeepThink; when it turns blue, DeepThink is enabled. You can also enable web search if you want, and we can probably do that for the next prompt. I paste in my transcript, hold shift and hit enter a couple of times, and then give it additional instructions for what we want it to do. One thing I like to do, and I've built a little prompt that I'll share with y'all, is run some analysis and generate a blog post off a transcript. That prompt lives in my little Notion doc. We're going to cover how to do some of these prompts, and I'll show you how to generate these advanced chaining prompts, because they really take advantage of these models' ability to think through all of that text and do work on our behalf. This is really cool; it's basically like hiring an admin to go through all of your stuff and make things for you. So we'll go ahead and hit submit.
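Ray's actual Notion prompt isn't shown on screen, but the chained-analysis idea he describes can be sketched as a single instruction block that a reasoning model works through in order. Everything below (the step wording, the function name) is illustrative, not his real prompt:

```python
# Illustrative sketch of a transcript-to-blog "chained" prompt. The step
# list is an assumption about the kind of chain described, not Ray's prompt.

def build_blog_prompt(transcript: str) -> str:
    """Chain several analysis steps into one instruction block so a
    reasoning model works through them in order."""
    steps = [
        "1. Read the transcript and list the main topics discussed.",
        "2. Pull out key quotes, numbers, and claims worth citing.",
        "3. Draft a blog post with a headline, intro, sections, and key takeaways.",
        "4. Finish with SEO suggestions (title tag, meta description, keywords).",
    ]
    return (
        "You are an editorial assistant. Work through these steps in order:\n"
        + "\n".join(steps)
        + "\n\nTranscript:\n" + transcript
    )

prompt = build_blog_prompt("...live stream transcript goes here...")
```

Because reasoning models check instructions off one by one, numbering the steps tends to work better than one long run-on request.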
I wouldn't put a tax return on deepseek.com; it's not the type of thing I would put on there. So you do want to be a bit wary of what you're entering when you're on deepseek.com.

I was playing with Perplexity earlier, and Perplexity actually has some of these models built in, but it's hosted in the United States of America, so that's a bit different.

That's correct, and you may want to ask your app providers what they do. One of my favorite apps for coding is Cursor, and I asked them, hey, where do you have your DeepSeek model hosted? They told me they use the Fireworks API, which is not in China, so that's great. And they're using the full model.

Quick break in the pod to tell you a little bit about Startup Empire. Startup Empire is my private membership of people like me and you who want to build out their startup ideas. They're looking for content to help accelerate that, for potential co-founders, and for tutorials from people like me on how to do email marketing, how to build an audience, how to go viral on Twitter, all these different things. That's exactly what Startup Empire is: it's for people who want to start a startup but are looking for ideas, or people who have a startup but aren't seeing the traction they need. You can check out the link to startupempire.co in the description.

So, these models have these parameters you may hear about. A really large parameter model, 600-billion-plus parameters, just means it has more intelligence to leverage; it tends to take longer in its thinking, but the results are really, really great. Some of the models we'll run locally on the machine a little later are distilled: you basically take the essence of the big model, and those distilled models run a lot faster.
They're just as efficient, but they may not think as long or give the same results, and it's really up to you to try them out, which I highly encourage. One of the problems is that sometimes the server is really busy. That can happen because it's so popular right now, and after this video is published it'll probably be even more popular. You can hit this little pencil and hit send again to try to resend it. That's where I thought: if I'm sending this over and there's a bunch of reliability issues, why don't I host my own or just hit the API directly? If Cursor is doing this, why can't I do it as well? So I can show you a technique for hitting the API so you don't send your data to China, and that involves using this thing called Open WebUI. So while this is thinking,
if it even returns results or does anything here, let's pop over to the other side. On the other side I have an instance of what's called Open WebUI, and it looks very similar to ChatGPT. I'll go through a few more setup details later, but let me just show you what it looks like. In here I have the model selected; I can go to DeepSeek. What's great is that you can connect to an API provider, and I'm using Fireworks AI. Fireworks AI is currently hosting the DeepSeek model, and they let you use it just by getting an API key and putting in the exact model string and so forth. From here in Open WebUI I'm able to select it, say okay, this is my DeepSeek account, and paste in the exact prompt we had pasted before, with my transcript and everything. Let me double-check that I got everything. The other one is still timing out: "server is busy, try again later." Yeah, that's not fun. So what I'll do is scroll to the top, hit this little copy button, come over here, and make sure I put everything in there. Yes, the whole transcript is there. So what's going to happen
is this: when I hit send, it goes off to Fireworks AI, and what's great about running this in Open WebUI is that it uses the API and doesn't actually send the data to China. While it's doing its thinking and showing us what's going on, I'll go back and overlay this model of our data in our container to show what this looks like in the background. Here in tldraw we have our Mac and PC. I'm using Open WebUI and the Fireworks API, so I'm going out to the cloud, and this cloud is located in North America: the data resides here in North America and gets delivered back to my device. When we were on the DeepSeek website, we were going out to the China region instead. So that's a heads-up on how that's working in the background.
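The Fireworks round trip described above boils down to one OpenAI-style HTTP call. Here's a minimal sketch, assuming the chat-completions URL and DeepSeek R1 model string that Fireworks documents at the time of writing (verify both against your own account); the request is only built here, not sent, so it runs without a real key:

```python
# Sketch of hitting Fireworks' OpenAI-compatible endpoint directly, so the
# data goes to their hosting rather than to deepseek.com. URL and model
# string are assumptions to verify in your own Fireworks account.
import json
import urllib.request

FIREWORKS_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    payload = {
        "model": "accounts/fireworks/models/deepseek-r1",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        FIREWORKS_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("YOUR_API_KEY", "Summarize this transcript: ...")
# urllib.request.urlopen(req) would actually send it; left out so the
# sketch runs without a key or network access.
```

This is the same shape of call Open WebUI makes on your behalf once you paste the API key and model string into its connection settings.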
Next up: the speed difference with the Groq hosting provider, and a little later we'll get into the details of setting this up. As you can see, it's outputting here; these models are still so new that these web apps are still adjusting to display the reasoning. What I'll do is hit this little pencil at the very top, and let me make this a little bigger so you can see. When you hit the pencil it creates a new chat. At the very top you can select the model dropdown; I type in "deepseek," and the one I have set up from Groq is called the distilled Llama 70B. This model is a smaller distilled version they're hosting, but it's incredibly fast using the Groq API. If we hit send here, it feels nearly instant.
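Switching between the full model on Fireworks and the distill on Groq is mostly a matter of changing the base URL and model string, since both expose OpenAI-compatible endpoints. The values below match what the two providers document at the time of writing, but treat them as assumptions to verify in your own accounts:

```python
# Same OpenAI-style request body works for both providers; only the
# endpoint URL and model identifier change. Both values are assumptions
# to double-check against the providers' current docs.
import json

providers = {
    "fireworks": {
        "url": "https://api.fireworks.ai/inference/v1/chat/completions",
        "model": "accounts/fireworks/models/deepseek-r1",  # full R1
    },
    "groq": {
        "url": "https://api.groq.com/openai/v1/chat/completions",
        "model": "deepseek-r1-distill-llama-70b",  # fast distilled version
    },
}

def payload_for(provider: str, prompt: str) -> tuple:
    """Return (endpoint_url, encoded_json_body) for the chosen provider."""
    cfg = providers[provider]
    body = {"model": cfg["model"],
            "messages": [{"role": "user", "content": prompt}]}
    return cfg["url"], json.dumps(body).encode()

url, body = payload_for("groq", "Turn this transcript into a blog post: ...")
```

That interchangeability is why tools like Open WebUI can target either provider with just a different connection entry.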
Everything starts finishing almost immediately: we see the model go out, do its thinking, and provide the response just like that, super fast. If we take a look, this one thought for a few seconds and actually shows us the reasoning that was going on. It went through my transcript, really trying to understand it. This was an interview with LDJ, who spoke a lot about DeepSeek and technical details I really couldn't remember, and then it makes this very simple blog post.

Now let's see whether my other one, running the larger model, has finished, so you can see the difference between the two. The distilled model just gives us a small little blog post, while the full model running on the Fireworks API gives quite a bit of detail. It takes more time, but look at what it's doing right now: above here was all the thinking, and below, it's now doing an analysis of my transcript and generating a really nice blog post. It tells us about the calculations LDJ talked about in the stream, the geopolitical implications of the new AI arms race, and future predictions. We talked about those details in the live stream, and it literally picked them up and is now creating a graph from them. That's how crazy these models are, if you really think about it. And here are some key takeaways as well.
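The "thinking" section that appears above the answer is typically returned wrapped in `<think>...</think>` tags by R1-style endpoints (check what your provider actually emits, since apps are still adjusting to display it). A small helper to separate the reasoning from the final blog post, run on a hypothetical sample string:

```python
# Split an R1-style response into (reasoning, answer). The <think> tag
# convention is common for R1 endpoints but not universal; verify what
# your provider returns before relying on it.
import re

def split_reasoning(text: str) -> tuple:
    """Return (reasoning, answer); reasoning is empty if no think block."""
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", text, flags=re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", text.strip()

reasoning, answer = split_reasoning(
    "<think>The transcript covers GPU demand...</think>## Blog Post\n..."
)
```

Splitting the two is handy if you want to save the blog post but log the chain of thought separately.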
It's an amazing thing, and I'll be able to share these prompts with you so you can run these analyses on your own transcripts as well. And here are the SEO enhancements and final thoughts.

Yeah, when you're in business or building a startup, having an unfair advantage is so important, right? Being super efficient, keeping your costs low, making your product the best possible thing. Now we're in this new DeepSeek world; well, I call it a DeepSeek world, but it's really a Llama world too, a world where, if you figure out the model that works for you and the tasks you want to accomplish, you might be able to out-compete whoever you're competing against. I've done a similar prompt on ChatGPT with some of my YouTube transcripts, and it's not unusable, but it's more of a thought starter. It's like: okay, I can take most of this, rejig it, add this and that, and probably get to a blog post that's good, but it requires a lot of human energy. When I see what's coming out of this, what's really mind-boggling is that, even just quickly scanning it, this looks pretty human-level. Incredible, like something a senior writer would do.

Yeah, or a research engineer you hire to really thoughtfully take a lot of notes, spend a lot of time analyzing, and put together the report the way you'd want. And it's even more incredible because these instructions can be configured: if you want a graph or a particular type of output, we can take this prompt, put it into DeepSeek itself, and say, can you give me this type of output instead, and it'll do that for us. It's like: how do we improve the prompt, or what do you want to see from the outputs of your live streams?
The biggest breakthrough happening, and I'm seeing this with o1 Pro too, by the way, is that these reasoning models spend extra time and actually pay attention to your instructions. Every little detail they see, they go: oh, I haven't done that yet, okay, let me make sure I still do it. That's something I super deeply appreciate; for me it's worth the extra 200 bucks a month I pay OpenAI, but this is really quickly turning my head.

Oh my goodness, can you understand what just happened here? I'm still a little taken aback at this output. Like you're saying, it's very detailed, and to me this is totally a game-changer. One thing people aren't really talking about right now is the additional rush to figure out who can host this. To host these huge 600-plus-billion-parameter models you need all those GPUs; you need services like Fireworks. Groq is trying to spin that up, and Groq was able to get a distilled model. There's just so much demand, and there's going to be even more demand for these chips. This is just the beginning, and I'm trying to figure out which provider can host this for me reliably, so I can do this for myself but also share it back and put it into apps for other people, because this is great, and I don't
want the data to go to China. I just want the data to stay in North America, or, if I get a European container, I can use the European container and meet all their legal requirements as well. So that's super exciting.

Yeah, what's the cost, the pricing for Fireworks?

The pricing for Fireworks, we can look up real quick. I think it's about $8 per million tokens, where, from what I remember, ChatGPT was something like $15 for input and $60 for output on the o1 API. Either way, it's significantly cheaper than going to o1 Pro.

And this is going to add up, right? You might be like, oh, who cares about a million tokens, but once you add this to your workflow and you're pumping out content, or you're doing research on an ongoing basis, or you've built a business around it, these tokens will add up.

That's exactly it, they'll add up. At the same time, OpenAI has promised that the o3 model and the o3-mini model will come out, which should be on par with this model, so prices will probably drop significantly too, because these things just get more efficient with time. That'll be really interesting to see, and I'm rooting for it, because for me as a consumer, I want the power of all that intelligence to do these types of things.
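Using the ballpark figures quoted here ($8 per million tokens on Fireworks versus roughly $15 in / $60 out per million on o1's API; treat these as conversation-level numbers, not current price sheets), the "tokens add up" point is easy to put in numbers:

```python
# Back-of-the-envelope monthly cost comparison, using the ballpark prices
# quoted in the conversation. The 10M/2M usage split is a made-up example.

def monthly_cost(in_tokens_m: float, out_tokens_m: float,
                 in_price: float, out_price: float) -> float:
    """Cost in dollars given millions of tokens and $/million rates."""
    return in_tokens_m * in_price + out_tokens_m * out_price

# Say a content workflow burns 10M input and 2M output tokens a month:
fireworks = monthly_cost(10, 2, 8.0, 8.0)    # flat $8/M either direction
openai_o1 = monthly_cost(10, 2, 15.0, 60.0)
print(f"Fireworks ~ ${fireworks:.0f}/mo, o1 ~ ${openai_o1:.0f}/mo")
```

At those quoted rates the same workload comes out to $96 versus $270 a month, which is the kind of gap that matters once you build a business on it.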
I think it's going to be pretty important. And on that note, if anyone hasn't found this yet, there's this thing from OpenAI at platform.openai.com. You sign up for a developer account and you get a little playground, and you can hit this little generate (star) button and describe a prompt you want for any type of model. It will reconfigure the prompt for the language model so it can be more efficient at the task. So if there's a task you do a lot, like "please make keywords for my Amazon listings," you hit generate, then hit update at the very top right, and it reconfigures that one-line prompt to include much more detail. We can take existing prompts and try to improve them through this mechanism too. As you can see, this is how people get really nice long chains of thought, or reasoning, or outputs. One way to think about these things is to first put down what instructions you want, what type of output you really expect, and maybe what you don't want, as a good starting point; that'll help you generate prompts that are actually useful. A lot of the prompts you're seeing me use have come out over time from my use cases: okay, I want this instead of that. As an example, one thing I was thinking about was how you verify claims for a specific type of thing. If you have an article, how do you understand whether something is actually true? So I have one here called "information verification."
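The playground's internal prompt isn't public, but the "generate" button amounts to a meta-prompt: send the one-liner to a model and ask for an expanded, more detailed version. A sketch of that message shape (the system wording below is invented for illustration, not OpenAI's actual prompt):

```python
# Sketch of the "improve my prompt" meta-prompt idea behind the playground's
# generate button. The system instruction here is a hypothetical stand-in.

def improvement_messages(one_liner: str) -> list:
    """Build a chat message list asking a model to expand a rough prompt."""
    return [
        {"role": "system",
         "content": ("Rewrite the user's rough prompt into detailed "
                     "instructions: state the task, the expected output "
                     "format, and what to avoid.")},
        {"role": "user", "content": one_liner},
    ]

msgs = improvement_messages("make keywords for my Amazon listings")
```

You can send a list like this to any of the chat endpoints shown earlier and paste the result back in as your new, more detailed prompt.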
One caveat with the models I was just showing you: the Fireworks and Groq endpoints are plain API endpoints, and right now there isn't a web search capability tuned into them. If you want to do web search, you have to go through deepseek.com or the app, and keep in mind you're also sending your data into that container, so sometimes you'd only use it for public articles, for things you really don't care about. So if we're on DeepSeek, let's see if they actually have it available for us. You go to the search toggle, turn it on, and I paste my prompt in. Then I'll grab an article, maybe the Techno-Optimist Manifesto from Marc Andreessen. It's a very popular article, and it's really long to read; there's so much information in there that you're like, oh my god, how can I even get started with this thing, and how do I verify its claims? I think this is a good place to start. So I hit shift-enter, put the article at the very top, paste it in, and hit send. That will use the web search to go through every type of claim in there, just like we saw happening earlier with the API, and try to search the internet for each one to see what it can verify. So it's going to try to do its thinking thing. This is obviously very popular, and the DeepSeek website is getting flooded with people because it's basically free. And again, like Greg says: I would not be putting taxes in there, and I probably wouldn't be putting medical records in there, anything you don't want to see generated when somebody asks a question related to you, because that can get a little crazy. You'd be like, wow, all of a sudden my data is showing up somewhere I wasn't expecting. So yeah, this is currently erroring out, no surprise, and this is partly why we're thinking about some other alternatives. I think another thing that might be useful here is getting
this stuff set up locally. If you wanted to run this model locally, maybe we can go over that workflow. What do you think?

I would love that. I mean, selfishly, I would love to know. Okay, cool, awesome, let's do it.

Great. So, for this section: in order to run this model locally, the best interface I've found, bar none, is something called Open WebUI. It's really quick to get started; all you need is to download Docker. Just go to docker.com and download the desktop app, the one you need for your machine. If you're running an Apple silicon Mac, which I am, download the Apple silicon version, and so on accordingly. Once you get that installed, it presents a user interface, a dashboard; mine is already showing the app running, and that's how you know it's installed. It will require the terminal, but it really won't hurt you too badly. The first command you run is the one listed in Open WebUI's quick start (we'll have this available as a guide for those who want it), and really all you have to do is follow two steps. The first step is pulling the container: you just copy the command, put it into the terminal, and it does its little pulling thing, downloading several gigabytes of files onto your machine. The next step is literally what they call running the container. With Docker, the whole app and everything is contained in one package, so you don't have to spend a bunch of time doing extra terminal work; these are probably the only two terminal commands you'll run. If you're running a PC, especially with NVIDIA, you want to run the variant of the command with the "--gpus all" flag; just copy that one if you're running NVIDIA, and it'll take advantage of your GPU and run more efficiently locally. The one I like is single-user mode, which doesn't require a sign-in; if you're the only one using it at your house or on your network, that's probably the best way to do it. You copy that command, put it into the terminal, and it'll say, hey, great, it's up and running. Then all you have to do is go to localhost:3000 in your browser, and you'll be presented with a user interface like this. I have a model already loaded, which is why one shows up here, but you won't have any models yet. So the next step we have to do is
this: you have a couple of options, and the one I use is downloading a model locally with a thing called Ollama. Go to ollama.com and download it; that lets you run any local model, and it's as simple as finding the model you want. Once you hit download and install it, you'll see the little llama icon at the very top of your screen, and that's how you know it's running. Once it's running, you'll see the models listed in the model section at the top, and deepseek-r1 is the one we want to use here.

So we go back to our Open WebUI instance, hit where it says your user at the bottom, and go to the admin panel. In the admin panel there's a settings area, and that settings area has a connections section with a little cloud icon; this is where we connect our other providers. Let me make this a little bigger so everyone can see. As you can see, the Ollama API is already configured for us, which is nice, since it already comes with the Docker container. When you hit the little pencil and then the plus to add or change the model, you can type in a model name like "deepseek," and if you don't see it available, which it may not be, you'll see an option at the bottom that says pull "deepseek" from ollama.com; that will search Ollama and get it for you. For example, if we wanted to download the phi-4 model, I'll type that in just as an example. I don't have that model downloaded, but I can just hit this and it goes and finds it and downloads it. In basically no time the model is on my machine, and then I can type in "phi4" and use it going forward. So what we can do is start a new chat and select the model deepseek-r1; you'll see it listed with a ":latest" tag.
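Once Ollama is running, it also serves a local REST API on port 11434, so you can script against `deepseek-r1:latest` without anything leaving your machine; Open WebUI is talking to this same API behind the scenes. A sketch using Ollama's documented `/api/generate` endpoint and `options.temperature` field (the request is only built here, not sent, so it runs even with Ollama stopped):

```python
# Build a request against the local Ollama API. Endpoint path and payload
# fields follow Ollama's API docs; verify against your installed version.
import json
import urllib.request

def local_request(prompt: str, temperature: float = 0.2) -> urllib.request.Request:
    payload = {
        "model": "deepseek-r1:latest",   # the locally pulled model
        "prompt": prompt,
        "stream": False,                 # return one JSON blob, not chunks
        "options": {"temperature": temperature},
    }
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = local_request("Explain options trading")
# urllib.request.urlopen(req).read() would return the JSON response when
# Ollama is actually running on your machine.
```

Nothing in this call goes out to the internet; it only touches localhost, which is the whole point of the local setup.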
That's actually how you know it's the one currently running locally. Once you select it, you can just say something like "explain options trading" and hit enter. It starts thinking, and you can see the thinking tokens as it goes, and all of this is actually running on my computer, which is amazing. One way I can tell is a command-line tool called asitop, which shows all the resources being eaten up. Thankfully I have 128 gigabytes on my machine, because I do live streams and all this stuff at the same time, and you can see how much RAM it takes up right now with me hosting the stream plus running this model locally.

One thing we could even do is test the prompt we were using earlier and run it locally. Earlier we were running a whole analysis on something and it would just fail out, so let's see if this thoughtful analysis runs on a local model and compare the difference. This is basically the transcript I had earlier plus the analysis instructions; I go to Open WebUI, create a new chat, hit paste, and hit run. So it's thinking here, using up all the resources on my local machine to run this model, and it's quite a lot of tokens, but it's still fairly impressive what a smaller model running on my machine can do. You'll have different versions you can use; this one is the 7-billion-parameter model, and something a bit bigger will probably give you a
more detailed response, so I would definitely play around with these things. Another important setting you can tweak, and we can probably try it on the next chat, is in the controls section at the very top. One of the controls you probably want to change around to get different results is the temperature. Setting the temperature lower than the default of 0.8 makes the model hallucinate less, as people say: it tends to follow instructions better and not veer off into different tangents. If you go all the way to 1, it'll be extremely creative, which can be really helpful for creative writing or non-logical reasoning, when you want it to think outside the box and wander off into la-la land. It's really up to you and your content, but I would definitely do two different responses with different temperatures, test them, and see if you notice any difference in your output. For me, temperature zero is sometimes very helpful for very logical reasoning outcomes, especially around code, but it really varies; I just wanted to give you a heads-up on that.

I appreciate that. To me, I would rename that temperature as, you know, wine
versus coffee mode love it wi wine might get you a little more creative um but if you know if if you want more rational execution style maybe you want coffee mode I think um that is something that I you know we have a LCA it's our design firm for the aih we have a I feel like that's what's missing from a lot of these AI products is just like a little humanity and and and lightness um right so I expect over the next couple years we'll start seeing uh you know you know what would be
funny to basically have like a spinner where you can actually flick it yourself and you kind of see it land on something and then just hit go totally that would be cool because sometimes you really don't care right you're just like I just want to spin the bottle and see what happens like totally it's just kind of YOLO mode yeah yeah because I think like you say there's huge opportunities in the AI space to be playful uh and I think that's what's interesting is you have the intelligence of
the models uh and now you have to have people who build interfaces to them and there are a lot of companies trying to do that and you know you can get very far with just some prompting as we're seeing here uh and the exercise here is to try different models so if you think about it Ollama is sort of the gateway um you know to all these different types of models that you can try out and see if it even works for your use case
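As a side note on what's happening under the hood here: Ollama serves a local HTTP API (by default on localhost:11434), and the temperature knob discussed above can be passed per request in the options field. This is a minimal sketch in Python, assuming Ollama is already running and a deepseek-r1:7b model has been pulled; the prompt text is just a placeholder:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str, temperature: float = 0.8) -> dict:
    # stream=False asks for one complete JSON response instead of chunks;
    # options.temperature is the same knob exposed in the Open WebUI controls
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature},
    }

def generate(model: str, prompt: str, temperature: float = 0.8) -> str:
    # send the request to the local Ollama daemon and return just the text
    body = json.dumps(build_request(model, prompt, temperature)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# usage (requires a running Ollama daemon with the model pulled):
#   generate("deepseek-r1:7b", "Summarize this transcript", temperature=0.0)
```

Setting temperature to 0 here is the coffee mode for logical tasks like code, and pushing it toward 1 gives you the more creative wine mode output.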
this web UI is actually a really nice user interface to keep track of that uh chats are saved locally on your machine you can go back to them at any time um and there's additional options at the bottom here which is really nice so um you can actually have this read aloud to you so uh if you're a person maybe suffering from dyslexia or you actually prefer audio you can have that for you uh this will give you some information there you can continue the response sometimes if you have too much information it still needs
to continue going so you hit continue and it'll just continue on uh or regenerate the responses um so that's kind of some of the basics there so yeah um so this is the output of this model and I'm fairly impressed for a 7 billion parameter model running locally on my machine uh that it took that entire transcript and did this analysis I'd say it's pretty close to the bigger model um in terms of details it's not as detailed as the other one if we kind
of take a look so like the previous one this is one that it came out with before you know with this nice big blog post type of thing so um it's pretty good and it's running you know locally I can run this on the plane um as far as that so yeah so to get started basically again it's just Open WebUI there is a getting started guide it's literally a couple steps to run make sure you have Docker installed there uh and then Ollama is going to show you all
the different models so if you go to the models you'll see kind of stuff that's popular and trending right now uh and that'll kind of get you some of that as well as far as getting started uh there is um you know we also talked about Fireworks AI so that's a good resource for you to um you know go take a look and put that model in so like if you want to put that model into your Open WebUI you would kind of do the same thing here so go to user and then you
go to admin panel and then you would go to settings up here and then from the settings um you're going to go ahead and hit connections and so what you'll do is hit the little plus to add a connection and so you have to put in the base URL and you'll also have to put in the API key so the uh base URL here for Fireworks is specifically api.fireworks.ai/inference/v1 in the example documents you'll see /chat/completions and things um you don't need those because that's part
of the OpenAI framework uh you just put everything up to v1 uh and then you'll generate an API key uh over in Fireworks AI so if you go to the model here in Fireworks and you go to um your name and then um you go to API Keys once you go to API Keys here you just hit create API key and that'll pop up and that's the key that you want to put in there um similar to Groq Cloud you just go ahead and hit create API key
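To make the everything-up-to-v1 point concrete: OpenAI-compatible clients append paths like /chat/completions themselves, so the base URL you paste into the connection settings should stop at /v1. Here's a small helper illustrating that, a sketch only; the Fireworks and Groq URLs reflect the OpenAI-compatible endpoints shown in this walkthrough:

```python
def normalize_base_url(url: str) -> str:
    """Trim an OpenAI-compatible endpoint down to its /v1 base.

    Clients add /chat/completions (and /models, /embeddings) on their own,
    so only the part up to and including /v1 belongs in the connection config.
    """
    url = url.rstrip("/")
    for suffix in ("/chat/completions", "/completions", "/models"):
        if url.endswith(suffix):
            url = url[: -len(suffix)]
    return url

# examples using the provider URLs from this walkthrough:
assert normalize_base_url(
    "https://api.fireworks.ai/inference/v1/chat/completions"
) == "https://api.fireworks.ai/inference/v1"
assert normalize_base_url(
    "https://api.groq.com/openai/v1/"
) == "https://api.groq.com/openai/v1"
```

The same rule applies to any other OpenAI-compatible provider you wire up this way: configure the base through /v1 and let the client fill in the rest.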
um so once you go to console.groq.com uh there's an API key section here and then you'll want to hit create API key and that'll pop up a dialogue with those API keys and so that endpoint will look something like this over here so that'll be um if we hit configure api.groq.com/openai/v1 and then you put your key in there and you don't have to do anything with these IDs these will be pulled directly from that endpoint so whatever models you have available will be there and so now when you hit the plus
sign you'll see like this nice list of models from Fireworks so there'll be the ones prefixed accounts/fireworks you can play with any one of those uh and then the other ones that just have the normal name are from Groq um so they have those available there so you can play with a lot of these models which is nice uh and compare them and then the ones at the bottom are the ones from Ollama uh and so they'll show like you know the colon latest is kind of how
you can tell and if you hover over them you'll see um some additional information like the parameter count and what quantization level it is so Q4 means it's quantized to four bits and um that has a bearing on its quality obviously a higher bit width like 32-bit or 16-bit means more memory and going down to 8 or 4 bits means less memory so with a lower number it's not that the model is less intelligent but you may not get the output quality you expect for a rough sense a 7B model at 4 bits is around 3.5 gigabytes of weights while at 16 bits it's around 14 so that's kind of part of that process it's
a lot of different things here but I think uh the most important thing is just um yeah how to host this locally how to start playing around with it um and that's kind of like a really good primer to get started with these models and stuff yeah I love it um I don't know if you've played around with it but is there any way to do this on mobile like could you play with local models on a mobile device yeah there is an app called Apollo have you
yeah I just haven't used it let's see I don't think app store I'm gonna see if I can go here Apollo let's see okay private local AI yes so I have this app on my phone and they allow you to download the models directly just like you would with Ollama as well but it has its own interface which is really nice and so um I wonder if I could share my screen I think I can so on your phone yeah yeah they have phone mirroring yeah exactly so Apollo okay oh I
have to lock my phone okay cool so I lock it and then it should be able to connect okay cool nice awesome so yeah let me kind of minimize this here and um yeah okay um let me just go to a different screen here probably one that's less cluttered and do phone whoops put that over here cool maybe this will work I think yeah yeah sweet yeah I actually have the yes uh another place to get your models apparently is also through open router um and so yeah so this is kind of the Apollo app
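Before downloading anything it helps to estimate whether a model will even fit on the device, using the parameter count and quantization level we looked at in the model list earlier. Here's a rough back-of-envelope sketch (weights only; the 25% headroom figure is an assumed margin, and real memory use adds KV cache and runtime overhead on top):

```python
def model_size_gb(params_billions: float, bits: int) -> float:
    # weight size = parameter count * bits per weight / 8 bytes, in decimal GB
    return params_billions * 1e9 * bits / 8 / 1e9

def fits_on_device(params_billions: float, bits: int, ram_gb: float) -> bool:
    # leave ~25% of RAM free for the OS and inference overhead (assumption)
    return model_size_gb(params_billions, bits) <= ram_gb * 0.75

# a 7B model at Q4 is roughly 3.5 GB of weights; at 16-bit it's ~14 GB,
# which is why the quantized versions are the ones that run on phones
```

By this estimate a 1.5B distilled model at 4 bits is under 1 GB of weights, which lines up with the roughly gigabyte-sized download in the demo.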
you're like okay cool can I start chatting with this you know as soon as I play with the thing uh a couple configurations you have to do is you hit this little um hamburger menu at the very top left corner and then you hit settings so on the phone app you hit settings and it's going to say AI providers and when you click there uh you have three different options open router which is another uh API provider and you can also get access to pretty much every model there which is also
very handy uh I think they give you some free credits but then you would uh put you know your credits there uh and then you have the local model and then you have custom backends so uh with the local model they can actually tell how much memory you have on your device and they'll have a little download button for those models the ones without the download button basically means you can't run that on your device because you don't have enough memory um to run them so uh these downloads are pretty
big like 4 gigabytes uh and some of them you know are several gigabytes so just depending on the space on your phone um so you can actually run the distilled Llama 8B 8-bit MLX version um and I have the distilled Qwen version at 7B um so it just depends on your oh that one's actually not compatible which one do I have downloaded so I think on mine let's see the one I have available is the DeepSeek R1 from Apollo I think I have it from open router that's running so let's take a look
here AI providers open router yeah so the one that I have set up right now is from open router so open router will show you all the models and you can select deep seek R1 from there which is awesome uh so you can have a conversation so this just requires me you know being connected to the internet we start a new chat you're like uh tell me more about options trading and so here you're still talking to the model uh but you're actually just using open router and so that's a little bit
different than you know sending your stuff directly to deep seek uh and it should be able to do that it's possible that this model is busy or it's currently down that can happen so yeah that happens um yeah while that's going I think we could even start another new chat let's see with this model you can select a different model so let's see it's crazy how many models there are now there's so many yeah it's like how do you know which one to use I feel like you just go off vibes like what's my
friend telling me yeah like what's the real vibes right now so the vibes right now obviously R1 is like the real hotness people are like totally into that right now um and it makes sense because you know reasoning uh at a much lower cost so um let's see um uh there's probably something gone wrong with my API key or something so uh AI providers I can select the local model to run um you know I want to see if there's something small here that we can download so we could do yeah this distilled
Qwen just for speed purposes we'll just download the one-gigabyte one so this is going to download uh wow that's really fast um the Qwen model 1.5B and so that'll run DeepSeek locally and so basically it's just downloading it directly from I think Hugging Face and then the model is being loaded on my phone and um this is actually optimized to run on Apple hardware or Apple silicon so that's um you know one way that you can kind of take a look at it uh to run this thing and so what's nice
yeah if this phone runs out of internet or I need to ask some questions or do some stuff uh I will have this R1 reasoning model that's a much smaller version to run on device and I think that's another good point about AI that's running locally you don't always need the most powerful thing running for every single type of thing I think it's really important to understand different use cases you know because maybe you don't need that depth of reasoning you just need something that's really quick uh or you
just need something that's really good at like gathering lots of information and just telling you some topics or something like that uh and that could just be done really quickly so it's kind of like picking the right tool for the job uh and experimenting so um we're at a good age today where we can actually get these models and experiment with them so now I should be able to select this guy and run it so let's go and hit done and start a new chat and then over here um we're going to go ahead
and select the model so here we're going to type in uh oh yeah it already has it at the top so you see this little icon that signifies that it's running locally and then we're going to hit cancel uh hit done okay great so now we're running with that local model and uh I think we're just using a default system prompt about it being Apollo and you're like yo tell me about options trading and it should basically start to cook so it's using my phone's power there and it's now thinking and so if we click this
little drop down we'll actually see the reasoning tokens and so that's yeah I have reasoning on my phone no internet completely running locally yeah 2025 is insane yeah and imagine being able to run this on your watch like that'll just be because this is already showing its capability like we're doing this input if you can make an app that can run on a watch all locally you know just think about like the transcription stuff right uh you have a very very lightweight model you send the audio you know from the watch you know
especially of a loved one maybe they've fallen or something it can just turn on the speaker and try to understand the situation uh and then if it listens to paramedics or something asking questions and they don't really know maybe the watch can show hey here I'm going to show you the emergency card this person has for their medications or uh this is something that happened in the last you know five minutes before these events or something um this is kind of the way that uh people think about designing uh apps with
these models is trying to think about these use cases because now you have really powerful devices just like the size of your wrist that can run these models and the power of uh Apple's MLX infrastructure and also their uh AI technology is the fact that you know these are really optimized to run these models um you know very small as we're seeing right now um so that thought for 49 seconds uh and it gets us this output uh yeah please excuse my small screen but uh we'll probably have a zoom in on
this uh so for the edit so yeah yeah that's pretty sweet so many startup ideas by the way like from that alone and I also you know and so many you know I love that you shared that example of you know someone maybe falling and hurting themselves I think even coordinating with your AirPods um yes there's just a ton of opportunity there as well um translation you know not just pure translation but it's like someone is saying XYZ but what are they really saying you know what I
mean oh maybe like imagine negotiating in the future except you have like you know pretend you're a lead negotiator as like almost you know a local AI LLM that's helping you figure this out so there's just you know we can have a million ideas but it's just really exciting to see where this could go yeah I think that also goes to the point a little bit about what some of these models do so one thing I just learned very recently about uh GPT-4 and chat
gpt's omni models is the fact that this model's breakthrough a little bit different than R1 uh is the fact that it can actually understand audio and tone and all these extra implications that we don't know about um especially for negotiation like imagine if you can understand someone's breathing rate just from listening to the audio uh that's the capability of something like you know the 4o models with audio you just give it the audio and it's going to know tone it's going to know cadence it's going to pick up things that we just normally don't think
about but people who are maybe skilled negotiators can understand what those implications mean and then can say hey give me some outlier things every time I give this person a response you know it can answer in like milliseconds what the differences are and I've heard these terms like micro expressions and if you have an omni model uh you can actually read these you know micro expressions and say okay this person is off when we ask them these types of things or changes their position and those are the things that you really can't get today
with some of the you know current reasoning models except for like the omni models which is the you know 4o models uh and so it's going to be really exciting when they actually drop o3 I think a lot of people are going to be taken by storm by what's actually really going to come out from them it's going to be a really really big leap anything else you want to cover today uh I think this is a really good primer for folks to get started on the power of prompting uh and especially with
these reasoning models just to get started so we covered uh being able to get started with prompting understanding where your data is going you know if you're using deepseek.com or using the apps that will go directly to China and their restrictions and things that they have for uh your data privacy so just beware um I wouldn't personally be putting any personal information in there that you don't want exposed um and then there are other providers that you can use right now and others that are still spinning up at this very moment
uh so for now Fireworks open router uh Groq as far as inference uh and then we also covered here running the models locally so that you can actually you know run them on your phone uh using the Apollo app I was using it's a paid app but I find a lot of value in it and I'm not sponsored or anything like that I just love the work that these people are doing uh and it's really good stuff you can connect to these endpoints with them uh and then the other part is running this through uh
Ollama locally on your Mac and using Open WebUI uh with Docker and stuff so I think there's a lot to take care of here as far as trying to use these models and come up with different app ideas and uh if you have an idea just start to use the playground to try to generate some prompts for it and see if you can get the output that you want uh and that could be the beginnings of your next multi-million dollar idea that you don't even know is there right so it's it
could be hidden in plain sight and I think that's the power if you want to reach out to me you can just go to Ray fernando. and book some time uh we can have a conversation we can get you set up uh because some of this stuff is very cumbersome and it's just easier for me to walk you through it uh and so I'm available there as well uh you can find my YouTube channel Ray Fernando 1337 on YouTube uh feel free to check that out I do a lot of live streams where I check
out new technology and try to play around with these AI models and try to discover what's going on and also try to bring on experts to explain things a little bit more for us so um Greg such a pleasure to be on the show this is really amazing you're a legend man thank you for coming on sharing your insights here this has been super helpful um I thought it was helpful so if people agree uh go comment on YouTube I read and respond to almost every comment um like and
subscribe for more of this in your feed and let us know if we should bring and when I say we it's me you know let me know if I should invite Ray back on again to show us more stuff I would certainly love to do that more in 2025 um crazy times Ray uh this whole deep seek you know tidal wave is just yeah I'm glad that something is here to um like make more people aware that there's a lot of intelligence and how fast it's moving um and I want to
also add that like please don't be fearful or don't feel like you're left behind if you're just finding out about this you're not that far behind we're all actually still trying to understand what this intelligence can give us and so the prompts and the things that you develop are a good place to start uh and you know it doesn't have to feel complicated uh and you know whatever you can get your hands on make sure you do that um and you know be aware of where your data is going but at the same time
play discover share back with the community and definitely share any cool stuff that you've done in the comments for sure all right my man I'll see you later thank you so much take it easy thanks