DeepSeek dropped a bomb today with R1, an open-source reasoning model with performance on par with OpenAI's o1 but roughly 96% cheaper than the o1 API, and the team spilled out all the training secrets of how this frontier reasoning model is actually trained and what is driving the significant improvements. The reasoning model also has distilled versions that you can run directly on your home device, as long as you have a CPU with at least 48 GB of RAM. So one thing is becoming more and more clear: these reasoning models are going to drive the key advancements in large language model progress in 2025. That's why today I want to dive a bit deeper into reasoning models: how do they work, what are the best practices for prompting them, and how can you utilize these models today?

One of the key findings from OpenAI's o1 model is that the longer the model thinks, the better the result it gets, and this is such a breakthrough finding because we finally have a way to scale a model's capability further without just relying on pre-training data. You might have noticed that the capability improvement from GPT-4 to GPT-4o is not as dramatic as from GPT-3 to GPT-4, and that's because so far the key scaling method has been pre-training, but we are running out of data we can use to pre-train the model further. This is also what Ilya Sutskever mentioned in his NeurIPS keynote: pre-training as we know it will end. If we can figure out a way to throw more computation at the inference stage and get the model to just think longer to increase its IQ, we are no longer limited by the amount of data we have to pre-train the model, and we can get models to start solving problems they have never seen before.
So how do these reasoning models actually work? As we know, there is a technique called chain of thought, which basically just prompts the model to think step by step, and by doing that the model's results get much better. Fundamentally, all these reasoning models show the same behavior: they generate a huge amount of chain-of-thought reasoning before giving you the answer. The difference is that a reasoning model's chain of thought is much longer and much higher quality, and it comes with a bunch of new behaviors: the model will stop and re-evaluate its previous approach to reflect on whether it was the right one, it will break problems down into smaller steps, and it will try multiple different strategies. What's really fascinating is that none of these special behaviors we observe here were programmed by AI engineers or researchers. They just used reinforcement learning to incentivize the model to generate longer, higher-quality reasoning tokens, and these behaviors emerged out of thin air: the model self-evolved and figured out better ways to do problem solving. As the paper puts it, one of the most remarkable aspects of this self-evolution is the emergence of sophisticated behaviors, such as reflection, where the model revisits and re-evaluates its previous steps, and the exploration of alternative approaches to problem solving, which arise spontaneously; these behaviors are not explicitly programmed but instead emerge.
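To make the contrast concrete, here's a minimal sketch (assuming the `openai` Python SDK and an API key in the environment; the model names are just examples) of classic chain-of-thought prompting versus simply asking a reasoning model:

```python
# Minimal sketch: classic chain-of-thought prompting vs. a reasoning model.
# Assumes the `openai` package and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

question = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 "
            "more than the ball. How much does the ball cost?")

# With a general-purpose model, we nudge it to reason step by step.
cot = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question + "\nLet's think step by step."}],
)
print(cot.choices[0].message.content)

# A reasoning model generates its own, much longer chain of thought
# internally, so the bare question is enough.
reasoned = client.chat.completions.create(
    model="o1",  # or a hosted DeepSeek R1 endpoint
    messages=[{"role": "user", "content": question}],
)
print(reasoned.choices[0].message.content)
```

The reasoning model burns far more tokens on its internal chain of thought, which is exactly the cost and latency trade-off we'll come back to later.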
Another really interesting thing mentioned in the DeepSeek R1 paper is distillation. Since these reasoning models can generate genuinely high-quality reasoning data, we can use that reasoning data to train smaller models for very specific domain tasks. Knowledge distillation has been a pretty popular method, where you use a very large model to generate training data for a smaller model on a specific task, so you can get really good performance on certain domain tasks even on edge devices like a mobile phone. The team successfully did this by using reasoning data generated by DeepSeek R1 to train smaller models like Qwen. This is another reason an open-source reasoning model like R1 is so exciting: OpenAI's o1 has been hiding all its reasoning tokens from developers, and one of the key reasons I think they did that is that otherwise people could take o1's reasoning tokens to either fine-tune or distill their own reasoning models. So OpenAI has been preventing developers from getting those reasoning tokens, but now with DeepSeek all those reasoning tokens are free and accessible for developers. I think knowledge distillation is going to be a pretty big thing in 2025, and I'm pretty keen to showcase an example workflow of how we can do that with these reasoning models; if you're interested, please leave a comment below.
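To give a flavor of what that workflow could look like, here's a hedged sketch of the data-generation half: collecting reasoning traces from R1 and writing them out as fine-tuning data. DeepSeek's API is OpenAI-compatible, but the `deepseek-reasoner` model name and the `reasoning_content` field are assumptions you should verify against their current docs:

```python
# Hedged sketch: harvest reasoning traces from DeepSeek R1 and save them as
# chat-format fine-tuning data for a smaller model (e.g. a Qwen checkpoint).
# Model name and `reasoning_content` field are assumptions -- check the docs.
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_DEEPSEEK_KEY")

prompts = ["Prove that the square root of 2 is irrational."]  # your domain tasks

with open("distillation_data.jsonl", "w") as f:
    for prompt in prompts:
        resp = client.chat.completions.create(
            model="deepseek-reasoner",
            messages=[{"role": "user", "content": prompt}],
        )
        msg = resp.choices[0].message
        # Store the reasoning trace plus the final answer as one training
        # target, in the chat format most fine-tuning pipelines expect.
        f.write(json.dumps({
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant",
                 "content": f"<think>{msg.reasoning_content}</think>\n{msg.content}"},
            ]
        }) + "\n")
```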
But even though these reasoning models are extremely powerful, not many developers are actually using them in their large language model applications, because they do come with a trade-off in terms of cost and latency. That's why today I want to take you through a few best practices I've seen for prompting reasoning models effectively to get the best performance out of them, as well as some agentic use cases where you can deploy reasoning models today. I read through a few different papers that dive deep into this topic, as well as material from OpenAI and research from PromptHub, and I'm going to summarize the key learnings for you in ten minutes.

The first prompting principle for reasoning models is to make your prompt simple and direct. A lot of the prompting techniques we used with models like GPT-4o or Claude 3.5 don't really matter here; in fact, in many cases they contribute negatively to the results. There's one really interesting paper called "Do Advanced Language Models Eliminate the Need for Prompt Engineering," where they applied the prompting techniques we used with models like GPT-4 to reasoning models like o1 and compared the results, and the findings are really interesting. Techniques like few-shot prompting, chain of thought, and critique normally drive up performance for the GPT-4o model, but for the reasoning model the best result came from zero-shot, where you just give the instruction; every extra technique you add actually drives performance down. This is probably the first counterintuitive thing: when you prompt these reasoning models, make your instructions really direct instead of super detailed and explicit. One example provided by OpenAI: a prompt we would normally write gives the task itself, but often with very specific instructions like "let's think step by step, don't skip any steps, do this first, do that next, and in the end do this," and we might also add a prefix to guide the response generation. That type of prompt is not great for a reasoning model. Instead, from what they found, for a reasoning model you should just keep it direct: tell it what the task is and let it figure out the rest. So that's the first principle.
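Here's a minimal before/after sketch of this principle; the refactoring task and the `o1` model name are just illustrative:

```python
# Principle 1 sketch: direct instructions beat over-specified ones for
# reasoning models. Task and model name are hypothetical examples.
from openai import OpenAI

client = OpenAI()

# The kind of prompt that helps GPT-4o but tends to hurt reasoning models:
over_specified = (
    "Refactor this function. Think step by step. First list the issues, "
    "don't skip any steps, then fix them one by one, then summarize.\n\n"
    "def f(x): return [i*i for i in range(x)]"
)

# What works better for o1/R1: state the task, let the model plan itself.
direct = "Refactor this function:\n\ndef f(x): return [i*i for i in range(x)]"

resp = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": direct}],
)
print(resp.choices[0].message.content)
```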
The second principle is what I call one-to-two-shot prompting, and it relates to the first one. When we use few-shot prompting with regular models, we normally try to give at least three to five different examples, but across different tests, giving that many examples to a reasoning model usually doesn't produce great results. There's one chart from a paper evaluating o1 on the MedQA benchmark where they found that with few-shot prompting and five examples, o1's output is actually worse than with the minimal prompt. At the same time, OpenAI's own guidelines recommend using some kind of example in the prompt, which they describe as showing the model how to do it instead of telling it what to do. I think the key thing here is the number of examples: when you give o1 just one or two examples it does great, but when you give more, the performance starts to degrade. This also aligns with OpenAI's own documentation on providing additional context or documents: you should include only the most relevant information, to prevent the model from overcomplicating its response. So that's the second prompting technique for reasoning models that I found a bit counterintuitive.
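A quick sketch of what this looks like in practice, with a single worked example in the prompt; the classification task and labels here are hypothetical:

```python
# Principle 2 sketch: include at most one or two worked examples.
from openai import OpenAI

client = OpenAI()

one_shot = """Classify the support ticket as billing, bug, or feature_request.

Ticket: "I was charged twice this month."
Label: billing

Ticket: "The export button crashes the app on Safari."
Label:"""

resp = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": one_shot}],
)
# With 3-5 examples, accuracy tended to drop below the minimal prompt.
print(resp.choices[0].message.content)
```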
The third prompting technique is that you can actually prompt these reasoning models to do even more extended reasoning, to get better performance. As we mentioned before, one key finding for reasoning models is that the longer the model thinks, the better the performance, and there's one piece of research where they tested two different prompts. One tells the reasoning model to give a very concise response, with wording along the lines of "this is an emergency; speed is the most important thing, answer as quickly as you can," while the other says "take your time and think carefully; spend as much time as needed to study the problem." The result is that prompting the model for deliberate reasoning led to 16-30% more reasoning tokens, as well as increased accuracy in the results. So this is also something you can try, to squeeze more performance out of these reasoning models.
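In code form this is as simple as appending one of two instruction suffixes to your task prompt; the wording below paraphrases the study's two conditions, and the helper is my own:

```python
# Principle 3 sketch: paraphrased versions of the two instructions from the
# study. The "slow" variant reportedly elicited ~16-30% more reasoning tokens
# and higher accuracy.
FAST = ("This is an emergency; speed is the most important thing. "
        "Answer as quickly as you can.")
SLOW = ("Take your time and think carefully. Spend as much time as needed "
        "to study the problem before answering.")

def with_thinking_hint(task: str, hint: str = SLOW) -> str:
    # Hypothetical helper: append a thinking-budget hint to any task prompt.
    return f"{task}\n\n{hint}"
```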
So those are some special things you need to be aware of when using these reasoning models, and as you can see, all the IQ gains from reasoning models come with a trade-off in time and cost. That leads to a key question: when should you use these reasoning models? But before I dive into that, I know many of you are trying to learn how to make practical use of AI and start your own venture, and this requires lots of in-depth practice and learning about how to build production-level large language model applications. That's why I built this community called AI Builder Club, where I share all the learnings and mistakes I've personally made building large language model applications and doing AI coding in production, alongside practical examples that take you through step by step, as well as tips and tricks from industry expert interviews. Most importantly, we have a growing community of top AI builders who might have already experienced the problem you're facing today, so people can come to share ideas and get advice from each other. I've put the link in the description below, so you can go and join the community today.

Now, without further ado, let's talk about when to use reasoning models. From my point of view, reasoning models don't replace your day-to-day models; rather, you now have a new option to get more intelligence, traded off against higher latency and cost. One mental model I use: when you build large language model apps, most of the time you should still use models like GPT-4o and Claude 3.5, but you can decompose your task into smaller steps and identify which steps can benefit from additional reasoning and where latency matters less.
One example here is agent planning and reasoning, especially for agent scenarios where completing a task can take more than five steps: you can use the reasoning model to generate a plan for how to execute the task, and then pass the plan on to a smaller model, like GPT-4o mini, that actually executes the task and does the function calling. I'll take you through an example shortly, but this is one of the most common use cases I see where reasoning models can be used today.

The second use case is image reasoning and understanding. Reasoning models have shown better capability at understanding complex images like flowcharts or diagrams, and because of that you can use them for complex image tasks, maybe for medical purposes. A lot of people also use reasoning models for image pre-processing, generating relevant metadata for each image so it can be retrieved more accurately later. So that's the second use case I've seen that is pretty interesting.
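As a rough illustration of that pre-processing idea, here's a hedged sketch that asks a reasoning model to turn a diagram into searchable JSON metadata. The image URL is a placeholder, and you should confirm that your chosen model actually accepts image input:

```python
# Hedged sketch: pre-process a complex image (e.g. a flowchart) into
# metadata you can index for retrieval. Image URL is a placeholder.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="o1",  # assumes a reasoning model with vision support
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this diagram and return JSON metadata with "
                     "keys: title, entities, relationships, summary."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/flowchart.png"}},
        ],
    }],
)
print(resp.choices[0].message.content)  # index this JSON for later retrieval
```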
But the most straightforward example I want to showcase today is agent planning. Imagine you're trying to build an agent for a logistics company to figure out the best route to fulfill customer orders, and this agent has access to 20 or even 30 different tools. For this type of complex task, even with the GPT-4o model, I can imagine the agent struggling to complete the task. But by utilizing a strong reasoning model like DeepSeek R1 or OpenAI o1, you can get the reasoning model to generate the plan first, and then use a smaller, cheaper, and faster model like GPT-4o mini to do the task execution.

Here's an example, provided by the OpenAI team, showcasing how you can integrate this planning step into your agentic system. Let's say we're building a logistics agent that can design optimal routes to fulfill orders. First we define the context of the request; I won't go into details, but it covers things like the inventory for each item, the order information, the available suppliers, and so on, so the agent can use all of that as extra context when it calls certain functions. Then there's the prompt for the planning step: you are a supply chain management system; the first input you receive will be a complex task that needs to be carefully reasoned through to solve; your task is to review the challenge and create a detailed plan to process customer orders, manage inventory, and handle logistics; you will have access to an LLM agent that is responsible for executing the plan you create and returning the results; the LLM agent has access to the following functions. Here we list out all the functions and tools the 4o agent actually has access to. Then: when creating a plan for the LLM to execute, break your instructions into a logical, step-by-step order using a specific format, where main actions are numbered and sub-actions are lettered (like 1.a, 1.b); specify conditions using clear if/then/else statements; for actions that require one of the functions defined above, write a step to call the function, using backticks for the function name, and ensure the proper input arguments are given; and the last step in the instructions should always be calling the `instruction_complete` function. This is necessary so we know the LLM has completed all the instructions it was given. The plan generated must be extremely detailed and use markdown format, with each step and its sub-steps. So this is the prompt we use with the o1 model to generate the plan, and once the plan is generated, we pass it on to the 4o agent that actually does the tool calls to execute the task.
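Condensed into code, the planning step looks roughly like this; the prompt below is a heavily abbreviated paraphrase of the cookbook's version, and `generate_plan` is my own wrapper, not a function from the notebook:

```python
# Condensed sketch of the planning step (abbreviated paraphrase of the
# OpenAI cookbook prompt; the full version spells out the format rules).
from openai import OpenAI

client = OpenAI()

PLANNER_PROMPT = """You are a supply chain management system. Review the task
and create a detailed plan for an LLM agent that can call these functions:
{function_list}
Number main actions (1, 2, ...) and letter sub-actions (1.a, 1.b, ...);
use explicit if/then/else statements; wrap function names in backticks;
the final step must call `instruction_complete`."""

def generate_plan(scenario: str, function_list: str) -> str:
    """Ask the reasoning model for a detailed, executable markdown plan."""
    resp = client.chat.completions.create(
        model="o1",
        messages=[{
            "role": "user",
            "content": PLANNER_PROMPT.format(function_list=function_list)
                       + "\n\nTask:\n" + scenario,
        }],
    )
    return resp.choices[0].message.content
```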
Then there's the executor prompt: you are a helpful assistant responsible for executing the policy on handling incoming orders; your task is to follow the policy exactly as it is written and perform the necessary actions; you must explain your decision-making process across the various steps: first, read and understand the policy; then identify the exact step in the policy and the decision to make; briefly explain your actions and why you are performing them; then execute the actions. Here we insert the policy generated by the o1 model, and then we give the list of actual functions the 4o agent can call, which is a lot. Below that is where the actual tools are defined, and you can see they just use the context information to mock the results. Then they define some helper functions: one prints the messages, calls o1 first to get the plan generated, then calls the 4o model and returns the final message; another just appends a message to the message history defined above and prints the result in the terminal. They also define the function that calls the o1 model, passing in the planning prompt along with the scenario, and asks it to generate the plan. Then they define the 4o agent, which is pretty standard: it is given the system prompt and the plan, and whenever the response tries to call a tool, it runs the tool, looping until the function called is `instruction_complete`. Notice that OpenAI's notebook doesn't rely on the tool-call finish reason to decide whether the task has been completed; instead, they make it very explicit in the o1 prompt that the last step has to be calling the `instruction_complete` function, so that we know the LLM has completed all the instructions it was given.
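The execution side boils down to a standard tool-calling loop. This is my own condensed paraphrase of the notebook's logic, not its exact code; `tools` is the function-schema list and `run_tool` dispatches to the mocked implementations:

```python
# Sketch of the execution loop: the 4o agent follows the o1-generated plan
# and keeps calling tools until it invokes `instruction_complete`.
import json
from openai import OpenAI

client = OpenAI()

def execute_plan(plan: str, tools: list, run_tool) -> list:
    """Run the executor agent until it calls `instruction_complete`."""
    messages = [
        {"role": "system",
         "content": "You execute the given policy exactly as written. "
                    "Briefly explain each decision before acting."},
        {"role": "user", "content": plan},
    ]
    while True:
        resp = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages, tools=tools)
        msg = resp.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:  # plain text turn: nothing left to execute
            return messages
        for call in msg.tool_calls:
            result = run_tool(call.function.name,
                              json.loads(call.function.arguments))
            messages.append({"role": "tool",
                             "tool_call_id": call.id,
                             "content": json.dumps(result)})
            if call.function.name == "instruction_complete":
                return messages
```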
I guess this is part of the practice they found works better: asking the model to explicitly call out that all of the instructions have been completed and followed. Then we ask the agent to execute. You can see that o1 first generates a plan with very clear step-by-step actions: first it needs to fetch the new orders, then it needs to process each order, and for each order item it should check the inventory availability by calling the corresponding function; if the available inventory quantity is greater than or equal to the order quantity, it proceeds to allocate stock. You can see it gives very clear if/then/else statements, so the plan is very detailed but follows a clear structure that the 4o agent can execute against, and at the end it calls `instruction_complete`. This plan is then sent to the 4o model, which starts executing the actions based on the plan, calling all those different functions and in the end completing the task. So this is an example of how you can use the o1 model to drive very complex decision-making actions; if you gave this task directly to a 4o mini agent, it definitely wouldn't be able to complete these actions.

So those are some best practices and use cases for how to fully utilize these thinking models in your large language model applications. I keep posting interesting learnings and projects I'm working on in the community I'm building; it has loads of content about AI coding, how to build agents and large language model applications, and insights from industry experts. Most importantly, there is a community of top AI builders who might have already experienced the problem you're facing today, so you can just come and ask any question you have, and myself and other community members will come and give advice. I've put the link in the description below. If you enjoyed this video, please like and subscribe. Thank you, and I'll see you next time.