OpenAI Just Changed Everything (Responses API Walkthrough)

Dave Ebbelaar
Want to get started as a freelancer? Let me help: https://www.datalumina.com/data-freelancer?utm_sou...
Video Transcript:
OpenAI just released their Responses API, and every time OpenAI does something like this, the whole developer world panics: startups get killed, products become irrelevant, and we as developers have to study again to see what it's all about in order to stay relevant. The goal of this video is to make your life a little easier, because I went through the entire thing and I'm going to outline it step by step, showing you exactly what you need to know about these new updates.

If you're new to the channel, welcome. My name is Dave Ebbelaar, I'm the founder of Datalumina, we've been building custom data and AI solutions for the past six years, and we also run a community with over 100 freelance data and AI professionals. I make these videos to help you level up as an engineer, so eventually you might want to join us. Now let's dive in.

Throughout this video I'll be walking you through a GitHub repository: it's in my AI cookbook, and there's a dedicated folder for responses. I'll walk you through a bunch of files that you can also run yourself to experiment with this. We're going to cover the following updates introduced with the Responses API: a simple introduction with what you really need to know, then text prompting, conversation state, how you can now use function calling and structured output with the Responses API, and then some new features, for example web search and file search. There are also improved ways to use reasoning models with the Responses API.

Every time there's an update like this in the world of AI, all the YouTube channels, blog posts, and news articles jump on it, usually with a lot of hype, covering the surface-level features that are already in the announcement post. My goal with this channel is always to go deeper than that and really explain what you need to know as a developer, and how to translate these updates into what they mean for building real AI applications. Hopefully I can go a little deeper than most of the other stuff you'll find online.

But first I want to cover the most important things you need to know as a developer right now: seven quick points. If you only get one thing out of this video, it should be this; then we'll dive into some examples.

First, backward compatibility, and where the new Responses API stands next to the Chat Completions API, the API we've been using basically since the beginning of working with OpenAI models. The Responses API is a superset of the Chat Completions API. In simple terms, everything you can do with Chat Completions can also be done with the Responses API, plus additional features, which we'll cover in a bit.

Second, probably the most important thing to know as a developer: the migration timeline. In the introduction video, OpenAI announced that they will eventually sunset the Chat Completions API at the end of 2026. There's still plenty of time, but you can clearly see that the Responses API is the new direction, the way forward for OpenAI, and it's going to become the new standard. As a developer you should already consider, for the projects you're working on right now, what this means and what a migration timeline could look like, and for new projects I would already recommend building around the Responses API. Keep that in mind.

Third, what are those new features? These are the features we'll cover in this video, so quickly, at a high level: there's a simplified interface for different types of interactions. When OpenAI introduced the Chat Completions API it was really their first time doing this, and everything was based around chat and text; now we have multimodal capabilities, and they've really simplified the interface so it makes a lot more sense. There's also native support for web search, an amazing feature, so we can browse the web. There's a new developer role, so next to system, user, and assistant you now also have the developer role; we'll get into what that means. There's improved support for reasoning models, with parameters we can tweak to control them. We have built-in file and vector search functionality, and there's simplified conversation state management.

Fourth, with this new Responses API they also introduced some tools that tie into the new features: web search, file search, computer use, and function calling. I will not dive into computer use in this video, so if you're looking specifically for that, this is not the video; everything else we will cover.

Fifth, when to migrate: for new applications and new projects, start with the Responses API right now; watch this video, understand what it's all about, and use it for those. For existing applications that are running in production, begin planning a migration, but there's no immediate urgency; you probably still have about two years. Test the new API in parallel with your existing implementation to really understand the differences and how it might affect your outputs.

Number six, a really important one: the API structure changed, but the same principles of AI engineering and building applications remain the same. We're still using the same models; for example, we're still using GPT-4o, and that model did not change and does not have new features. The new features are in the API: things that previously required, for example, multiple LLM calls, a separate service, or a separate function in your code can now be done in a single API call to OpenAI. There are both pros and cons to this, which I'll outline in this video. While it overall makes AI engineering simpler, it also introduces more abstractions, because OpenAI is trying to take control of everything and vendor-lock you in by being your one-stop shop for all things AI engineering. The fundamental patterns of retrieval, tools, and memory management still apply and remain the same. There are no new capabilities here that you couldn't build one week ago; one week ago it would just mean writing some extra logic, some extra functions, to make it happen, and now we can do it with a simple API call. That's really what you should understand, because otherwise this update can feel very overwhelming, like there are all these new features and capabilities and you wonder how they apply to your existing projects. You're probably fine; it's just an easier way to build these applications if you were to start from scratch.

Finally, number seven, agents: OpenAI also announced their new Agents SDK, which is going to replace Swarm. You can find more about this in the dedicated documentation, and there's a new open-source Python library, `pip install openai-agents`, that you can use to build agents. This is similar to what Pydantic AI, LangGraph, CrewAI, and plenty of other agent frameworks do. The Agents SDK is also beyond the scope of this video, so if you're specifically looking for that, this is not the video either; we'll first cover the Responses API and its fundamentals, and in a later video I'll most likely get into what the new Agents SDK is all about and whether I would recommend using it.
All right, now let's dive into some practical code examples. I'm here in the responses folder, starting in the introduction file. If you want to follow along, all of the files also contain links to the official documentation that you can reference.

Let's start with a really quick example to compare the Chat Completions API with the new Responses API. For this I'm going to assume you're already familiar with the Chat Completions API, that you've worked with it and understand it. So we can do a simple API call using GPT-4o, "write a bedtime story", and we get the response. Now let's look at what that looks like with the Responses API: instead of `client.chat.completions.create` we can use `client.responses.create`, so they've simplified the response creation by removing `chat`. Another thing we immediately notice is that we now have this `input` parameter. In the Chat Completions API, the only way to pass data to the model is via the `messages` object, which had to be structured as dictionaries with a role and content, and then you could build up a message or conversation history. With the Responses API they've simplified this, and this is really a key insight: OpenAI figuring out that there's more to working with these AI models than just chat. A lot of the time it's just one sentence, one piece of information you want to send to OpenAI, rather than a complete conversation. So it now supports both. Let's run this, and we get the same kind of response: we pass the model, and as the input "write a one-sentence bedtime story", and we read the result from `response.output_text`.

They've also simplified the extraction of the key content you want. I always found it a little weird that you had to go through `choices`, then the first message, then the content; it just seemed a little verbose, a little extra, where now you just get the response and read `output_text`. Here you can see the output from the model. Like I've said, the Responses API is a superset of the Chat Completions API, so everything you could do there is also possible here. Here's an example of plugging in an image: for the content we pass an image, and we can have the model describe what it's about. It also supports streaming by setting `stream=True` and putting in the input; let me run this basic example so you can see that we stream the output. So that's the very first introduction to the difference between the Chat Completions API and the new Responses API.

All right, now let's cover example number two in the text prompting file. Next to inputs now being either a single string or a list of messages, we also have a new role, namely the developer role, and there are two ways we can use it. The first is by setting `instructions`, a new parameter: here, for example, we could say "Talk like a pirate", and this would be an instruction coming from the developer to the model. It's similar to setting the role to developer in a message list, the previous method of providing messages to the Chat Completions API; the `input` parameter still accepts that kind of message sequence, but you can also pass a single string and use `instructions`. Let's run these examples side by side to get a feel for it: the first example is "Talk like a pirate" with GPT-4o, and then a question about semicolons in JavaScript.
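The comparison above can be sketched like this. This is a minimal sketch based on the openai Python SDK; actually running the calls needs `pip install openai` and an `OPENAI_API_KEY`, so they're wrapped in a function, and the model name and prompt are just the ones used in the video:

```python
# Minimal sketch: the same request in both APIs.
PROMPT = "Write a one-sentence bedtime story about a unicorn."

# Chat Completions: input must be a list of role/content messages,
# and the text comes back nested under choices[0].message.content.
chat_request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": PROMPT}],
}

# Responses: `input` accepts a plain string (a message list also works),
# and the text is exposed directly as response.output_text.
responses_request = {
    "model": "gpt-4o",
    "input": PROMPT,
}

def run_both() -> tuple[str, str]:
    from openai import OpenAI  # imported here so the sketch stays importable
    client = OpenAI()
    chat = client.chat.completions.create(**chat_request)
    resp = client.responses.create(**responses_request)
    return chat.choices[0].message.content, resp.output_text
```

Note how the Responses request drops both the `messages` wrapper on the way in and the `choices[0].message.content` digging on the way out.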
We get the response, and it would be exactly the same as setting the role to developer with the content "Talk like a pirate", but you can already see this is a lot more concise, a lot easier to read and manage than spelling out the roles and instructions. We can print the response, and again a pirate explains semicolons in JavaScript.

Now here's something important: because of this new developer role, you might be wondering what the hierarchy is between the system prompt, the developer, the user, and the assistant. OpenAI has an article on this, specifically the section called the chain of command, where they explain the hierarchy of how different pieces of information are combined into a final response. There's a hierarchy where, for example, a developer message overrides a user message, and a platform message, which is what OpenAI uses behind the scenes, overrides a developer message. The article is really worth reading in full to better understand how the API and the models work behind the scenes.

Let's see a practical example of what that could look like. We use the Responses API with GPT-4o, and there's a system prompt, a developer prompt, and a user prompt. The system prompt says "Talk like a pirate" and the developer prompt says "Don't talk like a pirate". If we run this and print the response, you'll find that it talks like a pirate, because the system prompt overrides the developer prompt. Now we can turn it the other way around, so the system prompt says "Don't talk like a pirate" and the developer prompt says "Talk like a pirate", and you'll find it's exactly the other way around: the model doesn't talk like a pirate, it just gives a regular answer, saying semicolons are technically optional in JavaScript.
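The two ways of passing developer instructions can be sketched as follows. This assumes the openai Python SDK; the function names are mine, the calls need an `OPENAI_API_KEY` to run, and the prompts are the ones from the video:

```python
# Developer instructions outrank user messages in OpenAI's chain of command.
pirate_messages = [
    {"role": "developer", "content": "Talk like a pirate."},
    {"role": "user", "content": "Are semicolons optional in JavaScript?"},
]

def ask_with_roles() -> str:
    from openai import OpenAI  # imported here so the sketch stays importable
    client = OpenAI()
    response = client.responses.create(model="gpt-4o", input=pirate_messages)
    return response.output_text

def ask_with_instructions() -> str:
    # Equivalent shorthand: `instructions` is treated as a developer message,
    # leaving `input` free to be a plain string.
    from openai import OpenAI
    client = OpenAI()
    response = client.responses.create(
        model="gpt-4o",
        instructions="Talk like a pirate.",
        input="Are semicolons optional in JavaScript?",
    )
    return response.output_text
```

Both calls should produce the same pirate-flavored answer; the second form keeps the call site shorter when you only have one user message.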
So there is this new developer role category to play with, but the system role is also still there. What this means for you as a developer is that next to the system prompt, where you control your application, you can also give more granular instructions in the developer role, knowing that the system prompt will override it. This could come in handy in situations where you don't want to overload your system prompt, but instead keep it lean and minimal and, for some occasions, add a developer prompt with specific instructions to make a certain situation, API call, or prompt more specific to the context at hand.

Real quick: if you're a developer, you have some technical skills, and you're considering starting as a freelancer to take on side projects, learn more, and make a bit more money, but you don't really know where to start or how to land that first client, you might want to check out the first link in the description. It's a video of me going over how my company can help you with this. We have a community with over 100 data and AI professionals, and we're all there to make more money, work on fun projects, and create freedom. If that sounds like you, feel free to check it out, and I might see you in the group.

All right, now let's get into example number three: the new conversation state. Let's start with a simple example again. Before the Responses API, this is how you would manage your conversation and message history: you'd have a sequence of user, assistant, user, assistant back and forth, and every time you made a new API call, the goal was to gather all the relevant context, build the message sequence, and send it to the API. This still works, as you can see: it follows up on "knock knock", "who's there", "orange", and the AI replies with "orange who?", correctly showing that it has the context of the joke it's in.
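The old, manual way of carrying context can be sketched like this, assuming the openai Python SDK (the function name is mine and the call needs an `OPENAI_API_KEY`):

```python
# The pre-Responses pattern: you assemble the full history yourself
# and resend all of it on every call.
history = [
    {"role": "user", "content": "Knock knock."},
    {"role": "assistant", "content": "Who's there?"},
    {"role": "user", "content": "Orange."},
]

def continue_joke() -> str:
    from openai import OpenAI  # imported here so the sketch stays importable
    client = OpenAI()
    # The Responses API still accepts a message list as `input`,
    # so this pattern keeps working unchanged.
    response = client.responses.create(model="gpt-4o", input=history)
    # To keep the conversation going you append the reply and resend:
    history.append({"role": "assistant", "content": response.output_text})
    return response.output_text
```

The cost of this pattern is that every call resends the whole transcript, but the benefit is that you can see exactly what context the model received.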
Now here's the new way to go about this: using the conversation state. With the Responses API, we ask GPT-4o mini to tell us a joke and we get one back. Now, if we make another API call and set `previous_response_id` to the response ID of the previous interaction with the model, we can reference that earlier exchange. Looking at the object we got back from OpenAI on that call, it contains a unique ID that is stored on the OpenAI platform and that we can now reference. If we then ask the model to explain why this is funny, notice that this is a single API call: we don't provide any of the previous context, we just ask it to explain why this is funny, and it should explain our previous joke, "why did the scarecrow win an award?". It answers that the joke plays on the pun involving the word "outstanding", with its double meaning. So it correctly referenced the context of the original joke even though we didn't pass it to the model; it was able to retrieve it using the response ID.

What's important to note here is that this storage is on by default. We can pass the parameter `store=False` and the response will not be saved to the OpenAI platform, but if we leave the parameter out, it will be stored and we can use that ID in later calls to pull the conversation back up. Overall that makes your life as a developer easier, but it can also be tricky for debugging, because everything is stored on the OpenAI platform, and building this explicitly yourself can have an advantage from a developer's perspective when someone is going through the codebase: a person new to the project might wonder where the model is getting its context from, where it is stored, and for how long. Those are all things you should really be familiar with before using this response ID to manage conversation history.
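The server-side conversation state can be sketched as two chained calls, assuming the openai Python SDK (function name mine, needs an `OPENAI_API_KEY` to run):

```python
def joke_then_explain() -> str:
    from openai import OpenAI  # imported here so the sketch stays importable
    client = OpenAI()
    first = client.responses.create(
        model="gpt-4o-mini",
        input="Tell me a joke.",
        # store=True is the default: OpenAI keeps the response server-side
        # so later calls can reference it. Pass store=False to opt out.
    )
    follow_up = client.responses.create(
        model="gpt-4o-mini",
        input="Explain why this is funny.",
        # No history is resent; the context is pulled up via the stored ID.
        previous_response_id=first.id,
    )
    return follow_up.output_text
```

Compare this with the manual-history pattern: the second call carries no transcript at all, only the ID, which is exactly why you need to know where and how long OpenAI retains it.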
All right, number four: function calling. I'll go through this one quickly because it's essentially exactly the same as before. We specify a function in the form of a tool; I'm not going to dive into the specific syntax because I've highlighted that in other videos, but it's exactly the same as with Chat Completions. You specify the model and the input, and then you pass the tools. Here's a tool that sends an email, and we ask "can you send an email to Elon and Katja?". Looking at the response, we get two objects back: one email to Elon and one to Katja. So for function calling nothing really changed: you just specify the tools and add them to the API call.

Then number five: structured output, because any serious AI application needs structured output to some degree. Let's cover the two approaches you can use, and I have a little sneak preview here, because I really had to do a lot of digging for this. Whenever I use structured output, I typically either use the Instructor library or OpenAI's structured output, but with the Pydantic way of getting structured output rather than specifying the JSON schema. If you go into the documentation right now, you can't find the Pydantic documentation for doing that with the Responses API, but it still works; I found a message from an OpenAI dev and I'll share it here as well. So there are two ways to get structured output from the Responses API: by specifying a JSON schema, or by using a Pydantic model.
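A function tool for the Responses API can be sketched like this. Note that Responses flattens the tool definition (`name` and `parameters` at the top level, not nested under a `"function"` key as in Chat Completions); `send_email` is a hypothetical function of ours, and the actual call needs an `OPENAI_API_KEY`:

```python
# Tool definition: the model never runs this, it only emits a call to it.
send_email_tool = {
    "type": "function",
    "name": "send_email",  # hypothetical function on our side
    "description": "Send an email to a given recipient.",
    "parameters": {
        "type": "object",
        "properties": {
            "to": {"type": "string"},
            "subject": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["to", "subject", "body"],
        "additionalProperties": False,
    },
}

def request_emails() -> list:
    from openai import OpenAI  # imported here so the sketch stays importable
    client = OpenAI()
    response = client.responses.create(
        model="gpt-4o",
        input="Can you send an email to Elon and Katja saying hi?",
        tools=[send_email_tool],
    )
    # With two recipients we expect two function_call items in the output.
    return [item for item in response.output if item.type == "function_call"]
```

Each returned `function_call` item carries JSON arguments that you parse and feed to your own `send_email` implementation, exactly as with Chat Completions.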
In the JSON-schema approach, we again specify the model and the input, and then format the output via the text configuration. In this case we define a calendar event with the properties name, date, and participants, and we use it to look at a message and extract the name, the date, and what the event is about from that specific text. This is similar to how it worked with the Chat Completions API. We can run it, load it, and look at the event: we get a JSON object with name "Science Fair", date "Friday", and participants as a list with Alice and Bob.

But like I've said, I always prefer to do this directly with Pydantic models, so instead of this very verbose JSON schema definition, let me just define a Pydantic model and use that. It still works: here's the link where someone asked why the Pydantic structured output is missing from the Responses API docs, and, shout out to Steve from the OpenAI team, he linked to a GitHub example that still works. What it looks like is that you can just pass your Pydantic model: we have the model, the input, the instructions, and then the text format set to the calendar event. We get the response back, and with a little bit of digging into the returned object we can get the actual Pydantic model out: the exact CalendarEvent model we specified.
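Both structured-output routes can be sketched as follows, assuming the openai Python SDK (the `responses.parse` helper with `text_format` is the Pydantic route from the GitHub example mentioned above; function names are mine, and running either needs an `OPENAI_API_KEY` plus, for the second, `pydantic` installed):

```python
# Way 1: a raw JSON schema passed via the `text` configuration.
calendar_event_schema = {
    "format": {
        "type": "json_schema",
        "name": "calendar_event",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "date": {"type": "string"},
                "participants": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["name", "date", "participants"],
            "additionalProperties": False,
        },
        "strict": True,
    }
}

def extract_event(text: str) -> str:
    from openai import OpenAI  # imported here so the sketch stays importable
    client = OpenAI()
    response = client.responses.create(
        model="gpt-4o",
        input=f"Extract the event information: {text}",
        text=calendar_event_schema,
    )
    return response.output_text  # a JSON string matching the schema

# Way 2: a Pydantic model via the parse() helper.
def extract_event_pydantic(text: str):
    from openai import OpenAI
    from pydantic import BaseModel

    class CalendarEvent(BaseModel):
        name: str
        date: str
        participants: list[str]

    client = OpenAI()
    response = client.responses.parse(
        model="gpt-4o",
        input=f"Extract the event information: {text}",
        text_format=CalendarEvent,
    )
    return response.output_parsed  # a validated CalendarEvent instance
```

The Pydantic route does the schema generation and the JSON parsing for you, which is why it is so much less verbose than way 1.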
All right, number six: web search. Really cool and very simple to use. There's now a tool called web search, which we enable by specifying the tools and setting the type to `web_search_preview`, and we just hand that to the API. So with the Responses API and GPT-4o, we ask what the best restaurants around the Dam are, which is a square here in Amsterdam. I run this and print the response, and it takes a little longer because it's actually performing searches and browsing the web. Here you can see "the Dam, located in the heart of Amsterdam", and then a couple of options to choose from. If you go into the documentation, you'll find there are some more options to play with. One that can come in handy is specifying the user location, because with searches, especially web searches, the results you get really depend on your location. I can run that and we'd get a similar answer.

What's also cool is that the annotations are included. Let me pull them up: you can see all of the sources that were used, each with a URL, so if I look at the first object, you can literally see which source the answer was based on. It's really cool that we now have this feature right out of the box in the OpenAI API.

Okay, number seven: file search, semantic search straight from the OpenAI platform. I'm not sure what I think about this one. With this approach you can essentially perform RAG entirely within OpenAI, but there are already so many great tools and open-source vector databases that can do this as well. Doing it directly in the OpenAI platform definitely makes things a lot easier for a lot of developers, but if you're serious about building AI applications, I would be hesitant to use this feature out of the box, because it's pretty straightforward to build yourself and you'll have a lot more control. As you'll find, some things you simply have no control over: you're not really sure where your data is, how it's chunked, et cetera. But it's cool that it's there nonetheless. Let's have a look at how this works; for this we use a little more than just the Responses API.
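The web search call, including the user-location option, can be sketched like this, assuming the openai Python SDK (function name mine; running it needs an `OPENAI_API_KEY`, and the annotation access path reflects the documented output shape):

```python
# Web search as a built-in tool; user_location biases the results.
web_search_tool = {
    "type": "web_search_preview",
    "user_location": {
        "type": "approximate",
        "country": "NL",
        "city": "Amsterdam",
    },
}

def search_restaurants():
    from openai import OpenAI  # imported here so the sketch stays importable
    client = OpenAI()
    response = client.responses.create(
        model="gpt-4o",
        input="What are the best restaurants around the Dam?",
        tools=[web_search_tool],
    )
    # The answer is in output_text; cited sources live in the message item's
    # annotations (url citations with a url and title).
    message = next(item for item in response.output if item.type == "message")
    annotations = message.content[0].annotations
    return response.output_text, annotations
```

Iterating over `annotations` gives you each source URL the model based its answer on, which is what the video inspects in the first object.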
We first upload a file, create a vector store, chunk it, and then perform similarity search on it using the Responses API. These examples are all from the OpenAI docs, and I think the best way to learn this is to experiment on your own and run the code, but let me run you through an example.

First we create a file. We have a URL here, and this could be any file or URL. In the OpenAI platform you can see all your files, and you can see I just uploaded this deep_research_blog.pdf; it's a public PDF, and you can plug in any URL or file path and it will be uploaded to OpenAI. So in the platform we now have the PDF. Then we need to create a vector store; again, there's documentation on what that looks like. We don't have one yet, but we can use the API to create a knowledge base, and if I refresh, we have one. Then we add the file to the vector store using the file ID we stored earlier; we can run that, check the status, and it's probably already complete. Looking at this knowledge base, you can see it contains the file deep_research_blog.pdf, along with the usage, the size, et cetera.

Please be aware this costs money: whenever you have an active knowledge base you will be billed per day, so if you run some experiments and don't plan on using it afterwards, make sure to delete it.

Now we can use the Responses API and ask "what is deep research by OpenAI?", with the tools set to type file search, and we have to plug in the vector store ID, the ID of the knowledge base we just created. Let's run that and see what it looks like: we use the Responses API, ask the question, and give it the file search tool, and because it has that context, it decides to use the knowledge base to perform a similarity search. Then you can see in the returned text that it tells you what deep research is, using information from that blog post.

One thing I always find a little tricky here: we have this set up, but we don't know how the document was chunked, and we don't know what embedding model is being used. So while it's very simple and straightforward to get started, you also lack a lot of control as a developer. I have a lot of videos on this channel on how to build RAG systems: it's all pretty straightforward to set up, but it's really hard to scale it, do it well, and avoid hallucinations at scale as your knowledge base grows, and with this you're going to run into similar issues, especially since you don't have that much fine-grained control. What you can do to limit results is, for example, set the maximum number of results, the chunks retrieved from the knowledge base; we can set that to two. We can also include the search results in the response; this is some syntax you should just get from the docs, or copy from here. If we set it like this and print the response, same question, "what is deep research by OpenAI?", it runs and prints, and you can see the results: two chunks, because we set the max number of results to two, each with a score and the file name, deep_research_blog.pdf.
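The whole pipeline can be sketched end to end as follows. This assumes a recent openai Python SDK where vector stores live at `client.vector_stores` (older versions exposed them under `client.beta.vector_stores`); function names are mine, and running it needs an `OPENAI_API_KEY` plus a real PDF path:

```python
def file_search_tool(vector_store_id: str, max_results: int = 2) -> dict:
    # Tool spec for the Responses API; max_num_results caps retrieved chunks.
    return {
        "type": "file_search",
        "vector_store_ids": [vector_store_id],
        "max_num_results": max_results,
    }

def build_and_query_knowledge_base(pdf_path: str, question: str) -> str:
    """Upload a file -> create a vector store -> query it via file search."""
    from openai import OpenAI  # imported here so the sketch stays importable
    client = OpenAI()

    # 1) Upload the file to the OpenAI platform.
    with open(pdf_path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="assistants")

    # 2) Create the vector store ("knowledge base") and attach the file.
    store = client.vector_stores.create(name="knowledge_base")
    client.vector_stores.files.create(
        vector_store_id=store.id, file_id=uploaded.id
    )
    # NOTE: active vector stores are billed per day; delete when done:
    # client.vector_stores.delete(store.id)

    # 3) Query it through the Responses API.
    response = client.responses.create(
        model="gpt-4o",
        input=question,
        tools=[file_search_tool(store.id)],
        include=["file_search_call.results"],  # surface chunks and scores
    )
    return response.output_text
```

With `include=["file_search_call.results"]` the retrieved chunks, each with its score and file name, come back alongside the answer, which is what the printed response in the video shows.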