How ANYONE Can Use OpenAI’s New Realtime API (even if you can’t code)
Mark Kashef
In this video, we're diving into one of the newest and most exciting developments in AI - OpenAI's R...
Video Transcript:
One of the newest shiny objects that OpenAI just released to the world is the Realtime API, which is basically a fancy way to use their advanced voice systems in any application. A lot of people, including myself, really want to try it out and see how it stacks up against the competition in the voice agent space. So I spent the majority of this weekend putting together the code it would take to bring this to fruition and, more importantly, creating a way to make it accessible for everyone to tinker and iterate using the Realtime API.

In this video, I'm going to give you step-by-step instructions that don't require you to understand a single line of code. I'm going to let you plug and play into my code: just enter a few keys, set up a few accounts, and be up and running in a matter of minutes. More specifically, I'm going to cover a quick explanation of what the Realtime API actually is, the simple process I've put together to get your own AI phone agent up and running, how to set up everything using just a few clicks and some basic account creation, and what you can actually do with this setup.

If you don't know who I am, my name is Mark, and I've been running my own AI automation agency called Prompt Advisors for the past two years. I have over a decade of experience in AI, and I've had my fair share of experiences with voice agents. I'm now going to jump straight into some slides to make this concept and process as easy as possible for you, regardless of your technical background. So let's dive right in.

All right, so if you're getting started with the Realtime API, there are a few key things you should be aware of. So why should we care in general about this new thing called the Realtime API? First of all, it's one of the few times that you can actually talk to OpenAI's advanced voice systems outside of the ChatGPT mobile app, and the goal of this is ultimately to create conversational experiences that are as real time and as humanlike as
possible.

Now, the next thing you need to know is that they've built it using what's called a WebSocket infrastructure. If you don't know what that means, don't worry about it. It just means that these conversations are programmed to be live: you send a response, and because that response doesn't have to be translated into text and back into speech, you can take advantage of that space in the middle to lower the latency between you saying something to the AI and you getting a response in real time, which is why it's called the Realtime API. And the last component is that there are some speech-to-speech models out there, there's a couple by Google, but this is one of the few that are probably commercially available and accessible to the general public.

So to make this possible, two companies worked together on this specific project or API: OpenAI and a company called Twilio. Twilio provides the calling infrastructure, whereas OpenAI provides the brain, the LLM, and the infrastructure to actually maintain a back-and-forth speech-to-speech conversation. Now, if you've never heard of Twilio, it's pretty much a company that specializes in any form of inbound or outbound calling, or even emailing using their product called SendGrid. It's very developer friendly: you can buy a number and have some form of service or automation be kicked off from an inbound or an outbound call from that number, and it really facilitates that calling part of the infrastructure.

All right, so how this works in a nutshell: imagine you are making an inbound call to this number. You call the number that you've purchased on Twilio, and then on Twilio you have a way to kick off this automation through a trigger. That trigger then talks to a microservice. A microservice could be literally any cloud provider that has the code that I'm going to give you to be able to call the Realtime API. So then it triggers that conversation to be executed by the Realtime API,
and then you send speech to OpenAI's large language model that's facilitating this conversation. You create this feedback loop between the speech being sent and the response being sent until that conversation comes to an end. Then it sends a notification back to, in this case, Replit saying, hey, it looks like the conversation's dead, and then Twilio knows the conversation is dead as well, so this whole process stops. But you can see here that without something like Twilio, you have no means to start that conversation using something like a landline phone or a mobile number.

Now, I'm giving a bit of a spoiler, but if I'm answering the question, is this better than existing voice agent platforms? After going back and forth on the code for hours and trying it for different use cases, I can definitively say that I don't think so, and it's not ready for showtime just yet, especially for AI voice agent use cases. I'm not going to go into too much depth as to why just yet. Once we actually build this out, I'll give my thoughts near the end of the video and go more into the nitty-gritty. But at a very high level, the cost is very prohibitive. It's anywhere from 20 to 30 cents a minute, depending on how many messages are exchanged and how long those messages are, and I don't like the variability of that cost. In addition to that, there's a lot of randomness in the calls. Sometimes, depending on how you configure the parameters, the AI ends up talking to itself, or it ignores you when you try to interrupt it, irrespective of how you play with those parameters. For me that's not a very useful experience, and with other providers in the voice space such as Bland, Vapi, Synthflow, etc., this is not something you have to deal with anymore. So if you are a business owner and you were waiting for this moment to be able to use the Realtime API for an inbound receptionist use case, then honestly, at this point in time there are better options. But just like everything, it's
worth knowing how to trial this and build it out so you can see for yourself and double-check whether, for your use case, this could be a viable and scalable way to operate.

Now, if you have zero technical expertise whatsoever and you've seen other tutorials out there, you'll see they're very code intensive, because honestly, the way this API was designed, it's not for the faint of heart when it comes to putting it together. So if you're telling me, hey, help, I don't know how to code, I'm happy to tell you that my goal today is that I just need you to click some buttons, pay for Twilio if you already have a Twilio account and you need to buy a new number (otherwise you could probably get away with using it for free), and get and fund an OpenAI key. That's the only guaranteed cost you have, because the conversation back and forth is going to use the funds on your OpenAI key, so that's going to be pivotal to making this work.

Now, in terms of the workflow, I'm going to simplify this conceptually before we dive in, so you're fully aware of what we're doing without having to read the code or copy-paste the code and ask ChatGPT to break it down for you. Pretty much what we're going to do is you're going to fork my Replit code. What that means in plain English is you're going to take a link that will be available for you via the Gumroad link in the description down below, you're going to click on that link, you're going to click on Fork, and assuming you have a Replit account, it will copy my code to your account. If you don't have an account, just sign up for one. You'll be able to run this within the environment for free, but if you want it to be available and online on an ongoing basis, you will need to subscribe to their $25 a month plan, which is called the Core plan.

The next step is we're going to have to create a Twilio account, which is something we already talked about. On
that Twilio account, if you've never made one, you'll have to confirm your identity, and that's a part you'll have to do on your own time. Assuming you have an ID at your disposal, like a driver's license or passport, it should only take you 5 to 10 minutes. And again, that's only if you already have a Twilio account and you need to upgrade to be able to buy more than one number. If you have a paid account, you'll need to buy a new number; otherwise, a free account comes with a free number and a small number of credits that you can use.

Then, assuming you want it to be ongoing and on at all times, we'll have to deploy this Replit code so you can call the number that you bought on Twilio at any time and have it pick up and talk to your AI agent. Otherwise, if you purely just want to experiment with this, you can get away with not deploying this code, meaning again you don't have to pay that fee to deploy this on a server. If we do deploy this code, you'll get a URL that ends in .app. This is what we'll have to put into the Twilio webhook, which is the trigger I referred to on the previous slide. So in Twilio, using the name of the app that we're going to deploy in Replit, this is going to be the part that sends the notification to that service in Replit to tell it, hey, start the back-and-forth conversation.

Once this system is ready to go, the workflow is going to be very simple. You call the number that you bought on Twilio. Twilio is going to route that call to Replit, which is what you're going to use my code for. Replit's going to process that call, and OpenAI is going to go back and forth, generating responses in real time, until the conversation is done.

To get started, all we need again is a Twilio account and an OpenAI account that is funded with credits. It doesn't have to be too much, could be $5 to $10. You'll learn the hard way that after you have 5 to 10 minutes' worth of conversations, you're
going to use that credit up pretty quickly. And last thing: I'm going to be using Replit because that's my favorite service to use. If you want to be able to deploy it, it'll be 25 bucks; otherwise, you can test it out by clicking Run and running it in memory for free.

All right, so I've set the stage, and now we'll dive into how to do this process step by step in just a few minutes. The end result we're going for is a system that's live, where you can go into this command center that I've set up, a very basic front end that's super non-developer friendly, and you can say, "You act like you are a zookeeper instructor doctor." No idea where that came from, it's random, but it'll work. You choose a voice; I'll choose Shimmer here. You choose a silence timeout, which is pretty much how long the system waits after you stop talking before it starts talking. And then you have a threshold, which is how sensitive it is to background noise or noise in general. There's no real end to the scale: it goes from like zero, and you can make it like 1, 1.5, etc. I noticed that 0.4 was okay, but in general I found that it wasn't very good at being interrupted. Sometimes OpenAI will steamroll you, but again, that's for later. We can just set it up for now. Ideally, you can click on Update Configuration, it'll say it was configured successfully, and it'll be updated in real time, so you don't have to go into any code, change any prompts or parameters yourself, and redeploy, especially if those words scare you. You can just call the number, and ideally this is applied in real time.

"Hello, welcome to our virtual zoo experience. I'm your zookeeper instructor today. How can I assist you on your journey through the animal kingdom?"

"Yeah, can you tell me how I should talk to the penguins, please?"

"When talking to penguins, it's important to be calm and gentle. Penguins are very social and curious birds."

All right, so you get the idea. It applied instantly, so you can have back-and-forth conversations very quickly, iterate, and see what's possible from your mini command center here without building it yourself. I spent the hours to build it, so let's see how we can bring this to reality.

All right, so all I'm going to do is click here, click on Copy in this case, and we'll just copy from here. To set it up, we'll go into my incognito window. I'm going to log in using an account that doesn't have a paid plan, so I'm just going to sign in here. When you go in, if you've never had a Replit account before, you'll see a greeting like this, and you will see that you are on the Starter plan. You just have to fill out this little thing. I'll just say Mark Kash, and then I'll say for work, I am intermediate, and then you click your role, whatever. Then it tells you, hey, if you want to use Replit Core billed annually, it's this much per month. Obviously, if you don't want to do annually, you can do per month. It will be 25 bucks if you go with
the monthly plan, and that lets you have unlimited Repls, do a lot of deployments, and you get credits for those deployments. So I will just click here and go with Starter. For the paid plan, I'll show you my actual account. What we want to do here is fork the Repl that I put together. You'll find this link via that Gumroad link in the description below once again. If you click on it, you will see something like this that says Fork. You want to click on Fork and then call it whatever, so you can call it the same thing. You can fork it, and then you'll get a little tour of the app, so I'm just going to skip that myself for now. You'll see all the code that I've put together here, and there should be two main files, config.js and index.js, and then I have the HTML that actually builds that command center.

So the idea is, when you click on Run here, it's going to say that the API key is missing, because it's not going to transfer mine to yours, so you have to go and get your own to configure it. If we go back to my other screen, and if you go to platform.openai.com, you'll see a screen something like this. You want to go to Dashboard, then click on API keys, and then create a new secret key. We'll just call this "my voice agent trial" and click on Create secret key. Then we'll take this key and go back into Replit, and what we're going to do is scroll down into Secrets here. You'll see a little notification button with a little red dot; that means it's missing. You want to click on that secret, go to the value, paste that key, and click Add Secret. Then we're going to try to run once again. You should get a preview of the actual UI that I just showed you, and you shouldn't have any errors here. You should have something that says the AI server is listening on port 5050, which means you can theoretically send a call here once we have somewhere to send this service to.

So before we go on with this part, we now have to create an account on Twilio. So go here, and then I will click Start for free. All right, so in this case I'm going to do a Google sign-in, so I'm going to click Sign in with Google, sign in with my account, continue, and then I will just put in my phone number here. Once you're verified, you want to take a screenshot of this recovery code. I like to save it to my desktop and then click Continue. Then you want to fill out the description, so I'll just put business here, direct, developer, minimal code, something else, voice, and then we'll click Let's go. All right, and then you'll see this main landing page. What you want to do is go to Phone Numbers, click on Buy a
number, and then you'll see here at the very top that you have a trial with $15.50. It allows you to buy one number. Once you surpass that, it asks you to upgrade your account and fund it with some amount to be able to buy other numbers. So in this case, I'm in Canada, so I'm just going to go to Canada here. Well, it would be super cool if I could spell. There we go. I'm going to go into Locality, and I'm going to say the Toronto region, and then we'll search, and it will find numbers near that municipality. I'm just going to click this number and click on Buy. You have to scroll down, click on I agree, buy the number, and there you go, it's purchased, again for free. So this is the number whose calls we'll be routing to the Replit service.

All right, so now we have the Replit set up, we have the Twilio number, and we have the API key. One thing I forgot that I just want to make sure you understand: if you don't already have an OpenAI key, if we go back to Keys here and go to Usage, you'll see how much I've spent this month. You will have to make sure that your account is funded. I won't go to my billing necessarily, but you'll have to go to Settings, then Billing, and you should have some balance. In my case, I always have a top-up of 100 bucks. In your case, you can make it $5 to $10, but you absolutely need to make sure this is funded; otherwise, you'll get an error in Replit, or whatever service you deploy this on, saying insufficient funds or that you can't access the API. So it's pivotal that you make this happen. Put in $10, I would say, with an automatic top-up once you get down to $5, and then you should be good to go.

Okay, so now that we have everything, we're going to take a shot at taking this number and testing this out for free before I show you what we need to do to actually deploy. If you go into Replit, you'll see that if you click on New Dev tab, you get that command center that we saw on my other screen, which should work as long as you have this running. So the whole point of
paying to deploy it is that technically it always stays on. Otherwise, if you stop it and refresh, you'll see that this won't work, because the server is not actually live.

So like I promised, we want to just click things and paste things. What we want to do is go into Replit, take the new tab, and click on Copy link address. In Twilio, in here, just make sure this is set to Webhook rather than TwiML Bin, Function, etc. Make sure that "A call comes in" is set to Webhook, and for "Primary handler fails" make it Webhook again. Then make sure this is HTTP POST, this is disabled, and these are both POST. Then, for what you want to put in this URL, you want to delete it and paste in what we just copied from Replit. Now you'll notice this very annoying and weird double slash at the end. We want to take one of them out and replace it with incoming-call. Why do we have to add that? It's somewhere in the code. It's called a route, and it basically creates the bridge between Twilio and the Realtime API. If you're a developer, take a look at the code and you'll find it. Otherwise, now we just go down and click on Save configuration, and now that's all saved. So technically, when you call this number, it should send a webhook to Replit, and assuming that Replit is on, it should be able to have that conversation. Now in here you'll see if you click on this file called config.