How to Build an AI Voice Agent using OpenAI Real-Time API

Bart Slodyczka
In this video, I will show you how to build and deploy an AI Voice Agent using OpenAI's new Realtime...
Video Transcript:
"Hi, you have called Bart's Automotive Center. How can we help?" "Hi, I want to book my car in for a service." "Of course, I'd be happy to help with that. May I have your name, please?" "Yes, my name is Bartholomew." "Thank you, Bartholomew. What type of service or work does your..." "I just need a regular service and probably a brake check as well." "Got it, a regular service and a brake check. When would be a convenient time for you to bring your car in?" "Next Tuesday at 1 p.m." "We can definitely schedule you for Tuesday at 1 p.m. Your appointment is all set. We look forward to seeing you then, Bartholomew. If you have any other questions or need to make changes, feel free to let us know." "All right, see you."
Okay guys, that was an AI call with the OpenAI Realtime API. Let's get into the video.
Hello legends. In this video we'll be building an inbound AI voice agent using the new OpenAI Realtime API. OpenAI recently introduced the Realtime API, and it is super valuable for speech-to-speech experiences. As you saw on the phone call before, I was able to dial a number and speak with an AI assistant to book a time for a car service. Now, a lot of people are saying the pricing for the voice API is pretty expensive, but I think it is going to come down relatively soon. The demo you just saw was basically using Twilio to generate a phone number and then make the connection with the Realtime API, and that's a direct competitor to tools like Vapi or Bland. It might not be as good on the back end for creating those experiences, but with OpenAI you can now create a chat assistant, plug it into your email to reply automatically, use the API to do completions, process files, review data, and analyze your Excel sheets, and now you also have a voice API. So if you wanted to, you could use OpenAI for your whole ecosystem of AI integrations and all of your AI needs. While it might be a little expensive, and maybe not as easy to use as the dedicated AI tools, the trend looks like OpenAI is heading toward a full-service experience where people who know how to code, and people who don't, can integrate AI into their workflow. So this is pretty exciting, and yes, it's a little expensive now, but I think it's going to come down in the near future. Now, the final thing I want to
point out is that on that demo call, pretty much as soon as I finished speaking I already had a response coming back from the API. That's thanks to the Realtime API letting you create a persistent WebSocket connection. For real-time communication like live chat, or now this voice API, the WebSocket is the main mechanism that lets you communicate instantaneously. A regular API call, where you're speaking with the OpenAI assistant, is like sending a postcard: I send the postcard to you, you wait for it to be delivered, you get it, you write your response, you send a postcard back to me, and I've got to wait for that delivery time. The WebSocket removes that delivery time by pretty much creating a portal between you and me, where as soon as I send a message you immediately get it. There's no lag and no downtime. That's how this communication is real time: by using this WebSocket. Another thing to point out is that you can still do function calling with this API.
Now, coming back to the demo I just showed you: I created a GitHub repo, and I'm going to show you exactly how to import it into Replit, how to deploy it, and what settings you need to change. It will probably take you about 5 minutes total. Then I'm going to show you how to create your Twilio account, add a number, and do everything you need to plug the two together, so that within the next 5 to 10 minutes you can go on your phone and call your own AI assistant. Watch the entire video and that will all be explained, but first I'm going to quickly show you the functionality of the assistant I created. As you saw before, it takes inbound phone calls: we use Twilio to generate a phone number, we plug that phone number into the API, and then you can call it on a phone and speak with it. Then I've
prompted it to capture some customer information: the customer's name, their availability, and whatever service work they need. As you can imagine, you can adjust that prompt to whatever your business needs are. One thing I haven't shown you yet is that I plugged it into Make.com, so that I can take the data from a phone call and do whatever else I want with it. In Make.com, as you already know, you can plug into any of your other tools and create different flows and automations. So I'm basically sending those details to Make.com, and for my example I'm just updating a Google Sheet. That's what I've got here, with three columns: name, availability, and notes. This was the most recent one, where I wanted to book an appointment for 1 p.m.
on Tuesday to have a regular service and then check my brakes. As you can imagine, there are a lot of different use cases: as long as you can get that data into Make, you can do pretty much anything you want.
Finally, we have session management, so this can handle simultaneous phone calls. I think there are some limitations in the API at this stage. If I'm not mistaken, the maximum call length is 10 or 15 minutes, which I don't think you'll exceed, and if you're on a tier 1 or tier 2 account there's a limit on how many phone calls you can have at one time. The API has rate limits anyway, but this will probably be outgrown really quickly. So if you're watching this video you might hit a couple of these speed bumps to start with, but moving forward there should be a lot more breathing room, and really I don't think you're going to run into any problems. Session management lets you handle multiple phone calls at one time. If I have a phone call on Twilio and three other people are calling at the same time, the information on that Replit server is not going to get jumbled up. My name isn't going to be used for the third person, and that third person's car service isn't going to get mixed up with whatever I need done. They are all separate calls and separate sessions. It doesn't seem like much, but this is really cool to have, especially if you want to launch this for your business starting today.
We're going to be using Replit to take that GitHub repo and run it, and all the calls are going to be routed to Replit. Finally, I have the GitHub repo here, which we'll go through in a second. I'll also explain all the code, and once again
I'm going to show you how to plug it into Twilio. The Twilio side of things is going to be very easy: even if you have a brand new account, you can get this working straight away.
Okay, so just quickly, how did I build this out? I first found Twilio's YouTube video the day of, or the day after, the Realtime API was introduced. Twilio created a repo that implements this in JavaScript, which is my favorite language, and they deploy it using Node. I think they've got a repo for Python as well. You can watch that video, where they explain the code in detail and break it down step by step. I'll also explain what each part of my code means, which was built off their code, but if you want more details, definitely go there. The only thing is that the guy in that video ran the code on his personal computer; he didn't deploy it on a server like Replit. So they have an extra step where they have to expose their code to the internet so that the Twilio number can target that specific code, and that's why they're using a tool called ngrok. We're not using it, because on Replit the code is exposed directly to the internet, so we're saving a couple of steps. This is Twilio's GitHub repo. I want to give credit where credit's due, because this is what I started with and what I used to build my code. So definitely jump in and check it out. This is the main file, index.js, and if you look at the code you'll see that mine is very similar, with just a few differences. My repo is over here. I used a fictional automotive store, Bart's Automotive, and I added a couple of features to capture the user's information and send it across to Make.com. There are some key differences between this code and Twilio's code, but they are just upgrades to add the functionality I wanted.
Okay, so how do we actually launch this in Replit? Over in Replit I'm already logged in, but if you don't have an account, just sign up. You can sign up for free; otherwise it's 20 bucks a month, which is the plan I'm on, Replit Core. I've got heaps of things running and deployed in Replit. I build all of my clients' projects in here, and whenever I build code that has to be available online I only use Replit. It's a very beginner-friendly way to serve code online. So I'm going to go to "Create Repl", then "Import from GitHub", then "From URL". You'll be doing this as well. On GitHub, click the green button and copy the link to the clipboard, then back in Replit paste it in and click "Import from GitHub". When you import from GitHub, pretty much all the code I have on GitHub is transferred across into this Repl. Let's click confirm and close, and as you can see on the left-hand side, all the files from GitHub are now in this Repl. All I had to do was press that confirm button, and now I'm about 95% complete. There are just a few more things to do to get this up and running. Before that, I'm going to pull this bottom panel across to the right-hand side so my interface is a little cleaner. Back on GitHub, scrolling down, I've got a command to run, npm install, which installs all the dependencies my app needs in order to run. So I'm going to copy this command, and back in Replit I'm going to go across to Shell, paste this command in
and hit enter. Awesome, now I have all the dependencies installed. So what is a dependency? If I go to the package.json file, I've already listed all the dependencies we need for this app, so when we ran the command it looked into this file and installed everything needed to run the app. The last thing we need to do is create a new file called .env, and in the .env file we're going to put our OpenAI API key. I'm going to go back to index.js, and the variable for the OpenAI API key is this one. Back in the .env file I'll paste it in and add an equals sign. Now, in my OpenAI account, I'm going to go to the dashboard, then to API keys, and create a new secret key. I'll call this one test-b, click "Create secret key", copy it, hit done, and paste it in here. Awesome. Now I'm going to hit run. Perfect: this is the message we expect to be printed when the app is available online, and this is the URL we'll be using for now during test development. I'll show you how to deploy this more permanently later on, but for testing we'll use this URL. It's already exposed to the internet, so pretty much everything here is done for us. I will say that right now this index file is using my preset values, Bart's Automotive and some other things I've configured, but we're going to set up Twilio now, test the phone call, and then I'll come back and explain the code.
This is Twilio's website. Again, we're using Twilio to generate the phone number that we then plug our AI agent into. Go to twilio.com and create an account by clicking "Start for free". I'm going to click login because I already have an account. If you create a free account with Twilio you already get a phone number, and I'm pretty sure it's a US-based number. Even though I'm in Australia, I had a US phone number as well, so I think the default is a US number. You also get about 15 bucks of credit, so you can run this demo with a free account without paying for anything just yet. You cannot add a new number until you upgrade your account, though, and I think you have to deposit about 20 bucks. I've turned off my US number and upgraded my account because I wanted a couple of AU numbers. Next, click on the region where your phone number is, then "Phone Numbers", then "Manage", then "Active numbers". Mine is already expanded, so I'm going to click on "Active numbers". Your default phone number, if you just created an account, should have the voice capability, shown by this little phone icon. If you don't see the phone icon for whatever reason, you might have to delete that number and get a new one, because we need voice for this to work. I'm pretty sure your free number has it by default. Then all we need to do is click into one of these numbers. I'm going to click on the top one, which shows a page like this, and then click into "Configuration". Again, this is my AU number. I think the US number had slightly different option names here, but they do the exact same thing. Just make sure you choose the one that says webhook, which I've got over here. What we're doing is saying that when a call comes in to this phone number, we want to make a webhook request to a certain URL with a certain HTTP method, so this
is like making an API call, and we're making it to our Replit app so that we can communicate with the Realtime API. All we need to do is go back to Replit and copy this URL, then back in Twilio paste it into this first option, the one for when a call comes in. We're not done just yet: we still need to put something at the very end. Back in Replit, go to your index.js file if you're not already there, and scroll down a little until you see this: /incoming-call. I'll highlight it so you can see it. We need to append /incoming-call to the end of the URL we put into Twilio, so I'm going to copy it, and back in Twilio I'm going to paste it in here. Why do we have to append this bit at the end? The URL on its own just targets the app in general, and /incoming-call is basically saying: hey Twilio, when you send your phone call to this app, plug it in right here. From this code onwards we take the Twilio call you're forwarding and do everything else with it. I'll explain the code in a second, but that's why we're appending this to the end of the URL in Twilio. Now that we have our URL with /incoming-call appended, make sure we're using a POST request, and hit "Save configuration". Awesome.
Now that everything is saved, let's go back into Replit. We've already got this running, because I hit run earlier, but if you haven't, I'll stop it to show you: just hit run here. Now it's running and accessible from the internet. I want to make a phone call, and we're going to be watching these logs over here, so let me make this bigger. We'll get logs of whatever is happening in the code. The first line should say something like "incoming phone call", and then we'll probably get other logs, like the transcript of the conversation as it's happening. So let's give this number a call. I'm going to enter it directly into my phone. "Hi, you have called Bart's Automotive Center. How can we help?" "Hi, I want to schedule a service for my car." "Of
course, I'd be happy to help with that. May I have your name, please?" "Yes, my name is Bartholomew." "Thank you, Bartholomew. What type of service or work does your car need?" "I want a full service and I also want a car wash." "And you also want a... sorry, could you please complete your request? It seems like it got cut off." "I would like my car to be washed as well." "Absolutely, we can add a car wash along with the full service. When would you be available to bring in your car?" "How about next Thursday at 11:00 a.m.?" "Next Thursday at 11:00 a.m. works perfectly. We'll see you then, Bartholomew. If you have any other requests or need to reschedule, feel free to let us know." "All right, see you."
As you can see, we got a bunch of logs. For example, this is an agent response, which is exactly what the AI agent said to us. This bit here that starts with session_ is the unique session ID, which is what lets us have simultaneous phone calls without them colliding. If I keep scrolling up, I've got the user message, which is what I said: "How about Thursday at 11". We're essentially building a transcript of the conversation so that at the very end I've got the entire conversation in text, and I can plug it into a completion API call, parse different data out of it, and send it across to Make.com. As you can see, this phone call was successful. The only problem was an invalid URL for our webhook: if I scroll up here, I never put my Make.com URL in. Across in Make.com I've just got a webhook and a Google Sheets integration, so I'm going to click on the webhook, copy the address to the clipboard, and back in Replit paste it in here between the quotation marks. Since I've made this change, I have to restart, so I'm going to hit stop and then run, and now the server is live again with the updated URL. I'm going to test this in a second, but first let me go over the code and explain what's happening.
Let's expand this out a little. Again, watch the Twilio video, because it will give you a good idea of all the nuances of this code, but at a high level we're setting the system message for the AI receptionist. As you can see, I basically said: you're an automotive AI assistant, have a nice conversation with the user, get some of their information such as their name, their availability, and what kind of service work they want, and ask one question at a time. Over here I'm setting the voice we're going to use, and we've got alloy selected. If you go to your OpenAI account, click Playground, then Realtime, you can see a few voice options in the dropdown: Alloy, Echo, and Shimmer. You plug your favorite one directly into here, and it has to be in lowercase. For my use case, I also insert my webhook URL that goes across to Make.com. Over here is where we're logging all the events of the phone call. In your OpenAI account, if you go to the API reference and look at client events and server events, these are all the events that happen on the phone call we're having with our assistant. Each event contains different information, and this is what I use to obtain not only the user transcript but also the agent transcript. Again, for my use case I wanted the entire transcript in text so that I can send it across to Make and pull out different details. Then over here we have the first message we send to the user, which is basically: the phone call is connected, how can we help you? That's the initial message the user hears when they call. Then, as you saw, I said something like "I want to make an appointment", and again, that's that first message coming into the /incoming-call endpoint that we appended in Twilio. Scrolling down from here it gets a little more complex, but we're basically setting up the main configuration for the AI agent. These session variables you'll remember from over here, the session.update event; that's basically what we're doing. Ours is a little shorter because we don't have tools in here. One thing I introduced that wasn't in Twilio's code is input audio transcription: we're using the whisper-1 model to transcribe the conversation live as it happens. The Twilio GitHub code didn't have this, and it took me a little while to figure out why I wasn't able to get the transcript of the audio. You need to enable this so that the Whisper model transcribes your audio into text. We're also setting our API key and the realtime=v1 parameter here, and this is the Realtime API WebSocket address. Over here is where I do some transcript handling. Like we saw in the console before, we had user messages and agent messages, and that's thanks to this additional code, where we take the audio, convert it into a text transcript, and then use it at the very end, once the phone call has ended, to send the details across to Make.com.
Scrolling down a little more, over here I'm making a chat completions API call. I'm using the structured outputs schema, and I'm setting the system message to say: extract the name, availability, and special notes from the transcript we provide. For the user message we send the transcript itself, so everything I said and all the AI agent's messages are inside this transcript variable. Then we create a schema that produces a structured response containing the customer name, the customer availability, and any special notes. These are all just strings, so plain text. Scrolling down a little more, we're making the webhook call to Make.com. We take the webhook URL we set at the very top of the code, make a POST request, and send the stringified payload, which is essentially these three variables. Those variables land right in this webhook, and after they land I have a Google Sheets module that takes them and places them into the name, availability, and notes columns of my Google Sheet, which is over here. So I'm going to hit "Run once", quickly make another phone call, and show you exactly how the variables show up here and here.
All right, I just finished my call, and over here we have our inbound payload of customer name, availability, and special notes. This time I wanted a full engine rebuild and a turbo upgrade package. Over here we inserted those variables into our sheet, and now, looking at the sheet, I'm booked in for next Tuesday at 1 p.m.
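The session configuration and post-call flow described in the walkthrough can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from the repo: the helper names (buildSessionUpdate, createTranscriptStore, buildExtractionRequest, sendToWebhook), the JSON field names (customerName, customerAvailability, specialNotes), the schema name, the model name, and the audio format and turn detection settings are all illustrative guesses. Only the session.update event, the whisper-1 transcription setting, the lowercase voice name, the per-session transcript, the structured-outputs extraction, and the POST of a stringified payload to a Make.com webhook come from the video.

```javascript
// Illustrative sketch only. All function and field names are hypothetical.

// Builds the session.update event sent over the Realtime WebSocket.
// input_audio_transcription is the setting that returns the caller's speech as text.
function buildSessionUpdate(voice, systemMessage) {
  return {
    type: "session.update",
    session: {
      turn_detection: { type: "server_vad" }, // assumed setting
      input_audio_format: "g711_ulaw",        // assumed: phone-style audio
      output_audio_format: "g711_ulaw",
      voice,                                  // lowercase, e.g. "alloy"
      instructions: systemMessage,
      modalities: ["text", "audio"],
      input_audio_transcription: { model: "whisper-1" },
    },
  };
}

// Keeps one transcript per call so simultaneous calls never mix.
function createTranscriptStore() {
  const sessions = new Map(); // sessionId -> array of "Speaker: text" lines
  return {
    add(sessionId, speaker, text) {
      if (!sessions.has(sessionId)) sessions.set(sessionId, []);
      sessions.get(sessionId).push(`${speaker}: ${text.trim()}`);
    },
    // Returns the full transcript and forgets the session once the call ends.
    end(sessionId) {
      const lines = sessions.get(sessionId) || [];
      sessions.delete(sessionId);
      return lines.join("\n");
    },
  };
}

// Builds a chat completions request body that asks for structured output.
function buildExtractionRequest(transcript) {
  return {
    model: "gpt-4o-2024-08-06", // assumed; needs a model with structured outputs
    messages: [
      {
        role: "system",
        content:
          "Extract the customer's name, availability, and any special notes from the transcript.",
      },
      { role: "user", content: transcript },
    ],
    response_format: {
      type: "json_schema",
      json_schema: {
        name: "customer_details", // hypothetical schema name
        strict: true,
        schema: {
          type: "object",
          properties: {
            customerName: { type: "string" },
            customerAvailability: { type: "string" },
            specialNotes: { type: "string" },
          },
          required: ["customerName", "customerAvailability", "specialNotes"],
          additionalProperties: false,
        },
      },
    },
  };
}

// Posts the extracted details to a webhook (uses the global fetch in Node 18+).
async function sendToWebhook(webhookUrl, details) {
  const response = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(details),
  });
  return response.ok;
}

// Example: two transcript lines for one call, then the extraction request.
const store = createTranscriptStore();
store.add("session_123", "User", "How about Thursday at 11?");
store.add("session_123", "Agent", "Thursday at 11 works perfectly.");
const request = buildExtractionRequest(store.end("session_123"));
console.log(request.messages[1].content);
```

Keying the transcript by session ID is what keeps one caller's name and booking from leaking into another call, and clearing the entry when the call ends stops the map from growing on a long-running server.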