LiveKit XTTS V2: Best Open Source Local AI Voice Assistant (100% Free)

Devs Kingdom

This video demos how to build a Local AI Voice Agent using LiveKit, Coqui XTTS V2 and Groq / Ollama ...

Video Transcript:

Hello guys, this is Jacob. Today I'm going to show you how to build a local voice assistant using LiveKit and XTTS v2. So that being said, let's get started.

If you can see the screen, there's a LiveKit agent playground here. When I click Connect:

A: "Hey, how can I help you today?"
Jacob: "Hey, this is Jacob. How are you?"
Assistant: "Hi Jacob, I'm doing well, thanks. How about you?"
Jacob: "Doing fine. Could you recommend a few breakfast options?"
Assistant: "Here are a few breakfast options. You could have scrambled eggs and toast, oatmeal with fruit, or yogurt with granola and berries."
Jacob: "Cool, thank you so much."
Assistant: "You're welcome, Jacob. Have a great day."
Jacob: "You too, bye."
Assistant: "Bye, Jacob."

As you can see, this works perfectly as your local voice assistant. In this tutorial I'm going to show you how to build this step by step on Kaggle, 100% using an open source stack, so you don't have to pay anything. There are no strings attached. So that being said, let's get started.

Let's go to kaggle.com and create a notebook called LiveKit. The first thing to do is set up the LiveKit front end. Go to the LiveKit GitHub and do a git clone of the LiveKit agent playground. You can go to the GitHub repository and take a look at the description and README to get a feeling for what it is. Basically it's a Node.js app, so you can just clone it and then set up the environment: your LiveKit API key, your LiveKit API secret, and also the public LiveKit URL. You can get these from your LiveKit console, which is at cloud.livekit.io.
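
The playground environment file mentioned above could be generated from a notebook cell along these lines. This is only a sketch: the variable names LIVEKIT_API_KEY, LIVEKIT_API_SECRET and NEXT_PUBLIC_LIVEKIT_URL are assumptions based on the three settings named in the video, the values are placeholders, and `render_env` is a hypothetical helper, not part of LiveKit.

```python
def render_env(values: dict[str, str]) -> str:
    """Render KEY=value lines in the dotenv style that the Node.js app reads."""
    return "".join(f"{key}={value}\n" for key, value in values.items())


# Placeholder values (assumptions): copy the real ones from your LiveKit console.
playground_env = {
    "LIVEKIT_API_KEY": "your-api-key",
    "LIVEKIT_API_SECRET": "your-api-secret",
    "NEXT_PUBLIC_LIVEKIT_URL": "wss://your-project.example",
}

env_text = render_env(playground_env)

# In the notebook you would write env_text to the playground's .env.local file;
# here it is only printed so the sketch has no side effects.
print(env_text, end="")
```

The same helper can be reused later for the agent's own .env file, which needs the API key, API secret and LiveKit URL as well.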

Just register an account and you should be able to get them. Then cd into the agent playground and do npm install.

After that you want to expose this to the public, so you need to install ngrok. I covered and demoed these steps in my other videos, so I'm going to skip them in this video, but make sure that after you set it up it points to port 3000. This is the public URL that we generated from ngrok, pointing to 3000, which is the one you just saw, 89ac. That's the one we just did the demo on, and it's live. So that being said, let's continue.

Next, create a soft link to node so that the Kaggle environment knows where to find node. After that, you want to spin up node (npm) as a separate process within the container environment. You don't want to run it inside the notebook because it's going to block the execution, so you have to install supervisor. After installing supervisor, make sure supervisor has a configuration that points to the agent playground. You can just copy and paste this command, and also set up the log directory for debugging. That's it for the playground setup.

The next thing we have to set up is the agent, which is what the playground is going to tie to. So the second part is the agent setup. Go to the LiveKit agents repository, which has a README here. You can go through the README to get a feeling for what the agent does. They also have some examples, so you can click into the examples and see how to set up an agent.

Let's go back to Kaggle. After that we go to the agent examples, and in this demo we're going to demo the voice pipeline agent, which is the
one you just saw, the voice pipeline agent. We're going to build a voice agent, so go to this example and install the requirements. After that, same as before, set up your API key and API secret, plus the LiveKit URL, which you can also find in the LiveKit console. Grab all of that and paste it into the voice pipeline agent example's .env file. The previous one was .env.local; this one is .env. Make sure you set up all these environment variables.

The next thing we have to set up is ElevenLabs, and this is the tricky part. By default, the agents don't really support XTTS v2 or the open source stack. If you look here, they mostly support the paid stacks and paid plans, like OpenAI and ElevenLabs. In this tutorial we're going to show you how to do it using XTTS, which is open source. To do that you have to make some changes. Once you have ElevenLabs installed, we take ElevenLabs as an example plugin: you modify some of the ElevenLabs code so that it calls XTTS instead. Let's continue.

In the ElevenLabs plugin there's a file called tts.py. In this file you need to change a few things. First, change the API base URL (the v1 one). This is where the ElevenLabs plugin is going to send its calls. Instead of calling the ElevenLabs API, change it to port 8020, which is where the XTTS API server is going to be running. Make sure the API points to localhost, which is inside the notebook, on port 8020. That's where the requests are going to go. You don't have to set up the API key because we don't really need it, but you can put in any string. That's fine; we just don't want to change too much of the existing code. You can leave everything else as is, then go to the capabilities and make sure that streaming
equals false, because the XTTS v2 API server doesn't really support streaming. Setting it to false wires up the default LiveKit streaming input, so you don't have to do the streaming input yourself; LiveKit provides that. Just make sure you turn streaming to false, because the API server doesn't support it.

After that, it doesn't have to be done, but you can comment out the stream method, because you don't need it. It returns the synthesize stream, which is the stream input: instead of a text input, it's a stream input. We don't support that, so we just comment it out and only use synthesize, which gives us the chunked stream. We don't want anything else.

Inside the main task is where the magic happens. Instead of the default session.post, which calls ElevenLabs and also passes the headers, we want to use our own XTTS API server. All you have to do is use the base URL, which is localhost:8020, with tts_stream as the endpoint. Make sure the text is a safe string, so you need to do some encoding for the text, which is the prompt that's going to be added. Then make sure the speaker wav is set to male.wav. You can pick a few different wavs or your own, but male.wav is one that the XTTS API server provides by default. There are three samples, and you can use any of them or whichever you prefer. For the language, make sure you choose English. That's all you have to set up for the local XTTS API server.

After that, make sure you use session.get, not session.post, to fetch from the URL you defined here. The headers aren't needed, but we just add them anyway. Then, when you get the response back from the XTTS API server, make sure you don't use the encoding, because that encoding is for ElevenLabs. You don't need the MP3 decoding; just comment that out and use the default branch. If it's not MP3, it falls through to the else block, so make sure you only use that, not the MP3 one. This all works by default, so you don't have to change anything else; just comment the first part out. The XTTS v2 API server returns bytes data, so it just works by default if you use the else block. After that everything is default and you don't have to change any of it. That's the setup for the ElevenLabs TTS; you have to make some changes to make it work.

For the voice pipeline example, you also have to look at the minimal assistant file. This is the script we're going to run, so make sure the pipeline is updated as well. There's a pipeline called the voice pipeline agent. It has a VAD, which is the voice activity detector, and it has the STT. Make sure you don't use the OpenAI one, which is the default. You can see that the default pipeline uses Deepgram, OpenAI, and OpenAI TTS. You don't want any of that.
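
Going back to the TTS request described above, the GET URL for the local XTTS API server could be built roughly as follows. This is a sketch under assumptions: the query parameter names `text`, `speaker_wav` and `language` and the value `en` are inferred from the description in the video, `build_tts_url` and `fetch_audio` are hypothetical helpers, and the standard library is used here in place of the plugin's own HTTP session.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Local XTTS API server address, as described in the video.
XTTS_BASE_URL = "http://localhost:8020"


def build_tts_url(text: str, speaker_wav: str = "male.wav", language: str = "en") -> str:
    """Build the GET URL; urlencode makes the prompt text URL-safe."""
    query = urlencode({"text": text, "speaker_wav": speaker_wav, "language": language})
    return f"{XTTS_BASE_URL}/tts_stream?{query}"


def fetch_audio(url: str) -> bytes:
    """Fetch the raw audio bytes with a plain GET (needs the server running)."""
    with urlopen(url) as response:
        return response.read()


url = build_tts_url("Hey, how can I help you today?")
print(url)
```

Only `build_tts_url` runs here; `fetch_audio` is defined but not called, since it requires the XTTS server to be up. The bytes it returns correspond to the raw data that the else block in the plugin passes along without MP3 decoding.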
So you want everything open source. We use Groq just for demo purposes; you can use Ollama as well, but here we just use Groq. For speech to text we pick whisper-large-v3-turbo, so you use the Groq API as the base URL and also make sure you put the API key in here. For the LLM we also use Groq, with Llama 3.1 70B. You can use any model you prefer: go through the models that Groq supports and pick the one you like, or use Ollama if you want to run Llama locally. I have a lot of tutorials on running Llama locally, so feel free to check those out; you can set that up here as well.

For ElevenLabs, you can paste in whatever API key; it doesn't matter. It's just a string that isn't going to be used, but this wires up the XTTS API server endpoint that we demoed previously and are going to set up later. So for all three of these, use the open stack. You don't have to use any of the defaults, because that makes things very complicated and also very high cost, and we don't want that. After that, just make sure you save it to the minimal assistant. That's all you need to do.

For the LiveKit agent, because you've already set it up, make sure you configure it through supervisor as well, because it's another process. The first process is the front end, which is the playground; the second is the agent, the voice agent you're setting up. Make sure you add a configuration in supervisor that runs python on the minimal assistant, basically running it through this command line, and also add logging so you can check if anything isn't working or whatever happens within this process. After all this setup, do a supervisor stop and start, and it will spin up the two processes: the playground and the agent.

The last thing we have to set up is the XTTS v2 API server. This is the most important part of this open stack setup for the LiveKit voice assistant. We use the one from daswer123. This is the API server that was built by one of
the awesome developers, so thanks to this guy. It's not perfect, though, so you have to make a couple of changes, which we're going to talk about in this video. Just make sure you use this URL and do a git clone. After that you have to set up torch, torchvision, and torchaudio; make sure you do this with CUDA 12.