"How to Use ElevenLabs - Best Text to Speech AI Voices (FULL GUIDE)" - Transcript | YTScribe

In this video, you're going to learn how to use ElevenLabs: how to use text to speech and speech to speech, design voices, and clone your voice. I'm also going to show you everything I do to get the best results from this amazing speech synthesis tool. [Music] So if you don't know, ElevenLabs is a speech synthesis AI tool that lets you generate speech from text and manipulate voice recordings to give you a realistic AI voice, and I think ElevenLabs is genuinely one of the most realistic AI voice generators out there in 2024.

And honestly, it's super cheap too. You can try it for free, but you quickly hit usage limits, and if you create a free account your limits are a little bit bigger. Honestly, though, it's so cheap that I just recommend starting on the Starter plan, which includes 10 custom voices and 30,000 characters, which equates to about 30 minutes of voiceover according to their estimator. On top of that, the Starter plan includes the commercial license, which means you can use the audio in paid projects. Plus, it's about a dollar for the first month and then $5 afterwards, which again is super cheap, the price of a coffee. Honestly, I think more companies should make their tools this accessible. Later on, if you do start hitting limits, which I don't think you will to start off with, you can just move up to the Creator plan.

Now, most people I speak to about ElevenLabs don't fully understand what it is. Most people think it's a simple text-to-speech generator, but it's much more than that. The ElevenLabs AI actually understands context, which means that if you write something in the style of a book, the AI tries to interpret how to perform a passage from the context of the writing itself. You can guide it through the writing as you write, which is super cool. On top of that, it has a bunch of settings that can be used to achieve a wide range of emotions, so it's more like a voice actor than a regular text-to-speech generator, and I think you'll understand why a little later on in this video. So let me show you on the computer.

Hey YouTube, there's a link to sign up to ElevenLabs in the description. It is an affiliate link, but it won't cost you anything extra, and it'll help Alec make more high-quality videos just like this one.

Once you're in your account, the default tool is the speech synthesis tool. This is where you generate voiceovers from text, so basically it's the text-to-speech tool. At the top, you'll notice a Task option with two choices: text to speech and speech to speech. We're going to cover speech to speech in just a bit.

Down here, the Settings section has three dropdown menus, and these are probably the three most important things to take your time customizing. The first dropdown is where you choose from a bunch of pre-made male and female voices. If I open it up, as you can see, there are loads of options, and on the left you can click to preview each one: "Never mistake motion for action." You'll notice each voice has a name and then a few tags, and these tags actually mean something. The first tag, in purple, is the accent: American, Irish, British, English, Italian. The second tag is the tone or style of the voice, so whether it's whispering, calm, or well-rounded. The third tag, in red, is the recommended use case: meditation, ASMR, narration, news presenter.

If you spend a lot of time on social media, I bet you've heard this voice: "Allow the world to live as it chooses, and allow yourself to live as you choose." That's probably one of the most famous ElevenLabs voices. You've also got voices like Arnold, which I think sounds like... "What worries you masters you." And if we scroll down a little bit, where is he, you'll recognize this one, maybe: "Life isn't about finding yourself. Life is about creating yourself." Maybe it's just me, and obviously ElevenLabs didn't do this on purpose, I think, but it's interesting.

Next, you've got the voice settings. This looks a little complicated to begin with, but don't worry: I'm going to explain it and give you some recommendations. There are three sliders. The first one is Stability. The further you slide it to the right, the more stable the voice becomes, meaning there's more consistency between regenerations, but it can make the voice sound a little monotone. The further you slide it to the left, the more unstable and variable it becomes. Increasing the variability can make the speech more expressive, with outputs varying a lot between regenerations, but it can also lead to instabilities. That's what this red zone shows, so always try to keep stability at 30% or above. If you're generating long chunks of text, I recommend staying on the more stable side so your renders are more consistent, but if you're doing short one-liners or short video content, maybe get a little experimental and see what kind of crazy results you can get with the unstable settings.

Hey guys, if you haven't signed up to ElevenLabs yet, I would greatly appreciate you using the link in the description.

Next up, we've got Clarity + Similarity Enhancement. This dictates how closely the AI should adhere to the original voice when attempting to replicate it. If the original audio is poor quality and the similarity slider is set high, the AI may reproduce some of the unwanted background noise when trying to mimic the voice from the recording. For text to speech this is fine, because we're choosing from pre-made voices, so it can almost always be set high. I tend to leave it on the default, but basically, if your captured audio is good, set it high, and if it isn't as good, try bringing it down a little. Again, I highly recommend playing around with all of these settings to see what kind of fun voices you can get.

Next, we have Style Exaggeration. Style is only available with the Multilingual v2 model (I'll get to the models in a bit), and this setting attempts to amplify the style of the original speaker. It's a newer, more experimental feature, and the more you increase it, the more instability you get. It can be fun to play around with, but you get some pretty wacky results the higher the slider goes; as you can see, the red zone here runs from 50 to 100%. I always just leave it at zero, and even ElevenLabs themselves recommend keeping this setting at zero at all times, so treat it as an experiment if you want to go crazy and have fun with it.

Down here, we've got the Speaker Boost toggle, which is on by default. It's another setting introduced with the newer models, and it's pretty simple: it boosts the similarity to the original speaker. Most of the time I've found the difference to be very, very subtle, so it's not really worth worrying about. I just leave it as is, which is toggled on by default.

Then, in the third dropdown, we've got the different models. If we change this to Multilingual v1, you'll see that we lose those last two features, and down here they say they recommend switching to the Eleven Multilingual v2 model to get the best possible quality for this voice, so I'm going to switch back. To explain what these are: we have four distinct models, each with unique features. English v1 is the oldest and fastest, tailored for English-language tasks, but it has limited accuracy. Multilingual v1 is a bit more experimental and supports multiple languages, including various English dialects, German, Polish, and, as you can see, a few more down here. Then there's Eleven Multilingual v2, a much more advanced version supporting 28 languages, including Japanese, Chinese, Korean, and many European languages. It's more stable, has more diversity within the languages, and has better accent accuracy. Finally, Eleven Turbo v2 is optimized for real-time, low-latency applications in English. I always leave it on Multilingual v2. Each model has its own strengths, but if you don't really know what you're doing, like me, I'd just leave it on this one: it's one of the more recent ones and the one that gives you the most creative freedom and flexibility.

Once you've set all your settings, down here you've got the text box. You can put anything you want in it: type your text, paste your text, anything you want the AI to speak. There are no rules, but there are some pointers for getting a better output.

The first one is pauses. You can use the break time syntax, which adds a break, where X is the number of seconds you want the voice to pause. So break time equals 2 seconds gives a 2-second pause, or you can do break time equals 1.5 seconds for a 1.5-second pause. The good thing about this is that it creates a natural pause within the speech; it doesn't just add a silent gap: "Wait, give me a second to think about it. Okay, I'll hit the like button." I've also tried, and read online, that a simple dash or three dots will do the trick, but they're less reliable and sometimes don't sound as natural as using the syntax: "Wait, give me a second to think about it... Okay, I'll hit the like button."

Next, we've got pronunciation. For the English v1 model, which I'm not using right now, you can customize pronunciation using the International Phonetic Alphabet (IPA). I don't know enough about that to talk about it, so I'll just share a link to some ElevenLabs documentation down below.

Then we've got emotion. With ElevenLabs, you can get the AI to read your text in a certain emotional tone using dialogue tags or context. ElevenLabs even says that, when it comes to emotion, the best way to write and prompt its AI is the same way text is written in a book. For example, for a happy tone you might write: She joyfully exclaimed, "What a beautiful day!" Or: "Are you sure about that?" he said in a confused tone. Or even, for an angrier tone: "Stop it!" he shouted angrily. So, just like reading things in a book, try doing that: "Why haven't you subscribed yet?" "Why haven't you subscribed yet?" he shouted angrily. One issue, though, is that it will also speak that context, so you'll have to edit it out in a video editor like Premiere Pro.

Pacing works the same way as emotion: you can imply a slower pace with descriptive language, like: "I really like long walks at night," he said slowly. "I really like long walks at night." "I really like long walks at night," he said slowly.

And by the way, I would love some tips from you guys in the comment section down below, so if I missed anything, or if there's something you do that you think is super helpful, let me know.

Now that we've done text to speech, let's take a look at speech to speech. Speech to speech is essentially the same as text to speech. All the settings are the same, so you've got the three dropdown menus, except here you can only use the English v2 model; there's no other option. And down here, it's not a text input, it's an audio input, so we need to upload an audio file or record audio directly. It then regurgitates the exact same audio in a different voice. So if I say "Hi, my name is James" and run it through with a different voice, it sounds like: "Hi, my name is James." So "voice changer" might be a better name for the tool than speech to speech, but you are converting one voice to another, which is super cool and super fast. It means you don't have to spend time writing stuff out: I can just grab the mic, say exactly what I want to say, and have it said in a different voice. The cool thing is that it also respects the cadence and delivery of my recording, just with a different voice: "Good morning, ladies and gentlemen. Today we're going to be talking about one of my favorite things: pancakes with lemon and sugar." "Good morning, ladies and gentlemen. Today we're going to be talking about one of my favorite things: pancakes with lemon and sugar." So that's super cool, and it's much easier than playing around with the pause syntax and pacing tricks.
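For anyone who wants to reproduce the text-to-speech choices from this walkthrough in code rather than in the web app, here is a minimal sketch that assembles them into a request body: model, Stability, Similarity, Style, Speaker Boost, and pauses. It makes no network call. The JSON field names (model_id, voice_settings, stability, similarity_boost, style, use_speaker_boost), the model IDs, and the break-tag format are assumptions based on my reading of ElevenLabs' API documentation, not something shown in the video, so check them against the current API reference. The helper names, the default slider values, and the roughly 1,000 characters per minute figure (derived from the "30,000 characters is about 30 minutes" estimate quoted above) are hypothetical.

```python
import json

# Hypothetical mapping from the model names shown in the web app to API model IDs.
# These IDs are an assumption based on ElevenLabs' API docs; verify before use.
MODEL_IDS = {
    "English v1": "eleven_monolingual_v1",
    "Multilingual v1": "eleven_multilingual_v1",
    "Multilingual v2": "eleven_multilingual_v2",
    "Turbo v2": "eleven_turbo_v2",
}

# Derived from the Starter plan estimate quoted in the video:
# 30,000 characters ~ 30 minutes, i.e. about 1,000 characters per minute.
CHARS_PER_MINUTE = 30_000 / 30


def pause(seconds: float) -> str:
    """Return a pause tag in the break-time form (tag format is an assumption)."""
    if seconds <= 0:
        raise ValueError("pause length must be positive")
    return f'<break time="{seconds:g}s" />'


def estimated_minutes(text: str) -> float:
    """Rough audio length for a script, using the ~1,000 characters/minute figure."""
    return len(text) / CHARS_PER_MINUTE


def build_tts_payload(text, model="Multilingual v2", stability=0.5,
                      similarity_boost=0.75, style=0.0, use_speaker_boost=True):
    """Assemble a hypothetical text-to-speech request body plus warnings.

    Default slider values are illustrative, not confirmed app defaults.
    """
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost),
                        ("style", style)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    # Red zones as described in the video: stability below 30%, style 50-100%.
    warnings = []
    if stability < 0.30:
        warnings.append("stability is in the red zone (below 30%)")
    if style >= 0.50:
        warnings.append("style exaggeration is in the red zone (50% or more)")

    settings = {"stability": stability, "similarity_boost": similarity_boost}
    # Per the video, Style and Speaker Boost disappear when switching away from
    # Multilingual v2, so this sketch only sends them for that model.
    if model == "Multilingual v2":
        settings["style"] = style
        settings["use_speaker_boost"] = use_speaker_boost

    payload = {"text": text, "model_id": MODEL_IDS[model], "voice_settings": settings}
    return payload, warnings


if __name__ == "__main__":
    script = ("Wait, give me a second to think about it. " + pause(1.5)
              + " Okay, I'll hit the like button.")
    payload, warnings = build_tts_payload(script, stability=0.25)
    print(json.dumps(payload, indent=2))
    print(warnings)
    print(f"~{estimated_minutes(script):.2f} minutes")
```

Returning warnings instead of raising on the red zones mirrors the advice in the video: those values are allowed, just risky, so short experimental clips can still use them.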

But let's say I'm happy with my input, but I'm not happy with any of the pre-made voices. All I have to do is click Add Voice right here, and I'm taken to the VoiceLab, where you can design an entirely new synthetic voice from scratch. If you're on a free account, you won't have access to voice cloning; to get it, all you need to do is subscribe to the Starter plan. Again, it's $1, and I think it's super worth it. On a free account, you can also access voices from the Voice Library, which are voices that people have decided to share with the community. The voices you make are only available to you unless you decide otherwise.

So here, it's super cool: all I have to do is click Add Voice, and as you can see, I can design a voice. I can choose the gender, male or female, then the age, young, middle-aged, or old, and then the accent. When you're designing a voice, these accents are only available for the English language. Below that, you can adjust the accent strength. So let's say I want an old female voice: I'll go British with a super strong accent and click Generate. "First we thought the PC was a calculator. Then we found out how to turn numbers into letters, and we thought it was a typewriter." As you can see, that generated quite fast, and it's actually very good.

I do want to mention that any time you generate, you use some of your characters. Here I just lost 128 because I used the default text, and you have to use at least 100 characters. If I'm happy with the result, I can click Use Voice and give it a name, so let's say "Old English Lady". I'll leave these tags, but you can change them, and then you can add a description. Put whatever description you want in there, especially if you're making something super unique or making a lot of voices, just so you can remember. Then you click Create Voice, and the voice is added to your library, where you can find it later on. As you can see, I made one earlier, which is me with a cold, because I'm super sick right now but still making this tutorial.

Then, if I want to clone a voice, I just click the plus and go to Instant Voice Cloning. Here we give it a name, so I'll call it "Alec 2", and then we drag and drop a file. I can upload any audio file. Open it up...
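Since the upload step is where a sample gets accepted or rejected, here is a minimal pre-flight sketch that turns the sample guidance given in this video into a quick check: a file under 10 MB, more than about a minute of audio, and little benefit past five minutes. Reading "10 megabytes" as 10 × 1024 × 1024 bytes is an assumption, all function names are hypothetical, and the duration helper only handles WAV files via Python's standard wave module. It cannot judge what the video says matters most, which is recording quality (no reverb, artifacts, or background noise).

```python
import os
import wave

# "Less than 10 megabytes" per the video; 1 MB = 1024 * 1024 bytes is an assumption.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024
MIN_RECOMMENDED_SECONDS = 60          # "they recommend more than a minute"
DIMINISHING_RETURNS_SECONDS = 5 * 60  # "above 5 minutes doesn't make much of a difference"


def wav_duration_seconds(path: str) -> float:
    """Duration of a WAV file, computed from its frame count and sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def clone_sample_notes(size_bytes: int, duration_s: float) -> list:
    """Return human-readable notes; an empty list means the sample looks fine."""
    notes = []
    if size_bytes >= MAX_UPLOAD_BYTES:
        notes.append("file is not under the 10 MB upload limit")
    if duration_s < MIN_RECOMMENDED_SECONDS:
        notes.append("shorter than the recommended minimum of about one minute")
    elif duration_s > DIMINISHING_RETURNS_SECONDS:
        notes.append("longer than five minutes; the extra audio adds little")
    return notes


def check_wav_sample(path: str) -> list:
    """Hypothetical convenience wrapper for a WAV file on disk."""
    return clone_sample_notes(os.path.getsize(path), wav_duration_seconds(path))


if __name__ == "__main__":
    import sys

    for sample in sys.argv[1:]:
        print(sample, check_wav_sample(sample) or "looks fine")
```

A 1 to 2 minute clean recording, the sweet spot mentioned in the video, passes with no notes.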

Whoops, it has to be under 10 megabytes, I forgot that. But I've got one right here that I can just drop in. For your audio recording, there are a few things to know to get the best clone of your voice. The quality of the recording is super important, and it's actually more important than the length. They recommend more than a minute, but anything above 5 minutes doesn't make much of a difference. You want to avoid noise and distractions as much as possible, so no background noise and nothing else going on. Try to record on a mic with good audio, and that will help you get the best outcome possible. ElevenLabs, I think, said themselves that a 1-to-2-minute recording without any reverb, artifacts, or background noise appears to be the sweet spot.

The AI tries to mimic everything it hears in the audio: the speed, the inflections, the accent, the tone, the breathing pattern, the noise, and mouth clicks. So you can almost do a little bit of voice acting when you're capturing it to get an even more unique voice. But because it picks up on everything in your recording, if there's anything else in your audio, like background noise, the AI is also going to try to replicate that. I can't stress it enough: the higher quality the audio recording, the better the voice clone is going to be.

Once you've done that, you can add more samples if you want to. I don't think adding more samples is necessarily better. I think one recording with the same voice, the same delivery, and the same accent gives you the best voice clone, and more recordings, I think, just distract it a little bit. That could just be me, and I haven't cloned a voice with 25 samples, so maybe that's something you guys can try; let me know in the comments. Below that, you can add labels and a description, and then confirm that it's a voice you're allowed to clone, so obviously not someone whose voice you'd be cloning illegally. Then you just click Add Voice, the AI takes anywhere from a few seconds to a few minutes to generate your voice, and boom, I can now click Use. I'm taken to the speech synthesis tool with my voice pre-selected, and then I can adjust these settings and type out whatever I want.

Finally, the last thing ElevenLabs can do is dubbing. Here, you can translate a video from one language to another, and this isn't translation in the form of subtitles: it takes your audio and actually says it in another language, in your voice.

One last thing before I wrap up this video: if you found it helpful and want to support me free of charge, and you haven't yet signed up to ElevenLabs, use the link in the description. It is an affiliate link, so I'll get a small commission at no extra cost to you. But if you don't want to do that, or you're already signed up, I would massively appreciate a subscribe and a like. Thanks for watching, peace out.