So Google just launched Gemini 2.0: Project Astra, Project Mariner, Jules, various agentic products that work inside of games, and they're integrating all of this across their ecosystem of apps, including Google Search. So there's a lot here, and they've clearly been preparing for this big Gemini 2.0 push. Now, if this seems slightly confusing or overwhelming to you, well, you're not alone. They sort of bundled all of these things into one big announcement, and in today's video we're going to spend some time picking these things apart and getting our hands on what's actually available today, which is the brand new Gemini 2.0 Flash, inside of Gemini Advanced and Google AI Studio.
And yeah, if you see me looking at the screen here and there, that's because all of this is clearly too much to memorize, and Google's releases and naming have been a bit all over the place. Nevertheless, there are some quality products here and a lot to talk about, especially in terms of how this relates to what's already available on the market. Let's get into it. All right, so let's talk about Gemini 2.0. There's a lot to unpack here, and I would really put it into two camps. There are the things that are available today: the Gemini 2.0 Flash model, and also the ability to live stream your camera and your screen to Gemini 2.0 Flash.
It's multimodal and it can interact with you on a live basis. This is the first time we see this fully available and in action, so that's going to be super interesting. We're going to start out the video by reviewing that, and then we'll talk about all the other things that are built on top of this that might not be available yet. Surprisingly, Google has actually been inviting various people to test these features, so people got their hands on them and released a bunch of hands-on demo videos. So there's a lot to talk about with Gemini 2.0, but as mentioned, let's start by reviewing what is possible today, and that's really this new Gemini 2.0 Flash Experimental model that you can access either through Gemini Advanced or Google's AI Studio right here. So here's the thing: full transparency and honesty are important to me on this channel, and I want to point out that I just resubscribed to Gemini Advanced to create this video, because I thought, hey, Flash Experimental is available in there, plus they announced a new Deep Research mode that is supposed to have advanced reasoning and was supposed to be available in here. If you check out this Deep Research article, you can clearly see a screenshot of how this is available inside of Gemini Advanced, and it says it's rolling out, but I don't have it yet. So that was a bit disappointing. And then while testing this, I also found out that everything is actually available in Google AI Studio, where you have some free credits, so I wouldn't even have needed to subscribe to Gemini Advanced. As mentioned, I wasn't even subscribed until recently, because about a month ago I looked at all my subscriptions and decided to unsubscribe from everything I hadn't used in a few weeks or months, and Gemini fell into that category. So far they just hadn't really differentiated themselves with their AI products, but I think that changes now. I'm not saying Gemini Advanced changes that, but honestly, what they shipped here today in Google AI Studio is so unique, so cool, and so responsive that I seriously think it's worth your attention, and you should see what they did here. But let's start with a very basic style comparison prompt, as I always like to do, because whenever a new model comes out and tries to be a contender for the spot of the best LLM, this is one of the things I personally really care about: what does it sound like, what does it write like? So I just told it to write an essay about penguins, and you can check out the writing style over here, classic ChatGPT over here, and honestly this Gemini Flash Experimental doesn't sound very different. But what I found interesting was that when I followed up with a little stylistic prompt, where I told it a bit about my writing style and asked it to rewrite the essay about penguins, it actually came back with something that sounded, well, way more personal and reflective than both ChatGPT over here and Claude over here. Now, I'm not going to sit here and review every single word, but just have a look at the first sentence. Claude says: "You know what? I can't help but smile whenever I think about penguins. There's just something about these waddling little birds that captures my heart every time I see them." Okay. ChatGPT starts off by saying: "I've always had a soft spot for penguins. Maybe it's the way they waddle like they're wearing tiny tuxedos, or how they seem to embody resilience in the face of unimaginable cold." Okay. But when we look at Gemini over here, it just feels a little more reflective and personal to me.
I don't know, maybe it's just me, but look at this: "Okay, so, penguins. I've always been kind of fascinated by them. They're just weird, you know, in the best way. Like, they're birds that can't fly in the sky." Anyway, I just thought that was interesting and quite different from ChatGPT and Claude, and Gemini has always had a bit of its own flavor, for better or for worse. So that's interesting, but not really unique, and not really a reason why you would dish out $20 to subscribe to this or bookmark a new thing to use regularly. What is unique here is the fact that this is multimodal in a way that we haven't seen from any of the competition yet. Concretely, if I go in here, you might already know that you can upload videos here. And not just upload videos: I can use the camera right here, allow it this time, and record a little video. "Hey, what's up? I'm just curious if you can see what the camera sees. How many fingers am I putting up, and what does this mean?" I'm just going to prompt it and say "what do you see?", and right now it's not even running, it's just uploading this little video. I'm going to run it, and look at the speed: one second, two seconds, not even two seconds, and boom, we have the result. It describes it, and it says he's holding up two fingers in a peace sign gesture. Now, I don't know about you, but that was impressive. Two seconds to analyze all of that? Not bad.
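By the way, if you'd rather poke at this from code than from the AI Studio interface, here's roughly what that same video-plus-prompt call looks like with Google's Python SDK. Treat this as a minimal sketch: the file name is a placeholder, and I'm assuming the `google-generativeai` package and the `gemini-2.0-flash-exp` model ID, so double-check the current docs before relying on it.

```python
# Minimal sketch: upload a short clip and ask Gemini 2.0 Flash what it sees.
# Assumes the google-generativeai package and the "gemini-2.0-flash-exp" model ID;
# the file name and API key are placeholders.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# The Files API processes video asynchronously, so poll until the file is ready.
clip = genai.upload_file("webcam_clip.mp4")
while clip.state.name == "PROCESSING":
    time.sleep(1)
    clip = genai.get_file(clip.name)

model = genai.GenerativeModel("gemini-2.0-flash-exp")
response = model.generate_content(
    [clip, "What do you see? How many fingers am I holding up?"]
)
print(response.text)
```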
And it goes further, because you can go up here and select Stream Realtime. This is brand new, it only works with Gemini 2.0 Flash Experimental, and you can actually stream things in here. Let's just do the following: let me share my screen here with Gemini and give it access to my tab here, "Google introduces Gemini 2.0"... okay, it seems to be sharing. "Okay, tell me what the differences between Gemini 1.5 Pro 002 and Gemini 2.0 Flash Experimental are on the various benchmarks that you can see here."
"Sure, I can help you with that. According to the benchmarks shown, Gemini 2.0 Flash Experimental scores better than Gemini 1.5 Pro 002 on all of them." Okay, for example? "True..." All right, oh sorry, keep going.
"Okay, I understand. As I was saying, Gemini 2.0 Flash Experimental scores higher on all benchmarks. Specifically, it scored 76.4% on MMLU-Pro compared to 75.8% for Gemini 1.5 Pro 002,
and 92.9% on Natural2Code versus 85.4%." Look at that: a true AI assistant that gets it right, sees your screen, takes audio input, with virtually no latency. Right? This is what Advanced Voice Mode promised in the beginning, and now we have it. Maybe not on your phone yet, but certainly in Google AI Studio. This is very usable, right? I didn't even load any credits; you can just give this a shot right away, which is wonderful. And then you can do the same thing with your camera, as you can see right here. "Here we are. Um, okay, what is this gesture? What am I doing?" "The person in the video asks what gesture they are making. They're gesturing with their right hand while speaking." "I don't know about all that. Like, try again. What is this gesture? What are you seeing in the video?" "The person in the video raises both of their fists in the air. They then ask what you see in the video." "Yeah, can you tell me what it is?" "The person in the video asks you to tell them what the gesture they made is." "Well, I'm not asking you..." Okay, so it's a bit confused in terms of who's speaking here, but it works and it recognizes it. "No, the person in the video states that they are not asking you what the gesture..." Oh my god, "the person in the video". But you see the point, and this works. There you go, this is pretty cool. And as I said, this is the baseline and what's available today. Okay, you can access this through the studio; inside of Gemini Advanced you have this new model without some of these multimodal capabilities, with the reasoning-focused Deep Research apparently rolling out today, so it should be here soon. That one is really about diving deep into a topic. Honestly, it's a Perplexity and ChatGPT Search competitor, where it looks at various sites, pulls all of that together, and presents it back to you with links and citations. Is it better than Perplexity? My intuition would say no, it's probably very similar, but only time will tell, and I will circle back to that question once I have access to it in another video. Okay, so those are the things that are available today, or supposed to be available today. What else did they come out with? Because this is where it gets interesting. You can see right here Project Astra, Project Mariner, agents for developers and some dev updates, and then some interesting little use cases like agents in games. Let's start out with Project Astra, and by pointing out that all of these subsequent things are powered by Gemini 2.0.
They're powered by this Flash model that is super fast, multimodal, takes audio and video input, and has this fluent and natural-sounding audio output. And what this makes possible is experiences like Project Astra. Now, if you want all of the examples of this and you want to review them, I would recommend you check out the Google DeepMind YouTube channel here, not the main YouTube channel, which is a bit more commercial, but here on the DeepMind channel you can really see some of these examples. "I asked what it could recall from the room, and Project Astra would call out the most esoteric, nondescript item." "It appears to be a Robie Senior robot. It was a toy made in the 1980s by Radio Shack and could move around and respond to voice commands." So this is how it essentially works, right? You use your phone with the camera, and this AI assistant sees what the camera sees and takes your audio input. There's a plethora of examples of this in use, and I think this one just intuitively makes sense, right? An AI assistant that has eyes and can speak to me. "All right, the tag shows to machine wash at 30°C, avoid bleach and tumble drying, iron on low, and dry cleanable." "Then what setting should I use on this machine?" "Based on the laundry instructions of 30°C, select the 30 setting under Easy Care on the dial." Amazing. As I mentioned, there are many more examples here of him strolling through London and even using this with glasses, as we saw in the demo that came out around Google I/O a few months ago. So this is nothing new, but what is new is the fact that they actually invited some people and showed them an early demo of this, and friend of the channel Matt Wolfe was actually one of these people who flew out to London and had a look at it. For me personally, this is what I really care about: seeing this in somebody's hands who gives it a shot. He was walking around London, and you can check out this Twitter post here, or X post I suppose, where he's just walking around with a body cam and giving an early version of Project Astra a shot. The fact that they actually put this into people's hands is what really surprised me here, positively that is. "What is this bike right here?" "It's an e-bike, specifically a Forest-branded model, noted for its electric assistance and ten minutes free daily feature." And the reason I like this is because he walks around and there are cases where it does get it right. He says, "Hey, I'm not sure what this tree is, it doesn't even have leaves. What kind of tree is this right here?" "That appears to be a deciduous tree, likely a type of birch, but it's difficult to identify the specific species without leaves." So, well, that's really interesting. There's no early access and no waitlist for this yet, so I guess we'll just have to be patient until this finally comes out. My suspicion would be that ChatGPT is not just going to sit on their hands; they have a few more days of announcements, and an upgraded Advanced Voice Mode that can use your camera and desktop would really make sense. All of a sudden they would pull ahead of Google, which they do like to do, and that might be interesting because it would probably push Google to release Project Astra even earlier than they planned. But let's see. So this is one of the things that came out, and an interesting fact they revealed is that it has up to ten minutes of memory: you can run the assistant for ten minutes and it still remembers. Next up, there's Project Mariner, and I want to review this one based on all the experience I have with browser automation tools that are agentic.
This idea is really powerful, okay? This is something that immediately captured my imagination. I think a lot of people can get on board with the idea of an AI agent actually doing work for you in your browser: pressing buttons, filling out forms, transferring data, researching, whatever it might be. All of these tasks could theoretically be automated, a lot of them are tedious, and people would like that. Here's the problem: I've tried most of the tools that are available on the market now, including Claude's Computer Use, which came out about a month ago, maybe two months now, and they all have one thing in common: they make incredible demo videos. It's really easy to record a 30-minute session, take the best parts of it, and cut a little best-of together that looks incredible. Wow, look at it, I just prompted it and it went out and did all of these things by itself, transferred the data, filled out the form, great. The problem is that in practice, if you run the same thing multiple times, you're going to get different results every single time. It's super inconsistent, it takes different routes every time. And even this example they point out here, I think it's fascinating and it's certainly the future, but I personally am only going to believe that this works as demonstrated when I see it in a live setting. I want to see a 60-minute stream where they play with this and where it fails sometimes, because that's just what these things do. But yeah, this demo looks incredible. Look, she tells it to find the email addresses for these companies, and it goes out and uses Google Search to do that: "So I ask the agent to take this list of companies, then find their websites and look up a contact email I can use to reach them. This is a simplified example of a tedious multi-step task that someone could encounter at work. Now the agent has read the Google Sheet and knows the company names." Then of course it goes ahead, it searches the web, and it enters all the different email addresses. But in practice, if you do this with some of the other models I've tried, and again, I haven't tried Project Mariner, but the others just naturally get lost on a website sometimes. It clicks to the homepage, then goes to contact, and maybe there's no email address right there, so it clicks somewhere else, and on some random subpage, as soon as it finds an email address, it enters it, and it's not the one you would want to contact, for example. Or it just straight up finds the wrong company. I mean, how is it supposed to know that there's only one Benchmark Climbing on the web? That's kind of a tough problem. That's why Google Search shows you many results, and this agent is supposed to know better than you? Yeah, that's a big ask. So I'm just saying I'm a bit skeptical on this one, but I'm super glad they're pushing it forward, because I certainly want a Project Mariner that works, and some of these previews look very promising. In combination with this new Flash model, which just works and, as I showed you, is available today, this product category holds a lot of promise, and I'm glad they're pushing in this direction.
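To make it concrete why these browser agents are so inconsistent, here's the rough observe-decide-act loop that tools in this category broadly follow. This is a hypothetical illustration, not Project Mariner's actual implementation: the observe and act helpers are stubs I made up, the model name is assumed, and the point is that the model re-decides at every step, so one wrong click early on sends the whole run down a different path.

```python
# Hypothetical sketch of the observe-decide-act loop behind browser agents.
# Not Project Mariner's real code: observe() and act() are made-up stubs,
# and the "gemini-2.0-flash-exp" model ID is an assumption.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.0-flash-exp")

def observe(browser):
    # Stub: a real tool would return a screenshot or DOM summary here.
    return browser.get("page", "blank page")

def act(browser, action):
    # Stub: a real tool would click, type, or navigate here.
    browser.setdefault("history", []).append(action)

def run_agent(task, browser, max_steps=10):
    for _ in range(max_steps):
        state = observe(browser)
        # The model picks the next action from the goal plus the current state.
        # Small errors compound: one wrong click changes every later decision,
        # which is why repeated runs of the same task diverge so much.
        reply = model.generate_content(
            f"Task: {task}\nCurrent page: {state}\n"
            "Reply with the single next browser action, or DONE if finished."
        )
        action = reply.text.strip()
        if action.upper().startswith("DONE"):
            break
        act(browser, action)
    return browser.get("history", [])

# Example: run_agent("Find a contact email for Benchmark Climbing",
#                    {"page": "Google homepage"})
```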
Okay, next up we have some developer updates, with Jules, the agent for developers. Now, this is something that's not available yet, and to me it seems like there's a lot of competition in this category by now, and I'm not exactly sure what would set Jules apart from some of the other coding assistants, whether in the form of an IDE plug-in or a full-fledged IDE like Cursor. What I do know is that they released several developer tools with this announcement, and I think some of them are fantastic. One of them is new output modalities: Gemini 2.0 Flash can output audio and images natively, which is amazing, a true multimodal model. And all of these live streaming features are included in the API, so it can stream video from your camera, it can stream your desktop, and all of this could be built in through their API today (I'll put a rough sketch of what that looks like in code at the end of this section). It can pick up on interactions and voice activity. And I have to say one more thing about interacting with this thing, check this out: I don't know how much you use Advanced Voice Mode, but OpenAI's version of this voice assistant has a problem with interruption. Sometimes you want to interrupt it and it just keeps talking, or takes two seconds longer. Check this out, Gemini over here just seems to work. "Hey, what's up, how are you doing today?" "I'm doing well, thanks for asking. How about you?" "Yeah, I'm doing very well too. Tell me a little story." "Okay, here is a short story for you. Once upon a time, in a small village nestled beside a great forest..." "Okay, hold up, make it a bit more dramatic, how about that?" "All right, let me try a more dramatic story. In a kingdom shrouded in perpetual twilight, there was a forbidden prophecy foretelling the rise of a shadow sorcerer..." "All right, now do it in a dark, mysterious, and, how do you say, somebody's voice after 50 years of smoking cigarettes and drinking hard liquor on a daily basis. The journey is usually the part that you remember anyway. Do that voice." "Okay, I will do that. In the abyssal depths of a forgotten catacomb, a voice rasped like gravel grinding against stone as a..." Okay, there you go. So it cannot do the voices like Advanced Voice Mode, but my point here was that the interruption does work better, and all of this is available through the API too, including the live streaming, so you can expect this to pop up in applications where it makes sense. All of a sudden you will be able to live stream your screen to an agent. Sure, security concerns, no doubt, but it's fascinating that this is becoming available today, because yesterday this wasn't something you could easily do. And then on top of all that, they included a few more examples, one of which particularly sparked my interest, because this is just my personal interest and always has been: agents in gaming. Now check it out, they had a few demos like this, and you can check them out on the Google DeepMind YouTube channel. "Given the base layout, I recommend attacking from the bottom or south side. This direction allows you to target the Town Hall directly with your giants, while the wizards can handle the surrounding defenses." An AI gaming coach, how cool is this? Now, as an avid gamer, I do have to point out that all of these games in their little demos are turn-based, so these are some of the simplest games to comprehend. It's not something where reaction time really matters, like Counter-Strike, or something with a million variables, like League of Legends. These are some of the simplest games out there, but I love seeing this. Heck, I mean, soon we'll even have AI agents that you'll be able to play co-op games with, sort of enabling digital friends at a whole new level of interactivity. I don't know, just a thought I had when I looked at this, and it's super cool that they're including this in the main blog post. I like that. And that's really everything they released and announced today.
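And here's the rough code sketch I promised for that live streaming API. This is based on the early google-genai SDK quickstart as far as I understand it, so the client setup, the "v1alpha" version flag, the model ID, and the session.send() signature are all assumptions that may have changed; it only shows a single text turn, but the same session is where camera, screen, or microphone frames would be streamed in.

```python
# Minimal sketch of talking to the Multimodal Live API from Python.
# Assumptions: the google-genai SDK, the "v1alpha" API version, the
# "gemini-2.0-flash-exp" model ID, and this session.send() signature,
# all taken from the early quickstart and possibly changed since.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY",
                      http_options={"api_version": "v1alpha"})

config = {"response_modalities": ["TEXT"]}  # could also request AUDIO output

async def main():
    async with client.aio.live.connect(model="gemini-2.0-flash-exp",
                                       config=config) as session:
        # One text turn; camera, screen, or audio frames would be streamed
        # into this same session instead of a single message.
        await session.send(input="Can you see my screen? Summarize the benchmarks.",
                           end_of_turn=True)
        async for response in session.receive():
            if response.text:
                print(response.text, end="")

asyncio.run(main())
```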
So what all of this has in common is that it's powered by this Gemini 2.0 era, and especially this Flash model that is multimodal: it can see, it can hear, and it can also speak. It might not be able to do the voices of Advanced Voice Mode, but the latency is on par with some of the best voice assistants out there, and the interruption feature, in my opinion, is the best I've ever seen. You can access it today through Google's AI Studio in this Stream Realtime interface, and I would strongly recommend you do that. It's a whole new experience that we haven't had access to up until now. And then, if you want to get inspired even further, on their channel they have all of these example videos, as I pointed out, and if you want early access to any of this, over at labs.