I just got back from Google I/O, where Google announced so many incredible new products, and I'm going to break it all down for you. But before we start, I got to interview Sundar, the CEO of Google, and I talked to him about world models, the intelligence explosion, the future of search, and more. So, if you're not already subscribed, make sure to subscribe so you get notified when that video drops.
All right, the first thing I want to talk about is how quickly the narrative around Google's AI initiatives has changed. Don't forget, it was just about a year ago that many people were doubting Google's AI strategy, and the last Google I/O seemed like it was received pretty poorly.
And then one year later, here we are. And look at this: shipping at a relentless rate. We had AlphaFold 3, Imagen 3, Gemma 2.
That was all back in 2024. Look at all of these announcements: Project Mariner, Gemini 2.0 Flash Thinking, Gemini 2.5, 2.5 Pro, Gemma 3, Gemini Robotics, AlphaEvolve.
And the entire theme of this event is taking the research that they have been doing over the last decade-plus and putting it into products. They are finally productizing all of that research work they've been doing for a long time. Here's the improvement in Elo score for each of their major model releases.
But that's not the most interesting thing. Let me show you what's telling not just about Google, but about artificial intelligence in general. Look at this.
Over 2024, here are the monthly tokens processed: 9.7 trillion.
This is Google. And that sounds like a lot, but watch this. Now, we are processing 480 trillion monthly tokens.
That's about a 50x increase in just a year. There were audible gasps when he dropped that metric. To go, in one year, from under 10 trillion to nearly 500 trillion monthly tokens processed is insane.
That not only speaks to per-user adoption of artificial intelligence, it also speaks to the depth with which people are using AI. And on top of that, we now have thinking models, which use a lot more tokens. All of these factors together make that 50x number in one year look absolutely astounding.
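As a quick sanity check on that 50x claim, here's the arithmetic on the two monthly-token figures from the keynote (a tiny Python sketch, nothing more):

```python
# Growth factor implied by the monthly token numbers shown on stage.
tokens_per_month_2024 = 9.7e12    # ~9.7 trillion tokens per month (2024 figure)
tokens_per_month_2025 = 480e12    # ~480 trillion tokens per month (I/O 2025 figure)

growth = tokens_per_month_2025 / tokens_per_month_2024
print(f"{growth:.1f}x growth in roughly a year")  # prints about 49.5x
```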
And remember, if you're watching this video, we are all still very early. We are still at the very beginning of that inflection point. So, it's an exciting time to be in this world.
All right. The next thing I want to talk about is Project Starline, which has now been renamed Google Beam. And if you don't remember it, let me just play this clip from the event.
We introduced Project Starline, our breakthrough 3D video technology, at I/O a few years back. The goal was to create a feeling of being in the same room as someone, even if you were far apart. We've continued to make technical advances, and today we are ready to announce our next chapter: introducing Google Beam, a new AI-first video communications platform.
So, I got to try this out, and it was pretty amazing. If you've ever used the Nintendo 3DS, it kind of feels like that: you're looking at a flat screen, but the image looks completely three-dimensional.
And what they're doing is using multiple cameras to capture video of you and then using artificial intelligence to reconstruct you in 3D for the person on the other side. And it really is absolutely insane to see in person. At first, my eyes took a second to adjust and I thought I was going to start getting a headache, but once I allowed myself to relax and converse with the person on the other side, it was great.
And at one point, he took out an apple and just like held it in front of me and it felt like I could just reach out and grab that apple right off of the screen. It was really cool. So, this is really for enterprises.
This is for meetings to feel like you're in the room with somebody. Probably not going to see this on any consumer devices anytime soon. All right.
Next is Project Astra, parts of which are being brought into the Gemini app on your phone. It basically allows you to use your camera and interact with the real world. So, you can point your camera at something.
It'll remember things. It'll tell you what things are. You can ask it, you know, what type of tree is that?
What type of animal is that? Where did I leave my glasses? There are a lot of cool use cases we've seen previously, and I'm starting to use visual AI a lot more day-to-day. They played a really funny video of Astra in action, so let me just show that to you.
That's a pretty nice convertible. I think you might have mistaken the garbage truck for a convertible. Is there anything else I can help you with?
What's this skinny building doing in my neighborhood? It's a street light, not a building. Why are these palm trees so short?
I'm worried about them. They're not short. They're actually pretty tall.
Sick convertible. Garbage truck again. Anything else?
Why do people keep delivering packages to my lawn? It's not a package. It's a utility box.
Why is this person following me wherever I walk? No one's following you. That's just your shadow.
Gemini is pretty good at telling you when you're wrong. So this is all called Gemini Live, and it starts rolling out today. All right.
Next is Project Mariner, which is an agent that can interact with the web. And of course, we've seen a lot of iterations of this. We've seen Operator from OpenAI.
We have Browserbase, we have Runner H, and a number of other awesome projects and companies doing similar things, but this is Google's version. And one of the things they're announcing today is multitasking.
And I'll show you that in a second. But that's really the power of these asynchronous agents: you can kick off one agent, let it go do some long-horizon task, and then start setting up and kicking off your next agent.
And you can potentially have dozens of these agents operating over very long-horizon tasks, from minutes all the way up to hours. So, this is computer-use agents, tooling, and memory: all the different pieces coming together in one project.
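To make the "kick off several agents and let them run" idea concrete, here's a minimal sketch of that pattern. This is not a real Google or Mariner API; run_agent() is a hypothetical stand-in for a long-horizon computer-use agent, and the task strings are made up.

```python
import asyncio

# Sketch of launching several long-horizon agents concurrently and collecting
# results as each one finishes. run_agent() is a hypothetical placeholder.

async def run_agent(task: str) -> str:
    await asyncio.sleep(1.0)            # stands in for minutes or hours of browsing/tool use
    return f"done: {task}"

async def main() -> None:
    tasks = [
        asyncio.create_task(run_agent("find three apartment listings under $1,200")),
        asyncio.create_task(run_agent("compare prices on event tickets")),
        asyncio.create_task(run_agent("summarize the week's unread newsletters")),
    ]
    # Results come back as each agent finishes, not in the order they were started.
    for finished in asyncio.as_completed(tasks):
        print(await finished)

asyncio.run(main())
```

The point is just the shape of the workflow: you launch the agents up front and collect whatever each one brings back whenever it finishes.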
It's still very early days, and I'm sure it still breaks quite often, but it is just the beginning. And they also announced agentic capabilities coming to three major platforms: Chrome, Search, and the Gemini app. And of course, they're calling it agent mode.
So they have AI mode, they have agent mode. Let me show you the demo that Sundar did at the event. Say you want to find an apartment for you and two roommates in Austin.
You've each got a budget of $1,200 a month. You want a washer/dryer or at least a laundromat nearby. Normally, you'd have to spend a lot of time scrolling through endless listings.
Using agent mode, the Gemini app goes to work behind the scenes. It finds listings from sites like Zillow that match your criteria and uses Project Mariner when needed to adjust very specific filters. If there's an apartment you want to check out, Gemini uses MCP to access the listings and even schedule a tour on your behalf.
And it'll keep browsing for new listings for as long as you need.
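The demo mentions Gemini using MCP (Model Context Protocol) to talk to the listings service. Google didn't show the plumbing, but for context, an MCP tool call is a JSON-RPC 2.0 request shaped like the sketch below; the tool name and arguments here are entirely hypothetical, not from the demo.

```python
import json

# Hypothetical example of the JSON-RPC message an MCP client sends to invoke
# a tool on an MCP server. The tool name and arguments are made up for
# illustration; they are not from Google's demo.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "schedule_tour",                       # hypothetical tool on a listings server
        "arguments": {
            "listing_id": "abc-123",                   # hypothetical listing identifier
            "requested_time": "2025-06-01T10:00:00-05:00",
        },
    },
}

print(json.dumps(request, indent=2))
```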
All right, here's the thing that I'm most excited about. I use so many different Google services: YouTube, Gmail, Calendar. My business operates on Google apps. So, what they're finally going to do is have this very personal AI assistant get context from all of the different services you use within the Google ecosystem. And this really is, in my mind, the holy grail of AI personalization.
It's not just having all of that context; when you add long-term memory about your interactions with AI, that's when you truly have a great, highly functional personal assistant. One of the demos they gave is personalized smart replies in Gmail. The ultimate AI email product for me is being able to just open up my email and have drafts of the replies ready for me to hit send.
And those drafts are based on the history of interactions I have with that contact, the history of interactions that I have with all of my contacts, any other context it can get from any of the other information that I provide. That will save me so much time. And so now we're a little bit more in that direction with personalized smart replies.
Not quite fully where I just load up Gmail and every single one of my emails is going to have a draft ready for me to just review and hit send. But this is a good step in that direction. And I was extremely flattered to get another shout out at this Google event.
Check this out. You've used these vast reasoning powers on everything from unpacking scientific papers to understanding YouTube videos. And you've told us how collaborative, insightful, and genuinely helpful you found using Gemini.
And they also referenced the Rubik's Cube demo again. So it was pretty awesome to see some of our creations up at the Google event. They also announced a bunch of updates to the Gemini series of models, including adjustable thinking budgets, faster performance, thinking summaries, and more.
And next, Google launched a diffusion-based text generation model. If you're not familiar, diffusion models are typically used for image generation, but we have seen a couple of models use diffusion as the architecture for text generation, and they tend to be much faster than transformer-based architectures, as you're seeing here. It was literally faster than you could even see, but they slow it down right here so you can actually see what's going on.
So you can see it kind of outputs one thing and then continues to iterate on it, removing the noise over time, until you finally get the final output. Now, there's one problem: these diffusion-based text models tend not to be as good, in terms of quality, as traditional transformer-based models, but they are a lot faster, and they're making a lot of progress.
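If the denoising idea is hard to picture, here's a toy sketch of the process shape. It is not Gemini Diffusion's actual algorithm; the "model" here just reveals characters of a fixed target string, purely to show how a masked sequence gets refined in parallel over a few steps instead of one token at a time.

```python
import random

# Toy illustration of diffusion-style text generation: start from a fully
# "noised" (masked) sequence and refine many positions in parallel over a
# small number of steps, rather than emitting one token at a time.

TARGET = list("diffusion models refine the whole sequence in parallel")

def denoise_step(current, fraction=0.5):
    """Pretend 'model call': reveal a random fraction of the still-masked positions."""
    masked = [i for i, tok in enumerate(current) if tok == "_"]
    for i in random.sample(masked, max(1, int(len(masked) * fraction))):
        current[i] = TARGET[i]  # a real model would predict these tokens
    return current

sequence = ["_"] * len(TARGET)          # start from pure noise / all masks
step = 0
while "_" in sequence:
    sequence = denoise_step(sequence)
    step += 1
    print(f"step {step}: {''.join(sequence)}")

# An autoregressive transformer would instead need one forward pass per token,
# i.e. one sequential step per output token rather than a handful of parallel
# refinement steps, which is roughly why the demo looks so fast.
```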
And I asked Sundar specifically about his vision for diffusion models in the future in my interview, so stay tuned for that. They're also introducing Deep Think as part of Gemini 2.5 Pro.
Let me let Demis explain what it is. Today, we're making 2.5 Pro even better by introducing a new mode we're calling Deep Think.
It pushes model performance to its limits, delivering groundbreaking results. Deep Think uses our latest cutting-edge research in thinking and reasoning, including parallel techniques. So far, we've seen incredible performance.
It gets an impressive score on USAMO 2025, currently one of the hardest math benchmarks. It leads on LiveCodeBench, a difficult benchmark. And look at these numbers: nearly 50% on the USAMO 2025 benchmark, which is essentially the math Olympiad.
We have 80% on LiveCodeBench and 84% on MMMU, beating out o3 and o4-mini across the board.
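Google hasn't published exactly what "parallel techniques" means inside Deep Think, but a well-known family of approaches samples several reasoning paths in parallel and aggregates their answers (self-consistency style). Here's a minimal sketch of that general idea; solve_once() is a hypothetical stand-in for a model call, not a Google API.

```python
import asyncio
from collections import Counter

# Sketch of parallel "thinking": sample several independent reasoning paths and
# aggregate the answers by majority vote. This is NOT Google's published Deep
# Think method; solve_once() is a hypothetical stand-in for a model call.

async def solve_once(problem: str, seed: int) -> str:
    await asyncio.sleep(0.01)           # placeholder for a real (slow) model call
    return "42" if seed % 4 else "41"   # pretend most samples agree on "42"

async def solve_with_parallel_thinking(problem: str, n_samples: int = 8) -> str:
    answers = await asyncio.gather(*(solve_once(problem, s) for s in range(n_samples)))
    # Majority vote across the independent reasoning paths.
    return Counter(answers).most_common(1)[0][0]

print(asyncio.run(solve_with_parallel_thinking("hard olympiad problem")))
```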
All right. Next, they started hinting that the Gemini series of models is going to transform into world models: models that understand the world around us and can base their responses on the physics of the universe. Now, they didn't give much information yet, but it's still interesting to see them hinting in that direction. Here's the clip of Demis explaining what's coming.
You can already see these capabilities emerging in the way Gemini can use its world knowledge and reasoning to represent things in nature. And in Veo, our state-of-the-art video model, which has a deep understanding of intuitive physics, like how gravity, light, and materials behave. It knows what to do even when the prompts get a little creative, like this person made out of life rafts.
Understanding the physical environment will also be critical for robotics. AI systems will need world models to operate effectively in the real world. We fine-tuned a specialized model, Gemini Robotics, that teaches robots to do useful things like grasp, follow instructions, and adjust to novel tasks on the fly.
Making Gemini a full world model is a critical step in unlocking a new kind of AI. All right, they also announced their new image generation model, Imagen 4, and it does look quite good. Here are some examples they showed during the demo.
So, here's a woman in a green dress. Really hyperdetailed. Kind of a paper style bird.
Beautiful flowers with little droplets on them. Just they look so so good. But I think honestly this has become standard now.
You have to have a great image generation model. But look at some of the detail in this cat. Really good.
And it's 10 times faster than the previous model. That's a complaint a lot of people have about GPT-4o image generation: it just takes so long. So now we have much faster speeds and can iterate on our ideas much more quickly. And probably the coolest demo of all: Veo 3.
This is their text-to-video generation model, and it not only does video but now includes audio. So it's really becoming a multimodal media generation model. Let me play the demo for you.
They left behind a ball today. It bounced higher than I can jump. What manner of magic is that?
This ocean, it's a force, a wild, untamed might, and she commands your awe with every breaking light. And I've already found a ton of good examples of Veo 3 on Twitter, and I'm going to be testing it out. But here's the thing: it's really expensive.
Google also announced a new subscription tier for $250 per month, and there were audible groans when they announced this. But then you get much higher rate limits on a lot of their products, and you get access to their cutting-edge releases before anybody else. So, of course, I'm going to pay for it, and I'll report back and let you know how it is.
They announced Lyria 2, which is a music generation model, and it seems really cool. Personally, it's not something I use every day, but if you're into music generation or music production, there you go. You've got a new product.
All right. Next, they also announced Flow, which is kind of like Sora. It takes the video generation capability of Veo 3 but gives you a lot more creative control.
You can set up scenes. You can put different clips in different orders. So it's stuff Sora already does, but Veo 3 is a lot better at video generation.
So here's how it works: a custom gold gear shift in the shape of the head of a chicken. Okay.
And so here we go. We got it. So that's image generation.
And then you can take that. So you see "use this image," and then: low angle, 8mm wide-lens shot, shifting gears, shaky, fast car. So you take these three different images, put them together, and you get a video out of them.
So you can really hyper customize your video creation using all of these products all put together in flow. Here you can see you can arrange the different clips, you can extend clips. So this is all stuff we've seen with Sora but now available in Google products.
All right, let me play the full clip of the video. Now keep in mind all of the different elements that were put together including the sound effects are all done with these generative models. Let's watch.
So, really cool. I'm going to be testing it out. I'll probably publish a video of extensive Veo 3 testing, so stay tuned for that.
All right. Next, another incredibly cool demo: Android XR glasses. They did live demos, and of course it was a bit shaky at times, but it worked, and it worked really well.
These are glasses very similar to the Meta Ray-Bans, except they actually project onto the lenses, so you can see things through the clear lenses. It looks really cool. And this guy's wearing them right here.
You can tell immediately they have this interesting reflection on them. As soon as he walked out, before I even knew what this was going to be about, you could tell those glasses were unique. All right, so this was the live demo.
This person was backstage wearing the glasses. You could see this is her point of view. Look right there.
You could see the temperature. You get to see a text message coming in. So, this is actually what she's seeing.
And again, this is a live demo. And at a certain point, it got a little jittery, which hopefully they didn't edit out, but uh you know, that's all part of doing a live demo. All right.
So, you can see right here it starts to get a little jittery. That has really nothing to do with the glasses as much as it does with how many devices were on the internet at that time. So you could see more jitters, more stutters.
And if I fast forward, here's the really cool part. So you can see she's looking out onto the crowd. I'm somewhere over here.
And uh yeah, this is projecting onto the glasses. And so you can see here's the map's recommendation. So look at this.
It says turn right 500 ft. And when she looks down, you can actually see that live map view. That is incredible.
Now, I have been a little bearish on glasses being the ultimate form factor for artificial intelligence, but if I were outdoors, I would definitely wear these. Indoors, though, I don't want to wear glasses. A lot of these big tech companies seem to think people are going to wear glasses all the time, and I'm not.
Maybe I'm in the minority. Let me know in the comments what you think. So, those are all the major announcements from the event.
Remember, the Sundar interview is dropping soon. So, if you enjoyed this video, please consider giving it a like and subscribing.