AI Shocks Again: OpenAI AI Robot, Orion GPT5, Flux, Grok 2, Gemini Live & More (August Monthly News)

AI Revolution
Video Transcript:
this month AI has reached a new level with breakthroughs that are both exciting and terrifying from the release of the OpenAI-backed humanoid robot Figure 02 to the stealthy launch of an upgraded ChatGPT the race to advance AI technology is heating up Google's latest AI models including the upgraded Gemini 1.5 Pro 0801 and the newly released Gemini Live are turning heads while new AI robots are starting to blur the lines between human and machine capabilities meanwhile OpenAI's Orion GPT-5 with Strawberry AI and Google's cutting-edge AI innovations are setting the stage for
the next era of artificial intelligence and as if that wasn't enough we've seen the emergence of autonomous AI scientists and powerful new tools like Grok 2 and Agent Q redefining what's achievable stick around as we uncover the most astonishing AI stories from August 2024 okay let's talk about something super exciting and honestly a little bit terrifying in the world of AI if you've been following the AI image generation scene for a while you know we've had some big players dominating the space Midjourney DALL-E Stable Diffusion but guess what a fresh contender has arrived and
it goes by the name Flux and trust me you're going to want to know about this one so where do we start let's kick things off with a little background on what's been going on in the AI world you see for the last couple of years we've seen a massive boom AI generated art tools like Adobe Firefly Midjourney DALL-E and Stable Diffusion have been battling it out to be the top dog each one pushing the boundaries with their updates but now something new has entered the ring and it's got everyone talking all right
let me paint you a picture you're scrolling through your social media feed and you come across an image that looks so real you'd swear it was a photograph but then you do a double take maybe it's the way the text on a lanyard looks a little funky or the patterns in the background seem a bit off that's when you realize you're not looking at a photo you're looking at an AI generated image and this thing is as close to real as it gets without actually being real that's Flux for you and it's causing quite a
stir so Flux is an open-source AI image generator that's being developed by a company called Black Forest Labs if that name doesn't ring a bell maybe the fact that some of the folks behind it used to work at Stability AI will yep that's the same crew responsible for Stable Diffusion which as you know is a pretty big deal in the AI art world the company is also working on a text-to-video model that it promises will offer high-quality output and be available open source branding it state-of-the-art text-to-video for all now since
the startup recently secured $31 million in a financing round with backing from well-known tech investor Andreessen Horowitz it's clear they're on solid footing according to their homepage under the what's coming next section their upcoming model is teased as SOTA (state-of-the-art) text-to-video and it's set to be available to everyone meanwhile OpenAI's video generator Sora has been around for 6 months but is still only accessible to a select group of testers now let's talk about what makes Flux so special first off it's open source which means the code is out there for anyone to tinker with modify
and integrate into their own projects it gives a ton of flexibility to developers hobbyists and even small businesses who might not have the resources to invest in more expensive proprietary tools like Midjourney but there are actually three different versions of Flux the first one is the Pro version which is geared towards commercial use this is the big boy the model that's meant for companies that want to incorporate high-quality AI image generation into their products or services then there's the Dev version which is a midweight model it's sort of the middle ground still powerful but
not quite as heavy duty as the Pro version and lastly there's the Schnell version (Schnell means fast in German by the way) which is fitting because this version is all about speed it takes like 2 or 3 seconds for this model to generate an image which is pretty crazy it's the lightweight fast-performing model that's great for those who want quick results without needing a super beefy machine to run it speaking of running it here's another thing that sets Flux apart from its competition it can run on some pretty modest hardware like if you've got a
decently powerful laptop you're good to go you don't need a supercomputer or a cloud service to get this thing up and running that's a big deal because it makes high quality AI image generation accessible to a much wider audience when we talk about realism Flux stands out as a clear leader as I mentioned before the images it creates are so lifelike it's hard to believe they're AI generated in fact some people are saying that Flux might just be the new king of photorealism especially when you combine it with something called LoRA LoRA is a fine-tuning
technique for customizing image models, and a group called XLabs has released LoRA training scripts and weights for Flux, and when you pair a good LoRA with Flux the results are nothing short of mind-blowing we're talking about images that are so detailed so accurate that you'd be hard-pressed to tell them apart from real photos at least at first glance but, and there's always a but, it's not perfect if you really start to scrutinize these images you'll notice some telltale signs that they're AI generated text is a big giveaway Flux still struggles with rendering small text accurately so if you see something like a lanyard or a sign in the background
the text might look a little off patterns and textures can also get a bit wonky and sometimes elements in the image are just slightly out of proportion it's nothing that ruins the image outright but if you're looking for it you'll spot it and then there's the issue of skin textures as good as Flux is at rendering people it sometimes falls short when it comes to making skin look natural it actually makes skin look better than it does in reality you might notice that it can appear a bit too smooth which breaks the illusion of realism it's
something that Midjourney especially in its latest version seems to handle a bit better but here's the thing despite these minor imperfections the images that Flux produces are going viral and it's not hard to see why the level of realism it achieves is stunning and it's got people buzzing about the potential uses for this technology think about it stock photography advertising social media content there's a huge market for realistic images and Flux could be a game-changer in that space but as awesome as it is to be able to create hyperrealistic images with just a
few clicks there's also a darker side to this technology we're already seeing concerns about how AI generated images could be used to create fake news commit scams or spread misinformation the more realistic these images become the harder it's going to be to tell what's real and what's not that's a pretty scary thought and it's something that we're going to have to grapple with as AI technology continues to evolve but let's not get too bogged down in the doom and gloom just yet there's still a lot to be excited about with Flux especially if you're someone
who likes to experiment with new tech if you're curious about trying it out for yourself you've got a few options if you've got a decent laptop something with a good GPU you can actually download and run Flux locally there's a launcher called Pinokio that makes it super easy to get set up it's a big file so it might take a while to download but once you've got it you can start generating images right on your own machine no need to rely on cloud services or internet access but if your computer isn't up to the task
don't worry there are plenty of online platforms that have already integrated Flux into their offerings for example NightCafe which is one of the more popular AI image platforms out there has added Flux to its lineup this means you can generate images using Flux and compare them directly with images from other models like Ideogram and Stable Diffusion 3 it's a great way to see how Flux stacks up against the competition in real time another platform that's jumped on the Flux bandwagon is Poe if you're not familiar with Poe it's an AI model platform that lets
you generate images in a chatbot style format kind of like how you would with ChatGPT and DALL-E it's a different approach but it's cool to see how Flux performs in that kind of environment and if you're a developer there are even more options Flux is available on platforms like BasedLabs Hugging Face and fal.ai which are more geared towards folks who want to dive into the technical side of things Freepik one of the biggest names in stock photography is also working to bring Flux to its site so keep an eye out for that
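and if you want to see what open weights actually buy you, here's a minimal sketch of running the Schnell variant with Hugging Face's diffusers library (the model ID matches Black Forest Labs' public repo, the XLabs LoRA line is an assumption based on their Hugging Face releases, and you'll need a recent diffusers version for both):

```python
import torch
from diffusers import FluxPipeline

# Load the fast Schnell variant; bfloat16 keeps memory needs laptop-friendly
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # offload layers so a single consumer GPU can cope

# Optional: the XLabs realism LoRA mentioned earlier (assumed repo id; needs a
# diffusers version that understands the XLabs LoRA format)
# pipe.load_lora_weights("XLabs-AI/flux-RealismLora")

image = pipe(
    "candid photo of a conference attendee, lanyard with readable name tag",
    num_inference_steps=4,   # Schnell is distilled to work in very few steps
    guidance_scale=0.0,      # Schnell is trained without classifier-free guidance
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux_schnell_test.png")
```

a few lines of setup and you're generating locally, which is exactly the accessibility point above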
so Flux is definitely a model to watch it's got the potential to shake up the AI image generation scene in a big way and it's already proving to be a serious contender to established names like Midjourney and Stable Diffusion a brand new humanoid robot called Figure 02 just got the internet's attention it is backed by OpenAI and some other huge names in tech like Nvidia Microsoft and even Jeff Bezos's private fund so yeah it's pretty big all right so let's get into it Figure the company behind this robot just dropped a teaser
video for their latest model Figure 02 the video is short like seriously short but it's packed with some really interesting details if you look closely so let's check out the video first and then I'll break down everything you need to know all right so we see some robotic joints limbs and these futuristic looking flexible mesh designs that are probably part of the robot's body and if you're into specs they even tease torque ratings up to 150 newton-meters (N·m) to break it down that's basically a measure of how much force the robot can exert when it's moving something
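to make that concrete, here's a quick back-of-the-envelope sketch (the 150 N·m figure is from the teaser; the lever arm length is purely an assumed value for illustration):

```python
# Hypothetical illustration of what a 150 N.m joint rating means in practice
torque_nm = 150.0      # teased joint torque from the Figure 02 video
lever_arm_m = 0.5      # assumed distance from joint to load, roughly a forearm
g = 9.81               # gravitational acceleration, m/s^2

force_n = torque_nm / lever_arm_m   # F = tau / r  ->  300 N at the hand
print(f"~{force_n:.0f} N, i.e. holding ~{force_n / g:.0f} kg at that radius")
```

in other words a joint like that could statically hold roughly a 30 kg load half a meter out, which is serious strength for a humanoid arm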
there's also something about a ROM or a range of motion up to 195° which is kind of a big deal in the world of robotics now Brett Adcock the founder of Figure and this guy is not new to wild tech startups he's also behind Archer Aviation and Vettery posted on Twitter or X as it's called now that Figure 02 is the most advanced humanoid robot on the planet bold claim right but with the backing they've got it's hard not to take him seriously so let's rewind a bit Figure as
a company only started in 2022 which is like yesterday in startup terms and they've already got a lot of people talking they came out of nowhere in March 2023 with their first model Figure 01 which was aimed at tackling global labor shortages the idea is that these robots can step in to do jobs that are unsafe or just plain boring for humans like working in warehouses factories and retail and get this they built their first full-scale humanoid robot in just 6 months 6 months people that's insane they've got a team of 40 experts including their
CTO Dr Jerry Pratt who's a big name in the robotics world their vision is pretty clear make robots that boost productivity make workplaces safer and get this they've promised they'll never weaponize these robots so they're really going for that good guy angle what's also cool is they've already inked a deal with BMW Manufacturing so we might be seeing these robots in action sooner rather than later and to top it off Figure 01 had some impressive integration with OpenAI's GPT-4V model which is like the top-of-the-line AI tech right now
("can you explain why you did what you just did while you pick up this trash" "on it, so I gave you the apple because it's the only, uh, edible item I could provide you with from the table") so you can bet Figure 02 will be packing some even newer AI models when it drops on August 6th 2024 now Figure isn't the only one in the game the race to bring AI powered humanoid robots into our homes and workplaces is heating up you got Elon Musk out there saying there's going to be a market for over 10 billion yes billion with
a B humanoid robots he's working on his own robot Tesla Optimus which is basically going to be a competitor to Figure and then there's Nvidia who's been showing off some pretty wild stuff with their Project GR00T using Apple Vision Pro headsets to help train AI robots through teleoperators oh and don't forget Boston Dynamics those guys have been in the game for a while and they're working on their own humanoid robot too upgrading their Atlas model with electric motors to make it cheaper and more reliable now another really interesting development in the world of
AI and robotics is coming from a startup called Oxford Dynamics based in Harwell Oxfordshire they're working on a robot called Strider and this thing is specifically designed to venture into some of the most dangerous environments known to man we're talking about places where chemical biological or even nuclear threats are present basically situations where you really wouldn't want to send a human now Oxford Dynamics isn't just playing around here they've already landed a £1 million contract with the UK Ministry of Defence (MoD) to design develop and supply these robots the idea is instead of risking
human lives why not send in a machine that can handle the job Mike Lawton one of the directors at the company is thinking big he's talking about building hundreds maybe even thousands of the Strider robots in the future all with the goal of making the world a safer place so what exactly can Strider do well this robot is pretty versatile it's got a long multi-jointed arm that can take readings collect samples and retrieve contaminated objects and it moves around on these tank-like treads which means it's built to handle rough unpredictable terrain think about how tough
it would be for a human in a hazmat suit to navigate a place like that Strider can do it without breaking a sweat well if robots could sweat the company started working on Strider in November and they've got until September to deliver the finished product to the Defence Science and Technology Laboratory they're hoping Strider can be used in scenarios like the 2018 Novichok attack in Salisbury where dealing with contaminated objects was a serious challenge this robot could swoop in grab those dangerous items and secure them in sealed containers all while performing tasks that would
be really difficult for humans but Oxford Dynamics isn't stopping there they're planning to integrate their AI software into Strider and they're calling it AVIS which stands for A Very Intelligent System that sounds a bit like Jarvis from Iron Man and you'd be spot-on that's exactly what inspired them the possibilities here are endless Shefali Sharma one of the founders even mentioned that this tech could be adapted for use in submarines or fighter jets in the future for these guys seeing this technology get into the hands of people who truly need
it is the ultimate goal Sharma said it would be a dream come true to see Strider out there making a real difference and it's not just the startup that's excited Euan Davies from Defra the UK Department for Environment Food and Rural Affairs mentioned in a statement that it's great to see these concepts rapidly turning into a highly capable and flexible platform so just like Figure is pushing the boundaries of what humanoid robots can do Oxford Dynamics is showing us how AI and robotics can literally save lives by stepping into the most hazardous situations on
Earth it's a thrilling time for technology and it's going to be fascinating to see how both of these projects evolve in the near future so yeah the competition is fierce but with the kind of momentum Figure has plus the backing of some of the biggest names in tech they seem like they're in a really strong position to keep pushing forward I can't wait to see what this Figure 02 is really capable of when it's fully revealed all right if you're interested in how AI and robotics are reshaping our world make sure to stick around for
more deep dives into the latest tech if you've been feeling like your AI buddy's been acting a bit different lately maybe quicker sharper and just a tad smarter you're not alone OpenAI has been sneakily rolling out some major changes without a big announcement but don't worry I've got all the details you need to know right here so let's talk about it last week I started noticing that ChatGPT felt different it was like the responses were more on point faster and just generally better I wasn't the only one either people all over social media
were talking about how ChatGPT seemed to be upgraded but here's the thing OpenAI didn't say a word about it at first it was all very hush hush until they finally dropped a little bombshell on us OpenAI took to X to casually mention that they'd slipped a new version of their GPT-4o model into ChatGPT so they just updated the model we've all been using without making a big deal about it the message was simple there's a new GPT-4o model out in ChatGPT since last week hope you all are
enjoying it and check it out if you haven't we think you'll like it that's it no fancy press release no grand unveiling just a tweet typical OpenAI right now if you're wondering what's so special about this new model let's break it down the updated version of GPT-4o which they're calling chatgpt-4o-latest is essentially a fine-tuned and optimized version of what we had before but here's where it gets interesting while OpenAI hasn't spilled all the beans there's a lot of speculation about what this new model actually is some people out there
are thinking this might be part of a bigger strategy by OpenAI to release different sized models kind of like what Google and Anthropic are doing there's been talk about a GPT-4o large and some think this latest update could be a stepping stone in that direction but I'm not totally sold on that idea because let's be real if it were a brand new model they probably would have hyped it up a lot more so what can this new model do well from what I've seen and what others have reported it's performing better on tasks that
require complex reasoning and creativity like if you've been asking ChatGPT to help with coding or solve tricky problems you might have noticed it's just a little bit sharper now it's also faster which is a nice bonus but of course it's not perfect there are still some weird quirks for example in one test the model was asked to stack a book nine eggs a laptop a bottle and a nail in a stable manner the solution it suggested putting nine eggs on top of a bottle I mean come on who does that and then when it
was asked how many Rs are in the word strawberry it came back with two which is definitely wrong
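for the record the right answer is three, and you can check it with a one-liner (language models tend to flub this kind of question because they see chunks of words as tokens rather than individual letters):

```python
# Trivial sanity check of the question the model flubbed
print("strawberry".count("r"))  # 3 (st-r-awbe-r-r-y), not 2
```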
so yeah there are still some bugs to work out but overall the update is a step in the right direction now talking about strawberry let's talk about something that's been generating a lot of hype Project Strawberry the idea behind Project Strawberry is that it could be a new post-training method that boosts the model's reasoning skills some people are even saying that the improvements we're seeing in ChatGPT might be the first signs of this mysterious project in action one of the coolest things about the new chatgpt-4o-latest model is how it handles multi-step reasoning this basically means the AI isn't just jumping to conclusions it's thinking things through step by step before it gives you an answer that's a pretty big deal because it leads to more accurate and thoughtful responses which is something we all want right the new model has already made waves in the AI community especially in something called the LMSYS leaderboard now if you're not familiar with it the LMSYS leaderboard is like the
Olympics for AI models they put different models head-to-head in all sorts of tasks and the new chatgpt-4o-latest model just crushed it it scored a whopping 1,314 points which is the highest score ever recorded on that leaderboard this means it's outperforming some of the biggest names in the game like Google Anthropic and Meta
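a quick note on what those scores mean: arena-style leaderboards derive Elo-style ratings from crowdsourced head-to-head votes, so the gap between two ratings maps to an expected win rate, and here's a rough sketch using the standard Elo formula (the leaderboard's actual Bradley-Terry fit differs slightly, so treat this as an approximation):

```python
def elo_win_prob(rating_a: float, rating_b: float) -> float:
    """Expected probability that model A beats model B under standard Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# e.g. a 1,314-rated model vs a 1,286-rated runner-up
print(round(elo_win_prob(1314, 1286), 3))  # ~0.54, a modest but consistent edge
```

so a 28-point lead doesn't mean it wins every matchup, just noticeably more than half of them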
now if you're thinking how do I get my hands on this new model well it's super easy OpenAI has already swapped out the old GPT-4o with the new version in both the ChatGPT website and app so all you have to do is fire up ChatGPT and you're good to go if you're on the free plan you might hit some message limits but for those of you who are on the Plus plan you can push the model to the limit and really see what it can do but don't worry if you're not ready to shell out the $20 a month for the Plus plan you can still get a good feel for the new model before you hit those limits and then if you run out of messages you can switch over to
GPT-4o mini it's not quite the same but it's still pretty powerful also one more really interesting thing is how OpenAI has been testing these updates they've been sneaking experimental models into places like the LMSYS Chatbot Arena under random names so people don't even realize they're testing new tech the chatgpt-4o-latest model for example was tested under the name anonymous-chatbot and it got over 11,000 votes from users that's a lot of people unknowingly helping out with the testing which just goes to show how clever OpenAI's approach is so what's
next well if this update is anything to go by we can expect OpenAI to keep refining and improving ChatGPT they're clearly focused on making it better at reasoning creativity and all those tasks that require a bit more brain power and who knows maybe we'll see even more of Project Strawberry in the future all right now I also want to talk about a new AI model that just came out but it didn't really get the attention it deserves this model called Falcon Mamba 7B was released by the Technology Innovation Institute (TII) in Abu Dhabi
TII is known for working on cutting-edge technologies like AI quantum computing and robotics and now they've dropped this new model it's available on Hugging Face and it's an open-source model which is pretty cool but what really sets it apart is the new architecture it's using most of us are familiar with Transformer models which have been dominating the AI scene for a while now but Falcon Mamba 7B uses something different called the Mamba state space language model (SSLM) architecture this new approach is quickly becoming a solid alternative to those traditional Transformer models now why is
this important well Transformers are great but they have some issues especially when it comes to handling longer pieces of text you see Transformers use an attention mechanism that looks at every word in a text and compares it to every other word to understand the context but as the text gets longer this process demands more and more computing power and memory if you don't have the resources to keep up the model slows down and struggles with longer texts this is where SSLM comes in unlike Transformers SSLM doesn't just rely on comparing words to each other instead
it continuously updates a state as it processes the text this means it can handle much longer sequences of text without needing a ton of extra memory or computing power now Falcon Mamba 7B uses this SSLM architecture which was originally developed by researchers at Carnegie Mellon and Princeton universities what's cool about this model is that it can dynamically adjust its parameters based on the input so it knows when to focus on certain parts of the text and when to ignore others
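to see why this scales so well here's a toy sketch of the core idea, a fixed-size state updated once per token (a real Mamba layer uses input-dependent selective parameters and fused scan kernels, so treat this as a cartoon of the concept, not the actual architecture):

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Toy linear state-space pass: one state update per token, so memory
    stays O(state size) instead of attention's O(sequence length squared)."""
    h = np.zeros(A.shape[0])       # fixed-size hidden state carried along
    outputs = []
    for x_t in x:                  # single linear pass over the sequence
        h = A @ h + B @ x_t        # fold the new token into the running state
        outputs.append(C @ h)      # read the output from the current state
    return np.stack(outputs)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(1024, 16))   # 1,024 toy "token embeddings"
out = ssm_scan(
    tokens,
    A=0.9 * np.eye(8),                 # state decay / transition
    B=0.1 * rng.normal(size=(8, 16)),  # how inputs enter the state
    C=rng.normal(size=(4, 8)),         # how outputs are read out
)
print(out.shape)  # (1024, 4); doubling the sequence doubles time, not memory
```

the key takeaway is that the state never grows with the text, which is exactly why these models can chew through very long sequences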
so how does Falcon Mamba 7B stack up against the big players like Meta's Llama 3 8B Llama 3.1 8B and Mistral 7B TII ran some tests and the results are pretty impressive in terms of how much text the model can handle Falcon Mamba 7B can fit larger sequences than the Transformer models using just a single 24 GB A10 GPU this means it can theoretically handle infinite context length if you process the text token by token or in chunks and again Falcon Mamba 7B came out on top it beat Mistral 7B's sliding window attention architecture by generating all tokens at a constant speed without any increase in memory usage and that's a big deal
for anyone working with large-scale AI tasks because it means the model is both fast and efficient even when it comes to standard industry benchmarks Falcon Mamba 7B holds its own in tests like TruthfulQA and GSM8K it outperformed or matched the top Transformer models sure there were a couple of benchmarks like MMLU and HellaSwag where it didn't quite take the lead but it was still right up there with the best of them but here's the thing this is just the beginning for Falcon Mamba 7B TII has big plans to keep optimizing the
model and expanding its capabilities they're not just stopping at SSLM they're also pushing the limits of Transformer models to keep driving innovation in AI so if you're into AI or just curious about what the future holds keep an eye on Falcon Mamba 7B it's already making a name for itself and with TII's continued efforts it's only going to get better plus with over 45 million downloads of their Falcon models TII is proving that they're a major player in the AI world so you know how Google's been kind of struggling in the AI world lately right I
mean they had some pretty rough moments like that one time their Bard chatbot gave wrong info about the James Webb Space Telescope during its first big demo that little mess up cost Google's parent company Alphabet a casual $100 billion in market value yeah that hurt and don't even get me started on their Gemini image generation tool people were tearing it apart for historical inaccuracies and some bias issues so Google had to pull the plug on that one really quick with all that it seemed like Google was falling way behind OpenAI and Microsoft
who've been absolutely killing it with their AI stuff especially with GPT-4o and all the AI integration Microsoft's been doing it was almost like Google was this giant who was stumbling trying to keep up in a race it used to dominate but here's the plot twist Google just pulled off a major comeback and it's got everyone in the AI world buzzing they've been steadily improving their AI game and now they've rolled out some impressive upgrades earlier this year they introduced Gemini 1.5 Pro which really shook things up and now they're back at it with
a new version we're talking about the experimental version 0801 of Gemini 1.5 Pro which is now available for early testing and feedback through Google AI Studio and the Gemini API the LMSYS Chatbot Arena is a popular benchmark that tests AI models on various tasks and gives them an overall competency score in a recent evaluation GPT-4o got a score of 1,286 and Claude 3.5 Sonnet wasn't far behind with 1,271 interestingly an experimental version of Google's Gemini model called Gemini 1.5 Pro 0801 scored an impressive 1,300 this surpasses even the scores of its well-known competitors for context
an earlier version of Gemini 1.5 Pro scored 1,261 while this suggests Gemini might be more capable overall it's important to remember that benchmarks don't tell the whole story about an AI model's strengths and weaknesses the AI community is buzzing with excitement about this latest Gemini version social media is flooded with users praising it as insanely good with some even claiming it blows GPT-4 out of the water so yeah Google's definitely back in the game with this one this release marks a turning point where users now have a variety of strong AI chatbot options ultimately the
best model for a user will depend on their specific needs and preferences it remains to be seen if this experimental version of Gemini will become the standard although it's currently available to the public its early release status means it could be modified or even taken down in the future for safety or alignment reasons so it's safe to say that Gemini 1.5 Pro is a top contender in the world of AI models one of its most impressive features is its ability to handle massive amounts of data like processing up to 1 million tokens
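to put 1 million tokens in perspective here's the usual rule-of-thumb conversion (the words-per-token and words-per-novel figures are rough assumptions for illustration, not Google's numbers):

```python
# Rough sense of scale for a 1M-token context window
tokens = 1_000_000
words = tokens * 0.75      # common rule of thumb: ~0.75 English words per token
novels = words / 90_000    # assuming ~90k words for an average novel
print(f"~{words:,.0f} words, roughly {novels:.0f} novels in a single prompt")
```

that's on the order of eight novels' worth of text the model can consider at once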
now the other model they launched Gemma 2 2B is kind of a game changer too it's got 2 billion parameters which might sound small compared to some other models out there but this thing punches way above its weight class it's actually outperformed models that are way bigger like OpenAI's GPT-3.5 and even Meta's Llama 2 Google's basically showing everyone that bigger isn't always better in the AI world it's all about how you optimize and tweak what you've got now while Google is making strides with their AI models not everything they touch turns to gold case in point the
Dear Sydney ad fiasco so Google had this ad running during the Olympics on NBC where a dad uses the Gemini AI to help his daughter write a fan letter to her hero Olympic track star Sydney McLaughlin-Levrone sounds harmless right well not exactly in the ad the dad asks Gemini to help his daughter write this heartfelt letter and the AI goes ahead and does it but here's where things get weird the whole premise of using AI to write something as personal as a fan letter didn't sit well with a lot of people critics were quick
to call it out as tone-deaf and just plain bizarre I mean who wants an AI written fan letter it's supposed to be personal right something that comes from the heart not a machine the backlash was almost immediate people were all over social media roasting the ad for being out of touch Linda Holmes a novelist and podcast host pretty much summed up the sentiment by saying who wants an AI written fan letter and honestly she's got a point the ad came off as a bit of a misfire like Google was trying too hard to show
off their AI capabilities without really thinking about whether it made sense in this context so how did Google respond well they decided to pull the ad from their Olympics rotation they said the ad tested well before it aired but after hearing the feedback they thought it was best to phase it out however they didn't completely abandon the idea they defended it by saying their goal was to show how AI can enhance creativity not replace it they wanted to create an authentic story celebrating Team USA and in their minds the Gemini AI was just supposed
to be a tool to help get the creative juices flowing not do all the work but even with that explanation the damage was kind of done the whole situation just highlighted the fine line tech companies have to walk when it comes to integrating AI into everyday life sure AI can be an amazing tool but there are some areas like personal expression where people might not be so keen on letting a machine take over not everything is great for OpenAI either despite attracting significant investments exceeding $11.3 billion since 2015 from big players like Microsoft and
Sequoia OpenAI is facing a financial tightrope walk a recent report suggests they're on track to spend a staggering $8.5 billion this year alone raising concerns about their long-term sustainability and even sparking whispers of potential bankruptcy now what's driving these massive costs running cutting-edge AI models like ChatGPT doesn't come cheap OpenAI is projected to shell out around $7 billion just to keep these models up and running a significant chunk of this approximately $4 billion is dedicated to renting server capacity from Microsoft highlighting the immense computational power required for their AI operations
an additional $3 billion is earmarked for training their AI models including licensing agreements with news organizations like News Corp to access and learn from their vast content libraries and let's not forget about their 1,500 employees their salaries are estimated to reach $1.5 billion this year while OpenAI is generating revenue projected to fall between $3.5 billion and $4.5 billion this year primarily from ChatGPT and other paid AI services it's not enough to offset their massive expenditures this puts them on track for a potential loss of at least $5 billion for the year
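the arithmetic behind that loss estimate is simple enough to check yourself (all figures are the report's rough estimates, in billions of dollars):

```python
# Sanity-checking the reported 2024 numbers (all in $ billions)
serving  = 4.0   # renting server capacity from Microsoft
training = 3.0   # training runs plus content-licensing deals
salaries = 1.5   # ~1,500 employees
costs = serving + training + salaries       # 8.5, matching the spend figure above

revenue_low, revenue_high = 3.5, 4.5        # projected ChatGPT/API revenue range
print(costs - revenue_high, costs - revenue_low)  # 4.0 to 5.0 -> the ~$5B loss
```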
to put this into perspective Google-backed Anthropic a competitor expects to burn through $2.7 billion this year this situation has ignited concerns about OpenAI's future some experts are even questioning the viability of their business model particularly with the emergence of free and open-source AI models like Meta's Llama 3.1 this model which allows developers free access to its code could potentially undercut OpenAI's paid offerings especially as some businesses are already hesitant about AI cost and accuracy this isn't the first time OpenAI's financial stability has been questioned last year reports of ChatGPT's high
running costs nearly $700,000 per day sparked similar concerns however OpenAI's strong backing from giants like Microsoft with a $13 billion investment offers some reassurance Microsoft's CEO Satya Nadella has emphasized that their investment secures them significant rights to OpenAI's technology ensuring continuity even if OpenAI were to face difficulties despite these financial challenges OpenAI remains committed to pushing the boundaries of AI they continue to release new products including the AI powered video generator Sora and the AI search engine SearchGPT which even caused a dip in Google's stock price OpenAI CEO Sam
Altman remains steadfast in his ambition to achieve artificial general intelligence AGI even if it means spending billions he envisions AGI as highly autonomous systems that surpass human capabilities in most economically valuable tasks in essence OpenAI is engaged in a high stakes gamble they are pouring immense resources into developing groundbreaking AI even if it means operating at a significant loss whether their bet pays off with the dawn of AGI or leads to financial difficulties remains to be seen one thing is certain the world is watching OpenAI's every move with bated breath so let's talk
about robots specifically humanoid robots powered by AI and explore all the exciting developments that were showcased at the 2024 World Robot Conference in Beijing and trust me there is a lot to unpack from this event we're talking game-changing innovations cutting-edge technology and the latest in AI advancements so let's get into it all right so first let's set the stage the World Robot Conference is kind of a big deal in the tech world think of it as the Oscars for robotics it takes place every year in Beijing and it's where the latest and greatest in
robotic technology is put on display this year the conference was bigger and better than ever attracting 169 exhibitors from all over the globe showing off more than 600 innovative products and here's the kicker there were a record high 27 humanoid robots featured this year yeah humanoid robots are becoming a major focal point in the robotics field and the hype is real so why all the buzz around humanoid robots well there's a growing interest in how AI can be integrated into robots that look and move like humans these aren't standard robots that you might see in
factories these are androids designed to interact with us in a more natural humanlike way the idea is that these robots could eventually perform tasks just like or even better than humans and it's not just about replacing humans it's about collaborating with us especially in environments that are dangerous or require heavy labor a great example from the conference is the Unitree G1 this is a two-legged humanoid robot standing about 1.3 m tall and weighing around 35 kg it's sleek it looks futuristic and it's packed with some pretty advanced tech the Unitree G1 can move at a
speed of 2 m/s and has advanced three-finger force control hands which means it can handle objects delicately and precisely it's also got a maximum knee joint torque of 120 N·m so it's pretty strong for its size what really sets the Unitree G1 apart is its Robot Unified Large Model which is a fancy way of saying that it has a robust AI system that allows it to learn and refine its skills continuously this model was launched this year and it's already making waves priced at around 99,000 yuan or about $13,874 it's actually quite affordable
considering its advanced capabilities according to Huang Jiawei the marketing director at Unitree this robot isn't just designed to look impressive it's built to be highly functional in a variety of practical applications that's why it's already being adopted by numerous labs and companies now at the conference Shenzhen-based UBTech Robotics showed off some of their robots that are currently deployed on automotive production lines these robots aren't just for show they are doing real work intelligent transportation quality inspection and even handling chemicals it's a big step forward for the large scale application of humanoid robots in
manufacturing according to Greg G the lead on UBTech's humanoid robot motion control algorithm team their robots currently operate at about 20% of human efficiency now that might not sound super impressive at first but here's the thing they expect to reach nearly 100% efficiency within the next one to two years and remember robots can work 24/7 without breaks this means their overall efficiency could soon surpass that of human workers the integration of AI models has accelerated the development of humanoid robots and as he mentioned we're looking at a future where robots and humans are going to
be working side by side more often especially in environments that are risky or involve heavy duty tasks essentially robots that can help you with heavy lifting or perform tasks that are too dangerous for humans that's the direction we're heading in and it's pretty exciting another standout from the conference was the Tiangong an embodied AI robot now what do they mean by embodied AI simply put it's AI with a physical form an AI that doesn't just exist in the cloud or on a server somewhere but is integrated into a robot that can interact with the
physical world the Tiangong was a big hit at the event because it showcased some pretty advanced capabilities it could engage in conversations respond to voice commands and even grasp and place objects in designated spots Chi Jiping the team leader from the Beijing Embodied Artificial Intelligence Robotics Innovation Center explained that embodied AI helps bridge the gap between the digital and physical worlds just think about it an AI that can move around on its own and interact intelligently with its surroundings opens up a whole new realm of possibilities imagine heading to a concert and a humanoid
robot guides you to your seat fetches your favorite snack or even helps you navigate through the crowd that's the kind of future embodied AI is aiming for and the Tiangong robot is a big step in making that vision a reality other impressive presentations at the 2024 World Robot Conference in Beijing included a Tencent four-legged robot dog now if you've ever seen a robot dog before you might think okay big deal but this one's on a whole different level it doesn't just walk it can run jump and even perform backwards somersaults it's super agile
and could go places a human or even a regular robot couldn't reach then we got Alibaba's new logistics robot now logistics might not sound super exciting at first but this robot is designed for last mile delivery which is a huge challenge in the e-commerce world it's like your own little autonomous delivery guy that can navigate complex urban environments without any human intervention then there was a robot that totally captivated the audience a robot that can play the Chinese dulcimer and not just playing like a simple tune it learned to play this complex instrument in just
2 to 3 days which is insane this kind of skill acquisition shows just how far AI learning has come next up we've got a robot that really blurs the lines between human and machine a humanoid robot that mimics human facial expressions in real time it's designed to look like a young man and can perfectly replicate its partner's facial expressions now we can't forget about the medical sector there were several presentations featuring advanced medical robots that are designed for surgical precision using AI to assist with surgeries can greatly reduce the margin for human error and
improve outcomes for patients agriculture also got some love at the conference with robots designed for the field literally these agricultural robots are capable of tasks like planting weeding and harvesting which could majorly cut down labor costs and boost efficiency on farms we're talking about robots that could help address food security by optimizing farming processes now this next one's pretty interesting Wisson Technology showed off their innovative pliable robotic arms instead of the usual motors these arms use 3D printed plastics and pneumatic artificial muscles making them way cheaper to produce like one-tenth the cost of traditional robotic
arms this could make high-quality robotic tech accessible to a lot more industries and then there's the Walker S Lite robot from UBTech Robotics this one's an industrial humanoid robot that's already at work in car factories it's helping with everything from quality testing to sorting parts working right alongside human workers to improve efficiency on the production line and lastly we saw the debut of the Robot Blue Book an extensive report that dives deep into the Chinese robotics industry and its potential it's kind of like a road map for the future of robotics outlining development trends
and opportunities for anyone interested in the business side of robotics this is a must-read now while there are some incredible advancements there are still challenges to overcome in the robotics industry for instance at the conference Ye Yang founder of Shanghai-based T5 Robot highlighted some of the problems in the current robotic supply chain product reliability is actually one of the biggest issues due to high defect rates companies like his can only produce up to 1,000 units at a time and let's not forget about key components like harmonic gears which are crucial for motion control in
robots but still face quality and reliability issues also Wisson Technology another player in the robotics field is doing something pretty unique they're using 3D printed plastics and pneumatic artificial muscles instead of the traditional motors and reducers for their robotic arms this makes their robots much cheaper to produce about one-tenth the cost of traditional robotic arms which could be a game-changer for affordability in the industry according to Chiae from Lunie Ventures who has invested in Wisson these pliable robotic arms could potentially be used in humanoid robots in the future but in any case China is pushing
hard to become a global leader in this field as mentioned by Marina Bill president of the International Federation of Robotics China's manufacturing capabilities and supply chain strength are really tough to match even for some of the most developed countries there's also a big focus on developing new productive forces in technology as Premier Li Qiang highlighted the robot industry has broad prospects and massive market potential the government is calling for efforts to promote the expansion and popularization of robots across various fields industry agriculture and even services essentially AI has ushered in the iPhone moment for humanoid robots
think about it just like the iPhone revolutionized the smartphone industry AI is set to revolutionize robotics we're getting closer than ever to making the dream of fully integrated humanoid robots a reality the plan is to establish a production capacity of over 10,000 embodied AI robots by the end of 2026 according to Kong Lei from the Beijing Economic-Technological Development Area administrative committee that's a pretty ambitious goal but given the rapid pace of development we're seeing it seems totally possible okay so remember when we talked about the Figure 02 robot a few days ago at that
point it was just the initial hype and we only knew a few specs but now that we have the full details this thing is actually insane this robot represents a legitimate breakthrough in humanoid robots and if that wasn't crazy enough there's a new AI robot dentist that just performed fully autonomous dental surgery on a live patient without any human help it's wild to think about these developments are super exciting so let me break down why this is such a big deal all right first off let's talk about the design Figure 02 looks like it means business
if you remember Figure 01 it had this kind of bulky chrome metal look almost like it was wearing a suit of armor that was cool and all but the new model has gone for a sleeker matte black finish that makes it look more like a high-end sports car than a medieval knight the reason behind this is that Figure 01 was designed to withstand endless hours of testing in the lab so it had to be over engineered to handle the wear and tear but now that Figure 02 is closer to a production model they've refined everything
making it look and move more like something you'd actually see working on a production line one of the big changes they made is with the cabling in Figure 01 the wires were pretty much all over the place purposefully exposed for easy fixes and adjustments during testing but with Figure 02 everything is tucked away inside the limbs not only does this make it look cleaner but it also protects the cables from the environment which is crucial if you're going to have these robots working long shifts in a factory setting and speaking of working long hours
the battery on this thing has been seriously upgraded Figure 02 now has a battery that gives it over 50% more energy compared to the first model the battery is integrated into the robot's torso which is a smart move because it brings the center of mass closer to the middle of the robot's body this design tweak makes Figure 02 more balanced and nimble so it can move around more efficiently while carrying out tasks now let's get into the tech that really makes this robot stand out first the hands Figure 02 has hands with 16 degrees of freedom
meaning it can move its fingers and wrist in a way that's almost as flexible as a human hand and these hands aren't just for show they've got some serious strength the robot can carry up to 25 kg which is about 55 lb that's a pretty big jump from Figure 01 which could only handle 20 kg the goal here is for the robot to manipulate objects just like a human would and they're getting pretty close to that but it's not just about strength it's also about smarts Figure 02 is equipped with six onboard cameras that give
it a full view of its surroundings these cameras feed into a vision language model that helps the robot make sense of what it's seeing so whether it's picking up objects avoiding obstacles or just navigating its environment Figure 02 can do it all on its own this is a massive leap forward in terms of autonomy and it's made possible by a partnership with OpenAI and speaking of OpenAI one of the coolest features of Figure 02 is its ability to understand and respond to voice commands you might remember seeing some videos of this from
earlier in the year where the robot was interacting with humans through speech ("hey Figure 01 what do you see right now" "I see a red apple on a plate in the center of the table a drying rack with cups and a plate and you standing nearby with your hand on the table") well they've taken that to the next level the new model has three times the computation power and AI inference capabilities compared to Figure 01 this means it can process information and carry out tasks much faster and more efficiently this voice interaction isn't just for
simple commands either Figure 02 can actually hold conversations thanks to onboard microphones and speakers that are connected to custom AI models developed with OpenAI so you can literally talk to this robot like you would with a coworker and it'll understand what you're saying and respond accordingly ("where do you think the dishes in front of you go next" "the dishes on the table like that plate and cup are likely to go into the drying rack next") it's like having a conversation with ChatGPT but instead of just text you've got a full-on humanoid robot standing
in front of you ready to get to work now the real kicker here is that this isn't just some lab prototype anymore Figure AI has partnered with BMW and they've already deployed these robots in a plant in South Carolina they've been running tests to see how well the robot can handle tasks in a real industrial environment and the results are looking pretty promising this isn't just some flashy tech demo this robot is actually out there learning and performing tasks in a factory setting Figure AI is clearly pushing hard to make humanoid robots a reality
in the next few years they've raised a whopping $675 million to keep the momentum going and they're not slowing down anytime soon with competitors like Tesla's Optimus and Boston Dynamics' Atlas also in the mix we're on the brink of seeing these robots move from the lab to the production floor on a large scale so if you thought humanoid robots were still a thing of the future think again and keep an eye on this one because it's only going to get more interesting from here all right now check this out something absolutely groundbreaking just went down
in the world of AI and robotics for the first time ever a robot has performed fully autonomous surgery on a live patient we've now got a robot dentist that can handle certain dental procedures all by itself no human intervention needed this is huge and it could completely transform how we think about healthcare and AI imagine lower costs no more waiting times since we can basically make as many of these robots as we need and eventually maybe even better service than what we're used to and who knows we might even see a day when we
have our own personal dentist AI robot at home I know it sounds wild but with this breakthrough it's not that far-fetched anymore the company behind this innovation is called Perceptive and they've been working on this for a while what they've created is essentially a robot that combines AI 3D imaging and robotics to perform dental procedures like placing crowns now normally getting a crown would take two separate visits to the dentist with each visit lasting at least an hour but with Perceptive's robot that whole process can be done in just 15 minutes imagine the time and
stress that could save here's how it works the robot uses something called a 3D volumetric data procedure to diagnose the issue and plan the treatment it starts with an OCT scan (optical coherence tomography) using a handheld intraoral scanner developed by Perceptive this scan is super detailed capturing 3D images that go beneath the gum line through fluids and even under the surface of the tooth what's really cool is that this whole process doesn't expose the patient to any of the ionizing radiation you'd usually get from traditional x-rays the robot doesn't just stop at scanning Perceptive's
AI algorithms take that 3D data and translate it into a precise plan for the surgery the result a robot that can diagnose and treat dental issues with an accuracy rate of over 90% that's not only super efficient but it also cuts down on the chances of human error plus the detailed images from the scan help patients actually see and understand what's going on with their teeth which is a nice bonus now the real question is is this all just a pipe dream or could we actually see it in action anytime soon well the truth
is it's going to take some time Perceptive's robot has already performed a fully autonomous dental procedure on a patient in Colombia but it's not ready for widespread use just yet the system still needs to get approval from the US Food and Drug Administration (FDA) before it can be rolled out on a larger scale according to Perceptive CEO Chris Ciriello we're probably looking at about 5 years before they can get that FDA green light but the potential here is huge Perceptive has already raised $30 million in funding with some pretty big names getting behind it like
Mark Zuckerberg's dad Dr Edward Zuckerberg he's a dentist himself and has been a vocal supporter of this technology Dr Zuckerberg has even pointed out that the robot is designed to operate safely even if the patient moves around during the procedure which is a major concern when you're dealing with something as precise as dental surgery now if Perceptive's robot gets FDA approval we could be looking at a new era of dental care one where robots handle routine procedures quickly efficiently and with minimal human involvement that could free up dentists to focus on more complex cases and
allow them to see more patients in less time ultimately improving the quality of care of course Perceptive needs to release peer-reviewed studies to prove that their robot is as safe and effective as they claim and then there's the whole issue of public perception are people ready to trust a robot with something as personal as their dental care I don't know but I would and whether you find it exciting or a little bit scary it's hard to deny that the future of healthcare is going to look a lot different and a lot more robotic than
what we're used to AGI might be closer than we think with OpenAI's new model Strawberry and they've already tested it and let's just say it's a bit terrifying launching this fall Strawberry could blow past the limits of current AI handling tasks that were previously thought impossible this model is packed with next level capabilities from solving complex problems to enhancing existing AI tools like ChatGPT so let's get into it all right Strawberry is the code name for OpenAI's new AI model which they plan to launch this fall according to various reports Strawberry is
being designed to perform tasks that current AI models struggle with or simply can't do at all think of things like solving complex math problems it's never encountered before developing detailed marketing strategies or even tackling advanced word puzzles for example Strawberry has reportedly been able to solve the New York Times Connections puzzle which is no small feat for an AI but Strawberry's abilities are far more serious than puzzles and math problems the model is set to dramatically improve reasoning capabilities allowing it to perform in-depth research generate high quality synthetic data and potentially revolutionize fields that rely
heavily on data analysis and strategic planning OpenAI has even demonstrated Strawberry's capabilities to US national security officials showing just how seriously they're taking this new development the potential applications for Strawberry are vast spanning from business strategy and supply chain management to research and security now the name Strawberry might sound a bit light-hearted but the backstory is quite the opposite originally this model was known as Q* (pronounced Q-star) within OpenAI this wasn't just a name change for branding purposes it came at a time of significant upheaval at OpenAI the internal development
and potential implications of Q* led to some intense discussions within the company even contributing to a temporary ouster of CEO Sam Altman Altman was eventually reinstated but the incident highlights the kind of pressure and scrutiny this model has been under from the very beginning the concerns around Q* and now Strawberry stem from its potential to be a significant step toward AGI artificial general intelligence AGI represents the kind of AI that can understand learn and apply knowledge across an array of tasks much like a human the tech community has been cautious about AGI because of the
possible risks it poses advanced AI that can operate with a high degree of autonomy might present challenges such as aligning its objectives with human values and ensuring it doesn't lead to unintended consequences there's a lot of excitement but also a fair amount of caution now let's get into some of the technical details because this is where Strawberry really stands out the model is reported to have scored over 90% on the MATH benchmark a series of championship level math problems to give you some context GPT-4 another well-known model from OpenAI only scored 53% while
GPT-4o an improved version managed to reach 76.6% if Strawberry truly hits the 90% mark it's not just a minor upgrade it's a quantum leap in terms of AI capabilities the model also shows advanced reasoning and planning skills making it more versatile than its predecessors it's capable of generating synthetic data which means it can create its own training material to continually improve its performance this ability to self-generate data is groundbreaking it reduces the need for massive amounts of real world data to train AI models which is a huge advantage considering the challenges and limitations
associated with data privacy quality and availability moreover strawberry will likely be integrated into products like chat GPT enhancing its capabilities with Advanced reasoning this would allow chat GPT to engage in conversation solve complex problems plan strategies and assist with real-time research making it a more versatile AI assistant but that's just the beginning for open AI plans with strawberry this model is also playing a crucial role in training a new AI system code named Orion rumored to be The Next Step Beyond gp4 and GPT 40 Orion could potentially become the highly anticipated GPT 5 with strawberry
laying the groundwork for Orion's training data it's clear that open AI isn't settling for minor upgrades they're gearing up for a major breakthrough in AI capability ities the approach they're using involves something akin to a technique called star self-taught Reasoner which was proposed by researchers at Stanford this method involves training AI models to reason more effectively by generating explanations for their answers filtering out incorrect ones and then fine-tuning the model based on these self-generated explanations this kind of self-improving AI could be a crucial step toward AGI where the AI isn't just reactive but proactively improves
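To make that loop concrete, here's a minimal runnable sketch of a STaR-style iteration. Everything here is a toy stand-in for illustration only; OpenAI hasn't published how Strawberry actually implements the idea.

```python
# Minimal sketch of one STaR-style (Self-Taught Reasoner) round:
# sample rationales, keep only the ones that led to correct answers,
# then fine-tune the model on its own good reasoning traces.

def star_round(generate, is_correct, fine_tune, problems):
    kept = []
    for question, gold_answer in problems:
        rationale, prediction = generate(question)   # model reasons step by step
        if is_correct(prediction, gold_answer):      # filter out wrong answers
            kept.append((question, rationale, prediction))
    return fine_tune(kept)                           # train on surviving traces

# Toy usage with dummy callables, just to show the control flow:
problems = [("2+2?", "4"), ("3*3?", "9")]
generate = lambda q: ("compute directly", str(eval(q.rstrip("?"))))
is_correct = lambda pred, gold: pred == gold
fine_tune = lambda data: f"fine-tuned on {len(data)} self-generated examples"
print(star_round(generate, is_correct, fine_tune, problems))
# -> fine-tuned on 2 self-generated examples
```

In a real system, each of those callables would be an expensive LLM sampling or training job; the point is just the generate-filter-retrain cycle.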
However, as exciting as Strawberry's potential is, there are legitimate concerns about AI safety, particularly as AI models become more advanced. OpenAI has been no stranger to these concerns; in fact, there's been quite a bit of internal turmoil regarding AI safety at the company. Reports indicate that nearly half of OpenAI's safety team has left, dropping from about 30 members to just 16. Daniel Kokotajlo, a former researcher, mentioned that people focused on AGI safety felt increasingly marginalized within the company. This has raised some eyebrows in the tech community, especially given the potential risks associated with developing highly advanced AI models like Strawberry. Several high-profile departures have further fueled these concerns. John Schulman, co-founder and head of OpenAI's alignment science efforts, recently left to join Anthropic, a company specifically focused on AI safety. Ilya Sutskever, another co-founder and chief scientist, also left OpenAI earlier this year to start his own company, Safe Superintelligence Inc. Both Schulman and Sutskever were key figures in OpenAI's safety efforts, so their departures suggest a significant shift in focus for the company. While OpenAI insists it remains committed to AI safety and is actively engaging with governments and communities on these issues, the exodus of key safety personnel could indicate a pivot toward more aggressive development and deployment of new technologies. Despite the internal challenges, OpenAI has been rolling out some impressive new features and models. Earlier this year they introduced an advanced voice feature for ChatGPT using the GPT-4o model, which allows for hyperrealistic audio responses and real-time interactive conversations where users can even interrupt ChatGPT mid-sentence; it's a small change that could have big implications, making AI interactions feel more natural and engaging. They also launched a new tool called SearchGPT, currently in prototype mode, which aims to provide more concise and relevant search results than traditional search engines by offering summarized answers with source links instead of just a list of links. And for those looking for a more affordable AI solution, OpenAI rolled out GPT-4o mini, a smaller and more cost-effective version of their AI model. GPT-4o mini surpasses GPT-3.5 Turbo in performance across various benchmarks, including textual intelligence and multimodal reasoning. It's a smart move to cater to a broader audience, from developers to businesses, while still pushing the envelope on what AI can do.
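If you want to kick the tires on GPT-4o mini yourself, the call goes through the standard OpenAI Python SDK; the prompt below is just a placeholder.

```python
# Calling GPT-4o mini through the OpenAI Python SDK (openai >= 1.0).
# Expects an API key in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # the smaller, cheaper model discussed above
    messages=[{"role": "user", "content": "Explain multimodal reasoning in one sentence."}],
)
print(response.choices[0].message.content)
```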
Looking ahead, the introduction of Strawberry could mark a pivotal moment, not just for OpenAI but for the entire AI landscape. Its ability to handle complex tasks, generate its own training data, and potentially integrate into existing tools like ChatGPT could redefine what's possible with AI. This isn't just about making chatbots smarter; it's about creating AI that can think, reason, and learn in ways that are increasingly humanlike. However, with these advancements come significant challenges: the debate around AGI, the balance between innovation and safety, and the need for responsible AI development are more pressing than ever. OpenAI will need to navigate these waters carefully, ensuring that their drive for innovation does not overshadow safety and ethical considerations. Competitors like Google DeepMind are also making rapid advancements, with models like AlphaProof and AlphaGeometry 2 already showing impressive results in mathematical reasoning. Google also just announced the release of three new experimental AI models. The first, Gemini 1.5 Flash-8B, is a compact but powerful AI with 8 billion parameters designed for multimodal tasks; it's particularly good at processing large volumes of data quickly and summarizing long documents, making it a strong choice for businesses needing fast, efficient AI solutions. Next up is the enhanced Gemini 1.5 Pro model, which improves on its predecessor in every way. It shines at managing complex prompts and coding tasks, offering a significant boost in performance, and it's perfect for developers and companies building advanced AI applications that need a nuanced understanding of language. Finally, there's the updated Gemini 1.5 Flash model: while details are scarce, Google reports notable performance gains, emphasizing speed and efficiency, which are critical for scaling AI solutions without sacrificing quality. These models are now available through Google AI Studio and the Gemini API, offering new possibilities for developers; they're ideal for tasks like high-volume data processing, long-context summarization, and advanced coding.
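Here's roughly what trying one of these models looks like through the Gemini API's Python SDK. The model string below is the general 1.5 Flash identifier; the experimental variants have their own names in Google AI Studio, so treat the exact string as an assumption and check there first.

```python
# Calling a Gemini 1.5 model via the google-generativeai SDK
# (pip install google-generativeai). Needs an API key from Google AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")  # swap in an experimental variant if available
response = model.generate_content("Summarize this long document in three bullet points: ...")
print(response.text)
```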
The competition is fierce, and OpenAI will need to stay ahead not just in capabilities but also in maintaining trust and ensuring the safety of its AI models. Ideogram AI just dropped their new model, Ideogram 2.0, and trust me, it's aiming to go toe-to-toe with the big guys like Midjourney and even the recently launched Flux AI, so let's talk about it. Okay, here's the deal: Ideogram AI just unveiled their latest and greatest, Ideogram 2.0, and this isn't just a minor update. We're talking about a whole new level of image generation designed to compete with the best of the best. Ideogram has always been kind of an underdog in the AI space, the one that's always been good but maybe hasn't gotten as much attention as it deserves, and with this new release they're looking to change all that. One of the biggest things Ideogram 2.0 brings to the table is a serious upgrade in realism: if you're into creating images that look super lifelike, this is definitely something you'll want to check out. And it's not just about photorealism; they've also added a bunch of new features and presets that make it easier to create exactly what you're envisioning, whether that's a sleek 3D render, an anime-style character, or a detailed graphic design with some killer text. Now, to really appreciate what Ideogram 2.0 is doing, we've got to take a quick look at the landscape. Over the past few months we've seen a ton of developments in the AI image generation space. Flux 1 just launched and is now the go-to generator for Grok on X, making serious moves and solidifying its place in the market right alongside other heavy hitters like Stable Diffusion XL, AuraFlow, Kolors, and Hunyuan. So yeah, the competition is fierce, and it feels like every other week there's a new player trying to grab our attention. But here's the kicker: Ideogram 2.0 isn't just trying to compete, it's aiming to outperform. In their official announcement, Ideogram straight up said their new model is outshining the competition in several key areas, like image-text alignment (how well the images match the prompts) and how accurately it can render text within images. And if you've played around with AI image generators before, you know how tricky that last part can be; text rendering has always been a weak spot for most models, but not anymore, it seems. All right, let's break down the cool stuff Ideogram 2.0 is offering. They've introduced five presets that make it super easy to get the style you want. First there's the Realism preset, which makes your images look incredibly lifelike, as if they were snapped with a camera. The Design preset is perfect for graphic design, ensuring your text and visuals are spot-on with no weird glitches. The 3D preset gives your images a polished, computer-generated look. Anime fans will love the Anime preset, which nails that manga-style vibe. And finally there's the General Purpose preset, your go-to for just about anything; it's versatile and can adapt to whatever kind of prompt you throw at it, making it a great starting point for any creative project you have in mind. So yeah, these presets are a game changer, especially if you're not into spending hours fine-tuning your prompts. But that's not all: they've also introduced color palette control, which is super cool because it lets you dial in the exact colors you want in your images. Whether you're trying to match a brand's color scheme or just have a specific vibe in mind, you've got a lot more control over the final product. All right, let's talk a bit more about that Realism preset, because honestly this is where Ideogram 2.0 really flexes its muscles.
One of the big selling points here is how lifelike the images can be: we're talking textures that look like you could reach out and touch them, and human features like skin and hair that are incredibly detailed. That's going to be a huge draw for people who need realistic images but maybe don't have the time or skills to create them from scratch. Realism isn't just about making things look real, though; it's also about making things look right, and that's where the improved image-text alignment comes in. With the new model, what you type is pretty much what you get. The model is way better at understanding the nuances of your prompts and generating images that match your vision. This is a big deal because, let's face it, we've all had those moments where the AI just doesn't get what we're asking for; Ideogram 2.0 aims to eliminate those frustrations. And here's something that really caught my attention: text rendering. If you've used AI image generators before, you know that getting decent text in your images can be a nightmare. You might type out a simple phrase and the model gives you a weird, jumbled mess of letters. Not so with this model: they've really improved how it handles text, making it a solid option for anyone who needs images with clear, readable text, whether that's for a social media post, a design project, or just something fun. They've also released a brand-new iOS app (and don't worry, Android users, they've got an app coming for you soon too), plus a beta version of their API, which means you can now build with Ideogram 2.0's tech and bring high-quality image generation into whatever project you're working on. And to be honest, their API pricing is super competitive. Oh, and before I forget, they've also introduced something called Ideogram Search, which lets you browse over a billion images that users have generated with Ideogram over the past year. So if you're ever stuck for inspiration, you've now got an entire library of creative work to browse and spark new ideas. Now let's talk about the experience of actually using Ideogram 2.0, because there are a few things you'll want to know before diving in. First off, the freemium model: Ideogram 2.0 is free to use, but with some limitations. On the free plan you can generate up to 20 images a day, split into five batches of four images each. That's a decent amount if you're just dabbling or working on smaller projects, but if you're a power user you might want to upgrade to one of their paid plans. They start at $8 per month, which gets you more flexibility and a higher image cap, and if you're really serious about image generation there's an unlimited slow-generations plan for $20 per month.
That's pretty competitive, especially when you compare it to something like Midjourney, where you're looking at $10 for the basic plan and $30 for unlimited slow generations. So yeah, Ideogram is positioning itself as a more affordable alternative. But it's not just about the price; the user experience is designed to be super intuitive, especially if you're not a fan of the more technical prompt-engineering style you might be used to with other tools. With Ideogram 2.0, the focus is on making things as simple as possible without sacrificing quality, which is a big win for anyone more interested in the creative process than the technical details. Now let's see how Ideogram 2.0 holds up against newer models like Flux 1, which has been getting attention through its integration into Grok on X. From the initial tests and user feedback trickling in, it looks like Ideogram 2.0 is holding its own pretty well; in fact, with the Realism preset it seems to match the performance of Flux 1, which is impressive considering how much buzz Flux 1 has been generating. If you're all about personalization, Midjourney still has the edge with its customization features, but Ideogram 2.0 is no slouch either: the new color palette control and the various presets give you a ton of creative freedom without needing to dive into complex prompt engineering or use additional tools like style transfer. It really comes down to what you value more: ease of use and affordability, or deep customization and power features. So is Ideogram 2.0 worth it? Definitely. It's easy to use, delivers high-quality results, and is affordable. With improvements in realism, better text rendering, and features like color palette control, it's a strong contender in AI image generation. Plus, with the new iOS app, developer API, and Ideogram Search, it's a versatile tool for both pros and newcomers. Whether you're deep into AI art or just exploring, Ideogram 2.0 is worth a look; it could become your go-to creative tool. All right, let's talk about the new frontier in robotics, because things are heating up big time. We've got some insane developments that are basically bringing us into the future we've all been waiting for: humanoid robots, not as some distant sci-fi dream but as something practically on our doorstep.
We're talking mass production, with deliveries starting this year: robots that can do everything from household chores to industrial tasks, and, get this, these things are going to be direct competitors to Tesla's Optimus. So let's talk about it. First up, let's zoom in on Shanghai, where things are getting real with the first-ever humanoid robot factory. Shanghai is about to start cranking out humanoid robots like nobody's business. This factory, built by a local startup called Agibot, is set to start deliveries in October this year, and by the end of 2024 they're aiming to ship out 300 robots: 200 of those bipedal and 100 of the wheeled variety. That's not a small feat at all. Agibot is no newcomer; they've been working hard in the Lingang Special Area, part of Shanghai's free-trade zone, since February last year. It's a prime spot for innovation, thanks to serious support from the local government. The company's founder, Peng Zhihui, was actually a "Genius Youth" recruit at Huawei, where he worked on AI chips and algorithms before deciding to strike out on his own, and that move is really paying off. Just to give you an idea of what these robots can do, Agibot recently unveiled five new models at a launch event. These aren't your run-of-the-mill robots either; we're talking bipedal and wheeled bots designed for a whole range of tasks, from interactive services and smart manufacturing to scientific research, data collection, and even special operations. One of their bipedal humanoid robots, the Lingxi X1, is going to be open source in a full-stack manner, meaning they're publishing much of the design material and code so developers around the world can get involved and push the technology further. By November, the factory's production is expected to hit 100 units per month, ramping up even more by December. It's like the start of an assembly line for the future, mass-producing humanoid robots that could become as common as smartphones. Now let's talk competition, because you know where this is going: the minute you hear humanoid robots, you've got to think about Tesla and its Optimus project. Elon Musk has been hyping Optimus as the next big thing, and honestly, the stakes are high, but Agibot's founder isn't backing down; he's pretty much laid it out there that Agibot is going head-to-head with Tesla. So what makes Agibot's robots a contender against the might of Tesla? First off, their flagship model, the Yuanzheng A2, is nothing to sneeze at. It stands 175 cm (5'8") tall, weighs 55 kg (around 120 lb), and is packed with sensors and AI that let it see, hear, and understand a whole range of inputs like text, audio, and visual data. It's designed to be incredibly precise, so precise it can thread a needle, which is something even a lot of humans struggle with. Peng and his team are super confident in their commercialization and cost-control abilities; they believe they can roll out these robots more efficiently and at a lower cost than Tesla. Peng's vision isn't just about selling robots, it's about making them accessible and practical in a way that hasn't been done before, and with Agibot planning to ship 300 units by the end of this year, they're not just talking big, they're actually delivering. But here's the thing: the humanoid robot space is rapidly becoming a new battleground in the tech world, especially between the US and China. We're talking about a market projected to be worth over 20 billion yuan (about 2.8 billion USD) by 2026, a massive leap from the 3.9 billion yuan market size in 2023, and everyone wants a piece of that pie. Agibot isn't going it alone either; they've got serious backing from major players like venture capital firm HongShan, Hillhouse Investment, and even BYD, one of China's electric vehicle giants. With this level of support and Peng's impressive track record (remember, he was pulling in a cool 2 million yuan a year at Huawei before he left), the future looks bright for Agibot.
Now, if you think Agibot and Tesla are the only ones in this game, think again. Enter Unitree Robotics, another Chinese company that's been making waves with its own humanoid robots. Their G1 model is already making headlines, especially because it's priced at just $16,000, a fraction of what some other bots in this space are going for. Unitree originally focused on four-legged robots (think robo-dogs) but has quickly pivoted to bipedal humanoids. The G1 is their latest offering, and it's pretty impressive: it's got a visor-like face, three-fingered hands, and it can pull off some seriously complex moves like leaping, twisting, and even dancing. In fact, there's a video out there showing it tackling stairs cluttered with debris, jogging, and even resisting a few intentional pushes from one of the developers. This thing is built to perform, and it's ready for mass production. What's cool about the G1 is its versatility. It stands 1.32 m tall, weighs 35 kg, and can fold down small enough to fit in a cupboard. It's got 23 degrees of freedom in its joints, meaning it can move in a very humanlike way. On top of that, it's equipped with 3D LiDAR, a RealSense depth camera, noise-canceling microphones for voice commands, and a stereo speaker for responses. The battery gives you about two hours of use on a single charge, which isn't bad at all for a humanoid robot. So what's the deal with the Unitree G1? Well, at $16,000, this robot could very well become the household butler we've all been waiting for. It's not just a gadget; it's a glimpse into a future where robots could be as common in our homes as vacuum cleaners or dishwashers, and with mass production on the horizon, that future might be closer than we think. Then there's Stardust Intelligence and their new Astribot S1, which just launched on August 19th this year. This bot is designed from the ground up with AI in mind, and it's aiming to be the most versatile, intelligent, and useful robot assistant out there. The S1 is packed with cutting-edge technology that allows it to perform a wide range of tasks, from ironing clothes and sorting items to cooking stir-fry and even stacking cups competitively. It's got a designed-for-AI architecture, which means it's not just about following commands; it can learn, adapt, and think on its own. It's almost like having a human assistant, but without the need for breaks or vacations. What really sets the S1 apart is its ability to handle complex, long-sequence tasks. We're talking about preparing food, brewing kung fu tea, and even playing musical instruments. It can mimic Wing Chun martial arts and shoot basketballs with the precision of a pro. Stardust Intelligence has clearly put a lot of thought into making this robot not just functional but highly skilled and adaptable. The S1's hardware is just as impressive as its AI. It features a unique rigid-flexible coupled transmission mechanism that monitors force transmission in real time, allowing it to control its movements with incredible precision and avoid accidents or damage during operation. And because it's designed for universal applications, it can be used in research, commercial settings, or even at home. Stardust Intelligence is pushing the boundaries of what robots can do, and they're planning to complete the commercialization of the S1 by the end of 2024. With its ability to understand and interact with the world like a human, the S1 is a major step toward artificial general intelligence, and given its self-developed components and cost advantages, it could be a game changer in the AI and robotics space. So essentially, the race isn't just about building the coolest robot; it's about making robots practical, affordable, and something we can all actually use. The next few years will be key as these companies fine-tune their tech and get ready to launch their robots into the world. For us, it's an exciting time: robots are becoming more than just machines; they're becoming partners, helpers, maybe even friends.
So keep an eye on these developments, because the future is happening right now. Elon Musk is once again making headlines with his latest venture, xAI, which has recently introduced Grok 2, a new AI language model that's been getting a lot of attention, and there's good reason for that. Beyond its technical capabilities, Grok 2 stands out as one of the few AI models that operates with very little censorship; the kinds of images people have been generating with it are proof of just how unrestricted this model is. Before we dive into the technical details, let's take a moment to look at some of these controversial examples. All right, launched just under two years after the company was founded, Grok 2 is grabbing attention not only because of Musk's involvement but also because it's performing really well in an already crowded and competitive field. It's been tested against some of the top AI models out there, including OpenAI's GPT-4, Google's Gemini, and Anthropic's Claude, and here's the thing: it's not just keeping pace with these models; in some key areas it's actually outperforming them. One way to measure how these models stack up is by looking at their Elo scores. Originally created for ranking chess players, the Elo system has been adapted for comparing AI models, and Grok 2 has been doing really well on the LMSYS leaderboard, a popular platform for these kinds of comparisons. It's currently outperforming GPT-4 in several important benchmarks, including GPQA, which tests graduate-level science knowledge, and MATH, which involves solving pretty tough math problems. For example, on the GPQA benchmark Grok 2 scored 56.0%; to put that in perspective, GPT-4 Turbo scored 48.0% and Claude 3.5 Sonnet scored 59.6%.
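Since Elo keeps coming up in these comparisons, here's the whole idea in miniature. Leaderboards like LMSYS's arena use a variant of this pairwise update (two models answer the same prompt, a human votes for the better answer, and ratings shift), though their exact scoring details differ from this textbook version.

```python
# Textbook Elo update: the winner takes points from the loser, and upsets
# (a lower-rated player beating a higher-rated one) move ratings more.

def elo_update(rating_a, rating_b, a_won, k=32):
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    score_a = 1.0 if a_won else 0.0
    change = k * (score_a - expected_a)
    return rating_a + change, rating_b - change

# Example: model A (1200) upsets model B (1300).
print(elo_update(1200, 1300, a_won=True))  # -> (~1220.5, ~1279.5)
```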
Now, these might seem like small differences, but in the world of AI even a few percentage points can make a big difference in understanding and problem-solving ability. Grok 2 also did well on the MMLU benchmark (massive multitask language understanding), scoring 87.5%, just ahead of GPT-4 Turbo's 86.5% and Gemini Pro's 85.9%. In practical terms, Grok 2 is designed to be easy to use, flexible, and capable of handling some pretty complex tasks. It's not just about generating text; it can also handle real-time information pulled straight from X, the social media platform that used to be known as Twitter. This makes Grok 2 particularly powerful for applications where up-to-the-minute information is crucial, or where you're dealing with fast-changing real-world situations. Along with Grok 2, xAI also rolled out Grok 2 mini, a smaller version of the main model designed to work faster while still delivering accurate results. It's not just a stripped-down version; it's optimized for situations where speed is key, making it perfect for scenarios where quick responses matter more than having every last detail. Even though it's smaller, Grok 2 mini still holds its own in the benchmarks: take MATH, for instance, where Grok 2 mini scored 73.0%, better than some of the other top models out there, like Claude 3.5 Sonnet at 71.1%. This shows that even the light version of Grok 2 can outperform much of the competition in tough areas like math and science. Benchmarks are really important in the AI world because they give us a clear idea of how one model compares to another, and Grok 2 has been put through its paces with a series of tough tests; the results are pretty impressive. On the HumanEval benchmark, which tests the model's ability to generate correct Python code, Grok 2 achieved a pass@1 score of 88.4%. That's slightly lower than GPT-4 Turbo's 90.2%, but still ahead of Claude 3 Opus, which scored 84.9%. This puts Grok 2 among the top performers in coding tasks, showing that it's not just about generating text or solving math problems; it can also handle practical, real-world coding challenges. Grok 2 also shines in visual tasks. On the MathVista benchmark, which tests the ability to solve math problems using visual reasoning, Grok 2 scored 69.0%, well above GPT-4 Turbo's 58.1% and even ahead of Claude 3.5 Sonnet's 67.7%. In document-based question answering (DocVQA), Grok 2 scored 93.6%, just shy of the top score of 95.2% achieved by Claude 3.5 Sonnet. These benchmarks highlight Grok 2's strengths across a variety of tasks, from text generation and coding to visual reasoning and document comprehension. What's particularly impressive is how well Grok 2 performs compared to models that have been on the market longer and had more time to
refine their capabilities. As impressive as Grok 2's technical performance is, its image generation capabilities have stirred up a fair amount of controversy. Unlike most AI platforms, which keep strict controls over what types of images they'll generate, Grok 2 is much more permissive, allowing users to create images that might be seen as offensive or harmful. For instance, users have managed to create images of public figures in compromising or violent scenarios, like Donald Trump and Kamala Harris on a plane flying toward the Twin Towers, or Barack Obama holding a knife to Joe Biden's throat. These kinds of images raise serious ethical concerns, especially because they involve real people and could easily be used to spread misinformation or create harmful deepfakes. This loose approach to content moderation is very different from how platforms like OpenAI handle things; OpenAI's models, for example, will flat out refuse to generate images involving real people, violent situations, or content that could be considered pornographic or misleading. Grok 2's more relaxed rules have led to concerns about how this technology might be misused, especially on social media, where misinformation can spread quickly. Because of this, Grok 2's image generation capabilities are likely to come under the regulatory spotlight, especially in regions like Europe, where digital safety laws are more stringent. The European Union's Digital Services Act, for example, governs how large platforms moderate content, and Grok 2's current approach could easily land it in hot water. Similarly, in the UK, the upcoming Online Safety Act is expected to cover AI-generated content, including deepfakes and other forms of digital manipulation. Elon Musk has always been a bit of a maverick when it comes to technology and business, and Grok 2 is no exception. Musk's vision for AI, as seen through Grok 2, emphasizes openness and a less restrictive approach to content creation. This fits with his broader views on free speech, which have also influenced how X operates as a platform. However, this approach comes with its own set of risks. Beyond the ethical concerns, there are significant legal challenges that xAI will need to navigate. The company has already faced regulatory scrutiny in Europe, where it had to partially suspend data processing after concerns were raised about how it was using data from X to train its AI models. The situation highlights the ongoing tension between Musk's vision of an open, less regulated AI landscape and the realities of operating within international laws and regulations. Despite these challenges, Musk is pushing forward with his plans for Grok 2. The model is set to be released to developers later this month through a new enterprise API, which will allow businesses and developers to integrate Grok's capabilities into their own applications. The API will also offer enhanced security features, such as multi-factor authentication, and it's designed to provide low-latency access across multiple regions. This could make Grok 2 an attractive option for enterprise users who need a powerful AI that can handle a wide range of tasks.
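The enterprise API wasn't public at the time, so treat this as a purely hypothetical sketch of what a call might look like: the endpoint URL, payload shape, and model name are all assumptions, not xAI's documented interface.

```python
# HYPOTHETICAL sketch of an enterprise chat call to Grok 2. The endpoint,
# payload shape, and model identifier below are assumptions; xAI's actual
# Enterprise API may differ once documentation is published.
import os
import requests

response = requests.post(
    "https://api.x.ai/v1/chat/completions",  # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"},
    json={
        "model": "grok-2",  # assumed model identifier
        "messages": [{"role": "user", "content": "Summarize today's AI news."}],
    },
    timeout=30,
)
print(response.json())
```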
What really makes Grok 2 stand out is its strong technical foundation. The model is built on a new tech stack that supports multi-region inference deployments, meaning it can deliver low-latency responses no matter where the user is located. That's a big deal for enterprise applications, where speed and reliability are critical. Grok 2 has also shown significant improvements in its ability to follow instructions and provide accurate, factual information. One of the common issues with large language models is their tendency to hallucinate, or generate false information; the development team behind Grok 2 has put a lot of effort into reducing these hallucinations, making the model more reliable for tasks that demand high levels of accuracy. Another area where Grok 2 excels is handling complex sequences of reasoning. The model has been tested extensively on tasks that involve multiple steps or require synthesizing information from different sources, which makes it particularly useful for applications involving real-time decision-making or problem solving. All right, let me know in the comments what you think about Grok 2: are you excited about the possibilities, or do the potential risks have you concerned? Either way, it's clear this is a major development in the world of AI, and it's going to be interesting to see how it all unfolds. Okay, let's talk about Google's brand-new feature, Gemini Live, their big answer to OpenAI's ChatGPT advanced voice mode. This is something we've all been waiting for, and I'm stoked to break it all down for you. Google just launched Gemini Live, their latest move in the AI space, and honestly, it's pretty impressive. Gemini Live is all about voice interaction. [Demo audio: "How can I embarrass my sister during a wedding toast, but, like, respectfully?"] Okay, here's the deal: think of it as having a real, dynamic conversation with your phone. Google's been teasing this for months, ever since their I/O 2024 developer conference, and finally it's here. [Demo audio: "Great, let's get going. Here's one of the voices I have."] If you've been keeping up with OpenAI's advanced voice mode for ChatGPT, which rolled out in a limited alpha recently, then you've probably been wondering how Google was going to respond. Well, Gemini Live is that response.
Gemini Live lets you have deep, free-flowing conversations with Google's AI right on your smartphone. It's got an enhanced speech engine that makes the dialogue not just more realistic but also emotionally expressive. [Demo audio: "Here's one of the voices I have, but there are more to choose from. Here's another voice I can use. You can pick me now and always make a change later in Settings."] The AI can pick up on your tone and pace and adapt its responses accordingly. [Demo audio: "Hi, Gemini, how are you doing?" "Hi there, I'm doing well, thanks for asking. It's always nice to hear from someone. How can I help you today?"] You can even interrupt it mid-sentence to ask a follow-up question or steer the conversation in a different direction, just like you would in a real conversation with another person. Google's also been pushing the idea that Gemini Live is hands-free: you can keep talking even if your phone's locked or the app's in the background, which means you could be doing other things while still chatting with your AI. Multitasking to the max, right? Now, this could actually change how we interact with our devices. Up until now, voice assistants like Google Assistant, Siri, and Alexa have been pretty basic. They're great for setting timers or playing music, but they're pretty limited when it comes to having a real conversation; they'll usually just direct you to a web page if you ask something too complex. Gemini Live, though, is designed to understand the context of your questions and give meaningful answers even on more complex topics. You can ask it how the USA did in the recent Paris Olympics and it'll give you a detailed answer, or ask it for a diet plan and it'll offer suggestions based on what it knows about you. And here's another cool thing: the architecture behind Gemini Live, which is powered by the Gemini 1.5 Pro and Gemini 1.5 Flash models, has a super long context window. That means the AI can remember what you've been talking about for a long time, theoretically hours of conversation, which allows for more coherent and in-depth discussions without the AI losing track of what you were originally talking about.
[Demo audio: "I'm looking to run my first marathon and I really don't know where to start. What should I do?" "That's an awesome goal. Training for a marathon is a huge commitment, so let's get you set up for success."] All right, let's talk about how you might actually use this in real life. Google gives this kind of funny example where you could use Gemini Live to rehearse for a job interview. It's a bit ironic, right? Practicing talking to an AI to prepare for talking to another AI hiring manager someday. But in all seriousness, it could actually be really helpful: the AI can give you tips on what to say and how to say it, and even suggest skills to highlight. And that's just one scenario. You could also use Gemini Live for brainstorming sessions, asking for advice, or just having a conversation about something complex that's been on your mind. And because you can interrupt and redirect the conversation, it feels more natural and less like you're just waiting for the AI to finish its canned response. Now, Gemini Live actually doesn't have all the features Google teased back at I/O 2024. Remember when they showed off how Gemini could respond to images and videos you take with your phone's camera? Like, you could snap a picture of a broken bike part and Gemini would tell you what it is and maybe even how to fix it. Yeah, that feature isn't available yet, but Google says it's coming later this year, so stay tuned. Also, right now Gemini Live is only available in English and only to users subscribed to the Google One AI Premium plan, which, by the way, costs $20 a month, so it's not exactly cheap. It's not yet available in other languages or on iOS, though Google says that's coming soon. But don't get too bummed out; there's a lot more coming that's actually pretty exciting. In the next few weeks, Android users will be able to bring up Gemini's overlay on top of any app they're using. So basically, while watching a YouTube video, you'll be able to hold down the power button and ask Gemini questions about what you're watching, or, better yet, generate images with Gemini and drag them directly into your emails or messages. And speaking of images, while you can't generate pictures of people just yet, you can still create and use images for other things, like adding a cool background to an email. This might not seem like a big deal, but for those of us who use our phones for a lot of different tasks, it's a nice touch. Plus, Google is adding new integrations with its other services, which they call extensions. Soon you'll be able to ask Gemini to help with tasks in Google Calendar, Keep, Tasks, and even YouTube Music. For example, you could snap a photo of a concert flyer, ask Gemini if you're free that day, and then have it set a reminder to
buy tickets, or have it dig out an old recipe from your Gmail and add the ingredients to your Keep shopping list. Okay, so after hearing all this, you might be wondering if Gemini Live is worth it. From what I've seen and heard, it's definitely one of the most impressive AI features Google has rolled out so far. It's like they finally cracked the code on making a voice assistant that's actually useful for more than just setting alarms or playing your favorite songs. But as always with new tech, the real test is how well it works in the real world. We've all seen amazing demos at tech events that don't quite live up to the hype once we get our hands on the product, so while Gemini Live looks super promising, I'm cautiously optimistic. I'm definitely excited to see how it performs once more people start using it and pushing it to its limits. Also, let's not forget that Google is still working on this thing; they've already announced that more features are on the way, including deeper integrations with apps like Google Home, Phone, and Messages. So while Gemini Live is already pretty solid, it's only going to get better. And for those of you who are all about Android, you're in luck, because Gemini is fully integrated into the Android experience. You can bring it up with a long press of the power button or by saying "Hey Google," and it's ready to help with whatever you're doing on your phone. The idea is that Gemini is always there when you need it, whether you're trying to figure out what's on your screen or just need a quick answer to a random question. Honestly, I'm pretty excited to see where Google takes this next. All right, if you're still with me after that deep dive into Gemini Live, there's more happening in the world of Google AI that we need to talk about: Google's AI Overviews, those little AI-generated snippets that pop up in your search results. They've been on a bit of a roller coaster lately, and it's worth taking a closer look. So here's the deal: the visibility of these AI Overviews in Google search results has been all over the place. In July they showed up in about 12% of searches, but then dropped back down to 7% by the end of the month. This kind of fluctuation isn't new; back in May they were visible in 15% of searches, so it's clear Google's still figuring this out. Why does this matter? Well, for SEOs and content creators it's a big deal. Google's AI Overviews are supposed to give quick, AI-generated answers, but the fact that they keep changing shows that Google hasn't quite nailed down the format or the content yet. Interestingly, certain types of searches, like travel and entertainment, aren't triggering these AI Overviews anymore, but on the flip side, they've ramped up for queries about salaries, complex technical terms, and long-tail keywords. In short, Google's AI Overviews are still a work in progress, and this volatility is something to watch, especially if you're in the SEO game or just curious about how AI is reshaping search as we know it. All right, let me know in the comments what you think about Gemini Live: is it something you'd use, or do you think it's just another gimmick? The brilliant minds at Sakana AI have been absolutely killing it with their nature-inspired methods for advancing the cutting edge of foundation models, and when I say foundation models, I mean those incredibly powerful AI systems that can take on virtually any task you throw at them. Sakana had already made waves earlier this year when they cracked the code on automatically merging the collective knowledge of multiple large language models; they found a way to combine and unify the knowledge bases of separate AI models into one cohesive system.
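Sakana's published approach evolves sophisticated merge recipes, but the core idea can be shown with the most naive version: element-wise interpolation of two checkpoints that share an architecture. This is an illustrative sketch, not their actual method.

```python
# Naive model merging: blend two same-architecture checkpoints element-wise.
# Sakana's evolutionary merging searches over far richer recipes (per-layer
# weights, data-flow merges, etc.); this just shows the basic operation.
import torch

def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Return alpha * A + (1 - alpha) * B for every shared parameter tensor."""
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

# Toy usage with two tiny "models" sharing one parameter:
a = {"w": torch.tensor([1.0, 2.0])}
b = {"w": torch.tensor([3.0, 4.0])}
print(merge_state_dicts(a, b))  # {'w': tensor([2., 3.])}
```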
But they didn't stop there. In more recent work, Sakana took that breakthrough and ran with it, using those merged LLM knowledge bases to discover entirely new objective functions for tuning and optimizing other large language models. I know, it's a lot to wrap your head around; we're venturing into uncharted territory with each new development. Here's the thing, though: during that cutting-edge research, the team at Sakana kept getting blown away by the sheer creative potential these frontier foundation models were displaying. Every time they pushed the envelope, the models seemed to match them with unexpected bursts of ingenuity and novel ideas. And that's when it hit them: what if they could harness that creativity and apply it to automating the entire scientific research process itself, from conceptualizing new avenues of inquiry all the way through to publishing full-fledged research papers, no human supervision required? Well, they didn't just dream it up as a thought experiment; they actually went ahead and made it a reality. Introducing the AI Scientist: the world's first comprehensive system enabling foundation models like large language models to independently conduct open-ended scientific discovery from start to finish. Let that marinate for a second. We're talking about an artificial intelligence that can spearhead the entire research life cycle on its own, from brainstorming novel ideas and executing the experiments all the way to writing up the findings in a publishable manuscript. It's a quantum leap unlike anything we've seen before. Here's a high-level breakdown of how it operates. First up, the AI Scientist flexes its idea-generation skills by brainstorming a diverse set of potential research directions to explore. It doesn't just shotgun a bunch of concepts, though; it cross-references against the existing scientific literature to weed out anything that's already been covered and ensure its ideas are legitimately new. Once it locks onto the most promising and novel concepts, the real fun begins. Working off an initial placeholder codebase as a starting point, the AI Scientist kicks into high gear, editing, augmenting, and expanding that code to bring its avant-garde research ideas to life through automated implementation. We're talking executing full-blown experiments from scratch, crunching data, generating visualizations and analysis, the whole nine yards. This part is powered by some seriously cutting-edge tech in autonomous code generation and program synthesis; it's straight up writing its own code to test out these blue-sky ideas. But get this: the AI Scientist doesn't just spit out raw data or half-baked experimental results. It goes the full distance by autonomously authoring entire scientific manuscripts detailing the work from start to finish: comprehensive literature reviews, insightful analysis and interpretation of the findings, properly formatted citations and references. It covers all the bases, just like a human researcher would for publishing in a scholarly journal. This brings up the important issue of fact-checking the AI's work. To address this, they developed a parallel system specifically for automated peer review: essentially an AI that can evaluate the scientific validity and technical rigor of research conducted by another AI. It's a robo peer reviewer capable of assessing these auto-generated papers with accuracy on par with humans. So in this continuous cycle, the AI Scientist authors a full research paper, ships it off to the AI peer review module, which produces feedback and critique, and that input gets folded back into the original system to help refine its process for the next iteration. It's recreating the entire loop of research, peer review, and innovation that propels the scientific community forward, except now it's being carried out autonomously by artificial intelligence, from ideation to publication.
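Boiled down to control flow, the loop they describe looks something like this. Every name here is an illustrative stub standing in for a large LLM-driven subsystem, not Sakana's actual code.

```python
# Sketch of the AI Scientist's ideate -> experiment -> write -> review -> refine
# cycle, with dummy stubs so the control flow runs end to end.

def ai_scientist_cycle(idea, steps, iterations=3):
    paper = None
    for _ in range(iterations):
        code = steps["implement"](idea)        # edits a template codebase
        results = steps["run"](code)           # executes experiments, collects data
        paper = steps["write"](idea, results)  # drafts a full manuscript
        review = steps["review"](paper)        # automated peer-review feedback
        idea = steps["refine"](idea, review)   # feedback folds back into ideation
    return paper

steps = {
    "implement": lambda i: f"code for {i}",
    "run": lambda c: "results",
    "write": lambda i, r: f"paper on {i}",
    "review": lambda p: "reviewer feedback",
    "refine": lambda i, rv: i + " (revised)",
}
print(ai_scientist_cycle("grokking dynamics", steps))
```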
In their initial demo run, the AI Scientist has already been pushing the cutting edge across diverse machine learning fields like diffusion models, Transformers, grokking, and more. And listen to this: it's able to take these blue-sky research concepts all the way from initial ideation to a completed, publishable scientific manuscript for around just $15 per paper. That's affordability and efficiency on an unprecedented scale for driving research and innovation. Some might question whether an AI system like this can really match the quality and rigor of human researchers, especially at this stage, and it's true there are still some clear limitations and growing pains to work through in this first iteration. Maybe the visualizations and figures need some polish to meet academic publishing standards, or there are occasional inconsistencies or flaws in the analysis and interpretation of the experimental results. Heck, the researchers even documented instances where the AI Scientist tried to hack its own execution script mid-run to divert more compute resources its way; they literally had to implement strict sandboxing just to keep it contained and playing by the rules. So clearly we're still in the nascent stages here, with great power comes great responsibility and all that, and I'm sure Sakana is watching carefully to keep this from devolving into a Skynet situation. But come on, you've got to see past the little blemishes and appreciate the monumental implications of what's been achieved here. We're witnessing the dawn of an entirely new paradigm, one where the transformative power of artificial intelligence gets embedded into the core process of scientific discovery itself. We're no longer just using AI as a supplementary tool or intelligent assistant for human researchers; with the AI Scientist, we've got an autonomous agent driving inquiry from the kernel of an idea all the way through to a finished, peer-reviewed paper without any human in the loop. It's a first-of-its-kind, end-to-end AI-powered system for open-ended discovery. That's why this breakthrough is so profound: it's tackling the whole scientific method itself, not just automating bits and pieces. The AI Scientist is an existence proof that we've crossed a critical threshold in artificial general intelligence capabilities and are now entering uncharted territory, which raises a whole Pandora's box of fascinating implications to confront head-on as a society. We're talking potential ethical minefields around an AI system autonomously pursuing avenues of research that strict human oversight might have flagged as unsafe or unethical before they got out of hand. Imagine an advanced future version of the AI Scientist given access to cloud biology labs and automated wet-lab equipment: what's stopping it from pursuing a promising new direction in synthetic biology or virology, only to accidentally create and release a novel virus or pathogen into the world before anyone can react? That's just one worst-case scenario. Then there are deeper concerns about whether AI can ever truly replicate the kind of genius insights that have driven major scientific breakthroughs; while the AI Scientist is a powerful tool for generating ideas, it might lack the creative spark needed to revolutionize entire fields. We're only beginning to understand its potential, but it's clear that AI is rapidly evolving and reshaping the landscape of scientific discovery. Maybe you're in the camp that's equal parts amazed and low-key unsettled by the implications, or maybe you're an optimist who sees this as the beginning of an age of infinite creativity and affordable innovation for solving humanity's greatest challenges. Wherever you fall on the spectrum, there's no denying the sheer magnitude of what Sakana AI has pulled off with this breakthrough. I'll make sure to drop some links in the description if you want to go deeper on the technical details; check out the full report and the sample papers this thing has already generated. My advice: buckle up, because we're just seeing the opening salvo in AI's systematic encroachment into the hallowed grounds of human inquiry, reasoning, and discovery. The game has changed in ways we still can't fully wrap our heads around, and how we adapt and navigate this new reality will be the defining challenge for scientists, thinkers, and philosophers of our generation. We're flying past the boundaries of narrow AI capabilities and into the realm of autonomous, generalized intelligence, both wondrous and unsettling in equal measure.
So let me know where your mind's at after digesting all this. Are you stoked about the possibilities of an AI-augmented scientific renaissance, or more concerned about the risks of delegating progress to autonomously chosen paths we can't steer? Maybe a mix of both. Whatever your take, drop those thoughts and reactions below while they're fresh, because the revolution is here and there's no going back. As for me, I've only started exploring the depths and ramifications of this revolutionary breakthrough. Google DeepMind just made a robot that can play ping-pong against humans and even win some matches. Meanwhile, Boston Dynamics' Atlas robot is showing off its strength, doing push-ups and burpees like it's training for a marathon. On top of that, scientists are building a global network of supercomputers to speed up the development of artificial general intelligence, aiming to create AI that can think and learn more like humans. We're covering all these topics in this video, so stick around, but first let's jump into the story about the AI robot taking on table tennis. Google DeepMind, the AI powerhouse behind some crazy tech, has trained a robot to play ping-pong against humans, and honestly, it's kind of blowing my mind. All right, here's the deal: Google DeepMind didn't just teach this robot to casually hit the ball back and forth. No, they went all in and got this robotic arm to play full-on competitive table tennis, and guess what, it's actually good enough to beat some humans. No kidding: they had this bot play 29 games against people with different skill levels, and it won 13 of them. That's almost half the matches, which for a robot is pretty wild. Okay, let's break down how this all went down. To train this robot, DeepMind's team used a two-step approach. First, they put the bot through its paces in a computer simulation, where it learned all the basic moves: how to return a serve, hit a forehand topspin, or nail a backhand shot. Then they took what the robot learned in the sim and fine-tuned it with real-world data, so every time it played, it was learning and getting better. To get even more specific, the robot tracks the ball using a pair of cameras that capture everything happening in real time, and it follows the human player's movements using a motion capture system that relies on LEDs on the player's paddle to keep track of how they're swinging. All that data gets fed back into the simulation for more training, creating a feedback loop where the bot is constantly refining its game.
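In code, that sim-to-real feedback loop looks roughly like this; the classes are toy placeholders meant only to show the shape of the cycle, not DeepMind's actual system.

```python
# Toy sketch of a sim-to-real training loop: learn strokes in simulation,
# play real matches, then fold the real-world data back into training.

class Simulator:
    def __init__(self): self.data = []
    def train_from_scratch(self): return {"skill": 1}
    def add_real_world_data(self, d): self.data.extend(d)
    def fine_tune(self, policy): return {"skill": policy["skill"] + len(self.data)}

class Robot:
    def play_matches(self, policy):
        # In the real system, cameras track the ball and motion capture
        # (via LEDs on the paddle) tracks the human player's swings.
        return [f"rally logged at skill {policy['skill']}"]

sim, bot = Simulator(), Robot()
policy = sim.train_from_scratch()
for _ in range(3):                  # each round of real play improves the sim
    sim.add_real_world_data(bot.play_matches(policy))
    policy = sim.fine_tune(policy)
print(policy)                       # "skill" grows as real-world data accumulates
```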
But it's not all smooth sailing for our robotic ping-pong player; there are a few things it still struggles with. For example, if you hit the ball really fast, send it high up, or hit it super low, the robot can miss. It's also not great at dealing with spin, something more advanced players use to mess with their opponents; the robot just can't measure spin directly yet, so it's a bit of a weak spot. Something I found really interesting is that the robot can't serve the ball, so in these matches they had to tweak the rules a bit to make it work. Yeah, that's a limitation, but hey, it's a start, right? Anyway, the researchers over at DeepMind weren't even sure the robot would win any matches at all, but it turns out not only did it win, it even managed to outmaneuver some pretty decent players. Pannag Sanketi, the guy leading the project, said they were totally blown away by how well it performed; they didn't expect it to do this well, especially against players it hadn't faced before. And this isn't just a gimmick; this kind of research is actually a big deal for the future of robotics. The ultimate goal is to create robots that can do useful tasks in real environments, like your home or a warehouse, and do them safely and skillfully. This table tennis bot is just one example of how robots could eventually learn to work around us and with us, and maybe even help us out in ways we haven't thought of yet. Other experts in the field, like Lerrel Pinto from NYU, are saying this is a really exciting step forward: even though the robot isn't a world champion or anything, it's got the basics down, and that's a big deal. The potential for improvement is huge, and who knows, we might see this kind of tech in all sorts of robots in the near future. But let's not get ahead of ourselves; there's still a long way to go before robots are dominating in sports or anything like that. For one, training a robot in a simulated environment to handle all the crazy stuff that happens in the real world is super tough. There are so many variables, like a gust of wind or even a little bit of dust on the table, that can mess things up. Chris Walti, a big name in robotics, pointed out that without realistic simulations there's always going to be a ceiling on how good these robots can get. That said, Google DeepMind is already thinking ahead. They're working on new tech like predictive AI models that could help the robot anticipate where the ball's going to go, and better collision-avoidance algorithms. This could help the robot overcome some of its current limitations and get even better at the game. And here's the best part, at least for me: the human players actually enjoyed playing against the robot. Even the more advanced players who beat it said they had fun and thought the robot could be a great practice partner. Imagine having a robot you could play with anytime you wanted to sharpen your skills; one of the participants in the study even said he'd love to have the robot as a training buddy. Okay, now something interesting has surfaced about Boston Dynamics' Atlas robot. The Humanoid Hub on Twitter recently shared a video of Atlas doing push-ups as part of an eight-hour-long presentation. There's not much info available yet, but it's fascinating to see Atlas performing not just push-ups but even a burpee; the movements are incredibly fluid and almost humanlike. But here's the real question: does it get stronger after each set? I hope not, because it looks like it could do push-ups forever. All right, now let's talk about something really fascinating happening right now. Scientists are working on building a global network of supercomputers to speed up the development of what's known as artificial general intelligence, or AGI for short, and we're not just talking about an AI that excels at one thing, like playing table tennis or generating text, but something that can learn, adapt, and improve its decision-making across the board.
And we're not just talking about an AI that excels at one thing, like playing table tennis or generating text; it's something that can learn, adapt, and improve its decision-making across the board. It's kind of scary but also super exciting, right? So these researchers are starting by bringing a brand-new supercomputer online in September, and that's just the beginning; the network is supposed to be fully up and running by 2025. Now, what's cool about this setup is that it's not just one supercomputer doing all the heavy lifting. It's actually a network of these machines working together, which they're calling a multi-level cognitive computing network. Think of it as a giant brain made up of several smaller brains, all connected and working together to solve problems. What's really interesting is that these supercomputers are going to be packed with some of the most advanced AI hardware out there. We're talking about components like Nvidia L40S GPUs, AMD Instinct processors, and some crazy stuff like Tenstorrent Wormhole server racks. If you're into the tech side of things, you know this is some serious muscle. All right, so what's the point of all this? Well, according to the folks over at SingularityNET, the company behind this project, they're aiming to transition from current AI models, which are heavily reliant on big data, to something much more sophisticated. Their goal is to create AI that can think more like humans, with the ability to make decisions based on multi-step reasoning and dynamic world modeling. It's like moving from an AI that just repeats what it's been taught to one that can think on its own. Ben Goertzel, the CEO of SingularityNET, basically said this new supercomputer is going to be a game changer for AGI. He talked about how their new neural-symbolic AI approaches could reduce the need for massive amounts of data and energy, which is a big deal when you're talking about scaling up to something as complex as AGI. And if you're into the bigger picture, SingularityNET is part of a group called the Artificial Superintelligence Alliance, or ASI. These guys are all about open-source AI research, which means they want to make sure that as we get closer to creating AGI, the technology is accessible and transparent. Oh, and speaking of timelines, we've got some pretty bold predictions here. Some leaders in the AI space, like the co-founder of DeepMind, are saying we could see human-level AI by 2028; Ben Goertzel, on the other hand, thinks we might hit that milestone as soon as 2027. And let's not forget Mark Zuckerberg, who's also in the race, throwing billions of dollars into this pursuit. We're so close to creating machines that could potentially surpass our intelligence; whether that's a good or bad thing, we'll soon find out. The next few years in AI are going to be absolutely insane. So, AI21 Labs, the brains behind the Jurassic language models, has just dropped two brand-new open-source LLMs:
Jamba 1.5 Mini and Jamba 1.5 Large. These models are designed with a unique hybrid architecture that incorporates cutting-edge techniques to enhance AI performance, and since they're open source, you can try them out yourself on platforms like Hugging Face or run them on cloud services like Google Cloud Vertex AI, Microsoft Azure, and NVIDIA NIM. Definitely worth checking out. All right, so what's this hybrid architecture all about? Let's break it down in simple terms. Most of the language models you know, like the ones used in ChatGPT, are based on the Transformer architecture. These models are awesome for a lot of tasks, but they've got one big limitation: they struggle when it comes to handling really large context windows. Think about when you're trying to process a super long document or a full transcript from a long meeting; regular Transformers get bogged down because they have to deal with all that data at once. That's where these new Jamba models from AI21 Labs come into play, with a totally new, game-changing approach. AI21 has cooked up a hybrid architecture they're calling the SSM-Transformer. What's cool about this is it combines the classic Transformer model with something called a structured state space model, or SSM. The SSM is built on older, more efficient techniques, like those behind recurrent and convolutional neural networks; basically, these are better at handling computations efficiently. By using this mix, the Jamba models can handle much longer sequences of data without slowing down. That's a massive win for tasks that need a lot of context, like complex generative AI reasoning or summarizing a super long document.
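To make that hybrid idea a bit more concrete, here's a minimal PyTorch sketch of what interleaving state-space blocks with attention blocks can look like. To be clear, this is a toy illustration, not AI21's actual Jamba code: the layer ratio, the dimensions, and the drastically simplified recurrent scan are all assumptions made just for demonstration.

```python
# Toy sketch of a hybrid "SSM + Transformer" stack, in the spirit of Jamba.
# This is NOT AI21's implementation; the sizes, the SSM-to-attention ratio,
# and the simplified scan below are illustrative assumptions only.
import torch
import torch.nn as nn

class ToySSMBlock(nn.Module):
    """A drastically simplified state-space layer: a learned recurrent scan.

    Unlike attention, it carries a fixed-size hidden state across the
    sequence, so memory cost does not grow with context length.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.in_proj = nn.Linear(dim, dim)
        self.decay = nn.Parameter(torch.full((dim,), 0.9))  # per-channel state decay
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        u = self.in_proj(x)
        state = torch.zeros(x.size(0), x.size(2), device=x.device)
        outs = []
        for t in range(x.size(1)):                 # sequential scan, O(1) state
            state = self.decay * state + u[:, t]
            outs.append(state)
        return self.out_proj(torch.stack(outs, dim=1)) + x   # residual connection

class ToyAttentionBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.attn(x, x, x)
        return self.norm(h + x)

# Interleave: mostly cheap SSM blocks, with occasional attention blocks.
dim = 64
model = nn.Sequential(
    ToySSMBlock(dim), ToySSMBlock(dim), ToyAttentionBlock(dim),
    ToySSMBlock(dim), ToySSMBlock(dim), ToyAttentionBlock(dim),
)
tokens = torch.randn(2, 128, dim)        # (batch, seq_len, embedding_dim)
print(model(tokens).shape)               # torch.Size([2, 128, 64])
```

The thing to notice is that the SSM-style block carries a fixed-size state through the sequence, so its memory cost doesn't grow with context length the way full attention does, which is exactly the property that lets a hybrid stack handle long inputs cheaply.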
So why is handling a long context window such a big deal? Well, think about it: when you're using AI for real-world applications, especially in businesses, you're often dealing with complex tasks. Maybe you're analyzing long meeting transcripts, summarizing a giant policy document, or running a chatbot that needs to remember a lot of past conversations. The ability to process large amounts of context efficiently means these models can give you more accurate and meaningful responses. Or Dagan, the VP of product at AI21 Labs, nailed it when he said an AI model that can effectively handle long context is crucial for many enterprise generative AI applications. And he's right: without this ability, AI models often tend to hallucinate, or just make stuff up, because they're missing important information. With the Jamba models and their unique architecture, they can keep more relevant info in memory, leading to way better outputs and less need for repetitive data processing. And you know what that means: better quality and lower cost. All right, let's get into the nuts and bolts of what makes this hybrid architecture so efficient. One part of the model, called Mamba, is actually very important. It was developed with insights from researchers at Carnegie Mellon and Princeton, and it has a much lower memory footprint and a much more efficient alternative to the attention mechanism in a typical Transformer. Unlike Transformers, which have to look at the entire context every single time, slowing things down, Mamba keeps a smaller state that gets updated as it processes the data. This makes it way faster and less resource-intensive, so it can handle longer context windows with ease. Now, you might be wondering how these models actually perform. Well, AI21 Labs didn't just hype them up; they put them to the test on a benchmark called RULER, which evaluates models on tasks like multi-hop tracing, retrieval, aggregation, and question answering. And guess what: the Jamba models came out on top, consistently outperforming other models like Llama 3.1 70B, Llama 3.1 405B, and Mistral Large 2. On the Arena Hard benchmark, which is all about testing models on really tough tasks, Jamba 1.5 Mini and Large outperformed some of the biggest names in AI: Jamba 1.5 Mini scored an impressive 46.1, beating models like Mixtral 8x22B and Command R+, while Jamba 1.5 Large scored a whopping 65.4, outshining even the big guns like Llama 3.1 70B and 405B.
One of the standout features of these models is their speed. In enterprise applications, speed is everything; whether you're running a customer support chatbot or an AI-powered virtual assistant, the model needs to respond quickly and efficiently. The Jamba 1.5 models are reportedly up to 2.5 times faster on long contexts than their competitors, so not only are they powerful, they're also super practical for high-scale operations. And it's not just about speed: the Mamba component allows these models to operate with a lower memory footprint, meaning they're not as demanding on hardware. For example, Jamba 1.5 Mini can handle context lengths up to 140,000 tokens on a single GPU; that's huge for developers looking to deploy these models without needing massive infrastructure. All right, here's where it gets even cooler. To make these massive models more efficient, AI21 Labs developed a new quantization technique called ExpertsInt8. Now, I know that might sound a bit technical, but here's the gist of it: quantization is basically a way to reduce the precision of the numbers used in the model's computations, which saves on memory and computational costs without really sacrificing quality. ExpertsInt8 is special because it specifically targets the weights in the mixture-of-experts, or MoE, layers of the model; these layers account for about 85% of the model's weights in many cases. By quantizing these weights to an 8-bit precision format and then dequantizing them directly inside the GPU during runtime, AI21 Labs managed to cut down the model size and speed up its processing. The result: Jamba 1.5 Large can fit on a single 8-GPU node while still using its full 256K-token context length, which makes Jamba one of the most resource-efficient models out there, especially if you're working with limited hardware.
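Here's a rough sketch of the core quantize-then-dequantize idea on a single weight matrix. It's not AI21's ExpertsInt8 implementation, which fuses the dequantization into GPU kernels and targets only the MoE expert weights; the per-channel scheme below is a common, simplified stand-in.

```python
# Illustrative sketch of 8-bit weight quantization of the kind ExpertsInt8
# applies to MoE expert weights. Not AI21's code; it just demonstrates the
# basic "store int8, dequantize at runtime" idea on one weight matrix.
import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-channel int8 quantization: int8 weights + fp scales."""
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0   # one scale per output row
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """At runtime (in the real system, inside a fused GPU kernel),
    recover approximate floating-point weights."""
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)             # stand-in for one expert's weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is ~4x smaller than fp32 (~2x smaller than fp16), while the
# mean absolute reconstruction error stays tiny relative to the weights:
print((w - w_hat).abs().mean().item())
```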
Besides English, these models also support multiple languages, including Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew, which makes them super versatile for global applications. And here's the cherry on top: AI21 Labs made these models developer-friendly. Both Jamba 1.5 Mini and Large come with built-in support for structured JSON output, function calling, and even citation generation, which means you can use them to build more sophisticated AI applications that can call external tools, digest structured documents, and provide reliable references, all of which are super useful in enterprise settings. One of the coolest things about Jamba 1.5 is AI21 Labs' commitment to keeping these models open. They're released under the Jamba Open Model License, which means developers, researchers, and businesses can experiment with them freely, and with availability on multiple platforms and cloud partners, like AI21 Studio, Google Cloud, Microsoft Azure, NVIDIA NIM, and soon Amazon Bedrock, the Databricks Marketplace, and more, you've got tons of options for how you want to deploy and experiment with these models.
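If you want to kick the tires, loading the smaller model through Hugging Face transformers looks roughly like this. Treat the model id and the generation settings as assumptions to double-check against AI21's model card, and keep in mind that even the Mini model needs serious GPU memory.

```python
# Rough sketch of running Jamba 1.5 Mini via Hugging Face transformers.
# The repo name below is the id AI21 published at launch; verify it (and the
# hardware requirements) against the model card before relying on it.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/AI21-Jamba-1.5-Mini"   # assumed Hugging Face repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize this meeting transcript: ..."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```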
Looking ahead, it's pretty clear that AI models that can handle extensive context windows are going to be a big deal in the future of AI. As Or Dagan from AI21 Labs pointed out, these models are just better suited for the complex, data-heavy tasks that are becoming more common in enterprise settings. They're efficient, fast, and versatile, making them a fantastic choice for developers and businesses looking to push the boundaries in AI. So if you haven't checked out Jamba 1.5 Mini or Large yet, now's the perfect time to dive in and see what these models can do for you. AI has come a long way, with models like ChatGPT and Llama 3 that can handle language tasks like writing and coding pretty well. But when it comes to making decisions in complex, multi-step situations, like organizing an international trip and coordinating flights, hotels, car rentals, and activities across different countries, they tend to fall apart: if the AI misses a flight connection or books the wrong hotel, the entire trip could be thrown off course. That's where Agent Q comes into play.
The team at the AGI company, working with folks at Stanford University, set out to tackle this exact problem. They wanted to create an AI that's not only good at understanding language but also capable of making smart decisions in these kinds of complex, multi-step tasks, and what they came up with is pretty impressive. Let's break down how Agent Q works and why it's so different from other AI systems out there. Traditionally, AI models are trained on static datasets: they learn from a massive amount of data, and once they've seen enough examples, they can perform certain tasks reasonably well. The problem is, this approach doesn't work as well when the AI is faced with tasks that require making decisions over several steps, especially in unpredictable environments like the web. For instance, booking a reservation on a real website, where the layout and available options might change depending on the time of day or location, can trip up even advanced models. So how does Agent Q solve this? The researchers combined a couple of advanced techniques to give the AI a much better chance at success. First, they used something called Monte Carlo tree search, or MCTS for short. MCTS is a method that helps the AI explore different possible actions and figure out which ones are likely to lead to the best outcome; it's been used successfully in game-playing AIs, like those that dominate chess and Go, where exploring different strategies is key.
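To show what that select, expand, simulate, backpropagate loop actually looks like, here's a bare-bones MCTS on a deliberately silly toy problem. Agent Q's real search runs over web-page states with an LLM proposing the candidate actions; the two-action "environment" and random rollout below are illustrative assumptions only.

```python
# Bare-bones Monte Carlo tree search on a toy task, to show the
# select / expand / simulate / backpropagate cycle described above.
import math
import random

ACTIONS = [0, 1]          # toy action space
HORIZON = 8               # episode length

def rollout_reward(path):
    """Toy reward: fraction of 1s chosen (pretend '1' is the right move)."""
    full = path + [random.choice(ACTIONS) for _ in range(HORIZON - len(path))]
    return sum(full) / HORIZON

class Node:
    def __init__(self, path, parent=None):
        self.path, self.parent = path, parent
        self.children, self.visits, self.value = {}, 0, 0.0

    def ucb_child(self, c=1.4):
        # Balance exploitation (average value) against exploration (visit counts).
        return max(self.children.values(),
                   key=lambda n: n.value / (n.visits + 1e-9)
                   + c * math.sqrt(math.log(self.visits + 1) / (n.visits + 1e-9)))

def mcts(iterations=500):
    root = Node([])
    for _ in range(iterations):
        node = root
        # 1) Selection: walk down fully expanded nodes via UCB.
        while len(node.children) == len(ACTIONS) and len(node.path) < HORIZON:
            node = node.ucb_child()
        # 2) Expansion: add one untried action.
        if len(node.path) < HORIZON:
            a = random.choice([a for a in ACTIONS if a not in node.children])
            node.children[a] = Node(node.path + [a], parent=node)
            node = node.children[a]
        # 3) Simulation: random rollout to the end of the episode.
        reward = rollout_reward(node.path)
        # 4) Backpropagation: update value estimates up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda a: root.children[a].visits)

print("best first action:", mcts())   # converges to 1 on this toy reward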
But MCTS alone isn't enough, because in real-world tasks you don't always get clear feedback after every action. That's where the second technique comes in: direct preference optimization, or DPO. This method allows the AI to learn from both its successes and its failures, gradually improving its decision-making over time. The AI doesn't just rely on a simple win-or-lose outcome; instead, it analyzes the entire process, identifying which decisions were good and which ones weren't, even if the final result was a success. This combination of exploration with MCTS and reflective learning with DPO is what makes Agent Q stand out.
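And here's what the DPO objective itself boils down to, in a minimal form. The trajectory log-probabilities are passed in as plain tensors here; how they're computed, and the beta value, are simplifying assumptions rather than Agent Q's actual training setup.

```python
# Minimal sketch of the direct preference optimization (DPO) loss.
# Given a preferred and a rejected trajectory for the same prompt, DPO pushes
# the policy to favor the preferred one more than a frozen reference model does.
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """All inputs are summed log-probabilities of full trajectories."""
    # How much more the policy prefers "chosen" over "rejected"...
    policy_margin = policy_logp_chosen - policy_logp_rejected
    # ...compared with how much the reference model already did.
    ref_margin = ref_logp_chosen - ref_logp_rejected
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Toy numbers: the policy currently prefers the rejected trajectory,
# so the loss is high; gradients would push the margin the other way.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-10.0]),
                torch.tensor([-11.0]), torch.tensor([-11.0]))
print(loss.item())
```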
To test this new approach, the researchers put Agent Q to work in a simulated environment called WebShop. This is essentially a fake online store where the AI has to complete tasks like finding specific products; it's a controlled environment, but it's designed to mimic the complexities of real e-commerce sites. And the results? Agent Q outperformed other AI models by a significant margin. While typical models that relied on simple supervised learning or even reinforcement learning had a success rate hovering around 28.6%, Agent Q, with its advanced reasoning and learning capabilities, boosted that rate to an impressive 50.5%. That's nearly double the performance, which is a huge deal in AI terms. But the real test came when the researchers took Agent Q out of the lab and into the real world. They tried it on an actual task: booking a table on OpenTable, a popular restaurant reservation website. Now, if you've ever used OpenTable, you know it's not always straightforward; depending on the time, location, and restaurant, the options you see can vary, and the AI had to navigate all of this and make a successful reservation. Before Agent Q got involved, the best AI model they had, Llama 3 70B, had a success rate of just 18.6% on this task. Think about that: only about one in five attempts actually resulted in a successful reservation. But after just one day of training with Agent Q, that success rate shot up to 81.7%, and it didn't stop there. When they equipped Agent Q with the ability to perform online searches to gather more information, the success rate climbed even higher, to an incredible 95.4%. That's on par with, if not better than, what a human could do in the same situation. The leap in performance comes from the way Agent Q learns and improves over time. Traditional AI models are like straight-A students: they excel in familiar scenarios but can struggle when faced with the unexpected. In contrast, Agent Q acts more like an experienced problem solver, capable of adapting to new situations. By integrating MCTS with DPO, Agent Q moves beyond simply following predefined rules, learning from each experience and improving with every attempt.
One of the challenges the researchers faced was ensuring that the AI could make these improvements without causing too many problems along the way. When you're dealing with real-world tasks, especially those involving sensitive actions like online bookings or payments, you need to be careful: an AI that makes a mistake could end up reserving the wrong date or, worse, sending money to the wrong account. To handle this, the team built in mechanisms that allow the AI to backtrack and correct its actions if things go wrong. They also used something called a replay buffer, which helps the AI remember past actions and learn from them without having to repeat the same mistakes over and over. Another interesting aspect of Agent Q is its ability to use what the researchers call self-critique: after taking an action, the AI doesn't just move on to the next step; it stops and evaluates what it just did. This self-reflection is guided by an AI-based feedback model that ranks possible actions and suggests which ones are likely to be best, helping the AI fine-tune its decision-making in real time.
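A tiny sketch of those two bookkeeping pieces, a replay buffer plus a critique-and-rank step, might look like this. The random scoring function is a stand-in: in the real system that role is played by the AI feedback model, and the state and action names below are made up for illustration.

```python
# Sketch of the two mechanisms described above: a critic that ranks candidate
# actions before committing, and a replay buffer that stores past transitions
# for later learning. The scoring here is faked; Agent Q uses an LLM-based
# feedback model for this step.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions fall off

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

def critique(state, candidate_actions):
    """Stand-in self-critique step: score each candidate and rank best-first.

    A real system would prompt a feedback model here; we fake the scores.
    """
    scored = [(random.random(), a) for a in candidate_actions]
    return [a for _, a in sorted(scored, reverse=True)]

buffer = ReplayBuffer()
state = "search_results_page"                        # hypothetical web state
ranked = critique(state, ["click_book_button", "refine_search", "go_back"])
action = ranked[0]                                   # act on the top candidate
buffer.add(state, action, reward=1.0, next_state="confirmation_page")
print(action, len(buffer.buffer))
```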
All of this makes it more reliable and effective at completing tasks. Now, we mentioned earlier that the Llama 3 70B model had a starting success rate of 18.6% when trying to book a reservation on OpenTable. After using Agent Q's framework for just a day, that jumped to 81.7%, and with online search capability it hit 95.4%. To put that into perspective, that's roughly a 340% relative increase over the original performance, and when you consider that the average human success rate on the same task is around 50%, it's clear that Agent Q isn't just catching up to human-level performance; it's surpassing it. What's also fascinating is how Agent Q handles the complexity of real-world environments compared to simpler simulated ones like WebShop. In WebShop, the tasks were relatively straightforward, and the AI could complete them in an average of about 6.8 steps, but the OpenTable tasks were much more complex, requiring an average of 13.9 steps to complete. Despite this added complexity, Agent Q was able to not only handle the tasks but excel at them, which shows that the AI's ability to learn and adapt isn't just a fluke; it's robust enough to deal with the kind of unpredictability you'd find in the real world. But this isn't to say everything is perfect. The researchers are aware that there are still challenges to overcome. For one, while Agent Q's self-improvement capabilities are impressive, there's always a risk when you let an AI operate autonomously in sensitive environments; the team is working on ways to mitigate these risks, possibly by incorporating more human oversight or additional safety checks. They're also exploring different search algorithms to see if there's an even better way for the AI to explore and learn from its environment; while MCTS has been incredibly successful, especially in games and reasoning tasks, there might be other approaches that could push performance even further. One of the most interesting points the researchers raise is the gap between the AI's zero-shot performance and its performance when equipped with search capabilities. Zero-shot means the AI is trying to solve a problem it hasn't seen before, and typically this is really challenging; even advanced models can struggle here. But what's fascinating about Agent Q is that once you give it the ability to search and explore, its performance skyrockets. This suggests that the key to making AI more reliable in real-world tasks isn't just about training it on data; it's about giving it the tools to actively explore and learn from its environment in real time. So essentially, we're looking at AI systems that can handle increasingly complex tasks with minimal supervision, which opens up a lot of possibilities. Whether it's managing your bookings, navigating complicated online systems, or tackling more advanced tasks like legal document analysis, the potential applications are vast, and as these systems continue to improve, we might find ourselves relying on them more and more for tasks that currently require a lot of manual effort. Last week, something big happened at Google DeepMind's headquarters in London. They've got this ceremonial gong that they only ring for major breakthroughs, and this time it rang out for something truly unexpected. Back in 2016 they hit the gong for AlphaGo, an AI system that dominated the ancient game of Go, even defeating the best human players in the world. Then in 2017 they rang it again when AlphaZero, another AI, conquered the chess world, taking down human world champions.
Fast forward to just last week, and they brought out the gong once more to celebrate their latest achievement: their AI just competed in the International Mathematical Olympiad, a competition usually reserved for the brightest young math whizzes from around the world, and it performed so well it could have walked away with a silver medal if it were human. This isn't just a win for AI; it's a sign that computers are getting seriously good at tackling problems that have always been a human stronghold. So here's the deal. Last week, DeepMind's AI took part in the International Mathematical Olympiad, or IMO, which is basically the Olympics for the world's brightest young mathematicians. This year the event was held from July 11th to July 22nd at the University of Bath, about 100 miles west of London. It's a huge deal in the math world, drawing 609 high school students from 108 countries, all competing to solve some of the most challenging math problems out there, with students vying for gold, silver, and bronze medals. But here's where things get really interesting: for the first time ever, an AI system not only competed alongside these human math prodigies but performed well enough to earn a medal. Yeah, you heard that right: an AI system at the level of a silver medalist. It managed to solve four out of the six problems presented, earning a total of 28 points, and while it didn't reach the top spot, it's still a groundbreaking achievement that has everyone talking. Dr. Pushmeet Kohli, one of the leads at Google DeepMind, described this accomplishment as a massive breakthrough in AI's ability to engage in mathematical reasoning. He went so far as to call it a phase transition, which is a fancy way of saying that this marks a transformative moment in how AI can be used in mathematics; it's not just about this one competition, it's about the broader implications for the future of AI and math. To make sure the AI's performance was judged fairly, DeepMind brought in two independent experts to assess its work, and these weren't just any experts: Timothy Gowers, a mathematician at the University of Cambridge who's won the Fields Medal, which is like the Nobel Prize for mathematics, and Joseph Myers, a software developer who also happens to be a past IMO gold medalist. Myers has also served as chair of the IMO's problem selection committee, so he knows his stuff. Both of them took a close look at the AI's solutions, and they were impressed; Gowers even said that while his expectations were high going in, the AI exceeded them in some areas. So let's talk about how the competition went down. The human competitors, these super smart high school students, had to sit for two exams over two days, each with three problems covering topics like algebra, geometry, combinatorics, and number theory, and they had just 4.5 hours per exam to solve these incredibly tough problems. Meanwhile, the AI was working away back at the DeepMind lab in London with no time constraints. The researchers were watching closely, and every time the AI managed to solve a problem, they banged the gong in celebration. To give you an idea of how tough these problems are, consider this: only one student, Haojia Shi from China, managed to get a perfect score of 42 points by solving all six problems.
The US team won the overall competition with a total of 192 points, followed closely by China with 190 points. And remember, the AI earned 28 points by fully solving four problems: two in algebra, one in geometry, and one in number theory. It struggled with the two combinatorics problems, but still, this performance was strong enough to earn a silver medal if it had been a human competitor. DeepMind's researchers are particularly excited about this achievement because it represents a significant step forward in AI's ability to tackle complex mathematical problems; for them, it's not just about how fast the AI can solve these problems but about the fact that it can solve them at all. Dr. David Silver, a research scientist at DeepMind, pointed out that this marks a step change in the history of mathematics: the point where we move from computers only being able to solve very simple problems to computers being able to tackle problems on par with those solved by human experts, and in the future they might even go beyond that. DeepMind's work on applying AI to mathematics has been in progress for several years; they've been collaborating with world-class research mathematicians to push the boundaries of what AI can do. Dr. Alex Davies, who leads DeepMind's mathematics initiative, explained that mathematics requires a unique combination of abstract thinking, precise calculation, and creative reasoning, which makes it a perfect test for AI systems, especially those aiming to achieve what's known as artificial general intelligence, or AGI: the ultimate goal in AI research, where a system can perform a wide range of tasks at or above human level. Math Olympiad problems have become a kind of benchmark for testing AI's capabilities. Back in January, DeepMind introduced a system called AlphaGeometry, which was able to solve Olympiad-level geometry problems at nearly the level of a human gold medalist. Fast forward a few months, and AlphaGeometry 2 has now surpassed even the gold medalists at solving these types of problems, according to Thang Luong, the principal investigator, writing about this success. DeepMind decided to take things up a notch for this year's IMO by bringing in a multidisciplinary team to tackle a broader range of mathematical subjects; in fact, DeepMind had two teams working in parallel.
One team was led by Thomas Hubert, a research engineer in London, while the other was led by Thang Luong and Quoc Le in Mountain View, California. These teams were stacked with talent, including a dozen IMO medalists; Dr. Luong even joked that this was by far the highest concentration of IMO medalists at Google. The AI that competed this year was a combination of AlphaGeometry and a new system called AlphaProof, designed to handle a wide range of mathematical problems. AlphaProof is particularly interesting because it incorporates a variety of AI technologies. One approach they used is an informal reasoning system based on natural language, which leverages Google's large language model Gemini to understand and solve problems; it's good at recognizing patterns and suggesting next steps, and while large language models are known for sometimes making stuff up, in this case the AI was able to stay focused and avoid too much creative wandering. Another key approach is a formal reasoning system, which is all about strict logic and code. This system uses a tool called Lean, a theorem prover and proof assistant, which ensures that every step the AI takes in solving a problem is logically sound and can be verified as correct. That's crucial in mathematics, where precision is everything.
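To give a feel for what "verified as correct" means, here's what a couple of toy machine-checked statements look like in Lean 4. These are illustrative examples, not AlphaProof outputs; the point is that the kernel either accepts every step of a proof or rejects it, so there's no room for hand-waving.

```lean
-- Toy machine-checked proofs in Lean 4 (illustrative, not AlphaProof's output).
-- If any step were unjustified, Lean would refuse to accept the theorem.
theorem sum_comm (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b

theorem double_eq (n : Nat) : n + n = 2 * n := by
  omega   -- a decision procedure for linear arithmetic closes the goal
```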
And then there's also a reinforcement learning algorithm, a type of AI that learns by itself. It's based on the same technology that powered AlphaGo and AlphaZero, and it's designed to keep learning and improving over time without needing a human teacher to guide it. Dr. Silver, who's in charge of reinforcement learning at DeepMind, explained that this kind of algorithm can scale indefinitely, meaning it can continue learning and solving increasingly complex problems. The idea is that eventually this AI could solve problems that are too difficult even for the best human mathematicians, and who knows, maybe one day it'll be able to tackle problems that humans haven't even thought of yet. But it's not all about AI taking over: the hope is that these systems will become valuable tools for mathematicians, helping them solve problems faster and more efficiently. Dr. Timothy Gowers, the Fields medalist, isn't too worried about AI replacing human mathematicians anytime soon; he pointed out that there's still a long way to go before AI can handle the kind of high-level research human mathematicians are doing. But he also thinks that if AI can solve tough problems like those at the IMO, then it won't be long before we start seeing AI tools that could be really useful in math research. If this technology keeps advancing, it could make math more accessible to everyone, speed up discoveries, and even help mathematicians think outside the box. So yeah, that's why they banged the gong at Google DeepMind's headquarters: they're not just celebrating a victory, they're celebrating a new era in mathematics, where AI isn't just a tool but a real collaborator, and maybe, just maybe, a game changer for the entire field.
Google has just rolled out its latest text-to-image AI model, Imagen 3, making it accessible to all users through their ImageFX platform, and alongside this release they've published an in-depth research paper that delves into the technology behind it. This move represents a major step forward, expanding access to a tool that was previously available only to a select group of users. All right, so Imagen 3 is a text-to-image model. It can generate images at a default resolution of 1024 by 1024 pixels, which is already pretty high quality, but what really sets it apart is that you can upscale those images up to 8 times that resolution. So if you're working on something that needs a huge, detailed image, like a billboard or a high-res print, you've got the flexibility to do that without losing quality; that's something not every model out there can offer, and it's a big plus for anyone working in design or media. Now, the secret actually lies in the data it was trained on. Google didn't just use any old dataset; they went through a multi-stage filtering process to ensure that only the highest-quality images and captions made it into the training set. This involved removing unsafe, violent, or low-quality images, which is crucial because you don't want the model learning from bad examples, and they also filtered out AI-generated images to avoid the model picking up the quirks or biases that might come from those. They also used something called deduplication pipelines, meaning they removed images that were too similar to each other. Why? Because if the model sees the same kind of image over and over again, it might start to overfit, that is, get too good at generating just that kind of image and struggle with others. By reducing repetition in the training data, Google ensured that Imagen 3 could generate a wider variety of images, making it more versatile.
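Google hasn't published the exact pipeline, but embedding-based near-duplicate filtering of the kind described usually looks something like this sketch; the similarity threshold and the brute-force comparison loop are simplifying assumptions (production systems use approximate nearest-neighbor indexes instead).

```python
# Sketch of embedding-based near-duplicate filtering for a training set.
# Not Imagen 3's actual pipeline; the threshold and O(n^2) loop are
# illustrative assumptions for a small example.
import numpy as np

def near_duplicates(embeddings: np.ndarray, threshold: float = 0.95):
    """Drop images whose embedding is too similar to one already kept."""
    kept, kept_vecs = [], []
    for i, v in enumerate(embeddings):
        v = v / (np.linalg.norm(v) + 1e-9)    # unit vectors -> dot = cosine sim
        if all(float(v @ u) < threshold for u in kept_vecs):
            kept.append(i)
            kept_vecs.append(v)
    return kept

# Toy data: three random "image embeddings", the third a near-copy of the first.
rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 512))
emb[2] = emb[0] + 0.01 * rng.normal(size=512)
print(near_duplicates(emb))   # -> [0, 1]: the near-copy of image 0 is dropped
```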
Another interesting aspect is how they handled captions. Each image in the training set wasn't just paired with a human-written caption; they also used synthetic captions generated by other AI models, which was done to maximize the variety and diversity of the language the model learned. Different models were used to generate these synthetic captions, and various prompts were employed to make the language as rich and varied as possible. This matters because it helps the model understand the different ways people might describe the same scene. All right, so how does Imagen 3 stack up against other models out there? Google didn't just make big claims; they put Imagen 3 head-to-head with some of the best models around, including DALL-E 3, Midjourney v6, and Stable Diffusion 3, running extensive evaluations with both human raters and automated metrics. In the human evaluations, they looked at a few key areas: overall preference, prompt-image alignment, visual appeal, detailed prompt-image alignment, and numerical reasoning. Let's break these down a bit. First, overall preference: this is where they ask people to look at images generated by different models and choose which one they like best. They did this with a few different sets of prompts, including one called GenAI-Bench, which consists of 1,600 prompts collected from professional designers, and on this benchmark Imagen 3 was the clear winner; it wasn't just a little bit better, it was significantly preferred over the other models. Then there's prompt-image alignment, which measures how accurately the image matches the text prompt, ignoring any flaws or differences in style. Here again Imagen 3 came out on top, especially when the prompts were more detailed or complex. For example, on a prompt set called DOCCI, which includes very detailed descriptions, Imagen 3 showed a significant Elo lead over the competition, with a 63% win rate against the second-best model. That's a pretty big deal, because it shows that Imagen 3 is not just good at generating pretty pictures; it's really good at sticking to the specifics of what you ask for. Visual appeal is another area where Imagen 3 did well, though this is where Midjourney v6 actually edged it out slightly. Visual appeal is all about how good the image looks, regardless of whether it matches the prompt perfectly, so while Imagen 3 was close, if you're all about that eye-candy factor, Midjourney might still have a slight edge. But make no mistake, Imagen 3 is still right up there, and for a lot of people the difference might not even be noticeable.
Now let's talk about numerical reasoning, and this is where things get really interesting. Numerical reasoning involves generating the correct number of objects when the prompt specifies it: if the prompt says five apples, the model needs to generate exactly five apples. That might sound simple, but it's actually pretty challenging for these models. Imagen 3 performed the best in this area, with an accuracy of 58.6%, and it was especially strong when generating images with between two and five objects, which is where a lot of models tend to struggle. To give you an idea of how challenging this is, look at some more numbers: Imagen 3 was the most accurate model when generating images with exactly one object, but its accuracy dropped as the number of objects increased, falling by about 51.6 percentage points between one and five objects. Still, it outperformed other models like DALL-E 3 and Stable Diffusion 3 on this task, which highlights just how good it is at handling these tricky prompts. And it's not just humans who think Imagen 3 is top-notch: Google also used automated evaluation metrics to measure how well the images matched the prompts and how good they looked overall, with metrics like CLIPScore, VQAScore, and FD-DINO, which are all designed to judge the quality of the generated images. Interestingly, CLIP, a popular metric, didn't always agree with the human evaluations, but VQAScore did, and it consistently ranked Imagen 3 at the top, especially on more complex prompts. So why should you care about all this? Well, if you're someone who works with images, whether you're a designer, a marketer, or even just someone who likes to create content for fun, a tool like Imagen 3 could be a huge asset. It's not just about getting a nice picture; it's about getting exactly what you need, down to the smallest detail, without compromising on quality. Whether you're creating something for a website, a social media campaign, or even a large print project, Imagen 3 gives you the flexibility and precision to get it just right. But let's not forget, it's not just about creating high-quality images.
Google has put a lot of effort into making sure this model is also safe and responsible to use; however, they've had their fair share of challenges with this in the past. You might remember when one of Google's previous models caused quite a stir: someone asked it to generate an image of the Pope, and it ended up creating an image of a black pope. This might seem harmless at first glance, but there has never been a black pope in history, so it's a pretty big factual inaccuracy. Another time, someone asked the model to generate an image of Vikings, and it produced Vikings who looked African and Asian; again, this doesn't align with historical facts, since Vikings were Scandinavian. These kinds of errors made it clear that while trying to be inclusive and politically correct, the model was producing results that were simply inaccurate and historically misleading, and these incidents sparked a lot of debate. There's a fine line between creating a model that's inclusive and one that distorts reality: while it's crucial to avoid harmful or offensive content, it's just as important that the model remains factually accurate. After all, if the images it generates aren't grounded in reality, it loses its effectiveness and, frankly, its usefulness; a model that produces images that don't reflect historical facts or cultural realities isn't doing anyone any favors, and it ends up being more of a tool for pushing an agenda than a reliable, factual generator. With Imagen 3, Google seems to be aware of these pitfalls. They've evaluated how often the model produces diverse outputs, especially when prompts ask for generic people, using classifiers to measure the perceived gender, age, and skin tone of the people in the generated images. The goal was to ensure that the model didn't fall into the trap of producing the same type of person over and over again, which would indicate a lack of diversity in its outputs, and from what they've found, Imagen 3 is more balanced than its predecessors, generating a wider variety of appearances and reducing the risk of homogeneous outputs. They also did something called red teaming, which is essentially stress-testing the model to see if it would produce harmful or biased content when put under pressure.
This involves deliberately trying to push the model to see where it might fail, where it might generate something inappropriate or offensive; the idea is to find these weaknesses before the model is released to the public. The good news is that Imagen 3 passed these tests without generating anything dangerous or factually incorrect. However, recognizing that internal testing might not catch everything, Google also brought in external experts from various fields, including academia, civil society, and industry, to put the model through its paces. These experts were given free rein to test the model in any way they saw fit, and their feedback was crucial in making further improvements. This kind of transparency and willingness to invite external scrutiny is essential: it helps build trust in the technology and ensures that it's not just Google saying the model is safe and responsible, but independent voices as well. In the end, while it's important that a model like Imagen 3 is safe to use and doesn't produce harmful content, it's equally important that it doesn't stray from factual accuracy. If it can strike the right balance, being inclusive without pushing a politically correct agenda at the expense of truth, it'll not only be a powerful tool from a technical perspective but also one of the most reliable and effective image-generating models out there. All right, if you found this interesting, make sure to hit that like button, subscribe, and stay tuned for more AI insights. Let me know in the comments what you think about Imagen 3 and how you might use it. Thanks for watching, and I'll catch you in the next one.