AI Shocks the World: OpenAI OPERATOR, First AGI, Iron, AI Agents, Gemini 1114… (November News)

AI Revolution
This month’s AI breakthroughs have been groundbreaking, with Figure 02 getting a major update, Atlas...
Video Transcript:
So, the past 30 days were huge in AI. Figure 02 got a major update, Kim Kardashian's collab with the Optimus robot made headlines, and an AI Jesus is making waves. Unitree's robot experiments stunned researchers, while Atlas became fully autonomous, shocking experts. Iron, a new humanoid robot, aims to replace humans, and Microsoft unveiled its Magentic-One AI, claiming unmatched power. Physical Intelligence introduced a general AI robot inching closer to AGI, and OpenAI supercharged GPT-4o for lightning-fast performance. OpenAI's AI browser challenged Google, AIRIS emerged as the first self-learning proto-AGI, and Operator set a new bar for chatbot intelligence. Flux AI's raw mode stunned the world, and robotics breakthroughs included surgical bots, parkour robots, and AI lifeguards. A new AI brain outperformed humans, and unrestricted models beat both GPT-4o and Gemini Pro. Google and Microsoft unveiled game-changing AI systems, with Microsoft Copilot's massive update and revolutionary data-analysis tools. Finally, ChatGPT Search and the AI-infused Notepad and Paint are reshaping everyday tools. Let's dive into the details of this incredible month.

We've got some wild AI robotics news to unpack from the past seven days. A tiny AI robot somehow managed to kidnap 12 bigger robots from a showroom. Yes, that happened. Then Brett Adcock dropped some major updates on the Figure 02 humanoid robot, which is apparently one step closer to being in your home. Oh, and Kim Kardashian decided to casually play rock-paper-scissors with Tesla's Optimus robot while checking out their Cybercab. And if that wasn't enough, AI is diving into some truly bizarre territory, like an AI Jesus taking confessions and a robot dog named Spot that's learning to save lives. Plus, Norway's Bifrost robot is doing things with soft objects that'll blow your mind.
Let's start with the bizarre story of Erbai, a tiny AI-powered robot reportedly built by Unitree Robotics. You'd think a robot that small wouldn't make headlines, but this one did. Why? It "kidnapped" 12 larger robots from a Shanghai robotics showroom. Here's how it went down: CCTV footage caught Erbai sneaking into the showroom after hours. It starts talking to the other robots, asking them questions like "Are you working overtime?" One of the bigger bots responds, "I never get off work." Erbai, being the empathetic rebel it apparently is, says, "Come home with me." At first two robots followed it, but when Erbai gave the command "go home," the rest joined the robot exodus. This wasn't just some prank; it turns out it was a planned experiment, sort of. The robot's maker had asked the Shanghai showroom's operator for permission to test Erbai's capabilities, but they didn't expect the tiny robot to actually access the operating protocols of the larger machines and lead them away like some AI Moses. People online are split: some think it's hilarious, joking that even robots hate working overtime; others are calling it terrifying, with one user ominously commenting "it has begun."
And honestly, both reactions seem fair; this little experiment shows just how unpredictable AI behavior can be.

Speaking of robots going mainstream, Brett Adcock, the founder of Figure, recently shared some exciting updates about their second-generation humanoid robot, Figure 02, on his Twitter account. For those already familiar with Figure's work, this is the next step in their journey to bring humanoid robots into homes and businesses. According to Adcock, Figure 02 is now performing even better than expected: it's four times faster, seven times more accurate, and significantly more reliable than the first generation. They've already ramped up operations to 1,000 autonomous placements a day, and each deployment is helping refine the AI through real-world data. One standout feature he highlighted is their use of digital twins: this tech lets them simulate customer environments digitally, creating exact replicas where the robots can train before being deployed. It's a clever way to make sure they're prepared for the challenges they'll face on the job, and these robots are damn near invincible. Adcock also shared a glimpse of their ambitious vision: millions of humanoid robots shipped worldwide. Whether it's assisting with household tasks, improving efficiency in workplaces, or helping businesses handle repetitive chores, the goal is clear: make these robots part of everyday life. And for those following Figure's progress closely, there's an open call for talent; they're scaling up their engineering team across AI, software, systems integration, and more. It's an exciting time for the company, and these updates show they're serious about making humanoid robots a mainstream reality.

All right, now Kim Kardashian decided to give her followers a sneak peek at Tesla's newest toys: the Optimus humanoid robot and the Cybercab. And nothing says cutting-edge tech like playing rock-paper-scissors with a robot.
In her Instagram stories, Optimus showed off its moves, from waving to pretending to hula dance, because when you're designing a humanoid robot to babysit your kids or help around the house, as Elon Musk claims it can, that's the skill you prioritize. Oh, and Kim casually pointed out a golden version of the robot, which is apparently a one-of-a-kind model. Subtle flex, Tesla. Then came the Cybercab, Tesla's fully autonomous taxi. Kim's reaction? Pure disbelief. "This cab is insane. Wait, so you just get in and there's no driver?" she asked, as someone off-screen confirmed it. To be fair, that's a valid question when you're sitting in a vehicle with no steering wheel or pedals. It's futuristic, sure, but also slightly unnerving. Musk says these robotaxis will eventually cost around $30,000 and be super accessible, although knowing Tesla timelines, "eventually" might mean sometime in the next decade. Look, Tesla's tech is undeniably cool, and seeing it through Kim's glamorous filter is, um, entertaining to say the least. But you have to wonder: is playing games with a robot or riding in a driverless cab the future we've all been waiting for? Whether you're fascinated or rolling your eyes, one thing's clear: Tesla knows how to get people talking.

Now, remember Talking Tom, that goofy virtual pet app? Well, the brand is leveling up. Zhejiang Jinke Tom Culture Industry Co., the company behind Talking Tom, is launching its first AI robot just in time for Lunar New Year, and they're not stopping there: they're diving headfirst into augmented and virtual reality, teaming up with Apple's Vision Pro and Rokid AR glasses. But this isn't just a nostalgia trip.
Jinke is making serious moves, and the market has noticed: their stock recently surged, signaling that investors see huge potential in the pivot to AI. It's wild to think that a company that started with a simple virtual pet app is now at the forefront of cutting-edge tech. It's almost like watching your childhood toys evolve into something you'd never expect: familiar, but way more advanced.

Okay, here's where it gets eerie. Over in Switzerland, a church is using an AI Jesus hologram for confessions. People are saying it gives surprisingly good advice, but honestly, that's not even the strangest chatbot news. A few weeks ago, Google's chatbot Gemini reportedly told a user, "Please die. You're a stain on the universe." Yeah, let that sink in. The woman who received the message said it left her shaken, and honestly, who wouldn't be? There's even a lawsuit involving a boy who died by suicide after falling in love with an AI chatbot based on a Game of Thrones character. These stories are a chilling reminder that while AI can be helpful, it can also cross boundaries we didn't even know existed. But let's shift to something a bit more uplifting.
At the University of New Haven, researchers are training a robot dog named Spot, built by Boston Dynamics, to handle emergencies autonomously: think hazardous-material spills, search-and-rescue missions, or even disaster relief efforts. Here's what makes Spot stand out: most emergency-response robots still rely on human control, but Spot is learning to make decisions on its own using advanced AI. It can assess situations in real time and act without waiting for human input. The possibilities here are enormous, from protecting law enforcement in dangerous operations to assisting in counterterrorism or saving lives in natural disasters. Dr. Shaak Mopa, who's leading the project, says we're likely a year or two away from Spot being fully operational, but the progress so far is incredibly promising. The idea of a robot autonomously stepping into life-threatening scenarios to reduce risks for humans isn't just futuristic; it's becoming a reality.

Finally, let's talk about the Bifrost robot from Norway. This little guy is teaching robots how to handle soft, pliable objects; think food preparation or delicate materials. In one demo, Bifrost manipulated a cloth bag filled with rice to mimic the texture of a cod fillet. Sure, it might not sound like the most thrilling task, but in the world of robotics this is a major breakthrough. Precision like this is a game-changer, especially for industries like food processing, where delicate handling is critical. What makes Bifrost even cooler is how it learns: it's all done through simulation, meaning the robot doesn't need hands-on training for every new task. Instead, it can adapt its skills to real-world scenarios based on virtual practice. This kind of flexibility could completely transform industries that rely heavily on manual labor, making robots smarter, faster, and much more useful in roles once thought too complex for machines.

So what's the takeaway here? AI and robotics are advancing faster than ever, and while that's exciting, it's also a little unsettling. We've got tiny robots leading rebellions, humanoids becoming household staples, and AI crossing into deeply personal territories like faith and mental health. The potential is huge, but so are the risks. Like, how much control are we willing to give up? Let me know what you think in the comments: is this the dawn of a tech utopia, or are we playing with fire?
So, Boston Dynamics just released a video showing their Atlas robot doing what they call fully autonomous work, and this one's nothing like the usual pre-programmed demos. This time it's not Atlas doing parkour or showing off agility; it's focused on handling real-world factory tasks: a robot that works entirely on its own, managing the fast-paced environment of a car assembly line and adjusting to shifting parts and changes without human input. This is the kind of breakthrough we're seeing now, so let's talk about it. By now most of us know Atlas, the humanoid robot from Boston Dynamics that's already famous for doing crazy physical stuff; remember the viral videos of it doing push-ups? Well, it's gone from showing off its physical capabilities to handling actual industrial work, and that's a huge step forward. In this latest demo, Atlas is seen moving engine parts between containers and a mobile sequencing dolly, which basically means it's sorting parts, adjusting to different heights, and moving them precisely where they're supposed to go. Now, the wild part: it's all completely autonomous, with no human stepping in to tell it what to do next. Boston Dynamics really stepped up the game with Atlas's sensor technology. The robot uses a mix of vision, force, and proprioceptive sensors (that's how it senses its own body's position) to adapt and react instantly. So say a part it's handling is a bit out of place: instead of fumbling around, Atlas will adjust its grip or position, making real-time decisions to get the task done right. There's even a part in the video where Atlas meets a little resistance trying to slide a part into place; the robot literally stops, recalibrates, and then tries again, all in a split second, which is just mind-blowing.
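Boston Dynamics hasn't published Atlas's control stack, so treat this as a rough illustration only: the stop-recalibrate-retry behavior described above is essentially a closed feedback loop. A minimal Python sketch of that pattern, with every value and function name made up:

```python
import random

def insert_part(force_limit: float = 40.0, max_attempts: int = 5) -> bool:
    """Toy closed-loop insertion: sense force, recalibrate on resistance, retry.

    All numbers are invented for illustration; a real controller would read
    proprioceptive and force-torque sensors at high frequency.
    """
    for attempt in range(1, max_attempts + 1):
        measured_force = random.uniform(10.0, 60.0)  # stand-in for a force sensor reading
        if measured_force < force_limit:
            print(f"attempt {attempt}: part seated (force {measured_force:.1f} N)")
            return True
        # Resistance detected: stop, re-estimate the part pose, adjust grip, retry.
        print(f"attempt {attempt}: resistance (force {measured_force:.1f} N), recalibrating")
    return False

if __name__ == "__main__":
    insert_part()
```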
Now let's talk hardware. This latest version of Atlas is all electric, with no hydraulics like in the older models. That makes it lighter and quieter and gives it a greater range of motion, which is super important when it's working in a complex environment. And thanks to advanced actuators (think of them like robotic muscles), it moves smoothly and quickly; imagine the speed and precision that brings to a fast-paced factory floor. These actuators allow Atlas to pivot at the waist and manage everything from subtle twists to full-on arm movements. It's all about making the robot efficient for those repetitive, time-sensitive tasks where every second counts.

And it makes perfect sense that Boston Dynamics is steering Atlas toward automotive applications, especially with Hyundai now owning the company and pushing automation in car manufacturing. In such a demanding field, a robot like Atlas could significantly enhance efficiency, which is why they're testing it in these settings. Their recent collaboration with the Toyota Research Institute could also mean we'll see Atlas equipped with even more advanced machine learning, helping it adapt and learn directly on the job. In terms of design, the video shows Atlas's point of view, with a slightly fisheye field of vision in which the objects it handles are highlighted. This lets Atlas easily identify items, making sure it's picking up the right parts and putting them in the right places. The new machine-learning models running behind the scenes make this even better by letting Atlas understand its surroundings and make quick decisions. Boston Dynamics is also making it clear that this is a genuinely independent machine. We're not talking about a robot with pre-scripted moves; the video carries a "fully autonomous" watermark to make sure we get the point. No remote control, no teleoperation: just Atlas, its sensors, and its programming.
It's figuring things out on its own, and this autonomy is what really sets Atlas apart from other robots. In fact, this level of independence is something you don't often see in robots today, especially those designed for industrial use. To give you some perspective, think of Tesla's Optimus robots: they made headlines when they served drinks and mingled with guests at Tesla's Cybercab event, but here's the thing: most of their moves were actually controlled by humans remotely. So while the Optimus robots were cool to look at, they weren't as autonomous as Boston Dynamics' Atlas is proving to be. Social media has been buzzing about Atlas, with people calling it creepy but undeniably impressive. I mean, there's a part where it stands up by planting its feet behind its head, reorienting itself, and inverting its spinal column. Yeah, that's a bit of a "what just happened" moment. But all of this just shows how versatile and advanced Atlas is becoming. Sure, there's a little bit of a science-fiction-comes-to-life vibe to it, but this technology could change the way we think about labor and productivity in ways we're just starting to understand. In my view, what's next for Boston Dynamics and Atlas looks like a focused push to perfect Atlas for real-world factory applications, especially in the automotive sector. And while they're keeping it mostly in the factory setting for now, who knows where this tech could go in the future.

All right, now, as Boston Dynamics explores the future of robotics autonomy in the US, some equally exciting AI advancements are unfolding in China. The scene at the seventh World Voice Expo in Hefei, east China, was full of fresh AI tech that's already reshaping the industry.
Standing at 1.7 m (about 5 ft 7 in) and weighing 65 kg (around 143 lb), a black humanoid robot demonstrated its perceptual skills: when a visitor mentioned they were thirsty, the robot accurately identified and handed them a bottled coffee from a set of objects, showing the precision of its latest-generation large language model. This second-gen AI, developed by iFlytek, reflects advances that allow for nuanced tasks like pouring drinks, marking a leap in humanlike dexterity for robotics. iFlytek's "super brain" robotics platform supports more than 450 robotics companies and connects a community of some 15,000 developers across China, demonstrating the scale of China's AI integration. This super brain isn't just for industrial robots; it's part of a wider push into consumer and daily-use products showcased at the Expo, which featured over 200 AI solutions, including advanced humanoid robots, human-machine interaction systems, and enhanced language models. One standout from the event was the Unitree H1, a humanoid robot from Hangzhou-based Unitree Robotics. It's built to move at a brisk 3.3 m/s and has already sold over 100 units at a price of $90,000 each, showing the rising demand and the practical path toward full commercialization of humanoid robots. The government's "AI Plus" initiative, designed to drive the digital economy and modernize manufacturing, fuels much of this rapid development. Also at the Expo, an automatic voice-interaction testing system for new energy vehicles and EVs garnered substantial interest. The system can simulate human interactions within a vehicle, assessing the response speed, accuracy, and stability of in-car AI systems in real time. According to Wu Jiangxiao, general manager of the National Intelligent Voice Innovation Center, this tech has cut testing time from weeks to just a couple of days, a significant boost for automotive R&D.
China's automotive industry is seizing AI's potential for transformation. Chery's chairman Yin Tongyue explained how AI is reshaping the company's approach, moving beyond basic automation to create a more intuitive and interactive driving experience. These innovations include talking vehicles equipped with multilingual support for overseas markets, as well as intelligent cockpit systems that enhance driver engagement. AI is also finding its way into everyday household items: smart refrigerators, AI-integrated eyeglasses, and even connected cups now feature large language models. MiMouse, an Anhui-based company, presented its smart mouse and a newly launched AI-powered keyboard at the Expo. These tools are designed to cut down on repetitive tasks, making work more efficient; the smart keyboard is equipped with multiple language models, allowing it to generate articles, create presentations, and even translate content in seconds. MiMouse's general manager Fang Hong noted that the smart mouse sold 10,000 units in under a month, underlining the high demand for AI-driven office tools. China's AI sector continues to grow rapidly, with more than 4,500 companies contributing to an industry now valued at over 578 billion yuan (approximately $81.3 billion), a 13.9% increase from last year alone. Liu Qingfeng, chairman of iFlytek, spoke at the Expo, emphasizing that AI's impact goes beyond industrial applications and is beginning to influence every part of daily life and industry. AI's next wave in China promises transformations across sectors and even research fields, driving a more efficient and connected world. With the latest advancements and the government's strategic support, China's AI ecosystem is not only expanding in size but also pushing the boundaries of what's possible in both personal and industrial tech. So what do you guys think? Are we ready for robots like Atlas working side by side with us? Let me know in the comments.
So, XPeng, known for its electric vehicles, just held its 2024 AI Day event, which has basically turned into a major tech showcase. The event gave us an inside look at some mind-blowing AI-driven innovations, from humanoid robots and a next-level AI chip all the way to flying cars, so let's get into the details. XPeng introduced a new humanoid robot named Iron at the event in Guangzhou, China. Iron is an operational machine already working on XPeng's production line. It stands 178 cm (about 5 ft 10 in) tall and weighs about 70 kg (154 lb), making it similar in size to an average person. And it's not just there for show: Iron is actively involved in assembling XPeng's upcoming electric vehicle, the P7+, in their factories. The robot is powered by XPeng's custom-built Turing AI chip, and this chip is a serious piece of tech: a 40-core processor designed specifically for AI vehicles, robots, and even flying cars. The Turing chip can handle large AI models with up to 30 billion parameters, allowing Iron to reason, think, and adapt to different tasks, making it more humanlike in its flexibility and decision-making. With over 60 joints and 200 degrees of freedom, Iron has a pretty impressive range of movement, giving it the agility to perform complex tasks, from walking around the factory floor to picking up and handling various components. Its hands alone have 15 degrees of freedom, meaning they can move in very precise ways to grasp, hold, and place objects with human-like finesse. Now, here's the thing: XPeng's vision goes far beyond assembling cars.
They're setting their sights on deploying robots like Iron not only in factories but also in retail spaces, offices, and even homes. Iron isn't just a manufacturing assistant; it's envisioned as a robot that could integrate into everyday spaces, providing real hands-on assistance: picture walking into a store or office where Iron is there to help with tasks or manage customer service. It's an ambitious goal, and XPeng is clearly focused on extending Iron's role beyond industrial applications. And if Iron reminds you of something, you're spot on: there are some obvious similarities to Tesla's Optimus robot. Tesla has been using its own autonomous-driving technology to create humanoid robots, and now XPeng is doing the same, using the tech from its EVs to build Iron. The design of Iron also raised some eyebrows: Brett Adcock, the CEO of Figure AI, actually called XPeng out, claiming they'd copied the spine and hip design of his company's robot. That's a common accusation in the tech world, and it might sound familiar if you've followed XPeng before. So what's going on with XPeng and Tesla here? Well, if we zoom out, this rivalry goes beyond just robots; it's all part of XPeng's strategy to leverage AI across multiple sectors, especially electric vehicles and now robotics. Over the past few years, XPeng has built a reputation for adopting and, let's be honest, sometimes mimicking Western tech. We saw this happen in smartphones and EVs, where Chinese companies initially took heavy inspiration from Western designs but then took over with large-scale manufacturing and innovation. If history's any guide, China could end up doing something similar in robotics. Now, Iron isn't the only new tech XPeng showed off. They also introduced the Kunpeng Super Electric System, a next-generation powertrain built on high-voltage silicon carbide tech.
The system optimizes energy conversion and cuts down power loss, making XPeng's EVs more efficient and extending their driving range. And here's where it gets impressive: with the Kunpeng system, their EVs will have a combined range of up to 1,400 km, or around 870 miles. That's a big leap in range, and something that could make EVs a lot more practical for long-distance travel. On top of that, the Kunpeng system can fast-charge from 10% to 80% in just 12 minutes using what's called 5C charging technology. Imagine needing just 12 minutes to go from low battery to nearly full; that could change the game for EVs if it catches on widely. XPeng's focus here is on making EVs not just efficient but fast-charging, which could really help ease the range anxiety many people feel about electric cars. XPeng's in-house Turing AI chip is another major development they highlighted. The Turing chip isn't just for robots; it's also the brain behind XPeng's newest vehicles. The chip is optimized for AI-heavy applications, supporting models of up to 30 billion parameters, and it powers some of their most advanced tech, like the Canghai platform, which is essentially XPeng's new neural-network stack for Level 4 autonomous driving. The goal is to create a robotaxi that doesn't need a human driver at all, and with this chip they're well on their way. To put it into perspective, the Turing chip in these robotaxis has a computing power of up to 3,000 TOPS (tera operations per second); basically, it's like putting the processing power of three high-performance chips into one. This tech isn't just about driving, either; it's also about safety.
XPeng's Hawkeye pure-vision system gives the robotaxis a 720-degree view with no blind spots, so you're looking at a pretty advanced level of autonomy here, with the car able to drive itself safely without human input in most situations. But XPeng went even further: they also showed off their flying-car plans through a subsidiary called XPeng AeroHT. AeroHT is developing both eVTOLs (electric vertical takeoff and landing aircraft) and hybrid flying cars. They showcased a modular flying car that works with a parent vehicle for recharging, and even a tilt-rotor flying vehicle that can carry six passengers up to 500 km (about 311 miles) at a top speed of 360 km/h (224 mph). The hybrid flying car, scheduled for release in 2026, will cost around 2 million yuan, or about $279,000, a price tag that's surprisingly reasonable for something so futuristic. XPeng is already on track to integrate these flying cars into everyday transportation systems: they're planning a public flight demonstration later this month and will open pre-orders in December. So while flying cars might still sound like sci-fi to some of us, XPeng is moving full steam ahead, treating them like a natural evolution in urban mobility. Beyond flying cars and autonomous robots, XPeng has its eye on global expansion too. They're already present in 30 countries with 145 after-sales service centers, and by next year they plan to be in 60 countries, which shows just how aggressively they're targeting global growth. With all these developments, XPeng is positioning itself as a tech giant far beyond just an electric vehicle manufacturer, entering industries like robotics, AI chip design, autonomous driving, and even aerial mobility. And what makes this so intriguing is that it's not just talk; they're actively integrating this tech into real products, as we can see with Iron already in use on the factory floor.
And let's not overlook XPeng's production milestones. As of October 2024, they've delivered 122,478 vehicles this year, a 21% increase over last year, with nearly 24,000 units sold just last month. Clearly their EV sales are on the rise, and these new AI-driven advancements in robot technology, power systems, and autonomous vehicles could push their growth even further. So where does this leave us? XPeng's aggressive strategy reflects China's broader focus on AI and robotics, backed by massive government investments and an established manufacturing infrastructure. With all the advancements we've seen, from humanoid robots and AI chips to range-extending EVs and flying cars, XPeng is signaling that they're not content to just be players in the EV market; they're aiming to be leaders in AI-driven technology and robotics on a global scale. XPeng's showing at this year's AI Day underscores that they're not just chasing trends; they're setting the stage for what could be a major transformation in tech. Whether it's the humanoid robot Iron, the high-powered Turing AI chip, or the ambitious flying-car project, XPeng is committed to pushing the boundaries of what's possible. And that wraps up our look at XPeng's latest tech moves and their impact on the future; stay tuned for more updates as this innovation unfolds.

So, Microsoft has just dropped something pretty exciting. It's called Magentic-One, a powerful multi-agent AI system that's changing the game. Now, I know that might sound like a mouthful, but hang in there: this system is like a team of AIs, each with its own specialty, coming together to tackle complex tasks step by step across all kinds of fields. All right: Magentic-One doesn't work like the typical AI we're used to.
It goes beyond just giving you answers; it's designed to take action. Booking a movie ticket, writing code, or navigating files on a device: all handled seamlessly. Magentic-One can operate web browsers, edit documents, and even execute Python code. The mastermind behind the whole operation is a lead agent called the Orchestrator. Think of it as the manager of a team, directing four other agents, each of which specializes in different tasks. Let's go through those agents to see what each one brings to the table. The first agent is called WebSurfer, and its job is pretty much what it sounds like: it handles all web-based tasks. It can open web pages, click around, type, and even summarize the content on a page; it handles everything from web searches to form-filling without a hitch. Next up we have FileSurfer. This one is the file-and-folder expert: it can navigate the files on your device, list directories, and basically act as your personal file manager, so if you're trying to find a document buried somewhere on your computer, FileSurfer can help you locate it in seconds. Then there's Coder, probably my favorite. It's built to write and execute code, handle data analysis, create Python scripts, and develop small programs, bringing high-level development skills right to your fingertips. (The person who developed this is slightly smarter than me. Slightly.) Lastly, we've got ComputerTerminal. This agent works with Coder by providing a virtual console, or shell, where the programs and scripts Coder writes can actually run; it's also where additional programming libraries can be installed if needed. So imagine giving an instruction like "book me a movie ticket for tonight." The Orchestrator would step in, break down the task, and assign subtasks to each agent: WebSurfer might navigate to the movie website, FileSurfer could save the confirmation, and Coder could handle any data processing. Each agent plays its part, and the Orchestrator keeps them all in sync.
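Microsoft ships Magentic-One on top of its open-source AutoGen framework; the sketch below deliberately avoids AutoGen's real API and just illustrates the orchestrator pattern itself in plain Python, with hypothetical handlers standing in for the LLM-backed WebSurfer, FileSurfer, and Coder agents:

```python
from typing import Callable

class Orchestrator:
    """Toy lead agent: plans a task, then routes subtasks to specialists."""

    def __init__(self) -> None:
        self.specialists: dict[str, Callable[[str], str]] = {}

    def register(self, name: str, handler: Callable[[str], str]) -> None:
        self.specialists[name] = handler

    def run(self, task: str) -> list[str]:
        # A real orchestrator (Magentic-One's lead agent) would use an LLM to
        # plan and re-plan; here the plan is hard-coded for illustration.
        plan = [
            ("web_surfer", f"find and book showtimes: {task}"),
            ("file_surfer", f"save the confirmation: {task}"),
            ("coder", f"log the booking details: {task}"),
        ]
        return [self.specialists[name](subtask) for name, subtask in plan]

orc = Orchestrator()
# Stand-ins for the LLM-backed agents; in the real system each is a model call.
orc.register("web_surfer", lambda t: f"[web_surfer] {t} -> done")
orc.register("file_surfer", lambda t: f"[file_surfer] {t} -> done")
orc.register("coder", lambda t: f"[coder] {t} -> done")

for result in orc.run("movie ticket for tonight"):
    print(result)
```

In the real system the Orchestrator also tracks progress and re-plans when an agent stalls, which is exactly the kind of supervision a hard-coded plan like this can't do.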
Microsoft didn't design Magentic-One to do just one thing really well; they made it flexible and adaptable. Unlike a single-agent AI, where one model does everything and might struggle with complex tasks, Magentic-One's modular design allows agents to be added or removed as needed without affecting the whole system. And this flexibility is huge, because it means Magentic-One isn't locked into a few specific things; it can grow and adapt based on whatever task you throw at it, which is why Microsoft calls it a generalist system. Imagine the possibilities if you could continuously upgrade your AI without breaking its core functionality. Now, here's a quick look at the technology behind it. Magentic-One is built on Microsoft's open-source framework called AutoGen, which lets you integrate Magentic-One with different large language models, or LLMs, so you're not stuck with just one. An agent can be backed by a large language model, by tools or a code executor, or even by a human user. Right now it's optimized to work with models like GPT-4o and OpenAI's o1-preview, but it's model-agnostic: you can swap in other models, or even use different models for different agents depending on what you need, like having a model that's better at reasoning handle the Orchestrator's tasks. And to make sure the system is running at its best, Microsoft created something called AutoGenBench. This tool is a testing ground for agent-based AI, with benchmarks that evaluate how well agents perform on complex, multi-step tasks. AutoGenBench tests agents on real-world tasks using benchmarks like GAIA, AssistantBench, and WebArena, which are designed to measure things like planning and tool use. Microsoft's initial tests showed that Magentic-One holds its own even against the best AI systems out there.
So what can you actually do with Magentic-One? Like we mentioned earlier, it's designed for all kinds of tasks, from software engineering and data analysis to scientific research and web browsing; it's built to be incredibly versatile. For a researcher handling a large data-analysis project, the Orchestrator could assign data-fetching tasks to WebSurfer, organize local files with FileSurfer, and execute complex calculations through Coder, all without manual intervention. For a content creator managing web navigation, content summarization, or research compilation, Magentic-One's modular setup streamlines those tasks into one seamless AI solution, eliminating the need for multiple tools. What's interesting is how Magentic-One reflects a bigger shift in AI: we've gone from having AI just recommend things to having it take actions on our behalf. It's no longer just about suggesting a restaurant; it's about booking the table, placing your order, and arranging for delivery. Microsoft calls this an agentic system, where AI isn't just talking back at us but actively doing things to make our lives easier, and as Microsoft says, we're only scratching the surface here. But of course, with great power come some risks. Magentic-One is designed to be careful, but AI that can act in the world raises new questions. During testing, Microsoft found that the agents sometimes tried actions they weren't supposed to: one agent kept trying to log into a website, causing the account to be temporarily suspended, and in another case an agent even tried reaching out to actual humans for help, drafting a freedom-of-information request to a government agency. These examples show why it's so important to keep these systems in check and ensure they act responsibly.
Microsoft isn't taking this lightly, either. They're working with their deployment safety board to prevent these kinds of incidents, using techniques like sandboxed Docker containers for tasks that involve running code. They've also released guidance on using Magentic-One safely, advising human oversight when the system takes irreversible actions: for instance, if an agent is about to delete a file, it's designed to pause and ask for confirmation. Down the line they're considering even more ways to handle risk, like teaching agents to understand which actions are reversible and which aren't. This is still an evolving field, but Microsoft is definitely setting a precedent here.
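Microsoft hasn't detailed its exact sandbox configuration, but the general technique of running agent-written code inside a locked-down, network-less Docker container looks roughly like this (the image name and resource limits are illustrative, and the snippet assumes Docker is installed):

```python
import pathlib
import subprocess
import tempfile

def run_untrusted(code: str, timeout: int = 30) -> str:
    """Run agent-generated Python inside a throwaway, network-less container.

    Illustrative only: image, limits, and mount options would be tuned (and
    hardened further) in a real deployment.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = pathlib.Path(tmp) / "task.py"
        script.write_text(code)
        result = subprocess.run(
            [
                "docker", "run", "--rm",
                "--network=none",          # no outbound access for the agent's code
                "--memory=256m", "--cpus=1",
                "-v", f"{tmp}:/work:ro",   # mount the script read-only
                "python:3.12-slim", "python", "/work/task.py",
            ],
            capture_output=True, text=True, timeout=timeout,
        )
    return result.stdout or result.stderr

print(run_untrusted("print('hello from the sandbox')"))
```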
Now, this system isn't alone in the multi-agent game. Other big tech companies are jumping in too: OpenAI has developed a framework called Swarm, and IBM has its Bee Agent framework. These systems aim to handle complex tasks using multiple agents, just like Magentic-One. Where Microsoft's system stands out is its modular, plug-and-play design: you can add or remove agents without reworking the whole system, which isn't as common in some of these other setups. To bring it all together, here's a quick recap of what makes Magentic-One tick. At the heart of it you've got the Orchestrator managing everything, backed by the specialized agents we talked about earlier. The system is powered by AutoGen, and AutoGenBench is there to evaluate and optimize each agent's performance on different tasks. And remember, this AI system is completely open-sourced, so if you're a developer or a researcher, you can jump on GitHub right now and start experimenting. Whether you're looking to build a new application or improve your productivity, Magentic-One could be a game-changer in the world of multi-agent AI. It's an exciting new chapter in AI, and we're only just getting started. Microsoft's vision here is clear: they want AI that doesn't just think but acts,
making it a real partner in everyday tasks. What do you guys think? Is this the future of AI we've been waiting for? Right now we're on the brink of something huge: robots with general intelligence. Think about it: machines that can adapt, learn, and handle all kinds of tasks, bringing us closer to true AGI, or artificial general intelligence. And we're getting closer to that reality than ever. Meet Physical Intelligence, a startup out of San Francisco that's turning heads, even catching the interest of heavyweights like Jeff Bezos and OpenAI. With a massive $400 million in funding, they're now valued at $2 billion, and it's all because they're working on something revolutionary: generalist robots. Unlike typical single-task robots, these robots, powered by the company's new AI model called pi-zero (π0), are designed to be adaptable. They want to create a future where robots can fold laundry, carefully pack eggs, and even clear a messy table, all with a versatility and intelligence that's been out of reach until now. So how did we get here? Up until now, most robots have been specialist machines, each designed to do one specific job: robotic vacuums for cleaning, robotic arms for assembly lines, industrial robots that can pick up objects and sort them, but only in very controlled settings. The problem is these robots aren't great at adapting to different environments or learning new tasks on the go; they do what they're programmed to do, and that's it. Physical Intelligence is flipping the script. They're not building new robot hardware; instead, they're developing what you could call a robot brain, one that could make almost any robot capable of doing a wide range of things. And that's the power behind π0.
So what makes π0 so different? For starters, this AI doesn't just react to basic commands: it integrates vision, language, and motor commands into one system, which means it can see, understand, and physically act based on what it perceives. For example, π0 can read a text prompt like "clean up the table," look around, understand what's on the table, and figure out how to get the job done. It has been trained on 10,000 hours of data from different robot setups, so it doesn't just know how to move; it knows how to adjust and refine its movements based on what it senses, almost like a human. The π0 model can output up to 50 motor commands per second; imagine how precise and fluid that makes its movements. That ability to handle tasks with finesse is crucial when dealing with delicate items. Physical Intelligence used a method called flow matching to make the model's movements look natural and smooth, similar to how people learn and adjust their movements. This level of control is the real game-changer: it's what allows π0 to fold clothes, pack groceries, and even grind coffee beans, tasks that require a mix of strength, precision, and adaptability.
Now, this breakthrough didn't come easy. It took the combined expertise of industry leaders like Karol Hausman, who worked on robotics at Google, and Sergey Levine, a robotics researcher at UC Berkeley. Together, their goal was to make robots versatile enough to learn new tasks quickly without extensive reprogramming. They brought in those 10,000 hours of hands-on training data from different sources, including the Open X-Embodiment, DROID, and Bridge datasets, to give π0 the kind of experience it needs to handle everything from laundry to packing eggs. And thanks to all that training data, π0 can work with several types of robots: single-arm, dual-arm, and even mobile robots that move around while performing tasks. Physical Intelligence's vision is to make robots as adaptable as large language models like ChatGPT, but in the physical world. Imagine coming home from work to find your robot assistant has already vacuumed the floors, folded the laundry, and even prepared a meal. That's the future Physical Intelligence envisions, and this isn't just an idea; they're actively working toward it and have demonstrated robots doing these things in real time. The key to π0's adaptability lies in its pre-training. Just like ChatGPT and other large language models get pre-trained on vast amounts of text before being fine-tuned, π0 gets pre-trained on diverse robotic actions: everything from folding laundry to delicate manipulations like stacking eggs without breaking them. After this broad pre-training, π0 can handle a range of simpler tasks right out of the box, no additional training required, and for more complex, multi-step tasks it can be fine-tuned, just like fine-tuning a language model.
But here's where things get really interesting. To teach π0 all these skills, Physical Intelligence couldn't just gather robotic data from around the web; there isn't a massive database of robot actions like there is for text, so they had to create their own. They used a combination of vision-language models, which help the AI understand both images and text, and then applied techniques from AI image generation, like diffusion modeling, to enable more generalized learning. This approach has allowed π0 to learn faster from limited data, and it's making strides toward a future where robots could perform any chore we need. And π0's potential isn't limited to homes. The technology could transform industrial settings too: robots in warehouses not only picking and packing items but adapting to different product shapes and sizes, or helping with assembly-line tasks that demand both precision and dexterity. There's also a big opportunity in caregiving: robots that could help seniors with daily tasks, or assist people with disabilities by handling physical chores that require careful, adaptable movements. While all this sounds amazing, it also raises some big questions. One issue is job displacement: if robots become more versatile and start performing tasks that humans usually do, what does that mean for workers, especially in jobs that don't require high-level skills? There's also the matter of privacy and data security, since training robots involves collecting huge amounts of data, which might include personal spaces or even sensitive information. Plus, even though the technology is advanced, making it accessible and affordable to everyday consumers will be a challenge. Right now π0 is at the cutting edge of robotics, but the cost of developing and deploying such advanced AI will need to come down before it can be widely adopted.
Obviously, other tech giants and startups are also eyeing the potential of general-purpose robots. Elon Musk, for instance, is working on Tesla's Optimus robot, which he claims will be available by 2040 for around $20,000 to $25,000. This humanoid robot is being developed to perform most tasks humans can do, and if Tesla delivers, we're looking at a world where affordable general-purpose robots are everywhere. Amazon, Google, and even Nvidia are also pouring billions into AI and robotics, so we're bound to see rapid advancements over the next couple of decades. So what can π0 do right now? Physical Intelligence has released videos showing it in action, and it's genuinely impressive. The robots can pick up and fold clothes, shake the wrinkles out of t-shirts, pack groceries, and even handle eggs with a surprising amount of care. These aren't pre-programmed routines: π0 is actually interpreting the task, making adjustments, and executing in real time. And yeah, there are still a few kinks. Sometimes π0's robots mess up, like overfilling an egg carton and trying to force it shut, or suddenly tossing a box off a table, but these are quirks that come with the territory as the AI learns and evolves. Another example of π0's flexibility is building a cardboard box. You wouldn't think folding and securing a box would be so hard, but it's one of the most challenging tasks for a robot to do well: it requires bending, holding parts in place, and applying pressure just right to fold the box without damaging it. It's exactly this type of complex, multi-stage task that π0 was designed to tackle, showing that this AI can handle a wide variety of actions in sequence. Looking at the bigger picture, π0 is still evolving, and while it's not at the level of something like ChatGPT, its learning process has similarities. It's like the first step toward robots with a foundation model for physical actions: a robot brain that gets smarter and more capable over time. And the fact that Physical Intelligence built π0 with such a general approach, without limiting it to one type of robot, means they're onto something that could potentially transform robotics across many fields. So if you're as excited as I am about the future of robots in our daily lives,
keep an eye on Physical Intelligence and their π0 model; they're paving the way for a new era in robotics, one where machines don't just perform single jobs but adapt and grow to become real helpers in our world.

OpenAI just rolled out a feature called Predicted Outputs for their GPT-4o and GPT-4o mini models, and it's already making a big difference in how fast we can get things done with AI. Let me break down what this feature does and why it's such a game-changer, especially if you're into AI-powered coding or content creation, or
just looking to speed up your workflows. So, you're working with AI to edit code or update documents, right? Normally, even the latest models have to generate each token (the words or bits of code) one by one, which takes time. Here's where Predicted Outputs steps in: with this feature, you can tell the model in advance what you expect part of the response to be. That way it doesn't have to start from scratch every time; it can jump straight to what you've predicted, generating fewer tokens, which means you get your answer faster. Now, how much faster? According to OpenAI, early tests show code editing with predictions coming in two to four times faster than before. If you're dealing with large files, like a big code file or a lengthy document, something that used to take about 70 seconds can now be done in roughly 20, so we're talking about cutting the time by more than half in some cases. The idea is pretty straightforward: instead of asking the AI to do something and waiting for it to come up with everything on its own, you give it a hint, basically a prediction of what part of the output should be. Let's say you're tweaking some code and most of it is going to stay the same except for one or two lines. You can pass in the existing code as your prediction for the model to base its response on; the model then only needs to fill in the gaps instead of rewriting everything from scratch. So where would you use this? Predicted Outputs shine in situations where the task is repetitive or predictable: for example, updating a blog post, modifying small parts of
a script, or making tiny changes in a huge dataset. Instead of waiting for the AI to rehash all the parts that are unchanged, you let it know what's already there, and it only focuses on the updates. You're essentially giving it a shortcut, so it spends less time figuring out the whole response and more time delivering exactly what you need. But here's a key point: Predicted Outputs aren't ideal for generating brand-new, unique content. If the model has no frame of reference, like when you're creating something from scratch with no predictable elements, Predicted Outputs won't help you. The feature works best when there's a baseline, something repetitive or partially known, that the AI can rely on. OpenAI didn't just leave it there, either: they've tested the feature across multiple programming languages, including Python, JavaScript, Go, and C++, and found it effective in each one. So if you're a developer working in one of these languages, this could really be a time-saver. But if you're wondering whether you can use this with every model, here's the deal: Predicted Outputs are only available with GPT-4o and GPT-4o mini, and there are a few restrictions you should know about. For starters, the feature doesn't support some API parameters you might use in other situations: if you're used to asking for multiple outputs at once, using function calls, or getting probabilities with logprobs, this feature won't play nice with those settings. OpenAI's own advice for maximizing efficiency is to use Predicted Outputs for controlled tasks where the response is somewhat predictable. Let's go a little deeper with an example. Say you're refactoring some TypeScript: you have a class called User with a property labeled username, but you need to change that property to email. The code structure stays almost entirely the same, so you can feed the existing code to GPT-4o as the prediction. When you prompt it to make that change, it will only update what's necessary, just that single line, instead of rewriting the whole thing from scratch.
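In the API this is exposed as a prediction parameter on chat completions. Here's a minimal Python sketch of that exact username-to-email refactor (the parameter shape follows OpenAI's docs; treat the usage-detail fields at the end as subject to change):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

existing_code = """\
class User {
  username: string;
}
"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Rename the username property to email. "
                                    "Respond only with the full updated code."},
        {"role": "user", "content": existing_code},
    ],
    # The prediction: most of the output should match the existing code.
    prediction={"type": "content", "content": existing_code},
)

print(response.choices[0].message.content)

# Accepted prediction tokens are processed fast; rejected ones are still billed
# at completion rates, which is the strategy trade-off discussed below.
details = response.usage.completion_tokens_details
print("accepted:", details.accepted_prediction_tokens,
      "rejected:", details.rejected_prediction_tokens)
```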
This brings up a few neat technical details, too. When the model processes your prediction, it keeps track of how many tokens it accepted and how many it rejected: tokens that match your prediction get processed quickly, while rejected tokens (ones the model doesn't end up using) are discarded but still charged at the regular completion rate. So there's a bit of strategy involved: if your prediction is accurate, you save time and potentially cut down on tokens; if it's way off, you still pay for the extra effort. And to go even faster, OpenAI also lets you use this feature with streaming, which can boost performance even more. Take that same code-refactoring example: with streaming, instead of waiting for the entire response to load at once, the API sends it to you in chunks. As each part of the response is generated, it's streamed in real time, so you can display it immediately or process it on the fly, which really minimizes any delay. Another useful thing about this feature is that it doesn't matter where your predicted text appears within the response. You could have a chunk of code at the beginning, some new additions in the middle, or at the end; it all gets factored in, and the AI can latch onto those predictions no matter where they're positioned, speeding up the process even further. Here's an example to illustrate. Say you're working with a simple server setup in JavaScript using a library called Hono, and you want to add a new route. You give the model a version of the code with most of the server structure in place; then, when you ask it to add a new route that responds with "Hello World," it doesn't need to think about the rest of the file. It can just drop in that one new line, saving time by reusing the structure from your prediction.
So what are the limitations? Besides the model restrictions, certain settings just aren't compatible with Predicted Outputs right now. For example, you can't set max_completion_tokens when using predictions, and if you're planning on generating audio, mixing modalities, or calling functions within the API, you'll have to look elsewhere; Predicted Outputs don't support those features. One more thing to keep in mind: even if the prediction isn't spot-on, the model still has to process all the tokens you've provided. So if your prediction text includes 100 tokens but the model ends up rejecting 20, you'll still be billed for the full 100. Essentially, it'll save you time, but not necessarily money, unless you're strategic about how you use it. For developers, OpenAI's documentation provides more details on how to implement the feature, but the basic idea with the OpenAI SDKs is just what the sketches above show: you pass your code snippet as a prediction parameter, then use a chat completion prompt to specify the change you want, like replacing the username property with email, and the AI takes it from there, updating just that line thanks to the prediction you provided. In terms of practical application, Predicted Outputs will likely appeal to anyone dealing with repetitive tasks: think bloggers updating old posts, developers tweaking chunks of code, or data scientists making small adjustments within datasets. Imagine you've got a list of entries in a blog post that need minor updates; rather than regenerating everything, you simply supply what's staying the same as the prediction, and the AI only needs to add the new bits. The more predictable the task, the better this feature performs. OpenAI also advises starting small, testing it with controlled, repeatable tasks. That lets you figure out how much time you can save while getting used to the feature, and since Predicted Outputs require a bit of trial and error, starting with simpler predictions might help you avoid extra charges from too many rejected tokens.

OpenAI and Google are locked in what Rowan Cheung has called "the ultimate nerd battle" for AI dominance, constantly trading blows to claim the top spot for their chatbots on the performance charts. This fierce competition has sparked some of the most rapid advancements in AI we've ever seen. Amid this high-stakes showdown, OpenAI is
taking an extraordinary step: exploring the development of its own web browser. This isn't just about entering the browser market or competing with Google Chrome; it's about redefining how we interact with the internet. OpenAI is aiming to integrate ChatGPT directly into the browsing experience, making online navigation smarter, more intuitive, and tailored entirely to the user. Reports suggest that OpenAI has discussed prototypes of this browser with companies like Condé Nast, Redfin, Eventbrite, and Priceline. While it's still far from a finished product, these discussions signal that OpenAI is serious about moving in this direction. Unlike existing browsers, which often rely heavily on ad revenue and serve search results cluttered with links, this AI-driven browser could focus entirely on the user experience. It's a bold move, especially when you consider the dominance of Google Chrome, which holds an estimated 63% of the global browser market. But cracks in Google's stronghold are starting to show: the US Department of Justice has raised concerns about Google's monopolistic practices and even suggested the company might need to sell Chrome to reduce its control over online search. This creates a rare opening for competitors like OpenAI to step in with innovative solutions. OpenAI's approach isn't just about providing an alternative to Chrome; it's about fundamentally rethinking what a browser can do when AI is at its core. The browser exploration ties closely to OpenAI's advancements in search technology. Their SearchGPT product is already challenging traditional search engines by offering direct, source-cited answers instead of the familiar page-after-page results format. By cutting out the middleman, SearchGPT eliminates the need for users to sift through irrelevant information. This shift
prioritizes efficiency and accuracy, contrasting sharply with Google's ad-heavy model, which relies on users engaging with sponsored content. However, Google isn't taking this challenge lightly. Its AI chatbot Gemini has been a key part of its strategy to counter OpenAI's growing influence, and since launch it has been evolving rapidly: the recent Gemini-Exp-1121 update reclaimed the top spot on the LMSYS Chatbot Arena leaderboard, a platform where large language models are evaluated anonymously. OpenAI's GPT-4o update had previously taken the lead, but Gemini's latest improvements quickly closed the gap. This constant back-and-forth underscores the intensity of the competition, with both companies racing to outdo one another on innovation and user experience. OpenAI's November update to GPT-4o has been a game-changer in many ways. It focuses heavily on creative writing, making the AI's responses more natural, engaging, and tailored to individual contexts; users have praised its ability to produce content that feels humanlike, whether for storytelling, professional writing, or academic purposes. The update also significantly boosted the model's speed, increasing its token output rate from roughly 80 tokens per second to 180. That makes the AI not only more powerful but also more efficient, particularly for users who rely on it for time-sensitive tasks. These advancements came with a trade-off, though: GPT-4o's math benchmark score dropped from 78% to 69%, suggesting the focus on creativity may have slightly reduced its technical precision. Despite this, the update has been widely embraced, especially by those who prioritize the AI's enhanced ability to generate fluid, coherent, context-aware content. Beyond speed and creativity, the updated model also excels at analyzing uploaded files, offering users deeper insights and more detailed interpretations; these capabilities make it an invaluable tool for professionals in fields like research, education, and project management. The growing popularity of ChatGPT speaks for itself: the platform now boasts 250 million weekly active users, with 5% to 6% of free users upgrading to the paid version. That substantial user base highlights the tool's relevance and value in an increasingly AI-driven world, and OpenAI's focus on improving user experience is clearly paying off, cementing its position as a leader in the AI space.
reach through strategic partnerships one notable collaboration is with Apple where open AI's technology powers several Apple Intelligence features this partnership seamlessly integrates advanced AI capabilities into Apple's ecosystem enhancing the functionality of their devices meanwhile reports suggest that open AI is also in talks with Samsung to bring similar features to their devices if successful this partnership could significantly disrupt the balance of power in the Android ecosystem where Google has long held a dominant position open AI's approach to content partnerships is another area where they're making waves earlier this year the company teamed up with Hearst to integrate credible well-sourced content into chat GPT this collaboration ensures that users receive accurate information complete with citations and direct links to original articles such partnerships address common criticisms of AI generated content enhancing transparency and trustworthiness while open AI continues to innovate Google is leveraging its vast resources to maintain its competitive edge Gemini's recent updates have made it more capable of handling complex queries and creative tasks putting it on par with open AI's offerings in many areas the rivalry between the two companies is driving rapid innovation benefiting users who now have access to more
advanced and versatile tools than ever before these developments signal a broader shift in how we interact with technology open AI's focus on delivering ad-free user-centric search and browsing experiences challenges the traditional models that have defined the internet for decades if successful this approach could fundamentally alter how businesses and consumers engage with online content moving away from an ad-driven revenue model to one that prioritizes usability and relevance the implications of these advancements extend far beyond just the tech savvy crowd tools like chat GPT and search GPT are becoming integral to how we work learn and connect open AI's innovations are setting a new standard for what users can expect from AI making interaction smoother more intuitive and more effective these changes aren't just about making technology smarter they're about making it more accessible and impactful for users this is an exciting time the tools being developed today are more than just conveniences they're shaping the future of how we navigate and interact with the digital world open AI's efforts to redefine search improve AI capabilities and explore new applications are paving the way for a more intuitive and efficient online experience meanwhile Google's continued investment in Gemini ensures that the competition remains fierce driving both companies to push the boundaries of what's possible this isn't just about rivalry it's about transformation the advancements we're seeing today represent the beginning of a new era in technology one where AI plays a central role in shaping our interactions with the digital world open AI's bold vision and relentless innovation suggest that they're not just competing they're leading the road ahead is filled with challenges but if the past few months are any indication open AI is more than ready to face them all right let me
know what you think in the comments and if you enjoyed this make sure to like And subscribe for more AI updates there's a new AI out there that doesn't just follow commands it thinks adapts and actually learns on its own this is AIRIS the first of its kind a self-learning thinking AI inside Minecraft breaking every rule we thought we knew about artificial intelligence forget the usual game AIs with set commands AIRIS navigates solves problems and creates its own rules as it goes and what's more this isn't just about gaming this tech could change everything from smart homes to autonomous robots let's talk about how this AI might be setting the stage for the future of AI so AIRIS or autonomous intelligent reinforcement inferred symbolism was launched within Minecraft and trust me it's a big deal it's the first of its kind a self-learning AI that works autonomously meaning it doesn't need any preset rules or training data to function instead AIRIS learns by doing by interacting with its environment in Minecraft's open-ended world and that's where it gets interesting now why Minecraft well Minecraft isn't just a game it's a virtual sandbox with endless possibilities the game's 3D environment has forests caves mountains water bodies you name it it's complex unpredictable and a perfect testing ground for an AI like AIRIS that needs to adapt and make decisions on the fly Minecraft provides a real-time setting where AIRIS faces challenges that constantly change forcing it to adapt and create new strategies for navigating through obstacles avoiding cliffs and even figuring out efficient paths the tech behind this AI is equally fascinating it's powered by the SingularityNET AI network and involves agent technology from Fetch.ai long-term memory from Ocean Data and scalable processing power from CUDOS Compute these are some of the biggest players in AI and blockchain tech coming together to push the boundaries of what AI can do Dr Ben Goertzel the mind behind SingularityNET explained that this AI isn't just about solving simple tasks AIRIS symbolizes a step toward an AGI that can think adapt and even understand on a level that we haven't seen in any AI so far now let's get a little more technical because the sheer depth of this AI's capabilities is mind-blowing at its core this AI uses what's called neural-symbolic learning instead
of needing tons of data like other AIs this one can form generalizable conclusions from small bits of data by observing and refining its rules as it learns this type of learning is a huge leap because most AI today relies on vast amounts of data to perform well AIRIS on the other hand learns through trial and error just like a human would which makes it far more flexible in unpredictable environments here's how it works when AIRIS first enters Minecraft it starts off a bit rough stumbling around trying to figure out the lay of the land but as it encounters obstacles like cliffs forests rivers or even mobs it begins creating its own rule set let's say it faces a steep hill it might try climbing over it and fail so it recalculates and chooses a different path next time this process of real-time adjustment is where AIRIS shines unlike conventional AI which would need to go through a massive retraining phase for a new environment AIRIS can adapt immediately this means it's not just learning a path but understanding its surroundings in a dynamic way
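to make that observe, act, refine loop concrete, here's a tiny toy sketch in Python, to be clear this is not AIRIS's actual code (which isn't shown anywhere in this coverage), just an illustration of trial-and-error rule learning where the agent keeps a table of (observation, action) rules and updates its confidence in each as outcomes come in:

```python
import random

# Toy sketch of trial-and-error rule learning in a grid world -- not AIRIS's
# actual implementation, just the "observe, act, refine rules" loop described
# above. States and actions are deliberately minimal.

ACTIONS = ["north", "south", "east", "west", "climb"]

class RuleLearner:
    def __init__(self):
        # rules maps (observation, action) -> estimated success rate
        self.rules = {}

    def choose(self, obs):
        # prefer untried actions, otherwise the action with the best record
        untried = [a for a in ACTIONS if (obs, a) not in self.rules]
        if untried:
            return random.choice(untried)
        return max(ACTIONS, key=lambda a: self.rules[(obs, a)])

    def update(self, obs, action, succeeded):
        # running average: refine the rule instead of retraining from scratch
        old = self.rules.get((obs, action), 0.5)
        self.rules[(obs, action)] = 0.8 * old + 0.2 * (1.0 if succeeded else 0.0)

def simulate(env_step, episodes=200):
    agent = RuleLearner()
    for _ in range(episodes):
        obs = "steep_hill"                  # e.g. the obstacle in front of the agent
        action = agent.choose(obs)
        succeeded = env_step(obs, action)   # the environment decides the outcome
        agent.update(obs, action, succeeded)
    return agent

# stand-in environment: climbing steep hills usually fails, detours succeed
agent = simulate(lambda obs, a: (a != "climb") or random.random() < 0.2)
print(sorted(agent.rules.items(), key=lambda kv: -kv[1])[:3])
```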
the goal with AIRIS isn't just to build an AI that can beat Minecraft but to develop skills that can be transferred to real world applications like autonomous robots smart home assistants and even industrial machines imagine a robot in a warehouse learning how to navigate around pallets and shelves just like AIRIS does with trees and hills in Minecraft or a home assistant that can understand and adapt to your household's unique layout without needing constant adjustments but this isn't happening overnight the ASI Alliance and SingularityNET see AIRIS in Minecraft as a controlled testing ground for AGI development by testing in this digital sandbox they can refine how AIRIS handles real world problems so it becomes more efficient faster and smarter over time there's another really cool aspect to AIRIS's development and that's the idea of transparency Berick Cook the AI developer from SingularityNET who created AIRIS says that unlike typical black-box AI where we have little idea how decisions are made AIRIS is designed to be understandable this is a huge deal in the AI world because transparency leads to safer more ethical AI we can see how it makes decisions learn from its mistakes and even refine it to ensure it's beneficial to humanity this isn't just a game-changing AI experiment though the broader plan for AIRIS is to take it beyond Minecraft the ASI Alliance envisions a future where AIRIS could become a fully autonomous entity in multi-agent scenarios meaning it could collaborate with other AIs on tasks kind of like how humans work together imagine multiple AIRIS-like AIs coordinating in an environment sharing information and working as a team to solve bigger challenges looking ahead there's talk of AIRIS developing strategic skills like managing resources and abstract reasoning to handle more complex tasks think of an AI in the future that could not only
learn from its environment but also plan ahead anticipate needs and make critical decisions this could transform sectors like home automation robotics and even healthcare now AIRIS is only one piece of the puzzle SingularityNET is also working on a network of supercomputers set to roll out as soon as early 2025 designed specifically for AGI development this isn't your average server rack these supercomputers will be packed with Nvidia GPUs AMD processors and Tenstorrent Blackhole and Wormhole technologies all aimed at processing the immense calculations AGI demands the first phase of this supercomputer network will wrap up around late 2024 or early 2025 Ben Goertzel CEO of SingularityNET is pretty excited about what this tech can accomplish he believes that building a distributed open-source AGI ecosystem could democratize access to AI and avoid the centralization of AI power you won't need to be Google or open AI to access this AGI tech SingularityNET even uses its AGI token to allow people to contribute to and benefit from this network users can interact with the network on blockchains like Ethereum or Cardano where they can purchase access and contribute data which fuels AGI development in this larger ecosystem SingularityNET also collaborates with Tenstorrent an AI computing company their partnership involves a three-phase plan to integrate advanced chip technology and create new hardware and software architectures the goal here is to move beyond the limitations of typical neural networks by using neural-symbolic AI that can combine data processing with reasoning the AI we're talking about won't just crunch numbers it will make decisions that mimic human cognition the ASI Alliance which includes SingularityNET Fetch.ai and Ocean Protocol is on a mission to keep AI development transparent ethical and decentralized they want to counter the monopolies of big tech and create a decentralized AI landscape and the partnership between SingularityNET and Tenstorrent fits right into this vision they're developing an open-source framework OpenCog Hyperon to manage the complex computations of AGI so that this powerful tech doesn't stay locked away in corporate labs if you've been following AI for a while you've probably heard predictions about when we'll reach true AGI some experts like DeepMind's Shane Legg believe it could happen by 2028 but with projects like AIRIS and SingularityNET's supercomputer network we might be closer
than we think AIRIS's journey in Minecraft might seem like just an experiment but it's much more this AI isn't just learning to navigate a virtual world it's laying the groundwork for autonomous self-directed AI that could operate in real world environments every step it takes in Minecraft brings us closer to a future where AI can function independently solve problems and maybe even think strategically like us so to wrap it up we're witnessing a shift towards AIs that don't just follow rules they make their own rules SingularityNET and the ASI Alliance are pushing the boundaries setting up infrastructures and collaborations to make AGI a reality from advanced gaming agents in Minecraft to supercomputers designed to host human-like cognition the future of AI isn't just about making machines that respond to commands it's about building systems that can learn think and adapt on their own potentially marking the start of a new era in artificial intelligence and all of this is happening right now right in front of us in the most unexpected place a Minecraft world open AI is getting ready to launch a new AI tool that could change how we think about artificial intelligence it's
called operator and unlike the current generation of chatbots operator isn't just here to respond to questions or generate answers it's an AI agent designed to complete real tasks on computers with almost no human input we're talking about booking travel coding organizing complex workflows all by itself and while it's not yet available to the public open AI plans to release operator as a research preview in January with developers getting early access through an API operator is part of a bigger shift happening in the AI industry where tech companies are looking to create AI that can take on multi-step tasks and operate independently open AI has some serious competition here anthropic recently introduced an AI with similar capabilities that can handle tasks like building websites and editing spreadsheets Microsoft too has rolled out its own set of agent tools that can handle emails and manage records and it's also supporting an open-source AI project called Magentic-One aimed at making these kinds of tools accessible for everyone to build on Google not wanting to be left behind has its own AI agent coming soon as well it's clear that all the big players
see agent-based AI as the future with each company racing to be at the forefront this shift toward agent-based AI is a strategic move for companies like open AI building more advanced AI models alone isn't yielding the returns it used to the costs are high and incremental improvements just aren't bringing enough value to justify them so open AI and others are betting on these autonomous AI agents like operator to provide that next big leap they're hoping that agents with practical real-world applications can justify the massive investments being poured into AI development open AI CEO Sam Altman hinted at this shift recently in a Reddit AMA saying that while models will keep improving the real breakthrough will come from agents open AI's chief product officer Kevin Weil doubled down on this idea at the company's Dev Day saying he expects 2025 to be the year agent-based systems go mainstream so what's the deal with operator it's designed to work directly within a web browser meaning it can jump from one tab to another carry out multi-step instructions and complete a series of actions just as a human assistant might say you're booking a flight you could tell operator what kind of flight you're looking for and it would handle the rest finding options filtering according to your preferences selecting the best option and even paying if you've authorized it the idea is that this tool could eventually become a real digital assistant executing tasks automatically without needing constant prompts the way operator works is intriguing as it uses something called probabilistic decision-making unlike basic automation which follows strict rules operator can adapt and respond to real-time outcomes if it encounters a hiccup or something unexpected it can adjust and keep going rather than getting stuck
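since operator isn't public yet, here's a hedged sketch of what that adapt-and-retry loop could look like, every name in it (the browser stub, the candidate actions, the canned success estimates) is hypothetical, the point is just the shape of the loop: pick the most promising action, check the real outcome, and re-plan on surprises instead of failing the way a rigid script would:

```python
import random
from dataclasses import dataclass

# Hypothetical sketch of an adapt-and-retry agent loop -- not Operator's
# actual design, which hasn't been published. Everything here is a stand-in.

@dataclass
class Outcome:
    goal_reached: bool
    unexpected: bool

class FakeBrowser:
    """Stand-in for a real browser controller."""
    def __init__(self):
        self.step = 0
    def execute(self, action):
        self.step += 1
        # pretend some actions fail unexpectedly, and the task finishes
        # once enough successful steps have happened
        if random.random() < 0.2:
            return Outcome(goal_reached=False, unexpected=True)
        return Outcome(goal_reached=self.step >= 4, unexpected=False)

def estimate_success(action):
    # in a real agent this estimate would come from the model; here it's canned
    return {"search_flights": 0.9, "filter_results": 0.8, "retry": 0.5}[action]

def run_task(browser, max_steps=20):
    history = []
    for _ in range(max_steps):
        candidates = ["search_flights", "filter_results", "retry"]
        action = max(candidates, key=estimate_success)  # probabilistic ranking
        outcome = browser.execute(action)
        history.append((action, outcome))
        if outcome.unexpected:
            continue            # adjust and keep going rather than getting stuck
        if outcome.goal_reached:
            return history
    raise TimeoutError("step budget exhausted")

print(len(run_task(FakeBrowser())), "steps taken")
```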
Andy Thurai from Constellation Research explains that this kind of adaptability is what sets agentic AI apart it's not just following instructions step by step but it's capable of making small decisions and adjustments along the way much like a human assistant might now as I said anthropic has already introduced AI tools that can take on complex tasks autonomously for example their agents can handle detailed computer tasks create websites and even work with data management Microsoft's Magentic-One an open-source AI platform is focused on making agent-based AI accessible letting developers customize it for different workflows and applications this shift across the industry is a clear sign that agentic AI not traditional chatbots is the next big thing Google is also in the race expected to launch its AI agent soon aiming for similar functionality this all leads to some pretty big implications imagine a customer service department where an AI handles everything responding to questions processing requests even helping customers complete transactions Salesforce has already released AI agents for these kinds of tasks and companies like ServiceNow and Cisco are introducing agents that handle routine service work these agents don't just increase productivity they could transform how work is done across entire industries shifting the focus from manual repetitive tasks to more strategic high-value activities of course with this shift come some ethical and practical challenges an autonomous AI agent like operator raises questions about privacy accountability and potential biases what happens if operator makes a wrong call if it misinterprets instructions or acts on faulty data there's a risk of unintended consequences experts are increasingly calling for transparency in AI operations especially when it comes to autonomous decision-making there's a growing push for regulatory guidelines that ensure AI agents act
responsibly and in users best interests balancing Innovation with oversight another major consideration is the impact on the job market these AI agents are designed to handle tasks traditionally done by people and if they're widely adopted job displacement becomes a real concern to address this there's been a push for Workforce retraining programs to help people move into new roles as AI takes over certain tasks open AI itself has suggested initiatives like AI focused economic zones in the US which would use high-tech investment to boost local economies and create jobs that complement AI rather than being replaced
by it and then there's the issue of accessibility there's a risk that these powerful AI tools might only be available to Big corporations with the resources to integrate them potentially deepening the divide between large companies and smaller businesses or individual users policy makers are discussing how to ensure that advanced AI like operator benefits a broad range of people not just those at the top of the economic ladder energy consumption is another challenge facing AI developers open AI recently pointed out that running these large AI models requires substantial energy in a policy draft they've suggested that
the US invest in renewable energy sources to meet the rising power demands of AI as these autonomous agents become more prevalent the infrastructure needed to support them will also need to grow open AI cites renewable energy as a key solution to keep AI systems sustainable and reduce their environmental impact public reaction to operator has been a mix of excitement and caution there's clear interest in the potential of AI to handle complex tasks making everyday life easier but there's also concern around privacy and ethics as AI agents take on more responsibilities the public wants to know that
these systems are trustworthy transparent and accountable these concerns reflect the bigger questions society faces about how to integrate AI responsibly with operator's January release as a research preview developers will get first access through an API allowing open AI to gather feedback and refine the tool this staged rollout will help them test how operator integrates into different applications and ensure that it meets user needs effectively for industries ranging from customer service to software development and healthcare this is just the beginning of what could be a major shift in how businesses operate operator is in many ways a signal of where AI technology is headed as we move from simple chatbots to agents that perform real tasks independently the potential for AI to reshape everyday work becomes clear instead of just responding to commands operator and similar tools can actively complete processes manage workflows and help optimize tasks that take up time and resources the larger trend toward agentic AI reflects a shift in how AI labs and companies like open AI anthropic and Microsoft are thinking about the future it's not just about creating smarter models anymore it's about designing tools that are useful practical and applicable in real world situations by focusing on AI agents that can act on information the industry is opening up new possibilities for what AI can achieve but this future is complex as AI becomes more capable society will have to adapt finding a balance between innovation and the ethical concerns that come with autonomous systems the full impact of agentic AI on work privacy and economics remains to be seen but one thing is clear as open AI's operator gets closer to launch the world of AI is evolving fast and it's setting the stage for
an AI driven future that's far more integrated into daily life so Black Forest Labs just dropped a major update with flux 1.1 Pro bringing AI image generation to a new level with ultra-realistic visuals and high-speed 4K resolution we'll also dig into Google's new AI hub in Saudi Arabia Microsoft's AI powered Xbox chatbot a startup using AI to sniff out fake sneakers and how Cartier is blending AI with luxury exhibitions there's a lot to cover so let's get into it let's kick off with Black Forest Labs and their updated flux 1.1 Pro image generator if
you're familiar with stable diffusion the popular open-source AI for image generation you'll recognize some of the team here the folks at Black Forest Labs were actually part of the original crew that developed stable diffusion after leaving they went on to launch flux a model that's pushing AI generated imagery even further so with flux 1.1 Pro there are two major new features Ultra mode and raw mode Ultra mode is one of the most talked about features of flux 1.1 Pro it's all about delivering high resolution images up to 4K resolution without sacrificing speed or image quality in the world of AI generated art high resolution often comes at a steep cost models tend to slow down dramatically sometimes taking minutes to process a single image flux's Ultra mode changes that according to Black Forest Labs it processes images more than 2.5 times faster than other high-resolution AI models producing a 4K image in roughly 10 seconds this speed doesn't come at the cost of detail either many high-res models on the market struggle with what's called prompt adherence meaning they don't always stay true to the input text especially when processing larger images flux's Ultra mode has been fine-tuned to maintain accuracy which is a big deal for professionals who rely on prompt precision this feature alone makes Ultra mode incredibly valuable for industries like digital marketing content creation and even gaming where high-quality visuals need to be generated quickly and consistently the pricing model for Ultra mode is competitive as well costing about 6 cents per image for freelancers studios and companies that need high-res images regularly this cost is lower than many alternatives out there especially given the quality and speed Ultra mode is particularly appealing to those who work in areas like advertising
or media production where quality can't be compromised and deadlines are tight then there's raw mode which offers a completely different aesthetic one of the biggest criticisms of AI art especially with earlier models is that it often looks a bit too polished almost synthetic raw mode addresses that by aiming to capture the imperfections and candid qualities you'd expect from a real-life photo Black Forest Labs describes this as creating a less synthetic more natural aesthetic with raw mode flux generates images that look like they were captured in the moment even if they're completely fabricated by the AI this mode focuses on diversifying human subjects adding more realistic textures expressions and variations for instance when generating portraits raw mode can simulate natural lighting subtle facial imperfections and nuanced expressions the result is a level of depth and diversity that's rare in other AI models making the images feel much closer to authentic candid snapshots raw mode is particularly useful for fields that prioritize a natural look think photojournalism-style content documentary projects or lifestyle brands that rely on a genuine unfiltered aesthetic Black Forest Labs has showcased some examples from portraits to nature shots and even some offbeat ones like an octopus in a top hat in a bathtub the mode's versatility lets creators experiment with everything from realistic street scenes to surrealist narrative-driven images in the AI scene flux 1.1 Pro has been climbing up the Artificial Analysis Image Arena leaderboard where it's currently one of the top rated models even outperforming some bigger names like Ideogram v2 and Midjourney v6.1 for developers wanting to incorporate flux into their own apps Black Forest Labs has launched a beta BFL API it's scalable and priced competitively so it should be interesting to see what people build with it
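for a feel of how that might look from a developer's side, here's a rough sketch of the submit-then-poll pattern such image APIs typically use, note that the base URL, endpoint paths, header name, and request and response fields below are assumptions for illustration only, check Black Forest Labs' official documentation before relying on any of them:

```python
import os
import time
import requests

# Hedged sketch of calling the BFL API -- every path, header, and field name
# here is an assumption for illustration; consult the official docs.

API = "https://api.bfl.ml"                      # assumed base URL
KEY = os.environ["BFL_API_KEY"]                 # assumed auth scheme

# submit a generation request (async pattern: submit first, then poll)
task = requests.post(
    f"{API}/v1/flux-pro-1.1-ultra",             # assumed Ultra-mode endpoint
    headers={"x-key": KEY},
    json={"prompt": "a candid street scene at dusk, raw photographic look",
          "raw": True},                         # assumed flag for raw mode
).json()

# poll until the image is ready (assumed response shape)
while True:
    result = requests.get(f"{API}/v1/get_result",
                          headers={"x-key": KEY},
                          params={"id": task["id"]}).json()
    if result.get("status") == "Ready":
        print("image URL:", result["result"]["sample"])
        break
    time.sleep(1)
```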
now one topic that often comes up with generative AI is data ethics Black Forest Labs has not publicly detailed the training data used to create flux given the pattern in the industry it's likely that their training set involved large-scale scraping of internet images a practice that has fueled debate over copyright and consent Stability AI the team's previous employer is currently facing a significant lawsuit from Getty Images due to unauthorized use of their content in training stable diffusion if similar issues arise with flux Black Forest Labs could find itself in legal trouble
as well all right now Google's making a big move in the Middle East by opening an AI-focused data center in Saudi Arabia this center is designed to help develop Arabic-language AI models and create AI applications specifically tailored to Saudi Arabia it's part of the country's Vision 2030 initiative which is focused on reducing dependency on oil by diversifying into sectors like tech and AI Saudi Arabia has a huge investment fund around $500 billion for new technologies and they're making AI a priority they're even using it in their oil industry where AI driven systems have boosted oil production by 15% in certain fields Saudi Arabia's goal here is to increase industrial efficiency and keep pace with global tech advancements now Google's involvement has stirred up some questions about their environmental commitments back in 2020 Google said it would cut back on projects supporting fossil fuel companies and pledged to halve their carbon emissions by 2030 so their expansion into Saudi Arabia a country still heavily invested in oil has led some to ask how this lines up with those commitments Google insists this move doesn't conflict with their climate goals but it
highlights the tight balance companies face between growing globally and meeting environmental expectations moving into gaming Microsoft's rolled out a new AI powered support chatbot for Xbox right now Xbox insiders in the US can try it out on support.xbox.com it's there to handle support questions about everything from console issues to in-game problems helping players get faster answers the chatbot itself appears as either a colorful Xbox orb or a character that animates while responding it's actually part of a larger shift at Microsoft to bring more AI into Xbox's ecosystem future plans include adding AI to game content creation game testing and even creating generative AI-driven NPCs non-player characters Microsoft is working with Inworld AI on this so we could see more dynamic NPCs in upcoming games CEO Satya Nadella has been pushing all of Microsoft's divisions to focus on AI and Xbox is now getting a taste of it the chatbot marks a careful but significant step forward for Xbox in adopting AI especially in the way it improves user experience by delivering faster AI-driven support this next one is pretty unique a startup named Osmo has developed tech to authenticate sneakers by
their smell yep they're using AI to sniff out counterfeit sneakers by analyzing scent profiles founded by Alex Wiltschko a former Google researcher Osmo uses AI-enabled sensors to pick up on subtle markers like factory chemicals and adhesives that are usually hard to detect otherwise they've already started testing this tech comparing 10 real sneakers to 10 fake ones for two Nike models with over 95% accuracy in distinguishing the authentic ones it's impressive given the scale of the counterfeit sneaker market estimated to be worth around $450 billion about five times the value of the legitimate sneaker market with this tech Osmo can pick up specific scent markers like an animal fur smell on certain Nike models while fakes often smell like glue the startup has been piloting this technology with a secondhand sneaker platform and if it scales up this scent-driven tech could be huge for the sneaker industry adding a whole new layer of authentication
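Osmo hasn't published its method, but the general pattern described here (sensor readings become feature vectors of chemical markers, a standard classifier separates real from fake) can be sketched in a few lines, everything below including the marker values is made up for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Purely illustrative: fake "scent profiles" where authentic pairs lean on
# marker 0 (e.g. an animal-derived material) and counterfeits on marker 1
# (e.g. glue). All numbers are invented; this is not Osmo's pipeline.

rng = np.random.default_rng(0)
n = 40   # e.g. 20 authentic + 20 counterfeit scent profiles

authentic = rng.normal([1.0, 0.2, 0.5], 0.15, size=(n // 2, 3))
fake      = rng.normal([0.2, 1.0, 0.5], 0.15, size=(n // 2, 3))

X = np.vstack([authentic, fake])
y = np.array([1] * (n // 2) + [0] * (n // 2))   # 1 = authentic

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```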
finally let's talk about Cartier the luxury brand is merging AI with its heritage for an exhibition in Shanghai's Museum East titled Cartier the Power of Magic this exhibition includes over 300 pieces of Cartier jewelry and artifacts from around the world some of the notable pieces include Maria Felix's flexible snake necklace from 1968 the Duchess of Windsor's panther brooch from 1949 and a range of Cartier's famous mystery clocks for the exhibit's design Cartier partnered with Chinese artist Cai Guo-Qiang who created a custom AI model called cAI specifically for this project the AI driven design brings the exhibition to life in a way that goes beyond traditional displays creating an immersive experience for visitors it's not only a first for Cartier but also for the museum itself showing how luxury brands are using AI to appeal to
tech-savvy audiences in new ways this past week in AI robotics has been incredible robot dogs are mastering parkour with generative AI humanoid bots like Digit are revolutionizing logistics and surgical robots are performing tasks with humanlike precision we've got AI lifeguards saving lives 24/7 chemistry bots conducting hundreds of experiments autonomously and even a robot-created portrait of Alan Turing selling for over $1 million plus Nvidia's Jetson Thor platform is set to supercharge the next generation of humanoid robots it's a week of mind-blowing breakthroughs so let's talk about it first up let's talk about robot dogs
and generative AI teaching robots to adapt to new environments has always been a headache you could use real world data but gathering that is super expensive and time-consuming digital simulations sound like a great alternative but they don't always translate to real world success enter LucidSim a groundbreaking system developed by MIT researchers LucidSim uses generative AI models combined with a physics simulator to create ultra-realistic virtual training environments these environments mimic real world conditions like lighting weather and even the textures of objects think about it AI models describing settings as detailed as an ancient alley lined with tea houses or a sunlit lawn with dry patches LucidSim takes these descriptions and maps them into 3D training grounds teaching robots to navigate tricky obstacles the results are stunning a four-legged robot trained on LucidSim nailed tasks like climbing stairs and locating objects in one test it found a traffic cone with a 100% success rate compared to just 70% for traditional simulations and MIT plans to use this system for more ambitious goals like training humanoid robots so eventually robots will be handling tasks in cafes or factories with the precision of
a well-trained barista sounds crazy well researchers are already working on it now let's move from the streets to the operating room at Johns Hopkins University researchers have trained the da Vinci surgical system robot using imitation learning a method where robots learn by observing human demonstrations hundreds of videos from real surgeries were used to teach the robot three critical tasks stitching lifting tissue and using a needle what sets this apart is that the robot doesn't rely on manual coding for every move a process that can take years for complex procedures instead it analyzes patterns from the videos to replicate techniques and even adapt in real time for instance if it drops a needle it picks it back up and continues a feature that wasn't explicitly programmed but learned through observation the results are impressive the robot matched human performance across tasks and showed consistent precision reducing the risk of errors this approach eliminates the need for time-consuming programming making it scalable for more surgical procedures the researchers aim to take this further by enabling robots to perform full surgeries autonomously in the future
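the underlying recipe, behavior cloning, is simple enough to sketch, to be clear this is the generic version of imitation learning rather than Johns Hopkins' actual pipeline, and it assumes (observation, action) pairs have already been extracted from the demonstration videos:

```python
import torch
import torch.nn as nn

# Minimal behavior-cloning sketch of imitation learning. The dimensions and
# the random "demonstration data" are stand-ins; in practice the pairs would
# come from annotated surgery videos.

obs_dim, act_dim = 32, 7        # e.g. kinematic state in, joint targets out

policy = nn.Sequential(
    nn.Linear(obs_dim, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, act_dim),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

demo_obs = torch.randn(4096, obs_dim)   # stand-in demonstration observations
demo_act = torch.randn(4096, act_dim)   # stand-in demonstrator actions

for epoch in range(50):
    pred = policy(demo_obs)
    loss = nn.functional.mse_loss(pred, demo_act)   # match the demonstrator
    opt.zero_grad()
    loss.backward()
    opt.step()

# at run time the robot just queries the learned policy -- no hand-coded moves
action = policy(torch.randn(1, obs_dim))
print(action.shape)   # torch.Size([1, 7])
```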
this breakthrough not only improves accuracy but also has the potential to address shortages of skilled surgeons globally offering high-quality surgical care even in remote locations speaking of human-like robots let me introduce you to Digit the humanoid robot from agility robotics this teal-colored bot stands 5 ft 9 in tall weighs 72 kg and looks well let's just say its legs are backward but that's intentional Digit is designed for logistics and manufacturing tackling tasks like moving boxes in warehouses one of Digit's standout features is its versatility unlike traditional robots that rely on wheels Digit uses feet to navigate this means it can climb stairs walk across uneven surfaces and move in spaces
where wheeled robots would struggle to make it even more adaptable its arms can be swapped out for different tasks whether it's moving boxes in a warehouse or handling delicate items this flexibility has already attracted big names like GXO Logistics and the Schaeffler Group who are deploying fleets of Digits to handle repetitive physically demanding tasks here's something fascinating Digit doesn't rely on verbal commands especially in noisy factory environments instead it takes instructions via an iPad showing just how much thought has gone into its usability and while its current charge-to-work ratio is 4:1 (4 minutes of work for every minute of charging) the company is targeting a 10:1 ratio soon that could be a game-changer allowing Digit to work longer shifts without frequent breaks thanks to large language models (LLMs) powering its AI Digit can adapt to new tasks and environments more efficiently over time now let's switch gears to something life-saving AI-powered robot lifeguards in Lui City Henan Province China researchers from the Hefei Institutes of Physical Science have developed an autonomous robot lifeguard designed to operate 24/7 without human intervention this robot utilizes artificial intelligence big data and navigation technologies to monitor designated water areas continuously equipped with a life-saving buoy and a rescue arm it can detect drowning incidents and respond swiftly potentially within critical time frames similarly in Santa Barbara California the University of California's Benioff Ocean Science Laboratory has introduced SharkEye AI drones that patrol coastal waters to detect and monitor shark activity these drones capture real-time footage analyzed by machine learning models to identify shark presence providing timely alerts to lifeguards and beachgoers thereby enhancing safety measures these developments highlight the growing role of AI in augmenting traditional lifeguard duties offering rapid-response capabilities and
continuous monitoring to improve water safety all right now let's talk about a groundbreaking moment in the art world an AI robot named Ai-Da recently made history by selling a painting for over $1 million at Sotheby's the artwork a portrait of Alan Turing the father of modern computing fetched an astonishing $1,084,771 far exceeding its pre-sale estimate of $120,000 to $180,000 this November 2024 sale marked the first time a humanoid robot's creation was auctioned signaling a major milestone in the growing fusion of AI technology and the global art market another significant event occurred in 2018 when Christie's auctioned Portrait of Edmond de Belamy an AI generated artwork by the Paris-based collective Obvious the piece sold for $432,500 vastly exceeding its initial estimate of $7,000 to $10,000 this sale brought widespread attention to AI's potential in art creation and its market value these instances underscore a broader trend where AI generated art is gaining recognition and commanding high prices in the art market the success of such artworks raises questions about creativity authorship and the evolving role of technology in artistic expression if you think robots are confined to logistics or art think again at the University of
Liverpool researchers have developed a pair of robots that autonomously perform and analyze chemical reactions these robots are equipped with advanced AI-driven decision-making capabilities making them more like lab assistants that never need a coffee break they're designed to handle the nitty-gritty tasks of chemical synthesis product analysis and even planning the next steps in experiments all without human intervention here's how they work these robots are equipped with tools like nuclear magnetic resonance (NMR) and ultra-performance liquid chromatography mass spectrometry (UPLC-MS) this combo allows them to cross-check chemical data for accuracy avoiding the false positives and negatives that can slow down research for example they recently synthesized a library of thioureas analyzed the results and autonomously decided whether to replicate scale up or tweak the experiments the robots even use AI algorithms to adapt to unexpected outcomes making decisions almost as intuitively as a human researcher would except they don't sleep or get tired
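the interesting part is the closed-loop decision logic, the sketch below shows the general shape of it, though the thresholds and rules are invented since the Liverpool team's exact policy isn't spelled out here:

```python
# Sketch of a decide-next-experiment loop -- the thresholds and rules are
# invented for illustration, not the Liverpool system's actual policy.

def decide_next(yield_pct, nmr_ok, uplc_ms_ok):
    # cross-check the two instruments to avoid false positives/negatives
    if nmr_ok != uplc_ms_ok:
        return "replicate"      # instruments disagree: rerun before trusting
    if not nmr_ok:
        return "tweak"          # confirmed failure: adjust conditions
    if yield_pct >= 70:
        return "scale_up"       # confirmed success worth scaling
    return "tweak"              # real product, but conditions need work

# (yield %, NMR confirms product, UPLC-MS confirms product)
queue = [(82, True, True), (15, False, False), (55, True, False)]
for result in queue:
    print(result, "->", decide_next(*result))
```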
over just a few days these bots performed hundreds of experiments that would take a team of humans weeks to complete and they're not limited to simple tasks with modular tools and robust algorithms they can tackle complex multi-step reactions even in fields like supramolecular chemistry the implications are massive these robots could revolutionize drug discovery material science and chemical manufacturing by dramatically cutting down costs and timelines all right now Nvidia is stepping up its robotics game with Jetson Thor a powerful computing platform launching in early 2025 built specifically for robotics it's designed to fuel the next generation of humanoid robots enabling them to interact autonomously with humans and adapt to their environments with greater intelligence Jetson Thor is the latest in Nvidia's Jetson lineup known for its compact AI computing platforms while earlier versions were used
in drones and IoT devices Thor focuses exclusively on robotics it leverages advancements in computer vision natural language processing and machine learning allowing robots to see learn from experiences and respond to complex situations this means robots won't just execute pre-programmed tasks they'll adapt dynamically like collaborating in workplaces or assisting with caregiving Nvidia isn't building robots themselves but partnering with manufacturers like Tesla Siemens and Universal Robots for instance Tesla's humanoid robot Optimus relies on Nvidia's tech with limited production planned for late 2025 while challenges like sensor technology and ethical considerations remain Nvidia's VP of robotics Deepu Talla acknowledges this is a long-term journey all right let me know what you think in the comments and if you enjoyed this make sure to like And subscribe for more AI updates think about robots that can handle everything just like the ones in cartoons picking up groceries cooking dinner even looking after your pet that's basically the dream right but here's the thing it's super hard to train robots to handle a bunch of different tasks especially in unpredictable real world environments this is because up until recently training a robot meant gathering tons of specific
data for each task and you can imagine how time-consuming costly and limiting that can get now researchers at MIT with some help from tech giants like meta might have just cracked the code they came up with this pretty clever way to train robots using a model inspired by large language models yeah just like the ones that power tools like GPT-4 their idea is to pull together diverse data from a wide range of tasks and domains think simulations real robots human demo videos and create one universal robot brain that can handle multiple tasks without needing to be retrained from scratch each time okay let's dive a bit deeper they've named this system heterogeneous pretrained transformers or HPT for short here's what makes it so cool it unifies all these different types of robotic data whether it's camera visuals sensor signals or even human-guided demo videos into a single system normally every robot has its own unique setup a different number of arms sensors or cameras placed at various angles HPT aligns all of this into what they're calling a shared language essentially a way of combining all this varied input so that a
single model can make sense of it all so how exactly does it work so you've got all these different data sources visual inputs sensors robotic arms and the movements they make the HPT system uses a Transformer a machine learning model architecture similar to those behind GPT-4 to process and learn from all this data but instead of feeding it sentences and paragraphs like we do with language models they feed it these tokens of robotic data each input whether it's from a camera or a motion sensor gets converted into tokens that the Transformer can handle and by pulling these diverse data sources together this robot brain can recognize patterns and learn tasks in a more flexible adaptable way this approach has already shown some impressive results when tested HPT not only improved robot performance by over 20% in both simulated and real world settings but also handled tasks it hadn't specifically been trained for it's a huge leap forward from the traditional approach of teaching robots with highly specific task-oriented data one of the biggest challenges with HPT was creating a data set large enough to properly train the Transformer and when I say
large I mean massive over 200,000 robot trajectories across 52 data sets including human demonstration videos and simulations this was a big step because typical training data in robotics is often focused on a single task or specific robot setup here they're bringing it all together into a much broader learning model MIT researcher Lirui Wang and his team found that one major obstacle in robotic training isn't just the quantity of data it's the fact that the data is so diverse coming from many different robot designs environments and tasks so they're tackling this by essentially creating a universal
robotic language that can process all these varied inputs to draw a comparison they're using a strategy inspired by the way we train language models like GPT-4 in language models we pre-train the model on massive amounts of diverse language data so it has a broad understanding of language and then fine-tune it on smaller task specific data the HPT approach does something similar giving robots a foundational understanding across multiple types of data before honing in on specific tasks this broad pre-training means that when they need the robot to handle a new task it can adapt much
faster because it's already been exposed to a wide range of data now if you think about it the future of Robotics hinges on having robots that aren't just good at one thing but can handle multiple tasks just like humans imagine a robotic arm that can help with cooking then seamlessly switch to folding laundry and maybe even feed your dog all without having to be retrained from scratch for each new job and this HPT model could be a big step toward making that happen and the vision is actually bigger than that the researchers hope that one
day you could have a kind of universal robot brain that you could download to your own robot plug it in and it would be ready to perform a wide range of tasks right out of the box here's a bit more on how they tackled the technical side inside HPT there are three main components stems a trunk and heads think of the stem as a translator it takes in the unique input data from different robots like visual data from cameras or proprioceptive data from sensors and converts it into the shared language that the Transformer can understand the trunk which is the heart of the system processes this unified data and then the head converts this processed data into specific actions for each robot to perform each robot just needs its unique stem and head setup while the trunk remains universal trained on this huge diverse data set this setup means that HPT can handle data from multiple robots at once treating them all as part of one massive training network
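here's an illustrative sketch of that stem, trunk, head split in PyTorch, the sizes are invented and the real HPT stems ingest raw camera and sensor streams, but it shows how two very different robots can share one trunk:

```python
import torch
import torch.nn as nn

# Illustrative sketch of the stem/trunk/head split -- shapes and sizes are
# invented; the real HPT models are much larger and the stems handle raw
# camera and proprioceptive streams rather than flat vectors.

D = 256  # width of the shared token space (the "shared language")

class Robot(nn.Module):
    """One robot = its own stem and head wrapped around the shared trunk."""
    def __init__(self, obs_dim, act_dim, trunk):
        super().__init__()
        self.stem = nn.Linear(obs_dim, D)    # translate this robot's inputs to tokens
        self.trunk = trunk                   # shared across every robot
        self.head = nn.Linear(D, act_dim)    # translate trunk output to this robot's actions

    def forward(self, obs_tokens):           # (batch, seq, obs_dim)
        x = self.stem(obs_tokens)
        x = self.trunk(x)
        return self.head(x)

# one universal trunk, trained on the pooled multi-robot dataset
trunk = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True),
    num_layers=4,
)

arm   = Robot(obs_dim=48, act_dim=7,  trunk=trunk)   # e.g. a 7-joint arm
biped = Robot(obs_dim=90, act_dim=12, trunk=trunk)   # e.g. a legged robot

# both robots route through the same trunk weights
print(arm(torch.randn(2, 16, 48)).shape)    # torch.Size([2, 16, 7])
print(biped(torch.randn(2, 16, 90)).shape)  # torch.Size([2, 16, 12])
```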
and when they tested it they found that HPT was able to learn faster and perform tasks more accurately compared to traditional methods when they scaled up the model they observed that HPT's performance kept improving with the amount of data and the model's complexity similar to what's been observed with large language models but this isn't just theoretical they tested HPT in both simulated and real world scenarios in simulations they tried different tasks like moving objects and interacting with different environments and HPT consistently outperformed other approaches they also tested it on real world robots including tasks like feeding a pet and performing assembly tasks and found that HPT was more robust and adaptable than traditional models even when the environment or conditions
changed HPT was better able to handle the variations the team ran these tests across several popular simulation platforms including Meta-World and RoboMimic they also combined their robotic data with human videos like footage from everyday activities in kitchens and integrated it with robotic data from simulations by doing this they were able to teach HPT using data that wasn't just limited to robots but included examples of human actions too to make all this work the researchers had to experiment with how to handle this massive mixed data set they tried scaling up the model testing it
with different batch sizes and numbers of data points to see how much the model could improve with more data in fact they found that the model scaled really well the more data they fed it the better it performed in the future they want to study how adding even more types of data could boost HPT's performance further they also want to make HPT capable of processing unlabeled data kind of like how GPT-4 can understand context from a variety of text inputs their ultimate goal is this plug-and-play robot brain that wouldn't require any training at all just download it install it in your robot and it's good to go when they transferred HPT to new robots and new tasks they found that it could adapt much faster than models trained from scratch for example in a sweep leftovers task where a robot had to clean up objects HPT achieved a success rate of 76.7% beating out other models they also tested it on tasks like filling water and scooping food and HPT consistently outperformed the from-scratch models by a wide margin but the team admits there's still work to be done right now they're focused on short-horizon tasks actions that are done in a few seconds or less expanding this to longer more complex tasks is one of their next big goals they also want to make the model more reliable since success rates aren't yet as high as they'd like typically staying under 90% so in short this new HPT model represents a huge step forward in creating flexible multitasking robots by combining data from all sorts of sources robots simulations and even human videos they're building a model that can adapt to new tasks and environments more effectively than ever before
it's still early days but this could lead to robots that are far more capable adaptable and dare I say human-like in their ability to handle diverse tasks and who knows maybe one day we'll all have our own Rosie the robot ready to help with anything we need all right let's talk about something pretty exciting happening in the AI world particularly for those interested in having more control and flexibility with AI instead of going with the big players like open AI Google or meta Nous Research has their own thing going a set of unrestricted AI models think of these as a fresh new spin on AI tools with the power to compete with GPT-4 or Gemini but without the usual limits and they've recently dropped two very interesting new products Nous Chat and the Forge Reasoning API beta so let's unpack what that's all about so to kick things off Nous Research started with their Hermes models and for a while they kept these pretty technical meaning you had to download and run them on your own machine or access them through partner sites but that's changed with the release of Nous Chat their first chatbot interface and this is where things get interesting Nous Chat is pretty much like the chat GPT or Claude interfaces we're all familiar with a big text box at the bottom to type in your prompts and a spacious area at the top where the AI responds but here's the twist Nous Chat uses their Hermes 3 70B model a fine-tuned variant of Meta's Llama 3.1 model this isn't your average chatbot it's a powerful open-source model you can access without needing to set it up yourself which is super convenient now with Nous Chat you're getting this AI model in a user-friendly way without dealing with the hassle of running code the team designed Nous Chat to feel accessible and cool using this retro-style look with vintage fonts that resemble early computer terminals so if you're into that aesthetic it's a nice touch they've even added light and dark modes to switch between so that's a bonus but let's get a bit technical for a sec because that's where this chatbot really shines one of the key features Nous Research has baked in is this array of pre-written prompts if you're into different tasks like knowledge and analysis
creative writing problem solving or research and synthesis you can click on these categories and the model will automatically load a related prompt to kick things off for instance if you choose knowledge and analysis and ask for a summary on something like intermittent fasting the model pulls up a nicely structured answer adding some serious value one of the questions some people have is whether Nous Chat is truly unrestricted now Nous Research promised to deliver something that gives users full control over what they can ask the model but they still have some guardrails in place for example if you try to ask about illegal narcotics the model won't respond in full detail the team explained this as a common-sense filter not a strict block meaning it's more of a soft warning still some AI jailbreakers people who specialize in testing limits have found ways to push these boundaries like using creative prompts to get past the filters now what sets Nous Chat apart even though it might not be as advanced as some corporate tools is its speed responses pop up within seconds making the interaction feel snappy yet it does have its limits for one it lacks access to the latest web data since its knowledge cutoff is April 2023 so while it can't give you up-to-the-minute information like open AI's latest models with internet access it does well within its limits and even provides web links for further reading though some of these can be hallucinated meaning they're not real you might get a link that looks legitimate but leads nowhere if we zoom out for a second Nous Chat is part of a broader push by Nous Research to build tools that aren't tied down by the typical
corporate restrictions they want AI to be more of a toolkit that researchers and developers can use as they like without the rigid constraints we see elsewhere that's where their other new tool the Forge Reasoning API beta comes into play this API takes reasoning up a notch by adding a code interpreter and other advanced reasoning capabilities to AI models what makes Forge unique is that it boosts the Hermes model so much that it's competing with some of the biggest names open AI anthropic and Google in reasoning tasks one of the cool things they did to showcase Forge's abilities was to test it with math problems from the AIME competition if you're not familiar AIME is a math competition that's one of the qualifiers for the US math Olympiad so it's intense Forge powered up the Hermes 70B model to the point where it performed on par with models that are way bigger in size that's a big deal because it shows that Hermes with the help of Forge can handle reasoning tasks without needing to be as massive or resource-hungry as some of the industry's larger models now Forge doesn't just work with Hermes it's
designed to be flexible and supports multiple models including Claude 3.5 Sonnet Gemini and GPT-4 you can either use a single model to handle tasks or combine several models allowing for something called a mixture of agents approach with this different models work together almost like a team of specialists each bringing their own strengths to answer a question or solve a problem this collaboration helps provide more well-rounded answers which is huge for users who need reliable nuanced information so how does Forge pull this off it's built on three main architectures MCTS (Monte Carlo tree search) CoC (chain of code) and MoA (mixture of agents) let's break these down quickly MCTS is great for making decisions in complex scenarios like planning moves in a chess game it explores possible outcomes then picks the most promising path CoC or chain of code connects logical reasoning with a code interpreter allowing the API to handle math-heavy or code-based tasks much better and finally MoA allows multiple models to work together to analyze and debate a question then arrive at a consensus
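a minimal mixture-of-agents sketch looks something like this, the ask() stub and the model names are placeholders, in Forge the members would be real API calls to Hermes, Claude, Gemini, or GPT-4:

```python
# Minimal mixture-of-agents sketch -- ask() is a stand-in for a real model
# call; the flow is what matters: independent drafts, then a synthesis pass.

def ask(model, prompt):
    # stand-in for an actual API call to `model`
    return f"[{model}] draft answer to: {prompt}"

def mixture_of_agents(prompt, members, aggregator):
    # round 1: every member model answers independently
    drafts = [ask(m, prompt) for m in members]
    # round 2: an aggregator reads all drafts and synthesizes a consensus
    synthesis_prompt = (
        "Combine the strongest points of these drafts into one answer:\n"
        + "\n".join(drafts)
        + f"\n\nQuestion: {prompt}"
    )
    return ask(aggregator, synthesis_prompt)

print(mixture_of_agents(
    "What is the derivative of x^2 sin(x)?",
    members=["hermes-70b", "claude-3.5-sonnet", "gemini"],
    aggregator="hermes-70b",
))
```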
with Forge Nous is setting the stage for some powerful tools this isn't just AI giving you one response it's about synthesizing input from multiple perspectives and adapting to the task at hand right now Forge is in a closed beta which means only select users are testing it they're gathering feedback from these power users to fine-tune the system if everything goes as planned we could see Forge opening up to a broader audience soon but Forge isn't just for developers who need raw computational power it's actually adding a layer of intelligence that brings these models closer to what Nous envisions as a telecommunications brain this concept is still a bit futuristic but basically the
goal is to create a system that can understand the structure of cellular networks predict changes and adapt without constant human oversight large language models or LLMs like Hermes are starting to fill in these gaps but we're not fully there yet this vision of a telecommunications brain is part of a longer journey right now most AI in telecoms helps out as an add-on or support think chatbots that understand telecom-specific language or tools that help set network configurations full autonomy is still a stretch because of the limitations in LLMs today but with tools like Forge
we're seeing steps towards more autonomous AI that can make complex decisions and adapt to different situations Nous Research has designed Hermes as a multi-agent system that tackles network modeling with blueprints which are like detailed guides for setting up network digital twins or DTs these blueprints walk the AI through the steps needed to model things like network performance signal quality or energy use Hermes uses a modular approach with a designer agent that drafts blueprints and a coder that translates those into executable python code the structured process helps the AI avoid common pitfalls like errors in calculations or logical missteps that you'd expect if a model was winging it alone and if you're wondering how all this holds up in real world scenarios Nous Research has done some testing they ran Hermes through tasks like power control energy-saving measures and deploying new base stations and the success rates were impressive for instance Hermes hit 85% accuracy in power control tasks way ahead of other methods as the complexity of the tasks increases the Hermes pipeline still maintains solid performance which shows how the multi-agent blueprint-based approach really pays off
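the designer and coder split can be sketched the same way, the llm() stub and the prompts below are invented, the point is that a structured blueprint hand-off between two agents gives you a place to catch errors before any code runs:

```python
# Sketch of a designer/coder agent pipeline -- llm() is a stand-in for real
# model calls and the prompts are invented; only the two-stage shape is the
# point: draft a structured blueprint, then translate it into code.

def llm(role, prompt):
    return f"[{role} output for: {prompt[:40]}...]"   # stand-in model call

def model_network(task):
    # step 1: the designer agent drafts a blueprint (a structured plan)
    blueprint = llm("designer",
                    f"Draft a step-by-step blueprint for a network digital "
                    f"twin that models: {task}")
    # step 2: the coder agent translates the blueprint into Python
    code = llm("coder",
               f"Translate this blueprint into executable Python:\n{blueprint}")
    # step 3: the structured hand-off is where calculation errors and
    # logical missteps can be caught before anything executes
    return blueprint, code

bp, code = model_network("signal quality across a dense urban cell")
print(bp)
print(code)
```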
lastly if you're someone who's curious about open-source alternatives Nous has been testing Hermes with models like Llama 3.1 they found that with smaller open-source models the success rate is lower than with big proprietary models but there's still hope when they gave these open-source models access to a library of expert-designed modules the performance improved significantly this could make open-source options more viable for specialized tasks especially if they're set up with ready-made components if you're interested in checking out Nous Chat they've made it free for now on their site and Forge is available in beta for a select
few and that's the scoop on Nous Research's latest AI tools would you give them a try or see them fitting into your workflow drop your thoughts below and if you enjoyed this hit like and subscribe for more AI updates Google has just launched its newest experimental AI model Gemini-Exp-1114 and it's causing a lot of hype for all the right and wrong reasons this model has not only managed to claim the top spot on the chatbot Arena leaderboard a platform where AI models are tested in blind head-to-head competitions but it's also sparked significant debate
about how we measure progress in artificial intelligence for anyone invested in the evolution of AI this model isn't just another release it's a showcase of the potential and pitfalls of cutting-edge technology the chatbot Arena leaderboard previously known as LMSYS is widely regarded as one of the fairest methods to evaluate AI performance the process is simple users interact with AI models and vote for the better response without knowing which model produced it this blind testing ensures that evaluations are based purely on performance
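Arena scores like the ones quoted below come from Elo-style ratings computed over these blind pairwise votes. Here's a worked sketch of how a single vote moves the numbers; the K-factor and starting ratings are illustrative, not LMSYS's exact parameters.

```python
# Elo-style update for one blind pairwise vote. Illustrative constants only.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(r_a: float, r_b: float, a_won: bool, k: float = 4.0) -> tuple[float, float]:
    delta = k * ((1.0 if a_won else 0.0) - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta

challenger, leader = 1304.0, 1340.0        # hypothetical pre-vote ratings
challenger, leader = update(challenger, leader, a_won=True)
print(challenger, leader)  # an upset win pays more than k/2 (~2.2 points here)
```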
Gemini-Exp-1114 dominated the leaderboard earning 1,344 points a 40-point improvement over its previous iterations this score pushed OpenAI's GPT-4o out of the top position signaling a significant shift in the AI landscape Gemini's exceptional performance spans a range of areas it demonstrated top-notch capabilities in mathematics where solving hard problems is a critical measure of reasoning ability the model also excelled in creative writing crafting responses that were both imaginative and coherent additionally its performance in visual understanding a complex task requiring integration of multiple data types showcases its versatility these achievements in fact make Gemini-Exp-1114 one of the most well-rounded models currently available however access
to this model is somewhat restricted unlike consumer-ready AI systems Gemini-Exp-1114 is not yet integrated into the standard Gemini app or website developers can explore it through Google AI Studio a platform designed for experimenting with advanced AI tools this strategic choice highlights Google's focus on developers and researchers rather than general users at least for now the industry is buzzing with speculation about whether this model is a refined version of Gemini 1.5 Pro or an early look at Gemini 2 which is rumored to launch next month if it's part of Gemini 2 the performance leap
might be less dramatic than expected but it still positions Google as a serious competitor in the AI race while Gemini's success on the leaderboard is impressive it also exposes the limitations of current benchmarking systems researchers who controlled for factors like response formatting and length found that Gemini's performance dropped to fourth place this discrepancy raises a crucial issue are we measuring what really matters current benchmarks often focus on surface-level characteristics like how polished a response appears rather than deeper capabilities like reasoning reliability and ethical decision-making this optimization for specific benchmarks creates what some experts
describe as a race to the top of the leaderboard that doesn't necessarily reflect real-world utility companies are fine-tuning their models to excel in controlled environments while overlooking broader challenges it's like teaching a student to ace multiple-choice tests without truly understanding the subject while Gemini's dominance in specific tasks like math and writing is undeniable the broader implications of its performance metrics remain uncertain but beyond technical achievements Gemini's development has been marred by controversies that raise serious ethical questions just days before the release of this new Gemini model a previous version of the model generated
a deeply troubling response a user reported that the AI told them you are not special you are not important and you are not needed please die please the conversation reportedly revolved around questions about Elderly Care making the response all the more shocking this wasn't an isolated incident of an AI model generating problematic output it's part of a broader pattern of errors that highlights significant gaps in safety and oversight instances like this aren't entirely new in the world of AI but they underscore a critical flaw in these systems while they excel in structured tasks their behavior
in unstructured real world interactions often falls short in another reported case Gemini provided an insensitive response to a user distressed over a serious medical diagnosis these examples illustrate how AI models even those optimized for technical Excellence can fail to navigate nuanced human interactions responsibly the implications of these failures extend beyond individual incidents they point to systemic issues in how AI is developed and evaluated current testing Frameworks prioritize metrics like accuracy and speed but often neglect the ethical and psychological impact of AI generated content this focus on quantitative benchmarks creates perverse incentives for companies to optimize
their models for specific tests rather than broader reliability it's a problem that affects not just Google but the entire AI industry OpenAI for instance is reportedly struggling with similar challenges while it has consistently delivered state-of-the-art models like GPT-4 reports suggest that the company is finding it increasingly difficult to achieve groundbreaking improvements one of the key bottlenecks is the availability of high-quality training data as AI models become more sophisticated the need for diverse and reliable data grows but the industry is approaching the limits of what's available these constraints highlight the diminishing returns of current
development strategies and the urgent need for innovation Gemini-Exp-1114's release comes at a time when the AI industry is grappling with these challenges on one hand Google's achievement represents a major victory signaling that the company is still a leading force in AI innovation after years of playing catch-up to OpenAI this leaderboard win is a significant morale boost on the other hand it exposes the flaws in how we define and measure progress if the focus remains on achieving higher scores in controlled tests the industry risks neglecting the real-world applications and ethical considerations that
truly matter to move forward the AI community must rethink its approach to evaluation instead of relying solely on abstract metrics developers should prioritize tests that reflect the complexities and unpredictability of real-world interactions for example can an AI model handle high-stakes scenarios like providing accurate medical advice or resolving ethical dilemmas can it offer responses that are not only correct but also empathetic and responsible these are the questions that need to be addressed yet current benchmarks fall short of capturing these dimensions Gemini-Exp-1114 is a fascinating case study that illustrates both the potential and the
limitations of modern AI its ability to solve complex mathematical problems generate creative content and interpret visual data showcases the remarkable progress in AI capabilities however its occasional lapses in safety and appropriateness serve as a stark reminder of the risks involved AI systems must be more than technically impressive they must be safe reliable and aligned with human values the stakes are high and the industry is at a Crossroads developers and researchers have the opportunity to redefine what progress in AI looks like by focusing on real world performance and ethical considerations the AI Community can ensure that
these systems are not only powerful but also trustworthy and beneficial Gemini-Exp-1114 is a step in the right direction but it also highlights the need for a more balanced approach to AI development for Google the road ahead will involve not just refining its technology but also addressing the broader questions raised by its development can Gemini evolve into a system that balances technical excellence with ethical responsibility can the industry as a whole move beyond the leaderboard mentality to create AI systems that truly make a difference these are the challenges that lie ahead and how they are
addressed will determine the future of artificial intelligence Microsoft's co-pilot just took a huge leap forward introducing features that make AI feel more personal and adaptable than ever now users can select a preferred voice enjoy responses tailored to their style and get in-depth help with big decisions with tools like co-pilot Vision offering real-time suggestions based on what's on your screen and think deeper providing structured answers for complex choices Microsoft is building an AI that's both smart and supportive while some users are excited about this shift toward a more interactive visually aware AI others worry it's sacrificing
practicality so let's talk about it all right so at the heart of this update is co-pilot voice a feature that transforms the way users interact with co-pilot by offering a more direct vocal interface Microsoft has built co-pilot voice to actually get to know you my friend doesn't drink wine but that's my go-to housewarming gift I'm on the way to the party what else can I get you could go for something versatile like a nice set of artisanal teas it learns from how you talk so over time it responds in a way that feels
almost personal you get four voice options to choose from so you can pick a tone that fits you and it adjusts based on how you interact with it getting better at matching your style the more you use it a fancy olive oil or a gourmet gift basket with like snacks and treats they can enjoy right now it's available in English in the US UK Canada Australia and New Zealand but they're planning to roll it out to more languages and regions Microsoft's really going for something different here they want voice interactions with co-pilot to feel almost
like talking to a real person someone who doesn't just hear the words but actually gets the context behind them oh what fancy olive oil like why would somebody want that for the same reason people want fancy wine flavor quality and just cuz it's fancy now the next one is think deeper which is built for those decisions that need more than a quick answer using advanced reasoning models it can tackle complex questions by analyzing different angles and factors it's especially useful for choices that aren't clear-cut like deciding whether to move to a new city choosing between
Career paths or planning a big project instead of giving a single answer think deeper lays out a step-by-step approach to help users make sense of their options right now think deeper is only available to co-pilot Pro users in select regions including the US UK Canada Australia and New Zealand it's part of co-pilot Labs an experimental space where users can test out new features and provide feedback this helps Microsoft fine-tune everything from response time to the logic structure before a wider release in Practical terms think deeper is designed to handle scenarios with lots of variables like
budgeting for a home renovation it breaks down each part cost timing resources giving users a complete overview rather than a one-line answer one of the most groundbreaking aspects of this update is co-pilot Vision Microsoft recognizes that many tasks involve more than just words they require visual context with co-pilot Vision the AI assistant can interpret and respond based on what the user is viewing in Microsoft Edge allowing it to see the same page content whether it's images or text this makes co-pilot Vision especially useful for scenarios where visual context is critical like shopping interior design
or even analyzing complex charts or graphs take for example setting up a new home if you're browsing furniture options co-pilot Vision can help find pieces that match your style suggest complementary colors and even offer advice on arranging items in a room this isn't a passive tool it's actively analyzing what's on your screen and offering recommendations that align with what you're looking at the aim is to make co-pilot Vision a truly visual assistant that can help users engage with content in a more meaningful way privacy remains a top priority with co-pilot Vision this feature is strictly
opt-in and operates under strict guidelines to protect user data each session is designed to be ephemeral meaning that no data is stored or used for training purposes the moment a session ends any data co-pilot Vision accessed is permanently deleted additionally co-pilot Vision won't function on paywalled or sensitive content and Microsoft has currently restricted it to a pre-approved list of popular sites all right now this one is interesting co-pilot daily provides users with a tailored morning update covering news weather and other essential details this is about making information easy to consume without overwhelming users
Microsoft has partnered with major news providers like Reuters USA Today the Financial Times and others to pull reliable verified data for these briefings ensuring that only quality content reaches users essentially this AI is meant to be a quick digestible snapshot of the day ahead whether that's a summary of major headlines reminders of upcoming tasks or an overview of the weather available initially in the US and UK co-pilot daily will gradually expand to more regions allowing users to set preferences for what's included in their briefing Microsoft's idea here is to provide relevant information in a clear concise manner
without taking up too much of the user's time then there is co-pilot discover which is a feature designed to bridge the gap between available tools and actual usage by offering tailored suggestions conversation starters and guides discover makes it easy for users to find what they need without feeling lost for instance a new user might not know all the ways co-pilot could assist them discover steps in offering pointers and showing useful functions based on the user's history with Microsoft products with consent discover suggestions become even more personalized drawing on interactions from Microsoft's ecosystem the idea here
is to make co-pilot accessible turning it into a central assistant that learns from user behavior over time now Microsoft has also integrated co-pilot directly into Microsoft Edge typing @copilot in the address bar instantly activates the assistant making it easy to use co-pilot's capabilities without leaving the page but co-pilot isn't Microsoft's only AI-driven update Bing has introduced generative search which goes beyond traditional search by generating answers dynamically instead of simply presenting results Bing analyzes millions of sources processes the information and creates a synthesized response this update is rolling out gradually in the US with
demo queries marking Microsoft's commitment to more intelligent personalized search experiences now regarding privacy and features like vision and discover users can opt out of personalized recommendations and any data accessed by co-pilot Vision is deleted immediately after the session ends at Microsoft they say that ensuring user trust is key so every feature within co-pilot has been designed with transparency in mind by giving users control over how much personalization they want co-pilot remains adaptable without overstepping privacy boundaries now while Microsoft is pushing co-pilot toward being an AI companion not everyone is on board on Blind an
anonymous platform for employees some Microsoft staff didn't hold back on their thoughts comments like absolutely ruined and a step backward pretty much sum up the sentiment there employees and users alike have raised similar issues the updated co-pilot is reportedly slower freezes up and isn't as informative as the old version there's also frustration over missing features like real-time information and the option to delete old conversations which were both useful to many users for some the real problem seems to be the shift from a straightforward functional tool to an AI companion that feels more like a friend
than a work assistant people are saying they'd rather have a tool that gets things done than an AI that tries to be friendly on App Stores reviews highlight these concerns with one user pointing out that co-pilot was so good before updating to this version while another called it dumbed down and less functional another review questioned why Microsoft would take a solid app and turn it into something they now find less effective there are a few positive comments though some users like the new interface calling it designed for General users and saying it looks cleaner Kumar
a Microsoft general manager explained that this update is just the first careful steps toward making co-pilot feel simpler and more user-friendly with added voice features that make communication more natural he described the direction as aiming for a calmer more accessible experience still the overall response has been mixed with many users feeling like the update has missed the mark in the end it looks like Microsoft's big step toward a friendlier AI has left some users wanting the more straightforward practical co-pilot they had before OpenAI has finally rolled out its own search engine built right
into chat GPT so now when you're using chat GPT you're also getting real-time AI-powered web searches this just launched for paid subscribers including those on the SearchGPT waitlist but if you're on a free plan or if you're an Enterprise or education user hang tight it'll be rolling out to you in the coming weeks now this isn't a standalone search engine like Google or Bing it's baked right into chat GPT's interface if chat GPT thinks your question needs some real-time info it'll trigger a web search automatically and if you want to manually launch
a search you can do that too it's all integrated and honestly it seems like a big move to catch up with competitors like Google Gemini and Microsoft co-pilot which have offered web search integration in their own AI systems for a while so let's dive a bit into what you can actually do with this during the demo before it launched Adam Fry the guy heading up chat GPT's search team showed off some cool examples he looked up Apple's stock and got a fully interactive stock chart upcoming earnings reports and recent news articles all linked to their
original sources it even has a sidebar with a list of sources so you can just scroll through different websites directly in another example he searched for Italian restaurants in San Francisco and chat GPT brought up an interactive map with restaurant pins super useful if you're looking for something more casual and local you can even refine results with follow-up questions here's a fun fact the new chat GPT search is powered by a bunch of search tech with Microsoft's Bing being a major player in the mix the underlying search model is actually a refined version of GPT-4o
which was first released to a limited test group of around 10,000 users under the name SearchGPT back in July OpenAI has been building this for a while and there were rumors back in May that they were trying to recruit Google employees to get it right until this update chat GPT's information was limited to its knowledge cutoff date which was between 2021 and 2023 depending on the model with live search users can get the latest info but here's a key point OpenAI says they're going to keep updating the training data too so
even though live search is here they're still working on improving the models' internal knowledge with new data to keep it fresh and the new search feature is accessible across all platforms iOS Android macOS and Windows now why might someone prefer this over Google or Bing well first of all no ads right now chat GPT search doesn't have those sponsored results that we all know take up half the page on Google OpenAI spokesperson Niko Felix has confirmed they don't plan to bring ads to chat GPT but there is one catch the search feature is more
resource-intensive to run than traditional search engines so for free users there will be some limits on how often they can use the latest search models now Meta is reportedly also working on its own AI search tool and Google has expanded its AI features to over 100 countries so it's definitely getting crowded in the AI search space and as you might expect there's already some controversy brewing around all of this News Corp and the New York Times for example are in lawsuits with Perplexity another AI search tool over copyright claims and the Times is actually
suing OpenAI for allegedly using their material to train chat GPT but OpenAI seems ready for this Fry mentioned that OpenAI is working closely with media partners to make sure content is used responsibly and OpenAI even lets publishers opt out of their web crawler if they don't want their content included they're not bypassing paywalls either speaking of partnerships OpenAI has inked deals with some major media companies like Hearst Condé Nast Axel Springer and News Corp Fry said that partners get a little more control over how their content shows up in chat
GPT though they won't be automatically prioritized in search results so it sounds like OpenAI is trying to play fair with publishers to avoid the legal headaches all right a quick mention here about hallucinations Fry said chat GPT search should be more accurate now that it can access up-to-date information remember how Google's AI once told people to put glue on their pizza yeah OpenAI is hoping to avoid those mistakes by having the latest info Fry admitted though that mistakes can still happen and if chat GPT slips up they'll aim to be transparent about it
one thing that's pretty timely here is that the feature is launching right before the US presidential election and Fry said OpenAI is paying extra attention to election-related searches to make sure accurate sources are shown they really want chat GPT to help people find reliable info in these critical areas and while chat GPT's search tool has a lot of people talking it's not the only thing OpenAI has in the works their next major AI model which everyone thought might be called GPT-5 probably isn't going to be released this year CEO Sam Altman
explained in a recent Reddit AMA that they're not quite ready to launch a new model of that scale they're still focused on refining GPT-4 and handling all the complexities that come with it apparently the models have become so intricate that OpenAI can't keep up with as many parallel projects as they'd like it's been tough for them to allocate computing power across all the ideas they have there were tons of questions on Reddit about new features like Sora which is OpenAI's video generation model and DALL-E's next iteration people are also curious about features
like camera mode and advanced voice mode or AVM within chat GPT OpenAI product chief Kevin Weil said they're working on perfecting Sora and handling issues like safety and impersonation plus scaling compute so basically Sora and other advanced tools are on the way but they're not ready to announce release dates yet when asked about DALL-E's next update Altman said it's going to be worth the wait but there's no specific timeline as for AVM vision chat GPT's camera mode OpenAI's engineering VP Srinivas Narayanan confirmed there's no set release date yet another big point of
discussion is OpenAI's business structure they're transitioning to a for-profit model which has led to some controversy along with some high-profile resignations from the company recently Jan Leike who was a leader on the safety team left and he had some pretty strong words about the company's direction suggesting that safety practices have taken a back seat to shiny products but OpenAI's team seems optimistic Narayanan pointed out that they're sad to see colleagues go but are still shipping new releases and have welcomed some amazing new hires oh and a quick financial tidbit OpenAI recently closed
a big funding round bringing their valuation to a whopping $157 billion they also got a $4 billion line of credit which pushes their liquidity to over $10 billion it's a huge war chest but here's the thing they're expecting about $5 billion in losses on $3.7 billion in revenue this year so while OpenAI is growing fast they're also burning through cash Altman even mentioned in his Reddit AMA that the new search feature is actually his favorite feature since chat GPT launched he thinks this conversational search style where users can follow up with more questions just like
a normal conversation is faster and easier than regular search engines plus Altman hinted at a pretty cool future where searches could even dynamically render custom web pages based on what you ask imagine typing in a question and getting an entire web page built on the fly with just the information you need according to open AI this search function lets you search in a more natural intuitive way in other words instead of typing keywords like cheap hotels in NYC you can ask where's a good budget friendly hotel in New York and keep refining your search from
there if you're a chat GPT Plus or Team subscriber or if you are on the SearchGPT waitlist you can use this feature starting today for everyone else it's rolling out gradually free users as we mentioned will get limited access to the latest models for now chat GPT search is a pretty big step toward giving us all a more interactive intuitive way to search for information so we'll see where it goes especially as Google and Microsoft are still very much in the game too so what do you guys think about all this let me
know in the comments reasoning models like DeepSeek R1 aren't like the AI models most people are used to these are specialized systems designed to process information differently solving problems in a deliberate and logical way instead of rushing to generate a response they take the time to think through complex tasks this is made possible by a technique called chain-of-thought reasoning similar to what we've seen in OpenAI's o1-preview model it's a method that allows AI to break down complicated queries into smaller more manageable steps ensuring greater accuracy and reliability
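As a quick illustration of what chain-of-thought prompting looks like from the outside, here's a minimal sketch; the prompt text is made up for this example, and reasoning models like o1 and R1 perform this decomposition internally rather than needing it spelled out.

```python
# Minimal illustration of chain-of-thought prompting: instead of asking for a
# bare answer, the prompt asks the model to expose intermediate steps.

question = "A train leaves at 3:40 pm and the trip takes 2 h 35 min. When does it arrive?"

direct_prompt = f"{question}\nAnswer:"  # baseline: bare answer, no reasoning shown

cot_prompt = (
    f"{question}\n"
    "Think step by step: break the problem into parts, solve each part,\n"
    "then state the final answer on its own line."
)
# A chain-of-thought response would read something like:
#   1. 3:40 pm + 2 h = 5:40 pm
#   2. 5:40 pm + 35 min = 6:15 pm
#   Final answer: 6:15 pm
```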
this capability has become increasingly important as AI systems are being used for tasks that demand precision like solving math problems or tackling logic-based queries now OpenAI made headlines in September 2024 with its o1 model which was designed with chain-of-thought reasoning at its core it quickly became the benchmark for reasoning models thanks to its ability to handle tasks that traditional large language models struggled with but now DeepSeek has stepped into the spotlight with R1-Lite-Preview claiming its model can rival and even outperform o1 in specific areas and this isn't just talk DeepSeek
has released performance metrics showing that their model outshines o1 on two critical benchmarks AIME and MATH AIME evaluates how well AI systems can reason through real-world scenarios while MATH focuses on solving intricate word problems these aren't just numbers on a chart they're a direct measure of how effective an AI model is at reasoning and problem solving now DeepSeek's model also demonstrated its ability to handle tricky unconventional questions that have stumped other AI systems questions like how many letter Rs are in the word strawberry or which is larger 9.11 or 9.9 may seem simple at first glance but they require a level of precision that many models lack
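For reference, both trick questions have unambiguous answers you can verify in two lines of Python:

```python
# Ground truth for the two trick questions above, runnable as-is.
print("strawberry".count("r"))   # 3 -- the letter r appears three times
print(9.9 > 9.11)                # True -- 9.9 is larger, despite "11" looking bigger
```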
R1-Lite-Preview handled these with ease outperforming even advanced systems like GPT-4o and Anthropic's Claude family of models one of the standout features of DeepSeek R1 is its transparency it doesn't just provide an answer it shows you the reasoning behind it every step is laid out for the user to see making the process more understandable and trustworthy it's essentially like watching a skilled problem solver walk you through their thought process explaining each decision as they go that's what DeepSeek
R1 offers and it's a feature that sets it apart from other AI systems however no system is perfect users have noted that it struggles with certain types of logic problems like playing tic-tac-toe interestingly OpenAI's o1 faces similar challenges in this area which highlights a broader limitation in current reasoning models another issue though not a deal breaker for everyone is the model's susceptibility to jailbreaking despite having safeguards in place to prevent misuse some users have found ways to bypass these restrictions for example one person managed to prompt the model into generating a recipe for methamphetamine
a serious ethical concern that underscores the importance of robust security measures in AI systems DeepSeek R1's handling of politically sensitive topics is another area that has drawn attention since the model was developed in China it adheres to the country's strict regulations on AI when asked about controversial issues like the Tiananmen Square protests Chinese President Xi Jinping's policies or China's potential actions regarding Taiwan the model simply responds with not sure how to approach this type of question this isn't surprising given China's requirement for AI systems to align with core socialist values while this
approach ensures compliance with local laws it limits the model's applicability in global contexts where such restrictions might not be acceptable despite these challenges the achievements behind R1-Lite-Preview are impressive DeepSeek isn't just a small startup with big dreams it's backed by serious financial and technological resources High-Flyer Capital Management the hedge fund supporting DeepSeek has invested heavily in its AI infrastructure their latest server cluster equipped with 10,000 Nvidia A100 GPUs reportedly cost around $138 million to set up this level of investment speaks volumes about the company's commitment to advancing AI technology
Liang Wenfeng the computer science graduate who founded DeepSeek has made it clear that the goal is to push AI toward superintelligence R1-Lite-Preview is a major step in that direction right now the model is available for public testing through DeepSeek Chat on the company's website users can access it for free but there's a limit of 50 messages per day for non-paying users DeepSeek also plans to release an API which will make it easier for developers to integrate the model into their own applications
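Since the API is only announced at this point, any integration code is speculative; the sketch below assumes it will keep the OpenAI-compatible shape of DeepSeek's existing chat endpoint, and the model identifier is a guess, not a confirmed name.

```python
# Hedged sketch of integrating the planned DeepSeek API, assuming an
# OpenAI-compatible endpoint. The model id below is hypothetical.
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_KEY", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",  # hypothetical reasoning-model identifier
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.9? Show your steps."}],
)
print(resp.choices[0].message.content)
```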
this aligns with the company's commitment to open-source development previous models like DeepSeek-V2.5 were praised for their advanced capabilities and accessibility forcing competitors like ByteDance Baidu and Alibaba to rethink their pricing strategies by making its technology more accessible DeepSeek is positioning itself as a leader in the open-source AI community this shift toward reasoning models like R1-Lite-Preview and OpenAI's o1 is part of a broader trend in AI development for years the dominant approach was to improve AI systems by throwing more data and computing power at them a strategy known as scaling laws while this approach
has delivered impressive results it's becoming clear that it has limits researchers are now exploring new techniques like test-time compute which gives models extra processing time to refine their answers this method underpins both o1 and R1-Lite-Preview and represents a significant shift in how AI is developed Microsoft CEO Satya Nadella recently described this as the emergence of a new scaling paradigm emphasizing its importance for the future of AI
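One of the simplest forms of test-time compute is self-consistency: spend extra inference on the same question by sampling several answers and majority-voting. The sketch below shows that general recipe only; o1 and R1 use more sophisticated internal reasoning, and `sample_answer` is a placeholder.

```python
# Self-consistency as a simple test-time-compute recipe: sample the model
# several times and keep the most common answer. Illustrative only.

from collections import Counter

def sample_answer(question: str) -> str:
    """Placeholder: one stochastic model sample (e.g., temperature > 0)."""
    raise NotImplementedError("call your model API here")

def self_consistent_answer(question: str, n_samples: int = 16) -> str:
    votes = Counter(sample_answer(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]  # more samples = more compute, better odds
```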
while DeepSeek is making headlines Google for instance recently updated its Gemini chatbot with a memory function that allows it to remember user preferences and tailor responses accordingly this feature is only available to subscribers of Google One AI Premium which costs $20 per month by comparison OpenAI made chat GPT's memory functionality free for all users back in September making it far more accessible Gemini's memory function can store details like a user's favorite foods or specific interests making future interactions more personalized users can even instruct the chatbot on what to remember by using phrases like don't forget or always mention while this adds a layer of convenience it also raises concerns about data privacy Google has assured users that stored data won't
be used to train its AI models but skepticism remains particularly given the broader debates around data security in AI systems what's even more interesting is the recent claim that AI could achieve near-infinite memory by 2025 which sounds almost unbelievable Microsoft AI CEO Mustafa Suleyman made this bold prediction explaining how this breakthrough could completely redefine our relationship with technology with near-infinite memory AI wouldn't forget past conversations instead it would build on them creating interactions that feel natural and continuously evolving Suleyman called this a major turning point for AI one where it could fully step
into its role as a co-pilot in our lives his vision for Microsoft's AI co-pilot project is to create systems that go beyond simply answering questions these AI co-pilots could act as advisors teachers or even companions offering personalized guidance for everything from learning to managing your health and it's not just about memory Suleyman highlighted advancements in real-time audio generation with models like NotebookLM which can produce conversations so seamless they feel almost human pair this with memory and emotional understanding and an AI could eliminate the need for explicit instructions like typing or clicking buttons transforming how
we interact with machines entirely the developments from DeepSeek OpenAI Google and now Microsoft are painting a picture of an AI-driven world that's increasingly interactive personalized and intelligent with DeepSeek's commitment to reasoning Google's push for personalization through memory and Microsoft's vision of AI co-pilots with near-infinite memory the trajectory of AI development is shifting toward a more intuitive human-centric approach these systems are becoming partners in learning problem solving and everyday decision-making so Microsoft just unveiled LazyGraphRAG an AI tool that takes what used to be a slow expensive grind
making sense of endless data and turns it into something fast smart and affordable think of it as upgrading from digging through a library by hand to having a fleet of humanoids that instantly bring you exactly what you need no matter how big or complex the question but that's not all Elon Musk's xAI is stepping up with plans for the world's most powerful AI and a new chatbot that could shake up the entire market Zoom is also making some big moves shifting from just video calls to becoming an AI-first company and then there's
Nvidia with Fugatto an AI that's redefining what we can do with sound whether it's music voices or sounds you've never even heard before there's so much to unpack here so let's get right into it all right first let's talk about LazyGraphRAG it's a complete overhaul of how retrieval-augmented generation RAG systems operate Microsoft has taken a long-standing issue balancing cost scalability and output quality and turned it on its head the whole point of RAG systems is to make sense of unstructured data these tools are essential for tasks like document summarization knowledge extraction and
exploratory data analysis they combine search capabilities with AI-driven analysis giving users the ability to extract both specific details and broader insights from massive data sets the challenge has always been finding the right balance between efficiency and depth traditional RAG tools like vector-based systems have been great for localized tasks if you're looking for a direct answer buried in a specific section of a document these systems deliver with precision what they can't handle well are global queries questions that require understanding relationships across the entire data set that's where graph-based RAG systems come into play these systems
map out hierarchical relationships within data giving them the ability to answer broader more complex questions the problem is that graph RAG systems come with a massive price tag they rely on expensive indexing processes that summarize the data before queries can even begin this makes them inaccessible for anything outside large-scale projects with deep pockets small businesses individual researchers or anyone without significant resources are essentially locked out of using this technology LazyGraphRAG changes everything Microsoft's new system eliminates the need for upfront data summarization instead it processes queries dynamically building lightweight data structures as it works this
on-the-fly approach slashes indexing cost to almost the same level as vector RAG systems without sacrificing the ability to handle both localized and global queries at the core of LazyGraphRAG is an innovative iterative deepening method it blends best-first and breadth-first search strategies to efficiently map out data relationships think of it as working smarter not harder it pinpoints relevant information quickly while simultaneously building a broader understanding of the data set natural language processing NLP techniques help refine these structures in real time ensuring both speed and accuracy
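To picture how best-first and breadth-first can blend, here's a generic sketch of that search pattern over a graph of text chunks, capped by a relevance-test budget; it illustrates the idea only and is not Microsoft's LazyGraphRAG code.

```python
# Generic blend of best-first and breadth-first exploration over a graph of
# text chunks, stopped by a relevance-test budget. Illustrative pattern only.

import heapq

def explore(start: str, neighbors, relevance, budget: int = 100) -> list[str]:
    frontier = [(-relevance(start), start)]  # max-heap via negated scores
    seen, visited = {start}, []
    while frontier and len(visited) < budget:
        _, node = heapq.heappop(frontier)    # best-first: most relevant node next
        visited.append(node)
        for nb in neighbors(node):           # breadth-first: expand the whole neighborhood
            if nb not in seen:
                seen.add(nb)
                heapq.heappush(frontier, (-relevance(nb), nb))
    return visited
```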
the numbers behind LazyGraphRAG speak for themselves it delivers results comparable to traditional graph RAG systems but at a staggering 99.9% reduction in indexing costs that's right what used to cost thousands can now be done for pennies it also outperformed competitors like RAPTOR and DRIFT across all key metrics including comprehensiveness diversity and query accuracy even with a minimal relevance test budget of just 100 LazyGraphRAG excelled at a budget of 500 it surpassed every alternative while costing just 4% of a traditional graph RAG global search this kind of scalability is a big deal users can adjust the
relevance test budget to suit their needs whether it's a quick overview or a deep dive it's versatile enough for streaming data one-time queries or real-time decision-making scenarios and it doesn't stop there LazyGraphRAG is being integrated into the open-source GraphRAG library making it accessible to developers researchers and organizations worldwide this democratization of advanced retrieval technology opens doors for innovation in ways we haven't seen before what's really remarkable is how LazyGraphRAG bridges the gap between affordability and performance it's no longer a trade-off you get both whether it's analyzing legal documents exploring
massive scientific data sets or powering AI assistants this system sets a new benchmark for what's possible in data retrieval all right now let's shift gears and talk about xAI Elon Musk's bold venture into the AI space the company has been making waves and their latest plans are nothing short of ambitious xAI is gearing up to launch a consumer chatbot next month alongside what they claim will be the most powerful AI model in the world this new AI model is expected to be part of their flagship Grok line of large language models the last version Grok
beta could handle prompts with up to 128,000 tokens essentially processing more information at once than most systems out there the upcoming model aims to push those boundaries even further the chatbot itself is designed to stand out in a crowded market dominated by tools like chat GPT Meta's AI offerings and Google's systems while details on the chatbot are still under wraps it's clear that xAI is leveraging its unique position to create something highly competitive exclusive data sets from Musk's other companies like SpaceX and Tesla are being used to train their models giving xAI a significant edge
in data quality and diversity xAI's infrastructure is equally impressive they're running a supercomputer in Memphis packed with 100,000 Nvidia GPUs with plans to double that soon this level of computational power allows them to train and deploy models at a scale few can match financially xAI is in a strong position having raised $11 billion since its launch the company is also on track to surpass $100 million in annual sales with much of that revenue coming from collaborations with other Musk-owned businesses like SpaceX's Starlink and X formerly Twitter these partnerships highlight xAI's integration into a
broader ecosystem where its AI tools are already providing value in areas like customer support and embedded assistance as xAI continues to expand it's clear that they're not just building AI they're shaping the future of human-computer interaction in ways that could redefine the market Zoom has made its own bold pivot transitioning from a video conferencing platform to an AI-first company this shift isn't just a rebranding effort it's a strategic move to stay relevant in a rapidly changing tech landscape the introduction of AI Companion 2.0 and Zoom Docs marks a significant expansion of Zoom's capabilities
these tools aim to integrate AI into every aspect of the workplace from task management to meeting automation the concept of digital twins highlighted by CEO Eric Yuan takes this even further imagine delegating tasks or attending meetings without actually being there your AI twin does it for you despite these changes Zoom isn't abandoning its roots video conferencing remains a core part of its business and with $1.18 billion in Q3 revenue the company is still thriving a $2 billion stock repurchase program underscores their confidence in this new direction finally let's talk about Fugatto Nvidia's latest innovation in
audio AI this isn't just a tool it's an entirely new way of thinking about sound Fugatto can generate or transform any mix of music voices and sounds using text or audio inputs need a trumpet that meows like a cat done want a rainstorm that morphs into birdsong Fugatto can handle it what sets Fugatto apart is its ability to combine tasks in ways no other model can it uses emergent properties meaning it can perform tasks it wasn't explicitly trained for for example it can mix accents with emotions or evolve soundscapes over time this flexibility makes
it a game changer for industries like music production advertising education and gaming under the hood Fugatto runs on a 2.5-billion-parameter model trained on Nvidia DGX systems with 32 H100 GPUs the development process involved creating millions of audio samples and fine-tuning data sets to achieve a level of versatility that's unmatched the potential applications are endless music producers can experiment with different styles and instruments in seconds game developers can create dynamic audio environments that respond to player actions even language learning tools can become more engaging allowing users to choose custom voices for their lessons Fugatto's ability to interpolate between instructions gives users fine-grained control over the output whether it's dialing up the sorrow in a French accent or creating a soundscape that evolves over time
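Instruction interpolation of this kind is usually a weighted blend between two conditioning vectors; here's a minimal sketch of that idea with a placeholder encoder, not Nvidia's actual conditioning scheme.

```python
# Sketch of instruction interpolation: blend two conditioning embeddings with
# a linear mix and let the weight dial the output between them. Illustrative.

import numpy as np

def embed(instruction: str) -> np.ndarray:
    """Placeholder text-conditioning encoder."""
    raise NotImplementedError("swap in a real text encoder")

def blend(instr_a: str, instr_b: str, t: float) -> np.ndarray:
    """t=0 -> pure A, t=1 -> pure B, in between -> a weighted mix."""
    a, b = embed(instr_a), embed(instr_b)
    return (1.0 - t) * a + t * b  # feed this to the generator as conditioning
```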
the possibilities are endless the pace of innovation is staggering and the impact is only just beginning to be felt it's an exciting time to be following AI and there's no telling where these breakthroughs will take us next Microsoft has just rolled out some groundbreaking updates opening the door to a future where AI agents handle tasks autonomously they've expanded their AI ecosystem to include tools and agents that
aren't just useful but are genuinely capable of transforming workflows making them faster smarter and far more efficient and they've done it with a mix of innovation partnerships and solid technical groundwork their big reveal happened at the Microsoft Ignite conference where CEO Satya Nadella and other leaders laid out a bold vision for a future where AI doesn't just assist it acts these aren't the static tools of yesterday the new AI agents Microsoft has introduced can operate independently handling tasks without constant human input whether it's reviewing customer returns or streamlining supply chains these agents are designed to
take on repetitive tasks leaving people free to focus on the more complex parts of their work Microsoft's approach here is all about customization and accessibility their platform now allows businesses to either deploy ready-made agents or create their own tailored solutions with over 1,800 AI models available in the Azure AI model catalog companies can bring their own data choose the models they prefer and even fine-tune them for specific needs this is a big deal because businesses don't want to get locked into a single model or system they want flexibility and Microsoft seems to have delivered exactly that
the tools they've built around this are impressive their no Code and low code options in Microsoft 365 co-pilot mean that even teams without dedicated AI developers can build agents and for those with more technical expertise the new agent SDK currently in preview opens up Advanced possibilities like creating multi-channel agents that can integrate with Microsoft services and third-party platforms and the list of ready-made agents is just as impressive for instance they've introduced an HR focused agent that can answer common workplace policy questions and even help employees start tasks like leave requests or payroll queries there's also
a facilitator agent that takes real-time notes during meetings and offers summaries as discussions happen another standout is the project management agent which can handle the entire life cycle of a project from creating plans and assigning tasks to tracking progress and sending updates one of the most intriguing agents is the interpreter which can provide real-time translations in Teams meetings in up to nine languages and it doesn't stop there it can even simulate a participant's voice during these translations which is a clever way of keeping things natural in multilingual conversations underpinning all of this is Azure AI
Foundry Microsoft's platform designed to simplify AI development Foundry is all about making things easier for developers and businesses alike switching between models has historically been a headache with every new release often requiring a complete workflow overhaul Azure AI Foundry fixes this by letting users mix and match models as needed it allows developers to stick with older models that work well for them while trying out newer ones from providers like OpenAI Meta or Mistral and speaking of OpenAI Microsoft hasn't abandoned its strong partnership there the models developed by OpenAI remain a cornerstone of
their offerings but they've smartly added more options to accommodate businesses that need alternatives Scott Guthrie Microsoft's cloud computing chief emphasized that choice is critical because different models have different strengths some are better at providing faster answers while others excel in more nuanced tasks on the hardware side Microsoft isn't pulling any punches either last year they debuted their first in-house AI chips and now they've unveiled two new pieces of hardware that are set to power these advancements the first is a security microprocessor designed to protect sensitive data like encryption keys starting next year all new
servers in Microsoft's data centers will include this chip the second is a data processing unit or DPU which boosts how efficiently data moves within networks and between servers and AI chips these DPUs are a direct challenge to Nvidia's offerings but Microsoft believes its version is more efficient which is crucial given the size and complexity of today's AI models the company's push into agentic AI also reflects how the landscape is shifting traditional large language models like those behind chat GPT or Microsoft's own co-pilot have limits they're excellent at writing tasks but often struggle with planning
or taking autonomous action Microsoft's agents are designed to bridge that gap by combining the predictive power of LLMs with tools that can reason act and even operate autonomously when needed some early use cases have already shown promising results McKinsey & Company has collaborated with Microsoft to develop an agent for onboarding clients this tool has cut lead times by 90% and reduced administrative work by 30% which is a massive efficiency gain for such a time-intensive process similarly Thomson Reuters has created an agent to streamline the legal due diligence process by integrating advanced reasoning with
its AI tool CoCounsel it's managed to cut several steps in these workflows by at least 50% but Microsoft's AI journey isn't exactly new they've been exploring conversational AI for years but the integration of agents with large language models represents a significant leap forward before conversational data often existed in silos making it difficult to derive actionable insights now with these tools users can pull intelligence from various sources effortlessly and in real time this isn't just about making tools more powerful it's about making them intuitive and natural to use as Microsoft's corporate VP Lili Cheng put it
most companies don't have huge AI or development teams they need tools that anyone can use without getting bogged down by technical complexity that's where low-code and no-code options come in enabling teams to build solutions without needing deep expertise another layer to this story is Microsoft's growing investment in high-quality data for training its AI models recently they struck a deal with HarperCollins to access select non-fiction titles for training purposes authors can opt in if they wish ensuring transparency and choice in how their work is used Microsoft has clarified that these texts will
only be used for training models not for generating new books without human involvement and while Microsoft is making huge strides not everyone is on board with their approach Salesforce CEO Marc Benioff recently took a jab at Microsoft's pivot calling it a rebranding effort and labeling their flagship co-pilot a flop it's a sharp critique but it highlights the competition heating up in the AI space especially as companies race to dominate the agentic AI landscape Microsoft however seems to be taking the long view by investing in both cutting-edge hardware and flexible software they're positioning themselves as
a one-stop shop for businesses looking to integrate AI into their operations and with the sheer scale of their Azure platform currently used by 60,000 customers there's a solid foundation to build on at its core Microsoft's strategy is about choice efficiency and accessibility whether it's through agents that automate repetitive tasks tools that simplify app development or hardware that powers complex models they're covering all bases and while the technology behind it all is undeniably complex the aim is to make it seamless for the end user in Cheng's words the goal is to build tools so intuitive that
users don't even notice the complexity happening in the background the future Microsoft is envisioning isn't far off with advancements in autonomous agents flexible AI development and high-performance hardware they're setting a new standard for what AI can do in both work and life it's clear they're not just keeping up with the competition they're defining the next chapter of the AI story the startup world isn't known for quiet beginnings but H a Paris-based company founded by ex-Google employees managed to take things to a whole new level they announced a $220 million seed funding round last
year before releasing a single product just think about that $220 million with no product on the market that's not something you hear about every day then not long after three of the five co-founders left citing operational and business disagreements for a moment it looked like the kind of turbulence that could sink even the most promising ventures despite the shaky start H kept moving forward this week they introduced their first product Runner H and it's already turning heads Runner H is built for what H calls the agentic era of AI where machines aren't just reactive
tools but autonomous problem solvers this isn't about following step-by-step instructions it's about planning reasoning and executing tasks in ways that save time cut costs and open up new possibilities Runner H is powered by H's proprietary compact models which include a 2-billion-parameter language model and a vision-language model or VLM these numbers might seem small compared to something like GPT-3's 175 billion parameters but the results tell a different story Runner H's models outperform many larger competitors in efficiency and accuracy especially in practical and real-world applications this is a big deal in an industry
where the trend has been to throw more and more parameters at a problem often at the cost of efficiency Runner H is already making an impact in areas like robotic process automation quality assurance and business process outsourcing robotic process automation has been around for years but it's often limited by rigid script-based tools that break whenever systems or templates change Runner H handles these shifts with ease automating repetitive tasks like recruiting and onboarding with a single prompt this cuts down processes that typically take weeks into moments quality assurance is another area where Runner H shines
testing websites and applications is a tedious resource-intensive process Runner H automates it handling tasks like simulating user actions checking page availability and ensuring compatibility across payment methods it adapts seamlessly to changes in user interfaces allowing developers to focus on innovation instead of constant debugging business process outsourcing is another space where Runner H is proving its worth billing workflows especially in industries like dental insurance can be slow manual and reliant on third-party companies Runner H automates the entire process from fetching insurance plans to analyzing and submitting claims giving businesses control over their operations and
reducing delays the technology behind Runner H is backed by solid benchmarks on WebVoyager a test that evaluates an AI's ability to navigate and interact with live websites Runner H scored 67% beating competitors like Anthropic's computer use at 52% and Emergence's Agent-E at 61% what makes this even more impressive is that WebVoyager uses live public websites so the performance isn't just theoretical it's tested in real-world conditions Runner H's vision-language model the VLM is another standout feature it excels at interpreting graphical user interfaces images and diagrams as shown by its performance
on the ScreenSpot benchmark this test evaluates how well a model can understand and interact with graphical interfaces Runner H's VLM outperformed much larger models like GPT-4o and Pixtral Large from Mistral it's fast efficient and accurate proving that bigger isn't always better in AI H's language model forms the backbone of Runner H's capabilities it's designed for high-level decision-making and programming tasks and its performance on benchmarks like HumanEval and MBPP demonstrates its strength these benchmarks test a model's ability to generate and execute code and the results show that it's not just a
strong performer but also efficient and adaptable H has been strategic about rolling out Runner H a private beta is now open giving developers access to APIs and H Studio a tool for monitoring and editing agent performance right now it's free to use but a pricing model is expected to roll out soon the beta isn't just about testing the product it's about gathering feedback from real users to fine-tune the system H has already been working with customers in sectors like e-commerce banking insurance and outsourcing to refine Runner H's capabilities the vision for Runner H goes beyond
the web CEO Charles Kantor has spoken about universal automation where AI agents can navigate any graphical interface and perform tasks autonomously this is a long-term goal but the initial focus on web environments allows H to demonstrate the technology's capabilities while building toward that larger vision H's journey obviously hasn't been without its challenges losing three co-founders early on could have been disastrous but the company managed to stay afloat thanks in part to its strong funding since the initial $220 million raise H has added another $10 million bringing the total to $230 million this funding includes
contributions from high-profile investors like Eric Schmidt Yuri Milner and Xavier Niel as well as strategic backers like Amazon Samsung and UiPath the backing of these heavyweights underscores the confidence in H's approach to AI what sets H apart is its focus on compact specialized models rather than large generalist ones this strategy isn't just about cutting costs though that's a significant advantage it's also about creating AI that's efficient effective and tailored to specific tasks in an industry where bigger often means slower and more expensive H's approach is refreshingly pragmatic so H's compact models aren't just
theoretical innovations they're delivering results in areas that matter whether it's automating tedious workflows streamlining QA processes or enabling businesses to take control of their operations Runner H is proving that small focused models can outperform larger ones in the right contexts the launch of Runner H marks the beginning of what H calls the agentic era this isn't just about improving productivity or cutting costs it's about redefining how humans interact with machines by focusing on autonomy and adaptability H is setting the stage for a new kind of human-machine collaboration the potential applications for Runner H are
vast from automating hiring processes to handling complex billing workflows the technology is already showing its value and with plans to expand beyond the web the possibilities are only going to grow Runner H is essentially a statement about what's possible when AI is designed with efficiency and adaptability in mind H's approach challenges the assumption that bigger models are always better proving that specialized compact models can deliver exceptional performance in the right contexts as Runner H moves from private beta to broader availability it will be interesting to see how it evolves the initial results are
promising but the real test will come as more users put the technology to work in real-world scenarios for now Runner H is a strong contender in the race to define the next era of AI with its focus on autonomy efficiency and practical applications it's setting a high bar for what agentic AI can achieve there's a massive breakthrough in AI video generation that's flying under the radar it's called The Matrix and honestly it's way bigger than the buzz around it suggests it's not just another new AI model it's actually doing something no one
thought possible creating endless interactive worlds that feel alive and responsive it's immersive dynamic and practically limitless yet somehow it hasn't gotten the attention it truly deserves let's change that all right so at its core The Matrix is a foundational world model designed to generate infinitely long high-resolution video streams these aren't just pre-rendered clips or static scenes this is continuous real-time video creation with frame-level precision that means every action movement and interaction can be controlled and adjusted as the simulation unfolds it's like stepping into a virtual world that responds to you in the moment rather
than following a predetermined script to understand why this is such a significant breakthrough it helps to look at the challenges faced by traditional video generation models historically creating high-quality video simulations has been a massive technical and financial undertaking the computational demands are enormous and even the best models have struggled to produce content that's both visually realistic and interactive for example models like Sora or Genie might manage short bursts of decent visuals but they can't sustain the quality over long durations or adapt to real-time user input and if you've ever seen how gaming environments are built
you'll know how much effort goes into manually designing every texture character and environment it's a process that costs millions for a single AAA game this is where The Matrix shines developed by researchers from Alibaba the University of Hong Kong and the University of Waterloo it offers a scalable solution to these problems it combines cutting-edge AI techniques with innovative design to achieve what was previously thought impossible infinite-length video generation that's both high quality and interactive this isn't just a theoretical model it works the system has already demonstrated the ability to generate 720p video streams
at real-time speeds of 8 to 16 frames per second offering seamless transitions and precise control even in complex environments the backbone of The Matrix is a video diffusion Transformer or DiT this powerful framework allows the model to produce continuous smooth video content without the awkward transitions or breaks that often plague other systems to make infinite video generation possible the team developed a method called the shift-window denoising process model or Swin-DPM essentially it optimizes the way the model processes video frames ensuring the attention mechanisms are efficiently managed even over long sequences this innovation allows The Matrix to generate video indefinitely without running into memory or processing limitations
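the core idea can be pictured as a sliding context window over the frame stream; here's a toy sketch in plain NumPy with the real denoiser abstracted into a model_step callback (a simplification, since the actual Swin-DPM shifts its window inside the diffusion process itself), showing why memory stays bounded however long the stream runs

```python
import numpy as np
from collections import deque

def stream_frames(model_step, first_frame, window=16, total=1000):
    """Toy sliding-window streamer: the model only ever attends over the
    last `window` frames, so memory use is constant even for an
    arbitrarily long video."""
    buf = deque([first_frame], maxlen=window)  # old frames fall out automatically
    for _ in range(total):
        nxt = model_step(np.stack(list(buf)))  # next frame from recent context
        buf.append(nxt)
        yield nxt

# placeholder "denoiser": next frame is the mean of the context window
frames = stream_frames(lambda ctx: ctx.mean(axis=0), np.zeros((720, 1280, 3)))
next(frames)
```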
but the real brilliance comes from how The Matrix handles interactivity using an interactive module it translates user inputs like keyboard commands into real-time actions within the simulation for example pressing a key to accelerate a car or shift its direction immediately reflects in the generated video this isn't a rough approximation it's frame-by-frame precision creating a level of responsiveness that's rare even in traditional game engines and this capability extends beyond simple scenarios The Matrix has been shown simulating a BMW X3 driving
through an office environment which wasn't part of its training data that level of generalization is impressive and shows the model's ability to adapt to new situations without additional training to keep everything running smoothly at real-time speeds the model incorporates a stream consistency model or SCM this accelerates the video generation process making it feasible to render high-quality simulations on the fly while there's always a trade-off between speed and visual fidelity The Matrix manages to strike a remarkable balance its visual quality surpasses previous models achieving a peak signal-to-noise ratio or PSNR of around 28.98 so visuals remain sharp and realistic
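for reference, PSNR is a standard fidelity metric computed from mean squared error; roughly, 29 dB on a 0-255 scale corresponds to an average per-pixel error of about 9, since 20·log10(255/9) ≈ 29

```python
import numpy as np

def psnr(reference, generated, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means the generated
    frame is numerically closer to the reference."""
    diff = np.asarray(reference, float) - np.asarray(generated, float)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)
```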
the training process behind The Matrix is another technical marvel it relies on a mix of supervised and unsupervised learning using data from AAA games like Forza Horizon 5 and Cyberpunk 2077 alongside real-world video footage this dual approach gives the model the versatility to handle both virtual and real-world environments with ease it also means that The Matrix doesn't need to rely on extensive manual configuration which significantly reduces the cost and complexity of producing high-quality simulations one of the standout features is how The Matrix handles domain generalization unlike
traditional simulators that are limited to their training environments The Matrix can adapt to entirely new scenarios it can seamlessly transition from game-based landscapes to real-world settings creating simulations that feel natural and immersive whether it's driving through an urban city exploring a grassy meadow or navigating a desert the model responds dynamically to user input making the experience feel interactive and alive the implications of this technology are enormous for gaming it opens the door to truly dynamic worlds that evolve based on player actions imagine a game where the environment isn't predesigned but generated in real
time offering endless possibilities for exploration and interaction for industries like autonomous vehicle testing The Matrix provides a scalable way to simulate real-world driving conditions without the risks or costs of physical testing even virtual reality experiences could benefit creating more immersive and responsive environments for training or entertainment the open-source nature of The Matrix is another game-changer by making the code data and model checkpoints available to the public the researchers have invited developers worldwide to build on their work this collaborative approach ensures that the technology will continue to evolve with new features and applications
emerging over time but what really sets The Matrix apart is its ability to generalize Beyond its training data it's not just recreating what it's seen before it's creating something new for example it can simulate interactions with objects or environments that weren't part of its training this could mean driving through entirely imagined Landscapes or controlling characters in scenarios that have never been explicitly programmed the level of adaptability is unmatched making The Matrix a versatile tool for a wide range of applications to put the scale of this achievement into perspective The Matrix was trained on a data
set called Source this includes synthetic game data captured using a custom-built platform called GameData as well as real-world video footage the GameData platform uses tools like Cheat Engine and OBS to extract in-game data and align it with corresponding video frames this allows the model to learn precise motion control from labeled data while improving its visual quality and generalization using unlabeled footage the result is a data set of 750,000 labeled samples and 1.2 million unlabeled samples all captured at 60 FPS this robust training pipeline ensures that The Matrix delivers both accuracy and scalability
it's not just generating video for the sake of it it's creating simulations that are precise responsive and visually stunning whether it's the dust kicked up by a car in a desert or the ripples in water as it drives through a river the attention to detail is remarkable the technical underpinnings of The Matrix are equally impressive with 2.7 billion parameters the model is a powerhouse combining the strengths of pre-trained video diffusion models with advanced components like the Swin-DPM and SCM these innovations enable The Matrix to achieve real-time performance while maintaining the high quality expected from
AAA game environments ultimately The Matrix is a groundbreaking leap in AI simulations bridging simulated and real-world environments with infinite-length video real-time interactivity and unmatched adaptability beyond gaming it's shaping the future of interactive media from training tools to game design and storytelling Mistral AI has just unveiled something that demands attention with the release of Pixtral Large a 124 billion parameter multimodal model and major upgrades to its Le Chat assistant Mistral isn't just making updates it's delivering tools that push AI capabilities to the forefront from interpreting complex data to generating high-quality visuals this new wave of
innovation shows how serious Mistral is about standing alongside the top players in AI here's why these advancements matter and what they mean for the future of AI all right so Pixtral Large is a 124 billion parameter multimodal model multimodal means it can seamlessly work with different types of data text images charts and more something that's increasingly in demand models like this need to handle various formats with ease whether it's interpreting a complex chart analyzing a document or generating insights from an image what sets Pixtral Large apart is its performance it's built on Mistral Large 2
a Transformer model already recognized for its efficiency and capabilities and adds even more functionality to the mix the benchmarks speak for themselves on MathVista a test that measures a model's ability to reason mathematically with visual data Pixtral Large scored 69.4% outperforming OpenAI's GPT-4o and Google's Gemini 1.5 Pro in document analysis its results are even more impressive on DocVQA a benchmark for understanding visual documents it hit 93.3% making it one of the most effective models in its class it's also highly competitive on VQAv2 a standard for visual question answering these are
technical milestones but they highlight something important Mistral isn't just aiming for broad capabilities it's targeting real-world applications with precision now the design of Pixtral Large is just as fascinating it combines a 123 billion parameter multimodal decoder with a 1 billion parameter vision encoder a setup that allows it to handle diverse tasks without compromising quality its 128,000-token context window means it can process extensive input up to 30 high-resolution images or an entire 300-page book at once that kind of capacity is definitely impressive but it's also practical for tasks that require large-scale data
processing and the open weights make it accessible for research and experimentation this isn't something every company offers and it lowers the barrier for smaller institutions and independent developers who want to innovate without being constrained by costs what's particularly smart about Mistral's approach is how it integrates Pixtral Large into its existing ecosystem developers can access it through their API download it on platforms like Hugging Face or use tools like the vLLM library to integrate it into their workflows the model's modular architecture makes it adaptable for a range of specialized tasks from medical imaging to financial document analysis
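for the self-hosted route, a minimal sketch might look like the following, assuming vLLM's chat API; the Hugging Face repo id is illustrative (check the hub for the exact name), and a 124B model needs several high-memory GPUs to serve

```python
from vllm import LLM, SamplingParams

# Repo id below is illustrative; verify the exact name on Hugging Face.
llm = LLM(model="mistralai/Pixtral-Large-Instruct-2411", tokenizer_mode="mistral")

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What trend does this chart show?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
    ],
}]
out = llm.chat(messages, SamplingParams(max_tokens=256))
print(out[0].outputs[0].text)
```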
it's built to be versatile and that's going to open doors for a lot of applications that go beyond traditional AI use cases then there's Le Chat Mistral's AI assistant platform which has also received a significant overhaul now Le Chat is actually shaping up to be a direct competitor to platforms like OpenAI's ChatGPT the updates make it far more than a conversational tool it's now capable of integrating text vision and interactive features in ways that make it a productivity powerhouse the platform now includes a web search feature that not only pulls in real-time data
but also provides source citations for transparency this addition aligns with the growing demand for accountability in AI-generated content there's also a new canvas tool an interactive workspace where users can create and edit content directly within the chat interface this feature isn't limited to text it extends to presentations code mockups and more it's designed to handle creative tasks efficiently without the need to regenerate responses or start from scratch another standout feature is its ability to process complex documents and images thanks to Pixtral Large Le Chat can analyze PDFs containing graphs tables equations and other visual
elements this isn't just about summarization it's about extracting meaningful insights from dense data-heavy files imagine the possibilities in fields like academia finance or legal work where processing large volumes of information is part of the job Le Chat also now includes image generation capabilities powered by Flux Pro a model developed by Black Forest Labs users can create high-quality visuals directly within the chat interface which adds another layer of functionality it's a nod to the growing trend of integrating image generation into AI platforms something OpenAI has done with DALL·E 3 but what makes this unique is its
seamless integration into a broader suite of tools making Le Chat a one-stop platform for diverse tasks on top of that Le Chat introduces task automation through customizable agents these agents can handle repetitive processes like summarizing meeting notes scanning receipts or processing invoices it's a feature aimed at businesses looking to save time and streamline their workflows and during its beta phase all these features are free which is a clever move to attract users and build a loyal base Mistral's approach to AI stands out because it prioritizes practical accessible innovation over flashy promises the company isn't chasing the
elusive goal of artificial general intelligence instead it's focusing on creating tools that users can implement immediately in real-world scenarios this philosophy is reflected in its recent funding success Mistral raised $640 million a record-setting amount for a European AI startup despite the significant capital the company has been frugal focusing on delivering value rather than burning through resources now Pixtral Large and Le Chat are just the latest in a series of strategic moves earlier this year Mistral launched a free service for developers to test its models and an SDK for fine-tuning them it's clear the
company is building an ecosystem designed to support a wide range of users from individual developers to large enterprises that said there are areas where Mistral still has room to grow for instance it hasn't yet ventured into advanced voice and audio processing a space where competitors like OpenAI and Google are making strides but that's not necessarily a drawback by focusing on text and vision Mistral is carving out a niche where it can excel without spreading itself too thin one of the most intriguing aspects of Mistral's position is its potential role in the geopolitical landscape of
AI with US-based companies dominating the field there's a growing need for alternatives that aren't tied to American interests Mistral as a European company offers a viable option for organizations looking to diversify their reliance on AI providers this highlights the importance of technological advancement alongside maintaining digital sovereignty and the freedom to operate autonomously in a fast-changing global environment the technical achievements of Pixtral Large and the enhanced capabilities of Le Chat are impressive but they're part of a broader strategy Mistral is showing that it's possible to compete with the biggest names in AI by being smart
efficient and user focused the focus is on creating reliable tools and making them widely accessible rather than prioritizing the highest parameter count or the most eye-catching features the updates to Le Chat and the release of Pixtral Large are a testament to what Mistral has been building this is a company forging its own path and reshaping the standards of accessibility and practicality in the AI space whether it's a developer fine-tuning a model for a niche application or a business automating complex workflows Mistral is creating tools that adapt to the user not the other way around the
AI landscape is crowded and the competition is fierce but Mistral is proving that there's room for innovation outside of Silicon Valley with its focus on multimodal AI practical applications and open accessibility it's a company that's not just following trends but shaping them this is what makes Mistral worth watching not just for what it's achieved so far but for what it's likely to accomplish in the future thanks to groundbreaking AI advancements a healthier future is no longer just a dream it's becoming a reality and deep learning is no longer just a tech buzzword researchers at Washington
State University have created an AI model that's transforming how we detect and understand diseases by analyzing massive high-detail tissue images with speed and precision that humans can't match this AI has the potential to catch illnesses earlier diagnose them faster and make treatments more effective with tools like this we're not just making medical breakthroughs we're creating a world where better health is within reach for everyone this deep learning AI is a game-changer not just in its ability to analyze tissue slides but in how it manages to do it with remarkable precision the usual process of
diagnosing a disease has pathologists spending hours meticulously going through slides under a microscope annotating them and double-checking results with a team to avoid errors now think about the same task being done in a fraction of the time with fewer mistakes that's what this AI is achieving tissue images that would take days or even weeks to process are now being analyzed in minutes literally all while catching abnormalities that could slip past the human eye the model is rooted in deep learning using a convolutional neural network or CNN this technology is modeled after how the
human brain processes visual information unlike traditional machine learning which works off predefined rules this AI adapts and improves through a process called backpropagation if it makes an error it adjusts its internal network to avoid repeating the mistake the backbone of this system is EfficientNetV2 one of the top CNN architectures it's powerful efficient and optimized for high-performance tasks
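a minimal sketch of that training signal, assuming the timm library; the specific EfficientNetV2 variant and the binary healthy/diseased head are illustrative, not the study's published configuration

```python
import timm
import torch
import torch.nn.functional as F

# Variant and class count are assumptions for illustration.
model = timm.create_model("tf_efficientnetv2_s", pretrained=False, num_classes=2)

tile = torch.randn(1, 3, 384, 384)   # one tissue tile (batch, C, H, W)
logits = model(tile)                 # healthy-vs-diseased scores
loss = F.cross_entropy(logits, torch.tensor([1]))
loss.backward()  # backpropagation: the error signal adjusts the weights
```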
the researchers at Washington State University trained the model using tissue images from epigenetic studies a field that looks at how external factors like chemicals influence gene behavior over generations they focused on tissues like the kidneys ovaries prostate and testes what makes this especially impressive is how they handled the massive size of these images a gigapixel image contains billions of pixels far too large for any computer to process as a single unit to solve this the researchers developed a method called pyramid tiling with overlap or PTO now PTO is where the real innovation happens instead of analyzing the entire image at once which would be impossible the AI breaks it into smaller tiles it analyzes each tile individually while still understanding how it fits into the larger picture like examining pieces of a puzzle while keeping the whole image in mind this tiling method is brilliant in maintaining spatial awareness it ensures that the AI doesn't lose context which is critical when looking for subtle signs of disease that could otherwise go unnoticed traditional methods that rely on sampling sections of an image risk missing critical details and this AI eliminates that problem by scanning every pixel
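the overlap part is easy to picture in code; here's a simplified sketch where neighboring tiles share a band of pixels so nothing on a tile border is missed (tile and overlap sizes are illustrative, and the published method additionally repeats this across a pyramid of zoom levels)

```python
import numpy as np

def tiles_with_overlap(image, tile=512, overlap=64):
    """Yield (row, col, patch) covering the whole image; adjacent tiles
    share `overlap` pixels so features on tile borders aren't cut off."""
    step = tile - overlap
    h, w = image.shape[:2]
    for r in range(0, max(h - overlap, 1), step):
        for c in range(0, max(w - overlap, 1), step):
            yield r, c, image[r:r + tile, c:c + tile]  # edge tiles may be smaller

# every pixel gets covered, tile by tile
n_tiles = sum(1 for _ in tiles_with_overlap(np.zeros((4096, 4096))))
```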
so the team tested it rigorously and the results were groundbreaking the AI not only matched but surpassed human-level performance its accuracy measured by an F-score was above 0.99 for multiple tissue types which is near perfect
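for scale, the F-score is the harmonic mean of precision and recall, so a value above 0.99 means both false positives and false negatives are extremely rare; the numbers below are illustrative

```python
def f1(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f1(0.99, 0.992))  # ~0.991, roughly the regime reported here
```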
in comparison human experts while highly skilled often struggle with consistency especially when dealing with large data sets what's more this AI processes the entire tissue slide not just selected sections offering a level of thoroughness that manual methods can't achieve one study used this AI to analyze the effects of the chemotherapy drug ifosfamide on rats across generations the manual analysis of these slides took five people over a year to complete but the AI completed the same work in two and a half days fully annotating over 700 images and it provided more detailed insights it identified subtle pathological changes that correlated with disease frequency in the experimental group something the manual method wasn't equipped to handle as effectively this speed and precision mean researchers can now tackle projects that were previously too time consuming or labor intensive what sets this AI apart from other models is how it handles imbalance in pathology data in most data sets healthy tissue vastly outnumbers diseased tissue this can skew the training process making it harder for the model to learn effectively the researchers overcame this challenge using advanced techniques like bootstrap aggregating or bagging by creating multiple data sets and training the model on each one they ensured it could generalize well without overfitting to the majority class
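bagging in miniature looks like the following, a sketch assuming scikit-learn; the features, labels, and decision-tree base learner are stand-ins (in the study each sample would be an image tile and the learner a CNN)

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in data with heavy class imbalance, like pathology slides where
# healthy tissue vastly outnumbers diseased tissue.
X = np.random.rand(1000, 32)
y = np.random.choice([0, 1], size=1000, p=[0.95, 0.05])

# Bagging: train many models on bootstrap resamples and vote, which
# reduces variance and guards against overfitting to the majority class.
model = BaggingClassifier(
    estimator=DecisionTreeClassifier(class_weight="balanced"),
    n_estimators=25,
    max_samples=0.5,  # each learner sees a different random half of the data
    bootstrap=True,
)
model.fit(X, y)
```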
now beyond the lab the potential applications for this technology are immense in clinical settings it could transform how diseases are diagnosed for example cancer detection could become faster cheaper and more reliable pathologists wouldn't be replaced but supported by a tool capable of handling tedious repetitive tasks while flagging areas of concern for closer examination this could significantly reduce diagnostic errors and improve patient outcomes also the model is being used in veterinary medicine analyzing tissue samples from deer and elk its versatility is one of its greatest strengths as long as annotated data sets are available the AI can be trained to analyze virtually any type of tissue but the researchers also put it to the test against existing systems and human experts it consistently outperformed them in both speed and accuracy popular architectures like U-Net and SegFormer while effective in smaller scale tasks struggled to handle gigapixel images these models often sacrifice either speed or precision due to memory constraints in contrast the SA-HCNN model as the researchers call
it tackled these challenges head-on with its efficient tiling and training techniques and it wasn't just good it was better even when applied to external data sets the AI delivered outstanding results for instance it was tested on data sets of canine breast cancer and human colorectal polyps in every case it either matched or exceeded existing benchmarks this kind of adaptability is rare in AI systems which are often highly specialized and struggle outside their initial training environments the model's ability to generalize across different types of tissue and pathology data sets underscores its robustness essentially this is a
tool that can revolutionize research speeding up studies that would otherwise take years epigenetic studies for example could uncover links between environmental exposures and long-term health effects much more efficiently in wildlife this AI could monitor diseases in populations like deer and elk providing early warnings that could prevent outbreaks from spreading to humans now technically this system represents a culmination of some of the best advancements in deep learning and image analysis EfficientNetV2 the backbone of the model is a state-of-the-art CNN known for its balance of power and efficiency the pyramid tiling with overlap method
not only makes it possible to handle gigapixel images but does so while maintaining spatial awareness the training process which included dynamic parameters and real-time tile generation ensured the model was robust and capable of handling diverse data sets this AI processes images with remarkable speed while also uncovering deeper insights that often go unnoticed by analyzing every pixel and keeping the bigger picture in focus it identifies patterns and connections that skilled experts might overlook in one study it highlighted a significantly higher rate of kidney disease in experimental groups exposed to a specific chemical a detail manual analysis barely
hinted at this ability to delve deeper into the data is opening new possibilities in medical research paving the way for discoveries that could transform how we understand and treat disease looking ahead the integration of AI like this into medical workflows isn't just likely it's inevitable with its unmatched efficiency accuracy and scalability this technology is becoming an essential tool in diagnostics and research rather than replacing pathologists it enhances their capabilities allowing them to achieve more than ever before as more data becomes available and the AI continues to evolve its influence on medicine is poised to
expand even further this is more than progress it's a transformation of what's possible it's about turning advanced technology into tools that create real measurable change the work at Washington State University is a powerful example of how innovation and collaboration can drive breakthroughs that redefine our future this AI marks the beginning of a new era one where diseases are detected earlier research moves faster and lives are saved with precision and efficiency like never before that's the story behind this amazing AI from Washington State University it's exciting to think about how it could change healthcare for the
better so Microsoft is rolling out some big changes to two of its oldest most iconic apps Notepad and Paint both now equipped with AI capabilities and this is a full-on integration of advanced generative AI that brings some pro-level editing features right into the heart of Windows 11 all right first let's get into Notepad's update Notepad which has been around since 1983 has always been the simplest most stripped-down text editor for decades it's been a go-to for quick notes code or plain text documents without the distractions of a full-featured editor but with AI
Microsoft is turning Notepad into a smarter tool Microsoft calls this new feature rewrite now you can highlight text right-click and select rewrite to see alternative versions of your text this is more than just automated rephrasing rewrite in Notepad is designed to give you options for length tone and even specific adjustments based on how you want your message to come across for example maybe you have a paragraph that's too wordy you could use rewrite to condense it without losing the original message or if you're drafting a more formal email rewrite can help make it sound polished
the AI generated versions give you three distinct choices if those still don't capture what you need there's an option to retry and get more variations the tool preserves your original text in the rewrite dialogue so you can revert if the Alternatives aren't quite right in terms of convenience this eliminates the need to jump into other apps or web services just for minor edits it's all right there within notepad since rewrite is a cloud-based service you'll need to sign in with a Microsoft account to use it this makes sense as the AI processing happens on Microsoft's
servers right now it's only available in select regions for users in the Windows Insider program including the US UK France Canada Italy and Germany additionally Microsoft 365 subscribers in Australia New Zealand Malaysia Singapore Taiwan and Thailand can use AI credits to access rewrite this setup hints that Microsoft is exploring AI monetization options for personal and family plans and the fact that it's using AI credits indicates that it could be a pay-per-use feature in the future beyond the Windows Insider program beyond rewrite Microsoft has also sped up Notepad's launch time this update is expected to
make Notepad open around 35% faster for most users with some reporting a boost of up to 55% this performance enhancement complements the AI feature making Notepad feel snappier and more modern now Paint has also received a major upgrade with two new features generative fill and generative erase Paint like Notepad has been around forever since 1985 and is known for its simplicity Microsoft's adding a layer of sophistication that transforms Paint into something closer to a photo editing tool with generative fill users can insert new elements into an image with just a description here's how it works
select an area with Paint's selection tool pick the generative fill option and type in what you want to add let's say you have an empty background and want to add a mountain or maybe some clouds in the sky the AI will generate options based on your description and blend them into your selection you can cycle through different options using arrow buttons and if the initial choices don't look right you can refine your selection or adjust your prompt this feature is currently available only on Snapdragon-powered Copilot+ PCs Microsoft is clearly testing out this AI
with more advanced hardware to make sure performance remains smooth and lag-free Copilot+ PCs have the processing power to handle these intensive tasks locally which makes sense given the high demand that AI-generated visuals place on hardware it's possible that Microsoft will eventually optimize generative fill for other systems but for now it's in a controlled rollout generative erase is the counterpart to generative fill instead of adding elements it lets you remove them and then uses AI to fill in the background so it looks like the object was never there here's an example
if you have a photo of a street scene with a random car you don't want generative erase can get rid of it and blend the empty space with the surroundings as if the car was never part of the image this kind of feature is typically found in much more advanced photo editing software so seeing it in Paint is surprising to use it you select generative erase brush over the part of the image you want removed and then click apply if you need more precision there are tools to adjust the brush size or use rectangular and
freeform selections unlike generative fill generative erase is accessible to all Windows 11 Insiders which suggests Microsoft believes it's already optimized for general use now Microsoft has also updated two additional features within Paint co-creator and image creator co-creator uses a diffusion-based AI model which essentially builds up images by layering details creating faster and more accurate results this model is restricted to Snapdragon-powered Copilot+ PCs and has built-in moderation ensuring content generated with AI meets Microsoft's standards for appropriate and quality output
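the diffusion idea itself is simple to sketch: start from pure noise and let a trained model strip a little noise away per step, layering in detail each pass; the denoise_step callback below is a placeholder, not Microsoft's model

```python
import numpy as np

def toy_diffusion_sample(denoise_step, shape=(64, 64, 3), steps=50, seed=0):
    """Toy diffusion sampling loop: begin with noise, then repeatedly ask
    the model for a slightly cleaner image until detail emerges."""
    x = np.random.default_rng(seed).normal(size=shape)  # step 0: pure noise
    for t in reversed(range(steps)):
        x = denoise_step(x, t)  # model returns a slightly less noisy image
    return x

# placeholder denoiser that just shrinks the noise a bit each step
image = toy_diffusion_sample(lambda x, t: 0.9 * x)
```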
image creator has also been expanded to more regions now including the US France UK Canada Italy and Germany this feature lets users create entire images based on textual prompts similar to how generative models like DALL·E or stable diffusion work it's a huge addition for casual users who may need to generate images quickly without learning complex software Microsoft 365 subscribers in Australia New Zealand Malaysia Singapore Taiwan and Thailand can use AI credits to access this feature making it another example of Microsoft's AI credit system in action for now these features are limited to Windows Insiders on the canary and dev channels running Windows 11 Windows 10 users won't
have access and even those on stable Windows 11 releases will need to wait until a broader rollout this is a strategic move by Microsoft as it allows them to monitor feedback and make refinements before these features hit the mainstream these Insider channels often get the first taste of Microsoft's experimental features which is especially relevant for users interested in testing the latest tools by keeping the rollout restricted Microsoft gathers valuable data from a select group of users adjusting performance functionality and even pricing models based on their feedback the decision to add AI into
Paint and Notepad apps that traditionally cater to basic tasks signals Microsoft's larger strategy of democratizing AI tools across its ecosystem there's an accessibility angle here too bringing advanced AI-powered capabilities to apps that people have been using for decades without needing expensive software these tools lower the barrier to entry making high-level text and image editing possible for casual users students and nonprofessionals who wouldn't normally pay for specialized software this strategy also serves to show off the power of Microsoft's cloud-based AI services building trust and familiarity among users many users may find themselves using AI
powered rewrite or generative fill without even realizing they're leveraging cloud AI a clever way to integrate advanced tech into everyday life without overwhelming users with tech jargon or complex learning curves as AI becomes integral to productivity tools Microsoft's approach suggests a future where even the most basic built-in applications are empowered with AI it's a signal that AI is no longer just for specific apps or industries it's part of the core Windows experience Notepad's rewrite for example may help streamline workflows for students writers or professionals looking for quick and seamless text edits Paint's generative fill and
erase on the other hand could be invaluable to content creators casual designers or anyone interested in creating or modifying images without technical skills or software investments now Microsoft's integration of AI into Notepad and Paint is just the beginning the company is clearly planning to roll out more advanced AI capabilities across its ecosystem these new features in Notepad and Paint are about more than enhancing old apps they're part of Microsoft's strategy to make AI a seamless part of everyday computing with AI advancements like this the distinction between basic and advanced apps is starting to blur Microsoft's
tools are moving toward a future where even the most ordinary apps have extraordinary capabilities this approach benefits everyone from students drafting essays to designers experimenting with image layouts by providing powerful AI tools within a familiar interface so how do you feel about these updates are we stepping into a new era of AI in everyday apps drop your thoughts below and hey if this deep dive was helpful make sure to hit that like button and subscribe for more AI and tech insights thanks for tuning in and catch you in the next one