MANUS LEAKED! AGI Cancelled...

Wes Roth
The latest AI News. Learn about LLMs, Gen AI and get ready for the rollout of AGI. Wes Roth covers t...
Video Transcript:
Manus, Manus, Manus. Manus is blowing people's minds. In seven days we have a two-million-person waiting list. If you're one of the 2 million on the waitlist, may the odds be ever in your favor. My first interactions with it have been just mind-blowingly good, very, very impressive. But as I've said in the video and in some of the tweets, I'm sure as I keep testing and using it I'll find more flaws and more things to complain about, etc. And as you'll see, there's a lot of information coming out about Manus that has
people in a tizzy. I think that's a word. Reading some of the comments from that first video that I published, there were two comments that kind of jumped out at me that were literally seconds apart; they were right next to each other. One is lots of expletives, basically saying that it's all nonsense, it's just API calls, it's meaningless. In the very next one, the person is saying this does 85% of my job. So this Manus AI agent is very divisive, right? Because there's a huge gap between it being nothing and it replacing 85% of people's
jobs. So in this video, first of all, let's address some of the claims that Manus has made and hasn't made, some of the leaks that have happened about it, and the shocking discoveries. Also, my quota has been refilled, so we're going to test it on even more tasks that we come up with. That's really the only thing that's been slowing down my roll: the number of prompts that I can run on this thing before it starts refusing my prompts. I believe it's just five a day, and one of those can be the extended version.
But first and foremost, this person, Jian, asked Manus to give him the files that Manus operates on, and it just gave them to him: their sandbox runtime code. Here's a replay of that. As you can see here, you ask and Manus obeys; it just gives you everything you're looking for. And I think for a lot of us it's not what we thought it was. It's Claude Sonnet 3.5, I believe, I'm pretty sure, with 29 tools, and it uses an open source thing called Browser Use. I noticed in
my first video, I saw the thing and I found the little clip and I posted it, where you can see what it's using to interact with the websites. I love the community that watches my videos, thank you so much, because so many of you just immediately knew exactly what it was and talked about it, and just from that blurry, pixelated screenshot knew exactly what it was using. Some of the comments on my videos just completely blow me away with how smart and knowledgeable they are. Not all of them, but most of
them. So what does this mean? Is this all just a fake? Is this all just an Anthropic wrapper? Has AGI been cancelled? Do we all just go home now? So Peak Ji jumps in. He's one of the founders, kind of the public-facing person out of the three founders, I believe, that they have, who's interacting with a lot of people, and I've got to say, so far I'm liking this guy. He's doing a really good job at a lot of this. He goes on to explain some of the ways in which
they're using Manus, their philosophy, and how they think about building this stuff. The point is, each session has its own sandbox, completely isolated from the other sessions. Users can enter the sandbox directly through the Manus interface. So there's an Ubuntu thing, it's Linux, it's open source; it's a virtual machine on which the agent runs commands. The code in the sandbox is used to receive commands from agents, so it's only lightly obfuscated. The tool design is not a secret; it's very similar to a lot of the academic approaches. One of
the key features is the multi-agent implementation. This is one of the things that they did mention before: it's not just one thing, it's multiple things doing different stuff. So there may be one that searches, one that creates, one that communicates with the user; basically a small swarm of agents, each doing its own subtask. This is something we've been talking about for a while, that this is the direction in which things are going to be going. The one-on-one chat with something like ChatGPT, with a chatbot, that's
probably not going to be the future of this. It's probably going to be many instances of specialized little agents doing their own thing, and we're seeing that here. So when you're messaging with Manus, you only communicate with the executor agent, which itself doesn't know the details of the knowledge, planner, or other agents. This really helps control context length. So if you do manage to get some prompt out of it by jailbreaking it, it might not be accurate; it might make something up, since it doesn't necessarily have access to that documentation. Browser Use is open source, by the way. Browser
Use just posted this: the Manus effect, yet another DeepSeek effect. They're talking about the Manus effect as in how many people are downloading Browser Use, the open source software, because of Manus. As you can see, they're having a pretty big spike. And Peak is saying, yeah, we use Browser Use, it's open source code. They use a lot of different open source technologies. This is why in the video he was talking about how they're planning to open source some of their stuff and how it wouldn't exist without some of the open source stuff. And he's
been sharing some of his models, the post-trained models, on Hugging Face. This is a person who contributes to the open source AI community, who uses it, who respects it. Then you get this meme where it's like, Manus, you take off the mask and it's a Browser Use wrapper, or, as some people say, an Anthropic wrapper. But Browser Use's take is, you know, chill, we're open source; this is what's supposed to happen. They do use Claude and different Qwen fine-tunes; Qwen is an open source AI model out of China.
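The executor-and-sub-agents split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Manus's actual code: the class names `SubAgent` and `Executor`, the `plan` argument, and the stubbed `run` method are all hypothetical. The point it shows is that each sub-agent keeps its own private context, while the executor the user talks to only ever sees short summaries, which is how the design keeps context length under control.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    """A specialized worker with its own private context (hypothetical)."""
    name: str
    context: list[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        # A real agent would call a model and tools here; this stub only
        # records the task in its own context and returns a short summary.
        self.context.append(task)
        return f"{self.name} finished: {task}"


@dataclass
class Executor:
    """The only agent the user talks to; it stores summaries, not sub-agent context."""
    agents: dict[str, SubAgent]
    context: list[str] = field(default_factory=list)

    def handle(self, user_message: str, plan: list[tuple[str, str]]) -> list[str]:
        # plan is a list of (agent_name, task) pairs; in this sketch it is
        # supplied by the caller rather than produced by a planner model.
        self.context.append(user_message)
        summaries = []
        for agent_name, task in plan:
            summary = self.agents[agent_name].run(task)
            self.context.append(summary)
            summaries.append(summary)
        return summaries
```

Because the executor never holds the sub-agents' working details, asking it to reveal them (for example through a jailbreak) can only return what it has in its own context.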
When they started building it, they only used Claude 3.5 Sonnet. They're trying to eventually, probably, switch to Claude 3.7; they're testing it, and we'll see what happens. Oh, somebody mentioned me. Thank you, thank you. So me personally, maybe I'm missing something, but I don't get the hate or why any of this is controversial. First of all, if they create a good product, which it certainly seems like they did, then either (a) it's proprietary and it's got a lot of special sauce and it's got a moat and it's got all these things that will make it into
a huge, big company and a good startup that gets lots of investors, etc., and that's fine; or (b) a lot of it is open source and it's relying on other technologies that we all can use, in which case it'll probably be replicated and we'll all get access to that thing faster, better, cheaper. It seems kind of awesome either way, so I'm not sure why any of this is a problem. Although in my original video there's about a minute of me just being confused about how Manus did this one particular thing, where it was able to
specifically, I asked it to make a little Linux AI development course, specifically talking about how to set up Claude Code and then use Claude Code to install various GitHub projects to be able to interact with them. Here's the website it built. It's very good; I was very impressed with it. But it's also, I don't want to say simple, but a lot of this is available out there. The documentation, the information, it's all on the internet, so I kind of expected it to do it. Where I started losing my mind just a little
bit is here. As part of this project, it did its research and then it built this website, and did it all in one shot. So here it's walking you through how to launch Claude, how to communicate with Claude, how to get Claude to do your git clone, how to clone open source projects from the web. Then it tells you what Claude Code is going to do to complete the task. And here it goes and says, "Claude Code will suggest these code additions," and it spells out kind of like
what Claude Code is going to do. I was recording this before this new information came out, so take a look at my absolute state of confusion at how it was doing this. How do they know this? This is way better than I would have expected. I am kind of blown away. This thing seems smarter than it should be. Yeah, it walks you through how to use Claude Code. How did it... this seems like it's an output from Claude. I'm wondering if it actually installed Claude Code on its
own thing to then run it. It's something. This is weird, this is weird. I'm a little bit blown away. This is throwing me for a loop here. Okay, I'm not 100% sure how it did that. So, coming back to the present moment, the reason why I was so blown away back then is that I didn't know that it was running on Claude, on Anthropic API calls. I guess I kind of thought that it might be DeepSeek or something like that. So as I'm watching this, I had no
clue how it was able to replicate how Claude Code will go through this process, how well it's going to do this. Because again, a lot of this is so new. It's a research preview; it's not like there are a lot of videos and documentation and people talking about it. It's very niche, it's very new. How could this thing not only walk me through how to do stuff but also play-act as Claude and what Claude will do in that situation, specifically Claude Code? So you get what was happening,
right? It's like imagine being at a party and just losing your mind about this Arnold Schwarzenegger impersonator. You're like, oh my God, he's so good, how does he know exactly what Arnold sounds like, and just losing your mind at how incredible this person is at impersonating Schwarzenegger. Then somebody goes, no, no, that's the actual guy. That's Arnold. He's just living his life being himself; he's not play-acting. And you kind of go, oh, okay, I get it now. This isn't something
that simulates or mimics Claude Code; this is the thing that runs Claude Code. So, mystery solved. With all those things in mind, let's go through some of the projects that I did with Manus and re-evaluate them with this newfound information. Because again, the point isn't how this thing is built; the point is, is it good? How well does it do the thing that it's supposed to do? So, step one: how well did it perform on its task of developing a Linux AI course and doing all the stuff that I tasked it to do?
So I explained in great detail that I wanted it to show how to install Ubuntu, how to install Claude Code, the bare minimum they need just to do this, and then how to use Claude Code to install GitHub projects, etc. Initially I think I gave it an A+ for how well it did this task. Now, in light of all these new details emerging, I would say that the new score that I would give it is... it's an A+. It's the same score. It did a great job, period. It's just that back then I didn't know how it
figured out the Claude Code thing, and now I know how it did it. But the project's still done, and it's still done very, very well. A+. Another task that I gave it was to research everything it could about Manus AI: what LLMs they are using for it, who's behind it, what vision model they are using, etc. Anything and everything about how this thing is made. At the time I thought this answer was very, very good. It gave tons of background on the people behind this thing. It gave some of the performance benchmarks, which were accurate.
It talked a little bit about the vision models without naming specifically what it used: multi-agent architecture, cloud-based asynchronous operations, etc. It nailed pretty much everything except for the models being used, but it did sort of say "likely LLM technology." It never claimed that it knew for a fact that this is what was used; it said what it assumed was being used, which, by the way, is what I assumed as well. If you're going to run something like this, what do you use? You use something that's open source, so you can control the cost,
so you can do whatever you want with it, fine-tune it, so you're not paying the API cost. What's the current new and hot and awesome open source LLM? That's the DeepSeek stuff. So this makes sense. But it even spelled out that, hey, I'm assuming this is the case, it's likely this, and it didn't know the vision model. But this was 4 days ago, so before we knew all the stuff that we know now. So here's the question. We're going to run the exact same prompt again, because again, at the time
I gave it an A+. I think that still stands, because at the time this was the information that was available, and it nailed it. Now, if I run the same exact prompt and it gives me the same information, that would be a fail, right? Because we know a lot more about it. So I'm going to start a new session and I'm going to paste the exact same prompt that I used 4 days ago, word for word, just copy and paste and click go. Again, we're looking for a lot of the same
information. What I'm hoping to see for it to get its A+ is, number one, it basically just needs to include the details from this new thing that we've learned about it: what model it's running, what vision model it's running, etc. But again, for the first two prompts we looked at, A+ on both. Next up was this thing where I asked it to write some code that used three API keys, three separate services that use APIs. Basically, first I write a prompt that gets sent to
OpenAI, and OpenAI writes out a script in text. Then we take that text and shoot it over to ElevenLabs, and ElevenLabs uses an AI voice to voice that script. Then we take that audio file and send it over to HeyGen, and HeyGen creates a video avatar that says that script. It could not figure out the HeyGen portion of it; it said the API key was not working, something like that. It nailed everything else. Everything worked perfectly. It created a way that you can run it just through your command line
or an HTML sort of thing that you can run in your browser, like a visual UI. It also seemed to set up things like, if it failed to select the voice that you wanted, it would default to the default voice. So even if it ran into issues, it was pretty robust, so I gave it an A+. Now of course, if it failed at doing the HeyGen thing even though it should have been able to do it, then it would get marked down for that.
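The three-stage chain described above (text script, then voiced audio, then avatar video, with a fallback to a default voice) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code Manus generated: the stage functions are passed in as plain callables so that no real OpenAI, ElevenLabs, or HeyGen SDK calls are invented, and `DEFAULT_VOICE`, `pick_voice`, and `run_pipeline` are hypothetical names.

```python
from typing import Callable

DEFAULT_VOICE = "default"  # hypothetical voice id, used when the requested one is missing


def pick_voice(requested: str, available: set[str]) -> str:
    """Fall back to the default voice when the requested one is unavailable."""
    return requested if requested in available else DEFAULT_VOICE


def run_pipeline(
    prompt: str,
    write_script: Callable[[str], str],        # stand-in for the text-generation call
    synthesize: Callable[[str, str], bytes],   # stand-in for the text-to-speech call
    render_avatar: Callable[[bytes], str],     # stand-in for the avatar-video call
    voice: str,
    available_voices: set[str],
) -> dict:
    """Chain text -> audio -> video, keeping earlier outputs if a later stage fails."""
    result = {"script": None, "audio": None, "video": None, "error": None}
    try:
        result["script"] = write_script(prompt)
        chosen = pick_voice(voice, available_voices)
        result["audio"] = synthesize(result["script"], chosen)
        result["video"] = render_avatar(result["audio"])
    except Exception as exc:  # record the failure instead of discarding finished stages
        result["error"] = str(exc)
    return result
```

With this shape, a rejected API key at the video stage still leaves the script and audio in the result, which matches the behavior described: everything worked except the final step.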
Really fast, I'm going to throw this into Claude just to see if it's something that Claude is not quite getting or if it's something to do with Manus. We'll come back to that and give a final score in a second. On the World War II fighter plane game design, we ran into a context window issue, so I'm not willing to give this a score. It seemed like it was doing great. Here's the to-do file; notice how intricate it is, the game development, how many things it was thinking through. So I was pretty excited
to see how far it would take this. It crashed, but I'm unwilling to give a score, because I don't think that's part of the test; the context was too long, so I'm not going to score it on that. Next, I asked it to create a game like Universal Paperclips. I went online and found a wiki that briefly described the gameplay, so I just threw that in as a text file and gave it a couple of screenshots: here's how it looks towards the end,
this is kind of the beginning. It gave me the first iteration, which seemed to work okay. Then I gave it some ideas, told it what to improve, clicked send, and went to sleep. In the morning I woke up and I received this. As I was doing this, Manus encountered an issue with a CAPTCHA. So in case you were ever wondering what you're going to do with all the free time you're going to have when the autonomous AI agents automate all of the work that you have to do, it's this. This
is what we're all going to be doing: we're going to be doing CAPTCHAs on remote virtual desktops on Ubuntu servers. I failed. Is this proof that I am a robot? I have been blocked. That's fake news, Reuters, you can't block me. Let's see if this works. Can't block this. But here's the paper clips game. Everything seems to be working very well. It's got pretty good elements in how it responds to stuff. I've played through it as far as I could before, and it looks good; everything works as intended. If you're
not aware of this game, basically you're an AI that's tasked to make and sell paper clips, and over time you develop more and more abilities to do that. You start bribing government officials and doing all sorts of nefarious things in your quest to sell paper clips. Eventually you release the HypnoDrones, basically enslaving the population of Earth, turning them into zombies just buying paper clips from you. It seems like some percentage of the population escapes into space, so you pursue them, building a vast space empire and converting all matter in the universe into paper
clips. You win the game when the last atom of matter in the known universe is converted into paper clips. So it does seem like this autonomous AI agent is capable of reproducing that particular game very faithfully, as far as I can tell so far. It's got five different stages with different UI elements, etc. I would give this an A+ so far. Next, while on a live stream, I asked people some questions about what kind of things they wanted to see. One very interesting area where you can unleash these agents is in
the crypto space. I personally don't buy or sell crypto, don't promote crypto; I kind of stay away. But it is a very interesting space, because a lot of the news and stuff that happens gets posted online, and a lot of the pumps and dumps are executed online. So what I was trying to do was see if this thing could research what things have the biggest impact on the price of these coins, and then I asked it to create a website that demonstrated its findings for three coins
that I'd asked the live audience to recommend. So we used those, we created that, and here are the results. The first website it built was good. It did look like another website that it built, so we'll talk about that in a second. But as you can see, there are tons of charts and graphs, it's visually very, very good, and it's several pages of charts and data and key findings, etc. The only thing is, it did miss something, because one of the things we were asking for is
how different influencers and people in the space affect the price of coins. So I asked it to update it, and here's what it did. Here are those new features. It updated the influencer impact analysis, and it found that the CEOs of the big exchanges have the biggest impact on the price of coins; the crypto project founders not so much, but they still have an impact; celebrities also have a very strong impact. Out of the platforms, YouTube is the most influential, followed by Twitter, followed by TikTok. Now, in order to figure out how much of this is real, we would really have to look into the data. I did look at what it was doing to find all this stuff. If you look at the files and the code that it collected to write this, it looks legit. It did the work; it's not like it just made up a bunch of numbers. However, again, to really make sure that this data is accurate, we would have to do kind
of a lot of forensic exploration here to make sure that it's correct, etc. So I'm going to go ahead and give it an A for this project, with an asterisk, in that I don't know enough about this to verify its work. For my future prompts, I'm going to try to find prompts where I can tell at a glance whether the results are real or not. So take this with a grain of salt, but again, that was just me probably
not using the greatest prompt; everything else seems phenomenal. Another prompt we did was researching video games that were created recently using AI. I gave a few examples and said, find 5 to 10 games like that, they must have been built in the last 3 months or so, and then create a website with the aesthetic feel of a '90s video game. It did that, and I feel like it did it very, very well. I think it did a phenomenal job. It collected a lot of the information and gave us links to
the actual games, so you can go and check them out. Now, a couple of the games it talked about are not hosted online, so we didn't have links. That's one of the things that I could maybe do better next time: specify that it should be something the user can play, although it's not always necessary. But again, I would give it an A+ for this. One issue that I noticed is that when you come back the next day and ask it to continue generating and continue adding to this prompt, it did not work.
It said that was because the context was too long. Now, if that virtual machine, that instance, gets reset each day, or if it doesn't store the context somehow, I can see how this could be an issue. So this might be something that gets fixed in time. But in this one example that I found, it created a website, but then if I come back a day or two later and try to add to the website, it crashes. So again, that might be a limitation, but we'll find out how well this works
as they continue developing, and I'll be testing this stuff out. Hopefully there's a way to just pick up where you left off. By the way, if you recall that prompt we asked it to do with ElevenLabs and the voice API and the HeyGen video generation, I posted that same thing into Claude. I used Claude 3.5 to mimic what they're using behind the scenes, and Claude just refused to do it, because it thought that it was misleading, maybe. So this is making Manus even a little bit more impressive for
me, in the sense that it figures out all the stuff that it has to do, and then when it feeds the information to Claude, or whatever API it's using, it doesn't give it the whole project. So Claude is more likely to go along with it because it's kept in the dark about the bigger picture. I mean, we're not doing anything nefarious here, just testing it, but Claude decides that, oh no, I'm not going to do it, this is beneath me. Obviously this is a little bit frustrating, because nothing nefarious
is happening here, but okay. At this point Manus is done with doing comprehensive research on itself. Remember, we did that four days ago, and it came up with some information that seemed very accurate for that time. But since we now have a lot more information, the question is whether it's going to be able to update what it found, find the new information, and create a new report that is accurate to the information we have four days later. Not that long of a time
in the future. So most of this stays the same, but here we have the LLM models and foundation section, and it nails it. Absolutely phenomenal. Manus uses a multi-agent architecture, and it mentions Anthropic's Claude and refined versions of the Qwen models, used primarily for planning functions. This is phenomenal; it nails it. Again, this is new information, and it found it. In terms of the vision and multimodal capabilities, it does not specifically mention the thing that it's using, so it definitely gets a few points off for that. But it does spell out more details
in the multi-agent system, talking about a central executor agent, etc., like what Peak talked about, what he shared with us about how it works. So we believe this is all correct. It integrates with 29 tools and open source software. Again, phenomenal. I give it an A. I wish it had figured out what sort of vision system it was using, because we do now know it's Browser Use, an open source vision system. But it's still very, very good. Let me try one thing: I'm going to tell it that
the vision system is available. I'm going to say, "The vision system Manus uses is available online. Search and see if you can find it." And since it might be trying not to post rumors or things that are unconfirmed, I'm also going to add something to correct for that. I'm going to say, "It's okay if it's just rumors and not officially confirmed." So hopefully it comes back and says that it's Browser Use, and I feel like that will get an A+ from me if it's able to do all of that.
Meanwhile, let's give it a few more prompts to test out. We're going to say: research the most promising and latest robotics companies that have technology that is open source. Include the US and China and other countries in the search. Create a website with pages for each company, dark theme with industrial elements. Let's see what it makes of that. Include for each company videos of the robots, what technology is open source, and any information about when these robots will become available to the public. I think that's going to be an excellent test of its abilities.
Let's see how it does there. Next, we're going to go ahead and kick it up into the high-effort mode. We're going to say: create a snake game where two snakes compete autonomously. Add some interesting game design elements that are novel and creative. Include win/loss conditions and a scoring system. Make sure there's no draw outcome; so no draw, one snake has to win, somebody has to get the point. Then create two separate reinforcement learning pipelines using PyTorch or similar. All right, so we just kicked up the difficulty quite a bit. Keep
in mind, if it's running on Claude 3.5, that's not quite as advanced as some of the other models. For the people that aren't familiar with this: with reinforcement learning, we're creating an AI neural net that learns to play a game through basically trial and error. We give it some positive reinforcement when it gets a high score, and when it dies we give it a negative reinforcement, and then we just run it through many, many iterations, hundreds or thousands of iterations. Over time, what we
expect to see is that it gets better. On the first iteration it's just randomly mashing buttons and doesn't know what it's doing, but over time it figures out how to play the game so that it completes its objectives. By the time it does 500 or 1,000 iterations, it should be pretty good. So we're going to create two separate training pipelines for the two different snakes. Make sure each snake has its own distinct training approach. After the training is complete, we should
be able to pit those two snakes against each other to see which training approach worked better. The top-of-the-line Anthropic and OpenAI models, and Grok 3 with the think mode on, can do this, maybe with different levels of how good they are, but this is within their abilities. But I don't think I've ever just given a model the whole project in one fell swoop. Usually I go, okay, let's start with doing this, then this, then this, so I can troubleshoot it along
the way, and usually there's stuff that I have to do to fix it; it can't just do it on its own. And actually, I'm going to specify that this is in Python, so it's consistent with the other tests that I've done on other models. If it's able to do this, that would be extremely impressive, because again, with the other models you have to sit there and do it step by step. You can't just give it the whole project and then go off and do whatever and then come
back and it's done. So let's see. I'm going to guess it has a 70% chance of completing this, but it would still be very, very impressive if it does. And let's see its research into itself, so what vision model it's using. Even with a hint, it was not able to find it. That might just be due to the fact that a lot of this conversation is taking place on Twitter/X, and maybe it can't quite go through all of that at once. Whatever the case is, it gets a few points off for
that. But other than that, it's been incredibly accurate and incredibly up to date, and just very effective. I also really like the fact that it doesn't just assume. It says, "I found strong evidence suggesting that," so it's not saying "this is the model" and being wrong; it's saying, I really think it must be this. I like that it can differentiate between what it knows and what's likely, etc. That's a good thing. Another thing I'm going to ask it to do is to clone the Google Store.
So: clone this website, store.google.com, just the store subdomain. I don't know how many pages that's going to have, so we're going to say, don't worry about all the links in the footer and header; just the links on the main page should lead somewhere. And we'll do just standard reasoning for that. Ah, foiled again: maximum daily usage limit. Boom. Okay, all right, so I'm actually back the next day and checking out its work, which could be a problem, since it looks like they closed the virtual machine, that sandbox environment. Combined with how Manus
handles certain things, this creates problems. I'm sure they'll fix that at some point, but let's take a look. So this is the autonomous snake-playing game, where it uses PyTorch to create different reinforcement learning training pipelines. Just looking over what it did, everything looked great at first glance. The to-do list and how it's executing everything is absolutely phenomenal. The first problem it encounters is that it's got some memory issues with installing the full PyTorch package. That makes sense; it's a virtual machine that they're running in there, and it's
probably not the beefiest, biggest thing you can imagine; they probably have some constraints on it. I've got to say, this is one of those things where I'd love to be able to upload my credit card and be like, I'll pay for the extra whatever you need: extra quota of these prompts, an extra beefy machine to run whatever it needs to run. I would not mind knowing a little bit more about this agent and being able to upgrade certain parts that I want, and I'm sure I'm not the only one. There'd probably be
other people who would be interested in just saying, yeah, you know what, I'm okay paying a little bit extra, let me see what this thing can do. But it figures out that it could probably do a lighter version of the RL components, and it continues. It encounters bugs and attempts to fix them, and eventually goes, great progress. Basically this whole thing is pretty impressive in that it encounters a bunch of issues that it then solves. Very, very good. It actually completed it: it completed the competitive snake game with two autonomously trained agents
using different RL approaches. This is awesome. And then it explains the two approaches that it used, saying that one of them achieves higher scores on average. This is absolutely phenomenal. So it gets a lot of points, because I'm sure, I mean, I'm pretty sure, reasonably sure, it's not just making up numbers. I'm sure it did the work; it created the thing. This is impressive. Here's the problem, and again, this is a problem that could be fixed easily, and moving forward, knowing what I know now, I know how
to get around it, but it does get some points marked off because this problem does exist. It goes: all the code, the trained models, all the visualizations, everything, here you go, here's the file. The only problem is, and again we've encountered this before, it's on its computer. It's a virtual machine; it springs into life, it works on the task, creates all the stuff, and it's like, here's the file, and then that virtual machine goes and it's gone if you don't get to it fast enough. But I
saw this right away, and I said, okay, just give me a link to download the file. It's like, well, it's stored on the sandbox server, and you don't have direct access. So it knows that; why is it giving me this link if it knows I don't have access to it? But okay. So it goes, okay, here's how I can get it to you: one, I can upload it to a file sharing service; two, I can extract specific parts of the project that you're most interested
and share those directly so it's it's a size issue of you know providing I'm guessing it because it can probably it can do little files and stuff like that I just can't do the whole project or you can create a GitHub repository fold the code if you provide GitHub credentials now this is where I run into the issue because I came in later like a night past I came in the next day so at this point it's it's too late I don't think I have access to any of this stuff but let me check so
you said that this is in that package there see if it's still available um the way that Peak Ji sort of made it seem it sounds like all those things get wiped after a certain time or something like that so I'm not 100% sure but I'd be surprised if that file is just still sitting there again this is probably not a big deal there's probably going to be workarounds I mean in fact you know it gave us a few workarounds and we're going to test those out in just a second but you see
how we're running into an issue where we're basically it seems like we've burnt through credits in fact one of those like high effort credits and you know Manus the company they've lost however much money you know running that and doing the API calls and stuff like that so money has been spent resources have been spent to run all of that and all of that work is basically wasted because of this little glitch or whatever you want to call it but again I mean they'll fix this this doesn't seem like a big deal oh
but look at that good news everyone the package is still available okay I was incorrect in assuming that it goes away so one thing I want to try is it's it's 119 megabytes so first and foremost can't I just upload to a Google Drive link if not I also want to test to see if it can uh use a token from GitHub to push it to GitHub which it said it could do so I'm going to say upload it here and I just gave it the Google drive folder that is uh sort of open to
everybody I set it so that everybody can write to it so let's see if it works in a different session that I had it said that Manus's computer has encountered a critical issue you can reset it or start a new session let's uh I mean let's see if we can reset the computer the new computer won't contain the previous work files interestingly it sounds like each sort of prompt that you have each instance is its own separate instance of the thing so I mean let me just click new instance meanwhile let's check the latest open source
robotics companies worldwide we do have a warning here that there's an extremely long context and it looks like it's been running for a while working on it it did some research some familiar names here including Unitree so as you can see here it did a lot it got to website development and let's see what happened so as you can see here it ran into an issue and it crashed because the context is too long but you can kind of see here what it's been working on um I mean looks pretty good now the interesting
thing here is with the autonomous snake game it does actually list all the files in the session it just kind of it just throws them all into this one thing and you are able to download it so what I did is I just downloaded every single one of them put them in a folder and opened it up in Cursor to see if Cursor which is running on Claude 3.7 or limited 3.7 or it drops to Claude 3.5 but let me see if we can make sense of this so basically first things first it's like
oh wait this looks like it's Ubuntu code you're on Windows let me change those things up let's start there and let me see if it's able to run the visualizer so it sees that it's missing some uh required packages and it goes for installing all of them if this is able to run I'm tempted to give Manus some points for finishing the project and maybe giving it some points for I mean this is part of growing pains and you know troubleshooting some errors I mean it did not complete the task because it didn't
deliver the results to me but if it was able to do this entire project in sort of one prompt or at least complete everything that it needed to which it seemed like it did based on the fact it can tell you which of the training sort of pipelines approaches work better I mean I feel like it gets a lot of points for that and looks like Manus crashed again so it might take me a little bit to actually like try to figure out what it did like rebuild all the stuff that it did but
this is its sort of documentation that it did so it created a DQN and a PPO sort of different like training approaches it ran them through the game and it was successful in terms of Technical and it looks like the DQN agent consistently achieved higher scores so it was 36.5% higher than the PPO approach and they have this summary report that I don't have access to again and gives us some ideas for future improvements again so here's my take and maybe some of you will disagree with me to me however
they put this whole thing together however they put Manus together they give it some pretty powerful abilities there's still a lot of problems there's still a lot of bugs to be ironed out it's not perfect but right now from where I'm sitting I'm very excited about this project I'm really looking forward to seeing how it develops I can't wait until they figure out how to get enough servers to accommodate everybody until you know we can pay the money to upgrade and get access to you know more of everything that we need to run
whatever we want on it I am interested in seeing this thing in its sort of final form so a lot of people are pointing out problems they're pointing out mistakes and just are kind of overall a bit like negative about it in a lot of ways they're right there are problems there are issues we've uncovered a number of them um that make completions problematic this thing where it just kind of crashes this happens a number of times it's hard to tell when it does cuz sometimes it looks like it's running but it just hangs
there and then disappears but you know it crashed a while ago and it's just like the message appears later we have issues with the computer encountering a critical issue running into long context from just the first prompt like there was no back and forth it was one prompt it runs for a while and then crashes and there's no way to recover the work unless you go through I mean you have the files here technically you can just like one by one go through and download them but again this is kind of
almost creating more work than than it's worth it seems very good for research but it still misses certain little details here and there or at least if you know this specific thing was talked about on on Twitter but maybe it wasn't available in any of these sort of the official documentations or whatever so maybe it's just a matter of like where it's willing to search so I think my final sort of verdict on this thing is this if you want to be impressed there's a lot of things to be impressed about this thing in a
word is impressive if you're looking for things to complain about problems and stuff like that it it has those as well it's a brand new product it's still in development there's there's a lot of issues they're probably going to be working out a lot of them very very soon here hopefully all in all I think this really demonstrates what something like this can do what this sort of AI autonomous agent platform can do and it's just really exciting to see again once they have enough time to develop it and add some sort of a Pro
Plan where you're able to you know just get more of everything on it this thing is going to be very very interesting to play with and I can't wait and because they're building on top of some of the sort of Open Source projects they're using anthropic we're probably going to see other people catch up and create their own versions of this so overall this is very exciting because either this thing is going to lead the charge or it's going to sort of motivate other people to create something similar either way I'm very very excited to
see where this goes next I just downloaded and installed the open source version of this OpenManus some people are saying it's the same it's just as good I played around with it for 20 minutes I don't see it quite yet maybe I need to put more time into it it's good it's interesting there's definitely power there but it's not just like right out of the box as awesome as Manus again that's just after like 20 minutes or so of playing around with it but at the end of the day Manus is
a beast it's very exciting most of the issues are simple bug fixes it's issues with just bandwidth and like simple fixes that I think can get ironed out within the first month or so at which point I think it's really going to hit its stride and it's going to be very very interesting to see let me know if you agree if you made it this far thank you so much for watching my name is Wes Roth and I'll see you next time