Last year, a paper out of Stanford put a small town of AI agents in a fully simulated environment and let them live their lives. The agents formed relationships, built up memories, and developed their own personalities. It was stunning, and it gave us a vision of what could be possible: simulating entire societies, or even what the future of video games might look like. Imagine a game world where NPCs had actual personalities and backstories, and lived their lives in real time. That was the promise of that paper. But now there is a new paper by the same author out of Stanford, and it shows that it's possible to get real human personalities into these agents, which then live their lives in these simulated environments. It is mind-blowing. Let me break it all down for you.

So this was the previous paper: "Generative Agents: Interactive Simulacra of Human Behavior." What they found is that by putting all of these AI agents, powered by ChatGPT, into a simulated environment with a little bit of backstory, the agents would develop their own personalities and relationships, form friendships, and make plans. For example, one of the agents threw a birthday party and invited all of their friends. But not only that: those friends invited some of their own friends, they coordinated, and they showed up to the party. It's really incredible to think about. But imagine this: you could put your own personality into one of these agents.

Now let me show you the new paper. By the same lead author, Joon Sung Park, we have "Generative Agent Simulations of 1,000 People." The gist of what they've done is to take that earlier setup but first interview a thousand
people, in a two-hour interview of various questions, trying to extract the personality of each real human, and then place those personalities in the simulated environment. Listen to this: "We present a novel agent architecture that simulates the attitudes and behaviors of 1,052 real individuals." They not only replicated the personalities of these real individuals, they were actually able to test and show that those agents behaved, and held the same thoughts and attitudes, as their human counterparts. And the results? "The generative agents replicate participants' responses on the General Social Survey 85% as accurately as participants replicate their own answers two weeks later."

So what does all of this mean? Let me break it down. They used very common social science instruments, such as the General Social Survey, the Big Five personality inventory, well-known behavioral economic games, and other social science experiments, to try to extract the essence of what makes up somebody's personality. They then took those two hours' worth of interview, converted them into memories for the agents to base their answers on, and retested the agents against the General Social Survey, the Big Five personality test, and the social science experiments. They found that the agents' answers were essentially 85% as accurate as the humans' own answers when asked those same questions two weeks later. That is extremely accurate.

But why would they do this? What is the point? According to the paper, these simulations could help "pilot interventions, develop complex theories capturing nuanced causal and contextual interactions," and, in my opinion most importantly, "expand our understanding of structures like institutions and networks across domains such as economics, sociology, organizations, and political science." Essentially, we can start to predict how people, organizations, and societies will behave without actually having to implement something extreme first.
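To make that "85% as accurately" figure concrete: the metric normalizes each agent's accuracy by the participant's own two-week test-retest consistency, so a perfectly self-consistent human would score 1.0. A minimal sketch of that idea (the function and variable names here are mine, not the paper's):

```python
def normalized_accuracy(agent_answers, original_answers, retest_answers):
    """Agent accuracy against the participant's original answers, divided
    by the participant's own two-week test-retest consistency.  The shared
    denominator (number of questions) cancels, so this reduces to a ratio
    of match counts."""
    agent_matches = sum(a == o for a, o in zip(agent_answers, original_answers))
    retest_matches = sum(r == o for r, o in zip(retest_answers, original_answers))
    return agent_matches / retest_matches

# Toy example: the agent matches 7/10 of the original answers, while the
# human matches only 8/10 of their own answers two weeks later.
original = [1, 2, 1, 3, 2, 2, 1, 2, 1, 2]
agent    = [1, 2, 1, 3, 2, 1, 1, 2, 3, 1]
retest   = [1, 2, 1, 3, 2, 2, 1, 2, 3, 1]
print(normalized_accuracy(agent, original, retest))  # 0.875
```

The key point of the normalization: humans themselves are not perfectly consistent, so raw accuracy against the original answers would understate how human-like the agents are.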
For example, say we have an idea for a completely new tax plan and we want to see how people will behave under it. Rather than actually implementing it and then watching what happens, or predicting with less accurate methods, we could set up entire societies of AI agents based on real people and see how they might react to these massive tax changes.

Today's video is brought to you by Mammouth. Mammouth AI brings all of the best models together in one place for one price: Claude, Llama, GPT-4o, Mistral, Gemini Pro, and even o1. Rather than having to pay for each of these AIs separately, you pay $10 to Mammouth and they bring it all together in one place. Plus, they have image generation: Midjourney, Flux Pro, DALL-E, and Stable Diffusion, again all for $10. Models are updated as soon as they're released, so be sure to check out Mammouth for access to all the best models for one low price: mammouth.ai, that is m-a-m-m-o-u-t-h dot AI. Thanks again to Mammouth.

Now let me tell you how it works: we
present a generative agent architecture that simulates more than 1,000 real individuals using two-hour qualitative interviews. The architecture combines these interviews with a large language model to replicate individuals' attitudes and behaviors. By anchoring on individuals, we can measure accuracy by comparing simulated attitudes and behaviors to the actual attitudes and behaviors. They tested it against the General Social Survey; the Big Five personality inventory; five well-known behavioral economic games, like the dictator game, the public goods game, and the prisoner's dilemma (which is, funnily enough, out of Stanford); and five social science experiments with control and treatment conditions sampled from a recent large-scale replication effort.

So how did they create agents that essentially replicate people's thoughts and behaviors? They turned to in-depth interviews. Not surveys, not simple question-and-answer forms: real, long-form, and at times dynamic interviews. As the paper puts it, interviews "combining pre-specified questions with adaptive follow-up questions based on respondents' answers are a foundational social science method with several advantages over more structured data collection techniques." These were semi-structured interviews, meaning there is a fixed set of questions, but they also allowed for dynamic, non-predetermined follow-up questions. And the interesting part:
they actually used AI to conduct all of the interviews. This freedom to answer, with truly dynamic follow-ups, gives interviewees more room to highlight what they find important, ultimately shaping what is measured. So here's how it works. The human participants are given a two-hour voice-to-voice interview; that is, both sides are speaking. The interview is then transcribed, and the transcript is given to the generative agents to serve as the agents' memory.

Now, if we think back to the previous paper, they started each agent with just a very brief description of its background, and as each agent interacted with the world, it built up memories: long-term memory, short-term memory. They had some really cool techniques, essentially using RAG, that allowed an agent to draw from its memory to determine what action to take next, whether that's where to go, what to do, or even how to interact with other agents in the simulation. In the new paper, they instead took the two hours of interview, which essentially capture the essence of a real human's thoughts and behaviors, and used that as the memory for the agents. So, in theory, those agents should behave the way their real human counterparts would.

They then collected the actual participants' responses two weeks later and compared them against the simulated responses from the agents. Basically, the humans took the tests once, and the interview was used to build the agents' memory. Two weeks later, the humans answered those same questions again, the agents were given those questions too, and the results were compared. It turns out the agents were really accurate.
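For context on that memory mechanism: in the earlier generative-agents paper, retrieval ranked each memory by a weighted mix of recency, importance, and relevance. A minimal sketch of that idea (the decay constant, equal weights, and data layout here are illustrative choices of mine, not the paper's exact values):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieval_score(memory, query_embedding, now_hours):
    """Score one memory for retrieval: recency decays exponentially with
    hours since last access, importance is a stored 1-10 rating, and
    relevance is similarity of the memory to the current situation."""
    recency = 0.995 ** (now_hours - memory["last_accessed"])
    importance = memory["importance"] / 10.0
    relevance = cosine(memory["embedding"], query_embedding)
    return recency + importance + relevance  # equal weights, for simplicity

def retrieve(memories, query_embedding, now_hours, k=3):
    """Return the top-k memories to inject into the agent's prompt."""
    return sorted(memories,
                  key=lambda m: retrieval_score(m, query_embedding, now_hours),
                  reverse=True)[:k]

# Toy example: two memories equally recent and relevant; the more
# important one wins the retrieval ranking.
memories = [
    {"text": "saw a bird in the park", "last_accessed": 0, "importance": 2,
     "embedding": [1.0, 0.0]},
    {"text": "my best friend's birthday is today", "last_accessed": 0,
     "importance": 9, "embedding": [1.0, 0.0]},
]
top = retrieve(memories, [1.0, 0.0], now_hours=0, k=1)
print(top[0]["text"])  # my best friend's birthday is today
```

The new paper sidesteps most of this machinery for survey-style questions by giving the agent the whole interview transcript at once, as described next.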
The interview script explored a wide range of topics of interest to social scientists, from participants' life stories ("Tell me the story of your life, from your childhood, to education, to family and relationships, and to any major life events you may have had") to their views on current societal issues ("How have you responded to the increased focus on race and/or racism and policing?"). Those are just some examples. The AI interviewer then dynamically generated follow-up questions tailored to each participant's responses.

All of those responses were then given to the agent as memory. When an agent is queried in the simulated environment, the entire interview transcript is injected into the model prompt, instructing the model to imitate the person based on their interview data. For experiments requiring multiple decision-making steps, agents were given memory of previous stimuli, and their responses to those stimuli, through short text descriptions. The resulting agents can respond to any textual stimulus, including forced-choice prompts, surveys, and multi-stage interactional settings.

So let's look at the actual results. For the GSS, the General Social Survey, the generative agents predicted participants' responses with an average normalized accuracy of 85%. Stunning.
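A stripped-down sketch of that prompting setup (the wording, function name, and layout are illustrative; this is not the paper's actual system prompt):

```python
def build_persona_prompt(interview_transcript, stimulus, prior_steps=None):
    """Assemble a prompt that injects the full interview transcript as the
    agent's memory and instructs the model to answer as that person.
    `prior_steps` is a list of (stimulus, response) pairs replayed as short
    text descriptions for multi-step experiments."""
    parts = [
        "You will role-play a specific real person, based on a two-hour "
        "interview with them. Answer exactly as this person would.",
        "--- INTERVIEW TRANSCRIPT ---",
        interview_transcript,
    ]
    for past_stimulus, past_response in (prior_steps or []):
        parts.append("Earlier, when shown: %s -- this person responded: %s"
                     % (past_stimulus, past_response))
    parts += ["--- CURRENT STIMULUS ---", stimulus]
    return "\n\n".join(parts)

# Example usage with a hypothetical transcript snippet and economic-game stimulus
prompt = build_persona_prompt(
    "Q: Tell me the story of your life.\nA: I grew up in ...",
    "In the dictator game, you have $100. How much do you give away?",
    prior_steps=[("a trust game round", "I sent $50 to my partner.")],
)
```

The same prompt-construction pattern covers forced-choice surveys and the multi-stage games: only the stimulus text and the replayed history change.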
These interview-based agents significantly outperformed both demographic-based and persona-based agents. What does that mean? For the comparison agents, the researchers used general knowledge about what a given demographic might answer, rather than interviews with the individuals themselves, and used that as the memory. When they used that more generic knowledge instead of individual interviews, the agents didn't perform nearly as well, and that actually exposes the bias baked into that generic data.

For the Big Five questions, the generative agents achieved a normalized correlation of 0.80. Again, stunning, and once again outperforming the demographic-based and persona-based versions of those agents. On the five well-known economic games, designed to elicit participants' behavior in decision-making contexts with real stakes (the dictator game, the trust game, the public goods game, the prisoner's dilemma), the generative agents achieved a normalized correlation of 0.66.

And to make sure the agents weren't answering accurately just from the model's pre-existing knowledge, before any of this additional knowledge was given to them, the researchers randomly removed 80% of the interview transcript, roughly 96 minutes of the 120-minute interview, and the interview-based agents
still outperformed the composite agents, achieving an average normalized accuracy of 79% on the GSS, with similar results observed for the Big Five. Second, to investigate whether the predictive power of the interviews stems from linguistic cues or from the richness of the knowledge gained, they created "interview summary" generative agents by prompting GPT-4o to convert interview transcripts into bullet-pointed summaries of the key responses, capturing the factual content while removing the original linguistic features. This reaffirms that the knowledge gained from these interviews is real and not just an artifact of language cues. And once again, it outperformed. These findings suggest that, when informing language models about human behavior, interviews are much more effective and efficient than survey-based methods.

The paper also talks a lot about bias in artificial intelligence, which I know a lot of you have asked me to comment on and make videos about. I've done that a little bit in the past, so we're going to talk about it a little right now. There is concern about AI systems underperforming for, or misrepresenting, underrepresented populations. Basically, if there's not enough training data for underrepresented populations, how will the model know how to
respond the way those populations would? To address this concern, they conducted a subgroup analysis focusing on political ideology, race, and gender, dimensions of particular interest in the relevant literature. So what did they find? Interview-based agents consistently reduced biases across tasks compared to demographic-based agents. For political ideology, the bias dropped from 12.35 for demographic-based generative agents to 7.85 for interview-based generative agents, with similar results across the other benchmarks we talked about.

One thing we've talked about on this channel: essentially all public data has already been used to train models. Now imagine if AI agents were able to interview humans across the world, interview-style, with dynamic follow-up questions, and all of that data were then used to train different models for different purposes. That is a huge amount of additional data. And if all of those agents respond the way their human counterparts would, or at least highly accurately compared to their human counterparts, then imagine the synthetic data we could create by interviewing the AI counterparts instead. It's really cool to think about.

But that's the gist of this paper. I hope you enjoyed it; I thought it was fascinating. If you enjoyed this video, please consider giving a like and subscribe, and I'll see you in the next one.