OpenAI researcher on why soft skills are the future of work | Karina Nguyen

Lenny's Podcast
Karina Nguyen leads research at OpenAI, where she’s been pivotal in developing groundbreaking produc...
Video Transcript:
not only are you working at the cutting edge of AI and LLMs, you're actually building the cutting edge. When I first came to Anthropic, I was like, oh my God, I really love front-end engineering. And the reason why I switched to research is because I realized, oh my God, Claude is getting better at front end, Claude is getting better at coding. I think Claude can develop new apps. What skills do you think will be most valuable going forward, for product teams in particular? Creative thinking. You kind of want to generate a bunch
of ideas and filter through them in order to get just the best product experience. I think it's actually really, really hard to teach the model how to be aesthetic, with really good visual design, or how to be extremely creative in the way they write. What do you think people most misunderstand about how models are created? When you taught the model some of the self-knowledge of "you actually don't have a physical body to operate in the physical world," the model would get extremely confused. Today my guest is Karina Nguyen. Karina is an AI researcher at
OpenAI, where she helped build canvas, tasks, the o1 chain-of-thought model, and more. Prior to OpenAI, she was at Anthropic, where she led work on post-training and evaluation for the Claude 3 models, built a document upload feature with 100K context windows, and so much more. She was also an engineer at the New York Times and a designer at Dropbox and at Square. It's very rare to get a glimpse into how someone working on the bleeding edge of AI and LLMs operates and how they think about where things are heading. In our conversation, we
talk about how teams at OpenAI operate and build product, what skills she thinks you should be building as AI gets smarter, how models are created, why synthetic data will allow models to keep getting smarter, and why she moved from engineering to research after realizing how good LLMs are going to be at coding. If you enjoy this podcast, don't forget to subscribe and follow in your favorite podcasting app or YouTube. It's the best way to avoid missing future episodes, and it helps the podcast tremendously. With that, I bring you Karina Nguyen. This episode is brought
to you by Enterpret. Enterpret unifies all your customer interactions, from Gong calls to Zendesk tickets to Twitter threads to App Store reviews, and makes them available for analysis. It's trusted by leading product orgs like Canva, Notion, Loom, Linear, monday.com, and Strava to bring the voice of the customer into the product development process, helping you build best-in-class products faster. What makes Enterpret special is its ability to build and update customer-specific AI models that provide the most granular and accurate insights into your business. Connect customer insights to revenue and operational data in your CRM or
data warehouse to map the business impact of each customer need and prioritize confidently, and empower your entire team to easily take action on use cases like win-loss analysis, critical bug detection, and identifying drivers of churn with Enterpret's AI assistant, Wisdom. Looking to automate your feedback loops and prioritize your roadmap with confidence, like Notion, Canva, and Linear? Visit Enterpret at enterpret.com/lenny to connect with the team and to get two free months when you sign up for an annual plan. This is a limited-time offer. That's enterpret.com/lenny. This
episode is brought to you by Vanta, and I am very excited to have Christina Cacioppo, CEO and co-founder of Vanta, joining me for this very short conversation. Great to be here. Big fan of the podcast and the newsletter. Vanta is a longtime sponsor of the show, but for some of our newer listeners, what does Vanta do and who is it for? Sure. So we started Vanta in 2018, focused on founders, helping them start to build out their security programs and get credit for all of that hard security work with compliance certifications like SOC 2 or ISO
27001. Today we help over 9,000 companies, including some startup household names like Atlassian, Ramp, and LangChain, start and scale their security programs and ultimately build trust by automating compliance, centralizing GRC, and accelerating security reviews. That is awesome. I know from experience that these things take a lot of time and a lot of resources, and nobody wants to spend time doing this. That is very much our experience, both before the company and to some extent during it. But the idea is, with automation, with AI, with software, we are helping customers build trust with
prospects and customers in an efficient way. And, you know, our joke: we started this compliance company so you don't have to. We appreciate you for doing that. And you have a special discount for listeners: they can get $1,000 off Vanta at vanta.com/lenny. That's vanta.com/lenny for $1,000 off Vanta. Thanks for that, Christina. Thank you. Karina, thank you so much for being here. Welcome to the podcast. Thank you so much, Lenny, for inviting me. I'm very excited to have you here, because not only are you working at the cutting edge of AI and LLMs,
you're actually building the cutting edge of AI and LLMs. You recently launched this feature, which is basically the first agent feature of OpenAI. I also just did this survey, I don't know if you know about this. I did a survey of my readers and asked them what tools they use most every day in their work, and ChatGPT was number one, above Gmail, above Slack, above anything else. 90% of people said they use ChatGPT regularly. It's absurd, and it wasn't around two years ago. Yeah. Also, we're recording
this the week that OpenAI announced Stargate, which is this half-trillion-dollar investment in AI infrastructure. So there's just a lot happening constantly in AI, and you have a really unique glimpse into how things are working, where things are going, how work gets done. So I have a lot of questions for you. I want to talk about how you operate and how you work at OpenAI, where you think things are going, what skills are going to matter more and less in the future, and also just where things are going
broadly. So how does that sound? Sounds great, thank you so much. Yeah, I was extremely lucky to join Anthropic in the early days and learned a lot of things there, and I joined OpenAI around eight months ago. So yeah, I'm excited to chat more. Okay, I'm definitely going to ask you about the differences between those, but I want to start more technical and just dive right in. I want to talk about model training. People always hear about these big models being trained: how much data it takes, how long it
takes, how much money it takes, how we're running out of data, which I want to talk about. Let me just ask you this question: what do you think people most misunderstand about how models are created? Model training is more an art than a science. In a lot of ways, we as model trainers think a lot about data quality. It's one of the most important things in model training: how do you ensure the highest-quality data for a certain interaction or model behavior that you want
to create? But the way you debug models is actually very similar to the way you debug software. One of the things that I learned in the early days at Anthropic, which we discovered especially with Claude 2 training, was that when you taught the model some of the self-knowledge of "hey, you actually don't have a physical body to operate in the physical world," at the same time we had data that taught the model some of the function calls, which is "this is how you set the alarm," and
so the model would get extremely confused about whether it can set an alarm when it doesn't have a body in the physical world. So the model gets confused, and sometimes it over-refuses. Sometimes it says, "I don't know, sorry, I cannot help you." And so there is always a balanced trade-off between how do you make the model more helpful for users while also not being harmful in other scenarios. So it's always about how do you make
the model more robust and operate across a variety of diverse scenarios. That is so funny, I never thought about that. Most of the data it's trained on is kind of a human describing the world and how they operate, and it assumes there's a body and you can do things, and the model is told you don't have a body. Yeah. Okay, I want to talk a little bit about data while we're on this topic. I know you have strong opinions here. There's kind of this meme that models are going to stop getting
smarter because they're running out of data. They're trained in large part on the internet, and there's only one internet, and they've already been trained on it. What more can you show them about the world? And there's this trend of synthetic data, this term "synthetic data." What is synthetic data, why do you think it's important, and do you think it's going to work? I think there are two questions here. We can unpack them one at a time. When people say we're hitting the data wall, I think people think more in terms of
pre-trained large models that are trained on the entire internet to predict the next token. But what the model is actually learning during that process is how to compress. It's a compression algorithm: the model learns to compress a lot of knowledge and it learns how to model the world. So for the next-word prediction of something like "teach me how to drive," you only have a few words that will match that, like "a car." So the model actually learns about the world itself. It's modeling human
behavior. And when you talk to pre-trained models, which are very, very large, they're actually extremely diverse and extremely creative, because you can talk to almost any Reddit user through a pre-trained model. But I think what's happening right now with the new paradigm of the o1 series is that the scaling in post-training itself is not hitting the wall. And that's because basically we went from raw datasets for pre-trained models to an infinite amount of tasks that you can teach the model in the post-training world
via reinforcement learning. So any task, for example how to search the web, how to use the computer, how to write well, all sorts of tasks where you're trying to teach the model all the different skills. And that's why I've been saying there's no data wall or whatever, because there will be an infinite amount of tasks, and that's how the model becomes extremely superintelligent. And we are actually getting saturated on all benchmarks, so I think the bottleneck is actually in evaluations, that we don't have all the frontier evals.
Like, I don't know, GPQA, which is a Google-proof question-answering, PhD-level intelligence benchmark, is getting to, I don't know, more than 60 or 70%, which is what a PhD gets. So we're literally hitting the wall in evals. I want to follow both those threads. So the first is on this idea of synthetic data. Is a simple way to understand it that the models are generating the data that future models are trained on, and you ask it to generate all these ways of doing stuff,
all these tasks as you described, and then the newer model is trained on this data that the previous model generated? Some tasks are synthetically curated. This is an active research area: how can you synthetically construct new tasks for the model to learn? Sometimes, you know, when you develop a product you get a lot of data from the product and user feedback, and you can use that data too in this post-training world. Sometimes you still want to use
human data, because some of the tasks can actually be really, really hard to teach. Experts only know certain knowledge about, say, some chemicals or biological knowledge, so you actually need to tap into the expert knowledge a lot. So yeah, to me synthetic data training is more for product. It's rapid model iteration for product outcomes, and we can dive more into that, but the way we made canvas and tasks and new
product features was mostly done by synthetic training. Let's actually get into that, that's really interesting. I want to talk about evals, but let's follow that thread. So talk about how this helped you create canvas. So when I first came to OpenAI, I really had this idea of, okay, it would be really cool for ChatGPT to actually change the visual interface, but also change the way it is with people, so going from being a chatbot to more of a collaborative agent and a collaborator.
it's a step towards more agentic systems that become innovators ultimately. And so the entire team of applied engineers, designers, product, and research kind of got formed in the air, almost out of nothing. It's just a collection of people who got together and rapidly started iterating with each other. Actually, canvas is, I would say, one of the first projects at OpenAI where researchers and applied engineers started working together from the very beginning of the product development cycle, and I
think there's a lot of things that we have learned on the way, but I definitely came to it with the mindset of, we need to do really rapid model iteration such that it would be much easier for engineers to work with the latest model possible, but also learn from user feedback or early internal dogfooding how to improve the model very rapidly. And, you know, it's really hard to figure out, when you deploy a product, how
people would be able to use it. And so the way you synthetically train the model is basically figuring out what are the most core behaviors that you want this product feature to do. For canvas, for example, it came down to three main behaviors. The first was how do you trigger canvas for prompts like "write me a long essay," when the user intention is mostly iterating over long documents, or "write me a piece of code," and when to not trigger canvas for prompts like can you tell me more
about President, I don't know, some of the general questions. You don't want to trigger canvas there, because the user intention is mostly getting an answer, not necessarily iterating over a long document. The second behavior is how do we teach the model to update the document when the user asks. One of the behaviors that we taught the model is to actually have some agency and autonomy to literally go to the document and select specific sections and either delete them or edit them, so highlight and rewrite
certain sections. So sometimes the user would just say, change the second paragraph to be something friendlier, and you would have to teach the model to literally find the second paragraph in the document and change it to a friendly tone. So basically you teach both how to trigger the edit itself and how to get a higher-quality edit for the document. In the case of coding, for example, there's also the question of how good the model is at completely rewriting
the document versus having very specific targeted edits. So that's another layer of decision boundary within edit itself: do you select the entire document and rewrite it completely, or do you want very targeted custom behavior? And, you know, when we first launched the model we would bias the model towards more rewrites, because we thought the quality of the rewrites was much higher, but over time you're kind of shifting based on user feedback and what you're learning from iterative deployment. Lastly, the third
behavior that we taught the model synthetically is how to make comments on any document. The way we did it is we would use the o1 model to simulate a user conversation, let's say "write me a document about XYZ," and then we used o1 to produce the document, and then we injected a user prompt like "oh, make some comments, critique my piece of writing," or "critique this piece of writing that you just made." And then we taught the model to make comments on
the document, on very specific parts of the document. So it's also about what kind of comments you want the model to make: do they make sense or not, how do you teach the quality of that? And it all came down to measuring progress via very robust evals. But yeah, this is how you use o1 for synthetic data generation for training. Okay, this is so interesting. So you talk about this idea of teaching the model, and you mention how it's using synthetic data
to teach the model different behaviors. Is a simple way to think about it basically that you do that by showing it what success looks like, using basically evals? Is that the simple way to think about it: here's what you doing this successfully would look like, and that teaches it, okay, I see, this is what I should do? Yeah, great. Yeah, amazing, you got it. Okay, got it. I want to start unpacking what your day-to-day looks like as you're building these sorts of things. Is it you sitting there
talking to some version of ChatGPT, crafting these evals? Sometimes I do that. Actually, I think I learned this so much from Anthropic: people spend so much time just prompting models, qualitatively, all the time, and you actually get a lot of new ideas of how to make the model better. It's like, oh, this response is kind of weird, why is it doing this? And you start debugging or something, or you start figuring
out new methods of how to teach the model to respond in a different way, have a better personality, let's say. It's the same thing as how personality is made in the models; it's very similar methods. But yes, I think my time has changed. I think when I first came, it was mostly research IC work, so I was writing code, training models, writing evals, working with PMs and designers to
teach them how to even think about evaluations. I think that was a really cool experience, and I think this is an adoption of how we do this product management of AI features or AI models. Yeah, but now it's mostly, you know, management and mentorship. I'm still doing IC research code after, like, 4 p.m., though, but yeah, it's just kind of changed. All right, don't talk too much about being a manager, because everyone's firing their
managers. Who needs managers anymore? That's what I hear now. Just kidding. It's interesting that so much of your time was spent on teaching product teams how evals integrate and how important that is. I've heard this a few times and I haven't personally experienced it yet, so I think it's an important thread to follow: how writing these evaluations is going to become an increasingly important part of the job of product teams, especially when they're building AI features and working with LLMs. So can you just talk a bit more about what that looks like?
Is it like sitting there with an Excel spreadsheet, basically showing here's the input, here's the output, here's how good the result was? Talk about what that actually looks like, very practically. It certainly depends on what you're developing, but there are various types of evaluations. So sometimes I do ask product managers, or there's also a new role that we have, model designers, to go through some of the user feedback, maybe, or think of various user conversations that should have triggered, like, under this
scenario it should trigger canvas. And then you have this ground-truth label of, okay, with this conversation it should trigger canvas, under this conversation it should not trigger canvas, and you have this very deterministic kind of eval for behaviors like this. When we were launching tasks, for example, how do you make correct schedules is actually really hard for the model, but we built out some of the deterministic evaluations, like, okay, if the user says
7 p.m., the model should say 7 p.m. So you can have deterministic evals where it's pass or fail. And the way it works is, sometimes I ask PMs to just go create a Google Sheet with different tabs: what's the current behavior, what's the ideal behavior, and why, or some notes. Sometimes you use it for evals, sometimes we use it for training, because if you give this to an o1
model, it can probably figure out how to teach itself a good behavior. And I think the second type of evals that is kind of more prevalent is human evaluations. You can have specific trainers, or you can have internal people, and when you have a conversation or a prompt and then you have various completions from models, you kind of choose the win rate: which model is the best, which model produces the highest-quality comment or edit. And then you can have continuous win rates, and
as you develop new models, they should always win over the previous models. So it depends on what you want to measure. So interesting. Basically what I'm hearing, and this is something I'm learning about as I talk to people, is product development might move from, here's a spec, a PRD, let's build it together, and then cool, let's review it, are we happy with this, to, hey AI, build this thing for me, and here's what correct looks like, and I'm spending all my time on what correct looks like, on
evals, essentially. You definitely want to measure the progress of your model, and this is where evals come in, because you can have a prompted model as a baseline already, and the most robust eval is the one where prompted baselines get the lowest score or something, because then you know, oh, if you trained a good model then it should just hill-climb on that eval over time, while also not regressing on other intelligence evals. So I think that's what I'm
saying, it's more an art than a science. It's like, okay, if you optimize the model for this behavior, you kind of don't want to brain-damage it in other areas of intelligence, and this is happening all the time in every lab, in every research team. I would say prompting is also a way to prototype new product ideas. In the early days at Anthropic, when I was working on the file uploads feature, I remember just, you know, prompting the model, and when
we were launching 100K context, I was just prototyping this in my local browser, and people really, really loved it and they just wanted an API for file uploads or something. And that's when it clicked for me. I also wrote a blog post a long time ago about it: prompting is a new way of product development, or prototyping, for designers and for product managers. For example, one of the features that I wanted to do is
have personalized starter prompts, so whenever you come to Claude, it should recommend you starter prompts based on what your interests are, and you can literally do it with prompting for that experiment. Another feature was generating titles for the conversations. It's a very small micro-experience, but one I'm really proud of. The way we did that was we took the five latest conversations from the user, asked the model what's the style of the user, and then
for the next new conversation the generated title will be of the same style. It's really little micro-experiences like this. That's so cool. Did you do that at Anthropic or at OpenAI? At Anthropic. Okay, cool. I love the file upload feature that Claude has, by the way. Oh, ChatGPT doesn't have that yet, is that right? I think it has it. I think the way it's implemented is very different, though. Okay, maybe it's the PDF feature, because I use it all the time with Claude. Okay, that's cool.
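
The title-generation idea described above (take the user's five latest conversations, ask the model what the user's style is, then title the next conversation in that same style) could be sketched roughly as below. This is only an illustration under stated assumptions: the function name `build_title_prompt`, the parameter `k`, and the prompt wording are all hypothetical, nothing here reflects the actual implementation discussed in the episode, and the sketch only assembles the prompt text without calling any model.

```python
from typing import List


def build_title_prompt(history: List[str], new_conversation: str, k: int = 5) -> str:
    """Assemble a prompt asking a model to title a conversation in the user's style.

    `history` is assumed to be ordered oldest to newest; all names and the
    prompt wording are hypothetical, chosen only for illustration.
    """
    recent = history[-k:]  # keep only the k most recent conversations
    examples = "\n".join(f"{i}. {text}" for i, text in enumerate(recent, start=1))
    return (
        "Here are the user's most recent conversations:\n"
        f"{examples}\n\n"
        "Describe the user's writing style in one sentence, then write a short "
        "title for the new conversation below in that same style.\n\n"
        f"New conversation:\n{new_conversation}"
    )


if __name__ == "__main__":
    past = [f"conversation {n}" for n in range(1, 8)]
    print(build_title_prompt(past, "help me plan a trip to Kyoto"))
```

The resulting string would then be sent to whatever model is being prototyped against, which is the "prompted prototype" workflow the conversation keeps returning to.
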
Someone needs to get on that. Man, it's wild how many features you built that I use every day and that many people use every day. This prototyping point you made is really important. It's something that comes up a ton on this podcast too: maybe the way that AI has most impacted the job of product builders recently is just prototyping. Instead of showing here's a PRD, here's a design, PMs more and more just show here's the prototype, the idea that I have, and it's working, you can play with
it. Yeah, yeah. Okay, I want to spend a little more time on how you operate. So you talked about how you built and launched this tasks feature. Is that the way you describe it, tasks? Yeah. So talk about how that emerged, and let's better understand just how you collaborate with product teams and how OpenAI works in that way, whatever you can share there. I think canvas and tasks go into the bucket of projects that are more short or medium term. Actually, the way canvas and tasks
came about was that it started with one person prototyping and creating a spec. It's kind of like a PRD; it's creating a spec of the behavior of the model. I don't think tasks is an extremely groundbreaking feature necessarily. What makes it really cool is that the models are so general. Models can now search, they can write sci-fi stories, they can search for stocks, they can summarize the news every day. Because the models are so general, giving something familiar to people, like, you
know, notifications are very familiar, having reminders is very familiar, so creating a form factor that people find very familiar, and similarly canvas is very familiar, but then you add a magical AI moment and it becomes very powerful. But the way it comes about, usually, operationally, is just a prototype, literally a prompted prototype of how you would want the model to behave. For tasks, for example, you kind of need to do a little bit of design, design systems,
design thinking, like, okay, well, if the user says "remind me to go to lunch at 8 a.m. tomorrow," okay, what kind of information does the model need to extract from that prompt in order to create a reminder? And so this is how you design a spec for a new feature, like a tool. Canvas and tasks are all tools, so it's about how you create the tool spec. And then it's mostly developing a JSON schema. I was like,
okay, from this prompt maybe the model should extract the time that the user requested, and then you think about which format you want the time to be in, and then how you want the model to notify you. Basically the user should give an instruction to the model, and then this instruction would fire off every day or something at that particular time. So for example, if you say, every day I want to know about the latest AI news,
the model should rewrite it into, okay, search for the latest AI news, and this task will get fired at the particular time that the user requested. And then, you know, you design this tool spec. And then actually, I don't know, I feel like sometimes it's through conversations: people ask me to join the team, and they're like, oh my God, we need researchers, or we need some support, we need to train the
models. Or sometimes, like with canvas, I just pitched the idea and it got staffed quite immediately during the break. So, I don't know, it depends on the project. And usually with staffing it's mostly a product manager, a model designer, an actual product designer, a couple of researchers, and a bunch of applied engineers, depending on the complexity of the project. And then, you know, for tasks it took, I don't know, like two months or so to go from zero to one, basically. Oh wow. For
canvas this was like four or five months, I guess, to go from zero to one. And then, you know, you teach product managers how to build evals, and how do we not only ship the beta feature, but how do we think more longer term, like what kind of cool features do you want tasks to have? I think it would be nice for tasks to be a little bit more personalized. It'd be nice to create tasks via
voice on mobile, right? So this is how you get a research roadmap, right: thinking about how the feature will be developed in the future. And then from there you start creating datasets. With evals, you want to make sure that goes well, and then you need to have a trade-off between what methods you want to use. And the reason why I really love synthetic, like relying purely on synthetic data instead of collecting data from humans,
is because it's much more scalable and it's cheap. You literally sample from the model and you teach the core behaviors of the model, and that will generalize to all sorts of diverse coverage. And when you launch the beta feature you learn so much from the users that all your synthetic sets can be shifted toward the distribution of how the users behave in the product, and this is how you improve. And this is what happened with canvas too, when we launched from beta to
GA. This episode is brought to you by Loom. Loom lets you record your screen, your camera, and your voice to share video messages easily. Record a Loom and send it out with just a link to gather feedback, add context, or share an update. So now you can delete that novel-length email that you were writing; instead, you can record your screen and share your message faster. Loom can help you have fewer meetings and make the meetings that you do have much more productive. Meetings start with everyone on the same page and end early. Problem solved,
time saved. We know that everyone isn't a one-take wonder when it comes to recording videos, so Loom comes with easy editing and AI features to help you record once and get back to the work that counts. Save time, align your team, stay connected, and get more done with Loom, now part of Atlassian, the makers of Jira. Try Loom for free today at loom.com/lenny. That's loom.com/lenny. Something that I want to help people understand, and I don't even 100% understand this, is what's the simplest way to understand the job of a researcher versus,
say, a model designer and other folks involved? What's the simplest way to understand what researchers do at OpenAI? So the projects that I described are mostly product-oriented; that research is mostly product research. Another component of my team is actually more longer-term exploratory projects, and it's more about developing new methods and understanding those methods under a variety of circumstances. So to develop new methods, you kind of need to follow a very similar recipe of building evals, but it's much more sophisticated evals. You
kind of want to have like out of distribution like if you want to like measure generalization um you kind of need to like capture that but it's basically more sciencey in a way where you know if if we talk about synthetic data like one of the hardest things about synthetic data is like how do you make it like more diverse diversity in synthetic data is like one of the most important questions uh right now and just like exploring like ways to inject like diversity as a general method that will work for all is like a one of
the like research explorations other ones is like more like developing new capabilities I feel like it's all this about like you know like you you work on this like new method and you have like signs of life that it's working either you think of like how do you make it more general or you think of like how do you make it very useful or like and this is how like longer term projects become more like medium like short-term project that makes sense essentially working on developing ways to make the model smarter o4 o5 o6 new ways
to uh like o1 was a big breakthrough right the way operates where it's not just here's your answer it actually thinks and has right takes time to think through the process of coming up with an answer okay yeah very helpful speaking of that of thinking about the future where things are going I want to spend some time on just this Insight that basically you are building The Cutting Edge of AI like at the very bleeding edge of where AI is going and where it is and so uh I'm very curious to hear just your take
on how you think things are going to change in the world and how people work based on where you see things are going and I know it's a broad question but let's say like in the next three years how do you see the world changing how do you see people's way of working changing it's a very humbling experience to be in both labs I guess like to me when I first came to Anthropic I was like oh God I really love frontend engineering and then like the reason why I switched to like research is because
I realized at that time is like oh my God like Claude is getting better at like front ends like Claude is getting better at like coding I think Claude can like develop new apps or something and so like it can like develop new features for the thing that I'm working on so it's like it was kind of like this meta realization where it's like oh my God like the world is actually changing and then like when we first like launched 100K context at that time obviously you know I'm thinking about like form factors that's like yeah
like file uploads were like very natural very familiar to people but you could imagine we could just like make like infinite chats in the Claude.ai app right like as if like it's like in 100K context but because like file uploads it's like form follows function it's like the form factor the file uploads kind of enable people to just like literally upload anything the books or like any reports financial and like ask any task to the model and I remember it was like you know enterprise customers like um like financial customers are like really interested
in that it's like oh wow like actually they it's actually one of the very common tasks that people do uh in that setting was like kind of crazy to like see uh how some of the redundant tasks are getting like automated basically by these like smart models and we're entering the era where I actually don't know for example sometimes like if o1 gives me the correct answer or not because I'm not an expert in that field and it's like I don't even know how to verify the output um of the models is because like
only experts know like they can like verify this so yes so basically there are trends that are going on the first trend is the cost of reasoning and intelligence is drastically going down I had a blog post about this maybe I should update on like latest benchmarks because at that time like MMLU everybody was like doing like um some like like one benchmark and then be like quickly saturated the benchmark and like now to like do the same plot but with another like frontier eval but the cost of intelligence is like going down because
it's it becomes like much cheaper smart small models are becoming even smarter than like large models and that's because of like the distillation research this happened with like Claude Haiku I was like working on like post-training like Claude 3 Haiku and I realized it was much smarter than like Claude 2 which was like way you know bigger like something like that um but like the power of like small models become very intelligent and fast and cheap we are moving towards that world that has like multiple implications but that means that like people will have
more access to AI and that's really good like builders and developers will have much better access to AI but also it means like all the work that has been like bottlenecked by intelligence will be kind of like unblocked so anyone like in I'm thinking about like healthcare right like if I have instead of going to the doctor I can like ask ChatGPT or like give ChatGPT a list of symptoms and ask it like oh which like would I have like a cold flu or like something else like I can literally get the access to
like uh doctor almost and there's like been some like research studies around that yeah there was a New York Times story about that where they compared doctors to doctors using ChatGPT to just ChatGPT and just just ChatGPT was the best yeah of them all like like doctors made it worse yeah yeah that's crazy like right like education I think uh I would have dreamed if like I had the tool like ChatGPT when I was like young and like would learn so much but it's like people can now learn almost anything from these
models so they can learn new language they can learn how to build new book apps like I right anything that you want and like I'm so like it's humbling to like have like launch canvas and like bring that thing to the people enable them to do something else that they couldn't have ever before I think this is there's something like magical around this experience uh education has both have massive implications like I guess like scientific research right like I I think it's like the dream of like any AI research is like automate AI research uh it's
kind of scary I'd say um which makes me think that like people management will stay you know it's like one of the hardest things to it's like emotional intelligence for the models or like creative creativity in itself is like one of the hardest things so writers I I don't think like people should be worried as much I think it's like I think it's elevate a lot of like redundant tasks uh for people this is awesome okay I want to follow this thread for sure and it's funny that what you described is like you were an
engineer at Anthropic and you're like okay Claude is gonna be very good at engineering this isn't going to be a potentially career long term so I'm going to move into research and the AI is going to need me for a long time to build it make it smarter I would say we still have like I think canvas team still has like really cool like frontend engineers that are really like you know people who like really care about like interaction design like interactive Spirit like I don't think like models are there yet like I think if
but we can get the models to like this top 1% of like front end or something um for sure so what I want to move on to next along these lines is just and this is just speculation but uh what skills do you think will be most valuable going forward for product teams in particular particular so folks are listening and they're like okay this is scary what should I be building now to help me stay ahead and not be in trouble down the road what skills do you think are going to be most more and
more important to build yeah I think like creative thinking like you kind of want to like um come up like generate a bunch of ideas and like filter through them and know just like build the best product experience listening you know you want to like build something that like the most general model will not replace you and often times you you build something and you make it really really good for like specific set of users and actually the moat is now in like your user feedback the moat is like more in like whether you listen
to them like whether you you can like rapidly iterate like the moat is like in here I I don't think like we we are yet to like think there's so many ideas I there's an abundance of like ideas that you can look very regardless like I wouldn't be worried I feel like in fact like I do think like people in AI field are like I I wish they were like a little more creative and like connecting dots across like different like fields or something like that to like develop really cool new like generation a new
paradigms of interactions with this AI like I don't think we've cracked this problem at all um couple years ago I was like telling some people I was like you know you kind of want to like build for the future so it's like it doesn't necessarily matter whether the model is good or not good right now but you can build product ideas such that like by the time the models will be really good it will work really well um and I think it's just like happened naturally like for example like at Anthropic like right like uh
the Claude artifacts and I feel like early days of canvas was like back in like 2022 like before ChatGPT like writing IDE was like on knowledge but I feel like Claude 1.3 model itself was like not there to like make like really extremely good like high quality edits for example like coding um and I feel like I I see like startups like Cursor and it's like doing super well like and that's because they like iterate so fast they like invent like new ways of like training models they move really fast they listen to like
users like massive distribution is like yeah it it's kind of cool that's really helpful actually so what I'm hearing is that soft skills essentially are going to be more and more important powerful you talked about management leading people being creative and coming up with Innovative insights listening there's a post I wrote that I'll link to where I look I I try to analyze what AI how AI will impact product management and we're actually very aligned and my sense was the same thing that soft skills are going to become more and more important and the things
that are going to be replaced are the hard skills which is interesting because usually people value the hard skills like coding design writing really well and it's interesting that AI is actually really good at that because it's taking a bunch of data synthesizing it and writing creating a thing versus all these fuzzy things around of what influences convinces people to do things and aligning and listening like you said creativity anything along those lines come up as I say that I think it's actually really really hard to teach the model how to be aesthetic or
like uh do like visual a really good like visual design or like how to be extremely creative in the way to write I think like I still think like ChatGPT kind of sucks at like writing and that's because it's like it's like bottlenecked by this like creative reasoning I think like prioritization is like one of the most important like I think like um for manager I feel like actually like AI research progress is bottlenecked by like management like research management is because you have like constrained set of compute and you need to like allocate
the compute to the research paths that you feel the most convinced about was like you need to like really you need to have like a really high conviction in the research bets to put the compute and like it's more like return on investment um kind of situation it's like okay yeah like I'm thinking a lot about like like okay like how do across all my projects which projects are higher priority it's like prioritization and also like on the lower level it's like which experiments are really important to run right now and which are not and
like cut through the line so I see like prioritization communication like um management um people skills like empathy like understanding people like kind of like collaboration like I think like canvas wouldn't be like an amazing launch if it wasn't like about like people and I think it it's a wonderful good group of people and like I got a chance to like work with like people like Lee Byron who's like a co-creator of like GraphQL and like some of the best like Apple designers it's like so cool to like see and like how do you create
this like collaboration between people it's just like something that's still Humane I think I me just follow us through a little bit because I imagine people listening are like okay but once we have AGI or SGI it's like it'll do all this it you know it's like there's a world where like why isn't all this done I think it's easy to just assume all that I'm curious this idea of creativity and listening why you think AI isn't good at it other than it's just very hard to train it to do this well is there anything
there of just like why this is especially difficult for AI and LLMs to get good at I think currently it's difficult for many reasons I think it's still like an active like research area and is something that like I think my team is like working on is like okay how like how do we teach the model to be like more creative in like the writing and actually like I'm thinking like this new paradigm like the models think more should actually lead to like better writing in itself but like when it comes down to like idea generation
or like um discriminating of like what is a good like visual design and not I think like it hasn't learned like examples from like people to discriminate it very well I do think it's because like you know there are not that many people who are like actually like really like it's not like accessible to like models to learn from these people I guess um so I guess like that's why it sucks yeah that makes sense basically there's not enough of you yet uh researchers teaching it to do these things slash people that have
incredible taste and creativity that can teach these things you could argue this will come but I'm not we don't need to keep going down that thread let me ask you a specific question in this post I wrote I I made this argument that a lot of people disagreed with that strategy is something that AI tooling will become increasingly great at and take over there's the sense that that's a thing that people will continue to be much better at and you can't offload to AI basically developing your strategy telling you what to do to win my case
is isn't strategy just take all the inputs all the data you have available understand the world around you and come up with a plan to win feels like AI would be like an LM would be incredibly smart at this what's your take I think so too I think like again like we you teach the model all sorts of like tools and like capabilities and like reasoning right and it's like when it comes down to like as is for cus right now would have been very cool to like for the model to just like aggregate all
the feedback from users like some I me like the top five like most painful FL flows like us experiences and then like the model itself is like very capable of like like thinking of like knowing how it's being made uh figure out like how to like create the data sets for itself to like train on it and I don't think like we are far away from that kind of like self-improvement models becoming like self-improved VI like then like the part of development is basically they kind of like self-improving like it's kind of like it's own
like organism or something um yeah like again like strategies like it's more like data analysis and like um coming up with like like I think what models are really good at is like um like connecting the dots I think it's like okay if you have user feedback from this source but you also have an internal like dashboard with metrics and then you have you know like other kind of like feedback um or like input and then like it can co-create like a a plan for you like recommendations even and I think this is like one
of the most common like use cases for ChatGPT is like coming up with like these sorts of things that makes sense like essentially a human can only comprehend so much information at once and look at so much data at once to synthesize takeaways and as you said these context windows are huge now here's all the information what's the most important thing I should do yeah same as like scientific research is because like you like ideally the model would be able to like suggest like ideas like new ideas like iterate on the experiment or like
given the empirical results of the previous experiments like how do you like come up with like new ideas or like methods yeah oh man uh okay so just to close the loop on this conversation this part of the thread is the skills you're suggesting people focus on building and leaning into a soft skills like creativity managing influence collaboration looking for patterns is that generally where your mind is at yeah I'm thinking a lot about like how do we make a relations more effectively and I think this is more most of like management I guess it's
like how do you organize like research teams or like generally teams like combine compos teams such that they will be at their maximally succeed like at the maximum like performance of what can possibly like we can like literally create like the next generation of computers it's just like the matter of connection and like the way you manage through that it's like scaling organizations or like scaling product research it gu yeah I think what like you're basically building this thing and not efficiently doing it is like limiting the potential of the human species right now is
mismanagement within the research team at OpenAI Anthropic and some of these other models yeah it's kind of crazy to think about it holy moly okay so speaking of Anthropic and OpenAI you've worked at both very few people have worked at both companies and seen how they operate I'm curious just what you've noticed about the differences between these two how they operate how they think how they approach stuff what can you share along those lines it's more similar than different uh obviously there is a lot of like there are some like differences also comes to
like nuances is a to culture I really love Anthropic and I have a lot of friends there and I also love OpenAI and then still have a lot of friends there so it's like it's not about like enemy I feel like there's like in the ey all like yeah the competitors they're like enemies but it's actually like one big community and like of people like doing the same thing what I learned from Anthropic is this like real care and craft towards it's like model behavior model craft model training and I've been thinking
a lot about like okay like what makes Claude Claude and what makes ChatGPT ChatGPT and like I this comes down to like operational processes that kind of lead to the outputs to to the model uh is the outputted model and it's like the reason why Claude has so much more personality and like uh is more like a librarian I don't know like I don't know like visualizing Claude being like a librarian some like a um very like nerdy or something um it's because I feel like it's a reflection of the creators who like making this
model and like a lot of like details around like the character and the personality and like whether the model should follow up on this question or like not like what's the correct like ethical behavior for the model like in this scenario like a lot of like craft um and like cre read it like thiss and this is where I learned that part of like art I guess uh at Anthropic I'd say like Anthropic is like much smaller like when I joined it was like what like 70 people when I left it was 700 people like
obviously the culture changed so much I really enjoyed doing like early days startup like vibes and like people knew each other as a family but like the culture shifted I would say like Anthropic I learned from Anthropic that like much better at like focusing and like prioritization like very very like very hardcore prioritization I guess and then need to do it like but I think like OpenAI is like much more um innovative and uh much more like risk takers in terms of like product or like research actually you know like I don't you can
like your full-time job can be just like teaching the model how to be like creative writers and it's like there's some luxury in this like research freedom that that comes to scale maybe I don't know um but it gives you it's like you'll have I feel like I I have much more creative like product freedom to do almost anything I guess within like OpenAI like a l CH into like theion that you want it's like more like yeah probably bottoms up I guess yeah that's how I was I was thinking about it it feels
like OpenAI is more bottoms up uh distributed people bubble up ideas try stuff there's more lot and that em leads to more products launching I imagine more things just kind of being tried versus more of a let's just make sure everything we do is awesome and great and craft and thinking deeply about every right every investment that's really interesting I've never heard it described this way uh Karina we've covered so much ground this is going to help a lot of people with so many uh ways of thinking about where the future is going before
we get to our very exciting lightning round I'm curious if there's anything else that you think might be helpful to share or get into one of my regrets I guess when I was early days at Anthropic was that like I think there was like some luxury of the time pre ChatGPT to actually like come in with like a bunch of ideas and like prototype like almost every day um and I think like we did a lot of cool ideas like Claude in Slack was actually one of the first like uh tool use like products it's
like Claude could operate in like your workplace now it's like kind of like you like @Claude summarize the thread so maybe you have an entire conversation with someone and then you want like a summary like what happened like you can say @Claude summarize this also it was really fun to like even like iterate on the model itself it's like when you just like talk to the model in like Slack forever um it created like some social element it's kind of cool it's kind of like Midjourney in like um this Discord like people
learned so much about like prompting and like how to work with like Claude I feel one of the features that was like early tasks prototype was like you know every Monday Claude would just like summarize the entire channel or like every Friday would just like summarize like bunch of channels and and give like the news about the organization or something so it's it's kind of like really cool like form factor I think like thinking about like form factor is like a really important like question like in AI especially we haven't like even figured out
like how do we create like an awesome like product experience with like o-series models it's like the paradigm between like synchronous real time give an answer paradigm into like more asynchronous paradigm of like agents working in the background but then now the question is like the agents should build trust with you right and trust builds over time which is like with humans and um you know you start this collaboration which is why like a collabor like this collaboration model was like you and a model is like so important because you both trust and the
model learns from your preferences so that it can become like more personalized and it will start predicting the next like action that you want to take on the computer or something and and it's like kind of like more predictive much more we we we went from like personal computer to like personal model uh basically here that's uh why is it not a thing that seems like such an obvious feature that every LM should have is a slackbot version of them is that is that a thing I can install or is that not a thing right
now I know that Claude in Slack was sunsetted in like 2023 or something but that's because like I think uh I think it was like after ChatGPT was mostly like the focus on like consumer use cases or like enterprise use cases uh I think didn't want like I think the form factor of like Claude in Slack it's like um was kind of constrained a little bit um when you want to new features I want that I know that ChatGPT had like Slack bot too so I don't know like maybe it will come back
some all right I would I would pay for that uh any other memories from that time of early days because that's a really special place to have been as early days anthropic any other memories or stories from that time that might be interesting to share I think the very first launch when we fell like when click and use again was like 100k context um launch is like when the models could input the entire like book and give you like a summary of the book or something um or the entire Financial or like have like multi
files Financial reports and then like give you an answer um to the question to very specific question I think there was something in there that kind of like oh my God this is like a really cool new capability not like model capability but more like the capabilities that came from the product form factor itself rather than like the model capability as much um I think like other prototypes that we were thinking about like yeah like uh there's like one part like Claude workspaces And it's like kind of the same like idea like Claude and I
would have this shared workspace and that Shar work like a documents and you can like it the and I feel like sometimes the ideas like part ideas lag and they lock for like two years um just like in this case it's interesting there's these Milestones that kind of uh open up our view of what is happening and where things are going chat PT I think was the first of just like wow this is much better than I would have thought you talked about 100K context Windows where you could upload a book and ask you questions
have it summarized I actually use that all the time when I have interview guests and they wrote a book I sometimes don't have time to read the whole book so I use it to help me understand what the most interesting parts are and then I actually dive into the book just to be clear uh and then I don't know maybe like voice was another one where you could talk to say ChatGPT is there any other moments there that you're like wow this is much better than I thought it was going to be yeah I
think like uh the computer use agents like the model operating the desktop and you can essentially think of like you know new kind of like experience where the model can learn the way you browse and from that preference it can just like browse as just like you and it's kind of like simulation simulated like persona and it's actually very similar to the idea of like okay like maybe Sam Altman doesn't have a lot of like um time maybe I want to like talk to like his simulators like his simulation and ask like or like for
example like yeah like I I I really appreciate some of the tech mhip like yob like but he doesn't have a lot of time so it's like I really want to like ask him these questions like how respond let's simulated environments like those um would be really cool it's a great place to plug Lenny bot I have one of those it's trained on all of my podcast and newsletters and I it sits on many models I don't know which one exactly they use but it's exactly that it's uh and it's not even me it's all
the guests that have been on the podcast on newsletter I wrote and you could just ask it how do I grow my product how do I a strategy and it's actually shockingly good do you feel like it reflects who you are like the best part of it is you can talk to it it's built there's an ElevenLabs voice version that's trained on my voice now from this podcast and it's actually very good and people like have told me they sit there for hours talking to it wow and somebody uh told it interview me like
I am on Lenny's podcast ask me questions about my career and he did a half hour podcast episode with Lenny God that's so fun it's incredible future is wild yeah I think like content transformation is like you know like I would imagine sometime like you know um when you generate a sci-fi story in canvas like you can like transform this into like audiobook like where you have like very natural like content transformation like one media to another media I think like one of my earliest inspiration um is like one of the last episodes of like
Westworld where uh I don't but where Dolores comes to her work at that time and she comes to like this like new workspace and she starts like writing a story and then they should writes a story like a 3D like virtual reality starts like creating on the fly so I kind of want to create that kind of cool wow speaking of medium uh I guess I I was I was wondering if I should go in this direction or not but real quick Uh Kevin wild Kevin wheel I don't know exactly how to pronounce his last
name the CPO of uh is it while or wheel I think real wheel okay okay let's just say that go he was uh he did a panel at the Lenny and Friends Summit last year and he made this really fascinating point that chat is a really interesting interface for these tools because they're just getting smarter and smarter and smarter and smarter and smarter and chat continues to work as a paradigm to just interact with them similar to a human you could talk to Albert Einstein you could talk to someone not very smart and it's all
conversation still and so it's a really flexible way to interact with increasingly good intelligence at some point it'll not be so great and you're talking about all these ways that you're adding additional ways to interact but it's interesting chat proved to be a really powerful layer on top of all the stuff yeah that's really cool I feel like chat also has like social element which is like very uh humane it's like yeah know you sometimes want to like get into group chat and like yeah having conversations with there it's kind of like a group chat
in itself as like messaging I this this idea of like how do you build like features like this like I see tasks as like this like um General kind of like feature that will scale very nicely as the models would develop like new capabilities as those it's like like the models will be able to like do better like searches and like you know create new like come up with like more creative like writing on like render you know react apps and like HTML pre like apps and like you can have like every day a new
puzzle for you like every day like continue the story from the previous days it's like it it scales very nicely you mentioned something as we were getting into this extra section that we ended up going down is this idea of uh your the agents using a computer I know this is actually something you're going to launch today the day we're recording it which will be out by the time this comes out called operator can you talk about this very cool feature that people will have access to yeah so uh I unfortunately did not work on
that but I'm really really excited about like this launch um it's basically an agent that can complete the task in its own like virtual computer like in its own virtual environment you can do any literally task like order me a book on Amazon song and then ideally the model will either like follow up with you like which book do you want or like know you so well that they like start recommending like oh here's the five books that I might recommend you to to buy and then like you hit like yeah help me help me
buy and then uh the model goes off uh into its own like virtual little browser and like complete the task and buy the book on Amazon and then if you give the model like itial credit card obviously it comes with like a lot of trust and like safety um then it will just complete the thing for you that's a virtual assistant it's interesting how this just sounds like obviously this should happen like why is this not already a thing which is also mind-blowing that we're just assuming this should exist like just some AI doing
things for you on a computer you just ask it to do like it's absurd it's actually really hard and I think like um you're still cracking this way you feel like I don't know if you use like Tuple it's like a pair programming product no but um I don't remember if you love pair programming so if you oh yeah Shopify uses this I remember it came up on a podcast episode oh nice yeah so it's a very cool product where you can just like call anyone at any time and then like share screen and
the other person can like have access to the screen and like start like literally operating your computer uh and it's very like real time like the latency is like very um it's like very high quality um and it's just like I kind of want the same it's like I want to like pair program with like my model and like the model should even talk to me like draw like very specific like section in my code in VS Code and like tell me like teach me and you can have like different modes it's like right
here this is like a product right here for you I don't know um some people should build it it sounds like a startup just got birthed yes from someone listening to this you mentioned that it's very hard to do this uh agent controlling a computer as you and helping out what makes it so hard for whatever however much you can explain briefly much of it is like uh because right now the models are operating on like pixels instead of like language or whatnot like pixels is actually really really hard for the models because like perception
visual perception I think there's still like a lot of like multimodal like research that's going on um but I think like language scaled so much like easier compared to like multimodal because of that another like thing that my team is working on is like how do you derive human intent um very correctly it's like sometimes like does a model know enough information to ask a follow-up question or like to complete the task you kind of don't want like an agent to like go off for like 10 minutes and then come back
with like an answer that you didn't even want that actually creates like a much worse user experience and this comes with like teaching the model like people skills it's like you know like what do people like like kind of like creating like the mental model of the user and like care about the user in order to ask certain questions like actually that part is like hard for the models that relates to what we talked about earlier where this is kind of the soft skills people skills piece yeah not where these models are strong
yet okay I'm gonna skip the lightning round I want to ask just one question from the lightning round something fun uh okay so when AI replaces your job Karina I'm curious what you're and it gives you a stipend gives you a monthly stipend here's your here's your salary for the month what what would you want to do what do you want to spend your time on what will you be doing in this future world I've been thinking about this a lot of times I have I feel like I have a lot of job
options I would love to be a writer I think that would be super cool uh you should like write like short stories like sci-fi stories um novels I really like art history so you know those like um conservationists like in the museums who just like try to preserve like art paintings but just like painting through a lot of things I think that would be really cool um to do um yeah that sounds beautiful I don't know uh what I'm hearing is you need to nerf these models to not get very good at writing
so that you can continue although at that point you don't need to do it for like you don't need people to buy you're just doing it for fun so it doesn't even matter if they're incredibly good at writing or art conservation oh man what an episode what a conversation what a wild time we're living in Karina thank you so much for being here two final questions where can folks find you online if they want to reach out and follow up on anything and how can listeners be useful to you you can find me I'm on
Twitter it'san um you can also shoot me an email on my website um and my team is hiring and so like I'm looking for research engineers research scientists as well as like machine learning engineers like people who come from like product engineering who want to like learn like model training um actually hiring for like my team my team is called Frontier Product Research and we train models we develop new methods but for product oriented outcomes what a place to work holy moly uh what's the best way for people to apply for these uh very
lucrative roles I think you can shoot me a DM on Twitter okay or um I'm yet to create a job description okay this is the job description or you can apply to like the post-training team yeah okay you're going to get a flood of DMs I hope you're prepared Karina thank you so much for being here this was incredible thank you so much Lenny bye everyone thank you so much for listening if you found this valuable you can subscribe to the show on Apple Podcasts Spotify or your favorite podcast app also please consider
giving us a rating or leaving a review as that really helps other listeners find the podcast you can find all past episodes or learn more about the show at lennyspodcast.com see you in the next episode