With the Gemini API, you can build literally anything: a mobile app, a chatbot, a Chrome extension, or a full-stack web app. Anything that requires an AI-powered back end, you will learn how to build in this video. All right, let's start by going to the Gemini 2.0 Quick Start, which is the official documentation from Google. Here we can see that there are only two main steps required to get an initial completion. Even if you've never coded anything before, you can absolutely follow along.
So, all right, we need to first install the dependencies for the package and then copy this syntax that Google gives us. So let's move on to the next step, which is project setup. I'm going to open Cursor and I'm going to open an empty folder.
Now, if you've never used Cursor, you might be thinking, "Oh my God, David, what is this? How do I use it? " Don't worry!
Calm down. You can use, you know, ChatGPT, Claude, or just ask inside of Vectal: "Help me set up a new project in Cursor," and it will help you do just that. Now, once you open an empty folder, we need to create a file, so let's make a new file named `main.py`, and in here we're going to be pasting the code. Actually, I should probably explain what I want to build: I want to create a Gemini-powered o3.
If you don't know what o3 is, it's the latest AI model from OpenAI, announced a few days ago, and it's basically the closest thing we have to AGI. Normal LLMs work by the user putting in an input, which is called a prompt, and then you get an output from the model. Where reasoning models differ is that after the prompt, they go into a thinking phase where they do a lot of silent reasoning. They actually burn a lot of tokens on these hidden reasoning layers that you don't see, and it takes a minute, maybe two minutes, sometimes even more. This results in a much higher-quality output; basically, it's like giving a person more time to think and prepare before answering a tough question. So this is how normal LLMs like Llama 3, GPT-4, and Claude work, and this is how reasoning models such as o1, o3, or the new Gemini 2.0 Flash Thinking work. Right? The process is quite similar, but with the key difference that in the middle we have the reasoning layers, and that's what I'm going to build in this video.
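The idea above can be sketched in plain Python. This is only a conceptual sketch: `call_model` is a hypothetical stand-in for a real LLM call, not the Gemini API.

```python
# Conceptual sketch of a reasoning model: extra hidden "thinking"
# passes happen between the user's prompt and the visible answer.

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call.
    return f"[model output for: {prompt[:30]}...]"

def answer_directly(question: str) -> str:
    # A normal LLM: one prompt in, one answer out.
    return call_model(question)

def answer_with_reasoning(question: str, loops: int = 3) -> str:
    # A reasoning model: several hidden passes first, then the answer.
    notes = ""
    for _ in range(loops):
        # Each hidden pass sees the question plus everything reasoned so far.
        notes += call_model(f"Think about: {question}\nNotes: {notes}") + "\n"
    # Only this final synthesis is shown to the user.
    return call_model(f"Using these notes, answer: {question}\n{notes}")
```

The extra loop is exactly why reasoning models burn more tokens and take longer before you see anything.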
We can mark this as completed. Now, a lot of you have been telling me, "David, I want to sign up for Vectal, but I'm not sure what I'm getting." So if you go to Vectal.ai, on the login page you'll see a quick two-minute demo explaining all of the core features. So again, if you want to try Vectal, which is the world's first AI-powered productivity app, make sure to check it out; it's linked below, or just go to Vectal.ai. All right, so the next step is to get an API key for Gemini. And actually, we cannot get it from here; we have to go to a different link, which is Google's AI Studio.
I know Google is notorious for making it difficult to set up an API, but don't worry! If you pay attention and actually follow what I'm showing in this video, you'll be able to do it, no problem. Now, before we create the API key, we should probably finish the project setup inside of Cursor.
So the first thing we obviously need is to install the package. As always, it's good to follow the official documentation, right? So here we need to install the Google Generative AI package.
Let's copy this code snippet, go back to Cursor, and open the terminal. So, on the top, you can press "Terminal" and "New Terminal," or just press Command + J or Control + J if you're on Windows. Before you paste this in, you actually need to decide which environment you want to be using.
Now, if you've never done this, you might be confused like, "Oh my God, David, what are environments? " Just ask any chatbot, "Explain what Python environments are in simple terms. " At any point when you're feeling friction or uncertainty, just use AI; you know, ChatGPT, Claude, it doesn't matter—just use one of the AI tools.
Don't let yourself get overwhelmed; just keep making progress. So actually, I already have my own test environment, activated with `conda activate test`. If you don't have Conda and you don't know how to install it, again just say, "Help me install Conda on macOS" or "Help me install Conda on Windows." That's why I think Vectal is truly revolutionary: you can have your task list and the things you need to do while using the world's best AI model, Claude 3.5, right inside of it. So yeah, again, go to Vectal.ai and try it; you won't be disappointed. All right, once you've activated a Conda environment, we need to install the package from Google. So let's copy this again and paste it in.
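For reference, the install command from the docs looks like this; run it inside the Conda environment you just activated (the environment name here is just my example):

```shell
# Activate your environment first (use whatever name you created).
# conda activate test

# Install the Google Generative AI SDK; the -q flag keeps pip quiet.
pip install -q google-generativeai
```

Running the same command without `-q` prints the full install log instead.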
Now, the first time you run this, you might expect a bunch of text. Actually, you won't, because this parameter right here means "quiet." If we ran the exact same command without the quiet parameter, you would see all of these "Requirement already satisfied" lines and other noise that might be confusing or overwhelming. So that's just one tip to know when installing packages. Now that we have the package, we can do the next step, which is making our first request. Just copy this code block from Google again (I'm going to link this below the video so you can easily access it) and paste it in now.
There are multiple things that we still need to get done. In the bottom right corner, right underneath my webcam, you can select the Python version. In here you need to select the same environment you created with Conda; that way the import isn't underlined. Before, it was underlined in yellow; now it isn't, so that means you have the package installed. Second thing: as you can see, we don't have our API key in here, so we need to replace that. And the third thing: who wants to be using Gemini 1.5? There are much better models available. So over here in the Google documentation, if you look on the left, you can see the Models section.
Now, if you go to Gemini, you'll see all of the main models, right? But the 1.5 models are already outdated. So let's go to the experimental models, and in here we can see the exciting stuff. This one has made a lot of waves: Gemini-exp-1206. This is a beast! But I want to show you this one, which is even newer; literally, it came out like four days ago: Gemini 2.0 Flash Thinking Experimental 1219. This has reasoning; it's basically Google's response to OpenAI's o1 and o3 models, right? So this is what we're going to be working with in this video. Make sure to copy the name exactly, otherwise it will not work, and replace the outdated model with this new one. Then Command+S to save.
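Put together, the edited script looks roughly like this. It's a sketch under assumptions: I read the key from a `GEMINI_API_KEY` environment variable (the docs paste it inline), and the model string is my transcription of the name shown in AI Studio, so double-check it against the docs.

```python
import os

# Assumed model id; copy the exact name from the Google docs.
MODEL_NAME = "gemini-2.0-flash-thinking-exp-1219"

def ask(prompt: str) -> str:
    # Imported lazily so this sketch loads even without the SDK installed.
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel(MODEL_NAME)
    return model.generate_content(prompt).text

if __name__ == "__main__" and "GEMINI_API_KEY" in os.environ:
    print(ask("Explain how the world will change once AGI is achieved. "
              "Answer in short."))
```

With the key set, running the script prints a single completed response once the model finishes.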
Now, the last thing we need is the API key, so let's go to this AI Studio link, and here we can get API keys. Now you should absolutely treat API keys the same way you treat passwords; never share them with anybody. I'm going to delete mine before uploading this video.
So click on "Create API Key," then choose a Google Cloud project. If you don’t have any project, just create one; it's not that difficult. I'm going to select Vectal AI and create an API key in an existing project.
This takes like a few seconds. Now, one interesting thing is that all of the API costs are actually free for Christmas. It’s literally the best time ever to build an AI startup, so don’t take it for granted.
Pursue that idea you've had for a few months now and actually make it happen. I mean, look, I literally did it myself! Two months ago, Vectal was just an idea; now it's a deployed product that you can go and try yourself. So it is possible, and I'd never built an AI app before, so if I can do it, you can do it. Now paste the API key right here, save the file, and all that's left is to run the script. Boom!
We get some warnings; we can safely ignore those. So let me stop this, and I'm actually going to change the prompt. Instead of "Explain how AI works," I'm going to say, "Explain how the world will change once AGI is achieved. Answer in short."
I don't want it to be generating like a 2,000-word essay; I just want a quick response to test out if everything works. And boom, there it is! It works.
So right now we've basically done the first half: we've gotten a successful response. The second half is turning this into your own Gemini-powered o3-style reasoning agent, and that's what we're going to be doing right now. So let's close the terminal and go back to Vectal.
All right, so we've actually completed two tasks at once: we've gotten the API key and we've tested that it works. Now we need to implement token streaming. If you don't know what token streaming is, basically it's a way to not have to wait for the full response; you can just see the tokens being generated as they are.
Like, if you use ChatGPT or Claude, the tokens are being streamed, right? But if you see right here, let's run it again; we don't see anything until the whole response is fully written out, and boom, then we get the full thing. So let's implement it right now.
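The change we're about to make can be sketched like this: with `stream=True`, `generate_content` returns an iterable of chunks instead of one finished response. The model id and the `GEMINI_API_KEY` variable are my assumptions, as before.

```python
import os

MODEL_NAME = "gemini-2.0-flash-thinking-exp-1219"  # assumed model id

def stream_answer(prompt: str) -> str:
    """Print tokens as they arrive and return the full text."""
    # Imported lazily so this sketch loads even without the SDK installed.
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel(MODEL_NAME)
    pieces = []
    for chunk in model.generate_content(prompt, stream=True):
        # Each chunk is printed the moment it arrives, not after the
        # whole response is finished.
        print(chunk.text, end="", flush=True)
        pieces.append(chunk.text)
    return "".join(pieces)
```

The time to first token drops because you see output as soon as generation starts.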
Google again has very useful documentation here; we can literally just copy this. Actually, you know what? Instead of doing copy-paste, let’s just take a screenshot.
I'll show you why. So go back to Cursor, and one of the main benefits of using Cursor is the Composer tab right here. In here, you can paste in screenshots, code files, anything you want, and most importantly, you can toggle the agent mode, which can create files, write terminal commands, analyze the entire code base, and do much more.
So here I can say, "I need you to implement token streaming into our main.py," and I'm going to tag the file. "Make sure to follow the documentation from the screenshot exactly."
Now Cursor is going to implement these changes for us, because if I just copied this code, I would get the wrong model again, I would reset our prompt, and it would not align with the small changes we made. So I'm going to give it to the Cursor agent to implement with the prompt we have and the model we have, only changing the things that need to be changed. Actually, let's accept this file.
Now, if you're really serious about working with Cursor and getting the most out of it, let's say you want to build your own AI startup: in the classroom of the New Society, we have a section for advanced AI tutorials, and in there you will find a 56-minute guide for Cursor, which will take you from a complete beginner to someone who's better than 99% of Cursor users. So make sure to check that out if you're interested; the link is below the video.
Now, let's test whether the token streaming actually works. Let's run it. Okay, now we're waiting, and boom!
There it is. It was much faster; the time to first token was a lot lower because we didn't have to wait for the whole response. And then we see it being generated as it goes; it's a much better user experience. You can also see early on whether the response is bad, and maybe stop it right there. There are many different benefits to implementing token streaming, and as you can see, it's not that difficult. It sounds complicated, but it's easy, as most things are once you've done them the first time. And that's definitely true for AI, for building your startup, for prompt engineering; all of that stuff sounds super complex until you do it and see, "Okay, it's actually not that difficult." All right, so now let's actually turn this into what we wanted to build: the Gemini-powered o3-style agent. We need to let the user choose how long the agent thinks, and we need to actually implement a second agent.
So I'm going to just work in the Cursor Composer and tell it to do that: "I want to add a second AI agent using the exact same syntax as our first; however, the prompt for the second agent will be influenced by the output of the first agent. Keep the prompts simple for now." AI is usually not the best at prompt engineering.
You should probably write the prompts yourself. And again, we have advanced trainings on this in the New Society eight-step workshop, but I just wanted it to copy the code and add a second agent right here. So it added a second response where the model generates a "continue the story" completion.
All right, so yeah, as you can see, this is not really useful. What I want—I should probably explain to Cursor what I want to build. I want to build a multi-agent AI reasoning system where the first AI agent asks the user for the topic as well as how many loops the team of agents should perform, and then the second agent executes the reasoning process based on these two inputs from the user.
"Change main.py": actually tagging the files is a very good habit inside of Cursor. Don't just talk about what you want to do, especially once your codebase starts getting bigger. Obviously right now we have only one file, so it's not that important. But if you have 50 files, it is hugely important to tag the files you want changed so that the Cursor agent knows what you're referencing and where it should do the work. "Change main.py to implement this." And okay, I'm going to give you more prompt-engineering advice: "Take a deep breath and proceed like a senior developer would." It's a persona, and this last part is really the gold.
From spending hundreds of hours inside of Cursor, trust me, you really want to include this when it writes code: "The fewer lines of code, the better." Okay, obviously if it was making a major refactor, you wouldn't include it. But in 95% of changes this is really good to include, because it keeps the agent focused and keeps it from adding unnecessary junk, over-engineering, and feature bloat; it keeps things concise, making sure we have as few lines of code as possible while achieving the desired outcome. It just makes the code less overwhelming and less confusing. So let's accept this and briefly look at what we have.
Right, so at the top we get the user inputs: topic and how many loops we should perform. Then we set up the prompt for the first agent: "Given the topic, formulate a deep philosophical question about that we should reason about. " Okay, so this definitely needs to change.
So: given the topic, formulate a strategy. It should be a multi-line string, by the way, so let's make it one: "Formulate a strategic plan for how to best answer this question." And actually, we're going to use the first AI agent to set the instructions and basically direct the second AI agent down the right path.
So, given the topic of our topic, formulate a set of detailed instructions as if you were writing a detailed system prompt for an advanced AI model that encourages it to approach the topic from many different angles. Okay, we got two typos over here, so we can just say "Fix the typos. " Okay, I think that's good enough for now.
Obviously, if you want to build a real application that goes into production—that is, you know, deployed on the web—you should spend many, many hours on the system prompts. I mean, just for Vectal, the system prompts are like over 100 lines long, both for the chat agent and for the task creation agent. So, like anytime you create a new task, let's say "Install Cursor," it goes to a sorting agent that automatically sorts it based on your user preferences and based on other active tasks.
So you never have to think about sorting tasks again. So, obviously this is much shorter than the prompts I have in Vectal, but it's good enough to just get started. And obviously I want to be respectful of your guys' time because if I spend two hours writing prompts, I guarantee all of you would click off.
All right, so the first response is `model.generate_content` with the setup prompt, and stream is true; yeah, we want it streamed. Okay, then this next part is basically for printing the chunks of the token stream. Here we have our second agent; I would probably choose different variable names, but that's a detail. Then there's `for i in range(loops)`: basically, we run the for loop based on what the user decided, so if the user says one or ten, that changes the number of reasoning loops we have. Actually, maybe we can add one more AI agent at the end.
So I'm going to say: "Let's add one more AI agent at the end that will summarize all of the loops from our second agent into a concise and coherent response," because if you run 15 or 20 loops, that would be so much text, and nobody wants to read through all of that. So let's add a third AI agent that summarizes everything into a nice, clean, concise response. All right, let's look at the prompt for our second agent.
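The three-agent pipeline we're assembling can be sketched like this. The prompts here are my paraphrases, not the exact ones from the video, and the model id and `GEMINI_API_KEY` variable are assumptions.

```python
import os

def instruction_prompt(topic: str) -> str:
    # Agent 1: turn the user's topic into detailed instructions.
    return (f"Given the topic '{topic}', write detailed instructions for an "
            "advanced AI model, encouraging it to approach the topic "
            "from many different angles.")

def loop_prompt(instructions: str, notes: str) -> str:
    # Agent 2: each loop sees the instructions plus all previous reasoning.
    return f"{instructions}\n\nPrevious reasoning:\n{notes}\nContinue reasoning."

def synthesis_prompt(notes: str) -> str:
    # Agent 3: condense every loop into a short, readable answer.
    return f"Synthesize these insights into two concise paragraphs:\n{notes}"

def run_pipeline(topic: str, loops: int) -> str:
    # Imported lazily so this sketch loads even without the SDK installed.
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-1219")
    instructions = model.generate_content(instruction_prompt(topic)).text
    notes = ""
    for _ in range(loops):
        notes += model.generate_content(loop_prompt(instructions, notes)).text + "\n"
    return model.generate_content(synthesis_prompt(notes)).text
```

The user-supplied `loops` value is what controls how long the system "thinks" before the final summary.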
In the meantime: it's based on an f-string with the current topic, and yeah, this prompt is kind of bad, so we need to change that. But instead of writing the prompt like we did last time, let's just highlight it, press Command+K, and say: "Rewrite this prompt so that it encourages the AI agent to answer in the best way possible, as if it were a 180-IQ Nobel Prize winner." I want to keep it fairly general, because you might use this for anything else.
And actually, if you want the code, all of the code will be in the New Society classroom, in the templates and presets, just like with all of our other videos. So if you want the code or any of the prompts, just join the New Society and copy-paste it. Again, it's linked below the video.
So let's see if this is better, based on the printed output. Yeah, I don't like this; it's confusing. The variable names are confusing. Let's see: I'm just going to say, "Based on these instructions, answer the topic." Okay, so that's, yeah, blah blah blah. I think this is good.
This is good. Maybe even more important is the last prompt, for our third agent: the final synthesis. Okay, so let's write: "Your task is to provide a clear, coherent synthesis of these insights in," I'm going to say, "two concise paragraphs."
Boom, save, and final response. It prints the streaming tokens of the third AI agent. All right, so let's try it.
I mean, this is the last step on our list, so let's see if it works. We might run into some errors, but Cursor helps us solve them. So: what topic should we reason about?
I'm going to say: "How should one prepare oneself for the creation of AGI?" How many reasoning loops should be performed? Let's do seven.
This, I think, is a fundamental question that all of you should ask yourselves, and actually, I even wrote a pretty solid post on this, if I may say so myself, in the New Society. So if you want to read it, again, join the New Society. But basically, you need to ask yourself: are you on the right track?
Where is your life headed? You know, what are your goals? Where are you going to be in a year, in two years, or even in six months?
Because, like, AI advancement is... guys, you understand, o3 just got announced, right? But that's not the biggest thing. The biggest thing most people are missing is that o1 was announced three and a half months ago. Three and a half months between o1 and o3. Now, yeah, they skipped a number for trademark reasons, but it doesn't matter. The jump in performance from o1 to o3 is insane. Like, o3 is literally better than 99.95% of programmers. It's literally superhuman performance. All of that happened in three and a half months. So it's not a stretch to say that in three months we could be expecting o4, and three months after that, o5. And what would that look like? Who knows? Maybe only elite senior researchers at OpenAI know, or at Anthropic or xAI, whatever. But me and you, all of us, certainly don't know.
But we have to prepare for this world, because it's much closer than you think, and most of you, unfortunately, are not prepared at all; you are doing things that will simply not matter in a year or two. I know it's a very uncomfortable topic, but my philosophy, at least, is that it's better to take on a small amount of discomfort right now than to face a large amount of discomfort in two years, when maybe some jobs get replaced. So make sure to sit down for two or three hours and really plan out your next two or three years.
Like, where are you headed? What are you doing? What track are you on?
Are you spending enough time with these AI tools? Are you experimenting enough with the cutting-edge AI models that are coming out literally like every week? If not, then you need to make some changes.
But anyways, now that my rant is over, what's also done is our answer. As you can see, so many tokens were burned; it was literally running for a solid minute and a half, and here we have our final synthesis. Now, if you just ask Gemini Advanced with one prompt, you might get a pretty solid response; same with ChatGPT, right, or Claude. But if you build something like this, which is basically a replica of o3, you will get a much higher-quality response, because our final AI agent summarizes the best insights from all seven loops. So, let's see: how should we prepare for AGI, according to Gemini 2.0? "Identify the core argument. The central theme is that our intuitive linear understanding of time is a limited, microscopic approximation."
Okay, so this isn't really what I wanted, so let me make two small tweaks. I'm going to add in here: "Use simple and easy-to-understand language, writing in short sentences," because this was, you know, Nobel-laureate-level, really scientific. So now we need to simplify it. But also, I'm just going to simplify these prompts. Let's do "Here is what the user said," and then I'm going to surround it with delimiters. This is a good practice, because it's basically the same as putting stuff on separate lines.
For us humans, this is very visible because there's an empty line next to it, but AI models don't see it visually; they see it in terms of tokens. So doing something like this is much better for an LLM; it sees, "Okay, this is something that should be separate," and basically pays more attention to it. So this is a good way to do that.
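The delimiter trick can be sketched as a tiny helper. The tag name `<user_input>` is my own choice here; any clear, consistent marker works.

```python
def wrap_user_input(text: str) -> str:
    # Surround user text with explicit delimiters so the model treats
    # it as a separate block rather than part of the instructions.
    return ("Here is what the user said:\n"
            "<user_input>\n"
            f"{text}\n"
            "</user_input>")

# Example: build the full prompt around the delimited input.
prompt = (wrap_user_input("How should one prepare for AGI?")
          + "\n\nYour task is to answer the question above.")
```

The model sees an unambiguous token boundary between the instructions and the user's text.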
Here's what the user said: "Your task is to [insert task here]." This is the benefit of using Cursor: it basically reads your mind, and you just press Tab. So literally, if you're not using AI coding tools, you're falling behind.
So let's continue as if you were writing a detailed something. Okay, so yeah, maybe this is too much. Your task is to formulate a set of detailed instructions for this topic.
I'm going to say keep the instructions generic and nonspecific. Make sure to clearly and thoroughly describe how to best answer the user's question, encouraging the person you are writing these instructions for to approach the topic from many different angles. Yes, okay, so this is much better.
Then we need to simplify this one. I'm going to say, "Here's what you need to do," and then also put delimiters around it. Let's delete this prompt written by AI.
Again, AI is not the best at prompt engineering; you're much better off taking the prompt-engineering training we have in the New Society, going through it, and learning it yourself. It only takes about 20 hours to become decent at prompt engineering. You don't have to change your whole career to be a prompt engineer; just learn the basics.
The same as programming: if you put 20 hours into Python, you will no longer be intimidated by all these concepts, and you will unlock an entire new world of possibilities of things that you want and can build. So I absolutely encourage you to put in the first 20 hours. I don't care how uncomfortable it is; just do it.
So many people have done it; you don't have to be a genius, and it's easier than ever with AI tools such as Cursor. So, okay: "Here's what you need to do: answer as if you were a 170-IQ person." And actually, what I should really do is go to the Google docs and figure out how to set the temperature.
Here we go! So we need to set the temperature right here. Let's take a screenshot of this again, and I'm going to instruct our Cursor agent: "Following the syntax from the screenshot, implement these temperatures for our three AI agents: the first agent should have 0.3, the second agent should definitely have 1.0, and the third agent should have 0.1." If you don't know what temperature is, just ask any AI tool to explain the concept of temperature in the context of LLMs. Basically, the easiest way to explain temperature is that it's like creativity, or more precisely, randomness.
Right? Like, temperature zero is completely deterministic. I mean, Vectal will explain it better than me.
Temperature in an LLM is like a creativity dial; it controls how random or focused the AI responses are. Low temperature, near zero, makes the response more consistent. So if you look at what I just did, I set up the last agent to be super consistent.
I don't want the last agent to be inventing anything, but the second agent, the one that's doing the loops, should have a high temperature. So, high temperature (1) makes the model more creative and varied. That's what we want for the loops—we don't want the same response every loop; that would be pointless.
If I put zero right here, that would be completely pointless, and we would just be wasting time and money. We want the second agent to be creative, varied, and full of randomness, and then our third agent to summarize it. So let's accept this.
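A sketch of those per-agent settings: in the google-generativeai SDK, temperature can be passed per call through `generation_config`; the agent names here are my own labels.

```python
# Per-agent temperature settings described above: near-deterministic
# for instructions and the final summary, high for the creative loops.
AGENT_TEMPERATURE = {
    "instructions": 0.3,  # agent 1: mostly consistent
    "reasoning": 1.0,     # agent 2: creative and varied across loops
    "synthesis": 0.1,     # agent 3: just summarize, invent nothing
}

def generate(model, prompt: str, agent: str):
    # `model` is a genai.GenerativeModel; generation_config accepts a
    # plain dict with the sampling parameters.
    return model.generate_content(
        prompt,
        generation_config={"temperature": AGENT_TEMPERATURE[agent]},
    )
```

Keeping the settings in one dict makes it easy to see at a glance which agent is allowed to be creative.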
Usually, you want to look at what it's changing, but here it's relatively simple, so we don't have to worry too much. Maybe we'll run into errors, and I'll regret it; we'll see. So how should a person prepare for the creation of AGI?
Labs like OpenAI or Anthropic are very close. Let's assume we are one year away for the sake of this argument. All right, how many loops?
Let's do four loops this time; seven was a bit long. Now, while this is being generated, one more thing I want to show you is that the chat box inside of Vectal also got improved. So let's type some random sample text: before, it would just keep typing and you wouldn't see the start of your response, but now it's a proper chat feature with a scrollbar. You can easily go back to the start. The UI of the chat has improved massively, and I also added explanatory tooltips, because a lot of people were confused about how to use the app. So literally, every single day Vectal is improving. If you haven't tried it yet, go to Vectal.ai and give it a shot. All right, let's see; it's still generating. Okay, last reasoning loop: loop 4 out of 4.
Then our summarization agent should give us a concise response. As you can see, the loop outputs are using long, complicated words, because we only told our last agent to use simple and easy-to-understand language. By the way, that's another prompt snippet I love to use; it makes working with agents much faster, because you don't have to parse long sentences with multiple commas and so on. It's much easier to look at short sentences written in simple, easy-to-understand language. Anyways, we finally have the response. So let's see if this is better.
Final synthesis: "The user wants a two-paragraph summary of the complex ideas about preparing. I need to distill the..." Wow, I didn't realize this: it's actually also outputting the reasoning tokens. This is one thing that o1 does not do inside of ChatGPT. I mean, if you go to ChatGPT right now, whether you have the Plus plan or the Pro plan (Plus costs $20 a month, Pro costs $200 a month), even with Plus and o1, you cannot see the reasoning. Like, if you say, "Help me plan my life for the creation of AGI," you will not be able to see the reasoning tokens, right? It only shows "thinking," and then something very generic like "mapping the journey," but these are not the actual tokens. It's a very restricted summary of what it's thinking. But here... maybe this is also a restricted summary; who knows?
I mean, it's Google, after all. We all know what happened with the Google Gemini images, but it's nice that you can see how the model is thinking. The key themes are to focus on fundamental thinking skills, not specific tech; prioritize human strengths, and prepare for big societal changes.
Preparing for AGI in a year means focusing on how you think, not just what you know. Forget learning specific tech skills; instead, boost your ability to learn new things quickly. Absolutely.
So many people have this limiting belief that once you're out of college, out of school, you don't need to learn anymore. The older people get, the stronger this belief becomes: "Oh, I cannot learn new things." That is so destructive.
You need to get rid of that if you want to have any chance of thriving in the next few years. Practice thinking deeply about core ideas, science, and math; sharpen your skills in logic and problem-solving. Absolutely couldn’t agree more.
Be ready to change your mind and adapt to new information. Change is going to be happening daily, even hourly, so you need to have an open mind, but not so open that your brain falls out. Develop your empathy and creativity.
I mean, it’s kind of a woke response here: think deeply about ethics. Come on, bro, what is this? Understand that big changes are coming to society.
Yes, focus on being flexible, both physically and mentally, and ready for anything. This approach… So yeah, this is a very vague response. Now, obviously, I could do a follow-up saying, “Be more specific.
You know: here is my life, here's my job, here's my income, here are my age and skills, here's what I don't know," and give it more context, because right now my original prompt was very, very vague. I literally just said, "How should one prepare for the creation of AGI?" So you will get a much better response if you actually describe your own life.
That said, I still think this is a pretty impressive result, given that we built it in, what, 30 minutes or something? So anyways, again, if you want all of the code, it's going to be in the New Society, along with all the other workshops and trainings we have. With that being said, I hope you guys found this video valuable.
Go to Vectal.ai, check it out, sign up, give it a shot, and I wish you a wonderful, productive week. Peace out!