AI Agents, Clearly Explained

Jeff Su
My AI Toolkit: https://academy.jeffsu.org/ai-toolkit?utm_source=youtube&utm_medium=video&utm_campaig...
Video Transcript:
AI. AI. AI.
AI. AI. AI.
You know, more agentic. Agentic capabilities. An AI agent.
Agents. Agentic workflows. Agents.
Agents. Agent. Agent.
Agent. Agent. Agentic.
All right. Most explanations of AI agents are either too technical or too basic. This video is meant for people like myself.
You have zero technical background, but you use AI tools regularly, and you want to learn just enough about AI agents to see how they affect you. In this video, we'll follow a simple one-two-three learning path, building on concepts you already understand, like ChatGPT, then moving on to AI workflows, and finally AI agents, all the while using examples you will actually encounter in real life.
And believe me when I tell you, those intimidating terms you see everywhere, like RAG or ReAct, are a lot simpler than you think. Let's get started. Kicking things off at level one: large language models.
Popular AI chatbots like ChatGPT, Google Gemini, and Claude are applications built on top of large language models (LLMs), and they're fantastic at generating and editing text. Here's a simple visualization: you, the human, provide an input, and the LLM produces an output based on its training data.
For example, if I were to ask ChatGPT to draft an email requesting a coffee chat, my prompt is the input, and the resulting email, which is way more polite than I would ever be in real life, is the output. So far so good, right? Simple stuff.
But what if I asked ChatGPT when my next coffee chat is? Even without seeing the response, both you and I know ChatGPT is going to fail, because it doesn't know that information. It doesn't have access to my calendar.
This highlights two key traits of large language models. First, despite being trained on vast amounts of data, they have limited knowledge of proprietary information like our personal information or internal company data. Second, LLMs are passive.
They wait for our prompt and then respond. Right? Keep these two traits in mind moving forward.
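If you're curious what level one looks like in code, here's a minimal sketch assuming the OpenAI Python client (any chat-style LLM client looks much the same): one input in, one output out, and nothing happens until we prompt.

```python
# Level one: a single input in, a single output out.
# Assumes the OpenAI Python client and an API key in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Draft a polite email requesting a coffee chat."}],
)
print(response.choices[0].message.content)

# Ask "When is my next coffee chat?" and the model can only guess:
# it is passive and has no access to your calendar.
```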
Moving to level two: AI workflows. Let's build on our example. What if I, a human, told the LLM, "Every time I ask about a personal event, perform a search query and fetch data from my Google Calendar before providing a response"?
With this logic implemented, the next time I ask, "When is my coffee chat with Elon Husky?", I'll get the correct answer, because the LLM will now first go into my Google Calendar to find that information. But here's where it gets tricky.
What if my next follow-up question is, "What will the weather be like that day?" The LLM will now fail to answer, because the path we told the LLM to follow is to always search my Google Calendar, which has no information about the weather. This is a fundamental trait of AI workflows.
They can only follow predefined paths set by humans. And if you want to get technical, this path is also called the control logic. Pushing my example further, what if I added more steps to the workflow by allowing the LLM to access the weather via an API and then, just for fun, used a text-to-audio model to speak the answer:
"The weather forecast for seeing Elon Husky is sunny with a chance of being a good boy."
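In code, this whole thing is still just a path I wrote out by hand. Here's a rough sketch; the helper functions are hypothetical stand-ins for real Google Calendar, weather, and text-to-audio integrations.

```python
# A sketch of the workflow's "control logic": a path I, the human, hard-coded.
# The helpers are hypothetical stubs standing in for real Google Calendar and
# weather API integrations (a real version would also pass the reply to a
# text-to-audio model to speak it).
def mentions_personal_event(question: str) -> bool:
    return any(w in question.lower() for w in ("coffee chat", "meeting", "appointment"))

def search_calendar(question: str) -> dict:   # stand-in for a Google Calendar lookup
    return {"title": "Coffee chat with Elon Husky", "date": "2025-06-03"}

def get_weather(date: str) -> str:            # stand-in for a weather API call
    return "sunny with a chance of being a good boy"

def answer(question: str) -> str:
    if mentions_personal_event(question):     # the predefined path
        event = search_calendar(question)
        forecast = get_weather(event["date"])
        return f'{event["title"]} is on {event["date"]}. Forecast: {forecast}.'
    return "No rule covers this question."    # off the path, the workflow just fails

print(answer("When is my coffee chat with Elon Husky?"))
print(answer("Should I bring an umbrella to the park tomorrow?"))
```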
Here's the thing: no matter how many steps we add, this is still just an AI workflow. Even if there were hundreds or thousands of steps, if a human is the decision maker, there is no AI agent involved. Pro tip: retrieval-augmented generation, or RAG, is a fancy term that's thrown around a lot. In simple terms, RAG is a process that helps AI models look things up before they answer, like accessing my calendar or the weather service.
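As a toy sketch of the idea (assuming the OpenAI Python client, with "retrieval" reduced to a keyword match over a couple of notes; real systems use proper search or vector databases), RAG just means: look things up first, then answer with that context in the prompt.

```python
# A toy retrieval-augmented generation (RAG) sketch: retrieve first, then answer.
from openai import OpenAI

notes = [
    "Coffee chat with Elon Husky: Tuesday 3 June, 10:00.",
    "Dentist appointment: Friday 6 June, 14:30.",
]

def retrieve(question: str) -> str:
    # Toy "retrieval": keep any note that shares a word with the question.
    words = question.lower().split()
    return "\n".join(n for n in notes if any(w in n.lower() for w in words))

def rag_answer(question: str) -> str:
    context = retrieve(question)                    # look things up before answering...
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Context:\n{context}\n\nQuestion: {question}"}],
    )
    return response.choices[0].message.content      # ...then generate the reply

print(rag_answer("When is my coffee chat?"))
```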
Essentially, RAG is just a type of AI workflow. By the way, I have a free AI toolkit that cuts through the noise and helps you master essential AI tools and workflows. I'll leave a link to that down below.
Here's a real-world example. Following Helena Louu's amazing tutorial, I created a simple AI workflow using Make.com. Here you can see that, first, I'm using Google Sheets to do something. Specifically, I'm compiling links to news articles in a Google Sheet, and this is that Google Sheet.
Second, I'm using Perplexity to summarize those news articles. Then, using Claude and a prompt that I wrote, I'm asking Claude to draft a LinkedIn and an Instagram post. Finally, I can schedule this to run automatically every day at 8 a.m.
As you can see, this is an AI workflow because it follows a predefined path set by me. Step one, you do this. Step two, you do this. Step three, you do this. And finally, remember to run daily at 8 a.m.
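For a sense of the shape of that path in code, here's a rough Python equivalent of the scenario; the three step functions are hypothetical stand-ins for the real Google Sheets, Perplexity, and Claude integrations, and the daily trigger uses the schedule package.

```python
# A rough, hypothetical Python equivalent of the Make.com scenario: three fixed
# steps plus a daily trigger. The step functions are stubs, not real integrations.
import schedule
import time

def fetch_article_links() -> list[str]:    # step 1: read links from a Google Sheet
    return ["https://example.com/news-1", "https://example.com/news-2"]

def summarize(links: list[str]) -> str:    # step 2: summarize the articles (Perplexity)
    return "Short summaries of today's articles..."

def draft_posts(summary: str) -> str:      # step 3: draft LinkedIn/Instagram posts (Claude)
    return f"LinkedIn draft based on: {summary}"

def run_workflow() -> None:
    links = fetch_article_links()
    summary = summarize(links)
    print(draft_posts(summary))             # a human still reviews (and rewrites prompts)

schedule.every().day.at("08:00").do(run_workflow)   # and finally, run daily at 8 a.m.

while True:
    schedule.run_pending()
    time.sleep(60)
```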
One last thing: if I test this workflow and I don't like the final output of the LinkedIn post, for example, as you can see right here, it's not funny enough, and I'm naturally hilarious, right? I'd have to manually go back and rewrite the prompt for Claude. Okay?
And this trial and error iteration is currently being done by me, a human. So keep that in mind moving forward. All right, level three, AI agents.
Continuing the Make.com example, let's break down what I've been doing so far as the human decision maker. With the goal of creating social media posts based on news articles, I need to do two things.
First, reason or think about the best approach. I need to first compile the news articles, then summarize them, then write the final posts. Second, take action using tools.
I need to find and link to those news articles in Google Sheets, use Perplexity for real-time summarization, and then Claude for copywriting. So, and this is the most important sentence in this entire video:
The one massive change that has to happen in order for this AI workflow to become an AI agent is for me, the human decision maker, to be replaced by an LLM. In other words, the AI agent must reason. What's the most efficient way to compile these news articles?
Should I copy and paste each article into a Word document? No, it's probably easier to compile links to those articles and then use another tool to fetch the data. Yes, that makes more sense.
The AI agent must act, aka do things via tools. Should I use Microsoft Word to compile links? No.
Inserting links directly into rows is way more efficient. What about Excel? Hmm, the user has already connected their Google account with Make.com, so Google Sheets is a better option.
Pro tip: because of this, the most common configuration for AI agents is the ReAct framework. All AI agents must reason and act, hence ReAct. Sounds simple once we break it down, right?
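As a very rough sketch (assuming the OpenAI Python client; the toy tools and the prompt format are purely illustrative, not any particular agent library), a ReAct-style loop alternates between reasoning and acting, feeding each observation back into the model.

```python
# A bare-bones ReAct-style loop: the LLM reasons, picks an action (tool call),
# we run the tool, and the observation is fed back in until the LLM says it's done.
from openai import OpenAI

client = OpenAI()
tools = {
    "search_sheet": lambda arg: "Found 3 article links in the Google Sheet.",
    "summarize": lambda arg: "Short summaries of the 3 articles...",
}

history = "Goal: draft a LinkedIn post from today's news articles."
for _ in range(5):                                    # cap the loop for safety
    prompt = (history +
              "\nRespond with either 'Action: <tool>: <input>' or 'Final: <answer>'."
              "\nAvailable tools: " + ", ".join(tools))
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

    if reply.startswith("Final:"):                    # the LLM decided it is finished
        print(reply)
        break
    parts = reply.split(":", 2)                       # parse "Action: tool: input"
    if len(parts) < 3:
        break                                         # model went off-format; bail out
    tool, arg = parts[1].strip(), parts[2].strip()
    observation = tools.get(tool, lambda a: "unknown tool")(arg)
    history += f"\n{reply}\nObservation: {observation}"   # reason -> act -> observe
```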
A third key trait of AI agents is their ability to iterate.
Remember when I had to manually rewrite the prompt to make the LinkedIn post funnier? I, the human, probably need to repeat this iterative process a few times to get something I'm happy with, right? An AI agent will be able to do the same thing autonomously.
In our example, the AI agent would autonomously add another LLM call to critique its own output. Okay, I've drafted V1 of a LinkedIn post. How do I make sure it's good?
Oh, I know. I'll add another step where an LLM will critique the post based on LinkedIn best practices, and let's repeat this until the best-practices criteria are all met.
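Here's roughly what that self-critique loop could look like as a sketch, again assuming the OpenAI Python client, with illustrative prompts and a hard cap on the number of cycles.

```python
# A sketch of the self-critique loop: draft, critique against best practices,
# revise, and repeat until the critic approves or we hit the cap.
from openai import OpenAI

client = OpenAI()

def llm(prompt: str) -> str:
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

draft = llm("Draft a short, funny LinkedIn post about today's AI news.")
for _ in range(3):                                   # a few cycles, not forever
    critique = llm("Critique this LinkedIn post against LinkedIn best practices. "
                   "Reply with only the word APPROVED if it meets them all:\n" + draft)
    if "APPROVED" in critique:                       # criteria met; stop iterating
        break
    draft = llm(f"Rewrite the post to address this critique:\n{critique}\n\nPost:\n{draft}")

print(draft)                                         # the final output
```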
And after a few cycles of that, we have the final output. That was a hypothetical example. So let's move on to a real world AI agent example.
Andrew Ng is a preeminent figure in AI, and he created this demo website that illustrates how an AI agent works. I'll link the full video down below, but when I search for a keyword like "skier" and hit enter, the AI vision agent in the background is first reasoning about what a skier looks like: a person on skis going really fast in snow, for example, right?
I'm not sure. And then it's acting by looking at clips in video footage, trying to identify what it thinks a skier is, indexing that clip, and then returning that clip to us. Although this might not feel impressive, remember that an AI agent did all that instead of a human reviewing the footage beforehand, manually identifying the skier, and adding tags like skier, mountain, ski, snow.
The programming is obviously a lot more technical and complicated than what we see on the front end, but that's the point of this demo, right? The average user like myself wants a simple app that just works without me having to understand what's going on in the back end. Speaking of examples, I'm also building my very own basic AI agent using n8n.
So, let me know in the comments what type of AI agent you'd like me to make a tutorial on next. To wrap up, here's a simplified visualization of the three levels we covered today. Level one, we provide an input and the LM responds with an output.
Easy. Level two, AI workflows: we provide an input and tell the LLM to follow a predefined path that may involve retrieving information from external tools. The key trait here is that the human programs a path for the LLM to follow.
Level three, the AI agent receives a goal, and the LLM performs reasoning to determine how best to achieve that goal, takes action using tools to produce an interim result, observes that interim result and decides whether further iterations are required, and produces a final output that achieves the initial goal. The key trait here is that the LLM is the decision maker in the workflow. If you found this helpful, you might want to learn how to build a prompts database in Notion.
See you on the next video. In the meantime, have a great one.