In this video I'm going to show you the single AI automation skill that can help you generate an extra $200,000 in your AI agency in 2025: building a RAG system, otherwise known as a retrieval-augmented generation system. These systems combine large language models like ChatGPT and Claude with an external knowledge base stored in a normal database. The system takes in the same kind of prompt you'd send to any LLM, but it first searches your private database for relevant expert knowledge, combines that with the LLM (ChatGPT, Claude, or something else), and then finally sends the result back to the user. This lets you build out your own AI tools and provide answers that are grounded in up-to-date facts and the domain-specific knowledge your company may possess. Building these systems will be the number one skill you need in 2025 to stand out from the competition. In this video I'll show you the big problem you can solve for businesses and how to profit from it, and I'll also break down exactly how this works
with a live demo using an Airtable database and Pinecone, an AI vector database platform. All right, so first let's talk about the big problem and how we can profit from it. A lot of companies have specialized data that they want to use with AI so they can build other tools. For example, I've got all these YouTube videos, all the videos from my courses, and all the different posts I've made in my community, and it would be great if I could take all of that content and repurpose it into other formats. Or a company might just have a lot of technical documentation they'd like to index so it can be used with customer service and automated voice agents. There is a lot of untapped value and profit to be made by unlocking that data for companies, but not many AI automation experts are proficient at building these slightly more complex systems, like this RAG system here. So if you can learn how to build them, it's a wide-open field without much competition. It's also a great opportunity because the types of companies that
have this kind of data are already well funded, and they can pay for true expertise. Now, consultants are helping companies build out these solutions right now, but they have limitations, which gives you the opportunity. So let's discuss the more common solutions and their limitations. One way people are trying to solve this problem right now is to simply take an existing knowledge base, maybe a PDF or a TXT file, and use its contents directly in their prompts when they're building out their automations. I'm using Claude here: I could add a message, create a prompt using text, and do something like "use the knowledge base below to write an article about X, Y, and Z", and then either cut and paste the entire contents of the knowledge base directly into the automation or load the information dynamically and pass it in as a variable. Either way, this can work up to a certain size, but once your knowledge base starts to grow, like this example here, once you have hundreds of videos, posts, school posts, school courses, and modules, you're not going to be able to cut and paste or embed that much content into these modules every single time. It's just far too much data for Claude to process at any given time. So you can directly insert your knowledge base as a PDF, but it's not always practical or efficient. Now, you do have GPTs and Assistants, which are good in some cases; I even have a GPT for the No-Code Architects Toolkit API. But GPTs can't be used in automations directly. GPTs
only work well in this particular kind of chat interface, where your users just talk with them directly. If you want to integrate your knowledge base into your automations, GPTs just aren't going to work. Now, GPT Assistants do work with automations, but in my experience they don't perform as well as GPTs. You can build your own custom Assistants in the OpenAI playground, upload your own knowledge base, and define how the Assistant should behave using system instructions, but in my experience they still don't work as well as GPTs. And with GPTs or Assistants you can't really control how the data is structured, which limits the performance and flexibility of your solution. When I upload data into an Assistant, I can't control what type of database is used under the hood or how everything is indexed; I have to rely on however they decided to do it. Again, it works okay, and it works even better inside a GPT. Here you can see I have a knowledge base where I've uploaded information about the No-Code Architects Toolkit, but again, I'm limited in how it gets indexed and how I can actually use it. So the secret is learning how to build your own RAG systems: using your own database and various APIs to connect to other data sources, syncing that data back into your database, syncing it with a vector database like Pinecone, and then using that data to answer the questions that come into your automated tool, where we retrieve our own expert knowledge from our own database, work with ChatGPT or Claude to shape the answer the way we want, and then finally respond back to the user.
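To make that loop concrete, here's a minimal sketch of the whole flow in Python, assuming the current OpenAI and Pinecone Python SDKs, a hypothetical Pinecone index named "videos", a metadata field named "content", and API keys in environment variables. The actual build in this video uses Make rather than code, but the steps are the same.

```python
# Minimal RAG loop: embed the question, retrieve expert chunks, answer with an LLM.
# Index name "videos" and metadata key "content" are assumptions for this sketch.
import os
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index("videos")

def answer(question: str) -> str:
    # 1) Turn the question into an embedding (a list of numbers).
    vector = openai_client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding

    # 2) Pull the most relevant chunks of the private knowledge base from Pinecone.
    result = index.query(vector=vector, top_k=5, include_metadata=True)
    context = "\n\n".join(m.metadata["content"] for m in result.matches)

    # 3) Let the LLM answer, grounded in that retrieved context.
    reply = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the context provided."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return reply.choices[0].message.content

print(answer("How do I start an AI automation school community?"))
```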
Now, one key thing I want to articulate about mastering these systems and how to profit from them: the true skill in 2025 will belong to those who can manage data, who can connect to various API endpoints, pull that data in consistently, keep their own database they can query against, and then use it to build smarter AI solutions. So while other AI automation companies chase flashy new AI tools, you're going to focus on managing data, and that's how you set yourself apart with a high-income skill: integrating these private knowledge bases with the AI tools that companies use internally on a day-to-day basis. All right, so now let's break down this diagram, which lays out a RAG system I've built. The tools I used to build it were Airtable, Make, and a vector database platform called Pinecone, and I'm going to show you all the important features of
this entire system. I'm going to start on this side of the diagram, with our Airtable database, then talk about data collectors and how we get data from that Airtable database into a vector database (Pinecone is that vector database, and I'll explain a bit more about why it's useful), and then we'll talk about how we actually interface with the vector database and an LLM like ChatGPT or Claude to produce highly refined answers to the questions that come into our system. To make this real, what I have here is a database populated with all of my YouTube content and all of the course material from my school community, plus an index of all the written posts I've made in the community as well. I've synced all of this data, including the URLs, thumbnails, titles, descriptions, and transcriptions, either in doc format (if we open this up we can see the transcription) or written directly to a database field like this. The process for syncing all of this data and keeping it organized happens on this side of the diagram, and we'll come back and cover that in a minute. What we're going to talk about first is: once we have all of that data in our relational database, our Airtable base with all of these columns and rows, how do we actually sync it with the Pinecone vector database, which lets us integrate more seamlessly with our LLMs?
Now, the most important piece for syncing our data in Airtable with Pinecone is this right here: the data collector. You'll see data collectors scattered throughout this diagram, and I'll explain them in more detail, but at its core a data collector is responsible for syncing data between two different sources and making sure that data stays in good condition. We have to be able to trust that the data we have here is the same data we have over there. We want to make sure that when new rows land in our Airtable database, they show up in the vector database; that when there are updates in Airtable, those updates also reach the vector database; that the data collector can find duplicates and remove them if need be; and that there are no data gaps, meaning nothing that exists in Airtable is missing from the vector database. We also want the process of getting data from here to there to be cost efficient: if you want the vector database updated on a regular basis and the data is changing all the time, you have to think about the efficiency of that process so you aren't spending a lot of money just to maintain the database. And we want it to be fault tolerant, so that nothing that goes wrong here or mid-process leaves these two systems holding different data.
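Since the data collector is a concept rather than one fixed tool, here's a small, generic Python sketch of what that checklist looks like in code, with an in-memory dict standing in for the destination and the URL used as the unique key. It's illustrative only; the actual collectors in this build are Make and Airtable automations, and cost efficiency and fault tolerance come from how you run it, not from this skeleton.

```python
# Generic "data collector" skeleton: sync records from a source into a destination
# keyed by URL, covering new records, updates, duplicates, and a gap check.
# The dict destination is a stand-in for Airtable or Pinecone.

def collect(fetch_source_records, destination: dict) -> None:
    source_records = fetch_source_records()          # e.g. an API call

    seen = set()
    for record in source_records:
        key = record["url"]                          # unique key prevents duplicates
        if key in seen:
            continue                                 # drop duplicates within this batch
        seen.add(key)

        existing = destination.get(key)
        if existing != record:
            destination[key] = record                # insert new rows or update changed ones

    # Gap check: every source record should now exist in the destination.
    missing = {r["url"] for r in source_records} - set(destination)
    if missing:
        raise RuntimeError(f"Data gap detected for: {missing}")

# Example run with fake data standing in for a YouTube API response.
videos = [{"url": "https://youtu.be/abc", "title": "Intro to RAG"}]
db: dict = {}
collect(lambda: videos, db)
print(db)
```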
The data collector is something you build yourself, and it will be different in every situation; there isn't one single format you'll reuse everywhere. It's more of a concept, a checklist, so that when you build out an integration you make sure you satisfy all of these criteria and can actually trust your solution. So if we look at this data collector, the one that syncs our Airtable database with our vector database, we can take a look at this Make automation. I want to point out that you really only need two simple automations to keep these two systems in sync and run this entire process. This part of the system is actually the easiest part to build, even though something like a vector database and Pinecone might be a new concept; putting this together is pretty straightforward. Keeping all of the data in Airtable in sync, relevant, and updated is the hard part, and we'll talk about that in a bit. So now let's take a look at the data connector that connects Airtable and the vector database. Again, we have our Airtable database, and it's full of YouTube
videos and school courses. Looking at the source column, we can also have school pages and Loom trainings; you could really have anything you want here, you just have to add it and then build the data connector for the API you want to pull data from. Then we have this automation, which searches records from a specific view in Airtable. You can see the search always looks at the view called "Sync Pinecone", and here's that view. The way its filter is constructed, only new and modified rows show up in this particular view, and when records appear in the view it triggers this automation, which ultimately syncs them into Pinecone. Coming back to our definition of a data collector: in this specific scenario, where we're syncing data from Airtable to the vector database, we control the new-record sync and the record updates using this view and the filter that powers it. That one design choice, the view plus the filter, satisfies the first two requirements of the data connector. Continuing on: the first thing that happens is we grab the rows of new data we want to sync from this view, then the automation loops through all of those rows, and the next thing it does is download the transcription document. If I jump back to Airtable, every one of these YouTube videos has a transcription document, so this automation downloads that document and parses it into data we can use inside the automation. You can see we had 12 operations here, which means 12 rows came back from this search, and now we have the plain-text transcription we can use and ultimately store in our Pinecone database, for retrieval later when we're processing requests in our own AI tool.
Now, these modules here are just cleaning up some data for the steps that come next, and then this module is what breaks up the transcription text. If we open it up and look at the data, we have this whole transcription, but sometimes you need to break it into smaller chunks for a database like Pinecone. So this module uses the No-Code Architects Toolkit, with this small bit of code, to automatically chunk the data into smaller pieces. You can see we're sending in the transcript right here, and if we look at the output of each operation and check the response, there's an array, and each element is a chunk of the larger transcript; the transcript here was broken into five smaller parts. It's also worth noting that each chunk slightly overlaps the previous one. Right here in section five you can see it starts with "your part of the no code architect which is", and you can see that same bit of text at the end of the previous chunk. That small amount of overlap lets the Pinecone database tie the pieces together down the line, so the segments stay connected even though you've broken them into smaller chunks. This part of the automation then simply loops through all of the chunks that come out of that module.
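If you wanted to reproduce that chunking step outside the toolkit, a minimal version in Python looks something like this. The 1,000-character chunk size and 100-character overlap are arbitrary assumptions for the sketch, not the toolkit's actual settings.

```python
# Split a long transcript into overlapping chunks so each piece fits the
# vector database comfortably while neighboring chunks still share context.

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    chunks = []
    step = chunk_size - overlap            # how far the window advances each time
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

transcript = "your transcript text here ... " * 200   # stand-in for a real transcript
pieces = chunk_text(transcript)
print(len(pieces), "chunks; chunk 2 starts with the tail of chunk 1:", pieces[1][:40])
```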
For each of those chunks, the first thing we need to do is take the text and create something called an embedding, which is a numeric representation of the text data. You can see we're using ChatGPT here, and again this is an API call: we're calling the embeddings endpoint, sending it the text value, and it responds with a series of numbers. If we keep opening this up, here's the collection and here are the embeddings it gives us, numbers that numerically represent the original text we sent it.
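Because Make is just making an HTTP request at this step, the same call looks like this if you hit the OpenAI embeddings endpoint directly. The model name is one of OpenAI's current embedding models and is my assumption about what you'd pick, not necessarily what the automation in the video uses.

```python
# Call the OpenAI embeddings endpoint directly, the same way the Make HTTP module does.
import os
import requests

chunk = "One chunk of the video transcript goes here."

resp = requests.post(
    "https://api.openai.com/v1/embeddings",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={"model": "text-embedding-3-small", "input": chunk},
    timeout=30,
)
resp.raise_for_status()

embedding = resp.json()["data"][0]["embedding"]   # a long list of numbers
print(len(embedding), "numbers representing the chunk, e.g.", embedding[:3])
```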
I can't say I know exactly how the math works, but these numbers are what let you quickly find relevant data when you're searching a big database for something specific. So for every chunk we loop through right here, we create one of those embeddings, and then we check the Pinecone database to see whether that vector already exists. In this case we build the vector ID from the URL, because it's a unique value. Remember, if we jump back to our data collector requirements, we need to ensure there are no duplicates,
and we do that by using the URL as a key. That way, if the same URL shows up twice, we can find the duplicate and remove it. So this step checks whether a vector already exists, which tells us whether we need to create a new vector or update an existing one. If we look inside our Pinecone database, we can see we already have some records, and there's some text here which is a portion, a chunk, of the transcript for a particular YouTube video, in this case the video you can see in the popup. Every one of these entries in the Pinecone database has these vectors, but it's these vector values that we use to find the data, not the text itself. The text stored in Pinecone is what powers the final answers: we send that text along to generate the responses, while the vector values are what we use to actually locate the right records in the Pinecone database so we can send the proper data to the LLMs.
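Here's a sketch of that store-or-update step with the Pinecone Python client, again assuming a hypothetical index called "videos" and a metadata field named "content". The ID is derived from the video URL plus the chunk number, which is what makes duplicates easy to detect and overwrite on the next sync.

```python
# Upsert transcript chunks into Pinecone using URL-derived IDs, so re-running the
# sync overwrites existing vectors instead of creating duplicates.
import os
from pinecone import Pinecone

index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index("videos")

def store_chunks(video_url: str, chunks: list[str], embeddings: list[list[float]]) -> None:
    vectors = []
    for i, (chunk, values) in enumerate(zip(chunks, embeddings)):
        vectors.append({
            "id": f"{video_url}#chunk-{i}",        # unique, repeatable ID per chunk
            "values": values,                      # the embedding numbers
            "metadata": {"content": chunk, "url": video_url},  # text sent to the LLM later
        })

    # fetch() shows whether these IDs already exist; upsert() creates or updates either way.
    existing = index.fetch(ids=[v["id"] for v in vectors]).vectors
    print(f"{len(existing)} of {len(vectors)} chunks already stored; upserting all.")
    index.upsert(vectors=vectors)
```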
Then these two modules keep the Airtable database in sync, and we do a little cleanup here just to maintain Airtable, but there's nothing there that's critical to the overall process. So that shows how we get data into the vector database. This data connector makes sure all the new records and updated records come across, there are no duplicates, there are no data gaps, it's cost efficient, and it's fault tolerant, so that when we use this vector database in the solution over here, we have data we can trust. I didn't specifically cover this yet, but the way this data connector keeps things cost efficient is that we use Airtable automations as much as possible to process the data coming back from the API, rather than Make automations, whose cost can ramp up quickly when you're processing a lot of data. So when we process all of this data on this side of the diagram, we push as much of the data sync as we can into Airtable automations to avoid racking up a ton of operations in Make, which are a lot more expensive. Now let's move on to the part of the process where we actually take in a question or request from a user, query the vector database for expert information relevant to that question, and then send it to an LLM like ChatGPT or Claude for the final, refined answer, which is
then passed back to the user. To represent this entire section, I have this automation, and it's quite simple. This module here just represents a question from a user; it's only used for testing, but it could easily be connected to something like Slack, where you ask an assistant a question in Slack, that question gets inserted into this module, and when we finally get the answer on the back end we respond back to the user. But now, to cover the guts of
this automation, which I like to point out is fairly simple given all the power it provides: in this first section we create a prompt that generates search terms for our Pinecone database. Remember, we need to search Pinecone for information relevant to the query that came in at the beginning of the automation, and in this scenario the question is "I want to create an article on how to start an AI automation school community." So the first thing we do is ask ChatGPT to create some search terms we can use in Pinecone, terms that will pull back the data we want so we can then send it to ChatGPT, or Claude, to actually write the article (in this case I'm using both ChatGPT and Claude just to see how the output differs). So here we come up with the search terms, and if I open this up and look at the result, they are: "AI automation community building", "starting AI school", "creating online AI community", and "AI automation education platforms". These are the search terms ChatGPT came up with to help us query our own database, based on the topic we want to write about.
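That search-term step is just a small prompt to the chat completions API. Here's a rough equivalent in Python; the model name and the exact prompt wording are my assumptions rather than the video's actual prompt.

```python
# Ask the LLM to turn a writing topic into short search phrases for the vector database.
from openai import OpenAI

client = OpenAI()
topic = "I want to create an article on how to start an AI automation school community."

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": (
            "Give me 4 short search phrases, one per line, that I can use to "
            f"look up background material for this request: {topic}"
        ),
    }],
)

search_terms = [line.strip() for line in resp.choices[0].message.content.splitlines() if line.strip()]
print(search_terms)   # e.g. ["AI automation community building", "starting AI school", ...]
```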
Now, in this next module we're creating embeddings again, and again we're making an API call because there isn't a built-in module, just like we built embeddings for what we store in the database. To actually search the database we need these vectors too. So coming back to this module: we take the search terms, put them in right here, and it outputs a set of values in the body; open up the data and here are those embeddings. We have those numbers again, except this time they'll be used to search instead of to store. Then we simply go to the Pinecone database and pass in those vectors from the previous step, which represent our search terms. We search Pinecone with these values in the same way we used these values to store the data. So even without understanding how vectors work under the hood, you can see that we save to and search the Pinecone database for the data we actually want, the transcription text, but we do it through these embeddings, the vector numbers, which let us store and search our data in a way that quickly surfaces the most relevant information for any given query.
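Here's what that search step might look like in Python: embed each search term, query Pinecone with each vector, and merge the matches by ID so the same chunk isn't counted twice. The index name and the "content" metadata field are the same assumptions as in the earlier sketches.

```python
# Embed each search term, query Pinecone with it, and merge the matching chunks.
import os
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()
index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index("videos")

def retrieve(search_terms: list[str], top_k: int = 5) -> dict[str, str]:
    relevant: dict[str, str] = {}                  # id -> chunk text, de-duplicated
    for term in search_terms:
        vector = openai_client.embeddings.create(
            model="text-embedding-3-small", input=term
        ).data[0].embedding
        result = index.query(vector=vector, top_k=top_k, include_metadata=True)
        for match in result.matches:
            relevant[match.id] = match.metadata["content"]
    return relevant

chunks = retrieve(["AI automation community building", "starting AI school"])
print(f"{len(chunks)} relevant chunks found")
```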
So in this situation, since we're searching for how to start an AI automation school, it scans all of our values for relevant content and returns only these rows. Now, out of our massive knowledge base, we have only the rows and data we need to actually write the article and nothing more. So instead of trying to send the entire database to ChatGPT to write the article, which isn't really feasible, we pull just the most relevant information from Pinecone and send that into ChatGPT or Claude. You can see here, if we open this up, quite a few different responses that relate to our query, and if we open the metadata we can see the content, and there's the chunk. So we use the vector numbers to search and to store, but what ultimately comes back is the relevant text, which we group together and finally pass into ChatGPT to actually write the article.
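The final write-up step is just one more chat call with the retrieved chunks pasted in as context. A minimal sketch follows (OpenAI only; the Claude call through Anthropic's SDK is analogous), with the prompt wording as my assumption.

```python
# Write the article from the retrieved chunks only, instead of the whole database.
from openai import OpenAI

client = OpenAI()

def write_article(topic: str, chunks: dict[str, str]) -> str:
    context = "\n\n---\n\n".join(chunks.values())      # the relevant transcript pieces
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Write using only the knowledge base excerpts provided."},
            {"role": "user",
             "content": f"Knowledge base excerpts:\n{context}\n\nWrite an article: {topic}"},
        ],
    )
    return resp.choices[0].message.content

article = write_article("how to start an AI automation school community",
                        {"id-1": "...chunk text retrieved from Pinecone..."})
print(article[:200])
```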
Here we're generating that article with ChatGPT, and here we're generating it with Claude, just to see the difference. Ultimately we can take a look and see an article that's actually written from our knowledge base. It isn't pulled from the general internet; it's drawn from real expert knowledge that lives on my YouTube channel, in my posts in school, and in the classroom and all of the videos I post there. And again, looking at this system, you could easily have people asking questions on a platform like Slack. Here's an example I had set up before: you've got a channel, and when somebody asks a question it prompts the assistant, which triggers this automation. We create the search terms, create the embeddings, run the actual search, bring all of that data together for a final query to one of our favorite LLMs, and then the output, whether it comes from ChatGPT or Claude or both, gets mapped back into a Slack response, just like this No-Code Architects assistant responding to my question with a specific answer. So hopefully you can see that, while there may be a few new things to learn, it's really just two simple automations that take care of this entire part of the RAG system. As a recap: this automation here is the data connector for this part of the system, and this automation here is what takes the data from the vector database and sends it to an LLM like ChatGPT or Claude for final processing
and then returns it back to the user. And again, the hardest part of this entire process, and of the process I'll cover shortly, is the data connector: the logic and discipline that keep the data between one source and another intact and trusted. All right, now that we've covered the part of the RAG system that deals with the databases and the vector database, let's talk about what is often the more complicated issue: going out to all the various systems where our content might live and building the data connectors that sync it back to Airtable. The same way we maintain the integrity of our data going from Airtable into the vector database, we want to do the same with our external data sources. We want to grab all of our YouTube data, we want the new records and the updates, we want no duplicates and no gaps in the data (we don't want to be missing a YouTube video), and we want it to be cost efficient. If you're syncing data often, say every minute or every five minutes, the sync needs to be efficient, or you'll pay a lot of money just to move data back and forth. So the automations and the way we sync data with our data connector need to be efficient, and the whole thing also needs to be fault tolerant, so that if
there's an error, our database doesn't get corrupted. When it comes to getting data into our database, there are really four main ways that happens. First, we reach out to an official API and build our data connector, which, again, isn't the same thing in every situation; it's a concept, and you develop the automations, or the code, or whatever it needs to be, so that all of those criteria hold. You want to isolate all of that logic into one piece that works with the API to extract the data and keep things in sync. Official APIs are usually the easiest, because they're documented and designed to connect to a specific service, so building a data connector against one is usually pretty straightforward. Then there are hidden APIs, which are still APIs, they're just not publicly documented. For instance, Skool has an API: there's a way to download all of the course content, get access to the Loom videos, and from the Looms pull the transcriptions. Here you can see some of my Skool courses and the transcript from one of those videos. So it's possible, but it's not documented, which means you have to figure it out yourself, and because it's undocumented the data collector often has to be a bit more robust to handle situations you wouldn't face with a normal API. So there's the official API, the hidden API, and then there are situations where you simply have to scrape the data. That's where you might use a tool like Axiom, which can hook into a browser and extract things from the page when there isn't even a hidden API: it just goes to the page, looks at it, and grabs the text. In a situation like that, there's some website here and a data collector that makes sure that scraping process is done properly.
In a lot of cases, when you're scraping data, the data collector has to be even more robust to handle all the different situations. For instance, if you build an automation that scrapes data from a website and the site moves some text from here to over here, that can break your automation, and the connector needs some way to recover from that. It'll be a case-by-case thing, but you still have to make sure you get new records and updated records, remove duplicates, leave no data gaps, keep it cost efficient, and keep it fault tolerant. So obviously, working with a direct API or even a hidden API is much better than working with an automation that relies on scraping. And then, of course, there's the good old manual process, where somebody just types the data into the database directly. There are major drawbacks to those last two approaches, so as much as you can, use official APIs or hidden APIs. Finding hidden APIs is easier than you think: you simply use the developer tools in your browser to monitor the network traffic between your computer and the remote server, and from that research you can find the APIs the website is using to communicate with your browser and put them to work in your data collector.
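Once you've spotted a request in the network tab, replaying it is usually only a few lines: copy the URL and any auth cookie or header from the developer tools and issue the same call yourself. Everything below, the URL, the cookie name, and the response fields, is a made-up placeholder to show the pattern, not a real or documented endpoint.

```python
# Replay a request discovered in the browser's network tab. URL, cookie, and
# response fields below are placeholders, not a documented API.
import requests

session = requests.Session()
session.cookies.set("auth_token", "PASTE_THE_COOKIE_VALUE_FROM_DEVTOOLS_HERE")

resp = session.get(
    "https://example.com/api/courses",          # hypothetical endpoint seen in devtools
    headers={"User-Agent": "Mozilla/5.0"},      # mimic the browser that made the call
    timeout=30,
)
resp.raise_for_status()

for course in resp.json().get("courses", []):   # field name is an assumption
    print(course.get("title"))
```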
Now, if you're enjoying this content, make sure to like and subscribe to the channel; it tells me what type of content you want more of. So now let's talk about this process
in more detail, where we grab all of the YouTube videos from my YouTube channel. We use Apify as the API and Make to build our data collector, which has to meet all of those same standards, and the result ultimately syncs back into our Airtable database. The automation that powers this is very simple: this one automation triggers an API call to Apify, using one of their actors, and when the actor is done it triggers this second automation. Looking back at our requirements, the way we track new and updated videos from YouTube in this collector is simply to fetch all of the videos every time. Because there are only around 160 videos, pulling every video on every run isn't an issue, and it obviously captures all of the new and updated videos. So this call responds with all of the videos, and the rest of the automation works by downloading the data we received from that module, so that we can process as much of it as possible inside Airtable automations. By downloading everything that came back from Apify into a file stored in Airtable, like this, instead of processing all of that data in Make, which would be hundreds of operations, we can handle it with a single Airtable automation, and that's how we keep this cost efficient. We find duplicates by using the URL and making sure there's only one row per unique YouTube URL, so
we always check whether that URL is already there before we add a new row; if it is, we just update the existing row, otherwise we add a new one. We take care of gaps in the data by always requesting all of the videos. In other scenarios you'd need to modify that approach, maybe using some sort of timestamp like a last-updated field, but whatever the situation, you'll have to develop a data connector strategy that makes sure all of these things stay true.
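In code form, that update-or-insert check against Airtable's REST API looks roughly like this. The base ID, table name, and field names ("URL", "Title") are placeholders for whatever your own Airtable base uses.

```python
# Update-or-insert an Airtable row keyed on the video URL, via Airtable's REST API.
# Base ID, table name, and field names are placeholders for your own base.
import os
import requests

API = "https://api.airtable.com/v0/appXXXXXXXXXXXXXX/Videos"
HEADERS = {"Authorization": f"Bearer {os.environ['AIRTABLE_TOKEN']}"}

def upsert_video(url: str, fields: dict) -> None:
    # Look for an existing row with this URL (the duplicate check).
    params = {"filterByFormula": f"{{URL}} = '{url}'"}
    existing = requests.get(API, headers=HEADERS, params=params, timeout=30).json()["records"]

    if existing:
        record_id = existing[0]["id"]
        requests.patch(f"{API}/{record_id}", headers=HEADERS,
                       json={"fields": fields}, timeout=30).raise_for_status()
    else:
        requests.post(API, headers=HEADERS,
                      json={"fields": {"URL": url, **fields}}, timeout=30).raise_for_status()

upsert_video("https://youtu.be/abc", {"Title": "Intro to RAG"})
```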
The reason I call this out and isolate it this way is that it helps you logically separate your system from these very specific data connectors, and it gives you a checklist of things to think through when you connect to other APIs so that your data ends up in good shape. Then, once we get all of that data from Apify, which again could be hundreds of rows, instead of looping through those hundreds of rows here, we spend three operations: we drop the file with all of the data here, and our Airtable automations process it from there, spending only a single Airtable automation run. Now, if you want access to this diagram, the Airtable database, and the four Make automations that let you save data into your Pinecone database, retrieve and process data from it, and sync data between Apify, YouTube, and your Airtable database, make sure to jump into the No-Code Architects Community. It's an engaged group: you can ask any question and get support, there are calls on the calendar almost every single day, you get access to a Make and Airtable course and a bunch of other cool automations you can build, and a whole lot more. I hope to see you there, but either way, I hope you enjoyed this video and what it can do for your business in 2025, and I'll see you in the next one.