Scrape Any Website for FREE Using DeepSeek & Crawl4AI

aiwithbrandon
🤖 Download the full source code here: https://brandonhancock.io/deepseek-scraper Don’t forget to ...
Video Transcript:
Hey guys, in today's video I'm going to show you how you can scrape anything completely for free using DeepSeek, Groq, and Crawl4AI. After talking to dozens of AI developers and businesses, I can confidently say that web scraping is one of the most in-demand skills companies are hiring for, so you definitely don't want to miss anything we cover in today's video. To help you master web scraping, we're going to build an AI web scraper step by step together: we'll scrape a website just like this one, capture leads, and save them to a file so we can follow up with those potential opportunities at a later date. And because you guys are awesome, I'm giving away all the source code from this video completely for free; link down in the description below. But hey, enough talking, let's dive in. Oh, and real quick: if you're looking for support on your AI projects and want to meet like-minded AI developers, you'll definitely want to check out the free AI developer school I've created for you guys. We have over 4,000 members and weekly free coaching calls, and we'd love for you to be part of the community. But enough of that, let's hop back to the video.

All right guys, let's quickly cover the three tools we need to build an AI web scraper. This is just to set up a good foundation before we hop over to the code in the next section, where you'll see all of this in action. So let's hop into tool number one.
Tool number one is Crawl4AI. This is an open-source library that makes it super easy to scrape websites, and it's awesome because not only can it scrape a website, it's also set up so it can tag the scraped content and pass it over to an LLM, which can then do anything with it. In our case, you're going to see how we scrape a website and actually pull leads out of it; I'm very excited to show you this in action. They also have a ton of examples that make the tool easy to understand, but don't worry, I'm going to break down all the important parts you need to know over in the code section in just a minute.

Tool number two is DeepSeek. DeepSeek has taken over the internet recently, so I'm super excited to show you DeepSeek, specifically the reasoning model DeepSeek R1, in action. A few things that are important to note: it's just about as smart as OpenAI's o1 model, it's insanely fast, and it's insanely cheap to run, roughly 20 times cheaper.

And finally, our third tool is Groq. Groq builds AI chips that are specialized for running AI models; think Llama 3, think DeepSeek. They also have an amazing free tier, which means we can run DeepSeek, a huge model, not only for free but super fast. As you can see in this example, I asked, "Please explain DeepSeek R1 to me like I'm five, and tell me why everyone's going crazy about it." It's super cool because you can see one of the specialties of DeepSeek R1 in action: it has this thinking-out-loud thought process, in an almost human-like fashion ("oh, by the way, I need to do this; oh, I also need to include this"). So it not only thinks through its answer, it then gives you the answer. And Groq runs this insanely fast: we were getting about 275 tokens per second, and it took under 2 seconds to generate all of this. So Groq is going to be our hero when it comes to running DeepSeek completely for free. Now that we've got that out of the way, let's hop over to a quick overview of exactly what we're building, and then we'll start coding the entire project together step by step. All right, let's hop to it.
All right guys, it's about time to hop into the code, but first I want to give you an example scenario so this all makes sense before we start coding. In this case, we're going to pretend we've just been hired by a wedding photographer. He just moved into town, and he's trying to get more clients to grow his business. How does he want to do that? He wants to contact more wedding venues and say, "Hey, I offer these services and I'd be happy to help." So what he needs from us is to grab wedding venue information from common wedding venue websites. In our case, that means we say, "Yes, I will happily build you a scraper" (for a price, obviously), and here's exactly what we'll do: go to whatever area he moved into, head over to a common wedding venue site, and build a web scraper that scrapes all the wedding leads in that area. We'll keep going page after page, scraping this information, and we'll continue until no more wedding venues are found and we have an awesome spreadsheet ready to send over.

Basically, our end result is to generate something that looks just like this, so the photographer knows the name of each venue, where it's located, and how much it costs. We're also going to use a little bit of AI to generate a one-sentence description of each venue, so when he's calling, he knows a bit about what they do and what the space is like, and he can have an intelligent conversation. That's exactly what we're building in the code: we're going to scrape, save everything to a CSV, and eventually upload it to a Google Sheet so we can hand it off to our customer and get paid. So let's go ahead now and hop over to the code.

All right guys, now it's time to actually get our hands dirty and start coding our AI web scraper using our three core tools: DeepSeek, Groq, and Crawl4AI.
I've tried to set up this project to be as simple as possible, so you can just copy and paste it and it will run out of the box. I've also set it up so you can easily tweak a few configuration settings and instantly scrape any other website, and I'm excited for you to see how easy it is to run. Before we dive into the code, though, I want to show you how to set it up on your own. First things first, head down to the install instructions. All we're doing is creating a new environment that has all the dependencies we need to run this project; we're going to use conda for that. Once conda is installed, we activate our environment and then install the necessary dependencies. In our case there's really only one major one, Crawl4AI, which is how we're going to scrape everything. Once everything is installed, the only other thing you need to do is head over to your Groq dashboard and add your API key to the environment file. That's all it takes to set up this project, and then it runs right out of the gate; it's as simple as `python main.py`.
OK cool, now that we've covered the setup, let's look at a quick example of what a crawler looks like, and then we'll expand that simple example into a complete one.

All right guys, here's a quick overview of a simple crawler, just so you understand all the core components before we go off and scrape an entire wedding venue site. Inside Crawl4AI there are a few core components you need to know. First is the browser config. The browser config is how we set up which browser we want to pop up: in it, you get to pick things like "I want to open a Chrome browser," how big the window should be, and whether it runs headless. Headless false means you actually get to watch what the browser is doing; I kind of like that because I can see which pages we're scraping, but if you just want it to run in the background, set headless to true. You can set a bunch of other options too, but that's browser config in a nutshell.

The other thing we need is the crawler run config. As you can see over here, we have the browser config and then the crawler config: once you've set up the browser, you pick what you actually want the crawl to do. This is where we get to do either simple or complex configuration. Because Crawl4AI is made for LLMs, you can say "I want to extract information using a DeepSeek model, and I want to pull out this specific information" (you'll see this in action in a minute), and you get all sorts of options for how the page is loaded. For example, if your page is complex and doesn't render its content for a second, you can pass in some JavaScript and say, "Please don't scrape the website until this element loads," or you can say, "I'd like to take a screenshot." There are all sorts of awesome parameters to play with; I just want to show you what's possible, and you'll see in the code that I've added links to each of these, so as we're coding you can come back to the documentation and review all the different options.

So what do we do next? Now that we have a browser that's going to pop up and we've specified our run configuration, we get to kick off a crawler. In our case, that means saying, "All right, I'd like to scrape a website: open up this browser, go scrape this URL, and as you're scraping it, use this run configuration right here." What this does is scrape the entire website and hand you the result as a nice markdown file you can use; LLMs love markdown, so this is a really nice way to save the results and pass them over to another LLM to do something with later.
So that is Crawl4AI in a nutshell. Now that we have that out of the way, let's hop over to our code, where you'll see how we take this example and expand on it: scrape an entire wedding venue site, get all the information we need, save it off to a really nice spreadsheet, and add in DeepSeek R1. Let's hop over to our code.
All right guys, now it's time for the fun part, where we look through the code together line by line so we can understand exactly how we tie together DeepSeek, Groq, and Crawl4AI to scrape leads for all these different wedding venues. So far in our code there's really only one main function we care about: crawl all the venues. Here's what's going on under the hood at a high level, tied back to our ultimate goal: we're trying to crawl this entire wedding venue site and grab all the leads for a city. That's all. At a high level (and we'll dive into this part by part), we want to keep scraping each page until we run out of pages to scrape. Right now we're on page one, and eventually, once we get to something like page ten, we'll hit a page that says "no more results found." So the goal is: keep scraping until no more scraping can be done.
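As a sketch, that "keep going until no more results" goal is just a while loop. The helper name fetch_and_process_page mirrors the function we're about to walk through; its exact signature in the repo is an assumption:

```python
import asyncio

from crawl4ai import AsyncWebCrawler, BrowserConfig

async def crawl_venues():
    browser_config = BrowserConfig(headless=False, verbose=True)
    all_venues = []
    page_number = 1
    async with AsyncWebCrawler(config=browser_config) as crawler:
        while True:
            # fetch_and_process_page scrapes one page and reports whether
            # the site said "No Results Found" (signature is an assumption)
            venues, no_results_found = await fetch_and_process_page(crawler, page_number)
            if no_results_found:
                break  # ran out of pages, stop crawling
            all_venues.extend(venues)
            page_number += 1
    return all_venues
```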
So what do we need to set up to make this happen? At a high level, we need a crawler that scrapes each of those pages, so let's walk through everything real quick. First things first, as you remember from the quick example, we set up our browser config: "Hey, I'd like a Chrome browser to pop up a window and scrape the page, and I want to see what's going on, including all the logs, as we scrape each page." OK, makes sense.

Now we dive into something new that you haven't seen before, but I want to explain what it means: we need to set up our LLM strategy. All this means is: how do we want the LLM to transform the scraped information into valuable information? In our case, we want to scrape each of these wedding venues, one venue model after another, and keep going until there's nothing left to scrape. The key phrase there was venue model: you can see we've created a model inside our code base. In the models folder we have a venue file, and all it says is: for each venue, I want the name, location, and the other information I need in order to hand it over to the photographer.
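In code, that venue file is just a Pydantic model. Here's a sketch, under the assumption that the fields match the columns shown in the final sheet:

```python
# models/venue.py: the data structure each scraped lead is converted into.
# Field names here are assumptions based on the columns shown in the video.
from pydantic import BaseModel

class Venue(BaseModel):
    """One wedding venue lead for the photographer."""
    name: str
    location: str
    price: str
    capacity: str
    rating: float
    reviews: int
    description: str  # the one-sentence AI-generated summary
```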
OK, now we've talked about what information we're trying to get, but what does this LLM extraction strategy actually mean? Here's how I like to think about it: as we scrape a website, we get back a ton of raw data, and we need to set up an LLM and say, "Hey, look at all this data I just gave you and do something with it." In our case we say: "LLM, extract all the wedding venues out of this raw data, and for each one get these pieces of information: the name, location, price, and the rest of the raw details, plus a one-sentence description that you generate." You'll notice this information ties directly to our venue model; it's a one-to-one relationship between what we tell our LLM to do and the final model we want to generate. So hopefully that makes sense for our instructions and our schema.

Now we get to pick which LLM model should do all this processing for us, and this is where Groq and DeepSeek come in. In our case we say: I want to use Groq, and specifically the DeepSeek model on Groq, to perform these instructions and convert everything over to this nice venue schema. And a quick tip: you can change this. If you want another free option, you could use Ollama and run Llama 3.1 on a powerful local computer, or if you prefer the paid models, you could easily switch over to GPT-4o, GPT-4o mini, or o3-mini, which just came out very recently.
All right, quick recap: we set up our browser to go scrape everything, and as we scrape, we tell our LLM exactly what information to pull out so it can produce all these nice venue objects. Now we get to the fun part: the base is set up, so it's time to start scraping. In our case, we create one crawler, the crawler that walks through and scrapes all the different web pages, and we tell it to open up the Chrome browser; we're just setting things up to scrape. Once we have our crawler created, we dive into the core fetch-and-process-page function. All it does is find wedding venues for us and let us know when it doesn't find any more results.
So let's dive in and see how this all ties together. I've tried to add as many comments as possible so you can understand exactly what data is moving around inside the code; that way, when you're off building your own web scraper, you can just copy this code and tweak it. All right, line by line. First, what are we trying to do? We're scraping a specific URL, and that URL is nothing more than a base URL plus the page number we want to scrape. We actually have a nice config file set up with the core pieces of information, so we just say, "I want to scrape this base URL" (you can tweak it to whatever URL you want), and we start at page number one.
Then, the first thing we do is scrape the page and check whether there were no results. Let me show you: we pass in the crawler and the URL and say, "Please go scrape this page at this URL, and as you do, don't cache anything; just fetch it fresh." All we're checking is whether the words "No Results Found" appear anywhere on the page. So: scrape the page, and if everything came back successfully, check for "No Results Found," because if it's there, we're on a late page (say, page ten or beyond) and there's nothing left to scrape. Hopefully that makes sense. If that's the case, we return true, which lets us cut out early: no results, nothing found, return and quit. An early return statement like this is a nice way to keep your code clean; that's all we're doing.
Now, if there is information on the page and we didn't hit the "No Results Found" case, we scrape the page again, but this time we do a more detailed scrape where we actually start pulling all the wedding venues out of the page. Here's how it works: we crawl the same page, but this time we add two additional pieces of information, the extraction strategy and the CSS selector. Let me walk you through what's happening.

The CSS selector just says, "Instead of processing thousands of lines of HTML, I only want you to process these specific elements on the page." So let's dive in and see which elements we want. If I make this a little bigger, go into inspect mode on the website, and click this little cursor, I can hover over the elements, and when I click one I can see all the classes and styles applied to that specific element. Because, like I said from the get-go, all I really want to scrape is this information, this information, and so on. How do I target that? If I look at the code over here, each one of these venue cards has a class that says "info-container." So I really just want to scrape all the info containers on this wedding venue site, and that's exactly what we're doing: if you hop over to our CSS selector, it says, "Please scrape any element whose class includes info-container." That's all it took to set this up. So, tying it back: instead of scraping the whole website, we grab just the info containers, and as we look at them, we use the LLM strategy we set up a little bit ago, which says, "Use DeepSeek to extract all the venue information." Hopefully this is all coming together, and you can see how we're combining general scraping with AI scraping.
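In Crawl4AI terms, that second, detailed pass is the same arun call with two extra settings; this sketch sits inside the fetch-and-process-page function, and llm_strategy and CSS_SELECTOR come from the earlier sketches:

```python
from crawl4ai import CacheMode, CrawlerRunConfig

result = await crawler.arun(
    url=url,
    config=CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,       # always fetch the live page
        extraction_strategy=llm_strategy,  # DeepSeek pulls out the venues
        css_selector=CSS_SELECTOR,         # only process the info containers
    ),
)
```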
Now we go, "Great, I've scraped the website," and we can start looking at the results in the extracted content. This is just a JSON string containing all the extracted venues. We load that JSON string and end up with extracted data, which is nothing more than a list of these models right here: a list of venues. Pretty cool, if you ask me.
That's literally the entire core loop; those are all the core components tied together. So what I'd like to do now is run the code so you can see exactly what gets triggered, and I'll explain what's happening along the way so that all the code we just talked about makes sense.

OK, first we open up our terminal, and you need to make sure you're in your conda environment; you can see mine says deepseek-crawler, the conda environment we created earlier. Now I can run `python main.py`, and this kicks off our code. When we run it, it pops open a browser window (because we set headless equals false) and starts scraping every one of the venue pages for the URL we configured. I'll zoom out so you can see exactly what's happening. As we scrape pages, you'll see it move over to page two, then page three, and so on. Meanwhile, over in our terminal, because we set verbose equals true, we're getting a lot of logs. You can see that as we scrape the website it's making a bunch of calls to DeepSeek; specifically it's saying, "Groq, please run the DeepSeek model on all this data and let me know what wedding venues you find." Along the way we get logs like, "On wedding page three I scraped this many leads." On page three we got 28 leads; now we're on page four. What's pretty cool is that it's doing exactly what we told it to: it's processing a wedding venue, the name (Still Water Pond, Temple, Georgia, probably near Atlanta), the capacity, and the description based on what's over here, and it will keep scraping until it's done. We're almost at the end, though; I think there are only six pages. Once it's finished, you'll see it say, "Cool, I'm at the end, there's nothing else for me to scrape," give us a nice final blurb that it's done, report how many tokens it used, and let us know it saved everything to a CSV file. We're on the last page now; if I scroll all the way down to the bottom, you can see we're on page six and there isn't another one, so it's about to quit and say it's done. We extracted the final venues, and you can see it closed the browser because it's done crawling. Then it shows us the token usage: all in all we used about 43,000 tokens, so we're under the limit (I think it's around 60,000 tokens per minute on Groq), and you can see per page what we ended up scraping; we scraped a lot.
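Those last two steps, writing the CSV and printing the token report, boil down to a sketch like this. The file name and helper are assumptions; show_usage() is the extraction strategy's built-in usage report:

```python
import csv

from models.venue import Venue

def save_venues_to_csv(venues, path="complete_venues.csv"):
    """Write the scraped Venue models out as a CSV file (names assumed)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(Venue.model_fields.keys()))
        writer.writeheader()
        for venue in venues:
            writer.writerow(venue.model_dump())

save_venues_to_csv(all_venues)
llm_strategy.show_usage()  # prints per-request and total token usage
```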
Next we can look at our completed venue sheet: we scraped so much information in no time at all, and we could easily throw in a different URL and boom, we're off scraping another website. All around, this is everything we needed, and it's everything we need to pass over to Google Sheets. So let me quickly show you how to turn this new CSV into a nice table over in Google Sheets, and then we'll be pretty much done.

All right guys, I hopped over to Google Sheets, and this is where we import the CSV file we just created in our code. All we have to do is click the import button inside Google Sheets; it asks us what we're uploading, so we can just drag and drop our CSV with all the information we scraped. We import it, and once it's done it says, "All right, here's everything you just scraped." What's nice is that Google Sheets then offers to convert it to a table, so now we have a beautiful table we can instantly share with our client. They can come in and start filtering by price, capacity, ratings, and reviews, and you can see it also has that nice AI-generated description: a brief summary of what each wedding venue is about and what it looks like. All around, this is awesome.
I hope you guys see how powerful this is, and I'm so excited for you to go off and start scraping your own websites for your own clients using this AI web scraper with DeepSeek, Groq, and Crawl4AI. And that's a wrap for this video, guys. I hope y'all loved learning how to build an AI web scraper, and I cannot wait to see what you go off and build next. Two quick reminders: all the source code from this video is completely free, down in the description below, and if you're looking for support and want to meet like-minded AI developers, I have that free school community for you guys; once again, more information down in the description. I also have a ton of other AI-related content right here on this channel, everything from CrewAI to LangChain and more, so I definitely recommend checking out whatever video pops up next; I know you're going to love it. But until next time, y'all are awesome, and I can't wait to see you in the next video. See you!