How I use SQL as a Data Analyst

866.57k views3026 WordsCopy TextShare
Luke Barousse
SQL for Data Science 👉🏼 https://lukeb.co/SQLdataScience Coursera Plus (7-day free trial) 👉🏼 http...
Video Transcript:
every day we generate 2.5 quintillion bytes of data and that's the equivalent of every person in the world filling up an entire excel file with data every day and that's where sql comes in developers use this tool to store a lot of the data that we're generating and then data analyst data scientists and even data engineers go in and use sql in order to access and analyze this data so let's dive in today to better understand how i use sql for my job as a data analyst what is sql and its importance in working with
other data science tools what are the most popular sql options and by the end we'll even have you up and running with some basic sql queries what up dead nerds i'm luke a data analyst and my channel is all about tech and skills for data science and without a doubt sql is the most important tool by those that work in the field of data science what better way to show this than by sharing how i use sql my job and i find that it's used in two main ways first is ad hoc analysis and the
second is data sharing ad hoc analysis of business data is typically performed to answer one-off questions for example let's say my co-worker comes up to me and asks the following hey what were sales last quarter from here i can use sql to dive into the data investigate further since i don't need to do any fancy reporting sql will be fine for this now if my co-worker comes back and says this hey sales were really low last quarter can you look into why this is now it could potentially stay in sql and do this type of
analysis but i found i'm actually more comfortable using programming languages for this so i could use something like python or r to dive further and investigate the data so moving on to that second aspect of data sharing what happens now whenever you have a stakeholder that wants to dive into the data themselves i do expect data nerds to have the skill of sql but non-data nerds i don't expect this as much so you may have questions like this hey bro can i get those sales numbers but in excel in this case i could use a
spreadsheet software such as excel to connect directly to a database with some sort of sql code and thus have a live connection to the data within the spreadsheet what ends up they come back again hopefully the final time and ask the following bro your sales analysis was spot on do you mind building a dashboard for the entire sales team so that way everybody can monitor this once again sql can be used in visualization tools such as power bi and tableau to pull the data real time and display it in a dashboard for my co-workers to
admire and this is why sql is in my opinion the most powerful tool to know as a data analyst not only can i use this to access and analyze data i can take it a step further and put it into other data analytical tools to thus extract the data and provide it to my stakeholders so now let's dive deeper into understanding what sql actually is and this stands for structured query language and it's most commonly pronounced sql sql sequence of sql sql sql is the language used to query a database basically think of it like
a common language between your computer and a database so i frequently get asked how do i install sql on my computer now just to be clear sql is a language so you don't necessarily install sql on your computer instead i find that there's two main things that you need to run sql one is an editor and two is a database let's talk about database first databases are an organized collection of data so large companies use these databases in order to store their vast amounts of information typically programmers and even data engineers are utilized in order
to build systems around collecting this data since we're talking about large amounts of information we typically don't want to use things like text files or excel files because they have pretty small data limitations databases have much larger data limitations so once these databases are designed and set up by these programmers and data engineers as a data analyst i can then go in and use sql in order to query these databases now the structure of the language or syntax of sql to query a database is actually quite simple so let's say youtube has a database full
of information inside this database there are different tables for every youtube channel that include their list of subscribers we'll say there's a table in there for this channel and it's titled data nerd channel with an entry for each subscriber of my channel each one of those rows in the table has information like name channel id and the number of comments they've given now i want to get a list of all my subscribers i then can use the following syntax which has a select statement and a from statement so select is used to identify the column
of interest that we want to grab and from is used to identify the table we want to connect to when i send this query to the database it then will provide me back a list of all my subscribers now sql can be used in a lot more different ways for example we can actually use it to go in further and analyze the data such in this case where we can add on a where statement to filter our data further based on how many users commented more than 100 times we can also use sql to insert
or store more information in a database for this we can identify the table and columns we want to insert into and then provide the values to insert into this table once i run this the table will be updated and then finally we can use sql to update record entries for this let's say we have a new comment on one of my videos once again we can identify the table we want to update the column we want to set a new value for and then a where statement to select what to filter on for that entry
once i run this the table will be updated so now we understand how to query a database and generally what a database is but what database do you use well stack overflow conducted a survey last year that asked respondents about what software tools they used for their job when we look at the most popular database options we can see that there are a plethora of options to choose from to make things a little complicated there's actually two types of databases on this survey there are relational databases so sql databases and non-relational databases or nosql databases
i'd focus on the relational databases as they typically store data in a tabular form which is more common for our job but what about those nosql databases so nosql actually stands for not only sql meaning that you can not only store this data in a tabular form it also has a manner for you to store it in an unstructured manner which is actually more efficient at handling very large amounts of data so for aspiring data analysts i think it's more important that you're aware of this that way while you progress in your career you may
potentially come in contact with these and then you can learn about it then so then what relational database should you learn or use well quite honestly i don't think it matters as all of these relational databases use the same sql syntax and you can transfer your skills between them but interesting enough the most popular options on that stack overflow list are also free and open source so i would say start with any one of these personally i use postgres and also sql lite for all my consulting projects the other options are closed source and not
always free although they are very popular especially being used in large corporations of these options i've used primarily sql server for all of my work projects this is also great because this choice has a free option available having worked with a lot of these different databases i can say it's with pretty relative ease that you can switch between them the hardest part is learning the basics of the syntax for sql so all this actually directly relates to the sponsor of this video coursera my favorite training for this on coursera is the sequel for data science
specialization which is perfect for those that have no prior experience with coding that want to master sql i even think it's great for those that have already obtained maybe the google data analytics certificate because now that you have these basics of sql you can use this specialization to dive further into it the course focuses on sql light and what i like about the platform that coursera provides is that they get you up and running with the same tools they'll be using in real life without having to install any software for example i use vs code
for all my coding in both python and sql i also like how the specializations have a capstone project so you can then take all those skills that you've learned over the courses and apply them into a final deliverable which you then can use to showcase on your resume as experience so thanks again to corsair for sponsoring this video so we talked about already sql syntax also about databases themselves but where do we keep these databases that we're trying to query well the first option is that you can keep the database locally running on your computer
and this is a great option especially if you're learning or you're new to sql it very much simplifies things now sometimes i'm even doing data analytic projects that need access to a remote database and sometimes the owners don't want you have access to it so in that case i can download a copy of that database and have it running locally on my computer i do this most frequently with postgres databases so because of this i have the postgres application installed on my computer so i can run these databases locally this will be the same for
any other popular database now running a database locally on a computer is great option if you're the only person accessing it once you get into other people accessing it you need to look at other options so the two popular options for this or some sort of server or a cloud provider for the server option many large companies have their own where they like to store their own data typically your it department will handle installing and setting up all the dependencies for the database and then grant you access this option is called on-prem now another popular
option is to use a cloud provider instead looking back at the stack overflow poll the most popular options are from amazon google microsoft and even heroku personally i've used google cloud platform in the past but many other options are just as good especially aws now as an entry-level data analyst i don't think you need to learn or master cloud providers instead what i've seen is typically data engineers or even data scientists are the ones setting up databases inside of some sort of cloud provider however it is important to know that these cloud providers provide some
sort of interface for you to run sql queries for your databases as an example in the google data analytics certificate students use bigquery inside of the google cloud platform to run their sql queries and this actually leads into the next area of where you should be writing and running your sql queries well you could use a cloud platform in order to run your sequels on those databases in the cloud but what happens if you have a different environment or your database isn't in the cloud all the major relational databases actually have their own editors for
them i've used postgres before and for this i've used their sql management software of pg admin my sql also has a similar software called mysql workbench to run their databases and then for my work with sql server i've even used microsoft's sql server management studio one note is that this tool is specifically only available for windows users but what happens if you're like me and you're always switching between different databases for this i like to use vs code vs code has many free extensions in order to support your database of choice i'm currently checking out
this extension called mysql which contrary to its name actually supports a lot of the different popular databases another popular option to check out is d beaver so i haven't used this one personally but it does seem like a good option as recommended by other data analysts just like vs code it's free and supports a variety of different databases now there is one option that i don't suggest learning for those new to data analytics and that is microsoft access it's sort of a weird in between of a relational database and also a graphical user interface that
you can build apps we used this app while i was in the navy in order to store our data and the data nerds that worked with this we're not really a fan of it now the reason why i'm recommending not learning this as a new data analyst is because i feel microsoft is actually signaling that they're going to be shutting this product down so sorry to anyone that uses this product currently but personally i feel microsoft has way better options available in order to solve these types of problems so getting back into a real world
example of how i use sql and my job as a data analyst so for one of my projects we needed to investigate the suppliers that we received a lot of our goods for them for the company i worked for at the time the problem is that the data was spread out across many tables within our database and required extensive cleaning this was a large set of data so a teammate and i worked together in order to build a sql query to complying all these different tables that we needed and clean up the data this portion
of the project ended up taking a few weeks for us to actually develop these few hundred lines of sql in order to query this data now that we have this cleaned up data set my boss now was like hey i need this data set in a more presentable form so that way my teammates can use it and also her boss could access it so that's where something like power bi comes in in this case i can use the tool within it of power query which is also in excel to actually go in and execute the
sql code we had written and this is actually pretty neat because this allows the power bi dashboard that we're going to build in order to be connected real time to the data source from here i developed an end-to-end solution that allowed the stakeholders to now go to a dashboard and access the data that they were curious about now to be clear this sql query functionality is not specific just to power bi other popular tools like tableau and google data studio also have this functionality now not related to this project but i've also used other tools
such as programming languages to utilize sql as well so as i've mentioned before i'm a big fan of python and i would say actually i'm more comfortable with python than i am sql so a lot of times i like to get the data and then import it and call or collect it with python and then from there use python's tools and libraries such as pandas to clean up the data maybe matplotlib to plot it and even maybe something like sklearn to perform some regression analysis now all these things with python you can also do with
other popular languages such as r also so i share all these examples not for you to think that you have to learn all these different tools but instead i want you to understand the power of sql it's such a powerful language and has such a powerful use that other tools are taking advantage of it because of this i feel that sql is the most powerful tool to learn as a data analyst as always if you got value out of this video smash that like button and with that i'll see you in the next one oh
wait we got a package to unwrap i'll be right back i just got a package from two of my top subscribers open the bigger box first okay so i gotta open the bigger box first oh my goodness what is this oh my gosh why am i recording this um okay so let's all right uh this is a little bit obscene this looks like a movie poster for data scientists first data analyst [Laughter] postcards to go along with the uh movie poster as well so awesome thank you dave and also thank you alex for this this
was uh this is awesome and it has this quote in order to trap him he must become him i don't even know what that means
Copyright © 2025. Made with ♥ in London by YTScribe.com