hey guys in today's video we'll be doing on another portfolio project so this time it's SQL and powerbi so I have a data set here that we use that we will clean and analyze using my SQL workbench and then we'll visualize our results using powerbi so the first thing you're going to do is create our database so you'll just do create database projects you can call yours whatever you like mine will be called projects then you run that and then here on the left side you just refresh and it appears so after creating your database
you don't yet have a table where our data will be stored so you'll expand projects and then you right click on tables and select table data import wizard so you'll browse to the place where you've stored your data set so the data set will be linked in the description so you can either download it from my GitHub account or you can download it from my telegram group so after selecting your data set for this section you have to make sure the database selected is Project or whatever you've named yours and then here on human resources
we replace that with HR it's just short and easier to work with and and then we'll click on next so in this section this is now it shows you like a little sneak peek of the data set so we are not going to change anything because if you try to change the data types you might end up not importing all of your data so the data types you have to be the exact same as those that are in your CSV file so you'll just leave it as is and select next then this part will select
next and it will start importing so this importing process it usually takes a bit of time so as you wait for that to finish importing I'll just take you guys through the data set so that you can understand what it's about the First Column is the ID which I'm assuming is the employee ID then the names the birth date the gender Ras Department job title location the higher date and then the term date the term date is the termination date so it could either be the employee was terminated or fired or the employee quit so
what it basically means is that the employee doesn't work there anymore so that's the termination date then there's the location City and the location state so when you look at this most of these columns they have to be texted except for the dates so that is the birth date the hire date and the term date so when you look at birth dates we can see that the data types are mixed so there are some which are date others which are text and also it's just the formatting is not consistent some have a for slash others
have a hyen so the case is the same with the higher dates so this you're going to change them back to the correct format in my SQL and then you can see the ter dat it has also like missing values and the date is also mixed with the time so this time is not going to really be useful to us in any in our analysis so during our cleaning process you're going to remove it those people whose ter date is missing it means that they are still employees in the company so for those employees whose
ter date has a v it means they've been terminated but when you look at this data set you can see it has a lot of values some of them are in the future like for this first value there's 2029 which is not possible because if you've been fired you are fired like either today or on a date before today so it's not possible for you to have been fired in the future so unless this maybe is like a contract but I'll show you how we'll deal with those values during our analysis this is how it
will look like when it's done so we'll click on next and then it shows you 22,24 records imported and the table HR was created so when you're importing your data set into your database you can also choose to create the table beforehand but I think this is usually I prefer this way better because you'll just leave the data types as is and then you can change them up later when you're doing your data cleaning so you'll click on finish and that's it so the next thing you're going to do is just at least view a
sample of our data before we we start using our database we have to tell my SQL that we want to use that specific database because there are also other databases here in my SQL workbench so you just say use and then the name of your database which is projects and then you run that so it will know any queries you run you are trying to query the project database so after that you're going to view sample of our data so we'll do select all from HR when we looked at the data previously in Exel the
ID was just ID but now that it's been imported into my scare workbench we we able to see how it was really like saved so you can see it has some special characters and then ID so we are going to rename that later and then the birth date you can see they are still like mixed formatting and all that yeah so you can see this is the exact same data we are looking at in the in our in our Excel sheet yeah so the data has been imported successfully so the next thing you're going to
do is actually start our data cleaning so the first thing we going to do is we going to rename our ID column to employee IDE as you can see here though it has been saved is not really convenient if you try to query anything using that column so we going to rename it so we going to use the alter table command so we alter table our table is called HR and then we do change column then this column so I'll just copy it from here on the side then paste it in to make sure I've
captured it correctly and then now the new name for our column it will be employee ID then you give it a data type so we'll do a v car it has I think about 10 so we'll just put 20 just to be on the safe side and then a constraint so it's null and then we'll run that and it will change so yeah it has executed successfully so when you again display the table you're able to see that it has changed to employee ID so for you to reflect on this side you just Refresh on
the left you can see employee ID V 20 yeah and then now the next you're going to do is actually check the data types for all our columns so you can use describe HR and it gives you the exact same output we'll start changing the birth date from text to a date so we have to change the values in that column we can see that the formatting is mixed and SQL is it recognizes it as text which means not all dates have been formatted as a date and you need to change that six is the
month and then 29 is the day and then 1984 is the year so if a value in this column has a for slash we assume it has been saved in that exact format and we convert it using the string to date function so that function we'll basically convert this value into a date from text and then you're going to use the date format function to convert it into the SQL date format which is the year month and day so and then after that you're going now to check for values that have a hyphen if the
date value has a hyphen we assume it is also in the exact same format you're going to write that code so what you have to do is up upate your table so you'll update HR and then you'll set our birth date column equals to so you're going to use a Cas when statement a Cas when statement is basically like an if statement so if a certain condition is true then you do this if it is false then this needs to be done so when the birth dates and then we'll use the like operator so in
the birth dat like this pattern so if the birth date it has a forward slash in it so basically this percentage sign what it means is that the forward slash there are characters before it and there characters after it so if the birth date value has a for slash then we are going to convert it from a string to date so you'll convert it from a string to date and then it takes in the birth date column and also you specify the format in which it has been saved so our birth date has been saved
as the month being fast the day and then the year so you'll do M and then D and then don't forget the forward slash percent why so you have to put it in quotes because it is like it's a string value so birth date so this will convert the birth date from a text to a date and then now to put it in the correct format you're going to use the date format function so we use the date format function and you enclose it around the string to date don't know why that appeared it will
convert it to the format that we want so for now I want it to be in the default Excel format which is the year iph and then the month I and then the day and then we also do the same for those with the hyphen so when birth date like so there's no point to just typing the whole thing again so you can just copy this part and then change the for slash to hens so you paste it and then you change this to pyph so remember inside the string to date function you have to
put it specify the date in the format that it is in in the data and then you change it to the format that you want inside the date format function so we do that so if it is neither of these we just set it to [Music] null and then and yeah so that's it for this so when you run this it will update our dates so it seems there's an error down here it says you're using a safe update mode so apparently we've been restricted from updating this so to get around that this is just
for security reason so to get around that we'll use this code set SQL save updates is equals to Z this means we are turning off that security feature so it is going to allow us to update our data so anytime we are doing our cleaning and want to update anything in our database we have to make sure that SQL safe updates is equals to zero but as soon as you're done with all your updating you need to change it back to one because it is a security feature that's protecting your database so we'll run that
first and then after that now we can run our update code so when you run that it has accepted so we just going to look at our birth date again just to confirm that we've done it right so select birth date from HR yeah so you can see it has successfully converted the birth date for us so all our dates they have been our date our birth date values have been converted from a text to a standard date format so the next thing after that now you're going to change now the data type of the
birth date column itself because the column itself it has still been stored as a text data type and yet it is holding date values so we need to change that so I'll just come up here and then to change that you're going to alter our table which is HR and then you're going to modify the column birth dates then give it the date data type you don't forget the semicolon then you're going to run this so just to confirm that change when I come back up here and select describe HR and run it we can
see our birth date has been converted from a text to a date yeah so that has run successfully so instead of writing this code again you just going to copy and paste because it's the same exact thing you're going to change is the column name so instead of birth date you're going to select the hard dat so we going to replace it in all of them this should work so when you run that let me just bring up the output it has updated successfully so you can just confirm that change by selecting the hard dat
from HR let me just show you how the ter date values look like so when you select the ter date from HR you can see here that the time date it has been stored as a date together with a Time Value so we don't need the time stamp because it's not adding any value in our analysis so you can see it's the date it's in the correct format but now it has been stored along with the time so this is more like a time stamp or date time data type and you don't to change it
from that to a date but now in SQL it has been recognized as a text which means not all values are put in in form of a date some of them are text so we going to change that so to change that we are going to update our table HR and then we'll set the ter date is equals to we use the date function to convert it to date so it will be equal to date and then we use the string to date function to convert it from a text to a date so as usual
we'll put in the term date and then we'll put in the format that it has been stored in so it has been stored as a year and then hyen then percent month then hyphen then percent day so don't forget the quotes otherwise you'll get an error oh and it also has the time so the time aspect will present it as it's let's see it's an hour minute and a second so hour is H then the colum you have to put it exactly how it is in your data then the minute is you would expect being
represented by an M but it's actually represented by an i because m is already being used for months so we cannot again use it for time so and then the second so it will be another percentage and then s and then UTC because that's how it's been stored so you have to write it exactly how it is and then for the ter dat values we update it where the ter date is not null so where there is not an empty date value we have to update that and also it's not an empty string so we
just going to use set here a condition where ter dates is not null and the ter date is not equal to an empty string so an empty string you just basically put quotes in the space so you run that and then you select dat from H the employees who are still working in the company the at date has been set to 0 instead of being blank and then that's for those employees who have been terminated the time aspect has been removed from from the date so now it's just the date that is there we're going
to do as we did earlier we alter our table HR and then we'll modify the column term date and we s it to the date data type so we'll run that so we yeah it has run successfully yeah so we just going to run the describe a chart to confirm that we've converted all our columns to the correct data type so if I bring this up we can see the birth date has been converted to date the higher date is still in text so you're going to check then the term date has been converted to
date so when you go back to where we wrote the values to update the High date so we forgot to alter the higher date to a date so we just copy that then you're going to convert the column itself so here we just put fire dat and then we run that then when you refresh it should be showing correctly so you refresh all yeah and they had it has now been converted into a date column so we can just confirm that yeah so you can see her date has also been converted from text to date
as well as the term date so for the rest of these columns that have the text data type if you want you can convert them into V then specify the number of characters you want but for me it doesn't really make any difference as long as it's text it's the correct data type yeah and one thing I forgot to mention is that in SQL these keywords they don't have to be capitalized because SQL is not case sensitive so even if I used small letters it will still work but we using the uppercase because it's just
best practice so when you're writing code it's usually better when you're using the best practices so the next thing we going to do is find out class in our data so when you display all our columns we need to check for columns like the birth date so the birth date the where we going to go about it is I'm going to add an age column just to make it um easier to work with because if I start doing the subtracting current date birth date and all that in every query it's going to slow down our
queries and it's just not a good practice so so we going to add an ed column to our data set so we going to do an alter table [Music] HR and then you add column will be called our column will be called age and then the data type will be in yeah so when you run that and then you select all from HR we'll be able to see whether the edge column has been added so if I scroll to the side to the very end you can see the edge column has been added but it
has no values because we are ready to calculate the age so to calculate the age we're going to update our table so we update HR and then we set The Edge column equals to so you're going to use a function called the time stamp diff so that's basically it calculates the difference between the current date and the birth date so time stamp if and then inside we want to calculate it with the year so our date obviously constitutes of the year the month and the day so we just wanted to use the ear section of
it and then you select the column that you wanted to use which is the birth date and then we want it to be subtracted from the current date so we'll use the CATE function yeah and then we'll add another column to enclose that then you don't forget the column so when we run this it should update our our age column with the correct ages so let's just confirm that so you'll select birth date and age from a Char so when you see you can see it has calculated the difference so it has taken the current
date which is 2023 minus each of the years in the birth date yeah so as as you can see even just as I've scrolled a bit here you can see there someone whose birth rate is 206 six so that's obviously wrong so the next thing you're going to do is you're just going to check for those outliers so you're going to look at the minimum age and the maximum age so that you can see the employees in our data set they fall between which range so you're going to do a select and then we do
use the mean function so you select minimum of of the age column and then we earier it as youngest you put a comma then we'll do the maximum of the age column and then you ear see it as oldest from our table which is [Music] HR and then when you run that you can see the youngest is - 46 and the oldest is 57 so obviously there are lot of values here which are wrong so we're going to select and see how how many records in our data set have a value Which is less than
zero or rather less than 18 because that makes more sense so you're going to do a select and then you'll count and then you'll count all the values from HR all the records from HR where the age column has a value that is less than 18 years old because that it would really be useful to us so we just want to see how many records there are and if we can exclude them so earlier we saw that our data set it has over 22,000 R so 967 values you can exclude those when you're doing our
analysis so these are the questions that you're going to be answering for this data set so I've come up with a list of 11 questions so at the end of it if you feel there something I've left out you can always add it to your analysis so we'll start with the gender breakdown of employees in the company so we'll select the gender and then the count of all those employees we label it as count and then from HR where remember we went need to include employees who are whose age is greater than or equal to
18 and also who are currently working in the company so that means the ter date will be equal to zero because that is how they've been represented here in our data so when you come here to ter date these zeros are for employees who are currently in the company so for every query that is what you're going to do group by gender semicolon and then you run so this this is the output the male 8 891 female 89 and confirming for 81 so you're going to save these queries as CSV files which you later on
import into powerbi and use for the visualization so you come here you save and this will be saved as gender you just sa it to the memorable name so that you're able to remember so what I'm going to do is just copy this whole section because I don't want to keep typing it again actually just copy this whole section I don't want to keep typing it because the condition will be the same for most of these Square so you go on to number two the race or ethnicity breakdown so you'll just select the race and
then count all as count of course and [Music] then from HR so in SQL you don't really have to use the capital letters but it's good just for the consistency and it's good good practice so we'll do group by race and this group we using it because in our select we've used an aggregate function so the column that the aggre function has not been used on we have to use the group by statement on it so we'll Group by race and then I'll order by count because I want it to be arranged such that the
race that has the highest number of employees representing it is at the top so order by count descending so when we run that we can see white they are over 4,000 and then those are the highest then the lowest are need 952 so again we'll save this output as race CSP yeah and then number three is the age distribution of employees in the company so for this one we first going to find out the minimum age and the maximum age of employees who are currently in the company so we going to do select then you
do age I mean minimum minimum age then label it as youngest because you want to see the upper and lower limits as oldest so for oldest will be Max not mean and don't forget our comma here max age as oldest all of this we'll be selecting it from HR and then of course we paste in our condition put the semicolon then when you run it the youngest is 20 and the oldest is 57 so this information we going to now create the age group so that you can see which age group has the highest number
of employees and which one has the lowest so to create these age groups we'll use a Cas when statement so we'll do a select and then case statement so we'll do when the age is greater than or equal to 18 and age so for me these are my age groups if you want you can change yours age group is less than or equal to 24 then that age group will be called 18 to 24 you have to put it in so we'll copy this so we'll come back and copy our condition after we're done with
this so here it will be 25 to 34 234 and then you paste it again down here cuz I have about six so this will be 35 2 44 then 35 2 44 so a Cas when statement is basically like an if so if this condition is true then this should happen if it is not this other thing should happen so that is basically just how a Cas when statement Works 254 then for my second last one will be 55 and this is 64 so if you're not in any of these lse it will be
65 plus don't forget the quotes and then you go to and as I'll call it this column will be called age group and then the other columns will be the count all it will be called as count then all of these will be from the HR table and then come back here and copy our condition because we'll use it a lot from HR where the age is that and the term date so we want only the current employees in the company so we Group by age group and then we also order by age group then
you put the colum so these are our age groups there is 18 24 there is 44 to 54 64 so we should have because the oldest is actually 57 so we need to have well it seems like those who are above 57 they're not hard or maybe I made an error somewhere let's see age oh I labeled it 15 so it is 55 yeah so be careful in your writing actually you get the or you'll get the wrong result so when you run it again yeah because you're supposed to have the five up to because
the oldest is 57 so this category 55 to 64 should also appear so that is correct so we going to save our quer you're going to save it as ageg group. CSV so you save it and then I also I'm also curious about how the age group how the gender is distributed among the age group so for this we'll just do the same query but now we'll add the gender to the column so we'll just copy all of it and paste it just right below and then we'll add gender so for here and as age
group or as count then we also select [Music] gender no I want gender to come after age group so you delete that you put gender here and then you Group by gender then you also order by gender so when we run this can see 18 to 24 female is 109 male is 1134 and confirming is 55 so this is exactly what I wanted so we'll save this as age group gender use descriptive names because in powerbi you'll be able to remember what it represents so for number four how many employees work at headquarters versus the
location so we'll select the location and then count all as [Music] counts so all of this from HR and then our condition so we come back up here and we copy this condition that is our condition because you want only for employees currently in the company so we Group by location yeah so we run that headquarters is that and that so we save it as [Music] location CSV so you save it and then just add some space here so next up we'll come to number five the average length of employment for employees have been terminated
so for this we want to see like how long are there employees working in the company before they are fired or they quit or whatever the case was so we'll do a select and then we luse the de D function so this de D function it it calculates the difference between two dates so for this you want the difference between the term dates and the high date so it calculates the difference between those and the output is given as the as in days so we'll have to divide by 365 for us to be able to
get the for us to be able to get it in here so because this is the average we're going to use the average function so we do AVG we enclose all of it as the average then you divide by 365 to get the number of years so all of this will save it as average [Music] length of employment and it's long you can give it a shorter name so all of this we do it from HR and then where because this are those who have been terminated so their term date it has to be less
than or equal to the current date and also it cannot be someone who is currently working in the company so the term date should not be equal to this zero yeah and also they have to be above 18 years old so the SQL you have to really think about the logic because even if you use the wrong logic still it will give you results and sometimes it's hard to determine if they are the correct results so for the average length of employment for employees have been terminated we calculate the difference between the term date and
the high date so it should be the ter date minus High date using this dat def function it gives us the total number of days since they were hired to the day they were terminated then you divide by 365 to get it in years yeah and then now we filter out we only want employees have been terminated so that means the termination date has to be before the current date and it also can't be any employee who is currently employed so the term date cannot be equal to those many zeros so after you write that
you'll run it will give us decimal places so if this decimal place you can reduce them so M I'll just do zero just to keep things simple so I'll round then zero decimal places so when I run this again the aage length of employment for most employees will been terminated is most of them work for at least 7 years so I'll save it as AVG underscore length underscore employment because typically most companies they they just don't want to keep training you investing in you and then you just leave like after 2 or 3 years that's
a loss on the part of the company anyway for the next one is the gender distribution vary across departments and job titles so we select the department then you also select the gender then of course the count as count then from HR so we'll again go up and copy our Condition it's going to apply here because we only want current employees who are above 18 years old so we paste it in there where the age is greater than 18 and all that so Group by departments and gender and then you order by the department want
to see distribution across each [Music] department so when you run that accounting female is 1175 male is 1375 so it seems males are slightly dominant in most of these cases so we are going to save this as gender departments so that will help me remember gender departments. CSV so the next question is number seven the distribution of job titles across the company so I will select so it's job title and then counts all as count from HR then our condition you want only current employees so you Group by the job title and then you order
by job title descending you just want to see how the job titles are distributed in the company so when you run that you can see here for example there's web developer 1 2 3 4 and all that so when you give this to the HR manager they'll be able to know whether all those web developers one 2 3 and four are really necessary or there maybe one and two can do the job for all of them so anyway you save that as job title then number eight the department the highest turnover rate so turnover rate
basically means the rate at which employees leave a company so that means like how long do employees work in a company before they leave or they quit or or are fired so for this we going to use a subquery because there will be some sort of calculation so to get this turn over rate you calculate the total number of employees who have left over a given period divided by the total number of employees that were in the department during that same time period so we going to use a subquery so for our outer query this
is where I'll Define the columns that you're going to have so I'll select the department and then I'll also select I'll have a column called total I'll have a column called total count which is the total count of employees the department are the count at the time so also the terminated count and then I also have the termination so mination rate will be calculated as the terminated count divide by the total count so with that you'll be able to get the termination rate so I'll save this as termination rate so this termination rate all these
columns they're going to be calculated inside the subquery so all of this you select from now the subquery comes in here so we are going to do do uh select again we select the very same columns department and then we'll do a count of everything as total count so you can see the columns we defined in the outer query we are now defining them here inside the subquery and then the next you're going to do is the total number of we'll use a Cas when state to find the terminated count so obviously those who are
terminated their term dat is not equal to zero because those ter dates equal to zero are for those who are currently working in the company and also their term date is less than or equal to the current date because someone can actually be terminated today so that will also be a valid termination so the term date has to be less than the current year can't be equal to zero so if it meets these conditions then that employees is counted otherwise they are not so you don't forget the end all of this you select from HR
and then our condition so where the age is greater than or equal to 18 in so here I don't think we need to put that ter date cuz you calculating the employees who are who've been terminated so we will only our condition will only include the age so and then you Group by departments and then that's the end of our subquery so we just going to close it and give it a name so I'll just call it query to keep it simple but you can give it any other name and then for outer query we
will order by the ter termination rate but descending because I want to see which department has the highest termination rate terminated count in the field list oh here at end we did not give this a name so this will be as terminated count yeah so our quer has become very long so we just run that and it shows us so auditing has the highest termination rate followed by legal and then the lowest termination rate is in marketing so we'll save this query as turnover rate underscore rate. CSV you save it and then now we'll come
to the next number nine the distribution of employees across locations by city and state so for this I'll just use the state because the cities there are so many cities and I think when you plot them on them up they'll be really congested so I'll just use the location State select the location State and obviously the count of employees as count and then you select all of this from HR and then our condition so let me just find it cuz you want for the who employees who are currently working in the company and then you
Group by location States location State and then order by the count I want to see which state has the highest number of employees so Ohio state is the highest then Wisconsin is the lowest so for this one we'll save it as location oh I'll save I mean I'll save this as State this is the distribution across States so this is your sa it as state number 10 how is the employee count changed over time based on higher and ter date so we want to see for this data it has it is Data from the year
2000 to 2020 so you want to see for example for the year 2000 how many employees were had how many were terminated and what was the percentage change so that we can know whether it is increasing or decreasing as the years go by so for this you'll also use a subquery so outer query we'll have the year so we'll just select here and then you'll have highers then terminations because this square is going to have a lot of calculations so it will not look good when you have a lot of calculation inside your query it
actually confuse you so the net change will be equal to highers minus termination so this is what I'll call my net change and then then my net percentage change will be the net change obviously over the number of highers * 100 so I'm just going to copy this and paste it here divide by the highers and then this we'll put it in Brackets divide by high no this is what will be in bracket Theus termination will be in Brackets the highs time 100 yeah times 100 and then because it will be a percentage obviously there's
going to be decimal places so we just use the round function to specify how many decimal places we want so this wanted to have at least two decimal places yeah so this will be called as netore change underscore percent so it shows me there's an [Music] error oh it's the comma has it change percent so this is our outer query and all of this we going to select it from now our subquery which we going to create just now so it will be select the same so I'll use the year function to get the year
from the higher date because that is what we interested in we're interested in the year part of the higher date this is what we call the year column that we've already so I'm just going to indent a year and then the number of hires so the number of hires is the total number of employees in that department at that time so as highers then next we Define the terminations so by now you already know this so it will be sum then you use a Cas when statement when the ter dates the terminations your term date
cannot be equal to zero because those are employees who are currently in the company and the term date also has to be less than or equal to the current date yeah then if you me these conditions you'll be counted else it will be a zero then you end and this will be called as [Music] terminations all of this we select this from HR and then our condition where age you have to be older than you should be 18 years and above then you Group by the year high yeah so this is our subquery so we
going to close it and again I'll call it subquery just to keep things simple but you can call it any other name then for the outer query you're going to order by the year sending because you want to see from the earliest here to the latest so the net percentage change in 2000 there was 87% 2011 was 81% now when you come to 2020 it's up until 98% so obviously the employee count has been increased so for our last question is the tenure distribution for each department this is basically just how long employees stay in
each department before they quit or are fired or something of the sort so you want to find the average like how many years on average do people stay in one specific department so we going to select the department and then you again use the de def function because it's the difference between the term dates and the higher dates but of course it will give us in days so we have to convert it you divide it by 1,000 by 365 I mean so that you can convert it into a year and because you're finding the average
this will be AVG inclose that in a and then obviously it will be a decimal point so you're going to round it off [Music] to zero decimal places that's how I prefer if you want it to have two or one however you want you can use that there and then this will be called as the average ten I'm not sure that's pronounced then all of this you select from HR then of course where the the term date is less than or equal to the current date because these are people whove already left the company and
then the term date also it cannot be equal to zero because this will be employees who are still in the company and you also have to be above 18 years and above and the age is greater than or equ equal to 18 so Group by Department Group by department so when we run this average 10 you can see most of them are eight seven years some are six so we are going to save this as average tenure yeah so we saved that there's one I'm not sure whether we've saved then number 10 wa the employee
change yeah I don't think you have saved that so we are going to run this again and you save that query yeah this is the net change you save it as CSV so I'll save it as employee uncore change. CSV so we saved it so that was the end of the questions that I had for us to answer if you have any more questions you can think of just you can include them so now you're going to use the CSV files you have saved to visualize our data in powerbi so the first thing you do
when creating a visualization in powerbi is just to create the title for our report so I leave a space here because there's a card that will be put there later on so I'll just call this HR employee distribution report or you can just give it any title that you feel is suitable for it I select all of it I'll put a font size of about 28 then it's times I love Times New Roman so and then you'll send enter it and then here on the right on under effects we'll change the background color so I'll
do a blue and then the font color will be white so that it's visible yeah so I think that's good enough so the first thing you're going to do is we go to home get data and import our first CSV file so you can't actually import all of them at once so I'll start with average length of employment so it will load it in and then it will appear here on the data pen on this data view so you click on load and then it appears on this side of data and then we we'll just
add our card so for each visualization we try to format it a bit as we go on so I'll add so it has come in so I select the card I reduce it and then I squeeze it here on the side and then you expand this and you select that so that is where it's going to appear so you come here to format your visual so category label we turn it off then we go to General you turn the title on expand it and then you write the this is the average length of employment in
Brackets you write here and then you make it bold and you Center it and then you can increase the font yeah you can reduce it and then visual on the call out value you can change the color from black I'll change it to the blue that I'm currently using so we'll go to our next one which will be the gender the gender file so here we'll put a chart it will be a for the gender will use a b chart so I'll just use this tacked column chart then I'll select gender so gender will go
to the x axis count to the Y AIS so this is good enough so we just expand it a bit I go to format my visual I'll add data labels and then the position is on top that's good the values will make them bold and increase them yeah and then the Y AIS you're going to turn it off and then for the title you go to title you change the Y AIS is is going to be number of employees yeah then you come to General Select Title you change the title [Music] so contr a and
then it will be the gender distribution or whichever title you feel goes best that will be gender distribution then you come back to visual and then you'll go to columns that is where the color of the bars will be changed from and you select then you go back to General for the title then you make sure you Center align it yeah so and then for the come back here for the xais we just going to turn off the title because it's explanatory from the title up here you just come back here and increase the font
of the title yeah so that's good enough so next you're going to do location so you get data text or CSV you do location you want to see how many are remote and how many are headquarters so you load it in for this I'm going to use a donut chart to show the proportions so I click on donut chart so in power bear you make sure before you select your next visual there's no visual that has been selected on your report so I'll select location count and location so I'll come again for much my visuals
for the slices for the headquarters I want it to be gray and then for remote it will be that blue yeah the legend probably increase the size a bit and then General you'll go to the title so we do employees let's do headquarters versus remote versus remote then of course you're going to center it and increase that a bit so next you're going to do the change in employee so we go to employee change you open then this will be a line chart so we load it in so after we load it in we going
to select the line chart collapse this so employee change so the year will be on the xais the net change percentage on the Y AIS so you select this and put it here just next to our donut chart and then you format the visual of course the line you change the color you go to line colors you change the color to the blue that you want you turn on the markers yeah so for the xaxis you're going to turn off the title because when you look at it you can tell that it's there and then
for the Y AIS the title will change it to the percentage change in employees yeah make it bold then you go to General and change the overall title so contr a so this is change in employees change in employee number from 2000 to 2020 so you can see that generally there's an increase and then the next one will be the states so we'll plot a m of the states and have they distributed so you select States you select open so that looks good you load it in so after it's loaded in you select this map
yeah that's exactly I want it to go to the bottom then you expand State and you select count and location State yeah and then you come here to format your visual the bubble you're going to increase the size that's too big it's a bit I don't want them touching each other [Music] so you just reduce until you get the perfect size then map settings I'll select the gray scale not the rud then you go back to Bubbles you go to Colors then you select our theme color that's the color you want and then the category
labels you put them on you expand that the value color of the values I want it black then for the background I don't want it to have any background and then I make this bold and increase the font a bit yeah so I think that looks better and then you come to General and you change the title employees by state yeah so wait we need to Center that so you come back here to General title and then you are learning to the center so the next one for the last one for this page will be
the r distribution so text or CSV then you come down to race so you load it in for this you'll again use the clustered column chart so you select this clustered column chart you expand it all the way to the end the is in x- axis the count is in y [Music] axis yeah we come to format visual you go to General title contr A we do R distribution then of course you'll increase the size and then you'll Center it and then you go back to visual the the data labels are on the columns we
do the color our color blue and then for the data labels you want to change the position to be on the inside end so you select values you make them bold and increase the size a bit so for the X AIS you're going to turn off the title because it's in the one then for the Y AIS we going to turn it off and then for the title we'll do number of employees for each R so that's good enough for page one so we'll go to page two and then for page two we'll start with
the age group so text or CSV we do age group group distribution I'll put this one at the bottom so it's loading when it loads successfully I use this stacked column chart so I'll put this one here at the bottom age group distribution so expand Edge group Edge group is on the X AIS the count is on the Y axis so by now you know how to edit this so you change the colors that will be that blue and then for the age group you click on these three dots and then you'll sort axis by
age group you click it again you sort by ascending so it's 18 to 24 25 to 34 and so on so you come here to General you change the title age group distribution of course you increase the font and then you Center it and then you come back to visual and then for the Y AIS we turn it off and then the title will be number of employees title will be number of employees you minimize that then you go to data labels you turn it on and then the position will be on the inside end
and then for the values you make them bold and you increase the font then you'll just make it a bit small so that's good so we go again the age want the age group distribution by gender so you select that everything looks good so you load it in then for this one we'll use a clustered column chart because it has the categories by the gender so you'll select this clustered column chart and then age group will be X AIS count will be y AIS and then gender will be on the Legend So you format the
visual The Columns the female I want it to be dark then the male will be light and confirming will stay orange then again you sort this so you sort axis by age group you click on the three dots again you sort axis by sort descending so it's 18 to 24 and so on so after that we'll change the Y AIS so we'll turn off oh no yeah we'll turn it off and then for the title we'll do count of employees count of employees age group then you go to General you change the title so the
title for this will be the age distribution by gender we'll increase that font and then we'll Center it so this looks good enough oh and then visual we do the data labels I just love the data labels on top so position the position is okay we increase the values so we make them bold and increase so the position should be outside end so I'll just increase this yeah so I think that's good enough for that and then we have two csvs remaining so this next CSV will be the Department's by termination R so turn over
it this will be a table so for this this table will have the department and the termination rate that it has so after loading it in you select the table visual yeah that's exactly where I want it so I want the department and the termination rate so you'll go here to format visual then the column head we make them bold and increase the font a bit up to there then the values we also increase the font of the values to fill the space then for this we'll also the totals will turn off the totals cuz
we don't actually need it so come here to totals values you turn off so what you're going to do next is come to these three dots and then you sort by the termination rate so auditing is the highest and so on and then marketing and Business Development are the lowest so yeah this table will be there then you come up to our last CSV file which will be the gender distribution by department so select gender depth CSV and then this will also be a clustered column chart so you load it in because it has the
gender categories we'll use the clustered column chart so we'll just expand this and then it's called gender depth so the department will be on the EXA the count on the Y AIS then the gender will be on the Legend So you come here to format visual The Columns change colors this will be dark blue for female for male will be a light blue and confirming will remain orange and then for this xaxis you don't need that title so we'll remove it and then for the Y AIS is our title we change the name of our
title so it will be employee number no it will be number of employees and then you go to General and you change the general title so it will be the gender distribution by department so this one will hit up a bit Yeah so it looks good so far so if there's anything else you feel you want to change based on your personal preference you can change it but this for me is good enough yeah so I'll show you how to save you just come here to file we click on Savers and then our file will
be called HR employee [Music] reports so I'll just call it HR employee report and then you save it and then you can also export it as a PDF file which you can share with someone else so I'll come here file you export then export to PDF so when it's done exporting when you send it to some someone as a PDF this is how it's going to look if there's not something you don't like you can always go back and change so I'll just save this as HR and play report then you save it so this
is how my report looks like you can add or change things however you like so the last thing I'll do is just show you how you can save your files in GitHub so you'll just sign in to GitHub then you open your repository and you come here to add file and you select on upload files so when you select on upload files you'll come here to where you've saved your files so for example this data cleaning one I'll just drag it in here the data cleaning and then also this data questions we just drag it
in there and then this will be SQL queries and then you commit the changes so after that you can also upload your PDF file and all that and then you'll add a read me file you just take a screenshot of your dashboard and you put it here then you explain the data used the cleaning what you used and then the questions you answered the summary of your findings then your limitations for example the ages we used we only used 18 and above and then the ter dates we did not use the ter dates in the
future so that's those were some of the limitations and then you can also add a section for recommendations so if this was an analysis for a certain company and then you tell them that I recommend you to do this and this with regards to the analysis that I've done for you so that is the end of this project Please Subscribe and share and like this video and I'll see you guys in the next portfolio project