BEST END TO END DATA ANALYTICS PORTFOLIO PROJECT | TABLEAU & PYTHON
142.52k views7186 WordsCopy TextShare
Mo Chen
I'M HERE TO HELP YOU LAND YOUR NEXT TECH & DATA JOB
🌐 My website: https://mochen.info/
LET'S COLL...
Video Transcript:
being able to create Dynamic user-friendly and beautiful visualizations always comes in handy when you're working as a data analyst in this end-to-end portfolio project you will learn how to gather data assess and explore the data manipulate it and then finally create a great looking dashboard in tableau hey my name is mochan and I work as a data and analytics analyst within the financial services industry this project will be great because you will learn how to programmatically gather data you will also explore it assess it and manipulate it using the pandas library in Python and then finally you'll create impactful visualizations within Tableau you'll use set actions user-defined Dynamic parameters and calculated fields to control your entire dashboard there's a link below to my GitHub repository where I have all of the files that you'll need for this project make sure to check it out if you want to follow along or if you just want to recreate everything alongside me now before jumping into the project the only thing I want to emphasize is that obviously just skip to the parts that you actually want to learn so if you don't want to code anything at all that's absolutely fine as I already have the final data set in Excel format in my GitHub repository so if you only want to visualize data make sure you skip to the part with the tableau visualizations okay so first things first let's walk through the data set that we'll be using the data is from kaggle and it's the London bike sharing data set and let's just quickly scroll down so we get to know the data set a little bit better so you can see here that we've got 10 columns and then we have timestamp the number of bike rides T1 and T2 which is the real temperature and what the temperature actually feels like a humidity percentage and then if I scroll along you can see that we also have the wind speed the the weather code is holiday this weekend and we have the season code at the end so this is the data set that we'll be using and now I'm going to jump into my code editor so I like using visual studio code and here's the code here but you can use whatever code editor that you prefer so instead of watching me type out all the code I've already written all the code and what I'm going to do now is run you through the code line by line so if we just start from the top then you can see here that I set here you can uncomment the below code if you want to install the libraries essentially the pandas the zip file and the cargo Library are the three libraries that we'll be using in this project to explore assess and manipulate the data so if you don't have these libraries then you can import pendants as PD and PD is just a common abbreviation for the pandas Library below that I have import zip file library and then underneath that I have import kaggle we import the kaggle library so that we can actually use the kaggle API to download the data set programmatically so below here I included a link to the official documentation for uh using the kaggle API you can open it and then everything is really straightforward you can just do it reading the documentation underneath I have this line of code which basically just uses the kaggle API to download the data set programmatically now you don't have to write this yourself if you don't want to and I'll show you why because let me just go back to the kaggle data set and then if you go to the top right here you can click copy API command and if I go back to my visual studio code you can see that as I paste it in it's the exact same code as above now I put an exclamation mark at the beginning of it so that I can run it as a command line command and just delete that quickly and then moving on I'm just going to be using the zip file library and I'll be extracting the files from the downloaded file London bike sharingdataset. zip whenever you download a data set from kaggle it's always going to download as a zip file you can see here in the top left that I have this London bike sharing dataset. zip so we're going to extract the file from here so once we extract the file what we get back is actually a CSV file called London merged.
csv and then after that I'm going to be using the pandas library to read in the London's London merge. csv as a pandas data frame called bikes then after that I'm going to explore the data using bikes. info so you just get some basic summary information about your pandas data frame underneath that I go bikes.
shape so you can see that the data frame has 17 400 and 14 rows and 10 columns and then underneath that I simply just want to see how the data set looks like so you can see here that this is the exact same data as the one we were looking at on the kaggle website then moving on um I'm counting the unique values in the weather code column so this means that there's 6150 values corresponding to one 4034 values corresponding to two so on and so forth now underneath that I do the exact same for the season column so you can see that there's 4 394 values corresponding to a value of zero four thousand three hundred and eighty seven values corresponding to a value of one so on and so forth moving on I'm specifying a dictionary where I put in the column names that I prefer so this is the dictionary I called it the new call sticked and then underneath that all I'm doing is I'm renaming The Columns to my specified column names then moving on I'm changing the humidity percent values to the actual humidity percent values because they need to be a value between zero and one obviously to be a percentage and then after that I'm creating two dictionaries one for season and one for weather so I can actually map the integers or rather the floats to the written values of the season and the weather so season dict will contain the mapping for the seasons and then a weather deck will contain the mapping for the weather and then underneath that I'm using the map function here to actually map the values and then underneath that I'm just basically double checking my data frame to see if the mapping has actually worked and you can see here that if I go into weather I can see broken clouds and clear now and if I go into season I can see winter winter winter and then underneath that all I'm doing is I'm writing out my bike's final data frame to an Excel file called London bikes final and then I'm gonna give the sheet the data sheet name and then if we go to the top left here you can see that we have London bikes final which is the final data set we have London bikes which is the code London merged it's the extracted file from the downloaded file and then London bike sharing dataset. zip is the file we downloaded programmatically from kaggle before jumping into the Tableau visualizations let me just quickly summarize what we're going to create so essentially we're going to create five visualizations we're gonna have a total number of bike rides here on in the top left and then we'll have the moving average chart here which is going to be our main visual underneath that we have this temperature versus wind speed heat map and then we're also going to have two extra visuals in the tooltip so as you can see as I'm hovering over my moving average chart you can see the total number of bike rides split by weather and by the hour and then the same thing happens if I hover over the heat map so that's one visual here London bike rides total two visuals with the moving average chart temperature versus wind speed heat map three visuals and then we have the two visuals running in the tooltips the moving average chart will be our main visual so we'll be able to use set actions to control and filter the entire dashboard we're also going to have these user-defined parameters here at the top to control the moving average so I can do a 30-day moving average for example or I can change this to weak and now I'm doing a 30 week moving average where I can do a three week moving average and then I have this little filter at the top right where you can filter the timeline so you can shorten it or you can lengthen it depending on what your preference is so first things first I opened up a new Tableau workbook and what I'm going to do is connect to the data so go Microsoft Excel given that our data is in Excel format and then I'm just gonna click on London bites final once table has made the connection you can see here that we have the exact same columns as we had on the kaggle website and then we manipulated the data set and then this is the exact same as the London mic spinal data set so let me just go into sheet1 the worksheet and then the first two things we're going to do is we're going to create two user-defined parameters so I'm just going to right click anywhere here create parameter and then I'm gonna call this brand where the moving average period and this is going to be of string type and I want it to be a list with three values day week or a month and then hit OK that's one of the parameters and then I go back here again right click create parameter and I'm going to call this parameter moving average duration I'll give this an integer type and say for example the current value is 10 and I'm going to hit ok now I'm going to right click and show both parameters so you can see here in the top right that I have moving average duration and moving average period now the next thing I'm going to do is go to time and then I can see here that it has this little ABC Sign Here which means it's a string currently so I'm going to right click and then go to change data type and go to date and time so now you can see that I have this little calendar icon and this is what we want so I'm going to right click and then create the moving average period measure so moving average period and the formula I'm using is date trunk and then I'm passing in my moving average period parameter and then time and then I'm going to hit OK so you can see that I have this moving average period here and actually instead of date and time let me just change the data type to date only and then I'm gonna go up here to the top left right click and then drag it into the columns select continuous and hit OK so let me also put the count here on the rows and then you see we have a pretty simple line chart at the moment okay so the next thing I'll do is I'm going to create my moving average period set so you go to moving average period the measure right click right click and then you go create and then set I'm not going to select anything for now I'll just hit OK now the next two things we'll do is we'll create a Min month and a Max month calculated field so let me just right click create calculated field I'm going to call this field Min month and the formula we'll be using is mint if moving average period set then give me the moving average period and and then we need to put these in curly brackets as we want the formula to return all marks so that is the formula essentially what the formula is saying is that if in the moving average period set then give me the moving average period so let me just hit OK here we have the Min month and then go to Mid month right click and then duplicate and then right click on the duplicate hit edit I'm call this Max month and the only thing we'll have to change here obviously is change the Min to Max and then hit OK then you right click on Min month drag it into the detail shelf select continuous hit OK do the same with Max month and yep hit OK and then go to moving average period set and then right click and then go show set so you can see here in the right that we have the moving average period set after that go to worksheet in the top left and then go to actions and now we're going to add a worksheet action to actually update our set values so you go add action change set values I'm going to call this update moving average period set and then Target set will be the moving average period set clearing the selection will keep the set values hit okay and hit OK again so now as I'm selecting something from the beginning here you see in the right column right here that all of these values are ticked now so if I take something else say for example the end values then if I scroll to the end here you can see that all of these values are selected now but what we actually don't see is what we're actually highlighted on the chart so that's the next thing we're gonna do so in the top left here go to analytics and then go to reference band and then go to table and moving average period drop it in and then the band from will be the evening month I don't want any labels I don't want any tooltips and the Band 2 value will be the max month again No Label no tooltip I'll just go with a dashed line and then for the film I'll just go with some light gray for now I'll do all the formatting at the end and then untick this box that says show recalculated and hit OK so now you can see that as I'm updating my set you can actually see it on the chart but what you can't actually see at the moment is a different color for the bits that you selected so that's the next thing we'll do so make sure you go back to data and then right click create calculated field I'm going to call this in range and the formula we'll use is moving average period greater than or equal to Min month and moving average period less than or equal to maximum then it's going to be in the range I'm gonna hit OK here and I have this in range calculated field make sure you drag this into the color Shelf and now you see that we have an orange color for the bits that we actually highlight but we have a blue line going across now this is not good because currently in range is discrete but it's not a measure so to keep it discrete and turn it into a measure you go to in range click on the drop down and then hit attribute and then there you go you see that the bits we highlight now is just purely orange there's no blue line going across okay so this is all well and good but currently we have the total number of bike rides and we want the moving average so the quickest way to do this is go to the sum of the count at the top here click on the drop down quick table calculation and then hit moving average and then go to the top and then drag this field right here I'm gonna call this the moving average right and then once I have the moving average rights right here right click and then hit edit so what we want to do here is we want the moving average here so the minus 2 value to be dynamic and we want this value to be our moving average duration user defined parameter so -2 actually here means a three day window it takes the current and then it goes back two days so what I'm gonna do here is I'll go moving average duration plus one and then that will be my moving average duration formula hit OK and there we go let's just test out that all of our user-defined Dynamic parameters here in the bottom right actually work so say for example if I change this to week then you can see that the chart is a little bit smoother which makes sense if I change it to month again it's even more smooth because we have less data points and then back today I mean just instead of 10 day moving average I'll give it a 25 day moving average and the chart is a little bit smoother so that works or I can even do a 50-day moving average and there we go we can see that everything works very nicely alright so the next thing we're going to create is going to be this total number of bike rides chart here in the in the top left as we've already done our main visual which is the moving average visual so let me just go back to my tablet workbook meanwhile I renamed my moving average worksheet to moving average make sure you give everything meaningful names let me open up a new worksheet and then within here the one thing we need to add is another calculated field so create calculator field and I'm gonna call this in range right and the formula will be sum and in range Prime speed count I close the brackets here and again I put on the curly brackets on the outside as I want to capture all marks and then hit OK so now what I'm gonna do is go to the top left here and then drag the in range into the filter select true okay I'll drag maxmont onto the detail shelf select continuous okay minmont onto the detail shelf select continuous okay and then the thing I'll do is I'm gonna find my in range rights and then put it on label and then for the title I'm just gonna double click on my title and then delete all of this I'm gonna call it the London bike rides between and then I'm gonna find my period so you can go to insert Min month and Max month 0 Pi there you go you can see that we have the London bike rides between this period and this period was 3. 2 million bike rides so the periods here are actually the periods that you select here so as you update your set there we go everything here will also update So currently we're going from the 8th of December to the 20th of May 2016 and then if we check here yeah that looks a bit right it's December here and this is the June line here so that looks good okay so the next visual we're gonna do will be the heat map on the bottom so this heat map is a temperature versus wind speed heat map and we're going to use the same periods as in the London bike rides period so this set action will control everything on your dashboard so let's go back to Tableau and then let me give this sheet a meaningful name to like total rights let me open up the new worksheet and then we'll do two things quickly here so given that we have so many temperatures and so many wind speeds I'm gonna quickly bin them up so put them into bins so I'll go temperature real right click and then I'll say create bins and I'm happy with whatever Tableau creates for me I'm gonna call this field uh temp c hit okay and I have my temp c um field right here and then the next one I'll do is wind speed kph I'm gonna go create bins and I'll just call this wind and kph and again I'll hit OK so after this I'm gonna drag the wind speed onto the columns and then the temperature onto the rows and then I make sure I'm going to drag the count onto the color and then when you go into marks make sure you select the square and then let me just go for the entire View and then quickly to put on the same filters again in range drag it into filters select the true grab the minmont drag it onto the detail shelf and continuous okay and then next month drag it onto the detail shelf continuous and hit OK and one last thing drag the count into the label so we can actually see what the values are and then I know currently this heat map doesn't exactly look like this heat map but we will do all of the formatting at the very end so moving on we only have two more charts to create and these will be the tool type charts so you can see here that the bike rides are split by whether an hour if you hover over the heat map and then the same thing happens if you hover over the timeline so let's create the weather and the hour charts back in Tableau again call this the beatma worksheet let me open up a new worksheet let's say 2D weather first this one will be a pretty simple chart so make sure you grab a weather from here put it into rows and then grab the count put it into columns I'm going to sort everything in descending order go with entire View and then there you go we have this nice chart here I'm going to grab the weather put it on the color Shelf looks a little bit nicer now let me just go to the top right here and I'm gonna hide this card and there we go we have our weather visual now and let me just quickly add on my filters so in range I'm going to drag it in here make it true hit OK Min month drag it on to the detail shelf continuous okay I'm gonna drag the max month as well onto the detail shelf continuous and then hit OK there we go it looks uh pretty nice now let me open up a new worksheet I'm going to call this hour and I'm gonna do the exact same thing here I'm gonna drag time put it into rows and I want it to be discrete and I want it to be by the hour and then I'm going to hit OK I'm gonna drag the count into the columns make sure I change this to a bar chart and there we go you see that it looks pretty nice now we'll go with the entire View and then drag the time into the color shelf so that they're all different colors and then after that I'm going to drag my in-range filter in here select the true minmont onto the detail shelf continuous hit OK next month again onto the detail shelf continuous hit okay I'm gonna go to the top right and I'm gonna hide this card and there we go we have our our visualization now the only thing I'm going to add is I'll put the label actually within here so we can see what's happening and then let me go back to the weather and I also missed the label from within here so now we can see the labels as well so that's all of the five visuals created so we have our main visual the moving average chart we have the number of total bike rides we have the temperature versus wind speed heat map and then we also have the number of bike rides split by weather and by hour okay so now before formatting the dashboard let me just quickly format all of the worksheets starting with the moving average so I'm going to right click anywhere here go on to format and I'll go on to the shading bit I want it to be this light color of shading here and then I go into the lines I don't want any grid lines I also don't want zero lines trend lines ref lines drop lines I don't want any of these basically I like a nice clean chart I'm going to click on the y-axis here edit access and I don't want the title here then I'll do the same with the moving average period right click edit access and I don't want to see this either now I also don't want to see this set so I'm going to go hide the card and then it looks a lot neater I also don't want to see this attribute here I'm going to hide this card as well okay so the next thing we'll do let's just uh go to the color shelf here and let me just give it some nicer colors so if it's false I want it to be a nice purple and I already have this here I'm just copy it in this will be the color if it's purple if it's false and there we go you can see that if it's not in the range it's going to be this purple and then if it's true so if it's in the range I want it to be this type of green let me just copy that in there we go hit apply hit OK and there we go I think that looks uh pretty nice now now also with the reference band I want to edit that to a different color so I want the line to be the color that I just did the screen line and then the fill I have a light green fill here so let me just go more colors paste that end and then there we go so this is how our chart looks like now I think it looks pretty good and then one more thing to do just in the title I would like it to be dynamic so I'll insert the moving average duration and then the moving average period hit apply and you can see that we have 50 today and after that I'm just typing moving average and then all of this um I'm gonna give it a nice cream color and then make it bold hit apply and then now you see we have the 50-day moving average but as I change this my uh chart title should be changing as well so if I go 50 week for example and then you can see that it says 50 week moving average now so I'm just go back to 50 a day and let me just move on to my next chart which is going to be the number of total bike rides so again let me just format this London bike rides um I'll make everything green here I'll make this uh bold and let me make this much smaller we don't need it that big so I think that looks nice London bike rides yeah that looks good hit okay let me go on to my text and then I'm gonna make this much bigger so say I'll give it say for example 36 give it a bold again I like the screen so there we go London bike rides between the state and the state was 2.
6 million copy and paste the formatting from the other worksheet I go on to my moving average worksheet right click copy formatting and then I can paste the formatting so you can see that the background updated now so before I move on to the heat map I'll do that again copy formatting paste the formatting in and I'm on to my heat map I don't want to see this card at the top right here so I'm gonna choose hide the card and then for the colors I will obviously go on to the color shelf edit colors and the first thing I'm going to do is choose purple and then after choosing purple I'll go custom sequential let me hit apply and I think this looks pretty nice now so hit OK and there we go now the only thing I see here is I don't want two decimals here so I'm going to right click go to format and then here numbers I want it to be you can just go into custom and then choose one decimal place and then there we go you see that we've got only one decimal now do the same here format and then now the wind speed go to custom one decimal place and then there we go you see that we have a pretty nice chart going on here now let me just um go into weather and again we can copy and paste the formatting so right click on the heat map worksheet copy formatting and then paste the formatting okay so now I don't want to see this count here so I'll go edit access and then delete this and then there we go you see that we have a nice little chart going on here at the moment moving on to our I'll do the same copy formatting and I'll paste in the formatting I also don't want to see the count here edit access I'll go delete this and there we go okay let's just quickly look through our charts to make sure that everything looks uh relatively nice okay so the only thing we didn't do is the tooltip so far so let me go into the tooltip and within here I don't actually want to see all of this information so I think what I am going to choose is I will delete them in month and the next month and then the in range I'll also delete this one and I have the moving average rides and the moving average period the period would just be the date so let me just change this to date let me hit OK and let's look at the tooltip yeah that looks pretty good to me so now we only have the date and the moving average right so what we're missing is the two actual charts that we want to include in the tooltip so let me go back to the tooltip let me just insert first I'll go say for example with the weather chart and then underneath that I'm gonna insert the hour chart for the weather chart I think I'll go 300 by by 240 hour one I think I'll go 300 by 400. and then let me just change the color of these to a nice green make it bold hit OK and let's go through the tooltip again I think that looks pretty nice now now let's move on to Total rides and let's um adjust the tooltip here so essentially I don't actually want any tool types here so I'm just going to delete everything hit okay and now if we hover over the number nothing is seen onto the heat map and again let's adjust the tooltip so what I actually want to see here only is the I want to see the temperature I also want to see the wind speed but I don't want to see anything else so I'll make these bold give it a nice green and then again insert I'll go weather first this is 300 by 200 and then let me insert the hour as well and this will be 300 by 400. there we go and then hit OK and let's hover over so now you see that as we're hovering over it looks pretty nice now we have both of these charts let me go on to my labels and let me just make my labels bold there we go hit okay and then actually I want to put them in the middle so that's just Center them and now moving on to whether I think the only thing I need to do here is in the tooltip I think I'm just gonna take out everything hit okay my label I would prefer it to be bold hit OK and yeah I think that looks good as is and then moving on to our again tooltip I don't want to see anything in the tooltip hit okay go to my label and I'm just gonna make it bold and there we go that's it so that's all five visuals done now all right so the next thing we're going to create is the dashboard let's come down here and select create dashboard and then here within the size I think I am just going to select fixed size and then within that I'll go to generic desktop and now the first thing we're going to do is drag in the total rides right here I'll make this entire View and then I'll drag in the moving average next to it and I want to make sure that uh the moving average duration here is going to be floating so let me move it in here moving average period also floating move it in here and then I'll make sure I delete this container here so after this I'm going to grab my heat map and I will just move it underneath everything and let me make this one entire view as well and actually one thing I noticed I didn't get my heat map a proper title so let me just click into that and I will call this temperature versus wind speed between and then let me just insert the dates minimum and maximum there we go I'll make everything green here and say for example let me make this bold 15 and then I'll just make this bit 10.