Hey guys, This tutorial will show you how to convert a simple excel sheet into this interactive dashboard using Python. In particular, we will be using Pandas, Plotly & the streamlit library. The beauty of Streamlit is that you can create web applications directly in Python without knowing HTML, CSS, or JavaScript.
In the dashboard, the user has different options to filter the dataset. The KPI's on top, and the charts below are updated accordingly. While coding out the solution, you will learn very cool tips and tricks for data manipulation in pandas, how to use plotly to create interactive visualizations, and of course, how streamlit enables you to create a dashboard like this one.
Before coding out the solution, let us have a quick look at the data. We have got here the supermarket sales for three different locations. Additionally, we have the information on whether the customer was a Member or not, the gender of the customer, the different Product lines and the total sales, including taxes.
Furthermore, we also have the time of the purchase and the customer satisfaction rating on their overall shopping experience. At least, those are the columns & information we are going to visualize in our dashboard. To work with the data in Python, I will be transforming the Excel data into a Pandas dataframe.
An easy way to do this is by using the following add-in, called 'My Tool Belt'. I will leave the download link and the video tutorial for using this add-in in the description box below. However, you do not need this add-in to follow along with the tutorial.
In just a moment, I will explain how to read the Excel data by using pandas. But for now, let me go ahead and click the following button. As the next step, I will select the data and hit ok.
By doing so, the add-in created a python file in the workbook's directory. Before opening up this file, I will rename it to app. py.
In the Python file, you will see that the tool belt uses the pandas' library and the read_excel method. If you execute this script, you will get back a pandas dataframe. As mentioned, you could type the following lines of code yourself to read in the Excel data.
Yet, using the add-in saved me some time, as the tool belt inserted already the excel file name, sheet name, how many rows it needs to skip, which columns you want to use, and how many rows are included in your selection. To interact with excel, I am using the 'pandas' & 'openpyxl' library. Make sure you have those libraries installed by typing' pip install pandas and openpyxl' in your command prompt or terminal.
To create the web app, we will be using the streamlit library. You can install streamlit by typing 'pip install streamlit'. And last but not least, to create the visualization, we will be using plotly-express, which you can install by typing 'pip install plotly-express'.
Once done, I will be importing the libraries we just have installed. So, go ahead and type import plotly. express as px and streamlit as st.
As the first step, I will be setting up some basic configuration of our web app by using st. set_page_config. After setting up the title, I will give our web app also a favicon.
The favicon is a small square image that symbolises a website in web browsers. Instead of an image, streamlit also supports Emojis. In my case, I will pick a bar chart.
On the following website, you can find a vast selection of emojis. Just select the one you like and copy the codename into your streamlit application. I will also include the link to this website in the script.
Next, I will specify how the page content should be laid out. The default is set to "centered". However, I would like to use the entire screen.
We can do this by setting the layout to 'wide'. Before firing up our streamlit app, I will display our dataframe onto the page instead of printing it to our console. With this change in place, I will go back to my command prompt.
Make sure you are in the same directory as your python script and type streamlit run, followed by the name of your script. After hitting enter, we will see our web app, which contains the page title, page icon and the dataframe. Ok, so far, so good.
Now that we have your web app running, I will build out the sidebar section. We will use the sidebar to gather the user's filter criteria and apply those filters to our dataset. I am planning to filter our dataset based on the different cities, customer types and gender information.
I will be using the streamlit multiselect component for all the filters and starting with the city. For the label, I will type 'Select the city', and for the options, I will use the unique values in our Excel column 'city'. We can also set a default value when starting the web app using the keyword argument 'default'.
In my case, I would like to display all City Names. Ok, with that in place, I will be saving our script and refresh the page. As a result, we can now see our sidebar with the different cities.
So far, it has no functionality, but we will take care of that in just a moment. Before that, let me copy and paste the city selection and adjust the code to display the different customer types and gender information. After a quick refresh, we have got now our filter selection in place.
For each field, streamlit will return a list with the selected options. Currently, those lists are stored in our variables, city, customer type and gender. To filter our actual dataframe, I will be using the query method and store the filtered dataframe in a variable called 'df_selection'.
I can now go ahead and query our columns based on your selection. Pandas should filter the City column based on the city list. You can use @ to refer to a variable.
I will do the same for the customer type and gender. To test this out, I will be returning the filtered dataset to our web app instead of the original dataframe. After saving the script and refreshing the page, let me test the selection by removing the normal customer type.
As a result, we will get back a dataframe, which only contains 'member' as the customer type. Likewise, we could go ahead and also test the other selection fields. Now that we have this in place, I will display some KPI's on the main page instead of our dataframe.
Before calculating the KPI's, I will insert a title. As before, I will use the bar chart emoji in the title. To separate the title from the KPIs, I will insert a new paragraph using a markdown field.
For the KPI's, I would like to display the total sales, the average rating and the average sales by a transaction. The total sales will be simply the sum of the 'total column'. This line of code will return a float number.
But as those KPIs are intended to give the user a first glance at the numbers, I will display the sales amount without decimal values by converting it into an integer. For the average rating, I will take the mean of the rating column and round it with one decimal. Next to the average rating, I would also like to illustrate the rating score by emojis.
Therefore, I will multiply the number of the star emoji by the average rating. I will take a more simplified approach here by rounding the rating. So, a result of 6.
9 would be displayed as seven stars, for instance. To multiply this number with the star emoji, I will also convert it to an integer. Ok, and last but not least, I will calculate the average sales of a transaction by applying the mean to the 'Total' column.
With that in place, we can go ahead and insert the figures next to each in three separate columns - a left, middle, and right column. To do so, I am using the streamlit columns method. As the next step, I can now insert the content into the different columns.
Within the left column, I would like to display the total sales. I am using here an f-string to concatenate US-Dollar with the actual values we have calculated earlier. To make it easier for the user, I will display the total sales with a thousand separators.
Within the middle column, I will place the information about the rating on the very right side of the page; I will put the average sales per transaction. To separate those KPI's from the next section, I will also insert a divider by using a markdown field with three hyphens. Ok, before moving on, let us test this by saving the script and refreshing the web app.
Ok, and here it is. As expected, when we change the filters, we should also see our KPI's changing. In the next section of the web app, I will include two bar charts.
The first bar chart should plot the sales by product line. Before plotting the graph, we will need to do some calculations in pandas. To explain those calculations, I will shortly switch to my Jupyter Notebook, and we will come back to our script in just a moment.
Ok, here in my Jupyter Notebook, let us see how to aggregate the sales by product lines. One way to do this is to use the pandas groupby method. In our case, I would like to group the dataframe by Product line, and I would like to sum up the values.
After running this line, pandas will sum up all numerical values by Product line. However, we are only interested in the Total sales; therefore, I will filter the dataframe accordingly. As a result, the dataframe will look like this.
Optionally, we could also sort the values within the new dataframe. By default, pandas will sort the values from the highest to the lowest number. Ok, and this line of code will be the starting point for our bar chart.
So, back in our script, I will type out what we have just seen in the Jupyter notebook. I will store the bar chart in a variable called 'fig_product sales'. We will use the plotly express library to plot the data.
To create the bar chart, we can type 'px. bar', followed by the name of the dataframe. I plan to use a horizontal bar chart to display the total sales on the x-axis and the different product lines on the y-axis.
If you have a look at our dataframe, the product lines are currently used as our index. Therefore, I have typed it like this here. As mentioned already, we use a horizontal bar chart.
You can also use HTML Formatting Elements within the title, like displaying the title as bold text. To set the colours of the bars, you can use 'color_discrete_sqeuence'. I am using a little hack here by multiplying the hexadecimal code with the length of the dataframe.
On a side note, plotly comes already with different template styles. I personally like the plotly white template, which is a very clean & tidy chart design, in my opinion. Ok, now that we have our bar chart, I will plot it in the web app using st.
plotly_chart. Like before, whenever we change the filters, the bar chart will be automatically updated. To polish up the graph, I would like to do some minor tweaks.
For instance, I would like to remove the gridlines and the background colour of the chart. To do that, I will update the layout and set the background colour to transparent and remove the grid from the x-axis. After this minor tweak, the updated chart will look like this.
Ok, let us now move on and create a second chart to plot the sales by hours. The first challenge will be to separate the hour from the time column. As before, I will shortly switch to my Jupyter Notebook.
If we inspect the dataframe, we can see that the Time column has currently the datatype 'object', which is like a python string. That is why you will get an error if you try to extract the hour information from the 'time' column. First, we need to convert the column into a DateTime object using the build-in pandas method 'to DateTime'.
Additionally, I also need to specify the current format, so first, we have the hour, then minutes and then the seconds. Once we have transformed the 'time' column into a DateTime object, I can easily extract the hour information. The new column will look like this.
And if I print out the entire dataframe, we will see the 'hour' column on the very right. Ok, so let me copy this line of code. Back in the script, I will paste it right after we have loaded the dataframe.
In fact, as we are already here, I will improve the performance of our web app. You might have noticed already, but every time we change the filters in our web app, streamlit will rerun our entire script and load the dataframe again and again. We could avoid this by caching the dataframe.
Think about it, like storing the dataframe into a short term memory. Whenever we filter our dataframe, we will not read in our excel file again. Instead, we will get the information from your short term memory.
To do that, I will wrap the following lines into a separate function. Once we have the function defined, I will use the streamlit cache decorator. With that in place, I will call the function to store the returned dataframe in a variable.
And that is all there is to it. Ok, now, back to the bar chart. The bar chart will be very similar to the one we have plotted before.
First, let me go ahead and group the sales by hour. Then, I will construct the bar chart. There is not much change compared to the previous one, except we are not using a horizontal bar chart this time.
As before, I will also update the layout and use a transparent background and turning off the grid lines. Additionally, I will set the tick mode to linear. This change will ensure that all the tick labels on the axis are displayed.
Once done, I can plot the chart. Ok, now that we know that is also working, I will place the two charts horizontally next to each other. So, let me delete the charts here and insert a left- and right column.
In the left column, I will display the hourly sales and use the entire container width. And in the right column, I will place the sales by-products. To finish up the report, I will apply some styling.
First, let me remove the hamburger menu icon, the 'made with streamlit' footer note and the colourful header. We can achieve this by injecting some custom CSS into our web app. In the CSS code, we will set the visibility of the main menu, footer and header to hidden.
After a quick refresh, we can see the changes in place, and I think this looks much cleaner now. The last tweak will be in regards to the overall streamlit appearance. Instead of the default streamlit colours, we could use a custom theme.
All you need to do is to navigate the root folder of your streamlit application. Here, you want to create a new folder called ". streamlit".
In this new folder, create a new file called 'config. toml'. Open this file with your preferred text editor and paste the following code here.
I will make sure to upload this project also to my Github profile. You will find a link to that repo in the description box. So you could just copy the code from there.
Within the theme, you can set different parameters, like the primary colour, background colour, and so. Ok, and after saving the file, we need to rerun our server to reflect the changes in our web app. So, back in the command prompt, I press Ctrl + C to stop the current session and rerun our application.
And now, you will see that streamlit changed our overall theme. Ok, guys, and that's it for this video. You could now also go ahead and quickly deploy your web app to the internet for free.
I have created already a tutorial to show you how to deploy your streamlit app. I will leave the link to that video in the info card. As always, if you have any questions, please let me know in the comments.
Thanks for watching and see you next time.