I Analyzed My Finance With Local LLMs

Thu Vu data analytics
GitHub repo 👉 https://github.com/thu-vu92/local-llms-analyse-finance Learn Python for AI Projects ...
Video Transcript:
As I get older, I realize money is not everything, but it's kind of almost everything. So every year or every other year, I download all my bank transactions and review my income and expenses. The other day I came across someone who made this income and expense breakdown, and I felt really inspired to do the same. Usually the trickiest part of the process is classifying the expenses from my bank transactions into appropriate categories. A lot of the time I just use manual labor or some low-tech ways to do that. This year I decided
to ask ChatGPT to crunch the numbers for me, and maybe tell me when I can retire, only to realize that I can't just upload my bank statements to the ChatGPT website. They contain sensitive information about places I've been to, shops I visited, how much I spent on buying secret items, and other personal data. Although using the OpenAI APIs may help, the data sent via the API is still stored by OpenAI for up to 30 days. So eventually I decided to download and run an open-source large language model locally on my
laptop, and best of all, they are free. In this project video, we do a few exciting things. First, we'll learn to install and run an LLM, for example Llama 2, locally on our laptop. Then we use the LLM to classify all the expenses in my bank statement into categories such as groceries, rent, travel, and so on. We then analyze the data in Python and create some visualizations to show the main insights. I shared all the code on GitHub, so you can check out the GitHub repo in the video description. This video is sponsored by
Coursera. Coursera is running a discount with $200 off the Coursera Plus annual subscription if you sign up. This is half the regular price for the whole year. With this subscription you get access to tons of courses and certificates in data analytics, including the Google Data Analytics and the Google Advanced Data Analytics certificates. These certificates teach you all the fundamentals you need as a beginner in data analytics, so check out the offer below. There are a few different ways to run a large language model locally on your laptop, which means you can run the
LLM without an internet connection and without a third-party service like an API service or a website like ChatGPT. As you can imagine, this is secure if you want to protect your personal data, and it's free. There are now a few different frameworks developed to help us run an open-source language model locally on our own device. Some popular frameworks are llama.cpp, GPT4All, and Ollama. So you might be wondering, why the heck do we even need these frameworks? Remember how large language models are trained: they are basically the result of taking a huge amount of internet
data and training a large neural network on it. The model that comes out is basically a zip file with a bunch of numbers that represent the weights of all the parameters in the neural network. This model file can be quite large, depending on how many parameters the model has. So frameworks like Ollama and llama.cpp basically try to do two things. The first is quantization, which reduces the memory footprint of the raw model weights. The second is making it more efficient for us, the consumers, to use the models.
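To make the idea of quantization concrete, here is a toy sketch in plain Python. It is an illustration only, not how Ollama or llama.cpp implement it: it assumes a simple symmetric 8-bit scheme with a single shared scale factor, and the weight values are made up.

```python
from array import array


def quantize_int8(weights):
    """Map float weights to 8-bit integers plus one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = array("b", (round(w / scale) for w in weights))
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the 8-bit integers."""
    return [q * scale for q in quantized]


weights = [0.12, -0.5, 0.33, 0.9, -0.77]  # made-up example values
as_float32 = array("f", weights)          # 4 bytes per weight
quantized, scale = quantize_int8(weights)  # 1 byte per weight
restored = dequantize(quantized, scale)

print(len(as_float32) * as_float32.itemsize, "bytes as float32")
print(len(quantized) * quantized.itemsize, "bytes as int8")
print([round(r, 3) for r in restored])
```

The restored values are close to the originals but not identical, which is the trade-off: a smaller file in exchange for a small loss of precision.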
If you have a Mac or Linux machine, I'd highly recommend installing Ollama. It's super simple, and I'll show you in a bit. If you have a Windows machine, you can also run Ollama through Docker Desktop. Okay, now let's go to the Ollama website. Here you can download Ollama, which is available for macOS and Linux, and Windows will be coming soon. Also have a quick look at the list of models that are available. Here we have Llama 2, Mistral, and a bunch of other code models, so there are a lot of options
here for you to try out. If we click on any of them, we can see the description, how to use the API, and the memory requirements, so how much RAM you need in your laptop in order to run these models. So let's download Ollama and install it. It's very straightforward, just like installing any app on your laptop. Once we've installed Ollama, we can start using it through our terminal. In order to install a language model locally through Ollama, we just need to run the command ollama pull and then specify the model
that you want to install. So for example, I will install Mistral again. It's pretty fast because I've already installed it last time, and you can see here the model is around 4 GB. In order to use a model through the terminal, we just need to do ollama run mistral, and here we can start typing our message or our prompt. So let's say hello, and the model replied with: hello there, how can I help you today? Oh, so that's a lot of things. Okay, so I can ask something a little bit stupid: what's 2
+ 2? Okay, it comes back with quite an elaborate answer. Now let's try another question. I want to see if it can actually do math properly. Let me ask what this times this is. Okay, wow, that's amazing. It even tries to teach me how to find the product of two large numbers step by step. Now let me check if this is actually the correct answer, so let me go to Google. Okay, so what I have here is around 426 billion. However, if I look at the result here, it's actually not correct: 45 million. You can
check this result using a calculator to make sure it's correct. Well, I've checked it, and it's not correct. I have to say I'm impressed, but in terms of basic arithmetic, I think large language models out of the box are probably not the best option. All right, the next thing we want to test, and it's very important for this project, is whether the Mistral model can properly classify all the different expenses in my bank statements into different categories. So I just try out a prompt here to see how it performs. Now
I ask: can you add an appropriate category to the following expenses? For example, Spotify as entertainment, Beta Boulders as sports. And here I just give a list of transactions as shown in my bank statements. So let's see what it comes up with. Okay, so it replies with three categories. It's roughly correct, I would say it's reasonable, but it missed one transaction here, which is Bistro Bar Amsterdam. It also doesn't really reply in the format that I wanted, which is the transaction together with the category, separated by a hyphen. So I
feel like Mistral doesn't really do the task as I expected. So let's try another model, which is Llama 2. Let's exit this model and do ollama run llama2. If you haven't installed Llama 2, you can do ollama pull llama2, and the Llama 2 model will be installed locally on your computer. Now we can start using the Llama 2 model. Let's ask the same question as before, and it does it pretty well. It gives me a list of the expenses together with the
categories. Although the first two are my own examples, it does give me the correct format for the answers, so each of these transactions has a category added next to it, separated by a hyphen. So I'm pretty happy with Llama 2, and it actually understands the task. However, if you keep asking these language models the same question multiple times, they may come back with different answers each time, so there's definitely a certain level of randomness in the responses. If you want to take it a step further and customize these language models
to your specific use case, you can do that by specifying a Modelfile. A Modelfile is basically the blueprint to create and share language models with Ollama. You can specify the base model you want to use, and you can also set parameters like the temperature for the model. Now let's go back to the terminal and exit this model. Let's clear the terminal, and I'll go ahead and create a Modelfile with nano. I'll name this Modelfile expense analyzer, and we'll go into the text editor. Okay, let's first specify the
base model as Llama 2, so FROM llama2. Next we set the temperature parameter to, let's say, 0.8. A temperature closer to one is more creative, and the lower the temperature, the more coherent and less creative the model behaves. Further, let's also specify the custom system message. My system prompt is quite basic: you are a financial assistant, you help classify expenses and income from bank transactions. Okay, let's save this file with Ctrl+X, yes, Enter. Now that we have the Modelfile set up, we can use it to create a
custom model. We can do this with ollama create: we specify the custom model name, then -f, and then the name of the Modelfile, which is expense analyzer. If we run this command, what happens under the hood is that Ollama parses this text file, the expense analyzer Modelfile, reading all the parameters and the custom message that we put in, and then customizes the different layers in the base model, which is Llama 2. Now we can start using this custom Llama 2 model that
we just created with ollama run expense analyzer. I also forgot to mention that we can look at the list of available models with ollama list, and you can see that we now have the expense analyzer Llama 2 model in our list. Interacting with these local LLMs through the terminal is fine, but I find it more convenient to interact with these models through the Python environment, and more specifically through a Jupyter notebook. Now let's create a project folder, and I'll move my bank transaction data inside
this folder, and I'll start up Visual Studio Code from this folder. In order to access the language models from Ollama with Python, we need to install the LangChain community library, so if you haven't done so, you can use pip install. Now we can access all the language models we have installed through Ollama by specifying the name of the model with this Ollama class. For example, I type: the first man on the moon was, dot dot dot. After a few seconds we get back the completion of this sentence. So that means our
model is up and running, which is a good sign. Now let's move on to reading the transaction data that we have and take a look at it. You can see that here in the name/description column we have all the transactions that we want to classify. We have the indicator of whether it is an expense or income, and we also have the amount in euros of these transactions. I have anonymized a lot of these transactions, so this is not my real income and real expenses. Now we need to find a way to insert all these
different expenses into our prompt, right? Let's get all the unique transactions from our data. You can see that this is quite a big list, and it may exceed the token limit of the large language model. So if we try to insert this huge list into our prompt, there's a risk that the model comes back with a completely nonsensical or incomplete answer, because the question already takes up too many tokens. After a lot of trial and error, I found that around 30 transactions gives the best response. So here you can see an example
response with 30 transactions. With this approach, we are going to create a for loop to loop through all the 300 transactions here in our bank statements, taking 30 transactions at a time. A handy way to handle this for loop is to get an index list. This index list is basically a sequence of indices from zero to the last item, hopping 30 items at a time. With this we can also conveniently handle the last group, which might hold fewer than 30 items.
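The index list described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the repo: the helper names batch_starts and iter_batches are hypothetical, and the transaction names are made up.

```python
def batch_starts(n_items, batch_size=30):
    """Start indices 0, 30, 60, ... covering n_items items."""
    return list(range(0, n_items, batch_size))


def iter_batches(items, batch_size=30):
    """Yield consecutive slices of at most batch_size items."""
    for start in batch_starts(len(items), batch_size):
        # Slicing past the end is safe, so the last batch is simply shorter.
        yield items[start:start + batch_size]


transactions = [f"Transaction {i}" for i in range(95)]  # made-up names

index_list = batch_starts(len(transactions))
print(index_list)                     # [0, 30, 60, 90]

batches = list(iter_batches(transactions))
print([len(b) for b in batches])      # [30, 30, 30, 5]
```

Because Python slices stop at the end of the list, the final, shorter group needs no special handling.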
Now let's initialize a data frame to store all the unique transactions and their corresponding categories. Within this for loop, I'll call a custom function that I created. This function takes the names of the transactions and the LLM that we are using. The only thing we need to do is properly parse the output from the language model into a format that we can work with. If we look at this output, we definitely only want to keep these lines, right? We don't want the rest, like: certainly, here's the
categories. But we don't need to worry about this for now, because at the end we can always remove all the rows where the category is None. The complication is that sometimes the language model might use a different format for the answer, so we have to use some kind of validator for the output to make sure it is actually in the format we want. For example, in the response we have the hyphen between the transaction and the category. One handy Python library for this is Pydantic. So the idea
is that after getting the response, we run it through a validation check. If the validation fails, we rerun the language model to get another response, until we get the right output. If you're interested in how to do this with Pydantic, you can check out the code in the GitHub repo linked in the description. Okay, now let me run this for loop, and I also print out the transaction names and the output from the model. Also notice that when I'm running this for loop, if I want to stop it, for some reason I really
cannot stop it. So if we want to interrupt this process, we can go to the terminal and do pkill ollama, and all the processes in the back end will be stopped. Okay, after a while, if everything goes well, we should get back the data frame with all the transactions and the categories that Llama 2 has assigned for us. I'll just save it as a CSV here. Now let's open the CSV file and quickly look at the output. Overall I'm quite happy with the categories that Llama 2 came up with. However, some categories may be
quite similar to each other but not completely the same, and I want to group them together, so I just quickly group these categories by hand. The last step is to clean up this data frame a little, and then we can merge it into the main transaction data frame using the transaction name. After this we should have a data frame with all the information: the transactions and the categories. So that was exciting, but the next part of the project is even more exciting: we create a personal finance dashboard.
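Before moving on to the dashboard, here is a rough sketch of the post-processing steps described above: parsing the "transaction - category" lines, re-asking the model when the check fails, grouping similar categories, and merging by transaction name. It is not the code from the repo. It swaps Pydantic and pandas for plain Python so it runs on its own, and the function names, the CATEGORY_GROUPS mapping, the sample data, and the fake_llm stand-in for the real model call are all hypothetical.

```python
def parse_response(text, expected_names):
    """Keep only 'transaction - category' lines for transactions we asked about."""
    pairs = {}
    for line in text.splitlines():
        name, sep, category = line.strip().rpartition(" - ")
        name, category = name.strip(), category.strip()
        if sep and name in expected_names and category:
            pairs[name] = category
    return pairs


def classify_with_retry(names, ask_llm, max_tries=3):
    """Re-ask the model until every transaction comes back with a category."""
    expected = set(names)
    for _ in range(max_tries):
        pairs = parse_response(ask_llm(names), expected)
        if set(pairs) == expected:  # validation check passed
            return pairs
    raise ValueError("no valid response after retries")


# Hand-made grouping of near-duplicate categories (hypothetical values).
CATEGORY_GROUPS = {"Food": "Groceries", "Supermarket": "Groceries"}


def merge_categories(transactions, pairs):
    """Attach a (grouped) category to each transaction, matched by name."""
    merged = []
    for row in transactions:
        category = pairs.get(row["name"])
        merged.append({**row, "category": CATEGORY_GROUPS.get(category, category)})
    return merged


# Stand-in for the real model: the first reply is in the wrong format.
replies = iter([
    "Sure! Here are the categories:",
    "Sure!\nSpotify - Entertainment\nCorner Shop - Supermarket",
])


def fake_llm(names):
    return next(replies)


transactions = [
    {"name": "Spotify", "amount": 9.99},
    {"name": "Corner Shop", "amount": 42.10},
]
names = [t["name"] for t in transactions]
pairs = classify_with_retry(names, fake_llm)
rows = merge_categories(transactions, pairs)
print(rows)
```

A transaction the model never categorizes would end up with a category of None, which matches the idea of dropping those rows at the end.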
Based on the transaction data that we just obtained, the idea for this dashboard is to show the income breakdown and the expense breakdown for both years, 2022 and 2023. At the bottom I also want to show how much I earn and how much I spend per month in each year. For creating interactive visualizations, I love using Plotly Express. For the creation of the dashboard we also use Panel. Panel is an awesome library for creating data dashboards very easily and very fast. Now let's read in our transaction data. For the
income, I don't have a lot of income sources, so I can just use the name of the income as the category. First we want to make the pie charts to show the income and expense breakdown. I create a function here that takes the data and selects the year and whether we are looking at expenses or income in the data set. Similarly, I also create a function to make bar charts that show the income or expense per month over the year. And you can
see that it's super nice with Plotly Express that you can hover over the different sections of the graph and see the data labels. Now we're almost done. We just need to combine and organize all these charts on our dashboard. So here I just create a Panel layout with tabs, which consists of two tabs. We have the 2022 tab, and for the 2023 tab we have exactly the same charts and graphs but for 2023 data. To show all this nicely on the dashboard, we will use a template from Panel called FastListTemplate. We can specify the
header of our dashboard. We have the sidebar, which holds some information, and you can also put in different elements or pictures, etc. In the main area we put the tabs that we just created. If we do template.show(), we can see that this is our dashboard. Here's my income and expense breakdown, and here is how much I earn and how much I spend per month in each year. Here you can see that I saved a little bit more in 2022 than in 2023, which is not really a good sign.
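The numbers behind those charts come from two simple aggregations: totals per category for the pie charts and totals per month for the bar charts. Here is a minimal plain-Python sketch of that logic, without pandas, Plotly, or Panel. The field names (date, kind, category, amount) and all the sample rows are assumptions made for illustration, not the columns or data from the video.

```python
from collections import defaultdict


def select(rows, year, kind):
    """Rows for one year and one kind ('Income' or 'Expense')."""
    return [r for r in rows if r["date"][:4] == str(year) and r["kind"] == kind]


def category_breakdown(rows, year, kind):
    """Total amount per category: the data behind a pie chart."""
    totals = defaultdict(float)
    for r in select(rows, year, kind):
        totals[r["category"]] += r["amount"]
    return dict(totals)


def monthly_totals(rows, year, kind):
    """Total amount per month (January first): the data behind a bar chart."""
    totals = [0.0] * 12
    for r in select(rows, year, kind):
        totals[int(r["date"][5:7]) - 1] += r["amount"]
    return totals


rows = [  # made-up sample data with ISO dates
    {"date": "2022-01-25", "kind": "Income", "category": "Salary", "amount": 3000.0},
    {"date": "2022-01-02", "kind": "Expense", "category": "Rent", "amount": 1200.0},
    {"date": "2022-02-10", "kind": "Expense", "category": "Groceries", "amount": 250.0},
    {"date": "2023-01-25", "kind": "Income", "category": "Salary", "amount": 3000.0},
    {"date": "2023-01-02", "kind": "Expense", "category": "Rent", "amount": 1300.0},
    {"date": "2023-01-15", "kind": "Expense", "category": "Groceries", "amount": 300.0},
]

for year in (2022, 2023):
    income = sum(monthly_totals(rows, year, "Income"))
    expense = sum(monthly_totals(rows, year, "Expense"))
    print(year, category_breakdown(rows, year, "Expense"), "saved:", income - expense)
```

Comparing income minus expenses per year is the same check as the savings comparison between 2022 and 2023 above.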
But if you're into personal finance and things like that, you may notice that this overview is probably not complete, because we also have assets. So if you transfer your money to an investment account, or if you pay the mortgage towards your house, then part of that money is also your asset and not only your expense. Although, as you can see, I can't retire anytime soon. I hope this inspires you to do your own projects and experiment with open-source language models. I think in the future it will be the norm
for us to be able to use and run large language models locally on laptops and other personal devices. If you enjoyed this project, also check out the other data science projects on my channel. Thank you for watching. Bye-bye!