Learn TensorFlow and Deep Learning fundamentals with Python (code-first introduction) Part 1/2

Daniel Bourke
Ready to learn the fundamentals of TensorFlow and deep learning with Python? Well, you’ve come to th...
Video Transcript:
Hello. Now, this video is already long enough, I mean ten hours if you watch part one, plus another four hours for part two (YouTube's max upload is 12 hours at the time of recording), so I won't keep you too long, but there are some things you should know before you dive into it. How I recommend going through this is, when we write the code... well, actually, first of all, if you're here to learn TensorFlow and deep learning, you've come to the right place. By the end of this two-part video series you'll have
written hundreds of lines of TensorFlow code, if you follow along with the coding videos, and you'll have a code-first introduction to some of the main concepts in deep learning. Now again, how I would recommend going through these videos is to have the YouTube window on one side (once we get to the code part you'll see it when it comes up; if you want to skip this intro, go to the timestamp below) and then on the other side of your screen have Google Colab or a Jupyter notebook. That's
where we're going to code. If you want all of the course materials, they're available on GitHub; there'll be links below for everything, by the way. If you need to ask a question, go to the Discussions tab on GitHub or leave a comment below. If you do have a question about anything to do with the video, whether it's on GitHub Discussions or in a YouTube comment, please leave a timestamp so that I can check where you're referring to, go to that point in the video and help you out from there. But without
any further ado, get ready to write lots of TensorFlow code. Oh, and PS, I forgot this: this is part of a full course on TensorFlow and deep learning that I teach on zero2mastery.io, so if you like what you're going through in this video or in part two and you want to sign up to the full version, which covers a lot more material, on the order of 20-plus more hours of TensorFlow code and other specific parts of deep learning, there'll be a link to sign up in the description below. This is for real this
time: enjoy. All right, all right, all right, are you ready? I hope you are, because we are about to learn deep learning with, wait for it, TensorFlow. Now we could go back, I want to watch that again: deep learning with TensorFlow. If you like that little animation, that's just a taste of what's to come, because we're going to have lots of fun learning deep learning with TensorFlow. Now, if you're here for that, you might be asking yourself the question: what is deep learning? Hmm. Well, for these types of questions, and you'll probably notice this
trend throughout the course, we've come to our friend Google. What is deep learning? There we go: "a type of machine learning based on artificial neural networks in which multiple layers of processing are used to extract progressively higher level features from data". Whoa, there's a lot going on there. You might be thinking, Daniel, why did we go straight to Google at the beginning of this course? Well, for questions like this one, where there's some sort of definition, this is what I want you to do: if you're not sure what it is, I want
you to search for it and do some research, because what we're going to be focused on in this course is getting hands-on as quickly as possible, so we're going to be writing lots of code. So let's come to my definition of machine learning for this course: machine learning is turning things (data) into numbers and finding patterns in those numbers. And you might be thinking again, now Daniel, I've signed up to this deep learning with TensorFlow course, why have you got machine learning there? Well, we'll answer that question in a second, but in terms of finding
patterns, how does this happen? Well, the computer does this part. How? Code and math. Now, we're going to be writing lots and lots of code to do this, more specifically deep learning code with TensorFlow. So let's have a look at machine learning versus deep learning. We have this fun little diagram that you might also find on Google. There we go: artificial intelligence, machine learning, deep learning, a very similar one here. So we have the broad field that is artificial intelligence, so trying to get a computer to think for
itself, and then we have machine learning, which comes under artificial intelligence (remember, the definition of machine learning from before was writing code for a computer to figure out patterns in data), and deep learning is usually considered a subfield of machine learning. So this is about all you need to know for the time being. I mean, if you want to do your own research on what deep learning is, go ahead and do that, but again, with this course we're going to be focused on getting hands-on writing deep learning code. So if we come here, what's the
difference between traditional programming and machine learning or deep learning programming? With traditional programming you might start with some inputs and code up a bunch of rules. In our case we've got our favorite Sicilian grandmother's chicken dish here; she's got this recipe, she knows it off by heart, and she's passed it on to you because you want to host a dinner party at your place and invite everyone over. So you might start with the inputs, which are the ingredients: we've got some vegetables here, we've got the chicken of course, and we've got
the rules, which are what we have to do: cut the vegetables, season the chicken, preheat the oven, cook the chicken for 30 minutes, add vegetables. And then if you've done all this correctly you'll get this beautiful output. But where machine learning algorithms usually differ is that you'll start with the inputs and the ideal output. So this is the major difference here; we're going to be very familiar with this concept of inputs and outputs by the end of the course. And so what happens with the machine learning algorithm is you'll show it examples of what the inputs
look like and what the ideal outputs look like, so this is a beautifully cooked chicken here from this input. So you'll start with this, and then the algorithm will, hopefully, figure out the ideal rules to get from this input to this output. All right, so that's machine learning in a nutshell. In the next video we're going to cover why use machine learning or deep learning. So now we've got a little conceptual overview of deep learning, very brief. The next question to answer is why would we want to use machine learning or deep
learning? So the good reason is: why not? I mean, you might have seen what machine learning is capable of, you might have heard of the power of deep learning, the power of artificial intelligence, and with all the problems we've got in the world, why don't we just use it? Maybe. But a better reason is for a complex problem. Say we're trying to teach a self-driving car to drive: can you think of all the rules you'd have to code? If we remember back to our Sicilian grandmother's famous chicken recipe, maybe you can code
up the rules for that, such as preheat the oven, cut up the vegetables, cook the chicken for 30 minutes. But in terms of a self-driving car, I mean, when you go for a drive, do you think about the rules in your head or do you just drive? You'd need rules for stop signs, for traffic lights, for what to do around another car; you can imagine how that problem gets out of hand pretty quickly. So for a complex problem, can you think of all the rules? Probably not. And so I found this
great comment on one of my YouTube videos, the 2020 machine learning roadmap, so be sure to check that out if you haven't. This is from yashawi: "I think you can use ML" (so machine learning; if you see ML throughout this course, it's machine learning) "for literally anything as long as you can convert it into numbers and program it to find patterns. Literally it could be anything, any input or output" (we've seen that again, input or output) "from the universe". Wow. An important piece to note here is: as long as you can
convert it into numbers. Now again, if you're reading this and you're like, Daniel, this doesn't make sense, I've never experienced deep learning before, don't worry, we're going to have plenty of practice throughout the course. But here's the exciting part: I think you can use ML for literally anything as long as you can convert it into numbers. I want you to keep that sentence in your head. However, when should you not use machine learning? If you can build a simple rule-based system (well, actually, maybe not very simple, say you're building a complex software product) that doesn't
require machine learning, do that. Where's this come from? Well, this is from a wise, a very wise, software engineer: it's actually rule one of Google's machine learning handbook. So this is our first external resource for the course. There's going to be a bunch of these, so don't worry, I'll link them wherever you need them. So this is the Rules of Machine Learning: Best Practices for ML Engineering. We're not going to go through this, but I'm going to link to it; if you want to read through it in your own time, feel free to do
so. This is a very exhaustive resource to get you up to scratch on how Google thinks about using machine learning. So we'll come back. So yeah, if you're thinking about using machine learning but you think you can code up a simple rule-based system, you should probably do the rule-based system rather than machine learning. So what is deep learning good for? Problems with long lists of rules: when the traditional approach fails, machine learning/deep learning (again, whenever you see machine learning/deep learning throughout this course, you can kind of think of
them as the same thing) may help. Continually changing environments: if the problem you're trying to solve is constantly changing, I mean the information you're dealing with is continually changing, the good thing about deep learning is that it can adapt to new scenarios, so if the problem you're working on changes, you can train another deep learning model on those changes. Discovering insights within large collections of data: imagine the problem of trying to take photos of food and identify what's in each photo. Can you imagine trying to hand-craft the rules for 101... so
say you were just starting out building an app called 101 dishes, so you've got maybe spaghetti, ramen, steak, eggs, kale, coffee. Can you imagine trying to hand-craft the rules for what 101 different kinds of food look like? I can't. Now, what is deep learning not good for, typically? Again, take this with a grain of salt, because as you get deeper (pardon the pun) into your deep learning journey, you'll see there are lots of ways, if you're stuck on a certain problem related to what we're about to see
on this slide, to circumvent that problem. When you need explainability: the patterns learned by a deep learning model (again, we haven't seen these yet, but you will get familiar with them) are typically uninterpretable by a human. So if a deep learning model is finding lots of patterns in numbers, which we'll see throughout this course, typically those patterns are quite hard to interpret once you look at them. When the traditional approach is a better option: remember, if you can accomplish what you need with a simple rule-based system, you should
probably do that. When errors are unacceptable: since the outputs of a deep learning model aren't always predictable, meaning they may contain errors, maybe deep learning isn't your best option; if you need a software system to do the same thing every single time, potentially a non-deep-learning-based system is what you're after. And when you don't have much data: deep learning models usually require a fairly large amount of data to produce great results. However, we'll see how to get great results without huge amounts of data. So as I said, "typically not good for": there are
ways to try and circumvent these points. So let's have a look again at machine learning versus deep learning. In terms of when to use machine learning versus deep learning, traditional machine learning algorithms have typically performed best on structured data. What I mean by structured data is data or information you can find in an Excel spreadsheet or a Google Sheet, so you've got rows and you've got columns. So this is clearly row 2, for example. We've got car sales: we've got a make, we've got a color, we've got odometer, doors, and we've
got a price. So maybe we want to build a deep learning model or a machine learning model to take these parameters here, which are often referred to as features (so columns are features: make, color, odometer, doors), and predict the price. Whereas deep learning typically performs better on unstructured data, such as natural language text, or, in the case where we wanted to build our Food 101 application, photos of food where we identify what's in the photo, or a Wikipedia article talking about deep learning. So this is an example of data where you could argue
that there is structure to it (I mean, in this picture you've got a shape here, you've got shapes there, and here you've got a sentence), but it's certainly nowhere near as structured as what you find in an Excel spreadsheet. And then again you could have something like voice, so sound waves. If we go here, we'll dig a little deeper into structured versus unstructured data: let's have a look at some of the most common algorithms you're going to come across in the machine learning and deep learning world. Of course, this slide only goes through just
the names of the algorithms, but we're going to be getting hands-on with building some of the most common deep learning algorithms, as you'll see coming up in a minute. So on the machine learning side of things you might have random forest, naive Bayes, nearest neighbor, support vector machine and many more, and since the advent of deep learning these algorithms are often referred to as shallow algorithms. Now, what that means is not too important for now; I just want you to start getting familiar with some of the terms you're going to hear in
the machine learning and deep learning world, so when they come up you're not going, wow, what does this mean? Now on the deep learning side of things we have neural networks, fully connected neural networks, convolutional neural networks, recurrent neural networks, and the transformer architecture, which is a fairly new one that's only come out in the last couple of years. For this course we're going to be focused on building these four types of neural networks with TensorFlow. These are kind of like the bedrock neural networks, the ones that the deep learning field over the
past decade or so has been built on top of. And again, if we're looking through the structured data versus unstructured data paradigm, the machine learning algorithms here are typically better performing on structured data, whereas neural-network-type architectures are typically better performing on unstructured data. However, depending on how you represent your problem, many algorithms can be used for both structured and unstructured data. Now we've covered the names of a few different types of neural network architectures, so the question in your head is probably: what exactly are
neural networks? So let's tackle that in the next video. All right, so in the last video we saw, or we heard, that neural networks are a common deep learning algorithm. So what exactly are neural networks? Remember, for these types of "what" questions or definition questions, I want you to get really familiar with searching: "what are neural networks", "explained neural networks", "neural network". Okay, Wikipedia. These are generally quite wordy definitions: "A neural network is a network or circuit of neurons, or in a modern sense, an artificial neural network, composed of artificial neurons or nodes."
Ah, so you might get diagrams like this. So that's just a one-sentence definition, but let's have a look at an overview of what neural networks are. We might start with some data here, whether it be images of food or some natural language text or some sound waves, like what's going into the microphone here. Then if we want to use that data with a neural network, we're going to have to somehow turn it into numbers. This is called a numerical encoding. There are many different names for it, but we're going to
treat this as a numerical encoding. And you might be wondering, what are these square brackets? Well, that's to represent that we've got this data here and we've turned it into a numerical encoding, which is often referred to as a tensor, but we'll see that later; we're getting ahead of ourselves here. And so before data gets used with a neural network it needs to be turned into numbers. Okay, remember our one-sentence definition of machine learning: machine learning algorithms are about turning data into numbers and finding patterns in those numbers. So this is our numbers here. Now
we might feed these numbers that represent our data into a neural network. If we come here, this is a simple neural network; this is a very similar diagram, but probably a little bit more colorful. And what is this neural network going to do? Well, it's going to learn a representation, or patterns, features, weights; these are other terms you might hear for the kind of representation it learns. And again, depending on what problem we're working on, whether it's recognizing images or discovering meaning from text or trying to turn a sound wave into text,
you're going to have to choose the appropriate neural network for your problem; we'll see how to do that in upcoming lessons. So, okay, we've got three steps so far: we've got data as inputs, we've turned that data into numbers, and we're going to feed that to our neural network, which is going to learn a representation, or find patterns, in these numbers. Hmm, what does it do then? Okay, then it's going to create some representation outputs. You might notice that these numbers have been transformed, and so this might not mean anything to us now.
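As a rough illustration of the "turn data into numbers" step described above, here is a minimal pure-Python sketch of a numerical encoding for text. The function names (`build_vocab`, `encode`) and the simple word-to-integer scheme are assumptions made for illustration, not something shown in the video; in the course itself the numbers would live in a TensorFlow tensor rather than a plain list.

```python
def build_vocab(sentences):
    """Map each unique lowercase word to an integer ID (0 is reserved for unknown words)."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def encode(sentence, vocab):
    """Turn a sentence into a list of integers using the vocabulary."""
    return [vocab.get(word, 0) for word in sentence.lower().split()]


# Hypothetical example data
sentences = ["deep learning is epic", "learning is fun"]
vocab = build_vocab(sentences)
encoded = [encode(s, vocab) for s in sentences]
print(encoded)  # [[1, 2, 3, 4], [2, 3, 5]]
```

The nested list printed at the end is the kind of square-bracketed numerical encoding the slide refers to: the same data, now in a form an algorithm can search for patterns.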
We're going to inspect this in a moment with TensorFlow code, but what these representation outputs are is basically the neural network going, you know what, I've taken in these numbers here, I've found the patterns, and these are the patterns that I found in a numerical representation form. Okay, all right, and then what do we do next? Well, it's up to us to take these representation outputs that our neural network has learned about our data and convert them into human-understandable outputs. So say for example in classifying images of food, we might have a photo of ramen here,
we have to turn that into numbers in some way, we have to feed it through our neural network and get our neural network to discover patterns in it, so it outputs a representation output, and then it's our job to write some code to convert these representation outputs so that, hopefully, if it's learned it right, it's going to output the label ramen. The same goes for this photo of spaghetti: we could pass that through and it goes to there. And maybe for this tweet, say we were building a system that was
to read a tweet, and say we wanted to know where natural disasters are going on in the world: can we convert this tweet into numbers, feed it through a specific type of neural network, convert it into some sort of representation output, and then label it as, say, a disaster-based tweet or not a disaster? And then finally, say we're building a smart speaker or something to recognize sound waves: how would we turn those sound waves into numbers, pass them through our neural network, get our neural network to discover patterns in those numbers, create a representation output
and then translate, or transcribe (I think transcribe is a better word there), that to "hey Siri, what's the weather today?" Whoa, there's a lot going on here, but again this is the whole premise of neural networks: we have some inputs, we turn those inputs into a numerical encoding, the neural network that we've chosen for our specific problem will learn a representation (referred to as patterns, features or weights) and will create some kind of representation output based on the patterns it's figured out, and then it's up to us to define what we want our outputs to
look like based on this. So let's look at the anatomy of a neural network. If we have it here, the first layer of a neural network is referred to as the input layer. When you see diagrams like this (this is actually a very common diagram of what a neural network is; if we go back to our "what are neural networks" search, there we go), we have inputs, something in the middle, and outputs. Remember from our previous slide, the data goes in at the input here. In this case this would be labeled with... these are
two hidden neurons; again, we're going to get very familiar with these terms as we keep going. And in the middle here there are hidden layers. You'll notice this is plural: usually there's one input layer (or, depending on what data you work with, you might have multiple input layers), while the hidden layers can be an arbitrary number of layers, so you could have one hidden layer or you could have 152 hidden layers. These hidden layers learn the patterns in the data. And then it has an output layer, which outputs the learned representation or prediction probabilities.
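To make the input layer, hidden layer and output layer a little more concrete, here is a minimal pure-Python sketch of numbers flowing through a tiny fully connected network. The layer sizes (two inputs, three hidden neurons, one output), the hand-picked weights and the choice of ReLU and sigmoid activations are all assumptions for illustration; a real network learns its weights from data, and in this course it would be built with TensorFlow rather than by hand.

```python
import math


def relu(x):
    """Pass positive values through, clip negatives to zero."""
    return max(0.0, x)


def sigmoid(x):
    """Squash any number into the range (0, 1), like a prediction probability."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: weighted sum of the inputs plus a bias, then an activation."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(total))
    return outputs


# Hypothetical, hand-picked weights (a trained network would learn these).
hidden_weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]  # 3 hidden neurons, 2 inputs each
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [[0.7, -0.5, 0.2]]  # 1 output neuron, 3 inputs
output_biases = [0.05]


def forward(inputs):
    """Input layer -> hidden layer -> output layer."""
    hidden = dense(inputs, hidden_weights, hidden_biases, relu)
    return dense(hidden, output_weights, output_biases, sigmoid)


prediction = forward([1.0, 2.0])
print(prediction)  # a single value between 0 and 1
```

Adding more hidden layers would just mean chaining more `dense` calls inside `forward`, which is all "deep" really refers to here.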
Again, we're going to get very hands-on with all these terms. I don't want you to get too flustered with the terms that I'm putting out here, because we're going to see all of these in action. In this case the input layer has two neurons, there are three hidden neurons, and there's one neuron in the output layer. And I want to put a little tidbit here: when I say "learns patterns in data", "patterns" is kind of an arbitrary term, because you're going to hear lots and lots of jargon and different terms for different
things in machine learning, and I'll try to use as many different terms as possible but also tie them back to a single term. So when you hear me saying patterns, you'll often hear different things like embedding or weights or feature representation or feature vectors, and these are often all referring to very similar things. Now when it comes to neural networks, we've seen the anatomy, but there are also a few different types of learning: supervised learning, semi-supervised learning, unsupervised learning and transfer learning. Whoa. So supervised learning often involves having data
and labels. In the case of identifying different images of food, the data would be the images of the food and the labels would be the labels of the food (ramen, spaghetti, coffee, steak, pizza) associated with each one of those images. Semi-supervised learning has some data, or actually could have as much data as supervised learning, but it only has some labels. So for example, maybe we had 10,000 images of food and we only had labels for 1,000 of those images, and we wanted to train a neural network on the images
which have labels, and then we use that neural network to try and label the other images of food, the ones we don't have labels for. And then with unsupervised learning you basically only have data and no labels. You pass your data to a machine learning model or a neural network or a deep learning model (again, very similar terms for very similar concepts) and you go, hey, I'm not sure what patterns are going to be within this data, but see what you can find anyway. And then transfer learning, which is actually a
very, very important concept; we're definitely going to get hands-on with this one. The beautiful thing about deep learning models is that when they learn patterns in some sort of dataset, those patterns can be used for another problem type. So transfer learning would be taking what one deep learning model has learned on some set of data and then using it on another problem with another set of data. In our food image classification problem, we may take what a machine learning model has learned on just pictures of the world, so general pictures,
and apply that to our specific images of food. Now in terms of this course, we're going to be writing lots of code to do supervised learning and transfer learning, but again, the premise of what we're doing (starting with inputs, turning them into numbers, and then having some sort of algorithm go through those numbers, find patterns and create outputs) can also be used for these other two types of learning. All right, so now we've heard a little bit about deep learning and we've heard about neural networks: what is deep learning actually used for? Now, what is deep
learning actually used for? Well, we'll return to this comment we saw before, because this is actually a beautiful comment, thank you very much yeshui (hopefully I'm saying that right): "I think you can use ML for literally anything as long as you can convert it into numbers and program it to find patterns. Literally it could be anything, any input or output" (again, lots of emphasis on input or output) "from the universe". And so I want you to keep this sentence in your mind, because by the time you finish this course you're going to look at
the world through a different lens. I mean, you're going to walk down the street and you're going to wonder, how could I represent this experience that I'm having in numbers and then program a machine learning algorithm or a deep learning algorithm to find patterns in those numbers? Seriously, once you learn the concepts of machine learning and deep learning, you start to look at almost everything through the lens of turning something into numbers and finding patterns in those numbers. Now let's have a look at some common deep learning use cases that you've probably experienced in your
day-to-day life. So here we have recommendation, translation, speech recognition, computer vision and natural language processing; again, emphasis on "some" here, because there are many more, but these are some of the main ones. So, recommendation: this is my YouTube home screen. We've got some programming videos, a video from my friend's YouTube channel, an interview with Peter Norvig, one of the heroes of artificial intelligence, some RuneScape videos, because that was a game I love to play, jiu-jitsu, some bodybuilding, oh my goodness, more hacking videos. So I can safely say YouTube has learned me pretty
well. Now this is based off, again, my history, or the things I've watched on YouTube. And so if YouTube has all of that information, it can go, you know what, Daniel, I've seen a hundred different people (or in the case of YouTube, I think there's actually over a billion users, so maybe there are a hundred thousand people out there like me) and it's like, you know what, you've watched programming videos, so we're going to show you this one, and you're interested in the movie Her (great movie, by the way), so we're
going to show you this soundtrack, based on the patterns that we found in everyone else's YouTube browsing history. And then in the case of translation, we might have some language here: this is English, the language that I speak, and if I wanted to speak Spanish (I'm not even going to pretend that I can pronounce that) I can type in my sentence here, "deep learning is epic", and then Google will run a deep learning algorithm to figure out the patterns in this English sentence and translate it into Spanish. So if you're a Spanish speaker out
there, you can tell me or send me a message about how accurate this translation actually is. In the case of speech recognition, we may have some sound waves, again like what I'm saying here, when I ask Siri something like "hey, who's the biggest big dog of them all?", and so deep learning would probably go through that sound wave and transcribe it into this sentence here. Computer vision: actually this image is a bit painful for me, because this is from some security footage from one of my neighbors. You see this car here? If
I had a computer vision algorithm running on this security camera, maybe I could have picked up this car or found the license plate, because this car had a trailer on the back, and the trailer came off as it was driving past my car on the street, hit my car and basically totaled the whole car. So now I need to find a new car, and this was the car that did it. If their security footage had a computer vision algorithm, maybe we could have found it. So I'm still
on the investigation for that, but that's one use case computer vision can be used for. So again, a deep learning algorithm may take the pixels of this image, find the patterns and go, you know what, I've seen a car that looks like this before, maybe it's a Toyota Hilux that you're looking for. Anyway, we'll leave that problem there before I get too upset; I really loved my car. And the other one is natural language processing. So maybe taking in an email, you've got your text here, so unstructured text: hey Daniel, this deep learning course
is incredible, oh thank you so much, I can't wait to use what I've learned. I hope you send me a lot of these types of emails; that is my actual email address, so feel free to send me whatever you want. And this is not spam, so this is something I want to see in my inbox. But if you're sending me messages like, hey Daniel, congratulations, you win (that is a big number; I'm not going to pretend that I can read that out loud without slowly deducing how many zeros there are), that's spam, so I
don't want this in my inbox; I want a natural language processing algorithm, a deep learning algorithm, to put that in my spam box. And so if we dive a bit deeper, you might see these referred to as sequence-to-sequence, or seq2seq for short, deep learning problems. Now, if sequence-to-sequence doesn't make much sense, just think about it like this: you have a sequence of words and you want to translate or transform that into another sequence. That's what you're trying to do with seq2seq problems. Same with here:
you might have a sequence of sound waves and you're trying to convert them into another sequence, of words. And this problem here is called classification or regression. Now classification is: is this email not spam or spam, is it one thing or another? There's also multi-class classification, but we'll get on to that in a second. And then regression: you might be wondering, hey Daniel, I've heard regression is more so for predicting a number. Yes; well, in this case, in our computer vision problem, our numbers might be the pixel coordinates of where the corner of this box
should be. So maybe we start with an image that doesn't have a box on it, and then our computer vision algorithm, or our object detection algorithm, looks over it and goes, you know what, I think the car is most likely in the box here, so maybe that's 50 pixels in along the x-axis and 60 pixels down along the y-axis, and so that's that corner, and then it does the same for each other corner. Now I want to show you another phenomenal use case for deep learning that only just recently happened. So
if we go here (now you could also just search "deep learning use cases" and you're going to get a whole bunch of different ones, there we go: automatic speech recognition, image recognition, natural language processing), but there's one I'm specifically looking for. Now this is DeepMind; "deep" stands for deep learning. This is a deep learning research company. Boom: AlphaFold, a solution to a 50-year-old grand challenge in biology. I'm not going to go too much into this, but this is possibly one of the biggest breakthroughs in AI powered by deep learning, as
well as... oh, do I want cookies? Fine, I'll have the cookies, as long as they're delicious. Now, do we have deep learning here? Here we go: "new deep learning architectures we've developed have driven changes in our methods for CASP14, enabling us to achieve unparalleled levels of accuracy". Again, I'm not going to go too deep into this, but I believe they've used a deep learning algorithm (this is actually brand new) to figure out how a protein should fold. Now again, I haven't read up on the research here, but
this is just the type of things that deep learning can be used for if you remember the comment we looked at before the youtube comment i think you can use ml for literally anything as long as you can convert it into numbers and program it to find patterns so that's what deepmind have done they've taken proteins turned them into some sort of numerical representation crafted a deep learning architecture to find patterns in those proteins so this just blows my mind as i said by the time you finish this course you're going to have a new
lens on the world so we've seen what deep learning can be used for how can we write these deep learning algorithms so this is where tensorflow oh that's worthy of seeing again tensorflow comes in so in the next video let's figure out what tensorflow is we've seen that deep learning can be used for a range of different problems including how deep learning architectures are now figuring out the protein folding problem in other words finding patterns in proteins i mean proteins are the building blocks of you and i so if proteins are the building blocks
of you and i and deep learning is finding out patterns in that how might we create such deep learning architectures and if you see deep learning architecture deep learning model is kind of another word that you can substitute in here for architecture so when you see deep learning architecture you can think deep learning model and when you think deep learning model you can think deep learning architecture so how might we build such a thing now this is kind of a segue to tensorflow tensorflow is going to help us build deep learning architectures or deep
learning models and now of course naturally you're probably asking well what is tensorflow let's figure this out tensorflow is an end-to-end machine learning platform you can write fast deep learning code in python which is the language we're going to be using slash other accessible languages such as javascript and when you write that fast deep learning code using tensorflow that code is able to run on a gpu which is a graphics processing unit or a tpu which is a tensor processing unit hence tensorflow we're going to see the significance of that in a moment with tensorflow you're also
able to access many pre-built deep learning models when we saw types of learning transfer learning is where you'll use a pre-built deep learning model in other words leveraging the patterns that one deep learning model has learned on another problem to your own problem we'll see how we can use tensorflow hub to do that which is a tensorflow resource now it's a whole stack this is a follow-on from this point here tensorflow allows you to pre-process data in other words turn that data into numbers model that data in other words build a neural network to find
patterns in that data and then deploy your model into your application so depending on what you're building you might want to take the deep learning model such as in the case of our security recognition system if we wanted to identify different types of cars we might want to take our computer vision model and deploy it into our security camera and use that to detect cars that have maybe crashed into other cars in my case again you can probably hear the sadness in my voice it makes me emotional when i talk about my my poor little
car and it was originally designed and used in-house by google one of the largest internet companies in the world right now that basically machine learning runs everything they do machine learning and deep learning and it's now open source which means we can leverage the tools that google use to work on our own problems so that's a bit of an idea of what tensorflow is but why tensorflow well allows us easy model building now again this is from the tensorflow website we're gonna get very familiar with this website robust ml production anywhere so train your models
deploy them in the cloud on-prem which means on your server if you have one locally in the browser or on device no matter what language you use powerful experimentation for research so if you found a new idea such as deep mind building a new deep learning architecture to discover patterns in protein folding you might want to rebuild their research using a tensorflow powered deep learning model and so we're going to be doing lots of this easy model building and we're also going to do a little bit of this throughout this course but that's for later
on in the meantime let's check out the tensorflow.org website so here we go an end to end open source machine learning platform there's a lot going on here so you've got some examples here you can see you can install it you can learn it there's the api probably the exercise for this lesson is to go to tensorflow.org and have a play around just get yourself familiar with the website if you've never looked at this website before there's probably almost too much here so it can be a little overwhelming but that's all right just follow your
curiosity and see what's there let's go back here now why else might we want to use tensorflow now this is françois chollet and this is a tweet that he put out recently with tools like colab we'll get familiar with colab keras keras is a part of tensorflow and tensorflow virtually anyone can solve in a day with no initial investment so that means basically for free problems that would have required an engineering team working for a quarter quarter is three months and twenty thousand dollars in 2014 which is so true so the resources we're going to
use throughout this course i mean just five years ago or just a little bit over five years ago would have cost a fairly large amount to even get up and set up and not only like the things that we're going to be doing ourselves would have required a whole team working for a couple of months that we're going to be getting up and running within in some cases a few lines of code so this is why it's really really exciting to be able to use tensorflow and these other resources here to work on deep learning
problems so now we mentioned before tensorflow allows you to run your numerical code on a gpu tpu now what exactly is a gpu slash tpu if we come here this is what if you google the photo of a gpu you'll get up this graphics processing unit these are actually nvidia cards this is an rtx 3080 and i believe this is a p100 so you don't need to worry too much about these you can install these in your own computer like these type these are usually built for servers say you have like a warehouse of different
computers this is often what you'll get when you connect to a cloud-hosted gpu but we're getting a little bit too ahead of ourselves the major thing you need to know about these graphics processing units is that they're very fast at crunching numbers so when i say we want to find patterns in numbers these type of computing chips are very fast at finding or doing numerical calculations in other words finding patterns in numbers the numbers that we've converted our data into and in the case of tensorflow it also allows you to use tpus which are tensor
processing units now this looks like a pretty advanced piece of hardware if we go here and what is a tpu again for these questions where you have what is something don't be afraid to look it up tensor processing unit whoa tensor processing unit is an ai accelerator application specific integrated circuit developed by google specifically for neural network machine learning particularly using google's own tensorflow software ah okay so this is what we're going to get familiar with and so if you think of a tpu if graphics processing units are fast at crunching numbers well a tpu
is probably even faster but we're going to see throughout this course how we can get access to these chips if you don't have i mean unless you're google you probably don't have this in your computer right now or in your bedroom or something like that you may not even have one of these but that's okay we're going to look at resources throughout the course that we can use to gain access to these fast computing chips for free now we've heard a lot about tensors or specifically tensorflow and naturally you might be asking what is a
tensor so let's look at that in the next video we've seen tensorflow discussed what it is and why we should use it but we haven't actually discussed half of its name i mean what exactly is a tensor great question now remember our slide for neural networks i'm going to give you a little hint well we kind of revealed this earlier so spoiler alert remember how we started with our inputs which is maybe some images maybe some natural language text maybe some sound waves and then we create a numerical encoding we pass that numerical encoding to
our neural network which learns patterns in that numerical encoding and then our neural network outputs some kind of representation outputs and then we convert those into something that we can understand now the secret here is that these are tensors whoa and so the most basic definition i can think of for a tensor is some way or some numerical way to represent information now what that information is i mean that's totally up to you but just think of it as if we wanted to encode these images into some kind of numerical form we're going
to be turning them into a tensor and we're going to see this in practice as we write tensorflow code we're going to pass that tensor to our neural network our neural network is then going to figure out different patterns in these numbers and then it's going to output another tensor which is the patterns that it's learned in our original numerical encoding and then we take this tensor the representation outputs from our neural network and convert them into something that we can understand so if we take away some of the excess parts this is the founding
principle of neural networks and tensors in general i mean this is where the name tensorflow comes from if we imagine our inputs our food image classification problem we turn it into a numerical encoding in other words a tensor we pass it to our neural network our neural network finds the patterns and then outputs those patterns that it's found and again this is another tensor into something that we can understand so if you imagine this is the flow this is where the flow in tensorflow comes from it starts here by creating a numerical encoding then it
flows through our neural network then our neural network flows it on again the output here and then we again another little flow here into something that we can understand so that's what you'll have to understand for the concept of tensors if you want a little bit of an extension i'll show you one of my favorite videos on what is a tensor this can actually be your homework is this video here so this is available on youtube what is a tensor so i want you to watch this video figure out dan fleisch i'm not quite sure
how to pronounce that name but dan also a great name watch dan's explanation of what a tensor is and then come back to the other dan me and then see if you can line these up remember a tensor is a way to represent some sort of data in numerical form so if we come back we've covered what tensorflow is we've covered what deep learning is we cover what neural networks can be used for what else are we going to cover we'll look at the next video we've covered a fair bit in this course already but
let's get specific what are we going to cover throughout this course well i have a great great tweet here from elon musk dosex machine learning so learning ml or machine learning deep learning from university and you get this small little brain here online courses such as this one you get some activation here from youtube whoa we're really starting to activate the brain here from articles look at that superpower and then from memes i mean this little figure here looks like they can control the whole universe so that's we're strictly going to be focused on learning
deep learning through memes no i'm kidding as fun as that is here's what we're going to cover now this is pretty broad but this is just like sort of a list of the topics we're going to look at tensorflow basics and fundamentals pre-processing data in other words getting it into tensors so turning it say from a picture into a tensor building and using pre-trained deep learning models actually going to be building our own a lot of them from scratch and then we're also going to be using pre-trained deep learning models fitting a model to the
data so in other words learning patterns so fitting the models that we build and even pre-trained deep learning models to the data we've pre-processed we're going to make predictions with our model so using the patterns that our deep learning models have used we're going to figure out how we can evaluate those model predictions so if we were building a food classification app and our deep learning model took a photo we took a photo of pizza and it identified it as steak and it did that a thousand times how can we evaluate our models predictions to
better understand them and figure out how we can improve them going forward we're going to see how we can save our model and load it such as if we trained a food classification model and we wanted to use it in our application we might train it somewhere and then save it somewhere else so that we can use it later on we're going to figure out how we can use a trained model to make predictions on custom data so the custom data part here is important because a lot of the time you'll practice training deep
learning and machine learning models on data sets that have already been created for you but the real test of where a machine learning or deep learning model whether it performs well or not is on data it has truly never seen before we'll get familiar with this as we go on and how we're going to go through all of this well we have the cook and we have the chemist chemist you can imagine is very exact everything has to be millimeter precise or milliliter precise if you're doing some sort of chemical experimentation and then you have
the cook maybe the cook is your sicilian grandmother and she's making this beautiful roast chicken dish that she's made a hundred times and of course she has the rules she knows it off by heart but this time she decides you know what i'm gonna sprinkle a little more seasoning in here i'm going to try a different set of vegetables instead of the traditional set of vegetables that i use so the cook is a little less exact and cook likes to experiment and try different things and in our case that's exactly how we're going to go
through all of this we're going to experiment in fact that's going to be basically our motto for the entire course is experiment experiment experiment we'll be cooking up lots of code so if we flip this in boom we're going to be building ourselves a tensorflow workflow we've seen this before this is what i want you to get really familiar with is that getting the data ready turning it into tensors then we're going to learn how to build or pick a pre-trained model to suit our problems using tensorflow and tensorflow hub we're going to fit the
model to the data and learn to make a prediction using our trained model so prediction is where our model takes some sort of sample and then makes prediction outputs that we can turn into something like if we fed an image of a bowl of ramen what does our model think that that image is of we're going to learn how to evaluate our model's predictions then we'll figure out how to improve our models through experimentation again experiment experiment experiment and then finally we're going to look at saving and reloading our trained models so with that being
said we know what we're going to cover how should i approach this course great question let's answer that in the next video we've been through a plethora of different what questions and oftentimes you'll see in online courses and online resources there's a lot of what going on like what we're going to be doing but not so often is there how you should approach this course so we talked about a lot of different concepts deep learning neural networks tensorflow tensors how should you approach trying to learn these things well here's some guidelines write code lots of
it follow along so you're gonna see me writing a lot of code and i'm 100 percent going to make a lot of mistakes so follow along if you can and make the mistakes with me so our first motto is if in doubt run the code if that doesn't make any sense if you haven't written much deep learning code before don't worry we're going to be writing lots of it this motto is going to be ringing in your head as much as you're going to look at the world through the different lens of trying to figure out
how machine learning and deep learning can be used with almost any problem in your life you're also going to be hearing this in your head as well explore and experiment so our second motto is experiment experiment experiment this is a great follow-on if in doubt run the code we want to try as many different things as possible because when we're running experiments i'm going to emphasize trying a lot of different small experiments why because that helps us even if the experiment fails it helps us figure out what doesn't work and in a lot of cases
figuring out what doesn't work is often just as helpful or even more helpful than figuring out what does because how rare in terms of anything you've worked on previously how rare is it that you got everything you needed to do correct the first time especially in deep learning i mean chances are for the problem we saw for deepmind figuring out how to write a deep learning architecture for protein folding i can only imagine how many experiments they would have done to set that up now motto number three is going to be visualize visualize
visualize and what i mean by this is if you're not sure of something in the code that we're writing recreate it in a way that you can understand it and oftentimes that will be visualizing it in a different way so say we turn some sort of data into tensors what do those tensors look like can we turn data into a tensor and then turn it from that tensor back into data those are the type of things we'll want to be looking at now again i cannot stress every point on this slide is worth writing down
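As one small way to act on the visualize motto, here is a sketch that turns some data into a tensor, looks at it, and turns it back into plain data. It assumes TensorFlow 2.x with eager execution (the Colab default); the sample values and the variable names `data`, `tensor` and `recovered` are made up for illustration.

```python
import tensorflow as tf

# Hypothetical sample data: a nested Python list (2 rows, 3 columns)
data = [[1, 2, 3], [4, 5, 6]]

# Data -> tensor
tensor = tf.constant(data)
print(tensor)        # shows the values, the shape and the dtype
print(tensor.shape)  # (2, 3)

# Tensor -> NumPy array -> plain Python list
recovered = tensor.numpy().tolist()
print(recovered == data)  # True if the round trip kept the values intact
```

Printing the tensor and then converting it back is a quick sanity check that nothing was lost or reshaped along the way.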
ask questions including the i'm going to put inverted commas here you can't see what i'm doing but i'm raising my fingers up towards this sky and i'm curling them in as if i'm saying the dumb ones because there is no such thing as a dumb question you'll get very smart if you ask lots of dumb questions so make sure if you don't understand something you can do exactly what we've done go here what is deep learning you can search that for yourself spend 10 minutes reading here you'll know a lot more than what you did
before you ask the question or ask in any of the resources that you have available so there's the discord chat and i'll put some more links in another resource of where else you can ask questions do the exercises so each of the code notebooks of concepts that we have have exercises attached to them so i want to emphasize again try them yourself write lots of code before looking at the solutions now this course doesn't cover everything of course if this is your first introduction to deep learning you're going to quickly realize how broad the field
is so if you want to learn more on something look it up i want you to become an expert at searching for things that spark your curiosity share your work if you want to learn something i find aside from doing it yourself or replicating it yourself the next best way or possibly even better is to teach someone else so if there's a concept that you've learned in this course and you want to really nail it you want to get better at it figure out how you can explain that to someone else so maybe you write
an article sharing how you've learned how to turn data into tensors write a deep learning model with tensorflow figure out patterns in those tensors and then turn those representation outputs from that neural network into some sort of human understandable output maybe you want to share that with others that's going to be a great way to really cement your knowledge avoid the following things overthinking the process so i can't stress enough this comes back to our number one motto if in doubt run the code again we're going to be learning so many different concepts you're probably
going to be overwhelmed at different points but don't worry everyone who's learned anything has gone through the trouble of basically creating new patterns in their brain to understand the new concept that they're learning so if you're overthinking the process you're going to hold yourself back and avoid the i can't learn it mentality that's you can learn it all right enough talking i love that fire let's do it again let's code we've got an overview of deep learning we've got an overview of tensorflow and tensors it's time to get hands-on this is very exciting so i'm
going to open up my web browser this is the tool we're going to be using throughout basically the entire course is google colab so if we come here to colab.research.google.com if you're unfamiliar with google colab check out the colab overview or if you just go to colab.research.google.com you can go to the examples tab here and you can open up a whole bunch of different tutorials to go through and learn about google colab if you're just starting out i'd check out overview of colaboratory features or the markdown guide but as i said if you want another
overview check out the overview video because we're going to be using colab a whole bunch throughout this course so let's get started i'm going to open up a new notebook here because we're going to i don't know if you can hear that but i'm rubbing my hands together because i'm so excited we're going to get hands-on with tensorflow so some of the most fundamental functions of tensorflow and let's give our notebook a title here so let's go zero zero tensorflow fundamentals the reason i'm doing zero zero at the start is because we're gonna by the
end of this course have probably about 10 or so of these notebooks so the zero zero at the front just lets us know what order they come in so tensorflow fundamentals and now let's put in here in this notebook we're going to cover some of the most fundamental concepts of tensors using tensorflow beautiful and we can put a little hashtag at the front and then what i did there was i did command m m and turned it into a markdown cell and then if i press shift and enter we get another code cell here beautiful and
so to enable us to write code we're going to have to connect here so just press the connect tab there and let's get a little outline of what we're actually going to do so more specifically this is what i do with most of my notebooks whenever i come to something before writing code you're going to hear me say write code as much as possible but i just like to give myself a little bit of an outline so i know the direction of where i'm heading so we're going to cover what do we have introduction to
tensors we might as well get some information from tensors if none of this makes sense don't worry we're going to code it up by hand manipulating tensors so changing the information that's stored within tensors and then we're going to go tensors and numpy if you've ever used numpy tensorflow you'll find has very very similar features to numpy using @tf.function which is a way to speed up your regular python functions in tensorflow because remember the whole premise of using tensorflow is so that we can use gpus with tensorflow or tpus to do
fast numerical computing that's what we're after here and at the end we're going to have a few exercises to try for yourself alrighty let's just jump straight in i'm going to put another little heading here introduction to tensors and then i'm going to press command m m to turn that into markdown now for another cell here i'm going to press command m b oh i didn't press escape while i had this cell highlighted so escape command m b will give me a new cell now i'm saying command however if you're on windows it's probably control because i'm
on a mac it's command for me so the first thing to do to use tensorflow is that we have to import tensorflow now i want you to try and follow along as much as you can with these videos right when i'm writing code i want you to be writing code by my side it's like we're partner coders here and if you can't keep up because i'm writing a little bit too fast that's all right i've had a lot of practice writing tensorflow code so again i've spelt tensorflow wrong maybe you catch my errors before
i do if you need to slow down the video or watch something again that's perfectly fine i'm probably going to need to slow down my code so i don't write as many typos so this is how we're going to import tensorflow tensorflow becomes the alias tf in python tf is basically universal you can put it as what you want but use tf trust me and then i'm going to use print tf.__version__ to check what version of tensorflow we're using if you're using google colab you should be using tensorflow 2 point
something i'm currently using 2.3.0 by the time you watch this video there may be a newer version of tensorflow but everything that we do in this fundamentals notebook should still work perfectly fine so look at that we've already got tensorflow ready to go now let's jump in and create our first tensor creating tensors with tf.constant now you're going to see there's a few different ways to create tensors but in general you actually probably won't create many tensors yourself this is because tensorflow has many built-in modules as we'll get familiar with throughout the course such
as tf.io and tf.data which are able to read in your data sources such as if you had a whole bunch of different images and automatically convert them into tensors as long as you write the code for that that is and then later on our neural network models will process these tensors for us but for now we want to get familiar with tensors themselves so let's start creating tensors and what i've written here is the word scalar which you might not be familiar with for now but that's all right we will learn what that is beautiful
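A minimal sketch of the two cells described so far: importing TensorFlow under the alias `tf`, checking the version, and creating a scalar with `tf.constant`. It assumes TensorFlow 2.x is installed (Colab ships with it). The value 7 is the one used in this walkthrough, and the version string you see will depend on your install.

```python
# Import TensorFlow under the conventional alias tf
import tensorflow as tf

# Check which version is installed (2.3.0 in the video, likely newer for you)
print(tf.__version__)

# Create a scalar: a tensor holding a single number
scalar = tf.constant(7)
print(scalar)  # tf.Tensor(7, shape=(), dtype=int32)
```

The empty `shape=()` and the `int32` dtype in the printed output are the details worth noticing here.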
so when we create a tensor with tf.constant we get this we return this it says tf.tensor it has a shape that's empty the data type is int32 and in numpy this value is seven so again numpy and tensorflow are quite intertwined now if we want to get the information or the docstring for what this function is in google colab you can press command shift space if you're on windows it may be control shift space whilst you're in the function and here we go creates a constant tensor from a tensor-like object or if we
just wanted to search tensorflow tf dot constant this is what i want you to get really familiar with is just searching up something like this and then going into the documentation tf.constant yes creates a tensor a constant tensor from a tensor-like object all right and then we go here example use cases so oftentimes if i'm not familiar with something in tensorflow i'll just search it up like this i'll find the example code here and then i'll just rewrite that code in the notebook but let's keep going let's keep writing all right how about we check
the number of dimensions of a tensor this is another important concept that we're going to get familiar with as we go along so ndim stands for number of dimensions you might be wondering daniel what's ndim so we've got scalar.ndim ndim number of dimensions so zero hmm number of dimensions what does that mean well let's keep going we'll come back to that in a second how about we create a vector so we've got a weird word here you might not be familiar with scalar now another word you're going to come across
quite often in the deep learning world is vector so let's see what they are again this is the structure of what i want to go through is write the code first and then go through the concept or concept code concept code concept code vector let's see how these two differ we have scalar which is a tf tensor has shape which is blank dtype int32 numpy seven okay now we created a vector and now we passed a python list to tf.constant which has tf tensor shape equals two comma maybe that's because there's
two two items or two objects within this list the data type is int32 and the numpy array version is 10 10 so the same as this and the dtype is int32 for the numpy array version okay all right now what about if we check the dimension of our vector so we go vector.ndim what do you think it'll be one okay so is that because this shape is empty maybe zero came from that and this shape has one element so maybe that dimension came from that well let's keep going create a matrix
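Before moving on to the matrix, here is a sketch of the scalar and vector checks just covered, assuming TensorFlow 2.x with eager execution. The values 7 and 10 10 are the ones used in this walkthrough.

```python
import tensorflow as tf

# A scalar (single number) and a vector (a list of numbers)
scalar = tf.constant(7)
vector = tf.constant([10, 10])

print(scalar.shape, scalar.ndim)  # () 0   -> empty shape, zero dimensions
print(vector.shape, vector.ndim)  # (2,) 1 -> one entry in shape, one dimension
print(vector)                     # values, shape and dtype together
```

In both cases `ndim` matches the number of entries in `shape`, which is the pattern this part of the walkthrough is hinting at.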
so a matrix has more than one dimension again if these terms are unfamiliar for now we're going to get very familiar as we go along so matrix we want to go 10 7 and then 7 10 two of my favorite numbers if you ever played poker my favorite hand is ten seven matrix let's see what this is okay tf tensor shape equals two two does that make sense two items here comma two items here okay yep that makes sense dtype is int32 so these are integers and then the numpy version is just the same
thing here with a data type of int32 wonderful now before i write the code i want you to have a think if we write matrix.ndim if the vector had a dimension or number of dimensions as one and if a scalar had number of dimensions as zero what do you think the matrix will be three two one boom two okay so i'm starting to see the number of elements in shape is starting to relate that's where i'm drawing my own pattern between shape and ndim all right now let's create another matrix but this time we're going
to try out how about we try and specify the data type so if we go here dtype that's what we want to try and use so let's try and do that create another matrix now another matrix equals how can we do this tf.constant what do we want to put in what are your favorite numbers i'm going to do my favorite numbers again 10 dot 7 oh you might have noticed something different about this one already so we have a dot after our number and if you're writing in python code what does a dot
typically mean after an integer and eight and nine that'll be nice and simple and then what did we say we're going to do we're going to use the dtype parameter float16 so we're going to specify the data type with the dtype parameter now the reason i'm getting you familiar with the dtype parameter is because by default if we create a tensor with tf.constant we get the data type int32 so this is known as 32-bit precision so if we go here 32-bit precision which is a quite i'm not going to dive too deep into that
the concept is the higher the number of precision uh the more exact these numbers are stored on your computer so if we do float16 it's a lower number than 32 so that means storing these numbers or storing this tensor on our computer's memory is going to take up less space the reason i'm getting you familiar with this is because if you get a data type error in the future of your tensors being the wrong data type you can manipulate them using the dtype parameter so let's have a look here another matrix what
does this come up as there we go boom tf tensor the shape is three by two okay so this is rows columns so we have three rows one two three and because we specified the data type as float16 we get dtype equals float16 which makes sense and our integers here have a dot after so they're actually floats and then the same goes for our numpy version of our tensor matrix so what do you think the number of dimensions is of another matrix so if we come back up here what was
our number of dimensions of our matrix here and our vector and our scalar we've just created another matrix if we were to go another_matrix.ndim what do you think the output would be based on the shape here let's find out boom it's still two so even though the shape here is different i'm really starting to notice a pattern here the total number of dimensions is how many elements are in the shape okay i've got that down how about we why don't we see how might
we increase this number of dimensions i know how we can do it let's create a tensor so typically with this nomenclature a scalar will have no dimensions a vector will have one dimension a matrix will have two dimensions and a tensor will have we'll see what that has let's go tensor i can't reveal all my tricks before we try and code them out remember if in doubt code it out now to create these you're going to have to get pretty fancy with where you put your commas so while i'm creating this if you're finding that
it's like daniel i can't really follow along that's okay just wait till i'm finished here watch what i'm doing and then you can pause the video replicate what i've got and then we'll run it together all right and then this one has to have two now you see how tedious it is creating tensors by hand i mean this is why it's so helpful for tensorflow to create tensors for us as we'll see in future modules hopefully i've got all these little square brackets and commas in the right place so if we look
at our tensor shift and enter what's it going to output oh hello we have how many more elements in the shape we have an extra one okay so there's three so one two three yes two so one two and then three again one two three ah beautiful now if we wanted to check the number of dimensions of this tensor what do you think it would be i think you might know this one tensor.ndim boom three dimensions wonderful now the important thing to note is that although we've given these different names tensor matrix vector scalar is
that throughout the entire course we're basically going to continually refer to all of these in tensorflow no matter if they're a three-dimensional tensor or if they're a two-dimensional tensor which is a matrix we're going to consistently refer to them as tensors so if you're getting confused when i say matrix or tensor chances are it's the same thing throughout this course and so how about we write down some little definitions of what we've created so far nice and simple so scalar a scalar is a single number a vector is a number with direction so e.g. wind
speed and direction a matrix is a two-dimensional array of numbers wonderful and a tensor is an n dimensional now when you see n so the n here for n dimensional stands for number whenever you see n it often refers to the number of something so n dimensional means it could be zero dimensional or it could be up to a thousand different dimensions or more array of numbers where n can be any number a zero dimensional tensor as we said before is a scalar and a one dimensional tensor is a vector beautiful so that's
what we've created so far we've created our first five have we created five five different tensors now in the next video let's start to we've created them with tf.constant let's have a look at another way of how we can create tensors this time with tf.Variable i'll see you then so we've created some of our very first tensors which is so exciting using tf.constant let's look at how we might do it with tf.Variable so oh again a little tip here if you want to create a new cell in colab you
can press code or text but most of the time i just do the shortcut so escape and command m b boom but i'm going to put a text cell here and i'm going to put a little heading here creating tensors with and then in backticks tf.Variable wonderful so let's see what tf.Variable is here we could jump straight into it but we want to have some practice looking at the documentation so tensorflow tf.Variable now the first probably i reckon at least 100 times you read something in the documentation especially if you're
new to tensorflow it probably won't make too much sense but once you've had a fair bit of hands-on practice you'll start to get used to it so tf.Variable look how many different parameters we have here far out in practice i don't really read all of these i go down and i like to see things being used so here we go tf.Variable 1. so that should be a float yes v.assign hmm what's that well we could keep going through that but let's get hands on again if you want
more about something from tensorflow look it up in the documentation comb through these code examples write them out for yourself and just check what the inputs and outputs are of the code that you're writing so let's go back tf.Variable how might we create the same tensor with tf.Variable as above this is going to be as you might have guessed by the name a changeable tensor hmm what's that tf.Variable 10 7 so remember up here we created a vector of 10 10 oh i said 10 10 i meant 10 7 there we go that's what we
want 10 7 now we're doing it here and let's have a look at this one unchangeable tensor tf.constant and then we want 10 7 wonderful we're going to write these out we're going to visualize them unchangeable tensor did i type out the variable names correctly yes i did wonderful okay now we get some different outputs so the first one here is our changeable tensor that we created with tf.Variable so tf.Variable wonderful shape okay 2 comma dtype int32 we've seen that before numpy okay now this is tf.Tensor which is the unchangeable
tensor that we created with tf.constant you see these are output in order here so this one first and then this one which is exactly what we've seen above before shape 2 dtype equals int32 and the numpy array is that wonderful what do you think might happen if we try to change one of the elements in our changeable tensor it's all right if you're not sure but let's try change one of the elements in our changeable tensor so changeable tensor zero or let's try indexing it first what comes up is numpy 10 so
we've got the first value here so the zero index so what if we wanted to set that we want to make a a tensor which is seven seven so we want to set that first number ten to be seven also what happens if we try to visualize it here okay we get an error resource variable object does not support item assignment okay what if we go back to this documentation so tf variable they've created one here similar to how we've created ours but they've only got one element which is a scalar then they've used v
dot assign all right and then it's changed from 1. to 2. so what if we tried that so let's come back here how about we try dot assign all right changeable tensor and the zeroth element dot assign and then we want to change it to seven just seven what do you think will come out here let's have a look oh wonderful there we go all right so how about if we tried to change a value in our tf.constant tensor or our unchangeable tensor now let's try change our unchangeable tensor let's just try
to do the same change make it simple unchangeable tensor and zeroth element first we'll index it as well just to check same output here numpy 10 and what if we go equals seven what happens there oh object does not support item assignment okay so similar error to what we got up here oh that's okay we know the fix we can just go assign seven oh get that bracket back up on there and then we'll check out the unchangeable tensor oh another error an attribute error tensorflow python framework ops eager tensor object has no attribute assign
so what do you think's going on here we've got a changeable tensor which is a tf.Variable and an unchangeable tensor which is a tf.constant our tf.Variable is changeable using the assign method whereas our unchangeable tensor which kind of makes sense given its name doesn't change even when we use assign hmm now you might be wondering why can we change some tensors and why can't we change others well it comes down to behind the scenes when we're writing neural network code we might want some tensors to have their values changed whereas we might want
some tensors for their values not to be changed so there's a variable tensor and there's a constant tensor now again we're going to reiterate the fact that a lot of the time you won't have to make the decision between using a variable tensor or a constant tensor the decision will be made for you behind the scenes when tensorflow creates tensors for your neural networks but we're going to get very hands-on with that and so speaking of tensorflow creating tensors for our neural networks let's have a look in the next video which is going to be
creating random tensors exciting i'll see you then so we've seen how to make tensors with tf.constant and tf.Variable let's see how we might create random tensors so actually i'll make that a size three heading and i just want to point out something here that i've added to the last little section of code that we ran this is going to be a trend throughout the course if there's a little tidbit that you should be aware of i'm going to use this key emoji and add a little note there
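As a recap of the tensors built so far, the following is a minimal sketch of the pattern between `shape`, `ndim` and `dtype`, followed by the `tf.Variable` versus `tf.constant` behaviour described above. Apart from the `[10, 7]` tensors, the specific values are placeholders chosen for illustration and are not necessarily the numbers typed in the video.

```python
import tensorflow as tf

# Placeholder values: only the shapes and dtypes matter for this recap.
scalar = tf.constant(7)
vector = tf.constant([10, 10])
matrix = tf.constant([[10, 7], [7, 10]])
another_matrix = tf.constant([[10., 7.], [3., 2.], [8., 9.]], dtype=tf.float16)
tensor = tf.constant([[[1, 2, 3], [4, 5, 6]],
                      [[7, 8, 9], [10, 11, 12]],
                      [[13, 14, 15], [16, 17, 18]]])

# ndim is the number of entries in shape.
for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix),
                ("another_matrix", another_matrix), ("tensor", tensor)]:
    print(name, t.shape, t.ndim, t.dtype.name)

changeable_tensor = tf.Variable([10, 7])
unchangeable_tensor = tf.constant([10, 7])

# Plain item assignment is not supported, even on a variable.
try:
    changeable_tensor[0] = 7
except TypeError as err:
    print("item assignment failed:", err)

# A variable can be updated in place with .assign().
changeable_tensor[0].assign(7)
print(changeable_tensor)

# A constant tensor has no .assign() at all.
try:
    unchangeable_tensor[0].assign(7)
except AttributeError as err:
    print("constant cannot be assigned:", err)
```

Running it yourself and reading the printed shapes is the quickest way to confirm that the length of `shape` always matches `ndim`.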
so rarely in practice will you need to decide whether to use tf.constant or tf.Variable to create tensors as tensorflow does this for you however if in doubt use tf.constant and change it later if you need so that's a little tidbit going forward so if you ever see the key emoji with something coming after it throughout this entire course throughout any of the notebooks that you use and the github and whatnot that's just a little tidbit to take note of for later but let's get hands on creating random tensors so random tensors are tensors
of some arbitrary size which contain random numbers you might be wondering why on earth would i ever want to create a tensor like this filled with random numbers and i have a great illustration to show you so let's go to our keynote slide here this is what neural networks use to initialize their weights in other words the patterns that they're trying to learn in our data so if we imagine this diagram here we have our inputs say we had pictures of food that we're trying to classify into
ramen or spaghetti we might turn that into tensors we'll see how to do this in a future project so this is our numerical encoding of our images then we might pass that numerical encoding to the input layer of our neural network then our neural network might learn representations which are also called patterns features or weights then output those representation outputs and then we convert them into something that we can understand rather than just tensors but where does a neural network get these representations from well in the beginning it's going to initialize itself initialize is another common word
you'll hear in deep learning initialize just means start with initialize its weights or the patterns it knows with random weights or random tensors so this would be a random tensor to begin with and then as it sees different examples of photos of ramen and spaghetti it's going to update its representation outputs in other words start to tweak these random weights and patterns to better suit our data if we go through again we're going to repeat this cycle with more and more examples so if we imagine the flow here we've got images we numerically encode them
our neural network initializes itself with random weights in other words a random tensor and then as we start to show it more examples of what photos of ramen and spaghetti look like it starts to tweak these random weights to be better adjusted to the patterns that are in our images so that our representation outputs line up better with our desired labeled outputs so maybe in the beginning it gets these images these two here wrong but then as it keeps going it starts to learn them and it starts to get them right like these two here
so let's go back that's a brief overview of how the neural network learns and that's the context of where you might use a random tensor so how might we create a random tensor so let's create two random but the same tensors now before we even write any code we go how to create random tensors with tensorflow tf.random.uniform wonderful so we could go through that or we could start to write the code seriously when i don't know something that is the type of search that i will put into google and look up as
much as i'm trying to teach you tensorflow itself i'm trying to teach you how to search for answers to solve your own problems because at the end of the day i can only show you so much but let's create random_1 which is going to be tf.random.Generator.from_seed 42 hmm what is a seed so set seed for reproducibility so we'll see what the random seed means in a minute if you've ever used numpy it's very similar to numpy's random seed so random_1 equals random_1.normal
i want you to guess if we put in the shape parameter here so if we come here go to this first one random uniform shape we've got random normal shape 3 2. what shape will our tensor be if we were to press shift and enter ah tf tensor shape three two now we've used normal here but this is uniform what's the difference so output random values from a uniform distribution what is a uniform distribution here we go wolfram.com a uniform distribution sometimes also known as a rectangular distribution is a distribution that has constant probability whoa
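The generator calls described above can be sketched as follows. This is a hedged illustration rather than the exact notebook cell: the variable names `normal_values`, `same_normal_values` and `uniform_values` are made up here, and the seed 42 is just the value used in the video.

```python
import tensorflow as tf

# A generator seeded with 42 draws from a normal (bell-shaped) distribution.
random_1 = tf.random.Generator.from_seed(42)
normal_values = random_1.normal(shape=(3, 2))

# A second generator with the same seed reproduces the same numbers.
random_2 = tf.random.Generator.from_seed(42)
same_normal_values = random_2.normal(shape=(3, 2))

# A uniform distribution instead gives every value in [0, 1) equal probability.
uniform_values = tf.random.Generator.from_seed(42).uniform(shape=(3, 2))

print(normal_values)
print(tf.reduce_all(normal_values == same_normal_values).numpy())
print(uniform_values)
```

The `shape` argument decides the shape of the output in both cases; only the distribution the numbers are drawn from differs.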
so often times when i read stuff like this it takes me a long time to grasp it if you're the same i want you to realize that that's not necessarily a problem it's because in practice when you're writing these random tensors again a lot of it is done behind the scenes for you what i want you to get familiar with is writing as much code as possible like we've done here and then starting to see what the outputs of that code are the more and more you do it the more familiar you will get with
it so let's create another one maybe we want we'll come up here random2 equals tf random generator from seed 42 random two equals random two dot normal shape equals three two you might be wondering daniel you've just looked up uniform and you haven't done anything with it so what is normal well great question so what is tensorflow random normal ah tf dot random.normal outputs random values from a normal distribution okay tf random normal shape all right what is a normal distribution see how we're sort of just pulling the thread here this is what i do
with any problem that i'm working on a function that represents the distribution of many random variables as a symmetric bell-shaped graph images okay so that's the normal distribution again you can look into this more if you want but we're going to be practicing writing more random tensors so are they equal random one random two and now we'll use this equality operator to compare the two random one equals random two are these equal let's have a look what do you think are they going to be equal oh yes we do we have random one remember these
come out in order and then random two and so they've come from a normal distribution the output of this equality operator is this last one here which is saying that the top left element is matching this top left element true the top right is matching here and here so that comes out true and then we get the same for the rest of the array beautiful so i want you to know that the random tensors here although these appear pretty random these numbers they're actually pseudo random numbers so they appear as random but they really aren't that's because we've
set the seed here setting the seed says something along the lines of um hey tensorflow create me some random numbers but flavor them with x where x is the seed so what do you think would happen if we changed the random seed to my favorite number which is seven i have a dog called seven she's beautiful what do you think will happen here ah we get some slightly different random outputs but they're still the same between random one and random two okay we've seen how to briefly create some random tensors and i mean if we
go how to create random tensors with tensorflow there's going to be a whole bunch more different ways but if we had our random tensors or if we had just non-random tensors what if we wanted to shuffle the order that the elements appear in here let's have a look at that in the next video in the last video we checked out how we can create some random tensors and we tied that back to the concept of when a neural network starts to learn if it wants to learn patterns in some sort of
data set it starts off with random patterns and then slowly adjusts them as it continually learns on more and more examples so if we come back here in this video let's see how we might uh shuffle the order of what should we call it elements in a tensor all right why would you want to shuffle the order of elements in a tensor let's go back to our example here so let's say we were working on a food image classification problem and we had 15 000 images of ramen and spaghetti let's keep it nice and simple
and the first 10 000 images were all of ramen and the last five thousand images in our folder like they're in order so the first ten thousand all of ramen and the last five thousand are all of spaghetti now this order could affect how our neural network learns so if it goes through these images in order it might start to adjust its random weights too much to the images of ramen because if it goes through ten thousand images of ramen it's like okay well i only have to learn what ramen looks like it doesn't
know that it has to also learn what spaghetti looks like until it goes to the last 5000 images so instead it might be a good idea to just mix up all the images in our folder so that they basically have no order at all so we might have ramen ramen ramen spaghetti spaghetti ramen spaghetti ramen spaghetti ramen and then the neural network can be fed different examples of different images and adjust its internal patterns or weights the random tensors it got initialized with to learn both types of images at the same time let's go back
to our colab notebook and let's see how we might shuffle a tensor so we go shuffle a tensor so this is valuable for when you want to shuffle your data so the inherent order doesn't affect learning now what we might do is create another tensor we'll get some practice creating tensors here not_shuffled equals tf.constant and then we might just create a very similar tensor to what we created above ten seven my favorite combination of numbers my favorite poker hand and three four there we go and two five why not there we go so a little
test here is if we did not_shuffled.ndim what do you think that's going to output well let's find out two okay why might that be not_shuffled ah because it has a shape of three two so there's two elements in the shape attribute beautiful so let's see how we'll shuffle this maybe if we just go how to shuffle a tensor in tensorflow tf.random.shuffle okay beautiful so randomly shuffles a tensor along its first dimension tf.random.shuffle value seed equals none okay there's an example output let's try it out so the value what's value
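A minimal sketch of the `tf.random.shuffle` call quoted from the documentation above, using the `not_shuffled` tensor just created. The second half is an assumption-labelled extra: the helper name `seeded_shuffle` is hypothetical, and it combines `tf.random.set_seed` (global seed) with the `seed` argument (operation seed), which the documentation says are used together when both are set. The value 42 is arbitrary.

```python
import tensorflow as tf

not_shuffled = tf.constant([[10, 7],
                            [3, 4],
                            [2, 5]])

# Shuffles along the first dimension only: rows move, values inside a row stay put.
shuffled = tf.random.shuffle(not_shuffled)
print(shuffled)


def seeded_shuffle(value):
    """Hypothetical helper: set the global seed, then shuffle with an operation seed."""
    tf.random.set_seed(42)                    # global-level seed
    return tf.random.shuffle(value, seed=42)  # operation-level seed


first = seeded_shuffle(not_shuffled)
second = seeded_shuffle(not_shuffled)
print(tf.reduce_all(first == second).numpy())
```

Without any seed the row order can differ from run to run, so the sketch deliberately does not state an expected order for `shuffled`.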
this is how i read the documentation so the arguments so the value is a tensor to be shuffled that's what we want and the seed could be a python integer used to create a random seed for the distribution see tf.random.set_seed for behavior okay let's get hands on with this so shuffle our non shuffled tensor so let's go tf.random.shuffle and then oh look at that the docstring appears for us if we wanted that on our own we could press command shift space so in colab sometimes the docstring
just automatically comes up if you're just chilling in here but we can press shift command space or if you're on windows shift control space there we go we just get the exact information from the tensorflow documentation how handy is that so let's pass it what does it take args value a tensor to be shuffled let's pass it in not shuffled okay what do you think is going to happen here well let's find out together boom okay look at that order there so if we get 10 7 three four two five and now the order's changed
up here okay so there's still two five still ten seven still three four okay what it's done is if we come back up here randomly shuffles a tensor along its first dimension hmm so if we look at not_shuffled it's got a shape of three by two so the first dimension has one two three elements so it shuffled it along its first dimension which means it shuffled it along this dimension here so the three one two three so it means that the elements in the second dimension which is 10 7 3 4 2 5 have stayed in
order but they've been shuffled around so 10 7 was at the top but now it's in the middle 3 4 was in the middle but now it's at the bottom and 2 5 was at the bottom but now it's at the top okay now what if we were to run this cell again i want you to take note actually let's copy this so copy that and i'm going to add another code cell here i'm going to run this cell here okay ten seven three four two five same as before but what if we run it
again ten seven two five three four okay different order notice how every time i run it it's a different order and i mean there's only three elements here so sometimes you're going to get the same now what if we were to put seed equals 42 because if we come here a python integer seed used to create random seed for the distribution what happens if we do that seed equals 42 seed equals 42 ah we get still different results so what if we did tf dot random dot set seed 42. okay ten seven three four two
five ten seven three four two five ten seven three four two five okay ten seven three four two five now we're getting the same order that's quite confusing so how might we figure this out if we come back here um seed we set that to 42 but we didn't get the same results so seed it says here used to create a random seed for the distribution see tf.random.set_seed for behavior okay so let's click on that here we're kind of pulling the thread of what our problem is we're trying to figure out how
to set random seeds for random operations in tensorflow so operations that rely on a random seed actually derive it from two seeds the global and operation level seeds hmm so i think this one might be the global level seed and this one might be the operation level seed because it's a part of an operation here that's my intuition now its interaction with operation level seeds is as follows okay so we've got four rules here i'm not going to read these out but what i want to do is put this here and i want
you to start exploring or start practicing so this is your homework for this lesson so this one can be here if you see this symbol here this means exercise so exercise we're going to go read through tensorflow documentation on random seed generation and practice writing five random tensors and shuffle them that's your takeaway for your homework for this lesson it's all right if you want to skip over that and just go straight to the next video you can but to really understand the concept of tensorflow creating tensors just go through and try create
maybe five different tensors here with tf.constant then shuffle them then try to get a reproducible shuffled order using tf.random.set_seed and a combination of seed and then just see what happens if in doubt run the code you can't break it so have a practice of that and i'll see you in the next video we'll see a few other ways to make tensors in the last video we went over the concept of shuffling the order of elements in our tensor and we tried it out a few times
but we ran into some problems using the global random seed and the operation level random seed but the main intuition behind shuffling the order of tensors is that if we were trying to build a food image classification neural network and we had 15 000 images of food 10 000 images of ramen 5 000 images of spaghetti and all our neural network saw was the ramen images first the patterns that it learns may be too aligned with what's in a ramen image rather than a spaghetti image so that's why we might want to shuffle the
order of our images so that the patterns that our neural network learns are attuned to both kinds of images throughout the entire training cycle so let's go back here we also left off geez last video was full on we left off with an exercise to read through the tensorflow documentation again the first few times or the first probably hundred times you read through the documentation you might not fully understand it but with a lot of practice you'll start to get the concepts and so what did you find did you try to create five random tensors
and shuffle them i'll tell you what i found after reading these rules number four i think is what relates most to us it says if both the global and the operation seed are set both seeds are used in conjunction to determine the random sequence okay so if we come back here if we set tf random dot shuffle we try to shuffle out so this is operation level seed so if we run that okay we get a different order each time but then if we set the global random seed what do you think is going to
happen so we'll go here global level random seed operation level random seed do we get the same order every time three four two five ten seven three four two five ten seven three four two five beautiful we could keep going but we're just going to trust that this rule is correct so we come here let's write down here it looks like if we want our tensors or maybe our shuffled tensors that's a better way of saying it our shuffled tensors to be in the same order we've got to use the global level random seed
as well as the operation level random seed and then we'll put in here if you want to put in a quote you can do this little arrow thing here rule four there we go so again now this might not make a lot of sense of why you want reproducible tensors but as you start to run more deep learning experiments you'll often find that because a neural network initializes itself with random patterns you could get different results every single time you run this experiment so to make reproducible experiments you probably want to shuffle your data
in a similar order initialize with a similar random pattern and then run through this experiment but for now just be aware that if you want to set the random seed tensorflow has a few rules that you have to adhere to in order to get reproducible randomness well that's a bit of a tongue twister so let's have a look at some other ways to make tensors other ways to make tensors the first one is if you're familiar with numpy you can do np.ones what's this np.ones let's have a read return a new array of
given shape and type filled with ones so for many numpy operations because numpy is one of the most prevalent numerical computing libraries out there tensorflow has similar operations so tf.ones creates a tensor with all elements set to one okay let's try that out so tf.ones what happens here let's check the docstring can we do that yeah there we go tf.ones it looks like we have to pass it a shape and then a data type so why don't we make it ten seven what happens wonderful create a tensor of all ones and
then we want to go maybe create a tensor of all zeros oh how do you think we might do that if we created a tensor of all ones how might we create a tensor of all zeros i want you to try and guess this before we even bother looking it up let's go tf.zeros and then what shape should we pass this so this is in square brackets shape you can also pass it in round brackets here how you'd create a tuple 3 4 put a space there wonderful okay so if
we wanted to create a tensor of all ones and all zeros we can do that using tf.ones or tf.zeros very similar to numpy and another thing speaking of numpy we might as well cover that while we're here you can also turn numpy arrays into tensors whoa into tensorflow tensors so remember the main difference between or well actually i don't think we've covered this but actually let's put this here we want text so i'm going to push command m m we'll go here turn numpy arrays into tensors and the main difference between numpy and tensorflow ah this
should be numpy arrays is that tensors can be run on a gpu for much faster numerical computing otherwise they're very similar so let's try this out we want to import numpy as np and then we want to create numpy_A this capitalization often you'll see a matrix created with a capital letter some matrix and then often you'll see a vector with a non capital so that's a little tidbit there too so we'll put that there and then we'll go capital for matrix or tensor non-capital for vector so we want to make a just
a simple numpy array with np.arange and the dtype is going to be int actually we'll make it np.int32 so this is going to create you might be able to guess so np.arange between 1 and 25 so create a numpy array from 1 up to but not including 25 let's have a look at that numpy_A now how do you think you might turn this into a tensor so if we come back to our tf.constant documentation it creates a constant tensor from a tensor-like object value okay what's value where's the docstring for that value a constant
value or list of output type d type hmm that doesn't really give us much but let's try if in doubt code it out a equals tf constant do you think we can just pass our numpy a this one here directly to tf constant i mean what do you think is going to happen if we do this oh wonderful there we go we've now just converted our numpy array so if we have a look here the output here is an array that's of type numpy then if we go here we've just passed it to tf constant
now it's into the form of a tensor how beautiful is that so anything we've got numpy we can convert to a tensor by just passing it to something like tf constant now what if we wanted to change the shape of this so this is a shape 24. how about if we wanted to make it into a three-dimensional tensor yeah that sounds fun so let's change it into two three four is that the right two times three times four now why do you think i checked that we'll see in a second hey look at that maybe
we'll make this b equals t f constant numpy a and then we'll go a b oh so there's a change shape one our first one a we've got the shape modifier there so this is one two so two is the first dimension and then three one two three and then four one two three four wonderful but the unmodified shape is just the same shape as our original numpy array which is just 24. so this is a vector and this would be a tensor because it's got more than one dimension now the important thing to know
about shape is that if you want to readjust the shape of a tensor or an array the numbers in the new shape here must multiply together to give the same number of elements as in the original tensor so what if we tried here what's going to happen it doesn't work eager execution of tf.constant with unsupported shape value has 24 elements shape is 2 3 5 with 30 elements so if we go here hmm that doesn't work we go there wonderful we need it to equal 24 so what about 3 times 8 let's try that 3 times 8. we're
kind of getting a little bit ahead of ourselves here with this reshape oh look at that beautiful so 3 times 8 equals 24 so that works so this is one two three rows and then one two three four five six seven eight eight elements per row beautiful okay so we've seen how we can turn numpy arrays into tensors and we've seen how to create tensors with all ones and all zeros it's probably time that we get a little bit more information from our tensors so let's make that in our next video so getting information
from tensors all right so have a play around create some numpy arrays turn them into tensorflow tensors and then try to adjust their shape so they fit into a different size and then you could even check the number of dimensions of a but otherwise give that a go and i'll see you in the next video we'll see how we can get more information from our tensors we finished up the last video checking out how numpy arrays which are a very common form of representing numerical data can be used or actually also converted into tensors
and the main difference between a numpy array and a tensorflow tensor is that although they may store the same information here so this array here numpy_A has the numbers 1 to 24 and this tensor here b has the numbers 1 to 24 because this one's in a tensor format it can run on a gpu which we'll see later on in the course is a lot faster than a non-gpu chip at finding patterns in numerical data that is let's have a look at how we'll get a bit more information from our tensors
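The NumPy-to-tensor conversion and the shape rule from this part of the video can be sketched roughly as follows. The names `numpy_A`, `A` and `B` follow the ones spoken in the video, while `C` and the exact way the array is built with `np.arange` are assumptions made for illustration.

```python
import numpy as np
import tensorflow as tf

# NumPy array holding the numbers 1 to 24 (construction via np.arange is an assumption)
numpy_A = np.arange(1, 25, dtype=np.int32)

# The shape argument arranges the same 24 elements differently: 2 * 3 * 4 = 24
A = tf.constant(numpy_A, shape=(2, 3, 4))

# Without a shape argument the tensor keeps the array's original shape of (24,)
B = tf.constant(numpy_A)

# 3 * 8 = 24 as well, so this also works (C is a hypothetical name)
C = tf.constant(numpy_A, shape=(3, 8))

print(A.shape, B.shape, C.shape)

# A shape whose dimensions do not multiply to 24 is rejected
reshape_failed = False
try:
    tf.constant(numpy_A, shape=(2, 3, 5))  # 2 * 3 * 5 = 30 elements
except (TypeError, ValueError) as err:
    reshape_failed = True
    print("could not build tensor:", err)
```

The only thing being checked is that the dimensions multiply to the number of elements in the original array.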
but before we do that i just want to show you a little tidbit so i took a break after this lecture and came back to my notebook and what happened was the runtime disconnected now you can't see it here but we've got a green tick saying we're connected but when you disconnect in colab you might find that oh it's still there but the memory has gone so for example if we did a and it told us that a didn't exist what i like to use is go to runtime and then run before what this
is going to do is run every cell before the current cell meaning if we've instantiated some variables such as numpy_A those variables are going to get reinstantiated and tensorflow is going to get re-imported etc we don't necessarily need to do it now or in my case because it seems that colab has remembered my variables but if you do run into that case go runtime run before and as you'll see all the cells here are going to run now they run pretty quickly we get the same error we got before but then we can
just go shift enter run keep running down to where we were and then all of the variables we've been working with will be reinstantiated wonderful so with that being said let's get back to our getting information from tensors there will be times where you want to get different attributes of your tensors let's have a look let's have the following vocabulary at least so we want shape these are the most important tensor terms axis or dimension and then you'll also want the size so if we come here to our keynote we'll have a look at
how to do those in code in a minute but if we go into the next slide these are some of the main tensor attributes here so we've got shape axis or dimension and then size and the meaning here the shape is the length or the number of elements of each dimension of a tensor and we can get that by going tensor dot shape the rank the number of tensor dimensions for example a scalar has rank 0 or 0 dimensions a vector has rank one or one dimension and a tensor has rank n or n dimensions
where n can be almost any number zero and above and then we can get that by going tensor.ndim we've seen that a few times axis or dimension so this is how you access a particular dimension of a tensor for example if we wanted to index on the first dimension tensors are zero indexed so the first dimension is actually zero one etcetera and then if we wanted to get all of the elements in the zeroth dimension and then the one axis we can use some indexing like this we'll see this in a minute and
then if we wanted the size attribute the size is the total number of items or elements in a tensor and we can access that by using tf.size so let's go back when dealing with tensors you probably want to be aware of the following attributes boom shape axis rank size i said that out of order but that doesn't really matter so probably the most important one here will be shape but we'll see that again in practice let's create a tensor create a rank 4 tensor now if i say rank 4 tensor if we come back
here what does that mean rank the number of tensor dimensions hmm so what might that look like we want four dimensions we come in here we want rank_4_tensor equals tf.zeros what does tf.zeros do this is a little test from before we just covered this one two three four five so remember this is probably the shape parameter here so rank four we've got one two three four dimensions here now let's have a look at this rank 4 tensor beautiful now this is all zeros so if we see here this is one
so this is the first axis this is one two then the next one is three so we've got within the one and two we've got one two three now this will take some getting used to the only reason i'm able to sort of deduce which is which of these elements here is because i've had a lot of practice writing different tensors now if we look at four within the three so one two three we've got 1 2 3 4 and now we've got 5 so within the 4 1 2 3 4 we've got 1 2
3 4 five now as we'll see later on i know i keep saying later on we're getting the fundamentals down pat here is you'll probably spend most of your time dealing with lining the shapes of your tensors up when you pass a tensor into a neural network it typically has to be in a certain shape and then the output tensor that comes out of a neural network so in other words the patterns the neural network has learned the tensor again has to be in a certain shape so it's good to be able to deduce
different elements of a tensor by its shape so if we wanted rank_4_tensor um let's get its zeroth element what does that look like which is this three tensor here so see how we've indexed onto this second shape here now the shape changes because we're getting the first we're getting this set of three here this set of three tensors and so the shape here is one two three one two three four one two three four five again it'll take quite a bit of practice to get used to that but that's just a brief overview
now what we might do is get all of the so the shape the rank axis or dimension and the size of our tensor so we just saw the zeroth axis let's go rank_4_tensor.shape you might be able to guess what this is already rank_4_tensor.ndim so the number of dimensions and then let's get the size of our tensor before i even type any of these you might be able to deduce what they actually are already so what's the shape probably going to be similar to what we set the
shape as the ndim remember that's the number of dimensions in our tensor and the size is what if we come back to our keynote the size is the total number of items in the tensor out of all of these i think size will probably be the one you use least often but i'm just putting it here for completeness so if we go there boom so we look these come in order again so the shape is two three four five yep the number of dimensions is four that makes sense because there's one two three four
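As a minimal sketch of the three attributes discussed here, using the `rank_4_tensor` name and the 2, 3, 4, 5 shape from the video (the `zeroth` variable is a hypothetical name added for illustration):

```python
import tensorflow as tf

# Rank 4 tensor of zeros: four dimensions of length 2, 3, 4 and 5
rank_4_tensor = tf.zeros(shape=[2, 3, 4, 5])

print(rank_4_tensor.shape)     # length of each dimension
print(rank_4_tensor.ndim)      # number of dimensions (the rank)
print(tf.size(rank_4_tensor))  # total number of elements, returned as a scalar tensor

# Indexing the zeroth element drops the first dimension
zeroth = rank_4_tensor[0]
print(zeroth.shape)
```

The size is simply the product of the shape, 2 * 3 * 4 * 5 = 120.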
beautiful and then the size is there's 120 elements now what might that be i'll give you a hint 2 times 3 times 4 times 5 what do you think this is going to equal 120 beautiful okay how about we print out some various attributes of our tensor make them a bit more readable so i want to show you get various attributes of our tensor this is what i often do to sort of create pretty i call this pretty print statements i mean there is a function called pretty print but um when i'm trying to figure
out or visualize my tensor data i typically set up a bunch of print statements like this especially when i'm inspecting the outputs of a neural network so print so i want to get the data type of every element boom that's what i want and then i want to get the number of dimensions which can be the rank and again typing this stuff out is tedious when you first do it but it does help visualize i mean we could put this into a function if we wanted to later on print and now how about the
shape of our tensor that's going to be rank 4 tensor dot shape i'm not sure why that disappeared beautiful and then now how many elements are along the zero axis so elements along the zero axis oh axis doesn't have that and we want to go rank four tensor zero dot shape oh no maybe we want shape zero that's what we want and then how about we want elements along the last axis so rank four tensor shape now negative one we can use for the last axis so instead of we have one two three four here
we could have used three because it's the fourth index but we're going to use negative one just to grab the last index that's a little trick for python indexing and then we want to go the total number of elements in our tensor which will be tf.size rank four is this going to auto complete rank_4_tensor that should be ready to go let's have a look what do we get okay data type of every element dtype is a float so okay that makes sense because we've got zero dot yeah number of dimensions rank
four yes that makes sense one two three four shape of tensor yes we've seen that before elements along zero axis so two elements there yeah beautiful elements along the last axis five that's correct wonderful and the total number of elements in our tensor is tf tensor 120 oh numpy 120. now that output can be a little confusing so i want to show you just a trick to convert it into a numpy integer so you can just go here with a tensor output for a lot of them you can just add dot numpy on the end
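The print statements described in this part of the video might look something like the sketch below. The label text is paraphrased rather than copied from the notebook, and `total` is a hypothetical helper variable.

```python
import tensorflow as tf

rank_4_tensor = tf.zeros(shape=[2, 3, 4, 5])

# A handful of labelled print statements for inspecting a tensor
print("Datatype of every element:", rank_4_tensor.dtype)
print("Number of dimensions (rank):", rank_4_tensor.ndim)
print("Shape of tensor:", rank_4_tensor.shape)
print("Elements along the 0 axis:", rank_4_tensor.shape[0])
print("Elements along the last axis:", rank_4_tensor.shape[-1])

# tf.size returns a scalar tensor; .numpy() pulls out the plain value
total = tf.size(rank_4_tensor)
print("Total number of elements (as a tensor):", total)
print("Total number of elements (as a NumPy value):", total.numpy())
```

Adding `.numpy()` is only a display convenience here; the underlying value is the same either way.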
and we'll watch the conversion here so this is there or actually we might do it with and without just so you can see the difference i'm just going to copy that line boom there we go so the total number of elements in our tensor that's it coming out as a tf tensor type but if we add the numpy on the end we get it in this single element which can be handy if you didn't want to have the tf tensor type you just want the element itself try using dot numpy at the end there all
right wonderful since we now know a fair few pieces of information and how to get them from our tensors well one very important thing to practice is being able to index on tensors so we'll do a little bit of practice of that in the next video so i'll see you then so in the last video we checked out how we can get various attributes from any tensor that we can create such as the data type the number of dimensions the shape and then various other attributes of it which can be helpful when we're trying to figure
out just what's going on with our tensors because i mean although we can see this one often times with neural networks you'll be dealing with tensors that you can't actually visualize meaning that they're so big that you won't be able to just look at them so you need to be able to to use code to find different information or different attributes about them so now let's have a look say we did have a tensor like this and we want to index it so the beautiful thing about tensors is that we don't want that i just
want to turn that into text so we'll go here indexing tensors so tensors can be indexed just like python lists so if you've ever done indexing on python lists we'll see how it relates here but if not that's okay because we're just going to get hands-on as we do that's the theme of this course so get the first two elements of each dimension in our tensor so if you have had experience with indexing python lists how do you think you might get the first two elements of each dimension of our rank 4 tensor so remember
we've got one two three four dimensions here and we want the first two elements in our case it's going to be zero for every single one of them but we want the first two elements of each dimension so give that a try if not i'm going to start writing the code to do it and we're going to have a look at it in a few seconds so this is how i would get or how you would index a tensor to get the first two elements of each dimension there we go so just like a python
list so if we make here some list equals one two three four and we want the first two elements of some list we go here use a colon we'll get 1 2 beautiful so now we can do the same but with our tensor we just have multiple dimensions we separate them by commas so let's have a look at what the output of this is wonderful because our tensor is all zeros we get two two two two and each value is zero zero zero zero zero zero zero zero zero zero but can you see the shape here
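A rough sketch of the list slicing and the matching tensor slicing described above, assuming the same `rank_4_tensor` of zeros as before (`first_two` is a hypothetical name):

```python
import tensorflow as tf

# Slicing a plain Python list: the first two items
some_list = [1, 2, 3, 4]
print(some_list[:2])

rank_4_tensor = tf.zeros(shape=[2, 3, 4, 5])

# Same idea for a tensor: one slice per dimension, separated by commas
first_two = rank_4_tensor[:2, :2, :2, :2]
print(first_two.shape)
print(first_two)
```

Each `:2` keeps at most the first two elements along its own dimension, which is why every dimension of the result has length two.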
this is the first the outer tensor this is the second one on the zeroth dimension and then we have two here one two and then we have two within these brackets so one two and then the final two the elements within the most inner brackets so zero zero again takes a lot of practice but just look at these diagrams and start to count the tensors yourself of tensors that you can visualize as a warning if the tensors are too large it'll probably separate them with three dots but we'll get to that in a bit how
about if we wanted to get the dimension or get each dimension from each index except for the final one now does that make sense so we want get each dimension from each index except for the final one oh sorry that doesn't make sense get the first element from each dimension from each index except for the final one okay how might we do that actually let's try it with our some list if we wanted the first element one beautiful now we want the first element from every dimension except for the final one so let's try that
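One way to sketch "the first element from each dimension except one" is shown below. The variable names are hypothetical, and the shapes in the comments follow from slicing a tensor of shape 2, 3, 4, 5.

```python
import tensorflow as tf

rank_4_tensor = tf.zeros(shape=[2, 3, 4, 5])

# First element of every dimension except the last (the last axis is left whole)
keep_last = rank_4_tensor[:1, :1, :1]              # shape (1, 1, 1, 5)

# A bare colon means "take the whole axis", so this is the same selection
keep_last_explicit = rank_4_tensor[:1, :1, :1, :]  # shape (1, 1, 1, 5)

# Moving the bare colon changes which axis is kept whole
keep_second_last = rank_4_tensor[:1, :1, :, :1]    # shape (1, 1, 4, 1)
keep_second = rank_4_tensor[:1, :, :1, :1]         # shape (1, 3, 1, 1)
keep_zeroth = rank_4_tensor[:, :1, :1, :1]         # shape (2, 1, 1, 1)

for t in (keep_last, keep_last_explicit, keep_second_last, keep_second, keep_zeroth):
    print(t.shape)
```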
rank_4_tensor and we want first element from each dimension except for the last one will that work we don't have the last one there let's try wonderful one one one five beautiful so that means all of our tensors kind of get flattened all the other shapes get flattened into one but we've still got five beautiful we can get the same thing if we add this colon here so this colon if it's there without a number it just means get the whole thing so there we go same output as for four one one one five
for the last dimension now if we wanted to change this up say we wanted the except for the second last one what do you think this is going to output if we go here rank 4 tensor if we want to get the second last axis and we go shape so we're getting the first element from each dimension of our tensor except for the second last one so what shape is this going to output if we run this line let's try wonderful so there we go one one four one again plenty of practice but that's how
we're going to learn things so there we go one three one one then we could do the same here with the zeroth axis beautiful okay we'll convert that back to that comment there makes sense now what else might we want to do if we've got our tensor of this shape sometimes what we might want to do is expand or reshape so in this case we've seen a little bit before and how to reshape a tensor but let's practice changing or adding an extra dimension to the end of a tensor so how about we create a
rank two tensor which has how many dimensions two dimensions so we can do that by rank two tensor you can pause the video here if you want and try to create your own rank 2 tensor without me coding now we want tf.constant my favorite numbers 10 7 and then how about 3 4 because 3 and 4 add to 7 and now let's have a look at our tensor rank two tensor we could even get some attributes about it rank_2_tensor.ndim again lots of practice here figuring out information about our tensors okay
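A minimal sketch of the rank 2 tensor built here, together with the negative indexing it supports (row first, then column). `rank_2_tensor` and `some_list` follow the video's names; `last_item_of_each_row` is a hypothetical name.

```python
import tensorflow as tf

# Rank 2 tensor: two rows, two columns
rank_2_tensor = tf.constant([[10, 7],
                             [3, 4]])
print(rank_2_tensor.shape, rank_2_tensor.ndim)

# Negative one picks the last item of a Python list...
some_list = [1, 2, 3, 4]
print(some_list[-1])

# ...and the last column of a tensor: every row, last item
last_item_of_each_row = rank_2_tensor[:, -1]
print(last_item_of_each_row)
```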
the shape two yep ndim that's exactly what we want now if we wanted to get the last item of each row of our rank 2 tensor how might we do that so again let's remind ourselves what it looks like rank 2 tensor let's remind ourselves what our python list looks like some list how do we get the last element of a list some list negative one so there we go so if we wanted the last item of each row of our rank two tensor well the row comes
first so row column so we might go hmm rank two tensor so the last item negative one does that make sense let's see does that get it so seven four beautiful that's what we want if we wanted to add an extra dimension onto this shape here so maybe we have two two one we can do so in two ways um and now this is helpful for later on when we're creating neural networks and we need to alter the size of our tensors so that their shapes line up so might not seem like it's very
helpful now but i just want to sort of plant the seed so that when we come across it in future videos it's not like whoa daniel we haven't covered this method before so let's go in here add an extra dimension to our rank 2 tensor so we want to turn our rank 2 tensor into a rank 3 tensor but keeping the exact same information that is stored in our rank 2 tensor so we're not going to change these numbers we're just going to change the shape that the numbers fit into and we can do that
by going here dot dot dot now this is a little bit of fancy notation you might not have seen this before we can go tf.newaxis let's see what this does oh we need to visualize our tensor don't we beautiful so you see what that did there that's now added a dimension on the end of one so let's look this up tf.newaxis tf tensor if we search again i'm just pressing command f you might press control f tf.newaxis there we go so one of eight it appears
eight times so insert another dimension does it have the actual definition here notes tf.newaxis is none as in numpy so if you're using numpy you probably use none in tensorflow it's tf.newaxis it doesn't actually say what it does sometimes where you'll find the documentation it can be quite hard to find the exact definition of this method but you can kind of deduce what it does by running through the examples as we see here so let's go in there it's better off just to remember if in doubt write the code try it
out so the other alternative to using this oh by the way these three dots means on every previous axis to this one so this means instead of just going um like this so see how we've got axis 0 axis 1 axis 2 in this case added axis on the very end we can just scrap these and go every axis before the last one include those because that's dot dot dot and add a new axis on the end so the other alternative to tf.newaxis is alternative to tf.newaxis you might also see tf.expand_dims so
that stands for expand dimensions and then we're going to pass it rank_2_tensor and then we want to expand it on the final axis so negative one means expand the final axis there we go we get the exact same output as this notation here it's just slightly different so if we want to go here tf.expand_dims there we go we've got some documentation for this one so we have the input we come down it says what is the input the input is a tensor and the axis is integer specifying the dimension index
at which to expand the shape of input given an input of d dimensions so if we come back here what do you think will happen if we go axis equals zero so look at our shape there what i might do is keep that there we're going to retype this out tf.expand_dims rank_2_tensor we want as much practice writing tensorflow code as possible right so expand the zero axis so let's try that boom so now instead of being on the end our extra dimension is at the front and then we
can even change it we want the extra dimension in the middle wonderful and then if we want to put it on the end we can go negative one beautiful so we might put that there expand the zero axis and so notice even when we run this and we check out rank two tensor the numbers inside stay the same 10 7 3 4 except the dimensions change so how those numbers are stored changes we've covered a fair bit in terms of our getting information from our tensors and also indexing our tensors now let's have a look
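The two ways of adding a dimension covered in this video can be sketched as below. `rank_2_tensor` follows the video; the other variable names are hypothetical.

```python
import tensorflow as tf

rank_2_tensor = tf.constant([[10, 7],
                             [3, 4]])

# "..." stands for every existing axis; tf.newaxis adds a new one after them
rank_3_tensor = rank_2_tensor[..., tf.newaxis]         # shape (2, 2, 1)

# tf.expand_dims does the same job, with the position chosen by axis
on_the_end = tf.expand_dims(rank_2_tensor, axis=-1)    # shape (2, 2, 1)
in_the_middle = tf.expand_dims(rank_2_tensor, axis=1)  # shape (2, 1, 2)
at_the_front = tf.expand_dims(rank_2_tensor, axis=0)   # shape (1, 2, 2)

for t in (rank_3_tensor, on_the_end, in_the_middle, at_the_front):
    print(t.shape)

# The original tensor keeps its shape and its values
print(rank_2_tensor)
```

In every case the four numbers are the same; only the shape they are arranged in changes.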
at how we might we've done we've actually done a little bit of manipulation here too but the next videos we might go through manipulating tensors in other words known as tensor operations so if we do have elements within our tensors how do we manipulate those and how do we combine them into different ways so go back through practice expanding the dimensions of tensors practice getting different attributes from tensors that you've created and then in the next video we'll see what kind of tensor operations we can work with if your data is stored in some sort
of tensor format finding patterns in those tensor formats often involves manipulating tensors now again when building models in tensorflow much of the pattern discovery within tensors is done for you however oftentimes that pattern discovery is through the extended use of a few basic operations so let's start getting into those otherwise known as tensor operations we'll start off with basic operations now do we have a tensor so you can add values to a tensor using the addition operator and by the way when i say basic operations i'm talking about the default python operator such as plus
minus multiplication and then if we want divide we can go like that right so we can add values to a tensor using the addition operator so let's create a tensor equals tf constant and we might make it 10 7 my favorite and 3 4. and now if we go here plus 10 what do you think will happen so this is our tensor here and if we just use the basic operation of tensor plus 10 wonderful we get 10 plus 10 it's 20. 7 plus 10 is 17 3 plus 10 is 13 and 4 plus 10
is 14. wonderful now you notice that the original tensor is unchanged this is important because when we're manipulating tensors we don't necessarily always want to change the underlying tensor the only way it'll change is if we go tensor equals tensor plus 10. we'll reset it now this is going to be plus 20 isn't it there we go oh come back here there we go tensor but if we got rid of this tensor we rerun this cell oh invalid syntax and then we rerun this one we still get the same values as what our original tensor
was and we might just press enter there so it's more succinct with how it comes out beautiful so of course other operations also work so multiplication also works so if we wanted to go tensor times ten same thing we get a hundred because ten times ten hundred seventy thirty forty etcetera and then subtraction if you want so tensor minus 10 can go into the negatives that's no problem tensors can store a whole range of different values in addition to the operators like the python operators we can also use an equivalent tensorflow function now what this means
is that if we do use the tensorflow function so that means that we're accessing the tensorflow library let me just show you because it's easier to talk about something if we can see it we can use the tensorflow built-in function too so tf.multiply um and we want see here we get the docstring here we get tf.math.multiply so we'll have a look at that in a moment but i just want to demonstrate it to you tensor 10 we get the same output here so if we go here let's look up tf.math.multiply so for
a lot of these tf.math functions so tf.math we can do the alias of just tf multiply so if we go to tf multiply it's just going to lead us to the same thing wonderful now does it say here what the benefits are it doesn't quite but what i'm going to get you to do is just trust me here when i say that if you have to do some sort of tensor operation like this again usually a lot of these will be done behind the scenes for you but if you want your code to be sped
up on a gpu typically it gets sped up when you use the tensorflow version of some sort of operator so if we wanted to do addition i mean this is going to be very fast because only a small tensor and we could do tensor plus 10 but if you want to have the advantages of tensorflow built in to your basic operations use the tf.math something tensor tensor to manipulate your tensors but here again even though we've used the tensorflow function here the original tensor is still unchanged so these are the basic operators create some
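The basic operations from this video, sketched with the same 10, 7, 3, 4 tensor. The result variable names are hypothetical, and the division line is included only because the video mentions the operator without demonstrating it.

```python
import tensorflow as tf

tensor = tf.constant([[10, 7],
                      [3, 4]])

# Python operators work element by element and return a new tensor
added = tensor + 10
multiplied = tensor * 10
subtracted = tensor - 10
divided = tensor / 10  # integer tensors come back as floats here

# The TensorFlow functions give the same results as the operators
added_tf = tf.add(tensor, 10)
multiplied_tf = tf.multiply(tensor, 10)  # alias of tf.math.multiply

print(added, multiplied, subtracted, divided, sep="\n")

# None of the above changes the original tensor
print(tensor)
```

Reassigning, as in `tensor = tensor + 10`, is the only thing that would change what the name `tensor` refers to.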
tensors have some practice at running the addition minus multiplication and division on them and also do the equivalents of tf.multiply or tf.addition or something like that is it tf.addition tf.math.addition or is it just add yeah add okay that's your little homework for this brief video here but in the next one we're going to have a look at a very important concept in neural networks and that is matrix multiplication we'll see how to do that with tensors so we left off in the last video figuring out how we can manipulate our tensors with the
basic operations and so hopefully you've tried out a few of these for yourself but now we're going to go on to matrix multiplication so in machine learning matrix multiplication is one of the most common tensor operations so the ones we've been through already these basic operations are often referred to as element wise operations so that means that if we go here for the addition for example we've got our tensor here which is 10 7 3 4 now element wise means go through one element at a time and just add ten so this element add ten this element add
ten this element add ten this element add ten but there's a few different types so matrix multiplication and the dot product are not necessarily element wise so if we go here what is matrix multiplication how to multiply matrices math is fun i really like this domain name actually let's have a look here a matrix is an array of numbers to multiply a matrix by a single number is easy so 2 times 4 equals 8 yes that's what we've done right so that's element wise so 2 times 0 equals 0 2 times 1 equals 2 2
times negative 9 equals negative 18 beautiful however what if we wanted to multiply a matrix by another matrix and remember because we're using tensorflow even though this is a matrix we're referring to them as tensors as well so i want you to read this as matrix and tensor interchangeable but to multiply a matrix by another matrix we need to do the dot product of rows and columns what does that mean let's see an example so the dot product is the first row in the first column okay so we've got a row here times this column
equals 58 okay the dot product is where we multiply matching members then sum them up so one two three times seven nine eleven okay one two three times seven nine eleven yes equals one times seven okay yeah two times nine yeah two times nine three times 11 okay i've got that that equals 58 then we do it again for the second column and that's 64 and then again again again a great website i like to visualize this is i believe matrixmultiplication.xyz we'll go to that so here beautiful we can even customize these
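The dot product arithmetic read out above (multiply matching members, then sum them up) can be checked with a short sketch. The row 1, 2, 3 and the column 7, 9, 11 come from the video; the second column 8, 10, 12 is an assumption chosen because it reproduces the 64 that is mentioned, and all variable names are hypothetical.

```python
import tensorflow as tf

row = tf.constant([1, 2, 3])
first_column = tf.constant([7, 9, 11])
second_column = tf.constant([8, 10, 12])  # assumed values, not read out in the video

# Multiply matching members, then sum them up
matching_products = row * first_column   # 1*7, 2*9, 3*11
first_entry = tf.reduce_sum(matching_products)

# tf.tensordot with axes=1 does the multiply-and-sum in one call
second_entry = tf.tensordot(row, second_column, axes=1)

print(matching_products.numpy(), first_entry.numpy(), second_entry.numpy())
```

Written out, the first entry is 1 * 7 + 2 * 9 + 3 * 11 = 7 + 18 + 33 = 58.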
so we'll go here one two seven my favorite number seven to one just to be fancy and then maybe three three three because i like three as well and then this can just stay the same so i might zoom in here so let's go multiply what happens whoa we get some so that tensor just came up there and jumped on top of that and then if we go through we go one times two two times six seven times one okay so that's two plus twelve plus seven hmm how did this get to twenty one two
plus 12 plus 7. hmm we might try a different set of numbers how did that work five maybe our matrix multiplication demo is busted multiply let's try 1 times 3 plus 12 plus 5. does that make sense so 3 12 plus 5. yeah that makes sense 15 so this is 3 plus 12 is 15 plus 5 is 20. there we go and if we step through there step through step through well we get the output there before we even start to write code go to this website start practicing around see what happens and after you've
done that for about a minute or so or maybe three minutes give yourself a few different goes let's go back to the notebook and see how we can do this exact operation in tensorflow code so i'll give yourself a second there but i'm going to start coding so you can pause the video now if you want try out that website and come back so let's see matrix multiplication in tensorflow what if we just google that matrix multiplication in tensorflow matmul there we go tf.linalg which is short for linear algebra and dot matmul multiplies matrix
a by matrix b producing a times b a b we get a whole bunch of documentation which is beautiful we could read through that or we could write the code ourselves so what do we have do we have a tensor print our tensor we can print it if we want print yes we do now tf.matmul now for a lot of these operations you can usually drop the intermediary here so linalg this is a little trick of tensorflow you can just do tf.matmul which is what i'm about to do here tf.matmul
tensor tensor can we matrix multiply those oh yes we can wonderful we get 121 98 42 37 beautiful now if you wanted to do the matrix multiplication python has an operator @ now you see how the outputs here actually are different to if we just did what if we did tensor times tensor we get different outputs here okay because if we have a look at our tensor again this is element wise so it's just gone 10 maybe we'll print out twice 10 times 10 is 100 7 times 7 is 49 3 times 3 is
9 4 times 4 is 16 whereas with matmul we've gone in this fancy little way here so maybe i've got an idea we recreate this tensor that could be a little exercise for you as well is to recreate this tensor in tensorflow and recreate this tensor in tensorflow so the same elements in the same shape this would be a what one two three a three by three tensor and this would be a one two three by two tensor so three by two and then run tf.matmul on left tensor right tensor see if you
get the same outputs as here but we come back here if you wanted to do matrix multiplication with a python operator we can do tensor at tensor so the at symbol in python actually is for matrix multiplication now both of these examples work because our tensor variable this is an important fact this is where we come in and we check the shapes attribute because our tensor dot shape attribute is two and two so both of them have the same shape of two and two but what if we wanted to do matrix multiplication on tensors which
had differing shapes now rather than just think about it let's try so create a tensor of maybe three two similar to this one here so this is three two remember rows come first one two three columns one two let's create a similar one create a three two tensor equals tf constant and we want one two keep this one nice and simple three four five six beautiful and then we want create another three two tensor and in this case it can be y tf constant and we'll just increase this one so that can go from 7
8 9 10 11 12 wonderful let's have a look at both of those x y okay so we've got two tensors here both of the same shape three by two and they've got slightly differing elements now what if we tried to matrix multiply them so let's try to matrix multiply tensors of same shape we could do it like that see what happens ah matrix size incompatible In[0] three two In[1] three two what if we try tf.matmul what happens here x y invalid matrix size incompatible same thing three two In[0] so this
is our first element here our zeroth element and this is our other element here our first element which is also three two this is where we have two rules for matrix multiplication do these websites tell us why do we do it this way actually this is a great example i'm going to put this as an external resource if we come back if you ever see this emoji in the course this is a resource so resource info and example of matrix multiplication so check that out if you ever see that little book emoji that's an
external resource that i definitely recommend now let's go back to here why doesn't this work i don't think this webpage explicitly states it but i'm going to tell you anyway you can look this up you'll find this uh wherever you find matrix multiplication is that there's a couple of rules that our tensors or matrices need to fulfill so come up here let's put in here there are two rules our tensors or matrices need to fulfill if we're going to matrix multiply them now rule one the inner dimensions must match and rule two the resulting matrix
has the shape of the inner dimensions so knowing these two rules i'm going to set you a challenge before the next video we're going to go through these two rules and see how we can fix our problem here but considering these two tensors so we've got if i say rule one is that the inner dimensions must match what are the inner dimensions here hmm maybe two and three they don't really match so how would we get the inner dimensions of these two matrices to match that's your challenge if you're not sure that's completely
fine but that's your challenge before the next video if you can get it done amazing work if not we're going to go through that in the next video so i'll see you there in the last video we kind of left off on an error and we said in this video that we're going to fix it so let's do that now we tried to matrix multiply two tensors of the same shape specifically x and y they're both of the shape three two we've got this little error here saying in[0] is three two and in[1]
is three two so matrix size incompatible so if we come up here back to the rules we set ourselves the inner dimensions must match and the resulting matrix has the same shape of the inner dimensions now if you had a go at fixing this and you got through it excellent work you may want to skip over this lecture however if you didn't make it that's completely fine matrix multiplication takes a little while to get your head around if we come over here to a slide matrix multiplication is also known as the dot product so
when we call the code tf.matmul and we have a matrix of some size here and a matrix of some size here remember a lot of the operations that we do can be on matrices or tensors of an arbitrary size but just for illustration purposes i've done this size here so here's our first rule so numbers on the inside must match and there's our second rule new size is the same as outside numbers okay so if we come back does that line up with what we've said here the inner dimensions must match the resulting matrix
has the same shape of the inner dimensions oh no of the outer dimensions there we go getting our own rules incorrect so we come back let's have a look at what's going on here we've got tf.matmul we've got one matrix or tensor here a b c d e f g h i which is the shape of three by three wonderful and this will run because the numbers on the inside match see these two threes here with the red background and this is of shape three two so one two three rows and two columns okay
so that matches and then we're gonna result in here if we did the dot product which is the same as matrix multiply in tensorflow we have a times j plus b times l plus c times n and then we follow through with that rule for every single cell in these two or every single element in these two tensors and now the new size is three by two one row two row three row two columns which is the blue numbers here okay wonderful and then if we swap these elements from letters to actual numbers so
the same shapes here we take five times four okay this is the same as a times j yes zero times six okay zero times six yes which is the same as b times l wonderful and then we have three times eight which is the same as c times n all right now again looking at these diagrams they're quite confusing for the first i would say 20 30 50 times you look at dot products and matrix multiplications it takes a lot of repetition to get used to but then if we have 5 times 4 equals 20 plus
0 times 6 equals 0 plus 3 times 8 equals 24. so we add these 3 up we get 44 in the top left repeat the same for each element and we get this resulting matrix here now the live demo we've already had a look at matrixmultiplication.xyz this is what's happening here we reset and multiply we take that we flip it on the top then we go here we get 20 and we get the next two elements the next two and the next one so this is a moving diagram of what's going on here so
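The two rules and the worked top-left element from the slide can be sketched in code. This is only an illustration: the first row of `a` (5, 0, 3) and the first column of `b` (4, 6, 8) are the numbers read out above, while every other value in the two matrices is a made-up placeholder, since the transcript does not give the rest of the slide.

```python
import tensorflow as tf

# Only the first row of `a` (5, 0, 3) and the first column of `b` (4, 6, 8)
# come from the slide; all other values are made-up placeholders.
a = tf.constant([[5, 0, 3],
                 [1, 2, 3],
                 [4, 5, 6]])  # shape (3, 3)
b = tf.constant([[4, 1],
                 [6, 2],
                 [8, 3]])     # shape (3, 2)

# Rule 1: the inner dimensions must match (3 and 3 here).
assert a.shape[1] == b.shape[0]

result = tf.matmul(a, b)

# Rule 2: the result takes the shape of the outer dimensions, (3, 2).
print(result.shape)

# Top-left element: 5*4 + 0*6 + 3*8
top_left = int(result[0, 0])
print(top_left)  # 44
```

Each remaining element is built the same way: one row of `a` multiplied element by element with one column of `b`, then summed.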
let's go back to our code let's fix it up let's make it work so knowing our two rules number one that the inner dimensions must match and that the resulting matrix is the same shape as the outer dimensions how might we get this to work well we're either going to have to change the shape of this matrix or change the shape of this matrix again i'm using the term matrix and tensor interchangeably here or we're going to just have to create an entirely new tensor with the same shapes on the inner axes so let's see
it in action let's change the shape of y so we can change the shape using tf.reshape y and then we enter the shape that we'd like in our case it is currently 3 2 so maybe we change it to 2 3 and let's see what happens oh wonderful let's see the original y beautiful so we can see that the nine has come up to here in this one and we've now got 10 11 12. so this has got two rows and three columns okay so maybe we go try to multiply x by reshaped
y so x and we might have to put matrix multiplying here so we know what type of multiplication we're doing x @ tf.reshape y and then shape 2 3. now you see what's happening here so if we remind ourselves what shape x is x is shape 3 2 let's pay attention only to the shapes tf.reshape y shape equals 2 3 .shape paying attention to our rule that the shapes of the inner dimensions so this tensor multiplied by this tensor the inner dimensions must match does that rule get fulfilled to me
it does because we've got an inner dimension of two here and an inner dimension of two here let's test it out beautiful it looks like that worked so we come here we want to make a new cell what if we tried to use tf.matmul on this so if we go tf.matmul x tf.reshape and then we want y shape equals 2 3. this is going to work as well beautiful now we get the same outputs there now hold on now do you think if we were to reverse this it would work as well
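Here is a minimal sketch of the fix just described, using the same `x` and `y` values as the video. The names `y_reshaped`, `with_operator` and `with_matmul` are only illustrative, and catching `ValueError` alongside `tf.errors.InvalidArgumentError` is a cautious assumption in case the error type differs between TensorFlow versions.

```python
import tensorflow as tf

x = tf.constant([[1, 2], [3, 4], [5, 6]])     # shape (3, 2)
y = tf.constant([[7, 8], [9, 10], [11, 12]])  # shape (3, 2)

# (3, 2) @ (3, 2) fails: the inner dimensions 2 and 3 don't match.
try:
    tf.matmul(x, y)
except (tf.errors.InvalidArgumentError, ValueError) as error:
    print("matmul failed:", type(error).__name__)

# Reshape y to (2, 3) so the inner dimensions line up: (3, 2) @ (2, 3).
y_reshaped = tf.reshape(y, shape=(2, 3))  # [[7, 8, 9], [10, 11, 12]]

with_operator = x @ y_reshaped           # Python's matrix multiply operator
with_matmul = tf.matmul(x, y_reshaped)   # same operation via tf.matmul

print(with_matmul)
# [[ 27  30  33]
#  [ 61  68  75]
#  [ 95 106 117]]
```

Both spellings give the same (3, 3) result, which matches the "same outputs" observation above.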
now i mean we reshape x to be a different shape instead of reshaping y if in doubt code it out so tf.reshape x we're going to change that to be 2 3 this time and we're not gonna change the shape of y let's see what happens ah what happened there so let's see we'll just put a little note here try to change the shape of x instead of y so you see here how this one has the shape 3 3 but this one has the shape 2 2 that's because in this operation
we multiply two tensors with the shape three two and two three so the resulting tensor finishes with the dimensions of the outer dimensions whereas in this case if we go tf.reshape x shape equals 2 3 and y.shape we also want the shape here this is our other rule of matrix multiplication because we reshaped x the resulting matrix here ends up in a shape of 2 2 because it becomes the outer dimensions so see here back to our rule numbers on the inside must match new size is the same as outside numbers okay
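To make the "outer dimensions" rule concrete, here is a small sketch. The helper `matmul_output_shape` is hypothetical (it is not part of TensorFlow); it just encodes the two rules for 2D shapes so the prediction can be compared with what `tf.matmul` actually returns.

```python
import tensorflow as tf


def matmul_output_shape(a_shape, b_shape):
    """Hypothetical helper: predict the shape of a 2D matmul, or raise."""
    inner_a, inner_b = a_shape[1], b_shape[0]
    if inner_a != inner_b:  # rule 1: inner dimensions must match
        raise ValueError(f"inner dimensions differ: {inner_a} vs {inner_b}")
    return (a_shape[0], b_shape[1])  # rule 2: result takes the outer dimensions


x = tf.constant([[1, 2], [3, 4], [5, 6]])     # shape (3, 2)
y = tf.constant([[7, 8], [9, 10], [11, 12]])  # shape (3, 2)

# Reshape y: (3, 2) @ (2, 3) gives (3, 3).
reshaped_y_result = tf.matmul(x, tf.reshape(y, shape=(2, 3)))
# Reshape x: (2, 3) @ (3, 2) gives (2, 2).
reshaped_x_result = tf.matmul(tf.reshape(x, shape=(2, 3)), y)

print(matmul_output_shape((3, 2), (2, 3)), reshaped_y_result.shape)
print(matmul_output_shape((2, 3), (3, 2)), reshaped_x_result.shape)
print(reshaped_x_result)
# [[ 58  64]
#  [139 154]]
```

The same twelve numbers go in both times, yet the outputs differ in shape and in values, which is the point made next.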
so in both these cases depending on which tensor we reshaped the matrix multiplication works but the resulting output is different that's a very important point when you're dealing with matrix multiplication or dot products it really depends on which tensor you manipulate in terms of what your output would be so see here how different these two outputs are again we're multiplying the same numbers but just in different shapes so that's an important concept to be aware of and now we can do the same thing with transpose as with reshape so let's have a look this is
another important tensor transformation so we can do the same with transpose however as you'll see in a second transpose is slightly different to what reshape is so tf.transpose x and if we do tf.reshape x shape equals 2 3. so this is the transposed tensor they're the same shape as reshaping it however you'll notice with the transpose is that well let's get x the first one as well it starts off as one two three four five six then the transpose is one three five hm the odd ones are up the first row and then two four
six and then the reshape is one two three four five six hmm so the difference between transpose and reshape is that transpose flips the axes whereas reshape just shuffles the tensor around into the shape that you want so if we go here tensorflow transpose tf.transpose here we go transposes a where a is a tensor wow that's a great description permutes the dimensions according to the value of perm hm the returned tensor's dimension i will correspond to the input dimension perm i if perm is not given it is set to n minus 1 to
0 where n is the rank of the input tensor hence by default this operation performs a regular matrix transpose on 2d input tensors ok so we see here 1 2 3 4 5 6 transpose now the shape is 3 2. again we could try a bunch of different examples but let's go back and write some different code here so what we might want to do is because transpose also changes the shape of our x tensor let's try matrix multiplication with transpose rather than reshape so tf.matmul tf.transpose x and y what might
be the output here let's find out hmm 89 98 116 128 now see how that is different here now that is because transpose flips the axes rather than shuffles around the elements of a tensor this can be quite confusing however this kind of data manipulation is a reminder that you're going to spend a lot of your time in machine learning and neural networks reshaping your data in the form of tensors to both prepare it to be used with various operations such as feeding into the model and once you get it out of the model to
be able to deduce patterns from it and convert it into something human understandable now again the numbers that we're dealing with are just basically toy numbers to see an example of how matrix multiplication is actually used this is a great example this may seem an odd and complicated way of multiplying but it is necessary i'll give you a real life example to illustrate why we multiply matrices this way so example the local shop sells three types of pies and then we have details here and if you go through this this is actually a dot product
to work out the value of sales for any given day now this is just a small example here but in the case of neural networks they're actually going to go through an absolute multitude of these different numerical transformations to find patterns in our data the good news is most of the time is that once we've set up our neural network code once we've gotten our tensors into the right shape a lot of these are done behind the scenes for us so with that being said let's uh finish this video here and in the next video
we'll go through a little bit more we'll really cement down or we'll get some more practice with transposing and reshaping different matrices and practicing matrix multiplication so i'll see you there we've had a little bit of hands-on practice with matrix multiplication and transposing and reshaping different tensors we're going to do one more video on it to really nail it down so let's call this one the dot product and we want to turn that into a markdown cell beautiful so let's go here matrix multiplication is also referred to as the dot product and now you can
perform matrix multiplication using we want tf.matmul which is short for matrix multiplication or we can do it with tf.tensordot we haven't seen this one yet but they essentially do the same thing just with different parameters so we come here we've also gotten hands on with matrix multiplication using matrixmultiplication.xyz and we've seen this colorful example of the dot product but let's once again get a little bit more hands-on with some code so let's see how we might use tensordot again i'm typing i'm spelling tensor wrong i'm gonna
do that throughout this whole course aren't i there we go so perform the dot product on x and y now this requires x or y to be transposed we've also seen how transposing results in different outputs than reshaping and transposing is flipping the axes whereas reshaping is just reshuffling so if we come here and if we reset this i want you to have a guess at what type of operation this is is this a reshape if i go like that and flip it or is that a transpose i'll give you a hint what did we do
we had to flip the axes to get it up like that so in that case it's going to be a transpose now let's use tf.tensordot and actually before we do that let's remind ourselves of what x and y are x is a tensor of shape 3 2 and y is a tensor of shape 3 2. they have increasing elements from 1 through to 12. beautiful now if we go tf.transpose x and y oh i've typed y as a mini case there we go tensordot oh missing argument axes so we
want to type in on the first axes beautiful 89 98 116 128 now if we look up tf.tensordot what is this going to tell us tensor contraction of a and b along specified axes and outer product tensordot also known as tensor contraction sums the product of elements from a and b over the indices specified by a_axes and b_axes now again you can read through this if you'd really like to or you can practice coding it i would suggest doing both now in this case you might notice that we could use
transpose or reshape however we get different results now let's try matrix multiplication by transposing y and reshaping y we've done it with x let's try it with y so perform matrix multiplication there's going to be a lot of repetition when manipulating tensors because we want to get practice y transposed so tf.matmul let's take x and we'll transpose y transpose there we go now perform matrix multiplication between x and y reshaped tf.matmul you might be wondering why i'm just continually showing you different ways of manipulating matrices it's because you're going to
spend a lot and a lot of your time reshaping tensors and matrices into the shape that you want them so if we can get as much hands-on experience as we can it might not mean too much to us now because we're just working with toy data it means later on down the track when we have to reshape and manipulate our tensors and matrices is that we've got all of this practice under our belt wonderful so we see we get different results from transpose and reshape now to really demonstrate the fact here or the fact
that calling reshape and transpose on y don't necessarily result in the same values is that we're going to go through the same steps we did up here before for x but let's make it a little bit more pretty so check the values of y this is how i really try to understand things and transposed y so if i am writing some code as i've said before and i'm wondering why i'm getting different outputs of my tensors and the calculations that i'm making aren't really sort of making sense in my head by the way this
slash n is for new line as i've said one of the most common errors you're going to get in writing neural network code is misshaped tensors and even more like sort of just as common but a deceiving error is when your tensors line up with shape but the outputs that you're getting so you get no error message it just works so it's important then to be able to investigate what's going on and figure out that's called a silent error is that when your code works like no error gets outputted from the code
itself but the results just clearly aren't correct so this is the sort of thing that i will do to investigate those silent errors tf.transpose y that's what we want to do so i'll create some sort of print statement like this and then i'll use this to really get my head around what's going on so this is our normal y 7 8 9 10 11 12 beautiful now this is reshaped again just reshuffled we bring the 9 up here we move the 10 across the 11 comes up here the 12 goes on there so 7
8 9 10 11 12 beautiful now this is the transposed so all we've done here is just flip the axes let's recreate this one nice and simple so 7 8 9 10 11 12 over here we'll just let that run through this is a great way to just visualize let me reset this 7 8 9 10 11 12. okay we come back to this tab you see how we've just got the same tensor here 7 8 9 10 11 12. this is the same over here beautiful now watch this is a transpose we're going to
bring this up flip it boom that's what has happened here you see except this one is upside down in this case so it just displays it slightly differently but the shape and where the values end up in this case is what's going to happen if we were to write say some code like this so if we go tf.matmul x tf.transpose y beautiful there we go so this is the sort of investigative work that we do we've had a fair bit of practice now transposing and reshaping if you're still unclear what the difference is
between transpose and reshape or you're wondering which one should i use well most of the time these operations will be done behind the scenes for you so if you write neural network code it's going to do a lot of matrix multiplication behind the scenes to figure out different patterns and numbers what sort of patterns are they well remember here's a simple example of the kind of patterns you can work out with matrix multiplication but again it'll be different depending what problem you're working on but generally whenever you're performing a matrix multiplication on two different tensors
and the shapes of the matrices or tensors don't line up you will transpose one of the tensors rather than reshaping them so i'll write that here generally when performing matrix multiplication on two tensors and one of the axes doesn't line up you will transpose rather than reshape one of the tensors to satisfy the matrix multiplication rules alrighty so again if there's anything here that doesn't quite stick or make sense i would definitely encourage you to create a whole bunch of different tensors try to use tf.matmul on them try
to use tensordot if you want or try to even use the at operator between them generally if we're using tensorflow we're going to be using matmul but get some errors fix the shapes with transpose and reshape and see what kind of different outputs you get so we've covered matrix multiplication now let's have a look at if we've got the default data type is often int32 so sometimes depending on what data we're working with the data type might not be int32 so what if we had to change the data type of a
tensor that's what we'll look at in the next video i'll see you there all right so as we said in a previous video the default data type of most tensors will be int32 depending on how they've been created however sometimes you'll want to change the data type of your tensor so let's see how we'll do that so we can create a new tensor with the default data type which is float32 yeah let's start off with float now i just said that the default is int32 but if we go the default data type
will actually depend on what data is inside your tensor so if there are floats inside your tensor we will have b.dtype tf.float32 now if we create c equals tf.constant now in here we want 7 10 and this is going to be c.dtype int32 wonderful so in the current version of tensorflow that i'm using which is tf i believe it's 2.3 version 2.3.0 you may have a slightly different version depending on when you're watching this video at this time if we create a tensor with floats inside it they're going
to be a float32 type and if we create a tensor with integers inside it they're going to be of tf.int32 type now let's say if we wanted to change from float32 to float16. now this is called reduced precision let's go 32 bit precision what does this mean so i'm reading a book about opengl es development a float in java has 32 bits of precision while a byte has eight bits of precision this might seem like an obvious point to make but there are four bytes in every float okay so that's in
java that's not what we want so 32 bit precision tensor versus 16 bit precision tensor mixed precision here we go tensorflow why did we ever doubt the tensorflow documentation for having something mixed precision is the use of both 16 and 32-bit floating point types in a model during training to make it run faster and use less memory okay by keeping certain parts of the model in 32-bit types for numeric stability the model will have a lower step time and train equally as well in terms of evaluation metrics such as accuracy using this guide this api can improve
performance by more than three times on modern gpus and sixty percent on tpus today most models use the float32 dtype which takes 32 bits of memory there we go however there are two lower precision dtypes float16 and bfloat16 each of which takes 16 bits of memory instead beautiful modern accelerators can run operations faster in the 16-bit dtypes okay so the main takeaway from this is that in tensorflow the default is 32-bit precision right however we can also do with modern pieces of hardware we can also do float16 which takes
16 bits of memory instead and modern accelerators which is talking about hardware accelerators so when we run our code on hardware can run operations faster in the 16-bit dtypes that is what's exciting so in our case if we wanted to change our numbers from 32-bit precision in other words storing this tensor in 32 bits in memory we wanted to change it to float16 how might we do that well let's have a look we can use tf.cast and then we're going to pass it our original tensor and then its new dtype is going to
be tf.float16. oh and then we want to do b find out b wonderful dtype equals float16. the original b what we might do is type out b and then b.dtype so we'll reinstantiate b up here or maybe it's better just to go d here yeah that's a better idea daniel wonderful look at that we got float32 and now d is the same as b but it's been stored as float16 so again might not matter too much for tensors of only two elements but if we had a tensor
of a million elements and we reduced the floating point size from 32 to 16 we've basically halved the amount of space our tensor is taking up in memory allowing a hardware accelerator to make calculations on it potentially twice as fast now again that scale might be different because there's a lot of different things that go into computing scale and speeds and whatnot but that's the concept i want you to imagine is that although these numbers here take up less space in memory they may be able to compute faster and the same thing goes
for if we wanted to change from int32 to float32 we can do it in a very similar manner so let's go change from int32 to float32 so we can go e equals tf.cast which one do we make c dtype equals tf.float32 now let's have a look at e beautiful so c was originally there we go dtype equals int32 and 7 10 we just changed it using tf.cast so e is now take c and turn it into a float32 beautiful and now we can take e
float 16 equals tf.cast e dtype equals tf.float16 e float 16. wonderful so there are many different data types in tensorflow so if you ever come across the need to change the data type of a tensor because some calculations aren't working correctly tf.cast is going to be your friend so with that being said let's have a look in the next video we're going to go aggregating we'll see what that means so have a play around with this one create some tensors of float type of integer type and then change them
from you can go from float32 to float16 or you could go from int32 to float32 see how many different combinations of data type you can create and change before we see how we can aggregate our tensors welcome back now after the last video i took another little break and so what you see here is when the cell numbers are empty i just want to show you this this is rather than going straight on to the next video in practice i just want to show you how i actually work with google colab
notebooks every day is that when you leave it for a while all your information in colab is going to shut down so that google's compute resources can be allocated to other people who want to use colab so you see here i haven't got a connection but if i decide to run a single cell shift and enter it's going to allocate some memory to me basically a computer in google's warehouse full of computers somewhere around the world and once it's ready to go boom we're connected however if i try to run this cell above here tf
is not defined so i haven't imported tensorflow so what i often do is we saw this in a previous video but it'll work better now because uh see here we've got 0 1 these are the first cells i've tried to run in this notebook since taking a break let me do if i wanted to run i could do restart and run all so that would run all the cells in this notebook or i could just do run before which is what i usually do it's going to try and run all of the cells before whichever
current cell that you're on but boom we get our type error right back up the top here where we created our changeable tensor with tf.Variable and we deliberately caused an error so if that happens your errors are not going to fix themselves but then we can just go back to here and then we should be able to run all of our cells i'm just continually pressing shift and enter should be pretty quick because we're not processing much data here but we'll go right down to where we were aggregating tensors oh we got
another error that's all right we'll keep going oh we've covered a fair bit of ground here are we at yes here we are so we're back to where we were so aggregating tensors now let's have a look what does aggregation mean define aggregation a group body or mass composed of many distinct parts or individuals a galaxy is an aggregation of stars and gas the collecting of units or parts into a mass or whole hmm the condition of being so collected well that doesn't really help us with understanding tensors but we're going to see hands-on what
aggregation means in the context of or in the topic of tensors so to me aggregating tensors equals condensing them from multiple values down to a smaller amount of values that's how i conceptually understand aggregation you can create your own definition and in fact i actually want you to start doing that more and more as you get more familiar with the world of deep learning of course there's going to be agreed upon definitions in the field but i also want you to start conceptualizing your own definitions in a way that
you best understand things but let's see how we might start aggregating our tensors the first way we're going to do is by getting the absolute values so this is actually probably not the best form of aggregation to start with but we're going to include it here anyway just for completeness so we create a tensor what we're up to e we might recreate d here tf.constant just because daniel begins with d and aside from the number seven d is probably my favorite sorry number seven is probably my favorite number and d is
probably my favorite letter so fun fact about me there we go get the absolute values beautiful so oh sorry we just created a new tensor and if we wanted to get the absolute values we can go tf.abs d and we'll see what this does beautiful so the absolute values are basically just take all the negative numbers in one tensor and turn them into positive numbers so if we see here let's check out the docstring i'm pressing shift command space to get this docstring computes the absolute value of a tensor given
a tensor of integer or floating point values this operation returns a tensor of the same type where each element contains the absolute value of the corresponding element in the input beautiful that's your formal definition and there we go we've got some example use cases so let's actually start to see a few more versions of aggregation let's go through the following forms of aggregation we'll turn that into a markdown cell and we'll come down here the first one i want to go through is get the minimum and then we might want to get the maximum of
a certain tensor we also might want to get the mean of a tensor and we also might want to get the sum of a tensor now i've shown you a few different ways of how you can explore different methods in tensorflow so if you wanted to start with a tensor and find the minimum the maximum the mean or the sum how might you research so if we go here find the mean of a tensor in tensorflow here's a little hint if you were to search that you can pause the video
now skip and go ahead and try to create your own tensor do these four things and then come back and we'll compare notes all right but otherwise if you're going to watch on let's start by creating a random tensor so that we can practice each of these four things here create a random tensor we might do it create a larger tensor yeah how about we do that with values between 0 and 100 of how long should we do it of size 50. let's do that so e equals tf.constant we'll do it
using a numpy random array so again this is going to just be return random integers from low inclusive value to high exclusive value so we can just go 0 to 100 size equals 50. now this should give us a tensor beautiful 50 values between 0 and 100 and we might go e how do we check the size of our tensor let's practice and then how do we check the shape of our tensors and then how do we check how many number of dimensions our tensor has beautiful so we get size is 50 beautiful uh the
shape is also 50 and the number of dimensions is one all right now how about we start with finding the minimum so i'm going to go find the minimum to find the minimum of a tensor you can do tf.reduce_min pass it a tensor put the output there now this is going to be a little bit of a trend with tensorflow aggregation is that the reduce underscore in something like numpy if we did np.min i believe we can just go zero but tensorflow tends to put reduce underscore in front of its aggregation
methods here so we'll get rid of that one so knowing that this is the way to find the minimum how do you think we might find the maximum let's find it anyway find the maximum tf.reduce_max wonderful so there's 97 so yeah that makes sense because we have values between 0 and 100 so our lowest value so far is 0 and the highest value is 97 and find the mean let's go tf.reduce_mean of e and wonderful so 48 is about the average so get the sum of a tensor hmm let's have
a guess find the sum tf.reduce_sum of e and there is the sum of our tensor beautiful now there's a few other things that we can do such as i'm going to put a little challenge here i want you before the next video i'm going to put a little emoji here exercise is to with what we've just learned find the variance and standard deviation of our e tensor using tensorflow methods so if you're not sure what variance and standard deviation are this challenge has two sides to it first you
have to find the code to find the variance of our e tensor and then you have to look up what variance and standard deviation are so go and give that a try and i'll see you in the next video we finished up the last video with a little challenge so did you figure it out did you find the variance and the standard deviation of our e tensor using tensorflow methods i hope you gave it a try but if not we're going to check it out here so find the variance of our tensor now as you might have
guessed we can do tf.reduce_var e oh has no method reduce_var is it variance has no method reduce_variance ah tensorflow variance i typed tensorflow wrong ah okay we can do the reduce standard deviation have i led you on a wild goose chase yes i totally meant for all this to happen tfp so what we might need to do is what's tfp this is what i like to do is just figure out go down the rabbit hole tfp tensorflow probability okay how do we import it can we do import tensorflow_probability as tfp
there we go and then we might do tfp.stats.reduce_variance of e what was that method again we come back here this is the type of research i want you to start practicing okay tfp.stats.variance there we go wonderful okay so there's a variance there i'm going to put here won't work and then if we wanted to find the standard deviation so let's put a little note here actually to find the variance of our tensor we need access to tensorflow probability and then we go here find the standard deviation see i thought that you would
be able to just do reduced variance so i've learned something there too apparently we need tensorflow probability to find the variance now i believe we can find the standard deviation here reduce std ah what's happened here tf math reduced std does it not have an alias hmm what if we do tf.math.reduce std must be either real or complex so if we come here input tensor what's wrong with our tensor reduces input tensor along the given axis unless keep dims is true the rank of the tensor is reduced by one input tensor the tensor to reduce
should have real or complex type does it have to be a float hmm this is a good little challenge i'm glad i issued this exercise we can work this out together so where's our here d type int 64. you know what i think this method might need is if we cast our e tensor as d type tf int 32 this might work int 32 input must be real or complex maybe float32 there we go okay so this is what i mean things may not be as they appear when you go through the tensorflow library so
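The trial and error above can be condensed into a small sketch. It assumes a stand-in tensor named E holding 50 random integers (the exact tensor used in the video is not shown in this section, so the values here are hypothetical). It shows that tf.math.reduce_std rejects an integer tensor and accepts the same data once it is cast to float32.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the E tensor: 50 random integers between 0 and 99.
E = tf.constant(np.random.randint(low=0, high=100, size=50))
print(E.dtype)  # an integer dtype

# tf.math.reduce_std only accepts real (float) or complex input,
# so calling it on an integer tensor raises a TypeError.
try:
    tf.math.reduce_std(E)
except TypeError as err:
    print("TypeError:", err)

# Casting to float32 first makes the same call work.
E_float = tf.cast(E, dtype=tf.float32)
std = tf.math.reduce_std(E_float)
print(std)
```

Reading the error message and then checking which dtypes the documentation example uses is usually the quickest way to find this kind of fix.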
up here we've got a few relatively easy methods compared to what we had to do here but this has been a great practice as to how you might go about researching and trying things out to make something work so if you ever get that error that we got just before let's write this again so tf.math.reduce_std e for our tensor so type error if you ever get the type error message come to the documentation which is what we did here right so there's tf.math.reduce_std and check out the types that they're using in
the example workflows so see here the reason why i picked up that we might need to change our tensor to data type float32 is because they're using floats in their input sample here so there we go returns the reduced tensor of the same dtype as input_tensor note for complex64 or complex128 input the returned tensor will be of type float32 or float64 respectively so usually a lot of the time when you're working with tensorflow float32 is generally sort of the standard type throughout the entire library so if you need to change the type
of your tensors chances are it might have to be in float32 but that was a good practice in exploration now in the next video let's see how we're going to find the positional maximum and minimum you might want to try this out now that you've had your expert problem solving skills upgraded in this video see how you might find the positional maximum and minimum of a tensor i'll see you in the next video so i was all ready for this video to jump straight into finding the positional maximum and minimum however after a little bit
more exploration of the tensorflow documentation which can be a bit of a beast to tame i found that we didn't actually have to jump through some of the hoops we did in the last video to find the variance so look at this i thought that we might just be able to go tf.math sorry tf.reduce_variance it turns out we can find the variance of our tensor in a very similar way to what we've done here finding the standard deviation which is strange because with these methods you can get rid of the math but
with the variance and standard deviation you require the math and so we've both learned something here so i want to just show you here if we want to go find the variance of our e tensor we go tf.math.reduce_variance e just exactly like we've done here in the documentation and if you're ever stuck you can just go through the documentation here like tf.something and just read up on all the different methods i mean we're working with tf.math so you might want to even try a fair few of these just to see
what happens with what you're working on but anyway let's go back here we'll find the variance input must be real or complex how do we fix that we go tf.cast let's change this dtype to tf.float32 and oh we've got a little bracket on the end wonderful we get a very similar number to the variance that we found up here using tensorflow probability so turns out we actually don't need access to tensorflow probability however what we just went through was a great exercise in figuring things out i mean at first you'd try something
and it might not work and then you'll find a fix to it and you'll find that hey the fix probably wasn't even the best way of doing it we did a little bit more research and we found that if we wanted to find the variance without importing any other libraries aside from tensorflow we can just do it like this but anyway let's get into finding the positional maximum and minimum now when might this be helpful well you're going to see this a lot when your neural network outputs prediction probabilities which we haven't seen yet but if
we go here so remember our little example where we got our images we input it into some numerical encoding goes through our neural network finds the patterns in there adjusts its random weights that it initializes with it updates its representation outputs and then here oftentimes the representation outputs are referred to as prediction probabilities so in this case 0.983 is the highest number 0.04 0.013 so imagine if above here if this column was say ramen and this column was say spaghetti and this column was say not ramen or spaghetti so in this case this number the
highest output this is our ramen column would be dedicated to ramen because the first index here the zeroth index the highest number goes to ramen and then if we wanted to turn this row into labels we might go ramen spaghetti not ramen or spaghetti which one has the highest value the index where spaghetti occurs and then for this one which is no food so ramen spaghetti not ramen or spaghetti this is the highest number so we might want to get this index so that's what i mean by finding the positional maximum and minimum at which
index of the tensor or of this row does the maximum value occur so in this row it occurs at the zeroth index in this row it occurs at the first index and in this row it occurs at the second index but let's have a look how we might do that in code so the first thing we're going to do is create a new tensor so create a new tensor for finding positional minimum and maximum we go here f equals tf random we'll create a random tensor with uniform this time we'll give it a shape of
we'll make it 50 long let's have a look beautiful now i wonder if we go here tf.random.set_seed 42 do we get the same numbers so this one is can you help me remember what's this first number 0.6645621 okay beautiful so if you use random seed 42 you should get the same numbers as me don't worry too much if you don't if something's changed between now and when you watch this video but we just need a tensor with 50 random numbers and we can work with this so if we
wanted to find the maximum element position of f meaning at which index does the maximum value occur so i'm going to guess let's have a quick look there's 0.948 can we beat that yes we can there's 0.967 so that'll be about i'm not sure maybe 43 let's try tf so find the positional maximum now the positional maximum is often referred to as argmax so i believe it's the same in numpy np.argmax if we wanted to find the docstring here returns the indices of the maximum values along an axis
same with tf.argmax returns the index with the largest value across axes of a tensor so let's do it let's put in our 42 ah so we were close now if we were to index on our largest value position so what i mean by this is if we grab our f tensor and then we find tf.argmax f what value are we going to get boom it's that value there and then if we go find the max value of f with tf.reduce_max f do all these values line up wonderful so
this should now let's check for equality just to be safe if we go argmax tf arg max f does this equal we might put a little assert here because this will error if they don't equal no error and then if we take away this assert we should get true beautiful so this is where indexing does this make sense that they are equal because look we found the positional maximum which gives us position at index 42 which is this one that's the maximum value of this tensor and then we can index on our largest value position
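Here is a minimal sketch of the positional maximum check described above. It assumes F is created with tf.random.uniform after setting the global seed to 42, as in the video. The exact random values may differ between TensorFlow versions, so the sketch only relies on the relationship between tf.argmax and tf.reduce_max.

```python
import tensorflow as tf

tf.random.set_seed(42)             # global seed so the values are repeatable
F = tf.random.uniform(shape=[50])  # 50 random floats in [0, 1)

max_index = tf.argmax(F)      # position of the largest value
max_value = tf.reduce_max(F)  # the largest value itself

# Indexing F at the argmax position should give back the maximum value.
assert F[max_index] == max_value
print(max_index.numpy(), F[max_index].numpy(), max_value.numpy())
```

The same pattern applies to a row of prediction probabilities: tf.argmax gives the index of the most likely class, which can then be mapped to a label.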
so it's saying hey take the f tensor and grab position 42 or grab the value from position 42 which is this yes now if we use just our reduce_max method which is saying hey find me the max value of f we get this value and then we check to see if they're equal beautiful now another one you might use i don't use this one as often is finding the minimum so wherever the positional minimum occurs in here can we find it there's 0.03 that's pretty low so maybe it's that one or there's 0.009
so i'm guessing that might be at let's go 21 or 20 so find the positional minimum we want to do tf you might have guessed it argmax finds the maximum what do you think finds the minimum if you guessed argmin 100 points for you boom so position 16 let's find out what it is find the minimum using the positional minimum index so f tf.argmin f 0.009 so that was actually 16 there we go so that's index 16 beautiful alrighty then so what we might do in the next video is we can
see how we can squeeze our tensor in other words removing all single dimensions so have a go at this maybe create your own tensor it can be as big as you like and practice using the argmax and argmin methods to see if you can find the minimum and maximum positional values of your tensor and i'll see you in the next video how'd you go did you create your own tensor did you find the minimum and maximum positions i hope you did but for now we are on to the next part which is squeezing
a tensor oh we're going to give our tensors a big hug in other words removing all single dimensions you're going to fall in love with tensors by the end of this course if you haven't already now what we might do is as always we need to create a tensor to get started now we might go here g are we up to g yeah e f g h i j k l m n o p i won't do that again no promises but i like to sing now if we go here what have we got so i'm creating a
similar tensor to what we did before random uniform but this time we're going to use the shape parameter to add a few single dimensions right to the start so we check g look at this so we have actually let me use our faithful random seed of 42 beautiful we're getting the same numbers there yep six six four five six two one maybe i'll know that off by heart by the end of this course and so we have here the shape is what we have to pay attention to with squeezing so
this is shape one one one one fifty so see here we've got a few square brackets one two three four and then the innermost square brackets is where the big dog 50 is sitting now what we're going to do with squeezing it's actually quite a fun little concept so if we go g.shape we get that now if we were to go g squeezed that could be our tensorflow rapper name g squeezed tf.squeeze and now actually we'll check the docstring removes dimensions of size one from the shape of a tensor
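As a small sketch of that docstring in action, the block below builds a tensor with four leading single dimensions and squeezes it. Passing the full shape straight to tf.random.uniform is an assumption made for brevity. The video may construct G slightly differently, but the shapes are the same.

```python
import tensorflow as tf

tf.random.set_seed(42)
# Four dimensions of size 1 followed by the 50 values.
G = tf.random.uniform(shape=[1, 1, 1, 1, 50])
print(G.shape)           # (1, 1, 1, 1, 50)

# tf.squeeze drops every dimension whose size is 1.
G_squeezed = tf.squeeze(G)
print(G_squeezed.shape)  # (50,)
```

The values are untouched, only the extra brackets around them are removed.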
now if i were to run this what do you think g squeezed's new shape is going to be i'll give you three two one to guess and now let's check boom so there we go so what squeeze does is we just read that before in the docstring but we'll read it again removes dimensions of size one from the shape of a tensor so if your tensor has too many single dimensions and you want to just reduce it so get rid of all these extra dimensions here and reduce it back to its essence you can
use the squeeze method so that's a little tidbit there nice and quick video this one and in the next one we're going to cover one hot encoding so i'll see you there okay so if you wanted to one hot encode actually one hot encoding tensors now what is one hot encoding let's check this out what is one hot encoding if you've done any machine learning before you might have come across one hot encoding but it's a very important concept in terms of preparing data this is a great website by the way machine learning
mastery so let's check what this says there okay so this is a great example we'll create our own in a minute but let's just have a look here so we've got three columns red green blue so row two has a 1 for red and 0 for the rest so that is 1 hot encoded for red row 3 has a 1 for green but 0 for red and blue so that is a 1 hot encoding for green and row 4 has a zero for red a zero for green and a one for blue so that's a
one hot encoding for blue so that's a great way to turn if we just remember what do we have to do when we're passing our data to neural networks we have to find a way to numerically encode it one hot encoding is a form of numerical encoding so if we had the colors red green blue and we wanted to pass it to our neural network it would look at that and go you know what i'm very smart i can find patterns in numbers but when you pass me words red green blue i can't understand
them so we could one hot encode those words red could be one for red zero for blue zero for green and so on and then pass those as a tensor to our neural network so let's have a look how we might do that in code and what we should do is create a list of indices so values pretend this is let's get our some_list some_list can be 0 1 2 3 in other words maybe it could be red green blue or purple and now if we wanted to one hot encode
this we can go one hot encode our list of indices is tf.one_hot some_list we're missing one positional argument hmm all right let's look up the docstring what does this say tf.one_hot returns a one hot tensor if on_value is not provided it will default to a value of one with type dtype okay where's the example here we go for example indices equals that depth equals three we need to pass a depth parameter okay so the output's going to be three by three because this is three long and the
depth is three so let's figure this out depth if we have four elements here how many oh i've kind of given it away here so pretend you can't see that well it's going to be four let's try it out there we go so do you see what's happened here what kind of tensor we've created we've transformed this tensor some_list into a one hot encoded version of itself so for the first row this relates to the first element so it's got a one as you see for where zero occurs but then zero for all
of the other values and now for this one we've got this refers to this value here it's got a zero here and a one because this is a number one here and then so on for the other two values for each row beautiful now a fun little thing that we can do is that if we wanted to change this from i mean you're rarely going to use this in practice but i want to show you anyway because it's cool if you wanted to rather than using ones and zeroes we can
use on_value and off_value and pass them different values so let's go specify custom values for one hot encoding if we want to do tf.one_hot some_list depth can be four and then on_value can be yo i love deep learning and then the off_value can be let's go what can we do i need something fun here i also like to dance boom let's see what this comes out with ho ho yo i love deep learning i also like to dance i also like to dance i also like to dance i
also like to dance yo i love deep learning i also like to dance did you know i like to dance so have a practice with tf.one_hot maybe create some of your own on values off values again rarely will you ever use them when you're passing them to a neural network because a neural network what does it love it loves numbers but yeah this is just a little fun exercise to practice one hot encoding now if we look up the tf.one_hot i believe it requires so the indices yeah the indices have
to be a tensor of indices okay well that makes sense so yeah go through the example there create your own list of indices and then one hot encode it practice around see what happens if you change the depth parameter and then change the on value and the off value and i'll see you in the next video did you have some fun creating one hot encoded tensors with different on and off values i hope you did and i hope you're loving deep learning we actually haven't covered too much deep learning yet we're still in the fundamentals
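To recap the one hot encoding walkthrough, here is a short sketch using the same list of indices and depth. The variable names one_hot_default and one_hot_custom are illustrative and not taken from the video. The custom strings mirror the ones spoken in the transcript.

```python
import tensorflow as tf

some_list = [0, 1, 2, 3]  # could stand for red, green, blue, purple

# Default encoding: 1.0 at the index position, 0.0 everywhere else.
one_hot_default = tf.one_hot(some_list, depth=4)
print(one_hot_default)

# Custom on and off values (rarely useful for a model, which wants numbers).
one_hot_custom = tf.one_hot(
    some_list,
    depth=4,
    on_value="yo i love deep learning",
    off_value="i also like to dance",
)
print(one_hot_custom)
```

Changing depth changes the number of columns. An index that is greater than or equal to depth produces a row made up entirely of the off value.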
i do believe you do like to dance so by the end of this section you're going to be able to dance your way through tensorflow code and speaking of which let's do a little bit more dancing there are some math functions i just want to show you how to use a few more of these that you may come across there's a lot here as you see tf.math overview we've got a whole bunch here but let's practice with a few more we can never get enough practice we might go a few
common mathematical operations squaring log and square root so let's see how we'll do that first things first let's create a new tensor so this time we might go h i believe we're up to so tf i'm going to show you a new way to create a tensor 1 to 10 nice and simple oh let's see it there we go now if we want to square it how do you think we might do it tf.square h beautiful so we get just the square of all that so 1 times 1 is 1 2 times 2
is 4 3 times 3 is 9 etc etc and then if we wanted to find the square root let's find the square root we come over here maybe we go square root will this show us there we go tf.math.sqrt beautiful let's try that now i believe it may have an alias we can just skip the math part oh what do we got here invalid argument error value for attribute t of int32 is not in the list of allowed values okay this is good so again we're coming across a value error where
it's saying that int32 is that the type of our tensor yes it is int32 is not in the list of allowed values for this function so we want bfloat16 half float double complex64 complex128 okay let's change it into a float tf.cast and the dtype is going to be tf.float32 oh forgot again boom look at that there we go actually let me put this one above here so that you'll know for future reference that this is going to be will error method requires non-int type
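The squaring and square root steps can be sketched as follows. The use of tf.range(1, 10) to build H is an assumption, since the transcript only says a tensor from 1 to 10 was created in a new way. Any int32 tensor would behave the same.

```python
import tensorflow as tf

# Assumption: H holds the integers 1 through 9 with dtype int32.
H = tf.range(1, 10)
print(H.dtype)

# Squaring works directly on an integer tensor.
H_squared = tf.square(H)
print(H_squared)

# tf.sqrt needs a float (or complex) tensor, so cast before calling it.
H_float = tf.cast(H, dtype=tf.float32)
H_sqrt = tf.sqrt(H_float)
print(H_sqrt)
```

Calling tf.sqrt(H) without the cast is what triggers the allowed values error described above.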
if we put here just h we get an error wonderful how about we find the log so we go here find the log now again tensorflow oops we want to go up here and search for log tf.math.log wonderful computes the natural logarithm of x element wise has it got an alias pretty sure we can skip the math part but if not we can try it out so find the log tf.log of h has no attribute log oh do we need math for this one we do oh again we get the same error
as above so what do we have to do here value for t of int32 is not in the list of allowed values what do we have to do i'll give you a hint we have to turn it into a float or at least one of the allowed values we've got some rules here so we go here tf.cast h dtype equals tf.float32 will this work wonderful there we go so that's just an example again of how many different functions you're going to find in tensorflow and a little bit of practice
again as i can't stress this enough is how much practice we're going to need sort of just making sure our tensors are in the right data type so have a play around with some of the functions in tf.math and see how many of them pick three and try them out the ones that we haven't covered so far and see if you come up against any errors like this what you can do to fix them so that's a little challenge before the next video we've covered a whole bunch so far we've even seen before how tensors
can interact with numpy arrays but let's have a dedicated session on tensors and numpy so if you've done any numerical computing before or any type of data science or machine learning you've probably used numpy so if we come here what is numpy not going to dig too much into this because we want to get focused on using it but chances are there we go the fundamental package for scientific computing with python so numpy is a fundamental library for any type of numerical computing with python and it's all built upon the numpy array so um tensorflow
interacts beautifully with numpy arrays and we can see an example of that if we create a tensor this is what we've seen before directly from a numpy array so tensorflow is built upon the tensor numpy is built upon the array and the beautiful thing is that they basically have full interoperability i believe that's a word that's like a fancy word for they work together quite nicely so there we go we create an array here and then we pass it to tf.constant to turn it into a tf.Tensor beautiful and now we
can convert it back from a tf.Tensor to a numpy array so convert our tensor back to a numpy array by just going np.array j and then we can check the type here by going type of np.array j there we go so np.array j turns it into a numpy array and then when we check the type it turns out to be a numpy.ndarray which is numpy's base type just like tensorflow's base type is tf.Tensor all right and we can also do the same thing as the cell above convert tensor j to a numpy array
using if we go j.numpy we've also seen the numpy method on a tensor before as well dot numpy and then if we check the type of j.numpy we go here beautiful so we can convert it back to a numpy array there now where this might be helpful is sometimes if you have say if we have let's rewrite j to be tf.constant it's just a single number there if we wanted to have access to it just as the float on its own we can actually do numpy and there we go that'll give us
access to it there so the beautiful thing about being able to really easily convert our tensors to numpy arrays and numpy arrays into tensors is that if there's some sort of functionality that we want to use that doesn't work with our data in tensor form we can convert it to a numpy array and use it there and vice versa so if we delete that let's see hmm there's one more thing you should know and that's that the default types of each are slightly different
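The conversions described above can be sketched like this. The values 3, 7 and 10 are written as floats here, which is an assumption about how they were typed in the video. The names as_array_1, as_array_2 and single are illustrative.

```python
import numpy as np
import tensorflow as tf

# Create a tensor directly from a NumPy array.
J = tf.constant(np.array([3., 7., 10.]))

# Two ways to get a NumPy array back from a tensor.
as_array_1 = np.array(J)  # NumPy's own constructor
as_array_2 = J.numpy()    # the tensor's .numpy() method
print(type(as_array_1), type(as_array_2))

# For a single-element tensor, index into the NumPy array to get the number.
single = tf.constant([3.])
print(single.numpy()[0])
```

Either route gives a regular NumPy array, so anything in NumPy that has no TensorFlow equivalent can be applied to the converted data.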
we want to create numpy j equals tf.constant we'll create this directly from a numpy array just as we've done before it's going to go np.array same as above 3 7 10 wonderful and then if we want to go tensor j equals just tf.constant we'll create this one directly from a python list 3 7 10 now if we want to go check the data types of each now we've had some practice doing this so you might already know if we wanted to check the data types of numpy j and
tensor j what might we use if you guessed we use the dtype attribute you'd be 100% correct tensor j and if you didn't guess correct well that's all right we get to practice boom so there we go so if we create a tensor from a numpy array the default type is float64 whereas if we create a tensor from a python list or directly through tensorflow the default type is going to be float32 now the reason why this is important is because remember we actually had some errors up here where we had to change the
type of our tensors from a certain data type to another data type so what i want you to be aware of is that if you do convert your numpy arrays into tensors they may have a different data type compared to if you created your tensors directly from tensorflow so just be aware of that one of the main issues you'll run into when computing with different tensors is different data type issues so treat this as your heads up but that's a quick reiteration of tensors and numpy remember numpy is a fundamental package for scientific
computing with python and chances are when you're doing machine learning and deep learning you're going to run into numpy somewhere so keep in mind tensorflow works beautifully with numpy arrays and vice versa welcome to neural network regression with tensorflow now we've seen some of the basics of tensorflow in the previous section now we're going to get hands-on building some neural networks with tensorflow specifically neural networks for regression now before we get into things i'm going to start this lesson off with a slide called where can you get help now this is very important because learning
a new concept is challenging of course so that being said if you do get stuck here are the steps i want you to take first of all follow along with the code i'm going to be writing it with you remember our motto if in doubt run the code if i'm going too fast that's alright slow the video down then try it for yourself if i'm going too slow you can speed the video up and beat me if we're writing tensorflow code don't forget in google colab you can press shift command space to read the
docstring that will get you a little bit of information about any of the functions that we're running and then if you're still stuck there if the docstring doesn't have that great of an explanation remember most docstrings in tensorflow have examples of different code so try it for yourself but if you're still stuck search for it you're going to become very familiar with these two resources here stack overflow and of course the tensorflow documentation which is forever improving forever getting better so it's very vast but with a lot of experience hands-on practicing checking it out
you'll start to get much more familiar with it then once you've learned some things from searching for it don't forget to try the code again remember back to the motto if in doubt run the code you can't break anything and finally if you're still stuck ask a question i'm going to emphasize here including the i'm putting inverted commas you can't see my face in this video but i'm i'm doing the inverted commas symbol with my fingers the dumb questions don't forget the discord chat ask a question if you're stuck very important skill to have is
asking the right question now with that being said if we're going to do neural network regression with tensorflow you might be thinking what is a regression problem so let's have a look at a couple of examples of what regression problems are here we go some example regression problems say we're trying to predict the house or the sale price of a house we're interested in so how much will this house sell for if we've got a house down the street and we want to try and predict how much it's going to sell for we might ask
ourselves how much will this house sell for we just said that didn't we another regression type problem is how many people will buy this app or how much will my health insurance be or how much should i save each week for fuel now you'll notice a trend here with these questions it's how much or how many that's one of the key points of a regression problem it's predicting a number of some sort and if you're thinking about in terms of price or numbers in these types of questions well there are other types of problems where
a regression well we can turn them into a regression problem such as trying to predict the coordinates of where the boxes should be in an object detection problem so these are numbers here so we've got 13 90 so that would be on the x-axis this corner the top left corner of this particular box here should be at 13 pixels in but 90 pixels down and so on and so on for the top right corner the bottom right corner and the bottom left corner and then again we could do the same with these corners here the top
left remember this is the perpetrator who hit and run on my car so if i was going to build an object detection model to look at security footage around my home to see if i can find the people who hit and run on my car i might train an object detection model specifically a neural network regression to try and predict the corners of where the bounding box should be around my target object so this is what i want you to start thinking about although we have a kind of a one-liner definition for regression problems predicting
a number is that a lot of the time in machine learning and deep learning how you sort of think about the problem will definitely define how you approach that problem and again if you're not satisfied with the definitions here this is what we can do so what is a regression problem regression analysis here we go in statistical modeling regression analysis is a set of statistical processes for estimating the relationship between a dependent variable often called the outcome variable and one or more independent variables often called predictors covariates or features okay so we want to predict
the relationship so say this line that's the relationship there so the dependent variable in our example problem of trying to predict the house price our dependent variable might be the price of the house so the house price is the outcome we're trying to predict and the independent variables often called predictors covariates or features might be the number of rooms in the house or the number of bathrooms or number of garage spaces or all three of those combined so the independent variables so we might have 10 different houses with 10 different numbers of bedrooms 10 different
numbers of bathrooms and 10 different numbers of garages and 10 different house prices and we might want to build a model to take in all that information and then predict what the house price might be so with that being said we're going to get into more examples of this later on as we write code but let's have a look at what we're going to cover broadly of course we're going to be writing lots of code to do these things we're going to cover the architecture of a neural network regression model so specifically the building blocks
of a neural network regression model the input shapes and output shapes of a regression model in other words the features and labels or in wikipedia terms the independent variables and the dependent variables we're going to look at how we can create custom data to view and fit we're going to take care of the steps in modeling such as creating a model compiling a model fitting a model and evaluating a model if all of these don't make sense right now don't worry we're going to get very hands-on with them and they'll start to make sense as
we code them we're going to look at different evaluation methods for regression models and finally we're going to look at how we can save and load our models so that if we do train a machine learning model and we save it we don't have to go through that process of retraining and if we want to load it into our applications going forward we can do that using our save and load methods and finally how are we going to do this a big emphasis between the chef and the chemist chemist is very exact whereas what's a
cook do a cook tries things out a cook experiments pours in little bits of different flavoring we're going to be cooking up lots of code so with that being said here's what we're going to cover in the next video let's have a look at what the inputs and outputs of a regression problem or neural network might look like so if we're dealing with regression problems well we're trying to build neural networks to solve regression problems what might be our inputs and outputs remember in machine learning and deep learning a lot of the time your focus
will be on defining your inputs of your algorithm and what the outputs of your algorithm look like so let's say we wanted to predict the sale price of this house now this is an actual house that was up the road from me not far from where i live that was up for sale and say i wanted to build a machine learning model to try and predict what i should offer if i was going to auction what this house should sell for so again here's our little diagram of what we're going to be focused on we
have the inputs so what might our inputs be if we're trying to predict what our house might sell for and what might the outputs be so if we're just looking at this diagram and if we want to build a machine learning algorithm we might have to build this ourselves but first let's focus on the inputs and outputs so maybe we've got our house and we know a few things about it the number of bedrooms the number of bathrooms and the number of garages that it has and we might have a whole bunch of other houses
that are nearby and we know what their sale prices are what might we do with that so we know these okay if we come back to our definition of a regression problem here we're trying to figure out the relationships between a dependent variable often called the outcome variable and one or more independent variables often called predictors covariates or features okay so in our case our independent variables might be the number of bedrooms four the number of bathrooms two and the number of garages two as well so for our machine learning model or deep learning algorithm
we're going to have to encode these in some numerical way because we can't just pass four bedrooms two bathrooms two car spots so we'll put them into a numerical encoding and as you might see here this is one hot encoded we've seen this before in the tensorflow basics section and so if we had this array here or this tensor this vector could be number of bedrooms so it's zero zero zero one now what that would mean is that does it have one bedroom no does it have two bedrooms no does it have three bedrooms
no does it have four yes so we put a one there and in this case does it have one bathroom no does it have two bathrooms yes does it have two garages yes so zero in the case of one hot encoding encodes for not that thing and one encodes for that thing so that might be our numerical encoding wonderful we put these into here and that's our inputs to our machine learning model and these are often called input features or in wikipedia terms predictors covariates or features so features is probably the most dominant term that i
use in practice and we're going to use throughout this course if you ever hear input features it's some kind of information about the data we're using that goes into our machine learning algorithm and so in our case if we want to put our inputs into a machine learning algorithm so often one and you'll find this in deep learning a lot is that often an algorithm for your problem already exists so that means someone has built something that has worked before for their problem might be very similar to your problem and you can utilize that for
whatever you're working on however if it doesn't already exist we can see how we can build our own that's what we're going to be practicing we'll see how we can build our own but later on in the course we're going to see how we can use algorithms that already exist and finally if we have a look at what our outputs might be we might if we're trying to predict the sale price of this house here based on the fact that it has four bedrooms two bathrooms two car spots and it's on a pretty big piece
of land if we wanted to make an offer we're using our algorithm to figure out how much we should offer at auction our algorithm is telling us we should offer 939,700 now where does it get this predicted output from ah if we have a look at what it actually would sell for 940,000 so we weren't too far off here we'll figure out some evaluation metrics to compare predicted outputs of our deep learning models to the actual outputs of what they should be for regression problems that is now this predicted output comes from looking
at lots of these actual outputs so remember how we talked about supervised learning well this is an example of supervised learning so we might have a hundred or a thousand or tens of thousands of different homes with all of their input features and their actual sale price and then we might feed these inputs and outputs to our machine learning algorithm and it's going to learn the relationships remember we come back to our definition of a regression problem regression analysis is a set of statistical processes for estimating the relationships between a dependent variable right so our
sale prices and one or more independent variables so our input features here so after our machine learning algorithm has looked at many many examples of input features and outputs of house sale prices it's going to learn the relationships between the input features and the outputs here and then for use cases where we would like to predict the potential output of a home that we don't know the actual sale price for we can feed its input features into our algorithm and have a suggested output of what the sale price
or what we should pay if we were going to auction so this is just an example of what regression inputs and outputs look like we're going to have some practice writing code to do this very shortly so another big point is once we've defined our inputs and outputs and i can't emphasize this enough is this is what we'll be focused on a lot of the time in machine learning and deep learning is the blue parts here defining our inputs defining our outputs and now another key term is when we're talking about inputs and outputs is
input and output shapes so say here we were taking our input features numerically encoding them feeding them to our machine learning model as inputs our machine learning algorithm works out the patterns or utilizes its already learned patterns to produce an output aka the sale price of our home what might be the shape of our inputs because remember our numerical encoding is going to be in the form of a tensor and this output here doesn't look like it but it will also be in the form of a tensor so here we go here we've got bedroom
bathroom garage represented as a tensor and in this case the shape is going to be three because we've got three input vectors here now don't be confused by this outer bracket here because we've got three input features the shape of our input layer to our machine learning model is going to be three one for bedroom one for bathroom and one for garage we'll represent this as a tensor and for our outputs in our regression problem in this case can you guess the shape is going to be one so in our case for each example that
we're going to pass through our machine learning algorithm each sample will have an input shape of three so we need number of bedrooms number of bathrooms number of garages and each sample will have an output shape of one for the price of what we're trying to predict and now this is just for our housing price prediction example we can adjust these so here the bedroom bathroom garage to be almost as many different inputs as we want but usually for a regression problem the output shape is often one because we're trying to predict some sort of
number so again if we come back to wikipedia terms we're trying to figure out the relationships between a dependent variable often called the outcome variable and one or more independent variables so in our case these are our independent variables and this is our outcome variable or our dependent variable now with that being said we've covered inputs and outputs of our regression problem let's now have a look at what that might look like in terms of building or writing neural network code we'll have a look at it in the context of an architecture of a neural
network regression model recall in a previous section that we covered the anatomy of neural networks so essentially every neural network that you build will have an input layer some number of hidden layers and an output layer and for your input layer this is where your data goes in now hidden layers is plural on purpose or optional plural it can have one up to a hundred some neural networks even have a thousand hidden layers it really depends on what problem you're working on and now that's where the deep in deep learning comes from so if you
imagine if we had 10 of these hidden layers laid out here or the more we had the deeper our deep neural network would be and then for the output layer this is where the outputs or the learned representation or prediction probabilities of your neural network come and within the hidden layers is often where your neural network is learning patterns or weights in the data and so keeping this in mind if we have an input layer a hidden layer and an output layer what might the architecture of a neural network regression algorithm look like if we
wanted to build one with tensorflow so this is what the typical architecture of a regression model in tensorflow will look like so in our case we have a few hyperparameters here remember a hyperparameter is a setting that you as a data analyst or a machine learning engineer can change so we might have the input layer shape the hidden layers the neurons per hidden layer output layer shape hidden activation we haven't covered that yet the output activation the loss function and the optimizer now we haven't covered the bottom half of this little graph here
but that's okay we're going to get very familiar with that as we write code and now i just want to emphasize that i've adapted this from page 293 of the hands-on machine learning with scikit-learn keras and tensorflow book by aurélien géron which is a phenomenal book this is probably going to be my number one external resource to go along with this course is this book i've read it end to end in its entirety and i'd highly recommend it if you're looking at learning more with tensorflow but that's beside the point we're
getting hands-on here now in terms of the typical value for each of these the input layer shape we've covered this a little bit before so if we were trying to predict the sale prices of different houses we might have an input layer shape as the same as the number of input features that we have eg would be three for number of bedrooms bathrooms and car spaces in housing price prediction hidden layers the value you have here is very problem specific the minimum is one the maximum is unlimited again the neurons per hidden layer is going
to be problem specific again so if we come back to our anatomy of neural networks so this has three hidden neurons so not only can you customize how many of these layers that you have you can customize how many of these little circles here also referred to as neurons so you could have a hundred neurons here times a hundred layers and that would be a very deep model now the output layer shape this is the same as the desired prediction shape for example one for our housing price prediction problem hidden activation is usually relu which
is rectified linear unit output activation this is going to be problem specific but these are some typical values here the loss function for a regression model the default one is usually mean squared error or mean absolute error slash huber loss which is a combination of mean absolute error and mean squared error if you have outliers in your data again we haven't covered these four here we're going to see what they mean shortly and then the optimizer which is a way of how our neural network improves its predictions is usually going to be stochastic oh that's
a fun word to say stochastic gradient descent stochastic is a fancy word for random or the adam optimizer which is a very good default value so let's have a look here what this looks like in tensorflow code we haven't written any of this so don't worry if it feels foreign to you we're going to write a lot of this going forward but i just want to relate the architecture of a regression model see what we're going to start working towards by the end of this module that we're covering this section you're going to be able
to write these yourself so let's have a look this is the input layer so the input layer is in the blue so we see the shape is defined here as three because we're working with our housing price prediction the hidden layers is in this other little blue shade here so these are known as hidden layers so we've got an input here and then an output here remember all neural networks have some sort of input some sort of output then the hidden layers in our case we have one two three and if we look for the
neurons per hidden layer is often this first number so we have it in green in our case we've got a hundred neurons in the first hidden layer we've got a hundred neurons in the second hidden layer and again the same for the third hidden layer now our output layer shape we can see here in the yellow this is the same shape as our desired prediction shape in our case this would be an output shape of one for our housing price prediction then we have the hidden activation parameter in our case it's the relu which is a
rectified linear unit so that we can set that using the activation parameter and then for our output activation which is just simply the output activation function for our output layer we've decided to set this one as none and then for our loss function if we come down here we've got step two which is compiling a model so this is step one is we created a model step two is we have to compile it in tensorflow again we're gonna be writing a lot of this code going forward so we define our loss function this measures how
wrong our neural network's predictions are so when it's learning the relationships between our independent variables and our dependent variables in other words the number of bedrooms bathrooms and car spaces and the sale prices of our homes this loss function is going to measure how wrong our neural network's relationships are and the optimizer here which is in the black square here in our case we've set it to the adam optimizer is going to inform our neural network how it should improve its patterns to reduce the loss function so again if we
look at this in the context of our housing price prediction problem we see we have the input features here which is going to go into our input shape they'll pass through the hidden layers which will learn the patterns between the input features and the output variable and then it's going to come out into our output layer which is going to output something like this and then the loss function is going to tell us or tell our neural network how wrong the relationships between these and this is and then the optimizer will tell our neural network
how to improve the patterns it's learning between the input variables and the output variable and then here the fit model is telling our model to look at a whole bunch of different examples in the training data for 100 laps of the data so that's what the epochs variable stands for again a lot of things we haven't covered here but as i said we're going to be writing a lot of code and actually i think it's about time we did that so we've looked at some code here we haven't written code so we've gone against
our rule if in doubt write the code let's see how we'll create some of this and yeah we'll create some data for regression problems we'll write some code and we'll get hands-on with neural network regression models we've covered some of the fundamental principles of regression modeling with tensorflow specifically what is a regression problem but now let's see how we might write some code to actually do that or to actually work on our own regression problems i'm going to start a new web browser here i'm going to come to colab.research.google.com i'm going to create a new
notebook now we'll wait for this to load up just going to zoom in here there we go now i'm going to call mine 01 neural network regression with tensorflow and i'm going to add the little video tag here this is the notebook that i'm writing during the videos if you want the ground truth notebook so the notebook that i'm getting the information for this video off remember you can go to the course github which is github.com tensorflow deep learning this will be linked throughout the course at the moment while i'm recording this it's still a
work in progress but if you come here to 01 neural network regression in tensorflow you'll have all of the information we're working on in a very succinct manner so there's a lot more annotations in this one in the notebook that we work on during the course this one here we're going to be focused on writing code so if you want all the commentary around it check out the course github specifically the notebook without the video tag at the end so i'll just close that and here we go so let's write our title in markdown so
introduction to regression with neural networks in tensorflow beautiful i'm going to turn this into markdown by pressing command mm on my mac might be control mm if you're using a windows machine so we're going to write here there are many definitions for a regression problem but in our case we're going to simplify it predicting a numerical variable based on some other combination of variables even shorter predicting a number that's what we're going to do we're going to be writing neural network code did i spell anything wrong here probably definitions is wrong predicting a number based on some
other numbers nice and simple actually that's probably a definition for neural networks in general but this is what we're going to start with and we had a lecture before that was what we're going to cover so refer to that one but let's just get started we need to get hands-on by first we're going to need some data so how about we do that what should we start with import tensorflow so we need to import tensorflow as tf and then we're going to check our tensorflow version so for this course we need two point something so
two plus and you'll notice here that we're not connected to a colab instance just yet but the beautiful thing is as soon as we run a cell it's going to automatically connect to a colab instance so we'll import tensorflow here wonderful 2.3.0 so this is the version that i'm on if you're watching this at a later date you might be on a later version but as long as you're two point something you should be right for this series of videos let's start off by creating some data to view and fit so we had a look
at what is a regression problem before regression analysis and it looks something like this so we have the blue dots could be our data points and our regression model is this red line through the middle so that's the relationship that we're trying to learn right between a dependent variable and one or more independent variables so let's create some data that looks like this how might we do that let's import numpy as np and we'll also import matplotlib.pyplot as plt so we need to create the features which we'll call x again the convention for
features is generally x in the form of a capital so this is np.array we're going to just create this manually maybe we start from my favorite number 7 or specifically negative seven negative four negative one you might be able to work out the pattern of these numbers that i'm creating as well that's what i want you to start thinking about is that whenever you're viewing data before you write a machine learning model or some sort of deep learning model to figure out patterns see if you can understand them yourself what's your gut
feeling so np.array we'll create our labels as well which is typically defined by y in a lower case so we'll go here 3.0 6.0 9.0 now if you were trying to figure out the relationship between x and y what might you do there we go we've got x and y now if we want to visualize it remember one of our other mottos is visualize visualize visualize we're going to see that a lot throughout the course boom plt.scatter okay we've got a very simple line here so you could think of x as our
independent variables and y as our dependent variable if we have a look at our regression analysis little diagram ours looks something like that but this has got many more samples we're starting nice and simple so there we go now what might we want to do with that so let's try to work out the pattern between x and y just as it is so x is negative seven where y is three and then x is negative four where y is six and negative one and nine two and twelve are you sensing a relationship here how might we manipulate
x to get y well i'll tell you the rule i just figured out y equals x plus 10. does this work x plus 10. do we get y beautiful so if we want to go y equals x plus 10 true true true beautiful so the relationship we would try to get for our neural network to learn is this here this is the relationship or the function between our x and y in other words our input features and our labels in wikipedia terms our independent variable and our dependent variable so let's have a look at what
our input and output shapes are so if we were to build a model between x and y what might be the input and output shapes of our model if we come back to our keynote remember we defined our input shapes for our housing prediction problem or housing price prediction problem if we were to take in these independent variables number of bedrooms number of bathrooms number of garages so bedroom bathroom garage the shape would be three and then the output shape would be one and remember these will vary depending on the problem you're working on so in
our case our problem if we just want to predict this simple line here it varies from the problem we're looking at here so if we had three input variables for one output variable here what might be the shapes of our input and output variables for this problem let's create a demo tensor for our housing price prediction problem this is a very important point for all the neural networks that you build so i want you to pay attention to this one so the house info equals let's create a tensor remember our input shapes to neural networks
are often numerical tensors but this one we'll just treat as a string tensor for now bedroom bathroom garage and now the house price is going to be tf.constant 939700 or something like that and now let's check out house info and house price ah okay there's our shape beautiful so what we've done is we've just converted here this demo input and output shape into actual tensors wonderful so the input shape is going to be three and the output shape is going to be one because we're using the house info to predict the house price
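As a minimal sketch of that demo tensor (assuming TensorFlow 2.x is installed, and treating the three feature names and the 939,700 price from the slides as placeholder values), the code might look something like this:

```python
import tensorflow as tf

# Demo input: three features, kept as strings for now
# (a real model would need these numerically encoded).
house_info = tf.constant(["bedroom", "bathroom", "garage"])

# Demo output: one number, the sale price we want to predict.
house_price = tf.constant([939700])

# Three input features go in, one value comes out.
print(house_info.shape)   # shape of the demo input
print(house_price.shape)  # shape of the demo output
```

The point of the sketch is only the shapes: three values in, one value out. The variable names `house_info` and `house_price` follow the ones spoken above.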
now in our case we want to use x to predict y so hmm let's have a think what might be the input shape so if we go input shape equals maybe it's x.shape and the output shape equals y.shape let's check the input shape and the output shape eight and eight does that make sense because really we just want to know what the y is here based on this sample of x so what if it was
just one sample so if we did x zero we want to use x zero to predict y zero so negative seven to predict three and we also want to use x one to predict y1 just like in our housing price demo we want to use one house to predict the price of one house or the input features of one house to predict the price of one house same for this sample problem we want to use one input feature of x to predict one y value same again here so what if we check the shape of
this huh from this it seems that our inputs and outputs have no shape how could that be i mean it's because no matter what kind of data we pass to our model it's going to take an input and return as output some kind of tensor but in our case because of our data set it's only two small lists of numbers here we're looking at a special kind of tensor do you remember when we went over the different types of tensors that you might come into we're looking at scalars here specifically a rank zero well scalars
is specifically a rank zero tensor so if we go x[0].ndim it has zero dimensions so that's why it has no shape so let's take a look at the samples individually so x[0] y[0] wonderful so in our case we're trying to predict or build a model that's going to take as input negative 7 and produce as output 3.0 so this is where you're going to run into a whole bunch of input and output shape trouble is when you get examples like this that don't really make sense but how i want you to
think about this is we're going to use one x value to predict one y value so keep that in mind and in the next video we'll see how we might create a model to do so in the previous video we created some sample data so some sample input features and some sample labels what we're trying to do is model the relationship between x and y in other words the relationship between our features and labels and we checked out our input and output shapes which are kind of confusing because when we checked out the shape of
our in our case numpy arrays actually let's turn these into tensors how might we do that turn our numpy arrays into tensors so remember from our tensorflow fundamentals section we can turn a numpy array or numpy arrays into tensors by just passing them to tensorflow.constant or tf.constant and then if we wanted to check the x.shape y.shape oh we'll just print x and y actually so there we go now our numpy arrays are in the form of tensors beautiful and we got a little confused when we checked the input and the output shape
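A rough sketch of the data and the shape checks so far, assuming NumPy and TensorFlow 2.x. Only the first few x and y values are spoken aloud above, so the remaining entries are an assumption that simply continues the same pattern up to the eight samples mentioned:

```python
import numpy as np
import tensorflow as tf

# Features (capital X) and labels (lowercase y). Values after the first
# four are assumed: they continue the spoken pattern to eight samples.
X = np.array([-7.0, -4.0, -1.0, 2.0, 5.0, 8.0, 11.0, 14.0])
y = np.array([3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 24.0])

# The relationship we would like a model to learn: y = X + 10
print(np.all(y == X + 10))

# The whole arrays hold eight samples each...
print(X.shape, y.shape)

# ...but a single sample is a scalar: zero dimensions, empty shape.
print(X[0].shape, X[0].ndim)

# Turning the NumPy arrays into tensors does not change that picture.
X_tensor = tf.constant(X)
y_tensor = tf.constant(y)
print(X_tensor[0].shape, y_tensor[0].shape)
```

The names `X_tensor` and `y_tensor` are only used here to keep the NumPy and TensorFlow versions side by side; in the video the same names x and y are reused.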
so let's do it again while they're in tensorflow so input shape equals x0 dot shape and output shape equals y0 dot shape input shape output shape and we're going to get a little confused because they have no dimension here because what we're doing is when it has no dimension it's a scalar value so it's a single value but then we figured out by just investigating our data that we want to use one input value to predict one output value so that's our input and output shapes there now how might we build a model to do
that looking at our scatter of x and y how might we build a model to figure out the relationships here well we haven't covered that so if you don't know the answer to that that's perfectly fine but let's start with steps in modeling with tensorflow that's what we're going to cover here so the first one is number one is creating a model in here you're gonna define the input and output layers as well as the hidden layers of a neural network and if you're using deep learning that is of a deep learning model wonderful number two
we have to compile a model we need to define the loss function in other words the function which tells our model how wrong it is and the optimizer the optimizer tells our model how to improve the patterns it's learning and evaluation metrics so what we can use to interpret the performance of our model beautiful and then finally three is fitting a model so this is letting the model try to find patterns between x and y or features and labels beautiful so we have three steps here now if we come into we've got a beautiful
diagram here which is steps in modeling with tensorflow look at that a beautiful colorful diagram step one is we have to get our data ready so if we were working on an image classification problem to figure out what type of food is in an image we'd have to turn it into tensors but in our case we already have our data in tensors because if we have a look we've turned x and y into tensors so our data is already in tensors wonderful now step two once our data is in tensors we can build or pick
a pre-trained model to suit our problem we can use pure tensorflow or tensorflow hub we'll see that in a later video so that would be step one here would be creating a model specified to your problem wonderful then we fit the model to the data and make a prediction so that would be down here fitting the model but actually in tensorflow building or picking a pre-trained model often involves step one and two they're very synergistic with each other meaning if you do create a model or pick a pre-trained model once you instantiate it you basically always
have to compile it that tells tensorflow that hey i've got this model instantiated i've got it set up now i'm compiling it i'm telling you that i'm ready to use it so then we go into step three is to fit the model wonderful model.fit and we might fit it to the training data we might let it look at the training data five times and then we have number four is evaluate the model so once the model has found patterns in the training data we might evaluate our model on the testing data and then if we
keep going we can improve through experimentation and then we can save and reload our trained model but let's go back here let's see how we might actually go through these three steps here and what i'm going to do i'm going to turn these into bold so they're just a little bit more pretty because that's what we're doing here we're creating art so what i might do is set the random seed so we have some reproducibility tf.random.set_seed 42 i'm going to use 42 if you're wondering why i'm using 42 it's
the answer to the universe i'll let you look that one up and then we're going to go number one create a model using the sequential api you might be wondering what that means but we'll get into that in a second i'm going to write the code first model equals tf.keras.Sequential and then we're going to open bracket and then we have so this is basically saying to tensorflow hey i want to create a model and i want you to sequentially go through the following we're going to just make this with one layer
tf.keras.layers.Dense we're going to have one there we go you might be wondering why i'm using one because in our case what do we want to do we want to build a model to take as input one number and predict one number so that's why i have one here we go down we're going to go step number two compile the model so model.compile loss equals tf.keras.losses.mae so in our case we have loss mae is short for mean absolute error so let's have a look remember if you don't know
something mean absolute error what is this so in statistics mean absolute error is a measure of errors between paired observations expressing the same phenomenon hmm what about do we have images comparison of two observations where x one equals two mean absolute error this is what you're going to come across you're gonna come across a whole bunch of different explanations but what the best thing to do is to just check them out and see if something catches your eye so examples of y versus x include comparison of predicted versus observed ah comparisons of predicted versus observed
right subsequent time versus initial time and one technique of measurement versus an alternative technique of measurement what if we looked up just the function that we wrote tf.keras.losses.mae here we go computes the mean absolute error between labels and predictions wonderful so y_true y_pred alright ah and there's the function so it's the mean of the absolute of y_true minus y_pred so i'm guessing that y_pred is the prediction our model makes and y_true is the actual value it should be so we make it the absolute value here
so it's a positive number and then we get the average from that okay so mean absolute error is just saying well there's the absolute and there's the mean so it's just saying on average how wrong are our predictions oh okay that's a little bit easier than what it first looked like now the optimizer we're going to set it as tf.keras.optimizers.SGD now sgd is short for stochastic gradient descent wonderful now again if you're not sure what sgd is go what is stochastic gradient descent so i'll let you go through those in
your own time of what stochastic gradient descent is but just what you need to know for now is that an optimizer tells our neural network how it should improve and then we'll go here and we want metrics is we're going to use mae as well now a lot of functions in tensorflow if they have a shortcut name mae or sgd you can often use a string variable to define the fact that you want to use that specific function so in this case we could remove that and write sgd but we're going to remove that there
we go so that's our compile done and now we're going to fit the model which is model dot fit and we're going to fit it on x and y for five laps so this is what the fit function takes so we create the model we compile the model and we fit the model aka telling our model look at x and y and try and figure out the patterns and you've got five opportunities of going through all of the x values and all of the y values and trying to find those patterns or figure out the
relationship if we set this to 100 we'd say you have 100 opportunities of going through all of x and all of y but because we want to keep our experiments nice and short to begin with we'll only set it to 5. all right so before we run this we did say that we use a sequential api how do we find out what sequential means in tensorflow so if we wanted to find out what the docstring of this method here is oh there we go colab automatically opens it up but in my case i'm going
to press shift command space as it goes here so sequential groups a linear stack of layers into a tf.keras.model sequential provides training and inference features on this model all right so we get an example of how we can create a sequential model now in tensorflow and keras there's two different types or two main types of creating models we've used a sequential api here so we can go tensorflow keras now this is if i already know what the difference between the sequential and functional api is that's a little reveal of what we're going
to look into but this is how i'd find out information about tf.keras sequential i go here sequential groups a linear stack of layers into a tf.keras model now there we go tf.keras.sequential layers equals none here's a whole bunch of different guides i could go through to figure it out i could go through some tutorials if i wanted to and actually the documentation is showing us a different way so we could create this is what i also do sometimes i just copy the code sample i come back to my notebook i
copy it here and then below it i'm going to write exactly what it writes out here so model equals tf.keras.sequential and then i do here model.add tf.keras.layers.dense 8 input shape now this gives me the feeling of writing this actual code input shape 16 and so on now i want you to think about if you've seen this example here how might we convert our own example into this so let's try we'll go we've got our sample here we might go model equals tf keras and again
looking at this one dot sequential wonderful and then model dot add looking at this we want to put this in there model.add tf.keras.layers.dense one so there we go we've just reproduced the exact same thing here now the reason why i'm showing you this is because in tensorflow there are actually a fair few ways to do things so this is with the sequential api you can put your layers you can use the dot add method to add them to your sequential model or you can put them into a
list as we've done here and pass them to your model so tf.keras.layers.dense 1 would be the same as adding that there does that make sense if not that's all right have a little bit of a play around have a practice check out the we'll delete this cell here because we're going to use this methodology for the time being have a look at the tensorflow documentation if you'd like to have a little bit more practice rewrite all of this code for yourself but let's get back into here we've set up our model here steps in modeling
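The two styles just described (passing a list of layers versus calling the add method) can be sketched roughly as follows. This is only an illustration that assumes TensorFlow 2.x is installed; the names `model_from_list` and `model_from_add` are made up for the example and do not appear in the video.

```python
import tensorflow as tf

# Style 1: pass the layers as a list when creating the model
model_from_list = tf.keras.Sequential([
    tf.keras.layers.Dense(1)
])

# Style 2: create an empty Sequential model, then add layers one at a time
model_from_add = tf.keras.Sequential()
model_from_add.add(tf.keras.layers.Dense(1))

# Both models hold a single Dense layer with one unit
print(len(model_from_list.layers), len(model_from_add.layers))
```

Either style should end up with the same stack of layers, so the choice is mostly a matter of readability.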
we've created a model we've compiled the model and it's time to fit our model so you're ready to run your first neural network of this entire course i hope you are so let's try it out three two one look at that so we get a warning here what does this say dense is casting an input tensor from dtype ah from float64 to the layer's dtype of float32 which is new behavior in tensorflow 2. okay the layer has dtype float32 because it defaults to floatx so we discussed this before
of if you intended to run this layer in float32 you can safely ignore this warning if in doubt this warning is likely only an issue if you are porting a tensorflow 1.x model to tensorflow 2. no we're not doing that to change all your layers to have dtype float64 by default we can set the backend to float64. how might we just change our layer or our input data to dtype equals float32 because you see how we create it with numpy the default data type is float64 so let's change the
dtype to tf.float32 and same again dtype equals tf.float32 um with dtype float32 not hmm well what if we go tf.cast and then we pass here dtype equals tf.float32 and then we'll do the same here tf.cast dtype equals tf.float32 ah there we go we'll change this with dtype float32. so now we have float32 tensors what if we were to run this model again remember this warning was telling us that the layer uses float32 by default but originally our input x and y
were float 64. let's run this again boom wonderful so this goes very quickly so you can see here this is what the output is when you're going to fit a model is it tells us this is lap one of the data lap two of the data lap three this is how long it took and this is our loss function so how wrong our machine learning our deep neural network is when we're trying to predict x and y so right now on average when our model uses an x value to predict y it is wrong by
11.5 and then it slowly improves to be wrong by 10.9 and then because we've set our metrics to be the same as the loss function we get the same output here so this is loss this is our evaluation metric now if we wanted to use our machine learning model to make a prediction let's check out x and y so right now we have a trained machine learning model so this is tried to work out the patterns between x and y by doing five laps over the data so let's remind ourselves of what x and y
are there we go so negative seven three negative four six negative one nine now try and make a prediction using our model so let's go model.predict this is how we can make a prediction with our trained model if we wanted to make a prediction on x equal to 17. so if we had another value on the end here right if we added x equals 17 to the end here what do you think the output should be for y if you guessed 27 you'd be correct because that's the pattern between our x and y is that
y equals x plus 10. so let's see what our model learned ah our model predicts that if we had an x value of 17 the y value should be 12.7 now that's pretty far off but as we can see from our loss and mae values that actually reflects what our model's training output has shown us here which is that on average our model predicts something that is basically 11 points off where it should be so if we wanted to let's go y pred equals model.predict have a look at y pred so this is on average
11 points off where it should be y pred plus 11 because i'm getting this value here what does that give us 23.7 so it's still off so we fit a model but it doesn't find the correct patterns between x and y so what we might look at doing in the next video is seeing how we can improve our model so let's check out that in the next video we finished up the last video with you writing your first neural network with tensorflow code specifically we stepped through the steps in modeling with tensorflow we created
a model we compiled it and we fit the model to our data however our neural network didn't turn out very well i mean we tried to make a prediction on a new piece of x data and the output was pretty far off where it's supposed to be you might be thinking hey daniel we kind of stepped through this code relatively quickly without discussing the concepts behind it and you'd be 100% right with that but there's a reason for it we came to our keynote we checked out this slide steps of modeling with tensorflow we had
a brief look at our workflow what we're sort of working on throughout this course we had an even briefer look at the steps in creating a model compiling a model and fitting a model and then of course evaluating it well we do have a bit more of an in-depth guide as to the steps in tensorflow modeling more specifically talking through the code we wrote now this is an important point i want to emphasize and it's going to be a theme throughout the rest of the course is that i would rather you learn the concepts by writing
code than by looking at and reading a slide so i'll say that again i would much rather well i held this slide back because i would much rather you write code than spend time reading slides so if we got the steps here number one construct or import a pre-trained model relevant to your problem number two compile the model in the compilation we define the loss we define the optimizer we define the metrics number three we fit the model to the training data so that it can discover patterns epochs is how many times the model will look at the
training examples and number four is evaluate the model on the test data now these steps don't necessarily come in this order all the time but generally you can adapt them to whatever problem you're working on so with that being said we've built our first neural network with tensorflow but it didn't perform very well let's have a look at how we might improve our model so i want you to take a guess we've got three steps creating the model compiling it and fitting it based on what you've seen so far throughout the course what do you
think are some steps that we could do to improve our model's performance now if you're not sure that is completely fine i just want you to have a think about it before we start going through it remember how we define a model it's got a number of different hidden layers it's got a number of different hidden neurons in each of those layers and we fit it for five epochs which means it looks at the data five times so they're just a few little hints but let's get on to here if we wanted to improve our
model let's take a look at the three steps we used before to create a model i want you to think about this is that when we create a model using these three steps we can actually improve a model by altering how we went through each of these so when we create a model we can improve it via steps in the creation when we compile the model we can improve it by steps in the compiling and when we fit a model we can improve the model by steps in the fit method so let's write that down
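As a reference point for the improvements discussed next, the three steps (create, compile, fit) can be sketched as below. This is a minimal sketch, not the exact notebook cell: the data values are reconstructed from the y = x + 10 pattern and the eight samples mentioned in the video, and the `tf.expand_dims` call is an assumption added because newer TensorFlow versions expect the features to have an explicit feature dimension.

```python
import tensorflow as tf

# Toy data following y = x + 10 (eight samples, values assumed from the video)
X = tf.constant([-7.0, -4.0, -1.0, 2.0, 5.0, 8.0, 11.0, 14.0])
y = tf.constant([3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 24.0])

# 1. Create the model (one Dense layer with one unit)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1)
])

# 2. Compile the model: loss, optimizer and metrics (string shortcuts)
model.compile(loss="mae", optimizer="sgd", metrics=["mae"])

# 3. Fit the model; X is reshaped from (8,) to (8, 1) so each sample
#    has one feature
history = model.fit(tf.expand_dims(X, axis=-1), y, epochs=5, verbose=0)

# One recorded loss value per epoch
print(history.history["loss"])
```

Each of the three numbered steps is a place where a change can be tried, which is the idea the following list builds on.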
we come here we can improve our model by altering the steps we took to create a model so number one is creating a model if we wanted to improve our model here here we might add more layers increase the number of hidden units also called neurons within each of those layers within each of the hidden layers and we might change the activation functions of each layer number two when we're compiling a model here we might oh here i've written air up here here we might here we might change the optimization function or perhaps the learning
rate we haven't looked at this parameter yet or hyperparameter should i say but we will in a second learning rate of said optimization function and number three when we're fitting a model here we might fit a model for more epochs in other words let our model look at the training data more times so leave it training for longer or on more data so give the model more examples to learn from now this is a very brief overview of how we might improve our model we do have a dedicated slide for looking at this so improving
a model from our models perspective that is so we might start with a smaller model and then we might build it into a larger model now before we continue with this we've discussed the concepts of improving our model now what i would rather do instead of going through each of these one by one is let's start to code this larger model so we'll do that in the next video i'll see you there welcome back we finished the last video by looking at this slide here comparing our smaller model to a larger model now there's a
few key differences between these two you'll notice in step one here the larger model has three hidden layers or a total of four layers but we've added three layers here to the beginning whereas this one only has one layer this large model has four layers so we've increased the number of hidden layers the number of hidden units or the neurons in each of the hidden layers is 100 whereas this one only has one hidden layer with one hidden neuron and then if we come down here to step two compile the model let's have a look
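Step one of the larger model described above (three hidden layers of 100 units followed by one output layer) might look roughly like this. The relu activation on the hidden layers is an assumption for illustration, since the transcript does not state which activation the slide uses, and `larger_model` is a made-up name.

```python
import tensorflow as tf

# Three hidden layers with 100 units each, then a single output unit
larger_model = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation="relu"),  # activation is assumed
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(1)
])

# Four layers in total: three hidden plus the output layer
print(len(larger_model.layers))
```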
what's the main difference here the main difference the loss is the same but we've changed the optimization function from sgd to adam so if you're not sure what the adam optimizer is just for now remember the optimizer tells our model how it can improve and the adam optimizer is a very common and very useful optimizer so you'll often this will often be the default optimizer that you start with we've also got a lr parameter here in our optimizer where this one doesn't have it so this is lr is short for learning rate in other words
when our adam optimizer tells our model how to improve how much should it improve each step so the higher the learning rate the more the adam optimizer pushes the model to improve whereas the lower the learning rate so say for example if we had more zeros here the lower the learning rate the smaller the steps our optimizer tells our neural network to take to improve and now finally in fit the model we see here we've got fit x train subset y train subset whereas here we've got x train full y train full and
epochs equals 100 versus epochs equals five so what you'll often do with your data sets is split them into a subset say for example 10 percent of your training data rather than the full data set so you can run many smaller models to make sure that they work before upgrading to a larger model because this larger model here may take more time to run or time to figure out patterns in the data set than our smaller model and what do we want to do as data analysts and machine learning engineers is we want to run
as many experiments as possible to figure out what doesn't work before we increase the parameters of our experiments and try run some larger models so let's see we're going to do it one step at a time let's recreate this larger model but as i said we'll start it we won't change every single thing we won't change the number of layers to begin with we won't change the optimizer let's just see if we can improve it by increasing the number of epochs so this is a very similar model to what we've built let's do one upgrade
to it by letting it look at the training data rather than five times let's let it look at the training data 100 times so if we come back so let's rebuild our model so number one is create the model again you could scroll up if you wanted to but i'm just going to recreate the same model we created before tf.keras.sequential and if you can't remember each of the three steps so create the model compile the model fit the model don't worry because this may be the first time or very likely is the first time you've
ever written this code however by the end of the course after you've built i'm not even sure how many we're gonna build i think it's well over a hundred once you've built a hundred plus models these steps you'll start to be very familiar with and even now like i've built probably thousands of these is i still make mistakes and i still get things wrong and have to tweak them so what we've done is we've created the model which is just the exact same step we did before it's a sequential model meaning it's going to run
from top to bottom so we're going to pass through this one layer we've defined our loss function as mean absolute error we've defined our optimizer as stochastic gradient descent and then here we're going to define our evaluation metric as mean absolute error as well and now this time in step 3 fit the model we're going to go this time we'll train for longer we go model dot fit x y epochs equals 100. so this is the exact same model as we created before up here the only difference is here we've changed epochs from five to
one hundred so remember the final loss and the final mae for this after five epochs was around about eleven so let's just remember eleven that's the number we're trying to get lower that's what we're trying to improve by increasing the number of epochs so let's try run this model you should be very excited too oh see what i mean i've written thousands of these but i still get it wrong it's optimizers not optimizer i missed the s on the end run it again beautiful look at that oh nice and quick how exciting is that epoch
so if we come down here oops scrolling everywhere right up to the top so we see here we've got epoch 1 out of 100 and then this is where our previous model stopped after 5 epochs so again the error is very similar it's very close to 11 but then watch as we increase the number of epochs so 6 7 8 9 10 that's double has the loss gone down it has so we're getting closer to 10 after 10 epochs and now let's keep going down to 50 epochs here we go 50 so about halfway through
oh we're almost at seven loss okay let's keep going all the way down and we get to a hundred wow okay so we're just below seven here this is amazing so just by altering one hyper parameter of our model specifically the number of epochs it's gone through we've decreased our loss and our mae mean absolute error from around about 11 to around about seven so let's see what our prediction is going to be so let's remind ourselves of the data so we have x and y what did we try to make a
prediction on before we tried to make a prediction if we added another value here 17 which should come out to be about 27 based on the other values of x and the other values of y let's see if our model's prediction has improved so we go here model if we want to make a prediction predict we pass it 17 the predict data has to be in the same format as the training data here so it's a float there and ready three two one boom oh yes look at that so much closer 29.73 to what
it actually should have been is 27 so much better than before what did we get before let's come up here we got 12 before so which is i mean it should have been 27 that's about 15 off so we've reduced our error to be about three off how cool is that by just tweaking one little parameter of our smaller model and turning it closer towards being a larger model now what if we were to alter another one of our models parameters what i'll do before i do that before i write the code i'd like you
to give it a try so you see here we've got our model code here i'd like you to rewrite this cell however this time keep the number of epochs at 100. i want you to choose one thing from this larger model slide here it could be the optimizer to change you could add in here one hidden layer or you could add in all three to the creating a model step but i'd like you to try one change and then in the next video i'm going to go through one of the changes i'm not going to
let you know which one but just try one of these for yourself and see if it improves our model's results just go through the exact same steps that we've done here creating a model compiling the model fitting the model except this time maybe add in a hidden layer here tf.keras dot dot i'll let you complete that or change the optimizer to adam and see what happens so give that a try and in the next video i've got to decide what change i'm going to make and we'll see if we can improve our prediction
even further welcome back in the last video we saw how we can improve our model's prediction capabilities by just increasing the number of times it looked at the training data now important point to note here is that although we've seen a couple of ways of how we can improve our model it's not always the case that any one of these parameters here or hyper parameters will result in an improvement so i also issued you the challenge to see if you could change one of the hyper parameters here in creating the model or compiling the model
and fitting it to the data keeping the number of epochs the same and seeing what happens so depending on what you tried it might have improved the model might not have but let's see if we can make another change to improve our model so now i'm going to write some code to get even closer towards our smaller model becoming our larger model my change i'm going to choose by adding an extra layer here so rather than three layers i'm gonna make the smallest change possible and this is what i'd like you to think about as
well going forward is that you don't necessarily always have to add three hidden layers or add ten hidden layers you can just adjust one thing on your model try it out and see how it goes in fact that's how i would like you to run your experiments is many many small changes rather than always doing extremely large changes because otherwise if you do too big of a change you might not be sure what caused the improvement or non-improvement of your model so let's go here let's try oh i'll keep that output there so this time
create the model this time with an extra hidden layer with a hundred hidden units so let's go here model equals tf.keras.sequential open a list i like to come back right to the start here and tab in tf.keras.layers.dense 100 and the activation on this one is going to be relu we'll have a look at what that is in a later video but if you want to test yourself at researching have a go at just searching what is relu activation or however you want to pronounce that so tf.keras.layers.dense one
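A rough sketch of the experiment being set up here, with the extra hidden layer in place, is shown below. The compile and fit settings mirror the ones the video goes on to describe (MAE loss, SGD optimizer, 100 epochs); the data values and the `tf.expand_dims` reshape are assumptions carried over from the y = x + 10 toy data set, not a copy of the original cell.

```python
import tensorflow as tf

# Toy data following y = x + 10 (values assumed from the video)
X = tf.constant([-7.0, -4.0, -1.0, 2.0, 5.0, 8.0, 11.0, 14.0])
y = tf.constant([3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 24.0])

# 1. Create the model, this time with one extra hidden layer of 100 units
model = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(1)
])

# 2. Compile the model (same loss, optimizer and metric as before)
model.compile(loss=tf.keras.losses.mae,
              optimizer=tf.keras.optimizers.SGD(),
              metrics=["mae"])

# 3. Fit the model for 100 epochs
history = model.fit(tf.expand_dims(X, axis=-1), y, epochs=100, verbose=0)

# Predict on an unseen value; the result varies from run to run
print(model.predict(tf.constant([[17.0]]), verbose=0))
```

Because the weights start from random values, the loss and the prediction for 17 will differ between runs, which is one reason to judge the change on unseen data rather than on the training metrics alone.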
beautiful now we're going to number two is compile the model whenever you create a model in tensorflow you have to compile it compile the model model.compile the loss is going to be tf.keras.losses.mae and the optimizer is going to be tf.keras.optimizers.sgd now i just want to show you a little tidbit just to prove that it works is if you didn't want to write this out you can change this to be the string mae that should work fingers crossed and then metrics is going to be metrics has to be within a
list as well metrics is going to be mae now three is going to be fit the model we've got our data model.fit x y so the features come first as you see here model.fit this is the docstring x equals our features y equals the labels and then we're going to go epochs equals 100 so the only difference we've made to this model above is we've added this hidden layer here so that's the one change we've made so let's see how it goes shift and enter beautiful oh would you look at that right from the
start remember our first model after five epochs had an error of about 11. well this one's hitting about 10 but then after 10 epochs it's already just above oh it's already below our other model without a hidden layer so this one finished off with a loss of just around about seven and an mae of around about seven as well remember mean absolute error is about on average how wrong are our models predictions so if we come down here what did we finish up with here oh my goodness how cool is that our next model by
just tweaking one little thing by just adding an extra hidden layer here that's all the change that we made we've basically cut our mae and our loss in half so let's remind ourselves of the data so we have x y remember the little prediction test that we've been trying to make is if we had another x over here which was 17 just increasing in the same way that these numbers are increasing let's see let's try to make a prediction so we want model dot predict and we want to make a prediction on number 17 to
see what the y value might have been oh okay so it should be 27 but now we're seeing it's 31.2 huh if we come up here it seems that our previous model did better it got 29.7 which is closer to the actual value it should be 27 because y equals x plus 10. hmm i wonder why that is even though our loss and mae are lower what it might be doing is our model is overfitting meaning it's learning the training data too well so it's learning the patterns between x and y far too well so
when it sees a new x it's just relating it back to what it knows and the error that it's producing during training is not a really valid representation of what it's actually doing see the real way we evaluate our machine learning models is not the metrics it gives us from the training data it's the metrics we get from data it's never seen before now if we come back to our improving a model section we saw a number of different concepts here but let's step back through them and i want to emphasize as we've seen in
practice not all of them lead to an improvement of our model and as we'll see going forward sometimes the metrics you see here during training aren't necessarily representative of what you'll see for data the model hasn't seen before so we're going to cover that in a little bit but i just want to go back and step through some of the concepts that we've looked at for improving a deep model so the first one is adding layers we just saw that we added one layer and we got improved metrics during training so when i say during
training it's when you call the fit function however in practice trying to predict on a sample our model hadn't seen before the results weren't as good as when we didn't add an extra layer another way to improve a model is to increase the number of hidden units we also saw that if we come back to our model here we could build another one by going 50 instead of 100 and we see how this goes so we'll train the model that experiment ran nice and quick again we're getting a fairly low loss and a fairly low
mean absolute error but the real evaluation is on a sample that the model hasn't seen before and we get an even worse value so we come back here again this is just a great example of how improving a model or the steps that we're looking at here don't always result in an improvement of our model another way is to change the activation function which we've tried here so by default if we have a look at our docstring let's see this in practice activation if we go here get the docstring the activation is none
by default so what if we just did that none and we run what do we get here okay so we get a slightly higher loss value than before and a slightly higher mae but again those are just metrics during training we want to evaluate our model on a sample that it hasn't seen oh we're getting closer 29.5 that's strange we've reduced the number of hidden units and taken away an activation function and we're seeing an improvement we come here change the optimization function so in this example the larger model we're using the adam optimizer rather
than sgd so let's go back to our code and how about we change this from sgd to adam and we run it again oh so we get a slightly higher loss than before compared to using sgd but let's remind ourselves of the data the training data we already have labels for so what if we had a new sample that was 17 we want to figure out its label make a prediction 31 okay so it's worse again change the learning rate all right so we have adam and it has a parameter lr the learning rate is
if the optimizer tells our model how it should improve the learning rate tells it by how much so adam's learning rate by default command shift enter is 0.001 so how about we increase this we increase it by a factor of 10 and for this function here learning underscore rate can be learning underscore rate or it can be the abbreviation lr so let's run this again oh wow look at that our loss and mae are barely even one so theoretically this model should be really good oh 26.2 that's our best model so far because remember predicting on 17 the
ideal value should be 27. so in our case adjusting the learning rate of our optimizer has resulted in the best change so far so that's an important point i want you to keep these things in your mind as we go through and again if you've only just experienced this for the first time i'll give you a little hint the learning rate is potentially the most important hyperparameter you can change on all of your neural networks so just keep that in mind going forward don't worry too much about it now but just if you want to
take up one note from this slide just write down the learning rate is the most important hyperparameter of many different neural networks so let's go here so fitting on more data so in this case we have x train subset and x train full. well in our case we don't actually have more data than x and y so maybe that's our next experiment to try is creating a larger data set so for now we only have eight samples so in practice you'll probably have a lot more samples when you're building your neural networks let's come back
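The learning rate experiment described above can be sketched as follows. The settings are pieced together from the steps in the video (50 hidden units, no activation, the Adam optimizer, 100 epochs), and the value 0.01 is inferred from the default of 0.001 increased tenfold, so treat this as an approximation rather than the exact cell. The argument is spelled `learning_rate` here because `lr` is, as far as I know, a deprecated alias in later TensorFlow releases.

```python
import tensorflow as tf

# Toy data following y = x + 10 (values assumed from the video)
X = tf.constant([-7.0, -4.0, -1.0, 2.0, 5.0, 8.0, 11.0, 14.0])
y = tf.constant([3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 24.0])

# Hidden layer with 50 units and no activation, then one output unit
model = tf.keras.Sequential([
    tf.keras.layers.Dense(50, activation=None),
    tf.keras.layers.Dense(1)
])

# Adam with a learning rate ten times its default of 0.001
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
model.compile(loss="mae", optimizer=optimizer, metrics=["mae"])

model.fit(tf.expand_dims(X, axis=-1), y, epochs=100, verbose=0)

# Prediction for the unseen value 17; the ideal answer is 27
prediction = model.predict(tf.constant([[17.0]]), verbose=0)
print(prediction)
```

Trying a few different learning rates on the same model is a cheap way to see how sensitive training is to this one hyperparameter.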
we've got one more is fitting for longer now this is the first one we tried and it's actually probably one of the easiest to try because we just adjust this from five to a hundred so now we've seen a few different ways to improve our models and fit them to the training data let's have a look in the next video we're going to have a look at something probably just as important as fitting a model to data which is evaluating a model's performance so we've seen here our test example of trying to make a prediction
on a sample the model's never seen before and we keep getting this different number but how exactly do we tell how good our model's predictions are or how good the patterns it's learned or the relationships that it's learned between x and y let's check that out in the next video in the previous video we checked out a whole bunch of different ways of how we can improve our model we even tried a few of them well most of them actually we adjusted the number of hidden layers we changed the optimizer we changed the learning rate
and we saw that the learning rate actually had probably the biggest influence on how our model performed and we changed the number of times our models look at the data by altering the epochs and in practice this is actually a fairly common workflow so let's write this down in practice a typical workflow you'll go through when building neural networks is we'll start off with build a model and then fit it evaluate it and then tweak a model fit it evaluate it and then tweak a model fit it evaluate it yeah yeah yeah and now looking at
this you might think well have we actually done this and the good news is we have well that's just what we went through in the previous video we created a model we compiled it and then we fit it and then we had a look at the things that we can tweak in our models we tried adding a layer so we tweaked our model we fit it and then we evaluated it we tried increasing the number of hidden units then we fit it and we evaluated it we tried changing the activation function the optimization function the
learning rate we didn't try this fitting on more data but we're going to see that in this video or maybe the next one then we tried fitting for longer by increasing the number of epochs and because you can alter each of these they're referred to as hyper parameters remember hyperparameter is like a dial on your neural network that you can adjust to see how it improves whereas a parameter is usually the patterns a neural network learns so these are the things that we don't code ourselves so let's go back let's see some other ways for
evaluating our model we tried by making a prediction on an example the model hadn't seen before but what are some other options that we have well when it comes to evaluation there are three words you should memorize when we're building models you want to experiment experiment experiment but when we're evaluating models it's visualize visualize visualize so that's the most important step when we're evaluating our models now what should we visualize so it's a good idea what i mean by visualize we'll see in a second visualize but you can probably guess it means to when we're
looking at things like this can be hard to really understand what's going on but when we put them in a way that we can visually see what's going on so we might visualize the data so ask ourselves what data are we working with what does it look like we might also visualize the model itself what does our model look like we might visualize the training of a model so how does a model perform while it learns and we also might visualize the predictions we've done this one of the model how do the predictions of a
model line up against the ground truth the original labels we tried this one so this is lining up this prediction here against what it should actually be so if we try to predict on 17 that's our x value we know that y equals x plus 10 so it should be 27. let's really dig into these steps here a bit further by working on a little bit of a larger problem which is going to suit this step here so fitting on more data but first we'll need to make a bigger data set and we can do
that let's set up another x we'll use tf range to create a range of numbers between negative 100 and 100 with a step of four let's see what this looks like beautiful so we have 50 numbers here that start from negative 100 and increase by 4 all the way up to 96 because the maximum of 100 is excluded so 50 how many do we have in our previous x oh it's not saved up here there we go so eight so we've got about six times as much data that's a good amount for now and to do so
we're going to need to make labels for the data set so y can just equal x plus 10. this is the formula we want our model to learn this is a pattern we want our model to learn now let's have a look at y just as before there we go negative 90 yep negative 100 plus 10 is negative 90 all the way up to 106. so we have the same shape so one y value for every x value beautiful and now what should we do so it's a good idea to visualize the data all right
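The dataset described above can be sketched as follows. This is a minimal sketch that assumes TensorFlow 2.x is installed; the variable names `X` and `y` follow the spoken "x" and "y" in the video.

```python
import tensorflow as tf

# 50 evenly spaced inputs: -100, -96, ..., 96 (tf.range excludes the stop value 100)
X = tf.range(-100, 100, 4)

# Labels follow the pattern we want the model to learn: y = X + 10
y = X + 10

# One label for every input
print(len(X), len(y))
print(X[0].numpy(), X[-1].numpy())
print(y[0].numpy(), y[-1].numpy())
```

Because the step is 4 and the stop value is excluded, the last input is 96 and the last label is 106, which matches what is read out above.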
so let's visualize the data what we might do is create a plot so import matplotlib dot pyplot as plt google colab completes that for us so plt.plot x y what does this look like oh wonderful how about we get that in a scatter plot i think that's a better plot for this type of data yep we should have dots there now before we trained our model on x and y and then we evaluated it on a sample our model hadn't seen before now this is one of the most important concepts in machine
learning and deep learning in general and it's probably better explained using three sets so before we get further into visualizing the data the model itself the training of a model and the predictions of a model and even further into evaluating our model let's take a look at the concept of the three sets now if you're familiar with machine learning you may already know what the three sets are and so i would like you to apply your knowledge of the three sets and actually we're probably only going to use two sets so split x and
y into a training and a test set only if you're familiar with it if not just go straight into the next video if you are familiar with it split x and y into an 80 percent training and a 20 percent test set we'll see how to do that in the next video welcome back in the last video we touched upon the concept of the three sets so let's dig into what i mean by that so go here now in practice in machine learning you're often not going to fit and evaluate on the same data set and what
i mean by that is what we did up here we started with x and y and we just fit the model to all of the data in one hit and then we had to evaluate on our own custom sample whereas when you actually work on machine learning problems you're often going to have three different sets of data you're going to have the training set so the model learns from this data now this is typically depending on what problem you're working on typically 70 to 80 percent of the total data you have available and then we
want a validation set so the model gets tuned on this data so this is where you would tweak different things so if we come back up to here and we built a model we fit it we evaluated it then we tweaked it so say changing the number of hidden layers or changing the optimizer you'll often test how these tweaks affected your model's performance on the validation set so this is typically 10 to 15 percent of the data available again this will generally depend on how much data you have and then finally you have the test set
so the model gets evaluated on this data to test what it has learned this set is typically 10 to 15 percent of the total data available so if you had a go at creating your own training and test set you might have split it i think i said 80 and 20 so that's another valid training and test split if you are going to drop one of these sets usually you'll get rid of the validation set and you'll only have a training set and a test set again it will depend on how much data you have and it will
depend on the experiments you're running but we've spent enough time talking about these let's get into coding them up so here if you want another analogy you've got the slide here we have three data sets which is possibly the most important concept in machine learning is to have the training set which could be if you were studying at university your course materials so the the things you learn throughout the semester the validation set is the practice exam and the test set is your final exam to evaluate the knowledge you learned throughout the semester and what
we're going for here with the three data sets the training validation and test set is generalization so this is the ideal state we want our machine learning model or deep learning model to be in is the ability for a machine learning model to perform well on data it hasn't seen before so if you've learned well the course materials throughout the university semester you should be able to perform well on the final exam so something you haven't seen before if you've learned them well now this is exactly what we want for our machine learning models to
do is we want them to learn patterns on the training set so that it can perform well on samples that it has never seen before in other words perform well on its final exam so let's go back let's see how we might code these in practice so check the length of how many samples we have so len x this is the data we're working with we created this just before by x is tf range negative 100 to 100 wonderful now knowing this we're going to skip our validation set for now we'll see it later but
because we only have a relatively small sample 50 is pretty small in the world of deep learning what's a large one well again i would say minimum 100 plus samples for deep learning but we'll see in future videos different sizes of samples now the training set is typically 70 to 80 percent of the total data we have available why don't we use 80 percent and that means we're going to create a test set so in other words the final exam of 20 percent of the total data available so how might we do that well we can go
here split the data into train and test sets x train you'll often see or this is going to be the notation for a training data set throughout the course is just with the train tag after this is very common in practice sometimes you'll see something like train data equals this they usually just mean similar things and the same for x test equals or test data equals so just going forward if you see different variable names for training and test data they're often the same thing i mean sorry different ways of representing the same thing and
then we'll go x train so this will be the training data so we want the first because we're working with 50 samples we want the first 40 training samples this is 80 percent of the data wonderful and then x test is going to be the last 10 samples so last 10 uh testing samples 20 percent of the data beautiful and here we'll create y train as well just in the same format and we'll also create y train here oh we've got a space there oh we'll go to space there too y 40 wonderful and we can
have a look at length of x train and len x test so this should be 40 and 10 respectively beautiful so what we've done is we've created a training data set and a test data set we've also got y train which are the training labels and y test which are the test labels y test is not defined ah there we go see we're catching errors together beautiful so we've got training features testing features training labels testing labels now what should we do how about we go back to what our steps are in evaluating so when
it comes to evaluating we should remember three words visualize visualize visualize wonderful that's our data evaluation motto it's a good idea to visualize the data all right well we did that before before we split our data into training and test but now let's visualize our data split into training and test samples so how about we go here visualizing the data and we'll go now we've got our data in training and test sets let's visualize it again now if you're wondering what i'm putting here when i write text in colab notebooks or in jupyter notebooks
depending on what you're using i often write like text cells these are just kind of notes to myself or if i had to share this notebook with someone else it's information for them so they know what's going on kind of like a comment but just in a text format that's easier to understand so that was a little bit of an aside let's write some code so we want to set up a matplotlib figure because this time we're going to be plotting two samples of data our training and test so 10 7 is my favorite plot size
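The 80/20 split described above can be sketched like this. It is a minimal sketch, assuming the same `X` and `y` tensors created earlier with `tf.range`; the underscore variable names (`X_train`, `y_test` and so on) are the usual written form of the spoken "x train" and "y test".

```python
import tensorflow as tf

# Recreate the data so this block runs on its own
X = tf.range(-100, 100, 4)
y = X + 10

# First 40 samples (80 percent) for training, last 10 samples (20 percent) for testing
X_train, y_train = X[:40], y[:40]
X_test, y_test = X[40:], y[40:]

print(len(X_train), len(X_test))
print(len(y_train), len(y_test))
```

Note that this follows the video in simply taking the first 40 and last 10 samples in order rather than shuffling, so the test set covers the largest x values.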
so plot training data in what's a good color blue plt.scatter we're going to go x train first so we need to plot x train and y train we can use the c parameter of plt.scatter to set the color as b for blue and then we're going to label it with training data wonderful and now we're going to plot test data in my favorite color green plt.scatter x test y test we're going to set the color to g for green and we'll set a label as testing data beautiful and we want to show a legend so
we can tell the different data apart and we'll put this little semicolon at the end so we don't get the matplotlib output shift and enter what does this look like ho ho there we go we've got a very similar plot to just above so this one here except now we have our testing data sorry training data in blue and our testing data in green so any time you can visualize your data your model your anything it's it's a good idea so that way it's much easier to understand well in my case i find looking at
something like this much easier to understand than looking at something like this all these numbers in a tensor so in some cases you won't be able to plot your data if you've got more dimensions than just x and y sorry x there y there now what are we going to do referring back to our concept of three data sets in our case we haven't got a validation set we've got a training set which is the blue and the test set which is the green so what we want to do is build a neural network to
take in the training data here to learn the relationship between x and y we want our model to learn the relationship in the training data so that it can predict our test data so if we fed in the x values of our test data we want our model to be able to predict the y values so where this green line should be so now we've visualized our data in our training and test sets let's build a model in the next video to figure
out the patterns in the training data so we can make predictions on the testing data welcome back in the last video we split our data into training and test sets and then we had a go at did i spell that right no again another uh spelling mistake plenty of them throughout the course we had a go at visualizing our data more specifically comparing our training data to the testing data and we want our model to learn on the training data and we want our model to be able to predict the testing data in other words
given x what's the y value so let's have a look at how to build a neural network for our data now we've actually already done this so if we scroll back up we built a neural network very similar to what we need right up here so let's recreate something like this nice and simple neural network one layer and we'll create it compile it and fit it we'll come back down beautiful so step one i told you we're going to get lots of practice creating a model create a model so model
equals tf keras sequential and then put in one layer tf keras layers dense one hidden unit because remember with our x and y vectors or tensors we're using one x value to predict one y value hence why our dense layer has one hidden unit two we're going to compile the model model.compile we need to set a loss function tf keras losses dot mae we're going to set an optimizer which is telling our model how it should improve tf keras optimizers sgd stochastic gradient descent and we're going to set the metrics to be mae wonderful number three is
fit the model i'm going to do model dot fit on x train this time is different we want to fit on the training data only for let's go 100 epochs so notice here the important point we're fitting on the training data which is this blue line here so we want our model to learn the patterns in this training data to be able to predict the patterns in the test data so before we fit the model i'm going to comment out this line and i'm going to hit shift and enter i'm going to instantiate our model
here now because what i want us to have a look at is visualizing the model so we can get an idea of what our model looks like before we've even run it by running model dot summary ah what happened here a value error this model has not yet been built oh why is that build the model first by calling build or calling fit with some data or specify an input shape argument in the first layers for an automatic build now you might see why i commented out this line i did this on purpose because i
wanted you to see this error so build the model first by calling dot build so we could go model dot build and we see what that might do so builds a model based on input shapes received we could define the input shape there so that's one option that's the first one you could try that out for yourself or we could specify an input shape argument in the first layers for an automatic build so with that in mind let's see how we might let's create a model which builds automatically by defining the input shape argument in
the first layer let's see what that might look like because this is something you'll do quite often in practice is defining the input shape usually your neural networks can determine the input shape so this is what i mean by input shape is this parameter here input shape usually they can figure out the input shape on their own however sometimes you'll need to manually define it depending on what problem you're working on so let's see how we might do that so we'll set the random seed so tf random set seed for as much reproducibility as
we can so we'll create a model this is just the same as above and we're going to go model equals tf keras sequential remember a sequential model just runs from top to bottom we're going to go tf keras dot layers dot dense one now here's what we're going to do the input shape argument we need to define that so how might we find the input shape remember what are we trying to do we're trying to predict y based on x so what is the shape of the data that we're passing our model x dot shape 50
but we want just one sample of x dot shape it's a scalar value so what if we do what is x0 and y0 what do they look like again they're just one number so in our case the input shape will be one because we're passing it one number to predict one number so there we go we've specified the input shape argument now again the shape might be different depending on the input tensor you're passing in you might have three different variables so you could pass the input shape as three but for our case we have
one input for one output that's what we're after now we might go number two is compile the model model.compile loss equals tf keras losses mae optimizer equals tf keras optimizers dot sgd stochastic gradient descent metrics equals mean absolute error so if we run this just the same model we've created before this is also the same as above now let's check out our model dot summary whoa okay we've got a few things going on here so calling dot summary on our model shows us the layers that it contains the output shape and the number of parameters
of each layer so the output shape here is remember we want one input for one output so the output of one that makes sense the layer here is of type dense so another word for dense is fully connected so if we go here fully connected layer images what this means in a fully connected layer is that all of the neurons here are connected to all of the neurons in the next layer so that's what fully connected means and in tensorflow a fully connected layer when you see that is the same as a dense layer so dense
is just another word for dense connections if you see all those connections there and then we've got parameter numbers here now there's a few different things here we've got total params trainable params non-trainable params so let's define what each of these are so the total params is total number as you might have guessed of parameters in the model these are the patterns that the model is going to learn so remember when we had a look at our overview of a neural network it creates tensors of different values so patterns the total number of parameters are
how many different patterns our model is going to try and learn within our the relationship between where's our x and y data here the relationship between our x and y data so we come down and the trainable parameters these are the parameters the patterns the model can update as it trains so in our case the total number of parameters here two is equal to the trainable parameters that means all the parameters in the model are trainable so they can be updated you might be wondering when is ever total params different to trainable params and different
to non-trainable params well when we later on in the course when we import a model that has already learnt patterns in data what we might do is freeze those learned patterns so in that case it might have a whole bunch of non-trainable parameters because we want that model that is already learned on data to keep its existing patterns we just want to train a few parameters and apply it to our own problem so let's write this down non-trainable params so these parameters aren't updated during training this is typical as we discussed when you bring in
already learned patterns or parameters from other models during transfer learning we haven't covered transfer learning yet but we will in a future video so this is a little bit of an overview of what you get of calling dot summary now if you want to have a a look at what the actual parameters are in a dense layer you're going to probably find something called a weights matrix and a bias vector so if we go here neural network weights and biases here we go fundamentals of neural networks on weights and biases so if we look at
this basic neural network structure so within our hidden layer here we've got hidden layer 1 hidden layer two hidden layer three we have a whole bunch of different parameters where's the weights and biases matrix there's a lot of good stuff going on here weights and biases this is the type of research that you're going to have to do hmm maybe that's not the best article but weights and biases is a great website here we go neural networks biases and weights we'll try another one or i've got a better question so here's an example of how
to do better research is what were we actually looking for now i i gave away there that it's a weights matrix and a bias vector but what were we actually looking for to begin with we don't know what the trainable params are so if we search what are the trainable params in a neural network learnable parameters here we go trainable params much better question so if we come back here this is an example of how to rephrase what you're looking for to find out in a better way so if we come here learnable parameters keras
python deep learning trainable params there's a video there have we got here model.summary there we go trainable params this model has far more 2515. there we go if you recall from that episode in our first convolutional layer we haven't looked at convolutional layers yet giving us a total of 2 515 learnable parameters i'll leave you to to do some some more research here but that's a great way to phrase a question another i have an external resource for you here while we're at it so this is one of my favorites we'll create the resource emoji
resource if you want to learn what the actual parameters are here that's going on remember we're focused on writing this code if you want to learn what's going on in the background i recommend this resource one you could ask a question like this to google or you could go to this resource for a more in depth overview of the trainable parameters within a layer so this will be one of the extra curriculum for this section check out mit's introduction to deep learning video so we've got mit introduction to deep learning and there's a full-blown course
here we go introduction to deep learning introtodeeplearning.com i'll leave this video by the way in the resources section so intro to deep learning oh they might be uploading it oh it's a 2021 look at that so this is the video i'm talking about by the time you watch this course they may have the 2021 version i will share the most relevant one to the course so this is intro to deep learning if you want to learn what's going on behind the code so the trainable params here you can find out in that video
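To connect the summary output with the "weights matrix and bias vector" mentioned earlier, here is a minimal sketch of the one-layer model built with the `input_shape` argument. The seed value 42 is an assumption (the value is not stated in this part of the video), and newer Keras releases may print a warning suggesting an explicit `tf.keras.Input` layer instead of `input_shape`.

```python
import tensorflow as tf

# Seed value is an assumption for illustration
tf.random.set_seed(42)

# One dense layer with one hidden unit; input_shape=[1] lets the model build automatically
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, input_shape=[1])
])

# Works now because the input shape is known, so the model is already built
model.summary()

# A dense layer stores a weights matrix (kernel) and a bias vector
kernel, bias = model.layers[0].get_weights()
print(kernel.shape, bias.shape)

# Total number of parameters in the model
print(model.count_params())
```

With one input feature and one hidden unit, the kernel holds a single weight and the bias vector a single bias, which is where the two parameters in the summary come from.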
otherwise i've got an exercise for you before the next video in this course oh dancing emoji you can dance if you want but here's your exercise exercise so try playing around with the number of hidden units in the dense layer and then see how that affects the number of parameters total and trainable by calling model dot summary i'll give you a demo so if we were to change this from one to three shift and enter and then hit model summary look what changes three hmm so that gives me an inkling that there's two trainable parameters
per hidden unit that's all i'm going to let you know for now because i want you to have a go at going through an external resource as well as adjusting the hidden units on your own and seeing what comes out for now all you need to do is think about these parameters as they are learnable patterns in the data we're working with so with that being said let's fit our model to the training data let's fit our model to the training data we can do this by going model.fit we commented this out before actually so
just the exact same thing here we can call that here x train y train epochs equals 100 and i'm going to set a little parameter here called verbose equals zero remember how can we check what that does verbose equals one by default what does verbose do here what does it say epochs verbose 0 1 or 2 for verbosity mode 0 equals silent so that means there will be no output 1 equals progress bar that's the default note that the progress bar is not particularly useful when logged to a file so verbose equals two is recommended when
not running interactively so verbose equals zero oh what happened there we didn't even get any outputs because we set verbose to zero if we set it to one we can watch our model train look at that we'll set it back to zero now remember because we're running this continually every time we call fit it's going to fit for an extra 100 epochs so we've actually just fit our model for 200 total epochs now if we call that again that's 300 total epochs because we've called fit for 100 epochs three times to reset that we'd have to go up
here reinstantiate our model get the summary i might change this back to one model.summary and then we're going to fit it and then we're gonna leave it at that and we'll continue on in the next video welcome back we finished the last video by fitting our model to the training data for a hundred epochs we set verbose equal to zero so we don't get any output now i did also set a little exercise here try playing around with the number of hidden units in the dense layer and see how that affects the number of
parameters total and trainable by calling model.summary so let's try that out if we call or we'll just get a summary of our model so model.summary okay and notice this number here continually increases because we've created a total of 14 sequential models so far at least in this colab instance so as i said before when you first start writing models you may not feel familiar but by the time this number approaches 100 plus when we've created 100 plus sequential models we'll be pretty familiar with what's going on and we had a look at the total params trainable
params and non-trainable params which are the patterns in our neural network and the parameter numbers here per layer and if you wanted to learn a little bit more about what they actually are we've got this extracurricular resource here so mit's introduction to deep learning video now let's do a little exercise if we were to change this to 10 we'll get model summary okay number of parameters equals to 20. if we fit our model it takes almost no time at all wonderful we get a summary of our model notice here that sequential went from 14 to
15 and dense went from 19 to 20 because we instantiated a new model here and then we've got a different output shape and a different number of parameters so because we've got 10 hidden units in our dense layer there seems to be two trainable parameters per dense hidden unit now there's one more way we can visualize our model and that's using the plot model function from keras utils so from tensorflow.keras.utils let's import plot model and then we can go plot model and we'll check the docstring here is it going to come up
we'll just error it out so sometimes when you write an import statement and the function hasn't quite been imported yet checking out the docstring by pressing command shift space or control shift space on windows might not come up until you've actually imported that function so that's why i just ran it it's not going to to run anything it's going to error out because it's missing one required positional argument but that's all right because we can check it here converts a keras model to dot format and save to a file okay what's the example tf.keras.utils.plot model
model let's have a look at what it looks like model equals model ah okay so we've got an input layer it's going to pass that to our dense layer all right so if we go show shapes so if we look at what this parameter does def plot model model to_file if we wanted to save this as an image we can plot it to sorry we can set the to_file parameter but if we want to see the shapes of our model show shapes is by default set to false so we want this to be
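The parameter counts seen so far (2 for one hidden unit, 20 for ten hidden units) are consistent with a dense layer holding one weight per input per unit plus one bias per unit. Here is a minimal sketch of that arithmetic in plain Python; the helper name `dense_param_count` is hypothetical and not part of TensorFlow.

```python
def dense_param_count(input_features: int, units: int) -> int:
    """Hypothetical helper: weights (input_features * units) plus one bias per unit."""
    weights = input_features * units
    biases = units
    return weights + biases


# With a single input feature, each hidden unit contributes one weight and one bias
for units in (1, 3, 10):
    print(units, dense_param_count(input_features=1, units=units))
```

This only explains why each hidden unit adds two parameters when there is a single input feature; with more input features each unit would add more weights.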
true remember a lot of the time we'll spend with our neural networks making sure our input and output shapes are correct so in our case we have a dense input layer with an input of one and a dense output layer with an output of 10. although this type of model here is relatively simple this plot model function is going to be very handy later on when we start to create more complex models with more hidden layers so we see here we've defined the input shape as one so that's why we have an input shape as
one and our output shape is 10 because we have 10 hidden units in our dense layer now how might this change if we created another tf keras dot layers dense one and i'm going to name it output layer and this can be name equals input layer and i'm going to name the whole model as name one of many models we're going to build all right model.summary now notice how this changed model is one of many models we're going to build of course you can name it something more specific to your problem but you also see
how we've changed the layer names so before that was like dense 20 or something like that we've now got a layer with the input layer name and we've now got a layer with the output layer name now this is very helpful if you have models with say 20 or even five layers confusing if you're not sure which layer is which so then we fit our model one of many layers we're going to build oh the value error so we can't actually call uh our model that so that's an invalid name so let's just call it
model one model summary fit the model and that worked wonderful model summary again there we go model one layer input layer layer output layer and let's see how this updates plot model beautiful that makes a lot more sense so the input layer input is one then we've got an input layer with a hidden size of 10 then we've got an output layer of one so the input this layer takes is 10 and the output it gives is one now again looking at this might seem quite confusing but as i said as we go on as
we start to build more models as we start to visualize them it's going to start to make more sense so with that being said let's uh stick with our theme of visualize visualize visualize and in the next video let's check out how we might visualize our model's predictions so have a go at creating your own different models maybe add some hidden layers here give them different names change the number of hidden units you have here and then fit it to the data get a summary see how high you can get this number of trainable parameters
and then plot the model here using the plot model utility and i'll see you in the next video in the last video we saw how we could visualize the different layers in our model and also get a summary of the layers in our model which included the number of total parameters the trainable parameters and the non-trainable parameters now let's have a look at how we might visualize our model's predictions so we can further evaluate how our model is performing and so to visualize predictions if i could type correctly it's a good idea to plot them
against the ground truth labels so in practice you'll often see this in the form of something like y test or y true versus y preds or y pred whatever you want to call it where it's this is the ground truth versus your model's predictions and so to do this we'll first have to make some predictions so we've got a trained model so let's look at how we'll make some predictions to create y pred now it's important here that y test is usually the nomenclature you'll see for the the test labels same with y true and
y pred is also a very common nomenclature like so variable name for when you're making predictions with your model or y preds whichever you want to do so model dot predict and we're going to predict on the test data set so let's have a look y pred see what our model comes up with okay wonderful we get a tensor in the same format as y test there we go so here are the ground truth labels and here are our model's predictions so in an ideal world these would be the exact same numbers so if our
model learned the data perfectly and could predict the test data set 100 percent these numbers would line up with these numbers so rather than going through them one by one and comparing let's see how we'll visualize them so maybe we want to how could we do this how about we build a plotting function to to figure this out this is in case we wanted to we're building a function because if we wanted to visualize our predictions going forward we're probably going to reuse this plotting function so let's create a plotting function this is actually i'll put
this down as a tidbit because i want you to remember something like this and put a little key here oh where'd my key emoji go that's what we want so note this is just a python concept in general too if you feel like you're going to reuse some kind of functionality in the future it's a good idea to turn it into a function nice and simple concept there so if we did want to plot it so let's go maybe def plot predictions and then it's going to take some training data which will by default equal
x train it'll also take some training labels which will be y train by default we also want some test data which can be x test by default and then what do we want test labels yes we need that and what comes after that oh of course we need the predictions so the predictions can be y pred by default wonderful and we'll give ourselves a little docstring here so what can we call it plots training data test data and compares predictions to ground truth labels nice and simple we could talk about what the we could
put in the in the docstring what the different variables are but i'll leave that up to you and so i'm going to start by creating a figure we'll set the fig size to my favorite size 10 7 also a great hand in poker plot training data in blue so that we're just basically taking the same plotting code as above so this is where this is coming from i'm not just pulling this out of the air we're just functionizing this you could just copy just copy this and bring it down but since we're in the habit
of writing code rather than copying pasting code let's uh get some practice so we want a scatter plot here and we're going to do train data and train labels so we're just taking this train data train labels and we're going to put the c into blue and we'll give it a label of training data wonderful and then we'll go plot testing data in green plt we want to scatter plot again the plots will differ here based on what kind of data you're working with so scatter is is pretty good for the problem that we're working
on here just a nice and simple regression so label equals testing data beautiful and here's where we're going to add the extra data set we're going to plot the model's predictions in red nice and simple color scheme plt dot scatter and we want to compare them to the test data so remember when we asked our model to make the predictions we passed it x test so our predictions already correspond to the x test values and they're predicting the y value so for our scatter plot for the predictions it's going to on
the x-axis plot the test data which is x test and then on the y-axis we want to plot the predictions we'll give that a nice color of red beautiful and the label can be predictions and of course we want to show the legend just in case we know the color scheme but in case we wanted to share this plot with someone else show them how beautiful our model is going that should work so plot predictions train data x train i mean we could we could look at this all we want but hopefully python will tell
us if we've got something wrong here now we're going to go plot predictions we could almost just call this as it is if all the variables are set up plot predictions boom oh look at that how good that's some good predictions there now you see how this is at least to me i'm not sure you can agree or disagree if you want but to me that's something to to really visualize how well our model is performing so ideally these red dots would line up perfectly with the green dots but to me that looks like something
hey if i wanted to present this to someone else they could pretty quickly pick up what's going on in terms of how good our model is now of course depending on the scale of this this graph here this distance here could actually be something that's way off now that is where we'd want to actually compare these numbers in a different way so i might just fill out this so we know what's going on so train data equals x train we don't have to do this but this is just for completeness train labels equals y train
test data equals x test and then we go test labels equals y test and then finally we finish the predictions with y pred wonderful all right so from the plot we can see that our model's predictions aren't totally outlandish however as we said before the distance here between these two depending on the scale could be a fairly large error so the way we can figure this out is with some evaluation metrics so let's cover that in the next video so evaluating our model's predictions with regression evaluation metrics so maybe before the next video you
can have a play around try and see if you can get those red dots closer to the green dots and how you would do that is you would try to improve our model so you go back up here maybe add an extra layer maybe change the optimizer maybe fit it for longer and see how close you can get the red dots to the green dots maybe you can get them to fully overlap but otherwise i'll see you in the next video welcome back last video we saw how we could evaluate our model's performance by visualizing
the predictions now that's one great way to do it but the next best way or you could actually say these are on par depending on what kind of problem you're working on is to evaluate our model with evaluation metrics now depending on the problem you're working on there will be different evaluation metrics to evaluate your model's performance and so since we're working on a regression problem two of the main metrics you'll see now again there are plenty you can look these up however two of the main ones you're going to run into are mae which
is mean absolute error we've been using this one so far which is basically saying on average how wrong is each of my model's predictions and then you have mse which is mean squared error which is very similar to mean absolute error however you take the errors from your model's predictions square them and then find the average so that being said before we implement these with tensorflow let's have a look at the keynote here we have a slide for some common
regression evaluation metrics so let's start off with the mean absolute error m-a-e here's the formula here written in fancy mathematical notation now this is just saying these are our labels here and x i which could also be represented by y with a little hat we'll see that in a second depends on where you find this formula i think i just found this one on wikipedia what this is saying is these are our labels and these are our predictions so across all of this when you see this this means sum this little fancy looking
e here it's the greek capital letter sigma now i want to give you a little tidbit here for math so math kind of gets a bad rap but it's actually a beautiful way of representing nature so when you see these symbols and i know when i first started in deep learning machine learning i saw greek symbols like this and it would freak me out but it's basically the same thing as learning code so it wasn't until i started to replicate these formulas by writing actual code that i really started to understand
them more and so when you see this this is a sum over n samples starting from the first sample so across all of our samples take the label minus the prediction and get the absolute value of that meaning that if the label is say 10 and the prediction is 20 this number would be negative 10 but we want the absolute value so remember if we did tf.abs which is absolute it would be positive 10 and then divide by the number of predictions that you have so all of the samples
that you're making predictions of and when you see this in combination so this sigma notation here with n in combination with the divide by n sign here we'll see another form of it in a second this is kind of like a fancy way of writing average and this is the absolute error and so we can do this in tensorflow code we're going to see this in a second and when to use this metric well this is a great starter metric for any regression problem it's very easy to understand because it's just basically saying i like
to think of it as on average how wrong are our model's predictions so then we go to the mean squared error so again this little setup here is a very similar way of writing the exact same thing as you see here so when you see one over n in combination with sigma with an n on top this is another fancy way of saying mean and this here is another very similar way of writing take our labels and subtract the prediction when you see a y like this with a little hat on
top this often stands for remember up in our code we created y pred so this y with a hat means y pred or y predictions and we square it so see here we took the absolute value in this one we just square it with this little two here and we can write the tensorflow code to do that and when should you use this one well this is a great metric to use when larger errors are more significant than smaller errors why because of this little square here so if you've got a model and making
an error of say an absolute error of 10 is okay but an absolute error of 100 is just a catastrophe you might want to use mse because it's going to amplify the error value for larger errors or larger values because of this little square here and then finally we've got huber which has a little bit more complicated formula we won't dig too far into that but again we can write this in tensorflow code but huber basically takes the combination of mse and mae and it's less sensitive to outliers than mse so mean squared error here
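The three metrics on the slide can be replicated in a few lines of plain Python, in the spirit of learning a formula by coding it. This is a minimal sketch, not TensorFlow's implementation: the names `mae_plain`, `mse_plain` and `huber_plain` are made up for illustration, the sample numbers are invented, and the Huber threshold `delta=1.0` is an assumption.

```python
def mae_plain(y_true, y_pred):
    """Mean absolute error: average of |label - prediction|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse_plain(y_true, y_pred):
    """Mean squared error: average of (label - prediction) squared."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def huber_plain(y_true, y_pred, delta=1.0):
    """Huber: squared penalty for small errors, linear penalty for large ones."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        error = abs(t - p)
        if error <= delta:
            total += 0.5 * error ** 2
        else:
            total += delta * (error - 0.5 * delta)
    return total / len(y_true)


# Invented example: one prediction is off by 10, the other by 100.
labels = [10.0, 20.0]
predictions = [20.0, 120.0]

print(mae_plain(labels, predictions))    # 55.0
print(mse_plain(labels, predictions))    # 5050.0
print(huber_plain(labels, predictions))  # 54.5
```

The error of 100 contributes 10 times as much as the error of 10 to the MAE but 100 times as much to the MSE, which is the amplification the slide describes. The Huber value stays close to the MAE here because both errors are above the assumed threshold.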
so with that being said these are some common regression evaluation metrics take note of this slide you'll see these in practice there are many more however let's get hands-on and start writing them with code we'll come down here all right so how might we start so if we wanted to evaluate our model using evaluation metrics one quick way that we can do that is evaluate the model on the test set we've got our trained model here we can go model dot evaluate x test y test because if we pass the test data set here
remember we want to train on the training data set evaluate on the test or validation data set so we come here check the docstring returns the loss value and metrics values for the model in test mode so if we run this what's it going to give us it comes in the order of loss and then evaluation metric so where did this mae come from so if we come back up here when we created our model remember when we compiled it we set metrics equals mae we also set loss to be mae
so for our model in this case because the loss and the metrics are the same the evaluate method is going to return the same figure twice so there we go now that's one quick way to start evaluating it how about if we wanted to run one of these evaluation metrics on its own without calling the evaluate method we just want to compare the prediction array that we have y pred to y test let's see how we might do that so if we come up here we want another cell and let's
go calculate the mean squared error actually let's do absolute error first because that's what we have already now before i do this i want to issue you a quick challenge i want you to see if you can figure out we've got this slide here we've got the tensorflow code for how to calculate the mean absolute error i want you to see if you can use one of these to calculate the same value as we've got here so you're going to have to compare y pred to y test so probably try typing out the code here
and then read its docstring and see how you might use these values to calculate the mean absolute error so give that a go and i'll see you in the next video and then we'll go through it together how'd you go did you manage to calculate the mean absolute error if you had a go and you figured it out well done if not let's see how we might do it so if we come back to the slide we've got the code here tf.metrics.mean_absolute_error maybe we'll try this one because we've already seen this
in our loss function so let's see what this is we'll start exploring the tensorflow library so tf.metrics dot mean absolute error there we go oh here we go computes the mean absolute error between labels and predictions beautiful that's what we want so that's the function that it actually implements so loss equals mean of the absolute value of y true minus y pred beautiful now how might we use this okay loss equals tf.keras.losses.mean_absolute_error y true y pred now tf.metrics.mean_absolute_error looks like it's the same as tf.keras.losses.mean_absolute_error well i mean
it should be because it's the same formula right now y true y pred do we have y true we don't have y true but we do have y test so let's type that in y test and y pred and actually we might save this so mae equals that and we'll set up the parameter names y true equals y test and y pred equals y pred see what happens oh that's interesting now mae here gave us one overall metric whereas this is showing us we've got a metric for each of our test labels and predictions
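A plain-Python sketch of how a shape mismatch can turn one overall metric into one value per row. It assumes, as a hypothesis, that the predictions carry an extra trailing dimension of size 1 (shape (n, 1)) while the labels are flat (shape (n,)), and that the subtraction broadcasts one against the other before a mean is taken over the last axis. Nested lists stand in for tensors, the numbers are invented, and `broadcast_mae`, `squeeze_last` and `mae_plain` are illustrative names, not TensorFlow functions. Exact behaviour may differ between TensorFlow and Keras versions.

```python
# Invented data: labels have "shape" (3,), predictions have "shape" (3, 1).
y_true = [10.0, 20.0, 30.0]
y_pred_column = [[12.0], [18.0], [33.0]]


def broadcast_mae(labels, pred_column):
    """Mimic (n, 1) minus (n,) broadcasting, then a mean over the last axis."""
    rows = []
    for (prediction,) in pred_column:
        # Each single prediction is compared against every label.
        abs_errors = [abs(prediction - label) for label in labels]
        rows.append(sum(abs_errors) / len(abs_errors))
    return rows


def squeeze_last(pred_column):
    """Drop the size-1 trailing dimension: [[a], [b]] becomes [a, b]."""
    return [value for (value,) in pred_column]


def mae_plain(labels, preds):
    """Element-wise mean absolute error for two flat lists."""
    return sum(abs(t - p) for t, p in zip(labels, preds)) / len(labels)


per_row = broadcast_mae(y_true, y_pred_column)
single = mae_plain(y_true, squeeze_last(y_pred_column))

print(len(per_row))  # 3 values, one per prediction row
print(single)        # one overall number
```

With mismatched shapes every prediction is measured against every label, so the result has one entry per prediction. Once the shapes match, each prediction is compared only with its own label and a single number comes out.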
why might that be well we've actually got our predictions here and our test labels here so what if we did y test minus y pred what happens there do we get the same output nope we get a very strange output there hmm what if we did y pred minus y test hm we don't get the same output there what's going on here let's have a look at y pred and y test now is it because this is not a tensor maybe y pred let's turn this into a tensor and see what happens so tf constant
y pred because if we do that have a look here this is what happens tf.constant y pred okay so now that's in a tensor format yep that's what we want this is in a tensor format beautiful this should work we go here oh same output as before hmm let's have another look at what's the difference between these two we'll go through it step by step tf tensor shape equals 10 1 d type equals float 32 now this is tf tensor shape equals 10 comma ah that might be where we're i see so they're not in
the same shape now this is a very important tidbit so if we ever want to compare different tensors remember how in a previous video we had to reshape our tensors so that they could be in the right shape to do a dot product well this is very similar when we're running evaluation metrics a lot of the time tensorflow is very smart but it can't infer the fact that our y pred tensor actually has an extra dimension here compared to our y test tensor so do you
remember how we can remove the one dimension here of our y pred tensor if not that's perfectly fine let's revisit the squeeze method so we go tf.squeeze and then pass in y pred what happens ah there we go much better gets rid of that one dimension there so now our tensors y pred and y test are of the same shape so let's redo this calculate the mean absolute error we could copy and paste that but we're going to practice rewriting it so calculate the mean absolute error so let's see what happens if we go
mae equals tf metrics dot mean absolute error yes that's what we want y true equals y test we're comparing and again this is another thing for all types of evaluation metrics it usually involves comparing the y true labels so the true labels versus predictions mae oh we forgot to tf squeeze our predictions shift and enter wonderful look at that how beautiful so we get the same result as our evaluate function up here nice remember how i said a lot of the time you'll be spending your efforts reshaping tensors to make sure that they're
in the right shape or the right format for whatever function you're trying to work with now we've done the mean absolute error how about we try the mean squared error so i'm going to leave this as a challenge for you calculate the mean square error so just as we've done before we've got the formula here mean square error try out this function here and see if you can calculate the mean square error before the next video how'd you go did you manage to calculate the mean square error if not let's see how we might do
that we'll go back to our little keynote here and we can see here mean squared error mse is very similar to mean absolute error so tf.metrics.mean_squared_error let's go back here and go mse equals tf dot metrics dot mean squared error y true equals the test labels and y pred equals our model's predictions that is mse run ah what's going on here we should have one value ah you know what we didn't do we didn't remove the single dimension from our predictions so let's do tf dot squeeze we're gonna go
here and let's see what happens there we go okay now mse will typically be higher than mae if we come back to the formula here because of this little square here so the squared errors are typically larger so the resulting single metric will generally be larger than mae so if we come back when should we use it well mae is a fairly intuitive metric says on average how wrong are our predictions however when larger errors are more significant than smaller errors so for example if being 100 off is far worse than being only 10 off
you may want to pay more attention to the mse now you could try huber but i'm going to leave that as an extension for you for now since we're going to be running some modeling experiments in some upcoming videos and because we also functionized our way to visualize our predictions let's functionize i mean this is pretty simple but let's make a little function for mean absolute error and mean squared error so that we can use both of them going forward so let's write here make some functions to reuse
mae and mse def mae it's going to take in some y true labels and y pred labels and then it's just going to return tf metrics dot mean absolute error y true will equal y true and y pred will equal y pred nice and simple and then the same thing for mse y true y pred return tf.metrics dot mean squared error y true equals y true and y pred equals y pred beautiful now do we need to do that not necessarily however it's good to have some helper functions ready to go for when we
run some experiments so let's do that in the next video so for the past previous videos we've uh made some predictions with a trained model we've visualized them so compared our model's predictions to the test data set we've also evaluated our model's predictions with regression evaluation metrics such as mean absolute error and mean squared error now the next logical step you might be thinking is how do we get these error values lower so how do we minimize the difference between our model's predictions and the test labels so in other words get these red dots to
be close to the green dots and so that's what we're going to tackle in the next couple of videos more specifically we're going to be running experiments to improve our model remember our workflow that we've discussed in a previous video usually start by building a model fit it evaluate it tweak it fit it evaluate it tweak it again fit it again and then evaluate it again so on and so on so if we come back to our keynote if the machine learning explorer's motto is visualize visualize visualize in other words taking a look at our data
taking a look at our model taking a look at our model's training we haven't done that but we will see that in a future video taking a look at our model's predictions and visualizing these wherever possible well the machine learning practitioner's motto because we're an explorer and a practitioner is experiment experiment experiment and so that's what we're going to do we're going to try to run a series of experiments to see if we can improve our model following this workflow we've already built a model we've already fit it we've already evaluated it
now it's time to tweak it a little we'll refit it we'll evaluate it again tweak it fit it evaluate it so let's see what this might look like in practice or if we remember back what are some ways that we can improve our model the top three are probably you'll see get more data so get more examples for your model to train on in other words more opportunities to learn patterns or relationships between features and labels number two would be make your model larger we've seen this briefly before so in other words using a more
complex model that's what you'll often hear larger models referred to as more complex so this might come in the form of more layers or more hidden units in each layer remember how we added some uh some extra layers to our models before we also increase the number of hidden units in each of the layers and number three is train for longer so give your model more of a chance to find patterns in the data so with these in mind how about since we've got our data set already we've got x train and y train this
is our data set how might we design we can't really do get more data unless we just artificially make our data sets bigger so we'll rule this one out but we can make our model larger so use a more complex model and we can train for longer so let's see let's design some experiments that we could do how about we do three experiments let's do three modeling experiments now what might you design how about model number one what could we do for this one so how about we do the same as the original model one
layer but trained for a hundred epochs number two could be model two this could be two layers trained for a hundred epochs and number three model three can be three layers or maybe we won't change that we'll try to keep it at two layers trained for 500 epochs so 500 chances to look at the data so see how we're we're just tweaking one parameter for each experiment so the first one is one layer trained for 100 epochs and the second one we're increasing the the number of layers but keeping the number of epochs train the
same and for the third one the difference from the second model is that the number of layers is the same however we're training it for longer this is sort of the mindset i want you to start getting into when you run your modeling experiments start with say a baseline model and then change one of the parameters for your next experiment then do the same for the next experiment and so on actually i'll issue you a little challenge after you've seen us setting up these experiments maybe you can create your own
four and five so you can design them yourself that would actually be some great practice but let's get started so the first one we want to do is we'll create this we're going to go build model one so what did we say same as the original model one layer trained for a hundred epochs so i'm gonna set the random seed for as much reproducibility as we can get favorite number 42 answer to the universe so step one is create the model it's model one equals tf keras dot sequential you could scroll up and see the first model
that we built or you could just take my word for it that it was just a single layer here tf keras layers dot dense one there we go and what's step number two when we're creating a model we've created a model what do we have to do we have to compile it so let's compile the model remember i said you're going to get a lot of practice writing this sort of code because that's what i want i've said it before and i'll say it again i'd rather you learn concepts from writing code than from looking at slides
so the optimizer is going to be tf keras optimizers dot what did we build the first model with oh look at that we get a few options here now there's actually a fair few optimizers built into tensorflow the two main ones that i regularly use are adam and sgd so we can go sgd i'll let you research the other available optimizers that tensorflow has we'll set the metrics to mae now if we go here fit the model model 1 dot fit we're going to go x train y train now
100 epochs that was the experiment design we were going to set up so should this work i think it should let's go shift and enter oh look at us already modeling experiments experts sorry so there's our first modeling experiment built now what might we want to do if we're running experiments what is a what does a scientist do well they track their results so how might we do that we've done it before we created some functions before what's the first one we might use how about we visualize or at least make some predictions with our
model and then we visualize them great idea make and plot predictions for model one so how do we make predictions with our trained model we can call or we'll set up the prediction variable we can call model one dot predict and we're going to make predictions remember on the test data because our model has never seen the test data and we're interested in how our model performs on data it hasn't seen before that's the the real value of machine learning plot we have a function plot predictions look at this look at us making our functions
so we can just easily call them back again predictions this is our docstring here plots training data test data and compares predictions to ground truth labels now we hard coded some of these parameters so the only one we have to change is the predictions variable or predictions parameter so predictions is going to be y preds 1 because we just created that y preds 1 let's see what it looks like how did our model 1 go oh not too good so we can see there's quite a large difference here between our testing data and predictions remember
ideally these would line up here but that's all in the nature of experimenting we're going to try and improve it so what else could we do well we also created some functions to calculate evaluation metrics thank you past us for creating such easy to use functions so create model 1 evaluation metrics so we want mae 1 for model 1 by the way equals mae y test and then we're going to pass it in y preds 1 beautiful and then we're also going to do mse 1 equals mse y test y preds 1
let's see what this looks like mae 1 mse 1 oh no what did we do wrong here we forgot to squeeze these variables tf squeeze to get them into the same shape now i guess what we could have done if we knew that all of our prediction tensors had to be squeezed remember what does this do if we visualize y preds 1 if you ever forget what something does just create a new code cell and run it there we go that's what it looks like on its own and then if we squeeze it y preds 1
it removes that one dimension so we have to actually turn this into a constant a tensorflow constant or tensor to see that shape so there we go as the predictions come out they have a one here and it's a first axis dimension but then if we squeeze it it removes that one dimension and it turns it into this tensor so shape 10 so we can get rid of that now and if we squeeze them we can run that beautiful so we see the error is quite substantially larger than what it was before this is the
mae so the mean absolute error so on average each dot is 18.74 away from where it should be that's the mean absolute error and if we square the errors we see the error is even larger now how about we change our mae and mse functions to automatically squeeze our predictions so let's go up here this is making our functions better so tf squeeze and go there and tf squeeze and we'll also go there shift and enter now if we come down here we shouldn't need to run tf squeeze here anymore get rid of the extra
bracket take note of the output here make sure it's the same there we go it is the same beautiful okay now that's model one done so what was our second model model two two layers trained for a hundred epochs now we didn't actually specify how many hidden units i'm going to let you decide how many hidden units you should put into this second layer so give that a try we'll come down here we'll go create a little note build model two and then we go we'll just set up so this is two layers trained for
a hundred epochs i better say two dense layers two dense layers also known as fully connected layers now our first model had one hidden unit and it's one layer but i'll let you decide if we're going to add a layer to model 2 so we'll set up model 2 equals something i'll let you decide how many hidden units there are and i'm going to make my decision on how many hidden units there are so give this a try before the next video building model 2 you could even calculate the model 2 evaluation metrics and plot
the results if you're game enough but otherwise we'll go through our second modeling experiment in the next video alrighty how'd you go did you set up model 2 did you run the experiments did it go better than model one hopefully these are these red dots were closer to the green dots but if not that's all right let's see how we can do it so what's our first step we might set the random seed tf random dot set seed for as much reproducibility as we can get so model two what did we say two dense layers
trained for 100 epochs all right so number one create the model tf keras sequential and then we're going to come down here so this one has two layers we have tf keras layers dense now how many hidden units did you decide if you built your own model how many did you decide i'm going to say 10 for this one remember this is really an arbitrary number we could set this to 1 we could set this to 10 we could set this to 100 i mean most often you'll find values like this that are evened
out you'll rarely find values that are like 67 but you could try that if you want so i'm going to try 10 tf keras layers dense and one there we go and i'm also going to what do we do after we've created the model what's our next step i've given you a little hint here compile the model now we have to do model 2 dot compile it's going to be the exact same thing the loss is tf keras dot losses dot mae and then the optimizer is going to be tf keras optimizers dot sgd just exactly the same
before and the metrics is going to be m oh let's try mse you know just to mix things up because we're going to calculate our evaluation metrics anyway so the only thing we've changed from our model up here is we've added an extra layer everything else is the same except we're calculating the mse as it's training it's important to note that the metrics here will be calculated during training and also if you call say the evaluate method as we've done before on a trained model so we come down and what do we have to do once
we've compiled our model step three fit the model so model 2 we're going to fit on the same data the training data of course x train y train how many epochs are we going for 100 epochs equals 100 you ready let's run our second modeling experiment boom oh wow the mse starts fairly high remember mse is often much higher than mean absolute error because of the squared value so 1084 does it decrease by the end surely it does okay finishes on 608 let's see let's plot our we got to make some predictions so make and
plot predictions of model 2 so we'll call this one y preds 2 equals model 2 dot predict on the test data of course and now we're going to plot our predictions using our beautiful plot predictions function and again all we have to change is this one remember this is so helpful if we're going to be running the same code over and over again it's helpful to functionize it we'll go shift and enter what does this look like oh yes very nice so our red dots are a lot closer to the green dots beautiful come
up here look at this one we did before looks like mud compared to what we've got here so now we've plotted some predictions now let's calculate some evaluation metrics so calculate model 2 evaluation metrics you might want to pause the video and just go blaze right ahead and finish this video yourself but if you don't want to do that let's rock and roll here mae 2 equals our function from before y test and now y preds 2 we don't have to squeeze them this time because we've implemented that into our mae
function and we can press shift command enter or we should have actually put a docstring of what's going on here but that's all right we can do that later on for completeness and then we want mse 2 equals mse y test compare it to our predictions with the second model and mae 2 mse 2 let's check them out boom doesn't that look great oh much better than before 3.1 and 13 i think this is actually very similar to a model we've run before but let's keep pushing on we're now up to our third modeling experiment so build
model three now what was the experiment that we set up for model three we scroll back up lucky we took note of this um here we go model three two layers trained for 500 epochs okay so the only thing we're going to change between model 2 and model 3 we could actually just copy and paste this if we wanted to and then change 100 epochs to 500 so we're fitting it for longer however as always we're going to favor writing the code out again ourselves so build model 3 it is two layers trained for 500
epochs now you could sit here and watch me write this out or you could just skip ahead and write out the model 3 code yourself plot the predictions and evaluate it using evaluation metrics and i could just see you in the next video otherwise if you want to stay on board and we'll write this together let's do it so let's set the random seed so tf random set seed and then we go what's step one create a model we are getting some prime experience here tf keras dot sequential open up our list we
come back what did we say two layers we're going to keep it exactly the same as model 2 so i believe it was a dense layer with 10 hidden units wonderful and then another dense layer with one hidden unit because that's going to be the output layer and then we're going to what do we have to do after we've created a model what's our step compile the model i mean by the end of this you're going to be like daniel you are a broken record but that's all right if you're writing tensorflow code like it's
nothing well then i've done my job keras losses.mae and we're going to go optimizer equals tf keras optimizers sgd wonderful i'm going to set up our metrics this time we'll revert back to mae just to spice things up oh too late oh well optimizers i know we've made a typo there classic and now we're going to fit the model so model 3 dot fit x train y train how many epochs are we doing it for what's our third experiment 500 oh no we did 5000 far out that would have been interesting i mean you
could do that that's a challenge for model 4 maybe 5 000 epochs we'll see how this one performs first shift and enter there we go it's going to take a little bit longer but not too much longer because it's only a small data set but look at that we can uh get ready to make and plot some predictions oh what is it before we do that mae i would have expected that to be lower maybe if we were training for longer but again don't uh just trust the training metrics let's uh visualize our models predictions
and run our evaluation functions so let's go y preds 3 equals model 3 dot predict and we're going to do it on the test data set beautiful and then our wonderful plot predictions function we're going to set the predictions variable to be the predictions for our third modeling experiment this should be preds 3. let's have a look oh golly gosh that's even worse than the first model that's terrible far out what's happened here you know what i think our model has trained for too long we've totally missed the goldilocks zone of where we should train
it for so this is a prime example of how tweaking some hyper parameters of your model even ones that you intuitively think should result in a better result actually don't lead to a better result and we've actually just experienced here probably the first uh first time we've come across our model overfitting which is a very important concept in machine learning but we're not going to cover it now i'll leave that for a future video however if you want to look it up i would search overfitting essentially it means that the model has learned the training
data too well and it doesn't generalize very well at all to data it hasn't seen before so give that a look in your in your spare time as a little extension to this video but let's uh let's run our evaluation metrics for completeness so calculate model 3 evaluation metrics we'll do our mae 3 for model 3 is mae we'll compare the y test labels to the y preds for the third model wonderful and then we'll do the mse3 equals mse y test y preds 3 then we'll have a look at mae 3 and we'll also
have a look at mse3 whoa now this is substantially higher than our other models that we ran i think model 2 was actually the best one now what we might do is rather than just scroll back and forth comparing our models' metrics like we could look at that one yep model 2 is the best let's put them into a more structured manner and so in the next video let's look at how we might compare the results of our experiments so we've run a few experiments now let's compare the results so in our workflow
we've said before come back to the keynote machine learning practitioners motto is experiment experiment experiment but as you might have guessed any budding practitioner or any budding scientist doesn't just experiment relentlessly they do a few compare them see which one works and then they disregard the ones that don't work and move forward with the ones that do so let's see in the next video how we might compare the results of our experiments welcome back so we've run a few experiments now it's time to compare the results of our experiments to see what works and what
didn't now again we said we could just scroll back up and check the numbers but that's not really a structured way like we've only done three experiments and already it's getting tedious to just go back through and see what works and what didn't so the beautiful thing is is that we've saved all of our model results to different variables so to compare them oh actually wait i just want to put a little tidbit here something that i just thought of as i finished the last lecture is note you want to start with small experiments and
make sure they work so by small experiments i mean small models and then increase their scale when necessary because what's our motto as a machine learning practitioner experiment experiment experiment and it's hard to run a lot of experiments when you're trying really large experiments all the time so start small like we've done up here with our first model model one we started with one layer and then we started to increase the complexity as our experiments went on so just keep that in mind when you're running experiments start small build up add complexity when needed so
let's go down here and if that doesn't make sense for now we're going to see that a lot throughout the course it's going to be our theme it's start small build up when needed let's see how we can compare our results so how might we do so well how about we create a pandas data frame which is just a table so we can compare our results of a different model let's compare our model's results using a pandas data frame so the beautiful thing about colab is that it's got pandas built in import pandas as
pd and then we'll probably set up our model results as a list of lists so we can pass it to our so model one can be mae1 mse1 and then model two can be mae2 mse2 and then model 3 can be mae3 mse3 beautiful and then to create the data frame how about we call it all results and then we'll go pd data frame and then the first thing we'll pass it is our list of lists model results and then we can set up the column names which we call the
columns and we have model mae mse that's nice and easy so model mae mse beautiful and now what does this look like this is going to work it worked but that's a little hard to read you know what we can do here we can just get the numpy value of all of these so we go numpy maybe we could have built this into our functions up above but because we're already here we'll just do some hacky code to get it looking nice but when you're writing your code you're going to make it beautifully
functionalized aren't you let's see this there we go that looks much better so from our experiments and comparing our results in this beautiful table i mean maybe if you've run a few more models down here you may have model 4 model 5 model 6 etc etc it looks like model 2 performed the best so what was our model 2 model 2 dot summary so it looks like it had two layers so one with 10 hidden neurons and the output layer with one hidden neuron and it was fit for we can come back up here we
can see it up here model 2 100 epochs all right so hmm what can we do with this you might be thinking comparing models is very tedious and it definitely can be because we've only compared three models here but this is part of what machine learning modeling is all about trying many different combinations of models and seeing which performs best and actually seeing which doesn't perform best so each model we ran is an experiment to figure out what doesn't work and i'm going to put a tidbit here is that or just right here looks like
model 2 performed the best wonderful so the tidbit the takeaway here is this is a follow-on from before so we said start with a small model and another one is one of your main goals should be to minimize the time between your experiments so that's why we start small and build up when needed because as we said before if you're having to wait like 10 minutes between each model experiment and sometimes you you definitely you will have to with a larger data set as we'll see in future videos but the more experiments you do the
more things you'll figure out now this is getting too big because it's not a markdown cell the more things you'll figure out which don't work and in turn when you know what doesn't work you will get closer to figuring out what does work takes a lot of trial and error remember the machine learning practitioners motto experiment experiment experiment by the end of this course you're going to have that tattooed on your brain as well as if in doubt run the code and as well as visualize visualize visualize and so another thing here is that what
you think might work such as with model 3 we increase the number of epochs so the number of times model 3 could look at the data sometimes intuitively you'll think it will work we mentioned this in the previous video but it in fact turns out to be the exact opposite so look what happened with model 3. so that's why experiment experiment experiment is so valuable now you're probably wondering it's like daniel hey we saved all of our results here and we kind of set up a pandas data frame and we had to save all
of our evaluation metrics to different variables i mean if we're you say here we're probably going to run potentially dozens of modeling experiments per problem that we're working on it's going to end up very tedious and a lot of different stuff all over the place and you're exactly right well the good thing is is that the people like machine learning practitioners around the world have developed solutions and so i'm going to put here tracking your experiments these are a couple of little extensions for now but we are going to cover one of them in the
future so let's just put here one really good habit in machine learning modeling as you may have seen is to track the results of your experiments and when doing so it can be tedious if you're running lots of experiments luckily there are tools to help us so the first one this is a resource we're going to put the little resource emoji here resource remember all of the resources exercises extensions etc are in the course github and there'll be links where you can find all of these things so resource
as you build more models you'll want to look into using so one of my favorite tools built straight into tensorflow is tensorboard so this is a component of the tensorflow library and again i'm just introducing the names of these things now but later on we're going to get hands-on so tensorboard is a component of the tensorflow library to help track modeling experiments a very important part of machine learning and we'll see this one later and another one of my favorite ones that links straight into tensorboard there's a
fair few of these on the market but this is the one i have the most experience with it's called weights and biases so a tool for tracking all kinds of machine learning experiments and the beautiful thing is plugs straight into tensorboard so if you're using tensorboard you can definitely use weights and biases so if we have a look we can search up tensorboard tensorflow.org tensorboard here we go tensorflow's visualization toolkit we're not going to dive too deep into this if you want to jump ahead and have a look at what tensorboard is capable of and
maybe even plug it into our models you can check it out otherwise the other one is weights and biases one of my favorite external tools for machine learning but this is since this is outside tensorflow we're not going to be covering weights and biases you can look into that as some extra curriculum if you want we will be seeing tensorboard later on but for now that's all we're going to cover for tracking our experiments in the next video if we've trained a model say model 2 and we wanted to use it say in one of
our applications what we might want to do is get it out of our colab notebook and somewhere else so let's look in the next video at saving our models so we've trained a few models and we found out that from our modeling experiments model 2 is performing the best so far so let's say we wanted to save that model like right now it's just sitting in our jupyter/colab notebook it exists as a python object here how would we save that and export it somewhere else so let's figure out how we might
do that i'll just put a note here saving our models allows us to use them outside of google colab or wherever they were trained such as in a web application or a mobile app so how might we save a model in tensorflow this is where i'd start if i wasn't sure i'd look it up there we go save and load models all right caution tensorflow models are code and it's important to be careful with untrusted code see using tensorflow securely for details i'll let you have a look at that but trust me we're not using
tensorflow maliciously here now we've got a few options here so defining a model they've created a function here to define a model model.summary here we go we could save checkpoints during training if we were training for a long time we're probably actually going to have a look at that later on in the course but where is the save model function what if we go save there we go model.save save the entire model as a saved model now in tensorflow there are two formats or two major formats of which you can save a model to the
first one is the saved model format so the saved model format is another way to serialize models models saved in this format can be restored using tf.keras.models.load_model ah okay so that's i'm guessing once we've saved our model if it's in saved model format we can load it in using this method here and are compatible with tensorflow serving hmm we haven't seen that but if you wanted to look it up that's something you could look that up the saved model guide goes into detail about how to serve slash inspect the saved model all right
well we've read up a little bit about it let's try it all right what's down here another format okay so it looks like there's two formats that we can save our models in but they're both using the save method and one has a dot h5 extension hdf5 standard all right let's try one at a time so how might we save a model or we'll put a note here there are two main formats we can save our models to number one the saved model format and number two the hdf5 format now i believe the
saved model format is the default actually i know it is because i have experience with this but let's just see so save a model using the saved model format now we can do so go model 2 dot save and then if we check out the docstring here saves the model to tensorflow saved model or single hdf5 format okay the save file includes the model architecture allowing to reinstantiate the model the model weights in other words the patterns our models learned the state of the optimizer ah allowing you to resume training exactly where you left
off that's handy alright so let's try it out so what was the parameter we have to pass in first file path or let's call it something simple you might want to get more creative with your model naming so i'm just going to call mine best model saved model format and shift and enter what happens here we get a whole bunch of warnings so i've looked at these warnings before and they come up a lot of the time when you're saving a model essentially what it is is some behind the scenes libraries that have been updated
and if you're just using the saved model format there you can safely ignore these you can also find what the warnings actually are in this documentation here but just for now trust me with the saved model format if these warnings come up it's all right unless you're getting some sort of failed the model hasn't saved correctly but in our case let's have a look at what happens we check the files tab google colab and here we go we get best model hello i'm not going to interrupt for very long but i just want to give you
a congratulatory message on making it seven hours seven minutes and seven seconds into this video all my screens have gone blank but i just wanted to remind you if you're enjoying what you're watching and coding along with if you want to sign up to the full version of the course because you've made it this far i have a little present for you if you go to the zero2mastery.io website and decide to sign up to the tensorflow deep learning course and use the code tflow you'll get 15 percent off your sign up price
but keep this a secret will you because i want to keep this as a surprise for people who've made it this far through the video anyway that's all i wanted to say use the code tflow for 15 percent off and don't tell anyone else you can tell your friends but please don't leave this in a comment in the video because people like surprises anyway i'll see you later saved model format and in this file we get a few things we get assets okay so we got what's in here and what's in the variables and we
get savedmodel.pb now this dot pb format is called a protobuf file again you can find more on this in the save and load documentation but to make sure that our model is saved correctly one of the best ways that we can do it is by loading it back in and checking it out but before we do that how about we save a model in the hdf5 format as well to test that out so let's search that hdf5 provides a basic save format using the hdf5 standard all right what's the hdf5 standard hierarchical
data format all right is a set of file formats hdf4 hdf5 designed to store and organize large amounts of data okay well i'll let you know that going forward you're probably going to start training some fairly large models and sometimes you might need to store them in a universal data format so something like hdf5 so something that you can pass around to many other different programming applications and tensorflow allows us to save our models directly to dot h5 by adding the h5 extension onto the end of our file path so let's have a look
at that we're just going to run the exact same code up here except one difference save model using the hdf5 format now which one of these should you use well it's really going to depend on your use case if you're staying within the tensorflow environment so you want to just use your model with pure tensorflow code you're probably better off using the saved model format however if you're going to use your model outside of pure tensorflow code maybe hdf5 is better off for you but we're getting a little bit ahead of ourselves
here just know that there are two main formats that you can save your saved models to the save model one is also the default so majority of use cases you'll probably be using the saved model format now let's see what happens if we run this see how we've just got the exact same code here but all we've done is we've changed it to have dot h5 on the end we'll run that beautiful now what happens ah there we go so you'll notice the main difference here as well is saving a model to the saved model
format we get a folder here and to the dot h5 format we get a single file now i mentioned before that a way that we can check to see if our models have saved correctly is by loading them in and testing them out again so much like we've evaluated our model 2 i'll just close this to get these evaluation metrics here if we load our model back in if the documentation is correct saying that it saved all of its weights and optimizer state when we load our model 2 back in from
its saved model format or the h5 format theoretically it should get the same results as before we saved it so in the next video let's see how we can both load a model in and re-evaluate it to make sure it's saved correctly welcome back in the last video we saw how we could save our models or a trained model and we saved them both to or model 2 which is the best performing model so far to both the saved model format and the hdf5 format so we've still got our models here in the files
tab of google colab now what if we wanted to load that model back in let's see how we do that loading in a saved model so if we come back to the documentation when we looked at load and save models there we go so new model equals tf keras models dot load model now i believe if we go load model there we go models saved in this format can be restored using the tf keras models dot load model method now the beautiful thing is is that with both formats in tensorflow the saved model format and
the hdf5 format we can use the same method to load them in so let's see how we do that which one do we want to do first load in the saved model format model and let's do what can we call it loaded saved model format equals tf keras models load model and then we're going to pass it the file path of this folder here now in colab you can do this by copying the path so i'll just show you that there we can copy the path there turn that into a string now again in colab you're
going to get a little uh content in the front of it i believe it works regardless of whether you have content in the front or there or not so we could do it like that hopefully that works otherwise we can just revert back to what we want and loaded saved model format how should we evaluate it oh we'll get a summary should we do a summary let's see if this works beautiful now is this the same as our model 2 above we've got two layers dense with 10 hidden units and another one here with
one hidden unit if we go model 2 dot summary is that the same thing yes beautiful now to see if our loaded saved model format model far out we're going to say model a lot here aren't we is actually still the same as model 2 because right now we've the architectures we've confirmed are the same and these are the same here however in the documentation it told us that it's going to save the weights as well so in other words the patterns model 2 has learned so to really check that how about we make some
predictions with model 2 and the saved model loaded model and then compare those predictions to make sure they're making the same predictions because if they are that means the patterns they've learned should still be the same so compare model 2 predictions with saved model format model predictions so this is kind of like we're writing a test to see if our models are the same thing our loaded model and our saved model and we're gonna go loaded saved model format preds equals loaded saved model format dot predict on x test and then to compare them
we could do let's just test them for equality how about we do that loaded saved model format this should return true false false false false false false false what's happened here let's have a look at model 2 preds and loaded saved model format preds five two eight one three nine nine one hmm they seem pretty comparable to me what we might have to do is how about we calculate the mean absolute error of each otherwise we might have to bring in a numpy function here so mae we want mae of y true equals y
test and y pred equals it's going to be model 2 preds and then we want to compare these two to m a e y true equals y test and y pred equals loaded saved model format preds so they should have around about the same error how does this go true ah beautiful so i'm not sure why their predictions are different you know why it might be it's because of how a computer stores numbers so if we go model 2 preds have a look at this maybe we'll squeeze that so it's uh by the way
you can do dot squeeze on the end if you wanted to there's a numpy method coming in there i believe yeah there we go and then if we do loaded saved model preds dot squeeze i believe these two should be very similar yeah so why aren't that if we copy this up here we're going against our rule of always writing code should this equal true true true true true true okay so i would say the reason why here we're not getting this correct is because dimensionality wise they aren't of the same shape is that why
but that doesn't make sense so this is this is i want you to to experience this this is me exploring uh an output that i didn't really expect while i was creating this video so i really want these videos to be as if like we were sitting side by side and coding and figuring things out together that doesn't work okay oh you know why because it's not preds what a simple typo did you catch that why didn't you tell me earlier this is a challenge we face when we're bringing in so many similar variable names
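as a side note, here is a minimal sketch of the kind of check being worked through here, assuming numpy is available and that both sets of predictions are plain arrays like the ones predict returns. the function name predictions_match and the sample numbers are made up for illustration and are not from the video. it squeezes both arrays so an extra dimension can't cause a mismatch, then compares within a small tolerance instead of exact equality, since floating point values can differ slightly.

```python
import numpy as np

def predictions_match(preds_a, preds_b, atol=1e-6):
    """Return True if two prediction arrays agree element-wise within a tolerance."""
    # Remove single-dimensional axes so (n, 1) and (n,) arrays become comparable.
    a = np.squeeze(np.asarray(preds_a, dtype=float))
    b = np.squeeze(np.asarray(preds_b, dtype=float))
    # Different shapes after squeezing means the outputs are not comparable.
    if a.shape != b.shape:
        return False
    # Tolerance-based comparison instead of exact == on floats.
    return bool(np.allclose(a, b, atol=atol))

# Hypothetical values standing in for the outputs of two models' predict calls.
model_2_preds = np.array([[70.5], [75.1], [79.8]])  # shape (3, 1)
loaded_preds = np.array([70.5, 75.1, 79.8])         # shape (3,)

print(predictions_match(model_2_preds, loaded_preds))
print(predictions_match(model_2_preds, loaded_preds + 1.0))
```

a helper like this also makes it harder to compare the wrong variables by accident, because both prediction arrays have to be passed in by name.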
now this should work okay true true true true true all right all that extra code there we we wrote well the news is we didn't have to write it so i mean this is you're going to see these things happen a lot of the time and i'm glad you're you're watching me make mistakes on the fly because as you learn you're probably going to make a lot of mistakes too now we've loaded in a model using the save model format how about we try the h5 format so load in a model using the dot h5
format and we're going to go loaded h5 model can be tf keras models dot load model and then we're going to pass it in just the file path we can come over here copy the path if we wanted to command v or control v depending on if you're on windows i'm going to turn that into a string get rid of that now again you don't necessarily need content but just to check if it works so that's the loaded model and we'll get the loaded h5 model dot summary just to make sure it's the same thing
does this match our model 2 summary i believe it does model 2 dot summary so first things first that's nice and easy to check the architectures are the same now we're going to check it to make sure its predictions are the same as model 2 so if we go here check to see if loaded dot h5 model predictions match model 2 so we can go model 2 preds we've already got that variable but we'll just have some practice writing predict on x test writing some more code it never hurts to write a little bit more
code and then we'll go loaded h5 model preds equals loaded h5 model dot predict on x test and then if we want to compare them we can remind ourselves to this time add preds on the end daniel equal equal loaded h5 model preds how do we look here true true true true true beautiful we could do the same again for working out the mae but i'm just going to if all the predictions are the same the mae should be the same as well so we can get rid of this extra one and maybe
move this one up to the loaded saved model format we'll just put a note in here of what we did so we don't confuse ourselves when we come back to our notebook compare the mae of model 2 preds and loaded saved model preds so what we just did we repeated ourselves writing code here if you wanted to sort of functionalize this going forward that could be one of the tests that you run so say for example you saved a model and then you loaded it back in you could write a little function to make
sure that the loaded model has the same prediction values or error rate or summary or something like that as your saved model that way you make sure you're using the same the model with the same learned patterns but otherwise in the next video we have our files here in colab how might we download them to our local machine if we wanted to use them outside of colab let's check that out welcome back so we've seen how we can both save a trained model and load it in within a google colab notebook and by the way
a lot of these methods will work in just a jupyter notebook but how might we download a model or any other file from google colab now this lecture is specifically if you're using google colab if you're using a jupyter notebook often times your files will be on your local machine or on your server hosted somewhere else but this is specifically if you wanted to get a file off google colab now we've got a couple of ways so the first one number one or if you want to download your files from google colab
number one is you can go to the files tab and right click on the file you're after and click download so let's try that out so files tab this is this one here we have code snippets we have search we have table of contents far out look how much we've covered so far you should be really proud of yourself but we'll come into the files tab because that's what we're trying to do and say we wanted to download our best model hdf5 format dot h5 we could just go here and click download now depending on
the size of the file will depend how long this takes but now if we check my files downloads i've got the best model hdf5 format saved here in my downloads so if i wanted to import that into some sort of application or into another coding environment i've got it saved and trained here now the second option to download a file from google colab is we can use code see the cell below let's write some code to download from google colab if we want to download a file from google colab we can import from
google.colab import files and then files allows us to files.download and then if we just pass we'll copy this one see if we can download it again copy path we pass in the file path of our dot h5 model shift and enter oh has no attribute downloaded typed that in wrong so don't type in downloaded like i did just type in download and here we go we're going to get a little loading bar and there we go we've got another copy of it so we'll check the downloads ah there we go we've got the same file
i've got a little one here because we downloaded it twice but that is a quick and easy way if you wanted to download your files from google colab remember depending on the size of the file and your internet connection it may take a little while to download so just be patient the other way is oh maybe a third way is if you wanted to while we're here you can save it to google drive by connecting google drive and copying it there so see second code cell below so let's see how we might
do that i'm going to mount my google drive so this is going to if you have a google account which you kind of require to sign into google colab it's going to mount your google drive which is online cloud storage hosted by google we're going to wait for this to load and when it loads we should see our drive file up here in the files tab beautiful so this is my google drive here if i open that up i've got my drive and i've got a few other things that i'm working on there wonderful so
if i wanted to save it here copy path or actually copy path there and let's save a file from google colab to google drive this requires mounting google drive so we can do bang cp for copy so we're going to copy this file here h5 so that stands for copy this file at this path to which path do we want to go to i believe i might have a tensorflow course i'm going to copy that path now you might have to create a file in your google drive called tensorflow course you might be able to
go new folder and create something like that but mine's called tensorflow course it's already existed so there we go copy best model hdf5 format dot h5 to content drive my drive tensorflow course let's see if this works out all right that looks like it went through pretty quickly because it was a relatively small file so let's go ls and then just the path here to see if it's copied across oh best model hdm5 format dot h5 and a little spoiler alert for what's to come but i'm not going to talk about that for now we're
going to see that in upcoming videos so there's three ways there if you wanted to download a model or any other file from google colab you can right click and click download you can download it with code or if you wanted to save it to your google drive and access it later you can use the copy method to save it across there in your target folder alrighty so we've been through a fair few of the fundamentals in the next video or in the next series of videos actually let's have a look at how
we might tackle a larger example and i'm not going to say any more i'm going to wait until you're in the next video to show you what's gonna happen i'll see you there we've said it before and i'll say it again we've covered a lot of ground so far i mean we've seen the fundamentals of building neural network regression models in tensorflow so you should be proud of yourself look at all the subheadings we've covered give yourself a pat on the back but now it's time to step it up a notch and build a model
for a more feature-rich data set so have a look at our data look at what we've been playing with x train and y train just a couple of tensors with single numbers trying to predict single numbers but in practice you're probably going to be dealing with a little bit more of a complex data set so what data set are we looking at well we're going to have a look at the publicly available medical cost data set available from kaggle and hosted on github so let's find it so medical cost data set here we go medical
cost personal data sets from kaggle now kaggle is a website which is basically an incredibly great place to compete in data science competitions find different data sets see example notebooks and also learn more about data science and machine learning in general so you're going to become very familiar with kaggle over your data science and machine learning explorations so we're not going to go through it for now but just just know that if we want example data sets kaggle is probably one of the best places to find them now if we have a look at this
so medical cost personal data sets insurance forecast by using linear regression let's have a read of what's going on so context so machine learning with r so r is another programming language that can be used for numerical computing just like python by brett lantz is a book that provides an introduction to machine learning using r okay so the long story short here is that this is a data set that's available online and we've got a bunch of columns here age sex bmi children smoker region charges now what we're trying to do here is use
these columns so age through to region to predict what someone's individual medical costs billed by health insurance will be so we're using these features here to predict a number so it's a regression problem i believe this example has used linear regression but we're going to build a neural network regression model so the data set is publicly available on github here wonderful so there's a fair few data sets here we're looking specifically at insurance.csv now i'll show you a little trick that we can use to import this straight from github to our google colab notebook if
we go raw there we go raw.githubusercontent now if you can't find this link i'll put it in the resources section for you we're going to copy this link come back to our notebook i'm going to get out of this and the first thing we're going to do is actually import the required libraries for this larger example we've already imported these but it's good practice just to if we were starting from scratch import tensorflow as tf import pandas as pd wonderful and we're going to also need matplotlib in case we want to do some plotting
those libraries are pretty standard so now because we've got the insurance data set copied to our clipboard read in the insurance data set let's see what it looks like we'll call it insurance equals pd dot read csv and then the beautiful thing is we can just paste a link in here and the read csv function will read it directly from here all of these values so there's a columns age sex bmi children smoker region charges we can import directly let's have a look insurance all right so 1338 rows times seven columns yes now we're talking
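A minimal sketch of the loading step just described. To stay runnable offline it reads a few hypothetical rows (illustrative values only, laid out like insurance.csv) from an in-memory string; the assumption is that the raw GitHub link copied in the video would simply be passed to `pd.read_csv` in the same position.

```python
import io

import pandas as pd

# Hypothetical sample rows in the same column layout as insurance.csv.
# The values are illustrative only, not the real data set.
csv_text = """age,sex,bmi,children,smoker,region,charges
19,female,27.9,0,yes,southwest,16884.92
18,male,33.8,1,no,southeast,1725.55
28,male,33.0,3,no,southeast,4449.46
33,male,22.7,0,no,northwest,21984.47
"""

# pd.read_csv accepts a file path, a file-like object or a URL string,
# so the copied raw GitHub link could be pasted in place of the StringIO object.
insurance = pd.read_csv(io.StringIO(csv_text))

print(insurance.shape)   # (rows, columns)
print(insurance.dtypes)  # shows which columns are numerical and which are not
```

On the full data set the shape printed would be the 1338 rows by 7 columns mentioned in the video.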
this is a little bit more complex than the problem we've been working on so far so what do we have here we have age sex bmi children smoker region and then charges this is an amount so this is how much someone's medical cost or medical bills were medical insurance was based on their age sex bmi number of children are they a smoker yes or no and whereabouts do they live south west northwest etc i'm not sure is it from a city anyway doesn't matter you can do some research i'll put the the links to where
you can find this data set essentially what we're focused on here is writing tensorflow code to take in these features learn the relationships between them or more so the relationships between these features and this target variable here charges so what is a regression problem relating it back to our wikipedia definition in statistical modeling regression analysis is a set of statistical processes for estimating the relationships between a dependent variable often called the outcome variable and one or more independent variables often called predictors covariates or features so in our case what is our dependent variable our dependent
variable is the charges because this is what we're trying to predict and what are our independent variables otherwise known as i like the term features so that's what you'll hear me use a lot our independent variables are these columns here age sex bmi children smoker region so what do we have to do what is our first step in getting our data ready to pass into our machine learning or neural network models can we just start building a model model equals tf keras sequential i mean we could we could keep that going but what do
our machine learning models like can we pass in this column sex if it reads female or male what's it going to do so if i go insurance what type is this column oh no insurance data type object hmm what about smoker data type object what's the difference between smoker and insurance age int64 ah so we have some columns here that are numerical and some columns that aren't numerical do you remember what we have to do to non-numerical columns before we can pass them to a deep neural network or a machine learning model we have to
turn them into numbers right if we come back to our regression inputs and outputs so we're trying to predict the price of homes the sale price of homes based on the features here bedrooms bathrooms garages before we can pass it to our machine learning algorithms we have to create a numerical encoding which is often referred to as input features so that's what we're going to have to do to these variables here now one style we've we've covered one style of numerical encoding and it's the style we're going to use for this problem it's called one
hot encoding so if we go what is one hot encoding probably one of the simplest methods to turn categorical variables into numerical variables so if we go here is there a diagram maybe we just go to images there we go nice and simple so one hot encoding if we have the food name apple chicken broccoli for one hot encoding apple gets a one and zero for everything else for chicken chicken gets a one and zero for broccoli and apple for broccoli broccoli gets a one and a zero for chicken and apple so that's what
we want to do because we're working with categorical variables here we're going to create a column that is maybe sex underscore female and if the value of this sample is female we want to put a 1 there and if it's male we want to put a 0 for the sex underscore female column now we could do this manually but that's going to take a whole bunch of work so what i prefer to do is use if we go to pandas data frame we can use pandas get dummies function let's see what this is or better
yet if you didn't know what this function was called how to one hot encode a pandas data frame what do we get how can i one hot encode in python one hot encoding a feature on a pandas data frame examples what do we get from this web page again i'm not sure what this web page is going to show us so this is kind of running the gauntlet here all right there we go with one hot encoding a categorical feature becomes an array whose size is the number of possible choices for that feature so we create
a data frame of country different countries here wonderful ah they use get dummies pandas provides a very useful get dummy so once we'd seen that oh gives a great example there beautiful we could go to the get dummies page but we already have that open get dummies convert categorical variable into dummy indicator variables which is a i'm not sure why they use dummy indicator variables i i just treat this as another name for one hot encoding come down we see some examples with pandas there we go all right so let's just run some code and
see what happens rather than diving into documentation who needs the docs when you can just run hundreds of different little mini experiments and figure things out for yourself um no that's not right documentation is actually very valuable so get used to reading documentation but also get used to writing lots of code so let's see what happens if we run this data frame object has no oh we can't run it like that ha ha now i'm eating my own words here we have to run it like this pd dot get dummies then pass
it our data frame come on daniel pd.get_dummies let's try one hot encode our data frame so it's all numbers and what did we call our data frame insurance oh yes look at that okay that's what we want so we have age bmi children charges and now for the columns that were categorical columns such as sex we have sex female so see here this first sample has a value of female so it gets a one for the sex underscore female column and then for the sex underscore male column gets a zero and then were they
a smoker yes they were a smoker maybe that's why their charges are so high so they get a yes sorry a one for the smoker yes column and where were they from they were from the southwest so they should have a one in the southwest region wonderful okay that looks good now we might actually save this to a variable insurance underscore one hot and then insurance one hot dot head excellent all right so now we have a few more feature columns so because charges is still the column we want to predict so
we want to use all of these columns age bmi children sex female sex male sex no oh sorry smoker no smoker yes region northeast et cetera we want to combine all of those and learn the relationship between what charges equals so we'll probably end this video here before it gets too long but if you want to have a sort of here's a bit of a challenge too if you want to go ahead maybe we'll in the next video create x and y values so features and labels have a go at creating the x and y
values then we'll need to create a training and test set and then we can build a neural network so here's your challenge if you want to go ahead create training and test set and then build a neural network it'll be sort of like model 2 above but it'll probably have it might have different input and output shapes or we'll see but that's your challenge if you want to go ahead otherwise in the next video we'll start to tackle these three things how'd you go did you take on the challenge did you create x and y
values features and labels did you make a training and test set and did you perhaps even build a neural network sort of like model 2 above to find the relationship between our features and target variable if not that's perfectly fine if you did wow that's incredible work otherwise uh either way we're going to go through each of these three steps in this video so let's uh split these off i think the command is um command m minus yeah there we go if you want to split a cell so command or control m minus in
google colab so we have three different things here this is what we have to do all right create x and y values features and labels let's do this x equals now what did we say before are our features basically every column except charges so let's um how can we do that we can drop a column so insurance one hot dot drop and we're going to drop charges on the first axis i believe that's how we do it if not we can find out in a second and then y is going to be just this column
so let's make sure we can get that insurance one hot just charges wonderful okay now let's uh view x x dot head first five rows wonderful age bmi children etc etc got a fair few columns there and let's view y to make sure remember always visualize visualize visualize beautiful there's the charges there which are all float 64s now these are in a format of uh right now they're in a format of a pandas data frame however we're going to see very shortly that we can turn them into formats capable of being used with our neural networks
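The one hot encoding and the feature and label split described above can be sketched as follows. The rows are hypothetical stand-ins in the same layout as the insurance data, and `dtype=int` is an addition of this sketch (not something said in the video) so that newer pandas versions produce 0/1 columns rather than True/False.

```python
import pandas as pd

# Hypothetical rows in the same layout as the insurance data (illustrative values only).
insurance = pd.DataFrame({
    "age": [19, 18, 28, 33],
    "sex": ["female", "male", "male", "male"],
    "bmi": [27.9, 33.8, 33.0, 22.7],
    "children": [0, 1, 3, 0],
    "smoker": ["yes", "no", "no", "no"],
    "region": ["southwest", "southeast", "southeast", "northwest"],
    "charges": [16884.92, 1725.55, 4449.46, 21984.47],
})

# One hot encode every non-numerical column (sex, smoker, region).
# Each category becomes its own column, e.g. sex_female and sex_male.
insurance_one_hot = pd.get_dummies(insurance, dtype=int)

# Features: every column except the target. Labels: the target column only.
X = insurance_one_hot.drop("charges", axis=1)
y = insurance_one_hot["charges"]

print(X.head())
print(y.head())
```

Dropping with `axis=1` removes a column rather than a row, which is the "first axis" mentioned in the video.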
that's actually probably going to happen automatically because we're about to create training and test sets now previously we'd indexed our data sets going something like the first 40 examples of training examples however there's a much better function to create a training and test set if you've got a feature matrix and a label vector so we're going to use scikit-learn train test split this is a very popular function model selection train test split remember creating a training and test set probably one of the most important things in machine learning that's why this function is so beautiful
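A small sketch of how scikit-learn's `train_test_split` is typically called. It uses synthetic NumPy arrays with the same number of rows as the insurance data rather than the real features, and the 80/20 split and `random_state=42` are the settings the video goes on to choose.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins with the same number of rows as the insurance data (1338).
X = np.arange(1338 * 2).reshape(1338, 2)  # feature matrix
y = np.arange(1338)                       # label vector

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,    # hold out 20 percent of the rows for testing
    random_state=42,  # fixed seed so the shuffled split is the same on every run
)

print(len(X), len(X_train), len(X_test))  # 1338 1070 268
```

The test set gets 268 rows because 0.2 times 1338 is 267.6 and scikit-learn rounds the test count up.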
so split arrays or matrices into random train and test subsets wonderful that's exactly what we want oh no we don't need to copy that we can just import sklearn.model selection train test split let's see how we can use this from sklearn dot model selection import train test split so if there's one function from scikit learn you want to remember it's this one very very popular so we go here x train because it's so useful x test y train y test equals train test split and then we want to can we get the docstring
or is it not imported yet not imported yet that's all right i know you need an input but we're not up to that yet here we go what's the docstring split arrays or matrices into random train and test subsets okay so we want what does it take arrays sequence of indexables with same length or shape zero allowed inputs are lists numpy arrays scipy sparse matrices or pandas data frames that's us that's what we have beautiful test size if float should be between 0.0 and 1.0 and represent the proportion of the data set to include
in the test split so before we made an example right up the top right back up here in the three sets we used an 80 20 split meaning 80 percent training data 20 percent testing data that's a very common split very useful split for our problem so let's keep it at that and i just want to see what's the example here this is where i got this line of code by the way okay train test split it passes x y test size equals they've used a 33 percent test split and they've set random state 42 to make
sure that the data is split in the same way each time so otherwise if you don't set the random state the split will be different each time so let's go in here i'm going to pass x y oh i've got caps lock on goodness gracious me test size equals 0.2 and i'm going to set the random state they've set 42 so we'll set 42 too all right and let's see what happens here we'll go len x just to see the length of x overall and then length of x train and length of x test what's going
to happen here b e a beautiful look at that it's created our train and test split so this is how many were originally in x and because we've set the test size to 0.2 our test data set has 20 percent of the samples so if we were to go 0.2 times 1338 what does it give us okay it's going to be around it's been rounded up all right nice and fair and it's also random so if we go x train what does that look like ah all right all these indexes are randomly shuffled beautiful that's
what we're after so let's get out of that now how might we build a neural network to take in x train and y train and learn the relationships between the two so the hint is it's sort of like model two above so if we go uh we get another code cell here model two dot summary what does model two look like all right so we've got two layers very familiar with this model too how about we just recreate it what should we do tf set the random seed first random set seed 42 nice now step
one is create a model how do we do that we go model or actually we call it insurance model there we go tf pretend we're working for a large insurance company and we've been asked to model this data set so that when someone when sally smith comes to buy insurance of us we know if she tells us how old she is where she's from if she's a smoker or not we know how much we should charge her based on our other customers that's what we're doing now there we go now what do we do always
after we create a model step two is we compile it compile the model and here it's called insurance model dot compile what's our loss function going to be tf keras dot losses what have we been using mae mean absolute error we can keep it like that we could even change it to mse if we were feeling a little adventurous but since we're going to stay with a similar experience to what we've been running we'll keep it as mae optimizer equals tf keras optimizers dot the old faithful stochastic gradient descent sgd and the metrics we're going
to put here is mean absolute error beautiful and now number three is we fit the model now i said before we were going to format our data into a way that it's going to work with tensorflow so right now we're writing tensorflow code here but up here this is pandas code pandas scikit learn and now tensorflow i wonder if this will work do you reckon it'll work if in doubt run the code you ready we're about to build a neural network model on want that or really because we're fitting on the training data it's just
over a thousand rows of data which is a big increase on what we've been working with so far so don't hold your breath three two one is it going to work what did we mess up i knew there'd be an error ah spelt insurance wrong what an anti-climax all right three two one and again here we go here we go now all the typos come out all right third time lucky three two one i'm done oh you typed in all right no countdowns anymore just fit the model boom look at that all right now
this one takes a slightly little bit longer because we're working with more rows so we come down here does our mae decrease as we're going up it starts off around 8600 and it finishes around 7100 okay so a fairly good decrease and we're just going to get rid of this cell here wonderful so what do we do now we just trained a model ah and yeah by the way we didn't even have to reformat this into tensors the reason being is because pandas is built on top of numpy so really when we see it
like a pandas data frame this is actually a big numpy array and when we pass that to tensorflow it automatically knows how to deal with numpy arrays because we've seen tensors work with numpy arrays before so what's next after we've trained a model what should we do should we evaluate it on the test data great idea let's do that so check the results of the insurance model on the test data so insurance how can we do that we can use the evaluate method evaluate on x test y test let's see how it performs all right
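The create, compile, fit and evaluate steps walked through above can be sketched like this. It is a hedged sketch, not the exact notebook code: the data is a synthetic stand-in (random NumPy arrays with 11 feature columns, matching the one hot encoded column count), the epoch count is kept small so it runs quickly, and the loss is given as the string `"mae"`, which is assumed here to be equivalent to the `tf.keras.losses.mae` spoken in the video.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data (NOT the insurance set): 11 numeric feature columns.
rng = np.random.default_rng(42)
X_train = rng.random((200, 11)).astype("float32")
y_train = (X_train.sum(axis=1) * 10).astype("float32")
X_test = rng.random((50, 11)).astype("float32")
y_test = (X_test.sum(axis=1) * 10).astype("float32")

tf.random.set_seed(42)

# 1. Create the model: one hidden layer of 10 units, one output unit.
insurance_model = tf.keras.Sequential([
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1),
])

# 2. Compile the model: mean absolute error loss, SGD optimizer.
insurance_model.compile(
    loss="mae",
    optimizer=tf.keras.optimizers.SGD(),
    metrics=["mae"],
)

# 3. Fit the model (the video trains for 100 epochs; 10 keeps this sketch fast).
insurance_model.fit(X_train, y_train, epochs=10, verbose=0)

# Check the results on the held-out test data: returns [loss, mae].
loss, mae = insurance_model.evaluate(X_test, y_test, verbose=0)
print(loss, mae)
```

Because the loss function and the metric are both mean absolute error, the two numbers returned by `evaluate` should match.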
okay so on the test data set it's performing even just a slightly little bit better than on the training data set that's pretty cool so now what might we want to do what is this error telling us actually mean absolute error it means that on average our model is wrong by about seven thousand now is that number large is that significant compared to the other values in our data set let's have a look at our training variables y train whoa so if our model is off by 7000 what's the the median the middle number of
our target variables well so the median variable or actually what's the mean as well thirteen thousand three hundred nine thousand five hundred so right now our model is pretty substantially wrong because i mean look at this the average value of y train so the average insurance cost is thirteen thousand three hundred so and our model is on average off by seven thousand that's pretty significant considering the total amount is only thirteen thousand three hundred well that's the mean but the median it's even worse it's only nine and a half thousand so if we're off by
7000 there we might be charging someone 2 000 when they should be being charged 9500 so naturally what might we want to do if you guessed improve our model you'd be 100 correct so let's uh in the next video right now it looks like our model isn't performing too well let's try and improve it so we're going to start to run some experiments to improve our model but before i do again i'm going to issue you another challenge if you want to go right ahead we've already run some experiments to improve our models how about
you try running some very similar experiments on the model we just built here maybe you add an extra layer maybe you change the optimizer maybe you train for longer give it a play around and see how you go but otherwise we'll we'll try a few experiments in the next couple of videos how'd your modeling improvement go did you get a better mae score was it lower hope yours did better than the one we first built but if not that's all right we're gonna see if we can improve it now so what should we do first
when we did it up here we did some experiments so the first thing we did was okay we added an extra layer okay let's try that we'll come down here a larger example first experiment might write this down actually so to improve our model or actually we'll put try to try improve our model we'll run let's do two experiments so one can be add an extra layer with more hidden units and then two can be train for longer and then i'll leave the rest up to your imagination insert your own experiment here you can put
to use what we've learned so how about we recreate the model we'll try the first one so add an extra layer with more hidden units so set random seed and then tf random set seed wonderful now number one create the model so tf keras oh we need to go insurance insurance what did we call it before just insurance model right insurance model 2 equals tf keras sequential wonderful and then we should go tf keras layers dense maybe the first one here has a hundred now then tf keras layers dense 10 and then tf keras layers
dense one is that an increase what was our first yeah we only had one layer with 10 there so this model has an extra layer with an extra 100 hidden units so if we come down here if we go there two is compile the model and we want to go insurance model two dot compile i'm going to go loss equals tf keras losses dot mae what's the optimizer tf keras optimizers one day i'll get this right sgd beautiful and then the metrics can be mae as well and now let's go number three is fit the
model is insurance model 2 dot fit x train y train epochs equals 100 all right you ready this time we're going to set verbose equal to 2 so we don't have a massive output verbose sorry equal to zero what did we get wrong here again oh metrics equals mae all right we finished there again we get some warnings it's just saying that our data types are maybe not optimized but the beautiful thing is that tensorflow fixes those for us and now we can go evaluate the larger model so insurance model 2 dot evaluate
hopefully because we've added more hidden units and an extra layer we've given our model more potential to learn so or that's at least in theory did it work oh what do we get wrong here hmm maybe we should output the training huh dense 100 insurance model 2 insurance model all right now we definitely have some troubleshooting to do what have we got x train y train should be the same maybe our model is too complex to learn anything so we might have built too big a model hmm what's another lever that we can tune here
we could remove this layer let's see if we just remove that layer does this work okay so it works there hmm this is some good troubleshooting now what if we keep that layer on and try again we get loss is nan if you ever see your loss or mae as nan it means that there's probably something wrong with your model so this is some good troubleshooting now let's think about this we've got to create the model our first experiment we put down here that we wanted to add an extra layer with more hidden units yes but
it looks like our model may be too complex for our data set so that it's not even it's so large that it's our data set is not large enough to to teach it anything so what we might try to change is we haven't looked at anything in number two now can we alter the learning rate so the learning rate is 0.01 how about we try old faithful adam so there's a few different optimizers here sgd and adam are probably the most popular if sgd doesn't work try adam let's see what happens look at that yes
oh did that's so far so good that looks like it's doing better than the previous model all right evaluate the larger model ho ho look at that now we've just if we go insurance model where's the first one dot evaluate x test y test what were the metrics from this one mae holy goodness we've just decreased it by two thousand so about thirty odd percent around about thirty odd percent decrease in error rate by tweaking two little things we added an extra layer and we changed the optimizers again this might not always work but that's
just one of the levers that you can turn on your models to improve them or at least try to improve them we put that up here on purpose to try and improve our model so i'm going to put here we actually modified this experiment add an extra layer and use the adam optimizer so this one we're going to train for longer same as above but train for longer maybe 200 epochs 500 probably too many well who knows if in doubt run the code experiment okay so that's insurance model 2 let's run insurance
model 3 see if we can improve on this number here how do we create this so set random seed so tf random set seed 42 beautiful now number one is create the model same as above so we'll go insurance model 3 equals tf keras sequential go right to the end here tf keras layers the first layer we had there was a dense layer with a hundred hidden units and now we have tf keras.layers and dense 10 and then tf keras layers dense one got to make this insurance company happy you know what step number two
compile the model beautiful insurance model three dot compile loss equals tf keras losses.mae that deleted itself out of nowhere the optimizer is remember what optimizer are we using now we're using adam optimizers come on daniel you can remember that and then the metric is mae wonderful and now let's have a look number three is fit the model insurance model three dot fit x train y train epochs equals 200 we might actually put this history equals that you'll see what this means in a second so you ready let's run this three two one all right we're
past 100 epochs oh yes the mae is going down beautiful all right so if we go now let's evaluate evaluate our third model dot evaluate do these results on the training data set translate to the test data set because that's what we're really concerned about evaluate that is what's up we've decreased our mae to three and a half thousand i believe we are now so if we look back at our first insurance model insurance model evaluate x test y test we just halved our error rate in about five minutes how cool is that
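A hedged sketch of the third experiment described above: an extra 100-unit layer, the Adam optimizer, more epochs, and the return value of `fit` saved to `history`. The data is again a synthetic stand-in and the epoch count is reduced from the 200 used in the video so the sketch runs quickly.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data (NOT the insurance set).
rng = np.random.default_rng(42)
X_train = rng.random((200, 11)).astype("float32")
y_train = (X_train.sum(axis=1) * 10).astype("float32")
X_test = rng.random((50, 11)).astype("float32")
y_test = (X_test.sum(axis=1) * 10).astype("float32")

tf.random.set_seed(42)

# 1. Create the model: extra dense layer with 100 hidden units.
insurance_model_3 = tf.keras.Sequential([
    tf.keras.layers.Dense(100),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1),
])

# 2. Compile the model: same MAE loss, but Adam instead of SGD.
insurance_model_3.compile(
    loss="mae",
    optimizer=tf.keras.optimizers.Adam(),
    metrics=["mae"],
)

# 3. Fit the model and keep the History object that fit() returns.
epochs = 20  # the video uses 200; reduced here for speed
history = insurance_model_3.fit(X_train, y_train, epochs=epochs, verbose=0)

# history.history is a dict of per-epoch values, one list per tracked quantity.
print(history.history.keys())
print(insurance_model_3.evaluate(X_test, y_test, verbose=0))
```

Comparing the `evaluate` output of each experiment on the same test data is how the video judges whether a change helped.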
now there's a few more levers that we could try but there's also a little thing that i want to show you and we saved history equals insurance model 3 dot fit what is this history variable well this is something we're also going to get very familiar with as we go so plot history also known as a loss curve or a training curve so how we can do this if we go pd dot data frame history dot history so history has a history parameter saved to it or attribute sorry and then if we go
dot plot inside of pandas data frame that is and then we're going to set the y label to loss and then the x label to epochs and shift and enter look at that one of the most beautiful sights you will ever see in machine learning and deep learning is a loss curve decreasing so why is this so beautiful it's because if we look at where this history variable originates from we instantiated it when we started to fit our model to the training data set specifically insurance model 3. now if we scroll right back to the
top of the epoch chain here all the way up to 0 out of 200 look at this the loss starts at a massive 13273 so the mae when our model first started to learn it was its average prediction was off by 13273 which is basically the average of the whole data set but then we scroll down as our model trains as it learns its loss goes down its mae goes down and we finish off with a loss of 3600 and because our loss function is mae we finish with an mae of
3600 as well and this is what this curve is reflecting so when we're training our neural networks generally we want our loss curve to go down because that means that the predictions our model is making are becoming less and less wrong so it actually looks like our model's loss would probably keep decreasing maybe not as extensively as it has here if we kept it training for longer so that's probably a little extension that i'll leave for you to try out for yourself train the insurance model 3 for a few more epochs maybe another 100 or
200 or so and see how low the loss will go but you're probably asking after seeing this how long should you train for well that's a great question let's put that there actually we'll do the question emoji question how long should you train for so if you're asking that i'm going to hit you with an answer that's going to say it depends remember how machine learning and deep learning is very experimental well how long should you train for is i mean it really it actually really depends really it depends on the problem you're working on
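The loss curve described above, together with one possible answer to the how long question, can be sketched as follows. This is a minimal sketch under assumptions: the data is synthetic, the model is a small stand-in rather than insurance model 3, and the `EarlyStopping` settings (`monitor="loss"`, `patience=5`) are illustrative choices, not values from the video. The `Agg` backend line is only there so the sketch runs without a display; it can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import numpy as np
import pandas as pd
import tensorflow as tf

# Synthetic stand-in data (NOT the insurance set).
rng = np.random.default_rng(42)
X_train = rng.random((200, 11)).astype("float32")
y_train = (X_train.sum(axis=1) * 10).astype("float32")

tf.random.set_seed(42)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1),
])
model.compile(loss="mae", optimizer=tf.keras.optimizers.Adam(), metrics=["mae"])

# Stop training once the training loss has not improved for 5 epochs in a row.
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="loss", patience=5)

# epochs is only an upper bound now; the callback may end training sooner.
history = model.fit(
    X_train, y_train, epochs=100, callbacks=[early_stopping], verbose=0
)
epochs_run = len(history.history["loss"])
print(f"trained for {epochs_run} epochs")

# Plot the loss curve: one line per tracked quantity, one point per epoch.
ax = pd.DataFrame(history.history).plot()
ax.set_ylabel("loss")
ax.set_xlabel("epochs")
```

With a validation split, the callback would more commonly monitor `val_loss`, so that training stops when performance on unseen data stops improving.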
however however many people have asked this question before so tensorflow has a solution it's called the early stopping callback and we're going to use this in a later video but i just want to introduce it now so it seems familiar so tensorflow early stopping callback here we go and essentially what this is is this i'm going to set the link here early stopping callback i'll let you read what the documentation says but i'll just tell you it's called the early stopping callback which is a tensorflow component you can add to your model to stop training
once it stops improving a certain metric so in the case of our insurance model 3 if we wanted to say train it for a thousand epochs we could set the early stopping callback to say hey i'm gonna set you off train for actually for an unlimited amount of epochs but once your loss stops decreasing for say three five or ten epochs in a row which means your our model stopped improving stop training all right does that make sense now a little extension you could do would be to look at the early stopping callback in tensorflow
and see how you could implement it for the code that we've written here all right but otherwise i think this video is getting a little bit too long so we've covered a few modeling experiments here what we're going to look at next is another way of pre-processing data specifically normalization and standardization so again if you want two pieces of extra curriculum before the next video read up on the early stopping callback and search for what is normalization and standardization and think about how it might relate to our training data but anyway i'll see you in
the next video welcome back so in the previous few videos we've been putting together all of the different concepts that we've learned so far and they've been a fair few we started working on a larger example specifically trying to predict the cost of someone's medical insurance based on features such as their sex bmi where they lived and smoking status now we've got one more thing we're going to cover before we move on to the next section and it involves pre-processing data specifically to do with normalization and standardization now you may have heard of these terms
before if not that's perfectly fine we're going to go through them you might be thinking hey daniel why are we going back to pre-processing data i thought we already had our data ready we turned it into tensors we one-hot encoded it et cetera and you'd be 100% correct but there is one step we can do to improve the pre-processing of our data so if we come back to our keynote here this is our steps in modeling and you might have noticed if you hadn't looked at this in a while we've actually done each of these
steps throughout this entire section we've gotten our data ready we've turned it into tensors we've built a tensorflow model we fit the model to the data and we've made predictions on the data we've evaluated our model through both visualization and using evaluation metrics we've improved our model through experimentation and we've saved and reloaded our trained model but now we're going back to step number one possibly the most important step in this entire pipeline getting our data ready turning it into tensors so we've turned all our data into numbers remember neural networks and actually most machine
learning models can't handle strings they need everything to be in numerical form number two we've made sure all of our tensors are the right shape because as we've seen before if our tensors are the wrong shape we run into a whole bunch of different issues and number three what we could do but haven't done yet is to scale features so normalize or standardize now neural networks tend to prefer normalization again these are just words to you at the moment but we're going to see what both of these are in a minute so let's go back
to our notebook again if you wanted to figure out what normalization and standardization are what could you do here's what i would do let's go what is normalization in machine learning if we zoom in here so we've got here normalization is a technique often applied as part of data preparation for machine learning the goal of normalization is to change the values of numeric columns in the data set to a common scale ah without distorting differences in the range of values all right so let's have a look at our data set x train or maybe just
x as a whole what does x look like again so we see here we've got age bmi children but do you see here how age is on a different scale to what bmi is so if we go x what do we want we want age dot plot so see how these values are all over the place that's probably not the best plot maybe we can go hist we want a hist hmm oh we need the kind word there we go okay so here's the distribution of our age column now what about our bmi column we
want plot kind equals hist like this so what if we wanted to get these both on a similar scale so see this goes from 20 to 60 this goes from 15 to 50 and then children i'm assuming goes from something if we go x children dot value counts what's the max number of children value counts so there's a lot of people with zero children and there's a few people with five children so what if we wanted to get these so 15 to 50 0 to 5 and 20 to 60 what if we wanted to get
all of them say between zero and one well that's what normalization does if we come back up here the goal of normalization is to change the values of numeric columns in the data set to a common scale okay beautiful now let's come back to the keynote feature scaling so we have some options here now the scaling type of scale a lot of different names are used for this but i'm going to be referring to it as normalization but if you hear scale features typically they're referring to normalization and what it does converts all values to
between zero and one whilst preserving the original distribution the distribution of our data is just the spread so here we can see that most values occur what's this this is bmi at about 30 and as we get towards the tails here the values start to spread out whereas the distribution of age is pretty fair along all the values maybe most people in our sample are about 20 years old so we come back here now if we wanted to do this we can do it with scikit-learn using the min max scaler and when should
we use it this is often used as the default scaler with neural networks and it's the one we're going to get hands-on with another very common scaling type is standardization what it does is it removes the mean and divides each value by the standard deviation you can use the standard scaler scikit-learn class and when to use this one maybe you want to transform a feature column to have a close to normal distribution but remember if we scale our features or standardize our features this reduces the effects of outliers so if we come back this is
a normal distribution let's have a look normal distribution what does a normal distribution look like all right there also known as a gaussian so we come here if we were to change this age column to be a normal distribution what would happen it would reduce the effects of this outlier value here so see how this column if we reduce them all to be the shape of that the tails we might lose out there now what are we going to focus on we're going to focus on normalization so if we go here
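To make the two scaling types concrete, here is a minimal sketch of the arithmetic behind each one, written with plain numpy rather than scikit-learn. The ages are made-up values for illustration, not rows from the insurance data.

```python
import numpy as np

# Hypothetical ages, invented for illustration (not taken from the insurance data)
ages = np.array([18.0, 25.0, 40.0, 64.0])

# Normalization (min-max scaling): rescale so the smallest value is 0 and the largest is 1
normalized = (ages - ages.min()) / (ages.max() - ages.min())

# Standardization: subtract the mean, then divide by the standard deviation
standardized = (ages - ages.mean()) / ages.std()

print(normalized)
print(standardized)
```

Both are linear rescalings, so the ordering of the values and the shape of their histogram stay the same; only the numbers on the axis change.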
in terms of scaling values neural networks tend to prefer normalization now how might we do this well actually let me just put a little tidbit here if you're not sure on which to use you could try both and see which performs better now one more resource i want to show you before we actually get hands-on in code is jeff hale's article on scale standardize or normalize with scikit-learn so this is a phenomenal article i'll make sure it's in the resources section for you but if you want to read through this for a bit more information
on what scaling is what the different kinds are why you should do it so here we go here's a great reason many machine learning algorithms perform better or converge faster when features are on a relatively similar scale and or close to normally distributed examples of such algorithm families include neural networks which is what we're building but we're going to see this i don't want you again to just continually learn from reading things i want you to write code to see it for yourself so let's jump into that enough talking
daniel all right what's this first step what should we do we might import or start again we'll start fresh so import pandas as pd because i want you to be able to just come to this section in the notebook later on and just run it straight from here then we want to import matplotlib dot pyplot as plt and then import tensorflow as tf wonderful and then we'll read in the insurance data frame so we go here insurance equals pd dot read csv we'll have to come back up and copy the url just the same url
we used before there we go we're going to copy this one just to reinstantiate this data frame that's all we're doing so that we can start from scratch and see what pre-processing our data or normalizing our data does here we go we'll make sure it's a string and then we'll check insurance beautiful all right so we've discussed the concept of pre-processing data normalization and standardization we come back to the keynote we've discussed the names of them but now it's time we're going to get hands-on we've reinstantiated our insurance data frame in the
next video we're going to go through normalizing the numerical features and we'll also prepare the non-numerical features as well and then we'll fit our neural network so i'll see you there welcome back in the last video we discussed the concept of pre-processing data in terms of normalization and standardization now we're going to get hands-on normalizing the numerical features of our insurance data frame here and then we're going to build a neural network to learn on the features once they've been normalized so let's see this in practice alrighty how can we do this so to
do this we're actually going to borrow a few classes from scikit-learn so what i'll do is i'll code it up and talk through it as we go you follow along as best you can and then we'll discuss it once we've gone through it so first of all we're going to need or actually to prepare our data we can borrow a few classes from scikit-learn wonderful okay so from sklearn dot compose we're going to need make column transformer remember you know how to look up the docstring of these things now so you can
see what these mean if you're not sure but you could probably guess then from sklearn dot preprocessing import we're also going to need min max scaler and i'm not pulling this name out of the air one hot encoder if we come back to our keynote scale or normalization we can use the min max scaler in scikit-learn so we come back now what we're going to do is create a column transformer this is very helpful so why do you think it might be called column transformer well if you've guessed because we have columns here and we need to transform them
in some way before passing them to our neural network you'd be correct but let's see it in action let's call it just ct for column transformer we'll run make column transformer and we might just run this cell first so that we can see what the docstring is for make column transformer here we go construct a column transformer from the given transformers and we come here transformer transformer estimator columns string where's an example example here we go so make column transformer standard scaler oh where have we seen that one before that's this standard scaler and
what else was there one hot encoder alright so we're actually going to need both of those but instead of standard scaler we're going to use min max scaler which is transform features by scaling each feature to a given range so the default is between 0 and 1 which is what we want with normalization so let's see this in action we're going to pass it the columns that or the column names that we need to normalize so let's put a note here turn all values in these columns between 0 and 1 now how did we know
this it's because when we look at our data when we explore our data we've got age here we've got bmi and we've got children these are the numerical columns we don't need to scale our target variable but we might want to scale our feature variables and we know for the sex smoker and region columns we're going to one hot encode them just as we've done before so let's come down here and now we need to make a one hot encoder and we're going to set the parameter handle unknown equals ignore so this
just means if there's any categories that the one-hot encoder doesn't know about just ignore them and then we're going to pass it the columns we want to one hot encode sex smoker and region wonderful so have all the brackets correct yes there we go beautiful now we've got our column transformer ready so as we pass our data through this it's going to get min max scaled on these columns and one hot encoded on these columns so now let's create our x and y values because remember we just reimported our data frame afresh so x
is going to be insurance dot drop charges and then we'll do that on the first axis y will be what are we trying to predict we're trying to predict charges beautiful no typos there and now what do we want to do next we've got our x and y but what do we want our model to learn on remember our three data sets we want to train our model on some training data and evaluate it on some data it hasn't seen in other words test data so the beautiful thing here is that we can use scikit
learn's oh we should put this up here from sklearn dot model selection import train test split so now let's build our train and test sets so we want x train x test y train y test equals train test split x y test size equals 0.2 so 20% of the data goes to the test data set and the random state we're going to set to 42 so that the split happens exactly the same as we did before you can scroll up to check otherwise it'll be random and we'll get different results so now we're going to fit the column transformer
to our training data so the important thing here is that whenever you have some sort of column transformer you want to fit it to your training data and then use that fit column transformer to transform your test data because otherwise if you do that separately remember the test data is data the model has never seen before so it's basically data from the future so if we're transforming our training data set with information from the test data set it's like taking knowledge from the future and altering the data that we have now so let's go here
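Here is a small sketch of why the scaler is fit on the training data only, using scikit-learn's MinMaxScaler on its own. The ages are made up, and the test set deliberately contains someone older than anyone in the training set.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up ages for illustration; 70 is older than anything in the training data
train_ages = np.array([[20.0], [30.0], [60.0]])
test_ages = np.array([[25.0], [70.0]])

scaler = MinMaxScaler()
scaler.fit(train_ages)  # learns min=20 and max=60 from the training data only

train_scaled = scaler.transform(train_ages)
test_scaled = scaler.transform(test_ages)  # reuses the training min and max

print(train_scaled.ravel())  # [0.   0.25 1.  ]
print(test_scaled.ravel())   # [0.125 1.25 ]
```

The test value of 1.25 falls outside 0 to 1 because the scaler has never seen an age of 70, which is the behaviour we want: nothing about the test set leaks into the pre-processing.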
ct dot fit on x train now we want to transform training and test data with normalization min max scalar and one hot encoder so let's go x train normal for normalized equals ct dot transform so now we're taking what we've learned from the training data and we're transforming it we're normalizing the features and one hot encoding the features that we've defined up here and now we've got x test normal equals ct transform x test beautiful so now we've been through a fair few steps here but when we break it down we've just created a column
transformer with the min max scaler and the one hot encoder we've turned our data into features and labels we've split our data into training and test sets we've fit the column transformer to the training data only and then we've transformed our training and test data with normalization and one hot encoding let's see what oh we got a typo here beautiful that runs without errors now we've normalized our data and one hot encoded it let's check out what it looks like what does our data look like now so we want to go x train dot loc let's
look at the first one oh sorry we want to go oh well that's what it originally was so then if we go x train normal dot loc oh no maybe we just want the first one alrighty so here's what we started with age is 19 sex is female bmi 27.9 children smoker region now we've got a value here i'm guessing that's the age this must be the bmi this must be the children and now all of these other values are one or zero beautiful so is that the same for another sample and how about another sample wonderful
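Putting the steps so far together, a minimal sketch might look like the following. The data frame here is a tiny invented stand-in with the same column names as the insurance data (the real one is read from a URL that is not repeated in this part of the video), so the values themselves mean nothing.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Invented stand-in for the insurance data frame: same column names, made-up values
regions = ["southwest", "southeast", "northwest", "northeast"]
insurance = pd.DataFrame({
    "age": [19 + 2 * i for i in range(20)],
    "sex": ["female", "male"] * 10,
    "bmi": [20.0 + 0.9 * i for i in range(20)],
    "children": [i % 4 for i in range(20)],
    "smoker": ["yes", "no", "no", "no"] * 5,
    "region": regions * 5,
    "charges": [2000.0 + 750.0 * i for i in range(20)],
})

# Column transformer: scale the numerical columns, one hot encode the rest
ct = make_column_transformer(
    (MinMaxScaler(), ["age", "bmi", "children"]),  # turn all values between 0 and 1
    (OneHotEncoder(handle_unknown="ignore"), ["sex", "smoker", "region"]),
)

# Features and labels
X = insurance.drop("charges", axis=1)
y = insurance["charges"]

# Training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training data only, then transform both sets
ct.fit(X_train)
X_train_normal = ct.transform(X_train)
X_test_normal = ct.transform(X_test)

print(X_train.shape, X_train_normal.shape)
```

With this stand-in data the six original feature columns become eleven: three scaled numerical columns plus two for sex, two for smoker and four for region.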
what does the whole thing look like ah it's all in numerical format now what does that mean well if you guessed we can pass this to our neural network as all encoded data you'd be 100% correct so let's check the shapes one more thing oh we'll keep that sample there and then we'll check the shapes of our data now how have our shapes changed so x train dot shape and x train normal dot shape okay so you see our x train value had one two three four five six different columns here well now since we've
normalized it as well as one hot encoded it we've actually added some extra columns here so we've got 1 2 3 4 5 6 7 8 9 10 11 all righty so it looks like our data is ready to pass to a neural network model so what we might do is actually i'll write a text cell we'll write beautiful because it is beautiful beautiful our data has been normalized and one hot encoded now let's build a neural network model on it and see how it goes so the challenge for you is
to now build a neural network model to fit on our normalized data so you probably want to build one similar to the insurance model we built before just in the same way we've been doing it build the model compile the model fit the model and then evaluate it so fit the model on the training data set and then evaluate it on the test data set so if you want to give that a try go right ahead but otherwise we'll take care of that in the next video welcome back did you give it a shot did
you try building a neural network to fit on our normalized data if not that's perfectly fine that's what we're going to do in this video so let's get started we'll set the random seed so we can have as much reproducibility as possible let's set seed 42 and now step one is create the model now if you scroll back up you'll see that our best performing model or maybe you just remember was insurance model 2 now if we get a summary of what that was insurance model 2 dot summary
we're going to just reproduce the same model oh another typo but this time we're going to call it so there we go three layers one dense of 100 units one dense of 10 units and the output layer of one unit so let's reproduce this and we'll make it insurance model 4 equals tf keras sequential and we'll come right to the start we'll go tf keras layers we'll add the first hidden layer to our model with a hundred hidden units tf keras layers dense 10 and tf keras layers dense one there we go we reproduced that
now what's the second step when we're creating our models step two is to compile the model so we'll take insurance model four dot compile and the loss function is going to be tf keras losses mae and the optimizer is going to be tf keras optimizers we're going to use adam because that's the same optimizer we used for model 2 and then we can go metrics equals mae beautiful now number three here's where the different part is going to be so insurance model 2 we fit it on if you scroll up you can check it out on unnormalized data
so we just fit it on x train only with one hot encoding but not normalized so to change the experiment here's where we put on our experimenter hat when we're running these experiments we're just changing one small thing at a time so the thing that we're changing in this experiment insurance model 4 is the data we're using everything else remains the same x train normal because we want to see but it can still fit on the same labels and we want to fit for the same amount of epochs so we can still
see oh sorry we only change one little parameter here so we can see how that influences our model now you're ready and go beautiful training is going to be nice and quick all right okay mae 3636 now we're going to evaluate our insurance model trained on normalized data so we got insurance model 4 dot evaluate let's get it on now we have to evaluate it on the same type of data it was trained on so that's an important point too is because we trained it on normalized data we have to evaluate it on normalized data
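The four steps described here (create, compile, fit, evaluate) can be sketched as below. To keep the example self-contained it trains on random stand-in data shaped like the eleven normalized feature columns, and the number of epochs is an assumption chosen to keep the run short, so the error it reports will not match the numbers in the video.

```python
import numpy as np
import tensorflow as tf

# Stand-in data (assumption): random values in [0, 1] with 11 feature columns
rng = np.random.default_rng(42)
X_train_normal = rng.random((64, 11)).astype("float32")
y_train = (X_train_normal.sum(axis=1) * 1000).astype("float32")
X_test_normal = rng.random((16, 11)).astype("float32")
y_test = (X_test_normal.sum(axis=1) * 1000).astype("float32")

tf.random.set_seed(42)

# 1. Create the model: dense layers of 100, 10 and 1 units
insurance_model_4 = tf.keras.Sequential([
    tf.keras.layers.Dense(100),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1),
])

# 2. Compile the model (class form of the MAE loss, Adam optimizer)
insurance_model_4.compile(
    loss=tf.keras.losses.MeanAbsoluteError(),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=["mae"],
)

# 3. Fit on the normalized training data (5 epochs is an assumption for speed)
insurance_model_4.fit(X_train_normal, y_train, epochs=5, verbose=0)

# 4. Evaluate on the normalized test data, the same kind of data it was trained on
results = insurance_model_4.evaluate(X_test_normal, y_test, verbose=0)
print(results)  # [loss, mae]
```

The only thing that differs from the earlier experiment is the data passed to fit and evaluate, which is what lets us attribute any change in error to normalization.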
shift and enter insurance keep typing n before a there we go how did it go okay three four three eight now if we come back up to our insurance model two which was the best performing model so far where was that so there's insurance model three all right so that didn't do too well on the test data set now what we're after is insurance model two oh look at that evaluating the larger model so if we copy this let's copy this so we can compare our other model if we were doing this probably we might
have saved the results of insurance model 2 to a variable so we didn't have to go up and copy this let's make a code cell so this is insurance model 2 insurance model 2 results look at that what a reduction that's 5000 mae just by normalizing our data we've gone from 5000 mae to three and a half thousand mae that's incredible that's a reduction of like 30 percent or so so now i hope you're starting to see the benefits of all of the different hyper parameters we can tune with our model so not only can
we tweak how we construct our models how we compile them how we fit them we can also tweak the data that we pass to them so if we come back and look at our keynote this is what we've done we've done the scaling type all we did was convert the values of the feature columns that we wanted to convert to be between 0 and 1 that's all we changed and we did that with about 10 lines of code here and those 10 lines of code saw a reduction in error of about 30 percent
how cool is that so that's the experimenter mentality that i want you to start developing throughout the rest of this course the first result you get when you build a neural network model is often actually it's definitely not your last result because there are lots of different things that we can tweak we can tweak the model we can tweak the data we're going to have a lot more experience with this going forward but that's one of the main benefits of normalization we see a faster convergence time what that means is
that our model gets to a better result i mean if we trained insurance model 2 for longer say for 200 epochs it might have reached this error right here but when we normalize our features our models tend to converge faster in other words they get better results faster in fewer epochs so again normalizing data doesn't guarantee improved results but it's something worth trying because of how easy it is to implement and in fact in a lot of our pre-processing code that we do going forward normalization will be built in such as when we're dealing with
images alrighty so i think we have covered well and truly enough for neural network regression with tensorflow have a look at this look how far you've come you should be incredibly proud we started by creating data to view and fit we dealt with input and output shapes we went through our steps in modeling with tensorflow if we come back this big boy here look at that we went through all of these different steps how cool is that now we improved our models we evaluated our models through visualization and evaluation metrics we've run a whole bunch
of experiments we've loaded and saved our models and we've even gone through a larger example from an open source data set so give yourself a pat on the back i've set up some exercises and extra curriculum activities to go along if you want to test your skills even further i really encourage you to try out the exercises before looking at the solutions that come along with them but otherwise go back through all the concepts if there was something you didn't really understand practice it write a bunch more code get a few things wrong see what
works see what doesn't and i will see you in the next section neural network classification with tensorflow now before we get into anything what we're going to cover or any code at all i need to stress something and that's where can you get help i'm going to be writing a bunch of code so or we're going to be better yet we're going to be writing code together so make sure you follow along with the code if you can remember our motto if in doubt run the code don't forget to try it for yourself if you
need the docstring which is a bunch of information about the functions that we're going to be using if you're using google colab you can press shift command plus space or if you're on windows this might be control if you're using jupyter notebook it's probably going to be shift and tab if you're still stuck you can search for whatever problem you're having and we're going to get a lot of practice searching for our problems as well because that's what i want to stress as much as we can teach in this course you're still going to
have to develop the skill of being able to search for answers to your own problems now if you do search for it you'll probably come across resources such as stack overflow or probably more importantly the tensorflow documentation since we're going to be writing a lot of tensorflow code and then after you've searched for it try again rewrite the code if you have to look at the examples in the tensorflow documentation and copy them out by hand and try them in your own notebook and then finally this is probably the most important out of all of
these steps is if you're still stuck ask a question don't forget the discord chat and that means including those dumb questions that you think you have all right the person who asks the most dumb questions gets smartest the fastest oh and actually i want to add one more to this list and that's the course github so if we come to here github this is mrdbourke that's me tensorflow deep learning this will have all the materials related to the course you're currently going through i'll leave a link at the moment it's still a work
in progress but by the time you go through this video it'll have a bunch more stuff and everything you need now every notebook and concept we go through has a ground truth notebook so for this one zero two neural network classification in tensorflow this is the information we're going to go through in the videos we're going to be writing all of this code out however the notebooks here so this one in the github have a lot more text around the code that we're writing so if you want a more text-based explanation to go along
with the videos and along with the code so say in the video we just write this code if you want all this annotation around it be sure to check out the tensorflow deep learning github and that'll have all the information you need so without any further ado let's get back to our keynote and since we're covering neural network classification you're probably asking yourself what is a classification problem well let's have a look at some example classification problems the first one is binary classification is something one thing or another now an example of this
would be asking the question is this email spam or not spam that's actually probably a trend you'll see in binary classification is this thing something or not something so here we've got clearly a nice email to my email there daniel at mrdbourke.com hey daniel this deep learning course is incredible i can't wait to use what i've learned oh thank you so much that's a beautiful email i definitely want to see that in my inbox so we might train a machine learning model or a neural network to classify this text as not spam and then if i
have this email here which is again to my email daniel at mrdbourke.com hey daniel uh congratulations oh that's a one a bit of a typo there you win a very large sum of money hmm as much as i'd like that large sum of money that's probably spam and i don't want to see that in my inbox so that's binary classification now another very common classification problem is multi-class classification this is where you'll have a question such as is this a photo of sushi steak or pizza so we have a photo of sushi we have a photo
of a beautiful delicious steak and then a delicious pizza now multi-class as you might have guessed is more than one thing or another now we've got three classes here sushi steak pizza but in multi-class classification we could have a hundred different classes or a thousand different classes but the principle would still remain is this a photo of so if we had a hundred different foods uh maybe we added over here um a sandwich so the principle would still remain there's multiple different things that these photos could be of so multi-class classification not only relates to different
photos of food it could be different photos of cars could be different photos of animals anything you could imagine and finally another very common classification problem would be if you are asking what tags should this article have so here we have the deep learning article from wikipedia now maybe you want to if you're wikipedia you want to sort your articles because you have millions of them into different categories and so just labeling it with one category such as deep learning may not be enough you want to relate it to other things such as machine learning
or representation learning or artificial intelligence so in this case multi-label classification now these two are going to be pretty confusing to begin with you just need to figure out a couple of example problems to sort of start to understand them but i know when i first started deep learning i often got multi-class classification and multi-label classification mixed up but the easiest way or the way i remember multi-label classification is that there's multiple label options per sample that means that this one sample this one article here could have three different tags or it
could have five different tags whereas in these photo examples in multi-class classification we only want one label per sample so we have one photo and the one label would be sushi one photo one label would be steak and so on for pizza whereas in multi-label classification we have one sample and multiple labels so now that we've had a look at some of the most common classification problems you're going to come up against binary classification multi-class classification and multi-label classification let's have a look at the things we're going to cover or more specifically the things
we're going to write code for so here's what we're going to cover broadly now i say broadly because these are just subheadings what really matters is what we actually write code for so we're going to look at the architecture of a neural network classification model we're going to see the input shapes and output shapes of a classification model in other words the features and labels we're going to look at creating custom data to view and fit we're going to go through the steps in modeling such as creating a model compiling a model fitting a model
and evaluating a model we're going to see what different classification evaluation methods we have and then finally once we've trained some models we're going to look at how we can save them and load them and how we're going to do that well we've seen this before but we'll see it again we're going to be cooks not chemists cooks are very experimental they try a whole bunch of different things and they might measure something a little bit but they're also going to just if a recipe isn't going exactly how they want it well they're going to
improvise so that's what we're going to be doing we're going to be cooking up lots of code so now we've covered some example classification problems and what we're going to cover in the next video let's check out what the inputs and outputs of a classification model might look like so let's say we're working on a classification problem and we saw this before the idea of taking some photos and then perhaps we're building an app that we want to be able to take a photo of whatever food we're eating and classify it as sushi steak or
pizza that's the app that we're building what might our inputs and outputs look like well we've kind of already got them here our inputs are going to be some kind of image of food and then our ideal output something over here will be if we took a photo of this one we want it to be sushi and if we took a photo of this steak we want it to be steak and then if we took a photo of pizza pass it through our app the ideal output would be pizza now what might this look like
in practice so all right well here's our little scenario here this is our machine learning algorithm that we might build or deep learning algorithm and we've already discussed what our inputs are and what our ideal outputs are but what's the rule that we have to follow before we can pass our inputs to our machine learning algorithm or better yet what does our machine learning algorithm like to look at can it understand photos straight away like if we just took a photo and just said hey machine learning algorithm tell us what's in this photo
will that be able to do that well if you remember from a previous section what do we have to do to our inputs before we pass them to our machine learning algorithms or maybe this is a little inkling so if you're not sure that's perfectly fine but what this is is so if we have a photo let's say we already know what the the width is and what the height is so it's 224 by 224 so this is a square image now what does that sort of what are we moving towards we're starting to numerically
encode what an image is so we're starting to break it down from being just an image to being okay this is a 224 by 224 image all right we know a little bit more information about our image so we know the width we know the height oh what's c we might know the color channels so that's three so c equals color channels red green and blue so we might know that some combination of red green and blue pixels in 224 by 224 some combination of those different color values are going to give us this image
of sushi and the same thing for steak and the same thing for pizza now these width and height are arbitrary but very often you'll see you'll change your numerical inputs of a machine learning model to have all of the same width and all the same height when you're dealing with images so what might we do before we pass these inputs directly to our machine learning algorithm well we're going to turn them into a tensor in other words a numerical encoding now here we've got normalized pixel values so that means we've changed them to be whatever
they were in terms of red green and blue to be some value between zero and one we're going to see this very hands-on in upcoming videos so don't worry if this isn't making sense the the concept here is that we have images and we have to turn them into numbers in some way shape or form before we can pass them to our machine learning algorithm and then our machine learning algorithm often already exists if we're using something like transfer learning if not we can build our own custom one and then we pass it through the
machine learning algorithm is going to learn patterns in the data and then create some prediction outputs so let's see here what have we got okay we've got another tensor and it seems that the highest value here is highlighted for sushi all right but we get 0.00 for steak and 0.03 for pizza is that the first image so that would be correct okay that's great but then this second image the one of the steak the highest value it predicted sushi and then these two values are lower um for steak and pizza so that would be wrong
this one okay so one out of two not too bad and then this final one here say this is for the pizza image same process again turn it into numbers pass it to our machine learning algorithm and then it's got a predicted output here of 0.03 0.07 for steak and then 0.09 for pizza so the highest value is for pizza and so that would be correct wonderful now how does it generate these well if we have a look what are the actual outputs that we were looking for in our case we're building an application to
classify different images of food we take a photo of food we run it through our machine learning algorithm and then the ideal output is for this photo is sushi this photo is steak and this photo is pizza so if we wanted our machine learning algorithm to be able to create predicted outputs like this perhaps we can show it lots of different examples of images and their ideal output ah okay so this is the premise of neural network or machine learning classification this is our inputs over here we numerically encode them pass them to our machine
learning algorithm and then have some sort of output all right so the example we've seen here is for multi-class classification because there's one or more each image or each sample can be one or another class however this premise of taking some sort of samples numerically encoding them passing them to a machine learning algorithm and then have something some sort of output goes for all different types of classification inputs and outputs so now we're we're familiar with the high level overview of what our classification inputs and outputs will be let's have a look at what the
shape of these tensors would be because that's another very important point as we'll see in practice later on but before we do so i want you to have a think about if we've got these inputs what might the the shape of our input tensor be and if we've got these outputs or these ones here what might the output shape of our output tensor look like have a think about that and we'll cover that in the next video welcome back in the last video we checked out an example multi-class classification problem or in other words the
inputs and outputs of what that type of problem may look like so we had some images here food images and we wanted to take a photo of them and classify them as what's in that image so in our case we had sushi steak and pizza we learned that no matter what classification inputs we have we have to numerically encode them in some way so for an image we grab the pixel values so we learned that the width was 224 the height was 224 and there was three different color channels red green and blue and we
figured out that we can turn those into a tensor and pass that to our machine learning algorithm it's going to figure out some patterns in these input tensor or in the input images and then create some predicted output that's based on looking at lots of actual samples of image to label pairs so knowing that our inputs are in the form of a tensor and the outputs are in the form of a tensor let's now have a look at what the input and output shapes of those tensors are which is a very important concept because we're
working in tensorflow all of our data is going to be contained within tensors inputs and outputs of our neural networks of our deep learning models of our machine learning models are all in tensors and if they're the wrong shape well we run into errors so this is for an image classification example we're continuing off with what we just saw we have our little setup here inputs machine learning algorithm outputs we have our image of food in this case it's just the sushi one and we know that the width is 224 and the height is 224 and
there's three color channels red green and blue so what we might do is numerically encode this pass it as an input to our machine learning algorithm and it's going to output some kind of prediction probability we haven't covered this but we will see this in code later on this is just more so the concept now the reason why there are three options here is because we are building a multi-class classification image classifier where the output could be sushi steak or pizza and in this case our machine learning algorithm has got this one correct because the
highest value is for sushi and that's what the input image is now let's check out how our image would get represented as a tensor so here we go these might be the dimensions of our input tensor batch size something we haven't covered yet but that's all right we'll see that coming up width height color channels we have covered these so the width here is 224 the height here is 224 and the color channels would be three for red green and blue now the shape of this tensor will be for batch size might
be none 224 224 3 do you see how the width value goes into here the height value goes into here and the color channels go to here or we might have a shape of 32 for the batch size 224 for the width 224 for the height and 3 for the color channels now i've set 32 here because 32 is a very common batch size all you need to know for now is that often times when we're training a machine learning algorithm depending on how much data we're working with and depending
on the size of the computing chip that we're working with it may only have enough memory to look at 32 samples at one time so say we're working with 10 000 images which is very common in image classification our machine learning algorithm may only look at 32 images at a time so that it doesn't run out of memory and again the reason why it's none or 32 32 is very common i think it's actually the default batch size in tensorflow but none is because this could be an arbitrary number and it's important here that the
width of this image isn't set at 224 and the height isn't set at 224 but again these are very common values you're going to come across so that might be what our input tensor shape looks like for a single image or if we have a batch of 32 images it might be of this shape and then if we have a look at our output the shape of this might be three because we're dealing with multi-class classification and we have three potential classes so that's the shape of this output tensor now of course this is for
our image classification example however these shapes will vary depending on the problem you're working on so say what if we had 10 different types of food here what do you think the output shape would be that might be 10 and say this image was 300 by 300 what might change here well the width and the height might change to 300 and 300 so we're going to get very hands-on with this going forward but this is just uh to demonstrate conceptually what we have to do for our classification problems to to work on them with deep
learning and machine learning algorithms we have to take our data numerically encode it into some way which is typically represented as a tensor and because we're using tensorflow it will always be a tensor and then the same with the output depending on what outputs we have we have to define them as some sort of tensor and they will typically have a shape dealing with the number of classes if you're working on multi-class classification that is the output shape will have the same number or will be the same number as the number of classes we have
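To make the shapes from this part a bit more concrete, here is a minimal sketch using NumPy arrays as stand-ins for tensors. The class name order, the all-zero image data and the prediction values are made up for illustration. One thing to note: the walkthrough lists width before height, while TensorFlow's default image layout is (batch, height, width, color channels); for square 224 by 224 images the numbers come out the same either way.

```python
import math

import numpy as np

# Hypothetical class names, in the order the model's outputs are assumed to follow.
class_names = ["sushi", "steak", "pizza"]

# Stand-in for a batch of 32 RGB images, each 224 pixels high and 224 pixels wide.
batch_size, height, width, color_channels = 32, 224, 224, 3
images = np.zeros((batch_size, height, width, color_channels), dtype=np.float32)
print(images.shape)     # shape of the whole batch
print(images[0].shape)  # shape of a single image

# With 10,000 images and a batch size of 32, the model sees the data in this many batches.
num_batches = math.ceil(10_000 / batch_size)
print(num_batches)

# Made-up prediction probabilities for one image: one value per class.
pred_probs = np.array([0.97, 0.00, 0.03])
print(pred_probs.shape)

# The predicted class is the one with the highest value.
predicted_class = class_names[int(np.argmax(pred_probs))]
print(predicted_class)
```

If there were 10 types of food instead of 3, only the length of `class_names` and `pred_probs` would change, which is the point about the output shape matching the number of classes.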
but again this is just a conceptual overview it'll start to make a bit more sense once we start to build classification neural networks and speaking of which let's have a look at that in the next video what the typical architecture of a classification neural network looks like welcome back in the last video we had a look at what the input and output shapes of an image classification example might look like and remember this is specifically for an image classification example these may change depending on the problem you're working on so say we're working on text
classification your input shape may differ to be the number of words in a string so it may be 32 and then 100 for 100 words and it won't have these dimensions over here it'll just be one number for each word in that string and then the same over here say we would just wanted to classify an email as spam or not spam we may have an output shape of two again lots of practice coming up but just keep in mind one of the big things for classification problems is defining the input output shapes of our
machine learning or deep learning algorithms so now speaking of those let's have a look at what the architecture of a classification model might look like we've got a little spoiler alert here here's some tensorflow code we've been very familiar with this in uh in the regression section however we might notice a few different things oh we've got input layer all right now we've got activation we haven't seen much of that we've got another activation here softmax all right what else is different we've got loss okay so we're familiar with these steps one two three four
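The slide itself isn't in the transcript, so here is a rough sketch of what a four-step multi-class architecture along these lines could look like in TensorFlow. It follows the values discussed in this section (a 224 by 224 by 3 input, 100 hidden neurons, 3 output classes, softmax, categorical cross-entropy). The random data, the `Flatten` layer and the choice of Adam as optimizer are assumptions added so the sketch runs end to end; they are not taken from the slide.

```python
import numpy as np
import tensorflow as tf

# Made-up stand-in data: 8 random "images" and one-hot labels for 3 classes.
X_train = np.random.rand(8, 224, 224, 3).astype("float32")
y_train = np.eye(3, dtype="float32")[np.random.randint(0, 3, size=8)]

# 1. Create a model
model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.Flatten(),  # assumption: flatten each image into one vector
    tf.keras.layers.Dense(100, activation="relu"),   # hidden layer
    tf.keras.layers.Dense(3, activation="softmax"),  # one output per class
])

# 2. Compile the model
model.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(),  # assumption: SGD would also be a common choice
    metrics=["accuracy"],
)

# 3. Fit the model (one epoch only, to keep the sketch quick)
model.fit(X_train, y_train, epochs=1, verbose=0)

# 4. Evaluate the model (on the same made-up data, just to show the call)
loss, accuracy = model.evaluate(X_train, y_train, verbose=0)
print(model.output_shape)
```

For a binary problem the usual changes would be a single output neuron with a sigmoid activation and a binary cross-entropy loss.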
the main differences you might see here are the loss function and the input shape and the activation hmm we haven't been hands on with those just yet but throughout the classification section as you might have guessed we definitely are going to be so now it's important to note that this is a typical architecture of a classification model and we're going to be building lots of these so although i'm giving you sort of a guideline as to what these architectures may look like they definitely vary depending on the problem we're working on however this is pretty universal
and again this is adapted from page 295 of the hands-on machine learning with scikit-learn keras and tensorflow book by aurélien géron highly recommend that book it'll be in the extra curriculum section of this course so let's take a look at what's going on here we have hyperparameters and remember a hyperparameter is something that we can adjust ourselves as developers and we might have a binary classification problem and a multi-class classification problem now note here we haven't got multi-label classification however multi-label classification architecture is often very similar to a multi-class classification so let's start
with the input layer shape if we go here and all of these colors are going to activate for me automatically aren't they but that's all right we'll step through it one by one input layer shape we have here tf keras input so here's where we're defining the input layer shape do you remember in just the previous example we looked at before it might be the same as the number of features eg 5 for age sex height weight smoking status in a heart disease classification problem but we looked at an image classification problem so here we might have
the input shape of a particular image might be width height number of color channels so that's what our image looks like in tensor form again this shape will change depending on the problem you're working on now for multi-class classification it's going to be the same thing as binary classification now hidden layers again this is plural on purpose we've got one here and this is problem specific what you set this to the minimum is going to be one the maximum is unlimited so i think we're actually going to be working with a neural network
with over 100 layers that might be in this section or a later on section but this one just has one hidden layer there and for multi-class classification we can look at the same value here can be one or up to unlimited neurons per hidden layer that's this green value here problem specific generally 10 to 100 again this can wildly vary depending on what you're working on in our case we've set a 100 here to the hidden layer and for multi-class classification we've got same as binary classification now the output layer shape for binary classification it's
going to be one so remember binary classification is is something one thing or another is this email spam or not spam so we want the output layer shape to be one and multi-class classification is going to be one per class eg 3 for food person or dog photo if we were working on an image classification problem to decide whether a photo was of food of a person or of a dog and that's the value we've put here three so in our case this input shape might actually work for this problem here oh sorry this output
shape but the input shape would also work because it could potentially if these images so the images of food of of a person or a dog were 224 pixels wide 224 pixels high and had three different color channels once we turned them into tensors of course now we have here the hidden activation so this is the hidden layers so the activation function that we have within the hidden layer so it's often defined with the activation parameter now it's usually relu which is rectified linear unit we haven't seen that before but if you want to go
ahead and do a little bit of research before we get into coding feel free to look up relu and again same for multi-class classification we're going to use relu now again there are multiple different hidden activation functions but i say usually relu because that's what you're going to see very often and that's what we're going to be using throughout the course now output activation this is the activation function in the output layer again a concept we haven't really covered so far but i'm just making you aware of what the names of these things are
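If you'd like a head start on that research, here is a small plain-Python sketch of relu, plus sigmoid and softmax, which are the standard output activations for binary and multi-class problems. These are the textbook definitions written out by hand for illustration, not TensorFlow's own implementations, and the input values are made up.

```python
import math


def relu(x):
    # Rectified linear unit: negative values become 0, positive values pass through.
    return max(0.0, x)


def sigmoid(x):
    # Squashes any number into the range 0 to 1 (typical for a single binary output).
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    # Turns a list of numbers into probabilities that sum to 1 (one per class).
    # Subtracting the max first keeps math.exp from overflowing on large inputs.
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


print(relu(-2.0), relu(3.0))
print(sigmoid(0.0))

probs = softmax([2.0, 1.0, 0.1])  # made-up raw outputs for three classes
print(probs, sum(probs))
```

The largest raw output ends up with the largest probability, which is why the highest value in the output tensor gets read as the predicted class.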
so for binary classification we're probably going to be using a sigmoid activation for our final layer and for a multi-class classification problem we're probably going to be using a softmax activation which is what we've got here all right so this architecture looks like it's for multi-class classification now loss function across the board we're going to be using cross-entropy which is a loss function for classification problems now for binary classification we're going to be using binary cross-entropy in tensorflow and for multi-class classification we're going to be using categorical cross-entropy in tensorflow so it
looks like here when we compile our model and define the loss function we've used tf keras losses categorical cross entropy which makes sense because we discussed before that this is a multi-class classification architecture and as for the optimizer we're going to use old faithful stochastic gradient descent or another old faithful is adam so actually i think we might test out both of these but i think we're we're probably going to be sticking with adam because adam is safe and and then we go here multi-class classification same as binary classification all right so that's a typical
architecture of a classification model again a lot of things here that we haven't necessarily covered but we are going to be getting hands-on building a lot of these and i think that's what we've got here yeah we're going to be building lots of these bad boys and you might be wondering why i said adam is safe i probably want to reveal that so let's go here here is another external resource neural network recipe i'll put this in the extra curriculum a recipe for training neural networks so this is by andrej karpathy if
we go here adam a few tips and tricks for this stage adam is safe in the early stages of setting baselines i like to use adam he's talking about the adam optimizer with a learning rate of this i mean learning rate adam's learning rate is actually very good in tensorflow its default one in my experience adam is much more forgiving to hyperparameters including a bad learning rate for convnets a well-tuned stochastic gradient descent will almost always slightly outperform adam so again a lot of things here you might not be aware of but just know
andrej karpathy is saying adam is safe and if you're wondering who andrej karpathy is let's go to about see my website i am the senior director of ai at tesla so andrej knows his stuff about deep neural nets so if andrej says adam is safe we're going to trust him but anyway actually let's be skeptics let's not trust andrej and let's test it out for ourselves so in the upcoming videos we're going to get oh actually i had a little demo of what was happening here so here's our inputs they get encoded into tensors of
this shape and then here's our outputs so that completes that slide but with that being said we've talked enough let's code now that we've looked at what a classification problem is what some tensor input and output shapes might look like let's get hands on and start to code some classification code i'm going to start a new notebook at colab wait for this to load up beautiful now i'm going to call this one 02 neural network classification with tensorflow and i'm going to add the video tag so that you know once you see this notebook
in the github that this notebook was made during the recording of videos remember we've got in the github the ground truth version of this notebook which doesn't have the video tag on the end here but we'll get out of that so let's write here introduction to neural network classification with tensorflow so in this notebook we're going to learn how to write neural networks for classification problems and remember what's the definition of a classification problem a classification problem is where you try to classify something as one thing or another and there's a few types of classification
you've got binary classification multi-class classification and multi-label classification and we've got here a few types of problems now we'll turn this into markdown i'm going to press command m m you might press control m m if you're on windows so we've got a little intro here now the first thing before we can write any code is we need some data so how about we write a title here creating data to view and fit and there's a lot of toy classification data sets out there but i always like creating our own
and getting a model to work on our simple data set before moving on to an actual problem so it's like a rehearsal experiment before our actual experiments how about we make some data we can go from sklearn.datasets import make_circles and then we'll go here we want to make a thousand examples so we want a thousand different samples of data and we can create circles oops n_samples equals a thousand and then if we go down to here create circles x y equals make_circles n_samples again you could
change this to ten thousand if you wanted to or a hundred i'm going to stick with a thousand that's a good number n_samples noise can be 0.03 and random_state can be 42 if you want to make this reproducible so we have the same data now you might be wondering daniel we've never used make_circles before and you'd be 100% correct this is the first time we've used this in the whole course but how do you get help on something if you weren't sure we want to press command shift space def make_circles so
make a large circle containing a smaller circle in 2d a simple toy data set to visualize clustering and classification algorithms beautiful so if we really wanted to figure out what this is we could search how to make example classification data with sklearn there we go an introduction to machine learning with scikit learn examples so we've got a few options there i've just chosen that's basically what i did when i found out about this make_circles function and there's probably a way to do that in tensorflow i'm not quite sure if it's as quick as
this one but if you want to give it a try once you once you see what this data looks like feel free so now let's check out the features which is under the x name space usually a capital x okay so we have an array here and let's check out the labels check the labels all right ones and zeros so we've got two options here i'm guessing one of these samples here so this is a sample has this label number one and another sample further along here say this one has this label zero we might
just make this a bit smaller i'll check the labels that's what we want there so if we have two label options and we're working on a classification problem which one of these three are we working on is it binary is it multi-class or is it multi-label well it's not multi-label because each one of these has one label so each sample has one label and it's not multi-class because there's only one or zero so this is a binary classification problem but if we're looking at our data like this i mean i could look
at this i've used this before so i kind of know what this means but if you're looking at this for the first time it it doesn't really it might not make sense so what's our data explorers motto come back to the keynote the machine learning or even the machine learning explorers motto visualize visualize visualize what's the first thing we should visualize the data we could visualize the model the training predictions and it's a good idea to visualize these as often as possible so let's see how we might visualize our data let's write ourselves a little
note our data is a little hard to understand right now let's visualize it so to get it into a structured format i'm going to import pandas as pd and then i'm going to turn it into a data frame called circles pd.DataFrame and i'm going to label it x0 for this element here of this sample here and that can be x all of the items in the zeroth column and then x1 can be x we want all of the items in the first column there we go and then we want the
label column to be y there we go and let's look at our circles data frame typo there of course circles beautiful okay so now this is a little bit easier to understand as it is we've got x0 x1 so we have two features per label and so this coordinate 0.754246 and 0.231481 has the label of one and so on and so on times a thousand because we set n_samples to a thousand now this is still a little hard to understand i wonder if we can visualize this with a plot let's try that
out visualize with a plot so how might we do that import matplotlib i like to visualize as much as possible before i start writing neural network code plt dot scatter we want a scatter plot scatter plot's a very good plot and we're going to just plot this and the color can equal y and the cmap can equal this is just the color layout i want it just to be red yellow blue i think that should do let's have a look oh look at that now so seeing this if we read the docstring of
make_circles has it done what it says it's going to do make a large circle containing a smaller circle in 2d i think it's done that we have a large circle and a smaller circle now from this plot can you tell what type of model we're going to build it's okay if you can't but just guess like what are we trying to do here what would we try to do i'm giving you a little hint here by running my pointer in between the two circles how about we build one to
classify red or blue dot so in other words a model we want our model to to potentially draw a line right through the middle of these two so if we were trying to predict on another 100 rows and we had values like this would they be a zero or a one would they be red or blue now i want you to think about before we we go ahead this is just a conceptual thing what is the difference between the data we're looking at here and the data we've looked at in our regression notebook so if
we start a new tab um what is a regression problem and then if we went to images what's the difference between this data here and this data here so have a think about that just look at this like google regression problem check out the images and just have a think what's the major difference between our two types of data here and one more thing before pushing forward so that's your challenge just to compare these two types of data i want you to have a look at tensorflow playground oh dot org not dot com here we
go tinker with a neural network right here in your browser don't worry you can't break it i promise so again we've got a little few options here over what data set we want to use and what's this one look like i mean we've got a few watch them change over here we've got this one what's the similarities between this data set and this data set so your two exercises before the next video is to compare this one here you might have already done that because we've talked about it enough and then the other one is
to spend 10 minutes at playground.tensorflow.org playing around with all the different parameters here try changing the learning rate the activation the regularization doesn't matter if you're not sure what these things are press play change the number of hidden layers here change the number of neurons and see what happens with this data set here so give that a try and i'll see you back here in the next video how'd you go did you play around with the neural network playground i've just added a little uh exercise here with this hammer and spanner emoji if you ever
see that throughout the course that's an exercise so before pushing forward spend 10 minutes playing around with playground.tensorflow.org building and running different neural networks see what happens when you change the different hyperparameters so if you haven't done that yet give that a try and then continue the video but otherwise we've created some data we've visualized it now what's another important point is the input and output shapes of our neural network so let's inspect our data so check the shapes of our features and labels so what's our features shape and our
labels shape wonderful so we have a thousand samples of x and a thousand samples of y and x has a second dimension of two because there's two features per sample whereas y is just a thousand because there's one output per sample these are scalars so they don't have a second dimension here they're just one value all right so if we wanted to check how many samples we're working with it's just len x len y and if we want to view the first example of features and labels so again we're just becoming one with the data here we're really just familiarizing ourselves with
what we're trying to do we've already seen a few of these things but just for completeness we're putting this here so okay we've got two here's what we're trying to do we're trying to take this point feed it to our neural network and generate an output something like this and let's say can we get another one with a zero label are we going to get a zero label nope i believe the fifth one may have been no all right so some of these values here for x have a label of zero that's all you have
to know for that now we've checked the input and output shapes what might be the steps or the next steps so steps in modeling we've practiced this with our regression problem so now that we know the input and output shapes how might we build a neural network to classify whether something is a blue dot or a red dot so how about we go back to our keynote here and steps in modeling with tensorflow we reveal our beautiful colorful picture step one is get data ready turn it into tensors well we kind of have that except
they're in numpy arrays at the moment but remember numpy arrays work beautifully with tensorflow so we'll say that we've finished this one now we have to build or pick a pre-trained model to suit our problem well in our case we're going to build a model so we have step one create a model specified to your problem we might have to define the input shape here or sometimes keras layers can automatically infer the input shape then we might have a hidden layer and an output layer what would our output layer shape here be if we
come back we only want to predict whether it's one thing or another if you remember back to the slide where we talked about the typical architecture of a classification problem that may give you a hint because we're working with binary classification then we want to compile the model that's step two i nearly said fit the model but we have to compile it after we create it remember step two daniel come on now if we're working on binary classification what might our loss function be what optimizer could we use what metrics could we use and then
once we fit the model okay that seems like we've got that set up we've got x and y well we haven't got a training data set actually have we or a testing data set we might just stick with x and y for now all right so now we've got the steps here one two three four i'm going to issue another challenge we've got x and y see if you can create a model with what we've learned so far refer back to the slide where we talked about the architecture of a classification model and write some
neural network code with tensorflow give it a go and i'll come back in the next video and we'll write a neural network classification model together so in the last video we checked out what the input and output shapes of our data may be and we also had a brief overview of what the steps in modeling with tensorflow are so how about we start to implement those so remember the steps in modeling number one or steps in modeling with tensorflow typically one create or import a model two compile the model and three fit the model and
then the fourth one is evaluate the model etc oh five is we're usually here we'll tweak six evaluate etc etc tweak evaluate tweak evaluate tweak evaluate but let's focus on these first three we'll get these first three done for our classification problem so how about we start set the random seed tf dot random set seed now what's the simplest model we could start with how about we just start with one hidden layer we'll go create the model using the sequential api so we go model one equals tf keras sequential and then we'll add a single
dense layer wonderful and then what's step two after we create a model we have to compile the model so model one just compile here's where it's going to be different from the regression models that we've built and now before i write the code i want you to refer back to this slide here the typical architecture of a classification model where are we up to well we're compiling a model we have to define the loss function and the optimizer so if we look back at what we're working on what are we working with are we working
with binary classification or multi-class classification come back to our problem what does it look like red or blue dots so it is binary classification so what might our loss function be before i write it in here i'd like you to give it a try have a look at this and see what it might be did you give it a go if not that's all right let's write it down loss equals tf keras losses dot binary is this going to autocomplete for us cross entropy i believe it might be capital e but we'll find out in
a second it'll error for us now the optimizer what optimizer do you want to use tf keras optimizers we come back to our typical architecture of a classification model we can use sgd or adam or really there's actually a plethora of optimizers we could use but these two are the most common you'll see in practice let's try we'll start with sgd and the metrics is going to be we haven't looked at classification metrics yet but we'll get very familiar with them actually what would you do if you wanted to know different classification metrics i
put accuracy here that's a bit of a spoiler but let's go how to evaluate a classification model here we go evaluating classification model what does this tell us richie ng oh look at this topics review of model evaluation model evaluation procedures model evaluation metrics okay model evaluation metrics boom classification classification accuracy there are many more metrics and we will discuss them today classification accuracy all right beautiful so then we've got a few more options we could keep going through that if we wanted but we're just going to stick with accuracy for now and accuracy is
just basically out of a hundred examples how many did our model get right so what percentage and now if we go here and we want to well step three fit the model so we have model one dot fit we have x and y and we're going to fit for maybe just five epochs so this is we've got the first three steps here and all we've done is we've just followed this the architecture of a classification model we've just implemented this in tensorflow code so we're ready to run this hopefully there's no errors three two one
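The three steps just described (create, compile, fit) can be sketched roughly as follows. This is a minimal sketch, not the exact notebook code: the `X` and `y` arrays here are a hypothetical stand-in for the circles data (two noisy rings, 500 points per class), and the names `model_1` and `history_1` are assumptions. The import is included up front so the cell runs on its own.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the circles data: 1,000 points on two noisy rings,
# inner ring labelled 1, outer ring labelled 0 (500 samples of each class).
rng = np.random.default_rng(42)
n_samples = 1000
y = np.tile([0, 1], n_samples // 2).astype("float32")
radius = np.where(y == 1, 0.8, 1.0) + rng.normal(0, 0.03, n_samples)
angle = rng.uniform(0, 2 * np.pi, n_samples)
X = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)]).astype("float32")

tf.random.set_seed(42)

# 1. Create the model: a single Dense layer with one unit
#    (no activation yet, matching this stage of the walkthrough)
model_1 = tf.keras.Sequential([
    tf.keras.layers.Dense(1)
])

# 2. Compile the model: binary cross entropy loss, SGD optimizer, accuracy metric
model_1.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.SGD(),
    metrics=["accuracy"],
)

# 3. Fit the model for 5 epochs
history_1 = model_1.fit(X, y, epochs=5, verbose=0)
print(history_1.history["accuracy"])
```

The printed list holds one accuracy value per epoch, reported as a fraction between 0 and 1 rather than a percentage.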
oh tf is not defined that would help if we imported tensorflow did you catch that let's write some code here we'll get rid of this we want to import tensorflow can't write tensorflow code without tensorflow import tensorflow as tf and check our version by the time you watch this your version might be higher than mine right now i'm running 2.3.0 as long as you've got two point something all the code should work in here and let's see if this runs we got this wrong so what is it tab we want is this going
to auto complete for me so maybe it's just a a little e and i think this is on the end there we go fits nice and quick because we're only working with a thousand samples but what's our accuracy 48 this is percentage by the way because we've set up accuracy here so that's saying that on average out of 100 examples our model only gets 48 right now what does that tell you if we're working with a binary classification problem so we're trying to predict whether something is one thing or the other basically heads or tails
and our model is only 48 accurate it's basically guessing so it's just going up here it's looking we have red or blue and 48 is basically 50 so it's just going red blue red blue red blue red blue red blue and you'd get about 50 correct hmm well that's not very good how about we improve it what are some steps to improve our model i got an idea how about we train it for longer so let's try and improve our model by training for longer so we come here model one dot fit we can just
take it straight away and epochs equals let's do another 200 and this time we'll turn verbose to zero so 200 epoch should be enough for our model to figure out the patterns in our data we've only trained for five let's really step it up model one dot evaluate x and y maybe you're going to catch what i'm doing wrong here we'll reveal that after but if you have a look at what we're doing here fit and evaluate i'm doing a cardinal sin right now but that's all right this is just to exemplify what's going on
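As a rough sketch of the step above: calling `fit` a second time continues training from the current weights, and `evaluate` returns the loss followed by each compiled metric. The data below is a hypothetical stand-in (200 points rather than the 1,000 in the video, purely to keep the example quick), and the variable names are assumptions.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data: two noisy rings, 100 points per class
rng = np.random.default_rng(42)
y = np.tile([0, 1], 100).astype("float32")
radius = np.where(y == 1, 0.8, 1.0) + rng.normal(0, 0.03, y.shape[0])
angle = rng.uniform(0, 2 * np.pi, y.shape[0])
X = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)]).astype("float32")

tf.random.set_seed(42)
model_1 = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model_1.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.SGD(),
    metrics=["accuracy"],
)

# First short run, then train for longer; verbose=0 silences the per-epoch log
model_1.fit(X, y, epochs=5, verbose=0)
model_1.fit(X, y, epochs=200, verbose=0)

# evaluate returns [loss, accuracy] because accuracy is the only compiled metric
loss, accuracy = model_1.evaluate(X, y, verbose=0)
print(f"loss: {loss:.4f}, accuracy: {accuracy:.2%}")
```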
let's see what happens so it took a little while longer to fit because we're training for 200 epochs instead of five but we ran evaluate and now we have loss so the loss is 0.6935 and the accuracy is 0.5 or fifty percent so 0.5 times that by a hundred that's fifty only fifty percent accuracy and we trained for two hundred epochs what is happening okay i know how we can really step things up what if we added another layer and train for longer yeah that's a great idea our model is performing as if
it's guessing right now let's write that down since we're working on a binary classification problem and our model is getting around 50 percent accuracy it's performing as if it's guessing so let's step things up a notch and add an extra layer yeah that worked in our regression problem it should surely work here so we'll set the random seed tf.random.set_seed and we'll go to step one create a model this time with two layers model two equals tf keras sequential open up the brackets there round brackets and square brackets tf keras layers dense this data
set ain't going to know what hit it binary classification two different types of labels two layers and now what do we do after we create a model we have to compile the model we go model two dot compile what's our loss function tf keras losses because we're dealing with binary classification it's binary cross entropy little uh quiz what would it be if we're dealing with multi-class classification optimizer we're going to keep the same we're going to use sgd stochastic gradient descent and the metrics we're going to set to be accuracy oh this is going to
be so much better fit the model all right model 2 dot fit x y by the way if you already guessed what was what we did wrong up here we fit on the same data that we evaluated on what should we ideally do we should fit on training data and evaluate on testing data but well because we're working with a toy problem we're we're allowed to to uh fudge what we're doing a little bit here and now epochs hmm what should we do maybe a hundred that should be enough we've got two layers i mean
this model should be great ready let's run now number four can be evaluate the model because we set verbose equal to zero we don't have very much output but we should have a trained model 2 evaluate again evaluating on the same data we trained on not ideal but in our case we're just trying to see if our model is learning anything what the loss is about the same even with an extra layer and the accuracy is still 50 percent so right now our model is still no better than guessing like it's getting 50 percent accuracy
like you could do that if you just for every sample if you had a thousand samples if you just guessed let's see how many there are let's go here code and circles label which is y dot value counts so 500 and 500 if we just literally went and tried to guess red blue red blue red blue we'd be getting the same results as our current neural network with two layers this is preposterous what we should do in the next video then is that if our model's still only getting guessing results we need to pull
out a few more bags of tricks to improve improve our model so have a think about refer to our architecture of a classification model and have a think about some of the things using my cursor in an area that might reveal something to you by the way uh have a think about some of the things that we haven't yet included in our model and uh see if you can figure out why our model hasn't learned anything yet but if not i'll see in the next video and we'll see how we can improve our classification model
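To make the guessing comparison concrete, here is a small sketch in plain Python. The `accuracy` helper and the two guessing strategies are hypothetical illustrations; only the 500/500 class balance comes from the video.

```python
# Labels for a balanced toy dataset: 500 samples of class 0 and 500 of class 1
labels = [0] * 500 + [1] * 500


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Strategy 1: always predict the same class
always_one = [1] * len(labels)

# Strategy 2: alternate "red blue red blue" regardless of the input
alternating = [i % 2 for i in range(len(labels))]

print(accuracy(labels, always_one))   # 0.5
print(accuracy(labels, alternating))  # 0.5
```

With balanced classes, either of these input-blind strategies lands on 50 percent, which is why a model stuck at 50 percent accuracy has effectively learned nothing about the features.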
all right so we've built a few classification models so far namely model 2 and model 1 which is a little bit further up there we go however we've seen that despite adding an extra layer model 2 is still performing very poorly i mean it's getting 50 accuracy on our binary classification problem and since we've got an even amount of samples for each class so if we look here we've got 500 samples of number one and 500 samples of zero now or 500 samples of a blue circle and 500 samples of a red circle now if
you were to toss a coin a thousand times you kind of expect about 500 heads and 500 tails hence why our model is essentially guessing so we have to look into our bag of tricks to see how else we can improve our model so let's look into our bag of tricks to see how we can improve our model now recall back from the previous section where we went through the steps in creating a model or the steps in building a model that is but as you might remember there are steps from within each
of those steps that we can try to improve our model so if we go create a model that's number one two is compiling a model and three is fitting a model what can we try here so when we create a model we might want to add more layers or increase the number of hidden units within a layer we've seen this we actually tried this we didn't increase the number of hidden units for model 2 but we did increase the number of layers so we went from one layer to two layers and that didn't really improve
our model what else can we do compiling a model so here we might want to choose a different optimization function such as adam instead of sgd now what optimization function did we use for model 2 we used sgd so adam could be the next thing that we try there and then fitting a model so perhaps we might fit our model for more epochs so leave it training for longer so that it might get a better understanding of what's going on or the patterns in our data now these steps have just come
from if we refer back this is what we've been through so far steps in modeling with tensorflow number one create a model we've created a few models here i want you to think about this this one in particular actually there's a little part that we haven't quite covered yet and it may or may not hold the keys to why our model isn't improving but i'll let you have a think about that before we discuss it number two is compile the model so we define the loss function in other words how wrong our model's predictions are
we define the optimizer which is how our model should update its internal patterns to better its predictions we define the metrics which is a human interpretable values for how well our model is doing we fit the model and we evaluate it but now if we have a look at how we might improve our model from a model's perspective we've got a few different ways adding layers we've tried that increase the number of hidden units we haven't tried that change the activation functions we definitely haven't tried that at least for the hidden layers change the optimization
function we could use sgd or adam or one of the other optimizers in the optimizers package change the learning rate we haven't tried that okay so we've got a fair few things here that we could try let's uh let's start implementing some of these and remember because these are changeable they're called hyperparameters so if we come back instead of staring at our slide let's write some code how about we increase the number of hidden units yeah and we'll add an extra layer let's do that set the random seed
so tf random dot set seed i want to emphasize as well is that if you're thinking well daniel we don't really have a structure to what we're doing here we're just trying things and seeing if they work i mean you're exactly right in thinking that and that is a lot of what machine learning and deep learning is trying things and seeing if they work hence why our machine learning practitioners motto is i'm just creating a sequential model here by the way and i'm calling it model 3 but our machine learning practitioner's motto is experiment experiment
experiment now we've created our first hidden layer here and we're creating a hundred hidden units in this one so we're stepping it up from model two if i create a new cell oh i've turned that into markdown i don't want that you can turn it from markdown back into code by the way by going command m y or control m y if you're on windows keras layers dense wonderful and we'll have the final layer there so if we check model 2 what we've done here with model 3 is we've added 10 hidden units to the
middle layer and we've added an extra layer this time with a hundred hidden units so i'll put there add 100 dense neurons and this one is add another layer with 10 neurons all right this one should definitely work because we've stepped it up a notch here i mean if our model with two layers didn't work and only one hidden neuron well it definitely should work with it with three layers and over a hundred hidden neurons now compile the model so model three compile what do we have to do for compile we define the loss function
which is how wrong our model's predictions are for a binary classification problem we use binary cross entropy and then the optimizer oh this time i think we might also change to adam from sgd let's try that out why not metrics equals accuracy which is a very standard baseline classification metric but may not be the best one depending on your problem we'll see other classification metrics in a future video oh two equals there we don't want that and now we're going to fit the model oh i can't wait
to see the results of this one three layers over 100 hidden units epochs equals let's go for 100 epochs what did we do model 2 for i think we only did five oh no we did 100 as well so we'll just do 100 for this one verbose equals zero so we're not outputting a lot of information and shift and enter did we get any errors no beautiful all right so let's see how it did number four is evaluate the model so again not necessarily a great idea to evaluate on the same data set that we've trained on but for
this toy example we'll keep it like that shift and enter what what we're still getting 50 percent accuracy we've pulled out our few tricks we've changed the optimizer to adam we've added an extra layer this time with 100 hidden units and even another one with 10 hidden neurons we're still getting about 50 accuracy so just as good as guessing you know what we're going to have to do we're going to have to visualize what do you think we should visualize in this case if we want to see how our model's performing and the evaluation metrics
are telling us it's not performing pretty well what's another thing that we can visualize how about we visualize the predictions yeah i think that's a good idea so in the next video we'll make some predictions with our trained model and see we'll plot them against where's our graph here yeah let's make some predictions on this graph and we'll plot what our model is predicting how to separate these uh blue and red circles and we'll see what they look like just so we can understand what our model is is trying to or what the pattern our
model is trying to to figure out between these two circles i don't think it's very good so give that a play around maybe you want to try a few more things before we plot those but otherwise i'll see you in the next video in the last few videos we've been building models to try and classify our toy data set here of blue and red dots however they're all performing pretty poorly like i mean as good as guessing even we've pulled out a few tricks to improve our models so what we're going to do in this
video is if we come back to our keynote and we remember our machine learning explorer's motto visualize visualize visualize it's a good idea to visualize these as often as possible we visualize our data we know what it looks like we visualized our model we know the layers and and the different components of that uh we've seen it train we haven't plot a loss curve just yet but we don't need to because we know it's performing poorly so let's test out try visualize some of the predictions that our model is making now how might we do
that we could do it like this we could go model three dot predict x and see what's happening all right we get some values there they're all sitting around ah that's a little inkling they're all sitting around 0.5 but we want them to be closer to 0 and 1. hmm all right so that's a little inkling there but i prefer to see things visually so let's uh get rid of this and to do so what we might do is make a function a plotting function called plot decision boundary where what we want is do you
remember with our regression model we had some data that was like a line and then we plotted our models prediction as a line compared to the actual data why don't we do the same thing but with this circular data plot our model's predictions against the actual data so to do so let's create a function right here to visualize our model's predictions let's create a function i'm going to call it plot decision boundary because the decision boundary is just basically where's our model deciding where the bounds are between red and blue dots and this function is
going to this function will that's a better option so we need to take in a trained model features x and labels y so those are the the parameters and then it'll create i might turn this into markdown actually create a mesh grid if you're not sure what a mesh grid is concept in numpy so numpy mesh grid that's what you can do for anything that you're not not sure just search it up and have a little play around and figure out what it is numpy mesh grid you could read the documentation there or we could
just practice coding it create a mesh grid of the different x values and then we're going to make predictions across the mesh grid and then finally we're going to plot the predictions as well as a line between the different zones where each unique class falls now right now these steps are all just in english they may not make sense so let's start to write code for it we'll go here import numpy as np and then we'll create our function def plot decision boundary model x and y and there we go and then we're going to
put in a little docstring here so we're being nice and pythonic so plots decision boundary created by a model predicting on x beautiful so the first thing we have to do is we've taken in a trained model so there's our model parameter there's our features and there's our labels now let's define the axis boundaries of the plot and create a mesh grid so we want the x minimum and the x maximum is going to be we can index on x to get both of these values so we just want the 0th axis the minimum
and we're going to minus 0.1 to give ourselves a little bit of a margin and the same thing with the maximum so we just want the zeroth axis and then we take the max and then we're going to plus 0.1 to give ourselves a little bit of a margin there and then we can do the same thing for the y min and the y max except we're going to index on x and get the first axis dot min minus 0.1 and the same thing here colon comma one dot max plus 0.1 for a little bit of
a margin now if you're not sure what these values are what can you do for any type of function you can just take the code copy and paste it or probably better to write it yourself and then x-min to visualize them this is what we want to look at all the time line by line i'm not going to deconstruct this function line by line but if you wanted to understand any function this is what i do i take the line by line and then i visualize it in a single cell so here's the values these
are our boundaries and now we can create the mesh grid yy equals now again this is coming straight from the documentation here for mesh grid is np dot mesh grid and then in here it takes linspace now how do we figure out what something does we can do command shift space oh we haven't imported numpy yet that's why we can't run this function or can't get the dock string we'll import numpy up there and see what linspace does here we go return evenly spaced numbers over a specified interval okay so if we did 0 to
10 it might return evenly spaced numbers between 0 and 10. beautiful but we want evenly spaced numbers between our x min and x max because this is going to be the first parameter of our mesh grid and then we want the same thing for y min ymax and then we're going to go a hundred so here's the minimum here's the maximum return 100 values evenly spaced between x-min and x-max and create a mesh grid out of this one and this one if you want to see what that looks like again you can copy this and
probably put it in a code cell below otherwise let's push forward with the function there we've created our mesh grid of the different x values now it's time to create x values we're going to make predictions on these so x in equals np.c_ now we want to unravel xx by calling xx dot ravel and yy dot ravel and if you're not sure what np.c_ does you can go what can we do numpy c_ user manual translates slice objects to concatenation along the second axis so if we pass it two arrays like this what does it
do they get stacked side by side as columns so that's what's happening here so stack 2d arrays together now we're going to make predictions using our trained model we've passed it in with the model parameter here so we go y pred which is our usual variable name for any type of predictions model.predict x in alrighty and now we're going to add a little bit of functionality here that we can check for multi-class we're working with binary classification but let's check for whether we're working on a multi-class classification problem this will help make our function usable if we did
want to if we had red green and blue circles instead of just red and blue so we can check the length of y pred for the first sample if that's greater than one print doing multi-class classification and then if so we have to reshape our predictions to get them ready now here we've got to go y pred equals np dot argmax we haven't seen this before we could use tf dot argmax as well y pred axis equals one dot reshape xx dot shape so we're reshaping it just to the shape of xx up here now
let's make else if they're binary we'll print doing binary classification so this is just the else statement for this little if condition here and then y pred equals np dot round so we're just rounding our predictions y pred dot reshape to xx dot shape now we're finally up to plot the decision boundary again this function has a fair few steps but again if you want to understand it take them out deconstruct it and look at what's happening line by line that could be some uh homework for you after this video yy i want
a contour plot so what is this plot contours nice and succinct definition from that documentation there thank you very much but again we're going to see what this looks like rather than just reading docs we're going to write as much code as we can alpha wonderful and we're going to plot a scatter we need to plot the zeroth axis of x and x's first axis and color this by y s equals 40 now what does the s parameter do where is s there scalar so the marker size in points so this will define how
big the scatter plot points come out and then we're going to go cmap equals plt dot cm dot RdYlBu this is our color map and plt dot xlim we want to set the limits so this is where xx dot min comes in and xx dot max comes in as well as the y limits we can do the same thing with yy dot min and yy dot max okay that's probably the biggest function we've written so far i mean it almost doesn't fit on the screen a lot going on here but now that we've got a function
to plot our model's decision boundary i mean the cut-off point between the decisions it's making between red and blue dots let's try it out so we go here we'll run that hopefully there's no errors check out the predictions our model is making so if we want to plot decision boundary we pass it in our trained model which is we'll set model 3 because that's our most recent model x is just x and y is equal to y remember x and y are just our features and labels so again one point here leads to a one
or a zero we'll get rid of that you ready three two one oh would you look at that so hmm this is why our model is performing so poorly it looks like it's trying to draw a straight line through the data now what's wrong with this well we've got circular data so the main issue here is that our data isn't separable by a straight line but if we're working on a regression problem what is a regression problem now in a regression problem our model might actually work because it's drawing a straight line so how about
we test that we might do that in the next video and as for this function if you're wondering daniel like how did you how did you learn how to create all of this well the truth is i've borrowed this and i've adapted it for our use case so if we come here one of the resources i used was cs231n neural networks case study now this is a phenomenal course convolutional neural networks for visual recognition we haven't actually worked with convolutional neural networks yet so if you go through this it might be a bit full-on but
i'd highly recommend this as this is going to be a part of the extra curriculum for this section and the convolutional neural network section so if we come down here i dug into the code behind this material here and i found this so they've got some spiral data here and i've just changed it for our circular data and i also so i'm going to put this link in here this function was inspired by two resources number one is this one and then number two was another phenomenal resource made with ml so if we go to
made with ml github and then into the basics so this is a phenomenal phenomenal repo i highly suggest you star this one i definitely have if you want to learn more about machine learning and deep learning and how to code them there are some amazing notebooks here so i got it from this one here the multi-layer perceptrons we come in this one oh no that's for pytorch we come into tensorflow now there's a function in here somewhere very similar to the one we have boom plot multi-class decision boundary so there we go so i'm going
to copy this resource here number two is there so i'll make sure to link those in the extra curriculum but in the meantime let's uh let's try and adapt our neural network to see if it works even though we're working on classification right now because our neural network is plotting a straight line here or predicting a straight line i wonder if we can adapt it to a regression problem let's check that in the next video in the last video we created a function plot decision boundary to visually inspect our model's predictions and we found that
it's performing so poorly because it's predicting that the decision boundary is a straight line whereas our data is circular so we have a linear decision boundary which linear is a fancy word for straight line and we have non-linear data which is another fancy way of saying non-straight data now if we come back up here i've just uh added a little tidbit in here in retrospect but we've already discussed this it's that remember wherever you see the key it's a important point to note whenever your model is performing strangely or there's something going on with your
data you're not quite sure of remember these three words visualize visualize visualize inspect your data inspect your model inspect your model's predictions and that's what we've done here and we also discussed that since our model is predicting a straight line might we be able to to use it for a regression problem let's have a look we'll write some code let's see if our model can be used for a regression problem now we'll need some regression data now i'm going to set the random seed here because we're going to create some regression data and we'll make
x regression so we don't override our already existing x variable and i'm going to do tf range zero to a thousand and the step can be five and then y regression can be tf range 0 to oh actually let's start from 100. so the function here we're trying to predict is just y equals x plus 10. that's the relationship between x and y here oh sorry plus 100 because y starts from 100 finishes at 1 100. x starts from zero and finishes at a thousand they both have the same step so let's check this out
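The regression data just described can be sketched like this. The variable names `X_regression` and `y_regression` are assumptions based on how they are spoken in the video; the ranges and step come straight from the description above.

```python
import tensorflow as tf

# Features: 0, 5, 10, ..., 995
X_regression = tf.range(0, 1000, 5)

# Labels: 100, 105, 110, ..., 1095 (the same step, shifted up by 100)
y_regression = tf.range(100, 1100, 5)

print(X_regression.shape, y_regression.shape)

# The relationship the model should learn is y = X + 100
relationship_holds = bool(tf.reduce_all(y_regression == X_regression + 100))
print(relationship_holds)
```

Both tensors hold 200 values, since the range from 0 to 1,000 is walked in steps of 5.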
remember visualize visualize visualize inspect your data become one with the data so there we go starting from 0 stepping up by 5 all the way up to 995 starting from 100 stepping by 5 all the way up to one thousand and ninety five so we can remove those let's split our regression data into training and test sets x reg train for x regression can be let's just make it how many values have we got zero to a thousand in steps of five so there's 200 total and let's do the first 150 can be for the training data set and the last the
last 50 can be for the test data set so x reg test equals x regression we want just the last 50 so index 150 onwards that's how we can index that then we'll do the same thing for the y data we could have actually done this for our circles data but we'll move on from that one we'll do that later on i think and then we go there and now let's see so we've got training and test data let's see if we can fit our model fit our model to the regression data wonderful model three dot
fit now before we do this i want you to have a think will this work go back up scroll back up in the notebook not going to give you any hints but just step through each of the steps that we use to create model 3 build a model compile a model fit a model we're doing step 3 right now fit a model will this work or will it error inspect how we created it inspect how we compiled it think about the problem we're working on regression and have a think will this work i'll give you about
three seconds or so to try that out i mean you can pause the video but three seconds for me three two one all right did you figure out why it will or won't work you know what i hope you did i hope you ran the code instead of scrolling back up i hope you just tried it because that's what i'm going to do oh no what's happened here input 0 of layer sequential 2 is incompatible with the layer expected axis negative one of input shape to have value two but received input with shape none
one so what's happened we've got a shape issue one of the most common issues in deep learning so what's going on we got train yeah we've created our data sets correctly hmm why hasn't our model worked i wonder well model 3 was fit on our circles data which has two features per sample and our regression data only has one so we'll need to recreate it and on top of that we've compiled our model for a binary classification problem so we come back up here where's model 3 compile the model we've set the loss function to be binary cross entropy but what should it be if we're working on a regression problem it should be something like mae or mse mean absolute error or mean squared error so
let's go back how about we right here um oh wait we compiled our model for a binary classification problem but we're now working on a regression problem let's change the model to suit our data all right so the only thing we're going to change in model 3 is the loss function what i might do is get rid of this code so it doesn't interfere with what we're going to write so if you want to try yourself all we're going to do is just recreate model 3 exactly how it is we're going to change the loss
function instead of being binary cross entropy we're going to change it to mean absolute error so that the loss function is regression specific so give that a try and otherwise i'm going to start writing it now so we've got to set up the random seed so we get reproducible results said seed 42 now step one we're going to create the model we'll just recreate model three so right we can override it tf carers and plus what's the harm in having some more practice writing model code tf carers by the end of this course i want
you to have created over 100 models maybe even more who knows i'm not actually sure i haven't counted maybe someone out there could count how many models we actually build together and then let me know because that'd be pretty cool you know what i should have done i should have increased this number so we knew the whole time what number model we were up to too late now and so compile the model now this time with a regression specific loss function beautiful model_3.compile loss equals tf.keras.losses.mae
beautiful and then the optimizer can be tf.keras we're going to keep the optimizer the same we'll keep it as adam optimizers.Adam remember adam is safe metrics equals we need to adjust the metrics as well because it's a regression problem we'll keep mae there and finally step three is fit the model so we have model_3 we've recreated it dot fit on X_reg_train and y_reg_train which is our regression data set and we'll set it up for 100 epochs now you're ready to run three two one ho ho there we
go now what did it start with do we have oh so the mae starts close to 250 and by the end we go right down to 37 that's beautiful so we see a reduction in loss and a reduction in mae so it seems like our model is learning something from these training metrics but to make sure let's plot its predictions so just like we've plotted our circular data we'll plot our regression data so make predictions with our trained model y_reg_preds equals model_3.predict x make it on the test data set oh sorry it's going
to be oh yeah predict X_reg_test yeah that's what we want and now let's plot the model's predictions against our regression data so we can create a plot here we're not going to write as big and intricate a function as we did for our classification data because this regression data is just a straight line and we're going to do a scatter plot and on this scatter plot is going to be the training data so X_reg_train y_reg_train and the color of the training data can be blue we'll label it training
data so we know that it is the training data create another scatter plot this time for the test data set so test features and test labels and we will color this with green and give it the label equals test data and plt.scatter X_reg_test and now let's plot our reg predictions and we're going to have to no i think that should be okay we might have to i wonder what dimension they are oh well we'll check if in doubt run the code wait for the error to pop up for us and then we
can see and then plt.legend so that's what we've done we've trained a regression model we adapted model 3 to be suited for our regression data we've made some predictions on the test data set and now we're just plotting it just like we did in the regression section training data test data predictions let's go figure oh we want this to be figsize oh would you look at that okay so the predictions aren't perfect i mean if they were the red line would line up perfectly with the green line but they definitely look better than
complete guessing i mean imagine if the predictions were all over the shop like red dots everywhere that would be basically guessing for this regression problem now this means that our model must be learning something however it's still missing something for our classification problem what's the difference here what is the main difference between our data sets i'll give you a hint this regression problem is a straight line whereas if we come back we discussed this before our classification data is not a straight line it's non-linear but the decision boundary our model's trying to plot
is linear a straight line so hmm that might be the missing piece the thing that we haven't introduced to our models yet we haven't introduced non-linearity and if you haven't heard of that before that's okay because we're going to discuss it in the next video and probably a couple of videos after that so let's write that down the missing piece it's like we're on a treasure hunt here non-linearity all righty get excited because we're going to learn one of the most important concepts in neural networks i'll see you in the next video
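The steps walked through in this video (recreate the model, compile it with a regression-specific loss, fit, predict and plot) can be sketched roughly as below. This is a minimal sketch under assumptions, not the exact notebook code: the straight-line data, the 150/50 split and the three Dense layers are stand-ins, because none of them are defined in this part of the transcript. The shape error seen earlier most likely comes from reusing a model that had already been built on two-feature inputs, so creating a fresh model, as done here, should avoid it whatever loss is chosen.

```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# ASSUMPTION: a simple straight-line dataset (y = X + 100). The video creates
# its regression data earlier; the values here are illustrative stand-ins.
X_regression = np.arange(0, 1000, 5, dtype=np.float32).reshape(-1, 1)
y_regression = X_regression + 100

# ASSUMPTION: first 150 samples for training, remaining 50 for testing.
X_reg_train, y_reg_train = X_regression[:150], y_regression[:150]
X_reg_test, y_reg_test = X_regression[150:], y_regression[150:]

# Set the random seed for (more) reproducible results.
tf.random.set_seed(42)

# 1. Create the model (ASSUMPTION: layer sizes are illustrative).
model_3 = tf.keras.Sequential([
    tf.keras.layers.Dense(100),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1),
])

# 2. Compile the model with a regression-specific loss and metric.
model_3.compile(loss="mae",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["mae"])

# 3. Fit the model on the regression training data.
history = model_3.fit(X_reg_train, y_reg_train, epochs=100, verbose=0)

# Make predictions on the test set.
y_reg_preds = model_3.predict(X_reg_test, verbose=0)

# Plot training data, test data and predictions on one figure.
plt.figure(figsize=(10, 7))
plt.scatter(X_reg_train, y_reg_train, c="b", label="Training data")
plt.scatter(X_reg_test, y_reg_test, c="g", label="Test data")
plt.scatter(X_reg_test, y_reg_preds, c="r", label="Predictions")
plt.legend()
# In a plain script (outside a notebook), call plt.show() to display the figure.
```

The inputs are reshaped to a column so that each sample has one feature, which is the two-dimensional layout recent Keras versions expect for Dense layers.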