What's going on, everyone? This is Jay Patil, and welcome back to another video on recurrent neural networks. In the previous video we saw what a recurrent neural network looks like, why we cannot use a simple neural network for natural language processing tasks, and the different types of recurrent neural networks. If you haven't watched that video, you can find it by clicking on the "i" button above. We saw that the model architecture looks something like this: we pass one word at a time, the cell produces an output as well as an activation, and this activation is fed into the same network at the next time step. For example, when we pass the second word, "work", we feed the activation from the previous time step into the current time step, and we keep repeating this process until we reach the end of the sentence. In this video we will look at the mathematical details behind the RNN cell; we will see what goes on inside this cell, which looks like a small neural network.
And again, if you are new to this channel, consider subscribing and hit the bell icon so that you get notified whenever I upload a new machine learning video. If you find this video helpful, please hit the like button and share it with your friends so that they can benefit from it too. Now, without further delay, let's get going.

In the last video I told you that we pass one word at a time, but computers do not understand words; they only understand numbers. So we cannot pass the word "kelly" directly to our model; we first need to convert it into a number, or token. For this, we can create a vocabulary of words. This vocabulary holds all the words known to our model, but we can limit it to, say, 10,000 English words. For example, the first word in our vocabulary might be "a", the second might be "aaron", the second-to-last might be "zebra", and the last entry is an unknown token, which stands in for any word we encounter that is not in the vocabulary. Let's say the word "kelly" appears at position 3456 and the word "work" appears at position 8989.

Once we have a number associated with each word, we can feed the words to our network, but we also need to convert each number into a one-hot representation. A one-hot representation is a vector in which every value is zero except a single one: for "kelly", all values are zero except the value at position 3456, which is one. Similarly, in the one-hot vector for the word "work", all values are zero except the value at position 8989, which is one. We convert the remaining words into one-hot vector form in the same way, as sketched below.
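To make this concrete, here is a minimal sketch of the tokenization and one-hot step in NumPy. The vocabulary entries and the indices 3456 and 8989 are the illustrative values from above, not a real dataset, and mapping unknown words to a reserved `<UNK>` index is one common convention, not the only one:

```python
import numpy as np

vocab_size = 10_000

# Hypothetical word -> index mapping; index 9999 is reserved for <UNK>.
word_to_index = {"a": 0, "aaron": 1, "kelly": 3456, "work": 8989,
                 "zebra": 9998, "<UNK>": 9999}

def one_hot(word, word_to_index, vocab_size):
    """Return a (vocab_size, 1) column vector with a 1 at the word's index."""
    idx = word_to_index.get(word, word_to_index["<UNK>"])
    vec = np.zeros((vocab_size, 1))
    vec[idx] = 1.0
    return vec

x_kelly = one_hot("kelly", word_to_index, vocab_size)  # 1 at row 3456
x_work = one_hot("work", word_to_index, vocab_size)    # 1 at row 8989
```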
Once we have converted our sentence into one-hot representations, we can feed it to our network, so now it's time to see the mathematical details behind one RNN cell. For an artificial neural network, we know that the activation of the next layer is calculated by multiplying a weight matrix with the activation of the previous layer, adding a bias, and passing the result through an activation function. We will do a similar thing for the RNN. The equation is similar, but here we have two inputs: one is the activation from the previous time step, and the other is x, which is one word. Say our first word is "kelly"; we have converted this word into a one-hot vector of shape (n_x, 1), where n_x is the vocabulary size. In a plain neural network we had only one input and multiplied it with one weight matrix, but now, since we have two inputs, we multiply with two weight matrices, and the equation looks like this: the activation from the previous time step, a<t-1>, is multiplied with the weight matrix W_aa, the input x<t> is multiplied with the weight matrix W_ax, we add a bias b_a, and we pass the sum through an activation function:

a<t> = g(W_aa · a<t-1> + W_ax · x<t> + b_a)

This produces the activation a<t>. And, as we saw, there is also an output ŷ coming out of the cell, so the equation for the output is: the activation a<t> at this time step is multiplied with another weight matrix, W_ya, a bias b_y is added, and the result is passed through an activation function, which can be either a sigmoid or a softmax:

ŷ<t> = g(W_ya · a<t> + b_y)

So that's it: we only have these two equations for one RNN cell.
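As a quick sketch, here is what one forward step of this cell could look like in NumPy. Using tanh for the cell activation and softmax for the output is a common choice, and all names and shapes here are illustrative rather than taken from any particular implementation:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def rnn_cell_step(x_t, a_prev, Waa, Wax, Wya, ba, by):
    """One forward step of a basic RNN cell.

    x_t:    one-hot input word, shape (n_x, 1)
    a_prev: activation from the previous time step, shape (n_a, 1)
    Waa:    (n_a, n_a), Wax: (n_a, n_x), Wya: (n_y, n_a)
    """
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)  # a<t> = g(Waa·a<t-1> + Wax·x<t> + ba)
    y_t = softmax(Wya @ a_t + by)                 # ŷ<t> = g(Wya·a<t> + by)
    return a_t, y_t
```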
Here is a clearer representation of the same thing: this is one RNN cell, and at every time step we pass a word into it, it computes the activation, and it produces an output. In the bigger picture it looks like this: we pass words in one at a time, each step computes an activation, and that activation is passed to the same block at the next time step.

Regarding the naming convention, the names W_aa, W_ax, and W_ya are chosen so that the last character tells you what quantity the matrix multiplies: W_aa is multiplied with a, which is why it ends in a; W_ax is multiplied with x, which is why it ends in x; and W_ya is multiplied with a, which is why it ends in a. The first character after W tells you what the matrix produces: W_aa and W_ax produce a, while W_ya produces y.
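Unrolling the same cell over a whole sentence just means looping over the words and threading the activation through. Here is a sketch that reuses the rnn_cell_step function from above; the sizes n_x, n_a, and n_y and the random initialization are assumed for illustration:

```python
import numpy as np

def rnn_forward(xs, a0, params):
    """Run the same cell over a whole sequence, threading the activation.

    xs: list of (n_x, 1) one-hot vectors, one per word
    a0: initial activation, shape (n_a, 1), often all zeros
    """
    Waa, Wax, Wya, ba, by = params
    a = a0
    activations, outputs = [], []
    for x_t in xs:  # one pass through the same cell per word
        a, y = rnn_cell_step(x_t, a, Waa, Wax, Wya, ba, by)
        activations.append(a)
        outputs.append(y)
    return activations, outputs

# Illustrative setup: vocabulary of 10,000, 64 hidden units,
# and a 10,000-way output (e.g. predicting a word at each step).
n_x, n_a, n_y = 10_000, 64, 10_000
rng = np.random.default_rng(0)
params = (rng.normal(0, 0.01, (n_a, n_a)),   # Waa
          rng.normal(0, 0.01, (n_a, n_x)),   # Wax
          rng.normal(0, 0.01, (n_y, n_a)),   # Wya
          np.zeros((n_a, 1)),                # ba
          np.zeros((n_y, 1)))                # by
a0 = np.zeros((n_a, 1))
```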
We also saw that for some applications, such as many-to-one applications, we do not have an output at every time step; we only produce an output after the final word. The architecture then looks like this: let's say we are building a movie-rating system. We pass the words in one at a time, and at every time step the cell feeds its activation to the next time step, but it does not compute an output; it only computes the output when it reaches the end of the sentence.
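A minimal sketch of that many-to-one variant, assuming a hypothetical movie-rating task and reusing the softmax helper from above: the loop updates the activation at every step, but ŷ is computed only once, at the end.

```python
def rnn_forward_many_to_one(xs, a0, params):
    """Many-to-one variant: activations at every step, one output at the end."""
    Waa, Wax, Wya, ba, by = params
    a = a0
    for x_t in xs:
        a = np.tanh(Waa @ a + Wax @ x_t + ba)  # same recurrence as before
    y_last = softmax(Wya @ a + by)  # e.g. a distribution over rating classes
    return y_last
```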
Now, there is another interesting thing we can do here: instead of having just one cell per time step in a recurrent neural network, we can stack two such cells. For example, we pass the word in, the lower cell computes the activation a1, and this activation a1 is then passed to the next cell in the same time step, which computes the output y. So at every time step we have two cells: the lower cell's activation a1 is passed up to the upper cell, and each layer also has its own activation (a1 for the lower cells, a2 for the upper cells) that is passed forward to the same layer at the next time step.
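Here is a sketch of one time step of such a two-layer (stacked) RNN; the parameter grouping and the names p1 and p2 for the lower and upper cells are my own illustrative choices, not standard notation:

```python
def stacked_rnn_step(x_t, a1_prev, a2_prev, p1, p2):
    """One time step of a two-layer (stacked) RNN.

    The lower cell consumes the word x_t; the upper cell consumes the
    lower cell's activation a1 instead of a word, and produces the output.
    """
    Waa1, Wax1, ba1 = p1                             # lower-cell parameters
    Waa2, Wax2, Wya2, ba2, by2 = p2                  # upper-cell parameters
    a1 = np.tanh(Waa1 @ a1_prev + Wax1 @ x_t + ba1)  # lower cell: takes the word
    a2 = np.tanh(Waa2 @ a2_prev + Wax2 @ a1 + ba2)   # upper cell: takes a1 as input
    y_t = softmax(Wya2 @ a2 + by2)                   # output comes from the top cell
    return a1, a2, y_t
```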
So I hope you now understand in detail what the recurrent neural network model looks like. In the next video we will look at backpropagation; we will build an intuitive idea of what backpropagation looks like in a recurrent neural network. If you found this helpful, hit the like button, share it with your friends, and I hope to see you in the next video.